<a href="https://colab.research.google.com/github/SamiaOsman/New/blob/main/Alberta_COVID_Case_data.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <font style = "color:rgb(20, 53, 103)" > **How to use `Plotly` with `ipywidgets` to Visualize data** </font>

The purpose of this notebook is to teach how `Plotly` and `ipywidgets` can be used together in creating interactive data visualizations.

## <font style = "color:rgb(20, 53, 103)" > **Background** </font>

The Government of Alberta publishes information related to COVID-19 in the province in the form of **CSV files**. It has been keeping these records since **March 6, 2020**.

These **CSV** files can be downloaded from this Government website: https://www.alberta.ca/stats/covid-19-alberta-statistics.htm#data-export

On this site, there are 4 `CSV` files. They are:

> `Case data` **(This Notebook)**

> `Summary data starting March 6, 2020`

> `Geospatial data` 

> `Vaccine data` 

This notebook will describe and visualize the `Case data` **CSV file**.

<div class="admonition warning alert alert-danger">
<p class="first admonition-title" style="font-weight: bold;">Warning</p>
<p class="last"> This notebook is meant to show you how to use Python, Jupyter Lab, Plotly, and ipywidgets to tell a story. This notebook is not meant to set policy, or to convince anyone what to believe. I respect that everyone will see a different story in the data, therefore the focus is on how to create the visualizations, not on how to interpret them. Any interpretation will be entirely your own, respecting that everyone will see data differently; that's why it's called "Story Telling using Data". 
<p class="last">This notebook is for educational purposes only. It's been put together for the following reasons:
    
<p class="last"> 1. Teach how Plotly and ipywidgets can be used together in creating rich and interactive visualizations. 
<p class="last"> 2. Show how to automate the process of downloading a data set, and inspecting it to prepare for data visualization
<p class="last"> 3. Create code snippets that can be used by anyone in visualizing data
<p class="last"> 4. Show off my Python and data science skills to both prospective clients, and any future students that want to be taught "how to"

<div class="admonition note alert alert-info">
<p class="first admonition-title" style="font-weight: bold;">Why the Alberta COVID-19 dataset?</p>
    
<p class="last">One of the biggest challenges in learning Python and data analytics is understanding and interpreting the data. It's hard to understand what a visualization is saying to you if you don't first understand the subject. COVID-19 is a universal human experience; we're all connected by what's happening and by our understanding of things.
    
<p class="last">Because COVID-19 is a shared experience, each of us has an intrinsic domain knowledge about the subject and data. I don't need to tell you what an Active, Recovered, or Died status means when it comes to tracking a COVID case. Nobody needs an explanation of what an Age Group, Gender, or Alberta Health Services Zone is. Even if you're not from Alberta, it's fairly intuitive. 
    
<p class="last">Since everybody understands the meaning of this information, unlike many other examples out there, you can concentrate your efforts on playing with the code and interacting with the information in order to discover insights from the data.
    
<p class="last">Hopefully the examples in this book will encourage you to try something on your own.

<p class="last">Enjoy! </p>
</div>

## <font style = "color:rgb(20, 53, 103)" > **Downloading the dataset** </font>

The data sets can be downloaded manually, either by going to this [website](https://www.alberta.ca/stats/covid-19-alberta-statistics.htm#data-export) and clicking on each download link, or by clicking on the link below:

> **Case data:** https://www.alberta.ca/data/stats/covid-19-alberta-statistics-data.csv

### <font style = "color:rgb(20, 53, 103)" > **Downloading a dataset using wget** </font>

<div class="admonition caution alert alert-warning">
<p class="first admonition-title" style="font-weight: bold;">Warning!</p>
<p class="last">`wget` is a Linux command and is not available by default on Windows (try the Windows Subsystem for Linux).</p>
</div>

The scripts below use `wget` to download the `CSV` file. 

**Note** The Government of Alberta uses a static naming convention for these files. You will have to `save as` if you want to keep a specific snapshot of the dataset.

The following script will:

1. Download the `Case data` **csv** file from the Government of Alberta website on [COVID-19 Alberta statistics](https://www.alberta.ca/stats/covid-19-alberta-statistics.htm#data-export).

2. Save this file into the `dataset` directory. This is done using the `-P` option and specifying the path.

3. Resume the download if a partial copy of the file already exists in this directory. This is done using the `-c` (`--continue`) option. Note that `-c` does not overwrite a file that has already been fully downloaded: as the output below shows, `wget` reports that there is nothing to do.

To make sure you're working with the most current version of the dataset published by the Government, delete the existing file before running the script, or use the `-O` option with an explicit file name, which writes over any existing file.

The instructions on how to customize wget come from this Stack Overflow discussion: https://stackoverflow.com/questions/30418188/how-to-force-wget-to-overwrite-an-existing-file-ignoring-timestamp

In [None]:
# This script downloads a copy of the dataset and saves it to the ./dataset directory.
# The -c option resumes a partial download; a file that is already complete is left untouched.
# Delete the existing file first if you want a fresh copy.
!wget https://www.alberta.ca/data/stats/covid-19-alberta-statistics-data.csv -P ./dataset/ -c

--2022-04-02 16:21:44--  https://www.alberta.ca/data/stats/covid-19-alberta-statistics-data.csv
Resolving www.alberta.ca (www.alberta.ca)... 2606:4700:10::ac43:1698, 2606:4700:10::6816:2ba2, 2606:4700:10::6816:2aa2, ...
Connecting to www.alberta.ca (www.alberta.ca)|2606:4700:10::ac43:1698|:443... connected.
HTTP request sent, awaiting response... 416 Requested Range Not Satisfiable

    The file is already fully retrieved; nothing to do.
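
For readers on Windows, where `wget` may not be available, the same download can be sketched in plain Python. This is a minimal sketch using only the standard library. The helper name `download_dataset` is an assumption made for illustration and is not part of the original notebook. It always writes over an existing file of the same name, so the local copy matches whatever the server currently serves.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_dataset(url, dest_dir="./dataset"):
    """Download url into dest_dir, replacing any existing file of the same name."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    # Use the last part of the URL as the local file name
    target = dest / url.rsplit("/", 1)[-1]

    # Opening the target in "wb" mode truncates any previous copy
    with urlopen(url) as response, open(target, "wb") as out_file:
        shutil.copyfileobj(response, out_file)
    return target


# Example call (requires network access):
# download_dataset("https://www.alberta.ca/data/stats/covid-19-alberta-statistics-data.csv")
```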



## <font style = "color:rgb(20, 53, 103)" > **Python Import Libraries** </font>

The following are the import libraries needed to run this notebook:

In [None]:
# The following libraries are required to run
# the scripts found in this notebook

# Numpy library for handling array objects
import numpy as np

# Pandas library for handling CSV
# files
import pandas as pd

# Used to convert date formats
import datetime

# Used in displaying interactive widgets
from IPython.display import display

# Used in creating interactive visuals
import ipywidgets as widgets

# Plotly libraries for creating graphs
import plotly.express as px
import plotly.figure_factory as ff
import plotly.graph_objects as go

## <font style = "color:rgb(20, 53, 103)" > **Data Understanding** </font>

Data understanding is a 3-step process:

1. **Data Mining** involves data cleaning, data integration, data selection, and data transformation. The goal is to produce tabular data that can be analyzed.

2. **Data Description** involves inspecting and describing your data both in terms of feature terminology and meaning, as well as descriptive statistics of each feature.

3. **Exploratory data analytics** involves looking for one (or more) of the 5 interesting patterns used by data science:

> **Class / Concept Description: Characterization & Discrimination**

> **Classification / Regression**

> **Cluster Analysis**

> **Frequent Pattern Analysis**

> **Outlier Detection**

### <font style = "color:rgb(20, 53, 103)" > **Data Mining** </font>

The first step in this process is to load your data into computer memory and inspect it. What you are looking for is:

1. Did the data load correctly?
2. Do the names of the columns look correct?
3. Are there any missing values?
4. Are the `dtypes` for each column as expected?
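
The four checks above can be sketched on a tiny sample. The three-row CSV below is hypothetical data laid out like the case data file, used only so the example runs without downloading anything.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as the case data file
sample_csv = io.StringIO(
    ",Date reported,Gender,Case status\n"
    "1,2021-04-21,Male,Recovered\n"
    "2,2021-05-17,,Recovered\n"
    "3,2020-12-13,Female,Active\n"
)
sample = pd.read_csv(sample_csv, index_col=0)

# 1. Did the data load? Check the number of rows and columns
print(sample.shape)

# 2. Do the column names look right?
print(list(sample.columns))

# 3. Are there missing values? Count them per column
missing_per_column = sample.isna().sum()
print(missing_per_column)

# 4. Are the dtypes as expected?
print(sample.dtypes)
```

A non-zero count in step 3 tells you which columns need attention before plotting.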

#### <font style = "color:rgb(20, 53, 103)" > **Load the data** </font>

In [None]:
# The path to the downloaded csv file (wget saved it into ./dataset)
cases_data_source_path = './dataset/covid-19-alberta-statistics-data.csv'

In [None]:
# Load the dataset into memory and display the results
df = pd.read_csv(cases_data_source_path,
                 index_col = 0 # Required on this dataset.
                               # Without it, pandas adds an extra column
                               # that stores the index numbers.
                               # Do not use as a default, only where required.
                )

df



Unnamed: 0,Date reported,Alberta Health Services Zone,Gender,Age group,Case status,Case type
1,2021-04-21,Edmonton Zone,Male,30-39 years,Recovered,Confirmed
2,2021-05-17,North Zone,Male,10-19 years,Recovered,Confirmed
3,2020-12-13,Edmonton Zone,Male,5-9 years,Recovered,Confirmed
4,2021-05-11,Edmonton Zone,Male,60-69 years,Recovered,Confirmed
5,2021-01-30,Calgary Zone,Female,1-4 years,Recovered,Confirmed
...,...,...,...,...,...,...
527098,2022-01-13,North Zone,Female,50-59 years,Recovered,Confirmed
527099,2021-04-08,North Zone,Male,10-19 years,Recovered,Confirmed
527100,2021-01-01,Central Zone,Male,30-39 years,Recovered,Confirmed
527101,2022-02-03,Central Zone,Male,30-39 years,Recovered,Confirmed


If you can see the data frame output, then you know the load worked. You have to add `index_col = 0` in order for this data set to import correctly. Try running it without this parameter to see the difference.

#### <font style = "color:rgb(20, 53, 103)" > **Inspecting the data** </font>

If you're satisfied that the data loaded correctly, the next step is to inspect the data to check for missing values, as well as to confirm that the `dtype` of each feature looks correct for the kind of information it holds.

The following scripts will help you inspect your data.

In [None]:
# Check for missing values and correct dtypes
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 527898 entries, 1 to 527102
Data columns (total 6 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   Date reported                 527898 non-null  object
 1   Alberta Health Services Zone  527893 non-null  object
 2   Gender                        527892 non-null  object
 3   Age group                     527896 non-null  object
 4   Case status                   527896 non-null  object
 5   Case type                     527896 non-null  object
dtypes: object(6)
memory usage: 28.2+ MB


This data set has a small number of missing values: `Date reported` has 527,898 non-null entries, while the other columns have between 527,892 and 527,896. It consists of all `object` dtypes. This is a strong indicator that the information found in these fields is [categorical data](http://www.stat.yale.edu/Courses/1997-98/101/catdat.htm), rather than [continuous data](https://www.mathsisfun.com/definitions/continuous-data.html).

We will now use an `interactive widget` to inspect each of these features. The purpose is to create a data dictionary. It's important that each feature is described in a way that anyone will understand the meaning of the information stored in that feature.

##### <font style = "color:rgb(20, 53, 103)" > **Creating a list of names of the features** </font>

Although this data set only contains 6 features, it would be tedious to inspect all the features one by one. In order to automate this process, let's create an interactive widget.

**First**, let's create a list of features for us to inspect. We do this by reading the `df.columns` attribute. You can learn more about this attribute here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.columns.html.

In [None]:
# Create a list that will hold the names of the columns, so we can inspect each feature.
features = df.columns

features

Index(['Date reported', 'Alberta Health Services Zone', 'Gender', 'Age group',
       'Case status', 'Case type'],
      dtype='object')

##### <font style = "color:rgb(20, 53, 103)" > **Create an interactive widget** </font>

The script below is an interactive widget that you can use to inspect each feature, in order to better understand your data.

It uses the list of columns we just created to let us interact with each feature and get a better understanding of the information it contains. The object of this inspection is to create the data dictionary below.

In [None]:
# An interactive application that will help you inspect your features
# in order to create a data dictionary
@widgets.interact
def feature_describer(variable = features):
    # Inspects the categorical information
    # of the feature
    print("Categorical Information")
    print(df[variable].value_counts())
    
    print('\n')
    
    # Inspects the information of 
    # a continuous feature
    print("Continuous Information")
    print(df[variable].describe())

interactive(children=(Dropdown(description='variable', options=('Date reported', 'Alberta Health Services Zone…
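
Where widgets are not available (for example, in a static export of this notebook), the same inspection can be approximated with a plain loop. This is a minimal sketch on hypothetical sample data. The helper name `summarize_feature` is an assumption for illustration, and unlike the widget above it picks one summary per column based on the dtype rather than printing both.

```python
import pandas as pd


def summarize_feature(frame, name):
    """Return describe() for numeric columns and value counts for all others."""
    column = frame[name]
    if pd.api.types.is_numeric_dtype(column):
        return column.describe()
    # dropna=False keeps missing values visible in the counts
    return column.value_counts(dropna=False)


# Hypothetical sample data
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", None],
    "Week number": [1, 2, 2, 3],
})

for name in sample.columns:
    print(name)
    print(summarize_feature(sample, name))
    print()
```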

### <font style = "color:rgb(20, 53, 103)" > **Data Description** </font>

The purpose of this step is to describe the features in such a way that anyone can understand their meaning. You need to be able to either describe each feature name or provide reference sources for the definition you apply to it. It's important that you cite references or describe your features in such a way that people can agree on their meaning or context (generally speaking).

Any analysis or story you decide to tell with this data has to make sure everyone understands how the data is being interpreted. Everyone has a different story.

In the data dictionary below, the following definitions are being used:

**Definitions**

> `Date reported` - The date a COVID-19 case was either recorded as `Confirmed` or `Probable` in Alberta. This information is stored in the **Case type** field. The **Case status** field tracks these cases as either `Active`, `Recovered` or `Died`. These numbers have been recorded daily since March 6, 2020. 

> `Alberta Health Services Zone` - Identifies the Health Services Zone where the case was reported. There are 6 zones found in this data set: the 5 Alberta Health Zones (`North Zone`, `Edmonton Zone`, `Central Zone`, `Calgary Zone`, `South Zone`), as well as an `Unknown` zone. You can get more information on the Alberta Health Services zones here: https://www.albertahealthservices.ca/assets/about/publications/ahs-ar-2017/zones.html

> `Gender` - Identifies the gender of the individual as `Unknown`, `Female`, or `Male`.

> `Age group` - Identifies the age group of the individual with COVID. The age categories are: `Unknown`, `Under 1 year`, `1-4 years`, `5-9 years`, `10-19 years`, `20-29 years`, `30-39 years`, `40-49 years`, `50-59 years`, `60-69 years`, `70-79 years`, `80+ years`.

> `Case status` - Tracks the day-to-day status of a COVID case. On any given day, the status can be: `Active`, `Recovered`, or `Died`, depending on the outcome of a `Confirmed` or `Probable` case.

> `Case type` - Records if a COVID-19 test is either `Confirmed` or `Probable`. The `Case status` field tracks the outcome of any `Confirmed` or `Probable` case.


#### <font style = "color:rgb(20, 53, 103)" > **Data Dictionary** </font>

The following is a description of the features found in this dataset:

| Index | Feature | Description | Categorical or Continuous | Range of values |
|:------|:--------|:------------|:--------------------------|:----------------|
| 0 | Date reported | The date a COVID-19 test was recorded in Alberta | Categorical | `YYYY-MM-DD` |
| 1 | Alberta Health Services Zone | Identifies the health services zone where a case was recorded | Categorical | `Unknown`, `North Zone`, `Edmonton Zone`, `Central Zone`, `Calgary Zone`, `South Zone` |
| 2 | Gender | The gender of the recorded case | Categorical | `Unknown`, `Female`, `Male` |
| 3 | Age group | The age group of a recorded case | Categorical | `Unknown`, `Under 1 year`, `1-4 years`, `5-9 years`, `10-19 years`, `20-29 years`, `30-39 years`, `40-49 years`, `50-59 years`, `60-69 years`, `70-79 years`, `80+ years` |
| 4 | Case status | Tracks the day-to-day status of a COVID-19 case | Categorical | `Active`, `Recovered`, `Died` |
| 5 | Case type | Tracks a COVID positive case | Categorical | `Confirmed` or `Probable` |
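
A first draft of such a data dictionary can be generated from the data frame itself and then completed by hand with the descriptions. The sketch below runs on hypothetical sample data. The helper name `build_data_dictionary` and its `max_values` cut-off are assumptions for illustration.

```python
import pandas as pd


def build_data_dictionary(frame, max_values=12):
    """Summarize each column: dtype, missing count, and observed values."""
    rows = []
    for position, name in enumerate(frame.columns):
        values = sorted(frame[name].dropna().astype(str).unique())
        rows.append({
            "Index": position,
            "Feature": name,
            "Dtype": str(frame[name].dtype),
            "Missing": int(frame[name].isna().sum()),
            "Distinct values": len(values),
            # Only list the values when there are few enough to read
            "Values": ", ".join(values) if len(values) <= max_values else "(many)",
        })
    return pd.DataFrame(rows)


# Hypothetical sample data
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Unknown", None],
    "Case type": ["Confirmed", "Probable", "Confirmed", "Confirmed"],
})
dictionary = build_data_dictionary(sample)
print(dictionary.to_string(index=False))
```

Columns with many distinct values, such as a date column, are reported as `(many)` rather than listed in full.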

#### <font style = "color:rgb(20, 53, 103)" > **Descriptive Statistics** </font>

It's now time to visually inspect this information in order to get a better understanding and context of the data, and its distribution.

## <font style = "color:rgb(20, 53, 103)" > **Descriptive Analytics Using the Plotly Graphing Library** </font>

The [Plotly](https://plotly.com/python/) graphing library is one of the most powerful tools available to the free and open source community. The visualizations one can produce with [Plotly](https://plotly.com/python/) rival those you can create with either Power BI or Tableau. 

One of the main reasons to learn Python is to tell stories with data using [Plotly](https://plotly.com/python/). [Plotly](https://plotly.com/python/) is popular because it provides rich, interactive graphs that help a person better understand and interpret the results. What's more, all the infographics on the Government of Alberta website on COVID use the [Plotly](https://plotly.com/python/) library. The following link shows an example of these interactive graphs: https://www.alberta.ca/stats/covid-19-alberta-statistics.htm#pre-existing-conditions 

### <font style = "color:rgb(20, 53, 103)" > **Creating Dictionaries for Graph Consistency** </font>

In order to maintain a consistent coloring, as well as ordering scheme in certain visualizations, the following dictionaries will be used when visualizing the information.

In [None]:
# Simple dictionary that assigns specific color shades
# to the listed categorical values
# These colors can be applied to all plots so colors stay consistent
# You can see a list of supported colors here: https://developer.mozilla.org/en-US/docs/Web/CSS/color_value#color_keywords
color_discrete_map = {'Recovered': 'Green',
                      'Active': 'Yellow',
                      'Died': 'Red',
                      
                      'Male': 'Blue',
                      'Female': 'Pink',
                      
                      'Under 1 year': 'Blue',
                      '1-4 years': 'Aqua',
                      '5-9 years': 'darkslateblue',
                      '10-19 years': 'Cadetblue',
                      '20-29 years': 'darkolivegreen',
                      '30-39 years': 'darkviolet',
                      '40-49 years': 'darksalmon',
                      '50-59 years': 'darkgoldenrod',
                      '60-69 years': 'Green',
                      '70-79 years': 'Yellow',
                      '80+ years': 'Red',
                      
                      'North Zone': 'navy',
                      'Edmonton Zone': 'orangered',
                      'Central Zone': 'darkgreen',
                      'Calgary Zone': 'purple',
                      'South Zone': 'maroon',
                      
                      # 'Unknown' appears in Gender, Age group, and Zone.
                      # A dictionary key can only hold one value, so all
                      # three share the same color.
                      'Unknown': 'black',
                     }

In [None]:
# A dictionary that holds the order in which things are displayed
category_orders = {"Age group": ['Unknown', 'Under 1 year', '1-4 years', '5-9 years', '10-19 years', '20-29 years',
                                 '30-39 years', '40-49 years', '50-59 years', '60-69 years', '70-79 years', '80+ years'],
                   
                   "Alberta Health Services Zone": ['North Zone', 'Edmonton Zone', 'Central Zone', 'Calgary Zone', 'South Zone', 'Unknown'],
                   
                   'Year': [2020, 2021, 2022],
                   }
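
Keys in a color dictionary have to match the category values in the data exactly, so it can help to check for values that have no color assigned. The sketch below uses hypothetical sample data and a shortened color dictionary. The helper name `categories_without_color` is an assumption for illustration. As far as I know, Plotly Express falls back to `color_discrete_sequence` for values that are missing from `color_discrete_map`, so a mismatch shows up as an unexpected color rather than an error.

```python
import pandas as pd


def categories_without_color(frame, columns, color_map):
    """Return {column: [values]} for values that have no entry in color_map."""
    missing = {}
    for name in columns:
        unmatched = [value for value in frame[name].dropna().unique()
                     if value not in color_map]
        if unmatched:
            missing[name] = unmatched
    return missing


# Hypothetical sample data and a shortened color dictionary
sample = pd.DataFrame({
    "Gender": ["Male", "Female", "Unknown"],
    "Case status": ["Active", "Recovered", "Died"],
})
sample_colors = {"Male": "Blue", "Female": "Pink",
                 "Active": "Yellow", "Recovered": "Green", "Died": "Red"}

print(categories_without_color(sample, ["Gender", "Case status"], sample_colors))
```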

### <font style = "color:rgb(20, 53, 103)" > **Creating Date, Time, and Week Day Fields** </font>

The `Date reported` field is not a numerical field; it's an `object` data type and is treated as a categorical value. To support filtering and sorting, additional fields will be added that break this field down into `Year`, `Month`, and `Day`, among others, in order to enhance querying. These steps can also be considered part of `data wrangling`, since you are adding more features to the data set.

In [None]:
# Confirm the data types for each field:
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 527898 entries, 1 to 527102
Data columns (total 6 columns):
 #   Column                        Non-Null Count   Dtype 
---  ------                        --------------   ----- 
 0   Date reported                 527898 non-null  object
 1   Alberta Health Services Zone  527893 non-null  object
 2   Gender                        527892 non-null  object
 3   Age group                     527896 non-null  object
 4   Case status                   527896 non-null  object
 5   Case type                     527896 non-null  object
dtypes: object(6)
memory usage: 28.2+ MB


In [None]:
# Create a new field called 'ISO Date' that converts 'Date reported' into a datetime value
# You need to do this in order to get the week number
df['ISO Date'] = pd.to_datetime(df['Date reported'], errors = 'coerce')

df

Unnamed: 0,Date reported,Alberta Health Services Zone,Gender,Age group,Case status,Case type,ISO Date
1,2021-04-21,Edmonton Zone,Male,30-39 years,Recovered,Confirmed,2021-04-21
2,2021-05-17,North Zone,Male,10-19 years,Recovered,Confirmed,2021-05-17
3,2020-12-13,Edmonton Zone,Male,5-9 years,Recovered,Confirmed,2020-12-13
4,2021-05-11,Edmonton Zone,Male,60-69 years,Recovered,Confirmed,2021-05-11
5,2021-01-30,Calgary Zone,Female,1-4 years,Recovered,Confirmed,2021-01-30
...,...,...,...,...,...,...,...
527098,2022-01-13,North Zone,Female,50-59 years,Recovered,Confirmed,2022-01-13
527099,2021-04-08,North Zone,Male,10-19 years,Recovered,Confirmed,2021-04-08
527100,2021-01-01,Central Zone,Male,30-39 years,Recovered,Confirmed,2021-01-01
527101,2022-02-03,Central Zone,Male,30-39 years,Recovered,Confirmed,2022-02-03
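
To see what `errors = 'coerce'` does, the sketch below parses a hypothetical three-row sample in which one entry is not a date. Entries that cannot be parsed become `NaT` instead of raising an error, which makes the problem rows easy to find.

```python
import pandas as pd

# Hypothetical sample: the second entry is not a valid date
sample = pd.DataFrame({"Date reported": ["2021-04-21", "Recovered", "2022-02-03"]})

# Malformed entries become NaT instead of raising an error
sample["ISO Date"] = pd.to_datetime(sample["Date reported"], errors="coerce")

# Rows whose date could not be parsed
bad_rows = sample[sample["ISO Date"].isna()]
print(bad_rows)

# Date parts can be read from the parsed column through the .dt accessor
valid = sample.dropna(subset=["ISO Date"])
print(valid["ISO Date"].dt.year.tolist())
```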


In [None]:
# The following steps are used to create these new fields.
# A few rows hold values in 'Date reported' that are not dates, so
# pd.DatetimeIndex(df['Date reported']) fails with a ParserError.
# Work from the coerced 'ISO Date' column instead, and drop the rows
# whose date could not be parsed.
df = df.dropna(subset = ['ISO Date']).copy()

# Creates information from the 'ISO Date' feature
df['Year'] = df['ISO Date'].dt.year
df['Month'] = df['ISO Date'].dt.month
df['Day'] = df['ISO Date'].dt.day
df['Day of Year'] = df['ISO Date'].dt.dayofyear
df['Day of Week'] = df['ISO Date'].dt.weekday
df['Quarter'] = df['ISO Date'].dt.quarter

# Extract the ISO week number
df['Week number'] = df['ISO Date'].dt.isocalendar().week

# Convert the week number from UInt32 to a plain integer
df['Week number'] = df['Week number'].astype('int')

df

In [None]:
# Rearrange the columns
# The 'ISO Date' is needed for making selections further in the notebook
df = df[['Date reported', 'ISO Date', 'Year', 'Quarter', 'Month', 'Week number', 'Day', 'Day of Week', 'Day of Year', 'Alberta Health Services Zone', 'Gender', 'Age group', 'Case status', 'Case type']]

df

In [None]:
# Review the object type of each feature
df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 527898 entries, 1 to 527102
Data columns (total 7 columns):
 #   Column                        Non-Null Count   Dtype         
---  ------                        --------------   -----         
 0   Date reported                 527898 non-null  object        
 1   Alberta Health Services Zone  527893 non-null  object        
 2   Gender                        527892 non-null  object        
 3   Age group                     527896 non-null  object        
 4   Case status                   527896 non-null  object        
 5   Case type                     527896 non-null  object        
 6   ISO Date                      527896 non-null  datetime64[ns]
dtypes: datetime64[ns](1), object(6)
memory usage: 32.2+ MB


### <font style = "color:rgb(20, 53, 103)" > **Creating additional data frames to help in visualizations** </font>

Two additional data frames will be created, one that contains just the `Active` cases, as well as one that contains just the people who've died in Alberta:

In [None]:
# Creates a dataset of just the active cases to further explore
df_active = df[df['Case status'] == 'Active']

df_active

In [None]:
# Creates a dataset of just the people who have died from COVID-19
df_fatal = df[df['Case status'] == 'Died']

df_fatal
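
If you want one data frame per status rather than writing a filter for each, `groupby` can produce them in one pass. This is a minimal sketch on hypothetical sample data. The dictionary name `frames_by_status` is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    "Case status": ["Active", "Recovered", "Died", "Recovered"],
    "Gender": ["Male", "Female", "Male", "Female"],
})

# One data frame per case status, keyed by the status label
frames_by_status = {status: frame for status, frame in sample.groupby("Case status")}

print(sorted(frames_by_status))
print(len(frames_by_status["Recovered"]))
```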

### <font style = "color:rgb(20, 53, 103)" > **Creating automated title information** </font>

[Plotly](https://plotly.com/python/) is capable of accepting lists, dictionaries, and other custom data. This allows graph titles to be updated automatically, making sure the information reported is current.

In [None]:
# Get the total records of the data set, store in a variable
total_records = df.shape[0]

print('The total number of records in this dataset is: {:,}'.format(total_records))

In [None]:
# Stores the first date of recorded data in this data set
first_date_reported = df['Date reported'].min()

print('The first date that a case was reported in Alberta is: {:} (YYYY-MM-DD)'.format(first_date_reported)) 

In [None]:
# Stores the latest date of recorded data in this data set
current_date_reported = df['Date reported'].max()

print('The most recently recorded date in this data set is: {:} (YYYY-MM-DD)'.format(current_date_reported)) 

**Converting date formats**

As an option, you can also use Python's `datetime` module to convert a date into a more readable format. The solution to this problem comes from this Stack Overflow response: https://stackoverflow.com/questions/6288892/python-how-to-convert-datetime-format

In [None]:
# Store the result of this conversion into an object
d = datetime.datetime.strptime(current_date_reported, '%Y-%m-%d') 

In [None]:
# Format the object
data_set_date = d.strftime('%b %d,%Y')

data_set_date

**Generating an automated title**

The following script will generate an automated title that will be used on some of the graphs. It's important to time stamp all information, since this information is fluid and can change day by day.

In [None]:
# Create a general chart title
General_title = f"This data set is current to {data_set_date}. There are {total_records:,} total recorded cases so far"

General_title
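
The title steps can also be wrapped in a single function, so the title is rebuilt whenever the data frame changes. This is a minimal sketch on hypothetical sample data. The function name `build_title` is an assumption for illustration, and it expects the date column to hold `YYYY-MM-DD` strings.

```python
import datetime

import pandas as pd


def build_title(frame, date_column="Date reported"):
    """Build a chart title from the row count and the latest date in the data."""
    # YYYY-MM-DD strings sort chronologically, so max() is the latest date
    latest = datetime.datetime.strptime(frame[date_column].max(), "%Y-%m-%d")
    return (f"This data set is current to {latest.strftime('%b %d, %Y')}. "
            f"There are {len(frame):,} recorded cases so far")


# Hypothetical sample data
sample = pd.DataFrame({"Date reported": ["2021-04-21", "2022-02-03", "2020-12-13"]})
print(build_title(sample))
```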

### <font style = "color:rgb(20, 53, 103)" > **Creating lists for interactive data inspection** </font>

The following lists will be used to interact with the data and can be used in any widgets:

In [None]:
# This list was already created above
# It's being shown here as a reminder that we already
# have a list that stores the name of all our features
features

In [None]:
# Labels work best with categorical data. Although "Date reported" is technically a categorical 
# feature, it should be considered a continuous value field.
# Therefore, this list excludes "Date reported" and re-arranges the feature names in order of 
# interest
labels = [None, 'Case status', 'Age group', 'Alberta Health Services Zone', 'Gender', 'Case type']

### <font style = "color:rgb(20, 53, 103)" > **Histograms** </font>

[Histograms show the shape of your data](https://www.jmp.com/en_us/statistics-knowledge-portal/exploratory-data-analysis/histogram.html). Histograms help you see the center, spread and shape of a set of data. You can also use them as a visual tool to check for normality.

You can learn more about histograms, and where to use them, at [storytellingwithdata.com](https://www.storytellingwithdata.com/blog/2021/1/28/histograms-and-bar-charts) and the [Statistical Knowledge Portal](https://www.jmp.com/en_us/statistics-knowledge-portal/exploratory-data-analysis/histogram.html).

#### <font style = "color:rgb(20, 53, 103)" > **Example 1 - Interactive Histograms** </font>

The code below shows you how to construct a histogram using the [Plotly](https://plotly.com/python/) data visualization library. 

Refer to the [Histograms](https://plotly.com/python/histograms/) section of the Plotly website for more details.

The histogram will allow you to explore each feature available in the data set:

* Date reported
* Alberta Health Services Zone
* Gender
* Age group
* Case status
* Case type

This histogram will also let you choose a `label` for your data, and give you a better understanding of the data and distribution. The data will be presented in the way that [Plotly](https://plotly.com/python/) sorts it.
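
For a categorical feature, the bar heights in such a histogram are simply counts per category, split by the chosen label. The sketch below reproduces those counts with `pd.crosstab` on hypothetical sample data, which can be useful for checking what a plot is showing.

```python
import pandas as pd

# Hypothetical sample data
sample = pd.DataFrame({
    "Age group": ["30-39 years", "30-39 years", "10-19 years", "30-39 years"],
    "Case status": ["Recovered", "Active", "Recovered", "Recovered"],
})

# Rows are the x-axis categories, columns are the label (color) categories
counts = pd.crosstab(sample["Age group"], sample["Case status"])
print(counts)
```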

##### <font style = "color:rgb(20, 53, 103)" > **All Cases** </font>

This is the histogram plot of the entire data set.

##### <font style = "color:rgb(20, 53, 103)" > **Update the General Title** </font>

Update the title for this graph:

In [None]:
# Create an object that stores the counts of active, recovered, and deaths in the current
# Alberta COVID-19 data set
case_status = df['Case status'].value_counts() 

case_status

In [None]:
# Break out each of these values into its own variable
recovered = case_status.Recovered

active = case_status.Active

died = case_status.Died
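
Attribute access such as `case_status.Active` raises an error when a label is absent from the counts, for example on a day with no active cases. A more defensive variant, sketched here on hypothetical counts, uses `Series.get` with a fallback of `0`.

```python
import pandas as pd

# Hypothetical counts in which no case is currently 'Active'
case_status = pd.Series({"Recovered": 10, "Died": 2})

# .get() returns the fallback instead of raising when a label is absent
recovered = case_status.get("Recovered", 0)
active = case_status.get("Active", 0)
died = case_status.get("Died", 0)

print(recovered, active, died)
```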

In [None]:
# Create a general chart title
General_title_all = f"There have been {total_records:,} total cases in Alberta as of {data_set_date}."

General_title_all

##### <font style = "color:rgb(20, 53, 103)" > **Interactive Widgets** </font>

These interactive widgets will be assigned to work with the `All Cases` dataset only. You need to do this; otherwise, widgets shared between graphs will sync and update every graphic associated with them.

In [None]:
# This checkbox will allow you to enable / disable
# the X-Axis going onto a log scale
# default is FALSE
log_scaleX_allCases = widgets.Checkbox(value = False,
                                       description = 'Log X scale',
                                       disabled = False,
                                       indent = False
                                      )

In [None]:
# This checkbox will allow you to enable / disable
# the Y-Axis going onto a log scale
# default is FALSE
log_scaleY_allCases = widgets.Checkbox(value = False,
                                       description = 'Log Y scale',
                                       disabled = False,
                                       indent = False
                                      )

In [None]:
# This radio button holds options that 
# allow us to understand the distribution
# using box plots, violin plots, rug plots, etc.

marginals_allCases = widgets.RadioButtons(options = [None, 'box', 'violin', 'rug'])  # 'histogram' is excluded from this list

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_allCases = widgets.RadioButtons(options = ['relative', 'group', 'overlay']) # Declare the set of radio buttons and provide options

In [None]:
# This widget allows you to create an interactive histogram.
# Notice how the ipywidget objects are being passed as parameters into 
# this function
@widgets.interact
def hist_inspect(distribution = marginals_allCases, grouping = groupings_allCases, label = labels, feature = features, log_x = log_scaleX_allCases, log_y = log_scaleY_allCases):
    
    fig = px.histogram(df, 
                       x = feature,
                       color = label,
                       marginal = distribution, # can be 'rug', `box`, `violin`
                       barmode = grouping,
                       log_x = log_x,
                       log_y = log_y,
                       hover_data = df.columns,
                            
                       color_discrete_map = color_discrete_map,
                       
                       #category_orders = category_orders,
                        
                       color_discrete_sequence = px.colors.qualitative.Light24,
                       width = 1500, 
                       height = 800
                      )
    
    fig.update_layout(title = General_title_all,
                      font = dict(family = "Courier New, monospace",
                                  size = 15,
                                  color = "Black"
                                 )
                      )
    fig.show()


##### <font style = "color:rgb(20, 53, 103)" > **Active Cases** </font>

This is the histogram plot of the currently active cases.

##### <font style = "color:rgb(20, 53, 103)" > **Interactive Widgets** </font>

These interactive widgets are assigned to the "Active Cases" dataset only. This is necessary because a widget shared between several graphs will sync and update every graph associated with it.

In [None]:
# Create a general chart title
General_title_active = f"There are {active:,} active cases in Alberta as of {data_set_date}."

General_title_active

In [None]:
# This checkbox will allow you to enable / disable
# the X-Axis going onto a log scale
# default is FALSE
log_scaleX_activeCases = widgets.Checkbox(value = False,
                                         description = 'Log X scale',
                                         disabled = False,
                                         indent = False
                                        )

In [None]:
# This checkbox will allow you to enable / disable
# the Y-Axis going onto a log scale
# default is FALSE
log_scaleY_activeCases = widgets.Checkbox(value = False,
                                       description = 'Log Y scale',
                                       disabled = False,
                                       indent = False
                                      )

In [None]:
# This radio button holds options that
# allow us to understand the distribution
# using box plots, violin plots, rug plots, etc.

marginals_activeCases = widgets.RadioButtons(options = [None, 'box', 'violin', 'rug'])  # 'histogram' Histogram is excluded from this list

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_activeCases = widgets.RadioButtons(options = ['relative', 'group', 'overlay']) # Declare the set of radio buttons and provide options

In [None]:
# This widget allows you to create an interactive histogram.
# Notice how the ipywidget objects are being passed as parameters into 
# this function
@widgets.interact
def hist_inspect(distribution = marginals_activeCases, grouping = groupings_activeCases, label = labels, feature = features, log_x = log_scaleX_activeCases, log_y = log_scaleY_activeCases):
    
    fig = px.histogram(df_active, 
                       x = feature,
                       color = label,
                       marginal = distribution, # can be 'rug', `box`, `violin`
                       barmode = grouping,
                       log_x = log_x,
                       log_y = log_y,
                       hover_data = df.columns,
                            
                       color_discrete_map = color_discrete_map,
                       
                       #category_orders = category_orders,
                        
                       color_discrete_sequence = px.colors.qualitative.Light24,
                       width = 1500, 
                       height = 800
                      )
    
    fig.update_layout(title = General_title_active,
                      font = dict(family = "Courier New, monospace",
                                  size = 15,
                                  color = "Black"
                                 )
                      )
    fig.show()

##### <font style = "color:rgb(20, 53, 103)" > **Deaths** </font>

This is the histogram plot of the people who've died of COVID-19 so far in Alberta:

In [None]:
# Create a general chart title
General_title_deaths = f"There have been {died:,} deaths in Alberta as of {data_set_date}."

General_title_deaths

In [None]:
# This checkbox will allow you to enable / disable
# the X-Axis going onto a log scale
# default is FALSE
log_scaleX_activeDeaths = widgets.Checkbox(value = False,
                                           description = 'Log X scale',
                                           disabled = False,
                                           indent = False
                                           )

In [None]:
# This checkbox will allow you to enable / disable
# the Y-Axis going onto a log scale
# default is FALSE
log_scaleY_activeDeaths = widgets.Checkbox(value = False,
                                           description = 'Log Y scale',
                                           disabled = False,
                                           indent = False
                                           )

In [None]:
# This radio button holds options that
# allow us to understand the distribution
# using box plots, violin plots, rug plots, etc.
marginals_activeDeaths = widgets.RadioButtons(options = [None, 'box', 'violin', 'rug'])  # 'histogram' Histogram is excluded from this list

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_activeDeaths = widgets.RadioButtons(options = ['relative', 'group', 'overlay']) # Declare the set of radio buttons and provide options

In [None]:
# This widget allows you to create an interactive histogram.
# Notice how the ipywidget objects are being passed as parameters into 
# this function
@widgets.interact
def hist_inspect(distribution = marginals_activeDeaths, grouping = groupings_activeDeaths, label = labels, feature = features, log_x = log_scaleX_activeDeaths, log_y = log_scaleY_activeDeaths):
    
    fig = px.histogram(df_fatal, 
                       x = feature,
                       color = label,
                       marginal = distribution, # can be 'rug', `box`, `violin`
                       barmode = grouping,
                       log_x = log_x,
                       log_y = log_y,
                       hover_data = df.columns,
                            
                       color_discrete_map = color_discrete_map,
                       
                       #category_orders = category_orders,
                        
                       color_discrete_sequence = px.colors.qualitative.Light24,
                       width = 1500, 
                       height = 800
                      )
    
    fig.update_layout(title = General_title_deaths,
                      font = dict(family = "Courier New, monospace",
                                  size = 15,
                                  color = "Black"
                                 )
                      )
    fig.show()

#### <font style = "color:rgb(20, 53, 103)" > **Example 2 - Sorted Histogram** </font>

This code applies a specific sort order to the data, so the `marginal` plots will not always provide meaningful information. They are therefore excluded from these graphs.
Refer to the [Histograms](https://plotly.com/python/histograms/) section of the Plotly website for more details.

When data sorting is applied, the [Histogram](https://plotly.com/python/histograms/) acts more like a [Bar chart](https://plotly.com/python/bar-charts/). Distribution information does not work on manually sorted data.
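
To see why a manually ordered histogram behaves like a bar chart, the sketch below counts a categorical column and then reads the counts back in a fixed order. The `age_groups` sample and the `category_orders_demo` dictionary are made-up stand-ins, not values from the Alberta data set. The dictionary only mirrors the shape that Plotly Express accepts for `category_orders` (a column name mapped to an ordered list of values).

```python
from collections import Counter

# Hypothetical sample of a categorical column (not real data)
age_groups = ['20-29 years', '1-4 years', '20-29 years',
              '80+ years', '1-4 years', '20-29 years']

# Hypothetical ordering, shaped like the category_orders argument
category_orders_demo = {'Age group': ['1-4 years', '20-29 years', '80+ years']}

# Count how often each category occurs
counts = Counter(age_groups)

# Read the counts in the requested order; each pair is one bar and its height
ordered_counts = [(group, counts[group])
                  for group in category_orders_demo['Age group']]

print(ordered_counts)
```

Each bar is just a category and its count, placed where the ordering says it should go. That is the same information a bar chart carries, which is why marginal distribution plots add little here.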

##### <font style = "color:rgb(20, 53, 103)" > **All Cases** </font>

This is the histogram plot of the entire data set.

In [None]:
# This checkbox will allow you to enable / disable
# the X-Axis going onto a log scale
# default is FALSE
log_scaleX_allCases1 = widgets.Checkbox(value = False,
                                        description = 'Log X scale',
                                        disabled = False,
                                        indent = False
                                       )

In [None]:
# This checkbox will allow you to enable / disable
# the Y-Axis going onto a log scale
# default is FALSE
log_scaleY_allCases1 = widgets.Checkbox(value = False,
                                        description = 'Log Y scale',
                                        disabled = False,
                                        indent = False
                                       )

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_allCases1 = widgets.RadioButtons(options = ['relative', 'group', 'overlay']) # Declare the set of radio buttons and provide options

In [None]:
# This widget allows you to create a histogram, like the ones produced in the previous notebook.
# You can change the feature, along with the label, grouping, and axis scales.
@widgets.interact
def hist_inspect(grouping = groupings_allCases1, label = labels, feature = features, log_x = log_scaleX_allCases1, log_y = log_scaleY_allCases1):
    
    fig = px.histogram(df, 
                       x = feature,
                       color = label,
                       #marginal = distribution, # can be 'rug', `box`, `violin`
                       barmode = grouping,
                       log_x = log_x,
                       log_y = log_y,
                       hover_data = df.columns,
                            
                       color_discrete_map = color_discrete_map,
                       
                       category_orders = category_orders,
                        
                       color_discrete_sequence = px.colors.qualitative.Light24,
                       width = 1500, 
                       height = 800
                      )
    
    fig.update_layout(title = General_title_all,
                      font = dict(family = "Courier New, monospace",
                                  size = 15,
                                  color = "Black"
                                 )
                      )
    fig.show()

##### <font style = "color:rgb(20, 53, 103)" > **Active Cases** </font>

This is the histogram plot of the currently active cases.

In [None]:
# Create a general chart title
General_title_active = f"There are {active:,} currently active cases in Alberta as of {data_set_date}."

General_title_active

In [None]:
# Labels work best with categorical data. Although "Date reported" is technically a categorical 
# feature, it should be considered a continuous value field.
# Therefore, this list will exclude "Date reported" and re-arrange the feature names in order of 
# interest

labels = [None, 'Case status', 'Age group', 'Alberta Health Services Zone', 'Gender', 'Case type']

In [None]:
# This checkbox will allow you to enable / disable
# the X-Axis going onto a log scale
# default is FALSE
log_scaleX_activeCases1 = widgets.Checkbox(value = False,
                                         description = 'Log X scale',
                                         disabled = False,
                                         indent = False
                                        )

In [None]:
# This checkbox will allow you to enable / disable
# the Y-Axis going onto a log scale
# default is FALSE
log_scaleY_activeCases1 = widgets.Checkbox(value = False,
                                       description = 'Log Y scale',
                                       disabled = False,
                                       indent = False
                                      )

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_activeCases1 = widgets.RadioButtons(options = ['relative', 'group', 'overlay']) # Declare the set of radio buttons and provide options

In [None]:
# This widget allows you to create a histogram, like the ones produced in the previous notebook.
# You can change the feature, along with the label, grouping, and axis scales.
@widgets.interact
def hist_inspect(grouping = groupings_activeCases1, label = labels, feature = features, log_x = log_scaleX_activeCases1, log_y = log_scaleY_activeCases1):
    
    fig = px.histogram(df_active, 
                       x = feature,
                       color = label,
                       #marginal = distribution, # can be 'rug', `box`, `violin`
                       barmode = grouping,
                       log_x = log_x,
                       log_y = log_y,
                       hover_data = df.columns,
                            
                       color_discrete_map = color_discrete_map,
                       
                       category_orders = category_orders,
                        
                       color_discrete_sequence = px.colors.qualitative.Light24,
                       width = 1500, 
                       height = 800
                      )
    
    fig.update_layout(title = General_title_active,
                      font = dict(family = "Courier New, monospace",
                                  size = 15,
                                  color = "Black"
                                 )
                      )
    fig.show()

##### <font style = "color:rgb(20, 53, 103)" > **Deaths** </font>

This is the histogram plot of the people who've died of COVID-19 so far in Alberta:

In [None]:
# Create a general chart title
General_title_deaths = f"There have been {died:,} deaths in Alberta as of {data_set_date}."

General_title_deaths

In [None]:
# This checkbox will allow you to enable / disable
# the X-Axis going onto a log scale
# default is FALSE
log_scaleX_activeDeaths1 = widgets.Checkbox(value = False,
                                            description = 'Log X scale',
                                            disabled = False,
                                            indent = False
                                            )

In [None]:
# This checkbox will allow you to enable / disable
# the Y-Axis going onto a log scale
# default is FALSE
log_scaleY_activeDeaths1 = widgets.Checkbox(value = False,
                                            description = 'Log Y scale',
                                            disabled = False,
                                            indent = False
                                            )

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_activeDeaths1 = widgets.RadioButtons(options = ['relative', 'group', 'overlay']) # Declare the set of radio buttons and provide options

In [None]:
# This widget allows you to create a histogram, like the ones produced in the previous notebook.
# You can change the feature, along with the label, grouping, and axis scales.
@widgets.interact
def hist_inspect(grouping = groupings_activeDeaths1, label = labels, feature = features, log_x = log_scaleX_activeDeaths1, log_y = log_scaleY_activeDeaths1):
    
    fig = px.histogram(df_fatal, 
                       x = feature,
                       color = label,
                       #marginal = distribution, # can be 'rug', `box`, `violin`
                       barmode = grouping,
                       log_x = log_x,
                       log_y = log_y,
                       hover_data = df.columns,
                            
                       color_discrete_map = color_discrete_map,
                       
                       category_orders = category_orders,
                        
                       color_discrete_sequence = px.colors.qualitative.Light24,
                       width = 1500, 
                       height = 800
                      )
    
    fig.update_layout(title = General_title_deaths,
                      font = dict(family = "Courier New, monospace",
                                  size = 15,
                                  color = "Black"
                                 )
                      )
    fig.show()

### <font style = "color:rgb(20, 53, 103)" > **Pie-Charts** </font>

The purpose of a pie chart is to show you [a parts-to-whole relationship for categorical or nominal data](https://www.jmp.com/en_us/statistics-knowledge-portal/exploratory-data-analysis/pie-chart.html). Pie charts work best with [categorical data](http://www.stat.yale.edu/Courses/1997-98/101/catdat.htm). Pie charts should not be used with [continuous data](https://www.mathsisfun.com/definitions/continuous-data.html).

You can learn more about pie charts, and where to use them, at [storytellingwithdata.com](https://www.storytellingwithdata.com/blog/2020/5/14/what-is-a-pie-chart) and the [Statistical Knowledge Portal](https://www.jmp.com/en_us/statistics-knowledge-portal/exploratory-data-analysis/pie-chart.html).

When performing `Descriptive Analytics`, pie-charts can help give you a sense of the balance of your data.
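
As a rough illustration of the parts-to-whole idea, the sketch below turns a set of category counts into the percentages a pie chart would display. The numbers in `case_counts` are placeholders chosen for easy arithmetic, not figures from the Alberta data set.

```python
# Hypothetical counts per category (placeholders, not real figures)
case_counts = {'Recovered': 900, 'Active': 75, 'Died': 25}

# The "whole" that every slice is measured against
total = sum(case_counts.values())

# Each slice of the pie is the category's share of the total, in percent
shares = {status: round(100 * n / total, 1)
          for status, n in case_counts.items()}

print(shares)
```

A heavily lopsided result like this one is what "balance of your data" refers to: one category dominates, so the small slices can be hard to read on the chart.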

#### <font style = "color:rgb(20, 53, 103)" > **Example 1 - Creating a customized Pie-Chart using Plotly Graph Object (GO)** </font>

The code below shows you how to construct a custom pie chart using Plotly graph objects (`go`). To get a better understanding of how to use Plotly Pie Charts, visit the Plotly website: https://plotly.com/python/pie-charts/ 

In [None]:
# Create a general chart title
General_title_goPie = f"{total_records:,} recorded cases as of {data_set_date}."

General_title_goPie

In [None]:
# Sets the colors that will be used when graphing
# You do this when you want specific colours to be used in your labels.

colors = ['Green', 'Yellow', 'Red']

In [None]:
# Identify the different labels of the pie-chart
# For this data set, the labels are going to be the values located in the "Case status"
# feature

labels_go = ['Recovered','Active','Died']

In [None]:
# A list that provides the quantities of each label.
# For this data set, it's the list of different kinds of case status

values = [recovered, active, died]

In [None]:
# Draw the pie-chart using the "labels_go" and "values" lists created above

fig = go.Figure(data = [go.Pie(labels = labels_go, 
                               values = values,
                              )
                       ],
               )
# Update the layout, format the width and height of the pie chart, pass it the general
# title 
fig.update_layout(width = 800,
                  height = 600,  
                  title = General_title_goPie,
                  font = dict(family = "Courier New, monospace",
                              size = 15,
                              color = "Black"
                             )
                 )

fig.update_traces(hoverinfo = 'value', 
                  textinfo = 'percent', 
                  marker = dict(colors = colors),
                 )

fig.show()

#### <font style = "color:rgb(20, 53, 103)" > **Example 2 - Interactive Pie-Chart using `ipywidgets` with Plotly** </font>

The code below lets you build an interactive pie-chart that visualizes each categorical feature and applies different labels. In the example above, the label was `Case status`, which has three categories:
* Recovered
* Active
* Died

However, this dataset also contains categorical information about:

* Alberta Health Services Zone
* Gender
* Age group
* Case status
* Case type

**Note:**

The `Date reported` field will not be treated as a categorical value, even though its dtype is `object`.
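
The sketch below shows why a date column read from a CSV file is not a real date type until it is converted. The three-row `sample` frame is made up for illustration. The `ISO Date` column name is borrowed from later in this notebook, but using `pd.to_datetime` to build it is an assumption, since the notebook's own conversion step is not shown in this section.

```python
import pandas as pd

# Hypothetical three-row sample; the real column comes from the CSV file
sample = pd.DataFrame(
    {'Date reported': ['2020-03-06', '2020-03-07', '2020-03-07']}
)

# Dates read as text are plain strings, not datetimes
is_date_before = pd.api.types.is_datetime64_any_dtype(sample['Date reported'])

# Assumed conversion step: parse the text into a true datetime column
sample['ISO Date'] = pd.to_datetime(sample['Date reported'])
is_date_after = pd.api.types.is_datetime64_any_dtype(sample['ISO Date'])

print(is_date_before, is_date_after)
```

Because every distinct day would become its own slice, a date column makes a poor pie-chart label even though it is stored as text.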

##### <font style = "color:rgb(20, 53, 103)" > **Total Recorded Cases** </font>

This pie chart will help you analyze the total recorded cases:

In [None]:
# Create a general chart title
General_title_goPie = f"{total_records:,} recorded cases as of {data_set_date}."

General_title_goPie

In [None]:
# Identify the different labels of the pie-chart
# For this data set, the labels are going to be the values located in the "Case status"
# feature

hover_label = ['Recovered','Active','Died']

In [None]:
# A list that provides the quantities of each label.
# For this data set, it's the list of different kinds of case status

hover_data = [recovered, active, died]

In [None]:
# Create a list of labels to switch between in the dropdown

labels_pie = ['Case status', 'Alberta Health Services Zone', 'Gender', 'Age group']

In [None]:
@widgets.interact
def interactive_piechart(label = labels_pie):    
    fig = px.pie(df,
                 names = label,
                 labels = None,
                 values =  None,
                 color = label,
                 #hover_name = df[label],
                 #hover_data = df[label],
                 color_discrete_map = color_discrete_map,
                 width = 800,
                 height = 600,
                 title = General_title_goPie,
                 color_discrete_sequence = px.colors.qualitative.Dark24,
                )
    
    fig.update_layout(plot_bgcolor = 'Gray')

    fig.show()

##### <font style = "color:rgb(20, 53, 103)" > **Active Cases** </font>

This pie chart will help you analyze the active cases. In this visualization, we'll use radio buttons instead of drop-downs for interactions:

In [None]:
# Create a general chart title
General_title_active_pie = f"There are {active:,} active cases in Alberta as of {data_set_date}."

General_title_active_pie

In [None]:
# This radio button holds the categorical features
# that the pie chart can be split by.
groupings_active_pie = widgets.RadioButtons(options = ['Alberta Health Services Zone', 'Gender', 'Age group']) # Declare the set of radio buttons and provide options

In [None]:
# This pie chart summarizes the data for just the active cases
@widgets.interact
def interactive_piechart(features = groupings_active_pie):    
    fig = px.pie(df_active,
                 names = features,
                 labels = None,
                 values = None,
                 color = features,
                 color_discrete_map = color_discrete_map,

                 width = 800,
                 height = 600,
                 title = General_title_active_pie,
                 color_discrete_sequence = px.colors.qualitative.Dark24
                )

    fig.show()

##### <font style = "color:rgb(20, 53, 103)" > **Total Deaths** </font>

This pie chart will help you analyze the recorded deaths in Alberta. In this visualization, we'll use radio buttons instead of drop-downs for interactions:

In [None]:
# Create a general chart title
General_title_deaths_pie = f"There have been {died:,} deaths in Alberta as of {data_set_date}."

General_title_deaths_pie

In [None]:
# This radio button holds the categorical features
# that the pie chart can be split by.
groupings_deaths_pie = widgets.RadioButtons(options = ['Alberta Health Services Zone', 'Gender', 'Age group']) # Declare the set of radio buttons and provide options

In [None]:
# This pie chart summarizes the data for just the fatal cases
@widgets.interact
def interactive_piechart(features = groupings_deaths_pie):    
    fig = px.pie(df_fatal,
                 names = features,
                 labels = None,
                 values = None,
                 color = features,
                 color_discrete_map = color_discrete_map,

                 width = 800,
                 height = 600,
                 title = General_title_deaths_pie,
                 color_discrete_sequence = px.colors.qualitative.Dark24
                )

    fig.show()

## <font style = "color:rgb(20, 53, 103)" > **Advanced: Using the Plotly Graphing Library for EDA** </font>

This section uses Plotly to explore the data and see what insights we can gain from it.

### <font style = "color:rgb(20, 53, 103)" > **Visualizing data selection - by Quarter** </font>

The following code shows how you can create a selection, in order to compare yearly quarter over quarter COVID data:

#### <font style = "color:rgb(20, 53, 103)" > **Example 1** </font>

Basic construction.

This example shows how you can create a data set based on a selection.

Change the `value` variable and see the results in the dataset.

In [None]:
# Stores the value of the quarter you want to compare
# Any value from 1 to 4
value = 3

In [None]:
# Create a selection of the dataset query
by_quarter = df[df['Quarter'] == value]

by_quarter
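
The selection above is a boolean mask: the comparison produces one `True`/`False` per row, and indexing with it keeps only the `True` rows. The sketch below walks through this on a made-up four-row frame. How this notebook derives its `Quarter` and `Year` columns is not shown in this section, so deriving them with `dt.quarter` and `dt.year` is an assumption.

```python
import pandas as pd

# Hypothetical sample dates spanning two years (not real case data)
sample = pd.DataFrame(
    {'ISO Date': pd.to_datetime(['2020-03-06', '2020-08-15',
                                 '2021-02-01', '2021-09-30'])}
)

# Assumed derivation of the helper columns
sample['Quarter'] = sample['ISO Date'].dt.quarter
sample['Year'] = sample['ISO Date'].dt.year

# Any value from 1 to 4
value = 3

# One True/False per row; True where the row falls in the chosen quarter
mask = sample['Quarter'] == value

# Keep only the rows where the mask is True
by_quarter_demo = sample[mask]

print(by_quarter_demo)
```

The result holds the third-quarter rows from every year in the data, which is what makes a year over year comparison possible.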

#### <font style = "color:rgb(20, 53, 103)" > **Example 2** </font>

Advanced construction.

This example shows how to use a selection in the context of a data visualization:

In [None]:
# Stores the value of the quarter you want to compare
# Any value from 1 to 4
value = 2

In [None]:
# Create a selection of the dataset query
by_quarter = df[df['Quarter'] == value]

# Visualize the dataset
fig = px.histogram(by_quarter, 
                   x = 'Date reported',
                   color = 'Case status',
                   color_discrete_map = color_discrete_map,
                  )

fig.show()

#### <font style = "color:rgb(20, 53, 103)" > **Example 3** </font>

This example puts it all together into an interactive visualization in order for you to inspect the data quarter by quarter:

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_for_quarter = widgets.RadioButtons(options = ['relative', 'group', 'overlay'],
                                            ) # Declare the set of radio buttons and provide options

In [None]:
# This slider will allow you to select the quarter that you want to inspect
quarter_labels = widgets.SelectionSlider(options = [1, 2, 3, 4],
                                         value = 1,
                                         description = 'Select Q',
                                         disabled = False,
                                         continuous_update = False,
                                         orientation = 'horizontal',
                                         readout = True
                                        )

In [None]:
# This widget allows you to create a histogram, like the ones produced in the previous notebook.
# This histogram will allow you to compare year over year COVID data, by quarter
@widgets.interact
def hist_inspect(grouping = groupings_for_quarter, label = labels, feature = features, quarter = quarter_labels):
    
    # Create a general chart title
    General_title_deaths_Q = f"Year over year assessments by quarter: Q{quarter}."
    
    # Create a selection of the dataset query
    by_quarter = df[df['Quarter'] == quarter]
    
    # Visualize the dataset
    fig = px.histogram(by_quarter, 
                       x = feature,
                       color = label,
                       facet_col = 'Year',
                       barmode = grouping,
                       category_orders = category_orders,
                       color_discrete_map = color_discrete_map,
                       title = General_title_deaths_Q,
                       width = 1600, 
                       height = 800
                      )

    fig.show() 

### <font style = "color:rgb(20, 53, 103)" > **Visualizing data selection - by Month** </font>

The following code shows how you can create a selection, in order to compare yearly Month by Month COVID data:

#### <font style = "color:rgb(20, 53, 103)" > **Example 1** </font>

In [None]:
# Stores the value of the Month you want to compare
# Any value from 1 to 12 (Jan to Dec)
month = 3

In [None]:
# Create a selection of the dataset query
by_month = df[df['Month'] == month]

by_month

#### <font style = "color:rgb(20, 53, 103)" > **Example 2** </font>

Advanced construction.

This example shows how to use a selection in the context of a data visualization:

In [None]:
# Stores the value of the Month you want to compare
# Any value from 1 to 12 (Jan to Dec)
month = 5

In [None]:
# Create a selection of the dataset query
by_month = df[df['Month'] == month]

# Visualize the dataset
fig = px.histogram(by_month, 
                   x = 'Date reported',
                   color = 'Case status',
                   color_discrete_map = color_discrete_map,
                  )

fig.show()

#### <font style = "color:rgb(20, 53, 103)" > **Example 3A - All Recorded Cases** </font>

This example puts it all together into an interactive visualization in order for you to inspect the data month by month:

In [None]:
# This radio button holds options that allow
# you to choose whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_for_month = widgets.RadioButtons(options = ['relative', 'group', 'overlay'],
                                            ) # Declare the set of radio buttons and provide options

In [None]:
# This dropdown holds the list
# stored in the 'labels' object.
# A widget is created because then 
# the values can change multiple graphs at once
labels_dropdown = widgets.Dropdown(options = labels,
                                value = None,
                                description = 'label',
                                disabled = False,
                               )

In [None]:
# This dropdown holds the values of features
# that can be explored.
# A widget is created so one dropdown can 
# change multiple graphs at once
features_dropdown = widgets.Dropdown(options = features,
                                     value = 'Date reported',
                                     description = 'features',
                                     disabled = False,
                                     )

In [None]:
# This dropdown will allow you to select the month that you want to inspect
# Jan = 1, Dec = 12
month_labels = widgets.Dropdown(options = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                                value = 1,
                                description = 'Month',
                                disabled = False,
                               )

In [None]:
# This widget allows you to create a histogram, like the ones produced in the previous notebook.
# This histogram will allow you to compare year over year COVID data, by month
@widgets.interact
def hist_inspect(grouping = groupings_for_month, label = labels_dropdown, feature = features_dropdown, month = month_labels):
    
    # Create a general chart title
    General_title_all_cases_month = f"Year over year assessments by Month: Month Number:{month}."
    
    # Create a selection of the dataset query
    by_month = df[df['Month'] == month]
    
    # Visualize the dataset
    fig = px.histogram(by_month, 
                       x = feature,
                       color = label,
                       facet_col = 'Year',
                       barmode = grouping,
                       category_orders = category_orders,
                       color_discrete_map = color_discrete_map,
                       title = General_title_all_cases_month,
                       width = 1600, 
                       height = 800
                      )

    fig.show() 

#### <font style = "color:rgb(20, 53, 103)" > **Example 3B - All Fatal Cases** </font>

This example will show how widgets can be sync'd. I'm going to use the same widgets created above; the only difference is the data set.

If I change a setting in one graph, the setting in the other will automatically be updated as well.

This way, you can compare the results of each graph.
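
The syncing happens because both functions receive the same widget objects, so there is only one value to change. The sketch below shows that principle with a single `Checkbox` and two listeners. It uses `observe` rather than `@widgets.interact` so that it can run outside a notebook front end. The names `shared`, `seen_by_a`, and `seen_by_b` are hypothetical and do not appear elsewhere in this notebook.

```python
import ipywidgets as widgets

# One widget object, standing in for a control shared by two graphs
shared = widgets.Checkbox(value = False, description = 'Log X scale')

# Each list records the values that one "graph" was notified about
seen_by_a = []
seen_by_b = []

# Attach two independent listeners to the same widget
shared.observe(lambda change: seen_by_a.append(change['new']), names = 'value')
shared.observe(lambda change: seen_by_b.append(change['new']), names = 'value')

# A single change to the widget reaches both listeners
shared.value = True

print(seen_by_a, seen_by_b)
```

If you want two graphs to stay independent, give each one its own widget objects, as the earlier histogram examples do.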

In [None]:
# This widget allows you to create a histogram, like the ones produced in the previous notebook.
# This histogram will allow you to compare year over year COVID data, by month
@widgets.interact
def hist_inspect(grouping = groupings_for_month, label = labels_dropdown, feature = features_dropdown, month = month_labels):
    
    # Create a general chart title
    General_title_all_cases_month = f"Year over year assessments by Month: Month Number:{month}."
    
    # Create a selection of the dataset query
    by_month = df_fatal[df_fatal['Month'] == month]
    
    # Visualize the dataset
    fig = px.histogram(by_month, 
                       x = feature,
                       color = label,
                       facet_col = 'Year',
                       barmode = grouping,
                       category_orders = category_orders,
                       color_discrete_map = color_discrete_map,
                       title = General_title_all_cases_month,
                       width = 1600, 
                       height = 800
                      )

    fig.show() 

#### <font style = "color:rgb(20, 53, 103)" > **Example 3C - Using a Date Picker (Part 1)** </font>

This example shows how to use date picker widgets. First I'll create a simple query, then show how to bundle it into a visualization.

I'll have both visualizations sync'd, like the example above.

To start with the date picker, let's first create a start date picker, an end date picker, and variables to store these results: 

In [None]:
# Object to store the widget start date picker
start_date = widgets.DatePicker(description='Start Date',
                                disabled = False
                               )

start_date

In [None]:
# Object to store the end date picker
end_date = widgets.DatePicker(description='End Date',
                              disabled = False
                              )

end_date

In [None]:
# Assign the picked dates to variables.
# Each value is None until a date has been chosen in the pickers above.
stday = start_date.value

enday = end_date.value

In [None]:
# Print the results to show that the start date
# and end date have been correctly chosen
print(stday)

print(enday)

In [None]:
# Create a data frame based on the date selection range.
# For this to work, both dates must be picked above and
# the dates must be ISO values (YYYY-MM-DD).
result_by_selection = df[df['ISO Date'].isin(pd.date_range(stday, enday))]

result_by_selection
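To make the date-range query easier to follow, here is a small self-contained sketch on made-up data. It assumes the `ISO Date` column holds pandas datetime values; the rows and the `Case status` values are invented for illustration. It also shows `Series.between` as an alternative to `isin(pd.date_range(...))`. That is a substitution of mine, not the approach used in this notebook.

```python
import datetime
import pandas as pd

# Made-up stand-in for df, with an 'ISO Date' column of datetime values
sample = pd.DataFrame({
    'ISO Date': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03',
                                '2021-01-04', '2021-01-05']),
    'Case status': ['Recovered', 'Recovered', 'Died', 'Recovered', 'Active'],
})

# A DatePicker returns datetime.date objects, so use the same type here
picked_start = datetime.date(2021, 1, 2)
picked_end = datetime.date(2021, 1, 4)

# Approach used in this notebook: keep rows whose date is in the generated range
via_isin = sample[sample['ISO Date'].isin(pd.date_range(picked_start, picked_end))]

# Alternative: an inclusive comparison against the two end points
via_between = sample[sample['ISO Date'].between(pd.Timestamp(picked_start),
                                                pd.Timestamp(picked_end))]

print(via_isin)
```

On date-only data both selections return the same rows. If the column also carried times of day, `isin` would only match rows stamped exactly at midnight, which is one reason to consider `between`.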

In [None]:
# Visualize the dataset
fig = px.histogram(result_by_selection, 
                   x = 'Date reported',
                   color = 'Case status',
                   color_discrete_map = color_discrete_map,
                  )

fig.show()

Putting it all together, the interactive widget below queries the data frame using both date pickers:

In [None]:
# This script shows how to put a date picker app together
@widgets.interact
def hist_inspector(sday = start_date, eday = end_date):
    
    print(sday)
    print(eday)
    
    start = sday
    end = eday
    
    print("\n")
    
    if start is None:
        start = first_date_reported
        
    if end is None:
        end = current_date_reported
        
    print(start)
    print(end)
    
    # Create a data frame based on the date selection range.
    # In order for this to work, you must use an ISO value
    result_by_selection = df[df['ISO Date'].isin(pd.date_range(start, end))]
    
    return result_by_selection
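The fallback logic above (use the first and latest reported dates when a picker is still empty) can be pulled into a small helper. This is only a sketch: `resolve_date_range` is a hypothetical name, the end date below is a placeholder, and the code assumes all four values are `datetime.date` objects. The swap for reversed dates is an addition of mine and is not in the original function.

```python
import datetime

def resolve_date_range(start, end, first_available, last_available):
    """Return a (start, end) pair, filling in defaults for empty pickers."""
    # An empty DatePicker yields None, so fall back to the data set limits
    if start is None:
        start = first_available
    if end is None:
        end = last_available
    # If the dates were picked in reverse order, swap them
    if start > end:
        start, end = end, start
    return start, end

# Placeholder limits, standing in for first_date_reported and current_date_reported
first_day = datetime.date(2020, 3, 6)
last_day = datetime.date(2021, 12, 31)

print(resolve_date_range(None, None, first_day, last_day))
print(resolve_date_range(datetime.date(2021, 6, 1), datetime.date(2021, 5, 1), first_day, last_day))
```

Inside an `interact` function, the returned pair could then be passed to `pd.date_range(start, end)` as before.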

#### <font style = "color:rgb(20, 53, 103)" > **Example 3C - Using a Date Picker (Part 2)** </font>

This example shows how to use date picker widgets to select two dates and visualize the data between them.

In [None]:
# Variable that stores the first day recorded in the
# dataset
first_date_reported

In [None]:
# Variable that stores the latest day recorded in the
# dataset
current_date_reported

In [None]:
# Object to store the widget start date picker
start_date_pick = widgets.DatePicker(description = 'Start Date',
                                     disabled = False
                                     )

In [None]:
# Object to store the end date picker
end_date_pick = widgets.DatePicker(description = 'End Date',
                                   disabled = False
                                   )

In [None]:
# This set of radio buttons lets you choose
# whether the histogram is stacked (relative),
# grouped, or overlaid.
groupings_for_month = widgets.RadioButtons(options = ['relative', 'group', 'overlay'],
                                            ) # Declare the set of radio buttons and provide options

In [None]:
# This dropdown holds the list
# stored in the 'labels' object.
# A widget is created because then 
# the values can change multiple graphs at once
labels_dropdown = widgets.Dropdown(options = labels,
                                   value = None,
                                   description = 'label',
                                   disabled = False,
                                   )

In [None]:
# This dropdown holds the values of features
# that can be explored.
# A widget is created so one dropdown can 
# change multiple graphs at once
features_dropdown = widgets.Dropdown(options = features,
                                     value = 'Date reported',
                                     description = 'features',
                                     disabled = False,
                                     )

In [None]:
# This widget creates a histogram of all cases reported
# between the selected start and end dates.
@widgets.interact
def hist_inspect(sday = start_date_pick, eday = end_date_pick, grouping = groupings_for_month, label = labels_dropdown, feature = features_dropdown):
    
    # Variables used to store
    # the start and end dates
    start = sday
    end = eday
    
    # Fall back to the full date range while a picker is still empty
    if start is None:
        start = first_date_reported
        
    if end is None:
        end = current_date_reported
    
    # Create a data frame based on the date selection range.
    # In order for this to work, you must use an ISO value
    result_by_selection = df[df['ISO Date'].isin(pd.date_range(start, end))]
    
    # Create a general chart title
    General_title_all_cases = f"All cases from {start} to {end} (YYYY-MM-DD)."
    
    # Visualize the dataset
    fig = px.histogram(result_by_selection, 
                       x = feature,
                       color = label,
                       #facet_col = 'Year',
                       barmode = grouping,
                       category_orders = category_orders,
                       color_discrete_map = color_discrete_map,
                       title = General_title_all_cases,
                       width = 1600, 
                       height = 800
                      )
    
    fig.update_layout(plot_bgcolor = 'Gray')

    fig.show() 

The code below reuses the same widgets for the fatal cases dataset, so both graphs stay in sync and can be compared:

In [None]:
# This widget creates a histogram of the fatal cases reported
# between the selected start and end dates.
@widgets.interact
def hist_inspect(sday = start_date_pick, eday = end_date_pick, grouping = groupings_for_month, label = labels_dropdown, feature = features_dropdown):
    
    # Variables used to store
    # the start and end dates
    start = sday
    end = eday
    
    # Fall back to the full date range while a picker is still empty
    if start is None:
        start = first_date_reported
        
    if end is None:
        end = current_date_reported
    
    # Create a data frame based on the date selection range.
    # In order for this to work, you must use an ISO value
    result_by_selection = df_fatal[df_fatal['ISO Date'].isin(pd.date_range(start, end))]
    
    # Create a general chart title
    General_title_all_cases = f"Total deaths from {start} to {end} (YYYY-MM-DD)."
    
    # Visualize the dataset
    fig = px.histogram(result_by_selection, 
                       x = feature,
                       color = label,
                       #facet_col = 'Year',
                       barmode = grouping,
                       category_orders = category_orders,
                       color_discrete_map = color_discrete_map,
                       title = General_title_all_cases,
                       width = 1600, 
                       height = 800
                      )
    
    fig.update_layout(plot_bgcolor = 'Gray')

    fig.show() 

## <font style = "color:rgb(20, 53, 103)" > **Conclusion** </font>

I hope this notebook offers some useful suggestions on how to use `Plotly` and `ipywidgets` together.