# Objectives
1. Summarize how interactive plots can be useful to decision makers
2. Differentiate between exploratory data visualization and data visualization to illustrate analysis results
3. Use Dash to create interactive plots

# 1. Interactive Plots

```python
import plotly.express as px
```

```python
df = px.data.gapminder()

fig = px.scatter(df.query("year==2007"), x="gdpPercap", y="lifeExp", size="pop", color="continent",
                 hover_name="country", log_x=True, size_max=60)
fig.show()
```

This interactive plot is pretty sweet!! But can we make it even better? **OF COURSE!!** I won't put all the code here, but check out this [interactive plot](https://plotly.com/python/v3/gapminder-example/).

```python
df = px.data.gapminder().query("year == 2007")
fig = px.scatter_geo(df, locations="iso_alpha",  # ISO-3 country codes place each marker
                     size="pop",  # size of markers, "pop" is one of the columns of gapminder
                     )
fig.show()
```

These are just a few of the interactive plots I thought could be useful to tell our COVID story from the case study. Check out all of the options for interactive plots [here](https://plotly.com/python/)!!

# 2. COVID Case Study

```python
# Import packages for data manipulation and visualization
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
```

### Load and Inspect Data
[Read](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) the two global COVID-19 time-series .csv files from the Johns Hopkins CSSE GitHub repository (URLs in the cell below) into DataFrames named `cases` and `deaths`, respectively. Additionally, read `Data/population_global.csv` into a DataFrame named `population`. Remember, a relative path like this one is resolved from the directory this Jupyter Notebook runs in; if the file lives elsewhere, you must specify the entire file path. Inspect the first five rows of `cases`.

```python
cases = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
deaths = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
population = pd.read_csv('Data/population_global.csv')
cases.head()
```

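|   | Province/State | Country/Region | Lat | Long | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | ... | 3/8/21 | 3/9/21 | 3/10/21 | 3/11/21 | 3/12/21 | 3/13/21 | 3/14/21 | 3/15/21 | 3/16/21 | 3/17/21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | NaN | Afghanistan | 33.93911 | 67.709953 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 55876 | 55876 | 55894 | 55917 | 55959 | 55959 | 55985 | 55985 | 55995 | 56016 |
| 1 | NaN | Albania | 41.1533 | 20.1683 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 113580 | 114209 | 114840 | 115442 | 116123 | 116821 | 117474 | 118017 | 118492 | 118938 |
| 2 | NaN | Algeria | 28.0339 | 1.6596 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 114382 | 114543 | 114681 | 114851 | 115008 | 115143 | 115265 | 115410 | 115540 | 115688 |
| 3 | NaN | Andorra | 42.5063 | 1.5218 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 11069 | 11089 | 11130 | 11130 | 11199 | 11228 | 11266 | 11289 | 11319 | 11360 |
| 4 | NaN | Angola | -11.2027 | 17.8739 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21108 | 21114 | 21161 | 21205 | 21265 | 21323 | 21380 | 21407 | 21446 | 21489 |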


### Manipulating Our Data into Tidy Data

```python
# Use a simpler name for the country column in both DataFrames
cases = cases.rename(columns={"Country/Region": "country"})
deaths = deaths.rename(columns={"Country/Region": "country"})
```

```python
# Drop columns that don't add up meaningfully across provinces/states
country_cases = cases.drop(['Province/State', 'Lat', 'Long'], axis=1)
country_deaths = deaths.drop(['Province/State', 'Lat', 'Long'], axis=1)
```

```python
# Sum the Province/State rows so each country appears once
country_cases = country_cases.groupby('country').sum()
country_deaths = country_deaths.groupby('country').sum()
country_population = population.groupby('country').sum(numeric_only=True)
country_cases.head()
```

| country | 1/22/20 | 1/23/20 | 1/24/20 | 1/25/20 | 1/26/20 | 1/27/20 | 1/28/20 | 1/29/20 | 1/30/20 | 1/31/20 | ... | 3/8/21 | 3/9/21 | 3/10/21 | 3/11/21 | 3/12/21 | 3/13/21 | 3/14/21 | 3/15/21 | 3/16/21 | 3/17/21 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afghanistan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 55876 | 55876 | 55894 | 55917 | 55959 | 55959 | 55985 | 55985 | 55995 | 56016 |
| Albania | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 113580 | 114209 | 114840 | 115442 | 116123 | 116821 | 117474 | 118017 | 118492 | 118938 |
| Algeria | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 114382 | 114543 | 114681 | 114851 | 115008 | 115143 | 115265 | 115410 | 115540 | 115688 |
| Andorra | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 11069 | 11089 | 11130 | 11130 | 11199 | 11228 | 11266 | 11289 | 11319 | 11360 |
| Angola | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 21108 | 21114 | 21161 | 21205 | 21265 | 21323 | 21380 | 21407 | 21446 | 21489 |
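Some countries (for example Australia, Canada, and China) are split across several Province/State rows in the JHU files, which is why the province rows are summed into one row per country. The toy frame below (invented country names, provinces, and numbers) sketches that collapse on a small scale:

```python
import pandas as pd

# Toy data in the same wide layout as `cases`; every value here is made up
wide = pd.DataFrame({
    "Province/State": ["North", "South", None],
    "country": ["Atlantis", "Atlantis", "Elbonia"],
    "Lat": [1.0, 2.0, 3.0],
    "Long": [4.0, 5.0, 6.0],
    "1/22/20": [1, 2, 0],
    "1/23/20": [3, 4, 1],
})

# Drop the non-additive columns, then add up every row that shares a country
per_country = wide.drop(["Province/State", "Lat", "Long"], axis=1).groupby("country").sum()
print(per_country)
```

Atlantis's two provinces become a single row, while Elbonia, which had only one row, is unchanged.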


```python
# Add each country's population as a column, matched on the 'country' index
country_cases = country_cases.join(country_population.population)
country_deaths = country_deaths.join(country_population.population)
```

```python
# Reshape from wide (one column per date) to tidy (one row per country and date)
cases_tidy = country_cases.reset_index().melt(id_vars=['country', 'population'],
                                              var_name='date',
                                              value_name='cases')
deaths_tidy = country_deaths.reset_index().melt(id_vars=['country', 'population'],
                                                var_name='date',
                                                value_name='deaths')
```
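To see what `melt` does, here is a sketch on a tiny made-up frame shaped like `country_cases` after the population join. The `id_vars` columns are kept as identifiers, and every remaining column header becomes a value in the new `date` column:

```python
import pandas as pd

# Invented wide data: one row per country, one column per date
wide = pd.DataFrame({
    "country": ["Atlantis", "Elbonia"],
    "population": [1000, 500],
    "1/22/20": [3, 0],
    "1/23/20": [7, 1],
})

# Each (country, date) pair becomes its own row
tidy = wide.melt(id_vars=["country", "population"], var_name="date", value_name="cases")
print(tidy)
```

`melt` stacks the date columns one after another, so all countries for the first date come before any row for the second date. That is why the combined `df` lists every country for 2020-01-22 before moving on to the next day.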

```python
# Change the date column's dtype from object (strings like '1/22/20') to datetime
cases_tidy['date'] = pd.to_datetime(cases_tidy['date'], format='%m/%d/%y')
deaths_tidy['date'] = pd.to_datetime(deaths_tidy['date'], format='%m/%d/%y')
```
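The JHU date headers are month/day/two-digit-year strings. Passing an explicit `format` to `pd.to_datetime` spells that order out instead of leaving it to pandas' inference. A small sketch on a few sample header strings:

```python
import pandas as pd

# Sample headers in the JHU style: month/day/two-digit year
raw = pd.Series(["1/22/20", "12/31/20", "3/17/21"])

# %m/%d/%y = month/day/two-digit year; single-digit months and days parse fine
dates = pd.to_datetime(raw, format="%m/%d/%y")
print(dates.dt.strftime("%Y-%m-%d").tolist())  # ['2020-01-22', '2020-12-31', '2021-03-17']
```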

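```python
# Line up deaths with cases on matching country and date, not on row position
df = cases_tidy.merge(deaths_tidy[['country', 'date', 'deaths']], on=['country', 'date'])
df
```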

|   | country | population | date | cases | deaths |
|---|---|---|---|---|---|
| 0 | Afghanistan | 32225560 | 2020-01-22 | 0 | 0 |
| 1 | Albania | 2845955 | 2020-01-22 | 0 | 0 |
| 2 | Algeria | 43000000 | 2020-01-22 | 0 | 0 |
| 3 | Andorra | 77543 | 2020-01-22 | 0 | 0 |
| 4 | Angola | 31127674 | 2020-01-22 | 0 | 0 |
| ... | ... | ... | ... | ... | ... |
| 80827 | Vietnam | 96208984 | 2021-03-17 | 2567 | 35 |
| 80828 | West Bank and Gaza | 4976684 | 2021-03-17 | 215984 | 2343 |
| 80829 | Yemen | 29825968 | 2021-03-17 | 3037 | 713 |
| 80830 | Zambia | 17885422 | 2021-03-17 | 85502 | 1170 |
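Combining two tidy frames with `join` pairs rows by index label, which is only safe when both frames list countries and dates in exactly the same order. That holds here as long as the two JHU files share the same countries and date columns. Merging on `country` and `date` makes the match explicit. In this toy sketch (invented numbers, hypothetical names `toy_cases`/`toy_deaths`), the second frame deliberately lists the same keys in a different order:

```python
import pandas as pd

toy_cases = pd.DataFrame({
    "country": ["Atlantis", "Elbonia", "Atlantis", "Elbonia"],
    "date": pd.to_datetime(["2020-01-22", "2020-01-22", "2020-01-23", "2020-01-23"]),
    "cases": [3, 0, 7, 1],
})
# Same (country, date) keys, shuffled
toy_deaths = pd.DataFrame({
    "country": ["Elbonia", "Elbonia", "Atlantis", "Atlantis"],
    "date": pd.to_datetime(["2020-01-23", "2020-01-22", "2020-01-23", "2020-01-22"]),
    "deaths": [1, 0, 2, 0],
})

# Index-based join: row 0 of one frame meets row 0 of the other, so deaths land on the wrong rows
by_position = toy_cases.join(toy_deaths["deaths"])

# Key-based merge: rows are paired only when country AND date match
by_key = toy_cases.merge(toy_deaths, on=["country", "date"])
print(by_key)
```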


# 3. Interactive Plots

```python
fig = px.scatter(df.loc[df.date == "2021-03-09"], x="cases", y="deaths", size="population",
                 hover_name='country', log_x=True)
fig.show()
```

```python
df.loc[df.country == "US"]
```

|   | country | population | date | cases | deaths |
|---|---|---|---|---|---|
| 178 | US | 329584842 | 2020-01-22 | 1 | 0 |
| 370 | US | 329584842 | 2020-01-23 | 1 | 0 |
| 562 | US | 329584842 | 2020-01-24 | 2 | 0 |
| 754 | US | 329584842 | 2020-01-25 | 2 | 0 |
| 946 | US | 329584842 | 2020-01-26 | 5 | 0 |
| ... | ... | ... | ... | ... | ... |
| 80050 | US | 329584842 | 2021-03-13 | 29400553 | 534316 |
| 80242 | US | 329584842 | 2021-03-14 | 29438775 | 534888 |
| 80434 | US | 329584842 | 2021-03-15 | 29495424 | 535628 |
| 80626 | US | 329584842 | 2021-03-16 | 29549003 | 536914 |


```python
fig = px.line(df.loc[df.country == "US"], x="date", y="cases", title="US Cases Trend")
fig.show()
```

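Taking `max` of `Lat` and `Long` separately across a country's provinces can combine coordinates from different provinces, so pick one coordinate pair per country instead:

```python
latest = df.loc[df.date == df.date.max()].set_index('country')  # latest cumulative totals per country
coords = cases.sort_values('Province/State', na_position='first').groupby('country')[['Lat', 'Long']].first()  # country-level row if present
df_geo = latest.join(coords)
df_geo.head()
```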

| country | population | date | cases | deaths | Lat | Long |
|---|---|---|---|---|---|---|
| Afghanistan | 32225560 | 2021-03-17 | 56016 | 2460 | 33.93911 | 67.709953 |
| Albania | 2845955 | 2021-03-17 | 118938 | 2092 | 41.1533 | 20.1683 |
| Algeria | 43000000 | 2021-03-17 | 115688 | 3048 | 28.0339 | 1.6596 |
| Andorra | 77543 | 2021-03-17 | 11360 | 113 | 42.5063 | 1.5218 |
| Angola | 31127674 | 2021-03-17 | 21489 | 522 | -11.2027 | 17.8739 |


```python
fig = px.scatter_geo(df_geo, lat='Lat', lon='Long',
                     size="cases",  # marker size scales with cumulative cases
                     hover_name=df_geo.index,  # country names are df_geo's index
                     )
fig.show()
```

# 4. Dash
- Take some time to understand [what Dash is](https://medium.com/plotly/introducing-dash-5ecf7191b503) and what it offers analysts for telling a story with data.
- Once you have a decent understanding of how Dash works, check out some of [these cool examples](https://dash-gallery.plotly.host/Portal/)!