# Covid-19 dashboard
---
__Goal:__ create a dynamic dashboard based on a Tableau example.

In [78]:
import pandas as pd
import numpy as np
import plotly.graph_objects as go
import plotly.express as px

## 1. Data Cleaning and Preparation
---

In [79]:
df = pd.read_csv('database.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33250 entries, 0 to 33249
Data columns (total 8 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Province/State  33250 non-null  object 
 1   Country/Region  31125 non-null  object 
 2   Lat             33250 non-null  float64
 3   Long            33250 non-null  float64
 4   Date            33250 non-null  object 
 5   Confirmed       33250 non-null  int64  
 6   Death           33250 non-null  int64  
 7   Recovered       29750 non-null  float64
dtypes: float64(3), int64(2), object(3)
memory usage: 2.0+ MB


In [80]:
display(df[df['Country/Region'].isnull()],
df[df['Recovered'].isnull()])

Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed,Death,Recovered
3875,Burkina Faso,,12.2383,-1.5616,2020-01-22,0,0,0.0
3876,Burkina Faso,,12.2383,-1.5616,2020-01-23,0,0,0.0
3877,Burkina Faso,,12.2383,-1.5616,2020-01-24,0,0,0.0
3878,Burkina Faso,,12.2383,-1.5616,2020-01-25,0,0,0.0
3879,Burkina Faso,,12.2383,-1.5616,2020-01-26,0,0,0.0
...,...,...,...,...,...,...,...,...
32620,Western Sahara,,24.2155,-12.8858,2020-05-21,6,0,6.0
32621,Western Sahara,,24.2155,-12.8858,2020-05-22,6,0,6.0
32622,Western Sahara,,24.2155,-12.8858,2020-05-23,6,0,6.0
32623,Western Sahara,,24.2155,-12.8858,2020-05-24,9,0,6.0


Unnamed: 0,Province/State,Country/Region,Lat,Long,Date,Confirmed,Death,Recovered
4250,Cameroon,Cameroun,3.848000,11.502100,2020-01-22,0,0,
4251,Cameroon,Cameroun,3.848000,11.502100,2020-01-23,0,0,
4252,Cameroon,Cameroun,3.848000,11.502100,2020-01-24,0,0,
4253,Cameroon,Cameroun,3.848000,11.502100,2020-01-25,0,0,
4254,Cameroon,Cameroun,3.848000,11.502100,2020-01-26,0,0,
...,...,...,...,...,...,...,...,...
33120,Tajikistan,Tajikistan,38.861034,71.276093,2020-05-21,2350,44,
33121,Tajikistan,Tajikistan,38.861034,71.276093,2020-05-22,2551,44,
33122,Tajikistan,Tajikistan,38.861034,71.276093,2020-05-23,2738,44,
33123,Tajikistan,Tajikistan,38.861034,71.276093,2020-05-24,2929,46,


Both `Country/Region` and `Recovered` contain missing values. Let's drop the `Country/Region` column (we will only use the `Province/State` column) and the `Recovered` feature.

🤔 We may revisit this choice later and fill in the `Recovered` feature, but for now let's proceed this way.

In [81]:
df.drop(columns=['Country/Region','Recovered'], inplace=True)

We also need to convert `Date` to datetime format. The conversion is left commented out below, so `Date` stays a string column for now.

In [82]:
# df['Date'] = pd.to_datetime(df['Date'])
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 33250 entries, 0 to 33249
Data columns (total 6 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Province/State  33250 non-null  object 
 1   Lat             33250 non-null  float64
 2   Long            33250 non-null  float64
 3   Date            33250 non-null  object 
 4   Confirmed       33250 non-null  int64  
 5   Death           33250 non-null  int64  
dtypes: float64(2), int64(2), object(2)
memory usage: 1.5+ MB


Sounds good. Now we need to create the different parts of the dashboard:
- sliders
- global map
- histograms for confirmed cases and deaths
- the 10 countries with the most deaths and confirmed cases
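
The top-10 ranking in the last bullet could be computed along these lines. This is a minimal sketch on toy data, assuming the same column names as `database.csv` (`Province/State`, `Date`, `Confirmed`, `Death`) and assuming the counts are cumulative, so that the latest date carries the totals. `top_n` is a hypothetical helper, not part of the notebook.

```python
import pandas as pd

# Toy data standing in for database.csv (values are made up for illustration).
toy = pd.DataFrame({
    "Province/State": ["A", "B", "C", "A", "B", "C"],
    "Date": ["2020-05-23"] * 3 + ["2020-05-24"] * 3,
    "Confirmed": [10, 50, 30, 12, 55, 31],
    "Death": [1, 2, 5, 1, 3, 6],
})

def top_n(df, column, n=10):
    """Return the n regions with the largest `column` on the latest date."""
    # ISO-formatted date strings sort chronologically, so max() is the latest date.
    latest = df[df["Date"] == df["Date"].max()]
    return latest.nlargest(n, column)[["Province/State", column]]

top_confirmed = top_n(toy, "Confirmed", n=2)
top_death = top_n(toy, "Death", n=2)
```

The two resulting frames could then feed one bar chart each, for example with `px.bar`.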

## 2. Global map
---
For this we will use the Mapbox API through plotly's `scatter_mapbox`, which requires retrieving the Mapbox token from `mapbox_token.txt`.

⚠️ I also noticed something odd: the plotly mapbox API uses string dates for the animation instead of datetime values.
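
One way to have both a datetime `Date` for calculations and a string date for the animation is to keep two columns. A minimal sketch on toy data; the column name `Date_str` is an assumption introduced here, not something the notebook defines.

```python
import pandas as pd

# Toy frame with ISO date strings, as in database.csv.
toy = pd.DataFrame({"Date": ["2020-01-22", "2020-01-23", "2020-02-01"]})

# Datetime column: supports sorting, resampling and the .dt accessors.
toy["Date"] = pd.to_datetime(toy["Date"])

# String column: the form that can be handed to an animation frame argument.
toy["Date_str"] = toy["Date"].dt.strftime("%Y-%m-%d")
```

With this layout, date arithmetic uses `Date` while the plotting call would reference `Date_str`.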

In [83]:
# read the mapbox file
with open('mapbox_token.txt') as f:
    lines=[x.rstrip() for x in f]
mapbox_access_token = lines[0]

In [84]:
last_date = df['Date'].iloc[-1]
df1 = df[df['Date'] == last_date]

We add a `color` feature, which is simply the log of the confirmed cases. The goal is to reduce the huge color gap between the US and the other countries.
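
To see why a log scale helps, the transform can be tried on a few made-up counts. The sketch below also shows `np.log2(x + 1)` as a possible alternative offset; that variant is an assumption added for comparison, not what the notebook uses.

```python
import numpy as np

# Made-up case counts spanning several orders of magnitude.
confirmed = np.array([0, 10, 1_000, 1_000_000])

# Same transform as the notebook: a small offset avoids log(0),
# but it maps zero cases to a large negative value (about -19.9).
color_eps = np.log2(confirmed + 1e-6)

# Alternative (an assumption, not the notebook's choice): log2(1 + x)
# maps zero cases to exactly 0 and keeps the scale non-negative.
color_log1p = np.log2(confirmed + 1)
```

On the raw scale the largest value is 100,000 times the second smallest; on either log scale the ratio drops to single digits, so mid-sized regions stay distinguishable.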

In [85]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning on the filtered slice
df1 = df1.assign(color=np.log2(df1["Confirmed"] + 1e-6))  # add 1e-6 to avoid log(0)
px.set_mapbox_access_token(mapbox_access_token)
map_plot = px.scatter_mapbox(df1, lat="Lat", lon="Long",
                        hover_name="Province/State",
                        zoom=0.6, mapbox_style='dark',
                        size="Confirmed", size_max=40,
                        color="color", color_continuous_scale=['Gold', 'DarkOrange', 'Crimson'])

# hide the color bar, since the log-scaled values are only used for shading
map_plot.update(layout_coloraxis_showscale=False)

map_plot.show()



### TODO
- Do the values correspond to the real confirmed cases?
- __add a white border__ 
- __customize the tooltips__
---

## 3. Dash app

In [86]:
# df['Date'] = pd.to_datetime(df['Date'])
dates = df['Date'].unique()

In [None]:
import dash
import dash_core_components as dcc
import dash_html_components as html

external_stylesheets = ['https://codepen.io/chriddyp/pen/bWLwgP.css']

# pass the stylesheet to the app so that it is actually applied
app = dash.Dash(__name__, external_stylesheets=external_stylesheets)
app.layout = html.Div([
    html.H1(children='Evolution of COVID-19 around the world'),
    dcc.Slider(
        id='date_slider',
        min=0,
        max=len(dates) - 1,  # slider values index into `dates`, so the last valid one is len - 1
        marks={i: date for i, date in enumerate(dates) if i % 15 == 0},
        value=len(dates) - 1),
    dcc.Graph(figure=map_plot)
])

# Placeholder callback, not yet adapted: the ids below do not match this layout
# @app.callback(
#     dash.dependencies.Output('slider-output-container', 'children'),
#     [dash.dependencies.Input('my-slider', 'value')])
# def update_output(value):
#     return 'You have selected "{}"'.format(value)

app.run_server(debug=True, use_reloader=False)  # Turn off reloader if inside Jupyter

Running on http://127.0.0.1:8050/
Debugger PIN: 533-312-364
 * Serving Flask app "__main__" (lazy loading)
 * Environment: production
   Use a production WSGI server instead.
 * Debug mode: on


In [88]:
# df['Date'] = pd.to_datetime(df['Date'])
dates = df['Date'].unique()
month = pd.to_datetime(df['Date']).dt.month
month_dict = {1:'January', 2:'February', 3:'March', 4:'April', 5:'May', 6:'June',
             7:'July', 8:'August', 9:'September', 10:'October', 11:'November', 12:'December'}
{i:date for i,date in enumerate(dates) if i%15==0}

{0: '2020-01-22',
 15: '2020-02-06',
 30: '2020-02-21',
 45: '2020-03-07',
 60: '2020-03-22',
 75: '2020-04-06',
 90: '2020-04-21',
 105: '2020-05-06',
 120: '2020-05-21'}
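
Since the marks are keyed by slider position, one alternative to a mark every 15 days is a mark at the first available day of each month, labelled with the month name. A minimal sketch using only the standard library; `month_marks` is a hypothetical helper, the `dates` list is a stand-in for the notebook's array, and the English month names from `calendar` are an assumption (the notebook defines its own `month_dict`).

```python
import calendar
from datetime import date, timedelta

# Stand-in for the notebook's `dates` array: ISO date strings starting 2020-01-22.
start = date(2020, 1, 22)
dates = [(start + timedelta(days=i)).isoformat() for i in range(124)]

def month_marks(dates):
    """Return {slider position: month name} for the first date seen in each month."""
    marks, seen = {}, set()
    for i, d in enumerate(dates):
        year_month = d[:7]  # "YYYY-MM"
        if year_month not in seen:
            seen.add(year_month)
            marks[i] = calendar.month_name[int(d[5:7])]
    return marks

marks = month_marks(dates)
```

The resulting dictionary has integer keys and string labels, the same shape as the `marks` argument already passed to `dcc.Slider`.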

In [48]:
dates

array(['2020-01-22', '2020-01-23', '2020-01-24', '2020-01-25',
       '2020-01-26', '2020-01-27', '2020-01-28', '2020-01-29',
       '2020-01-30', '2020-01-31', '2020-02-01', '2020-02-02',
       '2020-02-03', '2020-02-04', '2020-02-05', '2020-02-06',
       '2020-02-07', '2020-02-08', '2020-02-09', '2020-02-10',
       '2020-02-11', '2020-02-12', '2020-02-13', '2020-02-14',
       '2020-02-15', '2020-02-16', '2020-02-17', '2020-02-18',
       '2020-02-19', '2020-02-20', '2020-02-21', '2020-02-22',
       '2020-02-23', '2020-02-24', '2020-02-25', '2020-02-26',
       '2020-02-27', '2020-02-28', '2020-02-29', '2020-03-01',
       '2020-03-02', '2020-03-03', '2020-03-04', '2020-03-05',
       '2020-03-06', '2020-03-07', '2020-03-08', '2020-03-09',
       '2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13',
       '2020-03-14', '2020-03-15', '2020-03-16', '2020-03-17',
       '2020-03-18', '2020-03-19', '2020-03-20', '2020-03-21',
       '2020-03-22', '2020-03-23', '2020-03-24', '2020-