<a href="https://colab.research.google.com/github/TobiasLaimer/ColabStuff/blob/master/COVID_Python_Starter_incl_Germany.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring COVID-19 Data with Python

This notebook is an ongoing effort to let users explore COVID-19 data with Python. It imports and prepares the data for analysis and provides baseline visuals.

All of the data is provided by the Johns Hopkins [Center for Systems Science and Engineering](https://github.com/CSSEGISandData/COVID-19).

Inspiration for a lot of these visuals comes from [@jburnmurdoch](https://twitter.com/jburnmurdoch) and the Financial Times for their tremendous work with COVID-19 data.

Contributions by: 
[@benbbaldwin](https://twitter.com/benbbaldwin)

In [0]:
import pandas as pd
from plotly import graph_objs as go
import plotly.express as px
import numpy as np
pd.set_option('mode.chained_assignment', None)
np.seterr(divide = 'ignore') 
import math
from plotly.subplots import make_subplots
import datetime as dt

In [0]:
### DEFAULT LIST OF COUNTRIES TO VISUALIZE; ADD/CHANGE ENTRIES IF YOU WOULD LIKE TO SEE OTHERS ###
countries_of_interest = ['US','Italy','Spain','France','Korea, South','Japan','United Kingdom','Germany']

In [0]:
def prep_data():
  #READ DATA IN FROM GITHUB
  #deaths=pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv')

  #new deaths file from JHU
  deaths=pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')

  #UNPIVOT INTO A LONG DF and RENAME COLUMNS
  deaths2 = pd.melt(deaths,id_vars=['Province/State','Country/Region','Lat','Long'])
  deaths2.rename(columns={"variable": "date", "value": "deaths"},inplace=True)

  #SET DATE DATATYPE
  deaths2['date'] = pd.to_datetime(deaths2['date'])
  deaths_agg = pd.DataFrame(deaths2.groupby(['Country/Region','date'])['deaths'].sum()).reset_index()

  #LIMIT TO COUNTRIES OF INTEREST
  # I DO THIS IN THE CHART BUILDING NOW, THE PREP_DATA() RETURNS ALL COUNTRIES, AGG'D TO THE COUNTRY LEVEL
  # countries_of_interest = ['US','China','Italy','Spain','Iran','France','Korea, South','Japan','United Kingdom']
  # final = deaths_agg[deaths_agg['Country/Region'].isin(countries_of_interest)]

  #FLAG DATES WHERE DEATHS >= 10 AND COUNT DAYS FROM THE FIRST SUCH DATE (THAT DATE IS DAY 1)
  final=deaths_agg
  final['d10'] = np.where(final['deaths']>= 10,1,0)
  final['days_since_10']=final.groupby(by=['Country/Region'])['d10'].cumsum()

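  #LIMIT TO ONLY FIRST 35 DAYS SINCE 10 DEATHS (ROWS BEFORE THE 10TH DEATH HAVE days_since_10 == 0 AND ARE KEPT)
  #COPY SO THE COLUMN ADDED BELOW IS NOT WRITTEN TO A SLICE OF final
  final_use = final[final['days_since_10']<=35].copy()

  final_use['date_only'] = final_use['date'].dt.date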
  print("Data from John's Hopkins Updated as of: {}".format(np.max(final_use['date_only'])))
  return final_use
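
The reshaping in `prep_data()` is easier to follow on a tiny table. The sketch below is an illustration only: the frame `wide` and its values are made up, and the names `long_df` and `agg` are hypothetical. It mirrors the same steps (melt to long format, sum per country and date, then count days on which cumulative deaths are at least 10).

```python
import numpy as np
import pandas as pd

# Toy wide-format table shaped like the JHU file (values are made up for illustration)
wide = pd.DataFrame({
    'Province/State': [np.nan, np.nan],
    'Country/Region': ['A', 'B'],
    'Lat': [0.0, 1.0],
    'Long': [0.0, 1.0],
    '3/1/20': [4, 0],
    '3/2/20': [12, 3],
    '3/3/20': [30, 11],
})

# Unpivot: one row per (country, date) instead of one column per date
long_df = pd.melt(wide, id_vars=['Province/State', 'Country/Region', 'Lat', 'Long'],
                  var_name='date', value_name='deaths')
long_df['date'] = pd.to_datetime(long_df['date'], format='%m/%d/%y')

# Aggregate to the country level
agg = long_df.groupby(['Country/Region', 'date'], as_index=False)['deaths'].sum()

# Flag days at or above 10 deaths, then take a running count of flagged days per country
agg['d10'] = np.where(agg['deaths'] >= 10, 1, 0)
agg['days_since_10'] = agg.groupby('Country/Region')['d10'].cumsum()

print(agg[['Country/Region', 'date', 'deaths', 'days_since_10']])
```

Days before a country reaches 10 deaths all get `days_since_10 == 0`, so they share the same x position when plotted.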

In [0]:
def plot_line_charts(chart_type,countries):
  '''Thanks to @benbbaldwin for making these charts actually look good.'''
  #PLOT USING PLOTLY, LOOP OVER COUNTRIES TO GET ANNOTATION POINTS

  df = prep_data()
  fig = go.Figure()
  annotations = []
  i = 0

  #LIMIT TO COUNTRIES OF INTEREST
  #countries_of_interest = ['US','China','Italy','Spain','Iran','France','Korea, South','Japan','United Kingdom']
  final = df[df['Country/Region'].isin(countries)]
  final['log_deaths'] = np.log(final['deaths'])

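  for c in countries:
      #CYCLE THROUGH THE PALETTE SO MORE THAN 10 COUNTRIES DO NOT RAISE AN IndexError
      color = px.colors.qualitative.G10[i % len(px.colors.qualitative.G10)]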

      if chart_type == 'log':
        #on a log axis, plotly positions annotations in log10 units
        y_annot = math.log10(final.loc[final['Country/Region'] == c, 'deaths'].values[-1])
      else:
        y_annot = final.loc[final['Country/Region'] == c, 'deaths'].values[-1]
      
      #print(y_annot)
      annotations.append(go.layout.Annotation(
          x = final[final['Country/Region']=='{}'.format(c)].iloc[-1]['days_since_10'],
          y = y_annot,
          text=c,
          font=dict(
              family="Franklin Gothic",
              size=16,
              color="black"
              ))
          )

      #subset this country's rows for its trace
      tmp = final[final['Country/Region'] == c]
      x1 = tmp['days_since_10']
      y1 = tmp['deaths']

      #this lets us control the lines
      fig.add_trace(go.Scatter(x=x1, y=y1, name = c, mode='lines+markers', line=dict(color = color, width=3)))
      i += 1

  fig.update(layout=go.Layout(showlegend=False, annotations=annotations, height=1000, width=2000))
  fig.update_layout(
      title={
        'text': "COVID-19 tracker: Deaths since 10th death | {} scale".format(chart_type),
        'y':0.9,
        'x':0.5,
        'xanchor': 'center',
        'yanchor': 'top'},
      xaxis_title="Days since 10th death"
  )

  #NAMED latest TO AVOID SHADOWING THE BUILT-IN max()
  latest = np.max(final['date'])
  note = 'Updated ' + str(latest.month) + '-' + str(latest.day)


  fig.add_annotation(
          x=0.1,
          y=0.89,
          xref="paper",
          yref="paper",
          text=note,
          showarrow=False,
          font=dict(
              family="Courier New, monospace",
              size=16,
              color="red"
              ),
          align="center",
          ax=20,
          ay=-30,
          bordercolor="black",
          borderwidth=2,
          borderpad=4,
          bgcolor="white",
          opacity=0.8
          )

  fig.add_annotation(
      x=0.1,
      y=0.8,
      xref="paper",
      yref="paper",
      text="Source: JHU",
      showarrow=False,
      font=dict(
        family="Courier New, monospace",
        size=16,
        color="red"
        ),
  )


  #print(fig.layout)
  #ONLY THE LAST TEMPLATE ASSIGNMENT TAKES EFFECT, SO SET IT ONCE
  fig.layout.template = 'presentation'

  fig.update_layout(yaxis_type=chart_type)
  fig.show()

There has been much [discussion](https://twitter.com/stat_sam/status/1243693482516131840) about using log vs. linear scales in these COVID infection and death charts. Many people, even those trained in statistics, have to make a conscious effort to read a log scale. The chart-building function lets you visualize the JHU data on either scale by passing "linear" or "log" as its first argument.

In [9]:
plot_line_charts("linear",countries_of_interest)

Data from John's Hopkins Updated as of: 2020-04-17


In [10]:
plot_line_charts("log",countries_of_interest)

Data from John's Hopkins Updated as of: 2020-04-17
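
One way to build intuition for the log scale discussed above is to compare day-over-day steps on both scales. The sketch below uses a hypothetical series that doubles every day (not real data): the linear steps keep growing, while the log10 steps are constant, which is why steady exponential growth appears as a straight line on a log axis.

```python
import math

# Hypothetical series that doubles every day, starting at 10 (not real data)
deaths = [10 * 2 ** day for day in range(6)]

# Day-over-day change on the linear scale grows each day
linear_steps = [b - a for a, b in zip(deaths, deaths[1:])]

# On the log10 scale the change is the same every day, namely log10(2)
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(deaths, deaths[1:])]

print(linear_steps)
print([round(s, 3) for s in log_steps])
```

A slower doubling time would give smaller constant steps, so the slope of a line on the log chart reflects the growth rate.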


In [0]:
def build_map(countries):
  '''Build a map of the data centered on each country's Lat/Long.
     Currently this shows only the highest death count that prep_data() returns for each country "of interest"
     (prep_data() keeps the first 35 days since the 10th death); I am working on animating this over time.
  '''
  #GET LAT AND LONG TO JOIN ON
  df=prep_data()
  deaths=pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
  locs = deaths[(deaths['Country/Region'].isin(countries)) & deaths['Province/State'].isin(['Hubei',np.nan])][['Country/Region','Lat','Long']]
  totals = pd.DataFrame(df.groupby(['Country/Region'])['deaths'].max()).reset_index()
  totals_map = locs.merge(totals,on='Country/Region',how='inner')

  # MAP
  token='pk.eyJ1IjoiamV6bGF4IiwiYSI6ImNrOGF4NmVsczAydDgzZW8yMDVsODA3d2IifQ.AUEyPjoEVaZd9zwGKnSzgg'
  px.set_mapbox_access_token(token)
  fig = px.scatter_mapbox(totals_map, lat="Lat", lon="Long", color="deaths", size="deaths",
                    color_continuous_scale=px.colors.cyclical.IceFire, size_max=50, zoom=2)
  fig.update(layout=dict(height=1000, width=1700))
  fig.show()

In [12]:
build_map(countries=countries_of_interest)

Data from John's Hopkins Updated as of: 2020-04-17
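
The location lookup in `build_map()` relies on country-level rows having an empty `Province/State`. The sketch below illustrates that selection and the join on a made-up table. The values and the 'Territory X' row are hypothetical, and `isna()` is used here as a simpler stand-in for the function's `isin(['Hubei', np.nan])` filter, so it is not an exact replacement when Hubei is needed.

```python
import numpy as np
import pandas as pd

# Toy location table (made-up values): country 'A' has a territory row and a country-level row
deaths = pd.DataFrame({
    'Province/State': ['Territory X', np.nan, np.nan],
    'Country/Region': ['A', 'A', 'B'],
    'Lat': [10.0, 50.0, 40.0],
    'Long': [-60.0, 5.0, 15.0],
})

# Toy per-country totals (made-up values)
totals = pd.DataFrame({'Country/Region': ['A', 'B'], 'deaths': [120, 45]})

# Keep only country-level rows, i.e. those with no Province/State
locs = deaths.loc[deaths['Province/State'].isna(), ['Country/Region', 'Lat', 'Long']]

# Attach the totals to the coordinates; inner join drops countries missing from either side
totals_map = locs.merge(totals, on='Country/Region', how='inner')
print(totals_map)
```

Each country ends up with a single coordinate pair, so the map draws one marker per country.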
