<a href="https://colab.research.google.com/github/InTEGr8or/jupyter-fun/blob/master/coronavirus2019_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Infection Rates per day


## Purpose and Source

[This code is available on GitHub](https://github.com/InTEGr8or/jupyter-fun/blob/master/nCov19.ipynb)

The percent rates in the sheet at the bottom are approximations. Data seems to be released about twice a day, but sometimes there is a lag and the release times vary.

Ideally, the time of each update would be taken into account, with the change prorated per hour and multiplied by the number of hours between updates. I am just starting to learn Python, so there are probably a lot of improvements that could be made.
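
The proration idea above could be sketched roughly as follows. This is only an illustration, not what the notebook currently does: the function name `prorated_daily_change` and the sample numbers are made up, and it assumes the change between two snapshots is spread evenly over the hours between them.

```python
from datetime import datetime


def prorated_daily_change(prev_count, prev_time, curr_count, curr_time):
    """Return the change between two snapshots, scaled linearly to 24 hours.

    Hypothetical helper: assumes growth is spread evenly across the gap.
    """
    hours = (curr_time - prev_time).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("curr_time must be later than prev_time")
    per_hour = (curr_count - prev_count) / hours
    return per_hour * 24


# Made-up numbers: 100 new cases reported across a 10-hour gap.
change = prorated_daily_change(
    1000, datetime(2020, 2, 15, 8, 0),
    1100, datetime(2020, 2, 15, 18, 0),
)
print(change)  # 240.0
```

Scaling to a fixed 24-hour window makes updates released at irregular times comparable with each other. Multiplying `per_hour` by any other number of hours works the same way.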

Data is collected by Johns Hopkins in Baltimore and published here: [nCov19 contagion](https://gisanddata.maps.arcgis.com/apps/opsdashboard/index.html#/bda7594740fd40299423467b48e9ecf6)

Here is another excellent data presentation at [WorldOMeters.info](https://www.worldometers.info/coronavirus/#repro)

Now a [Time Series Table](https://docs.google.com/spreadsheets/u/1/d/1UF2pSkFTURko2OvfHWWlFpDFAr1UxCBA4JLwlSP6KFo/htmlview?usp=sharing&sle=true) is available, and a [Feature Layer](https://gisanddata.maps.arcgis.com/home/item.html?id=c0b356e20b30490c8b8b4c7bb9554e7c) also appears to be available, but it requires authentication.

The single-sheet Time Series is a much cleaner data source and _doesn't require repeated reauthentication_, so I'm reworking the notebook to use it. There is no percent change right now, until I figure out how to use Pandas properly.
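
One possible way to get a day-over-day percent change out of pandas is `DataFrame.pct_change`. The sketch below uses made-up counts in the same wide layout as the time series sheet (one row per place, one column per date). The frame name `wide` and the place names are placeholders, not part of the real data.

```python
import pandas as pd

# Made-up cumulative counts: one row per place, one column per date.
wide = pd.DataFrame(
    {"1/22/20": [1, 2], "1/23/20": [2, 2], "1/24/20": [3, 4]},
    index=["Place A", "Place B"],
)

# Transpose so dates run down the rows, take the day-over-day percent change,
# then transpose back to the original orientation.
pct = wide.T.pct_change().T * 100

print(pct.round(1))
```

The first date column comes out as `NaN` because there is no earlier day to compare against. A count that rises from zero would come out as `inf`, so both cases need a decision about how to display them.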

## This first section sets up the imports and some parsing functions.


In [0]:
try:
  from bs4 import BeautifulSoup
except ImportError:
  !pip install beautifulsoup4
  from bs4 import BeautifulSoup
  
import requests
import numpy as np
from dateutil import parser
from datetime import datetime
import pandas as pd

repo = "https://github.com/CSSEGISandData/COVID-19"

tsc_csv = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Confirmed.csv"
tsm_csv = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Deaths.csv"
tsr_csv = "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_19-covid-Recovered.csv"

def day_tot(day):
  return f"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/{day.strftime('%m-%d-%Y')}.csv"

pd.set_option('display.max_rows', 200)

states = {
  'AK': 'Alaska', 'AL': 'Alabama', 'AR': 'Arkansas', 'AS': 'American Samoa',
  'AZ': 'Arizona', 'CA': 'California', 'CO': 'Colorado', 'CT': 'Connecticut',
  'DC': 'District of Columbia', 'DE': 'Delaware', 'FL': 'Florida', 'GA': 'Georgia',
  'GU': 'Guam', 'HI': 'Hawaii', 'IA': 'Iowa', 'ID': 'Idaho', 'IL': 'Illinois',
  'IN': 'Indiana', 'KS': 'Kansas', 'KY': 'Kentucky', 'LA': 'Louisiana', 'MA': 'Massachusetts',
  'MD': 'Maryland', 'ME': 'Maine', 'MI': 'Michigan', 'MN': 'Minnesota',
  'MO': 'Missouri', 'MP': 'Northern Mariana Islands', 'MS': 'Mississippi',
  'MT': 'Montana', 'NA': 'National', 'NC': 'North Carolina', 'ND': 'North Dakota',
  'NE': 'Nebraska', 'NH': 'New Hampshire', 'NJ': 'New Jersey', 'NM': 'New Mexico',
  'NV': 'Nevada', 'NY': 'New York', 'OH': 'Ohio', 'OK': 'Oklahoma', 'OR': 'Oregon',
  'PA': 'Pennsylvania', 'PR': 'Puerto Rico', 'RI': 'Rhode Island', 'SC': 'South Carolina',
  'SD': 'South Dakota', 'TN': 'Tennessee', 'TX': 'Texas', 'UT': 'Utah',
  'VA': 'Virginia', 'VI': 'Virgin Islands', 'VT': 'Vermont', 'WA': 'Washington',
  'WI': 'Wisconsin', 'WV': 'West Virginia', 'WY': 'Wyoming', 'AB': 'Alberta',
  'BC': 'British Columbia', 'MB': 'Manitoba', 'NB': 'New Brunswick',
  'NL': 'Newfoundland and Labrador', 'NT': 'Northwest Territories',
  'NS': 'Nova Scotia', 'NU': 'Nunavut', 'ON': 'Ontario', 'PE': 'Prince Edward Island',
  'QC': 'Quebec', 'SK': 'Saskatchewan', 'YT': 'Yukon'
}

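def cdate(date):
  # Format as M/D/YY without zero padding, matching the CSV column names.
  # (The '%-m' strftime flag is not supported on every platform, e.g. Windows.)
  return f"{date.month}/{date.day}/{date:%y}"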

def datecols(df):
  return [col for col in df.columns if is_date(col)]

def update_firsts(df, firsts_col):
  # Record, per row, the date of the first non-zero count (e.g. the first death).
  df[firsts_col] = ''
  dates = sorted(parser.parse(col) for col in df.columns if is_date(col))
  for i, row in df.iterrows():
    for date in dates:
      try:
        if row[cdate(date)] > 0:
          df.at[i, firsts_col] = date.strftime('%Y-%m-%d')
          break  # dates are sorted, so the first hit is the earliest
      except Exception as e:
        print(f"### ERROR update_firsts(df, {firsts_col}):\n", cdate(date), row, e)

def is_date(text):
  # True when dateutil can parse the text as a date.
  try:
    parser.parse(text)
    return True
  except (TypeError, ValueError, OverflowError):
    return False

def color_negative_red(val):
    """
    Takes a percent string such as '25%' and returns a CSS `color`
    property whose red channel grows with the value, so larger
    increases render a brighter red. The channel is clamped to 0-255,
    which keeps zero and negative changes dark.
    """
    heat = max(0, min(int(val.replace('%', '')) * 10 + 56, 255))
    return f'color: #{heat:02X}5555'
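
The row-by-row search in `update_firsts` could also be written with vectorized pandas operations. The sketch below is an alternative on made-up data, not the notebook's method. The helper name `first_nonzero_date` is hypothetical, and it assumes the frame holds only the date columns, ordered oldest to newest.

```python
import pandas as pd


def first_nonzero_date(counts):
    """For each row, return the label of the first column whose value is > 0.

    Assumes `counts` holds only date columns, ordered oldest to newest.
    Rows that never rise above zero get an empty string.
    """
    positive = counts.gt(0)
    # idxmax returns the first True per row (or the first column if none is True).
    first = positive.idxmax(axis=1)
    # Blank out the rows that had no positive value at all.
    return first.where(positive.any(axis=1), '')


counts = pd.DataFrame(
    {"1/22/20": [0, 1, 0], "1/23/20": [0, 1, 2], "1/24/20": [0, 3, 2]},
    index=["Place A", "Place B", "Place C"],
)
print(first_nonzero_date(counts).tolist())  # ['', '1/22/20', '1/23/20']
```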

## Get Data, Loop Through and Print

This section is massively simplified now that Johns Hopkins has tidied up their data sources. (Thanks, Johns Hopkins tech people! _Way_ better.)

I'm sure they have a lot of tasks on their hands, and it's nice to see that they could go back and rework their first pass at organizing the incoming data.

In [6]:
df = pd.read_csv(tsc_csv).drop(columns=['Lat', 'Long'])
dates = datecols(df)
today_col = dates[-1]
print(today_col)
dates.reverse()

dfr = pd.read_csv(tsr_csv)
dfm = pd.read_csv(tsm_csv).drop(columns=['Lat', 'Long'])
# Shorten the column names and normalise a few country labels in both frames.
renames = {'Country/Region': 'Country', 'Province/State': 'State'}
countries = {'Mainland China': 'China', 'United Arab Emirates': 'UAE'}
df.rename(columns=renames, inplace=True)
dfm.rename(columns=renames, inplace=True)
df['Country'] = df['Country'].replace(countries)
dfm['Country'] = dfm['Country'].replace(countries)
mdates = datecols(dfm)

update_firsts(df, 'First Confirmed')
update_firsts(dfm, 'First Death')

dfm.set_index(['Country', 'State'], inplace=True)
# NaN values are filled column by column below, where each column is used.

df.set_index(['Country', 'State'], inplace=True)
df['Death Toll'] = dfm[today_col].fillna(0).astype(int)
df['First Death'] = dfm[dfm.columns[-1]].fillna('')
# Rows without a recorded death parse to NaT, which is rendered as an empty string.
first_death = pd.to_datetime(df['First Death'], errors='coerce')
df['Death Aging'] = (datetime.now() - first_death).apply(lambda days: ' '.join(str(days).replace('NaT', '').split(' ')[:2]))
df.drop(columns=['First Death'], inplace=True)

df['Confirmed Aging'] = datetime.now() - df['First Confirmed'].apply(lambda date: parser.parse(date, fuzzy=True))
df['Confirmed Aging'] = df['Confirmed Aging'].apply(lambda days: ' '.join(str(days).split(' ')[:2]))
df.drop(columns=['First Confirmed'], inplace=True)

# df.sort_values(by=['Country','State'], ascending=[True, True], inplace=True)
# today = dates[0]
# mtoday = mdates[-1]
# print("Latest sheet dates:", today, mtoday)
# for i, row in df.iterrows():
#   country = row['Country']
#   state = row['State']
#   locality = ''
#   ### They've cleaned up their join columns a bit.
#   # if ',' in state:
#   #   locality = state.split(',')[0].strip()
#   #   state = states[state.split(',')[1].strip()]
#   if state == '' or state == 'NaN':
#     try:
#       mrow = dfm.loc[country, mtoday]
#       if 'Phil' in country:
#         # print('Phil:', country, type(country))
#         # print('Phil', mrow[0])
#         mrow = mrow[0]
#     except:
#       mrow = 0
#   else:
#     try:
#       mrow = dfm.loc[country, :].loc[state, mtoday]
#     except:
#       print("ERROR:", country, state, type(state))
#       mrow = 0
#   if isinstance(mrow, (int, float, complex)) and not isinstance(mrow, bool):
#     if mrow > 0:
#       print(row['Country'], row['State'], mrow)
#       df.at[i, 'Death Toll'] = mrow
# print(dfm[mtoday])
# # Remove early results
# for i, date in enumerate(dates):
#   if i < len(dates) -1 and parser.parse(date).day == parser.parse(dates[i + 1]).day:
#     del dates[i + 1]
# # Append to columns
# Build a short-named count column (e.g. "Feb16") for every date and, for all
# but the earliest date, a day-over-day percent change column (e.g. "Feb16%").
percents = []
rev_dates = sorted([parser.parse(date) for date in dates], reverse=True)
for i, d in enumerate(rev_dates):
  date = cdate(d)
  col = d.strftime('%B')[:3] + d.strftime('%d')

  df[col] = df[date].replace(np.inf, 0).fillna(0).astype(int)
  if i < len(rev_dates) - 1:
    prev = cdate(rev_dates[i + 1])  # the previous calendar day
    pct_col = col + '%'
    percents.append(pct_col)
    # Growth from zero divides by zero (inf) and 0/0 gives NaN; both are shown as 0%.
    change = (df[date] / df[prev].fillna(0) * 100 - 100).round()
    df[pct_col] = change.replace([np.inf, -np.inf], 0).fillna(0).astype(int).astype(str) + '%'
df.drop(columns=dates, inplace=True)

2/16/20


## Render

In [7]:
# df.set_index('Country', inplace=True)
df.sort_values(by=['Country','State'], ascending=[True, True]).style.set_table_styles(
    [{
        'selector': 'tr:hover', 'props': [('background-color', 'darkblue')]
      },
     {
         'selector': 'td', 'props': [('background-color', 'black'), ('border', 'black')]
     }]
).applymap(color_negative_red, subset=percents)

Country,State,Death Toll,Death Aging,Confirmed Aging,Feb16,Feb16%,Feb15,Feb15%,Feb14,Feb14%,Feb13,Feb13%,Feb12,Feb12%,Feb11,Feb11%,Feb10,Feb10%,Feb09,Feb09%,Feb08,Feb08%,Feb07,Feb07%,Feb06,Feb06%,Feb05,Feb05%,Feb04,Feb04%,Feb03,Feb03%,Feb02,Feb02%,Feb01,Feb01%,Jan31,Jan31%,Jan30,Jan30%,Jan29,Jan29%,Jan28,Jan28%,Jan27,Jan27%,Jan26,Jan26%,Jan25,Jan25%,Jan24,Jan24%,Jan23,Jan23%,Jan22
Australia,New South Wales,0,,22 days,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,33%,3,0%,0,0%,0,0%,0,0%,0
Australia,Queensland,0,,19 days,5,0%,5,0%,5,0%,5,0%,5,0%,5,0%,5,0%,5,0%,5,0%,5,25%,4,33%,3,0%,3,50%,2,0%,2,-33%,3,50%,2,-33%,3,200%,1,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0
Australia,South Australia,0,,16 days,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,100%,1,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0
Australia,Victoria,0,,22 days,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,33%,3,50%,2,100%,1,0%,1,0%,1,0%,1,0%,0,0%,0,0%,0,0%,0
Belgium,,0,,13 days,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0
Cambodia,,0,,21 days,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,0,0%,0,0%,0,0%,0,0%,0
Canada,British Columbia,0,,20 days,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,0%,4,100%,2,0%,2,100%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0
Canada,"London, ON",0,,17 days,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,1,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0,0%,0
Canada,"Toronto, ON",0,,22 days,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,0%,2,100%,1,0%,1,0%,1,0%,1,0%,1,0%,0,0%,0,0%,0,0%,0
China,Anhui,6,8 days,26 days,962,1%,950,2%,934,3%,910,2%,889,3%,860,4%,830,7%,779,6%,733,10%,665,13%,591,12%,530,10%,480,18%,408,20%,340,14%,297,25%,237,18%,200,32%,152,43%,106,51%,70,17%,60,54%,39,160%,15,67%,9,800%,1
