<h1 class="alert alert-block alert-info" style="text-align:center; font-size:30px">Daily COVID-19 Interactive Plots</h1>

<h1> What is the COVID-19/Coronavirus? </h1>

* According to the World Health Organization (WHO), "coronavirus disease (COVID-19) is an infectious disease caused by a newly discovered coronavirus. Most people infected with the COVID-19 virus will experience mild to moderate respiratory illness and recover without requiring special treatment. Older people, and those with underlying medical problems like cardiovascular disease, diabetes, chronic respiratory disease, and cancer are more likely to develop serious illness."
* Continuing, "the best way to prevent and slow down transmission is to be well informed about the COVID-19 virus, the disease it causes and how it spreads. Protect yourself and others from infection by washing your hands or using an alcohol based rub frequently and not touching your face."
* Stay informed about COVID-19. Practice social distancing and healthy practices (washing hands). Educate your friends and family on the dangers of COVID-19. 
* Yes, there's currently a vaccine out, but the vaccine will take quite a while to distribute to everyone. Thus, I encourage everyone reading to be patient about COVID so we can unite together and minimize the catastrophic damage the coronavirus has inflicted.

<h1> Aims of this Project </h1>

* This project aims to show COVID-19 data in a **robust** manner through **easy-to-digest** visualizations, so you can stay aware of the current state of COVID-19 at your fingertips.
* This notebook shows the state of the coronavirus for the **previous** day. It does not look at past/moving trends, like <a href="https://www.kaggle.com/therealcyberlord/coronavirus-covid-19-visualization-prediction"> this COVID-19 notebook</a>, but instead visualizes what happened with COVID yesterday.
* To obtain the real-time COVID-19 data, I use a small web scraping script and the <a href="https://www.worldometers.info/coronavirus/"> worldometers website</a>. A distinct advantage of doing so is that the data used in this notebook does **not** have to be updated externally; instead, the dataset is created right here in this notebook.
* This project is very much a work in progress, so if anyone has suggestions or would even like to **code visualizations that are particularly meaningful**, please share them with me so I can improve and/or add new visualizations to this notebook. Feel free to also fork this notebook to play around with the data or just to learn something new.

For more information on COVID-19, please check out this <a href="https://www.who.int/news-room/q-a-detail/coronavirus-disease-covid-19"> World Health Organization FAQ</a>. It is a fantastic resource to educate yourself.

<p style="font-size: 20px"> <b> I will be running this notebook every day</b> until we truly "flatten the curve". </p>

<h2> Table of Contents </h2>

<ol style="font-size: 16px">
    <li> <a href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#gather">Gathering the Data</a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#general">General Visualizations </a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#continent">By Continent </a> </li>
    <li> <a href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#countries"> By Country </a> </li>
    <li style="color: green"> <a style="color: green" href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#country-search">⭐Interactive Search for Stats by Your Country </a> </li>
</ol>

<h2> Updates </h2>
<div style="background: #ffcccb">
<p>Update 1: Special thanks to @JR12DER for catching a mistake with the interactive graphs. They are now fixed.</p>
<p>Update 2: Much of the code has been reworked and is now primarily written in plotly. This should make the other graphs easier to read and interact with.</p>
</div>

# TL;DR, why should I care?

In [1]:
from bs4 import BeautifulSoup as soup
from urllib.request import Request, urlopen
from datetime import date, datetime
fname = 'https://www.worldometers.info/coronavirus/'
req = Request(fname, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)
page_soup = soup(webpage, "html.parser")

In [2]:
from datetime import timezone
today = datetime.now(timezone.utc)  # the message below reports UTC, so request UTC explicitly
now_str = "%s %d, %d at %d:%02d" % (today.strftime("%b"), today.day, today.year, today.hour, today.minute)  # %02d zero-pads the minute
containers = page_soup.findAll("div", {"class": "maincounter-number"})
print("As of %s UTC, there have been %s total COVID-19 cases." % (now_str, containers[0].findAll("span")[0].text.replace(' ', '')))

As of Mar 18, 2021 at 6:13 UTC, there have been 121,823,305 total COVID-19 cases.


Hopefully now you think you should care. Crazy numbers, right? Before we get started, please <span style="font-size: 20px; color: green; font-weight: bold"> leave an upvote </span> if you think this notebook is a valuable resource. I would love it if this could become a community project, and upvoting helps with the publicity. Now, without further ado, let's get started!

<h1 class="alert alert-block alert-info" style="text-align:center; font-size:24px" id="gather">Gathering the Data (Web Scraping)<a class="anchor-link" href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#gather">¶</a></h1>

<h1> Essential Imports </h1>

Here's an updated template of imports that I use. Note the two integer variables at the bottom of this code block, ```LOOK_AT``` and ```AT_LEAST```. ```LOOK_AT``` controls how many bars the user can see in the bar graph and ```AT_LEAST``` controls what rank a country must be in terms of total cases to be shown on the bar graph. That should become more clear in the visualizations towards the end of the notebook.

In [3]:
#-----General------#
import numpy as np
import pandas as pd
import os
import sys
import math
import random

#-----Plotting-----#
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import plotly.offline as py
py.init_notebook_mode(connected=True)
import seaborn as sns
from pandas_profiling import ProfileReport

#-----Utility-----#
import itertools
import warnings
warnings.filterwarnings("ignore")
import re
import gc
from bs4 import BeautifulSoup as soup
from urllib.request import Request, urlopen
from datetime import date, datetime

LOOK_AT = 5 # Controls how many bars the user can see in the bar graph
AT_LEAST = 50 # Controls what rank a country must be in terms of total cases to be shown on the bar graph
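
To make the role of the two constants more concrete, here is a minimal sketch on made-up numbers. The `toy` frame and its values are hypothetical, not scraped data, and the sketch assumes the rows are already ordered by total cases so that the index doubles as the rank.

```python
import pandas as pd

LOOK_AT = 2   # how many bars to draw
AT_LEAST = 3  # only consider countries ranked in the top 3 by total cases

# Hypothetical data, already sorted by total cases (rank 1 = most cases).
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Total Cases": [500, 400, 300, 200, 100],
    "%Inc Cases": [0.1, 0.9, 0.5, 2.0, 3.0],
})
toy.index = range(1, len(toy) + 1)  # index 1..5 acts as the rank

# Keep only sufficiently high-ranked countries, then take the fastest growers.
eligible = toy[toy.index <= AT_LEAST]
selected = eligible.sort_values("%Inc Cases", ascending=False).head(LOOK_AT)
print(selected["Country"].tolist())
```

Countries D and E have the largest percentage increases, but they are ranked below `AT_LEAST`, so they are never considered for the chart.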

<h1> Web Scraping Foundation </h1>

In [4]:
fname = 'https://www.worldometers.info/coronavirus/#countries'
req = Request(fname, headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req)
page_soup = soup(webpage, "html.parser")
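from datetime import timedelta  # needed for the previous-day calculation below
today = datetime.now()
yesterday = today - timedelta(days=1)  # handles month and year boundaries, unlike today.day-1
today_str = "%s %d, %d" % (today.strftime("%b"), today.day, today.year)
yesterday_str = "%s %d, %d" % (yesterday.strftime("%b"), yesterday.day, yesterday.year)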
clean = True

<img src="https://lh3.googleusercontent.com/_s7TukxfF1SFL1S1lMvz7Gd95jQAqoT6zKTiX7wnK7uwCM7Z7PgyUW5s3A4vDmlobHRBqQ=s1000">

> Screenshot of HTML for the website we are scraping.

In [None]:
print("This version of the notebook is being run on %s." % today_str)

<h1> Scraping Script </h1>

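If clean is set to <span style="color: blue"> True</span>, the thousands separators are stripped from the numerical cells, and any value carrying a `+` or `-` sign is converted from a string to a float. The remaining numeric strings are converted later, in the final processing step. We drop China from our analysis because its row is positioned inconsistently in the scraped data.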

In [5]:
table = page_soup.findAll("table", {"id": "main_table_countries_yesterday"})
containers = table[0].findAll("tr", {"style": ""})
del containers[0]

all_data = []
for country in containers:
    country_data = []
    country_container = country.findAll("td")
    if country_container[1].text == 'China':
        continue
    for i in range(1, len(country_container)):
        final_feature = country_container[i].text
        if clean:
            if i != 1 and i != len(country_container)-1:
                final_feature = final_feature.replace(',', '')
                if final_feature.find('+') != -1:
                    final_feature = final_feature.replace('+', '')
                    final_feature = float(final_feature)
                elif final_feature.find('-') != -1:
                    final_feature = final_feature.replace('-', '')
                    final_feature = float(final_feature)*-1
        if final_feature == 'N/A':
            final_feature = 0
        elif final_feature == '' or final_feature == ' ':
            final_feature = -1 #None
        country_data.append(final_feature)
    all_data.append(country_data)
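
The per-cell cleaning in the loop above can also be pulled into a small helper, which makes it easier to test on its own. The following is a minimal sketch, not the notebook's actual code: `clean_number` is a hypothetical name, and it assumes the same conventions as the loop (`'N/A'` becomes 0 and an empty cell becomes the placeholder -1).

```python
def clean_number(text):
    """Convert a scraped cell such as '+1,288' or 'N/A' to a float."""
    text = text.strip()
    if text == "N/A":
        return 0.0   # same convention as the scraping loop
    if text == "":
        return -1.0  # placeholder for a missing value
    # float() already accepts a leading '+' or '-', so only commas need removing.
    return float(text.replace(",", ""))


for raw in ["30,294,798", "+1,288", "-7", "N/A", " "]:
    print(repr(raw), "->", clean_number(raw))
```

Unlike the loop, this helper converts every numeric cell right away, so it should only be applied to the numeric columns and not to the country or continent names.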

In [6]:
df = pd.DataFrame(all_data)
df = df.drop([15, 16, 17], axis=1) # Get rid of unnecessary data

On the <a href="https://www.worldometers.info/coronavirus/"> worldometers website</a>, the category "New Recovered" doesn't appear; however, based on the numbers, we can infer that one of the scraped columns holds that value.

In [7]:
column_labels = ["Country", "Total Cases", "New Cases", "Total Deaths", "New Deaths", "Total Recovered", "New Recovered", "Active Cases", "Serious/Critical",
                "Tot Cases/1M", "Deaths/1M", "Total Tests", "Tests/1M", "Population", "Continent"]
df.columns = column_labels

<h1> What Countries are not present in the Analysis? </h1>

For some reason, there are some countries that are not included when scraping the webpage.

In [8]:
country_labels = page_soup.findAll("a", {"class": "mt_a"})
c_label = []
for country in country_labels:
    c_label.append(country.text)
c_label = set(c_label)

not_counted = []
sorted_countries = set(df['Country']) #Increase computational speed
for country in c_label:
    if country not in sorted_countries:
        not_counted.append(country)
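
The comparison above boils down to a set difference. As a minimal sketch with hypothetical names (the two sets below are invented for illustration, not scraped):

```python
# Hypothetical country names found in the page's links.
listed = {"USA", "China", "Greenland", "Brazil"}
# Hypothetical country names that made it into the DataFrame.
scraped = {"USA", "Brazil"}

# Everything that is listed on the page but absent from the scraped table.
missing = sorted(listed - scraped)
print(missing)  # ['China', 'Greenland']
```

Sorting the result gives a stable order, since plain set iteration order can change between runs.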

In [9]:
print(not_counted)  # China was skipped while scraping, so it is already in this list

['Montserrat', 'Tajikistan', 'Marshall Islands', 'Saint Pierre Miquelon', 'Falkland Islands', 'Micronesia', 'Greenland', 'China']


<h1> Final Processing </h1>

Here, we will convert all the numerical data into numeric dtypes (`pd.to_numeric` picks int64 or float64 per column), and add some other features that may be particularly useful.

In [10]:
for label in df.columns:
    if label != 'Country' and label != 'Continent':
        df[label] = pd.to_numeric(df[label])

In [11]:
df['%Inc Cases'] = df['New Cases']/df['Total Cases']*100
df['%Inc Deaths'] = df['New Deaths']/df['Total Deaths']*100
df['%Inc Recovered'] = df['New Recovered']/df['Total Recovered']*100
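
One caveat with these ratios: the scraper stores missing cells as -1 and `'N/A'` as 0, so dividing the raw columns can produce misleading percentages or a division by zero. Below is a minimal sketch of one way to guard against that. It uses hypothetical data, and `pct_increase` is an illustrative helper that is not part of the notebook. Treating -1 as "missing" is itself an assumption, because a genuine correction of -1 would look the same.

```python
import pandas as pd

toy = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "New Deaths": [5.0, -1.0, 2.0],       # -1 is the scraper's missing-value placeholder
    "Total Deaths": [100.0, 50.0, 0.0],   # 0 would cause a division by zero
})


def pct_increase(new, total):
    """Percentage increase that is NaN wherever the inputs are unusable."""
    new = new.where(new != -1)       # placeholder -> NaN
    total = total.where(total > 0)   # zero or placeholder -> NaN
    return new / total * 100


toy["%Inc Deaths"] = pct_increase(toy["New Deaths"], toy["Total Deaths"])
print(toy)
```

Plotting libraries generally skip NaN values, so unusable rows drop out of the charts instead of distorting them.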

In [12]:
pd.options.display.max_rows = None
df

Unnamed: 0,Country,Total Cases,New Cases,Total Deaths,New Deaths,Total Recovered,New Recovered,Active Cases,Serious/Critical,Tot Cases/1M,Deaths/1M,Total Tests,Tests/1M,Population,Continent,%Inc Cases,%Inc Deaths,%Inc Recovered
0,World,121803968,528845.0,2691812,9712.0,98198310,397934.0,20913846.0,88871,15626.0,345.3,-1,-1,-1,All,0.434177,0.360798,0.405235
1,USA,30294798,62794.0,550649,1288.0,22447275,88740.0,7296874.0,9271,91145.0,1657.0,384014761,1155354,332378543,North America,0.207277,0.233906,0.395326
2,Brazil,11700431,90830.0,285136,2736.0,10287057,82516.0,1128238.0,8318,54770.0,1335.0,28600000,133876,213629898,South America,0.776296,0.959542,0.802134
3,India,11474302,35838.0,159250,171.0,11061170,17793.0,253882.0,8944,8257.0,115.0,229249784,164975,1389604984,Asia,0.312333,0.107378,0.16086
4,Russia,4418436,8998.0,93364,427.0,4024975,10755.0,300097.0,2300,30268.0,640.0,116000000,794635,145978921,Europe,0.203647,0.45735,0.267207
5,UK,4274579,5758.0,125831,141.0,3568271,19540.0,580477.0,968,62733.0,1847.0,107731965,1581065,68138862,Europe,0.134703,0.112055,0.547604
6,France,4146609,38501.0,91437,267.0,275360,-1.0,3779812.0,4219,63427.0,1399.0,57940302,886262,65376055,Europe,0.928494,0.292004,-0.000363
7,Italy,3281810,23059.0,103432,431.0,2639370,19716.0,539008.0,3317,54336.0,1712.0,45540778,754004,60398612,Europe,0.702631,0.416699,0.746996
8,Spain,3206116,6092.0,72793,228.0,2857714,-1.0,275609.0,1997,68554.0,1556.0,41114319,879119,46767645,Europe,0.190012,0.313217,-3.5e-05
9,Turkey,2930554,18912.0,29696,73.0,2752023,17161.0,148835.0,1484,34485.0,349.0,35603028,418953,84980970,Asia,0.645339,0.245824,0.623578


<h1> Export </h1>

Feel free to use this data for your own purposes/visualizations. If you don't want to fork the notebook, you can download the csv file in the output section of this notebook.

In [13]:
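EXPORT = True
if EXPORT:
    from datetime import timedelta
    yesterday = date.today() - timedelta(days=1)  # the data describes the previous day
    # Name the file after yesterday's date and give it a .csv extension.
    df.to_csv(f'covid_stats_{yesterday.year}_{yesterday.month}_{yesterday.day}.csv')
    print("Dataset is %.2f MB" % (df.memory_usage(deep=True).sum()/1000000))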

Dataset is 0.06 MB
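
If you load the exported file yourself, note that `to_csv` writes the index as an unnamed first column by default, which `read_csv` then loads as `Unnamed: 0`. The sketch below shows a round trip that avoids this. It uses a hypothetical two-row frame and an in-memory buffer in place of the real file.

```python
import io

import pandas as pd

toy = pd.DataFrame({"Country": ["A", "B"], "Total Cases": [10, 20]})

buffer = io.StringIO()           # stands in for the CSV file on disk
toy.to_csv(buffer, index=False)  # index=False avoids the extra "Unnamed: 0" column
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.columns.tolist())
```

If you are working with a file that already contains the index column, `pd.read_csv(path, index_col=0)` reads it back as the index instead.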


<h1 class="alert alert-block alert-info" style="text-align:center; font-size:24px" id="general">General Visualizations<a class="anchor-link" href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#general">¶</a></h1>

In [14]:
cases_ser = df[["Total Recovered", "Active Cases", "Total Deaths"]].loc[0]
cases_df = pd.DataFrame(cases_ser).reset_index()
cases_df.columns = ['Type', 'Total']
cases_df['Percentage'] = np.round(100*cases_df['Total']/np.sum(cases_df['Total']), 2)
cases_df['Virus'] = ['COVID-19' for i in range(len(cases_df))]

fig = px.bar(cases_df, x='Virus', y='Percentage', color='Type', hover_data=['Total'])
fig.update_layout(title={'text': f"Total Number of Cases, Recoveries, and Deaths on {yesterday_str}", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Percentage", xaxis_title="")
fig.show()

In [15]:
new_ser = df[["New Cases", "New Recovered", "New Deaths"]].loc[0]
new_df = pd.DataFrame(new_ser).reset_index()
new_df.columns = ['Type', 'Total']
new_df['Percentage'] = np.round(100*new_df['Total']/np.sum(new_df['Total']), 2)
new_df['Virus'] = ['COVID-19' for i in range(len(new_df))]

fig = px.bar(new_df, x='Virus', y='Percentage', color='Type', hover_data=['Total'])
fig.update_layout(title={'text': f"New Cases, Recoveries, and Deaths on {yesterday_str}", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Percentage", xaxis_title="")
fig.show()

In [16]:
pinc_ser = np.round(df[["%Inc Cases", "%Inc Recovered", "%Inc Deaths"]].loc[0], 2)
pinc_df = pd.DataFrame(pinc_ser)
pinc_df.columns = ["Percentage"]

fig = go.Figure()
fig.add_trace(go.Bar(x=pinc_df.index, y=pinc_df['Percentage'], marker_color=["yellow", "green", "red"]))
fig.update_layout(title={'text': f"Percentage Increase in Cases, Recoveries, and Deaths on {yesterday_str}", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Percentage", xaxis_title="")
fig.show()

<h1 class="alert alert-block alert-info" style="text-align:center; font-size:24px" id="continent">By Continent<a class="anchor-link" href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#continent">¶</a></h1>

In [17]:
continent_df = df.groupby('Continent').sum().drop('All')
continent_df = continent_df.reset_index()
continent_df

Unnamed: 0,Continent,Total Cases,New Cases,Total Deaths,New Deaths,Total Recovered,New Recovered,Active Cases,Serious/Critical,Tot Cases/1M,Deaths/1M,Total Tests,Tests/1M,Population,%Inc Cases,%Inc Deaths,%Inc Recovered
0,Africa,4095103,11177.0,108885,255.0,3664665,8315.0,321553.0,2597,362108.0,5834.5,35945330,3478025,1363135809,18.169524,-150.765243,10.582775
1,Asia,26374984,113986.0,407945,875.0,24633052,84733.0,1333984.0,23133,917143.0,9232.86,482390603,21184331,3184454898,31.525969,36.615563,29.517679
2,Australia/Oceania,52951,46.0,1099,-10.0,34376,-7.0,17471.0,-4,91676.0,540.0,16996667,1304272,42469501,-63.93716,441.488461,-174.101022
3,Europe,36779236,206481.0,866466,3548.0,25983495,96833.0,8053110.0,27961,2838434.0,54141.0,564333453,50848469,747951340,27.362882,4.088272,18.492605
4,North America,34813328,72237.0,795903,1554.0,26207072,99731.0,7810350.0,15702,892697.0,12826.0,426618151,16566253,592307767,8.533193,-112.906923,22.611175
5,South America,19584137,124873.0,506760,3401.0,17553759,108259.0,1523618.0,19413,413009.0,9892.0,74752136,2701819,433271073,8.227794,4.771045,5.912773
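
Be careful when reading the per-million and percentage columns in this table: summing rates across countries does not yield a continent-level rate. A minimal sketch of an alternative is shown below. It uses hypothetical numbers, sums only the additive columns, and then recomputes the rate from the totals.

```python
import pandas as pd

toy = pd.DataFrame({
    "Continent": ["X", "X", "Y"],
    "Total Cases": [100, 300, 50],
    "Population": [1_000_000, 1_000_000, 500_000],
})

# Only counts and populations can be meaningfully added together.
additive = ["Total Cases", "Population"]
by_continent = toy.groupby("Continent")[additive].sum()

# Recompute the per-million rate from the summed totals.
by_continent["Tot Cases/1M"] = (
    by_continent["Total Cases"] * 1_000_000 / by_continent["Population"]
)
print(by_continent)
```

With the real data, the -1 placeholders for missing values would also need to be excluded before summing.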


In [18]:
cases_vis_list = ['Total Cases', 'Active Cases', 'New Cases', 'Serious/Critical', 'Tot Cases/1M']
deaths_vis_list = ['Total Deaths', 'New Deaths', 'Deaths/1M']
recovered_vis_list = ['Total Recovered', 'New Recovered']
tests_vis_list = ['Total Tests', 'Tests/1M']
essentials = [['Total Cases', 'Active Cases', 'New Cases'], ['Total Deaths', 'New Deaths'], ['Total Recovered', 'New Recovered']]

In [19]:
def continent_visualization(vis_list):
    for label in vis_list:
        c_df = continent_df[['Continent', label]].copy()  # copy so the new columns are not added to a slice
        c_df['Percentage'] = np.round(100*c_df[label]/np.sum(c_df[label]), 2)
        c_df['Virus'] = ['COVID-19' for i in range(len(c_df))]
        
        fig = px.bar(c_df, x='Virus', y='Percentage', color='Continent', hover_data=[label])
        fig.update_layout(title={'text': f"{label} at the end of {yesterday_str}", 'x': 0.5,
                                 'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Percentage", xaxis_title="")
        fig.show()
        gc.collect()

In [20]:
continent_visualization(cases_vis_list)

In [None]:
continent_visualization(deaths_vis_list)

In [None]:
continent_visualization(tests_vis_list)

In [None]:
def continent_visualization2(index, label, log_scale=False):
    c_df = continent_df[['Continent'] + essentials[index]]
    
    fig = px.bar(c_df, x="Continent", y=essentials[index], log_y=log_scale)
    log_str = "- log scale" if log_scale else ""
    fig.update_layout(title={'text': f"{label} at the end of {yesterday_str} {log_str}", 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title=label, xaxis_title="")
    
    fig.show()
    gc.collect()

In [None]:
continent_visualization2(0, "Total Cases", log_scale=True)

In [None]:
continent_visualization2(1, "Deaths", log_scale=True)

In [None]:
continent_visualization2(2, "Recoveries", log_scale=True)

<h1 class="alert alert-block alert-info" style="text-align:center; font-size:24px" id="countries">By Countries<a class="anchor-link" href="https://www.kaggle.com/ironicninja/covid-19-every-day/notebook#countries">¶</a></h1>

In [None]:
df = df.drop([len(df)-1])
country_df = df.drop([0])

In [None]:
country_l = country_df.columns[1:14]

fig = go.Figure()
c = 0
for i in country_df.index:
    if c < LOOK_AT:
        fig.add_trace(go.Bar(name=country_df['Country'][i], x=country_l, y=country_df.loc[i][1:14]))
    else:
        break
    c += 1
    
fig.update_layout(title={'text': f'{LOOK_AT} Countries with Most COVID Cases on %s' % yesterday_str, 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Value (log scale)", yaxis_type="log", xaxis_tickangle=-90)
fig.show()

In [None]:
inc_l = country_df.columns[15:]
inc_df = country_df.sort_values("%Inc Cases", ascending=False)
fig = go.Figure()
c = 0
for i in inc_df.index:
    if i > AT_LEAST:
        continue
    if c < LOOK_AT:
        fig.add_trace(go.Bar(name=country_df['Country'][i], x=inc_l, y=inc_df.loc[i][15:]))
    else:
        break
    c += 1
    
fig.update_layout(title={'text': f'{LOOK_AT} Countries with Highest Increase in COVID Cases on %s' % yesterday_str, 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_title="Percentage", xaxis_tickangle=0)
fig.show()

In [None]:
country_labels = country_df.columns[1:14]

def country_visualization(continent):
    buttons_list = []
    base_list = [False for i in range(len(country_df))]
    c = 0
    for i in country_df.index:
        if country_df.loc[i]['Continent'] != continent:
            continue
        tmp_list = base_list.copy()
        tmp_list[c] = True
        c += 1
        buttons_list.append(dict(
                    args=[{"visible": tmp_list}],
                    label=country_df.loc[i]['Country'],
                    method="update"
                ))


    fig = go.Figure()
    c = 0
    for i in country_df.index:
        if country_df.loc[i]['Continent'] != continent:
            continue
        fig.add_trace(go.Bar(name=country_df.loc[i]['Country'], x=country_labels, y=country_df.loc[i][1:14], visible=False if c != 0 else True))
        c += 1

    fig.update_layout(
        updatemenus=[
            dict(
                buttons=buttons_list,
                direction="down",
                pad={"r": 10, "t": 10},
                showactive=True,
                x=0.1,
                xanchor="left",
                y=1.1,
                yanchor="top"
            ),
        ]
    )

    fig.update_layout(title={'text': '%s COVID-19 Cases Search on %s' % (continent, yesterday_str), 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_type="log", xaxis_tickangle=-90)
    fig.show()
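
The dropdown works by giving each button a list of booleans for `visible`, with one entry per trace in the figure. Here is a minimal sketch of just that masking logic, without any plotting; `visibility_masks` is a hypothetical helper, not part of the notebook.

```python
def visibility_masks(n_traces):
    """Return one mask per trace; mask k shows only trace k."""
    return [[j == k for j in range(n_traces)] for k in range(n_traces)]


masks = visibility_masks(3)
for k, mask in enumerate(masks):
    print(f"button {k}: {mask}")
```

Each mask would then be passed as `args=[{"visible": mask}]` for the corresponding button, so that selecting a country hides every other bar.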

<h1 id="country-search">COVID-19 Cases Search</h1>

In [None]:
country_visualization('Africa')

In [None]:
country_visualization('Asia')

In [None]:
country_visualization('Australia/Oceania')

In [None]:
country_visualization('Europe')

In [None]:
country_visualization('North America')

In [None]:
country_visualization('South America')

In [None]:
bar_list = []
for i in country_df.index:
    bar_list.append(go.Bar(name=country_df['Country'][i], y=[country_df['Total Cases'][i]]))
    
fig = go.Figure(data=bar_list)
fig.update_layout(title={'text': 'Stacked Bar Chart of All Countries COVID-19 Cases on %s' % yesterday_str, 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, barmode='stack', height=1200)
fig.show()

In [None]:
bar = go.Bar(x=country_df['Country'], y=country_df['Total Cases'], marker=dict(color=country_df['Total Cases'], colorscale='Reds', showscale=True))  # color by the same rows that are plotted
fig = go.Figure(data=[bar])
fig.update_layout(title={'text': 'Number of COVID Cases by Country on %s, log scale' % yesterday_str, 'x': 0.5,
                         'xanchor': 'center', 'font': {'size': 20}}, yaxis_type="log", xaxis_tickangle=-90)

# Concluding Remarks

If you've read down this far in the notebook, thank you so much. This notebook took quite a long time to make, but that's beside the point: these visualizations are for the community, and I'd like this project to also be for the community. So please leave an upvote (it takes literally less than a second) so this notebook/project gains more traction & recognition.

And, like I said earlier, if you have any suggestions or code for other visualizations, please let me know in the comments or in DMs. I know I'm not the greatest coder, so everything/anything is appreciated.

Stay safe, stay healthy, educate yourself. Thanks!