<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Data-Preprocessing" data-toc-modified-id="Data-Preprocessing-1"><span class="toc-item-num">1&nbsp;&nbsp;</span>Data Preprocessing</a></span><ul class="toc-item"><li><span><a href="#Geographic-Data-Preprocessing" data-toc-modified-id="Geographic-Data-Preprocessing-1.1"><span class="toc-item-num">1.1&nbsp;&nbsp;</span>Geographic Data Preprocessing</a></span></li><li><span><a href="#Coronavirus-Data-Preprocessing" data-toc-modified-id="Coronavirus-Data-Preprocessing-1.2"><span class="toc-item-num">1.2&nbsp;&nbsp;</span>Coronavirus Data Preprocessing</a></span></li></ul></li><li><span><a href="#Visualization" data-toc-modified-id="Visualization-2"><span class="toc-item-num">2&nbsp;&nbsp;</span>Visualization</a></span><ul class="toc-item"><li><span><a href="#Spread-of-Coronavirus-in-China" data-toc-modified-id="Spread-of-Coronavirus-in-China-2.1"><span class="toc-item-num">2.1&nbsp;&nbsp;</span>Spread of Coronavirus in China</a></span></li><li><span><a href="#Geographic-Visualization" data-toc-modified-id="Geographic-Visualization-2.2"><span class="toc-item-num">2.2&nbsp;&nbsp;</span>Geographic Visualization</a></span></li><li><span><a href="#Death-Rate-and-Cure-Rate" data-toc-modified-id="Death-Rate-and-Cure-Rate-2.3"><span class="toc-item-num">2.3&nbsp;&nbsp;</span>Death Rate and Cure Rate</a></span></li><li><span><a href="#Growth-Rate" data-toc-modified-id="Growth-Rate-2.4"><span class="toc-item-num">2.4&nbsp;&nbsp;</span>Growth Rate</a></span></li></ul></li></ul></div>

# Data Preprocessing

In [1]:
import json
import folium
import webbrowser
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import folium.plugins as plugins
from folium.plugins import HeatMap

## Geographic Data Preprocessing

In [434]:
# load the JSON file of latitude and longitude data
with open('/Users/hkmac/Desktop/region.json') as f: 
    region = json.load(f)

In [437]:
# get province, longitude, and latitude to create a dataframe of each province's geographic data
# this part is for geographic visualization
longitude = []
latitude = []
region_name = []

for i in region['districts']:
    if i['level'] == 'province': 
        region_name.append(i['name'])
        longitude.append(i["center"]["longitude"])
        latitude.append(i["center"]["latitude"])

region_lookup = pd.DataFrame({'province':region_name,"longitude":longitude,"latitude":latitude})
region_lookup.head()

,province,longitude,latitude
0,澳门特别行政区,113.543028,22.186835
1,北京市,116.407394,39.904211
2,重庆市,106.551643,29.562849
3,福建省,119.295143,26.100779
4,广东省,113.26641,23.132324


## Coronavirus Data Preprocessing

In [438]:
covidData = pd.read_csv("/Users/hkmac/Desktop/ncovdata.csv")

In [439]:
covidData.head()

,provinceName,cityName,province_confirmedCount,province_suspectedCount,province_curedCount,province_deadCount,city_confirmedCount,city_suspectedCount,city_curedCount,city_deadCount,updateTime
0,上海市,外地来沪人员,318,0,90,1,102,0,27,1,2020-02-15 00:41:46.976
1,上海市,浦东新区,318,0,90,1,56,0,12,0,2020-02-15 00:41:46.976
2,上海市,宝山区,318,0,90,1,20,0,1,0,2020-02-15 00:41:46.976
3,上海市,徐汇区,318,0,90,1,17,0,2,0,2020-02-15 00:41:46.976
4,上海市,闵行区,318,0,90,1,18,0,4,0,2020-02-15 00:41:46.976


In [440]:
# take a look at some information about this dataset
covidData.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39302 entries, 0 to 39301
Data columns (total 11 columns):
provinceName               39302 non-null object
cityName                   39302 non-null object
province_confirmedCount    39302 non-null int64
province_suspectedCount    39302 non-null int64
province_curedCount        39302 non-null int64
province_deadCount         39302 non-null int64
city_confirmedCount        39302 non-null int64
city_suspectedCount        39302 non-null int64
city_curedCount            39302 non-null int64
city_deadCount             39302 non-null int64
updateTime                 39302 non-null object
dtypes: int64(8), object(3)
memory usage: 3.3+ MB


In [441]:
covidData.dtypes

provinceName               object
cityName                   object
province_confirmedCount     int64
province_suspectedCount     int64
province_curedCount         int64
province_deadCount          int64
city_confirmedCount         int64
city_suspectedCount         int64
city_curedCount             int64
city_deadCount              int64
updateTime                 object
dtype: object

In [442]:
# convert updateTime to datetime64 and store its calendar date in a new column
def datimeTrans(dataset):
    dataset["updateTime"] = pd.to_datetime(dataset["updateTime"])
    # normalize() sets the time to midnight while keeping the datetime64 dtype
    dataset["date"] = dataset["updateTime"].dt.normalize()
    return dataset

In [443]:
covidData = datimeTrans(covidData)

In [444]:
# temporarily convert the date column to strings
covidData["date"] = covidData["date"].astype(str)

In [445]:
date = []

The dataset contains many duplicated records within a single day. To aggregate the data by province for each day, I first delete each day's repeated records.

In [446]:
def deldup(dataset):
    # Keep only the first record of each city on each day.
    # Relies on the global list `date` and on a default 0..n-1 index.
    # City names are tracked per day, regardless of province.
    citys = []
    for i in range(len(dataset)):
        city = dataset.loc[i,"cityName"]
        # a date seen for the first time starts a fresh list of cities
        if dataset.loc[i,"date"] not in date:
            date.append(dataset.loc[i,"date"])
            citys = []
        if city in citys:
            # this city already has a record for the current day
            dataset = dataset.drop(i)
        else:
            citys.append(city)
    return dataset
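
The row-by-row loop can be slow on a dataset of this size. As a rough sketch, and assuming the goal is to keep the first record in file order for each city on each day, pandas' `drop_duplicates` can do this in one call. The toy frame below is made up for illustration. Unlike `deldup`, this version also keys on the province, so cities with the same name in different provinces are kept apart.

```python
import pandas as pd

# Hypothetical miniature of the scraped data: two snapshots of the same city on one day.
toy = pd.DataFrame({
    "provinceName": ["上海市", "上海市", "上海市", "北京市"],
    "cityName": ["浦东新区", "浦东新区", "宝山区", "朝阳区"],
    "city_confirmedCount": [56, 55, 20, 60],
    "date": ["2020-02-15", "2020-02-15", "2020-02-15", "2020-02-15"],
})

# keep="first" retains the earliest row in file order for each (date, province, city) key
deduped = toy.drop_duplicates(subset=["date", "provinceName", "cityName"], keep="first")
deduped = deduped.reset_index(drop=True)
print(deduped)
```

Whether the first or the last snapshot of a day is the better one to keep depends on how the file is ordered; `keep="last"` is the alternative.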

In [447]:
covidData = deldup(covidData)

In [448]:
covidData.describe()

,province_confirmedCount,province_suspectedCount,province_curedCount,province_deadCount,city_confirmedCount,city_suspectedCount,city_curedCount,city_deadCount
count,8171.0,8171.0,8171.0,8171.0,8171.0,8171.0,8171.0,8171.0
mean,1033.247093,49.546812,76.724024,23.234977,62.054461,0.000857,4.61241,1.365072
std,4701.879561,1077.149318,317.273409,133.796531,732.893854,0.036683,44.293164,25.622139
min,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,75.0,0.0,1.0,0.0,3.0,0.0,0.0,0.0
50%,184.0,0.0,11.0,0.0,8.0,0.0,0.0,0.0
75%,428.0,0.0,44.0,1.0,20.0,0.0,2.0,0.0
max,51986.0,23638.0,3900.0,1318.0,35991.0,2.0,2023.0,1036.0


In [449]:
dayUnique = pd.DataFrame()
dayUnique = covidData.groupby(['provinceName', 'date'])['city_confirmedCount'].aggregate('sum').unstack()
dayUnique

provinceName,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14,2020-02-15
上海市,,,,53.0,96.0,96.0,112.0,233.0,169.0,182.0,203.0,219.0,243.0,257.0,277.0,286.0,293.0,299.0,303.0,311.0,315.0,318.0,318.0
云南省,5.0,11.0,16.0,26.0,44.0,55.0,70.0,83.0,93.0,105.0,114.0,119.0,124.0,133.0,136.0,138.0,141.0,149.0,153.0,154.0,156.0,162.0,
内蒙古自治区,1.0,7.0,7.0,22.0,30.0,16.0,18.0,19.0,23.0,27.0,34.0,35.0,42.0,46.0,50.0,52.0,54.0,58.0,,60.0,61.0,65.0,
北京市,51.0,41.0,68.0,80.0,91.0,111.0,90.0,260.0,168.0,191.0,212.0,228.0,253.0,274.0,297.0,315.0,326.0,337.0,342.0,352.0,366.0,372.0,
吉林省,3.0,4.0,,6.0,8.0,9.0,14.0,14.0,18.0,23.0,31.0,47.0,54.0,59.0,65.0,69.0,78.0,80.0,81.0,83.0,84.0,86.0,
四川省,15.0,28.0,44.0,69.0,90.0,108.0,142.0,177.0,207.0,231.0,254.0,282.0,301.0,321.0,344.0,364.0,386.0,405.0,417.0,436.0,451.0,463.0,
天津市,,,14.0,22.0,24.0,31.0,32.0,33.0,41.0,48.0,60.0,67.0,69.0,78.0,81.0,88.0,90.0,95.0,105.0,112.0,117.0,120.0,
宁夏回族自治区,2.0,3.0,4.0,7.0,11.0,12.0,18.0,21.0,26.0,28.0,31.0,34.0,,40.0,43.0,45.0,45.0,49.0,53.0,58.0,64.0,67.0,
安徽省,15.0,39.0,60.0,70.0,106.0,155.0,200.0,237.0,297.0,340.0,408.0,480.0,530.0,591.0,665.0,733.0,779.0,830.0,860.0,889.0,910.0,934.0,
山东省,15.0,27.0,46.0,75.0,95.0,130.0,158.0,184.0,206.0,230.0,259.0,275.0,307.0,347.0,386.0,416.0,444.0,466.0,487.0,497.0,509.0,523.0,


This dataframe has many missing values. A NaN in the first column is set to 0. A NaN in any other column is filled with the value from the previous column, that is, the previous day's count.

In [450]:
def nullhandling(dataset):
    # a missing value on the first day is set to 0
    dataset.iloc[:,0] = dataset.iloc[:,0].fillna(0)
    # any later gap takes the value of the previous day
    for j in range(len(dataset)):
        for i in range(1, len(dataset.columns)):
            if pd.isnull(dataset.iloc[j,i]):
                dataset.iloc[j,i] = dataset.iloc[j,i-1]
    return dataset
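
The same fill rule can also be written without explicit loops. This is a minimal sketch on a made-up two-province table (the values are illustrative, not taken from the dataset): the first column is filled with 0, then `ffill(axis=1)` carries the last known count forward along each row.

```python
import numpy as np
import pandas as pd

# Hypothetical wide table: one row per province, one column per day.
wide = pd.DataFrame(
    {"2020-01-24": [np.nan, 5.0], "2020-01-25": [np.nan, np.nan], "2020-01-26": [14.0, 16.0]},
    index=["天津市", "云南省"],
)

filled = wide.copy()
# no record on the first day is treated as zero cases
filled.iloc[:, 0] = filled.iloc[:, 0].fillna(0)
# carry the last known count forward along each row
filled = filled.ffill(axis=1)
print(filled)
```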

In [451]:
dayUnique = nullhandling(dayUnique)
dayUnique

provinceName,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14,2020-02-15
上海市,0.0,0.0,0.0,53.0,96.0,96.0,112.0,233.0,169.0,182.0,203.0,219.0,243.0,257.0,277.0,286.0,293.0,299.0,303.0,311.0,315.0,318.0,318.0
云南省,5.0,11.0,16.0,26.0,44.0,55.0,70.0,83.0,93.0,105.0,114.0,119.0,124.0,133.0,136.0,138.0,141.0,149.0,153.0,154.0,156.0,162.0,162.0
内蒙古自治区,1.0,7.0,7.0,22.0,30.0,16.0,18.0,19.0,23.0,27.0,34.0,35.0,42.0,46.0,50.0,52.0,54.0,58.0,58.0,60.0,61.0,65.0,65.0
北京市,51.0,41.0,68.0,80.0,91.0,111.0,90.0,260.0,168.0,191.0,212.0,228.0,253.0,274.0,297.0,315.0,326.0,337.0,342.0,352.0,366.0,372.0,372.0
吉林省,3.0,4.0,4.0,6.0,8.0,9.0,14.0,14.0,18.0,23.0,31.0,47.0,54.0,59.0,65.0,69.0,78.0,80.0,81.0,83.0,84.0,86.0,86.0
四川省,15.0,28.0,44.0,69.0,90.0,108.0,142.0,177.0,207.0,231.0,254.0,282.0,301.0,321.0,344.0,364.0,386.0,405.0,417.0,436.0,451.0,463.0,463.0
天津市,0.0,0.0,14.0,22.0,24.0,31.0,32.0,33.0,41.0,48.0,60.0,67.0,69.0,78.0,81.0,88.0,90.0,95.0,105.0,112.0,117.0,120.0,120.0
宁夏回族自治区,2.0,3.0,4.0,7.0,11.0,12.0,18.0,21.0,26.0,28.0,31.0,34.0,34.0,40.0,43.0,45.0,45.0,49.0,53.0,58.0,64.0,67.0,67.0
安徽省,15.0,39.0,60.0,70.0,106.0,155.0,200.0,237.0,297.0,340.0,408.0,480.0,530.0,591.0,665.0,733.0,779.0,830.0,860.0,889.0,910.0,934.0,934.0
山东省,15.0,27.0,46.0,75.0,95.0,130.0,158.0,184.0,206.0,230.0,259.0,275.0,307.0,347.0,386.0,416.0,444.0,466.0,487.0,497.0,509.0,523.0,523.0


In [452]:
dayUnique.style.background_gradient(cmap='gnuplot')

provinceName,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14,2020-02-15
上海市,0,0,0,53,96,96,112,233,169,182,203,219,243,257,277,286,293,299,303,311,315,318,318
云南省,5,11,16,26,44,55,70,83,93,105,114,119,124,133,136,138,141,149,153,154,156,162,162
内蒙古自治区,1,7,7,22,30,16,18,19,23,27,34,35,42,46,50,52,54,58,58,60,61,65,65
北京市,51,41,68,80,91,111,90,260,168,191,212,228,253,274,297,315,326,337,342,352,366,372,372
吉林省,3,4,4,6,8,9,14,14,18,23,31,47,54,59,65,69,78,80,81,83,84,86,86
四川省,15,28,44,69,90,108,142,177,207,231,254,282,301,321,344,364,386,405,417,436,451,463,463
天津市,0,0,14,22,24,31,32,33,41,48,60,67,69,78,81,88,90,95,105,112,117,120,120
宁夏回族自治区,2,3,4,7,11,12,18,21,26,28,31,34,34,40,43,45,45,49,53,58,64,67,67
安徽省,15,39,60,70,106,155,200,237,297,340,408,480,530,591,665,733,779,830,860,889,910,934,934
山东省,15,27,46,75,95,130,158,184,206,230,259,275,307,347,386,416,444,466,487,497,509,523,523


Because Hubei's case counts are much larger than those of the other provinces, I temporarily remove Hubei's data to visualize the spread of the coronavirus elsewhere. Later I will look at the spread within Hubei province separately.

In [453]:
dayUnique_drophb = dayUnique.drop("湖北省")
dayUnique_drophb.style.background_gradient(cmap='twilight_shifted')

provinceName,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14,2020-02-15
上海市,0,0,0,53,96,96,112,233,169,182,203,219,243,257,277,286,293,299,303,311,315,318,318
云南省,5,11,16,26,44,55,70,83,93,105,114,119,124,133,136,138,141,149,153,154,156,162,162
内蒙古自治区,1,7,7,22,30,16,18,19,23,27,34,35,42,46,50,52,54,58,58,60,61,65,65
北京市,51,41,68,80,91,111,90,260,168,191,212,228,253,274,297,315,326,337,342,352,366,372,372
吉林省,3,4,4,6,8,9,14,14,18,23,31,47,54,59,65,69,78,80,81,83,84,86,86
四川省,15,28,44,69,90,108,142,177,207,231,254,282,301,321,344,364,386,405,417,436,451,463,463
天津市,0,0,14,22,24,31,32,33,41,48,60,67,69,78,81,88,90,95,105,112,117,120,120
宁夏回族自治区,2,3,4,7,11,12,18,21,26,28,31,34,34,40,43,45,45,49,53,58,64,67,67
安徽省,15,39,60,70,106,155,200,237,297,340,408,480,530,591,665,733,779,830,860,889,910,934,934
山东省,15,27,46,75,95,130,158,184,206,230,259,275,307,347,386,416,444,466,487,497,509,523,523


# Visualization
## Spread of Coronavirus in China 

In [454]:
dayUnique_drophb = dayUnique_drophb.T.reset_index()
dayUnique = dayUnique.T.reset_index()

In [455]:
listofcolumns = []
for i in dayUnique_drophb.columns:
    if i != "date":
        listofcolumns.append(i)

dayUnique_drophb_melt = pd.melt(dayUnique_drophb, id_vars=['date'],
                                value_vars= listofcolumns, 
                                var_name='Province', value_name='Count')

In [456]:
listofcolumns = []
for i in dayUnique.columns:
    if i != "date":
        listofcolumns.append(i)

dayUnique_melt = pd.melt(dayUnique, id_vars=['date'],
                                value_vars= listofcolumns, 
                                var_name='Province', value_name='Count')
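
The transpose, reset and melt steps can be chained, and the explicit column list is optional because `melt` uses every non-id column when `value_vars` is omitted. A small sketch on a made-up table shaped like `dayUnique` (the name `long_df` is illustrative only):

```python
import pandas as pd

# Hypothetical wide table shaped like dayUnique: provinces as rows, dates as columns.
wide = pd.DataFrame(
    {"2020-01-24": [51.0, 15.0], "2020-01-25": [41.0, 28.0]},
    index=pd.Index(["北京市", "四川省"], name="provinceName"),
)
wide.columns.name = "date"

# transpose so dates become rows, then melt; value_vars defaults to all non-id columns
long_df = wide.T.reset_index().melt(id_vars="date", var_name="Province", value_name="Count")
print(long_df)
```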

In [457]:
px.line(dayUnique_melt, x="date", y="Count",color ='Province',
        title='Outbreak in China from Jan 24 - Feb 15')

In [458]:
px.line(dayUnique_drophb_melt, x="date", y="Count",color ='Province',
        title='Outbreak in China (excluding Hubei Province) from Jan 24 - Feb 15')

In [459]:
fig = px.bar(dayUnique_melt, 
             y="date", x="Count", color='Province', orientation='h', height=700,
             title='Number of Confirmed Cases in China')
fig.update_layout(uniformtext_minsize=5, uniformtext_mode='hide')
fig.show()

In [460]:
fig = px.bar(dayUnique_drophb_melt, 
             y="date", x="Count", color='Province', orientation='h', height=700,
             title='Number of Confirmed Cases in China (excluding Hubei Province)')
fig.update_layout(uniformtext_minsize=5, uniformtext_mode='hide')
fig.show()

In [461]:
px.bar(dayUnique_melt,x="Count", y="Province", color='Province', 
       orientation='h', height=800,title='Outbreak in China',
       animation_frame="date",range_x = [0,65000])

In [464]:
px.bar(dayUnique_drophb_melt,x="Count", y="Province", color='Province', 
       orientation='h', height=800,title='Outbreak in China (excluding Hubei Province)',
       animation_frame="date",range_x=[0,1400])

In [465]:
px.treemap(dayUnique_drophb_melt.sort_values(by='Count', ascending=False).reset_index(drop=True), 
           path=["Province"], values="Count")

In [467]:
dayUnique = pd.DataFrame()
dayUnique = covidData.groupby(['provinceName', 'date'])['city_confirmedCount'].aggregate('sum').unstack()
dayUnique.head()

provinceName,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14,2020-02-15
上海市,,,,53.0,96.0,96.0,112.0,233.0,169.0,182.0,203.0,219.0,243.0,257.0,277.0,286.0,293.0,299.0,303.0,311.0,315.0,318.0,318.0
云南省,5.0,11.0,16.0,26.0,44.0,55.0,70.0,83.0,93.0,105.0,114.0,119.0,124.0,133.0,136.0,138.0,141.0,149.0,153.0,154.0,156.0,162.0,
内蒙古自治区,1.0,7.0,7.0,22.0,30.0,16.0,18.0,19.0,23.0,27.0,34.0,35.0,42.0,46.0,50.0,52.0,54.0,58.0,,60.0,61.0,65.0,
北京市,51.0,41.0,68.0,80.0,91.0,111.0,90.0,260.0,168.0,191.0,212.0,228.0,253.0,274.0,297.0,315.0,326.0,337.0,342.0,352.0,366.0,372.0,
吉林省,3.0,4.0,,6.0,8.0,9.0,14.0,14.0,18.0,23.0,31.0,47.0,54.0,59.0,65.0,69.0,78.0,80.0,81.0,83.0,84.0,86.0,


In [468]:
dayUnique_wuhan = covidData[covidData["provinceName"] == "湖北省"]
dayUnique_wuhan = dayUnique_wuhan.groupby(['cityName', 'date'])['city_confirmedCount'].aggregate('sum').unstack()

In [469]:
# nullhandling was defined earlier, in the preprocessing section
dayUnique_wuhan = nullhandling(dayUnique_wuhan)
dayUnique_wuhan.style.background_gradient(cmap='rainbow')

cityName,2020-01-24,2020-01-25,2020-01-26,2020-01-27,2020-01-28,2020-01-29,2020-01-30,2020-01-31,2020-02-01,2020-02-02,2020-02-03,2020-02-04,2020-02-05,2020-02-06,2020-02-07,2020-02-08,2020-02-09,2020-02-10,2020-02-11,2020-02-12,2020-02-13,2020-02-14
仙桃,2,10,11,12,27,32,55,90,97,140,169,188,225,265,307,359,379,416,438,460,480,500
十堰,1,5,20,40,65,88,119,150,177,212,256,291,318,353,395,438,467,481,505,536,562,586
咸宁,0,0,43,64,91,112,130,166,206,246,296,348,384,399,443,476,493,507,515,525,534,732
天门,0,3,5,13,23,34,44,67,82,99,115,117,128,138,163,179,197,217,261,293,362,416
孝感,22,26,55,100,173,274,399,541,628,749,918,1120,1462,1886,2141,2313,2436,2541,2642,2751,2874,3009
宜昌,1,1,20,31,51,63,117,167,276,353,392,452,496,563,610,633,711,749,772,784,810,877
恩施,0,0,0,0,0,0,0,0,87,87,87,87,87,87,87,87,87,87,87,87,87,87
恩施州,0,11,17,25,38,51,66,75,87,105,111,123,138,144,157,160,171,187,195,203,229,237
未知地区,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
武汉,495,572,618,698,1590,1905,2261,2639,3215,4109,5142,6384,8351,10117,11618,13603,14982,16902,18454,19558,32994,35991


In [470]:
dayUnique_wuhan = dayUnique_wuhan.T.reset_index()
listofcolumns = []
# get all of the column names, store in a list
for i in dayUnique_wuhan.columns:
    if i != "date":
        listofcolumns.append(i)

dayUnique_wuhan_melt = pd.melt(dayUnique_wuhan, id_vars=['date'],
                                value_vars= listofcolumns, 
                                var_name='Province', value_name='Count')

In [471]:
fig = px.bar(dayUnique_wuhan_melt, 
             y="date", x="Count", color='Province', orientation='h', height=700,
             title='Number of Confirmed Cases in Hubei Province')
fig.update_layout(uniformtext_minsize=5, uniformtext_mode='hide')
fig.show()

In [472]:
px.bar(dayUnique_wuhan_melt,x="Count", y="Province", color='Province', 
       orientation='h', height=800,title='Outbreak in Hubei Province',
       animation_frame="date",range_x = [0,38000])

In [473]:
px.treemap(dayUnique_wuhan_melt.sort_values(by='Count', ascending=False).reset_index(drop=True), 
           path=["Province"], values="Count")

## Geographic Visualization

In [474]:
dayUnique_melt_region = pd.merge(region_lookup,dayUnique_melt,
                                left_on="province",right_on="Province",how='outer')

In [475]:
# some province names don't match between the two tables
dayUnique_melt_region.isnull().sum()

province     23
longitude    23
latitude     23
date          3
Province      3
Count         3
dtype: int64

In [476]:
# let's find it
dayUnique_melt_region.head()

,province,longitude,latitude,date,Province,Count
0,澳门特别行政区,113.543028,22.186835,,,
1,北京市,116.407394,39.904211,2020-01-24,北京市,51.0
2,北京市,116.407394,39.904211,2020-01-25,北京市,41.0
3,北京市,116.407394,39.904211,2020-01-26,北京市,68.0
4,北京市,116.407394,39.904211,2020-01-27,北京市,80.0


In [477]:
# let's find it
for i in list(set(dayUnique_melt["Province"])):
    if i in region_lookup["province"].values:
        pass
    else:
        print(i)

澳门
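
The same check can be done in both directions with set differences. The sketch below uses short made-up name lists in place of `region_lookup["province"]` and `dayUnique_melt["Province"]`. Checking the reverse direction also reveals lookup entries that never appear in the case data, which an outer merge leaves as null rows.

```python
import pandas as pd

# Hypothetical stand-ins for region_lookup["province"] and dayUnique_melt["Province"].
lookup_names = pd.Series(["澳门特别行政区", "北京市", "重庆市"])
case_names = pd.Series(["澳门", "北京市", "重庆市", "北京市"])

# names present in the case data but missing from the geographic lookup
missing_in_lookup = sorted(set(case_names) - set(lookup_names))
# names present in the lookup but never seen in the case data
missing_in_cases = sorted(set(lookup_names) - set(case_names))
print(missing_in_lookup, missing_in_cases)
```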


In [478]:
# use .loc to assign in place; chained indexing triggers SettingWithCopyWarning
region_lookup.loc[0, 'province'] = '澳门'

In [479]:
del dayUnique_melt_region
dayUnique_melt_region = pd.merge(region_lookup,dayUnique_melt,
                                left_on="province",right_on="Province",how='outer')
del dayUnique_melt_region["province"]

In [480]:
dayUnique_melt_region.isnull().sum()

longitude    0
latitude     0
date         2
Province     2
Count        2
dtype: int64

In [481]:
dayUnique_melt_region = dayUnique_melt_region.dropna()
dayUnique_melt_region.dtypes

longitude    float64
latitude     float64
date          object
Province      object
Count        float64
dtype: object

In [482]:
# select only 2020-02-14
# currently data of 02-15 is incomplete
dayUnique_melt_region["date"] = dayUnique_melt_region["date"].apply(pd.to_datetime)
dayUnique_melt_region_lastday = dayUnique_melt_region.query("date == '2020-02-14'")
dayUnique_melt_region_lastday.head()

,longitude,latitude,date,Province,Count
21,113.543028,22.186835,2020-02-14,澳门,2.0
44,116.407394,39.904211,2020-02-14,北京市,372.0
67,106.551643,29.562849,2020-02-14,重庆市,532.0
90,119.295143,26.100779,2020-02-14,福建省,281.0
113,113.26641,23.132324,2020-02-14,广东省,1261.0


In [483]:
covmap = folium.Map(location=[36, 105], zoom_start=4)

for lat, lon, value, name in zip(dayUnique_melt_region_lastday["latitude"], 
                                 dayUnique_melt_region_lastday['longitude'], 
                                 dayUnique_melt_region_lastday['Count'], 
                                 dayUnique_melt_region_lastday['Province']):
    folium.CircleMarker([lat, lon],
                        radius=20,
                        popup=('Province: ' + str(name) + '<br>'
                               'Confirmed: ' + str(value) + '<br>'),
                        color='red',
                        fill_color='red',
                        fill_opacity=0.7).add_to(covmap)


covmap

In [484]:
# convert data into a list of [latitude, longitude, count] rows that folium's HeatMap accepts
num = dayUnique_melt_region_lastday.shape[0]
lat = np.array(dayUnique_melt_region_lastday["latitude"])
lon = np.array(dayUnique_melt_region_lastday['longitude'])
confirmed = np.array(dayUnique_melt_region_lastday['Count'])
mapdata = [[lat[i], lon[i], confirmed[i]] for i in range(num)]
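
As a shorter alternative, the three columns can be turned into the nested list directly. The sketch below rebuilds two rows from the table above (北京市 and 广东省 on 2020-02-14) in a small frame named `last_day`, which is an illustrative stand-in for `dayUnique_melt_region_lastday`.

```python
import pandas as pd

# Two rows copied from the 2020-02-14 table above, for illustration.
last_day = pd.DataFrame({
    "latitude": [39.904211, 23.132324],
    "longitude": [116.407394, 113.26641],
    "Count": [372.0, 1261.0],
})

# each inner list is [latitude, longitude, weight]
mapdata = last_day[["latitude", "longitude", "Count"]].values.tolist()
print(mapdata)
```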

In [485]:
heatmap = folium.Map(location=[38, 100], zoom_start=4)
HeatMap(mapdata).add_to(heatmap)

<folium.plugins.heat_map.HeatMap at 0x1a872b3b50>

In [486]:
# visualize the spread of COVID-19 
heatmap

## Death Rate and Cure Rate

In [487]:
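# select several columns from a groupby with a list; a bare tuple of names fails in recent pandas
dayunique_dq = covidData.query("date == '2020-02-14'").groupby(['provinceName', 'date'])[["city_curedCount","city_deadCount","city_confirmedCount"]].aggregate('sum').unstack()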

In [488]:
# create two new columns of cure rate and death rate
dayunique_dq["cure rate"] = dayunique_dq["city_curedCount"]/dayunique_dq["city_confirmedCount"]
dayunique_dq["death rate"] = dayunique_dq["city_deadCount"]/dayunique_dq["city_confirmedCount"]
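
Both rates here are simple ratios of same-day totals: cured divided by confirmed, and dead divided by confirmed. The sketch below shows the calculation on made-up city-level rows for a single day. It skips the `unstack` step, which is an assumption about what is needed when only one date is selected.

```python
import pandas as pd

# Hypothetical city-level rows for a single day.
toy = pd.DataFrame({
    "provinceName": ["上海市", "上海市", "吉林省"],
    "city_confirmedCount": [200, 118, 86],
    "city_curedCount": [40, 22, 25],
    "city_deadCount": [1, 0, 1],
})

# sum the city counts per province, then divide by the confirmed total
cols = ["city_confirmedCount", "city_curedCount", "city_deadCount"]
totals = toy.groupby("provinceName")[cols].sum()
rates = pd.DataFrame({
    "cure rate": totals["city_curedCount"] / totals["city_confirmedCount"],
    "death rate": totals["city_deadCount"] / totals["city_confirmedCount"],
}).reset_index()
print(rates)
```

Because the outbreak was still in progress on this date, these ratios are only a snapshot: many confirmed cases had not yet resolved either way.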

In [489]:
dayunique_dq.drop(["city_curedCount","city_deadCount","city_confirmedCount"],axis=1,inplace=True)
dayunique_dq.reset_index(inplace=True)
dayunique_dq.head()

,provinceName,cure rate,death rate
0,上海市,0.194969,0.003145
1,云南省,0.209877,0.0
2,内蒙古自治区,0.092308,0.0
3,北京市,0.024194,0.0
4,吉林省,0.290698,0.011628


In [490]:
px.bar(dayunique_dq,x="provinceName",y="cure rate",
       color="provinceName",title='Cure rate of each province')

In [491]:
px.bar(dayunique_dq,x="provinceName",y="death rate",
       color="provinceName",title='Death rate of each province')

## Growth Rate

In [497]:
dayUnique_gr = dayUnique.T
dayUnique_gr.drop(["2020-01-24","2020-01-25","2020-01-26"],inplace=True)
dayUnique_gr.drop(["澳门","西藏自治区"],axis=1,inplace=True)

In [498]:
pd.set_option('display.max_columns', None)
dayUnique_gr.head(10)

date,上海市,云南省,内蒙古自治区,北京市,吉林省,四川省,天津市,宁夏回族自治区,安徽省,山东省,山西省,广东省,广西壮族自治区,新疆维吾尔自治区,江苏省,江西省,河北省,河南省,浙江省,海南省,湖北省,湖南省,甘肃省,福建省,贵州省,辽宁省,重庆市,陕西省,青海省,黑龙江省
2020-01-27,53.0,26.0,22.0,80.0,6.0,69.0,22.0,7.0,70.0,75.0,13.0,151.0,46.0,5.0,47.0,48.0,18.0,133.0,128.0,33.0,1423.0,100.0,14.0,59.0,7.0,25.0,110.0,35.0,6.0,21.0
2020-01-28,96.0,44.0,30.0,91.0,8.0,90.0,24.0,11.0,106.0,95.0,20.0,208.0,51.0,10.0,70.0,72.0,33.0,188.0,173.0,40.0,2714.0,143.0,22.0,80.0,9.0,32.0,132.0,46.0,,30.0
2020-01-29,96.0,55.0,16.0,111.0,9.0,108.0,31.0,12.0,155.0,130.0,30.0,282.0,58.0,15.0,99.0,109.0,48.0,206.0,296.0,43.0,3554.0,221.0,24.0,82.0,9.0,35.0,147.0,56.0,,38.0
2020-01-30,112.0,70.0,18.0,90.0,14.0,142.0,32.0,18.0,200.0,158.0,35.0,354.0,78.0,14.0,129.0,162.0,65.0,278.0,428.0,46.0,4586.0,277.0,26.0,101.0,12.0,41.0,182.0,63.0,6.0,44.0
2020-01-31,233.0,83.0,19.0,260.0,14.0,177.0,33.0,21.0,237.0,184.0,39.0,436.0,87.0,19.0,168.0,240.0,82.0,354.0,537.0,64.0,5806.0,332.0,34.0,120.0,29.0,43.0,266.0,87.0,8.0,59.0
2020-02-01,169.0,93.0,23.0,168.0,18.0,207.0,41.0,26.0,297.0,206.0,47.0,535.0,100.0,18.0,202.0,286.0,96.0,462.0,599.0,77.0,7240.0,389.0,46.0,144.0,29.0,64.0,247.0,101.0,18.0,80.0
2020-02-02,182.0,105.0,27.0,191.0,23.0,231.0,48.0,28.0,340.0,230.0,56.0,632.0,111.0,21.0,236.0,333.0,104.0,493.0,661.0,64.0,9074.0,463.0,40.0,159.0,38.0,71.0,275.0,116.0,9.0,95.0
2020-02-03,203.0,114.0,34.0,212.0,31.0,254.0,60.0,31.0,408.0,259.0,66.0,725.0,127.0,24.0,271.0,391.0,113.0,781.0,724.0,72.0,11177.0,521.0,51.0,179.0,46.0,74.0,312.0,128.0,14.0,121.0
2020-02-04,219.0,119.0,35.0,228.0,47.0,282.0,67.0,34.0,480.0,275.0,74.0,813.0,139.0,29.0,308.0,476.0,126.0,675.0,829.0,80.0,13522.0,593.0,57.0,194.0,58.0,81.0,344.0,142.0,15.0,155.0
2020-02-05,243.0,124.0,42.0,253.0,54.0,301.0,69.0,,530.0,307.0,81.0,895.0,150.0,32.0,341.0,548.0,135.0,764.0,895.0,91.0,16678.0,661.0,57.0,205.0,64.0,89.0,376.0,165.0,17.0,190.0


In [499]:
dayUnique_gr_2 = dayUnique_gr.copy()

In [500]:
for i in range(len(dayUnique_gr)):
    for j in range(len(dayUnique_gr.columns)):
        if i == 0:
            pass
        else:
            now = dayUnique_gr.iloc[i,j]
            past = dayUnique_gr.iloc[i-1,j]
            dayUnique_gr_2.iloc[i,j] = (now-past)/past
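
The loop computes the day-over-day growth rate, (now - past) / past, for every province. The same formula can be applied to the whole table at once with `shift`. The sketch below uses the first three days for two provinces from the table above. Missing values propagate as NaN, just as in the loop.

```python
import pandas as pd

# Cumulative counts for two provinces, copied from the table above.
counts = pd.DataFrame(
    {"上海市": [53.0, 96.0, 96.0], "云南省": [26.0, 44.0, 55.0]},
    index=["2020-01-27", "2020-01-28", "2020-01-29"],
)

previous = counts.shift(1)               # the prior day's count, NaN for the first row
growth = (counts - previous) / previous  # (now - past) / past, element-wise
growth = growth.iloc[1:]                 # the first day has no predecessor
print(growth.round(6))
```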

In [501]:
dayUnique_gr_2.drop("2020-01-27",inplace=True)
dayUnique_gr_2.head()

date,上海市,云南省,内蒙古自治区,北京市,吉林省,四川省,天津市,宁夏回族自治区,安徽省,山东省,山西省,广东省,广西壮族自治区,新疆维吾尔自治区,江苏省,江西省,河北省,河南省,浙江省,海南省,湖北省,湖南省,甘肃省,福建省,贵州省,辽宁省,重庆市,陕西省,青海省,黑龙江省
2020-01-28,0.811321,0.692308,0.363636,0.1375,0.333333,0.304348,0.090909,0.571429,0.514286,0.266667,0.538462,0.377483,0.108696,1.0,0.489362,0.5,0.833333,0.413534,0.351562,0.212121,0.907238,0.43,0.571429,0.355932,0.285714,0.28,0.2,0.314286,,0.428571
2020-01-29,0.0,0.25,-0.466667,0.21978,0.125,0.2,0.291667,0.090909,0.462264,0.368421,0.5,0.355769,0.137255,0.5,0.414286,0.513889,0.454545,0.095745,0.710983,0.075,0.309506,0.545455,0.090909,0.025,0.0,0.09375,0.113636,0.217391,,0.266667
2020-01-30,0.166667,0.272727,0.125,-0.189189,0.555556,0.314815,0.032258,0.5,0.290323,0.215385,0.166667,0.255319,0.344828,-0.066667,0.30303,0.486239,0.354167,0.349515,0.445946,0.069767,0.290377,0.253394,0.083333,0.231707,0.333333,0.171429,0.238095,0.125,,0.157895
2020-01-31,1.080357,0.185714,0.055556,1.888889,0.0,0.246479,0.03125,0.166667,0.185,0.164557,0.114286,0.231638,0.115385,0.357143,0.302326,0.481481,0.261538,0.273381,0.254673,0.391304,0.266027,0.198556,0.307692,0.188119,1.416667,0.04878,0.461538,0.380952,0.333333,0.340909
2020-02-01,-0.274678,0.120482,0.210526,-0.353846,0.285714,0.169492,0.242424,0.238095,0.253165,0.119565,0.205128,0.227064,0.149425,-0.052632,0.202381,0.191667,0.170732,0.305085,0.115456,0.203125,0.246986,0.171687,0.352941,0.2,0.0,0.488372,-0.071429,0.16092,1.25,0.355932


In [502]:
dayUnique_gr_2 = dayUnique_gr_2.reset_index()

In [503]:
listofcolumns = []
for i in dayUnique_gr_2.columns:
    if i != "date":
        listofcolumns.append(i)

dayUnique_gr_2_melt = pd.melt(dayUnique_gr_2, id_vars=['date'],
                                value_vars= listofcolumns, 
                                var_name='ProvinceName')

In [504]:
dayUnique_gr_2_melt.head()

,date,ProvinceName,value
0,2020-01-28,上海市,0.811321
1,2020-01-29,上海市,0.0
2,2020-01-30,上海市,0.166667
3,2020-01-31,上海市,1.080357
4,2020-02-01,上海市,-0.274678


In [505]:
px.line(dayUnique_gr_2_melt, x="date", y="value",color ='ProvinceName',
        title='Growth rate of coronavirus in each province')

In [506]:
px.bar(dayUnique_gr_2_melt,x="value", y="ProvinceName", color='ProvinceName', 
       orientation='h', height=800,title='Growth rate of coronavirus in each province',
       animation_frame="date",range_x = [0,2])