### Novel Corona Virus 2019 Dataset

The dataset used is from https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset.

<b>Data used:</b>
time_series_covid_19_confirmed.csv, time_series_covid_19_deaths.csv and time_series_covid_19_recovered.csv. These files are used to create a bar chart race with Flourish. The data is separated by Province/State. However, not every country is broken down by Province/State, while some countries have several, for example:

| Province/State | Country/Region |
|--|--|
| Australian Capital Territory | Australia |
| New South Wales | Australia |
| Northern Territory | Australia |
| Queensland | Australia |
| South Australia | Australia |
| Tasmania | Australia |
| Victoria | Australia |
| Western Australia | Australia |
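
A quick way to see which countries are split into several rows is to count rows per country. The sketch below uses a small hand-made frame (`sample`, not the real file) that mimics the first two columns of the time series files; the variable names are illustrative assumptions.

```python
import pandas as pd

# Toy stand-in for the first two columns of the time series files (not real data)
sample = pd.DataFrame({
    "Province/State": ["New South Wales", "Victoria", "Queensland", None, None],
    "Country/Region": ["Australia", "Australia", "Australia", "Afghanistan", "Albania"],
})

# Count rows per country; a count above 1 means the country is split by Province/State
rows_per_country = sample["Country/Region"].value_counts()
split_countries = rows_per_country[rows_per_country > 1]
print(split_countries)
```

The same two lines should work on the real dataframe once the CSV has been loaded and before the Province/State column is dropped.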

### Change in cases over time at country level

Import the CSV files using pandas:

In [1]:
import pandas as pd
confirmed_raw_df = pd.read_csv("time_series_covid_19_confirmed.csv")
death_raw_df = pd.read_csv("time_series_covid_19_deaths.csv")
recovered_raw_df = pd.read_csv("time_series_covid_19_recovered.csv")

- Remove the Province/State, Lat and Long columns, as they are not needed to create a race chart of the change in cases and deaths over time at country level.

In [2]:
confirmed_raw_df.drop(["Province/State","Lat","Long"],axis=1,inplace=True)
death_raw_df.drop(["Province/State","Lat","Long"],axis=1,inplace=True)
recovered_raw_df.drop(["Province/State","Lat","Long"],axis=1,inplace=True)

In [3]:
confirmed_raw_df.head(3)

Unnamed: 0,Country/Region,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,...,4/28/20,4/29/20,4/30/20,5/1/20,5/2/20,5/3/20,5/4/20,5/5/20,5/6/20,5/7/20
0,Afghanistan,0,0,0,0,0,0,0,0,0,...,1828,1939,2171,2335,2469,2704,2894,3224,3392,3563
1,Albania,0,0,0,0,0,0,0,0,0,...,750,766,773,782,789,795,803,820,832,842
2,Algeria,0,0,0,0,0,0,0,0,0,...,3649,3848,4006,4154,4295,4474,4648,4838,4997,5182


In [4]:
death_raw_df.head(3)

Unnamed: 0,Country/Region,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,...,4/28/20,4/29/20,4/30/20,5/1/20,5/2/20,5/3/20,5/4/20,5/5/20,5/6/20,5/7/20
0,Afghanistan,0,0,0,0,0,0,0,0,0,...,58,60,64,68,72,85,90,95,104,106
1,Albania,0,0,0,0,0,0,0,0,0,...,30,30,31,31,31,31,31,31,31,31
2,Algeria,0,0,0,0,0,0,0,0,0,...,437,444,450,453,459,463,465,470,476,483


In [5]:
recovered_raw_df.head(3)

Unnamed: 0,Country/Region,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,...,4/28/20,4/29/20,4/30/20,5/1/20,5/2/20,5/3/20,5/4/20,5/5/20,5/6/20,5/7/20
0,Afghanistan,0,0,0,0,0,0,0,0,0,...,228,252,260,310,331,345,397,421,458,468
1,Albania,0,0,0,0,0,0,0,0,0,...,431,455,470,488,519,531,543,570,595,605
2,Algeria,0,0,0,0,0,0,0,0,0,...,1651,1702,1779,1821,1872,1936,1998,2067,2197,2323


- Check whether the Country/Region column contains duplicated values. Duplicated countries are expected because the data is separated by state.
- If there are duplicates, combine the data by country.

In [6]:
print(confirmed_raw_df["Country/Region"].is_unique)
print(death_raw_df["Country/Region"].is_unique)
print(recovered_raw_df["Country/Region"].is_unique)

False
False
False


In [7]:
confirmed_df = confirmed_raw_df.groupby("Country/Region").sum()
death_df = death_raw_df.groupby("Country/Region").sum()
recovered_df = recovered_raw_df.groupby("Country/Region").sum()

print(confirmed_df.index.is_unique)
print(death_df.index.is_unique)
print(recovered_df.index.is_unique)

True
True
True
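
To make the effect of `groupby(...).sum()` concrete, here is a minimal sketch on a toy frame (the numbers are made up for illustration, not taken from the dataset). Rows that share a country are added together date by date, and the country becomes the index.

```python
import pandas as pd

# Toy data: Australia appears twice, as it would with two states
toy = pd.DataFrame({
    "Country/Region": ["Australia", "Australia", "Albania"],
    "1/22/20": [1, 2, 0],
    "1/23/20": [3, 4, 5],
})

# One row per country; each date column holds the sum over that country's rows
by_country = toy.groupby("Country/Region").sum()
print(by_country)
```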


Add an image URL column to each dataframe so that the country flag can be shown in the bar chart:

To show a country's flag, its alpha-2 country code is needed, because the image URL has the form https://www.countryflags.io/sg/flat/64.png (countryflags.io/<alpha 2 country code>/flat/64.png)

- The list of country codes is from https://gist.github.com/marijn/396531/188caa065e3cd319fed7913ee3eecf5eec541918

<img src="https://www.countryflags.io/sg/flat/64.png">
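
The URL pattern above can be wrapped in a small helper. `flag_url` is a hypothetical name introduced here for illustration; the `flat` style and `64` size are simply the values used in the example URL.

```python
def flag_url(alpha_2_code, style="flat", size=64):
    """Build a flag image URL following the countryflags.io pattern shown above."""
    return f"https://www.countryflags.io/{alpha_2_code}/{style}/{size}.png"


print(flag_url("sg"))
```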


In [8]:
alpha_2 = pd.read_csv("https://gist.githubusercontent.com/marijn/396531/raw/188caa065e3cd319fed7913ee3eecf5eec541918/countries.txt", 
                      sep ='|', 
                      header=None
                     )

alpha_2.columns = ["alpha", "country"]
alpha_2 = alpha_2.set_index("country")
alpha_2.loc["Afghanistan"]["alpha"]

'AF'
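
One thing to watch for when reading a list of country codes: by default `pd.read_csv` treats the string `NA` as a missing value, and `NA` is also the alpha-2 code for Namibia. The sketch below uses two in-memory rows in the same `code|name` layout (an assumption about how the list looks, not the real file) to show the difference that `keep_default_na=False` makes.

```python
import io

import pandas as pd

# Two rows in the same pipe-separated layout, held in memory for the demo
text = "AF|Afghanistan\nNA|Namibia\n"

# Default behaviour: the code "NA" is parsed as a missing value
default = pd.read_csv(io.StringIO(text), sep="|", header=None,
                      names=["alpha", "country"])

# With keep_default_na=False the string "NA" is kept as text
kept = pd.read_csv(io.StringIO(text), sep="|", header=None,
                   names=["alpha", "country"], keep_default_na=False)

print(default["alpha"].isna().tolist())
print(kept["alpha"].tolist())
```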

In [9]:
import numpy as np

# Country names in the COVID data that do not match the names in the alpha-2 list
overrides = {"US": "US", "Iran": "IR", "Korea, South": "KR", "Russia": "RU"}

alpha_2_covid = []
for country in confirmed_df.index:
    try:
        code = overrides.get(country) or alpha_2.loc[country, "alpha"]
        alpha_2_covid.append("https://www.countryflags.io/" + code + "/flat/64.png")
    except (KeyError, TypeError):
        # No entry for this country name (KeyError) or no usable code (TypeError)
        alpha_2_covid.append(np.nan)

# This assumes all three dataframes list the same countries in the same order
confirmed_df["image"] = alpha_2_covid
death_df["image"] = alpha_2_covid
recovered_df["image"] = alpha_2_covid
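
Countries whose name was not found end up with a missing image. A short check like the one below lists them so they can be handled by hand; it is sketched on a toy frame with made-up rows rather than the real `confirmed_df`.

```python
import numpy as np
import pandas as pd

# Toy frame: one country with a flag URL, one without (illustrative only)
toy = pd.DataFrame(
    {"image": ["https://www.countryflags.io/AF/flat/64.png", np.nan]},
    index=pd.Index(["Afghanistan", "Burma"], name="Country/Region"),
)

# Index labels of the rows where no flag URL could be built
missing = toy.index[toy["image"].isna()].tolist()
print(missing)
```

Any name that shows up in this list is a candidate for an extra special case alongside the ones already handled above.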

In [10]:
confirmed_df.to_csv("confirmed by country.csv")
death_df.to_csv("death by country.csv")
recovered_df.to_csv("recovered by country.csv")