### Coronavirus Data Analysis, 10th April 2020

Coronaviruses (CoV) are a large family of viruses that cause illness ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS-CoV) and Severe Acute Respiratory Syndrome (SARS-CoV). A novel coronavirus (nCoV) is a new strain that has not been previously identified in humans.  

Coronaviruses are zoonotic, meaning they are transmitted between animals and people.  Detailed investigations found that SARS-CoV was transmitted from civet cats to humans and MERS-CoV from dromedary camels to humans. Several known coronaviruses are circulating in animals that have not yet infected humans. 

Common signs of infection include respiratory symptoms, fever, cough, shortness of breath and breathing difficulties. In more severe cases, infection can cause pneumonia, severe acute respiratory syndrome, kidney failure and even death. 

Standard recommendations to prevent the spread of infection include regular hand washing, covering the mouth and nose when coughing and sneezing, and thoroughly cooking meat and eggs. Avoid close contact with anyone showing symptoms of respiratory illness such as coughing and sneezing.

Source: WHO

This analysis focuses on data wrangling, data cleaning and data visualization of Coronavirus data as of 10th April 2020.

The dataset is derived from 2019 Novel Coronavirus COVID-19 (2019-nCoV) Data Repository by Johns Hopkins CSSE. 

The repository containing all the Coronavirus datasets is linked below:

https://github.com/CSSEGISandData/COVID-19

1 - Import the libraries used for the analysis

In [294]:
# Use this cell to set up import statements for all of the packages that you
#   plan to use.

# Remember to include a 'magic word' so that your visualizations are plotted
#   inline with the notebook. See this page for more:
#   http://ipython.readthedocs.io/en/stable/interactive/magics.html
# Importing necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import csv
import pycountry_convert as pc
import plotly.express as px
import plotly.io as pio
%matplotlib inline

2 - Load the data from the CSV files in the linked repository

In [295]:
corona_corfirm_file = "time_series_covid19_confirmed_global.csv"
corona_deaths_file = "time_series_covid19_deaths_global.csv"
corona_recovered_file = "time_series_covid19_recovered_global.csv"
counrty_file = "cases_country.csv"
corona_virus_daily = "04-09-2020.csv"

url = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/'
url_country = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/web-data/data/'
url_daily = 'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/'

# on_bad_lines="skip" replaces error_bad_lines=False, which was removed in pandas 2.0
corona_data_daily_df = pd.read_csv(url_daily + corona_virus_daily, on_bad_lines="skip", encoding="utf-8")
corona_corfirm_data_df = pd.read_csv(url + corona_corfirm_file, on_bad_lines="skip", encoding="utf-8")
corona_deaths_data_df = pd.read_csv(url + corona_deaths_file, on_bad_lines="skip", encoding="utf-8")
corona_recovered_data_df = pd.read_csv(url + corona_recovered_file, on_bad_lines="skip", encoding="utf-8")
df_corona_country = pd.read_csv(url_country + counrty_file, on_bad_lines="skip", encoding="utf-8")
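
The daily report file name follows a month-day-year pattern (`04-09-2020.csv`), so the URL for a given date can be built instead of typed by hand. This is a minimal sketch that assumes the repository keeps this naming scheme; `daily_report_url` is a hypothetical helper and not part of the original notebook:

```python
from datetime import date

# Base folder of the daily reports, copied from the cell above.
URL_DAILY = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)


def daily_report_url(day: date) -> str:
    """Return the raw URL of the daily report for `day` (hypothetical helper)."""
    # File names use the MM-DD-YYYY pattern, e.g. 04-09-2020.csv
    return URL_DAILY + day.strftime("%m-%d-%Y") + ".csv"


print(daily_report_url(date(2020, 4, 9)))
```

The resulting string can be passed to `pd.read_csv` in the same way as `url_daily + corona_virus_daily`.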

3 - Show the first 5 rows of df_corona_country

In [296]:
df_corona_country.head()

Unnamed: 0,Country_Region,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active
0,Australia,2020-04-10 08:41:35,-25.0,133.0,6204,54,1472,4678
1,Austria,2020-04-10 08:33:58,47.5162,14.5501,13337,295,5240,7802
2,Canada,2020-04-10 08:41:18,60.001,-95.001,20765,510,5311,0
3,China,2020-04-10 01:51:07,30.5928,114.3055,82924,3340,77758,1826
4,Denmark,2020-04-10 08:33:58,56.0,10.0,5830,237,1892,3701


4 - Rename some countries so that they can be matched to their continents

In [297]:
df_corona_country = df_corona_country.rename(columns={"Country_Region": "country"})

df_corona_country.loc[df_corona_country['country'] == "US", "country"] = "USA"
df_corona_country.loc[df_corona_country['country'] == "Korea, South", "country"] = "South Korea"
df_corona_country.loc[df_corona_country['country'] == "Taiwan*", "country"] = "Taiwan"
df_corona_country.loc[df_corona_country['country'] == "Congo (Kinshasa)", "country"] = "Democratic Republic of the Congo"
df_corona_country.loc[df_corona_country['country'] == "Congo (Brazzaville)", "country"] = "Republic of the Congo"
df_corona_country.loc[df_corona_country['country'] == "Cote d'Ivoire", "country"] = "Côte d'Ivoire"
df_corona_country.loc[df_corona_country['country'] == "Reunion", "country"] = "Réunion"
df_corona_country.loc[df_corona_country['country'] == "Bahamas, The", "country"] = "Bahamas"
df_corona_country.loc[df_corona_country['country'] == "Gambia, The", "country"] = "Gambia"
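
The same renames can be written more compactly with a dictionary and `Series.replace`. The sketch below uses a toy frame and only a subset of the names above, so it illustrates the pattern and is not a drop-in replacement for the cell:

```python
import pandas as pd

# Toy frame with a few of the names used in the source data.
df = pd.DataFrame({"country": ["US", "Korea, South", "Taiwan*", "Italy"]})

# Old name -> new name; a subset of the renames in the cell above.
renames = {
    "US": "USA",
    "Korea, South": "South Korea",
    "Taiwan*": "Taiwan",
}

# Series.replace swaps exact matches and leaves every other value untouched.
df["country"] = df["country"].replace(renames)
print(df["country"].tolist())
```

Keeping the mapping in one dictionary makes it easier to review which names are changed.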


5 - Collect all country names into an array

In [298]:
countries = np.asarray(df_corona_country["country"])

6 - Define all continents

In [299]:
continents = {
    'NA': 'North America',
    'SA': 'South America', 
    'AS': 'Asia',
    'OC': 'Australia',
    'AF': 'Africa',
    'EU' : 'Europe',
    'na' : 'Others'
}

7 - Define a function that maps each country to its continent code, then add a continent column

In [300]:
def country_to_continent_code(country):
    try:
        return pc.country_alpha2_to_continent_code(pc.country_name_to_country_alpha2(country))
    except Exception:  # any failed lookup falls back to 'Others'
        return 'na'

In [301]:
df_corona_country.insert(1,"continent",  [continents[country_to_continent_code(country)] for country in countries])
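
To show how the fallback behaves without installing `pycountry_convert`, the sketch below swaps the library lookup for a tiny hand-written dictionary. `TOY_CODES` is a hypothetical stand-in used only for illustration. Any name it does not know is labelled `Others`, as in the function above:

```python
# Continent codes -> display names, as defined in step 6.
continents = {
    "NA": "North America", "SA": "South America", "AS": "Asia",
    "OC": "Australia", "AF": "Africa", "EU": "Europe", "na": "Others",
}

# Hypothetical stand-in for the pycountry_convert lookup (illustration only).
TOY_CODES = {"Austria": "EU", "China": "AS", "Canada": "NA"}


def country_to_continent_code(country):
    # Unknown names fall back to 'na', mirroring the except branch above.
    return TOY_CODES.get(country, "na")


names = ["Austria", "China", "Diamond Princess"]
labels = [continents[country_to_continent_code(n)] for n in names]
print(labels)
```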

8 - Show the first 5 rows of df_corona_country

In [302]:
df_corona_country.head()

Unnamed: 0,country,continent,Last_Update,Lat,Long_,Confirmed,Deaths,Recovered,Active
0,Australia,Australia,2020-04-10 08:41:35,-25.0,133.0,6204,54,1472,4678
1,Austria,Europe,2020-04-10 08:33:58,47.5162,14.5501,13337,295,5240,7802
2,Canada,North America,2020-04-10 08:41:18,60.001,-95.001,20765,510,5311,0
3,China,Asia,2020-04-10 01:51:07,30.5928,114.3055,82924,3340,77758,1826
4,Denmark,Europe,2020-04-10 08:33:58,56.0,10.0,5830,237,1892,3701


9 - Recalculate the number of active cases for every country

In [303]:
df_corona_country['Active'] = df_corona_country['Confirmed'] - df_corona_country['Deaths'] - df_corona_country['Recovered']
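
The effect of the recalculation can be checked on three rows copied from the `head()` output above. Canada is reported with 0 active cases there, although its confirmed, death and recovered counts imply otherwise. A small sketch:

```python
import pandas as pd

# Three rows copied from the df_corona_country.head() output above.
df = pd.DataFrame({
    "country": ["Australia", "Austria", "Canada"],
    "Confirmed": [6204, 13337, 20765],
    "Deaths": [54, 295, 510],
    "Recovered": [1472, 5240, 5311],
    "Active": [4678, 7802, 0],  # Canada is reported as 0
})

# Active = Confirmed - Deaths - Recovered, computed column-wise.
df["Active"] = df["Confirmed"] - df["Deaths"] - df["Recovered"]
print(df[["country", "Active"]])
```

Australia and Austria keep their reported values, while Canada's value is replaced by the derived one.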

10 - Aggregate the country totals by continent

In [304]:
df_continents = df_corona_country.copy()
df_continents.index = df_continents["country"]
df_continents = df_continents.drop(['Lat','Long_','country','Last_Update'],axis =1)
df_continents = df_continents.groupby(["continent"]).sum()
df_continents = df_continents.reset_index()

In [305]:
df_continents

Unnamed: 0,continent,Confirmed,Deaths,Recovered,Active
0,Africa,12365,632,1632,10101
1,Asia,264361,9921,129488,124952
2,Australia,7505,56,1845,5604
3,Europe,782840,65904,185717,531219
4,North America,497694,17653,32696,447345
5,Others,1209,22,698,489
6,South America,39574,1620,4085,33869
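
The aggregation in step 10 can be sketched on three rows taken from the earlier output. Selecting the numeric columns explicitly before `sum()` is an assumption of this sketch, not what the original cell does. It avoids having to drop the text columns first:

```python
import pandas as pd

# Rows copied from the df_corona_country.head() output.
df = pd.DataFrame({
    "country": ["Austria", "Denmark", "China"],
    "continent": ["Europe", "Europe", "Asia"],
    "Confirmed": [13337, 5830, 82924],
    "Deaths": [295, 237, 3340],
})

# Select the numeric columns explicitly so text columns never reach sum().
totals = df.groupby("continent")[["Confirmed", "Deaths"]].sum().reset_index()
print(totals)
```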


11 - Show Confirmed Numbers in each continent

In [306]:
fig = px.treemap(df_continents, 
                 path=["continent"], values="Confirmed", height=700,
                 title='Number of Confirmed Cases In Each Continent',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image1.png")
fig.show()
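
A treemap sizes each tile in proportion to its `values` entry, so the picture can be cross-checked numerically. The sketch below recomputes each continent's share of the confirmed total from the `df_continents` output shown earlier. The `Share_%` column name is an assumption introduced for this example:

```python
import pandas as pd

# Confirmed totals copied from the df_continents output above.
df_continents = pd.DataFrame({
    "continent": ["Africa", "Asia", "Australia", "Europe",
                  "North America", "Others", "South America"],
    "Confirmed": [12365, 264361, 7505, 782840, 497694, 1209, 39574],
})

# Share of the overall total, in percent, rounded to one decimal place.
total = df_continents["Confirmed"].sum()
df_continents["Share_%"] = (df_continents["Confirmed"] / total * 100).round(1)
print(df_continents.sort_values("Confirmed", ascending=False))
```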

12 - Show Death Numbers in each continent

In [307]:
fig = px.treemap(df_continents, 
                 path=["continent"], values="Deaths", height=700,
                 title='Number of Death Cases In Each Continent',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image2.png")
fig.show()

13 - Show Recovered Numbers in each continent

In [308]:
fig = px.treemap(df_continents, 
                 path=["continent"], values="Recovered", height=700,
                 title='Number of Recovered Cases In Each Continent',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image3.png")
fig.show()

14 - Show Active Numbers in each continent

In [309]:
fig = px.treemap(df_continents, 
                 path=["continent"], values="Active", height=700,
                 title='Number of Active Cases In Each Continent',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image4.png")
fig.show()

16 - Show Confirmed Numbers for the countries within each continent

In [310]:
df_countries = df_corona_country.copy()

fig = px.treemap(df_countries, 
                 path=["continent","country"], values="Confirmed", height=700,
                 title='Number of Confirmed Cases by Continent and Country',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image5.png")
fig.show()

17 - Show Death Numbers for the countries within each continent

In [311]:
df_countries = df_corona_country.copy()

fig = px.treemap(df_countries, 
                 path=["continent","country"], values="Deaths", height=700,
                 title='Number of Death Cases by Continent and Country',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image6.png")
fig.show()

18 - Show Recovered Numbers for the countries within each continent

In [312]:
df_countries = df_corona_country.copy()

fig = px.treemap(df_countries, 
                 path=["continent","country"], values="Recovered", height=700,
                 title='Number of Recovered Cases by Continent and Country',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image7.png")
fig.show()

19 - Show Active Numbers for the countries within each continent

In [313]:
df_countries = df_corona_country.copy()

fig = px.treemap(df_countries, 
                 path=["continent","country"], values="Active", height=700,
                 title='Number of Active Cases by Continent and Country',
                 color_discrete_sequence = px.colors.qualitative.Prism)
fig.data[0].textinfo = 'label+text+value'
pio.write_image(fig, "images/image8.png")
fig.show()

20 - Drop the Province/State column

In [314]:
corona_corfirm_data_df.drop(['Province/State'], axis=1, inplace=True) 
corona_deaths_data_df.drop(['Province/State'], axis=1, inplace=True) 
corona_recovered_data_df.drop(['Province/State'], axis=1, inplace=True) 

21 - Sum the values by Country/Region in the confirmed, deaths and recovered dataframes

In [315]:
corona_corfirm_time_data_df = corona_corfirm_data_df.groupby(['Country/Region']).sum().reset_index()
corona_death_time_data_df = corona_deaths_data_df.groupby(['Country/Region']).sum().reset_index()
corona_recovered_time_data_df = corona_recovered_data_df.groupby(['Country/Region']).sum().reset_index()

22 - Remove both Lat and Long columns from all these dataframes

In [316]:
corona_corfirm_time_data_df.drop(['Lat', 'Long'], axis=1, inplace=True)
corona_death_time_data_df.drop(['Lat', 'Long'], axis=1, inplace=True)
corona_recovered_time_data_df.drop(['Lat', 'Long'], axis=1, inplace=True)

23 - Rename the Country/Region column to Country

In [317]:
corona_corfirm_time_data_df.rename(columns={'Country/Region': 'Country'}, inplace=True)
corona_death_time_data_df.rename(columns={'Country/Region': 'Country'}, inplace=True)
corona_recovered_time_data_df.rename(columns={'Country/Region': 'Country'}, inplace=True)

24 - Convert the date column labels to date objects

In [318]:
corona_corfirm_time_data_df.columns = corona_corfirm_time_data_df.columns[:1].tolist() + pd.to_datetime(corona_corfirm_time_data_df.columns[1:]).date.tolist()
corona_death_time_data_df.columns = corona_death_time_data_df.columns[:1].tolist() + pd.to_datetime(corona_death_time_data_df.columns[1:]).date.tolist()
corona_recovered_time_data_df.columns = corona_recovered_time_data_df.columns[:1].tolist() + pd.to_datetime(corona_recovered_time_data_df.columns[1:]).date.tolist()
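
The label conversion is easier to follow on a toy frame. The sketch below keeps the first label and parses the remaining ones into `datetime.date` objects. The explicit `format` argument is an assumption that matches the month/day/two-digit-year labels used in this example, and the values are made up:

```python
import datetime
import pandas as pd

# Toy wide frame (made-up values): a label column followed by date-like labels.
wide = pd.DataFrame(
    [["A", 0, 0], ["B", 0, 1]],
    columns=["Country", "1/22/20", "1/23/20"],
)

# Keep the first label as-is and parse the rest into datetime.date objects.
parsed = pd.to_datetime(wide.columns[1:], format="%m/%d/%y").date.tolist()
wide.columns = wide.columns[:1].tolist() + parsed
print(wide.columns.tolist())
```

After this step the columns can be selected with the `date` objects that `pd.date_range(...).date` produces.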

25 - Write a function that melts each dataframe into a long format that is easier to plot

In [319]:
def meltDataFrame(df,id_vars,value_vars,var_name,value_name):
    return pd.melt(df,
        id_vars= id_vars,
        value_vars = value_vars,
        var_name= var_name,
        value_name= value_name)

In [320]:
corona_corfirm_time_data_df_melt = meltDataFrame(corona_corfirm_time_data_df.reset_index(),
        ['Country'],pd.date_range('1/22/20', '4/09/20', freq='D').date,'Date','Total_Corfirmed')

corona_death_time_data_df_melt = meltDataFrame(corona_death_time_data_df.reset_index(),
        ['Country'],pd.date_range('1/22/20', '4/09/20', freq='D').date,'Date','Total_Death')

corona_recovered_time_data_df_melt = meltDataFrame(corona_recovered_time_data_df.reset_index(),
        ['Country'],pd.date_range('1/22/20', '4/09/20', freq='D').date,'Date','Total_Recovered')
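
What the melt step does can be seen on a toy frame with made-up numbers: every date column becomes a row, so the frame goes from one row per country to one row per country and day. A minimal sketch using `pd.melt` directly:

```python
import pandas as pd

# Toy wide frame (made-up numbers): one row per country, one column per day.
wide = pd.DataFrame({
    "Country": ["A", "B"],
    "1/22/20": [1, 0],
    "1/23/20": [3, 2],
})

# id_vars stay as columns; each value_vars column is unpivoted into rows.
long = pd.melt(
    wide,
    id_vars=["Country"],
    value_vars=["1/22/20", "1/23/20"],
    var_name="Date",
    value_name="Total_Corfirmed",
)
print(long)
```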

26 - Write a function for merging the dataframes on Country and Date

In [321]:
def mergeDataFrame(df_melt_1,df_melt_2,how,left_on,right_on):
    return pd.merge(df_melt_1, 
                    df_melt_2,  
                    how=how, 
                    left_on=left_on, 
                    right_on = right_on)

corona_df = mergeDataFrame(corona_corfirm_time_data_df_melt, 
                    corona_death_time_data_df_melt,  
                    'left', 
                    ['Country','Date'], 
                    ['Country','Date'])


corona_df = mergeDataFrame(corona_df, 
                    corona_recovered_time_data_df_melt,  
                    'left', 
                    ['Country','Date'], 
                    ['Country','Date'])

corona_df.rename(columns={'Date': 'Day_Date'}, inplace=True)
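
A left merge keeps every row of the first frame, even when the second frame has no matching Country and Date pair. The sketch below uses made-up numbers to show the resulting `NaN`. It may be worth checking for this before plotting:

```python
import pandas as pd

# Toy long frames (made-up numbers).
confirmed = pd.DataFrame({
    "Country": ["A", "A", "B"],
    "Date": ["1/22/20", "1/23/20", "1/22/20"],
    "Total_Corfirmed": [1, 3, 0],
})
deaths = pd.DataFrame({
    "Country": ["A", "A"],
    "Date": ["1/22/20", "1/23/20"],
    "Total_Death": [0, 1],
})

# how='left' keeps every confirmed row; B has no deaths row, so it gets NaN.
merged = pd.merge(confirmed, deaths, how="left", on=["Country", "Date"])
print(merged)
```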

27 - Total the confirmed, death and recovered counts per country from the daily report

In [322]:
corona_data_df_daily_corfirmed = corona_data_daily_df.groupby(['Country_Region'])['Confirmed'].sum()
corona_data_df_daily_deaths = corona_data_daily_df.groupby(['Country_Region'])['Deaths'].sum()
corona_data_df_daily_recovered = corona_data_daily_df.groupby(['Country_Region'])['Recovered'].sum()

28 - Find the 10 countries or regions with the most confirmed cases, deaths and recoveries

In [323]:
df_confirmed = corona_data_df_daily_corfirmed.sort_values(ascending=False).head(10).reset_index()
df_deaths = corona_data_df_daily_deaths.sort_values(ascending=False).head(10).reset_index()
df_recovered = corona_data_df_daily_recovered.sort_values(ascending=False).head(10).reset_index()
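
For a top-N selection, `Series.nlargest` is a shorter alternative to sorting and taking the head. The sketch below uses made-up totals and gives the same result as `sort_values(ascending=False).head(2)` for this data:

```python
import pandas as pd

# Toy per-country totals (made-up numbers).
totals = pd.Series({"A": 50, "B": 200, "C": 120, "D": 10}, name="Confirmed")
totals.index.name = "Country_Region"

# Pick the two largest values, then turn the index back into a column.
top2 = totals.nlargest(2).reset_index()
print(top2)
```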

In [324]:
for i in range(df_confirmed.shape[0]):
    print(str(df_confirmed['Country_Region'][i]))

US
Spain
Italy
France
Germany
China
Iran
United Kingdom
Turkey
Belgium


In [325]:
for i in range(df_deaths.shape[0]):
    print(str(df_deaths['Country_Region'][i]))

Italy
US
Spain
France
United Kingdom
Iran
China
Germany
Belgium
Netherlands


In [326]:
for i in range(df_recovered.shape[0]):
    print(str(df_recovered['Country_Region'][i]))

China
Germany
Spain
Iran
Italy
US
France
Switzerland
Korea, South
Austria


In [327]:
corona_confirmed_df = corona_df[corona_df['Country'].isin([ str(df_confirmed['Country_Region'][i]) for i in range(df_confirmed.shape[0]) ])]
corona_deaths_df = corona_df[corona_df['Country'].isin([ str(df_deaths['Country_Region'][i]) for i in range(df_deaths.shape[0]) ])]
corona_recovered_df = corona_df[corona_df['Country'].isin([ str(df_recovered['Country_Region'][i]) for i in range(df_recovered.shape[0]) ])]
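
`isin` accepts a column directly, so the filter above can be written without an index-based list comprehension. A small sketch with made-up data, where `df_top` is a hypothetical stand-in for one of the top-10 frames:

```python
import pandas as pd

# Toy time series frame and a toy "top countries" frame (made-up data).
corona_df = pd.DataFrame({
    "Country": ["A", "B", "C", "A"],
    "Total_Corfirmed": [1, 2, 3, 4],
})
df_top = pd.DataFrame({"Country_Region": ["A", "C"]})

# Keep only the rows whose Country appears in the top list.
subset = corona_df[corona_df["Country"].isin(df_top["Country_Region"])]
print(subset)
```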

29 - Top 10 countries by confirmed Coronavirus (COVID-19) cases

In [328]:
fig = px.scatter(corona_confirmed_df, x='Day_Date', y='Country', size='Total_Corfirmed', color='Country', 
           color_continuous_scale=px.colors.sequential.Viridis, title="Top 10 Countries by Confirmed Coronavirus (COVID-19) Cases")
fig.update_layout(yaxis = dict(dtick = 1))
fig.update(layout_coloraxis_showscale=False)
pio.write_image(fig, "images/image9.png") 
fig.show()

30 - Top 10 countries by Coronavirus (COVID-19) deaths

In [329]:
fig = px.scatter(corona_deaths_df, x='Day_Date', y='Country', size='Total_Death', color='Country', 
           color_continuous_scale=px.colors.sequential.Viridis, title = "Top 10 Countries by Coronavirus (COVID-19) Deaths")
fig.update_layout(yaxis = dict(dtick = 1))
fig.update(layout_coloraxis_showscale=False)
pio.write_image(fig, "images/image10.png") 
fig.show()

31 - Top 10 countries by recovered Coronavirus (COVID-19) cases

In [330]:
fig = px.scatter(corona_recovered_df, x='Day_Date', y='Country', size='Total_Recovered', color='Country', 
           color_continuous_scale=px.colors.sequential.Viridis, title = "Top 10 Countries by Recovered Coronavirus (COVID-19) Cases")
fig.update_layout(yaxis = dict(dtick = 1))
fig.update(layout_coloraxis_showscale=False)
pio.write_image(fig, "images/image11.png") 
fig.show()

32 - Countries with confirmed cases over time

In [331]:
corona_df['Day_Date'] = pd.to_datetime(corona_df['Day_Date'])
corona_df['Day_Date'] = corona_df['Day_Date'].dt.strftime('%m/%d/%Y')

fig = px.choropleth(corona_df, locations="Country", 
                    locationmode='country names', color="Total_Corfirmed",
                    hover_name="Country",
                    color_continuous_scale="aggrnyl",
                    animation_frame="Day_Date",
                    title='Countries with Confirmed Cases')

fig.update_layout(margin=dict(t=80,l=0,r=0,b=0))
pio.write_image(fig, "images/image12.png") 
fig.show()
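
The `animation_frame` argument groups rows by the distinct values of the given column, which is why `Day_Date` is turned into text labels before plotting. The conversion itself can be sketched on two dates:

```python
import datetime
import pandas as pd

# Two sample dates from the range used in the melt step.
df = pd.DataFrame({
    "Day_Date": [datetime.date(2020, 1, 22), datetime.date(2020, 4, 9)],
})

# Parse to datetime64 first, then format as MM/DD/YYYY text.
df["Day_Date"] = pd.to_datetime(df["Day_Date"]).dt.strftime("%m/%d/%Y")
print(df["Day_Date"].tolist())
```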

33 - Countries with deaths over time

In [332]:
fig = px.choropleth(corona_df, locations="Country", 
                    locationmode='country names', color="Total_Death",
                    hover_name="Country", 
                    color_continuous_scale="aggrnyl",
                    animation_frame="Day_Date",
                    title='Countries with Death Cases')

fig.update_layout(margin=dict(t=80,l=0,r=0,b=0))
pio.write_image(fig, "images/image13.png") 
fig.show()

34 - Countries with recovered cases over time

In [333]:
fig = px.choropleth(corona_df, locations="Country", 
                    locationmode='country names', color="Total_Recovered",
                    hover_name="Country", 
                    color_continuous_scale="aggrnyl",
                    animation_frame="Day_Date",
                    title='Countries with Recovered Cases')

fig.update_layout(margin=dict(t=80,l=0,r=0,b=0))
pio.write_image(fig, "images/image14.png") 
fig.show()

35 - Save df_corona_country as a CSV file

In [334]:
df_corona_country.to_csv("files/df_corona_country.csv",encoding='utf-8')
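
With the default `index=True`, `to_csv` writes the row index as an unnamed first column, which pandas labels `Unnamed: 0` when the file is read back. If that column is not wanted, `index=False` leaves it out. A minimal sketch using an in-memory buffer in place of a file on disk:

```python
import io
import pandas as pd

# Two rows copied from the earlier head() output.
df = pd.DataFrame({"country": ["Australia", "Austria"], "Confirmed": [6204, 13337]})

# Write without the row index, then read the text back.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored.columns.tolist())
```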

36 - Show the confirmed, death and recovered numbers of the countries in each continent via Flourish

In [335]:
from IPython.display import HTML, Javascript

# HTML String
html = """
   <div class="flourish-embed flourish-hierarchy" data-src="visualisation/1794815" 
               data-url="https://flo.uri.sh/visualisation/1794815/embed">
       <script src="https://public.flourish.studio/resources/embed.js">
       </script>
   </div>
"""

# Write HTML String to file.html
with open("corona_confirmed.html", "w") as file:
    file.write(html)
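
If more than one Flourish visualisation is embedded, the snippet can be generated from the visualisation id. The sketch below mirrors the markup above. `flourish_embed_html` is a hypothetical helper, and writing to a temporary folder is a choice made only for this example:

```python
import os
import tempfile


def flourish_embed_html(visualisation_id: str) -> str:
    """Build the embed snippet for one visualisation (hypothetical helper)."""
    return (
        '<div class="flourish-embed flourish-hierarchy" '
        f'data-src="visualisation/{visualisation_id}" '
        f'data-url="https://flo.uri.sh/visualisation/{visualisation_id}/embed">\n'
        '    <script src="https://public.flourish.studio/resources/embed.js">'
        "</script>\n"
        "</div>\n"
    )


# Same id as in the cell above; the output folder is temporary.
html = flourish_embed_html("1794815")
path = os.path.join(tempfile.mkdtemp(), "corona_confirmed.html")
with open(path, "w", encoding="utf-8") as file:
    file.write(html)
print(path)
```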