## Note

#### My new kernel on the coronavirus can be found below.

[Analysing Coronavirus on Interactive Dashboard](https://www.kaggle.com/fatihbilgin/analysing-coronavirus-on-interactive-dashboard)

# Introduction

The novel coronavirus (provisionally named 2019-nCoV) is a contagious virus that causes respiratory infection. It has been identified as the causative agent of the ongoing 2019–20 Wuhan coronavirus outbreak. 

As many early cases were linked to a large seafood and animal market, the virus is thought to have a zoonotic origin, but this has not been confirmed. Comparisons of the genetic sequences of this virus and other virus samples have shown similarities to SARS-CoV (79.5%) and bat coronaviruses (96%), which makes an ultimate origin in bats likely.

The first known human infection occurred on December 8, 2019. An outbreak of 2019-nCoV was first detected in Wuhan, China, in mid-December 2019. The virus subsequently spread to all other provinces of China and to more than twenty other countries in Asia, Europe, North America, and Oceania. Human-to-human spread of the virus has been confirmed in China, Germany, Thailand, Taiwan, Japan, and the United States.

As of 1 February 2020, there were 12,024 confirmed cases of infection, of which 11,860 were within mainland China. To date, cases outside China have involved people who either travelled from Wuhan or were in direct contact with someone who travelled from the area. The number of deaths was 259 as of 1 February 2020.

Source: https://en.wikipedia.org/wiki/Novel_coronavirus_(2019-nCoV)

<img src="https://i.ibb.co/txCZFvr/3-D-medical-animation-coronavirus-structure.jpg" width="800">
* **Source: Scientific Animations (CC BY-SA 4.0)**<br>
see [Scientific Animations](http://www.scientificanimations.com/wiki-images/)

In [None]:
import numpy as np 
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly as py
import plotly.express as px
import plotly.graph_objs as go
from plotly.subplots import make_subplots
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot
init_notebook_mode(connected=True) 

import warnings
warnings.filterwarnings('ignore')

In [None]:
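# First dataset (January 21-31): keep the update date as a 'YYYY-MM-DD' string for the animation frames
df_data = pd.read_csv("/kaggle/input/2019-coronavirus-dataset-01212020-01262020/2019_nCoV_20200121_20200131.csv", parse_dates=["Last Update"])
df_data["UpdateDate"] = df_data["Last Update"].dt.date.astype(str)
# Second dataset (used for the charts up to February 17, 2020): rename "Mainland China" to "China"
df_data2 = pd.read_csv("../input/coronavirus-disease-covid19-dataset/2020_nCoV_data.csv", parse_dates=["Last Update", "Date"])
df_data2["Country"] = df_data2["Country"].str.replace("Mainland China", "China", regex=False)
df_data.head()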

Hubei is a province of China whose capital is Wuhan. This dataset currently contains data up to January 31, and the first row above shows Wuhan's data for the last day of January.
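To confirm which day a dataset ends on, the parsed `Last Update` column can simply be maximised. The sketch below uses two hypothetical rows (not the real dataset) and repeats the notebook's `parse_dates` plus `.dt.date.astype(str)` conversion, which drops the time of day and keeps the calendar date as a string.

```python
import pandas as pd
from io import StringIO

# Hypothetical rows mimicking the "Last Update" column; the numbers are made up.
csv_text = """Province/State,Country/Region,Last Update,Confirmed
Hubei,Mainland China,1/31/2020 19:00,100
Hubei,Mainland China,1/30/2020 21:30,90
"""

df = pd.read_csv(StringIO(csv_text), parse_dates=["Last Update"])
# Keep only the calendar date, as a 'YYYY-MM-DD' string (like the notebook's "UpdateDate").
df["UpdateDate"] = df["Last Update"].dt.date.astype(str)

latest = df["Last Update"].max()
print(latest.date())              # 2020-01-31
print(df["UpdateDate"].tolist())  # ['2020-01-31', '2020-01-30']
```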

In [None]:
df_data.describe().T

The max row shows Wuhan's status on January 31, except for "Suspected": that maximum belongs to Hong Kong, as shown below.
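`describe()` computes each column's maximum independently, so the values in its `max` row can come from different rows. A small sketch on hypothetical toy data shows how `idxmax()` identifies the row each maximum belongs to.

```python
import pandas as pd

# Hypothetical toy data: the column-wise maxima come from different rows.
df = pd.DataFrame({
    "Province/State": ["Hubei", "Hong Kong", "Zhejiang"],
    "Confirmed": [100, 10, 20],
    "Suspected": [0, 50, 5],
})

# describe() reports each numeric column's max on its own...
print(df.describe().loc["max"])

# ...so idxmax() is needed to see which row holds each maximum.
for col in ["Confirmed", "Suspected"]:
    print(col, "->", df.loc[df[col].idxmax(), "Province/State"])
```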

In [None]:
df_data[df_data["Suspected"]>=1].sort_values("Suspected", ascending=False).head()

In [None]:
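# Sum the numeric columns per country and date (numeric_only avoids summing text/datetime columns),
# then keep only each country's latest date
df_countries = df_data2.groupby(['Country', 'Date']).sum(numeric_only=True).reset_index().sort_values('Date', ascending=False)
df_countries = df_countries.drop_duplicates(subset=['Country'])
df_countriesConf = df_countries[df_countries["Confirmed"] > 0]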
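The cell above keeps each country's latest row by sorting on `Date` in descending order and dropping duplicate countries. The sketch below, on hypothetical countries "A" to "C" with made-up numbers, shows that pattern next to an equivalent alternative based on `groupby(...).idxmax()`; the second form is not what the notebook uses, just another way to express the same selection.

```python
import pandas as pd

# Hypothetical per-country, per-date totals (illustrative numbers only).
df_countries = pd.DataFrame({
    "Country": ["A", "A", "B", "B", "C"],
    "Date": pd.to_datetime(["2020-02-16", "2020-02-17",
                            "2020-02-16", "2020-02-17", "2020-02-17"]),
    "Confirmed": [100, 120, 5, 6, 0],
})

# Pattern used in the notebook: newest first, then keep the first row per country.
latest_a = (df_countries.sort_values("Date", ascending=False)
                        .drop_duplicates(subset=["Country"]))

# Alternative: take the row label of the latest Date within each country.
latest_b = df_countries.loc[df_countries.groupby("Country")["Date"].idxmax()]

# Countries without confirmed cases are then filtered out.
confirmed = latest_b[latest_b["Confirmed"] > 0]
print(confirmed[["Country", "Confirmed"]])
```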

In [None]:
df_countriesConf

In [None]:
# Choropleth of the latest confirmed cases per country
data = [dict(
        type='choropleth',
        locations=df_countriesConf['Country'],
        locationmode='country names',
        z=df_countriesConf['Confirmed'],
        colorscale=[
            [0.0, "rgb(251, 237, 235)"],
            [0.09, "rgb(245, 211, 206)"],
            [0.12, "rgb(239, 179, 171)"],
            [0.15, "rgb(236, 148, 136)"],
            [0.22, "rgb(239, 117, 100)"],
            [0.35, "rgb(235, 90, 70)"],
            [0.45, "rgb(207, 81, 61)"],
            [0.65, "rgb(176, 70, 50)"],
            [0.85, "rgb(147, 59, 39)"],
            [1.00, "rgb(110, 47, 26)"]],
        autocolorscale=False,
        reversescale=False,
        marker=dict(line=dict(color='rgb(180,180,180)', width=0.5)),
        colorbar=dict(tickprefix='', title='Confirmed'),
)]

layout = dict(
    title="Last Confirmed Cases (Till February 17, 2020)",
    width=500, height=400,  # figure size is a layout property; geo has no width/height
    geo=dict(
        showframe=False,
        showcoastlines=True,
        projection=dict(type='mercator'),  # plotly projection names are lowercase
    ),
)

w_map = dict(data=data, layout=layout)
iplot(w_map, validate=False)

By the end of February 17, 2020, there were more than 72,360 confirmed coronavirus cases in China. In the rest of the world, the virus had appeared on all continents except South America.

In [None]:
# Total the numeric columns per country and timestamp, then keep each country's daily maximum
df_countrybydate = df_data.groupby(['Country/Region', 'Last Update', 'UpdateDate']).sum(numeric_only=True).reset_index().sort_values('Last Update', ascending=False)
df_countrybydate = df_countrybydate.groupby(['Country/Region', 'UpdateDate']).max().reset_index().sort_values('Last Update')
df_countrybydate["Size"] = np.where(df_countrybydate['Country/Region']=='Mainland China', df_countrybydate['Confirmed'], df_countrybydate['Confirmed']*200)

For the bubble size, I multiplied the confirmed count by 200 for every country except Mainland China; otherwise China's bubble would dwarf all the others on the chart.
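The scaling described above can be checked on a few hypothetical rows (made-up counts; only their ratio matters). The sketch reuses the notebook's `np.where` expression and shows how the 200x factor shifts the relative `Size` values.

```python
import numpy as np
import pandas as pd

# Hypothetical confirmed counts, chosen only to illustrate the ratio.
df = pd.DataFrame({
    "Country/Region": ["Mainland China", "Country A", "Country B"],
    "Confirmed": [10000, 20, 5],
})

SCALE = 200  # factor the notebook applies to every country except Mainland China
df["Size"] = np.where(df["Country/Region"] == "Mainland China",
                      df["Confirmed"], df["Confirmed"] * SCALE)
print(df)
# Unscaled, Country A's Size would be 20 / 10000 = 0.2% of China's;
# scaled, it is 4000 / 10000 = 40%.
```

One consequence of this design choice: bubble sizes are no longer comparable between China and the other countries, so absolute counts should be read from the color scale, which is driven by the raw `Confirmed` values.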

In [None]:
fig = px.scatter_geo(df_countrybydate, locations="Country/Region", locationmode="country names",
                     hover_name="Country/Region", size="Size", color="Confirmed",
                     animation_frame="UpdateDate",
                     projection="natural earth",
                     title="Progression of Coronavirus in Confirmed Cases in January 2020", template="none")
fig.show()

The interactive map above shows the day-by-day spread of the virus over the last 11 days of January 2020. Click the play button to animate it.

In [None]:
df_provincebydate = df_data2.groupby(['Province/State', 'Date']).max().reset_index().sort_values('Date', ascending=False)
df_CHProvinces = df_provincebydate[df_provincebydate['Country']=="China"]
df_CHProvinces = df_CHProvinces.drop_duplicates(subset = ['Province/State']).sort_values("Province/State")
# Exclude Macau and Taiwan, which appear as separate countries in the dataset
df_CHProvinces = df_CHProvinces[~df_CHProvinces['Province/State'].isin(['Macau', 'Taiwan'])]
df_CHProvinces = df_CHProvinces[df_CHProvinces["Confirmed"]>0]

df_CHRecDead = df_CHProvinces.loc[:,["Province/State", "Recovered", "Deaths"]]
df_CHRecDeadHb = df_CHRecDead[df_CHRecDead["Province/State"]=="Hubei"]

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(
                x=df_CHProvinces["Province/State"],
                y=df_CHProvinces["Confirmed"],
                marker_color='darkorange',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=2, 
                opacity=0.7)
             )

fig.update_layout(
    title_text='Confirmed Cases on Provinces of China (Till February 17, 2020)',
    height=700, width=800, xaxis_title='Province/State', yaxis_title='Confirmed')

fig.show()

Most cases occurred in Hubei, whose capital is Wuhan. This is expected, since the outbreak began in Wuhan. The other Chinese provinces saw far fewer cases than Hubei, but their counts are still very high compared with other countries.
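Hubei's dominance can be expressed as its share of the national total. A minimal sketch on hypothetical province totals (placeholder provinces, made-up numbers) divides each province's `Confirmed` value by the sum.

```python
import pandas as pd

# Hypothetical province totals (illustrative only, not the dataset's values).
provinces = pd.DataFrame({
    "Province/State": ["Hubei", "Province A", "Province B", "Province C"],
    "Confirmed": [800, 60, 50, 40],
})

total = provinces["Confirmed"].sum()
# Each province's share of all confirmed cases in the table.
share = provinces.set_index("Province/State")["Confirmed"] / total
print(share.round(3))
print(f"Hubei share: {share['Hubei']:.1%}")  # Hubei share: 84.2%
```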

Zooming in on the chart, the provinces of China appear as follows.

<img src="https://i.ibb.co/9NHyL0j/CHProvinces.jpg" width="600">

In [None]:
colors = ['mediumturquoise', 'orangered']
columns = list(df_CHRecDeadHb.iloc[:,1:3])
values = df_CHRecDeadHb.iloc[:,1:3].values.tolist()[0]

fig = go.Figure(data=[go.Pie(labels=columns, 
                             values=values , hole=.3)]
               )

fig.update_traces(hoverinfo='label+percent+value', textinfo='label+percent', textfont_size=18,
                  marker=dict(colors=colors, line=dict(color='#000000', width=2))
                 )

fig.update_layout(
    title_text="Death/Recovered Rate in Hubei (Wuhan) (Till February 17, 2020)", height=500, width=700, showlegend=False
)

fig.show()

In Hubei (Wuhan), the number of deaths is about 22.7% of the number of recoveries.
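The pie chart's percent labels show each slice's share of deaths plus recoveries, which is a different quantity from deaths expressed as a percentage of recoveries. The arithmetic below uses hypothetical counts purely to show how far apart the two numbers can be.

```python
# Hypothetical counts, chosen only to illustrate the arithmetic.
recovered = 1000
deaths = 227

ratio_to_recovered = deaths / recovered       # deaths as a percentage of recoveries
share_of_pie = deaths / (deaths + recovered)  # what the pie's "percent" label shows

print(f"deaths / recovered            = {ratio_to_recovered:.1%}")  # 22.7%
print(f"deaths / (deaths + recovered) = {share_of_pie:.1%}")        # 18.5%
```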

In [None]:
df_CHRecDeadNotHb = df_CHRecDead[((df_CHRecDead["Province/State"]!="Hubei") & ((df_CHRecDead["Recovered"]>=1) | (df_CHRecDead["Deaths"]>=1)))].sort_values("Recovered", ascending=False)

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(
                x=df_CHRecDeadNotHb["Province/State"],
                y=df_CHRecDeadNotHb["Recovered"],
                marker_color='mediumturquoise',
                name="Recovered")
             )

fig.add_trace(go.Bar(
                x=df_CHRecDeadNotHb["Province/State"],
                y=df_CHRecDeadNotHb["Deaths"],
                marker_color='red',
                name="Deaths")
             )

fig.update_traces(marker_line_color='rgb(8,48,107)',
                  marker_line_width=2, opacity=0.7)

fig.update_layout(
    title_text='Death/Recovered Cases in Chinese Provinces Other Than Hubei',
    height=600, width=800, xaxis_title='Province/State'
)

fig.show()

In the other Chinese provinces, recoveries generally outnumber deaths.
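To list the exceptions to this general pattern rather than reading them off the chart, the two columns can be compared row by row. The sketch below uses hypothetical provinces and made-up counts.

```python
import pandas as pd

# Hypothetical per-province counts (illustrative only).
df = pd.DataFrame({
    "Province/State": ["Province A", "Province B", "Province C"],
    "Recovered": [50, 40, 3],
    "Deaths": [2, 5, 6],
})

# Provinces where deaths exceed recoveries, i.e. exceptions to the general pattern.
exceptions = df.loc[df["Deaths"] > df["Recovered"], "Province/State"].tolist()
print(exceptions)  # ['Province C']
```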

In [None]:
df_CHProvincesByDateHB = df_provincebydate[df_provincebydate['Province/State']=="Hubei"]

fig = go.Figure()

fig.add_trace(go.Scatter(x=df_CHProvincesByDateHB["Date"], y=df_CHProvincesByDateHB["Confirmed"],
                         line=dict(color='indigo', width=3), name="Confirmed")
             )

fig.update_layout(title='Confirmed Cases By Date in Hubei',
                   xaxis_title='Date',
                   yaxis_title='Count',
                   width=740, height=350)

fig.show()


fig = go.Figure()

fig.add_trace(go.Scatter(x=df_CHProvincesByDateHB["Date"], y=df_CHProvincesByDateHB["Deaths"],
                         line=dict(color='crimson', width=3), name="Deaths"))


fig.add_trace(go.Scatter(x=df_CHProvincesByDateHB["Date"], y=df_CHProvincesByDateHB["Recovered"],
                         line=dict(color='darkcyan', width=3), name="Recovered"))

fig.update_layout(title='Recovery and Death Cases By Date in Hubei',
                   xaxis_title='Date',
                   yaxis_title='Count',
                   width=800, height=350)

fig.show()

The numbers of confirmed cases, deaths, and recoveries all increase over time. I chose to analyse Hubei (Wuhan) because its data are consistent. The datasets after January 31 have some missing data, which interrupts the time series.
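Gaps like these can be located by comparing a series' dates with a complete daily calendar. A minimal sketch on a hypothetical daily series (made-up values) builds the calendar with `pd.date_range` and lists the dates the series does not cover.

```python
import pandas as pd

# Hypothetical daily series with missing dates (illustrative values only).
s = pd.Series(
    [10, 20, 35, 60],
    index=pd.to_datetime(["2020-01-29", "2020-01-30", "2020-02-01", "2020-02-04"]),
)

# Full daily calendar between the first and last observation.
full_range = pd.date_range(s.index.min(), s.index.max(), freq="D")
# Dates present in the calendar but absent from the series.
missing = full_range.difference(s.index)
print(missing.strftime("%Y-%m-%d").tolist())
# ['2020-01-31', '2020-02-02', '2020-02-03']
```

If the timestamps carry a time of day, they would need to be reduced to dates first (for example with `DatetimeIndex.normalize()`), otherwise two updates on the same day would not line up with the calendar.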

In [None]:
df_countriesNotCh = df_countries[~df_countries['Country'].isin(['China', 'Others'])]
df_countriesNotChConf = df_countriesNotCh[df_countriesNotCh["Confirmed"]>0]
df_countriesNotChConf = df_countriesNotChConf.sort_values("Confirmed")

In [None]:
fig = go.Figure()
fig.add_trace(go.Bar(x=df_countriesNotChConf["Confirmed"],
                y=df_countriesNotChConf['Country'],
                marker_color='powderblue',
                marker_line_color='rgb(8,48,107)',
                marker_line_width=1.5, 
                opacity=0.6,
                orientation='h'))

fig.update_layout(
    title_text='Last Confirmed Cases Outside of China (Till February 17, 2020)',
    height=700, width=800,
    showlegend=False, xaxis_title='Confirmed Cases') 

fig.show()

Most cases have been seen in the Far East. Outside the East and the Far East, Germany is the country with the most cases. I excluded 'Others' because it is ambiguous.

In [None]:
df_countriesNotChRec = df_countriesNotCh[df_countriesNotCh["Recovered"]>0]
df_countriesNotChRec = df_countriesNotChRec.sort_values("Recovered")

df_countriesNotChDeath = df_countriesNotCh[df_countriesNotCh["Deaths"]>0]
df_countriesNotChDeath = df_countriesNotChDeath.sort_values("Deaths")

In [None]:
fig = go.Figure()

fig.add_trace(go.Bar(x=df_countriesNotChRec["Recovered"],
                y=df_countriesNotChRec['Country'],
                marker_color='mediumseagreen',
                name="Recovered",
                orientation='h'))

fig.add_trace(go.Bar(x=df_countriesNotChDeath["Deaths"],
                y=df_countriesNotChDeath['Country'],
                marker_color='maroon',
                name="Deaths",
                orientation='h'))

fig.update_traces(marker_line_color='rgb(8,48,107)',
                  marker_line_width=2, opacity=0.7)


fig.update_layout(
    title_text='Death/Recovered Cases Outside of China (Till February 17, 2020)',
    height=700, width=800,
    xaxis_title='Death/Recovered Cases') 

fig.show()

There are several countries outside China where deaths have occurred: Taiwan, Hong Kong, Japan, France, and the Philippines.
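When one chart draws several columns, it is safest to filter each column on its own condition: a frame pre-filtered on `Recovered > 0` silently drops countries that have deaths but no recoveries. A small sketch with hypothetical countries "A" to "D" and made-up counts shows the difference.

```python
import pandas as pd

# Hypothetical country totals (illustrative only).
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D"],
    "Recovered": [10, 0, 4, 0],
    "Deaths": [1, 1, 0, 0],
})

# Filtering directly on Deaths keeps every country with at least one death.
with_deaths = df.loc[df["Deaths"] > 0, "Country"].tolist()

# Filtering on Recovered first loses countries with deaths but no recoveries.
with_recoveries = df[df["Recovered"] > 0]
deaths_via_recovered_filter = with_recoveries.loc[with_recoveries["Deaths"] > 0, "Country"].tolist()

print(with_deaths)                  # ['A', 'B']
print(deaths_via_recovered_filter)  # ['A']; country B is lost
```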

### To be continued... If you like it, please upvote.