### Data Science Blog Post 
*This project follows the CRISP-DM process (Cross-Industry Standard Process for Data Mining), which runs from framing the questions through communicating the results.*

CRISP-DM

1. Business Understanding
2. Data Understanding
3. Data Preparation
4. Data Modeling
5. Evaluate the Results
6. Deploy

### Business Understanding
*This means understanding the problem and questions you are interested in tackling in the context of whatever domain you're working in.*
...
The 5 questions to answer:
1. The evolution of confirmed, recovered, and death cases over time
2. The heatmap of cases over time
3. Top 10 countries with confirmed, recovered, and death cases
4. The age distribution of cases
5. Cases evolution of Vietnam

In [None]:
# Import common packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px  # needed for px.density_mapbox in the heatmap cell
import plotly.graph_objects as go

### Data Understanding
*At this step, you need to move the questions from Business Understanding to data. You might already have data that could be used to answer the questions, or you might have to collect data to get at your questions of interest.*

In [None]:
# Loading data
# Local dataset folder; adjust PATH to wherever the dataset is stored
PATH = "C:\\Users\\DucDQ1\\Desktop\\Data-Science-Blog-Post-master\\Data-Science-Blog-Post-master\\datasources\\novel-corona-virus-2019-dataset\\"

# Raw copy that keeps the original column names
covid19 = pd.read_csv(PATH + 'covid_19_data.csv')

# parse_dates asks pandas to read 'Last Update' as datetimes
# (assuming that was the intent of the original date argument)
data = pd.read_csv(PATH + 'covid_19_data.csv', parse_dates=['Last Update'])
data.head()

# Working copy with shorter column names
df = pd.read_csv(PATH + 'covid_19_data.csv', parse_dates=['Last Update'])
df.rename(columns={'ObservationDate':'Date', 'Country/Region':'Country'}, inplace=True)

df_confirmed = pd.read_csv(PATH + 'time_series_covid_19_confirmed.csv')

df_recovered = pd.read_csv(PATH + 'time_series_covid_19_recovered.csv')

df_deaths = pd.read_csv(PATH + 'time_series_covid_19_deaths.csv')

df_confirmed.rename(columns={'Country/Region':'Country'}, inplace=True)
df_recovered.rename(columns={'Country/Region':'Country'}, inplace=True)
df_deaths.rename(columns={'Country/Region':'Country'}, inplace=True)


In [None]:
# Checking whether any column contains null values
covid19.isnull().any()

In [None]:
# Viewing the rows of the dataset that contain at least one null value
covid19[covid19.isnull().any(axis=1)]
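
Beyond a yes/no check, it can help to see how many values are missing in each column. The following is a minimal sketch on a small made-up frame (`toy` and its values are assumptions for illustration, not the real dataset); the same two expressions could be applied to `covid19`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; the values are invented for illustration
toy = pd.DataFrame({
    "Province/State": ["Hubei", np.nan, np.nan, "Ontario"],
    "Country/Region": ["Mainland China", "Vietnam", "Italy", "Canada"],
    "Confirmed": [444.0, 2.0, np.nan, 1.0],
})

# Count and percentage of missing values per column
null_summary = pd.DataFrame({
    "missing": toy.isnull().sum(),
    "percent": (toy.isnull().mean() * 100).round(1),
})
print(null_summary)
```

A column such as `Province/State` is expected to be empty for countries reported as a whole, so a high share of nulls there is not necessarily a data quality problem.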

In [None]:
df_confirmed.head()

In [None]:
# Earliest cases with the current dataset
df.head()

In [None]:
# Latest cases with the current dataset
df.tail()

We can now look at the third step of the process:
### Data Preparation

The evolution of confirmed, recovered, and death cases over time

In [None]:
# Select each numeric column before summing so non-numeric columns are never aggregated
confirmed = df.groupby('Date')['Confirmed'].sum().reset_index()
deaths = df.groupby('Date')['Deaths'].sum().reset_index()
recovered = df.groupby('Date')['Recovered'].sum().reset_index()

fig = go.Figure()
fig.add_trace(go.Bar(x=confirmed['Date'],
                y=confirmed['Confirmed'],
                name='Confirmed',
                marker_color='blue'
                ))
fig.add_trace(go.Bar(x=deaths['Date'],
                y=deaths['Deaths'],
                name='Deaths',
                marker_color='Red'
                ))
fig.add_trace(go.Bar(x=recovered['Date'],
                y=recovered['Recovered'],
                name='Recovered',
                marker_color='Green'
                ))

fig.update_layout(
    title='Worldwide Corona Virus Cases - Confirmed, Deaths, Recovered (Bar Chart)',
    xaxis_tickfont_size=14,
    yaxis=dict(
        # title.font is the current spelling; the older titlefont key is deprecated
        title=dict(text='Number of Cases', font=dict(size=16)),
        tickfont_size=14,
    ),
    legend=dict(
        x=0,
        y=1.0,
        bgcolor='rgba(255, 255, 255, 0)',
        bordercolor='rgba(255, 255, 255, 0)'
    ),
    barmode='group',
    bargap=0.15, # gap between bars of adjacent location coordinates.
    bargroupgap=0.1 # gap between bars of the same location coordinate.
)
fig.show()

The heatmap of cases over time

In [None]:
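# Keep only the location columns from the confirmed time series
df_confirmed = df_confirmed[["Province/State","Lat","Long","Country"]]
df_temp = df.copy()
# Align the country name with the one used in the time series file
df_temp['Country'] = df_temp['Country'].replace({'Mainland China': 'China'})
# Inner join: rows without a matching (Country, Province/State) pair are dropped
df_latlong = pd.merge(df_temp, df_confirmed, on=["Country", "Province/State"])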
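
Because the merge above is an inner join, any row whose country or province name is spelled differently in the two files silently disappears from the map. A minimal sketch of how that could be checked, using a left join with `indicator=True` on made-up frames (`cases`, `coords`, and the coordinates are assumptions for illustration):

```python
import pandas as pd

# Toy case counts; values are invented for illustration
cases = pd.DataFrame({
    "Country": ["China", "China", "Australia"],
    "Province/State": ["Hubei", "Beijing", "New South Wales"],
    "Confirmed": [110, 22, 5],
})

# Toy coordinates table that is missing Beijing (approximate, illustrative values)
coords = pd.DataFrame({
    "Country": ["China", "Australia"],
    "Province/State": ["Hubei", "New South Wales"],
    "Lat": [31.0, -33.9],
    "Long": [112.3, 151.2],
})

# A left join keeps every case row; _merge records whether coordinates were found
merged = pd.merge(cases, coords, on=["Country", "Province/State"],
                  how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched[["Country", "Province/State"]])
```

If `unmatched` is not empty on the real data, the listed names are candidates for another `replace` mapping like the one used for `'Mainland China'`.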

In [None]:
fig = px.density_mapbox(df_latlong, 
                        lat="Lat", 
                        lon="Long", 
                        hover_name="Province/State", 
                        hover_data=["Confirmed","Deaths","Recovered"], 
                        animation_frame="Date",
                        color_continuous_scale="Portland",
                        radius=7, 
                        zoom=0,height=700)
fig.update_layout(title='Worldwide Corona Virus Cases Time Lapse - Confirmed, Deaths, Recovered',
                  font=dict(family="Courier New, monospace",
                            size=18,
                            color="#7f7f7f")
                 )
fig.update_layout(mapbox_style="open-street-map", mapbox_center_lon=0)
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})


fig.show()

Top 10 countries with confirmed, recovered, and death cases

In [None]:
country=covid19.groupby(['Country/Region'])[['Confirmed','Recovered','Deaths']].sum()
top_10=country.nlargest(10,['Confirmed'])
plt.figure(figsize=(20,16))
plt.subplot(311)
plt.title('Top 10 Countries with confirmed, recovered and death cases',fontsize=20)
plt.barh(top_10.index,top_10['Confirmed'],color='blue')
plt.yticks(fontsize=20)
plt.xlabel('Confirmed',fontsize=20)
plt.subplot(312)
plt.barh(top_10.index,top_10['Deaths'],color='red')
plt.yticks(fontsize=20)
plt.xlabel('Deaths',fontsize=20)
plt.subplot(313)
plt.barh(top_10.index,top_10['Recovered'],color='green')
plt.yticks(fontsize=20)
plt.xlabel('Recovered',fontsize=20)
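
One caveat worth checking: if the counts in `covid_19_data.csv` are cumulative per observation date (an assumption about the dataset, not something verified here), summing over every date counts each case many times. A hedged alternative is to rank countries using only the latest date. The sketch below uses a made-up frame (`toy`) in the same shape as the real file.

```python
import pandas as pd

# Toy cumulative data; the numbers are invented for illustration
toy = pd.DataFrame({
    "ObservationDate": ["03/17/2020"] * 3 + ["03/18/2020"] * 3,
    "Province/State": ["Hubei", "Beijing", None, "Hubei", "Beijing", None],
    "Country/Region": ["Mainland China", "Mainland China", "Vietnam"] * 2,
    "Confirmed": [100, 20, 3, 110, 22, 5],
    "Deaths": [10, 1, 0, 12, 1, 0],
    "Recovered": [50, 5, 1, 60, 8, 2],
})

# Keep only the rows from the most recent observation date
dates = pd.to_datetime(toy["ObservationDate"], format="%m/%d/%Y")
snapshot = toy[dates == dates.max()]

# Sum provinces within each country, then rank by confirmed cases
cols = ["Confirmed", "Recovered", "Deaths"]
country_latest = snapshot.groupby("Country/Region")[cols].sum()
top_latest = country_latest.nlargest(10, "Confirmed")
print(top_latest)
```

In this toy example the latest snapshot gives 132 confirmed cases for Mainland China, whereas summing over both dates would give 252.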

The age distribution of cases

In [None]:
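plt.figure(figsize=(15, 6))
# Assumes `data` has an 'age' column; covid_19_data.csv does not appear to
# include one, so a line-list file may need to be loaded into `data` first.
# Histogram with a density curve (sns.distplot is deprecated in recent seaborn)
sns.histplot(data['age'].dropna(), bins=50, color='g', kde=True, stat='density')
plt.title('Age Distribution')
plt.xlabel("Age")
plt.show()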

Cases evolution of Vietnam

In [None]:
# By Country - Vietnam
df.query('Country=="Vietnam"').groupby("Last Update")[['Confirmed', 'Deaths', 'Recovered']].sum().reset_index()

In [None]:
# The country column was renamed to 'Country' when df was loaded
vietnam = df[df['Country'] == 'Vietnam']
# Columns to aggregate (assumed; same columns as in the query above)
main_cols = ['Confirmed', 'Deaths', 'Recovered']
new_vietnam = vietnam.groupby('Last Update')[main_cols].sum()

plt.figure(figsize=(15, 8))
plt.plot(new_vietnam['Confirmed'], 'y--', lw=4, label='Vietnam Confirmed')
plt.plot(new_vietnam['Recovered'], 'g--', lw=4, label='Vietnam Recovered')
plt.plot(new_vietnam['Deaths'], 'r--', lw=4, label='Vietnam Deaths')

plt.xlabel('Date', fontsize=20)
plt.ylabel('No. of Vietnam Cases', fontsize=20)
plt.title('Total Vietnam Cases Between 22/1 and 18/3', fontsize=20)

# Rotate the date labels so they do not overlap
plt.xticks(rotation=45)

plt.legend(loc='best')
plt.show()

### Data Modeling
Looking at the questions, there is no need for any predictive modeling. Descriptive statistics and a little inferential statistics are enough to retrieve the results. Therefore, the **Data Modeling** step in CRISP-DM is not necessary to answer these questions.
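
As an illustration of the kind of descriptive statistics meant here, the sketch below derives daily new cases and a simple deaths-to-confirmed ratio from cumulative totals. The frame `cumulative` and its numbers are made up for illustration and do not come from the dataset.

```python
import pandas as pd

# Toy cumulative totals for one country; values are invented for illustration
cumulative = pd.DataFrame(
    {"Confirmed": [2, 2, 5, 9], "Deaths": [0, 0, 0, 1]},
    index=pd.to_datetime(["2020-03-15", "2020-03-16", "2020-03-17", "2020-03-18"]),
)

# Daily new cases: difference between consecutive cumulative totals
# (the first day has no predecessor, so it keeps its own total)
first_total = cumulative["Confirmed"].iloc[0]
daily_new = cumulative["Confirmed"].diff().fillna(first_total).astype(int)

# Share of confirmed cases that ended in death, as of the last date
fatality_ratio = cumulative["Deaths"].iloc[-1] / cumulative["Confirmed"].iloc[-1]

print(daily_new)
print(f"Deaths / confirmed on the last date: {fatality_ratio:.1%}")
```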

### Evaluate the Results

### Deploy