# COVID-19 in the Republic of Korea (South Korea)

<a class="anchor" id="0.1"></a>
## Table of Contents

1. [Import libraries](#1)
1. [Download data](#2)
1. [Selecting data with holidays](#3)
1. [EDA](#4)

In [1]:
country_main = 'Korea, Republic of'

## 1. Import libraries<a class="anchor" id="1"></a>

[Back to Table of Contents](#0.1)

Import libraries

In [2]:
!pip install pycountry

Collecting pycountry
  Downloading pycountry-20.7.3.tar.gz (10.1 MB)
     |████████████████████████████████| 10.1 MB 5.0 MB/s 
Building wheels for collected packages: pycountry
  Building wheel for pycountry (setup.py) ... done
  Created wheel for pycountry: filename=pycountry-20.7.3-py2.py3-none-any.whl size=10746883 sha256=331dde3540bc74ce94166f87a2645e4593de7fb35a86e06d12f64dbbd8101154
  Stored in directory: /root/.cache/pip/wheels/57/e8/3f/120ccc1ff7541c108bc5d656e2a14c39da0d824653b62284c6
Successfully built pycountry
Installing collected packages: pycountry
Successfully installed pycountry-20.7.3


In [3]:
import os
import pandas as pd
import numpy as np
import requests
import seaborn as sns
from matplotlib import pyplot as plt
import plotly.express as px
import plotly.graph_objects as go

from datetime import date, timedelta, datetime
from fbprophet import Prophet
from fbprophet.make_holidays import make_holidays_df
from fbprophet.diagnostics import cross_validation, performance_metrics
from fbprophet.plot import plot_cross_validation_metric
import holidays
from collections import Counter
import pycountry

import warnings
warnings.simplefilter('ignore')

## 2. Download data<a class="anchor" id="2"></a>

[Back to Table of Contents](#0.1)

[Korea Disease Control and Prevention Agency (KDCA) COVID-19 data](http://ncov.mohw.go.kr/)

In [4]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [5]:
df = pd.read_excel("/content/drive/MyDrive/TR_코로나 데이터분석/코로나바이러스감염증-19_확진환자_발생현황_210924.xlsx")
df.head()

Unnamed: 0.1,Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4
0,,,,,
1,,,,,
2,,,,,
3,일자,계(명),국내발생(명),해외유입(명),사망(명)
4,누적(명),295132,280857,14275,2434

In English, the header row of the sheet (row 3) reads: date (일자), total (계), domestic cases (국내발생), imported cases (해외유입), deaths (사망), each counted in persons (명). Row 4, 누적(명), holds the cumulative totals.


In [6]:
df.columns=['Date', 'Confirmed', 'domestic','foreign', 'dead']
df = df.iloc[5:]
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,Date,Confirmed,domestic,foreign,dead
0,2020-01-20 00:00:00,1,-,1,-
1,2020-01-21 00:00:00,0,-,-,-
2,2020-01-22 00:00:00,0,-,-,-
3,2020-01-23 00:00:00,0,-,-,-
4,2020-01-24 00:00:00,1,-,1,-


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 614 entries, 0 to 613
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   Date       614 non-null    object
 1   Confirmed  614 non-null    object
 2   domestic   614 non-null    object
 3   foreign    614 non-null    object
 4   dead       614 non-null    object
dtypes: object(5)
memory usage: 24.1+ KB
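
Every column is stored as `object` because the sheet uses `-` where no count was reported. The following is a minimal sketch of one way to get numeric columns, run on a toy frame rather than the real sheet. It assumes that `-` can be treated as zero, which the source does not state.

```python
import pandas as pd

# Toy rows shaped like the sheet above; '-' marks days with no reported count.
toy = pd.DataFrame(
    {
        "Confirmed": [1, 0, 1],
        "domestic": ["-", "-", "-"],
        "foreign": [1, "-", 1],
    },
    dtype=object,
)

# errors="coerce" turns '-' into NaN; fillna(0) encodes the
# "treat missing as zero" assumption made for this sketch.
clean = toy.apply(pd.to_numeric, errors="coerce").fillna(0).astype(int)
print(clean.dtypes)
```

If a missing count should stay missing rather than become zero, dropping the `fillna(0).astype(int)` step keeps the values as `NaN`.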


In [8]:
df['Date'] = pd.to_datetime(df['Date'])
print('data period: {} ~ {} '.format(df['Date'].min(), df['Date'].max()))

data period: 2020-01-20 00:00:00 ~ 2021-09-24 00:00:00 


In [9]:
# Running total of daily confirmed cases (the column is stored as object, so convert it first)
df['cum_Confirmed'] = pd.to_numeric(df['Confirmed']).cumsum()
df.head()

Unnamed: 0,Date,Confirmed,domestic,foreign,dead,cum_Confirmed
0,2020-01-20,1,-,1,-,1
1,2020-01-21,0,-,-,-,1
2,2020-01-22,0,-,-,-,1
3,2020-01-23,0,-,-,-,1
4,2020-01-24,1,-,1,-,2


## 3. Selecting data with holidays<a class="anchor" id="3"></a>

## 3.1. Holidays with a shift<a class="anchor" id="3.1"></a>

[Back to Table of Contents](#0.1)

[COVID-19: Holidays of countries](https://www.kaggle.com/vbmokin/covid19-holidays-of-countries)

In [10]:
holidays_df = pd.read_csv('/content/drive/MyDrive/TR_코로나 데이터분석/holidays_df_of_70_countries_for_covid_19.csv')
holidays_df[holidays_df['country'] == country_main].head()

Unnamed: 0,ds_holidays,holiday,ds,country,code,country_official_name,lower_window,upper_window,prior_scale,source
432,2020-01-24,The day preceding of Lunar New Year's Day,2020-01-31,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
433,2020-01-25,Lunar New Year's Day,2020-02-01,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
434,2020-01-26,The second day of Lunar New Year's Day,2020-02-02,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
435,2020-01-27,Alternative holiday of Lunar New Year's Day,2020-02-03,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
436,2020-03-01,Independence Movement Day,2020-03-08,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays


In [11]:
holidays_df_code_countries = holidays_df['code'].unique()
holidays_df_code_countries

array(['AR', 'AT', 'AU', 'BD', 'BE', 'BG', 'BI', 'BR', 'BY', 'CA', 'CH',
       'CL', 'CN', 'CO', 'CZ', 'DE', 'DK', 'DO', 'EE', 'EG', 'ES', 'FI',
       'FR', 'GB', 'GR', 'HN', 'HR', 'HU', 'ID', 'IE', 'IL', 'IN', 'IS',
       'IT', 'JP', 'KE', 'KR', 'LT', 'LU', 'LV', 'MA', 'MX', 'MY', 'NG',
       'NI', 'NL', 'NO', 'NZ', 'PE', 'PH', 'PK', 'PL', 'PT', 'PY', 'RO',
       'RS', 'RU', 'SE', 'SG', 'SI', 'SK', 'TH', 'TR', 'UA', 'US', 'VN',
       'ZA', 'GE', 'AL', 'MD'], dtype=object)

In [12]:
# notebook: https://www.kaggle.com/vbmokin/covid-19-prophet-forecast-next-2-weeks
def dict_code_countries_with_holidays(list_name_countries: list,
                                      holidays_df: pd.DataFrame):
        
    """
    Defines a dictionary with the names of user countries and their two-letter codes (ISO 3166) 
    in the dataset "COVID-19: Holidays of countries" 
    
    Returns: 
    - countries: dictionary with the names of user countries and their two-letter codes (ISO 3166) 
    - holidays_df_identificated: DataFrame with holidays data for countries from dictionary 'countries'
    
    Args: 
    - list_name_countries: list of country identifiers (name, common_name, official_name, or alpha_2 / alpha_3 codes from ISO 3166)
    - holidays_df: DataFrame with holidays "COVID-19: Holidays of countries"
    """
    
    import pycountry
    
    # Identification of countries for which there are names according to ISO
    countries = {}
    dataset_all_countries = list(holidays_df['code'].unique())
    list_name_countries_identificated = []
    list_name_countries_not_identificated = []
    for country in list_name_countries:
        try: 
            country_id = pycountry.countries.get(alpha_2=country)
            if country_id.alpha_2 in dataset_all_countries:
                countries[country] = country_id.alpha_2
        except AttributeError:
            try: 
                country_id = pycountry.countries.get(name=country)
                if country_id.alpha_2 in dataset_all_countries:
                    countries[country] = country_id.alpha_2
            except AttributeError:
                try: 
                    country_id = pycountry.countries.get(official_name=country)
                    if country_id.alpha_2 in dataset_all_countries:
                        countries[country] = country_id.alpha_2
                except AttributeError:
                    try: 
                        country_id = pycountry.countries.get(common_name=country)
                        if country_id.alpha_2 in dataset_all_countries:
                            countries[country] = country_id.alpha_2
                    except AttributeError:
                        try: 
                            country_id = pycountry.countries.get(alpha_3=country)
                            if country_id.alpha_2 in dataset_all_countries:
                                countries[country] = country_id.alpha_2
                        except AttributeError:
                            list_name_countries_not_identificated.append(country)
    holidays_df_identificated = holidays_df[holidays_df['code'].isin(countries.values())]
    
    print(f'Thus, the dataset has holidays in {len(countries)} countries from your list with {len(list_name_countries)} countries')
  
    return countries, holidays_df_identificated.reset_index(drop=True)

In [13]:
countries_dict, holidays_df = dict_code_countries_with_holidays([country_main], holidays_df)
countries_dict

Thus, the dataset has holidays in 1 countries from your list with 1 countries


{'Korea, Republic of': 'KR'}
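
The nested `try`/`except` chain in the function above tries one identifier field after another. The same idea can be sketched as a flat loop over field names. The tiny `REGISTRY` list below is a hypothetical stand-in for a country registry such as the one `pycountry` provides, and `resolve_alpha_2` and `split_identified` are illustrative names, not part of any library.

```python
# Hypothetical stand-in for a country registry; only the fields
# needed for the illustration are included.
REGISTRY = [
    {"alpha_2": "KR", "alpha_3": "KOR", "name": "Korea, Republic of"},
    {"alpha_2": "UA", "alpha_3": "UKR", "name": "Ukraine"},
]
# Same lookup order as the nested try/except chain above.
FIELDS = ("alpha_2", "name", "official_name", "common_name", "alpha_3")


def resolve_alpha_2(query, registry=REGISTRY, fields=FIELDS):
    """Return the two-letter code of the first record matching `query`, else None."""
    for field in fields:
        for record in registry:
            if record.get(field) == query:
                return record["alpha_2"]
    return None


def split_identified(names, available_codes):
    """Split names into {name: code} for codes in the dataset, plus a list of the rest."""
    found, missing = {}, []
    for name in names:
        code = resolve_alpha_2(name)
        if code in available_codes:
            found[name] = code
        else:
            missing.append(name)
    return found, missing


found, missing = split_identified(["Korea, Republic of", "KOR", "Atlantis"], {"KR"})
print(found, missing)
```

Unlike the original function, this sketch also lists a name as missing when the country is recognised but has no holidays in the dataset.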

In [14]:
holidays_df.head()

Unnamed: 0,ds_holidays,holiday,ds,country,code,country_official_name,lower_window,upper_window,prior_scale,source
0,2020-01-24,The day preceding of Lunar New Year's Day,2020-01-31,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
1,2020-01-25,Lunar New Year's Day,2020-02-01,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
2,2020-01-26,The second day of Lunar New Year's Day,2020-02-02,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
3,2020-01-27,Alternative holiday of Lunar New Year's Day,2020-02-03,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays
4,2020-03-01,Independence Movement Day,2020-03-08,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays


## 3.2. Additional dates of anomalies as holidays<a class="anchor" id="3.2"></a>

[Back to Table of Contents](#0.1)

**[COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data)**

In [15]:
data = pd.read_csv("https://storage.googleapis.com/covid19-open-data/v2/KR/main.csv")

In [16]:
data.head()

Unnamed: 0,key,date,place_id,wikidata,datacommons,country_code,country_name,subregion1_code,subregion1_name,subregion2_code,subregion2_name,locality_code,locality_name,3166-1-alpha-2,3166-1-alpha-3,aggregation_level,new_confirmed,new_deceased,new_recovered,new_tested,total_confirmed,total_deceased,total_recovered,total_tested,new_hospitalized,total_hospitalized,current_hospitalized,new_intensive_care,total_intensive_care,current_intensive_care,new_ventilator,total_ventilator,current_ventilator,population,population_male,population_female,rural_population,urban_population,largest_city_population,clustered_population,...,hospital_beds,nurses,physicians,health_expenditure,out_of_pocket_health_expenditure,mobility_retail_and_recreation,mobility_grocery_and_pharmacy,mobility_parks,mobility_transit_stations,mobility_workplaces,mobility_residential,school_closing,workplace_closing,cancel_public_events,restrictions_on_gatherings,public_transport_closing,stay_at_home_requirements,restrictions_on_internal_movement,international_travel_controls,income_support,debt_relief,fiscal_measures,international_support,public_information_campaigns,testing_policy,contact_tracing,emergency_investment_in_healthcare,investment_in_vaccines,facial_coverings,vaccination_policy,stringency_index,noaa_station,noaa_distance,average_temperature,minimum_temperature,maximum_temperature,rainfall,snowfall,dew_point,relative_humidity
0,KR,2020-01-01,ChIJm7oRy-tVZDURS9uIugCbJJE,Q884,country/KOR,KR,South Korea,,,,,,,KR,KOR,0,0.0,0.0,,,0.0,0.0,,,,,,,,,,,,51269183,25665854,25603328,9602379,42106719,9962393,25963874,...,11.5,7.3009,2.3608,2283.074707,768.689453,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47135100000.0,24.18517,-0.488889,-6.506173,3.87037,0.0,,-5.422222,69.894145
1,KR,2020-01-02,ChIJm7oRy-tVZDURS9uIugCbJJE,Q884,country/KOR,KR,South Korea,,,,,,,KR,KOR,0,0.0,0.0,,,0.0,0.0,,,,,,,,,,,,51269183,25665854,25603328,9602379,42106719,9962393,25963874,...,11.5,7.3009,2.3608,2283.074707,768.689453,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47135100000.0,24.18517,1.861111,-2.894444,5.388889,0.0,,-3.4,68.549224
2,KR,2020-01-03,ChIJm7oRy-tVZDURS9uIugCbJJE,Q884,country/KOR,KR,South Korea,,,,,,,KR,KOR,0,0.0,0.0,,,0.0,0.0,,,,,,,,,,,,51269183,25665854,25603328,9602379,42106719,9962393,25963874,...,11.5,7.3009,2.3608,2283.074707,768.689453,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47135100000.0,24.18517,1.072222,-3.855556,6.994444,0.0,,-5.4,62.7306
3,KR,2020-01-04,ChIJm7oRy-tVZDURS9uIugCbJJE,Q884,country/KOR,KR,South Korea,,,,,,,KR,KOR,0,0.0,0.0,,,0.0,0.0,,,,,,,,,,,,51269183,25665854,25603328,9602379,42106719,9962393,25963874,...,11.5,7.3009,2.3608,2283.074707,768.689453,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47135100000.0,24.18517,0.320988,-5.888889,7.080247,0.0,,-5.216049,66.906845
4,KR,2020-01-05,ChIJm7oRy-tVZDURS9uIugCbJJE,Q884,country/KOR,KR,South Korea,,,,,,,KR,KOR,0,0.0,0.0,,,0.0,0.0,,,,,,,,,,,,51269183,25665854,25603328,9602379,42106719,9962393,25963874,...,11.5,7.3009,2.3608,2283.074707,768.689453,,,,,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,47135100000.0,24.18517,1.209877,-5.123457,8.271605,0.0,,-6.728395,56.087381


In [17]:
data.shape

(635, 111)

In [19]:
def aux_holidays_df_generator(holidays_df, dates_list, name, source):
    last_row = len(holidays_df)
    holidays_dates = holidays_df['ds_holidays'].tolist()
    common_dates = list(set(holidays_dates).intersection(set(dates_list)))
    dates_list = list(set(dates_list).difference(set(common_dates)))
    for i in range(len(dates_list)):
        holidays_df = holidays_df.append([holidays_df.loc[last_row-1,:]], ignore_index=True)
        ds_dt = datetime.strptime(dates_list[i], '%Y-%m-%d')
        holidays_df.loc[last_row+i, 'ds_holidays'] = dates_list[i]
        holidays_df.loc[last_row+i, 'holiday'] = name
        holidays_df.loc[last_row+i, 'ds'] = (ds_dt + timedelta(days=7)).strftime('%Y-%m-%d')
        holidays_df.loc[last_row+i, 'source'] = source
        
    return holidays_df.sort_values(by=['ds_holidays'])
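
`aux_holidays_df_generator` relies on `DataFrame.append`, which was removed in pandas 2.0. The sketch below shows the same idea with `pd.concat`: each new date is added as an extra "holiday" whose `ds` is shifted by 7 days, and the remaining fields are copied from the last existing row. The function name `add_shifted_dates`, the `shift_days` parameter, and the toy frame are assumptions made for illustration.

```python
from datetime import timedelta

import pandas as pd


def add_shifted_dates(holidays_df, dates_list, name, source, shift_days=7):
    """Append `dates_list` as extra holidays, copying other fields from the last row."""
    known = set(holidays_df["ds_holidays"])
    new_dates = sorted(set(dates_list) - known)  # skip dates already present
    if not new_dates:
        return holidays_df.sort_values(by=["ds_holidays"])
    template = holidays_df.iloc[-1].to_dict()
    rows = []
    for day in new_dates:
        shifted = pd.Timestamp(day) + timedelta(days=shift_days)
        rows.append({**template,
                     "ds_holidays": day,
                     "holiday": name,
                     "ds": shifted.strftime("%Y-%m-%d"),
                     "source": source})
    out = pd.concat([holidays_df, pd.DataFrame(rows)], ignore_index=True)
    return out.sort_values(by=["ds_holidays"])


# Toy frame with a subset of the real columns.
toy = pd.DataFrame({
    "ds_holidays": ["2020-03-01"],
    "holiday": ["Independence Movement Day"],
    "ds": ["2020-03-08"],
    "lower_window": [-3],
    "upper_window": [3],
    "source": ["https://github.com/dr-prodigy/python-holidays"],
})
extended = add_shifted_dates(toy, ["2020-04-18", "2020-03-01"],
                             "the weakening of quarantine", "example-source")
print(extended[["ds_holidays", "holiday", "ds"]])
```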

In [20]:
def plot_with_anomalies(df, cols_y_list, cols_y_list_name, dates_x, col_anomalies, val_anomal, log_y=False):

    fig = px.line(df, x=dates_x, y=cols_y_list[0], title=cols_y_list_name[cols_y_list[0]], log_y=log_y, template='gridon',width=1000, height=600)
    for i in range(len(cols_y_list)-1):
        fig.add_trace(go.Scatter(x=df[dates_x], y=df[cols_y_list[i+1]], mode='lines', name=cols_y_list_name[cols_y_list[i+1]]))
    
    anomal_dates_list = df[df[col_anomalies] == val_anomal][dates_x].tolist()
    y_max = df[cols_y_list[0]].max()
    y_min = min(df[cols_y_list[0]].min(),0)
    for i in range(len(anomal_dates_list)):
        anomal_date = anomal_dates_list[i]
        fig.add_shape(dict(type="line", x0=anomal_date, y0=y_min, x1=anomal_date, y1=y_max, line=dict(color="red", width=1)))
    fig.show()

### 3.2.1. The weakening of quarantine<a class="anchor" id="3.2.1"></a>

[Back to Table of Contents](#0.1)

* [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data)
* [Oxford COVID-19 government response tracker](https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker)

엄격성 지수(stringency_index): 학교·직장 폐쇄, 공공행사 취소, 여행 금지 등 이동 및 경제활동 제약을 나타내는 지수

In [21]:
# Flag days on which the stringency index fell relative to the previous day
data['stringency_index_jump'] = 0
for i in range(len(data)-1):
    prev_val, next_val = data.loc[i,'stringency_index'], data.loc[i+1,'stringency_index']
    # Missing values are NaN (not None) in pandas, so test with pd.notna
    if pd.notna(prev_val) and pd.notna(next_val) and (next_val < prev_val):
        data.loc[i+1, 'stringency_index_jump'] = 1
source_gov = 'https://www.bsg.ox.ac.uk/research/research-projects/oxford-covid-19-government-response-tracker'
dates_gov_list = data[data['stringency_index_jump'] == 1]['date'].tolist()
holidays_df = aux_holidays_df_generator(holidays_df, dates_gov_list, 'the weakening of quarantine', source_gov)
plot_with_anomalies(data, ["stringency_index"], {"stringency_index" : "Stringency index and dates of the weakening of quarantine in " + country_main}, 'date', 'stringency_index_jump', 1)
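
The day-over-day comparison above can also be written without an explicit loop. This is a minimal sketch on toy values (the index numbers below are invented for illustration), using `Series.diff` to flag days on which the index fell.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "date": pd.date_range("2020-04-16", periods=6, freq="D"),
    "stringency_index": [80.0, 80.0, 75.0, np.nan, 75.0, 70.0],  # invented values
})

# diff() is NaN wherever either neighbour is missing, and NaN < 0 is False,
# so gaps in the index are never flagged.
toy["stringency_index_jump"] = (toy["stringency_index"].diff() < 0).astype(int)

drop_dates = (
    toy.loc[toy["stringency_index_jump"] == 1, "date"]
    .dt.strftime("%Y-%m-%d")
    .tolist()
)
print(drop_dates)
```

Note that a drop across a gap (75.0, missing, 75.0 here) is not compared at all; only adjacent days with both values present can be flagged.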

In [22]:
df.head()

Unnamed: 0,Date,Confirmed,domestic,foreign,dead,cum_Confirmed
0,2020-01-20,1,-,1,-,1
1,2020-01-21,0,-,-,-,1
2,2020-01-22,0,-,-,-,1
3,2020-01-23,0,-,-,-,1
4,2020-01-24,1,-,1,-,2


In [23]:
df['Date'][0]

Timestamp('2020-01-20 00:00:00')

In [24]:
data['date'] = pd.to_datetime(data['date'])

In [25]:
# Restrict the open-data table to the period covered by df (first to last row)
df3 = data[(data['date'] >= df['Date'].iloc[0]) & (data['date'] <= df['Date'].iloc[-1])]
df3.reset_index(drop=True, inplace=True)

df.reset_index(drop=True, inplace=True)

In [41]:
tmp = pd.concat([df3, df[['Confirmed', 'cum_Confirmed']]], axis=1)
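
`pd.concat(..., axis=1)` aligns the two frames by index, so after the `reset_index` calls the rows are paired by position; the result is correct only while both frames hold exactly the same dates in the same order. A more defensive variant, sketched here on toy data with illustrative frame names `left` and `right`, joins on the date column instead.

```python
import pandas as pd

# Toy stand-ins: `left` plays the role of the open-data table,
# `right` the role of the case counts.
left = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-20", "2020-01-21", "2020-01-22"]),
    "stringency_index": [0.0, 5.0, 5.0],
})
right = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-22", "2020-01-20"]),  # shuffled, one day missing
    "Confirmed": [0, 1],
})

# validate="one_to_one" raises if a date appears more than once on either side.
merged = left.merge(right, left_on="date", right_on="Date",
                    how="left", validate="one_to_one")
merged = merged.drop(columns=["Date"])
print(merged)
```

With a key-based join, a missing day shows up as `NaN` in `Confirmed` instead of silently shifting every later row.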

In [48]:
tmp['log_cum_Confirmed'] = np.log(tmp['cum_Confirmed'])
# 'Confirmed' is stored as object, so np.sqrt(tmp['Confirmed']) raises a TypeError; convert to numeric first
tmp['sqrt_Confirmed'] = np.sqrt(pd.to_numeric(tmp['Confirmed']))

In [60]:
plot_with_anomalies(tmp, ["stringency_index", 'sqrt_Confirmed'], {"stringency_index" : "Stringency index and dates of the weakening of quarantine in " + country_main, "sqrt_Confirmed":"sqrt_Confirmed"}, 'date', 'stringency_index_jump', 1)

### 3.2.2. Very comfortable conditions for rest (not yet taken into account - needs clarification) <a class="anchor" id="3.2.2"></a>

[Back to Table of Contents](#0.1)

#### Thanks to:
* [COVID-19 Open Data](https://github.com/GoogleCloudPlatform/covid-19-open-data)
* [NOAA](https://www.ncei.noaa.gov/)

In [36]:
data.columns

Index(['key', 'date', 'place_id', 'wikidata', 'datacommons', 'country_code',
       'country_name', 'subregion1_code', 'subregion1_name', 'subregion2_code',
       ...
       'noaa_distance', 'average_temperature', 'minimum_temperature',
       'maximum_temperature', 'rainfall', 'snowfall', 'dew_point',
       'relative_humidity', 'stringency_index_jump', 'rest_comfort'],
      dtype='object', length=113)

In [63]:
# Thanks to https://www.kaggle.com/vbmokin/covid-19-in-ukraine-prophet-holidays-tuning
data['rest_comfort'] = 0
data.loc[(data['average_temperature'] > data['average_temperature'].quantile(.95)) & (data['rainfall'] == 0), 'rest_comfort'] = 1
dates_weather_list = data[data['rest_comfort'] == 1]['date'].tolist()
plot_with_anomalies(data, ["average_temperature", "rainfall"], {"average_temperature" : "Average temperature over time in " + country_main, "rainfall" : "rainfall"}, 'date', 'rest_comfort', 1)
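
The `rest_comfort` flag marks days that are both unusually warm (above a temperature quantile) and dry. The following sketch reproduces the rule on toy weather values, which are invented for illustration; the quantile is lowered from 0.95 to 0.8 only so that the small sample produces a flagged day.

```python
import pandas as pd

toy = pd.DataFrame({
    "average_temperature": [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0, 24.0, 26.0, 30.0],
    "rainfall": [0.0, 1.2, 0.0, 0.0, 3.4, 0.0, 0.0, 0.0, 5.0, 0.0],
})

# 0.95 in the notebook; lowered here for the 10-row toy sample.
threshold = toy["average_temperature"].quantile(0.8)
is_warm = toy["average_temperature"] > threshold
is_dry = toy["rainfall"] == 0  # NaN == 0 is False, so days with missing rainfall are never flagged

toy["rest_comfort"] = (is_warm & is_dry).astype(int)
print(toy[toy["rest_comfort"] == 1])
```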

## 3.3. Removing holidays from the period before daily confirmed cases reached 10<a class="anchor" id="3.3"></a>

[Back to Table of Contents](#0.1)

In [29]:
# Remove holidays that fall before the first day with at least 10 confirmed cases
holidays_df['ds_dt'] = pd.to_datetime(holidays_df['ds'], format='%Y-%m-%d', errors='ignore')
date_the_first_many_cases = df[df.Confirmed >= 10]['Date'].min()
holidays_df = holidays_df[holidays_df['ds_dt'] >= date_the_first_many_cases]
holidays_df.head()

Unnamed: 0,ds_holidays,holiday,ds,country,code,country_official_name,lower_window,upper_window,prior_scale,source,ds_dt
4,2020-03-01,Independence Movement Day,2020-03-08,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays,2020-03-08
5,2020-03-02,Alternative holiday of Independence Movement Day,2020-03-09,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays,2020-03-09
31,2020-04-18,the weakening of quarantine,2020-04-25,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://www.bsg.ox.ac.uk/research/research-pro...,2020-04-25
29,2020-04-20,the weakening of quarantine,2020-04-27,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://www.bsg.ox.ac.uk/research/research-pro...,2020-04-27
6,2020-04-30,Birthday of the Buddha,2020-05-07,"Korea, Republic of",KR,"Korea, Republic of",-3,3,10,https://github.com/dr-prodigy/python-holidays,2020-05-07
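
The filter above keys on the first day with at least 10 daily (not cumulative) confirmed cases. Here is a small sketch of the same two steps on toy frames; the counts for 2020-02-17 and 2020-02-18 are invented, and `first_busy_day` and `kept` are illustrative names.

```python
import pandas as pd

cases = pd.DataFrame({
    "Date": pd.to_datetime(["2020-02-17", "2020-02-18", "2020-02-19", "2020-02-20"]),
    "Confirmed": [2, 5, 34, 16],  # first two values are invented for the sketch
})
holidays = pd.DataFrame({
    "holiday": ["Lunar New Year's Day", "Independence Movement Day"],
    "ds": ["2020-02-01", "2020-03-08"],
})

# Step 1: earliest date on which the daily count reached the threshold.
first_busy_day = cases.loc[cases["Confirmed"] >= 10, "Date"].min()

# Step 2: keep only holidays whose shifted date falls on or after that day.
holidays["ds_dt"] = pd.to_datetime(holidays["ds"], format="%Y-%m-%d")
kept = holidays[holidays["ds_dt"] >= first_busy_day]
print(first_busy_day.date(), kept["holiday"].tolist())
```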


## 4. EDA<a class="anchor" id="4"></a>

[Back to Table of Contents](#0.1)

## 4.1. Plots - Confirmed cases over time<a class="anchor" id="4.1"></a>

[Back to Table of Contents](#0.1)

In [30]:
# .copy() so that columns added to df2 later do not write into a view of df
df2 = df[df['Date'] >= date_the_first_many_cases].copy()

In [31]:
fig = px.line(df2, x="Date", y="cum_Confirmed", 
              title="Confirmed cases in " + country_main, 
              log_y=False,template='gridon',width=1000, height=600)
fig.show()

In [32]:
fig = px.line(df2, x="Date", y="cum_Confirmed", 
              title="Cumulative Confirmed cases (logarithmic scale) in " + country_main, 
              log_y=True,template='gridon',width=1000, height=600)
fig.show()
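
On a logarithmic axis, exponential growth appears as a straight line whose slope is the growth rate. As a rough illustration, the sketch below estimates a doubling time from two cumulative counts in this notebook's data (66 on 2020-02-19 and 556 on 2020-02-23), assuming a constant growth rate between the two dates. The helper `doubling_time_days` is an illustrative name.

```python
import math
from datetime import date


def doubling_time_days(count_start, count_end, days_elapsed):
    """Doubling time under constant exponential growth between two cumulative counts."""
    growth_rate = math.log(count_end / count_start) / days_elapsed  # per day
    return math.log(2) / growth_rate


days = (date(2020, 2, 23) - date(2020, 2, 19)).days
estimate = doubling_time_days(66, 556, days)
print(round(estimate, 2))
```

The constant-rate assumption only holds over short windows, so the figure describes those four days and nothing beyond them.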

In [33]:
df2['holiday'] = 0
holidays_df_dates = holidays_df['ds'].tolist()
df2.loc[df2['Date'].isin(holidays_df_dates), 'holiday'] = 1
plot_with_anomalies(df2, ["cum_Confirmed"], {"cum_Confirmed" : "Confirmed cases and holidays data in " + country_main}, 'Date', 'holiday', 1)
df2 = df2.drop(columns=['holiday'])

## 4.2. Statistics<a class="anchor" id="4.2"></a>

[Back to Table of Contents](#0.1)

## Descriptive statistics

In [34]:
df2.describe()

Unnamed: 0,cum_Confirmed
count,584.0
mean,76094.234589
std,75840.452159
min,66.0
25%,13503.75
50%,37844.5
75%,122784.5
max,295132.0


## Earliest Cases

In [35]:
df2.head()

Unnamed: 0,Date,Confirmed,domestic,foreign,dead,cum_Confirmed
30,2020-02-19,34,34,-,-,66
31,2020-02-20,16,16,-,1,82
32,2020-02-21,74,74,-,-,156
33,2020-02-22,190,190,-,2,346
34,2020-02-23,210,209,1,-,556
