# COVID-19

## 1. Have an idea

### The purpose of this task is:
- to determine whether quarantine is effective in preventing the spread of the virus
- to see which countries were the most successful at containing the virus by implementing quarantine

### What is coronavirus?

Coronaviruses are a large family of viruses that may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes the coronavirus disease COVID-19.

### What is COVID-19?

COVID-19 is the infectious disease caused by the most recently discovered coronavirus. This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019.

### What are the symptoms of COVID-19?

The most common symptoms of COVID-19 are fever, tiredness, and dry cough. Some patients may have aches and pains, nasal congestion, runny nose, sore throat or diarrhea. These symptoms are usually mild and begin gradually. Some people become infected but don’t develop any symptoms and don't feel unwell. Most people (about 80%) recover from the disease without needing special treatment. Around 1 out of every 6 people who gets COVID-19 becomes seriously ill and develops difficulty breathing. Older people, and those with underlying medical problems like high blood pressure, heart problems or diabetes, are more likely to develop serious illness.

### How does COVID-19 spread?

People can catch COVID-19 from others who have the virus. The disease can spread from person to person through small droplets from the nose or mouth which are spread when a person with COVID-19 coughs or exhales. These droplets land on objects and surfaces around the person. Other people then catch COVID-19 by touching these objects or surfaces, then touching their eyes, nose or mouth. People can also catch COVID-19 if they breathe in droplets from a person with COVID-19 who coughs out or exhales droplets. This is why it is important to stay more than 1 meter (3 feet) away from a person who is sick.

### What is quarantine?

Governments use quarantines to stop the spread of contagious diseases. Quarantines are for people or groups who don’t have symptoms but were exposed to the sickness. A quarantine keeps them away from others so they don’t unknowingly infect anyone. 

### What is the difference between isolation and quarantine?

Isolation separates sick people with a contagious disease from people who are not sick.

Quarantine separates and restricts the movement of people who were exposed to a contagious disease to see if they become sick.

## 2. Collection of data

The dataset used in this task was found through Google Dataset Search (https://datasetsearch.research.google.com/). The first search result was an official COVID-19 dataset from the WHO. The following repository contains COVID-19 outbreak data together with the code used for web scraping, data munging and cleaning (https://github.com/imdevskp/covid_19_jhu_data_web_scrap_and_cleaning).

## 3. Data Cleaning

### Import libraries

In [56]:
# plotly is a third-party package; install it first (pip install plotly) if this import fails
import numpy as np
import pandas as pd
import plotly.express as px

### Read the data

In [2]:
data = pd.read_csv('covid_19.csv', parse_dates = ['Date'])

In [3]:
data.sample(5)

| | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered |
|---|---|---|---|---|---|---|---|---|
| 12754 | | Iceland | 64.9631 | -19.0208 | 2020-03-03 | 11.0 | 0.0 | 0.0 |
| 13807 | Chongqing | China | 30.0572 | 107.874 | 2020-03-06 | 576.0 | 6.0 | 513.0 |
| 10995 | Maine | US | 44.6939 | -69.3819 | 2020-02-26 | 0.0 | 0.0 | 0.0 |
| 11163 | | Nepal | 28.1667 | 84.25 | 2020-02-27 | 1.0 | 0.0 | 1.0 |
| 12157 | | Chile | -35.6751 | -71.543 | 2020-03-01 | 0.0 | 0.0 | 0.0 |


### Identify the type of variables in the dataset

In [4]:
data.dtypes

Province/State            object
Country/Region            object
Lat                      float64
Long                     float64
Date              datetime64[ns]
Confirmed                float64
Deaths                   float64
Recovered                float64
dtype: object

In [5]:
data['Confirmed'] = data['Confirmed'].astype(int)

ValueError: Cannot convert non-finite values (NA or inf) to integer

We wanted to convert the columns 'Confirmed', 'Deaths' and 'Recovered' to int, but the conversion fails while these columns contain NA values, so we needed to fill them first.
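A minimal sketch of why the cast fails, on a toy Series rather than the real data: a missing value is stored as the float NaN, which has no integer representation. Filling the gaps first is the approach this notebook takes; pandas' nullable `'Int64'` dtype is shown only as an alternative that keeps the gap visible as `<NA>`.

```python
import numpy as np
import pandas as pd

# Toy column with one missing value, mirroring the single NA in 'Confirmed'
s = pd.Series([11.0, 576.0, np.nan])

try:
    s.astype(int)              # NaN cannot be represented as an integer
    conversion_failed = False
except ValueError:
    conversion_failed = True   # same kind of error as in the cell above

# Option 1: fill the gap first, then convert (what this notebook does)
filled = s.fillna(0).astype(int)

# Option 2 (alternative): nullable integer dtype keeps the gap as <NA>
nullable = s.astype('Int64')

print(conversion_failed)       # True
print(filled.tolist())         # [11, 576, 0]
print(nullable.isna().sum())   # 1
```

Filling with 0 treats a missing report as "no cases"; the nullable dtype avoids that assumption at the cost of carrying `<NA>` into later arithmetic.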

### Check for null / NA values

In [6]:
data.isna().sum()

Province/State    10788
Country/Region        0
Lat                   0
Long                  0
Date                  0
Confirmed             1
Deaths                1
Recovered             1
dtype: int64

### Replace NA values

In [7]:
data[['Confirmed']] = data[['Confirmed']].fillna(0)

In [8]:
data[['Deaths']] = data[['Deaths']].fillna(0)

In [9]:
data[['Recovered']] = data[['Recovered']].fillna(0)

In [10]:
data[['Province/State']] = data[['Province/State']].fillna('')
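The four `fillna` cells above, and the three `astype` cells that follow, can each be collapsed into a single call by passing a dictionary keyed by column name. A minimal sketch on a hypothetical three-row frame (the column names match the dataset; the values are borrowed from the first sample output, with one row blanked out for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the dataset with the same kinds of gaps
df = pd.DataFrame({
    'Province/State': [np.nan, 'Chongqing', np.nan],
    'Country/Region': ['Iceland', 'China', 'Nepal'],
    'Confirmed': [11.0, 576.0, np.nan],
    'Deaths': [0.0, 6.0, np.nan],
    'Recovered': [0.0, 513.0, np.nan],
})

counts = ['Confirmed', 'Deaths', 'Recovered']

# One fillna call with a fill value per column
df = df.fillna({'Province/State': '', 'Confirmed': 0, 'Deaths': 0, 'Recovered': 0})

# Convert all count columns to integers in one astype call
df = df.astype({c: int for c in counts})

print(df.isna().sum().sum())       # 0
print(df['Confirmed'].tolist())    # [11, 576, 0]
```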

After filling the NA values, we were able to convert the columns 'Confirmed', 'Deaths' and 'Recovered' to integers.

In [11]:
data['Confirmed'] = data['Confirmed'].astype(int)

In [12]:
data['Deaths'] = data['Deaths'].astype(int)

In [13]:
data['Recovered'] = data['Recovered'].astype(int)

We created a new column called 'Active' to represent active cases. Since active cases are those that are still ongoing, we subtracted the deaths and the recoveries from the confirmed cases.

In [14]:
data['Active'] = data['Confirmed'] - data['Deaths'] - data['Recovered']
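The 'Active' column follows `Active = Confirmed - Deaths - Recovered`. Because it is derived, it turns negative whenever the source reports are inconsistent: the latest-data output further down shows Puerto Rico with 0 confirmed cases, 1 death and -1 active. A small sanity-check sketch, using rows copied from the outputs in this notebook (the check itself is an addition, not part of the original analysis):

```python
import pandas as pd

# Rows copied from the sample outputs in this notebook
df = pd.DataFrame({
    'Country/Region': ['Germany', 'Russia', 'Puerto Rico'],
    'Confirmed': [5795, 199, 0],
    'Deaths': [11, 1, 1],
    'Recovered': [46, 9, 0],
})

# Active = Confirmed - Deaths - Recovered
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Recovered']

# A negative count is impossible, so such rows flag a reporting inconsistency
suspicious = df[df['Active'] < 0]

print(df['Active'].tolist())                  # [5738, 189, -1]
print(suspicious['Country/Region'].tolist())  # ['Puerto Rico']
```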

We checked the data types again to make sure that the new 'Active' column has an integer type.

In [15]:
data.dtypes

Province/State            object
Country/Region            object
Lat                      float64
Long                     float64
Date              datetime64[ns]
Confirmed                  int32
Deaths                     int32
Recovered                  int32
Active                     int32
dtype: object

We also sampled 5 random rows to see what the dataset looks like.

In [16]:
data.sample(5)

| | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|---|---|---|---|
| 16441 | | Germany | 51.0 | 9.0 | 2020-03-15 | 5795 | 11 | 46 | 5738 |
| 14334 | | Tunisia | 34.0 | 9.0 | 2020-03-08 | 2 | 0 | 0 | 2 |
| 1028 | Washington | US | 47.4009 | -121.4905 | 2020-01-25 | 0 | 0 | 0 | 0 |
| 17862 | | Russia | 60.0 | 90.0 | 2020-03-19 | 199 | 1 | 9 | 189 |
| 15055 | Guangxi | China | 23.8298 | 108.7881 | 2020-03-10 | 252 | 2 | 234 | 16 |


### Filtering based on different conditions

We created separate dataframes for countries with high numbers of confirmed cases, such as China and Italy.

#### China

In [17]:
china = data[data['Country/Region'] == 'China']
china

| | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|---|---|---|---|
| 154 | Hubei | China | 30.9756 | 112.2707 | 2020-01-22 | 444 | 17 | 28 | 399 |
| 158 | Guangdong | China | 23.3417 | 113.4244 | 2020-01-22 | 26 | 0 | 0 | 26 |
| 159 | Henan | China | 33.8820 | 113.6140 | 2020-01-22 | 5 | 0 | 0 | 5 |
| 160 | Zhejiang | China | 29.1832 | 120.0934 | 2020-01-22 | 10 | 0 | 0 | 10 |
| 161 | Hunan | China | 27.6104 | 111.7088 | 2020-01-22 | 4 | 0 | 0 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 19098 | Inner Mongolia | China | 44.0935 | 113.9448 | 2020-03-23 | 75 | 1 | 74 | 0 |
| 19099 | Ningxia | China | 37.2692 | 106.1655 | 2020-03-23 | 75 | 0 | 75 | 0 |
| 19103 | Qinghai | China | 35.7452 | 95.9956 | 2020-03-23 | 18 | 0 | 18 | 0 |
| 19104 | Macau | China | 22.1667 | 113.5500 | 2020-03-23 | 24 | 0 | 10 | 14 |


#### Italy

In [18]:
italy = data[data['Country/Region'] == 'Italy']
italy

| | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|---|---|---|---|
| 16 | | Italy | 43.0 | 12.0 | 2020-01-22 | 0 | 0 | 0 | 0 |
| 326 | | Italy | 43.0 | 12.0 | 2020-01-23 | 0 | 0 | 0 | 0 |
| 636 | | Italy | 43.0 | 12.0 | 2020-01-24 | 0 | 0 | 0 | 0 |
| 946 | | Italy | 43.0 | 12.0 | 2020-01-25 | 0 | 0 | 0 | 0 |
| 1256 | | Italy | 43.0 | 12.0 | 2020-01-26 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 17686 | | Italy | 43.0 | 12.0 | 2020-03-19 | 41035 | 3405 | 4440 | 33190 |
| 17996 | | Italy | 43.0 | 12.0 | 2020-03-20 | 47021 | 4032 | 4440 | 38549 |
| 18306 | | Italy | 43.0 | 12.0 | 2020-03-21 | 53578 | 4825 | 6072 | 42681 |
| 18616 | | Italy | 43.0 | 12.0 | 2020-03-22 | 59138 | 5476 | 7024 | 46638 |
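Both filters above use an equality mask, one country at a time. A sketch on toy rows (values taken loosely from the outputs above, mixed across dates) showing that `Series.isin` selects several countries with a single mask. Calling `.copy()` on a filtered subset is a common habit that avoids pandas' SettingWithCopyWarning if columns are added to it later.

```python
import pandas as pd

# Toy rows for illustration only
df = pd.DataFrame({
    'Country/Region': ['China', 'Italy', 'US', 'China'],
    'Province/State': ['Hubei', '', 'Maine', 'Chongqing'],
    'Confirmed': [444, 0, 0, 576],
})

# Equality mask: one country at a time, as in the China and Italy cells
china = df[df['Country/Region'] == 'China'].copy()

# isin mask: several countries in one filter
selected = df[df['Country/Region'].isin(['China', 'Italy'])].copy()

print(len(china))     # 2
print(len(selected))  # 3
```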


#### Latest data

In [19]:
latest_data = data[data['Date'] == max(data['Date'])].reset_index()
latest_data

| | index | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 18910 | | Thailand | 15.0000 | 101.0000 | 2020-03-23 | 599 | 1 | 44 | 554 |
| 1 | 18911 | | Japan | 36.0000 | 138.0000 | 2020-03-23 | 1086 | 40 | 235 | 811 |
| 2 | 18912 | | Singapore | 1.2833 | 103.8333 | 2020-03-23 | 455 | 2 | 144 | 309 |
| 3 | 18913 | | Nepal | 28.1667 | 84.2500 | 2020-03-23 | 2 | 0 | 1 | 1 |
| 4 | 18914 | | Malaysia | 2.5000 | 112.5000 | 2020-03-23 | 1306 | 10 | 139 | 1157 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 305 | 19215 | | Jersey | 49.1900 | -2.1100 | 2020-03-23 | 0 | 0 | 0 | 0 |
| 306 | 19216 | | Puerto Rico | 18.2000 | -66.5000 | 2020-03-23 | 0 | 1 | 0 | -1 |
| 307 | 19217 | | Republic of the Congo | -1.4400 | 15.5560 | 2020-03-23 | 0 | 0 | 0 | 0 |
| 308 | 19218 | | The Bahamas | 24.2500 | -76.0000 | 2020-03-23 | 0 | 0 | 0 | 0 |
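The output above carries an extra `index` column because `reset_index()` keeps the old row labels as a column by default. A sketch on toy data (illustrative values, not from the dataset) showing the `drop=True` variant, which this notebook uses later for the daily totals, together with the `Series.max()` method in place of the built-in `max()`:

```python
import pandas as pd

# Toy long-format data: one row per country per day (illustrative values)
df = pd.DataFrame({
    'Country/Region': ['Italy', 'Japan', 'Italy', 'Japan'],
    'Date': pd.to_datetime(['2020-03-22', '2020-03-22', '2020-03-23', '2020-03-23']),
    'Confirmed': [10, 5, 12, 6],
}, index=[100, 101, 200, 201])

# Keep only rows from the most recent date; drop=True discards the old labels
latest = df[df['Date'] == df['Date'].max()].reset_index(drop=True)

print(latest)
```

This assumes every country reports on the most recent date; a country whose last row is older would silently disappear from `latest`.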


In [20]:
latest_china = latest_data[latest_data['Country/Region'] =='China']
latest_china

| | index | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|---|---|---|---|---|
| 154 | 19064 | Hubei | China | 30.9756 | 112.2707 | 2020-03-23 | 67800 | 3153 | 59879 | 4768 |
| 158 | 19068 | Guangdong | China | 23.3417 | 113.4244 | 2020-03-23 | 1413 | 8 | 1332 | 73 |
| 159 | 19069 | Henan | China | 33.882 | 113.614 | 2020-03-23 | 1274 | 22 | 1250 | 2 |
| 160 | 19070 | Zhejiang | China | 29.1832 | 120.0934 | 2020-03-23 | 1238 | 1 | 1221 | 16 |
| 161 | 19071 | Hunan | China | 27.6104 | 111.7088 | 2020-03-23 | 1018 | 4 | 1014 | 0 |
| 162 | 19072 | Anhui | China | 31.8257 | 117.2264 | 2020-03-23 | 990 | 6 | 984 | 0 |
| 163 | 19073 | Jiangxi | China | 27.614 | 115.7221 | 2020-03-23 | 936 | 1 | 934 | 1 |
| 164 | 19074 | Shandong | China | 36.3427 | 118.1498 | 2020-03-23 | 767 | 7 | 751 | 9 |
| 166 | 19076 | Jiangsu | China | 32.9711 | 119.455 | 2020-03-23 | 633 | 0 | 631 | 2 |
| 167 | 19077 | Chongqing | China | 30.0572 | 107.874 | 2020-03-23 | 577 | 6 | 570 | 1 |


In [21]:
latest_italy = latest_data[latest_data['Country/Region'] =='Italy']
latest_italy

| | index | Province/State | Country/Region | Lat | Long | Date | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|---|---|---|---|---|
| 16 | 18926 | | Italy | 43.0 | 12.0 | 2020-03-23 | 59138 | 5476 | 7024 | 46638 |


In [22]:
# Double brackets select several columns; tuple-style selection ['A', 'B'] is not supported in pandas 2.x
latest_temp = data.groupby(['Country/Region', 'Province/State'])[['Confirmed', 'Deaths', 'Recovered', 'Active']].max()


In [23]:
latest_temp

| Country/Region | Province/State | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|
| Afghanistan | | 40 | 1 | 1 | 38 |
| Albania | | 89 | 2 | 2 | 85 |
| Algeria | | 201 | 17 | 65 | 119 |
| Andorra | | 113 | 1 | 1 | 111 |
| Angola | | 2 | 0 | 0 | 2 |
| ... | ... | ... | ... | ... | ... |
| Uzbekistan | | 43 | 0 | 0 | 43 |
| Venezuela | | 70 | 0 | 15 | 70 |
| Vietnam | | 113 | 0 | 17 | 96 |
| Zambia | | 3 | 0 | 0 | 3 |
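Note that `.max()` after a groupby takes each column's maximum independently, so the resulting row need not correspond to any single day. The Venezuela row above shows this: 70 confirmed, 0 deaths, 15 recovered and 70 active cannot all hold on the same day, since Active should equal Confirmed - Deaths - Recovered. A sketch with two illustrative daily reports chosen to reproduce that pattern (Deaths omitted for brevity):

```python
import pandas as pd

# Two illustrative daily reports for one country
df = pd.DataFrame({
    'Country/Region': ['Venezuela', 'Venezuela'],
    'Province/State': ['', ''],
    'Confirmed': [70, 70],
    'Recovered': [0, 15],
    'Active': [70, 55],
})

# Column-wise maximum per (country, province) group
peak = df.groupby(['Country/Region', 'Province/State'])[['Confirmed', 'Recovered', 'Active']].max()

row = peak.loc[('Venezuela', '')]
print(row.tolist())                                    # [70, 15, 70]
# The maxima come from different days, so the identity no longer holds
print(row['Confirmed'] - row['Recovered'] == row['Active'])  # False
```

To get the latest consistent row per group instead, one option is to sort by 'Date' and take `.last()` (or filter on the latest date, as done elsewhere in this notebook).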


We created a dataframe that sums the confirmed, death, recovered and active cases over all countries for each day.

In [39]:
latest_temp = data.groupby('Date')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()


In [40]:
latest_temp

| | Date | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|
| 0 | 2020-01-22 | 554 | 17 | 28 | 509 |
| 1 | 2020-01-23 | 652 | 18 | 30 | 604 |
| 2 | 2020-01-24 | 939 | 26 | 36 | 877 |
| 3 | 2020-01-25 | 1432 | 42 | 39 | 1351 |
| 4 | 2020-01-26 | 2113 | 56 | 52 | 2005 |
| ... | ... | ... | ... | ... | ... |
| 57 | 2020-03-19 | 242708 | 9867 | 84854 | 147987 |
| 58 | 2020-03-20 | 272166 | 11299 | 87256 | 173611 |
| 59 | 2020-03-21 | 304524 | 12973 | 91499 | 200052 |
| 60 | 2020-03-22 | 335955 | 14632 | 97704 | 223619 |


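We added a new column, 'Mortality', which is the ratio of deaths to confirmed cases (the crude case fatality ratio). It is only a rough estimate of the probability that an infected person dies: many confirmed cases are still active, and infections that were never confirmed are not counted.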

In [41]:
latest_temp['Mortality'] = latest_temp['Deaths'] / latest_temp['Confirmed']
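A sketch of the same ratio on the first two daily totals shown above, plus a hypothetical day with no confirmed cases. Replacing zero denominators with NaN is one way to avoid `inf` values when a day or country has no confirmed cases yet; this guard is an assumption added for illustration, not part of the original cell.

```python
import numpy as np
import pandas as pd

# First two daily totals from the output above, plus a hypothetical zero day
daily = pd.DataFrame({
    'Confirmed': [554, 652, 0],
    'Deaths': [17, 18, 0],
})

# Mortality = Deaths / Confirmed; a zero denominator becomes NaN instead of inf
daily['Mortality'] = daily['Deaths'] / daily['Confirmed'].replace(0, np.nan)

print(daily['Mortality'].round(6).tolist()[:2])  # [0.030686, 0.027607]
print(daily['Mortality'].isna().tolist())        # [False, False, True]
```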

In [42]:
latest_temp

| | Date | Confirmed | Deaths | Recovered | Active | Mortality |
|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | 554 | 17 | 28 | 509 | 0.030686 |
| 1 | 2020-01-23 | 652 | 18 | 30 | 604 | 0.027607 |
| 2 | 2020-01-24 | 939 | 26 | 36 | 877 | 0.027689 |
| 3 | 2020-01-25 | 1432 | 42 | 39 | 1351 | 0.029330 |
| 4 | 2020-01-26 | 2113 | 56 | 52 | 2005 | 0.026503 |
| ... | ... | ... | ... | ... | ... | ... |
| 57 | 2020-03-19 | 242708 | 9867 | 84854 | 147987 | 0.040654 |
| 58 | 2020-03-20 | 272166 | 11299 | 87256 | 173611 | 0.041515 |
| 59 | 2020-03-21 | 304524 | 12973 | 91499 | 200052 | 0.042601 |
| 60 | 2020-03-22 | 335955 | 14632 | 97704 | 223619 | 0.043553 |


Then, we kept only the row for the most recent date to display the latest available totals.

In [45]:
latest_temp = latest_temp[latest_temp['Date'] == max(latest_temp['Date'])].reset_index(drop = True)

In [46]:
latest_temp

| | Date | Confirmed | Deaths | Recovered | Active | Mortality |
|---|---|---|---|---|---|---|
| 0 | 2020-03-23 | 336004 | 14643 | 98334 | 223027 | 0.04358 |


We grouped the latest data by country, summing the rows of countries that are split into provinces or states.

In [49]:
latest_data_grouped = latest_data.groupby('Country/Region')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()
latest_data_grouped


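| | Country/Region | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|
| 0 | Afghanistan | 40 | 1 | 1 | 38 |
| 1 | Albania | 89 | 2 | 2 | 85 |
| 2 | Algeria | 201 | 17 | 65 | 119 |
| 3 | Andorra | 113 | 1 | 1 | 111 |
| 4 | Angola | 2 | 0 | 0 | 2 |
| ... | ... | ... | ... | ... | ... |
| 178 | Uzbekistan | 43 | 0 | 0 | 43 |
| 179 | Venezuela | 70 | 0 | 15 | 55 |
| 180 | Vietnam | 113 | 0 | 17 | 96 |
| 181 | Zambia | 3 | 0 | 0 | 3 |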
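A sketch of what the country-level grouping does to province-level rows, using three rows copied from the latest-data outputs above. Only two Chinese provinces are included, so the China total here is partial and illustrative.

```python
import pandas as pd

# Three rows copied from the latest-data outputs (China is incomplete on purpose)
latest = pd.DataFrame({
    'Country/Region': ['China', 'China', 'Italy'],
    'Province/State': ['Hubei', 'Guangdong', ''],
    'Confirmed': [67800, 1413, 59138],
    'Deaths': [3153, 8, 5476],
})

# Sum province rows so each country appears once; reset_index turns the key back into a column
by_country = latest.groupby('Country/Region')[['Confirmed', 'Deaths']].sum().reset_index()

print(by_country)
```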


Since the data for China is broken down by province, we also grouped the latest Chinese data by province.

In [50]:
latest_china_grouped = latest_china.groupby('Province/State')[['Confirmed', 'Deaths', 'Recovered', 'Active']].sum().reset_index()
latest_china_grouped


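| | Province/State | Confirmed | Deaths | Recovered | Active |
|---|---|---|---|---|---|
| 0 | Anhui | 990 | 6 | 984 | 0 |
| 1 | Beijing | 522 | 8 | 400 | 114 |
| 2 | Chongqing | 577 | 6 | 570 | 1 |
| 3 | Fujian | 313 | 1 | 295 | 17 |
| 4 | Gansu | 136 | 2 | 114 | 20 |
| 5 | Guangdong | 1413 | 8 | 1332 | 73 |
| 6 | Guangxi | 254 | 2 | 250 | 2 |
| 7 | Guizhou | 146 | 2 | 144 | 0 |
| 8 | Hainan | 168 | 6 | 161 | 1 |
| 9 | Hebei | 319 | 6 | 310 | 3 |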


#### Data for each country, sorted by the number of confirmed cases

In [53]:
latest_grouped_sorted = latest_data_grouped.sort_values(by = 'Confirmed', ascending=False)
latest_grouped_sorted = latest_grouped_sorted[['Country/Region', 'Confirmed', 'Active', 'Deaths', 'Recovered']]
latest_grouped_sorted = latest_grouped_sorted.reset_index(drop=True)
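A sketch of the sorting step on four country totals copied from the output below, alongside `DataFrame.nlargest`, which returns the same top-k rows in descending order in one call when only the leading countries are needed.

```python
import pandas as pd

# Country totals copied from the sorted output
grouped = pd.DataFrame({
    'Country/Region': ['Spain', 'China', 'US', 'Italy'],
    'Confirmed': [28768, 81439, 33276, 59138],
    'Deaths': [1772, 3274, 417, 5476],
})

# sort_values + reset_index, as in the cell above
ranked = grouped.sort_values(by='Confirmed', ascending=False).reset_index(drop=True)

# nlargest: top-k rows by one column
top3 = grouped.nlargest(3, 'Confirmed')['Country/Region'].tolist()

print(ranked['Country/Region'].tolist())  # ['China', 'Italy', 'US', 'Spain']
print(top3)                               # ['China', 'Italy', 'US']
```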

In [54]:
latest_grouped_sorted

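| | Country/Region | Confirmed | Active | Deaths | Recovered |
|---|---|---|---|---|---|
| 0 | China | 81439 | 5351 | 3274 | 72814 |
| 1 | Italy | 59138 | 46638 | 5476 | 7024 |
| 2 | US | 33276 | 32681 | 417 | 178 |
| 3 | Spain | 28768 | 24421 | 1772 | 2575 |
| 4 | Germany | 24873 | 24513 | 94 | 266 |
| ... | ... | ... | ... | ... | ... |
| 178 | Greenland | 0 | 0 | 0 | 0 |
| 179 | Puerto Rico | 0 | -1 | 1 | 0 |
| 180 | The Gambia | 0 | 0 | 0 | 0 |
| 181 | Republic of the Congo | 0 | 0 | 0 | 0 |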


## 4. Data Analysis

In [55]:
import plotly.express as px

# Colours for each case type; these hex values are assumptions, any palette works
rec, dth, act = '#21bf73', '#ff2e63', '#fe9801'

# Daily global totals, reshaped to long format: one row per (Date, Case) pair
temp = data.groupby('Date')[['Recovered', 'Deaths', 'Active']].sum().reset_index()
temp = temp.melt(id_vars='Date', value_vars=['Recovered', 'Deaths', 'Active'],
                 var_name='Case', value_name='Count')

fig = px.area(temp, x='Date', y='Count', color='Case', height=800,
              title='Cases over time', color_discrete_sequence=[rec, dth, act])
fig.update_layout(xaxis_rangeslider_visible=True)
fig.show()
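`px.area` expects long-format data: one column for the x value, one for the category that drives `color`, and one for the y value. A sketch of the `melt` step on the first two daily totals from the earlier output, showing how three wide columns become a single 'Case' / 'Count' pair:

```python
import pandas as pd

# First two daily totals from the earlier output (wide format)
wide = pd.DataFrame({
    'Date': pd.to_datetime(['2020-01-22', '2020-01-23']),
    'Recovered': [28, 30],
    'Deaths': [17, 18],
    'Active': [509, 604],
})

# Wide -> long: each (Date, case type) becomes its own row
long = wide.melt(id_vars='Date', value_vars=['Recovered', 'Deaths', 'Active'],
                 var_name='Case', value_name='Count')

print(long.shape)             # (6, 3)
print(long.columns.tolist())  # ['Date', 'Case', 'Count']
```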