# COVID-19 : Worldwide & USA Analysis

## Table of Contents

1. [Problem Statement](#section1)<br>
2. [Data Loading and Description](#section2)<br/>
3. [Data Profiling](#section3)
    - 3.1 [Understanding the Dataset](#section301)<br/>
    - 3.2 [Preprocessing](#section302)<br/>
4. [Questions](#section4)
    - 4.1 [How are COVID-19 cases distributed worldwide?](#section401)<br/>
    - 4.2 [How have cases increased with time in the most infected countries?](#section402)<br/>
    - 4.3 [At what rate have cases increased daily in the most infected countries?](#section403)<br/>
    - 4.4 [Which states in the US have the most and least positive cases?](#section404)<br/>
    - 4.5 [Which states in the US have performed the highest and lowest number of tests?](#section405)<br/>
    - 4.6 [Which states in the US show the highest positive case rate with respect to tests conducted?](#section406)<br/>
    - 4.7 [Which states in the US show the highest death rate with respect to positive cases?](#section407)<br/>
5. [Conclusion](#section5)<br/>  

<a id='section1'></a>
### 1. Problem Statement

This notebook explores the spread of COVID-19 worldwide using Python libraries for visualization and numerical manipulation. We perform a preliminary __Exploratory Data Analysis (EDA)__ of our __Global COVID case tracking__ dataset, then look into the regions that have the highest number of cases. The data is analysed using some basic statistical tools and charts.

Our end goal in this notebook is to analyze the current spread of COVID-19 and visualize the number of cases country-wise. We also look at the number of new cases country-wise, which gives us a picture of how well the virus is being contained. Lastly, we look into the regions that have the most cases and examine how the virus has spread state-wise.

* __Exploratory Data Analysis__ <br/>
Understand the data through EDA and derive simple models with Pandas as a baseline.
EDA is a critical first step in analyzing data, and we do it for the following reasons:
    - Finding patterns in Data
    - Determining relationships in Data
    - Checking of assumptions
    - Detection of mistakes 

<a id='section2'></a>
### 2. Data Loading and Description

In this project we will use two datasets (one with the overall worldwide data and one with the cases in the USA only) to get a clearer picture of how COVID-19 has spread across the globe and across the country with the highest number of cases (the USA).

We will analyze and process these datasets in order to answer several questions.


__1. worldwide_df :__
- The dataset consists of data about the spread of COVID-19 across the globe.
- The dataset has one row per __country__ (or per province/state where a country is broken down) and various data about that location as columns. Below is a table showing the names of the columns and their descriptions.

| Column Name | Description |
| ------------- |:-------------:|
| Province/State | Name of the province or state within the country |
| Country/Region | Name of the country/region |
| Lat | Latitude of the location |
| Long | Longitude of the location |
| Dates | One column per date (e.g. `1/22/20`); each holds the cumulative number of confirmed cases as of the date in the column name |

__2. USA_state_stats :__
 - The dataset consists of the current COVID-19 testing and outcome figures for each US state and territory, taken from covidtracking.com.
 - The dataset comprises __30 columns__, and each row represents a state. As we will not be using all the columns, the full description of the data can be found at https://covidtracking.com/. Below is a table showing the names of the columns we will be using and their descriptions.
        
| Column Name | Description |
| ------------- |:-------------:|
| state | Name of the state |
| positive | Number of positive cases |
| negative | Number of negative cases |
| death | Number of deaths |
| totalTestResults | Total number of tests conducted |

__Both datasets are updated daily, so importing them should automatically fetch the most recently updated data.__

#### Importing packages 

In [2]:
import pandas as pd
import plotly.graph_objs as go 
from plotly.offline import init_notebook_mode,iplot,plot
init_notebook_mode(connected=True) 
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import numpy as np

In [3]:
import cufflinks as cf
init_notebook_mode(connected=True)
# For offline use
cf.go_offline()

#### Importing the Dataset

__worldwide_df__

In [4]:
worldwide_df = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
worldwide_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/7/20,5/8/20,5/9/20,5/10/20,5/11/20,5/12/20,5/13/20,5/14/20,5/15/20,5/16/20
0,,Afghanistan,33.0,65.0,0,0,0,0,0,0,...,3563,3778,4033,4402,4687,4963,5226,5639,6053,6402
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,842,850,856,868,872,876,880,898,916,933
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,5182,5369,5558,5723,5891,6067,6253,6442,6629,6821
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,752,752,754,755,755,758,760,761,761,761
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,36,43,43,45,45,45,45,48,48,48


__USA_state_stats__

In [5]:
#COVID USA stats from covidtracking.com
USA_state_stats = pd.read_csv('https://covidtracking.com/api/v1/states/current.csv')
USA_state_stats.head()

Unnamed: 0,state,positive,positiveScore,negativeScore,negativeRegularScore,commercialScore,grade,score,notes,dataQualityGrade,...,checkTimeEt,death,hospitalized,total,totalTestResults,posNeg,fips,dateModified,dateChecked,hash
0,AK,392,1.0,1.0,1.0,1.0,A,4.0,"Please stop using the ""total"" field. Use ""tota...",B,...,5/16 15:50,10,,33281,33281,33281,2,2020-05-16T04:00:00Z,2020-05-16T19:50:00Z,a42c8ccc77a49e4e959484fca07c6bdb8e488efe
1,AL,11523,1.0,1.0,0.0,1.0,B,3.0,"Please stop using the ""total"" field. Use ""tota...",B,...,5/16 16:51,485,1387.0,153494,153494,153494,1,2020-05-16T04:00:00Z,2020-05-16T20:51:00Z,5dcfc15d94a5a778610b052281ca89be5843162a
2,AR,4578,1.0,1.0,1.0,1.0,A,4.0,"Please stop using the ""total"" field. Use ""tota...",A,...,5/16 15:58,98,520.0,81644,81644,81644,5,2020-05-15T20:20:00Z,2020-05-16T19:58:00Z,444b437f09469749409f9298816b0f0895f47710
3,AZ,13631,1.0,1.0,0.0,1.0,B,3.0,"Please stop using the ""total"" field. Use ""tota...",A+,...,5/16 14:37,679,1683.0,146788,146788,146788,4,2020-05-16T04:00:00Z,2020-05-16T18:37:00Z,ccff60eab09a40b9b77602cf8de06e10f165dc45
4,CA,76793,1.0,1.0,0.0,1.0,B,3.0,"Please stop using the ""total"" field. Use ""tota...",B,...,5/16 14:41,3204,,1179126,1179126,1179126,6,2020-05-16T04:00:00Z,2020-05-16T18:41:00Z,ef8aa7d004e1296396869b558338ac37fb11f853


<a id='section3'></a>
## 3. Data Profiling

- In the upcoming section we will first __understand our dataset__ using various pandas functionalities.
- Once we identify if there are any inconsistencies and shortcomings in the data, we can begin preprocessing it.
- In __preprocessing__, we will deal with erroneous and missing values. If necessary, we may also add columns to make the analysis easier.

<a id='section301'></a>
### 3.1 Understanding the Dataset

__worldwide_df :__

In [6]:
worldwide_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 266 entries, 0 to 265
Columns: 120 entries, Province/State to 5/16/20
dtypes: float64(2), int64(116), object(2)
memory usage: 249.5+ KB


In [7]:
worldwide_df.shape

(266, 120)

In [8]:
worldwide_df.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/7/20,5/8/20,5/9/20,5/10/20,5/11/20,5/12/20,5/13/20,5/14/20,5/15/20,5/16/20
0,,Afghanistan,33.0,65.0,0,0,0,0,0,0,...,3563,3778,4033,4402,4687,4963,5226,5639,6053,6402
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,842,850,856,868,872,876,880,898,916,933
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,5182,5369,5558,5723,5891,6067,6253,6442,6629,6821
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,752,752,754,755,755,758,760,761,761,761
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,36,43,43,45,45,45,45,48,48,48


__USA_state_stats :__

In [9]:
USA_state_stats.head()

Unnamed: 0,state,positive,positiveScore,negativeScore,negativeRegularScore,commercialScore,grade,score,notes,dataQualityGrade,...,checkTimeEt,death,hospitalized,total,totalTestResults,posNeg,fips,dateModified,dateChecked,hash
0,AK,392,1.0,1.0,1.0,1.0,A,4.0,"Please stop using the ""total"" field. Use ""tota...",B,...,5/16 15:50,10,,33281,33281,33281,2,2020-05-16T04:00:00Z,2020-05-16T19:50:00Z,a42c8ccc77a49e4e959484fca07c6bdb8e488efe
1,AL,11523,1.0,1.0,0.0,1.0,B,3.0,"Please stop using the ""total"" field. Use ""tota...",B,...,5/16 16:51,485,1387.0,153494,153494,153494,1,2020-05-16T04:00:00Z,2020-05-16T20:51:00Z,5dcfc15d94a5a778610b052281ca89be5843162a
2,AR,4578,1.0,1.0,1.0,1.0,A,4.0,"Please stop using the ""total"" field. Use ""tota...",A,...,5/16 15:58,98,520.0,81644,81644,81644,5,2020-05-15T20:20:00Z,2020-05-16T19:58:00Z,444b437f09469749409f9298816b0f0895f47710
3,AZ,13631,1.0,1.0,0.0,1.0,B,3.0,"Please stop using the ""total"" field. Use ""tota...",A+,...,5/16 14:37,679,1683.0,146788,146788,146788,4,2020-05-16T04:00:00Z,2020-05-16T18:37:00Z,ccff60eab09a40b9b77602cf8de06e10f165dc45
4,CA,76793,1.0,1.0,0.0,1.0,B,3.0,"Please stop using the ""total"" field. Use ""tota...",B,...,5/16 14:41,3204,,1179126,1179126,1179126,6,2020-05-16T04:00:00Z,2020-05-16T18:41:00Z,ef8aa7d004e1296396869b558338ac37fb11f853


In [10]:
USA_state_stats.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56 entries, 0 to 55
Data columns (total 30 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   state                   56 non-null     object 
 1   positive                56 non-null     int64  
 2   positiveScore           52 non-null     float64
 3   negativeScore           52 non-null     float64
 4   negativeRegularScore    52 non-null     float64
 5   commercialScore         52 non-null     float64
 6   grade                   52 non-null     object 
 7   score                   52 non-null     float64
 8   notes                   56 non-null     object 
 9   dataQualityGrade        56 non-null     object 
 10  negative                55 non-null     float64
 11  pending                 5 non-null      float64
 12  hospitalizedCurrently   45 non-null     float64
 13  hospitalizedCumulative  33 non-null     float64
 14  inIcuCurrently          25 non-null     floa

In [11]:
USA_state_stats.shape

(56, 30)

<a id='section302'></a>
### 3.2 Preprocessing

__worldwide_df :__

For our analysis, we do not need information about __Province/State.__ We will only be analyzing country information. Below we explore the Province/State column.

In [12]:
worldwide_df[worldwide_df['Country/Region'] == 'China'].head(10)

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,5/7/20,5/8/20,5/9/20,5/10/20,5/11/20,5/12/20,5/13/20,5/14/20,5/15/20,5/16/20
49,Anhui,China,31.8257,117.2264,1,9,15,39,60,70,...,991,991,991,991,991,991,991,991,991,991
50,Beijing,China,40.1824,116.4142,14,22,36,41,68,80,...,593,593,593,593,593,593,593,593,593,593
51,Chongqing,China,30.0572,107.874,6,9,27,57,75,110,...,579,579,579,579,579,579,579,579,579,579
52,Fujian,China,26.0789,117.9874,1,5,10,18,35,59,...,356,356,356,356,356,356,356,356,356,356
53,Gansu,China,37.8099,101.0583,0,2,2,4,7,14,...,139,139,139,139,139,139,139,139,139,139
54,Guangdong,China,23.3417,113.4244,26,32,53,78,111,151,...,1589,1589,1589,1589,1589,1589,1589,1589,1589,1590
55,Guangxi,China,23.8298,108.7881,2,5,23,23,36,46,...,254,254,254,254,254,254,254,254,254,254
56,Guizhou,China,26.8154,106.8748,1,3,3,4,5,7,...,147,147,147,147,147,147,147,147,147,147
57,Hainan,China,19.1959,109.7453,4,5,8,19,22,33,...,168,168,168,168,168,168,168,168,169,169
58,Hebei,China,39.549,116.1306,1,1,2,8,13,18,...,328,328,328,328,328,328,328,328,328,328


As we can see above, a country's cases can be split across its states and provinces. To simplify this, we will add up the values across those provinces and states to get the total number of cases for each country.

In [13]:
# numeric_only=True keeps the text column Province/State out of the sum
df_group_country = worldwide_df.groupby('Country/Region').sum(numeric_only=True)
df_group_country

Unnamed: 0_level_0,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,...,5/7/20,5/8/20,5/9/20,5/10/20,5/11/20,5/12/20,5/13/20,5/14/20,5/15/20,5/16/20
Country/Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,33.000000,65.000000,0,0,0,0,0,0,0,0,...,3563,3778,4033,4402,4687,4963,5226,5639,6053,6402
Albania,41.153300,20.168300,0,0,0,0,0,0,0,0,...,842,850,856,868,872,876,880,898,916,933
Algeria,28.033900,1.659600,0,0,0,0,0,0,0,0,...,5182,5369,5558,5723,5891,6067,6253,6442,6629,6821
Andorra,42.506300,1.521800,0,0,0,0,0,0,0,0,...,752,752,754,755,755,758,760,761,761,761
Angola,-11.202700,17.873900,0,0,0,0,0,0,0,0,...,36,43,43,45,45,45,45,48,48,48
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
West Bank and Gaza,31.952200,35.233200,0,0,0,0,0,0,0,0,...,375,375,375,375,375,375,375,375,375,376
Western Sahara,24.215500,-12.885800,0,0,0,0,0,0,0,0,...,6,6,6,6,6,6,6,6,6,6
Yemen,15.552727,48.516388,0,0,0,0,0,0,0,0,...,25,34,34,51,56,65,70,85,106,122
Zambia,-15.416700,28.283300,0,0,0,0,0,0,0,0,...,153,167,252,267,267,441,446,654,654,679


By grouping by country and performing an aggregate sum function, we are able to get the total number of cases for each country.
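One side effect of the aggregation above is that `Lat` and `Long` are summed too, which is why the United Kingdom later shows a latitude of about 270. The following is a minimal sketch on made-up data (the `toy` frame and its values are assumptions for illustration, not part of the dataset) of how the same grouping could be done on the case columns only:

```python
import pandas as pd

# Toy frame mimicking the layout of worldwide_df (values are made up).
toy = pd.DataFrame({
    'Province/State': ['Anhui', 'Beijing', None],
    'Country/Region': ['China', 'China', 'Albania'],
    'Lat': [31.8, 40.2, 41.2],
    'Long': [117.2, 116.4, 20.2],
    '1/22/20': [1, 14, 0],
    '1/23/20': [9, 22, 0],
})

# Drop the columns that should not be added together, then sum per country.
toy_by_country = (
    toy.drop(columns=['Province/State', 'Lat', 'Long'])
       .groupby('Country/Region')
       .sum()
)
print(toy_by_country)
```

The rest of this notebook keeps `Lat` and `Long` in the grouped frame and simply slices them off later, so this variant is optional.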

In [14]:
df_group_country.iloc[:, -1]  # last column = most recent date

Country/Region
Afghanistan           6402
Albania                933
Algeria               6821
Andorra                761
Angola                  48
                      ... 
West Bank and Gaza     376
Western Sahara           6
Yemen                  122
Zambia                 679
Zimbabwe                42
Name: 5/16/20, Length: 188, dtype: int64

In [15]:
df_group_country['Total Cases'] = df_group_country.iloc[:, -1]  # copy of the most recent date column
df_group_country.head()

Unnamed: 0_level_0,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,...,5/8/20,5/9/20,5/10/20,5/11/20,5/12/20,5/13/20,5/14/20,5/15/20,5/16/20,Total Cases
Country/Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Afghanistan,33.0,65.0,0,0,0,0,0,0,0,0,...,3778,4033,4402,4687,4963,5226,5639,6053,6402,6402
Albania,41.1533,20.1683,0,0,0,0,0,0,0,0,...,850,856,868,872,876,880,898,916,933,933
Algeria,28.0339,1.6596,0,0,0,0,0,0,0,0,...,5369,5558,5723,5891,6067,6253,6442,6629,6821,6821
Andorra,42.5063,1.5218,0,0,0,0,0,0,0,0,...,752,754,755,755,758,760,761,761,761,761
Angola,-11.2027,17.8739,0,0,0,0,0,0,0,0,...,43,43,45,45,45,45,48,48,48,48


We have now added a column called __Total Cases__ which contains the total number of cases for each country so that it will be easier to analyze.

In [16]:
df_group_country = df_group_country.sort_values('Total Cases',ascending=False)
df_group_country_top10 = df_group_country.head(10)
df_group_country

Unnamed: 0_level_0,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,...,5/8/20,5/9/20,5/10/20,5/11/20,5/12/20,5/13/20,5/14/20,5/15/20,5/16/20,Total Cases
Country/Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
US,37.090200,-95.712900,1,1,2,2,5,5,5,5,...,1283929,1309550,1329260,1347881,1369376,1390406,1417774,1442824,1467820,1467820
Russia,60.000000,90.000000,0,0,0,0,0,0,0,0,...,187859,198676,209688,221344,232243,242271,252245,262843,272043,272043
United Kingdom,270.029900,-482.924700,0,0,0,0,0,0,0,0,...,212629,216525,220449,224332,227741,230985,234440,238004,241461,241461
Brazil,-14.235000,-51.925300,0,0,0,0,0,0,0,0,...,146894,156061,162699,169594,178214,190137,203165,220291,233511,233511
Spain,40.000000,-4.000000,0,0,0,0,0,0,0,0,...,222857,223578,224350,227436,228030,228691,229540,230183,230698,230698
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
Suriname,3.919300,-56.027800,0,0,0,0,0,0,0,0,...,10,10,10,10,10,10,10,10,10,10
MS Zaandam,0.000000,0.000000,0,0,0,0,0,0,0,0,...,9,9,9,9,9,9,9,9,9,9
Papua New Guinea,-6.315000,143.955500,0,0,0,0,0,0,0,0,...,8,8,8,8,8,8,8,8,8,8
Western Sahara,24.215500,-12.885800,0,0,0,0,0,0,0,0,...,6,6,6,6,6,6,6,6,6,6


With the above code, we sort the total cases in descending order so that we have the countries with the highest cases at the top of the list. We will make use of this further down in the notebook.

__USA_state_stats :__

In [17]:
USA_state_stats.drop(['onVentilatorCumulative','onVentilatorCurrently','score','grade','commercialScore','negativeRegularScore','negativeScore','positiveScore','posNeg','total','notes'],axis=1,inplace=True)
USA_state_stats.head(5)

Unnamed: 0,state,positive,dataQualityGrade,negative,pending,hospitalizedCurrently,hospitalizedCumulative,inIcuCurrently,inIcuCumulative,recovered,lastUpdateEt,checkTimeEt,death,hospitalized,totalTestResults,fips,dateModified,dateChecked,hash
0,AK,392,B,32889.0,,10.0,,,,344.0,5/16 00:00,5/16 15:50,10,,33281,2,2020-05-16T04:00:00Z,2020-05-16T19:50:00Z,a42c8ccc77a49e4e959484fca07c6bdb8e488efe
1,AL,11523,B,141971.0,,,1387.0,,501.0,,5/16 00:00,5/16 16:51,485,1387.0,153494,1,2020-05-16T04:00:00Z,2020-05-16T20:51:00Z,5dcfc15d94a5a778610b052281ca89be5843162a
2,AR,4578,A,77066.0,,65.0,520.0,,,3472.0,5/15 16:20,5/16 15:58,98,520.0,81644,5,2020-05-15T20:20:00Z,2020-05-16T19:58:00Z,444b437f09469749409f9298816b0f0895f47710
3,AZ,13631,A+,133157.0,,791.0,1683.0,344.0,,3357.0,5/16 00:00,5/16 14:37,679,1683.0,146788,4,2020-05-16T04:00:00Z,2020-05-16T18:37:00Z,ccff60eab09a40b9b77602cf8de06e10f165dc45
4,CA,76793,B,1102333.0,,4424.0,,1313.0,,,5/16 00:00,5/16 14:41,3204,,1179126,6,2020-05-16T04:00:00Z,2020-05-16T18:41:00Z,ef8aa7d004e1296396869b558338ac37fb11f853


In [18]:
USA_state_stats.drop(['hospitalized','dataQualityGrade','inIcuCurrently','inIcuCumulative','pending','hospitalizedCurrently','hospitalizedCumulative','hash','dateChecked','dateModified','fips','checkTimeEt','lastUpdateEt'],axis=1,inplace=True)
USA_state_stats.head(5)

Unnamed: 0,state,positive,negative,recovered,death,totalTestResults
0,AK,392,32889.0,344.0,10,33281
1,AL,11523,141971.0,,485,153494
2,AR,4578,77066.0,3472.0,98,81644
3,AZ,13631,133157.0,3357.0,679,146788
4,CA,76793,1102333.0,,3204,1179126


In [19]:
USA_state_stats.isnull().sum()

state                0
positive             0
negative             1
recovered           14
death                0
totalTestResults     0
dtype: int64

In [20]:
USA_state_stats[USA_state_stats['recovered'].isnull()]

Unnamed: 0,state,positive,negative,recovered,death,totalTestResults
1,AL,11523,141971.0,,485,153494
4,CA,76793,1102333.0,,3204,1179126
9,FL,44811,585236.0,,2040,630047
10,GA,37147,283922.0,,1592,321069
14,IL,92457,469192.0,,4129,561649
15,IN,27280,144078.0,,1741,171358
19,MA,84933,363156.0,,5705,448089
24,MO,10675,128665.0,,589,139340
29,NE,9772,50206.0,,119,59978
35,OH,27474,219228.0,,1610,246702


In [21]:
USA_state_stats.drop('recovered',axis=1,inplace=True)
USA_state_stats.head()

Unnamed: 0,state,positive,negative,death,totalTestResults
0,AK,392,32889.0,10,33281
1,AL,11523,141971.0,485,153494
2,AR,4578,77066.0,98,81644
3,AZ,13631,133157.0,679,146788
4,CA,76793,1102333.0,3204,1179126


The __recovered__ column is missing for 14 rows, so we dropped it above. We also have one row where the __negative__ column is null. We could drop just this row, but we will instead use the `totalTestResults` and `positive` columns to compute the missing value.

In [24]:
USA_state_stats['negative'] = USA_state_stats['totalTestResults'] - USA_state_stats['positive']
USA_state_stats.isnull().sum()

state               0
positive            0
negative            0
death               0
totalTestResults    0
dtype: int64
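The cell above recomputes `negative` for every row. A more conservative variant would keep the reported values and fill in only the missing one. The sketch below uses made-up rows (the `toy_states` frame is hypothetical) and assumes, as the notebook does, that `totalTestResults = positive + negative`:

```python
import numpy as np
import pandas as pd

# Made-up rows with the same column names as USA_state_stats.
toy_states = pd.DataFrame({
    'state': ['AA', 'BB', 'CC'],
    'positive': [10, 20, 30],
    'negative': [90.0, np.nan, 270.0],
    'totalTestResults': [100, 200, 300],
})

# Derive negatives from the other two columns, but use the derived value
# only where the reported one is missing.
derived = toy_states['totalTestResults'] - toy_states['positive']
toy_states['negative'] = toy_states['negative'].fillna(derived).astype(int)
print(toy_states)
```

For this dataset both approaches should agree wherever the assumption holds; the difference only matters if a state reports a `negative` figure that does not add up to its total.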

In [25]:
USA_state_stats.head()

Unnamed: 0,state,positive,negative,death,totalTestResults
0,AK,392,32889,10,33281
1,AL,11523,141971,485,153494
2,AR,4578,77066,98,81644
3,AZ,13631,133157,679,146788
4,CA,76793,1102333,3204,1179126


In [26]:
USA_state_stats['positive/tests %'] = (USA_state_stats['positive']/USA_state_stats['totalTestResults'])*100
USA_state_stats['death/positive %'] = (USA_state_stats['death']/USA_state_stats['positive'])*100
USA_state_stats.head()

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
0,AK,392,32889,10,33281,1.177849,2.55102
1,AL,11523,141971,485,153494,7.507134,4.208973
2,AR,4578,77066,98,81644,5.607271,2.140673
3,AZ,13631,133157,679,146788,9.286181,4.981293
4,CA,76793,1102333,3204,1179126,6.512705,4.172255
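The two derived columns are plain ratios expressed as percentages. As a quick check, this small sketch recomputes them by hand for the Alaska (AK) row shown above; the variable names are only illustrative:

```python
# Figures for AK taken from the output above.
positive, total_tests, deaths = 392, 33281, 10

# Share of tests that came back positive, in percent.
positive_per_test_pct = positive / total_tests * 100

# Share of positive cases that ended in death, in percent.
death_per_positive_pct = deaths / positive * 100

print(positive_per_test_pct, death_per_positive_pct)
```

The results should match the `positive/tests %` and `death/positive %` values in the first row of the table.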


In [27]:
USA_state_stats['state'].nunique()

56

In [28]:
USA_state_stats['state'].unique()

array(['AK', 'AL', 'AR', 'AZ', 'CA', 'CO', 'CT', 'DC', 'DE', 'FL', 'GA',
       'HI', 'IA', 'ID', 'IL', 'IN', 'KS', 'KY', 'LA', 'MA', 'MD', 'ME',
       'MI', 'MN', 'MO', 'MS', 'MT', 'NC', 'ND', 'NE', 'NH', 'NJ', 'NM',
       'NV', 'NY', 'OH', 'OK', 'OR', 'PA', 'RI', 'SC', 'SD', 'TN', 'TX',
       'UT', 'VA', 'VT', 'WA', 'WI', 'WV', 'WY', 'PR', 'AS', 'GU', 'MP',
       'VI'], dtype=object)

In [29]:
# Rows 52 and 54 are 'AS' and 'MP' in the array above; remove them from the analysis
USA_state_stats.drop([52,54],axis=0,inplace=True)
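Dropping by position works for the snapshot shown here, but since the source is refreshed daily the row order is not guaranteed to stay the same. A minimal sketch of a label-based alternative, on a made-up frame (`toy_states` and `codes_to_drop` are hypothetical names), could look like this:

```python
import pandas as pd

# Made-up subset that reuses the index positions from the array above.
toy_states = pd.DataFrame(
    {'state': ['WY', 'PR', 'AS', 'GU', 'MP', 'VI'],
     'positive': [1, 2, 3, 4, 5, 6]},
    index=[50, 51, 52, 53, 54, 55],
)

# Filter on the state code itself instead of the row number.
codes_to_drop = ['AS', 'MP']
filtered = toy_states[~toy_states['state'].isin(codes_to_drop)]
print(filtered)
```

Filtering on the `state` value keeps the intent readable and does not depend on where a row happens to sit in the file.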

<a id='section4'></a>
## 4. Questions

<a id='section401'></a>
### 4.1 How are COVID-19 cases distributed worldwide?

To answer this question, we will use a choropleth map to visualize the cases across the globe and gain a holistic view of the spread of the virus.

In [30]:
data = dict(
        type = 'choropleth',
        colorscale = 'agsunset',
        reversescale = True,
        locations = df_group_country.index,
        locationmode = "country names",
        z = df_group_country['Total Cases'],
        text = df_group_country.index,
        colorbar = {'title' : 'COVID-19 cases by Country'},
      ) 

layout = dict(title = 'COVID-19 cases by Country',
                geo = dict(showframe = False,projection = {'type':'orthographic'})
             )

In [31]:
choromap = go.Figure(data = [data],layout = layout)
iplot(choromap,validate=False)

* From the interactive map above, we get an overall understanding of the spread of the virus across the globe. The US is clearly the worst affected country at the time of writing this notebook.

In [32]:
df_group_country_top10 = df_group_country_top10.iloc[:,2:]
df_group_country_top10

Unnamed: 0_level_0,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,1/28/20,1/29/20,1/30/20,1/31/20,...,5/8/20,5/9/20,5/10/20,5/11/20,5/12/20,5/13/20,5/14/20,5/15/20,5/16/20,Total Cases
Country/Region,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
US,1,1,2,2,5,5,5,5,5,7,...,1283929,1309550,1329260,1347881,1369376,1390406,1417774,1442824,1467820,1467820
Russia,0,0,0,0,0,0,0,0,0,2,...,187859,198676,209688,221344,232243,242271,252245,262843,272043,272043
United Kingdom,0,0,0,0,0,0,0,0,0,2,...,212629,216525,220449,224332,227741,230985,234440,238004,241461,241461
Brazil,0,0,0,0,0,0,0,0,0,0,...,146894,156061,162699,169594,178214,190137,203165,220291,233511,233511
Spain,0,0,0,0,0,0,0,0,0,0,...,222857,223578,224350,227436,228030,228691,229540,230183,230698,230698
Italy,0,0,0,0,0,0,0,0,0,2,...,217185,218268,219070,219814,221216,222104,223096,223885,224760,224760
France,0,0,2,3,3,3,4,5,5,5,...,176202,176782,177094,177547,178349,178184,178994,179630,179630,179630
Germany,0,0,0,0,0,1,4,4,4,5,...,170588,171324,171879,172576,173171,174098,174478,175233,175752,175752
Turkey,0,0,0,0,0,0,0,0,0,0,...,135569,137115,138657,139771,141475,143114,144749,146457,148067,148067
Iran,0,0,0,0,0,0,0,0,0,0,...,104691,106220,107603,109286,110767,112725,114533,116635,118392,118392


* We have created a new dataframe with the top 10 most infected countries. We can now use this dataframe to plot the trajectory of cases daily for these 10 countries.

* We will transpose the dataframe so that the date columns become index values, which lets us plot the daily number of cases for the 10 most infected countries.
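As an illustration of what the transpose does, here is a small sketch on a two-country excerpt of the table above. Converting the index to real dates is an optional extra step that the notebook itself does not perform; the names `wide` and `long_by_date` are only illustrative.

```python
import pandas as pd

# Two rows and three date columns taken from the table above.
wide = pd.DataFrame(
    {'1/22/20': [1, 0], '1/23/20': [1, 0], '1/24/20': [2, 2]},
    index=pd.Index(['US', 'France'], name='Country/Region'),
)

# Rows become columns: one row per date, one column per country.
long_by_date = wide.transpose()

# Optional: parse the month/day/year labels into timestamps.
long_by_date.index = pd.to_datetime(long_by_date.index, format='%m/%d/%y')
print(long_by_date)
```

Note that the notebook's dataframe also carries a `Total Cases` column, which would have to be dropped before parsing the labels as dates.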

<a id='section402'></a>
### 4.2 How have cases increased with time in the most infected countries?

#### Visualizing daily number of cases among top 10 most infected countries:

In [33]:
df_group_country_top10_trans = df_group_country_top10.transpose()
df_group_country_top10_trans.iplot(width=2.5,size=20)

From the graph above, we can see that the trajectory of the US stands out from the other countries. Its curve is much steeper, and the number of cases increases at a much higher rate than in the other countries.

In [34]:
df_group_country_top10_trans

Country/Region,US,Russia,United Kingdom,Brazil,Spain,Italy,France,Germany,Turkey,Iran
1/22/20,1,0,0,0,0,0,0,0,0,0
1/23/20,1,0,0,0,0,0,0,0,0,0
1/24/20,2,0,0,0,0,0,2,0,0,0
1/25/20,2,0,0,0,0,0,3,0,0,0
1/26/20,5,0,0,0,0,0,3,0,0,0
...,...,...,...,...,...,...,...,...,...,...
5/13/20,1390406,242271,230985,190137,228691,222104,178184,174098,143114,112725
5/14/20,1417774,252245,234440,203165,229540,223096,178994,174478,144749,114533
5/15/20,1442824,262843,238004,220291,230183,223885,179630,175233,146457,116635
5/16/20,1467820,272043,241461,233511,230698,224760,179630,175752,148067,118392


<a id='section403'></a>
### 4.3 At what rate have cases increased daily in the most infected countries?

#### Visualizing daily new cases among top 10 most infected countries:

Next we will create new columns for each of our 10 countries to show the number of __new cases__ each day for each one of them. To do this, we will create a new dataframe named __df_case_increment__.

In [35]:
df_case_increment = df_group_country_top10_trans.copy()  # copy so the transposed frame is not modified
df_case_increment

Country/Region,US,Russia,United Kingdom,Brazil,Spain,Italy,France,Germany,Turkey,Iran
1/22/20,1,0,0,0,0,0,0,0,0,0
1/23/20,1,0,0,0,0,0,0,0,0,0
1/24/20,2,0,0,0,0,0,2,0,0,0
1/25/20,2,0,0,0,0,0,3,0,0,0
1/26/20,5,0,0,0,0,0,3,0,0,0
...,...,...,...,...,...,...,...,...,...,...
5/13/20,1390406,242271,230985,190137,228691,222104,178184,174098,143114,112725
5/14/20,1417774,252245,234440,203165,229540,223096,178994,174478,144749,114533
5/15/20,1442824,262843,238004,220291,230183,223885,179630,175233,146457,116635
5/16/20,1467820,272043,241461,233511,230698,224760,179630,175752,148067,118392


In [36]:
df_case_increment.columns

Index(['US', 'Russia', 'United Kingdom', 'Brazil', 'Spain', 'Italy', 'France',
       'Germany', 'Turkey', 'Iran'],
      dtype='object', name='Country/Region')

In [37]:
newCaseList = []
oldColumns = []
for x in df_case_increment.columns:
    oldColumns.append(x)
    newcases = x + '_newcases'
    df_case_increment[newcases] = df_case_increment[x]
    newCaseList.append(newcases)
df_case_increment.iloc[:-1,10:] = df_case_increment.iloc[:-1,10:].shift(periods = 1,fill_value=0)

for y in range(len(newCaseList)):
    df_case_increment[newCaseList[y]] = df_case_increment[oldColumns[y]] - df_case_increment[newCaseList[y]]
df_case_increment

Country/Region,US,Russia,United Kingdom,Brazil,Spain,Italy,France,Germany,Turkey,Iran,US_newcases,Russia_newcases,United Kingdom_newcases,Brazil_newcases,Spain_newcases,Italy_newcases,France_newcases,Germany_newcases,Turkey_newcases,Iran_newcases
1/22/20,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0
1/23/20,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1/24/20,2,0,0,0,0,0,2,0,0,0,1,0,0,0,0,0,2,0,0,0
1/25/20,2,0,0,0,0,0,3,0,0,0,0,0,0,0,0,0,1,0,0,0
1/26/20,5,0,0,0,0,0,3,0,0,0,3,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5/13/20,1390406,242271,230985,190137,228691,222104,178184,174098,143114,112725,21030,10028,3244,11923,661,888,-165,927,1639,1958
5/14/20,1417774,252245,234440,203165,229540,223096,178994,174478,144749,114533,27368,9974,3455,13028,849,992,810,380,1635,1808
5/15/20,1442824,262843,238004,220291,230183,223885,179630,175233,146457,116635,25050,10598,3564,17126,643,789,636,755,1708,2102
5/16/20,1467820,272043,241461,233511,230698,224760,179630,175752,148067,118392,24996,9200,3457,13220,515,875,0,519,1610,1757


The above code creates a new column for each of our 10 countries and subtracts the previous day's cumulative count from the current day's count, which gives the number of new cases per day for each country.
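The same day-over-day difference can be sketched more compactly with pandas' built-in `DataFrame.diff`. The example below uses the first five days of the US and France columns from the table above; it is an alternative formulation, not the code the notebook runs.

```python
import pandas as pd

# First five days of cumulative counts for two of the countries above.
cumulative = pd.DataFrame(
    {'US': [1, 1, 2, 2, 5], 'France': [0, 0, 2, 3, 3]},
    index=['1/22/20', '1/23/20', '1/24/20', '1/25/20', '1/26/20'],
)

# diff() subtracts the previous row; the first row has no predecessor,
# so it is filled with the cumulative value itself (previous count = 0).
new_cases = (
    cumulative.diff()
              .fillna(cumulative)
              .astype(int)
              .add_suffix('_newcases')
)
print(new_cases)
```

On these rows the result should match the `US_newcases` and `France_newcases` columns in the output above.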

In [38]:
df_case_increment_top10 = df_case_increment.iloc[1:-1,10:] 

In [39]:
df_case_increment_top10

Country/Region,US_newcases,Russia_newcases,United Kingdom_newcases,Brazil_newcases,Spain_newcases,Italy_newcases,France_newcases,Germany_newcases,Turkey_newcases,Iran_newcases
1/23/20,0,0,0,0,0,0,0,0,0,0
1/24/20,1,0,0,0,0,0,2,0,0,0
1/25/20,0,0,0,0,0,0,1,0,0,0
1/26/20,3,0,0,0,0,0,0,0,0,0
1/27/20,0,0,0,0,0,0,0,1,0,0
...,...,...,...,...,...,...,...,...,...,...
5/12/20,21495,10899,3409,8620,594,1402,802,595,1704,1481
5/13/20,21030,10028,3244,11923,661,888,-165,927,1639,1958
5/14/20,27368,9974,3455,13028,849,992,810,380,1635,1808
5/15/20,25050,10598,3564,17126,643,789,636,755,1708,2102


Lastly, we trim the dataframe to keep only the daily new-case columns so that we can plot them easily.

In [40]:
df_case_increment_top10.iplot(size=20,width=1.5)

All below observations are made at the time of writing this notebook:

- The number of new cases shows an overall downward trend in the USA, UK, Italy, France, Germany and Turkey.

- In Spain, the number of new cases seems to be constant, which means the situation is neither worsening nor improving.

- In Russia, Brazil and Iran, the number of new cases seems to be increasing daily, which shows that the virus is still spreading substantially in these countries.

- Again, the new-case line of the US stands out from the others because its daily number of new cases is much higher than that of any of the other 9 countries. As this graph is interactive, we can deselect the countries we do not want to view from the legend or hover over the lines to see the number of new cases at that point.

### Analyzing spread of COVID-19 cases in the US

<a id='section404'></a>
### 4.4 Which states in the US have the most and least positive cases?

Firstly, we will look at the United States of America as a whole and see how the cases are distributed across the country.

We will use a choropleth map to get a geographic plot of the country along with a color-coded legend based on the number of positive cases in each state.

In [41]:
fig = go.Figure(data=go.Choropleth(
    locations=USA_state_stats['state'], # two-letter state codes
    z = USA_state_stats['positive'], # values that drive the color scale
    locationmode = 'USA-states', # interpret `locations` as US state abbreviations
    colorscale = 'Reds',
    colorbar_title = "COVID-19 cases in USA by state",
))

fig.update_layout(
    width = 800,
    height = 800,
    title_text = 'COVID-19 cases in USA by state',
    geo_scope='usa', # limit map scope to the USA
)

fig.show()

Now that we have an overall picture of how positive cases are spread across the USA, including a rough idea of the hotspots and coldspots, we can dig a little deeper into the data to see exactly which states have the highest and lowest number of confirmed cases.
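
The cells below use `sort_values(...).head(10)` and `.tail(10)`. An equivalent shorthand, shown here only as an alternative sketch, is `DataFrame.nlargest` and `DataFrame.nsmallest`. The sample rows are copied from the state tables in this section; the three-row cut-off is chosen just to keep the example short.

```python
import pandas as pd

# A few rows copied from the state tables in this section.
states = pd.DataFrame({
    "state": ["NY", "NJ", "IL", "AK", "GU", "VI"],
    "positive": [348232, 145089, 92457, 392, 154, 69],
})

# nlargest returns rows in descending order of the chosen column.
top3 = states.nlargest(3, "positive")
# nsmallest returns rows in ascending order of the chosen column.
bottom3 = states.nsmallest(3, "positive")

print(top3["state"].tolist())
print(bottom3["state"].tolist())
```

Note that `nsmallest` lists the smallest value first, whereas `sort_values(ascending=False).tail(10)` lists it last.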

#### States with the highest number of confirmed cases

In [42]:
USA_state_stats.sort_values('positive',ascending=False).head(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
34,NY,348232,1030485,22478,1378717,25.257685,6.454892
31,NJ,145089,330135,10249,475224,30.530655,7.06394
14,IL,92457,469192,4129,561649,16.461705,4.46586
19,MA,84933,363156,5705,448089,18.954493,6.717059
4,CA,76793,1102333,3204,1179126,6.512705,4.172255
38,PA,61611,266225,4403,327836,18.793238,7.146451
22,MI,50504,307417,4880,357921,14.110376,9.662601
43,TX,46999,631472,1305,678471,6.927194,2.776655
9,FL,44811,585236,2040,630047,7.112327,4.552454
20,MD,37968,152207,1957,190175,19.964769,5.15434


#### States with the lowest number of confirmed cases

In [43]:
USA_state_stats.sort_values('positive',ascending=False).tail(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
28,ND,1848,51639,42,53487,3.455045,2.272727
21,ME,1648,22092,70,23740,6.94187,4.247573
49,WV,1457,71936,64,73393,1.985203,4.392588
46,VT,934,21342,53,22276,4.192853,5.674518
50,WY,716,15678,7,16394,4.367452,0.977654
11,HI,638,40245,17,40883,1.560551,2.664577
26,MT,468,25623,16,26091,1.793722,3.418803
0,AK,392,32889,10,33281,1.177849,2.55102
53,GU,154,4365,5,4519,3.407834,3.246753
55,VI,69,1194,6,1263,5.463183,8.695652


<a id='section405'></a>
### 4.5 Which states in the US have performed the highest and lowest number of tests?

As with the number of positive cases, we will first look at how the number of tests is distributed across all states using a choropleth map.

In [44]:
fig = go.Figure(data=go.Choropleth(
    locations=USA_state_stats['state'], # two-letter state codes
    z = USA_state_stats['totalTestResults'], # values that drive the color scale
    locationmode = 'USA-states', # interpret `locations` as US state abbreviations
    colorscale = 'Reds',
    colorbar_title = "COVID-19 tests in USA by state",
))

fig.update_layout(
    width = 800,
    height = 800,
    title_text = 'COVID-19 tests in USA by state',
    geo_scope='usa', # limit map scope to the USA
)

fig.show()

With this understanding, let us move on to analyzing which states have the highest and lowest testing numbers.

#### States with the highest number of tests done

In [45]:
USA_state_stats.sort_values('totalTestResults',ascending=False).head(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
34,NY,348232,1030485,22478,1378717,25.257685,6.454892
4,CA,76793,1102333,3204,1179126,6.512705,4.172255
43,TX,46999,631472,1305,678471,6.927194,2.776655
9,FL,44811,585236,2040,630047,7.112327,4.552454
14,IL,92457,469192,4129,561649,16.461705,4.46586
31,NJ,145089,330135,10249,475224,30.530655,7.06394
19,MA,84933,363156,5705,448089,18.954493,6.717059
22,MI,50504,307417,4880,357921,14.110376,9.662601
38,PA,61611,266225,4403,327836,18.793238,7.146451
10,GA,37147,283922,1592,321069,11.569787,4.285676


#### States with the lowest number of tests done

In [46]:
USA_state_stats.sort_values('totalTestResults',ascending=False).tail(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
7,DC,7042,28490,375,35532,19.818755,5.325192
0,AK,392,32889,10,33281,1.177849,2.55102
41,SD,3959,24217,44,28176,14.050965,1.111392
26,MT,468,25623,16,26091,1.793722,3.418803
21,ME,1648,22092,70,23740,6.94187,4.247573
46,VT,934,21342,53,22276,4.192853,5.674518
50,WY,716,15678,7,16394,4.367452,0.977654
53,GU,154,4365,5,4519,3.407834,3.246753
51,PR,2589,0,122,2589,100.0,4.712244
55,VI,69,1194,6,1263,5.463183,8.695652


<a id='section406'></a>
### 4.6 Which states in the US show the highest rate of positive cases with respect to tests conducted?

To understand this, we have created the column __positive/tests %__. In essence, this column shows how many out of every 100 tests come back positive, which gives a good idea of how widespread the infection is in a state.
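
For reference, the column can be reproduced as follows. This is a minimal sketch under one assumption: that `totalTestResults` is the sum of `positive` and `negative`, which holds for the rows shown in this notebook. The three sample rows are copied from the state tables.

```python
import pandas as pd

# Counts copied from the state tables in this notebook.
sample = pd.DataFrame({
    "state": ["NY", "AK", "PR"],
    "positive": [348232, 392, 2589],
    "negative": [1030485, 32889, 0],
})

# Assumption: total tests = positive results + negative results.
sample["totalTestResults"] = sample["positive"] + sample["negative"]

# Share of tests that came back positive, expressed per 100 tests.
sample["positive/tests %"] = sample["positive"] / sample["totalTestResults"] * 100
print(sample.round(2))
```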

#### States with the highest percentage of positive cases with respect to total tests

In [47]:
USA_state_stats.sort_values('positive/tests %',ascending=False).head(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
51,PR,2589,0,122,2589,100.0,4.712244
31,NJ,145089,330135,10249,475224,30.530655,7.06394
34,NY,348232,1030485,22478,1378717,25.257685,6.454892
6,CT,36703,128052,3339,164755,22.277321,9.097349
20,MD,37968,152207,1957,190175,19.964769,5.15434
7,DC,7042,28490,375,35532,19.818755,5.325192
8,DE,7547,32211,286,39758,18.982343,3.789585
19,MA,84933,363156,5705,448089,18.954493,6.717059
38,PA,61611,266225,4403,327836,18.793238,7.146451
5,CO,21232,100608,1150,121840,17.426133,5.416353


In [48]:
top10_pos_tests = USA_state_stats.sort_values('positive/tests %',ascending=False).head(10)
top10_pos_tests.iplot(kind='bar',x='state',y='positive/tests %',color='Blue',fill=True)

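From the above graph we can see that __PR__ shows a 100% positivity rate according to the data. However, the table lists 0 negative results for PR, so its total tests equal its positive cases, and the figure most likely reflects missing negative counts rather than a true rate. It is followed by __NJ__ and __NY__ in terms of positive cases with respect to tests done.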
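
Rows like PR, where the table lists 0 negative results, can be left out of the ranking. The sketch below is not part of the original analysis and rests on one assumption: that a zero negative count means the figure was not reported. The sample rows are copied from the table above.

```python
import pandas as pd

# Top rows of the positivity ranking, copied from the table above.
sample = pd.DataFrame({
    "state": ["PR", "NJ", "NY", "CT"],
    "positive": [2589, 145089, 348232, 36703],
    "negative": [0, 330135, 1030485, 128052],
    "positive/tests %": [100.0, 30.530655, 25.257685, 22.277321],
})

# Assumption: negative == 0 means "not reported", so the rate is not meaningful.
reported = sample[sample["negative"] > 0]

ranked = reported.sort_values("positive/tests %", ascending=False)
print(ranked["state"].tolist())
```

With that filter applied, NJ leads the ranking in this sample.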

#### States with the lowest percentage of positive cases with respect to total tests

In [49]:
USA_state_stats.sort_values('positive/tests %',ascending=False).tail(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
36,OK,5237,118162,288,123399,4.243957,5.499332
44,UT,7068,159706,78,166774,4.238071,1.103565
46,VT,934,21342,53,22276,4.192853,5.674518
37,OR,3612,88587,137,92199,3.917613,3.792913
28,ND,1848,51639,42,53487,3.455045,2.272727
53,GU,154,4365,5,4519,3.407834,3.246753
49,WV,1457,71936,64,73393,1.985203,4.392588
26,MT,468,25623,16,26091,1.793722,3.418803
11,HI,638,40245,17,40883,1.560551,2.664577
0,AK,392,32889,10,33281,1.177849,2.55102


In [50]:
bottom10_pos_tests = USA_state_stats.sort_values('positive/tests %',ascending=False).tail(10)
bottom10_pos_tests.iplot(kind='bar',x='state',y='positive/tests %',color='Blue',fill=True)

As for the states with the lowest share of positive cases with respect to total tests done, __AK__ has the lowest rate at slightly above 1%, followed by __HI__ with about 1.6% and __MT__ with about 1.8%.

<a id='section407'></a>
### 4.7 Which states in the US show the highest death rate with respect to positive cases?

To answer this question, we have created the column __death/positive %__. This shows us how many deaths occur for every 100 positive cases, which gives us an insight into how likely it is that contracting the virus leads to death.
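
Written out, the column is the ratio below. The labels are introduced here only for illustration, and the worked example uses MI's counts (4880 deaths, 50504 positive cases) from the tables in this section:

$$
\text{death rate (in percent)} = \frac{\text{death}}{\text{positive}} \times 100,
\qquad
\text{MI: } \frac{4880}{50504} \times 100 \approx 9.66
$$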

In [51]:
fig = go.Figure(data=go.Choropleth(
    locations=USA_state_stats['state'], # two-letter state codes
    z = USA_state_stats['death/positive %'], # values that drive the color scale
    locationmode = 'USA-states', # interpret `locations` as US state abbreviations
    colorscale = 'YlOrRd', # yellow-orange-red scale
    colorbar_title = "Deaths wrt positive cases",
))

fig.update_layout(
    width = 800,
    height = 800,
    title_text = 'Deaths with respect to positive cases',
    geo_scope='usa', # limit map scope to the USA
)

fig.show()

#### States with the highest percentage of deaths with respect to positive cases

In [52]:
USA_state_stats.sort_values('death/positive %',ascending=False).head(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
22,MI,50504,307417,4880,357921,14.110376,9.662601
6,CT,36703,128052,3339,164755,22.277321,9.097349
55,VI,69,1194,6,1263,5.463183,8.695652
18,LA,34117,225625,2479,259742,13.134957,7.266172
38,PA,61611,266225,4403,327836,18.793238,7.146451
31,NJ,145089,330135,10249,475224,30.530655,7.06394
19,MA,84933,363156,5705,448089,18.954493,6.717059
34,NY,348232,1030485,22478,1378717,25.257685,6.454892
15,IN,27280,144078,1741,171358,15.919887,6.381965
35,OH,27474,219228,1610,246702,11.136513,5.860086


From the dataframe above we can see that __MI__ has the highest death rate at just above 9.6%, followed by __CT__ with about 9.1%.
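
Because this column is a ratio, entries with very few positive cases can rank high on the strength of a handful of deaths: VI appears third in the table above with 6 deaths among 69 cases. One way to make the ranking less sensitive to this is to require a minimum number of positive cases first. The sketch below is not part of the original analysis, and the threshold of 1,000 is a hypothetical value chosen only for illustration.

```python
import pandas as pd

# Rows copied from the table above.
sample = pd.DataFrame({
    "state": ["MI", "CT", "VI", "LA"],
    "positive": [50504, 36703, 69, 34117],
    "death": [4880, 3339, 6, 2479],
})
sample["death/positive %"] = sample["death"] / sample["positive"] * 100

# Hypothetical cut-off, not used in the original analysis.
MIN_POSITIVE = 1000

# Keep only entries with enough cases for the ratio to be reasonably stable.
robust = sample[sample["positive"] >= MIN_POSITIVE]
ranked = robust.sort_values("death/positive %", ascending=False)
print(ranked[["state", "death/positive %"]].round(2))
```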

#### States with the lowest percentage of deaths with respect to positive cases

In [53]:
USA_state_stats.sort_values('death/positive %',ascending=False).tail(10)

Unnamed: 0,state,positive,negative,death,totalTestResults,positive/tests %,death/positive %
0,AK,392,32889,10,33281,1.177849,2.55102
12,IA,14328,81972,346,96300,14.878505,2.414852
28,ND,1848,51639,42,53487,3.455045,2.272727
16,KS,7886,53706,172,61592,12.803611,2.18108
2,AR,4578,77066,98,81644,5.607271,2.140673
42,TN,17288,302913,295,320201,5.399109,1.706386
29,NE,9772,50206,119,59978,16.292641,1.217765
41,SD,3959,24217,44,28176,14.050965,1.111392
44,UT,7068,159706,78,166774,4.238071,1.103565
50,WY,716,15678,7,16394,4.367452,0.977654


The state with the lowest death rate is __WY__, at just under 1%, followed by __UT__ and __SD__ at just above 1.1%.

<a id='section5'></a>
## 5. Conclusion 

- In this notebook, we used various numerical and visualization libraries to perform an Exploratory Data Analysis of COVID-19 data.
- We were able to successfully process the datasets by getting rid of irrelevant data and creating new columns where necessary.
- We made use of packages like __pandas and plotly__ to develop better insights about the data using visualization. <br/>
- We have also seen how __preprocessing__ helps in dealing with __missing__ and __erroneous__ values and irregularities present in the data. We also _created new features_, which in turn helped us to better understand the data.
- We used plotly to visualize geographical data and better understand how the cases and tests are distributed.
- These steps helped us develop a deeper understanding of the spread of COVID-19 across the globe and in the US. We were able to understand the current situation and estimate how severely each country was hit by the virus.<br/><br/>