# Task: Covid-19 Data Analysis
### This notebook practices data analysis techniques with the pandas library on a COVID-19 daily report.

### Data Source: 
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports

### File naming convention

MM-DD-YYYY.csv in UTC.
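
As a small illustration of this naming convention, the sketch below builds a report filename from a date. The helper name `daily_report_filename` is hypothetical and not part of the dataset's tooling.

```python
from datetime import date


def daily_report_filename(day: date) -> str:
    """Return the MM-DD-YYYY.csv name for a daily report (hypothetical helper)."""
    return day.strftime("%m-%d-%Y") + ".csv"


# The report used in this notebook is the one for 30 May 2020.
print(daily_report_filename(date(2020, 5, 30)))
```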

### Field description

- Province_State: China - province name; US/Canada/Australia - city name, state/province name; Others - name of the event (e.g., "Diamond Princess" cruise ship); other countries - blank.

- Country_Region: country/region name conforming to WHO (will be updated).

- Last_Update: MM/DD/YYYY HH:mm (24 hour format, in UTC).

- Confirmed: the number of confirmed cases. For Hubei Province: from Feb 13 (GMT +8), we report both clinically diagnosed and lab-confirmed cases. For lab-confirmed cases only (before Feb 17), please refer to who_covid_19_situation_reports. For Italy, the diagnosis standard may have changed since Feb 27 to "slow the growth of new case numbers."

- Deaths: the number of deaths.

- Recovered: the number of recovered cases.
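
The field list above can be exercised on a tiny in-memory sample before loading a full report. This is only a sketch: the two rows use values from the 05-30-2020 file, the columns are trimmed for brevity, and the timestamps in that file appear as `YYYY-MM-DD HH:MM:SS` rather than the `MM/DD/YYYY HH:mm` layout described above. Localizing to UTC follows the field description.

```python
import io

import pandas as pd

# Sample rows in the daily-report layout (subset of columns, for illustration only).
sample_csv = """Province_State,Country_Region,Last_Update,Confirmed,Deaths,Recovered
South Carolina,US,2020-05-31 02:32:45,39,0,0
Louisiana,US,2020-05-31 02:32:45,412,23,0
"""

# Parse Last_Update as a datetime while reading, then mark it as UTC.
sample = pd.read_csv(io.StringIO(sample_csv), parse_dates=["Last_Update"])
sample["Last_Update"] = sample["Last_Update"].dt.tz_localize("UTC")

print(sample.dtypes)
```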

### Question 1

#### Read the dataset

In [62]:
import pandas as pd                                                                                           
import matplotlib.pyplot as plt                                                                               
import plotly.express as px                                                                                   
import plotly.graph_objects as go                                                                             
import numpy as np

df = pd.read_csv('05-30-2020.csv')

#### Display the top 5 rows in the data

In [63]:
df.head()

| | FIPS | Admin2 | Province_State | Country_Region | Last_Update | Lat | Long_ | Confirmed | Deaths | Recovered | Active | Combined_Key | Incidence_Rate | Case-Fatality_Ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 45001.0 | Abbeville | South Carolina | US | 2020-05-31 02:32:45 | 34.223334 | -82.461707 | 39 | 0 | 0 | 39 | Abbeville, South Carolina, US | 159.00844 | 0.0 |
| 1 | 22001.0 | Acadia | Louisiana | US | 2020-05-31 02:32:45 | 30.295065 | -92.414197 | 412 | 23 | 0 | 389 | Acadia, Louisiana, US | 664.034169 | 5.582524 |
| 2 | 51001.0 | Accomack | Virginia | US | 2020-05-31 02:32:45 | 37.767072 | -75.632346 | 863 | 12 | 0 | 851 | Accomack, Virginia, US | 2670.503775 | 1.390498 |
| 3 | 16001.0 | Ada | Idaho | US | 2020-05-31 02:32:45 | 43.452658 | -116.241552 | 805 | 22 | 0 | 783 | Ada, Idaho, US | 167.155675 | 2.732919 |
| 4 | 19001.0 | Adair | Iowa | US | 2020-05-31 02:32:45 | 41.330756 | -94.471059 | 9 | 0 | 0 | 9 | Adair, Iowa, US | 125.838926 | 0.0 |


#### Show the information of the dataset

In [64]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3566 entries, 0 to 3565
Data columns (total 14 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   FIPS                 3024 non-null   float64
 1   Admin2               3024 non-null   object 
 2   Province_State       3384 non-null   object 
 3   Country_Region       3566 non-null   object 
 4   Last_Update          3566 non-null   object 
 5   Lat                  3491 non-null   float64
 6   Long_                3491 non-null   float64
 7   Confirmed            3566 non-null   int64  
 8   Deaths               3566 non-null   int64  
 9   Recovered            3566 non-null   int64  
 10  Active               3566 non-null   int64  
 11  Combined_Key         3566 non-null   object 
 12  Incidence_Rate       3491 non-null   float64
 13  Case-Fatality_Ratio  3498 non-null   float64
dtypes: float64(5), int64(4), object(5)
memory usage: 390.2+ KB


#### Show the sum of missing values of features in the dataset

In [65]:
df.isnull().sum()

FIPS                   542
Admin2                 542
Province_State         182
Country_Region           0
Last_Update              0
Lat                     75
Long_                   75
Confirmed                0
Deaths                   0
Recovered                0
Active                   0
Combined_Key             0
Incidence_Rate          75
Case-Fatality_Ratio     68
dtype: int64
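
Raw counts are easier to judge as a share of all rows; for example, 542 missing `FIPS` values out of 3566 rows is roughly 15.2%. The sketch below shows one way to compute that share on a toy frame. The toy values are made up for illustration; on the real data the same expression would be applied to `df`.

```python
import numpy as np
import pandas as pd

# Toy frame with one half-empty column and one complete column (illustrative values).
toy = pd.DataFrame({
    "FIPS": [45001.0, np.nan, 51001.0, np.nan],
    "Country_Region": ["US", "Brazil", "US", "Spain"],
})

# isnull() gives booleans, so the column mean is the fraction of missing values.
missing_pct = toy.isnull().mean().mul(100).round(1)
print(missing_pct)
```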

### Question 2

#### Show the number of Confirmed cases by Country

In [66]:
# Confirmed cases by Country                                                                                  
confirmed_by_country = df.groupby('Country_Region')['Confirmed'].sum().sort_values(ascending=False)           
print("Confirmed cases by Country:")                                                                          
print(confirmed_by_country)                                                                                         

Confirmed cases by Country:
Country_Region
US                      1765287
Brazil                   502914
Russia                   396575
United Kingdom           274215
Spain                    239228
                         ...   
Kiribati                      0
Winter Olympics 2022          0
Antarctica                    0
Tonga                         0
Samoa                         0
Name: Confirmed, Length: 197, dtype: int64


#### Show the number of Deaths by Country

In [67]:
# Deaths by Country                                                                                           
deaths_by_country = df.groupby('Country_Region')['Deaths'].sum().sort_values(ascending=False)                 
print("\nDeaths by Country:")                                                                                 
print(deaths_by_country)  


Deaths by Country:
Country_Region
US                                  104034
United Kingdom                       52374
Italy                                33340
Brazil                               28896
France                               28774
                                     ...  
Holy See                                 0
Grenada                                  0
Saint Vincent and the Grenadines         0
Eritrea                                  0
Laos                                     0
Name: Deaths, Length: 197, dtype: int64


#### Show the number of Recovered cases by Country

In [68]:
# Recovered cases by Country                                                                                  
recovered_by_country = df.groupby('Country_Region')['Recovered'].sum().sort_values(ascending=False)           
print("\nRecovered cases by Country:")                                                                        
print(recovered_by_country) 


Recovered cases by Country:
Country_Region
US              416461
Brazil          200892
Russia          167469
Germany         164908
Italy           155633
                 ...  
MS Zaandam           0
Korea, North         0
Kiribati             0
Tuvalu               0
Tonga                0
Name: Recovered, Length: 197, dtype: int64


#### Show the number of Active Cases by Country

In [69]:
# Recompute Active cases (this overwrites the Active column shipped in the file)
df['Active'] = df['Confirmed'] - df['Deaths'] - df['Recovered']
active_by_country = df.groupby('Country_Region')['Active'].sum().sort_values(ascending=False)                 
print("\nActive cases by Country:")                                                                           
print(active_by_country) 


Active cases by Country:
Country_Region
US                  1244792
Brazil               273126
Russia               224551
United Kingdom       220654
France                93583
                     ...   
Korea, North              0
Papua New Guinea          0
Palau                     0
Montenegro                0
Dominica                  0
Name: Active, Length: 197, dtype: int64


#### Show the latest number of Confirmed, Deaths, Recovered and Active cases Country-wise

In [70]:
# Latest numbers for all metrics by country                                                                   
latest_numbers = df.groupby('Country_Region').agg({                                                           
    'Confirmed': 'sum',                                                                                       
    'Deaths': 'sum',                                                                                          
    'Recovered': 'sum',                                                                                       
    'Active': 'sum'                                                                                           
}).sort_values('Confirmed', ascending=False)                                                                  
print("\nLatest numbers by Country:")                                                                         
print(latest_numbers) 


Latest numbers by Country:
                      Confirmed  Deaths  Recovered   Active
Country_Region                                             
US                      1765287  104034     416461  1244792
Brazil                   502914   28896     200892   273126
Russia                   396575    4555     167469   224551
United Kingdom           274215   52374       1187   220654
Spain                    239228   27125     150376    61727
...                         ...     ...        ...      ...
Kiribati                      0       0          0        0
Winter Olympics 2022          0       0          0        0
Antarctica                    0       0          0        0
Tonga                         0       0          0        0
Samoa                         0       0          0        0

[197 rows x 4 columns]
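
A country-level case-fatality ratio can be derived from these totals. The county-level `Case-Fatality_Ratio` values in `df.head()` are consistent with Deaths / Confirmed × 100 (for example, 23 / 412 ≈ 5.58), so the sketch below assumes the same definition. The column name `CFR_pct` is hypothetical, and the three rows are copied from the table above.

```python
import pandas as pd

# Three rows copied from the country-level table (Confirmed and Deaths only).
toy = pd.DataFrame(
    {"Confirmed": [1765287, 396575, 0], "Deaths": [104034, 4555, 0]},
    index=pd.Index(["US", "Russia", "Kiribati"], name="Country_Region"),
)

# Mask zero-case rows so the ratio is NaN there instead of a 0/0 division.
confirmed_nonzero = toy["Confirmed"].where(toy["Confirmed"] > 0)
toy["CFR_pct"] = (toy["Deaths"] / confirmed_nonzero * 100).round(2)

print(toy)
```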


### Question 3

#### Show the countries with no recovered cases

In [71]:
# Countries with no recovered cases                                                                           
no_recovered = latest_numbers[latest_numbers['Recovered'] == 0].index.tolist()                                
print("Countries with no recovered cases:")                                                                   
print(no_recovered)

Countries with no recovered cases:
['MS Zaandam', 'Summer Olympics 2020', 'Nauru', 'Palau', 'Tuvalu', 'Korea, North', 'Kiribati', 'Winter Olympics 2022', 'Antarctica', 'Tonga', 'Samoa']


#### Show the countries with no confirmed cases

In [72]:
# Countries with no confirmed cases                                                                           
no_confirmed = latest_numbers[latest_numbers['Confirmed'] == 0].index.tolist()                                
print("\nCountries with no confirmed cases:")                                                                 
print(no_confirmed) 


Countries with no confirmed cases:
['Summer Olympics 2020', 'Nauru', 'Palau', 'Tuvalu', 'Korea, North', 'Kiribati', 'Winter Olympics 2022', 'Antarctica', 'Tonga', 'Samoa']


#### Show the countries with no deaths

In [73]:
# Countries with no deaths                                                                                    
no_deaths = latest_numbers[latest_numbers['Deaths'] == 0].index.tolist()                                      
print("\nCountries with no deaths:")                                                                          
print(no_deaths) 


Countries with no deaths:
['Uganda', 'Vietnam', 'Mongolia', 'Cambodia', 'Eritrea', 'Bhutan', 'Saint Vincent and the Grenadines', 'Timor-Leste', 'Namibia', 'Grenada', 'Laos', 'Fiji', 'Saint Lucia', 'Dominica', 'Saint Kitts and Nevis', 'Holy See', 'Seychelles', 'Papua New Guinea', 'Lesotho', 'Summer Olympics 2020', 'Nauru', 'Palau', 'Tuvalu', 'Korea, North', 'Kiribati', 'Winter Olympics 2022', 'Antarctica', 'Tonga', 'Samoa']
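
The lists above include entries that are not countries, such as 'MS Zaandam' and the two Olympics placeholders. One possible way to exclude them is sketched below. The set `NON_COUNTRY_ENTRIES` is an assumption made for this example, not something defined by the dataset, and the toy counts are illustrative.

```python
import pandas as pd

# Assumed list of non-country labels in Country_Region; adjust as needed.
NON_COUNTRY_ENTRIES = {
    "MS Zaandam",
    "Diamond Princess",
    "Summer Olympics 2020",
    "Winter Olympics 2022",
}

# Toy stand-in for latest_numbers (illustrative counts).
toy = pd.DataFrame(
    {"Confirmed": [0, 0, 9, 0]},
    index=pd.Index(
        ["Tonga", "Summer Olympics 2020", "MS Zaandam", "Samoa"],
        name="Country_Region",
    ),
)

# Drop the non-country rows first, then apply the same zero-case filter as before.
countries_only = toy[~toy.index.isin(NON_COUNTRY_ENTRIES)]
no_confirmed_countries = countries_only[countries_only["Confirmed"] == 0].index.tolist()
print(no_confirmed_countries)
```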


### Question 4

#### Show the Top 10 countries with Confirmed cases

In [74]:
# Top 10 countries with confirmed cases                                                      
top_10_confirmed = latest_numbers.nlargest(10, 'Confirmed')
print("\nTop 10 Countries with Confirmed Cases:")                                            
print(top_10_confirmed) 


Top 10 Countries with Confirmed Cases:
                Confirmed  Deaths  Recovered   Active
Country_Region                                       
US                1765287  104034     416461  1244792
Brazil             502914   28896     200892   273126
Russia             396575    4555     167469   224551
United Kingdom     274215   52374       1187   220654
Spain              239228   27125     150376    61727
Italy              232664   33340     155633    43691
France             190743   28774      68386    93583
Germany            182842    8489     164908     9445
India              181827    5185      86936    89706
Turkey             163103    4515     126984    31604


#### Show the Top 10 Countries with Active cases

In [75]:
# Top 10 countries with active cases
top_10_active = latest_numbers.nlargest(10, 'Active')
print("\nTop 10 Countries with Active Cases:")
print(top_10_active)


Top 10 Countries with Active Cases:
                Confirmed  Deaths  Recovered   Active
Country_Region                                       
US                1765287  104034     416461  1244792
Brazil             502914   28896     200892   273126
Russia             396575    4555     167469   224551
United Kingdom     274215   52374       1187   220654
France             190743   28774      68386    93583
India              181827    5185      86936    89706
Peru               155671   20111      66447    69113
Spain              239228   27125     150376    61727
Chile               94858     997      40431    53430
Italy              232664   33340     155633    43691


### Question 5

#### Plot country-wise total Deaths, Confirmed, Recovered and Active cases where total deaths have exceeded 50,000

In [76]:
# Countries with deaths > 50,000                                                                              
high_death_countries = latest_numbers[latest_numbers['Deaths'] > 50000]                                       
                                                                                                            
fig = go.Figure()                                                                                             
for metric in ['Deaths', 'Confirmed', 'Recovered', 'Active']:                                                 
    fig.add_bar(name=metric,                                                                                  
                x=high_death_countries.index,                                                                 
                y=high_death_countries[metric])                                                               
                                                                                                            
fig.update_layout(title='Countries with Deaths > 50,000',                                                     
                barmode='group',                                                                             
                xaxis_title='Country',                                                                       
                yaxis_title='Number of Cases')                                                               
fig.show()  

ValueError: Mime type rendering requires nbformat>=4.2.0 but it is not installed

### Question 6

#### Plot Province/State-wise Deaths in USA

In [18]:
import plotly.express as px

In [None]:
df.columns

In [None]:
# Filter USA data                                                                                             
usa_data = df[df['Country_Region'] == 'US'] 

In [None]:
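# Deaths by state (county rows are summed so each state gets a single bar)
deaths_by_state = usa_data.groupby('Province_State', as_index=False)['Deaths'].sum()
fig_deaths = px.bar(deaths_by_state,
                x='Province_State',
                y='Deaths',
                title='Deaths by State in USA')
fig_deaths.update_xaxes(tickangle=45)
fig_deaths.show()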

### Question 7

#### Plot Province/State-wise Active cases in USA

In [None]:
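# Active cases by state (county rows are summed so each state gets a single bar)
active_by_state = usa_data.groupby('Province_State', as_index=False)['Active'].sum()
fig_active = px.bar(active_by_state,
                x='Province_State',
                y='Active',
                title='Active Cases by State in USA')
fig_active.update_xaxes(tickangle=45)
fig_active.show()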

### Question 8

#### Plot Province/State-wise Confirmed cases in USA

In [None]:
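# Confirmed cases by state (county rows are summed so each state gets a single bar)
confirmed_by_state = usa_data.groupby('Province_State', as_index=False)['Confirmed'].sum()
fig_confirmed = px.bar(confirmed_by_state,
                    x='Province_State',
                    y='Confirmed',
                    title='Confirmed Cases by State in USA')
fig_confirmed.update_xaxes(tickangle=45)
fig_confirmed.show()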

### Question 9

#### Plot Worldwide Confirmed Cases over time

In [24]:
import plotly.express as px
import plotly.io as pio

In [None]:
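# Convert Last_Update to datetime
df['Last_Update'] = pd.to_datetime(df['Last_Update'])

# Group by calendar date (not the exact timestamp) and sum confirmed cases.
# Note: a single daily report is one snapshot, so this yields very few points;
# a real time series needs several daily reports combined.
worldwide_cases = df.groupby(df['Last_Update'].dt.normalize())['Confirmed'].sum().reset_index()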
                                                                                                            
# Create time series plot                                                                                     
fig = px.line(worldwide_cases,                                                                                
            x='Last_Update',                                                                                
            y='Confirmed',                                                                                  
            title='Worldwide Confirmed Cases Over Time')                                                    
fig.show()
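
Because each daily report is a single snapshot, a curve over time requires combining several reports. The sketch below shows one way to do that, taking each report's date from its `MM-DD-YYYY.csv` filename. The in-memory frames stand in for files that would normally be loaded with `pd.read_csv`, their figures are made up for illustration, and the column name `Report_Date` is hypothetical.

```python
import pandas as pd

# Stand-ins for three daily reports keyed by filename (made-up figures).
reports = {
    "05-28-2020.csv": pd.DataFrame({"Country_Region": ["US", "Brazil"], "Confirmed": [100, 40]}),
    "05-29-2020.csv": pd.DataFrame({"Country_Region": ["US", "Brazil"], "Confirmed": [105, 45]}),
    "05-30-2020.csv": pd.DataFrame({"Country_Region": ["US", "Brazil"], "Confirmed": [110, 55]}),
}

frames = []
for filename, frame in reports.items():
    # Derive the report date from the MM-DD-YYYY.csv filename.
    report_date = pd.to_datetime(filename.replace(".csv", ""), format="%m-%d-%Y")
    frames.append(frame.assign(Report_Date=report_date))

# Stack all days and sum confirmed cases per report date.
all_days = pd.concat(frames, ignore_index=True)
worldwide = all_days.groupby("Report_Date")["Confirmed"].sum().reset_index()
print(worldwide)
```

The resulting `worldwide` frame has one row per report date and can be passed to `px.line` with `x='Report_Date'` and `y='Confirmed'`.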