# Data Description

The data is available from the WHO: https://covid19.who.int/data

| Field name | Type | Description |
| --- | --- | --- |
| Date_reported | Date | Date of reporting to WHO |
| Country_code | String | ISO Alpha-2 country code |
| Country | String | Country, territory, area |
| WHO_region | String | WHO regional office. WHO Member States are grouped into six WHO regions: Regional Office for Africa (AFRO), Regional Office for the Americas (AMRO), Regional Office for South-East Asia (SEARO), Regional Office for Europe (EURO), Regional Office for the Eastern Mediterranean (EMRO), and Regional Office for the Western Pacific (WPRO). |
| New_cases | Integer | New confirmed cases. Calculated by subtracting the previous cumulative case count from the current cumulative case count.* |
| Cumulative_cases | Integer | Cumulative confirmed cases reported to WHO to date. |
| New_deaths | Integer | New confirmed deaths. Calculated by subtracting the previous cumulative death count from the current cumulative death count.* |
| Cumulative_deaths | Integer | Cumulative confirmed deaths reported to WHO to date. |
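
The relationship between the cumulative and new columns described in the table can be checked with a per-country difference. The following is a minimal sketch on a small made-up frame; the `toy` data and the `recomputed_new_cases` column are illustrative and not part of the WHO file.

```python
import pandas as pd

# Illustrative data only; the real file has one row per country per reporting date.
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Date_reported": pd.to_datetime(
        ["2020-01-05", "2020-01-12", "2020-01-19"] * 2
    ),
    "Cumulative_cases": [0, 3, 10, 5, 5, 4],
})

# New cases = current cumulative count minus the previous one, per country.
toy = toy.sort_values(["Country", "Date_reported"])
toy["recomputed_new_cases"] = toy.groupby("Country")["Cumulative_cases"].diff()

print(toy)
```

The first row of each country has no previous value, so the difference is missing there. A downward revision of the cumulative count yields a negative difference, which is one plausible reason for negative minimums in the new-count columns.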

# Importing Libraries


In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
from plotly.subplots import make_subplots
import plotly.graph_objects as go
pd.set_option('display.float_format','{:.2f}'.format)
sns.set()

# Exploring Data


In [4]:
df = pd.read_csv('WHO-COVID-19-global-data.csv')
df.head()

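|  | Date_reported | Country_code | Country | WHO_region | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2020-01-05 | AF | Afghanistan | EMRO | NaN | 0 | NaN | 0 |
| 1 | 2020-01-12 | AF | Afghanistan | EMRO | NaN | 0 | NaN | 0 |
| 2 | 2020-01-19 | AF | Afghanistan | EMRO | NaN | 0 | NaN | 0 |
| 3 | 2020-01-26 | AF | Afghanistan | EMRO | NaN | 0 | NaN | 0 |
| 4 | 2020-02-02 | AF | Afghanistan | EMRO | NaN | 0 | NaN | 0 |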


In [5]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50880 entries, 0 to 50879
Data columns (total 8 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   Date_reported      50880 non-null  object 
 1   Country_code       49396 non-null  object 
 2   Country            49608 non-null  object 
 3   WHO_region         45792 non-null  object 
 4   New_cases          36622 non-null  float64
 5   Cumulative_cases   50880 non-null  int64  
 6   New_deaths         24216 non-null  float64
 7   Cumulative_deaths  50880 non-null  int64  
dtypes: float64(2), int64(2), object(4)
memory usage: 3.1+ MB


In [6]:
df.describe()

|  | New_cases | Cumulative_cases | New_deaths | Cumulative_deaths |
| --- | --- | --- | --- | --- |
| count | 36622.0 | 50880.0 | 24216.0 | 50880.0 |
| mean | 21144.9 | 1595994.13 | 289.91 | 18731.22 |
| std | 279488.75 | 7172653.48 | 1232.47 | 78002.45 |
| min | -65079.0 | 0.0 | -3432.0 | 0.0 |
| 25% | 52.0 | 2750.0 | 4.0 | 21.0 |
| 50% | 470.0 | 38136.0 | 21.0 | 411.5 |
| 75% | 4477.75 | 467050.0 | 110.0 | 6066.25 |
| max | 40475477.0 | 103436829.0 | 47687.0 | 1165780.0 |


In [7]:
df.describe(include = 'O')

|  | Date_reported | Country_code | Country | WHO_region |
| --- | --- | --- | --- | --- |
| count | 50880 | 49396 | 49608 | 45792 |
| unique | 212 | 233 | 234 | 6 |
| top | 2020-01-05 | AF | Afghanistan | EURO |
| freq | 240 | 212 | 212 | 11660 |


In [8]:
df.duplicated().sum()

211
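
By default `df.duplicated()` flags only the second and later copies of a row, so the count above is the number of rows that `drop_duplicates()` would remove. A small sketch on made-up data (the `toy` frame is an assumption for illustration) shows the difference between the default and `keep=False`:

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "new_cases": [1, 1, 2, 2, 2],
})

# Default keep='first': the first copy of each row is not flagged.
n_extra = toy.duplicated().sum()

# keep=False flags every member of a duplicated group, which helps inspection.
all_copies = toy[toy.duplicated(keep=False)]

# Dropping duplicates keeps one copy of each distinct row.
deduped = toy.drop_duplicates()

print(n_extra, len(all_copies), len(deduped))
```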

In [9]:
df.isna().mean()*100

Date_reported        0.00
Country_code         2.92
Country              2.50
WHO_region          10.00
New_cases           28.02
Cumulative_cases     0.00
New_deaths          52.41
Cumulative_deaths    0.00
dtype: float64
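
Some of the missing `Country_code` values may not be truly missing. By default `pd.read_csv` treats the string `NA` as a missing value, and `NA` is also the ISO Alpha-2 code for Namibia. The summary above, with one fewer unique `Country_code` than `Country`, is consistent with this, but whether it explains the gap in this file is an assumption that would need checking against the raw CSV. A minimal sketch with an in-memory CSV of made-up rows:

```python
import io
import pandas as pd

# Made-up rows: one country whose code is the literal string "NA",
# and one row with a genuinely empty New_cases field.
csv_text = "Country_code,Country,New_cases\nNA,Namibia,5\nAF,Afghanistan,\n"

# Default parsing: the literal string "NA" becomes NaN.
default = pd.read_csv(io.StringIO(csv_text))

# Disable the default NaN markers and treat only empty fields as missing.
strict = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])

print(default["Country_code"].isna().sum())
print(strict["Country_code"].isna().sum())
print(strict["New_cases"].isna().sum())
```

With the stricter settings the code `NA` survives as a string, while the empty `New_cases` field is still read as missing.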

# Data Cleaning

In [10]:
df.columns = df.columns.str.lower()

In [11]:
# Convert date_reported to a datetime dtype
df.date_reported = pd.to_datetime(df.date_reported)

In [12]:
# Dropping Country Code column
df.drop(['country_code'],axis = 1,inplace = True)

In [13]:
# Create year and month columns
df['year'] = df.date_reported.dt.year
df['month'] = df.date_reported.dt.month

# Data Analysis

In [14]:
def show_country_info(country, start=df.date_reported.min(), end=df.date_reported.max()):
  # Keep only the rows for this country within [start, end]
  df_temp = df[(df.country == country) & (df.date_reported >= start) & (df.date_reported <= end)]
  fig = make_subplots(2, 2, subplot_titles=("Cumulative Cases", "Cumulative Deaths", "New Cases", "New Deaths"))
  # go.Scatter with mode='lines' replaces the deprecated go.Line
  fig.add_trace(go.Scatter(x=df_temp['date_reported'], y=df_temp['cumulative_cases'], mode='lines'), row=1, col=1)
  fig.add_trace(go.Scatter(x=df_temp['date_reported'], y=df_temp['cumulative_deaths'], mode='lines'), row=1, col=2)
  fig.add_trace(go.Scatter(x=df_temp['date_reported'], y=df_temp['new_cases'], mode='lines'), row=2, col=1)
  fig.add_trace(go.Scatter(x=df_temp['date_reported'], y=df_temp['new_deaths'], mode='lines'), row=2, col=2)
  fig.update(layout_showlegend=False)
  fig.show()

In [15]:
# Enter a country name to show all of its details
show_country_info('Egypt')




In [16]:
def show_top_countries():
  # Sum only the plotted numeric columns; this avoids the numeric_only deprecation warning
  df_temp = df.groupby('country')[['new_cases', 'new_deaths']].sum()
  # The sum of new counts per country approximates its cumulative total
  top_cases = df_temp.new_cases.nlargest(10)
  top_deaths = df_temp.new_deaths.nlargest(10)
  fig = make_subplots(1, 2, subplot_titles=("Cumulative Cases", "Cumulative Deaths"))
  fig.add_trace(go.Bar(x=top_cases.index, y=top_cases.values), row=1, col=1)
  fig.add_trace(go.Bar(x=top_deaths.index, y=top_deaths.values), row=1, col=2)
  fig.update(layout_showlegend=False)
  fig.update_layout(title_text="Information about the top countries")
  fig.show()

In [17]:
# Show the top 10 countries by cases and by deaths
show_top_countries()



In [18]:
def show_regions():
  # Sum only the plotted numeric columns; this avoids the numeric_only deprecation warning
  df_temp = df.groupby('who_region')[['new_cases', 'new_deaths']].sum()
  cases = df_temp.new_cases.sort_values()
  deaths = df_temp.new_deaths.sort_values()
  fig = make_subplots(1, 2, subplot_titles=("Cumulative Cases", "Cumulative Deaths"))
  fig.add_trace(go.Bar(x=cases.index, y=cases.values), row=1, col=1)
  fig.add_trace(go.Bar(x=deaths.index, y=deaths.values), row=1, col=2)
  fig.update(layout_showlegend=False)
  fig.update_layout(title_text="Information about the Regions")
  fig.show()

In [19]:
# Show cases and deaths by WHO region
show_regions()



# Creating Map

In [20]:
# Highest cumulative case count reported for each country
df_temp = df.groupby('country', as_index=False)['cumulative_cases'].max()
fig = px.choropleth(df_temp, locations='country', locationmode='country names',
                    color='cumulative_cases', width=1000, height=600,
                    title="Number of Cases Worldwide", color_continuous_scale='greens')
fig.show()
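
Taking the per-country maximum assumes the cumulative count never decreases. If a later correction lowers it, the most recent row is a closer match to "cases to date". A sketch of that alternative on made-up data, where `toy` and `latest` are illustrative names:

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "date_reported": pd.to_datetime(
        ["2020-01-05", "2020-01-12", "2020-01-19", "2020-01-05", "2020-01-12"]
    ),
    # Country A is revised downward on its last report.
    "cumulative_cases": [0, 10, 8, 1, 4],
})

# Sort by date, then keep the last (most recent) row of each country.
latest = (
    toy.sort_values("date_reported")
       .groupby("country", as_index=False)
       .last()
)

# For comparison: the per-country maximum.
highest = toy.groupby("country")["cumulative_cases"].max()

print(latest[["country", "cumulative_cases"]])
print(highest)
```

Which of the two is preferable depends on whether downward revisions in the file are corrections or reporting errors, which the data alone does not say.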