# F20DV - LAB 3

This notebook extracts the data for Asian countries from the main dataset (owid-covid-data.csv).
A few columns that the dashboard does not use are dropped as well.

Only the data for Asian countries is extracted because the COVID-19 Dashboard displays information for Asian countries only.

The main dataset, owid-covid-data.csv, was taken from: https://github.com/owid/covid-19-data/tree/master/public/data

### Imports <a class="anchor" id="imports"></a>

In [1]:
import pandas as pd
import numpy as np

### Load Data 

Loading the main dataset, owid-covid-data.csv, into a Pandas DataFrame. This will help in extracting the required data.

In [2]:
df = pd.read_csv("data/owid-covid-data.csv")
display(df.head())

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-02-24,5.0,5.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
1,AFG,Asia,Afghanistan,2020-02-25,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
2,AFG,Asia,Afghanistan,2020-02-26,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
3,AFG,Asia,Afghanistan,2020-02-27,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
4,AFG,Asia,Afghanistan,2020-02-28,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,


Before extracting the required data, we drop some columns which won't be used in the dashboard.

In [3]:
# Columns that the dashboard does not use
columns_to_drop = [
    'new_cases_smoothed', 'new_deaths_smoothed',
    'new_cases_smoothed_per_million', 'new_deaths_smoothed_per_million',
    'reproduction_rate', 'icu_patients', 'icu_patients_per_million',
    'hosp_patients', 'hosp_patients_per_million',
    'weekly_icu_admissions', 'weekly_icu_admissions_per_million',
    'weekly_hosp_admissions', 'weekly_hosp_admissions_per_million',
    'new_tests', 'total_tests', 'total_tests_per_thousand',
    'new_tests_per_thousand', 'new_tests_smoothed',
    'new_tests_smoothed_per_thousand', 'positive_rate', 'tests_per_case',
    'tests_units', 'new_vaccinations_smoothed',
    'new_vaccinations_smoothed_per_million',
    'new_people_vaccinated_smoothed',
    'new_people_vaccinated_smoothed_per_hundred', 'stringency_index',
    'population_density', 'median_age', 'aged_65_older', 'aged_70_older',
    'gdp_per_capita', 'extreme_poverty', 'cardiovasc_death_rate',
    'diabetes_prevalence', 'female_smokers', 'male_smokers',
    'handwashing_facilities', 'hospital_beds_per_thousand',
    'life_expectancy', 'human_development_index',
    'excess_mortality_cumulative_absolute', 'excess_mortality_cumulative',
    'excess_mortality', 'excess_mortality_cumulative_per_million',
]
df = df.drop(columns_to_drop, axis=1)
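
As a side note, the column-dropping step can be tried out on a small scale. The sketch below is not part of the lab pipeline: `toy_df`, `unused_columns`, and `trimmed_df` are hypothetical names, and the values in the toy frame are illustrative only. It assumes that a missing column name should be skipped rather than raise an error, which is what `errors="ignore"` does in `DataFrame.drop`.

```python
import pandas as pd

# Toy stand-in for the OWID data (illustrative values, not the real dataset).
toy_df = pd.DataFrame({
    "location": ["Afghanistan", "Yemen"],
    "total_cases": [5.0, 11772.0],
    "new_cases_smoothed": [None, 1.5],
    "median_age": [18.6, 20.3],
})

# Hypothetical drop list: two names that exist and one that does not.
unused_columns = ["new_cases_smoothed", "median_age", "not_a_real_column"]

# errors="ignore" skips absent names instead of raising a KeyError.
# drop() returns a new DataFrame; toy_df itself is left unchanged.
trimmed_df = toy_df.drop(columns=unused_columns, errors="ignore")

print(list(trimmed_df.columns))
```

Without `errors="ignore"`, a name that is absent from the DataFrame makes `drop` raise a `KeyError`, which may be the preferred behaviour if a typo in the list should be caught early.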

### Extracting the required data

From the main DataFrame, keep only the rows whose "continent" value is "Asia"; all other rows are discarded.

The selected rows are stored in a new DataFrame called "asia_df".

The "asia_df" DataFrame is then written to a CSV file called "asia.csv", which will serve as the primary dataset for the dashboard.

In [4]:
asia_df = df[df["continent"] == "Asia"]
display(asia_df)
asia_df.to_csv("data/asia.csv", index=False)

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,total_deaths,new_deaths,total_cases_per_million,new_cases_per_million,...,total_vaccinations,people_vaccinated,people_fully_vaccinated,total_boosters,new_vaccinations,total_vaccinations_per_hundred,people_vaccinated_per_hundred,people_fully_vaccinated_per_hundred,total_boosters_per_hundred,population
0,AFG,Asia,Afghanistan,2020-02-24,5.0,5.0,,,0.126,0.126,...,,,,,,,,,,39835428.0
1,AFG,Asia,Afghanistan,2020-02-25,5.0,0.0,,,0.126,0.000,...,,,,,,,,,,39835428.0
2,AFG,Asia,Afghanistan,2020-02-26,5.0,0.0,,,0.126,0.000,...,,,,,,,,,,39835428.0
3,AFG,Asia,Afghanistan,2020-02-27,5.0,0.0,,,0.126,0.000,...,,,,,,,,,,39835428.0
4,AFG,Asia,Afghanistan,2020-02-28,5.0,0.0,,,0.126,0.000,...,,,,,,,,,,39835428.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
165346,YEM,Asia,Yemen,2022-03-03,11772.0,1.0,2135.0,0.0,386.086,0.033,...,,,,,,,,,,30490639.0
165347,YEM,Asia,Yemen,2022-03-04,11774.0,2.0,2135.0,0.0,386.151,0.066,...,,,,,,,,,,30490639.0
165348,YEM,Asia,Yemen,2022-03-05,11775.0,1.0,2135.0,0.0,386.184,0.033,...,,,,,,,,,,30490639.0
165349,YEM,Asia,Yemen,2022-03-06,11777.0,2.0,2138.0,3.0,386.250,0.066,...,784792.0,624837.0,384655.0,,,2.57,2.05,1.26,,30490639.0
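
The output above keeps the row labels of the main DataFrame (0 to 5, then 165346 onwards), because a boolean-mask filter does not renumber rows. A minimal sketch of the same filter-and-export pattern on a toy frame may make this clearer. The names `toy_df`, `is_asia`, `toy_asia_df`, `buffer`, and `reloaded` are hypothetical, the values are illustrative, and an in-memory buffer stands in for the file on disk.

```python
import io

import pandas as pd

# Toy stand-in for the main DataFrame (illustrative values only).
toy_df = pd.DataFrame({
    "continent": ["Asia", "Europe", "Asia", None],
    "location": ["Afghanistan", "France", "Yemen", "World"],
    "total_cases": [5.0, 12.0, 11772.0, 100.0],
})

# Boolean mask: True where the continent is exactly "Asia".
# Rows with a missing continent compare as False and are excluded.
is_asia = toy_df["continent"] == "Asia"
toy_asia_df = toy_df[is_asia]

# The filtered frame keeps the original row labels (0 and 2 here).
print(list(toy_asia_df.index))

# index=False leaves those labels out of the CSV, so a reader of the
# file sees a fresh 0..n-1 index after loading.
buffer = io.StringIO()
toy_asia_df.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(list(reloaded.index))
```

Writing with `index=False` is what keeps the gaps in the original row labels from ending up as an extra column in the exported file.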


In [5]:
asia_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 36296 entries, 0 to 165350
Data columns (total 22 columns):
 #   Column                               Non-Null Count  Dtype  
---  ------                               --------------  -----  
 0   iso_code                             36296 non-null  object 
 1   continent                            36296 non-null  object 
 2   location                             36296 non-null  object 
 3   date                                 36296 non-null  object 
 4   total_cases                          35738 non-null  float64
 5   new_cases                            35713 non-null  float64
 6   total_deaths                         31922 non-null  float64
 7   new_deaths                           31903 non-null  float64
 8   total_cases_per_million              35738 non-null  float64
 9   new_cases_per_million                35713 non-null  float64
 10  total_deaths_per_million             31922 non-null  float64
 11  new_deaths_per_million     