# Enterprise Data Science - Project First & Second Delivery

### Notebook Description

This notebook contains the following time series graphs for the selected countries (United States, Germany, India and Italy):

1) Line graph of total COVID-19 cases over time for the four countries

2) Line graph of relative COVID-19 cases over time (absolute COVID-19 cases / population size)

3) Line graphs of people vaccinated over time for the four countries

4) Line graphs of the vaccination rate (percentage of the population) over time for the four countries

The dataset for this COVID-19 project is taken from https://covid.ourworldindata.org/data/owid-covid-data.

## Delivery 1: Relative COVID-19 cases over time (absolute COVID-19 cases / population size)

### Import Python Libraries for Data Science

In [27]:
import numpy as np
import pandas as pd

import matplotlib as mpl
import matplotlib.pyplot as plt

%matplotlib inline

import seaborn as sns

### Dataset loading for visualization

I downloaded the data from https://covid.ourworldindata.org/data/owid-covid-data in .csv format and imported it here using the pandas `read_csv` function.

In [28]:
df_country= pd.read_csv('owid-covid-data.csv')

### Important info about the Dataset

In [29]:
df_country.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 197484 entries, 0 to 197483
Data columns (total 67 columns):
 #   Column                                      Non-Null Count   Dtype  
---  ------                                      --------------   -----  
 0   iso_code                                    197484 non-null  object 
 1   continent                                   186033 non-null  object 
 2   location                                    197484 non-null  object 
 3   date                                        197484 non-null  object 
 4   total_cases                                 189559 non-null  float64
 5   new_cases                                   189317 non-null  float64
 6   new_cases_smoothed                          188143 non-null  float64
 7   total_deaths                                170942 non-null  float64
 8   new_deaths                                  170911 non-null  float64
 9   new_deaths_smoothed                         169748 non-null  float64
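
The `info()` output above lists `date` with dtype `object`, which means the dates are stored as plain strings rather than as timestamps. A minimal sketch of converting such a column with `pd.to_datetime` follows. It uses a small hypothetical frame (`sample`) instead of the full dataset, and the conversion is a suggestion, not a step this notebook performs.

```python
import pandas as pd

# Hypothetical three-row sample mimicking the 'date' and 'total_cases' columns.
sample = pd.DataFrame({
    "date": ["2020-02-24", "2020-02-25", "2020-02-26"],
    "total_cases": [5.0, 5.0, 5.0],
})

# Before conversion the dates are plain strings.
dtype_before = sample["date"].dtype

# Parse the ISO-formatted strings into datetime64 values.
sample["date"] = pd.to_datetime(sample["date"], format="%Y-%m-%d")

print(dtype_before, "->", sample["date"].dtype)
```

With a datetime column, date arithmetic and resampling become available, and plotting libraries can treat the x axis as a time axis.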
 

In [30]:
df_country.head()

Unnamed: 0,iso_code,continent,location,date,total_cases,new_cases,new_cases_smoothed,total_deaths,new_deaths,new_deaths_smoothed,...,female_smokers,male_smokers,handwashing_facilities,hospital_beds_per_thousand,life_expectancy,human_development_index,excess_mortality_cumulative_absolute,excess_mortality_cumulative,excess_mortality,excess_mortality_cumulative_per_million
0,AFG,Asia,Afghanistan,2020-02-24,5.0,5.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
1,AFG,Asia,Afghanistan,2020-02-25,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
2,AFG,Asia,Afghanistan,2020-02-26,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
3,AFG,Asia,Afghanistan,2020-02-27,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,
4,AFG,Asia,Afghanistan,2020-02-28,5.0,0.0,,,,,...,,,37.746,0.5,64.83,0.511,,,,


### List of countries in the dataset

In [31]:
df_country['location'].unique()

array(['Afghanistan', 'Africa', 'Albania', 'Algeria', 'Andorra', 'Angola',
       'Anguilla', 'Antigua and Barbuda', 'Argentina', 'Armenia', 'Aruba',
       'Asia', 'Australia', 'Austria', 'Azerbaijan', 'Bahamas', 'Bahrain',
       'Bangladesh', 'Barbados', 'Belarus', 'Belgium', 'Belize', 'Benin',
       'Bermuda', 'Bhutan', 'Bolivia', 'Bonaire Sint Eustatius and Saba',
       'Bosnia and Herzegovina', 'Botswana', 'Brazil',
       'British Virgin Islands', 'Brunei', 'Bulgaria', 'Burkina Faso',
       'Burundi', 'Cambodia', 'Cameroon', 'Canada', 'Cape Verde',
       'Cayman Islands', 'Central African Republic', 'Chad', 'Chile',
       'China', 'Colombia', 'Comoros', 'Congo', 'Cook Islands',
       'Costa Rica', "Cote d'Ivoire", 'Croatia', 'Cuba', 'Curacao',
       'Cyprus', 'Czechia', 'Democratic Republic of Congo', 'Denmark',
       'Djibouti', 'Dominica', 'Dominican Republic', 'Ecuador', 'Egypt',
       'El Salvador', 'Equatorial Guinea', 'Eritrea', 'Estonia',
       'Eswatini', 'Ethi

### Data Cleaning: Selecting the relevant columns from the main dataset

In [32]:
df_country_filteredData = df_country[['date','location','total_cases','population', 'people_vaccinated','people_fully_vaccinated' ]]

In [33]:
df_country_filteredData.head()

Unnamed: 0,date,location,total_cases,population,people_vaccinated,people_fully_vaccinated
0,2020-02-24,Afghanistan,5.0,39835428.0,,
1,2020-02-25,Afghanistan,5.0,39835428.0,,
2,2020-02-26,Afghanistan,5.0,39835428.0,,
3,2020-02-27,Afghanistan,5.0,39835428.0,,
4,2020-02-28,Afghanistan,5.0,39835428.0,,


In [34]:
#df_country_filteredData['people_vaccinated_partially_atleast'] = df_country_filteredData['total_vaccinations']-df_country_filteredData['people_fully_vaccinated']

### Filtering the data for 4 countries - 'India', 'Germany', 'United States', 'Italy'

In [35]:
# Keep only the rows whose location is one of the four selected countries
df_final_dataset = df_country_filteredData[df_country_filteredData['location'].isin(['India', 'Germany', 'United States', 'Italy'])]

In [36]:
df_final_dataset.reset_index(drop=True)

Unnamed: 0,date,location,total_cases,population,people_vaccinated,people_fully_vaccinated
0,2020-01-27,Germany,1.0,83900471.0,,
1,2020-01-28,Germany,4.0,83900471.0,,
2,2020-01-29,Germany,4.0,83900471.0,,
3,2020-01-30,Germany,4.0,83900471.0,,
4,2020-01-31,Germany,5.0,83900471.0,,
...,...,...,...,...,...,...
3529,2022-06-24,United States,86909476.0,332915074.0,,
3530,2022-06-25,United States,86948848.0,332915074.0,,
3531,2022-06-26,United States,86967399.0,332915074.0,,
3532,2022-06-27,United States,87092233.0,332915074.0,,


In [37]:
df_final_dataset.head()

Unnamed: 0,date,location,total_cases,population,people_vaccinated,people_fully_vaccinated
66594,2020-01-27,Germany,1.0,83900471.0,,
66595,2020-01-28,Germany,4.0,83900471.0,,
66596,2020-01-29,Germany,4.0,83900471.0,,
66597,2020-01-30,Germany,4.0,83900471.0,,
66598,2020-01-31,Germany,5.0,83900471.0,,
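
The `head()` output still starts at index 66594 because `reset_index(drop=True)` returns a new DataFrame and leaves the original untouched unless the result is assigned back. A small sketch with a hypothetical frame (`subset`) illustrates the difference:

```python
import pandas as pd

# Hypothetical frame with non-contiguous index labels, as left behind by filtering.
subset = pd.DataFrame(
    {"location": ["Germany", "Germany", "India"], "total_cases": [1.0, 4.0, 1.0]},
    index=[66594, 66595, 80001],
)

# Calling reset_index without assignment does not modify `subset`.
subset.reset_index(drop=True)
index_unchanged = list(subset.index)

# Assigning the result yields a frame with a fresh 0..n-1 index.
renumbered = subset.reset_index(drop=True)
index_renumbered = list(renumbered.index)

print(index_unchanged)
print(index_renumbered)
```

Keeping the original labels is harmless for the plots in this notebook, since they use the `date` column rather than the index.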


### Line graph of total COVID-19 cases over time for the four countries

In [38]:
import plotly.express as px
fig = px.line(df_final_dataset, x="date", y="total_cases",color ='location', title='Total COVID-19 cases over time', markers = True )
fig.show()

### Creating a new column for absolute COVID-19 cases / population as a percentage

In [61]:
df_final_dataset['(absolute_Covid_cases/population) in %age'] = (df_final_dataset['total_cases'] / df_final_dataset['population']) *100



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
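
This warning typically appears when a column is assigned on a DataFrame that was itself produced by filtering another DataFrame. One common way to avoid it is to take an explicit `.copy()` of the filtered rows before adding columns, sketched here on hypothetical data. The names `raw`, `selected` and `cases_pct` are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for the filtered columns of the full dataset.
raw = pd.DataFrame({
    "location": ["Germany", "France", "Italy"],
    "total_cases": [100.0, 200.0, 50.0],
    "population": [1000.0, 4000.0, 500.0],
})

# Take an explicit copy of the filtered rows so later assignments
# target an independent DataFrame rather than a slice of `raw`.
selected = raw[raw["location"].isin(["Germany", "Italy"])].copy()

# Adding a derived column to the copy does not touch `raw`.
selected["cases_pct"] = selected["total_cases"] / selected["population"] * 100

print(selected)
```

The computed values are the same with or without the copy. The copy only makes it explicit which object is being modified.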



In [40]:
df_final_dataset

Unnamed: 0,date,location,total_cases,population,people_vaccinated,people_fully_vaccinated,(absolute_Covid_cases/population) in %age
66594,2020-01-27,Germany,1.0,83900471.0,,,0.000001
66595,2020-01-28,Germany,4.0,83900471.0,,,0.000005
66596,2020-01-29,Germany,4.0,83900471.0,,,0.000005
66597,2020-01-30,Germany,4.0,83900471.0,,,0.000005
66598,2020-01-31,Germany,5.0,83900471.0,,,0.000006
...,...,...,...,...,...,...,...
186950,2022-06-24,United States,86909476.0,332915074.0,,,26.105600
186951,2022-06-25,United States,86948848.0,332915074.0,,,26.117426
186952,2022-06-26,United States,86967399.0,332915074.0,,,26.122998
186953,2022-06-27,United States,87092233.0,332915074.0,,,26.160496
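
As a quick sanity check, the last row above can be reproduced by hand: 87,092,233 total cases divided by a population of 332,915,074, multiplied by 100. The snippet below redoes that arithmetic in plain Python. The variable names are illustrative.

```python
# Values copied from the last United States row (2022-06-27) shown above.
total_cases = 87_092_233
population = 332_915_074

# Relative cases as a percentage of the population.
relative_cases_pct = total_cases / population * 100

print(round(relative_cases_pct, 6))
```

The result should agree with the value in the table's last column. Note that it measures cumulative reported cases relative to population, which is not necessarily the share of distinct people infected.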


### Line graph of relative COVID-19 cases over time (absolute COVID-19 cases / population size)


In [41]:
fig = px.line(df_final_dataset, x="date", y="(absolute_Covid_cases/population) in %age", color='location', title='Covid Cases/Population in percentage', markers=True)
fig.show()

## Delivery 2: The vaccination rate (percentage of the population) over time

### Creating a new column for vaccination_rate in percentage

In [65]:
# Share of the population that is fully vaccinated, in percent
df_final_dataset['vaccination_rate_fully_vaccinated'] = df_final_dataset['people_fully_vaccinated'] * 100 / df_final_dataset['population']
# Share of the population with at least one dose, in percent (this column is plotted further below)
df_final_dataset['vaccination_rate_partially_vaccinated'] = df_final_dataset['people_vaccinated'] * 100 / df_final_dataset['population']



A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy



In [43]:
df_final_dataset

Unnamed: 0,date,location,total_cases,population,people_vaccinated,people_fully_vaccinated,(absolute_Covid_cases/population) in %age,vaccination_rate_fully_vaccinated,vaccination_rate_partially_vaccinated
66594,2020-01-27,Germany,1.0,83900471.0,,,0.000001,,
66595,2020-01-28,Germany,4.0,83900471.0,,,0.000005,,
66596,2020-01-29,Germany,4.0,83900471.0,,,0.000005,,
66597,2020-01-30,Germany,4.0,83900471.0,,,0.000005,,
66598,2020-01-31,Germany,5.0,83900471.0,,,0.000006,,
...,...,...,...,...,...,...,...,...,...
186950,2022-06-24,United States,86909476.0,332915074.0,,,26.105600,,
186951,2022-06-25,United States,86948848.0,332915074.0,,,26.117426,,
186952,2022-06-26,United States,86967399.0,332915074.0,,,26.122998,,
186953,2022-06-27,United States,87092233.0,332915074.0,,,26.160496,,
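
The rows shown above have no vaccination figures, so the derived rates are `NaN` there as well. If gaps inside a series are unwanted in the plots, one option (not used in this notebook) is to carry the last reported cumulative value forward within each country. The sketch below assumes a hypothetical frame (`frame`) and a hypothetical output column name.

```python
import pandas as pd

# Hypothetical cumulative counts with missing reports for two countries.
frame = pd.DataFrame({
    "location": ["Germany"] * 3 + ["Italy"] * 3,
    "people_fully_vaccinated": [None, 10.0, None, 5.0, None, 7.0],
})

# Forward-fill within each location so values never leak across countries.
# Leading gaps (before the first report) stay missing.
frame["people_fully_vaccinated_filled"] = (
    frame.groupby("location")["people_fully_vaccinated"].ffill()
)

print(frame)
```

Forward-filling is only reasonable for cumulative columns such as `people_fully_vaccinated`. It would misrepresent daily figures.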


### Line graphs of people vaccinated over time for the four countries

In [44]:
fig = px.line(df_final_dataset, x="date", y="people_fully_vaccinated",color ='location', title= 'People fully vaccinated over time', markers = True)
fig.show()

In [45]:
fig = px.line(df_final_dataset, x="date", y="people_vaccinated", color='location', title='People vaccinated with at least one dose over time', markers=True)
fig.show()

### Line graphs of the vaccination rate (percentage of the population) over time for the four countries

In [46]:
fig = px.line(df_final_dataset, x="date", y="vaccination_rate_fully_vaccinated", color='location', title='Vaccination rate with both doses in percentage over time', markers=True)
fig.show()

In [47]:
fig = px.line(df_final_dataset, x="date", y="vaccination_rate_partially_vaccinated", color='location', title='Vaccination rate with at least one dose in percentage over time', markers=True)
fig.show()

## Thank you!!!!