<a href="https://colab.research.google.com/github/9789604858/TNDSC_Generative-AI/blob/main/PREDICTIVE%20MODELING%20OF%20DISEASE%20OUTBREAKS%20AND%20EPIDEMOLOGICAL%20TRENDS%20WITH%20SPATIO-%20TEMPORAL%20ANALYSIS%20.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# IMPORTANT: RUN THIS CELL IN ORDER TO IMPORT YOUR KAGGLE DATA SOURCES
# TO THE CORRECT LOCATION (/kaggle/input) IN YOUR NOTEBOOK,
# THEN FEEL FREE TO DELETE THIS CELL.
# NOTE: THIS NOTEBOOK ENVIRONMENT DIFFERS FROM KAGGLE'S PYTHON
# ENVIRONMENT SO THERE MAY BE MISSING LIBRARIES USED BY YOUR
# NOTEBOOK.

import os
import sys
from tempfile import NamedTemporaryFile
from urllib.request import urlopen
from urllib.parse import unquote, urlparse
from urllib.error import HTTPError
from zipfile import ZipFile
import tarfile
import shutil

CHUNK_SIZE = 40960
DATA_SOURCE_MAPPING = 'covid19:https%3A%2F%2Fstorage.googleapis.com%2Fkaggle-data-sets%2F758684%2F1310259%2Fbundle%2Farchive.zip%3FX-Goog-Algorithm%3DGOOG4-RSA-SHA256%26X-Goog-Credential%3Dgcp-kaggle-com%2540kaggle-161607.iam.gserviceaccount.com%252F20240330%252Fauto%252Fstorage%252Fgoog4_request%26X-Goog-Date%3D20240330T035408Z%26X-Goog-Expires%3D259200%26X-Goog-SignedHeaders%3Dhost%26X-Goog-Signature%3D63fa55534eb80556699c72f600c39c7740b7f1354482d6ec6d88b18806c3eef27715ae45086cd03d35b344608ce23fb3d0018676068a1c7663ef89658a3ca1631cfd9a283e76aed46a3247c021c77027e1fdb4ac31316fcf8cc78ccd91eef025acae2a5c67a5f22016da0db4e0a7cc20446c1d8d851aa6eeac4f9e263de619c4019aa6faaf6314a5db181430fb020af7e85df8c9a38d74a7c9e7cc117f9d4cf1a3e92164b51dc5d39c81ac2d8b8ce7f5222fb3ed1a5a9142722c0180320f9f1d90b4eb76df265623b682d06da75a35c649fbad193fe029aa5754f07e93fcd565fccd4c4a2b5f89c45018bec8d818c4035280410eba76343a4d56148267946bc7'

KAGGLE_INPUT_PATH='/kaggle/input'
KAGGLE_WORKING_PATH='/kaggle/working'
KAGGLE_SYMLINK='kaggle'

!umount /kaggle/input/ 2> /dev/null
shutil.rmtree('/kaggle/input', ignore_errors=True)
os.makedirs(KAGGLE_INPUT_PATH, 0o777, exist_ok=True)
os.makedirs(KAGGLE_WORKING_PATH, 0o777, exist_ok=True)

try:
  os.symlink(KAGGLE_INPUT_PATH, os.path.join("..", 'input'), target_is_directory=True)
except FileExistsError:
  pass
try:
  os.symlink(KAGGLE_WORKING_PATH, os.path.join("..", 'working'), target_is_directory=True)
except FileExistsError:
  pass

for data_source_mapping in DATA_SOURCE_MAPPING.split(','):
    directory, download_url_encoded = data_source_mapping.split(':')
    download_url = unquote(download_url_encoded)
    filename = urlparse(download_url).path
    destination_path = os.path.join(KAGGLE_INPUT_PATH, directory)
    try:
        with urlopen(download_url) as fileres, NamedTemporaryFile() as tfile:
            total_length = fileres.headers['content-length']
            print(f'Downloading {directory}, {total_length} bytes compressed')
            dl = 0
            data = fileres.read(CHUNK_SIZE)
            while len(data) > 0:
                dl += len(data)
                tfile.write(data)
                done = int(50 * dl / int(total_length))
                sys.stdout.write(f"\r[{'=' * done}{' ' * (50-done)}] {dl} bytes downloaded")
                sys.stdout.flush()
                data = fileres.read(CHUNK_SIZE)
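            tfile.flush()  # make sure every downloaded byte is on disk before extracting
            if filename.endswith('.zip'):
              with ZipFile(tfile) as zfile:
                zfile.extractall(destination_path)
            else:
              # bind the archive to a new name so the `tarfile` module is not shadowed
              with tarfile.open(tfile.name) as tar:
                tar.extractall(destination_path)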
            print(f'\nDownloaded and uncompressed: {directory}')
    except HTTPError as e:
        print(f'Failed to load (likely expired) {download_url} to path {destination_path}')
        continue
    except OSError as e:
        print(f'Failed to load {download_url} to path {destination_path}')
        continue

print('Data source import complete.')


# <center> Covid 19 Data Analysis </center>
<center>
  <img src="https://www.iotworldtoday.com/files/2020/03/coronavirus-crisis-management.jpg" />
</center>

Coronavirus disease (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. Most people who fall sick with COVID-19 experience mild to moderate symptoms and recover without special treatment. However, some become seriously ill and require medical attention. The pandemic struck the whole world and left a lasting mark.

In this analysis, I'll show the effect of the virus on several countries. The data used in this analysis was obtained from the <a href='https://covid19.who.int/data'>WHO database</a>.

### Let's Import the modules

We will start by importing some useful `Python` `libraries`. ***Numpy*** and ***Pandas*** are for performing mathematical operations and manipulating data; ***Tabulate*** pretty-prints tabular data such as Pandas Series and DataFrames; ***Seaborn*** and ***Matplotlib*** are visualisation libraries for creating informative visuals.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from tabulate import tabulate
import warnings
warnings.filterwarnings('ignore')

## INITIAL EXPLORATION

Importing daily cases and deaths by date reported to WHO

In [None]:
df=pd.read_csv('https://covid19.who.int/WHO-COVID-19-global-data.csv')

# df=pd.read_csv("../input/covid19/covid19_Confirmed_dataset.csv")
df.head()

## Data Description

- `Date_reported`: date of reporting to WHO.
- `Country_code` (String): ISO Alpha-2 country code.
- `Country` (String): country, territory or area.
- `WHO_region` (String): WHO regional office. WHO Member States are grouped into six WHO regions: Regional Office for Africa (AFRO), Regional Office for the Americas (AMRO), Regional Office for South-East Asia (SEARO), Regional Office for Europe (EURO), Regional Office for the Eastern Mediterranean (EMRO), and Regional Office for the Western Pacific (WPRO).
- `New_cases` (Integer): new confirmed cases, calculated by subtracting the previous cumulative case count from the current cumulative case count.*
- `Cumulative_cases` (Integer): cumulative confirmed cases reported to WHO to date.
- `New_deaths` (Integer): new confirmed deaths, calculated by subtracting the previous cumulative deaths from the current cumulative deaths.*
- `Cumulative_deaths` (Integer): cumulative confirmed deaths reported to WHO to date.
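The `New_cases` definition above (current cumulative count minus the previous one) can be reproduced with `groupby` plus `diff`. This is a minimal sketch on a made-up toy frame, assuming rows can be ordered by `Date_reported` within each country; the column name `New_cases_derived` is hypothetical.

```python
import pandas as pd

# Toy data (made up, not from the WHO file): two countries, three reporting days each
toy = pd.DataFrame({
    "Country": ["A", "A", "A", "B", "B", "B"],
    "Date_reported": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"] * 2),
    "Cumulative_cases": [1, 4, 10, 0, 2, 2],
})

# Order rows by date within each country before differencing
toy = toy.sort_values(["Country", "Date_reported"])

# New cases = current cumulative count minus the previous one, per country.
# The first day has no previous value, so it falls back to the cumulative count itself.
toy["New_cases_derived"] = (
    toy.groupby("Country")["Cumulative_cases"]
    .diff()
    .fillna(toy["Cumulative_cases"])
    .astype(int)
)
print(toy)
```

Differencing inside `groupby("Country")` keeps one country's last value from leaking into the next country's first row.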

In [None]:
df.describe()

In [None]:
df.info()

In [None]:
df.shape

In [None]:
df.describe(include='all')

## Initial Notes

- At the time of writing, the dataset comprised 254064 rows and 8 columns (the WHO file is updated regularly, so these figures may change).
- Only one column (`Country_code`) has missing values, about `1142` of them.
- `Date_reported` is stored as object (string) dtype rather than datetime, and it has about 1072 unique values.
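The `Country_code` gap noted above may be partly a parsing artefact rather than truly missing data (an assumption worth checking): Namibia's ISO Alpha-2 code is the string "NA", which `pandas.read_csv` treats as a missing value by default. A minimal sketch on a made-up two-row CSV:

```python
import io
import pandas as pd

# A tiny CSV in the same shape as the WHO file (values are made up)
csv_text = """Country_code,Country,New_cases
NA,Namibia,5
NG,Nigeria,7
"""

# Default parsing: "NA" is in pandas' built-in list of missing-value markers
default = pd.read_csv(io.StringIO(csv_text))
print(default["Country_code"].isna().sum())  # 1

# Keep "NA" as a literal code and treat only empty fields as missing
fixed = pd.read_csv(io.StringIO(csv_text), keep_default_na=False, na_values=[""])
print(fixed["Country_code"].isna().sum())  # 0
```

To apply the same idea to the WHO file, one could pass `keep_default_na=False, na_values=['']` to the `pd.read_csv` call above; whether that explains all of the missing codes would still need to be checked.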

## Objectives
I will work with the following columns to answer these questions:
- Which `countries` have the highest number of new cases and deaths?
- Which `countries` have the highest number of cumulative cases and deaths?
- Which `WHO_region` (each roughly corresponding to a continent) is the most affected?
- Using vaccination information, what are the likely reasons why some countries have the most deaths?

## Removing Unwanted Columns and Converting the date from object to datetime format
`Country_code` carries the same information as `Country`, so I will drop it. I will also convert the `Date_reported` column from object to datetime format so that I can extract the year, which is the only part of the date I need.
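A small sketch of the conversion, on made-up values, showing what `errors='coerce'` does: a string that cannot be parsed becomes `NaT` instead of stopping the notebook, and `.dt.year` then yields a missing year for that row.

```python
import pandas as pd

# Made-up raw values as read_csv would return them (strings), plus one malformed entry
raw = pd.Series(["2020-01-03", "2021-06-15", "not a date"])

# errors="coerce" turns unparseable strings into NaT instead of raising an exception
dates = pd.to_datetime(raw, errors="coerce")

# .dt.year extracts the year; NaT rows become NaN (so the column is float, not int)
years = dates.dt.year
print(years.tolist())
```

Checking `df['Date_reported'].isna().sum()` after the conversion is a quick way to see whether any dates were silently coerced.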

In [None]:
df['Date_reported'] = pd.to_datetime(df['Date_reported'], errors='coerce')


In [None]:
df['year'] = df['Date_reported'].dt.year

In [None]:
df.drop(['Country_code','Date_reported'],axis=1,inplace=True)
df.head(10)

In [None]:
df['Country'].unique()

## Analysis Of Data

Filtering the data to better understand and answer the questions.

In this analysis, the data will be sorted from the country with the most COVID-19 cases to the country with the fewest. As we continue, we will identify the countries with the highest number of deaths, as well as the countries in our study that did the best job of preventing the epidemic from spreading.

### Countries with the most cumulative COVID-19 cases

Here the countries are ranked from the one with the most COVID-19 cases to the one with the fewest. We will then look at the reasons behind each country's high numbers.
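Because `Cumulative_cases` is already a running total, adding it up over every reporting date counts the same cases again on every day. The toy example below (made-up numbers for a single hypothetical country "X") shows that taking the largest value per country, or summing `New_cases`, gives the actual total.

```python
import pandas as pd

# Made-up daily new cases for one country over four days
toy = pd.DataFrame({
    "Country": ["X"] * 4,
    "New_cases": [5, 3, 0, 2],
})
toy["Cumulative_cases"] = toy.groupby("Country")["New_cases"].cumsum()  # 5, 8, 8, 10

summed = toy.groupby("Country")["Cumulative_cases"].sum()  # adds up running totals
latest = toy.groupby("Country")["Cumulative_cases"].max()  # last value of a non-decreasing series
from_new = toy.groupby("Country")["New_cases"].sum()       # total of the daily increments

print(summed["X"], latest["X"], from_new["X"])  # 31 10 10
```

`max` assumes the running total never decreases; WHO data revisions can occasionally lower it, in which case the value on the latest `Date_reported` is the safer choice.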

In [None]:
# max = latest running total per country (summing it over dates would double count)
covid_Cumulative_cases = pd.DataFrame(df.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=False,axis=0)

covid_Cumulative_cases.head()

In [None]:
covid_Cumulative_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases (Millions)',title = 'Cumulative Covid19 Cases')
plt.show()


### Countries with the fewest COVID-19 cases

In [None]:
# max = latest running total per country (summing it over dates would double count)
covid_Cumulative_least_cases = pd.DataFrame(df.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=True,axis=0)

covid_Cumulative_least_cases.head(20)

In [None]:
covid_Cumulative_least_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases (Millions)',title = 'No of Cumulative cases')
plt.show()


In [None]:
covid_new_cases = pd.DataFrame(df.groupby("Country")['New_cases']
                               .agg('sum')).sort_values(by='New_cases',ascending=False,axis=0)

covid_new_cases.head()

In [None]:
covid_new_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of New_cases',title = 'New Covid19 cases Recorded', figsize=(15,10))
plt.show()

In [None]:
covid_least_new_cases = pd.DataFrame(df.groupby("Country")['New_cases']
                               .agg('sum')).sort_values(by='New_cases',ascending=True,axis=0)

covid_least_new_cases.head(10)

In [None]:
covid_least_new_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of New_cases',title = 'Fewest New Covid19 cases Recorded', figsize=(15,10))
plt.show()

In [None]:
covid_new_death = pd.DataFrame(df.groupby("Country")['New_deaths']
                               .agg('sum')).sort_values(by='New_deaths',ascending=False,axis=0)

covid_new_death.head()

In [None]:
covid_new_death.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of New_Death',title = 'Total Number of deaths')
plt.show()

In [None]:
covid_new_least_death = pd.DataFrame(df.groupby("Country")['New_deaths']
                               .agg('sum')).sort_values(by='New_deaths',ascending=True,axis=0)

covid_new_least_death.head(10)

In [None]:
covid_new_least_death.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of New_Death',title = 'Fewest Total deaths')
plt.show()

In [None]:
# max = latest running total per country (summing it over dates would double count)
covid_Cumulative_deaths = pd.DataFrame(df.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=False,axis=0)

covid_Cumulative_deaths.head()

In [None]:
covid_Cumulative_deaths.head(15).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative death',title = 'No of Cumulative deaths')
plt.show()

In [None]:
# max = latest running total per country (summing it over dates would double count)
covid_Cumulative_least_deaths = pd.DataFrame(df.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=True,axis=0)

covid_Cumulative_least_deaths.head(10)

In [None]:
covid_Cumulative_least_deaths.head(15).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative death',title = 'Cumulative deaths')
plt.show()

In [None]:
# max = running total at the end of each year (summing running totals would double count)
covid_cum_new_year = pd.DataFrame(df.groupby(["Country",'year'])['Cumulative_cases']
                               .agg('max')).sort_values(by=['year', 'Cumulative_cases'],ascending=False,axis=0)

covid_cum_new_year.head(10)

In [None]:
# max = running total at the end of each year (summing running totals would double count)
covid_cum_new_year = pd.DataFrame(df.groupby(["Country",'year'])['Cumulative_cases']
                               .agg('max')).sort_values(by=['year', 'Cumulative_cases'],ascending=True,axis=0)

covid_cum_new_year.head(10)

In [None]:
covid_New_cases_year = pd.DataFrame(df.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=False,axis=0)

covid_New_cases_year.head(10)

In [None]:
covid_New_cases_year = pd.DataFrame(df.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=True,axis=0)

covid_New_cases_year.head(10)

In [None]:
# max = running total at the end of each year (summing running totals would double count)
covid_cum_death_cases_year = pd.DataFrame(df.groupby(["Country",'year'])['Cumulative_deaths']
                               .agg('max')).sort_values(by=['year', 'Cumulative_deaths'],ascending=False,axis=0)

covid_cum_death_cases_year.head(10)

In [None]:
covid_new_death_cases_year = pd.DataFrame(df.groupby(["Country",'year'])['New_deaths']
                               .agg('sum')).sort_values(by=['year', 'New_deaths'],ascending=False,axis=0)

covid_new_death_cases_year.head(10)
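The per-year tables above involve two kinds of columns. A sketch on toy data (made-up numbers for a hypothetical country "X") of how each might be aggregated per year: `New_deaths` is summed, whereas a running total such as `Cumulative_deaths` is better summarised by its largest value within the year, which is the year-end total if the series never decreases.

```python
import pandas as pd

# Made-up daily rows for one country spanning two years
toy = pd.DataFrame({
    "Country": ["X"] * 4,
    "year": [2020, 2020, 2021, 2021],
    "New_deaths": [1, 2, 4, 3],
})
toy["Cumulative_deaths"] = toy.groupby("Country")["New_deaths"].cumsum()  # 1, 3, 7, 10

# Named aggregation: one output column per (source column, function) pair
yearly = toy.groupby(["Country", "year"]).agg(
    New_deaths=("New_deaths", "sum"),                # deaths reported during that year
    Cumulative_deaths=("Cumulative_deaths", "max"),  # running total at the end of that year
)
print(yearly)
```

For each year, the year-end running total minus the previous year-end total equals that year's summed new deaths, which is a handy consistency check.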

In [None]:
cases_2020 = df[df['year']== 2020]

cases_2020.head()

In [None]:
covid_New_cases_year_cases_2020 = pd.DataFrame(cases_2020.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=False,axis=0)

covid_New_cases_year_cases_2020.head(10)

In [None]:
covid_New_cases_year_cases_2020 = pd.DataFrame(cases_2020.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=True,axis=0)

covid_New_cases_year_cases_2020.head(10)

In [None]:
cases_2021 = df[df['year']== 2021]

cases_2021.head()

In [None]:
covid_New_cases_year_cases_2021 = pd.DataFrame(cases_2021.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=False,axis=0)

covid_New_cases_year_cases_2021.head(10)

In [None]:
covid_New_cases_year_cases_2021 = pd.DataFrame(cases_2021.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=True,axis=0)

covid_New_cases_year_cases_2021.head(10)

In [None]:
cases_2022 = df[df['year']== 2022]

cases_2022.head()

In [None]:
covid_New_cases_year_cases_2022 = pd.DataFrame(cases_2022.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=False,axis=0)

covid_New_cases_year_cases_2022.head(10)

In [None]:
covid_New_cases_year_cases_2022 = pd.DataFrame(cases_2022.groupby(["Country",'year'])['New_cases']
                               .agg('sum')).sort_values(by=['year', 'New_cases'],ascending=True,axis=0)

covid_New_cases_year_cases_2022.head(10)
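The 2020, 2021 and 2022 cells above repeat the same filter, group and sort steps once per year. A sketch of one alternative on made-up data: group once, then take the top and bottom `n` countries for every year in a loop. The names `top_by_year` and `bottom_by_year` are illustrative only.

```python
import pandas as pd

# Made-up rows: country, year and new cases
toy = pd.DataFrame({
    "Country": ["A", "B", "C", "A", "B", "C"],
    "year": [2020, 2020, 2020, 2021, 2021, 2021],
    "New_cases": [10, 30, 20, 5, 1, 50],
})

# Total new cases per (year, country)
totals = toy.groupby(["year", "Country"])["New_cases"].sum()

n = 2  # how many countries to keep per year
top_by_year = {
    year: group.droplevel("year").nlargest(n)
    for year, group in totals.groupby(level="year")
}
bottom_by_year = {
    year: group.droplevel("year").nsmallest(n)
    for year, group in totals.groupby(level="year")
}

for year, top in top_by_year.items():
    print(year, top.to_dict())
```

With the real data, `toy` would be replaced by `df` and `n` by 10; a new year in the WHO file is then picked up without adding another cell.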

## Coronavirus Cases in different Continents
Which `WHO_region` (each roughly corresponding to a continent) is the most affected?
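Region totals need the same care as country totals: take each country's latest running total first, then add those up per region. A minimal sketch on made-up data, assuming the running totals never decrease so that `max` equals the latest value.

```python
import pandas as pd

# Made-up rows: two regions, three countries, two reporting days each
toy = pd.DataFrame({
    "WHO_region": ["AFRO", "AFRO", "AFRO", "AFRO", "EURO", "EURO"],
    "Country": ["A", "A", "B", "B", "C", "C"],
    "Cumulative_deaths": [1, 3, 2, 2, 5, 9],
})

# Step 1: latest running total per country (the max of a non-decreasing series)
per_country = toy.groupby(["WHO_region", "Country"])["Cumulative_deaths"].max()

# Step 2: add the country totals up within each region, largest first
per_region = per_country.groupby(level="WHO_region").sum().sort_values(ascending=False)
print(per_region)
```

Summing `New_deaths` per region directly is an equivalent cross-check, since each country's new deaths add up to its running total.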

In [None]:
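# latest running total (max) per country, then summed within each WHO region
covid_region_Cumulative_deaths = pd.DataFrame(df.groupby(["WHO_region", "Country"])['Cumulative_deaths']
                               .agg('max').groupby(level="WHO_region").sum()).sort_values(by='Cumulative_deaths',ascending=False,axis=0)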

covid_region_Cumulative_deaths

In [None]:
covid_region_New_deaths = pd.DataFrame(df.groupby("WHO_region")['New_deaths']
                               .agg('sum')).sort_values(by='New_deaths',ascending=False,axis=0)

covid_region_New_deaths

In [None]:
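# latest running total (max) per country, then summed within each WHO region
covid_region_Cumulative_cases = pd.DataFrame(df.groupby(["WHO_region", "Country"])['Cumulative_cases']
                               .agg('max').groupby(level="WHO_region").sum()).sort_values(by='Cumulative_cases',ascending=False,axis=0)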

covid_region_Cumulative_cases

In [None]:
covid_region_New_cases = pd.DataFrame(df.groupby("WHO_region")['New_cases']
                               .agg('sum')).sort_values(by='New_cases',ascending=False,axis=0)

covid_region_New_cases

For both cumulative and new cases, the Americas region (AMRO) recorded more cases than the other regions, probably because of the high numbers recorded in the USA, while EURO recorded more cumulative and new deaths. It was observed in <a href='https://www.researchgate.net/publication/356481471_Estimates_of_global_SARS-CoV-2_infection_exposure_infection_morbidity_and_infection_mortality_rates_in_2020'>this article</a> that most of the deaths in the EURO region happened in 2022, when the new variant was circulating and restrictions had been lifted.

In [None]:
Covid_19 = df
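Each regional section repeats the same filter, group, sort and head steps. A sketch of a reusable helper for that pattern; `top_countries_in_region` is a hypothetical name, not part of pandas, and it ranks by the latest running total per country (`max`) rather than a sum over all dates.

```python
import pandas as pd

def top_countries_in_region(data, region, column, n=10):
    """Hypothetical helper: rank the countries of one WHO region by a cumulative column.

    Uses the largest value per country (the latest running total, assuming it
    never decreases) instead of summing the running total over every date.
    """
    subset = data[data["WHO_region"] == region]
    return (
        subset.groupby("Country")[column]
        .max()
        .sort_values(ascending=False)
        .head(n)
        .to_frame()
    )

# Usage on made-up data
toy = pd.DataFrame({
    "WHO_region": ["AFRO", "AFRO", "AFRO", "EURO"],
    "Country": ["A", "A", "B", "C"],
    "Cumulative_cases": [4, 7, 6, 100],
})
print(top_countries_in_region(toy, "AFRO", "Cumulative_cases", n=2))
```

With the notebook's data, a call such as `top_countries_in_region(Covid_19, 'AFRO', 'Cumulative_deaths')` would produce a table with the same shape as the per-region DataFrames below, ready for `.plot(kind='bar')`.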

## Coronavirus cases/deaths in Africa (AFRO)

In [None]:
africa_cases= Covid_19[Covid_19['WHO_region'] =='AFRO']
#  top ten most affected countries

# max = latest running total per country (summing it over dates would double count)
top_countries = pd.DataFrame(africa_cases.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=False,axis=0)


top_countries.head(10)

In [None]:
top_countries.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases',title = 'Cumulative cases')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
top_countries.head(10)['Cumulative_cases'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(10, 10),
    legend=False,
    )
plt.show()

In [None]:
africa_cases= Covid_19[Covid_19['WHO_region']=='AFRO']
# total cases in Africa: latest running total (max) per country, summed over countries
africa_total_cases = africa_cases.groupby('Country')['Cumulative_cases'].max().sum()

print('Africa has a cumulative total of about', africa_total_cases, 'cases')

## Cumulative Death Cases

In [None]:
# Top ten cases in Africa
africa_death_cases=Covid_19[Covid_19['WHO_region']=='AFRO']
# max = latest running total per country (summing it over dates would double count)
africa_death_cases=pd.DataFrame(africa_death_cases.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=False,axis=0)
africa_death_cases.head(10)

In [None]:
africa_death_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative death',title = 'Top ten death cases in Africa')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
africa_death_cases.head(10)['Cumulative_deaths'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(10, 10),
    legend=False,
    )
plt.show()

In [None]:
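africa_death_cases=Covid_19[Covid_19['WHO_region']=='AFRO']
# total deaths in Africa: latest running total (max) per country, summed over countries
africa_total_deaths = africa_death_cases.groupby('Country')['Cumulative_deaths'].max().sum()

print('Africa has a cumulative total of about', africa_total_deaths, 'deaths')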

Africa seems to have fared better than the rest of the world despite being the second most populous continent. Several reports discussed how the continent was able to curtail the virus: <a href='https://gdc.unicef.org/resource/coronavirus-africa-five-reasons-why-covid-19-has-been-less-deadly-elsewhere'> some said</a> it was better prepared, among many other reasons, while <a href='https://www.nytimes.com/2022/03/23/health/covid-africa-deaths.html'>others</a> believe the low figures are due to limited testing and the lack of a proper data bank.

In all of this, South Africa stands out as the hardest-hit country on the continent: it recorded the most cases and deaths in Africa. The spread of the virus accelerated in South Africa as the southern hemisphere went into winter, but as it became warmer the number of cases dropped significantly. This shaped the continental picture, since South Africa accounts for almost half of the total number of cases and deaths on the continent.
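A claim such as "South Africa accounts for almost half of the continent's cases and deaths" is a share of a regional total. A minimal sketch of that calculation on made-up numbers for four hypothetical countries; with the real data, `totals` would be the per-country latest cumulative values for one region (an assumption about how they were built).

```python
import pandas as pd

# Made-up latest cumulative deaths for four countries of one region
totals = pd.Series({"P": 40, "Q": 30, "R": 20, "S": 10}, name="Cumulative_deaths")

# Each country's share of the regional total, in percent, largest first
share_pct = (totals / totals.sum() * 100).round(1).sort_values(ascending=False)
print(share_pct)
```

The same percentages are what `autopct='%1.1f%%'` prints on the pie charts, but only relative to the countries included in the chart; computing them against the full regional total avoids overstating shares when only the top ten are plotted.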

## Coronavirus cases in the Americas (AMRO)

According to the results above, the United States not only had the greatest number of coronavirus cases in the Americas, but also ranked first globally. Let's now look at the cases in the rest of the region.

In [None]:
america_cases= Covid_19[Covid_19['WHO_region']=='AMRO']
#  top ten most affected countries

# max = latest running total per country (summing it over dates would double count)
america_cases = pd.DataFrame(america_cases.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=False,axis=0)
america_cases.head(10)

In [None]:
america_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases',title = 'Top ten affected countries(AMRO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
america_cases.head(10)['Cumulative_cases'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(10, 10),
    legend=False,
    )
plt.show()

The United States of America (USA) has the most cases, as was demonstrated above.

In [None]:
america_cases= Covid_19[Covid_19['WHO_region']=='AMRO']
# total cases in the Americas: latest running total (max) per country, summed over countries
print('The Americas have a cumulative total of about', america_cases.groupby('Country')['Cumulative_cases'].max().sum(), 'cases')

## Cumulative Death Cases

In [None]:
# Top ten countries by cumulative deaths in the Americas
america_death_cases=Covid_19[Covid_19['WHO_region']=='AMRO']

# max = latest running total per country (summing it over dates would double count)
america_death_cases = pd.DataFrame(america_death_cases.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=False,axis=0)

america_death_cases.head(10)

In [None]:
america_death_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative death',title = 'Top ten affected countries(AMRO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
america_death_cases.head(10)['Cumulative_deaths'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(10, 10),
    legend=False,
    )
plt.show()

In [None]:
america_cases= Covid_19[Covid_19['WHO_region']=='AMRO']
# total deaths in the Americas: latest running total (max) per country, summed over countries
print('The Americas have a cumulative total of about', america_cases.groupby('Country')['Cumulative_deaths'].max().sum(), 'deaths')

## Coronavirus cases in South-East Asia (SEARO)

The WHO South-East Asia Region (SEARO) is composed of eleven countries of impressive diversity in religion, culture and history: Bangladesh, Bhutan, Democratic People's Republic of Korea, India, Indonesia, Maldives, Myanmar, Nepal, Sri Lanka, Thailand and Timor-Leste.

In [None]:
southeast_cases= Covid_19[Covid_19['WHO_region']=='SEARO']
#  top ten most affected countries
# max = latest running total per country (summing it over dates would double count)
southeast_cases = pd.DataFrame(southeast_cases.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=False,axis=0)
southeast_cases.head()

In [None]:
southeast_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases',title = 'Top ten affected countries(SEARO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
southeast_cases.head(8)['Cumulative_cases'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
southeast_cases= Covid_19[Covid_19['WHO_region']=='SEARO']
# total cases in South-East Asia: latest running total (max) per country, summed over countries

print('South-East Asia has a cumulative total of about', southeast_cases.groupby('Country')['Cumulative_cases'].max().sum(), 'cases')

## Cumulative Deaths

In [None]:
# Top ten cases in South east Asia
southeast_death_cases=Covid_19[Covid_19['WHO_region']=='SEARO']
# max = latest running total per country (summing it over dates would double count)
southeast_death_cases = pd.DataFrame(southeast_death_cases.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=False,axis=0)

southeast_death_cases.head(10)

In [None]:
southeast_death_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative deaths',title = 'Top ten affected countries(SEARO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
southeast_death_cases.head(5)['Cumulative_deaths'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
southeast_death_cases= Covid_19[Covid_19['WHO_region']=='SEARO']
# total deaths in South-East Asia: latest running total (max) per country, summed over countries

print('South-East Asia has a cumulative total of about', southeast_death_cases.groupby('Country')['Cumulative_deaths'].max().sum(), 'deaths')

## Coronavirus cases in Europe (EURO)

In [None]:
europe_cases = Covid_19[Covid_19['WHO_region']=='EURO']
#  top ten most affected countries
# max = latest running total per country (summing it over dates would double count)
europe_cases = pd.DataFrame(europe_cases.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=False,axis=0)
europe_cases.head()

In [None]:
europe_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases',title = 'Top ten affected countries(EURO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
europe_cases.head(10)['Cumulative_cases'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
europe_cases = Covid_19[Covid_19['WHO_region']=='EURO']
# total cases in Europe: latest running total (max) per country, summed over countries

print('Europe has a cumulative total of about', europe_cases.groupby('Country')['Cumulative_cases'].max().sum(), 'cases')

## Cumulative Deaths

In [None]:
europe_deaths_cases = Covid_19[Covid_19['WHO_region']=='EURO']
# top countries by cumulative deaths in Europe
# max = latest running total per country (summing it over dates would double count)
europe_deaths_cases = pd.DataFrame(europe_deaths_cases.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=False,axis=0)

In [None]:
europe_deaths_cases.head(10)

In [None]:
europe_deaths_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative deaths',title = 'Top ten affected countries(EURO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
europe_deaths_cases.head(10)['Cumulative_deaths'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
europe_deaths_cases = Covid_19[Covid_19['WHO_region']=='EURO']
# total deaths in Europe: latest running total (max) per country, summed over countries

print('Europe has a cumulative total of about', europe_deaths_cases.groupby('Country')['Cumulative_deaths'].max().sum(), 'deaths')

In 2021, after the lockdown, Russia had the world's second-highest COVID-19 death toll, behind the United States. <a href='https://www.rand.org/pubs/research_briefs/RB5056.html'>Many attributed</a> this to increased deaths from accidents, injuries, violence and other preventable causes, and suggested that broader public health education, similar to that in the United States, could help Russians improve their health behaviours, while <a href='https://www.reuters.com/world/europe/russias-covid-19-death-toll-climbs-worlds-second-highest-2021-12-30/'>others</a> blamed it on the vaccination rate.

## Coronavirus cases in the Eastern Mediterranean (EMRO)

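Geographically, the Eastern Mediterranean includes Cyprus, Turkey (Anatolia) and its Hatay Province, the Greek Dodecanese islands, and the countries of Lebanon, Syria, Palestine, Israel, Jordan and Egypt. WHO's Eastern Mediterranean Region (EMRO) is defined differently, however: it includes countries such as Egypt, Iran, Iraq, Jordan, Lebanon, Pakistan, Saudi Arabia and Syria, while Cyprus, Turkey and Israel belong to the European Region (EURO).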

In [None]:
emro_cases = Covid_19[Covid_19['WHO_region']=='EMRO']
#  top ten most affected countries
# max = latest running total per country (summing it over dates would double count)
emro_cases = pd.DataFrame(emro_cases.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=False,axis=0)

In [None]:
emro_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases',title = 'Top ten affected countries(EMRO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
emro_cases.head(10)['Cumulative_cases'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
emro_cases= Covid_19[Covid_19['WHO_region']=='EMRO']
# total cases in the Eastern Mediterranean: latest running total (max) per country, summed over countries

print('The Eastern Mediterranean has a cumulative total of about', emro_cases.groupby('Country')['Cumulative_cases'].max().sum(), 'cases')

## Cumulative Deaths

In [None]:
emro_death_cases = Covid_19[Covid_19['WHO_region']=='EMRO']
# top countries by cumulative deaths in the Eastern Mediterranean
# max = latest running total per country (summing it over dates would double count)
emro_death_cases = pd.DataFrame(emro_death_cases.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=False,axis=0)

In [None]:
emro_death_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative deaths',title = 'Top ten affected countries(EMRO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
emro_death_cases.head(10)['Cumulative_deaths'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
emro_death_cases = Covid_19[Covid_19['WHO_region']=='EMRO']
# total deaths in the Eastern Mediterranean: latest running total (max) per country, summed over countries

print('The Eastern Mediterranean has a cumulative total of about', emro_death_cases.groupby('Country')['Cumulative_deaths'].max().sum(), 'deaths')

`Iran` seems to have fared worst among the countries of the Eastern Mediterranean Region. Fear that the economy would collapse if restrictions were put in place is said to be one of the major factors behind the spread of the virus. Iran has a large population, many of whom are poor. <a href='https://www.atlanticcouncil.org/blogs/iransource/irans-covid-19-deaths-2-5-times-higher-than-healthy-ministry-numbers/'> Iranians</a> also attribute inadequate testing and a slow vaccine rollout to policies ordered by Supreme Leader Ayatollah Ali Khamenei. Despite all warnings, the regime has told Iranians to return to normal life, risking a new rise in cases as schools reopen and workers return to their offices.

## Coronavirus cases in the Western Pacific (WPRO)

Countries in the Western Pacific include American Samoa, Australia, Brunei Darussalam, Cambodia, China, Cook Islands, Fiji, Guam, Hong Kong, Japan, Kiribati, Korea (Republic of), Lao People’s Democratic Republic, Macau, Malaysia, Marshall Islands, Micronesia (Federated States of), Mongolia, Nauru, New Zealand, Niue, Northern Mariana Islands, Palau, Papua New Guinea, Philippines, Samoa, Singapore, Solomon Islands, Taiwan, Tokelau, Tonga, Tuvalu, Vanuatu and Viet Nam.

In [None]:
wpro_cases = Covid_19[Covid_19['WHO_region']=='WPRO']
#  top ten most affected countries
# max = latest running total per country (summing it over dates would double count)
wpro_cases = pd.DataFrame(wpro_cases.groupby("Country")['Cumulative_cases']
                               .agg('max')).sort_values(by='Cumulative_cases',ascending=False,axis=0)

In [None]:
wpro_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative cases',title = 'Top ten affected countries(WPRO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
wpro_cases.head(10)['Cumulative_cases'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
wpro_cases= Covid_19[Covid_19['WHO_region']=='WPRO']
# total cases in the Western Pacific: latest running total (max) per country, summed over countries

print('The Western Pacific has a cumulative total of about', wpro_cases.groupby('Country')['Cumulative_cases'].max().sum(), 'cases')

## Cumulative Deaths

In [None]:
wpro_death_cases = Covid_19[Covid_19['WHO_region']=='WPRO']
# top countries by cumulative deaths in the Western Pacific
# max = latest running total per country (summing it over dates would double count)
wpro_death_cases = pd.DataFrame(wpro_death_cases.groupby("Country")['Cumulative_deaths']
                               .agg('max')).sort_values(by='Cumulative_deaths',ascending=False,axis=0)

In [None]:
wpro_death_cases.head(10).plot(kind='bar',  xlabel = 'Country', ylabel = 'No. of Cumulative deaths',title = 'Top ten affected countries(WPRO)')
plt.show()

In [None]:
# plot the single column as a Series so the slices are labelled with country names
wpro_death_cases.head(10)['Cumulative_deaths'].plot(
    kind='pie',
    autopct='%1.1f%%',
    figsize=(15, 10),
    legend=False,
    )
plt.show()

In [None]:
wpro_death_cases = Covid_19[Covid_19['WHO_region'] == 'WPRO']
# Regional total of deaths: add up each country's latest cumulative count
wpro_total_deaths = wpro_death_cases.groupby('Country')['Cumulative_deaths'].max().sum()
print('Western Pacific has a total of about', wpro_total_deaths, 'cumulative deaths')
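The filter, group and sort steps above are repeated for cases and deaths (and for each region earlier in the notebook). A minimal sketch of a reusable helper, assuming the `WHO_region`, `Country` and cumulative columns used above; `top_countries` is a hypothetical name, not part of the original notebook:

```python
import pandas as pd

def top_countries(df: pd.DataFrame, region: str, column: str, n: int = 10) -> pd.Series:
    """Return the n countries in `region` with the largest latest value of `column`.

    Assumes `column` is a running total, so each country's maximum is its latest count.
    """
    in_region = df[df["WHO_region"] == region]
    per_country = in_region.groupby("Country")[column].max()
    return per_country.sort_values(ascending=False).head(n)

# Tiny made-up frame to show the call (not WHO data)
sample = pd.DataFrame({
    "WHO_region": ["WPRO", "WPRO", "WPRO", "EMRO"],
    "Country": ["X", "X", "Y", "Z"],
    "Cumulative_deaths": [3, 7, 5, 100],
})
top = top_countries(sample, "WPRO", "Cumulative_deaths", n=2)
print(top)
```

In the notebook, `top_countries(Covid_19, 'WPRO', 'Cumulative_deaths').plot(kind='bar')` would be expected to draw the same kind of bar chart as above.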

The Republic of Korea (South Korea) was one of the first countries to experience a COVID-19 outbreak, and it handled it quite well, as its relatively low number of cumulative deaths shows; within two months it was <a href='https://direct.mit.edu/asep/article/20/2/41/97312/COVID-19-in-Korea-Success-Based-on-Past-Failure'>reportedly able to stop the mass transmission of the virus</a>.

The Philippines, a country of about 111 million people, reported more deaths. It was <a href='https://www.aljazeera.com/news/2022/1/18/philippine-covid-deaths-may-be-double-than-current-number-report'>reported</a> that the rise in deaths was due to underlying health conditions such as heart disease, stroke, cancer and respiratory diseases like tuberculosis, which were exacerbated by COVID-19.
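Comparing raw death counts between countries of very different sizes can mislead; dividing by population puts them on the same scale. A minimal sketch of a per-100,000 rate, using made-up figures (not the real totals for the Philippines, Korea or any other country):

```python
def per_100k(count: int, population: int) -> float:
    """Rate per 100,000 people (multiply first to keep the arithmetic exact for whole numbers)."""
    return count * 100_000 / population

# Hypothetical numbers for illustration only
big_country = per_100k(count=50_000, population=100_000_000)   # 50.0 per 100k
small_country = per_100k(count=30_000, population=50_000_000)  # 60.0 per 100k
print(big_country, small_country)
```

Here the smaller country reports fewer deaths in absolute terms but has the higher rate.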

`Australia` had the fewest deaths of the ten countries shown (7,809). This can be attributed to the Australian government's prompt action: restrictions were imposed as soon as the first coronavirus case was reported. Lockdowns and limits on travel and movement greatly slowed the spread of the virus. Among the restrictions implemented were a 150-person limit for nightclubs and places of worship and a requirement that venues allow at least two square metres of space per person. Cinemas and stadiums were subject to several limitations, including a 50% capacity limit. For more information, please visit

`The USA` seems to have the highest number of deaths again, followed closely by Brazil this time and then India. The list of countries with the most deaths differs somewhat from the list with the most new cases. The first and most important factor behind the high numbers in the USA is the <a href='https://www.cbsnews.com/news/us-covid-19-cases-testing/'>tardy response</a>: the US government was slow to respond, which allowed the virus to spread further, and slow to organize its health-care response, restrict travel and roll out COVID-19 testing. In addition, much of the public was unaware of the measures intended to stop transmission, such as wearing face coverings, and some states lacked the testing infrastructure needed to test their residents thoroughly. You can <a href='https://www.science.org/content/article/united-states-leads-coronavirus-cases-not-pandemic-response'>read more</a> about it.


While the WHO database shows no reported cases for the Democratic People's Republic of Korea, <a href='https://reliefweb.int/report/democratic-peoples-republic-korea/covid-19-north-korea-state-fuelled-tragedy'>news reports</a> and official releases so far suggest a national crisis in the country, so the two sources appear to disagree. Meanwhile, according to the <a href='https://covid19.who.int/region/wpro/country/tk'>WHO</a> database, Tokelau, with a population of about 1,398 according to <a href='https://www.worldometers.info/world-population/tokelau-population/'>Worldometer</a>, has been largely vaccinated and has kept the virus to a bare minimum.

Countries like `India` and `Brazil` were hit very hard at the beginning of the pandemic because of their large populations, and the situation was so severe that <a href='https://www.washingtonpost.com/world/2021/04/27/india-covid-surge-faq/'>WHO declared them an epicentre</a>.

`In 2020`, at the beginning of the pandemic, before proper measures had been put in place, the most heavily hit countries included the USA, India, Brazil, Russia, France, the UK, Turkey, Italy, Spain and Germany. `In 2021` the same countries were still in the top 10, but Germany moved a few places higher for several reasons: a large number of unvaccinated people, waning immunity, fewer contact restrictions, a more infectious new variant and seasonal effects (more about this can be read <a href='https://www.dw.com/en/5-reasons-germanys-covid-19-infections-are-soaring/a-59793087'>here</a>). Russia, on the other hand, moved a few places down because of how it was able to restrict movement among its people.

Meanwhile, `In 2022` the USA is still number one in the world, followed by Germany, France, the Republic of Korea, Japan, Italy, Brazil, Russia, the UK and Australia. Every country has now opened up because of the confidence we all have in the vaccine. The number of COVID-19 deaths has come down, although cases are still on the rise, and the vaccine has helped mitigate the virus's effects.


## Looking at the Vaccination database

To investigate further why the USA has such high numbers of deaths and cases, I turn to the vaccination dataset.

In [None]:
Vaccinated =pd.read_csv('https://covid19.who.int/who-data/vaccination-data.csv')

Vaccinated.head()

## Data Description
* `COUNTRY` (String): Country, territory, area
* `ISO3` (String): ISO Alpha-3 country code
* `WHO_REGION` (String): WHO regional offices. WHO Member States are grouped into six WHO regions: Regional Office for Africa (AFRO), Regional Office for the Americas (AMRO), Regional Office for South-East Asia (SEARO), Regional Office for Europe (EURO), Regional Office for the Eastern Mediterranean (EMRO), and Regional Office for the Western Pacific (WPRO).
* `DATA_SOURCE` (String): Indicates the data source: REPORTING (data reported by Member States, or sourced from official reports) or OWID (data sourced from Our World in Data: https://ourworldindata.org/covid-vaccinations)
* `DATE_UPDATED` (Date): Date of last update
* `TOTAL_VACCINATIONS` (Integer): Cumulative total vaccine doses administered
* `PERSONS_VACCINATED_1PLUS_DOSE` (Decimal): Cumulative number of persons vaccinated with at least one dose
* `TOTAL_VACCINATIONS_PER100` (Integer): Cumulative total vaccine doses administered per 100 population
* `PERSONS_VACCINATED_1PLUS_DOSE_PER100` (Decimal): Cumulative persons vaccinated with at least one dose per 100 population
* `PERSONS_FULLY_VACCINATED` (Integer): Cumulative number of persons fully vaccinated
* `PERSONS_FULLY_VACCINATED_PER100` (Decimal): Cumulative number of persons fully vaccinated per 100 population
* `VACCINES_USED` (String): Combined short name of vaccine: “Company - Product name” (see below)
* `FIRST_VACCINE_DATE` (Date): Date of first vaccinations. Equivalent to the start/launch date of the first vaccine administered in a country.
* `NUMBER_VACCINES_TYPES_USED` (Integer): Number of vaccine types used per country, territory, area
* `PERSONS_BOOSTER_ADD_DOSE` (Integer): Persons who received a booster or additional dose
* `PERSONS_BOOSTER_ADD_DOSE_PER100` (Decimal): Persons who received a booster or additional dose per 100 population
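The types listed above can be enforced when the file is read, rather than fixed afterwards. A minimal sketch, using a two-row made-up CSV in the same layout in place of the WHO URL (in the notebook you would pass the URL instead of `io.StringIO(...)`; the country names and values are hypothetical):

```python
import io
import pandas as pd

# Two made-up rows in the same column layout as vaccination-data.csv
csv_text = """COUNTRY,WHO_REGION,TOTAL_VACCINATIONS,PERSONS_FULLY_VACCINATED,PERSONS_BOOSTER_ADD_DOSE,DATE_UPDATED
Country A,EURO,1000,400,,2022-03-01
Country B,WPRO,500,200,50,2022-06-30
"""

vacc = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["COUNTRY", "WHO_REGION", "TOTAL_VACCINATIONS",
             "PERSONS_FULLY_VACCINATED", "PERSONS_BOOSTER_ADD_DOSE", "DATE_UPDATED"],
    # Nullable integers keep a missing booster count as <NA>
    # instead of turning the whole column into floats
    dtype={"TOTAL_VACCINATIONS": "Int64",
           "PERSONS_FULLY_VACCINATED": "Int64",
           "PERSONS_BOOSTER_ADD_DOSE": "Int64"},
    parse_dates=["DATE_UPDATED"],
)
print(vacc.dtypes)
```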

In [None]:
Vaccinated.info()

I will work with just TOTAL_VACCINATIONS, PERSONS_FULLY_VACCINATED, PERSONS_BOOSTER_ADD_DOSE and DATE_UPDATED, since vaccination mainly reduces the severity of the infection rather than necessarily stopping people from contracting it.

In [None]:
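# Keep only the columns used below; .copy() avoids a SettingWithCopyWarning when 'year' is added later
Vaccinated = Vaccinated[['COUNTRY', 'WHO_REGION', 'TOTAL_VACCINATIONS', 'PERSONS_FULLY_VACCINATED', 'PERSONS_BOOSTER_ADD_DOSE', 'DATE_UPDATED']].copy()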

In [None]:
Vaccinated.head()

In [None]:
Vaccinated['DATE_UPDATED'] = pd.to_datetime(Vaccinated['DATE_UPDATED'], errors='coerce')

Vaccinated['year'] = Vaccinated['DATE_UPDATED'].dt.year
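Because `year` is derived from `DATE_UPDATED`, it records when a country's figures were last updated, not when the doses were given, and with `errors='coerce'` any date that fails to parse becomes `NaT`, so its `year` is missing and the row drops out of later `year == 2021` or `year == 2022` filters. A minimal sketch on made-up values:

```python
import pandas as pd

# Hypothetical last-update dates; the third one cannot be parsed
dates = pd.Series(["2021-12-15", "2022-06-30", "unknown"])

parsed = pd.to_datetime(dates, errors="coerce")  # "unknown" becomes NaT
year = parsed.dt.year                            # the NaT row gets a missing year

# dropna=False shows how many rows have no usable year
print(year.value_counts(dropna=False))
```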

In [None]:
Vaccinated.head()

In [None]:
Total_Vaccinated = pd.DataFrame(Vaccinated.groupby("COUNTRY")['TOTAL_VACCINATIONS']
                               .agg('sum')).sort_values(by='TOTAL_VACCINATIONS',ascending=False,axis=0)

Total_Vaccinated.head()
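If, as in the WHO snapshot file, each country appears only once, the group-and-sum above mainly serves to index the values by country. `Series.nlargest` is a shorter way to get the same top-N list; a sketch with made-up values:

```python
import pandas as pd

# Hypothetical snapshot with one row per country
snap = pd.DataFrame({
    "COUNTRY": ["A", "B", "C", "D"],
    "TOTAL_VACCINATIONS": [300, 1200, 50, 800],
})

# Same result as groupby("COUNTRY").sum() + sort_values(ascending=False).head(3)
# when every country has exactly one row
top3 = snap.set_index("COUNTRY")["TOTAL_VACCINATIONS"].nlargest(3)
print(top3)
```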

In [None]:
Total_Vaccinated.head(10).plot(kind='bar', xlabel='Country', ylabel='No. of vaccine doses administered', title='No. of Total Vaccinations (doses)')
plt.show()


In [None]:
Full_Vaccinated = pd.DataFrame(Vaccinated.groupby("COUNTRY")['PERSONS_FULLY_VACCINATED']
                               .agg('sum')).sort_values(by='PERSONS_FULLY_VACCINATED',ascending=False,axis=0)

Full_Vaccinated.head()

In [None]:
Full_Vaccinated.head(10).plot(kind='bar', xlabel='Country', ylabel='No. of people fully vaccinated', title='No. of People Fully Vaccinated')
plt.show()

In [None]:
PERSONS_BOOSTER_ADD_DOSE = pd.DataFrame(Vaccinated.groupby("COUNTRY")['PERSONS_BOOSTER_ADD_DOSE']
                               .agg('sum')).sort_values(by='PERSONS_BOOSTER_ADD_DOSE',ascending=False,axis=0)

PERSONS_BOOSTER_ADD_DOSE.head()

In [None]:
PERSONS_BOOSTER_ADD_DOSE.head(10).plot(kind='bar', xlabel='Country', ylabel='No. of persons with a booster/additional dose', title='No. of Persons with a Booster or Additional Dose')
plt.show()

In [None]:
TOTAL_VACCINATIONS_year = pd.DataFrame(Vaccinated.groupby(["COUNTRY",'year'])['TOTAL_VACCINATIONS']
                               .agg('sum')).sort_values(by=['year', 'TOTAL_VACCINATIONS'],ascending=False,axis=0)

TOTAL_VACCINATIONS_year.head(10)

In [None]:
PERSONS_BOOSTER_ADD_DOSE_year = pd.DataFrame(Vaccinated.groupby(["COUNTRY",'year'])['PERSONS_BOOSTER_ADD_DOSE']
                               .agg('sum')).sort_values(by=['year', 'PERSONS_BOOSTER_ADD_DOSE'],ascending=False,axis=0)

PERSONS_BOOSTER_ADD_DOSE_year.head(10)

In [None]:
PERSONS_FULLY_VACCINATED_year = pd.DataFrame(Vaccinated.groupby(["COUNTRY",'year'])['PERSONS_FULLY_VACCINATED']
                               .agg('sum')).sort_values(by=['year', 'PERSONS_FULLY_VACCINATED'],ascending=False,axis=0)

PERSONS_FULLY_VACCINATED_year.head(10)
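The cells in this part repeat the same group-and-sort for three columns, and the cells below repeat it again for 2021 and 2022. A minimal sketch of folding the per-year version into one hypothetical helper, `top_by_year`, assuming the `COUNTRY` and `year` columns built above:

```python
import pandas as pd

def top_by_year(df: pd.DataFrame, column: str, year: int, n: int = 10) -> pd.Series:
    """Return the n countries with the largest `column` among rows from `year`."""
    subset = df[df["year"] == year]
    # min_count=1 leaves a country with only missing values as NaN instead of 0
    totals = subset.groupby("COUNTRY")[column].sum(min_count=1)
    # NaN values are placed last when sorting
    return totals.sort_values(ascending=False).head(n)

# Made-up rows to show the call (not WHO data)
demo = pd.DataFrame({
    "COUNTRY": ["A", "B", "C"],
    "year": [2022, 2022, 2021],
    "PERSONS_FULLY_VACCINATED": [10, 30, 5],
})
result = top_by_year(demo, "PERSONS_FULLY_VACCINATED", 2022)
print(result)
```

In the notebook, looping over `['TOTAL_VACCINATIONS', 'PERSONS_FULLY_VACCINATED', 'PERSONS_BOOSTER_ADD_DOSE']` and calling `top_by_year(Vaccinated, col, 2022)` would cover the three 2022 cells in one place.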

In [None]:
Vaccinated['year'].unique()

In [None]:
Vaccinated_cases_2021 = Vaccinated[Vaccinated['year']== 2021]

Vaccinated_cases_2021.head()

In [None]:
TOTAL_VACCINATIONS_year_2021 = pd.DataFrame(Vaccinated_cases_2021.groupby(["COUNTRY",'year'])['TOTAL_VACCINATIONS']
                               .agg('sum')).sort_values(by=['year', 'TOTAL_VACCINATIONS'],ascending=False,axis=0)

TOTAL_VACCINATIONS_year_2021.head(10)

In [None]:
PERSONS_BOOSTER_ADD_DOSE_year_2021 = pd.DataFrame(Vaccinated_cases_2021.groupby(["COUNTRY",'year'])['PERSONS_BOOSTER_ADD_DOSE']
                               .agg('sum')).sort_values(by=['year', 'PERSONS_BOOSTER_ADD_DOSE'],ascending=False,axis=0)

PERSONS_BOOSTER_ADD_DOSE_year_2021.head(10)

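#### The 2021 subset shows no booster doses. Since `year` comes from `DATE_UPDATED`, this subset only holds countries whose figures were last updated in 2021, so it does not show that no boosters were given that year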
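A booster total of zero here can also come from missing data: by default pandas' `sum` returns 0 for a group whose values are all missing, unless `min_count` is set. A small sketch with hypothetical rows where booster counts were simply not reported:

```python
import numpy as np
import pandas as pd

# Hypothetical 2021 rows with no booster figures reported
rows = pd.DataFrame({
    "COUNTRY": ["A", "B"],
    "PERSONS_BOOSTER_ADD_DOSE": [np.nan, np.nan],
})

# Default: an all-missing group sums to 0
default_sum = rows.groupby("COUNTRY")["PERSONS_BOOSTER_ADD_DOSE"].sum()
# min_count=1: an all-missing group stays NaN, so "no data" is distinguishable from zero
strict_sum = rows.groupby("COUNTRY")["PERSONS_BOOSTER_ADD_DOSE"].sum(min_count=1)

print(default_sum)
print(strict_sum)
```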

In [None]:
PERSONS_FULLY_VACCINATED_year_2021 = pd.DataFrame(Vaccinated_cases_2021.groupby(["COUNTRY",'year'])['PERSONS_FULLY_VACCINATED']
                               .agg('sum')).sort_values(by=['year', 'PERSONS_FULLY_VACCINATED'],ascending=False,axis=0)

PERSONS_FULLY_VACCINATED_year_2021.head(10)

In the 2021 subset, Bonaire, Sint Eustatius and Saba and Saint Helena are the only places that report fully vaccinated people.

In [None]:
Vaccinated_cases_2022 = Vaccinated[Vaccinated['year']== 2022]

Vaccinated_cases_2022.head()

In [None]:
TOTAL_VACCINATIONS_year_2022 = pd.DataFrame(Vaccinated_cases_2022.groupby(["COUNTRY",'year'])['TOTAL_VACCINATIONS']
                               .agg('sum')).sort_values(by=['year', 'TOTAL_VACCINATIONS'],ascending=False,axis=0)

TOTAL_VACCINATIONS_year_2022.head(10)

In [None]:
PERSONS_BOOSTER_ADD_DOSE_year_2022 = pd.DataFrame(Vaccinated_cases_2022.groupby(["COUNTRY",'year'])['PERSONS_BOOSTER_ADD_DOSE']
                               .agg('sum')).sort_values(by=['year', 'PERSONS_BOOSTER_ADD_DOSE'],ascending=False,axis=0)

PERSONS_BOOSTER_ADD_DOSE_year_2022.head(10)

#### Booster doses appear in the 2022 subset, as additional doses were needed to maintain immunity against the virus

In [None]:
PERSONS_FULLY_VACCINATED_year_2022 = pd.DataFrame(Vaccinated_cases_2022.groupby(["COUNTRY",'year'])['PERSONS_FULLY_VACCINATED']
                               .agg('sum')).sort_values(by=['year', 'PERSONS_FULLY_VACCINATED'],ascending=False,axis=0)

PERSONS_FULLY_VACCINATED_year_2022.head(10)

## Conclusion

In this analysis, we saw how China, despite its large population, was able to vaccinate most of its citizens, which helped reduce the number of COVID-19 cases and deaths, both recently and cumulatively; India's figures also look impressive.
We also know that the `USA` has the most cases worldwide.
`Brazil` has the most cases in South America.
`India` has the most cases in Asia.