The objective of this assignment is to apply Python data analysis skills using Pandas and NumPy to explore, filter, aggregate, and analyze real-world COVID-19 case data.

Assignment Details:
1. You are provided with the dataset country_wise_latest.csv (from Kaggle’s COVID-19 Dataset). Using classes and inheritance, build a Python program that implements the tasks listed under Requirements.
Dataset source: https://www.kaggle.com/datasets/imdevskp/corona-virus-report

Requirements:
1. Summarize Case Counts by Region
    Display total confirmed, death, and recovered cases for each region.
2. Filter Low Case Records
    Exclude entries where confirmed cases are < 10.
3. Identify Region with Highest Confirmed Cases
4. Sort Data by Confirmed Cases
    Save sorted dataset into a new CSV file.
5. Top 5 Countries by Case Count
6. Region with Lowest Death Count
7. India’s Case Summary (as of April 29, 2020)
8. Calculate Mortality Rate by Region
    Death-to-confirmed case ratio.
9. Compare Recovery Rates Across Regions
10. Detect Outliers in Case Counts
    Flag values outside mean ± 2 × standard deviation.
11. Group Data by Country and Region
12. Identify Regions with Zero Recovered Cases
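
The brief asks for classes and inheritance, while the cells that follow work on a bare DataFrame. One possible layout is sketched here under stated assumptions: the class names `CovidDataset` and `RegionAnalyzer` and their method names are hypothetical, and the in-memory sample reuses four rows from the notebook output so the sketch runs without the CSV file.

```python
import pandas as pd


class CovidDataset:
    """Base class (hypothetical name): holds the data and row-level filters."""

    def __init__(self, frame: pd.DataFrame):
        self.df = frame.copy()

    @classmethod
    def from_csv(cls, path: str) -> "CovidDataset":
        # read_csv already returns a DataFrame
        return cls(pd.read_csv(path))

    def filter_low_cases(self, minimum: int = 10) -> pd.DataFrame:
        # Keep only rows with at least `minimum` confirmed cases
        return self.df[self.df["Confirmed"] >= minimum]


class RegionAnalyzer(CovidDataset):
    """Subclass (hypothetical name): adds region-level aggregations."""

    def region_totals(self) -> pd.DataFrame:
        cols = ["Confirmed", "Deaths", "Recovered"]
        return self.df.groupby("WHO Region")[cols].sum()

    def region_with_highest_confirmed(self) -> str:
        return self.region_totals()["Confirmed"].idxmax()


# Four rows taken from the notebook output, used as a stand-in for the CSV
sample = pd.DataFrame({
    "Country/Region": ["US", "Brazil", "India", "Russia"],
    "Confirmed": [4290259, 2442375, 1480073, 816680],
    "Deaths": [148011, 87618, 33408, 13334],
    "Recovered": [1325804, 1846641, 951166, 602249],
    "WHO Region": ["Americas", "Americas", "South-East Asia", "Europe"],
})

analyzer = RegionAnalyzer(sample)
highest = analyzer.region_with_highest_confirmed()
kept_rows = len(analyzer.filter_low_cases())
print(highest, kept_rows)
```

With the real file, `RegionAnalyzer.from_csv('country_wise_latest.csv')` would build the same object, since the inherited classmethod constructs whichever class it is called on.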

In [1]:
import pandas as pd
import numpy as np

In [8]:
# read_csv already returns a DataFrame, so no extra wrapping is needed
df = pd.read_csv('country_wise_latest.csv')

Region-wise totals of confirmed, death, and recovered COVID-19 cases

In [12]:
df.groupby("WHO Region")[["Confirmed","Deaths","Recovered"]].sum()

WHO Region,Confirmed,Deaths,Recovered
Africa,723207,12223,440645
Americas,8839286,342732,4468616
Eastern Mediterranean,1490744,38339,1201400
Europe,3299523,211144,1993723
South-East Asia,1835297,41349,1156933
Western Pacific,292428,8249,206770


Records with the lowest confirmed COVID-19 case count:

In [14]:
# Select the row(s) whose confirmed count equals the dataset minimum
lowest_confirmed_case = df["Confirmed"].min()
df[df['Confirmed'] == lowest_confirmed_case]

,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
183,Western Sahara,10,1,8,1,0,0,0,10.0,80.0,12.5,10,0,0.0,Africa
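
Requirement 2 asks to exclude entries with fewer than 10 confirmed cases, which is a different operation from locating the minimum row. A minimal sketch of that filter follows. Since the minimum shown above is 10, the filter would not drop any row of this dataset, so the sample adds one invented row (`Hypothetical Island`) purely to make the exclusion visible.

```python
import pandas as pd

df_sample = pd.DataFrame({
    "Country/Region": ["Western Sahara", "South Africa", "Hypothetical Island"],
    # The first two values come from the notebook output; the last is invented
    "Confirmed": [10, 452529, 3],
})

# Boolean mask: keep rows with at least 10 confirmed cases
filtered = df_sample[df_sample["Confirmed"] >= 10]

# The complement, useful for checking what was dropped
excluded = df_sample[df_sample["Confirmed"] < 10]

print(filtered)
print(excluded)
```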


Region with the highest confirmed COVID-19 case count

In [15]:
# Sum confirmed cases per region, then take the label of the largest total
confirmed_by_region = df.groupby("WHO Region")["Confirmed"].sum()
confirmed_by_region.idxmax()

'Americas'

Sorted data saved to file - sorted_covid_case_results.csv

In [16]:
# Ascending sort by confirmed cases; index=False keeps the row index out of the file
sorted_df = df.sort_values(by=['Confirmed'])
sorted_df.to_csv("sorted_covid_case_results.csv", index=False)

Top 5 countries with the highest confirmed COVID-19 cases

In [17]:
sorted_df = df.sort_values(by=['Confirmed'], ascending=False)
sorted_df.head(5)

,Country/Region,Confirmed,Deaths,Recovered,Active,New cases,New deaths,New recovered,Deaths / 100 Cases,Recovered / 100 Cases,Deaths / 100 Recovered,Confirmed last week,1 week change,1 week % increase,WHO Region
173,US,4290259,148011,1325804,2816444,56336,1076,27941,3.45,30.9,11.16,3834677,455582,11.88,Americas
23,Brazil,2442375,87618,1846641,508116,23284,614,33728,3.59,75.61,4.74,2118646,323729,15.28,Americas
79,India,1480073,33408,951166,495499,44457,637,33598,2.26,64.26,3.51,1155338,324735,28.11,South-East Asia
138,Russia,816680,13334,602249,201097,5607,85,3077,1.63,73.74,2.21,776212,40468,5.21,Europe
154,South Africa,452529,7067,274925,170537,7096,298,9848,1.56,60.75,2.57,373628,78901,21.12,Africa
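
As an alternative to sorting the whole frame and taking `head(5)`, pandas offers `DataFrame.nlargest`, which selects the top rows directly. The sketch below assumes a small sample built from the confirmed counts shown in the notebook output, listed in scrambled order so the selection is visible.

```python
import pandas as pd

# Confirmed counts taken from the notebook output, deliberately unsorted
confirmed = pd.DataFrame({
    "Country/Region": ["Russia", "Western Sahara", "US", "South Africa", "India", "Brazil"],
    "Confirmed": [816680, 10, 4290259, 452529, 1480073, 2442375],
})

# Select the 5 rows with the largest "Confirmed" values, largest first
top5 = confirmed.nlargest(5, "Confirmed")
print(top5)
```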


Region with lowest death count

In [18]:
# Sum deaths per region, then take the label of the smallest total
deaths_by_region = df.groupby("WHO Region")["Deaths"].sum()
deaths_by_region.idxmin()

'Western Pacific'
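
`idxmin` and `idxmax` return only the index label. When the count itself should be reported alongside the region, the label can be fed back into `.loc`. The sketch below assumes the regional totals printed earlier in the notebook, rebuilt by hand so it runs without the CSV.

```python
import pandas as pd

# Regional totals copied from the groupby output earlier in the notebook
region_totals = pd.DataFrame(
    {
        "Confirmed": [723207, 8839286, 1490744, 3299523, 1835297, 292428],
        "Deaths": [12223, 342732, 38339, 211144, 41349, 8249],
    },
    index=pd.Index(
        ["Africa", "Americas", "Eastern Mediterranean",
         "Europe", "South-East Asia", "Western Pacific"],
        name="WHO Region",
    ),
)

# Label of the region with the fewest deaths, then its value
lowest_region = region_totals["Deaths"].idxmin()
lowest_count = int(region_totals.loc[lowest_region, "Deaths"])

# Same pattern for the region with the most confirmed cases
highest_region = region_totals["Confirmed"].idxmax()
highest_count = int(region_totals.loc[highest_region, "Confirmed"])

print(lowest_region, lowest_count)
print(highest_region, highest_count)
```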

India's case summary

In [19]:
India_df = df[df['Country/Region'] == 'India']
India_df[["Confirmed","Deaths","Recovered"]]

,Confirmed,Deaths,Recovered
79,1480073,33408,951166


Mortality Rate by Region

In [20]:
region_wise_summary_df = df.groupby("WHO Region")[["Confirmed","Deaths","Recovered"]].sum()
region_wise_summary_df['Deaths']/region_wise_summary_df['Confirmed']

WHO Region
Africa                   0.016901
Americas                 0.038774
Eastern Mediterranean    0.025718
Europe                   0.063992
South-East Asia          0.022530
Western Pacific          0.028209
dtype: float64

Recovery Rate by Region

In [21]:
region_wise_summary_df['Recovered']/region_wise_summary_df['Confirmed']

WHO Region
Africa                   0.609293
Americas                 0.505540
Eastern Mediterranean    0.805906
Europe                   0.604246
South-East Asia          0.630379
Western Pacific          0.707080
dtype: float64
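
To compare regions side by side, both ratios can be placed in one table and ranked. This is a sketch rather than the notebook's own method: the column names `mortality_pct` and `recovery_pct` are hypothetical, and the input is the regional totals printed earlier, rebuilt by hand.

```python
import pandas as pd

# Regional totals copied from the groupby output earlier in the notebook
totals = pd.DataFrame(
    {
        "Confirmed": [723207, 8839286, 1490744, 3299523, 1835297, 292428],
        "Deaths": [12223, 342732, 38339, 211144, 41349, 8249],
        "Recovered": [440645, 4468616, 1201400, 1993723, 1156933, 206770],
    },
    index=pd.Index(
        ["Africa", "Americas", "Eastern Mediterranean",
         "Europe", "South-East Asia", "Western Pacific"],
        name="WHO Region",
    ),
)

# Express both ratios as percentages and rank regions by recovery rate
rates = totals.assign(
    mortality_pct=(totals["Deaths"] / totals["Confirmed"] * 100).round(2),
    recovery_pct=(totals["Recovered"] / totals["Confirmed"] * 100).round(2),
).sort_values("recovery_pct", ascending=False)

print(rates[["mortality_pct", "recovery_pct"]])
```

Ranking by `recovery_pct` puts the region with the highest recovery ratio first, which matches the ordering implied by the raw ratios above.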

Outliers in case count

In [24]:
mean = df["Confirmed"].mean()
std = df["Confirmed"].std()
lower = mean - 2 * std
upper = mean + 2 * std
outlier_df = df[(df["Confirmed"] < lower) | (df["Confirmed"] > upper)]
outlier_df[["Country/Region","Confirmed"]]

,Country/Region,Confirmed
23,Brazil,2442375
79,India,1480073
173,US,4290259
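
The mean ± 2 × standard deviation rule can be wrapped in a small reusable function. The sketch below assumes a hypothetical helper name, `flag_outliers`, and a 15-country sample assembled from confirmed counts that appear in the notebook output. Because the thresholds depend on the sample, the flagged set differs from the full-dataset result above. Note also that pandas' `Series.std` defaults to the sample standard deviation (`ddof=1`), while `numpy.std` defaults to the population form (`ddof=0`).

```python
import numpy as np
import pandas as pd


def flag_outliers(values: pd.Series, k: float = 2.0) -> pd.Series:
    """Return a boolean mask marking values outside mean ± k * std."""
    mean = values.mean()
    std = values.std()  # pandas default: sample standard deviation (ddof=1)
    return (values < mean - k * std) | (values > mean + k * std)


# Confirmed counts for 15 countries, taken from the notebook output
confirmed = pd.Series({
    "US": 4290259, "Brazil": 2442375, "India": 1480073, "Russia": 816680,
    "South Africa": 452529, "Afghanistan": 36263, "Albania": 4880,
    "Algeria": 27973, "Andorra": 907, "Angola": 950,
    "West Bank and Gaza": 10621, "Western Sahara": 10, "Yemen": 1691,
    "Zambia": 4552, "Zimbabwe": 2704,
})

outliers = confirmed[flag_outliers(confirmed)]
print(outliers)

# The two libraries' defaults give slightly different spreads
sample_std = confirmed.std()
population_std = np.std(confirmed.to_numpy())
print(sample_std, population_std)
```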


Confirmed COVID-19 cases grouped by country and region

In [22]:
df.groupby(["Country/Region","WHO Region"])["Confirmed"].sum()

Country/Region      WHO Region           
Afghanistan         Eastern Mediterranean    36263
Albania             Europe                    4880
Algeria             Africa                   27973
Andorra             Europe                     907
Angola              Africa                     950
                                             ...  
West Bank and Gaza  Eastern Mediterranean    10621
Western Sahara      Africa                      10
Yemen               Eastern Mediterranean     1691
Zambia              Africa                    4552
Zimbabwe            Africa                    2704
Name: Confirmed, Length: 187, dtype: int64
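
Grouping by two keys yields a Series with a two-level index. The sketch below shows how such a result could be flattened back into ordinary columns with `reset_index`, and how the same data can answer a related question, the number of countries per region. It assumes a five-row sample taken from the output above.

```python
import pandas as pd

# First five rows of the grouped output above, rebuilt as a flat table
sample = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Albania", "Algeria", "Andorra", "Angola"],
    "WHO Region": ["Eastern Mediterranean", "Europe", "Africa", "Europe", "Africa"],
    "Confirmed": [36263, 4880, 27973, 907, 950],
})

# Two-key groupby: the result is indexed by (country, region) pairs
grouped = sample.groupby(["Country/Region", "WHO Region"])["Confirmed"].sum()

# Flatten the two-level index back into regular columns
flat = grouped.reset_index()

# Count distinct countries in each region
countries_per_region = sample.groupby("WHO Region")["Country/Region"].nunique()

print(flat)
print(countries_per_region)
```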

Countries and regions reporting zero recovered cases

In [23]:
recovered_df = df[df['Recovered'] == 0]
recovered_df[["Country/Region","WHO Region"]]

,Country/Region,WHO Region
32,Canada,Americas
117,Mozambique,Africa
147,Serbia,Europe
161,Sweden,Europe
163,Syria,Eastern Mediterranean
168,Timor-Leste,South-East Asia
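
Requirement 12 is phrased in terms of regions, and it can be read in two ways: regions that contain at least one country with zero recovered cases, or regions whose recovered total is zero. The sketch below covers both readings, assuming the six zero-recovered rows listed above and the regional recovered totals printed earlier in the notebook.

```python
import pandas as pd

# The six zero-recovered rows from the output above
zero_rows = pd.DataFrame({
    "Country/Region": ["Canada", "Mozambique", "Serbia", "Sweden", "Syria", "Timor-Leste"],
    "WHO Region": ["Americas", "Africa", "Europe", "Europe",
                   "Eastern Mediterranean", "South-East Asia"],
    "Recovered": [0, 0, 0, 0, 0, 0],
})

# Reading 1: regions containing at least one zero-recovered country
regions_with_zero = sorted(zero_rows["WHO Region"].unique())

# Regional recovered totals copied from the first groupby output
region_recovered = pd.Series({
    "Africa": 440645, "Americas": 4468616, "Eastern Mediterranean": 1201400,
    "Europe": 1993723, "South-East Asia": 1156933, "Western Pacific": 206770,
})

# Reading 2: regions whose recovered total is zero
regions_all_zero = region_recovered[region_recovered == 0].index.tolist()

print(regions_with_zero)
print(regions_all_zero)
```

Under the second reading the list is empty, because every regional recovered total in the summary table is nonzero.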
