# COVID-19 Data Analysis and Visualization Project

The project utilizes the following datasets:

- country_wise: Latest country-wise statistics.
- covid_19: Complete data of COVID-19 cases.
- day_wise: Date-wise statistics of COVID-19.
- full_grouped: Grouped data for detailed analysis.
- usa_country_wise: COVID-19 data specific to the USA.
- world_data: Data from Worldometer for global statistics.

All datasets were cleaned before analysis to ensure accuracy and reliability.

Import the required libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import os
import plotly.express as px

Load the datasets into DataFrames

In [2]:
country_wise = pd.read_csv('datasets/country_wise_latest.csv')
covid_19 = pd.read_csv('datasets/covid_19_clean_complete.csv')
day_wise = pd.read_csv('datasets/day_wise.csv')
full_grouped = pd.read_csv('datasets/full_grouped.csv')
usa_country_wise = pd.read_csv('datasets/usa_country_wise.csv')
world_data = pd.read_csv('datasets/worldometer_data.csv')

In [3]:
world_data.head()

| | Country/Region | Continent | Population | TotalCases | NewCases | TotalDeaths | NewDeaths | TotalRecovered | NewRecovered | ActiveCases | Serious,Critical | Tot Cases/1M pop | Deaths/1M pop | TotalTests | Tests/1M pop | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | North America | 331198100.0 | 5032179 | NaN | 162804.0 | NaN | 2576668.0 | NaN | 2292707.0 | 18296.0 | 15194.0 | 492.0 | 63139605.0 | 190640.0 | Americas |
| 1 | Brazil | South America | 212710700.0 | 2917562 | NaN | 98644.0 | NaN | 2047660.0 | NaN | 771258.0 | 8318.0 | 13716.0 | 464.0 | 13206188.0 | 62085.0 | Americas |
| 2 | India | Asia | 1381345000.0 | 2025409 | NaN | 41638.0 | NaN | 1377384.0 | NaN | 606387.0 | 8944.0 | 1466.0 | 30.0 | 22149351.0 | 16035.0 | South-EastAsia |
| 3 | Russia | Europe | 145940900.0 | 871894 | NaN | 14606.0 | NaN | 676357.0 | NaN | 180931.0 | 2300.0 | 5974.0 | 100.0 | 29716907.0 | 203623.0 | Europe |
| 4 | South Africa | Africa | 59381570.0 | 538184 | NaN | 9604.0 | NaN | 387316.0 | NaN | 141264.0 | 539.0 | 9063.0 | 162.0 | 3149807.0 | 53044.0 | Africa |


In [4]:
covid_19.shape

(49068, 10)

In [5]:
covid_19.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 49068 entries, 0 to 49067
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Province/State  14664 non-null  object 
 1   Country/Region  49068 non-null  object 
 2   Lat             49068 non-null  float64
 3   Long            49068 non-null  float64
 4   Date            49068 non-null  object 
 5   Confirmed       49068 non-null  int64  
 6   Deaths          49068 non-null  int64  
 7   Recovered       49068 non-null  int64  
 8   Active          49068 non-null  int64  
 9   WHO Region      49068 non-null  object 
dtypes: float64(2), int64(4), object(4)
memory usage: 3.7+ MB


**Analysing total cases, deaths, recoveries and active cases**

- Which country has the highest number of total cases, deaths, recoveries and active cases?
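`DataFrame.idxmax` answers this question directly. The sketch below assumes the column names of `world_data`. The small frame is a stand-in built from the first three rows of `world_data.head()`, not the full dataset.

```python
import pandas as pd

# Stand-in for world_data: the first three rows shown by world_data.head()
world_data = pd.DataFrame({
    "Country/Region": ["USA", "Brazil", "India"],
    "TotalCases": [5032179, 2917562, 2025409],
    "TotalDeaths": [162804.0, 98644.0, 41638.0],
    "TotalRecovered": [2576668.0, 2047660.0, 1377384.0],
    "ActiveCases": [2292707.0, 771258.0, 606387.0],
})

metrics = ["TotalCases", "TotalDeaths", "TotalRecovered", "ActiveCases"]

# Index by country so idxmax returns country names instead of row numbers;
# idxmax skips NaN values by default
leaders = world_data.set_index("Country/Region")[metrics].idxmax()
print(leaders)
```

The same two lines should apply unchanged to the full `world_data`.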

In [6]:
world_data.columns

Index(['Country/Region', 'Continent', 'Population', 'TotalCases', 'NewCases',
       'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered',
       'ActiveCases', 'Serious,Critical', 'Tot Cases/1M pop', 'Deaths/1M pop',
       'TotalTests', 'Tests/1M pop', 'WHO Region'],
      dtype='object')

**Total cases**

In [8]:

world_data["TotalCases"].sum()

19169166

In [97]:
# world_data rows are ordered by TotalCases (descending), so the first 20 rows are the 20 countries with the most cases
fig = px.treemap(world_data[0:20],values="TotalCases",path=['Country/Region'],template="plotly_white",
           title="<b>Treemap of {} for the top 20 countries</b>".format("TotalCases"))
fig.show()

The treemap shows how total cases are distributed among the 20 countries with the most cases.
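Slicing with `world_data[0:20]` gives the top 20 only because the rows are already in descending order of `TotalCases`. The sketch below checks that assumption and uses `DataFrame.nlargest`, which does not depend on row order. The toy frame is hypothetical.

```python
import pandas as pd

# Hypothetical unsorted stand-in: "C" has the most cases but sits in the last row
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C"],
    "TotalCases": [500, 300, 900],
})

# Positional slicing equals "top n" only if the frame is pre-sorted
print(df["TotalCases"].is_monotonic_decreasing)  # False for this frame

# nlargest selects by value, whatever the row order
top2 = df.nlargest(2, "TotalCases")
print(top2["Country/Region"].tolist())  # ['C', 'A']
```

On the real data, if `world_data["TotalCases"].is_monotonic_decreasing` returns `True`, then `world_data[0:20]` and `world_data.nlargest(20, "TotalCases")` select the same countries.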

**Total deaths**

In [10]:

world_data["TotalDeaths"].sum()

713007.0

In [98]:
fig = px.treemap(world_data[0:20],values="TotalDeaths",path=['Country/Region'],
           template="plotly_white",
           title="<b>Treemap of {} for the 20 countries with the most cases</b>".format("TotalDeaths"))
fig.show()

The treemap shows how total deaths are distributed among the 20 countries with the most cases. These are not necessarily the 20 countries with the most deaths.

**Total recovery cases**

In [12]:
world_data["TotalRecovered"].sum()

12070191.0

In [99]:
fig = px.treemap(world_data[0:20],values="TotalRecovered",path=['Country/Region'],
           template="plotly_white",
           title="<b>Treemap of {} for the 20 countries with the most cases</b>".format("TotalRecovered"))
fig.show()

The treemap shows how total recoveries are distributed among the 20 countries with the most cases.

**Total active cases**

In [14]:
world_data["ActiveCases"].sum()

5671187.0

In [100]:
fig = px.treemap(world_data[0:20],values="ActiveCases",path=['Country/Region'],
           template="plotly_white",
           title="<b>Treemap of {} for the 20 countries with the most cases</b>".format("ActiveCases"))
fig.show()
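The four totals above do not add up. 713,007 deaths + 12,070,191 recovered + 5,671,187 active = 18,454,385, which is 714,781 fewer than the 19,169,166 total cases. A plausible explanation, not verified here, is that `Series.sum()` skips missing values. Countries with no reported recovered or active count would then still contribute to `TotalCases`. The sketch below runs a like-for-like check on hypothetical toy data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: country "C" has no reported recoveries or active cases
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C"],
    "TotalCases": [100, 50, 30],
    "TotalDeaths": [5.0, 2.0, 1.0],
    "TotalRecovered": [60.0, 40.0, np.nan],
    "ActiveCases": [35.0, 8.0, np.nan],
})

parts = ["TotalDeaths", "TotalRecovered", "ActiveCases"]

# Column totals silently skip NaN, so country C is only partly counted
print(df["TotalCases"].sum(), df[parts].sum().sum())  # 180 151.0

# Compare like with like: keep only rows where every component is reported
complete = df.dropna(subset=parts)
residual = complete["TotalCases"] - complete[parts].sum(axis=1)
print(residual.abs().max())  # 0.0: the components add up exactly
```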

**What is the trend in confirmed, death, recovered and active cases?**

In [16]:
day_wise.head()

| | Date | Confirmed | Deaths | Recovered | Active | New cases | New deaths | New recovered | Deaths / 100 Cases | Recovered / 100 Cases | Deaths / 100 Recovered | No. of countries |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2020-01-22 | 555 | 17 | 28 | 510 | 0 | 0 | 0 | 3.06 | 5.05 | 60.71 | 6 |
| 1 | 2020-01-23 | 654 | 18 | 30 | 606 | 99 | 1 | 2 | 2.75 | 4.59 | 60.0 | 8 |
| 2 | 2020-01-24 | 941 | 26 | 36 | 879 | 287 | 8 | 6 | 2.76 | 3.83 | 72.22 | 9 |
| 3 | 2020-01-25 | 1434 | 42 | 39 | 1353 | 493 | 16 | 3 | 2.93 | 2.72 | 107.69 | 11 |
| 4 | 2020-01-26 | 2118 | 56 | 52 | 2010 | 684 | 14 | 13 | 2.64 | 2.46 | 107.69 | 13 |


In [17]:
day_wise.columns

Index(['Date', 'Confirmed', 'Deaths', 'Recovered', 'Active', 'New cases',
       'New deaths', 'New recovered', 'Deaths / 100 Cases',
       'Recovered / 100 Cases', 'Deaths / 100 Recovered', 'No. of countries'],
      dtype='object')
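Most of these columns can be derived from the cumulative `Confirmed`, `Deaths` and `Recovered` counts. The sketch below recomputes several of them from the five rows shown by `day_wise.head()`. These formulas reproduce the displayed values, but it is an assumption that the dataset was built exactly this way.

```python
import pandas as pd

# First five rows of day_wise (cumulative counts only), copied from head() above
day_wise = pd.DataFrame({
    "Date": ["2020-01-22", "2020-01-23", "2020-01-24", "2020-01-25", "2020-01-26"],
    "Confirmed": [555, 654, 941, 1434, 2118],
    "Deaths": [17, 18, 26, 42, 56],
    "Recovered": [28, 30, 36, 39, 52],
})

derived = pd.DataFrame({
    # Active = confirmed cases that have neither died nor recovered
    "Active": day_wise["Confirmed"] - day_wise["Deaths"] - day_wise["Recovered"],
    # New cases = day-over-day change of the cumulative count (first day set to 0)
    "New cases": day_wise["Confirmed"].diff().fillna(0).astype(int),
    # Rates per 100 cases / per 100 recoveries, rounded to 2 decimals like the dataset
    "Deaths / 100 Cases": (day_wise["Deaths"] / day_wise["Confirmed"] * 100).round(2),
    "Recovered / 100 Cases": (day_wise["Recovered"] / day_wise["Confirmed"] * 100).round(2),
    "Deaths / 100 Recovered": (day_wise["Deaths"] / day_wise["Recovered"] * 100).round(2),
})
print(derived)
```

`New deaths` and `New recovered` would follow the same `diff()` pattern on their cumulative columns.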

In [101]:
fig=px.line(day_wise,x="Date",y=["Confirmed","Deaths","Recovered","Active"],template="plotly_white",
            title="COVID-19 cases over time")
fig.show()
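The `Confirmed`, `Deaths` and `Recovered` curves are cumulative, so they essentially never fall. This hides changes in how fast the virus is spreading. A common way to show the trend is a 7-day rolling mean of daily new cases. The sketch below uses hypothetical numbers, and the column name `New cases 7d avg` is illustrative.

```python
import pandas as pd

# Hypothetical daily new-case counts for ten days (not real data)
daily = pd.DataFrame({
    "Date": pd.date_range("2020-03-01", periods=10, freq="D"),
    "New cases": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# A 7-day trailing mean smooths out day-of-week reporting effects;
# the first 6 values are NaN because the window is not yet full
daily["New cases 7d avg"] = daily["New cases"].rolling(window=7).mean()
print(daily)
```

With the real data, the smoothed column could be plotted with `px.line(day_wise, x="Date", y="New cases 7d avg")`. Converting `Date` with `pd.to_datetime` first makes the axis a true time axis.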

**Bar plot of the population-to-tests ratio**

In [19]:
world_data.head()

| | Country/Region | Continent | Population | TotalCases | NewCases | TotalDeaths | NewDeaths | TotalRecovered | NewRecovered | ActiveCases | Serious,Critical | Tot Cases/1M pop | Deaths/1M pop | TotalTests | Tests/1M pop | WHO Region |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | North America | 331198100.0 | 5032179 | NaN | 162804.0 | NaN | 2576668.0 | NaN | 2292707.0 | 18296.0 | 15194.0 | 492.0 | 63139605.0 | 190640.0 | Americas |
| 1 | Brazil | South America | 212710700.0 | 2917562 | NaN | 98644.0 | NaN | 2047660.0 | NaN | 771258.0 | 8318.0 | 13716.0 | 464.0 | 13206188.0 | 62085.0 | Americas |
| 2 | India | Asia | 1381345000.0 | 2025409 | NaN | 41638.0 | NaN | 1377384.0 | NaN | 606387.0 | 8944.0 | 1466.0 | 30.0 | 22149351.0 | 16035.0 | South-EastAsia |
| 3 | Russia | Europe | 145940900.0 | 871894 | NaN | 14606.0 | NaN | 676357.0 | NaN | 180931.0 | 2300.0 | 5974.0 | 100.0 | 29716907.0 | 203623.0 | Europe |
| 4 | South Africa | Africa | 59381570.0 | 538184 | NaN | 9604.0 | NaN | 387316.0 | NaN | 141264.0 | 539.0 | 9063.0 | 162.0 | 3149807.0 | 53044.0 | Africa |


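The dataset reports tests per million people (`Tests/1M pop`) but has no column for the number of people per test. A new feature, `test_done_ratio`, is therefore created as `Population / TotalTests`. Lower values mean more testing: the USA's value of about 5.2 corresponds to roughly one test for every five people.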

In [20]:
world_data["test_done_ratio"] = world_data["Population"]/world_data["TotalTests"]
world_data.head()

| | Country/Region | Continent | Population | TotalCases | NewCases | TotalDeaths | NewDeaths | TotalRecovered | NewRecovered | ActiveCases | Serious,Critical | Tot Cases/1M pop | Deaths/1M pop | TotalTests | Tests/1M pop | WHO Region | test_done_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | North America | 331198100.0 | 5032179 | NaN | 162804.0 | NaN | 2576668.0 | NaN | 2292707.0 | 18296.0 | 15194.0 | 492.0 | 63139605.0 | 190640.0 | Americas | 5.245489 |
| 1 | Brazil | South America | 212710700.0 | 2917562 | NaN | 98644.0 | NaN | 2047660.0 | NaN | 771258.0 | 8318.0 | 13716.0 | 464.0 | 13206188.0 | 62085.0 | Americas | 16.106896 |
| 2 | India | Asia | 1381345000.0 | 2025409 | NaN | 41638.0 | NaN | 1377384.0 | NaN | 606387.0 | 8944.0 | 1466.0 | 30.0 | 22149351.0 | 16035.0 | South-EastAsia | 62.365033 |
| 3 | Russia | Europe | 145940900.0 | 871894 | NaN | 14606.0 | NaN | 676357.0 | NaN | 180931.0 | 2300.0 | 5974.0 | 100.0 | 29716907.0 | 203623.0 | Europe | 4.91104 |
| 4 | South Africa | Africa | 59381570.0 | 538184 | NaN | 9604.0 | NaN | 387316.0 | NaN | 141264.0 | 539.0 | 9063.0 | 162.0 | 3149807.0 | 53044.0 | Africa | 18.852446 |
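Since `Tests/1M pop` equals `TotalTests / Population * 1,000,000`, `test_done_ratio` should match one million divided by that column. This gives a quick sanity check. The sketch below uses the five rows shown above. The `replace(0, np.nan)` guard against zero test counts is an addition, not part of the original cell.

```python
import numpy as np
import pandas as pd

# First five rows of world_data, copied from head() above
world_data = pd.DataFrame({
    "Country/Region": ["USA", "Brazil", "India", "Russia", "South Africa"],
    "Population": [331198100.0, 212710700.0, 1381345000.0, 145940900.0, 59381570.0],
    "TotalTests": [63139605.0, 13206188.0, 22149351.0, 29716907.0, 3149807.0],
    "Tests/1M pop": [190640.0, 62085.0, 16035.0, 203623.0, 53044.0],
})

# People per test; a zero test count would give inf, so treat it as missing instead
world_data["test_done_ratio"] = world_data["Population"] / world_data["TotalTests"].replace(0, np.nan)

# Cross-check against the existing per-million column
from_per_million = 1_000_000 / world_data["Tests/1M pop"]
rel_diff = (world_data["test_done_ratio"] / from_per_million - 1).abs()
print(rel_diff.max())  # small, since Tests/1M pop is a rounded value
```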


**Visualization of the population-to-tests ratio for the top 20 countries**

In [102]:
fig=px.bar(world_data.iloc[0:20],color='Country/Region',
           y="test_done_ratio",x='Country/Region',template="plotly_white",
           title="<b>Population-to-tests ratio (people per test)</b>").update_xaxes(categoryorder="total descending")
fig.show()

**Top 20 countries most affected by COVID-19**

In [22]:
world_data.columns

Index(['Country/Region', 'Continent', 'Population', 'TotalCases', 'NewCases',
       'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered',
       'ActiveCases', 'Serious,Critical', 'Tot Cases/1M pop', 'Deaths/1M pop',
       'TotalTests', 'Tests/1M pop', 'WHO Region', 'test_done_ratio'],
      dtype='object')

In [84]:
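# barmode="group": stacking would double-count, since TotalCases = TotalDeaths + TotalRecovered + ActiveCases
fig = px.bar(world_data.iloc[0:20], x="Country/Region", barmode="group", template="plotly_white",
             y=["Serious,Critical", "TotalDeaths", "TotalRecovered", "ActiveCases", "TotalCases"])
fig.show()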

**Top 20 countries by total confirmed cases**

In [85]:
fig=px.bar(world_data.iloc[0:20],y='Country/Region',x='TotalCases',color='TotalCases',
           text="TotalCases",template="plotly_white",
           title="<b>Top 20 countries by total confirmed cases</b>").update_yaxes(categoryorder="total ascending")

fig.show()

**Top 20 countries by total deaths**

In [103]:
# nlargest ranks by deaths; iloc[0:20] would select the top 20 by cases instead
fig=px.bar(world_data.nlargest(20, "TotalDeaths"),y='Country/Region',x='TotalDeaths',color='TotalDeaths',
           text="TotalDeaths",template="plotly_white",
           title="<b>Top 20 countries by total deaths</b>").update_yaxes(categoryorder="total ascending")

fig.show()

**Top 20 countries by active cases**

In [104]:
# nlargest ranks by active cases; iloc[0:20] would select the top 20 by total cases instead
fig=px.bar(world_data.nlargest(20, "ActiveCases"),y='Country/Region',x='ActiveCases',color='ActiveCases',
           text="ActiveCases",template="plotly_white",
           title="<b>Top 20 countries by active cases</b>").update_yaxes(categoryorder="total ascending")

fig.show()

**Top 20 countries by total recoveries**

In [105]:
# nlargest ranks by recoveries; iloc[0:20] would select the top 20 by cases instead
fig=px.bar(world_data.nlargest(20, "TotalRecovered"),y='Country/Region',x='TotalRecovered',color='TotalRecovered',
           text="TotalRecovered",template="plotly_white",
           title="<b>Top 20 countries by total recoveries</b>").update_yaxes(categoryorder="total ascending")

fig.show()

**Pie charts of total cases, deaths, recoveries and active cases for the 15 worst-affected countries**

In [108]:
labels=world_data[0:15]['Country/Region'].values
fig = px.pie(world_data[0:15],values="TotalCases",
             names=labels,template="plotly_dark",hole=0.3,
             title="{} in the 15 worst-affected countries".format("TotalCases"))
fig.show()

In [107]:

fig = px.pie(world_data[0:15],values="TotalDeaths",
             names=labels,template="plotly_dark",hole=0.3,
             title="{} in the 15 worst-affected countries".format("TotalDeaths"))
fig.show()

In [110]:

fig = px.pie(world_data[0:15],values="TotalRecovered",
             names=labels,template="plotly_dark",hole=0.3,
             title="{} in the 15 worst-affected countries".format("TotalRecovered"))
fig.show()

In [111]:

fig = px.pie(world_data[0:15],values="ActiveCases",
             names=labels,template="plotly_dark",hole=0.3,
             title="{} in the 15 worst-affected countries".format("ActiveCases"))
fig.show()
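`px.pie` sizes each slice as a share of the values it is given. The percentages in these charts are therefore shares of the 15 plotted countries, not of the worldwide total. The sketch below shows the difference on hypothetical numbers.

```python
import pandas as pd

# Hypothetical toy data: four countries, of which only the top two are plotted
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "TotalCases": [600, 200, 150, 50],
})
top = df.nlargest(2, "TotalCases")

# What a pie of `top` shows: each slice as a share of the plotted values only
share_of_top = top["TotalCases"] / top["TotalCases"].sum() * 100
# Share of the overall total, which the pie does not show
share_of_world = top["TotalCases"] / df["TotalCases"].sum() * 100

print(pd.DataFrame({
    "country": top["Country/Region"],
    "% of pie": share_of_top,
    "% of all countries": share_of_world,
}))
```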

**Deaths-to-confirmed ratio**

In [44]:
deaths_ratio=((world_data['TotalDeaths']/world_data['TotalCases']))
deaths_ratio

0      0.032353
1      0.033810
2      0.020558
3      0.016752
4      0.017845
         ...   
204    0.076923
205         NaN
206         NaN
207         NaN
208    0.100000
Length: 209, dtype: float64

In [112]:
fig = px.bar(world_data,x='Country/Region',y=deaths_ratio,
             title="Deaths-to-confirmed ratio by country",template="plotly_white")
fig.show()
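The bar chart above plots the ratio for all 209 rows in file order. That includes rows where the ratio is `NaN` and countries with very few cases, where a handful of deaths produces an extreme ratio. The sketch below ranks a cleaner version, expressed as a percentage, on hypothetical data. The `MIN_CASES` cut-off is an illustrative choice, not part of the original analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "D" has too few cases for a stable ratio,
# "E" has enough cases but no reported death count
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D", "E"],
    "TotalCases": [10000, 5000, 20000, 12, 1500],
    "TotalDeaths": [300.0, 250.0, 200.0, 3.0, np.nan],
})

MIN_CASES = 1000  # illustrative cut-off, not from the original analysis

# Express the ratio as a percentage of confirmed cases
df["death_rate_pct"] = df["TotalDeaths"] / df["TotalCases"] * 100

# Keep countries with enough cases and a known death count, then rank them
cfr = df[(df["TotalCases"] >= MIN_CASES) & df["death_rate_pct"].notna()]
cfr = cfr.sort_values("death_rate_pct", ascending=False)
print(cfr[["Country/Region", "death_rate_pct"]])
```

The filtered frame can then be plotted as before, for example with `px.bar(cfr, x="Country/Region", y="death_rate_pct")`.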

**Deaths to recovered ratio**

In [46]:
deaths_to_recovered=((world_data['TotalDeaths']/world_data['TotalRecovered']))
deaths_to_recovered

0      0.063184
1      0.048174
2      0.030230
3      0.021595
4      0.024796
         ...   
204    0.100000
205         NaN
206         NaN
207         NaN
208    0.125000
Length: 209, dtype: float64

In [113]:
fig = px.bar(world_data,x='Country/Region',y=deaths_to_recovered,
             title="Deaths-to-recovered ratio by country",template="plotly_white")
fig.show()
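Dividing by `TotalRecovered` has an extra pitfall. If a country reports deaths but zero recoveries, pandas returns `inf` rather than `NaN`, and that value can dominate a bar chart. The output above does not show whether `world_data` contains such rows, so the sketch below uses hypothetical data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "B" reports deaths but zero recoveries
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C"],
    "TotalDeaths": [50.0, 10.0, np.nan],
    "TotalRecovered": [1000.0, 0.0, 400.0],
})

raw = df["TotalDeaths"] / df["TotalRecovered"]
# B divides by zero and becomes inf; C has a missing numerator and becomes NaN
print(raw.tolist())

# Treat both cases as "not computable" so they do not distort axes or rankings
clean = raw.replace([np.inf, -np.inf], np.nan)
print(clean.isna().sum())
```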

**Tests-to-confirmed ratio**

In [49]:
test_to_confirmed=((world_data['TotalTests']/world_data['TotalCases']))
test_to_confirmed

0       12.547170
1        4.526446
2       10.935742
3       34.083165
4        5.852658
          ...    
204      4.692308
205     32.615385
206    139.692308
207           NaN
208           NaN
Length: 209, dtype: float64

In [114]:
fig = px.bar(world_data,x='Country/Region',y=test_to_confirmed,
             title="Tests-to-confirmed ratio by country",template="plotly_white")
fig.show()
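The reciprocal of this ratio, `TotalCases / TotalTests`, is often easier to read as a rough test-positivity percentage. About 12.5 tests per confirmed case in the USA corresponds to roughly 8% of tests being positive. This is only approximate, since countries may count tests or people tested differently. The sketch below uses the five rows shown by `world_data.head()`.

```python
import pandas as pd

# First five rows of world_data, copied from head() above
df = pd.DataFrame({
    "Country/Region": ["USA", "Brazil", "India", "Russia", "South Africa"],
    "TotalCases": [5032179, 2917562, 2025409, 871894, 538184],
    "TotalTests": [63139605.0, 13206188.0, 22149351.0, 29716907.0, 3149807.0],
})

tests_per_case = df["TotalTests"] / df["TotalCases"]
# Reciprocal: approximate share of tests that came back positive, in percent
positivity_pct = df["TotalCases"] / df["TotalTests"] * 100

print(pd.DataFrame({
    "country": df["Country/Region"],
    "tests per case": tests_per_case.round(2),
    "positivity %": positivity_pct.round(1),
}))
```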

**Serious-to-deaths ratio**

In [51]:
serious_to_death=((world_data['Serious,Critical']/world_data['TotalDeaths']))
serious_to_death

0      0.112381
1      0.084323
2      0.214804
3      0.157470
4      0.056122
         ...   
204         NaN
205         NaN
206         NaN
207         NaN
208         NaN
Length: 209, dtype: float64

In [115]:
fig = px.bar(world_data,x='Country/Region',y=serious_to_death,
             title="Serious-to-deaths ratio by country",template="plotly_white")
fig.show()
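The four ratio Series above share the row index of `world_data`, so they can be collected into one table indexed by country name. That makes it easier to compare one country across all four ratios and to count how many countries have each value. The sketch below uses the first three rows of `world_data.head()`. The `ratio` helper is hypothetical and treats a zero denominator as missing.

```python
import pandas as pd

def ratio(df, num, den):
    """Hypothetical helper: df[num] / df[den], with zero denominators turned into NaN."""
    return df[num] / df[den].where(df[den] != 0)

# First three rows of world_data, copied from head() above
world_data = pd.DataFrame({
    "Country/Region": ["USA", "Brazil", "India"],
    "TotalCases": [5032179, 2917562, 2025409],
    "TotalDeaths": [162804.0, 98644.0, 41638.0],
    "TotalRecovered": [2576668.0, 2047660.0, 1377384.0],
    "TotalTests": [63139605.0, 13206188.0, 22149351.0],
    "Serious,Critical": [18296.0, 8318.0, 8944.0],
})

ratios = pd.DataFrame({
    "deaths_to_confirmed": ratio(world_data, "TotalDeaths", "TotalCases"),
    "deaths_to_recovered": ratio(world_data, "TotalDeaths", "TotalRecovered"),
    "tests_to_confirmed": ratio(world_data, "TotalTests", "TotalCases"),
    "serious_to_deaths": ratio(world_data, "Serious,Critical", "TotalDeaths"),
})
ratios.index = world_data["Country/Region"]
print(ratios.round(4))

# Number of countries that actually have a value for each ratio
print(ratios.notna().sum())
```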

This project analyses the COVID-19 pandemic through a range of visualizations. The analysis covers the spread and impact of the virus across countries, trends over time, and key ratios that highlight critical aspects of the pandemic: deaths to confirmed cases, deaths to recoveries, tests to confirmed cases, and serious cases to deaths.