## **Group 11**

## **Covid 19 API**

Hasna Kabeer - 1260077

Vincy Chirayath Varghese - 1201184

**Objective:**

The objective of this project is to analyze COVID-19 data for Asian countries and to visualize monthly trends in cases and deaths, population changes before and after COVID-19, mortality rates, and the relationship between cases and deaths.

**Reason for Choosing this Project:**

**Relevance:** COVID-19 has been a significant global event affecting millions of lives and economies worldwide. Analyzing its impact on Asian countries, which were among the earliest affected regions, provides valuable insights into the pandemic's progression.

**Data Analysis Skills:** This project involves retrieving data from an API, processing it, performing various analyses (e.g., monthly trends, population changes), and visualizing the results. It provides an opportunity to apply data analysis skills in a real-world context.

**Visualization Practice:** Visualizing data is crucial for understanding trends and patterns effectively. By using Plotly, we can create interactive and informative visualizations that enhance comprehension and communication of the findings.

**Educational Value:** Exploring COVID-19 data allows for understanding the pandemic's impact on different regions, populations, and time periods. It also highlights the importance of data-driven decision-making in public health and policy.

Overall, this project offers a practical way to apply data analysis techniques, gain insights into a significant global event, and practice creating meaningful visualizations.

**Data Collection**

Source: European Centre for Disease Prevention and Control (ECDC), an agency of the European Union, accessed through the STATWORX COVID-19 API: [Project Dataset](https://github.com/STATWORX/covid-19-api#Overview)

The API provides COVID-19 data for Asia, Europe, Africa, America, and Oceania. For this project, we analyzed only the records for Asian countries and their associated regions, that is, rows whose `continent` field is `Asia`.
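Most cells below repeat the same request. As a minimal sketch, the download and the parsing could be wrapped once and reused. The helper names `payload_to_frame` and `fetch_covid_frame` and the `timeout` value are assumptions introduced for illustration; the sketch also assumes the API returns a JSON object that maps column names to values, as the `pd.DataFrame.from_dict(json.loads(...))` call in this notebook implies.

```python
import json

import pandas as pd

API_URL = "https://api.statworx.com/covid"  # endpoint used throughout this notebook


def payload_to_frame(json_text):
    """Parse a JSON body (column name -> values) into a DataFrame."""
    return pd.DataFrame.from_dict(json.loads(json_text))


def fetch_covid_frame(code="ALL", timeout=30):
    """POST to the API once and return the parsed DataFrame (hypothetical helper)."""
    import requests  # imported here so the parsing helper also works offline

    response = requests.post(url=API_URL, data=json.dumps({"code": code}), timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return payload_to_frame(response.text)


# Offline check of the parsing step with a tiny made-up payload
sample = '{"country": ["India", "Iran"], "cases": [5, 3]}'
frame = payload_to_frame(sample)
print(frame)
```

With such a helper, each later cell could start from `df = fetch_covid_frame()` or simply reuse the `df` fetched earlier.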








In [None]:
import requests
import json
import pandas as pd
# POST to API
payload = {'code': 'ALL'} #To query all countries, use {'code': 'ALL'} in the payload.
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert to data frame
df = pd.DataFrame.from_dict(json.loads(response.text))

In [None]:
# Filter rows where continent is 'Asia'
asian_countries_df = df[df['continent'] == 'Asia']

# Select the specified columns
selected_columns = ['date', 'day', 'month', 'year', 'country', 'code', 'cases', 'deaths', 'population']
asian_countries_details = asian_countries_df[selected_columns]

# Display details of Asian countries
print("Details of Asian Countries:")
asian_countries_details

Details of Asian Countries:


Unnamed: 0,date,day,month,year,country,code,cases,deaths,population
0,2019-12-31,31,12,2019,Afghanistan,AF,0,0,38041757.0
1,2020-01-01,01,01,2020,Afghanistan,AF,0,0,38041757.0
2,2020-01-02,02,01,2020,Afghanistan,AF,0,0,38041757.0
3,2020-01-03,03,01,2020,Afghanistan,AF,0,0,38041757.0
4,2020-01-04,04,01,2020,Afghanistan,AF,0,0,38041757.0
...,...,...,...,...,...,...,...,...,...
61355,2020-12-10,10,12,2020,Yemen,YE,0,0,29161922.0
61356,2020-12-11,11,12,2020,Yemen,YE,2,0,29161922.0
61357,2020-12-12,12,12,2020,Yemen,YE,1,0,29161922.0
61358,2020-12-13,13,12,2020,Yemen,YE,1,0,29161922.0


In [None]:
# Filter rows where continent is 'Asia' and get the unique country names
asian_countries = df[df['continent'] == 'Asia']['country'].unique()

print("Number of Asian Countries:", len(asian_countries))
print("Asian Countries:")
print(asian_countries)  # 42 countries


Number of Asian Countries: 42
Asian Countries:
['Afghanistan' 'Bahrain' 'Bangladesh' 'Bhutan' 'Brunei_Darussalam'
 'Cambodia' 'China' 'India' 'Indonesia' 'Iran' 'Iraq' 'Israel' 'Japan'
 'Jordan' 'Kazakhstan' 'Kuwait' 'Kyrgyzstan' 'Laos' 'Lebanon' 'Malaysia'
 'Maldives' 'Mongolia' 'Myanmar' 'Nepal' 'Oman' 'Pakistan' 'Palestine'
 'Philippines' 'Qatar' 'Saudi_Arabia' 'Singapore' 'South_Korea'
 'Sri_Lanka' 'Syria' 'Taiwan' 'Tajikistan' 'Thailand' 'Timor_Leste'
 'United_Arab_Emirates' 'Uzbekistan' 'Vietnam' 'Yemen']
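The country names in the data use underscores in place of spaces. For plot labels, a small cleanup step such as the following could be applied; this is a sketch on three names taken from the list above, and the variable names are illustrative.

```python
import pandas as pd

# Three names copied from the printed list of Asian countries
countries = pd.Series(["Brunei_Darussalam", "Saudi_Arabia", "India"])

# Replace underscores with spaces (plain string replacement, not a regex)
display_names = countries.str.replace("_", " ", regex=False)
print(display_names.tolist())
```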


In [None]:
import pandas as pd
import requests
import json

# POST to API
payload = {'code': 'ALL'}  # To query all countries, use {'code': 'ALL'} in the payload.
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert response to dataframe
df = pd.DataFrame.from_dict(json.loads(response.text))

# Filter rows where continent is 'Asia'
asian_countries_df = df[df['continent'] == 'Asia']

# Select the specified columns
selected_columns = ['date', 'cases', 'deaths']
asian_countries_details = asian_countries_df[selected_columns].copy()  # copy to avoid SettingWithCopyWarning

# Convert 'date' column to datetime format
asian_countries_details['date'] = pd.to_datetime(asian_countries_details['date'])

# Find start and end dates
start_date = asian_countries_details['date'].min()
end_date = asian_countries_details['date'].max()

print("Start date of data collected:", start_date)
print("End date of data collected:", end_date)


Start date of data collected: 2019-12-31 00:00:00
End date of data collected: 2020-12-14 00:00:00


In [None]:
from datetime import datetime

# Define the start and end dates
start_date = datetime(2019, 12, 31)
end_date = datetime(2020, 12, 14)

# Calculate the number of days between the two dates
num_days = (end_date - start_date).days

print("Number of days between", start_date.strftime('%Y-%m-%d'), "and", end_date.strftime('%Y-%m-%d'), "is", num_days, "days.")


Number of days between 2019-12-31 and 2020-12-14 is 349 days.


In [None]:
# Define the number of days
num_days = 349

# Convert days to years
num_years = num_days / 365.25

print("349 days is equivalent to approximately", round(num_years, 2), "years.")


349 days is equivalent to approximately 0.96 years.
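The two cells above hard-code the dates and the day count. As a small sketch using only the standard library, the same figures can be derived from the date strings in one step; the helper name `span_in_days` is an assumption for illustration.

```python
from datetime import date


def span_in_days(start_iso, end_iso):
    """Number of days between two ISO dates (end minus start)."""
    return (date.fromisoformat(end_iso) - date.fromisoformat(start_iso)).days


days = span_in_days("2019-12-31", "2020-12-14")
years = round(days / 365.25, 2)
inclusive_days = days + 1  # counts both the first and the last reporting day

print(days, years, inclusive_days)  # 349 0.96 350
```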


In [None]:
import pandas as pd
import requests
import json
import plotly.express as px

# POST to API
payload = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert response to DataFrame
df = pd.DataFrame.from_dict(json.loads(response.text))

# Filter rows where continent is 'Asia'
asian_countries_df = df[df['continent'] == 'Asia'].copy()  # Ensure a copy to avoid SettingWithCopyWarning

# Get top 10 countries with highest population
top_10_countries_population = asian_countries_df.groupby('country')['population'].max().nlargest(10).reset_index()

# Plot top 10 countries with highest population using Plotly
fig_top_10_population = px.bar(top_10_countries_population, x='country', y='population',
                                title='Top 10 Countries with Highest Population in Asia',
                                labels={'country': 'Country', 'population': 'Population'})
fig_top_10_population.update_xaxes(title='Country')
fig_top_10_population.update_yaxes(title='Population')
fig_top_10_population.show()


In [None]:
import pandas as pd
import requests
import json
import plotly.express as px

# POST request to API to fetch COVID-19 data
payload = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert response to DataFrame
df = pd.DataFrame.from_dict(json.loads(response.text))

# Filter rows where continent is 'Asia'
asian_countries_df = df[df['continent'] == 'Asia']

# Group by country and sum the cases
cases_by_country = asian_countries_df.groupby('country')['cases'].sum().reset_index()

# Sort by cases in descending order and select top 10 countries
top_10_countries = cases_by_country.sort_values(by='cases', ascending=False).head(10)

# Plot top 10 countries with number of cases using Plotly
fig = px.bar(top_10_countries, x='country', y='cases', title='Top 10 Countries with Highest Number of COVID-19 Cases')
fig.update_xaxes(title='Country')
fig.update_yaxes(title='Number of Cases')
fig.show()


In [None]:
import pandas as pd
import requests
import json
import plotly.express as px

# POST to API
payload = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert response to DataFrame
df = pd.DataFrame.from_dict(json.loads(response.text))

# Filter rows where continent is 'Asia' and select columns 'country' and 'deaths'
asian_countries_df = df[df['continent'] == 'Asia'][['country', 'deaths']]

# Group by country and sum the deaths
country_deaths = asian_countries_df.groupby('country')['deaths'].sum()

# Sort the countries by number of deaths and select the top 10
top_10_countries_deaths = country_deaths.nlargest(10).reset_index()

# Plot the top 10 countries by number of deaths using Plotly
fig = px.bar(top_10_countries_deaths, x='country', y='deaths',
             title='Top 10 Countries with Number of Deaths',
             labels={'country': 'Country', 'deaths': 'Number of Deaths'})
fig.show()


In [None]:
import pandas as pd
import requests
import json
import plotly.express as px

# Function to fetch COVID-19 data for Asian countries
def fetch_covid_data():
    # POST to API
    payload = {'code': 'ALL'}  # To query all countries, use {'code': 'ALL'} in the payload.
    URL = 'https://api.statworx.com/covid'
    response = requests.post(url=URL, data=json.dumps(payload))

    # Convert response to dataframe
    df = pd.DataFrame.from_dict(json.loads(response.text))

    # Filter rows where continent is 'Asia'
    asian_countries_df = df[df['continent'] == 'Asia'].copy()  # copy so later column assignments do not warn

    return asian_countries_df

# Fetch COVID-19 data for Asian countries
asian_countries_df = fetch_covid_data()

# Display first few rows of the dataframe
print("First few rows of the COVID-19 data for Asian countries:")
print(asian_countries_df.head())



First few rows of the COVID-19 data for Asian countries:
         date day month  year  cases  deaths      country code  population  \
0  2019-12-31  31    12  2019      0       0  Afghanistan   AF  38041757.0   
1  2020-01-01  01    01  2020      0       0  Afghanistan   AF  38041757.0   
2  2020-01-02  02    01  2020      0       0  Afghanistan   AF  38041757.0   
3  2020-01-03  03    01  2020      0       0  Afghanistan   AF  38041757.0   
4  2020-01-04  04    01  2020      0       0  Afghanistan   AF  38041757.0   

  continent  cases_cum  deaths_cum  
0      Asia          0           0  
1      Asia          0           0  
2      Asia          0           0  
3      Asia          0           0  
4      Asia          0           0  


In [None]:
import pandas as pd
import plotly.express as px
import requests
import json


# Select the specified columns
selected_columns = ['date', 'cases', 'deaths']
asian_countries_details = asian_countries_df[selected_columns].copy()  # copy to avoid SettingWithCopyWarning

# Convert 'date' column to datetime format
asian_countries_details['date'] = pd.to_datetime(asian_countries_details['date'])

# Aggregate data by date
daily_cases = asian_countries_details.groupby('date')['cases'].sum().reset_index()
daily_deaths = asian_countries_details.groupby('date')['deaths'].sum().reset_index()

# Create line plot for daily cases
fig = px.line(daily_cases, x='date', y='cases', title='Daily COVID-19 Cases in Asia')
fig.show()



The graph of daily COVID-19 cases in Asia from 31 December 2019 to 14 December 2020 shows several notable patterns:

Initial Surge: The graph begins with a relatively low number of cases in early 2020. Around March 2020, cases surge suddenly, marking the initial impact of the pandemic.

First Peak: Cases continue to rise sharply and reach a significant peak around September 2020. This peak likely corresponds to a major outbreak in the region.

Fluctuations: After the first peak, cases decrease slightly and then rise again, suggesting ongoing waves of infection.

Second Peak: Around November 2020, there is another noticeable peak, although it appears lower than the first one. This could represent a subsequent wave of infections.

Stabilization: Toward the end of the graph, cases seem to stabilize, which may reflect efforts to control the spread of the virus.
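Daily counts are noisy, so peaks like those described above can be easier to locate on a smoothed series. The following is a sketch on made-up numbers, not part of the original analysis; the helper name `smooth_daily` and the 7-day window are assumptions.

```python
import pandas as pd


def smooth_daily(series, window=7):
    """Trailing rolling mean; early days average over the values available so far."""
    return series.rolling(window=window, min_periods=1).mean()


# Toy daily counts indexed by date (illustrative values only)
dates = pd.date_range("2020-03-01", periods=10, freq="D")
toy_cases = pd.Series([0, 10, 0, 30, 5, 40, 20, 60, 10, 80], index=dates)

smoothed = smooth_daily(toy_cases)
peak_day = smoothed.idxmax()  # date with the highest smoothed value
print(smoothed.round(2))
print("Peak of the smoothed series:", peak_day.date())
```

Applied to the notebook's data, the input would be the per-date sum of `cases` with `date` as the index.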

In [None]:
# Create line plot for daily deaths
fig = px.line(daily_deaths, x='date', y='deaths', title='Daily COVID-19 Deaths in Asia')
fig.show()


Initial Surge: The graph begins with a zero or relatively low number of daily deaths in early 2020. However, around February 2020, there is a sudden surge in deaths, indicating the severity of the pandemic’s impact.

First Peak: The death toll continues to rise sharply, reaching a significant peak around April 2020. This peak likely corresponds to a major outbreak or a critical phase in the pandemic.

Fluctuations: After the initial peak, the graph exhibits fluctuations. The death rates decrease slightly, then rise again, suggesting ongoing waves of infections and fatalities.

Second Peak: Around June 2020, there is another noticeable peak (2,500 deaths). This sudden rise highlights the importance of isolation and public awareness in reducing the impact of the pandemic. Around August 2020, daily deaths fell markedly (833), which suggests that control measures and self-isolation were taking effect. Two further peaks followed in November 2020: 1,946 deaths on 21 November, rising to 2,020 deaths on 25 November.

Stabilization: Toward the end of the graph (December 2020), the death rates seem to stabilize, indicating efforts to control the spread of the virus and mitigate fatalities.
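Since the observations above are phrased by month, monthly totals give a direct way to check them. This is a sketch on toy data; the helper name `monthly_totals` is an assumption, and the column names follow the ones used in this notebook.

```python
import pandas as pd


def monthly_totals(frame, value_col):
    """Sum a daily column per calendar month, using the frame's 'date' column."""
    months = pd.to_datetime(frame["date"]).dt.to_period("M")
    return frame.groupby(months)[value_col].sum()


# Toy daily records (illustrative values only)
toy = pd.DataFrame({
    "date": ["2020-06-01", "2020-06-15", "2020-07-01", "2020-07-02"],
    "deaths": [10, 20, 5, 7],
})

monthly_deaths = monthly_totals(toy, "deaths")
print(monthly_deaths)
print("Month with most deaths:", monthly_deaths.idxmax())
```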

In [None]:
# Trend analysis
asian_countries_df['date'] = pd.to_datetime(asian_countries_df['date'])
daily_cases = asian_countries_df.groupby('date')['cases'].sum()
daily_deaths = asian_countries_df.groupby('date')['deaths'].sum()

# Plot daily cases and deaths using Plotly
fig = px.line(x=daily_cases.index, y=daily_cases.values, labels={'x':'Date', 'y':'Count'}, title='Trend of COVID-19 Cases in Asian Countries')
fig.add_scatter(x=daily_deaths.index, y=daily_deaths.values, mode='lines', name='Daily Deaths', line=dict(color='red'))
fig.show()




Deaths: The highest number of deaths occurred in India (143,355), followed by Iran (52,196) and Indonesia (18,819).



In [None]:
# Calculate mortality rate
asian_countries_df['mortality_rate'] = asian_countries_df['deaths'] / asian_countries_df['cases']
mortality_rate_by_country = asian_countries_df.groupby('country')['mortality_rate'].mean().reset_index()

# Plot mortality rate by country using Plotly
fig = px.bar(mortality_rate_by_country, x='country', y='mortality_rate', title='Average Mortality Rate of COVID-19 by Asian Countries')
fig.update_xaxes(title='Country')
fig.update_yaxes(title='Mortality Rate')
fig.show()


The chart shows the average mortality rate for each country, indicating how severely each country has been affected by the pandemic.

Average Mortality Rate: Mortality refers to the occurrence of death within a population or group over a specified period of time, usually expressed as a ratio or percentage. Here it is computed as the ratio of daily deaths to daily cases, averaged over all days for each country.

Iran (0.0599) had the highest mortality rate, followed by Japan (0.0469) and Indonesia (0.0429).
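Averaging daily deaths-to-cases ratios gives every day equal weight, and days with zero reported cases produce undefined or infinite ratios. An alternative, sketched here on toy data and not part of the original analysis, divides each country's total deaths by its total cases. The helper name `case_fatality_ratio` is an assumption, and the values it yields on the real data would differ from the averages reported above.

```python
import pandas as pd


def case_fatality_ratio(frame):
    """Total deaths divided by total cases per country, highest first."""
    totals = frame.groupby("country")[["cases", "deaths"]].sum()
    totals = totals[totals["cases"] > 0]  # skip countries with no reported cases
    return (totals["deaths"] / totals["cases"]).sort_values(ascending=False)


# Toy daily records; country A includes a zero-case day
toy = pd.DataFrame({
    "country": ["A", "A", "A", "B", "B"],
    "cases": [0, 100, 100, 50, 50],
    "deaths": [0, 2, 4, 1, 1],
})

cfr = case_fatality_ratio(toy)
print(cfr)
```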

In [None]:
# Geographical distribution
fig = px.scatter(asian_countries_df, x='cases', y='deaths', color='country', title='Geographical Distribution of COVID-19 Cases and Deaths in Asian Countries', hover_name='country')
fig.update_xaxes(title='Daily Cases')
fig.update_yaxes(title='Daily Deaths')
fig.show()


The scatter plot titled “Geographical Distribution of COVID-19 Cases and Deaths in Asian Countries” plots daily deaths against daily cases. Each point is one country's record for a single day, colored by country, and the points show a positive correlation between the two metrics.
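The strength of the relationship between cases and deaths can also be expressed as a single number. As a sketch on toy data, the following computes the Pearson correlation between per-country totals; the helper name `totals_correlation` is an assumption, and no value for the real data is claimed here.

```python
import pandas as pd


def totals_correlation(frame):
    """Pearson correlation between per-country total cases and total deaths."""
    totals = frame.groupby("country")[["cases", "deaths"]].sum()
    return totals["cases"].corr(totals["deaths"])


# Toy daily records (illustrative values only)
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C", "C"],
    "cases": [40, 60, 200, 100, 200],
    "deaths": [0, 1, 2, 1, 2],
})

r = totals_correlation(toy)
print(round(r, 3))
```

A value close to 1 indicates that countries with more cases also tend to report more deaths.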

In [None]:
import pandas as pd
import requests
import json
import plotly.express as px

# POST to API
payload = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert response to DataFrame
df = pd.DataFrame.from_dict(json.loads(response.text))

# Filter rows where continent is 'Asia'
asian_countries_df = df[df['continent'] == 'Asia'].copy()  # Ensure a copy to avoid SettingWithCopyWarning


# Population analysis
population_by_country = asian_countries_df.groupby('country')['population'].max().reset_index()

# Plot population of Asian countries using Plotly
fig_pop = px.bar(population_by_country, x='country', y='population', title='Population of Asian Countries')
fig_pop.update_xaxes(title='Country')
fig_pop.update_yaxes(title='Population')
fig_pop.show()


In [None]:
import pandas as pd
import requests
import json
import plotly.express as px

# POST to API
payload = {'code': 'ALL'}
URL = 'https://api.statworx.com/covid'
response = requests.post(url=URL, data=json.dumps(payload))

# Convert response to DataFrame
df = pd.DataFrame.from_dict(json.loads(response.text))

# Filter rows where continent is 'Asia'
asian_countries_df = df[df['continent'] == 'Asia'].copy()  # Ensure a copy to avoid SettingWithCopyWarning

# Convert the 'date' column to datetime format
asian_countries_df['date'] = pd.to_datetime(asian_countries_df['date'])

# Population analysis
population_before_after_covid = asian_countries_df.groupby('country').agg({'population': 'max', 'deaths': 'sum'}).reset_index()
population_before_after_covid['population_after_covid'] = population_before_after_covid['population'] - population_before_after_covid['deaths']

# Plot population before and after COVID-19 using Plotly
fig_population_before_after = px.bar(population_before_after_covid, x='country', y=['population', 'population_after_covid'],
                                     title='Population Before and After COVID-19 in Asian Countries',
                                     labels={'value': 'Population', 'variable': 'Population Status', 'country': 'Country'})
fig_population_before_after.update_xaxes(title='Country')
fig_population_before_after.update_yaxes(title='Population')
fig_population_before_after.show()

# Calculate population difference before and after COVID-19
population_before_after_covid['population_difference'] = population_before_after_covid['population'] - population_before_after_covid['population_after_covid']

# Sort the DataFrame based on population difference from highest to least
population_difference_table_sorted = population_before_after_covid.sort_values(by='population_difference', ascending=False)[['country', 'population_difference']]
population_difference_table_sorted


Unnamed: 0,country,population_difference
7,India,143355.0
9,Iran,52196.0
8,Indonesia,18819.0
10,Iraq,12579.0
25,Pakistan,8832.0
27,Philippines,8733.0
2,Bangladesh,7052.0
29,Saudi_Arabia,6048.0
6,China,4739.0
13,Jordan,3365.0
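Absolute differences favour populous countries. A complementary view, sketched here on toy data and not part of the original analysis, scales deaths by population. The helper name `deaths_per_million` is an assumption; the aggregation mirrors the `max` of `population` and `sum` of `deaths` used above.

```python
import pandas as pd


def deaths_per_million(frame):
    """Per-country deaths per million inhabitants, highest first."""
    summary = frame.groupby("country").agg(
        population=("population", "max"),
        deaths=("deaths", "sum"),
    )
    summary["deaths_per_million"] = summary["deaths"] / summary["population"] * 1_000_000
    return summary.sort_values("deaths_per_million", ascending=False)


# Toy daily records (illustrative values only)
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B"],
    "population": [1_000_000.0, 1_000_000.0, 10_000_000.0, 10_000_000.0],
    "deaths": [5, 5, 20, 30],
})

per_million = deaths_per_million(toy)
print(per_million)
```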
