# Asthma Prevalence Data Cleaning

This notebook cleans and prepares country-level asthma prevalence data for analysis. The source is the "Asthma Prevalence By Country" CSV file, which reports age-standardized asthma cases per 100 people (both sexes) by country and year. The cleaned data will be combined with other datasets to examine potential correlations between asthma rates, air quality, and quality-of-life indicators globally.


In [6]:
# Dependencies
import pandas as pd

# Load the asthma prevalence data
csv_file = 'Raw_Data/Asthma Prevalence By Country.csv'
asthma_prevalence_df = pd.read_csv(csv_file)

# Preview the first few rows of the data
asthma_prevalence_df.head()


Unnamed: 0,Entity,Code,Year,Current number of cases in the population per 100 people - Sex: Both - Age: Age-standardized - Cause: Asthma
0,Afghanistan,AFG,1990,4.740098
1,Afghanistan,AFG,1991,4.715458
2,Afghanistan,AFG,1992,4.680628
3,Afghanistan,AFG,1993,4.65177
4,Afghanistan,AFG,1994,4.630515
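
Before changing anything, it can help to check missing values, the year range, and duplicate country-year pairs. The sketch below is not part of the original notebook. It rebuilds the five preview rows above as a stand-in DataFrame so it runs without the CSV file; `sample_df` and the shortened column name `prevalence` are assumptions made for brevity.

```python
import pandas as pd

# Stand-in for the raw data: the five preview rows shown above.
# The long prevalence header is shortened to 'prevalence' (assumption, for brevity).
sample_df = pd.DataFrame({
    'Entity': ['Afghanistan'] * 5,
    'Code': ['AFG'] * 5,
    'Year': [1990, 1991, 1992, 1993, 1994],
    'prevalence': [4.740098, 4.715458, 4.680628, 4.65177, 4.630515],
})

# Count missing values in each column
missing_counts = sample_df.isna().sum()

# Earliest and latest year present in the data
year_range = (sample_df['Year'].min(), sample_df['Year'].max())

# Count repeated (Entity, Year) pairs, which would complicate later merges
duplicate_rows = sample_df.duplicated(subset=['Entity', 'Year']).sum()

print(missing_counts)
print(year_range, duplicate_rows)
```

On the real data, the same three checks can be run directly on `asthma_prevalence_df` right after loading.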


In [7]:
# Remove the country 'Code' column as it is not needed for analysis
asthma_prevalence_df.drop(columns=['Code'], inplace=True)

# Preview the DataFrame to confirm column removal
asthma_prevalence_df.head()

Unnamed: 0,Entity,Year,Current number of cases in the population per 100 people - Sex: Both - Age: Age-standardized - Cause: Asthma
0,Afghanistan,1990,4.740098
1,Afghanistan,1991,4.715458
2,Afghanistan,1992,4.680628
3,Afghanistan,1993,4.65177
4,Afghanistan,1994,4.630515


In [8]:
# Rename the asthma cases column for readability
asthma_prevalence_df.rename(
    columns={
        'Current number of cases in the population per 100 people - Sex: Both - Age: Age-standardized - Cause: Asthma': 
        'Asthma Cases per 100 (Age-Standardized, Both Sexes)'
    }, 
    inplace=True
)

# Preview the DataFrame to confirm column renaming
asthma_prevalence_df.head()

Unnamed: 0,Entity,Year,"Asthma Cases per 100 (Age-Standardized, Both Sexes)"
0,Afghanistan,1990,4.740098
1,Afghanistan,1991,4.715458
2,Afghanistan,1992,4.680628
3,Afghanistan,1993,4.65177
4,Afghanistan,1994,4.630515


In [9]:
# Keep only rows from 2015 onward; .copy() yields an independent DataFrame
# so later in-place edits do not act on a slice of the original
asthma_prevalence_df = asthma_prevalence_df[asthma_prevalence_df['Year'] >= 2015].copy()

# Preview the DataFrame to confirm filtering
asthma_prevalence_df.head()

Unnamed: 0,Entity,Year,"Asthma Cases per 100 (Age-Standardized, Both Sexes)"
25,Afghanistan,2015,4.154888
26,Afghanistan,2016,4.183797
27,Afghanistan,2017,4.228361
28,Afghanistan,2018,4.251362
29,Afghanistan,2019,4.281019


In [10]:
# Rename the 'Entity' column to 'Country' for clarity
asthma_prevalence_df.rename(columns={'Entity': 'Country'}, inplace=True)

# Preview the DataFrame to confirm the column rename
asthma_prevalence_df.head()

Unnamed: 0,Country,Year,"Asthma Cases per 100 (Age-Standardized, Both Sexes)"
25,Afghanistan,2015,4.154888
26,Afghanistan,2016,4.183797
27,Afghanistan,2017,4.228361
28,Afghanistan,2018,4.251362
29,Afghanistan,2019,4.281019
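
The cleaning steps so far (drop `Code`, rename two columns, keep 2015 onward) can also be written as one chained function that leaves its input untouched. This is a minimal sketch, not the notebook's own code: `clean_asthma_prevalence`, `min_year`, and the two column-name constants are hypothetical names, and the demo rows are taken from the previews above.

```python
import pandas as pd

RAW_PREVALENCE_COLUMN = (
    'Current number of cases in the population per 100 people - '
    'Sex: Both - Age: Age-standardized - Cause: Asthma'
)
CLEAN_PREVALENCE_COLUMN = 'Asthma Cases per 100 (Age-Standardized, Both Sexes)'


def clean_asthma_prevalence(raw_df, min_year=2015):
    """Hypothetical helper: apply the cleaning steps and return a new DataFrame."""
    return (
        raw_df
        .drop(columns=['Code'])                      # country code is not needed
        .rename(columns={
            RAW_PREVALENCE_COLUMN: CLEAN_PREVALENCE_COLUMN,
            'Entity': 'Country',
        })
        .loc[lambda df: df['Year'] >= min_year]      # keep min_year and later
    )


# Demo on three rows copied from the previews above
raw_df = pd.DataFrame({
    'Entity': ['Afghanistan', 'Afghanistan', 'Afghanistan'],
    'Code': ['AFG', 'AFG', 'AFG'],
    'Year': [1994, 2015, 2016],
    RAW_PREVALENCE_COLUMN: [4.630515, 4.154888, 4.183797],
})
cleaned_df = clean_asthma_prevalence(raw_df)
print(cleaned_df)
```

Because no step uses `inplace=True`, `raw_df` keeps its original columns and the function can be rerun with a different `min_year` without reloading the file.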


In [11]:
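import os

# Define the file path for saving the cleaned asthma prevalence data
cleaned_asthma_prevalence = 'Cleaned_Data/cleaned_asthma_prevalence.csv'

# Create the output folder if it is missing; to_csv does not create directories
os.makedirs(os.path.dirname(cleaned_asthma_prevalence), exist_ok=True)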

# Export the cleaned DataFrame to a CSV file without the index
asthma_prevalence_df.to_csv(cleaned_asthma_prevalence, index=False)
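
After exporting, a quick round-trip check can confirm that the file reads back with the expected columns and no stray index column. The sketch below is an illustration under assumptions: it writes a two-row stand-in (values from the preview above) to a temporary directory rather than to `Cleaned_Data/`, and the names `cleaned_df`, `reloaded_df`, and `out_path` are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the cleaned data: two rows from the preview above
cleaned_df = pd.DataFrame({
    'Country': ['Afghanistan', 'Afghanistan'],
    'Year': [2015, 2016],
    'Asthma Cases per 100 (Age-Standardized, Both Sexes)': [4.154888, 4.183797],
})

# Write to a temporary folder and read the file back
with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'cleaned_asthma_prevalence.csv')
    cleaned_df.to_csv(out_path, index=False)
    reloaded_df = pd.read_csv(out_path)

# The reloaded columns should match exactly (no extra index column)
columns_match = list(reloaded_df.columns) == list(cleaned_df.columns)

# Every remaining row should be from 2015 or later
no_early_years = bool((reloaded_df['Year'] >= 2015).all())

print(columns_match, no_early_years)
```

The column name contains a comma, so `to_csv` quotes it in the header and `read_csv` restores it as a single column.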