# Data Filtering and Regional Analysis Notebook

This notebook filters cleaned global health data and prepares it for regional analysis. It performs the following tasks:
- Reads the cleaned data from `clean_data.csv`.
- Converts the `date` column to an integer format for consistency.
- Filters the data to the seven regions of interest, whose names are stored in the `country` column:
  - East Asia & Pacific
  - Europe & Central Asia
  - Latin America & Caribbean
  - Middle East & North Africa
  - North America
  - South Asia
  - Sub-Saharan Africa
- Saves the filtered data to a new CSV file (`region_data.csv`) for further analysis or visualization.

Run the cells in order to produce `region_data.csv`, which contains only the rows for these regions.

## Reading and Checking the Data

In [None]:
import pandas as pd

# Read the clean_data.csv file into a DataFrame
cleaned_df = pd.read_csv('clean_data.csv')

# Display the first few rows to confirm
print(cleaned_df.head())

# Convert 'date' to integer (raises if the column has missing or non-numeric values)
cleaned_df['date'] = cleaned_df['date'].astype(int)

# Confirm the column types after the conversion
print(cleaned_df.dtypes)
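
The `astype(int)` call above raises an error if `date` contains missing or non-numeric entries. A minimal sketch of a more tolerant conversion follows, assuming it is acceptable to turn unparseable values into missing ones. The helper name `to_int_year` and the sample values are hypothetical and not part of the original notebook.

```python
import pandas as pd


def to_int_year(series: pd.Series) -> pd.Series:
    """Coerce a year-like column to pandas' nullable integer dtype.

    Hypothetical helper for illustration only.
    """
    # errors="coerce" turns unparseable entries into NaN instead of raising
    numeric = pd.to_numeric(series, errors="coerce")
    # "Int64" (capital I) is the nullable integer dtype, so missing years stay <NA>
    return numeric.astype("Int64")


# Made-up sample values, including a missing entry and a non-numeric string
sample = pd.DataFrame({"date": ["2019", "2020", None, "n/a"]})
sample["date"] = to_int_year(sample["date"])

print(sample["date"])
print("Missing after conversion:", sample["date"].isna().sum())
```

If the column could hold fractional values such as `2019.5`, the final cast would still raise, so this sketch only suits data that is whole years or missing.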

## Filtering by Regions and Saving the Filtered Data

In [None]:
# Define the seven regions of interest
regions_of_interest = [
    'East Asia & Pacific',
    'Europe & Central Asia',
    'Latin America & Caribbean',
    'Middle East & North Africa',
    'North America',
    'South Asia',
    'Sub-Saharan Africa'
]

# Keep only rows whose 'country' value is one of the region names
region_df = cleaned_df[cleaned_df['country'].isin(regions_of_interest)]

# Check which regions were matched and preview the result
print(region_df['country'].unique())
print(region_df.head())

# Save region_df as a CSV file
region_df.to_csv('region_data.csv', index=False)

# Report that the save step finished
print("region_data.csv has been saved.")
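
Because `isin` matches names exactly, a region that is spelled differently in the data is dropped without any warning. The sketch below lists requested regions that never appear in the `country` column. The helper `missing_regions` is hypothetical and the sample rows are made up for illustration.

```python
import pandas as pd

regions_of_interest = [
    'East Asia & Pacific',
    'Europe & Central Asia',
    'Latin America & Caribbean',
    'Middle East & North Africa',
    'North America',
    'South Asia',
    'Sub-Saharan Africa',
]


def missing_regions(df: pd.DataFrame, wanted: list, column: str = 'country') -> list:
    """Return the wanted names that never occur in df[column] (hypothetical helper)."""
    present = set(df[column].dropna().unique())
    return sorted(set(wanted) - present)


# Made-up sample: one ordinary country and one region name without its hyphen
sample = pd.DataFrame({
    'country': ['South Asia', 'North America', 'France', 'Sub Saharan Africa'],
    'date': [2020, 2020, 2020, 2020],
    'value': [1.0, 2.0, 3.0, 4.0],  # placeholder numbers, not real data
})

missing = missing_regions(sample, regions_of_interest)
print("Regions not found:", missing)
```

Running `missing_regions(cleaned_df, regions_of_interest)` before saving would show whether all seven names matched.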