# Analyzing the Relationship Between Weather Conditions and Migraine Incidents

## 1. Introduction

In this notebook, we will focus on acquiring the raw data needed for our analysis. We aim to study the correlation between weather patterns, specifically sea-level pressure, and the frequency of migraines.

## 2. Objectives

-   To load weather data from our S3 storage bucket.
-   To load migraine frequency data from our S3 storage bucket.
-   To provide initial observations about the raw data.

## 3. Setup

```python
# Import the libraries
from dotenv import load_dotenv
import os
import sys
import pandas as pd
pd.set_option('display.max_columns', None)

# Load the environment variables
load_dotenv("../config/.env")

scripts_path = os.getenv("SCRIPTS_PATH")
if scripts_path is None:
    raise RuntimeError("SCRIPTS_PATH is not set; check ../config/.env")

# Add the scripts folder to the import path so its modules can be imported
if scripts_path not in sys.path:
    sys.path.append(scripts_path)
```

```python
# Import the data-loading helper from the scripts folder
from raw_data import get_raw_dataframes
```

## 4. Data Acquisition Overview

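-   **City Data**: Maps each weather station (`station_id`) to a city, with its country, state, ISO codes, and coordinates.
-   **Country Data**: Contains country-level attributes such as native name, ISO codes, population, area, and capital.
-   **Weather Data**: Contains daily weather data, including sea-level pressure, recorded at weather stations in various locations.
-   **Migraine Data**: Contains annual summaries of migraine frequencies in specific locations.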
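
The `get_raw_dataframes` helper in `raw_data.py` is not shown in this notebook. As a hedged illustration only, the sketch below shows one way to read CSV objects from S3 into pandas. The object keys in `RAW_KEYS` and the function names are hypothetical. The client is passed in as a parameter, so any object with a boto3-style `get_object(Bucket=..., Key=...)` method works.

```python
import io

import pandas as pd


def read_csv_from_s3(s3_client, bucket: str, key: str) -> pd.DataFrame:
    """Download one CSV object from S3 and parse it into a DataFrame.

    `s3_client` is expected to behave like a boto3 S3 client: its
    `get_object(Bucket=..., Key=...)` call returns a dict whose "Body"
    entry exposes a `.read()` method returning the raw bytes.
    """
    response = s3_client.get_object(Bucket=bucket, Key=key)
    raw_bytes = response["Body"].read()
    return pd.read_csv(io.BytesIO(raw_bytes))


# Hypothetical object keys; the real bucket layout used by raw_data.py is not shown here.
RAW_KEYS = {
    "city": "raw/cities.csv",
    "country": "raw/countries.csv",
    "weather": "raw/daily_weather.csv",
    "migraine": "raw/migraine.csv",
}


def load_raw_tables(s3_client, bucket: str) -> dict:
    """Load every raw table listed in RAW_KEYS, keyed by table name."""
    return {
        name: read_csv_from_s3(s3_client, bucket, key)
        for name, key in RAW_KEYS.items()
    }
```

In practice the client would come from `boto3.client("s3")`. Keeping it as a parameter also makes the loader easy to test with a stand-in object.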

## 5. Loading Data

```python
# Load the raw DataFrames using our Python script
city_data, country_data, weather_data, migraine_data = get_raw_dataframes()

# Display sample rows from each DataFrame
for label, df in [
    ("City", city_data),
    ("Country", country_data),
    ("Weather", weather_data),
    ("Migraine", migraine_data),
]:
    print(f"\n{label} Data:\n")
    print(df.head())
```

```text
City Data:

  station_id   city_name      country       state iso2 iso3   latitude  \
0      41515    Asadabad  Afghanistan       Kunar   AF  AFG  34.866000   
1      38954    Fayzabad  Afghanistan  Badakhshan   AF  AFG  37.129761   
2      41560   Jalalabad  Afghanistan   Nangarhar   AF  AFG  34.441527   
3      38947      Kunduz  Afghanistan      Kunduz   AF  AFG  36.727951   
4      38987  Qala i Naw  Afghanistan     Badghis   AF  AFG  34.983000   

   longitude  
0  71.150005  
1  70.579247  
2  70.436103  
3  68.872530  
4  63.133300  

Country Data:

          country     native_name iso2 iso3  population       area    capital  \
0     Afghanistan       افغانستان   AF  AFG  26023100.0   652230.0      Kabul   
1         Albania       Shqipëria   AL  ALB   2895947.0    28748.0     Tirana   
2         Algeria         الجزائر   DZ  DZA  38700000.0  2381741.0    Algiers   
3  American Samoa  American Samoa   AS  ASM     55519.0      199.0  Pago Pago   
4          Angola          Ango
```
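
To quantify the missing values discussed in the observations, a small helper along these lines could summarize the missing entries in each column. `missing_value_report` is a hypothetical name, not part of the project's scripts. The sketch relies only on standard pandas calls (`isna`, `sum`).

```python
import pandas as pd


def missing_value_report(frames: dict) -> pd.DataFrame:
    """Summarize missing values for each column of each DataFrame.

    Returns one row per (table, column) pair that has at least one
    missing value, with the count and the percentage of rows affected.
    """
    rows = []
    for table_name, df in frames.items():
        missing_counts = df.isna().sum()
        for column, count in missing_counts.items():
            if count > 0:
                rows.append({
                    "table": table_name,
                    "column": column,
                    "missing": int(count),
                    "missing_pct": round(100 * count / len(df), 2),
                })
    return pd.DataFrame(rows, columns=["table", "column", "missing", "missing_pct"])
```

For example, `missing_value_report({"city": city_data, "country": country_data, "weather": weather_data, "migraine": migraine_data})` would list only the columns that contain gaps.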

## 6. Initial Observations

- The weather data contains several missing values that we may need to address during preprocessing.
- The migraine data contains no missing values.
- Each dataset contains columns that are unnecessary for our analysis and will need to be removed.
- The migraine data is summarized annually, so we may need to aggregate the daily weather data to the same annual level.
- The countries table will need to be joined with the cities table on the `country` column to provide more context for the weather data.
- The weather data will need to be joined with the combined cities and countries table on the `station_id` column.
- The migraine data will need to be joined with the combined cities, countries, and weather table, matching the `city_name` column from the combined table to the `location_name` column from the migraine data.
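
The joins and the annual aggregation described in these observations could be chained as in the following sketch. It assumes column names that this notebook does not show: `date` and `sea_level_pressure` in the weather data, and `year` in the migraine data. Averaging pressure per station and year is one possible aggregation, not necessarily the one the preprocessing notebook will use. `build_analysis_table` is a hypothetical helper.

```python
import pandas as pd


def build_analysis_table(city_data, country_data, weather_data, migraine_data):
    """Combine the four raw tables into one annual, city-level table.

    Assumed (hypothetical) column names not shown in the notebook output:
    weather_data has `date` and `sea_level_pressure`; migraine_data has
    `year` and `location_name` alongside its migraine measures.
    """
    # 1. Cities + countries, joined on `country`. Both tables carry iso2/iso3,
    #    so the country-side copies get a suffix instead of clashing.
    locations = city_data.merge(
        country_data, on="country", how="left", suffixes=("", "_country")
    )

    # 2. Aggregate daily weather to annual means per station so it matches
    #    the annual granularity of the migraine data.
    weather = weather_data.assign(year=pd.to_datetime(weather_data["date"]).dt.year)
    annual_weather = weather.groupby(["station_id", "year"], as_index=False)[
        "sea_level_pressure"
    ].mean()

    # 3. Attach location context to each station-year, joined on `station_id`.
    weather_locations = annual_weather.merge(locations, on="station_id", how="inner")

    # 4. Attach migraine figures: renaming `location_name` to `city_name`
    #    matches the two name columns; `year` keeps the annual rows aligned.
    migraine = migraine_data.rename(columns={"location_name": "city_name"})
    return weather_locations.merge(migraine, on=["city_name", "year"], how="inner")
```

Inner joins keep only station-years that have matching migraine figures. A left join on the final step would instead keep every station-year and leave the migraine columns empty where no match exists.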

## 7. Next Steps

The next notebook, **[02_data_preprocessing.ipynb](./02_data_preprocessing.ipynb)**, will focus on cleaning the data, handling missing values, and merging the datasets for analysis.

## 8. Summary

-   Acquired the raw city, country, weather, and migraine data from S3 storage.
-   Identified initial challenges, such as missing values and mismatched data granularity (daily weather versus annual migraine summaries), to be addressed in subsequent notebooks.

## 9. References

- Weather data source: [The Weather Dataset](https://www.kaggle.com/datasets/guillemservera/global-daily-climate-data/)

- Migraine data source: Global Burden of Disease Collaborative Network. *Global Burden of Disease Study 2019 (GBD 2019) Results.* Seattle, United States: Institute for Health Metrics and Evaluation (IHME), 2020. Available from [https://vizhub.healthdata.org/gbd-results/](https://vizhub.healthdata.org/gbd-results/).