# Weather Data Processing Pipeline

This notebook implements a data pipeline that ingests, cleans, transforms, and analyzes weather data. The pipeline is modular, keeps every input row (missing values are filled rather than dropped), and produces several visualizations of the results.

In [1]:
import os
import sys

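# Add the project root (one level up) to sys.path so the src package can be imported.
# Note: this does not change the working directory, so relative paths below still start from the notebook's folder.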
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)

In [2]:
## Step 1: Import Dependencies and Modules
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from src.data_ingestion import load_weather_data
from src.data_cleaning import clean_weather_data
from src.data_transformation import transform_weather_data
from src.data_output import save_transformed_data, generate_temperature_report, plot_visualizations

In [3]:
# Set visualization style
sns.set_theme(style="whitegrid")  # set_theme is the preferred name; sns.set is an alias for it

## Step 2: Data Ingestion
# Load the weather data from the CSV file
file_path = "../data/weather_data.csv"
df = load_weather_data(file_path)
df.head()

Successfully loaded data from ../data/weather_data.csv


Unnamed: 0,date,city,temperature_celsius,humidity_percent,wind_speed_kph,weather_condition
0,2023-01-01,New York,5.0,60.0,10.0,Sunny
1,01/02/2023,New York,,65.0,12.0,Cloudy
2,03-01-2023,New York,7.0,,8.0,Rainy
3,,London,8.0,70.0,15.0,Unknown
4,2023-01-02,London,6.0,75.0,20.0,Snowy
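
The preview above already shows the two problems the cleaning step has to deal with: missing values and inconsistent date formats. A minimal sketch of how they could be quantified before cleaning is given below. It rebuilds only the five preview rows by hand, so the counts describe the preview, not the full file, and the names `preview`, `missing_counts`, and `non_iso_dates` are illustrative rather than part of the project's `src` package.

```python
import numpy as np
import pandas as pd

# The five rows shown in the preview, re-entered by hand for illustration.
preview = pd.DataFrame({
    "date": ["2023-01-01", "01/02/2023", "03-01-2023", None, "2023-01-02"],
    "city": ["New York", "New York", "New York", "London", "London"],
    "temperature_celsius": [5.0, np.nan, 7.0, 8.0, 6.0],
    "humidity_percent": [60.0, 65.0, np.nan, 70.0, 75.0],
    "wind_speed_kph": [10.0, 12.0, 8.0, 15.0, 20.0],
    "weather_condition": ["Sunny", "Cloudy", "Rainy", "Unknown", "Snowy"],
})

# Number of missing entries in each column.
missing_counts = preview.isna().sum()

# Dates that are present but not written as YYYY-MM-DD.
iso_mask = preview["date"].str.fullmatch(r"\d{4}-\d{2}-\d{2}", na=False)
non_iso_dates = preview.loc[~iso_mask & preview["date"].notna(), "date"].tolist()

print(missing_counts)
print(non_iso_dates)
```

Running the same two checks on the full `df` would give a quick picture of how much work the cleaning step has to do.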


In [4]:
## Step 3: Data Cleaning
# Clean the data: handle missing values, standardize dates
df_cleaned = clean_weather_data(df)
df_cleaned.head()

Data cleaning completed.


Unnamed: 0,date,city,temperature_celsius,humidity_percent,wind_speed_kph,weather_condition
0,2023-01-01,New York,5.0,60.0,10.0,Sunny
1,NaT,New York,8.923529,65.0,12.0,Cloudy
2,NaT,New York,7.0,57.142857,8.0,Rainy
3,NaT,London,8.0,70.0,15.0,Unknown
4,2023-01-02,London,6.0,75.0,20.0,Snowy
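
In the cleaned preview, rows 1 and 2 show `NaT` even though the raw data held dates for them (`01/02/2023` and `03-01-2023`), which suggests that only one date format was parsed. The filled temperature and humidity values also look like column means, though the source does not confirm this. The sketch below shows one way both steps could be handled; it is not the implementation in `src.data_cleaning`. The function names are hypothetical, and the format list is an assumption: `01/02/2023` is read as month/day/year and `03-01-2023` as day-month-year, which the data alone cannot confirm.

```python
import pandas as pd

def parse_mixed_dates(values, formats=("%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y")):
    """Try each explicit format in turn and keep the first successful parse.

    The default formats are assumptions about the source data, not known facts.
    """
    raw = pd.Series(values, dtype="object")
    parsed = pd.to_datetime(raw, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        # Only entries that are still NaT are filled by the next attempt.
        parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors="coerce"))
    return parsed

def fill_missing_with_mean(df, columns):
    """Return a copy of df with NaN in the given numeric columns replaced by the column mean."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].fillna(out[col].mean())
    return out

dates = parse_mixed_dates(["2023-01-01", "01/02/2023", "03-01-2023", None, "2023-01-02"])
temps = fill_missing_with_mean(
    pd.DataFrame({"temperature_celsius": [5.0, None, 7.0, 8.0, 6.0]}),
    ["temperature_celsius"],
)
print(dates)
print(temps)
```

Under these assumptions only the row with a missing date stays `NaT`. The mean here is computed from five sample rows, so it differs from the 8.923529 in the preview, which presumably reflects the full dataset.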


In [5]:
## Step 4: Data Transformation
# Transform the data: add temperature in Fahrenheit
df_transformed = transform_weather_data(df_cleaned)
df_transformed.head()

Data transformation completed.


Unnamed: 0,date,city,temperature_celsius,humidity_percent,wind_speed_kph,weather_condition,temperature_fahrenheit
0,2023-01-01,New York,5.0,60.0,10.0,Sunny,41.0
1,NaT,New York,8.923529,65.0,12.0,Cloudy,48.062353
2,NaT,New York,7.0,57.142857,8.0,Rainy,44.6
3,NaT,London,8.0,70.0,15.0,Unknown,46.4
4,2023-01-02,London,6.0,75.0,20.0,Snowy,42.8
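
The new `temperature_fahrenheit` column matches the standard conversion F = C × 9/5 + 32 (for example, 5.0 °C gives 41.0 °F). A minimal sketch of that step follows; `add_fahrenheit` is a hypothetical helper for illustration, and `transform_weather_data` may do more than this.

```python
import pandas as pd

def add_fahrenheit(df, celsius_col="temperature_celsius",
                   fahrenheit_col="temperature_fahrenheit"):
    """Return a copy of df with a Fahrenheit column derived from the Celsius column."""
    out = df.copy()
    out[fahrenheit_col] = out[celsius_col] * 9 / 5 + 32
    return out

demo = pd.DataFrame({"temperature_celsius": [5.0, 7.0, 8.0, 6.0]})
converted = add_fahrenheit(demo)
print(converted)
```

Working on a copy leaves the cleaned frame unchanged, so the cell can be re-run without stacking up side effects.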


In [6]:
## Step 5: Data Output
# Save the transformed data, generate a report, and create visualizations
output_path = "../outputs/transformed_weather_data.csv"
report_path = "../outputs/temperature_report.md"
viz_dir = "../outputs/"
save_transformed_data(df_transformed, output_path)
generate_temperature_report(df_transformed, report_path)
plot_visualizations(df_transformed, viz_dir)

Transformed data saved to ../outputs/transformed_weather_data.csv
Temperature report saved to ../outputs/temperature_report.md
Visualizations saved to outputs/
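
The contents of `temperature_report.md` are not shown in this notebook. As an illustration only, the sketch below writes a per-city summary (mean, minimum, maximum) as a small Markdown table. The function name `write_temperature_report`, the report layout, and the choice of statistics are all assumptions, not the behaviour of `generate_temperature_report`.

```python
import os
import tempfile

import pandas as pd

def write_temperature_report(df, report_path):
    """Write per-city temperature statistics to a Markdown file and return them."""
    summary = (
        df.groupby("city")["temperature_celsius"]
        .agg(["mean", "min", "max"])
        .round(2)
    )
    lines = [
        "# Temperature Report",
        "",
        "| City | Mean (°C) | Min (°C) | Max (°C) |",
        "| --- | --- | --- | --- |",
    ]
    for city, row in summary.iterrows():
        lines.append(f"| {city} | {row['mean']} | {row['min']} | {row['max']} |")
    # Create the target directory if it does not exist yet.
    os.makedirs(os.path.dirname(report_path) or ".", exist_ok=True)
    with open(report_path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return summary

# Demo on the five preview rows, written to a temporary directory.
demo = pd.DataFrame({
    "city": ["New York", "New York", "New York", "London", "London"],
    "temperature_celsius": [5.0, 8.923529, 7.0, 8.0, 6.0],
})
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "temperature_report.md")
    summary = write_temperature_report(demo, demo_path)
    with open(demo_path, encoding="utf-8") as handle:
        report_text = handle.read()
print(report_text)
```

The table is built by hand rather than with `DataFrame.to_markdown`, which needs the optional `tabulate` package.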


In [7]:
## Step 6: Display Visualizations
# The visualizations are saved in the outputs/ directory. The following charts are generated:
# - Average Temperature per City (Bar Chart)
# - Temperature Trend Over Time (Line Plot)
# - Humidity vs Wind Speed (Scatter Plot)
# - Weather Condition Distribution (Pie Chart)
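
The cell above lists the charts but does not render any of them. As a sketch of what the first one could look like inline, the code below draws average temperature per city with matplotlib. `plot_average_temperature` is a hypothetical helper and uses only the five preview rows; the output file names used by `plot_visualizations` are not given in this notebook, so nothing is loaded from disk here.

```python
import matplotlib.pyplot as plt
import pandas as pd

def plot_average_temperature(df):
    """Draw a bar chart of mean temperature per city and return the figure and axes."""
    means = df.groupby("city")["temperature_celsius"].mean().sort_index()
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(means.index, means.values)
    ax.set_xlabel("City")
    ax.set_ylabel("Average temperature (°C)")
    ax.set_title("Average Temperature per City")
    fig.tight_layout()
    return fig, ax

demo = pd.DataFrame({
    "city": ["New York", "New York", "New York", "London", "London"],
    "temperature_celsius": [5.0, 8.923529, 7.0, 8.0, 6.0],
})
fig, ax = plot_average_temperature(demo)
```

In a notebook the figure is displayed when the cell finishes. Passing `df_transformed` instead of `demo` would plot the full dataset.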