# `FRE 521D_Assignment 2_Group 3`
### Members: Janine, Juliette, Margaret & Clare

## Task 1: Pipeline Architecture Design

### (a) Data Flow Diagram

The ETL pipeline follows a layered Extract–Transform–Load–Aggregate architecture as illustrated in Figure 1.
Weather data is extracted from the Open-Meteo Historical Weather API using country centroid coordinates from a CSV file. The pipeline enforces rate limiting and retry logic to ensure reliable API access.

During the transformation stage, JSON responses are flattened, cleaned, validated, and standardized. The transformed data is then loaded into a daily weather table using upsert operations to prevent duplication.

Finally, daily weather data is aggregated into monthly and annual summary tables. These aggregated datasets are joined with crop production data to produce an integrated analytical view used for business analysis.

---
```
┌─────────────────────────────────────────────────────────────────────────────────────┐
│                        ETL PIPELINE ARCHITECTURE                                    │
├─────────────────────────────────────────────────────────────────────────────────────┤
│                                                                                     │
│  ┌─────────────┐     ┌─────────────┐      ┌─────────────┐      ┌─────────────┐      │
│  │   EXTRACT   │────>│  TRANSFORM  │─────>│    LOAD     │─────>│  AGGREGATE  │      │
│  └─────────────┘     └─────────────┘      └─────────────┘      └─────────────┘      │
│        │                   │                   │                   │                │
│        ▼                   ▼                   ▼                   ▼                │
│  ┌──────────────┐    ┌───────────┐       ┌──────────────┐    ┌───────────────────┐  │
│  │  Open-Meteo  │    │ Flatten   │       │daily_weather │    │ monthly_weather   │  │
│  │   API        │    │ JSON      │       │ Table        │    │                   │  │
│  │              │    │           │       │              │    │- Monthly & Annual │  │
│  │ - Rate limit │    │ - Parse   │       │ - Upsert     │    │- Calculate Metrics│  │
│  │ - Retry      │    │ - Validate│       │ - Dedupe     │    │- Join with        │  │
│  │ - Country CSV│    │ - Clean   │       │ - Index      │    │  production data  │  │
│  └──────────────┘    └───────────┘       └──────────────┘    └───────────────────┘  │
│        │                   │                   │                       │            │
│        ▼                   ▼                   ▼───────────────────────▼───────┐    │
│     Raw Layer          Cleaned Layer        Aggregate Layers           │       │    │
│        │                   │                   │                       │       │    │
│        ▼                   ▼                   ▼                       ▼       ▼    │
│  ┌───────────┐       ┌───────────┐       ┌───────────┐  ┌────────────────────────┐  │
│  │  Logging  │       │  Logging  │       │  Logging  │  │        Logging         │  │
│  │ - Success │       │ - Records │       │ - Inserts │  │- weather & crop data   │  │
│  │ - Errors  │       │ - Nulls   │       │ - Commits │  │- Country + Year level  │  │
│  │ - Timing  │       │ - Types   │       │ - Errors  │  │    ┌───────────────┐   │  │
│  └───────────┘       └───────────┘       └───────────┘  └────│ANALYSIS READY │───┘  │
│                                                              └───────────────┘      │
└─────────────────────────────────────────────────────────────────────────────────────┘
```
**Figure 1:** ETL pipeline data flow illustrating extraction from the Open-Meteo API, transformation through raw and cleaned layers, aggregation, and integration with crop production data.

```
┌─────────────────────────────────────────────────────────────────────┐
│                    TABLE RELATIONSHIPS                              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│   A-1 TABLES                         A-2 TABLES                     │
│   ──────────                         ──────────                     │
│                                                                     │
│   ┌──────────────┐                   ┌──────────────┐               │
│   │crop_production│                   │daily_weather │              │
│   │              │                   │              │               │
│   │ iso3_code ───┼───────────────────┼─ iso3_code   │               │
│   │ year      ───┼───┐               │ date         │               │
│   │ crop         │   │               └──────────────┘               │
│   │ production   │   │                      │                       │
│   │ yield        │   │                      │ Aggregate             │
│   └──────────────┘   │                      ▼                       │
│          │           │               ┌──────────────┐               │
│          │           │               │annual_weather│               │
│          │           │               │              │               │
│          │           └───────────────┼─ iso3_code   │               │
│          │                           │ year ────────┼───┐           │
│          │                           │ weather vars │   │           │
│          │                           └──────────────┘   │           │
│          │                                              │           │
│          │              JOIN ON                         │           │
│          │         iso3_code + year                     │           │
│          │                                              │           │
│          ▼                                              ▼           │
│   ┌─────────────────────────────────────────────────────────┐       │
│   │              climate_agriculture_analysis               │       │
│   │                   (Integrated View)                     │       │
│   │                                                         │       │
│   │  - Country attributes (name, region, income group)      │       │
│   │  - Crop metrics (production, yield, area, fertilizer)   │       │
│   │  - Climate metrics (temp, precip, GDD, extremes)        │       │
│   │  - Derived: water balance, temp bucket                  │       │
│   └─────────────────────────────────────────────────────────┘       │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
**Figure 2:** Table relationships showing integration between Assignment 1 crop production tables and new weather tables using iso3_code and year.

### (b) Schema for New Weather Tables

The ETL pipeline introduces several new tables to store weather data:

**Daily Weather Table (`daily_weather`)**

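Stores cleaned daily weather observations, including:
- `iso3_code`
- `date`
- temperature metrics (mean, max, min)
- precipitation and rain totals
- reference evapotranspiration (ET0)
- lineage metadata (`source`, `extracted_at`) and the derived `year` and `month`

This table enforces uniqueness on (`iso3_code`, `date`) through its composite primary key and serves as the cleaned data layer.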
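Because the primary key accepts only one row per country and day, and the UPSERT silently overwrites a repeated key, it can help to look for duplicate (`iso3_code`, `date`) pairs in pandas before loading. A minimal sketch, assuming the cleaned DataFrame uses the column names above; `find_duplicate_keys` is a hypothetical helper and the sample values are illustrative:

```python
import pandas as pd


def find_duplicate_keys(df: pd.DataFrame, keys=("iso3_code", "date")) -> pd.DataFrame:
    """Return every row whose (iso3_code, date) pair occurs more than once.

    Hypothetical helper mirroring the composite primary key of daily_weather,
    so duplicates can be inspected before the UPSERT overwrites them.
    """
    mask = df.duplicated(subset=list(keys), keep=False)  # keep=False marks all copies
    return df.loc[mask].sort_values(list(keys))


# Small illustrative frame (not real API data)
sample = pd.DataFrame({
    "iso3_code": ["USA", "USA", "CAN"],
    "date": ["2015-01-01", "2015-01-01", "2015-01-01"],
    "temperature_mean": [-7.7, -7.5, -15.0],
})
dupes = find_duplicate_keys(sample)
print(len(dupes))  # 2 rows share the key (USA, 2015-01-01)
```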

**Monthly Weather Table (`monthly_weather`)**

Stores monthly aggregated climate summaries by country and month.

**Annual Weather Table (`annual_weather`)**

Stores annual aggregated climate indicators by country and year, including derived metrics such as:
- Growing Degree Days (GDD)
- Precipitation variability
- Extreme temperature counts
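The annual metrics are not given formal definitions here, so the following is only a sketch under stated assumptions: GDD sums the daily mean temperature above a 10 °C base (floored at zero), precipitation variability is the coefficient of variation of the monthly totals, and "extreme" days are those with a maximum above 35 °C or a minimum below 0 °C. The thresholds and the function name `annual_summary` are hypothetical placeholders, not the group's final definitions.

```python
import numpy as np
import pandas as pd

# Assumed thresholds (hypothetical placeholders, not the group's final definitions)
GDD_BASE_C = 10.0   # base temperature for growing degree days
HOT_DAY_C = 35.0    # a "hot" day has temperature_max above this
FROST_DAY_C = 0.0   # a "frost" day has temperature_min below this


def annual_summary(daily: pd.DataFrame) -> pd.DataFrame:
    """Aggregate daily_weather-style rows to one row per (iso3_code, year)."""
    df = daily.copy()
    df["date"] = pd.to_datetime(df["date"])
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month

    # Growing degree days: daily mean above the base temperature, floored at zero
    df["gdd"] = (df["temperature_mean"] - GDD_BASE_C).clip(lower=0)
    df["hot_day"] = df["temperature_max"] > HOT_DAY_C
    df["frost_day"] = df["temperature_min"] < FROST_DAY_C

    annual = df.groupby(["iso3_code", "year"]).agg(
        gdd=("gdd", "sum"),
        total_precip=("precipitation", "sum"),
        hot_days=("hot_day", "sum"),
        frost_days=("frost_day", "sum"),
    )

    # Precipitation variability: coefficient of variation of the monthly totals
    monthly = df.groupby(["iso3_code", "year", "month"])["precipitation"].sum()
    annual["precip_cv"] = monthly.groupby(level=["iso3_code", "year"]).agg(
        lambda s: s.std() / s.mean() if s.mean() > 0 else np.nan
    )
    return annual.reset_index()
```

Applied to the cleaned layer, `annual_summary(df_clean)` would yield one row per country-year, ready to join on `iso3_code` + `year`.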

### (c) Relationship to Assignment 1 Tables

Weather data is integrated with the existing Assignment 1 crop production tables using shared geographic and temporal keys.

The `daily_weather` table is aggregated into the `annual_weather` table by `iso3_code` and `year`.

The `annual_weather` table is then joined with the `crop_production` table on:

**iso3_code + year**

This produces the integrated view `climate_agriculture_analysis`, which combines:
- Country attributes (name, region, income group)
- Crop metrics (production, yield, area, fertilizer use)
- Climate metrics (temperature, precipitation, GDD, extremes)
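The key-based join can be sketched in pandas, assuming `crop_production` and `annual_weather` have been read into DataFrames that both carry `iso3_code` and `year`; the other column names and values are illustrative. `validate="many_to_one"` makes pandas raise an error if `annual_weather` ever holds two rows for the same country-year, and `indicator=True` shows which crop rows found no weather match. A left join is used here so that gaps stay visible; the actual view may use a different join type.

```python
import pandas as pd


def integrate(crop_production: pd.DataFrame, annual_weather: pd.DataFrame) -> pd.DataFrame:
    """Left-join annual weather onto crop rows on the shared keys iso3_code + year."""
    return crop_production.merge(
        annual_weather,
        on=["iso3_code", "year"],
        how="left",
        validate="many_to_one",  # several crops per country-year, one weather row
        indicator=True,          # adds _merge: 'both' or 'left_only'
    )


# Illustrative inputs (not real data)
crops = pd.DataFrame({
    "iso3_code": ["USA", "USA", "CAN"],
    "year": [2015, 2015, 2015],
    "crop": ["maize", "wheat", "wheat"],
})
weather = pd.DataFrame({"iso3_code": ["USA"], "year": [2015], "gdd": [1500.0]})
result = integrate(crops, weather)
print(result["_merge"].value_counts())
```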

### (d) Error Handling and API Rate Limit Strategy

To ensure reliable data extraction, the pipeline implements:
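- A 5-second pause between successive country requests to stay within the API rate limits
- Retry logic with up to three attempts per request for failed calls (for example HTTP 429 responses or network errors)
- Exponential backoff between retries (1 s, then 2 s, doubling on each further attempt)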

Errors, failures, and processing times are logged at each ETL stage to allow monitoring and troubleshooting.
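The retry and backoff rules can be sketched independently of the network call. This is a minimal sketch, not the pipeline's code: `call` stands in for any function returning an HTTP status code, a payload, and an optional `Retry-After` value, and honoring `Retry-After` on HTTP 429 is an assumed refinement that goes beyond the design above. Injecting `sleep` lets the wait schedule be checked without actually waiting.

```python
import time
from typing import Callable, Optional, Tuple

# (status_code, payload, retry_after_seconds or None): hypothetical call signature
Result = Tuple[int, Optional[dict], Optional[float]]


def with_backoff(call: Callable[[], Result], max_attempts: int = 3,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Try `call` up to max_attempts times, waiting 1 s, 2 s, 4 s, ... between tries."""
    for attempt in range(max_attempts):
        status, payload, retry_after = call()
        if status == 200:
            return payload
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        wait = 2 ** attempt
        if status == 429 and retry_after is not None:
            wait = max(wait, retry_after)  # respect the server's requested pause
        sleep(wait)
    return None


# Example: two rate-limited responses, then success
responses = iter([(429, None, None), (429, None, 5.0), (200, {"daily": {}}, None)])
waits = []
data = with_backoff(lambda: next(responses), sleep=waits.append)
print(data, waits)  # {'daily': {}} [1, 5.0]
```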

### (e) Data Lineage and Logging

The pipeline tracks data lineage through both metadata storage and operational logging.

Each extracted weather record includes:
- Source identifier (Open-Meteo API)
- Extraction timestamp

Logging is implemented across the ETL process to capture:
- Extraction success or failure
- Record counts
- Data validation results (nulls and type issues)
- Load confirmations and database errors

This ensures transparency, reproducibility, and quality control throughout the pipeline.
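The stage-level logging could use Python's standard `logging` module rather than `print`, so each message carries a timestamp and level. A minimal sketch; the logger name `etl.weather`, the `key=value` message format, and the `log_stage` helper are assumptions for illustration:

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("etl.weather")  # hypothetical logger name


def configure_logging(level=logging.INFO):
    """Send log records to stderr with a timestamp, level, and logger name."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s %(message)s",
    )


@contextmanager
def log_stage(stage):
    """Log the start, duration, and any exception of one ETL stage."""
    start = time.perf_counter()
    logger.info("stage=%s status=started", stage)
    try:
        yield
    except Exception:
        # logger.exception records the message plus the traceback at ERROR level
        logger.exception("stage=%s status=failed", stage)
        raise
    else:
        elapsed = time.perf_counter() - start
        logger.info("stage=%s status=ok seconds=%.2f", stage, elapsed)


# Example usage inside the notebook
configure_logging()
with log_stage("transform"):
    n_records, n_nulls = 13148, 0  # counts as reported by the cleaned-layer checks
    logger.info("stage=transform records=%d nulls=%d", n_records, n_nulls)
```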


## Task 2: ETL Pipeline Implementation

Our Task 2 work plan:

1. Extract: CSV ingestion, API ingestion, rate limiting, logging
2. Transform: JSON flattening, raw layer, cleaned layer, type enforcement, validation
3. Load: table creation, constraints (CHECK + PK), batch UPSERT loading, verification

```python
# ============================================
# ETL PIPELINE SETUP
# ============================================

# Standard imports for ETL work
import json
import os
import time
from datetime import datetime

import numpy as np
import pandas as pd
import requests

# For the database connection
from sqlalchemy import create_engine, text

# Load the SQL magic extension (use %reload_ext sql if it is already loaded)
%load_ext sql

# Display settings
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)
pd.set_option('display.max_colwidth', 50)

print(f"Pandas version: {pd.__version__}")
print(f"Current time: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("Setup complete!")
```

```
The sql extension is already loaded. To reload it, use:
  %reload_ext sql
Pandas version: 2.3.3
Current time: 2026-01-29 15:15:02
Setup complete!
```


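```python
# ============================================
# DATABASE CONNECTION
# ============================================
import os

from sqlalchemy import create_engine, text

# Database connection parameters (environment variables override the course
# defaults, so credentials do not have to be hard-coded in the notebook)
DB_USER = os.getenv("DB_USER", "mfre521d_user")
DB_PASSWORD = os.getenv("DB_PASSWORD", "mfre521d_user_pw")
DB_HOST = os.getenv("DB_HOST", "localhost")
DB_PORT = os.getenv("DB_PORT", "3306")
DB_NAME = os.getenv("DB_NAME", "mfre521d")

# Create connection string
connection_string = f"mysql+pymysql://{DB_USER}:{DB_PASSWORD}@{DB_HOST}:{DB_PORT}/{DB_NAME}"

# Create SQLAlchemy engine (connections are opened lazily)
engine = create_engine(connection_string)

# Open one connection now so a wrong host or password fails here, not later
with engine.connect() as connection:
    connection.execute(text("SELECT 1"))

# Connect SQL magic
%sql {connection_string}

print("Connection established!")
```

```
Connection established!
```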


```python
# ============================================
# LOAD COUNTRY CENTROIDS CSV SAFELY
# ============================================

def read_csv_safely(filepath):
    """
    Read a CSV with full control over parsing (lecture style).
    Returns all columns as strings for safe processing.
    """
    df = pd.read_csv(
        filepath,
        dtype=str,              # read every column as a string
        keep_default_na=False,  # keep strings such as "NA" instead of converting them to NaN
        encoding='utf-8',
        low_memory=False
    )

    print(f"Loaded {len(df)} rows and {len(df.columns)} columns from {filepath}")
    return df


# Load the centroid file
centroids_path = "../data/country_centroids.csv"
df_centroids_raw = read_csv_safely(centroids_path)

# Quick inspection
print("\nColumn names:")
print(df_centroids_raw.columns.tolist())

print("\nFirst 5 rows:")
df_centroids_raw.head()
```

```
Loaded 34 rows and 5 columns from ../data/country_centroids.csv

Column names:
['iso3_code', 'country_name', 'latitude', 'longitude', 'hemisphere']

First 5 rows:
```

| | iso3_code | country_name | latitude | longitude | hemisphere |
|---|---|---|---|---|---|
| 0 | USA | United States | 39.8283 | -98.5795 | Northern |
| 1 | CAN | Canada | 56.1304 | -106.3468 | Northern |
| 2 | MEX | Mexico | 23.6345 | -102.5528 | Northern |
| 3 | BRA | Brazil | -14.235 | -51.9253 | Southern |
| 4 | ARG | Argentina | -38.4161 | -63.6167 | Southern |
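Because every column is read as a string, the coordinates are never checked numerically before they are sent to the API. A minimal sketch of such a check, assuming the column names and the `Northern`/`Southern` labels shown above; `validate_centroids` is a hypothetical helper, and treating latitude 0 as Northern is an assumption:

```python
import pandas as pd


def validate_centroids(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows whose coordinates are missing, non-numeric, or out of range,
    or whose hemisphere label disagrees with the sign of the latitude."""
    lat = pd.to_numeric(df["latitude"], errors="coerce")
    lon = pd.to_numeric(df["longitude"], errors="coerce")
    # NaN fails between(), so unparseable coordinates are flagged as well
    bad_range = ~lat.between(-90, 90) | ~lon.between(-180, 180)
    # Assumption: latitude >= 0 is labelled "Northern", otherwise "Southern"
    expected = lat.apply(lambda v: "Northern" if v >= 0 else "Southern")
    bad_hemisphere = lat.notna() & (df["hemisphere"] != expected)
    return df.loc[bad_range | bad_hemisphere]
```

An empty result from `validate_centroids(df_centroids_raw)` would mean every centroid is usable as-is.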


```python
# ============================================
# OPEN-METEO API EXTRACTION FUNCTION
# ============================================

BASE_URL = "https://archive-api.open-meteo.com/v1/archive"

# Daily variables requested from the API
WEATHER_VARS = [
    "temperature_2m_mean",
    "temperature_2m_max",
    "temperature_2m_min",
    "precipitation_sum",
    "rain_sum",
    "et0_fao_evapotranspiration"
]

START_DATE = "2015-01-01"
END_DATE = "2023-12-31"


def fetch_weather_data(lat, lon, start_date, end_date, max_retries=3, timeout=60):
    """
    Fetch historical daily weather data from the Open-Meteo archive API.
    Makes up to max_retries attempts with exponential backoff (1 s, 2 s, ...)
    between them (lecture style). Returns the parsed JSON, or None on failure.
    """

    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start_date,
        "end_date": end_date,
        "daily": ",".join(WEATHER_VARS),
        "timezone": "UTC"
    }

    for attempt in range(max_retries):
        try:
            # The timeout stops a stalled connection from hanging the whole loop
            response = requests.get(BASE_URL, params=params, timeout=timeout)

            print(f"Request URL: {response.url}")
            print(f"Status Code: {response.status_code}")

            if response.status_code == 200:
                return response.json()

            print(f"API error (attempt {attempt + 1}): {response.status_code}")

        except requests.RequestException as e:
            # Network-level failures (DNS errors, resets, timeouts, ...)
            print(f"Request failed (attempt {attempt + 1}): {e}")

        # Exponential backoff, skipped after the final attempt
        if attempt < max_retries - 1:
            wait_time = 2 ** attempt
            print(f"Retrying after {wait_time} seconds...")
            time.sleep(wait_time)

    print("Max retries reached. Returning None.")
    return None
```


```python
# ============================================
# TESTING API CALL WITH ONE COUNTRY
# ============================================

# Pick the first country (USA)
test_row = df_centroids_raw.iloc[0]

test_lat = test_row['latitude']
test_lon = test_row['longitude']
test_country = test_row['iso3_code']

print(f"Testing API for country: {test_country}")

test_data = fetch_weather_data(
    lat=test_lat,
    lon=test_lon,
    start_date=START_DATE,
    end_date=END_DATE
)

# Inspect structure
if test_data:
    print("\nTop-level keys:")
    print(test_data.keys())

    print("\nDaily data keys:")
    print(test_data['daily'].keys())

    print("\nFirst 5 dates:")
    print(test_data['daily']['time'][:5])
```

```
Testing API for country: USA
Request URL: https://archive-api.open-meteo.com/v1/archive?latitude=39.8283&longitude=-98.5795&start_date=2015-01-01&end_date=2023-12-31&daily=temperature_2m_mean%2Ctemperature_2m_max%2Ctemperature_2m_min%2Cprecipitation_sum%2Crain_sum%2Cet0_fao_evapotranspiration&timezone=UTC
Status Code: 429
API error (attempt 1): 429
Retrying after 1 seconds...
Request URL: https://archive-api.open-meteo.com/v1/archive?latitude=39.8283&longitude=-98.5795&start_date=2015-01-01&end_date=2023-12-31&daily=temperature_2m_mean%2Ctemperature_2m_max%2Ctemperature_2m_min%2Cprecipitation_sum%2Crain_sum%2Cet0_fao_evapotranspiration&timezone=UTC
Status Code: 429
API error (attempt 2): 429
Retrying after 2 seconds...
Request URL: https://archive-api.open-meteo.com/v1/archive?latitude=39.8283&longitude=-98.5795&start_date=2015-01-01&end_date=2023-12-31&daily=temperature_2m_mean%2Ctemperature_2m_max%2Ctemperature_2m_min%2Cprecipitation_sum%2Crain_sum%2Cet0_fao_evapotranspiration&timezo
```

```python
# ============================================
# FLATTEN WEATHER API JSON INTO TABULAR FORMAT
# ============================================
def flatten_weather_json(weather_json, iso3_code, verbose=False):
    """
    Convert the API's daily JSON block into tabular format.
    Adds the country code and lineage metadata.
    """

    # Extract the daily section
    daily_data = weather_json['daily']

    # Build the dataframe (the scalar iso3_code is broadcast to every row)
    df_raw = pd.DataFrame({
        "iso3_code": iso3_code,
        "date": daily_data['time'],
        "temperature_mean": daily_data['temperature_2m_mean'],
        "temperature_max": daily_data['temperature_2m_max'],
        "temperature_min": daily_data['temperature_2m_min'],
        "precipitation": daily_data['precipitation_sum'],
        "rain": daily_data['rain_sum'],
        "evapotranspiration": daily_data['et0_fao_evapotranspiration']
    })

    # Add lineage metadata
    df_raw['source'] = 'Open-Meteo API'
    df_raw['extracted_at'] = datetime.now()

    if verbose:
        print(f"Created raw dataframe with {len(df_raw)} rows for {iso3_code}")

    return df_raw
```
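`pd.DataFrame` raises a fairly generic `ValueError` if the arrays under `daily` differ in length, and a missing variable raises `KeyError`. A minimal defensive check to run before flattening, sketched under the assumption that the response keeps the `daily` layout used above; `check_daily_block` is hypothetical, and the demo values are taken from the USA sample rows:

```python
def check_daily_block(weather_json: dict, variables: list) -> int:
    """Verify that 'daily' holds 'time' plus every requested variable,
    all of equal length. Returns the number of days."""
    daily = weather_json.get("daily")
    if daily is None:
        raise ValueError("response has no 'daily' block")

    missing = [v for v in ["time", *variables] if v not in daily]
    if missing:
        raise ValueError(f"missing daily variables: {missing}")

    n_days = len(daily["time"])
    uneven = {v: len(daily[v]) for v in variables if len(daily[v]) != n_days}
    if uneven:
        raise ValueError(f"arrays differ from {n_days} days: {uneven}")
    return n_days


# Illustrative two-day response
demo = {"daily": {"time": ["2015-01-01", "2015-01-02"],
                  "temperature_2m_mean": [-7.7, -4.3],
                  "rain_sum": [0.0, 0.0]}}
print(check_daily_block(demo, ["temperature_2m_mean", "rain_sum"]))  # 2
```

In the extraction loop it would be called as `check_daily_block(weather_json, WEATHER_VARS)` just before `flatten_weather_json`.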


```python
# ============================================
# TEST FLATTENING FUNCTION
# ============================================

df_raw_test = flatten_weather_json(test_data, test_country, verbose=True)

print("\nRaw dataframe shape:")
print(df_raw_test.shape)

print("\nFirst 5 rows:")
df_raw_test.head()
```


```python
# ============================================
# EXTRACT WEATHER DATA FOR ALL COUNTRIES
# ============================================

all_raw_data = []
failed_countries = []  # countries whose request still failed after all retries

start_time = datetime.now()

for idx, row in df_centroids_raw.iterrows():

    iso3 = row['iso3_code']
    lat = row['latitude']
    lon = row['longitude']

    # Fetch weather data
    weather_json = fetch_weather_data(
        lat=lat,
        lon=lon,
        start_date=START_DATE,
        end_date=END_DATE
    )

    if weather_json is not None:
        df_raw_country = flatten_weather_json(weather_json, iso3)
        all_raw_data.append(df_raw_country)
    else:
        failed_countries.append(iso3)

    # Rate limiting: pause 5 seconds before the next country's request
    time.sleep(5)

end_time = datetime.now()
print("ETL extraction completed.")
print(f"Countries extracted: {len(all_raw_data)} of {len(df_centroids_raw)}")
print(f"Failed countries: {failed_countries}")
print(f"Total runtime: {end_time - start_time}")
```

```
Request URL: https://archive-api.open-meteo.com/v1/archive?latitude=39.8283&longitude=-98.5795&start_date=2015-01-01&end_date=2023-12-31&daily=temperature_2m_mean%2Ctemperature_2m_max%2Ctemperature_2m_min%2Cprecipitation_sum%2Crain_sum%2Cet0_fao_evapotranspiration&timezone=UTC
Status Code: 200
Request URL: https://archive-api.open-meteo.com/v1/archive?latitude=56.1304&longitude=-106.3468&start_date=2015-01-01&end_date=2023-12-31&daily=temperature_2m_mean%2Ctemperature_2m_max%2Ctemperature_2m_min%2Cprecipitation_sum%2Crain_sum%2Cet0_fao_evapotranspiration&timezone=UTC
Status Code: 200
Request URL: https://archive-api.open-meteo.com/v1/archive?latitude=23.6345&longitude=-102.5528&start_date=2015-01-01&end_date=2023-12-31&daily=temperature_2m_mean%2Ctemperature_2m_max%2Ctemperature_2m_min%2Cprecipitation_sum%2Crain_sum%2Cet0_fao_evapotranspiration&timezone=UTC
Status Code: 200
Request URL: https://archive-api.open-meteo.com/v1/archive?latitude=-14.235&longitude=-51.9253&start_date=2015-01
```

```python
# ============================================
# COMBINE ALL RAW DATA
# ============================================

# pd.concat raises "No objects to concatenate" on an empty list, so fail with a clearer message
if not all_raw_data:
    raise RuntimeError("No country data was extracted; check the API responses above.")

df_raw_all = pd.concat(all_raw_data, ignore_index=True)

print("\nCombined raw dataset shape:")
print(df_raw_all.shape)

print("\nSample rows:")
df_raw_all.head()
```

```
Combined raw dataset shape:
(13148, 10)

Sample rows:
```

| | iso3_code | date | temperature_mean | temperature_max | temperature_min | precipitation | rain | evapotranspiration | source | extracted_at |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | USA | 2015-01-01 | -7.7 | 0.6 | -12.1 | 0.0 | 0.0 | 1.1 | Open-Meteo API | 2026-01-29 15:03:53.931187 |
| 1 | USA | 2015-01-02 | -4.3 | 2.0 | -8.8 | 0.0 | 0.0 | 1.11 | Open-Meteo API | 2026-01-29 15:03:53.931187 |
| 2 | USA | 2015-01-03 | -2.5 | 4.0 | -5.3 | 0.3 | 0.0 | 0.97 | Open-Meteo API | 2026-01-29 15:03:53.931187 |
| 3 | USA | 2015-01-04 | -12.2 | -6.6 | -16.3 | 4.1 | 0.0 | 0.84 | Open-Meteo API | 2026-01-29 15:03:53.931187 |
| 4 | USA | 2015-01-05 | -10.0 | -3.8 | -13.6 | 0.0 | 0.0 | 0.76 | Open-Meteo API | 2026-01-29 15:03:53.931187 |
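A completeness check compares each country's rows with the number of days in the requested window. The window 2015-01-01 to 2023-12-31 spans 3,287 days (nine years of 365 days plus the 2016 and 2020 leap days), so the 13,148 rows above equal 4 × 3,287, which is consistent with four complete countries out of the 34 centroids. A minimal sketch; `completeness_report` is a hypothetical helper:

```python
import pandas as pd


def completeness_report(df: pd.DataFrame, start: str, end: str) -> pd.DataFrame:
    """Per-country row count, distinct in-window days, and missing days."""
    expected_dates = pd.date_range(start, end, freq="D")
    dates = pd.to_datetime(df["date"])
    # Dates outside the requested window become NaT so they do not count as coverage
    in_window = dates.where(dates.isin(expected_dates))
    groups = in_window.groupby(df["iso3_code"])
    report = pd.DataFrame({
        "rows": groups.size(),            # all rows, including duplicates
        "unique_days": groups.nunique(),  # distinct dates inside the window (NaT ignored)
    })
    report["expected_days"] = len(expected_dates)
    report["missing_days"] = report["expected_days"] - report["unique_days"]
    report.index.name = "iso3_code"
    return report.reset_index()
```

For this run it would be called as `completeness_report(df_raw_all, START_DATE, END_DATE)`.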


```python
# ============================================
# CLEANED LAYER - TYPE CONVERSIONS
# ============================================

df_clean = df_raw_all.copy()

# Convert date column
df_clean['date'] = pd.to_datetime(df_clean['date'])

# Convert numeric columns (unparseable values become NaN)
numeric_cols = [
    'temperature_mean',
    'temperature_max',
    'temperature_min',
    'precipitation',
    'rain',
    'evapotranspiration'
]

for col in numeric_cols:
    df_clean[col] = pd.to_numeric(df_clean[col], errors='coerce')

# Derive year and month
df_clean['year'] = df_clean['date'].dt.year
df_clean['month'] = df_clean['date'].dt.month

print("Type conversion completed!")

print("\nData types:")
print(df_clean.dtypes)
```

```
Type conversion completed!

Data types:
iso3_code                     object
date                  datetime64[ns]
temperature_mean             float64
temperature_max              float64
temperature_min              float64
precipitation                float64
rain                         float64
evapotranspiration           float64
source                        object
extracted_at          datetime64[us]
year                           int32
month                          int32
dtype: object
```


```python
# ============================================
# NULL VALUE CHECKS
# ============================================

print("\nNull counts per column:")
print(df_clean.isnull().sum())
```

```
Null counts per column:
iso3_code             0
date                  0
temperature_mean      0
temperature_max       0
temperature_min       0
precipitation         0
rain                  0
evapotranspiration    0
source                0
extracted_at          0
year                  0
month                 0
dtype: int64
```


```python
# ============================================
# BASIC RANGE VALIDATION
# ============================================

print("\nTemperature range:")
print(df_clean[['temperature_min', 'temperature_mean', 'temperature_max']].describe())

print("\nPrecipitation range:")
print(df_clean['precipitation'].describe())

print("\nEvapotranspiration range:")
print(df_clean['evapotranspiration'].describe())
```

```
Temperature range:
       temperature_min  temperature_mean  temperature_max
count     13148.000000      13148.000000     13148.000000
mean          9.346288         14.456678        20.615082
std          12.890046         13.379046        14.031799
min         -42.500000        -37.900000       -33.900000
25%           2.400000          7.700000        14.100000
50%          12.400000         18.400000        25.200000
75%          19.700000         24.900000        30.900000
max          27.900000         33.300000        40.800000

Precipitation range:
count    13148.000000
mean         1.845246
std          4.988044
min          0.000000
25%          0.000000
50%          0.000000
75%          1.000000
max        125.600000
Name: precipitation, dtype: float64

Evapotranspiration range:
count    13148.000000
mean         3.795084
std          2.148020
min          0.030000
25%          2.090000
50%          4.040000
75%          5.250000
max         11.010000
Name: evapotranspirat
```
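`describe()` confirms plausible ranges but does not check relationships between columns. A minimal sketch of row-level rules the cleaned layer could enforce; the rule set (min ≤ mean ≤ max, rain ≤ precipitation, non-negative water variables) is an assumption drawn from the variable definitions, the small tolerance absorbs rounding, and `flag_invalid_rows` is hypothetical:

```python
import pandas as pd


def flag_invalid_rows(df: pd.DataFrame, tol: float = 1e-6) -> pd.DataFrame:
    """Return a boolean DataFrame: one column per rule, True where a row breaks it."""
    return pd.DataFrame({
        # Daily mean should lie between the daily minimum and maximum
        "temp_order": (df["temperature_min"] > df["temperature_mean"] + tol)
                      | (df["temperature_mean"] > df["temperature_max"] + tol),
        # Rain is one component of total precipitation
        "rain_gt_precip": df["rain"] > df["precipitation"] + tol,
        # Water quantities cannot be negative
        "negative_water": (df[["precipitation", "rain", "evapotranspiration"]] < 0).any(axis=1),
    })
```

`flag_invalid_rows(df_clean).sum()` would then give the number of violations per rule.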

```python
# ============================================
# CHECK EXISTING TABLES IN SQL DATABASE
# ============================================

query = "SHOW TABLES;"

df_tables = pd.read_sql(query, engine)

print("Tables in database:")
print(df_tables)
```

```
Tables in database:
              Tables_in_mfre521d
0                   AirQuality_2
1           air_quality_readings
2   climate_agriculture_analysis
3                      countries
4           country_name_mapping
5                crop_production
6                          crops
7                  daily_summary
8                  daily_weather
9                 food_nutrition
10               monthly_summary
11          pollution_thresholds
12                   sensor_info
13         temperature_anomalies
14          temperature_readings
15            validated_readings
16              weather_stations
```


```python
# ============================================
# CREATE DAILY_WEATHER TABLE
# ============================================

create_daily_weather_table = """
CREATE TABLE IF NOT EXISTS daily_weather (
    iso3_code VARCHAR(3) NOT NULL,
    date DATE NOT NULL,

    temperature_mean FLOAT,
    temperature_max FLOAT,
    temperature_min FLOAT,
    precipitation FLOAT,
    rain FLOAT,
    evapotranspiration FLOAT,

    source VARCHAR(50),
    extracted_at DATETIME,

    year INT,
    month INT,

    -- Primary key on the natural key makes reloads idempotent
    PRIMARY KEY (iso3_code, date),

    -- Basic validation constraints (enforced from MySQL 8.0.16;
    -- earlier versions parse CHECK clauses but ignore them)
    CHECK (year BETWEEN 2015 AND 2023),
    CHECK (month BETWEEN 1 AND 12),
    CHECK (precipitation >= 0),
    CHECK (rain >= 0),
    CHECK (evapotranspiration >= 0)
);
"""

# IF NOT EXISTS makes this a no-op when the table is already present,
# so changes to the definition above do not reach an existing table
with engine.connect() as connection:
    connection.execute(text(create_daily_weather_table))
    connection.commit()

print("daily_weather table created successfully!")
```

```
daily_weather table created successfully!
```


```python
# ============================================
# UPSERT QUERY FOR DAILY_WEATHER
# ============================================

# ON DUPLICATE KEY UPDATE overwrites an existing (iso3_code, date) row instead of
# failing. VALUES(col) still works on MySQL 8.0 but is deprecated from 8.0.20.
insert_daily_weather = """
INSERT INTO daily_weather (
    iso3_code,
    date,
    temperature_mean,
    temperature_max,
    temperature_min,
    precipitation,
    rain,
    evapotranspiration,
    source,
    extracted_at,
    year,
    month
)
VALUES (
    :iso3_code,
    :date,
    :temperature_mean,
    :temperature_max,
    :temperature_min,
    :precipitation,
    :rain,
    :evapotranspiration,
    :source,
    :extracted_at,
    :year,
    :month
)
ON DUPLICATE KEY UPDATE
    temperature_mean = VALUES(temperature_mean),
    temperature_max = VALUES(temperature_max),
    temperature_min = VALUES(temperature_min),
    precipitation = VALUES(precipitation),
    rain = VALUES(rain),
    evapotranspiration = VALUES(evapotranspiration),
    source = VALUES(source),
    extracted_at = VALUES(extracted_at),
    year = VALUES(year),
    month = VALUES(month);
"""
```
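Listing each column three times by hand invites typos. The statement could instead be generated from a single column list. This sketch uses the row-alias form (`VALUES (...) AS new ... col = new.col`) that MySQL supports from 8.0.19, since `VALUES(col)` inside `ON DUPLICATE KEY UPDATE` is deprecated from 8.0.20; on older servers the hand-written `VALUES(col)` form above is the one to keep. `build_upsert` is a hypothetical helper, and because names are interpolated into the SQL it should only receive trusted table and column names.

```python
def build_upsert(table: str, columns: list, key_columns: list) -> str:
    """Build a MySQL INSERT ... ON DUPLICATE KEY UPDATE statement with
    SQLAlchemy-style :named placeholders. Key columns are never updated.
    Names are interpolated directly, so pass only trusted identifiers."""
    col_list = ",\n    ".join(columns)
    placeholders = ",\n    ".join(f":{c}" for c in columns)
    updates = ",\n    ".join(
        f"{c} = new.{c}" for c in columns if c not in key_columns
    )
    return (
        f"INSERT INTO {table} (\n    {col_list}\n)\n"
        f"VALUES (\n    {placeholders}\n) AS new\n"
        f"ON DUPLICATE KEY UPDATE\n    {updates};"
    )


DAILY_COLUMNS = [
    "iso3_code", "date", "temperature_mean", "temperature_max", "temperature_min",
    "precipitation", "rain", "evapotranspiration", "source", "extracted_at",
    "year", "month",
]
print(build_upsert("daily_weather", DAILY_COLUMNS, key_columns=["iso3_code", "date"]))
```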


```python
# ============================================
# LOAD CLEANED DATA INTO DATABASE
# ============================================

batch_size = 1000
total_rows = len(df_clean)

print(f"Loading {total_rows} rows into daily_weather...")

with engine.connect() as connection:

    for start in range(0, total_rows, batch_size):
        end = start + batch_size

        # A list of dicts is passed to the driver as one executemany call
        batch = df_clean.iloc[start:end].to_dict(orient='records')

        connection.execute(text(insert_daily_weather), batch)
        connection.commit()  # commit per batch so a later failure keeps earlier batches

print("Data loading completed successfully!")
```

```
Loading 13148 rows into daily_weather...
Data loading completed successfully!
```
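The load worked here because the null check found no missing values. If a later extraction contains gaps, `to_dict` passes `float('nan')` to the driver, which PyMySQL cannot send to MySQL as a number. A minimal sketch that converts missing values to `None` (stored as SQL NULL) before batching; `records_for_sql` is a hypothetical helper:

```python
import pandas as pd


def records_for_sql(df: pd.DataFrame) -> list:
    """Convert a DataFrame to a list of dicts with NaN/NaT replaced by None."""
    # Casting to object first keeps None as None instead of turning it back into NaN
    return df.astype(object).where(df.notna(), None).to_dict(orient="records")
```

Inside the loading loop, `batch = records_for_sql(df_clean.iloc[start:end])` would replace the direct `to_dict` call.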


```python
# ============================================
# VERIFY LOAD
# ============================================

check_query = "SELECT COUNT(*) AS row_count FROM daily_weather;"

df_check = pd.read_sql(check_query, engine)

print(df_check)
```

```
   row_count
0      98610
```
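`COUNT(*)` covers every row in `daily_weather`, including rows upserted by earlier runs, which is why it exceeds the 13,148 rows loaded in this run; 98,610 equals 30 × 3,287, consistent with 30 complete countries. A per-country query shows more directly which countries are present, whether each is complete, and when their rows were last extracted. A sketch using `pd.read_sql`, which accepts the SQLAlchemy engine above; `rows_per_country` and the default of 3,287 expected days are assumptions for illustration:

```python
import pandas as pd

PER_COUNTRY_QUERY = """
SELECT iso3_code,
       COUNT(*)          AS row_count,
       MIN(date)         AS first_date,
       MAX(date)         AS last_date,
       MAX(extracted_at) AS last_extracted
FROM daily_weather
GROUP BY iso3_code
ORDER BY iso3_code;
"""


def rows_per_country(con, expected_days: int = 3287) -> pd.DataFrame:
    """One row per country with its date coverage and an 'incomplete' flag."""
    report = pd.read_sql(PER_COUNTRY_QUERY, con)
    report["incomplete"] = report["row_count"] != expected_days
    return report
```

In the notebook this would be called as `rows_per_country(engine)`.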


```python
# ============================================
# DAILY_WEATHER TABLE PREVIEW
# ============================================

preview_query = "SELECT * FROM daily_weather LIMIT 5;"

df_preview = pd.read_sql(preview_query, engine)

df_preview
```

| | iso3_code | date | temperature_mean | temperature_max | temperature_min | precipitation | rain | evapotranspiration | source | extracted_at | year | month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ARG | 2015-01-01 | 17.7 | 20.7 | 14.3 | 0.0 | 0.0 | 5.86 | Open-Meteo API | 2026-01-29 13:10:55 | 2015 | 1 |
| 1 | ARG | 2015-01-02 | 17.4 | 24.0 | 10.8 | 0.0 | 0.0 | 6.11 | Open-Meteo API | 2026-01-29 13:10:55 | 2015 | 1 |
| 2 | ARG | 2015-01-03 | 21.5 | 28.5 | 16.2 | 0.0 | 0.0 | 7.98 | Open-Meteo API | 2026-01-29 13:10:55 | 2015 | 1 |
| 3 | ARG | 2015-01-04 | 20.0 | 30.3 | 10.9 | 0.0 | 0.0 | 7.71 | Open-Meteo API | 2026-01-29 13:10:55 | 2015 | 1 |
| 4 | ARG | 2015-01-05 | 28.1 | 36.4 | 21.9 | 0.3 | 0.3 | 9.25 | Open-Meteo API | 2026-01-29 13:10:55 | 2015 | 1 |


## Task 3: Data Aggregation and Integration