# Data Acquisition: Historical Parisian Meteorological Archive

## Project Context and Objectives

This notebook initiates an academic study into the forecasting of 2-meter air temperature (`temperature_2m`) in Paris, France.

The primary objective of this notebook is to acquire a high-fidelity, multivariate meteorological dataset at hourly resolution. The dataset spans a multi-decadal period (2000-01-01 to 2025-10-25), which is essential for capturing a wide range of climatological patterns, seasonal cycles, and anomalous weather events. The data are retrieved from the Open-Meteo Archive API.

This notebook is the foundational data collection step of the project. The Jupyter notebook format, used here and in the project's other notebooks, was chosen deliberately to adhere to the FAIR principles (Findable, Accessible, Interoperable, and Reusable). Every step is explicitly documented and executed in code to ensure full transparency and reproducibility.

In [1]:
# Environment Initialization
import os
import requests
import pandas as pd
from datetime import datetime

In [2]:
# Geospatial Definition
latitude = 48.8534
longitude = 2.3488

💡 Fun Fact: These coordinates are not an arbitrary "city center". They match the conventional reference point for central Paris, on the Île de la Cité near Notre-Dame. They lie roughly 2 km north-east of the Paris Observatory (Observatoire de Paris), one of the world's oldest astronomical observatories and a central location for Parisian meteorological records for centuries, which makes this part of the city a scientifically and historically relevant anchor for our study.
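Because the archive serves gridded data, the response reports the coordinates of the grid cell it resolved to (its `latitude` and `longitude` fields), which can differ slightly from the requested point. The sketch below shows one way to quantify that offset with a great-circle distance. The function name `haversine_km` and the grid-cell values are illustrative assumptions, not real API output.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points given in degrees."""
    radius_km = 6371.0088  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

requested = (48.8534, 2.3488)
# Placeholder values standing in for Data["latitude"] and Data["longitude"]; not real API output.
grid_cell = (48.85, 2.35)
offset_km = haversine_km(*requested, *grid_cell)
print(f"Grid cell is {offset_km:.2f} km from the requested point")
```

Once the response is available, the placeholder tuple can be replaced with the values returned by the API.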

In [3]:
# Temporal Definition
# A dynamic end_date (i.e., "today") is unsuitable for academic research:
# it makes the dataset non-static and breaks reproducibility.
# Hence, we define a fixed end_date for our study.
end_date = "2025-10-25"
start_date = "2000-01-01"
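A fixed date range also makes the expected size of the dataset predictable. The following sketch computes the number of hourly records we would expect, assuming the API returns exactly 24 records for every calendar day in the range (an assumption that should be checked against the actual response).

```python
from datetime import date

start = date.fromisoformat("2000-01-01")
end = date.fromisoformat("2025-10-25")

n_days = (end - start).days + 1  # both endpoints are included in the request
expected_rows = n_days * 24      # assumes 24 hourly records per calendar day
print(n_days, expected_rows)
```

Comparing `expected_rows` with `len(df)` after the download is a cheap first completeness check.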

In [4]:
url = "https://archive-api.open-meteo.com/v1/archive"  # Data Source
# Feature Selection: the meteorological variables to retrieve at hourly resolution.
hourly_variables = [
    "temperature_2m", "relative_humidity_2m", "apparent_temperature", "dew_point_2m", "precipitation",
    "rain", "snowfall", "snow_depth", "wind_speed_100m", "wind_speed_10m",
    "wind_direction_10m", "wind_direction_100m", "wind_gusts_10m", "weather_code", "pressure_msl",
    "surface_pressure", "cloud_cover", "cloud_cover_low", "cloud_cover_mid", "cloud_cover_high",
    "et0_fao_evapotranspiration", "vapour_pressure_deficit", "soil_temperature_0_to_7cm", "soil_temperature_7_to_28cm", "soil_temperature_28_to_100cm",
    "soil_temperature_100_to_255cm", "soil_moisture_0_to_7cm", "soil_moisture_7_to_28cm", "soil_moisture_28_to_100cm", "soil_moisture_100_to_255cm",
    "wet_bulb_temperature_2m", "boundary_layer_height", "total_column_integrated_water_vapour", "is_day", "sunshine_duration",
]
params = {
    "latitude": latitude,
    "longitude": longitude,
    "start_date": start_date,
    "end_date": end_date,
    "hourly": ",".join(hourly_variables),  # The API expects a comma-separated list
    "timezone": "Europe/Paris",
}
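To see what such a parameter dictionary looks like on the wire, the sketch below encodes a shortened version of it with the standard library and decodes it again. The names `sample_variables` and `params_sketch` are illustrative, the variable list is a small subset of the one requested above, and `urlencode` only approximates the encoding that `requests` performs.

```python
from urllib.parse import urlencode, parse_qs

# Illustrative subset of the hourly variables requested above
sample_variables = ["temperature_2m", "relative_humidity_2m", "precipitation", "wind_speed_10m"]
if len(sample_variables) != len(set(sample_variables)):
    raise ValueError("duplicate variable name in the hourly list")

params_sketch = {
    "latitude": 48.8534,
    "longitude": 2.3488,
    "start_date": "2000-01-01",
    "end_date": "2025-10-25",
    "hourly": ",".join(sample_variables),
}

query = urlencode(params_sketch)  # commas are percent-encoded as %2C
print(query)

# Round-trip the query string to confirm nothing was lost in encoding
decoded = parse_qs(query)
```

Keeping the variables in a list and checking for duplicates before joining them guards against typos that are hard to spot in one long string.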

In [5]:
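response = requests.get(url, params=params, timeout=120)
response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error body
Data = response.json()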
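Before building a DataFrame, it can help to validate the decoded JSON. The sketch below is one possible check, assuming (as we understand the Open-Meteo documentation) that a failed request returns a body with `"error": true` and a `"reason"` string. The helper `extract_hourly` and both sample payloads are hypothetical and hand-written, not real API output.

```python
def extract_hourly(payload):
    """Return the 'hourly' block of an archive response, or raise with the API's reason."""
    if payload.get("error"):
        raise RuntimeError(f"API error: {payload.get('reason', 'no reason given')}")
    hourly = payload.get("hourly")
    if not hourly or "time" not in hourly:
        raise ValueError("response has no hourly time series")
    n_records = len(hourly["time"])
    # Every variable should have exactly one value per timestamp
    mismatched = [name for name, values in hourly.items() if len(values) != n_records]
    if mismatched:
        raise ValueError(f"columns with mismatched length: {mismatched}")
    return hourly

# Hand-written stand-ins for response.json()
ok_payload = {
    "hourly": {
        "time": ["2000-01-01T00:00", "2000-01-01T01:00"],
        "temperature_2m": [6.0, 6.0],
    }
}
err_payload = {"error": True, "reason": "example failure"}

print(sorted(extract_hourly(ok_payload)))
```

In the notebook, `extract_hourly(Data)` could then replace the direct `Data["hourly"]` lookup.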

In [6]:
# Data Structuring: JSON to Time-Series DataFrame
df = pd.DataFrame(Data["hourly"])
df['time'] = pd.to_datetime(df['time']) 
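Once the timestamps are parsed, a few integrity checks are cheap to run. The sketch below uses a tiny hand-made frame in place of `df` (the values are illustrative only) and looks for repeated timestamps, hours missing from a complete hourly grid, and missing values per column.

```python
import pandas as pd

# Tiny hand-made frame standing in for `df`; values are illustrative only
sample = pd.DataFrame({
    "time": pd.to_datetime(["2000-01-01 00:00", "2000-01-01 01:00", "2000-01-01 03:00"]),
    "temperature_2m": [6.0, None, 6.0],
})

indexed = sample.set_index("time").sort_index()

# Timestamps that occur more than once
n_duplicates = int(indexed.index.duplicated().sum())

# Hours absent from a complete hourly grid between the first and last timestamp
full_range = pd.date_range(indexed.index.min(), indexed.index.max(), freq=pd.Timedelta(hours=1))
missing_hours = full_range.difference(indexed.index)

# Number of missing values in each column
null_counts = indexed.isna().sum()

print(n_duplicates, list(missing_hours), null_counts.to_dict())
```

Applied to the real `df`, the same three quantities give a quick picture of how complete the downloaded series is.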

In [7]:
df.head()

| | time | temperature_2m | relative_humidity_2m | apparent_temperature | dew_point_2m | precipitation | rain | snowfall | snow_depth | wind_speed_100m | ... | soil_temperature_100_to_255cm | soil_moisture_0_to_7cm | soil_moisture_7_to_28cm | soil_moisture_28_to_100cm | soil_moisture_100_to_255cm | wet_bulb_temperature_2m | boundary_layer_height | total_column_integrated_water_vapour | is_day | sunshine_duration |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2000-01-01 00:00:00 | 6.0 | 97 | 4.0 | 5.5 | 0.0 | 0.0 | 0.0 | 0.01 | 14.0 | ... | 10.8 | 0.414 | 0.419 | 0.417 | 0.359 | 5.6 | 115.0 | 15.6 | 0 | 0.0 |
| 1 | 2000-01-01 01:00:00 | 6.0 | 96 | 3.8 | 5.4 | 0.0 | 0.0 | 0.0 | 0.01 | 14.9 | ... | 10.8 | 0.414 | 0.419 | 0.417 | 0.359 | 5.5 | 100.0 | 16.5 | 0 | 0.0 |
| 2 | 2000-01-01 02:00:00 | 5.9 | 96 | 3.8 | 5.3 | 0.0 | 0.0 | 0.0 | 0.01 | 14.7 | ... | 10.8 | 0.414 | 0.418 | 0.417 | 0.359 | 5.4 | 105.0 | 16.7 | 0 | 0.0 |
| 3 | 2000-01-01 03:00:00 | 6.0 | 96 | 3.8 | 5.3 | 0.0 | 0.0 | 0.0 | 0.01 | 14.6 | ... | 10.8 | 0.414 | 0.418 | 0.417 | 0.359 | 5.5 | 110.0 | 16.9 | 0 | 0.0 |
| 4 | 2000-01-01 04:00:00 | 6.1 | 96 | 3.9 | 5.5 | 0.0 | 0.0 | 0.0 | 0.01 | 15.5 | ... | 10.7 | 0.413 | 0.418 | 0.417 | 0.359 | 5.6 | 130.0 | 17.3 | 0 | 0.0 |


In [8]:
# Data Persistence: Export to CSV
data_folder = "../data"
os.makedirs(data_folder, exist_ok=True)

data_path = os.path.join(data_folder, "hourly_data.csv")
df.to_csv(data_path, index=False)

print(f"Data saved to {data_path}")

Data saved to ../data/hourly_data.csv
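Since the goal is a static, reproducible dataset, one option is to record a checksum of the exported file so that later notebooks can confirm they are reading the same bytes. The sketch below is a minimal illustration using only the standard library. The helper `sha256_of_file` is hypothetical, and the demonstration runs on a throwaway file rather than on the real CSV.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large CSVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Demonstration on a throwaway file; in the notebook one would pass data_path instead
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.csv"
    demo_path.write_bytes(b"time,temperature_2m\n2000-01-01 00:00:00,6.0\n")
    demo_digest = sha256_of_file(demo_path)

print(demo_digest)
```

Storing the digest next to the CSV, for example in a small text file, gives downstream notebooks something concrete to verify against.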
