<a href="https://colab.research.google.com/github/StefanoGenettiUniTN/appa-chinquinaria/blob/MeteoTrentino/data_meteo_trentino.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exploring Meteo Trentino Data
**Date:** 2025-10-11
In this section we try to merge data from APPA with data from Meteo Trentino. First of all we download data from bollettino APPA from 2023-01-01 to 2023-03-01 which is the interval we usually refer to for these preliminary experiments.

In [92]:
%%shell
if [ ! -f "appa-export.csv" ]; then
  gdown -q 1JDDMzu7Jo1polnxJ249Tthim5MaVdTpb -O appa-export.csv
fi



In [93]:
# get metadata from google drive: csv file with metadata (such as latitude and longitude) about APPA station https://drive.google.com/file/d/17OkT0e9QNh2AuWrcMt8IgOEd4r9jQaMI/view?usp=drive_link
%%shell
if [ ! -f "metadata-appa.csv" ]; then
  gdown -q 17OkT0e9QNh2AuWrcMt8IgOEd4r9jQaMI -O metadata-appa.csv
fi



In [94]:
%%shell
if [ ! -f "meteostations.zip" ]; then
  gdown -q 1qIzTiHfZj2qC3Zp21NrmvDLIesqCogs3 -O meteostations.zip
fi

unzip -o meteostations.zip -d meteostations

Archive:  meteostations.zip
  inflating: meteostations/historic_station_meteo_trentino_sample/T0368.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0454.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0038.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0129.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0409.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0443.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0408.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0148.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0135.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0210.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0425.csv  
  inflating: meteostations/historic_station_meteo_trentino_sample/T0101.csv  
  inflating: meteostations/historic_



In [119]:
import os
import pandas as pd
import requests
import geopandas as gpd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import networkx as nx
import zipfile
import io
import os
import json
import numpy as np
import folium
from ipywidgets import interact, widgets
from typing import List, Tuple, Dict, Any, Optional, Set
from datetime import datetime, date

## APPA sample data and meteo stat geo metadata
- Cleaning APPA sample data
- Fetching meteo stations coordinates from Meteo Trenino endpoint with REST API
- Coupling of APPA stations with Meteo Stations to build one comprehensive record type per area.

In [96]:
# APPA data
appa_csv = "appa-export.csv"
if os.path.exists(appa_csv):
    df_appa = pd.read_csv(appa_csv, encoding="latin1")
    print("Successfully loaded:", appa_csv)

# read APPA metadata
metadata_appa_csv = "metadata-appa.csv"
if os.path.exists(metadata_appa_csv):
  df_metadata_appa = pd.read_csv(metadata_appa_csv)
  print("Successfully loaded:", metadata_appa_csv)

# mapping nome stazione between APPA data and APPA metadata (key name in APPA metadata, value corresponding name in APPA data)
mapping_appa_metadata_station_name: Dict[str, str] = {
    "TRENTO PSC": "Parco S. Chiara",
    "TRENTO VBZ": "Via Bolzano",
    "PIANA ROTALIANA": "Piana Rotaliana",
    "ROVERETO LGP": "Rovereto",
    "BORGO VAL": "Borgo Valsugana",
    "RIVA GAR": "Riva del Garda",
    "AVIO A22": "A22 (Avio)",
    "MONTE GAZA": "Monte Gaza"
}
# rename Nome stazione to Stazione and replace station names according to APPA csv
df_metadata_appa.rename(columns={"Nome stazione": "Stazione"}, inplace=True)
df_metadata_appa["Stazione"] = df_metadata_appa["Stazione"].replace(mapping_appa_metadata_station_name)

# merge the eea data with the metadata vocabularies
df_appa = pd.merge(df_appa, df_metadata_appa, on="Stazione", how="left", indicator=True)
df_unmatched = df_appa[df_appa["_merge"] == "left_only"]
print(f"Unmatched rows saved as unmatched_measurements.csv (total: {len(df_unmatched)})")

# decrease each "Ora" of APPA dataframe by 1
df_appa["Ora"] = df_appa["Ora"] - 1

# convert "Valore" from int to float
df_appa['Valore'] = pd.to_numeric(df_appa['Valore'], errors='coerce').astype(float)

# replace each "Unità di misura" value with 'ug.m-3'
df_appa["Unità di misura"] = "ug.m-3"

# drop some useless coulumns of APPA dataframe
df_appa.drop(columns=["_merge", "EU - codice europeo", "Località", "Zona", "Tipologia", "IT - codice italiano", "Dati stazione", "Indirizzo"], inplace=True)

# separate coordinates in Posizione column of APPA dataframe into two new columns Latitudine and Longitudine
df_appa[['Latitudine', 'Longitudine']] = df_appa['Posizione'].str.split(',', expand=True)
df_appa['Latitudine'] = df_appa['Latitudine'].str.strip().astype(float)
df_appa['Longitudine'] = df_appa['Longitudine'].str.strip().astype(float)
df_appa = df_appa.drop(columns=['Posizione'])

# add column Nazione and column Comune always equal to Italy and APPA
df_appa['Nazione'] = 'Italy'
df_appa['Comune'] = 'APPA'

df_appa

Successfully loaded: appa-export.csv
Successfully loaded: metadata-appa.csv
Unmatched rows saved as unmatched_measurements.csv (total: 0)


Unnamed: 0,Stazione,Inquinante,Data,Ora,Valore,Unità di misura,Latitudine,Longitudine,Nazione,Comune
0,Parco S. Chiara,PM10,2023-01-01,0,54.0,ug.m-3,46.06292,11.12620,Italy,APPA
1,Parco S. Chiara,PM10,2023-01-01,1,69.0,ug.m-3,46.06292,11.12620,Italy,APPA
2,Parco S. Chiara,PM10,2023-01-01,2,66.0,ug.m-3,46.06292,11.12620,Italy,APPA
3,Parco S. Chiara,PM10,2023-01-01,3,65.0,ug.m-3,46.06292,11.12620,Italy,APPA
4,Parco S. Chiara,PM10,2023-01-01,4,49.0,ug.m-3,46.06292,11.12620,Italy,APPA
...,...,...,...,...,...,...,...,...,...,...
38777,Monte Gaza,Ozono,2023-03-01,19,81.0,ug.m-3,46.08253,10.95804,Italy,APPA
38778,Monte Gaza,Ozono,2023-03-01,20,78.0,ug.m-3,46.08253,10.95804,Italy,APPA
38779,Monte Gaza,Ozono,2023-03-01,21,81.0,ug.m-3,46.08253,10.95804,Italy,APPA
38780,Monte Gaza,Ozono,2023-03-01,22,81.0,ug.m-3,46.08253,10.95804,Italy,APPA


At this point we have properly downloaded data from APPA merging the metadata. Now we take the metadata of the Meteotrentino weather stations in order to associate to each APPA station the nearest weather stations according to Latitude and Longitude. To calculate distances between latitude/longitude points we use the Haverstine formula.

In [97]:
# GET METEOTRENTINO WEATHER STATION METADATA ===================================
url: str = "https://dati.meteotrentino.it/service.asmx/listaStazioniJson" # https://dati.meteotrentino.it/

response: requests.Response = requests.post(url)

if response.status_code == 200:
    try:
        meteo_trentino_stations: Dict[str, Any] = response.json()  # Attempt to parse JSON

        # remove stations which no longer exist
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0298"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0222"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0137"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0189"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0200"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0154"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0186"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0392"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0424"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0405"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0432"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0014"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0015"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0454"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0408"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0101"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('codice') != "T0212"]
        meteo_trentino_stations['sites'] = [site for site in meteo_trentino_stations['sites'] if site.get('fine') == '']

        pd_meteo_trentino_stations: pd.DataFrame = pd.DataFrame(meteo_trentino_stations["sites"])
        print("Successfully loaded MeteoTrentino listaStazioniJson")
    except ValueError:
        print("Response is not in JSON format. Raw text:")
        print(response.text)
else:
    print(f"Request failed with status code: {response.status_code}")
# ==============================================================================

# Haversine function to calculate distance in km ===============================
def haversine(lat1: float, lon1: float, lat2: np.ndarray, lon2: np.ndarray) -> np.ndarray:
    R: float = 6371.0  # Earth radius in km
    lat1_rad: float = np.radians(lat1)
    lon1_rad: float = np.radians(lon1)
    lat2_rad: np.ndarray = np.radians(lat2)
    lon2_rad: np.ndarray = np.radians(lon2)

    dlat: np.ndarray = lat2_rad - lat1_rad
    dlon: np.ndarray = lon2_rad - lon1_rad

    a: np.ndarray = np.sin(dlat / 2.0) ** 2 + np.cos(lat1_rad) * np.cos(lat2_rad) * np.sin(dlon / 2.0) ** 2
    c: np.ndarray = 2 * np.arcsin(np.sqrt(a))
    return R * c

# Function to find nearest station codice ======================================
def nearest_station(lat: float, lon: float, stations_df: pd.DataFrame) -> str:
    distances: np.ndarray = haversine(lat, lon, stations_df['latitudine'], stations_df['longitudine'])
    nearest_idx: int = distances.argmin()
    return stations_df.iloc[nearest_idx]['codice']

# Apply to dataframe
df_appa['nearest_weather_station'] = df_appa.apply(lambda row: nearest_station(row['Latitudine'], row['Longitudine'], pd_meteo_trentino_stations), axis=1)

Successfully loaded MeteoTrentino listaStazioniJson


In [98]:
df_appa

Unnamed: 0,Stazione,Inquinante,Data,Ora,Valore,Unità di misura,Latitudine,Longitudine,Nazione,Comune,nearest_weather_station
0,Parco S. Chiara,PM10,2023-01-01,0,54.0,ug.m-3,46.06292,11.12620,Italy,APPA,T0129
1,Parco S. Chiara,PM10,2023-01-01,1,69.0,ug.m-3,46.06292,11.12620,Italy,APPA,T0129
2,Parco S. Chiara,PM10,2023-01-01,2,66.0,ug.m-3,46.06292,11.12620,Italy,APPA,T0129
3,Parco S. Chiara,PM10,2023-01-01,3,65.0,ug.m-3,46.06292,11.12620,Italy,APPA,T0129
4,Parco S. Chiara,PM10,2023-01-01,4,49.0,ug.m-3,46.06292,11.12620,Italy,APPA,T0129
...,...,...,...,...,...,...,...,...,...,...,...
38777,Monte Gaza,Ozono,2023-03-01,19,81.0,ug.m-3,46.08253,10.95804,Italy,APPA,T0414
38778,Monte Gaza,Ozono,2023-03-01,20,78.0,ug.m-3,46.08253,10.95804,Italy,APPA,T0414
38779,Monte Gaza,Ozono,2023-03-01,21,81.0,ug.m-3,46.08253,10.95804,Italy,APPA,T0414
38780,Monte Gaza,Ozono,2023-03-01,22,81.0,ug.m-3,46.08253,10.95804,Italy,APPA,T0414


In [99]:
# get the unique set of nearest stations
unique_nearest_stations: Set[str] = df_appa['nearest_weather_station'].unique()
unique_nearest_stations

array(['T0129', 'T0135', 'T0118', 'T0147', 'T0010', 'T0193', 'T0153',
       'T0414'], dtype=object)

In this portion we try to display this data.

In [100]:
# FUNCTION TO GET STATION'S COORDINATES GIVEN STATION STR ID====================
def get_coordinates(meteo_trentino_stations: Dict[str, Any], codice: str) -> Optional[Tuple[float, float]]:
    """
    Given the JSON data and a codice (station code),
    returns (latitudine, longitudine) as a tuple.
    Returns None if codice not found.
    """
    sites: List[Dict[str, Any]] = meteo_trentino_stations.get("sites", [])
    for site in sites:
        codice_weather_station: Optional[str] = site.get("codice")
        if codice_weather_station == codice:
            lat: Optional[float] = site.get("latitudine")
            lon: Optional[float] = site.get("longitudine")
            return lat, lon
    return None  # codice not found
#===============================================================================

# Create a base map centered roughly in Trentino
trentino_map: folium.Map = folium.Map(location=[46.0, 11.0], zoom_start=9)

# Keep track of which stations we've already plotted
plotted_stations: Set[str] = set()
plotted_weather_stations: Set[str] = set()

# Iterate over the dataframe
for _, row in df_appa.iterrows():
    stazione_name: str = row['Stazione']
    stazione_coords: Tuple[float, float] = (row['Latitudine'], row['Longitudine'])
    meteo_code: str = row['nearest_weather_station']

    # Only plot each air quality station once
    if stazione_name not in plotted_stations:
        # Add air quality station marker
        folium.CircleMarker(
            location=stazione_coords,
            radius=6,
            color='blue',
            fill=True,
            fill_color='blue',
            fill_opacity=0.7,
            popup=f"Stazione: {stazione_name}"
        ).add_to(trentino_map)
        plotted_stations.add(stazione_name)

    # Get weather station coordinates
    if meteo_code not in plotted_weather_stations:
      plotted_weather_stations.add(meteo_code)
      meteo_coords = get_coordinates(meteo_trentino_stations, meteo_code)
      print(f"meteo_code: {meteo_code} ; meteo_coords: {meteo_coords}")
      if meteo_coords:
          # Add weather station marker
          folium.Marker(
              location=meteo_coords,
              icon=folium.Icon(color='green', icon='cloud'),
              popup=f"Meteo Station: {meteo_code}"
          ).add_to(trentino_map)
      else:
          print(f"Warning: Meteo Station coordinates not found for {meteo_code}")

# Display the map in Jupyter
trentino_map

meteo_code: T0129 ; meteo_coords: (46.071801, 11.135703)
meteo_code: T0135 ; meteo_coords: (46.095645, 11.10137)
meteo_code: T0118 ; meteo_coords: (46.170372, 11.217711)
meteo_code: T0147 ; meteo_coords: (45.896325, 11.043987)
meteo_code: T0010 ; meteo_coords: (46.010558, 11.305143)
meteo_code: T0193 ; meteo_coords: (45.870095, 10.877355)
meteo_code: T0153 ; meteo_coords: (45.73919, 11.06545)
meteo_code: T0414 ; meteo_coords: (46.066311, 10.9113)


In [101]:
df_appa = df_appa.rename(columns = {"nearest_weather_station": "Station_ID"})

## Final Merge of APPA data with meteo historic meteo measurements

### ---Functions---

In [102]:
def data_loading(
    df_name:str,
    path:str = "/content/meteostations/historic_station_meteo_trentino_sample/"
) -> pd.DataFrame:
  """
  Data loading function to dynamically load raw weather
  data to pd.DataFrame Object

  Parameters:
  -----------
  df_name: Name/ID of the station
  path: file path

  Returns:
  --------
  df loaded
  """
  file_path = path + df_name
  # Read the header of the file to determine where to
  # trim the columns header
  with open(file_path, 'r', encoding='latin-1') as f:
    for i in range(3):
      line = f.readline()
      if i == 2:
        usecols = len(line.strip().split(','))
  # Pandas csv parser, dynamic
  df = pd.read_csv(
      file_path,
      skiprows=2,
      header=0,
      sep=",",
      decimal=".",        # Symbol to assing decial point
      encoding='latin-1',
      usecols=range(usecols)   # Select the correct columns
  )
  # Skipping first row
  df = df.iloc[1:].copy()
  # Converting each numeric value to numeric type
  for column in df.columns[1:]:
    df[column] = pd.to_numeric(df[column], errors='coerce')

  return df

In [103]:
def standardize_weather_data(
    df,
    station_name,
    variable_map=None
) -> Tuple[pd.DataFrame, Dict[str, List[str]]]:
  """
  Standardize weeather station data by cleaning columns and parsing dates.

  Parameters:
  -----------
  df : Raw dataframe from weather station
  station_name : ID of the station (e.g., 'T0135')
  variable_map : Dictionary to update with station variables.
                 If None, creates new dict.

  Returns:
  --------
  cleaned_df, updated_variable_map
  """
  if variable_map is None:
    variable_map = {}
  # We make a copy to avoid damage
  df_clean = df.copy()
  # Drop "Unnamed" columns
  unnamed_columns = [col for col in df_clean.columns if 'Unnamed' in str(col)]
  df_clean = df_clean.drop(columns=unnamed_columns)
  # Add Station Identifuier
  df_clean['Station_ID'] = station_name
  # Update Variable map
  variable_cols = [
      col for col in df_clean.columns
      if col not in ['Date', 'Station_ID']
  ]
  variable_map[station_name] = variable_cols
  # Final data
  df_clean = df_clean[['Date'] + ['Station_ID'] + variable_cols]
  return df_clean, variable_map

### ---Sample of Usage---

In [104]:
# Dict comprehension to build python dict for each dfs
dfs = {
  df_name.replace('.csv', ''): data_loading(df_name)
  for df_name in os.listdir("/content/meteostations/historic_station_meteo_trentino_sample/")
  if df_name.endswith('.csv')
}
# - cleaned_dfs: dict collection of dfs
# - variable_map: map of the variables for each df
variable_map = {}
cleaned_dfs = {}
for df_name, df in dfs.items():
    cleaned_df, variable_map = standardize_weather_data(df, df_name, variable_map)
    cleaned_dfs[df_name] = cleaned_df
# Concatenate all cleaned dataframes into a single dataframe
all_cleaned_df = pd.concat(cleaned_dfs.values(), ignore_index=True)

### ---Final Merge---

In [109]:
all_cleaned_df = all_cleaned_df.rename(columns={"Date": "Data"})
all_cleaned_df["Data"] = all_cleaned_df["Data"].str.split().str[-1].astype(str)
df_appa["Data"] = df_appa["Data"].astype(str)
all_cleaned_df["Data"] = all_cleaned_df["Data"].astype(str).str.extract(r"(\d{2}/\d{2}/\d{4})")[0]
df_appa["Data"] = pd.to_datetime(df_appa["Data"]).dt.strftime("%d/%m/%Y")

ValueError: time data "13/01/2023" doesn't match format "%m/%d/%Y", at position 12. You might want to try:
    - passing `format` if your strings have a consistent format;
    - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
    - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

In [110]:
final_df = pd.merge(df_appa, all_cleaned_df, on=['Station_ID', 'Data'], how='inner')
final_df = final_df.rename(columns={"Station_ID": "StazioneMeteo"})

In [117]:
final_df

Unnamed: 0,Stazione,Inquinante,Data,Ora,Valore,Unità di misura,Latitudine,Longitudine,Nazione,Comune,...,Pioggia (mm),Pioggia (mm).1,Temp. aria (°C),Temp. aria (°C).1,Temp. aria (°C).2,Umid.relat. aria (%),Pressione atm. (hPa),Dir. Vento (°),Vel. Vento (m/s),Rad.Sol.Tot. (kJ/m2)
0,Parco S. Chiara,PM10,01/01/2023,0,54.00,ug.m-3,46.06292,11.12620,Italy,APPA,...,0.0,0.0,8.0,6.3,11.4,80.3,992.9,15.0,1.1,1738.6
1,Parco S. Chiara,PM10,01/01/2023,1,69.00,ug.m-3,46.06292,11.12620,Italy,APPA,...,0.0,0.0,8.0,6.3,11.4,80.3,992.9,15.0,1.1,1738.6
2,Parco S. Chiara,PM10,01/01/2023,2,66.00,ug.m-3,46.06292,11.12620,Italy,APPA,...,0.0,0.0,8.0,6.3,11.4,80.3,992.9,15.0,1.1,1738.6
3,Parco S. Chiara,PM10,01/01/2023,3,65.00,ug.m-3,46.06292,11.12620,Italy,APPA,...,0.0,0.0,8.0,6.3,11.4,80.3,992.9,15.0,1.1,1738.6
4,Parco S. Chiara,PM10,01/01/2023,4,49.00,ug.m-3,46.06292,11.12620,Italy,APPA,...,0.0,0.0,8.0,6.3,11.4,80.3,992.9,15.0,1.1,1738.6
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
20246,A22 (Avio),Ossido di Carbonio,01/03/2023,19,0.46,ug.m-3,45.74215,10.97043,Italy,APPA,...,4.6,2.9,2.9,-0.6,7.3,74.7,935.5,44.0,0.9,4719.1
20247,A22 (Avio),Ossido di Carbonio,01/03/2023,20,0.52,ug.m-3,45.74215,10.97043,Italy,APPA,...,4.6,2.9,2.9,-0.6,7.3,74.7,935.5,44.0,0.9,4719.1
20248,A22 (Avio),Ossido di Carbonio,01/03/2023,21,0.54,ug.m-3,45.74215,10.97043,Italy,APPA,...,4.6,2.9,2.9,-0.6,7.3,74.7,935.5,44.0,0.9,4719.1
20249,A22 (Avio),Ossido di Carbonio,01/03/2023,22,0.55,ug.m-3,45.74215,10.97043,Italy,APPA,...,4.6,2.9,2.9,-0.6,7.3,74.7,935.5,44.0,0.9,4719.1



Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.




Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.




Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.




Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `y` variable to `hue` and set `legend=False` for the same effect.





In [118]:
# Create output folder if it doesn't exist
output_folder = "output"
if not os.path.exists(output_folder):
    os.makedirs(output_folder)
# Save the final DataFrame to a CSV file in the output folder
output_path = os.path.join(output_folder, "final_sample_data.csv")
final_df.to_csv(output_path, index=False)
print(f"Final DataFrame saved to {output_path}")

Final DataFrame saved to output/final_sample_data.csv
