# SABER Temperature Data Processing

This notebook ingests multiple gzip-compressed SABER CSV datasets (`.csv.gz`), filters them to the Mesosphere-Lower Thermosphere (MLT) region (50-110 km),
and produces aggregated bronze-level tables for downstream analysis.

**Pipeline stages:**
1. Load helper functions.
2. Read and combine raw SABER datasets.
3. Apply filters and binning (altitude, latitude, longitude).
4. Generate temporal aggregations:
   - 1D (monthly)
   - 2D (monthly-altitude, with `year` and `month` columns)
   - 4D (monthly-alt-lat-lon grid)
5. Save processed tables to disk.

---
**Input:**  
- `sutherland_timeseries_<year>.csv.gz`

**Output:**  
- `bronze_saber_4d_tempoaral_agg.csv.gz`  
- `bronze_saber_2d_tempoaral_agg.csv.gz`  
- `bronze_saber_1d_tempoaral_agg.csv.gz`
- `bronze_saber_altitude_agg.csv.gz`  
---

In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

import warnings
warnings.filterwarnings('ignore')

In [2]:
# Current working directory (notebook folder)
notebook_path = Path.cwd()  # MLT/code/data_processing/
base_path = notebook_path.parent.parent  # Go up to MLT/

functions_path = notebook_path.parent / "000_Functions.ipynb"
data_path = base_path / "data" / "raw"
bronze_path = base_path / "data" / "bronze"

In [3]:
# Load custom functions
%run "{functions_path.as_posix()}"

In [4]:
# years from 2002 to 2025 (inclusive)
years = range(2002, 2026)

# Read all files into a dictionary of DataFrames
dfs = {}
for year in years:
    try:
        file = data_path / f"sutherland_timeseries_{year}.csv.gz"
        dfs[year] = pd.read_csv(file)
        print(f"Done with: sutherland_timeseries_{year}.csv.gz")

    except Exception as e:
        print(f"Error reading sutherland_timeseries_{year}.csv.gz: {e}")

Done with: sutherland_timeseries_2002.csv.gz
Done with: sutherland_timeseries_2003.csv.gz
Done with: sutherland_timeseries_2004.csv.gz
Done with: sutherland_timeseries_2005.csv.gz
Done with: sutherland_timeseries_2006.csv.gz
Done with: sutherland_timeseries_2007.csv.gz
Done with: sutherland_timeseries_2008.csv.gz
Done with: sutherland_timeseries_2009.csv.gz
Done with: sutherland_timeseries_2010.csv.gz
Done with: sutherland_timeseries_2011.csv.gz
Done with: sutherland_timeseries_2012.csv.gz
Done with: sutherland_timeseries_2013.csv.gz
Done with: sutherland_timeseries_2014.csv.gz
Done with: sutherland_timeseries_2015.csv.gz
Done with: sutherland_timeseries_2016.csv.gz
Done with: sutherland_timeseries_2017.csv.gz
Done with: sutherland_timeseries_2018.csv.gz
Done with: sutherland_timeseries_2019.csv.gz
Done with: sutherland_timeseries_2020.csv.gz
Done with: sutherland_timeseries_2021.csv.gz
Done with: sutherland_timeseries_2022.csv.gz
Done with: sutherland_timeseries_2023.csv.gz
Done with:

In [5]:
df = pd.concat(dfs, ignore_index=True)
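
`pd.concat` accepts a mapping as well as a list. With `ignore_index=True` the dictionary keys (here, the years) are discarded and the rows receive a fresh 0..n-1 index. A minimal sketch on made-up frames; the values and the `toy_` names are illustrative assumptions, not SABER data:

```python
import pandas as pd

# Two tiny stand-in frames keyed by year (values are made up for illustration)
toy_dfs = {
    2002: pd.DataFrame({"tpaltitude": [60.0, 95.5], "ktemp": [240.1, 185.3]}),
    2003: pd.DataFrame({"tpaltitude": [72.3], "ktemp": [210.7]}),
}

# Passing the dict concatenates its values;
# ignore_index=True drops the year keys and renumbers the rows.
toy_combined = pd.concat(toy_dfs, ignore_index=True)
print(toy_combined)
```

If the source year is needed later, it has to be kept as a column, since the keys do not survive this call.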

In [6]:
# Filter for MLT region (50 km to 110 km)
df = df[(df['tpaltitude'] >= 50) & (df['tpaltitude'] <= 110)]

In [7]:
# Convert date into datetime type; unparseable values become NaT
df['date'] = pd.to_datetime(df['date'], errors='coerce')
# Drop rows with missing or unparseable dates
df = df[df['date'].notnull()].copy()

# Create 'year_month' (monthly period) used as the temporal grouping key
df['year_month'] = df['date'].dt.to_period('M')
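
A minimal sketch of how coercion and period conversion behave, using a toy frame with made-up dates (the `toy` name and its values are assumptions for illustration). Values that fail to parse only become `NaT` at the conversion step, so a null check placed after the conversion catches them as well:

```python
import pandas as pd

toy = pd.DataFrame({"date": ["2002-01-25", "2002-02-03", "not a date", None]})

# errors='coerce' turns unparseable or missing values into NaT instead of raising
toy["date"] = pd.to_datetime(toy["date"], errors="coerce")
toy = toy[toy["date"].notnull()].copy()

# to_period('M') collapses each timestamp to its calendar month
toy["year_month"] = toy["date"].dt.to_period("M")
print(toy)
```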

In [8]:
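# Create 1 km altitude bins, labelled by lower edge
# (edges extend past the 110 km filter, so the upper bins stay empty)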
bins = np.arange(50, 132, 1)
labels = bins[:-1]
df['altitude_bin'] = pd.cut(df['tpaltitude'], bins=bins, labels=labels, right=False)
df = df.dropna(subset=['altitude_bin'])
df['altitude_bin'] = df['altitude_bin'].astype(int)

# Create 1 degree latitude bins (edges -37 to -29), labelled by lower edge
bins = np.arange(-37, -28, 1)
labels = bins[:-1]
df['latitude_bin'] = pd.cut(df['tplatitude'], bins=bins, labels=labels, right=False)
df = df.dropna(subset=['latitude_bin'])
df['latitude_bin'] = df['latitude_bin'].astype(int)

# Create 1 degree longitude bins (edges 15 to 25), labelled by lower edge
bins = np.arange(15, 26, 1)
labels = bins[:-1]
df['longitude_bin'] = pd.cut(df['tplongitude'], bins=bins, labels=labels, right=False)
df = df.dropna(subset=['longitude_bin'])
df['longitude_bin'] = df['longitude_bin'].astype(int)

print("[INFO] Spatial binning complete.")

[INFO] Spatial binning complete.
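
The binning cell relies on `pd.cut` with `right=False`, which makes every bin left-closed and labels it by its lower edge. The sketch below uses small made-up edges and values (not the notebook's actual ranges) to show which points are kept and which fall outside:

```python
import numpy as np
import pandas as pd

toy_alt = pd.Series([50.0, 50.99, 51.0, 53.5, 54.0, 49.9])

edges = np.arange(50, 55, 1)   # edges 50, 51, 52, 53, 54
labels = edges[:-1]            # each bin is labelled by its lower edge

# right=False gives left-closed bins: [50, 51), [51, 52), [52, 53), [53, 54)
binned = pd.cut(toy_alt, bins=edges, labels=labels, right=False)

# Values outside the edges (including exactly the last edge) become NaN
kept = binned.dropna().astype(int)
print(kept.tolist())
```

A value exactly equal to the last edge is not assigned to any bin, so the subsequent `dropna` removes it along with anything outside the range.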


## Creating Bronze Data

### bronze_saber_altitude_agg

**Output:** Mean temperature (`ktemp`) per 1 km altitude bin, averaged over all dates and locations (vertical temperature profile).

In [9]:
agg_df_alt = df.groupby(['altitude_bin']).agg({
    'ktemp': 'mean'
}).reset_index()

### bronze_saber_1d_tempoaral_agg

**Output:** Monthly average `ktemp` values (time series).

In [10]:
agg_df_1d = df.groupby(['year_month']).agg({
    'ktemp': 'mean'
}).reset_index()

# Derive 'year' and 'month' columns from 'year_month'
agg_df_1d['year'] = agg_df_1d['year_month'].dt.year
agg_df_1d['month'] = agg_df_1d['year_month'].dt.month

### bronze_saber_2d_temporal_agg

**Output:** Monthly × altitude aggregation.

In [11]:
# Group and aggregate
agg_df_2d = df.groupby(['year_month', 'altitude_bin']).agg({
    'ktemp': 'mean'
}).reset_index()

# Derive 'year' and 'month' columns from 'year_month'
agg_df_2d['year'] = agg_df_2d['year_month'].dt.year
agg_df_2d['month'] = agg_df_2d['year_month'].dt.month

### bronze_saber_4d_tempoaral_agg

**Output:** Mean `ktemp` on a 4D grid (`year_month`, `altitude_bin`, `latitude_bin`, `longitude_bin`).

In [12]:
agg_df_4d = df.groupby(['year_month', 'altitude_bin', 'latitude_bin', 'longitude_bin']).agg({
    'ktemp': 'mean'
}).reset_index()

# Derive 'year' and 'month' columns from 'year_month'
agg_df_4d['year'] = agg_df_4d['year_month'].dt.year
agg_df_4d['month'] = agg_df_4d['year_month'].dt.month
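
To make the shape of the aggregated tables concrete, here is a minimal sketch of the same groupby pattern on a made-up three-row frame (the `toy` values are assumptions, not SABER measurements). Because the bin columns are plain integers rather than categoricals, only combinations that actually occur appear in the result:

```python
import pandas as pd

toy = pd.DataFrame({
    "year_month": pd.PeriodIndex(["2002-01", "2002-01", "2002-02"], freq="M"),
    "altitude_bin": [80, 80, 80],
    "latitude_bin": [-33, -33, -33],
    "longitude_bin": [20, 20, 21],
    "ktemp": [200.0, 210.0, 190.0],
})

keys = ["year_month", "altitude_bin", "latitude_bin", "longitude_bin"]
toy_4d = toy.groupby(keys).agg({"ktemp": "mean"}).reset_index()

# Period columns expose .dt.year and .dt.month just like datetimes
toy_4d["year"] = toy_4d["year_month"].dt.year
toy_4d["month"] = toy_4d["year_month"].dt.month
print(toy_4d)
```

The two January rows share all four keys and collapse into a single mean; the February row stays on its own.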

## Save Outputs

In [13]:
# Ensure bronze folder exists
bronze_path.mkdir(parents=True, exist_ok=True)

# Define tables to save
tables = {
    'bronze_saber_4d_tempoaral_agg': agg_df_4d,
    'bronze_saber_2d_tempoaral_agg': agg_df_2d,
    'bronze_saber_1d_tempoaral_agg': agg_df_1d,
    'bronze_saber_altitude_agg': agg_df_alt,
}

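failed = []  # names of tables that could not be written
for table_name, table in tables.items():
    try:
        file_path = bronze_path / f"{table_name}.csv.gz"
        table.to_csv(file_path, index=False, compression='gzip')
        print(f"[SUCCESS] {table_name}.csv.gz")
    except Exception as e:
        failed.append(table_name)
        print(f"[ERROR] Could not save {table_name}. Reason: {e}")

# Only report overall success when every table was written
if not failed:
    print("All CSVs saved successfully")
else:
    print(f"[WARNING] Tables not saved: {failed}")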

[SUCCESS] bronze_saber_4d_tempoaral_agg.csv.gz
[SUCCESS] bronze_saber_2d_tempoaral_agg.csv.gz
[SUCCESS] bronze_saber_1d_tempoaral_agg.csv.gz
[SUCCESS] bronze_saber_altitude_agg.csv.gz
All CSVs saved successfully
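
When a bronze table is read back, the `year_month` column arrives as plain text (for example `2002-01`), because CSV carries no period type. The sketch below round-trips a made-up table through a temporary directory and rebuilds the monthly period; the file name `toy_agg.csv.gz` and the values are hypothetical:

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame({
    "year_month": pd.PeriodIndex(["2002-01", "2002-02"], freq="M"),
    "ktemp": [205.0, 190.0],
})

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_agg.csv.gz"
    toy.to_csv(path, index=False, compression="gzip")
    loaded = pd.read_csv(path)  # compression is inferred from the .gz suffix

# year_month comes back as strings; rebuild the monthly period
loaded["year_month"] = pd.PeriodIndex(loaded["year_month"], freq="M")
print(loaded.dtypes)
```

Downstream notebooks could apply the same conversion right after loading, so that `.dt.year` and `.dt.month` remain available.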
