# SM2 — Corridor Temperature Profiles (ThermoPro) & Section Indoor vs Ambient (Atrea) — Public Dataset

**Purpose:** Provide clear visuals of corridor temperatures (ThermoPro) and section‑level indoor vs ambient (Atrea), using the single merged public dataset.

### What this notebook shows
- **Chart 1 — Corridors (ThermoPro): hourly mean temperature** for selected locations (e.g., the 5NP floor). This reveals the daily cycle (night-time lows, afternoon peaks) and differences between sections.
- **Chart 2 — Atrea sections: indoor (`temp_indoor`) with ambient reference (`temp_ambient`)**. This helps compare indoor section behavior versus outdoor conditions across the same window.
- **Chart 3 — Corridors (ThermoPro): daily maxima** for the same locations, highlighting the hottest days and relative ranking of sections.

> _Dataset timestamps are **UTC**. We derive **Europe/Prague** local time for the charts._
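
As a minimal sketch of the UTC to local conversion, the snippet below uses two made-up timestamps (not values from the dataset). It shows that the Prague offset follows daylight saving time, and that a late-evening UTC reading in summer falls on the next local day.

```python
import pandas as pd

# Synthetic UTC timestamps (illustrative only, not taken from the dataset)
utc = pd.Series(pd.to_datetime(["2025-08-21 22:30:00", "2025-01-15 22:30:00"], utc=True))

# Convert to Prague wall-clock time; the offset depends on daylight saving time
local = utc.dt.tz_convert("Europe/Prague")

# Show local time together with its UTC offset (+0200 in summer, +0100 in winter)
print(local.dt.strftime("%Y-%m-%d %H:%M %z").tolist())
```

Grouping by local day or local hour therefore gives different buckets than grouping by the raw UTC timestamps, which is why the charts use the derived local columns.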


In [None]:
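try:
    import pandas as pd
    import matplotlib
    import pyarrow  # Parquet engine used by pd.read_parquet
except ImportError:
    # Install only when one of the required packages is missing
    %pip install -q pandas pyarrow matplotlib gdown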
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import re

plt.rcParams['figure.figsize'] = (12, 6)
OUT_DIR = Path('outputs'); OUT_DIR.mkdir(exist_ok=True)


## Load the public dataset (Parquet preferred)

This cell loads `sm2_public_dataset.parquet` (or `sm2_public_dataset.csv.gz`) and adds the local Prague time columns used by the charts. It applies no transformations beyond type and time handling.


In [None]:
PARQUET_ID = "1gLPWgUGtRb371Gpv5O8t5j95lthNjELg"  # sm2_public_dataset.parquet
CSVGZ_ID  = "1eLOAOZ13--EKE63GZhmerjmY9zkeJcyd"  # sm2_public_dataset.csv.gz
USE_PARQUET = True

parquet_path = Path('sm2_public_dataset.parquet')
csvgz_path   = Path('sm2_public_dataset.csv.gz')

def gdown_download(file_id: str, out_path: Path):
    import subprocess, sys
    try:
        import gdown  # type: ignore
    except Exception:
        subprocess.check_call([sys.executable, '-m', 'pip', 'install', '-q', 'gdown'])
        import gdown  # type: ignore
    url = f"https://drive.google.com/uc?id={file_id}"
    gdown.download(url, str(out_path), quiet=False)

if USE_PARQUET:
    if not parquet_path.exists():
        print('Downloading Parquet…')
        gdown_download(PARQUET_ID, parquet_path)
else:
    if not csvgz_path.exists():
        print('Downloading CSV.GZ…')
        gdown_download(CSVGZ_ID, csvgz_path)

if USE_PARQUET and parquet_path.exists():
    df = pd.read_parquet(parquet_path)
elif csvgz_path.exists():
    df = pd.read_csv(csvgz_path, compression='gzip')
else:
    raise FileNotFoundError('Dataset not found. Place it next to this notebook or enable internet to download.')

df['time'] = pd.to_datetime(df['time'], utc=True, errors='coerce')
df['data_value'] = pd.to_numeric(df['data_value'], errors='coerce')
df['local_time'] = df['time'].dt.tz_convert('Europe/Prague')
df['local_day'] = df['local_time'].dt.floor('D')
df['year'] = df['local_time'].dt.year
print('Shape:', df.shape)
df.head(3)


## Optional: legacy → normalized location mapping

The public dataset already uses **normalized `location`** values. If you still reference legacy labels, you can provide `location_map.csv` (`from` → `to`) to translate **your filters only** (the dataset stays unchanged).


In [None]:
map_path = Path('location_map.csv')
map_df = None
raw2norm = {}
if map_path.exists():
    map_df = pd.read_csv(map_path)
    if {'from','to'}.issubset(map_df.columns):
        raw2norm = dict(map_df[['from','to']].dropna().values)
        print('Loaded mapping rows:', len(raw2norm))
    else:
        print('Mapping file found but missing required columns: from, to')
else:
    print('No mapping file present — proceeding without it (not needed for the public dataset).')

def map_raw_locations(raw_list):
    """Translate a list of legacy raw location strings to normalized `to` codes.
    Unknown entries are left unchanged so you see what didn't map.
    """
    if not raw2norm:
        return raw_list
    return [raw2norm.get(x, x) for x in raw_list]
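
The translation logic can be tried without a `location_map.csv` file. The sketch below is self-contained: `demo_map`, `demo_raw2norm`, and `translate` are hypothetical names, and the `to` codes are invented for illustration rather than taken from a real mapping.

```python
import pandas as pd

# Hypothetical mapping table; in the notebook this would be read from location_map.csv
demo_map = pd.DataFrame({
    "from": ["SM2_03_L5/L6_01", "SM2_05_L5/L6_01", None],
    "to":   ["5NP-S3", "5NP-S5", "5NP-S9"],
})

# Same construction as in the cell above: drop incomplete rows, then build a dict
demo_raw2norm = dict(demo_map[["from", "to"]].dropna().values)

def translate(labels, mapping):
    """Translate labels through mapping; unknown labels pass through unchanged."""
    return [mapping.get(x, x) for x in labels]

result = translate(["SM2_03_L5/L6_01", "5NP-S1", "SM2_99_unknown"], demo_raw2norm)
print(result)
```

Labels that are already normalized (`5NP-S1`) and labels with no mapping row both pass through unchanged, so unmapped legacy codes stay visible in the filter list.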


## Helper filters

Utility helpers to select source, keys, locations, and time windows from the public dataset. These keep plotting cells compact and consistent.


In [None]:
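def select_data(source=None, keys=None, locations=None, start=None, end=None):
    """Filter the dataset by source, data_key, location, and local time window.
    Each of source/keys/locations may be a string or a list; None skips that filter.
    start/end are Europe/Prague wall-clock times; both bounds are inclusive.
    """
    q = df  # boolean indexing returns new frames, so df itself is never modified
    if source is not None:
        q = q[q['source'].isin([source] if isinstance(source, str) else list(source))]
    if keys is not None:
        q = q[q['data_key'].isin([keys] if isinstance(keys, str) else list(keys))]
    if locations is not None:
        q = q[q['location'].isin([locations] if isinstance(locations, str) else list(locations))]
    if start is not None:
        q = q[q['local_time'] >= pd.Timestamp(start, tz='Europe/Prague')]
    if end is not None:
        q = q[q['local_time'] <= pd.Timestamp(end, tz='Europe/Prague')]
    return q.copy()  # independent copy, safe to add columns to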

def parse_norm_location(loc: str):
    # Atrea sections: sm2_01..sm2_09
    if isinstance(loc, str) and loc.startswith('sm2_'):
        return {'kind':'Atrea', 'section': loc}
    # ThermoPro corridors/garages: e.g., 5NP-S5, 1PP-S1
    m = re.match(r'^(?P<floor>\d(?:NP|PP))-S(?P<section>\d+)$', str(loc))
    if m:
        return {'kind':'ThermoPro', 'floor': m.group('floor'), 'section': f"S{m.group('section')}"}
    return {'kind':'unknown'}
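
To see which location strings the ThermoPro pattern accepts, the sketch below reuses the same regular expression on a few sample strings. `LOC_RE` and `split_location` are hypothetical names, and `10NP-S2` is an invented label included only to probe the pattern.

```python
import re

# Same pattern as in parse_norm_location: one digit, NP or PP, then "-S" and a section number
LOC_RE = re.compile(r"^(?P<floor>\d(?:NP|PP))-S(?P<section>\d+)$")

def split_location(loc):
    """Return (floor, section) for a ThermoPro-style code, or None if it does not match."""
    m = LOC_RE.match(loc)
    return (m.group("floor"), "S" + m.group("section")) if m else None

samples = ["5NP-S5", "1PP-S1", "sm2_03", "10NP-S2"]
parsed = {s: split_location(s) for s in samples}
for name, value in parsed.items():
    print(name, "->", value)
```

The pattern allows a single digit for the floor, so a two-digit floor such as the invented `10NP-S2` would be reported as unknown. That is presumably fine for this building, but worth knowing before reusing the helper elsewhere.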


## Chart 1 — Corridors (ThermoPro): hourly mean temperature

We compute **hourly means** for selected corridor locations (e.g., 5NP) in the chosen window.  
**Interpretation tips:**
- **Shape within a day:** typical daily cycle (cool nights vs warm afternoons).  
- **Between lines:** relative differences among sections (e.g., ventilation, exposure).  
- **Sustained elevation:** persistent heat buildup or poor heat dissipation.


In [None]:
# Choose a window similar to the original notebook
START = '2025-08-21 00:00:00'
END   = '2025-08-27 00:00:00'

# Example legacy raw list (uncomment & edit if you want to translate legacy codes):
# legacy_raw = ['SM2_03_L5/L6_01','SM2_05_L5/L6_01','SM2_01_L5/L6_01','SM2_02_L5/L6_01']
# norm_locs = map_raw_locations(legacy_raw)

# Or simply use normalized directly (5NP corridors examples):
norm_locs = ['5NP-S1','5NP-S2','5NP-S3','5NP-S4','5NP-S5','5NP-S6','5NP-S7','5NP-S8','5NP-S9']

tp = select_data(source='ThermoPro', keys='temp_indoor', locations=norm_locs, start=START, end=END).copy()
if tp.empty:
    print('No ThermoPro rows found for the selection.')
else:
    tp['hourly'] = tp['local_time'].dt.floor('h')  # lowercase 'h': the 'H' alias is deprecated in recent pandas
    hourly = tp.groupby(['location','hourly'])['data_value'].mean().unstack(0)
    fig = plt.figure(); ax = plt.gca()
    hourly.plot(ax=ax)
    ax.set_title('Corridors (ThermoPro) — hourly mean temperature')
    ax.set_xlabel('Time [local]'); ax.set_ylabel('°C')
    plt.tight_layout(); plt.savefig(OUT_DIR / 'corridors_hourly_means.png', dpi=150); plt.show()
    print('Saved:', OUT_DIR / 'corridors_hourly_means.png')


## Chart 2 — Atrea sections: indoor with ambient reference

For all Atrea sections (`sm2_01..sm2_09`), we plot hourly **indoor** (`temp_indoor`) lines. As a reference, we overlay the **ambient** (`temp_ambient`) averaged across sections (dashed).  
**What to look for:**
- **Indoor vs ambient gap:** degree of decoupling or tracking with outside temperatures.  
- **Section‑to‑section spread:** which sections tend to run hotter/cooler over time.
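
Both quantities can be computed directly from the hourly tables. The sketch below uses small synthetic stand-ins for `in_hourly` (one column per section) and `amb_hourly` (a single series); all values are invented.

```python
import pandas as pd

idx = pd.date_range("2025-08-08", periods=3, freq="h", tz="Europe/Prague")

# Synthetic stand-ins for the hourly indoor table and the ambient reference
in_hourly = pd.DataFrame(
    {"sm2_01": [25.0, 25.5, 26.0], "sm2_02": [27.0, 27.0, 27.5]}, index=idx
)
amb_hourly = pd.Series([20.0, 24.0, 29.0], index=idx)

# Indoor vs ambient gap: subtract the ambient series from every section column,
# aligned on the hourly index (positive = section warmer than outside)
gap = in_hourly.sub(amb_hourly, axis=0)

# Section-to-section spread: hottest minus coolest section in each hour
spread = in_hourly.max(axis=1) - in_hourly.min(axis=1)

print(gap)
print(spread)
```

A gap that stays positive while ambient falls suggests the section retains heat; a gap that turns negative during the afternoon suggests the section is buffered against outdoor peaks.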


In [None]:
START = '2025-08-08 00:00:00'
END   = '2025-08-22 00:00:00'

sections = [f'sm2_{i:02d}' for i in range(1,10)]

# Indoor exhaust (proxy for indoor)
# .copy() so that adding the 'hourly' column below does not trigger SettingWithCopyWarning
atr_in = select_data(source='Atrea', keys='temp_indoor', locations=sections, start=START, end=END).copy()
# Ambient reference
atr_amb = select_data(source='Atrea', keys='temp_ambient', locations=sections, start=START, end=END).copy()

if atr_in.empty or atr_amb.empty:
    print('Insufficient Atrea data in the selected window.')
else:
    # hourly indoor per section
    atr_in['hourly'] = atr_in['local_time'].dt.floor('h')  # 'H' alias is deprecated in recent pandas
    in_hourly = atr_in.groupby(['location','hourly'])['data_value'].mean().unstack(0)

    # hourly ambient averaged across sections (one line)
    atr_amb['hourly'] = atr_amb['local_time'].dt.floor('h')
    amb_hourly = atr_amb.groupby('hourly')['data_value'].mean()

    fig = plt.figure(); ax = plt.gca()
    in_hourly.plot(ax=ax)
    amb_hourly.plot(ax=ax, linewidth=2, linestyle='--')
    ax.set_title('Atrea sections — hourly indoor (temp_indoor) with ambient reference (dashed)')
    ax.set_xlabel('Time [local]'); ax.set_ylabel('°C')
    plt.tight_layout(); plt.savefig(OUT_DIR / 'atrea_hourly_with_ambient.png', dpi=150); plt.show()
    print('Saved:', OUT_DIR / 'atrea_hourly_with_ambient.png')


## Chart 3 — Corridors (ThermoPro): daily maxima

For the same corridor locations, we compute **daily max** temperatures.  
**Reading the chart:**
- **Peaks** mark the hottest days in the window.  
- **Relative ordering** across lines shows which corridors reach higher daily extremes.
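
The same reading can be done programmatically. The sketch below assumes a `dmax` table shaped like the one plotted in this section (days as rows, locations as columns); the values are synthetic.

```python
import pandas as pd

# Synthetic daily maxima for two corridors (values are invented)
days = pd.to_datetime(["2025-08-21", "2025-08-22", "2025-08-23"])
dmax = pd.DataFrame(
    {"5NP-S1": [27.1, 29.4, 28.0], "5NP-S2": [26.5, 28.8, 29.0]}, index=days
)

# Peaks: the hottest day for each location
hottest_day = dmax.idxmax()

# Relative ordering: locations sorted by their mean daily maximum
ranking = dmax.mean().sort_values(ascending=False)

# Rank within each day (1 = hottest corridor that day)
rank_per_day = dmax.rank(axis=1, ascending=False)

print(hottest_day)
print(ranking.round(2))
print(rank_per_day)
```

Ranking per day is useful when the ordering changes over the window, as it does on the last synthetic day here.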


In [None]:
START = '2025-08-21 00:00:00'
END   = '2025-08-27 00:00:00'

corridors = [f'5NP-S{i}' for i in range(1, 10)]  # 5NP-S1 .. 5NP-S9
# .copy() so that adding the 'day' column below does not trigger SettingWithCopyWarning
tp = select_data(source='ThermoPro', keys='temp_indoor', locations=corridors, start=START, end=END).copy()
if tp.empty:
    print('No ThermoPro data for the selection.')
else:
    tp['day'] = tp['local_time'].dt.floor('D')
    dmax = tp.groupby(['location','day'])['data_value'].max().unstack(0)
    fig = plt.figure(); ax = plt.gca()
    dmax.plot(ax=ax)
    ax.set_title('Corridors 5NP — daily maxima (ThermoPro)')
    ax.set_xlabel('Day [local]'); ax.set_ylabel('°C')
    plt.tight_layout(); plt.savefig(OUT_DIR / 'corridors_daily_maxima.png', dpi=150); plt.show()
    print('Saved:', OUT_DIR / 'corridors_daily_maxima.png')


### Notes
- Adjust the **time window** and **locations** (e.g., to focus on different floors/sections).  
- The charts are **descriptive**: they reveal daily profiles, extreme days, and differences between sections/floors.  
- All visuals come directly from the public dataset; the code does not depend on any private preprocessing.
