<img src="https://upload.wikimedia.org/wikipedia/commons/0/06/Imperial_College_London_new_logo.png" alt="Imperial Logo" width="400">

### **Course:** CIVE70111 Machine Learning
### Task 6 PV Plant Modelling and Machine Learning Pipeline

**Project:** Temporal Forecasting

**Date:** 09/12/2025  

<p align="right">
Created by: Michael Wong

# Table of Contents

1. Project Overview  
2. Workflow Summary  
3. Data Cleaning & Preprocessing  
   - Plant 1  
   - Plant 2  
4. Feature Engineering  
5. Task 4 – Linear SVC Classification Model  
   - Label construction  
   - Feature preparation  
   - Temporal splitting  
   - Model training  
   - Threshold optimisation  
   - Evaluation  
6. Task 6 – LSTM Forecasting Model (PyTorch)  
   - Sequence construction  
   - Model architecture  
   - Training procedure  
   - Baseline models  
   - Evaluation  
7. Results Summary  
8. Discussion & Conclusions  
9. Appendix – Full Python Code


# Project Overview

This project analyses the operational condition and power performance of two solar photovoltaic plants.  
The objectives are:

1. **Classify operating conditions (Optimal vs Suboptimal)**  
   Using a **Linear SVC** model with engineered physical features (irradiation ratios, temperature deltas, time-of-day cyclic encoding).

2. **Forecast AC and DC power 1 hour ahead**  
   Using a **PyTorch LSTM regression model**, evaluated against persistence and moving-average baselines.

The workflow includes extensive **data cleaning**, **sensor consistency checks**, **irradiance-based filtering**, **yield reconstruction**, and advanced modelling using both **scikit-learn** and **PyTorch**.

Both plants are processed symmetrically to allow comparisons across inverters and locations.


# Workflow Summary

### Step 1 — Load and Clean Raw Data
- Parse timestamps (including mixed formats in Plant 1).
- Remove or correct invalid readings.
- Clean irradiance using a day/night rule.
- Clean AC/DC power based on irradiance availability.
- Interpolate and enforce monotonicity in DAILY and TOTAL yields.

### Step 2 — Merge Generation Data with Weather Data
- Create per-inverter datasets.
- Join inverter and weather time series.
- Produce cleaned dictionaries:
  - `df_ps1` (Plant 1)
  - `df_ps2` (Plant 2)

### Step 3 — Feature Engineering
- Efficiency ratios (DC/IRRA, AC/IRRA)
- Module temperature delta
- Cyclic time-of-day encoding (cosine)

### Step 4 — Operating Condition Classification (Task 4)
- Build labels: Optimal = 0, Suboptimal = 1
- Scale numerical features using pipeline
- Fit Linear SVC with balanced class weights
- Optimise threshold using validation PR curve (maximising F1 for suboptimal)
- Evaluate on temporal test segment

### Step 5 — LSTM Forecasting (Task 6)
- Construct sequences (24h window → 1h horizon)
- Normalise features with MinMaxScaler
- Train PyTorch LSTM on each inverter
- Early stopping using validation loss
- Compare against:
  - Persistence baseline
  - Moving average baseline

### Step 6 — Summarise Performance
- Per-inverter MAE & RMSE
- Average errors across all inverters
- Interpretation of classification and forecasting performance


### Imports

In [24]:
import os
import math
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import display

import pickle  # <-- ADD THIS LINE

from sklearn.metrics import (
    auc, precision_recall_curve, classification_report, confusion_matrix,
    f1_score, precision_score, recall_score, average_precision_score,
    mean_absolute_error, mean_squared_error
)

from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.utils.class_weight import compute_class_weight

from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.inspection import PartialDependenceDisplay, permutation_importance

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


### Data Paths

In [None]:
####################################################################################################################################################

# Change here

folder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\In"

####################################################################################################################################################
gen_path_1 = os.path.join(folder, "Plant_1_Generation_Data_updated.csv")
weather_path_1 = os.path.join(folder, "Plant_1_Weather_Sensor_Data.csv")
gen_path_2 = os.path.join(folder, "Plant_2_Generation_Data.csv")
weather_path_2 = os.path.join(folder, "Plant_2_Weather_Sensor_Data.csv")


# Data Cleaning & Preprocessing

Both plants required substantial preprocessing due to:
- Time format inconsistencies
- Sensor noise and missing values
- Physically inconsistent readings (e.g., power at night)
- Non-monotonic yield curves

### Plant 1
- Fixed DATE_TIME using dual-format parsing.
- Removed rows missing the operating condition.
- Aggregated per inverter with counts of Optimal and Suboptimal states.
- Cleaned irradiance using a day/night rule and interpolation.
- Enforced physically consistent AC/DC values based on irradiance.
- Reconstructed DAILY and TOTAL yields with monotonic constraints.
- Generated final cleaned dictionary `df_ps1`.

### Plant 2
- Performed identical cleaning and reconstruction steps.
- Generated dictionary `df_ps2`.

Both cleaned plants include:
- AC_CLEAN  
- DC_CLEAN  
- DAILY_YIELD_CLEAN  
- TOTAL_YIELD_CLEAN  
- AMBIENT_TEMPERATURE  
- MODULE_TEMPERATURE  
- IRRADIATION_CLEAN  
- OPERATING_CONDITION_CLEAN  


##### PLANT 1 CLEANING 

In [26]:
### ============================================
### Plant 1: Clean and merge generation + weather
### ============================================

# Correct time issue for Plant 1 generation data
df = pd.read_csv(gen_path_1)
start = pd.Timestamp("2020-05-15")
end = pd.Timestamp("2020-06-18")

df["parsed"] = pd.to_datetime(df["DATE_TIME"], format="%Y-%m-%d %H:%M:%S", errors="coerce")
invalid = df["parsed"].isna() | (~df["parsed"].between(start, end))
df.loc[invalid, "parsed"] = pd.to_datetime(
    df.loc[invalid, "DATE_TIME"], format="%Y-%d-%m %H:%M:%S", errors="coerce"
)
df["DATE_TIME"] = df["parsed"]
dfc = df.drop(columns=["parsed"])

# Drop rows with missing operating condition
plant_1_c = dfc.dropna()
plant_1_c = plant_1_c.drop(columns=["PLANT_ID", "day"])
plant_1_c.set_index("DATE_TIME", inplace=True)

# Separate into inverter dataframes
source_key_1 = plant_1_c["SOURCE_KEY"].unique().tolist()
p1c_gp = plant_1_c.groupby("SOURCE_KEY")
inv_1 = {SOURCE_KEY: group for SOURCE_KEY, group in p1c_gp}

# Aggregate by time for each inverter
agg_inv_1 = {}
for sk, df_inv in inv_1.items():
    agg_df = df_inv.groupby("DATE_TIME").agg(
        SOURCE_KEY=("SOURCE_KEY", "first"),
        DC_POWER=("DC_POWER", "first"),
        AC_POWER=("AC_POWER", "first"),
        DAILY_YIELD=("DAILY_YIELD", "first"),
        TOTAL_YIELD=("TOTAL_YIELD", "first"),
        NUM_OPT=("Operating_Condition", lambda x: (x == "Optimal").sum()),
        NUM_SUBOPT=("Operating_Condition", lambda x: (x == "Suboptimal").sum()),
    ).reset_index()
    agg_inv_1[sk] = agg_df

# Load Plant 1 weather
df = pd.read_csv(weather_path_1)
df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"])

# Day/night rule
day_start = dt.time(6, 0)
day_end = dt.time(18, 30)
df["expected_day"] = df["DATE_TIME"].dt.time.between(day_start, day_end)

# Clean irradiation data
df["IRRADIATION_CLEAN"] = df["IRRADIATION"].copy()
df.loc[(~df["expected_day"]) & (df["IRRADIATION_CLEAN"] > 0), "IRRADIATION_CLEAN"] = 0
df.loc[(df["expected_day"]) & (df["IRRADIATION_CLEAN"] == 0), "IRRADIATION_CLEAN"] = float("nan")
df["IRRADIATION_CLEAN"] = df["IRRADIATION_CLEAN"].interpolate(method="linear")
df["IRRADIATION_CLEAN"] = df["IRRADIATION_CLEAN"].fillna(0)

s1_c = df.copy()
s1_c.set_index("DATE_TIME", inplace=True)
s1_c = s1_c.drop(columns=["SOURCE_KEY"])

# Join inverter data with weather
wea_inv_1 = {}
for sk, df_inv in agg_inv_1.items():
    df_inv = df_inv.set_index("DATE_TIME")
    join_df = df_inv.join(s1_c, how="inner")
    wea_inv_1[sk] = join_df

# Clean AC/DC and DAILY_YIELD for each inverter
df_step_1 = {}
for sk, df_inv in wea_inv_1.items():
    df_clean = df_inv.copy()
    df_clean["AC_CLEAN"] = df_clean["AC_POWER"].copy()
    df_clean["DC_CLEAN"] = df_clean["DC_POWER"].copy()

    night_mask = df_clean["IRRADIATION_CLEAN"] == 0
    df_clean.loc[night_mask & (df_clean["AC_CLEAN"] > 0), "AC_CLEAN"] = 0
    df_clean.loc[night_mask & (df_clean["DC_CLEAN"] > 0), "DC_CLEAN"] = 0

    day_mask = df_clean["IRRADIATION_CLEAN"] > 0
    df_clean.loc[day_mask & (df_clean["AC_CLEAN"] == 0), "AC_CLEAN"] = float("nan")
    df_clean.loc[day_mask & (df_clean["DC_CLEAN"] == 0), "DC_CLEAN"] = float("nan")

    df_clean["AC_CLEAN"] = df_clean["AC_CLEAN"].interpolate(method="linear")
    df_clean["DC_CLEAN"] = df_clean["DC_CLEAN"].interpolate(method="linear")
    df_clean["AC_CLEAN"] = df_clean["AC_CLEAN"].fillna(0)
    df_clean["DC_CLEAN"] = df_clean["DC_CLEAN"].fillna(0)

    df_step_1[sk] = df_clean

# DAILY_YIELD_CLEAN reconstruction
df_step_2 = {}
for sk, df_inv in df_step_1.items():
    df_clean = df_inv.copy()
    df_clean.index = pd.to_datetime(df_clean.index)
    df_clean["DAILY_YIELD_CLEAN"] = df_clean["DAILY_YIELD"].copy()

    dates = np.unique(df_clean.index.date)
    for d in dates:
        day_mask_full = df_clean.index.date == d
        df_day = df_clean.loc[day_mask_full]
        irr_pos = df_day["IRRADIATION_CLEAN"] > 0

        if not irr_pos.any():
            df_clean.loc[day_mask_full, "DAILY_YIELD_CLEAN"] = 0.0
            continue

        day_start_idx = df_day[irr_pos].index[0]
        day_end_idx = df_day[irr_pos].index[-1]

        night_mask = day_mask_full & (df_clean.index < day_start_idx)
        day_mask = day_mask_full & (df_clean.index >= day_start_idx) & (df_clean.index <= day_end_idx)
        evening_mask = day_mask_full & (df_clean.index > day_end_idx)

        df_clean.loc[night_mask, "DAILY_YIELD_CLEAN"] = 0.0
        val_end = df_clean.at[day_end_idx, "DAILY_YIELD"]
        df_clean.loc[evening_mask, "DAILY_YIELD_CLEAN"] = val_end

        day_idx = df_clean.loc[day_mask].index
        if len(day_idx) == 0:
            continue

        raw_vals = df_clean.loc[day_idx, "DAILY_YIELD_CLEAN"].values.astype(float)
        invalid = np.zeros(len(raw_vals), dtype=bool)
        invalid |= raw_vals <= 0

        if len(raw_vals) > 1:
            drops = np.diff(raw_vals) < 0
            invalid[1:][drops] = True

        df_clean.loc[day_idx[invalid], "DAILY_YIELD_CLEAN"] = np.nan
        df_clean.loc[day_idx, "DAILY_YIELD_CLEAN"] = (
            df_clean.loc[day_idx, "DAILY_YIELD_CLEAN"]
            .interpolate(method="linear", limit_direction="both")
        )

        prev_val = df_clean.at[day_idx[0], "DAILY_YIELD_CLEAN"]
        for t in day_idx[1:]:
            cur = df_clean.at[t, "DAILY_YIELD_CLEAN"]
            if pd.isna(cur) or cur < prev_val:
                df_clean.at[t, "DAILY_YIELD_CLEAN"] = prev_val
            else:
                prev_val = cur

        df_clean.loc[night_mask, "DAILY_YIELD_CLEAN"] = 0.0
        df_clean.loc[evening_mask, "DAILY_YIELD_CLEAN"] = val_end

    df_step_2[sk] = df_clean

# TOTAL_YIELD_CLEAN reconstruction
df_ps1 = {}
for sk, df_inv in df_step_2.items():
    df_clean = df_inv.copy()
    df_clean["TOTAL_YIELD_CLEAN"] = df_clean["TOTAL_YIELD"].copy()
    timestamps = df_clean.index

    for i in range(1, len(timestamps)):
        t_prev = timestamps[i - 1]
        t = timestamps[i]

        TY_prev = df_clean.at[t_prev, "TOTAL_YIELD_CLEAN"]
        TY_now = df_clean.at[t, "TOTAL_YIELD"]
        DY_prev = df_clean.at[t_prev, "DAILY_YIELD_CLEAN"]
        DY_now = df_clean.at[t, "DAILY_YIELD_CLEAN"]

        if t.date() != t_prev.date():
            df_clean.at[t, "TOTAL_YIELD_CLEAN"] = TY_prev
            continue

        delta_dy = DY_now - DY_prev
        TY_expected = TY_prev + delta_dy

        if TY_now < TY_prev:
            df_clean.at[t, "TOTAL_YIELD_CLEAN"] = TY_expected
        else:
            df_clean.at[t, "TOTAL_YIELD_CLEAN"] = TY_now

    df_clean = df_clean[
        [
            "PLANT_ID", "SOURCE_KEY",
            "AC_CLEAN", "DC_CLEAN",
            "DAILY_YIELD_CLEAN", "TOTAL_YIELD_CLEAN",
            "AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE",
            "IRRADIATION_CLEAN", "NUM_OPT", "NUM_SUBOPT",
        ]
    ]

    df_clean["OPERATING_CONDITION_CLEAN"] = np.where(
        df_clean["NUM_OPT"] > df_clean["NUM_SUBOPT"],
        "Optimal", "Suboptimal"
    )

    df_clean = df_clean.drop(columns=["NUM_OPT", "NUM_SUBOPT"])
    df_ps1[sk] = df_clean


##### PLANT 2 CLEANING 

In [27]:
### ============================================
### Plant 2: Clean and merge generation + weather
### ============================================

plant_2 = pd.read_csv(gen_path_2, parse_dates=["DATE_TIME"])
plant_2 = plant_2.drop(columns=["PLANT_ID"])
plant_2.set_index("DATE_TIME", inplace=True)

p2_gp = plant_2.groupby("SOURCE_KEY")
inv_2 = {SOURCE_KEY: group for SOURCE_KEY, group in p2_gp}
source_key_2 = plant_2["SOURCE_KEY"].unique().tolist()

agg_inv_2 = {}
for sk, df_inv in inv_2.items():
    agg_df = df_inv.groupby("DATE_TIME").agg(
        SOURCE_KEY=("SOURCE_KEY", "first"),
        DC_POWER=("DC_POWER", "first"),
        AC_POWER=("AC_POWER", "first"),
        DAILY_YIELD=("DAILY_YIELD", "first"),
        TOTAL_YIELD=("TOTAL_YIELD", "first"),
        NUM_OPT=("Operating_Condition", lambda x: (x == "Optimal").sum()),
        NUM_SUBOPT=("Operating_Condition", lambda x: (x == "Suboptimal").sum()),
    ).reset_index()
    agg_inv_2[sk] = agg_df

df = pd.read_csv(weather_path_2)
df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"])

day_start = dt.time(6, 0)
day_end = dt.time(18, 30)
df["expected_day"] = df["DATE_TIME"].dt.time.between(day_start, day_end)

df["IRRADIATION_CLEAN"] = df["IRRADIATION"].copy()
df.loc[(~df["expected_day"]) & (df["IRRADIATION_CLEAN"] > 0), "IRRADIATION_CLEAN"] = 0
df.loc[(df["expected_day"]) & (df["IRRADIATION_CLEAN"] == 0), "IRRADIATION_CLEAN"] = float("nan")
df["IRRADIATION_CLEAN"] = df["IRRADIATION_CLEAN"].interpolate(method="linear")
df["IRRADIATION_CLEAN"] = df["IRRADIATION_CLEAN"].fillna(0)

s2_c = df.copy()
s2_c.set_index("DATE_TIME", inplace=True)
s2_c = s2_c.drop(columns=["SOURCE_KEY"])

wea_inv_2 = {}
for sk, df_inv in agg_inv_2.items():
    df_inv = df_inv.set_index("DATE_TIME")
    join_df = df_inv.join(s2_c, how="inner")
    wea_inv_2[sk] = join_df

df_step_1 = {}
for sk, df_inv in wea_inv_2.items():
    df_clean = df_inv.copy()
    df_clean["AC_CLEAN"] = df_clean["AC_POWER"].copy()
    df_clean["DC_CLEAN"] = df_clean["DC_POWER"].copy()

    night_mask = df_clean["IRRADIATION_CLEAN"] == 0
    df_clean.loc[night_mask & (df_clean["AC_CLEAN"] > 0), "AC_CLEAN"] = 0
    df_clean.loc[night_mask & (df_clean["DC_CLEAN"] > 0), "DC_CLEAN"] = 0

    day_mask = df_clean["IRRADIATION_CLEAN"] > 0
    df_clean.loc[day_mask & (df_clean["AC_CLEAN"] == 0), "AC_CLEAN"] = float("nan")
    df_clean.loc[day_mask & (df_clean["DC_CLEAN"] == 0), "DC_CLEAN"] = float("nan")

    df_clean["AC_CLEAN"] = df_clean["AC_CLEAN"].interpolate(method="linear")
    df_clean["DC_CLEAN"] = df_clean["DC_CLEAN"].interpolate(method="linear")
    df_clean["AC_CLEAN"] = df_clean["AC_CLEAN"].fillna(0)
    df_clean["DC_CLEAN"] = df_clean["DC_CLEAN"].fillna(0)

    df_step_1[sk] = df_clean

df_step_2 = {}
for sk, df_inv in df_step_1.items():
    df_clean = df_inv.copy()
    df_clean.index = pd.to_datetime(df_clean.index)
    df_clean["DAILY_YIELD_CLEAN"] = df_clean["DAILY_YIELD"].copy()

    dates = np.unique(df_clean.index.date)
    for d in dates:
        day_mask_full = df_clean.index.date == d
        df_day = df_clean.loc[day_mask_full]
        irr_pos = df_day["IRRADIATION_CLEAN"] > 0

        if not irr_pos.any():
            df_clean.loc[day_mask_full, "DAILY_YIELD_CLEAN"] = 0.0
            continue

        day_start_idx = df_day[irr_pos].index[0]
        day_end_idx = df_day[irr_pos].index[-1]

        night_mask = day_mask_full & (df_clean.index < day_start_idx)
        day_mask = day_mask_full & (df_clean.index >= day_start_idx) & (df_clean.index <= day_end_idx)
        evening_mask = day_mask_full & (df_clean.index > day_end_idx)

        df_clean.loc[night_mask, "DAILY_YIELD_CLEAN"] = 0.0
        val_end = df_clean.at[day_end_idx, "DAILY_YIELD"]
        df_clean.loc[evening_mask, "DAILY_YIELD_CLEAN"] = val_end

        day_idx = df_clean.loc[day_mask].index
        if len(day_idx) == 0:
            continue

        raw_vals = df_clean.loc[day_idx, "DAILY_YIELD_CLEAN"].values.astype(float)
        invalid = np.zeros(len(raw_vals), dtype=bool)
        invalid |= raw_vals <= 0

        if len(raw_vals) > 1:
            drops = np.diff(raw_vals) < 0
            invalid[1:][drops] = True

        df_clean.loc[day_idx[invalid], "DAILY_YIELD_CLEAN"] = np.nan
        df_clean.loc[day_idx, "DAILY_YIELD_CLEAN"] = (
            df_clean.loc[day_idx, "DAILY_YIELD_CLEAN"]
            .interpolate(method="linear", limit_direction="both")
        )

        prev_val = df_clean.at[day_idx[0], "DAILY_YIELD_CLEAN"]
        for t in day_idx[1:]:
            cur = df_clean.at[t, "DAILY_YIELD_CLEAN"]
            if pd.isna(cur) or cur < prev_val:
                df_clean.at[t, "DAILY_YIELD_CLEAN"] = prev_val
            else:
                prev_val = cur

        df_clean.loc[night_mask, "DAILY_YIELD_CLEAN"] = 0.0
        df_clean.loc[evening_mask, "DAILY_YIELD_CLEAN"] = val_end

    df_step_2[sk] = df_clean

df_ps2 = {}
for sk, df_inv in df_step_2.items():
    df_clean = df_inv.copy()
    df_clean["TOTAL_YIELD_CLEAN"] = df_clean["TOTAL_YIELD"].copy()
    timestamps = df_clean.index

    for i in range(1, len(timestamps)):
        t_prev = timestamps[i - 1]
        t = timestamps[i]

        TY_prev = df_clean.at[t_prev, "TOTAL_YIELD_CLEAN"]
        TY_now = df_clean.at[t, "TOTAL_YIELD"]
        DY_prev = df_clean.at[t_prev, "DAILY_YIELD_CLEAN"]
        DY_now = df_clean.at[t, "DAILY_YIELD_CLEAN"]

        if t.date() != t_prev.date():
            df_clean.at[t, "TOTAL_YIELD_CLEAN"] = TY_prev
            continue

        delta_dy = DY_now - DY_prev
        TY_expected = TY_prev + delta_dy

        if TY_now < TY_prev:
            df_clean.at[t, "TOTAL_YIELD_CLEAN"] = TY_expected
        else:
            df_clean.at[t, "TOTAL_YIELD_CLEAN"] = TY_now

    df_clean = df_clean[
        [
            "PLANT_ID", "SOURCE_KEY",
            "AC_CLEAN", "DC_CLEAN",
            "DAILY_YIELD_CLEAN", "TOTAL_YIELD_CLEAN",
            "AMBIENT_TEMPERATURE", "MODULE_TEMPERATURE",
            "IRRADIATION_CLEAN", "NUM_OPT", "NUM_SUBOPT",
        ]
    ]

    df_clean["OPERATING_CONDITION_CLEAN"] = np.where(
        df_clean["NUM_OPT"] > df_clean["NUM_SUBOPT"],
        "Optimal", "Suboptimal"
    )

    df_clean = df_clean.drop(columns=["NUM_OPT", "NUM_SUBOPT"])
    df_ps2[sk] = df_clean


# Feature Engineering

The following physical and temporal features were engineered:

- **DC/IRRA** and **AC/IRRA**  
  Efficiency-like ratios normalised by irradiance.

- **Temperature Delta**  
  MODULE_TEMPERATURE − AMBIENT_TEMPERATURE.

- **hour_cos**  
  Cosine encoding of hour-of-day to avoid discontinuity near midnight.

These features improve separability for the classifier and strengthen LSTM temporal patterns.


# Task 4 – Linear SVC Classification

The classification goal is to detect **Suboptimal** operation.

### Label Strategy
- Optimal = 0  
- Suboptimal = 1  

### Model Pipeline
1. Time-based split: 70% train, 3-day validation window, 10-day test window.
2. Median imputation + StandardScaler.
3. Linear SVC with class_weight="balanced".
4. Threshold selected using validation PR curve:
   - Threshold chosen to maximise **Suboptimal F1 score**.

### Evaluation Metrics
- Precision, Recall, F1 for both classes.
- PR-AUC (Average Precision) for Suboptimal.
- Confusion matrix focused on Suboptimal detection.


##### SVC MODEL CODE

In [28]:
### ============================================
### Task 4 – Linear SVC model
### ============================================

def make_label(df):
    return (df["OPERATING_CONDITION_CLEAN"].str.lower() == "suboptimal").astype(int)

def engineer_features(df):
    df = df.groupby("SOURCE_KEY", group_keys=False).apply(lambda g: g.sort_values("DATE_TIME"))
    if {"DC_CLEAN", "IRRADIATION_CLEAN"}.issubset(df.columns):
        df["DC/IRRA"] = df["DC_CLEAN"] / (df["IRRADIATION_CLEAN"] + 1e-3)
    if {"AC_CLEAN", "IRRADIATION_CLEAN"}.issubset(df.columns):
        df["AC/IRRA"] = df["AC_CLEAN"] / (df["IRRADIATION_CLEAN"] + 1e-3)
    if {"MODULE_TEMPERATURE", "AMBIENT_TEMPERATURE"}.issubset(df.columns):
        df["Temp_Delta"] = df["MODULE_TEMPERATURE"] - df["AMBIENT_TEMPERATURE"]
    if "DATE_TIME" in df.columns:
        t = df["DATE_TIME"]
        hour = t.dt.hour + t.dt.minute / 60
        df["hour_cos"] = np.cos(2 * np.pi * hour / 24)
    return df

def assemble_all_from_df_ps(df_ps):
    parts = []
    for key, df_inv in df_ps.items():
        df_inv = df_inv.reset_index().rename(columns=lambda x: "DATE_TIME" if x == "index" else x)
        df_inv["SOURCE_KEY"] = key
        df_inv["DATE_TIME"] = pd.to_datetime(df_inv["DATE_TIME"])
        parts.append(df_inv)

    df_all = pd.concat(parts, ignore_index=True).drop_duplicates()
    m = (~df_all["OPERATING_CONDITION_CLEAN"].isna()) & (~df_all["IRRADIATION_CLEAN"].isna())

    print("\n=== Operating Condition Counts ===")
    counts = df_all["OPERATING_CONDITION_CLEAN"].value_counts()
    print(f"Number of Optimal (0):     {counts.get('Optimal', 0)}")
    print(f"Number of Suboptimal (1):  {counts.get('Suboptimal', 0)}")

    return df_all[m]

def time_split(df, y, test_days=10, val_days=3):
    last = df["DATE_TIME"].max()
    test_start = last - pd.Timedelta(days=test_days)
    val_start = test_start - pd.Timedelta(days=val_days)

    m_te = df["DATE_TIME"] >= test_start
    m_val = (df["DATE_TIME"] >= val_start) & (~m_te)
    m_tr = df["DATE_TIME"] < val_start

    return df[m_tr], df[m_val], df[m_te], y[m_tr], y[m_val], y[m_te]

def make_preprocessor(df):
    drop = ["OPERATING_CONDITION_CLEAN", "DATE_TIME", "PLANT_ID", "SOURCE_KEY"]
    num_cols = [c for c in df.columns if c not in drop and df[c].dtype.kind in "fcui"]

    pre = ColumnTransformer(
        [
            (
                "num",
                Pipeline(
                    [
                        ("imp", SimpleImputer(strategy="median")),
                        ("scaler", StandardScaler()),
                    ]
                ),
                num_cols,
            )
        ]
    )
    return pre

def Suboptimal_f1_threshold(y, score_suboptimal):
    p, r, thr = precision_recall_curve(y, score_suboptimal)
    f1 = 2 * p[1:] * r[1:] / (p[1:] + r[1:] + 1e-12)
    return 0.5 if len(thr) == 0 else float(thr[np.nanargmax(f1)])

def Suboptimal_evaluate(name, y, score_suboptimal, thr, tag):
    pred = (score_suboptimal >= thr).astype(int)
    ap = average_precision_score(y, score_suboptimal)

    print(f"\n==== {name} | {tag} ====")
    print(f"Suboptimal Threshold: {thr:.4f} | PR-AUC: {ap:.4f}")
    print(classification_report(y, pred, digits=3))
    print(confusion_matrix(y, pred))

def run_classification_on_df_ps(df_ps, test_days=10, val_days=3):
    df = assemble_all_from_df_ps(df_ps)
    y = make_label(df)
    df_feat = engineer_features(df)

    X_tr, X_val, X_te, y_tr, y_val, y_te = time_split(df_feat, y, test_days, val_days)
    pre = make_preprocessor(df_feat)

    svc = Pipeline(
        [
            ("pre", pre),
            ("clf", SVC(kernel="linear", class_weight="balanced", probability=True)),
        ]
    )
    svc.fit(X_tr, y_tr)

    val_scores = svc.predict_proba(X_val)[:, 1]
    thr = Suboptimal_f1_threshold(y_val, val_scores)

    test_scores = svc.predict_proba(X_te)[:, 1]
    Suboptimal_evaluate("Linear SVC", y_te, test_scores, thr, "Test Set")


# Task 6 – LSTM Forecasting (PyTorch)

We forecast AC/DC power **1 hour ahead** using **24 hours of historical data**.

### Sequence Setup
- Sampling interval: 15 minutes
- Window length: 24h → 96 steps
- Forecast horizon: 1h → 4 steps ahead
- Target columns: AC_CLEAN, DC_CLEAN

### Model
- Single-layer LSTM with 32 hidden units.
- Final dense layer outputs 1-step regression.
- Loss: MSE
- Optimiser: Adam (lr = 1e-3)
- Early stopping on validation loss (patience = 5).

### Baselines
- **Persistence:** last observed value of window.
- **Moving average:** mean of last 4 time steps.

### Metrics
- MAE
- RMSE
- Comparison between:
  - LSTM
  - Persistence
  - Moving average

Per-inverter results are aggregated and averaged across plant.


##### LSTM MODEL

In [29]:
### ============================================
### Task 6 – LSTM model (PyTorch)
### ============================================

def make_seq(X, y, L):
    xs, ys = [], []
    for i in range(len(X) - L + 1):
        xs.append(X[i : i + L])
        ys.append(y[i + L - 1])
    return np.array(xs), np.array(ys)

def seq_X_only(arr, L):
    return np.array([arr[i : i + L] for i in range(len(arr) - L + 1)])

def eval_model(y, yhat):
    mae = mean_absolute_error(y, yhat)
    rmse = math.sqrt(mean_squared_error(y, yhat))
    return mae, rmse

class LSTMRegressor(nn.Module):
    def __init__(self, input_dim, hidden_dim=32):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        out = out[:, -1, :]
        out = self.fc(out)
        return out

def run_lstm_for_one_df(
    df_raw,
    feature_cols,
    target_col="DC_CLEAN",
    horizon_steps=4,
    window_steps=96,
    inverter_name="",
):
    df = df_raw.sort_index().copy()
    df.index = pd.to_datetime(df.index)
    df["TARGET"] = df[target_col].shift(-horizon_steps)
    df_model = df.dropna(subset=["TARGET"]).copy()

    if len(df_model) <= window_steps + 10:
        print(f"{inverter_name}: insufficient data")
        return None

    n = len(df_model)
    train_end = int(n * 0.7)
    val_end = int(n * 0.85)

    train = df_model.iloc[:train_end]
    val = df_model.iloc[train_end:val_end]
    test = df_model.iloc[val_end:]

    if len(test) <= window_steps:
        print(f"{inverter_name}: test too short")
        return None

    scX = MinMaxScaler().fit(train[feature_cols])
    scY = MinMaxScaler().fit(train[["TARGET"]])

    def scale(df_part):
        return scX.transform(df_part[feature_cols]), scY.transform(df_part[["TARGET"]]).ravel()

    X_train, y_train = scale(train)
    X_val, y_val = scale(val)
    X_test, y_test = scale(test)

    Xtr, ytr = make_seq(X_train, y_train, window_steps)
    Xv, yv = make_seq(X_val, y_val, window_steps)
    Xte, yte = make_seq(X_test, y_test, window_steps)

    test_time = test.index[window_steps - 1 :]

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    Xtr_t = torch.tensor(Xtr, dtype=torch.float32).to(device)
    ytr_t = torch.tensor(ytr, dtype=torch.float32).view(-1, 1).to(device)
    Xv_t = torch.tensor(Xv, dtype=torch.float32).to(device)
    yv_t = torch.tensor(yv, dtype=torch.float32).view(-1, 1).to(device)
    Xte_t = torch.tensor(Xte, dtype=torch.float32).to(device)

    train_loader = DataLoader(TensorDataset(Xtr_t, ytr_t), batch_size=64, shuffle=True)

    model = LSTMRegressor(input_dim=Xtr.shape[2]).to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    loss_fn = nn.MSELoss()

    best_val_loss = float("inf")
    best_state = None
    patience = 5
    streak = 0

    for epoch in range(50):
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            pred = model(xb)
            loss = loss_fn(pred, yb)
            loss.backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val_pred = model(Xv_t)
            val_loss = loss_fn(val_pred, yv_t).item()

        if val_loss < best_val_loss:
            best_val_loss = val_loss
            best_state = {k: v.cpu().clone() for k, v in model.state_dict().items()}
            streak = 0
        else:
            streak += 1
            if streak >= patience:
                break

    if best_state is not None:
        model.load_state_dict(best_state)
        model.to(device)

    with torch.no_grad():
        pred_test = model(Xte_t).cpu().numpy().ravel()

    y_pred = scY.inverse_transform(pred_test.reshape(-1, 1)).ravel()
    y_true = scY.inverse_transform(yte.reshape(-1, 1)).ravel()

    tgt_raw = test[target_col].values.reshape(-1, 1)
    tgt_seq = seq_X_only(tgt_raw, window_steps)

    persist = tgt_seq[:, -1, 0]
    movavg = tgt_seq[:, -4:, 0].mean(axis=1)

    mae_lstm, rmse_lstm = eval_model(y_true, y_pred)
    mae_pers, rmse_pers = eval_model(y_true, persist)
    mae_ma, rmse_ma = eval_model(y_true, movavg)

    return {
        "inverter": inverter_name,
        "LSTM_MAE": mae_lstm,
        "LSTM_RMSE": rmse_lstm,
        "Pers_MAE": mae_pers,
        "Pers_RMSE": rmse_pers,
        "MA_MAE": mae_ma,
        "MA_RMSE": rmse_ma,
        "y_true": y_true,
        "y_pred_lstm": y_pred,
        "persist": persist,
        "movavg": movavg,
        "test_time": test_time,
        # >>> NEW: information needed to reload the model <<<
        "model_state_dict": model.state_dict(),
        "input_dim": Xtr.shape[2],
        "hidden_dim": model.lstm.hidden_size,
    }
def run_lstm_for_plant(
    df_dict,
    source_keys,
    feature_cols,
    target_type="DC",
    window_hours=24,
    horizon_minutes=60,
    step_minutes=15,
):
    step_per_hour = int(60 / step_minutes)
    window_steps = window_hours * step_per_hour
    horizon_steps = int(horizon_minutes / step_minutes)

    target_col = f"{target_type}_CLEAN"

    all_results = []

    for key in source_keys:
        df_inv = df_dict[key]
        res = run_lstm_for_one_df(
            df_raw=df_inv,
            feature_cols=feature_cols,
            target_col=target_col,
            horizon_steps=horizon_steps,
            window_steps=window_steps,
            inverter_name=key,
        )
        if res is not None:
            all_results.append(res)

    if not all_results:
        return pd.DataFrame(), pd.DataFrame(), []

    results_df = pd.DataFrame(all_results).set_index("inverter")

    avg_results = pd.DataFrame(
        {
            "LSTM_MAE_avg": [results_df["LSTM_MAE"].mean()],
            "LSTM_RMSE_avg": [results_df["LSTM_RMSE"].mean()],
            "Pers_MAE_avg": [results_df["Pers_MAE"].mean()],
            "Pers_RMSE_avg": [results_df["Pers_RMSE"].mean()],
            "MA_MAE_avg": [results_df["MA_MAE"].mean()],
            "MA_RMSE_avg": [results_df["MA_RMSE"].mean()],
        }
    )

    return results_df, avg_results, all_results


# RUNNING ALL MODELS

In [30]:
### ============================================
### Run SVC and LSTM for Plants 1 and 2
### ============================================

# ---------------------------
# Task 4 – Linear SVC
# ---------------------------
run_classification_on_df_ps(df_ps1)
run_classification_on_df_ps(df_ps2)

# ---------------------------
# Task 6 – LSTM Forecasting
# ---------------------------
feature_cols = [
    "AC_CLEAN",
    "DC_CLEAN",
    "IRRADIATION_CLEAN",
    "AMBIENT_TEMPERATURE",
    "MODULE_TEMPERATURE",
]

# Plant 1 – AC
results_p1_ac_df, avg_p1_ac_df, raw_p1_ac = run_lstm_for_plant(
    df_dict=df_ps1,
    source_keys=source_key_1,
    feature_cols=feature_cols,
    target_type="AC",
    window_hours=24,
    horizon_minutes=60,
    step_minutes=15,
)
display(results_p1_ac_df)
display(avg_p1_ac_df)

# Plant 1 – DC
results_p1_dc_df, avg_p1_dc_df, raw_p1_dc = run_lstm_for_plant(
    df_dict=df_ps1,
    source_keys=source_key_1,
    feature_cols=feature_cols,
    target_type="DC",
    window_hours=24,
    horizon_minutes=60,
    step_minutes=15,
)
display(results_p1_dc_df)
display(avg_p1_dc_df)

# Plant 2 – AC
results_p2_ac_df, avg_p2_ac_df, raw_p2_ac = run_lstm_for_plant(
    df_dict=df_ps2,
    source_keys=source_key_2,
    feature_cols=feature_cols,
    target_type="AC",
    window_hours=24,
    horizon_minutes=60,
    step_minutes=15,
)
display(results_p2_ac_df)
display(avg_p2_ac_df)

# Plant 2 – DC
results_p2_dc_df, avg_p2_dc_df, raw_p2_dc = run_lstm_for_plant(
    df_dict=df_ps2,
    source_keys=source_key_2,
    feature_cols=feature_cols,
    target_type="DC",
    window_hours=24,
    horizon_minutes=60,
    step_minutes=15,
)
display(results_p2_dc_df)
display(avg_p2_dc_df)



=== Operating Condition Counts ===
Number of Optimal (0):     7656
Number of Suboptimal (1):  38024


  df = df.groupby("SOURCE_KEY", group_keys=False).apply(lambda g: g.sort_values("DATE_TIME"))



==== Linear SVC | Test Set ====
Suboptimal Threshold: 0.0069 | PR-AUC: 1.0000
              precision    recall  f1-score   support

           0      1.000     0.915     0.956      1298
           1      0.988     1.000     0.994      9218

    accuracy                          0.990     10516
   macro avg      0.994     0.958     0.975     10516
weighted avg      0.990     0.990     0.989     10516

[[1188  110]
 [   0 9218]]

=== Operating Condition Counts ===
Number of Optimal (0):     7414
Number of Suboptimal (1):  60284


  df = df.groupby("SOURCE_KEY", group_keys=False).apply(lambda g: g.sort_values("DATE_TIME"))



==== Linear SVC | Test Set ====
Suboptimal Threshold: 0.7672 | PR-AUC: 0.9987
              precision    recall  f1-score   support

           0      0.699     0.908     0.790      1760
           1      0.991     0.964     0.978     19382

    accuracy                          0.960     21142
   macro avg      0.845     0.936     0.884     21142
weighted avg      0.967     0.960     0.962     21142

[[ 1598   162]
 [  689 18693]]


Unnamed: 0_level_0,LSTM_MAE,LSTM_RMSE,Pers_MAE,Pers_RMSE,MA_MAE,MA_RMSE,y_true,y_pred_lstm,persist,movavg,test_time,model_state_dict,input_dim,hidden_dim
inverter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
1BY6WEcLGh8j5v7,94.929348,131.975305,90.620942,158.669991,110.688427,174.945425,"[323.8857143, 360.7749999999999, 251.7857143, ...","[400.2845, 422.3158, 309.0261, 231.28014, 207....","[426.9857143, 599.275, 160.0857143, 220.425, 3...","[561.555803575, 547.899553575, 439.29598215000...","DatetimeIndex(['2020-06-15 15:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1637), tensor...",5,32
1IF53ai7Xc0U56Y,85.642427,134.252352,97.429369,173.169825,119.2275,191.844999,"[120.8375, 52.74285713999999, 23.4625, 2.34285...","[160.41298, 152.55385, 134.55028, 103.42, 84.0...","[392.9625, 281.3571429, 225.4625, 142.2285714,...","[291.553125, 317.060267875, 311.588392875, 260...","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.2349), tensor...",5,32
3PZuoBAID5Wc2HD,85.342027,129.971228,97.137957,171.22978,115.969441,187.989427,"[119.4571429, 51.6, 22.42857143, 2.15, 0.0, 0....","[211.01822, 193.51228, 166.95343, 126.84726, 9...","[394.5857143, 276.5125, 225.9714286, 143.8375,...","[291.294196425, 314.347321425, 310.758035725, ...","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0551), tenso...",5,32
7JYdWkrLSPkdwr4,88.752917,140.557246,95.9418,168.567515,117.216114,185.670745,"[271.1714286, 218.60000000000002, 137.8285714,...","[342.1816, 259.88455, 241.77861, 254.96797, 24...","[172.4285714, 242.35, 338.1714286, 384.1625, 2...","[476.6549107, 391.8736607, 345.459375, 284.278...","DatetimeIndex(['2020-06-15 16:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.1004), tenso...",5,32
McdE0feGgRqW7Ca,76.189366,130.691426,98.10668,175.709121,121.063068,194.233328,"[140.8142857, 125.5142857, 54.825, 23.82857143...","[106.0684, 130.76666, 112.07684, 90.793, 59.91...","[351.3571429, 414.8, 276.1857143, 225.325, 140...","[341.65044645, 297.81294644999997, 322.7950893...","DatetimeIndex(['2020-06-15 16:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.2546), tensor...",5,32
VHMLBKoKgIrUVDU,83.365602,134.566277,99.202908,172.836365,120.273593,190.590393,"[275.2428571, 222.625, 140.1428571, 117.225, 5...","[166.96677, 126.611145, 146.5388, 180.17343, 1...","[175.2571429, 245.8625, 346.4285714, 396.225, ...","[487.779017875, 398.197767875, 356.037053575, ...","DatetimeIndex(['2020-06-15 16:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.1565), tenso...",5,32
WRmjgnKYAwPKWDb,84.350465,132.605883,96.529402,169.821542,114.49309,186.553725,"[118.3571429, 51.475, 22.32857143, 2.15, 0.0, ...","[193.06401, 176.00424, 153.02739, 119.36092, 9...","[390.9428571, 275.2875, 224.4714286, 142.8375,...","[288.59375, 311.821875, 308.675446425, 258.384...","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1001), tensor...",5,32
ZnxXDlPa8U1GXgE,87.71094,136.258933,97.260739,172.297983,119.017088,190.140424,"[224.08750000000003, 140.8714286, 119.2375, 51...","[177.79099, 190.06688, 223.73691, 207.38564, 1...","[242.6875, 350.1428571, 397.125, 273.8, 224.08...","[379.017410725, 346.88526785, 290.86026785, 31...","DatetimeIndex(['2020-06-15 16:15:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0734), tensor...",5,32
ZoEaEvLYb1n2sOq,82.393735,132.980636,96.403406,171.707318,118.721197,190.081258,"[140.4714286, 121.8875, 51.72857143, 23.0875, ...","[165.12596, 182.27599, 166.06534, 144.90044, 1...","[341.8375, 404.2, 270.4125, 223.5714286, 140.4...","[350.8330357, 291.479464275, 314.3482142750000...","DatetimeIndex(['2020-06-15 16:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1763), tensor...",5,32
adLQvlD726eNBSB,83.203142,129.919006,99.28863,175.608486,120.259925,193.613587,"[122.13750000000002, 53.871428570000006, 24.42...","[180.17146, 166.97188, 143.51976, 105.38429, 8...","[397.2625, 284.1142857, 227.55, 144.2, 122.137...","[294.478571425, 320.235714275, 314.76383927499...","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0906), tenso...",5,32


Unnamed: 0,LSTM_MAE_avg,LSTM_RMSE_avg,Pers_MAE_avg,Pers_RMSE_avg,MA_MAE_avg,MA_RMSE_avg
0,85.871019,134.713192,96.554936,171.372109,117.930408,189.020025


Unnamed: 0_level_0,LSTM_MAE,LSTM_RMSE,Pers_MAE,Pers_RMSE,MA_MAE,MA_RMSE,y_true,y_pred_lstm,persist,movavg,test_time,model_state_dict,input_dim,hidden_dim
inverter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
1BY6WEcLGh8j5v7,738.29951,1167.665747,926.993588,1623.764202,1132.338981,1789.851329,"[3293.857143, 3670.6249999999995, 2566.428571,...","[2987.905, 3219.3655, 1811.2904, 1197.3541, 12...","[4351.714286, 6117.5, 1637.428571, 2247.125, 3...","[5730.4598215000005, 5590.3973215000005, 4483....","DatetimeIndex(['2020-06-15 15:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0641), tensor...",5,32
1IF53ai7Xc0U56Y,936.827899,1370.205997,997.469269,1773.893681,1220.472467,1964.25377,"[1240.75, 545.1428571, 242.87500000000003, 24....","[1869.9155, 1695.6353, 1479.0555, 1152.4946, 9...","[3999.25, 2865.571429, 2299.5, 1459.285714, 12...","[2969.62053575, 3227.97767875, 3172.50892875, ...","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0751), tenso...",5,32
3PZuoBAID5Wc2HD,910.571382,1388.776098,994.445598,1753.341722,1186.86738,1924.15658,"[1226.571429, 533.125, 232.42857140000004, 22....","[2010.6692, 1876.5378, 1682.5192, 1355.9034, 1...","[4016.142857, 2816.125, 2304.285714, 1474.25, ...","[2966.8080357500003, 3200.3705357500003, 3164....","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1147), tensor...",5,32
7JYdWkrLSPkdwr4,798.50136,1354.681321,982.065777,1726.255801,1199.74744,1900.552103,"[2761.714286, 2229.375, 1414.714286, 1187.25, ...","[1585.5332, 1076.7416, 1183.4441, 1478.8785, 1...","[1761.857143, 2470.625, 3440.0, 3909.5, 2761.7...","[4865.4151785, 3999.3839285, 3523.526785750000...","DatetimeIndex(['2020-06-15 16:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0025), tensor...",5,32
McdE0feGgRqW7Ca,925.873609,1439.746177,1004.310268,1800.219247,1239.378514,1988.950288,"[1445.0, 1288.714286, 566.125, 246.7142857, 26...","[1867.9869, 2110.1003, 1945.0974, 1701.4677, 1...","[3574.571429, 4224.25, 2812.857143, 2298.125, ...","[3484.3035714999996, 3034.1473215, 3287.075893...","DatetimeIndex(['2020-06-15 16:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1871), tensor...",5,32
VHMLBKoKgIrUVDU,912.50359,1459.272029,1015.591334,1770.16782,1231.175766,1951.187761,"[2803.428571, 2270.375, 1438.285714, 1204.0, 5...","[2702.4963, 1983.1427, 1904.2655, 2091.3477, 1...","[1790.714286, 2506.5, 3523.857143, 4033.625, 2...","[4979.37946425, 4064.16071425, 3631.76785725, ...","DatetimeIndex(['2020-06-15 16:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0414), tensor...",5,32
WRmjgnKYAwPKWDb,829.790033,1329.081374,988.256312,1738.859633,1171.738808,1909.568256,"[1215.428571, 532.0, 231.1428571, 22.375, 0.0,...","[1979.6503, 1759.1196, 1488.2908, 1083.1799, 8...","[3978.714286, 2803.625, 2289.285714, 1464.625,...","[2938.94196425, 3174.50446425, 3142.71875, 263...","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0741), tensor...",5,32
ZnxXDlPa8U1GXgE,880.725441,1421.028498,995.683262,1764.896396,1218.282299,1946.850597,"[2285.375, 1445.714286, 1224.5, 533.1428571, 2...","[1787.721, 1778.8735, 2055.036, 1991.6538, 178...","[2473.875, 3561.857143, 4042.25, 2788.571429, ...","[3867.36607125, 3537.7589285000004, 2962.60267...","DatetimeIndex(['2020-06-15 16:15:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0409), tenso...",5,32
ZoEaEvLYb1n2sOq,910.528472,1402.447866,986.947669,1758.96382,1215.307044,1946.203573,"[1441.0, 1251.75, 534.4285714, 239.25, 13.1428...","[1717.7953, 2036.2284, 1859.2618, 1619.6008, 1...","[3477.375, 4115.285714, 2754.5, 2280.285714, 1...","[3578.2991072500004, 2969.15625, 3200.71875, 3...","DatetimeIndex(['2020-06-15 16:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.1403), tenso...",5,32
adLQvlD726eNBSB,899.319959,1357.064722,1016.715947,1798.925054,1231.123339,1982.464182,"[1254.375, 556.5714286, 252.875, 26.71428571, ...","[1977.8412, 1825.26, 1598.4213, 1272.8907, 107...","[4043.0, 2893.571429, 2320.75, 1479.142857, 12...","[2999.3705357500003, 3260.263393, 3204.794643,...","DatetimeIndex(['2020-06-15 16:45:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.2349), tensor...",5,32


Unnamed: 0,LSTM_MAE_avg,LSTM_RMSE_avg,Pers_MAE_avg,Pers_RMSE_avg,MA_MAE_avg,MA_RMSE_avg
0,864.970093,1372.639468,988.410022,1755.310052,1207.116253,1935.174995


Unnamed: 0_level_0,LSTM_MAE,LSTM_RMSE,Pers_MAE,Pers_RMSE,MA_MAE,MA_RMSE,y_true,y_pred_lstm,persist,movavg,test_time,model_state_dict,input_dim,hidden_dim
inverter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
4UPUqMRk7TRMgml,94.643091,155.688072,92.521622,162.886294,98.686355,159.91432,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[-16.923157, -19.290262, -20.85445, -21.729763...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 23:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.1219), tenso...",5,32
81aHJ1q11NBPMrL,91.751213,140.674768,102.321885,175.151336,110.717829,180.813577,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[-23.74715, -18.613382, -11.683197, -3.5212896...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0279), tenso...",5,32
9kRcWv60rDACzjR,90.622565,142.952484,99.153648,171.737498,107.93968,175.638596,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[1.1479847, 7.243692, 12.593529, 17.735357, 22...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0438), tenso...",5,32
Et9kgGMDl729KT4,86.897708,125.144607,79.685379,139.275408,88.257116,142.982355,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[22.693035, 22.66443, 22.475153, 22.109083, 22...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 23:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0146), tensor...",5,32
IQ2d7wF4YD8zU1Q,111.817817,151.149106,112.729539,185.984408,115.838003,179.696196,"[228.7266666666667, 417.4428571428572, 498.2, ...","[102.268196, 129.75102, 174.16724, 217.46893, ...","[85.38571428571427, 120.46666666666668, 201.31...","[32.0397619047619, 62.15642857142858, 109.6897...","DatetimeIndex(['2020-06-15 06:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1623), tensor...",5,32
LYwnQax7tkwH5Cb,94.521176,147.747469,97.385961,164.706334,102.999524,169.578633,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[3.1927364, 2.4681954, 2.177074, 3.2724006, 5....","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1935), tensor...",5,32
LlT2YUhhzqhg5Sw,89.530922,142.959005,99.526374,171.126711,106.717133,174.645307,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[-24.637772, -22.347317, -19.760721, -15.95717...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0296), tenso...",5,32
Mx2yZCDsyf6DPfv,103.903015,150.716444,91.758785,161.892123,103.183557,167.488102,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[56.304577, 56.456455, 56.85681, 57.107357, 58...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 23:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0658), tensor...",5,32
NgDl19wMapZy17u,94.539142,139.170645,113.509516,187.876854,116.048788,180.241518,"[226.0533333333333, 414.8571428571428, 487.046...","[110.94437, 149.36398, 210.3939, 268.49426, 30...","[84.71428571428572, 120.15333333333336, 202.73...","[31.488571428571433, 61.526904761904774, 109.6...","DatetimeIndex(['2020-06-15 06:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0355), tenso...",5,32
PeE6FRyGXUgsRhN,90.818691,147.068826,95.345732,165.497412,107.093559,175.646368,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[17.683874, 18.168266, 17.554731, 16.777475, 1...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.2066), tensor...",5,32


Unnamed: 0,LSTM_MAE_avg,LSTM_RMSE_avg,Pers_MAE_avg,Pers_RMSE_avg,MA_MAE_avg,MA_RMSE_avg
0,94.085245,145.764343,98.225402,168.644316,106.306272,171.37499


Unnamed: 0_level_0,LSTM_MAE,LSTM_RMSE,Pers_MAE,Pers_RMSE,MA_MAE,MA_RMSE,y_true,y_pred_lstm,persist,movavg,test_time,model_state_dict,input_dim,hidden_dim
inverter,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1
4UPUqMRk7TRMgml,92.676774,155.943479,94.401805,166.199468,100.667181,163.07944,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[-18.829563, -22.078579, -24.141285, -25.60454...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 23:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0851), tenso...",5,32
81aHJ1q11NBPMrL,83.241402,141.445639,104.740585,179.37113,113.299947,185.057231,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[7.5046687, 9.220459, 11.534, 14.815381, 18.48...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0500), tenso...",5,32
9kRcWv60rDACzjR,89.86955,145.691287,101.473749,175.804003,110.439084,179.689674,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[3.3313193, 3.9855785, 3.3482904, 2.7695951, 2...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.0096), tensor...",5,32
Et9kgGMDl729KT4,92.232694,129.170022,81.268255,142.024392,90.005379,145.757185,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[28.354578, 27.213764, 26.872934, 26.804396, 2...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 23:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0613), tenso...",5,32
IQ2d7wF4YD8zU1Q,129.833025,174.277844,115.313786,190.344069,118.43855,183.691416,"[233.2333333333333, 425.5642857142857, 508.093...","[90.07911, 114.72976, 154.29149, 189.65671, 21...","[87.9642857142857, 123.85333333333335, 205.446...","[33.05273809523809, 64.01607142857142, 112.486...","DatetimeIndex(['2020-06-15 06:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1666), tensor...",5,32
LYwnQax7tkwH5Cb,93.158563,146.511315,99.642406,168.574145,105.356128,173.448917,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[17.471025, 16.814661, 16.187294, 16.51537, 17...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0337), tenso...",5,32
LlT2YUhhzqhg5Sw,83.371117,134.314465,101.853891,175.17277,109.187628,178.661494,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[7.5443425, 5.479482, 3.0839152, 1.8397022, 1....","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(0.1348), tensor...",5,32
Mx2yZCDsyf6DPfv,101.931972,152.350063,93.77062,165.398757,105.419234,171.032136,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[52.34214, 52.555435, 53.135227, 53.685364, 54...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 23:00:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.2536), tenso...",5,32
NgDl19wMapZy17u,126.901566,172.258621,116.120594,192.312707,118.663521,184.268994,"[230.56666666666672, 422.9428571428572, 496.60...","[88.47502, 113.8168, 154.18503, 192.5061, 217....","[87.25714285714285, 123.52, 206.86666666666665...","[32.48761904761905, 63.367619047619044, 112.38...","DatetimeIndex(['2020-06-15 06:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.1241), tenso...",5,32
PeE6FRyGXUgsRhN,98.428431,154.041194,97.487616,169.25609,109.482911,179.600352,"[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[-28.50724, -28.636576, -27.959658, -25.51688,...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","[0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, ...","DatetimeIndex(['2020-06-13 20:30:00', '2020-06...","{'lstm.weight_ih_l0': [[tensor(-0.0998), tenso...",5,32


Unnamed: 0,LSTM_MAE_avg,LSTM_RMSE_avg,Pers_MAE_avg,Pers_RMSE_avg,MA_MAE_avg,MA_RMSE_avg
0,99.078671,150.564232,100.432793,172.474896,108.662329,175.14957


In [None]:
# ============================================================
# SAVE FORECASTING RESULTS + TRAINED LSTM MODELS
# ============================================================

####################################################################################################################################################

# Change here 

save_folder = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model"

####################################################################################################################################################

os.makedirs(save_folder, exist_ok=True)

# ---------- 4.1 Save all results in one PKL ----------

save_path = os.path.join(save_folder, "LSTM Forecasting Model.pkl")

LSTM_results = {
    "Plant1_AC": {
        "per_inverter": results_p1_ac_df,
        "average": avg_p1_ac_df,
        "raw_outputs": raw_p1_ac,
    },
    "Plant1_DC": {
        "per_inverter": results_p1_dc_df,
        "average": avg_p1_dc_df,
        "raw_outputs": raw_p1_dc,
    },
    "Plant2_AC": {
        "per_inverter": results_p2_ac_df,
        "average": avg_p2_ac_df,
        "raw_outputs": raw_p2_ac,
    },
    "Plant2_DC": {
        "per_inverter": results_p2_dc_df,
        "average": avg_p2_dc_df,
        "raw_outputs": raw_p2_dc,
    },
}

with open(save_path, "wb") as f:
    pickle.dump(LSTM_results, f)

print(f"\n✅ LSTM forecasting results saved to:\n{save_path}")

# ---------- 4.2 Save each trained LSTM model as a .pt file ----------

model_dir = os.path.join(save_folder, "models")
os.makedirs(model_dir, exist_ok=True)

def save_model_group(results_list, plant_name, target_type):
    for entry in results_list:
        inv = entry["inverter"]
        state = entry["model_state_dict"]
        input_dim = entry["input_dim"]
        hidden_dim = entry["hidden_dim"]

        filename = f"{plant_name}_{target_type}_{inv}_LSTM.pt"
        file_path = os.path.join(model_dir, filename)

        torch.save(
            {
                "state_dict": state,
                "input_dim": input_dim,
                "hidden_dim": hidden_dim,
            },
            file_path,
        )

        print(f"📁 Saved model → {file_path}")

# Save Plant 1 models
save_model_group(raw_p1_ac, "Plant1", "AC")
save_model_group(raw_p1_dc, "Plant1", "DC")

# Save Plant 2 models
save_model_group(raw_p2_ac, "Plant2", "AC")
save_model_group(raw_p2_dc, "Plant2", "DC")

print("\n✅ All trained LSTM models saved successfully!")



✅ LSTM forecasting results saved to:
C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\LSTM Forecasting Model.pkl
📁 Saved model → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\models\Plant1_AC_1BY6WEcLGh8j5v7_LSTM.pt
📁 Saved model → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\models\Plant1_AC_1IF53ai7Xc0U56Y_LSTM.pt
📁 Saved model → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\models\Plant1_AC_3PZuoBAID5Wc2HD_LSTM.pt
📁 Saved model → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\models\Plant1_AC_7JYdWkrLSPkdwr4_LSTM.pt
📁 Saved model → C:\Users\B.KING\OneDrive - Imperial College London\

DESIGN JUSTIFICATION

Lookback Window

The lookback window determines how many hours of historical data the model uses when predicting future AC or DC power. We test multiple window lengths (6, 12, 18, and 24 hours) to evaluate their influence on model performance.
PV power exhibits strong daily periodicity and gradual weather-driven trends. A longer window provides richer temporal context—including sunrise/sunset patterns, temperature drift, and irradiance dynamics—allowing the LSTM to learn more stable relationships. While short windows capture short-term variations, 24 hour windows consistently lead to lower forecasting errors.

Reason:
Longer lookback windows capture diurnal patterns and slow weather transitions, improving model accuracy for both 1-hour and 3-hour forecasts.

Hidden Size

We use a single-layer LSTM with 32 hidden units. This choice balances model complexity, training efficiency, and overfitting avoidance.
PV power time series are relatively smooth, dominated by irradiance patterns rather than high-frequency noise. Thus a very large LSTM (e.g., 128+ units) is unnecessary and may overfit sensor noise or sudden irradiance spikes. A 32-unit layer provides enough representational capacity to capture nonlinear interactions among irradiance, temperature, and power output, while remaining computationally efficient.

Reason:
A hidden size of 32 provides sufficient learning capacity without overfitting and allows efficient training across many inverters.

Loss Function & Optimizer

We employ Mean Squared Error (MSE) as the loss function.MSE penalizes large errors more strongly, which is especially important in PV forecasting because large deviations (e.g., cloud-induced power drops) can severely affect plant scheduling, reserve management, and inverter operation. Compared with MAE, MSE is more sensitive to sudden irradiance changes, making it suitable for operational reliability.

Reason:
MSE emphasizes large deviations and aligns with engineering requirements for stable power forecasting.

The model is trained using the Adam optimizer, which adapts learning rates dynamically.

Reason:
Adam is the most efficient and robust choice for sequence models like LSTM.

In [None]:
def visualize_lstm_results(
    df_dict,
    results_df,
    all_results,
    target_type="DC",
    power_threshold=50.0,
    figsize=(16, 6),
    plant_name="Plant"
):
    """
    Visualize AND SAVE LSTM forecast results for each inverter.

    Parameters
    ----------
    df_dict : dict
        Dictionary of inverter dataframes (df_ps1 or df_ps2).
    results_df : DataFrame
        Per-inverter summary metrics returned by run_lstm_for_plant().
    all_results : list of dict
        Raw results list returned by run_lstm_for_plant().
    target_type : str
        "AC" or "DC".
    power_threshold : float
        Filter out low-power nighttime noise.
    figsize : tuple
        Figure size.
    plant_name : str
        "Plant1" or "Plant2".
    """

    # ------------------------
    # Create output folder
    # ------------------------

    ####################################################################################################################################################
    
    # Change here 

    save_dir = r"C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots"

    #####################################################################################################################################################
    
    os.makedirs(save_dir, exist_ok=True)

    print(f"\n==============================")
    print(f" Saving LSTM Plots for {plant_name} ({target_type})")
    print(f"==============================\n")

    for entry in all_results:
        inv = entry["inverter"]

        y_true = entry["y_true"]
        y_pred = entry["y_pred_lstm"]
        persist = entry["persist"]
        movavg = entry["movavg"]
        time_index = entry["test_time"]

        # Filter small values
        valid = y_true > power_threshold
        if valid.sum() < 5:
            print(f"Skipping {inv}: insufficient high-power samples.")
            continue

        t = time_index[valid]
        yt = y_true[valid]
        yp = y_pred[valid]
        yp_persist = persist[valid]
        yp_ma = movavg[valid]

        # ------------------------------
        # Plot 1: True vs Predicted
        # ------------------------------
        plt.figure(figsize=figsize)
        plt.plot(t, yt, label="True Power", linewidth=2)
        plt.plot(t, yp, label="LSTM Forecast", alpha=0.85)
        plt.plot(t, yp_persist, label="Persistence", linestyle="--", alpha=0.7)
        plt.plot(t, yp_ma, label="Moving Avg", linestyle="--", alpha=0.7)

        plt.title(f"{plant_name} – {inv} – {target_type} Forecast")
        plt.xlabel("Time")
        plt.ylabel(f"{target_type} Power")
        plt.legend()
        plt.grid(True)
        plt.tight_layout()

        # Save forecast plot
        fpath1 = os.path.join(save_dir, f"{plant_name}_{target_type}_{inv}_forecast.png")
        plt.savefig(fpath1, dpi=200)
        plt.close()

        # ------------------------------
        # Plot 2: Error plot
        # ------------------------------
        errors = yt - yp

        plt.figure(figsize=figsize)
        plt.plot(t, errors, label="LSTM Error", linewidth=1.5)
        plt.axhline(0, color="black", linewidth=1)
        plt.title(f"{plant_name} – {inv} – {target_type} Error")
        plt.xlabel("Time")
        plt.ylabel("Error (True - Predicted)")
        plt.grid(True)
        plt.tight_layout()

        # Save error plot
        fpath2 = os.path.join(save_dir, f"{plant_name}_{target_type}_{inv}_error.png")
        plt.savefig(fpath2, dpi=200)
        plt.close()

        print(f"📁 Saved plots for {inv}:")
        print(f"   → {fpath1}")
        print(f"   → {fpath2}")

        # ------------------------------
        # Print Metrics
        # ------------------------------
        print(f"\n----- Metrics for {inv} ({target_type}) -----")
        print(results_df.loc[inv][[
            "LSTM_MAE", "LSTM_RMSE",
            "Pers_MAE", "Pers_RMSE",
            "MA_MAE", "MA_RMSE"
        ]])
        print("\n----------------------------------------------\n")


### Plant 1 - DC

In [33]:
visualize_lstm_results(
    df_ps1,
    results_p1_dc_df,
    raw_p1_dc,
    "DC",
    50,
    plant_name="Plant1"
)


 Saving LSTM Plots for Plant1 (DC)

📁 Saved plots for 1BY6WEcLGh8j5v7:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant1_DC_1BY6WEcLGh8j5v7_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant1_DC_1BY6WEcLGh8j5v7_error.png

----- Metrics for 1BY6WEcLGh8j5v7 (DC) -----
LSTM_MAE       738.29951
LSTM_RMSE    1167.665747
Pers_MAE      926.993588
Pers_RMSE    1623.764202
MA_MAE       1132.338981
MA_RMSE      1789.851329
Name: 1BY6WEcLGh8j5v7, dtype: object

----------------------------------------------

📁 Saved plots for 1IF53ai7Xc0U56Y:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant1_DC_1IF53ai7Xc0U56Y_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learni

### Plant 1 - AC

In [34]:
visualize_lstm_results(
    df_ps1,
    results_p1_ac_df,
    raw_p1_ac,
    "AC",
    50,
    plant_name="Plant1"
)



 Saving LSTM Plots for Plant1 (AC)

📁 Saved plots for 1BY6WEcLGh8j5v7:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant1_AC_1BY6WEcLGh8j5v7_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant1_AC_1BY6WEcLGh8j5v7_error.png

----- Metrics for 1BY6WEcLGh8j5v7 (AC) -----
LSTM_MAE      94.929348
LSTM_RMSE    131.975305
Pers_MAE      90.620942
Pers_RMSE    158.669991
MA_MAE       110.688427
MA_RMSE      174.945425
Name: 1BY6WEcLGh8j5v7, dtype: object

----------------------------------------------

📁 Saved plots for 1IF53ai7Xc0U56Y:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant1_AC_1IF53ai7Xc0U56Y_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\Cou

### Plant 2 – DC

In [35]:
visualize_lstm_results(
    df_ps2,
    results_p2_ac_df,
    raw_p2_ac,
    "AC",
    50,
    plant_name="Plant2"
)



 Saving LSTM Plots for Plant2 (AC)

📁 Saved plots for 4UPUqMRk7TRMgml:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant2_AC_4UPUqMRk7TRMgml_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant2_AC_4UPUqMRk7TRMgml_error.png

----- Metrics for 4UPUqMRk7TRMgml (AC) -----
LSTM_MAE      94.643091
LSTM_RMSE    155.688072
Pers_MAE      92.521622
Pers_RMSE    162.886294
MA_MAE        98.686355
MA_RMSE       159.91432
Name: 4UPUqMRk7TRMgml, dtype: object

----------------------------------------------

📁 Saved plots for 81aHJ1q11NBPMrL:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant2_AC_81aHJ1q11NBPMrL_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\Cou

### Plant 2 – AC

In [36]:
visualize_lstm_results(
    df_ps2,
    results_p2_ac_df,
    raw_p2_ac,
    "AC",
    50,
    plant_name="Plant2"
)



 Saving LSTM Plots for Plant2 (AC)

📁 Saved plots for 4UPUqMRk7TRMgml:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant2_AC_4UPUqMRk7TRMgml_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant2_AC_4UPUqMRk7TRMgml_error.png

----- Metrics for 4UPUqMRk7TRMgml (AC) -----
LSTM_MAE      94.643091
LSTM_RMSE    155.688072
Pers_MAE      92.521622
Pers_RMSE    162.886294
MA_MAE        98.686355
MA_RMSE       159.91432
Name: 4UPUqMRk7TRMgml, dtype: object

----------------------------------------------

📁 Saved plots for 81aHJ1q11NBPMrL:
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\CouseWork\Group-11\data\06 LSTM Forecasting Model\Plots\Plant2_AC_81aHJ1q11NBPMrL_forecast.png
   → C:\Users\B.KING\OneDrive - Imperial College London\CIVE70111 Machine Learning\Cou