# ITS8080 Project — Task-by-task Notes with Focused Java Snippets (BeakerX Java)

This notebook targets the **BeakerX Java kernel** for Jupyter Notebook.

It documents *what was done* **task-by-task**, and includes **small, important Java implementation snippets** (not full classes) as evidence.


In [None]:
%%bash
# Repo health check (macOS/Linux). On Windows use gradlew.bat
# ./gradlew --version
# ./gradlew clean build


## Task 1 — Dataset understanding (report)

- Target: **Demand**
- Inputs: PV, price, weather, and time features
- Hourly time series, multi-year data
- Goal: forecasting + EMS optimisation


## Task 2 — Data science lifecycle plan (report)

Pipeline: data → cleaning → feature engineering → modelling → validation → forecasting → optimisation.

Most effort went into PV missing-data handling, feature engineering, and walk-forward evaluation.


## Task 3 — Visualisation & summaries

### Time series plot with dual axis (Price on right axis)


In [None]:
// XChart: Demand & PV on left axis, Price on right axis
var chart = new XYChartBuilder()
    .width(1000).height(600)
    .title("Prices, Demand and PV")
    .xAxisTitle("Time").yAxisTitle("Power (kW)")
    .build();

chart.addSeries("Demand (kW)", times, demand);
chart.addSeries("PV (kW)", times, pv);

chart.addSeries("Price (EUR/kWh)", times, price)
     .setYAxisGroup(1);

chart.getStyler().setYAxisGroupPosition(1, org.knowm.xchart.style.Styler.YAxisPosition.Right);
ChartExport.saveSvg(chart, OutputConfig.defaults(java.nio.file.Path.of("figures")), "task3_timeseries");


### Histogram for price distribution


In [None]:
var prices = rows.stream()
    .map(DataRow::price)
    .filter(p -> p != null && !Double.isNaN(p))  // skip missing prices (null or NaN)
    .toList();

var bins = HistogramUtils.computeBins(prices, HistogramMethods.STURGES);
var hist = new org.knowm.xchart.Histogram(prices, bins);

var h = new org.knowm.xchart.CategoryChartBuilder()
    .width(900).height(600)
    .title("Price distribution")
    .xAxisTitle("EUR/kWh").yAxisTitle("Count")
    .build();

h.addSeries("Price", hist.getxAxisData(), hist.getyAxisData());
ChartExport.savePng(h, OutputConfig.defaults(java.nio.file.Path.of("figures")), "task3_price_hist");
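
The bin count above comes from the project helper `HistogramUtils.computeBins`, whose internals are not shown in these notes. As a minimal sketch of what Sturges' rule computes, assuming the usual definition k = ceil(log2(n)) + 1, the class and method names below are hypothetical and not part of the project:

```java
// Hypothetical stand-in for the Sturges option of the project's bin helper.
final class SturgesSketch {
    /** Sturges' rule: k = ceil(log2(n)) + 1 bins for n observations. */
    static int sturgesBins(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // ceil(log2(n)) via integer bit arithmetic, which avoids floating-point rounding.
        int ceilLog2 = 32 - Integer.numberOfLeadingZeros(n - 1);
        return ceilLog2 + 1;
    }
}
```

For one year of hourly prices (8760 values) this gives 15 bins. Sturges' rule grows slowly with n, so it tends to produce coarse histograms on long series.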


## Task 4 — PV missing data & imputation

### Missingness counts + daytime hint using shortwave radiation


In [None]:
long missingPv1 = rows.stream()
    .filter(r -> r.pvMod1() == null || Double.isNaN(r.pvMod1()))
    .count();

long missingAtDay = rows.stream()
    .filter(r -> r.pvMod1() == null || Double.isNaN(r.pvMod1()))
    .filter(r -> r.shortwaveRadiation() != null && r.shortwaveRadiation() > 5.0)
    .count();

System.out.println("pv_mod1 missing=" + missingPv1);
System.out.println("pv_mod1 missing with radiation>5=" + missingAtDay);


### Null-safe multivariate features (fix for NPE)


In [None]:
Double radObj = row.shortwaveRadiation();
Double cloudObj = row.cloudCover();

if (radObj == null || cloudObj == null) continue;

double rad = radObj;
double cloud = cloudObj;


## Task 5 — Feature engineering

### Jarque–Bera normality test → transformation decision


In [None]:
double[] demand = FeatureEngineering.extract(rows, DataRow::demand);
var jb = Normality.jarqueBera(demand, 0.05);

boolean useLog = !jb.isNormal();
double[] used = useLog ? FeatureEngineering.log1pEps(demand, 1e-6) : demand;
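
`Normality.jarqueBera` is a project helper and its implementation is not reproduced here. The sketch below shows the textbook Jarque–Bera statistic under the assumption that the helper follows it: JB = n/6 · (S² + (K − 3)²/4), with sample skewness S and kurtosis K, compared against a chi-square distribution with 2 degrees of freedom. The class name and method signatures are hypothetical.

```java
// Hypothetical sketch of a Jarque-Bera test; not the project's Normality class.
final class JarqueBeraSketch {
    /** JB = n/6 * (S^2 + (K - 3)^2 / 4) using biased (divide-by-n) central moments. */
    static double statistic(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) mean += v;
        mean /= n;

        double m2 = 0.0, m3 = 0.0, m4 = 0.0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
            m4 += d * d * d * d;
        }
        m2 /= n; m3 /= n; m4 /= n;
        if (m2 == 0.0) {
            throw new IllegalArgumentException("constant series has no defined skewness");
        }
        double skew = m3 / Math.pow(m2, 1.5);
        double kurt = m4 / (m2 * m2);
        return n / 6.0 * (skew * skew + (kurt - 3.0) * (kurt - 3.0) / 4.0);
    }

    /** For 2 degrees of freedom the chi-square survival function is exp(-x / 2). */
    static double pValue(double[] x) {
        return Math.exp(-statistic(x) / 2.0);
    }

    /** True when normality is not rejected at significance level alpha. */
    static boolean isNormal(double[] x, double alpha) {
        return pValue(x) >= alpha;
    }
}
```

The chi-square approximation is asymptotic, so the result is more trustworthy on a multi-year hourly series than on short samples.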


### IQR outlier mask → filter rows consistently


In [None]:
boolean[] mask = OutlierCleaner.iqrMask(used);

var filtered = new java.util.ArrayList<DataRow>();
for (int i = 0; i < rows.size(); i++) {
  if (mask[i]) filtered.add(rows.get(i));
}
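
The mask itself comes from `OutlierCleaner.iqrMask`, which is not shown. A minimal sketch of a Tukey-fence mask, assuming the common 1.5 × IQR rule and linearly interpolated quartiles (both are assumptions, and the class name is hypothetical), where `true` means "keep the row" as in the loop above:

```java
import java.util.Arrays;

// Hypothetical sketch of an IQR keep-mask; the project's OutlierCleaner may differ.
final class IqrMaskSketch {
    /** Linearly interpolated quantile of an ascending-sorted, non-empty array. */
    static double quantile(double[] sorted, double q) {
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /** keep[i] is true when x[i] lies inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Assumes no NaN in x. */
    static boolean[] iqrMask(double[] x) {
        boolean[] keep = new boolean[x.length];
        if (x.length == 0) return keep;

        double[] sorted = x.clone();   // sort a copy so row order is preserved
        Arrays.sort(sorted);
        double q1 = quantile(sorted, 0.25);
        double q3 = quantile(sorted, 0.75);
        double iqr = q3 - q1;
        double lower = q1 - 1.5 * iqr;
        double upper = q3 + 1.5 * iqr;

        for (int i = 0; i < x.length; i++) {
            keep[i] = x[i] >= lower && x[i] <= upper;
        }
        return keep;
    }
}
```

Because the mask is index-aligned with the input, it can be applied to the original rows even when it was computed on the transformed values.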


### Feature ranking by |Pearson r|


In [None]:
var dataset = FeatureEngineering.buildDataset(filtered, Task5Config.defaults(), false);
var ranking = FeatureRanking.rankByAbsPearson(dataset);
FeatureRanking.printTSV(ranking);
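
`FeatureRanking.rankByAbsPearson` is a project helper. The sketch below illustrates the idea under stated assumptions: features arrive as a name-to-column map, the target is a plain array, and a constant column (undefined correlation) is scored as 0. The class, signatures, and map layout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; the project's FeatureRanking works on its own dataset type.
final class PearsonRankingSketch {
    /** Sample Pearson correlation; returns NaN if either series is constant. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length series with at least 2 points");
        }
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n; my /= n;

        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy; sxx += dx * dx; syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Feature names ordered by |r| against the target, strongest first. */
    static List<String> rankByAbsPearson(Map<String, double[]> features, double[] target) {
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, double[]> e : features.entrySet()) {
            double r = pearson(e.getValue(), target);
            score.put(e.getKey(), Double.isNaN(r) ? 0.0 : Math.abs(r));
        }
        List<String> names = new ArrayList<>(features.keySet());
        names.sort(Comparator.comparingDouble((String name) -> score.get(name)).reversed());
        return names;
    }
}
```

Ranking by |r| only captures linear association, so a feature with a strong non-linear effect (hour of day, for example) can rank lower than its usefulness in a tree model would suggest.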


## Task 6 — Classical decomposition (additive)

Key idea:
- trend via moving average
- seasonal pattern via average detrended values per hour-of-day
- residual = y - trend - seasonality


In [None]:
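// Trend: moving average around hour t
double Tt = movingAverage(y, t, window);
// Detrended value; these are averaged per hour-of-day to build seasonalMean
double dt = y[t] - Tt;

int hour = timestamps[t].getHour();
double St = seasonalMean[hour];

// Residual: what remains after removing trend and seasonality
double Rt = y[t] - Tt - St;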
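
The three steps listed under "Key idea" can be sketched end to end as follows. This is an illustration under assumptions, not the project's code: it uses a centred moving average (half-weighted end points when the window is even, as in the classical 2×m average), takes hour-of-day as a plain `int[]` instead of timestamps, and centres the seasonal means so they average to zero. All names are hypothetical.

```java
// Hypothetical sketch of classical additive decomposition for hourly data.
final class AdditiveDecompositionSketch {
    /** Centred moving average at index t; NaN where the window does not fit. */
    static double centredMovingAverage(double[] y, int t, int window) {
        int half = window / 2;
        if (t - half < 0 || t + half >= y.length) return Double.NaN;
        double sum = 0.0;
        for (int k = -half; k <= half; k++) {
            // For an even window the two end points get half weight each.
            double w = (window % 2 == 0 && Math.abs(k) == half) ? 0.5 : 1.0;
            sum += w * y[t + k];
        }
        return sum / window;
    }

    /** Returns {trend, seasonal, residual}; trend and residual are NaN at the edges. */
    static double[][] decompose(double[] y, int[] hourOfDay, int window) {
        int n = y.length;
        double[] trend = new double[n], seasonal = new double[n], residual = new double[n];
        double[] sum = new double[24];
        int[] count = new int[24];

        for (int t = 0; t < n; t++) {
            trend[t] = centredMovingAverage(y, t, window);
            if (!Double.isNaN(trend[t])) {
                sum[hourOfDay[t]] += y[t] - trend[t];   // detrended value
                count[hourOfDay[t]]++;
            }
        }

        double[] seasonalMean = new double[24];
        double grand = 0.0;
        for (int h = 0; h < 24; h++) {
            seasonalMean[h] = count[h] == 0 ? 0.0 : sum[h] / count[h];
            grand += seasonalMean[h] / 24.0;
        }

        for (int t = 0; t < n; t++) {
            seasonal[t] = seasonalMean[hourOfDay[t]] - grand;   // centred seasonal component
            residual[t] = y[t] - trend[t] - seasonal[t];        // NaN propagates from trend
        }
        return new double[][] {trend, seasonal, residual};
    }
}
```

With `window = 24` the trend averages out the daily cycle; the first and last 12 hours have no trend estimate and should be excluded from residual diagnostics.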


## Task 7 — Statistical modelling

### Differencing for stationarity


In [None]:
double[] diff = new double[y.length - 1];
for (int i = 1; i < y.length; i++) diff[i - 1] = y[i] - y[i - 1];
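
If a model is fitted on the differenced series, its forecasts are changes rather than levels and need to be integrated back before they are compared with actual demand. A minimal sketch of that inverse step, with hypothetical names and assuming first-order differencing as above:

```java
// Hypothetical sketch: first-order differencing and its inverse.
final class DifferencingSketch {
    /** diff[i - 1] = y[i] - y[i - 1], as in the snippet above. */
    static double[] difference(double[] y) {
        double[] diff = new double[Math.max(0, y.length - 1)];
        for (int i = 1; i < y.length; i++) diff[i - 1] = y[i] - y[i - 1];
        return diff;
    }

    /** Rebuilds levels by cumulatively adding forecast changes to the last observed level. */
    static double[] undifference(double lastLevel, double[] diffForecast) {
        double[] level = new double[diffForecast.length];
        double running = lastLevel;
        for (int i = 0; i < diffForecast.length; i++) {
            running += diffForecast[i];
            level[i] = running;
        }
        return level;
    }
}
```

Errors in the forecast changes accumulate through the running sum, which is one reason multi-step error tends to grow with the horizon.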


### Walk-forward validation (last week, daily folds)


In [None]:
for (int d = 0; d < 7; d++) {
  int start = lastWeekStart + d * 24;

  var model = ArModel.fit(diff, 24, start);
  double[] pred = model.forecast24(diff, start);

  double nrmse = Metrics.nrmse(actual, pred, start, start + 24);
  perDay.add(nrmse);
}
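
`Metrics.nrmse` is a project helper and its normalisation is not stated in these notes. The sketch below assumes RMSE divided by the range (max minus min) of the actual values in the fold; dividing by the mean is an equally common convention. It also assumes `pred[0]` lines up with `actual[from]`. Names and signature are hypothetical.

```java
// Hypothetical sketch of a range-normalised RMSE over one fold.
final class NrmseSketch {
    /**
     * NRMSE over actual[from, to) against pred[0, to - from).
     * Returns NaN when the actual values in the fold are constant.
     */
    static double nrmse(double[] actual, double[] pred, int from, int to) {
        int n = to - from;
        if (n <= 0 || pred.length < n || to > actual.length || from < 0) {
            throw new IllegalArgumentException("fold indices do not match the arrays");
        }
        double sse = 0.0;
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < n; i++) {
            double a = actual[from + i];
            double e = a - pred[i];
            sse += e * e;
            min = Math.min(min, a);
            max = Math.max(max, a);
        }
        double rmse = Math.sqrt(sse / n);
        double range = max - min;
        return range == 0.0 ? Double.NaN : rmse / range;
    }
}
```

Normalising per fold makes daily scores comparable across days with different demand levels, at the cost of inflating the score on unusually flat days.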


## Task 8 — XGBoost

### Lag features (null-safe Double→float)


In [None]:
for (int lag : lags) {
  Double v = history.get(n - lag);
  x[c++] = (v == null || Double.isNaN(v)) ? Float.NaN : v.floatValue();
}


## Task 9 — Rolling-origin forecast (7 days)

### Timestamp parsing fix for 'yyyy-MM-dd HH:mm:ss+00:00'


In [None]:
var fmt = java.time.format.DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssXXXXX");
var ts = java.time.OffsetDateTime.parse(text, fmt);
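
The no-argument `OffsetDateTime.parse(text)` expects ISO-8601 with a `T` between date and time, so a space-separated timestamp needs an explicit pattern. The sketch below wraps the formatter from the cell above in a small class and parses two made-up sample strings; the class name and the sample values are illustrative only.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical wrapper around the formatter used above.
final class TimestampParseSketch {
    static final DateTimeFormatter FMT =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssXXXXX");

    static OffsetDateTime parse(String text) {
        return OffsetDateTime.parse(text, FMT);
    }

    public static void main(String[] args) {
        OffsetDateTime utc = parse("2021-06-15 13:00:00+00:00");      // made-up sample
        OffsetDateTime shifted = parse("2021-06-15 13:00:00+02:00");  // made-up sample
        System.out.println(utc.getHour() + " " + utc.getOffset());
        System.out.println(shifted.toInstant());
    }
}
```

Keeping the offset in the parsed value (rather than dropping to `LocalDateTime`) avoids ambiguous hours if the data is later converted to a local time zone with daylight saving.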


## Task 10 — Exogenous inputs

### Join lag features + time/weather features


In [None]:
x[c++] = (float) hour;
x[c++] = weekend ? 1.0f : 0.0f;
x[c++] = temp == null ? Float.NaN : temp.floatValue();
x[c++] = cloud == null ? Float.NaN : cloud.floatValue();
x[c++] = rad == null ? Float.NaN : rad.floatValue();


## Task 11 — Battery optimisation (MILP)

### Run optimisation twice: PV_low vs PV_high


In [None]:
var resLow  = Task11BatteryOptimizer.optimise(opt24, demand24, battery, 1.0, 5.0, OptimisationRow::pvLow);
var resHigh = Task11BatteryOptimizer.optimise(opt24, demand24, battery, 1.0, 5.0, OptimisationRow::pvHigh);


### MILP exclusivity (binary per hour) — idea


In [None]:
// gridImport_t ≤ M * z_t
// gridExport_t ≤ M * (1 - z_t)
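
The two big-M constraints above let the binary `z_t` switch one direction off: with `z_t = 1` export is forced to 0, with `z_t = 0` import is forced to 0. The sketch below is not a solver; it is a hypothetical feasibility check that makes this behaviour concrete for one hour, assuming non-negative flows and an `M` at least as large as any possible flow.

```java
// Hypothetical feasibility check for the import/export exclusivity constraints.
final class ExclusivitySketch {
    /**
     * True when (gridImport, gridExport, z) satisfies
     *   0 <= gridImport <= M * z   and   0 <= gridExport <= M * (1 - z).
     */
    static boolean satisfies(double gridImport, double gridExport, int z, double bigM) {
        if (z != 0 && z != 1) {
            throw new IllegalArgumentException("z must be binary");
        }
        return gridImport >= 0.0
            && gridExport >= 0.0
            && gridImport <= bigM * z
            && gridExport <= bigM * (1 - z);
    }
}
```

Choosing `M` close to the physical connection limit, rather than an arbitrarily large constant, usually keeps the MILP relaxation tighter.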


### Cost baselines used for plots


In [None]:
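// Per-hour values: price in EUR/kWh, demand and pvForecast in kW over a 1 h step
double costNoPvNoBatt = price * demand;

// Without a battery, only the demand not covered by PV is imported
double gridImportNoBatt = Math.max(0.0, demand - pvForecast);
double costPvOnly = price * gridImportNoBatt;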
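
Summed over a horizon, the two per-hour baselines give the totals that an optimised schedule is compared against. A minimal sketch, assuming hourly steps (so kW and kWh per step coincide), no revenue for surplus PV, and equal-length input arrays; the class and method names are hypothetical:

```java
// Hypothetical sketch: total baseline costs over a horizon of hourly steps.
final class CostBaselineSketch {
    /** Cost of covering all demand from the grid (no PV, no battery). */
    static double costNoPvNoBattery(double[] price, double[] demand) {
        double total = 0.0;
        for (int t = 0; t < price.length; t++) {
            total += price[t] * demand[t];
        }
        return total;
    }

    /** Cost when PV offsets demand first; surplus PV earns nothing in this baseline. */
    static double costPvOnly(double[] price, double[] demand, double[] pv) {
        double total = 0.0;
        for (int t = 0; t < price.length; t++) {
            double gridImport = Math.max(0.0, demand[t] - pv[t]);
            total += price[t] * gridImport;
        }
        return total;
    }
}
```

The difference between the two totals is the saving attributable to PV alone; any further reduction in the optimised result is then attributable to the battery.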


## Appendix — Quantiles
`q` is a quantile level: 0.50=median, 0.95=95th percentile threshold.


In [None]:
// vals must be non-empty and sorted ascending, with 0 <= q <= 1
int idx = (int) Math.floor(q * (vals.length - 1));
double threshold = vals[idx];
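
The index formula only yields a quantile when the values are in ascending order. A small hypothetical helper that sorts a copy first and validates its inputs, using the same floor-based (lower, non-interpolated) rule as the cell above:

```java
import java.util.Arrays;

// Hypothetical helper; uses the same floor(q * (n - 1)) index as above.
final class QuantileSketch {
    /** Lower empirical quantile of vals at level q in [0, 1]; the input is not modified. */
    static double quantile(double[] vals, double q) {
        if (vals.length == 0) {
            throw new IllegalArgumentException("vals must not be empty");
        }
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        double[] sorted = vals.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.floor(q * (sorted.length - 1));
        return sorted[idx];
    }
}
```

Because the rule takes the lower neighbour instead of interpolating, the returned threshold is always an observed value.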
