# Interview Technical Exercise

Prepared for Met Office interview.

This notebook is the presentation; the reusable implementation lives in `min_temp/`.


**Present as slides:**
- Export/serve reveal.js: `jupyter nbconvert --to slides presentation.ipynb --post serve`

## Problem & Goal

- A forecaster wants to test a historic method for estimating overnight minimum temperature (Tmin) against "latest modelling".
- Deliverable: a validated, reproducible way to compute predictions and (when targets are available) quantify accuracy.

Reference-book method:

$$T_{\min} = 0.316\,T_{12} + 0.548\,T_{d12} - 1.24 + K(\text{wind},\text{cloud cover})$$


## Definition of Done (My Interpretation)
Create a tool kit that can:
- Parse $T_{12}$, $T_{d12}$, `Wind Speed` and `Cloud Cover` as inputs and compute $T_{min}$ as an output in a reliable and deterministic way with clear failure modes.
- Handle batch computation of multiple rows of input data easily, with helpful parsers for convenience.
- When $T_{min}$ targets exist, evaluate with MAE/RMSE/bias plus simple plots to enable easy comparison.
- Adapt to (and remain valid in) different contexts (alternative methods/parameters, different locations).
- Be verified by automated checks: unit tests for parsing, K lookup and predictions.

### Why?
- Unsure how "hands on" the forecaster wants to be. Do they want to use a tool kit to code and perform their own investigation? Or would they rather interact with a simple UI?
- My solution is for the former, but could be the basis for a UI to be built on top of.
- Have chosen to build this with Python so that I can make use of data/science libraries such as pandas and NumPy.


## Prediction Contract

- Defined a minimal predictor contract (an abstract base class) that every predictor in the tool kit should follow.
- Any predictor satisfying this contract can plug into the rest of the toolkit.
- Allows the tool kit to be easily extended to alternative historic methods for calculating overnight minimum temperature, e.g. McKenzie's method: $T_{\min} = 0.5(T_{\max} + T_{d}) + \hat{K}(\text{surface wind},\text{cloud cover})$
- I don't yet know which alternative methods will be in scope or exactly what their inputs will look like, so the contract is intentionally small and generic.
- `feature_keys` enforces explicit input dependencies alongside units to help prevent accidental misuse.


```python
class MinTempPredictor(ABC):
    """Abstract base for Tmin predictors that operate on a single row/Series."""

    name: str
    feature_keys: Tuple[FeatureKey, ...] # dataclass (name, unit)

    @abstractmethod
    def predict(self, row: Mapping[str, float] | pd.Series) -> float:
        """Compute Tmin for a single row (mapping or Series)."""
```
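
As a rough illustration of how a second method could plug into this contract, here is a minimal sketch of a McKenzie-style predictor. `FeatureKey` and `MinTempPredictor` are re-declared so the block is self-contained. The class name `McKenzieSketch`, the feature names (`max_temp_c`, `dew_point_c`, `surface_wind_kn`) and the `k_hat` callable are all hypothetical, and the real $\hat{K}$ table is not reproduced here.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, Mapping, Tuple


@dataclass(frozen=True)
class FeatureKey:
    """Stand-in for the toolkit's (name, unit) pair."""
    name: str
    unit: str


class MinTempPredictor(ABC):
    """Same shape as the contract above, re-declared for self-containment."""
    name: str
    feature_keys: Tuple[FeatureKey, ...]

    @abstractmethod
    def predict(self, row: Mapping[str, float]) -> float:
        """Compute Tmin for a single row."""


@dataclass(frozen=True)
class McKenzieSketch(MinTempPredictor):
    # Hypothetical callable: (surface wind in kn, cloud in oktas) -> correction in deg C.
    k_hat: Callable[[float, float], float]
    name: str = "mckenzie_sketch"
    feature_keys: Tuple[FeatureKey, ...] = (
        FeatureKey("max_temp_c", "deg C"),
        FeatureKey("dew_point_c", "deg C"),
        FeatureKey("surface_wind_kn", "kn"),
        FeatureKey("cloud_oktas", "oktas"),
    )

    def predict(self, row: Mapping[str, float]) -> float:
        # Fail loudly if a declared feature is absent.
        missing = [key.name for key in self.feature_keys if key.name not in row]
        if missing:
            raise KeyError(f"missing features: {missing}")
        linear = 0.5 * (row["max_temp_c"] + row["dew_point_c"])
        return linear + self.k_hat(row["surface_wind_kn"], row["cloud_oktas"])


# Usage with a placeholder correction of -1.0 deg C for every input.
model = McKenzieSketch(k_hat=lambda wind, cloud: -1.0)
prediction = model.predict(
    {"max_temp_c": 20.0, "dew_point_c": 10.0, "surface_wind_kn": 5.0, "cloud_oktas": 1.0}
)
```

Because the evaluation and plotting helpers only depend on `predict`, a class shaped like this should be usable alongside the Craddock & Pritchard model without further changes.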


## Craddock & Pritchard Predictor

Implements the reference-book formula:

$$T_{\min} = 0.316\,T_{12} + 0.548\,T_{d12} - 1.24 + K(\text{wind},\text{cloud cover})$$

- It is important that the expected quantities and units are stated clearly and are correct.
- $T_{min}$, $T_{12}$, $T_{d12}$ have example data given in `deg C`. It's not explicitly stated that the formula expects this, but I've assumed it does.
- What is wind speed referring to? What height? Surface wind? Geostrophic wind? Consultation of the full "Forecaster's Reference Book" revealed it is geostrophic wind.
- This ought to be verified with an SME; until then, the assumptions are clearly documented.


The linear part is straightforward. The K lookup needs a bit more thought.

### Given K table (ambiguous ranges)

| Wind (knots) \ Cloud cover (oktas) | 0-2 | 2-4 | 4-6 | 6-8 |
| --- | --- | --- | --- | --- |
| 0-12 | -2.2 | -1.7 | -0.6 | 0.0 |
| 13-25 | -1.1 | 0.0 | 0.6 | 1.1 |
| 26-38 | -0.6 | 0.0 | 0.6 | 1.1 |
| 39-51 | 1.1 | 1.7 | 2.8 | Unknown |

Ambiguities: 
- **Boundary ownership:** which bin owns a boundary value? E.g. Which column do you search when Cloud Cover = 2?
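- In practice, this is more of a problem for cloud cover than for wind speed: adjacent cloud bins share an endpoint (0-2, 2-4), whereas the wind bins only leave gaps between integers (e.g. between 12 and 13).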
- **Missing cells:** how should a not-provided cell be handled (e.g. 39–51 kn and 6–8 oktas)?
- **Out-of-range inputs:** how to handle beyond table range?

Consultation of the full "Forecaster's Reference Book" provided no clarity on what to do at cell boundaries.


### Deterministic interpretation (right-closed bins)

- Treat each edge as an upper bound (right-closed). Bin = first edge `>=` value.
- Wind edges: 12.5, 25.5, 38.5, 51. Cloud edges: 2, 4, 6, 8.

| Wind edge (knots) \ Cloud cover edge (oktas) | 2 | 4 | 6 | 8 |
| --- | --- | --- | --- | --- |
| 12.5 | -2.2 | -1.7 | -0.6 | 0.0 |
| 25.5 | -1.1 | 0.0 | 0.6 | 1.1 |
| 38.5 | -0.6 | 0.0 | 0.6 | 1.1 |
| 51 | 1.1 | 1.7 | 2.8 | Unknown |

- This removes boundary ambiguity. Left-closed would also work, but consistency matters more than the choice.
- A forecaster/SME should be consulted on the right-closed vs left-closed convention.
- "Undefined" and "out of range" cases are made to fail explicitly rather than silently guess.
- Implemented via `KTable` - a pandas DataFrame with validation and useful helpers.
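
The right-closed rule above ("first edge `>=` value") maps directly onto a binary search. The following is a minimal standard-library sketch, not the `KTable` implementation: the names `bin_index` and `lookup_k` are hypothetical, and treating negative inputs as out of range is an assumption.

```python
import math
from bisect import bisect_left

WIND_EDGES = [12.5, 25.5, 38.5, 51.0]  # knots, bin upper bounds
CLOUD_EDGES = [2.0, 4.0, 6.0, 8.0]     # oktas, bin upper bounds
K_VALUES = [
    [-2.2, -1.7, -0.6, 0.0],
    [-1.1, 0.0, 0.6, 1.1],
    [-0.6, 0.0, 0.6, 1.1],
    [1.1, 1.7, 2.8, math.nan],  # NaN marks the "Unknown" cell
]


def bin_index(value, edges, name):
    """Return the index of the first edge >= value (right-closed bins)."""
    # Assumption: negative, NaN and above-table values are all rejected.
    if math.isnan(value) or value < 0 or value > edges[-1]:
        raise ValueError(f"{name}={value} is outside the table range [0, {edges[-1]}]")
    # bisect_left returns i such that every edge before i is strictly < value.
    return bisect_left(edges, value)


def lookup_k(wind_kn, cloud_oktas):
    """Look up K, failing explicitly for out-of-range inputs and undefined cells."""
    row = bin_index(wind_kn, WIND_EDGES, "wind_kn")
    col = bin_index(cloud_oktas, CLOUD_EDGES, "cloud_oktas")
    k = K_VALUES[row][col]
    if math.isnan(k):
        raise ValueError(f"K is undefined for wind_kn={wind_kn}, cloud_oktas={cloud_oktas}")
    return k
```

Switching to left-closed bins would mean using `bisect_right` with lower-bound edges instead; the point of the sketch is that either convention is a one-line, testable decision.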



### Creating a KTable

- Parsing a CSV file

k_table.csv

|  | 2 | 4 | 6 | 8 |
| --- | --- | --- | --- | --- |
| 12.5 | -2.2 | -1.7 | -0.6 | 0.0 |
| 25.5 | -1.1 | 0.0 | 0.6 | 1.1 |
| 38.5 | -0.6 | 0.0 | 0.6 | 1.1 |
| 51 | 1.1 | 1.7 | 2.8 |  |

In [1]:
from min_temp import parse_k_table

k_table = parse_k_table("k_table.csv")


In [2]:
k_table.values

Unnamed: 0,2.0,4.0,6.0,8.0
12.5,-2.2,-1.7,-0.6,0.0
25.5,-1.1,0.0,0.6,1.1
38.5,-0.6,0.0,0.6,1.1
51.0,1.1,1.7,2.8,
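
For illustration, a CSV in this layout can be read into an edge-indexed DataFrame with plain pandas. This is only a sketch of the kind of work a parser such as `parse_k_table` might do; `read_k_csv` is a hypothetical name, and the strictly-increasing-edges check is an assumed validation rule rather than the toolkit's documented behaviour.

```python
import io

import pandas as pd

# Same content as k_table.csv above; the trailing comma leaves the last cell empty.
CSV_TEXT = """,2,4,6,8
12.5,-2.2,-1.7,-0.6,0.0
25.5,-1.1,0.0,0.6,1.1
38.5,-0.6,0.0,0.6,1.1
51,1.1,1.7,2.8,
"""


def read_k_csv(source):
    """Read a K table whose first column holds wind edges and whose header holds cloud edges."""
    df = pd.read_csv(source, index_col=0)
    # Header labels arrive as strings, so convert both axes to float edges.
    df.index = df.index.astype(float)
    df.columns = df.columns.astype(float)
    # Assumed validation: edges must be strictly increasing on both axes.
    for axis_name, edges in (("wind", df.index), ("cloud", df.columns)):
        if not (edges.is_monotonic_increasing and edges.is_unique):
            raise ValueError(f"{axis_name} edges must be strictly increasing")
    return df


k_df = read_k_csv(io.StringIO(CSV_TEXT))
```

The empty cell is read as `NaN`, which gives the lookup a natural marker for the "Unknown" entry.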


- Specify Directly

In [3]:
import pandas as pd
from min_temp import KTable
df = pd.DataFrame(
    data={
        # Cloud Cover edges (Oktas) : K values
        2.0: [-2.2, -1.1, -0.6, 1.1],
        4.0: [-1.7, 0.0, 0.0, 1.7],
        6.0: [-0.6, 0.6, 0.6, 2.8],
        8.0: [0.0, 1.1, 1.1, float("nan")],
    },
    index=[12.5, 25.5, 38.5, 51], # Geostrophic wind speed edges (kn)
)
k_table = KTable(values=df)
k_table.values

Unnamed: 0,2.0,4.0,6.0,8.0
12.5,-2.2,-1.7,-0.6,0.0
25.5,-1.1,0.0,0.6,1.1
38.5,-0.6,0.0,0.6,1.1
51.0,1.1,1.7,2.8,


### KTable Helpers

`KTable` turns the reference-book K lookup into a deterministic, validated table lookup for continuous inputs.

- `wind_edges`: ordered wind-speed bin upper-bounds (row index).
- `cloud_edges`: ordered cloud-cover bin upper-bounds (column labels).
- `lookup(wind_kn, cloud_oktas)`: finds the K value from (`wind`, `cloud cover`) input.
- `bin_edge(x, edges, name)`: finds the bin (upper edge) that an input value falls into.



In [12]:
# Boundary behaviour: right-closed bins (first edge >= value)
tests = [
    {"wind_kn": 12.5, "cloud_oktas": 2.0},   # exact edge
    {"wind_kn": 12.5, "cloud_oktas": 2.1},   # nudged into next cloud bin
]
for t in tests:
    wind, cloud = k_table.bin_edge(t["wind_kn"], axis=0), k_table.bin_edge(t["cloud_oktas"], axis=1)
    k_val = k_table.lookup(t["wind_kn"], t["cloud_oktas"])
    print(f"{t} -> ({wind}, {cloud}) -> {k_val}")
    


{'wind_kn': 12.5, 'cloud_oktas': 2.0} -> (12.5, 2.0) -> -2.2
{'wind_kn': 12.5, 'cloud_oktas': 2.1} -> (12.5, 4.0) -> -1.7


```python
@dataclass(frozen=True)
class CraddockAndPritchardModel(MinTempPredictor):
    name: str = "craddock_and_pritchard"
    feature_keys: Tuple[FeatureKey, ...] = (
        FeatureKey("midday_temp_c", "deg C"),
        FeatureKey("midday_dew_point_c", "deg C"),
        FeatureKey("wind_kn", "kn"),  # geostrophic wind speed
        FeatureKey("cloud_oktas", "oktas"),
    )
    a: float = 0.316
    b: float = 0.548
    c: float = -1.24
    k_table: KTable = KTable(pd.DataFrame(...)) # Omit detail to fit on slide
    
    def __post_init__(self) -> None:
        # Avoid sharing mutable DataFrames across instances.
        object.__setattr__(self, "k_table", KTable(self.k_table.values.copy(deep=True)))

    def predict(self, features: Mapping[str, float]) -> float:
        t12 = features["midday_temp_c"]
        td12 = features["midday_dew_point_c"]
        wind_kn = features["wind_kn"]  # same key as the parsed data and examples
        cloud_oktas = features["cloud_oktas"]
        return self.a * t12 + self.b * td12 + self.c + self.k_table.lookup(wind_kn, cloud_oktas)
```
 

## Craddock & Pritchard In Action
### Reproduce exercise example (Unit Test)

$$
\begin{aligned}
T_{\min} & = 0.316\,T_{12} + 0.548\,T_{d12} - 1.24 + K(\text{wind},\text{cloud cover})\\
         & = 0.316\times 18 + 0.548\times 10 - 1.24 + K(30, 3)\\
         & = 9.928 + 0\\
         & \approx 10
\end{aligned}
$$


In [5]:
from min_temp import CraddockAndPritchardModel, parse_initial_data

cp_model = CraddockAndPritchardModel()
example = {"midday_temp_c": 18.0, "midday_dew_point_c": 10.0,
            "cloud_oktas": 3.0, "wind_kn": 30.0}
example_pred = cp_model.predict(example)
example_pred

9.927999999999999

## Batch Computation

- I do not yet know the precise form input data will take, or how messy/incomplete it will be.
- Implemented a simple parser that can parse data in the form of a CSV table and return a toolkit-compliant DataFrame.
- In production, a new parser would be required that is compliant with the actual form the input data takes.

`data/initial_data.csv`

In [6]:
# Apply to provided sample data
observations = parse_initial_data("data/initial_data.csv")
observations

Unnamed: 0,Date,Location,midday_temp_c,midday_dew_point_c,wind_kn,cloud_oktas
0,1,A,22.4,10.9,14.56,3.9
1,1,B,18.6,12.65,3.4,6.0
2,2,B,26.0,8.5,0.0,0.0
3,2,C,13.2,9.4,12.5,4.1


- The problem naturally lends itself to wanting to compute multiple rows of data.
- Tool kit easily integrates with Pandas' DataFrames to apply a prediction method on all data rows.

In [7]:
observations["pred_cp"] = observations.apply(cp_model.predict, axis=1)
observations[["Date", "Location", "pred_cp"]]

Unnamed: 0,Date,Location,pred_cp
0,1,A,11.8116
1,1,B,10.9698
2,2,B,9.434
3,2,C,7.4824


## Custom parameters

- Coefficients `a`, `b`, `c` and the K table can be overridden.
- The forecaster's reference book states that the current default parameters are only valid for areas of eastern England not close to the sea, and that appropriate regression equations need to be developed if the method is to be applied to other areas of the country.
- Could consult SME to see if this work has already been done and parameters already known for other areas, and integrate this into the tool kit.

In [8]:
custom_model = CraddockAndPritchardModel(a=0.25, b=0.6, c=-0.8, 
                                         k_table=k_table)
observations["pred_cp_custom"] = observations.apply(
    custom_model.predict, axis=1)
observations[["Date", "Location", "pred_cp", "pred_cp_custom"]]

Unnamed: 0,Date,Location,pred_cp,pred_cp_custom
0,1,A,11.8116,11.34
1,1,B,10.9698,10.84
2,2,B,9.434,8.6
3,2,C,7.4824,7.54


## Metrics
- `evaluate_predictor` / `evaluate_predictors` accept any predictor plus data with the target column.
- Return a `PredictionResult`: true/pred values, errors, MAE, RMSE, bias.
- Consult Data Scientist/SME on how to handle data anomalies, what accuracy metrics would be useful, and what constitutes 'good'.
```python
@dataclass(frozen=True)
class PredictionResult:
    method: str
    y_true: Tuple[float, ...]
    y_pred: Tuple[float, ...]
    errors: Tuple[float, ...]
    mae: float
    rmse: float
    bias: float
```
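
As a reference for the three summary statistics, here is a minimal pure-Python sketch. The function name `summarise_errors` is hypothetical, and the error sign convention (prediction minus truth, so a negative bias means under-prediction) is an assumption chosen to match the sign of the `errors` printed in this notebook's output.

```python
import math


def summarise_errors(y_true, y_pred):
    """Return MAE, RMSE and bias for paired true/predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    # Assumed convention: error = prediction - truth.
    errors = [pred - true for true, pred in zip(y_true, y_pred)]
    n = len(errors)
    return {
        "mae": sum(abs(e) for e in errors) / n,           # average size of the miss
        "rmse": math.sqrt(sum(e * e for e in errors) / n),  # penalises large misses more
        "bias": sum(errors) / n,                          # signed average: systematic over/under
    }


summary = summarise_errors(
    y_true=[12.3, 11.0, 8.9, 8.5],
    y_pred=[11.8116, 10.9698, 9.434, 7.4824],
)
```

Reporting bias alongside MAE is useful because two methods with similar MAE can differ in whether their errors cancel out or lean consistently in one direction.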

- No "latest modelling" output is provided for the exercise, so synthetic targets are created for demonstration purposes.
- Even if targets were provided, fair assessment of the method would still require confirmation that the input data is from areas of the country where the method's parameters are *supposed* to be accurate.

In [9]:
from min_temp import evaluate_predictors
observations["observed_min_temp_c"] = [12.3, 11.0, 8.9, 8.5]
metrics = evaluate_predictors(observations, {"Default": cp_model, 
                                             "Custom": custom_model}, 
                                             actual_key="observed_min_temp_c")
metrics

(PredictionResult(method='Default', y_true=(12.3, 11.0, 8.9, 8.5), y_pred=(11.8116, 10.969800000000001, 9.433999999999997, 7.4824), errors=(-0.4884000000000004, -0.030199999999998894, 0.5339999999999971, -1.0175999999999998), mae=0.5175499999999991, rmse=0.6245222894340916, bias=-0.2505500000000005),
 PredictionResult(method='Custom', y_true=(12.3, 11.0, 8.9, 8.5), y_pred=(11.34, 10.84, 8.599999999999998, 7.539999999999999), errors=(-0.9600000000000009, -0.16000000000000014, -0.3000000000000025, -0.9600000000000009), mae=0.5950000000000011, rmse=0.6997856814768371, bias=-0.5950000000000011))

## Visualisation

- Plot helpers consume `PredictionResult` objects.

In [13]:
from min_temp import plot_error_bars, plot_pred_vs_actual
observations["observed_min_temp_c"] = [12.3, 11.0, 8.9, 8.5]  # supply synthetic targets
metrics = evaluate_predictors(observations, {"Default": cp_model, "Custom": custom_model}, actual_key="observed_min_temp_c")
# Save plots
plot_error_bars(metrics, save_path="plots/cp_errors.png")
plot_pred_vs_actual(metrics, save_path="plots/cp_pred_vs_actual.png")

# Close figures to avoid inline noise during a live walk-through
import matplotlib.pyplot as plt
plt.close("all")

- `plot_error_bars`: compare MAE/RMSE across methods.

![](plots/cp_errors.png)

- `plot_pred_vs_actual`: scatter predicted vs actual with ideal line.

![](plots/cp_pred_vs_actual.png)

## Optional extension (if time): Optimiser for Craddock & Pritchard

The reference book notes region-specific regressions may be needed. In a larger/production setting with local training data, we could refit `a`, `b`, `c` and K-table cell offsets.

I started a prototype in `min_temp/optimizers.py` to demonstrate feasibility (least squares on a linear-in-parameters design).

In [None]:
import pandas as pd
from min_temp import (CraddockAndPritchardOptimizer, 
                      CraddockAndPritchardModel, KTable, parse_initial_data)

# Synthetic setup with known parameters (2x2 K-table)
true_a, true_b, true_c = 0.3, 0.5, -1.
k_values = pd.DataFrame(data={2.0: [1.0, 3.0], 5.0: [2.0, 4.0]},
                        index=[10.0, 20.0],)

# Model used to generate observations
true_model = CraddockAndPritchardModel(a=true_a, b=true_b, c=true_c, 
                                       k_table=KTable(k_values))

# Build data that hits each bin multiple times
df_fit = parse_initial_data("data/optimization_data.csv")
df_fit["observed_min_temp_c"] = df_fit.apply(true_model.predict, axis=1)

# Fit model to synthetic data
fitted = CraddockAndPritchardOptimizer(a=0, b=0, c=0, 
                        k_table=true_model.k_table).fit(df_fit)
fitted.model.a, fitted.model.b, fitted.model.c, fitted.metrics.mae

(0.29999999999999793,
 0.49999999999999745,
 -1.0000000000000053,
 3.652633751016765e-14)
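
The least-squares idea can be sketched independently of the toolkit. The block below is not the code in `min_temp/optimizers.py`: it makes the simplifying assumption that the K value for each row is already known and held fixed, so only `a`, `b` and `c` are fitted, and the data are synthetic and noise-free.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 40

# Synthetic inputs: midday temperature, midday dew point, and a per-row K value.
t12 = rng.uniform(5.0, 30.0, n)
td12 = rng.uniform(0.0, 15.0, n)
k = rng.choice([1.0, 2.0, 3.0, 4.0], n)  # stands in for the K-table lookup result

# Noise-free targets generated from known parameters.
true_a, true_b, true_c = 0.3, 0.5, -1.0
tmin = true_a * t12 + true_b * td12 + true_c + k

# Tmin - K = a*T12 + b*Td12 + c is linear in (a, b, c),
# so the design matrix has one column per parameter.
design = np.column_stack([t12, td12, np.ones(n)])
coef, *_ = np.linalg.lstsq(design, tmin - k, rcond=None)
a_hat, b_hat, c_hat = coef
```

If the K-table cells were fitted at the same time as `c`, a constant added to `c` and subtracted from every cell would give identical predictions, so a constraint (for example, pinning one cell) would be needed for the parameters to be uniquely determined.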

## Reflections

### What did I do and why did I do it that way?
- Created a `Predictor`-`Metric`-`Plotting` tool kit for testing the accuracy of historic methods for calculating overnight minimum temperature.
  - `Metrics` and `Plotting` logic are not wedded to a particular type of predictor.
  - Project is well placed to expand beyond just testing the accuracy of a single predictor.
  - Enforce clear inputs by declaring required features with units to avoid mistakes in using the tool.
  - Unit tests for `Metrics` and `Plotting` for quality assurance.
- Implemented the `Craddock & Pritchard` predictor
  - Explicitly defined k-table lookup logic to remove ambiguity and ensure results are deterministic and repeatable.
  - Enabled batch predictions via compatibility with pandas DataFrames
  - Allowed full customisation of parameters so that the method can be applied to other locations in the country beyond the default.
  - Unit tests for `Craddock & Pritchard` predictions for quality assurance.

### How has what I don't know about the problem impacted the solution?
- Unsure how 'hands on' the forecaster wants to be in using the tool.
  - Created a tool that can be used directly as code
  - Can provide the basis for a UI to be built on top of if that is what is required
- Do not know the form of the real operational data
  - Parsing and prediction are kept separate, and I built an example parser that can be redesigned according to requirements
- The exercise data does not include "latest modelling" outputs, or whether or not the locations are where the C&P predictor is *supposed* to be valid
  - Unable to determine the accuracy of the method
  - Evaluation code is demonstrated with synthetic targets
- The reference K-table is ambiguous at boundaries and includes an "Unknown" cell
  - Made a deliberate, documented convention and treated undefined cases as explicit failures/flags rather than guessing

### Who should be involved in developing the final solution?
- **Forecasters/SMEs:** 
  - Confirm assumptions (Unit expectations, geostrophic wind, boundary conventions, handling of "Unknown" K-cells and treatment of edge cases)
  - Define operational acceptability. (How the tool is used, what outputs are required, accuracy thresholds, usage scope)
- **Data Providers**
  - Provide testing datasets, including location metadata
  - Provide the "latest modelling" comparison
- **Data Scientists/Stats/Modelling:** 
  - Help provide valid `C&P` parameters for different areas of the country.
- **Software/IT:** 
  - Collaboration structure
  - Deployment route 

### How might you do it differently in a team or production context?
- **Harden validation + error handling**
  - Explicit policy for how `metrics` and `plots` handle failed predictions or anomalous data.
  - Structured error reporting/logging and per-row status so batch runs can complete while flagging exclusions.
- **More comprehensive unit testing**
  - Unit tests for typical data-quality issues
  - “Known answer” tests from SME-approved examples 
- **Externalise parameters and outputs (K tables, coefficients)**
  - Store K tables/coefficients as versioned artefacts so that results can refer back to them.
  - Support multiple parameter sets by region/site.
  - Optimisation/calibration as an optional, auditable workflow
- **CI/CD**
  - Permission based
  - Version control
  - Automated testing
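
To make the "per-row status" idea under validation and error handling concrete, here is one possible shape for a batch run that completes while flagging failed rows. Everything in it is hypothetical: `toy_predict` is a stand-in predictor (linear part only, with a single range check), and `predict_with_status` and its column names are not part of the toolkit.

```python
import math

import pandas as pd


def toy_predict(row):
    """Hypothetical stand-in predictor: linear part only, with one range check."""
    if not 0 <= row["cloud_oktas"] <= 8:
        raise ValueError(f"cloud_oktas={row['cloud_oktas']} is outside 0-8")
    return 0.316 * row["midday_temp_c"] + 0.548 * row["midday_dew_point_c"] - 1.24


def predict_with_status(df, predict):
    """Apply `predict` row by row, recording failures instead of aborting the batch."""
    records = []
    for _, row in df.iterrows():
        try:
            records.append({"pred": predict(row), "status": "ok", "detail": ""})
        except (KeyError, ValueError) as exc:
            # Keep the row in the report so exclusions are visible and auditable.
            records.append({"pred": math.nan, "status": "failed", "detail": str(exc)})
    return pd.DataFrame(records, index=df.index)


batch = pd.DataFrame(
    {
        "midday_temp_c": [18.0, 18.0],
        "midday_dew_point_c": [10.0, 10.0],
        "cloud_oktas": [3.0, 9.5],  # second row is deliberately invalid
    }
)
report = predict_with_status(batch, toy_predict)
```

Metrics could then be computed on the rows with `status == "ok"` only, with the number of excluded rows reported next to them.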


### What things might you need to consider if this was part of a larger project?
- **Collaboration**
  - Clear roles (SME, data owners, software/ops)
  - Review/sign-off points
  - Who maintains parameters/tables over time.
- **Documentation**
  - Onboarding docs (inputs/units, edge-case behaviour, examples)
- **Reproducibility**
  - Versioning
  - Parameter artefacts
  - Recording run configuration
  - Keep outputs traceable to inputs
- **Performance & Scale:**
  - Efficient batch execution
  - Database storage of parameters
- **Permissions/Access**
  - How do people access this tool?
  - Who can use the tool?
  - Who can make changes to the tool?
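
One lightweight way to keep outputs traceable to the parameters that produced them (the reproducibility points above) is to store a fingerprint of the parameter set with every run. This is a sketch using only the standard library; the function name, the record layout and the 12-character truncation are illustrative assumptions, not an existing part of the toolkit.

```python
import hashlib
import json


def parameter_fingerprint(params):
    """Return a short, order-independent hash of a JSON-serialisable parameter set."""
    # sort_keys gives a canonical form, so key order does not change the hash.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


default_params = {
    "method": "craddock_and_pritchard",
    "a": 0.316,
    "b": 0.548,
    "c": -1.24,
    "wind_edges_kn": [12.5, 25.5, 38.5, 51.0],
    "cloud_edges_oktas": [2.0, 4.0, 6.0, 8.0],
    "k_values": [
        [-2.2, -1.7, -0.6, 0.0],
        [-1.1, 0.0, 0.6, 1.1],
        [-0.6, 0.0, 0.6, 1.1],
        [1.1, 1.7, 2.8, None],  # None marks the "Unknown" cell
    ],
}

# A run record ties an output back to its inputs and parameter version.
run_record = {
    "parameter_fingerprint": parameter_fingerprint(default_params),
    "input_file": "data/initial_data.csv",
}
```

Any change to a coefficient or table cell produces a different fingerprint, so results computed with different parameter sets cannot be confused silently.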
