# Risk Framework: Decision Rules & Evaluation Strategy

This notebook defines the **risk framing**, **business objective**, and **evaluation strategy** for a preventive wine-quality screening model.

**Important scope rules**
- No model training in this notebook.
- No new metrics are computed on project data in this notebook.
- This notebook sets the decision logic that will be implemented in `04_risk_modeling.ipynb` and used operationally in `05_risk_calculator.ipynb`.


## 1) Technical risk definition (target variable)

We reframe the original ordinal `quality` score into a **binary risk event**.

- **High Risk (risk = 1):** `quality ≤ 5`
- **Low Risk (risk = 0):** `quality > 5`

This turns the problem into a **binary classification task**: predict whether a wine is likely to be technically low-quality **before market release**, using only physicochemical features.
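The mapping above can be sketched in a few lines. This is a minimal illustration, assuming integer quality scores; the function name `to_risk_label` and the sample scores are hypothetical and are not taken from the project data.

```python
def to_risk_label(quality: int, cutoff: int = 5) -> int:
    """Return 1 (High Risk) if quality <= cutoff, else 0 (Low Risk)."""
    return 1 if quality <= cutoff else 0


# Hypothetical quality scores, for illustration only (not project data).
quality_scores = [3, 5, 6, 7, 4, 8]
risk_labels = [to_risk_label(q) for q in quality_scores]
print(risk_labels)  # [1, 1, 0, 0, 1, 0]
```

Note that a score of exactly 5 falls on the High Risk side of the cutoff.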


## 2) Business context and cost asymmetry

This model is designed as an **early warning system** for conservative decision-making.

In this context, the costs of errors are asymmetric:

- **False Negative (FN)**: a risky wine is predicted as safe (risk=0)  
  - **Highest cost**: increases the chance of releasing low-quality wine.
- **False Positive (FP)**: a safe wine is predicted as risky (risk=1)  
  - **Lower cost**: triggers additional checks, rework, or delay.

**Business priority:** minimize **False Negatives**, even if that produces more False Positives.
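One way to make the asymmetry concrete is to weight each error type by a relative cost. The sketch below is illustrative only: the 10:1 cost ratio and the error counts are assumptions chosen for the example, not measured business costs or project results.

```python
# Hypothetical relative costs (assumptions, not measured values).
COST_FN = 10.0  # a risky wine released as "safe"
COST_FP = 1.0   # a safe wine sent for extra checks, rework, or delay


def total_error_cost(fn: int, fp: int,
                     cost_fn: float = COST_FN,
                     cost_fp: float = COST_FP) -> float:
    """Weighted cost of a model's errors under the assumed cost ratio."""
    return fn * cost_fn + fp * cost_fp


# Two hypothetical models that make the same number of errors (30 each).
cost_a = total_error_cost(fn=20, fp=10)  # mostly missed risk
cost_b = total_error_cost(fn=5, fp=25)   # mostly false alarms
print(cost_a, cost_b)  # 210.0 75.0
```

Under these assumed costs, the model with more false alarms but fewer missed risks is clearly preferable, even though both make the same number of mistakes.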


## 3) Why accuracy is not the main metric

Accuracy treats all errors equally and can be misleading in prevention problems.

A model can achieve decent accuracy while still missing many risky wines, which is exactly the outcome we want to avoid.

Because the business cost of **missing risk** is higher than the cost of **false alarms**, we do not optimize for accuracy as the primary objective.
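A small worked example shows how accuracy can hide missed risk. The confusion-matrix counts below are hypothetical and chosen only to illustrate the point; they are not results from this project.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def recall(tp: int, fn: int) -> float:
    """Share of truly risky wines that were flagged."""
    return tp / (tp + fn)


# Hypothetical counts for 1,000 wines, 200 of which are truly High Risk.
counts = {"tp": 80, "fn": 120, "fp": 20, "tn": 780}

acc = accuracy(**counts)
rec = recall(counts["tp"], counts["fn"])
print(f"accuracy={acc:.2f}, recall(risk=1)={rec:.2f}")
# accuracy=0.86, recall(risk=1)=0.40
```

In this toy case the model is right 86% of the time, yet it lets 120 of the 200 risky wines through.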


## 4) Primary objective metric: Recall for High Risk (risk = 1)

**Recall (risk=1)** measures how many truly risky wines we successfully flag.

- **Recall (risk=1) = TP / (TP + FN)**

This metric directly aligns with the business goal of preventing missed risk cases.

**Interpretation:** higher recall means fewer risky wines slipping through as “safe”.
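The formula can be checked by hand on a few labels. The following is a minimal pure-Python sketch with made-up labels and predictions; in `04_risk_modeling.ipynb` a library routine such as scikit-learn's `recall_score` would be the more likely choice, which is an assumption about tooling rather than something this framework prescribes.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) with risk = 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def recall_high_risk(y_true, y_pred) -> float:
    """Recall (risk=1) = TP / (TP + FN); 0.0 if there are no risky wines."""
    tp, _, fn, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]

print(confusion_counts(y_true, y_pred))  # (3, 1, 1, 3)
print(recall_high_risk(y_true, y_pred))  # 0.75
```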


## 5) Supporting metrics (why we still track them)

Even with recall as the priority, we monitor additional metrics to ensure the model remains usable:

- **Precision (risk=1)**: how many flagged wines are truly risky  
  - Helps control the operational burden of too many false alarms.
- **ROC-AUC**: measures ranking quality across thresholds  
  - Useful for comparing models independent of a chosen threshold.
- **PR-AUC**: summarizes precision and recall for the positive class (risk=1) across all thresholds  
  - Especially relevant when the positive class is the business focus.

These metrics complement recall but do not replace it as the primary objective.
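To show what each supporting metric measures, here is a small pure-Python sketch on made-up scores. The function names are hypothetical. `roc_auc` uses the pairwise-ranking definition, and `average_precision` is a step-wise estimate of PR-AUC that assumes no tied scores. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` would likely be used instead.

```python
def precision_at(y_true, scores, threshold):
    """Share of flagged wines (score >= threshold) that are truly risky."""
    flagged = [t for t, s in zip(y_true, scores) if s >= threshold]
    return sum(flagged) / len(flagged) if flagged else 0.0


def roc_auc(y_true, scores):
    """Probability that a random risky wine outranks a random safe one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Mean of precision values at the rank of each truly risky wine."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Made-up labels and risk scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(precision_at(y_true, scores, 0.5))           # 0.75
print(round(roc_auc(y_true, scores), 3))           # 0.889
print(round(average_precision(y_true, scores), 3)) # 0.917
```

Precision depends on the chosen threshold, while the two AUC-style metrics describe ranking quality across all thresholds.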


## 6) Thresholding: 0.5 is a convention, not a business rule

Most classifiers output a **risk probability** (or score). A **threshold** converts that probability into a decision:

- If `P(risk=1) ≥ threshold` → predict **High Risk**
- Else → predict **Low Risk**

The default threshold **0.5** is cost-optimal only when false positives and false negatives cost the same (and the probabilities are well calibrated), which is **not true** here.

Because **False Negatives are more costly**, we will likely choose a **lower threshold** than 0.5 to increase High-Risk recall (accepting more false positives).
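The effect of lowering the threshold can be seen on a toy example. The probabilities below are made up for illustration, and the helper names are hypothetical.

```python
def predict_with_threshold(probs, threshold):
    """Flag High Risk (1) when P(risk=1) >= threshold, else Low Risk (0)."""
    return [1 if p >= threshold else 0 for p in probs]


def recall_and_false_positives(y_true, y_pred):
    """Return (recall for risk=1, number of false positives)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), fp


# Made-up labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.2, 0.1]

results = {}
for threshold in (0.5, 0.3):
    preds = predict_with_threshold(probs, threshold)
    results[threshold] = recall_and_false_positives(y_true, preds)
    print(threshold, results[threshold])
# 0.5 (0.5, 1)
# 0.3 (1.0, 2)
```

Moving from 0.5 to 0.3 catches every risky wine in this toy set at the price of one additional false alarm.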


## 7) Decision policy for selecting an operating threshold (to implement in Notebook 04)

Threshold selection is a **business policy** informed by validation results.

A practical approach aligned with this project:

1. Choose a **minimum target recall** for High Risk (e.g., ≥ 0.85 or ≥ 0.90 on validation).
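2. Among thresholds that meet the recall target, select the one that:
   - maximizes precision (reduces false alarms), and
   - remains stable across validation folds or resamples (PR-AUC is threshold-independent, so it helps compare models rather than pick a threshold).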

This creates a transparent, defensible trade-off between prevention (recall) and operational impact (precision).
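The policy can be sketched as a simple search over candidate thresholds. This is one possible reading of the rule, not a fixed implementation: the function name `select_threshold`, the tie-break toward the higher threshold, and the toy data are all assumptions made for illustration.

```python
def select_threshold(y_true, probs, min_recall=0.90):
    """Pick the threshold with the best precision among those meeting
    the recall target. Returns None if no threshold qualifies."""
    best = None
    for thr in sorted(set(probs)):  # each distinct score is a candidate
        preds = [1 if p >= thr else 0 for p in probs]
        tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
        fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        if recall < min_recall:
            continue  # fails the prevention requirement
        # Assumed tie-break: on equal precision, prefer the higher threshold.
        if best is None or (precision, thr) > (best["precision"], best["threshold"]):
            best = {"threshold": thr, "precision": precision, "recall": recall}
    return best


# Made-up validation labels and probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
probs = [0.9, 0.7, 0.45, 0.35, 0.6, 0.4, 0.2, 0.1]

chosen = select_threshold(y_true, probs, min_recall=0.90)
print(chosen["threshold"], chosen["recall"], round(chosen["precision"], 3))
# 0.35 1.0 0.667
```

Returning `None` when no threshold reaches the target makes a failed recall requirement explicit instead of silently falling back to 0.5.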


## 8) Model comparison and selection rule (to implement in Notebook 04)

In `04_risk_modeling.ipynb`, we will compare candidate models under the same train/validation protocol.

**Selection rule**
- A model must first satisfy the **High-Risk recall target** on validation.
- If multiple models satisfy the target, prefer the model with:
  - higher precision (risk=1) at the chosen operating threshold,
  - higher PR-AUC,
  - stable performance (small variance across folds, if cross-validation is used),
  - and strong interpretability where feasible.

This keeps the project aligned with the prevention objective and supports explainability.
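The selection rule can be expressed as a filter followed by a ranking. The sketch below assumes a strict priority order (precision, then PR-AUC, then lower fold-to-fold variation), which is one possible interpretation of the rule. Model names and all metric values are hypothetical, and interpretability is left as a human judgment outside the code.

```python
# Hypothetical validation summaries, for illustration only.
candidates = [
    {"name": "model_a", "recall": 0.92, "precision": 0.61, "pr_auc": 0.80, "recall_std": 0.03},
    {"name": "model_b", "recall": 0.88, "precision": 0.70, "pr_auc": 0.84, "recall_std": 0.02},
    {"name": "model_c", "recall": 0.91, "precision": 0.66, "pr_auc": 0.79, "recall_std": 0.05},
]


def select_model(candidates, min_recall=0.90):
    """Keep models that meet the recall target, then rank the rest by
    precision, PR-AUC, and stability (lower recall_std is better)."""
    eligible = [c for c in candidates if c["recall"] >= min_recall]
    if not eligible:
        return None  # no model satisfies the prevention requirement
    return max(
        eligible,
        key=lambda c: (c["precision"], c["pr_auc"], -c["recall_std"]),
    )


winner = select_model(candidates)
print(winner["name"])  # model_c
```

Here `model_b` has the best precision and PR-AUC but is excluded first, because it misses the recall target.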


## 9) Why probability calibration may be needed (to implement in Notebooks 04–05)

The final deliverable includes a **risk calculator** used for decision support.

If we present probabilities (e.g., “risk = 0.72”), we want them to be **calibrated**, meaning:

- predicted probabilities match observed frequencies on average (e.g., of the wines scored near 0.70, about 70% turn out to be High Risk)

Calibration matters because:
- a well-ranked model (good AUC) can still output poorly calibrated probabilities,
- business rules may depend on probability cutoffs,
- stakeholders interpret probabilities as real-world risk levels.

Calibration will be evaluated later and applied if needed.
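A calibration check can be sketched as a binned comparison of predicted and observed risk, plus a Brier score. The helper names, the bin count, and the data are assumptions made for illustration. In Notebooks 04 and 05, scikit-learn's `calibration_curve`, `brier_score_loss`, and `CalibratedClassifierCV` would be natural candidates, though this framework does not mandate them.

```python
def reliability_table(y_true, probs, n_bins=5):
    """Group predictions into equal-width bins and compare the mean
    predicted probability with the observed High-Risk rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, probs):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in last bin
        bins[idx].append((y, p))
    rows = []
    for idx, items in enumerate(bins):
        if not items:
            continue  # skip empty bins
        mean_pred = sum(p for _, p in items) / len(items)
        observed = sum(y for y, _ in items) / len(items)
        rows.append({"bin": idx, "n": len(items),
                     "mean_pred": mean_pred, "observed": observed})
    return rows


def brier_score(y_true, probs):
    """Mean squared gap between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Made-up labels and probabilities, for illustration only.
y_true = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
probs = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]

table = reliability_table(y_true, probs)
for row in table:
    print(row["bin"], row["n"], round(row["mean_pred"], 2), row["observed"])
print(round(brier_score(y_true, probs), 4))  # 0.1125
```

Large gaps between `mean_pred` and `observed` in a bin suggest the displayed probabilities should not be read as literal risk levels without recalibration.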


## 10) Guardrails, assumptions, and limitations

- This project identifies **associations**, not causal effects (correlation ≠ causation).
- The model is trained on historical data; changes in production processes or measurement protocols can shift performance.
- Risk framing is specific to the chosen quality cutoff (`quality ≤ 5`), which is separate from the probability threshold; alternative definitions would require re-validation.
- The model supports **screening and prevention**, not final quality certification.

All subsequent modeling decisions must remain consistent with this framework.
