## Data Engineering
## Data Distribution Changes and Monitoring

Last updated: July 3, 2022

---

### Sources

- Designing Machine Learning Systems, Chip Huyen

---

### Concepts

- model degradation
- monitoring vs observability
- software failures vs ML failures
- data distribution shifts
- edge cases
- degenerate feedback loops
- detecting performance issues
- performance monitoring plan

---

### 1. Model Degradation
Model performance inevitably degrades over time in production.

There are several reasons for this, some **software** related and some **ML** related.

**Software failures** include:  
- dependency issue: a package or service the system depends on changes or vanishes
- deployment issue: the wrong version is deployed, or the model is not deployed to the correct machine(s)
- hardware issue

**ML failures** include:  
- training data distribution differs from production (inference) data distribution
- edge cases
- degenerate feedback loop

#### Training data distribution differs from production (inference) data distribution

ML works well when the patterns in production data match the patterns in training data.  
A model that keeps performing well on data it has not seen is said to *generalize*.

Several reasons why this might fail to be the case:

- **non-stationarity**: patterns change over time for various reasons:
    - major disruption like pandemic
    - seasonality
    - change in market conditions
    - change in strategy
    
Change is common

- **bad data** including:
  - incorrect inputs
  - unexpected data format
  - issue with data collection / pipeline  
  
  
  Bad data can often be hard to detect, as ML issues can fail silently.
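
One way to make bad data fail loudly instead of silently is to validate each record before it reaches the model. The sketch below is only an illustration: the schema, the field names (`age`, `income`, `country`) and the function `validate_record` are hypothetical and not taken from the book.

```python
from typing import Any, Dict, List

# Hypothetical schema (an assumption for this sketch): feature name -> expected type.
EXPECTED_SCHEMA: Dict[str, type] = {"age": int, "income": float, "country": str}


def validate_record(record: Dict[str, Any],
                    schema: Dict[str, type] = EXPECTED_SCHEMA) -> List[str]:
    """Return a list of human-readable problems found in one input record."""
    problems: List[str] = []
    for name, expected_type in schema.items():
        if name not in record or record[name] is None:
            problems.append(f"missing value for '{name}'")
        elif not isinstance(record[name], expected_type):
            actual = type(record[name]).__name__
            problems.append(
                f"'{name}' has type {actual}, expected {expected_type.__name__}"
            )
    # Fields the schema does not know about often signal a pipeline change.
    for name in record:
        if name not in schema:
            problems.append(f"unexpected field '{name}'")
    return problems


good = {"age": 41, "income": 52000.0, "country": "NL"}
bad = {"age": "41", "income": 52000.0}  # wrong type for age, country missing

print(validate_record(good))
print(validate_record(bad))
```

In a pipeline, a non-empty problem list could be logged or counted so that a spike in bad records becomes visible.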

#### Edge Cases

An edge case is a rare or extreme input on which the model performs poorly.

Example: Model trained on financial data when interest rates were always positive.  
In production, it is fed negative interest rates. This might produce poor results.

**It helps to include edge cases in the training data to make the model more robust.**
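
The interest-rate example suggests a simple guard: record the range of each feature seen in training and flag production inputs that fall outside it. This is a minimal sketch under that assumption; the helpers `fit_ranges` and `out_of_range_features` and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple


def fit_ranges(training_rows: List[Dict[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Record the (min, max) of each feature seen in the training data."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for row in training_rows:
        for name, value in row.items():
            lo, hi = ranges.get(name, (value, value))
            ranges[name] = (min(lo, value), max(hi, value))
    return ranges


def out_of_range_features(row: Dict[str, float],
                          ranges: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of features whose value lies outside the training range."""
    return [
        name
        for name, value in row.items()
        if name in ranges and not (ranges[name][0] <= value <= ranges[name][1])
    ]


# Training data only ever contained positive interest rates (made-up values).
training = [{"interest_rate": 0.5}, {"interest_rate": 2.0}, {"interest_rate": 4.25}]
ranges = fit_ranges(training)

print(out_of_range_features({"interest_rate": -0.25}, ranges))  # ['interest_rate']
print(out_of_range_features({"interest_rate": 1.0}, ranges))    # []
```

A flagged input does not mean the prediction is wrong, only that the model is being asked to extrapolate.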

#### Degenerate Feedback Loop

---

### 2. Monitoring and Observability

*Monitoring* refers to the act of tracking, measuring and logging different metrics to help determine when something goes wrong.

*Observability* refers to setting up the system so that users have **visibility into the system** to determine when something goes wrong. An example would be logging all events in the system as it runs.
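
As a rough illustration of "logging all events", each prediction can be written out as one structured log line that can be searched later. This sketch uses Python's standard `logging` and `json` modules; the logger name, the event fields and the function `log_prediction_event` are assumptions made for the example.

```python
import json
import logging
import time
from typing import Any, Dict

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("prediction_service")  # hypothetical logger name


def log_prediction_event(model_version: str,
                         features: Dict[str, Any],
                         prediction: float) -> str:
    """Emit one JSON log line per prediction and return it."""
    event = {
        "event": "prediction",
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,
        "prediction": prediction,
    }
    line = json.dumps(event, sort_keys=True)
    logger.info(line)
    return line


line = log_prediction_event("v1", {"interest_rate": 1.5}, 0.82)
```

Keeping the inputs, the output and the model version together in one event makes it possible to reconstruct what the system saw when something goes wrong.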

#### Detecting Distribution Shifts

Can **monitor predictions over time.** Example metrics:
- any predicted probabilities < 0 or > 1?
- are all predictions over some period of time identical?
- run test cases with known answers: does the prediction vary over time?
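
The three prediction checks above could be sketched as follows. The function names, the tolerance and the stand-in model are hypothetical; a real system would call its own model and load its own stored test cases.

```python
from typing import Callable, List, Sequence, Tuple


def invalid_probabilities(probs: Sequence[float]) -> List[float]:
    """Return predicted probabilities outside [0, 1] (NaN is also flagged)."""
    return [p for p in probs if not (0.0 <= p <= 1.0)]


def all_identical(preds: Sequence[float]) -> bool:
    """True if there are at least two predictions and they are all the same."""
    return len(preds) > 1 and len(set(preds)) == 1


def failed_test_cases(predict: Callable[[float], float],
                      cases: Sequence[Tuple[float, float]],
                      tol: float = 1e-6) -> List[int]:
    """Return indices of fixed test cases whose prediction moved away
    from the stored known answer by more than `tol`."""
    return [
        i for i, (x, expected) in enumerate(cases)
        if abs(predict(x) - expected) > tol
    ]


def stand_in_model(x: float) -> float:
    """Placeholder for the deployed model (an assumption for this sketch)."""
    return 0.1 * x


print(invalid_probabilities([0.2, 1.3, -0.1, 0.9]))
print(all_identical([0.5, 0.5, 0.5]))
print(failed_test_cases(stand_in_model, [(1.0, 0.1), (5.0, 0.7)]))
```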

Can **monitor specific model-level metrics over time**, which can be used in alerts. Example metrics:
- accuracy
- F1 score
- AUC
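
Tracking model-level metrics over time means recomputing them per review period once ground-truth labels arrive. Below is a minimal sketch for accuracy and F1 on binary labels, written in plain Python so it is self-contained; the period names and label values are made up. Libraries such as scikit-learn provide `accuracy_score`, `f1_score` and `roc_auc_score` for the same purpose.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """F1 = 2*TP / (2*TP + FP + FN) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def metrics_by_period(
    batches: Dict[str, Tuple[Sequence[int], Sequence[int]]]
) -> Dict[str, Dict[str, float]]:
    """Compute each metric separately for every review period."""
    return {
        period: {"accuracy": accuracy(t, p), "f1": f1(t, p)}
        for period, (t, p) in batches.items()
    }


# Made-up (y_true, y_pred) pairs for two review periods.
batches = {
    "period_1": ([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0, 0]),
    "period_2": ([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0, 0, 0]),
}
history = metrics_by_period(batches)
for period, values in history.items():
    print(period, values)
```

This approach only works when labels become available within a useful time frame; otherwise prediction-level and feature-level checks have to carry more of the load.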

Can **monitor specific feature-level metrics over time**, such as:
- statistics of each predictor (quartiles, median, ...)
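
A feature-level check can compare summary statistics of a production window against those of the training data. The sketch below uses `statistics.quantiles` from the standard library. The rule it applies (flag the feature when the production median falls outside the training interquartile range) is a simple heuristic chosen for illustration, not a method from the book, and the data is made up.

```python
import statistics
from typing import Dict, Sequence


def summarize(values: Sequence[float]) -> Dict[str, float]:
    """Quartiles of one feature; n=4 yields the three cut points."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"q1": q1, "median": median, "q3": q3}


def median_shifted(train: Sequence[float], prod: Sequence[float]) -> bool:
    """Heuristic (an assumption): flag the feature if the production median
    lies outside the training interquartile range [q1, q3]."""
    train_stats = summarize(train)
    prod_median = statistics.median(prod)
    return not (train_stats["q1"] <= prod_median <= train_stats["q3"])


train_values = [1, 2, 3, 4, 5, 6, 7, 8, 9]
stable_window = [2, 3, 4, 5, 6, 7, 8, 9, 10]
shifted_window = [8, 9, 10, 11, 12, 13, 14, 15, 16]

print(summarize(train_values))
print(median_shifted(train_values, stable_window))
print(median_shifted(train_values, shifted_window))
```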

Visualizations can be produced over time at the feature level and the model level.  
These can be helpful for human review, but aren't as useful in automated alerts.

#### Performance Monitoring Plan

A **performance monitoring plan** is recommended for each model.  
This should be crafted by stakeholders to include:
- metrics to monitor
- triggers for each metric (e.g., if AUC falls by 10% between review periods, then ALERT)
- monitoring frequency
- actions to take if ALERT (who does what by when)
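
A plan like this can also be written down as data so that the trigger is checked automatically. The sketch below encodes the AUC example as a relative-drop threshold; the class `MetricTrigger`, its fields, and the owner and action values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MetricTrigger:
    metric: str               # metric to monitor
    max_relative_drop: float  # 0.10 means ALERT on a fall of 10% or more
    frequency: str            # monitoring frequency
    owner: str                # who acts on the alert
    action: str               # what they do, and by when


def should_alert(trigger: MetricTrigger, previous: float, current: float) -> bool:
    """True if the metric fell by at least the allowed relative amount
    between two review periods."""
    if previous <= 0:
        return False  # relative drop is undefined; handle separately
    relative_drop = (previous - current) / previous
    return relative_drop >= trigger.max_relative_drop


# Hypothetical plan entry for the AUC example.
auc_trigger = MetricTrigger(
    metric="AUC",
    max_relative_drop=0.10,
    frequency="monthly",
    owner="model owner",
    action="investigate and decide on retraining within 5 business days",
)

print(should_alert(auc_trigger, previous=0.90, current=0.88))
print(should_alert(auc_trigger, previous=0.90, current=0.78))
```

Whether "falls by 10%" means a relative or an absolute drop is something the stakeholders should state explicitly in the plan; this sketch assumes relative.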

---

### 3. Monitoring Tools

monitoring predictions

