# Environment and Distribution Shift

## 1. Distribution Shift
A **distribution shift** occurs when the data distribution differs between the training phase and the testing (or deployment) phase. Formally, denote the joint distributions over inputs $x$ and labels $y$ by:

- Training data: $P_{train}(x, y)$
- Test data: $P_{test}(x, y)$

If $P_{train}(x, y) \neq P_{test}(x, y)$, then the model is facing a distribution shift.
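
In practice $P_{test}$ is unknown and only samples are available, so a shift is usually detected statistically. The sketch below is minimal. It assumes a single scalar input feature and uses the two-sample Kolmogorov-Smirnov statistic as the detector, which is a swapped-in choice because the text above does not prescribe one. It compares the empirical CDFs of training and test samples. Note that it checks only the marginal $P(x)$ of one feature, so it cannot detect a change in $P(y|x)$.

```python
import bisect
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(t) - F_b(t)| over t."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The ECDF difference only changes at sample points, so checking them suffices.
    for t in a + b:
        fa = bisect.bisect_right(a, t) / len(a)  # fraction of a that is <= t
        fb = bisect.bisect_right(b, t) / len(b)  # fraction of b that is <= t
        d = max(d, abs(fa - fb))
    return d


rng = random.Random(0)
train_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
same_x = [rng.gauss(0.0, 1.0) for _ in range(2000)]     # same distribution as training
shifted_x = [rng.gauss(1.0, 1.0) for _ in range(2000)]  # mean moved by one std

print(f"no shift: D = {ks_statistic(train_x, same_x):.3f}")
print(f"shifted:  D = {ks_statistic(train_x, shifted_x):.3f}")
```

$D$ stays small when both samples come from the same Gaussian and grows clearly when the test mean moves. In a real pipeline $D$ would be compared against a critical value or converted to a p-value rather than read by eye.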

Types of distribution shifts include:
- **Covariate shift**: $P_{train}(x) \neq P_{test}(x)$ but $P(y|x)$ stays the same.
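- **Label shift**: $P_{train}(y) \neq P_{test}(y)$ but $P(x|y)$ stays the same.
- **Concept shift**: $P_{train}(y|x) \neq P_{test}(y|x)$, i.e., the relationship between inputs and labels itself changes. This is the most challenging case, because unlabeled inputs cannot reveal the change when $P(x)$ stays the same.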
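
The three shift types can be made concrete by changing exactly one factor of the joint distribution while holding the other fixed. The sketch below is minimal and makes several assumptions. It uses a hypothetical one-dimensional problem with Gaussian class-conditionals $x \mid y \sim \mathcal{N}(\pm 1, 1)$ and balanced classes, for which $P(y=1|x) = \sigma(2x)$. It also uses a fixed classifier "predict 1 iff $x > 0$", which stands in for a model trained on the source data. The shifted distributions are illustrative choices, not taken from the text.

```python
import math
import random

rng = random.Random(0)
N = 5000


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sample_source():
    # P(y) = Bernoulli(0.5); P(x|y) = N(+1, 1) if y == 1 else N(-1, 1).
    # For this joint distribution, P(y=1|x) = sigmoid(2x).
    y = int(rng.random() < 0.5)
    return rng.gauss(1.0 if y else -1.0, 1.0), y


def sample_covariate_shift():
    # P(x) changes to N(1.5, 1); P(y|x) = sigmoid(2x) is unchanged.
    x = rng.gauss(1.5, 1.0)
    return x, int(rng.random() < sigmoid(2 * x))


def sample_label_shift():
    # P(y) changes to Bernoulli(0.9); P(x|y) is unchanged.
    y = int(rng.random() < 0.9)
    return rng.gauss(1.0 if y else -1.0, 1.0), y


def sample_concept_shift():
    # P(x) is unchanged, but P(y|x) becomes sigmoid(-2x): the labeling rule inverts.
    x, _ = sample_source()
    return x, int(rng.random() < sigmoid(-2 * x))


def accuracy(sampler, threshold=0.0):
    # Fixed classifier fit to the source: predict 1 iff x > threshold.
    hits = 0
    for _ in range(N):
        x, y = sampler()
        hits += int((x > threshold) == bool(y))
    return hits / N


results = {name: accuracy(fn) for name, fn in [
    ("source", sample_source),
    ("covariate shift", sample_covariate_shift),
    ("label shift", sample_label_shift),
    ("concept shift", sample_concept_shift),
]}
for name, acc in results.items():
    print(f"{name:16s} accuracy = {acc:.3f}")
```

In this setup, the threshold at 0 remains the Bayes-optimal rule under covariate shift, because $P(y|x)$ is unchanged. Under label shift, its accuracy stays near the source value only because the class-conditionals are symmetric, and the best threshold moves with the new prior. Under the inverted concept, accuracy drops to roughly one minus its source accuracy.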

## 2. Environment Shift
This broader concept refers to changes in the **underlying causal mechanisms** or data-generating processes, for example across **domains**, **tasks**, or **real-world conditions**. It overlaps with ideas from:
- **Out-of-distribution (OOD) generalization**
- **Domain adaptation**
- **Causal inference**

In reinforcement learning and other sequential decision-making problems, the "environment" is the system an agent interacts with. A shift in the environment (e.g., deploying a policy trained in simulation to the real world) can degrade performance if the agent has overfit to the training environment.
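
The simulation-to-reality gap can be illustrated with a hypothetical toy environment. `PushEnv` and its `gain` parameter are invented for this sketch and do not come from any RL library. An open-loop policy is tuned so that it reaches the target exactly under the simulator's dynamics, and it overshoots once the dynamics change at deployment:

```python
from dataclasses import dataclass


@dataclass
class PushEnv:
    """Hypothetical 1-D environment: each action moves the agent by gain * action."""
    gain: float
    target: float = 10.0
    steps: int = 5

    def rollout(self, policy):
        pos = 0.0
        for t in range(self.steps):
            pos += self.gain * policy(pos, t)
        return -abs(self.target - pos)  # reward: negative final distance to target


def open_loop_policy(pos, t):
    # "Overfit" to the simulator: always push 2.0, which reaches the target
    # exactly when gain == 1.0, and never looks at the observed position.
    return 2.0


sim = PushEnv(gain=1.0)   # training environment (simulation)
real = PushEnv(gain=1.5)  # deployment environment with different dynamics

print("sim reward: ", sim.rollout(open_loop_policy))
print("real reward:", real.rollout(open_loop_policy))
```

The policy ignores the observed position, so it cannot correct for the changed `gain`. This is one concrete form of overfitting to the training environment.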

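## 3. Importance of Understanding Environment and Distribution Shifts

A model that performs well on its training distribution can still degrade when it is deployed under shift, for example: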

- Self-driving cars trained in sunny California may struggle in snowy Norway.
- A disease classifier trained on hospital A’s data may misdiagnose patients in hospital B.

## 4. How to Handle Shifts

- **Domain adaptation**: Adapts a model trained on a source domain to perform well on a target domain, often using unlabeled target data.
- **Test-time adaptation**: Adjusts the model on the fly during deployment, typically using unlabeled test inputs.
- **Data augmentation and diversity**: Enriches training data to cover multiple modes or environments.
- **Causal models**: Rely on stable causal relationships, which are less sensitive to distribution shifts than spurious correlations.
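
Importance weighting is one common domain-adaptation technique for covariate shift, although the text above does not name it. Because $P(y|x)$ is shared, the test risk equals the training risk reweighted by $w(x) = p_{test}(x) / p_{train}(x)$, that is, $\mathbb{E}_{test}[\ell] = \mathbb{E}_{train}[w(x)\,\ell]$, provided $p_{train}(x) > 0$ wherever $p_{test}(x) > 0$. The sketch below is minimal. It assumes both input densities are known Gaussians, whereas in practice $w$ must be estimated. It also uses a hypothetical fixed classifier "predict 1 iff $x > 0$". The labeled test set serves only to check the estimate.

```python
import math
import random


def normal_pdf(x, mean, std=1.0):
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))


def p_y1_given_x(x):
    # Shared labeling rule P(y=1|x); identical for train and test (covariate shift).
    return 1.0 / (1.0 + math.exp(-2 * x))


def draw(rng, mean, n):
    # Sample (x, y) pairs with x ~ N(mean, 1) and y ~ Bernoulli(P(y=1|x)).
    data = []
    for _ in range(n):
        x = rng.gauss(mean, 1.0)
        data.append((x, int(rng.random() < p_y1_given_x(x))))
    return data


def zero_one_loss(x, y):
    # Hypothetical fixed classifier: predict 1 iff x > 0.
    return float(int(x > 0) != y)


def weight(x):
    # Importance weight w(x) = p_test(x) / p_train(x); densities assumed known here.
    return normal_pdf(x, 1.0) / normal_pdf(x, 0.0)


rng = random.Random(0)
train = draw(rng, mean=0.0, n=20000)  # P_train(x) = N(0, 1)
test = draw(rng, mean=1.0, n=20000)   # P_test(x)  = N(1, 1)

train_risk = sum(zero_one_loss(x, y) for x, y in train) / len(train)
weighted_risk = sum(weight(x) * zero_one_loss(x, y) for x, y in train) / len(train)
test_risk = sum(zero_one_loss(x, y) for x, y in test) / len(test)

print(f"unweighted train risk: {train_risk:.3f}")
print(f"importance-weighted:   {weighted_risk:.3f}")
print(f"actual test risk:      {test_risk:.3f}")
```

In this setup the unweighted training risk differs noticeably from the test risk, while the weighted average recovers the test risk up to sampling noise. The same weights can be applied to per-example losses during training to fit a model aimed at the test distribution.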
