# AIO Q1 — Noise Augmentation and Regularization

We study linear regression under input noise augmentation.

Model (scalar):

$$
\hat{y} = f_\theta(x) = \theta x, \qquad
L(\theta) = \mathbb{E}[(y - \theta x)^2]
$$


## Q1 — Forward: Expectations

Starting from

$$
L(\theta) = \mathbb{E}[(y - \theta x)^2],
$$

**expand** into terms involving $\mathbb{E}[y^2]$, $\mathbb{E}[xy]$, and $\mathbb{E}[x^2]$.
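One way to sanity-check an expansion is to evaluate both sides on a sample, since the identity holds exactly for empirical means as well as for expectations. The following is a minimal sketch; the data, the seed, and the value of `theta` are arbitrary choices made for illustration and are not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
x = rng.normal(size=1_000)                     # hypothetical inputs
y = 2.0 * x + rng.normal(scale=0.5, size=1_000)  # hypothetical targets
theta = 0.7                                    # arbitrary coefficient to test

# Left-hand side: empirical mean squared error.
lhs = np.mean((y - theta * x) ** 2)

# Right-hand side: the same quantity written with second moments only.
rhs = np.mean(y**2) - 2 * theta * np.mean(x * y) + theta**2 * np.mean(x**2)

print(lhs, rhs)  # the two values should agree up to floating-point error
```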


## Q2 — Define Augmented Loss (Scalar)

We introduce *noise augmentation* on the input:

$$
x' = x + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \quad \epsilon \perp (x, y).
$$

We train on noisy inputs using the augmented loss:

$$
L_{\text{aug}}(\theta) = \mathbb{E}_{x, y, \epsilon}\!\left[(y - \theta x')^2\right]
$$

**Expand** this loss into $\mathbb{E}[y^2]$, $\mathbb{E}[xy]$, $\mathbb{E}[x^2]$, and $\sigma^2$.


## Q3 — Simplify and Compare (Scalar)

Show that

$$
L_{\text{aug}}(\theta) = L(\theta) + \theta^2 \sigma^2
$$

**Question:**  
What does this additional $\theta^2 \sigma^2$ term represent *intuitively*?
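The identity can also be checked by Monte Carlo before proving it. The sketch below assumes a hypothetical data distribution and arbitrary values for `theta` and `sigma`; the agreement is only approximate because the expectations are replaced by sample means.

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
n = 200_000                                    # large sample to reduce Monte Carlo error
x = rng.normal(size=n)                         # hypothetical inputs
y = 2.0 * x + rng.normal(scale=0.5, size=n)    # hypothetical targets
theta, sigma = 0.7, 1.5                        # arbitrary test values

eps = rng.normal(scale=sigma, size=n)          # input noise, independent of (x, y)

loss_clean = np.mean((y - theta * x) ** 2)           # estimate of L(theta)
loss_aug = np.mean((y - theta * (x + eps)) ** 2)     # estimate of L_aug(theta)

gap = loss_aug - loss_clean
print(gap, theta**2 * sigma**2)  # the gap should be close to theta^2 * sigma^2
```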


## Q4 — The Minimizer (Scalar)

Find the minimizer $\theta^\star$ for:

1. $L(\theta)$  
2. $L_{\text{aug}}(\theta)$

**Compare:** How does noise augmentation affect the learned coefficient?
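A closed-form answer can be cross-checked against a brute-force grid search. The sketch below evaluates the expanded loss from Q1 and the extra term from Q3 on a grid of $\theta$ values. The moments are borrowed from the Q7 data-generating process purely as an example, and the noise level is an arbitrary choice.

```python
import numpy as np

# Population moments for x ~ N(0, 1), y = 2x + eta, eta ~ N(0, 0.5^2).
# Borrowed from the Q7 setup as an example; any valid moments would do.
E_xx, E_xy, E_yy = 1.0, 2.0, 4.25
sigma = 1.0  # arbitrary noise level for illustration

def loss(theta, sigma=0.0):
    """Expanded squared loss in terms of second moments, plus theta^2 * sigma^2."""
    return E_yy - 2 * theta * E_xy + theta**2 * E_xx + theta**2 * sigma**2

thetas = np.linspace(0.0, 3.0, 3001)                 # grid with step 0.001
theta_clean = thetas[np.argmin(loss(thetas))]        # grid minimizer of L
theta_aug = thetas[np.argmin(loss(thetas, sigma))]   # grid minimizer of L_aug

print(theta_clean, theta_aug)
```

Comparing the two printed values against your derived formulas gives a quick check on both the algebra and the direction of the effect.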


## Q5 — Regularization Connection (Scalar)

Ridge regression minimizes

$$
L_{\text{ridge}}(\theta) = \mathbb{E}[(y - \theta x)^2] + \lambda \theta^2
$$

**Show:** $L_{\text{aug}}(\theta)$ can be written in this form and **identify** the effective $\lambda$ in terms of $\sigma^2$.


## Q6 — Multivariate Augmentation

Now consider $d$-dimensional linear regression:

$$
\hat{y} = w^\top x, \quad x \in \mathbb{R}^d,
$$

and augment inputs as $x' = x + \epsilon$ with $\epsilon \sim \mathcal{N}(0, \sigma^2 I_d)$, independent of $(x, y)$.

We train on $x'$ by minimizing

$$
L_{\text{aug}}(w) = \mathbb{E}[(y - w^\top x')^2]
$$

**Show that**

$$
L_{\text{aug}}(w) = L(w) + \sigma^2 \|w\|_2^2,
$$

where $L(w) = \mathbb{E}[(y - w^\top x)^2]$.
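The multivariate identity can be checked numerically in the same way as the scalar one. This is a rough sketch under assumed values: the dimension, the generating weights `w_true`, the evaluation point `w`, and the noise level are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)                 # arbitrary seed
n, d, sigma = 200_000, 3, 0.8                  # hypothetical sizes and noise level
w_true = np.array([1.0, -2.0, 0.5])            # hypothetical generating weights
X = rng.normal(size=(n, d))
y = X @ w_true + rng.normal(scale=0.5, size=n)

w = np.array([0.3, -1.0, 2.0])                 # arbitrary point at which to compare losses
E = rng.normal(scale=sigma, size=(n, d))       # isotropic input noise, independent of (X, y)

loss_clean = np.mean((y - X @ w) ** 2)         # estimate of L(w)
loss_aug = np.mean((y - (X + E) @ w) ** 2)     # estimate of L_aug(w)
penalty = sigma**2 * np.sum(w**2)              # sigma^2 * ||w||_2^2

print(loss_aug - loss_clean, penalty)          # should agree up to Monte Carlo error
```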


## Q7 — Empirical Verification of the Ridge Connection

**Goal:**  
Verify numerically that *input noise* acts like an $\ell_2$ (ridge) penalty.

### Data-Generating Process (DGP)

We start from a clean linear relationship with output (label) noise:

$$
x_i \sim \mathcal{N}(0,1), \quad
\eta_i \sim \mathcal{N}(0, 0.5^2), \quad
y_i = 2x_i + \eta_i,
$$

for $i = 1, \dots, n$ with $n = 400$.

- The same dataset $(x_i, y_i)_{i=1}^n$ is reused for every noise level; only the input noise $\epsilon_i$ is redrawn.  
- Fix the random seed for reproducibility.

### Input Noise Augmentation

For each $\sigma \in \{0, 0.5, 1, 1.5, 2\}$:

1. Draw independent input noise  
   $$
   \epsilon_i \sim \mathcal{N}(0, \sigma^2), \quad \epsilon_i \perp (x_i, \eta_i)
   $$
2. Form noisy inputs  
   $$
   x'_i = x_i + \epsilon_i
   $$
3. Fit a **no-intercept linear model** using the closed-form least-squares estimator:
   $$
   \hat{\theta}(\sigma) = \frac{\sum_i x'_i y_i}{\sum_i (x'_i)^2}
   $$

### Experiment Steps

1. Compute $\hat{\theta}(\sigma)$ for each noise level.  
2. Plot $\hat{\theta}$ versus $\sigma$.  
3. Overlay the theoretical curve $\theta_{\text{pop}}(\sigma) = 2/(1+\sigma^2)$.
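The experiment above can be sketched as follows. This is one possible layout, not a reference solution: the seed is arbitrary, a single generator is used for both the data and the input noise, and plotting is left out so that the block depends only on NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)                 # fixed seed (arbitrary value)
n = 400

# Fixed dataset: y = 2x + eta with eta ~ N(0, 0.5^2).
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

sigmas = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
theta_hat = np.empty_like(sigmas)

for k, sigma in enumerate(sigmas):
    eps = rng.normal(scale=sigma, size=n)      # fresh input noise for this level
    x_noisy = x + eps
    # No-intercept least squares on the noisy inputs.
    theta_hat[k] = np.sum(x_noisy * y) / np.sum(x_noisy**2)

theta_pop = 2.0 / (1.0 + sigmas**2)            # theoretical curve from step 3

for s, th, tp in zip(sigmas, theta_hat, theta_pop):
    print(f"sigma={s:.1f}  theta_hat={th:.3f}  theta_pop={tp:.3f}")
```

With $n = 400$ the estimates fluctuate around the theoretical curve, so exact agreement is not expected. The arrays `sigmas`, `theta_hat`, and `theta_pop` can be passed to any plotting library for steps 2 and 3.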


## Q8 — Compare Augmentation to Explicit Ridge Regularization

**Goal:** Show that training with input noise matches explicit $\ell_2$ regularization in expectation, and approximately on a finite sample.

Let $(x_i, y_i)_{i=1}^n$ be any fixed dataset (you may use any dataset available to you).

**Steps**
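1. For $\lambda \in \{0, 0.25, 0.5, 1, 2\}$, fit the ridge estimator (no intercept). With the penalty applied per sample as in Q5, i.e. minimizing $\frac{1}{n}\sum_i (y_i - \theta x_i)^2 + \lambda \theta^2$, the closed form is
   $$
   \hat{\theta}_{\text{ridge}}(\lambda)
   = \frac{\sum_i x_i y_i}{\sum_i x_i^2 + n\lambda}
   $$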
2. For the same values, set $\sigma^2 = \lambda$ and perform *noise-augmented* training by replacing inputs with
   $x_i' = x_i + \epsilon_i$, where $\epsilon_i \sim \mathcal{N}(0, \sigma^2)$, independent of $(x_i, y_i)$, and then compute
   $$
   \hat{\theta}_{\text{aug}}(\sigma^2)
   = \frac{\sum_i x_i' y_i}{\sum_i (x_i')^2}
   $$
3. Plot both $\hat{\theta}_{\text{ridge}}$ and $\hat{\theta}_{\text{aug}}$ as functions of $\lambda$.

**Question.** Do the two curves coincide numerically? If not exactly, what do you think may explain small discrepancies?
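A minimal sketch of the comparison is given below, with several assumptions stated up front. The dataset is synthetic (same form as in Q7). The ridge penalty is applied to the mean squared error, so it enters the sum-form denominator as $n\lambda$; this is the scaling under which $\sigma^2 = \lambda$ corresponds to the expectation-based loss of Q5. The helper names `ridge_theta` and `aug_theta` and the `n_repeats` parameter are hypothetical. Averaging over repeated noise draws is an optional addition that separates the systematic effect from the randomness of a single draw.

```python
import numpy as np

def ridge_theta(x, y, lam):
    """Minimizer of mean((y - theta*x)^2) + lam * theta^2 (no intercept)."""
    return np.sum(x * y) / (np.sum(x**2) + len(x) * lam)

def aug_theta(x, y, sigma2, rng, n_repeats=1):
    """Least squares on noisy inputs, averaged over n_repeats noise draws."""
    estimates = []
    for _ in range(n_repeats):
        x_noisy = x + rng.normal(scale=np.sqrt(sigma2), size=len(x))
        estimates.append(np.sum(x_noisy * y) / np.sum(x_noisy**2))
    return float(np.mean(estimates))

rng = np.random.default_rng(0)                 # arbitrary seed
n = 400
x = rng.normal(size=n)                         # synthetic dataset (assumption)
y = 2.0 * x + rng.normal(scale=0.5, size=n)

lambdas = np.array([0.0, 0.25, 0.5, 1.0, 2.0])
theta_ridge = np.array([ridge_theta(x, y, lam) for lam in lambdas])
theta_aug_single = np.array([aug_theta(x, y, lam, rng) for lam in lambdas])
theta_aug_mean = np.array([aug_theta(x, y, lam, rng, n_repeats=200) for lam in lambdas])

for lam, tr, ta, tm in zip(lambdas, theta_ridge, theta_aug_single, theta_aug_mean):
    print(f"lambda={lam:.2f}  ridge={tr:.3f}  aug(1 draw)={ta:.3f}  aug(mean of 200)={tm:.3f}")
```

The single-draw column shows how much one realization of the noise can move the estimate, while the averaged column isolates what remains after that randomness is smoothed out.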
