# NumPy Basics: Stats, Normalization, and Smoothing

**Goal:** use NumPy arrays to implement the math-y parts of wrangling.

**You’ll do:**
- Compute descriptive stats and handle NaNs
- Normalize/standardize arrays
- Apply simple smoothing for noisy signals

## 1. Setup
- Import `numpy as np`.
- Assume we start from either a Pandas column (`df['col'].to_numpy()`) or a raw array `x`.

## 2. Descriptive Stats with NaNs
- `np.nanmean`, `np.nanmedian`, `np.nanstd` (population std by default, `ddof=0`; pass `ddof=1` for the sample std)
- Count NaNs: `np.isnan(x).sum()`
- Replace NaNs: `np.nan_to_num(x, nan=...)`
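
As a minimal sketch of the calls above: the sample array is made up, and filling NaNs with the mean is just one arbitrary choice of fill value, not a recommendation.

```python
import numpy as np

# Made-up sample data with two missing values.
x = np.array([1.0, 2.0, np.nan, 4.0, np.nan, 8.0])

naive_mean = np.mean(x)              # NaN: one missing value poisons the result
n_missing = int(np.isnan(x).sum())   # number of NaN entries

mean = np.nanmean(x)                 # mean over the non-NaN values only
median = np.nanmedian(x)
std = np.nanstd(x)                   # population std (ddof=0) by default

# Replace NaNs with a chosen fill value (here the NaN-aware mean, as an assumption).
x_filled = np.nan_to_num(x, nan=mean)
```

Note that `np.nan_to_num` also replaces infinities with large finite numbers unless you pass `posinf`/`neginf`, so check for `inf` separately if that matters.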

> Exercise: compute z-scores by hand: `z = (x - mean) / std` while ignoring NaNs.

## 3. Normalization & Standardization
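- Min-max to [0,1]: `x_minmax = (x - x.min()) / (x.max() - x.min())` (use `np.nanmin`/`np.nanmax` if `x` contains NaNs; undefined when all values are equal)
- Standard score: `x_std = (x - mu) / sigma`, where `mu` is the mean and `sigma` the standard deviation of `x`
- Robust scale (median/MAD): `x_robust = (x - median) / MAD`, where `MAD` is the median absolute deviation, `median(|x - median|)`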
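
The three formulas can be sketched as small helpers. The function names are hypothetical, the NaN-aware variants are used so missing values pass through as NaN, and none of the helpers guards against a zero denominator (constant input).

```python
import numpy as np

def minmax_scale(x):
    """Rescale to [0, 1], ignoring NaNs when finding the range."""
    lo, hi = np.nanmin(x), np.nanmax(x)
    return (x - lo) / (hi - lo)

def standard_scale(x):
    """Subtract the mean and divide by the (population) standard deviation."""
    return (x - np.nanmean(x)) / np.nanstd(x)

def robust_scale(x):
    """Subtract the median and divide by the median absolute deviation."""
    med = np.nanmedian(x)
    mad = np.nanmedian(np.abs(x - med))
    return (x - med) / mad

# Made-up vector with one extreme value.
x = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

x_minmax = minmax_scale(x)
x_std = standard_scale(x)
x_robust = robust_scale(x)
```

On this vector the single outlier inflates the mean and standard deviation, so the four ordinary values end up squeezed close together after standard scaling, while robust scaling keeps them one unit apart and leaves the outlier far out.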

> Exercise: compare standard vs robust scaling on a vector with a few extreme values.

## 4. Clipping and Winsorizing (array-only)
- Clip: `np.clip(x, a_min, a_max)` to bound absurd values before modeling
- Winsorize idea: replace extreme tails with boundary values (manual with percentiles)
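
A rough sketch of both ideas follows. `winsorize` is a hypothetical helper, the 1st/99th percentile defaults are only an example, and it assumes `x` has no NaNs (`np.nanpercentile` is the NaN-aware alternative).

```python
import numpy as np

def winsorize(x, lower_pct=1, upper_pct=99):
    """Replace values below/above the given percentiles with the percentile values."""
    lo, hi = np.percentile(x, [lower_pct, upper_pct])
    return np.clip(x, lo, hi)

# Clipping to fixed, hand-picked bounds.
raw = np.array([-5.0, 0.5, 12.0])
clipped = np.clip(raw, 0.0, 10.0)    # values outside [0, 10] are pulled to the bounds

# Winsorizing: the bounds come from the data itself.
x = np.arange(101, dtype=float)      # made-up data: 0, 1, ..., 100
w = winsorize(x)
```

The difference is where the bounds come from: `np.clip` alone uses limits you choose from domain knowledge, while winsorizing derives them from the empirical distribution.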

> Exercise: compute p1 and p99 using `np.percentile` and winsorize manually.

## 5. Smoothing / Noise Reduction
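- Simple moving average: use `np.convolve(x, np.ones(k)/k, mode='valid')`, which returns `len(x) - k + 1` values
- Centered vs causal windows: `mode='same'` returns a same-length result (centered for odd `k`) but zero-pads the edges, which pulls the first and last values toward zero; for a causal window, pad the front of `x` yourself and keep `mode='valid'`
- Exponential moving average (concept): `y[t] = alpha*x[t] + (1-alpha)*y[t-1]`, with `0 < alpha <= 1` and a seed value such as `y[0] = x[0]`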
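
One possible way to write these down is sketched here. The function names are hypothetical, and two choices are assumptions rather than the only options: the causal window is front-padded with the first value, and the EMA is seeded with `y[0] = x[0]`.

```python
import numpy as np

def sma(x, k):
    """Simple moving average over full windows only: len(x) - k + 1 values."""
    x = np.asarray(x, dtype=float)
    return np.convolve(x, np.ones(k) / k, mode="valid")

def sma_causal(x, k):
    """Same-length causal SMA: output[t] averages x[t-k+1 .. t].

    The front is padded by repeating x[0] (an assumption; other paddings work too).
    """
    x = np.asarray(x, dtype=float)
    padded = np.concatenate([np.full(k - 1, x[0]), x])
    return np.convolve(padded, np.ones(k) / k, mode="valid")

def ema(x, alpha):
    """Exponential moving average, seeded with the first observation."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    for t in range(1, len(x)):
        y[t] = alpha * x[t] + (1 - alpha) * y[t - 1]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x_sma = sma(x, 3)            # 3 values
x_causal = sma_causal(x, 3)  # 5 values, uses only current and past samples
x_ema = ema(x, 0.5)          # 5 values
```

The EMA loop is plain Python because each output depends on the previous one; for long series a library routine may be faster, but the loop keeps the recurrence visible.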

> Exercise: implement a function that returns both SMA and EMA for a series and compare.

## 6. Matrix Ops You’ll Actually Use
- Broadcasting rules to combine vectors and columns
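- Fast pairwise squared distances: for `A` of shape `(n, d)` and `B` of shape `(m, d)`, `(A[:,None] - B[None,:])**2` then `sum(axis=-1)` gives an `(n, m)` array (take `np.sqrt` for Euclidean distance; the intermediate is `(n, m, d)`, so watch memory)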
- Batch transforms: apply scaling to 2D arrays column-wise with broadcasting
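
A small sketch of both broadcasting patterns, with made-up arrays chosen so the results are easy to check by hand:

```python
import numpy as np

# Pairwise distances between every row of A and every row of B.
A = np.array([[0.0, 0.0], [3.0, 4.0]])               # shape (2, 2): n=2 points, d=2
B = np.array([[0.0, 0.0], [0.0, 4.0], [3.0, 0.0]])   # shape (3, 2): m=3 points

diff = A[:, None] - B[None, :]        # (2, 1, 2) - (1, 3, 2) broadcasts to (2, 3, 2)
sq_dists = (diff ** 2).sum(axis=-1)   # squared Euclidean distances, shape (2, 3)
dists = np.sqrt(sq_dists)

# Column-wise standardization of a 2D array.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])   # 3 rows, 2 columns
col_mean = X.mean(axis=0)             # shape (2,), one mean per column
col_std = X.std(axis=0)               # shape (2,), one std per column
X_scaled = (X - col_mean) / col_std   # (3, 2) op (2,) broadcasts across rows
```

In the scaling step the per-column statistics have shape `(2,)`, which lines up with the last axis of `X`, so every row is shifted and scaled by the same column values without an explicit loop.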

**Takeaways**
- NumPy is low-level: vectorized array operations give you predictable performance.
- Handle NaNs explicitly or your stats lie.
- Prefer robust scaling when outliers are expected.