# **Chapter 15: Feature Scaling and Normalization**

## **15.1 Why Scale Features?**

Feature scaling transforms features to comparable ranges, essential for machine learning algorithms sensitive to magnitude differences. In NEPSE (Nepal Stock Exchange) prediction systems, scaling addresses fundamental challenges arising from heterogeneous data dimensions.

**The Magnitude Problem in NEPSE Data:**
Consider a typical NEPSE dataset row:
- **Close Price**: Rs. 450.50 (hundreds)
- **Volume**: 125,000 shares (hundreds of thousands)
- **Turnover**: NPR 56,250,000 (millions)
- **Diff %**: 2.35 (units)
- **RSI**: 68.5 (tens)

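Without scaling, distance-based algorithms (KNN, K-Means, SVM with an RBF kernel) let turnover (millions) dominate every distance computation while RSI (tens) barely registers, regardless of actual predictive power. Gradient-based optimizers also suffer: when features differ in scale by orders of magnitude, the loss surface becomes elongated, so a single learning rate is too large along some directions and too small along others. Plain SGD converges slowly; adaptive optimizers such as Adam partially compensate through per-parameter step sizes but still train more reliably on scaled inputs.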
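To make the distance argument concrete, the sketch below compares each feature's share of the squared Euclidean distance between two hypothetical NEPSE rows, before and after dividing by a per-feature standard deviation. The feature values and the standard deviations are invented for illustration, not estimated from real data.

```python
import numpy as np

# Two hypothetical trading days: [Close, Volume, Turnover, Diff %, RSI]
day_a = np.array([450.50, 125_000, 56_250_000, 2.35, 68.5])
day_b = np.array([455.00, 130_000, 58_000_000, -1.10, 31.0])

# Illustrative per-feature standard deviations (assumed, not estimated from data)
feature_std = np.array([25.0, 40_000, 15_000_000, 1.8, 12.0])

diff_raw = day_a - day_b
share_raw = diff_raw**2 / np.sum(diff_raw**2)  # each feature's share of squared distance

diff_scaled = diff_raw / feature_std
share_scaled = diff_scaled**2 / np.sum(diff_scaled**2)

for name, raw, scaled in zip(['Close', 'Volume', 'Turnover', 'Diff %', 'RSI'],
                             share_raw, share_scaled):
    print(f'{name:>8}: raw {raw:8.4%}   scaled {scaled:8.4%}')
```

Unscaled, turnover accounts for essentially all of the distance; after scaling, RSI and Diff % (which differ most in standard-deviation units) drive it instead.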

**Specific NEPSE Challenges:**
1. **Price Heterogeneity**: NEPSE share prices range from Rs. 10 (penny stocks) to Rs. 10,000+ (the highest-priced issues). Raw price levels obscure percentage-based patterns.
2. **Volume Variability**: Institutional stocks trade millions of shares daily; illiquid micro-caps trade hundreds. Raw volume creates outliers that dominate models.
3. **Indicator Mixing**: Technical indicators (0-100 range) mixed with fundamental ratios (0-50 range) and absolute prices (10-10,000 range) require harmonization.
4. **Temporal Drift**: As NEPSE indices rise over the years, absolute price levels inflate, so a model trained on one era's raw prices transfers poorly to another unless features are scaled or expressed as returns.

**When Scaling is Critical:**
- **Distance- and variance-based methods**: KNN, K-Means, SVM (RBF kernel), PCA
- **Gradient descent**: Neural Networks, Linear/Logistic Regression
- **Regularization**: L1/L2 penalties apply uniformly; unscaled features receive inappropriate penalization
- **Mixed ensembles**: Tree-based models themselves are scale-invariant, but stacked or blended ensembles that include linear, SVM, or neural members still need scaled inputs for those members

**When Scaling is Optional:**
- Tree-based models: Decision Trees, Random Forest, Gradient Boosting (scale-invariant splits)
- Rule-based systems: If-then logic unaffected by magnitude
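The scale-invariance of trees can be checked directly. This sketch, on synthetic data, fits the same scikit-learn decision tree to raw and to standardized features and compares the in-sample predictions. Standardization preserves each feature's ordering, so the tree sees the same candidate partitions.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
n = 300
# Synthetic features on very different scales: turnover-like and RSI-like
X = np.column_stack([
    rng.lognormal(mean=17, sigma=1, size=n),  # turnover in NPR (tens of millions)
    rng.uniform(0, 100, size=n),              # RSI
])
y = 0.5 * np.log(X[:, 0]) + 0.02 * X[:, 1] + rng.normal(0, 0.1, size=n)

X_scaled = StandardScaler().fit_transform(X)

tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
tree_scaled = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_scaled, y)

# Same ordering of feature values -> same candidate partitions -> same predictions
same = np.allclose(tree_raw.predict(X), tree_scaled.predict(X_scaled))
print('Predictions identical:', same)
```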

---

## **15.2 Standardization (Z-Score)**

Standardization transforms features to have zero mean and unit variance, preserving distribution shape while removing location and scale parameters.

**Mathematical Foundation:**
$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ is the mean and $\sigma$ the standard deviation of the feature. For NEPSE time-series, we estimate both from a rolling window of past observations to prevent lookahead bias.

```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler

def calculate_rolling_zscore(df, column, window=252, min_periods=30):
    """
    Calculate rolling Z-Score for NEPSE time-series features.
    
    Uses expanding window for initial periods to maximize data usage,
    then rolling window to maintain stationarity.
    
    Parameters:
    -----------
    df : pd.DataFrame
        NEPSE data with time-series index
    column : str
        Column to standardize (e.g., 'Close', 'Volume', 'Turnover')
    window : int
        Rolling window size (default 252 ≈ 1 year of NEPSE trading days)
    min_periods : int
        Minimum observations required for calculation
    
    Returns:
    --------
    pd.Series
        Z-Score normalized values
    """
    # Calculate rolling statistics
    rolling_mean = df[column].rolling(window=window, min_periods=min_periods).mean()
    rolling_std = df[column].rolling(window=window, min_periods=min_periods).std()
    
    # Handle zero standard deviation (constant values)
    rolling_std = rolling_std.replace(0, np.nan)
    
    # Calculate Z-Score
    zscore = (df[column] - rolling_mean) / rolling_std
    
    # Forward-fill gaps left by zero-variance windows (optional; dropping them also works).
    # Rows before min_periods stay NaN because there is no earlier value to fill from.
    zscore = zscore.ffill()
    
    return zscore

def standardize_nepse_features(df, feature_groups=None, min_periods=30):
    """
    Apply appropriate standardization to different NEPSE feature groups.
    
    Different feature types require different window sizes:
    - Price features: Long window (252 days) to capture regime changes
    - Volume features: Medium window (63 days ≈ 3 months) for liquidity cycles
    - Volatility features: Short window (21 days) for current risk regime
    
    Assumes rows are sorted by date. `_Z` columns treat all rows as one
    series (correct for single-stock data); `_Z_XS` columns are computed
    per Symbol when a 'Symbol' column is present.
    """
    df_scaled = df.copy()
    
    if feature_groups is None:
        feature_groups = {
            'price': ['Close', 'Open', 'High', 'Low', 'VWAP'],
            'volume': ['Vol', 'Turnover'],
            'volatility': ['ATR', 'Range_Pct'],
            'technical': ['RSI', 'MACD', 'Stoch_K']  # Already bounded, but scale for consistency
        }
    
    window_map = {
        'price': 252,      # Annual cycle
        'volume': 63,      # Quarterly liquidity patterns
        'volatility': 21,  # Monthly risk regime
        'technical': 63    # Quarterly momentum cycles
    }
    
    for group, columns in feature_groups.items():
        window = window_map.get(group, 63)
        # pandas requires min_periods <= window (the volatility window is only 21)
        mp = min(min_periods, window)
        
        for col in columns:
            if col not in df.columns:
                continue
            
            # Rolling Z-score; behaves like an expanding window until 'window' rows exist
            df_scaled[f'{col}_Z'] = calculate_rolling_zscore(
                df, col, window=window, min_periods=mp
            )
            
            # Per-symbol variant: groupby('Symbol') scales each stock against its own
            # rolling history. Despite the _XS suffix, this is a per-stock time-series
            # Z-score, not a same-date comparison across stocks.
            if 'Symbol' in df.columns:
                df_scaled[f'{col}_Z_XS'] = df.groupby('Symbol')[col].transform(
                    lambda x: (x - x.rolling(window, min_periods=mp).mean())
                    / x.rolling(window, min_periods=mp).std()
                )
    
    return df_scaled

# Detailed Explanation:
#
# 1. Rolling vs Expanding Windows:
#    - Rolling(window=252): Uses only last 252 days, adapts to regime changes
#    - Expanding: Uses all history since start, stable but slow to adapt
#    - For NEPSE, rolling is preferred because market regimes shift (bull/bear cycles)
#
# 2. Why 252 days?
#    - 252 is the conventional trading-year length; NEPSE trades Sunday-Thursday,
#      so the actual count depends on Nepali holidays (check the calendar you use)
#    - An annual window covers a full year of earnings seasons and fiscal-year effects
#    - Shorter windows (20, 60) create noise; longer windows (500) lag regime shifts
#
# 3. Per-Symbol Standardization (Z_XS suffix):
#    - Groups by Symbol, calculates Z-score relative to stock's own history
#      (a per-stock time-series Z-score, not a same-date comparison across stocks)
#    - Essential for NEPSE because different stocks have different volatility profiles
#    - Example: NTC (low vol, Rs 800) and micro-cap (high vol, Rs 50) become comparable
#
# 4. Handling Zero Std:
#    - Some features (e.g., the price of a halted or untraded stock) can stay
#      constant over a whole window, giving zero variance
#    - Replace with NaN to avoid division by zero, then forward fill or drop
```

**Code Explanation:**

The `calculate_rolling_zscore` function implements time-series-aware standardization, which is critical for NEPSE data. sklearn's `StandardScaler` computes a single mean and standard deviation from whatever data it is fit on; fitting it on the full history lets future prices shape the scaling of past rows (lookahead bias). This function instead uses **rolling windows**, so each day's Z-score depends only on that day and the observations before it.
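The leakage is easy to see with `StandardScaler` itself. In this sketch on a synthetic, upward-drifting price series (the 500-day length and the 400-day training cut-off are arbitrary choices), fitting on all rows pulls the later, higher prices into the mean used for the training rows; fitting on the training slice only avoids that.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
# Synthetic close prices with an upward drift (a bull run over the sample)
close = 400 + np.cumsum(rng.normal(1.0, 3.0, size=500))
X = close.reshape(-1, 1)
train, test = X[:400], X[400:]

leaky = StandardScaler().fit(X)        # statistics include the later, higher prices
honest = StandardScaler().fit(train)   # statistics from the training period only

print('Mean used (fit on all rows)  :', round(leaky.mean_[0], 2))
print('Mean used (fit on train only):', round(honest.mean_[0], 2))

# Apply the training-period statistics to both periods
train_z = honest.transform(train)
test_z = honest.transform(test)
```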

**Rolling vs. Expanding:**
The function uses a rolling window of 252 days (approximately one NEPSE trading year) rather than expanding statistics. This ensures the scaler adapts to **regime changes**—for example, when NEPSE transitions from a low-volatility bull market to high-volatility bear market, the Z-scores recalibrate using only recent data, preventing the "ghost" of old regimes from distorting current signals.

**Pooled vs. Per-Symbol Standardization:**
The `standardize_nepse_features` function creates two versions of each feature:
1. **Time-series Z** (`Close_Z`): Rolling mean/std over the rows as given. This is correct for a single stock's history; on a multi-stock panel the window would mix different stocks.
2. **Per-symbol Z** (`Close_Z_XS`): The same rolling Z-score computed separately for each stock via `groupby('Symbol')`, so each stock is measured against its own history. Despite the `_XS` suffix, this is not cross-sectional standardization in the usual sense, which compares different stocks on the same date.

This is crucial for NEPSE because comparing a stable blue-chip (NTC, volatility 15% annually) against a speculative micro-cap (volatility 80%) requires each to be scaled relative to its own baseline. A raw price move of Rs. 10 is significant for a Rs. 50 stock (20%) but noise for a Rs. 1000 stock (1%). Per-symbol Z-scores normalize these differences.
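For contrast, cross-sectional standardization in the usual finance sense compares stocks with each other on the same trading date rather than with their own past. A minimal sketch, assuming a long-format panel with hypothetical `Date`, `Symbol`, and `Return` columns and made-up values:

```python
import pandas as pd

panel = pd.DataFrame({
    'Date':   ['2024-01-01'] * 3 + ['2024-01-02'] * 3,
    'Symbol': ['NTC', 'MICRO', 'BANK'] * 2,
    'Return': [0.5, 4.0, 1.0, -0.2, -3.0, 0.4],  # daily % change (illustrative)
})

def zscore(x):
    return (x - x.mean()) / x.std()

# Cross-sectional: standardize across all symbols traded on the same date
# (uses only same-day information, so there is no lookahead)
panel['Return_XS'] = panel.groupby('Date')['Return'].transform(zscore)

# A per-stock time-series Z-score would instead group by Symbol over a rolling window
print(panel)
```

Each date's values now have mean 0 and standard deviation 1, so a stock's `Return_XS` says how it moved relative to the rest of the market that day.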

**NEPSE-Specific Considerations:**
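- **252-day window**: Spans roughly one trading year, the length of a fiscal year and its quarterly reporting cycle (252 is a conventional approximation; check the actual NEPSE trading calendar)
- **min_periods=30**: Requires about six weeks of trading days before producing a value (prevents erratic early values)
- **Holiday handling**: With an integer `window`, pandas counts rows rather than calendar days, so Nepali holidays (days with no rows) are simply skipped; pass a time-based window such as `'365D'` on a `DatetimeIndex` if the window should cover a fixed calendar span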
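The difference between a row-count window and a calendar window shows up on a toy series. The dates and prices below are made up, with a gap standing in for a holiday break: `rolling(3)` always uses the last three trading rows, while `rolling('3D')` uses whatever rows fall within the last three calendar days.

```python
import pandas as pd

# Hypothetical trading dates with a multi-day gap (e.g., a festival holiday)
dates = pd.to_datetime(['2024-10-06', '2024-10-07', '2024-10-08',
                        '2024-10-13', '2024-10-14'])
close = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0], index=dates)

by_rows = close.rolling(3, min_periods=1).count()  # last 3 trading rows, gaps ignored
by_time = close.rolling('3D').count()              # rows within the last 3 calendar days

print(pd.DataFrame({'rows_window': by_rows, 'time_window': by_time}))
```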

---

## **15.3 Min-Max Normalization**

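Min-Max scaling maps features to a fixed range (typically [0, 1] or [-1, 1]). Like Z-score standardization, it is a linear transformation, so it preserves the shape of the distribution; neither method makes data normally distributed. Unlike Z-score, its output is bounded, but it is sensitive to outliers because a single extreme value sets the minimum or maximum.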

**Mathematical Formula:**
$$x_{scaled} = \frac{x - x_{min}}{x_{max} - x_{min}}$$

For NEPSE bounded indicators, Min-Max with fixed bounds is natural because the bounds are known in advance: RSI and the Stochastic Oscillator span 0 to 100, while Williams %R spans -100 to 0.

```python
def calculate_rolling_minmax(df, column, window=252, feature_range=(0, 1)):
    """
    Calculate rolling Min-Max scaling for NEPSE features.
    
    Critical for bounded indicators and neural network inputs.
    Handles look-ahead bias by using only past data for min/max calculation.
    
    Parameters:
    -----------
    feature_range : tuple
        Desired output range, default (0, 1)
    """
    min_val, max_val = feature_range
    
    # Rolling min/max (lookback only)
    rolling_min = df[column].rolling(window=window, min_periods=30).min()
    rolling_max = df[column].rolling(window=window, min_periods=30).max()
    
    # Scale to range
    range_size = rolling_max - rolling_min
    range_size = range_size.replace(0, np.nan)  # Avoid division by zero
    
    scaled = (df[column] - rolling_min) / range_size
    scaled = scaled * (max_val - min_val) + min_val
    
    return scaled

def minmax_scale_nepse_indicators(df, window=252):
    """
    Apply Min-Max scaling to bounded NEPSE technical indicators.
    
    Appropriate for:
    - RSI (0-100) → Scale to (0, 1)
    - Stochastic Oscillator (0-100)
    - Position in Range (0-100)
    - Percentile Rank (0-100)
    
    Assumes a single stock's data sorted by date.
    """
    # Indicators with known theoretical bounds (scaled without using any history)
    theoretical_bounds = {
        'RSI': (0, 100),
        'Stoch_K': (0, 100),
        'Stoch_D': (0, 100),
        'Position_52W_Range': (0, 100),
        'Percentile_20': (0, 100),
    }
    # Indicators without assumed bounds fall back to rolling (lookback-only) min/max
    empirical_indicators = ['Close_Location']
    
    df_scaled = df.copy()
    
    for indicator, (low, high) in theoretical_bounds.items():
        if indicator in df.columns:
            # Fixed bounds need no estimation from data, so nothing can leak
            df_scaled[f'{indicator}_Norm'] = (df[indicator] - low) / (high - low)
    
    for indicator in empirical_indicators:
        if indicator in df.columns:
            df_scaled[f'{indicator}_Norm'] = calculate_rolling_minmax(df, indicator, window=window)
    
    # Special handling for Price data (no theoretical bound):
    # use the recent 252-day high/low for normalization (support/resistance levels)
    if 'Close' in df.columns:
        df_scaled['Close_Norm'] = calculate_rolling_minmax(df, 'Close', window=window, feature_range=(0, 1))
    
    return df_scaled

def robust_minmax_scaling(df, column, lookback=252, quantile_range=(0.05, 0.95), min_periods=30):
    """
    Robust Min-Max using percentiles instead of min/max to handle outliers.
    
    Critical for NEPSE volume data which has extreme spikes during news events.
    """
    rolling = df[column].rolling(lookback, min_periods=min_periods)
    lower_q = rolling.quantile(quantile_range[0])
    upper_q = rolling.quantile(quantile_range[1])
    
    # Clip to the trailing percentile band, then rescale to [0, 1]
    clipped = df[column].clip(lower_q, upper_q)
    spread = (upper_q - lower_q).replace(0, np.nan)  # Avoid division by zero
    scaled = (clipped - lower_q) / spread
    
    return scaled

# Explanation:
#
# 1. Theoretical vs Empirical Bounds:
#    - RSI theoretically bounded 0-100, so use fixed bounds (0, 100)
#    - Price has no theoretical bound, so use rolling 252-day high/low
#    - Using fixed bounds for bounded indicators prevents look-ahead bias
#
# 2. Why Min-Max for Neural Networks?
#    - Sigmoid/tanh saturate when their weighted-sum inputs are large; small,
#      bounded features keep those sums in the responsive range
#    - Z-scores are unbounded; fat-tailed outliers push units into saturation,
#      where gradients vanish
#    - Lookback Min-Max bounds every value to the target range (each value lies
#      inside its own window, which includes the current day)
#
# 3. Robust Scaling:
#    - NEPSE volume can spike 10x normal during IPO announcements or earnings
#    - Standard Min-Max compresses normal data into tiny range when outliers exist
#    - Using 5th-95th percentiles clips outliers while preserving 90% of data distribution
```

**Code Explanation:**

The `calculate_rolling_minmax` function implements **lookback-only Min-Max scaling**, crucial for preventing data leakage in time-series. sklearn's `MinMaxScaler` learns a single global min/max from whatever data it is fit on, so fitting it on the full history folds future extremes into the scaling of past values. This function instead uses a rolling window, so each day's scaling depends only on the trailing 252 days, including the current one.

**Theoretical vs. Empirical Bounds:**
For bounded indicators like RSI (0-100) or Stochastic Oscillator (0-100), the function uses **theoretical bounds** rather than empirical rolling min/max. This is critical because:
1. **Prevents compression**: If NEPSE has a strong bull run and RSI never drops below 40 in a year, rolling Min-Max would scale 40→0 and 80→1, losing the absolute overbought/oversold context. Fixed bounds preserve semantic meaning (70+ always = overbought).
2. **Avoids look-ahead**: Theoretical bounds don't require historical data to estimate.
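The compression effect in point 1 can be reproduced with a synthetic RSI series that stays between 40 and 80 (an assumed bull-market range). Fixed bounds keep RSI 70 at 0.70; a rolling Min-Max maps the window's own minimum to 0 and maximum to 1, so the same reading is rescaled.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
rsi = pd.Series(rng.uniform(40, 80, size=300))
rsi.iloc[-1] = 70.0  # today's reading

# Fixed theoretical bounds (0-100): the meaning of a value is stable across regimes
rsi_fixed = (rsi - 0) / (100 - 0)

# Rolling lookback Min-Max over 252 days: bounds come from recent data only
roll_min = rsi.rolling(252).min()
roll_max = rsi.rolling(252).max()
rsi_rolling = (rsi - roll_min) / (roll_max - roll_min)

print('Fixed bounds  :', round(rsi_fixed.iloc[-1], 3))
print('Rolling bounds:', round(rsi_rolling.iloc[-1], 3))
```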

**Robust Min-Max:**
The `robust_minmax_scaling` function addresses NEPSE's **fat-tailed distributions**—particularly volume data where a single corporate announcement can cause 20x normal trading volume. Using percentiles (5th and 95th) instead of absolute min/max prevents these outliers from compressing the scaling of normal trading days into a tiny range.
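A quick synthetic check of the compression argument: one volume spike of roughly 20x the typical level (the numbers are invented) squeezes ordinary days into a narrow band under plain Min-Max, while 5th-95th percentile bounds keep them spread across the full range. This sketch uses full-sample statistics for brevity; the rolling versions above apply the same idea with a lookback window.

```python
import numpy as np

rng = np.random.default_rng(3)
volume = rng.normal(100_000, 15_000, size=252)
volume[100] = 2_000_000  # one news-driven spike (~20x typical)

normal_days = np.delete(np.arange(volume.size), 100)

# Plain Min-Max: the spike sets the maximum
plain = (volume - volume.min()) / (volume.max() - volume.min())

# Percentile-based Min-Max: clip to the 5th-95th percentile band first
lo, hi = np.percentile(volume, [5, 95])
robust = (np.clip(volume, lo, hi) - lo) / (hi - lo)

print('Spread of ordinary days (plain) :', round(np.ptp(plain[normal_days]), 3))
print('Spread of ordinary days (robust):', round(np.ptp(robust[normal_days]), 3))
```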

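**NEPSE Application:**
For LSTM/GRU networks processing NEPSE sequences, Min-Max scaling to [-1, 1] is a common choice over raw Z-scores because:
- LSTM gates apply a sigmoid, and cell updates a tanh, to weighted sums of the inputs; large input values push those sums into the flat, saturated regions where gradients vanish
- Bounded inputs keep the weighted sums moderate, at least at initialization
- Z-scores outside [-3, 3] are common in fat-tailed financial data and can drive this saturation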
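With scikit-learn, one way to apply this is `MinMaxScaler(feature_range=(-1, 1))` fit on the training period only. Live data can move beyond the training range; the `clip=True` option (available in scikit-learn 0.24 and later) caps such values at the bounds. The price series here is synthetic.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(5)
train_close = 400 + np.cumsum(rng.normal(0, 3, size=400))
# Two live prices outside the training range
live_close = np.array([train_close.max() + 50, train_close.min() - 50])

scaler = MinMaxScaler(feature_range=(-1, 1))
scaler.fit(train_close.reshape(-1, 1))
unclipped = scaler.transform(live_close.reshape(-1, 1)).ravel()

clipping_scaler = MinMaxScaler(feature_range=(-1, 1), clip=True)
clipping_scaler.fit(train_close.reshape(-1, 1))
clipped = clipping_scaler.transform(live_close.reshape(-1, 1)).ravel()

print('Without clip:', unclipped.round(3))  # falls outside [-1, 1]
print('With clip   :', clipped)             # capped at the bounds
```

Clipping keeps network inputs bounded, at the cost of hiding how far beyond the training range a new value lies; refitting periodically is the usual complement.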

---

## **15.4 Robust Scaling**

Robust scaling uses median and interquartile range (IQR) instead of mean and standard deviation, making it resistant to outliers—critical for NEPSE's occasional extreme moves.

**Mathematical Formula:**
$$x_{robust} = \frac{x - median}{IQR}$$

Where $IQR = Q_3 - Q_1$ (75th percentile - 25th percentile).
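As a sanity check on the formula, a manual median/IQR computation matches scikit-learn's `RobustScaler` with its default `quantile_range=(25.0, 75.0)`. The data below are synthetic, fat-tailed values drawn from a Student-t distribution.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler

rng = np.random.default_rng(11)
returns = rng.standard_t(df=3, size=500)  # fat-tailed, like many daily return series

median = np.median(returns)
q25, q75 = np.percentile(returns, [25, 75])
manual = (returns - median) / (q75 - q25)

sk = RobustScaler().fit_transform(returns.reshape(-1, 1)).ravel()

print('Median:', round(median, 4), ' IQR:', round(q75 - q25, 4))
print('Manual matches RobustScaler:', np.allclose(manual, sk))
```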

```python
def calculate_rolling_robust_scale(df, column, window=252):
    """
    Calculate robust scaling using median and IQR.
    
    Outliers (e.g., NEPSE circuit breaker days) have minimal impact.
    """
    # Rolling median (50th percentile)
    rolling_median = df[column].rolling(window=window).median()
    
    # Rolling IQR (75th - 25th percentile)
    q75 = df[column].rolling(window=window).quantile(0.75)
    q25 = df[column].rolling(window=window).quantile(0.25)
    iqr = q75 - q25
    
    # Avoid division by zero
    iqr = iqr.replace(0, np.nan)
    
    robust_scaled = (df[column] - rolling_median) / iqr
    
    return robust_scaled

def detect_and_scale_outliers(df, column, method='robust', threshold=3):
    """
    Detect outliers using robust scaling, then optionally winsorize.
    
    NEPSE specific: Daily price limits ('circuits') create natural outliers
    that should be preserved (information content) but scaled appropriately.
    """
    if method == 'robust':
        scaled = calculate_rolling_robust_scale(df, column)
        # Values > 3 or < -3 are outliers (beyond 3 IQRs)
        is_outlier = (abs(scaled) > threshold)
        
        # Winsorize (clip) extreme outliers to threshold
        winsorized = scaled.clip(-threshold, threshold)
        
        return pd.DataFrame({
            f'{column}_Robust': scaled,
            f'{column}_Winsorized': winsorized,
            'Is_Outlier': is_outlier
        })
    
    elif method == 'zscore':
        zscore = (df[column] - df[column].rolling(252).mean()) / df[column].rolling(252).std()
        is_outlier = (abs(zscore) > threshold)
        return pd.DataFrame({
            f'{column}_ZScore': zscore,
            'Is_Outlier': is_outlier
        })
    
    raise ValueError(f"Unknown method: {method!r} (expected 'robust' or 'zscore')")

# Comparison for a NEPSE Circuit Scenario:
# If a few days in the window hit the daily upper price limit:
# - Z-Score: those days inflate the rolling std, compressing ordinary days toward 0
# - Robust: median and IQR ignore the tails, so ordinary days keep their spread,
#   while the limit-hit days still show up as large values (cap them by winsorizing)
#
# This is why Robust Scaling is preferred for NEPSE high-volatility periods
```

**Explanation:**

Robust scaling is essential for NEPSE due to **circuit breaker mechanisms** and low-float manipulation risks that create extreme outliers. When stocks hit the daily upper circuit, those days inflate the rolling standard deviation used by Z-score scaling, squeezing ordinary days toward zero, while the extreme observations themselves can still dominate loss functions and gradient updates.

**Median vs. Mean:**
The median is unaffected by extreme values. If NEPSE has 249 normal days and 3 circuit-breaker days, the median remains representative of typical trading, while the mean shifts toward the outliers.

**IQR vs. Std:**
The IQR (Interquartile Range) measures spread using the middle 50% of data (Q3-Q1), ignoring the extreme tails. This prevents the "tail wagging the dog" where rare volatility spikes inflate the denominator and compress normal variations toward zero.
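A small numeric illustration of the denominator effect, using synthetic values for 249 ordinary days and 3 extreme days (mirroring the example above): the extreme days noticeably inflate the standard deviation and pull the mean, but barely move the IQR or the median.

```python
import numpy as np

rng = np.random.default_rng(2)
ordinary = rng.normal(0.0, 1.0, size=249)  # typical daily % moves
extreme = np.array([9.5, 9.8, -9.9])        # three limit-style moves (illustrative)
moves = np.concatenate([ordinary, extreme])

def iqr(x):
    q25, q75 = np.percentile(x, [25, 75])
    return q75 - q25

print('Std (ordinary only / with extremes):', round(ordinary.std(), 3), round(moves.std(), 3))
print('IQR (ordinary only / with extremes):', round(iqr(ordinary), 3), round(iqr(moves), 3))
print('Mean / median with extremes:', round(moves.mean(), 3), round(np.median(moves), 3))
```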

**Winsorization:**
The `detect_and_scale_outliers` function clips robust-scaled values to ±3, i.e., three IQRs from the median. This preserves the information that "today was extreme" (value = 3) without allowing the exact magnitude (which might be 10 or 100) to distort the model. For NEPSE, this handles flash crashes and pump-and-dump schemes gracefully.

---

## **15.5 Power Transformations**

Power transformations stabilize variance and reduce skewness, making distributions closer to Gaussian. This helps models that are sensitive to skew and heavy tails, such as linear models and neural networks, even though neither strictly requires normally distributed inputs.

### **15.5.1 Log Transformation**

Log transformation compresses large values and expands small ones, useful for right-skewed financial data (prices, market caps, turnover).

```python
def apply_log_transform(df, columns, offset=1):
    """
    Apply natural log transformation to NEPSE financial data.
    
    Log transform is appropriate for:
    - Prices (exponential growth assumption)
    - Turnover (highly skewed, multiplicative effects)
    - Volume (exponential distribution)
    
    Offset handles zero values (e.g., zero volume days).
    """
    df_transformed = df.copy()
    
    for col in columns:
        if offset == 1:
            # log1p computes log(1 + x) accurately even for tiny x; zero volume maps to 0
            df_transformed[f'{col}_Log'] = np.log1p(df[col])
        else:
            # A general positive offset keeps the argument above zero (log(0) = -inf)
            df_transformed[f'{col}_Log'] = np.log(df[col] + offset)
    
    return df_transformed

# NEPSE Example:
# Turnover distribution is highly right-skewed:
# - Most days: 1-10 million NPR
# - High activity days: 100+ million NPR
# - Log transform pulls this much closer to Gaussian, which typically helps linear models
```

### **15.5.2 Box-Cox Transformation**

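Box-Cox is a family of power transformations indexed by $\lambda$ and defined for strictly positive $y$. The $\lambda$ that makes the data most Gaussian is usually estimated by maximum likelihood.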

$$y(\lambda) = \begin{cases} \frac{y^\lambda - 1}{\lambda} & \text{if } \lambda \neq 0 \\ \log(y) & \text{if } \lambda = 0 \end{cases}$$
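In SciPy, `scipy.stats.boxcox` estimates $\lambda$ by maximum likelihood when none is given and returns the transformed data together with that estimate; `scipy.special.boxcox` applies the formula for a known $\lambda$ (including the $\lambda = 0$ log case), and `scipy.special.inv_boxcox` inverts it. A sketch on synthetic, positive, turnover-like data drawn from a log-normal distribution:

```python
import numpy as np
from scipy import special, stats

rng = np.random.default_rng(4)
turnover = rng.lognormal(mean=16, sigma=1.2, size=1000)  # positive, right-skewed

transformed, lam = stats.boxcox(turnover)  # MLE estimate of lambda
print('Estimated lambda:', round(lam, 3))   # near 0, since log-normal data is normal after log

# Same transform with lambda fixed, then inverted back to NPR
again = special.boxcox(turnover, lam)
recovered = special.inv_boxcox(again, lam)

# lambda = 0 reduces to the natural log
log_case = special.boxcox(turnover, 0.0)
```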

```python
from scipy.stats import boxcox
from scipy.special import boxcox as special_boxcox, inv_boxcox

def apply_boxcox_rolling(df, column, window=252):
    """
    Apply Box-Cox transformation using rolling window.
    
    Box-Cox requires positive values, so ensure data > 0.
    Returns (transformed, lambdas, shift); invert a value with
    inv_boxcox(value, lmbda) - shift.
    """
    # Shift to positive if needed (NEPSE returns can be negative).
    # Note: the full-sample minimum uses future data; in production,
    # derive the shift from the training period only.
    min_val = df[column].min()
    shift = abs(min_val) + 1 if min_val <= 0 else 0
    shifted = df[column] + shift
    
    # Estimate lambda by maximum likelihood on each full trailing window
    # (computationally expensive). For production, use fixed lambda or monthly recalculation.
    lambdas = shifted.rolling(window=window).apply(lambda x: boxcox(x)[1], raw=True)
    
    # Apply each day's lambda elementwise; scipy.special.boxcox handles lambda = 0 (log case)
    transformed = pd.Series(
        special_boxcox(shifted.to_numpy(dtype=float), lambdas.to_numpy(dtype=float)),
        index=df.index,
    )
    
    return transformed, lambdas, shift

# Note: Box-Cox is rarely used in production NEPSE pipelines due to:
# 1. Computational cost of rolling lambda estimation
# 2. Difficulty of inverse transform for predictions
# 3. Yeo-Johnson is more flexible (handles negative values)
```

### **15.5.3 Yeo-Johnson Transformation**

Yeo-Johnson extends Box-Cox to handle negative values (returns can be negative), making it suitable for NEPSE financial data.
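For reference, the Yeo-Johnson transform applies a Box-Cox-style branch to non-negative values and a mirrored branch (with power $2 - \lambda$) to negative ones. The sketch below writes it out in NumPy and checks it against `scipy.stats.yeojohnson` for a fixed $\lambda$; the $\lambda$ value and the sample returns are arbitrary.

```python
import numpy as np
from scipy import stats

def yeo_johnson(y, lam):
    """Yeo-Johnson transform for a fixed lambda (elementwise)."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    pos = y >= 0
    # Non-negative branch: Box-Cox applied to (y + 1)
    if abs(lam) > 1e-12:
        out[pos] = ((y[pos] + 1) ** lam - 1) / lam
    else:
        out[pos] = np.log1p(y[pos])
    # Negative branch: Box-Cox with power (2 - lambda) applied to (1 - y), sign flipped
    if abs(lam - 2) > 1e-12:
        out[~pos] = -((1 - y[~pos]) ** (2 - lam) - 1) / (2 - lam)
    else:
        out[~pos] = -np.log1p(-y[~pos])
    return out

returns = np.array([-5.0, -1.2, -0.3, 0.0, 0.4, 1.5, 4.8])  # daily % changes
lam = 0.8
mine = yeo_johnson(returns, lam)
reference = stats.yeojohnson(returns, lmbda=lam)
print(np.round(mine, 4))
```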

```python
from sklearn.preprocessing import PowerTransformer

def apply_yeo_johnson(df, columns, train_size=252):
    """
    Apply Yeo-Johnson transformation to NEPSE features.
    
    Advantages over Box-Cox:
    - Handles negative values (returns, price changes)
    - More robust for financial time-series
    
    Lambda (and the follow-up standardization) are fit on the first
    `train_size` rows only, then applied to every row.
    """
    df_transformed = df.copy()
    
    for col in columns:
        # Fresh transformer per column; fitting on early rows prevents look-ahead
        transformer = PowerTransformer(method='yeo-johnson', standardize=True)
        values = df[[col]].to_numpy()
        transformer.fit(values[:train_size])
        df_transformed[f'{col}_YJ'] = transformer.transform(values).ravel()
    
    return df_transformed

# Detailed Explanation:
#
# PowerTransformer estimates lambda by maximum likelihood, i.e. the lambda that
# makes the transformed data most Gaussian. Rough expectations for NEPSE
# (check transformer.lambdas_ on your own data):
# - Price levels: λ near 0 (logarithmic)
# - Returns: λ near 1 (nearly linear, already normal-ish)
# - Volume: λ around 0.3 (between log and sqrt)
#
# The standardize=True parameter also applies Z-score after transformation,
# giving two-stage normalization (shape correction + scale standardization).
```

**Explanation:**

Power transformations address **non-normality** in NEPSE data. Financial data typically exhibits:
- **Prices**: Exponential growth (log-normal distribution)
- **Volume**: Power-law distribution (few days with massive volume)
- **Returns**: Roughly symmetric but fat-tailed (leptokurtic)

**Log Transformation:**
For NEPSE, raw prices are non-stationary and so are log prices, but the first difference of log prices (the log return) is approximately stationary and scale-free. The log transform converts percentage changes into differences, $\log(P_t) - \log(P_{t-1}) \approx \frac{P_t - P_{t-1}}{P_{t-1}}$ for small moves, making ARIMA and linear models more effective. For volume/turnover, log compresses the extreme right tail (IPO days with 100x volume) into a manageable range.
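The approximation, and the useful property that log returns add up over time, can be checked on a short, made-up price path:

```python
import numpy as np

close = np.array([450.0, 459.0, 449.8, 472.3, 468.0])  # hypothetical closes (Rs.)

simple_ret = close[1:] / close[:-1] - 1
log_ret = np.diff(np.log(close))

print('Simple returns:', simple_ret.round(5))
print('Log returns   :', log_ret.round(5))

# Log returns sum to the log of the total price ratio; simple returns do not
total_log = log_ret.sum()
print('Sum of log returns:', round(total_log, 6),
      ' log(P_end/P_start):', round(np.log(close[-1] / close[0]), 6))
```

The gap between the two return definitions grows with the size of the move (about 0.1 percentage point for the 5% day here), which is why the approximation is stated for small moves.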

**Yeo-Johnson:**
Preferred over Box-Cox for NEPSE because it handles negative values (returns can be -5%) without shifting. The transformation estimates the power parameter $\lambda$ that makes the data most Gaussian. For NEPSE turnover data, the estimated $\lambda$ is typically around 0.3, indicating a transformation between square root and logarithmic, which handles the heavy tails of trading activity without over-compressing normal days.

---

## **15.6 Quantile Transformation**

Quantile transformation maps data to a uniform or normal distribution based on empirical quantiles, robust to outliers and effective for highly skewed distributions.

```python
from sklearn.preprocessing import QuantileTransformer

def apply_quantile_transform(df, columns, n_quantiles=1000, output_distribution='normal',
                             train_size=252):
    """
    Apply quantile transformation to NEPSE features.
    
    Maps any distribution to Gaussian (if output_distribution='normal') 
    or Uniform (if 'uniform').
    
    Best for: Highly skewed features with outliers (Volume, Turnover, ATR)
    """
    df_transformed = df.copy()
    
    for col in columns:
        values = df[[col]].to_numpy()
        train = values[:train_size]
        
        transformer = QuantileTransformer(
            # n_quantiles cannot usefully exceed the number of training samples
            n_quantiles=min(n_quantiles, len(train)),
            output_distribution=output_distribution,
            random_state=42
        )
        
        # Fit on the first `train_size` days only (prevents look-ahead), apply to all.
        # For a rolling scheme, refit periodically on a trailing window.
        transformer.fit(train)
        df_transformed[f'{col}_QT'] = transformer.transform(values).ravel()
    
    return df_transformed

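def manual_quantile_binning(df, column, n_bins=10, window=252, min_periods=30):
    """
    Manual quantile binning for interpretable features.
    
    Creates discrete bins based on deciles (10-quantiles).
    Useful for tree-based models and rule extraction.
    """
    # Rolling percentile rank of today's value within its trailing window
    # (time-series aware; Rolling.rank requires pandas >= 1.4)
    pct_rank = df[column].rolling(window, min_periods=min_periods).rank(pct=True)
    
    # Map the percentile rank in (0, 1] onto n_bins equal-width bins labelled Q1..Qn
    labels = [f'Q{i+1}' for i in range(n_bins)]
    edges = np.linspace(0, 1, n_bins + 1)
    df[f'{column}_Decile'] = pd.cut(pct_rank, bins=edges, labels=labels)
    
    return df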

# Explanation:
#
# Quantile Transformation forces any distribution into Gaussian shape.
# For NEPSE Volume (heavily right-skewed):
# - Raw: most values below 100k, with a thin tail above 1M (extreme skew)
# - After QT: Gaussian distribution, outliers preserved but normalized
#
# Use case: Neural networks that assume Gaussian inputs perform better
# with QT than with raw volume data.
#
# Warning: Destroys linear relationships (rank-preserving but not distance-preserving)
# Do not use when linear relationships between features matter.
```

**Explanation:**

Quantile transformation is **non-linear** and **rank-preserving**, making it well suited to NEPSE data with extreme outliers that must be kept but normalized. Z-score and Min-Max are linear, so they keep the original (skewed) shape; Quantile Transform instead learns the empirical distribution and maps it onto a Gaussian (or uniform) distribution.

**NEPSE Volume Example:**
Raw NEPSE volume is heavily right-skewed: most days trade around 50,000 shares, but 1% of days trade 5,000,000+ shares (100x typical). Z-scoring leaves the high-volume days as extreme outliers (Z=10+) that dominate distance computations and gradients. Quantile transform maps the 99th percentile to about 2.33 (the standard normal 99th percentile), preserving the information that "this was a high volume day" without distorting the scale.
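The ~2.3 figure is the standard normal 99th percentile (about 2.326). A quick check with synthetic, log-normally distributed volume (an assumption for illustration; real NEPSE volume need not follow this law) fits `QuantileTransformer` and transforms the sample's own 99th percentile:

```python
import numpy as np
from scipy.stats import norm
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(8)
volume = rng.lognormal(mean=11, sigma=1.3, size=10_000).reshape(-1, 1)

qt = QuantileTransformer(n_quantiles=1000, output_distribution='normal', random_state=0)
qt.fit(volume)

p99 = np.percentile(volume, 99)
mapped = qt.transform([[p99]])[0, 0]

print('Standard normal 99th percentile:', round(norm.ppf(0.99), 3))
print('QT output for the 99th-percentile volume:', round(mapped, 3))
```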

**Discrete Quantile Binning:**
The `manual_quantile_binning` function creates deciles (10 bins) based on rolling quantiles. This is useful for:
1. **Interpretability**: "Today was in the top decile (Q10) for volume"
2. **Tree models**: Discrete bins can reduce noise in split candidates, although trees already handle continuous values directly
3. **Robustness**: Bin boundaries adapt to regime changes via rolling calculation

---

## **15.7 Unit Vector Normalization**

Unit vector scaling (L2 normalization) scales rows to have unit norm, useful when the magnitude of the sample vector is irrelevant but the direction (relative composition) matters.

```python
from sklearn.preprocessing import Normalizer

def apply_unit_normalization(df, feature_groups):
    """
    Apply L2 normalization to groups of features.
    
    Useful for: Portfolio construction (weight vectors), 
    Technical indicator composites (trend direction vs magnitude)
    """
    df_normalized = df.copy()
    
    for group_name, columns in feature_groups.items():
        if not all(col in df.columns for col in columns):
            continue
            
        # Extract group
        X = df[columns].values
        
        # L2 normalization: x / sqrt(sum(x^2))
        # Each row becomes unit vector
        norms = np.sqrt(np.sum(X**2, axis=1))
        norms = np.where(norms == 0, 1, norms)  # Avoid division by zero
        
        X_normalized = X / norms[:, np.newaxis]
        
        # Store back
        for i, col in enumerate(columns):
            df_normalized[f'{col}_Unit'] = X_normalized[:, i]
    
    return df_normalized

# NEPSE Application: Multi-factor scoring
# Create composite score from Trend, Value, Momentum, Quality
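# Unit normalize each group so every factor contributes a vector of the same length
factors = {
    'Trend': ['ADX', 'SMA_Slope'],
    'Value': ['PE_Ratio', 'PB_Ratio'], 
    'Momentum': ['RSI', 'MACD'],
    'Quality': ['ROE', 'Debt_Equity']
}

# Without normalization: ADX (0-100) dominates ROE (0-0.3) in a composite sum
# With per-group normalization: each factor group has length 1, so groups weigh equally.
# Within a group, the larger-scale column still dominates (e.g. PE_Ratio vs PB_Ratio),
# so standardize each column (e.g. Z-score) before unit-normalizing.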
```

**Explanation:**

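Unit vector normalization treats each observation (row) as a vector in high-dimensional space and scales it to length 1. This is useful for **NEPSE multi-factor models** that combine indicators with different scales (e.g., ADX 0-100, ROE 0-0.30, PE 5-50), with one caveat: row normalization does not equalize column scales. A row with raw ADX 25 and ROE 0.15 still becomes a unit vector that is almost entirely ADX, so standardize each column first and then unit-normalize.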

**Direction vs. Magnitude:**
After unit normalization, the vector indicates the *direction* (which indicators are high/low relative to each other) but not the *magnitude* (how extreme the overall signal is). This is useful for:
1. **Relative strength**: Comparing which technical factors are strongest for a given NEPSE stock, regardless of absolute volatility
2. **Portfolio weights**: For position-sizing weights that must sum to 1 (fully invested), use L1 normalization instead; L2 normalization makes the squared weights sum to 1
3. **Cosine similarity**: Computing similarity between stocks based on factor profiles rather than absolute prices
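These points can be checked with scikit-learn's `Normalizer`. This sketch uses hypothetical factor values for three stocks: it standardizes each column first (so ADX's larger scale does not dominate), then L2-normalizes rows so that dot products equal cosine similarities. For non-negative scores turned into weights that sum to 1, `norm='l1'` is the appropriate option.

```python
import numpy as np
from sklearn.preprocessing import Normalizer, StandardScaler

# Hypothetical factor values per stock: [ADX, ROE, PE_Ratio]
X = np.array([
    [35.0, 0.18, 12.0],
    [22.0, 0.09, 30.0],
    [28.0, 0.15, 18.0],
])

raw_unit = Normalizer(norm='l2').transform(X)   # ADX and PE dominate each row
X_std = StandardScaler().fit_transform(X)       # put columns on a common scale
unit = Normalizer(norm='l2').transform(X_std)   # each row now has length 1

cosine = unit @ unit.T                          # pairwise cosine similarity
print(np.round(cosine, 3))

# Non-negative scores turned into weights that sum to 1 need L1, not L2
scores = np.array([[3.0, 1.0, 1.0, 0.0]])
weights = Normalizer(norm='l1').transform(scores)
print(weights)
```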

---

## **15.8 Scaling for Different Algorithms**

Different ML algorithms require different scaling strategies for optimal NEPSE prediction performance.

```python
def get_scaling_pipeline(algorithm_type):
    """
    Return appropriate scaling strategy based on algorithm.
    
    Algorithm-specific recommendations for NEPSE data:
    """
    pipelines = {
        'linear_regression': {
            'scaler': 'StandardScaler',
            'reason': 'OLS predictions are scale-invariant, but scaling makes coefficients comparable and is required once a penalty is added'
        },
        'neural_network': {
            'scaler': 'MinMaxScaler (-1, 1) or QuantileTransformer',
            'reason': 'Bounded inputs keep sigmoid/tanh pre-activations out of saturation'
        },
        'svm_rbf': {
            'scaler': 'StandardScaler or RobustScaler',
            'reason': 'RBF kernel sensitive to distance; outliers distort kernel matrix'
        },
        'knn': {
            'scaler': 'MinMaxScaler or StandardScaler',
            'reason': 'Distance metric dominated by large-scale features'
        },
        'tree_based': {
            'scaler': 'None',
            'reason': 'Splits are invariant to monotonic transformations'
        },
        'pca': {
            'scaler': 'StandardScaler',
            'reason': 'Finds directions of maximal variance; unscaled high-variance features dominate the components'
        },
        'lasso_ridge': {
            'scaler': 'StandardScaler',
            'reason': 'Penalty terms assume comparable feature scales'
        }
    }
    
    # Fall back to standardization for unlisted algorithms (same dict shape as above)
    return pipelines.get(algorithm_type, {'scaler': 'StandardScaler', 'reason': 'Default choice'})

# Implementation for NEPSE Ensemble:
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

def prepare_nepse_data_for_model(df, model_type, train_size=None):
    """
    Prepare NEPSE data with appropriate scaling for specific model type.
    
    Scaling parameters are learned from the first `train_size` rows only
    (default None uses all rows, which leaks future data into past rows)
    and then applied to every row. Always returns (transformed_df, fitted_params).
    """
    train = df if train_size is None else df.iloc[:train_size]
    
    if model_type in ['random_forest', 'xgboost', 'lightgbm']:
        # Tree-based: No scaling needed, but clip outliers to training quantiles
        lower, upper = train.quantile(0.01), train.quantile(0.99)
        return df.clip(lower, upper, axis=1), (lower, upper)
    
    if model_type == 'neural_network':
        scaler = MinMaxScaler(feature_range=(-1, 1))  # LSTM/GRU for NEPSE
    elif model_type == 'linear_regression':
        scaler = StandardScaler()                     # Ridge/Lasso for NEPSE factor modeling
    elif model_type == 'svm':
        scaler = RobustScaler()                       # SVM for NEPSE regime classification
    else:
        raise ValueError(f'Unknown model_type: {model_type!r}')
    
    scaler.fit(train)
    scaled = pd.DataFrame(scaler.transform(df), columns=df.columns, index=df.index)
    return scaled, scaler
```

**Explanation:**

This section provides a **decision matrix** for scaling strategies:

**Tree-Based Models (XGBoost, LightGBM, Random Forest):**
- **No scaling required**: Splits depend on rank order, not absolute values
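- **Optional outlier clipping**: Extreme feature values do not destabilize split finding, but clipping at the 1st/99th percentiles is a cheap safeguard that caps rare extremes (e.g., circuit-breaker days) so splits cannot be fitted around a handful of capped observations
- **NEPSE tip**: These models can usually be trained on raw NEPSE features with, at most, outlier clipping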
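The rank-order argument can be checked directly. The sketch below uses synthetic price/volume data (an assumption, not NEPSE data) and fits scikit-learn's `DecisionTreeRegressor` with identical settings on raw and Min-Max-scaled features. Because Min-Max scaling is monotonic and split quality depends only on the target values, both trees should pick the same partitions and therefore produce the same predictions.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
n = 200
# Hypothetical raw features on very different scales
price = rng.uniform(100, 1000, n)        # Rs.
volume = rng.uniform(1_000, 100_000, n)  # shares
X_raw = np.column_stack([price, volume])
# Synthetic target that depends on both features
y = 0.01 * price + 0.0001 * volume + rng.normal(0, 0.5, n)

X_scaled = MinMaxScaler().fit_transform(X_raw)

# Same hyperparameters and seed; only the feature scale differs
tree_raw = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_raw, y)
tree_scaled = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X_scaled, y)

pred_raw = tree_raw.predict(X_raw)
pred_scaled = tree_scaled.predict(X_scaled)
print("Identical predictions:", np.allclose(pred_raw, pred_scaled))
```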

**Neural Networks (LSTM for NEPSE time-series):**
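- **Min-Max [-1, 1]**: Keeps inputs small and zero-centred, in the same range as tanh-activated hidden states, so gates and activations avoid saturation
- **Quantile Transform**: If data is highly skewed (volume), use QuantileTransformer first, then Min-Max
- **Normalization layers**: Batch Normalization (or Layer Normalization, more common in recurrent networks) normalizes intermediate activations and learns a scale and shift; it complements, rather than replaces, input scaling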
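The "QuantileTransformer first, then Min-Max" recipe can be chained with scikit-learn's `make_pipeline`, so both steps are fitted together on training data. A minimal sketch, assuming a synthetic lognormal series as a stand-in for skewed NEPSE volume and `n_quantiles=100` as an illustrative setting:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer

rng = np.random.default_rng(1)
# Hypothetical right-skewed daily volume (lognormal), one column
volume = rng.lognormal(mean=10, sigma=1.5, size=(1000, 1))

volume_pipeline = make_pipeline(
    # Map values to their empirical quantiles in [0, 1] (removes skew)
    QuantileTransformer(n_quantiles=100, output_distribution='uniform'),
    # Rescale to the [-1, 1] range used for LSTM inputs
    MinMaxScaler(feature_range=(-1, 1)),
)
volume_scaled = volume_pipeline.fit_transform(volume)

# New days outside the training range are mapped to the boundary quantiles
new_days = volume_pipeline.transform(np.array([[1e12], [0.0]]))
```

Because `QuantileTransformer` maps values beyond the training range to the boundary quantiles, unseen extreme volumes still land inside [-1, 1].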

**Linear Models (Ridge/Lasso for NEPSE factor models):**
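- **StandardScaler essential**: L1/L2 penalties act on coefficients, and a coefficient's size depends on its feature's units
- **Without scaling**: Turnover (millions of NPR) needs only a tiny coefficient, so its penalty is negligible and it is effectively unregularized, while RSI (tens) needs a much larger coefficient and is penalized heavily. With a 1000x scale gap, the penalty shrinks about 1000x under L1 and about 1,000,000x under L2
- **RobustScaler alternative**: If NEPSE data has crash periods (outliers), RobustScaler keeps them from distorting the centring and scaling statistics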
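To see the penalty imbalance, the sketch below fits scikit-learn's `Ridge` with the same `alpha` on raw and standardized versions of two synthetic features that matter equally per standard deviation: an RSI-like feature (scale 1) and a turnover-like feature (scale 10^6). The data and `alpha=100` are illustrative assumptions. `shrink_*` is the fitted coefficient divided by the true one, so 1.0 means no shrinkage.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
rsi_like = rng.normal(0, 1, n)             # small-scale feature
turnover_like = rng.normal(0, 1, n) * 1e6  # large-scale feature
true_coef = np.array([1.0, 1e-6])          # equal effect per standard deviation
X = np.column_stack([rsi_like, turnover_like])
y = X @ true_coef + rng.normal(0, 0.1, n)

# Ridge on raw features: fraction of each true coefficient that survives
ridge_raw = Ridge(alpha=100.0).fit(X, y)
shrink_raw = ridge_raw.coef_ / true_coef

# Ridge on standardized features: the true coefficients become true_coef * std
X_std = StandardScaler().fit_transform(X)
ridge_std = Ridge(alpha=100.0).fit(X_std, y)
shrink_std = ridge_std.coef_ / (true_coef * X.std(axis=0))

print("raw:", shrink_raw.round(3), "standardized:", shrink_std.round(3))
```

On raw data the RSI-like coefficient is shrunk noticeably (to roughly 0.8 of its true value) while the turnover-like one is left almost untouched; after standardization both are shrunk by about the same factor.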

**SVM (for NEPSE regime classification):**
- **Standard or Robust**: RBF kernel computes $\exp(-\gamma \|x - x'\|^2)$; large-scale features dominate distance
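- **Robust often preferred**: With StandardScaler, extreme days (e.g., sharp sell-offs) inflate the standard deviation and compress ordinary days into a narrow band, which shrinks their pairwise distances and effectively changes the kernel width $\gamma$ for typical observations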
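The compression effect can be measured directly. In this minimal sketch (synthetic returns, an assumption), five crash days of -30% inflate the standard deviation, so after `StandardScaler` the 245 ordinary days occupy a much narrower interquartile range than after `RobustScaler`, which centres on the median and divides by the IQR.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler, StandardScaler

rng = np.random.default_rng(42)
# Hypothetical daily returns (%): 245 ordinary days followed by 5 crash days
ordinary = rng.normal(0.0, 1.0, 245)
returns = np.concatenate([ordinary, np.full(5, -30.0)]).reshape(-1, 1)

# Keep only the ordinary days after scaling the full series
std_scaled = StandardScaler().fit_transform(returns)[:245, 0]
rob_scaled = RobustScaler().fit_transform(returns)[:245, 0]


def iqr(values):
    """Interquartile range: spread of the middle 50% of values."""
    q75, q25 = np.percentile(values, [75, 25])
    return q75 - q25


print(f"StandardScaler IQR of ordinary days: {iqr(std_scaled):.2f}")
print(f"RobustScaler IQR of ordinary days:   {iqr(rob_scaled):.2f}")
```

In kernel terms, ordinary days scaled with `StandardScaler` sit roughly three to four times closer together, so the same $\gamma$ treats them as far more similar than intended.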

---

## **15.9 Scaling in Production Pipelines**

Production scaling requires careful handling of train/test splits, online learning, and inverse transformations for predictions.

```python
class NEPSEScalingPipeline:
    """
    Production-grade scaling pipeline for NEPSE prediction system.
    
    Handles:
    - Time-series aware fitting (no look-ahead bias)
    - Online learning (updating scalers with new data)
    - Inverse transformations for prediction interpretation
    - Feature-specific scaling strategies
    """
    
    def __init__(self, scaling_config):
        self.scalers = {}
        self.config = scaling_config
        self.fitted = False
        
    def fit(self, df_train, date_col='Date'):
        """
        Fit scalers on training data up to a specific date.
        """
        # Ensure chronological order
        df_train = df_train.sort_values(date_col)
        
        for feature, params in self.config.items():
            if feature not in df_train.columns:
                continue
                
            scaler_type = params['type']
            window = params.get('window', 252)
            
            if scaler_type == 'rolling_zscore':
                # Store statistics of the most recent `window` training rows
                # (tail() also works when fewer than `window` rows exist)
                recent = df_train[feature].tail(window)
                self.scalers[feature] = {
                    'mean': recent.mean(),
                    'std': recent.std(),
                    'type': 'rolling_zscore',
                    'window': window
                }
            
            elif scaler_type == 'minmax':
                from sklearn.preprocessing import MinMaxScaler
                scaler = MinMaxScaler(feature_range=params.get('range', (0, 1)))
                scaler.fit(df_train[[feature]])
                self.scalers[feature] = {'scaler': scaler, 'type': 'sklearn'}
            
            elif scaler_type == 'quantile':
                from sklearn.preprocessing import QuantileTransformer
                scaler = QuantileTransformer(
                    n_quantiles=params.get('n_quantiles', 1000),
                    output_distribution=params.get('output', 'normal')
                )
                scaler.fit(df_train[[feature]])
                self.scalers[feature] = {'scaler': scaler, 'type': 'sklearn'}
        
        self.fitted = True
        return self
    
    def transform(self, df):
        """
        Transform new data using fitted scalers.
        """
        if not self.fitted:
            raise ValueError("Pipeline not fitted yet")
        
        df_scaled = df.copy()
        
        for feature, scaler_info in self.scalers.items():
            if feature not in df.columns:
                continue
                
            if scaler_info['type'] == 'rolling_zscore':
                # Use the stored statistics; call update() as new trading days arrive
                mean = scaler_info['mean']
                std = max(scaler_info['std'], 1e-8)  # guard against zero variance
                df_scaled[f'{feature}_scaled'] = (df[feature] - mean) / std
            
            elif scaler_info['type'] == 'sklearn':
                # sklearn returns an (n, 1) array; flatten it for a single column
                df_scaled[f'{feature}_scaled'] = scaler_info['scaler'].transform(df[[feature]]).ravel()
        
        return df_scaled
    
    def update(self, new_data):
        """
        Update scalers with new data (online learning).
        Critical for NEPSE: Market regimes change, scalers must adapt.
        """
        for feature, scaler_info in self.scalers.items():
            if feature not in new_data.columns:
                continue
            
            if scaler_info['type'] == 'rolling_zscore':
                # Exponentially weighted update, one trading day at a time.
                # Works for single-row batches, where a per-batch std would be NaN.
                alpha = 2 / (scaler_info['window'] + 1)  # EMA decay
                mean = scaler_info['mean']
                var = scaler_info['std'] ** 2
                for x in new_data[feature].dropna():
                    diff = x - mean
                    mean += alpha * diff
                    var = (1 - alpha) * (var + alpha * diff ** 2)
                scaler_info['mean'] = mean
                scaler_info['std'] = var ** 0.5
            
            elif scaler_info['type'] == 'sklearn':
                # Partial fit not available for all scalers
                # Refit periodically (weekly/monthly) instead
                pass
        
        return self
    
    def inverse_transform_prediction(self, predictions, feature_name):
        """
        Convert scaled predictions back to original units (e.g., stock prices).
        """
        scaler_info = self.scalers.get(feature_name)
        
        if not scaler_info:
            return predictions
        
        if scaler_info['type'] == 'rolling_zscore':
            mean = scaler_info['mean']
            std = scaler_info['std']
            return (predictions * std) + mean
        
        elif scaler_info['type'] == 'sklearn':
            return scaler_info['scaler'].inverse_transform(predictions.reshape(-1, 1)).flatten()
        
        return predictions

# Usage example for NEPSE production
# (train_df/test_df with a 'Date' column and a fitted `model` are assumed to exist)
config = {
    'Close': {'type': 'rolling_zscore', 'window': 252},
    'Volume': {'type': 'quantile', 'n_quantiles': 1000, 'output': 'normal'},
    'RSI': {'type': 'minmax', 'range': (0, 1)}
}

pipeline = NEPSEScalingPipeline(config)
pipeline.fit(train_df)
scaled_test = pipeline.transform(test_df)

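# After model prediction (scaled units); pass only the scaled feature columns
feature_cols = [f'{name}_scaled' for name in config]
scaled_pred = model.predict(scaled_test[feature_cols])
actual_price_pred = pipeline.inverse_transform_prediction(scaled_pred, 'Close')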
```

**Explanation:**

The `NEPSEScalingPipeline` class implements **production-grade scaling** that addresses critical time-series challenges:

**No Look-Ahead Bias:**
The `fit` method uses only training data (historical) to calculate statistics. Unlike a naive workflow that calls `fit_transform` on the full dataset (so the statistics include the future test period), this pipeline respects the temporal boundary, which is critical for valid NEPSE backtests.

**Online Updating:**
The `update` method implements **exponential moving average** updates for rolling statistics. As new NEPSE trading days arrive, the scaler adapts to recent market regimes (volatility changes, new price levels) without requiring full refitting. The decay factor $\alpha = \frac{2}{n+1}$ ensures older data gradually loses influence.
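The exponentially weighted statistics can also be updated one observation at a time, which stays well-defined even when a batch holds a single trading day (pandas returns NaN for the sample standard deviation of one value). Below is a minimal sketch of the standard incremental recursion for an exponentially weighted mean and variance with $\alpha = 2/(n+1)$ and $n = 252$. `ew_update` is a hypothetical helper name and the prices are synthetic. The cross-check against explicit exponential weights confirms that the recursion computes the weighted statistics it claims to.

```python
import numpy as np


def ew_update(mean, var, x, alpha):
    """One exponentially weighted update of mean and variance for new value x."""
    diff = x - mean
    mean = mean + alpha * diff
    var = (1 - alpha) * (var + alpha * diff ** 2)
    return mean, var


window = 252
alpha = 2 / (window + 1)

rng = np.random.default_rng(3)
values = rng.normal(450.0, 25.0, 30)  # hypothetical daily closes

# Start from the first observation, then fold in one trading day at a time
mean, var = values[0], 0.0
for x in values[1:]:
    mean, var = ew_update(mean, var, x, alpha)

# Cross-check: explicit weights (1-a)^t for the first value, a(1-a)^(t-i) after
t = len(values) - 1
weights = np.concatenate([[(1 - alpha) ** t],
                          alpha * (1 - alpha) ** np.arange(t - 1, -1, -1)])
direct_mean = np.sum(weights * values)
direct_var = np.sum(weights * (values - direct_mean) ** 2)
```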

**Inverse Transform:**
Machine learning models predict in scaled space (e.g., Z-scores). The `inverse_transform_prediction` method converts these back to interpretable units (NPR stock prices, actual volume numbers) for trading execution and reporting.

**Feature-Specific Strategies:**
The config dictionary allows different scaling for different feature types: a rolling Z-score for prices (measuring deviation from the recent level of a trending series), a quantile transform for volume (heavily skewed), and Min-Max for bounded indicators (RSI).
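scikit-learn's `ColumnTransformer` is one standard way to express a per-feature config like this. A minimal sketch on a synthetic frame (an assumption); note that `StandardScaler` is swapped in for the rolling Z-score because scikit-learn has no rolling scaler, so `Close` here uses global training statistics rather than a trailing window.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer, StandardScaler

rng = np.random.default_rng(5)
n = 300
# Hypothetical training frame with the three configured features
train_df = pd.DataFrame({
    'Close': 450 + rng.normal(0, 25, n).cumsum() / 10,
    'Volume': rng.lognormal(10, 1.5, n),
    'RSI': rng.uniform(0, 100, n),
})

per_feature = ColumnTransformer([
    # Stand-in for the rolling Z-score: global mean/std of the training data
    ('close', StandardScaler(), ['Close']),
    ('volume', QuantileTransformer(n_quantiles=100, output_distribution='normal'), ['Volume']),
    ('rsi', MinMaxScaler(feature_range=(0, 1)), ['RSI']),
])
# Output columns follow the transformer order: Close, Volume, RSI
scaled = per_feature.fit_transform(train_df)
```

The fitted `per_feature` object can then be reused with `transform()` on test data, which keeps the train/test boundary in one place.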

---

## **15.10 Inverse Transformation**

Inverse transformation converts model outputs back to original units, essential for trading execution and result interpretation.

```python
import numpy as np


def inverse_transform_predictions(scaled_predictions, scaler=None, method='zscore',
                                  original_stats=None):
    """
    Convert scaled predictions back to original NEPSE price/volume units.
    
    Parameters:
    -----------
    scaled_predictions : array-like
        Model output in scaled space (Z-scores or 0-1 range)
    scaler : fitted sklearn scaler, optional
        If given, its own inverse_transform is used and `method` is ignored
    method : str
        'zscore', 'minmax', 'robust', 'log'
    original_stats : dict
        Training statistics (mean, std, min, max, median, iqr, offset);
        required when no fitted scaler is passed
    """
    scaled_predictions = np.asarray(scaled_predictions, dtype=float)

    if scaler is not None:
        # Single-feature sklearn scalers expect a 2-D column
        return scaler.inverse_transform(scaled_predictions.reshape(-1, 1)).ravel()

    if method == 'zscore':
        return scaled_predictions * original_stats['std'] + original_stats['mean']

    if method == 'minmax':
        # Assumes the forward transform mapped [min, max] to [0, 1]
        min_val = original_stats['min']
        max_val = original_stats['max']
        return scaled_predictions * (max_val - min_val) + min_val

    if method == 'robust':
        return scaled_predictions * original_stats['iqr'] + original_stats['median']

    if method == 'log':
        # Assumes the forward transform was log(x + offset)
        offset = original_stats.get('offset', 1)
        return np.exp(scaled_predictions) - offset

    raise ValueError(f"Unknown method: {method!r}")

# NEPSE-specific inverse transform example:
# Model predicts next day's Close price in Z-score terms: +1.5
# Current mean (252-day): Rs. 450
# Current std: Rs. 25
# Prediction: 450 + (1.5 * 25) = Rs. 487.50

# Confidence intervals in original units
# (assuming the model's error has a standard deviation of 1.0 in Z units):
# 95% CI in Z-space: [1.5 - 1.96, 1.5 + 1.96] = [-0.46, 3.46]
# Converted to price: [487.50 - 1.96 * 25, 487.50 + 1.96 * 25] = [438.50, 536.50]
```

**Explanation:**

Inverse transformation is crucial for **actionable predictions**. A model outputting "Z-score = +1.5" is meaningless to traders; converting to "Predicted Price = Rs. 487.50" enables execution decisions.

**Confidence Intervals:**
When models output uncertainty (e.g., Bayesian Neural Networks or Quantile Regression), inverse transformation must be applied to prediction intervals, not just point estimates. Because the Z-score inverse is monotonic, interval endpoints map directly. If the model predicts Z = 1.5 ± 1.0 (95% CI: 0.5 to 2.5), the price range is:
- Lower: $450 + (0.5 \times 25) = 462.50$
- Upper: $450 + (2.5 \times 25) = 512.50$

This Rs. 50 range informs position sizing (wider range = smaller position) and stop-loss placement.
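The interval arithmetic above, as a minimal sketch using the example's training-window statistics (mean Rs. 450, std Rs. 25) and a hypothetical model output of Z = 1.5 with a 95% interval of [0.5, 2.5]:

```python
import numpy as np

# Training-window statistics from the worked example (Rs.)
mean_price, std_price = 450.0, 25.0

# Model output in Z-space: point estimate and 95% interval endpoints
z_point = 1.5
z_interval = np.array([0.5, 2.5])

# The Z-score inverse is monotonic, so each endpoint maps directly
price_point = z_point * std_price + mean_price
price_interval = z_interval * std_price + mean_price
interval_width = price_interval[1] - price_interval[0]

print(price_point, price_interval, interval_width)  # 487.5 [462.5 512.5] 50.0
```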

---

## **15.11 Common Pitfalls**

Avoid these critical errors when scaling NEPSE data:

```python
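from sklearn.preprocessing import StandardScaler


def demonstrate_scaling_pitfalls(df, train_df, test_df):
    """
    Common mistakes in NEPSE feature scaling and their consequences.
    `df` is the full history; `train_df`/`test_df` are its chronological splits.
    """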
    
    # PITFALL 1: Look-ahead bias (using future data to scale past)
    # WRONG:
    global_mean = df['Close'].mean()  # Uses entire dataset including future
    global_std = df['Close'].std()
    df['Close_Z_Wrong'] = (df['Close'] - global_mean) / global_std
    
    # CORRECT: Trailing 252-day window (only past and current data)
    roll = df['Close'].rolling(252)
    df['Close_Z_Correct'] = (df['Close'] - roll.mean()) / roll.std()
    
    # PITFALL 2: Different scaling for train vs test
    # WRONG: Fit scaler on all data, then split
    scaler = StandardScaler()
    scaler.fit(df[['Close', 'Volume']])  # Leaks test statistics
    train_scaled = scaler.transform(train_df[['Close', 'Volume']])
    test_scaled = scaler.transform(test_df[['Close', 'Volume']])
    
    # CORRECT: Fit only on train, transform both (same columns as at fit time)
    scaler = StandardScaler()
    scaler.fit(train_df[['Close', 'Volume']])
    train_scaled = scaler.transform(train_df[['Close', 'Volume']])
    test_scaled = scaler.transform(test_df[['Close', 'Volume']])  # Uses train statistics
    
    # PITFALL 3: Scaling the target variable (Close price) carelessly
    # WRONG: Fit the target scaler on the full dataset (or jointly with the features),
    # then forget to inverse-transform predictions: leaks test statistics, loses units
    
    # CORRECT: Scale features, leave target in raw units (or log transform only);
    # if the target must be scaled, fit its own scaler on training targets only
    
    # PITFALL 4: Ignoring regime changes (static scaling)
    # WRONG: Use 2020 statistics for 2024 predictions
    # NEPSE index moved from 1200 to 2100; old mean is meaningless
    
    # CORRECT: Rolling/expanding windows, or online updating
    
    # PITFALL 5: Scaling non-stationary features (prices) without differencing
    # WRONG: Scale raw prices (trending series)
    # Scaler assumes fixed mean, but NEPSE trends upward over years
    
    # CORRECT: Scale returns (stationary) or use rolling Z-score
    
    # PITFALL 6: Zero variance features (illiquid days)
    # Some micro-caps have zero volume or flat prices for days
    # Division by zero in Z-score
    
    # CORRECT: Check std > 0, or add epsilon (1e-8) to denominator
    
    # PITFALL 7: Scaling integer features (categorical encoded as ints)
    # Day of week coded 0-6 (pandas: Monday=0 ... Sunday=6). Z-scoring maps Monday to -1.5
    # and Sunday to +1.5 (all days equally frequent): consecutive days look farthest apart
    
    # CORRECT: One-hot encode categoricals, don't scale
    
    pass

def validate_scaling_quality(df_original, df_scaled, feature):
    """
    Validate that scaling preserved necessary properties.
    """
    from scipy.stats import spearmanr
    from statsmodels.tsa.stattools import adfuller

    checks = {}
    scaled_col = f'{feature}_scaled'
    # Compare only rows where both versions exist (rolling windows start with NaN)
    valid = df_original[feature].notna() & df_scaled[scaled_col].notna()
    original = df_original.loc[valid, feature]
    scaled = df_scaled.loc[valid, scaled_col]
    
    # Check 1: Rank preservation (Spearman correlation close to 1);
    # expected for global monotonic scalers, not for rolling Z-scores
    corr, _ = spearmanr(original, scaled)
    checks['rank_preservation'] = corr > 0.99
    
    # Check 2: Outlier handling (max |Z| < 10)
    checks['no_extreme_outliers'] = scaled.abs().max() < 10
    
    # Check 3: No NaN generation (rolling warm-up rows will fail this check)
    checks['no_missing_values'] = df_scaled[scaled_col].isna().sum() == 0
    
    # Check 4: Stationarity. A LOWER ADF p-value means stronger evidence of
    # stationarity; global affine scaling leaves it essentially unchanged,
    # while rolling scaling can lower it
    adf_before = adfuller(original)[1]
    adf_after = adfuller(scaled)[1]
    checks['improved_stationarity'] = adf_after < adf_before
    
    return checks
```

**Explanation:**

**Pitfall 1: Look-Ahead Bias:**
The most dangerous error: using a global mean/std that includes future data. In NEPSE backtests, this creates unrealistic performance (you're using "future knowledge" of price levels to normalize past data). Compute statistics only from data available at each point: rolling or expanding windows, or a scaler fitted on the training period alone.
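A quick way to detect look-ahead bias is to scale the same past days twice: once with only the data available then, and once after future days are appended. Leak-free values must not change. A minimal sketch on a synthetic trending series (an assumption); `global_z` and `rolling_z` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(11)
# Hypothetical trending closes over 600 trading days
close = pd.Series(1200 + np.cumsum(rng.normal(1.5, 10, 600)))


def global_z(series):
    """Z-score using the mean/std of the entire series (look-ahead)."""
    return (series - series.mean()) / series.std()


def rolling_z(series, window=252):
    """Z-score using only the trailing `window` observations."""
    roll = series.rolling(window)
    return (series - roll.mean()) / roll.std()


# Scale the first 300 days: before and after 300 future days are appended
global_before = global_z(close.iloc[:300])
global_after = global_z(close).iloc[:300]
rolling_before = rolling_z(close.iloc[:300])
rolling_after = rolling_z(close).iloc[:300]

print(np.allclose(global_before, global_after))                     # False
print(np.allclose(rolling_before.dropna(), rolling_after.dropna()))  # True
```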

**Pitfall 2: Train/Test Leakage:**
Fitting the scaler on the full dataset before splitting leaks test set statistics (mean, variance) into the training set. The scaler should only "see" training data; test data is transformed using training statistics.

**Pitfall 3: Scaling the Target:**
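If predicting `Close` price, the simplest choice is to scale only the features and leave the target in raw units (or log prices). If the target must be scaled (common for neural networks), give it its own scaler fitted on training targets only and inverse-transform predictions before evaluation. A linear Z-score inverts exactly; for a log target, exponentiating a predicted mean yields the median rather than the mean, so interpret it accordingly.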
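When the target does need scaling, scikit-learn's `TransformedTargetRegressor` keeps the target scaler separate from the features, fits it on the training targets inside `fit()`, and inverse-transforms predictions automatically. A minimal sketch with synthetic data (the features, coefficients, and `Ridge(alpha=1.0)` are assumptions):

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
X_train = rng.normal(size=(200, 3))  # hypothetical, already-scaled features
# Hypothetical target: prices in Rs. around 450
y_train = 450 + X_train @ np.array([10.0, -5.0, 2.0]) + rng.normal(0, 1, 200)
X_test = rng.normal(size=(20, 3))

model = TransformedTargetRegressor(
    regressor=Ridge(alpha=1.0),
    transformer=StandardScaler(),  # fitted on y_train only
)
model.fit(X_train, y_train)
price_pred = model.predict(X_test)  # already back in Rs.
```

The fitted target scaler is exposed as `model.transformer_`, so its training mean and standard deviation can be inspected when auditing for leakage.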

**Pitfall 4: Static Scaling:**
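NEPSE is non-stationary: in the example above, the index rose from about 1,200 to about 2,100 between 2020 and 2024. A scaler fitted on 2020 data (mean = 1200) and applied to 2024 data (levels of 2,000+) produces very large Z-scores, for example about +10 if the 2020 standard deviation were 80 points ((2000 - 1200) / 80 = 10), so ordinary days are treated as outliers. Use rolling windows or online updating.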

**Pitfall 5: Scaling Trends:**
Raw prices have trends (non-stationary). Scaling them preserves the trend in the scaled data, violating assumptions of many algorithms. Scale returns (stationary) or use rolling Z-scores.

**Pitfall 6: Zero Variance:**
NEPSE micro-caps often have days with zero volume or unchanged prices (flat lines). Standard deviation = 0 causes division by zero. Always add epsilon (`1e-8`) or check for zero variance.
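A minimal sketch of a zero-variance-safe rolling Z-score, assuming a 20-day window and `eps=1e-8` (both illustrative); `safe_rolling_zscore` is a hypothetical helper name. Flat windows are reported as 0 (no deviation), while warm-up rows stay NaN.

```python
import pandas as pd


def safe_rolling_zscore(series, window=20, eps=1e-8):
    """Rolling Z-score that stays finite when a window has zero variance."""
    roll = series.rolling(window)
    mean = roll.mean()
    std = roll.std()
    # eps keeps the division finite; flat windows are then reported as 0
    z = (series - mean) / (std + eps)
    # mask() leaves NaN warm-up rows untouched (NaN <= eps is False)
    return z.mask(std <= eps, 0.0)


# Hypothetical illiquid micro-cap: price unchanged for 30 sessions
flat_prices = pd.Series([100.0] * 30)
z_flat = safe_rolling_zscore(flat_prices, window=20)
```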

**Pitfall 7: Scaling Categoricals:**
Encoding "Day of Week" as 0-6 imposes a false geometry: Sunday (6) sits six units from Monday (0) although they are consecutive days. Z-scoring only rescales that geometry (Monday becomes -1.5 and Sunday +1.5 when all days are equally frequent), so the distances stay meaningless. One-hot encode categoricals instead.
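A minimal sketch contrasting the two encodings with pandas, on a hypothetical two-week calendar starting Monday, 2024-01-01 (all seven weekdays included for illustration):

```python
import pandas as pd

# Hypothetical two-week calendar starting on Monday, 2024-01-01
dates = pd.date_range('2024-01-01', periods=14, freq='D')
day_of_week = pd.Series(dates.dayofweek, index=dates)  # Monday=0 ... Sunday=6

# Z-scoring the integer code (population std, as StandardScaler uses)
z = (day_of_week - day_of_week.mean()) / day_of_week.std(ddof=0)
# Monday -> -1.5 and Sunday -> +1.5: consecutive days end up at opposite extremes

# One-hot encoding: one 0/1 column per weekday, every pair of days equally distant
one_hot = pd.get_dummies(day_of_week, prefix='dow', dtype=int)
```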

---

**End of Chapter 15**

This chapter covered feature scaling and normalization strategies essential for NEPSE time-series prediction, including Z-score, Min-Max, Robust, Power transformations, and production pipeline implementation with strict avoidance of look-ahead bias.

