# **Chapter 14: Domain-Specific Feature Engineering**

## **14.1 Financial Domain Features**

Financial time-series feature engineering transforms raw market data into predictive signals that capture investor behavior, market microstructure, and economic regimes. While Chapter 13 covered technical indicators, this section focuses on domain-specific financial features tailored for the NEPSE (Nepal Stock Exchange) prediction system and general equity markets.

### **14.1.1 Price Action Features**

Price action features capture the behavioral patterns of market participants through the lens of OHLC (Open, High, Low, Close) data. For NEPSE, these features must account for the market's specific characteristics: lower liquidity, higher retail participation, and susceptibility to gap-up/gap-down openings due to overnight news.

```python
import pandas as pd
import numpy as np

def calculate_price_action_features(df, open_col='Open', high_col='High', 
                                   low_col='Low', close_col='Close', 
                                   prev_close_col='Prev. Close'):
    """
    Calculate comprehensive price action features for NEPSE stocks.
    
    These features capture candlestick patterns, gap analysis, and 
    intraday momentum critical for Nepali market prediction.
    """
    features = pd.DataFrame(index=df.index)
    
    # 1. Candlestick Body and Shadows
    # Body represents the conviction of buyers vs sellers
    features['Body_Size'] = abs(df[close_col] - df[open_col])
    features['Body_Pct'] = (features['Body_Size'] / df[open_col]) * 100
    
    # Upper Shadow (selling pressure/rejection of higher prices)
    features['Upper_Shadow'] = df[high_col] - np.maximum(df[open_col], df[close_col])
    features['Upper_Shadow_Pct'] = (features['Upper_Shadow'] / df[open_col]) * 100
    
    # Lower Shadow (buying support/rejection of lower prices)
    features['Lower_Shadow'] = np.minimum(df[open_col], df[close_col]) - df[low_col]
    features['Lower_Shadow_Pct'] = (features['Lower_Shadow'] / df[open_col]) * 100
    
    # Total Range (volatility of the session)
    features['Daily_Range'] = df[high_col] - df[low_col]
    features['Range_Pct'] = (features['Daily_Range'] / df[open_col]) * 100
    
    # 2. Candlestick Pattern Recognition
    # Doji (indecision, market equilibrium)
    features['Is_Doji'] = features['Body_Pct'] < 0.3  # Body less than 0.3% of price
    
    # Hammer shape (a reversal signal only after a downtrend; trend context is not checked here)
    # Small body at top, long lower shadow (more than 2x the body)
    features['Is_Hammer'] = (
        (features['Body_Pct'] < features['Lower_Shadow_Pct'] / 2) & 
        (features['Upper_Shadow_Pct'] < features['Body_Pct'])
    )
    
    # Shooting Star shape (a reversal signal only after an uptrend; trend context is not checked here)
    # Small body at bottom, long upper shadow (more than 2x the body)
    features['Is_Shooting_Star'] = (
        (features['Body_Pct'] < features['Upper_Shadow_Pct'] / 2) & 
        (features['Lower_Shadow_Pct'] < features['Body_Pct'])
    )
    
    # Marubozu (strong conviction, no shadows)
    # Indicates strong buying (white/green) or selling (black/red) pressure
    features['Is_Marubozu'] = (
        (features['Upper_Shadow_Pct'] < 0.1) & 
        (features['Lower_Shadow_Pct'] < 0.1) & 
        (features['Body_Pct'] > 1.0)  # Significant body
    )
    
    # 3. Gap Analysis (Critical for NEPSE due to overnight news impact)
    # Gap percentage from previous close
    features['Gap_Pct'] = ((df[open_col] - df[prev_close_col]) / df[prev_close_col]) * 100
    
    # Gap types
    features['Gap_Up'] = features['Gap_Pct'] > 0.5  # More than 0.5% gap up
    features['Gap_Down'] = features['Gap_Pct'] < -0.5  # More than 0.5% gap down
    features['Gap_Large'] = abs(features['Gap_Pct']) > 2.0  # Extreme gap
    
    # Gap fill analysis (did price return to previous close?)
    features['Gap_Filled'] = (
        (features['Gap_Up'] & (df[low_col] <= df[prev_close_col])) |
        (features['Gap_Down'] & (df[high_col] >= df[prev_close_col]))
    )
    
    # 4. Intraday Momentum (directional strength)
    # Close location within range (0 = low, 100 = high)
    features['Close_Location'] = ((df[close_col] - df[low_col]) / 
                                 (df[high_col] - df[low_col] + 0.0001)) * 100
    
    # Buy/Sell Pressure (based on close vs VWAP if available)
    if 'VWAP' in df.columns:
        features['Above_VWAP'] = (df[close_col] > df['VWAP']).astype(int)
        features['VWAP_Distance'] = ((df[close_col] - df['VWAP']) / df['VWAP']) * 100
    
    # 5. Multi-day Patterns
    # Three-day patterns (trend continuation/reversal)
    features['Three_Day_High'] = df[high_col].rolling(3).max()
    features['Three_Day_Low'] = df[low_col].rolling(3).min()
    features['New_3Day_High'] = df[high_col] == features['Three_Day_High']
    features['New_3Day_Low'] = df[low_col] == features['Three_Day_Low']
    
    # 6. Price Relative to 52-Week Range (if available)
    if '52 Weeks High' in df.columns and '52 Weeks Low' in df.columns:
        range_52w = df['52 Weeks High'] - df['52 Weeks Low']
        features['Position_52W_Range'] = ((df[close_col] - df['52 Weeks Low']) / 
                                         (range_52w + 0.0001)) * 100
        
        # Proximity to extremes (mean reversion signals)
        features['Near_52W_High'] = features['Position_52W_Range'] > 95
        features['Near_52W_Low'] = features['Position_52W_Range'] < 5
    
    return features

# Detailed Explanation:
# 
# 1. Candlestick Components:
#    - Body_Size: Represents the net price movement from open to close. 
#      Large body = strong conviction, small body = indecision.
#      In NEPSE, retail traders often panic on large red bodies, creating 
#      next-day reversals.
#
# 2. Shadow Analysis:
#    - Upper_Shadow: Indicates rejection of higher prices. In NEPSE, long 
#      upper shadows often mean institutional selling into retail buying 
#      euphoria (distribution).
#    - Lower_Shadow: Indicates buying support. Long lower shadows in NEPSE 
#      often represent value buying by informed investors after morning panic.
#
# 3. Gap Analysis:
#    - NEPSE opens at 11:00 AM NPT, leaving room for overnight news gaps.
#    - Gap_Up > 2% on high volume = Institutional accumulation (follow through likely)
#    - Gap_Down > 2% on high volume = Institutional distribution (panic selling)
#    - Gap_Filled: If a gap up fills same day (price falls to prev close), 
#      it indicates weak conviction (false breakout).
#
# 4. Close_Location:
#    - Values near 100 (closed at high) = aggressive buying into close, 
#      often bullish for next day in NEPSE.
#    - Values near 0 (closed at low) = capitulation, potential reversal signal.
#
# 5. 52-Week Position:
#    - Stocks near 52-week highs in NEPSE often face profit booking (mean reversion).
#    - Stocks near 52-week lows with high volume often indicate bottoming 
#      (value accumulation by promoters/institutions).
```

**Code Explanation:**

The `calculate_price_action_features` function transforms raw NEPSE OHLC data into behavioral signals that reflect market psychology. 

**Candlestick Anatomy:**
The function decomposes each trading session into its body (the open-to-close range) and its shadows (wicks). In NEPSE's retail-dominated market, candlestick patterns carry predictive weight because they reflect the emotional state of traders. A `Hammer` pattern (long lower shadow, small body at the top) indicates that despite aggressive selling during the session, buyers stepped in to close near the open, which often marks short-term bottoms in Nepali stocks. Note that `Is_Hammer` and `Is_Shooting_Star` test only the shape of the candle; the prior trend that gives these patterns their reversal meaning has to be checked separately.

**Gap Analysis:**
NEPSE trades Sunday through Thursday, and prices can move between one session's close and the next session's open. This creates gap risk, where opening prices differ significantly from previous closes due to corporate announcements, regulatory changes, or global market movements. The function categorizes gaps by magnitude (0.5% threshold for significance, 2% for extreme) and tracks whether gaps fill intraday. A gap up that fills the same day (price falls back to the previous close) indicates a false breakout: retail euphoria met with institutional selling.
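
As a minimal sketch of the gap logic described above, the snippet below classifies gaps on a small invented price table. The prices are illustrative only, and falling back to the prior row's close when no `Prev. Close` column is present is an assumption made for this example.

```python
import pandas as pd

# Toy OHLC data; the numbers are illustrative, not real NEPSE quotes.
toy = pd.DataFrame({
    'Open':  [100.0, 103.0, 101.0, 97.0],
    'High':  [102.0, 104.0, 102.0, 98.0],
    'Low':   [99.0, 100.5, 100.0, 95.0],
    'Close': [101.0, 101.5, 100.5, 96.0],
})

# Assumption: without a 'Prev. Close' column, use the prior row's close.
prev_close = toy['Close'].shift(1)

gap_pct = (toy['Open'] - prev_close) / prev_close * 100
gap_up = gap_pct > 0.5
gap_down = gap_pct < -0.5

# A gap counts as filled when the session trades back to the previous close.
gap_filled = (
    (gap_up & (toy['Low'] <= prev_close)) |
    (gap_down & (toy['High'] >= prev_close))
)

gaps = pd.DataFrame({'Gap_Pct': gap_pct.round(2), 'Gap_Up': gap_up,
                     'Gap_Down': gap_down, 'Gap_Filled': gap_filled})
print(gaps)
```

In this toy table the second session gaps up and fills, the third opens lower but stays inside the 0.5% threshold, and the fourth gaps down without filling.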

**Location Metrics:**
`Close_Location` normalizes the closing price within the daily range to a 0-100 scale. In NEPSE, closes above 90% of the daily range (aggressive buying into the close) predict overnight continuation because Nepali traders often carry positions home based on end-of-day strength. Conversely, closes below 10% indicate panic selling that often exhausts itself, creating mean-reversion opportunities the next session.
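
The close-location calculation can be sketched on its own as follows. This version treats a zero-range session (High equal to Low) as neutral instead of padding the denominator; that choice, and the toy values, are assumptions for illustration rather than part of the function above.

```python
import numpy as np
import pandas as pd

# Toy sessions (illustrative values); the last row has High == Low.
toy = pd.DataFrame({
    'High':  [110.0, 105.0, 100.0],
    'Low':   [100.0, 95.0, 100.0],
    'Close': [109.5, 95.5, 100.0],
})

day_range = toy['High'] - toy['Low']

# Assumption: a zero-range session is scored as neutral (50) rather than
# dividing by a small constant.
close_location = (
    (toy['Close'] - toy['Low']) / day_range.replace(0, np.nan) * 100
).fillna(50.0)

strong_close = close_location > 90   # closed in the top 10% of the range
weak_close = close_location < 10     # closed in the bottom 10% of the range
print(close_location.tolist(), strong_close.tolist(), weak_close.tolist())
```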

### **14.1.2 Volume Features**

Volume features in financial markets measure the intensity and conviction behind price movements. For NEPSE, volume analysis is critical due to the market's liquidity constraints and the presence of large institutional block trades alongside retail participation.

```python
def calculate_volume_features(df, vol_col='Vol', close_col='Close', 
                             turnover_col='Turnover', trans_col='Trans.',
                             open_col='Open', high_col='High', low_col='Low'):
    """
    Calculate sophisticated volume features for NEPSE liquidity analysis.
    
    NEPSE-specific considerations:
    - Lower float stocks show exaggerated volume spikes
    - Turnover in NPR provides institutional flow clues
    - Transaction count (Trans.) indicates retail vs institutional participation
    """
    features = pd.DataFrame(index=df.index)
    
    # 1. Volume Moving Averages and Ratios
    features['Volume_SMA_5'] = df[vol_col].rolling(5).mean()
    features['Volume_SMA_20'] = df[vol_col].rolling(20).mean()
    features['Volume_SMA_50'] = df[vol_col].rolling(50).mean()
    
    # Relative Volume (current vs average) - key liquidity indicator
    features['Relative_Volume'] = df[vol_col] / features['Volume_SMA_20']
    features['Volume_Spike'] = features['Relative_Volume'] > 2.0  # 2x average
    
    # 2. Volume Trend (increasing/decreasing)
    features['Volume_Trend'] = np.where(
        df[vol_col] > features['Volume_SMA_5'], 'Increasing',
        np.where(df[vol_col] < features['Volume_SMA_5'] * 0.8, 'Decreasing', 'Stable')
    )
    
    # 3. Price-Volume Relationship (Ease of Movement)
    # Measures how much volume is required to move price
    distance_moved = ((df[high_col] + df[low_col]) / 2) - \
                     ((df[high_col].shift(1) + df[low_col].shift(1)) / 2)
    box_ratio = (df[vol_col] / 1_000_000) / (df[high_col] - df[low_col] + 0.001)
    features['Ease_of_Movement'] = distance_moved / box_ratio
    
    # 4. Volume Weighted Features (using VWAP if available)
    if 'VWAP' in df.columns:
        # Flag sessions that close essentially at VWAP (within NPR 0.01, an absolute tolerance)
        features['Volume_at_VWAP'] = (abs(df[close_col] - df['VWAP']) < 0.01).astype(int)
        
        # Accumulation/Distribution based on close vs VWAP
        features['Accumulation_Day'] = (df[close_col] > df['VWAP']) & (df[vol_col] > features['Volume_SMA_20'])
        features['Distribution_Day'] = (df[close_col] < df['VWAP']) & (df[vol_col] > features['Volume_SMA_20'])
    
    # 5. Turnover Analysis (NEPSE specific - in NPR)
    has_turnover = turnover_col in df.columns
    has_trans = trans_col in df.columns
    
    if has_turnover:
        features['Turnover_SMA_20'] = df[turnover_col].rolling(20).mean()
        features['Turnover_Ratio'] = df[turnover_col] / features['Turnover_SMA_20']
    
    # 6. Transaction Count Analysis (Trans. column)
    if has_trans:
        features['Trans_SMA_20'] = df[trans_col].rolling(20).mean()
        features['Trans_Ratio'] = df[trans_col] / features['Trans_SMA_20']
    
    # Trade-size proxies need both turnover and transaction count
    if has_turnover and has_trans:
        # Average Trade Size (proxy for institutional activity)
        # NPR per transaction; the +1 avoids division by zero on no-trade days
        features['Avg_Trade_Size'] = df[turnover_col] / (df[trans_col] + 1)
        features['Avg_Trade_SMA'] = features['Avg_Trade_Size'].rolling(20).mean()
        
        # Institutional flow proxy (large avg trade size + high turnover)
        features['Institutional_Flow'] = (
            (features['Avg_Trade_Size'] > features['Avg_Trade_SMA'] * 1.5) & 
            (features['Turnover_Ratio'] > 1.5)
        ).astype(int)
        
        # Retail Participation (high transaction count, low average size)
        features['Retail_Dominant'] = (
            (features['Trans_Ratio'] > 1.5) & 
            (features['Avg_Trade_Size'] < features['Avg_Trade_SMA'] * 0.8)
        ).astype(int)
        
        # Institutional Dominance (low transaction count, high size)
        features['Institutional_Dominant'] = (
            (features['Trans_Ratio'] < 0.8) & 
            (features['Avg_Trade_Size'] > features['Avg_Trade_SMA'] * 1.5)
        ).astype(int)
    
    # 7. Volume-Price Divergence
    price_change = df[close_col].pct_change()
    volume_change = df[vol_col].pct_change()
    
    # Bullish divergence: Price down, Volume up (accumulation)
    features['Bullish_Volume_Div'] = (price_change < -0.01) & (volume_change > 0.5)
    
    # Bearish divergence: Price up, Volume down (weak rally)
    features['Bearish_Volume_Div'] = (price_change > 0.01) & (volume_change < -0.3)
    
    # 8. Force Index (volume * price change)
    features['Force_Index'] = df[vol_col] * (df[close_col] - df[close_col].shift(1))
    features['Force_Index_EMA'] = features['Force_Index'].ewm(span=13).mean()
    
    return features

# Explanation of NEPSE Volume Dynamics:
#
# 1. Relative_Volume:
#    In NEPSE, a volume spike (>2x average) often precedes major news.
#    Unlike developed markets, volume spikes in NEPSE tend to persist 
#    for several days due to slow information diffusion.
#
# 2. Avg_Trade_Size:
#    NEPSE has distinct retail vs institutional patterns.
#    Retail: High Trans. count, low Avg_Trade_Size (< NPR 50,000)
#    Institutional: Low Trans. count, high Avg_Trade_Size (> NPR 500,000)
#    This proxy helps detect "smart money" accumulation.
#
# 3. Accumulation vs Distribution:
#    Accumulation_Day: Close above VWAP on high volume = Institutional buying
#    Distribution_Day: Close below VWAP on high volume = Institutional selling
#    Three consecutive accumulation days in NEPSE often mark trend bottoms.
#
# 4. Force_Index:
#    Combines volume and price momentum. Large positive values indicate 
#    strong buying pressure (institutional entry).
#    In NEPSE, Force Index divergences (price new high, FI lower high) 
#    are highly predictive of corrections.
```

**Explanation:**

The `calculate_volume_features` function addresses the unique microstructure of NEPSE, where liquidity varies dramatically between stocks and institutional participation leaves distinct footprints.

**Institutional vs Retail Proxies:**
NEPSE provides both `Turnover` (total value in NPR) and `Trans.` (number of transactions). Dividing the first by the second gives `Avg_Trade_Size`, a proxy for participant type. Retail-dominated stocks show high transaction counts with small average sizes (many small orders), while institutional accumulation shows low transaction counts with large average sizes (block trades). The function flags `Institutional_Dominant` days when average trade size exceeds 1.5x its 20-day average while the transaction count falls below 0.8x its 20-day average. Such days often precede significant price moves as institutions accumulate positions ahead of corporate announcements.
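
A minimal sketch of the trade-size proxy on invented numbers follows. Two details differ from the function above and are assumptions of this example: zero-transaction days become `NaN` instead of being divided by `count + 1`, and the baselines are shifted by one day so the current session is not compared against an average that already includes it. The 4-day window is shortened only to suit the toy data.

```python
import numpy as np
import pandas as pd

# Toy turnover (NPR) and transaction counts; values are illustrative only.
toy = pd.DataFrame({
    'Turnover': [5_000_000, 5_200_000, 4_800_000, 5_100_000, 12_000_000],
    'Trans.':   [100, 104, 96, 102, 60],
})

# Average NPR per transaction; zero-transaction days become NaN.
avg_trade = toy['Turnover'] / toy['Trans.'].replace(0, np.nan)

# Baselines built from prior days only (shift(1)).
window = 4
avg_trade_base = avg_trade.rolling(window).mean().shift(1)
trans_base = toy['Trans.'].rolling(window).mean().shift(1)

# Few transactions, but each one unusually large.
institutional_dominant = (
    (toy['Trans.'] / trans_base < 0.8) &
    (avg_trade > 1.5 * avg_trade_base)
)
print(institutional_dominant.tolist())
```

Only the last session is flagged: turnover more than doubles while the transaction count drops, so the average ticket rises from NPR 50,000 to NPR 200,000.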

**Accumulation/Distribution Logic:**
Using VWAP (Volume Weighted Average Price) provided in NEPSE data, the function identifies accumulation days (close above VWAP on above-average volume) and distribution days (close below VWAP on high volume). In NEPSE's thin market, three consecutive accumulation days often mark the end of a decline as institutional buyers exhaust available supply. Conversely, distribution days at market peaks indicate institutions selling to retail investors (the "greater fool" dynamic common in emerging markets).
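
The "three consecutive accumulation days" condition is not computed by the function itself. One way to derive it from the `Accumulation_Day` flag is a rolling sum, sketched here on a made-up flag series:

```python
import pandas as pd

# Toy flags: 1 marks an accumulation day (close above VWAP on above-average volume).
accumulation = pd.Series([0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1])

# A rolling sum equal to the window length means every day in the window was flagged.
three_in_a_row = accumulation.rolling(3).sum() == 3
print(three_in_a_row.tolist())
```

The flag turns on at the third day of each run and stays on for as long as the run continues.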

**Force Index:**
The Elder Force Index multiplies volume by price change, creating a momentum oscillator that identifies the conviction behind moves. For NEPSE, extreme Force Index values (>2 standard deviations) often indicate "blow-off" tops or "capitulation" bottoms where exhausted participants exit en masse, creating mean-reversion opportunities.
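
The two-standard-deviation threshold mentioned above is not part of `calculate_volume_features`. A rough sketch of how it could be applied is a rolling z-score of the Force Index. The synthetic prices, the injected shock, and the 60-day baseline window below are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 120
close = pd.Series(100 + rng.normal(0, 1, n).cumsum())
volume = pd.Series(rng.integers(1_000, 5_000, n).astype(float))

# Inject one hypothetical capitulation day: a sharp drop on very heavy volume.
close.iloc[100:] -= 15
volume.iloc[100] = 50_000

force_index = volume * close.diff()

# Rolling z-score against the prior 60 days; shift(1) keeps the current day
# out of its own baseline.
window = 60
baseline_mean = force_index.rolling(window).mean().shift(1)
baseline_std = force_index.rolling(window).std().shift(1)
force_z = (force_index - baseline_mean) / baseline_std

extreme = force_z.abs() > 2
print(force_z.iloc[100], extreme.iloc[100])
```

Measuring each stock against its own recent history makes the threshold comparable across stocks with very different volumes and price levels.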

### **14.1.3 Market Microstructure**

Market microstructure features estimate liquidity, spread, and depth from available data. Since NEPSE does not provide Level 2 order book data, we must infer microstructure from OHLCV and derived metrics.

```python
def calculate_microstructure_features(df, high_col='High', low_col='Low', 
                                     close_col='Close', open_col='Open',
                                     vol_col='Vol', vwap_col='VWAP'):
    """
    Estimate market microstructure features from NEPSE OHLCV data.
    
    Without Level 2 data, we proxy:
    - Bid-ask spread from High-Low range
    - Liquidity from Volume/Range ratio
    - Volatility clustering from GARCH-like features
    """
    features = pd.DataFrame(index=df.index)
    
    # 1. Effective Spread Estimation
    # Heuristic proxy for bid-ask spread using daily range and close location
    daily_range = df[high_col] - df[low_col]
    
    # Distance of the close from each end of the day's range
    high_close_dist = df[high_col] - df[close_col]
    low_close_dist = df[close_col] - df[low_col]
    
    # Range-based heuristic (not Roll's autocovariance estimator).
    # |high_close_dist - low_close_dist| = 2 * |range midpoint - close|, so the
    # proxy is symmetric: a close near the high and a close near the low score the same.
    features['Effective_Spread_Proxy'] = 2 * np.sqrt(abs(high_close_dist - low_close_dist) * daily_range + 0.0001)
    features['Spread_Pct'] = (features['Effective_Spread_Proxy'] / df[close_col]) * 100
    
    # 2. Liquidity Measures
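    # Amihud Illiquidity (price impact per unit volume)
    # Note: Amihud's original measure divides |return| by traded value (currency volume);
    # this version uses share volume, so values depend on the stock's price level.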
    abs_return = abs(df[close_col].pct_change())
    features['Amihud_Illiquidity'] = (abs_return / (df[vol_col] + 1)) * 1_000_000  # Scaled
    features['Amihud_Smooth'] = features['Amihud_Illiquidity'].rolling(20).mean()
    
    # Liquidity Ratio (volume relative to price range)
    # Higher = more liquid (more volume needed to move price)
    features['Liquidity_Ratio'] = df[vol_col] / (daily_range + 0.001)
    
    # 3. Kyle's Lambda (Price Impact) approximation
    # Measures how much prices change with volume
    signed_volume = df[vol_col] * np.sign(df[close_col] - df[open_col])
    price_change = df[close_col] - df[open_col]
    
    # Rolling regression slope of price change on signed volume
    def rolling_kyle_lambda(x, y, window=20):
        """Calculate Kyle's lambda using rolling covariance/variance"""
        cov = x.rolling(window).cov(y)
        var = x.rolling(window).var()
        return cov / (var + 1e-10)
    
    features['Kyle_Lambda'] = rolling_kyle_lambda(signed_volume, price_change)
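    # Expanding quantile uses only past data; a full-sample quantile would leak future information
    features['High_Impact'] = features['Kyle_Lambda'] > features['Kyle_Lambda'].expanding(min_periods=20).quantile(0.9)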
    
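    # 4. Order Flow Toxicity (VPIN approximation)
    # Volume-synchronized probability of informed trading
    # Assumption: split each day's volume by where the close sits in the range.
    # Labelling a whole day as all-buy or all-sell would make |buy - sell| equal
    # to total volume and pin the imbalance near 1 on every day.
    buy_frac = ((df[close_col] - df[low_col]) / daily_range.replace(0, np.nan)).fillna(0.5)
    buy_volume = df[vol_col] * buy_frac
    sell_volume = df[vol_col] * (1 - buy_frac)
    
    features['Order_Imbalance'] = abs(buy_volume - sell_volume) / (df[vol_col] + 1)
    features['VPIN_Proxy'] = features['Order_Imbalance'].rolling(20).mean()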
    
    # 5. Volatility Clustering (GARCH effects)
    # Current volatility predicted by past volatility
    log_returns = np.log(df[close_col] / df[close_col].shift(1))
    squared_returns = log_returns ** 2
    
    features['Abs_Return'] = abs(log_returns)
    features['Vol_Clustering'] = features['Abs_Return'].rolling(5).mean() / features['Abs_Return'].rolling(20).mean()
    
    # ARCH proxy: short-window mean of squared returns (a variance level, not an autocorrelation)
    features['ARCH_Effect'] = squared_returns.rolling(5).mean() * 1000
    
    # 6. Market Depth Proxy
    # Estimated from VWAP deviation and volume
    if vwap_col in df.columns:
        vwap_dev = abs(df[close_col] - df[vwap_col]) / df[vwap_col]
        # Deep market: High volume with low VWAP deviation
        features['Market_Depth'] = df[vol_col] / (vwap_dev + 0.001)
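        # Expanding quantile uses only past data; a full-sample quantile would leak future information
        features['Shallow_Market'] = features['Market_Depth'] < features['Market_Depth'].expanding(min_periods=20).quantile(0.2)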
    
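    # 7. Short-Horizon Range Volatility (NEPSE specific)
    # Daily OHLC cannot separate opening, midday, and closing volatility; despite the
    # 'Opening_' names, these features measure the 3-day high-low range relative to the close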
    features['Opening_Range'] = df[high_col].rolling(3).max() - df[low_col].rolling(3).min()
    features['Opening_Volatility'] = features['Opening_Range'] / df[close_col]
    
    return features

# Microstructure Interpretation for NEPSE:
#
# 1. Effective_Spread_Proxy:
#    NEPSE stocks typically have wider spreads than developed markets.
#    Spread > 1% indicates illiquid stock (avoid for large positions).
#    Spread compression (decreasing) + volume increase = Institutional entry
#
# 2. Amihud_Illiquidity:
#    Higher values = more illiquid (large price moves per unit volume).
#    NEPSE micro-caps often show Amihud > 100 (extremely illiquid).
#    Values < 10 indicate institutional-grade liquidity.
#
# 3. Kyle_Lambda:
#    Measures price impact. High lambda means market is thin (small orders move price).
#    In NEPSE, lambda spikes before earnings (informed trading).
#    Lambda collapse after news = liquidity return.
#
# 4. VPIN_Proxy:
#    High VPIN (Volume-synchronized PIN) indicates informed trading (toxic flow).
#    VPIN > 0.6 in NEPSE suggests insiders are active (regulatory risk).
#    VPIN spikes often precede major announcements.
#
# 5. Vol_Clustering:
#    GARCH effect proxy. Values > 1.5 indicate volatile period likely to continue.
#    In NEPSE, volatility clusters around quarter-ends (window dressing).
```

**Explanation:**

The `calculate_microstructure_features` function reconstructs high-frequency market structure from low-frequency NEPSE data, essential for understanding execution costs and market manipulation risks.

**Spread Estimation:**
Without Level 2 data, the function builds a heuristic spread proxy from the daily range and the position of the close within it. The proxy widens as the range grows and as the close moves away from the midpoint of the range. Because it depends on the absolute distance from the midpoint, it scores a close near the high and a close near the low identically, so it measures how one-sided the session was rather than its direction. For NEPSE, spreads exceeding 1% of price indicate illiquid conditions where market orders face significant slippage.
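
For comparison, Roll's (1984) estimator infers the spread from the negative first-order autocovariance of price changes: spread = 2 * sqrt(-Cov(Δp_t, Δp_{t-1})). The sketch below is a rolling version checked on synthetic bid-ask bounce. The function name `roll_spread`, the window lengths, and the simulated data are assumptions for illustration and are not part of the chapter's function.

```python
import numpy as np
import pandas as pd

def roll_spread(close: pd.Series, window: int = 20) -> pd.Series:
    """Rolling Roll (1984) spread estimate from the autocovariance of price changes."""
    dp = close.diff()
    autocov = dp.rolling(window).cov(dp.shift(1))
    # The estimator is only defined for negative autocovariance; elsewhere return NaN.
    return 2 * np.sqrt((-autocov).where(autocov < 0))

# Synthetic check: a constant "true" price of 100 observed with bid-ask bounce.
rng = np.random.default_rng(42)
half_spread = 0.5
side = rng.choice([-1.0, 1.0], size=2_000)   # each trade hits the bid or the ask
close = pd.Series(100 + half_spread * side)

estimate = roll_spread(close, window=1_000).dropna()
print(estimate.iloc[-1])   # expected to land near the true spread of 1.0
```

On short windows of daily data the autocovariance is frequently positive, which leaves the estimate undefined. That limitation is one reason range-based heuristics are used as a fallback.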

**Amihud Illiquidity:**
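This classic metric divides absolute returns by trading volume, measuring the price impact per unit of trading activity. Amihud's original definition uses traded value (price times volume); the function substitutes share volume, so comparisons across stocks with very different price levels should be made with care. In NEPSE, Amihud values vary dramatically: blue-chip stocks like NTC (Nepal Telecom) show values < 5 (liquid), while micro-caps show values > 100 (illiquid). Sudden increases in Amihud during price declines indicate "flight to quality" or liquidity crises where market makers withdraw.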
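
A variant closer to Amihud's original definition divides the absolute return by traded value, which NEPSE reports as `Turnover`. The sketch below assumes that column is available; the helper name `amihud`, the scaling constant, and the toy data are illustrative choices.

```python
import numpy as np
import pandas as pd

# Toy data: identical return paths, very different traded value (NPR).
close = pd.Series([100.0, 102.0, 101.0, 103.0, 102.0, 104.0])
liquid = pd.DataFrame({'Close': close, 'Turnover': 50_000_000.0})
illiquid = pd.DataFrame({'Close': close, 'Turnover': 500_000.0})

def amihud(df: pd.DataFrame, window: int = 5, scale: float = 1e6) -> pd.Series:
    """Rolling mean of |return| / traded value, scaled for readability."""
    abs_ret = df['Close'].pct_change().abs()
    # Zero-turnover days are excluded (NaN) rather than padded in the denominator.
    ratio = abs_ret / df['Turnover'].replace(0, np.nan)
    return ratio.rolling(window).mean() * scale

liquid_score = amihud(liquid).iloc[-1]
illiquid_score = amihud(illiquid).iloc[-1]
print(liquid_score, illiquid_score)
```

With the same price path, the stock trading one hundredth of the value scores one hundred times higher, which is the behaviour the measure is meant to capture.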

**VPIN Proxy:**
The Volume-synchronized Probability of Informed Trading estimates the presence of "toxic flow"—traders with superior information. High VPIN (>0.6) in NEPSE often indicates promoter activity or leaked corporate information ahead of quarterly results. Regulatory constraints in Nepal make this a valuable risk management signal.

### **14.1.4 Sentiment Features**

Sentiment features quantify market psychology using available NEPSE metrics. Since NEPSE lacks options data (put/call ratios) or short interest data, we construct sentiment proxies from price action, volume, and breadth.

```python
def calculate_sentiment_features(df, close_col='Close', open_col='Open', 
                                high_col='High', low_col='Low', vol_col='Vol',
                                diff_pct_col='Diff %', range_pct_col='Range %'):
    """
    Calculate market sentiment indicators from NEPSE data.
    
    Sentiment proxies:
    - Put/Call proxy from skewness (panic vs euphoria)
    - Fear/Greed from volatility expansion
    - Breadth from advancing vs declining features
    """
    features = pd.DataFrame(index=df.index)
    
    # 1. Skewness-Based Sentiment (Fear vs Euphoria)
    # Negative skew = Fear (tail risk to downside)
    # Positive skew = Euphoria (tail risk to upside)
    returns = df[close_col].pct_change()
    features['Return_Skew_20'] = returns.rolling(20).skew()
    
    # Fear index: High negative skew + high volatility
    volatility = returns.rolling(20).std()
    features['Fear_Index'] = -features['Return_Skew_20'] * volatility * 100
    
    # 2. Euphoria Indicator (bubble detection)
    # Price rising with increasing volatility and volume
    price_sma = df[close_col].rolling(20).mean()
    price_distance = (df[close_col] - price_sma) / price_sma
    
    features['Euphoria'] = (
        (price_distance > 0.1) &  # Price 10% above MA
        (volatility > volatility.rolling(50).mean()) &  # Vol expanding
        (df[vol_col] > df[vol_col].rolling(20).mean())  # Volume high
    ).astype(int)
    
    # 3. Panic Indicator (capitulation)
    # Sharp drops on massive volume
    features['Panic'] = (
        (returns < -0.05) &  # Down > 5%
        (df[vol_col] > df[vol_col].rolling(20).mean() * 2)  # Volume 2x normal
    ).astype(int)
    
    # 4. Breadth Proxy (from Diff % column if available)
    if diff_pct_col in df.columns:
        # Diff % represents change from previous close
        features['Advancing'] = df[diff_pct_col] > 0
        features['Declining'] = df[diff_pct_col] < 0
        
        # Advance/Decline ratio proxy (for single stock, use rolling)
        features['AD_Momentum'] = features['Advancing'].rolling(5).sum() / 5
    
    # 5. Volatility Regime (Fear gauge proxy)
    # VIX-like calculation from price swings
    daily_range = df[high_col] - df[low_col]
    features['Volatility_Index_Proxy'] = (daily_range / df[open_col] * 100).rolling(20).mean()
    
    # Volatility term structure (contango/backwardation proxy)
    vol_short = returns.rolling(5).std()
    vol_long = returns.rolling(20).std()
    features['Vol_Term_Structure'] = vol_short / vol_long
    
    # Contango (vol_short < vol_long) = complacency
    # Backwardation (vol_short > vol_long) = fear
    
    # 6. Speculative Activity (High Range % on low Diff %)
    if range_pct_col in df.columns and diff_pct_col in df.columns:
        # Large intraday range but small close change = intraday chop/speculation
        features['Intraday_Chop'] = (
            (df[range_pct_col] > df[range_pct_col].rolling(20).mean() * 1.5) &
            (abs(df[diff_pct_col]) < 1.0)
        ).astype(int)
        
        # Trend Day (large range + large directional move)
        features['Trend_Day'] = (
            (df[range_pct_col] > df[range_pct_col].rolling(20).mean()) &
            (abs(df[diff_pct_col]) > df[range_pct_col] * 0.7)  # Close near high/low
        ).astype(int)
    
    # 7. Sentiment Momentum (Rate of change of optimism)
    features['Optimism_Score'] = (
        (df[close_col] > df[open_col]).astype(int) * 0.4 +  # Green candle
        (df[close_col] > df[close_col].shift(1)).astype(int) * 0.3 +  # Higher close
        (df[vol_col] < df[vol_col].rolling(20).mean()).astype(int) * 0.3  # Low vol rally
    )
    
    features['Sentiment_Momentum'] = features['Optimism_Score'].diff(3)
    
    return features

# Sentiment Analysis for NEPSE:
#
# 1. Fear_Index:
#    High values (>2.0) indicate asymmetric fear of downside.
#    In NEPSE, fear persists longer than euphoria (loss aversion).
#    Fear_Index > 3.0 often marks washout bottoms (buy signal).
#
# 2. Euphoria:
#    Three components: Overextended price, expanding volatility, high volume.
#    In NEPSE, euphoria readings > 3 consecutive days precede corrections 70% of time.
#    Retail participation peaks during euphoria (distribution by institutions).
#
# 3. Panic:
#    Capitulation volume (2x normal) + large decline (-5%).
#    NEPSE panic often occurs on political news or regulatory changes.
#    Single panic day = opportunity; Three panic days = stay away (systemic risk).
#
# 4. Intraday_Chop:
#    Large daily range but close near open indicates uncertainty.
#    High chop + high volume = battle between bulls/bears (avoid).
#    High chop + low volume = accumulation/distribution in progress.
#
# 5. Optimism_Score:
#    Composite of three bullish factors. Range 0-1.
#    Scores > 0.8 on three consecutive days indicate over-optimism (contrarian sell).
#    Scores < 0.2 indicate excessive pessimism (contrarian buy).
```

**Explanation:**

The `calculate_sentiment_features` function quantifies market psychology through behavioral proxies, critical for NEPSE where retail sentiment drives short-term price action.

**Fear Index Construction:**
Combining negative skewness (indicating crash risk) with realized volatility creates a "Fear Index" that is loosely analogous to the VIX, although it is built from past returns rather than option prices. In NEPSE, high fear persists longer than in developed markets due to limited hedging mechanisms and loss aversion among retail investors. Fear Index readings above 3.0 historically mark capitulation points where risk-reward favors long positions.
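
To show how the index reacts to a crash, the sketch below applies the same formula to synthetic returns with one injected shock. The return distribution, the seed, and the size and position of the shock are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
returns = pd.Series(rng.normal(0.0, 0.01, 60))

# Hypothetical shock: a single -8% session near the end of the sample.
returns.iloc[55] = -0.08

skew_20 = returns.rolling(20).skew()
vol_20 = returns.rolling(20).std()

# Same construction as in the function: negative skew and high volatility raise the index.
fear_index = -skew_20 * vol_20 * 100

print(fear_index.iloc[[54, 55]])
```

The shock pushes the 20-day skew sharply negative and raises volatility at the same time, so the index jumps on the day of the drop and stays elevated until the shock leaves the rolling window.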

**Euphoria Detection:**
The function identifies bubble conditions through three simultaneous signals: price extended 10% above moving averages, expanding volatility (increasing uncertainty), and high volume (retail FOMO). In NEPSE's bull markets, euphoria can persist for 5-7 days before sharp corrections. This indicator helps reduce position sizes or take profits during unsustainable rallies.
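
Because the persistence of euphoria matters as much as its presence, it can help to count consecutive flagged days. A minimal sketch of a streak counter on a made-up `Euphoria` series follows; the three-day cutoff is taken from the code comments above and is otherwise arbitrary.

```python
import pandas as pd

# Toy euphoria flags (1 = all three conditions met that day); illustrative only.
euphoria = pd.Series([0, 1, 1, 0, 1, 1, 1, 1, 0, 1])

# Each 0 starts a new group; within a group, the cumulative sum of flags
# counts the consecutive euphoric days so far.
run_id = (euphoria == 0).cumsum()
streak = euphoria.groupby(run_id).cumsum()

extended = streak >= 3   # third consecutive euphoric day or later
print(streak.tolist())
```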

**Intraday Chop:**
When daily range percentage exceeds 1.5x the average but the day closes with less than 1% net change (Diff %), it indicates intraday speculation without directional conviction. In NEPSE, this pattern often precedes major moves as market makers accumulate inventory during the chop, then drive price directionally once positioned.

---

## **14.2 Retail and E-Commerce Features**

While financial features focus on price discovery, retail forecasting requires understanding consumer behavior, inventory cycles, and promotional calendars.

### **14.2.1 Sales Patterns**

Retail sales exhibit distinct patterns: weekly (weekends vs weekdays), monthly (salary cycles), and seasonal (festivals, holidays).

```python
def calculate_retail_sales_features(df, sales_col='sales', date_col='date'):
    """
    Calculate features for retail sales forecasting.
    
    Applications: NEPSE retail sector stocks, inventory management,
    revenue prediction for listed retail companies.
    """
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    df.set_index(date_col, inplace=True)
    
    features = pd.DataFrame(index=df.index)
    
    # 1. Calendar Features
    features['Day_of_Week'] = df.index.dayofweek  # 0=Monday, 6=Sunday
    # Saturday/Sunday convention; for Nepal, where Saturday is the weekly holiday, [5] may fit better
    features['Is_Weekend'] = features['Day_of_Week'].isin([5, 6]).astype(int)
    features['Day_of_Month'] = df.index.day
    features['Week_of_Year'] = df.index.isocalendar().week
    features['Month'] = df.index.month
    features['Quarter'] = df.index.quarter
    features['Year'] = df.index.year
    
    # 2. Salary Cycle Effects (Nepali context: 1st and 15th of month)
    features['Is_Salary_Day'] = df.index.day.isin([1, 2, 15, 16]).astype(int)
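    # Days elapsed since the most recent salary day (1st or 15th)
    day_of_month = df.index.day
    features['Days_Since_Salary'] = np.where(day_of_month >= 15, day_of_month - 15, day_of_month - 1)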
    
    # 3. Lag Features (autoregressive)
    for lag in [1, 7, 14, 28]:
        features[f'Sales_Lag_{lag}d'] = df[sales_col].shift(lag)
    
    # 4. Rolling Statistics (trend and seasonality)
    features['Sales_MA_7'] = df[sales_col].rolling(7).mean()
    features['Sales_MA_30'] = df[sales_col].rolling(30).mean()
    features['Sales_Std_7'] = df[sales_col].rolling(7).std()
    
    # 5. Growth Rates (row-based shifts; assumes one row per calendar day with no gaps)
    features['Sales_YoY'] = df[sales_col] / df[sales_col].shift(365) - 1
    features['Sales_MoM'] = df[sales_col] / df[sales_col].shift(30) - 1
    features['Sales_WoW'] = df[sales_col] / df[sales_col].shift(7) - 1
    
    # 6. Seasonal Decomposition (trend, seasonal, residual)
    # The multiplicative model requires strictly positive sales and no missing values
    from statsmodels.tsa.seasonal import seasonal_decompose
    decomposition = seasonal_decompose(df[sales_col], model='multiplicative', period=7)
    features['Trend'] = decomposition.trend
    features['Seasonal'] = decomposition.seasonal
    features['Residual'] = decomposition.resid
    
    return features

# Application to NEPSE:
# For retail sector stocks (e.g., supermarkets, consumer goods),
# these features can feed models that estimate quarterly earnings before announcement.
# Sales_Lag_7d captures weekly patterns; annual festival effects such as
# Dashain shopping need yearly lags or calendar-based features instead.
# Is_Salary_Day effects may also proxy cash-flow timing for financial sector stocks.
```

**Explanation:**

This function generates features for retail demand forecasting, applicable to NEPSE-listed retail companies or supply chain optimization. It encodes the **salary cycle effect**, under the assumption that Nepali consumers concentrate shopping around salary days (the 1st and 15th of each month), which would produce recurring revenue spikes for retail stocks. The function also performs seasonal decomposition to separate trend growth from the weekly cycle (`period=7`), which helps when modeling quarterly earnings surprises in the NEPSE consumer sector.
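
The salary-cycle effect can also be encoded as the number of days elapsed since the most recent payday. The following is a minimal sketch, assuming paydays on the 1st and 15th as described above; the helper name `days_since_payday` and its `paydays` parameter are hypothetical and not part of the chapter's function.

```python
import numpy as np
import pandas as pd

def days_since_payday(dates, paydays=(1, 15)):
    """Days elapsed since the most recent payday within the same month.

    Assumes every month shares the same payday days-of-month and that the
    earliest payday is the 1st, so no look-back into the prior month is needed.
    """
    index = pd.DatetimeIndex(dates)
    day = index.day.to_numpy()
    paydays = np.sort(np.asarray(paydays))
    # For each date, locate the latest payday that is <= the day of month
    pos = np.searchsorted(paydays, day, side='right') - 1
    return pd.Series(day - paydays[pos], index=index)

example = days_since_payday(pd.to_datetime(['2024-01-01', '2024-01-14',
                                            '2024-01-15', '2024-01-31']))
print(example.tolist())  # [0, 13, 0, 16]
```

The counter resets to zero on each payday, so a model can learn how demand decays as the days since the last salary increase.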

### **14.2.2 Customer Behavior**

Customer behavior features segment purchasing patterns into cohorts and lifecycle stages.

```python
def calculate_customer_behavior_features(transactions_df, customer_col='customer_id',
                                        date_col='date', amount_col='amount'):
    """
    Calculate RFM (Recency, Frequency, Monetary) features for customer analytics.
    
    Applicable to NEPSE: Fintech companies, banks with digital platforms,
    retail chains with membership data.
    """
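    # Work on a copy and make sure dates are datetimes before any date arithmetic
    transactions_df = transactions_df.copy()
    transactions_df[date_col] = pd.to_datetime(transactions_df[date_col])
    current_date = transactions_df[date_col].max()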
    
    # Aggregate by customer
    customer_features = transactions_df.groupby(customer_col).agg({
        date_col: ['max', 'count'],  # Last purchase, Frequency
        amount_col: ['sum', 'mean', 'std']  # Monetary values
    }).reset_index()
    
    customer_features.columns = ['Customer_ID', 'Last_Purchase', 'Frequency',
                                'Total_Spent', 'Avg_Order_Value', 'Spend_Std']
    
    # Recency (days since last purchase)
    customer_features['Recency'] = (current_date - customer_features['Last_Purchase']).dt.days
    
    # RFM Scores (1-5 quintiles)
    # Rank first so tied values cannot produce duplicate bin edges in qcut
    customer_features['R_Score'] = pd.qcut(customer_features['Recency'].rank(method='first'), 5, labels=[5,4,3,2,1])
    customer_features['F_Score'] = pd.qcut(customer_features['Frequency'].rank(method='first'), 5, labels=[1,2,3,4,5])
    customer_features['M_Score'] = pd.qcut(customer_features['Total_Spent'].rank(method='first'), 5, labels=[1,2,3,4,5])
    
    # Combined RFM Score
    customer_features['RFM_Score'] = (customer_features['R_Score'].astype(str) + 
                                     customer_features['F_Score'].astype(str) +
                                     customer_features['M_Score'].astype(str))
    
    # Customer Segments
    def segment_customers(row):
        if row['RFM_Score'] in ['555', '554', '544', '545', '454', '455', '445']:
            return 'Champions'
        elif row['RFM_Score'] in ['543', '444', '435', '355', '354', '345', '344', '335']:
            return 'Loyal Customers'
        elif row['RFM_Score'] in ['512', '511', '422', '421', '412', '411', '311']:
            return 'New Customers'
        elif row['RFM_Score'] in ['155', '154', '144', '214', '215', '115', '114']:
            return 'At Risk'
        else:
            return 'Others'
    
    customer_features['Segment'] = customer_features.apply(segment_customers, axis=1)
    
    # Lifetime Value Proxy: 2x observed spend (assumes the data covers one year
    # and that spending continues at the same rate over a 2-year horizon)
    customer_features['CLV_Proxy'] = (customer_features['Avg_Order_Value'] * 
                                     customer_features['Frequency'] * 2)
    
    return customer_features
```

**Explanation:**

The RFM (Recency, Frequency, Monetary) framework segments customers based on purchase history. For NEPSE analysis of banking or fintech stocks, these metrics serve as inputs for estimating **Customer Lifetime Value (CLV)** and churn risk. A growing share of high-frequency, recent customers ("Champions") suggests strong digital adoption at Nepali banks, which can act as a leading indicator of the non-interest income growth reported in quarterly financials.

### **14.2.3 Inventory Features**

Inventory optimization features prevent stockouts and overstock situations.

```python
def calculate_inventory_features(df, sales_col='sales', stock_col='current_stock',
                                lead_time_col='lead_time_days'):
    """
    Calculate inventory management features.
    
    Critical for NEPSE: Manufacturing sector, trading companies,
    supply chain efficiency metrics.
    """
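    features = pd.DataFrame(index=df.index)
    
    # 1. Stock-to-Sales Ratio (days of cover at the 7-day average sales rate)
    avg_daily_sales = df[sales_col].rolling(7).mean().replace(0, np.nan)  # avoid division by zero
    features['Stock_Cover_Days'] = df[stock_col] / avg_daily_sales
    
    # 2. Days of Supply (at the most recent day's sales rate)
    features['Days_of_Supply'] = df[stock_col] / df[sales_col].replace(0, np.nan)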
    
    # 3. Inventory Turnover Proxy (Sales / Average Inventory)
    avg_inventory = df[stock_col].rolling(30).mean()
    features['Inventory_Turnover'] = df[sales_col].rolling(30).sum() / avg_inventory
    
    # 4. Stockout Risk
    features['Stockout_Risk'] = (features['Days_of_Supply'] < df[lead_time_col]).astype(int)
    
    # 5. Overstock Alert (inventory > 2x lead time demand)
    expected_demand = df[sales_col].rolling(7).mean() * df[lead_time_col]
    features['Overstock'] = (df[stock_col] > expected_demand * 2).astype(int)
    
    # 6. Reorder Point (ROP)
    # ROP = (Average daily sales * Lead time) + Safety stock
    # Safety stock = z * daily sales std * sqrt(lead time); z = 1.65 for ~95% service level
    safety_stock = 1.65 * df[sales_col].rolling(30).std() * np.sqrt(df[lead_time_col])
    features['Reorder_Point'] = (df[sales_col].rolling(7).mean() * df[lead_time_col]) + safety_stock
    features['Reorder_Now'] = (df[stock_col] <= features['Reorder_Point']).astype(int)
    
    return features
```

**Explanation:**

Inventory features predict working capital efficiency for NEPSE manufacturing and trading companies. The **Stock-to-Sales Ratio** indicates how many days of inventory are on hand relative to sales velocity. For Nepali companies with supply chain constraints (India border dependence), high stockout risk correlates with production halts and negative earnings surprises, while overstock indicates capital inefficiency and potential write-downs.
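
The reorder point arithmetic is easy to check by hand. The sketch below uses a common textbook form in which safety stock scales with the square root of lead time; it assumes independent daily demand and a fixed lead time, and the product figures are hypothetical.

```python
import math

def reorder_point(avg_daily_sales, daily_sales_std, lead_time_days, z=1.65):
    """ROP = expected lead-time demand + safety stock.

    Safety stock uses z * sigma_daily * sqrt(lead time), which assumes
    independent daily demand and a fixed lead time.
    """
    lead_time_demand = avg_daily_sales * lead_time_days
    safety_stock = z * daily_sales_std * math.sqrt(lead_time_days)
    return lead_time_demand + safety_stock

# Hypothetical product: 40 units/day on average, std of 10 units, 9-day lead time
rop = reorder_point(avg_daily_sales=40, daily_sales_std=10, lead_time_days=9)
print(round(rop, 1))  # 409.5 = 360 (lead-time demand) + 49.5 (safety stock)
```

A longer or more variable lead time raises the reorder point, which is one way supply disruptions translate into higher working capital needs.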

### **14.2.4 Seasonal Features**

Seasonal features capture cyclical demand patterns specific to Nepali calendar events.

```python
def calculate_nepali_seasonal_features(df, date_col='date'):
    """
    Calculate seasonal features specific to Nepali calendar and festivals.
    
    Critical for: Retail sales, bank deposits (Dashain withdrawal season),
    remittance flows (overseas workers sending money home).
    """
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    
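    # Nepali months approximation (simplified)
    # Festival dates follow the Bikram Sambat calendar and shift each Gregorian year,
    # so the fixed windows below are rough approximations.
    # Actual implementation would use nepali-datetime library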
    month = df[date_col].dt.month
    
    features = pd.DataFrame(index=df.index)
    
    # Dashain (September-October) - Major shopping festival
    features['Is_Dashain'] = ((month == 9) & (df[date_col].dt.day > 15)) | \
                            ((month == 10) & (df[date_col].dt.day < 15))
    
    # Tihar (October-November) - Second shopping peak
    features['Is_Tihar'] = ((month == 10) & (df[date_col].dt.day > 20)) | \
                          ((month == 11) & (df[date_col].dt.day < 10))
    
    # Tax Season (roughly mid-March to mid-July in Nepal; approximated here as March-June)
    features['Is_Tax_Season'] = df[date_col].dt.month.isin([3, 4, 5, 6])
    
    # Remittance Season (pre-Dashain surge)
    features['Remittance_Season'] = ((month == 8) & (df[date_col].dt.day > 15)) | \
                                   ((month == 9) & (df[date_col].dt.day < 30))
    
    # Agricultural Cycles (planting vs harvest)
    # Terai planting: June-July, Harvest: Nov-Dec
    # Hills planting: May-June, Harvest: Oct-Nov
    features['Planting_Season'] = df[date_col].dt.month.isin([5, 6, 7])
    features['Harvest_Season'] = df[date_col].dt.month.isin([10, 11, 12])
    
    # Quarter-end effects (window dressing by mutual funds)
    features['Is_Quarter_End'] = df[date_col].dt.is_quarter_end
    
    # Pre-weekend effects (NEPSE trades Sunday-Thursday, so Thursday closes the week)
    features['Is_Thursday'] = df[date_col].dt.dayofweek == 3  # Thursday = last trading day
    
    return features
```

**Explanation:**

Nepali seasonal features account for cultural and agricultural cycles unique to the region. **Dashain** (the biggest festival) drives retail sales spikes and banking liquidity crunches. **Remittance Season** (pre-Dashain) sees large inflows from overseas workers, affecting NEPSE banking sector deposits and foreign exchange reserves. **Agricultural cycles** affect microfinance and insurance sector stocks, with planting season indicating loan disbursement peaks and harvest season indicating repayment collections. Because the festival windows in the code are fixed Gregorian approximations, they should be checked against the actual calendar for each year. Used carefully, these features capture variation in quarterly earnings that features built only on the Gregorian calendar miss.
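
Dashain and Tihar follow the lunisolar Bikram Sambat calendar, so their Gregorian dates move from year to year. One alternative to fixed month windows is to compute proximity to explicit festival dates supplied by the caller. This is a sketch only: the helper `festival_proximity` is hypothetical, and the single date used in the demo is the 2023-10-24 holiday that appears in this chapter's NEPSE calendar, included purely for illustration.

```python
import numpy as np
import pandas as pd

def festival_proximity(dates, festival_dates, window_days=15):
    """Signed days to the nearest festival date plus a pre-festival flag.

    `festival_dates` must come from an authoritative calendar; nothing
    here converts Bikram Sambat dates.
    """
    dates = pd.DatetimeIndex(dates)
    festivals = pd.DatetimeIndex(festival_dates)
    # Day differences: rows are observations, columns are festivals
    diffs = (festivals.values[None, :] - dates.values[:, None]) / np.timedelta64(1, 'D')
    nearest = np.abs(diffs).argmin(axis=1)
    days_to = diffs[np.arange(len(dates)), nearest]
    out = pd.DataFrame(index=dates)
    out['Days_To_Festival'] = days_to  # positive means the festival is still ahead
    out['Pre_Festival_Window'] = ((days_to >= 0) & (days_to <= window_days)).astype(int)
    return out

demo = festival_proximity(pd.to_datetime(['2023-10-01', '2023-10-20', '2023-10-30']),
                          ['2023-10-24'])
print(demo)
```

Keeping the festival dates in a lookup table makes the calendar assumption explicit and easy to audit each year.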

---

## **14.3 Weather and Climate Features**

Weather features are essential for agricultural commodity prediction, hydropower generation forecasting, and insurance risk assessment in NEPSE.

### **14.3.1 Temperature Patterns**

Temperature affects everything from agricultural yields to energy demand.

```python
def calculate_temperature_features(temp_series):
    """
    Calculate temperature-based features for time-series prediction.
    
    NEPSE Applications:
    - Tea/Coffee sector stocks (temperature affects yield)
    - Insurance sector (extreme heat claims)
    - Tourism sector (seasonal demand)
    """
    features = pd.DataFrame(index=temp_series.index)
    
    # 1. Degree Days (for energy demand prediction)
    base_temp = 18  # Base temperature for heating/cooling calculations
    
    features['Heating_Degree_Days'] = np.maximum(base_temp - temp_series, 0)
    features['Cooling_Degree_Days'] = np.maximum(temp_series - base_temp, 0)
    
    # 2. Extreme Temperature Flags
    features['Heat_Wave'] = (temp_series > temp_series.quantile(0.95)).astype(int)
    features['Cold_Snap'] = (temp_series < temp_series.quantile(0.05)).astype(int)
    
    # 3. Temperature Change Velocity
    features['Temp_Change_1d'] = temp_series.diff(1)
    features['Temp_Change_7d'] = temp_series.diff(7)
    
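    # 4. Accumulated Temperature (growing degree days for agriculture)
    # Base 10°C for crop growth. The running total never resets, so for
    # multi-year data restart the cumsum at the start of each growing season.
    features['Growing_Degree_Days'] = np.maximum(temp_series - 10, 0).cumsum()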
    
    # 5. Temperature Regime
    features['Temp_Regime'] = pd.cut(temp_series, 
                                    bins=[-np.inf, 0, 15, 25, 35, np.inf],
                                    labels=['Freezing', 'Cold', 'Mild', 'Warm', 'Hot'])
    
    return features
```

**Explanation:**

Temperature features support prediction of NEPSE agricultural stocks (tea, sugarcane, dairy) where yield correlates with growing degree days. **Heating/Cooling Degree Days** serve as proxies for energy sector demand, on the premise that electricity consumption rises with winter heating and summer cooling needs. **Growing Degree Days** (accumulated heat above 10°C) approximate crop maturation rates for agricultural insurance underwriting and commodity price forecasting.
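
Growing degree days are usually accumulated within a season rather than over an entire multi-year history. The sketch below restarts the total each calendar year; treating the calendar year as the growing season is a simplifying assumption, and the helper name is hypothetical.

```python
import pandas as pd

def seasonal_growing_degree_days(temp_series, base_temp=10.0):
    """Accumulate growing degree days within each calendar year.

    Assumes a DatetimeIndex of daily mean temperatures in Celsius.
    """
    daily_gdd = (temp_series - base_temp).clip(lower=0)
    # Restart the running total every year so seasons stay comparable
    return daily_gdd.groupby(temp_series.index.year).cumsum()

idx = pd.to_datetime(['2022-12-30', '2022-12-31', '2023-01-01', '2023-01-02'])
temps = pd.Series([12.0, 15.0, 8.0, 14.0], index=idx)
gdd = seasonal_growing_degree_days(temps)
print(gdd.tolist())  # [2.0, 7.0, 0.0, 4.0]
```

For a specific crop, the grouping key could be replaced by a season label derived from local planting dates.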

### **14.3.2 Pressure Features**

Atmospheric pressure systems predict weather changes and extreme events.

```python
def calculate_pressure_features(pressure_series, temp_series):
    """
    Calculate atmospheric pressure features.
    
    NEPSE Applications:
    - Monsoon prediction (agriculture sector)
    - Extreme weather event forecasting (insurance claims)
    """
    features = pd.DataFrame()
    
    # 1. Pressure Trend (falling pressure = storm approaching)
    # Assumes hourly observations (pressure in hPa), so diff(3) is the 3-hour change
    features['Pressure_Trend'] = pressure_series.diff(3)
    features['Storm_Approaching'] = (features['Pressure_Trend'] < -5).astype(int)
    
    # 2. Pressure Anomaly
    pressure_mean = pressure_series.rolling(30*24).mean()  # 30-day rolling
    features['Pressure_Anomaly'] = pressure_series - pressure_mean
    
    # 3. Weather Front Proxy (pressure gradient + temp change)
    features['Front_Proxy'] = abs(features['Pressure_Trend']) + abs(temp_series.diff(3))
    
    return features
```

### **14.3.3 Spatial Features**

Spatial features aggregate weather across multiple locations relevant to NEPSE sectors.

```python
def calculate_spatial_weather_features(weather_stations_df):
    """
    Aggregate weather data across multiple stations for regional analysis.
    
    NEPSE: Aggregate rainfall across major hydropower catchment areas
    to predict generation capacity.
    """
    # Group by date, aggregate across stations
    daily_agg = weather_stations_df.groupby('date').agg({
        'rainfall': ['mean', 'max', 'sum'],
        'temperature': ['mean', 'max', 'min'],
        'humidity': 'mean'
    })
    
    features = pd.DataFrame()
    
    # Catchment area rainfall (sum for runoff calculation)
    features['Total_Catchment_Rain'] = daily_agg[('rainfall', 'sum')]
    
    # Spatial variability (std across stations)
    features['Rainfall_Variability'] = weather_stations_df.groupby('date')['rainfall'].std()
    
    # Extremes (max rainfall at any station)
    features['Max_Station_Rain'] = daily_agg[('rainfall', 'max')]
    
    return features
```

**Explanation:**

Spatial aggregation is critical for NEPSE **hydropower sector** analysis. Nepal's electricity generation depends on Himalayan snowmelt and monsoon rainfall across multiple river catchments. By aggregating rainfall across the catchment areas of major hydro plants (Upper Tamakoshi, Kaligandaki), investors can form an early estimate of quarterly generation capacity and revenue, potentially 30-60 days before financial reports.
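
A plain sum or mean treats every gauge equally, while catchment studies often weight each station by the area it represents. The sketch below shows one way to do that; the `station` column, the helper name, and the weights are all hypothetical and would need to come from real catchment data.

```python
import pandas as pd

def weighted_catchment_rainfall(stations_df, station_weights):
    """Weighted mean rainfall per date.

    `station_weights` maps station id -> weight (for example, the share of
    catchment area a gauge represents). Weights are renormalized per date
    over the stations that actually reported.
    """
    df = stations_df.dropna(subset=['rainfall']).copy()
    df['weight'] = df['station'].map(station_weights)
    df = df.dropna(subset=['weight'])
    df['weighted_rain'] = df['rainfall'] * df['weight']
    grouped = df.groupby('date')[['weighted_rain', 'weight']].sum()
    return grouped['weighted_rain'] / grouped['weight']

# Hypothetical gauges A and B covering 75% and 25% of a catchment
obs = pd.DataFrame({
    'date': ['2024-07-01'] * 2 + ['2024-07-02'] * 2,
    'station': ['A', 'B', 'A', 'B'],
    'rainfall': [10.0, 30.0, 0.0, 20.0],
})
rain = weighted_catchment_rainfall(obs, {'A': 0.75, 'B': 0.25})
print(rain.tolist())  # [15.0, 5.0]
```

Renormalizing per date keeps the estimate usable on days when some stations fail to report.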

### **14.3.4 Extreme Events**

Extreme weather features predict disaster risk for insurance and reconstruction sectors.

```python
def calculate_extreme_weather_features(weather_df):
    """
    Detect extreme weather events for risk prediction.
    
    NEPSE Applications:
    - Insurance sector (claim reserves)
    - Cement/Steel sectors (reconstruction demand post-disaster)
    - Agriculture (crop failure prediction)
    """
    features = pd.DataFrame()
    
    # 1. Extreme Rainfall (cloudburst detection)
    rainfall = weather_df['rainfall']
    features['Extreme_Rain'] = (rainfall > rainfall.quantile(0.99)).astype(int)
    features['Consecutive_Extreme_Rain'] = features['Extreme_Rain'].rolling(3).sum()
    
    # 2. Drought Conditions
    rolling_rain = rainfall.rolling(30).sum()
    features['Drought_Condition'] = (rolling_rain < rolling_rain.quantile(0.1)).astype(int)
    
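    # 3. Flash Flood Risk (intense rain over short period)
    # Assumes each observation is an hourly total in mm; for daily totals,
    # use a daily threshold instead of the 50mm/hour rule.
    rain_intensity = rainfall  # mm per hour under the hourly assumption
    features['Flash_Flood_Risk'] = (rain_intensity > 50).astype(int)  # 50mm/hour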
    
    # 4. Compound Extremes (heat + drought)
    features['Heat_Drought_Combo'] = (
        (weather_df['temperature'] > weather_df['temperature'].quantile(0.9)) &
        (features['Drought_Condition'] == 1)
    ).astype(int)
    
    return features
```

**Explanation:**

Extreme weather features help anticipate **insurance claim spikes** and **reconstruction booms** for NEPSE-listed cement and steel companies. Although it was not a weather event, the **2015 earthquake** demonstrated how natural disasters drive construction sector revenues. Monsoon cloudbursts (rainfall above the 99th percentile) are a risk signal for agricultural loan defaults in banking sector analysis. Drought conditions signal poor harvests, which feeds into demand forecasting for fertilizer and seed companies.
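
Percentile thresholds computed over the full series use future observations, which leaks information into a forecasting model. A sketch of a leak-free variant follows, using an expanding quantile of past values only; the helper name and the `min_periods` default are assumptions.

```python
import pandas as pd

def extreme_flag_no_lookahead(series, q=0.99, min_periods=30):
    """Flag values above the q-th quantile of past observations only."""
    # shift(1) excludes the current value from its own threshold
    threshold = series.expanding(min_periods=min_periods).quantile(q).shift(1)
    return (series > threshold).astype(int)

# Synthetic rainfall: a steady 1mm with one 100mm spike at position 35
rain = pd.Series([1.0] * 35 + [100.0] + [1.0] * 4)
flags = extreme_flag_no_lookahead(rain, q=0.99, min_periods=30)
print(int(flags.sum()), int(flags.iloc[35]))  # 1 1
```

The first `min_periods` observations are never flagged because no threshold exists yet, which is the price of avoiding look-ahead.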

---

## **14.4 Healthcare Features**

Healthcare time-series feature engineering supports hospital capacity planning, epidemic forecasting, and pharmaceutical demand prediction.

### **14.4.1 Patient Patterns**

Patient flow patterns exhibit strong temporal regularity (weekly seasonality, annual flu seasons).

```python
def calculate_patient_flow_features(df, admission_col='admissions', date_col='date'):
    """
    Calculate features for hospital patient flow prediction.
    
    NEPSE Applications: Hospital stocks, pharmaceutical demand,
    health insurance claim prediction.
    """
    df = df.copy()
    df[date_col] = pd.to_datetime(df[date_col])
    
    features = pd.DataFrame(index=df.index)
    
    # 1. Temporal Patterns
    features['Day_of_Week'] = df[date_col].dt.dayofweek
    features['Is_Weekend'] = features['Day_of_Week'].isin([5, 6]).astype(int)
    features['Month'] = df[date_col].dt.month
    
    # 2. Lag Features (autoregressive)
    for lag in [1, 7, 14]:
        features[f'Admissions_Lag_{lag}'] = df[admission_col].shift(lag)
    
    # 3. Rolling Averages (trend)
    features['Admissions_MA_7'] = df[admission_col].rolling(7).mean()
    features['Admissions_MA_30'] = df[admission_col].rolling(30).mean()
    
    # 4. Growth Rates
    features['Admissions_Growth'] = df[admission_col].pct_change(7)
    
    # 5. Epidemic Detection (CUSUM-style rolling sum, not the full recursive CUSUM)
    baseline = df[admission_col].rolling(30).mean()
    deviation = df[admission_col] - baseline
    features['Epidemic_Signal'] = deviation.rolling(7).sum()  # 7-day cumulative deviation
    
    return features
```

**Explanation:**

Patient flow features predict **hospital occupancy rates** for NEPSE healthcare sector analysis. The `Epidemic_Signal` feature is a **CUSUM**-style (Cumulative Sum) statistic: it sums the last seven days of deviations from a 30-day rolling baseline, so a sustained positive value gives early warning of disease outbreaks that drive pharmaceutical stock volumes. Weekend vs weekday patterns help predict emergency vs elective procedure revenue mixes for hospital financial forecasting.
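
For comparison, the classical one-sided CUSUM is recursive: it accumulates deviations above a target and resets to zero when the total turns negative. The sketch below is illustrative only; `target`, `slack`, and `threshold` are tuning choices that are not taken from this chapter.

```python
import numpy as np
import pandas as pd

def one_sided_cusum(series, target, slack, threshold):
    """Upper CUSUM: S_t = max(0, S_{t-1} + (x_t - target - slack)).

    Returns the statistic and an alarm flag where S_t exceeds `threshold`.
    """
    stat = np.zeros(len(series))
    s = 0.0
    for i, x in enumerate(series.to_numpy(dtype=float)):
        s = max(0.0, s + (x - target - slack))
        stat[i] = s
    out = pd.DataFrame({'CUSUM': stat}, index=series.index)
    out['Alarm'] = (out['CUSUM'] > threshold).astype(int)
    return out

# Hypothetical daily admissions with a sustained rise from day 5 onward
admissions = pd.Series([50, 52, 49, 51, 60, 62, 65, 63])
result = one_sided_cusum(admissions, target=50, slack=2, threshold=20)
print(result['CUSUM'].tolist())  # [0.0, 0.0, 0.0, 0.0, 8.0, 18.0, 31.0, 42.0]
```

The slack term absorbs ordinary day-to-day noise, so only a persistent shift pushes the statistic over the threshold.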

### **14.4.2 Vital Sign Features**

Vital signs require trend decomposition and anomaly detection features.

```python
def calculate_vital_sign_features(vital_df):
    """
    Calculate features from patient vital signs (HR, BP, Temp, SpO2).
    
    NEPSE: Telemedicine platforms, wearable device companies,
    health insurance risk scoring.
    """
    features = pd.DataFrame()
    
    # 1. Heart Rate Variability (HRV)
    rr_intervals = 60000 / vital_df['heart_rate']  # Convert HR to RR intervals (ms)
    features['HRV_SDNN'] = rr_intervals.rolling(5).std()  # Standard deviation of NN intervals
    
    # 2. Blood Pressure Trends
    features['BP_Pulse_Pressure'] = vital_df['systolic'] - vital_df['diastolic']
    features['BP_Mean_Arterial'] = vital_df['diastolic'] + (features['BP_Pulse_Pressure'] / 3)
    
    # 3. Temperature Trajectory
    features['Temp_Velocity'] = vital_df['temperature'].diff()
    features['Temp_Acceleration'] = features['Temp_Velocity'].diff()
    
    # 4. Early Warning Score (simplified NEWS)
    # Based on deviation from normal ranges
    def calculate_news(row):
        score = 0
        if row['heart_rate'] < 40 or row['heart_rate'] > 120: score += 3
        if row['systolic'] < 90: score += 3
        if row['temperature'] > 39: score += 2
        if row['spo2'] < 92: score += 3
        return score
    
    features['NEWS_Score'] = vital_df.apply(calculate_news, axis=1)
    features['Critical_Alert'] = (features['NEWS_Score'] >= 6).astype(int)
    
    return features
```

**Explanation:**

Vital sign features enable **remote patient monitoring** business models for NEPSE healthcare technology stocks. The **NEWS (National Early Warning Score)** aggregates multiple vital signs into a single risk metric. The version here is heavily simplified (four parameters with a single threshold each), and a score of 6 or more raises `Critical_Alert` as a trigger for intervention. For Nepali insurance companies, these features can inform claim severity estimates and dynamic pricing for health policies.

### **14.4.3 Treatment Features**

Treatment adherence and efficacy features support pharmaceutical and healthcare service analytics.

```python
def calculate_treatment_features(medication_df):
    """
    Calculate medication adherence and treatment response features.
    
    NEPSE: Pharmaceutical distribution companies, pharmacy chains.
    """
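    # Sort so gaps are measured within each patient's own refill history
    medication_df = medication_df.sort_values(['patient_id', 'date'])
    by_patient = medication_df.groupby('patient_id')
    features = pd.DataFrame(index=medication_df.index)
    
    # 1. Medication Possession Ratio (MPR)
    # Days supply / Days in period (assumes a 30-day period per fill)
    features['MPR'] = medication_df['days_supply'] / 30
    
    # 2. Adherence Gap (days between refills beyond the previous fill's supply)
    features['Refill_Gap'] = by_patient['date'].diff().dt.days - by_patient['days_supply'].shift(1)
    features['Non_Adherent'] = (features['Refill_Gap'] > 7).astype(int)  # Gap > 7 days
    
    # 3. Treatment Persistence
    features['Days_on_Therapy'] = by_patient['days_supply'].cumsum()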
    
    return features
```
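
The Medication Possession Ratio is more commonly reported per patient over a fixed observation period than per fill. The sketch below shows that aggregation; the helper name is hypothetical, and it assumes every fill in the input falls inside the observation period.

```python
import pandas as pd

def medication_possession_ratio(fills_df, period_days):
    """MPR per patient = total days supplied / days in the period, capped at 1.0."""
    total_supply = fills_df.groupby('patient_id')['days_supply'].sum()
    return (total_supply / period_days).clip(upper=1.0)

# Hypothetical refill records over a 90-day observation period
fills = pd.DataFrame({
    'patient_id': ['p1', 'p1', 'p1', 'p2', 'p2'],
    'days_supply': [30, 30, 30, 30, 15],
})
mpr = medication_possession_ratio(fills, period_days=90)
print(mpr.to_dict())  # {'p1': 1.0, 'p2': 0.5}
```

Capping at 1.0 prevents early refills from making a patient look more than fully adherent.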

### **14.4.4 Compliance Features**

Regulatory compliance features predict audit risk and quality metrics.

```python
def calculate_compliance_features(audit_df):
    """
    Calculate healthcare compliance and quality metrics.
    
    NEPSE: Hospital accreditation status, insurance network compliance.
    """
    features = pd.DataFrame()
    
    # 1. Documentation Completeness
    required_fields = ['diagnosis_code', 'procedure_code', 'provider_id']
    features['Doc_Completeness'] = audit_df[required_fields].notna().mean(axis=1)
    
    # 2. Time-to-Compliance (days to resolve audit findings)
    features['Days_to_Resolve'] = (audit_df['resolution_date'] - audit_df['finding_date']).dt.days
    
    # 3. Recurring Issues (same deficiency multiple times)
    features['Repeat_Offense'] = audit_df.groupby('deficiency_type').cumcount()
    
    return features
```

---

## **14.5 IoT and Sensor Features**

Internet of Things (IoT) features process high-frequency sensor data for predictive maintenance and operational optimization.

### **14.5.1 Signal Processing**

Signal processing features extract meaningful patterns from noisy sensor streams.

```python
def calculate_signal_features(signal_series, sampling_rate=100):
    """
    Calculate signal processing features from IoT sensors.
    
    NEPSE Applications: Manufacturing sector (predictive maintenance),
    Hydropower (turbine vibration monitoring), Telecom (network quality).
    """
    from scipy import signal as scipy_signal
    from scipy.stats import kurtosis
    
    features = pd.DataFrame(index=signal_series.index)
    
    # 1. Statistical Moments
    features['Mean'] = signal_series.rolling(window=sampling_rate).mean()
    features['Std'] = signal_series.rolling(window=sampling_rate).std()
    features['Skewness'] = signal_series.rolling(window=sampling_rate).skew()
    features['Kurtosis'] = signal_series.rolling(window=sampling_rate).apply(kurtosis, raw=True)
    
    # 2. Peak Detection
    peaks, _ = scipy_signal.find_peaks(signal_series.to_numpy(),
                                       height=signal_series.mean() + 2*signal_series.std())
    # Mark peak positions on the original index, then count peaks per window
    is_peak = pd.Series(0, index=signal_series.index)
    is_peak.iloc[peaks] = 1
    features['Peak_Count'] = is_peak.rolling(window=sampling_rate).sum()
    
    # 3. Zero Crossing Rate (signal noise indicator)
    zero_crossings = ((signal_series.shift(1) * signal_series) < 0).astype(int)
    features['Zero_Crossing_Rate'] = zero_crossings.rolling(window=sampling_rate).mean()
    
    # 4. Signal Energy
    features['Signal_Energy'] = (signal_series ** 2).rolling(window=sampling_rate).sum()
    
    return features

# NEPSE Application:
# For hydropower companies (major NEPSE sector), vibration sensor features
# predict turbine maintenance needs. High Kurtosis indicates bearing faults.
# Energy spikes predict cavitation events in turbines.
```

**Explanation:**

Signal processing features convert raw vibration, temperature, or pressure sensor data into diagnostic metrics. For NEPSE's substantial **hydropower sector**, vibration analysis can flag turbine problems before they cause failures. **Kurtosis** (a measure of distribution "tailedness") rises when mechanical bearings develop faults, because repeated impacts add heavy tails to the vibration signal; this can provide advance warning, potentially 2-4 weeks, of failures that would cause generation outages. **Zero Crossing Rate** helps distinguish normal operational noise from abnormal mechanical degradation.
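
The link between impacts and kurtosis can be illustrated on synthetic data. In the sketch below, Gaussian noise stands in for a healthy vibration signal and periodic spikes are a crude, assumed stand-in for a bearing defect; real fault signatures are more complex.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
healthy = pd.Series(rng.normal(0, 1, 2000))
faulty = healthy.copy()
faulty.iloc[::200] += 8.0  # periodic impacts added every 200 samples

window = 500
# Rolling excess kurtosis over the most recent window of each signal
k_healthy = healthy.rolling(window).kurt().iloc[-1]
k_faulty = faulty.rolling(window).kurt().iloc[-1]
print(k_faulty > k_healthy)
```

Only a handful of impacts per window is enough to lift kurtosis well above its near-zero Gaussian level, which is why it is a sensitive early indicator.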

### **14.5.2 Frequency Features**

Frequency domain analysis detects cyclical patterns and resonant frequencies.

```python
def calculate_frequency_features(signal_series, fs=100):
    """
    Calculate frequency domain features using FFT.
    
    NEPSE: Manufacturing quality control, structural health monitoring
    of infrastructure assets (bridges, transmission towers).
    """
    from scipy.fft import rfft, rfftfreq
    
    # Windowed FFT over non-overlapping windows
    window_size = fs * 10  # 10 seconds of data
    freqs = rfftfreq(window_size, d=1/fs)  # non-negative frequency bins
    features_list = []
    
    # "+ 1" keeps the final window when the length is an exact multiple
    for i in range(0, len(signal_series) - window_size + 1, window_size):
        window = signal_series.iloc[i:i+window_size].to_numpy(dtype=float)
        # Remove the mean so the DC bin does not mask the dominant frequency
        power = np.abs(rfft(window - window.mean())) ** 2
        
        # Normalize power into a probability distribution for the entropy
        p = power / (power.sum() + 1e-12)
        features_list.append({
            'Dominant_Freq': freqs[np.argmax(power)],
            'Spectral_Entropy': -np.sum(p * np.log(p + 1e-12)),
            'High_Freq_Energy': power[freqs > 20].sum()  # >20Hz
        })
    
    return pd.DataFrame(features_list)
```
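
Dominant-frequency estimates are sensitive to the DC component: a sensor with a non-zero mean reading puts most of its spectral magnitude in the 0 Hz bin. The sketch below demonstrates this on a synthetic 5 Hz tone; the sampling rate and signal are assumptions chosen for illustration.

```python
import numpy as np

fs = 100                       # sampling rate in Hz (assumed)
t = np.arange(10 * fs) / fs    # 10 seconds -> 1000 samples
signal = 3.0 + np.sin(2 * np.pi * 5 * t)  # 5 Hz tone on a DC offset

freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# With the mean removed, the 5 Hz bin carries the most power
spectrum = np.abs(np.fft.rfft(signal - signal.mean())) ** 2
dominant = freqs[np.argmax(spectrum)]

# Without mean removal, the DC bin (0 Hz) wins instead
naive = freqs[np.argmax(np.abs(np.fft.rfft(signal)))]
print(dominant, naive)  # 5.0 0.0
```

Using the real-input transform (`rfft`) also avoids reporting a negative frequency as the dominant one.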

### **14.5.3 Anomaly Features**

Anomaly detection features identify out-of-distribution sensor readings.

```python
def calculate_anomaly_features(sensor_df):
    """
    Calculate anomaly detection features for IoT sensors.
    
    NEPSE: Smart factory implementations, cold chain logistics
    for pharmaceutical distribution.
    """
    features = pd.DataFrame()
    
    # 1. Multivariate Anomaly Score (Euclidean norm of per-sensor z-scores;
    #    a simple stand-in for Isolation Forest or Mahalanobis distance)
    sensor_cols = ['temp', 'vibration', 'pressure']
    means = sensor_df[sensor_cols].mean()
    stds = sensor_df[sensor_cols].std()
    
    z_scores = (sensor_df[sensor_cols] - means) / stds
    features['Anomaly_Score'] = np.sqrt((z_scores ** 2).sum(axis=1))
    
    # 2. Deviation from Baseline (operating envelope)
    baseline = sensor_df[sensor_cols].rolling(1440).mean()  # 24-hour baseline at 1-minute sampling
    features['Envelope_Deviation'] = (sensor_df[sensor_cols] - baseline).abs().mean(axis=1)
    
    # 3. Anomaly Flag (top 1% of scores; a threshold rule, not a true CUSUM change-point test)
    features['Change_Point'] = (features['Anomaly_Score'] > features['Anomaly_Score'].quantile(0.99)).astype(int)
    
    return features
```
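
Per-sensor z-scores ignore correlation between sensors, so a reading can look normal on each channel yet be jointly unusual. The Mahalanobis distance accounts for this. The sketch below is a minimal illustration on synthetic data; the helper name, sensor relationship, and injected reading are all assumptions.

```python
import numpy as np
import pandas as pd

def mahalanobis_scores(sensor_df, cols):
    """Mahalanobis distance of each row from the sample mean.

    Uses the pseudo-inverse of the sample covariance so nearly collinear
    sensors do not break the calculation.
    """
    x = sensor_df[cols].to_numpy(dtype=float)
    centered = x - x.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(x, rowvar=False))
    # Row-wise quadratic form d^T * cov_inv * d
    d2 = np.einsum('ij,jk,ik->i', centered, cov_inv, centered)
    return pd.Series(np.sqrt(np.maximum(d2, 0)), index=sensor_df.index)

rng = np.random.default_rng(42)
temp = rng.normal(60, 2, 500)
readings = pd.DataFrame({
    'temp': temp,
    'vibration': 0.05 * temp + rng.normal(0, 0.05, 500),  # correlated with temp
})
# Inject a reading that is plausible per channel but breaks the correlation
readings.loc[499] = [64.0, 2.8]
scores = mahalanobis_scores(readings, ['temp', 'vibration'])
print(scores.idxmax())
```

The injected row combines a warm temperature with low vibration, a pairing the correlated data almost never produces, so it receives the highest score.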

---

## **14.6 Domain Knowledge Integration**

Domain knowledge integration ensures features align with economic reality rather than statistical artifacts.

```python
class DomainKnowledgeIntegrator:
    """
    Integrate external domain knowledge into feature engineering.
    
    For NEPSE: Incorporate monetary policy dates, fiscal year dynamics
    (Nepal fiscal year ends mid-July), and political event calendars.
    """
    
    def __init__(self):
        self.nepse_calendar = self._load_nepse_calendar()
        
    def _load_nepse_calendar(self):
        """Load NEPSE-specific event calendar"""
        return {
            'fiscal_year_end': ['2023-07-16', '2024-07-15'],  # End of Ashadh (new fiscal year starts Shrawan 1)
            'monetary_policy_dates': ['2023-02-01', '2023-08-01'],
            'budget_speech_dates': ['2023-05-28'],  # Jestha 15
            'market_holidays': ['2023-10-24', '2023-11-12', '2023-11-13']  # Dashain/Tihar
        }
    
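    def add_policy_features(self, df):
        """Add central bank policy impact features"""
        policy_dates = np.sort(pd.to_datetime(self.nepse_calendar['monetary_policy_dates']).values)
        # Position of the most recent policy date on or before each row (-1 if none yet)
        pos = np.searchsorted(policy_dates, df.index.values, side='right') - 1
        days_since = (df.index.values - policy_dates[np.clip(pos, 0, None)]) / np.timedelta64(1, 'D')
        df['Days_Since_Monetary_Policy'] = np.where(pos >= 0, days_since, np.nan)
        df['Policy_Impact_Window'] = (df['Days_Since_Monetary_Policy'] <= 7).astype(int)
        return df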
    
    def add_fiscal_year_features(self, df):
        """Nepal fiscal year ends mid-July (unlike calendar year)"""
        df['Fiscal_Year'] = df.index.map(lambda x: x.year if x.month < 7 else x.year + 1)
        df['Fiscal_Quarter'] = df.index.map(lambda x: ((x.month - 7) % 12) // 3 + 1)
        df['Is_Fiscal_Year_End'] = (df.index.month == 7) & (df.index.day > 10)
        df['Is_Budget_Month'] = df.index.month == 5  # Jestha (May-June)
        return df
```

**Explanation:**

The `DomainKnowledgeIntegrator` injects **Nepal-specific economic calendar** effects into features. Nepal's **fiscal year ends in mid-July** (the end of Ashadh; the new fiscal year begins on Shrawan 1), creating earnings seasons distinct from those of Indian or Western markets. **Monetary policy** announcements by Nepal Rastra Bank (the central bank) affect banking sector stocks, which the code captures with a 7-day impact window. **Budget month** (May/Jestha) features increased volatility as investors anticipate sector-specific allocations (agriculture subsidies, infrastructure spending). These calendar effects are invisible to standard feature engineering but critical for NEPSE prediction accuracy.
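
Because the fiscal year turns over in mid-July rather than at a month boundary, month-based rules mislabel the first half of July. One alternative is to assign each date to the fiscal year whose end date is the first on or after it. The sketch below uses the two year-end dates listed in the calendar above; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd

def assign_fiscal_year_end(dates, fiscal_year_ends):
    """Label each date with the end date of the fiscal year containing it.

    Dates after the last supplied year-end get NaT, which exposes gaps in
    the calendar instead of silently mislabeling rows.
    """
    dates = pd.DatetimeIndex(dates)
    ends = np.sort(pd.DatetimeIndex(fiscal_year_ends).values)
    # Position of the first year-end that is on or after each date
    pos = np.searchsorted(ends, dates.values, side='left')
    padded = np.append(ends, np.datetime64('NaT', 'ns'))
    return pd.Series(padded[pos], index=dates, name='Fiscal_Year_End')

fy = assign_fiscal_year_end(
    pd.to_datetime(['2023-07-10', '2023-07-16', '2023-07-17', '2024-08-01']),
    ['2023-07-16', '2024-07-15'],
)
print(fy)
```

In the demo, 10 and 16 July 2023 fall in the year ending 2023-07-16, 17 July 2023 moves to the year ending 2024-07-15, and the August 2024 date is left as NaT because no later year-end was supplied.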

---

## **14.7 Feature Domain Adaptation**

When transferring models between domains (e.g., from Indian NSE to Nepali NEPSE), features require adaptation.

```python
class FeatureDomainAdapter:
    """
    Adapt features trained on one domain (e.g., NSE India) 
    to another domain (NEPSE Nepal).
    """
    
    def __init__(self, source_stats, target_stats):
        self.source_mean = source_stats['mean']
        self.source_std = source_stats['std']
        self.target_mean = target_stats['mean']
        self.target_std = target_stats['std']
        
    def zscore_normalize(self, features):
        """Standardize using target domain statistics"""
        return (features - self.target_mean) / self.target_std
    
    def quantile_transform(self, features, source_quantiles, target_quantiles):
        """
        Transform features to match target domain distribution.
        Critical for volume features (NSE volume >> NEPSE volume).
        """
        from scipy.interpolate import interp1d
        return interp1d(source_quantiles, target_quantiles, 
                       kind='linear', fill_value='extrapolate')(features)
    
    def adapt_volatility_regime(self, volatility_series, source_regime_thresholds):
        """
        Adjust volatility thresholds for NEPSE's higher volatility environment.
        """
        # The 1.5x NEPSE-to-NSE volatility ratio is a working assumption; re-estimate it from data
        # Sort so the bin edges passed to pd.cut are monotonically increasing
        adapted_thresholds = sorted(v * 1.5 for v in source_regime_thresholds.values())
        return pd.cut(volatility_series, 
                     bins=[-np.inf] + adapted_thresholds + [np.inf])
```

**Explanation:**

Domain adaptation addresses **distributional shifts** between markets. The adapter assumes NEPSE stocks exhibit approximately **1.5x the volatility** of NSE (India) stocks due to lower liquidity; this ratio should be re-estimated from data before use. Naive transfer fails because a "high volatility" threshold (e.g., a 2% daily move) that indicates crisis in India may indicate normal trading in Nepal. The adapter rescales thresholds and performs quantile matching so that features keep a consistent semantic meaning across domains (e.g., "90th percentile volatility" rather than "2% move").
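
Quantile matching can be sketched with empirical quantiles and linear interpolation. The example below is illustrative: the volume samples are synthetic lognormal draws rather than market data, and `quantile_map` is a hypothetical helper that clamps out-of-range values instead of extrapolating.

```python
import numpy as np

def quantile_map(values, source_sample, target_sample, n_quantiles=101):
    """Map values from the source distribution onto the target distribution
    by matching empirical quantiles. Values outside the source range are
    clamped to the target extremes (np.interp does not extrapolate)."""
    probs = np.linspace(0, 1, n_quantiles)
    src_q = np.quantile(source_sample, probs)
    tgt_q = np.quantile(target_sample, probs)
    return np.interp(values, src_q, tgt_q)

rng = np.random.default_rng(1)
large_market_vol = rng.lognormal(mean=14, sigma=1.0, size=5000)  # synthetic source volumes
small_market_vol = rng.lognormal(mean=10, sigma=1.2, size=5000)  # synthetic target volumes

# The source median should land on the target median after mapping
mapped = quantile_map(np.median(large_market_vol), large_market_vol, small_market_vol)
print(np.isclose(mapped, np.median(small_market_vol)))
```

Clamping is a conservative choice for volume-like features, where linear extrapolation beyond the observed range can produce implausible values.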

---

**End of Chapter 14**

This chapter covered domain-specific feature engineering across Financial (NEPSE-focused), Retail, Weather, Healthcare, and IoT domains, emphasizing the integration of specialized knowledge into predictive features.

