# SECTION 1: FINAL PRODUCTION STRATEGY DEFINITION

This section provides the definitive technical specification for the **"Aureus Sigma Vietnam Value Concentrated"** strategy. This specification is the final output of all research and validation conducted in Phases 1 through 20 and represents the exact logic that will be deployed for live trading.

---

### **1.1 Investment Philosophy & Objectives**

- **Philosophy:** A systematic, quantitative strategy designed to capture the Value premium in a concentrated portfolio of liquid Vietnamese equities, while actively managing downside risk through a hybrid, rules-based overlay.
- **Primary Objective:** Deliver superior, risk-adjusted alpha over a full market cycle.
- **Secondary Objective:** Adhere to institutional risk constraints, specifically a maximum drawdown target of **-35%** and a fixed, manageable number of holdings.

---

### **1.2 Technical Specification Table**

| Component | Parameter | Specification | Rationale & Validation Source |
| :--- | :--- | :--- | :--- |
| **Universe** | **Definition** | **ASC-VN-Liquid-150** (Top ~150-200 stocks) | Validated in Phase 12 as the optimal tradable universe. |
| | **Liquidity Filter** | 63-day ADTV > 10 Billion VND | Determined in Phase 10 Liquidity Deep Dive to balance size and tradability. |
| | **Reconstruction** | **Quarterly, Dynamic** | Proven in Phase 16b to eliminate survivorship bias and capture market evolution. |
| **Alpha Signal**| **Factor** | **Standalone Value** | Proven in Phase 16b to be the most robust and powerful alpha signal (Sharpe 2.60) on a standalone basis. |
| | **Metric** | `Value_Composite` from `qvm_engine_v2_enhanced` | Engine validated through extensive unit testing in early project phases. |
| **Portfolio**| **Construction** | **Select Top 25 Stocks** | **New Constraint.** Industry standard for a high-conviction, manageable active portfolio. Addresses feedback on impractical diversification. |
| **Construction**| **Weighting** | **Equal-Weighted** | Simple, robust, and prevents single-stock concentration. |
| | **Sector Constraint** | **Max 35% per Sector** | **New Constraint.** Prevents unintended sector bets and ensures risk is driven by the Value factor, not sector concentration. |
| **Rebalancing**| **Frequency** | **Quarterly** | Proven in Phase 17 sensitivity analysis to be the optimal balance of signal freshness and cost efficiency. |
| **Risk Overlay**| **Model** | **Hybrid (Regime + Volatility)** | Validated in Phase 18 as the most effective model for improving risk-adjusted returns. |
| | **Regime Layer** | **50% Exposure** during "Bear" or "Stress" periods. | Acts as a primary, decisive circuit-breaker during major market dislocations. |
| | **Volatility Layer** | Target **15% Annualized Volatility** (60-day lookback). | Provides fine-tuned, dynamic risk adjustments based on the portfolio's own realized volatility. |
| | **Final Exposure** | **MINIMUM(Regime Exposure, Volatility Exposure)** | Ensures the most conservative risk posture is always taken. |
| **Execution** | **Transaction Costs** | **30 bps per trade** | Baseline assumption for commissions and market impact. To be stress-tested. |


In [5]:
# ============================================================================
# SECTION 2: DATA LOADING & PREPARATION (CORRECTED)
# ============================================================================
import pandas as pd # *** CRITICAL FIX: Added missing pandas import ***
import numpy as np # *** CRITICAL FIX: Added missing numpy import ***
from sqlalchemy import create_engine, text
import yaml
from pathlib import Path
import pickle

# --- Helper function to create DB connection ---
def create_db_connection():
    # Assumes notebook is in a subfolder like `tests/phase21...`
    config_path = Path.cwd().parent.parent.parent / 'config' / 'database.yml'
    if not config_path.exists():
        # Fallback for different directory structures
        config_path = Path.cwd() / 'config' / 'database.yml'
        if not config_path.exists():
            raise FileNotFoundError("Could not locate database.yml")
            
    with open(config_path, 'r') as f:
        db_config = yaml.safe_load(f)['production']
    connection_string = (
        f"mysql+pymysql://{db_config['username']}:{db_config['password']}"
        f"@{db_config['host']}/{db_config['schema_name']}"
    )
    return create_engine(connection_string, pool_pre_ping=True)

print("📂 Loading all raw data required for the final backtest...")

# --- 1. Database Connection & Date Range ---
engine = create_db_connection()
db_params = {
    'start_date': "2015-۱۲-01", # Start early to ensure data for first rebalance
    'end_date': "2025-07-28"
}

# --- 2. Load Factor Scores (Full Universe) ---
factor_query = text("""
    SELECT date, ticker, Quality_Composite, Value_Composite, Momentum_Composite
    FROM factor_scores_qvm
    WHERE date BETWEEN :start_date AND :end_date AND strategy_version = 'qvm_v2.0_enhanced'
""")
factor_data_raw = pd.read_sql(factor_query, engine, params=db_params, parse_dates=['date'])
print(f"   ✅ Loaded {len(factor_data_raw):,} raw factor observations.")

# --- 3. Load Price Data & Calculate Returns ---
price_query = text("SELECT date, ticker, close FROM equity_history WHERE date BETWEEN :start_date AND :end_date")
price_data_raw = pd.read_sql(price_query, engine, params=db_params, parse_dates=['date'])
price_data_raw['return'] = price_data_raw.groupby('ticker')['close'].pct_change()
daily_returns_matrix = price_data_raw.pivot(index='date', columns='ticker', values='return')
print(f"   ✅ Loaded and processed {len(price_data_raw):,} raw price observations.")

# --- 4. Load Market Regime Signals ---
project_root = Path.cwd().parent.parent
archive_path = project_root / "tests" / "phase8_risk_management"
phase8_results_file = archive_path / "phase8_results.pkl"
with open(phase8_results_file, "rb") as f:
    phase8_results = pickle.load(f)
market_regimes = phase8_results['market_regimes']
market_regimes['risk_on'] = ~market_regimes['regime'].isin(['Bear', 'Stress'])
print("   ✅ Loaded market regime signals.")

# --- 5. Load Sector Mappings ---
sector_info_query = text("SELECT ticker, sector FROM master_info WHERE sector IS NOT NULL")
sector_info = pd.read_sql(sector_info_query, engine).drop_duplicates(subset=['ticker']).set_index('ticker')
print(f"   ✅ Loaded sector mappings for {len(sector_info)} tickers.")

engine.dispose()

# --- 6. Data Integrity Check ---
print("\n📊 Data Integrity Validation:")
print(f"   - Factor Data: {factor_data_raw.shape}")
print(f"   - Daily Returns Matrix: {daily_returns_matrix.shape}")
print(f"   - Market Regimes: {len(market_regimes)} days")
print(f"   - Sector Mappings: {len(sector_info)} tickers")

print("\n✅ All data loaded. Ready to define the final backtesting engine.")

📂 Loading all raw data required for the final backtest...
   ✅ Loaded 1,567,488 raw factor observations.
   ✅ Loaded and processed 2,339,155 raw price observations.
   ✅ Loaded market regime signals.
   ✅ Loaded sector mappings for 728 tickers.

📊 Data Integrity Validation:
   - Factor Data: (1567488, 5)
   - Daily Returns Matrix: (4129, 728)
   - Market Regimes: 2381 days
   - Sector Mappings: 728 tickers

✅ All data loaded. Ready to define the final backtesting engine.


In [9]:
# --- 1. Establish Rebalancing Schedule ---
all_trading_dates = daily_returns_matrix.index
freq_ends = pd.date_range(start=all_trading_dates.min(), end=all_trading_dates.max(), freq=rebalance_freq)
rebalance_dates = [
    all_trading_dates[all_trading_dates.searchsorted(d, side='right') - 1] for d in freq_ends
]
print(f"   - Generated {len(rebalance_dates)} rebalance dates.")

# --- 2. Generate Daily Holdings Matrix ---
aggressive_holdings = pd.DataFrame(0.0, index=all_trading_dates, columns=daily_returns_matrix.columns)

for i in range(len(rebalance_dates)):
    rebal_date = rebalance_dates[i]
    
    # a. Construct the liquid universe for this date
    universe_df = get_liquid_universe_dataframe(
        analysis_date=rebal_date, engine=engine,
        config={'lookback_days': 63, 'adtv_threshold_bn': 10.0, 'top_n': 200, 'min_trading_coverage': 0.6}
    )
    if universe_df.empty or len(universe_df) < 50: continue
    
    # b. Filter factor data to this universe and date
    factors_on_date = factor_data_raw[
        (factor_data_raw['date'] == rebal_date) &
        (factor_data_raw['ticker'].isin(universe_df['ticker']))
    ].copy()
    if len(factors_on_date) < 25: continue

    # c. Select top 25 stocks based on the pure Value signal
    top_25_stocks = factors_on_date.nlargest(25, 'Value_Composite').copy()
    
    # d. Apply Sector Constraints
    top_25_with_sectors = top_25_stocks.join(sector_info, on='ticker')
    sector_counts = top_25_with_sectors['sector'].value_counts()
    max_stocks_per_sector = int(25 * 0.35) # Max 35% = 8 stocks
    
    final_tickers = set()
    for sector, count in sector_counts.items():
        sector_stocks = top_25_with_sectors[top_25_with_sectors['sector'] == sector]
        num_to_keep = min(count, max_stocks_per_sector)
        final_tickers.update(sector_stocks.head(num_to_keep)['ticker'])
    
    final_portfolio_df = top_25_stocks[top_25_stocks['ticker'].isin(final_tickers)]
    
    # e. Assign equal weights
    if not final_portfolio_df.empty:
        weight = 1.0 / len(final_portfolio_df)
        portfolio_weights = pd.Series(weight, index=final_portfolio_df['ticker'])

        # f. Propagate weights to the daily holdings matrix
        start_period = rebal_date + pd.Timedelta(days=1)
        end_period = rebalance_dates[i+1] if i + 1 < len(rebalance_dates) else all_trading_dates.max()
        holding_dates = aggressive_holdings.index[(aggressive_holdings.index >= start_period) & (aggressive_holdings.index <= end_period)]
        valid_tickers = portfolio_weights.index.intersection(aggressive_holdings.columns)
        aggressive_holdings.loc[holding_dates, valid_tickers] = portfolio_weights[valid_tickers].values

print("   - Aggressive Growth holdings matrix constructed.")

# --- 3. Apply Hybrid Risk Overlay ---
aggressive_returns_temp = (aggressive_holdings.shift(1).fillna(0.0) * daily_returns_matrix).sum(axis=1)
realized_vol = aggressive_returns_temp.rolling(window=60).std() * np.sqrt(252)
vol_exposure = (0.15 / realized_vol).shift(1).clip(lower=0.2, upper=1.0).fillna(1.0)
regime_exposure = market_regimes['risk_on'].apply(lambda x: 1.0 if x else 0.5)
common_index = vol_exposure.index.intersection(regime_exposure.index)
hybrid_exposure = pd.DataFrame({'vol': vol_exposure.loc[common_index], 'regime': regime_exposure.loc[common_index]}).min(axis=1)
risk_managed_holdings = aggressive_holdings.mul(hybrid_exposure, axis=0)
print("   - Hybrid risk overlay applied.")

# --- 4. Calculate Final Net Returns ---
holdings_shifted = risk_managed_holdings.shift(1).fillna(0.0)
gross_returns = (holdings_shifted * daily_returns_matrix).sum(axis=1)
turnover = (holdings_shifted - holdings_shifted.shift(1)).abs().sum(axis=1) / 2
costs = turnover * (30 / 10000)
net_returns = gross_returns - costs
print("   - Final net returns calculated.")

return net_returns.rename(strategy_name)

NameError: name 'rebalance_freq' is not defined