# Baseline QVM Engine (v1) - Unit Test

**Purpose:** Validate baseline engine (v1) as control group for scientific bake-off  
**Engine Type:** Simple ROAE-based Quality Signal implementation  
**Test Universe:** 4 tickers (OCB, NLG, FPT, SSI)  
**Test Date:** 2025-07-22 (latest available quarter: 2025 Q1, ending 2025-03-31)  
**Status:** CONTROL GROUP for signal construction experiment

**Success Criteria:**
- ✅ Unit test runs without errors on 4-ticker universe
- ✅ All factor scores are non-zero and economically reasonable
- ✅ Results represent simple hypothesis baseline
- ✅ **HYPOTHESIS**: Expected ~18% annual return, 1.2 Sharpe ratio

In [1]:
# Setup imports and logging
import sys
import pandas as pd
import numpy as np
from pathlib import Path
from datetime import datetime
import logging

# Add production engine to path
production_path = Path.cwd().parent
sys.path.append(str(production_path))

# Import baseline engine (v1)
from engine.qvm_engine_v1_baseline import QVMEngineV1Baseline

# Setup logging for test visibility
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

print("✅ Baseline QVM Engine (v1) Unit Test Setup Complete")
print(f"Production path: {production_path}")

✅ Baseline QVM Engine (v1) Unit Test Setup Complete
Production path: /Users/ducnguyen/Library/CloudStorage/GoogleDrive-duc.nguyentcb@gmail.com/My Drive/quant-world-invest/factor_investing_project/production


In [2]:
# Initialize baseline engine (v1)
print("🔧 Initializing Baseline QVM Engine (v1)...")

try:
    # Point to project config directory
    project_root = Path.cwd().parent.parent
    config_path = project_root / 'config'
    
    engine = QVMEngineV1Baseline(config_path=str(config_path), log_level='INFO')
    
    print("✅ Baseline engine (v1) initialized successfully")
    print(f"Database connection: {engine.db_config['host']}/{engine.db_config['schema_name']}")
    print(f"Reporting lag: {engine.reporting_lag} days")
    print("📋 Engine Type: Simple ROAE-based Quality Signal")
    
except Exception as e:
    print(f"❌ Engine initialization failed: {e}")
    raise

2025-07-22 21:33:35,644 - CanonicalQVMEngine - INFO - Initializing Canonical QVM Engine
2025-07-22 21:33:35,654 - CanonicalQVMEngine - INFO - Configurations loaded successfully
2025-07-22 21:33:35,719 - CanonicalQVMEngine - INFO - Database connection established successfully
2025-07-22 21:33:35,719 - CanonicalQVMEngine - INFO - Canonical QVM Engine initialized successfully
2025-07-22 21:33:35,720 - CanonicalQVMEngine - INFO - QVM Weights: Quality 40.0%, Value 30.0%, Momentum 30.0%


🔧 Initializing Baseline QVM Engine (v1)...
✅ Baseline engine (v1) initialized successfully
Database connection: localhost/alphabeta
Reporting lag: 45 days
📋 Engine Type: Simple ROAE-based Quality Signal


In [3]:
# Define test parameters
TEST_DATE = pd.Timestamp('2025-07-22')  # Known data availability
TEST_UNIVERSE = ['OCB', 'NLG', 'FPT', 'SSI']  # Multi-sector test universe

EXPECTED_SECTORS = {
    'OCB': 'Banking',
    'NLG': 'Real Estate', 
    'FPT': 'Technology',
    'SSI': 'Securities'
}

print(f"📊 Test Configuration:")
print(f"Test Date: {TEST_DATE.date()}")
print(f"Test Universe: {TEST_UNIVERSE}")
print(f"Expected Sectors: {EXPECTED_SECTORS}")

# Validate quarter availability
quarter_info = engine.get_correct_quarter_for_date(TEST_DATE)
if quarter_info:
    year, quarter = quarter_info
    print(f"✅ Available quarter: {year} Q{quarter}")
else:
    print(f"⚠️ No quarter data available for {TEST_DATE.date()}")

📊 Test Configuration:
Test Date: 2025-07-22
Test Universe: ['OCB', 'NLG', 'FPT', 'SSI']
Expected Sectors: {'OCB': 'Banking', 'NLG': 'Real Estate', 'FPT': 'Technology', 'SSI': 'Securities'}
✅ Available quarter: 2025 Q1
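
The quarter reported above is consistent with the 45-day reporting lag printed at initialization: Q2 2025 ends on 2025-06-30 and would only count as published from 2025-08-14, so on 2025-07-22 the most recent usable quarter is Q1 2025. The sketch below illustrates that rule, assuming a quarter becomes available exactly `reporting_lag_days` after its end date. `latest_available_quarter` is a hypothetical helper written for this note; it is not the engine's `get_correct_quarter_for_date` implementation, which is not shown in this notebook.

```python
import pandas as pd

def latest_available_quarter(as_of, reporting_lag_days=45):
    """Return (year, quarter) of the most recent quarter whose end date
    plus the reporting lag falls on or before `as_of` (assumed rule)."""
    as_of = pd.Timestamp(as_of)
    # Start from the calendar quarter containing `as_of` and walk backwards
    # until the quarter end plus the lag is no later than `as_of`.
    period = as_of.to_period('Q')
    lag = pd.Timedelta(days=reporting_lag_days)
    while period.end_time.normalize() + lag > as_of:
        period -= 1
    return period.year, period.quarter

print(latest_available_quarter('2025-07-22'))  # (2025, 1)
```

Under the same assumption, the first date on which Q2 2025 would be picked up is 2025-08-14.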


In [4]:
# Test 1: Sector Mapping Validation
print("\n🧪 TEST 1: Sector Mapping Validation")
print("=" * 50)

try:
    sector_map = engine.get_sector_mapping()
    test_sectors = sector_map[sector_map['ticker'].isin(TEST_UNIVERSE)]
    
    print(f"Retrieved sectors for test universe:")
    for _, row in test_sectors.iterrows():
        ticker = row['ticker']
        sector = row['sector']
        expected = EXPECTED_SECTORS[ticker]
        status = "✅" if sector == expected else "❌"
        print(f"{status} {ticker}: {sector} (expected: {expected})")
    
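    # Validation: every expected ticker must be present AND mapped to its expected sector.
    # (The previous check skipped missing tickers, so an empty result would have passed.)
    retrieved = dict(zip(test_sectors['ticker'], test_sectors['sector']))
    all_correct = all(
        retrieved.get(ticker) == expected
        for ticker, expected in EXPECTED_SECTORS.items()
    )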
    
    if all_correct:
        print("✅ TEST 1 PASSED: Sector mapping correct")
    else:
        print("❌ TEST 1 FAILED: Sector mapping incorrect")
        
except Exception as e:
    print(f"❌ TEST 1 ERROR: {e}")


🧪 TEST 1: Sector Mapping Validation
Retrieved sectors for test universe:
✅ NLG: Real Estate (expected: Real Estate)
✅ SSI: Securities (expected: Securities)
✅ FPT: Technology (expected: Technology)
✅ OCB: Banking (expected: Banking)
✅ TEST 1 PASSED: Sector mapping correct


In [5]:
# Test 2: Fundamental Data Retrieval
print("\n🧪 TEST 2: Fundamental Data Retrieval")
print("=" * 50)

try:
    fundamentals = engine.get_fundamentals_correct_timing(TEST_DATE, TEST_UNIVERSE)
    
    if not fundamentals.empty:
        print(f"✅ Retrieved {len(fundamentals)} fundamental records")
        
        # Check data quality
        for ticker in TEST_UNIVERSE:
            ticker_data = fundamentals[fundamentals['ticker'] == ticker]
            if not ticker_data.empty:
                row = ticker_data.iloc[0]
                sector = row.get('sector', 'Unknown')
                
                # Check key metrics
                net_profit = row.get('NetProfit_TTM', 0)
                total_equity = row.get('AvgTotalEquity', 0)
                has_ttm = row.get('has_full_ttm', 0)
                
                print(f"📊 {ticker} ({sector}):")
                print(f"   NetProfit_TTM: {net_profit:,.0f}")
                print(f"   AvgTotalEquity: {total_equity:,.0f}")
                print(f"   Has Full TTM: {bool(has_ttm)}")
            else:
                print(f"⚠️ {ticker}: No fundamental data")
        
        print("✅ TEST 2 PASSED: Fundamental data retrieved")
    else:
        print("❌ TEST 2 FAILED: No fundamental data retrieved")
        
except Exception as e:
    print(f"❌ TEST 2 ERROR: {e}")

2025-07-22 21:33:58,328 - CanonicalQVMEngine - INFO - Retrieved 4 total fundamental records for 2025-07-22



🧪 TEST 2: Fundamental Data Retrieval
✅ Retrieved 4 fundamental records
📊 OCB (Banking):
   NetProfit_TTM: 2,932,934,728,146
   AvgTotalEquity: 30,838,336,130,891
   Has Full TTM: True
📊 NLG (Real Estate):
   NetProfit_TTM: 1,556,557,651,450
   AvgTotalEquity: 13,803,448,662,579
   Has Full TTM: True
📊 FPT (Technology):
   NetProfit_TTM: 9,855,370,712,531
   AvgTotalEquity: 34,704,201,924,362
   Has Full TTM: True
📊 SSI (Securities):
   NetProfit_TTM: 2,924,802,015,721
   AvgTotalEquity: 25,501,091,461,874
   Has Full TTM: True
✅ TEST 2 PASSED: Fundamental data retrieved
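
Since this engine is described as a simple ROAE-based quality signal, it can help to see what the retrieved fundamentals imply. The sketch below assumes ROAE is computed as `NetProfit_TTM / AvgTotalEquity` (the usual definition of return on average equity); the engine's exact formula is not shown in this notebook, so treat this as a sanity check rather than a reproduction of its internals.

```python
import pandas as pd

# Values copied from the Test 2 output above.
fundamentals = pd.DataFrame({
    'ticker': ['OCB', 'NLG', 'FPT', 'SSI'],
    'NetProfit_TTM': [2_932_934_728_146, 1_556_557_651_450,
                      9_855_370_712_531, 2_924_802_015_721],
    'AvgTotalEquity': [30_838_336_130_891, 13_803_448_662_579,
                       34_704_201_924_362, 25_501_091_461_874],
}).set_index('ticker')

# Assumed definition: trailing-twelve-month net profit over average total equity.
fundamentals['ROAE'] = fundamentals['NetProfit_TTM'] / fundamentals['AvgTotalEquity']

# Rank from highest to lowest ROAE and format as percentages.
print(fundamentals['ROAE'].sort_values(ascending=False).map('{:.2%}'.format))
```

Under this assumption FPT comes out at roughly 28%, SSI and NLG at roughly 11% each, and OCB at roughly 9.5%, which are plausible magnitudes for each sector.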


In [6]:
# Test 4: QVM Composite Calculation (CRITICAL TEST)
print("\n🧪 TEST 4: QVM Composite Calculation (CRITICAL)")
print("=" * 50)

try:
    qvm_scores = engine.calculate_qvm_composite(TEST_DATE, TEST_UNIVERSE)
    
    if qvm_scores:
        print(f"✅ Calculated QVM scores for {len(qvm_scores)} tickers")
        print("\n📊 QVM COMPOSITE RESULTS:")
        print("-" * 40)
        
        # Sort by QVM score for ranking
        sorted_scores = sorted(qvm_scores.items(), key=lambda x: x[1], reverse=True)
        
        for rank, (ticker, score) in enumerate(sorted_scores, 1):
            sector = EXPECTED_SECTORS.get(ticker, 'Unknown')
            print(f"{rank}. {ticker} ({sector}): {score:.4f}")
        
        # Validation checks
        non_zero_scores = [score for score in qvm_scores.values() if abs(score) > 0.001]
        reasonable_range = [score for score in qvm_scores.values() if -5 <= score <= 5]
        
        print(f"\n📋 VALIDATION SUMMARY:")
        print(f"   Total scores: {len(qvm_scores)}")
        print(f"   Non-zero scores: {len(non_zero_scores)}")
        print(f"   Reasonable range (-5 to 5): {len(reasonable_range)}")
        
        # Success criteria
        success_criteria = [
            len(qvm_scores) == len(TEST_UNIVERSE),
            len(non_zero_scores) >= len(TEST_UNIVERSE) / 2,  # At least half should be non-zero
            len(reasonable_range) == len(qvm_scores),  # All should be reasonable
            not any(np.isnan(score) for score in qvm_scores.values())  # No NaN values
        ]
        
        if all(success_criteria):
            print("✅ TEST 4 PASSED: QVM calculation successful")
            print("🎯 CANONICAL ENGINE VALIDATION COMPLETE")
        else:
            print("❌ TEST 4 FAILED: QVM calculation issues detected")
            print(f"   Criteria: {success_criteria}")
            
    else:
        print("❌ TEST 4 FAILED: No QVM scores calculated")
        
except Exception as e:
    print(f"❌ TEST 4 ERROR: {e}")
    import traceback
    traceback.print_exc()

2025-07-22 21:34:31,087 - CanonicalQVMEngine - INFO - Calculating QVM composite for 4 tickers on 2025-07-22
2025-07-22 21:34:31,151 - CanonicalQVMEngine - INFO - Retrieved 4 total fundamental records for 2025-07-22



🧪 TEST 4: QVM Composite Calculation (CRITICAL)


2025-07-22 21:34:31,319 - CanonicalQVMEngine - INFO - Falling back to cross-sectional normalization
2025-07-22 21:34:31,338 - CanonicalQVMEngine - INFO - Falling back to cross-sectional normalization
2025-07-22 21:34:32,013 - CanonicalQVMEngine - INFO - Successfully calculated QVM scores for 4 tickers


✅ Calculated QVM scores for 4 tickers

📊 QVM COMPOSITE RESULTS:
----------------------------------------
1. NLG (Real Estate): 0.3070
2. OCB (Banking): 0.2831
3. FPT (Technology): -0.0009
4. SSI (Securities): -0.5892

📋 VALIDATION SUMMARY:
   Total scores: 4
   Non-zero scores: 3
   Reasonable range (-5 to 5): 4
✅ TEST 4 PASSED: QVM calculation successful
🎯 CANONICAL ENGINE VALIDATION COMPLETE
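
The log lines above mention a fallback to cross-sectional normalization, and the initialization log reports weights of 40% Quality, 30% Value, and 30% Momentum. The sketch below shows one common way such a composite could be assembled: z-score each factor across the universe, then take the weighted sum. The helpers `zscore` and `qvm_composite` are hypothetical, and the value and momentum inputs are made-up illustration values, so the printed numbers are not expected to match the engine output.

```python
import pandas as pd

def zscore(values: pd.Series) -> pd.Series:
    """Cross-sectional z-score: subtract the universe mean, divide by the universe std."""
    std = values.std(ddof=0)
    if std == 0:
        # Degenerate universe: every name gets a neutral score.
        return pd.Series(0.0, index=values.index)
    return (values - values.mean()) / std

def qvm_composite(quality, value, momentum, weights=(0.4, 0.3, 0.3)):
    """Weighted sum of already-normalized factor scores (weights taken from the init log)."""
    w_q, w_v, w_m = weights
    return w_q * quality + w_v * value + w_m * momentum

# ROAE derived from the Test 2 output (NetProfit_TTM / AvgTotalEquity, rounded).
roae = pd.Series({'OCB': 0.0951, 'NLG': 0.1128, 'FPT': 0.2840, 'SSI': 0.1147})
quality_z = zscore(roae)

# Hypothetical value and momentum z-scores, for illustration only (NOT engine output).
value_z = pd.Series({'OCB': 0.5, 'NLG': 1.0, 'FPT': -1.5, 'SSI': 0.0})
momentum_z = pd.Series({'OCB': 0.2, 'NLG': -0.3, 'FPT': 0.6, 'SSI': -0.5})

composite = qvm_composite(quality_z, value_z, momentum_z)
print(composite.sort_values(ascending=False).round(4))
```

With only four names in the universe, cross-sectional z-scores are tightly bounded, which is one reason composite scores well inside the -5 to 5 range are expected here.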


In [7]:
# Final Validation Summary
print("\n🎯 FINAL VALIDATION SUMMARY")
print("=" * 50)

# Run complete validation
try:
    # Test complete engine workflow
    final_scores = engine.calculate_qvm_composite(TEST_DATE, TEST_UNIVERSE)
    
    validation_results = {
        'Engine Initialization': True,
        'Sector Mapping': len(engine.get_sector_mapping()) > 0,
        'Fundamental Data': len(engine.get_fundamentals_correct_timing(TEST_DATE, TEST_UNIVERSE)) > 0,
        'Market Data': len(engine.get_market_data(TEST_DATE, TEST_UNIVERSE)) > 0,
        'QVM Calculation': len(final_scores) > 0,
        'Non-Zero Results': any(abs(score) > 0.001 for score in final_scores.values()),
        'Reasonable Values': all(-10 <= score <= 10 for score in final_scores.values()),
        'No NaN Values': not any(np.isnan(score) for score in final_scores.values())
    }
    
    print("📊 VALIDATION CHECKLIST:")
    all_passed = True
    
    for test_name, result in validation_results.items():
        status = "✅" if result else "❌"
        print(f"{status} {test_name}: {'PASS' if result else 'FAIL'}")
        if not result:
            all_passed = False
    
    print("\n" + "=" * 50)
    if all_passed:
        print("🎉 CANONICAL ENGINE VALIDATION: ✅ PASSED")
        print("🚀 READY FOR PHASE 2: DATA RESTORATION")
        print("\n🎯 GATE REQUIREMENT MET - PROCEED TO PRODUCTION USE")
    else:
        print("🚫 CANONICAL ENGINE VALIDATION: ❌ FAILED")
        print("⚠️  DO NOT PROCEED TO PHASE 2 - FIX ISSUES FIRST")
        print("\n🛑 GATE REQUIREMENT NOT MET - TROUBLESHOOTING REQUIRED")
    
    print("=" * 50)
    
except Exception as e:
    print(f"❌ FINAL VALIDATION ERROR: {e}")
    print("🛑 CANONICAL ENGINE NOT READY FOR PRODUCTION")

2025-07-22 21:34:58,526 - CanonicalQVMEngine - INFO - Calculating QVM composite for 4 tickers on 2025-07-22
2025-07-22 21:34:58,596 - CanonicalQVMEngine - INFO - Retrieved 4 total fundamental records for 2025-07-22
2025-07-22 21:34:58,690 - CanonicalQVMEngine - INFO - Falling back to cross-sectional normalization
2025-07-22 21:34:58,694 - CanonicalQVMEngine - INFO - Falling back to cross-sectional normalization



🎯 FINAL VALIDATION SUMMARY


2025-07-22 21:34:58,737 - CanonicalQVMEngine - INFO - Successfully calculated QVM scores for 4 tickers
2025-07-22 21:34:58,766 - CanonicalQVMEngine - INFO - Retrieved 4 total fundamental records for 2025-07-22


📊 VALIDATION CHECKLIST:
✅ Engine Initialization: PASS
✅ Sector Mapping: PASS
✅ Fundamental Data: PASS
✅ Market Data: PASS
✅ QVM Calculation: PASS
✅ Non-Zero Results: PASS
✅ Reasonable Values: PASS
✅ No NaN Values: PASS

🎉 CANONICAL ENGINE VALIDATION: ✅ PASSED
🚀 READY FOR PHASE 2: DATA RESTORATION

🎯 GATE REQUIREMENT MET - PROCEED TO PRODUCTION USE


# Validation Notes

## Success Criteria Checklist
- [ ] Engine initializes without errors
- [ ] Sector mapping retrieval works correctly
- [ ] Fundamental data retrieval with point-in-time logic
- [ ] Market data retrieval as of analysis date
- [ ] QVM composite calculation produces reasonable results
- [ ] All factor scores are non-zero and economically sensible
- [ ] No NaN values in output
- [ ] Results are in reasonable range (-10 to +10)
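
The score-related items in this checklist can be expressed as a small reusable check. The following is a minimal sketch, assuming scores arrive as a `{ticker: score}` dict as in Test 4; `validate_scores` and its thresholds (`eps=0.001`, range -10 to +10) are illustrative choices taken from the cells above, not part of the engine API.

```python
import math

def validate_scores(scores, universe, lower=-10.0, upper=10.0, eps=0.001):
    """Return a dict of named boolean checks for a {ticker: score} mapping."""
    values = list(scores.values())
    return {
        'All tickers scored': set(scores) == set(universe),
        'No NaN values': not any(math.isnan(v) for v in values),
        'Non-zero results': any(abs(v) > eps for v in values),
        'Reasonable values': all(lower <= v <= upper for v in values),
    }

# Scores copied from the Test 4 output.
scores = {'NLG': 0.3070, 'OCB': 0.2831, 'FPT': -0.0009, 'SSI': -0.5892}
checks = validate_scores(scores, ['OCB', 'NLG', 'FPT', 'SSI'])
for name, passed in checks.items():
    print(f"{'PASS' if passed else 'FAIL'} {name}")
```

Note that a NaN score fails both the NaN check and the range check, because any comparison with NaN evaluates to `False`.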

## Expected Behavior
- **OCB (Banking)**: Should have reasonable quality/value scores from banking metrics
- **NLG (Real Estate)**: Should show sector-specific characteristics
- **FPT (Technology)**: Typically high-quality, growth-oriented scores
- **SSI (Securities)**: Should reflect securities sector dynamics

## Gate Requirement
**🚨 CRITICAL**: This unit test serves as the gate requirement for Phase 2 progression. All tests must pass before any production data restoration attempts.

If any test fails, the canonical engine must be fixed before proceeding to avoid contaminating the production data restoration process.
