# GHB Strategy - Universe Re-Optimization

## 📋 Purpose
This notebook automates the annual universe refresh process:
1. ✅ Screen full S&P 500 for qualified stocks
2. ✅ Identify top 25 by CAGR
3. ✅ Compare to current universe
4. ✅ Recommend adds/drops
5. ✅ Analyze sector diversification
6. ✅ Generate updated universe list

## ⏱️ Runtime
**Expected:** 10-15 minutes (downloads 5 years of data for 500 stocks)

## 📅 When to Run
- **Required:** Annually (January each year)
- **Optional:** When the weekly scanner flags a CRITICAL alert
- **Emergency:** If >30% of the universe is in N2 for 2+ weeks
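The three triggers above can be folded into a single check. This is a minimal sketch under stated assumptions: `reoptimization_reason` is a hypothetical helper, and `weekly_n2_fractions` (the share of the universe in N2 for each recent week) is a hypothetical input, since this notebook does not show the scanner's N2 output format.

```python
from datetime import date


def reoptimization_reason(today, last_run, critical_alert, weekly_n2_fractions):
    """Return why a re-optimization is due ('annual', 'emergency',
    'critical_alert'), or None if it is not.

    weekly_n2_fractions: hypothetical list of the fraction of the universe
    in N2 for each recent week, most recent week last.
    """
    # Required: annual refresh in January, unless already done this year
    if today.month == 1 and (last_run is None or last_run.year < today.year):
        return "annual"
    # Emergency: >30% of the universe in N2 for the last 2 consecutive weeks
    recent = weekly_n2_fractions[-2:]
    if len(recent) == 2 and all(f > 0.30 for f in recent):
        return "emergency"
    # Optional: the weekly scanner raised a CRITICAL alert
    if critical_alert:
        return "critical_alert"
    return None
```

For example, `reoptimization_reason(date(2026, 6, 1), date(2026, 1, 5), False, [0.35, 0.40])` returns `"emergency"` because the last two weeks both exceed 30%.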

---

## Step 1: Import Libraries & Setup

In [None]:
import sys
import subprocess
from pathlib import Path
import pandas as pd
import json
from datetime import datetime

# Add backtest directory to path
sys.path.insert(0, str(Path('../backtest').resolve()))

print("✅ Libraries imported")
print(f"📅 Re-optimization Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}")
print("\n" + "="*80)

## Step 2: Load Current Universe

Load your existing 25-stock universe to compare against new screening results.
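The loading logic in the cell below can also be wrapped as a reusable function. This is a minimal sketch; `load_universe` is a hypothetical helper, and stripping trailing inline `#` comments, upper-casing, and dropping duplicate tickers go beyond what the cell itself does.

```python
from pathlib import Path


def load_universe(path):
    """Read tickers from a text file, one per line.

    Skips blank lines and '#' comment lines like the cell below; as an extra
    assumption it also strips inline comments, upper-cases tickers, and
    drops duplicates while keeping first-seen order.
    """
    path = Path(path)
    if not path.exists():
        return []
    tickers = []
    for raw in path.read_text().splitlines():
        line = raw.split('#', 1)[0].strip().upper()
        if line and line not in tickers:
            tickers.append(line)
    return tickers
```

Keeping first-seen order (rather than returning a set) preserves any grouping the file's author used.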

In [None]:
# Load current universe from portfolio file
portfolio_file = Path("../data/ghb_optimized_portfolio.txt")

current_universe = []
if portfolio_file.exists():
    with open(portfolio_file, 'r') as f:
        for line in f:
            line = line.strip()
            # Skip comments and empty lines
            if line and not line.startswith('#'):
                current_universe.append(line)
    
    print(f"📊 CURRENT UNIVERSE: {len(current_universe)} stocks")
    print("="*80)
    print(f"Stocks: {', '.join(sorted(current_universe))}")
    print("="*80)
    
    # Check last modification date
    last_modified = datetime.fromtimestamp(portfolio_file.stat().st_mtime)
    days_since = (datetime.now() - last_modified).days
    print(f"\n⏰ Last Updated: {last_modified.strftime('%Y-%m-%d')} ({days_since} days ago)")
    
    if days_since > 365:
        print("🔴 CRITICAL: Universe is over 1 year old - re-optimization overdue!")
    elif days_since > 180:
        print("🟡 WARNING: Universe is over 6 months old - consider updating")
    else:
        print("✅ Universe is relatively fresh")
else:
    print("⚠️ No current universe file found - will create new one")
    current_universe = []

print("\n" + "="*80)

## Step 3: Run S&P 500 Screening

**This will take 10-15 minutes** as it downloads and analyzes 5 years of data for ~500 stocks.

A stock qualifies if it meets **any one** of these GHB volatility criteria:
- Standard Deviation ≥ 30%
- Max Win ≥ 150%
- Avg Win ≥ 40%
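Because the criteria are joined by OR, a stock passes as soon as one threshold is met. A minimal pandas sketch of that rule; `ghb_qualified` is hypothetical, and the column names `Std_Dev_%` and `Max_Win_%` are assumptions (only `Avg_Win_%` is known to appear in the screening CSV). The real logic lives in `screen_stocks.py`.

```python
import pandas as pd

# Thresholds from the criteria listed above (values in percent)
STD_MIN, MAX_WIN_MIN, AVG_WIN_MIN = 30.0, 150.0, 40.0


def ghb_qualified(df):
    """Boolean mask: True if ANY one volatility criterion is met.

    Column names 'Std_Dev_%' and 'Max_Win_%' are assumed for illustration.
    """
    return (
        (df['Std_Dev_%'] >= STD_MIN)
        | (df['Max_Win_%'] >= MAX_WIN_MIN)
        | (df['Avg_Win_%'] >= AVG_WIN_MIN)
    )
```

Note that `≥` means a stock sitting exactly on a threshold (for example, a 30.0% standard deviation) qualifies.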

In [None]:
print("🔄 Running S&P 500 screening...")
print("⏱️ Expected time: 10-15 minutes")
print("📥 Downloading 5 years of price data for ~500 stocks...\n")
print("="*80)

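# Run screen_stocks.py with the same interpreter as this notebook
# (a bare 'python' may resolve to a different environment)
result = subprocess.run(
    [sys.executable, '../backtest/screen_stocks.py', '--universe', 'sp500', '--refresh-data'],
    capture_output=True,
    text=True
)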

# Print output
print(result.stdout)
if result.stderr:
    print("Warnings/Errors:")
    print(result.stderr)

if result.returncode == 0:
    print("\n✅ Screening completed successfully!")
else:
    print(f"\n❌ Screening failed with exit code {result.returncode}")
    print("Check the output above for errors.")

print("="*80)

## Step 4: Load & Analyze Screening Results

Load the CSV results and identify the top 25 stocks by CAGR.
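After `sort_values`, pandas keeps the original CSV row labels, so the index is not a rank. A minimal sketch that attaches an explicit 1-based rank; `rank_by_cagr` is a hypothetical helper that assumes the `Qualified`, `CAGR`, and `Ticker` columns used in the cell below.

```python
import pandas as pd


def rank_by_cagr(df_screening, n=25):
    """Return (all qualified stocks ranked by CAGR, top n), with a 1-based 'Rank' column."""
    qualified = df_screening[df_screening['Qualified'].eq(True)]
    # reset_index discards the CSV row labels so row position matches rank
    ranked = qualified.sort_values('CAGR', ascending=False).reset_index(drop=True)
    ranked.insert(0, 'Rank', list(range(1, len(ranked) + 1)))
    return ranked, ranked.head(n)
```

A dropped ticker's rank is then `ranked.loc[ranked['Ticker'] == t, 'Rank'].iloc[0]`.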

In [None]:
# Find most recent screening results
results_dir = Path("../backtest/results")
screening_files = list(results_dir.glob("stock_screening_*.csv"))

if not screening_files:
    print("❌ No screening results found!")
    print("Make sure the screening completed successfully in Step 3.")
else:
    # Get most recent file
    latest_file = max(screening_files, key=lambda p: p.stat().st_mtime)
    print(f"📄 Loading: {latest_file.name}\n")
    
    # Load results
    df_screening = pd.read_csv(latest_file)
    
    # Separate qualified and non-qualified
    df_qualified = df_screening[df_screening['Qualified'] == True].copy()
    df_nonqualified = df_screening[df_screening['Qualified'] == False].copy()
    
    # Sort by CAGR
    df_qualified = df_qualified.sort_values('CAGR', ascending=False)
    
    # Get top 25
    df_top25 = df_qualified.head(25)
    new_universe = df_top25['Ticker'].tolist()
    
    print("📊 SCREENING RESULTS")
    print("="*80)
    print(f"Total Stocks Tested: {len(df_screening)}")
    print(f"Qualified: {len(df_qualified)} ({len(df_qualified)/len(df_screening)*100:.1f}%)")
    print(f"Non-Qualified: {len(df_nonqualified)} ({len(df_nonqualified)/len(df_screening)*100:.1f}%)")
    print("="*80)
    
    if len(df_qualified) < 25:
        print(f"\n⚠️ WARNING: Only {len(df_qualified)} stocks qualified (need 25)")
        print("Consider:")
        print("  - Lowering volatility thresholds")
        print("  - Expanding to Russell 1000")
        print("  - Checking if market conditions are unusual")
    
    print("\n📈 TOP 25 STOCKS BY CAGR:")
    print("="*80)
    print(df_top25[['Ticker', 'CAGR', 'Total_Return_%', 'Win_Rate_%', 'Avg_Win_%', 'Max_DD_%']].to_string(index=False))
    print("="*80)

## Step 5: Compare New vs Current Universe

Analyze which stocks to keep, add, or remove.
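The comparison in the cell below is plain set arithmetic. A minimal sketch with hypothetical helpers `compare_universes` and `update_size`; unlike the cell, it measures overlap against the size of the new list, so the percentage stays correct if fewer than 25 stocks qualify.

```python
def compare_universes(current, new):
    """Split tickers into keep/add/drop and return overlap as % of the new list."""
    current_set, new_set = set(current), set(new)
    keep = sorted(current_set & new_set)   # in both lists
    add = sorted(new_set - current_set)    # new winners
    drop = sorted(current_set - new_set)   # fell out of the top list
    overlap_pct = 100 * len(keep) / len(new_set) if new_set else 0.0
    return keep, add, drop, overlap_pct


def update_size(overlap_pct):
    """Map overlap to the update categories used in the recommendation below."""
    if overlap_pct >= 80:
        return "MINOR UPDATE"
    if overlap_pct >= 50:
        return "MODERATE UPDATE"
    return "MAJOR REFRESH"
```

With a current list of A, B, C, D and a new list of B, C, D, E, three of four tickers overlap (75%), which falls in the moderate band.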

In [None]:
if len(current_universe) > 0 and 'new_universe' in locals():
    # Convert to sets for comparison
    current_set = set(current_universe)
    new_set = set(new_universe)
    
    # Calculate overlaps
    keep_stocks = current_set & new_set  # Intersection
    add_stocks = new_set - current_set    # In new but not current
    drop_stocks = current_set - new_set   # In current but not new
    
    print("\n🔄 UNIVERSE COMPARISON")
    print("="*80)
    print(f"Current Universe: {len(current_universe)} stocks")
    print(f"New Top 25: {len(new_universe)} stocks")
    print(f"Overlap: {len(keep_stocks)} stocks ({len(keep_stocks)/len(new_universe)*100:.1f}%)")
    print("="*80)
    
    # Stocks to KEEP
    print(f"\n✅ KEEP ({len(keep_stocks)} stocks):")
    print("   These stocks are in both current and new top 25")
    if keep_stocks:
        keep_list = sorted(list(keep_stocks))
        for i in range(0, len(keep_list), 10):
            print(f"   {', '.join(keep_list[i:i+10])}")
    else:
        print("   (None - complete universe refresh)")
    
    # Stocks to ADD
    print(f"\n➕ ADD ({len(add_stocks)} stocks):")
    print("   New winners that should be added to universe")
    if add_stocks:
        # Show details for stocks to add
        df_add = df_top25[df_top25['Ticker'].isin(add_stocks)][['Ticker', 'CAGR', 'Total_Return_%', 'Win_Rate_%']]
        print(df_add.to_string(index=False))
    else:
        print("   (None - current universe is optimal)")
    
    # Stocks to DROP
    print(f"\n➖ DROP ({len(drop_stocks)} stocks):")
    print("   Current stocks that didn't make new top 25")
    if drop_stocks:
        drop_list = sorted(list(drop_stocks))
        # Check if they still qualified
        for ticker in drop_list:
            if ticker in df_qualified['Ticker'].values:
                # Position in the CAGR-sorted list (index labels still follow CSV row order)
                rank = df_qualified['Ticker'].tolist().index(ticker) + 1
                cagr = df_qualified[df_qualified['Ticker'] == ticker]['CAGR'].values[0]
                print(f"   {ticker}: Still qualified but ranked #{rank} (CAGR: {cagr:.2f}%)")
            elif ticker in df_nonqualified['Ticker'].values:
                print(f"   {ticker}: ❌ NO LONGER QUALIFIES (failed volatility criteria)")
            else:
                print(f"   {ticker}: ⚠️ Not found in screening results")
    else:
        print("   (None - all current stocks still top 25)")
    
    print("\n" + "="*80)
    
    # Recommendation
    overlap_pct = len(keep_stocks) / len(new_universe) * 100
    
    print("\n💡 RECOMMENDATION:")
    if overlap_pct >= 80:
        print("   ✅ MINOR UPDATE: High overlap (≥80%)")
        print(f"   → Keep {len(keep_stocks)} stocks, swap out {len(drop_stocks)} underperformers")
        print("   → Low disruption to current portfolio")
    elif overlap_pct >= 50:
        print("   🟡 MODERATE UPDATE: Medium overlap (50-80%)")
        print(f"   → Keep {len(keep_stocks)} stocks, replace {len(drop_stocks)} with new winners")
        print("   → Gradual transition recommended (4-8 weeks)")
    else:
        print("   🔴 MAJOR REFRESH: Low overlap (<50%)")
        print(f"   → Major universe shift: {len(add_stocks)} new stocks incoming")
        print("   → Consider full portfolio reset or extended transition")
    
elif 'new_universe' in locals():
    print("\n📊 NEW UNIVERSE (no comparison)")
    print("="*80)
    print(f"Creating new universe with {len(new_universe)} stocks")
    print(f"Stocks: {', '.join(sorted(new_universe))}")
    print("="*80)
else:
    print("\n⚠️ No screening results available for comparison")

## Step 6: Sector Diversification Analysis

Compare sector concentration in the new universe against the current one.
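The breakdown and concentration check in the cell below reduce to a count and a maximum share. A minimal sketch using `collections.Counter`; `sector_breakdown` and `concentration_level` are hypothetical helpers, and the sector mapping is passed in rather than fetched, just as the cell relies on a hand-written map.

```python
from collections import Counter


def sector_breakdown(tickers, sector_map):
    """Count tickers per sector (unmapped -> 'Other'); return counts and the largest share in %."""
    counts = Counter(sector_map.get(t, 'Other') for t in tickers)
    max_pct = 100 * max(counts.values()) / len(tickers) if tickers else 0.0
    return counts, max_pct


def concentration_level(max_pct):
    """Thresholds mirror the cell below: >50% high, >40% watch."""
    if max_pct > 50:
        return "HIGH"
    if max_pct > 40:
        return "WATCH"
    return "HEALTHY"
```

Dividing by `len(tickers)` instead of a fixed 25 keeps the shares correct when the universe has fewer stocks. A large `Other` bucket is a sign the hand-written map needs extending before the concentration result can be trusted.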

In [None]:
if 'new_universe' in locals():
    # Manually define sectors for key stocks (simplified)
    # In production, you'd fetch from yfinance or external API
    sector_map = {
        # Tech
        'NVDA': 'Technology', 'SMCI': 'Technology', 'AVGO': 'Technology', 'GOOGL': 'Technology',
        'GOOG': 'Technology', 'NFLX': 'Technology', 'ANET': 'Technology', 'ORCL': 'Technology',
        'MU': 'Technology', 'APH': 'Technology', 'MSFT': 'Technology', 'META': 'Technology',
        'AMD': 'Technology', 'AMAT': 'Technology', 'MRVL': 'Technology', 'FTNT': 'Technology',
        'PANW': 'Technology', 'PLTR': 'Technology', 'STX': 'Technology',
        # Energy
        'TRGP': 'Energy', 'MPC': 'Energy', 'DVN': 'Energy', 'WMB': 'Energy', 'FANG': 'Energy',
        # Industrial
        'GE': 'Industrial', 'AXON': 'Industrial', 'PWR': 'Industrial', 'CTAS': 'Industrial',
        # Healthcare
        'LLY': 'Healthcare', 'MCK': 'Healthcare', 'CAH': 'Healthcare', 'MRNA': 'Healthcare',
        'VRTX': 'Healthcare',
        # Utilities
        'CEG': 'Utilities', 'VST': 'Utilities',
        # Consumer
        'DECK': 'Consumer', 'COST': 'Consumer', 'AMZN': 'Consumer',
        'TSLA': 'Consumer', 'BKNG': 'Consumer', 'ROST': 'Consumer',
        # Financial
        'JPM': 'Financial',
    }
    
    # Assign sectors to new universe
    new_sectors = {}
    for ticker in new_universe:
        sector = sector_map.get(ticker, 'Other')
        new_sectors[sector] = new_sectors.get(sector, 0) + 1
    
    print("\n📊 NEW UNIVERSE - SECTOR BREAKDOWN")
    print("="*80)
    for sector in sorted(new_sectors.keys(), key=lambda x: new_sectors[x], reverse=True):
        count = new_sectors[sector]
        pct = count / len(new_universe) * 100
        print(f"{sector:15s}: {count:2d} stocks ({pct:5.1f}%)")
    print("="*80)
    
    # Check for concentration risk
    max_sector_pct = max(new_sectors.values()) / len(new_universe) * 100
    if max_sector_pct > 50:
        print("\n🔴 WARNING: High sector concentration (>50% in one sector)")
        print("   → Consider diversifying to reduce sector risk")
    elif max_sector_pct > 40:
        print("\n🟡 WATCH: Moderate sector concentration (40-50%)")
        print("   → Monitor sector trends closely")
    else:
        print("\n✅ HEALTHY: Good sector diversification")
    
    # Compare to current if available
    if len(current_universe) > 0:
        current_sectors = {}
        for ticker in current_universe:
            sector = sector_map.get(ticker, 'Other')
            current_sectors[sector] = current_sectors.get(sector, 0) + 1
        
        print("\n📊 CURRENT UNIVERSE - SECTOR BREAKDOWN (for comparison)")
        print("="*80)
        for sector in sorted(current_sectors.keys(), key=lambda x: current_sectors[x], reverse=True):
            count = current_sectors[sector]
            pct = count / len(current_universe) * 100
            print(f"{sector:15s}: {count:2d} stocks ({pct:5.1f}%)")
        print("="*80)

## Step 7: Generate Updated Universe Files

Generate the updated stock list in two formats: Python code for the scanner notebook and plain text for the universe file.

**⚠️ This does NOT automatically update your files - you must manually copy/paste.**
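The notebook deliberately leaves file updates to you. If you do decide to write the text file from code after reviewing the results, a minimal sketch follows. `render_universe_file` is a hypothetical helper that reproduces the header format printed by the cell below, omitting the placeholder update-date line. Every header line starts with `#`, so the output still parses with the Step 2 loader.

```python
from datetime import date


def render_universe_file(tickers, run_date):
    """Build universe-file text: '#' header lines, then one ticker per line (alphabetical)."""
    lines = [
        f"# GHB Strategy Optimized Portfolio - S&P 500 Optimized Universe ({len(tickers)} stocks)",
        f"# Re-optimized: {run_date.isoformat()}",
        "#",
    ]
    lines += sorted(tickers)
    return "\n".join(lines) + "\n"
```

Usage, once you have backed up the old file: `Path("../data/ghb_optimized_portfolio.txt").write_text(render_universe_file(new_universe, date.today()))`.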

In [None]:
if 'new_universe' in locals():
    print("\n" + "="*80)
    print("📝 UPDATED UNIVERSE CODE")
    print("="*80)
    print("\n1️⃣ FOR NOTEBOOK (ghb_portfolio_scanner.ipynb):")
    print("   Copy and paste this into the GHB_UNIVERSE cell:\n")
    print("```python")
    print("# GHB Strategy S&P 500 Optimized Portfolio - 25 Stocks")
    print(f"# Re-optimized: {datetime.now().strftime('%Y-%m-%d')}")
    print("GHB_UNIVERSE = [")
    
    # Format in rows of 4-5 stocks
    sorted_universe = sorted(new_universe)
    for i in range(0, len(sorted_universe), 5):
        batch = sorted_universe[i:i+5]
        line = ', '.join([f"'{ticker}'" for ticker in batch])
        if i + 5 < len(sorted_universe):
            line += ','
        print(f"    {line}")
    
    print("]")
    print("```\n")
    
    print("2️⃣ FOR TEXT FILE (data/ghb_optimized_portfolio.txt):")
    print("   Replace file contents with this:\n")
    print("```")
    print("# GHB Strategy Optimized Portfolio - S&P 500 Optimized Universe (25 stocks)")
    print(f"# Re-optimized: {datetime.now().strftime('%Y-%m-%d')}")
    print("# Next update: [typically 1 year from now]")
    print("#")
    print("# Stocks (Top 25 S&P 500 by CAGR - sorted alphabetically):")
    for ticker in sorted_universe:
        print(ticker)
    print("```\n")
    
    print("="*80)
    print("📋 NEXT STEPS:")
    print("="*80)
    print("1. Review the recommendations above")
    print("2. Decide: Full refresh, partial update, or keep current?")
    print("3. If updating:")
    print("   a. Copy notebook code to ghb_portfolio_scanner.ipynb")
    print("   b. Copy text to data/ghb_optimized_portfolio.txt")
    print("   c. Run backtest to validate (optional)")
    print("4. Plan portfolio transition (immediate/gradual/hybrid)")
    print("5. Execute trades over 2-8 weeks")
    print("\n📄 See: docs/RE-OPTIMIZATION_GUIDE.md for detailed instructions")
    print("="*80)
else:
    print("\n⚠️ No new universe available to generate code")

## Step 8: Save Analysis Report

Save a detailed report of this re-optimization for future reference.
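`json` is imported in Step 1 but not used in the cells shown. One natural use is a machine-readable companion to the text report, which makes year-over-year comparisons easier to script. A minimal sketch, assuming the `keep_stocks`, `add_stocks`, and `drop_stocks` sets from Step 5; `build_report` and its field names are hypothetical.

```python
import json
from datetime import datetime


def build_report(current, new, keep, add, drop, run_time):
    """Collect the re-optimization summary into a JSON-serializable dict."""
    return {
        "date": run_time.strftime('%Y-%m-%d %H:%M'),
        "current_count": len(current),
        "new_count": len(new),
        "overlap_pct": round(100 * len(keep) / len(new), 1) if new else 0.0,
        "keep": sorted(keep),  # sets are not JSON-serializable, so convert to lists
        "add": sorted(add),
        "drop": sorted(drop),
        "new_universe": sorted(new),
    }
```

For example: `report_file.with_suffix('.json').write_text(json.dumps(build_report(current_universe, new_universe, keep_stocks, add_stocks, drop_stocks, datetime.now()), indent=2))`.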

In [None]:
if 'new_universe' in locals() and len(current_universe) > 0:
    # Create report
    report_dir = Path("../backtest/results")
    report_file = report_dir / f"reoptimization_report_{datetime.now().strftime('%Y%m%d_%H%M')}.txt"
    
    with open(report_file, 'w') as f:
        f.write("GHB STRATEGY - UNIVERSE RE-OPTIMIZATION REPORT\n")
        f.write("="*80 + "\n")
        f.write(f"Date: {datetime.now().strftime('%Y-%m-%d %H:%M')}\n")
        f.write("="*80 + "\n\n")
        
        f.write("SUMMARY\n")
        f.write("-" * 80 + "\n")
        f.write(f"Current Universe: {len(current_universe)} stocks\n")
        f.write(f"New Top 25: {len(new_universe)} stocks\n")
        f.write(f"Overlap: {len(keep_stocks)} stocks ({len(keep_stocks)/len(new_universe)*100:.1f}%)\n")
        f.write(f"To Add: {len(add_stocks)} stocks\n")
        f.write(f"To Drop: {len(drop_stocks)} stocks\n\n")
        
        f.write("STOCKS TO KEEP\n")
        f.write("-" * 80 + "\n")
        for ticker in sorted(keep_stocks):
            f.write(f"  {ticker}\n")
        f.write("\n")
        
        f.write("STOCKS TO ADD\n")
        f.write("-" * 80 + "\n")
        for ticker in sorted(add_stocks):
            row = df_top25[df_top25['Ticker'] == ticker].iloc[0]
            f.write(f"  {ticker}: CAGR {row['CAGR']:.2f}%, Win Rate {row['Win_Rate_%']:.1f}%\n")
        f.write("\n")
        
        f.write("STOCKS TO DROP\n")
        f.write("-" * 80 + "\n")
        for ticker in sorted(drop_stocks):
            if ticker in df_qualified['Ticker'].values:
                rank = df_qualified['Ticker'].tolist().index(ticker) + 1  # position in CAGR order
                cagr = df_qualified[df_qualified['Ticker'] == ticker]['CAGR'].values[0]
                f.write(f"  {ticker}: Still qualified but ranked #{rank} (CAGR: {cagr:.2f}%)\n")
            else:
                f.write(f"  {ticker}: No longer qualifies\n")
        f.write("\n")
        
        f.write("NEW UNIVERSE (alphabetical)\n")
        f.write("-" * 80 + "\n")
        for ticker in sorted(new_universe):
            f.write(f"  {ticker}\n")
    
    print(f"\n✅ Report saved: {report_file.name}")
    print(f"📁 Location: {report_file.parent.absolute()}")
else:
    print("\n⏭️ Skipping report (no comparison available)")

## 📊 Summary

### What This Notebook Did:
1. ✅ Loaded your current 25-stock universe
2. ✅ Screened full S&P 500 (~500 stocks)
3. ✅ Identified qualified stocks and ranked by CAGR
4. ✅ Compared new top 25 vs current universe
5. ✅ Analyzed which stocks to keep/add/drop
6. ✅ Checked sector diversification
7. ✅ Generated updated universe code
8. ✅ Saved detailed report

### Your Decision:
Based on the analysis above, decide:
- **Keep current universe?** (if overlap is ≥80% and performance is good)
- **Partial update?** (swap out worst performers)
- **Full refresh?** (replace with new top 25)

### If Updating:
1. Copy the generated code from Step 7
2. Update `ghb_portfolio_scanner.ipynb` (GHB_UNIVERSE cell)
3. Update `data/ghb_optimized_portfolio.txt`
4. Optional: Run backtest to validate performance
5. Plan portfolio transition strategy
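For a gradual transition (the 4-8 weeks suggested for a moderate update), the swaps can be spread evenly across weeks. This is a minimal sketch; `transition_schedule` is hypothetical. Pairing drops with adds alphabetically is an arbitrary choice made for illustration, and in practice you might pair by sector or position size instead.

```python
from itertools import zip_longest


def transition_schedule(drop, add, weeks):
    """Spread (sell, buy) swaps over `weeks` weekly batches.

    Drops and adds are paired alphabetically (an arbitrary choice); a None
    side marks an unpaired sell or buy when the two lists differ in length.
    """
    if weeks < 1:
        raise ValueError("weeks must be >= 1")
    swaps = list(zip_longest(sorted(drop), sorted(add)))
    schedule = [[] for _ in range(weeks)]
    # Round-robin keeps weekly batch sizes within one swap of each other
    for i, swap in enumerate(swaps):
        schedule[i % weeks].append(swap)
    return schedule
```

For example, three drops and four adds over two weeks give two swaps per week, with the last week containing one buy-only entry.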

### Documentation:
📄 Full guide: `docs/RE-OPTIMIZATION_GUIDE.md`

---

**Next re-optimization:** January 2027 (1 year from now)