## 🔧 **Why Python Modules vs Notebooks?**

### **The Hybrid Approach Philosophy**

This workspace uses both **Python modules** (`.py` files) and **Jupyter notebooks** (`.ipynb` files) strategically:

#### 📓 **Use Notebooks When:**
- **Interactive exploration** - Testing ideas, parameter tuning, data exploration
- **Visualization & reporting** - Charts, analysis reports, stakeholder presentations
- **One-off analysis** - Custom client requests, ad-hoc research questions
- **Model development** - Prototyping new factors before productionizing
- **Documentation** - Explaining methodology with embedded code and results

#### 🐍 **Use Python Modules When:**
- **Reusable business logic** - Factor calculations used across multiple analyses
- **Production automation** - Scripts that run on schedule without human interaction
- **Clean, testable code** - Functions that need unit testing and version control
- **Performance-critical** - Code processing large datasets efficiently
- **Importable functions** - Logic that needs to be imported into other scripts
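
Here is a minimal sketch of what "clean, testable code" can look like: a factor written as a pure function, plus a plain `assert`-based unit test that pytest would collect from a `test_*.py` file. The name `days_to_cover_z` and the choice to z-score cross-sectionally within each date are assumptions for illustration. This is not the actual `DaysToCoverZ` implementation in `models/core_factors.py`. The column names match the playbook's sample data.

```python
import pandas as pd


def days_to_cover_z(df: pd.DataFrame) -> pd.Series:
    """Hypothetical DTC z-score: SI / ADV, standardized within each date.

    Expects columns 'Date', 'Short Interest' and 'Average Daily Volume'.
    Returns a Series aligned to df.index.
    """
    dtc = df["Short Interest"] / df["Average Daily Volume"]
    by_date = dtc.groupby(df["Date"])
    # Compare each name only with same-day peers (cross-sectional z-score)
    return (dtc - by_date.transform("mean")) / by_date.transform("std")


def test_days_to_cover_z_is_standardized():
    # Deterministic fixture: one date, three tickers with DTC of 1, 2, 3 days
    df = pd.DataFrame({
        "Date": pd.to_datetime(["2024-01-02"] * 3),
        "Short Interest": [100.0, 200.0, 300.0],
        "Average Daily Volume": [100.0, 100.0, 100.0],
    })
    z = days_to_cover_z(df)
    assert abs(z.mean()) < 1e-12
    assert abs(z.std() - 1.0) < 1e-12
```

Because the function has no hidden state and does no I/O, the same code runs unchanged in a notebook, in a scheduled job and under pytest.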

### **Our Organized Structure:**

```
Bob_EquiLend_Models/
├── 🏭 PRODUCTION (Python Modules)
│   ├── models/core_factors.py        # Factor calculations
│   ├── models/extended_factors.py    # Advanced models
│   └── src/daily_digest.py           # Report generation
│
├── 📊 INTERACTIVE (Notebooks)
│   ├── EquiLend_Consolidated_Playbook.ipynb  # This notebook
│   └── Factor_Development.ipynb              # New factor prototyping
│
└── 📁 SUPPORTING
    ├── data/           # Your securities lending data
    └── documentation/  # Word docs and specifications
```
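
The playbook imports its modules with `sys.path.append('../models')`. That only works when the kernel's working directory sits exactly one level below the project root. A more robust sketch treats the top-level `models/` folder shown above as the marker for the project root. It walks up from the current directory until it finds that folder, then puts the root on `sys.path`. The helper names and the choice of marker are assumptions.

```python
import sys
from pathlib import Path
from typing import Optional


def find_project_root(start: Path, marker: str = "models") -> Path:
    """Return the first directory at or above `start` that contains `marker/`."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}/' directory found at or above {start}")


def add_project_root_to_path(start: Optional[Path] = None) -> Path:
    """Insert the project root at the front of sys.path (once) and return it."""
    root = find_project_root(start if start is not None else Path.cwd())
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root
```

With the root on the path, `from models.core_factors import ShortInterestMomentum` works from any subfolder. This requires `models/` to be importable as a package, either a regular package with `__init__.py` or an implicit namespace package (Python 3.3+).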

### **Workflow in Practice:**

1. **🔬 Research** → Develop new ideas in `Factor_Development.ipynb`
2. **🏗️ Build** → Move stable code to modules (`models/core_factors.py`)
3. **📊 Analyze** → Import modules into this playbook for daily analysis  
4. **🚀 Deploy** → Use modules directly for automated production runs

**Result:** You get notebook interactivity for research AND clean, maintainable modules for production!
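
For the **Deploy** step, a common pattern is a small command-line entry point that a scheduler (cron, Airflow, etc.) can call. The sketch below uses only `argparse`. The `--as-of` and `--output-dir` flags and the injectable `run` callable are hypothetical names. The placeholder branch marks where a real deployment would call the factor pipeline from `models/`.

```python
import argparse
from datetime import date
from pathlib import Path
from typing import Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse command-line flags for a scheduled run (flag names are hypothetical)."""
    parser = argparse.ArgumentParser(description="Daily EquiLend factor run (sketch)")
    parser.add_argument("--as-of", type=date.fromisoformat, default=date.today(),
                        help="business date to process, YYYY-MM-DD (default: today)")
    parser.add_argument("--output-dir", type=Path, default=Path("data"),
                        help="directory for output files (default: ./data)")
    return parser.parse_args(argv)


def main(argv: Optional[List[str]] = None,
         run: Optional[Callable[[date, Path], None]] = None) -> int:
    """Entry point for scheduled runs; returns a process exit code."""
    args = parse_args(argv)
    args.output_dir.mkdir(parents=True, exist_ok=True)
    if run is None:
        # Placeholder: a real deployment would load the day's lending data and
        # call the factor pipeline from models/ here.
        print(f"Would process {args.as_of} into {args.output_dir}")
    else:
        run(args.as_of, args.output_dir)
    return 0
```

In the script itself, wire this up with the usual `if __name__ == "__main__": raise SystemExit(main())` guard. Injecting `run` keeps `main` testable without touching real data.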

# Main Jupyter Notebook for EquiLend Financing Models
# Consolidated playbook with all models and analysis

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# EquiLend Equity Financing Research Playbook  \n",
    "*Generated: 2025-06-29*  \n",
    "\n",
    "This notebook consolidates all models discussed – short-squeeze analytics **and** broader equity-financing factors such as credit-equity basis, options-skew divergence, ETF flow pressure, macro-liquidity overlay, ESG constraints, crowd buzz and more."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Setup and Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import sys\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from datetime import date, datetime\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Add organized models to path\n",
    "sys.path.append('../models')\n",
    "\n",
    "# Import our organized models\n",
    "from core_factors import (\n",
    "    ShortInterestMomentum, BorrowCostShock, UtilizationPersistence,\n",
    "    FeeTrendZScore, DaysToCoverZ, LocateProxyFactor, compute_all_factors\n",
    ")\n",
    "\n",
    "from extended_factors import (\n",
    "    BorrowCDSBasis, OptionsSkewDivergence, ETFFlowPressure,\n",
    "    MacroLiquidityStress, ESGConstraintGauge, CrowdBuzzPulse,\n",
    "    EnhancedShortSqueezeV4, compute_extended_factors\n",
    ")\n",
    "\n",
    "print(f\"EquiLend Financing Models Playbook - {date.today()}\")\n",
    "print(\"All models loaded successfully!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Environment Setup\n",
    "\n",
    "Set up your API keys in a `.env` file:\n",
    "```bash\n",
    "FRED_KEY=your_fred_token\n",
    "POLYGON_KEY=your_polygon_token\n",
    "GEMINI_API_KEY=your_gemini_key\n",
    "SQL_URI=your_database_connection\n",
    "```"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load environment variables\n",
    "from dotenv import load_dotenv\n",
    "load_dotenv()\n",
    "\n",
    "# Verify key environment variables\n",
    "required_keys = ['FRED_KEY', 'GEMINI_API_KEY']\n",
    "missing_keys = [key for key in required_keys if not os.getenv(key)]\n",
    "\n",
    "if missing_keys:\n",
    "    print(f\"⚠️  Missing environment variables: {missing_keys}\")\n",
    "    print(\"Some features may not work without proper API keys\")\n",
    "else:\n",
    "    print(\"✅ All required environment variables found\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Complete Model Catalog\n",
    "\n",
    "The table below lists every model we have covered so far."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Model catalog\n",
    "models_data = {\n",
    "    'Model': [\n",
    "        'Short-Interest Momentum (SIM)',\n",
    "        'Borrow Cost Shock (BCS)',\n",
    "        'Utilization Persistence (UPI)',\n",
    "        'Fee Trend Z-Score (FTZ)',\n",
    "        'Locate Proxy Factor (LPF)',\n",
    "        'Days-To-Cover Z (DTC_z)',\n",
    "        'Borrow-CDS Basis',\n",
    "        'Options Skew Divergence',\n",
    "        'ETF Flow Pressure',\n",
    "        'Macro Liquidity Stress',\n",
    "        'ESG Constraint Gauge',\n",
    "        'Crowd Buzz Pulse',\n",
    "        'Enhanced Short Squeeze Prediction (SSR v4)'\n",
    "    ],\n",
    "    'Inputs': [\n",
    "        'ΔOn-Loan Qty, ΔFee',\n",
    "        '1-day fee spike vs 30-day σ',\n",
    "        '20-day avg Util',\n",
    "        '20-day fee slope',\n",
    "        'Re-Rate Ratio, B2B loans',\n",
    "        'SI ÷ ADV',\n",
    "        'Fee All, ICE OAS (FRED)',\n",
    "        'Fee, Util, CBOE SKEW (FRED)',\n",
    "        'On-Loan Qty, ETF ΔSharesOut',\n",
    "        'Util, STLFSI2 (FRED)',\n",
    "        'Lender Count, MSCI ESG',\n",
    "        'Reddit/X mention velocity',\n",
    "        'SIM+BCS+UPI+Buzz+Uptick'\n",
    "    ],\n",
    "    'Purpose': [\n",
    "        'Gauge accelerating short build-up',\n",
    "        'Detect sudden scarcity events',\n",
    "        'Persistent tight supply',\n",
    "        'Under-the-radar fee drifts',\n",
    "        'Stand-in for locate surges',\n",
    "        'Short-covering pressure',\n",
    "        'Credit-equity dislocations',\n",
    "        'Hedge mis-pricing signal',\n",
    "        'Arbitrage strain detection',\n",
    "        'Systemic stress overlay',\n",
    "        'Supply limits from ESG',\n",
    "        'Retail-driven squeezes',\n",
    "        'Higher-resolution squeeze detector'\n",
    "    ]\n",
    "}\n",
    "\n",
    "models_df = pd.DataFrame(models_data)\n",
    "print(\"EquiLend Model Catalog:\")\n",
    "print(\"=\" * 80)\n",
    "display(models_df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Load Sample Data\n",
    "\n",
    "Load your EquiLend data or create sample data for testing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create sample data for demonstration\n",
    "np.random.seed(42)\n",
    "n_days = 252  # One trading year\n",
    "n_stocks = 100\n",
    "\n",
    "# Generate date range\n",
    "dates = pd.date_range(start='2024-01-01', periods=n_days, freq='B')\n",
    "\n",
    "# Create sample securities lending data\n",
    "sample_data = []\n",
    "industries = ['Technology', 'Healthcare', 'Finance', 'Energy', 'Consumer']\n",
    "for i in range(n_stocks):\n",
    "    ticker = f\"STOCK_{i:03d}\"\n",
    "    industry = np.random.choice(industries)  # fixed per stock, not re-drawn daily\n",
    "    \n",
    "    # Generate synthetic time series\n",
    "    base_fee = np.random.uniform(10, 500)  # Base borrow fee in bps\n",
    "    base_util = np.random.uniform(1, 95)   # Base utilization %\n",
    "    \n",
    "    # Loop variable is 'day' so it does not shadow datetime.date imported above\n",
    "    for day in dates:\n",
    "        # Add i.i.d. daily noise around the base levels\n",
    "        fee_noise = np.random.normal(0, base_fee * 0.1)\n",
    "        util_noise = np.random.normal(0, base_util * 0.05)\n",
    "        \n",
    "        sample_data.append({\n",
    "            'Date': day,\n",
    "            'ticker': ticker,\n",
    "            'sec_desc': f'Sample Company {i}',\n",
    "            'industry': industry,\n",
    "            'Fee All (BPS)': max(0, base_fee + fee_noise),\n",
    "            'Active Utilization (%)': np.clip(base_util + util_noise, 0, 100),\n",
    "            'On Loan Quantity': np.random.uniform(100000, 10000000),\n",
    "            'On Loan Quantity Month Diff': np.random.uniform(-50, 200),\n",
    "            'Fee All Month Diff (BPS)': np.random.uniform(-100, 100),\n",
    "            'Short Interest': np.random.uniform(1000000, 50000000),\n",
    "            'Average Daily Volume': np.random.uniform(500000, 5000000),\n",
    "            'Re-Rate Ratio': np.random.uniform(0.5, 3.0),\n",
    "            'B2B Loans': np.random.uniform(0, 1000000),\n",
    "            'Lender Count': np.random.randint(5, 50)\n",
    "        })\n",
    "\n",
    "df = pd.DataFrame(sample_data)\n",
    "df['Date'] = pd.to_datetime(df['Date'])\n",
    "df = df.set_index(['Date', 'ticker']).sort_index()\n",
    "\n",
    "print(f\"Sample dataset created: {len(df)} rows, {len(df.columns)} columns\")\n",
    "print(f\"Date range: {df.index.get_level_values('Date').min()} to {df.index.get_level_values('Date').max()}\")\n",
    "print(f\"Unique tickers: {df.index.get_level_values('ticker').nunique()}\")\n",
    "display(df.head())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Core Factor Analysis\n",
    "\n",
    "Apply all core short squeeze factors to the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Reset index for easier processing\n",
    "df_reset = df.reset_index()\n",
    "\n",
    "# Apply core factors\n",
    "print(\"Computing core factors...\")\n",
    "df_with_factors = compute_all_factors(df_reset)\n",
    "\n",
    "# Show factor summary\n",
    "factor_cols = ['SIM', 'BCS', 'UPI', 'FTZ', 'DTC_z', 'LPF']\n",
    "factor_summary = df_with_factors[factor_cols].describe()\n",
    "\n",
    "print(\"\\nCore Factor Summary:\")\n",
    "print(\"=\" * 50)\n",
    "display(factor_summary)\n",
    "\n",
    "# Plot factor distributions\n",
    "fig, axes = plt.subplots(2, 3, figsize=(15, 10))\n",
    "axes = axes.flatten()\n",
    "\n",
    "for i, factor in enumerate(factor_cols):\n",
    "    df_with_factors[factor].hist(bins=50, ax=axes[i], alpha=0.7, color='steelblue')\n",
    "    axes[i].set_title(f'{factor} Distribution')\n",
    "    axes[i].set_xlabel('Z-Score')\n",
    "    axes[i].set_ylabel('Frequency')\n",
    "    axes[i].axvline(0, color='red', linestyle='--', alpha=0.7)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.suptitle('Core Factor Distributions', y=1.02, fontsize=16)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Extended Factor Analysis\n",
    "\n",
    "Apply extended factors that use external data sources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Apply extended factors\n",
    "print(\"Computing extended factors...\")\n",
    "df_extended = compute_extended_factors(df_with_factors)\n",
    "\n",
    "# Show extended factor summary\n",
    "extended_cols = ['Borrow_CDS_Basis', 'Options_Skew_Div', 'ETF_Flow_Pressure', \n",
    "                'Macro_Stress', 'ESG_Constraint', 'Crowd_Buzz', 'SSR_v4']\n",
    "\n",
    "available_extended = [col for col in extended_cols if col in df_extended.columns]\n",
    "if available_extended:\n",
    "    extended_summary = df_extended[available_extended].describe()\n",
    "    print(\"\\nExtended Factor Summary:\")\n",
    "    print(\"=\" * 50)\n",
    "    display(extended_summary)\n",
    "else:\n",
    "    print(\"\\n⚠️  Extended factors require external data sources (FRED API, etc.)\")\n",
    "    print(\"Set up API keys to enable extended factor computation.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Factor Correlation Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compute correlation matrix for all factors\n",
    "all_factor_cols = factor_cols + [col for col in extended_cols if col in df_extended.columns]\n",
    "correlation_matrix = df_extended[all_factor_cols].corr()\n",
    "\n",
    "# Plot correlation heatmap\n",
    "plt.figure(figsize=(12, 10))\n",
    "sns.heatmap(correlation_matrix, \n",
    "            annot=True, \n",
    "            cmap='RdBu_r', \n",
    "            center=0,\n",
    "            square=True,\n",
    "            fmt='.2f',\n",
    "            cbar_kws={'label': 'Correlation Coefficient'})\n",
    "plt.title('Factor Correlation Matrix', fontsize=16, pad=20)\n",
    "plt.tight_layout()\n",
    "plt.show()\n",
    "\n",
    "# Find highly correlated factor pairs\n",
    "high_corr_pairs = []\n",
    "for i in range(len(correlation_matrix.columns)):\n",
    "    for j in range(i+1, len(correlation_matrix.columns)):\n",
    "        corr_value = correlation_matrix.iloc[i, j]\n",
    "        if abs(corr_value) > 0.7:  # High correlation threshold\n",
    "            high_corr_pairs.append((\n",
    "                correlation_matrix.columns[i],\n",
    "                correlation_matrix.columns[j],\n",
    "                corr_value\n",
    "            ))\n",
    "\n",
    "if high_corr_pairs:\n",
    "    print(\"\\nHighly Correlated Factor Pairs (|r| > 0.7):\")\n",
    "    print(\"=\" * 50)\n",
    "    for factor1, factor2, corr in high_corr_pairs:\n",
    "        print(f\"{factor1} <-> {factor2}: {corr:.3f}\")\n",
    "else:\n",
    "    print(\"\\nNo highly correlated factor pairs found.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Enhanced Short Squeeze Scoring\n",
    "\n",
    "Use the SSR v4 model to identify potential short squeeze candidates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get latest date data for screening\n",
    "latest_date = df_extended['Date'].max()\n",
    "latest_data = df_extended[df_extended['Date'] == latest_date].copy()\n",
    "\n",
    "# Enhanced Short Squeeze Prediction\n",
    "ssr_v4 = EnhancedShortSqueezeV4()\n",
    "latest_data['SSR_v4_Score'] = ssr_v4.score(latest_data)\n",
    "\n",
    "# Rank by squeeze potential\n",
    "top_squeeze_candidates = latest_data.nlargest(10, 'SSR_v4_Score')\n",
    "\n",
    "print(f\"\\nTop 10 Short Squeeze Candidates ({latest_date.strftime('%Y-%m-%d')}):\")\n",
    "print(\"=\" * 80)\n",
    "\n",
    "display_cols = ['ticker', 'sec_desc', 'industry', 'Fee All (BPS)', \n",
    "               'Active Utilization (%)', 'SSR_v4_Score']\n",
    "available_display_cols = [col for col in display_cols if col in top_squeeze_candidates.columns]\n",
    "\n",
    "display(top_squeeze_candidates[available_display_cols])\n",
    "\n",
    "# Plot squeeze score distribution\n",
    "plt.figure(figsize=(12, 6))\n",
    "plt.subplot(1, 2, 1)\n",
    "latest_data['SSR_v4_Score'].hist(bins=30, alpha=0.7, color='orange', edgecolor='black')\n",
    "plt.title('SSR v4 Score Distribution')\n",
    "plt.xlabel('SSR v4 Score')\n",
    "plt.ylabel('Frequency')\n",
    "plt.axvline(latest_data['SSR_v4_Score'].quantile(0.9), color='red', linestyle='--', \n",
    "           label='90th Percentile')\n",
    "plt.legend()\n",
    "\n",
    "plt.subplot(1, 2, 2)\n",
    "plt.scatter(latest_data['Fee All (BPS)'], latest_data['SSR_v4_Score'], \n",
    "           c=latest_data['Active Utilization (%)'], cmap='viridis', alpha=0.6)\n",
    "plt.colorbar(label='Utilization (%)')\n",
    "plt.xlabel('Fee All (BPS)')\n",
    "plt.ylabel('SSR v4 Score')\n",
    "plt.title('SSR Score vs Borrow Fee')\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Daily Digest Generation\n",
    "\n",
    "Generate automated daily digest using the consolidated data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import daily digest generator\n",
    "sys.path.append('../src')\n",
    "from daily_digest import generate_digest\n",
    "\n",
    "# Calculate market summary stats\n",
    "avg_fee = latest_data['Fee All (BPS)'].mean()\n",
    "avg_fee_chg = (avg_fee / df_extended.groupby('Date')['Fee All (BPS)'].mean().iloc[-2] - 1) * 100\n",
    "avg_util = latest_data['Active Utilization (%)'].mean()\n",
    "avg_util_chg = (avg_util / df_extended.groupby('Date')['Active Utilization (%)'].mean().iloc[-2] - 1) * 100\n",
    "\n",
    "print(f\"Market Summary for {latest_date.strftime('%Y-%m-%d')}:\")\n",
    "print(f\"Average Fee: {avg_fee:.2f} bps ({avg_fee_chg:+.2f}%)\")\n",
    "print(f\"Average Utilization: {avg_util:.2f}% ({avg_util_chg:+.2f}%)\")\n",
    "\n",
    "# Generate digest\n",
    "digest_result = generate_digest(\n",
    "    avg_fee=avg_fee,\n",
    "    avg_fee_chg=avg_fee_chg,\n",
    "    avg_util=avg_util,\n",
    "    avg_util_chg=avg_util_chg,\n",
    "    headline_1=\"MARKET DYNAMICS: Securities lending activity shows continued evolution\",\n",
    "    headline_2=\"FACTOR ANALYSIS: Multiple signals indicate shifting market conditions\",\n",
    "    headline_3=\"RISK MONITORING: Enhanced screening identifies key names to watch\"\n",
    ")\n",
    "\n",
    "print(f\"\\n✅ Daily digest generated: {digest_result['output_file']}\")\n",
    "print(f\"Data processed: {digest_result['data_rows']} rows\")\n",
    "print(f\"Key bullets: {len(digest_result['bullets'])}\")\n",
    "print(f\"Takeaways: {len(digest_result['takeaways'])}\")\n",
    "\n",
    "# Display key insights\n",
    "print(\"\\n📊 Key Data Points:\")\n",
    "for i, bullet in enumerate(digest_result['bullets'], 1):\n",
    "    print(f\"{i}. {bullet}\")\n",
    "\n",
    "print(\"\\n💡 Key Takeaways:\")\n",
    "for i, takeaway in enumerate(digest_result['takeaways'], 1):\n",
    "    print(f\"{i}. {takeaway}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. Model Performance & Validation\n",
    "\n",
    "Basic validation framework for factor performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Simple forward-looking validation\n",
    "# Note: This is simplified - real validation would use proper forward returns\n",
    "\n",
    "# Create synthetic forward returns for demonstration\n",
    "np.random.seed(123)\n",
    "df_extended['Forward_Return_5D'] = np.random.normal(0, 0.02, len(df_extended))\n",
    "\n",
    "# Calculate Information Coefficients (IC) for each factor\n",
    "ic_results = {}\n",
    "for factor in all_factor_cols:\n",
    "    if factor in df_extended.columns:\n",
    "        # Group by date and calculate cross-sectional correlation\n",
    "        daily_ics = df_extended.groupby('Date').apply(\n",
    "            lambda x: x[factor].corr(x['Forward_Return_5D']) if len(x) > 10 else np.nan\n",
    "        ).dropna()\n",
    "        \n",
    "        if len(daily_ics) > 0:\n",
    "            ic_results[factor] = {\n",
    "                'Mean_IC': daily_ics.mean(),\n",
    "                'IC_StdDev': daily_ics.std(),\n",
    "                'IC_IR': daily_ics.mean() / daily_ics.std() if daily_ics.std() > 0 else 0,\n",
    "                'Hit_Rate': (daily_ics > 0).mean()\n",
    "            }\n",
    "\n",
    "# Display IC results\n",
    "if ic_results:\n",
    "    ic_df = pd.DataFrame(ic_results).T\n",
    "    ic_df = ic_df.sort_values('IC_IR', ascending=False)\n",
    "    \n",
    "    print(\"\\nFactor Performance Summary:\")\n",
    "    print(\"=\" * 60)\n",
    "    print(\"Note: Using synthetic returns for demonstration\")\n",
    "    display(ic_df.round(4))\n",
    "    \n",
    "    # Plot IC time series for top factors\n",
    "    top_factors = ic_df.head(3).index.tolist()\n",
    "    \n",
    "    fig, axes = plt.subplots(len(top_factors), 1, figsize=(12, 8))\n",
    "    if len(top_factors) == 1:\n",
    "        axes = [axes]\n",
    "    \n",
    "    for i, factor in enumerate(top_factors):\n",
    "        daily_ics = df_extended.groupby('Date').apply(\n",
    "            lambda x: x[factor].corr(x['Forward_Return_5D']) if len(x) > 10 else np.nan\n",
    "        ).dropna()\n",
    "        \n",
    "        daily_ics.plot(ax=axes[i], alpha=0.7, color='steelblue')\n",
    "        axes[i].axhline(0, color='red', linestyle='--', alpha=0.5)\n",
    "        axes[i].set_title(f'{factor} - Information Coefficient Over Time')\n",
    "        axes[i].set_ylabel('IC')\n",
    "        \n",
    "        # Add rolling mean\n",
    "        daily_ics.rolling(20).mean().plot(ax=axes[i], color='orange', linewidth=2, label='20D MA')\n",
    "        axes[i].legend()\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    plt.show()\n",
    "else:\n",
    "    print(\"⚠️  No IC results available - insufficient data or factors\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 11. Export and Automation\n",
    "\n",
    "Save results and prepare for automated execution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Save processed data\n",
    "output_dir = '../data'\n",
    "os.makedirs(output_dir, exist_ok=True)\n",
    "\n",
    "# Export latest factor scores\n",
    "latest_scores = latest_data[['ticker', 'sec_desc', 'industry'] + all_factor_cols + ['SSR_v4_Score']]\n",
    "latest_scores.to_csv(f'{output_dir}/latest_factor_scores_{latest_date.strftime(\"%Y%m%d\")}.csv', index=False)\n",
    "\n",
    "# Export top squeeze candidates\n",
    "top_squeeze_candidates.to_csv(f'{output_dir}/top_squeeze_candidates_{latest_date.strftime(\"%Y%m%d\")}.csv', index=False)\n",
    "\n",
    "# Export historical IC data if available\n",
    "if ic_results:\n",
    "    ic_df.to_csv(f'{output_dir}/factor_performance_summary.csv')\n",
    "\n",
    "print(f\"✅ Data exported to {output_dir}/\")\n",
    "print(f\"   - Latest factor scores: {len(latest_scores)} securities\")\n",
    "print(f\"   - Top squeeze candidates: {len(top_squeeze_candidates)} securities\")\n",
    "if ic_results:\n",
    "    print(f\"   - Factor performance summary: {len(ic_df)} factors\")\n",
    "\n",
    "# Automation suggestions\n",
    "print(\"\\n🚀 Automation Suggestions:\")\n",
    "print(\"1. Schedule this notebook with Papermill for daily execution\")\n",
    "print(\"2. Set up data pipeline to refresh securities lending data\")\n",
    "print(\"3. Configure email alerts for high squeeze scores\")\n",
    "print(\"4. Implement real-time monitoring dashboard\")\n",
    "print(\"5. Add model performance tracking and alerting\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "This consolidated playbook provides:\n",
    "\n",
    "✅ **Complete model implementation** - All core and extended factors in organized modules  \n",
    "✅ **Automated data processing** - End-to-end pipeline from data to insights  \n",
    "✅ **Performance validation** - IC analysis and factor correlation monitoring  \n",
    "✅ **Daily digest generation** - Automated report creation with key insights  \n",
    "✅ **Export capabilities** - CSV outputs for further analysis and monitoring  \n",
    "\n",
    "### Next Steps:\n",
    "1. Replace sample data with real EquiLend securities lending data\n",
    "2. Set up proper API connections for external data sources\n",
    "3. Implement production data pipeline and scheduling\n",
    "4. Add real-time monitoring and alerting capabilities\n",
    "5. Enhance validation with proper forward-looking returns data"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
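
Step 2 of the workflow, moving stable notebook code into modules, usually starts from a notebook file like the JSON above. The standard tool for this is `jupyter nbconvert --to script`. As a minimal stdlib-only sketch of what that does for code cells, the helpers below read an nbformat-4 notebook and join its code cells into one script body. The function names are hypothetical. The output still needs manual cleanup before it becomes a module, for example removing notebook-only calls such as `display(...)` and hard-coded relative paths.

```python
import json
from pathlib import Path
from typing import List, Union


def code_cells(nb: dict) -> List[str]:
    """Return the source of each code cell in an nbformat-4 notebook dict."""
    sources = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        # nbformat allows `source` to be either a list of lines or one string
        sources.append("".join(src) if isinstance(src, list) else src)
    return sources


def notebook_to_module_text(path: Union[str, Path]) -> str:
    """Read a .ipynb file and join its code cells into one script body."""
    nb = json.loads(Path(path).read_text(encoding="utf-8"))
    return "\n\n".join(code_cells(nb)) + "\n"
```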

## File Organization Philosophy

This workspace uses a **hybrid approach** that leverages both notebooks and Python modules:

### 📓 **Notebooks for Interactive Work:**
- **Analysis & Exploration** - This main playbook for daily analysis
- **Model Development** - Individual notebooks for developing new factors
- **Backtesting & Validation** - Interactive performance testing
- **Visualization & Reporting** - Charts, reports, and presentations
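
The playbook's validation cell computes a daily cross-sectional IC per factor and summarizes it as Mean_IC, IC_StdDev, IC_IR and Hit_Rate. Moved into a module, that logic might look like the sketch below. The function names and the `min_obs`/`method` parameters are assumptions; the summary fields mirror the playbook's table. Returning the daily series separately from its summary lets an interactive backtest plot the series without recomputing it.

```python
import pandas as pd


def daily_ic(df: pd.DataFrame, factor: str, fwd_col: str,
             date_col: str = "Date", min_obs: int = 11,
             method: str = "pearson") -> pd.Series:
    """Cross-sectional correlation of `factor` with `fwd_col` on each date.

    Dates with fewer than `min_obs` rows are skipped; the playbook's
    `len(x) > 10` check corresponds to min_obs=11.
    """
    ics = {}
    for day, grp in df.groupby(date_col):
        if len(grp) >= min_obs:
            ics[day] = grp[factor].corr(grp[fwd_col], method=method)
    return pd.Series(ics, dtype=float).dropna()


def ic_summary(ics: pd.Series) -> dict:
    """Summarize a daily IC series with the same fields as the playbook's IC table."""
    std = ics.std()
    return {
        "Mean_IC": ics.mean(),
        "IC_StdDev": std,
        "IC_IR": ics.mean() / std if std > 0 else 0.0,
        "Hit_Rate": (ics > 0).mean(),
    }
```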

### 🐍 **Python Modules for Reusable Code:**
- **Core factor classes** - So you can `from models.core_factors import ShortInterestMomentum`
- **Utility functions** - Data loading, preprocessing helpers
- **Production automation** - Scripts that run on schedule without human interaction
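
As an example of core factor logic living in a module, here is a sketch of two catalog entries computed per ticker on long-format data. The first is a borrow cost shock: the 1-day fee change divided by its trailing 30-day standard deviation. The second is utilization persistence: the 20-day average utilization. Both are hypothetical readings of the catalog's "Inputs" column, not the actual `BorrowCostShock` or `UtilizationPersistence` classes. Lagging the trailing standard deviation by one day is also an assumption.

```python
import pandas as pd


def borrow_cost_shock(df: pd.DataFrame, window: int = 30) -> pd.Series:
    """Hypothetical BCS: 1-day fee change / trailing `window`-day std of changes.

    Expects 'Date', 'ticker' and 'Fee All (BPS)' columns and a unique index;
    the result is aligned to df by index label.
    """
    df = df.sort_values(["ticker", "Date"])
    fee_chg = df.groupby("ticker")["Fee All (BPS)"].diff()
    # Shift the trailing std by one day so today's move is not part of
    # its own denominator (a design choice, not taken from the source)
    trailing_sd = fee_chg.groupby(df["ticker"]).transform(
        lambda s: s.rolling(window).std().shift(1)
    )
    return fee_chg / trailing_sd


def utilization_persistence(df: pd.DataFrame, window: int = 20) -> pd.Series:
    """Hypothetical UPI: trailing `window`-day mean utilization per ticker."""
    df = df.sort_values(["ticker", "Date"])
    return df.groupby("ticker")["Active Utilization (%)"].transform(
        lambda s: s.rolling(window).mean()
    )
```

Usage mirrors the notebook's long format: `df_reset['UPI'] = utilization_persistence(df_reset)` aligns on the index, so the input does not need to be pre-sorted.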

### 🔄 **Workflow:**
1. **Develop in notebooks** - Interactive experimentation and testing
2. **Extract to modules** - Once code is stable and reusable
3. **Import back to notebooks** - Clean analysis with modular components
4. **Deploy modules** - For automated/scheduled processes

This gives you the **best of both worlds**: notebook interactivity for analysis and clean modules for production code.