{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# MLB Team Success Predictor - Final Predictions\n",
    "\n",
    "This notebook runs the complete prediction pipeline end to end. For demonstration, it feeds the most recent complete season in the processed dataset to the models and labels the outputs as 2025 predictions.\n",
    "\n",
    "## Objectives:\n",
    "1. Load production models\n",
    "2. Make predictions for current season\n",
    "3. Generate team reports\n",
    "4. Create interactive dashboards\n",
    "5. Export predictions for deployment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns\n",
    "from pathlib import Path\n",
    "import json\n",
    "from datetime import datetime\n",
    "import warnings\n",
    "warnings.filterwarnings('ignore')\n",
    "\n",
    "# Add project root to path\n",
    "import sys\n",
    "sys.path.append('..')\n",
    "\n",
    "# Import custom modules\n",
    "from src.prediction.predictor import DivisionWinnerPredictor, WinsPredictor, MilestonePredictor\n",
    "from src.prediction.prediction_pipeline import PredictionPipeline\n",
    "from src.visualization.interactive_plots import InteractiveDashboard\n",
    "from src.utils.helpers import ensure_dir\n",
    "\n",
    "print(\"Libraries loaded successfully!\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Initialize Prediction Pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Initialize prediction pipeline\n",
    "pipeline = PredictionPipeline()\n",
    "\n",
    "# Load all models\n",
    "pipeline.load_models()\n",
    "\n",
    "print(\"Models loaded:\")\n",
    "print(f\"- Division Winner Classifier: {pipeline.division_predictor is not None}\")\n",
    "print(f\"- Wins Predictor: {pipeline.wins_predictor is not None}\")\n",
    "print(f\"- Milestone Predictor: {pipeline.milestone_predictor is not None}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load current season data (2025)\n",
    "current_year = 2025\n",
    "data_path = Path('../data/processed/mlb_data_engineered.csv')\n",
    "df = pd.read_csv(data_path)\n",
    "\n",
    "# For demonstration, we'll use the most recent complete season data\n",
    "# In production, this would be current season statistics\n",
    "recent_data = df[df['year'] == df['year'].max()].copy()\n",
    "print(f\"Using data from {recent_data['year'].iloc[0]} season\")\n",
    "print(f\"Number of teams: {len(recent_data)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Generate Current Season Predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Make comprehensive predictions\n",
    "predictions = pipeline.predict_season(\n",
    "    recent_data,\n",
    "    include_confidence=True,\n",
    "    include_milestones=True\n",
    ")\n",
    "\n",
    "print(f\"Predictions generated for {len(predictions)} teams\")\n",
    "print(f\"\\nPrediction columns: {list(predictions.columns)}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Display top predictions\n",
    "print(\"\\n=== DIVISION WINNER PREDICTIONS ===\")\n",
    "division_leaders = predictions.nlargest(6, 'division_winner_probability')[[\n",
    "    'team_name', 'division_winner_probability', 'division_winner_confidence'\n",
    "]]\n",
    "print(division_leaders)\n",
    "\n",
    "print(\"\\n=== WIN TOTAL PREDICTIONS (Top 10) ===\")\n",
    "win_leaders = predictions.nlargest(10, 'predicted_wins')[[\n",
    "    'team_name', 'predicted_wins', 'win_prediction_lower', 'win_prediction_upper'\n",
    "]]\n",
    "print(win_leaders.round(1))"
   ]
  },
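  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `win_prediction_lower` and `win_prediction_upper` columns come from `PredictionPipeline`, and this notebook does not show how they are computed. As a rough illustration only, the sketch below builds a symmetric 95% interval from the spread of hold-out residuals under a normal-error assumption. The helper `normal_prediction_interval` and the demo residuals are hypothetical and are not part of the project code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "def normal_prediction_interval(point_predictions, residuals, z=1.96):\n",
    "    \"\"\"Return (lower, upper) bounds assuming normally distributed errors.\"\"\"\n",
    "    point_predictions = np.asarray(point_predictions, dtype=float)\n",
    "    # Sample standard deviation of hold-out residuals (actual - predicted)\n",
    "    sigma = np.std(np.asarray(residuals, dtype=float), ddof=1)\n",
    "    half_width = z * sigma\n",
    "    return point_predictions - half_width, point_predictions + half_width\n",
    "\n",
    "# Made-up numbers for illustration, not model output\n",
    "demo_residuals = [-6.0, 4.0, -2.0, 7.0, -3.0]\n",
    "demo_lower, demo_upper = normal_prediction_interval([92.0, 78.0], demo_residuals)\n",
    "print(np.round(demo_lower, 1), np.round(demo_upper, 1))"
   ]
  },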
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Detailed Team Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Select a team for detailed analysis\n",
    "team_to_analyze = 'New York Yankees'\n",
    "\n",
    "if team_to_analyze in predictions['team_name'].values:\n",
    "    team_data = predictions[predictions['team_name'] == team_to_analyze].iloc[0]\n",
    "    \n",
    "    print(f\"\\n=== {team_to_analyze} DETAILED PREDICTIONS ===\")\n",
    "    print(f\"\\nWin Projection: {team_data['predicted_wins']:.1f} wins\")\n",
    "    print(f\"95% Confidence Interval: [{team_data['win_prediction_lower']:.1f}, {team_data['win_prediction_upper']:.1f}]\")\n",
    "    print(f\"\\nDivision Winner Probability: {team_data['division_winner_probability']:.1%}\")\n",
    "    print(f\"Confidence Level: {team_data['division_winner_confidence']}\")\n",
    "    \n",
    "    # Milestone probabilities\n",
    "    print(\"\\nMilestone Probabilities:\")\n",
    "    milestone_cols = [col for col in predictions.columns if col.startswith('prob_')]\n",
    "    for col in milestone_cols:\n",
    "        milestone_name = col.replace('prob_', '').replace('_', ' ').title()\n",
    "        print(f\"  {milestone_name}: {team_data[col]:.1%}\")\n",
    "else:\n",
    "    print(f\"Team '{team_to_analyze}' not found in predictions\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate team comparison\n",
    "teams_to_compare = ['New York Yankees', 'Boston Red Sox', 'Tampa Bay Rays']\n",
    "comparison_teams = predictions[predictions['team_name'].isin(teams_to_compare)]\n",
    "\n",
    "if len(comparison_teams) > 0:\n",
    "    fig, axes = plt.subplots(1, 3, figsize=(15, 5))\n",
    "    \n",
    "    # Win projections\n",
    "    ax = axes[0]\n",
    "    teams = comparison_teams['team_name']\n",
    "    wins = comparison_teams['predicted_wins']\n",
    "    errors = comparison_teams[['win_prediction_lower', 'win_prediction_upper']].values.T\n",
    "    errors[0] = wins - errors[0]  # Convert to error format\n",
    "    errors[1] = errors[1] - wins\n",
    "    \n",
    "    ax.bar(range(len(teams)), wins, yerr=errors, capsize=5, alpha=0.7)\n",
    "    ax.set_xticks(range(len(teams)))\n",
    "    ax.set_xticklabels(teams, rotation=45, ha='right')\n",
    "    ax.set_ylabel('Predicted Wins')\n",
    "    ax.set_title('Win Projections with 95% CI')\n",
    "    ax.grid(True, alpha=0.3)\n",
    "    \n",
    "    # Division winner probability\n",
    "    ax = axes[1]\n",
    "    probs = comparison_teams['division_winner_probability']\n",
    "    ax.bar(range(len(teams)), probs, alpha=0.7, color='green')\n",
    "    ax.set_xticks(range(len(teams)))\n",
    "    ax.set_xticklabels(teams, rotation=45, ha='right')\n",
    "    ax.set_ylabel('Probability')\n",
    "    ax.set_title('Division Winner Probability')\n",
    "    ax.set_ylim(0, 1)\n",
    "    ax.grid(True, alpha=0.3)\n",
    "    \n",
    "    # Milestone comparison\n",
    "    ax = axes[2]\n",
    "    if 'prob_achieved_90_wins' in comparison_teams.columns:\n",
    "        milestone_data = comparison_teams[['prob_achieved_90_wins', 'prob_achieved_100_wins']].values.T\n",
    "        x = np.arange(len(teams))\n",
    "        width = 0.35\n",
    "        \n",
    "        ax.bar(x - width/2, milestone_data[0], width, label='90+ Wins', alpha=0.7)\n",
    "        ax.bar(x + width/2, milestone_data[1], width, label='100+ Wins', alpha=0.7)\n",
    "        \n",
    "        ax.set_xticks(x)\n",
    "        ax.set_xticklabels(teams, rotation=45, ha='right')\n",
    "        ax.set_ylabel('Probability')\n",
    "        ax.set_title('Milestone Probabilities')\n",
    "        ax.set_ylim(0, 1)\n",
    "        ax.legend()\n",
    "        ax.grid(True, alpha=0.3)\n",
    "    \n",
    "    plt.tight_layout()\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. League-Wide Analysis"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# League statistics\n",
    "print(\"=== LEAGUE-WIDE PREDICTIONS ===\")\n",
    "print(f\"\\nAverage Predicted Wins: {predictions['predicted_wins'].mean():.1f}\")\n",
    "print(f\"Std Dev of Predictions: {predictions['predicted_wins'].std():.1f}\")\n",
    "print(f\"\\nPredicted Win Distribution:\")\n",
    "print(f\"  90+ wins: {(predictions['predicted_wins'] >= 90).sum()} teams\")\n",
    "print(f\"  100+ wins: {(predictions['predicted_wins'] >= 100).sum()} teams\")\n",
    "print(f\"  Below .500: {(predictions['predicted_wins'] < 81).sum()} teams\")\n",
    "\n",
    "# Visualize distribution\n",
    "fig, axes = plt.subplots(1, 2, figsize=(12, 5))\n",
    "\n",
    "# Win distribution\n",
    "ax = axes[0]\n",
    "predictions['predicted_wins'].hist(bins=15, ax=ax, edgecolor='black', alpha=0.7)\n",
    "ax.axvline(81, color='red', linestyle='--', label='.500 record')\n",
    "ax.axvline(predictions['predicted_wins'].mean(), color='green', linestyle='-', \n",
    "           label=f'Mean: {predictions[\"predicted_wins\"].mean():.1f}')\n",
    "ax.set_xlabel('Predicted Wins')\n",
    "ax.set_ylabel('Number of Teams')\n",
    "ax.set_title('Predicted Win Distribution')\n",
    "ax.legend()\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "# Confidence distribution\n",
    "ax = axes[1]\n",
    "confidence_counts = predictions['division_winner_confidence'].value_counts()\n",
    "confidence_counts.plot(kind='bar', ax=ax, alpha=0.7)\n",
    "ax.set_xlabel('Confidence Level')\n",
    "ax.set_ylabel('Number of Predictions')\n",
    "ax.set_title('Division Winner Prediction Confidence')\n",
    "ax.set_xticklabels(ax.get_xticklabels(), rotation=0)\n",
    "ax.grid(True, alpha=0.3)\n",
    "\n",
    "plt.tight_layout()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Create Interactive Dashboard"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create interactive dashboard\n",
    "dashboard = InteractiveDashboard(save_dir=Path('../dashboards'))\n",
    "\n",
    "# Create prediction results dashboard\n",
    "fig = dashboard.create_prediction_results_dashboard(\n",
    "    predictions[['team_name', 'predicted_wins']].rename(columns={'predicted_wins': 'wins_actual'}),\n",
    "    predictions[['team_name', 'predicted_wins']].rename(columns={'predicted_wins': 'wins_pred'}),\n",
    "    model_name='2025 Season Predictions'\n",
    ")\n",
    "\n",
    "# Note: In production, you would compare against actual results when available\n",
    "print(\"Interactive dashboard created and saved!\")"
   ]
  },
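  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The note in the dashboard cell says predictions would be compared against actual results once those exist. A minimal sketch of that comparison follows, assuming actual results arrive as a DataFrame with `team_name` and `actual_wins` columns. The `actual_wins` column, the helper `score_win_predictions`, and the demo frames are assumptions for illustration, not part of the project code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "def score_win_predictions(predicted, actual):\n",
    "    \"\"\"Join on team_name and return MAE, RMSE, and the number of matched teams.\"\"\"\n",
    "    merged = predicted.merge(actual, on='team_name', how='inner')\n",
    "    errors = merged['predicted_wins'] - merged['actual_wins']\n",
    "    return {\n",
    "        'n_teams': int(len(merged)),\n",
    "        'mae': float(errors.abs().mean()),\n",
    "        'rmse': float(np.sqrt((errors ** 2).mean())),\n",
    "    }\n",
    "\n",
    "# Made-up numbers for illustration only\n",
    "demo_predicted = pd.DataFrame({'team_name': ['Team A', 'Team B'], 'predicted_wins': [90.0, 75.0]})\n",
    "demo_actual = pd.DataFrame({'team_name': ['Team A', 'Team B'], 'actual_wins': [94, 72]})\n",
    "demo_scores = score_win_predictions(demo_predicted, demo_actual)\n",
    "print(demo_scores)"
   ]
  },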
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Export Predictions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare export data\n",
    "export_columns = [\n",
    "    'team_name',\n",
    "    'predicted_wins',\n",
    "    'win_prediction_lower',\n",
    "    'win_prediction_upper',\n",
    "    'division_winner_probability',\n",
    "    'division_winner_prediction',\n",
    "    'division_winner_confidence'\n",
    "]\n",
    "\n",
    "# Add milestone columns if available\n",
    "milestone_cols = [col for col in predictions.columns if col.startswith('prob_')]\n",
    "export_columns.extend(milestone_cols)\n",
    "\n",
    "export_df = predictions[export_columns].sort_values('predicted_wins', ascending=False)\n",
    "\n",
    "# Save to multiple formats\n",
    "output_dir = Path('../predictions')\n",
    "ensure_dir(output_dir)\n",
    "\n",
    "# CSV format\n",
    "csv_path = output_dir / f'mlb_predictions_{current_year}.csv'\n",
    "export_df.to_csv(csv_path, index=False)\n",
    "print(f\"Predictions saved to {csv_path}\")\n",
    "\n",
    "# JSON format\n",
    "json_path = output_dir / f'mlb_predictions_{current_year}.json'\n",
    "export_df.to_json(json_path, orient='records', indent=2)\n",
    "print(f\"Predictions saved to {json_path}\")\n",
    "\n",
    "# Excel format with formatting\n",
    "excel_path = output_dir / f'mlb_predictions_{current_year}.xlsx'\n",
    "with pd.ExcelWriter(excel_path, engine='openpyxl') as writer:\n",
    "    export_df.to_excel(writer, sheet_name='Predictions', index=False)\n",
    "    \n",
    "    # Add summary sheet\n",
    "    summary_data = {\n",
    "        'Metric': ['Total Teams', 'Avg Predicted Wins', 'Teams 90+ Wins', \n",
    "                   'Teams 100+ Wins', 'Predicted Division Winners'],\n",
    "        'Value': [\n",
    "            len(export_df),\n",
    "            export_df['predicted_wins'].mean(),\n",
    "            (export_df['predicted_wins'] >= 90).sum(),\n",
    "            (export_df['predicted_wins'] >= 100).sum(),\n",
    "            export_df['division_winner_prediction'].sum()\n",
    "        ]\n",
    "    }\n",
    "    summary_df = pd.DataFrame(summary_data)\n",
    "    summary_df.to_excel(writer, sheet_name='Summary', index=False)\n",
    "\n",
    "print(f\"Predictions saved to {excel_path}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 7. Generate Team Reports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Generate detailed reports for top teams\n",
    "top_teams = predictions.nlargest(5, 'predicted_wins')['team_name'].values\n",
    "\n",
    "reports_dir = output_dir / 'team_reports'\n",
    "ensure_dir(reports_dir)\n",
    "\n",
    "for team in top_teams:\n",
    "    team_pred = predictions[predictions['team_name'] == team].iloc[0]\n",
    "    \n",
    "    report = f\"\"\"\n",
    "# {team} - 2025 Season Prediction Report\n",
    "\n",
    "## Win Projection\n",
    "- **Predicted Wins**: {team_pred['predicted_wins']:.1f}\n",
    "- **95% Confidence Interval**: [{team_pred['win_prediction_lower']:.1f}, {team_pred['win_prediction_upper']:.1f}]\n",
    "\n",
    "## Division Championship\n",
    "- **Probability**: {team_pred['division_winner_probability']:.1%}\n",
    "- **Prediction**: {'Yes' if team_pred['division_winner_prediction'] else 'No'}\n",
    "- **Confidence Level**: {team_pred['division_winner_confidence']}\n",
    "\n",
    "## Milestone Probabilities\n",
    "\"\"\"\n",
    "    \n",
    "    for col in milestone_cols:\n",
    "        milestone_name = col.replace('prob_', '').replace('_', ' ').title()\n",
    "        report += f\"- **{milestone_name}**: {team_pred[col]:.1%}\\n\"\n",
    "    \n",
    "    report += f\"\\n---\\n*Generated on {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}*\"\n",
    "    \n",
    "    # Save report\n",
    "    report_path = reports_dir / f\"{team.replace(' ', '_')}_report.md\"\n",
    "    with open(report_path, 'w') as f:\n",
    "        f.write(report)\n",
    "\n",
    "print(f\"Generated reports for {len(top_teams)} teams in {reports_dir}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 8. Create Prediction Summary"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create comprehensive prediction summary\n",
    "summary = {\n",
    "    'prediction_date': datetime.now().isoformat(),\n",
    "    'season': current_year,\n",
    "    'total_teams': len(predictions),\n",
    "    'model_versions': {\n",
    "        'division_classifier': 'v1.0',\n",
    "        'wins_regressor': 'v1.0',\n",
    "        'milestone_predictor': 'v1.0'\n",
    "    },\n",
    "    'predictions_summary': {\n",
    "        'avg_predicted_wins': float(predictions['predicted_wins'].mean()),\n",
    "        'std_predicted_wins': float(predictions['predicted_wins'].std()),\n",
    "        'teams_above_500': int((predictions['predicted_wins'] > 81).sum()),\n",
    "        'teams_90_plus_wins': int((predictions['predicted_wins'] >= 90).sum()),\n",
    "        'teams_100_plus_wins': int((predictions['predicted_wins'] >= 100).sum())\n",
    "    },\n",
    "    'top_5_teams': predictions.nlargest(5, 'predicted_wins')[[\n",
    "        'team_name', 'predicted_wins'\n",
    "    ]].to_dict('records'),\n",
    "    'division_favorites': predictions.nlargest(6, 'division_winner_probability')[[\n",
    "        'team_name', 'division_winner_probability'\n",
    "    ]].to_dict('records'),\n",
    "    'confidence_distribution': predictions['division_winner_confidence'].value_counts().to_dict()\n",
    "}\n",
    "\n",
    "# Save summary\n",
    "summary_path = output_dir / f'prediction_summary_{current_year}.json'\n",
    "with open(summary_path, 'w') as f:\n",
    "    json.dump(summary, f, indent=2)\n",
    "\n",
    "print(f\"\\nPrediction summary saved to {summary_path}\")\n",
    "print(\"\\n=== PREDICTION SUMMARY ===\")\n",
    "print(json.dumps(summary['predictions_summary'], indent=2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 9. Deployment Readiness Check"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Verify all outputs for deployment\n",
    "deployment_checklist = {\n",
    "    'models_loaded': all([\n",
    "        pipeline.division_predictor is not None,\n",
    "        pipeline.wins_predictor is not None\n",
    "    ]),\n",
    "    'predictions_generated': len(predictions) > 0,\n",
    "    'csv_export': csv_path.exists(),\n",
    "    'json_export': json_path.exists(),\n",
    "    'excel_export': excel_path.exists(),\n",
    "    'team_reports': reports_dir.exists() and len(list(reports_dir.glob('*.md'))) > 0,\n",
    "    'summary_created': summary_path.exists(),\n",
    "    'dashboard_ready': (output_dir.parent / 'dashboards').exists()\n",
    "}\n",
    "\n",
    "print(\"\\n=== DEPLOYMENT READINESS CHECK ===\")\n",
    "for item, status in deployment_checklist.items():\n",
    "    status_icon = \"✓\" if status else \"✗\"\n",
    "    print(f\"{status_icon} {item}: {status}\")\n",
    "\n",
    "if all(deployment_checklist.values()):\n",
    "    print(\"\\n🎉 All systems ready for deployment!\")\n",
    "else:\n",
    "    print(\"\\n⚠️ Some items need attention before deployment.\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 10. API Response Example"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example API response format\n",
    "def format_api_response(team_name):\n",
    "    \"\"\"Format prediction for API response\"\"\"\n",
    "    team_pred = predictions[predictions['team_name'] == team_name]\n",
    "    \n",
    "    if len(team_pred) == 0:\n",
    "        return {'error': 'Team not found'}\n",
    "    \n",
    "    team_data = team_pred.iloc[0]\n",
    "    \n",
    "    response = {\n",
    "        'team': team_name,\n",
    "        'season': current_year,\n",
    "        'predictions': {\n",
    "            'wins': {\n",
    "                'predicted': float(team_data['predicted_wins']),\n",
    "                'confidence_interval': {\n",
    "                    'lower': float(team_data['win_prediction_lower']),\n",
    "                    'upper': float(team_data['win_prediction_upper'])\n",
    "                }\n",
    "            },\n",
    "            'division_winner': {\n",
    "                'probability': float(team_data['division_winner_probability']),\n",
    "                'prediction': bool(team_data['division_winner_prediction']),\n",
    "                'confidence': team_data['division_winner_confidence']\n",
    "            },\n",
    "            'milestones': {}\n",
    "        },\n",
    "        'generated_at': datetime.now().isoformat()\n",
    "    }\n",
    "    \n",
    "    # Add milestone predictions\n",
    "    for col in milestone_cols:\n",
    "        milestone_name = col.replace('prob_', '')\n",
    "        response['predictions']['milestones'][milestone_name] = float(team_data[col])\n",
    "    \n",
    "    return response\n",
    "\n",
    "# Example API call\n",
    "example_response = format_api_response('New York Yankees')\n",
    "print(\"\\nExample API Response:\")\n",
    "print(json.dumps(example_response, indent=2))"
   ]
  },
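  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`format_api_response` casts each value with `float()` or `bool()` because the standard `json` module rejects NumPy scalars such as `np.int64` and `np.bool_`. As one possible alternative, the sketch below converts a whole nested payload in a single pass. The helper `to_json_safe` and the demo payload are hypothetical and are not part of the project code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import json\n",
    "import numpy as np\n",
    "\n",
    "def to_json_safe(value):\n",
    "    \"\"\"Recursively convert NumPy scalars and arrays to built-in Python types.\"\"\"\n",
    "    if isinstance(value, dict):\n",
    "        return {str(k): to_json_safe(v) for k, v in value.items()}\n",
    "    if isinstance(value, (list, tuple)):\n",
    "        return [to_json_safe(v) for v in value]\n",
    "    if isinstance(value, np.ndarray):\n",
    "        return value.tolist()\n",
    "    if isinstance(value, np.generic):\n",
    "        return value.item()\n",
    "    return value\n",
    "\n",
    "# Made-up payload for illustration only\n",
    "demo_payload = {\n",
    "    'wins': np.int64(91),\n",
    "    'division_winner': np.bool_(True),\n",
    "    'milestones': np.array([0.8, 0.1]),\n",
    "}\n",
    "demo_json = json.dumps(to_json_safe(demo_payload))\n",
    "print(demo_json)"
   ]
  },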
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Conclusions\n",
    "\n",
    "### Prediction Pipeline Complete!\n",
    "\n",
    "We have successfully:\n",
    "1. ✅ Loaded all trained models\n",
    "2. ✅ Generated comprehensive predictions for all teams\n",
    "3. ✅ Created detailed team reports and comparisons\n",
    "4. ✅ Built interactive dashboards\n",
    "5. ✅ Exported predictions in multiple formats\n",
    "6. ✅ Prepared deployment-ready outputs\n",
    "\n",
    "### Key Predictions Summary:\n",
    "- Identified division winner favorites with confidence levels\n",
    "- Projected win totals with uncertainty bounds\n",
    "- Calculated milestone achievement probabilities\n",
    "- Generated actionable insights for each team\n",
    "\n",
    "### Next Steps:\n",
    "1. Deploy predictions to production API\n",
    "2. Schedule regular updates with new data\n",
    "3. Monitor prediction accuracy throughout season\n",
    "4. Collect feedback for model improvements\n",
    "\n",
    "The MLB Team Success Predictor is ready for production use! 🚀"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}