# MODULE 3: The Shark Predictor
## ML-Powered Deal Prediction & Shark Selection Engine

**Objective**: Build an XGBoost multi-label classifier that predicts which sharks will invest in a given pitch.

**Model**: XGBoost Multi-Output Classifier  
**Target**: Individual shark investment decisions  
**Features**: 35+ engineered features
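
To make the multi-output idea concrete, here is a minimal sketch that fits one binary classifier per label column with scikit-learn's `MultiOutputClassifier`. It is an illustration under assumptions, not the internals of `SharkPredictor`: the data is synthetic, the `_demo` names are made up, and `GradientBoostingClassifier` stands in for XGBoost so the sketch depends only on scikit-learn. `xgboost.XGBClassifier` follows the same scikit-learn estimator interface and could be wrapped the same way.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

# Synthetic stand-in data: 6 features, 5 binary labels (one per shark).
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(200, 6))
W_demo = rng.normal(size=(6, 5))
y_demo = ((X_demo @ W_demo + rng.normal(scale=0.5, size=(200, 5))) > 0).astype(int)

# MultiOutputClassifier clones the base estimator and fits one copy per label column.
# GradientBoostingClassifier is a stand-in (assumption) for an XGBoost classifier.
demo_model = MultiOutputClassifier(
    GradientBoostingClassifier(n_estimators=20, random_state=42)
)
demo_model.fit(X_demo, y_demo)

# Each row is a pitch, each column a 0/1 decision for one shark.
demo_pred = demo_model.predict(X_demo[:3])
print(demo_pred.shape)
print(len(demo_model.estimators_), "independent binary classifiers")
```

Because each label gets its own classifier, this setup treats the sharks' decisions as independent and does not model co-investment patterns.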

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix

import sys
# Add the parent directory to the import path so the `src` package resolves
sys.path.append('..')

from src.data.loader import DataLoader
from src.models.shark_predictor import SharkPredictor
from src.models.model_explainer import ModelExplainer

## 1. Load Processed Data

In [None]:
loader = DataLoader(data_dir='../data')
df = loader.load_processed_data()

print(f"Dataset shape: {df.shape}")

## 2. Prepare Features and Targets

In [None]:
# Define shark columns (targets)
shark_cols = ['aman_gupta', 'namita_thapar', 'peyush_bansal', 'vineeta_singh', 'anupam_mittal']

predictor = SharkPredictor()
X, y = predictor.prepare_data(df, shark_cols)

print(f"Features shape: {X.shape}")
print(f"Targets shape: {y.shape}")
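
What `prepare_data` does internally is not shown here. As a rough sketch of the general idea, the snippet below splits a frame into a feature matrix and a multi-label target matrix, then checks how often each shark invests. The toy rows and the feature names `ask_amount` and `equity_offered` are hypothetical and exist only for illustration.

```python
import pandas as pd

shark_cols_demo = ['aman_gupta', 'namita_thapar', 'peyush_bansal',
                   'vineeta_singh', 'anupam_mittal']

# Tiny hand-made frame; the two feature columns are hypothetical names.
toy_df = pd.DataFrame({
    'ask_amount': [50, 75, 100, 40],
    'equity_offered': [5.0, 10.0, 2.5, 8.0],
    'aman_gupta': [1, 0, 1, 0],
    'namita_thapar': [0, 0, 1, 0],
    'peyush_bansal': [1, 1, 0, 0],
    'vineeta_singh': [0, 0, 0, 0],
    'anupam_mittal': [0, 1, 1, 0],
})

# Everything that is not a shark column is treated as a feature.
X_toy = toy_df.drop(columns=shark_cols_demo)
y_toy = toy_df[shark_cols_demo]

# Fraction of pitches each shark invested in; very low rates signal class imbalance.
invest_rate = y_toy.mean()
print(X_toy.shape, y_toy.shape)
print(invest_rate)
```

A label column with almost no positives, like `vineeta_singh` in this toy frame, gives a classifier nothing to learn from and is worth checking for before training.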

## 3. Train Model

In [None]:
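# Fit the model; test_size=0.2 presumably holds out 20% of the rows for evaluation
results = predictor.train(X, y, test_size=0.2, random_state=42)

# A training score far above the test score is a sign of overfitting
print(f"Training Score: {results['train_score']:.4f}")
print(f"Test Score: {results['test_score']:.4f}")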
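
Which metric `train_score` and `test_score` report is not stated here, and for multi-label problems the choice matters. The sketch below uses a small hand-written example to contrast three common options from scikit-learn: subset (exact-match) accuracy, Hamming loss, and per-label accuracy. The arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, hamming_loss

# 4 pitches x 5 sharks; hand-written labels for illustration only.
y_true_demo = np.array([
    [1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
])
y_pred_demo = np.array([
    [1, 0, 0, 1, 0],   # all five decisions correct
    [0, 1, 1, 0, 0],   # one false positive
    [1, 0, 0, 0, 1],   # one false negative
    [0, 0, 0, 0, 0],   # all five decisions correct
])

# On 2D indicator arrays, accuracy_score counts a row as correct only if every label matches.
subset_acc = accuracy_score(y_true_demo, y_pred_demo)

# Hamming loss is the fraction of individual label decisions that are wrong.
ham = hamming_loss(y_true_demo, y_pred_demo)

# Per-label accuracy: one value per shark column.
per_label_acc = (y_true_demo == y_pred_demo).mean(axis=0)

print(f"Subset accuracy: {subset_acc:.2f}")   # 2 of 4 rows match exactly -> 0.50
print(f"Hamming loss:    {ham:.2f}")          # 2 of 20 decisions wrong -> 0.10
print("Per-label accuracy:", per_label_acc)
```

Subset accuracy is the strictest of the three, so a low value can hide a model that gets most individual shark decisions right.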

## 4. Model Evaluation

In [None]:
# TODO: Evaluate model performance
# TODO: Generate classification reports
# TODO: Create confusion matrices
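
One possible shape for this evaluation step is sketched below. Since the evaluation API of `SharkPredictor` is not shown, the sketch trains a stand-in scikit-learn model on synthetic data and then produces a per-shark classification report and one 2x2 confusion matrix per shark. All `_demo` names and the data are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, multilabel_confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

shark_names_demo = ['aman_gupta', 'namita_thapar', 'peyush_bansal',
                    'vineeta_singh', 'anupam_mittal']

# Synthetic stand-in data (assumption): 6 features, 5 binary labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 6))
W_demo = rng.normal(size=(6, 5))
y_demo = ((X_demo @ W_demo + rng.normal(scale=0.5, size=(300, 5))) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Stand-in model; an XGBoost classifier could be wrapped the same way.
eval_model = MultiOutputClassifier(
    GradientBoostingClassifier(n_estimators=30, random_state=42)
)
eval_model.fit(X_tr, y_tr)
y_hat = eval_model.predict(X_te)

# Precision, recall and F1 for each shark, plus micro/macro averages.
report = classification_report(
    y_te, y_hat, target_names=shark_names_demo, zero_division=0
)
print(report)

# One 2x2 matrix per shark, laid out as [[TN, FP], [FN, TP]].
cms = multilabel_confusion_matrix(y_te, y_hat)
for name, cm in zip(shark_names_demo, cms):
    tn, fp, fn, tp = cm.ravel()
    print(f"{name}: TP={tp} FP={fp} FN={fn} TN={tn}")
```

`multilabel_confusion_matrix` is used instead of `confusion_matrix` because the latter does not accept multi-label indicator targets.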

## 5. Feature Importance

In [None]:
# TODO: Plot feature importance
# TODO: Analyze top features
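
A simple way to approach this step, sketched under assumptions, is to average the per-label `feature_importances_` of a multi-output model and rank the result. The example below uses a stand-in scikit-learn model and synthetic data in which only the first two features carry signal, so the ranking can be sanity-checked. The feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

feature_names_demo = [f"feature_{i}" for i in range(6)]  # hypothetical names

# Synthetic data (assumption): labels depend only on feature_0 and feature_1.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 6))
weights = np.array([[1.0, 0.5], [0.5, 1.0], [1.0, -1.0], [-1.0, 0.3], [0.2, 1.0]])
y_demo = ((X_demo[:, :2] @ weights.T) > 0).astype(int)

fi_model = MultiOutputClassifier(
    GradientBoostingClassifier(n_estimators=30, random_state=42)
)
fi_model.fit(X_demo, y_demo)

# Stack one importance vector per label, then average across labels.
per_label_importance = np.vstack(
    [est.feature_importances_ for est in fi_model.estimators_]
)
mean_importance = per_label_importance.mean(axis=0)

# Rank features from most to least important.
order = np.argsort(mean_importance)[::-1]
for idx in order:
    print(f"{feature_names_demo[idx]:<10} {mean_importance[idx]:.3f}")
```

The ranked names and values can be passed to `plt.barh` for a chart. Keeping `per_label_importance` around also allows a per-shark view, which shows whether different sharks respond to different features.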

## 6. Save Model

In [None]:
predictor.save_model('../models/shark_predictor_xgb.pkl')

## 7. Example Predictions

In [None]:
# TODO: Make sample predictions
# TODO: Show prediction probabilities
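
The prediction interface of `SharkPredictor` is not shown, so the sketch below illustrates the idea with a stand-in scikit-learn model on synthetic data: obtain a per-shark investment probability for new pitches, then apply a threshold to get a shortlist. The 0.5 threshold, the data, and all `_demo` names are assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.multioutput import MultiOutputClassifier

shark_names_demo = ['aman_gupta', 'namita_thapar', 'peyush_bansal',
                    'vineeta_singh', 'anupam_mittal']

# Synthetic stand-in training data (assumption).
rng = np.random.default_rng(7)
X_demo = rng.normal(size=(300, 6))
W_demo = rng.normal(size=(6, 5))
y_demo = ((X_demo @ W_demo + rng.normal(scale=0.5, size=(300, 5))) > 0).astype(int)

proba_model = MultiOutputClassifier(
    GradientBoostingClassifier(n_estimators=30, random_state=42)
)
proba_model.fit(X_demo, y_demo)

# Two made-up pitches to score.
new_pitches = rng.normal(size=(2, 6))

# predict_proba returns one (n_samples, 2) array per label;
# column 1 is the probability of the positive class ("invests").
proba_list = proba_model.predict_proba(new_pitches)
invest_proba = np.column_stack([p[:, 1] for p in proba_list])

threshold = 0.5  # assumed cut-off; tune it per shark if classes are imbalanced
for i, row in enumerate(invest_proba):
    likely = [n for n, p in zip(shark_names_demo, row) if p >= threshold]
    print(f"Pitch {i}: " + ", ".join(f"{n}={p:.2f}" for n, p in zip(shark_names_demo, row)))
    print(f"  Likely investors: {likely if likely else 'none'}")
```

Reporting probabilities rather than only 0/1 decisions makes it possible to rank sharks for a pitch even when none clears the threshold.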