# **🏀 Step 1: Project Setup & Data Loading**

First, we import the necessary libraries and load the dataset. We then average each player's rows into a single season-average row per player (562 players in this run) and scale every feature to the 0-1 range.

**Libraries:**
* **pandas & numpy**: For data manipulation.
* **scikit-learn**: For data scaling and splitting.
* **torch**: For building and training the neural network.

In [17]:
!pip install pandas numpy torch scikit-learn streamlit




In [None]:
import pandas as pd
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
import random

# Load the dataset (expects 'database_24_25.csv' in the working directory)
try:
    df = pd.read_csv('database_24_25.csv')
except FileNotFoundError:
    print("Error: 'database_24_25.csv' not found; falling back to random dummy data.")
    # Fallback dummy data so the rest of the notebook can still run
    data = {'Player': [f'Player_{i}' for i in range(200)],
            'PTS': np.random.uniform(5, 25, 200), 'AST': np.random.uniform(1, 8, 200),
            'TRB': np.random.uniform(2, 12, 200), 'STL': np.random.uniform(0.5, 2, 200),
            'BLK': np.random.uniform(0.2, 2, 200), 'FG%': np.random.uniform(0.4, 0.6, 200)}
    df = pd.DataFrame(data)

# Define the numeric features we want to average
features = ['PTS', 'AST', 'TRB', 'STL', 'BLK', 'FG%']

# Group by player name and calculate the mean for all numeric stats.
# This gives us one row per unique player, representing their season average.
player_pool = df.groupby('Player')[features].mean().reset_index()

print(f"Processing season averages for all {len(player_pool)} players in the dataset.")
print("-" * 60)

# Normalize the features to a scale of 0-1.
scaler = MinMaxScaler()
player_pool_scaled = player_pool.copy()
player_pool_scaled[features] = scaler.fit_transform(player_pool_scaled[features])

print("--- Player Pool (First 5 Players with Season Averages) ---")
# The stats you see now are the averages for each player
print(player_pool.head())
print("\n--- Scaled Player Pool (First 5 Players) ---")
print(player_pool_scaled.head())

Processing season averages for all 562 players in the dataset.
------------------------------------------------------------
--- Player Pool (First 5 Players with Season Averages) ---
          Player        PTS       AST       TRB       STL       BLK       FG%
0     A.J. Green   7.659091  1.272727  2.250000  0.545455  0.113636  0.426455
1    A.J. Lawson   2.750000  0.000000  0.750000  0.000000  0.000000  0.666750
2     AJ Johnson   2.444444  1.222222  1.000000  0.111111  0.000000  0.259333
3   Aaron Gordon  12.333333  3.066667  4.733333  0.466667  0.266667  0.510900
4  Aaron Holiday   4.222222  1.194444  0.944444  0.361111  0.111111  0.345556

--- Scaled Player Pool (First 5 Players) ---
          Player       PTS       AST       TRB       STL       BLK       FG%
0     A.J. Green  0.236100  0.111888  0.157664  0.183117  0.029260  0.426455
1    A.J. Lawson  0.084772  0.000000  0.052555  0.000000  0.000000  0.666750
2     AJ Johnson  0.075353  0.107448  0.070073  0.037302  0.000000  0.25
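
The scaled table above comes from min-max scaling, which maps each column to the 0-1 range via $(x - x_{min}) / (x_{max} - x_{min})$. As a rough illustration only, the sketch below applies that formula to a single column in plain Python. The helper name `min_max_scale` and the toy values are assumptions made for this example and are not taken from the dataset.

```python
# Minimal sketch of min-max scaling for one stat column (toy numbers, not from the dataset).
def min_max_scale(values):
    """Map each value to (x - min) / (max - min), so the column spans 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: return zeros to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

toy_pts = [0.0, 8.0, 16.0, 32.0]  # hypothetical per-player scoring averages
scaled_pts = min_max_scale(toy_pts)
print(scaled_pts)  # [0.0, 0.25, 0.5, 1.0]
```

Because each column is scaled by its own minimum and maximum, a stat with a large raw range such as `PTS` ends up on the same footing as one with a small range such as `BLK`.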

# **🧪 Step 2: Defining "Optimal" & Generating Training Data**

An ANN needs a clear target to predict. We will define an "optimal team" by creating a **Team Score**. The model's job is to learn how to predict this score.

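**Our Team Score Formula:**
$TeamScore = 0.6 \cdot Offense + 0.4 \cdot Defense - 0.1 \cdot ImbalancePenalty$

where $Offense = 0.4 \cdot PTS + 0.3 \cdot AST + 0.1 \cdot TRB + 0.2 \cdot FG\%$ and $Defense = 0.5 \cdot STL + 0.5 \cdot BLK$, each computed on the team's summed stats.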

We then generate 50,000 random 5-player teams and compute each team's summed, scaled stats (our `X`) and its `Team Score` (our `y`).
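
To make the scoring arithmetic concrete, here is a small worked example using the weights that appear in `calculate_team_score` in the next code cell. The team totals are made-up numbers chosen for easy arithmetic, and the zero penalty assumes a balanced lineup of 2 guards, 2 forwards, and 1 center.

```python
# Worked example of the Team Score arithmetic with toy team totals (hypothetical numbers).
team_totals = {"PTS": 100.0, "AST": 20.0, "TRB": 40.0, "STL": 6.0, "BLK": 4.0, "FG%": 2.5}
penalty = 0  # assumes a balanced 2 guards / 2 forwards / 1 center lineup

# Offense: weighted sum of points, assists, rebounds, and field-goal percentage
offense = (0.4 * team_totals["PTS"] + 0.3 * team_totals["AST"]
           + 0.1 * team_totals["TRB"] + 0.2 * team_totals["FG%"])
# Defense: equal weights on steals and blocks
defense = 0.5 * team_totals["STL"] + 0.5 * team_totals["BLK"]
# Final score: 60% offense, 40% defense, minus 0.1 per unit of positional imbalance
team_score = 0.6 * offense + 0.4 * defense - 0.1 * penalty

print(round(offense, 2), round(defense, 2), round(team_score, 2))  # 50.5 5.0 32.3
```

Points dominate the result here: 40 of the 50.5 offense points come from `PTS`, so high-scoring players are likely to be favored by this score.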

In [None]:
def calculate_team_score(team_df):
    """Calculates a custom score for a 5-player team."""
    # Sum each stat across the five players
    stats = team_df[features].sum()

    # 1. Offensive Rating (weighted sum of points, assists, rebounds, and FG%)
    offense_score = (stats['PTS'] * 0.4) + (stats['AST'] * 0.3) + (stats['TRB'] * 0.1) + (stats['FG%'] * 0.2)

    # 2. Defensive Rating
    defense_score = (stats['STL'] * 0.5) + (stats['BLK'] * 0.5)

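    # 3. Positional Imbalance Penalty
    # Note: player_pool keeps only 'Player' and the numeric features, so 'Pos' is
    # absent and the penalty stays 0 unless a position column is carried through.
    penalty = 0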
    if 'Pos' in team_df.columns:
        pos_counts = team_df['Pos'].value_counts()
        guards = pos_counts.get('G', 0)
        forwards = pos_counts.get('F', 0)
        centers = pos_counts.get('C', 0)
        penalty = abs(guards - 2) + abs(forwards - 2) + abs(centers - 1)

    # Final Score
    final_score = (offense_score * 0.6) + (defense_score * 0.4) - (penalty * 0.1)
    return final_score

# Generate 50,000 random teams to create our training dataset
num_combinations = 50000
X_data = []
y_data = []

for _ in range(num_combinations):
    # Select 5 random players from our scaled pool
    team_indices = np.random.choice(player_pool_scaled.index, 5, replace=False)
    team_df_scaled = player_pool_scaled.loc[team_indices]

    # The input (X) is the sum of the team's scaled stats
    team_stats_vector = team_df_scaled[features].sum().values
    X_data.append(team_stats_vector)

    # The label (y) is the team's score, calculated from the original, unscaled data
    team_df_original = player_pool.loc[team_indices]
    team_score = calculate_team_score(team_df_original)
    y_data.append(team_score)

X = np.array(X_data)
y = np.array(y_data)

# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"Generated {len(X)} training samples.")
print(f"Shape of X_train: {X_train.shape}")
print(f"Shape of y_train: {y_train.shape}")

Generated 50000 training samples.
Shape of X_train: (40000, 6)
Shape of y_train: (40000,)


# **🤖 Step 3: Build and Train the Artificial Neural Network (PyTorch)**

We'll now define our Multilayer Perceptron (MLP) architecture and training loop using PyTorch.

* **Model**: Two hidden layers with `ReLU` activation and a `Dropout` layer to prevent overfitting.
* **Loss Function**: `MSELoss` (Mean Squared Error), a standard choice for regression.
* **Optimizer**: `Adam`, a standard and effective optimizer.
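
As a quick sanity check on model size, the sketch below counts the trainable parameters of such a network by hand. It assumes six input features and the 64- and 32-unit hidden layers used in the code cell that follows. The helper `count_linear_params` is a name invented for this illustration.

```python
# Parameter count for a fully connected network: each linear layer with
# n_in inputs and n_out outputs holds n_in * n_out weights plus n_out biases.
# ReLU and Dropout add no trainable parameters.
layer_sizes = [6, 64, 32, 1]  # assumed: 6 input features -> 64 -> 32 -> 1 output

def count_linear_params(sizes):
    """Sum weights and biases over consecutive layer pairs."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))

total_params = count_linear_params(layer_sizes)
print(total_params)  # 2561
```

With about 2,500 parameters and 40,000 training samples, the network is small relative to the data.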

In [20]:
# --- 1. PyTorch Model Definition ---
class MLP(nn.Module):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self.layer1 = nn.Linear(input_size, 64)
        self.layer2 = nn.Linear(64, 32)
        self.dropout = nn.Dropout(0.2)
        self.output_layer = nn.Linear(32, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.layer1(x))
        x = self.relu(self.layer2(x))
        x = self.dropout(x)
        x = self.output_layer(x)
        return x

# --- 2. PyTorch Training Loop ---
def train_model(model, X_train, y_train, X_test, y_test, epochs=50, batch_size=32):
    X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
    y_train_tensor = torch.tensor(y_train, dtype=torch.float32).view(-1, 1)
    X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
    y_test_tensor = torch.tensor(y_test, dtype=torch.float32).view(-1, 1)

    loss_function = nn.MSELoss()
    optimizer = optim.Adam(model.parameters())

    print("Starting PyTorch model training...")
    for epoch in range(epochs):
        model.train()
        permutation = torch.randperm(X_train_tensor.size()[0])
        
        for i in range(0, X_train_tensor.size()[0], batch_size):
            optimizer.zero_grad()
            indices = permutation[i:i+batch_size]
            batch_X, batch_y = X_train_tensor[indices], y_train_tensor[indices]
            y_pred = model(batch_X)
            loss = loss_function(y_pred, batch_y)
            loss.backward()
            optimizer.step()

        if (epoch + 1) % 10 == 0:
            model.eval()
            with torch.no_grad():
                test_pred = model(X_test_tensor)
                test_loss = loss_function(test_pred, y_test_tensor)
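            # `loss` is the last mini-batch's training loss (dropout active), so it is
            # noisier than the full test-set loss computed above.
            print(f'Epoch [{epoch+1}/{epochs}], Loss: {loss.item():.4f}, Test Loss: {test_loss.item():.4f}')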
    
    print("Training complete.")
    return model

# --- 3. Instantiate and Train the Model ---
input_size = X_train.shape[1]
pytorch_model = MLP(input_size)

trained_model = train_model(pytorch_model, X_train, y_train, X_test, y_test)

Starting PyTorch model training...
Epoch [10/50], Loss: 4.6111, Test Loss: 0.1109
Epoch [20/50], Loss: 1.8406, Test Loss: 0.0045
Epoch [30/50], Loss: 0.2630, Test Loss: 0.0475
Epoch [40/50], Loss: 0.2788, Test Loss: 0.0245
Epoch [50/50], Loss: 0.2375, Test Loss: 0.0260
Training complete.


# **🏆 Step 4: Find the Optimal Team**

With our trained model, we can now search for a strong team. Exhaustively scoring every 5-player combination of 562 players is impractical, so we sample 200,000 random teams and use the model to predict a score for each one. The team with the highest predicted score among those sampled is our pick; it is the best of the sample, not a guaranteed global optimum.
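
To put the size of the search space in perspective, the following sketch counts the possible 5-player teams and the share that a random search can cover. It assumes the 562-player pool reported in Step 1 and the 200,000 samples used in the search cell. The variable names are illustrative only.

```python
import math

n_players = 562       # player count reported in Step 1's output
team_size = 5
n_sampled = 200_000   # number of random teams the search evaluates

# Number of distinct unordered 5-player teams: C(562, 5)
total_teams = math.comb(n_players, team_size)

# Upper bound on the fraction of teams examined (random draws may repeat)
coverage = n_sampled / total_teams

print(f"{total_teams:,} possible teams; sampled fraction ~ {coverage:.1e}")
# 458,935,829,352 possible teams; sampled fraction ~ 4.4e-07
```

The random search sees well under one team in a million, which is why its result should be read as a good candidate and not as a proven best team.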

In [21]:
print("Searching for the optimal team by testing 200,000 combinations...")

best_team_indices = None
best_predicted_score = -np.inf
search_space_size = 200000

player_ids = player_pool.index.to_numpy()

# Set the model to evaluation mode (important for dropout layers)
trained_model.eval()

for _ in range(search_space_size):
    # 1. Pick a random 5-player team
    team_indices = np.random.choice(player_ids, 5, replace=False)
    team_df_scaled = player_pool_scaled.loc[team_indices]

    # 2. Prepare the input vector for the model
    input_vector = team_df_scaled[features].sum().values
    
    # 3. Use the model to predict the score (with PyTorch syntax)
    with torch.no_grad(): # Disable gradient calculation for inference
        input_tensor = torch.tensor(input_vector, dtype=torch.float32).view(1, -1)
        predicted_score = trained_model(input_tensor).item()

    # 4. If this team is the best so far, save it
    if predicted_score > best_predicted_score:
        best_predicted_score = predicted_score
        best_team_indices = team_indices

# Retrieve the optimal team's details from the original (unscaled) dataframe
optimal_team = player_pool.loc[best_team_indices]

print("\n--- 🏆 Optimal Team Found ---")
print(f"Predicted Team Score: {best_predicted_score:.2f}\n")
print(optimal_team[['Player', 'PTS', 'AST', 'TRB', 'STL', 'BLK']])

Searching for the optimal team by testing 200,000 combinations...

--- 🏆 Optimal Team Found ---
Predicted Team Score: 37.58

              Player        PTS       AST       TRB       STL       BLK
31   Anthony Edwards  27.215686  4.529412  5.784314  1.137255  0.686275
249    Jaren Jackson  21.612903  1.774194  5.935484  1.451613  1.709677
330     Kevin Durant  26.923077  4.179487  6.076923  0.820513  1.333333
530      Tyler Herro  23.734694  5.551020  5.571429  0.734694  0.122449
364      Luka Dončić  28.136364  7.818182  8.318182  2.000000  0.409091
