# March Mania 2025 - Starter Notebook

## Goal of the competition

The goal of this competition is to predict the probability that the team with the smaller `TeamID` will win a given matchup. You will predict this probability for every possible matchup between every pair of teams over the past 4 years. You'll be given a sample submission file in which the `ID` value encodes the year of the matchup and the identities of both teams. For example, for an `ID` of `2025_1101_1104` you would predict the outcome of the matchup between `TeamID 1101` and `TeamID 1104` during the `2025` tournament. Submitting a `Pred` of `0.75` indicates that you think the probability of `TeamID 1101` winning that matchup is `0.75`.
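To make the `ID` format concrete, here is a minimal sketch that splits an `ID` into its three parts. The helper name `parse_matchup_id` is hypothetical and is not part of the competition files.

```python
from typing import Tuple


def parse_matchup_id(matchup_id: str) -> Tuple[int, int, int]:
    """Split an ID such as '2025_1101_1104' into (season, team_a, team_b)."""
    season, team_a, team_b = (int(part) for part in matchup_id.split("_"))
    return season, team_a, team_b


season, team_a, team_b = parse_matchup_id("2025_1101_1104")
print(season, team_a, team_b)  # 2025 1101 1104
```

The prediction submitted for this `ID` is then the probability that the first team listed, `1101`, wins.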


## Overview of our submission strategy 
For this starter notebook, we will make a simple submission.

We can predict the winner of a match by considering only the seeds of the two opposing teams. Since the largest possible seed difference is 15 (seed #16 minus seed #1), we use a rudimentary formula: 0.5 plus 0.03 times the difference in seeds, which gives predictions ranging from 5% to 95%. The stronger-seeded team (the one with the lower seed number, from 1 to 16) is the favorite, with a predicted win probability above 50%.
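As a quick worked example of this formula, the sketch below computes the prediction for a few seed pairings. The function name `seed_based_probability` is an illustrative assumption. The seed difference is taken as the seed of the larger-`TeamID` team minus the seed of the smaller-`TeamID` team, so the result is the win probability of the smaller-`TeamID` team.

```python
def seed_based_probability(seed_lower_id: int, seed_higher_id: int) -> float:
    """Win probability of the smaller-TeamID team, based on seeds alone."""
    # Positive when the smaller-TeamID team has the better (lower) seed
    seed_diff = seed_higher_id - seed_lower_id
    prob = 0.5 + 0.03 * seed_diff
    # Keep the prediction inside the 5% to 95% range
    return min(max(prob, 0.05), 0.95)


print(round(seed_based_probability(1, 16), 2))  # 0.95
print(round(seed_based_probability(16, 1), 2))  # 0.05
print(round(seed_based_probability(8, 9), 2))   # 0.53
print(round(seed_based_probability(5, 5), 2))   # 0.5
```

A #1 seed facing a #16 seed gets the maximum of 95%, while two teams with equal seeds get exactly 50%.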

## Starter Code

In [1]:
import pandas as pd
import numpy as np

# ======================================================================
# 1. Load Available Competition Data
# ======================================================================
try:
    # Load seed files
    m_seed = pd.read_csv('/kaggle/input/march-machine-learning-mania-2025/MNCAATourneySeeds.csv')
    w_seed = pd.read_csv('/kaggle/input/march-machine-learning-mania-2025/WNCAATourneySeeds.csv')
    submission = pd.read_csv('/kaggle/input/march-machine-learning-mania-2025/SampleSubmissionStage1.csv')
except FileNotFoundError as e:
    print(f"Error loading file: {e}")
    raise  # Re-raise so the notebook stops here instead of continuing without data

# ======================================================================
# 2. Seed Processing with Robust Error Handling
# ======================================================================
def safe_parse_seed(seed):
    """Parse the numeric seed (1-16) from strings such as "W01" or "X16a"."""
    try:
        # Characters 1-2 hold the zero-padded seed number; the leading region
        # letter and any trailing play-in letter ("a"/"b") are ignored
        return int(str(seed).strip()[1:3])
    except ValueError:
        return 16  # Default to worst seed when the value is missing or malformed

# Process seed data
seed_df = pd.concat([m_seed, w_seed])
seed_df['SeedValue'] = seed_df['Seed'].apply(safe_parse_seed)
seed_map = seed_df.set_index(['Season', 'TeamID'])['SeedValue']

# ======================================================================
# 3. Submission Processing
# ======================================================================
def extract_teams(id_str):
    parts = id_str.split('_')
    return int(parts[0]), sorted([int(parts[1]), int(parts[2])])

# Add season and teams to submission
submission[['Season', 'Teams']] = submission['ID'].apply(
    lambda x: pd.Series(extract_teams(x))
)
submission[['Team1', 'Team2']] = pd.DataFrame(submission['Teams'].tolist())

# ======================================================================
# 4. Feature Engineering
# ======================================================================
def get_seed(season, team):
    try:
        return seed_map.at[(season, team)]
    except KeyError:
        return 16

# Add seed information
submission['Seed1'] = submission.apply(lambda x: get_seed(x['Season'], x['Team1']), axis=1)
submission['Seed2'] = submission.apply(lambda x: get_seed(x['Season'], x['Team2']), axis=1)

# Calculate seed difference
submission['SeedDiff'] = submission['Seed2'] - submission['Seed1']

# ======================================================================
# 5. Generate Predictions
# ======================================================================
submission['Pred'] = (0.5 + 0.03 * submission['SeedDiff']).clip(0.05, 0.95)
submission = submission[['ID', 'Pred']]

# ======================================================================
# 6. Validate and Save
# ======================================================================
print("Prediction Summary:")
print(f"Mean Probability: {submission['Pred'].mean():.4f}")
print(f"Minimum Probability: {submission['Pred'].min():.2f}")
print(f"Maximum Probability: {submission['Pred'].max():.2f}")

submission.to_csv('submission.csv', index=False)
print("\nSubmission file created successfully!")

Prediction Summary:
Mean Probability: 0.5025
Minimum Probability: 0.05
Maximum Probability: 0.95

Submission file created successfully!
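
The row-wise `apply` calls in the starter code are easy to read but can be slow on a large submission file. As a possible alternative, here is a sketch that performs the same steps with vectorized string operations and a left join. It runs on small made-up tables rather than the competition files, and the variable names `seeds` and `sub` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-ins for the competition files (values are made up for illustration)
seeds = pd.DataFrame({
    "Season": [2025, 2025],
    "TeamID": [1101, 1104],
    "Seed": ["W16", "X01"],
})
sub = pd.DataFrame({"ID": ["2025_1101_1104", "2025_1101_1199"]})

# Parse the seed number and the ID components with vectorized string operations
seeds["SeedValue"] = seeds["Seed"].str[1:3].astype(int)
parts = sub["ID"].str.split("_", expand=True).astype(int)
sub["Season"], sub["Team1"], sub["Team2"] = parts[0], parts[1], parts[2]

# Left-join each team's seed; teams without a seed fall back to 16, as in the notebook
for side in ("1", "2"):
    lookup = seeds.rename(columns={"TeamID": f"Team{side}", "SeedValue": f"Seed{side}"})
    sub = sub.merge(
        lookup[["Season", f"Team{side}", f"Seed{side}"]],
        on=["Season", f"Team{side}"],
        how="left",
    )
    sub[f"Seed{side}"] = sub[f"Seed{side}"].fillna(16).astype(int)

# Same formula as the starter code: probability that Team1 (smaller TeamID) wins
sub["Pred"] = (0.5 + 0.03 * (sub["Seed2"] - sub["Seed1"])).clip(0.05, 0.95)
print(sub[["ID", "Pred"]])
```

In this toy data, team `1101` is a #16 seed facing the #1 seed `1104`, so the first row is clipped to 0.05. Team `1199` has no seed, defaults to 16, and the second row comes out at 0.5.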
