The NCAA March Madness tournament is notoriously hard to predict because underdog teams often beat the favorites. However, statistics about each team in the tournament can be used to predict the winner of each game and thus fill out a bracket.
All data is contained in the MasterData.xlsx spreadsheet.
The GameData sheet contains the matchups from the 2016, 2017, and 2018 tournaments.
The sheets named with a specific season (e.g., 2018-2019) contain the statistics for each team in the NCAA for that season.
All data was taken from https://www.teamrankings.com/ncb/team-stats/
All data was taken from Kaggle using the dataset from that year.
To download the data via the Kaggle CLI:
kaggle competitions download march-machine-learning-mania-2024 -p data
unzip -d data/kaggle_data data/march-machine-learning-mania-2024.zip
Once the data is downloaded, you can run the feature extraction via:
python3 summarize_kaggle.py
- Set up the directory for the year
mkdir <year>
cp <prev-year>/Training.ipynb <year>
- Train a classifier via the new Training.ipynb we just created and save the model. Only scikit-learn classifiers are currently supported.
python3 main.py create --year <year> --predictor sklearn_seed --predictor-path <path-to-pkl>
- Inspect the bracket
python3 main.py visualize <bracket-path>
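
For the training step above, here is a minimal sketch of fitting a scikit-learn classifier and pickling it so the resulting file can be passed as `--predictor-path`. The feature layout that the `sklearn_seed` predictor expects is not documented here, so this assumes a single feature (the seed difference between the two teams). The helper names `train_seed_model` and `save_model` and the toy training rows are made up for illustration and are not part of this repo.

```python
import pickle
from pathlib import Path

import numpy as np
from sklearn.linear_model import LogisticRegression


def train_seed_model(seed_diffs, team_a_won):
    """Fit a logistic regression on one feature: seed_a - seed_b (assumed layout)."""
    X = np.asarray(seed_diffs, dtype=float).reshape(-1, 1)
    y = np.asarray(team_a_won, dtype=int)
    return LogisticRegression().fit(X, y)


def save_model(model, path):
    """Pickle the fitted model, creating the parent directory if needed."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as f:
        pickle.dump(model, f)
    return path


# Toy, invented rows: a negative difference means team A is the better seed.
seed_diffs = [-15, -13, -9, -5, -3, -1, 1, 3, 5, 9, 13, 15]
team_a_won = [1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0]

model = train_seed_model(seed_diffs, team_a_won)

# Estimated probability that team A wins a 1-vs-16 style matchup.
p_favorite = model.predict_proba(np.array([[-15.0]]))[0, 1]
print(f"P(team A wins | seed diff = -15) = {p_favorite:.2f}")
```

Calling `save_model(model, "<year>/model.pkl")` would then write the pickle (the file name is only an example). In practice the training rows would come from the output of summarize_kaggle.py rather than from hard-coded lists.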