Data Analysis and Model Building for 2020 Kaggle/Google Cloud NCAA March Madness Tournament.
Repository Contents:
Data wrangling and manipulation. Extracting data from .csv tables provided via Google Cloud and engineering team statistics (game-level, for both regular season and postseason) to prepare a feature set for modeling.
Data analysis for trends and patterns in predicting upsets. Designed a model to predict upsets (potentially used in the final voting ensemble model).
EDA and model building using only end of regular season data available from 2003 - 2019. Primarily used as a benchmark for final model.
Model building using only team statistics from regular season games to predict outcomes. Level 0 model.
Model building using only rankings and ratings from regular season to predict outcomes. Level 0 model.
Model using both regular season statistics and rankings. Benchmark for ensemble method.
NOT COMPLETE: evaluating the effectiveness of models and experimenting with model combinations.
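
Illustrative sketch (not code from this repository): one way the two Level 0 models listed above could be combined by soft voting and compared with log loss. The function names, the logistic form of each model, the scale values, and the sample matchups are all assumptions made up for this example. The actual models and features in the notebooks may differ.

```python
import math


def logistic(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def stats_model(margin_diff, scale=0.15):
    """Hypothetical Level 0 model: regular season statistics only.

    margin_diff is team A's average scoring margin minus team B's.
    The scale value is an arbitrary placeholder, not a fitted parameter.
    """
    return logistic(scale * margin_diff)


def ratings_model(rating_diff, scale=0.10):
    """Hypothetical Level 0 model: rankings and ratings only.

    rating_diff is team A's rating minus team B's.
    The scale value is an arbitrary placeholder, not a fitted parameter.
    """
    return logistic(scale * rating_diff)


def soft_vote(probs, weights=None):
    """Weighted average of the probabilities from several models."""
    if weights is None:
        weights = [1.0] * len(probs)
    return sum(w * p for w, p in zip(weights, probs)) / sum(weights)


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean negative log-likelihood of binary outcomes (lower is better)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# Made-up matchups: (margin_diff, rating_diff, outcome), outcome 1 = team A won.
matchups = [(8.0, 12.0, 1), (-3.0, 2.0, 0), (1.5, -4.0, 1), (-6.0, -9.0, 0)]

outcomes = [y for _, _, y in matchups]
p_stats = [stats_model(m) for m, _, _ in matchups]
p_ratings = [ratings_model(r) for _, r, _ in matchups]
p_vote = [soft_vote([a, b]) for a, b in zip(p_stats, p_ratings)]

for name, probs in [("stats", p_stats), ("ratings", p_ratings), ("vote", p_vote)]:
    print(f"{name:8s} log loss: {log_loss(outcomes, probs):.4f}")
```

Equal weights are used here only for simplicity. Weights for a voting ensemble would normally be chosen on held-out seasons rather than on the games used to fit the Level 0 models.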