
01_SportsPrediction

Intro to AI mid-sem project.

Team Members

  • John Anatsui Edem.
  • David Saah.

Overview of Problem

Sports prediction must account for a large number of factors, including the teams' historical performance, match results, and player data, to help stakeholders understand the odds of winning or losing.

In this project, we are tasked with building a model that predicts a player's overall rating given the player's profile.

Milestones: ML Life Cycle

Data preparation & feature extraction

  • Data collection and labelling.
    • Acquire data.
  • Data cleaning.
    • Imputing missing values.
  • Data processing.
    • Feature selection.
    • Feature subsetting.
    • Normalising data.
    • Scaling data.
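
The imputation and scaling steps above can be sketched with a scikit-learn pipeline. The columns and values here are toy stand-ins for a slice of players_21.csv, not the project's actual code:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy frame standing in for a slice of players_21.csv (values are made up).
df = pd.DataFrame({
    "potential": [85, np.nan, 78, 90],
    "wage_eur": [100000.0, 25000.0, np.nan, 560000.0],
})

# Impute missing values with the median, then scale to zero mean / unit variance.
prep = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
X = prep.fit_transform(df)
```

Wrapping both steps in one Pipeline keeps the transforms fitted on training data only, so the same object can later be applied to players_22.csv without leakage.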

Model engineering

  • Get training & testing data.
  • Train the model with cross-validation.
  • Test the accuracy of the model.
  • Fine tune model (optimisation).
  • Use different models.
    • Train 3 models.
  • Perform ensembling.
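
A minimal version of the train / cross-validate / test loop above, using synthetic data in place of the FIFA dataset (any of the three regressors could be dropped in for the Random Forest shown here):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic regression data standing in for the player features / overall rating.
X, y = make_regression(n_samples=200, n_features=9, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestRegressor(n_estimators=50, random_state=0)

# 5-fold cross-validation on the training split.
cv_scores = cross_val_score(model, X_train, y_train, cv=5, scoring="r2")

# Final fit and held-out evaluation.
model.fit(X_train, y_train)
test_r2 = r2_score(y_test, model.predict(X_test))
```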

Directory Structure

  • app: Source code for model deployment.
  • data: Datasets.
    • players_21.csv -> training data.
    • players_22.csv -> testing data.
  • demo: Demo video.
  • models: Saved models.
  • src: Source code for model training (.py and .ipynb files).

Chosen Features

  1. potential
  2. wage_eur
  3. passing
  4. dribbling
  5. attacking_short_passing
  6. movement_reactions
  7. power_shot_power
  8. mentality_vision
  9. mentality_composure
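
Subsetting the dataset to these columns is a one-liner with pandas. In this sketch the read_csv call is commented out and a tiny stand-in frame keeps it self-contained; "overall" is the assumed name of the target rating column:

```python
import pandas as pd

FEATURES = [
    "potential", "wage_eur", "passing", "dribbling",
    "attacking_short_passing", "movement_reactions",
    "power_shot_power", "mentality_vision", "mentality_composure",
]
TARGET = "overall"  # assumed name of the rating column

# df = pd.read_csv("data/players_21.csv")  # real data
df = pd.DataFrame({col: [1.0, 2.0] for col in FEATURES + [TARGET]})  # stand-in

X = df[FEATURES]
y = df[TARGET]
```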

Model(s) Used

  • XGBoost Regressor
  • Random Forest Regressor
  • AdaBoost Regressor
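
The three regressors can be combined with scikit-learn's VotingRegressor, which averages their predictions. Since xgboost may not be installed everywhere, GradientBoostingRegressor stands in for XGBRegressor in this sketch, and the data is synthetic:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (
    AdaBoostRegressor,
    GradientBoostingRegressor,  # stand-in for xgboost.XGBRegressor
    RandomForestRegressor,
    VotingRegressor,
)

X, y = make_regression(n_samples=150, n_features=9, random_state=0)

# Average the predictions of the three regressors.
ensemble = VotingRegressor([
    ("xgb", GradientBoostingRegressor(random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=30, random_state=0)),
    ("ada", AdaBoostRegressor(random_state=0)),
])
ensemble.fit(X, y)
preds = ensemble.predict(X)
```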

Reasons for choosing the XGBoost Regressor

  • The Random Forest model is much larger than XGBoost and AdaBoost.
  • XGBoost and AdaBoost have similar performance, but XGBoost performs better.
    • The R-squared score for XGBoost is 0.94, while that of AdaBoost is 0.86.
  • XGBoost is therefore the best model for this dataset.
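
The size gap can be checked by pickling trained models and comparing byte counts; this is a sketch on synthetic data (the real comparison would use the models saved under models/):

```python
import pickle

from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=9, random_state=0)

# Random Forest grows 100 unpruned trees; AdaBoost's default base learner
# is a depth-3 tree, so its serialized form is far smaller.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
ada = AdaBoostRegressor(n_estimators=100, random_state=0).fit(X, y)

rf_bytes = len(pickle.dumps(rf))
ada_bytes = len(pickle.dumps(ada))
```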

Deployment

Video Demo

YouTube link: https://youtu.be/mU940v4Ysko

demo.mp4