- Go to https://ufc-predictions.herokuapp.com/
- Select the weight class of the bout
- Select the number of 5-minute rounds the fight is scheduled for
- Select whether the fight is a title fight or not
- Select the fighter names
- Click predict
- Scraped event and fight stats data from 1993 to the present using Beautiful Soup.
- Cleaned, preprocessed and feature-engineered the data so that each row is a historical representation of both fighters and their individual fight stats.
- Dataset uploaded and now available on Kaggle at: https://www.kaggle.com/rajeevw/ufcdata
- Oversampled minority class, created and tested predictive models using
- Created a web app using Dash and deployed it with Docker on Heroku.
- Accuracy (valid): 0.7218
- AUC Score (valid): 0.7763
0 corresponds to Blue: Fighter in the blue corner
1 corresponds to Red: Fighter in the red corner
Generally the underdog is in the blue corner and the favourite is in the red corner.
The model is therefore (understandably) having a hard time figuring out when the underdog wins. The sport is very volatile: anything from an injury or a psychological loss/trauma to just pure luck can determine the winner.
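As a rough illustration of the oversampling step and the 0/1 encoding described above, here is a minimal sketch of random oversampling in plain Python. It is an assumption that plain random duplication was used and that Blue wins are the minority class; the function name `oversample_minority` and the feature name `R_minus_B_wins` are hypothetical and only serve the example.

```python
import random
from collections import Counter

def oversample_minority(rows, label_key="Winner", seed=0):
    """Duplicate randomly chosen rows of the smaller classes until
    every class is as large as the biggest one."""
    rng = random.Random(seed)
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the majority size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Toy training rows: 1 = Red wins, 0 = Blue wins (hypothetical feature name).
train = [{"R_minus_B_wins": i, "Winner": 1} for i in range(6)]
train += [{"R_minus_B_wins": -i, "Winner": 0} for i in range(1, 3)]

balanced = oversample_minority(train)
print(Counter(row["Winner"] for row in balanced))
```

If you try something like this, oversample only the training split, so that duplicated rows cannot show up in the validation data and inflate the scores.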
Details about the data
This is a list of every UFC fight in the history of the organisation. Every row contains information about both fighters, fight details and the winner. The data was scraped from the ufcstats website, which came into the picture after fightmetric ceased to exist. I saw that there was a lot of information on the website about every fight and every event, and there were no existing ways of capturing all this. I used Beautiful Soup to scrape the data and pandas to process it. It was a long and arduous process, so please forgive any mistakes. I have provided the raw files in case anybody wants to process them differently. This is my first time creating a dataset; any suggestions and corrections are welcome!
Each row is a compilation of both fighters' stats. Fighters are represented by 'red' and 'blue' (for red and blue corner). So, for instance, the red fighter has the compiled average stats of all of their fights except the current one. The stats include damage done by the red fighter on the opponent and the damage done by the opponent on the fighter (represented by 'opp' in the columns) in all the fights this particular red fighter has had, except this one, as it has not occurred yet (in the data). The same information exists for the blue fighter. The target variable is 'Winner', which is the only column that tells you what happened. Here are some column definitions:
R_ and B_ prefix signifies red and blue corner fighter stats respectively
_opp_ containing columns are the average of damage done by the opponent on the fighter
KD is number of knockdowns
SIG_STR is no. of significant strikes 'landed of attempted'
SIG_STR_pct is significant strikes percentage
TOTAL_STR is total strikes 'landed of attempted'
TD is no. of takedowns
TD_pct is takedown percentage
SUB_ATT is no. of submission attempts
PASS is no. of times the guard was passed?
REV is the number of reversals
HEAD is no. of significant strikes to the head 'landed of attempted'
BODY is no. of significant strikes to the body 'landed of attempted'
CLINCH is no. of significant strikes in the clinch 'landed of attempted'
GROUND is no. of significant strikes on the ground 'landed of attempted'
win_by is method of win
last_round is last round of the fight (ex. if it was a KO in 1st, then this will be 1)
last_round_time is when the fight ended in the last round
Format is the format of the fight (3 rounds, 5 rounds etc.)
Referee is the name of the Ref
date is the date of the fight
location is the location in which the event took place
Fight_type is which weight class and whether it's a title bout or not
Winner is the winner of the fight
Stance is the stance of the fighter (orthodox, southpaw, etc.)
Height_cms is the height in centimeters
Reach_cms is the reach of the fighter (arm span) in centimeters
Weight_lbs is the weight of the fighter in pounds (lbs)
age is the age of the fighter
title_bout is a Boolean value of whether it is a title fight or not
weight_class is which weight class the fight is in (Bantamweight, heavyweight, Women's flyweight, etc.)
no_of_rounds is the number of rounds the fight was scheduled for
current_lose_streak is the count of current consecutive losses of the fighter
current_win_streak is the count of current consecutive wins of the fighter
draw is the number of draws in the fighter's UFC career
wins is the number of wins in the fighter's UFC career
losses is the number of losses in the fighter's UFC career
total_rounds_fought is the average of total rounds fought by the fighter
total_time_fought(seconds) is the count of total time spent fighting in seconds
total_title_bouts is the total number of title bouts taken part in by the fighter
win_by_Decision_Majority is the number of wins by majority judges decision in the fighter's UFC career
win_by_Decision_Split is the number of wins by split judges decision in the fighter's UFC career
win_by_Decision_Unanimous is the number of wins by unanimous judges decision in the fighter's UFC career
win_by_KO/TKO is the number of wins by knockout in the fighter's UFC career
win_by_Submission is the number of wins by submission in the fighter's UFC career
win_by_TKO_Doctor_Stoppage is the number of wins by doctor stoppage in the fighter's UFC career
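To make the "average of all the fights except the current one" idea concrete, here is a small sketch in plain Python. It is not the code from the notebooks; the function name `add_prior_averages`, the toy fighters, and the output column names such as `R_avg_KD` are assumptions made up for the example.

```python
from collections import defaultdict

def add_prior_averages(fights, stat="KD"):
    """For each fight (in date order), attach each fighter's mean `stat`
    computed over their earlier fights only."""
    history = defaultdict(list)  # fighter name -> stat values from past fights
    rows = []
    for fight in sorted(fights, key=lambda f: f["date"]):
        row = {"date": fight["date"], "Winner": fight["Winner"]}
        for corner in ("R", "B"):
            past = history[fight[f"{corner}_fighter"]]
            # None marks a debut: there is no earlier fight to average over.
            row[f"{corner}_avg_{stat}"] = sum(past) / len(past) if past else None
        rows.append(row)
        # Record this fight's stats only now, so they never leak into its own row.
        for corner in ("R", "B"):
            history[fight[f"{corner}_fighter"]].append(fight[f"{corner}_{stat}"])
    return rows

# Toy per-fight data (made up): knockdowns scored by each corner.
fights = [
    {"date": "2020-01-01", "R_fighter": "A", "B_fighter": "B", "R_KD": 2, "B_KD": 0, "Winner": "Red"},
    {"date": "2020-06-01", "R_fighter": "A", "B_fighter": "C", "R_KD": 0, "B_KD": 1, "Winner": "Blue"},
    {"date": "2021-01-01", "R_fighter": "C", "B_fighter": "A", "R_KD": 1, "B_KD": 1, "Winner": "Red"},
]

for row in add_prior_averages(fights):
    print(row)
```

The important detail is the order of operations: the averages are read before the current fight is appended to the history, so 'Winner' stays the only column that knows what happened in that fight.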
How to use from Scratch?
- Clear out the data folder and simply run scrape_all_data.py (Note: This will scrape everything from the beginning and hence will take a long time.), then EDA_and_preprocessing-1.ipynb and after that the second preprocessing notebook (EDA_and_preprocessing-2a.ipynb is an alternative where the rows with missing stat values are removed and not treated.)
- Try a weighted moving average instead of a simple mean to give more importance to the stats of each fighter's recent fights
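One possible shape for the weighted moving average idea above is an exponentially decaying weight, sketched below in plain Python. This is only an illustration of the idea, not something the project implements; the function name `weighted_prior_average`, the `decay` parameter and the sample numbers are all assumptions.

```python
def weighted_prior_average(values, decay=0.5):
    """Weighted mean of past values (oldest first). The most recent value
    gets weight 1, the one before it `decay`, then `decay**2`, and so on."""
    if not values:
        return None  # debut: nothing to average
    n = len(values)
    weights = [decay ** (n - 1 - i) for i in range(n)]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

# Made-up significant strike counts from a fighter's three previous fights.
past_sig_strikes = [10, 20, 40]

simple = sum(past_sig_strikes) / len(past_sig_strikes)
weighted = weighted_prior_average(past_sig_strikes, decay=0.5)
print(f"simple mean: {simple:.2f}, weighted mean: {weighted:.2f}")
```

With `decay=1.0` this falls back to the simple mean, so the decay value can be treated as one more thing to tune on the validation set.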
Inspiration: https://github.com/Hitkul/UFC_Fight_Prediction Provided ideas on how to store per-fight data. Unfortunately, the entire UFC website and the fightmetric website changed, so I couldn't reuse any of the code.
Print Progress Bar: https://gist.github.com/aubricus/f91fb55dc6ba5557fbab06119420dd6a Used to display how much of the download is complete in the terminal.
Web app: https://github.com/jasonchanhku/ Ideas on how to use Dash and the Google search API to show fighter images.