# UFC Fight Card Predictor

## About this Project:

### Project Goal: 

The goal of this project is to be able to read in an upcoming fight card and predict the outcome of each fight. 


For this particular project we will be looking at UFC 272 PPV - Colby Covington vs. Jorge Masvidal.

### Project Description: 

## Let's Get Started... 

### Imports

The following imports are needed to run this notebook:

In [1]:
import numpy as np
import pandas as pd

# Visualizing
import matplotlib.pyplot as plt
import seaborn as sns

# default pandas decimal number display format
pd.options.display.float_format = '{:.2f}'.format

# Split 
from sklearn.model_selection import train_test_split

# Scale
from sklearn.preprocessing import MinMaxScaler

# Stats
import scipy.stats as stats

# Ignore Warnings
import warnings
warnings.filterwarnings("ignore")

### Acquire

The first thing we have to do is acquire the data. I will be using a dataset called ufc-master.csv that I found here: 

https://www.kaggle.com/bloodprashure/ufc-p4p-1-dataset/tasks?taskId=4669

Before reading this csv, I cleaned the column names to make them easier to work with in Python.
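The column-name cleaning itself is not shown in this notebook. A minimal sketch of one way it could be done, assuming the raw headers use mixed case, spaces, and punctuation (the helper name `clean_column_names` and the sample headers are hypothetical, not taken from the Kaggle file):

```python
import re

import pandas as pd


def clean_column_names(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with lowercase snake_case column names (hypothetical helper)."""
    def to_snake(name: str) -> str:
        name = name.strip().lower()
        # collapse any run of non-alphanumeric characters into one underscore
        name = re.sub(r"[^0-9a-z]+", "_", name)
        return name.strip("_")

    return df.rename(columns=to_snake)


# toy frame with made-up raw headers, not the real Kaggle column names
raw = pd.DataFrame(columns=["Event Name", "Fighter Two-Name", " Total Rounds "])
cleaned = clean_column_names(raw)
print(list(cleaned.columns))
```

The cleaned frame could then be written out with `to_csv` under the name that the next cell reads.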

In [2]:
# read csv locally 
ufc = pd.read_csv('ufc-master-cleaned.csv')

In [3]:
# verify data was properly acquired
ufc.head(1)

Unnamed: 0.1,Unnamed: 0,date,event_code,event_name,fullname,fighter_two_name,w,l,d,nc,...,round_five_takedown_percent,round_five_submission_attempt,round_five_submission_reverse,round_five_control_time,round_five_head,round_five_body,round_five_leg,round_five_distance,round_five_clinch,round_five_ground
0,0,2021-03-06 0:00:00,6e2b1d631832921d,UFC 259: Blachowicz vs. Adesanya,Aalon Cruz,Uros Medic,0,1,0,0,...,0,0,0,,,,,,,


In [4]:
# check row/column shape
ufc.shape

(12156, 171)

### The Plan

This df has 171 columns. 

Many of these columns hold stats recorded during that specific event. Since I will not have those stats before a fight happens, I won't use them as features. 

I will first create a df of attributes that are known before the fight, such as height, weight, and reach. 

I'll clean and prepare the data. 

After that I can do some exploring and create an MVP. 

Then I'll go back to the original df and do more exploration, with the goal of creating new features by turning each fighter's stats from past fights into averages for that fighter. 

I'll redo the models with the new features. 
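The per-fighter averages mentioned in the plan need to use only fights that happened before the one being predicted. A minimal sketch of that idea on a toy fight log, assuming one row per fighter per fight with a date (the column name `sig_strikes` and all values are illustrative, not taken from ufc-master.csv):

```python
import pandas as pd

# toy fight log; column names and values are illustrative stand-ins
fights = pd.DataFrame({
    "fighter": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(
        ["2019-01-01", "2020-01-01", "2021-01-01", "2020-06-01", "2021-06-01"]
    ),
    "sig_strikes": [40, 60, 80, 30, 50],
})

# order each fighter's fights chronologically
fights = fights.sort_values(["fighter", "date"]).reset_index(drop=True)

# shift(1) excludes the current fight, so each row only sees earlier fights
fights["avg_prior_sig_strikes"] = (
    fights.groupby("fighter")["sig_strikes"]
    .transform(lambda s: s.shift(1).expanding().mean())
)
print(fights)
```

A fighter's first recorded fight has no history, so its average is NaN and would need to be filled or dropped before modeling.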

### Prepare

In [5]:
# add the columns I would like to work with into a new df
ufc_cleaned = ufc[['event_name', 'fullname', 'fighter_two_name', 'w', 'l', 'd', 'nc', 'total_rounds', 'belt', 'womens_bout', 'interim_bout', 'strawweight', 'flyweight', 'bantamweight', 'featherweight', 'lightweight', 'middleweight', 'light_heavyweight', 'heavyweight', 'catch_weight', 'open_weight', 'super_heavyweight', 'superfight', 'fight_city', 'fight_state', 'fight_country', 'height', 'weight', 'reach', 'stance', 'slpm', 'stracc', 'sapm', 'strdef', 'tdavg', 'tdacc', 'tddef', 'subavg', 'age_days', 'age']].copy()

In [6]:
ufc_cleaned.head()

Unnamed: 0,event_name,fullname,fighter_two_name,w,l,d,nc,total_rounds,belt,womens_bout,...,slpm,stracc,sapm,strdef,tdavg,tdacc,tddef,subavg,age_days,age
0,UFC 259: Blachowicz vs. Adesanya,Aalon Cruz,Uros Medic,0,1,0,0,3,0,0,...,7.58,39,8.88,58,0.0,0,0,0.0,11490,31.0
1,UFC Fight Night: Benavidez vs. Figueiredo,Aalon Cruz,Spike Carlyle,0,1,0,0,3,0,0,...,7.58,39,8.88,58,0.0,0,0,0.0,11119,30.0
2,UFC 28: High Stakes,Aaron Brink,Andrei Arlovski,0,1,0,0,3,0,0,...,3.49,42,5.71,57,0.0,0,0,0.0,9502,26.0
3,UFC Fight Night: Henderson vs Dos Anjos,Aaron Phillips,Matt Hobar,0,1,0,0,3,0,0,...,1.65,56,3.44,39,0.0,0,44,0.4,9149,25.0
4,UFC Fight Night: Kattar vs. Ige,Aaron Phillips,Jack Shore,0,1,0,0,3,0,0,...,1.65,56,3.44,39,0.0,0,44,0.4,11302,30.0


In [7]:
ufc_cleaned.shape

(12156, 40)

In [8]:
# Drop duplicates if any

print(ufc_cleaned.shape)
ufc_cleaned = ufc_cleaned.drop_duplicates()
print(ufc_cleaned.shape)

(12156, 40)
(12156, 40)


There are no duplicates.
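`drop_duplicates()` returns a new frame rather than modifying the original, so the shape comparison is only meaningful on the returned frame. A small sketch on toy data (the values are made up) that counts duplicates directly with `duplicated()`:

```python
import pandas as pd

# toy frame with one fully duplicated row
toy = pd.DataFrame({"fighter": ["A", "A", "B"], "opponent": ["X", "X", "Y"]})

# duplicated() flags every repeat of an earlier row
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() returns a new frame; toy itself is unchanged
deduped = toy.drop_duplicates()

print(n_dupes, toy.shape, deduped.shape)
```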

In [9]:
# combine w, l, d, nc into one target column called outcome

In [10]:
cols = ['w', 'l', 'd', 'nc']
ufc_cleaned['outcome'] = ufc_cleaned[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)

In [11]:
ufc_cleaned.sample(1)

Unnamed: 0,event_name,fullname,fighter_two_name,w,l,d,nc,total_rounds,belt,womens_bout,...,stracc,sapm,strdef,tdavg,tdacc,tddef,subavg,age_days,age,outcome
8808,UFC Fight Night: Covington vs. Woodley,Niko Price,Donald Cerrone,0,0,0,1,3,0,0,...,41,5.83,49,0.89,22,72,0.9,11313,30.0,0_0_0_1


In [12]:
# rename the labels in outcome to be human readable 

In [13]:
ufc_cleaned['outcome'] = ufc_cleaned['outcome'].replace({'1_0_0_0': 'win', '0_1_0_0': 'loss', '0_0_1_0': 'draw', '0_0_0_1': 'no_contest'})

In [14]:
ufc_cleaned.head()

Unnamed: 0,event_name,fullname,fighter_two_name,w,l,d,nc,total_rounds,belt,womens_bout,...,stracc,sapm,strdef,tdavg,tdacc,tddef,subavg,age_days,age,outcome
0,UFC 259: Blachowicz vs. Adesanya,Aalon Cruz,Uros Medic,0,1,0,0,3,0,0,...,39,8.88,58,0.0,0,0,0.0,11490,31.0,loss
1,UFC Fight Night: Benavidez vs. Figueiredo,Aalon Cruz,Spike Carlyle,0,1,0,0,3,0,0,...,39,8.88,58,0.0,0,0,0.0,11119,30.0,loss
2,UFC 28: High Stakes,Aaron Brink,Andrei Arlovski,0,1,0,0,3,0,0,...,42,5.71,57,0.0,0,0,0.0,9502,26.0,loss
3,UFC Fight Night: Henderson vs Dos Anjos,Aaron Phillips,Matt Hobar,0,1,0,0,3,0,0,...,56,3.44,39,0.0,0,44,0.4,9149,25.0,loss
4,UFC Fight Night: Kattar vs. Ige,Aaron Phillips,Jack Shore,0,1,0,0,3,0,0,...,56,3.44,39,0.0,0,44,0.4,11302,30.0,loss
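The outcome column above is built by joining the four flags into a string and then relabeling it. A sketch of a vectorized alternative, assuming exactly one of w, l, d, nc is 1 in every row (a row with no flag set would be labeled win by `idxmax`, so that case would need a separate check):

```python
import pandas as pd

# toy rows with the one-hot result flags used above
toy = pd.DataFrame({
    "w":  [1, 0, 0, 0],
    "l":  [0, 1, 0, 0],
    "d":  [0, 0, 1, 0],
    "nc": [0, 0, 0, 1],
})

labels = {"w": "win", "l": "loss", "d": "draw", "nc": "no_contest"}

# idxmax(axis=1) returns the name of the column holding the 1 in each row
toy["outcome"] = toy[["w", "l", "d", "nc"]].idxmax(axis=1).map(labels)
print(toy["outcome"].tolist())
```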


In [15]:
# rename fullname and fighter_two_name to fighter and opponent respectively

In [16]:
ufc_cleaned.rename(columns={'fullname': 'fighter', 'fighter_two_name': 'opponent'}, inplace=True)

In [17]:
ufc_cleaned.head()

Unnamed: 0,event_name,fighter,opponent,w,l,d,nc,total_rounds,belt,womens_bout,...,stracc,sapm,strdef,tdavg,tdacc,tddef,subavg,age_days,age,outcome
0,UFC 259: Blachowicz vs. Adesanya,Aalon Cruz,Uros Medic,0,1,0,0,3,0,0,...,39,8.88,58,0.0,0,0,0.0,11490,31.0,loss
1,UFC Fight Night: Benavidez vs. Figueiredo,Aalon Cruz,Spike Carlyle,0,1,0,0,3,0,0,...,39,8.88,58,0.0,0,0,0.0,11119,30.0,loss
2,UFC 28: High Stakes,Aaron Brink,Andrei Arlovski,0,1,0,0,3,0,0,...,42,5.71,57,0.0,0,0,0.0,9502,26.0,loss
3,UFC Fight Night: Henderson vs Dos Anjos,Aaron Phillips,Matt Hobar,0,1,0,0,3,0,0,...,56,3.44,39,0.0,0,44,0.4,9149,25.0,loss
4,UFC Fight Night: Kattar vs. Ige,Aaron Phillips,Jack Shore,0,1,0,0,3,0,0,...,56,3.44,39,0.0,0,44,0.4,11302,30.0,loss


In [18]:
ufc_cleaned.dtypes

event_name            object
fighter               object
opponent              object
w                      int64
l                      int64
d                      int64
nc                     int64
total_rounds           int64
belt                   int64
womens_bout            int64
interim_bout           int64
strawweight            int64
flyweight              int64
bantamweight           int64
featherweight          int64
lightweight            int64
middleweight           int64
light_heavyweight      int64
heavyweight            int64
catch_weight           int64
open_weight            int64
super_heavyweight      int64
superfight             int64
fight_city            object
fight_state           object
fight_country         object
height                object
weight               float64
reach                float64
stance                object
slpm                 float64
stracc                 int64
sapm                 float64
strdef                 int64
tdavg         

Height is stored as a string (for example 6' 8"). I want to convert it to a number of inches to make it more workable. 

In [19]:
# check the min and max vals for height

In [20]:
ufc_cleaned.height.min()

'--'

In [21]:
ufc_cleaned.height.max()

'6\' 8"'

In [22]:
# replace the -- with 0' 0" so my function below will work

In [23]:
ufc_cleaned['height'] = ufc_cleaned['height'].replace({'--': "0' 0\""})

In [24]:
ufc_cleaned.height.min()

'0\' 0"'

In [28]:
# convert ft to in

In [36]:
def parse_ht(ht):
    # format: 7' 0.0"
    ht_ = ht.split("' ")
    ft_ = float(ht_[0])
    in_ = float(ht_[1].replace("\"",""))
    return (12*ft_) + in_



In [30]:
# apply parse_ht

In [31]:
ufc_cleaned["height_in"] = ufc_cleaned["height"].apply(parse_ht)

In [32]:
ufc_cleaned.head(1)

Unnamed: 0,event_name,fighter,opponent,w,l,d,nc,total_rounds,belt,womens_bout,...,sapm,strdef,tdavg,tdacc,tddef,subavg,age_days,age,outcome,height_in
0,UFC 259: Blachowicz vs. Adesanya,Aalon Cruz,Uros Medic,0,1,0,0,3,0,0,...,8.88,58,0.0,0,0,0.0,11490,31.0,loss,72.0


In [42]:
# convert float to int

In [40]:
ufc_cleaned['height_in'] = ufc_cleaned.height_in.astype(int) 

In [41]:
ufc_cleaned.height_in.head()

0    72
1    72
2    75
3    69
4    69
Name: height_in, dtype: int64
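The conversion above turns a missing height ('--') into 0 inches, which a model would read as a real measurement. A sketch of an alternative parser that returns NaN for anything that does not look like feet and inches, so those rows can be handled with the other nulls (the function name `parse_height_in` and the regex are assumptions, not part of the original notebook):

```python
import re

import numpy as np
import pandas as pd

# matches strings such as 6' 8" or 5' 11.5"
HEIGHT_RE = re.compile(r"^\s*(\d+)'\s*(\d+(?:\.\d+)?)\"\s*$")


def parse_height_in(ht):
    """Convert a feet/inches string to inches; return NaN when it does not match."""
    match = HEIGHT_RE.match(str(ht))
    if match is None:
        return np.nan
    feet, inches = match.groups()
    return 12 * float(feet) + float(inches)


heights = pd.Series(["6' 0\"", "5' 9\"", "--"])
height_in = heights.apply(parse_height_in)
print(height_in.tolist())
```

With this approach the column stays float, since NaN cannot be stored in a plain int64 column.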

In [55]:
# check for nulls

In [56]:
ufc_cleaned.isnull().sum()

event_name              0
fighter                 0
opponent                0
w                       0
l                       0
d                       0
nc                      0
total_rounds            0
belt                    0
womens_bout             0
interim_bout            0
strawweight             0
flyweight               0
bantamweight            0
featherweight           0
lightweight             0
middleweight            0
light_heavyweight       0
heavyweight             0
catch_weight            0
open_weight             0
super_heavyweight       0
superfight              0
fight_city              0
fight_state          1011
fight_country           0
height                  0
weight                 13
reach                1292
stance                 95
slpm                    0
stracc                  0
sapm                    0
strdef                  0
tdavg                   0
tdacc                   0
tddef                   0
subavg                  0
age_days    

In [57]:
# fill fight_state null values with the mode, which is Nevada

In [58]:
ufc_cleaned['fight_state'] = ufc_cleaned.fight_state.fillna(ufc_cleaned.fight_state.mode()[0])
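A small sketch of what the mode fill does, next to an alternative that keeps missing states distinguishable. This assumes the missing values might be fights where no state applies, which is not checked in this notebook; the toy values and the 'Unknown' label are made up:

```python
import pandas as pd

states = pd.Series(["Nevada", "Nevada", "Texas", None, None], name="fight_state")

# mode() returns a Series of the most frequent values; [0] takes the first
most_common = states.mode()[0]
filled_with_mode = states.fillna(most_common)

# alternative: mark missing states with an explicit label instead
filled_with_label = states.fillna("Unknown")

print(most_common)
print(filled_with_mode.value_counts().to_dict())
print(filled_with_label.value_counts().to_dict())
```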

In [59]:
ufc_cleaned.isnull().sum()

event_name              0
fighter                 0
opponent                0
w                       0
l                       0
d                       0
nc                      0
total_rounds            0
belt                    0
womens_bout             0
interim_bout            0
strawweight             0
flyweight               0
bantamweight            0
featherweight           0
lightweight             0
middleweight            0
light_heavyweight       0
heavyweight             0
catch_weight            0
open_weight             0
super_heavyweight       0
superfight              0
fight_city              0
fight_state             0
fight_country           0
height                  0
weight                 13
reach                1292
stance                 95
slpm                    0
stracc                  0
sapm                    0
strdef                  0
tdavg                   0
tdacc                   0
tddef                   0
subavg                  0
age_days    

In [60]:
# drop null rows in weight column

In [61]:
ufc_cleaned = ufc_cleaned[ufc_cleaned.weight.notnull()]

In [62]:
ufc_cleaned.isnull().sum()

event_name              0
fighter                 0
opponent                0
w                       0
l                       0
d                       0
nc                      0
total_rounds            0
belt                    0
womens_bout             0
interim_bout            0
strawweight             0
flyweight               0
bantamweight            0
featherweight           0
lightweight             0
middleweight            0
light_heavyweight       0
heavyweight             0
catch_weight            0
open_weight             0
super_heavyweight       0
superfight              0
fight_city              0
fight_state             0
fight_country           0
height                  0
weight                  0
reach                1279
stance                 84
slpm                    0
stracc                  0
sapm                    0
strdef                  0
tdavg                   0
tdacc                   0
tddef                   0
subavg                  0
age_days    

In [63]:
# drop null rows in reach column

In [64]:
ufc_cleaned = ufc_cleaned[ufc_cleaned.reach.notnull()]

In [65]:
ufc_cleaned.isnull().sum()

event_name            0
fighter               0
opponent              0
w                     0
l                     0
d                     0
nc                    0
total_rounds          0
belt                  0
womens_bout           0
interim_bout          0
strawweight           0
flyweight             0
bantamweight          0
featherweight         0
lightweight           0
middleweight          0
light_heavyweight     0
heavyweight           0
catch_weight          0
open_weight           0
super_heavyweight     0
superfight            0
fight_city            0
fight_state           0
fight_country         0
height                0
weight                0
reach                 0
stance               12
slpm                  0
stracc                0
sapm                  0
strdef                0
tdavg                 0
tdacc                 0
tddef                 0
subavg                0
age_days              0
age                  11
outcome               0
height_in       

In [66]:
# drop null rows in stance column

In [67]:
ufc_cleaned = ufc_cleaned[ufc_cleaned.stance.notnull()]

In [68]:
# drop null rows in age column

In [69]:
ufc_cleaned = ufc_cleaned[ufc_cleaned.age.notnull()]

In [70]:
ufc_cleaned.isnull().sum()

event_name           0
fighter              0
opponent             0
w                    0
l                    0
d                    0
nc                   0
total_rounds         0
belt                 0
womens_bout          0
interim_bout         0
strawweight          0
flyweight            0
bantamweight         0
featherweight        0
lightweight          0
middleweight         0
light_heavyweight    0
heavyweight          0
catch_weight         0
open_weight          0
super_heavyweight    0
superfight           0
fight_city           0
fight_state          0
fight_country        0
height               0
weight               0
reach                0
stance               0
slpm                 0
stracc               0
sapm                 0
strdef               0
tdavg                0
tdacc                0
tddef                0
subavg               0
age_days             0
age                  0
outcome              0
height_in            0
stance_Orthodox      0
stance_Side

In [71]:
# drop null rows in fight_country column

In [72]:
ufc_cleaned = ufc_cleaned[ufc_cleaned.fight_country.notnull()]

In [73]:
ufc_cleaned.isnull().sum()

event_name           0
fighter              0
opponent             0
w                    0
l                    0
d                    0
nc                   0
total_rounds         0
belt                 0
womens_bout          0
interim_bout         0
strawweight          0
flyweight            0
bantamweight         0
featherweight        0
lightweight          0
middleweight         0
light_heavyweight    0
heavyweight          0
catch_weight         0
open_weight          0
super_heavyweight    0
superfight           0
fight_city           0
fight_state          0
fight_country        0
height               0
weight               0
reach                0
stance               0
slpm                 0
stracc               0
sapm                 0
strdef               0
tdavg                0
tdacc                0
tddef                0
subavg               0
age_days             0
age                  0
outcome              0
height_in            0
stance_Orthodox      0
stance_Side

In [79]:
# drop nulls to make sure none were missed
ufc_cleaned = ufc_cleaned.dropna()

In [80]:
ufc_cleaned.isnull().sum()

event_name           0
fighter              0
opponent             0
w                    0
l                    0
d                    0
nc                   0
total_rounds         0
belt                 0
womens_bout          0
interim_bout         0
strawweight          0
flyweight            0
bantamweight         0
featherweight        0
lightweight          0
middleweight         0
light_heavyweight    0
heavyweight          0
catch_weight         0
open_weight          0
super_heavyweight    0
superfight           0
fight_city           0
fight_state          0
fight_country        0
height               0
weight               0
reach                0
stance               0
slpm                 0
stracc               0
sapm                 0
strdef               0
tdavg                0
tdacc                0
tddef                0
subavg               0
age_days             0
age                  0
outcome              0
height_in            0
stance_Orthodox      0
stance_Side

No more null values.
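The column-by-column filters above can also be written as a single call. A minimal sketch on toy data, assuming the same four columns are the ones that must be complete (the values are made up):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "weight": [155.0, np.nan, 170.0, 185.0],
    "reach":  [70.0, 72.0, np.nan, 76.0],
    "stance": ["Orthodox", "Southpaw", "Switch", None],
    "age":    [30.0, 28.0, 31.0, 27.0],
})

required = ["weight", "reach", "stance", "age"]

# drop a row if any of the required columns is null
complete = toy.dropna(subset=required)

print(toy[required].isnull().sum().to_dict())
print(len(toy), "->", len(complete))
```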

In [74]:
# create dummy columns for stance column

In [75]:
# drop_first expects a boolean; True drops the first stance level to avoid a redundant column
dummy_df = pd.get_dummies(ufc_cleaned[['stance']], dummy_na=False, drop_first=True)
ufc_cleaned = pd.concat([ufc_cleaned, dummy_df], axis=1)

In [77]:
ufc_cleaned.head(1)

Unnamed: 0,event_name,fighter,opponent,w,l,d,nc,total_rounds,belt,womens_bout,...,age,outcome,height_in,stance_Orthodox,stance_Sideways,stance_Southpaw,stance_Switch,stance_Orthodox.1,stance_Southpaw.1,stance_Switch.1
0,UFC 259: Blachowicz vs. Adesanya,Aalon Cruz,Uros Medic,0,1,0,0,3,0,0,...,31.0,loss,72,0,0,0,1,0,0,1
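The repeated columns in the output above (stance_Orthodox.1 and so on) appear when the concat cell is run more than once. A sketch of a rerun-safe version on toy data (the helper name `add_stance_dummies` is hypothetical, and it keeps every stance level rather than dropping the first):

```python
import pandas as pd

toy = pd.DataFrame({"stance": ["Orthodox", "Southpaw", "Switch", "Orthodox"]})


def add_stance_dummies(df: pd.DataFrame) -> pd.DataFrame:
    """Append stance dummies, replacing any left over from an earlier run."""
    old = [c for c in df.columns if c.startswith("stance_")]
    dummies = pd.get_dummies(df["stance"], prefix="stance", dtype=int)
    return pd.concat([df.drop(columns=old), dummies], axis=1)


once = add_stance_dummies(toy)
twice = add_stance_dummies(once)  # running it again does not duplicate columns
print(list(twice.columns))
```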


In [81]:
# double check to make sure the data types are correct

In [78]:
ufc_cleaned.dtypes

event_name            object
fighter               object
opponent              object
w                      int64
l                      int64
d                      int64
nc                     int64
total_rounds           int64
belt                   int64
womens_bout            int64
interim_bout           int64
strawweight            int64
flyweight              int64
bantamweight           int64
featherweight          int64
lightweight            int64
middleweight           int64
light_heavyweight      int64
heavyweight            int64
catch_weight           int64
open_weight            int64
super_heavyweight      int64
superfight             int64
fight_city            object
fight_state           object
fight_country         object
height                object
weight               float64
reach                float64
stance                object
slpm                 float64
stracc                 int64
sapm                 float64
strdef                 int64
tdavg         

In [82]:
# Drop height column
cols_to_drop = ['height']
ufc_cleaned = ufc_cleaned.drop(columns=cols_to_drop)

Now that the data is prepped, I'll make a function so that the prep stage won't clutter my final report notebook. 

In [83]:
def get_n_prep_ufc():

    # imports
    import pandas as pd
    # Ignore Warnings
    import warnings
    warnings.filterwarnings("ignore")

    # read .csv
    ufc = pd.read_csv('ufc-master-cleaned.csv')

    # add the columns I would like to work with into a new df
    ufc_cleaned = ufc[['event_name', 'fullname', 'fighter_two_name', 'w', 'l', 'd', 'nc', 'total_rounds', 'belt', 'womens_bout', 'interim_bout', 'strawweight', 'flyweight', 'bantamweight', 'featherweight', 'lightweight', 'middleweight', 'light_heavyweight', 'heavyweight', 'catch_weight', 'open_weight', 'super_heavyweight', 'superfight', 'fight_city', 'fight_state', 'fight_country', 'height', 'weight', 'reach', 'stance', 'slpm', 'stracc', 'sapm', 'strdef', 'tdavg', 'tdacc', 'tddef', 'subavg', 'age_days', 'age']].copy()

    # Drop duplicates if any
    ufc_cleaned = ufc_cleaned.drop_duplicates()

    # combine w, l, d, nc into one target column called outcome
    cols = ['w', 'l', 'd', 'nc']
    ufc_cleaned['outcome'] = ufc_cleaned[cols].apply(lambda row: '_'.join(row.values.astype(str)), axis=1)

    # rename the labels in outcome to be human readable
    ufc_cleaned['outcome'] = ufc_cleaned['outcome'].replace({'1_0_0_0': 'win', '0_1_0_0': 'loss', '0_0_1_0': 'draw', '0_0_0_1': 'no_contest'})

    # rename fullname and fighter_two_name to fighter and opponent respectively
    ufc_cleaned = ufc_cleaned.rename(columns={'fullname': 'fighter', 'fighter_two_name': 'opponent'})

    # clean height column

    # replace the -- with 0' 0" so my function below will work
    ufc_cleaned['height'] = ufc_cleaned['height'].replace({'--': "0' 0\""})

    # convert ft to in
    def parse_ht(ht):
        # format: 7' 0.0"
        ht_ = ht.split("' ")
        ft_ = float(ht_[0])
        in_ = float(ht_[1].replace("\"",""))
        return (12*ft_) + in_

    # apply parse_ht
    ufc_cleaned["height_in"] = ufc_cleaned["height"].apply(parse_ht)

    # convert float to int
    ufc_cleaned['height_in'] = ufc_cleaned.height_in.astype(int)

    # handle null values

    # fill fight_state null values with the mode, which is Nevada
    ufc_cleaned['fight_state'] = ufc_cleaned.fight_state.fillna(ufc_cleaned.fight_state.mode()[0])

    # drop null rows in specific columns
    ufc_cleaned = ufc_cleaned[ufc_cleaned.weight.notnull()]
    ufc_cleaned = ufc_cleaned[ufc_cleaned.reach.notnull()]
    ufc_cleaned = ufc_cleaned[ufc_cleaned.stance.notnull()]
    ufc_cleaned = ufc_cleaned[ufc_cleaned.age.notnull()]
    ufc_cleaned = ufc_cleaned[ufc_cleaned.fight_country.notnull()]

    # drop nulls to make sure none were missed
    ufc_cleaned = ufc_cleaned.dropna()

    # create dummy columns for stance column and concat to df
    # drop_first expects a boolean; True drops the first stance level
    dummy_df = pd.get_dummies(ufc_cleaned[['stance']], dummy_na=False, drop_first=True)
    ufc_cleaned = pd.concat([ufc_cleaned, dummy_df], axis=1)

    # Drop height column
    cols_to_drop = ['height']
    ufc_cleaned = ufc_cleaned.drop(columns=cols_to_drop)

    return ufc_cleaned
    
    

In [84]:
ufc_cleaned = get_n_prep_ufc()

In [85]:
ufc_cleaned.head()

Unnamed: 0,event_name,fighter,opponent,w,l,d,nc,total_rounds,belt,womens_bout,...,tdacc,tddef,subavg,age_days,age,outcome,height_in,stance_Orthodox,stance_Southpaw,stance_Switch
0,UFC 259: Blachowicz vs. Adesanya,Aalon Cruz,Uros Medic,0,1,0,0,3,0,0,...,0,0,0.0,11490,31.0,loss,72,0,0,1
1,UFC Fight Night: Benavidez vs. Figueiredo,Aalon Cruz,Spike Carlyle,0,1,0,0,3,0,0,...,0,0,0.0,11119,30.0,loss,72,0,0,1
3,UFC Fight Night: Henderson vs Dos Anjos,Aaron Phillips,Matt Hobar,0,1,0,0,3,0,0,...,0,44,0.4,9149,25.0,loss,69,0,1,0
4,UFC Fight Night: Kattar vs. Ige,Aaron Phillips,Jack Shore,0,1,0,0,3,0,0,...,0,44,0.4,11302,30.0,loss,69,0,1,0
5,UFC 173: Barao vs Dillashaw,Aaron Phillips,Sam Sicilia,0,1,0,0,3,0,0,...,0,44,0.4,9058,24.0,loss,69,0,1,0
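Because the prep now lives in one function, a quick check of its result can catch regressions when the function changes. A minimal sketch, assuming the expectations implied by the steps above: no nulls, no raw height column, and only the four readable outcome labels (the helper `check_prepared` is hypothetical and is shown here on toy frames rather than the real data):

```python
import pandas as pd

EXPECTED_OUTCOMES = {"win", "loss", "draw", "no_contest"}


def check_prepared(df: pd.DataFrame) -> list:
    """Return a list of problems found in a prepared frame (hypothetical helper)."""
    problems = []
    if df.isnull().any().any():
        problems.append("null values remain")
    if "height" in df.columns:
        problems.append("raw height column still present")
    unexpected = set(df["outcome"]) - EXPECTED_OUTCOMES
    if unexpected:
        problems.append(f"unexpected outcome labels: {sorted(unexpected)}")
    return problems


# toy frames with made-up values
good = pd.DataFrame({"outcome": ["win", "loss"], "height_in": [72, 69]})
bad = pd.DataFrame({"outcome": ["win", "0_0_0_0"], "height_in": [72, None]})
print(check_prepared(good))
print(check_prepared(bad))
```

On the real data this would be called as `check_prepared(ufc_cleaned)` right after the prep function.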


### Explore

In [39]:
# ufc.corr(method ='pearson')

In [38]:
# c_cov = ufc_cleaned[ufc_cleaned.fighter.isin(['Colby Covington'])]

In [31]:
# c_cov.head()
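The commented-out cells above can be sketched on a toy frame. This assumes the renamed `fighter` column from the prepare step and a pandas version that supports `numeric_only` in `corr` (1.5 or later); all numbers below are placeholders, not real fighter stats:

```python
import pandas as pd

# placeholder values for illustration only
toy = pd.DataFrame({
    "fighter": ["Colby Covington", "Colby Covington", "Jorge Masvidal"],
    "opponent": ["X", "Y", "Z"],
    "height_in": [70, 70, 72],
    "reach": [72.0, 72.0, 74.0],
    "slpm": [4.0, 4.0, 4.5],
})

# rows for one fighter; the prepared frame uses fighter, not fullname
c_cov = toy[toy["fighter"] == "Colby Covington"]

# numeric_only=True skips object columns such as fighter and opponent
corr = toy.corr(method="pearson", numeric_only=True)

print(c_cov.head())
print(corr)
```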