## Data description

This is a list of every UFC fight in the history of the organisation. Every row contains information about both fighters, fight details and the winner.

Each row combines the stats of both fighters. Fighters are labelled 'red' and 'blue' (for the red and blue corners). For instance, the red fighter's columns hold the compiled average stats of all that fighter's previous fights, excluding the current one. The stats include the damage done by the red fighter to the opponent and the damage done by the opponent to the red fighter (columns marked 'opp'), averaged over every fight this red fighter has had except this one, since in the data this fight has not happened yet. The same information exists for the blue fighter. The target variable is 'Winner', the only column that tells you the outcome of the fight.
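The idea of averaging over all previous fights while leaving out the current one can be sketched in pandas with a per-fighter `shift` followed by an expanding mean. This is only an illustration on a small hypothetical fight log (the `fighter`, `date` and `sig_str_landed` names and values are made up); this notebook does not show how the dataset itself was compiled.

```python
import pandas as pd

# Hypothetical fight log: one row per fighter per fight (names and values are made up).
fights = pd.DataFrame({
    "fighter": ["A", "A", "A", "B", "B"],
    "date": pd.to_datetime(["2018-01-01", "2018-06-01", "2019-01-01",
                            "2018-03-01", "2018-09-01"]),
    "sig_str_landed": [10, 20, 30, 5, 15],
})

# Put each fighter's fights in chronological order.
fights = fights.sort_values(["fighter", "date"])

# shift(1) pushes each value down by one fight, so the current fight is left out;
# expanding().mean() then averages every earlier fight of that fighter.
fights["avg_sig_str_landed"] = (
    fights.groupby("fighter")["sig_str_landed"]
    .transform(lambda s: s.shift(1).expanding().mean())
)
print(fights)
```

A fighter's first fight has no history, so its average comes out as NaN.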

#### Column values

* The R_ and B_ prefixes mark red-corner and blue-corner fighter stats respectively
* Columns containing 'opp' hold the average damage done by the opponent to the fighter
* KD is the number of knockdowns
* SIG_STR is the number of significant strikes, recorded as 'landed of attempted'
* SIG_STR_pct is the significant strike percentage
* TOTAL_STR is the total number of strikes, recorded as 'landed of attempted'
* TD is the number of takedowns
* TD_pct is the takedown percentage
* SUB_ATT is the number of submission attempts
* PASS is the number of times the guard was passed
* REV is the number of reversals landed
* HEAD is the number of significant strikes to the head, recorded as 'landed of attempted'
* BODY is the number of significant strikes to the body, recorded as 'landed of attempted'
* CLINCH is the number of significant strikes in the clinch, recorded as 'landed of attempted'
* GROUND is the number of significant strikes on the ground, recorded as 'landed of attempted'
* Win_by is the method of victory
* Last_round is the last round of the fight (e.g. if the fight ended by KO in the 1st round, this is 1)
* Last_round_time is when the fight ended in the last round
* Format is the format of the fight (3 rounds, 5 rounds, etc.)
* Referee is the name of the referee
* Date is the date of the fight
* Location is the location in which the event took place
* Fight_type is which weight class and whether it's a title bout or not
* Winner is the winner of the fight
* Stance is the stance of the fighter (orthodox, southpaw, etc.)
* Height_cms is the height of the fighter in centimeters
* Reach_cms is the reach of the fighter (arm span) in centimeters
* Weight_lbs is the weight of the fighter in pounds (lbs)
* Age is the age of the fighter
* Title_bout is a Boolean value indicating whether the fight is a title fight
* Weight_class is which weight class the fight is in (Bantamweight, heavyweight, Women's flyweight, etc.)
* No_of_rounds is the number of rounds the fight was scheduled for
* Current_lose_streak is the fighter's current number of consecutive losses
* Current_win_streak is the fighter's current number of consecutive wins
* Draw is the number of draws in the fighter's UFC career
* Wins is the number of wins in the fighter's UFC career
* Losses is the number of losses in the fighter's UFC career
* Total_rounds_fought is the average of total rounds fought by the fighter
* Total_time_fought(seconds) is the total time the fighter has spent fighting, in seconds
* Total_title_bouts is the total number of title bouts the fighter has taken part in
* Win_by_Decision_Majority is the number of wins by majority decision in the fighter's UFC career
* Win_by_Decision_Split is the number of wins by split decision in the fighter's UFC career
* Win_by_Decision_Unanimous is the number of wins by unanimous decision in the fighter's UFC career
* Win_by_KO/TKO is the number of wins by knockout or technical knockout in the fighter's UFC career
* Win_by_Submission is the number of wins by submission in the fighter's UFC career
* Win_by_TKO_Doctor_Stoppage is the number of wins by doctor stoppage in the fighter's UFC career
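Several of the columns above are described as 'landed of attempted'. In the modeling file these already appear split into separate `_att` and `_landed` columns (for example B_avg_BODY_att and B_avg_BODY_landed in the second `df.head()` output). As a sketch, assuming the raw values are strings such as "35 of 80" (an assumed format, not shown in this notebook), they could be split and turned into a ratio like this. Treating zero attempts as a ratio of 0 is also an assumption.

```python
import pandas as pd

# Hypothetical raw values in an assumed "landed of attempted" string format.
raw = pd.Series(["35 of 80", "0 of 0", "12 of 20"], name="SIG_STR")

# Split "X of Y" into two integer columns.
parts = raw.str.split(" of ", expand=True).astype(int)
parts.columns = ["landed", "att"]

# Ratio of landed to attempted; zero attempts become NaN first, then 0.0 (assumption).
attempted = parts["att"].where(parts["att"] > 0)
parts["pct"] = (parts["landed"] / attempted).fillna(0.0)
print(parts)
```

Whether the dataset stores SIG_STR_pct and TD_pct as a 0 to 1 fraction or on a 0 to 100 scale is not stated here.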

In [2]:
import pandas as pd
import numpy as np
from sklearn import preprocessing

### Loading the data

In [19]:
df = pd.read_csv('data/data_modeling.csv')

In [20]:
df.head()

| | R_fighter | B_fighter | Referee | date | location | Winner | title_bout | weight_class | no_of_rounds | B_current_lose_streak | ... | R_win_by_KO/TKO | R_win_by_Submission | R_win_by_TKO_Doctor_Stoppage | R_wins | R_Stance | R_Height_cms | R_Reach_cms | R_Weight_lbs | B_age | R_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Henry Cejudo | Marlon Moraes | Marc Goddard | 2019-06-08 | Chicago, Illinois, USA | Red | True | Bantamweight | 5 | 0.0 | ... | 2.0 | 0.0 | 0.0 | 8.0 | Orthodox | 162.56 | 162.56 | 135.0 | 31.0 | 32.0 |
| 1 | Valentina Shevchenko | Jessica Eye | Robert Madrigal | 2019-06-08 | Chicago, Illinois, USA | Red | True | Women's Flyweight | 5 | 0.0 | ... | 0.0 | 2.0 | 0.0 | 5.0 | Southpaw | 165.1 | 167.64 | 125.0 | 32.0 | 31.0 |
| 2 | Tony Ferguson | Donald Cerrone | Dan Miragliotta | 2019-06-08 | Chicago, Illinois, USA | Red | False | Lightweight | 3 | 0.0 | ... | 3.0 | 6.0 | 1.0 | 14.0 | Orthodox | 180.34 | 193.04 | 155.0 | 36.0 | 35.0 |
| 3 | Jimmie Rivera | Petr Yan | Kevin MacDonald | 2019-06-08 | Chicago, Illinois, USA | Blue | False | Bantamweight | 3 | 0.0 | ... | 1.0 | 0.0 | 0.0 | 6.0 | Orthodox | 162.56 | 172.72 | 135.0 | 26.0 | 29.0 |
| 4 | Tai Tuivasa | Blagoy Ivanov | Dan Miragliotta | 2019-06-08 | Chicago, Illinois, USA | Blue | False | Heavyweight | 3 | 0.0 | ... | 2.0 | 0.0 | 0.0 | 3.0 | Southpaw | 187.96 | 190.5 | 264.0 | 32.0 | 26.0 |


In [21]:
df.shape

(3360, 145)

In [23]:
df.isnull().sum()

R_fighter       0
B_fighter       0
Referee         0
date            0
location        0
               ..
R_Height_cms    0
R_Reach_cms     0
R_Weight_lbs    0
B_age           0
R_age           0
Length: 145, dtype: int64
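The printed Series above is truncated (the `..` row), so it does not show every column. A small sketch for listing only the columns that actually contain missing values, shown on a toy frame rather than the real `df`:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for df (values are made up).
toy = pd.DataFrame({
    "R_fighter": ["A", "B", "C"],
    "R_Reach_cms": [180.0, np.nan, 175.0],
    "B_age": [30.0, 31.0, np.nan],
})

missing = toy.isnull().sum()
# Keep only the columns whose missing-value count is above zero.
missing = missing[missing > 0]
print(missing)
```

On the real data the same two lines apply to `df`; an empty result means no column has missing values.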

### Encode title_bout: 1 for a title fight, 0 otherwise

In [25]:
df.title_bout=[0 if value == False else 1 for value in df["title_bout"]]  # 0=False , 1=True

### Drop draws, since they do not help predict the winner

In [29]:
df.drop(df[df.Winner == "Draw"].index, inplace=True)

#### Winner (1=Red, 0=Blue)

In [45]:
df["Winner"]= [0 if value == "Blue" else 1 for value in df["Winner"]] 
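The two list comprehensions used for `title_bout` and `Winner` can also be written as vectorized pandas operations. A sketch on a toy frame: note that the `Winner` version maps anything other than "Red" to 0, whereas the comprehension maps anything other than "Blue" to 1, so the two only agree once draws have been dropped, as done above.

```python
import pandas as pd

# Toy frame with made-up values; draws are assumed to be dropped already.
toy = pd.DataFrame({
    "title_bout": [True, False, False],
    "Winner": ["Red", "Blue", "Red"],
})

# Vectorized equivalents of the two list comprehensions.
toy["title_bout"] = toy["title_bout"].astype(int)     # True -> 1, False -> 0
toy["Winner"] = (toy["Winner"] == "Red").astype(int)  # Red -> 1, Blue -> 0
print(toy)
```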

In [32]:
df.weight_class.unique()

array(['Bantamweight', "Women's Flyweight", 'Lightweight', 'Heavyweight',
       "Women's Strawweight", 'Featherweight', 'Middleweight',
       'Light Heavyweight', "Women's Bantamweight", 'Welterweight',
       'Flyweight', "Women's Featherweight", 'Catch Weight'], dtype=object)

In [33]:
from sklearn.preprocessing import OrdinalEncoder
encoder = OrdinalEncoder(categories=[["Strawweight","Women's Strawweight", "Flyweight","Women's Flyweight", "Bantamweight","Women's Bantamweight","Featherweight","Women's Featherweight",
                                      "Lightweight","Welterweight","Middleweight","Light Heavyweight","Heavyweight","Open Weight","Catch Weight"]],dtype=np.int8)
df['weight_class'] = encoder.fit_transform(df['weight_class'].values.reshape(-1, 1))
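Passing `categories` explicitly is what gives the codes their lighter-to-heavier order; with the default `categories='auto'`, OrdinalEncoder would sort the labels alphabetically instead. A small check on a few labels (the sample values are illustrative):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Same category order as in the cell above.
order = ["Strawweight", "Women's Strawweight", "Flyweight", "Women's Flyweight",
         "Bantamweight", "Women's Bantamweight", "Featherweight",
         "Women's Featherweight", "Lightweight", "Welterweight", "Middleweight",
         "Light Heavyweight", "Heavyweight", "Open Weight", "Catch Weight"]

encoder = OrdinalEncoder(categories=[order])
sample = np.array([["Bantamweight"], ["Heavyweight"], ["Catch Weight"]])

# Each code is the label's position in `order`.
codes = encoder.fit_transform(sample).ravel().astype(int)
print(codes)

# inverse_transform maps the codes back to the original labels.
print(encoder.inverse_transform(codes.reshape(-1, 1)).ravel())
```

With the default `handle_unknown='error'`, a weight class missing from the list raises a `ValueError` instead of receiving a code.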

In [34]:
df.weight_class.unique()

array([ 4,  3,  8, 12,  1,  6, 10, 11,  5,  9,  2,  7, 14], dtype=int64)

A catchweight is a term used in combat sports such as boxing and mixed martial arts to describe a weight limit for a fight that does not match the traditional weight-class limits.

The weight classes are encoded as follows (Strawweight and Open Weight do not occur in this data, so codes 0 and 13 are unused):

* Strawweight = 0
* Women's Strawweight = 1
* Flyweight = 2
* Women's Flyweight = 3
* Bantamweight = 4
* Women's Bantamweight = 5
* Featherweight = 6
* Women's Featherweight = 7
* Lightweight = 8
* Welterweight = 9
* Middleweight = 10
* Light Heavyweight = 11
* Heavyweight = 12
* Open Weight = 13
* Catch Weight = 14


In [35]:
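def label_encoder(df, col):
    """Replace df[col] in place with integer codes from a LabelEncoder.

    LabelEncoder assigns codes in sorted (alphabetical) order of the labels.
    """
    le = preprocessing.LabelEncoder()
    df[col] = le.fit_transform(df[col])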
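LabelEncoder assigns codes in sorted order of the labels, which is why the stance codes come out as Open Stance = 0, Orthodox = 1, Southpaw = 2, Switch = 3. A quick check on a hand-written list of stances:

```python
from sklearn.preprocessing import LabelEncoder

stances = ["Orthodox", "Southpaw", "Switch", "Open Stance", "Orthodox"]
le = LabelEncoder()
codes = le.fit_transform(stances)

# classes_ is sorted, and each label's code is its position in classes_.
print(le.classes_.tolist())  # ['Open Stance', 'Orthodox', 'Southpaw', 'Switch']
print(codes.tolist())        # [1, 2, 3, 0, 1]
```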

### Stance encoding: Open Stance = 0, Orthodox = 1, Southpaw = 2, Switch = 3

In [36]:
df.R_Stance.unique()

array(['Orthodox', 'Southpaw', 'Switch', 'Open Stance'], dtype=object)

In [40]:
label_encoder(df,"R_Stance")

## Stance meaning in boxing

* A closed stance minimizes the area where you can get hit and increases mobility backwards and forwards.
* An open stance (a 45-degree angle that exposes more of the body) leaves you more vulnerable.
* Orthodox fighters are mostly right-handed and stand with the left shoulder forward and the right shoulder back.
* Southpaw fighters are usually left-handed and stand with the right shoulder forward and the left shoulder back.
* Switch fighters could be considered ambidextrous: they can change seamlessly between orthodox and southpaw stances.

In [41]:
label_encoder(df,"B_Stance")

In [42]:
df.R_Stance.unique()

array([1, 2, 3, 0], dtype=int64)
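`label_encoder` fits a separate LabelEncoder for each column, so R_Stance and B_Stance only share codes if both contain exactly the same set of stances. Only the R_Stance values are printed here; whether B_Stance has extra categories is not shown (the commented-out one-hot code at the end mentions a "Sideways" stance). The sketch below, on hypothetical toy data, shows how the codes can drift apart and one way to avoid it with a single shared mapping. Applying it to `df` could change the codes printed above if the two columns differ.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data: B_Stance contains a value that R_Stance does not.
toy = pd.DataFrame({
    "R_Stance": ["Orthodox", "Southpaw", "Switch"],
    "B_Stance": ["Orthodox", "Sideways", "Southpaw"],
})

# One LabelEncoder per column (as label_encoder does) gives "Southpaw"
# a different code in each column.
r_sep = LabelEncoder().fit(toy["R_Stance"]).transform(["Southpaw"])[0]  # 1
b_sep = LabelEncoder().fit(toy["B_Stance"]).transform(["Southpaw"])[0]  # 2

# A single mapping built from the union of both columns, sorted the way
# LabelEncoder sorts, gives each stance the same code in both columns.
categories = sorted(set(toy["R_Stance"]) | set(toy["B_Stance"]))
mapping = {name: code for code, name in enumerate(categories)}
for col in ["R_Stance", "B_Stance"]:
    toy[col] = toy[col].map(mapping)

print(mapping)
print(toy)
```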

## Drop columns not used for modeling

In [50]:
df.drop(columns=[ "R_fighter","B_fighter", "Referee", "date","location","R_draw","B_draw","no_of_rounds","R_wins","B_wins",
                  "R_losses","B_losses","B_total_title_bouts","R_total_title_bouts"], inplace=True)

In [53]:
df.drop(columns=[ "R_current_lose_streak","B_current_lose_streak"], inplace=True)

In [56]:
df.head()

| | Winner | title_bout | weight_class | B_current_win_streak | B_avg_BODY_att | B_avg_BODY_landed | B_avg_CLINCH_att | B_avg_CLINCH_landed | B_avg_DISTANCE_att | B_avg_DISTANCE_landed | ... | R_win_by_Decision_Unanimous | R_win_by_KO/TKO | R_win_by_Submission | R_win_by_TKO_Doctor_Stoppage | R_Stance | R_Height_cms | R_Reach_cms | R_Weight_lbs | B_age | R_age |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 4 | 4.0 | 9.2 | 6.0 | 0.2 | 0.0 | 62.6 | 20.6 | ... | 4.0 | 2.0 | 0.0 | 0.0 | 1 | 162.56 | 162.56 | 135.0 | 31.0 | 32.0 |
| 1 | 1 | 1 | 3 | 3.0 | 14.6 | 9.1 | 11.8 | 7.3 | 124.7 | 42.1 | ... | 2.0 | 0.0 | 2.0 | 0.0 | 2 | 165.1 | 167.64 | 125.0 | 32.0 | 31.0 |
| 2 | 1 | 0 | 8 | 3.0 | 15.354839 | 11.322581 | 6.741935 | 4.387097 | 84.741935 | 38.580645 | ... | 3.0 | 3.0 | 6.0 | 1.0 | 1 | 180.34 | 193.04 | 155.0 | 36.0 | 35.0 |
| 3 | 0 | 0 | 4 | 4.0 | 17.0 | 14.0 | 13.75 | 11.0 | 109.5 | 48.75 | ... | 4.0 | 1.0 | 0.0 | 0.0 | 1 | 162.56 | 172.72 | 135.0 | 26.0 | 29.0 |
| 4 | 0 | 0 | 12 | 1.0 | 17.0 | 14.5 | 2.5 | 2.0 | 201.0 | 59.5 | ... | 1.0 | 2.0 | 0.0 | 0.0 | 2 | 187.96 | 190.5 | 264.0 | 32.0 | 26.0 |


In [58]:
df.to_csv('data/modeling_numeric2.csv',index=False)
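Since the saved file is meant to be all-numeric, a quick check before writing it is to list any remaining non-numeric columns. A sketch on a toy frame (the `location` column is included deliberately so the check has something to report):

```python
import pandas as pd

# Toy frame standing in for df just before saving (values are made up).
toy = pd.DataFrame({
    "Winner": [1, 0],
    "weight_class": [4, 12],
    "R_Stance": [1, 2],
    "location": ["Chicago, Illinois, USA", "Chicago, Illinois, USA"],
})

# Columns whose dtype is not numeric, e.g. leftover text columns.
non_numeric = toy.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)  # ['location']
```

On the real `df`, an empty list means every column is numeric.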

In [None]:
# Alternative (not used): one-hot encode the stances instead of label encoding them.
# This must run on the raw string columns, i.e. before label_encoder() is applied;
# df1 is assumed to be such a copy of the data.
# from sklearn.preprocessing import OneHotEncoder
# for side in ["R", "B"]:
#     col = f"{side}_Stance"
#     model = OneHotEncoder()
#     one_hot = model.fit_transform(df1[[col]]).toarray()
#     # e.g. "Open Stance" -> "R_Open_Stance", "Orthodox" -> "R_Orthodox"
#     names = [f"{side}_{c.replace(' ', '_')}" for c in model.categories_[0]]
#     stance = pd.DataFrame(one_hot, columns=names, index=df1.index)
#     df1 = pd.concat([df1, stance], axis=1)
# df1.drop(columns=["R_Stance", "B_Stance"], inplace=True)