# UFC Fight Data - Exploratory Analysis and Visualisation
## About the dataset
### Overview
The UFC, established in 1993, is a mixed martial arts promotion in which one-on-one bouts last three to five rounds. The winner is determined by knockout (KO), submission, decision or, rarely, disqualification.

### Structure of the dataset
The dataset contains UFC fighters and their statistics (such as height, age, stance, etc.).

In [1]:
#adding imports and loading dataset 
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px

file_path = 'data/ufc_stats_data.csv'
df = pd.read_csv(file_path)
df

Unnamed: 0,name,nickname,wins,losses,draws,height_cm,weight_in_kg,reach_in_cm,stance,date_of_birth,significant_strikes_landed_per_minute,significant_striking_accuracy,significant_strikes_absorbed_per_minute,significant_strike_defence,average_takedowns_landed_per_15_minutes,takedown_accuracy,takedown_defense,average_submissions_attempted_per_15_minutes
0,Robert Drysdale,,7,0,0,190.50,92.99,,Orthodox,1981-10-05,0.00,0.0,0.00,0.0,7.32,100.0,0.0,21.9
1,Daniel McWilliams,The Animal,15,37,0,185.42,83.91,,,,3.36,77.0,0.00,0.0,0.00,0.0,100.0,21.6
2,Dan Molina,,13,9,0,177.80,97.98,,,,0.00,0.0,5.58,60.0,0.00,0.0,0.0,20.9
3,Paul Ruiz,,7,4,0,167.64,61.23,,,,1.40,33.0,1.40,75.0,0.00,0.0,100.0,20.9
4,Collin Huckbody,All In,8,2,0,190.50,83.91,193.04,Orthodox,1994-09-29,2.05,60.0,2.73,42.0,10.23,100.0,0.0,20.4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4106,John Campetella,,0,1,0,175.26,106.59,,Orthodox,,0.00,0.0,0.00,0.0,0.00,0.0,0.0,0.0
4107,Andre Pederneiras,,1,1,2,172.72,70.31,,Orthodox,1967-03-22,0.00,0.0,0.00,0.0,0.00,0.0,0.0,0.0
4108,Bryson Kamaka,,12,20,1,180.34,77.11,,Orthodox,,9.47,60.0,12.63,0.0,0.00,0.0,100.0,0.0
4109,Matej Penaz,Money,6,1,0,190.50,83.91,210.82,Southpaw,1996-10-14,1.28,33.0,2.55,33.0,0.00,0.0,0.0,0.0


The existing features (columns) are a good starting point. Additional features that can be helpful are total career fights, win/loss/draw percentages, and the UFC weight class of each fighter.

In [3]:
#adding features
df['total_career_fights'] = df['wins']+df['losses']+df['draws']
df['win%']= round((df['wins']/df['total_career_fights']),2)*100
df['loss%']= round((df['losses']/df['total_career_fights']),2)*100
df['draw%']= round((df['draws']/df['total_career_fights']),2)*100

#adding ufc weight categories feature
conditions = [
    (df['weight_in_kg'].isna()), #Undefined
    (df['weight_in_kg'] <= 57.0), #Flyweight
    (df['weight_in_kg'] > 57.0)&(df['weight_in_kg']<=61.2), #Bantamweight
    (df['weight_in_kg'] > 61.2)&(df['weight_in_kg']<=65.8), #Featherweight
    (df['weight_in_kg'] > 65.8)&(df['weight_in_kg']<=70.0), #Lightweight
    (df['weight_in_kg'] > 70.0)&(df['weight_in_kg']<=77.1), #Welterweight
    (df['weight_in_kg'] > 77.1)&(df['weight_in_kg']<=83.9), #Middleweight 
    (df['weight_in_kg'] > 83.9)&(df['weight_in_kg']<=93.0), #Light Heavyweight
    (df['weight_in_kg'] > 93.0) # Heavyweight
]

#creating a list of the values we will assign for each condition
values = ['Undefined','Flyweight','Bantamweight','Featherweight','Lightweight','Welterweight','Middleweight','Light Heavyweight','Heavyweight']

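#creating a new column and using np.select to assign values to it using our list as argument
#(a string default keeps the dtypes consistent; some NumPy versions reject string choices combined with the integer default 0)
df['weight_class']=np.select(conditions,values,default='Undefined')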
df

Unnamed: 0,name,nickname,wins,losses,draws,height_cm,weight_in_kg,reach_in_cm,stance,date_of_birth,...,significant_strike_defence,average_takedowns_landed_per_15_minutes,takedown_accuracy,takedown_defense,average_submissions_attempted_per_15_minutes,total_career_fights,win%,loss%,draw%,weight_class
0,Robert Drysdale,,7,0,0,190.50,92.99,,Orthodox,1981-10-05,...,0.0,7.32,100.0,0.0,21.9,7,100.0,0.0,0.0,Light Heavyweight
1,Daniel McWilliams,The Animal,15,37,0,185.42,83.91,,,,...,0.0,0.00,0.0,100.0,21.6,52,29.0,71.0,0.0,Light Heavyweight
2,Dan Molina,,13,9,0,177.80,97.98,,,,...,60.0,0.00,0.0,0.0,20.9,22,59.0,41.0,0.0,Heavyweight
3,Paul Ruiz,,7,4,0,167.64,61.23,,,,...,75.0,0.00,0.0,100.0,20.9,11,64.0,36.0,0.0,Featherweight
4,Collin Huckbody,All In,8,2,0,190.50,83.91,193.04,Orthodox,1994-09-29,...,42.0,10.23,100.0,0.0,20.4,10,80.0,20.0,0.0,Light Heavyweight
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4106,John Campetella,,0,1,0,175.26,106.59,,Orthodox,,...,0.0,0.00,0.0,0.0,0.0,1,0.0,100.0,0.0,Heavyweight
4107,Andre Pederneiras,,1,1,2,172.72,70.31,,Orthodox,1967-03-22,...,0.0,0.00,0.0,0.0,0.0,4,25.0,25.0,50.0,Welterweight
4108,Bryson Kamaka,,12,20,1,180.34,77.11,,Orthodox,,...,0.0,0.00,0.0,100.0,0.0,33,36.0,61.0,3.0,Middleweight
4109,Matej Penaz,Money,6,1,0,190.50,83.91,210.82,Southpaw,1996-10-14,...,33.0,0.00,0.0,0.0,0.0,7,86.0,14.0,0.0,Light Heavyweight
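In the output above, fighters who weigh exactly at a division limit land one class too high. For example, 83.91 kg (185 lb) is labelled Light Heavyweight because the threshold `83.9` is rounded down. The following is a minimal sketch of an alternative binning with `pd.cut`. It assumes the men's UFC limits of 125, 135, 145, 155, 170, 185 and 205 lb, and adds a small tolerance for weights stored with two decimals. The names `LB_TO_KG`, `limits_lb`, `assign_weight_class` and `tol` are hypothetical and not part of the original notebook.

```python
import numpy as np
import pandas as pd

LB_TO_KG = 0.45359237  # exact pound-to-kilogram conversion factor

# Assumed upper limits per division, in pounds (Heavyweight is left open-ended)
limits_lb = [125, 135, 145, 155, 170, 185, 205]
class_names = ['Flyweight', 'Bantamweight', 'Featherweight', 'Lightweight',
               'Welterweight', 'Middleweight', 'Light Heavyweight', 'Heavyweight']

def assign_weight_class(weight_kg: pd.Series, tol: float = 0.01) -> pd.Series:
    """Map weights in kg to a division; missing weights become 'Undefined'."""
    # Right-closed bins: (-inf, 125 lb], (125 lb, 135 lb], ..., (205 lb, inf)
    bins = [-np.inf] + [lb * LB_TO_KG + tol for lb in limits_lb] + [np.inf]
    classes = pd.cut(weight_kg, bins=bins, labels=class_names)
    return classes.astype(object).fillna('Undefined')

# Weights taken from the rows shown above, plus one missing value
sample = pd.Series([92.99, 83.91, 97.98, 61.23, 70.31, 77.11, np.nan])
result = assign_weight_class(sample).tolist()
print(result)
```

With these bins, a fighter listed at 83.91 kg is placed in Middleweight and one at 70.31 kg in Lightweight. Applying this to `df['weight_in_kg']` would change the per-class counts reported later in the notebook.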


Description of the features initially present:
- **name:** The name of the UFC fighter
- **nickname:** The nickname of the UFC fighter, if applicable.
- **wins:** The number of wins the fighter has in their career.
- **losses:** The number of losses the fighter has in their career.
- **draws:** The number of draws the fighter has in their career.
- **height_cm:** The height of the fighter in centimeters.
- **weight_in_kg:** The weight of the fighter in kilograms.
- **reach_in_cm:** The reach of the fighter in centimeters.
- **stance:** The fighting stance of the fighter (e.g. Orthodox, Southpaw, Switch).
- **date_of_birth:** The date of birth of the fighter.
- **significant_strikes_landed_per_minute:** The average number of significant strikes landed by the fighter per minute.
- **significant_striking_accuracy:** The percentage of significant strikes that land successfully for the fighter.
- **significant_strikes_absorbed_per_minute:** The average number of significant strikes absorbed by the fighter per minute.
- **significant_strike_defence:** The percentage of opponent's significant strikes that the fighter successfully defends.
- **average_takedowns_landed_per_15_minutes:** The average number of takedowns landed by the fighter per 15 minutes.
- **takedown_accuracy:** The percentage of takedown attempts that are successful for the fighter.
- **takedown_defense:** The percentage of opponent's takedown attempts that the fighter successfully defends.
- **average_submissions_attempted_per_15_minutes:** The average number of submission attempts made by the fighter per 15 minutes.
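Since the dataset stores `date_of_birth` rather than an age, an age column can be derived if needed. The sketch below is one possible way to do it. The helper `age_in_years` and its `as_of` reference date are hypothetical names, and missing or unparseable dates are assumed to yield a missing age.

```python
import pandas as pd

def age_in_years(date_of_birth: pd.Series, as_of: str) -> pd.Series:
    """Completed years between each date of birth and the reference date."""
    dob = pd.to_datetime(date_of_birth, errors="coerce")  # invalid or missing -> NaT
    ref = pd.Timestamp(as_of)
    # True where the birthday has already occurred in the reference year
    birthday_passed = (dob.dt.month < ref.month) | (
        (dob.dt.month == ref.month) & (dob.dt.day <= ref.day)
    )
    # Subtract one year where the birthday is still to come; NaT stays NaN
    return ref.year - dob.dt.year - (~birthday_passed).astype(int)

# Two dates of birth from the table above and one missing value
sample_dob = pd.Series(["1981-10-05", "1994-09-29", None])
ages = age_in_years(sample_dob, as_of="2024-09-29")
print(ages)
```

On the full table this would be called as `df['age'] = age_in_years(df['date_of_birth'], as_of=...)`, with the reference date chosen to match when the data was collected.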

### Finding undefeated fighters and ranking the top 10 undefeated fighters


In [8]:
df_undefeated = df.loc[df['loss%'] == 0].sort_values(by=['wins'],ascending=False)
per_undefeated_plot = round(df_undefeated.shape[0]/df.shape[0]*100,2)

ud_mean_match_played = round(df_undefeated['total_career_fights'].mean(),1)
print(f'{per_undefeated_plot}% of fighters are undefeated, with an average of {ud_mean_match_played} career fights')

top_10_fighters = df_undefeated.head(10)
fig1 = px.bar(top_10_fighters, 
              x='name', 
              y='wins', 
              title='Top 10 undefeated fighters by number of wins',
              labels={
                  'name' : 'Name',
                  'wins': 'No. of wins',
                  'stance' : 'Type of stance used'
              },
              color='stance'
            )
fig1.update_layout(barmode='stack', xaxis = {'categoryorder': 'total descending'})

fig1.show()

3.75% of fighters are undefeated, with an average of 7.3 career fights
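The filter `loss% == 0` works on a rounded percentage, so it is worth knowing where it can differ from counting losses directly. The sketch below uses a small hypothetical table (the fighters `A` to `D` are made up) to compare the two filters. A fighter with one loss in 300 fights rounds down to 0%, and a fighter with no recorded fights gets a missing percentage.

```python
import pandas as pd

# Hypothetical records, not taken from the dataset
fighters = pd.DataFrame({
    'name':   ['A', 'B', 'C', 'D'],
    'wins':   [7, 299, 1, 0],
    'losses': [0, 1, 1, 0],
    'draws':  [0, 0, 2, 0],
})
fighters['total_career_fights'] = fighters[['wins', 'losses', 'draws']].sum(axis=1)
# Same formula as in the notebook: round the ratio to 2 decimals, then scale
fighters['loss%'] = round(fighters['losses'] / fighters['total_career_fights'], 2) * 100

# Filter on the rounded percentage (B slips through, D is NaN and drops out)
by_percent = fighters.loc[fighters['loss%'] == 0, 'name'].tolist()
# Filter on the raw loss count, requiring at least one fight
by_count = fighters.loc[
    (fighters['losses'] == 0) & (fighters['total_career_fights'] > 0), 'name'
].tolist()

print(by_percent, by_count)
```

The two filters only disagree for fighters with more than 200 fights and very few losses, so the difference is likely small for this dataset. Filtering on the raw count avoids the question entirely.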


![Image Description](https://i.pinimg.com/564x/0b/56/77/0b5677c41fa5bb46b148be3027877e62.jpg)

Khabib Nurmagomedov: undisputed, undefeated, and pound for pound the greatest.

### Finding the top 10 winless fighters with the most losses

In [15]:
df_unwinning = df.loc[df["win%"] == 0].sort_values(by=['losses'], ascending=False)
per_unwinning_fig = round(df_unwinning.shape[0]/df.shape[0] * 100, 2)
uw_mean_match_played = round(df_unwinning["total_career_fights"].mean(), 1)
print(f"{per_unwinning_fig} % of fighters haven't won a match with an average of {uw_mean_match_played} matches played")

worst_10_fighters = df_unwinning.head(10)
fig2 = px.bar(worst_10_fighters,
              x='name',
              y='losses',
              title="The 10 winless fighters with the most losses",
              labels={'name': 'Name',
                      'losses': "No. of losses",
                      "significant_strikes_absorbed_per_minute": "Strikes absorbed/min"
                      },
              color="significant_strikes_absorbed_per_minute"
              )
fig2.show()

3.21 % of fighters haven't won a match with an average of 1.8 matches played


With an average of only 1.8 fights, most of these fighters are newcomers, and it's common to lose at the beginning of a career. Here we can observe that Jin Oh Kim absorbed the most strikes per minute.

### Finding the relation between height and winning fights

In [16]:
fig3 = px.scatter(df, x="height_cm", y="wins",
                 labels = {"height_cm": "Height [cm]", "wins": "No. of wins"}) 
fig3.show()

In [17]:
# Finding the fighter with the most victories
df.loc[df['wins']==df['wins'].max()]

Unnamed: 0,name,nickname,wins,losses,draws,height_cm,weight_in_kg,reach_in_cm,stance,date_of_birth,...,significant_strike_defence,average_takedowns_landed_per_15_minutes,takedown_accuracy,takedown_defense,average_submissions_attempted_per_15_minutes,total_career_fights,win%,loss%,draw%,weight_class
3969,Travis Fulton,The Ironman,253,53,10,182.88,108.86,,Orthodox,1977-05-29,...,0.0,0.0,0.0,0.0,0.0,316,80.0,17.0,3.0,Heavyweight


The plot above reveals an extreme outlier: Travis Fulton. Here's some info about him:

Travis Jon Fulton (May 29, 1977 – July 10, 2021) was an American mixed martial artist and a professional boxer in the heavyweight division of both sports. Known as a longtime veteran in mixed martial arts, he competed in over 300 sanctioned bouts and while he was perhaps best known for competing in smaller US-based promotions, he also competed in the UFC, the USWF, the WEC, Pancrase, M-1 Global, the Chicago Red Bears of the IFL, King of the Cage, RINGS, and Oktagon MMA. He also holds the record for the most sanctioned mixed martial arts bouts, with 320 bouts; in addition to that, he also holds the most wins in mixed martial arts history (255). [Data source: Wikipedia]

![Image Description](//upload.wikimedia.org/wikipedia/commons/thumb/a/ac/Travis_Fulton.png/220px-Travis_Fulton.png)

### Finding the relation between weight and winning fights

In [19]:
fig4 = px.scatter(df, x="weight_in_kg", y="wins",
                 labels = {"weight_in_kg": "Weight [kg]", "wins": "No. of wins"}) 
fig4.show()

In [20]:
# Finding the heaviest fighter
df.loc[df["weight_in_kg"] == df["weight_in_kg"].max()]

Unnamed: 0,name,nickname,wins,losses,draws,height_cm,weight_in_kg,reach_in_cm,stance,date_of_birth,...,significant_strike_defence,average_takedowns_landed_per_15_minutes,takedown_accuracy,takedown_defense,average_submissions_attempted_per_15_minutes,total_career_fights,win%,loss%,draw%,weight_class
4069,Emmanuel Yarborough,,1,2,0,203.2,349.27,,Open Stance,1960-09-05,...,0.0,0.0,0.0,0.0,0.0,3,33.0,67.0,0.0,Heavyweight


We have found another outlier, this time in weight: Emmanuel Yarbrough (listed in the dataset as Emmanuel Yarborough), weighing 349 kg. Here's some info about him:

Emmanuel Yarbrough (September 5, 1964 – December 21, 2015) was an American martial artist, professional wrestler, football player and actor. He was particularly known for his career in amateur sumo, and held the Guinness World Record for the heaviest living athlete. [Data source: Wikipedia]

![Image Description](https://ufc-video.s3.amazonaws.com/image/Mannysuit.jpg)
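The scatter plots above show the relation between height or weight and wins only visually. One way to put a number on it is a rank (Spearman) correlation, which is less sensitive to extreme values such as Travis Fulton's win count than a plain Pearson correlation. The helper `spearman_corr` below is a hypothetical sketch, and the toy data stands in for `df`.

```python
import pandas as pd

def spearman_corr(data: pd.DataFrame, x: str, y: str) -> float:
    """Spearman rank correlation of two columns, using rows where both are present."""
    pair = data[[x, y]].dropna()
    # The Pearson correlation of the ranks is the Spearman correlation
    return pair[x].rank().corr(pair[y].rank())

# Hypothetical toy data with one extreme win count and one missing height
toy = pd.DataFrame({
    "height_cm": [165.0, 170.0, 175.0, 180.0, 185.0, None],
    "wins": [3, 5, 8, 10, 253, 4],
})
rho = spearman_corr(toy, "height_cm", "wins")
print(round(rho, 2))
```

On the real data the calls would be `spearman_corr(df, "height_cm", "wins")` and `spearman_corr(df, "weight_in_kg", "wins")`. A value near 0 would suggest that neither measurement says much about the number of wins.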

### Finding how many fighters have a nickname

In [21]:
# Share of fighters whose nickname is not missing
perc_fighters_with_nick = round(df["nickname"].notna().mean()*100,2)
print(f"{perc_fighters_with_nick} % of fighters have a nickname")

54.9 % of fighters have a nickname


### Finding the best fighters in each weight class

In [22]:
weight_cat_list = set(df["weight_class"].values)
print(weight_cat_list)

{'Welterweight', 'Lightweight', 'Flyweight', 'Middleweight', 'Heavyweight', 'Light Heavyweight', 'Featherweight', 'Bantamweight', 'Undefined'}


In [26]:
for weight_class in weight_cat_list:
    data_of_cat = df.loc[df["weight_class"] == weight_class]
    # Here "undefeated" means a 100% win rate, so fighters with draws are excluded
    num_of_undefeated = len(data_of_cat[(data_of_cat['win%'] == 100)])
    print(f"There are {num_of_undefeated} undefeated fighters in {weight_class} class and the best 10 are")
    # Plot the 10 fighters with the highest win percentage in this class
    fig = px.bar(data_of_cat.sort_values(by=['win%'], ascending=False).head(10), x='name',
              y='win%',color = "total_career_fights",
              labels={'name':'Name', "win%": "% of wins", "total_career_fights": "Total fights"})
    fig.show()

There are 16 undefeated fighters in Welterweight class and the best 10 are


There are 0 undefeated fighters in Lightweight class and the best 10 are


There are 19 undefeated fighters in Flyweight class and the best 10 are


There are 19 undefeated fighters in Middleweight class and the best 10 are


There are 20 undefeated fighters in Heavyweight class and the best 10 are


There are 29 undefeated fighters in Light Heavyweight class and the best 10 are


There are 32 undefeated fighters in Featherweight class and the best 10 are


There are 1 undefeated fighters in Bantamweight class and the best 10 are


There are 10 undefeated fighters in Undefined class and the best 10 are
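Sorting by `win%` alone leaves fighters with the same percentage in arbitrary order, so a 1-0 record can rank next to a long unbeaten career. The sketch below shows one possible refinement: break ties by the number of fights and optionally require a minimum number of fights. The function `top_n_per_class` and its `min_fights` parameter are hypothetical, and the toy rows are made up.

```python
import pandas as pd

def top_n_per_class(data: pd.DataFrame, n: int = 10, min_fights: int = 1) -> pd.DataFrame:
    """Top n fighters per weight class by win%, ties broken by total fights."""
    eligible = data[data['total_career_fights'] >= min_fights]
    ranked = eligible.sort_values(
        ['weight_class', 'win%', 'total_career_fights'],
        ascending=[True, False, False],
    )
    # head(n) within each group keeps the sorted order
    return ranked.groupby('weight_class').head(n)

# Hypothetical toy data standing in for df
toy = pd.DataFrame({
    'name': ['A', 'B', 'C', 'D', 'E'],
    'weight_class': ['Flyweight', 'Flyweight', 'Flyweight', 'Heavyweight', 'Heavyweight'],
    'win%': [100.0, 100.0, 80.0, 50.0, 100.0],
    'total_career_fights': [3, 12, 20, 4, 1],
})
top = top_n_per_class(toy, n=2, min_fights=2)
print(top[['weight_class', 'name', 'win%', 'total_career_fights']])
```

The result can be plotted per class in the same way as in the loop above, for example by iterating over `top.groupby('weight_class')`.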


### Finding the most popular stances

In [28]:
fig5 = px.histogram(df, x="stance",
            labels={'stance':'Type of stance'},
            title = "Most popular stance")
fig5.show()

We observe here that a large number of fighters prefer the orthodox stance.
Despite this imbalance, let's also look at the overall percentage of fights won by the fighters of each stance.

In [29]:
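# Sum only the numeric columns needed for the rates; summing text columns is wasteful and can fail
df_stance = df.groupby("stance")[["wins", "losses", "draws", "total_career_fights"]].sum()
df_stance["win%"] = (df_stance["wins"] / df_stance["total_career_fights"]) * 100
df_stance["loss%"] = (df_stance["losses"] / df_stance["total_career_fights"]) * 100
df_stance["draw%"] = (df_stance["draws"] / df_stance["total_career_fights"]) * 100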

fig6 = px.bar(df_stance, x=df_stance.index, y = "win%",
            labels={
                'stance':'Type of stance',
                "win%": "% of matches won",
                "total_career_fights": "No. of fights"},
            color = "total_career_fights", title = "Most successful stance on average")
fig6.update_layout(barmode='stack', xaxis={'categoryorder':'total descending'})
fig6.show()
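The cell above pools all fights per stance (total wins divided by total fights), which gives more influence to fighters with long careers. An alternative reading of "average win percentage" is the mean of each fighter's own win rate. The sketch below computes both side by side on hypothetical toy data. `stance_win_rates` and its output column names are illustrative assumptions, not part of the original notebook.

```python
import pandas as pd

def stance_win_rates(data: pd.DataFrame) -> pd.DataFrame:
    """Pooled and per-fighter mean win rates for each stance."""
    valid = data[data['total_career_fights'] > 0].copy()  # avoid division by zero
    valid['fighter_win%'] = valid['wins'] / valid['total_career_fights'] * 100
    grouped = valid.groupby('stance')  # rows with a missing stance are dropped
    return pd.DataFrame({
        'fighters': grouped.size(),
        'pooled_win%': grouped['wins'].sum() / grouped['total_career_fights'].sum() * 100,
        'mean_fighter_win%': grouped['fighter_win%'].mean(),
    })

# Hypothetical toy data standing in for df
toy = pd.DataFrame({
    'stance': ['Orthodox', 'Orthodox', 'Southpaw', None],
    'wins': [90, 0, 3, 5],
    'total_career_fights': [100, 1, 4, 10],
})
rates = stance_win_rates(toy)
print(rates.round(2))
```

In the toy data the two Orthodox fighters have a pooled win rate of about 89% but a mean per-fighter rate of 45%, which shows how much a single long career can dominate the pooled figure. The `fighters` column also helps when judging stances that only a few fighters use.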