<h1>UFC Fighter Analysis</h1>

## Data Cleaning

In [None]:
#import the necessary libraries
import pandas as pd
import numpy as np 
from scipy import stats 
%matplotlib inline
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import tkinter as tk
from tkinter import filedialog
from ydata_profiling import ProfileReport

In [None]:
#import the data set
ufc_df = pd.read_csv('/Users/chirag/Downloads/ufc_data/ufc_fighters.csv')
ufc_df.head()

In [None]:
#a top-down view of the fighters dataset that covers a multitude of key statistics
#best to run after performing the remainder of the analysis
profile = ProfileReport(ufc_df, title = "Exploratory Data Analysis of Fighter Data")
profile.to_notebook_iframe()

In [None]:
#replacing all instances of -- in the data set with null values
ufc_df.replace("--", np.nan, inplace=True)

In [None]:
#function used to split the height column and convert the feet into inches
def convert_height_to_inches(height):
    if pd.isna(height):
        return np.nan
    else:
        height_parts = height.split("'")
        feet = int(float(height_parts[0].strip()))
        inches = int(float(height_parts[1].strip().replace("\"", "")))
        return feet * 12 + inches

#applying the function to make a new column
ufc_df["Height (inches)"] = ufc_df["Height"].apply(convert_height_to_inches)

#dropping the old height column
ufc_df.drop("Height", axis=1, inplace=True)

print(ufc_df.head())
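
The conversion above assumes every non-null height looks like `5' 11"`. As a hedged alternative, the sketch below uses a regular expression and returns NaN for any value that does not match, instead of raising. The helper name `parse_height_inches` and the pattern are illustrative assumptions, not part of the original notebook.

```python
import math
import re

# Hypothetical helper (not in the original notebook): parses strings such as
# 5' 11" into total inches and returns NaN for anything that does not match.
HEIGHT_PATTERN = re.compile(r"^\s*(\d+)\s*'\s*(\d+)\s*\"?\s*$")

def parse_height_inches(height):
    if not isinstance(height, str):
        return math.nan  # covers NaN, None, and other non-string values
    match = HEIGHT_PATTERN.match(height)
    if match is None:
        return math.nan  # unexpected format, e.g. "--" or an empty string
    feet, inches = match.groups()
    return int(feet) * 12 + int(inches)

print(parse_height_inches("5' 11\""))
print(parse_height_inches("--"))
```

If the format assumption holds for the dataset, it could be applied the same way as the original function, for example `ufc_df["Height"].apply(parse_height_inches)`.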

In [None]:
#removing the " from the reach string and converting its data type to float
ufc_df["Reach (inches)"] = ufc_df["Reach"].str.replace("\"", "").astype(float)

ufc_df.drop("Reach", axis=1, inplace=True)

print(ufc_df.head())

In [None]:
#splitting the numeric value and the suffix of lbs. from each other in the Weight column
ufc_df['Weight (lbs)'] = ufc_df['Weight'].str.split(' lbs.').str[0]

#changing the variable type to a numeric variable
ufc_df['Weight (lbs)'] = pd.to_numeric(ufc_df['Weight (lbs)'], errors='coerce')

#removing the previous weight column
ufc_df.drop('Weight', axis=1, inplace=True)

print(ufc_df.head())

In [None]:
#creating a new column called win percentage to represent the win percentage of each fighter
ufc_df['Win %'] = (ufc_df['Wins'] / (ufc_df['Wins'] + ufc_df['Losses'] + ufc_df['Draws'])) * 100
#creating a new column called amount of fights to represent the total amount of fights for each fighter
ufc_df['Amount of Fights'] = (ufc_df['Wins'] + ufc_df['Losses'] + ufc_df['Draws'])

print(ufc_df.head())

## Analysis

#### Graph 1 - Height [in] vs Win %

In [None]:
fig = px.scatter(ufc_df, x='Height (inches)', y='Win %', opacity=0.7, color_discrete_sequence=['#636EFA'])
fig.update_traces(marker_size=6, hovertemplate='Height (inches): %{x}<br>Win %: %{y:.2f}')
fig.update_layout(
    title='Height (inches) vs. Win %',
    xaxis_title='Height (inches)',
    yaxis_title='Win %',
    title_x=0.5
)
fig.show()

#### Graph 2 - Average Winning Percentage

In [None]:
fig = px.histogram(ufc_df.groupby('Stance')['Win %'].mean().reset_index(), x='Stance', y='Win %', title='Average Winning Percentage in every Stance')
fig.update_layout(title_x=0.5)
fig.update_traces(marker_color='#636EFA', hovertemplate='Win %: %{y:.2f}')
fig.update_xaxes(title=None, tickangle=45, tickfont=dict(size=10))
fig.update_yaxes(title='Average Winning Percentage')
fig.show()

#### Graph 3 - Stance Count

In [None]:
fig = px.histogram(ufc_df, x='Stance', title='Stance Count')
fig.update_layout(title_x=0.5)
fig.update_traces(marker_color='#636EFA')
fig.update_xaxes(title=None, tickangle=45, tickfont=dict(size=10))
fig.update_yaxes(title='Count')
fig.show()

In [None]:
#creating a new table with the mean and std win percentage per stance used
avg_win_percentage = ufc_df.groupby('Stance')['Win %'].mean().reset_index()
std_win_percentage = ufc_df.groupby('Stance')['Win %'].std().reset_index()
win_percentage_table = pd.merge(avg_win_percentage, std_win_percentage, on='Stance', suffixes=(' mean', ' std'))
win_percentage_table = win_percentage_table.sort_values('Win % mean', ascending=False)

print(win_percentage_table)
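
The same table can be built in a single pass with named aggregation, which also makes it easy to include the number of fighters per stance alongside the mean and standard deviation. This is a minimal sketch on made-up data; `sample_df` and its values are assumptions standing in for `ufc_df`.

```python
import pandas as pd

# Toy data standing in for ufc_df; the values are made up for illustration.
sample_df = pd.DataFrame({
    "Stance": ["Orthodox", "Orthodox", "Southpaw", "Southpaw", "Switch", None],
    "Win %": [50.0, 70.0, 60.0, 80.0, 75.0, 40.0],
})

# Named aggregation computes all three statistics in one groupby.
# Rows with a missing stance are dropped by groupby by default.
stance_summary = (
    sample_df.groupby("Stance")["Win %"]
    .agg(mean="mean", std="std", fighters="count")
    .sort_values("mean", ascending=False)
    .reset_index()
)
print(stance_summary)
```

Showing the count next to the standard deviation helps when judging how much weight to give a stance with very few fighters.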

In [None]:
#creating a separate data frame for the fighters in each weight class
strawweight = ufc_df[ufc_df['Weight (lbs)'] <= 115]
flyweight = ufc_df[(ufc_df['Weight (lbs)'] >= 116) & (ufc_df['Weight (lbs)'] <= 125)]
bantamweight = ufc_df[(ufc_df['Weight (lbs)'] >= 126) & (ufc_df['Weight (lbs)'] <= 135)]
featherweight = ufc_df[(ufc_df['Weight (lbs)'] >= 136) & (ufc_df['Weight (lbs)'] <= 145)]
lightweight = ufc_df[(ufc_df['Weight (lbs)'] >= 146) & (ufc_df['Weight (lbs)'] <= 155)]
welterweight = ufc_df[(ufc_df['Weight (lbs)'] >= 156) & (ufc_df['Weight (lbs)'] <= 170)]
middleweight = ufc_df[(ufc_df['Weight (lbs)'] >= 171) & (ufc_df['Weight (lbs)'] <= 185)]
lightheavyweight = ufc_df[(ufc_df['Weight (lbs)'] >= 186) & (ufc_df['Weight (lbs)'] <= 205)]
heavyweight = ufc_df[ufc_df['Weight (lbs)'] >= 206]
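
Hand-written range filters are easy to get subtly wrong at the boundaries; a fractional weight such as 115.5, for instance, matches none of the integer-bounded filters. One possible alternative is `pd.cut`, which assigns every weight to exactly one bin. The sketch below is illustrative only: the `limits` and `labels` lists mirror the divisions above, and the weights are made-up values rather than rows from the dataset.

```python
import pandas as pd

# Upper weight limit (lbs) of each division; these mirror the filters above
# and are assumptions about how the data should be binned.
limits = [0, 115, 125, 135, 145, 155, 170, 185, 205, float("inf")]
labels = ["Strawweight", "Flyweight", "Bantamweight", "Featherweight",
          "Lightweight", "Welterweight", "Middleweight", "Lightheavyweight",
          "Heavyweight"]

# Toy weights standing in for ufc_df['Weight (lbs)'].
weights = pd.Series([115, 115.5, 125, 155, 170, 205, 206, 265, None])

# right=True (the default) makes each bin include its upper limit, so every
# non-null weight, including fractional ones, falls in exactly one bin.
weight_class = pd.cut(weights, bins=limits, labels=labels)
print(weight_class.tolist())
```

Missing weights stay missing in the result, so fighters without a recorded weight are not silently assigned to a division.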

In [None]:
# Calculate average height for each weight division
strawweight_avg_height = strawweight['Height (inches)'].mean()
flyweight_avg_height = flyweight['Height (inches)'].mean()
bantamweight_avg_height = bantamweight['Height (inches)'].mean()
featherweight_avg_height = featherweight['Height (inches)'].mean()
lightweight_avg_height = lightweight['Height (inches)'].mean()
welterweight_avg_height = welterweight['Height (inches)'].mean()
middleweight_avg_height = middleweight['Height (inches)'].mean()
lightheavyweight_avg_height = lightheavyweight['Height (inches)'].mean()
heavyweight_avg_height = heavyweight['Height (inches)'].mean()

#calculating average reach for each weight division
strawweight_avg_reach = strawweight['Reach (inches)'].mean()
flyweight_avg_reach = flyweight['Reach (inches)'].mean()
bantamweight_avg_reach = bantamweight['Reach (inches)'].mean()
featherweight_avg_reach = featherweight['Reach (inches)'].mean()
lightweight_avg_reach = lightweight['Reach (inches)'].mean()
welterweight_avg_reach = welterweight['Reach (inches)'].mean()
middleweight_avg_reach = middleweight['Reach (inches)'].mean()
lightheavyweight_avg_reach = lightheavyweight['Reach (inches)'].mean()
heavyweight_avg_reach = heavyweight['Reach (inches)'].mean()

#calculating the amount of fighters that have fought in each weight division
strawweight_count = len(strawweight)
flyweight_count = len(flyweight)
bantamweight_count = len(bantamweight)
featherweight_count = len(featherweight)
lightweight_count = len(lightweight)
welterweight_count = len(welterweight)
middleweight_count = len(middleweight)
lightheavyweight_count = len(lightheavyweight)
heavyweight_count = len(heavyweight)

#calculating the total number of fights (tnf) per weight division
strawweight_tnf = strawweight["Amount of Fights"].sum()
flyweight_tnf = flyweight["Amount of Fights"].sum()
bantamweight_tnf = bantamweight["Amount of Fights"].sum()
featherweight_tnf = featherweight["Amount of Fights"].sum()
lightweight_tnf = lightweight["Amount of Fights"].sum()
welterweight_tnf = welterweight["Amount of Fights"].sum()
middleweight_tnf = middleweight["Amount of Fights"].sum()
lightheavyweight_tnf = lightheavyweight["Amount of Fights"].sum()
heavyweight_tnf = heavyweight["Amount of Fights"].sum()

In [None]:
# Create a new dataframe with the calculations above
ufc_stats_df = pd.DataFrame({'Weight Division': ['Strawweight', 'Flyweight', 'Bantamweight', 'Featherweight', 'Lightweight', 'Welterweight', 'Middleweight', 'Lightheavyweight', 'Heavyweight'],
                              'Average Height (inches)': [strawweight_avg_height, flyweight_avg_height, bantamweight_avg_height, featherweight_avg_height, lightweight_avg_height, welterweight_avg_height, middleweight_avg_height, lightheavyweight_avg_height, heavyweight_avg_height],
                              'Average Reach (inches)': [strawweight_avg_reach, flyweight_avg_reach, bantamweight_avg_reach, featherweight_avg_reach, lightweight_avg_reach, welterweight_avg_reach, middleweight_avg_reach, lightheavyweight_avg_reach, heavyweight_avg_reach],
                             'Amount of Fighters': [strawweight_count, flyweight_count, bantamweight_count, featherweight_count, lightweight_count, welterweight_count, middleweight_count, lightheavyweight_count, heavyweight_count],
                             'Fights per Division': [strawweight_tnf, flyweight_tnf, bantamweight_tnf, featherweight_tnf, lightweight_tnf, welterweight_tnf, middleweight_tnf, lightheavyweight_tnf, heavyweight_tnf]})
print(ufc_stats_df)
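
The per-division variables above could also be collapsed into one `groupby` call, assuming the fighters table carries a column with each fighter's division. The sketch below uses made-up data; the `fighters` frame and its `Weight Division` column are hypothetical (such a column could be produced, for example, with `pd.cut` on the weight).

```python
import pandas as pd

# Toy fighters table; "Weight Division" is a hypothetical per-fighter column.
fighters = pd.DataFrame({
    "Weight Division": ["Flyweight", "Flyweight",
                        "Lightweight", "Lightweight", "Lightweight"],
    "Height (inches)": [64.0, 66.0, 69.0, 70.0, 71.0],
    "Reach (inches)": [65.0, 67.0, 70.0, 72.0, 74.0],
    "Amount of Fights": [10, 20, 15, 25, 5],
})

# One groupby with named aggregation replaces the per-division variables.
# "size" counts rows per group, matching len() on each filtered frame.
division_stats = (
    fighters.groupby("Weight Division")
    .agg(**{
        "Average Height (inches)": ("Height (inches)", "mean"),
        "Average Reach (inches)": ("Reach (inches)", "mean"),
        "Amount of Fighters": ("Height (inches)", "size"),
        "Fights per Division": ("Amount of Fights", "sum"),
    })
    .reset_index()
)
print(division_stats)
```

The resulting columns match those of `ufc_stats_df`, so the plotting cells would not need to change under this approach.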

#### Graph 4 - Amount of Fighters and Fights per Division by Weight Division

In [29]:
fig = px.bar(ufc_stats_df, x='Weight Division', y='Amount of Fighters', color_discrete_sequence=['forestgreen'], title='Amount of Fighters and Fights per Division by Weight Division')
fig.update_layout(xaxis={'categoryorder':'category ascending'})

fig.add_bar(x=ufc_stats_df['Weight Division'], y=ufc_stats_df['Fights per Division'], marker_color='indigo', name='Fights per Division')

fig.update_xaxes(title='Weight Division', tickangle=45)
fig.update_yaxes(title='Amount of Fighters', secondary_y=False)
fig.update_yaxes(title='Fights per Division', secondary_y=True, showgrid=False)

fig.show()

#### Graph 5 - Average Height and Reach by Weight Division

In [None]:
fig = px.line(ufc_stats_df, x="Weight Division", y="Average Height (inches)", color_discrete_sequence=['indigo'])
fig.add_scatter(x=ufc_stats_df["Weight Division"], y=ufc_stats_df["Average Height (inches)"], mode='markers', marker=dict(color='forestgreen'), name = 'Average Height')

fig.add_trace(px.line(ufc_stats_df, x="Weight Division", y="Average Reach (inches)", color_discrete_sequence=['forestgreen']).data[0])
fig.add_scatter(x=ufc_stats_df["Weight Division"], y=ufc_stats_df["Average Reach (inches)"], mode='markers', marker=dict(color='indigo'), name = 'Average Reach')

fig.update_layout(title="Average Height and Reach by Weight Division", xaxis_title="Weight Division", yaxis_title="Average Height/Reach (inches)", xaxis_tickangle=-45)
fig.show()

#### Final Data Overview - ufc_df

In [None]:
#run the profile report again for a comprehensive look at the data frame upon completing the analysis
profile = ProfileReport(ufc_df, title = "Exploratory Data Analysis of Fighter Data")
profile.to_notebook_iframe()

#### Final Data Overview - ufc_stats_df

In [None]:
#a final look at the ufc stats dataframe created earlier
profile_stats  = ProfileReport(ufc_stats_df, title = "Exploratory Data Analysis of Fighter Data")
profile_stats.to_notebook_iframe()

## Results

- Among the three most common stances, 'Switch' appears to be the most effective in this dataset: fighters who use it have the highest average win percentage and the lowest standard deviation, meaning their individual win percentages cluster most tightly around the stance's mean. 'Southpaw' ranks second, and 'Orthodox' (the most popular stance) ranks last.
  - Note that 'Orthodox' has the highest standard deviation of all the stances (excluding the sideways stance), even though it has the largest sample size. A larger sample makes the estimated mean more reliable, but it does not by itself shrink the spread, so fighters who use the orthodox stance simply show more varied win percentages around the mean. Many fighters' stances are not listed in the original dataset, so this analysis applies only to the fighters whose stance is listed.

- As shown in Graph 4, the flyweight and bantamweight divisions have the fewest fighters and fights, while the lightweight and welterweight divisions have the most. One plausible explanation is that lighter fighters generally have less knockout power and heavier fighters tire more easily, which can make for slower fights when no one wins before exhaustion sets in. Lightweight and welterweight fighters tend to balance power and cardio, which makes for longer and more exciting bouts and may explain their popularity with fans.
    - In fact, in the pound-for-pound rankings as of May 2023, only 3 of the top 10 fighters come from the flyweight, bantamweight, and heavyweight divisions, and all 3 are their divisions' champions. The only divisions with multiple fighters in the current top 10 pound-for-pound ranking list are welterweight and middleweight, with 2 champions each.
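
On the sample-size point in the stance comparison: the quantity that shrinks as a group grows is the standard error of the mean, not the standard deviation. The sketch below illustrates the difference with made-up win percentages (not taken from the dataset); `spread_and_standard_error` is a hypothetical helper.

```python
import math
import statistics

def spread_and_standard_error(values):
    """Return (sample standard deviation, standard error of the mean)."""
    sd = statistics.stdev(values)
    return sd, sd / math.sqrt(len(values))

# Made-up win percentages. The large group repeats the small group's values,
# so both groups have a similar spread but very different sizes.
small_group = [40.0, 50.0, 60.0, 70.0]
large_group = small_group * 25

sd_small, se_small = spread_and_standard_error(small_group)
sd_large, se_large = spread_and_standard_error(large_group)
print(f"small: sd={sd_small:.2f}, se={se_small:.2f}")
print(f"large: sd={sd_large:.2f}, se={se_large:.2f}")
```

Read this way, a large group such as 'Orthodox' can have a wide spread of individual results while its average win percentage is still estimated more precisely than that of a smaller group.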