# Analysis of UFC Fights


In [1]:
%load_ext nb_black
%load_ext autoreload
%autoreload 2

In [2]:
from itertools import repeat
from datetime import timedelta
import ufc_events_eda.utils.paths as path
import ufc_events_eda.visualization.visualize as viz
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import plotly.io as pio
import scipy.stats as stats

In [3]:
pd.set_option("display.precision", 2)

## Load datasets


In [4]:
df = pd.read_parquet(path.data_processed_dir("fights_processed.parquet"))
df_events = pd.read_parquet(path.data_processed_dir("events_processed.parquet"))
df.head()

| | fighter_1 | fighter_2 | fighter_1_kd | fighter_2_kd | fighter_1_str | fighter_2_str | fighter_1_td | fighter_2_td | fighter_1_sub | fighter_2_sub | weigh_class | method | method_detail | round | time | closure | is_main_event | event_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Chan Sung Jung | Dan Ige | 0.0 | 0.0 | 92.0 | 80.0 | 3.0 | 0.0 | 3.0 | 0.0 | Featherweight | U-DEC | | 5 | 5:00 | win | True | UFC Fight Night: Jung vs. Ige |
| 1 | Serghei Spivac | Aleksei Oleinik | 0.0 | 0.0 | 71.0 | 59.0 | 0.0 | 1.0 | 0.0 | 1.0 | Heavyweight | U-DEC | | 3 | 5:00 | win | False | UFC Fight Night: Jung vs. Ige |
| 2 | Marlon Vera | Davey Grant | 0.0 | 0.0 | 105.0 | 83.0 | 2.0 | 1.0 | 3.0 | 0.0 | Bantamweight | U-DEC | | 3 | 5:00 | win | False | UFC Fight Night: Jung vs. Ige |
| 3 | SeungWoo Choi | Julian Erosa | 1.0 | 0.0 | 13.0 | 10.0 | 0.0 | 0.0 | 0.0 | 0.0 | Featherweight | KO/TKO | Punch | 1 | 1:37 | win | False | UFC Fight Night: Jung vs. Ige |
| 4 | Bruno Silva | Wellington Turman | 0.0 | 0.0 | 19.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | Middleweight | KO/TKO | Punches | 1 | 4:45 | win | False | UFC Fight Night: Jung vs. Ige |


In [6]:
df.shape


(6548, 22)

## Add columns for further analysis


In [5]:
cols = ("str", "kd", "td", "sub")

for col in cols:
    df[f"total_{col}"] = df[f"fighter_1_{col}"] + df[f"fighter_2_{col}"]


In [6]:
# Elapsed time in the final round, parsed from the "M:SS" clock
df["fight_duration"] = pd.to_timedelta("00:" + df["time"])
# Add 5 minutes for every completed round before the final one
df["fight_duration"] += pd.to_timedelta((df["round"] - 1) * 5, unit="m")


## What is the distribution of the data like?


In [7]:
df[['total_str','total_kd','total_td', 'total_sub', 'fight_duration']].describe()


| | total_str | total_kd | total_td | total_sub | fight_duration |
|---|---|---|---|---|---|
| count | 6527.0 | 6527.0 | 6527.0 | 6527.0 | 6548 |
| mean | 69.33 | 0.43 | 2.13 | 0.81 | 0 days 00:10:24.276878436 |
| std | 55.8 | 0.66 | 2.22 | 1.23 | 0 days 00:06:02.728944682 |
| min | 0.0 | 0.0 | 0.0 | 0.0 | 0 days 00:00:05 |
| 25% | 26.0 | 0.0 | 0.0 | 0.0 | 0 days 00:04:31 |
| 50% | 57.0 | 0.0 | 2.0 | 0.0 | 0 days 00:12:54 |
| 75% | 97.0 | 1.0 | 3.0 | 1.0 | 0 days 00:15:00 |
| max | 578.0 | 5.0 | 22.0 | 13.0 | 0 days 00:25:00 |


About fight duration:
* The mean fight duration is about 10 minutes and 24 seconds
* The 75th percentile (15 minutes) and the maximum (25 minutes) correspond to full 3-round fights (the most common) and full 5-round fights, where every round lasts 5 minutes
* The minimum fight duration corresponds to Jorge Masvidal's 5-second KO of Ben Askren at UFC 239
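
The durations above combine the round in which a fight ended with the clock time inside that round. Here is a minimal sketch of that arithmetic in plain Python, assuming every round lasts 5 minutes and that the clock is formatted as `M:SS` (the helper name `elapsed_time` is hypothetical and not part of the notebook):

```python
from datetime import timedelta

ROUND_LENGTH = timedelta(minutes=5)  # assumption: every round lasts 5 minutes


def elapsed_time(round_number: int, clock: str) -> timedelta:
    """Total fight time for a bout that ended at `clock` (M:SS) of `round_number`."""
    minutes, seconds = (int(part) for part in clock.split(":"))
    completed_rounds = round_number - 1
    return completed_rounds * ROUND_LENGTH + timedelta(minutes=minutes, seconds=seconds)


# Full 3-round and 5-round fights give the 75th percentile and the maximum.
print(elapsed_time(3, "5:00"))  # 0:15:00
print(elapsed_time(5, "5:00"))  # 0:25:00
# A finish 5 seconds into round 1 gives the minimum.
print(elapsed_time(1, "0:05"))  # 0:00:05
```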

About "total" fight statistics:
* The maximum value of every statistic is much higher than its 75th percentile, so each statistic has extreme outliers.
* For each statistic, we look below at the fights that produced the most extreme values.
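
One way to make "much higher" concrete is Tukey's rule, which conventionally treats values above Q3 + 1.5 × IQR as outliers. The sketch below applies it to the quartiles and maxima copied from the `describe()` output above; using this particular rule is an assumption made for illustration, not something the notebook does.

```python
# Quartiles and maxima copied from the describe() output above.
summary = {
    "total_str": {"q1": 26.0, "q3": 97.0, "max": 578.0},
    "total_kd": {"q1": 0.0, "q3": 1.0, "max": 5.0},
    "total_td": {"q1": 0.0, "q3": 3.0, "max": 22.0},
    "total_sub": {"q1": 0.0, "q3": 1.0, "max": 13.0},
}


def upper_fence(q1: float, q3: float) -> float:
    """Tukey's upper fence: values above it are conventionally treated as outliers."""
    return q3 + 1.5 * (q3 - q1)


for name, s in summary.items():
    fence = upper_fence(s["q1"], s["q3"])
    print(f"{name}: upper fence = {fence}, max = {s['max']}, outlier = {s['max'] > fence}")
```

For every statistic the maximum lies well above the fence (for example, 578 against 203.5 for total significant strikes).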

In [19]:
colors = ("#D62C15", "#0056D6")
fig = make_subplots(
    rows=2,
    cols=2,
    vertical_spacing=0.2,
    subplot_titles=(
        "Significant strikes",
        "Knockdowns",
        "Takedowns",
        "Submission attempts",
    ),
)

cols = ("str", "kd", "td", "sub")
idx = 0
for i in range(2):
    for j in range(2):
        fig.add_trace(
            go.Box(
                y=df[f"fighter_1_{cols[idx]}"], name="Fighter 1", marker_color=colors[0]
            ),
            row=i + 1,
            col=j + 1,
        )
        fig.add_trace(
            go.Box(
                y=df[f"fighter_2_{cols[idx]}"], name="Fighter 2", marker_color=colors[1]
            ),
            row=i + 1,
            col=j + 1,
        )
        idx += 1

fig.update_layout(
    font_color="#555963",
    showlegend=False,
    plot_bgcolor="#fff",
    title="<span style='font-weight:bold'>Distribution of per fight statistics <br><span style='font-size:0.8rem; font-weight:normal'>Hover over the boxes to see details</span></span><br>",
)
fig.update_yaxes(mirror=True, showline=True, linecolor="grey")
fig.update_xaxes(mirror=True, showline=True, linecolor="grey")

Fighter 1 corresponds to the winner of the fight (if a winner was declared). Except for significant strikes, which look roughly normally distributed, every fight statistic is skewed to the right.

**About significant strikes**:
* In the two fights with the highest significant strikes (445 and 290 strikes landed by fighter 1), fighter 1 is Max Holloway, who is known for his pace and volume
* The "loser" with the most significant strikes landed (186) is Joanna Jędrzejczyk, in her 2020 Fight of the Year against Zhang Weili.

**About knockdowns**:
* The median of knockdowns is 0, which means that usually no knockdowns are scored in a fight.
* There are two fights where fighter 1 landed 5 knockdowns. Ironically, both ended in a judges' decision.
* Conversely, in two fights fighter 2 landed 3 knockdowns, which still was not enough to make fighter 2 the winner.

**About takedowns**:
* The median number of takedowns for fighter 1 is 1, which means that at least half of the winners score one or more takedowns.
* The record for takedowns in a single fight (21) belongs to Khabib Nurmagomedov.

**About submission attempts**:
* The fights that had the most submission attempts for fighter 1 and 2 did not end in a submission.


### What is the most common fight duration in rounds?


In [17]:
# Series.mode() returns the modes sorted in ascending order; take the first one
df["round"].mode().iloc[0]


3

Fights most commonly end in round <b style="color:#0056D6">3</b>, the final round of a standard three-round fight


## What is the most common conclusion for a fight?


In [19]:
by_closure = df.groupby(by="closure")["fighter_1"].count().sort_values()
by_closure[by_closure.index == "win"].sum() / by_closure.sum()


0.981979230299328

<b style="color:#0056D6; font-size:1.5rem">98%</b> of fights have a declared winner. 2% of fights end in either a draw or no contest.

In [18]:
df[df["method"].isin(("KO/TKO", "SUB"))]["fighter_1"].count() / df["fighter_1"].count()


0.5294746487477092

<b style="color:#0056D6; font-size:1.5rem">53%</b> of fights end in KO/TKO or submission, which means there is a high chance you finish or get finished if you fight in the UFC. However...

In [18]:
by_method = df.groupby(by="method")["fighter_1"].count().sort_values()
colors = (*list(repeat("#808696", 8)), "#0056D6")
fig = px.bar(y=by_method.index, x=by_method.values, orientation="h")
fig.update_layout(
    plot_bgcolor="#fff",
    title="<b><span style='color:#0056D6'>Unanimous decision (U-DEC)</span> is the most common result for a fight</b><br><span style='font-size:0.8rem'>Fight results in descending order</span>",
    yaxis_title=None,
    xaxis_title="Number of fights",
    font_color="#555963",
)
fig.update_traces(marker_color=colors)
fig.show()


### What are the most common fight-finishing methods?


In [22]:
by_method_detail = (
    df.groupby(by=["method_detail"])["fighter_1"].count().sort_values().iloc[-6:-1]
)
colors = (*list(repeat("#808696", 3)), *list(repeat("#0056D6", 2)))
fig = px.bar(y=by_method_detail.index, x=by_method_detail.values, orientation="h")
fig.update_layout(
    plot_bgcolor="#fff",
    title="<span style='font-weight:bold'><span style='color:#0056D6'>Punches <span style='color:#555963'>and</span> Rear Naked Chokes</span> are the primary methods for ending fights</span><br><span style='font-size:0.8rem'>Top 5 fight ending methods</span>",
    yaxis_title=None,
    xaxis_title="Number of fights",
    font_color="#555963"
)
fig.update_traces(marker_color=colors)
fig.show()


Two things stand out here. First, punches seem to be more effective than kicks (or rather, it is harder to become a good enough kicker than a good enough puncher). Second, the rear naked choke (RNC) has produced more finishes than the next two submissions (Guillotine and Armbar) combined, which says a lot about how effective the RNC is.

## How have the number of fights and the number of unique fighters in the UFC evolved over time?


By unique fighters I mean the number of fighters who fought at least once in a given year.

In [23]:
# Merge with events dataset to get the dates of the fights
df_merged = pd.merge(df, df_events, on="event_name", how="inner")
df_merged.set_index("event_date", inplace=True)
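
Because the merge is an inner join, any fight whose `event_name` has no exact match in the events table is dropped silently. Below is a minimal sketch of how one might check for that, using the `indicator=True` option of `DataFrame.merge` on toy frames. The rows are made up; the real `df` and `df_events` would take their place.

```python
import pandas as pd

# Toy stand-ins for df and df_events; the values are made up.
fights = pd.DataFrame(
    {
        "fighter_1": ["A", "C", "E"],
        "fighter_2": ["B", "D", "F"],
        "event_name": ["Event 1", "Event 1", "Event 3"],
    }
)
events = pd.DataFrame(
    {
        "event_name": ["Event 1", "Event 2"],
        "event_date": pd.to_datetime(["2020-01-18", "2020-02-08"]),
    }
)

# A left join with indicator=True adds a "_merge" column that records
# whether each fight found a matching event ("both") or not ("left_only").
checked = fights.merge(events, on="event_name", how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(f"{len(unmatched)} of {len(fights)} fights have no matching event")
print(unmatched[["fighter_1", "fighter_2", "event_name"]])
```

If the count is zero on the real data, the inner join loses nothing and the yearly totals below cover every fight.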

In [25]:
colors = ("#D62C15", "#0056D6")
count_by_year = df_merged.groupby(df_merged.index.year)["fighter_1"].count().iloc[:-1]
fighters = pd.concat((df_merged["fighter_1"], df_merged["fighter_2"]))
fighters_by_year = fighters.groupby(fighters.index.year).nunique().iloc[:-1]
fig = go.Figure()
fig.add_trace(go.Scatter(x=count_by_year.index, y=count_by_year.values, mode="lines"))
fig.add_trace(
    go.Scatter(x=fighters_by_year.index, y=fighters_by_year.values, mode="lines")
)
annotations = [
    dict(
        x=2024,
        y=fighters_by_year.values[-1],
        xanchor="right",
        yanchor="bottom",
        text="Unique Fighters",
        name="Unique Fighters",
        font=dict(size=14, color=colors[0]),
        showarrow=False,
    ),
    dict(
        x=2024,
        y=count_by_year.values[-1],
        xanchor="right",
        yanchor="bottom",
        text="Number of fights",
        name="Number of fights",
        font=dict(size=14, color=colors[1]),
        showarrow=False,
    ),
]

fig.update_layout(
    title=f"<b>The spread between the <span style='color:{colors[1]}'>number of fights</span> and <span style='color:{colors[0]}'>unique fighters</span> has increased</b>",
    xaxis_title="Year",
    plot_bgcolor="#fff",
    showlegend=False,
    annotations=annotations,
    font_color="#555963",
)
fig.update_xaxes(showline=True, linewidth=1, linecolor="lightgrey")
fig.update_yaxes(showline=True, linewidth=1, linecolor="lightgrey")


fig.show()


This shows that there is more "diversity" among the fighters who compete in recent years, as opposed to the earlier years, when the same fighters would fight more often. In other words, the UFC can now draw from a larger pool of fighters to stage the same number of fights.
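
Since every fight involves two fighters, the relationship between the two lines can be summarized as the average number of appearances per fighter in a year: 2 × fights / unique fighters. The sketch below computes this ratio on made-up bouts. The fighter labels, years, and the helper name `appearances_per_fighter` are hypothetical and not taken from the dataset.

```python
from collections import defaultdict

# Hypothetical bouts: (year, fighter_1, fighter_2). Not real data.
bouts = [
    (2001, "A", "B"),
    (2001, "A", "C"),
    (2001, "B", "C"),
    (2020, "D", "E"),
    (2020, "F", "G"),
    (2020, "H", "I"),
]


def appearances_per_fighter(bouts):
    """Return {year: average number of fights per unique fighter}."""
    fights = defaultdict(int)
    fighters = defaultdict(set)
    for year, fighter_1, fighter_2 in bouts:
        fights[year] += 1
        fighters[year].update((fighter_1, fighter_2))
    # Each fight contributes two appearances, one per fighter.
    return {year: 2 * fights[year] / len(fighters[year]) for year in fights}


print(appearances_per_fighter(bouts))  # {2001: 2.0, 2020: 1.0}
```

In this toy example both years have three fights, but the second year spreads them over twice as many fighters, so each fighter appears half as often.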

## How has the number of significant strikes changed through the years?

### Is there a difference between significant strikes in main-event and non-main event fights?


In [26]:
by_year = (
    df_merged.groupby(by=[df_merged.index.year, "is_main_event"])["total_str"]
    .median()
    .iloc[:-2]
    .reset_index()
)

fig = px.line(
    x=by_year["event_date"],
    y=by_year["total_str"],
    color=by_year["is_main_event"],
    color_discrete_sequence=("grey", "#0056D6"),
)
fig.update_layout(
    plot_bgcolor="#fff",
    title="<b><span style='color:#0056D6'>Main event</span> fights have more strikes than <span style='color:grey'>non-main event</span> fights</b><br><span style='font-size:0.85rem'>Median significant strikes per year of main event and non-main event fights</span>",
    showlegend=False,
    font_color="#555963",
)
fig.add_annotation(
    x=2011,
    y=77,
    xanchor="right",
    yanchor="bottom",
    text="All main events have been<br>five rounds since 2011",
    font=dict(size=14),
    arrowhead=4,
)
fig.update_xaxes(showline=True, linewidth=1, linecolor="lightgrey", title="Year")
fig.update_yaxes(
    showline=True, linewidth=1, linecolor="lightgrey", title="Median significant strikes"
)
fig.show()


There is a lot to unpack. First, the yearly median of significant strikes landed has been increasing. Second, main-event fights usually have more significant strikes, even though until 2011 only championship fights were scheduled for five rounds. It would be interesting to see how many fights were actually five-rounders up to that point. Lastly, the median of significant strikes reached new maxima in both 2020 and 2021. I am curious to see whether 2022 sets a new maximum.
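
As a starting point for the five-rounder question, one could compute, per year, the share of main events that reached round 4 or 5. This is only a sketch on a made-up frame standing in for `df_merged`, and it rests on an assumption: `round` appears to record the round in which the fight ended, not the number of scheduled rounds, so a five-round fight that was finished early would not be counted.

```python
import pandas as pd

# Toy stand-in for df_merged (indexed by event_date); the rows are made up.
toy = pd.DataFrame(
    {
        "round": [5, 2, 3, 4, 1, 5],
        "is_main_event": [True, True, False, True, True, False],
    },
    index=pd.to_datetime(
        [
            "2009-05-23",
            "2009-08-08",
            "2009-08-08",
            "2012-02-04",
            "2012-07-07",
            "2012-07-07",
        ]
    ),
)
toy.index.name = "event_date"

main_events = toy[toy["is_main_event"]]
# Assumption: "round" is the ending round, so only fights that reached
# round 4 or 5 can be identified as five-rounders.
went_past_three = main_events["round"] > 3
# The mean of a boolean Series is the share of True values in each year.
share_by_year = went_past_three.groupby(main_events.index.year).mean()

print(share_by_year)
```

A column with the scheduled number of rounds, if one were available, would answer the question directly and would also count early finishes.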

## Conclusion

First we looked at per-fight statistics. Apart from a few outliers, they show little variability. Then we found that (fortunately for the UFC) 98% of fights have a declared winner. A slight majority of fights (53%) end in either KO/TKO or submission, particularly by punches and chokes, yet the single most common result is a unanimous decision. We also discovered an interesting relationship between the yearly number of fights and the number of unique fighters. Finally, we found an upward trend in the yearly median of significant strikes landed.