# Analysis of UFC Events


In [1]:
%load_ext nb_black
%load_ext autoreload
%autoreload 2


In [2]:
import ufc_events_eda.utils.paths as path

# import ufc_events_eda.utils.preprocess as prep
import ufc_events_eda.visualization.visualize as viz
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import bar_chart_race as bcr

pio.renderers.default = "notebook_connected"


In [3]:
%%capture
%run './1-cvillafraz-data-processing.ipynb'


## Load events dataset


In [4]:
df = pd.read_parquet(path.data_processed_dir("events_processed.parquet"))
df.head()

Unnamed: 0,event_name,event_date,event_location,latitude,longitude
0,UFC Fight Night: Jung vs. Ige,2021-06-19,"Las Vegas, Nevada, USA",36.167256,-115.148516
1,UFC Fight Night: Gane vs. Volkov,2021-06-26,"Las Vegas, Nevada, USA",36.167256,-115.148516
2,UFC 264: Poirier vs. McGregor 3,2021-07-10,"Las Vegas, Nevada, USA",36.167256,-115.148516
3,UFC Fight Night: Makhachev vs. Moises,2021-07-17,"Las Vegas, Nevada, USA",36.167256,-115.148516
4,UFC Fight Night: Sandhagen vs. Dillashaw,2021-07-24,"Las Vegas, Nevada, USA",36.167256,-115.148516



In [5]:
df.set_index("event_date", inplace=True)
df.sort_index(inplace=True)


## What percentage of events took place in the U.S.?


In [10]:
print(f"Events out of the US: {round(df['latitude'].isna().sum()/len(df), 1)*100}%")

Events out of the US: 30.0%



Roughly <strong style="color:#003C8A">70%</strong> of UFC events took place in the US.
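
The share can also be cross-checked from the location string instead of the missing coordinates. The following is a minimal sketch on a toy frame, assuming (as in the sample rows above) that US locations end with `USA`; the rows are made up for illustration and are not the real dataset. Note that rounding a fraction to one decimal place before scaling limits the result to multiples of 10%, whereas a percent format spec keeps the precision you ask for.

```python
import pandas as pd

# Toy data for illustration only; not the real events dataset.
toy = pd.DataFrame(
    {
        "event_location": [
            "Las Vegas, Nevada, USA",
            "London, England, United Kingdom",
            "Houston, Texas, USA",
            "Abu Dhabi, United Arab Emirates",
        ]
    }
)

# The mean of a boolean Series is the fraction of True values.
us_share = toy["event_location"].str.endswith("USA").mean()

# The ".1%" format spec multiplies by 100 and rounds in one step.
print(f"Events in the US: {us_share:.1%}")
```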


## What is the proportion of numbered (PPV) vs non-numbered events?


Numbered UFC events are those with a number in their name, such as UFC 205. With a few exceptions, numbered events are pay-per-view (PPV) cards, so it is worth distinguishing them from non-numbered (non-PPV) events.


In [11]:
# Raw string so that \d reaches the regex engine without an invalid-escape warning
df["is_numbered"] = df["event_name"].str.match(r"UFC (?=\d)")
round(df["is_numbered"].sum() / len(df), 2)

0.45


<p>Numbered events (mostly PPVs) represent approximately <br> <strong style='color:#003C8A;font-size:3.5rem'>45%</strong><br>of all UFC events</p>
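
To see what the pattern matches, here is a small sketch on a handful of event names. The first two come from the sample rows above; the others are hypothetical and only serve to cover both cases.

```python
import pandas as pd

# The last two names are hypothetical examples for illustration.
names = pd.Series(
    [
        "UFC 264: Poirier vs. McGregor 3",
        "UFC Fight Night: Gane vs. Volkov",
        "UFC on Example: Fighter A vs. Fighter B",
        "UFC 205",
    ]
)

# str.match anchors at the start of each string. The lookahead (?=\d)
# requires a digit right after "UFC " without consuming it.
is_numbered = names.str.match(r"UFC (?=\d)")
print(is_numbered.tolist())
```

Because the match is anchored at the start, a digit later in the name (such as the trailing "3" in a trilogy fight) does not make a Fight Night card count as numbered.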


## How has the number of events evolved every year?


In [18]:
by_year = df.groupby(df.index.year)["event_name"].count().sort_index().iloc[:-1]

fig = go.Figure()
fig.add_trace(
    go.Scatter(
        x=by_year.index[:12], y=by_year[:12], mode="lines", marker_color="lightgrey"
    )
)
fig.add_trace(
    go.Scatter(
        x=by_year.index[11:], y=by_year[11:], mode="lines", marker_color="#003C8A"
    )
)
fig.update_layout(
    title="The number of UFC events by year has been increasing since <span style='color:#003C8A;font-weight:bold'>2005</span>",
    xaxis_showgrid=False,
    yaxis_showgrid=False,
    plot_bgcolor="#fff",
    title_font=dict(color="#404b69"),
    font=dict(color="grey"),
    showlegend=False,
)
fig.update_xaxes(linecolor="lightgrey", title="Year")
fig.update_yaxes(linecolor="lightgrey", title="Number of events")
fig.show()


As shown, the UFC started holding more events in 2005. That year saw the launch of the first season of The Ultimate Fighter, a competitive reality show intended to sign new fighters. The Ultimate Fighter 1 Finale, held in April, featured a fight between Forrest Griffin and Stephan Bonnar. Many consider this legendary fight the catalyst of the UFC's later success, which helps explain why the number of events started to grow in 2005.

In addition, the number of events was barely affected by the COVID pandemic. The UFC was the first major sports organization to resume activities after the COVID outbreak.
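
The yearly counts behind the chart can be sketched on a toy frame as follows. The dates are made up, and the reason for dropping the last year is an assumption: the notebook does not state it, but the most recent year was presumably incomplete when the data was collected.

```python
import pandas as pd

# Toy event dates for illustration only.
dates = pd.to_datetime(
    [
        "2019-03-02",
        "2019-07-06",
        "2020-05-09",
        "2020-07-11",
        "2020-10-24",
        "2021-01-16",
    ]
)
toy = pd.DataFrame({"event_name": list("abcdef")}, index=dates)

# Count events per calendar year; groupby sorts the year keys by default.
per_year = toy.groupby(toy.index.year)["event_name"].count()

# Drop the last year (assumed incomplete) so it does not look like a dip.
complete_years = per_year.iloc[:-1]
print(complete_years)
```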


## What are the US cities with the most events?


In [12]:
by_city = (
    df[df["event_location"].str.contains("USA")]
    .groupby(by=["event_location", "latitude", "longitude"])["event_name"]
    .count()
    .reset_index()
    .sort_values("event_name", ascending=False)
)
print(
    f"{by_city.iloc[0]['event_location']}: {round(by_city.iloc[0].event_name / len(df), 2)}"
)

Las Vegas, Nevada, USA: 0.31



Las Vegas is by far the city with the most UFC events: <strong style='color:#003C8A'>31%</strong> of all events have taken place there. Because it would dominate any comparison, Las Vegas is excluded from the remaining city-level analyses.
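
Note that this share is computed against all events, not only US events. A minimal sketch of the same calculation on made-up data, using `value_counts` rather than the `groupby` above:

```python
import pandas as pd

# Toy data for illustration only.
toy = pd.DataFrame(
    {
        "event_location": ["Las Vegas, Nevada, USA"] * 3
        + ["Houston, Texas, USA"] * 2
        + ["London, England, United Kingdom"]
    }
)

# Keep US cities only, but divide by the total number of events.
us = toy[toy["event_location"].str.contains("USA")]
share = us["event_location"].value_counts() / len(toy)

# value_counts sorts in descending order, so the first entry is the top city.
top_city = share.index[0]
print(f"{top_city}: {share.iloc[0]:.2f}")
```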


In [20]:
by_city.drop_duplicates(subset=["latitude"], inplace=True)
fig = px.scatter_geo(
    by_city.iloc[1:],
    lat="latitude",
    lon="longitude",
    hover_name="event_location",
    projection="albers usa",
    color="event_name",
    size="event_name",
    color_continuous_scale="blues",
)

fig.update_geos(
    landcolor="lightgrey",
    showcountries=True,
    showsubunits=True,
    scope="usa",
    resolution=110,
)
fig.update_layout(
    title="<span style='color:rgb(8,48,107); font-weight:bold'>Anaheim, Houston, Atlantic City</span> among the US cities with most UFC events",
    coloraxis_colorbar=dict(
        title="Count of events", ticklabelposition="outside bottom"
    ),
)
fig.show()


As we will see in the next section, Atlantic City used to be one of the most important cities for the UFC. Houston, on the other hand, has become more attractive to the UFC in recent years because of its more flexible COVID restrictions.


## What have been the most important (global) cities through the years?

To answer this, let's plot a bar chart race of the cumulative number of events per city and year.


In [21]:
# The bar_chart_race method requires a df in wide format. Hence, I make a pivot table with locations as columns,
# year as index and the cumulative count of events per year and location as values
df["event_bool"] = df["event_name"].astype("bool")
df_wo_vegas = df[~df["event_location"].str.contains("Vegas")]
df_long = (
    pd.pivot_table(
        data=df_wo_vegas,
        values="event_bool",
        columns="event_location",
        index=df_wo_vegas.index.year,
        aggfunc="sum",
    )
    .cumsum()
    .ffill()
)


In [22]:
%%capture
bcr.bar_chart_race(
    df=df_long,
    filename=str(path.reports_figures_dir('events_by_city.mp4')),
    n_bars=8,
    filter_column_colors=True,
    period_fmt="{x:.0f}",
    title="Events by cities",
    cmap="tab20b",
    period_length=800
)


In the early years, Birmingham, Alabama was the most important city. Atlantic City, New Jersey took over in 2003. Around 2009, London, Anaheim and Montreal started catching up. In 2015, Rio de Janeiro and Sao Paulo became more relevant (Brazil is the most important Latin American country in MMA). In 2017, London took first place, followed by Rio and Sao Paulo. <br><br>
As we all know, the world went crazy in 2020, which forced the UFC to hold events only in the US and on Yas Island, Abu Dhabi (known as "Fight Island"). This is why Abu Dhabi took first place in 2020 and has since been the city with the most total UFC events outside Las Vegas. Moreover, as stated in the previous section, Houston is now in the top 7.
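
The table fed to the bar chart race has one row per year and one column per city, holding running totals. The sketch below builds such a table from made-up data. It is a variant of the cell above, not a copy: it fills missing year/city combinations with 0 before the cumulative sum, whereas the notebook applies `cumsum().ffill()`, which leaves `NaN` for years before a city's first event.

```python
import pandas as pd

# Toy data for illustration only; dates and counts are made up.
toy = pd.DataFrame(
    {
        "event_location": ["London", "Montreal", "London", "Montreal", "London"],
        "n_events": [1, 1, 1, 1, 1],
    },
    index=pd.to_datetime(
        ["2009-02-21", "2010-05-08", "2010-10-16", "2011-04-30", "2011-11-05"]
    ),
)

wide = (
    pd.pivot_table(
        data=toy,
        values="n_events",
        columns="event_location",
        index=toy.index.year,  # one row per calendar year
        aggfunc="sum",
    )
    .fillna(0)  # a year without events in a city contributes zero
    .cumsum()  # running total per city
)
print(wide)
```

With zeros in place of `NaN`, every city has a defined value in every year, at the cost of showing cities with no events yet as zero-length bars.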


## Conclusion

While the U.S. (especially Las Vegas) remains the most important market for the UFC, the organization has been expanding into new territories. This expansion, however, was delayed by the COVID pandemic, and I expect it to resume once most countries lift COVID-related travel restrictions. As shown above, the number of events per year has grown significantly since 2005, and that growth was barely affected by the pandemic.
