# Business Report: UFC Fighter Trends and Insights (Marketing / Promotional Focus)
In this project, I analyze UFC fight data to gain insights into fighter performance, trends in fighter metrics, and fight outcomes by weight class. 

## Key Business Questions

**Which fighters show consistent improvement over time?**
  - This can identify fighters who could become future stars and are worth promoting heavily.


**Which weight classes have the most competitive fights?**
  - This can inform decisions on which weight classes to market as having exciting matchups.


**Are there any emerging trends in fighter metrics that could affect the evolution of the sport?**
  - This would be useful for adapting promotional content or strategy.


**Which fight outcomes are most common, and how do they vary by weight class?**
  - Insight into outcome patterns can guide marketing narratives.

## Dataset downloaded from Kaggle: [UFC Complete Dataset 1996-2024](https://www.kaggle.com/datasets/maksbasher/ufc-complete-dataset-all-events-1996-2024/data)

### This dataset contains the following:

    • Fighter stats - The folder contains 2 files: a .csv with duplicates removed, and the .txt file used as the source for the fighter stats dataset
    
    • Large set - The folder contains the biggest dataset yet (contains 7439 rows and 94 columns)
    
    • Medium set - The folder contains the medium dataset for basic tasks (contains 7582 rows and 19 columns)
    
    • Small set - The folder contains the small dataset with data about completed and upcoming events with only 683 rows and 3 columns
    
    • Urls - The folder contains all the URLs that were parsed to get the data from the UFCstats website


---


## Load and display the data

*In this step, I first import the package needed to load the data and conduct initial exploratory analysis. Here that is the Python 'pandas' library, imported under the conventional alias 'pd'.*
*Then I read the csv files into dataframes using 'pd.read_csv(...)' and assign them meaningful variable names that are used to access the data later.*

---


In [13]:
import pandas as pd
# load each csv into its own dataframe
fighter_stats = pd.read_csv('../data/Fighter stats/fighter_stats.csv')

large_set = pd.read_csv('../data/Large set/large_dataset.csv')

medium_set = pd.read_csv('../data/Medium set/medium_dataset.csv')

complete_events = pd.read_csv('../data/Small set/completed_events_small.csv')

upcoming_events = pd.read_csv('../data/Small set/upcoming_events_small.csv')


In [14]:
# view the first five rows of each dataframe (size and datatypes are checked in later cells)

print('Fighter Stats: \n', fighter_stats.head())
print('\n' * 2) # line for separation
print('Large Set: \n', large_set.head())
print('\n' * 2) # line for separation
print('Medium Set: \n', medium_set.head())
print('\n' * 2) # line for separation
print('Completed Events: \n', complete_events.head())
print('\n' * 2) # line for separation
print('Upcoming Events: \n', upcoming_events.head())

Fighter Stats: 
                name  wins  losses  height  weight   reach    stance   age  \
0      Amanda Ribas  12.0     5.0  160.02   56.70  167.64  Orthodox  30.0   
1    Rose Namajunas  13.0     6.0  165.10   56.70  165.10  Orthodox  31.0   
2     Karl Williams  10.0     1.0  190.50  106.59  200.66  Orthodox  34.0   
3       Justin Tafa   7.0     4.0  182.88  119.75  187.96  Southpaw  30.0   
4  Edmen Shahbazyan  13.0     4.0  187.96   83.91  190.50  Orthodox  26.0   

   SLpM  sig_str_acc  SApM  str_def  td_avg  td_acc  td_def  sub_avg  
0  4.63         0.40  3.40     0.61    2.07    0.51    0.85      0.7  
1  3.69         0.41  3.51     0.63    1.38    0.47    0.59      0.5  
2  2.87         0.52  1.70     0.60    4.75    0.50    1.00      0.2  
3  4.09         0.54  5.02     0.47    0.00    0.00    0.50      0.0  
4  3.60         0.52  4.09     0.45    2.24    0.38    0.63      0.6  



Large Set: 
                              event_name          r_fighter        b_fighter  \


---

*After looking over the data in all of these files, I will focus on the Large Set and the Small Set. The Large Set contains fighter metrics as well as event names and outcomes, but does not include the date or location of the event. The Small Set contains the event name, date and location for completed events in one file, and for upcoming events in another file.*

*By connecting the completed events data from the Small Set with the event and fighter data in the Large Set, I will be able to measure fighter improvements over time.*

---


## Initial exploration of datasets - Large Set, Completed Events

---

 *In this step, I want to understand the types and structure of the data. This is where I check for missing values, duplicate rows, and whether there are inconsistencies in formatting. All of these determine my next steps in cleaning / manipulating the dataframes.*

In [28]:
print('Large Set Data Types: \n', large_set.dtypes)

Large Set Data Types: 
 event_name             object
r_fighter              object
b_fighter              object
winner                 object
weight_class           object
                       ...   
td_acc_total_diff     float64
str_def_total_diff    float64
td_def_total_diff     float64
sub_avg_diff          float64
td_avg_diff           float64
Length: 95, dtype: object


In [29]:
print('Completed Events Data Types: \n', complete_events.dtypes)

Completed Events Data Types: 
 event       object
date        object
location    object
dtype: object


---

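*The text columns (event name, fighter names, winner and weight class, for example) are stored with the generic 'object' dtype, which is how pandas holds strings by default, so I may convert them to an explicit string type later. The 'date' column in Completed Events is also 'object', so it will need to be parsed as a datetime before fights can be ordered over time. For now, I'll continue to explore the data.*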
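
A minimal sketch of what that conversion could look like. It uses a small toy dataframe in place of `complete_events`; the event names, locations and the date format are made up for illustration, so the real file may need a different approach to parsing.

```python
import pandas as pd

# Toy stand-in for complete_events. The column names match the dtypes
# output above, but the values and the date format are assumptions.
events = pd.DataFrame({
    "event": ["Event A", "Event B"],
    "date": ["March 23, 2024", "March 16, 2024"],
    "location": ["City X", "City Y"],
})

# Move text columns from the generic object dtype to pandas' string dtype.
events = events.astype({"event": "string", "location": "string"})

# Parse dates; values that cannot be parsed become NaT instead of raising.
events["date"] = pd.to_datetime(events["date"], errors="coerce")

print(events.dtypes)
print("Unparsed dates:", int(events["date"].isna().sum()))
```

Checking the count of unparsed dates afterwards is a cheap way to confirm that `errors="coerce"` did not silently drop anything.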

---

*Check for missing values and duplicate rows.*

---

In [26]:
print('Missing values in Large Set: \n', large_set.isnull().sum())

Missing values in Large Set: 
 event_name            0
r_fighter             0
b_fighter             0
winner                0
weight_class          0
                     ..
td_acc_total_diff     0
str_def_total_diff    0
td_def_total_diff     0
sub_avg_diff          0
td_avg_diff           0
Length: 95, dtype: int64


In [27]:
print('Missing values in Completed Events: \n', complete_events.isnull().sum())

Missing values in Completed Events: 
 event       0
date        0
location    0
dtype: int64


In [24]:
# count fully duplicated rows in each dataframe
print('Duplicates found in Large Set: \n', large_set.duplicated().sum())
print('Duplicates found in Completed Events: \n', complete_events.duplicated().sum())

Duplicates found in Large Set: 
 0
Duplicates found in Completed Events: 
 0
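
The Large Set printout above is truncated (the `..` row), so it only shows the first and last five of its 95 columns. One way to make sure no column is hidden by the truncation is to keep only the columns whose missing count is above zero. The sketch below runs on a toy dataframe; `metric_1` and `metric_2` are hypothetical column names, not columns from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for large_set; only event_name is a real column name.
df = pd.DataFrame({
    "event_name": ["Event A", "Event B", "Event C"],
    "metric_1": [1.0, np.nan, 3.0],   # hypothetical column with one gap
    "metric_2": [0.5, 0.7, 0.9],      # hypothetical column with no gaps
})

missing = df.isnull().sum()

# Keep only columns with at least one missing value, so the result
# stays short even for a wide dataframe.
cols_with_missing = missing[missing > 0]

print(cols_with_missing)
print("Total missing cells:", int(missing.sum()))
```

An empty result from the filter would confirm that every column is complete.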


---

## Cleaning the data

---

The first step is to combine the date and location data from Completed Events with the Large Set, matching on the event name ('event_name' in the Large Set, 'event' in Completed Events).

In [30]:
# connect the 2 dataframes. 
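
One possible way to connect them is a left join on the event name, sketched here on toy dataframes. The column names `event_name`, `r_fighter`, `b_fighter`, `event`, `date` and `location` come from the outputs above; the values are made up, and the whitespace stripping is an assumption about how the names might differ between files.

```python
import pandas as pd

# Toy stand-ins for large_set and complete_events (values are invented).
large = pd.DataFrame({
    "event_name": ["Event A", "Event A", "Event B "],
    "r_fighter": ["Fighter 1", "Fighter 3", "Fighter 5"],
    "b_fighter": ["Fighter 2", "Fighter 4", "Fighter 6"],
})
events = pd.DataFrame({
    "event": ["Event A", "Event B"],
    "date": ["March 23, 2024", "March 16, 2024"],
    "location": ["City X", "City Y"],
})

# Strip stray whitespace so otherwise identical names still match.
large["event_name"] = large["event_name"].str.strip()
events["event"] = events["event"].str.strip()

# Left join keeps every fight. validate="m:1" raises an error if an
# event name appears more than once in the events table.
merged = large.merge(
    events,
    how="left",
    left_on="event_name",
    right_on="event",
    validate="m:1",
    indicator=True,
)

# Events in the fights table that found no match in the events table.
unmatched = merged.loc[merged["_merge"] == "left_only", "event_name"].unique()
print("Events without a match:", len(unmatched))

# Drop the helper columns once the join has been checked.
merged = merged.drop(columns=["event", "_merge"])
print(merged.head())
```

A left join is used so that no fight is dropped if its event name fails to match; the `indicator` column makes those failures visible instead of leaving silent gaps in `date` and `location`.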

---

## Business Questions

*In this section, I clean, format, and join the data as needed to answer the business questions outlined at the start of this project. Each question has its own section so that it is easy to follow.*

---

1. Which fighters show consistent improvement over time?
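
As a rough sketch of how this question might be approached once every fight has a date: reshape the data to one row per fighter per fight, order each fighter's fights by date, and look at the fight-to-fight change in a metric. Everything in the example is hypothetical, including the column names `fighter`, `date` and `sig_str_acc`, the values, and the choice of "share of fights with a positive change" as a measure of consistency.

```python
import pandas as pd

# Hypothetical long-format table: one row per fighter per fight.
# Column names and values are placeholders, not taken from the Large Set.
fights = pd.DataFrame({
    "fighter": ["Fighter 1"] * 3 + ["Fighter 2"] * 3,
    "date": pd.to_datetime([
        "2022-01-15", "2022-08-20", "2023-03-04",
        "2022-02-12", "2022-09-10", "2023-04-08",
    ]),
    "sig_str_acc": [0.40, 0.45, 0.52, 0.50, 0.44, 0.47],
})

# Order each fighter's fights chronologically.
fights = fights.sort_values(["fighter", "date"])

# Change in the metric from one fight to the next, per fighter.
# The first fight of each fighter has no previous value, so it is NaN.
fights["change"] = fights.groupby("fighter")["sig_str_acc"].diff()

# Summarise: average change, and the share of fights that improved.
summary = fights.groupby("fighter")["change"].agg(
    mean_change="mean",
    share_improved=lambda s: (s.dropna() > 0).mean(),
)
print(summary)
```

A fighter with a high `share_improved` and a positive `mean_change` would be a candidate for "consistent improvement" under this definition; other definitions, such as a fitted trend per fighter, are equally possible.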




