# Business Report: UFC Fighter Trends and Insights (Marketing / Promotional Focus)
In this project, I analyze UFC fight data to gain insights into fighter performance, trends in fighter metrics, and fight outcomes by weight class. 

## Key Business Questions:

**Which fighters show consistent improvement over time?**
  - This can identify fighters who could be future stars and are worth promoting heavily.

**Which weight classes have the most competitive fights?**
  - This can inform decisions on which weight classes to market as having exciting matchups.

**Are there any emerging trends in fighter metrics that could affect the evolution of the sport?**
  - This would be useful for adapting promotional content or strategy.

**Which fight outcomes are most common, and how do they vary by weight class?**
  - Insight into outcome patterns can guide marketing narratives.

## Dataset downloaded from Kaggle: [UFC Complete Dataset 1996-2024](https://www.kaggle.com/datasets/maksbasher/ufc-complete-dataset-all-events-1996-2024/data)

### This dataset contains the following:

- **Fighter stats**: two files, a .csv with duplicates removed and the .txt source it was built from.
- **Large set**: the largest dataset (7439 rows and 94 columns).
- **Medium set**: a medium-sized dataset for basic tasks (7582 rows and 19 columns).
- **Small set**: data about completed and upcoming events, with only 683 rows and 3 columns.
- **Urls**: all the URLs that were parsed to collect the data from the UFCstats website.




## - Load and display the data

*In this step, we first import the package needed to load the data and run the initial exploratory analysis: the Python 'pandas' library, imported under the alias 'pd'.*
*Then we read the csv files into dataframes using 'pd.read_csv(...)' and assign them meaningful variable names that we will use to access the data later.*

In [1]:
import pandas as pd
# load each csv into its own dataframe
fighter_stats = pd.read_csv('../data/Fighter stats/fighter_stats.csv')

large_set = pd.read_csv('../data/Large set/large_dataset.csv')

medium_set = pd.read_csv('../data/Medium set/medium_dataset.csv')

complete_events = pd.read_csv('../data/Small set/completed_events_small.csv')

upcoming_events = pd.read_csv('../data/Small set/upcoming_events_small.csv')


In [2]:
# view the head of each dataframe and explore the size and datatypes

print('Fighter Stats: \n', fighter_stats.head())
print('\n' * 2) # line for separation
print('Large Set: \n', large_set.head())
print('\n' * 2) # line for separation
print('Medium Set: \n', medium_set.head())
print('\n' * 2) # line for separation
print('Completed Events: \n', complete_events.head())
print('\n' * 2) # line for separation
print('Upcoming Events: \n', upcoming_events.head())

Fighter Stats: 
                name  wins  losses  height  weight   reach    stance   age  \
0      Amanda Ribas  12.0     5.0  160.02   56.70  167.64  Orthodox  30.0   
1    Rose Namajunas  13.0     6.0  165.10   56.70  165.10  Orthodox  31.0   
2     Karl Williams  10.0     1.0  190.50  106.59  200.66  Orthodox  34.0   
3       Justin Tafa   7.0     4.0  182.88  119.75  187.96  Southpaw  30.0   
4  Edmen Shahbazyan  13.0     4.0  187.96   83.91  190.50  Orthodox  26.0   

   SLpM  sig_str_acc  SApM  str_def  td_avg  td_acc  td_def  sub_avg  
0  4.63         0.40  3.40     0.61    2.07    0.51    0.85      0.7  
1  3.69         0.41  3.51     0.63    1.38    0.47    0.59      0.5  
2  2.87         0.52  1.70     0.60    4.75    0.50    1.00      0.2  
3  4.09         0.54  5.02     0.47    0.00    0.00    0.50      0.0  
4  3.60         0.52  4.09     0.45    2.24    0.38    0.63      0.6  



Large Set: 
                              event_name          r_fighter        b_fighter  \

## After looking over the data in all of these files, I will be focusing on the Large Set and Fighter Stats. These have the most comprehensive data, and cover all fighter info, the events they were in, and win/loss information.
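Since `.head()` does not show how large each file is, a quick side-by-side of row and column counts can help confirm which dataframes are the most comprehensive. The sketch below is one possible way to do that. The helper name `summarize_shapes` and the tiny stand-in frames are illustrative assumptions, not part of the original notebook; in practice you would pass the dataframes loaded in `In [1]`.

```python
import pandas as pd


def summarize_shapes(frames):
    """Return one row per dataframe with its row and column counts."""
    rows = [
        {"dataset": name, "rows": df.shape[0], "columns": df.shape[1]}
        for name, df in frames.items()
    ]
    return pd.DataFrame(rows).set_index("dataset")


# Tiny made-up frames so the sketch runs on its own.
# In the notebook, use e.g. {"fighter_stats": fighter_stats, "large_set": large_set}.
demo_frames = {
    "fighter_stats": pd.DataFrame(
        {"name": ["Amanda Ribas", "Rose Namajunas"], "wins": [12.0, 13.0]}
    ),
    "large_set": pd.DataFrame(
        {"event_name": ["Event A"], "r_fighter": ["Fighter X"], "b_fighter": ["Fighter Y"]}
    ),
}

shape_summary = summarize_shapes(demo_frames)
print(shape_summary)
```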

## -- Fighter Stats df initial exploration

In [3]:
# Check datatypes, get summary statistics for numeric columns, and check for null values.

print('Column Data Types: \n', fighter_stats.dtypes)

Column Data Types: 
 name            object
wins           float64
losses         float64
height         float64
weight         float64
reach          float64
stance          object
age            float64
SLpM           float64
sig_str_acc    float64
SApM           float64
str_def        float64
td_avg         float64
td_acc         float64
td_def         float64
sub_avg        float64
dtype: object


### The 'name' and 'stance' columns have dtype object, which is how this pandas version stores text by default. They may be more useful converted to the dedicated string dtype.
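If the conversion turns out to be worthwhile, one way to do it is with `DataFrame.astype` and the `"string"` dtype. This is only a sketch on a small stand-in frame (the rows are copied from the `head()` output above); it is not necessarily the approach the rest of the analysis takes, and string methods such as `.str.lower()` also work on object columns without converting.

```python
import pandas as pd

# Stand-in for fighter_stats, using a few values from the head() output above.
demo = pd.DataFrame(
    {
        "name": ["Amanda Ribas", "Justin Tafa"],
        "stance": ["Orthodox", "Southpaw"],
        "wins": [12.0, 7.0],
    }
)

# Convert only the text columns; numeric columns are left untouched.
converted = demo.astype({"name": "string", "stance": "string"})
print(converted.dtypes)
```

In the notebook, the same call would be applied to `fighter_stats` and the result assigned back to it.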

In [4]:
print('Summary stats of Fighter Stats: \n', fighter_stats.describe())

Summary stats of Fighter Stats: 
               wins       losses       height       weight        reach  \
count  2478.000000  2478.000000  2478.000000  2478.000000  1823.000000   
mean     14.399112     6.111380   178.384262    76.876852   182.035656   
std       9.853474     4.548011     8.851777    17.976646    10.654129   
min       0.000000     0.000000   152.400000    52.160000   147.320000   
25%       9.000000     3.000000   172.720000    65.770000   175.260000   
50%      13.000000     5.000000   177.800000    77.110000   182.880000   
75%      18.000000     8.000000   185.420000    83.910000   190.500000   
max     253.000000    53.000000   210.820000   349.270000   213.360000   

               age         SLpM  sig_str_acc         SApM      str_def  \
count  2318.000000  2478.000000  2478.000000  2478.000000  2478.000000   
mean     37.727783     2.932623     0.406029     3.388567     0.488910   
std       7.251188     1.736693     0.151297     2.027258     0.162305   
min

In [5]:
print('Check for null values: \n', fighter_stats.isnull().sum())

Check for null values: 
 name             1
wins             1
losses           1
height           1
weight           1
reach          656
stance          78
age            161
SLpM             1
sig_str_acc      1
SApM             1
str_def          1
td_avg           1
td_acc           1
td_def           1
sub_avg          1
dtype: int64



### Every column has at least one null value, but three columns have a significant number of missing values: reach (656), stance (78), and age (161).
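The single null in every other column would be consistent with one completely empty row, although the output above does not prove it. A sketch of how that could be checked, and how the share of missing values per column could be reported afterwards, is shown below. The stand-in frame is made up for illustration, and dropping the empty row is an assumption about how one might proceed, not a step taken in the original notebook.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for fighter_stats: the last row is entirely empty.
demo = pd.DataFrame(
    {
        "name": ["Amanda Ribas", "Karl Williams", None],
        "reach": [167.64, np.nan, np.nan],
        "age": [30.0, 34.0, np.nan],
    }
)

# Count rows where every column is null.
all_null_rows = int(demo.isna().all(axis=1).sum())

# Drop only those fully empty rows; partially filled rows are kept.
cleaned = demo.dropna(how="all")

# Fraction of missing values per column, largest first.
missing_share = cleaned.isna().mean().sort_values(ascending=False)

print("Fully empty rows:", all_null_rows)
print(missing_share)
```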

In [8]:
# Check for duplicates
print('Duplicates found: \n', fighter_stats.duplicated().sum())

Duplicates found: 
 0
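Note that `duplicated()` with no arguments only flags rows that match in every column. As an optional extra check, duplicates could also be looked for on the fighter name alone, after normalizing case and whitespace. The sketch below uses made-up rows; any hits would need manual review, since two different fighters can share a name.

```python
import pandas as pd

# Made-up stand-in: the first two rows differ only in case and whitespace.
demo = pd.DataFrame(
    {
        "name": ["Amanda Ribas", "amanda ribas ", "Justin Tafa"],
        "wins": [12.0, 12.0, 7.0],
    }
)

# Exact full-row duplicates, as in the cell above.
exact_dupes = int(demo.duplicated().sum())

# Name-only duplicates after trimming whitespace and lowercasing.
normalized_names = demo["name"].str.strip().str.lower()
name_dupes = normalized_names.duplicated(keep=False)

print("Exact duplicates:", exact_dupes)
print(demo[name_dupes])
```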


## -- Large Set initial exploration

In [9]:
# Check datatypes, get summary statistics for numeric columns, and check for null values.

print('Column Data Types: \n', large_set.dtypes)

Column Data Types: 
 event_name             object
r_fighter              object
b_fighter              object
winner                 object
weight_class           object
                       ...   
td_acc_total_diff     float64
str_def_total_diff    float64
td_def_total_diff     float64
sub_avg_diff          float64
td_avg_diff           float64
Length: 95, dtype: object


Some of these object columns may need to be converted to the string dtype, just like those in the fighter_stats dataframe.
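Because the dtype listing is truncated, it can help to pull out the non-numeric columns explicitly before deciding what to convert. The following is a minimal sketch on a made-up frame that borrows a few column names from the Large Set; the values are placeholders. Note that converting every non-numeric column to string is an assumption: some, such as dates, may deserve a different dtype.

```python
import pandas as pd

# Made-up stand-in using a few Large Set column names; values are placeholders.
demo = pd.DataFrame(
    {
        "event_name": ["Event A", "Event B"],
        "winner": ["Red", "Blue"],
        "is_title_bout": [0, 1],
        "time_sec": [300.0, 149.0],
    }
)

# List every column that is not numeric.
text_cols = demo.select_dtypes(exclude="number").columns.tolist()
print("Non-numeric columns:", text_cols)

# Convert those columns to the dedicated string dtype.
demo[text_cols] = demo[text_cols].astype("string")
print(demo.dtypes)
```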

In [10]:
print('Summary stats of the Large Set: \n', large_set.describe())

Summary stats of the Large Set: 
        is_title_bout  finish_round  total_rounds     time_sec         r_kd  \
count    7439.000000   7439.000000   7408.000000  7439.000000  7439.000000   
mean        0.055787      2.336336      3.128915   227.016669     0.249227   
std         0.229525      1.015243      0.652739    98.169665     0.524210   
min         0.000000      1.000000      1.000000     5.000000     0.000000   
25%         0.000000      1.000000      3.000000   149.000000     0.000000   
50%         0.000000      3.000000      3.000000   287.000000     0.000000   
75%         0.000000      3.000000      3.000000   300.000000     0.000000   
max         1.000000      5.000000      5.000000  1080.000000     5.000000   

         r_sig_str  r_sig_str_att  r_sig_str_acc        r_str    r_str_att  \
count  7439.000000    7439.000000    7439.000000  7439.000000  7439.000000   
mean     38.361204      83.786262       0.475335    58.199892   106.374916   
std      32.871278      71.38

In [11]:
print('Check for null values: \n', large_set.isnull().sum())

Check for null values: 
 event_name            0
r_fighter             0
b_fighter             0
winner                0
weight_class          0
                     ..
td_acc_total_diff     0
str_def_total_diff    0
td_def_total_diff     0
sub_avg_diff          0
td_avg_diff           0
Length: 95, dtype: int64
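With 95 columns, the printed result is truncated (`..`), so columns with missing values can be hidden in the middle. The summary statistics above suggest this is the case for at least `total_rounds`, whose count is 7408 against 7439 for the other columns shown. One way to surface only the affected columns is to filter the null counts, as in this sketch; the helper name `columns_with_nulls` and the stand-in frame are illustrative assumptions.

```python
import numpy as np
import pandas as pd


def columns_with_nulls(df):
    """Return null counts for columns that have at least one null, largest first."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Made-up stand-in using a few Large Set column names.
demo = pd.DataFrame(
    {
        "finish_round": [1, 3, 3],
        "total_rounds": [3.0, np.nan, 5.0],
        "time_sec": [149, 300, 287],
    }
)

null_report = columns_with_nulls(demo)
print(null_report)
```

In the notebook, calling `columns_with_nulls(large_set)` would list every column that needs attention without the truncation.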


In [12]:
# Check for duplicates
print('Duplicates found: \n', large_set.duplicated().sum())

Duplicates found: 
 0
