
**Overview**

When run, this notebook scrapes the stats for all past UFC fights. Upcoming fights are not included.


**Scrape UFC fight stats**

* Get the details (name, URL, date, location) of every UFC event.
* For each event, get the details of every fight on the card.
* Parse each fight to get the fight stats of both fighters.


In [1]:
# imports
import pandas as pd
from tqdm.notebook import tqdm_notebook

# import library
import scrape_ufc_stats_library as LIB

# import config (the with-block closes the file after loading)
import yaml
with open('scrape_ufc_stats_config.yaml') as config_file:
    config = yaml.safe_load(config_file)
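
The cells in this notebook read several keys from `config`, so a missing key only surfaces when the cell that needs it runs. A minimal sketch of an early check follows, assuming the config loads as a flat mapping. The helper name `find_missing_config_keys` is hypothetical and not part of `scrape_ufc_stats_library`; the key names are the ones this notebook accesses, and the example URL is a placeholder.

```python
# Keys that this notebook reads from the config file.
REQUIRED_CONFIG_KEYS = [
    'completed_events_all_url',
    'event_details_file_name',
    'fight_details_column_names',
    'fight_details_file_name',
    'fight_results_column_names',
    'fight_stats_column_names',
    'totals_column_names',
    'significant_strikes_column_names',
    'fight_results_file_name',
    'fight_stats_file_name',
]


def find_missing_config_keys(config, required_keys=REQUIRED_CONFIG_KEYS):
    """Return the required keys absent from the config mapping (hypothetical helper)."""
    return [key for key in required_keys if key not in config]


# example with a deliberately incomplete config (placeholder URL)
example_config = {'completed_events_all_url': 'http://example.com/events'}
missing = find_missing_config_keys(example_config)
print(f'{len(missing)} keys missing, first: {missing[0]}')
```

Calling `find_missing_config_keys(config)` right after loading the YAML file and raising if the result is non-empty would fail fast, before any pages are requested.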

## Parse Event Details

Includes:
* Event
* URL
* Date
* Location

In [2]:
# define url to parse
events_url = config['completed_events_all_url']

In [3]:
# get soup
soup = LIB.get_soup(events_url)

# parse event details
all_event_details_df = LIB.parse_event_details(soup)

# show event details
display(all_event_details_df)

# write event details to file
all_event_details_df.to_csv(config['event_details_file_name'], index=False)

Unnamed: 0,EVENT,URL,DATE,LOCATION
0,UFC 311: Makhachev vs. Moicano,http://ufcstats.com/event-details/39f68882def7...,"January 18, 2025","Inglewood, California, USA"
1,UFC Fight Night: Dern vs. Ribas 2,http://ufcstats.com/event-details/81ddc98fceb3...,"January 11, 2025","Las Vegas, Nevada, USA"
2,UFC Fight Night: Covington vs. Buckley,http://ufcstats.com/event-details/72c9c2eadfc3...,"December 14, 2024","Tampa, Florida, USA"
3,UFC 310: Pantoja vs. Asakura,http://ufcstats.com/event-details/ad23903ef3af...,"December 07, 2024","Las Vegas, Nevada, USA"
4,UFC Fight Night: Yan vs. Figueiredo,http://ufcstats.com/event-details/e955046551f8...,"November 23, 2024","Macau, China"
...,...,...,...,...
712,UFC 6: Clash of the Titans,http://ufcstats.com/event-details/1c3f5e85b59e...,"July 14, 1995","Casper, Wyoming, USA"
713,UFC 5: The Return of the Beast,http://ufcstats.com/event-details/dedc3bb440d0...,"April 07, 1995","Charlotte, North Carolina, USA"
714,UFC 4: Revenge of the Warriors,http://ufcstats.com/event-details/b60391da771d...,"December 16, 1994","Tulsa, Oklahoma, USA"
715,UFC 3: The American Dream,http://ufcstats.com/event-details/1a49e0670dfa...,"September 09, 1994","Charlotte, North Carolina, USA"
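
The DATE column above holds strings such as `January 18, 2025`. A minimal sketch of converting them to `date` objects with the standard library follows, assuming every value uses that month-name, day, year pattern and that the process runs with English month names (`%B` is locale-dependent). `parse_event_date` is a hypothetical helper, not part of the scraping library.

```python
from datetime import date, datetime


def parse_event_date(text):
    """Parse a string like 'January 18, 2025' into a date (hypothetical helper)."""
    return datetime.strptime(text.strip(), '%B %d, %Y').date()


# sample values copied from the event details output above
sample_dates = ['January 18, 2025', 'December 07, 2024', 'July 14, 1995']
parsed = [parse_event_date(value) for value in sample_dates]
print(parsed[0].isoformat())
```

For the whole column, `pd.to_datetime(all_event_details_df['DATE'], format='%B %d, %Y')` applies the same format string in one call.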


## Parse Fight Details
Includes:
* Event
* Bout
* URL

In [4]:
# define list of urls of events to parse
list_of_events_urls = list(all_event_details_df['URL'])

In [5]:
# create empty df to store fight details
all_fight_details_df = pd.DataFrame(columns=config['fight_details_column_names'])

# loop through each event and parse fight details
for url in tqdm_notebook(list_of_events_urls):

    # get soup
    soup = LIB.get_soup(url)

    # parse fight links
    fight_details_df = LIB.parse_fight_details(soup)
    
    # concat fight details
    all_fight_details_df = pd.concat([all_fight_details_df, fight_details_df])

# show all fight details
display(all_fight_details_df)

# write fight details to file
all_fight_details_df.to_csv(config['fight_details_file_name'], index=False)

  0%|          | 0/717 [00:00<?, ?it/s]

Unnamed: 0,EVENT,BOUT,URL
0,UFC 311: Makhachev vs. Moicano,Islam Makhachev vs. Renato Moicano,http://ufcstats.com/fight-details/daef1691c7d6...
1,UFC 311: Makhachev vs. Moicano,Merab Dvalishvili vs. Umar Nurmagomedov,http://ufcstats.com/fight-details/f39941b3743b...
2,UFC 311: Makhachev vs. Moicano,Jiri Prochazka vs. Jamahal Hill,http://ufcstats.com/fight-details/959fea398ffa...
3,UFC 311: Makhachev vs. Moicano,Jailton Almeida vs. Serghei Spivac,http://ufcstats.com/fight-details/54d48b33e724...
4,UFC 311: Makhachev vs. Moicano,Reinier de Ridder vs. Kevin Holland,http://ufcstats.com/fight-details/69d63c057aeb...
...,...,...,...
10,UFC 2: No Way Out,Orlando Wiet vs. Robert Lucarelli,http://ufcstats.com/fight-details/3b020d4914b4...
11,UFC 2: No Way Out,Frank Hamaker vs. Thaddeus Luster,http://ufcstats.com/fight-details/d917c8c7461b...
12,UFC 2: No Way Out,Johnny Rhodes vs. David Levicki,http://ufcstats.com/fight-details/ccee020be2e8...
13,UFC 2: No Way Out,Patrick Smith vs. Ray Wizard,http://ufcstats.com/fight-details/4b9ae533ccb3...
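
Each BOUT value above joins the two fighter names with ` vs. `. A minimal sketch of splitting a bout into its two fighters follows, assuming the separator appears exactly once in the string. `split_bout` is a hypothetical helper, not part of the scraping library.

```python
def split_bout(bout):
    """Split 'Fighter A vs. Fighter B' into a (fighter_a, fighter_b) tuple (hypothetical helper)."""
    fighter_a, separator, fighter_b = bout.partition(' vs. ')
    if not separator:
        # the separator was not found, so the format is not the one assumed
        raise ValueError(f'unexpected bout format: {bout!r}')
    return fighter_a.strip(), fighter_b.strip()


# sample value copied from the fight details output above
fighters = split_bout('Islam Makhachev vs. Renato Moicano')
print(fighters)
```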


## Parse Fight Results and Fight Stats
### Fight Results
Includes:
* Event
* Bout
* Weightclass
* Method
* Round
* Time
* Time Format
* Referee
* Details

### Fight Stats
Includes:
* Event
* Bout
* Round
* Fighter
* Kd
* Sig.Str.
* Sig.Str. %
* Total Str.
* Td
* Td %
* Sub.Att
* Rev.
* Ctrl
* Head
* Body
* Leg
* Distance
* Clinch
* Ground

In [6]:
# define list of urls of fights to parse
list_of_fight_details_urls = list(all_fight_details_df['URL'])

In [8]:
# create empty df to store fight results
all_fight_results_df = pd.DataFrame(columns=config['fight_results_column_names'])
# create empty df to store fight stats
all_fight_stats_df = pd.DataFrame(columns=config['fight_stats_column_names'])

# loop through each fight and parse fight results and stats
for url in tqdm_notebook(list_of_fight_details_urls):

    # get soup
    soup = LIB.get_soup(url)

    # parse fight results and fight stats
    fight_results_df, fight_stats_df = LIB.parse_organise_fight_results_and_stats(
        soup,
        url,
        config['fight_results_column_names'],
        config['totals_column_names'],
        config['significant_strikes_column_names']
        )

    # concat fight results
    all_fight_results_df = pd.concat([all_fight_results_df, fight_results_df])
    # concat fight stats
    all_fight_stats_df = pd.concat([all_fight_stats_df, fight_stats_df])

# show all fight results
display(all_fight_results_df)
# show all fight stats
display(all_fight_stats_df)

# write fight results to file
all_fight_results_df.to_csv(config['fight_results_file_name'], index=False)
# write fight stats to file
all_fight_stats_df.to_csv(config['fight_stats_file_name'], index=False)

  0%|          | 0/8001 [00:00<?, ?it/s]

ConnectTimeout: HTTPConnectionPool(host='ufcstats.com', port=80): Max retries exceeded with url: /fight-details/78a7bf038e56881d (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x000001A650FC0DD0>, 'Connection to ufcstats.com timed out. (connect timeout=None)'))
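
The run above stopped on a `ConnectTimeout`, and the message shows `connect timeout=None`, meaning no timeout was set on the request. A minimal sketch of retrying a fetch with exponential backoff follows. It is not how `LIB.get_soup` is implemented: `fetch_with_retries` and its parameters are hypothetical, `fetch` stands for any callable that takes a URL (for example `LIB.get_soup`), and the demonstration uses a fake fetch and a placeholder URL so that it runs without network access.

```python
import time


def fetch_with_retries(fetch, url, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on failure (hypothetical helper).

    Waits base_delay, then 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except Exception:  # narrow this to the network errors you expect
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# demonstration with a fake fetch that fails twice before succeeding
calls = []


def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError('simulated timeout')
    return f'page for {url}'


# record the delays instead of actually sleeping
delays = []
result = fetch_with_retries(flaky_fetch, 'http://example.com/fight', sleep=delays.append)
print(result, delays)
```

In the loop above, `soup = fetch_with_retries(LIB.get_soup, url)` would replace the direct call. If `get_soup` is built on `requests`, passing a `timeout` to the underlying request would also keep a connection from hanging indefinitely; whether `get_soup` exposes such an argument is not shown in this notebook.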

In [None]:
print(soup.prettify())