# <strong>BeautifulSoup:</strong> Web Scraping - UFC

**Name:** Arsalan Ali<br>
**Email:** arslanchaos@gmail.com

---

### **Table of Contents**
* Website to Scrape: "UFC"
* Link of the site: http://statleaders.ufc.com/
* Fastest Finish
* Fastest Submission
* Most Takedowns Landed
* Most Knockdowns Landed

---

## **Fastest Finish**

### Import Libraries

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

### Set URL and Headers

In [2]:
url = 'http://statleaders.ufc.com/en/fight'

# A browser-like User-Agent header makes the request look like it comes from a regular browser rather than a script
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}

# Optional: if you have a proxy, you can route the request through it
# proxies = {'https': 'http://62.60.160.34:3129'}

### Fetch Webpage

In [3]:
webpage = requests.get(url, headers=headers).text

# To send the request through the proxy, pass it via the proxies parameter
# webpage = requests.get(url, proxies=proxies, headers=headers).text
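
The request above passes along whatever the server returns, including error pages, and it waits indefinitely if the server never answers. A minimal sketch of a more defensive fetch is shown here. `fetch_html` and its `get` parameter are hypothetical names that are not part of the notebook, and the 10-second timeout is an arbitrary assumption.

```python
import requests


def fetch_html(url, headers=None, timeout=10, get=requests.get):
    """Return the page body as text, raising on HTTP errors.

    Hypothetical helper, not part of the original notebook. The `get`
    parameter exists only so a stand-in can replace requests.get in tests.
    """
    response = get(url, headers=headers, timeout=timeout)
    # Raise requests.HTTPError on 4xx/5xx responses instead of
    # silently handing an error page to the parser.
    response.raise_for_status()
    return response.text
```

With this helper, the fetch step could be written as `webpage = fetch_html(url, headers=headers)`.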

### Parse Webpage data into BeautifulSoup

In [4]:
soup = BeautifulSoup(webpage, "lxml")
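
The `"lxml"` parser is a separate package, and BeautifulSoup raises `FeatureNotFound` when it is not installed. The sketch below falls back to the standard-library `html.parser` in that case. `make_soup` is a hypothetical helper name, and the two parsers can build slightly different trees from malformed HTML, so results may not always match exactly.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(html):
    """Parse with lxml when available, else fall back to html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        # lxml is not installed; html.parser needs no extra dependency
        return BeautifulSoup(html, "html.parser")


soup_demo = make_soup("<h1>Record Book</h1>")
print(soup_demo.h1.text)  # Record Book
```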

### Testing

In [9]:
# Check that parsing worked by looking up the page's H1 tag

soup.find_all("h1")

[<h1>Record Book</h1>]

### Fetching entire Data

In [192]:
# Locate the "Fastest Finish" record group by its ARTICLE id, then read each row's SPAN and A tags

ufc_html = soup.find("article", {"id": "FastestFinish-group"})

rank = []
winner = []
loser = []
time = []
date = []

# Skip the first row (the table header)
for fighter in ufc_html.find_all("div", {"class": "results-table--tr"})[1:]:
    spans = fighter.find_all("span")
    names = spans[1].find_all("a")  # first link is the winner, second the opponent
    winner.append(names[0].text)
    loser.append(names[1].text)
    rank.append(spans[0].text)
    time.append(spans[3].text)
    date.append(spans[4].text[:11])

cols_dict = {
    "Rank": rank,
    "Time": time,
    "Date": date,
    "Winner": winner,
    "Opponent": loser
}

fastest_ufc_finish = pd.DataFrame(cols_dict)

fastest_ufc_finish

| | Rank | Time | Date | Winner | Opponent |
|---|---|---|---|---|---|
| 0 | 1 | 0:05 | 07-06-2019 | Jorge Masvidal | Ben Askren |
| 1 | 2 | 0:06 | 01-16-2006 | Duane Ludwig | Jonathan Goulet |
| 2 | 3 | 0:07 | 08-29-2009 | Todd Duffee | Tim Hague |
| 3 | 3 | 0:07 | 12-10-2011 | Chan Sung Jung | Mark Hominick |
| 4 | 3 | 0:07 | 07-21-2012 | Ryan Jimmo | Anthony Perosh |
| 5 | 3 | 0:07 | 06-12-2021 | Terrance McKinney | Matt Frevola |
| 6 | 7 | 0:08 | 04-02-2008 | James Irvin | Houston Alexander |
| 7 | 7 | 0:08 | 01-24-2015 | Makwan Amirkhani | Andy Ogle |
| 8 | 7 | 0:08 | 04-11-2015 | Leon Edwards | Seth Baczynski |
| 9 | 10 | 0:09 | 09-19-2007 | Gray Maynard | Joe Veres |
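
The `Time` values are scraped as strings such as `0:05`, so they sort and compare as text rather than as durations. A small sketch of converting them to seconds follows, assuming every value uses the `M:SS` format seen in the output. `to_seconds` is a hypothetical helper, not part of the notebook.

```python
def to_seconds(clock):
    """Convert an 'M:SS' string such as '0:05' to a number of seconds."""
    minutes, seconds = clock.strip().split(":")
    return int(minutes) * 60 + int(seconds)


times = ["0:05", "0:06", "0:07"]  # first three Time values from the table above
print([to_seconds(t) for t in times])  # [5, 6, 7]
```

On the DataFrame, the same function could be applied with `fastest_ufc_finish["Time"].map(to_seconds)`.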


## **Fastest Submission**

In [193]:
# Locate the "Fastest Submission" record group by its ARTICLE id, then read each row's SPAN and A tags

ufc_html = soup.find("article", {"id": "FastestSubmission-group"})

rank = []
winner = []
loser = []
time = []
date = []

# Skip the first row (the table header)
for fighter in ufc_html.find_all("div", {"class": "results-table--tr"})[1:]:
    spans = fighter.find_all("span")
    names = spans[1].find_all("a")  # first link is the winner, second the opponent
    winner.append(names[0].text)
    loser.append(names[1].text)
    rank.append(spans[0].text)
    time.append(spans[3].text)
    date.append(spans[4].text[:11])

cols_dict = {
    "Rank": rank,
    "Time": time,
    "Date": date,
    "Winner": winner,
    "Opponent": loser
}

fastest_ufc_submission = pd.DataFrame(cols_dict)

fastest_ufc_submission

| | Rank | Time | Date | Winner | Opponent |
|---|---|---|---|---|---|
| 0 | 1 | 0:14 | 02-28-2015 | Ronda Rousey | Cat Zingano |
| 1 | 2 | 0:16 | 04-02-2008 | Marcus Aurelio | Ryan Roberts |
| 2 | 3 | 0:17 | 11-05-2011 | Terry Etim | Edward Faaloloto |
| 3 | 4 | 0:19 | 09-17-2016 | Chas Skelly | Maximo Blanco |
| 4 | 5 | 0:20 | 12-16-2000 | Dennis Hallman | Matt Hughes |
| 5 | 6 | 0:23 | 12-08-2007 | Roman Mitichyan | Dorian Price |
| 6 | 6 | 0:23 | 06-13-2015 | Patrick Williams | Alejandro Perez |
| 7 | 8 | 0:24 | 02-27-2016 | Teemu Packalen | Thibault Gouti |
| 8 | 9 | 0:25 | 07-07-2016 | Joe Duffy | Mitch Clarke |
| 9 | 10 | 0:27 | 04-05-2007 | Joe Stevenson | Melvin Guillard |
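
The Fastest Finish and Fastest Submission cells repeat the same loop and differ only in the article id. One way to avoid the repetition is a single function, sketched below. `scrape_fight_records` is a hypothetical helper, and the sample HTML is a hand-written stand-in that only mimics the span order the loops rely on (rank, names, an unused span, value, date). The real page's markup may differ.

```python
from bs4 import BeautifulSoup


def scrape_fight_records(soup, group_id, value_name):
    """Collect one record group into a list of row dicts.

    Hypothetical helper: it assumes the same markup as the notebook's loops
    (a header row first, then spans ordered rank, names, unused, value, date).
    """
    article = soup.find("article", {"id": group_id})
    if article is None:
        raise ValueError(f"no <article> with id {group_id!r}")

    rows = []
    # Skip the first row, which holds the column headers
    for row in article.find_all("div", {"class": "results-table--tr"})[1:]:
        spans = row.find_all("span")
        names = spans[1].find_all("a")
        rows.append({
            "Rank": spans[0].text,
            value_name: spans[3].text,
            "Date": spans[4].text[:11],
            "Winner": names[0].text,
            "Opponent": names[1].text,
        })
    return rows


# Hand-written stand-in for the real page, for illustration only
sample_html = """
<article id="FastestSubmission-group">
  <div class="results-table--tr"><span>Rank</span></div>
  <div class="results-table--tr">
    <span>1</span><span><a>Ronda Rousey</a><a>Cat Zingano</a></span>
    <span></span><span>0:14</span><span>02-28-2015</span>
  </div>
</article>
"""
sample_soup = BeautifulSoup(sample_html, "html.parser")
records = scrape_fight_records(sample_soup, "FastestSubmission-group", "Time")
print(records[0]["Winner"], records[0]["Time"])  # Ronda Rousey 0:14
```

The returned list of dicts can be passed straight to `pd.DataFrame(records)`, and the explicit `ValueError` gives a clearer message than the `AttributeError` that a missing article would otherwise cause.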


## **Most Takedowns Landed**

In [196]:
# Locate the "Takedowns Landed" record group by its ARTICLE id, then read each row's SPAN and A tags

ufc_html = soup.find("article", {"id": "TakedownsLanded-group"})

rank = []
winner = []
loser = []
takedowns = []
date = []

# Skip the first row (the table header)
for fighter in ufc_html.find_all("div", {"class": "results-table--tr"})[1:]:
    spans = fighter.find_all("span")
    names = spans[1].find_all("a")  # first link is the winner, second the opponent
    winner.append(names[0].text)
    loser.append(names[1].text)
    rank.append(spans[0].text)
    takedowns.append(spans[3].text)
    date.append(spans[4].text[:11])

cols_dict = {
    "Rank": rank,
    "Takedowns": takedowns,
    "Date": date,
    "Winner": winner,
    "Opponent": loser
}

most_ufc_takedowns = pd.DataFrame(cols_dict)

most_ufc_takedowns

| | Rank | Takedowns | Date | Winner | Opponent |
|---|---|---|---|---|---|
| 0 | 1 | 21 | 05-25-2013 | Khabib Nurmagomedov | Abel Trujillo |
| 1 | 2 | 16 | 07-07-2007 | Sean Sherk | Hermes Franca |
| 2 | 3 | 14 | 04-25-2015 | Demetrious Johnson | Kyoji Horiguchi |
| 3 | 3 | 14 | 06-20-2020 | Curtis Blaydes | Alexander Volkov |
| 4 | 5 | 13 | 03-01-2008 | Luigi Fioravanti | Luke Cummo |
| 5 | 5 | 13 | 06-13-2020 | Merab Dvalishvili | Gustavo Lopez |
| 6 | 7 | 12 | 03-16-2013 | Johny Hendricks | Carlos Condit |
| 7 | 7 | 12 | 07-27-2013 | Demetrious Johnson | John Moraga |
| 8 | 7 | 12 | 12-17-2016 | Colby Covington | Bryan Barberena |
| 9 | 7 | 12 | 04-15-2017 | Tim Elliott | Louis Smolka |


## **Most Knockdowns Landed**

In [209]:
# This record is read from the site's root page rather than /en/fight, so fetch and parse that page first
url = 'http://statleaders.ufc.com/'

# Browser-like User-Agent header, as in the first request
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.162 Safari/537.36'}
webpage = requests.get(url, headers=headers).text
soup = BeautifulSoup(webpage, "lxml")

# Locate the "Knockdowns" record group by its ARTICLE id
ufc_html = soup.find("article", {"id": "Knockdowns-group"})

rank = []
name = []
knockdowns = []

# Skip the first row (the table header)
for fighter in ufc_html.find_all("div", {"class": "results-table--tr"})[1:]:
    spans = fighter.find_all("span")
    name.append(spans[1].find_all("a")[0].text)
    rank.append(spans[0].text)
    knockdowns.append(spans[2].text)


cols_dict = {
    "Rank": rank,
    "Knockdowns": knockdowns,
    "Name": name
}

most_ufc_knockdowns = pd.DataFrame(cols_dict)

most_ufc_knockdowns.head(10)

| | Rank | Knockdowns | Name |
|---|---|---|---|
| 0 | 1 | 20 | Donald Cerrone |
| 1 | 2 | 18 | Anderson Silva |
| 2 | 2 | 18 | Jeremy Stephens |
| 3 | 4 | 14 | Chuck Liddell |
| 4 | 4 | 14 | Lyoto Machida |
| 5 | 4 | 14 | Mauricio Rua |
| 6 | 4 | 14 | Junior Dos Santos |
| 7 | 4 | 14 | Edson Barboza |
| 8 | 4 | 14 | Thiago Santos |
| 9 | 10 | 13 | Melvin Guillard |
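
Every scraped value comes from `.text`, so the `Rank` and `Knockdowns` columns hold strings even though they look numeric. A brief sketch of converting them with `pd.to_numeric` follows. The small `sample` frame is built by hand from the first three rows of the output above, and it assumes the strings contain digits only.

```python
import pandas as pd

# First three rows of the output above, as the scraper produces them (all strings)
sample = pd.DataFrame({
    "Rank": ["1", "2", "2"],
    "Knockdowns": ["20", "18", "18"],
    "Name": ["Donald Cerrone", "Anderson Silva", "Jeremy Stephens"],
})

# Convert the numeric-looking columns so they can be summed and sorted as numbers
numeric_cols = ["Rank", "Knockdowns"]
sample[numeric_cols] = sample[numeric_cols].apply(pd.to_numeric)

print(sample["Knockdowns"].sum())  # 56
```

The same two lines could be applied to `most_ufc_knockdowns` before any sorting or aggregation.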
