# Webscraping Football Matches From The EPL

## Objective:

In this project, we will learn how to scrape football match data for the English Premier League (EPL). First, we will download the matches played across several seasons using Python and the Requests library. Then we will parse and clean the data with the BeautifulSoup and pandas libraries. By the end, we will have a single pandas DataFrame containing all of the EPL matches for those seasons.

## Scraping our first page with requests

In [1]:
import requests

In [2]:
standings = "https://fbref.com/en/comps/9/Premier-League-Stats"

In [3]:
data = requests.get(standings)

In [9]:
data.text[:1000]

'    \n      \n<!DOCTYPE html>\n<html data-version="klecko-" data-root="/home/fb/deploy/www/base" itemscope itemtype="https://schema.org/WebSite" lang="en" class="no-js" >\n<head>\n    <meta charset="utf-8">\n    <meta http-equiv="x-ua-compatible" content="ie=edge">\n    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=2.0" />\n    <link rel="dns-prefetch" href="https://d2p3bygnnzw9w3.cloudfront.net/req/202204185" />\n    <!-- Quantcast Choice. Consent Manager Tag v2.0 (for TCF 2.0) -->\n<script type="text/javascript" async=true>\n    (function() {\n\tvar host = window.location.hostname;\n\tvar element = document.createElement(\'script\');\n\tvar firstScript = document.getElementsByTagName(\'script\')[0];\n\tvar url = \'https://quantcast.mgr.consensu.org\'\n\t    .concat(\'/choice/\', \'XwNYEpNeFfhfr\', \'/\', host, \'/choice.js\')\n\tvar uspTries = 0;\n\tvar uspTriesLimit = 3;\n\telement.async = true;\n\telement.type = \'text/javascript\';\n\telement
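The request above assumes the download succeeded. As a minimal sketch, the fetch can be wrapped in a small helper that fails loudly on an HTTP error instead of passing an error page on to the parser. The name `fetch_html` and the 10-second timeout are assumptions made for illustration; they are not part of the original notebook.

```python
import requests


def fetch_html(url, timeout=10.0):
    """Download a page and return its HTML text.

    Hypothetical helper: the name and the default timeout are illustrative.
    """
    response = requests.get(url, timeout=timeout)
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx responses,
    # so a blocked or missing page is not mistaken for real data.
    response.raise_for_status()
    return response.text
```

With this helper, `data.text` in the cells above could be replaced by `fetch_html(standings)`.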

## Parsing HTML links with BeautifulSoup

In [10]:
from bs4 import BeautifulSoup

In [12]:
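# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(data.text, "html.parser")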

In [13]:
standings_table = soup.select("table.stats_table")[0]

In [15]:
links = standings_table.find_all('a')

In [19]:
links = [l.get('href') for l in links]
links[:10]

['/en/squads/b8fd03ef/Manchester-City-Stats',
 '/en/matches/c294f564/Burnley-Manchester-City-April-2-2022-Premier-League',
 '/en/matches/37e2fe92/Manchester-City-Liverpool-April-10-2022-Premier-League',
 '/en/matches/34fd93f9/Manchester-City-Brighton-and-Hove-Albion-April-20-2022-Premier-League',
 '/en/matches/af522ca3/Manchester-City-Watford-April-23-2022-Premier-League',
 '/en/matches/5ce80a04/Leeds-United-Manchester-City-April-30-2022-Premier-League',
 '/en/players/892d5bb1/Riyad-Mahrez',
 '/en/players/e46012d4/Kevin-De-Bruyne',
 '/en/players/3bb7b8b4/Ederson',
 '/en/squads/822bd0ba/Liverpool-Stats']
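The parsing steps above can be tried offline on a small hand-written fragment. The sketch below assumes a made-up `SAMPLE_HTML` string that imitates the standings table (it is not the real page) and shows that `select` picks only tables with the `stats_table` class, while `get('href')` returns `None` for an anchor that has no `href` attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the downloaded standings page.
SAMPLE_HTML = """
<table class="stats_table">
  <tr>
    <td><a href="/en/squads/b8fd03ef/Manchester-City-Stats">Manchester City</a></td>
    <td><a href="/en/players/892d5bb1/Riyad-Mahrez">Riyad Mahrez</a></td>
  </tr>
  <tr>
    <td><a href="/en/squads/822bd0ba/Liverpool-Stats">Liverpool</a></td>
    <td><a name="no-href">anchor without href</a></td>
  </tr>
</table>
<table class="other">
  <tr><td><a href="/en/ignored">ignored</a></td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# CSS selector: <table> elements with class "stats_table"; take the first match.
first_table = soup.select("table.stats_table")[0]

# Collect the href of every <a> inside that table, in document order.
hrefs = [a.get("href") for a in first_table.find_all("a")]
print(hrefs)
```

The link inside the second table is not collected, and the last entry of `hrefs` is `None`, which is worth keeping in mind before running string tests on the list.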

In [20]:
links = [l for l in links if "/squads/" in l]
links

['/en/squads/b8fd03ef/Manchester-City-Stats',
 '/en/squads/822bd0ba/Liverpool-Stats',
 '/en/squads/cff3d9bb/Chelsea-Stats',
 '/en/squads/18bb7c10/Arsenal-Stats',
 '/en/squads/361ca564/Tottenham-Hotspur-Stats',
 '/en/squads/19538871/Manchester-United-Stats',
 '/en/squads/7c21e445/West-Ham-United-Stats',
 '/en/squads/8cec06e1/Wolverhampton-Wanderers-Stats',
 '/en/squads/d07537b9/Brighton-and-Hove-Albion-Stats',
 '/en/squads/b2b47a98/Newcastle-United-Stats',
 '/en/squads/a2d435b3/Leicester-City-Stats',
 '/en/squads/47c64c55/Crystal-Palace-Stats',
 '/en/squads/8602292d/Aston-Villa-Stats',
 '/en/squads/cd051869/Brentford-Stats',
 '/en/squads/33c895d4/Southampton-Stats',
 '/en/squads/943e8050/Burnley-Stats',
 '/en/squads/5bfb9659/Leeds-United-Stats',
 '/en/squads/d3fd31cc/Everton-Stats',
 '/en/squads/2abfe087/Watford-Stats',
 '/en/squads/1c781004/Norwich-City-Stats']
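The filter above works here because every anchor in the table has an `href`. As a hedged sketch of a slightly more defensive variant, the hypothetical helper `squad_links` below (the name is an assumption, not from the notebook) skips missing `href` values, keeps only squad pages, and drops duplicates while preserving order.

```python
def squad_links(hrefs):
    """Return unique squad paths from a list of href values, in original order.

    Hypothetical helper for illustration; hrefs may contain None entries.
    """
    seen = set()
    result = []
    for href in hrefs:
        # Anchors without an href attribute yield None; skip them.
        if href is None or "/squads/" not in href:
            continue
        # Keep only the first occurrence of each squad path.
        if href in seen:
            continue
        seen.add(href)
        result.append(href)
    return result


example = [
    "/en/squads/b8fd03ef/Manchester-City-Stats",
    "/en/players/892d5bb1/Riyad-Mahrez",
    None,
    "/en/squads/822bd0ba/Liverpool-Stats",
    "/en/squads/b8fd03ef/Manchester-City-Stats",
]
print(squad_links(example))
```

On the real page the result should match the list above, since the plain `"/squads/" in l` test only fails when an entry is `None`.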

In [22]:
team_urls = [f"https://fbref.com{l}" for l in links]
team_urls

['https://fbref.com/en/squads/b8fd03ef/Manchester-City-Stats',
 'https://fbref.com/en/squads/822bd0ba/Liverpool-Stats',
 'https://fbref.com/en/squads/cff3d9bb/Chelsea-Stats',
 'https://fbref.com/en/squads/18bb7c10/Arsenal-Stats',
 'https://fbref.com/en/squads/361ca564/Tottenham-Hotspur-Stats',
 'https://fbref.com/en/squads/19538871/Manchester-United-Stats',
 'https://fbref.com/en/squads/7c21e445/West-Ham-United-Stats',
 'https://fbref.com/en/squads/8cec06e1/Wolverhampton-Wanderers-Stats',
 'https://fbref.com/en/squads/d07537b9/Brighton-and-Hove-Albion-Stats',
 'https://fbref.com/en/squads/b2b47a98/Newcastle-United-Stats',
 'https://fbref.com/en/squads/a2d435b3/Leicester-City-Stats',
 'https://fbref.com/en/squads/47c64c55/Crystal-Palace-Stats',
 'https://fbref.com/en/squads/8602292d/Aston-Villa-Stats',
 'https://fbref.com/en/squads/cd051869/Brentford-Stats',
 'https://fbref.com/en/squads/33c895d4/Southampton-Stats',
 'https://fbref.com/en/squads/943e8050/Burnley-Stats',
 'https://fbref.