<a id='top'></a>

# Opta Event Data Scraping from WhoScored? 
##### Notebook to scrape raw Opta Event data by StatsPerform from [WhoScored?](https://www.whoscored.com/), using [ScraperFC](https://github.com/oseymour/ScraperFC) by [Owen Seymour](https://mobile.twitter.com/owen_seymour)

### By [Edd Webster](https://www.twitter.com/eddwebster)
Notebook first written: 08/12/2021<br>
Notebook last updated: 08/12/2021

![Stats Perform](../../img/logos/stats_perform_logo_small.png)

![Opta](../../img/logos/opta_sports_logo_small.png)

![WhoScored?](../../img/logos/whoscored-logo.png)

___

<a id='sectionintro'></a>

## <a id='import_libraries'>Introduction</a>
This notebook scrapes player Event data from [WhoScored?](https://www.whoscored.com/) using the [ScraperFC](https://github.com/oseymour/ScraperFC) library by [Owen Seymour](https://mobile.twitter.com/owen_seymour), [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames, and [Selenium](https://www.selenium.dev/) and [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) for webscraping.

For more information about this notebook and the author, I'm available through all the following channels:
*    [eddwebster.com](https://www.eddwebster.com/);
*    edd.j.webster@gmail.com;
*    [@eddwebster](https://www.twitter.com/eddwebster);
*    [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);
*    [github/eddwebster](https://github.com/eddwebster/);
*    [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster);
*    [kaggle.com/eddwebster](https://www.kaggle.com/eddwebster); and
*    [hackerrank.com/eddwebster](https://www.hackerrank.com/eddwebster).

![Edd Webster](../../img/fifa21eddwebsterbanner.png)

The accompanying GitHub repository for this notebook can be found [here](https://github.com/eddwebster/football_analytics) and a static version of this notebook can be found [here](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/Capology%20Player%20Salary%20Web%20Scraping.ipynb).

___

<a id='sectioncontents'></a>

## <a id='notebook_contents'>Notebook Contents</a>
1.    [Notebook Dependencies](#section1)<br>
2.    [Project Brief](#section2)<br>
3.    [Data Scraping](#section3)<br>
      1.    [Introduction](#section3.1)<br>
      2.    [Scrape Data by League and Season](#section3.2)<br>
4.    [Summary](#section4)<br>
5.    [Next Steps](#section5)<br>
6.    [References](#section6)<br>

___

<a id='section1'></a>

## <a id='#section1'>1. Notebook Dependencies</a>

This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:
*    [`Jupyter notebooks`](https://jupyter.org/) for this notebook environment with which this project is presented;
*    [`NumPy`](http://www.numpy.org/) for multidimensional array computing;
*    [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation;
*    [`Beautifulsoup`](https://pypi.org/project/beautifulsoup4/) and [`Selenium`](https://www.selenium.dev/) for web scraping; and
*    [`ScraperFC`](https://github.com/oseymour/ScraperFC) by [Owen Seymour](https://mobile.twitter.com/owen_seymour) (run `pip install ScraperFC`).

All packages used for this notebook except for [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) and [Selenium](https://www.selenium.dev/) can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/).
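As a quick sanity check before running the imports, the sketch below reports which of the listed packages cannot be found in the current environment. The `REQUIRED` list and the `missing_packages` helper are illustrative names, not part of this notebook, and the list uses import names (`bs4` rather than `beautifulsoup4`). ScraperFC is imported from a local scripts folder later in this notebook, so it may be reported as missing here even when that local copy is present.

```python
import importlib.util

# Import names (not pip names) of the third-party packages this notebook uses
REQUIRED = ['numpy', 'pandas', 'bs4', 'selenium', 'ScraperFC']

def missing_packages(names):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print('Missing packages: ' + ', '.join(missing))
else:
    print('All required packages are installed')
```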

### Import Libraries and Modules

In [1]:
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv

# Import Dependencies
%matplotlib inline

# Math Operations
import numpy as np
from math import pi

# Datetime
import datetime
from datetime import date
import time

# Data Preprocessing
import pandas as pd
import os
import re
import random
import glob
from io import BytesIO
from pathlib import Path

# Working with JSON
import json
from pandas import json_normalize    # pandas.io.json.json_normalize is deprecated since pandas 1.0

# Web Scraping
from selenium import webdriver
from bs4 import BeautifulSoup
import requests
import re

# ScraperFC library
#import ScraperFC as sfc    # run pip install ScraperFC
import traceback

# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-whitegrid')
import missingno as msno

# Progress Bar
from tqdm import tqdm

# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML

# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

print('Setup Complete')

Setup Complete


In [2]:
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))

Python: 3.7.6
NumPy: 1.20.3
pandas: 1.3.2
matplotlib: 3.4.2


### Defined Variables and Lists

##### Today's Date 

In [3]:
# Define today's date
todays_date = datetime.datetime.now().strftime('%d%m%Y')

##### Lists of folders

In [4]:
# Define lists of folders

## Folders types
lst_folders = ['raw', 'engineered']

## League names
lst_leagues = ['EPL', 'Bundesliga', 'Serie A', 'Ligue 1', 'Argentina Liga Profesional', 'EFL Championship', 'EFL1', 'EFL2']

## Seasons
lst_seasons = ['2009-2010', '2010-2011', '2011-2012', '2012-2013', '2013-2014', '2014-2015', '2015-2016', '2016-2017', '2017-2018', '2018-2019', '2019-2020', '2020-2021', '2021-2022']

## Data types
lst_data_types = ['events', 'formations', 'players']

### Defined Filepaths

In [5]:
# Set up initial paths to subfolders
#base_dir = os.path.join('..', '..')
base_dir = '/Volumes/3TB EXT/2022 work'    # alternative base dir when working from hard drive
data_dir = os.path.join(base_dir, 'data')
data_dir_opta = os.path.join(base_dir, 'data', 'opta')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
scripts_dir = os.path.join(base_dir, 'scripts')
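The cell above switches between a relative base directory and an external hard drive by commenting lines in and out. A minimal sketch of doing this automatically is shown below, assuming the external drive should be preferred whenever it is mounted. The `choose_base_dir` helper is a hypothetical name introduced here for illustration.

```python
import os

def choose_base_dir(preferred, fallback):
    """Return `preferred` if it is an existing directory, otherwise `fallback`."""
    return preferred if os.path.isdir(preferred) else fallback

# Prefer the external drive, fall back to the repository-relative path
base_dir = choose_base_dir('/Volumes/3TB EXT/2022 work', os.path.join('..', '..'))
data_dir_opta = os.path.join(base_dir, 'data', 'opta')
print(data_dir_opta)
```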

### Custom Libraries
Import the scripts from the [ScraperFC](https://github.com/oseymour/ScraperFC) library by [Owen Seymour](https://twitter.com/owen_seymour), stored in the 'ScraperFC' subfolder of the 'scripts' folder.

In [6]:
# Import the custom ScraperFC libraries required for scraping data

## Define the filepath of scripts - the 'ScraperFC' subfolder of the 'scripts' folder
sys.path.insert(0, os.path.abspath(scripts_dir))

## Custom scripts for scraping data created as part of the ScraperFC library by Owen Seymour
import ScraperFC as sfc

### Custom Functions (Scrapers)
Two different scrapers written as wrappers around [Owen Seymour](https://twitter.com/owen_seymour)'s code:
1.    Single match (`scrape_whoscored_match`)
2.    Entire season (`scrape_whoscored_season`)
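Both wrappers need to close the scraper whether or not the scrape succeeds. One way to guarantee this, sketched below, is `contextlib.closing`, which calls `close()` on exit even when an exception is raised. `DummyScraper` and `scrape_with_cleanup` are hypothetical stand-ins used so the example runs without a browser; they are not part of ScraperFC, and the sketch only assumes the real scraper exposes `scrape_match` and `close`, as used in this notebook.

```python
from contextlib import closing

class DummyScraper:
    """Hypothetical stand-in for a scraper object that exposes close()."""
    def __init__(self):
        self.closed = False

    def scrape_match(self, url):
        if 'Matches' not in url:
            raise ValueError('Not a match URL: ' + url)
        return {'url': url}

    def close(self):
        self.closed = True

def scrape_with_cleanup(scraper, url):
    """Scrape one URL and always close the scraper, even on errors."""
    with closing(scraper):
        return scraper.scrape_match(url)

scraper = DummyScraper()
result = scrape_with_cleanup(scraper, 'https://www.whoscored.com/Matches/1549687/Live/')
print(result, scraper.closed)
```

Unlike the wrappers in this notebook, this sketch lets the exception propagate to the caller after closing the scraper, which makes failures easier to detect in a loop.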

#### Single Match Scraper

In [7]:
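# Define function for scraping a single match of Event data from WhoScored
def scrape_whoscored_match(url):

    ## Initiate WhoScored scraper
    scraper = sfc.WhoScored()

    ## Default to None so the function still returns if the scrape fails
    data = None

    ## Attempt to scrape the match
    try:
        data = scraper.scrape_match(url)

    ## Print the traceback rather than halting the notebook
    except Exception:
        traceback.print_exc()

    ## Close WhoScored scraper, whether or not the scrape succeeded
    finally:
        scraper.close()

    ## Return the scraped match data (None if the scrape failed)
    return data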

#### Full Season Scraper

In [8]:
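# Define function for scraping a full season of Event data from WhoScored
def scrape_whoscored_season(season, comp):

    ## Initiate WhoScored scraper
    scraper = sfc.WhoScored()

    ## Default to None so the function still returns if the scrape fails
    data = None

    ## Attempt to scrape every match in the season
    try:
        data = scraper.scrape_matches(season, comp)

    ## Print the traceback rather than halting the notebook
    except Exception:
        traceback.print_exc()

    ## Close WhoScored scraper, whether or not the scrape succeeded
    finally:
        scraper.close()

    ## Return the scraped season data (None if the scrape failed)
    return data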

### Create Directory Structure
Create folders and subfolders for data, if not already created.

In [9]:
# Temporary directory structure used

## Define list of folders
lst_folders = ['events', 'formations', 'players']

# Make the data directory structure
for folder in lst_folders:
    path = os.path.join(folder)
    if not os.path.exists(path):
        os.mkdir(path)

In [10]:
# Make the data directory structure
for folder in lst_folders:
    for league in lst_leagues:
        league = league.replace(' ', '_').lower()
        for season in lst_seasons:
            for data_type in lst_data_types:
                ## makedirs creates any missing parent folders and skips existing ones
                path = os.path.join(folder, league, season, data_type)
                os.makedirs(path, exist_ok=True)

# Make the reference and archive folders
for folder in ['reference', 'archive']:
    os.makedirs(folder, exist_ok=True)

### Notebook Settings

In [11]:
# Display all columns of displayed pandas DataFrames
pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment = None

## <a id='#section2'>2. Project Brief</a>
This Jupyter notebook is part of a series of notebooks that scrape, parse, engineer, and unify datasets that can be used for modeling purposes.

This particular notebook is one of several **web scraping** notebooks. It takes Opta Event data from [WhoScored?](https://www.whoscored.com/), scrapes it using [Selenium](https://www.selenium.dev/) and [Beautifulsoup](https://pypi.org/project/beautifulsoup4/), and manipulates it as DataFrames using [pandas](http://pandas.pydata.org/).

This notebook, along with the other notebooks in this project workflow, is shown in the following diagram:

![roadmap](../../img/football_analytics_data_roadmap.png)

Links to these notebooks in the [`football_analytics`](https://github.com/eddwebster/football_analytics) GitHub repository can be found at the following:
*    [1. Webscraping](https://github.com/eddwebster/football_analytics/tree/master/notebooks/1_data_scraping)
     +    [FBref Player Stats Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/FBref%20Player%20Stats%20Web%20Scraping.ipynb)
     +    [TransferMarkt Player Bio and Status Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Bio%20and%20Status%20Web%20Scraping.ipynb)
     +    [TransferMarkt Player Valuation Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Valuation%20Web%20Scraping.ipynb)
     +    [TransferMarkt Player Recorded Transfer Fees Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Webscraping.ipynb)
     +    [Capology Player Salary Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/Capology%20Player%20Salary%20Web%20Scraping.ipynb)
     +    [FBref Team Stats Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/FBref%20Team%20Stats%20Web%20Scraping.ipynb)
     +    [WhoScored? Event Data Scraping]() 
*    [2. Data Parsing](https://github.com/eddwebster/football_analytics/tree/master/notebooks/2_data_parsing)
     +    [ELO Team Ratings Data Parsing](https://github.com/eddwebster/football_analytics/blob/master/notebooks/2_data_parsing/ELO%20Team%20Ratings%20Data%20Parsing.ipynb)
*    [3. Data Engineering](https://github.com/eddwebster/football_analytics/tree/master/notebooks/3_data_engineering)
     +    [FBref Player Stats Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/FBref%20Player%20Stats%20Data%20Engineering.ipynb)
     +    [TransferMarkt Player Bio and Status Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Bio%20and%20Status%20Data%20Engineering.ipynb)
     +    [TransferMarkt Player Valuation Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Valuation%20Data%20Engineering.ipynb)
     +    [TransferMarkt Player Recorded Transfer Fees Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Data%20Engineering.ipynb)
     +    [Capology Player Salary Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb)
     +    [FBref Team Stats Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/FBref%20Team%20Stats%20Data%20Engineering.ipynb)
     +    [ELO Team Ratings Data Parsing](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/ELO%20Team%20Ratings%20Data%20Parsing.ipynb)
     +    [TransferMarkt Team Recorded Transfer Fee Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Team%20Recorded%20Transfer%20Fee%20Data%20Engineering.ipynb) (aggregated from [TransferMarkt Player Recorded Transfer Fees notebook](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Data%20Engineering.ipynb))
     +    [Capology Team Salary Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Team%20Salary%20Data%20Engineering.ipynb) (aggregated from [Capology Player Salary notebook](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb))
     +    [WhoScored? Event Data Engineering]() 
*    [4. Data Unification](https://github.com/eddwebster/football_analytics/tree/master/notebooks/4_data_unification)
*    [5. Modeling and Data Analysis]()

---

<a id='section3'></a>

## <a id='#section3'>3. Data Scraping</a>

### <a id='#section3.1'>3.1. Introduction</a>
Through the WhoScored? Match Centre, it is possible to scrape Opta on-the-ball Event data for football matches from more than twenty leagues and competitions, including the 'Big 5' European leagues, with some leagues going as far back as the 2009/10 season.

The following video demonstrates how to extract a single match of data manually. This notebook follows the same logic, but automates it.

In [12]:
# Embed video of where Event data can be found in WhoScored?
#Video('../../../../../video/demo/whoscored.mov', width=770, height=530)

This notebook scrapes the data with two different functions:
1.    Single match (`scrape_whoscored_match`)
2.    Entire season (`scrape_whoscored_season`)

### <a id='#section3.2'>3.2. Scrape Data by League and Season</a>
Given the link to a league, the [ScraperFC](https://github.com/oseymour/ScraperFC) code builds the links to the individual matches in a season.

The following leagues and competitions have been identified, so far, as having Event data in the Match Centre. This list may change, and some competitions may have been missed.

| No.     | League / Cup Competition| Country / Continent     | League Hyperlink    | Available to scrape (right now)   | Earliest season to scrape     |      
|---------|-------------------------|-------------------------|---------------------|-----------------------------------|----------|
| 1.      | EPL                     | England                 | https://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League     | Y    | 2009/2010     | 
| 2.      | La Liga                 | Spain                   | https://www.whoscored.com/Regions/206/Tournaments/4/Spain-LaLiga     | Y    | 2009/2010     |  
| 3.      | Bundesliga              | Germany                 | https://www.whoscored.com/Regions/81/Tournaments/3/Germany-Bundesliga     | Y    |  2009/2010     | 
| 4.      | Serie A                 | Italy                   | https://www.whoscored.com/Regions/108/Tournaments/5/Italy-Serie-A     | Y    | 2009/2010     |  
| 5.      | Ligue 1                 | France                  | https://www.whoscored.com/Regions/74/Tournaments/22/France-Ligue-1     | Y    | 2009/2010     |  
| 6.      | Liga NOS                | Portugal                | https://www.whoscored.com/Regions/177/Tournaments/21/Portugal-Liga-NOS     | N     | 2016/2017     | 
| 7.      | Eredivisie              | Netherlands             | https://www.whoscored.com/Regions/155/Tournaments/13/Netherlands-Eredivisie     | N     | 2013/2014     | 
| 8.      | Premier League          | Russia                  | https://www.whoscored.com/Regions/182/Tournaments/77/Russia-Premier-League      | N     | 2013/2014     | 
| 9.      | Brasileirão             | Brazil                  | https://www.whoscored.com/Regions/31/Tournaments/95/Brazil-Brasileir%C3%A3o     | N     | 2013          | 
| 10.      | Major League Soccer     | USA                     | https://www.whoscored.com/Regions/233/Tournaments/85/USA-Major-League-Soccer     | N     | 2013          | 
| 11.      | Super Lig               | Turkey                  | https://www.whoscored.com/Regions/225/Tournaments/17/Turkey-Super-Lig     | N     | 2014/2015     | 
| 12.      | EFL Championship        | England                 | https://www.whoscored.com/Regions/252/Tournaments/7/England-Championship     | Y    | 2013/2014       | 
| 13.      | Premiership             | Scotland                | https://www.whoscored.com/Regions/253/Tournaments/20/Scotland-Premiership     | N     | 2020/2021     | 
| 14.      | EFL1                    | England                 | https://www.whoscored.com/Regions/252/Tournaments/8/England-League-One     | Y    | 2018/2019              |  
| 15.      | EFL2                    | England                 | https://www.whoscored.com/Regions/252/Tournaments/9/England-League-Two     | Y    | 2018/2019              | 
| 16.      | Liga Profesional        | Argentina               | https://www.whoscored.com/Regions/11/Tournaments/68/Argentina-Liga-Profesional    | Y    | 2016          |  
| 17.      | Jupiler Pro League      | Belgium                 | https://www.whoscored.com/Regions/22/Tournaments/18/Belgium-Jupiler-Pro-League     | N     | 2020/2021     |
| 18.      | Bundesliga II           | Germany                 | https://www.whoscored.com/Regions/81/Tournaments/6/Germany-Bundesliga-II     | N     | 2015/2016     |
| 19.      | Champions League        | Europe                  | https://www.whoscored.com/Regions/250/Tournaments/12/Europe-Champions-League     | N     | 2009/2010     |
| 20.      | Europa League           | Europe                  | https://www.whoscored.com/Regions/250/Tournaments/30/Europe-Europa-League     | N     | 2012/2013     |
| 21.      | FA Cup              | England    | https://www.whoscored.com/Regions/252/Tournaments/29/England-League-Cup     | N     | 2012/2013 (latter stages of the competition)    |
| 22.      | League Cup              | England    | https://www.whoscored.com/Regions/252/Tournaments/29/England-League-Cup     | N     | 2012/2013 (latter stages of the competition)    |
| 23.      | FIFA World Cup          | International    | https://www.whoscored.com/Regions/247/Tournaments/36/International-FIFA-World-Cup     | N     | 2014          |
| 24.      | European Championship   | International (Europe)     | https://www.whoscored.com/Regions/247/Tournaments/124/International-European-Championship     | N     | 2012            |
| 25.      | Africa Cup of Nations   | International (Africa)     | https://www.whoscored.com/Regions/247/Tournaments/104/International-Africa-Cup-of-Nations     | N     | 2021 (unconfirmed)          |

Leagues that aren't available to scrape right now can be made available with slight amendments to the ScraperFC scripts; these amendments just haven't been made yet.
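The league hyperlinks in the table share a common pattern, `/Regions/<id>/Tournaments/<id>/<name>`. The sketch below copies three rows from the table and shows how the region and tournament IDs could be pulled out of a link, and how the competitions currently marked as available could be filtered. `LEAGUES` and `parse_league_url` are illustrative names, not part of ScraperFC.

```python
import re

# A few rows copied from the table above: (competition, league URL, available now)
LEAGUES = [
    ('EPL', 'https://www.whoscored.com/Regions/252/Tournaments/2/England-Premier-League', True),
    ('Ligue 1', 'https://www.whoscored.com/Regions/74/Tournaments/22/France-Ligue-1', True),
    ('Eredivisie', 'https://www.whoscored.com/Regions/155/Tournaments/13/Netherlands-Eredivisie', False),
]

def parse_league_url(url):
    """Return (region_id, tournament_id) parsed from a WhoScored league URL."""
    match = re.search(r'/Regions/(\d+)/Tournaments/(\d+)/', url)
    if match is None:
        raise ValueError('Unrecognised league URL: ' + url)
    return int(match.group(1)), int(match.group(2))

# Competitions flagged as available, and the IDs embedded in each link
available = [name for name, _, ok in LEAGUES if ok]
ids = {name: parse_league_url(url) for name, url, _ in LEAGUES}
print(available)       # ['EPL', 'Ligue 1']
print(ids['Ligue 1'])  # (74, 22)
```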

#### <a id='#section3.2.1'>3.2.1. Full Season Scraper</a>

In [None]:
# Full season scraper

## Define season and competition
season = 2021
comp = 'Ligue 1'

## Scrape the Event data for an entire season
scrape_whoscored_season(season, comp)

2022-01-09 12:43:48.993794: Scraping, engineering, and saving of the data for the Ligue 1 league for the 2021 season has now started...
Scraping match data for match 1/380 in the 2020-2021 Ligue 1 season from https://www.whoscored.com/Matches/1464170/Live/France-Ligue-1-2020-2021-Brest-Lyon

Saving data for 2021-02-19: Brest (2) vs. Lyon (3) in the Ligue 1 league for the 2021 season.
Saving home formation data...
Saving away formation data...
Saving player data...
Saving event data...
Scraping, engineering, and saving of the data for the Ligue 1 league for the 2021 season is now complete
Scraping match data for match 20/380 in the 2020-2021 Ligue 1 season from https://www.whoscored.com/Matches/1464126/Live/France-Ligue-1-2020-2021-Strasbourg-Reims

In [None]:
# Full season scraper

## Define season and competition
season = 2020
comp = 'Ligue 1'

## Scrape the Event data for an entire season
scrape_whoscored_season(season, comp)
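The two cells above run the same call for consecutive seasons. A minimal sketch of doing this in a loop, with a pause between seasons, is shown below. `scrape_many` and `fake_scrape` are hypothetical names used for illustration, and the wait times are arbitrary assumptions. The sketch assumes the scrape function raises on failure; the `scrape_whoscored_season` wrapper in this notebook prints the traceback instead, so failures would not be collected unless that wrapper is changed to re-raise.

```python
import random
import time

def scrape_many(scrape_fn, seasons, comp, min_wait=1.0, max_wait=3.0):
    """Call scrape_fn(season, comp) for each season and collect any failures."""
    failed = []
    for season in seasons:
        try:
            scrape_fn(season, comp)
        except Exception as exc:
            # Record the failure and carry on with the next season
            failed.append((season, repr(exc)))
        # Pause between seasons to avoid hammering the site
        time.sleep(random.uniform(min_wait, max_wait))
    return failed

def fake_scrape(season, comp):
    """Hypothetical stand-in that fails for seasons before 2010."""
    if season < 2010:
        raise ValueError('season not available')

failures = scrape_many(fake_scrape, [2009, 2020, 2021], 'Ligue 1', min_wait=0, max_wait=0)
print(failures)
```

In the notebook, `fake_scrape` would be replaced by the season scraper, for example `scrape_many(scrape_whoscored_season, [2020, 2021], 'Ligue 1')`.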

##### <a id='#section3.2.1.1'>3.2.1.1. Premier League (EPL)</a>
First season available in WhoScored is the 09/10 season (`year` == 2010).

In [None]:
# Full season scraper

## Define season and competition
season = 2022
comp = 'EPL'

## Scrape the Event data for an entire season
scrape_whoscored_season(season, comp)
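The `season` value passed to the scraper is the year in which the season ends, so the 09/10 season is 2010 and, as the Ligue 1 output above shows, 2021 corresponds to 2020-2021. The helper below is a small sketch of that convention, producing labels in the same format as `lst_seasons`. `season_label` is a hypothetical name, and the conversion only applies to leagues whose seasons span two calendar years.

```python
def season_label(end_year):
    """Convert a season end year, e.g. 2010, to a label such as '2009-2010'."""
    return '{}-{}'.format(end_year - 1, end_year)

print(season_label(2010))  # 2009-2010
print(season_label(2022))  # 2021-2022
```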

##### <a id='#section3.2.1.2'>3.2.1.2. Serie A</a>
First season available in WhoScored is the 09/10 season (`year` == 2010).

In [None]:
# Full season scraper

## Define season and competition
season = 2009
comp = 'Serie A'

## Scrape the Event data for an entire season
scrape_whoscored_season(season, comp)

##### <a id='#section3.2.1.3'>3.2.1.3. La Liga</a>
First season available in WhoScored is the 09/10 season (year == 2010).

In [None]:
# Full season scraper

## Define season and competition
season = 2009
comp = 'La Liga'

## Scrape the Event data for an entire season
scrape_whoscored_season(season, comp)

##### <a id='#section3.2.1.4'>3.2.1.4. Bundesliga</a>
First season available in WhoScored is the 09/10 season (year == 2010).

In [None]:
# Full season scraper

## Define season and competition
season = 2019
comp = 'Bundesliga'

## Scrape the Event data for an entire season
## (named `data` rather than `json` to avoid shadowing the imported json module)
data = scrape_whoscored_season(season, comp)

## Display the scraped data
data

##### <a id='#section3.2.1.5'>3.2.1.5. Ligue 1</a>
First season available in WhoScored is the 09/10 season (year == 2010).

In [None]:
# Full season scraper

## Define season and competition
season = 2020
comp = 'Ligue 1'

## Scrape the Event data for an entire season
## (named `data` rather than `json` to avoid shadowing the imported json module)
data = scrape_whoscored_season(season, comp)

## Display the scraped data
data

##### <a id='#section3.2.1.6'>3.2.1.6. MLS</a>

##### <a id='#section3.2.1.7'>3.2.1.7. Championship</a>
First season available in WhoScored is the 13/14 season (year == 2014).

In [None]:
# Full season scraper

## Define season and competition
season = 2021
comp = "EFL Championship"

## Scrape the Event data for an entire season
## (named `data` rather than `json` to avoid shadowing the imported json module)
data = scrape_whoscored_season(season, comp)

## Display the scraped data
data

#### <a id='#section3.2.8'>3.2.8. League One</a>
First season available in WhoScored is the 13/14 season (year == 2014).

In [None]:
# Full season scraper

## Define season and competition
season = 2014
comp = "EFL1"

## Scrape the Event data for an entire season
## (named `data` rather than `json` to avoid shadowing the imported json module)
data = scrape_whoscored_season(season, comp)

## Display the scraped data
data

#### <a id='#section3.2.9'>3.2.9. League Two</a>
First season available in WhoScored is the 13/14 season (year == 2014).

In [None]:
# Full season scraper

## Define season and competition
season = 2021
comp = "EFL2"

## Scrape the Event data for an entire season
## (named `data` rather than `json` to avoid shadowing the imported json module)
data = scrape_whoscored_season(season, comp)

## Display the scraped data
data

#### <a id='#section3.2.2'>3.2.2. Single Match Scraper</a>
Serie A for 21/22 season (`Year` == 2022)

##### <a id='#section3.2.2.1'>3.2.2.1. Premier League (EPL)</a>

In [None]:
# Single match scraper

## Define URL
url = 'https://www.whoscored.com/Matches/1549687/Live/' + \
      'England-Premier-League-2021-2022-West-Ham-Chelsea'

## Scrape the JSON data for a single match and save it as a Python dictionary
item = scrape_whoscored_match(url)
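A WhoScored match URL carries the match ID and a descriptive slug, which can be useful for naming saved files. The sketch below parses both from the URL used in the cell above. `parse_match_url` and the `<id>_<slug>.json` filename scheme are assumptions made for illustration, not something ScraperFC does.

```python
import re

def parse_match_url(url):
    """Return (match_id, slug) parsed from a WhoScored match URL."""
    match = re.search(r'/Matches/(\d+)/Live/([^/?#]+)', url)
    if match is None:
        raise ValueError('Unrecognised match URL: ' + url)
    return int(match.group(1)), match.group(2)

url = 'https://www.whoscored.com/Matches/1549687/Live/' + \
      'England-Premier-League-2021-2022-West-Ham-Chelsea'

# Build a hypothetical filename from the parsed parts
match_id, slug = parse_match_url(url)
filename = '{}_{}.json'.format(match_id, slug)
print(filename)  # 1549687_England-Premier-League-2021-2022-West-Ham-Chelsea.json
```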

---

<a id='section4'></a>

## <a id='#section4'>4. Summary</a>
This notebook scrapes Opta Event data from [WhoScored?](https://www.whoscored.com/) using the [ScraperFC](https://github.com/oseymour/ScraperFC) library, with [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames, and [Selenium](https://www.selenium.dev/) and [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) for webscraping.

___

<a id='section5'></a>

## <a id='#section5'>5. Next Steps</a>
This data is now ready to be engineered.

___

<a id='section6'></a>

## <a id='#section6'>6. References</a>
*    [ScraperFC](https://github.com/oseymour/ScraperFC) Opta event data webscraper for [WhoScored?](https://www.whoscored.com/) by [Owen Seymour](https://mobile.twitter.com/owen_seymour)
*    [Owen Seymour](https://mobile.twitter.com/owen_seymour)'s Google Drive for data already scraped, including Premier League data since 09/10: 
https://drive.google.com/drive/folders/1LhW3wcG5uoAAHcgPHRcJYmIQaMY9GlRI

---

***Visit my website [eddwebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***

[Back to the top](#top)