<a id='top'></a>

# Capology Player Web Scraping
##### Notebook to scrape raw data from [Capology](https://www.capology.com/) using [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) and [Selenium](https://www.selenium.dev/)

### By [Edd Webster](https://www.twitter.com/eddwebster)
Notebook first written: 01/08/2021<br>
Notebook last updated: 07/08/2021

![title](../../img/logos/capology-logo.jpeg)

___

<a id='sectionintro'></a>

## <a id='import_libraries'>Introduction</a>
This notebook scrapes player salary data from [Capology](https://www.capology.com/), using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames, and [Selenium](https://www.selenium.dev/) and [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) for web scraping.

For more information about this notebook and the author, I'm available through all the following channels:
*    [eddwebster.com](https://www.eddwebster.com/);
*    edd.j.webster@gmail.com;
*    [@eddwebster](https://www.twitter.com/eddwebster);
*    [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);
*    [github/eddwebster](https://github.com/eddwebster/);
*    [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster);
*    [kaggle.com/eddwebster](https://www.kaggle.com/eddwebster); and
*    [hackerrank.com/eddwebster](https://www.hackerrank.com/eddwebster).

![title](../../img/fifa21eddwebsterbanner.png)

The accompanying GitHub repository for this notebook can be found [here](https://github.com/eddwebster/football_analytics) and a static version of this notebook can be found [here](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/Capology%20Player%20Salary%20Web%20Scraping.ipynb).

___

<a id='sectioncontents'></a>

## <a id='notebook_contents'>Notebook Contents</a>
1.    [Notebook Dependencies](#section1)<br>
2.    [Project Brief](#section2)<br>
3.    [Data Scraping](#section3)<br>
4.    [Data Unification](#section4)<br>
5.    [Data Export](#section5)<br>
6.    [Summary](#section6)<br>
7.    [Next Steps](#section7)<br>
8.    [References](#section8)<br>

___

<a id='section1'></a>

## <a id='#section1'>1. Notebook Dependencies</a>

This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:
*    [`Jupyter notebooks`](https://jupyter.org/) for the notebook environment in which this project is presented;
*    [`NumPy`](http://www.numpy.org/) for multidimensional array computing;
*    [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation;
*    [`Beautifulsoup`](https://pypi.org/project/beautifulsoup4/) and [Selenium](https://www.selenium.dev/) for web scraping.

All packages used for this notebook except for [`Beautifulsoup`](https://pypi.org/project/beautifulsoup4/) and [Selenium](https://www.selenium.dev/) can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/).

### Import Libraries and Modules

In [3]:
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv

# Import Dependencies
%matplotlib inline

# Math Operations
import numpy as np
from math import pi

# Datetime
import datetime
from datetime import date
import time

# Data Preprocessing
import pandas as pd
import os
import re
import random
import glob
from io import BytesIO
from pathlib import Path

# Working with JSON
import json
from pandas.io.json import json_normalize

# Web Scraping
from selenium import webdriver
from bs4 import BeautifulSoup
import requests

# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-whitegrid')
import missingno as msno

# Progress Bar
from tqdm import tqdm

# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML

# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

print('Setup Complete')

Setup Complete


In [4]:
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))

Python: 3.9.7
NumPy: 1.21.5
pandas: 1.4.1
matplotlib: 3.5.1


### Defined Variables and Lists

##### Today's Date 

In [5]:
# Define today's date as a DDMMYYYY string (used in the output file names)
todays_date = datetime.datetime.now().strftime('%d%m%Y')

##### Season

In [7]:
# Define variables and lists

## Define season
season = '2023'    # '2020' for the 20/21 season

# Create 'Full Season' and 'Short Season' strings

## Full season
full_season_string = str(int(season)) + '/' + str(int(season) + 1)

## Short season
short_season_string = str(int(season))[-2:] + str(int(season) + 1)[-2:]
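
The two string formats built above, plus the hyphenated form used in the Capology URLs later in this notebook (for example `'2022-2023'`), can all be derived from a single start year. A minimal sketch, assuming a hypothetical helper name `season_strings` that is not part of the original notebook:

```python
def season_strings(start_year):
    """Return (full, short, hyphenated) season labels for a season start year."""
    start = int(start_year)
    end = start + 1
    full = f'{start}/{end}'                        # slash-separated, four-digit years
    short = f'{start % 100:02d}{end % 100:02d}'    # last two digits of each year
    hyphenated = f'{start}-{end}'                  # format used in the scraper URLs
    return full, short, hyphenated


full, short, hyphenated = season_strings('2023')
print(full, short, hyphenated)
```

Keeping the three formats in one place avoids having the `season` variable and the season strings passed to the scrapers drift apart.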

##### Scraping Variables

In [8]:
options = webdriver.ChromeOptions()

In [9]:
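## Run Chrome without a visible window, and with flags commonly needed in containers or restricted environments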
options.add_argument('--headless')
options.add_argument('--no-sandbox')
options.add_argument('--disable-dev-shm-usage')

##### Teams and Leagues

In [10]:
# Serie A

## 2013-2014 Serie A

## 2015-2016 Serie A
lst_teams_sa_1516 = ['ac-milan', 'atalanta', 'bologna', 'carpi', 'chievo-verona', 'empoli', 'fiorentina', 'frosinone',
                     'genoa', 'hellas-verona', 'inter-milan', 'juventus', 'lazio', 'napoli', 'palermo', 'roma',
                     'sampdoria', 'sassuolo', 'torino', 'udinese']

## 2016-2017 Serie A
lst_teams_sa_1617 = ['ac-milan', 'atalanta', 'bologna', 'cagliari', 'chievo-verona', 'crotone', 'empoli', 'fiorentina',
                     'genoa', 'inter-milan', 'juventus', 'lazio', 'napoli', 'palermo', 'pescara', 'roma',
                     'sampdoria', 'sassuolo', 'torino', 'udinese']

## 2017-2018 Serie A
lst_teams_sa_1718 = ['ac-milan', 'atalanta', 'benevento', 'bologna', 'cagliari', 'chievo-verona', 'crotone', 'fiorentina',
                     'genoa', 'hellas-verona', 'inter-milan', 'juventus', 'lazio', 'napoli', 'roma',
                     'sampdoria', 'sassuolo', 'spal', 'torino', 'udinese']

## 2018-2019 Serie A
lst_teams_sa_1819 = ['ac-milan', 'atalanta', 'bologna', 'cagliari', 'chievo-verona', 'empoli', 'fiorentina',
                     'frosinone', 'genoa', 'inter-milan', 'juventus', 'lazio', 'napoli', 'parma', 'roma',
                     'sampdoria', 'sassuolo', 'spal', 'torino', 'udinese']

## 2019-2020 Serie A
lst_teams_sa_1920 = ['ac-milan', 'atalanta', 'bologna', 'brescia', 'cagliari', 'fiorentina',
                     'genoa', 'hellas-verona', 'inter-milan', 'juventus', 'lazio', 'lecce', 'napoli', 'parma', 'roma',
                     'sampdoria', 'sassuolo', 'spal', 'torino', 'udinese']

## 2020-2021 Serie A
lst_teams_sa_2021 = ['ac-milan', 'atalanta', 'benevento', 'bologna', 'cagliari', 'crotone', 'fiorentina',
                     'genoa', 'hellas-verona', 'inter-milan', 'juventus', 'lazio', 'napoli', 'parma', 'roma',
                     'sampdoria', 'sassuolo', 'spezia', 'torino', 'udinese']

## 2021-2022 Serie A
lst_teams_sa_2122 = ['ac-milan', 'atalanta', 'bologna', 'cagliari', 'empoli', 'fiorentina',
                     'genoa', 'hellas-verona', 'inter-milan', 'juventus', 'lazio', 'napoli', 'roma', 'salernitana',
                     'sampdoria', 'sassuolo', 'spezia', 'torino', 'udinese', 'venezia']

## 2022-2023 Serie A
lst_teams_sa_2223 = ['ac-milan', 'atalanta', 'bologna', 'cremonese', 'empoli', 'fiorentina',
                     'hellas-verona', 'inter-milan', 'juventus', 'lazio', 'lecce', 'monza', 'napoli',
                     'roma', 'salernitana', 'sampdoria', 'sassuolo', 'spezia', 'torino', 'udinese']

##### Seasons

In [11]:
lst_seasons = ['2016-2017', '2017-2018', '2018-2019', '2019-2020', '2020-2021', '2022-2023']

### Defined Filepaths

In [40]:
# Set up initial paths to subfolders
base_dir = os.path.join("C:\\Users\\vince\\Documents")
data_dir = os.path.join(base_dir, 'data')
data_dir_capology = os.path.join(base_dir, 'data', 'capology')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
data_dir_capology

'C:\\Users\\vince\\Documents\\data\\capology'

### Custom Functions (Scrapers)
Two scrapers are defined, because the page for the current season has a slightly different structure from the pages for previous seasons:
1. Previous seasons (`scrape_capology_season_prev`)
2. Current season (`scrape_capology_season_current`)

#### Previous season scraper

In [41]:
# Define function for scraping a defined season of Capology data
def scrape_capology_season_prev(lst_teams, season, comp):
    
    ### Print statement
    print(f'Scraping for {comp} for the {season} season has now started...')
    
    ## Create empty list for DataFrame
    dfs_players = []
    
    for team in lst_teams:
        if not os.path.exists(os.path.join(data_dir_capology + f'/raw/{comp}/{season}/{team}_{comp}_{season}.csv')):
            url = f'https://www.capology.com/club/{team}/salaries/{season}/'
            print(f'Scraping {team} for the {season} season')
            wd = webdriver.Chrome('chromedriver', options=options)
            wd.get(url)
            time.sleep(5)           # give the page time to render; if this is too low the scrape stops working, 5 seems fine
            html = wd.page_source   # read the source after the wait, so that it includes the rendered tables
            wd.quit()               # close the browser so that Chrome processes do not pile up
            df = pd.read_html(html, header=0)[1]

            ### Data Engineering
            df = df.iloc[1: , :]
            df = df.rename(columns=df.iloc[0])
            df = df[:-1]
            df = df.iloc[1: , :]
            df = df.reset_index()
            df = df.drop(['index', 'Rank'], axis=1)
            df['Team'] = team
            df['Team'] = df['Team'].str.replace('-', ' ').str.title().str.replace('Fc', 'FC').str.replace('Ac', 'AC')
            df['League'] = comp
            df['League'] = df['League'].str.replace('-', ' ').str.title()
            df['Season'] = season
            print(f'Saving DataFrame of {team} for the {season} season')

            ### Save to CSV
            df.to_csv(data_dir_capology + f'/raw/{comp}/{season}/{team}_{comp}_{season}.csv')

            ### Append to joint DataFrame
            dfs_players.append(df)
            
        else:
            df = pd.read_csv(data_dir_capology + f'/raw/{comp}/{season}/{team}_{comp}_{season}.csv', index_col=None, header=0)
            print(f'{team} already scraped and saved for the {season} season')

            ### Append to joint DataFrame
            dfs_players.append(df)
        
    ## Concatenate DataFrames to one DataFrame
    df_players_all = pd.concat(dfs_players)

    ## Engineer unified data
    df_players_all['Team'] = df_players_all['Team'].str.replace('-', ' ').str.title().str.replace('Fc', 'FC')
    df_players_all['League'] = df_players_all['League'].str.replace('-', ' ').str.title()

    ## Save to CSV
    df_players_all.to_csv(data_dir_capology + f'/raw/{comp}/{season}/all_{comp}_{season}.csv')
    
    ### Print statement
    print(f'Scraping for {comp} for the {season} season is now complete')
    
    ## Return unified season dataset
    return df_players_all
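
The scraper above builds file paths by concatenating strings with forward slashes, and it assumes that the `raw/{comp}/{season}` folder already exists. A minimal sketch of one alternative, assuming a hypothetical helper `team_csv_path` (not part of the original notebook) that builds the same file name with `pathlib` and creates any missing parent folders first:

```python
import tempfile
from pathlib import Path


def team_csv_path(data_dir, comp, season, team):
    """Build <data_dir>/raw/<comp>/<season>/<team>_<comp>_<season>.csv,
    creating the parent folders if they do not exist yet."""
    folder = Path(data_dir) / 'raw' / comp / season
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f'{team}_{comp}_{season}.csv'


# Example usage in a temporary folder, so nothing is left behind on disk
with tempfile.TemporaryDirectory() as tmp:
    csv_path = team_csv_path(tmp, 'serie-a', '2020-2021', 'ac-milan')
    print(csv_path.name)
    print(csv_path.parent.is_dir())
```

The returned `Path` can be passed directly to `os.path.exists`, `DataFrame.to_csv`, and `pd.read_csv`.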

#### Current season scraper

In [42]:
# Define function for scraping a defined season of Capology data
def scrape_capology_season_current(lst_teams, season, comp):
    
    ### Print statement
    print(f'Scraping for {comp} for the {season} season has now started...')
    
    ## Create empty list for DataFrame
    dfs_players = []
    
    ## 
    for team in lst_teams:
        if not os.path.exists(os.path.join(data_dir_capology + f'/raw/{comp}/{season}/{team}_{comp}_{season}_last_updated_{todays_date}.csv')):
            url = f'https://www.capology.com/club/{team}/salaries/{season}/'
            print(f'Scraping {team} for the {season} season')
            wd = webdriver.Chrome('chromedriver', options=options)
            wd.get(url)
            time.sleep(4)           # give the page time to render; if this is too low the scrape stops working
            html = wd.page_source   # read the source after the wait, so that it includes the rendered tables
            wd.quit()               # close the browser so that Chrome processes do not pile up
            df = pd.read_html(html, header=0)[1]

            ### Data Engineering
            df = df.iloc[1: , :]
            df = df.rename(columns=df.iloc[0])
            df = df[:-1]
            df = df.iloc[1: , :]
            df = df.reset_index()
            
            ### Drop the first 3 columns of the DataFrame
            df = df.iloc[: , 3:]
            
            ### Create new columns
            df['Team'] = team
            df['Team'] = df['Team'].str.replace('-', ' ').str.title().str.replace('Fc', 'FC').str.replace('Ac', 'AC')
            df['League'] = comp
            df['League'] = df['League'].str.replace('-', ' ').str.title()
            df['Season'] = season
            print(f'Saving DataFrame of {team} for the {season} season')

            ### Save to CSV
            df.to_csv(data_dir_capology + f'/raw/{comp}/{season}/{team}_{comp}_{season}_last_updated_{todays_date}.csv')

            ### Append to joint DataFrame
            dfs_players.append(df)
        else:
            df = pd.read_csv(data_dir_capology + f'/raw/{comp}/{season}/{team}_{comp}_{season}_last_updated_{todays_date}.csv', index_col=None, header=0)
            print(f'{team} already scraped and saved for the {season} season')

            ### Append to joint DataFrame
            dfs_players.append(df)
        
    ## Concatenate DataFrames to one DataFrame
    df_players_all = pd.concat(dfs_players)

    ## Engineer unified data
    df_players_all['Team'] = df_players_all['Team'].str.replace('-', ' ').str.title().str.replace('Fc', 'FC')
    df_players_all['League'] = df_players_all['League'].str.replace('-', ' ').str.title()
    df_players_all = df_players_all.drop(df.columns[1], axis=1)

    ## Save to CSV
    df_players_all.to_csv(data_dir_capology + f'/raw/{comp}/{season}/all_{comp}_{season}_last_updated_{todays_date}.csv')
    
    ### Print statement
    print(f'Scraping for {comp} for the {season} season is now complete')
    
    ## Return unified season dataset
    return df_players_all
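
Both scrapers rely on a fixed `time.sleep` before reading `wd.page_source`. A minimal sketch of a polling alternative, assuming a hypothetical helper `wait_for_html` (not part of the original notebook) that re-reads the page source until a marker string appears or a timeout is reached. Selenium's own `WebDriverWait` class serves the same purpose; the plain-Python version below only needs an object with a `page_source` attribute, so it can be tried without a browser:

```python
import time


def wait_for_html(driver, marker='<table', timeout=15.0, interval=0.5):
    """Poll driver.page_source until it contains `marker`, then return the HTML.

    Raises TimeoutError if the marker has not appeared after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f'{marker!r} not found after {timeout} seconds')
        time.sleep(interval)


class FakeDriver:
    """Stand-in for a Selenium driver: the table appears on the third read."""

    def __init__(self):
        self.reads = 0

    @property
    def page_source(self):
        self.reads += 1
        return '<table></table>' if self.reads >= 3 else '<html></html>'


html = wait_for_html(FakeDriver(), interval=0.01)
print(html)
```

For the real pages, the marker would need to be a string that only appears once the salary table has rendered. Which string is suitable depends on the page and is not something the original notebook specifies.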

### Create Directory Structure

In [43]:
# Make the directory structure, including any missing parent folders such as 'raw'
# (os.mkdir raises FileNotFoundError when the parent folder does not exist)
path = os.path.join(data_dir_capology, 'raw', 'serie-a')
os.makedirs(path, exist_ok=True)

### Notebook Settings

In [44]:
# Display all columns of displayed pandas DataFrames
pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment = None

___

<a id='section2'></a>

## <a id='#section2'>2. Project Brief</a>
This Jupyter notebook is part of a series of notebooks that scrape, parse, engineer, and unify datasets for use in modeling.

This particular notebook is one of several **web scraping** notebooks. It takes player salary data from [Capology](https://www.capology.com/), scrapes it using [Selenium](https://www.selenium.dev/) and [Beautifulsoup](https://pypi.org/project/beautifulsoup4/), and manipulates it as DataFrames using [pandas](http://pandas.pydata.org/).

This notebook and the other notebooks in the project workflow are shown in the following diagram:

![roadmap](../../img/football_analytics_data_roadmap.png)

Links to these notebooks in the [`football_analytics`](https://github.com/eddwebster/football_analytics) GitHub repository can be found at the following:
*    [1. Webscraping](https://github.com/eddwebster/football_analytics/tree/master/notebooks/1_data_scraping)
     +    [FBref Player Stats Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/FBref%20Player%20Stats%20Web%20Scraping.ipynb)
     +    [TransferMarkt Player Bio and Status Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Bio%20and%20Status%20Web%20Scraping.ipynb)
     +    [TransferMarkt Player Valuation Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Valuation%20Web%20Scraping.ipynb)
     +    [TransferMarkt Player Recorded Transfer Fees Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Webscraping.ipynb)
     +    [Capology Player Salary Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/Capology%20Player%20Salary%20Web%20Scraping.ipynb)
     +    [FBref Team Stats Webscraping](https://github.com/eddwebster/football_analytics/blob/master/notebooks/1_data_scraping/FBref%20Team%20Stats%20Web%20Scraping.ipynb)
*    [2. Data Parsing](https://github.com/eddwebster/football_analytics/tree/master/notebooks/2_data_parsing)
     +    [ELO Team Ratings Data Parsing](https://github.com/eddwebster/football_analytics/blob/master/notebooks/2_data_parsing/ELO%20Team%20Ratings%20Data%20Parsing.ipynb)
*    [3. Data Engineering](https://github.com/eddwebster/football_analytics/tree/master/notebooks/3_data_engineering)
     +    [FBref Player Stats Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/FBref%20Player%20Stats%20Data%20Engineering.ipynb)
     +    [TransferMarkt Player Bio and Status Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Bio%20and%20Status%20Data%20Engineering.ipynb)
     +    [TransferMarkt Player Valuation Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Valuation%20Data%20Engineering.ipynb)
     +    [TransferMarkt Player Recorded Transfer Fees Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Data%20Engineering.ipynb)
     +    [Capology Player Salary Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb)
     +    [FBref Team Stats Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/FBref%20Team%20Stats%20Data%20Engineering.ipynb)
     +    [ELO Team Ratings Data Parsing](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/ELO%20Team%20Ratings%20Data%20Parsing.ipynb)
     +    [TransferMarkt Team Recorded Transfer Fee Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Team%20Recorded%20Transfer%20Fee%20Data%20Engineering.ipynb) (aggregated from [TransferMarkt Player Recorded Transfer Fees notebook](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/TransferMarkt%20Player%20Recorded%20Transfer%20Fees%20Data%20Engineering.ipynb))
     +    [Capology Team Salary Data Engineering](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Team%20Salary%20Data%20Engineering.ipynb) (aggregated from [Capology Player Salary notebook](https://github.com/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb))
*    [4. Data Unification](https://github.com/eddwebster/football_analytics/tree/master/notebooks/4_data_unification)
*    [5. Modeling and Data Analysis]()

---

<a id='section3'></a>

## <a id='#section3'>3. Data Scraping</a>

### <a id='#section3.1'>3.1. Introduction</a>
Two scrapers are used, because the page for the current season has a slightly different structure from the pages for previous seasons:
1. Previous seasons (`scrape_capology_season_prev`)
2. Current season (`scrape_capology_season_current`)

### <a id='#section3.2'>3.2. Scrape data by League and Season</a>
The scraper currently iterates through manually written lists of teams per league and season, and each function call downloads one league/season. Ideally, a single loop would scrape all teams, leagues, and seasons in one command, but this requires a little more work in Selenium that I may do at a later date. As it stands, running the notebook scrapes data for the 17/18-20/21 seasons for the 'Big 5' European leagues and MLS.
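
A single-command loop over leagues and seasons could look roughly like the following. This is a minimal sketch under stated assumptions: `scrape_all` and `jobs` are hypothetical names, and the scraper is passed in as an argument so that either `scrape_capology_season_prev` or `scrape_capology_season_current` could be supplied. A stub stands in for the real scraper so that the example runs without a browser.

```python
def scrape_all(jobs, scraper):
    """Run `scraper(lst_teams, season, comp)` for every (comp, season) job.

    `jobs` maps (comp, season) tuples to lists of team slugs. Failures are
    recorded instead of stopping the whole run.
    """
    results, failures = {}, {}
    for (comp, season), lst_teams in jobs.items():
        try:
            results[(comp, season)] = scraper(lst_teams, season, comp)
        except Exception as exc:  # keep going if one league/season fails
            failures[(comp, season)] = exc
    return results, failures


def stub_scraper(lst_teams, season, comp):
    """Stand-in for the real scrapers: returns one row per team."""
    return [(team, comp, season) for team in lst_teams]


jobs = {
    ('serie-a', '2020-2021'): ['ac-milan', 'atalanta'],
    ('serie-a', '2019-2020'): ['brescia', 'lecce'],
}
results, failures = scrape_all(jobs, stub_scraper)
print(len(results), len(failures))
```

Collecting failures per league/season means that one broken page does not discard the seasons that scraped successfully.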

#### <a id='#section3.2.1'>3.2.1. Premier League</a>

In [46]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_pl_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_pl_2021, '2020-2021', 'premier-league')

## Display DataFrame
df_players_all.head()

Scraping for premier-league for the 2020-2021 season has now started...
Scraping arsenal for the 2020-2021 season
Saving DataFrame of arsenal for the 2020-2021 season
Scraping aston-villa for the 2020-2021 season
Saving DataFrame of aston-villa for the 2020-2021 season
Scraping brighton for the 2020-2021 season
Saving DataFrame of brighton for the 2020-2021 season
Scraping burnley for the 2020-2021 season
Saving DataFrame of burnley for the 2020-2021 season
Scraping chelsea for the 2020-2021 season
Saving DataFrame of chelsea for the 2020-2021 season
Scraping crystal-palace for the 2020-2021 season
Saving DataFrame of crystal-palace for the 2020-2021 season
Scraping everton for the 2020-2021 season
Saving DataFrame of everton for the 2020-2021 season
Scraping fulham for the 2020-2021 season
Saving DataFrame of fulham for the 2020-2021 season
Scraping leeds for the 2020-2021 season
Saving DataFrame of leeds for the 2020-2021 season
Scraping leicester for the 2020-2021 season
Saving Data

Unnamed: 0,Player,Weekly GrossBase Salary(IN GBP),Annual GrossBase Salary(IN GBP),"Adj. GrossBase Salary(2021, IN GBP)",Pos.,Age,Country,Team,League,Season
0,Mesut Özil,"£ 350,000","£ 18,200,000","£ 18,200,000",F,32,Germany,Arsenal,Premier League,2020-2021
1,Pierre-Emerick Aubameyang,"£ 250,000","£ 13,000,000","£ 13,000,000",F,31,Gabon,Arsenal,Premier League,2020-2021
2,Thomas Partey,"£ 250,000","£ 13,000,000","£ 13,000,000",M,27,Ghana,Arsenal,Premier League,2020-2021
3,Alexandre Lacazette,"£ 182,115","£ 9,470,000","£ 9,470,000",F,29,France,Arsenal,Premier League,2020-2021
4,Willian,"£ 138,462","£ 7,200,000","£ 7,200,000",F,32,Brazil,Arsenal,Premier League,2020-2021


In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_pl_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_pl_1920, '2019-2020', 'premier-league')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_pl_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_pl_1819, '2018-2019', 'premier-league')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_pl_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_pl_1718, '2017-2018', 'premier-league')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_pl_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_pl_1617, '2016-2017', 'premier-league')

## Display DataFrame
df_players_all.head()

#### <a id='#section3.2.2'>3.2.2. Serie A</a>

In [50]:
# Create DataFrame using the 'scrape_capology_season_current' function. Arguments: 1) list of teams (e.g. lst_teams_sa_2223), 2) season (e.g. 2022-2023), 3) competition (e.g. serie-a)
df_players_all = scrape_capology_season_current(lst_teams_sa_2223, '2022-2023', 'serie-a')

## Display DataFrame
df_players_all.head()

Scraping for serie-a for the 2022-2023 season has now started...
Scraping ac-milan for the 2022-2023 season
Saving DataFrame of ac-milan for the 2022-2023 season
Scraping atalanta for the 2022-2023 season
Saving DataFrame of atalanta for the 2022-2023 season
Scraping bologna for the 2022-2023 season
Saving DataFrame of bologna for the 2022-2023 season
Scraping cremonese for the 2022-2023 season
Saving DataFrame of cremonese for the 2022-2023 season
Scraping empoli for the 2022-2023 season
Saving DataFrame of empoli for the 2022-2023 season
Scraping fiorentina for the 2022-2023 season
Saving DataFrame of fiorentina for the 2022-2023 season
Scraping hellas-verona for the 2022-2023 season
Saving DataFrame of hellas-verona for the 2022-2023 season
Scraping inter-milan for the 2022-2023 season
Saving DataFrame of inter-milan for the 2022-2023 season
Scraping juventus for the 2022-2023 season
Saving DataFrame of juventus for the 2022-2023 season
Scraping lazio for the 2022-2023 season
Saving

InvalidIndexError: Reindexing only valid with uniquely valued Index objects
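
This error is typically raised by `pd.concat` when at least one of the frames being combined has repeated column names and the frames do not all share identical columns. Whether that is what happens for the page that fails here is an assumption, because the output above is truncated. A minimal sketch of how such a frame could be detected and reduced to unique columns before concatenation, using a hypothetical helper `drop_duplicate_columns` and made-up values:

```python
import pandas as pd


def drop_duplicate_columns(df):
    """Keep only the first occurrence of each column name."""
    return df.loc[:, ~df.columns.duplicated()]


# Toy frames (made-up values): the second one repeats the 'Player' column
df_a = pd.DataFrame([['A', 1]], columns=['Player', 'Age'])
df_b = pd.DataFrame([['B', 'B', 2, 'x']],
                    columns=['Player', 'Player', 'Age', 'Status'])

# List the column names that occur more than once
print(df_b.columns[df_b.columns.duplicated()].tolist())

# After de-duplication the frames can be concatenated
df_all = pd.concat([df_a, drop_duplicate_columns(df_b)], ignore_index=True)
print(df_all.columns.tolist())
```

Printing the duplicated names for each team before calling `pd.concat` would show which page produces the clash, which is more informative than dropping columns blindly.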

In [51]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_sa_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. serie-a)
df_players_all = scrape_capology_season_prev(lst_teams_sa_2021, '2020-2021', 'serie-a')

## Display DataFrame
df_players_all.head()

Scraping for serie-a for the 2020-2021 season has now started...
Scraping ac-milan for the 2020-2021 season


IndexError: list index out of range

In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_sa_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. serie-a)
df_players_all = scrape_capology_season_prev(lst_teams_sa_1920, '2019-2020', 'serie-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_sa_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. serie-a)
df_players_all = scrape_capology_season_prev(lst_teams_sa_1819, '2018-2019', 'serie-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_sa_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. serie-a)
df_players_all = scrape_capology_season_prev(lst_teams_sa_1718, '2017-2018', 'serie-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_sa_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. serie-a)
df_players_all = scrape_capology_season_prev(lst_teams_sa_1617, '2016-2017', 'serie-a')

## Display DataFrame
df_players_all.head()

#### <a id='#section3.2.3'>3.2.3. La Liga</a>

In [34]:
# Create DataFrame using the 'scrape_capology_season_current' function. Arguments: 1) list of teams (e.g. lst_teams_ll_2122), 2) season (e.g. 2021-2022), 3) competition (e.g. la-liga)
df_players_all = scrape_capology_season_current(lst_teams_ll_2122, '2021-2022', 'la-liga')

## Display DataFrame
df_players_all.head()

Scraping for la-liga for the 2021-2022 season has now started...
Scraping alaves for the 2021-2022 season
Saving DataFrame of alaves for the 2021-2022 season
Scraping athletic-club for the 2021-2022 season
Saving DataFrame of athletic-club for the 2021-2022 season
Scraping atletico-madrid for the 2021-2022 season
Saving DataFrame of atletico-madrid for the 2021-2022 season
Scraping barcelona for the 2021-2022 season
Saving DataFrame of barcelona for the 2021-2022 season
Scraping cadiz for the 2021-2022 season
Saving DataFrame of cadiz for the 2021-2022 season
Scraping celta-vigo for the 2021-2022 season
Saving DataFrame of celta-vigo for the 2021-2022 season
Scraping elche for the 2021-2022 season
Saving DataFrame of elche for the 2021-2022 season
Scraping espanyol for the 2021-2022 season
Saving DataFrame of espanyol for the 2021-2022 season
Scraping getafe for the 2021-2022 season
Saving DataFrame of getafe for the 2021-2022 season
Scraping granada for the 2021-2022 season
Saving Dat

Unnamed: 0,Player,Weekly GrossBase Salary(IN EUR),Annual GrossBase Salary(IN EUR),Pos.,Age,Status,Expiration,Length,EstimatedGross Total(IN EUR),Team,League,Season
0,Fernando Pacheco,"€ 28,269","€ 1,470,000",GK,29,,"Jun 30, 2023",2-yrs,"€ 2,940,000",Alaves,La Liga,2021-2022
1,Tomás Pina,"€ 27,500","€ 1,430,000",CM,33,,"Jun 30, 2022",1-yr,"€ 1,430,000",Alaves,La Liga,2021-2022
2,Lucas Pérez,"€ 26,731","€ 1,390,000",CF,32,,"Jun 30, 2022",1-yr,"€ 1,390,000",Alaves,La Liga,2021-2022
3,Pere Pons,"€ 20,192","€ 1,050,000",DM,28,,"Jun 30, 2022",1-yr,"€ 1,050,000",Alaves,La Liga,2021-2022
4,Matt Miazga,"€ 17,500","€ 910,000",CB,25,,"Jun 30, 2022",1-yr,"€ 910,000",Alaves,La Liga,2021-2022


In [48]:
# Create DataFrame using the 'scrape_capology_season_prev' function. Arguments: 1) list of teams (e.g. lst_teams_ll_2021), 2) season (e.g. 2020-2021), 3) competition (e.g. la-liga)
df_players_all = scrape_capology_season_prev(lst_teams_ll_2021, '2020-2021', 'la-liga')

## Display DataFrame
df_players_all.head()

Scraping for la-liga for the 2020-2021 season has now started...
alaves already scraped and saved for the 2020-2021 season
athletic-club already scraped and saved for the 2020-2021 season
atletico-madrid already scraped and saved for the 2020-2021 season
barcelona already scraped and saved for the 2020-2021 season
cadiz already scraped and saved for the 2020-2021 season
celta-vigo already scraped and saved for the 2020-2021 season
eibar already scraped and saved for the 2020-2021 season
elche already scraped and saved for the 2020-2021 season
getafe already scraped and saved for the 2020-2021 season
granada already scraped and saved for the 2020-2021 season
huesca already scraped and saved for the 2020-2021 season
levante already scraped and saved for the 2020-2021 season
osasuna already scraped and saved for the 2020-2021 season
real-betis already scraped and saved for the 2020-2021 season
real-madrid already scraped and saved for the 2020-2021 season
real-sociedad already scraped and

Unnamed: 0.1,Unnamed: 0,Player,Weekly GrossBase Salary(IN EUR),Annual GrossBase Salary(IN EUR),"Adj. GrossBase Salary(2021, IN EUR)",Pos.,Age,Country,Team,League,Season
0,0,Rodrigo Battaglia,"€ 52,885","€ 2,750,000","€ 2,750,000",M,29,Argentina,Alaves,La Liga,2020-2021
1,1,Iñigo Córdoba,"€ 51,154","€ 2,660,000","€ 2,660,000",F,23,Spain,Alaves,La Liga,2020-2021
2,2,Jota Peleteiro,"€ 34,808","€ 1,810,000","€ 1,810,000",F,29,Spain,Alaves,La Liga,2020-2021
3,3,Florian Lejeune,"€ 28,846","€ 1,500,000","€ 1,500,000",D,29,France,Alaves,La Liga,2020-2021
4,4,Fernando Pacheco,"€ 28,269","€ 1,470,000","€ 1,470,000",K,28,Spain,Alaves,La Liga,2020-2021
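
The "already scraped and saved" lines in the log above suggest that a team is skipped when its CSV is already on disk. The notebook's own function is defined earlier and is not reproduced here; the following is only a minimal sketch of that skip-if-saved pattern using the standard library. `load_or_scrape`, `fake_scrape` and the example path are hypothetical names introduced for illustration.

```python
import csv
import tempfile
from pathlib import Path


def load_or_scrape(csv_path, scrape_fn):
    """Return rows from csv_path if it exists; otherwise call scrape_fn and save its rows.

    Assumes scrape_fn returns a non-empty list of dicts sharing the same keys.
    """
    csv_path = Path(csv_path)
    if csv_path.exists():
        print(f"{csv_path.stem} already scraped and saved")
        with csv_path.open(newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    rows = scrape_fn()
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows


# Hypothetical stand-in for a real scraping call; it records how often it runs
calls = []


def fake_scrape():
    calls.append(1)
    return [{"Player": "Example Player", "Team": "Example FC"}]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "la-liga" / "2020-2021" / "example-fc.csv"
    first = load_or_scrape(path, fake_scrape)   # scrapes and saves
    second = load_or_scrape(path, fake_scrape)  # reads the saved file instead
```

The second call never reaches `fake_scrape`, which is the behaviour the log implies for seasons that have already been landed.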


In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_ll_1920, '2019-2020', 'la-liga')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_ll_1819, '2018-2019', 'la-liga')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_ll_1718, '2017-2018', 'la-liga')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_ll_1617, '2016-2017', 'la-liga')

## Display DataFrame
df_players_all.head()
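
The previous-season cells above differ only in the team list and the season label, so they could be driven by a loop. The sketch below assumes a scraping function that takes `(teams, season, competition)` in that order, as the calls above do. `scrape_many_seasons` and `fake_scrape` are hypothetical helpers, and the team lists are shortened placeholders rather than the notebook's real lists.

```python
from typing import Callable, Dict, List


def scrape_many_seasons(
    scrape_fn: Callable[[List[str], str, str], object],
    teams_by_season: Dict[str, List[str]],
    competition: str,
) -> Dict[str, object]:
    """Call scrape_fn once per season and collect the results by season label."""
    results = {}
    for season, teams in teams_by_season.items():
        results[season] = scrape_fn(teams, season, competition)
    return results


# Hypothetical stand-in for the notebook's scraping function: it only reports its arguments
def fake_scrape(teams, season, competition):
    return f"{competition}/{season}: {len(teams)} teams"


# Placeholder team lists, shortened for illustration
teams_by_season = {
    "2019-2020": ["alaves", "barcelona"],
    "2018-2019": ["alaves", "barcelona", "girona"],
}

results = scrape_many_seasons(fake_scrape, teams_by_season, "la-liga")
print(results)
```

In the notebook, `scrape_capology_season_prev` would take the place of `fake_scrape`, with lists such as `lst_teams_ll_1920` as the dictionary values.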

#### <a id='#section3.2.4'>3.2.4. Bundesliga</a>

In [36]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_current(lst_teams_b_2122, '2021-2022', 'bundesliga')

## Display DataFrame
df_players_all.head()

Scraping for bundesliga for the 2021-2022 season has now started...
Scraping arminia-bielefeld for the 2021-2022 season
Saving DataFrame of arminia-bielefeld for the 2021-2022 season
Scraping augsburg for the 2021-2022 season
Saving DataFrame of augsburg for the 2021-2022 season
Scraping bayer-leverkusen for the 2021-2022 season
Saving DataFrame of bayer-leverkusen for the 2021-2022 season
Scraping bayern-munich for the 2021-2022 season
Saving DataFrame of bayern-munich for the 2021-2022 season
Scraping bochum for the 2021-2022 season
Saving DataFrame of bochum for the 2021-2022 season
Scraping borussia-dortmund for the 2021-2022 season
Saving DataFrame of borussia-dortmund for the 2021-2022 season
Scraping eintracht-frankfurt for the 2021-2022 season
Saving DataFrame of eintracht-frankfurt for the 2021-2022 season
Scraping freiburg for the 2021-2022 season
Saving DataFrame of freiburg for the 2021-2022 season
Scraping furth for the 2021-2022 season
Saving DataFrame of furth for the 20

Unnamed: 0,Player,Weekly GrossBase Salary(IN EUR),Annual GrossBase Salary(IN EUR),Pos.,Age,Status,Expiration,Length,EstimatedGross Total(IN EUR),Team,League,Season
0,Lennart Czyborra,"€ 14,423","€ 750,000",LB,22,,"Jun 30, 2022",1-yr,"€ 750,000",Arminia Bielefeld,Bundesliga,2021-2022
1,Joakim Nilsson,"€ 8,846","€ 460,000",CB,27,,"Jun 30, 2022",1-yr,"€ 460,000",Arminia Bielefeld,Bundesliga,2021-2022
2,Cédric Brunner,"€ 7,500","€ 390,000",RB,27,,"Jun 30, 2022",1-yr,"€ 390,000",Arminia Bielefeld,Bundesliga,2021-2022
3,Fabian Klos,"€ 7,115","€ 370,000",CF,33,,"Jun 30, 2022",1-yr,"€ 370,000",Arminia Bielefeld,Bundesliga,2021-2022
4,Manuel Prietl,"€ 5,769","€ 300,000",DM,30,,"Jun 30, 2024",3-yrs,"€ 900,000",Arminia Bielefeld,Bundesliga,2021-2022


In [49]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_b_2021, '2020-2021', 'bundesliga')

## Display DataFrame
df_players_all.head()

Scraping for bundesliga for the 2020-2021 season has now started...
arminia-bielefeld already scraped and saved for the 2020-2021 season
augsburg already scraped and saved for the 2020-2021 season
bayer-leverkusen already scraped and saved for the 2020-2021 season
bayern-munich already scraped and saved for the 2020-2021 season
borussia-dortmund already scraped and saved for the 2020-2021 season
eintracht-frankfurt already scraped and saved for the 2020-2021 season
freiburg already scraped and saved for the 2020-2021 season
hertha-berlin already scraped and saved for the 2020-2021 season
hoffenheim already scraped and saved for the 2020-2021 season
leipzig already scraped and saved for the 2020-2021 season
mainz already scraped and saved for the 2020-2021 season
monchengladbach already scraped and saved for the 2020-2021 season
schalke-04 already scraped and saved for the 2020-2021 season
stuttgart already scraped and saved for the 2020-2021 season
union-berlin already scraped and save

Unnamed: 0.1,Unnamed: 0,Player,Weekly GrossBase Salary(IN EUR),Annual GrossBase Salary(IN EUR),"Adj. GrossBase Salary(2021, IN EUR)",Pos.,Age,Country,Team,League,Season
0,0,Arne Maier,"€ 26,923","€ 1,400,000","€ 1,400,000",M,21,Germany,Arminia Bielefeld,Bundesliga,2020-2021
1,1,Michel Vlap,"€ 19,231","€ 1,000,000","€ 1,000,000",F,23,Netherlands,Arminia Bielefeld,Bundesliga,2020-2021
2,2,Mike van der Hoorn,"€ 14,231","€ 740,000","€ 740,000",D,28,Netherlands,Arminia Bielefeld,Bundesliga,2020-2021
3,3,Joakim Nilsson,"€ 8,846","€ 460,000","€ 460,000",D,26,Sweden,Arminia Bielefeld,Bundesliga,2020-2021
4,4,Marcel Hartel,"€ 8,269","€ 430,000","€ 430,000",F,24,Germany,Arminia Bielefeld,Bundesliga,2020-2021


In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_b_1920, '2019-2020', 'bundesliga')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_b_1819, '2018-2019', 'bundesliga')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_b_1718, '2017-2018', 'bundesliga')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_b_1617, '2016-2017', 'bundesliga')

## Display DataFrame
df_players_all.head()

#### <a id='#section3.2.5'>3.2.5. 2. Bundesliga</a>

In [None]:
# ADD CODE HERE

#### <a id='#section3.2.6'>3.2.6. Ligue 1</a>

In [38]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_current(lst_teams_l1_2122, '2021-2022', 'ligue-1')

## Display DataFrame
df_players_all.head()

Scraping for ligue-1 for the 2021-2022 season has now started...
Scraping angers for the 2021-2022 season
Saving DataFrame of angers for the 2021-2022 season
Scraping bordeaux for the 2021-2022 season
Saving DataFrame of bordeaux for the 2021-2022 season
Scraping brest for the 2021-2022 season
Saving DataFrame of brest for the 2021-2022 season
Scraping clermont for the 2021-2022 season
Saving DataFrame of clermont for the 2021-2022 season
Scraping lens for the 2021-2022 season
Saving DataFrame of lens for the 2021-2022 season
Scraping lille for the 2021-2022 season
Saving DataFrame of lille for the 2021-2022 season
Scraping lorient for the 2021-2022 season
Saving DataFrame of lorient for the 2021-2022 season
Scraping lyon for the 2021-2022 season
Saving DataFrame of lyon for the 2021-2022 season
Scraping marseille for the 2021-2022 season
Saving DataFrame of marseille for the 2021-2022 season
Scraping metz for the 2021-2022 season
Saving DataFrame of metz for the 2021-2022 season
Scrap

Unnamed: 0,Player,Weekly GrossBase Salary(IN EUR),Annual GrossBase Salary(IN EUR),Pos.,Age,Status,Expiration,Length,EstimatedGross Total(IN EUR),Team,League,Season
0,Sofiane Boufal,"€ 32,308","€ 1,680,000",LW,27,,"Jun 30, 2024",3-yrs,"€ 5,040,000",Angers,Ligue 1,2021-2022
1,Stéphane Bahoken,"€ 16,154","€ 840,000",CF,29,,"Jun 30, 2022",1-yr,"€ 840,000",Angers,Ligue 1,2021-2022
2,Ismaël Traoré,"€ 15,000","€ 780,000",CB,34,,"Jun 30, 2022",1-yr,"€ 780,000",Angers,Ligue 1,2021-2022
3,Thomas Mangani,"€ 15,000","€ 780,000",CM,34,,"Jun 30, 2022",1-yr,"€ 780,000",Angers,Ligue 1,2021-2022
4,Souleyman Doumbia,"€ 13,077","€ 680,000",LB,24,,"Jun 30, 2023",2-yrs,"€ 1,360,000",Angers,Ligue 1,2021-2022


In [50]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_l1_2021, '2020-2021', 'ligue-1')

## Display DataFrame
df_players_all.head()

Scraping for ligue-1 for the 2020-2021 season has now started...
angers already scraped and saved for the 2020-2021 season
bordeaux already scraped and saved for the 2020-2021 season
brest already scraped and saved for the 2020-2021 season
dijon already scraped and saved for the 2020-2021 season
lens already scraped and saved for the 2020-2021 season
lille already scraped and saved for the 2020-2021 season
lorient already scraped and saved for the 2020-2021 season
lyon already scraped and saved for the 2020-2021 season
marseille already scraped and saved for the 2020-2021 season
metz already scraped and saved for the 2020-2021 season
monaco already scraped and saved for the 2020-2021 season
montpellier already scraped and saved for the 2020-2021 season
nantes already scraped and saved for the 2020-2021 season
nice already scraped and saved for the 2020-2021 season
nimes already scraped and saved for the 2020-2021 season
psg already scraped and saved for the 2020-2021 season
reims alrea

Unnamed: 0.1,Unnamed: 0,Player,Weekly GrossBase Salary(IN EUR),Annual GrossBase Salary(IN EUR),"Adj. GrossBase Salary(2021, IN EUR)",Pos.,Age,Country,Team,League,Season
0,0,Sofiane Boufal,"€ 32,308","€ 1,680,000","€ 1,680,000",F,27,Morocco,Angers,Ligue 1,2020-2021
1,1,Ibrahim Amadou,"€ 23,462","€ 1,220,000","€ 1,220,000",M,27,France,Angers,Ligue 1,2020-2021
2,2,Loïs Diony,"€ 17,885","€ 930,000","€ 930,000",F,27,France,Angers,Ligue 1,2020-2021
3,3,Stéphane Bahoken,"€ 16,154","€ 840,000","€ 840,000",F,28,Cameroon,Angers,Ligue 1,2020-2021
4,4,Ismaël Traoré,"€ 15,000","€ 780,000","€ 780,000",D,34,Cote d'Ivoire,Angers,Ligue 1,2020-2021


In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_l1_1920, '2019-2020', 'ligue-1')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_l1_1819, '2018-2019', 'ligue-1')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_l1_1718, '2017-2018', 'ligue-1')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_l1_1617, '2016-2017', 'ligue-1')

## Display DataFrame
df_players_all.head()

#### <a id='#section3.2.7'>3.2.7. MLS</a>

In [40]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_current(lst_teams_mls_21, '2021', 'mls')

## Display DataFrame
df_players_all.head()

Scraping for mls for the 2021 season has now started...
Scraping atlanta-united for the 2021 season
Saving DataFrame of atlanta-united for the 2021 season
Scraping austin for the 2021 season
Saving DataFrame of austin for the 2021 season
Scraping chicago-fire for the 2021 season
Saving DataFrame of chicago-fire for the 2021 season
Scraping colorado-rapids for the 2021 season
Saving DataFrame of colorado-rapids for the 2021 season
Scraping columbus-crew for the 2021 season
Saving DataFrame of columbus-crew for the 2021 season
Scraping dc-united for the 2021 season
Saving DataFrame of dc-united for the 2021 season
Scraping fc-cincinnati for the 2021 season
Saving DataFrame of fc-cincinnati for the 2021 season
Scraping fc-dallas for the 2021 season
Saving DataFrame of fc-dallas for the 2021 season
Scraping houston-dynamo for the 2021 season
Saving DataFrame of houston-dynamo for the 2021 season
Scraping inter-miami for the 2021 season
Saving DataFrame of inter-miami for the 2021 season
Sc

Unnamed: 0,Player,Weekly GrossBase Salary(IN USD),Annual GrossBase Salary(IN USD),Pos.,Age,RosterStatus,Expiration,Length,EstimatedGross Total(IN USD),Team,League,Season
0,Josef Martínez,"$ 67,308","$ 3,500,000",CF,27,Reserve,"Dec 31, 2023",3-yrs,"$ 10,500,000",Atlanta United,Mls,2021
1,Ezequiel Barco,"$ 27,404","$ 1,425,000",LW,22,Starter,"Dec 31, 2022",2-yrs,"$ 2,850,000",Atlanta United,Mls,2021
2,Jürgen Damm,"$ 24,808","$ 1,290,000",RW,28,Reserve,"Dec 31, 2021",1-yr,"$ 1,290,000",Atlanta United,Mls,2021
3,Emerson Hyndman,"$ 17,308","$ 900,000",CM,25,Starter,"Dec 31, 2022",2-yrs,"$ 1,800,000",Atlanta United,Mls,2021
4,Brad Guzan,"$ 15,481","$ 805,000",GK,36,Starter,"Dec 31, 2023",3-yrs,"$ 2,415,000",Atlanta United,Mls,2021


In [51]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_mls_20, '2020', 'mls')

## Display DataFrame
df_players_all.head()

Scraping for mls for the 2020 season has now started...
atlanta-united already scraped and saved for the 2020 season
chicago-fire already scraped and saved for the 2020 season
colorado-rapids already scraped and saved for the 2020 season
columbus-crew already scraped and saved for the 2020 season
dc-united already scraped and saved for the 2020 season
fc-cincinnati already scraped and saved for the 2020 season
fc-dallas already scraped and saved for the 2020 season
houston-dynamo already scraped and saved for the 2020 season
inter-miami already scraped and saved for the 2020 season
la-fc already scraped and saved for the 2020 season
la-galaxy already scraped and saved for the 2020 season
minnesota-united already scraped and saved for the 2020 season
montreal-impact already scraped and saved for the 2020 season
nashville-sc already scraped and saved for the 2020 season
ne-revolution already scraped and saved for the 2020 season
nyc-fc already scraped and saved for the 2020 season
ny-red

Unnamed: 0.1,Unnamed: 0,Player,Weekly GrossBase Salary(IN USD),Annual GrossBase Salary(IN USD),"Adj. GrossBase Salary(2021, IN USD)",Pos.,Age,Country,Team,League,Season
0,0,Josef Martínez,"$ 58,808","$ 3,058,000","$ 3,058,000",F,26,Venezuela,Atlanta United,Mls,2020
1,1,Ezequiel Barco,"$ 27,404","$ 1,425,000","$ 1,425,000",F,20,Argentina,Atlanta United,Mls,2020
2,2,Gonzalo Martínez,"$ 17,308","$ 900,000","$ 900,000",F,26,Argentina,Atlanta United,Mls,2020
3,3,Brad Guzan,"$ 13,077","$ 680,004","$ 680,004",K,35,United States,Atlanta United,Mls,2020
4,4,Matheus Rossetto,"$ 8,000","$ 416,000","$ 416,000",M,23,Brazil,Atlanta United,Mls,2020


In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_mls_19, '2019', 'mls')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_mls_18, '2018', 'mls')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_mls_17, '2017', 'mls')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_mls_16, '2016', 'mls')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_mls_15, '2015', 'mls')

## Display DataFrame
df_players_all.head()

#### <a id='#section3.2.8'>3.2.8. Belgian First Division A</a>

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_current(lst_teams_belgian_2122, '2021-2022', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_2021, '2020-2021', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_1920, '2019-2020', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_1819, '2018-2019', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_1718, '2017-2018', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_1617, '2016-2017', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_1516, '2015-2016', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
## Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_1415, '2014-2015', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

In [None]:
# Create DataFrame using 'scrape_capology_season' function, include - 1) List of teams (e.g. lst_teams_pl_2021), 2) Season (e.g. 2020-2021), 3) Competition (e.g. premier-league)
df_players_all = scrape_capology_season_prev(lst_teams_belgian_1314, '2013-2014', 'belgian-first-division-a')

## Display DataFrame
df_players_all.head()

#### <a id='#section3.2.9'>3.2.9. Scottish Premiership</a>

In [None]:
# TO ADD CODE HERE

#### <a id='#section3.2.10'>3.2.10. Championship</a>

In [None]:
# TO ADD CODE HERE

---

<a id='section4'></a>

## <a id='#section4'>4. Data Unification</a>
Unify the scraped and landed datasets for each team, league and season into a single DataFrame, using `glob` to collect the saved per-season CSV files.

In [42]:
# Show files in directory
all_files = glob.glob(os.path.join(data_dir_capology, 'raw', '*', '*', 'all_*.csv'))
all_files

['../../data/capology/raw/serie-a/2018-2019/all_serie-a_2018-2019.csv',
 '../../data/capology/raw/serie-a/2019-2020/all_serie-a_2019-2020.csv',
 '../../data/capology/raw/serie-a/2021-2022/all_serie-a_2021-2022_last_updated_05092021.csv',
 '../../data/capology/raw/serie-a/2020-2021/all_serie-a_2020-2021.csv',
 '../../data/capology/raw/serie-a/2016-2017/all_serie-a_2016-2017.csv',
 '../../data/capology/raw/serie-a/2017-2018/all_serie-a_2017-2018.csv',
 '../../data/capology/raw/belgian-first-division-a/2018-2019/all_belgian-first-division-a_2018-2019.csv',
 '../../data/capology/raw/belgian-first-division-a/2019-2020/all_belgian-first-division-a_2019-2020.csv',
 '../../data/capology/raw/belgian-first-division-a/2015-2016/all_belgian-first-division-a_2015-2016.csv',
 '../../data/capology/raw/belgian-first-division-a/2021-2022/all_belgian-first-division-a_2021-2022_last_updated_14082021.csv',
 '../../data/capology/raw/belgian-first-division-a/2020-2021/all_belgian-first-division-a_2020-2021.

In [43]:
lst_all_teams = []    # pd.concat takes a list of DataFrames as an argument

for filename in all_files:
    df_temp = pd.read_csv(filename, index_col=None, header=0)
    lst_all_teams.append(df_temp)

df_players_all = pd.concat(lst_all_teams, axis=0, ignore_index=True)
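
The unified output further down carries several `Unnamed: ...` columns. These typically appear when a CSV saved with its index is read back without `index_col`, and they accumulate each time a file is re-saved. A minimal sketch of dropping them at read time follows. The in-memory CSVs are made-up stand-ins for the real files, and it is an assumption that every `Unnamed:` column here is a leftover index rather than real data.

```python
import io

import pandas as pd

# Made-up stand-ins for two saved season files with leftover index columns
csv_a = io.StringIO("Unnamed: 0,Player,Team\n0,Player A,Team X\n1,Player B,Team X\n")
csv_b = io.StringIO("Unnamed: 0.1,Unnamed: 0,Player,Team\n0,0,Player C,Team Y\n")

frames = []
for handle in (csv_a, csv_b):
    df_temp = pd.read_csv(handle)
    # Keep only columns whose name does not start with 'Unnamed:'
    df_temp = df_temp.loc[:, ~df_temp.columns.str.startswith("Unnamed:")]
    frames.append(df_temp)

df_example = pd.concat(frames, axis=0, ignore_index=True)
print(df_example.columns.tolist())
```

Removing these columns before `drop_duplicates()` also matters, because two otherwise identical rows with different saved index values are not treated as duplicates.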

In [44]:
# Engineer unified data

## Standardise team and league names: hyphens to spaces, title case, and 'Fc' restored to 'FC'
df_players_all['Team'] = df_players_all['Team'].str.replace('-', ' ').str.title().str.replace('Fc', 'FC')
df_players_all['League'] = df_players_all['League'].str.replace('-', ' ').str.title()


## Drop duplicates
df_players_all = df_players_all.drop_duplicates()

df_players_all

Unnamed: 0.2,Unnamed: 0,Unnamed: 0.1,Player,Weekly GrossBase Salary(IN EUR),Annual GrossBase Salary(IN EUR),"Adj. GrossBase Salary(2021, IN EUR)",Pos.,Age,Country,Team,League,Season,Status,Expiration,Length,EstimatedGross Total(IN EUR),Unnamed: 2,Weekly GrossBase Salary(IN GBP),Annual GrossBase Salary(IN GBP),"Adj. GrossBase Salary(2021, IN GBP)",EstimatedGross Total(IN GBP),Weekly GrossBase Salary(IN USD),Annual GrossBase Salary(IN USD),"Adj. GrossBase Salary(2021, IN USD)",RosterStatus,EstimatedGross Total(IN USD)
0,0,0.0,Gonzalo Higuaín,"€ 338,327","€ 17,593,000","€ 17,568,773",F,30,Argentina,Ac Milan,Serie A,2018-2019,,,,,,,,,,,,,,
1,1,1.0,Gianluigi Donnarumma,"€ 213,673","€ 11,111,000","€ 11,095,699",K,19,Italy,Ac Milan,Serie A,2018-2019,,,,,,,,,,,,,,
2,2,2.0,Lucas Biglia,"€ 124,635","€ 6,481,000","€ 6,472,075",M,32,Argentina,Ac Milan,Serie A,2018-2019,,,,,,,,,,,,,,
3,3,3.0,Alessio Romagnoli,"€ 124,635","€ 6,481,000","€ 6,472,075",D,23,Italy,Ac Milan,Serie A,2018-2019,,,,,,,,,,,,,,
4,4,4.0,Tiemoué Bakayoko,"€ 124,635","€ 6,481,000","€ 6,472,075",M,23,France,Ac Milan,Serie A,2018-2019,,,,,,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
28339,35,35.0,Pedro Martínez,€ 0,€ 0,€ 0,M,21,Spain,Villarreal,La Liga,2017-2018,,,,,,,,,,,,,,
28340,36,36.0,Chuca,€ 0,€ 0,€ 0,M,20,Spain,Villarreal,La Liga,2017-2018,,,,,,,,,,,,,,
28341,37,37.0,Cédric Bakambu,€ 0,€ 0,€ 0,F,26,Democratic Republic of Congo,Villarreal,La Liga,2017-2018,,,,,,,,,,,,,,
28342,38,38.0,Bruno Soriano,€ 0,€ 0,€ 0,M,33,Spain,Villarreal,La Liga,2017-2018,,,,,,,,,,,,,,
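
The `Team` column in the output above contains `Ac Milan`, which shows how the clean-up chain behaves: `.str.title()` capitalises only the first letter of each word, and only `Fc` is restored afterwards. The sketch below replays the same three steps on a few made-up slugs. `regex=False` is passed explicitly so that the result does not depend on the pandas version's default.

```python
import pandas as pd

# Made-up team slugs for illustration
slugs = pd.Series(["ac-milan", "fc-dallas", "real-betis"])

names = (
    slugs.str.replace("-", " ", regex=False)   # 'ac-milan' -> 'ac milan'
         .str.title()                          # 'ac milan' -> 'Ac Milan'
         .str.replace("Fc", "FC", regex=False) # 'Fc Dallas' -> 'FC Dallas'
)

print(names.tolist())  # ['Ac Milan', 'FC Dallas', 'Real Betis']
```

Other abbreviations, such as `Ac`, would need their own replacement step if upper case is wanted.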


---

<a id='section5'></a>

## <a id='#section5'>5. Export Data</a>

In [45]:
# Export DataFrames
df_players_all.to_csv(data_dir_capology + f'/raw/archive/capology_all_1617_2122_last_updated_{todays_date}.csv', index=False, header=True)
df_players_all.to_csv(data_dir_capology + '/raw/capology_all_latest.csv', index=False, header=True)
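
`todays_date` is defined earlier in the notebook. The landed filenames listed in Section 4, such as `..._last_updated_14082021.csv`, imply a day-month-year stamp with no separators. A minimal sketch of building such a filename follows; `dated_filename` is a hypothetical helper, not part of the notebook.

```python
import datetime


def dated_filename(prefix: str, day: datetime.date) -> str:
    """Build '<prefix>_last_updated_<DDMMYYYY>.csv' for the given date."""
    return f"{prefix}_last_updated_{day.strftime('%d%m%Y')}.csv"


# A fixed date is used here so the result is reproducible
example = dated_filename("capology_all_1617_2122", datetime.date(2021, 9, 5))
print(example)  # capology_all_1617_2122_last_updated_05092021.csv
```

For a live run, `datetime.date.today()` would replace the fixed date.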

---

<a id='section6'></a>

## <a id='#section6'>6. Summary</a>
This notebook scrapes player statistics data from [Capology](https://www.capology.com/), using [pandas](http://pandas.pydata.org/) for data manipulation through DataFrames, and [Selenium](https://www.selenium.dev/) and [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) for webscraping.

___

<a id='section7'></a>

## <a id='#section7'>7. Next Steps</a>
This data is now ready to be engineered before being matched to other datasets, such as data from [FBref](https://fbref.com/) and [TransferMarkt](https://www.transfermarkt.co.uk/).
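
One likely engineering step is converting the salary columns, which are stored as strings such as `€ 338,327` or `$ 3,500,000`, into numbers. The sketch below is one possible approach, not the method used in the data engineering notebook. `parse_money` is a hypothetical helper, and it assumes whole-number amounts, as in the outputs shown above.

```python
import re
from typing import Optional


def parse_money(value: object) -> Optional[int]:
    """Convert a string such as '€ 1,470,000' to an integer; return None if no digits.

    Non-string inputs (for example the float NaN that pandas uses for missing
    values) are treated as missing.
    """
    if not isinstance(value, str):
        return None
    # Strip everything except digits: currency symbol, spaces and thousands separators
    digits = re.sub(r"[^\d]", "", value)
    return int(digits) if digits else None


for raw in ["€ 338,327", "$ 3,500,000", "€ 0", ""]:
    print(repr(raw), "->", parse_money(raw))
```

Because every non-digit character is removed, a decimal point or minus sign would be lost; the function would need adjusting if such values occur in the data.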

The Data Engineering subfolder in GitHub can be found [here](https://github.com/eddwebster/football_analytics/tree/master/notebooks/B\)%20Data%20Engineering) and a static version of the Capology data engineering notebook can be found [here](https://nbviewer.org/github/eddwebster/football_analytics/blob/master/notebooks/3_data_engineering/Capology%20Player%20Salary%20Data%20Engineering.ipynb).

___

<a id='section8'></a>

## <a id='#section8'>8. References</a>

---

***Visit my website [eddwebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***

[Back to the top](#top)