<a id='top'></a>

# Webscraping of TransferMarkt Data
##### Notebook to scrape raw data from [TransferMarkt](https://www.transfermarkt.co.uk/) using [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) and the [Tyrone Mings web scraper](https://github.com/FCrSTATS/tyrone_mings) by [FCrSTATS](https://twitter.com/FC_rstats).

### By [Edd Webster](https://www.twitter.com/eddwebster)
Notebook first written: 13/09/2020<br>
Notebook last updated: 04/12/2020

![title](../../img/transfermarkt-logo-banner.png)

Click [here](#section5) to jump straight to the Exploratory Data Analysis section and skip the [Task Brief](#section2), [Data Sources](#section3), and [Data Engineering](#section4) sections. Or click [here](#section6) to jump straight to the Conclusion.

___

<a id='sectionintro'></a>

## <a id='import_libraries'>Introduction</a>
This notebook scrapes data from [TransferMarkt](https://www.transfermarkt.co.uk/) using [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) and the [Tyrone Mings web scraper](https://github.com/FCrSTATS/tyrone_mings) by [FCrSTATS](https://twitter.com/FC_rstats). This landed data is then manipulated as DataFrames using [pandas](http://pandas.pydata.org/).

For more information about this notebook and the author, I'm available through all the following channels:
*    [eddwebster.com](https://www.eddwebster.com/);
*    edd.j.webster@gmail.com;
*    [@eddwebster](https://www.twitter.com/eddwebster);
*    [linkedin.com/in/eddwebster](https://www.linkedin.com/in/eddwebster/);
*    [github/eddwebster](https://github.com/eddwebster/);
*    [public.tableau.com/profile/edd.webster](https://public.tableau.com/profile/edd.webster);
*    [kaggle.com/eddwebster](https://www.kaggle.com/eddwebster); and
*    [hackerrank.com/eddwebster](https://www.hackerrank.com/eddwebster).

![title](../../img/fifa21eddwebsterbanner.png)

The accompanying GitHub repository for this notebook can be found [here](https://github.com/eddwebster/football_analytics) and a static version of this notebook can be found [here](https://nbviewer.jupyter.org/github/eddwebster/football_analytics/blob/master/notebooks/A%29%20Web%20Scraping/TransferMarkt%20Web%20Scraping%20and%20Parsing.ipynb).

___

<a id='sectioncontents'></a>

## <a id='notebook_contents'>Notebook Contents</a>
1.    [Notebook Dependencies](#section1)<br>
2.    [Project Brief](#section2)<br>
3.    [Data Sources](#section3)<br>
      1.    [Introduction](#section3.1)<br>
      2.    [Data Dictionary](#section3.2)<br>
      3.    [Creating the DataFrame](#section3.3)<br>
      4.    [Initial Data Handling](#section3.4)<br>
      5.    [Export the Raw DataFrame](#section3.5)<br>         
4.    [Data Engineering](#section4)<br>
      1.    [Introduction](#section4.1)<br>
      2.    [Columns of Interest](#section4.2)<br>
      3.    [String Cleaning](#section4.3)<br>
      4.    [Converting Data Types](#section4.4)<br>
      5.    [Export the Engineered DataFrame](#section4.5)<br>
5.    [Exploratory Data Analysis (EDA)](#section5)<br>
      1.    [...](#section5.1)<br>
      2.    [...](#section5.2)<br>
      3.    [...](#section5.3)<br>
6.    [Summary](#section6)<br>
7.    [Next Steps](#section7)<br>
8.    [Bibliography](#section8)<br>

___

<a id='section1'></a>

## <a id='#section1'>1. Notebook Dependencies</a>

This notebook was written using [Python 3](https://docs.python.org/3.7/) and requires the following libraries:
*    [`Jupyter notebooks`](https://jupyter.org/) for this notebook environment with which this project is presented;
*    [`NumPy`](http://www.numpy.org/) for multidimensional array computing;
*    [`pandas`](http://pandas.pydata.org/) for data analysis and manipulation;
*    [`Beautifulsoup`](https://pypi.org/project/beautifulsoup4/) for web scraping; and
*    [`matplotlib`](https://matplotlib.org/contents.html?v=20200411155018) for data visualisations;

All packages used for this notebook except for BeautifulSoup can be obtained by downloading and installing the [Conda](https://anaconda.org/anaconda/conda) distribution, available on all platforms (Windows, Linux and Mac OSX). Step-by-step guides on how to install Anaconda can be found for Windows [here](https://medium.com/@GalarnykMichael/install-python-on-windows-anaconda-c63c7c3d1444) and Mac [here](https://medium.com/@GalarnykMichael/install-python-on-mac-anaconda-ccd9f2014072), as well as in the Anaconda documentation itself [here](https://docs.anaconda.com/anaconda/install/).

### Import Libraries and Modules

In [2]:
# Python ≥3.5 (ideally)
import platform
import sys, getopt
assert sys.version_info >= (3, 5)
import csv

# Import Dependencies
%matplotlib inline

# Math Operations
import numpy as np
from math import pi

# Datetime
import datetime
from datetime import date
import time

# Data Preprocessing
import pandas as pd
import os
import re
import random
from io import BytesIO
from pathlib import Path

# Reading directories
import glob
import os
from os.path import basename

# Flatten lists
from functools import reduce

# Working with JSON
import json
from pandas.io.json import json_normalize

# Web Scraping
import requests
from bs4 import BeautifulSoup
import re

# APIs
from tyrone_mings import * 

# Fuzzy Matching - Record Linkage
import recordlinkage
import jellyfish
import numexpr as ne

# Data Visualisation
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-whitegrid')
import missingno as msno

# Progress Bar
from tqdm import tqdm

# Display in Jupyter
from IPython.display import Image, YouTubeVideo
from IPython.core.display import HTML

# Ignore Warnings
import warnings
warnings.filterwarnings(action="ignore", message="^internal gelsd")

print('Setup Complete')

Setup Complete


In [3]:
# Python / module versions used here for reference
print('Python: {}'.format(platform.python_version()))
print('NumPy: {}'.format(np.__version__))
print('pandas: {}'.format(pd.__version__))
print('matplotlib: {}'.format(mpl.__version__))
print('Seaborn: {}'.format(sns.__version__))

Python: 3.7.6
NumPy: 1.18.1
pandas: 1.0.1
matplotlib: 3.1.3
Seaborn: 0.10.0


### Defined Variables

In [4]:
# Define today's date
today = datetime.datetime.now().strftime('%d/%m/%Y').replace('/', '')

### Defined Filepaths

In [5]:
# Set up initial paths to subfolders
base_dir = os.path.join('..', '..', )
data_dir = os.path.join(base_dir, 'data')
data_dir_tm = os.path.join(base_dir, 'data', 'tm')
img_dir = os.path.join(base_dir, 'img')
fig_dir = os.path.join(base_dir, 'img', 'fig')
video_dir = os.path.join(base_dir, 'video')

### Custom Functions
Here, some of the functions from the Tyrone Mings library have been directly into this notebook due to bug fix. One pull request is made, this section can be removed - EW 29/12/2020.

In [6]:
import pandas as pd
import re
import csv
import js2xml
import datetime
import warnings
from bs4 import BeautifulSoup
import requests
from lxml import etree

In [7]:
# The following code is pasted from the Tyrone Mings library written by FC.rStats

def get_souped_page(page_url):
    '''
    In order not to be blocked for scraping its import to request pages with
    some settings to look more like an actual browser.

    this function takes a page_url from https://www.transfermarkt.com and returns the
    souped page
    '''
    headers = {'User-Agent':
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

    pageTree = requests.get(page_url, headers=headers)
    pageSoup = BeautifulSoup(pageTree.content, 'html.parser')

    return(pageSoup)



def remove_youth(team_string):
    '''
    There are many variations of each club name to identify each squad within
    that club.

    this function takes a team string and returns the team string without youth
    variations
    '''
    team_string = team_string.replace("U16", "")
    team_string = team_string.replace("U17", "")
    team_string = team_string.replace("U18", "")
    team_string = team_string.replace("U19", "")
    team_string = team_string.replace("U20", "")
    team_string = team_string.replace("U21", "")
    team_string = team_string.replace("U22", "")
    team_string = team_string.replace("U23", "")
    team_string = team_string.replace("u16", "")
    team_string = team_string.replace("u17", "")
    team_string = team_string.replace("u18", "")
    team_string = team_string.replace("u19", "")
    team_string = team_string.replace("u20", "")
    team_string = team_string.replace("u21", "")
    team_string = team_string.replace("u22", "")
    team_string = team_string.replace("u23", "")
    team_string = team_string.replace("ii", "")
    team_string = team_string.replace("Youth", "")
    team_string = team_string.replace("jugend", "")
    team_string = team_string.strip()
    return(team_string)


def calculate_age_at_transfer(born, transfer_date):
    '''
    Calculate the age between the date of transfer and the date of birth of the
    player
    '''
    return transfer_date.year - born.year - ((transfer_date.month, transfer_date.day) < (born.month, born.day))


def calculate_age(born, competition_start):
    '''
    Calculate the age between the start date of a competition and the date of birth of the
    player
    '''
    return(competition_start.year - born.year - ((competition_start.month, competition_start.day) < (born.month, born.day)))


def stringify_children(node):
    '''
    a helper to convert the market value chart data into strings
    '''
    s = node.text
    if s is None:
        s = ''
    for child in node:
        s += etree.tostring(child, encoding='unicode')
    return s


def month_to_number(month_string):
    '''
    a helper to change month abbreviations to month numbers
    '''
    if month_string == "Jan":
        return(1)
    elif month_string == "Feb":
        return(2)
    elif month_string == "Mar":
        return(3)
    elif month_string == "Apr":
        return(4)
    elif month_string == "May":
        return(5)
    elif month_string == "Jun":
        return(6)
    elif month_string == "Jul":
        return(7)
    elif month_string == "Aug":
        return(8)
    elif month_string == "Sep":
        return(9)
    elif month_string == "Oct":
        return(10)
    elif month_string == "Nov":
        return(11)
    elif month_string == "Dec":
        return(12)


In [8]:
def get_club_urls_from_league_page(club_url):
    '''
    From a league page such as :
    https://www.transfermarkt.com/premier-league/startseite/wettbewerb/GB1
    retrived the url links for all clubs
    '''

    league_base_page = get_souped_page(club_url)

    club_urls = []
    for row in league_base_page.find_all('table', 'items')[0].select('tr'):
        for item in row.find_all('td', 'hauptlink'):
            try:
                link = item.select('a')[0]['href']
                if link != None:
                    if len(link) > 0:
                        club_urls.append("https://www.transfermarkt.com" + link)
            except:
                pass

    return(list(set(club_urls)))



def get_player_urls_from_club_page(club_url):
    '''
    From a club page such as :
    https://www.transfermarkt.com/manchester-united/startseite/verein/985/saison_id/2019
    retrived the url links for all players
    '''
    club_base_page = get_souped_page(club_url)

    player_urls = []
    for row in club_base_page.find_all('table', 'items')[0].select('tr'):
        for item in row.find_all('td', 'hauptlink'):
            try:
                link = item.select('a')[0]['href']
                if link != None:
                    if len(link) > 0:
                        player_urls.append("https://www.transfermarkt.com" + link)
            except:
                pass

    return(list(set(player_urls)))



def get_player_urls_from_league_page(league_url, verbose = False):
    '''
    From a league page such as :
    https://www.transfermarkt.com/premier-league/startseite/wettbewerb/GB1
    retrived the url links for all players from all clubs

    if you want to check on progress chhange verbose to True
    '''
    players = []

    clubs = get_club_urls_from_league_page(league_url)
    for c in clubs:
        players = players +  get_player_urls_from_club_page(c)
        if verbose:
            print(c.split("/")[3].replace("-", " "), "players added")
    return(players)

def get_league_mean_player_value_for_season(league_url, season):
    '''
    From a league page such as :
    https://www.transfermarkt.com/premier-league/startseite/wettbewerb/GB1
    retrived the mean Transfermarkt player valuation for the league in a season
    '''
    # change url to include season
    league_url = league_url + '/plus/?saison_id=' + season
    # load table as html
    league_base_page = get_souped_page(league_url)
    div = league_base_page.findAll('div', {'class':'responsive-table'})[0]
    data_table = div.find('table')
    # return dummy if values are not available
    if data_table.find('tr').findAll('th')[-1].text !='ø MV':
      return 0
    # get table body
    dt_body = data_table.find('tbody')
    # for every row, get value of the last column and translate into an integer
    results = []
    for row in dt_body.findAll('tr'):
      val = row.findAll('td')[-1].text
      if val == '-':
        val = '0'
      else:
        val = val.replace('€','').replace('m','0000').replace('Th','000').replace('.','')
      results.append(int(val))
    return np.mean(results)


In [13]:
def bio_player_pull(pageSoup, player_id):

############# ADD SECONDARY POSITION INFO #################### #TODO
    ## base info
    player_name = pageSoup.select('h1')[0].get_text().lower()

    DOB = None
    POB = None
    COB = None
    position = None
    age = None
    height = None
    foot = None
    citizenship = None
    second_citizenship = None

    for row in pageSoup.select('tr'):
        try:

            if row.select('th')[0].get_text().strip() == "Date of birth:":
                DOB = row.select('td')[0].get_text().strip()

            # COB = None
            if row.select('th')[0].get_text().strip() == "Place of birth:":
                POB = row.select('td')[0].get_text().strip()
                COB = row.select('td')[0].select('img')[0]['alt']
                
            if row.select('th')[0].get_text().strip() == "Position:":
                position = row.select('td')[0].get_text().strip()

            if row.select('th')[0].get_text().strip() == "Age:":
                age = int(row.select('td')[0].get_text().strip())

            if row.select('th')[0].get_text().strip() == "Height:":
                height = int(float(row.select('td')[0].get_text().strip().replace('m', '').replace(',', '.').strip())*100)

            if row.select('th')[0].get_text().strip() == "Foot:":
                foot = row.select('td')[0].get_text().strip()

        except:
            pass

    if COB == None:
        for row in pageSoup.select('tr'):
            try:
                if row.select('th')[0].get_text().strip() == "Citizenship:":
                    COB = row.select('td')[0].get_text().strip()
            except:
                pass

    for row in pageSoup.select('tr'):
        try:
            if row.select('th')[0].get_text().strip() == "Citizenship:":
                no_of_citizenships = len(row.select('td')[0].select('img'))
                if no_of_citizenships < 2:
                    citizenship = row.select('td')[0].select('img')[0]['alt']
                if no_of_citizenships > 1:
                    second_citizenship = row.select('td')[0].select('img')[1]['alt']
                else:
                    second_citizenship = None
        except:
            pass


    if DOB != None:
        DOB = DOB.replace(" Happy Birthday", "")
        year_of_birth = int(DOB[len(DOB)-4:])
        month_of_birth = month_to_number(DOB.split(" ")[0])
        day_of_birth = int(DOB.split(" ")[1].split(",")[0])
        DOB = datetime.date(year_of_birth, month_of_birth, day_of_birth)

    else:
        year_of_birth = None
        month_of_birth = None
        day_of_birth = None

    biodict = {
        "player_id": player_id,
        "player_name": player_name,
        "day_of_birth": day_of_birth,
        "month_of_birth": month_of_birth,
        "year_of_birth": year_of_birth,
        "pob": POB,
        "cob": COB,
        "dob": DOB,
        "position": position,
        "height": height,
        "foot": foot,
        "citizenship": citizenship,
        "second_citizenship": second_citizenship
    }

    return(biodict)


def current_football_bio_player_pull(pageSoup, player_id):

    if len(pageSoup.select('div.dataRibbonRIP')) > 0:
        current_club = "dead"
        current_club_country = "NA"
    else:
        current_club = pageSoup.select('div.dataZusatzImage')[0].select('img')[0].get('alt').lower()
        if current_club in ["retired", "without club", "unknown"]:
            current_club_country = "NA"
        else:
            current_club_country = pageSoup.select('div.dataZusatzDaten')[0].select('img')[0].get('alt').lower()


    if len(pageSoup.select('div.dataMarktwert')) == 0:
        market_value = 0

    else:
        market_value = pageSoup.select('div.dataMarktwert')[0].get_text().split("Last update")[0].strip()

        if "m" in market_value:
            market_value = int( float(market_value.strip().replace("â‚¬", "").replace("€","").replace("m","")) * 1000000 )
        elif "Th." in market_value:
            market_value = int(market_value.strip().replace("â‚¬", "").replace("€","").replace("Th.","")) * 1000
        elif "-":
            market_value = 0

    joined = None
    contract_expires = None
    contract_option = None
    on_loan_from = None
    on_loan_from_url = None
    on_loan_from_country = None
    loan_contract_expiry = None
    player_agent = None

    for row in pageSoup.select('tr'):
        try:

            if row.select('th')[0].get_text().strip() == "Joined:":
                joined = row.select('td')[0].get_text().strip()

            if row.select('th')[0].get_text().strip() == "Contract expires:":
                contract_expires = row.select('td')[0].get_text().strip()

            if row.select('th')[0].get_text().strip() == "Contract option:":
                contract_option = row.select('td')[0].get_text().strip()

            if row.select('th')[0].get_text().strip() == "On loan from:":
                on_loan_from = row.select('td')[0].get_text().strip().lower()
                on_loan_from_url = "https://www.transfermarkt.com" + row.select('td')[0].select('a')[0]['href']

            if row.select('th')[0].get_text().strip() == "Contract there expires:":
                loan_contract_expiry = row.select('td')[0].get_text().strip()

            if row.select('th')[0].get_text().strip() == "Player agent:":
                player_agent = row.select('td')[0].get_text().strip()
        except:
            pass

    if joined != None:
        year_joined = int(joined[len(joined)-4:])
        month_joined = month_to_number(joined.split(" ")[0])
        day_joined = int(joined.split(" ")[1].split(",")[0])
        joined = datetime.date(year_joined, month_joined, day_joined)

    if contract_expires != None:
        if contract_expires != "-":
            year_expired = int(contract_expires[len(contract_expires)-4:])
            month_expired = month_to_number(contract_expires.split(" ")[0])
            day_expired = int(contract_expires.split(" ")[1].split(",")[0])
            contract_expires = datetime.date(year_expired, month_expired, day_expired)
        else:
            contract_expires = None

    if loan_contract_expiry != None:
        if loan_contract_expiry != "-":
            year_expires = int(loan_contract_expiry[len(loan_contract_expiry)-4:])
            month_expires = month_to_number(loan_contract_expiry.split(" ")[0])
            day_expires = int(loan_contract_expiry.split(" ")[1].split(",")[0])
            loan_contract_expiry = datetime.date(year_expires, month_expires, day_expires)
        else:
            loan_contract_expiry = None

    if loan_contract_expiry != None:
        loan_contract_expiry,contract_expires = contract_expires,loan_contract_expiry

    if on_loan_from != None:
        temp_soup = get_souped_page(on_loan_from_url)
        on_loan_from_country = temp_soup.select('div.dataZusatzDaten')[0].select('img')[0]['alt'].lower()

    statusdict = {
        "player_id": player_id,
        "current_club": current_club,
        "current_club_country": current_club_country,
        "market_value": market_value,
        "joined": joined,
        "contract_expires": contract_expires,
        "contract_option": contract_option,
        "on_loan_from": on_loan_from,
        "on_loan_from_country": on_loan_from_country,
        "loan_contract_expiry": loan_contract_expiry,
        "player_agent": player_agent}

    return(statusdict)

def transfer_history_pull(pageSoup, player_id):


    transfered_from = []
    transferred_to = []
    market_values = []
    transfer_fees = []
    transfer_dates = []
    transfer_season = []
    country_from = []
    country_to = []

    market_values_value = None
#         print("player transfer history")
    not_first_row = True
    for box in pageSoup.select('div.box.transferhistorie'):
        if not_first_row:
            not_first_row = True

            for row in box.select('tr')[2:]:
                try:
                    transfered_from_value = row.select('td')[4].select('a')[0].get('href').split("/")[1].replace("-", " ")
#                 if transfered_from_value == ""
                    transfered_from.append(transfered_from_value)
                except:
                    pass

            for row in box.select('tr')[2:]:
                try:
                    transferred_to_value = row.select('td')[8].select('a')[0].get('href').split("/")[1].replace("-", " ")
                    transferred_to.append(transferred_to_value)
                except:
                    pass

            for row in box.select('tr')[2:]:
                try:
                    market_values_value = row.select('td.zelle-mw')[0].get_text()#.select('img')[0].get('alt')

                    if "m" in market_values_value:
                        market_values_value = int( float(market_values_value.replace("€","").replace("m","")) * 1000000 )
                    elif "k" in market_values_value:
                        market_values_value = int(market_values_value.replace("€","").replace("k","")) * 1000
                    elif "-":
                        market_values_value = 0



                    market_values.append(market_values_value)
                except:
                    pass


        ## grab COUNTRY TO
            for row in box.select('tr')[2:]:
                try:
                    no_images = len(row.select('td')[7].select('img'))
                    if no_images > 0:
                        country_to.append(row.select('td')[7].select('img')[0].get('title').lower())
                    else:
                        country_to.append("no country")
                except:
                    pass


        ## grab COUNTRY FROM
            for row in box.select('tr')[2:]:
                try:
                    no_images = len(row.select('td')[3].select('img'))
                    if no_images > 0:
                        country_from.append(row.select('td')[3].select('img')[0].get('title').lower())
                    else:
                        country_from.append("no country")
                except:
                    pass


        ## grab TRANSFER FEE
        for row in box.select('tr')[2:]:
            try:
                transfer_fees_raw = row.select('td.zelle-abloese')[0].get_text()
                transfer_fees.append(transfer_fees_raw)
            except:
                pass


        ## grab TRANSFER DATE
        for row in box.select('tr')[1:]:
            try:
                date_raw = ""
                date_raw = row.select('td.show-for-small')[0].get_text().strip()#.get_text())#.select('img')[0].get('alt')
                if 'Date' in date_raw:

                    date_raw = date_raw.split(": ")[1]
                    year_of_transfer = int(date_raw[len(date_raw)-4:])
                    month_of_transfer = month_to_number(date_raw.split(" ")[0])
                    day_of_transfer = int(date_raw.split(" ")[1].split(",")[0])
                    transfer_date = datetime.date(year_of_transfer, month_of_transfer, day_of_transfer)
                    transfer_dates.append(transfer_date)
            except:
                pass

        ## grab SEASON
        for row in box.select('tr')[1:]:
            try:
                season_raw = row.select('td')[0].get_text().strip()
                if "Date" in season_raw:
                    pass
                else:
                    if "Total" in season_raw:
                        pass
                    else:
                        if "/" in season_raw:
                            transfer_season.append(season_raw.strip())
            except:
                pass
    ## edit the transfer fees / types

    transfer_types = ['loan' if "oan" in f else 'transfer' for f in transfer_fees]


    transfer_fees_new = []

    for t in range(len(transfer_fees)):

        if transfer_fees[t] == "-":
            transfer_fees_new.append(0)

        elif transfer_fees[t] == "Loan":
            transfer_fees_new.append(0)

        elif transfer_fees[t] == "End of loan":
            transfer_fees_new.append(0)

        elif transfer_fees[t] == "Free transfer":
            transfer_fees_new.append(0)

        elif "Loan fee:" in transfer_fees[t]:
            if "m" in transfer_fees[t]:
                transfer_fees_new.append( int( float( transfer_fees[t].replace("Loan fee:", "").replace("€", "").replace("m", "") ) * 1000000 ) )
            else:
                transfer_fees_new.append( int( transfer_fees[t].replace("Loan fee:", "").replace("€", "").replace("k", "") ) * 1000 )

        elif transfer_fees[t] == "?":
            transfer_fees_new.append(market_values[t])

        elif "m" in transfer_fees[t]:
            transfer_fees_new.append( int( float( transfer_fees[t].replace("€", "").replace("m", "") ) * 1000000 ) )

        elif "k" in transfer_fees[t]:
            transfer_fees_new.append( int( transfer_fees[t].replace("Loan fee:", "").replace("€", "").replace("k", "") ) * 1000 )

        else:
            transfer_fees_new.append(transfer_fees[t])


    ## check internal

    internal_transfer = []

    for t in range(len(transfered_from)):
        if remove_youth(transfered_from[t]) == remove_youth(transferred_to[t]):
            internal_transfer.append("internal")
        else:
            internal_transfer.append("external")

    country_from = country_from[:-1]

    DOB = None

    for row in pageSoup.select('tr'):
        try:
            if row.select('th')[0].get_text().strip() == "Date of birth:":
                DOB = row.select('td')[0].get_text().strip()
        except:
            pass

    # age at transfer
    if DOB != None:
        DOB = DOB.replace(" Happy Birthday", "")
        year_of_birth = int(DOB[len(DOB)-4:])
        month_of_birth = month_to_number(DOB.split(" ")[0])
        day_of_birth = int(DOB.split(" ")[1].split(",")[0])
        DOB = datetime.date(year_of_birth, month_of_birth, day_of_birth)

    age_at_transfer = [calculate_age_at_transfer(DOB, f) for f in transfer_dates]

    player_view = pd.DataFrame(
    {'transfered_from': transfered_from,
     'transferred_to': transferred_to,
     'market_values': market_values,
     'transfer_fees': transfer_fees_new,
     'transfer_dates': transfer_dates,
     'transfer_season': transfer_season,
     'country_to': country_to,
     'country_from': country_from,
     'transfer_types': transfer_types,
     'internal_external_transfer': internal_transfer,
     'age_at_transfer': age_at_transfer
    })

    player_view['player_id'] = player_id


    ####  ADD YOUTH CLUBS ######################################################
    youth_clubs = None
    try:
        for box in pageSoup.select('div.box'):
            if box.select('div')[0].get_text().strip() == 'Youth clubs':
                youth_clubs = box.select('div')[1].get_text().strip()
    except:
        pass

    youth_clubs_list__ = []

    if youth_clubs != None:

        youth_club = youth_clubs

        for f in youth_club.split(","):
    #             print(f.split(" (")[0].strip().lower(), remove_youth(f.split(" (")[0].strip().lower()))
            youth_clubs_list__.append(remove_youth(f.split(" (")[0].strip().lower()))

    if len(player_view[player_view['age_at_transfer'] <= 18].transfered_from) > 0:
        youth_clubs_list__ = youth_clubs_list__ + [remove_youth(ff) for ff in player_view[player_view['age_at_transfer'] <= 18].transfered_from]

    if len(youth_clubs_list__) > 0:
        player_view['all_youth_clubs'] = ','.join(map(str, list(set(youth_clubs_list__))))
    #         print(','.join(map(str, list(set(youth_clubs_list__)))) )
    else:
        player_view['all_youth_clubs'] = remove_youth(player_view.tail(1).iloc[0]['transfered_from'])

    return(player_view)


def performance_history_pull(base_url, player_id, player_dob):
    # print(base_url.replace("profil", "leistungsdatendetails") + "/saison//verein/0/liga/0/wettbewerb//pos/0/trainer_id/0/plus/1")
    pageSoup2 = get_souped_page(base_url.replace("profil", "leistungsdatendetails") + "/saison//verein/0/liga/0/wettbewerb//pos/0/trainer_id/0/plus/1")

    perf_table = pageSoup2.select('table.items')[0]

    ## set empty lists for collection
    season_list = []
    competition_list = []
    competition_code_list = []
    club_list = []
    in_squad_list = []
    appearances_list = []
    ppg_list = []
    goals_list = []
    assists_list = []
    subbed_on_list = []
    subbed_off_list = []
    yellow_card_list = []
    second_yellow_card_list = []
    red_card_list = []
    penalties_list = []
    mins_played_list = []
    clean_sheets_list = []
    goals_conceded_list = []

    #check position
    if len(perf_table.select('tr')[2].select('td')) == 17:
        position_profile = "GK"
    else:
        position_profile = "OUTFIELD"

    if position_profile == "OUTFIELD":
        for row in perf_table.select('tr')[2::]:
            season = row.select('td')[0].get_text()
            competition = row.select('td')[2].get_text()
            competition_code = row.select('td')[2].select('a')[0]['href'].split("/")[4]
            club = row.select('td')[3].select('img')[0]['alt']
            in_squad = int(row.select('td')[4].get_text())
            appearances = row.select('td')[5].get_text()
            ppg = row.select('td')[6].get_text()
            goals = row.select('td')[7].get_text()
            assists = row.select('td')[8].get_text()
            subbed_on = row.select('td')[10].get_text()
            subbed_off = row.select('td')[11].get_text()
            yellow_card = row.select('td')[12].get_text()
            second_yellow_card = row.select('td')[13].get_text()
            red_card = row.select('td')[14].get_text()
            penalties = row.select('td')[15].get_text()
            mins_played = row.select('td')[17].get_text()

            appearances = 0 if appearances == "-" else int(appearances)
            ppg = 0 if ppg in ["-", "0,00"] else float(ppg)
            goals = 0 if goals == "-" else int(goals)
            assists = 0 if assists == "-" else int(assists)
            subbed_on = 0 if subbed_on == "-" else int(subbed_on)
            subbed_off = 0 if subbed_off == "-" else int(subbed_off)
            yellow_card = 0 if yellow_card == "-" else int(yellow_card)
            second_yellow_card = 0 if second_yellow_card == "-" else int(second_yellow_card)
            red_card = 0 if red_card == "-" else int(red_card)
            penalties = 0 if penalties == "-" else int(penalties)

            if mins_played == "-":
                mins_played = 0
            elif "." in mins_played:
                # print(mins_played)
                mins_played = mins_played.replace("'", "").replace(".", "")
            else:
                mins_played = int(mins_played.replace("'", ""))

            goals_conceded = 0
            clean_sheets = 0

            ## append values
            season_list.append(season)
            competition_list.append(competition)
            competition_code_list.append(competition_code)
            club_list.append(club)
            in_squad_list.append(in_squad)
            appearances_list.append(appearances)
            ppg_list.append(ppg)
            goals_list.append(goals)
            assists_list.append(assists)
            subbed_on_list.append(subbed_on)
            subbed_off_list.append(subbed_off)
            yellow_card_list.append(yellow_card)
            second_yellow_card_list.append(second_yellow_card)
            red_card_list.append(red_card)
            penalties_list.append(penalties)
            mins_played_list.append(mins_played)
            clean_sheets_list.append(clean_sheets)
            goals_conceded_list.append(goals_conceded)

    else:
        for row in perf_table.select('tr')[2::]:
            season = row.select('td')[0].get_text()
            competition = row.select('td')[2].get_text()
            competition_code = row.select('td')[2].select('a')[0]['href'].split("/")[4]
            club = row.select('td')[3].select('img')[0]['alt']
            in_squad = int(row.select('td')[4].get_text())
            appearances = row.select('td')[5].get_text()
            ppg = row.select('td')[6].get_text()
            goals = row.select('td')[7].get_text()
            subbed_on = row.select('td')[9].get_text()
            subbed_off = row.select('td')[10].get_text()
            yellow_card = row.select('td')[11].get_text()
            second_yellow_card = row.select('td')[12].get_text()
            red_card = row.select('td')[13].get_text()
            goals_conceded = row.select('td')[14].get_text()
            clean_sheets = row.select('td')[15].get_text()
            mins_played = row.select('td')[16].get_text()

            appearances = 0 if appearances == "-" else int(appearances)
            ppg = 0 if ppg in ["-", "0,00"] else float(ppg)
            goals = 0 if goals == "-" else int(goals)
            subbed_on = 0 if subbed_on == "-" else int(subbed_on)
            subbed_off = 0 if subbed_off == "-" else int(subbed_off)
            yellow_card = 0 if yellow_card == "-" else int(yellow_card)
            second_yellow_card = 0 if second_yellow_card == "-" else int(second_yellow_card)
            red_card = 0 if red_card == "-" else int(red_card)
            clean_sheets = 0 if clean_sheets == "-" else int(clean_sheets)
            goals_conceded = 0 if goals_conceded == "-" else int(goals_conceded)

            if mins_played == "-":
                mins_played = 0
            elif "." in mins_played:
                # print(mins_played)
                mins_played = mins_played.replace("'", "").replace(".", "")
            else:
                mins_played = int(mins_played.replace("'", ""))

            assists = 0
            penalties = 0

            ## append values
            season_list.append(season)
            competition_list.append(competition)
            competition_code_list.append(competition_code)
            club_list.append(club)
            in_squad_list.append(in_squad)
            appearances_list.append(appearances)
            ppg_list.append(ppg)
            goals_list.append(goals)
            assists_list.append(assists)
            subbed_on_list.append(subbed_on)
            subbed_off_list.append(subbed_off)
            yellow_card_list.append(yellow_card)
            second_yellow_card_list.append(second_yellow_card)
            red_card_list.append(red_card)
            penalties_list.append(penalties)
            mins_played_list.append(mins_played)
            clean_sheets_list.append(clean_sheets)
            goals_conceded_list.append(goals_conceded)

    performance_data = pd.DataFrame(
            {'season': season_list,
            'competition': competition_list,
            'competition_code': competition_code_list,
            'club': club_list,
            'in_squad': in_squad_list,
            'appearances': appearances_list,
            'ppg': ppg_list,
            'goals': goals_list,
            'assists': assists_list,
            'subbed_on': subbed_on_list,
            'subbed_off': subbed_off_list,
            'yellow_card': yellow_card_list,
            'second_yellow_card': second_yellow_card_list,
            'red_card': red_card_list,
            'penalties': penalties_list,
            'mins_played': mins_played_list,
            'clean_sheets': clean_sheets_list,
            'goals_conceded': goals_conceded_list
            })


    performance_data['player_id'] = player_id

    age = []
    for s in performance_data.season:

        if "/" in s:
            year = int(s.split("/")[0])
            if year < 30:
                year = 2000 + year
            else:
                year = 1900 + year
            competition_start_date = datetime.date(year, 8, 1)

        else:
            year = int(year)
            competition_start_date = datetime.date(year, 4, 1)

        age.append(calculate_age(player_dob, competition_start_date))

    performance_data['age'] = age

    return(performance_data)

def market_value_historic_pull(base_url, player_id):

    mv_soup = get_souped_page(base_url.replace("profil", "marktwertverlauf"))

    if mv_soup.find("script", text=re.compile("Highcharts.Chart")) != None:

        script = mv_soup.find("script", text=re.compile("Highcharts.Chart")).text
        parsed = js2xml.parse(script)


        xpath = '//array//object//property'

        age_list = []
        club_list = []
        mv_list = []
        date_of_value_list = []

        for i in range(len(parsed.xpath(xpath))):

            age = None
            club = None
            raw_value = None
            date_of_value = None
            date_raw = None

            if parsed.xpath(xpath)[i].get('name') == 'age':
                age = int(stringify_children(parsed.xpath(xpath)[i]).split("number value=")[1].split("/")[0][1:][:-1])
                age_list.append(age)

            if parsed.xpath(xpath)[i].get('name') == 'verein':
                club = stringify_children(parsed.xpath(xpath)[i]).split("<string>")[1].split("</string>")[0].lower()
                club_list.append(club)

            if parsed.xpath(xpath)[i].get('name') == 'mw':
                raw_value = stringify_children(parsed.xpath(xpath)[i]).split("<string>")[1].split("</string>")[0].replace("€", "")

                if "m" in raw_value:
                    raw_value = int( float(raw_value.strip().replace("â‚¬", "").replace("€","").replace("m","")) * 1000000 )
                elif "Th." in raw_value:
                    raw_value = int(raw_value.strip().replace("â‚¬", "").replace("€","").replace("Th.","")) * 1000
                elif "-":
                    raw_value = 0

                mv_list.append(raw_value)

            if parsed.xpath(xpath)[i].get('name') == 'datum_mw':
                date_raw = stringify_children(parsed.xpath(xpath)[i]).split("<string>")[1].split("</string>")[0]
                if date_raw != None:
                    year_of_birth = int(date_raw[len(date_raw)-4:])
                    month_of_birth = month_to_number(date_raw.split(" ")[0])
                    day_of_birth = int(date_raw.split(" ")[1].split(",")[0])
                    date_of_value = datetime.date(year_of_birth, month_of_birth, day_of_birth)

                date_of_value_list.append(date_of_value)


        market_value_history = pd.DataFrame(
        {'club': club_list,
         'value': mv_list,
         'data_date': date_of_value_list,
         'age': age_list
        })

        market_value_history['player_id'] = player_id

        return(market_value_history)

    else:
        market_value_history = pd.DataFrame(
        {'club': [None],
         'value': [None],
         'data_date': [None],
         'age': [None]
        })

        market_value_history['player_id'] = player_id

        return(market_value_history)




def tm_pull(player_page,
            data_folder = "",
            player_bio = False,
            player_status = False,
            transfer_history = False,
            performance_data = False,
            market_value_history = False,
            output = "pandas" ):

    player_id = player_page.split("/")[-1:][0]
    raw_base_page = get_souped_page(player_page)

    bio = bio_player_pull(raw_base_page, player_id)
    output_dict = {}

    player_dob = bio['dob']

    ### if the user has selected to pull player_bio data then run
    if player_bio:

        # pandas output
        if output == "pandas":
            output_dict['player_bio'] = pd.DataFrame.from_dict(bio, orient = "index").transpose()

        # csv output
        elif output == "csv":
            with open((data_folder + player_id + '_bio.csv'),'w') as f:
                w = csv.writer(f)
                w.writerow(bio.keys())
                w.writerow(bio.values())

    ### if the user has selected to pull player_status data then run
    if player_status:

        status = current_football_bio_player_pull(raw_base_page, player_id)

        # pandas output
        if output == "pandas":
            output_dict['player_status'] = pd.DataFrame.from_dict(status, orient = "index").transpose()

        # csv output
        elif output == "csv":
            with open((data_folder + player_id + '_status.csv'),'w') as f:
                w = csv.writer(f)
                w.writerow(status.keys())
                w.writerow(status.values())

    ### if the user has selected to pull player_transfers data then run
    if transfer_history:

        transfers = transfer_history_pull(raw_base_page, player_id)

        # pandas output
        if output == "pandas":
            output_dict['transfer_history'] = transfers

        # csv output
        elif output == "csv":
            transfers.to_csv((data_folder + player_id + '_transfer_history.csv'), index = False)


    ### if the user has selected to pull market_value_history data then run
    if market_value_history:

        historic_market_value = market_value_historic_pull(player_page, player_id)

        # pandas output
        if output == "pandas":
            output_dict['market_value_history'] = historic_market_value

        # csv output
        elif output == "csv":
            historic_market_value.to_csv((data_folder + player_id + '_historic_market_value.csv'), index = False)


    ### if the user has selected to pull performance data then run
    if performance_data:

        perf_data = performance_history_pull(player_page, player_id, player_dob)

        # pandas output
        if output == "pandas":
            output_dict['performance_data'] = perf_data

        # csv output
        elif output == "csv":
            perf_data.to_csv((data_folder + player_id + '_performance_data.csv'), index = False)

    # return all pandas output
    if output == "pandas":
        return(output_dict)

def squad_number_history(base_url, squad_type = "both"):
    '''
    This method returns the squad numbers history of a player.

    Squad numbers history is returned according to the squad type. If the squad type is both,
    then the squad numbers of the players representation for both club and country is returned.

    Args:
    -----------
        base_url: transfermarkt url of the player, string
        squad_type: The type of squad. Default - both. Can be one of ["club", "country", "both], string
    
    Returns:
    -----------
        squad_df: History of squad numbers of the player, Pandas DataFrame
    
    Raises:
    -----------
        Only User Warnings are raised when the squad type doesn't match with one of the 3 options and
        if the player hasn't played for his country and the user requested for both squad numbers.
    '''

    # Check if the squad type is in one of the 3 options
    if squad_type not in ["club", "country", "both"]:
        warnings.warn("Unsupported squad type", UserWarning)
        return None
    
    #Get player id
    player_id = base_url.split("/")[-1:][0]

    # Scrape the squad number page.
    souped_page = get_souped_page(base_url.replace("profil", "rueckennummern"))
    
    #Get the tables with the data directly.
    tables = souped_page.findAll("table", {"class": "items"})

    # Get the columns and add a user defined column
    columns = [col.get_text() for col in tables[0].findAll("th") if col.get_text() != ""]
    columns.append("squad_type")

    # Checking if the player has played for his country
    if len(tables) == 1:
        isNationalAvailable = False
    else:
        isNationalAvailable = True
    values = []

    #Parsing the club squad numbers
    if squad_type in ["both", "club"]:
        for row in tables[0].findAll("tr"):
            row_elem = [val.get_text() for val in row.findAll("td") if val.get_text() != ""]
            if len(row_elem) > 0:
                row_elem.append("club")
                values.append(row_elem)

    #Parsing the coutry squad numbers
    if squad_type in ["both", "country"] and isNationalAvailable:
        for row in tables[1].findAll("tr"):
            row_elem = [val.get_text() for val in row.findAll("td") if val.get_text() != ""]
            if len(row_elem) > 0:
                row_elem.append("country")
                values.append(row_elem)
    elif isNationalAvailable!=True and squad_type in ["both", "country"]:
        warnings.warn("Player hasn't played for his country", UserWarning)
    
    # pandas dataframe output
    if len(values) == 0:
        return None
    squad_df = pd.DataFrame(values, columns=columns)
    squad_df["player_id"] = player_id
    return squad_df    
    
    


### Notebook Settings

In [14]:
pd.set_option('display.max_columns', None)

In [15]:
lst_df_bio = []
dict_output = tm_pull('https://www.transfermarkt.com/raheem-sterling/profil/spieler/134425', player_bio=True, output='pandas')
df = dict_output['player_bio']
lst_df_bio.append(df)

In [16]:
lst_df_bio

[  player_id      player_name day_of_birth month_of_birth year_of_birth  \
 0    134425  raheem sterling            8             12          1994   
 
         pob      cob         dob              position height   foot  \
 0  Kingston  Jamaica  1994-12-08  attack - Left Winger    170  right   
 
   citizenship second_citizenship  
 0        None            Jamaica  ]

---

<a id='section2'></a>

## <a id='#section2'>2. Project Brief</a>
This notebook scrapes data from [TransferMarkt](https://www.transfermarkt.co.uk/) using [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) and the [Tyrone Mings web scraper](https://github.com/FCrSTATS/tyrone_mings) by [FCrSTATS](https://twitter.com/FC_rstats). This landed data is then manipulated as DataFrames using [pandas](http://pandas.pydata.org/).

The data of player values produced in this notebook is exported to CSV. This data can be further analysed in Python, joined to other datasets, or explored using dashboarding tools such as Tableau or PowerBI, or explores in a spreadsheet such as Microsoft Excel or Google Sheets.

---

<a id='section3'></a>

## <a id='#section3'>3. Data Sources</a>

### <a id='#section3.1'>3.1. Introduction</a>
[TransferMarkt](https://www.transfermarkt.co.uk/) is a German-based website owned by [Axel Springer](https://www.axelspringer.com/en/) and is the leading website for the football transfer market. The website posts football related data, including: scores and results, football news, transfer rumours, and most usefully for us - calculated estimates ofthe market values for teams and individual players.

To read more about how these estimations are made, [Beyond crowd judgments: Data-driven estimation of market value in association football](https://www.sciencedirect.com/science/article/pii/S0377221717304332) by Oliver Müllera, Alexander Simons, and Markus Weinmann does an excellent job of explaining how the estimations are made and their level of accuracy.

Before conducting our EDA, the data needs to be imported as a DataFrame in the Data Sources section [Section 3](#section3) and cleaned in the Data Engineering section [Section 4](#section4).

We'll be using the [pandas](http://pandas.pydata.org/) library to import our data to this workbook as a DataFrame.

### <a id='#section3.2'>3.2. Data Dictionaries</a>
The [TransferMarkt](https://www.transfermarkt.co.uk/) dataset has six features (columns) with the following definitions and data types:

| Feature     | Data type    |
|------|-----|
| `position_number`    | object     |
| `position_description`    | object     |
| `name`    | object     |
| `dob`    | object     |
| `nationality`    | object     |
| `value`    | object     |

### <a id='#section3.3'>3.3. Creating the DataFrame - scraping the data</a>

#### <a id='#section3.3.1.'>3.3.1. League Codes</a>
Before scraping data from [TransferMarkt](https://www.transfermarkt.co.uk/), we need to look at the leagues that we wish to scrape.

The [Tyrone Mings web scraper](https://github.com/FCrSTATS/tyrone_mings) by [FCrSTATS](https://github.com/FCrSTATS) for [TransferMarkt](https://www.transfermarkt.co.uk/) is made up of two parts:
1.    In the first part, the scraper takes the webpages for each of the individual leagues  e.g. the Championship, and extract the hyperlinks to the pages of all the individual teams in the league table.
2.    In the second part the script, the webscraper uses the list of invidual teams hyperlinks collected in part 1 to then collect the hyperlinks for each of the players for those teams. From this, the scraper can then extract the information we need for each of these players. This information is downloaded in two parts - bio and status. This is then joined together using the player_id.

This information collected for all the players is converted to a [pandas](http://pandas.pydata.org/) DataFrame from which we can view and manipulate the data.

An example webpage for a football league is the following: https://www.transfermarkt.co.uk/jumplist/startseite/wettbewerb/GB1/plus/?saison_id=2019. As we can see, between the subdirectory path of `'/wettbewerb/'` and the `'/plus/'`, there is a 3 or 4 digit code. For The Premier League, the code is GB1.

In order to scrape the webpages from [TransferMarkt](https://www.transfermarkt.co.uk/), the codes the leagues we require, need to be recorded from [TransferMarkt](https://www.transfermarkt.co.uk/). The following leagues are all the leagues that feature on the latest FBref 'Big 5' European leagues dataset, the EFL dataset, and the the [FIFA 21 player dataset](https://www.kaggle.com/stefanoleone992/fifa-21-complete-player-dataset), all of which are datasets for which this [TransferMarkt](https://www.transfermarkt.co.uk/) data will be subsequently matched to.

| League Name on FIFA    | Country    | Corresponding [TransferMarkt](https://www.transfermarkt.co.uk/) League Code    |
|------|-----|-----|
| Österreichische Fußball-Bundesliga    | Austria    | A1    |
| SAF    | Argentina    | AR1N    |
| Hyundai A-League    | Australia    | AUS1    |
| Belgium Pro League    | Belgium    | BE1    |
| Raiffeisen Super League    | Switzerland    | C1    |
| Campeonato Scotiabank    | Chile    | CLPD    |
| Liga Dimayor    | Colombia    | COL1    |
| CSL    | China    | CSL    |
| Superliga    | Denmark    | DK1    |
| LaLiga Santander    | Spain    | ES1    |
| LaLiga 1 I 2 I 3    | Spain    | ES2    |
| Finnliiga    | Finland    | FI1    |
| Ligue 1 Conforama    | France    | FR1    |
| Domino’s Ligue 2    | France    | FR2    |
| Premier League    | England    | GB1    |
| EFL Championship    | England    | GB2    |
| EFL League One    | England    | GB3    |
| EFL League Two    | England    | GB4    |
| Hellas Liga    | Greece    | GR1    |
| SSE Airtricity League    | Ireland    | IR1    |
| Serie A TIM    | Italy    | IT1    |
| Calcio B    | Italy    | IT2    |
| Meiji Yasuda J1 League    | Japan    | JAP1    |
| Croatia Liga    | Croatia    | KR1    |
| Bundesliga    | Germany    | L1    |
| Bundesliga 2    | Germany    | L2    |
| 3. Liga    | Germany    | L3    |
| LIGA Bancomer MX    | Mexico    | MEX1    |
| Major League Soccer    | United States    | MLS1    |
| Eredivisie    | Netherlands    | NL1    |
| Eliteserien    | Norway    | NO1    |
| Ekstraklasa    | Poland    | PL1    |
| Liga NOS    | Portugal    | PO1    |
| Romania Liga I    | Romania    | RO1    |
| K LEAGUE Classic    | South Korea    | RSK1    |
| League of Russia    | Russia    | RU1    |
| Saudi Professional    | League	Saudi Arabia    | SA1    |
| Scottish Premiership    | Scotland    | SC1    |
| Allsvenskan    | Sweden    | SE1    |
| South African FL    | South Africa    | SFA1    |
| Süper Lig    | Turkey    | TR1    |
| Česká Liga    | Czech Republic    | TS1    |
| UAE Gulf League    | United Arab Emirates    | UAE1    |
| Ukraine Liga    | Ukraine    | UKR1    |


Unfortunately, on writing this notebook, the following leagues are that is present on FIFA but cannot scraped from [TransferMarkt](https://www.transfermarkt.co.uk/) with the script and will and therefore not be included in the dataset.

| League Name on FIFA    | Country    | Corresponding [TransferMarkt](https://www.transfermarkt.co.uk/) League Code    |
|------|-----|-----|
| League of Russia    | Russia    | RU1    |

#### <a id='#section3.3.2.'>3.3.2. Define Season and Leagues to Scrape</a>

In [10]:
# Define list of leagues for players to scrape
lst_leagues = ['A1',
               'AR1N',
               'AUS1',
               'BE1',
               'C1',
               'CLPD',
               'COL1',
               'CSL',
               'DK1',
               'ES1',
               'ES2',
               'FI1',
               'FR1',
               'FR2',
               'GB1',
               'GB2',
               'GB3',
               'GB4',
               'GR1',
               'IR1',
               'IT1',
               'IT2',
               'JAP1',
               'KR1',
               'L1',
               'L2',
               'L3',
               'MEX1',
               'MLS1',
               'NL1',
               'NO1',
               'PL1',
               'PO1',
               'RO1',
               'RSK1',
               'RU1',
               'SA1',
               'SC1',
               'SE1',
               'SFA1',
               'TR1',
               'TS1',
               'UAE1',
               'UKR1'
              ]

In [11]:
lst_leagues

['A1',
 'AR1N',
 'AUS1',
 'BE1',
 'C1',
 'CLPD',
 'COL1',
 'CSL',
 'DK1',
 'ES1',
 'ES2',
 'FI1',
 'FR1',
 'FR2',
 'GB1',
 'GB2',
 'GB3',
 'GB4',
 'GR1',
 'IR1',
 'IT1',
 'IT2',
 'JAP1',
 'KR1',
 'L1',
 'L2',
 'L3',
 'MEX1',
 'MLS1',
 'NL1',
 'NO1',
 'PL1',
 'PO1',
 'RO1',
 'RSK1',
 'RU1',
 'SA1',
 'SC1',
 'SE1',
 'SFA1',
 'TR1',
 'TS1',
 'UAE1',
 'UKR1']

In [12]:
# Define season to scrape

## Assign season to variable
season = '2020'    # '2020' for the 20/21 season

## Create 'Full Season' and 'Short Season' strings

### Full season
full_season_string = str(int(season)) + '/' + str(int(season) + 1)

### Short season
short_season_string = str((str(int(season))[-2:]) + (str(int(season) + 1)[-2:]))

In [13]:
full_season_string

'2020/2021'

In [14]:
short_season_string

'2021'

#### <a id='#section3.3.3.'>3.3.3. Player URLs</a>

In [15]:
"""
# Run this script to download all the URLs for the players of interest from TransferMarkt

## Start timer
tic = datetime.datetime.now()

## Create empty list
lst_player_urls = []

## Print time scraping started
print(f'Scraping started at: {tic}')

## Scrape information for each player
for league in lst_leagues:
    try:
        lst_output = get_player_urls_from_league_page(f'https://www.transfermarkt.co.uk/championship/startseite/wettbewerb/{league}/plus/?saison_id={season}', verbose=True)
        lst_player_urls.append(lst_output)
        print(f'All player URLs for the {league} league appended.')
    except:
        pass

## End timer
toc = datetime.datetime.now()

## Print time scraping ended
print(f'Scraping ended at: {toc}')

## Flatten nested list into a single list
lst_player_urls = reduce(lambda x,y: x+y, lst_player_urls)

## No. URLs i.e. players
len_player_urls = len(lst_player_urls)

## Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to scrape the {len_player_urls:,} player urls for the {full_season_string} season is: {total_time/60:0.2f} minutes.')
"""

"\n# Run this script to download all the URLs for the players of interest from TransferMarkt\n\n## Start timer\ntic = datetime.datetime.now()\n\n## Create empty list\nlst_player_urls = []\n\n## Print time scraping started\nprint(f'Scraping started at: {tic}')\n\n## Scrape information for each player\nfor league in lst_leagues:\n    try:\n        lst_output = get_player_urls_from_league_page(f'https://www.transfermarkt.co.uk/championship/startseite/wettbewerb/{league}/plus/?saison_id={season}', verbose=True)\n        lst_player_urls.append(lst_output)\n        print(f'All player URLs for the {league} league appended.')\n    except:\n        pass\n\n## End timer\ntoc = datetime.datetime.now()\n\n## Print time scraping ended\nprint(f'Scraping ended at: {toc}')\n\n## Flatten nested list into a single list\nlst_player_urls = reduce(lambda x,y: x+y, lst_player_urls)\n\n## No. URLs i.e. players\nlen_player_urls = len(lst_player_urls)\n\n## Calculate time take\ntotal_time = (toc-tic).total_sec

In [16]:
# List length i.e. total players
#len(lst_player_urls)

In [17]:
# Save these URLs to a one column DataFrame
#df_player_urls = pd.DataFrame(lst_player_urls, columns=['player_url'])

In [18]:
# Export DataFrame as a CSV file

## Export a copy to the 'archive' subfolder of the TM folder, including the date
#df_player_urls.to_csv(data_dir_tm + f'/raw/{short_season_string}/player_urls/archive/' + f'tm_player_urls_all_{short_season_string}_last_updated_{today}.csv', index=None, header=True)

## Export another copy to the TM folder called 'latest' (can be overwritten)
#df_player_urls.to_csv(data_dir_tm + f'/raw/{short_season_string}/player_urls/' + f'tm_player_urls_all_{short_season_string}_latest.csv', index=None, header=True)

In [19]:
# Import DataFrame as a CSV file
df_player_urls = pd.read_csv(data_dir_tm + f'/raw/{short_season_string}/player_urls/' + f'tm_player_urls_all_{short_season_string}_latest.csv')

In [20]:
# Pandas DataFrame to column
lst_player_urls = df_player_urls['player_url'].tolist()

In [21]:
lst_player_urls

['https://www.transfermarkt.com/emilian-metu/profil/spieler/580622',
 'https://www.transfermarkt.com/ahmet-muhamedbegovic/profil/spieler/271323',
 'https://www.transfermarkt.com/kofi-schulz/profil/spieler/192539',
 'https://www.transfermarkt.com/manuel-maranda/profil/spieler/287995',
 'https://www.transfermarkt.com/michael-steinwender/profil/spieler/375434',
 'https://www.transfermarkt.com/robert-ljubicic/profil/spieler/353634',
 'https://www.transfermarkt.com/armin-gremsl/profil/spieler/159758',
 'https://www.transfermarkt.com/dor-hugi/profil/spieler/293602',
 'https://www.transfermarkt.com/christoph-halper/profil/spieler/308086',
 'https://www.transfermarkt.com/alexander-schmidt/profil/spieler/307939',
 'https://www.transfermarkt.com/martin-majnovics/profil/spieler/394609',
 'https://www.transfermarkt.com/lukas-grozurek/profil/spieler/75829',
 'https://www.transfermarkt.com/michael-blauensteiner/profil/spieler/195854',
 'https://www.transfermarkt.com/christoph-messerer/profil/spieler

In [22]:
# List length i.e. total players
len(lst_player_urls)

22446

#### <a id='#section3.3.3.'>3.3.3. Bio Information</a>

In [None]:
# Run this script to scrape latest version of bio data from TransferMarkt

## Start timer
tic = datetime.datetime.now()

## Create empty list
lst_df_bio = []

## Print time scraping started
print(f'Scraping started at: {tic}')

## Scrape bio information for each player
for player_page in lst_player_urls:
    try:
        dict_output = tm_pull(player_page, player_bio=True, output='pandas')
        df = dict_output['player_bio']
        lst_df_bio.append(df)
        print(f'Bio data appended for: {player_page}')
    except:
        pass
        print(f'Unable to append bio data for: {player_page}')

## Concatenate DataFrames
df_bio = pd.concat(lst_df_bio)

## Print time scraping ended
print(f'Scraping ended at: {toc}')

## Create attribute for the season    
df_bio['season'] = full_season_string

## End timer
toc = datetime.datetime.now()

## Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to scrape the bio data of the {len_player_urls:,} players for the {full_season_string} season is: {total_time/60:0.2f} minutes.')

In [None]:
# Display DataFrame
df_bio.head()

In [None]:
df_bio.shape

##### Export DataFrame

In [None]:
# Export DataFrame as a CSV file

## Export a copy to the 'archive' subfolder of the TM folder, including the date
df_bio.to_csv(data_dir_tm + f'/raw/{short_season_string}/bio/archive/' + f'tm_player_bio_all_{short_season_string}_last_updated_{today}.csv', index=None, header=True)

## Export another copy to the TM folder called 'latest' (can be overwritten)
df_bio.to_csv(data_dir_tm + f'/raw/{short_season_string}/bio/' + f'tm_player_bio_all_{short_season_string}_latest.csv', index=None, header=True)

#### <a id='#section3.3.4.'>3.3.4. Status Information</a>

In [None]:
# Run this script to scrape latest version of status data from TransferMarkt

## Start timer
tic = datetime.datetime.now()

## Create empty list
lst_df_status = []

## Print time scraping started
print(f'Scraping started at: {tic}')

## Scrape status information for each player
for player_page in lst_player_urls:
    try:
        dict_output = tm_pull(player_page, player_status=True, output='pandas')
        df = dict_output['player_status']
        lst_df_status.append(df)
        print(f'Status data appended for: {player_page}')
    except:
        pass
        print(f'Unable to append status data for: {player_page}')

## Concatenate DataFrames
df_status = pd.concat(lst_df_status)

## Print time scraping ended
print(f'Scraping ended at: {toc}')

## Create attribute for the season    
df_status['season'] = full_season_string

## End timer
toc = datetime.datetime.now()

## Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to scrape the status data of the {len_player_urls:,} players for the {full_season_string} season is: {total_time/60:0.2f} minutes.')

In [None]:
# Display DataFrame
df_status.head()

In [None]:
df_status.shape

##### Export DataFrame

In [None]:
# Export DataFrame as a CSV file

## Export a copy to the 'archive' subfolder of the TM folder, including the date
df_status.to_csv(data_dir_tm + f'/raw/{short_season_string}/status/archive/' + f'tm_player_status_all_{short_season_string}_last_updated_{today}.csv', index=None, header=True)

## Export another copy to the TM folder called 'latest' (can be overwritten)
df_status.to_csv(data_dir_tm + f'/raw/{short_season_string}/status/' + f'tm_player_status_all_{short_season_string}_latest.csv', index=None, header=True)

#### <a id='#section3.3.5.'>3.3.5. Transfer History</a> - this scraper currently doesn't work!

In [None]:
# Run this script to scrape latest version of transfer history data from TransferMarkt

## Start timer
tic = datetime.datetime.now()

## Create empty list
lst_df_th = []

## Print time scraping started
print(f'Scraping started at: {tic}')

## Scrape bio information for each player
for player_page in lst_player_urls:
    try:
        dict_output = tm_pull(player_page, transfer_history=True, output='pandas')
        df = dict_output['transfer_history']
        lst_df_th.append(df)
        print(f'Transfer history data appended for: {player_page}')
    except:
        pass
        print(f'Unable to append transfer history data for: {player_page}')

## Concatenate DataFrames
df_th = pd.concat(lst_df_th)

## Print time scraping ended
print(f'Scraping ended at: {toc}')

## Create attribute for the season    
df_th['season'] = full_season_string

## End timer
toc = datetime.datetime.now()

## Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to scrape the transfer history data of the {len_player_urls:,} players for the {full_season_string} season is: {total_time/60:0.2f} minutes.')

In [None]:
# Display DataFrame
df_th.head()

In [None]:
df_th.shape

##### Export DataFrame

In [None]:
# Export DataFrame as a CSV file

## Export a copy to the 'archive' subfolder of the TM folder, including the date
df_th.to_csv(data_dir_tm + f'/raw/{short_season_string}/transfer-history/archive/' + f'tm_player_th_all_{short_season_string}_last_updated_{today}.csv', index=None, header=True)

## Export another copy to the TM folder called 'latest' (can be overwritten)
df_th.to_csv(data_dir_tm + f'/raw/{short_season_string}/transfer-history/' + f'tm_player_th_all_{short_season_string}_latest.csv', index=None, header=True)

#### <a id='#section3.3.6.'>3.3.6. Performance Data</a>

In [None]:
# Run this script to scrape latest version of performance data from TransferMarkt

## Start timer
tic = datetime.datetime.now()

## Create empty list
lst_df_performance = []

## Print time scraping started
print(f'Scraping started at: {tic}')

## Scrape bio information for each player
for player_page in lst_player_urls:
    try:
        dict_output = tm_pull(player_page, performance_data=True, output='pandas')
        df = dict_output['performance_data']
        lst_df_performance.append(df)
        print(f'Performance history data appended for: {player_page}')
    except:
        pass
        print(f'Unable to append performance data for: {player_page}')

## Concatenate DataFrames
df_performance = pd.concat(lst_df_performance)

## Print time scraping ended
print(f'Scraping ended at: {toc}')

## Create attribute for the season    
df_performance['season'] = full_season_string

## End timer
toc = datetime.datetime.now()

## Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to scrape the performance data of the {len_player_urls:,} players for the {full_season_string} season is: {total_time/60:0.2f} minutes.')

In [None]:
# Display DataFrame
df_performance.head()

In [None]:
df_performance.shape

##### Export DataFrame

In [None]:
# Export DataFrame as a CSV file

## Export a copy to the 'archive' subfolder of the TM folder, including the date
df_performance.to_csv(data_dir_tm + f'/raw/{short_season_string}/performance/' + f'tm_player_th_all_{short_season_string}_last_updated_{today}.csv', index=None, header=True)

## Export another copy to the TM folder called 'latest' (can be overwritten)
df_performance.to_csv(data_dir_tm + f'/raw/{short_season_string}/performance/archive/' + f'tm_player_th_all_{short_season_string}_latest.csv', index=None, header=True)

#### <a id='#section3.3.7.'>3.3.7. Market Value History</a>

In [None]:
# Run this script to scrape latest version of market value history data from TransferMarkt

## Start timer
tic = datetime.datetime.now()

## Create empty list
lst_df_value_history = []

## Print time scraping started
print(f'Scraping started at: {tic}')

## Scrape bio information for each player
for player_page in lst_player_urls:
    try:
        dict_output = tm_pull(player_page, market_value_history=True, output='pandas')
        df = dict_output['market_value_history']
        lst_df_value_history.append(df)
        print(f'Market value history data appended for: {player_page}')
    except:
        pass
        print(f'Unable to append market value history data for: {player_page}')

## Concatenate DataFrames
df_value_history = pd.concat(lst_df_value_history)

## Print time scraping ended
print(f'Scraping ended at: {toc}')

## Create attribute for the season    
df_value_history['season'] = full_season_string

## End timer
toc = datetime.datetime.now()

## Calculate time take
total_time = (toc-tic).total_seconds()
print(f'Time taken to scrape the market value history data of the {len_player_urls:,} players for the {full_season_string} season is: {total_time/60:0.2f} minutes.')

Scraping started at: 2020-12-31 17:17:57.343111
Market value history data appended for: https://www.transfermarkt.com/emilian-metu/profil/spieler/580622
Market value history data appended for: https://www.transfermarkt.com/ahmet-muhamedbegovic/profil/spieler/271323
Market value history data appended for: https://www.transfermarkt.com/kofi-schulz/profil/spieler/192539
Market value history data appended for: https://www.transfermarkt.com/manuel-maranda/profil/spieler/287995
Market value history data appended for: https://www.transfermarkt.com/michael-steinwender/profil/spieler/375434
Market value history data appended for: https://www.transfermarkt.com/robert-ljubicic/profil/spieler/353634
Market value history data appended for: https://www.transfermarkt.com/armin-gremsl/profil/spieler/159758
Market value history data appended for: https://www.transfermarkt.com/dor-hugi/profil/spieler/293602
Market value history data appended for: https://www.transfermarkt.com/christoph-halper/profil/spi

Market value history data appended for: https://www.transfermarkt.com/johannes-eggestein/profil/spieler/251303
Market value history data appended for: https://www.transfermarkt.com/christian-ramsebner/profil/spieler/37820
Market value history data appended for: https://www.transfermarkt.com/andres-andrade/profil/spieler/403808
Market value history data appended for: https://www.transfermarkt.com/petar-filipovic/profil/spieler/85922
Market value history data appended for: https://www.transfermarkt.com/pascal-petlach/profil/spieler/354063
Market value history data appended for: https://www.transfermarkt.com/lukas-rath/profil/spieler/67442
Market value history data appended for: https://www.transfermarkt.com/felix-kekoh/profil/spieler/806200
Market value history data appended for: https://www.transfermarkt.com/maximilian-breunig/profil/spieler/584254
Market value history data appended for: https://www.transfermarkt.com/leonardo-lukacevic/profil/spieler/321870
Market value history data app

Market value history data appended for: https://www.transfermarkt.com/nosa-iyobosa-edokpolor/profil/spieler/276841
Market value history data appended for: https://www.transfermarkt.com/manfred-fischer/profil/spieler/269093
Market value history data appended for: https://www.transfermarkt.com/tino-casali/profil/spieler/169271
Market value history data appended for: https://www.transfermarkt.com/mario-stefel/profil/spieler/356558
Market value history data appended for: https://www.transfermarkt.com/lars-nussbaumer/profil/spieler/407503
Market value history data appended for: https://www.transfermarkt.com/philipp-netzer/profil/spieler/17084
Market value history data appended for: https://www.transfermarkt.com/chinedu-obasi/profil/spieler/41663
Market value history data appended for: https://www.transfermarkt.com/jan-zwischenbrugger/profil/spieler/83617
Market value history data appended for: https://www.transfermarkt.com/manuel-thurnwald/profil/spieler/307485
Market value history data app

Market value history data appended for: https://www.transfermarkt.com/mateo-barac/profil/spieler/329629
Market value history data appended for: https://www.transfermarkt.com/christopher-dibon/profil/spieler/54607
Market value history data appended for: https://www.transfermarkt.com/lion-schuster/profil/spieler/391560
Market value history data appended for: https://www.transfermarkt.com/filip-stojkovic/profil/spieler/139751
Market value history data appended for: https://www.transfermarkt.com/maximilian-hofmann/profil/spieler/131231
Market value history data appended for: https://www.transfermarkt.com/lukas-sulzbacher/profil/spieler/375355
Market value history data appended for: https://www.transfermarkt.com/yusuf-demir/profil/spieler/548031
Market value history data appended for: https://www.transfermarkt.com/dragoljub-savic/profil/spieler/574656
Market value history data appended for: https://www.transfermarkt.com/taxiarchis-fountas/profil/spieler/192409
Market value history data appe

Market value history data appended for: https://www.transfermarkt.com/patrick-pentz/profil/spieler/223972
Market value history data appended for: https://www.transfermarkt.com/christoph-schosswendter/profil/spieler/68379
Market value history data appended for: https://www.transfermarkt.com/stefan-radulovic/profil/spieler/527529
Market value history data appended for: https://www.transfermarkt.com/manprit-sarkaria/profil/spieler/288690
Market value history data appended for: https://www.transfermarkt.com/maudo-jarjue/profil/spieler/450462
Market value history data appended for: https://www.transfermarkt.com/christian-schoissengeyr/profil/spieler/157242
Market value history data appended for: https://www.transfermarkt.com/thomas-ebner/profil/spieler/122781
Market value history data appended for: https://www.transfermarkt.com/johannes-handl/profil/spieler/293084
Market value history data appended for: https://www.transfermarkt.com/niels-hahn/profil/spieler/404936
Market value history data

Market value history data appended for: https://www.transfermarkt.com/mauro-osores/profil/spieler/532351
Market value history data appended for: https://www.transfermarkt.com/nicolas-lamendola/profil/spieler/756809
Market value history data appended for: https://www.transfermarkt.com/bruno-bianchi/profil/spieler/55998
Market value history data appended for: https://www.transfermarkt.com/matias-solohaga/profil/spieler/837975
Market value history data appended for: https://www.transfermarkt.com/ramiro-carrera/profil/spieler/282283
Market value history data appended for: https://www.transfermarkt.com/guillermo-acosta/profil/spieler/422917
Market value history data appended for: https://www.transfermarkt.com/cristian-lucchetti/profil/spieler/30685
Market value history data appended for: https://www.transfermarkt.com/franco-pizzicanella/profil/spieler/424046
Market value history data appended for: https://www.transfermarkt.com/camilo-albornoz/profil/spieler/745945
Market value history data 

Market value history data appended for: https://www.transfermarkt.com/juan-cruz-guasone/profil/spieler/832977
Market value history data appended for: https://www.transfermarkt.com/federico-vietto/profil/spieler/401579
Market value history data appended for: https://www.transfermarkt.com/jorge-ortiz/profil/spieler/55212
Market value history data appended for: https://www.transfermarkt.com/matias-escudero/profil/spieler/267893
Market value history data appended for: https://www.transfermarkt.com/brian-nievas/profil/spieler/661074
Market value history data appended for: https://www.transfermarkt.com/federico-costa/profil/spieler/425136
Market value history data appended for: https://www.transfermarkt.com/franco-rivasseau/profil/spieler/724566
Market value history data appended for: https://www.transfermarkt.com/damian-lemos/profil/spieler/68557
Market value history data appended for: https://www.transfermarkt.com/dylan-gissi/profil/spieler/81056
Market value history data appended for: htt

Market value history data appended for: https://www.transfermarkt.com/franco-sivetti/profil/spieler/625204
Market value history data appended for: https://www.transfermarkt.com/enzo-kalinski/profil/spieler/67818
Market value history data appended for: https://www.transfermarkt.com/juan-fuentes/profil/spieler/319468
Market value history data appended for: https://www.transfermarkt.com/lucas-rodriguez/profil/spieler/380996
Market value history data appended for: https://www.transfermarkt.com/jonathan-schunke/profil/spieler/78232
Market value history data appended for: https://www.transfermarkt.com/marcos-rojo/profil/spieler/93176
Market value history data appended for: https://www.transfermarkt.com/nicolas-bazzana/profil/spieler/497157
Market value history data appended for: https://www.transfermarkt.com/facundo-mura/profil/spieler/642759
Market value history data appended for: https://www.transfermarkt.com/manuel-castro/profil/spieler/335984
Market value history data appended for: https

Market value history data appended for: https://www.transfermarkt.com/enzo-ybanez/profil/spieler/534034
Market value history data appended for: https://www.transfermarkt.com/juan-andrada/profil/spieler/457212
Market value history data appended for: https://www.transfermarkt.com/christian-almeida/profil/spieler/195936
Market value history data appended for: https://www.transfermarkt.com/gabriel-alanis/profil/spieler/342149
Market value history data appended for: https://www.transfermarkt.com/andres-mehring/profil/spieler/240903
Market value history data appended for: https://www.transfermarkt.com/alan-cantero/profil/spieler/836494
Market value history data appended for: https://www.transfermarkt.com/miguel-merentiel/profil/spieler/481367
Market value history data appended for: https://www.transfermarkt.com/tomas-badaloni/profil/spieler/568546
Market value history data appended for: https://www.transfermarkt.com/danilo-ortiz/profil/spieler/265590
Market value history data appended for: h

Market value history data appended for: https://www.transfermarkt.com/janeiler-rivas/profil/spieler/208729
Market value history data appended for: https://www.transfermarkt.com/agustin-bolivar/profil/spieler/534623
Market value history data appended for: https://www.transfermarkt.com/paolo-goltz/profil/spieler/55183
Market value history data appended for: https://www.transfermarkt.com/leonardo-morales/profil/spieler/457061
Market value history data appended for: https://www.transfermarkt.com/maximiliano-coronel/profil/spieler/125082
Market value history data appended for: https://www.transfermarkt.com/alexis-martin-arias/profil/spieler/360521
Market value history data appended for: https://www.transfermarkt.com/jhonatan-agudelo/profil/spieler/313426
Market value history data appended for: https://www.transfermarkt.com/matias-garcia/profil/spieler/267393
Market value history data appended for: https://www.transfermarkt.com/ignacio-miramon/profil/spieler/836518
Market value history data 

Market value history data appended for: https://www.transfermarkt.com/maximiliano-lovera/profil/spieler/441422
Market value history data appended for: https://www.transfermarkt.com/matias-caruzzo/profil/spieler/56573
Market value history data appended for: https://www.transfermarkt.com/miguel-barbieri/profil/spieler/431703
Market value history data appended for: https://www.transfermarkt.com/alfonso-parot/profil/spieler/84238
Market value history data appended for: https://www.transfermarkt.com/fernando-torrent/profil/spieler/433772
Market value history data appended for: https://www.transfermarkt.com/josue-ayala/profil/spieler/55955
Market value history data appended for: https://www.transfermarkt.com/joel-lopez-pissano/profil/spieler/565193
Market value history data appended for: https://www.transfermarkt.com/federico-martinez/profil/spieler/411433
Market value history data appended for: https://www.transfermarkt.com/nicolas-colazo/profil/spieler/116277
Market value history data appe

Market value history data appended for: https://www.transfermarkt.com/facundo-tobares/profil/spieler/633046
Market value history data appended for: https://www.transfermarkt.com/federico-milo/profil/spieler/283395
Market value history data appended for: https://www.transfermarkt.com/felipe-rodriguez/profil/spieler/202906
Market value history data appended for: https://www.transfermarkt.com/facundo-bertoglio/profil/spieler/107525
Market value history data appended for: https://www.transfermarkt.com/fernando-roman/profil/spieler/836251
Market value history data appended for: https://www.transfermarkt.com/gaston-gil-romero/profil/spieler/222052
Market value history data appended for: https://www.transfermarkt.com/nazareno-solis/profil/spieler/419660
Market value history data appended for: https://www.transfermarkt.com/yoel-juarez/profil/spieler/660286
Market value history data appended for: https://www.transfermarkt.com/sebastian-rincon/profil/spieler/223917
Market value history data appe

Market value history data appended for: https://www.transfermarkt.com/santiago-simon/profil/spieler/661131
Market value history data appended for: https://www.transfermarkt.com/paulo-diaz/profil/spieler/271478
Market value history data appended for: https://www.transfermarkt.com/ignacio-scocco/profil/spieler/51088
Market value history data appended for: https://www.transfermarkt.com/lucas-beltran/profil/spieler/628366
Market value history data appended for: https://www.transfermarkt.com/javier-pinola/profil/spieler/7358
Market value history data appended for: https://www.transfermarkt.com/enzo-fernandez/profil/spieler/648195
Market value history data appended for: https://www.transfermarkt.com/santos-borre/profil/spieler/323831
Market value history data appended for: https://www.transfermarkt.com/jorge-carrascal/profil/spieler/354145
Market value history data appended for: https://www.transfermarkt.com/nahuel-gallardo/profil/spieler/548466
Market value history data appended for: https:

Market value history data appended for: https://www.transfermarkt.com/franco-sbuttoni/profil/spieler/135872
Market value history data appended for: https://www.transfermarkt.com/cristian-vega/profil/spieler/443689
Market value history data appended for: https://www.transfermarkt.com/matias-nani/profil/spieler/486597
Market value history data appended for: https://www.transfermarkt.com/leonardo-villalba/profil/spieler/269817
Market value history data appended for: https://www.transfermarkt.com/franco-cristaldo/profil/spieler/332259
Market value history data appended for: https://www.transfermarkt.com/francisco-cerro/profil/spieler/125107
Market value history data appended for: https://www.transfermarkt.com/claudio-riano/profil/spieler/238011
Market value history data appended for: https://www.transfermarkt.com/dany-cure/profil/spieler/258740
Market value history data appended for: https://www.transfermarkt.com/jose-luis-fernandez/profil/spieler/75407
Market value history data appended f

Market value history data appended for: https://www.transfermarkt.com/pablo-aranda/profil/spieler/829382
Market value history data appended for: https://www.transfermarkt.com/nicolas-orsini/profil/spieler/289149
Market value history data appended for: https://www.transfermarkt.com/matias-vera/profil/spieler/745786
Market value history data appended for: https://www.transfermarkt.com/juan-pablo-krilanovich/profil/spieler/661155
Market value history data appended for: https://www.transfermarkt.com/jose-luis-gomez/profil/spieler/282357
Market value history data appended for: https://www.transfermarkt.com/ignacio-cechi/profil/spieler/836281
Market value history data appended for: https://www.transfermarkt.com/fernando-belluschi/profil/spieler/26460
Market value history data appended for: https://www.transfermarkt.com/marcelino-moreno/profil/spieler/456617
Market value history data appended for: https://www.transfermarkt.com/tomas-belmonte/profil/spieler/483446
Market value history data app

Market value history data appended for: https://www.transfermarkt.com/juan-sanchez-mino/profil/spieler/169249
Market value history data appended for: https://www.transfermarkt.com/domingo-blanco/profil/spieler/285127
Market value history data appended for: https://www.transfermarkt.com/nicolas-castro/profil/spieler/441160
Market value history data appended for: https://www.transfermarkt.com/diego-nunez/profil/spieler/498133
Market value history data appended for: https://www.transfermarkt.com/jhonatan-candia/profil/spieler/259917
Market value history data appended for: https://www.transfermarkt.com/emiliano-papa/profil/spieler/30820
Market value history data appended for: https://www.transfermarkt.com/lautaro-parisi/profil/spieler/610569
Market value history data appended for: https://www.transfermarkt.com/facundo-kruspzky/profil/spieler/830847
Market value history data appended for: https://www.transfermarkt.com/alan-ruiz/profil/spieler/190442
Market value history data appended for: h

Market value history data appended for: https://www.transfermarkt.com/agustin-almendra/profil/spieler/491698
Market value history data appended for: https://www.transfermarkt.com/exequiel-zeballos/profil/spieler/661132
Market value history data appended for: https://www.transfermarkt.com/walter-bou/profil/spieler/334233
Market value history data appended for: https://www.transfermarkt.com/carlos-tevez/profil/spieler/4276
Market value history data appended for: https://www.transfermarkt.com/frank-fabra/profil/spieler/156561
Market value history data appended for: https://www.transfermarkt.com/junior-alonso/profil/spieler/273000
Market value history data appended for: https://www.transfermarkt.com/lisandro-lopez/profil/spieler/125019
Market value history data appended for: https://www.transfermarkt.com/sebastian-perez/profil/spieler/182108
Market value history data appended for: https://www.transfermarkt.com/jan-hurtado/profil/spieler/459115
Market value history data appended for: https:

Market value history data appended for: https://www.transfermarkt.com/jose-luis-rodriguez/profil/spieler/430339
Market value history data appended for: https://www.transfermarkt.com/evelio-cardozo/profil/spieler/652551
Market value history data appended for: https://www.transfermarkt.com/rodrigo-schlegel/profil/spieler/504258
Market value history data appended for: https://www.transfermarkt.com/marcelo-diaz/profil/spieler/83895
Market value history data appended for: https://www.transfermarkt.com/mauricio-martinez/profil/spieler/309388
Market value history data appended for: https://www.transfermarkt.com/julian-lopez/profil/spieler/625203
Market value history data appended for: https://www.transfermarkt.com/leonardo-sigali/profil/spieler/54570
Market value history data appended for: https://www.transfermarkt.com/matias-nunez/profil/spieler/840964
Market value history data appended for: https://www.transfermarkt.com/gaston-gomez/profil/spieler/337627
Market value history data appended f

Market value history data appended for: https://www.transfermarkt.com/ezequiel-cerutti/profil/spieler/267882
Market value history data appended for: https://www.transfermarkt.com/sebastian-torrico/profil/spieler/68040
Market value history data appended for: https://www.transfermarkt.com/jonathan-herrera/profil/spieler/485880
Market value history data appended for: https://www.transfermarkt.com/agustin-hausch/profil/spieler/668514
Market value history data appended for: https://www.transfermarkt.com/emanuel-maciel/profil/spieler/666330
Market value history data appended for: https://www.transfermarkt.com/geronimo-poblete/profil/spieler/216714
Market value history data appended for: https://www.transfermarkt.com/federico-gattoni/profil/spieler/660735
Market value history data appended for: https://www.transfermarkt.com/matias-palacios/profil/spieler/621370
Market value history data appended for: https://www.transfermarkt.com/ezequiel-navarro/profil/spieler/745877
Market value history dat

Market value history data appended for: https://www.transfermarkt.com/franco-godoy/profil/spieler/579974
Market value history data appended for: https://www.transfermarkt.com/ezequiel-bonifacio/profil/spieler/334224
Market value history data appended for: https://www.transfermarkt.com/jonathan-galvan/profil/spieler/216730
Market value history data appended for: https://www.transfermarkt.com/lucas-esquivel/profil/spieler/829969
Market value history data appended for: https://www.transfermarkt.com/yeimar-gomez-andrade/profil/spieler/343359
Market value history data appended for: https://www.transfermarkt.com/gabriel-carabajal/profil/spieler/280157
Market value history data appended for: https://www.transfermarkt.com/federico-milo/profil/spieler/283395
Market value history data appended for: https://www.transfermarkt.com/pablo-palacio/profil/spieler/655659
Market value history data appended for: https://www.transfermarkt.com/claudio-corvalan/profil/spieler/125108
Market value history data

Market value history data appended for: https://www.transfermarkt.com/patrick-langlois/profil/spieler/601976
Market value history data appended for: https://www.transfermarkt.com/joe-ledley/profil/spieler/34409
Market value history data appended for: https://www.transfermarkt.com/nigel-boogaard/profil/spieler/43116
Market value history data appended for: https://www.transfermarkt.com/jason-hoffman/profil/spieler/62839
Market value history data appended for: https://www.transfermarkt.com/roy-odonovan/profil/spieler/34629
Market value history data appended for: https://www.transfermarkt.com/angus-thurgate/profil/spieler/559464
Market value history data appended for: https://www.transfermarkt.com/ben-kantarovski/profil/spieler/66394
Market value history data appended for: https://www.transfermarkt.com/jack-duncan/profil/spieler/171382
Market value history data appended for: https://www.transfermarkt.com/steven-ugarkovic/profil/spieler/262971
Market value history data appended for: https:/

Market value history data appended for: https://www.transfermarkt.com/jamie-young/profil/spieler/13435
Market value history data appended for: https://www.transfermarkt.com/jai-ingham/profil/spieler/307784
Market value history data appended for: https://www.transfermarkt.com/macklin-freke/profil/spieler/631103
Market value history data appended for: https://www.transfermarkt.com/phillip-cancar/profil/spieler/716810
Market value history data appended for: https://www.transfermarkt.com/daniel-margush/profil/spieler/414085
Market value history data appended for: https://www.transfermarkt.com/kosta-grozos/profil/spieler/500924
Market value history data appended for: https://www.transfermarkt.com/keanu-baccus/profil/spieler/479037
Market value history data appended for: https://www.transfermarkt.com/alessandro-lopane/profil/spieler/819840
Market value history data appended for: https://www.transfermarkt.com/tass-mourdoukoutas/profil/spieler/561049
Market value history data appended for: htt

Market value history data appended for: https://www.transfermarkt.com/ben-warland/profil/spieler/285225
Market value history data appended for: https://www.transfermarkt.com/alex-wilkinson/profil/spieler/43128
Market value history data appended for: https://www.transfermarkt.com/rhyan-grant/profil/spieler/108108
Market value history data appended for: https://www.transfermarkt.com/jordi-swibel/profil/spieler/714261
Market value history data appended for: https://www.transfermarkt.com/tom-heward-belle/profil/spieler/349422
Market value history data appended for: https://www.transfermarkt.com/paulo-retre/profil/spieler/257685
Market value history data appended for: https://www.transfermarkt.com/trent-buhagiar/profil/spieler/415273
Market value history data appended for: https://www.transfermarkt.com/calem-nieuwenhof/profil/spieler/734487
Market value history data appended for: https://www.transfermarkt.com/liam-mcging/profil/spieler/675114
Market value history data appended for: https://

Market value history data appended for: https://www.transfermarkt.com/domenic-costanzo/profil/spieler/820949
Market value history data appended for: https://www.transfermarkt.com/aleksandar-jovanovic/profil/spieler/50377
Market value history data appended for: https://www.transfermarkt.com/lachlan-rose/profil/spieler/840008
Market value history data appended for: https://www.transfermarkt.com/markel-susaeta/profil/spieler/54245
Market value history data appended for: https://www.transfermarkt.com/james-meredith/profil/spieler/48111
Market value history data appended for: https://www.transfermarkt.com/jake-mcging/profil/spieler/355721
Market value history data appended for: https://www.transfermarkt.com/mark-milligan/profil/spieler/37372
Market value history data appended for: https://www.transfermarkt.com/anthony-golec/profil/spieler/104096
Market value history data appended for: https://www.transfermarkt.com/denis-genreau/profil/spieler/470618
Market value history data appended for: h

Market value history data appended for: https://www.transfermarkt.com/santiago-colombatto/profil/spieler/395122
Market value history data appended for: https://www.transfermarkt.com/keito-nakamura/profil/spieler/405397
Market value history data appended for: https://www.transfermarkt.com/ibrahima-sory-sankhon/profil/spieler/426648
Market value history data appended for: https://www.transfermarkt.com/jhonny-lucas/profil/spieler/540990
Market value history data appended for: https://www.transfermarkt.com/facundo-colidio/profil/spieler/491705
Market value history data appended for: https://www.transfermarkt.com/ko-matsubara/profil/spieler/307300
Market value history data appended for: https://www.transfermarkt.com/pol-garcia/profil/spieler/178217
Market value history data appended for: https://www.transfermarkt.com/duckens-nazon/profil/spieler/345763
Market value history data appended for: https://www.transfermarkt.com/siebren-lathouwers/profil/spieler/452580
Market value history data app

Market value history data appended for: https://www.transfermarkt.com/carlos-cuesta/profil/spieler/474589
Market value history data appended for: https://www.transfermarkt.com/gerardo-arteaga/profil/spieler/469718
Market value history data appended for: https://www.transfermarkt.com/eboue-kouassi/profil/spieler/314137
Market value history data appended for: https://www.transfermarkt.com/joakim-maehle/profil/spieler/369674
Market value history data appended for: https://www.transfermarkt.com/junya-ito/profil/spieler/348791
Market value history data appended for: https://www.transfermarkt.com/paul-onuachu/profil/spieler/272855
Market value history data appended for: https://www.transfermarkt.com/luca-oyen/profil/spieler/551085
Market value history data appended for: https://www.transfermarkt.com/theo-bongonda/profil/spieler/280701
Market value history data appended for: https://www.transfermarkt.com/pierre-dwomoh/profil/spieler/652803
Market value history data appended for: https://www.t

Market value history data appended for: https://www.transfermarkt.com/onur-kaya/profil/spieler/33709
Market value history data appended for: https://www.transfermarkt.com/geoffry-hairemans/profil/spieler/110292
Market value history data appended for: https://www.transfermarkt.com/nikola-storm/profil/spieler/183120
Market value history data appended for: https://www.transfermarkt.com/william-togui/profil/spieler/362291
Market value history data appended for: https://www.transfermarkt.com/gustav-engvall/profil/spieler/248024
Market value history data appended for: https://www.transfermarkt.com/victor-wernersson/profil/spieler/271816
Market value history data appended for: https://www.transfermarkt.com/maryan-shved/profil/spieler/359247
Market value history data appended for: https://www.transfermarkt.com/sheldon-bateau/profil/spieler/113378
Market value history data appended for: https://www.transfermarkt.com/lucas-bijker/profil/spieler/187283
Market value history data appended for: http

Market value history data appended for: https://www.transfermarkt.com/emmanuel-agbadou/profil/spieler/683895
Market value history data appended for: https://www.transfermarkt.com/isaac-nuhu/profil/spieler/740615
Market value history data appended for: https://www.transfermarkt.com/smail-prevljak/profil/spieler/328264
Market value history data appended for: https://www.transfermarkt.com/jonathan-heris/profil/spieler/101531
Market value history data appended for: https://www.transfermarkt.com/mamadou-kone/profil/spieler/171165
Market value history data appended for: https://www.transfermarkt.com/ortwin-de-wolf/profil/spieler/450218
Market value history data appended for: https://www.transfermarkt.com/nils-schouterden/profil/spieler/62700
Market value history data appended for: https://www.transfermarkt.com/julien-ngoy/profil/spieler/286005
Market value history data appended for: https://www.transfermarkt.com/boris-lambert/profil/spieler/585711
Market value history data appended for: http

Market value history data appended for: https://www.transfermarkt.com/alessio-castro-montes/profil/spieler/340166
Market value history data appended for: https://www.transfermarkt.com/tim-kleindienst/profil/spieler/193033
Market value history data appended for: https://www.transfermarkt.com/nurio-fortuna/profil/spieler/290261
Market value history data appended for: https://www.transfermarkt.com/matheo-parmentier/profil/spieler/737533
Market value history data appended for: https://www.transfermarkt.com/igor-plastun/profil/spieler/97335
Market value history data appended for: https://www.transfermarkt.com/dino-arslanagic/profil/spieler/110660
Market value history data appended for: https://www.transfermarkt.com/milad-mohammadi/profil/spieler/333355
Market value history data appended for: https://www.transfermarkt.com/sinan-bolat/profil/spieler/33862
Market value history data appended for: https://www.transfermarkt.com/roman-bezus/profil/spieler/97567
Market value history data appended f

Market value history data appended for: https://www.transfermarkt.com/frederic-frans/profil/spieler/46715
Market value history data appended for: https://www.transfermarkt.com/arne-cassaert/profil/spieler/725003
Market value history data appended for: https://www.transfermarkt.com/andi-koshi/profil/spieler/502268
Market value history data appended for: https://www.transfermarkt.com/warleson/profil/spieler/477460
Market value history data appended for: https://www.transfermarkt.com/aldom-deuro/profil/spieler/648971
Market value history data appended for: https://www.transfermarkt.com/kevin-hoggas/profil/spieler/379737
Market value history data appended for: https://www.transfermarkt.com/charles-vanhoutte/profil/spieler/569845
Market value history data appended for: https://www.transfermarkt.com/merveille-goblet/profil/spieler/193470
Market value history data appended for: https://www.transfermarkt.com/johanna-omolo/profil/spieler/85834
Market value history data appended for: https://www

Market value history data appended for: https://www.transfermarkt.com/dries-wuytens/profil/spieler/83951
Market value history data appended for: https://www.transfermarkt.com/serge-leuko/profil/spieler/184917
Market value history data appended for: https://www.transfermarkt.com/paul-keita/profil/spieler/164735
Market value history data appended for: https://www.transfermarkt.com/nikola-pej%C4%8Di%C4%87/profil/spieler/840907
Market value history data appended for: https://www.transfermarkt.com/leonardo-bertone/profil/spieler/194975
Market value history data appended for: https://www.transfermarkt.com/daan-heymans/profil/spieler/446491
Market value history data appended for: https://www.transfermarkt.com/alessandro-albanese/profil/spieler/407347
Market value history data appended for: https://www.transfermarkt.com/melvin-sitti/profil/spieler/689089
Market value history data appended for: https://www.transfermarkt.com/andrija-vukcevic/profil/spieler/294987
Market value history data append

Market value history data appended for: https://www.transfermarkt.com/steve-rouiller/profil/spieler/88404
Market value history data appended for: https://www.transfermarkt.com/koro-kone/profil/spieler/61005
Market value history data appended for: https://www.transfermarkt.com/noam-baumann/profil/spieler/192605
Market value history data appended for: https://www.transfermarkt.com/ransford-selasi/profil/spieler/292333
Market value history data appended for: https://www.transfermarkt.com/olivier-custodio/profil/spieler/183312
Market value history data appended for: https://www.transfermarkt.com/fabio-daprela/profil/spieler/66109
Market value history data appended for: https://www.transfermarkt.com/mijat-maric/profil/spieler/6666
Market value history data appended for: https://www.transfermarkt.com/mattia-bottani/profil/spieler/86484
Market value history data appended for: https://www.transfermarkt.com/stefano-guidotti/profil/spieler/396405
Market value history data appended for: https://w

Market value history data appended for: https://www.transfermarkt.com/dejan-djokic/profil/spieler/524710
Market value history data appended for: https://www.transfermarkt.com/denis-simani/profil/spieler/135344
Market value history data appended for: https://www.transfermarkt.com/matteo-di-giusto/profil/spieler/421823
Market value history data appended for: https://www.transfermarkt.com/manuel-sutter/profil/spieler/59140
Market value history data appended for: https://www.transfermarkt.com/yannick-schmid/profil/spieler/291896
Market value history data appended for: https://www.transfermarkt.com/justin-ospelt/profil/spieler/405831
Market value history data appended for: https://www.transfermarkt.com/mohamed-coulibaly/profil/spieler/74607
Market value history data appended for: https://www.transfermarkt.com/nico-hug/profil/spieler/274681
Market value history data appended for: https://www.transfermarkt.com/benjamin-buchel/profil/spieler/66708
Market value history data appended for: https:

Market value history data appended for: https://www.transfermarkt.com/jean-ruiz/profil/spieler/345067
Market value history data appended for: https://www.transfermarkt.com/dennis-iapichino/profil/spieler/123711
Market value history data appended for: https://www.transfermarkt.com/nathan-senerine/profil/spieler/488171
Market value history data appended for: https://www.transfermarkt.com/siyar-doldur/profil/spieler/346888
Market value history data appended for: https://www.transfermarkt.com/geoffroy-serey-die/profil/spieler/77708
Market value history data appended for: https://www.transfermarkt.com/aimery-pinga/profil/spieler/347097
Market value history data appended for: https://www.transfermarkt.com/christian-zock/profil/spieler/337017
Market value history data appended for: https://www.transfermarkt.com/baltazar/profil/spieler/585346
Market value history data appended for: https://www.transfermarkt.com/guillaume-hoarau/profil/spieler/23934
Market value history data appended for: https

Market value history data appended for: https://www.transfermarkt.com/betim-fazliji/profil/spieler/410636
Market value history data appended for: https://www.transfermarkt.com/musah-nuhu/profil/spieler/477681
Market value history data appended for: https://www.transfermarkt.com/lawrence-ati-zigi/profil/spieler/254285
Market value history data appended for: https://www.transfermarkt.com/tim-staubli/profil/spieler/346894
Market value history data appended for: https://www.transfermarkt.com/fabio-solimando/profil/spieler/507463
Market value history data appended for: https://www.transfermarkt.com/yannis-letard/profil/spieler/590280
Market value history data appended for: https://www.transfermarkt.com/victor-ruiz/profil/spieler/336857
Market value history data appended for: https://www.transfermarkt.com/thody-elie-youan/profil/spieler/465715
Market value history data appended for: https://www.transfermarkt.com/lukas-watkowiak/profil/spieler/196791
Market value history data appended for: ht

Market value history data appended for: https://www.transfermarkt.com/gabriel-castellon/profil/spieler/189785
Market value history data appended for: https://www.transfermarkt.com/vicente-fernandez/profil/spieler/534263
Market value history data appended for: https://www.transfermarkt.com/felipe-chamorro/profil/spieler/837082
Market value history data appended for: https://www.transfermarkt.com/leandro-benegas/profil/spieler/125899
Market value history data appended for: https://www.transfermarkt.com/fabian-ahumada/profil/spieler/371823
Market value history data appended for: https://www.transfermarkt.com/renato-tarifeno/profil/spieler/460877
Market value history data appended for: https://www.transfermarkt.com/lucas-acevedo/profil/spieler/284306
Market value history data appended for: https://www.transfermarkt.com/nicolas-solabarrieta/profil/spieler/671953
Market value history data appended for: https://www.transfermarkt.com/agustin-farias/profil/spieler/267894
Market value history da

In [None]:
# Display DataFrame
df_value_history.head()

In [None]:
df_value_history.shape

##### Export DataFrame

In [None]:
# Export DataFrame as a CSV file

## Export a copy to the 'archive' subfolder of the TM folder, including the date
df_value_history.to_csv(data_dir_tm + f'/raw/{short_season_string}/market-value-history/archive/' + f'tm_player_th_all_{short_season_string}_last_updated_{today}.csv', index=None, header=True)

## Export another copy to the TM folder called 'latest' (can be overwritten)
df_value_history.to_csv(data_dir_tm + f'/raw/{short_season_string}/market-value-history/' + f'tm_player_th_all_{short_season_string}_latest.csv', index=None, header=True)

## <a id='#section4'>4. Summary</a>
This notebook scrapes data from [TransferMarkt](https://www.transfermarkt.co.uk/) using [Beautifulsoup](https://pypi.org/project/beautifulsoup4/) and the [Tyrone Mings web scraper](https://github.com/FCrSTATS/tyrone_mings) by [FCrSTATS](https://twitter.com/FC_rstats). This landed data is then manipulated as DataFrames using [pandas](http://pandas.pydata.org/).

This data includes Bio and Status data, as well as Transfer History, Performance History and Market Value History data.

## <a id='#section5'>5. Next Steps</a>
The next step is to take this data and engineer it so that it's ready for analysis and to be matched against other data source.

## <a id='#section6'>6. References</a>

#### Data and Web Scraping
*    [tyrone_mings GitHub repository](https://github.com/FCrSTATS/tyrone_mings) by [FCrSTATS](https://github.com/FCrSTATS)
*    [Python Package Index (PyPI) tyrone-mings library](https://pypi.org/project/tyrone-mings/)
*    [Beyond crowd judgments: Data-driven estimation of market value in association football](https://www.sciencedirect.com/science/article/pii/S0377221717304332) by Oliver Müllera, Alexander Simons, and Markus Weinmann.
*    [06/04/2020: BBC - Premier League squads 'drop £1.6bn in value'](https://www.bbc.co.uk/sport/football/52221463).

---

***Visit my website [EddWebster.com](https://www.eddwebster.com) or my [GitHub Repository](https://github.com/eddwebster) for more projects. If you'd like to get in contact, my Twitter handle is [@eddwebster](http://www.twitter.com/eddwebster) and my email is: edd.j.webster@gmail.com.***

[Back to the top](#top)