# Financial data about players and club (Everton FC)

We are going to scrape Transfermarkt for player market values, how much the team spent on new signings, and how much it received for players sold.

Market value website: https://www.transfermarkt.co.uk/transfers/transferrekorde/statistik?saison_id=2017&land_id=0&ausrichtung=&spielerposition_id=&altersklasse=&leihe=&w_s=&plus=1

Transfer info website: https://www.transfermarkt.com/premier-league/transfers/wettbewerb/GB1/saison_id/2017 

In [151]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

In [152]:
# Collect each player's name, position, club left, club joined, fee and market value
player_names = []
player_positions = []
player_left_clubs = []
player_joined_clubs = []
player_fees = []
player_values = []

headers = {'User-Agent': 
           'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36'}

base_url = "https://www.transfermarkt.co.uk/transfers/transferrekorde/statistik?saison_id=2017&land_id=0&ausrichtung=&spielerposition_id=&altersklasse=&leihe=&w_s=&plus=1&page="

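# Loop through the first 10 result pages
for page_number in range(1, 11):
    page = base_url + str(page_number)
    pageTree = requests.get(page, headers=headers, timeout=30)
    pageTree.raise_for_status()  # stop on an HTTP error instead of parsing an error page
    pageSoup = BeautifulSoup(pageTree.content, 'html.parser')
    # Each transfer is a table row with class "odd" or "even"
    player_rows = pageSoup.find_all("tr", {"class": ["odd", "even"]})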

    for player_row in player_rows:
        player_name = player_row.select("a")[0].text
        player_position = player_row.select("td")[4].text
        player_left_club = player_row.select("td")[11].text
        player_joined_club = player_row.select("td")[15].text
        player_fee = player_row.select("td")[17].text
        player_value = player_row.select("td")[6].text

        player_names.append(player_name)
        player_positions.append(player_position)
        player_left_clubs.append(player_left_club)
        player_joined_clubs.append(player_joined_club)
        player_fees.append(player_fee)
        player_values.append(player_value)

df = pd.DataFrame({
    "Player Name": player_names,
    "Player Position": player_positions,
    "Left Club": player_left_clubs,
    "Joined Club": player_joined_clubs,
    "Market Value": player_values,
    "Fee": player_fees
})

df

| | Player Name | Player Position | Left Club | Joined Club | Market Value | Fee |
|---|---|---|---|---|---|---|
| 0 | Neymar | Left Winger | \nBarcelona | \nParis SG | €100.00m | €222.00m |
| 1 | Philippe Coutinho | Attacking Midfield | \nLiverpool | \nBarcelona | €90.00m | €135.00m |
| 2 | Ousmane Dembélé | Right Winger | \nBor. Dortmund | \nBarcelona | €33.00m | €135.00m |
| 3 | Romelu Lukaku | Centre-Forward | \nEverton | \nMan Utd | €50.00m | €84.70m |
| 4 | Virgil van Dijk | Centre-Back | \nSouthampton | \nLiverpool | €30.00m | €84.65m |
| ... | ... | ... | ... | ... | ... | ... |
| 245 | Avilés Hurtado | Right Winger | \nClub Tijuana | \nMonterrey | €2.50m | €7.13m |
| 246 | Éver Banega | Central Midfield | \nInter | \nSevilla FC | €16.00m | €7.00m |
| 247 | Nolito | Left Winger | \nMan City | \nSevilla FC | €12.00m | €7.00m |
| 248 | Ryad Boudebouz | Attacking Midfield | \nMontpellier | \nReal Betis | €10.00m | €7.00m |
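
The scraping loop above builds each page URL by appending the page number to one long query string. As a minimal sketch, the same URLs could be assembled from a dictionary of parameters with the standard library, which makes the season easier to change. The helper name `build_page_url` and its `season` argument are illustrative and not part of the original notebook; the parameter names and values are copied from `base_url`.

```python
from urllib.parse import urlencode

# Path of the transfer-records page, taken from base_url in the notebook
RECORDS_URL = "https://www.transfermarkt.co.uk/transfers/transferrekorde/statistik"


def build_page_url(page_number, season=2017):
    """Hypothetical helper: return the records URL for one results page."""
    params = {
        "saison_id": season,
        "land_id": 0,
        "ausrichtung": "",
        "spielerposition_id": "",
        "altersklasse": "",
        "leihe": "",
        "w_s": "",
        "plus": 1,
        "page": page_number,
    }
    # urlencode keeps the dictionary order and writes empty values as "key="
    return RECORDS_URL + "?" + urlencode(params)


# Same ten pages as the loop in the notebook
page_urls = [build_page_url(n) for n in range(1, 11)]
print(page_urls[0])
```

For the 2017 season this should produce the same strings as `base_url + str(page_number)`; whether other `saison_id` values behave the same way on the site is an assumption to check.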


In [153]:
# Clean up the data: strip the newlines from the club names
df["Left Club"] = df["Left Club"].str.replace("\n", "")
df["Joined Club"] = df["Joined Club"].str.replace("\n", "")
# Drop the euro sign and the "m" (millions) suffix so the values can be parsed
df['Fee'] = df['Fee'].str.replace('€', '').str.replace('m', '')
df['Market Value'] = df['Market Value'].str.replace('€', '').str.replace('m', '')

# Convert to floats (millions of euros); anything that is not a number becomes NaN
df['Market Value'] = pd.to_numeric(df['Market Value'], errors='coerce')
df['Fee'] = pd.to_numeric(df['Fee'], errors='coerce')

df.info()
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 250 entries, 0 to 249
Data columns (total 6 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Player Name      250 non-null    object 
 1   Player Position  250 non-null    object 
 2   Left Club        250 non-null    object 
 3   Joined Club      250 non-null    object 
 4   Market Value     238 non-null    float64
 5   Fee              246 non-null    float64
dtypes: float64(2), object(4)
memory usage: 11.8+ KB
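
The non-null counts above (238 of 250 market values, 246 of 250 fees) show that `errors='coerce'` turned some strings into NaN. The notebook does not show which strings those were. One possible cause, stated here purely as an assumption, is amounts below one million written with a different suffix. The sketch below is a hypothetical `parse_money` helper that converts a raw string to millions of euros; it assumes such amounts use a `k` suffix, which should be checked against the scraped values before relying on it.

```python
import math
import re

# Matches strings such as "€222.00m" or "€500k" (the "k" form is an assumption)
_MONEY = re.compile(r"^€\s*([0-9]+(?:\.[0-9]+)?)\s*([mk])$")


def parse_money(text):
    """Hypothetical helper: return the amount in millions of euros, or NaN."""
    if not isinstance(text, str):
        return math.nan
    match = _MONEY.match(text.strip())
    if match is None:
        # Anything that is not a recognised euro amount stays missing
        return math.nan
    amount = float(match.group(1))
    # "m" is already millions; "k" is thousands, so scale it down
    return amount if match.group(2) == "m" else amount / 1000.0


print(parse_money("€84.70m"))
print(parse_money("no fee listed"))
```

Applied to the raw columns (before the `str.replace` calls), this would look like `df["Fee"].map(parse_money)`.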


| | Player Name | Player Position | Left Club | Joined Club | Market Value | Fee |
|---|---|---|---|---|---|---|
| 0 | Neymar | Left Winger | Barcelona | Paris SG | 100.0 | 222.00 |
| 1 | Philippe Coutinho | Attacking Midfield | Liverpool | Barcelona | 90.0 | 135.00 |
| 2 | Ousmane Dembélé | Right Winger | Bor. Dortmund | Barcelona | 33.0 | 135.00 |
| 3 | Romelu Lukaku | Centre-Forward | Everton | Man Utd | 50.0 | 84.70 |
| 4 | Virgil van Dijk | Centre-Back | Southampton | Liverpool | 30.0 | 84.65 |
| ... | ... | ... | ... | ... | ... | ... |
| 245 | Avilés Hurtado | Right Winger | Club Tijuana | Monterrey | 2.5 | 7.13 |
| 246 | Éver Banega | Central Midfield | Inter | Sevilla FC | 16.0 | 7.00 |
| 247 | Nolito | Left Winger | Man City | Sevilla FC | 12.0 | 7.00 |
| 248 | Ryad Boudebouz | Attacking Midfield | Montpellier | Real Betis | 10.0 | 7.00 |
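
With the fees now numeric, the club totals mentioned at the start (money spent on signings and received for sales) could be computed along these lines. This is a sketch only: `club_transfer_summary` is a hypothetical helper, and the small `sample` frame reuses three rows from the table above so the block runs on its own. In the notebook, the full `df` would be passed instead.

```python
import pandas as pd

# Three rows copied from the cleaned table; fees are in millions of euros
sample = pd.DataFrame({
    "Player Name": ["Neymar", "Romelu Lukaku", "Virgil van Dijk"],
    "Left Club": ["Barcelona", "Everton", "Southampton"],
    "Joined Club": ["Paris SG", "Man Utd", "Liverpool"],
    "Fee": [222.00, 84.70, 84.65],
})


def club_transfer_summary(frame, club):
    """Hypothetical helper: total fees (in €m) a club paid and received."""
    # Fees for players who joined the club count as spending
    spent = frame.loc[frame["Joined Club"] == club, "Fee"].sum()
    # Fees for players who left the club count as income
    received = frame.loc[frame["Left Club"] == club, "Fee"].sum()
    return {
        "spent": float(spent),
        "received": float(received),
        "net": float(received - spent),
    }


print(club_transfer_summary(sample, "Everton"))
```

Because the scraped table holds only the 250 largest transfers of the season, totals computed this way cover just the deals that made that list, and `sum()` skips fees that were coerced to NaN.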
