# **Getting data about the 5 players**

Having identified the 5 players, let's scrape additional data about them from the *Transfermarkt* site to support a more complete evaluation.

## **Importing dependencies**

In [None]:
import requests
from bs4 import BeautifulSoup
import pandas as pd

## **Scraping**

We store the URLs of the *Transfermarkt* profile pages of the players in question.

In [None]:
urls = ['https://www.transfermarkt.it/jarrad-branthwaite/profil/spieler/661053',
        'https://www.transfermarkt.it/gabriel-magalhaes/profil/spieler/435338',
        'https://www.transfermarkt.it/federico-gatti/profil/spieler/509022',
        'https://www.transfermarkt.it/matthias-ginter/profil/spieler/124502',
        'https://www.transfermarkt.it/pawel-bochniewicz/profil/spieler/248395']

In [None]:
headers = {
    'User-Agent' : 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36'
}

Let's explore the structure of the responses.

In [None]:
response1 = requests.get(urls[0], headers = headers)

In [None]:
soup1 = BeautifulSoup(response1.content, 'html.parser')

In [None]:
soup1.select_one('div[class="info-table info-table--right-space"]').text.strip(' ').split('\n')

['',
 'Nome completo:',
 'Jarrad Paul Branthwaite',
 'Nato il:',
 '',
 '27/giu/2002 (21)',
 '',
 'Luogo di nascita:',
 '',
 'Carlisle\xa0\xa0 ',
 'Altezza:',
 '1,95\xa0m',
 'Nazionalità:',
 '',
 '\xa0\xa0Inghilterra                    ',
 'Posizione:',
 '',
 '                    Difesa - Difensore centrale                ',
 'Piede:',
 'sinistro',
 '',
 '                    Squadra attuale:',
 '                ',
 '',
 '',
 '',
 '',
 'FC Everton ',
 'In rosa da:',
 '',
 '                            13/gen/2020                        ',
 'Scadenza:',
 '30/giu/2027',
 'Ultimo prolungamento:',
 '06/ott/2023',
 'Social Media:',
 '',
 '',
 '',
 ' ',
 '',
 ' ',
 '',
 '',
 '']
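The raw list above mixes labels, values, empty strings and non-breaking spaces (`\xa0`). A minimal sketch of a cleaning step follows; `clean_tokens` is a hypothetical helper that is not part of the notebook, and the sample is copied from the output above.

```python
def clean_tokens(tokens):
    """Normalise whitespace and drop empty entries from a scraped token list."""
    cleaned = []
    for token in tokens:
        # Replace non-breaking spaces, then trim surrounding whitespace.
        text = token.replace('\xa0', ' ').strip()
        if text:
            cleaned.append(text)
    return cleaned


# Sample entries copied from the output above.
sample = ['', 'Nome completo:', 'Jarrad Paul Branthwaite',
          'Altezza:', '1,95\xa0m',
          'Nazionalità:', '', '\xa0\xa0Inghilterra                    ']
cleaned_sample = clean_tokens(sample)
print(cleaned_sample)
```

After cleaning, each label (an entry ending in `:`) is directly followed by its value, which makes the list easier to inspect.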

In [None]:
response2 = requests.get('https://www.transfermarkt.it/jarrad-branthwaite/verletzungen/spieler/661053', headers = headers)

In [None]:
soup2 = BeautifulSoup(response2.content, 'html.parser')

In [None]:
soup2.select('table[class="items"]')[1].text.strip(' ').split('\n')

['',
 '',
 '',
 'StagionegiorniInfortuniPartite perse',
 '',
 '',
 '',
 '22/2345 giorni49',
 '',
 '21/2232 giorni10',
 '',
 '20/2146 giorni112',
 '',
 '']
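In the flattened rows above the season and the number of days can still be separated, because the season has the form `yy/yy` and the days are followed by the word `giorni`. The sketch below extracts only those two fields; the last two columns (injuries and matches missed) are fused into a single number and are left alone. `days_out_per_season` is a hypothetical helper, and the pattern assumes every row follows the shape shown above.

```python
import re

# Season (yy/yy) immediately followed by the day count and the word "giorni".
SEASON_ROW = re.compile(r'^(\d{2}/\d{2})(\d+) giorni')


def days_out_per_season(rows):
    """Map each season to the number of days out found in the flattened rows."""
    days = {}
    for row in rows:
        match = SEASON_ROW.match(row.strip())
        if match:
            season, n_days = match.groups()
            days[season] = int(n_days)
    return days


# Rows copied from the output above.
rows = ['', '', '', 'StagionegiorniInfortuniPartite perse', '', '', '',
        '22/2345 giorni49', '', '21/2232 giorni10', '', '20/2146 giorni112', '', '']
per_season = days_out_per_season(rows)
total_days = sum(per_season.values())
print(per_season, total_days)
```

For this player the seasons sum to 45 + 32 + 46 = 123 days.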

Let's create the dataframe.

In [None]:
players = []
for url in urls:
  response = requests.get(url, headers = headers)
  soup = BeautifulSoup(response.content, 'html.parser')
  info = soup.select_one('div[class="info-table info-table--right-space"]').text.strip(' ').split('\n')
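  # Fixed positions assume that every profile lists the same fields in the same order;
  # a missing or extra field shifts the values into the wrong columns.
  player_data = {
      'Name' : info[2],
      'Born (Age)' : info[5],
      'Birthplace' : info[9],
      'Height' : info[11],
      'Nationality' : info[14],
      'Position' : info[17],
      'Foot' : info[19],
      'Prosecutor' : info[22],  # the player's agent ("Procuratore:" on the page)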
      'Actual team' : info[31],
      'Expiration of contract' : info[36],
      'Last extension of contract' : info[38]
  }
  players.append(player_data)
df = pd.DataFrame(players)

In [None]:
df

|  | Name | Born (Age) | Birthplace | Height | Nationality | Position | Foot | Prosecutor | Actual team | Expiration of contract | Last extension of contract |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jarrad Paul Branthwaite | 27/giu/2002 (21) | Carlisle | 1,95 m | Inghilterra | Difesa - Difensore central... | sinistro |  | Scadenza: |  |  |
| 1 | Gabriel dos Santos Magalhães | 19/dic/1997 (26) | São Paulo | 1,90 m | Brasile | Difesa - Difensore central... | sinistro | Bertolucci Sports | FC Arsenal | 30/giu/2027 | 21/ott/2022 |
| 2 |  | Luogo di nascita: | 1,90 m |  |  | destro |  | Squadra attuale: | 31/gen/2022 ... | Fornitore: | Social Media: |
| 3 | Matthias Lukas Ginter | 19/gen/1994 (30) | Freiburg im Breisgau | 1,89 m | Germania | Difesa - Difensore central... | destro | Unique Sports Group | SC Friburgo | 30/giu/2027 |  |
| 4 | Paweł Piotr Bochniewicz | 30/gen/1996 (28) | Dębica | 1,94 m | Polonia | Difesa - Difensore central... | sinistro | Wasserman | SC Heerenveen | 30/giu/2026 | 20/feb/2023 |
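Rows 0 and 2 are misaligned because the values were read from fixed positions, while the profiles do not all list the same fields. One possible alternative, sketched here under assumptions, is to pair each label (an entry ending in `:`) with the first non-empty entry that follows it. `COLUMNS` and `parse_info` are hypothetical names that are not part of the notebook, and the sample is an abbreviated copy of the first player's list, which has no `Procuratore:` entry.

```python
# Page label -> dataframe column (column names as used in this notebook).
COLUMNS = {
    'Nome completo': 'Name',
    'Nato il': 'Born (Age)',
    'Luogo di nascita': 'Birthplace',
    'Altezza': 'Height',
    'Nazionalità': 'Nationality',
    'Posizione': 'Position',
    'Piede': 'Foot',
    'Procuratore': 'Prosecutor',
    'Squadra attuale': 'Actual team',
    'Scadenza': 'Expiration of contract',
    'Ultimo prolungamento': 'Last extension of contract',
}


def parse_info(tokens):
    """Build one record by matching labels to values instead of positions."""
    record = dict.fromkeys(COLUMNS.values(), '')
    label = None
    for token in tokens:
        text = token.replace('\xa0', ' ').strip()
        if not text:
            continue
        if text.endswith(':'):
            label = text[:-1]          # remember the most recent label
        elif label in COLUMNS:
            record[COLUMNS[label]] = text
            label = None               # keep only the first value per label
    return record


# Abbreviated copy of the first player's list (no 'Procuratore:' entry).
tokens = ['', 'Nome completo:', 'Jarrad Paul Branthwaite', 'Nato il:', '',
          '27/giu/2002 (21)', 'Piede:', 'sinistro', '',
          '                    Squadra attuale:', '                ', '',
          'FC Everton ', 'In rosa da:', '', '   13/gen/2020   ',
          'Scadenza:', '30/giu/2027', 'Ultimo prolungamento:', '06/ott/2023',
          'Social Media:', '', ' ']
record = parse_info(tokens)
print(record)
```

With this approach a missing field yields an empty string in its own column rather than shifting every later value.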


## **Cleaning and completing the dataframe manually**

In [None]:
response = requests.get(urls[2], headers = headers)
soup = BeautifulSoup(response.content, 'html.parser')
info = soup.select_one('div[class="info-table info-table--right-space"]').text.strip(' ').split('\n')

In [None]:
info

['',
 'Nato il:',
 '',
 '24/giu/1998 (25)',
 '',
 'Luogo di nascita:',
 '',
 'Rivoli\xa0\xa0 ',
 'Altezza:',
 '1,90\xa0m',
 'Nazionalità:',
 '',
 '\xa0\xa0Italia                    ',
 'Posizione:',
 '',
 '                    Difesa - Difensore centrale                ',
 'Piede:',
 'destro',
 'Procuratore:',
 '',
 "One Players' Agent ",
 '',
 '                    Squadra attuale:',
 '                ',
 '',
 '',
 '',
 '',
 'Juventus FC ',
 'In rosa da:',
 '',
 '                            31/gen/2022                        ',
 'Scadenza:',
 '30/giu/2028',
 'Ultimo prolungamento:',
 '25/ott/2023',
 'Fornitore:',
 'adidas',
 'Social Media:',
 '',
 '',
 '',
 ' ',
 '',
 '',
 '']

In [None]:
player_data = {
      'Name' : 'Federico Gatti',
      'Born (Age)' : info[3],
      'Birthplace' : info[7],
      'Height' : info[9],
      'Nationality' : info[12],
      'Position' : info[15],
      'Foot' : info[17],
      'Prosecutor' : info[20],
      'Actual team' : info[28],
      'Expiration of contract' : info[33],
      'Last extension of contract' : info[35]
  }

In [None]:
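# Drop the misaligned row (index 2), then add the corrected record.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df = df.drop(index = 2)
df = pd.concat([df, pd.DataFrame([player_data])], ignore_index = True)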



In [None]:
df

|  | Name | Born (Age) | Birthplace | Height | Nationality | Position | Foot | Prosecutor | Actual team | Expiration of contract | Last extension of contract |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jarrad Paul Branthwaite | 27/giu/2002 (21) | Carlisle | 1,95 m | Inghilterra | Difesa - Difensore central... | sinistro |  | Scadenza: |  |  |
| 1 | Gabriel dos Santos Magalhães | 19/dic/1997 (26) | São Paulo | 1,90 m | Brasile | Difesa - Difensore central... | sinistro | Bertolucci Sports | FC Arsenal | 30/giu/2027 | 21/ott/2022 |
| 2 | Matthias Lukas Ginter | 19/gen/1994 (30) | Freiburg im Breisgau | 1,89 m | Germania | Difesa - Difensore central... | destro | Unique Sports Group | SC Friburgo | 30/giu/2027 |  |
| 3 | Paweł Piotr Bochniewicz | 30/gen/1996 (28) | Dębica | 1,94 m | Polonia | Difesa - Difensore central... | sinistro | Wasserman | SC Heerenveen | 30/giu/2026 | 20/feb/2023 |
| 4 | Federico Gatti | 24/giu/1998 (25) | Rivoli | 1,90 m | Italia | Difesa - Difensore central... | destro | One Players' Agent | Juventus FC | 30/giu/2028 | 25/ott/2023 |


In [None]:
df.loc[0, 'Expiration of contract'] = '30/giu/2027'

In [None]:
df.loc[0, 'Actual team'] = 'FC Everton'

In [None]:
df.loc[0, 'Last extension of contract'] = '06/ott/2023'

In [None]:
# The last extension for row 2 is left empty, as the table below shows.
df

|  | Name | Born (Age) | Birthplace | Height | Nationality | Position | Foot | Prosecutor | Actual team | Expiration of contract | Last extension of contract |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jarrad Paul Branthwaite | 27/giu/2002 (21) | Carlisle | 1,95 m | Inghilterra | Difesa - Difensore central... | sinistro |  | FC Everton | 30/giu/2027 | 06/ott/2023 |
| 1 | Gabriel dos Santos Magalhães | 19/dic/1997 (26) | São Paulo | 1,90 m | Brasile | Difesa - Difensore central... | sinistro | Bertolucci Sports | FC Arsenal | 30/giu/2027 | 21/ott/2022 |
| 2 | Matthias Lukas Ginter | 19/gen/1994 (30) | Freiburg im Breisgau | 1,89 m | Germania | Difesa - Difensore central... | destro | Unique Sports Group | SC Friburgo | 30/giu/2027 |  |
| 3 | Paweł Piotr Bochniewicz | 30/gen/1996 (28) | Dębica | 1,94 m | Polonia | Difesa - Difensore central... | sinistro | Wasserman | SC Heerenveen | 30/giu/2026 | 20/feb/2023 |
| 4 | Federico Gatti | 24/giu/1998 (25) | Rivoli | 1,90 m | Italia | Difesa - Difensore central... | destro | One Players' Agent | Juventus FC | 30/giu/2028 | 25/ott/2023 |


In [None]:
# Market values in millions, entered manually in the same order as the dataframe rows.
market_value_mln = [25, 60, 15, 3.5, 18]

In [None]:
df['Valore (mln)'] = market_value_mln

In [None]:
df

|  | Name | Born (Age) | Birthplace | Height | Nationality | Position | Foot | Prosecutor | Actual team | Expiration of contract | Last extension of contract | Valore (mln) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jarrad Paul Branthwaite | 27/giu/2002 (21) | Carlisle | 1,95 m | Inghilterra | Difesa - Difensore central... | sinistro |  | FC Everton | 30/giu/2027 | 06/ott/2023 | 25.0 |
| 1 | Gabriel dos Santos Magalhães | 19/dic/1997 (26) | São Paulo | 1,90 m | Brasile | Difesa - Difensore central... | sinistro | Bertolucci Sports | FC Arsenal | 30/giu/2027 | 21/ott/2022 | 60.0 |
| 2 | Matthias Lukas Ginter | 19/gen/1994 (30) | Freiburg im Breisgau | 1,89 m | Germania | Difesa - Difensore central... | destro | Unique Sports Group | SC Friburgo | 30/giu/2027 |  | 15.0 |
| 3 | Paweł Piotr Bochniewicz | 30/gen/1996 (28) | Dębica | 1,94 m | Polonia | Difesa - Difensore central... | sinistro | Wasserman | SC Heerenveen | 30/giu/2026 | 20/feb/2023 | 3.5 |
| 4 | Federico Gatti | 24/giu/1998 (25) | Rivoli | 1,90 m | Italia | Difesa - Difensore central... | destro | One Players' Agent | Juventus FC | 30/giu/2028 | 25/ott/2023 | 18.0 |


In [None]:
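# Days out per player, summed by hand from each injury table
# (same order as the dataframe rows).
days_out_injury = [45 + 32 + 46, 50 + 48, 17 + 28, 22 + 334 + 30, 12]
df['Days out due to injury in last 4 season'] = days_out_injury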

In [None]:
df

|  | Name | Born (Age) | Birthplace | Height | Nationality | Position | Foot | Prosecutor | Actual team | Expiration of contract | Last extension of contract | Valore (mln) | Days out due to injury in last 4 season |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | Jarrad Paul Branthwaite | 27/giu/2002 (21) | Carlisle | 1,95 m | Inghilterra | Difesa - Difensore central... | sinistro |  | FC Everton | 30/giu/2027 | 06/ott/2023 | 25.0 | 123 |
| 1 | Gabriel dos Santos Magalhães | 19/dic/1997 (26) | São Paulo | 1,90 m | Brasile | Difesa - Difensore central... | sinistro | Bertolucci Sports | FC Arsenal | 30/giu/2027 | 21/ott/2022 | 60.0 | 98 |
| 2 | Matthias Lukas Ginter | 19/gen/1994 (30) | Freiburg im Breisgau | 1,89 m | Germania | Difesa - Difensore central... | destro | Unique Sports Group | SC Friburgo | 30/giu/2027 |  | 15.0 | 45 |
| 3 | Paweł Piotr Bochniewicz | 30/gen/1996 (28) | Dębica | 1,94 m | Polonia | Difesa - Difensore central... | sinistro | Wasserman | SC Heerenveen | 30/giu/2026 | 20/feb/2023 | 3.5 | 386 |
| 4 | Federico Gatti | 24/giu/1998 (25) | Rivoli | 1,90 m | Italia | Difesa - Difensore central... | destro | One Players' Agent | Juventus FC | 30/giu/2028 | 25/ott/2023 | 18.0 | 12 |
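The height and date columns are still strings in the Italian page format (`1,95 m`, `30/giu/2027`). If numeric or date values are needed later, a conversion along these lines could be applied. This is only a sketch: the helper names are hypothetical, and the month table assumes the site uses the standard three-letter Italian abbreviations, of which only `gen`, `feb`, `giu`, `ott` and `dic` appear in the data above.

```python
from datetime import date

# Standard Italian month abbreviations (assumed to match the site's format).
IT_MONTHS = {'gen': 1, 'feb': 2, 'mar': 3, 'apr': 4, 'mag': 5, 'giu': 6,
             'lug': 7, 'ago': 8, 'set': 9, 'ott': 10, 'nov': 11, 'dic': 12}


def parse_it_date(text):
    """Convert 'dd/mmm/yyyy' with an Italian month into a date; None if empty."""
    text = text.strip()
    if not text:
        return None
    day, month, year = text.split('/')
    return date(int(year), IT_MONTHS[month.lower()], int(day))


def parse_born(text):
    """Extract the birth date from a 'dd/mmm/yyyy (age)' string."""
    return parse_it_date(text.split('(')[0])


def parse_height_m(text):
    """Convert a height such as '1,95 m' into a float in metres."""
    number = text.replace('\xa0', ' ').split()[0]
    return float(number.replace(',', '.'))


expiry = parse_it_date('30/giu/2027')
born = parse_born('27/giu/2002 (21)')
height = parse_height_m('1,95\xa0m')
print(expiry, born, height)
```

With pandas, each helper could be applied column-wise, for example through `Series.map`.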


## **Saving the dataframe**

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
path = '/content/drive/MyDrive/DC_Inter/'
# Passing the path directly lets pandas open and close the file itself.
df.to_csv(path + '5difenders_data.csv')