# Web-Scraping Soccer Data
### Getting transfers

In this section, we will use [Soccernews](https://www.soccernews.com/soccer-transfers/) to gather every transfer listed for our 4 leagues and build a dataframe for use in our analysis.

In [1]:
# Import packages
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import re

In [2]:
# create our function to fetch our data
def fetch_data(url, lg, yr, df, cols):
    
    # get all our table rows from the passed url
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')
    rows = soup.find('tbody').find_all('tr')    
    
    # for each row after the first, get the desired HTML tags and their text
    # (which is miscellaneous information about each trade)
    records = []
    for row in rows[1:]:
        date = row.find(class_ = 'date').text
        name = row.find(class_ = 'player-deals').find('h4').text
        pos = row.find(class_ = 'player-deals').find('p').text
        country = row.find('img')['title']
        price = row.find(class_ = 'price-status').text
        # the cells without a class attribute hold the "from" and "to" clubs
        clubs = row.find_all('td', attrs = {'class': None})
        fromtrade = clubs[0].text
        totrade = clubs[1].text
        records.append([date, name, pos, country, price, fromtrade, totrade, lg, yr])
    
    # DataFrame.append was removed in pandas 2.0, so add all the rows with one concat
    temp = pd.DataFrame(data = records, columns = cols)
    return pd.concat([df, temp], ignore_index = True)
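
To make the parsing step easier to follow, here is a minimal offline sketch of the same lookups run against a small, hand-written table. The markup in `SAMPLE_HTML` is an assumption about the shape `fetch_data` expects (it is not copied from Soccernews), and the player, clubs, and country are made up. The cells without a class are picked out with `has_attr`, a different spelling of the `attrs = {'class': None}` filter used above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumed table shape, not copied from Soccernews.
SAMPLE_HTML = """
<table><tbody>
  <tr><th>Date</th><th>Player</th><th>Price</th><th>From</th><th>To</th></tr>
  <tr>
    <td class="date">01.07.2017</td>
    <td class="player-deals">
      <img src="flag.png" title="Exampleland">
      <h4>Example Player</h4>
      <p>Midfielder</p>
    </td>
    <td class="price-status">Free</td>
    <td>Old Club</td>
    <td>New Club</td>
  </tr>
</tbody></table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
rows = soup.find('tbody').find_all('tr')

records = []
for row in rows[1:]:  # the first row is a header in this sample
    deal = row.find(class_='player-deals')
    # cells with no class attribute: the "from" and "to" clubs
    clubs = [td for td in row.find_all('td') if not td.has_attr('class')]
    records.append({
        'Date': row.find(class_='date').text,
        'Player': deal.find('h4').text,
        'Position': deal.find('p').text,
        'Nationality': row.find('img')['title'],
        'Price': row.find(class_='price-status').text,
        'From': clubs[0].text,
        'To': clubs[1].text,
    })

print(records[0])
```

If the real page changes its class names or drops the `tbody`, one of these `find` calls returns `None` and the next attribute access raises, which is why a whole league/season page can fail at once.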

In [3]:
# define our empty data frame
cols = ['Date', 'Player', 'Position', 'Nationality', 'Price', 'From', 'To', 'League', 'Year']
df = pd.DataFrame(columns = cols)

In [4]:
# define the league URLs, the season suffixes, and the league and year labels
base_url = 'https://www.soccernews.com/soccer-transfers/'
league_url = ['english-premier-league-transfers', 
              'italian-serie-a-transfers',
              'spanish-la-liga-transfers',
              'german-bundesliga-transfers']
league_url = [base_url + i for i in league_url]
yr_append = ['-2017-2018', '-2016-2017', '-2015-2016', '-2014-2015','-2013-2014', 
             '-2012-2013', '-2011-2012', '-2010-2011', '-2009-2010','-2008-2009', 
             '-2007-2008']
league = ['EPL', 'Serie A', 'La Liga', 'Bundesliga']
year = [str(i) for i in range(2018, 2008 - 1, -1)]

In [5]:
# build one URL per league and season, with year and league arrays in the same order
urls = [i + j for i in league_url for j in yr_append]
year = year * len(league)
league = np.repeat(league, len(yr_append))
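
The three arrays above only line up because the URLs are built league by league, the year list is tiled, and the league list is repeated. As a sketch of an alternative, the same 44 combinations can be generated as (url, league, year) tuples in one pass, so the labels cannot drift out of step with the URLs. The names `league_slugs`, `end_years`, and `jobs` are made up for this illustration.

```python
from itertools import product

base_url = 'https://www.soccernews.com/soccer-transfers/'

# league label -> URL slug (same values as the lists above)
league_slugs = {
    'EPL': 'english-premier-league-transfers',
    'Serie A': 'italian-serie-a-transfers',
    'La Liga': 'spanish-la-liga-transfers',
    'Bundesliga': 'german-bundesliga-transfers',
}
end_years = range(2018, 2008 - 1, -1)  # 2018 down to 2008

# product() varies the last iterable fastest: every season for EPL, then Serie A, ...
jobs = [
    (f'{base_url}{slug}-{y - 1}-{y}', lg, str(y))
    for (lg, slug), y in product(league_slugs.items(), end_years)
]

print(len(jobs))
print(jobs[0])
```

Each tuple can then be unpacked in the fetch loop (`for url, lg, yr in jobs:`) instead of indexing three parallel arrays.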

In [6]:
# fetch all our data and print out each league + year when done
for i in range(len(urls)):
    try:
        df = fetch_data(url = urls[i],
                        lg = league[i], 
                        yr = year[i], 
                        df = df, 
                        cols = cols)
        print('completed:', league[i], year[i])
    except Exception:
        # skip any league/season page that fails to download or parse (nothing is printed)
        pass
print()
print('complete')

completed: EPL 2018
completed: EPL 2017
completed: EPL 2016
completed: EPL 2015
completed: EPL 2014
completed: EPL 2013
completed: EPL 2012
completed: EPL 2011
completed: EPL 2010
completed: EPL 2009
completed: EPL 2008
completed: Serie A 2018
completed: Serie A 2017
completed: Serie A 2016
completed: Serie A 2015
completed: Serie A 2014
completed: Serie A 2013
completed: Serie A 2012
completed: Serie A 2011
completed: Serie A 2010
completed: Serie A 2009
completed: Serie A 2008
completed: La Liga 2018
completed: La Liga 2017
completed: La Liga 2016
completed: La Liga 2015
completed: La Liga 2014
completed: La Liga 2013
completed: La Liga 2012
completed: La Liga 2011
completed: La Liga 2010
completed: La Liga 2009
completed: La Liga 2008
completed: Bundesliga 2018
completed: Bundesliga 2017
completed: Bundesliga 2016
completed: Bundesliga 2015
completed: Bundesliga 2014
completed: Bundesliga 2013

complete


__Nearly there!__

The log shows every league and year completing except Bundesliga 2008 through 2012, which raised an error and were skipped without a message. Now let's combine this data with the stadium and results data pulled in the Transfer Analysis workbook to look at our data in a bit more detail.
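
Because the fetch loop passes over failing pages silently, a page that errors leaves no trace in the output. Here is a minimal sketch of recording failures instead, so they can be inspected afterwards. `run_jobs` and `fake_fetch` are hypothetical helpers written for this example, and the URLs and error message are made up. In the notebook, the real fetcher would take the place of `fake_fetch`.

```python
def run_jobs(jobs, fetch):
    """Call fetch(url) for each (url, league, year) job; return (results, failures)."""
    results, failures = [], []
    for url, lg, yr in jobs:
        try:
            results.append((lg, yr, fetch(url)))
        except Exception as exc:  # keep going, but remember what went wrong
            failures.append((lg, yr, repr(exc)))
    return results, failures


# Hypothetical stand-in for a real fetcher: it fails for one made-up URL.
def fake_fetch(url):
    if url.endswith('missing'):
        raise AttributeError("'NoneType' object has no attribute 'find_all'")
    return ['row']


jobs = [
    ('https://example.com/ok', 'EPL', '2018'),
    ('https://example.com/missing', 'Bundesliga', '2012'),
]
results, failures = run_jobs(jobs, fake_fetch)

for lg, yr, err in failures:
    print('failed:', lg, yr, err)
```

Catching `Exception` rather than using a bare `except:` also lets a keyboard interrupt stop a long scrape.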