# Web-Scraping Soccer Data
### Getting data from English, Italian, German, and Spanish Soccer leagues

In this section, we use [Sky Sports](https://www.skysports.com/) to gather standings and results for the years in which we have trade data, since the site publishes them in an easy-to-parse format.

In [1]:
# Import packages
from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
import re

In [2]:
# define our function to get our data
def fetch_data(url, lg, yr, df, cols):
    
    # get the html from our passed url
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')
    
    # initialize our variables we want to capture
    pos = [] 
    team = [] 
    pl = []
    w = []
    d = []
    l = []
    f = []
    a = []
    gd = []
    pts = []
    
    # get the table object
    table = soup.find(class_ = 'standing-table__table')
    
    # find the table body 
    b = table.find('tbody').find_all('tr')
    
    # loop through each table row and add the next value to the corresponding list
    for tr in b:
        td = tr.find_all('td')
        pos.append(td[0].text.strip())
        team.append(td[1].text.strip())
        pl.append(td[2].text.strip())
        w.append(td[3].text.strip())
        d.append(td[4].text.strip())
        l.append(td[5].text.strip())
        f.append(td[6].text.strip())
        a.append(td[7].text.strip())
        gd.append(td[8].text.strip())
        pts.append(td[9].text.strip())

    # combine our fields into a dataframe    
    out = pd.DataFrame(list(zip(pos, team, pl, w, d, l, f, a, gd, pts)), 
                      columns = cols)
    # add the year and league from our search and return the data frame
    out['Year'] = yr
    out['League'] = lg
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    df = pd.concat([df, out], ignore_index = True, sort = False)
    
    return df
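
The row-parsing step in `fetch_data` can be tried offline, without a network request. The sketch below assumes a small hand-written HTML fragment (the team names and numbers are made up for illustration) that reuses the `standing-table__table` class; `parse_standings` is a hypothetical helper and not part of the notebook, and the real Sky Sports markup may differ.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical HTML fragment; the real Sky Sports markup may differ.
SAMPLE_HTML = """
<table class="standing-table__table">
  <tbody>
    <tr><td>1</td><td> Team A </td><td>38</td><td>30</td><td>5</td><td>3</td><td>90</td><td>30</td><td>60</td><td>95</td></tr>
    <tr><td>2</td><td> Team B </td><td>38</td><td>28</td><td>6</td><td>4</td><td>85</td><td>35</td><td>50</td><td>90</td></tr>
  </tbody>
</table>
"""

COLS = ['Position', 'Team', 'Played', 'W', 'D', 'L', 'GF', 'GA', 'GD', 'PTS']

def parse_standings(html, cols=COLS):
    """Return one DataFrame row per <tr> in the standings table body."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find(class_='standing-table__table')
    rows = []
    for tr in table.find('tbody').find_all('tr'):
        # keep only as many cells as we have column names
        cells = [td.text.strip() for td in tr.find_all('td')[:len(cols)]]
        rows.append(cells)
    return pd.DataFrame(rows, columns=cols)

sample = parse_standings(SAMPLE_HTML)
print(sample)
```

Collecting each row as a list, rather than appending to ten separate lists, keeps the cells of a row together and makes it harder for columns to drift out of alignment.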

In [3]:
# define our base url
base_url = 'https://www.skysports.com/'

# create our list of additional pieces of our URL
labels = ['premier-league', 'serie-a', 'la-liga', 'bundesliga']
labels = [i + '-table/' for i in labels]

# create our full list of urls
urls = [base_url + i for i in labels]
year = [str(i) for i in range(2018, 2008 - 1, -1)]
urls = [i + j for i in urls for j in year]

# create our list of years and leagues
# use integer division: np.repeat needs an integer repeat count
league = np.repeat(['epl', 'serie a', 'la liga', 'bundesliga'], len(urls) // 4)
years = year * 4
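
Because `urls`, `league`, and `years` are three parallel lists, they must stay in the same order for each table to be labelled correctly. One possible alternative, shown here only as a sketch, is to build every `(url, league, year)` triple in a single pass with `itertools.product`. The `slugs` dictionary and the `seasons` and `jobs` names are illustrative and not part of the notebook.

```python
from itertools import product

base_url = 'https://www.skysports.com/'

# league label -> URL slug (same leagues as above)
slugs = {
    'epl': 'premier-league',
    'serie a': 'serie-a',
    'la liga': 'la-liga',
    'bundesliga': 'bundesliga',
}
seasons = [str(y) for y in range(2018, 2008 - 1, -1)]

# one (url, league, year) triple per league/season combination
jobs = [
    (f'{base_url}{slug}-table/{yr}', lg, yr)
    for (lg, slug), yr in product(slugs.items(), seasons)
]

print(len(jobs))
print(jobs[0])
```

With this layout, a loop can unpack `for url, lg, yr in jobs:` instead of indexing three lists by position.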

In [10]:
# initialize an empty data frame
cols = ['Position', 'Team', 'Played', 'W', 'D', 'L', 'GF', 'GA', 'GD', 'PTS']
df = pd.DataFrame(columns = cols)

# try to loop through our urls, years, and leagues to gather our data
for i in range(len(urls)):
    try:
        df = fetch_data(url = urls[i],
                        lg = league[i], 
                        yr = years[i], 
                        df = df, 
                        cols = cols)
        print('completed:', league[i], years[i])
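    except Exception:
        # skip any season whose page could not be fetched or parsed;
        # skipped seasons have no 'completed' line in the output
        pass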
print()
print('complete')

completed: epl 2018
completed: epl 2017
completed: epl 2016
completed: epl 2015
completed: epl 2014
completed: epl 2013
completed: epl 2012
completed: epl 2011
completed: epl 2010
completed: epl 2009
completed: epl 2008
completed: serie a 2018
completed: serie a 2017
completed: serie a 2016
completed: serie a 2015
completed: serie a 2014
completed: serie a 2013
completed: serie a 2012
completed: serie a 2011
completed: serie a 2010
completed: serie a 2009
completed: serie a 2008
completed: la liga 2018
completed: la liga 2017
completed: la liga 2016
completed: la liga 2015
completed: la liga 2014
completed: la liga 2013
completed: la liga 2012
completed: la liga 2011
completed: la liga 2010
completed: la liga 2009
completed: la liga 2008
completed: bundesliga 2018
completed: bundesliga 2017
completed: bundesliga 2016
completed: bundesliga 2015
completed: bundesliga 2014
completed: bundesliga 2013
completed: bundesliga 2012
completed: bundesliga 2011
completed: bundesliga 2010

complete
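
The log above has no line for Bundesliga 2009 or 2008, because the `try`/`except` block skips any season that raises an error. One way to surface such gaps is to compare the league/season pairs that were requested against those that made it into the data frame. The sketch below uses a tiny stand-in data frame rather than the scraped results, and `missing_seasons` is a hypothetical helper.

```python
import pandas as pd

def missing_seasons(df, leagues, years):
    """Return the (league, year) pairs that have no rows in df."""
    expected = {(lg, yr) for lg in leagues for yr in years}
    found = set(zip(df['League'], df['Year']))
    return sorted(expected - found)

# stand-in for the scraped results: one row per league/season that succeeded
demo = pd.DataFrame({
    'League': ['bundesliga', 'bundesliga', 'epl', 'epl', 'epl'],
    'Year': ['2010', '2011', '2009', '2010', '2011'],
})

gaps = missing_seasons(demo, ['epl', 'bundesliga'], ['2009', '2010', '2011'])
print(gaps)
```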

In [12]:
# write our results to a csv
df.to_csv('league_results.csv', index = True)
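
Every value collected by `fetch_data` is a string taken from the page text, so the numeric columns need converting before any arithmetic is done on them. A minimal sketch, assuming a two-row stand-in for `df` with made-up values:

```python
import pandas as pd

# stand-in for the scraped data frame; values are strings, as scraped
demo = pd.DataFrame({
    'Team': ['Team A', 'Team B'],
    'Played': ['38', '38'],
    'GF': ['90', '85'],
    'GA': ['30', '35'],
    'GD': ['60', '50'],
    'PTS': ['95', '90'],
})

num_cols = ['Played', 'GF', 'GA', 'GD', 'PTS']
# errors='coerce' turns anything unparseable into NaN instead of raising
demo[num_cols] = demo[num_cols].apply(pd.to_numeric, errors='coerce')

print(demo.dtypes)
# sanity check: goal difference should equal goals for minus goals against
print((demo['GF'] - demo['GA'] == demo['GD']).all())
```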

Now that we have our results, we can combine them with the other data in the Transfer Analysis notebook to analyze our trades.