# Leicester City's Surprising 2015/2016 Season

![](https://upload.wikimedia.org/wikipedia/pt/0/07/Logo_Premier_League_2016_2017_1.jpg)

The 2015/16 Premier League season was a wonder, with an unlikely champion and giants falling far behind. Claudio Ranieri's squad, with the likes of Kanté, Vardy and Mahrez, enchanted the world by beating the odds and winning the trophy.

![](https://d3nfwcxd527z59.cloudfront.net/content/uploads/2017/07/19101731/Leicester-City-Premier-League-title-2015-16.jpg)

**As a football fan and an aspiring data scientist, I will analyse it according to the following plan:**

* Web Scraping the season table 
* How did they let us win? (EDA and Data Visualization)

## **Web Scraping (season table and player stats)**

The first step is to get all the necessary data from the web via web scraping. To do that, we make an HTTP request to the URL and check its status code. Any code in the 2xx range (200, 201, ...) means success.

In [1]:
#Importing libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import requests # http requests
from bs4 import BeautifulSoup #Manipulating HTML
import re #regular expressions
import csv #Manipulating csv files

In [2]:
#Defining website to request
base_url = 'https://www.skysports.com/premier-league-table/2015'
page = requests.get(base_url) #Making the request
page.status_code

200
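
The `200` above falls in the 2xx range. As a minimal sketch of the check described earlier, using only the standard library (the helper name `is_success` is hypothetical and not part of `requests`):

```python
from http import HTTPStatus


def is_success(status_code: int) -> bool:
    """Return True for any 2xx HTTP status code."""
    return 200 <= status_code < 300


# 200 is the code returned by the request above; the others are for comparison.
for code in (200, 201, 404, 500):
    print(code, HTTPStatus(code).phrase, is_success(code))
```

`requests` also provides `page.raise_for_status()`, which raises an exception for 4xx and 5xx responses, so a failed request does not go unnoticed.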

Now we have our content in our variable called page.

In [None]:
page.text

Did you get it? I didn't.
<br/>To make the page content more readable, we're going to use the BeautifulSoup library.
We'll parse the HTML and pretty-print it.

In [3]:
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())

<!DOCTYPE doctype html>
<html class="no-js" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Premier League Table &amp; Standings - Sky Sports Football
  </title>
  <meta content="NOODP,INDEX,FOLLOW" name="robots">
   <meta content="Complete table of Premier League standings for the 2015/2016 Season, plus access to tables from past seasons and other Football leagues." name="description"/>
   <meta content="Sky, Sports, Football, News, Premier, League, Fixtures, Results, Tables, Photos, Transfer, Centre, Arsenal, Aston Villa, Birmingham City, Blackburn Rovers, Bolton Wanderers, Burnley, Chelsea, Everton, Fulham, Hull City, Liverpool, Manchester City, Man City, Manchester United, Man Utd, Portsmouth, Stoke City, Sunderland, Tottenham Hotspur, Spurs, West Ham, Wigan Athletic, Wolverhampton Wanderers, Wolves, watch, video, live, pc. torress, benitez" name="keywords"/>
   <meta content="IE=edge,chrome=1" http-equiv="X-UA-Compatible"/>
   <link href="//e0.365dm.com" rel="dns-prefetc

We now have the full HTML of the page, but we're only interested in scraping the season table from it.
<br/>Using the .find() method, we look for the first table body (`tbody`) tag.
We turn it into a string because we're going to manipulate it later.

In [4]:
tables = str(soup.find("tbody"))
tables

'<tbody>\n<tr class="standing-table__row" data-item-id="152">\n<td class="standing-table__cell">1</td>\n<td class="standing-table__cell standing-table__cell--name" data-long-name="Leicester City" data-short-name="Leicester City">\n<a class="standing-table__cell--name-link" href="/leicester-city">Leicester City</a>\n</td>\n<td class="standing-table__cell">38</td>\n<td class="standing-table__cell is-hidden--bp35">23</td>\n<td class="standing-table__cell is-hidden--bp35">12</td>\n<td class="standing-table__cell is-hidden--bp35">3</td>\n<td class="standing-table__cell is-hidden--bp35">68</td>\n<td class="standing-table__cell is-hidden--bp35">36</td>\n<td class="standing-table__cell">32</td>\n<td class="standing-table__cell" data-sort-value="1">81</td>\n<td class="standing-table__cell is-hidden--bp15 is-hidden--bp35 is-hidden" data-sort-value="0">\n</td>\n</tr>\n<tr class="standing-table__row" data-item-id="413">\n<td class="standing-table__cell">2</td>\n<td class="standing-table__cell stan

It looks confusing again, but we can already spot the data we want, such as the club names and their respective positions.
<br/>Using the regular expression (re) library, we're about to strip all the HTML tags.

In [5]:
cleanr = re.compile('<.*?>')
cleantext = re.sub(cleanr, '', tables)
cleantext

'\n\n1\n\nLeicester City\n\n38\n23\n12\n3\n68\n36\n32\n81\n\n\n\n\n2\n\nArsenal\n\n38\n20\n11\n7\n65\n36\n29\n71\n\n\n\n\n3\n\nTottenham Hotspur\n\n38\n19\n13\n6\n69\n35\n34\n70\n\n\n\n\n4\n\nManchester City\n\n38\n19\n9\n10\n71\n41\n30\n66\n\n\n\n\n5\n\nManchester United\n\n38\n19\n9\n10\n49\n35\n14\n66\n\n\n\n\n6\n\nSouthampton\n\n38\n18\n9\n11\n59\n41\n18\n63\n\n\n\n\n7\n\nWest Ham United\n\n38\n16\n14\n8\n65\n51\n14\n62\n\n\n\n\n8\n\nLiverpool\n\n38\n16\n12\n10\n63\n50\n13\n60\n\n\n\n\n9\n\nStoke City\n\n38\n14\n9\n15\n41\n55\n-14\n51\n\n\n\n\n10\n\nChelsea\n\n38\n12\n14\n12\n59\n53\n6\n50\n\n\n\n\n11\n\nEverton\n\n38\n11\n14\n13\n59\n55\n4\n47\n\n\n\n\n12\n\nSwansea City\n\n38\n12\n11\n15\n42\n52\n-10\n47\n\n\n\n\n13\n\nWatford\n\n38\n12\n9\n17\n40\n50\n-10\n45\n\n\n\n\n14\n\nWest Bromwich Albion\n\n38\n10\n13\n15\n34\n48\n-14\n43\n\n\n\n\n15\n\nCrystal Palace\n\n38\n11\n9\n18\n39\n51\n-12\n42\n\n\n\n\n16\n\nBournemouth\n\n38\n11\n9\n18\n45\n67\n-22\n42\n\n\n\n\n17\n\nSunderland\n
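
To see what the pattern does on a small scale, here is a sketch applied to a shortened fragment modelled on the first table row above. The `sample` string is abbreviated for illustration and is not the full page content.

```python
import re

# Abbreviated fragment modelled on the first table row shown above.
sample = (
    '<tr class="standing-table__row">\n'
    '<td class="standing-table__cell">1</td>\n'
    '<td class="standing-table__cell">\n'
    '<a href="/leicester-city">Leicester City</a>\n'
    '</td>\n'
    '<td class="standing-table__cell">38</td>\n'
    '</tr>'
)

# '<.*?>' matches '<', then as few characters as possible, then '>',
# so each tag is removed on its own rather than everything between
# the first '<' and the last '>' on a line.
tag_pattern = re.compile(r'<.*?>')
stripped = tag_pattern.sub('', sample)
print(repr(stripped))  # '\n1\n\nLeicester City\n\n38\n'
```

The line breaks between tags survive, which is why the result is full of `\n`. This approach assumes no tag contains a `>` inside an attribute value; for messier pages, reading the cell text through the parser is usually more robust.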

Better, but still not what we want. More data cleaning is needed: we split the string on line breaks.

In [6]:
cleantext2 = cleantext.split("\n")  # str.split already returns a list
cleantext2

['',
 '',
 '1',
 '',
 'Leicester City',
 '',
 '38',
 '23',
 '12',
 '3',
 '68',
 '36',
 '32',
 '81',
 '',
 '',
 '',
 '',
 '2',
 '',
 'Arsenal',
 '',
 '38',
 '20',
 '11',
 '7',
 '65',
 '36',
 '29',
 '71',
 '',
 '',
 '',
 '',
 '3',
 '',
 'Tottenham Hotspur',
 '',
 '38',
 '19',
 '13',
 '6',
 '69',
 '35',
 '34',
 '70',
 '',
 '',
 '',
 '',
 '4',
 '',
 'Manchester City',
 '',
 '38',
 '19',
 '9',
 '10',
 '71',
 '41',
 '30',
 '66',
 '',
 '',
 '',
 '',
 '5',
 '',
 'Manchester United',
 '',
 '38',
 '19',
 '9',
 '10',
 '49',
 '35',
 '14',
 '66',
 '',
 '',
 '',
 '',
 '6',
 '',
 'Southampton',
 '',
 '38',
 '18',
 '9',
 '11',
 '59',
 '41',
 '18',
 '63',
 '',
 '',
 '',
 '',
 '7',
 '',
 'West Ham United',
 '',
 '38',
 '16',
 '14',
 '8',
 '65',
 '51',
 '14',
 '62',
 '',
 '',
 '',
 '',
 '8',
 '',
 'Liverpool',
 '',
 '38',
 '16',
 '12',
 '10',
 '63',
 '50',
 '13',
 '60',
 '',
 '',
 '',
 '',
 '9',
 '',
 'Stoke City',
 '',
 '38',
 '14',
 '9',
 '15',
 '41',
 '55',
 '-14',
 '51',
 '',
 '',
 '',
 '',
 '10',
 '

Almost there. We no longer have HTML tags or line breaks (\n), only empty strings left in the list. Our last data cleaning step is to remove those.

In [7]:
cleantext3 = list(filter(lambda a: a != '', cleantext2))
print(cleantext3)

['1', 'Leicester City', '38', '23', '12', '3', '68', '36', '32', '81', '2', 'Arsenal', '38', '20', '11', '7', '65', '36', '29', '71', '3', 'Tottenham Hotspur', '38', '19', '13', '6', '69', '35', '34', '70', '4', 'Manchester City', '38', '19', '9', '10', '71', '41', '30', '66', '5', 'Manchester United', '38', '19', '9', '10', '49', '35', '14', '66', '6', 'Southampton', '38', '18', '9', '11', '59', '41', '18', '63', '7', 'West Ham United', '38', '16', '14', '8', '65', '51', '14', '62', '8', 'Liverpool', '38', '16', '12', '10', '63', '50', '13', '60', '9', 'Stoke City', '38', '14', '9', '15', '41', '55', '-14', '51', '10', 'Chelsea', '38', '12', '14', '12', '59', '53', '6', '50', '11', 'Everton', '38', '11', '14', '13', '59', '55', '4', '47', '12', 'Swansea City', '38', '12', '11', '15', '42', '52', '-10', '47', '13', 'Watford', '38', '12', '9', '17', '40', '50', '-10', '45', '14', 'West Bromwich Albion', '38', '10', '13', '15', '34', '48', '-14', '43', '15', 'Crystal Palace', '38', '11',
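
The split and filter steps can also be combined into one. A minimal sketch, using a short stand-in string in place of `cleantext` (the function name `non_empty_lines` is hypothetical):

```python
def non_empty_lines(text: str) -> list[str]:
    """Split on line breaks and keep only the non-empty pieces."""
    return [piece for piece in text.split("\n") if piece != ""]


# Short stand-in for cleantext, copied from the start of the output above.
sample_text = "\n\n1\n\nLeicester City\n\n38\n23\n12\n3\n68\n36\n32\n81\n\n\n\n"
print(non_empty_lines(sample_text))
# ['1', 'Leicester City', '38', '23', '12', '3', '68', '36', '32', '81']
```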

Turning this list into a CSV file is the last step of our web scraping. For that, we will use the csv library.
We will write the header first and then one row of data per club.

In [8]:
headers = ['Position','Club','G','W','D','L','F','A','GD','PTS']
i = 0
with open('season1516.csv', 'w', newline='') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)    
    wr.writerow(headers)
    while i < 20:
        wr.writerow(cleantext3[10*i:10*(i+1)])
        i = i + 1
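
The slice `cleantext3[10*i:10*(i+1)]` takes ten consecutive values per club. Here is a sketch of the same idea written with a list comprehension, writing to an in-memory buffer instead of a file so it can be run without touching disk. The two-club `flat` list is copied from the output above, and `chunk_rows` is a hypothetical helper name.

```python
import csv
import io

headers = ['Position', 'Club', 'G', 'W', 'D', 'L', 'F', 'A', 'GD', 'PTS']

# First two clubs from the cleaned list above (20 values = 2 rows of 10).
flat = ['1', 'Leicester City', '38', '23', '12', '3', '68', '36', '32', '81',
        '2', 'Arsenal', '38', '20', '11', '7', '65', '36', '29', '71']


def chunk_rows(values, width):
    """Split a flat list into consecutive rows of `width` items."""
    return [values[i:i + width] for i in range(0, len(values), width)]


rows = chunk_rows(flat, len(headers))

# io.StringIO stands in for the file opened with open(..., 'w', newline='').
buffer = io.StringIO(newline='')
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
writer.writerow(headers)
writer.writerows(rows)
print(buffer.getvalue())
```

Deriving the row width from `len(headers)` keeps the slicing and the header in sync, and the number of rows follows from the data rather than from a hard-coded 20.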

In [9]:
df = pd.read_csv('season1516.csv')
df.head()

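|  | Position | Club | G | W | D | L | F | A | GD | PTS |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Leicester City | 38 | 23 | 12 | 3 | 68 | 36 | 32 | 81 |
| 1 | 2 | Arsenal | 38 | 20 | 11 | 7 | 65 | 36 | 29 | 71 |
| 2 | 3 | Tottenham Hotspur | 38 | 19 | 13 | 6 | 69 | 35 | 34 | 70 |
| 3 | 4 | Manchester City | 38 | 19 | 9 | 10 | 71 | 41 | 30 | 66 |
| 4 | 5 | Manchester United | 38 | 19 | 9 | 10 | 49 | 35 | 14 | 66 |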
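
Before analysing the table, it can help to confirm that the scraped numbers are internally consistent. The sketch below checks the five rows shown above against three identities that should hold for a league table with three points for a win and one for a draw (barring points deductions): games equal wins plus draws plus losses, goal difference equals goals for minus goals against, and points equal 3 × wins + draws. It uses plain tuples rather than the DataFrame, and `is_consistent` is a hypothetical helper name.

```python
# (Position, Club, G, W, D, L, F, A, GD, PTS), copied from df.head() above.
top_five = [
    (1, 'Leicester City', 38, 23, 12, 3, 68, 36, 32, 81),
    (2, 'Arsenal', 38, 20, 11, 7, 65, 36, 29, 71),
    (3, 'Tottenham Hotspur', 38, 19, 13, 6, 69, 35, 34, 70),
    (4, 'Manchester City', 38, 19, 9, 10, 71, 41, 30, 66),
    (5, 'Manchester United', 38, 19, 9, 10, 49, 35, 14, 66),
]


def is_consistent(row):
    """Check the games, goal-difference and points identities for one row."""
    _, _, games, wins, draws, losses, goals_for, goals_against, gd, pts = row
    return (
        games == wins + draws + losses
        and gd == goals_for - goals_against
        and pts == 3 * wins + draws
    )


for row in top_five:
    print(row[1], is_consistent(row))
```

A `False` for any club would point to a scraping or slicing problem, such as a row shifted by one column.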


## How did they let us win? (EDA and Data Visualization)

Now it's time to crunch some data.<br/><br/>There is a famous Leicester chant, and I'll share part of it because it is our hypothesis:

*...We play from the back and counter attack<br/>Champions of England, 
<br/>You made us sing that ...*<br/>

* What makes teams champions? Having a deadly attack or a solid defense?
* How did the Big Six (the six biggest teams in England) let Leicester take the silverware home?

### What makes teams champions? Having a deadly attack or a solid defense?