## Introduction to the project

To start this project, at first, we request all necessary data, that is, every match result from La Liga Santander & La Liga Smartbank in the last 5 years. This request is done through BeSoccer API.

Here starts the final project of the Master in Data Science at KSCHOOL, done by Pablo Fernández Matus. The name of this project is "Spanish La Liga Predictions" and it is realised during the first half of year 2021.

During this project we will be working with data related to soccer matches results, from which we will try to extract relevant information in order to build a prediction model. The type of model, and what it will consist on are decissions that would be made as the content of the data being worked on becomes better understood.

The project is divided into different notebooks following the steps taken for the data processing, algorithm execution and data analysis. This final step is performed via Tableau dashboards operating as the front-end of the project.

All the notebooks that make up this project can be found in the following repository, including the Tableau file that contains the Front-end.

https://github.com/PabloMatus6/Spanish-LaLiga-Round-Prediction

In terms of software or licences required to run this program, it is sufficient to have access to Jupyter Notebook for the code. To view the front-end in Tableau, you will need a Tableau account or request the free trial offered by Tableau desktop.

# Imports

In [6]:
import pandas as pd 
import numpy as np 
import os 
import requests
import json
import matplotlib.pyplot as plt
import seaborn as sns

# 0.- Before starting

As explained in the readme of the repository of this project, the requests with which the data were obtained at the start of the project are no longer available. Therefore the code of this notebook should be omitted, being only necessary to download the csv files from the following links to Google Drive:

matches_request = https://drive.google.com/file/d/17QJDFMFqM5N7Z4DdOM6hV-w2SVz3iLej/view?usp=sharing

standings_request = https://drive.google.com/file/d/1li73lQZYAnVly2BjKs-GTH1QI6VlSInb/view?usp=sharing

These files should be saved in the same local folder as the other notebooks.

Once these files have been downloaded, you should continue directly through notebook 03. The rest of this notebook & notebook 02 show how the downloaded drive data was obtained via BeSoccer API (notebook 1) & how the merge was done (notebook 2). 

# 1.- Requests

In case you want to get the data directly via API, you must run this notebook, but for that, you must correct the values of 'year' in the request, and accordingly in notebook 02 (Preparing keys for merge part) and notebook 04 (the plots are referred to the seasons from 15-16 to 20-21). The values to be entered in 'year' should be those corresponding to the last five seasons at the time of notebook execution.

### 1.1- Matches Request

The project starts with the data request from La Liga Santander & La Liga Smartbank matches in the last 5 years. This request is done via the BeSoccer API. A personal key for the requests is not needed as it is already configured in the notebook. 

In case this key stops working in the future, it will be necessary to request a temporary free account to the Besoccer API, and enter the key that is received replacing the old one in the links of the requests.

In [None]:
for league_division in range(1, 3):
    for league_season in range(2017, 2022):
        for league_round in range(1, 39):
            matches_url = f"https://apiclient.besoccerapps.com/scripts/api/api.php?key=023afbc77c5610fefc3fc8976e451752&tz=Europe/Madrid&format=json&req=matchs&league={league_division}&round={league_round}&order=twin&twolegged=1&year={league_season}"
            response = requests.get(matches_url)
            result = json.loads(response.content)
            if league_round == 1 and league_division == 1 and league_season == 2017:
                df1 = pd.DataFrame(result['match'])
            else:
                df1 = df1.append(result['match'])

In [3]:
df1.head(5)

Unnamed: 0,id,year,group,total_group,round,local,visitor,league_id,stadium,team1,...,visitor_goals,result,live_minute,status,channels,winner,penaltis1,penaltis2,prorroga,stadium2
0,96233,2017,1,1,1,Málaga,Osasuna,30626,La Rosaleda,704143,...,1,1-1,,1,"[{'id': '20', 'name': 'Gol', 'image': 'https:/...",0,0,0,False,
1,96229,2017,1,1,1,Deportivo,Eibar,30626,Municipal Riazor,704138,...,1,2-1,,1,"[{'id': '185', 'name': 'beIN LaLiga', 'image':...",704138,0,0,False,
2,96226,2017,1,1,1,Barcelona,Real Betis,30626,Camp Nou,704136,...,2,6-2,,1,"[{'id': '185', 'name': 'beIN LaLiga', 'image':...",704136,0,0,False,
3,96230,2017,1,1,1,Granada,Villarreal,30626,Nuevo Los Cármenes,704141,...,1,1-1,,1,"[{'id': '185', 'name': 'beIN LaLiga', 'image':...",0,0,0,False,
4,96228,2017,1,1,1,Sevilla,Espanyol,30626,Ramón Sánchez Pizjuán,704147,...,4,6-4,,1,"[{'id': '185', 'name': 'beIN LaLiga', 'image':...",704147,0,0,False,


We realise that we need to add for La Liga Smartbank four more rounds, as this division has two more teams, there are 42 rounds instead of 38. However, after several tests, adding it is problematic and the information can be confusing, as in the first division these rounds are not available, so it was finally decided not to add it. 

In [4]:
df1.tail()

Unnamed: 0,id,year,group,total_group,round,local,visitor,league_id,stadium,team1,...,visitor_goals,result,live_minute,status,channels,winner,penaltis1,penaltis2,prorroga,stadium2
6,91110,2021,1,1,38,Real Oviedo,Sabadell,57314,Carlos Tartiere,6382799,...,1,2-1,,1,"[{'id': '325', 'name': 'M. LaLiga', 'image': '...",6382799,0,0,False,
7,91112,2021,1,1,38,UD Logroñés,Girona,57314,Las Gaunas,6382792,...,4,1-4,,1,"[{'id': '325', 'name': 'M. LaLiga', 'image': '...",6391868,0,0,False,
8,91104,2021,1,1,38,FC Cartagena,CD Castellón,57314,Municipal Cartagonova,6382787,...,0,1-0,,1,"[{'id': '325', 'name': 'M. LaLiga', 'image': '...",6382787,0,0,False,
9,91109,2021,1,1,38,Rayo Vallecano,Leganés,57314,Vallecas,6382798,...,1,1-1,,1,"[{'id': '303', 'name': '#Vamos', 'image': 'htt...",0,0,0,False,
10,91105,2021,1,1,38,Real Sporting,Lugo,57314,El Molinón-Enrique Castro Quini,6382800,...,0,1-0,,1,"[{'id': '325', 'name': 'M. LaLiga', 'image': '...",6382800,0,0,False,


At this point, we decide to save the entire dataframe before we start working on it and it undergoes modifications.

In [5]:
df1.to_csv('matches_request')

### 1.2.- Standings Request

The second request is then made to complete the information already available. In this request, information is requested about the league table according to the round, division and season of each team.

In [11]:
for league_division in range(1, 3):
    for league_season in range(2016, 2022):
        for league_round in range(1, 39):
            standings_url = f"https://apiclient.besoccerapps.com/scripts/api/api.php?key=023afbc77c5610fefc3fc8976e451752&tz=Europe/Madrid&format=json&req=tables&league={league_division}&round={league_round}&year={league_season}"
            response_standings = requests.get(standings_url)
            result_standings = json.loads(response_standings.content)
            if league_round == 1 and league_division == 1 and league_season == 2017:
                df3 = pd.DataFrame(result_standings['table'])
            else:
                df3 = df3.append(result_standings['table'])

In [12]:
df3.head(5)

Unnamed: 0,id,group,group_name,conference,team,points,wins,draws,losses,shield,...,coef,coefficients,mark,class_mark,round,pos,countrycode,abbr,form,direction
0,429,1,Liga Santander,0,Barcelona,3,1,0,0,https://thumb.resfu.com/img_data/escudos/mediu...,...,,,1,cha,1,1,ES,FCB,w,
1,2107,1,Liga Santander,0,Real Madrid,3,1,0,0,https://thumb.resfu.com/img_data/escudos/mediu...,...,,,1,cha,1,2,ES,RMA,w,
2,1102,1,Liga Santander,0,Sevilla,3,1,0,0,https://thumb.resfu.com/img_data/escudos/mediu...,...,,,1,cha,1,3,ES,SEV,w,
3,2563,1,Liga Santander,0,Las Palmas,3,1,0,0,https://thumb.resfu.com/img_data/escudos/mediu...,...,,,2,prev,1,4,ES,UDL,w,
4,901,1,Liga Santander,0,Deportivo,3,1,0,0,https://thumb.resfu.com/img_data/escudos/mediu...,...,,,3,uefa,1,5,ES,DEP,w,


In [13]:
df3.columns

Index(['id', 'group', 'group_name', 'conference', 'team', 'points', 'wins',
       'draws', 'losses', 'shield', 'cflag', 'basealias', 'gf', 'ga', 'avg',
       'matchs_coef', 'points_coef', 'coef', 'coefficients', 'mark',
       'class_mark', 'round', 'pos', 'countrycode', 'abbr', 'form',
       'direction'],
      dtype='object')

In [14]:
df3 = df3.drop(['group', 'group_name', 'conference', 'shield', 'cflag', 'basealias',
       'matchs_coef', 'points_coef', 'coef', 'coefficients', 'mark',
       'class_mark', 'countrycode', 'abbr',
       'direction'] , axis = 1 )

In [16]:
df3.tail()

Unnamed: 0,id,team,points,wins,draws,losses,gf,ga,avg,round,pos,form
17,673,CD Castellón,41,11,8,19,35,44,-9,38,18,lwddl
18,1578,UD Logroñés,41,10,11,17,26,47,-21,38,19,wdldl
19,2198,Sabadell,40,9,13,16,36,44,-8,38,20,ddwwl
20,1598,Lugo,37,8,13,17,32,50,-18,38,21,llldl
21,140,Albacete,36,9,9,20,25,46,-21,38,22,lwdlw


There is some information left in this request that we need to be able to do de merge with the DataFrame created by Matches request. 

In [17]:
for league_division in range(1, 3):
    for league_season in range(2016, 2022):
        for round_num in range(1, 39):
            standings_url_bis = f"https://apiclient.besoccerapps.com/scripts/api/api.php?key=023afbc77c5610fefc3fc8976e451752&tz=Europe/Madrid&format=json&req=tables&league={league_division}&round={round_num}&year={league_season}"
            response_standings_bis = requests.get(standings_url_bis)
            result_standings_bis = json.loads(response_standings_bis.content)
            if league_division == 1 and league_season == 2017 and round_num == 1:
                df4 = pd.DataFrame(result_standings_bis['table'])
                df4['year'] = league_season
                df4['division'] = league_division
            else:
                df4_bis = pd.DataFrame(result_standings_bis['table'])
                df4_bis['year'] = league_season
                df4_bis['division'] = league_division
                df4 = pd.concat([df4, df4_bis])

In [18]:
df4.tail()

Unnamed: 0,id,group,group_name,conference,team,points,wins,draws,losses,shield,...,mark,class_mark,round,pos,countrycode,abbr,form,direction,year,division
17,673,1,,0,CD Castellón,41,11,8,19,https://thumb.resfu.com/img_data/escudos/mediu...,...,,,38,18,ES,CAS,lwddl,,2021,2
18,1578,1,,0,UD Logroñés,41,10,11,17,https://thumb.resfu.com/img_data/escudos/mediu...,...,3.0,desc,38,19,ES,UDL,wdldl,d,2021,2
19,2198,1,,0,Sabadell,40,9,13,16,https://thumb.resfu.com/img_data/escudos/mediu...,...,3.0,desc,38,20,ES,SAB,ddwwl,d,2021,2
20,1598,1,,0,Lugo,37,8,13,17,https://thumb.resfu.com/img_data/escudos/mediu...,...,3.0,desc,38,21,ES,LUG,llldl,,2021,2
21,140,1,,0,Albacete,36,9,9,20,https://thumb.resfu.com/img_data/escudos/mediu...,...,3.0,desc,38,22,ES,ALB,lwdlw,,2021,2


Proceed in the same way to save the dataframe extracted from this request. In this way, two dataframes are available and must be joined into one, using the information they have in common.

In [19]:
df4.to_csv('standings_request')