# Football Match Outcome Prediction - Project

The aim of this project is to predict how likely the home team is to win a match, based on past performance.

Final goal: use the performance of teams in their different leagues, learnt from training data, to predict performance in Champions League match-ups.

## Resources:

- https://towardsdatascience.com/what-ive-learnt-predicting-soccer-matches-with-machine-learning-b3f8b445149d
- https://towardsdatascience.com/machine-learning-algorithms-for-football-prediction-using-statistics-from-brazilian-championship-51b7d4ea0bc8

In [3]:
import pandas as pd
import pickle
import numpy as np

In [55]:
year = 2020
pl_results = pd.read_csv(f"Football-Dataset/premier_league/Results_{year}_premier_league.csv")

pl_results.head()

Unnamed: 0,Home_Team,Away_Team,Result,Link,Season,Round,League
0,Liverpool,Norwich City,4-1,https://www.besoccer.com/match/liverpool/norwi...,2020,1,premier_league
1,West Ham,Man. City,0-5,https://www.besoccer.com/match/west-ham-united...,2020,1,premier_league
2,AFC Bournemouth,Sheffield United,1-1,https://www.besoccer.com/match/afc-bournemouth...,2020,1,premier_league
3,Burnley,Southampton,3-0,https://www.besoccer.com/match/burnley-fc/sout...,2020,1,premier_league
4,Crystal Palace,Everton,0-0,https://www.besoccer.com/match/crystal-palace-...,2020,1,premier_league


## Feature Engineering
### Cleaning Datasets

In [7]:
# Load the Elo ratings of each team; the context manager closes the file once it has been read
with open('/home/arman/Documents/AiCore/projects/football/Football-Outcome-Predictions/datasets/elo_dict.pkl', 'rb') as elo_file:
    elo_list_raw = pickle.load(elo_file)

In [8]:
# Convert elo_list_raw dictionary into pandas dataframe (transposed)
elo_list = pd.DataFrame.from_dict(elo_list_raw).T.reset_index()
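
The transpose step is easier to see on a toy example. The sketch below assumes `elo_dict.pkl` maps each match link to a dictionary with `Elo_home` and `Elo_away` keys, which is what the later merge on `Link` implies. The names `toy_elo_raw` and `toy_elo`, the two links and the ratings are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for elo_list_raw: {match link: {"Elo_home": ..., "Elo_away": ...}}
toy_elo_raw = {
    "https://example.com/match/a": {"Elo_home": 97.0, "Elo_away": 72.0},
    "https://example.com/match/b": {"Elo_home": 77.0, "Elo_away": 97.0},
}

# from_dict places the outer keys (links) in the columns, so transpose to get
# one row per match, then reset_index to turn the links into a column named "index".
toy_elo = pd.DataFrame.from_dict(toy_elo_raw).T.reset_index()

print(toy_elo.columns.tolist())  # → ['index', 'Elo_home', 'Elo_away']
```

The resulting `index` column is what a merge against `Link` would join on.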

In [54]:
# Add features to table

def add_elo(results_table):
    '''
    Adds Elo_home and Elo_away columns to dataframe for each match for given results_table.

    Args
    ----------
    results_table: pandas.core.frame.DataFrame
        Dataframe of match results containing the Home_Team, Away_Team, Result, Link, Season,
        Round and League.

    Returns
    -------
    results_table_elo: pandas.core.frame.DataFrame
        Merges input results_table dataframe with global dataframe elo_list, which contains 
        values for Elo_home and Elo_away for each match. Uses equivalent values in Link
        and index columns to merge dataframes.
    '''
    # drop the "index" column after merging because it duplicates the "Link" column
    results_table_elo = pd.merge(results_table, elo_list, left_on="Link", right_on="index").drop("index", axis=1)
    return results_table_elo

def home_away_goals(results_table):
    '''
    Adds Home_Goals and Away_Goals columns to dataframe. Parses the Result column (e.g. "4-1")
    into the number of home goals and away goals for each match in the given results_table.

    Args
    ----------
    results_table: pandas.core.frame.DataFrame
        Dataframe of match results containing the Home_Team, Away_Team, Result, Link, Season,
        Round, League (and more).

    Returns
    -------
    results_table: pandas.core.frame.DataFrame
        Input results_table with additional columns stating the number of Home and Away goals.
    '''

    home_result = []
    away_result = []

    # iterate through each value in the Result column and split it into home and away goals
    for results in results_table["Result"]:
        home_result.append(int(results[:results.find('-')]))
        away_result.append(int(results[results.find('-')+1:]))

    # create new columns Home_Goals and Away_Goals
    results_table["Home_Goals"] = home_result
    results_table["Away_Goals"] = away_result
    
    return results_table

def win_loss_draw(results_table):
    '''
    Adds a Label column to results_table which indicates whether the match result was a
    Home Win, an Away Win or a Draw.

    Requires Home_Goals and Away_Goals column in data which can be generated using the 
    home_away_goals function. (This function will automatically run the home_away_goals 
    function on input results_table if Home_Goals and Away_Goals columns not found).

    Args
    ----------
    results_table: pandas.core.frame.DataFrame
        Dataframe of match results containing the Home_Team, Away_Team, Result, Link, Season,
        Round, League (and more).

    Returns
    -------
    results_table: pandas.core.frame.DataFrame
        Input results_table with an additional Label column: '1' for a Home Win, '-1' for an
        Away Win and '0' for a Draw.
    '''

    # if either the Home_Goals or the Away_Goals column is missing from results_table, generate both here
    if "Home_Goals" not in results_table or "Away_Goals" not in results_table:
        results_table = home_away_goals(results_table)

    #define conditions for win or loss
    conditions = [results_table["Home_Goals"] > results_table["Away_Goals"], 
                results_table["Home_Goals"] < results_table["Away_Goals"]]

    # define choices: '1' = Home Win, '-1' = Away Win ('0' = Draw is the default below)
    choices = ['1', '-1']

    #create new column in DataFrame that displays results of comparisons
    results_table["Label"] = np.select(conditions, choices, default="0")

    return results_table

def number_of_teams(results_table):
    # count the distinct home teams in the table passed in
    results_table["Number_teams"] = len(results_table["Home_Team"].unique())
    return results_table

def total_rounds(results_table):
    # highest round number in the table passed in
    results_table["Total_rounds"] = max(results_table["Round"])
    return results_table

def points_home(results_table):
    # map each team in given league with number of points. This will track their points throughout each game played
    team_points = {team : 0 for team in results_table["Home_Team"].unique()}

    # initialise dataframe which will state the number of points the Home and Away team has before going into a game
    points_final = pd.DataFrame(
        {"Points_Home" : [],
        "Points_Away" : []
        }
    )

    # loop through each record in the results_table
    for index, row in results_table.iterrows():
        # new dataframe which contains number of points for the Home and Away team for particular game 
        current_points = pd.DataFrame(
                {"Points_Home" : [team_points[row['Home_Team']]],
                "Points_Away" : [team_points[row['Away_Team']]]
                }
            )

        # Win +3, Loss +0, Draw +1
        if row['Home_Goals'] > row['Away_Goals']:
            team_points[row['Home_Team']] += 3

        elif row['Home_Goals'] == row['Away_Goals']:
            team_points[row['Home_Team']] += 1
            team_points[row['Away_Team']] += 1

        else:
            team_points[row['Away_Team']] += 3

        # Append new points values to end of dataframe
        # reset_index(drop=True) helps avoid InvalidIndexError
        points_final = pd.concat([points_final, current_points]).reset_index(drop=True)

    # Append Points_Home and Points_Away columns to results_table for each record
    results_table = pd.concat([results_table, points_final], axis=1)

    return results_table



def total_goals(results_table):
    # dictionary which stores the total goals scored by each team and total goals scored against each team 
    goals_for = {team : 0 for team in results_table["Home_Team"].unique()}
    goals_against = {team : 0 for team in results_table["Home_Team"].unique()}

    # initialise dataframe which will state the number of goals the Home and Away team have scored/conceded before going into a game
    goals_final = pd.DataFrame(
        {"Total_Goals_For_Home_Team" : [],
        "Total_Goals_Against_Home_Team"  : [],
        "Total_Goals_For_Away_Team" : [],
        "Total_Goals_Against_Away_Team" : []
        }
    )

    # loop through each record in the results_table
    for index, row in results_table.iterrows():

        # new dataframe which contains the goals scored and conceded so far by the Home and Away team before this game
        current_goals = pd.DataFrame(
                {"Total_Goals_For_Home_Team" : [goals_for[row['Home_Team']]],
                "Total_Goals_Against_Home_Team"  : [goals_against[row['Home_Team']]],
                "Total_Goals_For_Away_Team" : [goals_for[row['Away_Team']]],
                "Total_Goals_Against_Away_Team" : [goals_against[row['Away_Team']]]
                }
            )

        # add number of goals scored in game to the total number of goals scored for each team
        goals_for[row['Home_Team']] += row['Home_Goals']
        goals_for[row['Away_Team']] += row['Away_Goals']

        # add number of goals conceded in game to the total number of goals conceded for each team
        goals_against[row['Home_Team']] += row['Away_Goals']
        goals_against[row['Away_Team']] += row['Home_Goals']


        # Append new goals values to end of dataframe
        goals_final = pd.concat([goals_final, current_goals]).reset_index(drop=True)

    # Append Total_Goals_For_Home_Team, Total_Goals_For_Away_Team, Total_Goals_Against_Home_Team and Total_Goals_Against_Away_Team to results_table for each record
    results_table = pd.concat([results_table, goals_final], axis=1)

    return results_table
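
As an alternative to the explicit loop in `home_away_goals`, the score strings can be split in one vectorised step. This is a minimal sketch on a made-up three-row table, assuming every `Result` has the form `home-away` with no postponed or malformed entries; `toy_results` is a hypothetical name, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical results table with only the column needed here.
toy_results = pd.DataFrame({"Result": ["4-1", "0-5", "1-1"]})

# Split "4-1" into two columns (0 and 1) and convert both to integers.
goals = toy_results["Result"].str.split("-", expand=True).astype(int)
toy_results["Home_Goals"] = goals[0]
toy_results["Away_Goals"] = goals[1]

# Same labelling scheme as win_loss_draw: "1" home win, "-1" away win, "0" draw.
conditions = [
    toy_results["Home_Goals"] > toy_results["Away_Goals"],
    toy_results["Home_Goals"] < toy_results["Away_Goals"],
]
toy_results["Label"] = np.select(conditions, ["1", "-1"], default="0")

print(toy_results)
```

If the real data contains scores in another format, the `astype(int)` call raises an error, which makes such rows easy to spot.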
    

In [56]:
pl_results = add_elo(pl_results)
pl_results = win_loss_draw(pl_results)
pl_results = number_of_teams(pl_results)
pl_results = points_home(pl_results)
pl_results = total_goals(pl_results)

pl_results


Unnamed: 0,Home_Team,Away_Team,Result,Link,Season,Round,League,Elo_home,Elo_away,Home_Goals,Away_Goals,Label,Number_teams,Points_Home,Points_Away,Total_Goals_For_Home_Team,Total_Goals_Against_Home_Team,Total_Goals_For_Away_Team,Total_Goals_Against_Away_Team
0,Liverpool,Norwich City,4-1,https://www.besoccer.com/match/liverpool/norwi...,2020,1,premier_league,97.0,72.0,4,1,1,20,0.0,0.0,0.0,0.0,0.0,0.0
1,West Ham,Man. City,0-5,https://www.besoccer.com/match/west-ham-united...,2020,1,premier_league,77.0,97.0,0,5,-1,20,0.0,0.0,0.0,0.0,0.0,0.0
2,AFC Bournemouth,Sheffield United,1-1,https://www.besoccer.com/match/afc-bournemouth...,2020,1,premier_league,72.0,66.0,1,1,0,20,0.0,0.0,0.0,0.0,0.0,0.0
3,Burnley,Southampton,3-0,https://www.besoccer.com/match/burnley-fc/sout...,2020,1,premier_league,73.0,77.0,3,0,1,20,0.0,0.0,0.0,0.0,0.0,0.0
4,Crystal Palace,Everton,0-0,https://www.besoccer.com/match/crystal-palace-...,2020,1,premier_league,77.0,79.0,0,0,0,20,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,Leicester,Man. Utd,0-2,https://www.besoccer.com/match/leicester-city-...,2020,38,premier_league,83.0,91.0,0,2,-1,20,62.0,63.0,67.0,39.0,64.0,36.0
376,Man. City,Norwich City,5-0,https://www.besoccer.com/match/manchester-city...,2020,38,premier_league,96.0,70.0,5,0,1,20,78.0,21.0,97.0,35.0,26.0,70.0
377,Newcastle,Liverpool,1-3,https://www.besoccer.com/match/newcastle-unite...,2020,38,premier_league,73.0,95.0,1,3,-1,20,44.0,96.0,37.0,55.0,82.0,32.0
378,Southampton,Sheffield United,3-1,https://www.besoccer.com/match/southampton-fc/...,2020,38,premier_league,78.0,66.0,3,1,1,20,49.0,54.0,48.0,59.0,38.0,36.0
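
The `Points_Home` and `Points_Away` values above are each side's tally before kick-off, which is why every round-1 row shows zero. One lighter way to build such pre-match columns is to collect the values in plain lists and assign them once, rather than concatenating a one-row dataframe per match. The sketch below is a variant under stated assumptions, not the notebook's function: `pre_match_points` and the toy fixtures are hypothetical, rows are assumed to be in chronological order, and teams are taken from both the home and away columns.

```python
import pandas as pd

def pre_match_points(results_table):
    """Return a copy with each side's points before the match (win 3, draw 1, loss 0)."""
    teams = pd.concat([results_table["Home_Team"], results_table["Away_Team"]]).unique()
    team_points = {team: 0 for team in teams}
    points_home, points_away = [], []

    for row in results_table.itertuples(index=False):
        # Record the tallies before this match updates them.
        points_home.append(team_points[row.Home_Team])
        points_away.append(team_points[row.Away_Team])

        if row.Home_Goals > row.Away_Goals:
            team_points[row.Home_Team] += 3
        elif row.Home_Goals == row.Away_Goals:
            team_points[row.Home_Team] += 1
            team_points[row.Away_Team] += 1
        else:
            team_points[row.Away_Team] += 3

    # assign returns a new dataframe and leaves the input untouched.
    return results_table.assign(Points_Home=points_home, Points_Away=points_away)

# Made-up fixtures: A beats B, C draws with A, C wins away at B.
toy = pd.DataFrame({
    "Home_Team": ["A", "C", "B"],
    "Away_Team": ["B", "A", "C"],
    "Home_Goals": [2, 1, 0],
    "Away_Goals": [0, 1, 3],
})
toy_points = pre_match_points(toy)
print(toy_points[["Points_Home", "Points_Away"]])
```

Because the lists hold integers, the new columns stay integer-typed instead of becoming floats.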


In [51]:
# Toy example: sort a dictionary by its values (ascending order)
x = {1: 2, 3: 4, 4: 3, 2: 1, 0: 0}

points_ordered = {k: v for k, v in sorted(x.items(), key=lambda item: item[1])}

points_ordered

{0: 0, 2: 1, 1: 2, 4: 3, 3: 4}

In [35]:
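# League position by points (goal difference still needs to be added as a tie-breaker).
# points_ordered is expected to map each team name to its points, sorted from highest to lowest.
position = {team : 0 for team in pl_results["Home_Team"].unique()}
for index, team in enumerate(points_ordered):
    position[team] = index+1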

position

{'Liverpool': 1,
 'West Ham': 16,
 'AFC Bournemouth': 17,
 'Burnley': 9,
 'Crystal Palace': 14,
 'Watford': 18,
 'Tottenham Hotspur': 6,
 'Leicester': 5,
 'Newcastle': 13,
 'Man. Utd': 3,
 'Arsenal': 8,
 'Aston Villa': 19,
 'Brighton & Hove Albion': 15,
 'Everton': 12,
 'Norwich City': 20,
 'Southampton': 11,
 'Man. City': 2,
 'Sheffield United': 10,
 'Chelsea': 4,
 'Wolves': 7}
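
The goal-difference tie-breaker noted in the position cell can be handled by sorting on a tuple. A minimal sketch, assuming `team_points` and `goal_difference` are dictionaries keyed by team name (both hypothetical here, with made-up values): teams are ranked by points and then by goal difference, both in descending order.

```python
# Hypothetical end-of-season tallies for four teams.
team_points = {"A": 10, "B": 7, "C": 10, "D": 3}
goal_difference = {"A": 5, "B": 1, "C": 8, "D": -14}

# Sort by points first, then goal difference; reverse=True puts the best team first.
ranked = sorted(
    team_points,
    key=lambda team: (team_points[team], goal_difference[team]),
    reverse=True,
)

# Positions start at 1 for the top team.
position = {team: rank for rank, team in enumerate(ranked, start=1)}
print(position)  # → {'C': 1, 'A': 2, 'B': 3, 'D': 4}
```

Teams A and C are level on points, so C's better goal difference places it first.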