# EPL Lineup Predictions by Dan Brooks

The goal of this notebook is to take data from the 2018-2019 EPL (English Premier League) season and build the “best possible team” from players across the league: one “super” team that combines players from different clubs at each position. The data comes from the [Fantasy Premier League 2018-2019 Dataset](https://www.kaggle.com/delayedkarma/fantasy-premier-league-20182019) on Kaggle, which was originally meant for forecasting fantasy soccer (FPL, Fantasy Premier League). FPL is a contest in which people choose players from different teams and score points according to how those players perform. I decided to disregard the fantasy portion of the data because I am not familiar with the scoring or how the contest works, and instead went with the best team according to a scoring metric that I devised. I chose this dataset simply because I enjoy soccer, and because it was a good problem for trying a combinatorial optimization algorithm that I have been wanting to use.

This dataset lends itself to a combinatorial algorithm because each player in the lineup has a different cost (or fitness) associated with them. Each goalkeeper, defender, midfielder and forward performs differently each week. Scores vary from week to week depending on who they are playing against, who they are playing with, how well they perform individually, and a variety of intangible factors that cannot be modeled. Deciding who to pick can be difficult, and that is where the algorithm comes in: it can try many combinations and choose the best one. One might ask why not simply score each player and take the top players at each position as the optimal solution. That works for a univariate cost function, but when you are trying to optimize multiple variables at once, that approach becomes rather difficult.
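
For comparison, the "take the max of each position" baseline mentioned above could be sketched as follows. This is only a minimal illustration that assumes a single score per player; the player names, scores, and the helper `top_k_per_position` are hypothetical and do not come from the notebook or the dataset.

```python
from collections import defaultdict

# Hypothetical toy data: (name, position, score). Not taken from the FPL dataset.
players = [
    ('A', 'GKP', 3.0), ('B', 'GKP', 5.0),
    ('C', 'DEF', 2.0), ('D', 'DEF', 7.0), ('E', 'DEF', 4.0),
    ('F', 'FWD', 6.0), ('G', 'FWD', 1.0),
]

def top_k_per_position(players, slots):
    '''Greedy baseline: take the highest-scoring players at each position.'''
    by_position = defaultdict(list)
    for name, position, score in players:
        by_position[position].append((score, name))

    lineup = []
    for position, k in slots.items():
        # Sort descending by score and keep the first k players
        ranked = sorted(by_position[position], reverse=True)
        lineup.extend(name for _, name in ranked[:k])
    return lineup

# One goalkeeper, two defenders, one forward
baseline = top_k_per_position(players, {'GKP': 1, 'DEF': 2, 'FWD': 1})
print(baseline)
```

A baseline like this only ever looks at one number per player, which makes it a handy sanity check for whatever a search over lineups returns on the same score.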


In [1]:
import random
import itertools

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

from operator import itemgetter

# Genetic Search Algorithm

The algorithm that I chose is called the “Genetic Search Algorithm”. I translated it from Ruby into Python ([my code here](https://github.com/DanielBrooks253/Clever-Algorithms)) and then modified the cost and mutation functions to satisfy the purpose of this particular problem. The original code comes from the book [Clever Algorithms by Jason Brownlee](https://github.com/clever-algorithms/CleverAlgorithms). The algorithm uses the idea of parents and gene mutations to find combinations that optimize a cost function. There are four main parts to the algorithm:
<ol>
<li>Create a random assortment of lineups (take the one with the largest fitness as the “best”)</li>
<li>Randomly compare two lineups and keep the better one (the selected group)</li>
<li>Take two parents from the selected group and combine them into children</li>
<li>Randomly mutate players within each child</li>
</ol>	
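
The four steps above can be sketched on a toy problem before looking at the full implementation. This is a simplified illustration, not the notebook's code: the pool of integer "players", the team size, and every function name here (`random_team`, `tournament`, `toy_crossover`, `toy_mutate`, `toy_search`) are assumptions made for the example.

```python
import random

POOL = list(range(20))   # hypothetical "players"; each one is worth its own value
TEAM_SIZE = 3

def fitness(team):
    return sum(team)

def random_team():
    # Step 1: a random team of distinct players
    return random.sample(POOL, TEAM_SIZE)

def tournament(population):
    # Step 2: compare two random teams and keep the better one
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def toy_crossover(p1, p2):
    # Step 3: build a child from the distinct players of both parents
    merged = list(dict.fromkeys(p1 + p2))
    random.shuffle(merged)
    return merged[:TEAM_SIZE]

def toy_mutate(team, rate):
    # Step 4: occasionally swap a player for one not already on the team
    child = team.copy()
    for i in range(len(child)):
        if random.random() < rate:
            options = [p for p in POOL if p not in child]
            child[i] = random.choice(options)
    return child

def toy_search(generations=50, pop_size=20, rate=0.1, seed=0):
    random.seed(seed)
    population = [random_team() for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        selected = [tournament(population) for _ in range(pop_size)]
        children = [toy_crossover(selected[i], selected[(i + 1) % pop_size])
                    for i in range(pop_size)]
        population = [toy_mutate(child, rate) for child in children]
        candidate = max(population, key=fitness)
        if fitness(candidate) >= fitness(best):
            best = candidate
    return best

best = toy_search()
print(best, fitness(best))
```

The full implementation below follows the same shape, with the extra work of keeping each position's players separate so that a child is always a valid formation.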


In [2]:
def point_mutation(lineup, rate, person_lookup, formation, week):
    '''
    Description:
        Randomly change players inside of a lineup

    Input:
        lineup(list): list of players
        rate(dbl): the rate at which the mutation happens
        person_lookup(list): players broken down by week and position
        formation(tuple): formation for the players
        week(int): the week of the season

    Output:
        child(list): the lineup with any mutations that occurred
    '''

    # Get the number of players by position that need to be on the field
    defense, midfield, forward = formation

    child = lineup.copy()

    # Loop through all of the slots in the lineup
    for idx, _ in enumerate(lineup):
        # Only mutate this slot if the random draw falls below the mutation rate.
        if random.uniform(0, 1) >= rate:
            continue

        # Map the lineup index to its position:
        # goalkeeper first, then defenders, midfielders and forwards.
        if idx == 0:
            position = 'GKP'
        elif idx <= defense:
            position = 'DEF'
        elif idx <= (defense + midfield):
            position = 'MID'
        else:
            position = 'FWD'

        # Candidates are the players at that position who are not already in the lineup.
        # Building the list up front lets every slot mutate (not just the first one per
        # position) and avoids looping forever when no replacement is available.
        candidates = [p for p in person_lookup[week][position] if p not in child]
        if candidates:
            child[idx] = random.choice(candidates)

    return child

def crossover(parent1, parent2, rate, formation):
    '''
    Description:
        Create a new child from a combination of two parents.

    Input:
        parent1(dict): parent containing a 'lineup' list
        parent2(dict): parent containing a 'lineup' list
        rate(dbl): the rate at which the crossover happens. If a random number
        is greater than or equal to the rate, parent 1's lineup is returned unchanged.
        formation(tuple): the formation that the team is playing.

    Output:
        a lineup (list) that is a possible combination of the two parents.
    '''

    # The number of players by position
    defense, midfield, forward = formation

    # Flags to determine if the players repeat within the two parents
    def_same_flag, def_same_count = (True, 0)
    mid_same_flag, mid_same_count = (True, 0)
    fwd_same_flag, fwd_same_count = (True, 0)

    # If a random draw falls below the crossover rate, take part of parent 1
    # and part of parent 2 and combine them into a new child.
    # Otherwise, return parent 1's lineup unchanged.
    if random.uniform(0, 1) >= rate:
        return parent1['lineup']
    else:
        # 50-50 chance to choose the goalie from parent 1 or parent 2
        if random.randint(0,1) == 0:
            new_gkp = [parent1['lineup'][0]]
        else:
            new_gkp = [parent2['lineup'][0]]

        # Defense
        # Choose a random number that decides how many players should be taken
        # from parent 1; the rest are taken from parent 2
        parent1_def = random.randint(0, defense)
        parent2_def = defense - parent1_def 

        # If no players from parent 1 should be taken, then use all of parent 2
        # If no players from parent 2 should be taken, then use all of parent 1
        if parent1_def == 0:
            new_def = parent2['lineup'][1:(defense+1)]
        elif parent2_def == 0:
            new_def = parent1['lineup'][1:(defense+1)]
        else:
            # Try to sample the players from the two parents. If any player appears in
            # both samples, resample. Repeat this process up to 4 times.
            # If the samples still overlap, randomly select parent 1 or parent 2. (Logic below)
            while def_same_flag and (def_same_count < 4):
                # range(defense) covers every defender slot (the upper bound is exclusive)
                def_random_parent1 = random.sample(range(defense), parent1_def)
                def_random_parent2 = random.sample(range(defense), parent2_def)

                def_sample_parent_1 = list(np.array(parent1['lineup'][1:(defense+1)])[def_random_parent1])
                def_sample_parent_2 = list(np.array(parent2['lineup'][1:(defense+1)])[def_random_parent2])

                def_same_check = sum([i in def_sample_parent_2 for i in def_sample_parent_1])

                if def_same_check > 0:
                    def_same_count += 1
                else:
                    def_same_flag = False
                    new_def = def_sample_parent_1 + def_sample_parent_2
        
        # Midfield
        # Same logic as above for midfield
        parent1_mid = random.randint(0, midfield)
        parent2_mid = midfield - parent1_mid 

        if parent1_mid == 0:
            new_mid = parent2['lineup'][(defense+1):(defense+midfield+1)]
        elif parent2_mid == 0:
            new_mid = parent1['lineup'][(defense+1):(defense+midfield+1)]
        else:
            while mid_same_flag and (mid_same_count < 4):
                # range(midfield) covers every midfield slot (the upper bound is exclusive)
                mid_random_parent1 = random.sample(range(midfield), parent1_mid)
                mid_random_parent2 = random.sample(range(midfield), parent2_mid)

                mid_sample_parent_1 = list(np.array(parent1['lineup'][(defense+1):(defense+midfield+1)])[mid_random_parent1])
                mid_sample_parent_2 = list(np.array(parent2['lineup'][(defense+1):(defense+midfield+1)])[mid_random_parent2])

                mid_same_check = sum([i in mid_sample_parent_2 for i in mid_sample_parent_1])

                if mid_same_check > 0:
                    mid_same_count += 1
                else:
                    mid_same_flag = False
                    new_mid = mid_sample_parent_1 + mid_sample_parent_2

        # Forward
        # If there is only one forward on the field, there is a 50-50 chance of
        # selecting them from parent 1 or parent 2.

        # If there is more than one forward; Same logic as above for forwards
        if forward == 1:
            if random.randint(0, 1) == 0:
                new_fwd = list(parent1['lineup'][-1:])
            else:
                new_fwd = list(parent2['lineup'][-1:])
        else:
            parent1_fwd = random.randint(0, forward)
            parent2_fwd = forward - parent1_fwd 

            if parent1_fwd == 0:
                new_fwd = parent2['lineup'][(defense+midfield+1):]
            elif parent2_fwd == 0:
                new_fwd = parent1['lineup'][(defense+midfield+1):]
            else:
                # Same retry limit as the other positions, matching the fallback check below
                while fwd_same_flag and (fwd_same_count < 4):
                    fwd_random_parent1 = random.sample(range(forward), parent1_fwd)
                    fwd_random_parent2 = random.sample(range(forward), parent2_fwd)

                    fwd_sample_parent_1 = list(np.array(parent1['lineup'][(defense+midfield+1):])[fwd_random_parent1])
                    fwd_sample_parent_2 = list(np.array(parent2['lineup'][(defense+midfield+1):])[fwd_random_parent2])

                    fwd_same_check = sum([i in fwd_sample_parent_2 for i in fwd_sample_parent_1])

                    if fwd_same_check > 0:
                        fwd_same_count += 1
                    else:
                        fwd_same_flag = False
                        new_fwd = fwd_sample_parent_1 + fwd_sample_parent_2
        
        # If the random sampling from the teams did not result in a new unique group
        # (i.e. there are repeats in the position), then randomly take the whole position
        # from parent 1 or parent 2.
        if fwd_same_count >= 4:
            if random.randint(0, 1) == 0:
                new_fwd = parent1['lineup'][(defense+midfield+1):]
            else:
                new_fwd = parent2['lineup'][(defense+midfield+1):]

        if mid_same_count >= 4 :
            if random.randint(0, 1) == 0:
                new_mid = parent1['lineup'][(defense+1):(defense+midfield+1)]
            else:
                new_mid = parent2['lineup'][(defense+1):(defense+midfield+1)]

        if def_same_count >= 4:
            if random.randint(0, 1) == 0:
                new_def = parent1['lineup'][1:(defense+1)]
            else:
                new_def = parent2['lineup'][1:(defense+1)]

        # Return the concatenated list of the new players.
        return new_gkp + new_def + new_mid + new_fwd

def reproduce(selected, pop_size, p_cross, p_mutation, formation, person_lookup, week):
    '''
    Description:
        Take the parent lineups and combine/mutate them into child lineups.

    Input:
        selected(list): list of dictionaries that are potential best lineups
        pop_size(int): number of candidates to consider
        p_cross(dbl): rate at which the parent-to-parent crossover happens
        p_mutation(dbl): rate at which the single point mutation will happen (changing random players)
        formation(tuple): formation for the players
        person_lookup(list): players broken down by week and position
        week(int): the week of the season

    Output:
        children(list): list of all the children created from the parents.
    '''
    children = []
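    # Pair each parent with a neighbor: even indices pair with the next item,
    # odd indices with the previous one. The last item wraps around to the first;
    # checking it first avoids indexing past the end of an odd-length list.
    for i, p1 in enumerate(selected):
        if i == (len(selected)-1):
            p2 = selected[0]
        elif i % 2 == 0:
            p2 = selected[i+1]
        else:
            p2 = selected[i-1]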

        child = {}
        # Tries to combine the two parents into new children
        child['lineup'] = crossover(p1, p2, p_cross, formation)

        # Randomly mutates points within the child
        child['lineup'] = point_mutation(child['lineup'], p_mutation, person_lookup, formation, week)
        children.append(child)
    
        if len(children) == pop_size:
            break

    return children

def binary_tournament(pop):
    '''
    Description:
        The function will randomly select two different lineups from the population pool.
        It will then compare the fitness of the two lineups against each other. Whichever
        lineup has the better fitness gets selected and moves on in the selection process.

    Input:
        pop(list): List of dictionaries containing the fitness and lineup for each item under
        consideration

    Output:
        dictionary containing the lineup that won the "tournament"

        {lineup: [],
         fitness: float}
    '''
    # Randomly select two lineups from the overall list
    i, j = random.randint(0, (len(pop)-1)), random.randint(0, (len(pop)-1))

    # Resample until the two indices are different
    while i == j:
        j = random.randint(0, len(pop)-1)

    # Select the lineup with the higher fitness to move forward.
    if pop[i]['fitness'] >= pop[j]['fitness']:
        return pop[i]
    else: 
        return pop[j]

def onemax(lineup, player_scores_weeks, week, player_cost_week, cost):
    '''
    Fitness of a lineup. If cost is None, it is the sum of the player scores.
    Otherwise it is the scaled sum of the player scores minus the sum of the player costs.
    '''
    if cost is None:
        scores = [player_scores_weeks[week][i] for i in lineup]
        return sum(scores)
    else:
        if week == 0: # Week 0 is a projected week for the whole season, so it needs to be scaled down
            scores = sum([player_scores_weeks[week][i] for i in lineup])/5
        else:
            # Scores in the other weeks are smaller than the player costs, so they need to be scaled up
            scores = sum([player_scores_weeks[week][i] for i in lineup])*15

        cost = sum([player_cost_week[week][i] for i in lineup])
        return scores - cost

def random_lineup(formation, person_lookup, week):
    '''
    Description:
        Create a random list of players

    Input:
        formation(tuple): tuple containing the number of defenders, midfielders and forwards in the formation
        person_lookup(list): for each week, a dictionary containing the names of the players for each position
        week(int): the week of the season

    Output:
        A list of randomly generated players.
    '''
    defense, midfield, forward = formation

    # Sample distinct indices across every available player at each position
    # (range excludes its upper bound, so no -1 is needed)
    goalie_idx = random.sample(range(len(person_lookup[week]['GKP'])), 1)
    defense_idx = random.sample(range(len(person_lookup[week]['DEF'])), defense)
    midfield_idx = random.sample(range(len(person_lookup[week]['MID'])), midfield)
    forward_idx = random.sample(range(len(person_lookup[week]['FWD'])), forward)

    goalies = list(person_lookup[week]['GKP'][goalie_idx])
    defenders = list(person_lookup[week]['DEF'][defense_idx])
    midies = list(person_lookup[week]['MID'][midfield_idx])
    forwards = list(person_lookup[week]['FWD'][forward_idx])

    return goalies + defenders + midies + forwards

def search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week, cost = None):
    '''
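Description:
    This function contains the main logic of the program. It will search through
    the various combinations of children and parents to find a solution that will
    optimize the given cost function. In this case, a solution is a lineup of players.

Input:
    max_gens(int): The maximum number of generations to loop through to find an optimal
    solution. The search stops early after 5 generations in a row with no change in the best fitness.
    formation(tuple): The number of defenders, midfielders and forwards.
    pop_size(int): The number of lineups in each generation.
    p_crossover(dbl): The probability of mixing two parents vs. keeping the first parent.
        ex) p_crossover = .99; probability of skipping crossover for a parent = .01
    p_mutation(dbl): The probability of changing a single position in a lineup.
    person_lookup(list): Players broken down by week and position.
    player_scores_weeks(list): For each week, a dictionary mapping players to their scores.
    week(int): The week of the season.
    player_cost_week(list): For each week, a dictionary mapping players to their costs.
    cost: If None, the fitness is the sum of the scores; otherwise the player costs are subtracted.

Output:
    best(dict): The best lineup found, along with its fitness.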
    '''
    early_stop = 0

    # Create a random set of possible solutions
    population = [{'lineup':random_lineup(formation, person_lookup, week)} for i in range(pop_size)]
    
    # Find the fitness of each solution
    for idx,_ in enumerate(population):
        population[idx]['fitness'] = onemax(population[idx]['lineup'], player_scores_weeks, week, player_cost_week, cost)

    # Sort the lineups by fitness and keep the one with the highest fitness as the best solution
    best = sorted(population, key=itemgetter('fitness'), reverse=True)[0]

    # Loop through the number of generations
    for gen in range(max_gens):
        old_fitness = best['fitness']

        # Randomly select possible offspring
        selected = [binary_tournament(population) for _ in range(len(population))]
        # Reproduce children based off of the parent 
        children = reproduce(selected, pop_size, p_crossover, p_mutation, formation, person_lookup, week)

        for idx, _ in enumerate(children):
            children[idx]['fitness'] = onemax(children[idx]['lineup'], player_scores_weeks, week, player_cost_week, cost)

        sort_children =  sorted(children, key=itemgetter('fitness'), reverse=True) 

        # If the best child has a higher cost than current, replace the current with child
        if sort_children[0]['fitness'] >= best['fitness']:
            best = sort_children[0].copy()

        population = sort_children.copy()

        print('Gen: {0}, Fitness: {1}, Best: {2} '.format(gen, best['fitness'], best['lineup']))

        if old_fitness == best['fitness']:
            early_stop += 1
        else:
            early_stop = 0

        if early_stop == 5:
            break

    return best

# Data Prep

The data consists of 12 CSV files, representing 12 different weeks of the EPL. The first file, denoted with `Wk0` at the end, contains the projected stats for each player: what the analysts, or the creator of the file, believe the player will accomplish over the whole season. Since that is a preseason measure, the cost of a lineup for that week is higher than for the subsequent weeks, and the formulas used in the program have been scaled accordingly. Also, some of the players' names repeat in the dataset. They could have been traded midweek, or they could simply share a last name (there is no first-name qualifier in the data). Therefore, I made an adjusted name column that concatenates the player's name with the team that they play for. Finally, some players did not play during a given week, meaning all of their stats were zero. Those players were filtered out of the dataset for that week and could not make a roster.
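
The two cleaning steps described above (a unique name key and dropping players with no stats) might look roughly like this on a tiny frame. The rows below are made up for illustration, and the zero check on raw columns is a simplified stand-in: the notebook itself filters on the computed score rather than on the raw stats.

```python
import pandas as pd

# Hypothetical rows that mimic a few of the dataset's columns
week_df = pd.DataFrame({
    'Name': ['Smith', 'Smith', 'Jones'],
    'Team': ['BOU', 'LIV', 'MCI'],
    'Minutes': [90, 0, 45],
    'Influence': [16.2, 0.0, 8.0],
})

# Concatenate name and team so that repeated last names become unique keys
week_df['New_Name'] = week_df['Name'] + ', ' + week_df['Team']

# Keep only players with at least one non-zero stat (simplified assumption)
stat_cols = ['Minutes', 'Influence']
played = week_df[(week_df[stat_cols] != 0).any(axis=1)]

print(played['New_Name'].tolist())
```

With a key like `New_Name`, each player maps cleanly to one dictionary entry per week, which is what the score and cost lookups rely on later.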

In [3]:
def normalize_data(col):
    '''
    Helper function to normalize an array of data between 0 (minimum) and 1 (maximum)

    Input:
        col(list/series): column from a dataframe that is to be normalized

    Output:
        Normalized array with values between 0 and 1
    '''
    return (col-np.min(col))/(np.max(col)-np.min(col))

def scoring(df, label, week):
    '''
    This function will go through and calculate the scores for each player at their
    given position. The scores are used by the algorithm's cost function to figure out
    what the total "cost" would be to add a particular player to the roster.

    Input:
        df(pd.dataframe): List of all the players for a given position
        label(str): label for the position of the players in the dataframe
        week(int): the week of the season the data is in.
            Week 0 is "preseason" and uses projected numbers. The coefficients
            of the equations have to be adjusted accordingly

    Output:
        a dataframe with the calculated scores appended on the end.
    '''

    data = df.copy()

    # Multiplier for the coefficients (each team plays 38 games in an EPL season)
    if week == 0:
        ceof_multi = 38
    else:
        ceof_multi = 1

    # Equation to calculate the goalkeepers scores
    if label == 'GKP':
        data['Norm_Influence'] = 5.0*normalize_data(data['Influence'])
        data['Scores'] = 10*ceof_multi - \
                          1.5*ceof_multi*data['Goals_conceded'] + \
                          2.0*ceof_multi*data['Penalties_saved'] + \
                          ceof_multi*data['Saves'] - \
                          3.0*ceof_multi*data['Red_cards'] + \
                          2.5*ceof_multi*data['Norm_Influence']

        # If the score is equal to the intercept, then the player has no stats to be considered
        # Therefore they will not be on a roster.
        
        return data[data['Scores'] != 10*ceof_multi]

    # Equation to calculate the defenders scores
    elif label == 'DEF':
        data['Norm_Influence'] = 5.0*normalize_data(data['Influence'])
        data['Norm_Threat'] = 5.0*normalize_data(data['Threat'])
        data['Norm_Creativity'] = 5.0*normalize_data(data['Creativity'])
        data['Scores'] = 10*ceof_multi - \
                            ceof_multi*data['Goals_conceded'] + \
                          5*ceof_multi*data['Goals_scored'] + \
                          2.5*ceof_multi*data['Norm_Influence'] - \
                          1.5*ceof_multi*data['Norm_Threat'] + \
                          1.5*ceof_multi*data['Norm_Creativity'] - \
                          2.0*ceof_multi*data['Red_cards'] + \
                          ceof_multi*data['Assists'] - \
                          2.0*ceof_multi*data['Own_goals']  
        return data[data['Scores'] != 10*ceof_multi]     

    # Equation to calculate the Midfielders scores
    elif label == 'MID':
        data['Norm_Influence'] = 5.0*normalize_data(data['Influence'])
        data['Norm_Creativity'] = 5.0*normalize_data(data['Creativity'])
        data['Norm_Threat'] = 5.0*normalize_data(data['Threat'])
        data['Scores'] = 10*ceof_multi - \
                        ceof_multi*data['Goals_conceded'] + \
                        2.0*ceof_multi*data['Goals_scored'] + \
                        2.5*ceof_multi*data['Norm_Influence'] + \
                        1.5*ceof_multi*data['Norm_Threat'] + \
                        5.0*ceof_multi*data['Norm_Creativity'] - \
                        2.0*ceof_multi*data['Red_cards'] + \
                        2.0*ceof_multi*data['Assists'] - \
                        2.0*ceof_multi*data['Own_goals']

        return data[data['Scores'] != 10*ceof_multi] 

    # Equation to calculate the Forwards scores
    else:
        data['Norm_Influence'] = 5.0*normalize_data(data['Influence'])
        data['Norm_Creativity'] = 5.0*normalize_data(data['Creativity'])
        data['Norm_Threat'] = 5.0*normalize_data(data['Threat'])
        data['Scores'] = 10*ceof_multi + \
                            2.0*ceof_multi*data['Goals_scored'] + \
                            2.5*ceof_multi*data['Norm_Influence'] + \
                            1.5*ceof_multi*data['Norm_Creativity'] + \
                            5.0*ceof_multi*data['Norm_Threat'] - \
                            2.5*ceof_multi*data['Red_cards'] + \
                            ceof_multi*data['Assists']
        return data[data['Scores'] != 10*ceof_multi] 

In [4]:
PATH = 'https://raw.githubusercontent.com/DanielBrooks253/Kaggle/main/Soccer_Team/Data/'
filenames = ['FPL_2018_19_Wk0.csv', # Projected Scores
             'FPL_2018_19_Wk1.csv',
             'FPL_2018_19_Wk2.csv',
             'FPL_2018_19_Wk3.csv',
             'FPL_2018_19_Wk4.csv',
             'FPL_2018_19_Wk5.csv',
             'FPL_2018_19_Wk6.csv',
             'FPL_2018_19_Wk7.csv',
             'FPL_2018_19_Wk8.csv',
             'FPL_2018_19_Wk9.csv',
             'FPL_2018_19_Wk10.csv',
             'FPL_2018_19_Wk11.csv']

person_lookup = {'DEF': [],
                 'GKP': [],
                 'MID': [],
                 'FWD': []}

# Read each week's file into its own dataframe
weeks = [pd.read_csv(PATH + i) for i in filenames]
adjust_names = weeks.copy()

# Some of the names of the players repeat.
# This causes issues when trying to create dictionary keys later on.
# Therefore I concatenated the team name with the player's name to make a unique key.
for idx, _ in enumerate(adjust_names):
    adjust_names[idx]['New_Name'] = adjust_names[idx]['Name'] + ', ' + adjust_names[idx]['Team']  

In [5]:
adjust_names[0]

Unnamed: 0,Name,Team,Position,Cost,Creativity,Influence,Threat,ICT,Goals_conceded,Goals_scored,...,Penalties_missed,Penalties_saved,Saves,Yellow_cards,Red_cards,TSB,Minutes,Bonus,Points,New_Name
0,Adam Smith,BOU,DEF,45,345.5,455.0,144.0,94.5,38,1,...,0,0,0,6,0,0.3,2067,3,56,"Adam Smith, BOU"
1,Adrian,WHU,GKP,45,0.0,470.4,0.0,47.0,29,0,...,0,0,69,2,0,0.6,1710,5,72,"Adrian, WHU"
2,Aguero,MCI,FWD,110,570.8,966.4,1484.0,302.5,12,21,...,0,0,0,2,0,12.6,1960,22,169,"Aguero, MCI"
3,Ake,BOU,DEF,50,115.1,932.4,287.0,133.5,59,2,...,0,0,0,5,0,5.7,3352,8,102,"Ake, BOU"
4,Albrighton,LEI,MID,55,718.3,580.0,300.0,160.2,42,2,...,0,0,0,5,1,1.1,2533,12,107,"Albrighton, LEI"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
467,Zohore,CAR,FWD,50,0.0,0.0,0.0,0.0,0,0,...,0,0,0,0,0,1.1,0,0,0,"Zohore, CAR"
468,van Aanholt,CRY,DEF,55,303.8,537.0,319.0,116.3,38,5,...,0,0,0,7,0,5.6,2182,7,95,"van Aanholt, CRY"
469,van Dijk,LIV,DEF,60,122.6,601.0,277.0,100.2,28,0,...,0,0,0,1,0,14.4,2253,5,78,"van Dijk, LIV"
470,van La Parra,HUD,MID,50,261.6,322.6,544.0,112.7,36,3,...,0,0,0,3,1,0.1,2131,3,74,"van La Parra, HUD"


In [6]:
adjust_names[1]

Unnamed: 0,Name,Team,Position,Cost,Creativity,Influence,Threat,ICT,Goals_conceded,Goals_scored,...,Penalties_missed,Penalties_saved,Saves,Yellow_cards,Red_cards,TSB,Minutes,Bonus,Points,New_Name
0,Abraham,CHE,FWD,55,0.0,0.0,0.0,0.0,0,0,...,0,0,0,0,0,0.2,0,0,0,"Abraham, CHE"
1,Adam Smith,BOU,DEF,45,15.5,16.2,0.0,3.2,0,0,...,0,0,0,0,0,0.6,90,0,6,"Adam Smith, BOU"
2,Adrian,WHU,GKP,45,0.0,0.0,0.0,0.0,0,0,...,0,0,0,0,0,0.6,0,0,0,"Adrian, WHU"
3,Aguero,MCI,FWD,110,29.6,8.2,39.0,7.7,0,0,...,0,0,0,0,0,32.0,78,0,2,"Aguero, MCI"
4,Ake,BOU,DEF,50,1.1,35.0,0.0,3.6,0,0,...,0,0,0,0,0,3.9,90,2,8,"Ake, BOU"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
520,Zohore,CAR,FWD,50,0.0,0.0,0.0,0.0,0,0,...,0,0,0,0,0,0.7,0,0,0,"Zohore, CAR"
521,van Aanholt,CRY,DEF,55,22.4,35.6,23.0,8.1,0,0,...,0,0,0,0,0,5.7,90,2,11,"van Aanholt, CRY"
522,van Dijk,LIV,DEF,60,1.3,14.2,0.0,1.6,0,0,...,0,0,0,0,0,18.3,90,0,6,"van Dijk, LIV"
523,van La Parra,HUD,MID,50,0.0,0.0,0.0,0.0,0,0,...,0,0,0,0,0,0.1,0,0,0,"van La Parra, HUD"


In [7]:
# Get the scores for all of the positions for all of the available weeks
list_player_score_df = [pd.concat(list(map(lambda x: scoring(adjust_names[i][adjust_names[i]['Position'] == x], x, i), ['DEF', 'FWD', 'GKP', 'MID'])), axis=0)
                           for i in range(0, 12)]

player_cost_week = [dict(zip(adjust_names[i]['New_Name'],adjust_names[i]['Cost'])) for i in range(0, 12)]

# Create a list of dictionaries for each week.
# The dictionary maps the players name to their scores
player_scores_weeks = [dict(zip(i['New_Name'], i['Scores'])) for i in list_player_score_df]

# Filter the weeks to only players who played 
player_score_keys = [list(i.keys()) for i in player_scores_weeks]
filtered_weeks = [i[i['New_Name'].isin(j)] for i,j in zip(adjust_names, player_score_keys)]

In [8]:
# Get the mapping from the position to the person
filter_person = [list(map(lambda x: filtered_weeks[i][filtered_weeks[i]['Position'] == x], ['DEF', 'FWD', 'GKP', 'MID'])) for i in range(0, 12)]
person_lookup = [{j['Position'].unique()[0]:np.array(j['New_Name']) for j in i} for i in filter_person]

# Different Formations (# Defenders - # Midfielders - # Forwards)
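
Every lineup in this notebook is a flat list ordered goalkeeper, defenders, midfielders, forwards, so a formation tuple fully determines where each position starts and ends. As a small sketch of that layout (the helper `split_lineup` and the placeholder player labels are hypothetical, not part of the notebook):

```python
def split_lineup(lineup, formation):
    '''Split a flat lineup into positions using a (DEF, MID, FWD) formation.'''
    defense, midfield, forward = formation
    expected = 1 + defense + midfield + forward
    if len(lineup) != expected:
        raise ValueError('lineup has {0} players, expected {1}'.format(len(lineup), expected))

    return {
        'GKP': lineup[:1],
        'DEF': lineup[1:defense + 1],
        'MID': lineup[defense + 1:defense + midfield + 1],
        'FWD': lineup[defense + midfield + 1:],
    }

# Placeholder labels for a 4-3-3
example = ['GK', 'D1', 'D2', 'D3', 'D4', 'M1', 'M2', 'M3', 'F1', 'F2', 'F3']
parts = split_lineup(example, (4, 3, 3))
print(parts)
```

These are the same slice boundaries that the print statements in the cells below use to report each position.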

## 4-3-3

In [9]:
formation = (4,3,3)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 0

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 13876.207379359008, Best: ['Hamer, HUD', 'Young, MUN', 'Emerson, CHE', 'Smalling, MUN', 'Delph, MCI', 'Sterling, MCI', 'Henderson, LIV', 'Salah, LIV', 'Austin, SOU', 'Depoitre, HUD', 'Jesus, MCI'] 
Gen: 1, Fitness: 16128.91897234853, Best: ['De Gea, MUN', 'Kongolo, HUD', 'Delph, MCI', 'Smalling, MUN', 'Emerson, CHE', 'Salah, LIV', 'Surman, BOU', 'McTominay, MUN', 'Aguero, MCI', 'Giroud, CHE', 'Jesus, MCI'] 
Gen: 2, Fitness: 16128.91897234853, Best: ['De Gea, MUN', 'Kongolo, HUD', 'Delph, MCI', 'Smalling, MUN', 'Emerson, CHE', 'Salah, LIV', 'Surman, BOU', 'McTominay, MUN', 'Aguero, MCI', 'Giroud, CHE', 'Jesus, MCI'] 
Gen: 3, Fitness: 16128.91897234853, Best: ['De Gea, MUN', 'Kongolo, HUD', 'Delph, MCI', 'Smalling, MUN', 'Emerson, CHE', 'Salah, LIV', 'Surman, BOU', 'McTominay, MUN', 'Aguero, MCI', 'Giroud, CHE', 'Jesus, MCI'] 
Gen: 4, Fitness: 16173.4726608942, Best: ['De Gea, MUN', 'Young, MUN', 'Emerson, CHE', 'Delph, MCI', 'Smalling, MUN', 'Noble, WHU', 'Surman, BOU',

## 3-6-1

In [10]:
formation = (3,6,1)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 7

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 254.50692445987, Best: ['Hart, BUR', 'Jagielka, EVE', 'Sakho, CRY', 'Lovren, LIV', 'Stanislas, BOU', 'Mane, LIV', 'David Silva, MCI', 'Arter, CAR', 'Davis, SOU', 'Cavaleiro, WOL', 'Aguero, MCI'] 
Gen: 1, Fitness: 254.50692445987, Best: ['Hart, BUR', 'Jagielka, EVE', 'Sakho, CRY', 'Lovren, LIV', 'Stanislas, BOU', 'Mane, LIV', 'David Silva, MCI', 'Arter, CAR', 'Davis, SOU', 'Cavaleiro, WOL', 'Aguero, MCI'] 
Gen: 2, Fitness: 254.50692445987, Best: ['Hart, BUR', 'Jagielka, EVE', 'Sakho, CRY', 'Lovren, LIV', 'Stanislas, BOU', 'Mane, LIV', 'David Silva, MCI', 'Arter, CAR', 'Davis, SOU', 'Cavaleiro, WOL', 'Aguero, MCI'] 
Gen: 3, Fitness: 258.4602509699841, Best: ['Hart, BUR', 'Souare, CRY', 'Delph, MCI', 'Doherty, WOL', 'Stanislas, BOU', 'Alli, TOT', 'Willian, CHE', 'Mane, LIV', 'Pedro, CHE', 'Armstrong, SOU', 'Tosun, EVE'] 
Gen: 4, Fitness: 258.4602509699841, Best: ['Hart, BUR', 'Souare, CRY', 'Delph, MCI', 'Doherty, WOL', 'Stanislas, BOU', 'Alli, TOT', 'Willian, CHE', 'Mane
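
The slicing inside the print statement above can be read as a small helper that splits the flat lineup list by formation. The sketch below is illustrative only: `split_lineup` is a hypothetical name that does not appear in the notebook, and it assumes the ordering the print statement relies on (goalkeeper first, then defenders, midfielders and forwards). The example lineup is the generation 0 result from the 3-6-1 run above.

```python
def split_lineup(lineup, formation):
    """Split a flat lineup into positional groups.

    Hypothetical helper (not part of the notebook). Assumes the list is
    ordered goalkeeper, defenders, midfielders, forwards, as the print
    statements in the notebook imply.
    """
    n_def, n_mid, n_fwd = formation
    # One goalkeeper plus the outfield players must account for every name.
    if 1 + n_def + n_mid + n_fwd != len(lineup):
        raise ValueError("formation does not match lineup length")
    def_end = 1 + n_def
    mid_end = def_end + n_mid
    return {
        "GKP": lineup[0],
        "DEF": lineup[1:def_end],
        "MID": lineup[def_end:mid_end],
        "FWD": lineup[mid_end:],
    }


# Generation 0 lineup from the 3-6-1 run above.
gen0 = ['Hart, BUR', 'Jagielka, EVE', 'Sakho, CRY', 'Lovren, LIV',
        'Stanislas, BOU', 'Mane, LIV', 'David Silva, MCI', 'Arter, CAR',
        'Davis, SOU', 'Cavaleiro, WOL', 'Aguero, MCI']

groups = split_lineup(gen0, (3, 6, 1))
for position, players in groups.items():
    print(position, players)
```

The length check also catches a formation tuple that does not add up to ten outfield players, which the inline slicing would silently accept.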

## 4-5-1

In [11]:
formation = (4,5,1)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 5

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 241.3680434738859, Best: ['Gazzaniga, TOT', 'Kelly, CRY', 'Ecuele Manga, CAR', 'Holebas, WAT', 'Vertonghen, TOT', 'Walcott, EVE', 'Armstrong, SOU', 'Sterling, MCI', 'Mane, LIV', 'Sobhi, HUD', 'Arnautovic, WHU'] 
Gen: 1, Fitness: 263.67877868779277, Best: ['Bettinelli, FUL', 'Kelly, CRY', 'Ecuele Manga, CAR', 'Holebas, WAT', 'Kompany, MCI', 'Mane, LIV', 'Sterling, MCI', 'Armstrong, SOU', 'Walcott, EVE', 'Salah, LIV', 'Muto, NEW'] 
Gen: 2, Fitness: 274.54477225172576, Best: ['Gazzaniga, TOT', 'Stephens, SOU', 'Holebas, WAT', 'Fernandez, NEW', 'Vertonghen, TOT', 'Salah, LIV', 'Dier, TOT', 'Jorginho, CHE', 'Kante, CHE', 'Sterling, MCI', 'Vietto, FUL'] 
Gen: 3, Fitness: 274.54477225172576, Best: ['Gazzaniga, TOT', 'Stephens, SOU', 'Holebas, WAT', 'Fernandez, NEW', 'Vertonghen, TOT', 'Salah, LIV', 'Dier, TOT', 'Jorginho, CHE', 'Kante, CHE', 'Sterling, MCI', 'Vietto, FUL'] 
Gen: 4, Fitness: 288.02649796995115, Best: ['Gazzaniga, TOT', 'Holebas, WAT', 'Fernandez, NEW', 'Ecuele

## 4-4-2

In [12]:
formation = (4,4,2)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 10

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 242.25221241011405, Best: ['Hart, BUR', 'Darmian, MUN', 'Bryan, FUL', 'Ake, BOU', 'Robertson, LIV', 'Willian, CHE', 'Ki Sung-yueng, NEW', 'Sanchez, MUN', 'Izquierdo, BHA', 'Lukaku, MUN', 'Joselu, NEW'] 
Gen: 1, Fitness: 266.9785536900193, Best: ['Hart, BUR', 'Schindler, HUD', 'Alonso, CHE', 'Trippier, TOT', 'Rice, WHU', 'Zinchenko, MCI', 'Lamela, TOT', 'Zambo Anguissa, FUL', 'David Silva, MCI', 'Aubameyang, ARS', 'Mounie, HUD'] 
Gen: 2, Fitness: 278.23334069261364, Best: ['Hart, BUR', 'Darmian, MUN', 'Ake, BOU', 'Kiko Femenia, WAT', 'Rice, WHU', 'Sanchez, MUN', 'Willian, CHE', 'David Silva, MCI', 'Fabregas, CHE', 'Lukaku, MUN', 'Joselu, NEW'] 
Gen: 3, Fitness: 284.5074092635824, Best: ['Hart, BUR', 'Bryan, FUL', 'David Luiz, CHE', 'Trippier, TOT', 'Alonso, CHE', 'Sanchez, MUN', 'Ki Sung-yueng, NEW', 'Willian, CHE', 'Zinchenko, MCI', 'Morata, CHE', 'Lukaku, MUN'] 
Gen: 4, Fitness: 289.80662044001673, Best: ['Hart, BUR', 'Trippier, TOT', 'Schindler, HUD', 'Alonso, CHE', 

# Best Team for Lowest Player Cost

## 4-3-3

In [13]:
formation = (4,3,3)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 1

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week, True)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 3337.8435330811562, Best: ['Cech, ARS', 'Bamba, CAR', 'Robertson, LIV', 'Vertonghen, TOT', 'Bryan, FUL', 'Mane, LIV', 'Ward, CAR', 'McDonald, FUL', 'Wilson, BOU', 'Reid, CAR', 'Sturridge, LIV'] 
Gen: 1, Fitness: 3337.8435330811562, Best: ['Cech, ARS', 'Bamba, CAR', 'Robertson, LIV', 'Vertonghen, TOT', 'Bryan, FUL', 'Mane, LIV', 'Ward, CAR', 'McDonald, FUL', 'Wilson, BOU', 'Reid, CAR', 'Sturridge, LIV'] 
Gen: 2, Fitness: 3337.8435330811562, Best: ['Cech, ARS', 'Bamba, CAR', 'Robertson, LIV', 'Vertonghen, TOT', 'Bryan, FUL', 'Mane, LIV', 'Ward, CAR', 'McDonald, FUL', 'Wilson, BOU', 'Reid, CAR', 'Sturridge, LIV'] 
Gen: 3, Fitness: 3616.8498968866124, Best: ['McCarthy, SOU', 'Bamba, CAR', 'Kongolo, HUD', 'Coady, WOL', 'Chambers, FUL', 'Fraser, BOU', 'Elyounoussi, SOU', 'Doucoure, WAT', 'Jimenez, WOL', 'Reid, CAR', 'Wilson, BOU'] 
Gen: 4, Fitness: 3776.8140123898356, Best: ['McCarthy, SOU', 'Robertson, LIV', 'Gomez, LIV', 'Chambers, FUL', 'Hoedt, SOU', 'Fraser, BOU', 'Elyou

## 3-6-1

In [14]:
formation = (3, 6, 1)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 11

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week, True)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 3034.106836178913, Best: ['Patricio, WOL', 'Schlupp, CRY', 'Clark, NEW', 'Simpson, LEI', 'Kouyate, CRY', 'Neves, WOL', 'Diakhaby, HUD', 'Zinchenko, MCI', 'Fraser, BOU', 'Gunnarsson, CAR', 'Aguero, MCI'] 
Gen: 1, Fitness: 3317.2398638114137, Best: ['Patricio, WOL', 'Fosu-Mensah, FUL', 'Schlupp, CRY', 'Clark, NEW', 'Deulofeu, WAT', 'Damour, CAR', 'Paterson, CAR', 'Sterling, MCI', 'Brady, BUR', 'Neves, WOL', 'Aguero, MCI'] 
Gen: 2, Fitness: 3520.405562871034, Best: ['Pickford, EVE', 'Schlupp, CRY', 'Clark, NEW', 'Simpson, LEI', 'Neves, WOL', 'Xhaka, ARS', 'Shaqiri, LIV', 'Fraser, BOU', 'Hayden, NEW', 'Sobhi, HUD', 'Aguero, MCI'] 
Gen: 3, Fitness: 3618.8817203165436, Best: ['Pickford, EVE', 'Schlupp, CRY', 'Clark, NEW', 'Simpson, LEI', 'Doucoure, WAT', 'Fernandinho, MCI', 'Deulofeu, WAT', 'Ralls, CAR', 'Barkley, CHE', 'Fraser, BOU', 'Aguero, MCI'] 
Gen: 4, Fitness: 3933.8589429110325, Best: ['Pickford, EVE', 'Schlupp, CRY', 'Clark, NEW', 'Simpson, LEI', 'Sterling, MCI', 'Z

## 4-5-1

In [15]:
formation = (4, 5, 1)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 1

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week, True)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 2529.072309366634, Best: ['Fabianski, WHU', 'Kabasele, WAT', 'Steve Cook, BOU', 'Hadergjonaj, HUD', 'Ecuele Manga, CAR', 'Schurrle, FUL', 'Atsu, NEW', 'Cork, BUR', 'Milner, LIV', 'De Bruyne, MCI', 'Joselu, NEW'] 
Gen: 1, Fitness: 2606.938346392383, Best: ['Fabianski, WHU', 'Rice, WHU', 'Bertrand, SOU', 'Steve Cook, BOU', 'Hadergjonaj, HUD', 'Jota, WOL', 'Milner, LIV', 'Cork, BUR', 'Atsu, NEW', 'Schurrle, FUL', 'Joselu, NEW'] 
Gen: 2, Fitness: 2833.9018865872386, Best: ['Dubravka, NEW', 'Ecuele Manga, CAR', 'Holebas, WAT', 'Kabasele, WAT', 'Hadergjonaj, HUD', 'Doucoure, WAT', 'Hughes, WAT', 'Schurrle, FUL', 'Milner, LIV', 'Cork, BUR', 'Joselu, NEW'] 
Gen: 3, Fitness: 2833.901886587239, Best: ['Dubravka, NEW', 'Ecuele Manga, CAR', 'Holebas, WAT', 'Kabasele, WAT', 'Hadergjonaj, HUD', 'Doucoure, WAT', 'Milner, LIV', 'Hughes, WAT', 'Schurrle, FUL', 'Cork, BUR', 'Joselu, NEW'] 
Gen: 4, Fitness: 2856.8613822043603, Best: ['Dubravka, NEW', 'Schlupp, CRY', 'Rice, WHU', 'Francis

## 4-4-2

In [16]:
formation = (4,4,2)
max_gens = 100
pop_size = 30
p_crossover = .99
p_mutation = 1.0/64
week = 3

lineups = search(max_gens, formation, pop_size, p_crossover, p_mutation, person_lookup, player_scores_weeks, week, player_cost_week, True)

print('\n ***************************************************************************** \n' 
'                        Best Lineup for week {0}                                    \n' 
'                         Formation: {1}-{2}-{3}                                     \n' 
'***************************************************************************** \n'                    
'GKP: {4} \n'
'DEF: {5} \n'
'MID: {6} \n'
'FWD: {7}'.format(week, formation[0], formation[1], formation[2], lineups['lineup'][0], 
                                                                  lineups['lineup'][1:(formation[0]+1)],
                                                                  lineups['lineup'][(formation[0]+1):(formation[0]+formation[1]+1)],
                                                                  lineups['lineup'][(formation[0]+formation[1]+1):]))

Gen: 0, Fitness: 3428.879908396444, Best: ['McCarthy, SOU', 'Zanka, HUD', 'Taylor, BUR', 'Stones, MCI', 'Steve Cook, BOU', 'Pogba, MUN', 'Mane, LIV', 'Stephens, BHA', 'Pereyra, WAT', 'Niasse, EVE', 'Ings, SOU'] 
Gen: 1, Fitness: 3428.879908396444, Best: ['McCarthy, SOU', 'Zanka, HUD', 'Taylor, BUR', 'Stones, MCI', 'Steve Cook, BOU', 'Pogba, MUN', 'Mane, LIV', 'Stephens, BHA', 'Pereyra, WAT', 'Niasse, EVE', 'Ings, SOU'] 
Gen: 2, Fitness: 3470.2559535302426, Best: ['McCarthy, SOU', 'Steve Cook, BOU', 'Daniels, BOU', 'Jagielka, EVE', 'Robertson, LIV', 'Ralls, CAR', 'Jota, WOL', 'Henderson, LIV', 'Fernandinho, MCI', 'Lukaku, MUN', 'Ings, SOU'] 
Gen: 3, Fitness: 3809.637687205837, Best: ['Fabri, FUL', 'Jagielka, EVE', 'Jonny, WOL', 'Robertson, LIV', 'Schindler, HUD', 'Moutinho, WOL', 'Gosling, BOU', 'Mane, LIV', 'Pogba, MUN', 'Lukaku, MUN', 'Ings, SOU'] 
Gen: 4, Fitness: 3939.121863215036, Best: ['McCarthy, SOU', 'Zanka, HUD', 'Jagielka, EVE', 'Steve Cook, BOU', 'Daniels, BOU', 'Mane, LIV',

# Conclusion

The data consisted of 12 different CSV files, representing 12 different weeks of the EPL. The first file, denoted with a wk0 at the end, contains the projected stats for each player. This is what the analysts, or the creator of the file, believe the player will accomplish in total over the whole season. Because that is a preseason, season-long measure, the cost of the lineup for that week is higher than for the subsequent weeks. For that reason, the formulas used in the program have been scaled accordingly.
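
As a rough illustration of what putting the wk0 file on the same footing as the weekly files could look like, the sketch below divides a season-long projection by the number of gameweeks. This is an assumption for illustration and not necessarily the scaling the notebook applies: `to_weekly_scale` is a hypothetical name, and the only outside fact used is that an EPL season has 38 gameweeks.

```python
GAMEWEEKS_PER_SEASON = 38  # an EPL season has 38 gameweeks


def to_weekly_scale(value, week, gameweeks=GAMEWEEKS_PER_SEASON):
    """Return a value on a per-week scale.

    Hypothetical helper: week 0 holds a season-long projection, so it is
    divided by the number of gameweeks; values for later weeks are assumed
    to be weekly already and are returned unchanged.
    """
    if week == 0:
        return value / gameweeks
    return value


# A made-up season projection of 380 points becomes 10 points per week.
print(to_weekly_scale(380.0, week=0))
# A weekly value is left as it is.
print(to_weekly_scale(5.0, week=3))
```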