# Coursework: Optimisation of a fantasy football team

The coursework is described in detail in the documentation provided on Moodle. This notebook contains some code for basic functions that read in the data file and define the solution/constraint checker that you must use to check your final solution.

As noted in the coursework, you don't have to use Python or DEAP to tackle this. However, the practicals have covered a lot of functionality that will be useful so you should find that the DEAP libraries provide a quick way to start and will save you some time in writing code.

## Important Information

If you use another language, write your solution to a CSV file as a list of 0s and 1s (one value per row) indicating which players are included; you will then need to read it back into Python to use the checker function. Your report should include a screenshot of the output from the function provided in this notebook, and *not* from your own version of the function.
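As a rough illustration of the read-back step, the sketch below loads a one-value-per-row file into a Python list that could then be passed to the checker. The helper name `load_solution` and the file name `solution.csv` are hypothetical and not part of the coursework files; the demo writes a made-up 5-player file only so that the example runs on its own.

```python
import csv
import tempfile
from pathlib import Path


def load_solution(csv_path, expected_length):
    """Read a one-value-per-row file of 0s and 1s into a list of ints."""
    with open(csv_path, newline="") as handle:
        # Skip completely empty rows; take the first field of every other row
        values = [int(row[0]) for row in csv.reader(handle) if row]
    if len(values) != expected_length:
        raise ValueError(f"expected {expected_length} values, found {len(values)}")
    if any(v not in (0, 1) for v in values):
        raise ValueError("solution file must contain only 0s and 1s")
    return values


# Demo with a throwaway 5-player file (made-up data, not the real player list)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "solution.csv"
    demo_path.write_text("0\n1\n1\n0\n1\n")
    demo_solution = load_solution(demo_path, expected_length=5)

print(demo_solution)  # [0, 1, 1, 0, 1]
```

With the real data, `expected_length` would be `num_players`, and the returned list is what the checker function expects as input.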



In [None]:
# Preliminaries: import libraries, function to read csv file with the data required

import pandas as pd
import numpy as np
import os

from pathlib import Path

def read_file_from_directory(directory, filename):
    """
    Reads a CSV file from a given directory into a DataFrame.
    
    Args:
        directory (str or Path): Path to the directory.
        filename (str): Name of the CSV file to read.
    
    Returns:
        pandas.DataFrame: Contents of the file, with the index reset to 0..n-1.
    
    Raises:
        FileNotFoundError: If the file does not exist.
    """
    # Ensure directory is a Path object
    directory = Path(directory)
    
    # Build the full path
    file_path = directory / filename
    
    # Check that the file exists before trying to read it
    if not file_path.exists():
        raise FileNotFoundError(f"CSV file not found: {file_path}")
    
    # Read the CSV and reset the index so rows are numbered 0..n-1
    return pd.read_csv(file_path).reset_index(drop=True)




# Data
The code below reads in the datafile and calculates the number of players available.  
Change the filepath to your local drive.

The file is sorted by player type. As I may check your solution, **DO NOT** sort the file or alter it in any way, as my code will expect to see it in this format.

Feel free to browse the file and analyse the data in any way you think might be useful

In [None]:
# THIS FUNCTION READS THE DATA FILE CONTAINING THE INFORMATION RE EACH PLAYER

# read data

dir_path = "/Users/emma/Downloads"   # change to the directory where you have stored the csv file
file_name = "clean-data.csv"              # file name
    
try:
    data = read_file_from_directory(dir_path, file_name)
except Exception as e:
    print(f"Error: {e}")
    raise  # stop here: the rest of the notebook cannot run without the data



num_players = len(data.index)
budget = 100  # do not change
print("num possible players is %s" % (num_players))


# Helpful data
The code below extracts information from the data that will be useful when writing your program. In particular:

- a list containing the **points** per player:  e.g. points[i] refers to the **points** associated with player *i*
- a list containing the **cost** per player: e.g. cost[i] refers to the **cost** associated with player *i*
- a list **gk** which indicates which player is a *goal-keeper*. The list is the same length as the number of players. gk[i]=0 if player *i* is not a goal-keeper; gk[i]=1 if player *i* is a goal-keeper
- a list **mid** which indicates which player is a *midfielder*. The list is the same length as the number of players. mid[i]=0 if player *i* is not a midfielder; mid[i]=1 if player *i* is a midfielder
- a list **defe** which indicates which player is a *defender*. The list is the same length as the number of players. defe[i]=0 if player *i* is not a defender; defe[i]=1 if player *i* is a defender
- a list **stri** which indicates which player is a *striker*. The list is the same length as the number of players. stri[i]=0 if player *i* is not a striker; stri[i]=1 if player *i* is a striker
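To show how these structures combine with a 0/1 team list, here is a small sketch on made-up data for five players. The values are invented purely for illustration and `team` is a hypothetical selection; with the real data the same dot products would apply to the notebook's `points`, `cost` and `gk`.

```python
import numpy as np

# Toy stand-ins for the notebook's data structures (5 players, invented values)
points = np.array([10, 20, 30, 40, 50])
cost = np.array([4.0, 5.5, 6.0, 7.5, 9.0])
gk = np.array([1, 0, 0, 0, 0])        # only player 0 is a goalkeeper

# A candidate team: players 0, 2 and 4 are selected
team = np.array([1, 0, 1, 0, 1])

# Multiplying by the 0/1 list keeps only the selected players' values
total_points = int(np.dot(points, team))   # 10 + 30 + 50
total_cost = float(np.dot(cost, team))     # 4.0 + 6.0 + 9.0
num_gk = int(np.dot(gk, team))             # goalkeepers in the team

print(total_points, total_cost, num_gk)  # 90 19.0 1
```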

In [None]:
# HELPFUL DATA STRUCTURES
# these can be used for calculating points and costs and are also used in the constraint_checking function
points = data['Points'] 
cost = data['Cost']
    

# create lists with all elements initialised to 0
gk = np.zeros(num_players)
mid = np.zeros(num_players)
defe = np.zeros(num_players)
stri = np.zeros(num_players)

for i in range(num_players):
    if data['Position'][i] == 'GK':
        gk[i] = 1
    elif data['Position'][i] == 'DEF':
        defe[i] = 1
    elif data['Position'][i] == 'MID':
        mid[i] = 1
    elif data['Position'][i] == 'STR':
        stri[i]=1
  

In [None]:
# for example, the gk list has a 1 indicating the position in the csv file of goalkeepers

print(gk)

# Solution and constraint checker function

You are free to represent an individual in any way you wish during the search process. However, at the end of the evolutionary run, you *must* convert your solution to a list of length *num_players* in which each element is either 0 or 1. An element *i* should be set to 0 if player *i* is not included in the team, and to 1 if player *i* **is** included in the team.

You *must* call this function with your best solution and include a screen shot of the output in your report.

In [None]:
# check the constraints
# the function MUST be passed a list of length num_players in which each bit is set to 0 or 1


def check_constraints(individual):
     
    broken_constraints = 0
    
    
    totalpoints = np.sum(np.multiply(points, individual))

    # exactly 11 players
    c1 = np.sum(individual)
    if  c1 != 11:
        broken_constraints+=1
        print("Broken Constraint: Total number of players is %s" %(c1))
        
    
    # need cost <= 100
    c2 = np.sum(np.multiply(cost, individual)) 
    if c2 > 100:
        broken_constraints+=1
        print("Broken Constraint: Cost is %s" %(c2))
        
    
    
    # need only 1 GK
    c3 = np.sum(np.multiply(gk, individual))
    if  c3 != 1:
        broken_constraints+=1
        print("Broken Constraint: Number of GOALIES is %s " %(c3))
    
    # need 3-5 DEF
    c4 = np.sum(np.multiply(defe,individual))
    if  c4 > 5 or c4 < 3:
        broken_constraints+=1
        print("Broken Constraint: Number of DEFENDERS is %s " %(c4))
            
    # need 3-5 MID
    c5 = np.sum(np.multiply(mid,individual))
    if  c5 > 5 or c5 < 3: 
        broken_constraints+=1
        print("Broken Constraint: Number of MID is %s " %(c5))
        
    # need 1-3 STR
    c6 = np.sum(np.multiply(stri,individual))
    if c6 > 3 or c6 < 1: 
        broken_constraints+=1
        print("Broken Constraint: Number of STRIKERS is %s " %(c6))
        
    # get indices of players selected
    selectedPlayers = [idx for idx, element in enumerate(individual) if element==1]
    
    if broken_constraints>0:
        print("INVALID SOLUTION")
    print(" ")
    print(" ")
    
        
   
    
    return broken_constraints, totalpoints, c2, selectedPlayers

# Example Function for Calculating Constraint Violations

An example function is provided below for checking all the constraints. Note this is not well designed! It simply checks if a constraint is broken (yes/no) and returns the total number of broken constraints, the total points, and the total cost. It does not take any account of the extent to which a constraint is broken (e.g. whether there are 12 players in the team, or 500 players in the team). Similarly for the budget, it just notes if it is exceeded, not by how much.

To get good solutions, you will need to consider how to modify this function...

The function can be called from a custom evaluation function (example defined below called *footballEval*)
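One possible direction, sketched below under stated assumptions, is to measure *how far* each constraint is from being satisfied rather than just counting broken constraints. The helper names `range_violation` and `total_violation` are hypothetical, the toy data (15 players, each costing 5) is invented so the example runs on its own, and simply adding cost overspend to player-count excess is an arbitrary choice; how to weight the terms is left to you.

```python
import numpy as np


def range_violation(value, low, high):
    """Return how far value lies outside [low, high]; 0 if it is inside."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0


def total_violation(individual, cost, gk, defe, mid, stri, budget=100):
    """Sum the size of every constraint violation (unweighted, for illustration)."""
    ind = np.asarray(individual)
    violation = 0.0
    violation += range_violation(ind.sum(), 11, 11)             # exactly 11 players
    violation += range_violation(np.dot(cost, ind), 0, budget)  # cost within budget
    violation += range_violation(np.dot(gk, ind), 1, 1)         # exactly 1 GK
    violation += range_violation(np.dot(defe, ind), 3, 5)       # 3-5 DEF
    violation += range_violation(np.dot(mid, ind), 3, 5)        # 3-5 MID
    violation += range_violation(np.dot(stri, ind), 1, 3)       # 1-3 STR
    return violation


# Toy data: players 0-1 are GK, 2-6 DEF, 7-11 MID, 12-14 STR
n = 15
toy_cost = np.full(n, 5.0)
toy_gk = np.array([1 if i < 2 else 0 for i in range(n)])
toy_defe = np.array([1 if 2 <= i <= 6 else 0 for i in range(n)])
toy_mid = np.array([1 if 7 <= i <= 11 else 0 for i in range(n)])
toy_stri = np.array([1 if i >= 12 else 0 for i in range(n)])

# 1 GK, 4 DEF, 4 MID, 2 STR: satisfies every constraint
valid_team = [1 if i in {0, 2, 3, 4, 5, 7, 8, 9, 10, 12, 13} else 0 for i in range(n)]
everyone = [1] * n  # 15 players and 2 goalkeepers: two constraints broken

valid_score = total_violation(valid_team, toy_cost, toy_gk, toy_defe, toy_mid, toy_stri)
everyone_score = total_violation(everyone, toy_cost, toy_gk, toy_defe, toy_mid, toy_stri)
print(valid_score, everyone_score)
```

Here the team of all 15 players scores 5 (four players too many plus one goalkeeper too many), whereas a yes/no count would report only 2 broken constraints and could not distinguish it from a far worse team.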

In [None]:
# basic example of function to assess broken constraints, and calculate cost and points
# returns number of constraints broken, total points, total cost... can be customised to return more detailed information

def solutionEvaluator(individual):
     
    broken_constraints = 0
    
    totalpoints = np.sum(np.multiply(points, individual))

   
    # exactly 11 players
    c1 = np.sum(individual)
    if  c1 != 11:
        broken_constraints+=1
  
        
    
    # need cost <= 100
    c2 = np.sum(np.multiply(cost, individual)) 
    if c2 > 100:
        broken_constraints+=1
       
    
    # need only 1 GK
    c3 = np.sum(np.multiply(gk, individual))
    if  c3 != 1:
        broken_constraints+=1

    
    # need 3-5 DEF
    c4 = np.sum(np.multiply(defe,individual))
    if  c4 > 5 or c4 < 3:
        broken_constraints+=1
        
    # need 3-5 MID
    c5 = np.sum(np.multiply(mid,individual))
    if  c5 > 5 or c5 < 3: 
        broken_constraints+=1
      
        
    # need 1-3 STR
    c6 = np.sum(np.multiply(stri,individual))
    if c6 > 3 or c6 < 1: 
        broken_constraints+=1
    
        
    # get indices of players selected
    selectedPlayers = [idx for idx, element in enumerate(individual) if element==1]
    

        
    # return num constraints broken, points and team cost
    
    return broken_constraints, totalpoints, c2

# Basic EA using DEAP

This creates a basic EA (as per the early practicals) in which a solution is represented as a list of 523 bits, each with value 0 or 1 indicating the player at index *i* in the csv file is chosen or not.  It uses a simple initialisation procedure in which each bit has an equal probability of being set to 0 or 1 *(think about why this is not an efficient representation!)*


A custom evaluation function called *footballEval* is defined and registered.  The function assigns a crude penalty to solutions that break one or more constraints. It is very basic and unlikely to lead to good solutions:
- the number of broken constraints from the *solutionEvaluator* function is multiplied by an arbitrary constant, and this value is subtracted from the 'totalpoints' score
- if no constraints are broken, i.e. the solution is valid, the score will be positive (and should be maximised); if any constraint is broken, the score will be  a negative value

You will want to consider how to modify this function (in conjunction with *solutionEvaluator*) to better guide the EA towards good solutions.

All parameters/choice of operators are arbitrary - consider how you might change these
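On the initialisation point above: when every bit is set to 1 with probability 0.5, a random individual selects about half of all players, so essentially every initial solution breaks the 11-player constraint. A minimal sketch of one alternative, assuming you want each initial individual to contain exactly 11 selected players, is shown below. The helper name `random_team` is hypothetical, and the sketch ignores positions and budget.

```python
import random


def random_team(num_players, team_size=11):
    """Build a 0/1 list of length num_players with exactly team_size ones."""
    # Pick team_size distinct player indices uniformly at random
    chosen = set(random.sample(range(num_players), team_size))
    return [1 if i in chosen else 0 for i in range(num_players)]


team = random_team(523)
print(sum(team), len(team))  # 11 523
```

In DEAP, a generator like this could be registered with `tools.initIterate` in place of `tools.initRepeat`. Note that the standard crossover and mutation operators would not preserve the count of 11, so the operators would need similar thought.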


In [None]:
import array
import numpy
import random

from deap import base
from deap import creator
from deap import tools
from deap import algorithms


creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()
# Attribute generator 
toolbox.register("attr_bool", random.randint, 0, 1)

# Structure initializers: an individual has 523 values (0,1)
toolbox.register("individual", tools.initRepeat, creator.Individual,  toolbox.attr_bool, num_players)


toolbox.register("population", tools.initRepeat, list, toolbox.individual)


# custom evaluation function that assigns a fixed penalty to solutions that break one or more constraints

def  footballEval(individual):
    broken_constraints, totalpoints, teamCost  = solutionEvaluator(individual)
 
    
    # arbitrary penalty function that assigns penalty of 10000 x number of broken constraints
    if broken_constraints > 0:
        penalty=10000* broken_constraints
    else:
        penalty=0
        

        
    return (totalpoints-penalty),

# register the necessary functions
toolbox.register("evaluate", footballEval)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutFlipBit, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

def main():
    pop = toolbox.population(n=300)
    hof = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", numpy.mean)
    stats.register("std", numpy.std)
    stats.register("min", numpy.min)
    stats.register("max", numpy.max)

    pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=100, 
                                   stats=stats, halloffame=hof, verbose=True)
    
    return pop,log,hof
    




In [None]:
##############################
# run the main function 
pop, log, hof = main()


best = hof[0].fitness.values[0]   # best fitness found is stored at index 0 in the hof list

# look in the logbook to see what generation this was found at

max_fitness = log.select("max")  # max fitness per generation stored in log

# the logbook has one entry per generation, including the initial generation 0
for i, fit in enumerate(max_fitness):
    if fit == best:
        break

print("max fitness found is %s at gen %s " % (best, i))

 
##############################

# CHECKER

Call the code below with your best solution and include a screen shot of the output in your report

**If your code uses a representation that is different from the one in this notebook (i.e. a bitstring of length 523 where a 0/1 indicates if the player is selected for the team) then see the instructions in the section "Running the checker with a different representation" below**

In [None]:
# call the checker to indicate which constraints are broken
broken_constraints, totalpoints, c2, selectedPlayers = check_constraints(hof[0])


print("Final Cost %s " %(c2))
print("Final Points %s" %(totalpoints))
print("Number of broken constraints %s" %(broken_constraints))
print("selected players are %s" %(selectedPlayers))
    
  

# Running the checker with a different representation

If you use a different representation, please use the code below to call the checker.  You need to supply a list with the indices of the players that you have chosen using your preferred representation. This list should contain 11 numbers (line 2 below)

In [None]:
# the indices of the players selected in your best solution -  modify this code with your selected players
mySelectedPlayers = [1,3,5,7,9,11,13,15,100,200,300]

individual = [0]*num_players # make a list containing num_players 0's

for player in mySelectedPlayers:
    individual[player]=1
    
broken_constraints, totalpoints, c2, selectedPlayers = check_constraints(individual)


print("Final Cost %s " %(c2))
print("Final Points %s" %(totalpoints))
print("Number of broken constraints %s" %(broken_constraints))
print("selected players are %s" %(selectedPlayers))
    