# Coursework: Optimisation of a fantasy football team

The coursework is described in detail in the documentation provided on Moodle. This notebook contains some code for basic functions that read in the data file and define the solution/constraint checker that you must use to check your final solution.

As noted in the coursework, you don't have to use Python or DEAP to tackle this. However, the practicals have covered a lot of useful functionality. The DEAP libraries should therefore give you a quick way to start and save you time writing code.

## Important Information

If you use another language, write your solution to a CSV file containing one 0 or 1 per row, indicating which players are included. Then read it back into this notebook to use the checker function. Your report should include a screenshot of the output from the function provided in this notebook, *not* from your own version of the function.
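A minimal sketch of that round trip is shown below in Python. `solution.csv` is a hypothetical filename. A solver written in another language only needs to produce the same one-value-per-row layout. `write_solution` and `read_solution` are illustrative helpers, not part of the coursework code.

```python
import numpy as np

def write_solution(bits, path="solution.csv"):
    """Write a 0/1 selection vector as one integer per row, with no header."""
    np.savetxt(path, np.asarray(bits, dtype=int), fmt="%d")

def read_solution(path="solution.csv"):
    """Read the one-value-per-row file back as a list of 0/1 ints."""
    return np.loadtxt(path, dtype=int, ndmin=1).tolist()

# Round-trip a toy 6-player selection (illustrative values only)
write_solution([0, 1, 1, 0, 0, 1], "solution.csv")
print(read_solution("solution.csv"))  # [0, 1, 1, 0, 0, 1]
```

The list returned by `read_solution` can then be passed directly to `check_constraints`.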



In [1]:
import pandas as pd
import numpy as np
from pathlib import Path

def read_file_from_directory(directory, filename):
    """Read a CSV file from `directory` into a DataFrame with a fresh 0..n-1 index."""
    file_path = Path(directory) / filename
    if not file_path.exists():
        raise FileNotFoundError(f"CSV file not found: {file_path}")
    return pd.read_csv(file_path).reset_index(drop=True)

# Data
The code below reads in the data file and calculates the number of players available.  
Change `dir_path` (and `file_name` if needed) to point to your local copy.

The file is sorted by player type. Because I may check your solution, **DO NOT** sort the file or alter it in any way: my code expects to see it in this format.

Feel free to browse the file and analyse the data in any way you think might be useful.
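One possible starting point is sketched below: a per-position summary of points, cost and points per unit cost. It assumes the `Position`, `Points` and `Cost` columns used elsewhere in this notebook. The helper name and the toy rows are hypothetical. It works on a copy, so the file's original order is left untouched.

```python
import pandas as pd

def points_per_cost_by_position(df):
    """Mean and max of points, cost and points-per-cost for each position.

    df.assign returns a new DataFrame, so the caller's data is not modified.
    """
    summary = df.assign(ppc=df["Points"] / df["Cost"])
    return summary.groupby("Position")[["Points", "Cost", "ppc"]].agg(["mean", "max"])

# Toy rows for illustration only (not taken from clean-data.csv)
toy = pd.DataFrame({
    "Position": ["GK", "GK", "DEF", "MID", "STR"],
    "Points":   [120, 90, 150, 200, 180],
    "Cost":     [5.0, 4.5, 6.0, 10.0, 9.0],
})
print(points_per_cost_by_position(toy))
```

On the real data this would be called as `points_per_cost_by_position(data)`.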

In [2]:
dir_path = "."
file_name = "clean-data.csv"
try:
    data = read_file_from_directory(dir_path, file_name)
except FileNotFoundError as e:
    print(f"Error: {e}")
    raise  # later cells all need `data`, so stop here
num_players = len(data.index)
budget = 100  # maximum total team cost
print("num possible players is %s" % (num_players))

num possible players is 523


# Helpful data
The code below extracts some information from the data that will be useful when writing your program. In particular:

- a pandas Series containing the **points** per player: e.g. points[i] refers to the **points** associated with player *i*
- a pandas Series containing the **cost** per player: e.g. cost[i] refers to the **cost** associated with player *i*
- a NumPy array **gk** which indicates which player is a *goalkeeper*. The array is the same length as the number of players. gk[i]=0 if player *i* is not a goalkeeper; gk[i]=1 if player *i* is a goalkeeper
- a NumPy array **mid** which indicates which player is a *midfielder*. The array is the same length as the number of players. mid[i]=0 if player *i* is not a midfielder; mid[i]=1 if player *i* is a midfielder
- a NumPy array **defe** which indicates which player is a *defender*. The array is the same length as the number of players. defe[i]=0 if player *i* is not a defender; defe[i]=1 if player *i* is a defender
- a NumPy array **stri** which indicates which player is a *striker*. The array is the same length as the number of players. stri[i]=0 if player *i* is not a striker; stri[i]=1 if player *i* is a striker
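The indicator arrays can also be built without an explicit loop. The sketch below is a vectorised equivalent of the loop in the next cell, using pandas comparisons. It assumes the position codes `GK`, `DEF`, `MID` and `STR` used in this notebook. `position_indicators` is a hypothetical helper, and the toy positions are for illustration only.

```python
import pandas as pd

def position_indicators(positions):
    """Return float 0/1 arrays (gk, defe, mid, stri) from a sequence of position codes."""
    pos = pd.Series(positions)
    # Boolean comparison per code, converted to 0.0/1.0 to match np.zeros-based arrays
    return tuple((pos == code).to_numpy(dtype=float) for code in ("GK", "DEF", "MID", "STR"))

toy_positions = ["GK", "DEF", "DEF", "MID", "STR"]
gk_t, defe_t, mid_t, stri_t = position_indicators(toy_positions)
print(defe_t)  # [0. 1. 1. 0. 0.]
```

With the real data this would be `gk, defe, mid, stri = position_indicators(data['Position'])`.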

In [3]:
# Per-player attributes, indexed in file order
points = data['Points']
cost = data['Cost']

# 0/1 indicator arrays: element i is 1 if player i plays in that position
gk = np.zeros(num_players)
mid = np.zeros(num_players)
defe = np.zeros(num_players)
stri = np.zeros(num_players)

for i in range(num_players):
    position = data['Position'][i]
    if position == 'GK':
        gk[i] = 1
    elif position == 'DEF':
        defe[i] = 1
    elif position == 'MID':
        mid[i] = 1
    elif position == 'STR':
        stri[i] = 1

# Solution and constraint checker function

You are free to represent an individual in any way you wish during the search process. However, at the end of the evolutionary run, you *must* convert your solution to a list of length *num_players* in which each element is either 0 or 1. Element *i* should be set to 0 if player *i* is not included in the team, and to 1 if player *i* **is** included in the team.
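Suppose, hypothetically, that a search represents a team as a list of 11 player indices. That is one possible alternative, not the representation used later in this notebook. The conversion could then look like the sketch below. `indices_to_bitstring` is an illustrative helper name.

```python
def indices_to_bitstring(selected, num_players):
    """Convert selected player indices into the 0/1 list expected by check_constraints."""
    if len(set(selected)) != len(selected):
        raise ValueError("duplicate player index in team")
    bits = [0] * num_players
    for idx in selected:
        if not 0 <= idx < num_players:
            raise IndexError(f"player index {idx} out of range")
        bits[idx] = 1
    return bits

# Toy example with 8 players, selecting players 0, 3 and 7
print(indices_to_bitstring([0, 3, 7], 8))  # [1, 0, 0, 1, 0, 0, 0, 1]
```

The result can then be checked with `check_constraints(indices_to_bitstring(team, num_players))`. Duplicates are rejected rather than silently collapsed, so a repeated player cannot hide a short team.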

You *must* call this function with your best solution and include a screenshot of the output in your report.

In [5]:
def check_constraints(individual):
    """Report every broken constraint for a 0/1 team list.

    Returns (broken_constraints, totalpoints, total_cost, selectedPlayers).
    """
    broken_constraints = 0
    totalpoints = np.sum(np.multiply(points, individual))
    # Exactly 11 players
    c1 = np.sum(individual)
    if c1 != 11:
        broken_constraints += 1
        print("Broken Constraint: Total number of players is %s" % (c1))
    # Total cost within the budget of 100
    c2 = np.sum(np.multiply(cost, individual))
    if c2 > 100:
        broken_constraints += 1
        print("Broken Constraint: Cost is %s" % (c2))
    # Exactly 1 goalkeeper
    c3 = np.sum(np.multiply(gk, individual))
    if c3 != 1:
        broken_constraints += 1
        print("Broken Constraint: Number of GOALIES is %s " % (c3))
    # 3 to 5 defenders
    c4 = np.sum(np.multiply(defe, individual))
    if c4 > 5 or c4 < 3:
        broken_constraints += 1
        print("Broken Constraint: Number of DEFENDERS is %s " % (c4))
    # 3 to 5 midfielders
    c5 = np.sum(np.multiply(mid, individual))
    if c5 > 5 or c5 < 3:
        broken_constraints += 1
        print("Broken Constraint: Number of MID is %s " % (c5))
    # 1 to 3 strikers
    c6 = np.sum(np.multiply(stri, individual))
    if c6 > 3 or c6 < 1:
        broken_constraints += 1
        print("Broken Constraint: Number of STRIKERS is %s " % (c6))
    selectedPlayers = [idx for idx, element in enumerate(individual) if element == 1]
    if broken_constraints > 0:
        print("INVALID SOLUTION")
    print(" ")
    print(" ")
    return broken_constraints, totalpoints, c2, selectedPlayers

# Example Function for Calculating Constraint Violations

A simple approach is to check only whether each constraint is broken (yes/no) and count the broken constraints. That takes no account of the extent to which a constraint is broken (e.g. whether there are 12 players in the team, or 500). Likewise, for the budget it only notes whether the budget is exceeded, not by how much.

The function below instead adds a penalty that grows with the size of each violation. It returns the total points, the total penalty and the total cost. It is still a crude design because the penalty weights are arbitrary. To get good solutions, you will need to consider how to modify this function further.

The function can be called from a custom evaluation function (an example, called *footballEval*, is defined below).
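The sketch below shows why the extent of a violation matters. It compares a yes/no count with a graded penalty for two teams that both break the team-size and budget constraints. Only those two constraints are modelled. The weights 100 and 500 mirror those in `solutionEvaluator`, and `count_broken` and `graded_penalty` are hypothetical helpers.

```python
def count_broken(n_players, total_cost, budget=100):
    """Yes/no style: one unit per broken constraint, regardless of its size."""
    return int(n_players != 11) + int(total_cost > budget)

def graded_penalty(n_players, total_cost, budget=100):
    """Penalty proportional to how far each constraint is violated."""
    penalty = abs(n_players - 11) * 100
    penalty += max(0.0, total_cost - budget) * 500
    return penalty

for n, c in [(12, 100.5), (500, 2000.0)]:
    print(n, c, count_broken(n, c), graded_penalty(n, c))
# 12 100.5 2 350.0
# 500 2000.0 2 998900.0
```

Both teams break the same two constraints, so the yes/no count scores them identically. The graded penalty, by contrast, tells the EA that the 12-player team is far closer to feasibility.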

In [6]:
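def solutionEvaluator(individual):
    """Return (totalpoints, penalty, total_cost); the penalty grows with each violation."""
    totalpoints = np.sum(np.multiply(points, individual))
    penalty = 0
    # Team size: 100 per player above or below 11
    num_selected = np.sum(individual)
    if num_selected != 11:
        penalty += abs(num_selected - 11) * 100
    # Budget: 500 per unit of cost above 100
    total_cost = np.sum(np.multiply(cost, individual))
    if total_cost > 100:
        penalty += (total_cost - 100) * 500
    # Goalkeepers: 200 per goalkeeper away from exactly 1
    num_gk = np.sum(np.multiply(gk, individual))
    if num_gk != 1:
        penalty += abs(num_gk - 1) * 200
    # Defenders (3 to 5): 150 per player outside the range
    num_def = np.sum(np.multiply(defe, individual))
    if num_def < 3:
        penalty += (3 - num_def) * 150
    elif num_def > 5:
        penalty += (num_def - 5) * 150
    # Midfielders (3 to 5): 150 per player outside the range
    num_mid = np.sum(np.multiply(mid, individual))
    if num_mid < 3:
        penalty += (3 - num_mid) * 150
    elif num_mid > 5:
        penalty += (num_mid - 5) * 150
    # Strikers (1 to 3): 150 per player outside the range
    num_str = np.sum(np.multiply(stri, individual))
    if num_str < 1:
        penalty += (1 - num_str) * 150
    elif num_str > 3:
        penalty += (num_str - 3) * 150
    return totalpoints, penalty, total_cost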

# Basic EA using DEAP

This creates a basic EA (as per the early practicals) in which a solution is represented as a list of 523 bits. Each bit has value 0 or 1, indicating whether the player at index *i* in the csv file is chosen. The function `sparse_bit` initialises each bit independently, setting it to 1 with probability 0.02. A random individual therefore selects about 10 players on average (523 × 0.02 ≈ 10.5) *(think about why this is still not an efficient representation!)*
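One alternative is to start every individual from a team that already satisfies the formation constraints, rather than searching the whole 523-bit space. The sketch below is a hypothetical helper, `random_formation_team`, that is not part of the notebook. It picks 1 goalkeeper, a random legal (defenders, midfielders, strikers) formation, and random players for each position. It ignores the budget, which the penalty would still have to handle.

```python
import random

def legal_formations():
    """All (defenders, midfielders, strikers) counts allowed alongside 1 goalkeeper."""
    return [(d, m, s)
            for d in range(3, 6) for m in range(3, 6) for s in range(1, 4)
            if 1 + d + m + s == 11]

def random_formation_team(positions, rng=random):
    """Return a 0/1 list with 1 GK and a random legal formation (budget ignored)."""
    by_pos = {p: [i for i, q in enumerate(positions) if q == p]
              for p in ("GK", "DEF", "MID", "STR")}
    d, m, s = rng.choice(legal_formations())
    bits = [0] * len(positions)
    for pos, k in (("GK", 1), ("DEF", d), ("MID", m), ("STR", s)):
        # Sample k distinct players of this position
        for idx in rng.sample(by_pos[pos], k):
            bits[idx] = 1
    return bits
```

In DEAP it could replace the bit-by-bit initialiser, for example with `toolbox.register("individual", tools.initIterate, creator.Individual, lambda: random_formation_team(list(data['Position'])))`. Two-point crossover can still break the formation, so a penalty is still needed.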


A custom evaluation function called *footballEval* is defined and registered. It is still fairly basic:
- the penalty computed by *solutionEvaluator*, which grows with the size of each violation, is subtracted from the 'totalpoints' score
- a valid solution has zero penalty, so its score is simply its total points (to be maximised); an invalid solution has its score reduced by the penalty, but is not guaranteed to score below every valid one (e.g. a team only slightly over budget may outscore valid teams)

You will want to consider how to modify this function (in conjunction with *solutionEvaluator*) to better guide the EA towards good solutions.
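One possible alternative is a feasibility-first score. This is a standard constraint-handling idea sketched here, not something taken from the coursework. Any team with zero penalty always outranks any team that breaks a constraint, and infeasible teams are ranked by how small their penalty is. The sketch assumes an evaluator returning `(totalpoints, penalty, total_cost)` like `solutionEvaluator`. The offset `BIG` is an arbitrary assumed constant that only needs to keep infeasible scores below any feasible points total.

```python
BIG = 1_000_000  # assumption: feasible points totals never fall below -BIG

def feasibility_first_score(totalpoints, penalty):
    """Feasible teams score their points; infeasible teams score below all of them."""
    if penalty == 0:
        return totalpoints
    return -BIG - penalty

def make_eval(evaluator):
    """Wrap an evaluator returning (points, penalty, cost) into a DEAP-style fitness tuple."""
    def evaluate(individual):
        totalpoints, penalty, _cost = evaluator(individual)
        return (feasibility_first_score(totalpoints, penalty),)
    return evaluate
```

It could be registered with `toolbox.register("evaluate", make_eval(solutionEvaluator))`. The graded penalty still matters, because it orders the infeasible teams among themselves.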

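All parameters and choices of operators are arbitrary: two-point crossover, the custom `mutSwapPlayers` mutation, tournament selection with `tournsize=3`, `cxpb=0.5`, `mutpb=0.2`, a population of 300 and 1000 generations. Consider how you might change them.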


In [7]:
import random

import numpy
from multiprocessing.dummy import Pool as ThreadPool
from deap import algorithms, base, creator, tools

# Single-objective maximisation; an individual is a list of 0/1 bits
creator.create("FitnessMax", base.Fitness, weights=(1.0,))
creator.create("Individual", list, fitness=creator.FitnessMax)

toolbox = base.Toolbox()

def sparse_bit():
    """Return 1 with probability 0.02, so a new individual selects about 10 of 523 players."""
    return 1 if random.random() < 0.02 else 0

toolbox.register("attr_bool", sparse_bit)
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_bool, num_players)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

def footballEval(individual):
    """Fitness = total points minus the constraint penalty (DEAP expects a tuple)."""
    totalpoints, penalty, teamCost = solutionEvaluator(individual)
    return (totalpoints - penalty),

def mutSwapPlayers(individual, indpb):
    """Swap mutation that keeps the team size fixed.

    For each of the len(individual) bits, with probability indpb, one randomly
    chosen selected player is dropped and one unselected player is added.
    """
    ones = [i for i, v in enumerate(individual) if v == 1]
    zeros = [i for i, v in enumerate(individual) if v == 0]
    for _ in range(len(individual)):
        if random.random() < indpb and ones and zeros:
            remove = random.choice(ones)
            add = random.choice(zeros)
            individual[remove] = 0
            individual[add] = 1
            # Keep the index lists in sync with the individual
            ones.remove(remove)
            zeros.remove(add)
            ones.append(add)
            zeros.append(remove)
    return individual,

toolbox.register("evaluate", footballEval)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", mutSwapPlayers, indpb=0.15)
toolbox.register("select", tools.selTournament, tournsize=3)

# Evaluate individuals in a thread pool (threads share the GIL, so the speed-up may be small)
pool = ThreadPool()
toolbox.register("map", pool.map)

def main():
    """Run eaSimple and return the final population, logbook and hall of fame."""
    pop = toolbox.population(n=300)
    hof = tools.HallOfFame(1)
    stats = tools.Statistics(lambda ind: ind.fitness.values)
    stats.register("avg", numpy.mean)
    stats.register("std", numpy.std)
    stats.register("min", numpy.min)
    stats.register("max", numpy.max)
    pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=1000,
                                   stats=stats, halloffame=hof, verbose=True)
    return pop, log, hof

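As written, `mutSwapPlayers` loops over all 523 bits. With `indpb=0.15` it therefore attempts roughly 78 swaps each time it is applied, which for an 11-player team is likely to replace nearly every original player. The gentler sketch below, `mutSwapSamePosition`, is a hypothetical alternative. It visits only the selected players and, with probability `indpb`, swaps each one for an unselected player in the same position, so team size and formation are preserved. It assumes `positions` is the list of position codes in file order, e.g. `list(data['Position'])`.

```python
import random

def mutSwapSamePosition(individual, positions, indpb, rng=random):
    """Swap each selected player, with probability indpb, for an unselected
    player of the same position. Team size and formation are unchanged."""
    selected = [i for i, v in enumerate(individual) if v == 1]
    for outgoing in selected:
        if rng.random() < indpb:
            # Unselected players who play the same position as the outgoing one
            candidates = [j for j, v in enumerate(individual)
                          if v == 0 and positions[j] == positions[outgoing]]
            if candidates:
                incoming = rng.choice(candidates)
                individual[outgoing] = 0
                individual[incoming] = 1
    return individual,
```

It could be registered with `toolbox.register("mutate", mutSwapSamePosition, positions=list(data['Position']), indpb=0.15)`. Because it never changes position counts, it pairs naturally with a formation-respecting initialiser.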
In [8]:
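pop, log, hof = main()

best = hof[0].fitness.values[0]
# Best (maximum) fitness recorded in each generation
max_per_gen = log.select("max")
gens = log.select("gen")

# First generation in which the overall best fitness appears (None if not found)
first_gen = next((g for g, fit in zip(gens, max_per_gen) if fit == best), None)

print("max fitness found is %s at gen %s " % (best, first_gen))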

gen	nevals	avg     	std    	min   	max
0  	300   	-1542.88	4256.12	-25604	881
1  	180   	-185.867	1730.08	-21452	891
2  	179   	190.69  	909.58 	-6796 	895
3  	183   	276.893 	972.888	-8761 	1056
4  	195   	160.24  	1751.03	-11926	1084
5  	179   	283.697 	1267.46	-8324 	1084
6  	167   	300.35  	1465.57	-10523	1156
7  	202   	396.97  	1295.3 	-12371	1273
8  	175   	593.7   	967.425	-7622 	1235
9  	183   	309.033 	1458.82	-8176 	1248
10 	166   	451.287 	1432.89	-8725 	1534
11 	175   	528.8   	1221.74	-7813 	1534
12 	188   	443.523 	1576.96	-11241	1433
13 	183   	522.96  	1479.45	-9509 	1446
14 	195   	438.503 	1706.19	-12213	1476
15 	172   	687.6   	1207.78	-10488	1532
16 	171   	669.62  	1275.78	-8480 	1573
17 	194   	579.897 	1711.96	-14235	1598
18 	166   	711.36  	1715.44	-18737	1668
19 	177   	743.27  	1347.18	-8273 	1668
20 	171   	810.01  	1490.47	-10494	1668
21 	196   	813.903 	1405.34	-9399 	1682
22 	175   	827.24  	1578.16	-10335	1711
23 	176   	970.487 	1350.79	-10138	1725
...

# CHECKER

Call the code below with your best solution and include a screenshot of the output in your report.

**If your code uses a representation that is different from the one in this notebook (i.e. a bitstring of length 523 where a 0/1 indicates whether the player is selected for the team), convert your best solution to that format before calling the checker. If you used another language, follow the instructions under *Important Information* at the top of this notebook.**

In [10]:
broken_constraints, totalpoints, c2, selectedPlayers = check_constraints(hof[0])
print("final cost %s\nfinal points %s\nbroken constraints %s\nselected players %s"
      % (c2, totalpoints, broken_constraints, selectedPlayers))

 
 
final cost 99.6
final points 1794
broken constraints 0
selected players [3, 15, 16, 52, 63, 180, 236, 261, 376, 382, 468]


In [11]:
# Count the selected players in each position for the best team
for label, mask in (("gk", gk), ("def", defe), ("mid", mid), ("str", stri)):
    print("%s: %s" % (label, int(np.sum(np.multiply(mask, hof[0])))))

gk: 1
def: 5
mid: 3
str: 2
