# Velogames solver: Italy 2020
A script that calculates, in hindsight, the optimal team that could have been chosen for a given race in [Velogames fantasy cycling](https://www.velogames.com/).

This script uses the [Beautiful Soup](https://www.crummy.com/software/BeautifulSoup/) library to scrape rider data, and the [Pyomo](http://www.pyomo.org/) optimisation library to construct and solve the integer linear program described below.

In Velogames fantasy cycling, you must select a team of 9 riders, each with a specific cost based on their expected performance, spending no more than 100 points. 

Each rider is classed as either an All-Rounder, a Climber, a Sprinter or is Unclassed. A team must contain at least 2 All-Rounders, 2 Climbers, 1 Sprinter and 3 Unclassed riders. These minimums account for 8 of the 9 places; the 9th selection can be from any of these categories.

At the end of the race, each rider will have accumulated a score based on their performance; the aim is to pick the team with the highest combined score.

The optimisation problem can be stated as:

$\text{maximise} \sum_{j=1}^{n} x_j y_j$

$\text{s.t.}$

$\sum_{j=1}^{n} x_j=9$

$\sum_{j=1}^{n} x_j z_j \leq 100$

$\sum_{j=1}^{n} x_j a_j \geq 2$

$\sum_{j=1}^{n} x_j c_j \geq 2$

$\sum_{j=1}^{n} x_j s_j \geq 1$

$\sum_{j=1}^{n} x_j u_j \geq 3$

where $j=1,\dots,n$ indexes the set of all riders

$x_j\in\{0,1\}$ is a binary decision variable denoting if rider $j$ is chosen (1 for chosen, 0 for not chosen)

$z_j\in \mathbb{Z}_{\geq 0}$ and $y_j\in \mathbb{Z}_{\geq 0}$ are the cost and score parameters of rider $j$ respectively (non-negative integers)

$a_j\in\{0,1\}$, $c_j\in\{0,1\}$, $s_j\in\{0,1\}$ and $u_j\in\{0,1\}$ are binary parameters denoting if rider $j$ is an All-Rounder, Climber, Sprinter or Unclassed respectively.

These parameters satisfy $a_j+c_j+s_j+u_j=1$ $\forall j=1,\dots,n$ (i.e. each rider is allocated to one and only one of the 4 categories) and by implication $\sum_{j=1}^{n} a_j+\sum_{j=1}^{n} c_j+\sum_{j=1}^{n} s_j+\sum_{j=1}^{n} u_j=n$ (i.e. the sum of the number of riders in each category is equal to the total number of riders)
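The formulation can be sanity-checked by brute force on a small instance before handing it to a solver. The sketch below enumerates every 9-rider subset of a hypothetical 12-rider pool and keeps the highest-scoring subset that satisfies the cost and category constraints. The rider names, costs and scores are invented for illustration and are not Velogames data; the function names `is_feasible` and `best_team` are likewise assumptions of this sketch.

```python
from itertools import combinations

# Hypothetical toy pool: (name, class, cost, score). Not real Velogames figures.
RIDERS = [
    ("A1", "All Rounder", 24, 300),
    ("A2", "All Rounder", 18, 250),
    ("A3", "All Rounder", 12, 240),
    ("C1", "Climber", 16, 200),
    ("C2", "Climber", 10, 180),
    ("C3", "Climber", 6, 90),
    ("S1", "Sprinter", 14, 220),
    ("S2", "Sprinter", 8, 100),
    ("U1", "Unclassed", 8, 120),
    ("U2", "Unclassed", 6, 110),
    ("U3", "Unclassed", 6, 60),
    ("U4", "Unclassed", 4, 50),
]

TEAM_SIZE = 9   # sum of x_j must equal 9
BUDGET = 100    # sum of x_j * z_j must not exceed 100
MIN_PER_CLASS = {"All Rounder": 2, "Climber": 2, "Sprinter": 1, "Unclassed": 3}


def is_feasible(team):
    """Check the cost constraint and the four category minimums."""
    if sum(cost for _, _, cost, _ in team) > BUDGET:
        return False
    return all(
        sum(1 for _, cls, _, _ in team if cls == wanted) >= minimum
        for wanted, minimum in MIN_PER_CLASS.items()
    )


def best_team(riders):
    """Return the feasible 9-rider subset with the highest total score, or None."""
    feasible = (t for t in combinations(riders, TEAM_SIZE) if is_feasible(t))
    return max(feasible, key=lambda t: sum(score for _, _, _, score in t), default=None)


team = best_team(RIDERS)
print([name for name, _, _, _ in team])
print(sum(cost for _, _, cost, _ in team), sum(score for _, _, _, score in team))
```

Enumeration is only practical for a handful of riders: choosing 9 from a full start list of around 175 riders gives far too many subsets to check one by one, which is why the problem is passed to an integer programming solver instead.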

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import pyomo.environ as pyo
import numpy as np

In [2]:
# first, retrieve all relevant rider data from the velogames rider page
remote_url='https://www.velogames.com/italy/2020/riders.php'

page = requests.get(remote_url)
page.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(page.content, 'html.parser')  # name the parser explicitly

tables = soup.find_all('table')

rider_table = tables[0]  # assume that the rider table is the first table on the page
rows = rider_table.find_all('tr')

riders = []
for row in rows:
    cells = row.find_all('td')
    if len(cells) > 5:  # ignore empty rows
        # cell order assumed here: [1] name, [2] team, [3] class, [4] score, [5] cost
        riders.append([
            cells[1].string,
            cells[3].string,
            cells[2].string,
            int(cells[5].string),
            int(cells[4].string),
        ])

riders_df = pd.DataFrame(riders, columns=['name', 'class', 'team', 'cost', 'score'])

# refactor the class allocation into separate binary-valued columns
# as per stated optimisation problem
for rider_class in riders_df['class'].unique():
    riders_df[rider_class] = np.where(riders_df['class'] == rider_class, 1,0)

riders_df

| | name | class | team | cost | score | All Rounder | Climber | Sprinter | Unclassed |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Geraint Thomas | All Rounder | Ineos Grenadiers | 24 | 260 | 1 | 0 | 0 | 0 |
| 1 | Simon Yates | Climber | Mitchelton-Scott | 22 | 45 | 0 | 1 | 0 | 0 |
| 2 | Jakob Fuglsang | All Rounder | Astana Pro Team | 20 | 1197 | 1 | 0 | 0 | 0 |
| 3 | Steven Kruijswijk | All Rounder | Team Jumbo-Visma | 20 | 191 | 1 | 0 | 0 | 0 |
| 4 | Vincenzo Nibali | All Rounder | Trek - Segafredo | 18 | 1116 | 1 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 171 | Simone Bevilacqua | Unclassed | Vini Zabù - KTM | 4 | 4 | 0 | 0 | 0 | 1 |
| 172 | Marco Frapporti | Unclassed | Vini Zabù - KTM | 4 | 154 | 0 | 0 | 0 | 1 |
| 173 | Matteo Spreafico | Unclassed | Vini Zabù - KTM | 4 | 4 | 0 | 0 | 0 | 1 |
| 174 | Etienne Van Empel | Unclassed | Vini Zabù - KTM | 4 | 139 | 0 | 0 | 0 | 1 |
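The binary class columns correspond to the parameters $a_j$, $c_j$, $s_j$ and $u_j$ in the problem statement. As an alternative to the `np.where` loop, pandas provides `get_dummies`, which builds one indicator column per distinct value in a single call. The sketch below applies it to a hypothetical three-rider frame (the rider names are placeholders); it is an illustration of the same encoding, not the method used in this notebook.

```python
import pandas as pd

# Hypothetical three-rider frame, used only to illustrate the encoding
df = pd.DataFrame({
    "name": ["Rider A", "Rider B", "Rider C"],
    "class": ["Climber", "Sprinter", "Climber"],
})

# One 0/1 column per distinct class; dtype=int avoids boolean columns
indicators = pd.get_dummies(df["class"], dtype=int)
encoded = pd.concat([df, indicators], axis=1)

# Each rider belongs to exactly one class, so every row of indicators sums to 1
print(encoded)
print((indicators.sum(axis=1) == 1).all())
```

Note that `get_dummies` only creates columns for classes that actually occur in the data, so a class with no riders would be missing from the frame, just as with the loop over `unique()`.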


In [3]:
# set up an abstract model with appropriate variables and parameters with allocated domains
model = pyo.AbstractModel()
model.R = pyo.Set(initialize=range(len(riders_df))) # riders, indexed 0...n-1
model.x = pyo.Var(model.R, domain=pyo.Boolean) # choice of riders to be made
model.y = pyo.Param(model.R, domain=pyo.NonNegativeIntegers, initialize=riders_df.score.to_dict()) # score for each rider
model.z = pyo.Param(model.R, domain=pyo.NonNegativeIntegers, initialize=riders_df.cost.to_dict())  # cost of each rider
model.a = pyo.Param(model.R, domain=pyo.Boolean, initialize=riders_df['All Rounder'].to_dict())
model.c = pyo.Param(model.R, domain=pyo.Boolean, initialize=riders_df['Climber'].to_dict())
model.s = pyo.Param(model.R, domain=pyo.Boolean, initialize=riders_df['Sprinter'].to_dict())
model.u = pyo.Param(model.R, domain=pyo.Boolean, initialize=riders_df['Unclassed'].to_dict())

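# objective function: total score of the chosen riders, sum of y[i]*x[i]
def obj_function(model):
    return pyo.sum_product(model.y, model.x)  # sum_product is the current name for summation
model.obj = pyo.Objective(rule=obj_function, sense=pyo.maximize)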

# cost constraint
def cost_rule(model):
    return sum(model.x[i]*model.z[i] for i in model.R) <= 100
model.cost_constraint = pyo.Constraint(rule=cost_rule)

# choice constraint
def choice_rule(model):
    return sum(model.x[i] for i in model.R) == 9
model.choice_constraint = pyo.Constraint(rule=choice_rule)

# All Rounder constraint
def all_rounder_rule(model):
    return sum(model.a[i]*model.x[i] for i in model.R) >= 2
model.all_rounder_constraint = pyo.Constraint(rule=all_rounder_rule)

# Climber constraint
def climber_rule(model):
    return sum(model.c[i]*model.x[i] for i in model.R) >= 2
model.climber_constraint = pyo.Constraint(rule=climber_rule)

# Sprinter constraint
def sprinter_rule(model):
    return sum(model.s[i]*model.x[i] for i in model.R) >= 1
model.sprinter_constraint = pyo.Constraint(rule=sprinter_rule)

# Unclassed constraint
def unclassed_rule(model):
    return sum(model.u[i]*model.x[i] for i in model.R) >= 3
model.unclassed_constraint = pyo.Constraint(rule=unclassed_rule)

In [4]:
# solve model and show resulting team (requires the GLPK solver to be installed)
instance = model.create_instance()
results = pyo.SolverFactory('glpk').solve(instance)

# solver values come back as floats, so round before converting to a flag
riders_df['chosen'] = [round(pyo.value(instance.x[i])) == 1 for i in range(len(riders_df))]
riders_df[riders_df['chosen']][['name','class','team','cost','score']]

| | name | class | team | cost | score |
|---|---|---|---|---|---|
| 6 | Peter Sagan | Sprinter | BORA - hansgrohe | 16 | 1486 |
| 9 | Wilco Kelderman | All Rounder | Team Sunweb | 14 | 2174 |
| 11 | João Almeida | All Rounder | Deceuninck - Quick Step | 12 | 2966 |
| 13 | Pello Bilbao | Climber | Bahrain - McLaren | 10 | 1481 |
| 18 | Tao Geoghegan Hart | All Rounder | Ineos Grenadiers | 10 | 2459 |
| 30 | Ruben Guerreiro | Unclassed | EF Pro Cycling | 8 | 841 |
| 81 | Filippo Ganna | Unclassed | Ineos Grenadiers | 6 | 1597 |
| 110 | Jai Hindley | Climber | Team Sunweb | 6 | 2106 |
| 117 | Brandon McNulty | Unclassed | UAE-Team Emirates | 6 | 1018 |


In [5]:
# total cost
riders_df[riders_df['chosen']].cost.sum()

88

In [6]:
# total score
riders_df[riders_df['chosen']].score.sum()

16128
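As a cross-check, the reported team can be tested against the constraints without Pyomo or pandas. The sketch below copies the nine riders from the solver output above and recomputes the totals and per-class counts; the variable names are choices of this sketch.

```python
from collections import Counter

# (name, class, cost, score), copied from the solver output above
TEAM = [
    ("Peter Sagan", "Sprinter", 16, 1486),
    ("Wilco Kelderman", "All Rounder", 14, 2174),
    ("João Almeida", "All Rounder", 12, 2966),
    ("Pello Bilbao", "Climber", 10, 1481),
    ("Tao Geoghegan Hart", "All Rounder", 10, 2459),
    ("Ruben Guerreiro", "Unclassed", 8, 841),
    ("Filippo Ganna", "Unclassed", 6, 1597),
    ("Jai Hindley", "Climber", 6, 2106),
    ("Brandon McNulty", "Unclassed", 6, 1018),
]

MIN_PER_CLASS = {"All Rounder": 2, "Climber": 2, "Sprinter": 1, "Unclassed": 3}

total_cost = sum(cost for _, _, cost, _ in TEAM)
total_score = sum(score for _, _, _, score in TEAM)
class_counts = Counter(cls for _, cls, _, _ in TEAM)
meets_minimums = all(class_counts[c] >= m for c, m in MIN_PER_CLASS.items())

print(len(TEAM), total_cost, total_score)  # 9 88 16128
print(dict(class_counts))
print(meets_minimums)                      # True
```

The team uses its free 9th place on a third All Rounder and leaves 12 of the 100 budget points unspent, so the cost constraint is not binding at the optimum.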