<h2>Xây dựng mô hình tối ưu hóa đội hình</h2>


<h5>Phần này sẽ sử dụng mô hình tuyến tính nguyên hỗn hợp (MILP - Mixed-Integer Linear Programming) để chọn ra đội hình gồm 11 cầu thủ tối ưu nhất</h5>

<h4>1. Các thư viện</h4>

In [444]:
import numpy as np
import pandas as pd
from pulp import *


<h4>2. Phân tích dữ liệu</h4>

In [445]:
sofifa = pd.read_csv('./data/sofifa_players_attr_modified.csv',encoding='utf-8-sig')
print(sofifa.shape)
sofifa.head(10)

(85427, 70)


Unnamed: 0,Acceleration,Age,Aggression,Agility,All_Positions,Attacking_Work_Rate,Balance,Ball_Control,Birthday,Body_Type,...,Team2_Position,Team2_Rating,Traits,Update_Date,Value,Vision,Volleys,Wage,Weak_Foot,Weight
0,56,19,55,41,CB,Low,47,26,2004-01-08,Lean (170-185),...,,,,2023-09-22,€100K,34,26,€2K,3,80
1,60,19,53,45,RB CB,Medium,53,38,1994-02-19,Normal (185+),...,,,,2013-09-20,€35K,30,22,€1K,2,82
2,60,18,31,55,ST,Medium,81,44,1997-01-10,Lean (170-185),...,,,,2015-09-21,€70K,49,49,€2K,3,68
3,60,19,42,64,CM,Medium,64,50,1995-01-11,Lean (185+),...,,,,2014-09-18,€25K,54,38,€2K,2,77
4,22,18,24,26,GK,Medium,52,14,1998-03-02,Normal (170-185),...,,,,2016-09-20,€60K,26,6,€2K,3,80
5,57,17,45,44,CB,Medium,64,30,2000-01-06,Lean (170-185),...,,,,2017-09-18,€60K,27,20,€6K,3,67
6,68,19,50,56,RB,Medium,66,54,1998-12-18,Normal (170-185),...,,,,2018-08-21,€110K,33,27,€3K,3,70
7,63,17,33,59,ST,Medium,49,44,2001-07-22,Normal (185+),...,,,,2019-09-19,€70K,33,50,€1K,2,77
8,65,19,57,56,CM,Medium,68,53,1999-09-02,Lean (170-185),...,,,,2019-09-19,€110K,49,40,€4K,3,70
9,65,22,50,55,LWB,Medium,60,46,1998-06-16,Lean (170-185),...,,,,2020-09-23,€100K,44,33,€4K,2,73


Lấy phiên bản mới nhất Sofifa

In [446]:
last_version = sofifa.sort_values('Update_Date', ascending=False)['Update_Date'].unique()[0]
last_version

'2023-10-04'

Sử dụng tất cả các cầu thủ trong phiên bản mới nhất

In [447]:
sofifa_last_version = sofifa[sofifa['Update_Date'] == last_version]
sofifa_last_version = sofifa_last_version[sofifa_last_version['Team1_Position'] != 'RES']

In [448]:
sofifa_last_version.isnull().any()

Acceleration     False
Age              False
Aggression       False
Agility          False
All_Positions    False
                 ...  
Vision           False
Volleys          False
Wage             False
Weak_Foot        False
Weight           False
Length: 70, dtype: bool

In [449]:
print(np.unique(sofifa_last_version['Preferred_Foot']))
print(np.unique(sofifa_last_version['Body_Type']))

['Left' 'Right']
['Lean (170-)' 'Lean (170-185)' 'Lean (185+)' 'Normal (170-)'
 'Normal (170-185)' 'Normal (185+)' 'Stocky (170-)' 'Stocky (170-185)'
 'Stocky (185+)' 'Unique']


In [450]:
sofifa_last_version['Preferred_Foot'] = sofifa_last_version['Preferred_Foot'].map({'Left': 0, 'Right': 1})

Thay đổi giá trị tiền tệ cho Value (giá trị cầu thủ) và Wage (lương theo tuần)

In [451]:
currencies = np.array([list(value)[0] for value in sofifa_last_version.Value.tolist()])
print(np.unique(currencies, return_counts=True))

(array(['€'], dtype='<U1'), array([460], dtype=int64))


In [452]:
def value_to_num(col):
    if pd.isnull(col): return 0
    
    value = col.replace('€', '').replace('M', '').replace('K', '')
    
    if col[-1] == 'M': unit = 1e6
    elif col[-1] == 'K': unit = 1e3
    else: unit = 1
    
    return float(value)*unit

sofifa_last_version.Value = sofifa_last_version.Value.apply(value_to_num)
sofifa_last_version.Wage = sofifa_last_version.Wage.apply(value_to_num)

In [453]:
sofifa_last_version[['Player_Name', 'Value', 'Wage']].head()

Unnamed: 0,Player_Name,Value,Wage
63705,D. Mubama,525000.0,4000.0
64118,J. Feeney,500000.0,3000.0
64131,O. Kellyman,550000.0,1000.0
64140,T. O'Reilly,625000.0,6000.0
64142,M. Olakigbe,625000.0,5000.0


<h4>3. Mô hình đưa ra 11 cầu thủ tối ưu cho đội bóng</h4>

Với mỗi đội bóng, mô hình sẽ cho ra kết quả 11 cầu thủ tối ưu nhất dựa vào: <br> 
- All_Positions: các vị trí cầu thủ trên sân
- Potential: chỉ số tiềm năng tổng thể đánh giá cầu thủ
- Value: giá trị cầu thủ

In [454]:
def get_optimized_team(df, n_gk_line=0, n_defense_line=0, n_attack_midfield_combined=0):

    gk_line = ["GK"]
    attack_line = ["LS", "ST", "RS", "LW", "LF", "CF", "RF", "RW"]
    midfield_line = ["LAM", "CAM", "RAM", "LM", "LCM", "CM", "RCM", "RM", "LDM", "CDM", "RDM"]
    defense_line = ["LB", "LCB", "CB", "RCB", "RB", "LWB", "RWB"]

    attack_midfield_combined = attack_line + midfield_line

    list_dicts = [  {},   {},   {},   {}]
    list_pos   = [gk_line, defense_line, midfield_line, attack_line]
    list_n     = [n_gk_line, n_defense_line, n_attack_midfield_combined]

    players_ids = [str(i) for i in range(df.shape[0])]
    ratings = {i:rating for i, rating in zip(players_ids, df.Potential.values)}
    values = {i:value for i, value in zip(players_ids, df.Value.values)}
        
    for player_dict, player_pos, n_players in zip(list_dicts, list_pos, list_n):
        if n_players <= 0: continue
            
        for i, pos in zip(players_ids, df.All_Positions.values):

            for k in player_pos:

                player_dict.update({i:1} if pos.find(k) >= 0 else {i:0})
            
    players_vars = LpVariable.dicts(name="Players", indexs=players_ids, cat=LpBinary)

    prob = LpProblem(name="SoFIFA", sense=LpMaximize)
    
    # objective function
    prob += lpSum([ratings[i]*players_vars[i] for i in players_ids])
        
    # constraints
    prob += lpSum([players_vars[i] for i in players_ids]) == sum(list_n)
    prob += lpSum([values[i]*players_vars[i] for i in players_ids]) #<= max_budget
    for dict_player, n_players in zip(list_dicts, list_n):
        if n_players > 0:
            prob += lpSum([dict_player[i]*players_vars[i] for i in players_ids]) == n_players    

    prob.solve()
    
    idxs = np.array([int(v.name.split("_")[-1]) for v in prob.variables() if v.varValue == 1])
    mask_players = np.zeros(df.shape[0], dtype=np.bool_)
    mask_players[idxs] = True
    
    return prob.status, mask_players

Sử dụng các đội trong FBref trong mùa mới nhất

In [455]:
fbref = pd.read_csv('./data/fbref_matchinfos_modified.csv',encoding='utf-8-sig')
fbref = fbref[fbref['Season'] == '2023/2024']

In [456]:
list_all_teams = fbref['Home_Team'].unique()
list_all_teams

array(['Tottenham Hotspur', 'Aston Villa', 'Sheffield United', 'Chelsea',
       'Nottingham Forest', 'Manchester City', 'Brentford',
       'Newcastle United', 'Bournemouth', 'Liverpool', 'Arsenal',
       'Brighton & Hove Albion', 'Wolverhampton Wanderers',
       'West Ham United', 'Crystal Palace', 'Fulham', 'Burnley',
       'Manchester United', 'Everton', 'Luton Town'], dtype=object)

Kiểm tra các tên đội bị khác tên giữa Sofifa và FBref

In [457]:
different_team = set(fbref['Home_Team']) - set(sofifa_last_version['Team1'])
different_team

{'Bournemouth'}

In [458]:
sofifa_last_version['Team1'] = sofifa_last_version['Team1'].replace('AFC Bournemouth', 'Bournemouth')
sofifa_last_version['Team2'] = sofifa_last_version['Team2'].replace('AFC Bournemouth', 'Bournemouth')

Đội hình tối ưu cho các đội

In [459]:
optimize_teams = []

for team in list_all_teams:

    this_team = sofifa_last_version[(sofifa_last_version['Team1'] == team) | 
                                    (sofifa_last_version['Team2'] == team)]
    
    #this_team = this_team[this_team['Team1_Position'] != 'RES']
    
    status, mask_players = get_optimized_team(this_team, n_gk_line=1, n_defense_line=4, n_attack_midfield_combined=6)
     
    optimize_teams.append([status, mask_players, team])


  players_vars = LpVariable.dicts(name="Players", indexs=players_ids, cat=LpBinary)


In [460]:
optimize_teams_sofifa = pd.DataFrame()

for i in range(0, len(optimize_teams)):
    

    by_team = sofifa_last_version[(sofifa_last_version['Team1'] == optimize_teams[i][2]) | 
                                   (sofifa_last_version['Team2'] == optimize_teams[i][2])].iloc[optimize_teams[i][1]][[
                                                                    'Sofifa_Id',
                                                                    'Team1', 
                                                                    'Team2',
                                                                    'Player_Name', 
                                                                    'All_Positions', 
                                                                    'Age', 
                                                                    'Overall_Rating', 
                                                                    'Potential', 
                                                                    'Value',
                                                                    'Team1_Position']]

    optimize_teams_sofifa = pd.concat([optimize_teams_sofifa, by_team], axis=0)

    print(optimize_teams[i][2])

Tottenham Hotspur
Aston Villa
Sheffield United
Chelsea
Nottingham Forest
Manchester City
Brentford
Newcastle United
Bournemouth
Liverpool
Arsenal
Brighton & Hove Albion
Wolverhampton Wanderers
West Ham United
Crystal Palace
Fulham
Burnley
Manchester United
Everton
Luton Town


In [461]:
for i in range(0, len(optimize_teams)):
    print(optimize_teams[i][2] + ':      ' + LpStatus[optimize_teams[i][0]], optimize_teams[i][1].sum())

Tottenham Hotspur:      Infeasible 11
Aston Villa:      Infeasible 11
Sheffield United:      Infeasible 11
Chelsea:      Infeasible 11
Nottingham Forest:      Infeasible 12
Manchester City:      Infeasible 11
Brentford:      Infeasible 11
Newcastle United:      Infeasible 11
Bournemouth:      Infeasible 11
Liverpool:      Infeasible 11
Arsenal:      Infeasible 11
Brighton & Hove Albion:      Infeasible 11
Wolverhampton Wanderers:      Infeasible 13
West Ham United:      Infeasible 11
Crystal Palace:      Infeasible 11
Fulham:      Infeasible 11
Burnley:      Infeasible 11
Manchester United:      Infeasible 11
Everton:      Infeasible 11
Luton Town:      Infeasible 11


In [462]:
#print(LpStatus[optimize_teams[2][0]], optimize_teams[2][1].sum())

In [463]:
optimize_teams_sofifa

Unnamed: 0,Sofifa_Id,Team1,Team2,Player_Name,All_Positions,Age,Overall_Rating,Potential,Value,Team1_Position
67966,264453,Tottenham Hotspur,,M. van de Ven,CB LB,22,78,85,26500000.0,LCB
68499,231943,Tottenham Hotspur,,Richarlison,ST LW RW,26,80,83,28000000.0,SUB
68522,247394,Tottenham Hotspur,Sweden,D. Kulusevski,RW,23,81,85,37000000.0,RW
68523,243576,Tottenham Hotspur,,Pedro Porro,RB RWB,23,81,84,32500000.0,RB
68524,226226,Tottenham Hotspur,Argentina,G. Lo Celso,CAM ST,27,81,82,29500000.0,SUB
...,...,...,...,...,...,...,...,...,...,...
66034,243608,Luton Town,,R. Giles,LWB LB,23,74,79,6000000.0,LWB
66035,241928,Luton Town,,A. Sambi Lokonga,CDM CM,23,74,80,6000000.0,SUB
66036,222994,Luton Town,,M. Nakamba,CDM CM,29,74,74,3600000.0,CDM
66037,211363,Wales,Luton Town,T. Lockyer,CB,28,74,74,3600000.0,SUB


Xuất kết quả ra CSV

In [465]:
optimize_teams_sofifa.to_csv('./output/optimize_teams_sofifa.csv', index=False)

In [None]:
#sofifa_last_version[sofifa_last_version['Team1'] == optimize_teams[2][2]][['Team1', 'Team2',
#                                                                            'Sofifa_Id', 
#                                                                            'Player_Name', 
#                                                                            'All_Positions', 
#                                                                            'Age', 
#                                                                            'Overall_Rating', 
#                                                                            'Potential', 
#                                                                            'Value',
#                                                                            'Team1_Position'
#                                                                            ]]

In [None]:
#sofifa_last_version[sofifa_last_version['Team1'] == optimize_teams[12][2]].iloc[optimize_teams[12][1]][['Team1', 
#                                                                            'Sofifa_Id', 
#                                                                            'Player_Name', 
#                                                                            'All_Positions', 
#                                                                            'Age', 
#                                                                            'Overall_Rating', 
#                                                                            'Potential', 
#                                                                            'Value',
#                                                                            'Team1_Position']]