<a href="https://colab.research.google.com/github/MKHO1RUL/Capstone-Project-Data-Classification-and-Summarization-Using-IBM-Granite/blob/main/Capstone_Project_Data_Classification_and_Summarization_Using_IBM_Granite.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Europe Top 5 Leagues 2024/2025 Player of the Season

This project aims to identify the football player most deserving of the 2024/2025 Player of the Season title in Europe's top five leagues, using data analytics and the IBM Granite large language model (LLM). Rather than relying on subjective opinion, I rank players with a data-driven analysis of their performance statistics from the 2024/2025 season.

In [421]:
!pip install langchain_community
!pip install replicate



In [422]:
import os
from google.colab import userdata
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

api_token = userdata.get("api_token")
os.environ["REPLICATE_API_TOKEN"] = api_token

In [423]:
from langchain_community.llms import Replicate

# Define parameters
parameters = {
  "top_k": 5,
  "top_p": 1.0,
  "max_tokens": 10000,
  "min_tokens": 0,
  "random_seed": None,
  "repetition_penalty": 1.0,
  "temperature": 0.7,
  "stopping_criteria": "length (256 tokens)",
  "stopping_sequence": None
}

# Define the model
llm = Replicate(
  model="ibm-granite/granite-3.3-8b-instruct",
  input=parameters
)



**Basic Player Information**

Player – Player's name

Nation – Player's nationality

Pos – Position (FW, MF, DF, GK)

Squad – Club name

Comp – League

Age – Age of the player

Born – Year of birth

**Playing Time & Appearances**

MP – Matches played

Starts – Games started

Min – Minutes played

90s – Minutes played divided by 90

**Attacking Stats**

Gls – Goals scored

Ast – Assists provided

G+A – Goals + Assists

xG – Expected goals

xAG – Expected assisted goals

npxG – Non-penalty expected goals

G-PK – Goals excluding penalties

**Defensive Stats**

Tkl – Total tackles

TklW – Tackles won

Blocks – Blocks made

Int – Interceptions

Tkl+Int – Combined tackles and interceptions

Clr – Clearances

Err – Errors leading to an opponent's shot

**Passing & Creativity Stats**

PrgP – Progressive passes

PrgC – Progressive carries

KP – Key passes (passes leading to a shot)

Cmp%_stats_passing – Pass completion percentage

Ast_stats_passing – Assists

xA – Expected assists

PPA – Passes into the penalty area

**Goalkeeping Stats**

GA – Goals conceded

Saves – Saves made

Save% – Save percentage

CS – Clean sheets

CS% – Clean sheet percentage

PKA – Penalties faced

PKsv – Penalty saves

**Possession & Ball Control**

Touches – Total touches of the ball

Carries – Total ball carries

PrgR – Progressive passes received

Mis – Miscontrols

Dis – Times dispossessed

**Miscellaneous Stats**

CrdY – Yellow cards

CrdR – Red cards

PKwon – Penalties won

PKcon – Penalties conceded

Recov – Ball recoveries
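
Counting stats such as Gls favour players with more minutes, so per-90 rates are a common companion to the raw totals. The sketch below assumes that the 90s column equals Min / 90, which agrees with the rows printed later in the notebook (2768 minutes is listed as 30.8). The helper name `per_90` is hypothetical and the goal count is an illustrative value.

```python
def per_90(stat_total, minutes):
    """Return a counting stat expressed per 90 minutes played.

    Hypothetical helper; returns 0.0 when no minutes were played
    to avoid dividing by zero.
    """
    if minutes <= 0:
        return 0.0
    return stat_total / (minutes / 90)

# Illustrative values: 9 goals in 2768 minutes
nineties = 2768 / 90
goals_per_90 = per_90(9, 2768)
print(round(nineties, 1), round(goals_per_90, 2))
```

Per-90 rates are noisy for players with very few minutes, so a minimum-minutes filter is usually applied alongside them.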

In [424]:
df = pd.read_csv("/content/players_data-2024_2025.csv")
df.head()

Unnamed: 0,Rk,Player,Nation,Pos,Squad,Comp,Age,Born,MP,Starts,Min,90s,Gls,Ast,G+A,G-PK,PK,PKatt,CrdY,CrdR,xG,npxG,xAG,npxG+xAG,PrgC,PrgP,PrgR,G+A-PK,xG+xAG,Rk_stats_shooting,Nation_stats_shooting,Pos_stats_shooting,Comp_stats_shooting,Age_stats_shooting,Born_stats_shooting,90s_stats_shooting,Gls_stats_shooting,Sh,SoT,SoT%,Sh/90,SoT/90,G/Sh,G/SoT,Dist,FK,PK_stats_shooting,PKatt_stats_shooting,xG_stats_shooting,npxG_stats_shooting,npxG/Sh,G-xG,np:G-xG,Rk_stats_passing,Nation_stats_passing,Pos_stats_passing,Comp_stats_passing,Age_stats_passing,Born_stats_passing,90s_stats_passing,Cmp,Att,Cmp%,TotDist,PrgDist,Ast_stats_passing,xAG_stats_passing,xA,A-xAG,KP,1/3,PPA,CrsPA,PrgP_stats_passing,Rk_stats_passing_types,Nation_stats_passing_types,Pos_stats_passing_types,Comp_stats_passing_types,Age_stats_passing_types,Born_stats_passing_types,90s_stats_passing_types,Att_stats_passing_types,Live,Dead,FK_stats_passing_types,TB,Sw,Crs,TI,CK,In,Out,Str,Cmp_stats_passing_types,Off,Blocks,Rk_stats_gca,Nation_stats_gca,Pos_stats_gca,Comp_stats_gca,Age_stats_gca,Born_stats_gca,90s_stats_gca,SCA,SCA90,PassLive,PassDead,TO,Sh_stats_gca,Fld,Def,GCA,GCA90,Rk_stats_defense,Nation_stats_defense,Pos_stats_defense,Comp_stats_defense,Age_stats_defense,Born_stats_defense,90s_stats_defense,Tkl,TklW,Def 3rd,Mid 3rd,Att 3rd,Att_stats_defense,Tkl%,Lost,Blocks_stats_defense,Sh_stats_defense,Pass,Int,Tkl+Int,Clr,Err,Rk_stats_possession,Nation_stats_possession,Pos_stats_possession,Comp_stats_possession,Age_stats_possession,Born_stats_possession,90s_stats_possession,Touches,Def Pen,Def 3rd_stats_possession,Mid 3rd_stats_possession,Att 3rd_stats_possession,Att 
Pen,Live_stats_possession,Att_stats_possession,Succ,Succ%,Tkld,Tkld%,Carries,TotDist_stats_possession,PrgDist_stats_possession,PrgC_stats_possession,1/3_stats_possession,CPA,Mis,Dis,Rec,PrgR_stats_possession,Rk_stats_playing_time,Nation_stats_playing_time,Pos_stats_playing_time,Comp_stats_playing_time,Age_stats_playing_time,Born_stats_playing_time,MP_stats_playing_time,Min_stats_playing_time,Mn/MP,Min%,90s_stats_playing_time,Starts_stats_playing_time,Mn/Start,Compl,Subs,Mn/Sub,unSub,PPM,onG,onGA,+/-,+/-90,On-Off,onxG,onxGA,xG+/-,xG+/-90,Rk_stats_misc,Nation_stats_misc,Pos_stats_misc,Comp_stats_misc,Age_stats_misc,Born_stats_misc,90s_stats_misc,CrdY_stats_misc,CrdR_stats_misc,2CrdY,Fls,Fld_stats_misc,Off_stats_misc,Crs_stats_misc,Int_stats_misc,TklW_stats_misc,PKwon,PKcon,OG,Recov,Won,Lost_stats_misc,Won%,Rk_stats_keeper,Nation_stats_keeper,Pos_stats_keeper,Comp_stats_keeper,Age_stats_keeper,Born_stats_keeper,MP_stats_keeper,Starts_stats_keeper,Min_stats_keeper,90s_stats_keeper,GA,GA90,SoTA,Saves,Save%,W,D,L,CS,CS%,PKatt_stats_keeper,PKA,PKsv,PKm,Rk_stats_keeper_adv,Nation_stats_keeper_adv,Pos_stats_keeper_adv,Comp_stats_keeper_adv,Age_stats_keeper_adv,Born_stats_keeper_adv,90s_stats_keeper_adv,GA_stats_keeper_adv,PKA_stats_keeper_adv,FK_stats_keeper_adv,CK_stats_keeper_adv,OG_stats_keeper_adv,PSxG,PSxG/SoT,PSxG+/-,/90,Cmp_stats_keeper_adv,Att_stats_keeper_adv,Cmp%_stats_keeper_adv,Att (GK),Thr,Launch%,AvgLen,Opp,Stp,Stp%,#OPA,#OPA/90,AvgDist
0,1,Max Aarons,eng ENG,DF,Bournemouth,eng Premier League,24.0,2000.0,3,1,86,1.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,1,8,3,0.0,0.0,1,eng ENG,DF,eng Premier League,24.0,2000.0,1.0,0,0,0,,0.0,0.0,,,,0,0,0,0.0,0.0,,0.0,0.0,1,eng ENG,DF,eng Premier League,24.0,2000.0,1.0,50,63,79.4,887,361,0,0.0,0.0,0.0,0,8,0,0,8,1,eng ENG,DF,eng Premier League,24.0,2000.0,1.0,63,51,12,2,0,0,2,10,0,0,0,0,50,0,1,1,eng ENG,DF,eng Premier League,24.0,2000.0,1.0,2,2.09,2,0,0,0,0,0,0,0.0,1,eng ENG,DF,eng Premier League,24.0,2000.0,1.0,2,2,1,1,0,1,100.0,0,3,1,2,1,3,0,0,1,eng ENG,DF,eng Premier League,24.0,2000.0,1.0,73,2,19,40,15,0,73,2,0,0.0,1,50.0,41,152,68,1,0,0,1,0,40,3,1,eng ENG,DF,eng Premier League,24.0,2000.0,3,86,29,2.5,1.0,1,61.0,0,2,13.0,11,0.67,2,0,2,2.09,1.82,2.3,0.3,2.0,2.12,1,eng ENG,DF,eng Premier League,24.0,2000.0,1.0,0,0,0,0,2,0,2,1,2,0,0,0,7,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
1,2,Max Aarons,eng ENG,"DF,MF",Valencia,es La Liga,24.0,2000.0,4,1,120,1.3,0,0,0,0,0,0,2,0,0.0,0.0,0.0,0.0,0,6,10,0.0,0.02,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,1.3,0,0,0,,0.0,0.0,,,,0,0,0,0.0,0.0,,0.0,0.0,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,1.3,47,66,71.2,705,190,0,0.0,0.0,0.0,1,2,0,0,6,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,1.3,66,54,12,1,0,0,5,11,0,0,0,0,47,0,1,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,1.3,1,0.75,1,0,0,0,0,0,0,0.0,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,1.3,4,4,2,2,0,5,80.0,1,1,0,1,0,4,3,0,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,1.3,85,1,21,28,36,0,85,4,1,25.0,2,50.0,46,215,103,0,3,0,3,1,49,10,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,4,120,30,3.5,1.3,1,73.0,0,3,16.0,14,0.75,1,3,-2,-1.5,-1.28,1.5,3.7,-2.3,-1.69,2,eng ENG,"DF,MF",es La Liga,24.0,2000.0,1.3,2,0,0,0,2,0,5,0,4,0,0,0,7,2,1,66.7,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
2,3,Rodrigo Abajas,es ESP,DF,Valencia,es La Liga,21.0,2003.0,1,1,65,0.7,0,0,0,0,0,0,1,0,0.1,0.1,0.0,0.1,3,2,3,0.0,0.1,3,es ESP,DF,es La Liga,21.0,2003.0,0.7,0,1,0,0.0,1.38,0.0,0.0,,24.5,0,0,0,0.1,0.1,0.07,-0.1,-0.1,3,es ESP,DF,es La Liga,21.0,2003.0,0.7,17,29,58.6,268,110,0,0.0,0.0,0.0,0,0,0,0,2,3,es ESP,DF,es La Liga,21.0,2003.0,0.7,29,21,8,0,0,0,1,8,0,0,0,0,17,0,2,3,es ESP,DF,es La Liga,21.0,2003.0,0.7,0,0.0,0,0,0,0,0,0,0,0.0,3,es ESP,DF,es La Liga,21.0,2003.0,0.7,3,2,2,1,0,3,100.0,0,1,0,1,1,4,0,0,3,es ESP,DF,es La Liga,21.0,2003.0,0.7,36,1,8,19,9,1,36,1,1,100.0,0,0.0,13,101,67,3,2,1,0,2,16,3,4,es ESP,DF,es La Liga,21.0,2003.0,1,65,65,1.9,0.7,1,65.0,0,0,,8,0.0,1,2,-1,-1.38,-1.14,1.4,0.7,0.7,0.93,3,es ESP,DF,es La Liga,21.0,2003.0,0.7,1,0,0,2,0,1,1,1,2,0,0,0,2,0,1,0.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
3,4,James Abankwah,ie IRL,"DF,MF",Udinese,it Serie A,20.0,2004.0,6,0,88,1.0,0,0,0,0,0,0,1,0,0.1,0.1,0.0,0.1,3,4,1,0.0,0.06,4,ie IRL,"DF,MF",it Serie A,20.0,2004.0,1.0,0,1,0,0.0,1.02,0.0,0.0,,15.0,0,0,0,0.1,0.1,0.06,-0.1,-0.1,4,ie IRL,"DF,MF",it Serie A,20.0,2004.0,1.0,36,46,78.3,614,206,0,0.0,0.0,0.0,0,2,0,0,4,4,ie IRL,"DF,MF",it Serie A,20.0,2004.0,1.0,46,45,1,1,0,0,0,0,0,0,0,0,36,0,0,4,ie IRL,"DF,MF",it Serie A,20.0,2004.0,1.0,1,1.02,1,0,0,0,0,0,0,0.0,4,ie IRL,"DF,MF",it Serie A,20.0,2004.0,1.0,4,2,4,0,0,3,66.7,1,2,1,1,1,5,3,0,4,ie IRL,"DF,MF",it Serie A,20.0,2004.0,1.0,65,8,37,22,7,2,65,0,0,,0,,29,219,165,3,1,1,1,3,34,1,5,ie IRL,"DF,MF",it Serie A,20.0,2004.0,6,88,15,2.6,1.0,0,,0,6,15.0,12,1.67,2,0,2,2.05,2.5,0.6,1.5,-0.9,-0.91,4,ie IRL,"DF,MF",it Serie A,20.0,2004.0,1.0,1,0,0,4,3,0,0,1,2,0,0,0,7,2,2,50.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
4,5,Keyliane Abdallah,fr FRA,FW,Marseille,fr Ligue 1,18.0,2006.0,1,0,3,0.0,0,0,0,0,0,0,0,0,0.0,0.0,0.0,0.0,1,0,0,0.0,0.0,5,fr FRA,FW,fr Ligue 1,18.0,2006.0,0.0,0,0,0,,0.0,0.0,,,,0,0,0,0.0,0.0,,0.0,0.0,5,fr FRA,FW,fr Ligue 1,18.0,2006.0,0.0,2,2,100.0,41,0,0,0.0,0.0,0.0,0,0,0,0,0,5,fr FRA,FW,fr Ligue 1,18.0,2006.0,0.0,2,2,0,0,0,0,0,0,0,0,0,0,2,0,0,5,fr FRA,FW,fr Ligue 1,18.0,2006.0,0.0,0,0.0,0,0,0,0,0,0,0,0.0,5,fr FRA,FW,fr Ligue 1,18.0,2006.0,0.0,1,1,1,0,0,1,100.0,0,0,0,0,0,1,0,0,5,fr FRA,FW,fr Ligue 1,18.0,2006.0,0.0,4,0,3,1,0,0,4,0,0,,0,,1,10,9,1,0,0,1,0,3,0,7,fr FRA,FW,fr Ligue 1,18.0,2006.0,1,3,3,0.1,0.0,0,,0,1,3.0,7,3.0,0,0,0,0.0,-0.79,0.0,0.0,0.0,-0.49,5,fr FRA,FW,fr Ligue 1,18.0,2006.0,0.0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


Categorizing players by position. Players listed with several positions (for example FW,MF) are assigned to the first one.

In [425]:
# Use the first listed position (e.g. 'FW' for 'FW,MF') as the primary position
primary_pos = df['Pos'].astype(str).str.split(',').str[0]

forwards = df[primary_pos == 'FW'].copy()
midfielders = df[primary_pos == 'MF'].copy()
defenders = df[primary_pos == 'DF'].copy()
goalkeepers = df[primary_pos == 'GK'].copy()

print("Forwards:")
print(forwards)
print("\nMidfielders:")
print(midfielders)
print("\nDefenders:")
print(defenders)
print("\nGoalkeepers:")
print(goalkeepers)

Forwards:
        Rk             Player   Nation    Pos           Squad  \
4        5  Keyliane Abdallah   fr FRA     FW       Marseille   
13      14     Matthis Abline   fr FRA     FW          Nantes   
17      18      Tammy Abraham  eng ENG     FW            Roma   
18      19      Tammy Abraham  eng ENG     FW           Milan   
22      23         Akor Adams   ng NGA     FW         Sevilla   
...    ...                ...      ...    ...             ...   
2837  2838      Edon Zhegrova   xk KVX  FW,MF           Lille   
2841  2842     Joshua Zirkzee   nl NED  FW,MF  Manchester Utd   
2842  2843    Budu Zivzivadze   ge GEO     FW      Heidenheim   
2852  2853        Milan Đurić   ba BIH     FW           Monza   
2853  2854        Milan Đurić   ba BIH     FW           Parma   

                    Comp   Age    Born  MP  Starts   Min   90s  Gls  Ast  G+A  \
4             fr Ligue 1  18.0  2006.0   1       0     3   0.0    0    0    0   
13            fr Ligue 1  21.0  2003.0  34     
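
The listing shows some players twice because they played for two clubs during the season (Tammy Abraham for Roma and Milan, Milan Đurić for Monza and Parma). The notebook ranks each row separately. If season totals per player are wanted instead, one option is to sum the additive columns per player before ranking. The sketch below uses a small made-up frame; grouping on Player plus Born is an assumption about what identifies a player uniquely.

```python
import pandas as pd

# Made-up rows that mimic a mid-season transfer (values are illustrative only)
rows = pd.DataFrame({
    "Player": ["Player A", "Player A", "Player B"],
    "Born":   [1997, 1997, 2003],
    "Squad":  ["Club X", "Club Y", "Club Z"],
    "Min":    [100, 1200, 2700],
    "Gls":    [0, 3, 9],
    "Ast":    [0, 4, 2],
})

# Only additive stats can be summed; rates and percentages must be recomputed
count_cols = ["Min", "Gls", "Ast"]

season_totals = (
    rows.groupby(["Player", "Born"], as_index=False)
        .agg({**{c: "sum" for c in count_cols},
              "Squad": lambda s: " / ".join(s)})
)
print(season_totals)
```

Whether to merge such rows is a design choice: keeping them separate measures a player's output at one club, while merging measures the whole season.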

Data normalization

In [426]:
from sklearn.preprocessing import MinMaxScaler

forward_cols = ['Gls', 'xG', 'Ast', 'xAG', 'Carries', 'PrgR', 'Dis', 'PKwon', 'CrdY', 'CrdR']
midfielder_cols = ['Ast', 'xAG', 'KP', 'PPA', 'PrgP', 'Carries', 'PrgR', 'TklW', 'Int', 'Recov', 'Mis', 'CrdY', 'CrdR']
defender_cols = ['Tkl', 'TklW', 'Blocks', 'Int', 'Clr', 'Err', 'PrgC', 'Recov', 'CrdY', 'CrdR']
goalkeeper_cols = ['GA', 'Saves', 'Save%', 'CS', 'CS%', 'Err', 'CrdY', 'CrdR']

scaler = MinMaxScaler()

forwards_normalized = forwards.copy()
forwards_normalized[forward_cols] = scaler.fit_transform(forwards_normalized[forward_cols])

midfielders_normalized = midfielders.copy()
midfielders_normalized[midfielder_cols] = scaler.fit_transform(midfielders_normalized[midfielder_cols])

defenders_normalized = defenders.copy()
defenders_normalized[defender_cols] = scaler.fit_transform(defenders_normalized[defender_cols])

goalkeepers_normalized = goalkeepers.copy()
goalkeepers_normalized[goalkeeper_cols] = scaler.fit_transform(goalkeepers_normalized[goalkeeper_cols])

print("Forwards Normalized:")
print(forwards_normalized.head())
print("\nMidfielders Normalized:")
print(midfielders_normalized.head())
print("\nDefenders Normalized:")
print(defenders_normalized.head())
print("\nGoalkeepers Normalized:")
print(goalkeepers_normalized.head())

Forwards Normalized:
    Rk             Player   Nation Pos      Squad        Comp   Age    Born  \
4    5  Keyliane Abdallah   fr FRA  FW  Marseille  fr Ligue 1  18.0  2006.0   
13  14     Matthis Abline   fr FRA  FW     Nantes  fr Ligue 1  21.0  2003.0   
17  18      Tammy Abraham  eng ENG  FW       Roma  it Serie A  26.0  1997.0   
18  19      Tammy Abraham  eng ENG  FW      Milan  it Serie A  26.0  1997.0   
22  23         Akor Adams   ng NGA  FW    Sevilla  es La Liga  24.0  2000.0   

    MP  Starts   Min   90s       Gls       Ast  G+A  G-PK  PK  PKatt  \
4    1       0     3   0.0  0.000000  0.000000    0     0   0      0   
13  34      33  2768  30.8  0.290323  0.111111   11     8   1      1   
17   1       0     1   0.0  0.000000  0.000000    0     0   0      0   
18  28      12  1183  13.1  0.096774  0.222222    7     2   1      2   
22   4       1   134   1.5  0.000000  0.000000    0     0   0      0   

        CrdY  CrdR        xG  npxG       xAG  npxG+xAG  PrgC  PrgP     
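
`MinMaxScaler` rescales each column to the range 0 to 1 using x' = (x - min) / (max - min), computed separately for every column it is fitted on. The pure-Python sketch below reproduces that formula. The values 0.096774 and 0.290323 in the Gls column above are what the formula gives for 3 and 9 on a 0 to 31 scale, but the actual column range is not shown in the output, so the 31 here is an assumption.

```python
def min_max_scale(values):
    """Rescale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Illustrative goal totals; the maximum of 31 is assumed, not read from the data
goals = [0, 3, 9, 31]
scaled = min_max_scale(goals)
print([round(s, 6) for s in scaled])
```

Because the scaler is fitted once per position group, a normalized value is relative to the best player in that group only, not to the whole dataset.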

Top 50 forwards with the most G+A

In [427]:
top_forwards = forwards_normalized.sort_values(by='G+A', ascending=False).head(50)
print("\nTop 50 Forwards by Goals + Assists:")
print(top_forwards[['Player', 'Squad', 'Comp', 'Pos', 'Min', 'Gls', 'xG', 'Ast', 'xAG', 'G+A', 'Carries', 'PrgR', 'Dis', 'PKwon', 'CrdY', 'CrdR']])



Top 50 Forwards by Goals + Assists:
                    Player            Squad                Comp    Pos   Min  \
2304         Mohamed Salah        Liverpool  eng Premier League     FW  3371   
1317            Harry Kane    Bayern Munich       de Bundesliga     FW  2381   
1691         Kylian Mbappé      Real Madrid          es La Liga     FW  2907   
2201         Mateo Retegui         Atalanta          it Serie A     FW  2383   
1219        Alexander Isak    Newcastle Utd  eng Premier League     FW  2756   
1483    Robert Lewandowski        Barcelona          es La Liga     FW  2667   
1960         Michael Olise    Bayern Munich       de Bundesliga  FW,MF  2334   
2177              Raphinha        Barcelona          es La Liga  FW,MF  2839   
1693          Bryan Mbeumo        Brentford  eng Premier League     FW  3414   
697        Ousmane Dembélé        Paris S-G          fr Ligue 1     FW  1730   
1060       Mason Greenwood        Marseille          fr Ligue 1  FW,MF  2804   
405

Top 50 midfielders with the most G+A

In [428]:
top_midfielders = midfielders_normalized.sort_values(by='G+A', ascending=False).head(50)
print("\nTop 50 Midfielders by Goals + Assists:")
print(top_midfielders[['Player', 'Squad', 'Comp', 'Pos', 'Min', 'Ast', 'xAG', 'KP', 'PPA', 'PrgP', 'Carries', 'PrgR', 'TklW', 'Int', 'Recov', 'Mis', 'CrdY', 'CrdR']])


Top 50 Midfielders by Goals + Assists:
                     Player            Squad                Comp    Pos   Min  \
2033            Cole Palmer          Chelsea  eng Premier League  MF,FW  3191   
2779          Florian Wirtz       Leverkusen       de Bundesliga  MF,FW  2351   
2099          Gaëtan Perrin          Auxerre          fr Ligue 1  MF,FW  2691   
637           Matheus Cunha           Wolves  eng Premier League  MF,FW  2597   
1513        Ademola Lookman         Atalanta          it Serie A  MF,FW  2247   
1410        Andrej Kramarić       Hoffenheim       de Bundesliga  MF,FW  2767   
891         Bruno Fernandes   Manchester Utd  eng Premier League     MF  3018   
1366        Justin Kluivert      Bournemouth  eng Premier League     MF  2340   
1221                   Isco            Betis          es La Liga     MF  1547   
288         Jude Bellingham      Real Madrid          es La Liga     MF  2488   
2428            Xavi Simons       RB Leipzig       de Bundesliga     M

Top 50 defenders based on Clearances

In [429]:
top_defenders = defenders_normalized.sort_values(by='Clr', ascending=False).head(50)
print("\nTop 50 Defenders by Clearances:")
print(top_defenders[['Player', 'Squad', 'Comp', 'Pos', 'Min', 'Tkl', 'TklW', 'Blocks', 'Int', 'Tkl+Int', 'Clr', 'Err', 'PrgC', 'Recov', 'CrdY', 'CrdR']])


Top 50 Defenders by Clearances:
                    Player            Squad                Comp Pos   Min  \
1223        Ardian Ismajli           Empoli          it Serie A  DF  2508   
1828               Murillo  Nott'ham Forest  eng Premier League  DF  3188   
79           Omar Alderete           Getafe          es La Liga  DF  2971   
254   Federico Baschirotto            Lecce          it Serie A  DF  3420   
2679           Denis Vavro        Wolfsburg       de Bundesliga  DF  2461   
578         Nathan Collins        Brentford  eng Premier League  DF  3420   
1935           Dara O'Shea     Ipswich Town  eng Premier League  DF  3122   
1423       Marash Kumbulla         Espanyol          es La Liga  DF  2972   
2554         César Tárrega         Valencia          es La Liga  DF  3026   
2165        Antonio Raillo         Mallorca          es La Liga  DF  3207   
513                 Catena          Osasuna          es La Liga  DF  3074   
2553       James Tarkowski          Everton

Top 50 goalkeepers with the fewest goals conceded (minimum 2000 minutes played)

In [430]:
min_played = goalkeepers_normalized[goalkeepers_normalized['Min'] >= 2000]
top_goalkeepers = min_played.sort_values(by='GA', ascending=True).head(50)
print("\nTop 50 Goalkeepers by Goals conceded (Min 2000 Minutes Played):")
print(top_goalkeepers[['Player', 'Squad', 'Comp', 'Pos', 'Min', 'GA', 'Saves', 'Save%', 'CS', 'CS%', 'Err', 'CrdY', 'CrdR']])


Top 50 Goalkeepers by Goals conceded (Min 2000 Minutes Played):
                      Player            Squad                Comp Pos   Min  \
765     Gianluigi Donnarumma        Paris S-G          fr Ligue 1  GK  2091   
1724              Alex Meret           Napoli          it Serie A  GK  3005   
810                  Ederson  Manchester City  eng Premier League  GK  2320   
623         Thibaut Courtois      Real Madrid          es La Liga  GK  2700   
87                   Alisson        Liverpool  eng Premier League  GK  2508   
1938               Jan Oblak  Atlético Madrid          es La Liga  GK  3240   
1461            Nicola Leali            Genoa          it Serie A  GK  2610   
704      Michele Di Gregorio         Juventus          it Serie A  GK  2970   
2470             Yann Sommer            Inter          it Serie A  GK  2970   
2447        Łukasz Skorupski          Bologna          it Serie A  GK  2364   
2331          Robert Sánchez          Chelsea  eng Premier League 

Scoring weights suggested by the IBM Granite model

In [431]:
stats_description = """
Basic Player Information:

Player – Player's name

Nation – Player's nationality

Pos – Position (FW, MF, DF, GK)

Squad – Club name

Comp – League

Age – Age of the player

Born – Year of birth

Playing Time & Appearances

MP – Matches played

Starts – Games started

Min – Minutes played

90s – Minutes played divided by 90

Attacking Stats

Gls – Goals scored

Ast – Assists provided

G+A – Goals + Assists

xG – Expected goals

xAG – Expected assisted goals

npxG – Non-penalty expected goals

G-PK – Goals excluding penalties

Defensive Stats

Tkl – Total tackles

TklW – Tackles won

Blocks – Blocks made

Int – Interceptions

Tkl+Int – Combined tackles and interceptions

Clr – Clearances

Err – Errors leading to an opponent's shot

Passing & Creativity Stats

PrgP – Progressive passes

PrgC – Progressive carries

KP – Key passes (passes leading to a shot)

Cmp%_stats_passing – Pass completion percentage

Ast_stats_passing – Assists

xA – Expected assists

PPA – Passes into the penalty area

Goalkeeping Stats

GA – Goals conceded

Saves – Saves made

Save% – Save percentage

CS – Clean sheets

CS% – Clean sheet percentage

PKA – Penalties faced

PKsv – Penalty saves

Possession & Ball Control

Touches – Total touches of the ball

Carries – Total ball carries

PrgR – Progressive passes received

Mis – Miscontrols

Dis – Times dispossessed

Miscellaneous Stats

CrdY – Yellow cards

CrdR – Red cards

PKwon – Penalties won

PKcon – Penalties conceded

Recov – Ball recoveries
"""

forward_cols = ['Gls', 'xG', 'Ast', 'xAG', 'Carries', 'PrgR', 'Dis', 'PKwon', 'CrdY', 'CrdR']
midfielder_cols = ['Ast', 'xAG', 'KP', 'PPA', 'PrgP', 'Carries', 'PrgR', 'TklW', 'Int', 'Recov', 'Mis', 'CrdY', 'CrdR']
defender_cols = ['Tkl', 'TklW', 'Blocks', 'Int', 'Clr', 'Err', 'PrgC', 'Recov', 'CrdY', 'CrdR']
goalkeeper_cols = ['GA', 'Saves', 'Save%', 'CS', 'CS%', 'Err', 'CrdY', 'CrdR']

prompt_forward = f"""
{stats_description}

I need to assign a scoring weight to each attribute for football players based on their position.
For the forward position, the attributes are: {', '.join(forward_cols)}.
Please provide the scoring weights for each attribute, summing up to 100, considering their importance for a forward player.
The output should be in a Python dictionary format, like this: {{'Attribute1': weight1, 'Attribute2': weight2, ...}}.
"""

prompt_midfielder = f"""
{stats_description}

I need to assign a scoring weight to each attribute for football players based on their position.
For the midfielder position, the attributes are: {', '.join(midfielder_cols)}.
Please provide the scoring weights for each attribute, summing up to 100, considering their importance for a midfielder player.
The output should be in a Python dictionary format, like this: {{'Attribute1': weight1, 'Attribute2': weight2, ...}}.
"""

prompt_defender = f"""
{stats_description}

I need to assign a scoring weight to each attribute for football players based on their position.
For the defender position, the attributes are: {', '.join(defender_cols)}.
Please provide the scoring weights for each attribute, summing up to 100, considering their importance for a defender player.
The output should be in a Python dictionary format, like this: {{'Attribute1': weight1, 'Attribute2': weight2, ...}}.
"""

prompt_goalkeeper = f"""
{stats_description}

I need to assign a scoring weight to each attribute for football players based on their position.
For the goalkeeper position, the attributes are: {', '.join(goalkeeper_cols)}.
Please provide the scoring weights for each attribute, summing up to 100, considering their importance for a goalkeeper player.
The output should be in a Python dictionary format, like this: {{'Attribute1': weight1, 'Attribute2': weight2, ...}}.
"""


forward_weights = llm.invoke(prompt_forward)
midfielder_weights = llm.invoke(prompt_midfielder)
defender_weights = llm.invoke(prompt_defender)
goalkeeper_weights = llm.invoke(prompt_goalkeeper)

print("Forward Weights:")
print(forward_weights)
print("\nMidfielder Weights:")
print(midfielder_weights)
print("\nDefender Weights:")
print(defender_weights)
print("\nGoalkeeper Weights:")
print(goalkeeper_weights)

Forward Weights:
```json
{
    "Forward_Weights": {
        "Gls": 25,
        "xG": 15,
        "Ast": 15,
        "xAG": 10,
        "Carries": 10,
        "PrgR": 10,
        "Dis": 5,
        "PKwon": 5,
        "CrdY": 3,
        "CrdR": 2
    }
}
```

This weighting scheme assigns a higher percentage to direct goal involvement (Goals, Expected goals, Assists, Expected assists) and ball-carrying efficiency (Carries, Progressive runs). It also considers dispossession rate (Dis), penalty wins (PKwon), and card accumulation (CrdY, CrdR) as factors, though to a lesser extent. The weights sum to 100, reflecting the priorities for evaluating a forward player's performance.

Midfielder Weights:
Here's a suggested weight distribution for a midfielder position, keeping in mind that the importance of each attribute can vary based on specific playing styles and team strategies. This distribution aims to balance creative contributions, defensive responsibilities, and ball control for a midfie
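
The model returns free-form text (a fenced JSON block with a wrapper key for forwards, prose for midfielders), so the weights have to be extracted before they can be used in code. The sketch below is one way to do that under stated assumptions: the response contains exactly one outer `{...}` span, and `parse_weights` is a hypothetical helper, not part of the notebook or of LangChain.

```python
import ast
import json
import re

FENCE = "`" * 3  # built at runtime so this example contains no literal code fence

def parse_weights(response_text, expected_keys):
    """Extract a {attribute: weight} dict from free-form LLM output.

    Takes the outermost {...} span, tries JSON first, then falls back to a
    Python literal, and unwraps one level of nesting such as
    {"Forward_Weights": {...}}.
    """
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    if match is None:
        raise ValueError("no dictionary found in the response")
    raw = match.group(0)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = ast.literal_eval(raw)
    # Unwrap a single wrapper key whose value is itself a dict
    if len(parsed) == 1 and isinstance(next(iter(parsed.values())), dict):
        parsed = next(iter(parsed.values()))
    missing = set(expected_keys) - set(parsed)
    if missing:
        raise ValueError(f"weights missing for: {sorted(missing)}")
    return {key: float(parsed[key]) for key in expected_keys}

# Sample text shaped like the forward response printed above
sample = f"""{FENCE}json
{{"Forward_Weights": {{"Gls": 25, "xG": 15, "Ast": 15, "xAG": 10, "Carries": 10,
 "PrgR": 10, "Dis": 5, "PKwon": 5, "CrdY": 3, "CrdR": 2}}}}
{FENCE}
The weights sum to 100."""

forward_cols = ['Gls', 'xG', 'Ast', 'xAG', 'Carries', 'PrgR', 'Dis', 'PKwon', 'CrdY', 'CrdR']
weights = parse_weights(sample, forward_cols)
print(weights["Gls"], sum(weights.values()))
```

With `temperature` set to 0.7 the response format can change between runs, so checking that every expected attribute is present (and that the weights sum to 100) before scoring is a cheap safeguard.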

In [450]:
def score_forward(row):
  score = 0
  # Goals and assists
  score += row['Gls'] * 3
  score += row['Ast'] * 2
  # Expected stats
  score += row['xG'] * 1.5
  score += row['xAG'] * 1
  # Ball carrying and progression
  score += row['Carries'] * 0.5
  score += row['PrgR'] * 0.5
  # Penalties
  score += row['PKwon'] * 1.5
  # Errors
  score -= row['Dis'] * 0.5
  # Discipline
  score -= row['CrdY'] * 0.5
  score -= row['CrdR'] * 1
  return score

def score_midfielder(row):
  score = 0
  # Creativity and passing
  score += row['Ast'] * 2
  score += row['xAG'] * 1
  score += row['KP'] * 1
  score += row['PrgP'] * 1
  score += row['PPA'] * 0.5
  # Defensive contribution
  score += row['TklW'] * 0.5
  score += row['Int'] * 0.5
  score += row['Recov'] * 0.3
  # Ball carrying
  score += row['PrgR'] * 0.6
  score += row['Carries'] * 0.6
  # Errors
  score -= row['Mis'] * 0.5
  # Discipline
  score -= row['CrdY'] * 0.5
  score -= row['CrdR'] * 1
  return score

def score_defender(row):
  score = 0
  # Defensive actions
  score += row['TklW'] * 1
  score += row['Int'] * 0.5
  score += row['Blocks'] * 1
  score += row['Clr'] * 1.5
  # Progressive actions
  score += row['PrgC'] * 0.2
  score += row['Recov'] * 1
  # Errors
  score -= row['Err'] * 0.8
  # Discipline
  score -= row['CrdY'] * 0.3
  score -= row['CrdR'] * 0.9
  return score

def score_goalkeeper(row):
  score = 0
  # Preventing goals
  score -= row['GA'] * 1.5
  score += row['Saves'] * 1
  score += row['Save%'] * 2
  score += row['CS'] * 1.5
  score += row['CS%'] * 1
  # Errors
  score -= row['Err'] * 0.5
  # Discipline
  score -= row['CrdY'] * 0.5
  score -= row['CrdR'] * 1
  return score

top_forwards['Score'] = top_forwards.apply(score_forward, axis=1)
top_midfielders['Score'] = top_midfielders.apply(score_midfielder, axis=1)
top_defenders['Score'] = top_defenders.apply(score_defender, axis=1)
top_goalkeepers['Score'] = top_goalkeepers.apply(score_goalkeeper, axis=1)

print("\nTop 50 Forwards:")
print(top_forwards.sort_values(by='Score', ascending=False)[['Player', 'Squad', 'Comp', 'Pos', 'Score']])
print("\nTop 50 Midfielders:")
print(top_midfielders.sort_values(by='Score', ascending=False)[['Player', 'Squad', 'Comp', 'Pos', 'Score']])
print("\nTop 50 Defenders:")
print(top_defenders.sort_values(by='Score', ascending=False)[['Player', 'Squad', 'Comp', 'Pos', 'Score']])
print("\nTop 50 Goalkeepers by Score:")
print(top_goalkeepers.sort_values(by='Score', ascending=False)[['Player', 'Squad', 'Comp', 'Pos', 'Score']])


Top 50 Forwards:
                    Player            Squad                Comp    Pos  \
2304         Mohamed Salah        Liverpool  eng Premier League     FW   
1691         Kylian Mbappé      Real Madrid          es La Liga     FW   
2177              Raphinha        Barcelona          es La Liga  FW,MF   
1317            Harry Kane    Bayern Munich       de Bundesliga     FW   
1960         Michael Olise    Bayern Munich       de Bundesliga  FW,MF   
2201         Mateo Retegui         Atalanta          it Serie A     FW   
405           Ante Budimir          Osasuna          es La Liga     FW   
2792          Lamine Yamal        Barcelona          es La Liga     FW   
697        Ousmane Dembélé        Paris S-G          fr Ligue 1     FW   
1060       Mason Greenwood        Marseille          fr Ligue 1  FW,MF   
1219        Alexander Isak    Newcastle Utd  eng Premier League     FW   
1483    Robert Lewandowski        Barcelona          es La Liga     FW   
1093       Serhou Gu
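
The scoring functions above use hand-set multipliers and do not read the weights returned by the model. If the model's weights were to be applied directly, a weighted sum over the normalized columns is one option. The sketch below assumes that the attributes the notebook subtracts (Dis, Mis, Err, GA, CrdY, CrdR) should count negatively; `weighted_score` is a hypothetical helper and the example row is made up.

```python
# Attributes the notebook's scoring functions subtract rather than add
PENALTY_ATTRS = {"Dis", "Mis", "Err", "GA", "CrdY", "CrdR"}

def weighted_score(row, weights, penalty_attrs=PENALTY_ATTRS):
    """Weighted sum of normalized stats; penalised attributes count negatively.

    `row` can be a dict or a pandas Series indexed by column name;
    `weights` maps attribute name to a weight on a 0-100 scale.
    """
    score = 0.0
    for attr, weight in weights.items():
        sign = -1.0 if attr in penalty_attrs else 1.0
        score += sign * (weight / 100.0) * row[attr]
    return score

# Forward weights as printed in the model output earlier in the notebook
forward_weights = {"Gls": 25, "xG": 15, "Ast": 15, "xAG": 10, "Carries": 10,
                   "PrgR": 10, "Dis": 5, "PKwon": 5, "CrdY": 3, "CrdR": 2}

# Illustrative normalized row (all values already scaled to 0..1)
example_row = {"Gls": 1.0, "xG": 0.8, "Ast": 1.0, "xAG": 0.9, "Carries": 0.5,
               "PrgR": 0.6, "Dis": 0.4, "PKwon": 0.0, "CrdY": 0.1, "CrdR": 0.0}

print(round(weighted_score(example_row, forward_weights), 3))
```

With the notebook's DataFrames this could be applied as `top_forwards.apply(weighted_score, axis=1, weights=forward_weights)`, provided `forward_weights` has first been parsed into a dict; the notebook keeps the raw model response as a string.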

Top 25 Player of the Season candidates 2024/2025

In [451]:
top_25_forwards = top_forwards.sort_values(by='Score', ascending=False).head(25)
top_25_midfielders = top_midfielders.sort_values(by='Score', ascending=False).head(25)
top_25_defenders = top_defenders.sort_values(by='Score', ascending=False).head(25)
top_25_goalkeepers = top_goalkeepers.sort_values(by='Score', ascending=False).head(25)

top_players_combined = pd.concat([top_25_forwards, top_25_midfielders, top_25_defenders, top_25_goalkeepers])
top_players_overall = top_players_combined.sort_values(by='Score', ascending=False)

print("\nTop 25 Overall Players:")
print(top_players_overall[['Player', 'Squad', 'Comp', 'Pos', 'Score']].head(25))


Top 25 Overall Players:
                  Player           Squad                Comp    Pos     Score
2304       Mohamed Salah       Liverpool  eng Premier League     FW  7.689844
1691       Kylian Mbappé     Real Madrid          es La Liga     FW  5.927364
2177            Raphinha       Barcelona          es La Liga  FW,MF  5.577550
1317          Harry Kane   Bayern Munich       de Bundesliga     FW  5.506758
1960       Michael Olise   Bayern Munich       de Bundesliga  FW,MF  5.260140
2201       Mateo Retegui        Atalanta          it Serie A     FW  5.210430
891      Bruno Fernandes  Manchester Utd  eng Premier League     MF  5.057632
405         Ante Budimir         Osasuna          es La Liga     FW  4.973782
651     Mikkel Damsgaard       Brentford  eng Premier League  MF,FW  4.939526
2792        Lamine Yamal       Barcelona          es La Liga     FW  4.894730
2779       Florian Wirtz      Leverkusen       de Bundesliga  MF,FW  4.877791
200           Alex Baena      Villarrea
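
Each position is scored with a different formula over a different set of columns, so raw Score values are not on a common scale; in the list above, forwards and attacking midfielders fill the top places. One way to compare across positions, offered here as an option and not as the notebook's method, is to standardize scores within each position group before concatenating. The numbers below are illustrative and not taken from the dataset.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Return z-scores (mean 0, population std 1); all zeros if scores are constant."""
    mu, sigma = mean(scores), pstdev(scores)
    if sigma == 0:
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]

# Illustrative raw scores on two different scales
raw_scores = {
    "FW": [7.0, 5.0, 3.0],
    "GK": [1.2, 1.0, 0.8],
}
z_by_position = {pos: standardize(vals) for pos, vals in raw_scores.items()}
print({pos: [round(z, 2) for z in zs] for pos, zs in z_by_position.items()})
```

A standardized score says how far a player stands out within their own position group; it does not by itself settle whether a forward's season was worth more than a goalkeeper's.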

Summary by the IBM Granite model

In [454]:
prompt = f"""Based on the following DataFrame which contains a list of top football players from the 2024/2025 season ranked by a calculated score, analyze the data and provide insights into which players and positions are performing best. Consider the distribution of players across different positions and leagues in the top rankings.

DataFrame:
{top_players_overall[['Player', 'Squad', 'Comp', 'Pos', 'Score']].head(25).to_string()}

Analyze the data and answer the following:
1.  Who are the top 5 players based on the 'Score'?
2.  What are the primary positions represented in the top 10 players?
3.  Are there any notable leagues or clubs that have a strong presence in the top rankings?
4.  Based on this analysis, what general conclusions can be drawn about the performance distribution among positions and leagues in the 2024/2025 season?
5.  Who is Player Of The Season 2024/2025 based solely on this data?
"""

print(llm.invoke(prompt))

### Analysis of the Provided Football Player DataFrame

#### 1. Top 5 Players Based on 'Score'

To identify the top 5 players based on the 'Score' column, we'll sort the DataFrame in descending order by the 'Score' and select the first five entries:

```python
import pandas as pd

# Sample DataFrame creation based on provided data
data = {
    'Player': [
        "Mohamed Salah", "Kylian Mbappé", "Raphinha", "Harry Kane", "Michael Olise", 
        "Mateo Retegui", "Bruno Fernandes", "Ante Budimir", "Mikkel Damsgaard", 
        "Lamine Yamal", "Florian Wirtz", "Alex Baena", "Ousmane Dembélé", 
        "Mason Greenwood", "Alexander Isak", "Robert Lewandowski", "Cole Palmer", 
        "Serhou Guirassy", "Hugo Ekitike", "Bradley Barcola", "Gaëtan Perrin", 
        "Julian Brandt", "Bryan Mbeumo", "Martin Ødegaard", "Bruno Guimarães"
    ],
    'Squad': [
        "Liverpool", "Real Madrid", "Barcelona", "Bayern Munich", "Bayern Munich", 
        "Atalanta", "Manchester Utd", "Osasuna", "Bre

**Project Title**: Player of the Season 2024/2025  
**Author**: Muhammad Khoirul Irsyadul Ibad  
**Email**: irulkhoirul414@gmail.com  
**GitHub**: https://github.com/MKHO1RUL

Thank You