# ⚽ Football Player Performance Prediction

This notebook builds a simple linear regression model to predict a custom player rating from performance data for players in the top five European leagues.

## 📦 Import Libraries
We start by importing the necessary Python libraries.

In [1]:
from pathlib import Path
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
import pandas as pd

## 📄 Load Dataset
Load the CSV file containing player data.

In [2]:
# Adjust path as necessary if running in another environment
players_data = Path().resolve().parent / "data" / "top5_leagues_players_2023.csv"
df = pd.read_csv(players_data)
df.head()

| | name | league | club | age | position | goals | assists | shots | minutes | yellow_cards | red_cards |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Erling Haaland | Premier League | Manchester City | 22 | ST | 36 | 8 | 95 | 2800 | 3 | 0 |
| 1 | Kylian Mbappé | Ligue 1 | PSG | 24 | ST | 29 | 7 | 88 | 2700 | 2 | 0 |
| 2 | Harry Kane | Bundesliga | Bayern Munich | 29 | ST | 30 | 5 | 85 | 2650 | 1 | 0 |
| 3 | Robert Lewandowski | La Liga | Barcelona | 34 | ST | 23 | 6 | 77 | 2500 | 4 | 0 |
| 4 | Victor Osimhen | Serie A | Napoli | 24 | ST | 26 | 4 | 80 | 2600 | 2 | 1 |
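
The load cell builds the CSV path from the parent of the current working directory, which is why it may need adjusting in another environment. A minimal sketch of that resolution follows. The helper name `dataset_path` and its `base_dir` parameter are illustrative assumptions, not part of the notebook.

```python
from pathlib import Path
from typing import Optional


def dataset_path(base_dir: Optional[Path] = None) -> Path:
    """Build the CSV path the same way the load cell does.

    base_dir is a hypothetical override; by default the current
    working directory is used, as in the notebook.
    """
    start = Path().resolve() if base_dir is None else Path(base_dir).resolve()
    # Step up one level, then into the sibling "data" directory.
    return start.parent / "data" / "top5_leagues_players_2023.csv"


csv_path = dataset_path()
if not csv_path.exists():
    # Fail early with a readable hint instead of a pandas traceback.
    print(f"Dataset not found at {csv_path}; adjust the path for this environment.")
```

Checking `exists()` before calling `pd.read_csv` is optional, but it makes a wrong working directory easier to diagnose.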


## 🧮 Create Custom Rating
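We define a custom rating formula to evaluate players. It rewards goals (×4), assists (×3), and shots (×0.5), and penalizes yellow cards (×1) and red cards (×3).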

In [3]:
df["custom_rating"] = (
    df["goals"] * 4 +
    df["assists"] * 3 +
    df["shots"] * 0.5 -
    df["yellow_cards"] -
    df["red_cards"] * 3
)
df[["name", "custom_rating"]]

| | name | custom_rating |
|---|---|---|
| 0 | Erling Haaland | 212.5 |
| 1 | Kylian Mbappé | 179.0 |
| 2 | Harry Kane | 176.5 |
| 3 | Robert Lewandowski | 144.5 |
| 4 | Victor Osimhen | 151.0 |
| 5 | Lionel Messi | 141.0 |
| 6 | Mohamed Salah | 142.5 |
| 7 | Karim Benzema | 112.0 |
| 8 | Lautaro Martínez | 132.0 |
| 9 | Marcus Rashford | 122.0 |
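
The formula can be checked by hand against rows shown earlier in the notebook. The sketch below restates it as a plain function (the name `custom_rating` is an assumption chosen to match the column) and applies it to the statistics listed for Haaland and Osimhen.

```python
def custom_rating(goals, assists, shots, yellow_cards, red_cards):
    """Same weights as the notebook's column expression."""
    return (
        goals * 4
        + assists * 3
        + shots * 0.5
        - yellow_cards
        - red_cards * 3
    )


# Inputs taken from the df.head() output above.
haaland = custom_rating(goals=36, assists=8, shots=95, yellow_cards=3, red_cards=0)
osimhen = custom_rating(goals=26, assists=4, shots=80, yellow_cards=2, red_cards=1)

print(haaland)  # 212.5
print(osimhen)  # 151.0
```

Both values match the `custom_rating` column, including Osimhen's three-point deduction for his red card.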


## ✂️ Split Data
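Use goals, assists, and minutes as features and hold out 20% of the rows as a test set. Note that shots and cards, which enter the rating formula, are not among the features.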

In [4]:
X = df[["goals", "assists", "minutes"]]
y = df["custom_rating"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
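
To see how many rows end up on each side, the sketch below mirrors the sizing rule that, to our understanding, scikit-learn applies for a fractional `test_size`: the test count is rounded up and the remainder goes to training. The helper `split_sizes` is hypothetical, and the row count of 10 is an assumption based on the ten rows shown in the rating table.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_size: float = 0.2) -> Tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size.

    Assumes the test count is rounded up, with the rest used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


print(split_sizes(10))  # (8, 2)
```

With only two test rows, the evaluation metrics below rest on very little data and should be read with caution.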

## 🤖 Train Model
Fit an ordinary least squares linear regression on the training set.

In [5]:
model = LinearRegression()
model.fit(X_train, y_train)

| Parameter | Value |
|---|---|
| fit_intercept | True |
| copy_X | True |
| tol | 1e-06 |
| n_jobs | None |
| positive | False |


## 📈 Evaluate Model
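Evaluate the model on the held-out test set using mean squared error (MSE) and the R² score. An R² near 1 is expected here, because the target is itself a linear combination of player statistics, two of which (goals and assists) are model features.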

In [6]:
y_pred = model.predict(X_test)

mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse:.2f}")
print(f"R-squared Score: {r2:.2f}")

Mean Squared Error: 1.05
R-squared Score: 1.00
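
For reference, both metrics can be computed by hand. The sketch below uses the standard definitions, MSE as the mean of squared residuals and R² as one minus the ratio of residual to total sum of squares. The function names and the toy values are illustrative assumptions, not data from this notebook.

```python
def mean_squared_error_manual(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score_manual(y_true, y_pred):
    """1 - SS_res / SS_tot, where SS_tot is measured around the mean of y_true."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up values, used only to exercise the formulas.
toy_true = [3.0, -0.5, 2.0, 7.0]
toy_pred = [2.5, 0.0, 2.0, 8.0]

print(f"MSE: {mean_squared_error_manual(toy_true, toy_pred):.3f}")  # MSE: 0.375
print(f"R2:  {r2_score_manual(toy_true, toy_pred):.3f}")  # R2:  0.949
```

MSE is in squared rating points, so an MSE of 1.05 corresponds to a typical error of roughly one rating point.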


## 📊 Coefficients
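Display the learned coefficients for each feature. The coefficients for goals and assists land near the formula weights of 4 and 3; the small differences likely reflect the shots and card terms, which the model does not see directly.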

In [7]:
coeff_df = pd.DataFrame(model.coef_, index=X.columns, columns=["Coefficient"])
coeff_df

| | Coefficient |
|---|---|
| goals | 4.328797 |
| assists | 3.058371 |
| minutes | 0.022767 |
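
Each coefficient is the change in predicted rating for a one-unit increase in that feature with the others held fixed. The sketch below illustrates this with the coefficients printed above. The notebook does not display the intercept, so `intercept=0.0` is a placeholder assumption (it cancels in the difference), and the base player is hypothetical.

```python
# Coefficients copied from the table above.
COEFFICIENTS = {"goals": 4.328797, "assists": 3.058371, "minutes": 0.022767}


def predict_rating(features, intercept=0.0):
    """Linear prediction: intercept plus the weighted sum of the features.

    The intercept default is a placeholder; the notebook does not print it.
    """
    return intercept + sum(COEFFICIENTS[name] * value for name, value in features.items())


# Hypothetical player, used only to illustrate the marginal effect.
base = {"goals": 20, "assists": 5, "minutes": 2500}
one_more_goal = {**base, "goals": 21}

# The difference equals the goals coefficient, whatever the intercept is.
effect = predict_rating(one_more_goal) - predict_rating(base)
print(f"One extra goal adds about {effect:.2f} rating points")
```

By the same reasoning, an extra 90 minutes played adds about 90 × 0.022767 ≈ 2.05 points to the prediction.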
