# 📘 Description: Normalizing Football Player Data

This notebook performs **data normalization** on a dataset of football players, `pre_transfer_labeled_clean.csv`.  
We use **StandardScaler** from `scikit-learn` to standardize the numeric features.

Standardization brings all numerical features to a comparable scale. This helps models that are sensitive to feature magnitude, such as distance-based methods and models trained with gradient descent or regularization.  
With `StandardScaler`, each feature is transformed to have:
- **Mean = 0**
- **Standard Deviation = 1**
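
As a minimal sketch of what this transformation does, the z-score of each value is its distance from the column mean divided by the column standard deviation. The toy values below are hypothetical and only illustrate the arithmetic; they are not taken from the dataset.

```python
import numpy as np

# Hypothetical toy values standing in for one numeric feature column.
values = np.array([2.0, 4.0, 6.0, 8.0])

# Population statistics (ddof=0), which is what StandardScaler uses.
mean = values.mean()
std = values.std()

# z-score: subtract the mean, then divide by the standard deviation.
z_scores = (values - mean) / std

print(z_scores)
```

After this step the column has mean 0 and (population) standard deviation 1, regardless of its original units.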

## 🔹 Step 1: Import Required Libraries  
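We import the necessary libraries:  
- `pandas` for reading and processing the dataset  
- `StandardScaler` for normalizing numerical features  
- `files` from `google.colab` for downloading the result (available only when running in Google Colab)  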

In [None]:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from google.colab import files

## 🔹 Step 2: Load the Dataset  
The dataset is loaded from a CSV file. It contains various features about players before their transfers, along with metadata and a success label.

In [None]:
df = pd.read_csv(
    "https://raw.githubusercontent.com/MIT-Emerging-Talent/ET6-CDSP-group-23-repo/main/1_datasets/labeled/pre_transfer_labeled_clean.csv"
)
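
Before going further, it can help to check which columns are present, what types they have, and how many values are missing. The helper below is a hypothetical sketch (`summarize_columns` is not part of the notebook), shown on a made-up frame so it runs without network access; in the notebook it would be called as `summarize_columns(df)`.

```python
import pandas as pd


def summarize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with its dtype and missing-value count."""
    return pd.DataFrame(
        {
            "dtype": frame.dtypes.astype(str),
            "missing": frame.isna().sum(),
        }
    )


# Hypothetical stand-in for the loaded dataset (values are made up).
toy = pd.DataFrame(
    {
        "Player Name": ["A", "B", "C"],
        "Average Rating": [6.8, None, 7.1],
        "Successful": [True, False, True],
    }
)

summary = summarize_columns(toy)
print(summary)
```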

## 🔹 Step 3: Select Only Feature Columns  
We drop the columns that should not be scaled:
- Player identifiers: `Player Name`, `Season_Year`
- Categorical data: `Position`
- Target label: `Successful`

Everything that remains is treated as a numeric feature and passed to the scaler.

In [None]:
# Drop identifier, categorical, and target columns before scaling
df_features = df.drop(columns=["Player Name", "Season_Year", "Position", "Successful"])
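
A quick way to confirm that only numeric columns remain is to ask pandas for any column whose dtype is not numeric. The sketch below uses a hypothetical miniature frame with made-up values; the same check could be run on `df_features`.

```python
import pandas as pd

# Hypothetical miniature frame; values are made up.
toy = pd.DataFrame(
    {
        "Player Name": ["A", "B"],
        "Season_Year": [2019.5, 2017.5],
        "Position": ["Attack", "Midfield"],
        "Successful": [False, True],
        "Average Rating": [6.5, 7.0],
        "ShootingRank - Goals": [3, 1],
    }
)

excluded = ["Player Name", "Season_Year", "Position", "Successful"]
toy_features = toy.drop(columns=excluded)

# Any column listed here would need handling before it reaches StandardScaler.
non_numeric = toy_features.select_dtypes(exclude="number").columns.tolist()
print(non_numeric)
```

Note that `Season_Year` is itself numeric, so selecting by dtype alone would not exclude it; that is one reason to drop the columns by name.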

## 🔹 Step 4: Handle Missing Values  
To ensure scaling works correctly, we replace any missing values in the feature columns with the **mean** of their respective columns.


In [None]:
# Handle missing values (if any)
df_features = df_features.fillna(df_features.mean())
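
To illustrate how this imputation behaves, here is a small sketch on a hypothetical two-column frame. `DataFrame.mean()` skips missing values by default, so each gap is filled with the mean of the observed values in the same column.

```python
import pandas as pd

# Hypothetical feature frame with one gap per column (values are made up).
toy_features = pd.DataFrame(
    {
        "Average Rating": [6.0, None, 8.0],
        "ShootingRank - Goals": [1.0, 3.0, None],
    }
)

# Column means are computed from the non-missing values only.
column_means = toy_features.mean()

# fillna with a Series fills each column using the matching entry by column name.
filled = toy_features.fillna(column_means)
print(filled)
```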

## 🔹 Step 5: Normalize the Features  
We use `StandardScaler` to standardize the numerical features.  
This scales each feature to have a **mean of 0** and **standard deviation of 1**, so that features with large raw ranges do not dominate others simply because of their units.


In [None]:
# Normalize using StandardScaler
scaler = StandardScaler()
scaled_array = scaler.fit_transform(df_features)
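
One way to sanity-check the result is to recompute the column means and standard deviations after scaling. The sketch below does this on a hypothetical toy frame; the same two checks could be applied to `scaled_array`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical features on very different scales (values are made up).
toy_features = pd.DataFrame(
    {
        "Average Rating": [6.0, 7.0, 8.0, 7.0],
        "Possession - Touches": [100.0, 300.0, 500.0, 300.0],
    }
)

toy_scaled = StandardScaler().fit_transform(toy_features)

col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)  # NumPy default ddof=0, matching StandardScaler

print(np.allclose(col_means, 0.0), np.allclose(col_stds, 1.0))
```

`StandardScaler` uses the population standard deviation (`ddof=0`). Calling `.std()` on a pandas DataFrame uses `ddof=1` by default, so it reports a value slightly above 1 for scaled columns.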

## 🔹 Step 6: Combine with Original Columns  
After scaling, we convert the result back into a DataFrame and concatenate it with the identifier, categorical, and target columns that were set aside in Step 3:  
- `Player Name`  
- `Season_Year`  
- `Position`  
- `Successful`  

This gives us the final processed dataset, combining both metadata and normalized features.

In [None]:
# ✅ Create scaled DataFrame
df_scaled = pd.DataFrame(scaled_array, columns=df_features.columns)

# ✅ Concatenate with original ID/target columns
df_final = pd.concat(
    [
        df[["Player Name", "Season_Year", "Position", "Successful"]].reset_index(
            drop=True
        ),
        df_scaled.reset_index(drop=True),
    ],
    axis=1,
)
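
The `reset_index(drop=True)` calls matter because `pd.concat(..., axis=1)` aligns rows by index label, not by position. The sketch below, on hypothetical frames, shows what would happen if the two indexes differed; with a freshly loaded CSV the index is usually already 0 to n-1, so the reset acts as a safeguard.

```python
import pandas as pd

# Hypothetical metadata whose index does not start at 0.
meta = pd.DataFrame({"Player Name": ["A", "B", "C"]}, index=[10, 11, 12])

# A frame built from a NumPy array gets the default index 0, 1, 2.
scaled = pd.DataFrame({"Average Rating": [0.5, -1.0, 0.5]})

# Without a reset, rows are matched by label and padded with NaN.
misaligned = pd.concat([meta, scaled], axis=1)

# With a reset, rows are matched by position as intended.
aligned = pd.concat([meta.reset_index(drop=True), scaled], axis=1)

print(misaligned.shape, aligned.shape)
```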

In [None]:
df_final

Unnamed: 0,Player Name,Season_Year,Position,Successful,Average Rating,Defending - Aerial duels won,Defending - Blocked,Defending - Dribbled past,Defending - Duels won,Defending - Fouls committed,...,Passing - Successful passes,Possession - Dispossessed,Possession - Fouls won,Possession - Penalties awarded,Possession - Successful dribbles,Possession - Touches,Possession - Touches in opposition box,ShootingRank - Goals,ShootingRank - Shots,ShootingRank - Shots on target
0,Aaron Lennon,2019.5,Attack,False,-1.848254,-1.039033,-0.241807,-0.131751,-0.393848,-0.016186,...,-0.582439,-0.209577,-0.187429,-0.575685,-0.334724,-0.456771,-0.205468,-0.270543,-0.308737,-0.174775
1,Aaron Mooy,2017.5,Midfield,True,0.119671,-0.696920,-0.067419,0.189182,-0.316741,-0.227556,...,0.296965,0.035792,-0.403985,-0.575685,-0.485835,0.111373,-0.263989,-0.155978,-0.151112,-0.178581
2,Adam Armstrong,2019.5,Attack,False,0.167669,-0.967009,0.387507,-0.472133,-0.579533,-0.207985,...,-0.615394,-0.062356,-0.261870,-0.575685,-0.177317,-0.462500,0.126781,0.273639,0.268669,0.331444
3,Adam Webster,2017.5,Defense,True,0.551655,2.610174,-0.203897,-0.481859,0.429147,-0.251041,...,0.054458,-0.356799,-0.173894,-0.575685,-0.511020,0.065881,-0.244167,-0.232355,-0.283848,-0.167162
4,Adama Traore,2016.5,Attack,True,1.223629,-0.486850,-0.143240,-0.297079,1.225391,-0.024014,...,-0.715693,0.436563,0.435170,0.185042,4.677122,-0.202858,-0.067660,-0.175072,-0.146134,-0.186193
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143,Tyler Adams,2020.5,Midfield,True,-0.552303,-0.450838,-0.370703,0.062754,-0.278974,-0.047500,...,0.686339,-0.253199,-0.153592,-0.575685,-0.397687,0.199325,-0.283810,-0.251449,-0.318692,-0.216642
144,Tyrell Malacia,2020.5,Defense,True,0.071673,-0.726930,-0.165986,-0.034498,-0.145218,-0.070986,...,0.358219,-0.018734,-0.143441,0.185042,0.049349,0.242795,-0.148834,-0.251449,-0.157748,-0.235673
145,Victor Kristiansen,2020.5,Defense,True,-1.272276,-0.750937,-0.165986,0.004403,-0.417452,-0.239299,...,-0.137185,-0.209577,-0.417520,-0.575685,-0.448058,-0.007411,-0.228121,-0.270543,-0.313714,-0.281347
146,Wilfried Gnonto,2020.5,Attack,True,-0.888291,-0.666910,0.076641,0.091930,0.347320,0.093413,...,-0.479633,-0.010555,0.844597,2.467222,0.502682,-0.215158,0.223057,0.073151,0.061267,0.042176


## 🔹 Step 7: Save and Download the Final Dataset  
We save the final DataFrame as a CSV file and download it to the local machine using Google Colab's `files.download()` function.


In [None]:
# ✅ Save normalized DataFrame to CSV
output_path = "/content/pre_transfer_labeled_scaled.csv"
df_final.to_csv(output_path, index=False)

# ✅ Download the file to your computer
files.download(output_path)
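
Since `google.colab` is only importable inside Colab, a variant that also runs locally might guard the import. This is a sketch under that assumption: `toy_final` is a hypothetical stand-in for `df_final`, and a temporary directory replaces the `/content/` path.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for df_final (values are made up).
toy_final = pd.DataFrame({"Player Name": ["A"], "Average Rating": [0.0]})

# A temporary directory is used here; in Colab this would be under /content/.
output_path = os.path.join(tempfile.mkdtemp(), "pre_transfer_labeled_scaled.csv")
toy_final.to_csv(output_path, index=False)

try:
    from google.colab import files  # only importable inside Google Colab
except ImportError:
    files = None

if files is not None:
    # In Colab, trigger a browser download of the saved file.
    files.download(output_path)
else:
    # Elsewhere, the file is already on the local disk.
    print(f"Saved to {output_path}")
```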
