<div style="text-align: center; background-color: #5A96E3; font-family: 'Trebuchet MS', Arial, sans-serif; color: white; padding: 20px; font-size: 40px; font-weight: bold; border-radius: 0 0 0 0; box-shadow: 0px 6px 8px rgba(0, 0, 0, 0.2);">
  Making Questions</div>

## Import libraries

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import geopandas
from mpl_toolkits.axes_grid1 import make_axes_locatable
```

## Read cleaned data from file

```python
cleaned_df = pd.read_csv('data_footballer_processed.csv')
cleaned_df.head(10)
```

| | Name | Height | Weight | Preferred Foot | Birth Date | Age | Nation | Club | League | Preferred Positions | ... | POT | Value | Wage | Ball Skills | Defence | Mental | Passing | Physical | Shooting | Goalkeeper |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Erling Haaland | 195 | 94 | Left | 2000-07-21 | 23 | Norway | Manchester City | England Premier League (1) | ST | ... | 94 | 157000000.0 | 340000.0 | 80.5 | 38.0 | 80.2 | 59.0 | 83.7 | 84.0 | 10.4 |
| 1 | Kylian Mbappé | 182 | 75 | Right | 1998-12-20 | 24 | France | Paris Saint-Germain | France Ligue 1 (1) | ST, LW | ... | 94 | 153500000.0 | 225000.0 | 92.5 | 33.0 | 76.7 | 78.3 | 89.0 | 82.2 | 8.4 |
| 2 | Kevin De Bruyne | 181 | 75 | Right | 1991-06-28 | 32 | Belgium | Manchester City | England Premier League (1) | CM, CAM | ... | 91 | 103000000.0 | 350000.0 | 89.0 | 61.5 | 84.0 | 94.3 | 75.7 | 83.1 | 11.2 |
| 3 | Harry Kane | 188 | 85 | Right | 1993-07-28 | 30 | England | Bayern München | Germany 1. Bundesliga (1) | ST | ... | 90 | 119500000.0 | 170000.0 | 84.5 | 42.0 | 81.3 | 85.0 | 75.9 | 86.5 | 10.8 |
| 4 | Thibaut Courtois | 199 | 96 | Left | 1992-05-11 | 31 | Belgium | Real Madrid | Spain Primera Division (1) | GK | ... | 90 | 63000000.0 | 250000.0 | 18.0 | 17.0 | 41.5 | 27.3 | 54.0 | 22.4 | 86.6 |
| 5 | Robert Lewandowski | 185 | 81 | Right | 1988-08-21 | 35 | Poland | FC Barcelona | Spain Primera Division (1) | ST | ... | 90 | 58000000.0 | 340000.0 | 88.0 | 30.5 | 81.0 | 76.7 | 81.1 | 87.8 | 10.2 |
| 6 | Karim Benzema | 185 | 81 | Right | 1987-12-19 | 35 | France | Al Ittihad | Saudi Pro League (1) | CF, ST | ... | 90 | 51000000.0 | 95000.0 | 89.0 | 21.0 | 77.7 | 80.0 | 79.3 | 84.6 | 8.2 |
| 7 | Lionel Messi | 169 | 67 | Left | 1987-06-24 | 36 | Argentina | Inter Miami | USA Major League Soccer (1) | CF, CAM | ... | 90 | 41000000.0 | 23000.0 | 94.5 | 29.5 | 75.2 | 88.0 | 79.4 | 83.6 | 10.8 |
| 8 | Rúben Dias | 187 | 82 | Right | 1997-05-14 | 26 | Portugal | Manchester City | England Premier League (1) | CB | ... | 90 | 97500000.0 | 250000.0 | 69.5 | 89.0 | 73.2 | 71.0 | 70.1 | 48.4 | 9.4 |
| 9 | Vini Jr. | 176 | 73 | Right | 2000-07-12 | 23 | Brazil | Real Madrid | Spain Primera Division (1) | LW | ... | 94 | 121500000.0 | 310000.0 | 91.0 | 21.5 | 69.5 | 78.3 | 84.3 | 71.4 | 7.2 |


```python
cleaned_df.shape
```

```
(10020, 21)
```

```python
cleaned_df.info()
```

```
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10020 entries, 0 to 10019
Data columns (total 21 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Name                 10020 non-null  object
 1   Height               10020 non-null  int64
 2   Weight               10020 non-null  int64
 3   Preferred Foot       10020 non-null  object
 4   Birth Date           10020 non-null  object
 5   Age                  10020 non-null  int64
 6   Nation               10020 non-null  object
 7   Club                 10020 non-null  object
 8   League               10020 non-null  object
 9   Preferred Positions  10020 non-null  object
 10  OVR                  10020 non-null  int64
 11  POT                  10020 non-null  int64
 12  Value                10020 non-null  float64
 13  Wage                 10020 non-null  float64
 14  Ball Skills          10020 non-null  float64
 15  Defence              10020 non-null
```

## Data Preprocessing

- One-hot encoding converts a categorical variable into a binary matrix with one 0/1 column per category. It is a common preprocessing step because many machine learning algorithms accept only numerical input.
- To make grouping players by position easier, I one-hot encode the `Preferred Positions` column and store the result in `one_hot_position_df`. Since a player can list several positions (for example, `"ST, LW"`), a row can contain more than one 1.
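Because one cell can hold several positions, the choice of encoder matters. The sketch below uses four position strings copied from the `head()` output above, purely as an illustration, to contrast `pd.get_dummies`, which treats each whole string as a single category, with `Series.str.get_dummies(sep=", ")`, which splits the string first and is the method used in the cell below.

```python
import pandas as pd

# Four position strings copied from the cleaned_df.head() output (illustration only)
positions = pd.Series(["ST", "ST, LW", "CM, CAM", "GK"])

# pd.get_dummies treats each whole string as one category,
# so "ST, LW" becomes its own column instead of marking ST and LW
whole_strings = pd.get_dummies(positions, dtype=int)
print(whole_strings.columns.tolist())  # ['CM, CAM', 'GK', 'ST', 'ST, LW']

# Series.str.get_dummies splits on the separator first,
# giving one column per individual position (columns sorted alphabetically)
split_positions = positions.str.get_dummies(sep=", ")
print(split_positions.columns.tolist())  # ['CAM', 'CM', 'GK', 'LW', 'ST']

# Multi-position players have more than one 1 in their row
print(split_positions.sum(axis=1).tolist())  # [1, 2, 2, 1]
```

The split version is what makes grouping by a single position possible: a player listed as `"ST, LW"` shows up under both `ST` and `LW`.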

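```python
# 'Preferred Positions' holds comma-separated strings such as "ST, LW"
position_df = cleaned_df['Preferred Positions']
# Split each string on ", " and create one 0/1 column per position
one_hot_position_df = position_df.str.get_dummies(sep=', ')
# Reorder the columns from goalkeeper (GK) through defence and midfield to striker (ST)
custom_order = ['GK', 'LWB', 'LB', 'CB', 'RB', 'RWB', 'CDM', 'CM', 'LM', 'RM', 'CAM', 'LW', 'RW', 'CF', 'ST']
one_hot_position_df = one_hot_position_df[custom_order]
one_hot_position_df
```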

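| | GK | LWB | LB | CB | RB | RWB | CDM | CM | LM | RM | CAM | LW | RW | CF | ST |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 10015 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10016 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10017 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 10018 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |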
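With one column per position, grouping by position reduces to column arithmetic. A minimal sketch, assuming five rows copied from the `head()` output so that it runs on its own: the column sums count how many players list each position, and the product of the transposed one-hot matrix with a numeric column gives per-position totals that can be divided into means. `Age` is used here only because it is visible in the output above; in the notebook the same two lines would apply to `one_hot_position_df` and any numeric column of `cleaned_df`.

```python
import pandas as pd

# Five rows copied from the cleaned_df.head() output (illustration only)
sample = pd.DataFrame({
    "Name": ["Erling Haaland", "Kylian Mbappé", "Kevin De Bruyne", "Harry Kane", "Thibaut Courtois"],
    "Age": [23, 24, 32, 30, 31],
    "Preferred Positions": ["ST", "ST, LW", "CM, CAM", "ST", "GK"],
})

# One 0/1 column per position, as in the cell above
one_hot = sample["Preferred Positions"].str.get_dummies(sep=", ")

# Players per position: a multi-position player is counted once under each position
players_per_position = one_hot.sum()

# one_hot.T has positions as rows and players as columns, so the dot product
# sums Age over the players who list each position; dividing gives the mean
mean_age_per_position = one_hot.T.dot(sample["Age"]) / players_per_position

print(players_per_position)
print(mean_age_per_position.round(2))
```

Because multi-position players contribute to several positions, the per-position counts can add up to more than the number of players.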


## Question 1:

**Benefits of Seeking Answers**:

## Question 2:

**Benefits of Seeking Answers**:

## Question 3:

**Benefits of Seeking Answers**:

## Question 4:

**Benefits of Seeking Answers**:

## Question 5:

**Benefits of Seeking Answers**: