# FIFA 21 IRONHACK COMPETITION

# PART (I)

**Link to repo: https://github.com/ironhack-edu/data_project_FIFA_21**


You will use the provided `fifa21_training.csv` dataset to predict the overall rating (`OVA`) of each player. The competition will run from Friday morning to Tuesday.
<br><br>
Your model will be saved in a pickle file.
<br><br>
The competitors will be ranked by Mean Absolute Error (MAE), rounded to 2 decimals; the lowest MAE ranks first.
<br><br>
Ties will be broken using, in order: R2 score (rounded to 2 decimals), Root Mean Squared Error (rounded to 2 decimals), and time to run the code (measured with `timeit`).
<br>
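
The ranking metrics above can be checked locally before submitting. The following is a minimal sketch using NumPy only; the function name `regression_scores` and the sample ratings are made up for illustration, and the instructor's exact scoring code may differ.

```python
import timeit

import numpy as np


def regression_scores(y_true, y_pred):
    """Return MAE, R2 and RMSE, each rounded to 2 decimals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred

    mae = np.mean(np.abs(errors))          # Mean Absolute Error
    rmse = np.sqrt(np.mean(errors ** 2))   # Root Mean Squared Error
    ss_res = np.sum(errors ** 2)           # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {
        "MAE": round(float(mae), 2),
        "R2": round(float(r2), 2),
        "RMSE": round(float(rmse), 2),
    }


# Hypothetical true and predicted OVA values, not taken from the dataset.
y_true = [70, 65, 80, 75]
y_pred = [68, 66, 79, 77]

scores = regression_scores(y_true, y_pred)
print(scores)

# Time 100 calls of the scoring function with timeit (result is in seconds).
elapsed = timeit.timeit(lambda: regression_scores(y_true, y_pred), number=100)
print(f"100 runs took {elapsed:.4f} s")
```

Lower is better for MAE and RMSE, while higher is better for R2.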

## DELIVERABLES:

Your group should deliver a `group Jupyter notebook` with all the preprocessing functions along with the model.

Everything must be delivered by 12 am on Wednesday.
<br><br>
Be prepared to share your work on Tuesday morning: the groups with the best scores will have the opportunity to show their notebook and walk through their pipeline (~10 min).
<br><br>

To deliver:
* A notebook with your work and model (group_number.ipynb);
* Pickle file with the model (group_number.pkl). 
<br><br>

The instructor will use your `group Jupyter notebook` to load a new dataset and use your functions and
your model to make predictions on unseen data.
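
Saving and reloading the model with `pickle` could look like the sketch below. The `model` dictionary is only a stand-in with made-up coefficients; any picklable object, such as a fitted estimator, is saved and loaded the same way. The file is written to a temporary directory here so the example has no side effects; in the project it would be `group_number.pkl` next to the notebook.

```python
import os
import pickle
import tempfile

# Stand-in for a fitted model: hypothetical coefficients, not real results.
model = {"intercept": 10.0, "coefficients": {"Age": 0.5, "Reactions": 0.7}}

# Temporary location used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "group_number.pkl")

# Save the model in binary mode.
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back, as the instructor would do before predicting on new data.
with open(path, "rb") as f:
    loaded = pickle.load(f)

print(loaded == model)
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.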


<br><br>

For this small project you are going to work in groups to put into practice some of the concepts from the previous week.

With your group mates, open the file in `file_for_project/fifa21_training.csv`. The objective is to create the best linear model to predict the column `OVA`.

You can find some documentation about the meaning of each column in the following links:

- [link - 0](https://sofifa.com/)
- [link - 1](https://gaming.stackexchange.com/questions/167318/what-do-fifa-14-position-acronyms-mean)
- [link - 2](https://www.fifauteam.com/fifa-ultimate-team-positions-and-tactics/)

### 1

Each member of the team should have their own _Jupyter_ notebook. In addition, each group should have a `group jupyter notebook`.

### 2

Decide which columns can be predictive and which ones can be dropped right away, then take the needed actions.

### 3

Decide among the members of the group who is going to take care of inspecting the remaining columns
of the dataset. For example:

- Member 1: cols 1 -> 5
- Member 2: cols 6 -> 10
- ... and so on

### 4

Each member must:

- Explore their assigned columns and write Python code to perform any cleanup operation that the assigned columns may need.
- Perform any scaling operation that the assigned columns may need.
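
As an illustration of the scaling step, here is a minimal sketch of standardization with pandas on a toy frame. The values are made up; only the column names are borrowed from the dataset. Which scaling each column actually needs is for the group to decide.

```python
import pandas as pd

# Toy data with made-up values.
df = pd.DataFrame({"Age": [20, 25, 30], "Stamina": [60, 70, 80]})

# Standardize: subtract the column mean and divide by the column
# standard deviation, so each column ends up with mean 0 and std 1.
means = df.mean()
stds = df.std(ddof=0)  # population standard deviation
scaled = (df - means) / stds

print(scaled)
```

When the same scaling has to be applied to a new dataset, `means` and `stds` should be the ones computed on the training data, not recomputed on the new data.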

### 5

Put all the code of each member into the `group jupyter notebook`.

In [1]:
import pandas as pd
import numpy as np

In [2]:
data=pd.read_csv('./file_for_project/fifa21_training.csv')
for elem in data.columns:
    print(data[elem].isna().sum(),elem)

0 Unnamed: 0
0 ID
0 Name
0 Age
0 Nationality
21 Club
0 BP
343 Position
0 Team & Contract
0 Height
0 Weight
0 foot
0 Growth
44 Joined
12961 Loan Date End
0 Value
0 Wage
0 Release Clause
0 Contract
0 Attacking
0 Crossing
0 Finishing
0 Heading Accuracy
0 Short Passing
44 Volleys
0 Skill
0 Dribbling
44 Curve
0 FK Accuracy
0 Long Passing
0 Ball Control
0 Movement
0 Acceleration
0 Sprint Speed
44 Agility
0 Reactions
44 Balance
0 Power
0 Shot Power
44 Jumping
0 Stamina
0 Strength
0 Long Shots
0 Mentality
0 Aggression
7 Interceptions
7 Positioning
44 Vision
0 Penalties
329 Composure
0 Defending
0 Marking
0 Standing Tackle
44 Sliding Tackle
0 Goalkeeping
0 GK Diving
0 GK Handling
0 GK Kicking
0 GK Positioning
0 GK Reflexes
0 Total Stats
0 Base Stats
0 W/F
0 SM
67 A/W
67 D/W
0 IR
0 PAC
0 SHO
0 PAS
0 DRI
0 DEF
0 PHY
0 Hits
0 LS
0 ST
0 RS
0 LW
0 LF
0 CF
0 RF
0 RW
0 LAM
0 CAM
0 RAM
0 LM
0 LCM
0 CM
0 RCM
0 RM
0 LWB
0 LDM
0 CDM
0 RDM
0 RWB
0 LB
0 LCB
0 CB
0 RCB
0 RB
0 GK
0 OVA


In [3]:
cols = data.columns
cols

Index(['Unnamed: 0', 'ID', 'Name', 'Age', 'Nationality', 'Club', 'BP',
       'Position', 'Team & Contract', 'Height',
       ...
       'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB', 'GK', 'OVA'],
      dtype='object', length=102)

In [4]:
for elem in cols:
    print(elem)

Unnamed: 0
ID
Name
Age
Nationality
Club
BP
Position
Team & Contract
Height
Weight
foot
Growth
Joined
Loan Date End
Value
Wage
Release Clause
Contract
Attacking
Crossing
Finishing
Heading Accuracy
Short Passing
Volleys
Skill
Dribbling
Curve
FK Accuracy
Long Passing
Ball Control
Movement
Acceleration
Sprint Speed
Agility
Reactions
Balance
Power
Shot Power
Jumping
Stamina
Strength
Long Shots
Mentality
Aggression
Interceptions
Positioning
Vision
Penalties
Composure
Defending
Marking
Standing Tackle
Sliding Tackle
Goalkeeping
GK Diving
GK Handling
GK Kicking
GK Positioning
GK Reflexes
Total Stats
Base Stats
W/F
SM
A/W
D/W
IR
PAC
SHO
PAS
DRI
DEF
PHY
Hits
LS
ST
RS
LW
LF
CF
RF
RW
LAM
CAM
RAM
LM
LCM
CM
RCM
RM
LWB
LDM
CDM
RDM
RWB
LB
LCB
CB
RCB
RB
GK
OVA


We are going to drop some columns that we already know will not give us useful information for predicting a player's OVA.

In [5]:
data.drop(columns=["Unnamed: 0", "ID", "Joined", "Loan Date End", "Team & Contract", "Nationality"], inplace=True)

In [6]:
cols2 = data.columns
for elem in cols2:
    print(elem)

Name
Age
Club
BP
Position
Height
Weight
foot
Growth
Value
Wage
Release Clause
Contract
Attacking
Crossing
Finishing
Heading Accuracy
Short Passing
Volleys
Skill
Dribbling
Curve
FK Accuracy
Long Passing
Ball Control
Movement
Acceleration
Sprint Speed
Agility
Reactions
Balance
Power
Shot Power
Jumping
Stamina
Strength
Long Shots
Mentality
Aggression
Interceptions
Positioning
Vision
Penalties
Composure
Defending
Marking
Standing Tackle
Sliding Tackle
Goalkeeping
GK Diving
GK Handling
GK Kicking
GK Positioning
GK Reflexes
Total Stats
Base Stats
W/F
SM
A/W
D/W
IR
PAC
SHO
PAS
DRI
DEF
PHY
Hits
LS
ST
RS
LW
LF
CF
RF
RW
LAM
CAM
RAM
LM
LCM
CM
RCM
RM
LWB
LDM
CDM
RDM
RWB
LB
LCB
CB
RCB
RB
GK
OVA


In [7]:
data['Value']

0        €525K
1        €8.5M
2          €9M
3        €275K
4        €725K
         ...  
13695    €325K
13696    €190K
13697      €8M
13698    €140K
13699    €425K
Name: Value, Length: 13700, dtype: object
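
The `Value` column is stored as text (`dtype: object`), so it has to be converted to numbers before a linear model can use it. One possible sketch is shown below; `parse_money` is a hypothetical helper, and it assumes the only suffixes are `K` (thousands) and `M` (millions), as in the values printed above.

```python
def parse_money(text):
    """Convert strings such as '€525K' or '€8.5M' to a float amount in euros."""
    multipliers = {"K": 1_000, "M": 1_000_000}
    cleaned = str(text).strip().replace("€", "")
    # If the last character is a known suffix, scale the numeric part by it.
    if cleaned and cleaned[-1].upper() in multipliers:
        return float(cleaned[:-1]) * multipliers[cleaned[-1].upper()]
    # No suffix: assume the remaining text is a plain number.
    return float(cleaned)


examples = ["€525K", "€8.5M", "€9M"]
parsed = [parse_money(v) for v in examples]
print(parsed)
```

On the full dataset this could be applied with `data['Value'].map(parse_money)`; other money columns can reuse it only if they follow the same format.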

In [10]:
data['Position'] = data['Position'].fillna(0)

In [11]:
for elem in data.columns:
    print(data[elem].isna().sum(),elem)

0 Name
0 Age
21 Club
0 BP
0 Position
0 Height
0 Weight
0 foot
0 Growth
0 Value
0 Wage
0 Release Clause
0 Contract
0 Attacking
0 Crossing
0 Finishing
0 Heading Accuracy
0 Short Passing
44 Volleys
0 Skill
0 Dribbling
44 Curve
0 FK Accuracy
0 Long Passing
0 Ball Control
0 Movement
0 Acceleration
0 Sprint Speed
44 Agility
0 Reactions
44 Balance
0 Power
0 Shot Power
44 Jumping
0 Stamina
0 Strength
0 Long Shots
0 Mentality
0 Aggression
7 Interceptions
7 Positioning
44 Vision
0 Penalties
329 Composure
0 Defending
0 Marking
0 Standing Tackle
44 Sliding Tackle
0 Goalkeeping
0 GK Diving
0 GK Handling
0 GK Kicking
0 GK Positioning
0 GK Reflexes
0 Total Stats
0 Base Stats
0 W/F
0 SM
67 A/W
67 D/W
0 IR
0 PAC
0 SHO
0 PAS
0 DRI
0 DEF
0 PHY
0 Hits
0 LS
0 ST
0 RS
0 LW
0 LF
0 CF
0 RF
0 RW
0 LAM
0 CAM
0 RAM
0 LM
0 LCM
0 CM
0 RCM
0 RM
0 LWB
0 LDM
0 CDM
0 RDM
0 RWB
0 LB
0 LCB
0 CB
0 RCB
0 RB
0 GK
0 OVA
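
Several columns still contain missing values after this step. One simple option, sketched below on a toy frame with made-up values, is to fill numeric columns with their median and text columns with a placeholder. This is an assumption about a reasonable default, not a prescribed method; dropping rows or using another statistic may work better for some columns.

```python
import numpy as np
import pandas as pd

# Toy data with made-up values; column names borrowed from the dataset.
df = pd.DataFrame({
    "Volleys": [50.0, np.nan, 70.0],
    "Composure": [60.0, 65.0, np.nan],
    "Club": ["A", None, "B"],
})

# Fill numeric columns with the median of each column.
numeric_cols = df.select_dtypes(include="number").columns
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Fill the text column with an explicit placeholder label.
df["Club"] = df["Club"].fillna("Unknown")

# Total number of missing values left in the frame.
remaining = int(df.isna().sum().sum())
print(remaining)
```

As with scaling, the fill values should come from the training data and be reused unchanged on the unseen dataset.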
