# Analysis of FIFA2020 Players

## Content

<font color = 'blue'>
    
* [Introduction](#Introduction)
    * [Variable Description](#Variable_Description)
* [Objectives](#Objectives)
* [Data Preparation](#Data_Preparation)
* To be continued...

<div style="background: #0f214f;">
    <img src="https://piunikaweb.com/wp-content/uploads/2020/03/image_1585670159337.jpg" width="82%">
</div>

<a id="Introduction"></a>
# Introduction
FIFA 20 is a football simulation video game published by Electronic Arts as part of the FIFA series. Each player in FIFA has an overall rating as well as six scores for the key stats: Pace, Shooting, Passing, Dribbling, Defending, and Physical. These stats are combined with a player's international recognition to calculate the player's overall rating.

<div style="background: #0f214f;">
    <img src="https://www.footboom.net/img/upload/2/59edc-FIFA-18.jpeg" width="70%">
</div>

<a id="Variable_Description"></a>
## Variable Description
1. full_name - name and surname of the player
2. age - age of the player
3. country - nationality of the player
4. club - football club the player plays for
5. rating - overall player rating (0-99) in the game
6. value_eur - player's value in euros
7. wage_eur - player's salary in euros
8. preferred_foot - the player's stronger foot (Left or Right)
9. reputation - player's international recognition (popularity)
10. position - player's position in the club's squad

* int64(5): age, rating, value_eur, wage_eur, reputation
* object(5): full_name, country, club, preferred_foot, position
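The numeric/text split listed above can be checked programmatically. The following is a minimal sketch, assuming a small hand-built frame (`sample_df`, a name introduced here for illustration) filled with three rows copied from the notebook's own output rather than the full CSV.

```python
import pandas as pd

# Three rows copied from the notebook output; `sample_df` is an illustrative name.
sample_df = pd.DataFrame({
    "full_name": ["L. Messi", "Cristiano Ronaldo", "J. Oblak"],
    "age": [32, 34, 26],
    "country": ["Argentina", "Portugal", "Slovenia"],
    "club": ["FC Barcelona", "Juventus", "Atlético Madrid"],
    "rating": [94, 93, 91],
    "value_eur": [95500000, 58500000, 77500000],
    "wage_eur": [565000, 405000, 125000],
    "preferred_foot": ["Left", "Right", "Right"],
    "reputation": [5, 5, 3],
    "position": ["RW", "LW", "GK"],
})

# select_dtypes keeps the original column order within each group.
numeric_cols = sample_df.select_dtypes(include="number").columns.tolist()
text_cols = sample_df.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("text:   ", text_cols)
```

Selecting by `"number"` rather than by a specific dtype name keeps the check independent of how a given pandas version labels string columns.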

<a id="Objectives"></a>
# Objectives
1. Analyze clubs and countries by player ratings, salaries, and reputations
2. Analyze players per country
3. Analyze clubs by player value
4. Analyze the rating-to-age and rating-to-position relationships
5. Compare right-footed and left-footed players by rating
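Several of these objectives reduce to grouped aggregations. The sketch below is one possible approach, not necessarily the one the later sections take. It assumes a small frame (`sample_df`, an illustrative name) built from the first five rows of the notebook's output, and touches objectives 2, 4, and 5.

```python
import pandas as pd

# First five rows of the notebook output, restricted to the columns used here.
sample_df = pd.DataFrame({
    "full_name": ["L. Messi", "Cristiano Ronaldo", "Neymar Jr", "J. Oblak", "E. Hazard"],
    "country": ["Argentina", "Portugal", "Brazil", "Slovenia", "Belgium"],
    "rating": [94, 93, 92, 91, 91],
    "preferred_foot": ["Left", "Right", "Right", "Right", "Right"],
    "position": ["RW", "LW", "CAM", "GK", "LW"],
})

# Objective 2: number of players per country.
players_per_country = sample_df["country"].value_counts()

# Objective 4 (position part): mean rating per position.
rating_by_position = sample_df.groupby("position")["rating"].mean()

# Objective 5: mean rating per preferred foot.
rating_by_foot = sample_df.groupby("preferred_foot")["rating"].mean()

print(players_per_country)
print(rating_by_position)
print(rating_by_foot)
```

With only five rows the numbers say nothing about the real distribution. The point is the shape of the calls, which carry over unchanged to the full `fifa2020_df`.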

<a id="Data_Preparation"></a>
# 1. Data Preparation

In [42]:
import pandas as pd
import requests
from bs4 import BeautifulSoup as BS

In [43]:
fifa2020_df = pd.read_csv('players_20.csv')

In [44]:
# Keep only the ten columns used in this analysis
fifa2020_df = fifa2020_df[[
    'short_name', 'age', 'nationality', 'club', 'overall',
    'value_eur', 'wage_eur', 'preferred_foot',
    'international_reputation', 'team_position',
]]

The columns are renamed to make them easier to work with:<br>
'short_name' -> 'full_name'<br>
'nationality' -> 'country'<br>
'overall' -> 'rating'<br>
'international_reputation' -> 'reputation'<br>
'team_position' -> 'position'<br>

In [50]:
fifa2020_df.rename(columns={
    'short_name': 'full_name',
    'nationality': 'country',
    'overall': 'rating',
    'international_reputation': 'reputation',
    'team_position': 'position',
}, inplace=True)

In [51]:
fifa2020_df.shape

(18038, 10)

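The output shows 18038 rows and 10 columns. The raw file has 18278 rows; the lower count appears to reflect a re-run of the notebook after the 240 rows with missing values were dropped in a later cell.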

In [52]:
fifa2020_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 18038 entries, 0 to 18277
Data columns (total 10 columns):
 #   Column          Non-Null Count  Dtype 
---  ------          --------------  ----- 
 0   full_name       18038 non-null  object
 1   age             18038 non-null  int64 
 2   country         18038 non-null  object
 3   club            18038 non-null  object
 4   rating          18038 non-null  int64 
 5   value_eur       18038 non-null  int64 
 6   wage_eur        18038 non-null  int64 
 7   preferred_foot  18038 non-null  object
 8   reputation      18038 non-null  int64 
 9   position        18038 non-null  object
dtypes: int64(5), object(5)
memory usage: 1.5+ MB


In [53]:
fifa2020_df.isnull().sum()

full_name         0
age               0
country           0
club              0
rating            0
value_eur         0
wage_eur          0
preferred_foot    0
reputation        0
position          0
dtype: int64

In [54]:
fifa2020_df = fifa2020_df.dropna()
fifa2020_df

| | full_name | age | country | club | rating | value_eur | wage_eur | preferred_foot | reputation | position |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | L. Messi | 32 | Argentina | FC Barcelona | 94 | 95500000 | 565000 | Left | 5 | RW |
| 1 | Cristiano Ronaldo | 34 | Portugal | Juventus | 93 | 58500000 | 405000 | Right | 5 | LW |
| 2 | Neymar Jr | 27 | Brazil | Paris Saint-Germain | 92 | 105500000 | 290000 | Right | 5 | CAM |
| 3 | J. Oblak | 26 | Slovenia | Atlético Madrid | 91 | 77500000 | 125000 | Right | 3 | GK |
| 4 | E. Hazard | 28 | Belgium | Real Madrid | 91 | 90000000 | 470000 | Right | 4 | LW |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18273 | Shao Shuai | 22 | China PR | Beijing Renhe FC | 48 | 40000 | 2000 | Right | 1 | RES |
| 18274 | Xiao Mingjie | 22 | China PR | Shanghai SIPG FC | 48 | 40000 | 2000 | Right | 1 | SUB |
| 18275 | Zhang Wei | 19 | China PR | Hebei China Fortune FC | 48 | 40000 | 1000 | Right | 1 | SUB |
| 18276 | Wang Haijian | 18 | China PR | Shanghai Greenland Shenhua FC | 48 | 40000 | 1000 | Right | 1 | SUB |


In this case, rows with missing values can simply be removed.
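Before dropping rows, it helps to see how many are affected and in which columns. The following is a minimal sketch on a toy frame. The name `raw_df` and the single missing `position` value are assumptions made for illustration; the notebook's outputs do not show which column held the missing values.

```python
import pandas as pd

# Toy frame; the None in `position` is placed there purely for illustration.
raw_df = pd.DataFrame({
    "full_name": ["L. Messi", "Cristiano Ronaldo", "Neymar Jr", "J. Oblak"],
    "rating": [94, 93, 92, 91],
    "position": ["RW", None, "CAM", "GK"],
})

# Count missing values per column before removing anything.
missing_per_column = raw_df.isnull().sum()

# dropna() removes every row that has at least one missing value.
clean_df = raw_df.dropna()
rows_dropped = len(raw_df) - len(clean_df)

print(missing_per_column)
print(f"dropped {rows_dropped} of {len(raw_df)} rows")
```

Running `isnull().sum()` on the raw frame, before `dropna()`, is what makes the per-column counts informative; after the drop they are all zero by construction.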

In [55]:
fifa2020_df.reset_index(drop=True, inplace=True)
fifa2020_df

| | full_name | age | country | club | rating | value_eur | wage_eur | preferred_foot | reputation | position |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | L. Messi | 32 | Argentina | FC Barcelona | 94 | 95500000 | 565000 | Left | 5 | RW |
| 1 | Cristiano Ronaldo | 34 | Portugal | Juventus | 93 | 58500000 | 405000 | Right | 5 | LW |
| 2 | Neymar Jr | 27 | Brazil | Paris Saint-Germain | 92 | 105500000 | 290000 | Right | 5 | CAM |
| 3 | J. Oblak | 26 | Slovenia | Atlético Madrid | 91 | 77500000 | 125000 | Right | 3 | GK |
| 4 | E. Hazard | 28 | Belgium | Real Madrid | 91 | 90000000 | 470000 | Right | 4 | LW |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 18033 | Shao Shuai | 22 | China PR | Beijing Renhe FC | 48 | 40000 | 2000 | Right | 1 | RES |
| 18034 | Xiao Mingjie | 22 | China PR | Shanghai SIPG FC | 48 | 40000 | 2000 | Right | 1 | SUB |
| 18035 | Zhang Wei | 19 | China PR | Hebei China Fortune FC | 48 | 40000 | 1000 | Right | 1 | SUB |
| 18036 | Wang Haijian | 18 | China PR | Shanghai Greenland Shenhua FC | 48 | 40000 | 1000 | Right | 1 | SUB |
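The two outputs differ only in their index labels: after `dropna()` the labels keep their original values and leave gaps, while `reset_index(drop=True)` renumbers them from zero. A minimal sketch on a toy frame (the names `gapped_df`, `renumbered`, and `kept_old` are illustrative):

```python
import pandas as pd

# A frame whose index has a gap, as if the row labelled 1 had been dropped.
gapped_df = pd.DataFrame(
    {"full_name": ["L. Messi", "Neymar Jr", "J. Oblak"], "rating": [94, 92, 91]},
    index=[0, 2, 3],
)

# drop=True discards the old labels and assigns 0..n-1.
renumbered = gapped_df.reset_index(drop=True)

# Without drop=True the old labels are kept as a new column named "index".
kept_old = gapped_df.reset_index()

print(renumbered.index.tolist())
print(kept_old.columns.tolist())
```

Using `drop=True` avoids carrying the old labels along as an extra column, which matches the ten-column output shown in the notebook.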
