# Introduction
The purpose of this analysis is to measure the efficiency of each NBA player during the 2021-2022 season and to use it to build a dream team for next season.

## Categorical types
Attributes stored as text (pandas `object` dtype):
* **Player** is the name of the player
* **Pos** is the position where the player plays
* **Tm** is the team where the player plays

## Numerical types
Attributes stored as numbers. The percentage columns (`%`) are stored as fractions between 0 and 1 (for example, 0.412).
* **Rk** is the row index used to identify each player
* **Age** is the age of the player
* **G** is the number of games played
* **GS** is the number of games started
* **MP** is minutes played per game
* **FG** is field goals per game
* **FGA** is field goal attempts per game
* **FG%** is field goal percentage
* **3P** is 3-point field goals per game
* **3PA** is 3-point field goal attempts per game
* **3P%** is 3-point field goal percentage
* **2P** is 2-point field goals per game
* **2PA** is 2-point field goal attempts per game
* **2P%** is 2-point field goal percentage
* **eFG%** is effective field goal percentage
* **FT** is free throws per game
* **FTA** is free throw attempts per game
* **FT%** is free throw percentage
* **ORB** is offensive rebounds per game
* **DRB** is defensive rebounds per game
* **TRB** is total rebounds per game
* **AST** is assists per game
* **STL** is steals per game
* **BLK** is blocks per game
* **TOV** is turnovers per game
* **PF** is personal fouls per game
* **PTS** is points per game
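Several of these attributes are derived from others. Using the standard box-score definitions (FG = 2P + 3P, FG% = FG / FGA, eFG% = (FG + 0.5 × 3P) / FGA, TRB = ORB + DRB), a minimal sketch can recompute them as a consistency check. The function name `recompute_derived` and the `_calc` column suffix are hypothetical. Because the file stores per-game averages rounded to one decimal, while the stored percentages are presumably computed from season totals, recomputed values should only be expected to match approximately.

```python
import pandas as pd

def recompute_derived(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with hypothetical *_calc columns derived from per-game stats."""
    out = df.copy()
    # Field goals are the sum of 2-point and 3-point makes
    out["FG_calc"] = out["2P"] + out["3P"]
    # Treat zero attempts as missing so the ratios become NaN instead of dividing by zero
    fga = out["FGA"].where(out["FGA"] > 0)
    out["FG%_calc"] = out["FG"] / fga
    # Effective FG% gives each 3-point make 1.5 times the weight of a 2-point make
    out["eFG%_calc"] = (out["FG"] + 0.5 * out["3P"]) / fga
    # Total rebounds are offensive plus defensive rebounds
    out["TRB_calc"] = out["ORB"] + out["DRB"]
    return out

# Usage on a hypothetical one-row frame (illustrative numbers, not from the dataset)
sample = pd.DataFrame({"FG": [5.0], "FGA": [10.0], "2P": [3.0], "3P": [2.0],
                       "ORB": [1.0], "DRB": [4.0]})
checked = recompute_derived(sample)
print(checked[["FG_calc", "FG%_calc", "eFG%_calc", "TRB_calc"]])
```

On the real data, comparing `players["eFG%"]` with `eFG%_calc` using a tolerance (rather than exact equality) avoids flagging differences caused purely by rounding.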

# Importing libraries and reading the CSV file

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Render plots inline in the notebook and use a white grid background
%matplotlib inline
sns.set_style('whitegrid')
```

```python
# Semicolon-separated file, Latin-1 encoded; the first column (Rk) becomes the index
players = pd.read_csv('Player_stats.csv', delimiter=';', encoding='latin-1', index_col=0)
players.head()
```

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Precious Achiuwa | C | 22 | TOR | 34 | 20 | 25.3 | 3.4 | 8.1 | 0.412 | ... | 0.574 | 2.4 | 5.3 | 7.7 | 1.3 | 0.6 | 0.6 | 1.1 | 2.2 | 7.9 |
| 2 | Steven Adams | C | 28 | MEM | 44 | 43 | 25.7 | 2.6 | 4.9 | 0.521 | ... | 0.566 | 4.3 | 4.9 | 9.3 | 3.0 | 1.0 | 0.6 | 1.6 | 1.7 | 6.7 |
| 3 | Bam Adebayo | C | 24 | MIA | 21 | 21 | 32.9 | 6.9 | 13.3 | 0.518 | ... | 0.767 | 2.6 | 7.3 | 10.0 | 3.2 | 1.2 | 0.5 | 2.9 | 3.1 | 18.7 |
| 4 | Santi Aldama | PF | 21 | MEM | 25 | 0 | 10.5 | 1.4 | 3.8 | 0.372 | ... | 0.579 | 0.9 | 1.6 | 2.5 | 0.5 | 0.1 | 0.2 | 0.3 | 0.9 | 3.4 |
| 5 | LaMarcus Aldridge | C | 36 | BRK | 32 | 10 | 23.1 | 5.9 | 10.3 | 0.574 | ... | 0.862 | 1.5 | 4.1 | 5.7 | 0.9 | 0.4 | 1.1 | 0.8 | 1.7 | 13.8 |
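The `read_csv` call depends on three parameters: `delimiter=';'` (an alias of `sep`) because fields are separated by semicolons, `encoding='latin-1'`, presumably so that accented characters in player names decode without errors, and `index_col=0` so that `Rk` becomes the index rather than a regular column. The sketch below reproduces this on a tiny in-memory file; the rows and the accented name are hypothetical and not taken from the dataset.

```python
import io
import pandas as pd

# Hypothetical two-row sample in the same layout as Player_stats.csv:
# semicolon-separated, Latin-1 encoded, with Rk as the first column
raw = "Rk;Player;Pos;PTS\n1;José Example;C;7.9\n2;Jane Sample;PF;3.4\n".encode("latin-1")

sample = pd.read_csv(io.BytesIO(raw), delimiter=";", encoding="latin-1", index_col=0)
print(sample)
print(sample.index.name)  # Rk
```

Rows can then be looked up by rank, for example `sample.loc[1, "Player"]`, which is why `players.head()` shows `Rk` as the index label.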


```python
players.info()
```

```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 590 entries, 1 to 590
Data columns (total 29 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Player  590 non-null    object 
 1   Pos     590 non-null    object 
 2   Age     590 non-null    int64  
 3   Tm      590 non-null    object 
 4   G       590 non-null    int64  
 5   GS      590 non-null    int64  
 6   MP      590 non-null    float64
 7   FG      590 non-null    float64
 8   FGA     590 non-null    float64
 9   FG%     590 non-null    float64
 10  3P      590 non-null    float64
 11  3PA     590 non-null    float64
 12  3P%     590 non-null    float64
 13  2P      590 non-null    float64
 14  2PA     590 non-null    float64
 15  2P%     590 non-null    float64
 16  eFG%    590 non-null    float64
 17  FT      590 non-null    float64
 18  FTA     590 non-null    float64
 19  FT%     590 non-null    float64
 20  ORB     590 non-null    float64
 21  DRB     590 non-null    float64
 22  TR
```
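The dtypes reported by `info()` line up with the split described in the introduction: the text columns are the categorical attributes and the `int64`/`float64` columns are the numerical ones. A minimal sketch that derives both groups with `select_dtypes`; the helper name `split_by_type` and the toy frame are hypothetical. Selecting the categorical group with `exclude="number"` (rather than `include="object"`) keeps it working even if a pandas version stores text with a dedicated string dtype.

```python
import pandas as pd

def split_by_type(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (categorical, numerical) column names based on dtype."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return categorical, numerical

# Hypothetical toy frame with the same kinds of dtypes as players
toy = pd.DataFrame({"Player": ["A", "B"], "Pos": ["C", "PF"],
                    "Age": [22, 28], "MP": [25.3, 10.5]})
categorical, numerical = split_by_type(toy)
print(categorical)  # ['Player', 'Pos']
print(numerical)    # ['Age', 'MP']
```

Applied to `players`, the same call would be expected to return `Player`, `Pos` and `Tm` as the categorical group.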

# Data Cleaning

```python
players.columns
```

```
Index(['Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%', '3P',
       '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%', 'ORB',
       'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS'],
      dtype='object')
```
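Comparing `players.columns` against the attribute lists in the introduction is a quick way to catch mismatches between the documented names and the actual header. A sketch, assuming the documented names are typed in by hand; `DOCUMENTED` and `check_columns` are hypothetical names, and `Rk` is left out because `index_col=0` turns it into the index.

```python
import pandas as pd

# Hypothetical hand-typed data dictionary (Rk excluded: it becomes the index)
DOCUMENTED = [
    "Player", "Pos", "Tm", "Age", "G", "GS", "MP", "FG", "FGA", "FG%",
    "3P", "3PA", "3P%", "2P", "2PA", "2P%", "eFG%", "FT", "FTA", "FT%",
    "ORB", "DRB", "TRB", "AST", "STL", "BLK", "TOV", "PF", "PTS",
]

def check_columns(df: pd.DataFrame, documented: list[str]) -> tuple[set[str], set[str]]:
    """Return (undocumented, missing): columns present but not documented, and vice versa."""
    actual = set(df.columns)
    expected = set(documented)
    return actual - expected, expected - actual

# Usage on a hypothetical empty frame that lacks BLK and has an extra column
toy = pd.DataFrame(columns=[c for c in DOCUMENTED if c != "BLK"] + ["Extra"])
undocumented, missing = check_columns(toy, DOCUMENTED)
print(undocumented)  # {'Extra'}
print(missing)       # {'BLK'}
```

Calling `check_columns(players, DOCUMENTED)` should return two empty sets when the documentation and the file agree.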

```python
# Count '?' placeholders (a common marker for missing data) in each column
players.isin(['?']).sum(axis=0)
```

```
Player    0
Pos       0
Age       0
Tm        0
G         0
GS        0
MP        0
FG        0
FGA       0
FG%       0
3P        0
3PA       0
3P%       0
2P        0
2PA       0
2P%       0
eFG%      0
FT        0
FTA       0
FT%       0
ORB       0
DRB       0
TRB       0
AST       0
STL       0
BLK       0
TOV       0
PF        0
PTS       0
dtype: int64
```
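Two points make the `'?'` check easier to interpret. First, a numeric column cannot hold the string `'?'` (pandas would have read it as text), so for every column that `info()` reports as `int64` or `float64` the count can only be 0. Second, `read_csv` can turn such markers into `NaN` at load time through its `na_values` parameter, so that a single `isnull()` check covers both cases. A sketch on a hypothetical in-memory file:

```python
import io
import pandas as pd

# Hypothetical file in which one value is recorded as '?'
raw = "Rk;Player;3P%\n1;Player A;0.350\n2;Player B;?\n"

# Without na_values the '?' is kept as text, so the column is not numeric
as_text = pd.read_csv(io.StringIO(raw), sep=";", index_col=0)
print(as_text["3P%"].dtype)
print(as_text.isin(["?"]).sum())

# With na_values the marker is parsed as NaN and the column stays numeric
as_nan = pd.read_csv(io.StringIO(raw), sep=";", index_col=0, na_values=["?"])
print(as_nan["3P%"].dtype)          # float64
print(as_nan.isnull().sum().sum())  # 1
```

With this approach, loading `Player_stats.csv` with `na_values=['?']` would make the separate `isin(['?'])` check redundant.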

```python
# Count missing (NaN) values in each column
null_values = players.isnull().sum()
null_values
```

```
Player    0
Pos       0
Age       0
Tm        0
G         0
GS        0
MP        0
FG        0
FGA       0
FG%       0
3P        0
3PA       0
3P%       0
2P        0
2PA       0
2P%       0
eFG%      0
FT        0
FTA       0
FT%       0
ORB       0
DRB       0
TRB       0
AST       0
STL       0
BLK       0
TOV       0
PF        0
PTS       0
dtype: int64
```

So there are no null values, and no `'?'` placeholders, in our dataset.
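Printing one count per column gets long with 29 columns. A more compact variant, sketched below with the hypothetical helper `missing_report`, keeps only the columns that actually contain missing values, so an empty result means the dataset is complete.

```python
import numpy as np
import pandas as pd

def missing_report(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values for each column that has any."""
    counts = df.isnull().sum()
    return counts[counts > 0]

# Hypothetical toy frames: one complete, one with a single gap
complete = pd.DataFrame({"Player": ["A", "B"], "PTS": [7.9, 6.7]})
gappy = pd.DataFrame({"Player": ["A", "B"], "PTS": [7.9, np.nan]})

print(missing_report(complete).empty)  # True
print(missing_report(gappy))
```

For `players`, `missing_report(players).empty` would be expected to be `True`, matching the all-zero counts above.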