# Introduction
The purpose of this analysis is to measure the efficiency of each NBA player during the 2021-2022 season and to use it to build a dream team for next season.

## Categorical types
Attributes stored as text (pandas `object` dtype):
* **Player** is the name of the player
* **Pos** is the position where the player plays
* **Tm** is the team where the player plays
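You can check the split between categorical and numerical attributes directly with `DataFrame.select_dtypes`. The sketch below is a minimal illustration. Instead of reading the CSV, it builds a small frame from two rows of the dataset preview, keeping only a handful of columns for brevity. Excluding the numeric dtypes leaves the text columns.

```python
import pandas as pd

# Two rows from the dataset preview, restricted to a few columns
sample = pd.DataFrame(
    {
        "Player": ["Precious Achiuwa", "Steven Adams"],
        "Pos": ["C", "C"],
        "Age": [22, 28],
        "Tm": ["TOR", "MEM"],
        "MP": [25.3, 25.7],
    }
)

# Numeric columns (int64/float64) are the "numerical types"
numerical_cols = list(sample.select_dtypes(include="number").columns)
# Everything else, here the text columns, is a "categorical type"
categorical_cols = list(sample.select_dtypes(exclude="number").columns)

print(categorical_cols)  # ['Player', 'Pos', 'Tm']
print(numerical_cols)  # ['Age', 'MP']
```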

## Numerical types
Attributes stored as numbers (`int64` or `float64`):
* **Rk** is the player's rank in the table, used as the row index
* **Age** is the age of the player
* **G** games played
* **GS** games started
* **MP** minutes played per game
* **FG** field goals per game
* **FGA** field goal attempts per game
* **FG%** field goal percentage
* **3P** 3-point field goals per game
* **3PA** 3-point field goal attempts per game
* **3P%** 3-point field goal percentage
* **2P** 2-point field goals per game
* **2PA** 2-point field goal attempts per game
* **2P%** 2-point field goal percentage
* **eFG%** effective field goal percentage
* **FT** free throws per game
* **FTA** free throw attempts per game
* **FT%** free throw percentage
* **ORB** offensive rebounds per game
* **DRB** defensive rebounds per game
* **TRB** total rebounds per game
* **AST** assists per game
* **STL** steals per game
* **BLK** blocks per game
* **TOV** turnovers per game
* **PF** personal fouls per game
* **PTS** points per game
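Several of these columns are derived from others. The sketch below uses made-up totals for a hypothetical player to show the standard definitions:

* 2P = FG - 3P
* FG% = FG / FGA
* eFG% = (FG + 0.5 * 3P) / FGA, because a made three counts as 1.5 field goals
* PTS = 2 * 2P + 3 * 3P + FT

In this dataset the percentages appear to be computed from season totals rather than from the rounded per-game averages. That would explain why 3.4 / 8.1 for Precious Achiuwa gives about 0.420 rather than the listed 0.412.

```python
def derived_stats(fg, fga, three_p, three_pa, ft):
    """Derive secondary shooting columns from made/attempted counts."""
    two_p = fg - three_p  # 2P: made field goals that were not threes
    two_pa = fga - three_pa  # 2PA
    fg_pct = fg / fga  # FG%
    efg_pct = (fg + 0.5 * three_p) / fga  # eFG%: a made three is worth 1.5 field goals
    pts = 2 * two_p + 3 * three_p + ft  # PTS
    return {"2P": two_p, "2PA": two_pa, "FG%": fg_pct, "eFG%": efg_pct, "PTS": pts}


# Hypothetical totals: 8-of-16 shooting, 4-of-8 from three, 2 free throws made
stats = derived_stats(fg=8, fga=16, three_p=4, three_pa=8, ft=2)
print(stats)  # {'2P': 4, '2PA': 8, 'FG%': 0.5, 'eFG%': 0.625, 'PTS': 22}
```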

# Importing libraries and reading the CSV file

In [25]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
sns.set_style('whitegrid')

In [26]:
players = pd.read_csv('Player_stats.csv', delimiter=';', encoding='latin-1', index_col=0)
players.head()

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | FT% | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Precious Achiuwa | C | 22 | TOR | 34 | 20 | 25.3 | 3.4 | 8.1 | 0.412 | ... | 0.574 | 2.4 | 5.3 | 7.7 | 1.3 | 0.6 | 0.6 | 1.1 | 2.2 | 7.9 |
| 2 | Steven Adams | C | 28 | MEM | 44 | 43 | 25.7 | 2.6 | 4.9 | 0.521 | ... | 0.566 | 4.3 | 4.9 | 9.3 | 3.0 | 1.0 | 0.6 | 1.6 | 1.7 | 6.7 |
| 3 | Bam Adebayo | C | 24 | MIA | 21 | 21 | 32.9 | 6.9 | 13.3 | 0.518 | ... | 0.767 | 2.6 | 7.3 | 10.0 | 3.2 | 1.2 | 0.5 | 2.9 | 3.1 | 18.7 |
| 4 | Santi Aldama | PF | 21 | MEM | 25 | 0 | 10.5 | 1.4 | 3.8 | 0.372 | ... | 0.579 | 0.9 | 1.6 | 2.5 | 0.5 | 0.1 | 0.2 | 0.3 | 0.9 | 3.4 |
| 5 | LaMarcus Aldridge | C | 36 | BRK | 32 | 10 | 23.1 | 5.9 | 10.3 | 0.574 | ... | 0.862 | 1.5 | 4.1 | 5.7 | 0.9 | 0.4 | 1.1 | 0.8 | 1.7 | 13.8 |
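For reference, `delimiter` is an alias of `sep`. `index_col=0` promotes the first column (`Rk`) to the row index, which is why `info()` reports an `Int64Index` running from 1 to 590. The minimal sketch below reproduces this behavior without a file. It uses an in-memory, semicolon-separated excerpt held in `io.StringIO`, built from the first two preview rows with only a few columns kept.

`encoding` is omitted because it only applies when decoding bytes from a file. Latin-1 maps every byte to a character, so it never raises a decode error, although accented names only display correctly if the file really is Latin-1.

```python
import io

import pandas as pd

# In-memory excerpt: first two preview rows, only a few columns
raw = io.StringIO(
    "Rk;Player;Pos;Age;Tm;PTS\n"
    "1;Precious Achiuwa;C;22;TOR;7.9\n"
    "2;Steven Adams;C;28;MEM;6.7\n"
)

# sep=';' (same as delimiter=';') splits fields; index_col=0 makes 'Rk' the index
demo = pd.read_csv(raw, sep=";", index_col=0)

print(demo.index.name)  # Rk
print(list(demo.columns))  # ['Player', 'Pos', 'Age', 'Tm', 'PTS']
```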


In [27]:
players.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 590 entries, 1 to 590
Data columns (total 29 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   Player  590 non-null    object 
 1   Pos     590 non-null    object 
 2   Age     590 non-null    int64  
 3   Tm      590 non-null    object 
 4   G       590 non-null    int64  
 5   GS      590 non-null    int64  
 6   MP      590 non-null    float64
 7   FG      590 non-null    float64
 8   FGA     590 non-null    float64
 9   FG%     590 non-null    float64
 10  3P      590 non-null    float64
 11  3PA     590 non-null    float64
 12  3P%     590 non-null    float64
 13  2P      590 non-null    float64
 14  2PA     590 non-null    float64
 15  2P%     590 non-null    float64
 16  eFG%    590 non-null    float64
 17  FT      590 non-null    float64
 18  FTA     590 non-null    float64
 19  FT%     590 non-null    float64
 20  ORB     590 non-null    float64
 21  DRB     590 non-null    float64
 22  TR

# Data Cleaning

In [28]:
players.columns

Index(['Player', 'Pos', 'Age', 'Tm', 'G', 'GS', 'MP', 'FG', 'FGA', 'FG%', '3P',
       '3PA', '3P%', '2P', '2PA', '2P%', 'eFG%', 'FT', 'FTA', 'FT%', 'ORB',
       'DRB', 'TRB', 'AST', 'STL', 'BLK', 'TOV', 'PF', 'PTS'],
      dtype='object')

In [29]:
players.isin(['?']).sum(axis=0)

Player    0
Pos       0
Age       0
Tm        0
G         0
GS        0
MP        0
FG        0
FGA       0
FG%       0
3P        0
3PA       0
3P%       0
2P        0
2PA       0
2P%       0
eFG%      0
FT        0
FTA       0
FT%       0
ORB       0
DRB       0
TRB       0
AST       0
STL       0
BLK       0
TOV       0
PF        0
PTS       0
dtype: int64

In [None]:
null_values = players.isnull().sum()
null_values

So there are no null values (and, from the previous check, no `'?'` placeholders) in our dataset.
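The two checks above look for different things. `isin(['?'])` finds placeholder strings, while `isnull()` finds real missing values. The minimal sketch below uses a hypothetical three-row frame and converts the placeholder to `NaN` first, so a single count covers both cases. Passing `na_values=['?']` to `pd.read_csv` would do the same conversion at load time.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one '?' placeholder and one genuine NaN in the same column
demo_df = pd.DataFrame({"Player": ["A", "B", "C"], "FT%": [0.75, "?", np.nan]})

# Convert the placeholder to NaN, then count missing values per column
missing = demo_df.replace("?", np.nan).isna().sum()
print(missing)
```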

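# Inserting a new column
Now let's create a new column, `EFF`, that summarizes each player's overall efficiency from the per-game statistics.
It is calculated as:

EFF = PTS + TRB + AST + STL + BLK - (FGA - FG) - (FTA - FT) - TOV - PF

Here FGA - FG is the number of missed field goals and FTA - FT is the number of missed free throws. Because every statistic is already a per-game average, no division by games played is needed. Unlike the commonly used NBA efficiency formula, which stops at turnovers, this version also subtracts personal fouls (PF), matching the code below.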
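As a worked example, the function below applies the formula to a hypothetical per-game stat line with invented numbers. It mirrors the code cell that follows, including the subtraction of personal fouls. The positive terms give 20 + 8 + 5 + 1 + 1 = 35. Subtracting 9 missed field goals, 1 missed free throw, 3 turnovers and 2 fouls leaves an EFF of 20.

```python
def efficiency(pts, trb, ast, stl, blk, fg, fga, ft, fta, tov, pf):
    """EFF as computed in this notebook, including the personal-foul penalty."""
    missed_fg = fga - fg  # missed field goals
    missed_ft = fta - ft  # missed free throws
    return pts + trb + ast + stl + blk - missed_fg - missed_ft - tov - pf


# Hypothetical per-game line: 20 PTS, 8 TRB, 5 AST, 1 STL, 1 BLK,
# 8/17 FG, 3/4 FT, 3 TOV, 2 PF
eff = efficiency(pts=20, trb=8, ast=5, stl=1, blk=1, fg=8, fga=17,
                 ft=3, fta=4, tov=3, pf=2)
print(eff)  # 20
```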

In [33]:
# Positive contributions minus missed FG, missed FT, turnovers and personal fouls
efficiency = (players['PTS'] + players['TRB'] + players['AST'] + players['STL'] + players['BLK']
              - (players['FGA'] - players['FG']) - (players['FTA'] - players['FT'])
              - players['TOV'] - players['PF'])
players.insert(loc=len(players.columns), column='EFF', value=efficiency)  # append as the last column
players.sample(10)

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | EFF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 453 | Trevelin Queen | SG | 24 | HOU | 8 | 0 | 8.6 | 1.6 | 3.6 | 0.448 | ... | 0.6 | 1.1 | 1.8 | 0.4 | 0.6 | 0.0 | 0.9 | 0.8 | 4.6 | 3.7 |
| 532 | Obi Toppin | PF | 23 | NYK | 41 | 2 | 15.9 | 2.9 | 5.5 | 0.524 | ... | 1.0 | 2.7 | 3.6 | 1.0 | 0.3 | 0.5 | 0.8 | 1.4 | 7.5 | 7.7 |
| 61 | James Bouknight | SG | 21 | CHO | 19 | 0 | 8.1 | 1.3 | 3.6 | 0.362 | ... | 0.7 | 0.7 | 1.4 | 0.8 | 0.1 | 0.0 | 0.3 | 0.6 | 3.6 | 2.6 |
| 263 | Josh Jackson | SF | 24 | DET | 36 | 3 | 18.6 | 2.8 | 6.8 | 0.415 | ... | 0.5 | 2.7 | 3.2 | 1.3 | 0.5 | 0.4 | 1.0 | 1.9 | 7.5 | 5.6 |
| 548 | Moritz Wagner | C | 24 | ORL | 37 | 0 | 12.0 | 2.5 | 5.1 | 0.484 | ... | 0.5 | 1.9 | 2.4 | 0.9 | 0.2 | 0.3 | 0.7 | 1.9 | 7.3 | 5.6 |
| 51 | Eric Bledsoe | PG | 32 | LAC | 47 | 29 | 25.9 | 3.7 | 8.8 | 0.422 | ... | 0.6 | 2.9 | 3.4 | 4.1 | 1.3 | 0.4 | 2.2 | 1.5 | 10.0 | 9.9 |
| 163 | Dorian Finney-Smith | PF | 28 | DAL | 45 | 45 | 32.1 | 3.8 | 8.6 | 0.446 | ... | 1.5 | 3.3 | 4.8 | 1.9 | 1.2 | 0.5 | 1.0 | 2.3 | 10.3 | 10.2 |
| 328 | Brook Lopez | C | 33 | MIL | 1 | 1 | 28.0 | 3.0 | 9.0 | 0.333 | ... | 2.0 | 3.0 | 5.0 | 0.0 | 1.0 | 3.0 | 0.0 | 3.0 | 8.0 | 8.0 |
| 136 | Spencer Dinwiddie | PG | 28 | WAS | 37 | 37 | 30.6 | 4.6 | 11.5 | 0.401 | ... | 0.7 | 3.8 | 4.5 | 5.8 | 0.6 | 0.1 | 1.6 | 2.4 | 13.7 | 13.2 |
| 285 | Derrick Jones Jr. | PF | 24 | CHI | 31 | 8 | 17.2 | 2.3 | 3.9 | 0.582 | ... | 1.2 | 2.2 | 3.4 | 0.5 | 0.5 | 0.7 | 0.5 | 2.3 | 6.3 | 6.8 |


# Grouping data into position categories (C, PG, SG, PF and SF)

* **C**: Center - plays close to the basket and scores most of their points off offensive rebounds or by "posting up" in the paint.
* **PG**: Point Guard - is in charge of running the offense, setting up plays, and controlling the tempo of the game.
* **SG**: Shooting Guard - usually the team's best shooter and an excellent free-throw shooter, who can drive to the basket as well as take long-distance shots.
* **PF**: Power Forward - requires speed, athleticism, and a good mid-range jump shot; prioritizes rebounding and defense and has to be a good passer.
* **SF**: Small Forward - usually the most well-rounded, versatile player on a team. Must be an excellent ball-handler, three-point shooter, and passer, with the strength to drive to the basket and score from down low.
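Splitting by position can also be done in one pass by iterating over `groupby`, which yields `(key, sub-DataFrame)` pairs. The minimal sketch below uses four rows copied from the position previews that follow, keeping only `Player`, `Pos` and `EFF`. It also checks that a group is the same selection as a boolean mask on `Pos`. That means `players[players['Pos'] == 'C']` would be an equivalent way to build `players_C`.

```python
import pandas as pd

# Four rows from the position previews (Player, Pos and EFF only)
roster = pd.DataFrame(
    {
        "Player": ["Precious Achiuwa", "Jose Alvarado", "Steven Adams", "Justin Anderson"],
        "Pos": ["C", "PG", "C", "SF"],
        "EFF": [9.4, 3.3, 13.8, 0.6],
    }
)

# Iterating over groupby yields (position, sub-DataFrame) pairs, sorted by key
by_position = {pos: group for pos, group in roster.groupby("Pos")}

print(sorted(by_position))  # ['C', 'PG', 'SF']
print(list(by_position["C"]["Player"]))  # ['Precious Achiuwa', 'Steven Adams']
# A group is the same selection as a boolean mask on the column
print(by_position["C"].equals(roster[roster["Pos"] == "C"]))  # True
```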

In [37]:
players_C = players.groupby('Pos').get_group('C')
players_C.head()

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | EFF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Precious Achiuwa | C | 22 | TOR | 34 | 20 | 25.3 | 3.4 | 8.1 | 0.412 | ... | 2.4 | 5.3 | 7.7 | 1.3 | 0.6 | 0.6 | 1.1 | 2.2 | 7.9 | 9.4 |
| 2 | Steven Adams | C | 28 | MEM | 44 | 43 | 25.7 | 2.6 | 4.9 | 0.521 | ... | 4.3 | 4.9 | 9.3 | 3.0 | 1.0 | 0.6 | 1.6 | 1.7 | 6.7 | 13.8 |
| 3 | Bam Adebayo | C | 24 | MIA | 21 | 21 | 32.9 | 6.9 | 13.3 | 0.518 | ... | 2.6 | 7.3 | 10.0 | 3.2 | 1.2 | 0.5 | 2.9 | 3.1 | 18.7 | 19.8 |
| 5 | LaMarcus Aldridge | C | 36 | BRK | 32 | 10 | 23.1 | 5.9 | 10.3 | 0.574 | ... | 1.5 | 4.1 | 5.7 | 0.9 | 0.4 | 1.1 | 0.8 | 1.7 | 13.8 | 14.8 |
| 8 | Jarrett Allen | C | 23 | CLE | 39 | 39 | 32.7 | 6.7 | 9.6 | 0.695 | ... | 3.3 | 7.6 | 10.9 | 1.8 | 0.8 | 1.4 | 1.9 | 1.8 | 16.3 | 23.3 |


In [38]:
players_PG = players.groupby('Pos').get_group('PG')
players_PG.head()

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | EFF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Jose Alvarado | PG | 23 | NOP | 18 | 0 | 9.2 | 1.1 | 3.1 | 0.364 | ... | 0.3 | 0.7 | 1.0 | 1.6 | 1.0 | 0.1 | 0.2 | 1.0 | 3.1 | 3.3 |
| 15 | Cole Anthony | PG | 21 | ORL | 33 | 33 | 33.2 | 6.2 | 15.8 | 0.396 | ... | 0.6 | 5.5 | 6.1 | 5.8 | 0.8 | 0.3 | 2.8 | 2.7 | 18.2 | 15.5 |
| 18 | D.J. Augustin | PG | 34 | HOU | 33 | 2 | 15.0 | 1.7 | 4.0 | 0.414 | ... | 0.2 | 1.0 | 1.2 | 2.2 | 0.3 | 0.0 | 1.3 | 0.5 | 5.5 | 4.9 |
| 24 | LaMelo Ball | PG | 20 | CHO | 39 | 39 | 31.8 | 6.9 | 16.5 | 0.418 | ... | 1.7 | 5.6 | 7.3 | 7.7 | 1.6 | 0.4 | 3.1 | 3.0 | 19.0 | 20.0 |
| 25 | Lonzo Ball | PG | 24 | CHI | 35 | 35 | 34.6 | 4.6 | 10.9 | 0.423 | ... | 1.0 | 4.4 | 5.4 | 5.1 | 1.8 | 0.9 | 2.3 | 2.4 | 13.0 | 15.0 |


In [39]:
players_SG = players.groupby('Pos').get_group('SG')
players_SG.head()

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | EFF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | Nickeil Alexander-Walker | SG | 23 | NOP | 42 | 18 | 26.5 | 4.7 | 12.8 | 0.37 | ... | 0.8 | 2.6 | 3.4 | 2.6 | 0.8 | 0.3 | 1.6 | 1.8 | 12.6 | 7.7 |
| 7 | Grayson Allen | SG | 26 | MIL | 41 | 40 | 28.1 | 4.1 | 9.6 | 0.423 | ... | 0.5 | 2.9 | 3.4 | 1.3 | 0.8 | 0.4 | 0.6 | 1.5 | 11.8 | 9.9 |
| 20 | Joel Ayayi | SG | 21 | WAS | 7 | 0 | 2.9 | 0.1 | 0.9 | 0.167 | ... | 0.1 | 0.3 | 0.4 | 0.6 | 0.0 | 0.0 | 0.0 | 0.0 | 0.3 | 0.5 |
| 33 | Will Barton | SG | 31 | DEN | 39 | 39 | 32.8 | 5.8 | 13.4 | 0.431 | ... | 0.6 | 4.2 | 4.8 | 4.2 | 0.8 | 0.5 | 1.8 | 1.4 | 15.3 | 14.4 |
| 40 | Bradley Beal | SG | 28 | WAS | 37 | 37 | 36.0 | 8.9 | 19.7 | 0.455 | ... | 1.0 | 3.7 | 4.7 | 6.5 | 0.9 | 0.4 | 3.4 | 2.4 | 23.7 | 18.8 |


In [40]:
players_PF = players.groupby('Pos').get_group('PF')
players_PF.head()

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | EFF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | Santi Aldama | PF | 21 | MEM | 25 | 0 | 10.5 | 1.4 | 3.8 | 0.372 | ... | 0.9 | 1.6 | 2.5 | 0.5 | 0.1 | 0.2 | 0.3 | 0.9 | 3.4 | 2.7 |
| 11 | Kyle Anderson | PF | 28 | MEM | 38 | 8 | 22.4 | 3.2 | 7.4 | 0.436 | ... | 0.9 | 4.4 | 5.3 | 2.6 | 1.1 | 0.6 | 1.1 | 1.5 | 8.2 | 10.4 |
| 12 | Giannis Antetokounmpo | PF | 27 | MIL | 39 | 39 | 32.7 | 9.9 | 18.7 | 0.531 | ... | 1.8 | 9.5 | 11.3 | 6.0 | 1.0 | 1.5 | 3.5 | 3.3 | 28.6 | 29.7 |
| 14 | Carmelo Anthony | PF | 37 | LAL | 43 | 3 | 26.9 | 4.6 | 10.6 | 0.434 | ... | 0.9 | 3.3 | 4.2 | 1.0 | 0.7 | 0.8 | 0.8 | 2.3 | 13.4 | 10.6 |
| 23 | Marvin Bagley III | PF | 22 | SAC | 26 | 13 | 21.9 | 3.8 | 8.2 | 0.46 | ... | 2.3 | 5.0 | 7.3 | 0.6 | 0.3 | 0.3 | 0.7 | 1.7 | 9.4 | 10.6 |


In [41]:
players_SF = players.groupby('Pos').get_group('SF')
players_SF.head()

| Rk | Player | Pos | Age | Tm | G | GS | MP | FG | FGA | FG% | ... | ORB | DRB | TRB | AST | STL | BLK | TOV | PF | PTS | EFF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Justin Anderson | SF | 28 | IND | 3 | 0 | 10.0 | 0.7 | 3.7 | 0.182 | ... | 0.7 | 0.7 | 1.3 | 1.0 | 0.3 | 0.3 | 0.3 | 1.7 | 2.7 | 0.6 |
| 13 | Thanasis Antetokounmpo | SF | 29 | MIL | 28 | 5 | 11.9 | 1.4 | 2.9 | 0.463 | ... | 1.1 | 1.7 | 2.8 | 0.7 | 0.5 | 0.3 | 0.6 | 1.9 | 3.3 | 3.2 |
| 16 | OG Anunoby | SF | 24 | TOR | 28 | 28 | 36.5 | 7.0 | 16.0 | 0.434 | ... | 1.5 | 3.8 | 5.3 | 2.3 | 1.6 | 0.4 | 1.8 | 2.9 | 18.9 | 14.1 |
| 17 | Trevor Ariza | SF | 36 | LAL | 12 | 5 | 19.2 | 1.3 | 3.3 | 0.375 | ... | 0.3 | 2.9 | 3.3 | 1.3 | 0.5 | 0.2 | 0.3 | 0.6 | 3.7 | 6.0 |
| 19 | Deni Avdija | SF | 21 | WAS | 46 | 6 | 22.5 | 2.6 | 6.0 | 0.437 | ... | 0.7 | 4.0 | 4.7 | 1.6 | 0.8 | 0.7 | 0.9 | 2.2 | 7.2 | 8.1 |
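With one frame per position, a simple starting point for the dream team is the highest-EFF player at each position. The sketch below is one possible approach, not necessarily the one this analysis goes on to use. It takes two rows per position from the previews above and uses `groupby(...).idxmax()` to find the row label of the best EFF in each group.

```python
import pandas as pd

# Ten rows copied from the position previews above (illustrative subset only)
candidates = pd.DataFrame(
    {
        "Player": ["Bam Adebayo", "Jarrett Allen", "LaMelo Ball", "Lonzo Ball",
                   "Bradley Beal", "Will Barton", "Giannis Antetokounmpo",
                   "Kyle Anderson", "OG Anunoby", "Deni Avdija"],
        "Pos": ["C", "C", "PG", "PG", "SG", "SG", "PF", "PF", "SF", "SF"],
        "EFF": [19.8, 23.3, 20.0, 15.0, 18.8, 14.4, 29.7, 10.4, 14.1, 8.1],
    }
)

# For each position, idxmax gives the row label of the highest EFF
best_rows = candidates.groupby("Pos")["EFF"].idxmax()

# Look those rows up to get one player per position (positions sorted: C, PF, PG, SF, SG)
dream_team = candidates.loc[best_rows.values, ["Pos", "Player", "EFF"]]
print(dream_team.to_string(index=False))
```

On the full `players` frame it may be worth filtering on games played first. In the random sample earlier, Brook Lopez appears with G = 1, and a single game can inflate or deflate per-game averages. Any minimum-games threshold would be a judgment call.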


# Plot of EFF by position (Pos x EFF)

In [43]:
plt.figure(figsize=(14, 6))
# x and y must be column names of `data`; order limits the axis to the five positions above
sns.swarmplot(x='Pos', y='EFF', data=players, order=['PG', 'SG', 'SF', 'PF', 'C'], size=3)
plt.show()
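A numeric summary is a useful companion to the swarm plot. The minimal sketch below uses the EFF values of the five C and five PG rows previewed above as stand-in data. It aggregates them per position with `groupby(...).agg(...)`; on the real data the same call would run on `players`. Comparing mean and median hints at skew. In this subset the PG mean (11.74) is well below its median (15.0) because of two low values (3.3 and 4.9).

```python
import pandas as pd

# EFF values of the five C and five PG rows previewed above (stand-in data)
subset = pd.DataFrame(
    {
        "Pos": ["C"] * 5 + ["PG"] * 5,
        "EFF": [9.4, 13.8, 19.8, 14.8, 23.3, 3.3, 15.5, 4.9, 20.0, 15.0],
    }
)

# count/mean/median/max of EFF for each position
summary = subset.groupby("Pos")["EFF"].agg(["count", "mean", "median", "max"])
print(summary)
```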