# Analysis of Basketball Data

For this project I will analyse data obtained from the [Basketball Reference](https://www.basketball-reference.com/) website, specifically the table of per-player statistics for the 2023/24 season, covering both the regular season and the playoffs.

Using a range of methods, I will explore how certain categories of basketball statistics relate to one another and how they are reflected in the game.

In [2]:
# load the packages needed for data processing
import pandas as pd
import matplotlib.pyplot as plt

# read the files that contain the player data
redni_del = pd.read_csv("redni_del_sezone.csv")
koncnica = pd.read_csv("koncnica.csv")

## Regular season

Let us start with the regular season. For now, our table looks like this:

In [3]:
redni_del

Unnamed: 0,Player,Age,Team,Pos,G,GS,MP,FG,FGA,FG%,...,ORB,DRB,TRB,AST,STL,BLK,TOV,PF,PTS,Awards
0,Joel Embiid,29.0,PHI,C,39.0,39.0,33.6,11.5,21.8,0.529,...,2.4,8.6,11.0,5.6,1.2,1.7,3.8,2.9,34.7,AS
1,Luka Dončić,24.0,DAL,PG,70.0,70.0,37.5,11.5,23.6,0.487,...,0.8,8.4,9.2,9.8,1.4,0.5,4.0,2.1,33.9,"MVP-3,CPOY-6,AS,NBA1"
2,Giannis Antetokounmpo,29.0,MIL,PF,73.0,73.0,35.2,11.5,18.8,0.611,...,2.7,8.8,11.5,6.5,1.2,1.1,3.4,2.9,30.4,"MVP-4,DPOY-9,CPOY-12,AS,NBA1"
3,Shai Gilgeous-Alexander,25.0,OKC,PG,75.0,75.0,34.0,10.6,19.8,0.535,...,0.9,4.7,5.5,6.2,2.0,0.9,2.2,2.5,30.1,"MVP-2,DPOY-7,CPOY-3,AS,NBA1"
4,Jalen Brunson,27.0,NYK,PG,77.0,77.0,35.4,10.3,21.4,0.479,...,0.6,3.1,3.6,6.7,0.9,0.2,2.4,1.9,28.7,"MVP-5,CPOY-5,AS,NBA2"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
731,Ron Harper Jr.,23.0,TOR,PF,1.0,0.0,4.0,0.0,0.0,,...,0.0,0.0,0.0,1.0,0.0,0.0,0.0,2.0,0.0,
732,Justin Jackson,28.0,MIN,SF,2.0,0.0,0.5,0.0,0.0,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
733,Dmytro Skapintsev,25.0,NYK,C,2.0,0.0,1.0,0.0,0.5,0.000,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,
734,Javonte Smart,24.0,PHI,PG,1.0,0.0,1.0,0.0,0.0,,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,


Because the table has many statistical categories, we will drop some of them to make it easier to read. We drop the columns for games started (`GS`), two-point field goals made (`2P`), two-point field goal attempts (`2PA`), two-point field goal percentage (`2P%`), effective field goal percentage (`eFG%`), free throws made (`FT`), free throw attempts (`FTA`), offensive rebounds (`ORB`), defensive rebounds (`DRB`) and awards (`Awards`).
Note: most of the statistics are recorded as per-game averages.

In [4]:
# drop the columns we do not need for the analysis
redni_del = redni_del.drop(columns=["GS", "2P", "2PA", "2P%", "eFG%", "FT", "FTA", "ORB", "DRB", "Awards"])
redni_del

Unnamed: 0,Player,Age,Team,Pos,G,MP,FG,FGA,FG%,3P,3PA,3P%,FT%,TRB,AST,STL,BLK,TOV,PF,PTS
0,Joel Embiid,29.0,PHI,C,39.0,33.6,11.5,21.8,0.529,1.4,3.6,0.388,0.883,11.0,5.6,1.2,1.7,3.8,2.9,34.7
1,Luka Dončić,24.0,DAL,PG,70.0,37.5,11.5,23.6,0.487,4.1,10.6,0.382,0.786,9.2,9.8,1.4,0.5,4.0,2.1,33.9
2,Giannis Antetokounmpo,29.0,MIL,PF,73.0,35.2,11.5,18.8,0.611,0.5,1.7,0.274,0.657,11.5,6.5,1.2,1.1,3.4,2.9,30.4
3,Shai Gilgeous-Alexander,25.0,OKC,PG,75.0,34.0,10.6,19.8,0.535,1.3,3.6,0.353,0.874,5.5,6.2,2.0,0.9,2.2,2.5,30.1
4,Jalen Brunson,27.0,NYK,PG,77.0,35.4,10.3,21.4,0.479,2.7,6.8,0.401,0.847,3.6,6.7,0.9,0.2,2.4,1.9,28.7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
731,Ron Harper Jr.,23.0,TOR,PF,1.0,4.0,0.0,0.0,,0.0,0.0,,,0.0,1.0,0.0,0.0,0.0,2.0,0.0
732,Justin Jackson,28.0,MIN,SF,2.0,0.5,0.0,0.0,,0.0,0.0,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0
733,Dmytro Skapintsev,25.0,NYK,C,2.0,1.0,0.0,0.5,0.000,0.0,0.0,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0
734,Javonte Smart,24.0,PHI,PG,1.0,1.0,0.0,0.0,,0.0,0.0,,,0.0,0.0,0.0,0.0,0.0,0.0,0.0


The table contains quite a few zero or missing values, which come from players who saw very little playing time in the league. We will therefore adjust the table and exclude some of these players.
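
One possible way to do this kind of filtering is sketched below. The thresholds `MIN_GAMES` and `MIN_MINUTES` and the helper `filter_regular_players` are hypothetical and chosen only for illustration; they are not taken from the analysis itself. The sample rows are copied from the table above so that the snippet runs without the CSV file.

```python
import pandas as pd

# Small stand-in for redni_del, built from a few rows of the table above
sample = pd.DataFrame({
    "Player": ["Joel Embiid", "Luka Dončić", "Ron Harper Jr.", "Justin Jackson"],
    "G": [39.0, 70.0, 1.0, 2.0],     # games played
    "MP": [33.6, 37.5, 4.0, 0.5],    # minutes per game
    "PTS": [34.7, 33.9, 0.0, 0.0],   # points per game
})

# Hypothetical cut-offs (assumptions, not values from the source)
MIN_GAMES = 10
MIN_MINUTES = 5.0


def filter_regular_players(df, min_games=MIN_GAMES, min_minutes=MIN_MINUTES):
    """Keep only players who reach both the games and the minutes threshold."""
    mask = (df["G"] >= min_games) & (df["MP"] >= min_minutes)
    # reset_index gives the filtered table a clean 0..n-1 index
    return df[mask].reset_index(drop=True)


filtered = filter_regular_players(sample)
print(filtered)
```

With the real data the same call would be `filter_regular_players(redni_del)`. Which cut-offs are sensible depends on the question being asked, so they are kept as parameters rather than fixed values.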
 