# GOAT Index: A Comparative Analysis of Michael Jordan, Kobe Bryant, and LeBron James

## Introduction

The debate over who is the greatest player in NBA history (GOAT, Greatest Of All Time) is one of the most recurring and contentious in sports.

This project builds a **quantitative index (GOAT Index)** that combines:
- offensive production
- defensive impact
- efficiency
- team success (championships)

The index is used to compare three of the leading GOAT candidates:
**Michael Jordan, Kobe Bryant, and LeBron James**.

The analysis draws on official NBA data and historical datasets, with the goal of offering a transparent, reproducible, and defensible comparison.


In [115]:
import pandas as pd
import numpy as np

from nba_api.stats.endpoints import playercareerstats


In [116]:
players = pd.read_csv("/Users/lautarocardinisilvestri/Downloads/Ironhack/Semana 4/Proyecto NBA/NBA_PLAYERS.csv")
teams = pd.read_csv("/Users/lautarocardinisilvestri/Downloads/Ironhack/Semana 4/Proyecto NBA/NBA_TEAMS.csv")
finals = pd.read_csv("/Users/lautarocardinisilvestri/Downloads/Ironhack/Semana 4/Proyecto NBA/NBA_Finals_and_MVP.csv")

In [117]:
GOATS_IDS = {
    "Michael Jordan": 893,
    "Kobe Bryant": 977,
    "LeBron James": 2544
}


In [118]:
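def get_player_career(player_name, player_id):
    """Return one player's regular-season totals, one row per season."""
    career = playercareerstats.PlayerCareerStats(player_id=player_id)
    # The first data frame of the endpoint holds regular-season totals by season
    df = career.get_data_frames()[0]
    # Tag every row with the player's name so the frames can be concatenated
    df["PLAYER_NAME"] = player_name
    return df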


In [119]:
dfs = []

for name, pid in GOATS_IDS.items():
    dfs.append(get_player_career(name, pid))

career_df = pd.concat(dfs, ignore_index=True)


ConnectionError: ('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))
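
The `ConnectionError` above is typically a network-level failure from the stats endpoint rather than a problem in the loop itself, and it leaves `career_df` undefined for the cells that follow. One common mitigation is to retry the request with a growing delay. The sketch below uses a hypothetical helper, `call_with_retries`, which is not part of `nba_api`; it retries on `OSError` because both Python's built-in connection errors and the `requests` exceptions derive from it. The retry count and delays are illustrative assumptions.

```python
import time


def call_with_retries(fn, *args, retries=3, base_delay=1.0,
                      retry_on=(OSError,), sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying when one of retry_on is raised.

    Hypothetical helper (not part of nba_api). Waits base_delay,
    2 * base_delay, 4 * base_delay, ... between attempts.
    """
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)


# Possible use inside the download loop (requires network access):
# dfs.append(call_with_retries(get_player_career, name, pid))
```

The `sleep` parameter is injectable only so the helper can be exercised without real waiting.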

In [None]:
career_df[["PLAYER_NAME", "SEASON_ID", "GP", "MIN", "PTS"]].head()


In [None]:
finals.columns


## Data Preparation

This section cleans and transforms the raw statistics
to obtain metrics that are comparable across players with different
minutes and game contexts.

Counting stats are normalized per minute and per 36 minutes,
a common standard in NBA analysis.
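
As a worked illustration of the per-36 normalization, the sketch below applies it to made-up season totals (not real player data). The helper name `per_36` is hypothetical, and treating zero-minute seasons as missing values is an assumption made here to avoid division by zero.

```python
import pandas as pd


def per_36(totals: pd.Series, minutes: pd.Series) -> pd.Series:
    """Scale season totals to a per-36-minute rate; NaN where minutes == 0."""
    safe_minutes = minutes.where(minutes > 0)  # 0 minutes -> NaN, avoids inf
    return totals / safe_minutes * 36


# Made-up totals for illustration only.
toy = pd.DataFrame({
    "PTS": [2400, 900, 0],
    "MIN": [3000, 1800, 0],
})
toy["PTS_PER_36"] = per_36(toy["PTS"], toy["MIN"])
print(toy)
```

A player with 2,400 points in 3,000 minutes scores 0.8 points per minute, which is 28.8 points per 36 minutes.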


In [None]:
cols_clean = [
    "PLAYER_NAME", "SEASON_ID", "GP", "MIN",
    "PTS", "REB", "AST", "STL", "BLK",
    "FG_PCT", "FG3_PCT", "FT_PCT"
]

GOAT_df = career_df[cols_clean].copy()
GOAT_df.head()


In [None]:
for stat in ["PTS", "REB", "AST", "STL", "BLK"]:
    GOAT_df[f"{stat}_PER_MIN"] = GOAT_df[stat] / GOAT_df["MIN"]

GOAT_df[
    ["PLAYER_NAME", "SEASON_ID", "MIN",
     "PTS_PER_MIN", "REB_PER_MIN", "AST_PER_MIN"]
].head()


In [None]:
for stat in ["PTS", "REB", "AST", "STL", "BLK"]:
    GOAT_df[f"{stat}_PER_36"] = GOAT_df[f"{stat}_PER_MIN"] * 36

GOAT_df[
    ["PLAYER_NAME", "SEASON_ID",
     "PTS_PER_36", "REB_PER_36", "AST_PER_36",
     "STL_PER_36", "BLK_PER_36"]
].head()


In [None]:
metrics_GOAT = GOAT_df[
    [
        "PLAYER_NAME", "SEASON_ID",
        "PTS_PER_36", "REB_PER_36", "AST_PER_36",
        "STL_PER_36", "BLK_PER_36",
        "FG_PCT", "FG3_PCT", "FT_PCT"
    ]
].copy()

metrics_GOAT.head()


In [None]:
player_teams = {
    "Michael Jordan": ["Chicago Bulls"],
    "Kobe Bryant": ["Los Angeles Lakers"],
    "LeBron James": ["Miami Heat", "Cleveland Cavaliers", "Los Angeles Lakers"]
}


In [None]:
championships_df = pd.DataFrame({
    "PLAYER_NAME": ["Michael Jordan", "Kobe Bryant", "LeBron James"],
    "CHAMPIONSHIPS": [6, 5, 4]
})


In [None]:
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()

goat_scaled = metrics_GOAT.copy()

cols_to_scale = [
    "PTS_PER_36",
    "REB_PER_36",
    "AST_PER_36",
    "STL_PER_36",
    "BLK_PER_36",
    "FG_PCT",
    "FG3_PCT",
    "FT_PCT"
]

goat_scaled[cols_to_scale] = scaler.fit_transform(
    goat_scaled[cols_to_scale]
)
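
Min-max scaling maps each column to the range 0 to 1 using (x - min) / (max - min). Because the scaler is fit on the season rows of all three players pooled together, a value of 1.0 means "highest single season in this sample", not an absolute standard. The sketch below reproduces the formula with plain pandas on invented numbers; `min_max` is a hypothetical helper, not a scikit-learn function.

```python
import pandas as pd


def min_max(col: pd.Series) -> pd.Series:
    """Rescale a column to [0, 1]; a constant column maps to 0.0."""
    span = col.max() - col.min()
    if span == 0:
        return col * 0.0
    return (col - col.min()) / span


# Invented values for illustration only.
toy = pd.DataFrame({"PTS_PER_36": [20.0, 25.0, 30.0],
                    "FG_PCT": [0.45, 0.50, 0.48]})
scaled = toy.apply(min_max)  # applied column by column
print(scaled)
```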


In [None]:
weights = {
    "PTS_PER_36": 0.25,
    "REB_PER_36": 0.15,
    "AST_PER_36": 0.15,
    "STL_PER_36": 0.05,
    "BLK_PER_36": 0.05,
    "FG_PCT": 0.15,
    "FG3_PCT": 0.10,
    "FT_PCT": 0.10
}

goat_scaled["GOAT_INDEX"] = sum(
    goat_scaled[m] * w for m, w in weights.items()
)
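
The weighted sum only stays on the 0 to 1 scale if the weights add up to 1, which the values above do. A minimal sketch of that check, together with the index for a single row of already-scaled metrics, could look like this. The row values are invented for illustration.

```python
import math

weights = {
    "PTS_PER_36": 0.25, "REB_PER_36": 0.15, "AST_PER_36": 0.15,
    "STL_PER_36": 0.05, "BLK_PER_36": 0.05,
    "FG_PCT": 0.15, "FG3_PCT": 0.10, "FT_PCT": 0.10,
}

# Guard against accidental edits that would break the 0-1 scale of the index.
assert math.isclose(sum(weights.values()), 1.0)

# Invented, already min-max-scaled values for one season.
row = {"PTS_PER_36": 1.0, "REB_PER_36": 0.4, "AST_PER_36": 0.6,
       "STL_PER_36": 0.5, "BLK_PER_36": 0.2,
       "FG_PCT": 0.8, "FG3_PCT": 0.3, "FT_PCT": 0.9}

goat_index = sum(row[m] * w for m, w in weights.items())
print(round(goat_index, 3))
```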


In [None]:
goat_final = goat_scaled[["PLAYER_NAME", "GOAT_INDEX"]].copy()


In [None]:
championships_df["CHAMPIONSHIPS_NORM"] = (
    championships_df["CHAMPIONSHIPS"] /
    championships_df["CHAMPIONSHIPS"].max()
)


In [None]:
goat_final = goat_final.merge(
    championships_df[["PLAYER_NAME", "CHAMPIONSHIPS_NORM"]],
    on="PLAYER_NAME",
    how="left"
)
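
This merge attaches one championship value per player to every season row, so it is a many-to-one join. As an optional safeguard, pandas can verify that relationship through the `validate` argument of `merge`. The sketch below uses invented index values and a two-player subset purely for illustration.

```python
import pandas as pd

seasons = pd.DataFrame({
    "PLAYER_NAME": ["Michael Jordan", "Michael Jordan", "Kobe Bryant"],
    "GOAT_INDEX": [0.70, 0.65, 0.60],  # invented values
})
titles = pd.DataFrame({
    "PLAYER_NAME": ["Michael Jordan", "Kobe Bryant"],
    "CHAMPIONSHIPS_NORM": [1.0, 5 / 6],
})

merged = seasons.merge(
    titles,
    on="PLAYER_NAME",
    how="left",
    validate="many_to_one",  # raises MergeError if titles repeats a player
)
print(merged)
```

With `how="left"`, the result keeps exactly one row per season row, which is what the later per-player averaging relies on.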


In [None]:
goat_final


In [None]:
goat_final["GOAT_INDEX_FINAL"] = (
    goat_final["GOAT_INDEX"] * 0.9 +
    goat_final["CHAMPIONSHIPS_NORM"] * 0.1
)
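
Written out, the two-stage index computed in these cells is as follows, where $\tilde{x}_m$ is the min-max-scaled value of metric $m$ for a season, $w_m$ is its weight, and $C_p$ is the championship count of player $p$. The symbols are notation introduced here for illustration, not taken from the notebook.

```latex
\begin{align}
\mathrm{GOAT\_INDEX} &= \sum_{m} w_m \, \tilde{x}_m, \qquad \sum_{m} w_m = 1 \\
\mathrm{GOAT\_INDEX\_FINAL} &= 0.9 \cdot \mathrm{GOAT\_INDEX} + 0.1 \cdot \frac{C_p}{\max_{q} C_q}
\end{align}
```

With the championship counts used here (6, 5, 4), the normalized term is 1.0, about 0.833, and about 0.667 respectively, so championships contribute at most 0.1 to the final score.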


In [None]:
goat_final = goat_final.sort_values(
    "GOAT_INDEX_FINAL",
    ascending=False
).reset_index(drop=True)


In [None]:
goat_final[[
    "PLAYER_NAME",
    "GOAT_INDEX_FINAL",
    "GOAT_INDEX",
    "CHAMPIONSHIPS_NORM"
]]


In [None]:
ranking_final = goat_final[[
    "PLAYER_NAME",
    "GOAT_INDEX_FINAL",
    "GOAT_INDEX",
    "CHAMPIONSHIPS_NORM"
]].copy()

ranking_final


In [None]:
ranking_final = ranking_final.round(3)
ranking_final


In [None]:
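# Collapse the season-level rows into one career score per player
# (unweighted mean across seasons)
goat_career_final = (
    goat_final
    .groupby("PLAYER_NAME", as_index=False)
    .mean(numeric_only=True)
)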


In [None]:
goat_career_final = goat_career_final.sort_values(
    "GOAT_INDEX_FINAL",
    ascending=False
)


In [None]:
ranking_final = goat_career_final[[
    "PLAYER_NAME",
    "GOAT_INDEX_FINAL",
    "GOAT_INDEX",
    "CHAMPIONSHIPS_NORM"
]].round(3)

ranking_final


In [None]:
ranking_final = ranking_final.reset_index(drop=True)
ranking_final.index += 1
ranking_final.index.name = "RANK"
ranking_final

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(8, 4))

# Plot the career-level ranking built in the previous cells
plt.barh(
    ranking_final["PLAYER_NAME"],
    ranking_final["GOAT_INDEX_FINAL"]
)

plt.xlabel("Final GOAT Index")
plt.title("GOAT Ranking: Final Index")

# Invert the y-axis so that #1 appears at the top
plt.gca().invert_yaxis()

plt.tight_layout()
plt.show()