# Compare my Stats

This project uses NBA data sourced from [nba_api](https://github.com/swar/nba_api). It showcases how to read in the data, compare it, and plot visualizations.

We will achieve two objectives in this notebook:

* finding player stats, specifically Pascal Siakam's
* comparing Pascal Siakam's stats with the league's and with our own

We will start by installing `nba_api` and importing the libraries we need.

In [None]:
!pip install nba_api

In [None]:
from nba_api.stats.endpoints import playercareerstats, leagueleaders
from nba_api.stats.static import players
import matplotlib.pyplot as plt
import plotly.express as px
import pandas as pd
print("Libraries imported")

## Obtaining Data

Using the `nba_api` information, let's begin by first obtaining information about the *players*.

In Python, a list is an ordered collection of items. You can think of it as a container that holds multiple values. The items in a list can be of any data type, such as numbers, strings, or even other lists. Lists are defined using square brackets `[]`, and the items inside the list are separated by commas.

A dictionary in Python, on the other hand, is a collection of key-value pairs (since Python 3.7 it preserves insertion order). It lets you store and retrieve values by key. Dictionaries are defined using curly braces `{}`; each key is separated from its value by a colon `:`, and the pairs are separated by commas. The keys in a dictionary must be unique.
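As a quick illustration of both containers, here is a minimal sketch. The scores and player entries are made up for the example and do not come from `nba_api`.

```python
# A list keeps its items in order; positions start at 0.
recent_points = [10, 8, 22]          # hypothetical game scores
first_game = recent_points[0]        # index 0 is the first item

# A dictionary maps unique keys to values.
player = {"full_name": "Example Player", "id": 1}   # made-up entry
player_name = player["full_name"]    # look a value up by its key

# A list of dictionaries combines both ideas: index first, then key.
roster = [player, {"full_name": "Another Player", "id": 2}]
second_id = roster[1]["id"]

print(first_game, player_name, second_id)
```

The last lookup, `roster[1]["id"]`, is the same pattern we will use on the real player list.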

In our list, `players_information`, each entry is a *dictionary*.

In [None]:
players_information = players.get_players()
players_information

Now, if you have a list where each entry is a dictionary, you can access information by combining the indexing and key access techniques. For example, to retrieve the `full_name` and `id` of the **first** entry in our list, we'd do the following:

In [None]:
first_player = players_information[0]
full_name = first_player['full_name']
id_player = first_player['id']
print(f'The full name of this player is: {full_name}')
print(f"The ID of this player is: {id_player}")

Using what we learned above, we can obtain **Pascal Siakam's** ID by iterating over the entire list and checking whether the `full_name` key of each entry matches **Pascal Siakam**. When a match is found, the loop stores that player's ID and full name in the variables *pascal_id* and *pascal_name* and stops. Finally, the cell prints the player's full name and ID.

In [None]:
# Start with no match so the variables exist even if the name is not found
pascal_id, pascal_name = None, None
# Iterate over each dictionary (player information) in the list
for player in players_information:
    # Check if the 'full_name' key in the current player's dictionary is "Pascal Siakam"
    if player['full_name'] == "Pascal Siakam":
        # If the condition is true, assign the 'id' and 'full_name' values to variables
        pascal_id, pascal_name = (player['id'], player['full_name'])
        break  # Stop at the first match
print(f"{pascal_name}'s id is {pascal_id}")
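The same lookup can be wrapped in a small reusable helper. The sketch below is only an illustration: `find_player_id` is a hypothetical name, and `sample_players` uses placeholder IDs rather than real `nba_api` data.

```python
def find_player_id(player_list, full_name):
    """Return the id of the first entry whose 'full_name' matches, or None."""
    return next(
        (p["id"] for p in player_list if p["full_name"] == full_name),
        None,  # default when no entry matches
    )

# Placeholder entries shaped like the dictionaries in players_information
sample_players = [
    {"id": 1, "full_name": "Pascal Siakam", "first_name": "Pascal"},
    {"id": 2, "full_name": "Example Player", "first_name": "Example"},
]

found_id = find_player_id(sample_players, "Pascal Siakam")
missing_id = find_player_id(sample_players, "Nobody Here")
print(found_id, missing_id)
```

Returning `None` for a missing name makes it easy to check the result before passing it to another call.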

Now that we have Pascal Siakam's ID, let's find his historic stats in the NBA and compare them with those of other players in the league.

## Comparing Stats

When obtaining information on player stats, we receive it as a **dataframe**.

In simple terms, a dataframe is a two-dimensional table-like structure that stores data in rows and columns. It is a way to organize and represent data in a structured format, similar to a spreadsheet or a database table. Think of a dataframe as a collection of related information arranged in a tabular form. Each row in the dataframe represents a record or observation, and each column represents a specific attribute or characteristic of the data.
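To make the idea concrete, here is a minimal sketch that builds a tiny dataframe from a list of dictionaries. The numbers are invented for illustration and are not Pascal Siakam's real stats.

```python
import pandas as pd

# Each dictionary becomes one row; each key becomes a column.
rows = [
    {"SEASON_ID": "2020-21", "GP": 50, "PTS": 500},   # made-up values
    {"SEASON_ID": "2021-22", "GP": 80, "PTS": 1200},  # made-up values
]
example_df = pd.DataFrame(rows)

print(example_df.shape)    # (number of rows, number of columns)
print(example_df["PTS"])   # select a single column by name

# Select one cell by row label and column name
second_season_gp = example_df.loc[1, "GP"]
```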

In [None]:
pascal_career = playercareerstats.PlayerCareerStats(player_id=pascal_id)  # Create an instance of PlayerCareerStats for Pascal using his player ID
pascal_data = pascal_career.get_data_frames()[0]  # Retrieve the data frames (tables) for Pascal's career stats and select the first table
pascal_data  # Display the data frame containing Pascal's career stats

In the dataframe above, the stats we've obtained are totals, not averages. Later on in this notebook, we'll convert these totals into per-game averages.

In [None]:
# Pull the regular-season league leaders (per-game stats), ranked by the PTS column
season_leaders = leagueleaders.LeagueLeaders(
    per_mode48='PerGame',
    season_type_all_star='Regular Season',
    stat_category_abbreviation='PTS'
).get_data_frames()[0]

display(season_leaders)

In [None]:
season_leaders.columns

Based on the dataframe from above, each row represents a player's statistical performance. Here's a breakdown of the columns in the dataframe:

- `PLAYER_ID`: Unique identifier for each player.
- `RANK`: Ranking of the player in the requested stat category (`PTS` here).
- `PLAYER`: Player's name.
- `TEAM_ID`: Unique identifier for the team the player is associated with.
- `TEAM`: Team name.
- `GP`: Number of games played.
- `MIN`: Average minutes played per game.
- `FGM`: Field goals made per game.
- `FGA`: Field goals attempted per game.
- `FG_PCT`: Field goal percentage.
- `FG3M`: Three-point field goals made per game.
- `FG3A`: Three-point field goals attempted per game.
- `FG3_PCT`: Three-point field goal percentage.
- `FTM`: Free throws made per game.
- `FTA`: Free throws attempted per game.
- `FT_PCT`: Free throw percentage.
- `OREB`: Offensive rebounds per game.
- `DREB`: Defensive rebounds per game.
- `REB`: Total rebounds per game.
- `AST`: Assists per game.
- `STL`: Steals per game.
- `BLK`: Blocks per game.
- `TOV`: Turnovers per game.
- `PTS`: Points per game.
- `EFF`: Efficiency rating.

In [None]:
# Pascal's overall rank, along with PTS, REB, etc.
season_leaders[season_leaders['PLAYER'] == "Pascal Siakam"]

Based on our understanding of the columns above, Pascal Siakam currently appears to be one of the best players in the NBA! He ranks 19th out of 244 players, averaging 24.2 points and 7.8 rebounds per game.

## Plotting Visualizations

Now that we have Pascal Siakam's stats, let's visualize them and compare our own stats with his.

To begin, let's first change Pascal Siakam's *total stats* into *averages*.

In [None]:
# Obtain Pascal Siakam's average stats for each season
# Define a list of stats to calculate the average for
things_to_add = ['REB', 'AST', 'STL', 'BLK', 'PTS']

# Iterate over each stat in the list
for thing in things_to_add:
    # Create a new column in Pascal's DataFrame
    # This new column represents the average of the current stat per game
    # The calculation is done by dividing the current stat by the 'GP' (games played) column
    # The result is rounded to one decimal place using the 'round' function
    pascal_data[thing+'/GP'] = round(pascal_data[thing] / pascal_data['GP'], 1)
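The totals-to-averages step can be checked on a small table where the answers are easy to verify by hand. This is a sketch with invented totals; `totals` is a hypothetical stand-in for `pascal_data`.

```python
import pandas as pd

# Made-up season totals, shaped like the career stats table
totals = pd.DataFrame({
    "SEASON_ID": ["2020-21", "2021-22"],
    "GP": [50, 80],
    "PTS": [500, 1200],
    "REB": [250, 600],
})

# Divide each total by games played and round to one decimal place
for stat in ["PTS", "REB"]:
    totals[stat + "/GP"] = (totals[stat] / totals["GP"]).round(1)

print(totals[["SEASON_ID", "PTS/GP", "REB/GP"]])
```

Here 500 points over 50 games gives 10.0 points per game, and 600 rebounds over 80 games gives 7.5 rebounds per game.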

Now let's compare our own stats with Pascal Siakam's.

We'll also be utilizing a **function** in our cell below. 

In programming, a function is a block of reusable code that performs a specific task. It is designed to take input (arguments) and produce output (return value) based on the given input. Functions help to organize code and make it more modular, as they can be called from different parts of a program whenever their functionality is needed.

In the code below, the `find_average` function is defined. It takes a single argument called `stat`, which is expected to be a non-empty list of numbers, and returns their average, rounded to one decimal place and wrapped in a one-element list. Returning a list lets `pd.DataFrame` build a one-row table from the dictionary of averages further down.

In [None]:
# Obtain average of each stat
def find_average(stat):
    return [round(sum(stat) / len(stat), 1)]

# Input personal stats
# Define personal statistics for points, rebounds, assists, blocks, and steals
my_points = [10, 8, 22, 15, 9]
my_rebounds = [2, 4, 3, 10, 8]
my_assists = [2, 3, 7, 5, 4, 9]
my_blocks = [0, 1, 1, 0, 2, 1]
my_steals = [0, 0, 1, 2, 1, 1]

# Calculate the average of each personal stat by calling the 'find_average' function
avg_points = find_average(my_points)
avg_rebounds = find_average(my_rebounds)
avg_assists = find_average(my_assists)
avg_steals = find_average(my_steals)
avg_blocks = find_average(my_blocks)

# Create a dictionary to store personal data
my_own_data = {
    "PLAYER_ID": 'Eric Lee',
    "SEASON_ID": "2022-23",
    "PTS/GP": avg_points,
    "REB/GP": avg_rebounds,
    "AST/GP": avg_assists,
    "STL/GP": avg_steals,
    "BLK/GP": avg_blocks
}

# Create a DataFrame using the 'my_own_data' dictionary
df = pd.DataFrame(data=my_own_data)

# Concatenate the 'pascal_data' DataFrame and the personal DataFrame ('df') to combine the information
total_info = pd.concat([pascal_data, df])
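Note that `find_average` divides by `len(stat)`, so an empty list raises a `ZeroDivisionError`. One possible guard is sketched below; `safe_average` is a hypothetical helper, not part of the notebook, and it returns a plain number rather than a one-element list.

```python
def safe_average(values):
    """Return the mean of values rounded to one decimal, or None if empty."""
    if not values:
        return None  # nothing to average
    return round(sum(values) / len(values), 1)

points_avg = safe_average([10, 8, 22, 15, 9])  # 64 points over 5 games
empty_avg = safe_average([])                   # no games recorded
print(points_avg, empty_avg)
```

To use it with the dictionary above, the result would need to be wrapped in a list, for example `[safe_average(my_points)]`.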


Using the information we've obtained above, we can now plot our dataframe and compare our stats with Pascal Siakam's.

In [None]:
# Create a scatter plot for points per game (PTS/GP) using the total_info dataframe to compare our stats to Pascal Siakam's
pts_fig = px.scatter(total_info, y='PTS/GP', x='SEASON_ID', color='PLAYER_ID', size='PTS/GP', title='Comparing my Stats to Pascal Siakam', labels={'PTS/GP': 'Points per Game', 'SEASON_ID': 'Season'})

# Create a scatter plot for assists per game (AST/GP)
assist_fig = px.scatter(total_info, y='AST/GP', x='SEASON_ID', color='PLAYER_ID', size='AST/GP', title='Comparing my Stats to Pascal Siakam', labels={'AST/GP': 'Assists per Game', 'SEASON_ID': 'Season'})

# Create a scatter plot for rebounds per game (REB/GP)
rebounds_fig = px.scatter(total_info, y='REB/GP', x='SEASON_ID', color='PLAYER_ID', size='REB/GP', title='Comparing my Stats to Pascal Siakam', labels={'REB/GP': 'Rebounds per Game', 'SEASON_ID': 'Season'})

# Create a scatter plot for blocks per game (BLK/GP)
blocks_fig = px.scatter(total_info, y='BLK/GP', x='SEASON_ID', color='PLAYER_ID', size='BLK/GP', title='Comparing my Stats to Pascal Siakam', labels={'BLK/GP': 'Blocks per Game', 'SEASON_ID': 'Season'})

# Create a scatter plot for steals per game (STL/GP)
steals_fig = px.scatter(total_info, y='STL/GP', x='SEASON_ID', color='PLAYER_ID', size='STL/GP', title='Comparing my Stats to Pascal Siakam', labels={'STL/GP': 'Steals per Game', 'SEASON_ID': 'Season'})

# Display all the created scatter plots (pts_fig, assist_fig, rebounds_fig, blocks_fig, steals_fig)
display(pts_fig, assist_fig, rebounds_fig, blocks_fig, steals_fig)
