<a href="https://colab.research.google.com/github/bCBowers/sql_nba/blob/main/SQL_Workshop_Notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### SQL Workshop

***Environment and Data Setup***

Install the required libraries, ingest the data, and set up the SQL database.

In [None]:
%pip install nba_api

In [None]:
from nba_api.stats.endpoints import leagueleaders
import pandas as pd
from sqlalchemy import create_engine, text

# Pull data for the top 500 scorers
top_500_23_24 = leagueleaders.LeagueLeaders(
    season='2023-24',
    season_type_all_star='Regular Season',
    stat_category_abbreviation='PTS'
).get_data_frames()[0][:500]

engine = create_engine('sqlite://', echo=False)
conn = engine.connect()

top_500_23_24.to_sql(name='league_leaders_23_24', con=engine)

The same data can be viewed through both Pandas and SQL. The output looks different because the notebook is not a native SQL environment: SQL results print as raw row tuples rather than as a formatted table.
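One way to make SQL output easier to compare with the DataFrame view is to print the column names alongside each row. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module and a small made-up table (`demo_leaders`, with invented players and values) instead of the workshop's SQLAlchemy connection, so it runs on its own. The helper `rows_as_dicts` is a hypothetical name, not part of any library.

```python
import sqlite3

# Hypothetical stand-in table; names and numbers are invented for illustration.
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute("CREATE TABLE demo_leaders (player TEXT, team TEXT, pts INTEGER)")
demo_conn.executemany(
    "INSERT INTO demo_leaders VALUES (?, ?, ?)",
    [("Player A", "AAA", 1500), ("Player B", "BBB", 1200)],
)

def rows_as_dicts(connection, query):
    """Run a query and pair each value with its column name."""
    cursor = connection.execute(query)
    # cursor.description holds one tuple per column; the name is the first item.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]

labeled_rows = rows_as_dicts(
    demo_conn, "SELECT * FROM demo_leaders ORDER BY pts DESC LIMIT 10"
)
for row in labeled_rows:
    print(row)
demo_conn.close()
```

In the notebook itself, `pd.read_sql` with the existing connection is another option for getting a labeled, table-like view of a query result.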

In [None]:
top_500_23_24.head(10)

In [None]:
cursor = conn.execute(text("select * from league_leaders_23_24 limit 10"))
for row in cursor:
  print(row)

In [None]:
cursor = conn.execute(text("""
                            SELECT *
                            FROM league_leaders_23_24
                            limit 10
                            """))
for row in cursor:
  print(row)

In [None]:
cursor = conn.execute(text("""
                            SELECT player_id, rank, player, team, pts
                            FROM league_leaders_23_24
                            limit 10
                            """))
for row in cursor:
  print(row)

In [None]:
cursor = conn.execute(text("""
                          SELECT player_id, rank, player, team, pts
                          FROM league_leaders_23_24
                          WHERE pts <= 1000 and pts > 500
                            """))
for row in cursor:
  print(row)

In [None]:
cursor = conn.execute(text("""
                          SELECT team_id, team, sum(pts) Team_PTS,
                                count(player_id) player_count, avg(pts) avg_points_per_player
                          FROM league_leaders_23_24
                          GROUP BY team_id, team
                            """))
for row in cursor:
  print(row)

In [None]:
cursor = conn.execute(text("""
                          with team_max as (
                          SELECT team_id, team, max(pts) team_top_scorer,
                              round(avg(pts),2) team_avg_scorer
                          FROM league_leaders_23_24
                          GROUP BY team_id, team
                          --HAVING max(pts) > 1250
                          )

                          SELECT A.team_id, A.team, A.player_id, A.player, A.pts, A.rank,
                                B.team_top_scorer, B.team_avg_scorer
                          FROM league_leaders_23_24 as A
                          LEFT JOIN team_max as B on A.team_id = B.team_id
                          ORDER BY A.rank

                            """))
for row in cursor:
  print(row)

In [None]:
cursor = conn.execute(text("""
                        SELECT A.team_id, A.team, A.player_id, A.player, A.pts, A.rank,
                        (SELECT max(B.pts) FROM league_leaders_23_24 as B WHERE A.team_id = B.team_id)
                                    as team_top_scorer,
                        (SELECT round(avg(B.pts),2) FROM league_leaders_23_24 as B
                                    WHERE A.team_id = B.team_id) as team_avg_scorer
                      FROM league_leaders_23_24 as A
                      ORDER BY A.rank
                            """))
for row in cursor:
  print(row)

**Audience Questions**

1) Write a query that returns the Player Name, Team Tricode, and Rebounds_per_game for the 77th-ranked player in rebounds per game.

2) Write a query that explicitly outputs the number of players averaging 20 or more points per game. How many of those 20+ point scorers averaged a double-double?

**Challenge Question**

Who is going to score the most points during the 2023-24 season, how many points will he score, and how many points will he average per game?

Notes:
- The starter code below treats the 2019-20 and 2020-21 seasons as 72 games each. All other seasons feature 82 games.
- Teams this season have played between 49 and 52 games.
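The notes above suggest one simple baseline: scale a player's current scoring pace up to a full season. The sketch below shows that arithmetic only. The function name `project_season` and the input numbers are invented for illustration, and it assumes the player keeps the same pace and plays every remaining game, which a better answer would probably adjust for.

```python
def project_season(points_so_far, games_played, season_length=82):
    """Project a full-season total from the current scoring pace."""
    if games_played <= 0:
        raise ValueError("games_played must be positive")
    points_per_game = points_so_far / games_played
    projected_total = points_per_game * season_length
    return round(points_per_game, 1), round(projected_total)

# Invented numbers for illustration only, not real player stats.
ppg, projected = project_season(points_so_far=1500, games_played=50)
print(ppg, projected)
```

The same calculation can be written directly in SQL once each row carries both the games played so far and the full season length.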

Load additional years' worth of data to enrich our analysis.

In [None]:
top_500_22_23 = leagueleaders.LeagueLeaders(
    season='2022-23',
    season_type_all_star='Regular Season',
    stat_category_abbreviation='PTS'
).get_data_frames()[0][:500]

top_500_22_23.to_sql(name='league_leaders_22_23', con=engine)

top_500_21_22 = leagueleaders.LeagueLeaders(
    season='2021-22',
    season_type_all_star='Regular Season',
    stat_category_abbreviation='PTS'
).get_data_frames()[0][:500]

top_500_21_22.to_sql(name='league_leaders_21_22', con=engine)

top_500_20_21 = leagueleaders.LeagueLeaders(
    season='2020-21',
    season_type_all_star='Regular Season',
    stat_category_abbreviation='PTS'
).get_data_frames()[0][:500]

top_500_20_21.to_sql(name='league_leaders_20_21', con=engine)

top_500_19_20 = leagueleaders.LeagueLeaders(
    season='2019-20',
    season_type_all_star='Regular Season',
    stat_category_abbreviation='PTS'
).get_data_frames()[0][:500]

top_500_19_20.to_sql(name='league_leaders_19_20', con=engine)

We can query both tables together with the UNION operator, which stacks the rows of one query on top of the rows of another.
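A detail worth knowing when combining tables: `UNION` removes duplicate rows, while `UNION ALL` keeps every row. The sketch below illustrates the difference with Python's built-in `sqlite3` module and two tiny made-up tables (`season_a` and `season_b` are hypothetical, not the workshop tables). It also shows that adding a literal season label to each side makes otherwise identical rows distinct.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
# Invented data: 'Player A' has an identical row in both tables.
demo.executescript("""
    CREATE TABLE season_a (player TEXT, pts INTEGER);
    CREATE TABLE season_b (player TEXT, pts INTEGER);
    INSERT INTO season_a VALUES ('Player A', 1000), ('Player B', 900);
    INSERT INTO season_b VALUES ('Player A', 1000), ('Player C', 800);
""")

# UNION drops the duplicated ('Player A', 1000) row.
union_rows = demo.execute(
    "SELECT player, pts FROM season_a UNION SELECT player, pts FROM season_b"
).fetchall()

# UNION ALL keeps all rows from both sides.
union_all_rows = demo.execute(
    "SELECT player, pts FROM season_a UNION ALL SELECT player, pts FROM season_b"
).fetchall()

# With a season label, the two 'Player A' rows are no longer identical.
tagged_rows = demo.execute("""
    SELECT player, pts, 'A' AS season FROM season_a
    UNION
    SELECT player, pts, 'B' AS season FROM season_b
""").fetchall()

print(len(union_rows), len(union_all_rows), len(tagged_rows))
demo.close()
```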

In [None]:
cursor = conn.execute(text("""
  with combined as
    (select *, '2023-24' as season from league_leaders_23_24
      union
    select *, '2022-23' as season from league_leaders_22_23
    )
    select * from combined
    limit 10
"""))
for row in cursor:
  print(row)

In [None]:
# Code to start with
cursor = conn.execute(text(
    """with combined as
    (
      select *, '2023-24' as season, 51 as total_games from league_leaders_23_24
      union
      select *, '2022-23' as season, 82 as total_games from league_leaders_22_23
      union
      select *, '2021-22' as season, 82 as total_games from league_leaders_21_22
      union
      select *, '2020-21' as season, 72 as total_games from league_leaders_20_21
      union
      select *, '2019-20' as season, 72 as total_games from league_leaders_19_20
    ),

    -- calculate how many games each team has played this season and join back into the main CTE
    max_games as
    (
      select
        team,
        max(gp) as games_played
      from
        league_leaders_23_24
      group by team
    ),
    new_combined as
    (
      select
        combined.*,
        coalesce(max_games.games_played, combined.total_games) as total_gp
      from
        combined left join
        max_games on combined.team=max_games.team and season='2023-24'
    )
    select * from new_combined
    limit 10
    """
))
for row in cursor:
  print(row)
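The starter query combines a `LEFT JOIN` that only matches current-season rows with `COALESCE` as a fallback. The sketch below isolates that pattern on toy data using Python's built-in `sqlite3` module. The table contents are invented, and `combined` and `max_games` here are plain tables standing in for the CTEs of the same name.

```python
import sqlite3

demo = sqlite3.connect(":memory:")
# Invented rows: one player in two seasons, and one team's games played so far.
demo.executescript("""
    CREATE TABLE combined (player TEXT, team TEXT, season TEXT, total_games INTEGER);
    CREATE TABLE max_games (team TEXT, games_played INTEGER);
    INSERT INTO combined VALUES
        ('Player A', 'AAA', '2023-24', 51),
        ('Player A', 'AAA', '2022-23', 82);
    INSERT INTO max_games VALUES ('AAA', 49);
""")

# The join condition matches only 2023-24 rows. For other seasons
# m.games_played is NULL, so COALESCE falls back to c.total_games.
rows = demo.execute("""
    SELECT c.player, c.season,
           COALESCE(m.games_played, c.total_games) AS total_gp
    FROM combined AS c
    LEFT JOIN max_games AS m
      ON c.team = m.team AND c.season = '2023-24'
    ORDER BY c.season DESC
""").fetchall()

for row in rows:
    print(row)
demo.close()
```

Putting the season filter in the `ON` clause rather than in a `WHERE` clause is what keeps the older seasons in the result.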

This last cell closes the connection to the SQL database we created. If we run it before finishing, we need to recreate everything from scratch.

In [None]:
conn.close()
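Why the setup has to be rerun can be illustrated with a plain in-memory SQLite database, which lives only as long as its connection. This is a minimal sketch using Python's built-in `sqlite3` module. The notebook's SQLAlchemy engine manages connections through a pool, so its exact behavior may differ from what is shown here.

```python
import sqlite3

# Create a table in an in-memory database, then close the connection.
first = sqlite3.connect(":memory:")
first.execute("CREATE TABLE demo (x INTEGER)")
first.close()

# A new in-memory connection starts from an empty database.
second = sqlite3.connect(":memory:")
tables = second.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
second.close()
print(tables)
```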