To build the SCD, we need each player's first and last active seasons, plus the clusters of years in which they were active or inactive. That information is available in either the players table or the player\_seasons table. Both span seasons 1996 through 2022.

In [40]:
SELECT COUNT(*)
FROM player_seasons ps
    FULL OUTER JOIN players p 
    ON ps.player_name = p.player_name AND ps.season = p.current_season 


count
37425


Since the other columns aren't relevant here, the smaller table would be the better one to process. However, the players table appears to have about 3 times as many rows as the player\_seasons table. Looking into this, current\_season in the players table does not indicate whether the player was active that season: once a player is added to the table, a row is carried forward for every later season, even after they have retired.

Thus the player\_seasons table is the better source for this. The two queries below show the difference for a single player.

In [175]:
SELECT player_name
    , season
FROM player_seasons
WHERE player_name = 'Michael Jordan'

player_name,season
Michael Jordan,1996
Michael Jordan,1997
Michael Jordan,2001
Michael Jordan,2002


In [176]:
SELECT player_name
    , current_season
FROM players
WHERE player_name = 'Michael Jordan'

player_name,current_season
Michael Jordan,1996
Michael Jordan,1997
Michael Jordan,1998
Michael Jordan,1999
Michael Jordan,2000
Michael Jordan,2001
Michael Jordan,2002
Michael Jordan,2003
Michael Jordan,2004
Michael Jordan,2005


In [14]:
-- CREATE THE player_state_tracking table

DROP TABLE IF EXISTS player_state_tracking;

CREATE TABLE player_state_tracking(
    player_name TEXT
    , first_active_season INT
    , last_active_season INT
    , seasonal_active_state TEXT
    , seasons_active INT[]
    , season INT
    , PRIMARY KEY (player_name, season)
)

In [1]:
WITH yesteryear AS (
    SELECT * 
    FROM player_state_tracking
    WHERE season = 1996
), 
thisyear AS (
    SELECT 
        player_name
        , season 
        , COUNT(1) 
    FROM player_seasons
    WHERE season = 1997
    GROUP BY player_name, season
)
SELECT 
    *
FROM thisyear t
    FULL OUTER JOIN yesteryear y 
    ON t.player_name = y.player_name
    AND t.season = y.season
LIMIT 5

-- We don't expect anything for yesteryear data since nothing has been added to the table yet

player_name,season,count,player_name.1,first_active_season,last_active_season,seasonal_active_state,seasons_active,season.1
,,,Aaron McKie,1996,1996,New,{1996},1996
,,,Aaron Williams,1996,1996,New,{1996},1996
,,,A.C. Green,1996,1996,New,{1996},1996
,,,Acie Earl,1996,1996,New,{1996},1996
,,,Adam Keefe,1996,1996,New,{1996},1996


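On an empty player\_state\_tracking table, this returns only the thisyear rows, with nulls in every yesteryear column, which is what we expect. The output shown above appears to come from a later re-run, after the table had been populated: the yesteryear columns hold the 1996 rows and the thisyear columns are null, because the extra join condition t.season = y.season can never match when the two CTEs filter on different seasons. The full query below joins on player\_name only and has to be run incrementally, one season at a time, to fill the table.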

In [41]:
INSERT INTO player_state_tracking
WITH yesteryear AS (
    SELECT * 
    FROM player_state_tracking
    WHERE season = 2021
), 
thisyear AS (
    SELECT 
        player_name
        , season
    FROM player_seasons
    WHERE season = 2022
    GROUP BY player_name, season
)
SELECT 
    COALESCE(t.player_name, y.player_name) AS player_name
    -- Keep the earliest known season; a brand-new player has no yesteryear row, so fall back to t.season
    , COALESCE(y.first_active_season, t.season) AS first_active_season
    -- If the player is active this season, that is the last active season; otherwise carry last year's value forward
    , COALESCE(t.season, y.last_active_season) AS last_active_season
    , CASE 
        WHEN y.player_name IS NULL THEN 'New'
        WHEN y.last_active_season = t.season - 1 THEN 'Continued Playing'
        WHEN y.last_active_season < t.season - 1 THEN 'Returned from Retirement'
        WHEN t.season IS NULL AND y.last_active_season = y.season THEN 'Retired'
        ELSE 'Stayed Retired'
    END AS seasonal_active_state
    , COALESCE(y.seasons_active, ARRAY[]::INT[])
        || CASE
                WHEN t.player_name IS NOT NULL THEN ARRAY[t.season] 
                ELSE ARRAY []::INT []
            END AS seasons_active
    , COALESCE(t.season, y.season + 1) AS season
FROM thisyear t
    FULL OUTER JOIN yesteryear y 
    ON t.player_name = y.player_name
;

SELECT * 
FROM player_state_tracking
WHERE player_name = 'Michael Jordan'

player_name,first_active_season,last_active_season,seasonal_active_state,seasons_active,season
Michael Jordan,1996,1996,New,{1996},1996
Michael Jordan,1996,1997,Continued Playing,"{1996,1997}",1997
Michael Jordan,1996,1997,Retired,"{1996,1997}",1998
Michael Jordan,1996,1997,Stale,"{1996,1997}",1999
Michael Jordan,1996,1997,Stale,"{1996,1997}",2000
Michael Jordan,1996,2001,Returned from Retirement,"{1996,1997,2001}",2001
Michael Jordan,1996,2002,Continued Playing,"{1996,1997,2001,2002}",2002
Michael Jordan,1996,2002,Retired,"{1996,1997,2001,2002}",2003
Michael Jordan,1996,2002,Stale,"{1996,1997,2001,2002}",2004
Michael Jordan,1996,2002,Stale,"{1996,1997,2001,2002}",2005
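
The CASE logic in the incremental query can be easier to follow outside SQL. Below is a minimal Python sketch of the same state machine, assuming a player is described only by the set of seasons they played. `next_state` and `track` are hypothetical helper names that do not exist in the notebook, and the labels follow the CASE expression in the INSERT query.

```python
from typing import Optional


def next_state(prev: Optional[dict], played: bool, season: int) -> Optional[dict]:
    """Fold one season into a player's tracking row (mirrors the SQL CASE)."""
    if prev is None and not played:
        return None  # not seen yet: the FULL OUTER JOIN produces no row
    if prev is None:
        state = "New"
    elif played and prev["last_active_season"] == season - 1:
        state = "Continued Playing"
    elif played:
        state = "Returned from Retirement"
    elif prev["last_active_season"] == prev["season"]:
        state = "Retired"
    else:
        state = "Stayed Retired"
    return {
        "first_active_season": prev["first_active_season"] if prev else season,
        "last_active_season": season if played else prev["last_active_season"],
        "seasonal_active_state": state,
        "seasons_active": (prev["seasons_active"] if prev else [])
        + ([season] if played else []),
        "season": season,
    }


def track(active_seasons: set, first: int, last: int) -> list:
    """Apply next_state for every season in [first, last], like re-running the INSERT."""
    rows, prev = [], None
    for season in range(first, last + 1):
        prev = next_state(prev, season in active_seasons, season)
        if prev is not None:
            rows.append(prev)
    return rows


# Seasons taken from the player_seasons output for Michael Jordan.
jordan = track({1996, 1997, 2001, 2002}, 1996, 2005)
for row in jordan:
    print(row["season"], row["seasonal_active_state"])
```

Each call to `next_state` only needs the previous season's row and whether the player appears this season, which is why the SQL version can be run one season at a time against the accumulated table.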


In [217]:
SELECT * 
FROM player_state_tracking
WHERE player_name = 'Michael Jordan'

player_name,first_active_season,last_active_season,seasonal_active_state,seasons_active,season
Michael Jordan,1996,1996,New,{1996},1996
Michael Jordan,1996,1997,Retained,"{1996,1997}",1997
Michael Jordan,1996,1997,Churned,"{1996,1997}",1998
Michael Jordan,1996,1997,Stale,"{1996,1997}",1999
Michael Jordan,1996,1997,Stale,"{1996,1997}",2000
Michael Jordan,1996,2001,Resurrected,"{1996,1997,2001}",2001
Michael Jordan,1996,2002,Retained,"{1996,1997,2001,2002}",2002
Michael Jordan,1996,2002,Churned,"{1996,1997,2001,2002}",2003
Michael Jordan,1996,2002,Stale,"{1996,1997,2001,2002}",2004
Michael Jordan,1996,2002,Stale,"{1996,1997,2001,2002}",2005


**Problem 2:** 

Write a query that uses GROUPING SETS to do efficient aggregations of game\_details data.

1\. Aggregate this dataset along the following dimensions:

\- player and team: who scored the most points playing for one team?

\- player and season: who scored the most points in a single season?

\- team: which team has won the most games?

In [24]:
-- Determine which team has the most points across all seasons and per season
SELECT 
    dim_team_id
    , dim_season
    , SUM(M_pts)
FROM fct_game_details
WHERE dim_team_id = '1610612752'
GROUP BY GROUPING SETS(
    (dim_team_id)
    , (dim_team_id, dim_season)
    )


dim_team_id,dim_season,sum
1610612752,2022.0,4115
1610612752,2016.0,9209
1610612752,2019.0,7398
1610612752,2018.0,9117
1610612752,2017.0,9054
1610612752,2015.0,4831
1610612752,2020.0,8591
1610612752,2021.0,9196
1610612752,,61511
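
To make the shape of that result easier to read, here is a small Python sketch of what GROUPING SETS computes: one total per grouping set, with every dimension outside the set reported as null. The helper name `grouping_sets` and the sample rows are made up for illustration and are not taken from fct\_game\_details.

```python
from collections import defaultdict


def grouping_sets(rows, dims, sets, measure):
    """Sum `measure` once per grouping set; dimensions outside a set become None."""
    totals = defaultdict(int)
    for row in rows:
        for cols in sets:
            key = tuple(row[d] if d in cols else None for d in dims)
            totals[key] += row[measure]
    return dict(totals)


# Made-up rows for illustration only.
rows = [
    {"team": "AAA", "season": 2020, "pts": 100},
    {"team": "AAA", "season": 2020, "pts": 90},
    {"team": "AAA", "season": 2021, "pts": 110},
    {"team": "BBB", "season": 2021, "pts": 95},
]

result = grouping_sets(
    rows,
    dims=("team", "season"),
    sets=[("team",), ("team", "season")],
    measure="pts",
)
for key, total in sorted(result.items(), key=str):
    print(key, total)
```

The rows whose season is `None` play the role of the row with an empty dim\_season in the output above: they are the all-seasons total for the team, while the other rows are the per-season totals.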


In [55]:
-- Aggregate by player/team and by player/season to see who scored the most points in a season or while playing for one team
SELECT 
    dim_player_name
    , dim_team_id
    , dim_season
    , SUM(M_pts) AS total_points
FROM fct_game_details
WHERE M_pts IS NOT NULL AND dim_player_name = 'Stephen Curry'
GROUP BY GROUPING SETS(
    (dim_player_name, dim_team_id)
    , (dim_player_name, dim_season)
    )
ORDER BY dim_season, dim_team_id

dim_player_name,dim_team_id,dim_season,total_points
Stephen Curry,,2015.0,1911
Stephen Curry,,2016.0,2614
Stephen Curry,,2017.0,1811
Stephen Curry,,2018.0,2584
Stephen Curry,,2019.0,211
Stephen Curry,,2020.0,2159
Stephen Curry,,2021.0,2330
Stephen Curry,,2022.0,840
Stephen Curry,1610612744.0,,14460


In [68]:
-- Find the player who scored the most points for one team 
WITH aggregated_team_season_player AS (
SELECT 
    dim_player_name
    , dim_team_id
    , dim_season
    , SUM(M_pts) AS total_points
FROM fct_game_details
WHERE M_pts IS NOT NULL 
    -- AND dim_player_name = 'Stephen Curry'
GROUP BY GROUPING SETS(
    (dim_player_name, dim_team_id)
    , (dim_player_name, dim_season)
    )
)
SELECT 
    dim_player_name
    , dim_team_id
    , total_points
FROM aggregated_team_season_player
WHERE dim_season IS NULL
GROUP BY dim_player_name, dim_team_id, total_points
ORDER BY total_points DESC
LIMIT 1


dim_player_name,dim_team_id,total_points
Giannis Antetokounmpo,1610612749,15556


In [101]:
-- Find the player who scored the most points in one season
WITH aggregated_team_season_player AS (
SELECT 
    dim_player_name
    , dim_team_id
    , dim_season
    , SUM(M_pts) AS total_points
FROM fct_game_details
WHERE M_pts IS NOT NULL 
    -- AND dim_player_name = 'LeBron James'
GROUP BY GROUPING SETS(
    (dim_player_name, dim_team_id)
    , (dim_player_name, dim_season)
    )
)
SELECT 
    dim_player_name
    , dim_season
    , total_points
FROM aggregated_team_season_player
WHERE dim_season IS NOT NULL
GROUP BY dim_player_name, dim_season, total_points
ORDER BY total_points DESC
-- LIMIT 1


dim_player_name,dim_season,total_points
LeBron James,2017,3016
LeBron James,2016,2585
LeBron James,2019,2369
LeBron James,2021,1751
LeBron James,2015,1728
LeBron James,2018,1560
LeBron James,2020,1319
LeBron James,2022,686


In [99]:
-- Find which team won the most games overall (rows where season IS NULL) and by season (rows where season IS NOT NULL)
WITH wins AS (
    SELECT 
        gd.game_id
        , gd.team_id
        , g.season
        , g.home_team_wins 
        , g.home_team_id
        , g.visitor_team_id
        , gd.team_abbreviation
        , CASE
            WHEN team_id = home_team_id AND home_team_wins = 1 THEN 1
            WHEN team_id = home_team_id AND home_team_wins = 0 THEN 0
            WHEN team_id = visitor_team_id AND home_team_wins = 1 THEN 0
            ELSE 1
        END AS won_game
    FROM game_details gd 
        LEFT JOIN games g ON gd.game_id = g.game_id
    GROUP BY gd.game_id, gd.team_id, g.home_team_wins, g.home_team_id, g.visitor_team_id, gd.team_abbreviation, g.season
)
SELECT
    team_abbreviation
    , season
    , SUM(won_game) AS total_wins 
FROM wins
GROUP BY GROUPING SETS (
    (team_abbreviation)
    , (team_abbreviation, season)
)
ORDER BY total_wins DESC, season ASC

team_abbreviation,season,total_wins
GSW,,445
BOS,,416
TOR,,410
MIL,,399
UTA,,377
LAC,,366
MIA,,365
DEN,,363
PHI,,342
SAS,,339


Write a windowed query on game\_details that answers:

\- What is the most games a team won in a 90 game stretch?

\- How many games in a row did LeBron James score over 10 points a game?

In [117]:
-- This answers the most games won (per team) in a 90 game stretch
WITH wins AS (
    SELECT 
        gd.game_id
        , gd.team_id
        , g.season
        , g.home_team_wins 
        , g.home_team_id
        , g.visitor_team_id
        , gd.team_abbreviation
        , CASE
            WHEN team_id = home_team_id AND home_team_wins = 1 THEN 1
            WHEN team_id = home_team_id AND home_team_wins = 0 THEN 0
            WHEN team_id = visitor_team_id AND home_team_wins = 1 THEN 0
            ELSE 1
        END AS won_game
    FROM game_details gd 
        LEFT JOIN games g ON gd.game_id = g.game_id
    GROUP BY gd.game_id, gd.team_id, g.home_team_wins, g.home_team_id, g.visitor_team_id, gd.team_abbreviation, g.season
), windowed AS (
    SELECT 
        team_abbreviation
        , game_id
        , SUM(won_game) OVER (
                PARTITION BY team_abbreviation
                ORDER BY game_id
                ROWS BETWEEN 89 PRECEDING AND CURRENT ROW
            ) as wins_per_90_games         
    FROM wins
)
SELECT 
    team_abbreviation
    , MAX(wins_per_90_games)
FROM windowed
GROUP BY team_abbreviation
ORDER BY MAX(wins_per_90_games) DESC



team_abbreviation,max
GSW,78
MIL,73
SAS,72
PHX,72
HOU,70
TOR,69
UTA,67
LAL,66
BOS,66
CLE,65
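
The frame `ROWS BETWEEN 89 PRECEDING AND CURRENT ROW` is a trailing window over the ordered games. A minimal Python sketch of the same idea follows, assuming one team's results are already ordered and encoded as 1 for a win and 0 for a loss. `max_wins_in_window` is a hypothetical helper name, and the sample list is made up.

```python
def max_wins_in_window(results, window):
    """Largest number of wins in any run of up to `window` consecutive games."""
    best = 0
    running = 0
    for i, won in enumerate(results):
        running += won
        if i >= window:
            # Drop the game that just fell out of the trailing window.
            running -= results[i - window]
        best = max(best, running)
    return best


# Made-up results for one team: 1 = win, 0 = loss, in game order.
sample = [1, 0, 1, 1, 0, 1, 1, 1, 0]
print(max_wins_in_window(sample, 3))
```

As with the SQL frame, the first rows see fewer than `window` games, so a team with fewer games than the window size is scored on everything it has played.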


In [135]:
-- This answers how many games in a row LeBron James scored more than 10 points
-- The first CTE keeps a running count of games with 10 or fewer points; every game after such a game falls into a new streak group
WITH streak_groups AS (
    SELECT 
        player_name
        , game_id
        , pts 
        , SUM(CASE WHEN pts > 10 THEN 0 ELSE 1 END) OVER (
            PARTITION BY player_name 
            ORDER BY game_id
        ) AS streak_group
    FROM game_details
    WHERE player_name = 'LeBron James' AND comment IS NULL
), streak_lengths AS (
    SELECT
        player_name
        , streak_group
        , COUNT(*) AS streak_length
    FROM streak_groups
    WHERE pts > 10
    GROUP BY player_name, streak_group
) 
SELECT 
    player_name
    , MAX(streak_length) AS longest_streak
FROM streak_lengths
GROUP BY player_name;

player_name,longest_streak
LeBron James,163
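
The streak query uses a running count of "failed" games as a group id, then counts the qualifying games inside each group. The same technique can be sketched in Python, assuming the points are already in game order. `longest_streak` is a hypothetical helper name, and the sample points are made up rather than taken from game\_details.

```python
from collections import Counter
from itertools import accumulate


def longest_streak(points, threshold=10):
    """Longest run of consecutive games with more than `threshold` points."""
    # Running count of games at or below the threshold: the streak group id.
    groups = accumulate(0 if p > threshold else 1 for p in points)
    # Count only the qualifying games in each group, like the WHERE pts > 10 step.
    lengths = Counter(g for g, p in zip(groups, points) if p > threshold)
    return max(lengths.values(), default=0)


# Made-up points per game, in game order.
sample_points = [12, 25, 8, 30, 11, 15, 10, 40]
print(longest_streak(sample_points))
```

A game with exactly 10 points breaks the streak here, matching the strict `pts > 10` comparison in the query.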
