# dbApps05b Task: GROUP BY Categorical Analysis

**Course:** Database Applications Development (145085)
**Institution:** Medina County Career Center
**Topic:** GROUP BY, Column Aliases (AS)

In this task, you will use the SQL GROUP BY clause to aggregate data by category. You'll practice the COUNT, AVG, and SUM functions, apply column aliases (AS) for readable output, and group data by one or more columns.

## Setup: Connect to NBA Database

Before starting the tasks, import the required libraries and connect to the nba_5seasons.db database.

In [None]:
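# Import required libraries
import pandas as pd
import sqlite3

# Establish connection to the nba_5seasons.db database
# (adjust this path if the file is stored elsewhere on your machine)
dbConnection = sqlite3.connect('/sessions/sweet-lucid-archimedes/mnt/databaseApplicationsForGitHub/data/nba_5seasons.db')

# Verify connection is successful by listing available tables
tableQuery = "SELECT name FROM sqlite_master WHERE type='table';"
tables = pd.read_sql_query(tableQuery, dbConnection)
print("Available tables:")
print(tables)

# Display info about database structure
print("\n" + "="*50)
print("Database Schema Information")
print("="*50)

# List the columns of each table so the column names used in the tasks can be checked
for tableName in tables['name']:
    print(f"\nTable: {tableName}")
    print(pd.read_sql_query(f"PRAGMA table_info('{tableName}');", dbConnection))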

## Task 1: GROUP BY Single Column - Count Records by Season

**Objective:** Group the team_game_stats table by season and count how many game records exist for each season.

**Expected Output:** A table showing each season and the total number of game records in that season.

**Hint:** Use COUNT(*) to count rows and GROUP BY season. Consider using AS for column aliases.
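To see the pattern before writing your own query, here is a minimal sketch on a made-up in-memory table. The table `demo_scores`, its columns, and its rows are hypothetical and are not part of nba_5seasons.db; the ORDER BY is added only to make the row order predictable.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_scores (league TEXT, club TEXT, points INTEGER)")
demoConnection.executemany(
    "INSERT INTO demo_scores VALUES (?, ?, ?)",
    [("East", "Owls", 98), ("East", "Owls", 104), ("East", "Bears", 91), ("West", "Hawks", 110)],
)

# COUNT(*) counts the rows in each group; AS gives the count a readable name
countQuery = """
SELECT league, COUNT(*) AS record_count
FROM demo_scores
GROUP BY league
ORDER BY league;
"""
countRows = demoConnection.execute(countQuery).fetchall()
print(countRows)
```

The same query string could be passed to pd.read_sql_query with the notebook's dbConnection to get a DataFrame instead of a list of tuples.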

In [None]:
# Your code here

## Task 2: GROUP BY with Multiple Aggregation Functions

**Objective:** For each season in team_game_stats, calculate the total points scored (SUM), average points per game (AVG), and number of games.

**Expected Output:** A table with columns: season, total_points, avg_points_per_game, game_count.

**Hint:** Use SUM(pts), AVG(pts), and COUNT(*) in a single query. Use AS to create clean column names.
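As a rough illustration of several aggregates in one SELECT, the sketch below runs against a hypothetical in-memory table (`demo_scores` and its contents are assumptions, not the course database). It also reads the column names back from the cursor to show that the aliases become the output headers.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_scores (league TEXT, points INTEGER)")
demoConnection.executemany(
    "INSERT INTO demo_scores VALUES (?, ?)",
    [("East", 98), ("East", 104), ("East", 92), ("West", 110), ("West", 100)],
)

# Each aggregate is computed once per group and named with AS
aggQuery = """
SELECT league,
       SUM(points) AS total_points,
       AVG(points) AS avg_points,
       COUNT(*)    AS record_count
FROM demo_scores
GROUP BY league
ORDER BY league;
"""
aggCursor = demoConnection.execute(aggQuery)
columnNames = [column[0] for column in aggCursor.description]
aggRows = aggCursor.fetchall()
print(columnNames)
print(aggRows)
```

Note that AVG returns a floating-point value even when every input is an integer.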

In [None]:
# Your code here

## Task 3: GROUP BY Single Column - Analyze Player Statistics

**Objective:** Using the player_season_stats table, calculate the average points, rebounds, and assists for each season across all players.

**Expected Output:** A table showing each season with avg_points, avg_rebounds, and avg_assists.

**Hint:** Use AVG() on the pts, reb, and ast columns. Apply meaningful aliases.

In [None]:
# Your code here

## Task 4: GROUP BY Multiple Columns

**Objective:** Group team_game_stats by both team_id and season. Count the number of games each team played in each season.

**Expected Output:** A table with columns: team_id, season, games_played. Limit results to first 10 rows.

**Hint:** Use GROUP BY team_id, season and COUNT(*). Remember to alias the count column.
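Grouping by two columns produces one output row per distinct combination of values. The sketch below shows this on a hypothetical in-memory table (`demo_scores`, its columns, and its rows are made up for illustration).

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_scores (club TEXT, season TEXT, points INTEGER)")
demoConnection.executemany(
    "INSERT INTO demo_scores VALUES (?, ?, ?)",
    [("Owls", "2020", 98), ("Owls", "2020", 104), ("Owls", "2021", 91), ("Hawks", "2020", 110)],
)

# One group per (club, season) pair; LIMIT caps how many rows come back
pairQuery = """
SELECT club, season, COUNT(*) AS games_played
FROM demo_scores
GROUP BY club, season
ORDER BY club, season
LIMIT 10;
"""
pairRows = demoConnection.execute(pairQuery).fetchall()
print(pairRows)
```

The Owls appear twice because they have rows in two different seasons; without an ORDER BY, which rows a LIMIT keeps is not guaranteed.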

In [None]:
# Your code here

## Task 5: GROUP BY with ORDER BY

**Objective:** Group team_game_stats by team_id and calculate average points per game for each team across all seasons. Order the results by average points in descending order.

**Expected Output:** A table showing team_id and avg_points, ordered from highest to lowest average.

**Hint:** Use GROUP BY team_id, then ORDER BY the aggregated column in descending order (DESC).
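In SQLite, ORDER BY can refer to an aggregated column by its alias. A minimal sketch, assuming a hypothetical in-memory table (`demo_scores` and its rows are invented for illustration):

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_scores (club TEXT, points INTEGER)")
demoConnection.executemany(
    "INSERT INTO demo_scores VALUES (?, ?)",
    [("Owls", 90), ("Owls", 100), ("Hawks", 110), ("Hawks", 120), ("Bears", 80)],
)

# Sort the grouped result by the alias of the aggregate, highest first
rankQuery = """
SELECT club, AVG(points) AS avg_points
FROM demo_scores
GROUP BY club
ORDER BY avg_points DESC;
"""
rankRows = demoConnection.execute(rankQuery).fetchall()
print(rankRows)
```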

In [None]:
# Your code here

## Task 6: GROUP BY on Teams Table - Categorical Analysis

**Objective:** Using the teams table, group by state and count how many teams are located in each state.

**Expected Output:** A table showing state and team_count, ordered by team_count in descending order.

**Hint:** Use GROUP BY state on the teams table. Use COUNT(*) to count teams per state.

In [None]:
# Your code here

## Task 7: Complex GROUP BY - Average Plus/Minus by Season

**Objective:** Group team_game_stats by season and calculate the average plus_minus value for each season.

**Expected Output:** A table with season and avg_plus_minus (formatted to 2 decimal places if possible).

**Hint:** Use AVG(plus_minus) and GROUP BY season. Consider ROUND() function for cleaner output.
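ROUND(value, 2) wraps the aggregate and keeps at most two decimal places. The sketch below uses a hypothetical in-memory table (`demo_margins` and its rows are assumptions made for illustration).

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_margins (season TEXT, margin INTEGER)")
demoConnection.executemany(
    "INSERT INTO demo_margins VALUES (?, ?)",
    [("2020", 5), ("2020", -3), ("2020", 2), ("2021", -4), ("2021", 1)],
)

# AVG is computed per group first, then ROUND trims it to two decimals
roundQuery = """
SELECT season, ROUND(AVG(margin), 2) AS avg_margin
FROM demo_margins
GROUP BY season
ORDER BY season;
"""
roundRows = demoConnection.execute(roundQuery).fetchall()
print(roundRows)
```

ROUND returns a number rather than formatted text, so a value such as -1.5 is shown without a trailing zero.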

In [None]:
# Your code here

## Task 8: GROUP BY with Multiple Aggregations - Comprehensive Team Stats

**Objective:** For the 2016-17 season only, group by team_id and calculate: total games played, total points, average points per game, and average rebounds per game.

**Expected Output:** A table with team_id, games_played, total_points, avg_pts, avg_rebounds. Limit to first 8 rows.

**Hint:** Use WHERE season = '2016-17' before the GROUP BY. Use multiple aggregation functions in one query.
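WHERE filters individual rows before they are grouped, so rows from other seasons never reach the aggregates. Here is a small sketch on a hypothetical in-memory table (`demo_scores`, its columns, and the season labels are made up for illustration).

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_scores (club TEXT, season TEXT, points INTEGER)")
demoConnection.executemany(
    "INSERT INTO demo_scores VALUES (?, ?, ?)",
    [("Owls", "2020", 98), ("Owls", "2020", 102), ("Owls", "2021", 120),
     ("Hawks", "2020", 110), ("Hawks", "2021", 90)],
)

# WHERE comes before GROUP BY: only 2020 rows are counted, summed, and averaged
filterQuery = """
SELECT club,
       COUNT(*)    AS games_played,
       SUM(points) AS total_points,
       AVG(points) AS avg_points
FROM demo_scores
WHERE season = '2020'
GROUP BY club
ORDER BY club;
"""
filterRows = demoConnection.execute(filterQuery).fetchall()
print(filterRows)
```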

In [None]:
# Your code here

## Task 9: GROUP BY with Filtering - Average Field Goal Percentage by Season

**Objective:** Group player_season_stats by season and find the average field goal percentage (fg_pct) for each season. Order by season in ascending order.

**Expected Output:** A table showing season and avg_fg_pct, ordered by season, with rows that have a NULL fg_pct excluded.

**Hint:** Use WHERE fg_pct IS NOT NULL to exclude NULL values. AVG() already skips NULLs, so the filter mainly makes the exclusion explicit. Use ROUND() for clean decimals.
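How NULLs interact with aggregates is easy to check on toy data. The sketch below assumes a hypothetical in-memory table (`demo_shooting` and its rows are invented) and compares an unfiltered query with a filtered one: the average is the same in both, but COUNT(*) changes.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_shooting (season TEXT, accuracy REAL)")
demoConnection.executemany(
    "INSERT INTO demo_shooting VALUES (?, ?)",
    [("2020", 0.40), ("2020", None), ("2020", 0.50), ("2021", 0.60), ("2021", None)],
)

# No filter: AVG skips NULLs on its own, but COUNT(*) still counts every row
unfilteredQuery = """
SELECT season, ROUND(AVG(accuracy), 3) AS avg_accuracy, COUNT(*) AS row_count
FROM demo_shooting
GROUP BY season
ORDER BY season;
"""
# With the filter: NULL rows are removed before grouping
filteredQuery = """
SELECT season, ROUND(AVG(accuracy), 3) AS avg_accuracy, COUNT(*) AS row_count
FROM demo_shooting
WHERE accuracy IS NOT NULL
GROUP BY season
ORDER BY season;
"""
unfilteredRows = demoConnection.execute(unfilteredQuery).fetchall()
filteredRows = demoConnection.execute(filteredQuery).fetchall()
print(unfilteredRows)
print(filteredRows)
```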

In [None]:
# Your code here

## Task 10: Advanced GROUP BY - Team Performance Summary

**Objective:** Group team_game_stats by team_id and calculate: number of wins, number of losses, total games, and win percentage. Order by win percentage in descending order.

**Expected Output:** A table with team_id, wins, losses, total_games, and win_pct (as a decimal or percentage). Limit to first 10 rows.

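**Hint:** Count games where wl = 'W' for wins and wl = 'L' for losses, using SUM with a CASE WHEN expression. Calculate win_pct as wins / total_games, multiplying by 1.0 (or using CAST(... AS REAL)) first, because SQLite truncates the result when both operands are integers.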
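Conditional counting and the division step can be tried out on toy data first. The sketch below uses a hypothetical in-memory table (`demo_results` and its rows are made up) and includes a `truncated_pct` column purely to show what integer division does in SQLite.

```python
import sqlite3

# Hypothetical in-memory table; names and rows are invented for illustration
demoConnection = sqlite3.connect(":memory:")
demoConnection.execute("CREATE TABLE demo_results (club TEXT, result TEXT)")
demoConnection.executemany(
    "INSERT INTO demo_results VALUES (?, ?)",
    [("Owls", "W"), ("Owls", "W"), ("Owls", "L"), ("Owls", "W"), ("Hawks", "W"), ("Hawks", "L")],
)

# SUM over a CASE expression counts only the rows that match the condition.
# Integer / integer truncates in SQLite, so multiply by 1.0 to get a decimal.
summaryQuery = """
SELECT club,
       SUM(CASE WHEN result = 'W' THEN 1 ELSE 0 END) AS wins,
       SUM(CASE WHEN result = 'L' THEN 1 ELSE 0 END) AS losses,
       COUNT(*) AS total_games,
       SUM(CASE WHEN result = 'W' THEN 1 ELSE 0 END) / COUNT(*) AS truncated_pct,
       ROUND(1.0 * SUM(CASE WHEN result = 'W' THEN 1 ELSE 0 END) / COUNT(*), 3) AS win_pct
FROM demo_results
GROUP BY club
ORDER BY win_pct DESC;
"""
summaryRows = demoConnection.execute(summaryQuery).fetchall()
print(summaryRows)
```

Both clubs show 0 in `truncated_pct` even though their real win rates are 0.75 and 0.5, which is why the 1.0 factor matters.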

In [None]:
# Your code here