# dbApps07c Task: Multi-Table JOINs & GROUP BY

## Learning Objectives
- Master 3+ table JOINs with complex relationships
- Combine JOINs with GROUP BY and aggregation functions
- Use HAVING clauses to filter aggregated results
- Analyze real-world movie and sports data

## Setup
Connect to imdb_class.db and nba_5seasons.db using pandas and sqlite3.

In [None]:
import pandas as pd
import sqlite3
import os

# Verify database files exist
dbPath = '/sessions/sweet-lucid-archimedes/mnt/databaseApplicationsForGitHub/dbApps07/'
imdbPath = os.path.join(dbPath, 'imdb_class.db')
nbaPath = os.path.join(dbPath, 'nba_5seasons.db')

print(f"IMDb database exists: {os.path.exists(imdbPath)}")
print(f"NBA database exists: {os.path.exists(nbaPath)}")

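# Create connections. Note: sqlite3.connect() creates a new, empty database
# if the file is missing, so check the two lines printed above first.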
connImdb = sqlite3.connect(imdbPath)
connNba = sqlite3.connect(nbaPath)
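
The tasks below name the tables to join but not their columns, so it may help to list each database's tables and columns before writing any JOIN. The following is a minimal sketch, not part of the assignment: `list_schema` is a hypothetical helper, and the demo runs against a made-up in-memory table so that it works without the course files.

```python
import sqlite3

def list_schema(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    schema = {}
    for (table,) in tables:
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        cols = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        schema[table] = [col[1] for col in cols]
    return schema

# Demo on an in-memory stand-in (made-up table, not the course data).
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE teams (team_id INTEGER PRIMARY KEY, full_name TEXT)")
demo_schema = list_schema(demo)
print(demo_schema)
```

In the notebook, calling `list_schema(connImdb)` and `list_schema(connNba)` should show the real column names to use in the ON and WHERE clauses.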

---
## IMDb Tasks: Multi-Table JOINs

### Task 1: Three-Table JOIN - Find All Actors in Movies from 2020+
**Instructions:** Join title_basics → title_principals → name_basics to find all actors/actresses who appeared in movies released in 2020 or later. Show actor name, movie title, and release year. Limit to 20 results.

**Hint:** Use INNER JOINs. Filter on startYear >= 2020 and titleType = 'movie'.
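
To illustrate the shape of a three-table INNER JOIN, here is a small sketch on made-up in-memory data rather than the class database. The column names `primaryTitle`, `primaryName`, `nconst`, and `category` are assumptions borrowed from the public IMDb dataset layout; check the real schema before reusing them.

```python
import sqlite3

# Toy stand-in for the IMDb tables (assumed columns, invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_basics (tconst TEXT, primaryTitle TEXT, titleType TEXT, startYear INTEGER);
CREATE TABLE title_principals (tconst TEXT, nconst TEXT, category TEXT);
CREATE TABLE name_basics (nconst TEXT, primaryName TEXT);
INSERT INTO title_basics VALUES
  ('tt1', 'Old Film', 'movie', 2015),
  ('tt2', 'New Film', 'movie', 2021),
  ('tt3', 'New Show', 'tvSeries', 2022);
INSERT INTO title_principals VALUES
  ('tt1', 'nm1', 'actor'),
  ('tt2', 'nm1', 'actor'),
  ('tt2', 'nm2', 'actress'),
  ('tt2', 'nm3', 'director'),
  ('tt3', 'nm2', 'actress');
INSERT INTO name_basics VALUES
  ('nm1', 'Actor One'), ('nm2', 'Actress Two'), ('nm3', 'Director Three');
""")

# Each INNER JOIN adds one table and one matching condition.
query = """
SELECT n.primaryName, b.primaryTitle, b.startYear
FROM title_basics AS b
INNER JOIN title_principals AS p ON p.tconst = b.tconst
INNER JOIN name_basics AS n ON n.nconst = p.nconst
WHERE b.titleType = 'movie'
  AND b.startYear >= 2020
  AND p.category IN ('actor', 'actress')
ORDER BY n.primaryName
LIMIT 20
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Table aliases (`b`, `p`, `n`) keep the ON clauses short and make it explicit which table each column comes from.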

In [None]:
# Task 1 Code Cell
# TODO: Write your SQL query here

### Task 2: Top 10 Directors by Movie Count
**Instructions:** Find the top 10 directors by number of movies they have directed. Use title_crew_person (WHERE role='director'), join with name_basics, and GROUP BY director name. Show director name and movie count.

**Hint:** GROUP BY + COUNT, ORDER BY count descending, WHERE role = 'director'
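
The GROUP BY + COUNT pattern from the hint can be sketched on invented data as follows. The `role` column comes from the task text; `tconst`, `nconst`, and `primaryName` are assumed column names. This sketch groups by the person identifier as well as the name, which is a design choice (not a requirement of the task) so that two people who share a name are counted separately.

```python
import sqlite3

# Toy stand-in tables (assumed columns, invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_crew_person (tconst TEXT, nconst TEXT, role TEXT);
CREATE TABLE name_basics (nconst TEXT, primaryName TEXT);
INSERT INTO title_crew_person VALUES
  ('tt1', 'nm1', 'director'),
  ('tt2', 'nm1', 'director'),
  ('tt3', 'nm2', 'director'),
  ('tt1', 'nm3', 'writer'),
  ('tt4', 'nm4', 'director');
INSERT INTO name_basics VALUES
  ('nm1', 'Dana Lee'), ('nm2', 'Sam Roe'), ('nm3', 'Writer X'), ('nm4', 'Sam Roe');
""")

# WHERE keeps only director rows, GROUP BY forms one group per person,
# COUNT(*) counts the rows in each group.
query = """
SELECT n.primaryName, COUNT(*) AS movie_count
FROM title_crew_person AS c
INNER JOIN name_basics AS n ON n.nconst = c.nconst
WHERE c.role = 'director'
GROUP BY c.nconst, n.primaryName
ORDER BY movie_count DESC, n.primaryName, c.nconst
LIMIT 10
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Grouping by name alone would merge the two "Sam Roe" rows into a single director with two movies.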

In [None]:
# Task 2 Code Cell
# TODO: Write your SQL query here

### Task 3: Top 10 Actors by Average Movie Rating
**Instructions:** Find the top 10 actors/actresses by average movie rating. Join title_principals → title_basics → title_ratings → name_basics. Show actor name and average rating. ONLY include actors with at least 4 movie credits (HAVING clause).

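**Hint:** GROUP BY actor, HAVING COUNT(*) >= 4, calculate AVG(averageRating). If you count a column instead, qualify it with a table alias, because tconst appears in several of the joined tables and an unqualified reference is ambiguous.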
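
A HAVING clause filters whole groups after aggregation. The sketch below shows this on invented data, with a credit threshold of 2 instead of 4 so that the toy tables stay small. The column names `nconst`, `category`, and `primaryName` are assumptions, and `min_credits` is a hypothetical variable.

```python
import sqlite3

# Toy stand-in tables (assumed columns, invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_basics (tconst TEXT, titleType TEXT);
CREATE TABLE title_ratings (tconst TEXT, averageRating REAL);
CREATE TABLE title_principals (tconst TEXT, nconst TEXT, category TEXT);
CREATE TABLE name_basics (nconst TEXT, primaryName TEXT);
INSERT INTO title_basics VALUES
  ('tt1', 'movie'), ('tt2', 'movie'), ('tt3', 'movie'), ('tt4', 'tvSeries');
INSERT INTO title_ratings VALUES
  ('tt1', 8.0), ('tt2', 6.0), ('tt3', 9.0), ('tt4', 9.5);
INSERT INTO title_principals VALUES
  ('tt1', 'nm1', 'actor'), ('tt2', 'nm1', 'actor'), ('tt4', 'nm1', 'actor'),
  ('tt3', 'nm2', 'actress'),
  ('tt2', 'nm3', 'actor'), ('tt3', 'nm3', 'actor');
INSERT INTO name_basics VALUES
  ('nm1', 'Actor A'), ('nm2', 'Actress B'), ('nm3', 'Actor C');
""")

min_credits = 2  # the task itself asks for 4

query = """
SELECT n.primaryName,
       ROUND(AVG(r.averageRating), 2) AS avg_rating,
       COUNT(p.tconst) AS credits
FROM title_principals AS p
INNER JOIN title_basics AS b ON b.tconst = p.tconst
INNER JOIN title_ratings AS r ON r.tconst = p.tconst
INNER JOIN name_basics AS n ON n.nconst = p.nconst
WHERE b.titleType = 'movie'
  AND p.category IN ('actor', 'actress')
GROUP BY p.nconst, n.primaryName
HAVING COUNT(p.tconst) >= ?
ORDER BY avg_rating DESC
LIMIT 10
"""
rows = conn.execute(query, (min_credits,)).fetchall()
for row in rows:
    print(row)
```

"Actress B" has the highest average in the toy data but is dropped by HAVING because she has only one movie credit.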

In [None]:
# Task 3 Code Cell
# TODO: Write your SQL query here

### Task 4: Most Prolific Actors (By Movie Credits)
**Instructions:** Find actors with the most movie credits. Join title_principals → title_basics → title_ratings → name_basics (title_ratings is needed for the average rating). Show actor name, movie count, and average rating of their films. Limit to top 15 results.

**Hint:** GROUP BY actor, COUNT movies, calculate AVG rating, ORDER BY count DESC

In [None]:
# Task 4 Code Cell
# TODO: Write your SQL query here

### Task 5: Average Rating by Genre
**Instructions:** Find the average rating for each genre. Join title_basics with title_ratings and GROUP BY genres. Show genre and average rating, ordered by average rating descending. Note: the genres column stores comma-separated lists, so GROUP BY genres produces one group per distinct combination of genres rather than one per individual genre.

**Hint:** GROUP BY genres, AVG(averageRating), ORDER BY avg DESC. Some rows may have multiple genres separated by commas.
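
Because of the comma-separated values, grouping on the raw column and grouping on individual genres give different answers. The sketch below contrasts the two on invented data. Splitting the strings in Python is one possible approach assumed here for illustration, not necessarily what the task expects.

```python
import sqlite3
from collections import defaultdict

# Toy stand-in tables (invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_basics (tconst TEXT, genres TEXT);
CREATE TABLE title_ratings (tconst TEXT, averageRating REAL);
INSERT INTO title_basics VALUES
  ('tt1', 'Comedy,Drama'), ('tt2', 'Drama'), ('tt3', 'Comedy');
INSERT INTO title_ratings VALUES ('tt1', 8.0), ('tt2', 6.0), ('tt3', 5.0);
""")

# 1) GROUP BY on the raw column: each distinct string is its own group.
combo_rows = conn.execute("""
SELECT b.genres, AVG(r.averageRating) AS avg_rating
FROM title_basics AS b
INNER JOIN title_ratings AS r ON r.tconst = b.tconst
GROUP BY b.genres
ORDER BY avg_rating DESC
""").fetchall()
print(combo_rows)

# 2) Per individual genre: split each string and average in Python.
ratings_by_genre = defaultdict(list)
cursor = conn.execute("""
SELECT b.genres, r.averageRating
FROM title_basics AS b
INNER JOIN title_ratings AS r ON r.tconst = b.tconst
WHERE b.genres IS NOT NULL
""")
for genres, rating in cursor:
    for genre in genres.split(","):
        ratings_by_genre[genre.strip()].append(rating)

per_genre = sorted(
    ((genre, sum(vals) / len(vals)) for genre, vals in ratings_by_genre.items()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(per_genre)
```

In the toy data, the film tagged with both genres counts toward the Comedy average and the Drama average in the second approach, while the first approach treats it as a separate "Comedy,Drama" group.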

In [None]:
# Task 5 Code Cell
# TODO: Write your SQL query here

---
## NBA Tasks: Multi-Table JOINs

### Task 6: Player Season Stats with Team Names
**Instructions:** Join players → player_season_stats → teams to show each player's name, team name, season, games played, and average points per game. Show top 20 results ordered by points descending.

**Hint:** 3-table join. GROUP BY is not needed here; just show the raw stats per season.

In [None]:
# Task 6 Code Cell
# TODO: Write your SQL query here using connNba

### Task 7: Team with Highest Average Points Per Game
**Instructions:** Find which team had the highest average points per game across all seasons. Join teams with team_game_stats, GROUP BY team, calculate AVG(pts).

**Hint:** GROUP BY team_id and full_name, AVG(pts), ORDER BY avg DESC, LIMIT 1
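
The hint's GROUP BY ... ORDER BY ... LIMIT 1 pattern can be sketched on invented data as below. The columns `team_id`, `full_name`, and `pts` come from the hint; `game_id` and all rows are made up for the example.

```python
import sqlite3

# Toy stand-in tables (invented rows, one row per team per game).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teams (team_id INTEGER, full_name TEXT);
CREATE TABLE team_game_stats (team_id INTEGER, game_id INTEGER, pts INTEGER);
INSERT INTO teams VALUES (1, 'Alpha Hawks'), (2, 'Beta Bears');
INSERT INTO team_game_stats VALUES
  (1, 1, 100), (1, 2, 110),
  (2, 1, 120), (2, 2, 100), (2, 3, 101);
""")

# One group per team; AVG(pts) is the mean over that team's game rows.
query = """
SELECT t.full_name, AVG(s.pts) AS avg_pts
FROM teams AS t
INNER JOIN team_game_stats AS s ON s.team_id = t.team_id
GROUP BY t.team_id, t.full_name
ORDER BY avg_pts DESC
LIMIT 1
"""
top_team = conn.execute(query).fetchone()
print(top_team)
```

Note that LIMIT 1 returns a single row even when two teams tie for the highest average.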

In [None]:
# Task 7 Code Cell
# TODO: Write your SQL query here using connNba

---
## Challenge Task

### Task 8: Directors with Average Rating > 7.0 and 5+ Movies (Complex)
**Instructions:** Find all directors whose movies have an average rating above 7.0 AND who have directed at least 5 movies. Show director name, movie count, and average rating. Order by average rating descending.

**Hint:** 4-table join (title_crew_person → title_basics → title_ratings → name_basics). WHERE role = 'director'. GROUP BY director. HAVING AVG(averageRating) > 7.0 AND COUNT(*) >= 5
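
This task combines both kinds of filter: WHERE removes rows before grouping, and HAVING removes groups after aggregation. Here is a minimal sketch on invented data, with the movie threshold lowered to 2 so the toy tables stay small. The column names `tconst`, `nconst`, and `primaryName` are assumptions, and `min_movies` and `min_avg` are hypothetical variables.

```python
import sqlite3

# Toy stand-in tables (assumed columns, invented rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE title_crew_person (tconst TEXT, nconst TEXT, role TEXT);
CREATE TABLE title_basics (tconst TEXT, titleType TEXT);
CREATE TABLE title_ratings (tconst TEXT, averageRating REAL);
CREATE TABLE name_basics (nconst TEXT, primaryName TEXT);
INSERT INTO title_crew_person VALUES
  ('tt1', 'nm1', 'director'), ('tt2', 'nm1', 'director'),
  ('tt3', 'nm2', 'director'), ('tt4', 'nm2', 'director'),
  ('tt5', 'nm3', 'director'), ('tt5', 'nm1', 'writer');
INSERT INTO title_basics VALUES
  ('tt1', 'movie'), ('tt2', 'movie'), ('tt3', 'movie'),
  ('tt4', 'movie'), ('tt5', 'movie');
INSERT INTO title_ratings VALUES
  ('tt1', 8.0), ('tt2', 7.0), ('tt3', 6.0), ('tt4', 7.0), ('tt5', 9.0);
INSERT INTO name_basics VALUES
  ('nm1', 'Director A'), ('nm2', 'Director B'), ('nm3', 'Director C');
""")

min_movies = 2   # the task itself asks for 5
min_avg = 7.0

query = """
SELECT n.primaryName,
       COUNT(*) AS movie_count,
       ROUND(AVG(r.averageRating), 2) AS avg_rating
FROM title_crew_person AS c
INNER JOIN title_basics AS b ON b.tconst = c.tconst
INNER JOIN title_ratings AS r ON r.tconst = c.tconst
INNER JOIN name_basics AS n ON n.nconst = c.nconst
WHERE c.role = 'director'          -- row filter, applied before grouping
  AND b.titleType = 'movie'
GROUP BY c.nconst, n.primaryName
HAVING COUNT(*) >= ?               -- group filters, applied after aggregation
   AND AVG(r.averageRating) > ?
ORDER BY avg_rating DESC
"""
rows = conn.execute(query, (min_movies, min_avg)).fetchall()
for row in rows:
    print(row)
```

Because the join to title_ratings is an INNER JOIN, movies without a rating row drop out before grouping, so COUNT(*) here counts rated movies only.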

In [None]:
# Task 8 Code Cell
# TODO: Write your SQL query here

---
## Summary
After completing these tasks, you should understand:
- How to join 3+ tables using INNER JOINs
- How to combine GROUP BY with aggregate functions such as COUNT and AVG
- How to use HAVING to filter on aggregate functions
- How to combine complex conditions in real-world queries