# dbApps07 DIY Task: IMDb & NBA Query Challenge

## Independent Assessment (Sub-lesson 07d)

### Overview
This is an independent assessment combining all JOIN types with GROUP BY and HAVING clauses. You will work with two real databases: IMDb and NBA. This assessment tests your ability to write complex SQL queries with multiple table joins and aggregations.

### Grading Criteria
- Correct JOIN syntax (points for each valid join)
- Accurate use of GROUP BY and HAVING
- Meaningful results and proper filtering
- Code clarity and comments
- Completion of open-ended tasks demonstrating creativity

### Setup
Connect to both IMDb and NBA databases.
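
Because the tasks refer to tables and columns by name, it can help to confirm what each database actually contains once the connections in the cell below are open. The following is a minimal sketch, not part of the required work: `describe_tables` is a hypothetical helper, and it is demonstrated on an in-memory database with a made-up `demo_titles` table rather than on the real files.

```python
import sqlite3

def describe_tables(conn):
    """Return {table_name: [column names]} for a SQLite connection."""
    names = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    return {
        name: [col[1] for col in conn.execute(f'PRAGMA table_info("{name}")')]
        for name in names
    }

# Demo on an in-memory database (hypothetical table, not the real IMDb schema)
demo = sqlite3.connect(":memory:")
demo.execute("CREATE TABLE demo_titles (tconst TEXT, primaryTitle TEXT)")
schema = describe_tables(demo)
print(schema)
demo.close()
```

The same helper could be called as `describe_tables(connImdb)` or `describe_tables(connNba)` after the setup cell has run.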

In [None]:
import pandas as pd
import sqlite3
import os

# Setup database connections
dbPath = '/sessions/sweet-lucid-archimedes/mnt/databaseApplicationsForGitHub/dbApps07/'
imdbPath = os.path.join(dbPath, 'imdb_class.db')
nbaPath = os.path.join(dbPath, 'nba_5seasons.db')

# Verify databases exist; sqlite3.connect() would otherwise create empty files
for label, path in (("IMDb", imdbPath), ("NBA", nbaPath)):
    if not os.path.exists(path):
        raise FileNotFoundError(f"{label} database not found: {path}")
    print(f"{label} database exists: True")

# Create connections
connImdb = sqlite3.connect(imdbPath)
connNba = sqlite3.connect(nbaPath)

print("\nConnections established successfully!")

---
## Part 1: IMDb Queries (5 Tasks)

### IMDb Task 1: High-Rated Movies with Significant Votes
**Requirements:**
- Find all movies with average rating > 8.5 AND more than 50,000 votes
- Join title_basics with title_ratings
- Show: movie title, release year, average rating, number of votes
- Order by rating descending
- Limit to 15 results

**Hint:** This is a 2-table join with a WHERE clause filtering on rating and numVotes.

In [None]:
# IMDb Task 1: High-rated movies with significant votes
# TODO: Write your SQL query here
# Remember to use pd.read_sql_query(query, connImdb)

### IMDb Task 2: Cast of a Specific Movie
**Requirements:**
- Choose a movie (you can pick 'The Shawshank Redemption', 'Inception', or any movie you know)
- Find all actors/actresses who appeared in that movie
- Join title_basics → title_principals → name_basics
- Show: actor name, character name, category (actor, actress, etc.)
- Order by the ordering column (the order in which they appear in the credits)
- Limit to 20 results

**Hint:** Use a 3-table join. Filter with WHERE primaryTitle = 'Your Movie Name' for an exact match, or use LIKE for a partial match.
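
The difference between an exact match and a LIKE pattern can be seen on a small example. This is a sketch on an in-memory table with invented rows (`demo_titles` and `count_matches` are hypothetical names). It relies on SQLite's default behavior, where `=` compares text case-sensitively and LIKE is case-insensitive for ASCII characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_titles (primaryTitle TEXT)")
conn.executemany(
    "INSERT INTO demo_titles VALUES (?)",
    [("Inception",), ("inception",), ("Inception: The Cobol Job",)],
)

def count_matches(where_clause, value):
    # where_clause is fixed text written here; only the value is a parameter
    sql = f"SELECT COUNT(*) FROM demo_titles WHERE {where_clause}"
    return conn.execute(sql, (value,)).fetchone()[0]

exact_count = count_matches("primaryTitle = ?", "Inception")       # exact match
like_count = count_matches("primaryTitle LIKE ?", "Inception%")    # prefix match
print(exact_count, like_count)
conn.close()
```

A LIKE pattern can therefore return several titles, so it is worth checking which titles matched before joining to the cast tables.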

In [None]:
# IMDb Task 2: Cast of a specific movie
# TODO: Write your SQL query here
# Change the movie title to one you're interested in

### IMDb Task 3: Top 5 Most Common Genres by Title Count
**Requirements:**
- Count how many titles are in each genre
- Join title_basics with title_ratings (needed for the average rating column)
- GROUP BY genres
- Show: genre, title count, average rating
- Order by count descending
- Limit to 5 results

**Hint:** GROUP BY the genres column. Note that genres are stored as comma-separated strings, so each distinct combination forms its own group. Filter out missing genres (stored as \N).
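
To see what grouping on a comma-separated column does, here is a small sketch on an in-memory table with invented rows (`titles` is a hypothetical table, not the real schema). It assumes missing genres may appear either as NULL or as the literal text \N, and filters both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title TEXT, genres TEXT)")
conn.executemany("INSERT INTO titles VALUES (?, ?)", [
    ("t1", "Drama"), ("t2", "Drama"), ("t3", "Drama,Romance"),
    ("t4", "Comedy"), ("t5", "\\N"), ("t6", None),
])

# Raw string so the backslash in '\N' reaches SQLite unchanged
query = r"""
SELECT genres, COUNT(*) AS titleCount
FROM titles
WHERE genres IS NOT NULL AND genres != '\N'
GROUP BY genres
ORDER BY titleCount DESC, genres
"""
genre_rows = conn.execute(query).fetchall()
print(genre_rows)
conn.close()
```

In this toy data, "Drama" and "Drama,Romance" are counted as separate groups, because GROUP BY compares the whole string.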

In [None]:
# IMDb Task 3: Top 5 most common genres
# TODO: Write your SQL query here

### IMDb Task 4: Directors Who Directed Movies in 3+ Different Decades (Complex)
**Requirements:**
- Find directors who have directed movies in at least 3 different decades (1980s, 1990s, 2000s, etc.)
- Join title_crew_person → title_basics → name_basics
- Calculate the decade from startYear (e.g., year 1994 = decade 1990)
- GROUP BY director
- Use HAVING to filter for directors with 3+ decades
- Show: director name, number of decades, year range
- Order by number of decades descending

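**Hint:** Use (CAST(startYear AS INTEGER) / 10) * 10 to extract the decade; casting first ensures integer division even if startYear is stored as text or a float. Use COUNT(DISTINCT decade) in the HAVING clause.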
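
The decade calculation and the COUNT(DISTINCT ...) filter can be tried on a single toy table before adding the joins. This is a sketch under stated assumptions: `films` and its rows are invented. The year is cast to an integer before dividing so that the division truncates regardless of how the column is stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data (hypothetical table and values, not the real IMDb schema)
conn.execute("CREATE TABLE films (director TEXT, startYear INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?)",
    [("A", 1985), ("A", 1994), ("A", 1999), ("A", 2003),
     ("B", 1991), ("B", 1998)],
)

query = """
SELECT director,
       COUNT(DISTINCT (CAST(startYear AS INTEGER) / 10) * 10) AS numDecades,
       MIN(startYear) AS firstYear,
       MAX(startYear) AS lastYear
FROM films
GROUP BY director
HAVING COUNT(DISTINCT (CAST(startYear AS INTEGER) / 10) * 10) >= 3
ORDER BY numDecades DESC
"""
decade_rows = conn.execute(query).fetchall()
print(decade_rows)
conn.close()
```

Director A has films in the 1980s, 1990s, and 2000s and is kept; director B has films in only one decade and is filtered out by HAVING.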

In [None]:
# IMDb Task 4: Directors spanning 3+ decades (COMPLEX)
# TODO: Write your SQL query here
# This requires calculating decades and using a complex HAVING clause

### IMDb Task 5: Open-Ended Query (Your Choice)
**Requirements:**
- Write your own complex query combining JOINs, GROUP BY, and HAVING
- Must use at least 2 tables and at least one aggregation function
- Examples:
  - Actors who appeared with a specific actor in multiple movies
  - Movies with the highest average cast member age (if birth year data available)
  - Genres that improved rating over time (comparing decades)
  - Producers/writers with most highly-rated works
- Write a brief explanation of what your query does

**Hint:** Think about interesting patterns you want to explore in the IMDb data. Be creative!

In [None]:
# IMDb Task 5: Open-Ended Query (Your Creative Query)
# TODO: Write a creative query of your own
# Briefly describe what your query does:
# 
# [Your description here]

---
## Part 2: NBA Queries (3 Tasks)

### NBA Task 6: Player Season Stats with Team Names
**Requirements:**
- Show each player's name, team name, season, and average points per game
- Join players → player_season_stats → teams
- Calculate PPG (points / games played)
- Show: player name, team name, season, games played, total points, PPG
- Filter for players with at least 10 games played in a season
- Order by PPG descending
- Limit to 20 results

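**Hint:** Use a 3-table join. Calculate PPG as pts / gp. If both columns are integers, SQLite performs integer division and drops the fractional part, so convert one operand to a real number first (for example, pts * 1.0 / gp).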
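
The division pitfall is easy to check directly in SQLite. The sketch below uses literal numbers rather than the real columns. It shows that dividing two integers truncates, that forcing one operand to a real number keeps the fraction, and that SQLite returns NULL for division by zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
int_div, real_div, cast_div, zero_div = conn.execute(
    "SELECT 25 / 10, 25 * 1.0 / 10, CAST(25 AS REAL) / 10, 25 / 0"
).fetchone()
print(int_div, real_div, cast_div, zero_div)
conn.close()
```

Either `* 1.0` or `CAST(... AS REAL)` works; wrapping the result in `ROUND(..., 1)` is an optional way to tidy the output.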

In [None]:
# NBA Task 6: Player season stats with team names
# TODO: Write your SQL query here using connNba

### NBA Task 7: Teams That Improved Their Scoring
**Requirements:**
- Find teams that improved their scoring from the 2018-19 season to the 2022-23 season
- Use team_game_stats and teams
- Calculate average points per game for each season (GROUP BY team_id and season)
- Compare 2018-19 avg PPG to 2022-23 avg PPG
- Show: team name, 2018-19 avg PPG, 2022-23 avg PPG, improvement (difference)
- Order by improvement descending

**Hint:** You may need a subquery or multiple CTEs to compare seasons: aggregate each season separately, then join the two results on the team.
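
One possible shape for a two-season comparison is a pair of CTEs joined on the team. The sketch below is illustrative only: the `games` table, its columns, and its rows are invented, and the season labels '2018-19' and '2022-23' are an assumption about how seasons are stored, which should be checked against the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy data (hypothetical table, not the real NBA schema)
conn.execute("CREATE TABLE games (team TEXT, season TEXT, pts INTEGER)")
conn.executemany("INSERT INTO games VALUES (?, ?, ?)", [
    ("X", "2018-19", 100), ("X", "2018-19", 110),
    ("X", "2022-23", 115), ("X", "2022-23", 125),
    ("Y", "2018-19", 108), ("Y", "2018-19", 112),
    ("Y", "2022-23", 104), ("Y", "2022-23", 106),
])

query = """
WITH early AS (
    SELECT team, AVG(pts) AS ppg FROM games
    WHERE season = '2018-19' GROUP BY team
),
late AS (
    SELECT team, AVG(pts) AS ppg FROM games
    WHERE season = '2022-23' GROUP BY team
)
SELECT early.team,
       early.ppg AS earlyPpg,
       late.ppg AS latePpg,
       late.ppg - early.ppg AS improvement
FROM early
JOIN late ON late.team = early.team
WHERE late.ppg > early.ppg
ORDER BY improvement DESC
"""
improved = conn.execute(query).fetchall()
print(improved)
conn.close()
```

Team X improves from 105 to 120 points per game and is returned; team Y declines and is excluded by the WHERE clause. Conditional aggregation with CASE inside AVG would be an alternative to the two CTEs.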

In [None]:
# NBA Task 7: Teams that improved scoring from 2018-19 to 2022-23
# TODO: Write your SQL query here using connNba

### NBA Task 8: Open-Ended Query (Your Choice)
**Requirements:**
- Write your own query using NBA data with at least 2 tables and aggregations
- Examples:
  - Which team has the highest combined assists and steals per game?
  - Player consistency: which players have low variance in points across seasons?
  - Teams with best field goal percentage over the 5-year period
  - Which season had the most competitive play (smallest difference between best and worst teams)?
- Write a brief explanation of what your query does

**Hint:** Explore team performance metrics, player statistics, or trends across the seasons.

In [None]:
# NBA Task 8: Open-Ended Query (Your Creative Query)
# TODO: Write a creative query of your own
# Briefly describe what your query does:
# 
# [Your description here]

---
## Assessment Summary

### Self-Evaluation Checklist
- [ ] All 8 tasks completed with valid SQL
- [ ] JOINs are correctly formatted (correct ON clauses)
- [ ] GROUP BY and HAVING clauses used appropriately
- [ ] Results make logical sense
- [ ] Code is commented and readable
- [ ] Both open-ended tasks (IMDb #5 and NBA #8) show creativity
- [ ] No SQL syntax errors

### Skills Demonstrated
After completing this assessment, you have demonstrated:
1. Multi-table JOIN operations (2-3+ tables)
2. GROUP BY with single and multiple fields
3. HAVING clauses with complex conditions
4. Aggregation functions (COUNT, AVG, SUM, etc.)
5. Filtering and sorting results
6. Working with multiple databases
7. Real-world data analysis and exploration
8. Problem-solving with SQL