# dbApps07: SQL JOINs
## Database Applications Development (145085)
## Medina County Career Center

This lesson covers the fundamental concepts and practical applications of SQL JOINs, including INNER JOINs, LEFT JOINs, and multi-table JOINs with aggregation functions.

## Setup & Imports

Connect to both the IMDb and NBA databases. Each JOIN in this lesson runs inside a single database; the two connections simply give us two different datasets to practice on.

In [None]:
import pandas as pd
import sqlite3

# Connect to IMDb database
connImdb = sqlite3.connect('imdb_class.db')

# Connect to NBA database
connNba = sqlite3.connect('nba_5seasons.db')

print("Both database connections established successfully.")

---
## Sub-Lesson 07a — INNER JOIN

An **INNER JOIN** returns only the rows where there is a match in BOTH tables. This is the most commonly used type of join.

### Syntax:
```sql
SELECT columns
FROM table1
INNER JOIN table2
ON table1.key = table2.key
```

The **ON clause** specifies the condition that links rows from the two tables together.
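
To see the ON clause at work on data small enough to check by hand, here is a minimal sketch using an in-memory SQLite database. The `films` and `scores` tables and their rows are hypothetical and are not part of the class databases.

```python
import sqlite3

# Hypothetical toy tables (NOT from imdb_class.db), built in memory
connDemo = sqlite3.connect(":memory:")
connDemo.executescript("""
CREATE TABLE films (film_id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE scores (film_id INTEGER, rating REAL);
INSERT INTO films VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
INSERT INTO scores VALUES (1, 7.5), (3, 6.0), (4, 9.1);
""")

# Only film_id values present in BOTH tables survive the INNER JOIN
innerRows = connDemo.execute("""
SELECT f.title, s.rating
FROM films f
INNER JOIN scores s ON f.film_id = s.film_id
ORDER BY f.film_id
""").fetchall()

print(innerRows)
connDemo.close()
```

'Beta' has no row in `scores`, and the score for `film_id` 4 has no row in `films`, so neither appears: the result holds only 'Alpha' and 'Gamma'.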

### Example 1: IMDb — Basic INNER JOIN

Join `title_basics` with `title_ratings` to show movie titles along with their ratings.

**Explanation:**
- We use table aliases (`tb` for title_basics, `tr` for title_ratings) to make the query cleaner
- The ON clause matches rows that share the same `tconst`, the title identifier that appears in both tables
- Only titles that have ratings will be returned

In [None]:
# Query: Show movie titles with their average ratings
queryInnerJoin1 = """
SELECT 
    tb.primaryTitle,
    tb.startYear,
    tr.averageRating,
    tr.numVotes
FROM title_basics tb
INNER JOIN title_ratings tr ON tb.tconst = tr.tconst
WHERE tb.titleType = 'movie'
LIMIT 10
"""

resultInnerJoin1 = pd.read_sql_query(queryInnerJoin1, connImdb)
print("IMDb Movies with Ratings (INNER JOIN):")
print(resultInnerJoin1)

### Example 2: NBA — INNER JOIN with Teams and Games

Join `teams` with `team_game_stats` to show each team's name along with the number of games it has on record for the 2023 season.

In [None]:
# Query: Count each team's games in the 2023 season
# Grouping by team (not by game_id) lets COUNT return one total per team
queryInnerJoin2 = """
SELECT 
    t.full_name,
    t.abbreviation,
    gs.season,
    COUNT(gs.game_id) as gameCount
FROM teams t
INNER JOIN team_game_stats gs ON t.team_id = gs.team_id
WHERE gs.season = 2023
GROUP BY t.team_id, t.full_name, t.abbreviation, gs.season
ORDER BY gameCount DESC
LIMIT 5
"""

resultInnerJoin2 = pd.read_sql_query(queryInnerJoin2, connNba)
print("NBA Teams with Game Statistics (INNER JOIN):")
print(resultInnerJoin2)

### Try This 7a.1

Write an INNER JOIN query that shows:
- The movie title (from title_basics)
- The rating (from title_ratings)
- Filter for movies released between 2015 and 2020
- Order by rating descending
- Show top 5 results

In [None]:
# Write your query here
queryTryThis7a1 = """

"""

# resultTryThis7a1 = pd.read_sql_query(queryTryThis7a1, connImdb)
# print(resultTryThis7a1)

### Try This 7a.2

Write an INNER JOIN query for the NBA database that shows:
- Player names (from players table)
- Points per game (from player_season_stats)
- Season 2023
- Players who scored more than 10 points per game
- Order by points descending

In [None]:
# Write your query here
queryTryThis7a2 = """

"""

# resultTryThis7a2 = pd.read_sql_query(queryTryThis7a2, connNba)
# print(resultTryThis7a2)

---
## Sub-Lesson 07b — LEFT JOIN

A **LEFT JOIN** returns all rows from the left table, even if there is no match in the right table. Unmatched rows will have NULL values for columns from the right table.

### Syntax:
```sql
SELECT columns
FROM table1
LEFT JOIN table2
ON table1.key = table2.key
```

**Key Difference from INNER JOIN:**
- INNER JOIN: Only matching rows
- LEFT JOIN: All rows from left table + matching rows from right table
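
The difference is easiest to see by running the same query twice on a tiny dataset and changing only the join type. This is a sketch on hypothetical in-memory tables (`athletes`, `stat_lines`), not on the NBA database.

```python
import sqlite3

# Hypothetical toy tables (NOT from nba_5seasons.db), built in memory
connDemo = sqlite3.connect(":memory:")
connDemo.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stat_lines (athlete_id INTEGER, pts REAL);
INSERT INTO athletes VALUES (1, 'Avery'), (2, 'Blake'), (3, 'Casey');
INSERT INTO stat_lines VALUES (1, 12.5), (3, 8.0);
""")

# Same query text; only the join keyword changes
queryTemplate = """
SELECT a.name, s.pts
FROM athletes a
{joinType} JOIN stat_lines s ON a.athlete_id = s.athlete_id
ORDER BY a.athlete_id
"""

innerRows = connDemo.execute(queryTemplate.format(joinType="INNER")).fetchall()
leftRows = connDemo.execute(queryTemplate.format(joinType="LEFT")).fetchall()

print("INNER:", innerRows)
print("LEFT: ", leftRows)
connDemo.close()
```

The INNER JOIN returns two rows (Avery and Casey). The LEFT JOIN returns three: Blake is kept, with `None` (SQL NULL) in the `pts` column because there is no matching stat line.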

### Example 3: Comparing INNER vs LEFT JOIN

First, let's see how many records we get with an INNER JOIN vs a LEFT JOIN using the NBA players table.

In [None]:
# INNER JOIN: Only players who have season stats
queryInnerJoinNBA = """
SELECT 
    COUNT(DISTINCT p.player_id) as playerCount
FROM players p
INNER JOIN player_season_stats pss ON p.player_id = pss.player_id
WHERE pss.season = 2023
"""

innerCount = pd.read_sql_query(queryInnerJoinNBA, connNba)
print("Players with stats (INNER JOIN):")
print(innerCount)

In [None]:
# LEFT JOIN: All players, whether or not they have 2023 stats.
# The season filter goes in the ON clause; in WHERE it would drop
# players whose only stats come from other seasons.
queryLeftJoinNBA = """
SELECT 
    COUNT(DISTINCT p.player_id) as playerCount
FROM players p
LEFT JOIN player_season_stats pss 
    ON p.player_id = pss.player_id AND pss.season = 2023
"""

leftCount = pd.read_sql_query(queryLeftJoinNBA, connNba)
print("Total players (LEFT JOIN):")
print(leftCount)

### Example 4: LEFT JOIN with NULLs

Show all players and their season statistics. Notice how some players will have NULL values if they don't have stats for that season.

In [None]:
# Query: All players with their 2023 stats (or NULL if no 2023 stats)
# The season filter sits in ON, not WHERE; a WHERE filter on pss.season
# would discard the NULL rows and make this behave like an INNER JOIN.
queryLeftJoin = """
SELECT 
    p.full_name,
    pss.season,
    pss.gp as gamesPlayed,
    pss.pts as pointsPerGame,
    pss.reb as reboundsPerGame,
    pss.ast as assistsPerGame
FROM players p
LEFT JOIN player_season_stats pss 
    ON p.player_id = pss.player_id AND pss.season = 2023
LIMIT 15
"""

resultLeftJoin = pd.read_sql_query(queryLeftJoin, connNba)
print("NBA Players with Season Stats (LEFT JOIN):")
print(resultLeftJoin)
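
With a LEFT JOIN, a filter on a right-table column behaves differently depending on whether it is written in the ON clause or the WHERE clause. The sketch below shows both placements on hypothetical in-memory tables (`athletes`, `stat_lines`); the table names and rows are made up for illustration.

```python
import sqlite3

# Hypothetical toy tables (NOT from nba_5seasons.db), built in memory
connDemo = sqlite3.connect(":memory:")
connDemo.executescript("""
CREATE TABLE athletes (athlete_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE stat_lines (athlete_id INTEGER, season INTEGER, pts REAL);
INSERT INTO athletes VALUES (1, 'Avery'), (2, 'Blake'), (3, 'Casey');
INSERT INTO stat_lines VALUES (1, 2023, 12.5), (2, 2022, 9.0);
""")

# Filter in WHERE: applied AFTER the join, so rows whose season is NULL
# (or a different year) are removed and unmatched athletes disappear
whereRows = connDemo.execute("""
SELECT a.name, s.pts
FROM athletes a
LEFT JOIN stat_lines s ON a.athlete_id = s.athlete_id
WHERE s.season = 2023
ORDER BY a.athlete_id
""").fetchall()

# Filter in ON: applied WHILE matching, so every athlete is kept and
# those without a 2023 stat line get NULL in the right-table columns
onRows = connDemo.execute("""
SELECT a.name, s.pts
FROM athletes a
LEFT JOIN stat_lines s
    ON a.athlete_id = s.athlete_id AND s.season = 2023
ORDER BY a.athlete_id
""").fetchall()

print("WHERE filter:", whereRows)
print("ON filter:   ", onRows)
connDemo.close()
```

The WHERE version returns only Avery. The ON version returns all three athletes, with `None` for Blake (whose only stat line is from 2022) and Casey (who has none at all).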

### Try This 7b.1

Write a LEFT JOIN query for the IMDb database that shows:
- All TV shows from title_basics
- Their ratings from title_ratings (if available)
- Show records where ratings are NULL (unrated shows)
- Limit to 10 results

In [None]:
# Write your query here
queryTryThis7b1 = """

"""

# resultTryThis7b1 = pd.read_sql_query(queryTryThis7b1, connImdb)
# print(resultTryThis7b1)

### Try This 7b.2

Write a LEFT JOIN query that shows all teams and their total game count for 2023.
- Include teams that may not have games in the dataset
- Use GROUP BY and COUNT to aggregate

In [None]:
# Write your query here
queryTryThis7b2 = """

"""

# resultTryThis7b2 = pd.read_sql_query(queryTryThis7b2, connNba)
# print(resultTryThis7b2)

---
## Sub-Lesson 07c — Multi-Table JOINs & GROUP BY

You can join more than two tables together to answer complex questions. Combine JOINs with GROUP BY, WHERE, and HAVING clauses for powerful data analysis.

### Syntax:
```sql
SELECT columns, aggregate_function(column)
FROM table1
JOIN table2 ON table1.key = table2.key
JOIN table3 ON table2.key = table3.key
WHERE condition
GROUP BY column
HAVING aggregate_condition
ORDER BY column
```
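
The clause order above can be traced on a dataset small enough to verify by hand. This is a minimal sketch on hypothetical in-memory tables (`people`, `credits`, `films`), not on the IMDb schema: WHERE filters individual rows first, GROUP BY then forms one group per person, and HAVING filters those groups.

```python
import sqlite3

# Hypothetical toy tables (NOT from imdb_class.db), built in memory
connDemo = sqlite3.connect(":memory:")
connDemo.executescript("""
CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE films (film_id INTEGER PRIMARY KEY, title TEXT, year INTEGER);
CREATE TABLE credits (person_id INTEGER, film_id INTEGER);
INSERT INTO people VALUES (1, 'Avery'), (2, 'Blake');
INSERT INTO films VALUES (10, 'Alpha', 2019), (11, 'Beta', 2020), (12, 'Gamma', 2021);
INSERT INTO credits VALUES (1, 10), (1, 11), (1, 12), (2, 11);
""")

# people -> credits -> films, then filter rows, group, and filter groups
groupedRows = connDemo.execute("""
SELECT p.name, COUNT(*) AS filmCount
FROM people p
JOIN credits c ON p.person_id = c.person_id
JOIN films f ON c.film_id = f.film_id
WHERE f.year >= 2020
GROUP BY p.person_id, p.name
HAVING COUNT(*) >= 2
ORDER BY filmCount DESC
""").fetchall()

print(groupedRows)
connDemo.close()
```

Avery has three credits, but the WHERE clause removes the 2019 film before grouping, leaving a count of 2. Blake has one qualifying film, so the HAVING clause removes that group and only Avery remains.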

### Example 5: 3-Table JOIN — Find Actor Names for Movies

Connect three tables: title_basics → title_principals → name_basics to find which actors appeared in which movies.

In [None]:
# Query: Join 3 tables to find actors in movies
queryMultiJoin1 = """
SELECT 
    tb.primaryTitle as movieTitle,
    tb.startYear,
    nb.primaryName as actorName,
    tp.category as role
FROM title_basics tb
INNER JOIN title_principals tp ON tb.tconst = tp.tconst
INNER JOIN name_basics nb ON tp.nconst = nb.nconst
WHERE tb.titleType = 'movie' 
  AND tp.category IN ('actor', 'actress')
  AND tb.startYear = 2020
LIMIT 10
"""

resultMultiJoin1 = pd.read_sql_query(queryMultiJoin1, connImdb)
print("Actors in 2020 Movies (3-Table JOIN):")
print(resultMultiJoin1)

### Example 6: Multi-Table JOIN with GROUP BY

Count how many movies each actor has appeared in.

In [None]:
# Query: Count movies per actor
queryMultiJoin2 = """
SELECT 
    nb.primaryName as actorName,
    COUNT(DISTINCT tb.tconst) as movieCount
FROM name_basics nb
INNER JOIN title_principals tp ON nb.nconst = tp.nconst
INNER JOIN title_basics tb ON tp.tconst = tb.tconst
WHERE tp.category IN ('actor', 'actress')
  AND tb.titleType = 'movie'
GROUP BY nb.nconst, nb.primaryName
ORDER BY movieCount DESC
LIMIT 10
"""

resultMultiJoin2 = pd.read_sql_query(queryMultiJoin2, connImdb)
print("Top Actors by Movie Count:")
print(resultMultiJoin2)

### Example 7: Complex Query — JOINs + WHERE + GROUP BY + HAVING + ORDER BY

Find the average movie rating per actor for actors who have appeared in at least 5 movies.

In [None]:
# Query: Average rating per actor (with filters)
queryMultiJoin3 = """
SELECT 
    nb.primaryName as actorName,
    COUNT(DISTINCT tb.tconst) as movieCount,
    ROUND(AVG(tr.averageRating), 2) as avgMovieRating
FROM name_basics nb
INNER JOIN title_principals tp ON nb.nconst = tp.nconst
INNER JOIN title_basics tb ON tp.tconst = tb.tconst
INNER JOIN title_ratings tr ON tb.tconst = tr.tconst
WHERE tp.category IN ('actor', 'actress')
  AND tb.titleType = 'movie'
GROUP BY nb.nconst, nb.primaryName
HAVING COUNT(DISTINCT tb.tconst) >= 5
ORDER BY avgMovieRating DESC
LIMIT 10
"""

resultMultiJoin3 = pd.read_sql_query(queryMultiJoin3, connImdb)
print("Top Actors by Average Movie Rating (Min 5 Movies):")
print(resultMultiJoin3)

### Example 8: NBA Multi-Table Analysis

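Find the average points per game of the players on each team for the 2023 season, along with the number of player stat lines per team.

In [None]:
# Query: Average player points per game, by team (2023)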
queryNBAMultiJoin = """
SELECT 
    t.full_name as teamName,
    pss.season,
    COUNT(pss.player_id) as playerCount,
    ROUND(AVG(pss.pts), 2) as avgPointsPerGame
FROM teams t
INNER JOIN player_season_stats pss ON t.team_id = pss.team_id
WHERE pss.season = 2023
GROUP BY t.team_id, t.full_name, pss.season
ORDER BY avgPointsPerGame DESC
LIMIT 10
"""

resultNBAMultiJoin = pd.read_sql_query(queryNBAMultiJoin, connNba)
print("NBA Teams - Average Points Per Game (2023):")
print(resultNBAMultiJoin)

### Try This 7c.1

Write a multi-table JOIN query that shows:
- Movie title
- Number of actors/actresses in the movie
- Average rating of the movie
- Filter for movies with at least 20 actors
- Order by movie rating descending
- Limit to 10 results

In [None]:
# Write your query here
queryTryThis7c1 = """

"""

# resultTryThis7c1 = pd.read_sql_query(queryTryThis7c1, connImdb)
# print(resultTryThis7c1)

### Try This 7c.2

Write a multi-table JOIN query for the NBA that shows:
- Player name
- Team name
- Average points, rebounds, and assists for 2023
- Players who averaged more than 20 points per game
- Order by points descending

In [None]:
# Write your query here
queryTryThis7c2 = """

"""

# resultTryThis7c2 = pd.read_sql_query(queryTryThis7c2, connNba)
# print(resultTryThis7c2)

### Try This 7c.3

Write a complex query that demonstrates:
- At least 2 JOINs (you choose the tables)
- WHERE clause with multiple conditions
- GROUP BY aggregation
- HAVING clause to filter grouped results
- ORDER BY

You can use either database. Include a comment explaining what your query does.

In [None]:
# Write your query here
# Description: 
queryTryThis7c3 = """

"""

# resultTryThis7c3 = pd.read_sql_query(queryTryThis7c3, connImdb)  # use connNba for an NBA query
# print(resultTryThis7c3)

---
## Summary

You've now covered two types of SQL JOINs and how to chain them across several tables:

1. **INNER JOIN**: Returns only matching rows from both tables
2. **LEFT JOIN**: Returns all rows from the left table, plus matching rows from the right table (with NULLs for non-matches)
3. **Multi-Table JOINs**: Connect 3+ tables together for complex analysis

When combined with GROUP BY, WHERE, HAVING, and ORDER BY, JOINs allow you to answer sophisticated business questions and extract meaningful insights from relational databases.