# dbApps07b Task: LEFT JOIN Exercises

## Objective
Master LEFT JOIN syntax and understand how it differs from INNER JOIN.
Learn to identify and handle NULL values when rows don't match.

## Key Topics
- LEFT JOIN vs INNER JOIN
- NULL values in unmatched rows
- Using COALESCE to handle NULLs
- WHERE ... IS NULL to find orphan records
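
As a warm-up before the tasks, here is a minimal sketch of how the two join types differ. It uses an in-memory SQLite database with two hypothetical toy tables, `customers` and `orders`, which are assumptions for illustration and are not part of `imdb_class.db` or `nba_5seasons.db`.

```python
import sqlite3

# Hypothetical toy tables, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# INNER JOIN keeps only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

# LEFT JOIN keeps every customer; unmatched ones get NULL (None in Python).
left_rows = conn.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.customer_id
    ORDER BY c.customer_id, o.order_id
""").fetchall()

print(inner_rows)  # [('Ada', 25.0), ('Ada', 40.0), ('Grace', 15.0)]
print(left_rows)   # same three rows, plus ('Linus', None)
conn.close()
```

The LEFT JOIN result has one extra row because Linus has no orders. That extra row is the kind of difference Task 2 asks you to explain.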

## Databases
- **imdb_class.db**: Title and person data from IMDb
- **nba_5seasons.db**: NBA team and player statistics (5 seasons)


In [None]:
# Import necessary libraries
import pandas as pd
import sqlite3

# Connect to both databases
connImdb = sqlite3.connect('/sessions/sweet-lucid-archimedes/mnt/databaseApplicationsForGitHub/dbApps07/imdb_class.db')
connNba = sqlite3.connect('/sessions/sweet-lucid-archimedes/mnt/databaseApplicationsForGitHub/dbApps07/nba_5seasons.db')

print('Connected to imdb_class.db and nba_5seasons.db')

---

## Task 1: LEFT JOIN - Check for Titles Without Ratings
**Instruction:**
Perform a LEFT JOIN from `title_basics` to `title_ratings` on `tconst`.
Check if any titles have NULL values in the `averageRating` column.

Display: `primaryTitle`, `averageRating`, `numVotes`.
Show rows where `averageRating IS NULL` to identify unrated titles.

**Question:** How many titles in the database don't have ratings?


In [None]:
# Your code here


---

## Task 2: Compare Row Counts - INNER JOIN vs LEFT JOIN
**Instruction:**
Run two queries on the same tables (`title_basics` and `title_ratings`):

- **Query A:** INNER JOIN - Count rows
- **Query B:** LEFT JOIN - Count rows

Compare the results. Why are the counts different?


In [None]:
# Your code here


---

## Task 3: NBA - Find Players with NO Season Statistics
**Instruction:**
LEFT JOIN `players` with `player_season_stats` on `player_id`.
Find all players where `season IS NULL` (players with no recorded statistics).

Display: `full_name`, `season`.

**Question:** How many players in the database have no season statistics recorded?


In [None]:
# Your code here


---

## Task 4: Use COALESCE to Replace NULL Values
**Instruction:**
Perform the same LEFT JOIN from Task 1 (`title_basics` and `title_ratings`).
Use COALESCE to replace NULL `averageRating` values with `0`.
Use COALESCE to replace NULL `numVotes` values with `0`.

Display: `primaryTitle`, `coalesced_rating`, `coalesced_votes`.
Limit to first 20 rows.

**Note:** `COALESCE(col1, col2, ...)` returns the first non-NULL value among its arguments, or NULL if every argument is NULL.
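
To illustrate the note above, the following sketch applies COALESCE to the nullable side of a LEFT JOIN. The `products` and `reviews` tables are hypothetical and exist only in memory; they are not tables from the course databases.

```python
import sqlite3

# Hypothetical toy tables, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE reviews (product_id INTEGER, score REAL);
    INSERT INTO products VALUES (1, 'Lamp'), (2, 'Desk');
    INSERT INTO reviews VALUES (1, 4.5);
""")

# 'Desk' has no review, so r.score is NULL for it; COALESCE swaps in 0.
rows = conn.execute("""
    SELECT p.name,
           r.score,
           COALESCE(r.score, 0) AS coalesced_score
    FROM products AS p
    LEFT JOIN reviews AS r ON r.product_id = p.product_id
    ORDER BY p.product_id
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

Selecting the raw column next to the coalesced one makes the substitution visible. Keep in mind that once NULL is replaced with 0, a missing value can no longer be told apart from a genuine 0, which matters if you later compute averages.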


In [None]:
# Your code here


---

## Task 5: LEFT JOIN - Find Titles Without Crew Records
**Instruction:**
LEFT JOIN `title_basics` with `title_crew_person` on `tconst`.
Find all titles where `nconst IS NULL` (titles with no crew information).

Display: `primaryTitle`, `titleType`, `startYear`, `nconst`.

**Question:** How many titles have no associated crew records?


In [None]:
# Your code here


---

## Task 6: NBA - Teams and Win Counts (Including Teams with 0 Wins in a Season)
**Instruction:**
LEFT JOIN `teams` with `team_game_stats`.
For a specific season (e.g., 2017), count wins per team using `COUNT(CASE WHEN wl = 'W' THEN 1 END) as wins`.
Group by team name.

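Include all teams, even those with 0 wins in that season (this shouldn't happen in real data, but it demonstrates LEFT JOIN). To keep such teams in the result, put the season condition in the `ON` clause; in a `WHERE` clause it discards the unmatched rows, and the query behaves like an INNER JOIN.

Display: `full_name`, `wins`, `total_games` (COUNT of a `team_game_stats` column, so that a team with no game records gets 0 rather than 1).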
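
The sketch below shows why the placement of the season filter matters when a LEFT JOIN is combined with conditional counting. It uses hypothetical in-memory tables `clubs` and `matches` with assumed column names; the real `teams` and `team_game_stats` tables may be structured differently.

```python
import sqlite3

# Hypothetical toy tables, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE clubs (club_id INTEGER PRIMARY KEY, club_name TEXT);
    CREATE TABLE matches (club_id INTEGER, season INTEGER, result TEXT);
    INSERT INTO clubs VALUES (1, 'Hawks'), (2, 'Owls'), (3, 'Wrens');
    INSERT INTO matches VALUES
        (1, 2017, 'W'), (1, 2017, 'L'), (1, 2018, 'W'),
        (2, 2017, 'L'),
        (3, 2018, 'W');
""")

# Season filter in ON: clubs without 2017 matches survive with zero counts.
# COUNT(m.club_id) ignores NULLs, so an unmatched club gets 0 games, not 1.
on_filter = conn.execute("""
    SELECT c.club_name,
           COUNT(CASE WHEN m.result = 'W' THEN 1 END) AS wins,
           COUNT(m.club_id) AS total_games
    FROM clubs AS c
    LEFT JOIN matches AS m
           ON m.club_id = c.club_id AND m.season = 2017
    GROUP BY c.club_id, c.club_name
    ORDER BY c.club_id
""").fetchall()

# Season filter in WHERE: rows where m.season is NULL are removed,
# so clubs without 2017 matches disappear from the result.
where_filter = conn.execute("""
    SELECT c.club_name,
           COUNT(CASE WHEN m.result = 'W' THEN 1 END) AS wins,
           COUNT(m.club_id) AS total_games
    FROM clubs AS c
    LEFT JOIN matches AS m ON m.club_id = c.club_id
    WHERE m.season = 2017
    GROUP BY c.club_id, c.club_name
    ORDER BY c.club_id
""").fetchall()

print(on_filter)     # [('Hawks', 1, 2), ('Owls', 0, 1), ('Wrens', 0, 0)]
print(where_filter)  # [('Hawks', 1, 2), ('Owls', 0, 1)]
conn.close()
```

Wrens played only in 2018, so it appears with zero wins and zero games in the first result and is missing from the second.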


In [None]:
# Your code here


---

## Task 7: Explanation - When to Use INNER JOIN vs LEFT JOIN
**Instruction:**
Write a markdown explanation (in the cell below) answering:
1. When should you use INNER JOIN?
2. When should you use LEFT JOIN?
3. What are the pros and cons of each?
4. Can you think of a real-world scenario for each?


Your explanation here...


---

## Task 8: Challenge - Find Orphan Records Using LEFT JOIN and IS NULL
**Instruction:**
Write a query that uses LEFT JOIN and WHERE ... IS NULL to find "orphan" records.

Choose one of these scenarios:
- **Option A:** Find titles in `title_basics` that have no corresponding row in `title_principals` (titles with no cast/crew).
- **Option B:** Find players in `players` that have no corresponding row in `player_season_stats` (for example, retired players with no stats in this dataset).
- **Option C:** Find teams in `teams` that have no games in `team_game_stats` (for example, new teams or a data quality issue).

Display relevant columns and count the results.
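
The general shape of an orphan-record query can be sketched on toy data as follows. The `authors` and `books` tables are hypothetical and in-memory only, so the sketch shows the pattern without answering any of the options above.

```python
import sqlite3

# Hypothetical toy tables, created in memory for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (author_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (book_id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Imani'), (2, 'Jonas'), (3, 'Keiko');
    INSERT INTO books VALUES (100, 1, 'First'), (101, 1, 'Second');
""")

# Keep every author, then retain only rows where the right side found no match.
orphans = conn.execute("""
    SELECT a.name
    FROM authors AS a
    LEFT JOIN books AS b ON b.author_id = a.author_id
    WHERE b.author_id IS NULL
    ORDER BY a.author_id
""").fetchall()

# The same pattern wrapped in COUNT(*) gives the number of orphans.
orphan_count = conn.execute("""
    SELECT COUNT(*)
    FROM authors AS a
    LEFT JOIN books AS b ON b.author_id = a.author_id
    WHERE b.author_id IS NULL
""").fetchone()[0]

print(orphans)       # [('Jonas',), ('Keiko',)]
print(orphan_count)  # 2
conn.close()
```

The IS NULL test here targets the right table's join column. That is usually the safest choice, because a different right-side column could legitimately hold NULL in a matched row and produce false orphans.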


In [None]:
# Your code here


---

## Summary
By completing these tasks, you've learned:
- LEFT JOIN preserves all rows from the left table
- NULL values appear in unmatched rows
- WHERE ... IS NULL finds orphan/missing records
- COALESCE handles NULL values gracefully
- INNER JOIN and LEFT JOIN serve different use cases
