# SQL Joins

---

## ✨ Joining Tables

Today, we will review basic SQL joins.

▶️ First, run the code cell below to import modules used for **🧭 Check Your Work** sections and the autograder.

In [None]:
import unittest
import base64
tc = unittest.TestCase()

---

### 🎯 Pre-exercise: Import Packages

#### 👇 Tasks

- ✔️ Import the following Python packages.
    1. `pandas`: Use alias `pd`.
    2. `numpy`: Use alias `np`.
    3. `sqlite3`: No alias

In [2]:
### BEGIN SOLUTION
import pandas as pd
import numpy as np
import sqlite3
### END SOLUTION

#### 🧭 Check your work

In [3]:
import sys
tc.assertTrue('pd' in globals(), 'Check whether you have correctly imported Pandas with an alias.')
tc.assertTrue('np' in globals(), 'Check whether you have correctly imported NumPy with an alias.')
tc.assertTrue('sqlite3' in globals(), 'Check whether you have correctly imported the sqlite3 package.')

---
### 📌 Read SQLite Database File

▶️ Run the code below to select the first 5 rows from the `students`, `courses`, and `enrollments` tables.

In [4]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('course-enrollments-sample.db')

print('students table')
display(pd.read_sql_query('SELECT * FROM students LIMIT 5;', con=conn))

print('========================')

print('courses table')
display(pd.read_sql_query('SELECT * FROM courses LIMIT 5;', con=conn))

print('========================')

print('enrollments table')
display(pd.read_sql_query('SELECT * FROM enrollments LIMIT 5;', con=conn))

print('========================')

conn.close()


#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn_checker = sqlite3.connect('course-enrollments-sample.db')
tables_to_check = ['students', 'courses', 'enrollments']

# Check if table exists
user_tables = list(pd.read_sql_query("SELECT * FROM sqlite_master WHERE type = 'table';", con=conn_checker)['tbl_name'])

for table_to_check in tables_to_check:
    tc.assertTrue(table_to_check in user_tables, f'{table_to_check} does not exist in your course-enrollments-sample.db file!')

conn_checker.close()
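If the check above fails with a `no such table` error, a wrong file path is a likely cause: `sqlite3.connect` does not raise an error when the file is missing, it simply opens a new, empty database. The sketch below illustrates this in a temporary directory. The file name `typo-in-name.db` and the variable names are made up for the example.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical path to a database file that does not exist yet
    db_path = os.path.join(tmp_dir, 'typo-in-name.db')

    # No error here, even though the file was never created by us
    conn_demo = sqlite3.connect(db_path)

    list_tables = "SELECT name FROM sqlite_master WHERE type = 'table';"

    # A brand-new database has no tables at all
    tables_before = [row[0] for row in conn_demo.execute(list_tables)]

    # After creating a table, it shows up in sqlite_master
    conn_demo.execute('CREATE TABLE students (student_id INTEGER);')
    tables_after = [row[0] for row in conn_demo.execute(list_tables)]

    conn_demo.close()

print(tables_before)
print(tables_after)
```

If the table list comes back empty for your own file, double-check that the `.db` file sits in the same folder as the notebook and that the name is spelled correctly.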

---

### 🎯 Exercise 1: Join `students` into `enrollments` table

#### 👇 Tasks

- ✔️ Write a query that joins the `students` table into `enrollments`.
- ✔️ Select all columns.
- ✔️ Store your query in a new variable named `query_joined1`.
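As a quick refresher on how join types differ, here is a minimal sketch on a toy in-memory database. The rows are invented for illustration and are not taken from `course-enrollments-sample.db`. It uses a plain `sqlite3` cursor to stay dependency-free; `pd.read_sql_query` accepts the same queries.

```python
import sqlite3

# Toy in-memory database; all rows are invented for this example
conn_demo = sqlite3.connect(':memory:')
conn_demo.executescript('''
CREATE TABLE students (student_id INTEGER, name TEXT);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO enrollments VALUES (1, 10), (1, 11), (3, 10);
''')

# LEFT JOIN keeps every row of the left table (enrollments),
# filling in NULL where no matching student exists
left_rows = conn_demo.execute('''
SELECT enrollments.student_id, course_id, name
FROM enrollments
LEFT JOIN students
ON enrollments.student_id = students.student_id
ORDER BY enrollments.student_id, course_id;
''').fetchall()

# INNER JOIN keeps only the rows that have a match in both tables
inner_rows = conn_demo.execute('''
SELECT enrollments.student_id, course_id, name
FROM enrollments
INNER JOIN students
ON enrollments.student_id = students.student_id
ORDER BY enrollments.student_id, course_id;
''').fetchall()

conn_demo.close()

print(left_rows)   # [(1, 10, 'Ada'), (1, 11, 'Ada'), (3, 10, None)]
print(inner_rows)  # [(1, 10, 'Ada'), (1, 11, 'Ada')]
```

The enrollment for `student_id` 3 has no matching student, so it survives the `LEFT JOIN` with a missing name and is dropped by the `INNER JOIN`.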

In [None]:
### BEGIN SOLUTION
query_joined1 = '''
SELECT *
FROM enrollments
LEFT JOIN students
ON enrollments.student_id = students.student_id;
'''
### END SOLUTION

conn = sqlite3.connect('course-enrollments-sample.db')
df_result = pd.read_sql_query(query_joined1, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('course-enrollments-sample.db')
df_check = pd.read_sql_query(query_joined1, con=conn)
tc.assertEqual(df_check.shape, (202, 7), 'Incorrect number of rows and/or columns')
conn.close()
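One thing worth noticing about `SELECT *` on a join: the result contains every column from both tables, so the join key appears twice. The sketch below shows this with two toy tables whose columns are assumptions chosen for the example; the real tables may have different columns.

```python
import sqlite3

# Empty toy tables are enough to inspect the resulting column names
conn_demo = sqlite3.connect(':memory:')
conn_demo.executescript('''
CREATE TABLE students (student_id INTEGER, name TEXT);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
''')

cursor = conn_demo.execute('''
SELECT *
FROM enrollments
LEFT JOIN students
ON enrollments.student_id = students.student_id;
''')

# cursor.description holds one entry per result column; the name comes first
column_names = [col[0] for col in cursor.description]
conn_demo.close()

print(column_names)  # ['student_id', 'course_id', 'student_id', 'name']
```

If the duplicated key gets in the way, listing the columns explicitly instead of using `*` avoids it.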

---

### 🎯 Exercise 2: Join all three tables

#### 👇 Tasks

- ✔️ Write a query that joins all three tables.
- ✔️ Select all columns.
- ✔️ Store your query in a new variable named `query_joined2`.
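Joins can be chained: each additional `JOIN ... ON ...` clause attaches one more table to the result built so far. A minimal sketch on invented toy data follows. The `name` and `title` columns are assumptions for the example, not necessarily columns of the real tables.

```python
import sqlite3

# Toy in-memory database; all rows are invented for this example
conn_demo = sqlite3.connect(':memory:')
conn_demo.executescript('''
CREATE TABLE students (student_id INTEGER, name TEXT);
CREATE TABLE courses (course_id INTEGER, title TEXT);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
INSERT INTO students VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO courses VALUES (10, 'Databases'), (11, 'Statistics');
INSERT INTO enrollments VALUES (1, 10), (1, 11), (2, 10);
''')

# enrollments is the linking table: it is joined to students first,
# then the combined result is joined to courses
rows = conn_demo.execute('''
SELECT name, title
FROM enrollments
LEFT JOIN students
ON enrollments.student_id = students.student_id
LEFT JOIN courses
ON enrollments.course_id = courses.course_id
ORDER BY name, title;
''').fetchall()

conn_demo.close()

print(rows)
# [('Ada', 'Databases'), ('Ada', 'Statistics'), ('Grace', 'Databases')]
```

Because each enrollment matches at most one student and one course here, the result has exactly one row per enrollment.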

In [None]:
### BEGIN SOLUTION
query_joined2 = '''
SELECT *
FROM enrollments
LEFT JOIN students
ON enrollments.student_id = students.student_id
LEFT JOIN courses
ON enrollments.course_id = courses.course_id;
'''
### END SOLUTION

conn = sqlite3.connect('course-enrollments-sample.db')
df_result = pd.read_sql_query(query_joined2, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('course-enrollments-sample.db')
df_check = pd.read_sql_query(query_joined2, con=conn)
tc.assertEqual(df_check.shape[0], 202, 'Incorrect number of rows')
conn.close()

---

### 🎯 Exercise 3: Total number of credit hours by class standing

#### 👇 Tasks

- ✔️ Write a query that finds the total number of credit hours by class standing (freshman, sophomore, junior, senior).
- ✔️ Store your query in a new variable named `query_joined3`.

#### 🔑 Expected Output

|    | class     |   SUM(credit_hours) |
|---:|:----------|--------------------:|
|  0 | Freshman  |                 198 |
|  1 | Junior    |                 137 |
|  2 | Senior    |                 128 |
|  3 | Sophomore |                 199 |
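Aggregating over joined tables works like aggregating over a single table: join first, then `GROUP BY` a column from one table and aggregate a column from another. The sketch below uses invented toy data, so its totals are unrelated to the expected output above. The alias `total_credit_hours` and the `ORDER BY` clause are additions for the example that make the column name and row order explicit.

```python
import sqlite3

# Toy in-memory database; all rows are invented for this example
conn_demo = sqlite3.connect(':memory:')
conn_demo.executescript('''
CREATE TABLE students (student_id INTEGER, class TEXT);
CREATE TABLE courses (course_id INTEGER, credit_hours INTEGER);
CREATE TABLE enrollments (student_id INTEGER, course_id INTEGER);
INSERT INTO students VALUES (1, 'Freshman'), (2, 'Senior');
INSERT INTO courses VALUES (10, 3), (11, 4);
INSERT INTO enrollments VALUES (1, 10), (1, 11), (2, 10);
''')

# class comes from students, credit_hours from courses;
# enrollments links the two
totals = conn_demo.execute('''
SELECT class, SUM(credit_hours) AS total_credit_hours
FROM enrollments
LEFT JOIN students
ON enrollments.student_id = students.student_id
LEFT JOIN courses
ON enrollments.course_id = courses.course_id
GROUP BY class
ORDER BY class;
''').fetchall()

conn_demo.close()

print(totals)  # [('Freshman', 7), ('Senior', 3)]
```

The freshman is enrolled in both courses (3 + 4 credit hours) and the senior in one (3 credit hours).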

In [None]:
### BEGIN SOLUTION
query_joined3 = '''
SELECT class, SUM(credit_hours)
FROM enrollments
LEFT JOIN students
ON enrollments.student_id = students.student_id
LEFT JOIN courses
ON enrollments.course_id = courses.course_id
GROUP BY class;
'''
### END SOLUTION

conn = sqlite3.connect('course-enrollments-sample.db')
df_result = pd.read_sql_query(query_joined3, con=conn)
display(df_result)
conn.close()

#### 🧭 Check your work

In [None]:
# DO NOT CHANGE THE CODE IN THIS CELL
conn = sqlite3.connect('course-enrollments-sample.db')
df_check = pd.read_sql_query(query_joined3, con=conn)

df_check.columns = ['A', 'B']
df_correct = pd.DataFrame({'A': ['Freshman', 'Junior', 'Senior', 'Sophomore'],
 'B': [198, 137, 128, 199]})

pd.testing.assert_frame_equal(df_check, df_correct)

conn.close()