# Has Many Through

### Introduction

In previous lessons, we saw how to work with a `has_many`/`has_one` relationship pattern, like an artist who has created many albums. In this lesson, we'll see how to structure a database for a many-to-many relationship.

### Our Setting 

Imagine we are creating a database for a hospital. We want to keep track of the patients that each doctor has. Notice that here we do not have a `has_many`/`has_one` relationship.

This is because:
* A patient `has_many` doctors **and**
* A doctor `has_many` patients.

Let's try to structure this type of relationship in Excel. To start, here are our tables of doctors and patients.

* Doctors

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-table.png?raw=1" width="50%">

* Patients

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/patients-table.png?raw=1" width="50%">

Now let's think about how we can connect the two.

If we place a `doctor_id` foreign key on the patients table, then we are saying that a patient can have only one doctor, and every time a patient gets another doctor, we would have to add another `doctor_id` column.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/wrong-join-doctors.png?raw=1" width="50%">

But if we place a `patient_id` foreign key on the doctors table, then we are saying that a doctor can have only one patient, and every time a doctor gets another patient, we would have to add another `patient_id` column.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/wrong-doctors.png?raw=1" width="50%">

Either way, we run into the same issue. The problem is that:
* We do not know how many patients a doctor will have, and
* We do not know how many doctors a patient will have.
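To see the problem in code, here is a minimal sketch of the "add another column" design, using an in-memory SQLite database. The column names `doctor_id_1`, `doctor_id_2`, and `doctor_id_3` and all of the rows are hypothetical, made up only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical "wide" design: one column per doctor a patient sees.
conn.execute(
    "CREATE TABLE patients ("
    "id INTEGER PRIMARY KEY, first_name TEXT, doctor_id_1 INTEGER, doctor_id_2 INTEGER)"
)
conn.executemany(
    "INSERT INTO patients VALUES (?, ?, ?, ?)",
    [(1, "Jerry", 1, None), (2, "Elaine", 2, 3)],
)

# Elaine gets a third doctor, so the schema itself has to change.
conn.execute("ALTER TABLE patients ADD COLUMN doctor_id_3 INTEGER")
conn.execute("UPDATE patients SET doctor_id_3 = 5 WHERE id = 2")

# Every question about one doctor now has to check every doctor column.
rows = conn.execute(
    "SELECT first_name FROM patients "
    "WHERE doctor_id_1 = ? OR doctor_id_2 = ? OR doctor_id_3 = ?",
    (5, 5, 5),
).fetchall()
print(rows)  # [('Elaine',)]

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(patients)")]
print(columns)  # ['id', 'first_name', 'doctor_id_1', 'doctor_id_2', 'doctor_id_3']
```

Giving one patient one more doctor required an `ALTER TABLE`, and the query had to repeat its condition once per doctor column.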

### The Solution

We can solve this by creating a join table. In our join table, each row represents one doctor-patient relationship.

* `doctors_patients`

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-patients-join.png?raw=1" width="60%">

Let's take a moment to understand the new table above.

In the table above, the first row represents the relationship between the doctor with `id = 2` and the patient with `id = 3`. To find that doctor's other patient, we can look at the fourth row, which shows that the doctor with `id = 2` also has the patient with `id = 4`.
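Here is one way a join table like this could be declared, as a minimal sketch in SQLite. The `REFERENCES` clauses, the in-memory database, and the sample rows are assumptions for illustration, not the actual schema or data of `hospitals.db`. The sketch shows that the same two columns answer the question in both directions: filter on `doctor_id` to find a doctor's patients, or on `patient_id` to find a patient's doctors.

```python
import sqlite3

# In-memory database, so this sketch never touches hospitals.db.
conn = sqlite3.connect(":memory:")
# SQLite only enforces REFERENCES constraints when this pragma is on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE doctors  (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE patients (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
-- Each row links exactly one doctor to exactly one patient.
CREATE TABLE doctors_patients (
    id         INTEGER PRIMARY KEY,
    doctor_id  INTEGER NOT NULL REFERENCES doctors(id),
    patient_id INTEGER NOT NULL REFERENCES patients(id)
);
""")

conn.executemany("INSERT INTO doctors VALUES (?, ?, ?)",
                 [(2, "Lisa", "Cuddy"), (4, "Robert", "Chase")])
conn.executemany("INSERT INTO patients VALUES (?, ?, ?)",
                 [(3, "Cosmo", "Kramer"), (4, "Costanza", "George")])
# Made-up links: doctor 2 has patients 3 and 4; patient 3 also sees doctor 4.
conn.executemany("INSERT INTO doctors_patients VALUES (?, ?, ?)",
                 [(1, 2, 3), (2, 4, 3), (3, 2, 4)])

# One doctor, many patients...
patients_of_doctor_2 = [row[0] for row in conn.execute(
    "SELECT patient_id FROM doctors_patients WHERE doctor_id = 2 ORDER BY patient_id")]
# ...and one patient, many doctors, from the same table.
doctors_of_patient_3 = [row[0] for row in conn.execute(
    "SELECT doctor_id FROM doctors_patients WHERE patient_id = 3 ORDER BY doctor_id")]
print(patients_of_doctor_2, doctors_of_patient_3)  # [3, 4] [2, 4]

# A link to a doctor that does not exist is rejected by the foreign key.
try:
    conn.execute("INSERT INTO doctors_patients (doctor_id, patient_id) VALUES (99, 3)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```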

### Loading Our Data

```python
import pandas as pd
import sqlite3

conn = sqlite3.connect('./hospitals.db')
```

```python
# CSV exports of the three tables used in this lesson
doctors_patients_url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-sql-curriculum/master/2-sql-relations/4-has-many-through-reading/doctors_patients.csv"
doctors_url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-sql-curriculum/master/2-sql-relations/4-has-many-through-reading/doctors.csv"
patients_url = "https://raw.githubusercontent.com/data-eng-10-21/mod-1-sql-curriculum/master/2-sql-relations/4-has-many-through-reading/patients.csv"

doctors_df = pd.read_csv(doctors_url)
patients_df = pd.read_csv(patients_url)
doctors_patients_df = pd.read_csv(doctors_patients_url)
```

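```python
# Write each DataFrame to its own SQLite table.
# if_exists='replace' lets this cell be re-run without a "table already exists" error.
doctors_df.to_sql('doctors', conn, index=False, if_exists='replace')
patients_df.to_sql('patients', conn, index=False, if_exists='replace')
doctors_patients_df.to_sql('doctors_patients', conn, index=False, if_exists='replace')
```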

```python
cursor = conn.cursor()
# List the tables stored in the database file
cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
cursor.fetchall()
```

```
[('doctors',), ('patients',), ('doctors_patients',)]
```

```python
# Read each table back out of SQLite into a DataFrame
doctors_df = pd.read_sql('select * from doctors;', conn)
patients_df = pd.read_sql('select * from patients;', conn)
doctors_patients_df = pd.read_sql('select * from doctors_patients;', conn)
```

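```python
doctors_df
```

|   | id | first_name | last_name | position |
|---|----|------------|-----------|----------|
| 0 | 1 | Gregory | House | General Practitioner |
| 1 | 2 | Lisa | Cuddy | Chief Doctor |
| 2 | 3 | James | Wilson | Cancer Specialist |
| 3 | 4 | Robert | Chase | Resident |
| 4 | 5 | Eric | Foreman | Practicing Doctor |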


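```python
patients_df
```

|   | id | first_name | last_name | birthday |
|---|----|------------|-----------|----------|
| 0 | 1 | Jerry | Seinfeld | 1962-3-3 |
| 1 | 2 | Elaine | Benis | 1966-4-5 |
| 2 | 3 | Cosmo | Kramer | 1960-5-10 |
| 3 | 4 | Costanza | George | 1962-6-10 |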


### Querying the Data

Now that we have loaded our data into our database, let's begin to ask questions of it.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-patients-join.png?raw=1" width="30%"> <img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-table.png?raw=1" width="30%"> <img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/patients-table.png?raw=1" width="30%">

Let's start by working with just the `doctors_patients` table and build from there.

* Find all of the patients who have the doctor with `id = 2`.

```python
cursor.execute('select * from doctors_patients WHERE doctor_id = 2')
cursor.fetchall()
```

```
[(1, 2, 3), (4, 2, 4)]
```

The first and fourth rows are returned, so the patients of the doctor with `id = 2` are the patients with `id = 3` and `id = 4`.
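Writing `2` directly into the SQL string works, but Python's `sqlite3` module also lets you pass values separately using `?` placeholders, so you never have to build SQL by string formatting. Here is a minimal sketch against a small in-memory copy of the join table; the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE doctors_patients (id INTEGER PRIMARY KEY, doctor_id INTEGER, patient_id INTEGER)"
)
# Made-up rows: doctor 2 treats patients 3 and 4.
conn.executemany(
    "INSERT INTO doctors_patients VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 4, 3), (3, 1, 1), (4, 2, 4)],
)
cursor = conn.cursor()

doctor_id = 2
# The value is passed as a parameter tuple, not pasted into the SQL text.
cursor.execute(
    "SELECT * FROM doctors_patients WHERE doctor_id = ? ORDER BY id", (doctor_id,)
)
rows = cursor.fetchall()
print(rows)  # [(1, 2, 3), (4, 2, 4)]
```

Note the trailing comma in `(doctor_id,)`: the parameters must be a sequence, even when there is only one.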

Next, let's get the names of the patients who have had the doctor with `id = 2`.

```python
statement = '''SELECT patients.* FROM doctors_patients
JOIN patients ON doctors_patients.patient_id = patients.id
WHERE doctor_id = 2'''

cursor.execute(statement)
cursor.fetchall()
```

```
[(3, 'Cosmo', 'Kramer', '1960-5-10'), (4, 'Costanza', 'George', '1962-6-10')]
```

So the patients of the doctor with `id = 2` are Kramer and George.

Let's take a moment to better understand this statement.

We went from this:

```sql
SELECT * FROM doctors_patients WHERE doctor_id = 2
```

To this: 

```sql 
SELECT patients.* FROM doctors_patients 
JOIN patients ON doctors_patients.patient_id = patients.id
WHERE doctor_id = 2
```

In the latter statement, to get the patients' names, we had to join the `patients` table. To link the `doctors_patients` table to the `patients` table, we joined on `doctors_patients.patient_id = patients.id`.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/joined-doctors-patients.png?raw=1" width="60%">

Then we found those rows where the `doctor_id` was 2.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/filtered-join.png?raw=1" width="70%">

Finally, we selected only the patient columns.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/patient-cols.png?raw=1" width="50%">
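If you are more used to pandas, the same three steps (join, filter, then keep only the patient columns) can be sketched with `DataFrame.merge`. The two small DataFrames below are hypothetical stand-ins for the lesson's tables, not the real data.

```python
import pandas as pd

# Hypothetical miniature versions of the lesson's tables.
doctors_patients = pd.DataFrame(
    {"id": [1, 2, 4], "doctor_id": [2, 4, 2], "patient_id": [3, 3, 4]}
)
patients = pd.DataFrame({
    "id": [3, 4],
    "first_name": ["Cosmo", "Costanza"],
    "last_name": ["Kramer", "George"],
})

# Step 1: JOIN patients ON doctors_patients.patient_id = patients.id
joined = doctors_patients.merge(
    patients, left_on="patient_id", right_on="id", suffixes=("_link", "")
)
# Steps 2 and 3: WHERE doctor_id = 2, then SELECT patients.*
result = joined.loc[joined["doctor_id"] == 2, list(patients.columns)]
print(result)
```

Both tables have an `id` column, so `suffixes=("_link", "")` renames the join table's `id` to `id_link` and lets the patient's `id` keep its original name.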

### Queries with three tables

Let's take a look at the data from our three tables again.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-patients-join.png?raw=1" width="30%"> <img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-table.png?raw=1" width="30%"> <img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/patients-table.png?raw=1" width="30%">

Let's say that now we want to find the patients of Lisa Cuddy. To do so, we need a query that involves all three tables.

Our previous statement was pretty close.

```sql 
SELECT patients.* FROM doctors_patients 
JOIN patients ON doctors_patients.patient_id = patients.id
WHERE doctor_id = 2
```

The change we would like to make is to replace `WHERE doctor_id = 2` with `WHERE doctors.last_name = 'Cuddy'`.

To do that, we also need to join the `doctors` table to our `doctors_patients` table. Here's how we can do that:

```sql 
SELECT patients.* FROM doctors_patients 
JOIN patients ON doctors_patients.patient_id = patients.id
JOIN doctors ON doctors_patients.doctor_id = doctors.id
WHERE doctors.last_name = 'Cuddy'
```

```python
select = """SELECT patients.* FROM doctors_patients
JOIN patients ON doctors_patients.patient_id = patients.id
JOIN doctors ON doctors_patients.doctor_id = doctors.id
WHERE doctors.last_name = 'Cuddy'"""

cursor.execute(select)
cursor.fetchall()
```

```
[(3, 'Cosmo', 'Kramer', '1960-5-10'), (4, 'Costanza', 'George', '1962-6-10')]
```

It worked! We got the same result, but this time we used the doctor's name instead of the doctor's id. Finally, we can match on both the first and last name, which is safer if two doctors share a last name.
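If you plan to look up patients for several doctors, one option is to wrap the three-table query in a small function and pass the names as `?` parameters. `patients_of` is a hypothetical helper, and the in-memory tables and rows below are made-up stand-ins for `hospitals.db`.

```python
import sqlite3


def patients_of(conn, first_name, last_name):
    """Return (id, first_name, last_name) rows for one doctor's patients.

    Hypothetical helper; assumes the lesson's table and column names.
    """
    query = """
        SELECT patients.id, patients.first_name, patients.last_name
        FROM doctors_patients
        JOIN patients ON doctors_patients.patient_id = patients.id
        JOIN doctors  ON doctors_patients.doctor_id  = doctors.id
        WHERE doctors.first_name = ? AND doctors.last_name = ?
        ORDER BY patients.id
    """
    return conn.execute(query, (first_name, last_name)).fetchall()


# Small in-memory example with made-up rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doctors  (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE patients (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE doctors_patients (id INTEGER PRIMARY KEY, doctor_id INTEGER, patient_id INTEGER);
INSERT INTO doctors  VALUES (1, 'Gregory', 'House'), (2, 'Lisa', 'Cuddy');
INSERT INTO patients VALUES (1, 'Jerry', 'Seinfeld'), (3, 'Cosmo', 'Kramer'), (4, 'Costanza', 'George');
INSERT INTO doctors_patients VALUES (1, 2, 3), (2, 1, 1), (3, 2, 4);
""")

cuddy_patients = patients_of(conn, "Lisa", "Cuddy")
print(cuddy_patients)  # [(3, 'Cosmo', 'Kramer'), (4, 'Costanza', 'George')]
```

A doctor with no matching rows simply returns an empty list.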

```python
updated_select = """SELECT patients.* FROM doctors_patients
JOIN patients ON doctors_patients.patient_id = patients.id
JOIN doctors ON doctors_patients.doctor_id = doctors.id
WHERE doctors.first_name = 'Lisa' AND doctors.last_name = 'Cuddy'"""

cursor.execute(updated_select)
cursor.fetchall()
```

```
[(3, 'Cosmo', 'Kramer', '1960-5-10'), (4, 'Costanza', 'George', '1962-6-10')]
```

### Your Turn

Now it's your turn.  Once again, here are our tables.

<img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-patients-join.png?raw=1" width="30%"> <img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/doctors-table.png?raw=1" width="30%"> <img src="https://github.com/data-eng-10-21/mod-1-sql-curriculum/blob/master/2-sql-relations/4-has-many-through-reading/patients-table.png?raw=1" width="30%">

Now write a SQL query that finds all of the doctors who have treated Cosmo Kramer.

```python
select_doctors = """
SELECT doctors.* FROM doctors_patients
JOIN doctors ON doctors.id = doctors_patients.doctor_id
JOIN patients ON patients.id = doctors_patients.patient_id
WHERE patients.first_name = 'Cosmo' AND patients.last_name = 'Kramer'
"""

cursor.execute(select_doctors)
cursor.fetchall()
```

```
[(2, 'Lisa', 'Cuddy', 'Chief Doctor'), (4, 'Robert', 'Chase', 'Resident')]
```

### Answers

```python
select_doctors = """SELECT doctors.* FROM doctors_patients
JOIN patients ON doctors_patients.patient_id = patients.id
JOIN doctors ON doctors_patients.doctor_id = doctors.id
WHERE patients.first_name = 'Cosmo' AND patients.last_name = 'Kramer'"""
```
