# Reviewing SQL Joins - Inner and Outer Joins

There are several types of joins to choose from: an inner join, a left outer join, and a full outer join. In this lesson, we'll review inner joins and left outer joins and learn when to use each.
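The rest of the lesson works through inner and left joins. As a rough sketch of the third type, the snippet below emulates a full outer join in SQLite. A left join keeps every person, and a second query appends the addresses that match no person. It assumes an in-memory database holding the same rows this lesson later writes to `crm.db`. Native `FULL OUTER JOIN` syntax only arrived in SQLite 3.39.0, so the `union all` form is used here because it also works on older versions.

```python
import sqlite3

# In-memory copy of the lesson's two tables (same rows as crm.db)
conn = sqlite3.connect(':memory:')
conn.executescript('''
create table persons (personId integer, firstName text, lastName text);
create table addresses (addressId integer, personId integer, city text, state text);
insert into persons values (1, 'Allen', 'Wang'), (2, 'Bob', 'Alice');
insert into addresses values (1, 2, 'New York City', 'New York'),
                             (2, 3, 'Leetcode', 'California');
''')

# Full outer join = left join (every person, matched or not)
#                 + addresses whose personId matches no person
full_outer_sql = '''
select persons.personId, firstName, addresses.personId, city
from persons
left join addresses on addresses.personId = persons.personId
union all
select persons.personId, firstName, addresses.personId, city
from addresses
left join persons on addresses.personId = persons.personId
where persons.personId is null
'''

rows = conn.execute(full_outer_sql).fetchall()
for row in rows:
    print(row)
```

Each unmatched side shows up once, padded with nulls (`None` in Python) on the other side.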

### Loading our data

We begin by loading our data into two small tables, `persons` and `addresses`, in a SQLite database.

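```python
import sqlite3

import pandas as pd

# Connect to (or create) the SQLite database file used in this lesson
conn = sqlite3.connect('crm.db')

def build_dataframe(data):
    """Build a DataFrame from a dict with 'headers' (column names) and 'values' (rows)."""
    return pd.DataFrame(data['values'], columns=data['headers'])
```

```python
# Rows for the persons table: personId, firstName, lastName
data = {"headers": ["personId", "firstName", "lastName"], "values": [[1, "Allen", "Wang"], [2, "Bob", "Alice"]]}
```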

```python
df = build_dataframe(data)
```

```python
# Write the DataFrame to a table named persons, replacing any existing table
df.to_sql('persons', conn, if_exists='replace', index=False)
```


```python
# Rows for the addresses table: addressId, personId, city, state
data = {"headers": ["addressId", "personId", "city", "state"], "values": [[1, 2, "New York City", "New York"], [2, 3, "Leetcode", "California"]]}
```

```python
df = build_dataframe(data)
```

```python
# Write the DataFrame to a table named addresses, replacing any existing table
df.to_sql('addresses', conn, if_exists='replace', index=False)
```

### Viewing our data

Now let's take a look at our data.

```python
pd.read_sql('select * from persons', conn)
```

|   | personId | firstName | lastName |
|---|---|---|---|
| 0 | 1 | Allen | Wang |
| 1 | 2 | Bob | Alice |


```python
pd.read_sql('select * from addresses', conn)
```

|   | addressId | personId | city | state |
|---|---|---|---|---|
| 0 | 1 | 2 | New York City | New York |
| 1 | 2 | 3 | Leetcode | California |


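The column that links our two tables is `personId`. Each address row stores the `personId` of the person it belongs to. Note that Allen (`personId` 1) has no address, and the Leetcode address points to a `personId` (3) that does not exist in `persons`.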
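One way to picture what each join keeps is to compare the `personId` values on each side. The sketch below uses plain Python sets rather than SQL. It assumes `personId` is unique within each table, as it is in this small example, so each key stands for exactly one row.

```python
# personId values in each table (from the data loaded above)
person_ids = {1, 2}
address_person_ids = {2, 3}

inner_keys = person_ids & address_person_ids        # matched on both sides
left_keys = set(person_ids)                         # every person, matched or not
full_keys = person_ids | address_person_ids         # every key from either side
unmatched_people = person_ids - address_person_ids  # rows a left join pads with nulls

print(inner_keys, left_keys, full_keys, unmatched_people)
```

With duplicate keys, a real join can return more rows than these sets suggest, so treat this only as a picture of which keys survive.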

### Some queries

Now write a query that displays only the rows where both the person *and* the related address exist.

```python
pd.read_sql('''
select persons.personId, persons.firstName, persons.lastName,
       addressId, addresses.personId, city, state
from persons
join addresses on addresses.personId = persons.personId  -- "join" alone means "inner join"
''', conn)
```

|   | personId | firstName | lastName | addressId | personId | city | state |
|---|---|---|---|---|---|---|---|
| 0 | 2 | Bob | Alice | 1 | 2 | New York City | New York |
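For comparison, pandas can perform the same inner join on DataFrames with `DataFrame.merge`. This is a swapped-in pandas equivalent, not part of the SQL exercise. It rebuilds the two tables in memory so that it runs on its own. Because the merge is done `on='personId'`, the result has a single `personId` column instead of the two in the SQL output.

```python
import pandas as pd

persons = pd.DataFrame([[1, 'Allen', 'Wang'], [2, 'Bob', 'Alice']],
                       columns=['personId', 'firstName', 'lastName'])
addresses = pd.DataFrame([[1, 2, 'New York City', 'New York'],
                          [2, 3, 'Leetcode', 'California']],
                         columns=['addressId', 'personId', 'city', 'state'])

# how='inner' keeps only personId values present in both DataFrames
inner = persons.merge(addresses, on='personId', how='inner')
print(inner)
```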


Next, write a query that returns only firstName, lastName, city, and state. If no address is available for a person, return null for city and state.

```python
pd.read_sql('''
select firstName, lastName, city, state
from persons
left join addresses on addresses.personId = persons.personId
''', conn)
```

|   | firstName | lastName | city | state |
|---|---|---|---|---|
| 0 | Allen | Wang | None | None |
| 1 | Bob | Alice | New York City | New York |
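The same left join can be sketched in pandas with `DataFrame.merge(how='left')`. This is again a swapped-in pandas equivalent that rebuilds the tables in memory. Passing `indicator=True` adds a `_merge` column that records whether each row found a match (`'both'`) or was kept only because it came from the left table (`'left_only'`).

```python
import pandas as pd

persons = pd.DataFrame([[1, 'Allen', 'Wang'], [2, 'Bob', 'Alice']],
                       columns=['personId', 'firstName', 'lastName'])
addresses = pd.DataFrame([[1, 2, 'New York City', 'New York'],
                          [2, 3, 'Leetcode', 'California']],
                         columns=['addressId', 'personId', 'city', 'state'])

# how='left' keeps every person; unmatched rows get NaN for the address columns.
# indicator=True adds '_merge' showing 'both' or 'left_only' for each row.
left = persons.merge(addresses, on='personId', how='left', indicator=True)
print(left[['firstName', 'lastName', 'city', 'state', '_merge']])
```

pandas fills the missing values with `NaN` rather than `None`.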


Sometimes we would like to count how many values in a column are present (not null), or how many are missing.

> **Do not** use a `case when` expression for this.

```python
# count(column) counts only the non-null values in that column.
# Non-aggregated columns such as firstName are left out: next to an aggregate,
# SQLite would fill them from an arbitrary row, which is misleading.
query = '''
select count(city) as num_of_city, count(state) as num_of_state
from persons
left join addresses on addresses.personId = persons.personId
'''

pd.read_sql(query, conn)
```

|   | num_of_city | num_of_state |
|---|---|---|
| 0 | 1 | 1 |
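To see why `count` works here, the sketch below runs it on a throwaway one-column table that contains a null. The table name `t` and its values are made up for illustration. `count(*)` counts rows, `count(x)` skips nulls, and the difference between them is the number of missing values.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table t (x integer)')  # hypothetical demo table
conn.executemany('insert into t values (?)', [(1,), (None,), (3,)])

# count(*) counts rows; count(x) counts only the non-null x values
total, present = conn.execute('select count(*), count(x) from t').fetchone()
missing = total - present
print(total, present, missing)  # 3 2 1
```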


Now compute the same counts using a `case when` expression.

```python
# Add 1 for every non-null value, which reproduces count(city) and count(state).
# A left join is used so that people without an address are included, as before.
query = '''
select sum(case when city is not null then 1 else 0 end) as num_of_city,
       sum(case when state is not null then 1 else 0 end) as num_of_state
from persons
left join addresses on addresses.personId = persons.personId
'''

pd.read_sql(query, conn)
```

|   | num_of_city | num_of_state |
|---|---|---|
| 0 | 1 | 1 |
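As a quick check that the `case when` form agrees with `count`, the sketch below compares the two on the same kind of throwaway table. The table `t` is again hypothetical. Changing the condition to `is null` counts the missing values instead.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('create table t (x integer)')  # hypothetical demo table
conn.executemany('insert into t values (?)', [(1,), (None,), (3,)])

row = conn.execute('''
select count(x),                                          -- present values
       sum(case when x is not null then 1 else 0 end),    -- present values again
       sum(case when x is null then 1 else 0 end)         -- missing values
from t
''').fetchone()
print(row)  # (2, 2, 1)
```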


### Summary

In this lesson, we reviewed the difference between inner joins and left outer joins. With an inner join, a row is returned only when the join condition matches. Here, that means a `personId` in `persons` equals a `personId` in `addresses`.

With a left outer join, the rows of the left table are *always* returned. When a row has no matching `personId` in the right table, the right table's columns come back as null.
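Swapping which table comes first makes the difference visible. An inner join returns the same matched rows in either order, while a left join keeps whichever table is on the left. The sketch below rebuilds the lesson's tables in an in-memory SQLite database. The helper `run` and the query fragments are illustrative names, not part of the lesson.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript('''
create table persons (personId integer, firstName text, lastName text);
create table addresses (addressId integer, personId integer, city text, state text);
insert into persons values (1, 'Allen', 'Wang'), (2, 'Bob', 'Alice');
insert into addresses values (1, 2, 'New York City', 'New York'),
                             (2, 3, 'Leetcode', 'California');
''')

def run(sql):
    """Return the query's rows as a set (row order is not guaranteed without ORDER BY)."""
    return set(conn.execute(sql).fetchall())

prefix = 'select firstName, city from '
on = ' on persons.personId = addresses.personId'

inner_pa = run(prefix + 'persons join addresses' + on)
inner_ap = run(prefix + 'addresses join persons' + on)
left_persons = run(prefix + 'persons left join addresses' + on)
left_addresses = run(prefix + 'addresses left join persons' + on)

print(inner_pa == inner_ap)  # True: inner join does not depend on table order
print(left_persons)          # Allen is kept with a null city
print(left_addresses)        # the Leetcode address is kept with a null firstName
```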

```python
pd.read_sql('''select firstName, lastName, city, state from persons left join addresses
            on persons.personId = addresses.personId''', conn)
```

|   | firstName | lastName | city | state |
|---|---|---|---|---|
| 0 | Allen | Wang | None | None |
| 1 | Bob | Alice | New York City | New York |


Finally, we saw how to count the values that are present with a simple `count`. `count(column)` skips nulls.

```python
# count(column) skips nulls; only aggregates are selected so every column is well defined
query = '''select count(city) as num_of_city, count(state) as num_of_state
from persons left join addresses
on persons.personId = addresses.personId'''

pd.read_sql(query, conn)
```

|   | num_of_city | num_of_state |
|---|---|---|
| 0 | 1 | 1 |


We also saw how to perform the same calculation with a `case when` expression that adds 1 for each non-null value.

```python
# Each sum adds 1 per non-null value, matching count(city) and count(state)
query = '''select
sum(case when city is not null then 1 else 0 end) as num_of_city,
sum(case when state is not null then 1 else 0 end) as num_of_state
from persons left join addresses
on persons.personId = addresses.personId'''

pd.read_sql(query, conn)
```

|   | num_of_city | num_of_state |
|---|---|---|
| 0 | 1 | 1 |
