### SQL JOIN
The SQL `JOIN` clause is the main way to write queries that combine data from multiple tables.
Continuing with the example above, let's say we want to handle payroll for all regular employees. To do that, we need each employee's name, pay, and manager's name.

If we just `SELECT *` from the employees table, the result looks like this:

In [11]:
import sqlite3
import pandas as pd

conn = sqlite3.connect('data/payroll.db')
# The same query also works without pandas, using a cursor:
# cur = conn.cursor()
# cur.execute('''
# SELECT * FROM employees
# ''').fetchall()
q = '''
SELECT * FROM employees
'''
pd.read_sql(q, conn)

Unnamed: 0,id,name,pay,manager_id
0,1,Bob,3000.0,1
1,2,Karen,4000.0,1
2,3,Patrick,4000.0,2


Then we could manually look up each manager by id:

In [14]:
q = '''
SELECT name FROM managers
WHERE id = 1
'''
pd.read_sql(q, conn)

Unnamed: 0,name
0,Steve


In [15]:
q = '''
SELECT name FROM managers
WHERE id = 2
'''
pd.read_sql(q, conn)

Unnamed: 0,name
0,Spongebob


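That works, but it's tedious: every manager needs a separate query, and we still have to match the results back to the employees by hand. Again, you can imagine that this would not scale well to hundreds or thousands of employees.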
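
To make the scaling problem concrete, here is a minimal sketch of the manual approach written as a loop. It assumes a throwaway in-memory database (`demo_conn` is a hypothetical name) whose rows are copied from the outputs above, so it does not touch `data/payroll.db`. Note that it issues one extra query per employee row.

```python
import sqlite3

# Hypothetical in-memory copy of the two tables; rows mirror the outputs shown above.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE managers (id INTEGER PRIMARY KEY, name TEXT, pay REAL);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, pay REAL, manager_id INTEGER);
INSERT INTO managers VALUES (1, 'Steve', 7000.0), (2, 'Spongebob', 10000.0);
INSERT INTO employees VALUES
    (1, 'Bob', 3000.0, 1), (2, 'Karen', 4000.0, 1), (3, 'Patrick', 4000.0, 2);
""")

payroll = []
employees = demo_conn.execute(
    "SELECT name, pay, manager_id FROM employees ORDER BY id"
).fetchall()
for name, pay, manager_id in employees:
    # One additional round trip to the database for every employee.
    (manager_name,) = demo_conn.execute(
        "SELECT name FROM managers WHERE id = ?", (manager_id,)
    ).fetchone()
    payroll.append((name, pay, manager_name))

print(payroll)
```

With three employees that is four queries in total; with thousands of employees it would be thousands.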

With a SQL join, we can do it all at once:

In [16]:
q = '''
SELECT * FROM employees
JOIN managers
ON employees.manager_id = managers.id
'''
pd.read_sql(q, conn)

Unnamed: 0,id,name,pay,manager_id,id.1,name.1,pay.1
0,1,Bob,3000.0,1,1,Steve,7000.0
1,2,Karen,4000.0,1,1,Steve,7000.0
2,3,Patrick,4000.0,2,2,Spongebob,10000.0


Great, all of the information is in one table!

Well, it has everything we want, plus some extra information. It's confusing that name and pay each appear twice: once for the employee and once for the manager. Since we are trying to manage regular employee payroll, we probably only want the pay for those employees, and we need a way to distinguish the employee's name from the manager's name.
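
The duplicated column names can also be checked programmatically. The sketch below assumes the same hypothetical in-memory copy of the tables (`demo_conn`, with rows taken from the outputs above) and reads the column names of the `SELECT *` join from the cursor's `description` attribute.

```python
import sqlite3

# Hypothetical in-memory copy of the two tables; rows mirror the outputs shown above.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE managers (id INTEGER PRIMARY KEY, name TEXT, pay REAL);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, pay REAL, manager_id INTEGER);
INSERT INTO managers VALUES (1, 'Steve', 7000.0), (2, 'Spongebob', 10000.0);
INSERT INTO employees VALUES
    (1, 'Bob', 3000.0, 1), (2, 'Karen', 4000.0, 1), (3, 'Patrick', 4000.0, 2);
""")

cur = demo_conn.execute("""
SELECT * FROM employees
JOIN managers
ON employees.manager_id = managers.id
""")

# The first item of each description entry is the column name.
columns = [entry[0] for entry in cur.description]
duplicates = sorted({c for c in columns if columns.count(c) > 1})

print(columns)
print(duplicates)  # ['id', 'name', 'pay']
```

Nothing in these names says which table a value came from, which is why the result is hard to read and why explicit column lists with aliases help.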

Most of the time when you write a `JOIN`, you want to list the columns you actually need instead of using `SELECT *`. Something like this, using aliases to make each column's meaning clear:

In [25]:
q = '''
SELECT
    employees.name AS employee_name,
    employees.pay AS employee_pay,
    managers.name AS managers_name
FROM employees
JOIN managers
ON employees.manager_id = managers.id
'''
pd.read_sql(q, conn)

Unnamed: 0,employee_name,employee_pay,managers_name
0,Bob,3000.0,Steve
1,Karen,4000.0,Steve
2,Patrick,4000.0,Spongebob


Perfect! Now we have a single, maintainable query that pulls exactly the data needed for this task.

### SQL Subqueries
Another, more advanced technique introduced in this section is the SQL subquery: a query nested inside another query. The above query, rewritten to use a subquery instead of a `JOIN`, would be:

In [38]:
q = '''
SELECT
    name AS employee_name,
    pay AS employee_pay,
    (
        SELECT name
        FROM managers
        WHERE managers.id = employees.manager_id
    ) AS Manager_name
FROM employees
'''
pd.read_sql(q, conn)

Unnamed: 0,employee_name,employee_pay,Manager_name
0,Bob,3000.0,Steve
1,Karen,4000.0,Steve
2,Patrick,4000.0,Spongebob
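
The two forms return the same rows here, but they are not interchangeable in every case. The sketch below again assumes a hypothetical in-memory copy of the tables (`demo_conn`) and then adds a made-up employee, Sandy, whose `manager_id` matches no manager. A plain `JOIN` drops that row, while the subquery version keeps it with a NULL manager name (`None` in Python).

```python
import sqlite3

# Hypothetical in-memory copy of the two tables; rows mirror the outputs shown above.
demo_conn = sqlite3.connect(":memory:")
demo_conn.executescript("""
CREATE TABLE managers (id INTEGER PRIMARY KEY, name TEXT, pay REAL);
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, pay REAL, manager_id INTEGER);
INSERT INTO managers VALUES (1, 'Steve', 7000.0), (2, 'Spongebob', 10000.0);
INSERT INTO employees VALUES
    (1, 'Bob', 3000.0, 1), (2, 'Karen', 4000.0, 1), (3, 'Patrick', 4000.0, 2);
""")

join_q = """
SELECT employees.name, employees.pay, managers.name
FROM employees
JOIN managers ON employees.manager_id = managers.id
"""
subquery_q = """
SELECT name, pay,
    (SELECT name FROM managers WHERE managers.id = employees.manager_id)
FROM employees
"""

# On the original data both queries return the same set of rows.
same_before = sorted(demo_conn.execute(join_q)) == sorted(demo_conn.execute(subquery_q))

# Made-up employee whose manager_id (99) matches no row in managers.
demo_conn.execute("INSERT INTO employees VALUES (4, 'Sandy', 5000.0, 99)")
join_rows = demo_conn.execute(join_q).fetchall()
subquery_rows = demo_conn.execute(subquery_q).fetchall()

print(same_before)                          # True
print(len(join_rows), len(subquery_rows))   # 3 4
```

Which behavior is preferable depends on the task: for payroll, silently losing an employee is probably worse than seeing a missing manager name.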
