# joins and relationships

## 1. introduction

* joins let a query combine rows from related tables by matching them on a shared key.
* not every database engine supports every type of join (we will see workarounds).
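
As a minimal sketch of the first point, the snippet below builds two tiny related tables in an in-memory SQLite database and joins them on their shared key. The rows are hypothetical toy data, not the lab database; only the table and column names mirror the ones used later.

```python
import sqlite3

# Hypothetical toy data (assumption): two tables linked by pub_id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (pub_id TEXT PRIMARY KEY, pub_name TEXT);
CREATE TABLE titles (
    title_id TEXT PRIMARY KEY,
    title    TEXT,
    pub_id   TEXT REFERENCES publishers(pub_id)
);
INSERT INTO publishers VALUES ('P1', 'Publisher One'), ('P2', 'Publisher Two');
INSERT INTO titles VALUES ('T1', 'First Book', 'P1'), ('T2', 'Second Book', 'P1');
""")

# The join follows the relationship titles.pub_id -> publishers.pub_id.
rows = conn.execute("""
SELECT titles.title, publishers.pub_name
FROM titles
JOIN publishers ON publishers.pub_id = titles.pub_id
ORDER BY titles.title_id
""").fetchall()

print(rows)
```

Each title is paired with the publisher whose `pub_id` it references; the publisher without titles does not appear at all.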

## 2. sample database (the same one will be used for the lab!)

In [43]:
# let's load jupyter sql extension

%load_ext sql
%config SqlMagic.autocommit = False

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [44]:
# load database

%sql sqlite:///data/publications.db

'Connected: @data/publications.db'

listing the tables in the publications database:

In [46]:
%%sql tables <<

SELECT 
    name
FROM 
    sqlite_master 
WHERE 
    type ='table' AND 
    name NOT LIKE 'sqlite_%';

 * sqlite:///data/publications.db
Done.
Returning data to local variable tables


In [47]:
tables.DataFrame()

Unnamed: 0,name
0,authors
1,discounts
2,employee
3,jobs
4,pub_info
5,publishers
6,roysched
7,sales
8,stores
9,titleauthor
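
The `name NOT LIKE 'sqlite_%'` filter in the query above hides SQLite's internal tables. The sketch below shows why it is needed, using an in-memory database with hypothetical tables: declaring an `AUTOINCREMENT` column makes SQLite create its own `sqlite_sequence` table.

```python
import sqlite3

# Hypothetical schema (assumption), only to make an internal table appear.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (au_id TEXT, au_lname TEXT)")
conn.execute(
    "CREATE TABLE jobs (job_id INTEGER PRIMARY KEY AUTOINCREMENT, job_desc TEXT)"
)

# Without the filter, internal bookkeeping tables are listed too.
all_names = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
)]

# With the filter, only user-defined tables remain.
user_names = [row[0] for row in conn.execute("""
SELECT name
FROM sqlite_master
WHERE type = 'table' AND name NOT LIKE 'sqlite_%'
ORDER BY name
""")]

print(all_names)
print(user_names)
```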


## 3. types of joins

### inner join

In [21]:
%%sql inner <<

SELECT 
pub_name,
count(titles.title_id) as titles_published
FROM publishers
JOIN titles
ON publishers.pub_id = titles.pub_id
GROUP BY pub_name
ORDER BY titles_published DESC;

 * sqlite:///data/publications.db
Done.
Returning data to local variable inner


In [32]:
inner.DataFrame()

Unnamed: 0,pub_name,titles_published
0,Binnet & Hardley,7
1,Algodata Infosystems,6
2,New Moon Books,5
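
Only three publishers show up above because an inner join keeps just the rows that have a match on both sides. The sketch below reproduces that behaviour on hypothetical toy data (the publisher names and ids are made up): the publisher with no titles vanishes from the result.

```python
import sqlite3

# Hypothetical toy data (assumption): 'Gamma House' has no titles.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (pub_id TEXT, pub_name TEXT);
CREATE TABLE titles (title_id TEXT, pub_id TEXT);
INSERT INTO publishers VALUES
    ('P1', 'Alpha Press'), ('P2', 'Beta Books'), ('P3', 'Gamma House');
INSERT INTO titles VALUES ('T1', 'P1'), ('T2', 'P1'), ('T3', 'P2');
""")

inner_rows = conn.execute("""
SELECT pub_name, COUNT(titles.title_id) AS titles_published
FROM publishers
JOIN titles ON publishers.pub_id = titles.pub_id
GROUP BY pub_name
ORDER BY titles_published DESC
""").fetchall()

print(inner_rows)
```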


### left join (every row of the first table is kept, even when it has no match in the second)

In [48]:
%%sql left <<

SELECT 
pub_name,
count(titles.title_id) as titles_published
FROM publishers
LEFT JOIN titles
ON publishers.pub_id = titles.pub_id
GROUP BY pub_name
ORDER BY titles_published DESC;

 * sqlite:///data/publications.db
Done.
Returning data to local variable left


In [49]:
left.DataFrame()

Unnamed: 0,pub_name,titles_published
0,Binnet & Hardley,7
1,Algodata Infosystems,6
2,New Moon Books,5
3,Scootney Books,0
4,Ramona Publishers,0
5,Lucerne Publishing,0
6,GGG&G,0
7,Five Lakes Publishing,0
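
The zeros above depend on counting `titles.title_id` rather than `*`. For a publisher without titles the left join still produces one row, with the `titles` columns set to NULL; `COUNT(column)` skips NULLs, while `COUNT(*)` counts the row itself. A small sketch on hypothetical toy data:

```python
import sqlite3

# Hypothetical toy data (assumption): 'Gamma House' has no titles.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE publishers (pub_id TEXT, pub_name TEXT);
CREATE TABLE titles (title_id TEXT, pub_id TEXT);
INSERT INTO publishers VALUES
    ('P1', 'Alpha Press'), ('P2', 'Beta Books'), ('P3', 'Gamma House');
INSERT INTO titles VALUES ('T1', 'P1'), ('T2', 'P1'), ('T3', 'P2');
""")

left_rows = conn.execute("""
SELECT
    pub_name,
    COUNT(titles.title_id) AS titles_published,  -- ignores NULLs
    COUNT(*) AS joined_rows                      -- counts every joined row
FROM publishers
LEFT JOIN titles ON publishers.pub_id = titles.pub_id
GROUP BY pub_name
ORDER BY pub_name
""").fetchall()

print(left_rows)
```

Using `COUNT(*)` here would report one title for a publisher that has none.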


### right join (not supported by sqlite before version 3.39; swap the order of the tables and use a left join instead)

In [27]:
%%sql right <<

SELECT 
titles.title, 
titles.type, 
titles.price, 
sum(sales.qty) as units_sold
FROM titles
LEFT JOIN sales
ON titles.title_id = sales.title_id
GROUP BY titles.title, titles.type, titles.price;

 * sqlite:///data/publications.db
Done.
Returning data to local variable right


In [30]:
right.DataFrame()

Unnamed: 0,title,type,price,units_sold
0,But Is It User Friendly?,popular_comp,22.95,30.0
1,Computer Phobic AND Non-Phobic Individuals: Be...,psychology,21.59,20.0
2,Cooking with Computers: Surreptitious Balance ...,business,11.95,25.0
3,Emotional Security: A New Algorithm,psychology,7.99,25.0
4,Fifty Years in Buckingham Palace Kitchens,trad_cook,11.95,20.0
5,Is Anger the Enemy?,psychology,10.95,108.0
6,Life Without Fear,psychology,7.0,25.0
7,Net Etiquette,popular_comp,,
8,"Onions, Leeks, and Garlic: Cooking Secrets of ...",trad_cook,20.95,40.0
9,Prolonged Data Deprivation: Four Case Studies,psychology,19.99,15.0
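
The query above stands in for `sales RIGHT JOIN titles`. The sketch below checks that equivalence on hypothetical toy data: it always runs the left join, and runs the right join only when the bundled SQLite is 3.39.0 or newer. It also shows `COALESCE`, one way to turn the missing `units_sold` of an unsold title into 0.

```python
import sqlite3

# Hypothetical toy data (assumption): 'Book B' has no sales.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE titles (title_id TEXT, title TEXT);
CREATE TABLE sales (title_id TEXT, qty INTEGER);
INSERT INTO titles VALUES ('T1', 'Book A'), ('T2', 'Book B');
INSERT INTO sales VALUES ('T1', 10), ('T1', 5);
""")

columns = """
    titles.title,
    SUM(sales.qty) AS units_sold,
    COALESCE(SUM(sales.qty), 0) AS units_sold_or_zero
"""
tail = "GROUP BY titles.title_id, titles.title ORDER BY titles.title"

left_sql = f"""
SELECT {columns}
FROM titles
LEFT JOIN sales ON titles.title_id = sales.title_id
{tail}
"""
right_sql = f"""
SELECT {columns}
FROM sales
RIGHT JOIN titles ON titles.title_id = sales.title_id
{tail}
"""

left_rows = conn.execute(left_sql).fetchall()
print(left_rows)

# RIGHT JOIN only parses on SQLite 3.39.0 or newer.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    right_rows = conn.execute(right_sql).fetchall()
    print(right_rows == left_rows)
```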


### full outer join (not supported by mysql, nor by sqlite before version 3.39; emulated here with a union of two left joins)

In [34]:
%%sql outer <<

SELECT 
employee.fname, 
employee.hire_date, 
jobs.job_desc, 
jobs.job_id
FROM jobs
LEFT JOIN employee
ON jobs.job_id = employee.job_id
UNION
SELECT 
employee.fname, 
employee.hire_date, 
jobs.job_desc, 
jobs.job_id
FROM employee
LEFT JOIN jobs
ON jobs.job_id = employee.job_id;

 * sqlite:///data/publications.db
Done.
Returning data to local variable outer


In [36]:
outer.DataFrame()

Unnamed: 0,fname,hire_date,job_desc,job_id
0,,,New Hire - Job not specified,1
1,Anabela,1993-01-27 00:00:00,Public Relations Manager,8
2,Ann,1991-07-16 00:00:00,Business Operations Manager,3
3,Annette,1990-02-21 00:00:00,Managing Editor,6
4,Aria,1991-10-26 00:00:00,Productions Manager,10
5,Carine,1992-07-07 00:00:00,Sales Representative,13
6,Carlos,1989-04-21 00:00:00,Publisher,5
7,Daniel,1990-01-01 00:00:00,Operations Manager,11
8,Diego,1991-12-16 00:00:00,Managing Editor,6
9,Elizabeth,1990-07-24 00:00:00,Designer,14
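
The union of two left joins keeps unmatched rows from both sides: jobs without employees come from the first half, employees without a job from the second. A sketch on hypothetical toy data follows. Note that `UNION` also removes duplicate rows, which a real full outer join would keep, so the two only agree when the joined rows are distinct, as they are here.

```python
import sqlite3

# Hypothetical toy data (assumption): job 1 has no employee,
# and 'Bob' points at a job_id that does not exist.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER, job_desc TEXT);
CREATE TABLE employee (fname TEXT, job_id INTEGER);
INSERT INTO jobs VALUES (1, 'New Hire'), (2, 'Editor');
INSERT INTO employee VALUES ('Ann', 2), ('Bob', 9);
""")

union_sql = """
SELECT employee.fname, jobs.job_desc, jobs.job_id
FROM jobs
LEFT JOIN employee ON jobs.job_id = employee.job_id
UNION
SELECT employee.fname, jobs.job_desc, jobs.job_id
FROM employee
LEFT JOIN jobs ON jobs.job_id = employee.job_id
"""
# Row order of a UNION is not guaranteed, so compare as a set.
union_rows = set(conn.execute(union_sql).fetchall())
print(union_rows)

# FULL OUTER JOIN only parses on SQLite 3.39.0 or newer.
if sqlite3.sqlite_version_info >= (3, 39, 0):
    full_rows = set(conn.execute("""
    SELECT employee.fname, jobs.job_desc, jobs.job_id
    FROM jobs
    FULL OUTER JOIN employee ON jobs.job_id = employee.job_id
    """).fetchall())
    print(full_rows == union_rows)
```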


## 4. combined queries: common table expressions (with)

In [39]:
%%sql with_example <<

WITH 
employees_custom AS 
(
SELECT *
FROM employee
),
jobs_custom AS
(
SELECT *
FROM jobs
JOIN employees_custom
ON employees_custom.job_id = jobs.job_id
)
SELECT * from jobs_custom;

 * sqlite:///data/publications.db
Done.
Returning data to local variable with_example


In [40]:
with_example.DataFrame()

Unnamed: 0,job_id,job_desc,min_lvl,max_lvl,emp_id,fname,minit,lname,job_id:1,job_lvl,pub_id,hire_date
0,10,Productions Manager,75,165,A-C71970F,Aria,,Cruz,10,87,1389,1991-10-26 00:00:00
1,6,Managing Editor,140,225,A-R89858F,Annette,,Roulet,6,152,9999,1990-02-21 00:00:00
2,3,Business Operations Manager,175,225,AMD15433F,Ann,M,Devon,3,200,9952,1991-07-16 00:00:00
3,8,Public Relations Manager,100,175,ARD36773F,Anabela,R,Domingues,8,100,877,1993-01-27 00:00:00
4,5,Publisher,150,250,CFH28514M,Carlos,F,Hernadez,5,211,9999,1989-04-21 00:00:00
5,13,Sales Representative,25,100,CGS88322F,Carine,G,Schmitt,13,64,1389,1992-07-07 00:00:00
6,11,Operations Manager,75,150,DBT39435M,Daniel,B,Tonini,11,75,877,1990-01-01 00:00:00
7,6,Managing Editor,140,225,DWR65030M,Diego,W,Roel,6,192,1389,1991-12-16 00:00:00
8,14,Designer,25,100,ENL44273F,Elizabeth,N,Lincoln,14,35,877,1990-07-24 00:00:00
9,4,Chief Financial Officier,175,250,F-C16315M,Francisco,,Chang,4,227,9952,1990-11-03 00:00:00
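
Because both subqueries above use `SELECT *`, the result carries `job_id` twice (the second copy is shown as `job_id:1`). A variation of the same pattern, sketched on hypothetical toy rows, names the columns explicitly and lets the first CTE do some filtering; the `senior` and `senior_jobs` names and the `job_lvl >= 190` threshold are made up for illustration.

```python
import sqlite3

# Hypothetical toy data (assumption), shaped like the jobs/employee tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE jobs (job_id INTEGER, job_desc TEXT);
CREATE TABLE employee (fname TEXT, job_id INTEGER, job_lvl INTEGER);
INSERT INTO jobs VALUES (5, 'Publisher'), (6, 'Managing Editor');
INSERT INTO employee VALUES
    ('Carlos', 5, 211), ('Diego', 6, 192), ('Annette', 6, 152);
""")

cte_rows = conn.execute("""
WITH
senior AS (
    -- first step: keep only the employees of interest
    SELECT fname, job_id, job_lvl
    FROM employee
    WHERE job_lvl >= 190
),
senior_jobs AS (
    -- second step: reuse the first CTE by name and join it
    SELECT senior.fname, jobs.job_desc, senior.job_lvl
    FROM senior
    JOIN jobs ON jobs.job_id = senior.job_id
)
SELECT * FROM senior_jobs
ORDER BY fname
""").fetchall()

print(cte_rows)
```

Each CTE behaves like a named temporary result that later steps can query, which keeps multi-step queries readable.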


## 5. lab time! let's start together

In [41]:
%%sql challenge_1 <<

SELECT 
authors.au_id, 
authors.au_lname, 
authors.au_fname, 
titles.title
FROM titles
JOIN titleauthor
ON titles.title_id = titleauthor.title_id
JOIN authors
ON authors.au_id = titleauthor.au_id;

 * sqlite:///data/publications.db
Done.
Returning data to local variable challenge_1


In [42]:
challenge_1.DataFrame()

Unnamed: 0,au_id,au_lname,au_fname,title
0,172-32-1176,White,Johnson,Prolonged Data Deprivation: Four Case Studies
1,213-46-8915,Green,Marjorie,The Busy Executive's Database Guide
2,213-46-8915,Green,Marjorie,You Can Combat Computer Stress!
3,238-95-7766,Carson,Cheryl,But Is It User Friendly?
4,267-41-2394,O'Leary,Michael,Cooking with Computers: Surreptitious Balance ...
5,267-41-2394,O'Leary,Michael,"Sushi, Anyone?"
6,274-80-9391,Straight,Dean,Straight Talk About Computers
7,409-56-7008,Bennet,Abraham,The Busy Executive's Database Guide
8,427-17-2319,Dull,Ann,Secrets of Silicon Valley
9,472-27-2349,Gringlesby,Burt,"Sushi, Anyone?"
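
Authors and titles are in a many-to-many relationship, so the query above goes through the `titleauthor` bridge table with two joins. The sketch below shows the same shape on hypothetical toy data: a co-authored title appears once per author, and an author with two titles appears twice.

```python
import sqlite3

# Hypothetical toy data (assumption): T1 has two authors, A1 has two titles.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (au_id TEXT, au_lname TEXT);
CREATE TABLE titles (title_id TEXT, title TEXT);
CREATE TABLE titleauthor (au_id TEXT, title_id TEXT);
INSERT INTO authors VALUES ('A1', 'Green'), ('A2', 'Bennet');
INSERT INTO titles VALUES ('T1', 'Database Guide'), ('T2', 'Computer Stress');
INSERT INTO titleauthor VALUES ('A1', 'T1'), ('A2', 'T1'), ('A1', 'T2');
""")

bridge_rows = conn.execute("""
SELECT authors.au_id, authors.au_lname, titles.title
FROM titles
JOIN titleauthor ON titles.title_id = titleauthor.title_id
JOIN authors ON authors.au_id = titleauthor.au_id
ORDER BY authors.au_id, titles.title_id
""").fetchall()

print(bridge_rows)
```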
