# Set Operations

Remember your high school algebra? Venn diagrams? Well, we can put some of that to good use to answer some interesting queries.

## Union Compatible

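All set operations must be done on relations (tables) that are union compatible. In SQL this means that the two queries must have:
* The same number of columns
* Compatible data types in corresponding columns (columns are matched by position)

The column names do not have to match; the result takes its column names from the first query.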

![](union.png)
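Here is a small sketch of these rules using Python's built-in `sqlite3` module on an in-memory database. The tables `a` and `b` and their rows are made up for illustration. SQLite is dynamically typed, so unlike PostgreSQL it does not check that the column types match; the sketch only shows the column-count rule and where the result's column names come from.

```python
import sqlite3

# Hypothetical toy tables in an in-memory database (not the employees data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (emp_no INTEGER, n INTEGER);
    CREATE TABLE b (id INTEGER, total INTEGER);
    INSERT INTO a VALUES (1, 10), (2, 20);
    INSERT INTO b VALUES (2, 20), (3, 30);
""")

# Same number of columns: accepted even though the column names differ.
cur = conn.execute("SELECT emp_no, n FROM a UNION SELECT id, total FROM b ORDER BY 1")
column_names = [d[0] for d in cur.description]  # taken from the first SELECT
rows = cur.fetchall()  # the duplicate row (2, 20) appears only once
print(column_names, rows)

# Different number of columns: SQLite refuses to run the query.
try:
    conn.execute("SELECT emp_no, n FROM a UNION SELECT id FROM b")
    mismatch_error = None
except sqlite3.OperationalError as exc:
    mismatch_error = str(exc)
print(mismatch_error)
```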

## Union

The first set operation is union. This one is easy: it simply combines the rows of two union compatible relations into one result, removing duplicate rows (`union all` keeps the duplicates). It is useful when you want to include two groups of things in the same result, but producing them with a single query would be very hard or impossible. For example:

**We want a list of the employees that have had three or more titles OR that have had more than 10 raises.**

Each of these questions is easy to answer by itself, but probably impossible to answer with a single query that uses joins.


**Part 1 -- get the emp_nos of the employees with 3 or more titles**


```sql
%%sql
select emp_no, count(*)
from titles
group by emp_no
having count(*) >= 3
limit 10
```

10 rows affected.

| emp_no | count |
|---|---|
| 10009 | 3 |
| 10066 | 3 |
| 10258 | 3 |
| 10451 | 3 |
| 10571 | 3 |
| 10612 | 3 |
| 10628 | 3 |
| 10634 | 3 |
| 11003 | 3 |
| 11027 | 3 |


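**Part 2 -- get the emp_nos of the employees with more than 10 raises**

Each row of `salaries` is one salary period, so counting rows per employee is used here as a stand-in for counting raises.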

```sql
%%sql
select emp_no, count(*)
from salaries
group by emp_no
having count(*) > 10
limit 10
```

10 rows affected.

| emp_no | count |
|---|---|
| 10001 | 17 |
| 10004 | 16 |
| 10005 | 13 |
| 10006 | 12 |
| 10007 | 14 |
| 10009 | 18 |
| 10013 | 17 |
| 10018 | 16 |
| 10021 | 15 |
| 10025 | 11 |


Notice how similar the two queries are.  Now we can answer the question by using union to glue the two results together.

```sql
%%sql
select emp_no, count(*)
from salaries
group by emp_no
having count(*) > 10
    UNION
select emp_no, count(*)
from titles
group by emp_no
having count(*) >= 3
limit 10
```

10 rows affected.

| emp_no | count |
|---|---|
| 10001 | 17 |
| 10004 | 16 |
| 10005 | 13 |
| 10006 | 12 |
| 10007 | 14 |
| 10009 | 3 |
| 10009 | 18 |
| 10013 | 17 |
| 10018 | 16 |
| 10021 | 15 |


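Notice that employee 10009 appears twice. Union removes duplicate *rows*, and (10009, 3) and (10009, 18) are different rows because of the count column. Finally, we can take the union query and put it into the `from` clause of an outer query so that we can show the first and last names.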

```sql
%%sql
select first_name, last_name
from
    -- keep only emp_no so an employee who meets both conditions is listed once
    (select emp_no
    from salaries
    group by emp_no
    having count(*) > 10
    union
    select emp_no
    from titles
    group by emp_no
    having count(*) >= 3) as T natural join employees
order by last_name, first_name
limit 10
```
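To see the duplicate-row behavior on something small, here is a sketch using Python's `sqlite3` with tiny, made-up versions of `titles`, `salaries` and `employees` (columns trimmed down, rows hypothetical, thresholds lowered to fit the toy data). It compares a union that keeps the counts, a union on `emp_no` only, `union all`, and the union wrapped in `from` and joined with `employees`.

```python
import sqlite3

# Hypothetical toy versions of titles, salaries and employees (made-up rows).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles    (emp_no INTEGER, title TEXT);
    CREATE TABLE salaries  (emp_no INTEGER, salary INTEGER);
    CREATE TABLE employees (emp_no INTEGER, first_name TEXT, last_name TEXT);
    INSERT INTO titles    VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'a');
    INSERT INTO salaries  VALUES (1, 100), (1, 110), (3, 100), (3, 120);
    INSERT INTO employees VALUES (1, 'Ann', 'Avery'), (2, 'Bob', 'Baker'), (3, 'Cy', 'Cole');
""")

# Keeping the counts: (1, 2) and (1, 3) are different rows, so employee 1 appears twice.
with_counts = conn.execute("""
    SELECT emp_no, COUNT(*) FROM salaries GROUP BY emp_no HAVING COUNT(*) >= 2
    UNION
    SELECT emp_no, COUNT(*) FROM titles GROUP BY emp_no HAVING COUNT(*) >= 3
    ORDER BY 1, 2
""").fetchall()

# Only emp_no: UNION removes the duplicate, UNION ALL keeps it.
ids_union = conn.execute("""
    SELECT emp_no FROM salaries GROUP BY emp_no HAVING COUNT(*) >= 2
    UNION
    SELECT emp_no FROM titles GROUP BY emp_no HAVING COUNT(*) >= 3
    ORDER BY 1
""").fetchall()
ids_union_all = conn.execute("""
    SELECT emp_no FROM salaries GROUP BY emp_no HAVING COUNT(*) >= 2
    UNION ALL
    SELECT emp_no FROM titles GROUP BY emp_no HAVING COUNT(*) >= 3
    ORDER BY 1
""").fetchall()

# Wrap the union in FROM and natural join with employees to get the names.
names = conn.execute("""
    SELECT first_name, last_name
    FROM (SELECT emp_no FROM salaries GROUP BY emp_no HAVING COUNT(*) >= 2
          UNION
          SELECT emp_no FROM titles GROUP BY emp_no HAVING COUNT(*) >= 3) AS t
         NATURAL JOIN employees
    ORDER BY last_name, first_name
""").fetchall()

print(with_counts)    # [(1, 2), (1, 3), (3, 2)]
print(ids_union)      # [(1,), (3,)]
print(ids_union_all)  # [(1,), (1,), (3,)]
print(names)          # [('Ann', 'Avery'), ('Cy', 'Cole')]
```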

## Intersection

Intersection is also useful in the above situation. Suppose we wanted to find the employees that have had 3 or more titles AND more than 10 raises. Here we would use set intersection. When both relations have exactly the same columns (and no duplicate rows), intersection gives the same result as a natural join. It is important that we select only the employee numbers, not the employee numbers and the counts: for a row to be in the intersection, ALL of its values in the first relation must match ALL of the values of a row in the second relation.

![](intersect.png)
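A small sketch of why the counts must be left out, using Python's `sqlite3` and made-up toy rows (a hypothetical employee 1 with 3 titles and 4 salary rows; the thresholds are lowered to fit the toy data). With the counts included, the rows (1, 4) and (1, 3) never match, so the intersection is empty even though employee 1 belongs to both groups.

```python
import sqlite3

# Hypothetical toy tables (made-up rows): employee 1 has 3 titles and 4 salary rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles   (emp_no INTEGER, title TEXT);
    CREATE TABLE salaries (emp_no INTEGER, salary INTEGER);
    INSERT INTO titles   VALUES (1, 'a'), (1, 'b'), (1, 'c'), (2, 'a');
    INSERT INTO salaries VALUES (1, 100), (1, 110), (1, 120), (1, 130), (2, 100);
""")

# With the counts: (1, 4) from salaries never equals (1, 3) from titles.
with_counts = conn.execute("""
    SELECT emp_no, COUNT(*) FROM salaries GROUP BY emp_no HAVING COUNT(*) >= 3
    INTERSECT
    SELECT emp_no, COUNT(*) FROM titles GROUP BY emp_no HAVING COUNT(*) >= 3
""").fetchall()

# With only emp_no: employee 1 is in both groups.
ids_only = conn.execute("""
    SELECT emp_no FROM salaries GROUP BY emp_no HAVING COUNT(*) >= 3
    INTERSECT
    SELECT emp_no FROM titles GROUP BY emp_no HAVING COUNT(*) >= 3
""").fetchall()

print(with_counts)  # []
print(ids_only)     # [(1,)]
```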


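This first version uses a looser cutoff of more than 5 salary records (`count(*) > 5`); the query at the end of this section, which lists names, uses `> 10`.

```sql
%%sql
select emp_no
from salaries
group by emp_no
having count(*) > 5
    intersect
select emp_no
from titles
group by emp_no
having count(*) >= 3
limit 10
```

10 rows affected.

| emp_no |
|---|
| 10009 |
| 10066 |
| 10258 |
| 10451 |
| 10571 |
| 10612 |
| 10628 |
| 10634 |
| 11003 |
| 11027 |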


For comparison, let's do this as a natural join too.

```sql
%%sql
select emp_no
from
    (select emp_no
    from salaries
    group by emp_no
    having count(*) > 5) as A
    natural join
    (select emp_no
    from titles
    group by emp_no
    having count(*) >= 3) as B
limit 10
```

10 rows affected.

| emp_no |
|---|
| 10009 |
| 10066 |
| 10258 |
| 10451 |
| 10571 |
| 10612 |
| 10628 |
| 10634 |
| 11003 |
| 11027 |
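The two queries agree here because each grouped subquery returns every `emp_no` at most once. A sketch with Python's `sqlite3` and two hypothetical one-column tables that *do* contain duplicates shows the difference: `intersect` returns each matching row once, while a natural join pairs every matching row on the left with every matching row on the right.

```python
import sqlite3

# Hypothetical one-column relations; note the duplicate rows in each.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (emp_no INTEGER);
    CREATE TABLE b (emp_no INTEGER);
    INSERT INTO a VALUES (1), (2), (2), (3);
    INSERT INTO b VALUES (2), (3), (3), (4);
""")

# INTERSECT compares whole rows and returns each matching row once.
via_intersect = conn.execute(
    "SELECT emp_no FROM a INTERSECT SELECT emp_no FROM b ORDER BY 1"
).fetchall()

# NATURAL JOIN pairs every matching row of a with every matching row of b.
via_join = conn.execute(
    "SELECT emp_no FROM a NATURAL JOIN b ORDER BY emp_no"
).fetchall()

# Removing duplicates from the join gives back the intersection.
via_join_distinct = conn.execute(
    "SELECT DISTINCT emp_no FROM a NATURAL JOIN b ORDER BY emp_no"
).fetchall()

print(via_intersect)      # [(2,), (3,)]
print(via_join)           # [(2,), (2,), (3,), (3,)]
print(via_join_distinct)  # [(2,), (3,)]
```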


Finally, as with union, we can put the intersect query into the `from` clause and join it with `employees` to show the names.

```sql
%%sql
select first_name, last_name
from
    (select emp_no
    from salaries
    group by emp_no
    having count(*) > 10
    intersect
    select emp_no
    from titles
    group by emp_no
    having count(*) >= 3) as T natural join employees
order by last_name, first_name
limit 10
```

10 rows affected.

| first_name | last_name |
|---|---|
| Sigeru | Aamodt |
| Raimond | Acton |
| Aram | Adachi |
| Sungwon | Adachi |
| Jessie | Akaboshi |
| Moon | Akaboshi |
| Tetsurou | Akazan |
| Fumitaka | Akiyama |
| Troy | Akiyama |
| Lenore | Alameldin |


## Set Difference

* In English, set difference usually reads as "but not"
* In SQL, the keyword is `except`

In my personal opinion, set difference is the "funnest" relational operator. It lets you ask lots of interesting questions.

We want all of the things in one table minus the things that are in the other table. Unlike union and intersection, the order matters: A except B is not the same as B except A.

![](except.png)
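A minimal sketch of set difference with Python's `sqlite3` on two hypothetical one-column tables, showing that swapping the operands changes the answer.

```python
import sqlite3

# Hypothetical toy tables with overlapping emp_no values.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (emp_no INTEGER);
    CREATE TABLE b (emp_no INTEGER);
    INSERT INTO a VALUES (1), (2), (3);
    INSERT INTO b VALUES (2), (3), (4);
""")

# Rows in a but not in b, and rows in b but not in a.
a_minus_b = conn.execute(
    "SELECT emp_no FROM a EXCEPT SELECT emp_no FROM b ORDER BY 1"
).fetchall()
b_minus_a = conn.execute(
    "SELECT emp_no FROM b EXCEPT SELECT emp_no FROM a ORDER BY 1"
).fetchall()

print(a_minus_b)  # [(1,)]
print(b_minus_a)  # [(4,)]
```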

```
%load_ext sql
```



```
%sql postgresql://millbr02:@localhost/employees
```

'Connected: millbr02@employees'

**Find all of the employees who have not worked for 'Research'**


Let's restate that as: find all of the employees, minus the employees who have worked for 'Research'.

**Part 1 -- Find all employees**

```sql
%%sql
select emp_no from employees
limit 10
```

10 rows affected.

| emp_no |
|---|
| 10001 |
| 10002 |
| 10003 |
| 10004 |
| 10005 |
| 10006 |
| 10007 |
| 10008 |
| 10009 |
| 10010 |


**Part 2 -- Find the employees that have worked for Research**

```sql
%%sql
select emp_no from departments natural join dept_emp where dept_name = 'Research'
limit 10
```

10 rows affected.

| emp_no |
|---|
| 10007 |
| 10015 |
| 10019 |
| 10040 |
| 10046 |
| 10052 |
| 10064 |
| 10070 |
| 10082 |
| 10094 |


Now we can use the `except` keyword to compute "all employees" minus "employees who worked in Research".

```sql
%%sql
(select emp_no from employees)
 except
(select emp_no from departments natural join dept_emp where dept_name = 'Research')
limit 10
```

10 rows affected.

| emp_no |
|---|
| 10001 |
| 10002 |
| 10003 |
| 10004 |
| 10005 |
| 10006 |
| 10008 |
| 10009 |
| 10010 |
| 10011 |
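Why not just filter with `where dept_name <> 'Research'`? If an employee can appear in `dept_emp` more than once (for example after a transfer), someone who moved from Research to another department would still pass that filter. The sketch below uses Python's `sqlite3` with made-up rows in tables shaped loosely like `employees`, `departments` and `dept_emp`; the columns are trimmed down and the data is hypothetical.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the three tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees   (emp_no INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE departments (dept_no TEXT PRIMARY KEY, dept_name TEXT);
    CREATE TABLE dept_emp    (emp_no INTEGER, dept_no TEXT);
    INSERT INTO employees   VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO departments VALUES ('d1', 'Research'), ('d2', 'Sales');
    -- Ann: Research only; Bob: Research and later Sales; Cy: Sales only.
    INSERT INTO dept_emp    VALUES (1, 'd1'), (2, 'd1'), (2, 'd2'), (3, 'd2');
""")

# Set difference: all employees minus anyone who ever worked in Research.
never_research = conn.execute("""
    SELECT emp_no FROM employees
    EXCEPT
    SELECT emp_no FROM departments NATURAL JOIN dept_emp WHERE dept_name = 'Research'
    ORDER BY 1
""").fetchall()

# Naive filter: anyone with at least one non-Research department (Bob sneaks in).
some_other_dept = conn.execute("""
    SELECT DISTINCT emp_no FROM departments NATURAL JOIN dept_emp
    WHERE dept_name <> 'Research'
    ORDER BY emp_no
""").fetchall()

print(never_research)   # [(3,)]
print(some_other_dept)  # [(2,), (3,)]
```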


```sql
%%sql
select first_name, last_name
from
    ((select emp_no from employees)
     except
    (select emp_no from departments natural join dept_emp where dept_name = 'Research') ) as T
    natural join employees
order by last_name, first_name
limit 10;
```

10 rows affected.

| first_name | last_name |
|---|---|
| Abdelkader | Aamodt |
| Adhemar | Aamodt |
| Aemilian | Aamodt |
| Alagu | Aamodt |
| Aleksander | Aamodt |
| Alexius | Aamodt |
| Alois | Aamodt |
| Aluzio | Aamodt |
| Anestis | Aamodt |
| Anoosh | Aamodt |
