In [18]:
import warnings
warnings.filterwarnings('ignore')

In [19]:
%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [20]:
%sql postgresql://millbr02:@localhost/employees

'Connected: millbr02@employees'

# Nested Queries

It is perfectly legal to embed one select statement inside another.  In fact, a select statement can be used anywhere a named table can be used.  Also, a query that returns just a single value can be used in a comparison expression.

Selects can be embedded in:

* The from clause   --- Must have an alias
* The where clause
* A having clause



First, let's look at a couple of dumb examples.  There is no good reason to use nested queries for these, as a simple join does the same job more easily.

In [22]:
%%sql

select emp_no
from (select emp_no, first_name, last_name from dept_emp natural join employees) as foo
limit 10;

10 rows affected.


emp_no
10001
10002
10003
10004
10005
10006
10007
10008
10009
10010


In [12]:
%%sql

select first_name, last_name 
from employees 
where emp_no in (select emp_no from salaries where salary < 39000)
order by last_name, first_name


21 rows affected.


first_name,last_name
Denny,Assaf
Olivera,Baek
Vishu,Biran
Younwoo,Champarnaud
Bernd,Copas
Shahaf,England
Zhiguo,Kobuchi
Chuanyi,Kuhnemann
Mechthild,Langford
Pascal,Lueh


Of course this is a bad way to do it...

In [14]:
%%sql
select first_name, last_name
from employees natural join (select emp_no from salaries where salary < 39000 ) as foo

22 rows affected.


first_name,last_name
Monique,Reinhard
Yurij,Narwekar
Jacqueline,Syang
Fumiya,Unno
Fumiya,Unno
Vishu,Biran
Bernd,Copas
Mechthild,Langford
Yagil,Perri
Shahaf,England


In [13]:
%%sql

select first_name, last_name
from employees natural join salaries 
where salary < 39000
order by last_name, first_name

22 rows affected.


first_name,last_name
Denny,Assaf
Olivera,Baek
Vishu,Biran
Younwoo,Champarnaud
Bernd,Copas
Shahaf,England
Zhiguo,Kobuchi
Chuanyi,Kuhnemann
Mechthild,Langford
Pascal,Lueh
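
The row counts above differ: the in version reports 21 rows while both join versions report 22, and Fumiya Unno shows up twice in the join output. A likely explanation is that one employee has two salary rows under 39000. The sketch below reproduces that effect on toy data. It assumes SQLite (via Python's sqlite3 module) as a stand-in for PostgreSQL, and the two small tables and their contents are made up for illustration.

```python
import sqlite3

# Toy data (hypothetical): employee 2 has two salary rows under the threshold.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (emp_no integer primary key, last_name text);
    create table salaries (emp_no integer, salary integer);
    insert into employees values (1, 'Assaf'), (2, 'Unno'), (3, 'Baek');
    insert into salaries values (1, 38900), (2, 38500), (2, 38800), (3, 60000);
""")

# "in" only tests membership, so each employee appears at most once.
in_rows = conn.execute("""
    select last_name from employees
    where emp_no in (select emp_no from salaries where salary < 39000)
    order by last_name
""").fetchall()

# A join produces one output row per matching (employee, salary) pair.
join_rows = conn.execute("""
    select last_name from employees natural join salaries
    where salary < 39000
    order by last_name
""").fetchall()

print(in_rows)
print(join_rows)
```

So the nested in query and the join are not always interchangeable: the join needs a distinct to match the in version whenever the inner table can hold several matching rows per key.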


## Revisiting the relational algebra extend operator

* Add a column that corresponds to decade
* Add a constant column

In [27]:
%%sql

select date_part('year', hire_date)::integer, hire_date, birth_date from employees limit 10;

10 rows affected.


date_part,hire_date,birth_date
1986,1986-06-26,1953-09-02
1985,1985-11-21,1964-06-02
1986,1986-08-28,1959-12-03
1986,1986-12-01,1954-05-01
1989,1989-09-12,1955-01-21
1989,1989-06-02,1953-04-20
1989,1989-02-10,1957-05-23
1994,1994-09-15,1958-02-19
1985,1985-02-18,1952-04-19
1989,1989-08-24,1963-06-01


In [41]:
%%sql

select date_part('year', hire_date)::integer/10*10 decade, hire_date, birth_date from employees limit 10;

10 rows affected.


decade,hire_date,birth_date
1980,1986-06-26,1953-09-02
1980,1985-11-21,1964-06-02
1980,1986-08-28,1959-12-03
1980,1986-12-01,1954-05-01
1980,1989-09-12,1955-01-21
1980,1989-06-02,1953-04-20
1980,1989-02-10,1957-05-23
1990,1994-09-15,1958-02-19
1980,1985-02-18,1952-04-19
1980,1989-08-24,1963-06-01
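
The decade column works because dividing one integer by another truncates, so 1986/10 is 198 and multiplying by 10 gives 1980. Here is a minimal sketch of the same trick, assuming SQLite through Python's sqlite3 module rather than PostgreSQL. SQLite has no date_part, so strftime plus a cast stands in for it, and the two-row table is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (emp_no integer primary key, hire_date text);
    insert into employees values (10001, '1986-06-26'), (10008, '1994-09-15');
""")

# Extract the year as an integer, then use integer division to drop the last digit.
rows = conn.execute("""
    select emp_no,
           cast(strftime('%Y', hire_date) as integer) / 10 * 10 as decade
    from employees
    order by emp_no
""").fetchall()

print(rows)
```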


In [30]:
%sql select 'hello' greeting, birth_date, hire_date from employees limit 10;

10 rows affected.


greeting,birth_date,hire_date
hello,1953-09-02,1986-06-26
hello,1964-06-02,1985-11-21
hello,1959-12-03,1986-08-28
hello,1954-05-01,1986-12-01
hello,1955-01-21,1989-09-12
hello,1953-04-20,1989-06-02
hello,1957-05-23,1989-02-10
hello,1958-02-19,1994-09-15
hello,1952-04-19,1985-02-18
hello,1963-06-01,1989-08-24


* How old were you when you were hired?
* What is the average age at the time of hire?

In [40]:
%%sql

select (hire_date - birth_date)/365 from employees limit 10;

10 rows affected.


?column?
32
21
26
32
34
36
31
36
32
26


In [37]:
%%sql

select avg(hire_age) from (select (hire_date - birth_date)/365 hire_age from employees limit 10) T

1 rows affected.


avg
30.6
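
Note that the limit 10 sits inside the nested query, so 30.6 is the average of only the ten ages listed in the previous cell, not of all employees. The sketch below shows the difference on toy data. It assumes SQLite via Python's sqlite3 module, and the table with birth_year and hire_year columns is a simplified, hypothetical stand-in for the real date columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (emp_no integer primary key,
                            birth_year integer, hire_year integer);
    insert into employees values
        (1, 1950, 1980), (2, 1960, 1980), (3, 1950, 1990), (4, 1940, 1990);
""")

# Average over every row of the derived table.
avg_all = conn.execute("""
    select avg(hire_age)
    from (select hire_year - birth_year as hire_age from employees) as t
""").fetchone()[0]

# With a limit inside the derived table, only the first two rows are averaged.
avg_first_two = conn.execute("""
    select avg(hire_age)
    from (select hire_year - birth_year as hire_age
          from employees order by emp_no limit 2) as t
""").fetchone()[0]

print(avg_all, avg_first_two)
```

Dropping the inner limit from the query above would give the average over the whole employees table.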


### Who has held the most titles?

In [24]:
%%sql

select emp_no, count(*)
from titles
group by emp_no
order by count(*) desc
limit 10;

(psycopg2.ProgrammingError) aggregate function calls cannot be nested
LINE 1: select emp_no, max(count(*))
                           ^
 [SQL: 'select emp_no, max(count(*))\nfrom titles\ngroup by emp_no\norder by count(*) desc\nlimit 10;']


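The error shown above is left over from a first attempt that used max(count(*)): aggregate function calls cannot be nested.  The query as written, with a plain count(*), gives us the emp_no's of the ten employees with the most titles.  How can we use this to find the names?   A nested query!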

In [19]:
%%sql
select first_name, last_name, hire_date
from employees natural join (select emp_no, count(*)
    from titles
    group by emp_no
    order by count(*) desc
    limit 10) as foo

10 rows affected.


first_name,last_name,hire_date
Fun,Varman,1985-02-24
Nahid,Chepyzhov,1990-11-11
Basil,Ishibashi,1985-05-17
Bodh,Ranta,1988-03-12
Gil,Peroz,1991-09-02
Krisda,Krogh,1985-09-11
Kwee,Schusler,1986-02-26
Sumant,Peac,1985-02-18
Remzi,Cappello,1986-05-07
Mariangiola,Gulla,1987-05-24


We can even include columns from the nested query in our final result as follows:


In [20]:
%%sql

select first_name, last_name, hire_date, tcount
from employees natural join (select emp_no, count(*) tcount
    from titles
    group by emp_no
    order by count(*) desc
    limit 10) as foo
order by hire_date;




10 rows affected.


first_name,last_name,hire_date,tcount
Sumant,Peac,1985-02-18,3
Fun,Varman,1985-02-24,3
Basil,Ishibashi,1985-05-17,3
Krisda,Krogh,1985-09-11,3
Kwee,Schusler,1986-02-26,3
Remzi,Cappello,1986-05-07,3
Mariangiola,Gulla,1987-05-24,3
Bodh,Ranta,1988-03-12,3
Nahid,Chepyzhov,1990-11-11,3
Gil,Peroz,1991-09-02,3


Find all of the employees having the max count.

In [27]:
%%sql

select first_name, last_name, hire_date, tcount
from employees natural join (select emp_no, count(*) tcount
    from titles
    group by emp_no
    having count(*) >= (select max(count) from (select emp_no, count(*) from titles group by emp_no) as bar)
    order by count(*) desc
    ) as foo
order by hire_date;




3014 rows affected.


first_name,last_name,hire_date,tcount
Yakkov,Impagliazzo,1985-02-02,3
Taiji,Kemmerer,1985-02-02,3
Ayonca,Bruckman,1985-02-03,3
Shih,Auria,1985-02-04,3
Martine,Plotkin,1985-02-04,3
Panayotis,Thisen,1985-02-05,3
Shahab,Rotem,1985-02-05,3
Fumitaka,Akiyama,1985-02-05,3
Hilary,Snyers,1985-02-05,3
Tadahiro,Beeson,1985-02-06,3
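
The pattern used here is to compute the per-employee counts in a derived table, take the max of those counts in a second select, and compare against it in the having clause. A minimal sketch of that pattern follows, assuming SQLite via Python's sqlite3 module instead of PostgreSQL. The tables, rows, and the aliases per_emp and top_emps are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table employees (emp_no integer primary key, last_name text);
    create table titles (emp_no integer, title text);
    insert into employees values (1, 'Peac'), (2, 'Varman'), (3, 'Krogh');
    insert into titles values
        (1, 'Engineer'), (1, 'Senior Engineer'),
        (2, 'Staff'), (2, 'Senior Staff'),
        (3, 'Engineer');
""")

# Innermost select: one count per employee.
# Middle select: the largest of those counts (a single value).
# having: keep only the employees whose count equals that value.
rows = conn.execute("""
    select last_name, tcount
    from employees natural join (
        select emp_no, count(*) as tcount
        from titles
        group by emp_no
        having count(*) = (
            select max(n)
            from (select count(*) as n from titles group by emp_no) as per_emp
        )
    ) as top_emps
    order by last_name
""").fetchall()

print(rows)
```

Unlike the limit 10 version, this returns every employee tied for the maximum, however many there are.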


### Find all employees who have salaries above average.

In [17]:
%%sql

select first_name, last_name, salary 
from employees natural join salaries
where now() < to_date and salary > (select avg(salary) from salaries where now() < to_date )
limit 10;

10 rows affected.


first_name,last_name,salary
Kaijung,Yeung,80115
Brewster,Birdsall,105918
Conrado,Riexinger,79049
Magy,Ligten,89759
Garnet,Highland,116711
Mayuri,Strooper,82313
Boguslaw,Cardazo,75904
Huican,Muhling,77293
Lillian,Godskesen,76663
Kenroku,Tetzlaff,74151


## Super Bonus Query

### Find all employees who have salaries above average for their department -- Not for the faint of heart!!

First let's figure out the query to calculate the average salary for each dept_no.


In [34]:
%%sql

select dept_no, avg(salary)
from dept_emp natural join employees join salaries on employees.emp_no = salaries.emp_no
group by dept_no

9 rows affected.


dept_no,avg
d006,57251.27191341599
d007,80667.6057553377
d003,55574.879369695525
d001,71913.20000419153
d004,59605.482461651445
d002,70489.36489699609
d008,59665.1817012686
d005,59478.90116243182
d009,58770.36647976248


In [16]:
%%sql

select dept_name, first_name, last_name, salary
from (departments natural join dept_emp natural join employees join 
    (select emp_no, salary from salaries where now() < to_date) as bar 
        on employees.emp_no = bar.emp_no) as foo
where  salary > (select avg from 
                   (select dept_no, avg(salary)
                    from dept_emp natural join employees join salaries on employees.emp_no = salaries.emp_no
                    where foo.dept_no = dept_no and now() < salaries.to_date
                    group by dept_no) x )
limit 10

10 rows affected.


dept_name,first_name,last_name,salary
Development,Georgi,Facello,88958
Production,Chirstian,Koblick,74057
Human Resources,Kyoichi,Maliniak,94692
Research,Tzvetan,Zielinski,88070
Quality Management,Sumant,Peac,94409
Production,Duangkaew,Piveteau,80324
Quality Management,Duangkaew,Piveteau,80324
Human Resources,Eberhardt,Terkki,68901
Marketing,Cristinel,Bouloucos,99651
Production,Kazuhide,Peha,84672


Notice the foo.dept_no = dept_no in the nested query in the where clause.  This is called a correlated subquery: the inner query refers to a column of the outer query.  These are tricky, and they are usually pretty CPU intensive, as the inner query must conceptually be run once for each row of the outer query.
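
One common way to avoid rerunning the inner query per row is to compute every department's average once in a derived table and join against it. The sketch below compares the two forms on toy data. It assumes SQLite via Python's sqlite3 module rather than PostgreSQL, and the single flattened table pay, with its rows and aliases, is a hypothetical simplification of the employees schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table pay (emp_no integer primary key, dept_no text, salary integer);
    insert into pay values
        (1, 'd001', 50000), (2, 'd001', 70000),
        (3, 'd002', 90000), (4, 'd002', 95000), (5, 'd002', 100000);
""")

# Correlated form: the inner query refers to o.dept_no from the outer row.
correlated = conn.execute("""
    select o.emp_no, o.dept_no, o.salary
    from pay as o
    where o.salary > (select avg(p.salary) from pay as p
                      where p.dept_no = o.dept_no)
    order by o.emp_no
""").fetchall()

# Join form: per-department averages are computed once, then joined.
joined = conn.execute("""
    select o.emp_no, o.dept_no, o.salary
    from pay as o
    join (select dept_no, avg(salary) as dept_avg
          from pay group by dept_no) as d
      on o.dept_no = d.dept_no
    where o.salary > d.dept_avg
    order by o.emp_no
""").fetchall()

print(correlated)
print(joined)
```

Both forms return the same rows here. Which one runs faster on a real database depends on the query planner, so it is worth timing both.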