# Nested Queries, Type I Subquery

Nested queries are **subqueries** that exist within a larger (aka _outer_) query.

**Conceptual Type I / II Subquery**
![Subquery](../images/subquery-syntax.gif)


# Type I - Uncorrelated Subquery
An uncorrelated subquery is a subquery whose inner query does not depend on the outer query for its execution. It can run to completion as a standalone query. Let's illustrate uncorrelated subqueries with an example.

Suppose you have a database, `dsa_ro`, that contains the one table we are concerned with: `cities`. Now suppose we want to report the city and country with the lowest and the highest population.

The subquery in this case is uncorrelated. The inner query retrieves the populations of the least and most populous cities, and its result is fed directly into the outer query, which retrieves the city and country with those populations from the `cities` table.

The inner query that retrieves these populations can also be executed as a standalone query.
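To see that the inner query really does stand alone, here is a minimal sketch using Python's built-in `sqlite3` module. It assumes an in-memory SQLite table as a stand-in for the PostgreSQL `cities` table; the six rows are a hypothetical subset copied from the query outputs shown in this notebook, not the full table.

```python
import sqlite3

# In-memory SQLite database standing in for the PostgreSQL dsa_ro database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, country TEXT, population INTEGER)")
# Hypothetical subset of rows, taken from the outputs shown in this notebook.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Shanghai", "China", 22315500),
        ("Karachi", "Pakistan", 13052000),
        ("Mumbai", "India", 12691800),
        ("Beijing", "China", 11716600),
        ("Istanbul", "Turkey", 11174300),
        ("Odessa", "Ukraine", 1001600),
    ],
)

# The inner query runs on its own: it needs nothing from any outer query.
inner = "SELECT MIN(population) FROM cities"
min_pop = conn.execute(inner).fetchone()[0]

# The very same text, nested unchanged inside an outer query.
# (Building SQL with an f-string is acceptable here only because `inner`
# is our own constant, never user input.)
outer = f"SELECT city, country FROM cities WHERE population = ({inner})"
rows = conn.execute(outer).fetchall()

print(min_pop)  # 1001600
print(rows)     # [('Odessa', 'Ukraine')]
```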

Let us see this in action! 

# Use-Case

Imagine you are asked to report the City and Country from the `cities` table with the lowest and highest population. 

How would you do this?  
We could first find the `MIN()` and `MAX()` of the populations, then construct a second query that uses those values to select cities.
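The two-step approach can be sketched in Python with the standard `sqlite3` module, again assuming an in-memory SQLite table filled with a hypothetical subset of the rows shown in this notebook. For brevity, step 1 computes `MIN()` and `MAX()` in a single query, whereas the cells below run them separately; step 2 passes the values back as `?` parameters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, country TEXT, population INTEGER)")
# Hypothetical subset of rows, taken from the outputs shown in this notebook.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Shanghai", "China", 22315500),
        ("Karachi", "Pakistan", 13052000),
        ("Mumbai", "India", 12691800),
        ("Beijing", "China", 11716600),
        ("Istanbul", "Turkey", 11174300),
        ("Odessa", "Ukraine", 1001600),
    ],
)

# Step 1: find the extreme populations.
low, high = conn.execute(
    "SELECT MIN(population), MAX(population) FROM cities"
).fetchone()

# Step 2: feed those values back into a second query as parameters.
rows = conn.execute(
    "SELECT city, country, population FROM cities "
    "WHERE population IN (?, ?) ORDER BY population",
    (low, high),
).fetchall()

print(rows)
# [('Odessa', 'Ukraine', 1001600), ('Shanghai', 'China', 22315500)]
```

The drawback is the round trip: the application has to read the values and send them back, which is exactly what a nested query avoids.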

In [1]:
%load_ext sql
%sql postgres://dsa_ro_user:readonly@pgsql.dsa.lan/dsa_ro

'Connected: dsa_ro_user@dsa_ro'

In [2]:
%sql SELECT * FROM cities LIMIT 5;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
5 rows affected.


city,country,population
Shanghai,China,22315500
Karachi,Pakistan,13052000
Mumbai,India,12691800
Beijing,China,11716600
Istanbul,Turkey,11174300


In [3]:
%sql SELECT MIN(population) FROM cities;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
1 rows affected.


min
1001600


In [4]:
%sql SELECT MAX(population) FROM cities;

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
1 rows affected.


max
22315500


We should find the following values:
 * Minimum is 1001600
 * Maximum is 22315500

**NOTE: use `%%sql` (two percent signs) for a multi-line statement.**

In [5]:
%%sql 
SELECT city, country, population 
FROM cities
WHERE population in (1001600,22315500)
ORDER BY population

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


city,country,population
Odessa,Ukraine,1001600
Shanghai,China,22315500


Notice that to get our answer, we constructed a set of values, `(1001600, 22315500)`, and tested whether each row's population is one of those two values.

This query could also have been written as:

```SQL
SELECT city, country, population 
FROM cities
WHERE population = 1001600
  OR  population = 22315500
ORDER BY population
```
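As a quick check that the `IN` list and the chain of `OR` comparisons are interchangeable, this sketch runs both against an in-memory SQLite table (an assumption standing in for PostgreSQL, filled with a hypothetical subset of the rows shown in this notebook) and compares the results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, country TEXT, population INTEGER)")
# Hypothetical subset of rows, taken from the outputs shown in this notebook.
conn.executemany(
    "INSERT INTO cities VALUES (?, ?, ?)",
    [
        ("Shanghai", "China", 22315500),
        ("Karachi", "Pakistan", 13052000),
        ("Mumbai", "India", 12691800),
        ("Beijing", "China", 11716600),
        ("Istanbul", "Turkey", 11174300),
        ("Odessa", "Ukraine", 1001600),
    ],
)

IN_QUERY = """
SELECT city, country, population FROM cities
WHERE population IN (1001600, 22315500)
ORDER BY population
"""

OR_QUERY = """
SELECT city, country, population FROM cities
WHERE population = 1001600
   OR population = 22315500
ORDER BY population
"""

in_rows = conn.execute(IN_QUERY).fetchall()
or_rows = conn.execute(OR_QUERY).fetchall()

print(in_rows == or_rows)  # True
print(in_rows)
# [('Odessa', 'Ukraine', 1001600), ('Shanghai', 'China', 22315500)]
```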

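A nested query lets us replace each hard-coded value with a query, written inside parentheses, that computes it. Because `MIN()` and `MAX()` each return a single value, each subquery can be compared directly with `=`.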

In [6]:
%%sql 
SELECT city, country, population 
FROM cities
WHERE population = (SELECT MIN(population) FROM cities)
  OR  population = (SELECT MAX(population) FROM cities)
ORDER BY population

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


city,country,population
Odessa,Ukraine,1001600
Shanghai,China,22315500


**Alternatively**, place both subqueries inside an `IN` list:

In [7]:
%%sql 
SELECT city, country, population 
FROM cities
WHERE population in ( 
    (SELECT MIN(population) FROM cities), (SELECT MAX(population) FROM cities) 
    )
ORDER BY population

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


city,country,population
Odessa,Ukraine,1001600
Shanghai,China,22315500


## Type I Subqueries

When a subquery can be computed **once** and its result reused for every row of the _outer_ query, it is a Type I (one) subquery.
In contrast, a _correlated_ subquery refers to columns of the outer query, so it must be re-run for each row of the outer query.

Looking at the plan the database builds for the query, we see two `InitPlan` nodes, one per subquery.

These subqueries are _uncorrelated_ with the rows of the outer query.
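PostgreSQL is not the only database that distinguishes the two cases in its plans. The sketch below assumes SQLite (via Python's `sqlite3` module) and its `EXPLAIN QUERY PLAN` output, whose labels differ from PostgreSQL's `InitPlan`: SQLite marks a subquery that depends on the outer row as `CORRELATED`. The correlated query, which finds each country's most populous city, is a hypothetical contrast case and not part of this notebook's exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, country TEXT, population INTEGER)")


def plan(sql):
    """Return the human-readable 'detail' column of SQLite's query plan."""
    # Each EXPLAIN QUERY PLAN row is (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# Type I: the subqueries never mention the outer row.
uncorrelated = plan(
    "SELECT city FROM cities "
    "WHERE population = (SELECT MIN(population) FROM cities) "
    "   OR population = (SELECT MAX(population) FROM cities)"
)

# Hypothetical contrast: the subquery refers to the outer row (c.country),
# so it is logically re-evaluated for every row of the outer query.
correlated = plan(
    "SELECT city FROM cities AS c "
    "WHERE population = (SELECT MAX(population) FROM cities AS c2 "
    "                    WHERE c2.country = c.country)"
)

for line in uncorrelated:
    print(line)
print("---")
for line in correlated:
    print(line)
```

The exact wording of each plan line varies between SQLite versions, so compare the presence or absence of `CORRELATED` rather than the full text.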

In [8]:
%%sql 
EXPLAIN
SELECT city, country, population 
FROM cities
WHERE population in ( 
    (SELECT MIN(population) FROM cities), (SELECT MAX(population) FROM cities) 
    )
ORDER BY population

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
10 rows affected.


```
QUERY PLAN
Sort (cost=22.23..22.23 rows=2 width=20)
  Sort Key: cities.population
  InitPlan 1 (returns $0)
    -> Aggregate (cost=7.40..7.41 rows=1 width=4)
          -> Seq Scan on cities cities_1 (cost=0.00..6.52 rows=352 width=4)
  InitPlan 2 (returns $1)
    -> Aggregate (cost=7.40..7.41 rows=1 width=4)
          -> Seq Scan on cities cities_2 (cost=0.00..6.52 rows=352 width=4)
  -> Seq Scan on cities (cost=0.00..7.40 rows=2 width=20)
        Filter: (population = ANY (ARRAY[$0, $1]))
```


Each `InitPlan` stores its result in a parameter: `$0` and `$1`, respectively.

These values are then used by the sequential scan of `cities`, which tests `population IN ($0, $1)` for every row. The plan writes that test as:
```
Filter: (population = ANY (ARRAY[$0, $1]))
```
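The plan above can be mimicked in plain Python to make the "compute once, reuse for every row" idea concrete. This is only an illustration of the evaluation order, not how PostgreSQL is implemented; the list `cities` is a hypothetical subset of the rows shown in this notebook, and the names `p0` and `p1` are made up to mirror `$0` and `$1`.

```python
# Hypothetical rows; in the notebook these live in the PostgreSQL cities table.
cities = [
    ("Shanghai", "China", 22315500),
    ("Karachi", "Pakistan", 13052000),
    ("Mumbai", "India", 12691800),
    ("Beijing", "China", 11716600),
    ("Istanbul", "Turkey", 11174300),
    ("Odessa", "Ukraine", 1001600),
]

# InitPlan 1 and InitPlan 2: each aggregate is computed exactly once.
p0 = min(pop for _, _, pop in cities)  # plays the role of $0
p1 = max(pop for _, _, pop in cities)  # plays the role of $1
params = [p0, p1]                      # plays the role of ARRAY[$0, $1]

# Seq Scan with Filter: population = ANY (ARRAY[$0, $1]).
# The same params are reused for every row; nothing is recomputed.
matches = [row for row in cities if any(row[2] == p for p in params)]

# Sort Key: cities.population
matches.sort(key=lambda row: row[2])

print(matches)
# [('Odessa', 'Ukraine', 1001600), ('Shanghai', 'China', 22315500)]
```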

**NOTE**: An in-depth discussion of plans is in the next module.

**Now run the SQL command!**

In [9]:
%%sql 
SELECT city, country, population 
FROM cities
WHERE population in ( 
    (SELECT MIN(population) FROM cities), (SELECT MAX(population) FROM cities) 
    )
ORDER BY population

 * postgres://dsa_ro_user:***@pgsql.dsa.lan/dsa_ro
2 rows affected.


city,country,population
Odessa,Ukraine,1001600
Shanghai,China,22315500


# Save your Notebook, then `File > Close and Halt`