### SQL Subqueries and Implementations
### Objectives
- Use SQL subqueries to nest queries
- Identify common SQL dialects and tools
- Query data from web databases

In [1]:
import pandas as pd
import sqlite3

### SQL Subqueries
Just as you might nest one function call inside another in Python, you can nest queries in SQL. A subquery inside another query lets us express a multi-step query succinctly in a single statement.

In [3]:
conn = sqlite3.connect('data/flights.db')

### Subqueries in FROM
You can use a subquery in the FROM clause - this is useful, for example, if you want to apply multiple aggregation functions.

Let's say we want the average number of routes departing from an airport. First we need to count the routes departing from each airport, then take the average of those counts.

In [17]:
pd.read_sql('''
SELECT
    source AS depart_airport,
    COUNT(*) AS number_of_departures  -- COUNT(*) is the portable spelling of SQLite's COUNT()
FROM routes
GROUP BY source
''', conn)

Unnamed: 0,depart_airport,number_of_departures
0,AAE,9
1,AAL,20
2,AAN,2
3,AAQ,3
4,AAR,8
...,...,...
3404,ZUH,60
3405,ZUM,2
3406,ZVK,3
3407,ZYI,15


We can use this query as a subquery, and take the average of the new number_of_departures column.

In [18]:
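q = '''
SELECT AVG(number_of_departures)
FROM (
    -- inner query: one row per departure airport with its route count
    SELECT
        source AS depart_airport,
        COUNT(*) AS number_of_departures
    FROM routes
    GROUP BY source
) AS departures_per_airport  -- SQLite allows omitting this alias; some dialects require it
'''
pd.read_sql(q, conn)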

Unnamed: 0,AVG(number_of_departures)
0,19.848343
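
To check the two-step logic on data small enough to verify by hand, here is a minimal sketch using an in-memory SQLite database. The `toy` connection and its three-airport `routes` table are made up for illustration and are not part of `flights.db`. It uses the standard-library `sqlite3` cursor directly; the same SQL string could be passed to `pd.read_sql` instead.

```python
import sqlite3

# Hypothetical toy data: a tiny stand-in for the routes table.
toy = sqlite3.connect(":memory:")
toy.execute("CREATE TABLE routes (source TEXT, dest TEXT)")
toy.executemany(
    "INSERT INTO routes VALUES (?, ?)",
    [
        ("AAA", "BBB"), ("AAA", "CCC"), ("AAA", "DDD"),  # 3 departures from AAA
        ("BBB", "AAA"),                                  # 1 departure from BBB
        ("CCC", "AAA"), ("CCC", "BBB"),                  # 2 departures from CCC
    ],
)

# Step 1 (inner query): count departures per airport.
per_airport = toy.execute('''
SELECT source, COUNT(*) AS number_of_departures
FROM routes
GROUP BY source
ORDER BY source
''').fetchall()

# Step 2 (outer query): average the counts produced by step 1.
avg_departures = toy.execute('''
SELECT AVG(number_of_departures)
FROM (
    SELECT source, COUNT(*) AS number_of_departures
    FROM routes
    GROUP BY source
) AS departures_per_airport
''').fetchone()[0]

print(per_airport)     # [('AAA', 3), ('BBB', 1), ('CCC', 2)]
print(avg_departures)  # 2.0
```

The inner query yields the counts 3, 1 and 2, so the outer `AVG` returns (3 + 1 + 2) / 3 = 2.0.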


#### Subqueries acting as Tables


### Subqueries in WHERE
You can use a subquery in the WHERE clause - this is useful, for example, if you want to filter a query based on results from another query.

Let's say we want a table of the departure and destination airports for every flight route, but only for routes departing from the five countries with the most airports. First, we find those five countries.


In [32]:
q = '''
SELECT
    country,
    COUNT(*) AS number_of_airports_in_country
FROM airports
GROUP BY country
ORDER BY number_of_airports_in_country DESC
LIMIT 5
'''
pd.read_sql(q, conn)

Unnamed: 0,country,number_of_airports_in_country
0,United States,1697
1,Canada,435
2,Germany,321
3,Australia,263
4,Russia,249


In [39]:
q = '''
SELECT source, dest, ap.country
FROM routes AS rt
JOIN airports AS ap
    ON rt.source_id = ap.id
WHERE ap.country IN (
    -- subquery: the five countries with the most airports
    SELECT country
    FROM airports
    GROUP BY country
    ORDER BY COUNT(*) DESC
    LIMIT 5
)
'''
pd.read_sql(q, conn)

Unnamed: 0,source,dest,country
0,YAM,YQT,Canada
1,YAM,YSB,Canada
2,YAM,YTZ,Canada
3,YAM,YYZ,Canada
4,YAY,YBX,Canada
...,...,...,...
20330,ULK,YKS,Russia
20331,ULK,YKS,Russia
20332,BQB,ALH,Australia
20333,BQB,PER,Australia
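
The filtering behaviour is easier to verify on a handful of rows. The following is a minimal sketch on an in-memory database; the `toy` connection, the airport codes and the country names are all invented for illustration, and the limit is reduced to two countries so the result can be checked by eye.

```python
import sqlite3

# Hypothetical toy schema loosely mirroring airports and routes.
toy = sqlite3.connect(":memory:")
toy.executescript('''
CREATE TABLE airports (id INTEGER PRIMARY KEY, code TEXT, country TEXT);
CREATE TABLE routes (source_id INTEGER, dest_id INTEGER);

INSERT INTO airports VALUES
    (1, 'AAA', 'Atlantis'), (2, 'BBB', 'Atlantis'), (3, 'CCC', 'Atlantis'),
    (4, 'DDD', 'Borduria'), (5, 'EEE', 'Borduria'),
    (6, 'FFF', 'Cascadia');

INSERT INTO routes VALUES (1, 4), (4, 6), (6, 1), (2, 6);
''')

# The subquery returns the two countries with the most airports
# (Atlantis with 3, Borduria with 2); the outer query keeps only
# routes whose departure airport is in one of them.
rows = toy.execute('''
SELECT src.code, src.country
FROM routes AS rt
JOIN airports AS src
    ON rt.source_id = src.id
WHERE src.country IN (
    SELECT country
    FROM airports
    GROUP BY country
    ORDER BY COUNT(*) DESC
    LIMIT 2
)
ORDER BY src.code
''').fetchall()

print(rows)  # [('AAA', 'Atlantis'), ('BBB', 'Atlantis'), ('DDD', 'Borduria')]
```

The route departing from `FFF` is dropped because Cascadia, with a single airport, is not among the top two countries.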


### Level Up: Common Table Expressions
Common Table Expressions (CTEs) are a more readable way to implement subqueries, using WITH and AS.

In [42]:
q = '''
WITH top_5_countries AS (
    SELECT country
    FROM airports
    GROUP BY country
    ORDER BY COUNT(*) DESC
    LIMIT 5
)
SELECT source, dest, ap.country
FROM routes AS rt
JOIN airports AS ap
    ON rt.source_id = ap.id
-- "IN top_5_countries" is a SQLite shorthand; selecting from the CTE is portable
WHERE ap.country IN (SELECT country FROM top_5_countries)
'''
pd.read_sql(q, conn)

Unnamed: 0,source,dest,country
0,YAM,YQT,Canada
1,YAM,YSB,Canada
2,YAM,YTZ,Canada
3,YAM,YYZ,Canada
4,YAY,YBX,Canada
...,...,...,...
20330,ULK,YKS,Russia
20331,ULK,YKS,Russia
20332,BQB,ALH,Australia
20333,BQB,PER,Australia
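
A CTE can stand in for a subquery in FROM as well. As a minimal sketch, assuming a made-up in-memory `routes` table (not the one in `flights.db`), the code below runs an average-of-counts query both ways and compares the results.

```python
import sqlite3

# Hypothetical toy data, used only to compare the two query styles.
toy = sqlite3.connect(":memory:")
toy.execute("CREATE TABLE routes (source TEXT, dest TEXT)")
toy.executemany(
    "INSERT INTO routes VALUES (?, ?)",
    [("AAA", "BBB"), ("AAA", "CCC"), ("BBB", "AAA"),
     ("BBB", "CCC"), ("BBB", "DDD"), ("CCC", "AAA")],
)

# Style 1: the per-airport counts live in a subquery inside FROM.
subquery_sql = '''
SELECT AVG(number_of_departures)
FROM (
    SELECT source, COUNT(*) AS number_of_departures
    FROM routes
    GROUP BY source
) AS departures
'''

# Style 2: the same counts are named up front with WITH ... AS.
cte_sql = '''
WITH departures AS (
    SELECT source, COUNT(*) AS number_of_departures
    FROM routes
    GROUP BY source
)
SELECT AVG(number_of_departures)
FROM departures
'''

subquery_result = toy.execute(subquery_sql).fetchone()[0]
cte_result = toy.execute(cte_sql).fetchone()[0]

print(subquery_result == cte_result)  # True
```

Both forms describe the same computation; the CTE simply moves the intermediate table to the top of the statement and gives it a name, which tends to help once a query has several steps.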


### Exercise
Create a table listing all airlines that serve the three airports with the most outbound routes.

In [63]:
q = '''
WITH top_3_airports AS (
    -- the three airports with the most outbound routes
    SELECT airports.id
    FROM airports
    LEFT JOIN routes
        ON routes.source_id = airports.id
    GROUP BY airports.id
    ORDER BY COUNT(*) DESC
    LIMIT 3
)
SELECT DISTINCT
    rt.airline
FROM routes AS rt
WHERE rt.source_id IN (SELECT id FROM top_3_airports)
'''
pd.read_sql(q, conn)

Unnamed: 0,airline
0,3E
1,3M
2,3U
3,5J
4,8L
...,...
97,WN
98,WS
99,Y4
100,Y7


### SQL Versions
There is no single version of SQL; there are many out there! Most of what you're learning about SQL with SQLite carries over to the others, with some differences in syntax and built-in functions. Just keep in mind when you apply for jobs that any of these may appear in a job posting, and they are all variations on what you already know.

### SQL Dialects
As with dialects of spoken languages, SQL dialects have many commonalities but some differences in syntax and functionality. Here are a few of the major players:

- SQLite (we've already seen this!)
- PostgreSQL (free and open-source!)
- Oracle SQL
- MySQL (open-source, but owned by Oracle)
- Microsoft SQL Server
- Transact-SQL (Microsoft's extension of SQL, used by SQL Server)
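
One concrete syntax difference is how each dialect limits the number of rows returned. The sketch below writes the same "top two countries" request in three spellings and checks which ones SQLite accepts. The `toy` table is invented for illustration, and only the SQLite behaviour is actually executed here; the SQL Server and Oracle strings are shown for comparison and would need those databases to run.

```python
import sqlite3

# Hypothetical toy table, used only to have something to query.
toy = sqlite3.connect(":memory:")
toy.execute("CREATE TABLE airports (code TEXT, country TEXT)")
toy.executemany(
    "INSERT INTO airports VALUES (?, ?)",
    [("AAA", "Atlantis"), ("BBB", "Atlantis"), ("CCC", "Borduria")],
)

# The same request spelled three ways.
top_two = {
    # SQLite, PostgreSQL and MySQL use LIMIT.
    "limit": (
        "SELECT country, COUNT(*) AS n FROM airports "
        "GROUP BY country ORDER BY n DESC LIMIT 2"
    ),
    # SQL Server (Transact-SQL) uses TOP right after SELECT.
    "top": (
        "SELECT TOP 2 country, COUNT(*) AS n FROM airports "
        "GROUP BY country ORDER BY n DESC"
    ),
    # Oracle 12c and later use the standard FETCH FIRST clause.
    "fetch_first": (
        "SELECT country, COUNT(*) AS n FROM airports "
        "GROUP BY country ORDER BY n DESC FETCH FIRST 2 ROWS ONLY"
    ),
}

# Try each spelling against SQLite and record whether it parses.
accepted_by_sqlite = {}
for name, sql in top_two.items():
    try:
        toy.execute(sql).fetchall()
        accepted_by_sqlite[name] = True
    except sqlite3.OperationalError:
        accepted_by_sqlite[name] = False

print(accepted_by_sqlite)
```

Only the `LIMIT` spelling runs in SQLite; the other two raise a syntax error there. The query logic is identical in all three, which is why moving between dialects is mostly a matter of looking up details like this one.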
### SQLite Pros & Cons
We use SQLite in this course, but it has some limitations.

#### Pros
- Easy to set up
- Easy to share database files
- Uses little memory

#### Cons
- Limited functionality for managing users and access permissions
- Limited write concurrency: only one connection can write at a time, so simultaneous edits have to wait or fail with a "database is locked" error
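
The concurrency limitation can be observed directly. The following is a minimal sketch, assuming a throwaway database file in a temporary directory and two hypothetical connections named `writer_a` and `writer_b`. While the first connection holds a write transaction, the second is refused instead of being allowed to edit at the same time.

```python
import os
import sqlite3
import tempfile

# Throwaway database file; the path and table are illustrative only.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None leaves transaction control to the explicit
# BEGIN/COMMIT statements below. The short timeout makes the second
# writer give up quickly instead of waiting the default 5 seconds.
writer_a = sqlite3.connect(db_path, isolation_level=None)
writer_b = sqlite3.connect(db_path, isolation_level=None, timeout=0.1)

writer_a.execute("CREATE TABLE edits (who TEXT)")

# Writer A starts a write transaction and holds the write lock.
writer_a.execute("BEGIN IMMEDIATE")
writer_a.execute("INSERT INTO edits VALUES ('a')")

# Writer B tries to start its own write transaction while A is still open.
try:
    writer_b.execute("BEGIN IMMEDIATE")
    second_writer_blocked = False
except sqlite3.OperationalError as exc:
    second_writer_blocked = True
    print("second writer refused:", exc)

# Once A commits, B can take its turn.
writer_a.execute("COMMIT")
writer_b.execute("BEGIN IMMEDIATE")
writer_b.execute("INSERT INTO edits VALUES ('b')")
writer_b.execute("COMMIT")

row_count = writer_b.execute("SELECT COUNT(*) FROM edits").fetchone()[0]
print(row_count)  # 2

writer_a.close()
writer_b.close()
```

Writes are serialized rather than interleaved, which is fine for a notebook or a single-user application but becomes a bottleneck when many clients need to write at once.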

### Exercise 1: Create a table showing the number of listings in each neighborhood

![](images/NoofNeighbourhoods.PNG)

### Exercise 2: Create a table showing the 20 listings with the most reviews
![](images/Mostreviews.PNG)

### Exercise 3: Create a table showing all of the reviews for listings that are "Bed & Breakfast" property types.
![](images/allreviewsFor_bedBreakfast.PNG)