<img src = "https://images2.imgbox.com/60/09/VFwl5LOq_o.jpg" width="400">

# 3. Correlated Queries, Nested Queries, and Common Table Expressions
---

In this chapter, you will learn how to use nested and correlated subqueries to extract more complex data from a relational database. You will also learn about common table expressions and how to best construct queries using multiple common table expressions.

In [None]:
# %pip install ipython-sql

In [1]:
%load_ext sql

In [2]:
%sql sqlite:///data/soccer.db

'Connected: @data/soccer.db'

## Basic Correlated Subqueries
---

Correlated subqueries are subqueries that reference one or more columns in the main query. Because they depend on information in the main query, they cannot be executed on their own. Conceptually, a correlated subquery is evaluated once for each row the main query processes, which usually takes considerably more computing power and time than a simple subquery.

**Simple Subquery**
- Can be run *independently* from the main query
- Evaluated once in the whole query

**Correlated Subquery**
- *Dependent* on the main query to execute
- Evaluated in loops (significantly slows down query runtime)
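
The contrast between the two kinds of subquery can be sketched outside the notebook with Python's built-in `sqlite3` module. The table below is hypothetical toy data, not the contents of `soccer.db`; only the column names are borrowed from the `match` table.

```python
import sqlite3

# Hypothetical toy data: (id, country_id, home_goal, away_goal)
rows = [
    (1, 1, 1, 0),
    (2, 1, 0, 1),
    (3, 1, 1, 1),
    (4, 1, 6, 5),
    (5, 2, 3, 3),
    (6, 2, 4, 2),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE match (id INTEGER, country_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany("INSERT INTO match VALUES (?, ?, ?, ?)", rows)

# Simple subquery: the inner SELECT can run on its own and yields one
# overall average that every outer row is compared against.
simple = conn.execute(
    """
    SELECT id
    FROM   match
    WHERE  home_goal + away_goal >
           (SELECT AVG(home_goal + away_goal) FROM match)
    ORDER  BY id
    """
).fetchall()

# Correlated subquery: the inner SELECT refers to main.country_id, so it
# cannot run alone; its value depends on the outer row being examined.
correlated = conn.execute(
    """
    SELECT main.id
    FROM   match AS main
    WHERE  main.home_goal + main.away_goal >
           (SELECT AVG(sub.home_goal + sub.away_goal)
            FROM   match AS sub
            WHERE  sub.country_id = main.country_id)
    ORDER  BY main.id
    """
).fetchall()

print(simple)      # [(4,), (5,), (6,)]
print(correlated)  # [(4,)]
```

With this data the overall average is 4.5 goals, while the per-country averages are 3.75 and 6, which is why the two queries keep different matches.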


In this exercise, you will practice using correlated subqueries to examine matches with scores that are extreme outliers for each country -- above 3 times the average score!

### Instructions

Select the `country_id`, `date`, `home_goal`, and `away_goal` columns in the main query.

Complete the `AVG` value in the subquery.

Complete the subquery column references, so that `country_id` is matched in the main and subquery.

In [3]:
%%sql

SELECT main.country_id,
       main.date,
       main.home_goal,
       main.away_goal
FROM   match AS main
WHERE  ( home_goal + away_goal ) > 
            (SELECT AVG(( sub.home_goal + sub.away_goal ) * 3)            
            FROM   match AS sub
            WHERE  main.country_id = sub.country_id)

 * sqlite:///data/soccer.db
Done.


country_id,date,home_goal,away_goal
1729,2011-08-28 00:00:00,8,2
1729,2012-12-29 00:00:00,7,3
1729,2013-05-19 00:00:00,5,5
1729,2013-12-14 00:00:00,6,3
1729,2014-03-22 00:00:00,3,6
1729,2014-08-30 00:00:00,3,6
4769,2011-10-15 00:00:00,5,3
4769,2011-12-21 00:00:00,4,4
4769,2012-02-12 00:00:00,4,5
4769,2012-02-25 00:00:00,4,4


## Correlated subquery with multiple conditions
---

Correlated subqueries are useful for matching data across multiple columns. In the previous exercise, you generated a list of matches with extremely high scores for each country. In this exercise, you're going to add an additional column for matching to answer the question -- what was the highest scoring match for each country, in each season?

**Note: this query may take a while to load.**

### Instructions

Select the `country_id`, `date`, `home_goal`, and `away_goal` columns in the main query.

Complete the subquery: Select the matches with the highest number of total goals.

Match the subquery to the main query using `country_id` and `season`.

Fill in the correct logical operator so that total goals equals the max goals recorded in the subquery.

In [5]:
%%sql

SELECT main.country_id,
       main.date,
       main.home_goal,
       main.away_goal
FROM   match AS main
WHERE  ( home_goal + away_goal ) = 
            (SELECT MAX(sub.home_goal + sub.away_goal)
             FROM   match AS sub
             WHERE  main.country_id = sub.country_id
                    AND main.season = sub.season) 

 * sqlite:///data/soccer.db
Done.


country_id,date,home_goal,away_goal
1,2012-11-17 00:00:00,2,6
1,2012-12-09 00:00:00,1,7
1,2013-01-19 00:00:00,2,6
1,2012-08-19 00:00:00,2,6
1,2014-04-19 00:00:00,2,4
1,2014-04-26 00:00:00,4,2
1,2015-01-17 00:00:00,1,7
1,2014-09-13 00:00:00,3,5
1729,2011-08-28 00:00:00,8,2
1729,2012-12-29 00:00:00,7,3


## Nested simple subqueries
---

Nested subqueries can be either simple or correlated.

Just like an unnested subquery, a simple nested subquery can be executed independently of the query that encloses it, while a correlated nested subquery needs the enclosing query in order to run and produce results.

In this exercise, you will practice creating a nested subquery to examine the highest total number of goals in each season, overall, and during July across all seasons.

### Instructions

Complete the main query to select the season and the max total goals in a match for each season. Name this `max_goals`.

Complete the first simple subquery to select the max total goals in a match across all seasons. Name this `overall_max_goals`.

Complete the nested subquery to select the maximum total goals in a match played in July across all seasons.

Select the maximum total goals in the outer subquery. Name this entire subquery `july_max_goals`.

`SELECT  season,
         MAX( home_goal + away_goal )                     AS max_goals,
        (SELECT MAX( home_goal + away_goal ) FROM match)  AS overall_max_goals,
        (SELECT MAX( home_goal + away_goal )
         FROM   match
         WHERE  id IN 
                (SELECT id FROM   match
                 WHERE  EXTRACT(MONTH FROM date) = 7))    AS july_max_goals
 FROM   match
 GROUP  BY season`
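
`EXTRACT` is PostgreSQL syntax and is not available in SQLite, the engine this notebook connects to. A minimal sketch of an equivalent, assuming dates are stored as `YYYY-MM-DD HH:MM:SS` text as in the outputs shown in this chapter, uses SQLite's `strftime` function instead. The rows below are hypothetical toy data.

```python
import sqlite3

# Hypothetical toy data: (id, season, date, home_goal, away_goal)
rows = [
    (1, "2011/2012", "2011-07-30 00:00:00", 2, 1),
    (2, "2011/2012", "2011-08-28 00:00:00", 8, 2),
    (3, "2012/2013", "2012-07-28 00:00:00", 5, 2),
    (4, "2012/2013", "2012-12-29 00:00:00", 6, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE match (id INTEGER, season TEXT, date TEXT, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany("INSERT INTO match VALUES (?, ?, ?, ?, ?)", rows)

# strftime('%m', date) returns the month as two-digit text, so it is
# compared with the string '07' rather than the number 7.
result = conn.execute(
    """
    SELECT season,
           MAX(home_goal + away_goal)                      AS max_goals,
           (SELECT MAX(home_goal + away_goal) FROM match)  AS overall_max_goals,
           (SELECT MAX(home_goal + away_goal)
            FROM   match
            WHERE  id IN (SELECT id
                          FROM   match
                          WHERE  strftime('%m', date) = '07')) AS july_max_goals
    FROM   match
    GROUP  BY season
    ORDER  BY season
    """
).fetchall()

for row in result:
    print(row)
# ('2011/2012', 10, 10, 7)
# ('2012/2013', 9, 10, 7)
```

Both scalar subqueries are simple rather than correlated, so `overall_max_goals` and `july_max_goals` repeat the same value on every row.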

## Nest a subquery in FROM
---

What's the average number of matches per season where a team scored 5 or more goals? How does this differ by country?

Let's use nested subqueries in `FROM` to perform this operation. In the real world, you will probably find that nesting multiple subqueries is a task you don't have to perform often. In some cases, however, you may find yourself struggling to properly group by the column you want, or to calculate information requiring multiple mathematical transformations (e.g., an `AVG` of a `COUNT`).

Performing your transformations one step at a time, wrapping each step in a subquery before applying the next set of transformations, is often the easiest way to yield accurate information about your data. Let's get to it!
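
As a rough illustration of an `AVG` of a `COUNT`, the sketch below builds the calculation in layers with Python's `sqlite3` module. The rows are hypothetical toy data, and the aliases `inner_s` and `outer_s` are used only to label the two layers.

```python
import sqlite3

# Hypothetical toy data: (id, country_id, season, home_goal, away_goal)
rows = [
    (1, 1, "2012/2013", 5, 0),
    (2, 1, "2012/2013", 1, 6),
    (3, 1, "2012/2013", 1, 1),
    (4, 1, "2013/2014", 5, 5),
    (5, 2, "2012/2013", 7, 0),
    (6, 2, "2013/2014", 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE match (id INTEGER, country_id INTEGER, season TEXT, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany("INSERT INTO match VALUES (?, ?, ?, ?, ?)", rows)

# Layer 1 (inner_s): keep matches where either team scored 5 or more.
# Layer 2 (outer_s): count those matches per country and season.
per_season_sql = """
    SELECT country_id,
           season,
           COUNT(id) AS matches
    FROM   (SELECT country_id, season, id
            FROM   match
            WHERE  home_goal >= 5
                    OR away_goal >= 5) AS inner_s
    GROUP  BY country_id, season
"""
per_season = conn.execute(
    per_season_sql + " ORDER BY country_id, season"
).fetchall()

# Layer 3 (main query): average the per-season counts for each country.
per_country = conn.execute(
    f"""
    SELECT country_id,
           AVG(matches) AS avg_matches
    FROM   ({per_season_sql}) AS outer_s
    GROUP  BY country_id
    ORDER  BY country_id
    """
).fetchall()

print(per_season)   # [(1, '2012/2013', 2), (1, '2013/2014', 1), (2, '2012/2013', 1)]
print(per_country)  # [(1, 1.5), (2, 1.0)]
```

Seasons with no qualifying match never reach the counting layer, so they are left out of the average rather than counted as zero.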

### Instructions

Generate a list of matches where **at least one team scored 5 or more goals.**

In [15]:
%%sql

SELECT country_id,
       season,
       id
FROM   match
WHERE  home_goal >= 5
        OR away_goal >= 5 
LIMIT 20

 * sqlite:///data/soccer.db
Done.


country_id,season,id
1,2012/2013,999
1,2012/2013,1029
1,2012/2013,1045
1,2012/2013,1046
1,2012/2013,1059
1,2012/2013,1077
1,2012/2013,1080
1,2012/2013,1083
1,2012/2013,1120
1,2012/2013,1188


Turn the query from the previous step into a *subquery* in the `FROM` statement.

`COUNT` the match `id`s generated in the previous step, and group the query by `country_id` and `season`.

In [16]:
%%sql

SELECT country_id,
       season,
       COUNT(id) AS matches
FROM   (SELECT country_id,
               season,
               id
        FROM   match
        WHERE  home_goal >= 5
                OR away_goal >= 5) AS subquery
GROUP  BY country_id,
          season

 * sqlite:///data/soccer.db
Done.


country_id,season,matches
1,2012/2013,12
1,2014/2015,11
1729,2011/2012,20
1729,2012/2013,15
1729,2013/2014,14
1729,2014/2015,11
4769,2011/2012,6
4769,2012/2013,5
4769,2013/2014,8
4769,2014/2015,13


Finally, declare the same query from step 2 as a subquery in `FROM` with the alias `outer_s`.

Left join it to the `country` table using the outer query's `country_id` column.

Calculate an `AVG` of high scoring `matches` per country in the main query.

In [17]:
%%sql

SELECT c.name               AS country,
       AVG(outer_s.matches) AS avg_seasonal_high_scores
FROM   country AS c
       LEFT JOIN (SELECT country_id,
                         season,
                         COUNT(id) AS matches
                  FROM   (SELECT country_id,
                                 season,
                                 id
                          FROM   match
                          WHERE  home_goal >= 5
                                  OR away_goal >= 5) AS inner_s
                  GROUP  BY country_id,
                            season) AS outer_s
              ON c.id = outer_s.country_id
GROUP  BY country 

 * sqlite:///data/soccer.db
Done.


Average `outer_s.matches`, the per-season count, rather than `id`. The only `id` column in scope in the main query is `c.id`, so `AVG(id)` returns each country's id (1.0 for Belgium, 1729.0 for England, 4769.0 for France, and so on) instead of an average number of matches.


## Clean up with CTEs
---

In chapter 2, you generated a list of countries and the number of matches in each country with 10 or more total goals. The query in that exercise used a subquery in the `FROM` statement to filter the matches before counting them in the main query. Below is the query you created:

`SELECT c.name        AS country,
        COUNT(sub.id) AS matches
 FROM   country AS c
        INNER JOIN (SELECT country_id,
                           id
                    FROM   match
                    WHERE  ( home_goal + away_goal ) >= 10) AS sub
                ON c.id = sub.country_id
 GROUP  BY country `

You can list one (or more) subqueries as **common table expressions** (CTEs) by *declaring* them ahead of your main query, which is an excellent tool for organizing information and placing it in a logical order.

Why use CTEs?
- Often executed once (the engine can store the CTE's result in memory, which may improve query performance)
- Improving organization of queries
- Referencing other CTEs
- Referencing itself (recursive CTE)
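
Two of these points can be sketched with Python's `sqlite3` module: a CTE that reads from an earlier CTE, and a CTE that references itself, which is written with `WITH RECURSIVE`. The table contents and the CTE names `totals`, `high`, and `counter` are hypothetical.

```python
import sqlite3

# Hypothetical toy data: (id, country_id, home_goal, away_goal)
rows = [
    (1, 1, 8, 2),
    (2, 1, 1, 1),
    (3, 2, 6, 4),
    (4, 2, 9, 2),
    (5, 2, 0, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE match (id INTEGER, country_id INTEGER, "
    "home_goal INTEGER, away_goal INTEGER)"
)
conn.executemany("INSERT INTO match VALUES (?, ?, ?, ?)", rows)

# Chained CTEs: `high` is declared after `totals` and selects from it.
chained = conn.execute(
    """
    WITH totals AS (
        SELECT id,
               country_id,
               home_goal + away_goal AS goals
        FROM   match),
    high AS (
        SELECT country_id,
               id
        FROM   totals
        WHERE  goals >= 10)
    SELECT country_id,
           COUNT(id) AS matches
    FROM   high
    GROUP  BY country_id
    ORDER  BY country_id
    """
).fetchall()

# Recursive CTE: `counter` refers to itself to generate the numbers 1 to 5.
recursive = conn.execute(
    """
    WITH RECURSIVE counter(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM counter WHERE n < 5)
    SELECT n FROM counter
    """
).fetchall()

print(chained)    # [(1, 1), (2, 2)]
print(recursive)  # [(1,), (2,), (3,), (4,), (5,)]
```

CTEs are separated by commas after a single `WITH`, and each one can only refer to CTEs declared before it (or to itself, in the recursive case).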


In this exercise, let's rewrite a similar query using a CTE.


### Instructions

Complete the syntax to declare your CTE.

Select the `country_id` and match `id` from the `match` table in your CTE.

Left join the CTE to the `league` table using `country_id`.

In [18]:
%%sql

WITH match_list
     AS (SELECT country_id,
                id
         FROM   match
         WHERE  ( home_goal + away_goal ) >= 10)
SELECT l.NAME               AS league,
       COUNT(match_list.id) AS matches
FROM   league AS l
       LEFT JOIN match_list
              ON l.id = match_list.country_id
GROUP  BY l.NAME 

 * sqlite:///data/soccer.db
Done.


league,matches
Belgium Jupiler League,0
England Premier League,6
France Ligue 1,0
Germany 1. Bundesliga,2
Italy Serie A,0
Netherlands Eredivisie,2
Poland Ekstraklasa,0
Portugal Liga ZON Sagres,0
Scotland Premier League,0
Spain LIGA BBVA,8


## Organizing with CTEs
---

Previously, you modified a query based on a statement you completed in chapter 2 using common table expressions.

This time, let's expand on the exercise by looking at details about matches with very high scores using CTEs. Just like a subquery in `FROM`, you can join tables *inside* a CTE.

### Instructions

Declare your CTE, where you create a list of all matches with the league name.

Select the league, date, home, and away goals from the CTE.

Filter the main query for matches with 10 or more goals.

In [19]:
%%sql

WITH match_list AS (
    SELECT l.name AS league, 
           m.date, 
           m.home_goal, 
           m.away_goal,
          (m.home_goal + m.away_goal) AS total_goals
    FROM match AS m
    LEFT JOIN league as l ON m.country_id = l.id)
SELECT league, date, home_goal, away_goal
FROM match_list
WHERE total_goals >= 10

 * sqlite:///data/soccer.db
Done.


league,date,home_goal,away_goal
England Premier League,2011-08-28 00:00:00,8,2
England Premier League,2011-08-28 00:00:00,8,2
England Premier League,2012-12-29 00:00:00,7,3
England Premier League,2012-12-29 00:00:00,7,3
England Premier League,2013-05-19 00:00:00,5,5
England Premier League,2013-05-19 00:00:00,5,5
Germany 1. Bundesliga,2013-03-30 00:00:00,9,2
Germany 1. Bundesliga,2013-03-30 00:00:00,9,2
Netherlands Eredivisie,2011-11-06 00:00:00,6,4
Netherlands Eredivisie,2011-11-06 00:00:00,6,4

Each match appears twice in this output, although the queries that read `match` on its own return every match once. This suggests that the `league` table in this database contains duplicate rows, which would also double the counts in the previous exercise.


## CTEs with nested subqueries
---

If you find yourself listing multiple subqueries in the `FROM` clause with nested statements, your query will likely become long, complex, and difficult to read.

Since many queries are written with the intention of being saved and re-run in the future, proper organization is key to a seamless workflow. Arranging subqueries as CTEs will save you time, space, and confusion in the long run!

### Instructions

Declare a CTE that calculates the total goals from matches in August of the 2013/2014 season.

Left join the CTE onto the league table using `country_id` from the `match_list` CTE.

Filter the list on the inner subquery to only select matches in August of the 2013/2014 season.

`WITH match_list AS (
     SELECT country_id,
          ( home_goal + away_goal ) AS goals
     FROM   match
     WHERE  id IN (SELECT id
                   FROM   match
                   WHERE  season = '2013/2014'
                   AND EXTRACT(MONTH FROM date) = 8))
 SELECT l.name,
        AVG(match_list.goals)
 FROM   league AS l
        LEFT JOIN match_list
               ON l.id = match_list.country_id
 GROUP  BY l.name` 
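
The query above relies on `EXTRACT`, which SQLite does not provide. The sketch below, using Python's `sqlite3` module and hypothetical toy data, swaps in `strftime('%m', date)` and also shows that the nested `id IN (...)` filter returns the same result as placing the conditions directly in the CTE's `WHERE` clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy tables that reuse the chapter's column names.
conn.executescript(
    """
    CREATE TABLE league (id INTEGER, name TEXT);
    CREATE TABLE match (id INTEGER, country_id INTEGER, season TEXT,
                        date TEXT, home_goal INTEGER, away_goal INTEGER);
    INSERT INTO league VALUES (1, 'League A'), (2, 'League B');
    INSERT INTO match VALUES
        (1, 1, '2013/2014', '2013-08-17 00:00:00', 2, 1),
        (2, 1, '2013/2014', '2013-08-24 00:00:00', 3, 2),
        (3, 1, '2013/2014', '2013-09-14 00:00:00', 9, 0),
        (4, 2, '2012/2013', '2012-08-18 00:00:00', 4, 4);
    """
)

# Main query shared by both variants; only the CTE body differs.
main_query = """
    SELECT l.name,
           AVG(match_list.goals)
    FROM   league AS l
           LEFT JOIN match_list
                  ON l.id = match_list.country_id
    GROUP  BY l.name
    ORDER  BY l.name
"""

# Variant 1: filter through a nested subquery, as in the exercise.
nested = conn.execute(
    """
    WITH match_list AS (
        SELECT country_id,
               (home_goal + away_goal) AS goals
        FROM   match
        WHERE  id IN (SELECT id
                      FROM   match
                      WHERE  season = '2013/2014'
                             AND strftime('%m', date) = '08'))
    """ + main_query
).fetchall()

# Variant 2: apply the same conditions directly inside the CTE.
flat = conn.execute(
    """
    WITH match_list AS (
        SELECT country_id,
               (home_goal + away_goal) AS goals
        FROM   match
        WHERE  season = '2013/2014'
               AND strftime('%m', date) = '08')
    """ + main_query
).fetchall()

print(nested)  # [('League A', 4.0), ('League B', None)]
print(flat)    # same rows
```

Because of the left join, a league with no qualifying matches still appears, with a `NULL` average (`None` in Python).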

## Deciding on techniques to use
---

**Joins**
- Combine 2+ tables
- Simple operations/aggregations

**Correlated Subqueries**
- Match subqueries & tables
- Avoid limits of joins
- High processing time

**Multiple/Nested Subqueries**
- Multi-step transformations
- Improve accuracy and reproducibility

**Common Table Expressions**
- Organize subqueries sequentially
- Can reference other CTEs

### So which do I use?
- Depends on your database/question
- The technique that best allows you to: 
    - Use and reuse your queries
    - Generate clear and accurate results

## Get team names with a subquery
---

Let's solve a problem we've encountered a few times in this course so far *-- How do you get both the home and away team names into one final query result?*

Out of the 4 techniques we just discussed, this can be performed using subqueries, correlated subqueries, and CTEs. Let's practice creating similar result sets using each of these 3 methods over the next 3 exercises, starting with subqueries in `FROM`.

### Instructions

Create a query that left joins `team` to `match` in order to get the identity of the home team. This becomes the subquery in the next step.

In [23]:
%%sql

SELECT m.id,
       t.team_long_name AS hometeam
FROM   match AS m
       LEFT JOIN team AS t
              ON m.hometeam_id = t.team_api_id
LIMIT 20

 * sqlite:///data/soccer.db
Done.


id,hometeam
997,KV Kortrijk
998,Beerschot AC
999,RAEC Mons
1000,Standard de Liège
1001,KV Mechelen
1002,Club Brugge KV
1003,KAA Gent
1004,KRC Genk
1005,Club Brugge KV
1006,Standard de Liège


Add a second subquery to the `FROM` statement to get the away team name, changing only `hometeam_id` to `awayteam_id`. Left join both subqueries to the `match` table on the `id` column.

In [25]:
%%sql

SELECT m.date,
       hometeam,
       awayteam,
       m.home_goal,
       m.away_goal
FROM   match AS m
       LEFT JOIN (SELECT match.id,
                         team.team_long_name AS hometeam
                  FROM   match
                         LEFT JOIN team
                                ON match.hometeam_id = team.team_api_id) AS home
              ON home.id = m.id
       LEFT JOIN (SELECT match.id,
                         team.team_long_name AS awayteam
                  FROM   match
                         LEFT JOIN team
                                ON match.awayteam_id = team.team_api_id) AS away
              ON away.id = m.id 
LIMIT 20

 * sqlite:///data/soccer.db
Done.


date,hometeam,awayteam,home_goal,away_goal
2012-07-28 00:00:00,KV Kortrijk,RSC Anderlecht,1,1
2012-07-28 00:00:00,Beerschot AC,Sporting Lokeren,2,4
2012-07-28 00:00:00,RAEC Mons,Oud-Heverlee Leuven,5,2
2012-07-29 00:00:00,Standard de Liège,SV Zulte-Waregem,0,1
2012-07-28 00:00:00,KV Mechelen,Sporting Charleroi,4,2
2012-07-28 00:00:00,Club Brugge KV,Waasland-Beveren,3,1
2012-07-29 00:00:00,KAA Gent,Lierse SK,2,0
2012-07-29 00:00:00,KRC Genk,KSV Cercle Brugge,3,3
2012-10-07 00:00:00,Club Brugge KV,KRC Genk,1,1
2012-10-07 00:00:00,Standard de Liège,RSC Anderlecht,2,1


## Get team names with correlated subqueries
---
Let's solve the same problem using correlated subqueries *-- How do you get both the home and away team names into one final query result?*

This can easily be performed using correlated subqueries. But how might that impact the performance of your query? Complete the following steps and let's find out!

**Please note that your query will run more slowly than the previous exercise!**
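
One way to see why, sketched here with Python's `sqlite3` module on empty tables that borrow the chapter's column names, is to ask SQLite for its query plan. The plan marks the subquery in `SELECT` as correlated, meaning it is tied to the rows of the outer scan. The exact wording of the plan differs between SQLite versions, so the sketch only looks for the word "CORRELATED".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Empty stand-in tables; EXPLAIN QUERY PLAN does not need any rows.
conn.executescript(
    """
    CREATE TABLE team (team_api_id INTEGER, team_long_name TEXT);
    CREATE TABLE match (id INTEGER, date TEXT, hometeam_id INTEGER);
    """
)

plan = conn.execute(
    """
    EXPLAIN QUERY PLAN
    SELECT m.date,
           (SELECT team_long_name
            FROM   team AS t
            WHERE  t.team_api_id = m.hometeam_id) AS hometeam
    FROM   match AS m
    """
).fetchall()

# The last column of each plan row holds the human-readable description.
details = [row[-1] for row in plan]
for line in details:
    print(line)

is_correlated = any("CORRELATED" in line.upper() for line in details)
print(is_correlated)
```

In the notebook itself, the same check can be made by prefixing a `%%sql` cell's query with `EXPLAIN QUERY PLAN`.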


### Instructions

Using a correlated subquery in the `SELECT` statement, match the `team_api_id` column from `team` to the `hometeam_id` from `match`.

In [27]:
%%sql

SELECT m.date,
       (SELECT team_long_name
        FROM   team AS t
        WHERE  t.team_api_id = m.hometeam_id) AS hometeam
FROM   match AS m 

LIMIT 10

 * sqlite:///data/soccer.db
Done.


date,hometeam
2012-07-28 00:00:00,KV Kortrijk
2012-07-28 00:00:00,Beerschot AC
2012-07-28 00:00:00,RAEC Mons
2012-07-29 00:00:00,Standard de Liège
2012-07-28 00:00:00,KV Mechelen
2012-07-28 00:00:00,Club Brugge KV
2012-07-29 00:00:00,KAA Gent
2012-07-29 00:00:00,KRC Genk
2012-10-07 00:00:00,Club Brugge KV
2012-10-07 00:00:00,Standard de Liège


Create a second correlated subquery in `SELECT`, yielding the away team's name.

Select the home and away goal columns from `match` in the main query.

In [28]:
%%sql

SELECT m.date,
       (SELECT team_long_name
        FROM   team AS t
        WHERE  t.team_api_id = m.hometeam_id) AS hometeam,
       (SELECT team_long_name
        FROM   team AS t
        WHERE  t.team_api_id = m.awayteam_id) AS awayteam,
       home_goal,
       away_goal
FROM   match AS m 

LIMIT 15

 * sqlite:///data/soccer.db
Done.


date,hometeam,awayteam,home_goal,away_goal
2012-07-28 00:00:00,KV Kortrijk,RSC Anderlecht,1,1
2012-07-28 00:00:00,Beerschot AC,Sporting Lokeren,2,4
2012-07-28 00:00:00,RAEC Mons,Oud-Heverlee Leuven,5,2
2012-07-29 00:00:00,Standard de Liège,SV Zulte-Waregem,0,1
2012-07-28 00:00:00,KV Mechelen,Sporting Charleroi,4,2
2012-07-28 00:00:00,Club Brugge KV,Waasland-Beveren,3,1
2012-07-29 00:00:00,KAA Gent,Lierse SK,2,0
2012-07-29 00:00:00,KRC Genk,KSV Cercle Brugge,3,3
2012-10-07 00:00:00,Club Brugge KV,KRC Genk,1,1
2012-10-07 00:00:00,Standard de Liège,RSC Anderlecht,2,1


## Get team names with CTEs
---

You've now explored two methods for answering the question, *How do you get both the home and away team names into one final query result?*

Let's explore the final method: common table expressions. Common table expressions are similar to the subquery method for generating results, mainly differing in syntax and the order in which information is processed.

### Instructions

Select `id` from `match` and `team_long_name` from `team`. 

Join these two tables together on `hometeam_id` in `match` and `team_api_id` in `team`.

In [29]:
%%sql

SELECT m.id,
       t.team_long_name AS hometeam
FROM   match AS m
       LEFT JOIN team AS t
              ON t.team_api_id = m.hometeam_id
LIMIT 10

 * sqlite:///data/soccer.db
Done.


id,hometeam
997,KV Kortrijk
998,Beerschot AC
999,RAEC Mons
1000,Standard de Liège
1001,KV Mechelen
1002,Club Brugge KV
1003,KAA Gent
1004,KRC Genk
1005,Club Brugge KV
1006,Standard de Liège


Declare the query from the previous step as a common table expression. `SELECT` everything from the CTE into the main query. **Your results will not change at this step!**

In [30]:
%%sql

WITH home AS (
     SELECT m.id,
            t.team_long_name AS hometeam
     FROM   match AS m
     LEFT JOIN team AS t
            ON m.hometeam_id = t.team_api_id)
SELECT *
FROM   home 
LIMIT 20

 * sqlite:///data/soccer.db
Done.


id,hometeam
997,KV Kortrijk
998,Beerschot AC
999,RAEC Mons
1000,Standard de Liège
1001,KV Mechelen
1002,Club Brugge KV
1003,KAA Gent
1004,KRC Genk
1005,Club Brugge KV
1006,Standard de Liège


Let's declare the second CTE, `away`. Join it to the first CTE on the id column.

The `date`, `home_goal`, and `away_goal` columns have been added to the CTEs. `SELECT` them into the main query.

In [32]:
%%sql

WITH home AS (
    SELECT m.id,
           m.date,
           t.team_long_name AS hometeam,
           m.home_goal
    FROM   match AS m
    LEFT JOIN team AS t
           ON m.hometeam_id = t.team_api_id),
away AS (
    SELECT m.id,
           m.date,
           t.team_long_name AS awayteam,
           m.away_goal
    FROM   match AS m
    LEFT JOIN team AS t
           ON m.awayteam_id = t.team_api_id)
SELECT home.date,
       home.hometeam,
       away.awayteam,
       home.home_goal,
       away.away_goal
FROM   home
       INNER JOIN away
               ON away.id = home.id 
        
LIMIT 20

 * sqlite:///data/soccer.db
Done.


date,hometeam,awayteam,home_goal,away_goal
2012-07-28 00:00:00,KV Kortrijk,RSC Anderlecht,1,1
2012-07-28 00:00:00,Beerschot AC,Sporting Lokeren,2,4
2012-07-28 00:00:00,RAEC Mons,Oud-Heverlee Leuven,5,2
2012-07-29 00:00:00,Standard de Liège,SV Zulte-Waregem,0,1
2012-07-28 00:00:00,KV Mechelen,Sporting Charleroi,4,2
2012-07-28 00:00:00,Club Brugge KV,Waasland-Beveren,3,1
2012-07-29 00:00:00,KAA Gent,Lierse SK,2,0
2012-07-29 00:00:00,KRC Genk,KSV Cercle Brugge,3,3
2012-10-07 00:00:00,Club Brugge KV,KRC Genk,1,1
2012-10-07 00:00:00,Standard de Liège,RSC Anderlecht,2,1
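
As a closing check, the three approaches can be run side by side. The sketch below uses Python's `sqlite3` module with hypothetical teams and matches, and confirms that subqueries in `FROM`, correlated subqueries in `SELECT`, and CTEs return identical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical toy tables that mirror the column names used in this chapter.
conn.executescript(
    """
    CREATE TABLE team (team_api_id INTEGER, team_long_name TEXT);
    CREATE TABLE match (id INTEGER, date TEXT, hometeam_id INTEGER,
                        awayteam_id INTEGER, home_goal INTEGER,
                        away_goal INTEGER);
    INSERT INTO team VALUES (10, 'Team A'), (20, 'Team B'), (30, 'Team C');
    INSERT INTO match VALUES
        (1, '2012-07-28 00:00:00', 10, 20, 1, 1),
        (2, '2012-07-29 00:00:00', 20, 30, 2, 4),
        (3, '2012-08-04 00:00:00', 30, 10, 0, 3);
    """
)

# Method 1: two subqueries in FROM, each joined back to match on id.
from_rows = conn.execute(
    """
    SELECT m.date, home.hometeam, away.awayteam, m.home_goal, m.away_goal
    FROM   match AS m
           LEFT JOIN (SELECT match.id, team.team_long_name AS hometeam
                      FROM   match
                             LEFT JOIN team
                                    ON match.hometeam_id = team.team_api_id) AS home
                  ON home.id = m.id
           LEFT JOIN (SELECT match.id, team.team_long_name AS awayteam
                      FROM   match
                             LEFT JOIN team
                                    ON match.awayteam_id = team.team_api_id) AS away
                  ON away.id = m.id
    ORDER  BY m.id
    """
).fetchall()

# Method 2: two correlated subqueries in SELECT.
correlated_rows = conn.execute(
    """
    SELECT m.date,
           (SELECT team_long_name FROM team AS t
            WHERE  t.team_api_id = m.hometeam_id) AS hometeam,
           (SELECT team_long_name FROM team AS t
            WHERE  t.team_api_id = m.awayteam_id) AS awayteam,
           m.home_goal, m.away_goal
    FROM   match AS m
    ORDER  BY m.id
    """
).fetchall()

# Method 3: two CTEs joined on the match id.
cte_rows = conn.execute(
    """
    WITH home AS (
        SELECT m.id, m.date, t.team_long_name AS hometeam, m.home_goal
        FROM   match AS m
        LEFT JOIN team AS t ON m.hometeam_id = t.team_api_id),
    away AS (
        SELECT m.id, t.team_long_name AS awayteam, m.away_goal
        FROM   match AS m
        LEFT JOIN team AS t ON m.awayteam_id = t.team_api_id)
    SELECT home.date, home.hometeam, away.awayteam,
           home.home_goal, away.away_goal
    FROM   home
           INNER JOIN away ON away.id = home.id
    ORDER  BY home.id
    """
).fetchall()

print(from_rows[0])  # ('2012-07-28 00:00:00', 'Team A', 'Team B', 1, 1)
print(from_rows == correlated_rows == cte_rows)  # True
```

Since the results match, the choice between the three comes down to readability, reuse, and how each performs on your database.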
