<img src = "https://images2.imgbox.com/60/09/VFwl5LOq_o.jpg" width="400">

# 2. Short and Simple Subqueries
---

In this chapter, you will learn about subqueries in the SELECT, FROM, and WHERE clauses. You will gain an understanding of when subqueries are necessary to construct your dataset and where to best include them in your queries.

## Filtering using scalar subqueries
---

Subqueries are incredibly powerful for performing complex filters and transformations. A scalar subquery returns a single value, which lets you filter rows against a computed quantity (such as an average) in a way that a plain `WHERE` condition or a join cannot. Subqueries can also be used for more advanced manipulation of your data set, and you will likely encounter them in any real-world setting that uses relational databases.

In this exercise, you will generate a list of matches where the total goals scored (for both teams in total) is more than 3 times the average for games in the `matches_2013_2014` table, which includes all games played in the 2013/2014 season.
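To make the point about scalar filters concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The in-memory `matches` table and its four rows are made up for illustration and are not the course data. It shows that an aggregate placed directly in `WHERE` is rejected, while the same aggregate wrapped in a scalar subquery works.

```python
import sqlite3

# Hypothetical stand-in for matches_2013_2014; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (id INTEGER, home_goal INTEGER, away_goal INTEGER)")
conn.executemany(
    "INSERT INTO matches VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 0, 0), (3, 1, 0), (4, 7, 3)],
)

# An aggregate cannot appear directly in WHERE, so SQLite refuses to prepare this query.
try:
    conn.execute(
        "SELECT id FROM matches "
        "WHERE home_goal + away_goal > 3 * AVG(home_goal + away_goal)"
    ).fetchall()
    aggregate_in_where_failed = False
except sqlite3.OperationalError:
    aggregate_in_where_failed = True

# Wrapping the aggregate in a scalar subquery yields one value
# that every row of the main query is compared against.
high_scoring = conn.execute(
    """
    SELECT id,
           home_goal + away_goal AS total_goals
    FROM   matches
    WHERE  home_goal + away_goal > (SELECT 3 * AVG(home_goal + away_goal)
                                    FROM   matches)
    """
).fetchall()

print(aggregate_in_where_failed)
print(high_scoring)
```

With these toy rows the average total is 3 goals, so only the 10-goal match clears the threshold of 9.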

In [1]:
# %pip install ipython-sql

In [2]:
%load_ext sql

In [3]:
%sql sqlite:///data/soccer.db

'Connected: @data/soccer.db'

### Instructions

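Calculate triple the average home + away goals scored across all matches. This will become your subquery in the next step. Note that this column does not have an alias, so SQLite labels it with the expression text itself (`3 * AVG( home_goal + away_goal )`) in your results; other databases may use a different default name, such as `?column?` in PostgreSQL.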
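The name a database assigns to an unaliased expression varies by engine, so an explicit alias is the portable option. The sketch below, run with Python's `sqlite3` module on a made-up two-row `matches` table, prints whatever name SQLite reports for the unaliased column and then shows the same query with a hypothetical alias, `triple_avg`.

```python
import sqlite3

# Hypothetical two-row table; the values are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE matches (home_goal INTEGER, away_goal INTEGER)")
conn.executemany("INSERT INTO matches VALUES (?, ?)", [(2, 1), (0, 3)])

# Without an alias, the reported column name is chosen by the database engine.
unaliased = conn.execute("SELECT 3 * AVG(home_goal + away_goal) FROM matches")
unaliased_name = unaliased.description[0][0]

# With AS, the column name is whatever you specify.
aliased = conn.execute(
    "SELECT 3 * AVG(home_goal + away_goal) AS triple_avg FROM matches"
)
aliased_name = aliased.description[0][0]
triple_avg = aliased.fetchone()[0]

print(unaliased_name)
print(aliased_name, triple_avg)
```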

In [4]:
%%sql 

SELECT 3 * AVG( home_goal + away_goal )
FROM   matches_2013_2014 

 * sqlite:///data/soccer.db
Done.


3 * AVG( home_goal + away_goal )
8.300461741424801


Select the date, home goals, and away goals in the main query.

Filter the main query for matches where the total goals scored exceed the value in the subquery.

In [5]:
%%sql 

SELECT date,
       home_goal,
       away_goal
FROM   matches_2013_2014
WHERE  ( home_goal + away_goal ) > (SELECT 3 * AVG( home_goal + away_goal )
                                    FROM   matches_2013_2014) 

 * sqlite:///data/soccer.db
Done.


date,home_goal,away_goal
2013-12-14 00:00:00,6,3
2014-03-22 00:00:00,3,6
2013-10-30 00:00:00,7,3
2013-12-14 00:00:00,6,3
2014-03-22 00:00:00,3,6
2013-10-30 00:00:00,7,3


## Filtering using a subquery with a list
---

Your goal in this exercise is to generate a list of teams that never played a game in their home city. Using a subquery, you will generate a list of unique `hometeam_id` values from the unfiltered `match` table to exclude in the `team` table's `team_api_id` column.

In addition to filtering using a single-value (scalar) subquery, you can create a list of values in a subquery to filter data based on a complex set of conditions. This type of subquery generates a one-column reference list for the main query. As long as the values in your list match a column in your main query's table, you don't need to use a join -- even if the list is from a separate table.
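As a rough illustration of a list subquery standing in for a join, the sketch below builds tiny in-memory `team` and `match` tables with Python's `sqlite3` module (the team names and rows are invented) and compares `NOT IN` with the join-based alternative, a `LEFT JOIN` filtered on missing matches.

```python
import sqlite3

# Hypothetical miniature versions of the team and match tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE team  (team_api_id INTEGER, team_long_name TEXT);
    CREATE TABLE match (id INTEGER, hometeam_id INTEGER, awayteam_id INTEGER);
    INSERT INTO team  VALUES (1, 'Alpha FC'), (2, 'Beta United'), (3, 'Gamma City');
    INSERT INTO match VALUES (10, 1, 2), (11, 1, 3), (12, 3, 2);
    """
)

# List subquery: keep teams whose id never appears as a home team.
not_in_rows = conn.execute(
    """
    SELECT team_long_name
    FROM   team
    WHERE  team_api_id NOT IN (SELECT DISTINCT hometeam_id
                               FROM   match)
    ORDER  BY team_api_id
    """
).fetchall()

# Join-based alternative: left join, then keep teams with no matching home game.
anti_join_rows = conn.execute(
    """
    SELECT t.team_long_name
    FROM   team AS t
           LEFT JOIN match AS m
                  ON m.hometeam_id = t.team_api_id
    WHERE  m.id IS NULL
    ORDER  BY t.team_api_id
    """
).fetchall()

print(not_in_rows)
print(anti_join_rows)
```

Both queries return the same team here. One caveat worth checking on real data: if the subquery can return `NULL`, `NOT IN` matches no rows at all, so the two forms are only interchangeable when the listed column has no `NULL` values.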

### Instructions

Create a subquery in the `WHERE` clause that retrieves all unique `hometeam_id` values from the `match` table.

Select the `team_long_name` and `team_short_name` from the `team` table. Exclude all values from the subquery in the main query.

In [6]:
%%sql 

SELECT team_long_name,
       team_short_name
FROM   team
WHERE  team_api_id NOT IN ( SELECT DISTINCT hometeam_id
                            FROM   match ) 

 * sqlite:///data/soccer.db
Done.


team_long_name,team_short_name
FCV Dender EH,DEN
KSV Roeselare,ROS
Tubize,TUB
Royal Excel Mouscron,MOU
Sint-Truidense VV,STT
KAS Eupen,EUP
Middlesbrough,MID
Portsmouth,POR
Birmingham City,BIR
Blackpool,BLA


## Filtering with more complex subquery conditions
---

Create a subquery in the `WHERE` clause that retrieves all `hometeam_id` values from `match` with a `home_goal` score greater than or equal to 8.

Select the `team_long_name` and `team_short_name` from the `team` table. Include all values from the subquery in the main query.

In [7]:
%%sql 

SELECT team_long_name,
       team_short_name
FROM   team
WHERE  team_api_id IN (SELECT hometeam_id
                       FROM   match
                       WHERE  home_goal >= 8) 

 * sqlite:///data/soccer.db
Done.


team_long_name,team_short_name
Manchester United,MUN
Chelsea,CHE
Southampton,SOU
FC Bayern Munich,BMU
Real Madrid CF,REA
FC Barcelona,BAR


## Joining Subqueries in FROM
---

The `match` table in the European Soccer Database does not contain country or team names. You can get this information by joining it to the `country` table, and use this to aggregate information, such as the number of matches played in each country.

If you're interested in filtering data from one of these tables, you can also create a subquery from one of the tables, and then join it to an existing table in the database. A subquery in `FROM` is an effective way of answering detailed questions that require filtering or transforming data before including it in your final results.

Your goal in this exercise is to generate a subquery using the `match` table, and then join that subquery to the `country` table to calculate information about matches with 10 or more goals in total!
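The pattern described above (filter first in a subquery, then join the result) can be sketched with Python's `sqlite3` module. The `country` and `match` tables below are in-memory toys with invented country names and scores, not the European Soccer Database.

```python
import sqlite3

# Hypothetical miniature country and match tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (id INTEGER, name TEXT);
    CREATE TABLE match   (id INTEGER, country_id INTEGER,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'Northland'), (2, 'Southland');
    INSERT INTO match   VALUES (100, 1, 7, 3), (101, 1, 1, 0),
                               (102, 2, 5, 5), (103, 2, 8, 2), (104, 2, 2, 2);
    """
)

# The subquery keeps only high-scoring matches; the join then attaches country names.
rows = conn.execute(
    """
    SELECT c.name        AS country_name,
           COUNT(sub.id) AS matches
    FROM   country AS c
           INNER JOIN (SELECT country_id,
                              id
                       FROM   match
                       WHERE  home_goal + away_goal >= 10) AS sub
                   ON c.id = sub.country_id
    GROUP  BY c.name
    ORDER  BY c.name
    """
).fetchall()

print(rows)
```

Because the filter runs inside the subquery, the join only ever sees the three matches with 10 or more goals.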

### Instructions

Create the subquery to be used in the next step, which selects the country ID and match ID (`id`) from the `match` table.

Filter the query for matches with greater than or equal to 10 goals.

In [8]:
%%sql 

SELECT country_id,
       id
FROM   match
WHERE  ( home_goal + away_goal ) >= 10 

 * sqlite:///data/soccer.db
Done.


country_id,id
1729,3093
1729,3369
1729,3566
7809,9211
13274,14224
21518,23444
21518,24016
21518,24114
21518,24123


Construct a subquery that selects only matches with 10 or more total goals.

Inner join the subquery onto `country` in the main query.

Select `name` from `country` and count the `id` column from `match`.

In [9]:
%%sql 

SELECT c.name        AS country_name,
       COUNT(sub.id) AS matches
FROM   country AS c
       INNER JOIN (SELECT country_id,
                          id
                   FROM   match
                   WHERE  ( home_goal + away_goal ) >= 10) AS sub
               ON c.id = sub.country_id
GROUP  BY country_name 

 * sqlite:///data/soccer.db
Done.


country_name,matches
England,3
Germany,1
Netherlands,1
Spain,4


## Building on Subqueries in FROM
---

In the previous exercise, you found that England, the Netherlands, Germany, and Spain were the only countries that had matches in the database where 10 or more goals were scored overall. Let's find out some more details about those matches -- when they were played, during which seasons, and how many of the goals were home versus away goals.

You'll notice that in this exercise, the table alias is excluded for every column selected in the main query. This is because the *main query* is extracting data from the *subquery*, which is treated as a *single table*.
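A small sketch, using Python's `sqlite3` module and made-up rows, of what "treated as a single table" means in practice: the outer query can use any column the subquery exposes, including the computed `total_goals`, but the aliases defined inside the subquery (`m`, `c`) are not visible outside it.

```python
import sqlite3

# Hypothetical miniature country and match tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE country (id INTEGER, name TEXT);
    CREATE TABLE match   (id INTEGER, country_id INTEGER, date TEXT,
                          home_goal INTEGER, away_goal INTEGER);
    INSERT INTO country VALUES (1, 'Northland');
    INSERT INTO match   VALUES (1, 1, '2011-08-28', 8, 2),
                               (2, 1, '2012-01-01', 1, 1);
    """
)

# The outer query reads the subquery's output columns by their plain names.
rows = conn.execute(
    """
    SELECT country, date, home_goal, away_goal
    FROM   (SELECT c.name                    AS country,
                   m.date,
                   m.home_goal,
                   m.away_goal,
                   m.home_goal + m.away_goal AS total_goals
            FROM   match AS m
                   LEFT JOIN country AS c
                          ON m.country_id = c.id) AS subquery
    WHERE  total_goals >= 10
    """
).fetchall()

# The inner alias "m" does not exist in the outer scope, so this fails.
try:
    conn.execute(
        "SELECT m.home_goal FROM (SELECT m.home_goal FROM match AS m) AS subquery"
    )
    inner_alias_visible = True
except sqlite3.OperationalError:
    inner_alias_visible = False

print(rows)
print(inner_alias_visible)
```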

### Instructions

Complete the subquery inside the `FROM` clause. Select the country name from the country table, along with the date, the home goal, the away goal, and the total goals columns from the match table.

Create a column in the subquery that adds home and away goals, called `total_goals`. This will be used to filter the main query.

Select the country, date, home goals, and away goals in the main query.

Filter the main query for games with 10 or more total goals.

In [10]:
%%sql 

SELECT country,
       date,
       home_goal,
       away_goal
FROM   (SELECT c.name                        AS country,
               m.date,
               m.home_goal,
               m.away_goal,
               ( m.home_goal + m.away_goal ) AS total_goals
        FROM   match AS m
               LEFT JOIN country AS c
                      ON m.country_id = c.id) AS subquery
WHERE  total_goals >= 10 

 * sqlite:///data/soccer.db
Done.


country,date,home_goal,away_goal
England,2011-08-28 00:00:00,8,2
England,2012-12-29 00:00:00,7,3
England,2013-05-19 00:00:00,5,5
Germany,2013-03-30 00:00:00,9,2
Netherlands,2011-11-06 00:00:00,6,4
Spain,2013-10-30 00:00:00,7,3
Spain,2015-04-05 00:00:00,9,1
Spain,2015-05-23 00:00:00,7,3
Spain,2014-09-20 00:00:00,2,8


## Add a subquery to the SELECT clause
---

Subqueries in `SELECT` statements generate a single value, which allows you to repeat an aggregate value on every row of your result. This is useful for performing calculations on data within your database.

In the following exercise, you will construct a query that calculates the average number of goals per match in each country's league.
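The sketch below, run with Python's `sqlite3` module on invented `league` and `match` rows, shows how a scalar subquery in `SELECT` places the same aggregate next to each group's own average.

```python
import sqlite3

# Hypothetical miniature league and match tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE league (country_id INTEGER, name TEXT);
    CREATE TABLE match  (country_id INTEGER, season TEXT,
                         home_goal INTEGER, away_goal INTEGER);
    INSERT INTO league VALUES (1, 'League A'), (2, 'League B');
    INSERT INTO match  VALUES (1, '2013/2014', 2, 1), (1, '2013/2014', 1, 0),
                              (2, '2013/2014', 3, 1), (2, '2013/2014', 2, 2),
                              (2, '2012/2013', 0, 0);
    """
)

# The subquery is evaluated to one value and shown on every output row.
rows = conn.execute(
    """
    SELECT l.name                                   AS league,
           ROUND(AVG(m.home_goal + m.away_goal), 2) AS avg_goals,
           (SELECT ROUND(AVG(home_goal + away_goal), 2)
            FROM   match
            WHERE  season = '2013/2014')            AS overall_avg
    FROM   league AS l
           LEFT JOIN match AS m
                  ON l.country_id = m.country_id
    WHERE  m.season = '2013/2014'
    GROUP  BY l.name
    ORDER  BY l.name
    """
).fetchall()

print(rows)
```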

### Instructions

In the subquery, select the average total goals by adding `home_goal` and `away_goal`.

Filter the results so that only the average of goals in the 2013/2014 season is calculated.

In the main query, select the average total goals by adding `home_goal` and `away_goal`. This calculates the average goals for each league.

Filter the results in the main query the same way you filtered the subquery. Group the query by the league name.

In [11]:
%%sql 

SELECT l.name                                   AS league,
       ROUND(AVG(m.home_goal + m.away_goal), 2) AS avg_goals,
       (SELECT ROUND(AVG(home_goal + away_goal), 2)
        FROM   match
        WHERE  season = '2013/2014')            AS overall_avg
FROM   league AS l
       LEFT JOIN match AS m
              ON l.country_id = m.country_id
WHERE  season = '2013/2014'
GROUP  BY league 

 * sqlite:///data/soccer.db
Done.


league,avg_goals,overall_avg
Belgium Jupiler League,2.5,2.77
England Premier League,2.77,2.77
France Ligue 1,2.46,2.77
Germany 1. Bundesliga,3.16,2.77
Italy Serie A,2.72,2.77
Netherlands Eredivisie,3.2,2.77
Poland Ekstraklasa,2.64,2.77
Portugal Liga ZON Sagres,2.37,2.77
Scotland Premier League,2.75,2.77
Spain LIGA BBVA,2.75,2.77


## Subqueries in Select for Calculations
---

Subqueries in `SELECT` are a useful way to create calculated columns in a query. A subquery in `SELECT` can be treated as a single numeric value to use in your calculations. When writing subqueries in `SELECT`, it's important to remember that filtering the main query does not filter the subquery -- and vice versa.

In the previous exercise, you created a column to compare each league's average total goals to the overall average goals in the 2013/2014 season. In this exercise, you will add a column that directly compares these values by subtracting the overall average (the subquery) from each league's average.
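To illustrate that the main query's filter does not carry over into the subquery, this sketch (Python `sqlite3`, invented rows) computes the difference twice: once with the season filter repeated inside the subquery, and once with it left out.

```python
import sqlite3

# Hypothetical miniature league and match tables covering two seasons.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE league (country_id INTEGER, name TEXT);
    CREATE TABLE match  (country_id INTEGER, season TEXT,
                         home_goal INTEGER, away_goal INTEGER);
    INSERT INTO league VALUES (1, 'League A'), (2, 'League B');
    INSERT INTO match  VALUES (1, '2013/2014', 2, 1), (1, '2013/2014', 1, 0),
                              (2, '2013/2014', 3, 1), (2, '2013/2014', 2, 2),
                              (1, '2012/2013', 0, 0), (2, '2012/2013', 2, 1);
    """
)

rows = conn.execute(
    """
    SELECT l.name AS league,
           -- Subquery filtered to the same season as the main query.
           ROUND(AVG(m.home_goal + m.away_goal)
                 - (SELECT AVG(home_goal + away_goal)
                    FROM   match
                    WHERE  season = '2013/2014'), 2) AS diff_same_season,
           -- Subquery left unfiltered: it averages over every season.
           ROUND(AVG(m.home_goal + m.away_goal)
                 - (SELECT AVG(home_goal + away_goal)
                    FROM   match), 2)                AS diff_all_seasons
    FROM   league AS l
           LEFT JOIN match AS m
                  ON l.country_id = m.country_id
    WHERE  m.season = '2013/2014'
    GROUP  BY l.name
    ORDER  BY l.name
    """
).fetchall()

print(rows)
```

The two difference columns disagree because the unfiltered subquery also averages the 2012/2013 rows, even though the main query excludes them.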

### Instructions

Select the average goals scored in a match for each league in the main query.

Select the average goals scored in a match overall for the 2013/2014 season in the subquery.

Subtract the subquery from the average number of goals calculated for each league.

Filter the main query so that only games from the 2013/2014 season are included.

In [12]:
%%sql 

SELECT l.name AS league,
       ROUND(AVG(m.home_goal + m.away_goal), 2) AS avg_goals,
       ROUND(AVG(m.home_goal + m.away_goal) - (SELECT AVG(home_goal + away_goal)
                                               FROM   match
                                               WHERE  season = '2013/2014'), 2)
       AS diff
FROM   league AS l
       LEFT JOIN match AS m
              ON l.country_id = m.country_id
WHERE  season = '2013/2014'
GROUP  BY l.name 

 * sqlite:///data/soccer.db
Done.


league,avg_goals,diff
Belgium Jupiler League,2.5,-0.27
England Premier League,2.77,0.0
France Ligue 1,2.46,-0.31
Germany 1. Bundesliga,3.16,0.39
Italy Serie A,2.72,-0.04
Netherlands Eredivisie,3.2,0.43
Poland Ekstraklasa,2.64,-0.13
Portugal Liga ZON Sagres,2.37,-0.4
Scotland Premier League,2.75,-0.02
Spain LIGA BBVA,2.75,-0.02


## ALL the subqueries EVERYWHERE
---

In soccer leagues, games are played at different stages. Winning teams progress from one stage to the next, until they reach the final stage. In each stage, the stakes become higher than the previous one. The `match` table includes data about the different stages that each match took place in.

In this lesson, you will build a final query across three exercises that contains three subqueries -- one in the `SELECT` clause, one in the `FROM` clause, and one in the `WHERE` clause. In the final exercise, your query will extract data examining the average goals scored in each match stage. Does the average number of goals scored change as the stakes get higher from one stage to the next?

### Instructions

Extract the overall average number of goals (home plus away) in a `SELECT` subquery.

Calculate the average number of goals (home plus away) for each stage in the main query.

Filter both the subquery and the main query so that only data from the 2012/2013 season is included.

Group the query by the `m.stage` column.

In [13]:
%%sql 

SELECT m.stage,
       ROUND(AVG(m.home_goal + m.away_goal), 2) AS avg_goals,
       ROUND((SELECT AVG(home_goal + away_goal)
              FROM   match
              WHERE  season = '2012/2013'), 2)  AS overall
FROM   match AS m
WHERE  season = '2012/2013'
GROUP  BY m.stage 

 * sqlite:///data/soccer.db
Done.


stage,avg_goals,overall
1,2.68,2.77
2,2.65,2.77
3,2.83,2.77
4,2.8,2.77
5,2.61,2.77
6,2.78,2.77
7,2.69,2.77
8,3.09,2.77
9,2.7,2.77
10,2.96,2.77


## Add a subquery in FROM
---

In the previous exercise, you created a data set listing the average total goals (home plus away) in each match stage of the 2012/2013 season.

In this next step, you will turn the main query into a *subquery* to extract a list of stages where the average total goals in a stage is higher than the *overall* average total goals in a match.
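A minimal sketch of this step with Python's `sqlite3` module and invented rows: the per-stage averages are computed in a `FROM` subquery so that the outer `WHERE` can compare them with a scalar subquery. For comparison only, the same filter is also written with `HAVING`, which is a different technique from the one this exercise uses.

```python
import sqlite3

# Hypothetical miniature match table with three stages in one season.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE match (stage INTEGER, season TEXT,
                        home_goal INTEGER, away_goal INTEGER);
    INSERT INTO match VALUES (1, '2012/2013', 1, 1), (1, '2012/2013', 2, 0),
                             (2, '2012/2013', 3, 1), (2, '2012/2013', 2, 2),
                             (3, '2012/2013', 2, 1), (3, '2012/2013', 0, 3);
    """
)

# Aggregate per stage first, then filter the aggregated rows in the outer query.
from_subquery_rows = conn.execute(
    """
    SELECT s.stage,
           ROUND(s.avg_goals, 2) AS avg_goals
    FROM   (SELECT stage,
                   AVG(home_goal + away_goal) AS avg_goals
            FROM   match
            WHERE  season = '2012/2013'
            GROUP  BY stage) AS s
    WHERE  s.avg_goals > (SELECT AVG(home_goal + away_goal)
                          FROM   match
                          WHERE  season = '2012/2013')
    ORDER  BY s.stage
    """
).fetchall()

# Alternative (not the exercise's approach): filter the groups with HAVING.
having_rows = conn.execute(
    """
    SELECT stage,
           ROUND(AVG(home_goal + away_goal), 2) AS avg_goals
    FROM   match
    WHERE  season = '2012/2013'
    GROUP  BY stage
    HAVING AVG(home_goal + away_goal) > (SELECT AVG(home_goal + away_goal)
                                         FROM   match
                                         WHERE  season = '2012/2013')
    ORDER  BY stage
    """
).fetchall()

print(from_subquery_rows)
print(having_rows)
```

Stage 3 averages exactly the overall value of 3 goals in this toy data, so the strict `>` comparison excludes it.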

### Instructions

Calculate the average total goals (home plus away) from the `match` table for each stage in the `FROM` clause subquery.

Add a subquery to the `WHERE` clause that calculates the overall average total goals.

Filter the main query for stages where the average goals are higher than the overall average.

Select the `stage` and `avg_goals` columns from the `s` subquery into the main query.

In [14]:
%%sql 

SELECT s.stage,
       ROUND(s.avg_goals, 2) AS avg_goals
FROM   (SELECT stage, AVG(home_goal + away_goal) AS avg_goals
        FROM   match
        WHERE  season = '2012/2013'
        GROUP  BY stage) AS s
WHERE  s.avg_goals > (SELECT AVG(home_goal + away_goal)
                      FROM   match
                      WHERE  season = '2012/2013') 

 * sqlite:///data/soccer.db
Done.


stage,avg_goals
3,2.83
4,2.8
6,2.78
8,3.09
10,2.96
11,2.92
12,3.23
17,2.85
20,2.96
21,2.9


## Add a subquery in SELECT
---

In the previous exercise, you added a subquery to the `FROM` clause and selected the stages where the average number of goals in a stage exceeded the overall average number of goals in the 2012/2013 season. In this final step, you will add a subquery in `SELECT` to compare the average number of goals scored in each stage to the overall average.

### Instructions

Create a subquery in `SELECT` that yields the average goals scored in the 2012/2013 season. Name the new column `overall_avg`.

Create a subquery in `FROM` that calculates the average goals scored in each stage during the 2012/2013 season.

Filter the main query for stages where the average goals exceed the overall average in 2012/2013.

In [15]:
%%sql 

SELECT s.stage,
       ROUND(s.avg_goals, 2)         AS avg_goal,
       (SELECT AVG(home_goal + away_goal)
        FROM   match
        WHERE  season = '2012/2013') AS overall_avg
FROM   (SELECT stage,
               AVG(home_goal + away_goal) AS avg_goals
        FROM   match
        WHERE  season = '2012/2013'
        GROUP  BY stage) AS s
WHERE  s.avg_goals > (SELECT AVG(home_goal + away_goal)
                      FROM   match 
                      WHERE  season = '2012/2013') 

 * sqlite:///data/soccer.db
Done.


stage,avg_goal,overall_avg
3,2.83,2.772699386503068
4,2.8,2.772699386503068
6,2.78,2.772699386503068
8,3.09,2.772699386503068
10,2.96,2.772699386503068
11,2.92,2.772699386503068
12,3.23,2.772699386503068
17,2.85,2.772699386503068
20,2.96,2.772699386503068
21,2.9,2.772699386503068
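Because each of the three subqueries is filtered independently, the season has to be stated three times and kept in sync. One way to manage that, sketched below with Python's `sqlite3` module, is a named parameter. The helper `stages_above_average` and the toy rows are hypothetical, and `overall_avg` is rounded here purely for readability.

```python
import sqlite3

# Hypothetical miniature match table covering two seasons.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE match (stage INTEGER, season TEXT,
                        home_goal INTEGER, away_goal INTEGER);
    INSERT INTO match VALUES (1, '2012/2013', 1, 1), (1, '2012/2013', 2, 0),
                             (2, '2012/2013', 3, 1), (2, '2012/2013', 2, 2),
                             (3, '2012/2013', 2, 1), (3, '2012/2013', 0, 3),
                             (1, '2011/2012', 5, 5), (2, '2011/2012', 0, 0);
    """
)

QUERY = """
    SELECT s.stage,
           ROUND(s.avg_goals, 2)               AS avg_goals,
           ROUND((SELECT AVG(home_goal + away_goal)
                  FROM   match
                  WHERE  season = :season), 2) AS overall_avg
    FROM   (SELECT stage,
                   AVG(home_goal + away_goal) AS avg_goals
            FROM   match
            WHERE  season = :season
            GROUP  BY stage) AS s
    WHERE  s.avg_goals > (SELECT AVG(home_goal + away_goal)
                          FROM   match
                          WHERE  season = :season)
    ORDER  BY s.stage
"""


def stages_above_average(connection, season):
    """Return (stage, avg_goals, overall_avg) for stages above the season average."""
    # The same :season value feeds the SELECT, FROM, and WHERE subqueries.
    return connection.execute(QUERY, {"season": season}).fetchall()


print(stages_above_average(conn, "2012/2013"))
print(stages_above_average(conn, "2011/2012"))
```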
