<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# Subquery in FROM 
© ExploreAI Academy

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Description

In this lesson, we will use a subquery in the `FROM` clause like a temporary table: we first compute an intermediate result, and then reference that result in another query.
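To make the "temporary table" idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database. SQLite is an assumption made only so the example runs without the lesson's MySQL server, and the table `readings` and its rows are hypothetical. The inner `SELECT` is aliased as `Recent` and queried as if it were a table, yet no table named `Recent` is stored in the database afterwards.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (country TEXT, year INTEGER, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("A", 2019, 1.0), ("A", 2020, 3.0), ("B", 2020, 5.0)],  # hypothetical rows
)

# The inner SELECT produces a result set that the outer query reads like a table.
rows = conn.execute(
    """
    SELECT country, value
    FROM (SELECT country, value FROM readings WHERE year = 2020) AS Recent
    ORDER BY country
    """
).fetchall()
print(rows)  # [('A', 3.0), ('B', 5.0)]

# The derived table is not saved: only the real table appears in the schema.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['readings']
```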


## Learning objectives

In this train, we will learn:
- How to use a subquery in the `FROM` clause as a derived table that the main query can select from.

## Connecting to our MySQL database

Using the `Access_to_Basic_Services` table in the `united_nations` database that we created in MySQL Workbench, we want to answer some questions about our dataset. We can run the same queries we used in MySQL Workbench in this notebook once we connect to our MySQL server by running the cells below.

Note that this tutorial assumes we have already worked through the lessons in which the `united_nations.Basic_Services`, `united_nations.Economic_Indicators`, and `united_nations.Geographic_Locations` tables were created.


In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that the SQL magic extension (e.g. ipython-sql) and pymysql are installed correctly.

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with our connection password and 'united_nations' with our database name if it differs.
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations



To run a query, we start a cell with the `%%sql` magic command, write the query on the lines that follow (as below), and run the cell.

In [3]:
%%sql

SELECT 
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


| Region | Sub_region | Country_name | Time_period | Pct_managed_drinking_water_services | Pct_managed_sanitation_services | Est_population_in_millions | Est_gdp_in_billions | Land_area | Pct_unemployment |
|---|---|---|---|---|---|---|---|---|---|
| Central and Southern Asia | Central Asia | Kazakhstan | 2015 | 94.67 | 98.0 | 17.542806 | 184.39 | 2699700.0 | 4.93 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2016 | 94.67 | 98.0 | 17.794055 | 137.28 | 2699700.0 | 4.96 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2017 | 95.0 | 98.0 | 18.037776 | 166.81 | 2699700.0 | 4.9 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2018 | 95.0 | 98.0 | 18.276452 | 179.34 | 2699700.0 | 4.85 |
| Central and Southern Asia | Central Asia | Kazakhstan | 2019 | 95.0 | 98.0 | 18.513673 | 181.67 | 2699700.0 | 4.8 |


## Exercise


In this exercise, we will use a subquery to answer the following question: What were the average GDP and population of countries with unemployment rates above 5% in 2020?


### Creating the subquery
Construct the subquery that keeps only the countries with unemployment rates above 5% in 2020. The query should include the following columns: `Country_name`, `Est_gdp_in_billions`, `Est_population_in_millions`.

In [None]:
%%sql
SELECT 
    country_name,
    Est_gdp_in_billions,
    Est_population_in_millions
FROM
    access_to_basic_services
WHERE
    Pct_unemployment > 5
        AND Time_period = 2020;

### Creating the main query
Construct the main query that calculates the average GDP and average population of each country. The query should have the following columns: `Country_name`, `Avg_GDP`, `Avg_Population`.


In [None]:
%%sql
SELECT 
    country_name,
    AVG(Est_gdp_in_billions) AS Avg_GDP,
    AVG(Est_population_in_millions) AS Avg_Population
FROM
    access_to_basic_services
GROUP BY 1;

### Combining the queries

Combine the subquery and the main query to determine the average GDP and population of countries with unemployment rates above 5% in 2020. Place the subquery in the `FROM` clause of the main query and group the results by `Country_name`.


In [6]:
%%sql
SELECT 
    country_name,
    AVG(Est_gdp_in_billions),
    AVG(Est_population_in_millions)
FROM
    (SELECT 
        country_name,
        Est_gdp_in_billions,
        Est_population_in_millions
    FROM
        access_to_basic_services
    WHERE
        Pct_unemployment > 5
        AND Time_period = 2020) AS FilteredCountries
GROUP BY 1;

 * mysql+pymysql://root:***@localhost:3306/united_nations
37 rows affected.


| country_name | AVG(Est_gdp_in_billions) | AVG(Est_population_in_millions) |
|---|---|---|
| Uzbekistan | 59.89 | 34.23205 |
| Afghanistan | 20.14 | 38.97223 |
| Bhutan | 2.33 | 0.772506 |
| India | 2667.69 | 1396.387127 |
| Sri Lanka | 85.35 | 21.919 |
| China | 14687.67 | 1411.1 |
| Mongolia | 13.31 | 3.294335 |
| Brunei Darussalam | 12.01 | 0.441725 |
| Canada | 1645.42 | 38.037204 |
| Dominican Republic | 78.84 | 10.999664 |


## Solutions

### Creating the subquery

In [None]:
%%sql

SELECT 
    Country_name,
    Est_gdp_in_billions,
    Est_population_in_millions
FROM
    Economic_Indicators
WHERE
    Pct_unemployment > 5
    AND Time_period = 2020;

This query filters the `Economic_Indicators` table to keep only the 2020 rows where the unemployment rate is above 5%. Its result is the input table that the main query will use.
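As a rough illustration of this filtering step, the sketch below runs the same `WHERE` conditions against a tiny in-memory SQLite table. The table and column names mirror the lesson, but the rows are made up, and SQLite is used only so the example runs without a MySQL server. Note that a row with exactly 5% unemployment is excluded, because the condition is strictly "above 5%".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Economic_Indicators (
           Country_name TEXT, Time_period INTEGER,
           Est_gdp_in_billions REAL, Est_population_in_millions REAL,
           Pct_unemployment REAL)"""
)
# Hypothetical rows: only country X in 2020 satisfies both conditions.
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?, ?, ?, ?)",
    [
        ("X", 2019, 10.0, 1.0, 6.0),  # excluded: wrong year
        ("X", 2020, 12.0, 1.1, 7.5),  # kept
        ("Y", 2020, 50.0, 4.0, 3.2),  # excluded: unemployment too low
        ("Z", 2020, 20.0, 2.0, 5.0),  # excluded: 5.0 is not above 5
    ],
)

filtered = conn.execute(
    """SELECT Country_name, Est_gdp_in_billions, Est_population_in_millions
       FROM Economic_Indicators
       WHERE Pct_unemployment > 5 AND Time_period = 2020"""
).fetchall()
print(filtered)  # [('X', 12.0, 1.1)]
```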

### Creating the main query

In [None]:
%%sql

SELECT 
    Country_name,
    AVG(Est_gdp_in_billions) AS Avg_GDP,
    AVG(Est_population_in_millions) AS Avg_Population
FROM
    Economic_Indicators
GROUP BY
    Country_name;


Note that, on its own, this query calculates the averages over **all** years and for **all** countries.
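To see why this matters, the hypothetical sketch below applies the same `GROUP BY` query to a made-up table in which one country has rows for two years. Without a filter, `AVG` mixes both years together. As before, an in-memory SQLite database stands in for MySQL only so the example is self-contained, and the `ORDER BY` is added just to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Economic_Indicators (
           Country_name TEXT, Time_period INTEGER,
           Est_gdp_in_billions REAL, Est_population_in_millions REAL)"""
)
# Hypothetical rows: country X has data for 2019 and 2020.
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?, ?, ?)",
    [("X", 2019, 10.0, 1.0), ("X", 2020, 14.0, 3.0), ("Y", 2020, 50.0, 4.0)],
)

averages = conn.execute(
    """SELECT Country_name,
              AVG(Est_gdp_in_billions) AS Avg_GDP,
              AVG(Est_population_in_millions) AS Avg_Population
       FROM Economic_Indicators
       GROUP BY Country_name
       ORDER BY Country_name"""
).fetchall()
# X's averages combine 2019 and 2020: (10 + 14) / 2 and (1 + 3) / 2.
print(averages)  # [('X', 12.0, 2.0), ('Y', 50.0, 4.0)]
```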

### Combining the queries

In [7]:
%%sql

SELECT 
    Country_name,
    AVG(Est_gdp_in_billions) AS Avg_GDP,
    AVG(Est_population_in_millions) AS Avg_Population
FROM
    (SELECT 
        Country_name,
        Est_gdp_in_billions,
        Est_population_in_millions
    FROM
        Economic_Indicators
    WHERE
        Pct_unemployment > 5
        AND Time_period = 2020) AS FilteredCountries
GROUP BY
    Country_name;


 * mysql+pymysql://root:***@localhost:3306/united_nations
37 rows affected.


| Country_name | Avg_GDP | Avg_Population |
|---|---|---|
| Afghanistan | 20.14 | 38.97223 |
| Argentina | 385.54 | 45.376763 |
| Armenia | 12.64 | 2.805608 |
| Australia | 1326.9 | 25.655289 |
| Azerbaijan | 42.69 | 10.093121 |
| Bhutan | 2.33 | 0.772506 |
| Botswana | 14.93 | 2.546402 |
| Brazil | 1448.56 | 213.196304 |
| Brunei Darussalam | 12.01 | 0.441725 |
| Canada | 1645.42 | 38.037204 |


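The subquery first filters the data into a derived table, `FilteredCountries`, and the main query then aggregates the rows of that derived table. In MySQL, every derived table must be given an alias, which is why the subquery is followed by `AS FilteredCountries`.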
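The sketch below puts the two steps together on made-up data in an in-memory SQLite database (again standing in for MySQL, with hypothetical rows). It also compares the derived-table query with a single-level query that applies the same `WHERE` conditions directly before `GROUP BY`. For a simple row filter like this one, both forms should return the same rows; the subquery form simply makes the "filter first, then aggregate" order explicit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE Economic_Indicators (
           Country_name TEXT, Time_period INTEGER,
           Est_gdp_in_billions REAL, Est_population_in_millions REAL,
           Pct_unemployment REAL)"""
)
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?, ?, ?, ?)",
    [
        ("X", 2019, 10.0, 1.0, 8.0),  # excluded: wrong year
        ("X", 2020, 14.0, 3.0, 9.0),  # kept
        ("Y", 2020, 50.0, 4.0, 2.0),  # excluded: unemployment too low
        ("Z", 2020, 20.0, 2.0, 6.5),  # kept
    ],
)

# Subquery in FROM: filter first (derived table), then aggregate.
derived = conn.execute(
    """SELECT Country_name,
              AVG(Est_gdp_in_billions) AS Avg_GDP,
              AVG(Est_population_in_millions) AS Avg_Population
       FROM (SELECT Country_name, Est_gdp_in_billions, Est_population_in_millions
             FROM Economic_Indicators
             WHERE Pct_unemployment > 5 AND Time_period = 2020) AS FilteredCountries
       GROUP BY Country_name
       ORDER BY Country_name"""
).fetchall()

# The same filter applied directly in WHERE, without a derived table.
direct = conn.execute(
    """SELECT Country_name,
              AVG(Est_gdp_in_billions),
              AVG(Est_population_in_millions)
       FROM Economic_Indicators
       WHERE Pct_unemployment > 5 AND Time_period = 2020
       GROUP BY Country_name
       ORDER BY Country_name"""
).fetchall()

print(derived)            # [('X', 14.0, 3.0), ('Z', 20.0, 2.0)]
print(derived == direct)  # True
```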

## Summary

 


In this lesson, we used a subquery in the `FROM` clause to determine the average GDP and population of countries whose unemployment rate was above 5% in 2020. In essence, a subquery in the `FROM` clause creates an intermediate, or derived, table for the main query to operate on, even though this table exists only for the duration of the query's execution.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>