<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img
 src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/alx-courses/aice/assets/Content_page_banner_blue_dots.png"
 alt="ALX Content Header"
 class="full-width-image"
/>
</div>

# Subquery in FROM 

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Description

In this lesson, we will use a subquery like a temporary table: we run one query to do a calculation, and then reference the result of that calculation in another query.
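As a rough sketch of that idea, the Python snippet below runs a subquery in the `FROM` clause against a tiny in-memory table. It assumes SQLite (via Python's built-in `sqlite3` module) as a stand-in for MySQL so that it runs without a server. The alias `country_averages` and the threshold of 100 are hypothetical, and the four rows are copied from the `Economic_indicators` output shown later in this notebook.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Economic_Indicators (
        Country_name TEXT,
        Time_period INTEGER,
        Est_gdp_in_billions REAL
    )
    """
)
# A few rows copied from the Economic_indicators output in this notebook.
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?, ?)",
    [
        ("Afghanistan", 2015, 20.0),
        ("Afghanistan", 2016, 18.0),
        ("Algeria", 2015, 166.0),
        ("Algeria", 2016, 160.0),
    ],
)

# Inner query: the calculation (average GDP per country).
# Outer query: treats that result as a table and filters on Avg_GDP.
query = """
SELECT Country_name, Avg_GDP
FROM (
    SELECT Country_name, AVG(Est_gdp_in_billions) AS Avg_GDP
    FROM Economic_Indicators
    GROUP BY Country_name
) AS country_averages
WHERE Avg_GDP > 100
"""
large_economies = conn.execute(query).fetchall()
print(large_economies)
```

The outer `WHERE` can refer to `Avg_GDP` only because the inner query has already computed it and exposed it as a column of the derived table.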


## Learning objectives

In this train, we will learn:
- How to use a subquery in the `FROM` clause as a derived table.

## Connecting to our MySQL database

We want to answer some questions about our dataset using the `Access_to_Basic_Services` table in the `united_nations` database we created in MySQL Workbench. We can run the same queries we used in MySQL Workbench in this notebook if we connect to our MySQL server by running the cells below.

Note that this tutorial assumes we have already worked through the lessons in which the `united_nations.Basic_Services`, `united_nations.Economic_Indicators`, and `united_nations.Geographic_Locations` tables were created.


In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password, and 'united_nations' with your database name if it differs.
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations


'Connected: root@united_nations'


To run a query, we add the `%%sql` command at the start of a cell, leave one blank line, write the query as shown below, and then run the cell.

In [3]:
%%sql

SELECT 
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98.0,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98.0,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98.0,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98.0,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98.0,18.513673,181.67,2699700.0,4.8


In [10]:
%%sql
SELECT
    *
FROM
    Economic_indicators;

 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


Country_name,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Afghanistan,2015,20.0,34.0,
Afghanistan,2016,18.0,35.0,
Afghanistan,2017,19.0,36.0,11.0
Afghanistan,2018,18.0,37.0,
Afghanistan,2019,19.0,38.0,
Afghanistan,2020,20.0,39.0,12.0
Algeria,2015,166.0,40.0,11.0
Algeria,2016,160.0,40.0,10.0
Algeria,2017,170.0,41.0,12.0
Algeria,2018,175.0,42.0,


## Exercise


In this exercise, we will use a subquery to answer the following question: What is the average GDP and population for countries with unemployment rates above 5%?


### Creating the subquery
Construct the subquery that displays the average GDP and average population of countries for 2020. The query should have the following columns: `Country_name`, `Avg_GDP`, `Avg_Population`.

In [8]:
%%sql
SELECT
    Country_name,
    AVG(Est_gdp_in_billions) AS Avg_GDP,
    AVG(Est_population_in_millions) AS Avg_Population
FROM
    Economic_indicators
WHERE
    Time_period = 2020
GROUP BY
    Country_name;

 * mysql+pymysql://root:***@localhost:3306/united_nations
165 rows affected.


Country_name,Avg_GDP,Avg_Population
Afghanistan,20.0,39.0
Algeria,145.0,43.0
American Samoa,1.0,0.0
Angola,54.0,33.0
Argentina,386.0,45.0
Armenia,13.0,3.0
Australia,1327.0,26.0
Azerbaijan,43.0,10.0
Bahrain,35.0,1.0
Bangladesh,374.0,167.0


### Creating the main query
Construct a query that keeps only the rows where the unemployment rate is above 5%. The query should include the following columns: `Country_name`, `Est_gdp_in_billions`, `Est_population_in_millions`.


In [11]:
%%sql
SELECT
    Country_name,
    Est_gdp_in_billions,
    Est_population_in_millions
FROM
    Economic_indicators
WHERE
    Pct_unemployment > 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
207 rows affected.


Country_name,Est_gdp_in_billions,Est_population_in_millions
Afghanistan,19,36
Afghanistan,20,39
Algeria,166,40
Algeria,160,40
Algeria,170,41
Angola,69,32
Argentina,644,44
Argentina,525,44
Argentina,448,45
Argentina,386,45


### Combining the queries

Using the query and the subquery, determine the average GDP and population for countries with unemployment rates above 5%. Group the results by `Country_name`.


In [13]:
%%sql
SELECT
    sub.Country_name,
    AVG(sub.Est_gdp_in_billions) AS Avg_GDP,
    AVG(sub.Est_population_in_millions) AS Avg_Population
FROM (
    SELECT
        Country_name,
        Est_gdp_in_billions,
        Est_population_in_millions
    FROM
        Economic_indicators
    WHERE
        Time_period = 2020
        AND Pct_unemployment > 5
) AS sub
GROUP BY
    sub.Country_name;


 * mysql+pymysql://root:***@localhost:3306/united_nations
33 rows affected.


Country_name,Avg_GDP,Avg_Population
Afghanistan,20.0,39.0
Argentina,386.0,45.0
Armenia,13.0,3.0
Australia,1327.0,26.0
Azerbaijan,43.0,10.0
Botswana,15.0,3.0
Brazil,1449.0,213.0
Brunei Darussalam,12.0,0.0
Canada,1645.0,38.0
Chile,253.0,19.0


## Solutions

### Creating the subquery

In [None]:
%%sql

SELECT 
    Country_name,
    Est_gdp_in_billions,
    Est_population_in_millions
FROM
    Economic_Indicators
WHERE
    Pct_unemployment > 5
    AND Time_period = 2020;

This query filters the `Economic_Indicators` table to keep only the 2020 rows where the unemployment rate is above 5%. Its result is the input table the main query will use.
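One detail worth checking is how this filter treats missing unemployment values: in SQL, comparing `NULL` with `> 5` is never true, so those rows are dropped. The sketch below illustrates this, assuming SQLite (Python's `sqlite3` module) in place of MySQL and using the Afghanistan rows from the `Economic_indicators` output above. The variable names are illustrative only.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Economic_Indicators ("
    "Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL, "
    "Est_population_in_millions REAL, Pct_unemployment REAL)"
)
# Afghanistan rows from the notebook output; None is stored as NULL.
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?, ?, ?, ?)",
    [
        ("Afghanistan", 2017, 19.0, 36.0, 11.0),
        ("Afghanistan", 2018, 18.0, 37.0, None),
        ("Afghanistan", 2019, 19.0, 38.0, None),
        ("Afghanistan", 2020, 20.0, 39.0, 12.0),
    ],
)

# Rows with NULL unemployment never satisfy "> 5", so 2018 and 2019 drop out.
years_above_five = conn.execute(
    "SELECT Time_period FROM Economic_Indicators "
    "WHERE Pct_unemployment > 5 ORDER BY Time_period"
).fetchall()

# Adding the year condition leaves only the 2020 row.
filtered_2020 = conn.execute(
    "SELECT Country_name, Est_gdp_in_billions, Est_population_in_millions "
    "FROM Economic_Indicators "
    "WHERE Pct_unemployment > 5 AND Time_period = 2020"
).fetchall()

print(years_above_five)
print(filtered_2020)
```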

### Creating the main query

In [None]:
%%sql

SELECT 
    Country_name,
    AVG(Est_gdp_in_billions) AS Avg_GDP,
    AVG(Est_population_in_millions) AS Avg_Population
FROM
    Economic_Indicators
GROUP BY
    Country_name;


Note that this query calculates the averages for **all** years, and for **all** countries.
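To see what averaging over all years means in practice, the sketch below compares the unfiltered aggregate with one restricted to 2020. It assumes SQLite (Python's `sqlite3` module) in place of MySQL and uses only the six Afghanistan rows from the `Economic_indicators` output above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Economic_Indicators ("
    "Country_name TEXT, Time_period INTEGER, "
    "Est_gdp_in_billions REAL, Est_population_in_millions REAL)"
)
# Afghanistan rows for 2015-2020, copied from the notebook output.
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?, ?, ?)",
    [
        ("Afghanistan", 2015, 20.0, 34.0),
        ("Afghanistan", 2016, 18.0, 35.0),
        ("Afghanistan", 2017, 19.0, 36.0),
        ("Afghanistan", 2018, 18.0, 37.0),
        ("Afghanistan", 2019, 19.0, 38.0),
        ("Afghanistan", 2020, 20.0, 39.0),
    ],
)

# No WHERE clause: every year contributes to the average.
all_years = conn.execute(
    "SELECT Country_name, AVG(Est_gdp_in_billions), AVG(Est_population_in_millions) "
    "FROM Economic_Indicators GROUP BY Country_name"
).fetchall()

# With a WHERE clause: only the 2020 row is averaged.
only_2020 = conn.execute(
    "SELECT Country_name, AVG(Est_gdp_in_billions), AVG(Est_population_in_millions) "
    "FROM Economic_Indicators WHERE Time_period = 2020 GROUP BY Country_name"
).fetchall()

print(all_years)
print(only_2020)
```

The two results differ, which is why the rows need to be filtered before the aggregation runs.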

### Combining the queries

In [None]:
%%sql

SELECT 
    Country_name,
    AVG(Est_gdp_in_billions) AS Avg_GDP,
    AVG(Est_population_in_millions) AS Avg_Population
FROM
    (SELECT 
        Country_name,
        Est_gdp_in_billions,
        Est_population_in_millions
    FROM
        Economic_Indicators
    WHERE
        Pct_unemployment > 5
        AND Time_period = 2020) AS FilteredCountries
GROUP BY
    Country_name;


The subquery first filters the data into a derived table, which the main query then aggregates.
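For this particular question, the same rows can also be produced by a single query that applies `WHERE` before `GROUP BY`; the derived table mainly makes the two steps explicit. The sketch below checks that both forms agree on a small sample. It assumes SQLite (Python's `sqlite3` module) in place of MySQL and uses rows copied from the `Economic_indicators` output above.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Economic_Indicators ("
    "Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL, "
    "Est_population_in_millions REAL, Pct_unemployment REAL)"
)
# Rows copied from the notebook output; None is stored as NULL.
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?, ?, ?, ?)",
    [
        ("Afghanistan", 2017, 19.0, 36.0, 11.0),
        ("Afghanistan", 2020, 20.0, 39.0, 12.0),
        ("Algeria", 2017, 170.0, 41.0, 12.0),
        ("Algeria", 2018, 175.0, 42.0, None),
    ],
)

# Form 1: filter in a derived table, then aggregate in the main query.
with_subquery = conn.execute(
    """
    SELECT Country_name,
           AVG(Est_gdp_in_billions) AS Avg_GDP,
           AVG(Est_population_in_millions) AS Avg_Population
    FROM (
        SELECT Country_name, Est_gdp_in_billions, Est_population_in_millions
        FROM Economic_Indicators
        WHERE Pct_unemployment > 5 AND Time_period = 2020
    ) AS FilteredCountries
    GROUP BY Country_name
    ORDER BY Country_name
    """
).fetchall()

# Form 2: one query, filtering with WHERE before grouping.
flat = conn.execute(
    """
    SELECT Country_name,
           AVG(Est_gdp_in_billions) AS Avg_GDP,
           AVG(Est_population_in_millions) AS Avg_Population
    FROM Economic_Indicators
    WHERE Pct_unemployment > 5 AND Time_period = 2020
    GROUP BY Country_name
    ORDER BY Country_name
    """
).fetchall()

print(with_subquery)
print(with_subquery == flat)
```

A derived table becomes necessary when the outer query has to work with a value that only exists after the inner query runs, such as filtering on a computed average.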

## Summary

 


In this lesson, we used a subquery in the `FROM` clause to determine the average GDP and population of countries where the unemployment rate is above 5%. A subquery in the `FROM` clause creates an intermediate, or derived, table for the main query to operate on, and this table exists only for the duration of the query execution.
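The short lifetime of a derived table can be checked directly. The sketch below assumes SQLite (Python's `sqlite3` module) in place of MySQL and uses a cut-down, two-column version of the table, so the exact error differs from what MySQL would report. The alias `FilteredCountries` works inside the query that defines it, but cannot be queried afterwards.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Economic_Indicators (Country_name TEXT, Pct_unemployment REAL)"
)
# Unemployment values copied from the notebook output; None is stored as NULL.
conn.executemany(
    "INSERT INTO Economic_Indicators VALUES (?, ?)",
    [("Afghanistan", 11.0), ("Afghanistan", None), ("Algeria", 10.0)],
)

# Inside the defining query, the alias behaves like a table.
rows_in_derived_table = conn.execute(
    "SELECT COUNT(*) FROM ("
    "SELECT Country_name FROM Economic_Indicators WHERE Pct_unemployment > 5"
    ") AS FilteredCountries"
).fetchone()[0]

# In a later query, the alias no longer refers to anything.
try:
    conn.execute("SELECT * FROM FilteredCountries")
    alias_survived = True
except sqlite3.OperationalError:
    alias_survived = False

print(rows_in_derived_table, alias_survived)
```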


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/refs/heads/master/ALX_banners/ALX_Navy.png" style="width:100px" />
</div>