<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# Subquery in WHERE
© ExploreAI Academy

In this notebook, we delve deeper into subqueries, specifically subqueries within a WHERE clause. One of the powerful aspects of subqueries is their ability to compare individual rows with aggregated data. This concept will be explored here.
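
To make the idea of comparing individual rows with aggregated data concrete before we connect to MySQL, here is a minimal sketch. It is an illustration only: it uses Python's built-in `sqlite3` module with an in-memory database rather than our MySQL server, and the `toy_gdp` table and its values are invented for this example.

```python
import sqlite3

# In-memory SQLite database; the table name and values are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toy_gdp (country TEXT, gdp REAL)")
conn.executemany(
    "INSERT INTO toy_gdp VALUES (?, ?)",
    [("A", 10.0), ("B", 20.0), ("C", 60.0)],
)

# The inner query returns a single value (the average of 10, 20 and 60, which is 30).
# The outer query then compares each individual row against that one value.
rows_above_average = conn.execute(
    """
    SELECT country, gdp
    FROM toy_gdp
    WHERE gdp > (SELECT AVG(gdp) FROM toy_gdp)
    """
).fetchall()

print(rows_above_average)  # [('C', 60.0)]
```

Only country `C` is returned because it is the only row whose value is greater than the aggregated average.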

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

In this train, we will learn:
- How to use subqueries in the WHERE clause.
- How to join tables.

## Connecting to our MySQL database

Using our `Access_to_Basic_Services` table in our `united_nations` database we created in MySQL Workbench, we want to answer some questions about our dataset. We can apply the same queries we used in MySQL Workbench in this notebook if we connect to our MySQL server by running the cells below.

Note that before this tutorial, we will already have worked through the lessons where the `united_nations.Basic_Services` table, `united_nations.Economic_Indicators` table, and `united_nations.Geographic_Locations` table were created.


In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that the SQL extension package (for example, ipython-sql) and pymysql are installed correctly. 

%load_ext sql

In [4]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password and, if needed, 'united_nations' with your database name. 
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations


'Connected: root@united_nations'


To run a query, we put the `%%sql` command on the first line of a cell, leave one blank line, write the query below it as shown here, and then run the cell.

In [5]:
%%sql

SELECT 
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98.0,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98.0,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98.0,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98.0,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98.0,18.513673,181.67,2699700.0,4.8


## Exercise



We want to answer the following question:

For the year 2020, which countries have a GDP above the global average, but still have less than 90% of their population with access to managed drinking water services?

This question will shed light on nations that, despite having a robust economy, may still be facing challenges in providing basic amenities like water.


### Task 1

Start by constructing a query that displays the average GDP across all countries for the year 2020.

In [12]:
%%sql
# Add your code here
SELECT
    AVG(Est_gdp_in_billions)
FROM
    Economic_Indicators
WHERE
    Time_period = 2020

 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


AVG(Est_gdp_in_billions)
301.176825


### Task 2

In order to answer our question, we need to pull data from both the Economic_Indicators and the Basic_Services tables; therefore, we need to join them together. Using `Country_name` and  `Time_period`, join the Basic_Services table to the Economic_Indicators table.

In [11]:
%%sql
# Add your code here
SELECT
    *
FROM
    Basic_Services B
INNER JOIN
    Economic_Indicators E
ON
    B.Country_name = E.Country_name
    AND
    B.Time_period = E.Time_period

 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Country_name_1,Time_period_1,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Afghanistan,2015,67.0,45.67,Afghanistan,2015,20.0,33.753499,
Afghanistan,2016,69.67,47.0,Afghanistan,2016,18.02,34.636207,
Afghanistan,2017,72.33,49.33,Afghanistan,2017,18.9,35.643418,11.18
Afghanistan,2018,75.33,50.67,Afghanistan,2018,18.42,36.686784,
Afghanistan,2019,78.0,52.33,Afghanistan,2019,18.9,37.769499,
Afghanistan,2020,80.33,54.0,Afghanistan,2020,20.14,38.97223,11.71
Algeria,2015,92.0,85.0,Algeria,2015,165.98,39.543154,11.21
Algeria,2016,93.0,85.33,Algeria,2016,160.03,40.339329,10.2
Algeria,2017,93.0,84.67,Algeria,2017,170.1,41.136546,12.0
Algeria,2018,93.0,84.67,Algeria,2018,174.91,41.927007,


### Task 3

Using the query created in Task 2, filter the results to display records where:

1. The year = 2020.
2. The GDP is above the global average. 
3. Less than 90% of the country's population have access to managed drinking water services. 

Hint: Keep in mind that we calculated the global average GDP for 2020 in Task 1. That query can be used as a subquery in the WHERE clause.

In [13]:
%%sql
# Add your code here
SELECT
    *
FROM
    Basic_Services B
INNER JOIN
    Economic_Indicators E
ON
    B.Country_name = E.Country_name
    AND
    B.Time_period = E.Time_period
WHERE
    E.Time_period = 2020
    AND
    B.Pct_managed_drinking_water_services < 90
    AND
    E.Est_gdp_in_billions >
    (
        SELECT
            AVG(Est_gdp_in_billions)
        FROM
            Economic_Indicators
        WHERE
            Time_period = 2020
    );

 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Country_name_1,Time_period_1,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Nigeria,2020,77.33,42.67,Nigeria,2020,432.2,208.327405,


## Solutions

### Task 1

In [15]:
%%sql

SELECT 
    AVG(Est_gdp_in_billions)
FROM 
    Economic_Indicators 
WHERE 
    Time_period = 2020;


 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


AVG(Est_gdp_in_billions)
301.176825
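
The WHERE clause in this query is applied before the average is calculated, so only the 2020 rows contribute to the result. The sketch below shows the difference on a small scale. It is an illustration only: it uses Python's built-in `sqlite3` module instead of our MySQL database, and the `toy_econ` table and its values are invented.

```python
import sqlite3

# In-memory SQLite database with made-up values for two countries and two years.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE toy_econ (Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL)"
)
conn.executemany(
    "INSERT INTO toy_econ VALUES (?, ?, ?)",
    [("X", 2019, 10.0), ("X", 2020, 20.0), ("Y", 2019, 30.0), ("Y", 2020, 40.0)],
)

# Rows are filtered to 2020 first, then averaged: (20 + 40) / 2.
avg_2020 = conn.execute(
    "SELECT AVG(Est_gdp_in_billions) FROM toy_econ WHERE Time_period = 2020"
).fetchone()[0]

# Without the filter, every year is included: (10 + 20 + 30 + 40) / 4.
avg_all_years = conn.execute(
    "SELECT AVG(Est_gdp_in_billions) FROM toy_econ"
).fetchone()[0]

print(avg_2020, avg_all_years)  # 30.0 25.0
```

Leaving out the `Time_period` filter would give us an average over all years, which is not the value our question asks for.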


### Task 2

In [16]:
%%sql

SELECT 
    econ.Country_name,
    econ.Time_period,
    econ.Est_gdp_in_billions,
    service.Pct_managed_drinking_water_services
FROM 
    Economic_Indicators AS econ
INNER JOIN 
    Basic_Services AS service
ON 
    econ.Country_name = service.Country_name
    AND econ.Time_period = service.Time_period
LIMIT 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Country_name,Time_period,Est_gdp_in_billions,Pct_managed_drinking_water_services
Afghanistan,2015,20.0,67.0
Afghanistan,2016,18.02,69.67
Afghanistan,2017,18.9,72.33
Afghanistan,2018,18.42,75.33
Afghanistan,2019,18.9,78.0
Afghanistan,2020,20.14,80.33
Algeria,2015,165.98,92.0
Algeria,2016,160.03,93.0
Algeria,2017,170.1,93.0
Algeria,2018,174.91,93.0
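
Both tables hold one row per country per year, which is why the join condition uses `Country_name` and `Time_period` together. The following sketch shows what could go wrong if we joined on the country alone. It is an illustration only: it runs on Python's built-in `sqlite3` module rather than MySQL, and the `toy_econ` and `toy_service` tables and their values are invented.

```python
import sqlite3

# In-memory SQLite database; one made-up country with two years in each table.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE toy_econ (Country_name TEXT, Time_period INTEGER, gdp REAL);
    CREATE TABLE toy_service (Country_name TEXT, Time_period INTEGER, water REAL);
    INSERT INTO toy_econ VALUES ('X', 2019, 1.0), ('X', 2020, 2.0);
    INSERT INTO toy_service VALUES ('X', 2019, 50.0), ('X', 2020, 55.0);
    """
)

# Joining on both columns pairs each year with the same year in the other table.
both_keys = conn.execute(
    """
    SELECT COUNT(*)
    FROM toy_econ AS e
    INNER JOIN toy_service AS s
        ON e.Country_name = s.Country_name
        AND e.Time_period = s.Time_period
    """
).fetchone()[0]

# Joining on the country only pairs every year with every other year.
country_only = conn.execute(
    """
    SELECT COUNT(*)
    FROM toy_econ AS e
    INNER JOIN toy_service AS s
        ON e.Country_name = s.Country_name
    """
).fetchone()[0]

print(both_keys, country_only)  # 2 4
```

With only the country in the join condition, the 2019 GDP would also be matched with the 2020 water access figure, and the other way around, which would distort any later filtering.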


### Task 3

In [14]:
%%sql

SELECT 
    econ.Country_name,
    econ.Time_period,
    econ.Est_gdp_in_billions,
    service.Pct_managed_drinking_water_services
FROM 
    Economic_Indicators AS econ
INNER JOIN 
    Basic_Services AS service
ON 
    econ.Country_name = service.Country_name
    AND econ.Time_period = service.Time_period
WHERE
    econ.Time_period = 2020
    AND service.Pct_managed_drinking_water_services < 90
    AND econ.Est_gdp_in_billions > (SELECT 
                                        AVG(Est_gdp_in_billions)
                                    FROM 
                                        Economic_Indicators 
                                    WHERE 
                                        Time_period = 2020);


 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


Country_name,Time_period,Est_gdp_in_billions,Pct_managed_drinking_water_services
Nigeria,2020,432.2,77.33
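
One way to check our understanding of the subquery is to compare it with a two-step approach: first compute the average on its own, then pass that number into the filter. The sketch below does this on a small scale. It is an illustration only: it uses Python's built-in `sqlite3` module rather than MySQL, and the `toy_econ` and `toy_service` tables, the `Pct_water` column, and all values are invented.

```python
import sqlite3

# In-memory SQLite database with made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE toy_econ (Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL);
    CREATE TABLE toy_service (Country_name TEXT, Time_period INTEGER, Pct_water REAL);
    INSERT INTO toy_econ VALUES
        ('P', 2020, 100.0), ('Q', 2020, 500.0), ('R', 2020, 600.0), ('Q', 2019, 900.0);
    INSERT INTO toy_service VALUES
        ('P', 2020, 60.0), ('Q', 2020, 80.0), ('R', 2020, 99.0), ('Q', 2019, 70.0);
    """
)

# Outer query with a placeholder where the GDP threshold goes.
BASE = """
    SELECT e.Country_name
    FROM toy_econ AS e
    INNER JOIN toy_service AS s
        ON e.Country_name = s.Country_name
        AND e.Time_period = s.Time_period
    WHERE e.Time_period = 2020
        AND s.Pct_water < 90
        AND e.Est_gdp_in_billions > {threshold}
"""
AVG_2020 = "SELECT AVG(Est_gdp_in_billions) FROM toy_econ WHERE Time_period = 2020"

# One step: the average is computed inside the WHERE clause as a subquery.
one_step = conn.execute(BASE.format(threshold=f"({AVG_2020})")).fetchall()

# Two steps: compute the average first, then pass it in as a query parameter.
avg_2020 = conn.execute(AVG_2020).fetchone()[0]
two_step = conn.execute(BASE.format(threshold="?"), (avg_2020,)).fetchall()

print(avg_2020, one_step, two_step)  # 400.0 [('Q',)] [('Q',)]
```

Both approaches return the same rows. The subquery version has the advantage that the threshold is recalculated every time the query runs, so we never compare against an outdated, hard-coded average.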


## Summary

This layered query first calculates the average GDP for 2020 and then uses that value, together with the other criteria, to filter the countries.

Nigeria is the only country that satisfies these criteria. Although its GDP is above the global average, its low access to water is linked to policies that are not aligned with SDG 6 and to poor governance, infrastructure, quality, and supply, according to the World Bank.

Finally, note that queries quickly become complex when we combine JOINs, subqueries, filters, and calculations. The code is already hard to read and will be even harder for someone else who works on it later, so we need to be mindful of readability as we build these complex queries.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>