<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# SELECT and SELECT WHERE
© ExploreAI Academy

In this walk-through, we demonstrate how to get data out of a table using the `SELECT` statement.
We also show how to filter data using the `WHERE` clause.



> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

In this train, we will learn:
- How to use SELECT and SELECT DISTINCT to select columns.
- How to use WHERE to filter data based on a condition.
- How to save result sets as new tables.

## Connecting to our MySQL database

Using the `Access_to_Basic_Services` table in the `united_nations` database that we created in MySQL Workbench, we want to answer some questions about our dataset. We can run the same queries we used in MySQL Workbench in this notebook once we connect to our MySQL server by running the cells below.


In [1]:
# Load and activate the SQL extension so that we can execute SQL in a Jupyter notebook.
# If you get an error here, make sure that mysql and pymysql are installed correctly.

%load_ext sql

In [3]:
# Establish a connection to the local database using the '%sql' magic command.
# The URL has the form mysql+pymysql://user:password@host:port/db_name.
# Replace the user, password, and database name below with our own connection details.
# If you get an error here, please make sure the database name and password are correct.

%sql mysql+pymysql://cybergod:example-password@localhost:3306/united_nations


To run a query, we put the `%%sql` cell magic on the first line of a cell, leave one blank line, write the query as shown below, and run the cell.

In [4]:
%%sql

SELECT 
    *
FROM
    Access_to_basic_services
LIMIT 5;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
5 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98,17.542806,184.39,2699700,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98,17.794055,137.28,2699700,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98,18.037776,166.81,2699700,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98,18.276452,179.34,2699700,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98,18.513673,181.67,2699700,4.8


## Exercise


Suppose we want to find out which country had the lowest percentage of people with access to managed drinking water services in 2020.

### 1. Exploring the database

Use the `SELECT` statement to display all the columns from the `Access_to_Basic_Services` table. This will help us get a feel for the data we're working with. 

In [10]:
%%sql
# Add your code here
# USE united_nations;
SELECT 
    * 
FROM 
    Access_to_basic_services
LIMIT 100;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
100 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98,18.513673,181.67,2699700.0,4.8
Central and Southern Asia,Central Asia,Kazakhstan,2020,95.0,98,18.755666,171.08,2699700.0,4.89
Central and Southern Asia,Central Asia,Kyrgyzstan,2015,89.67,97,,,,
Central and Southern Asia,Central Asia,Kyrgyzstan,2016,90.33,97,,,,
Central and Southern Asia,Central Asia,Kyrgyzstan,2017,91.0,97,,,,
Central and Southern Asia,Central Asia,Kyrgyzstan,2018,91.33,97,,,,


The previous query may return a large number of rows, which could slow down our system. Modify the query to limit the number of rows returned to 10.

In [11]:
%%sql
SELECT 
    * 
FROM 
    Access_to_basic_services
LIMIT 10;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
10 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98,18.513673,181.67,2699700.0,4.8
Central and Southern Asia,Central Asia,Kazakhstan,2020,95.0,98,18.755666,171.08,2699700.0,4.89
Central and Southern Asia,Central Asia,Kyrgyzstan,2015,89.67,97,,,,
Central and Southern Asia,Central Asia,Kyrgyzstan,2016,90.33,97,,,,
Central and Southern Asia,Central Asia,Kyrgyzstan,2017,91.0,97,,,,
Central and Southern Asia,Central Asia,Kyrgyzstan,2018,91.33,97,,,,
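
To see the effect of `LIMIT` without a MySQL server, here is a minimal sketch that uses Python's built-in `sqlite3` module with an in-memory database. This stand-in is an assumption made for illustration; the notebook itself targets MySQL. The table holds only three of the columns and a few rows copied from the output above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Access_to_basic_services (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL
    )
    """
)

# A few rows copied from the query output shown above.
rows = [
    ("Kazakhstan", 2015, 94.67),
    ("Kazakhstan", 2016, 94.67),
    ("Kazakhstan", 2017, 95.0),
    ("Kyrgyzstan", 2015, 89.67),
    ("Kyrgyzstan", 2016, 90.33),
]
conn.executemany("INSERT INTO Access_to_basic_services VALUES (?, ?, ?)", rows)

# LIMIT caps the number of rows returned, however many rows the table holds.
limited = conn.execute("SELECT * FROM Access_to_basic_services LIMIT 3").fetchall()
print(len(limited))
conn.close()
```

Without an `ORDER BY` clause, the database is free to choose which rows it returns, so `LIMIT` on its own is best treated as a way to preview data rather than to pick specific rows.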


### 2. Unique country names
Extract a list of unique country names in the database.

In [13]:
%%sql
SELECT DISTINCT
    Country_name
FROM 
    Access_to_basic_services;


 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
182 rows affected.


Country_name
Kazakhstan
Kyrgyzstan
Tajikistan
Turkmenistan
Uzbekistan
Afghanistan
Bangladesh
Bhutan
India
Iran (Islamic Republic of)
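
The difference between `SELECT` and `SELECT DISTINCT` can be checked on a tiny table. The sketch below again assumes an in-memory SQLite database (Python's built-in `sqlite3`) as a stand-in for MySQL, with a handful of country and year pairs taken from the outputs above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Access_to_basic_services (Country_name TEXT, Time_period INTEGER)")
conn.executemany(
    "INSERT INTO Access_to_basic_services VALUES (?, ?)",
    [("Kazakhstan", 2015), ("Kazakhstan", 2016), ("Kyrgyzstan", 2015), ("Kyrgyzstan", 2016)],
)

# A plain SELECT returns one value per row, so each country repeats once per year.
all_names = [r[0] for r in conn.execute("SELECT Country_name FROM Access_to_basic_services")]

# SELECT DISTINCT collapses duplicate values into a single row each.
unique_names = [
    r[0] for r in conn.execute("SELECT DISTINCT Country_name FROM Access_to_basic_services")
]

print(len(all_names), len(unique_names))  # 4 2
conn.close()
```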


Create a new table called `Country_list` and save the unique country names into this table.

In [20]:
%%sql
CREATE TABLE Country_list(Country VARCHAR(25));

INSERT INTO Country_list(Country)
SELECT DISTINCT 
    Country_name
FROM 
    Access_to_basic_services;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
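(pymysql.err.OperationalError) (1050, "Table 'Country_list' already exists")
[SQL: CREATE TABLE Country_list(Country VARCHAR(25));]
(Background on this error at: https://sqlalche.me/e/20/e3q8)

Error 1050 means that `Country_list` already exists from an earlier run, so the `CREATE TABLE` statement was rejected and the `INSERT` was not executed. To rerun the cell, first remove the old table with `DROP TABLE IF EXISTS Country_list;`. Note also that `VARCHAR(25)` is too short for some names, such as `Democratic Republic of the Congo` (32 characters), which would be rejected or truncated depending on the server's SQL mode; the solution at the end of this notebook uses `VARCHAR(255)`.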


In [21]:
%%sql
SELECT 
    * 
FROM 
    Country_list;


 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
182 rows affected.


Country_name
Kazakhstan
Kyrgyzstan
Tajikistan
Turkmenistan
Uzbekistan
Afghanistan
Bangladesh
Bhutan
India
Iran (Islamic Republic of)
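
Saving a result set into a new table can be made safe to rerun by dropping the table before recreating it. The following is a minimal sketch under stated assumptions: it uses an in-memory SQLite database (Python's built-in `sqlite3`) instead of MySQL, and `rebuild_country_list` is a hypothetical helper name introduced only for this illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Access_to_basic_services (Country_name TEXT, Time_period INTEGER)")
conn.executemany(
    "INSERT INTO Access_to_basic_services VALUES (?, ?)",
    [("Kazakhstan", 2019), ("Kazakhstan", 2020), ("Iran (Islamic Republic of)", 2020)],
)


def rebuild_country_list(connection):
    """Hypothetical helper: recreate Country_list from scratch so reruns do not fail."""
    # Dropping first avoids the "table already exists" error on a second run.
    connection.execute("DROP TABLE IF EXISTS Country_list")
    connection.execute("CREATE TABLE Country_list (Country VARCHAR(255))")
    # INSERT ... SELECT copies the result set of the query into the new table.
    connection.execute(
        "INSERT INTO Country_list (Country) "
        "SELECT DISTINCT Country_name FROM Access_to_basic_services"
    )


rebuild_country_list(conn)
rebuild_country_list(conn)  # the second run succeeds and does not duplicate rows

saved = sorted(r[0] for r in conn.execute("SELECT Country FROM Country_list"))
print(saved)
conn.close()
```

Dropping a table discards whatever it held, so this pattern only suits tables that can be fully rebuilt from a query, as `Country_list` can.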


### 3. Selecting specific fields

Select the `country_name`, `time_period`, and `pct_managed_drinking_water_services` fields from the `Access_to_Basic_Services` table.

In [29]:
%%sql
# # use united_nations;
SELECT
    Country_name, 
    Time_period,
    Pct_managed_drinking_water_services
FROM 
    Access_to_basic_services
LIMIT 20;


 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
20 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services
Kazakhstan,2015,94.67
Kazakhstan,2016,94.67
Kazakhstan,2017,95.0
Kazakhstan,2018,95.0
Kazakhstan,2019,95.0
Kazakhstan,2020,95.0
Kyrgyzstan,2015,89.67
Kyrgyzstan,2016,90.33
Kyrgyzstan,2017,91.0
Kyrgyzstan,2018,91.33


Rename the `pct_managed_drinking_water_services` field to `pct_access_to_water` in your query results.

In [31]:
%%sql
# Add your code here
SELECT
    Country_name, 
    Time_period,
    Pct_managed_drinking_water_services AS pct_access_to_water
FROM 
    Access_to_basic_services
LIMIT 20;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
20 rows affected.


Country_name,Time_period,pct_access_to_water
Kazakhstan,2015,94.67
Kazakhstan,2016,94.67
Kazakhstan,2017,95.0
Kazakhstan,2018,95.0
Kazakhstan,2019,95.0
Kazakhstan,2020,95.0
Kyrgyzstan,2015,89.67
Kyrgyzstan,2016,90.33
Kyrgyzstan,2017,91.0
Kyrgyzstan,2018,91.33
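
An alias only relabels the column in the query result; the table itself keeps its original column name. The sketch below illustrates this with an in-memory SQLite database (Python's built-in `sqlite3`), which is an assumed stand-in for MySQL, using one row copied from the output above.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_basic_services ("
    "Country_name TEXT, Time_period INTEGER, Pct_managed_drinking_water_services REAL)"
)
conn.execute("INSERT INTO Access_to_basic_services VALUES ('Kazakhstan', 2015, 94.67)")

cursor = conn.execute(
    "SELECT Country_name, Time_period, "
    "Pct_managed_drinking_water_services AS pct_access_to_water "
    "FROM Access_to_basic_services"
)

# cursor.description holds one tuple per result column; its first item is the column label.
result_columns = [col[0] for col in cursor.description]
first_row = cursor.fetchone()

print(result_columns[2])  # the alias, not the original column name
conn.close()
```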


### 4. Filtering and sorting data

Modify your query to only display data for the year `2020`.

In [33]:
%%sql
# Add your code here
SELECT
    Country_name, 
    Time_period,
    Pct_managed_drinking_water_services
FROM 
    Access_to_basic_services
WHERE Time_period=2020
LIMIT 20;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
20 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services
Kazakhstan,2020,95.0
Kyrgyzstan,2020,92.67
Tajikistan,2020,85.0
Turkmenistan,2020,100.0
Uzbekistan,2020,98.0
Afghanistan,2020,80.33
Bangladesh,2020,97.67
Bhutan,2020,97.33
India,2020,91.0
Iran (Islamic Republic of),2020,96.67


The rows above are not sorted, so the lowest value is not necessarily shown. Sort the results by `pct_access_to_water` in ascending order and limit the number of rows returned to 10, so that the country with the lowest access appears first.

In [37]:
%%sql
# Add your code here
SELECT
    Country_name, 
    Time_period,
    Pct_managed_drinking_water_services AS pct_access_to_water
FROM 
    Access_to_basic_services
WHERE Time_period = 2020
# Since we cannot sort interactively as we did in the MySQL Workbench GUI, we sort in SQL.
# ORDER BY ... ASC puts the lowest values first.
ORDER BY pct_access_to_water ASC
LIMIT 10;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
10 rows affected.


Country_name,Time_period,pct_access_to_water
Central African Republic,2020,38.33
Democratic Republic of the Congo,2020,47.67
South Sudan,2020,48.33
Angola,2020,52.33
Chad,2020,52.67
Burkina Faso,2020,53.33
Madagascar,2020,56.33
Papua New Guinea,2020,56.67
Niger,2020,57.33
Somalia,2020,57.33
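
The combination of `WHERE`, `ORDER BY`, and `LIMIT` can be tried out on a small sample. This is a minimal sketch that assumes an in-memory SQLite database (Python's built-in `sqlite3`) as a stand-in for MySQL; the rows are copied from the outputs above, and the year is passed as a bound parameter rather than typed into the SQL text.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_basic_services ("
    "Country_name TEXT, Time_period INTEGER, Pct_managed_drinking_water_services REAL)"
)
rows = [
    ("Kazakhstan", 2019, 95.0),
    ("Kazakhstan", 2020, 95.0),
    ("South Sudan", 2020, 48.33),
    ("Central African Republic", 2020, 38.33),
    ("Democratic Republic of the Congo", 2020, 47.67),
]
conn.executemany("INSERT INTO Access_to_basic_services VALUES (?, ?, ?)", rows)

year = 2020  # bound to the ? placeholder below

# WHERE filters the rows, ORDER BY sorts what is left, and LIMIT keeps the first few.
lowest = conn.execute(
    """
    SELECT Country_name, Pct_managed_drinking_water_services AS pct_access_to_water
    FROM Access_to_basic_services
    WHERE Time_period = ?
    ORDER BY pct_access_to_water ASC
    LIMIT 3
    """,
    (year,),
).fetchall()

print(lowest[0])
conn.close()
```

Because the sort is ascending, the first row holds the lowest value among the 2020 rows in this sample.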


We can also check the name and data type of each column with `SHOW COLUMNS`.

In [41]:
%%sql
SHOW COLUMNS
FROM 
    Access_to_basic_services;

 * mysql+pymysql://cybergod:***@localhost:3306/united_nations
10 rows affected.


Field,Type,Null,Key,Default,Extra
Region,text,YES,,,
Sub_region,text,YES,,,
Country_name,text,YES,,,
Time_period,int,YES,,,
Pct_managed_drinking_water_services,double,YES,,,
Pct_managed_sanitation_services,int,YES,,,
Est_population_in_millions,double,YES,,,
Est_gdp_in_billions,double,YES,,,
Land_area,int,YES,,,
Pct_unemployment,double,YES,,,


The answer is in the first row of the sorted results from the `ORDER BY` query above:

In [None]:
# Answer: Central African Republic, 38.33%

## Solutions

### 1. Exploring the database

Use the `SELECT` statement to display all the columns from the `Access_to_Basic_Services` table. This will help us get a feel for the data we're working with.

In [None]:
%%sql

SELECT 
    * 
FROM 
    united_nations.Access_to_Basic_Services
LIMIT 30;

The previous query may return a large number of rows, which could slow down our system. Modify the query to limit the number of rows returned to 10.

In [None]:
%%sql

SELECT 
    * 
FROM 
    united_nations.Access_to_Basic_Services
LIMIT 10;

### 2. Unique country names
Extract a list of unique country names in the database.

In [None]:
%%sql

SELECT DISTINCT 
    Country_name 
FROM 
    united_nations.Access_to_Basic_Services
LIMIT 20;

Create a new table called `Country_list` and save the unique country names into this table.

In [None]:
%%sql

CREATE TABLE Country_list(Country VARCHAR(255));
INSERT INTO Country_list(Country)
SELECT DISTINCT 
    Country_name 
FROM 
    united_nations.Access_to_Basic_Services;

### 3. Selecting specific fields

Select the `country_name`, `time_period`, and `pct_managed_drinking_water_services` fields from the `Access_to_Basic_Services` table.

In [None]:
%%sql

SELECT 
    country_name, 
    time_period, 
    pct_managed_drinking_water_services 
FROM 
    united_nations.Access_to_Basic_Services
LIMIT 20;


Rename the `pct_managed_drinking_water_services` field to `pct_access_to_water` in your query results.

In [None]:
%%sql

SELECT 
    country_name, 
    time_period, 
    pct_managed_drinking_water_services AS pct_access_to_water
FROM 
    united_nations.Access_to_Basic_Services
LIMIT 20;

### 4. Filtering and sorting data

Modify your query to only display data for the year 2020.

In [None]:
%%sql

SELECT 
    country_name, 
    time_period, 
    pct_managed_drinking_water_services AS pct_access_to_water
FROM 
    united_nations.Access_to_Basic_Services 
WHERE 
    Time_period = 2020
LIMIT 100;

Sort the results by `pct_access_to_water` in ascending order and limit the number of rows returned to 10, so that the country with the lowest access appears first.

In [None]:
%%sql

SELECT 
    country_name, 
    time_period, 
    pct_managed_drinking_water_services AS pct_access_to_water
FROM 
    united_nations.Access_to_Basic_Services 
WHERE 
    Time_period = 2020
ORDER BY pct_access_to_water
LIMIT 10;

## Summary
Congratulations! You have used SQL commands to filter and sort data to answer a specific question. Please review your results and think about what other questions could be answered with this data.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>