<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# Logical and comparison operators ii
© ExploreAI Academy

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

In this train we will learn:
- How to use the IS NULL and IS NOT NULL statements
- How to use the IS IN and IS NOT IN statements
- Understand if there is any correlation between the GDP and availibility of drinking water and sanitation services in Sub-Saharan Africa

## Connecting to our MySQL database

Using our `Access_to_Basic_Services` table in our `united_nations` database we created in MySQL Workbench, we want to answer some questions about our dataset. We can apply the same queries we used in MySQL Workbench in this notebook if we connect to our MySQL server by running the cells below.


In [21]:
# Load and activate the SQL extension to allows us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql is installed correctly. 

%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [22]:
# Establish a connection to the local database using the '%sql' magic command,
# Replace 'password' with our connection password and `db_name` with our database name. 
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations


To make a query, we add the `%%sql` command to the start of a cell, create one open line and then the query like below and run the cell.

In [23]:
%%sql

SELECT 
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98.0,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98.0,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98.0,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98.0,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98.0,18.513673,181.67,2699700.0,4.8


## Exercise


We will be working with the `united_nations.Access_to_Basic_Services` table, which contains information about different countries, their access to basic services, and their estimated GDP.

In this exercise We would like to determine if the GDP of a country, specifically in Sub-Saharan Africa has any correlation to its access to basic services


### Task 1 
Select data from the Sub-Saharan African region during the year 2020.

In [32]:
%%sql
SELECT *
FROM united_nations.Access_to_Basic_Services
WHERE time_period = 2020 and region = 'Sub-Saharan Africa'
limit 10;

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Sub-Saharan Africa,Eastern Africa,Burundi,2020,70.33,44.33,12.220227,2.65,25680.0,1.03
Sub-Saharan Africa,Eastern Africa,Djibouti,2020,69.0,56.0,1.090156,3.18,23180.0,
Sub-Saharan Africa,Eastern Africa,Ethiopia,2020,58.0,11.67,117.190911,107.66,1128571.26,
Sub-Saharan Africa,Eastern Africa,Kenya,2020,67.0,33.67,51.98578,100.67,569140.0,
Sub-Saharan Africa,Eastern Africa,Madagascar,2020,56.33,13.0,28.225177,13.05,581800.0,
Sub-Saharan Africa,Eastern Africa,Malawi,2020,74.33,28.67,19.377061,12.18,94280.0,0.91
Sub-Saharan Africa,Eastern Africa,Mauritius,2020,100.0,96.0,1.26574,11.4,2030.0,8.63
Sub-Saharan Africa,Eastern Africa,Mayotte,2020,96.0,100.0,,,,
Sub-Saharan Africa,Eastern Africa,Mozambique,2020,66.67,40.33,31.178239,14.03,786380.0,
Sub-Saharan Africa,Eastern Africa,Rwanda,2020,66.33,64.0,13.146362,10.18,24670.0,11.83


### Task 2

Sometimes there are null values in our entries. Any country having Null values for their GDP should not be included in our query as they will not help us determine if there is any correlation between GDP and access to basic services. For this task determine if there are any NULL values in the GDP column

In [41]:
%%sql
# Add your code here
SELECT *
FROM united_nations.Access_to_Basic_Services
WHERE
	region = 'Sub-Saharan Africa'
AND 
	Time_period = 2020
AND  
    Est_gdp_in_billions is null


 * mysql+pymysql://root:***@localhost:3306/united_nations
9 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Sub-Saharan Africa,Eastern Africa,Mayotte,2020,96.0,100.0,,,,
Sub-Saharan Africa,Eastern Africa,Réunion,2020,100.0,100.0,,,,
Sub-Saharan Africa,Eastern Africa,South Sudan,2020,48.33,22.33,10.606227,,631930.0,
Sub-Saharan Africa,Eastern Africa,United Republic of Tanzania,2020,65.0,34.0,,,,
Sub-Saharan Africa,Middle Africa,Congo,2020,69.0,17.67,,,,
Sub-Saharan Africa,Middle Africa,Democratic Republic of the Congo,2020,47.67,15.33,,,,
Sub-Saharan Africa,Western Africa,Côte d'Ivoire,2020,70.67,34.67,,,,
Sub-Saharan Africa,Western Africa,Gambia,2020,79.33,44.33,,,,
Sub-Saharan Africa,Western Africa,Saint Helena,2020,99.0,100.0,,,,


### Task 3

If there are any Null values exclude them from your query.

In [40]:
%%sql
# Add your code here
SELECT *
FROM united_nations.Access_to_Basic_Services
WHERE
	region = 'Sub-Saharan Africa'
AND 
	Time_period = 2020
AND  
    Est_gdp_in_billions is not null


 * mysql+pymysql://root:***@localhost:3306/united_nations
38 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Sub-Saharan Africa,Eastern Africa,Burundi,2020,70.33,44.33,12.220227,2.65,25680.0,1.03
Sub-Saharan Africa,Eastern Africa,Djibouti,2020,69.0,56.0,1.090156,3.18,23180.0,
Sub-Saharan Africa,Eastern Africa,Ethiopia,2020,58.0,11.67,117.190911,107.66,1128571.26,
Sub-Saharan Africa,Eastern Africa,Kenya,2020,67.0,33.67,51.98578,100.67,569140.0,
Sub-Saharan Africa,Eastern Africa,Madagascar,2020,56.33,13.0,28.225177,13.05,581800.0,
Sub-Saharan Africa,Eastern Africa,Malawi,2020,74.33,28.67,19.377061,12.18,94280.0,0.91
Sub-Saharan Africa,Eastern Africa,Mauritius,2020,100.0,96.0,1.26574,11.4,2030.0,8.63
Sub-Saharan Africa,Eastern Africa,Mozambique,2020,66.67,40.33,31.178239,14.03,786380.0,
Sub-Saharan Africa,Eastern Africa,Rwanda,2020,66.33,64.0,13.146362,10.18,24670.0,11.83
Sub-Saharan Africa,Eastern Africa,Somalia,2020,57.33,40.0,16.537016,6.88,627340.0,


### Task 4

Lets get an idea if there's any correlation between the GDP and access to basic services for the top 5 economies in Sub-Saharan Africr. The top 5 GDP's are : ('Nigeria','South Africa','Ethiopia','Kenya','Ghana'). Make sure your query only includes these countries.

In [43]:
%%sql
# Add your code here
SELECT *
FROM united_nations.Access_to_Basic_Services
WHERE
	region = 'Sub-Saharan Africa'
AND 
	Time_period = 2020
AND  
    Est_gdp_in_billions is not null
AND 
    country_name in ('Nigeria','South Africa','Ethiopia','Kenya','Ghana')

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Sub-Saharan Africa,Eastern Africa,Ethiopia,2020,58.0,11.67,117.190911,107.66,1128571.26,
Sub-Saharan Africa,Eastern Africa,Kenya,2020,67.0,33.67,51.98578,100.67,569140.0,
Sub-Saharan Africa,Southern Africa,South Africa,2020,92.0,78.67,58.801927,337.62,1213090.0,24.34
Sub-Saharan Africa,Western Africa,Ghana,2020,84.67,23.0,32.180401,70.04,227533.0,
Sub-Saharan Africa,Western Africa,Nigeria,2020,77.33,42.67,208.327405,432.2,910770.0,


Based on your results. Is there any correlation between the GDP and access to basic services? Does a higher GDP translate to better services?

### Task 5

We only looked at 5 countries in the previous query. Lets have a look at the rest of Sub-Saharan Africa .
Exclude the countries mentioned in the previous task

In [47]:
%%sql
# Add your code here

SELECT *
FROM united_nations.Access_to_Basic_Services
WHERE
	region = 'Sub-Saharan Africa'
AND 
	Time_period = 2020
AND  
    Est_gdp_in_billions is not null
AND 
    country_name not in ('Nigeria','South Africa','Ethiopia','Kenya','Ghana')

 * mysql+pymysql://root:***@localhost:3306/united_nations
33 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Sub-Saharan Africa,Eastern Africa,Burundi,2020,70.33,44.33,12.220227,2.65,25680.0,1.03
Sub-Saharan Africa,Eastern Africa,Djibouti,2020,69.0,56.0,1.090156,3.18,23180.0,
Sub-Saharan Africa,Eastern Africa,Madagascar,2020,56.33,13.0,28.225177,13.05,581800.0,
Sub-Saharan Africa,Eastern Africa,Malawi,2020,74.33,28.67,19.377061,12.18,94280.0,0.91
Sub-Saharan Africa,Eastern Africa,Mauritius,2020,100.0,96.0,1.26574,11.4,2030.0,8.63
Sub-Saharan Africa,Eastern Africa,Mozambique,2020,66.67,40.33,31.178239,14.03,786380.0,
Sub-Saharan Africa,Eastern Africa,Rwanda,2020,66.33,64.0,13.146362,10.18,24670.0,11.83
Sub-Saharan Africa,Eastern Africa,Somalia,2020,57.33,40.0,16.537016,6.88,627340.0,
Sub-Saharan Africa,Eastern Africa,Uganda,2020,61.0,21.67,44.404611,37.6,200520.0,
Sub-Saharan Africa,Eastern Africa,Zambia,2020,66.67,32.67,18.927715,18.11,743390.0,6.03


Again. Do you see any correlation? Arrange the `Pct_managed_drinking_water_services` in descending order. Do the countries with the highest percentage to `Pct_managed_drinking_water_services` have a higher GDP then the countries with a lower percentage of `Pct_managed_drinking_water_services` ?
 Now arrange the `Pct_managed_sanitation_services` in descending order. Do countries with a higher `Pct_managed_sanitation_services` also have a higher GDP?

## Solutions

### Task 1

In [19]:
%%sql

SELECT 
	Country_name,
	Time_period,
	Pct_managed_drinking_water_services,
	Pct_managed_sanitation_services,
	Est_population_in_millions,
	Est_gdp_in_billions
FROM 
	united_nations.Access_to_Basic_Services
WHERE
	region = 'Sub-Saharan Africa'
AND 
    Time_period = 2020;

 * mysql+pymysql://root:***@localhost:3306/united_nations
47 rows affected.


Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions
Burundi,2020,70.33,44.33,12.220227,2.65
Djibouti,2020,69.0,56.0,1.090156,3.18
Ethiopia,2020,58.0,11.67,117.190911,107.66
Kenya,2020,67.0,33.67,51.98578,100.67
Madagascar,2020,56.33,13.0,28.225177,13.05
Malawi,2020,74.33,28.67,19.377061,12.18
Mauritius,2020,100.0,96.0,1.26574,11.4
Mayotte,2020,96.0,100.0,,
Mozambique,2020,66.67,40.33,31.178239,14.03
Rwanda,2020,66.33,64.0,13.146362,10.18


### Task 2

In [48]:
%%sql

SELECT 
	Country_name,
	Time_period,
	Pct_managed_drinking_water_services,
	Pct_managed_sanitation_services,
	Est_gdp_in_billions,
    region
FROM 
	united_nations.Access_to_Basic_Services
WHERE
	region = 'Sub-Saharan Africa'
AND 
	Time_period = 2020
AND 
	Est_gdp_in_billions IS NULL;

# Use LIMIT if you think the results set will be large



 * mysql+pymysql://root:***@localhost:3306/united_nations
9 rows affected.
0 rows affected.


[]

### Task 3

In [49]:
%%sql

SELECT 
	Country_name,
	Time_period,
	Pct_managed_drinking_water_services,
	Pct_managed_sanitation_services,
	Est_gdp_in_billions,
    	region
FROM 
	united_nations.Access_to_Basic_Services
WHERE
	region = 'Sub-Saharan Africa'
AND 
	Time_period = 2020
AND 
	Est_gdp_in_billions IS NOT NULL;

# Use LIMIT if you think the results set will be large
# Use ORDER BY Est_gdp_in_billions to order your results

 * mysql+pymysql://root:***@localhost:3306/united_nations
38 rows affected.
0 rows affected.


[]

### Task 4

Arrange the data in descending order. Comparing Nigeria to South Africa.  We see that Nigeria has a lower percentage in the availability of both water and sanitation services. 


In [55]:
%%sql

SELECT 
	Country_name,
	Time_period,
	Pct_managed_drinking_water_services,
	Pct_managed_sanitation_services,
	Est_population_in_millions,
	Est_gdp_in_billions
FROM 
	united_nations.Access_to_Basic_Services

WHERE
	Region = 'Sub-Saharan Africa'
AND 
Time_period = 2020

AND 
	Est_gdp_in_billions IS NOT NULL

AND 
	Country_name  NOT IN ('Nigeria','South Africa','Ethiopia','Kenya','Ghana');


# Use LIMIT if you think the results set will be large
# Use ORDER BY Est_gdp_in_billions to order your results


 * mysql+pymysql://root:***@localhost:3306/united_nations
33 rows affected.
0 rows affected.


[]

### Task 5

Looking at the first entry without ordering any data it is interesting to note that Burundi, with a GDP of only 2.65 Billion has a similar percentage of managed services compared to Nigeria seen in our previous  query(77% and 42%). Nigeria had a percentage of 77 for managed water and 42 for managed sanitation. Arranging the percentage managed drinking water services in descending order.Mauritius with a GDP lower than that of Botswana has better drinking water and sanitation services. 

In [54]:
%%sql

SELECT 
	Country_name,
	Time_period,
	Pct_managed_drinking_water_services,
	Pct_managed_sanitation_services,
	Est_population_in_millions,
	Est_gdp_in_billions
FROM 
	united_nations.Access_to_Basic_Services

WHERE
	Region = 'Sub-Saharan Africa'
AND 
Time_period = 2020

AND 
	Est_gdp_in_billions IS NOT NULL

AND 
	Country_name in ('Nigeria','South Africa','Ethiopia','Kenya','Ghana');

# Use LIMIT if you think the results set will be large
# Use ORDER BY Est_gdp_in_billions to order your results by GDP
# Use ORDER BY Pct_managed_drinking_water_services to order percentage managed drinking water services
# Use ORDER BY Pct_managed_sanitation_services to order your results by percentage managed sanitation services


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.
0 rows affected.


[]

## Summary

In this exercise we used the IS NULL statement to determine if there were any null values in the GDP column. 
We then used the IS NOT NULL statement to exclude those nulls.
We looked at the top 5 GDP's in Sub-Saharan Africa by using the IS IN statement. 
We had a look at the rest of the Sub-Saharan African countries by excluding the top 5 GDP's using the IS NOT IN statement
Based on the quick perusal of the data in Sub-Saharan africa, we can conclude that there isn’t
any noticeable correlation between GDP and the availability of drinking water and sanitation services. 


#  

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>