## **Setting Up**

In [1]:
%%capture
!pip install ipython-sql==0.4.1
!pip install prettytable==0.7.2
!pip install SQLAlchemy==1.4.49
!pip install cryptography
!pip install "PyMySQL[rsa]"

In [2]:
# Load the ipython-sql extension so that the %sql and %%sql magics are available
%load_ext sql

In [3]:
import urllib.parse

USERNAME = "root"
PASSWORD = "password-y/@N"
HOST = "localhost"
PORT = 3306
DATABASE = "united_nations"

# URL-encode the password: it contains "/" and "@", which are reserved characters in a URL
connection_string = f"mysql+pymysql://{USERNAME}:{urllib.parse.quote_plus(PASSWORD)}@{HOST}:{PORT}/{DATABASE}"
%sql $connection_string
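
The password in the cell above contains `/` and `@`, both of which have structural meaning inside a URL, which is why it is passed through `urllib.parse.quote_plus` before being placed in the connection string. A minimal sketch of what that encoding does, using the same placeholder password:

```python
import urllib.parse

# Same placeholder password as in the cell above.
password = "password-y/@N"

# quote_plus percent-encodes characters that are reserved in URLs,
# so "/" becomes %2F and "@" becomes %40.
encoded = urllib.parse.quote_plus(password)
print(encoded)

# Decoding recovers the original string unchanged.
decoded = urllib.parse.unquote_plus(encoded)
print(decoded == password)
```

Without the encoding, the `@` inside the password could be mistaken for the separator between the credentials and the host.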

In [4]:
%%sql

SELECT * 
FROM 
    united_nations.access_to_basic_services 
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98.0,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98.0,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98.0,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98.0,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98.0,18.513673,181.67,2699700.0,4.8


In [5]:
%%sql

SHOW TABLES;

 * mysql+pymysql://root:***@localhost:3306/united_nations
2 rows affected.


Tables_in_united_nations
access_to_basic_services
country_list


In [6]:
%%sql
/*SHOW COLUMNS FROM access_to_basic_services;*/
DESCRIBE access_to_basic_services; 

 * mysql+pymysql://root:***@localhost:3306/united_nations
10 rows affected.


Field,Type,Null,Key,Default,Extra
Region,varchar(32),YES,,,
Sub_region,varchar(25),YES,,,
Country_name,varchar(37),NO,,,
Time_period,int,NO,,,
Pct_managed_drinking_water_services,"decimal(5,2)",YES,,,
Pct_managed_sanitation_services,"decimal(5,2)",YES,,,
Est_population_in_millions,"decimal(11,6)",YES,,,
Est_gdp_in_billions,"decimal(8,2)",YES,,,
Land_area,"decimal(10,2)",YES,,,
Pct_unemployment,"decimal(5,2)",YES,,,


## **8.5 Initial data analysis with numeric functions** 

### **1. What is the total number of entries in the dataset?**
Count the number of entries in the dataset using the COUNT function. Return the result with an alias.

In [7]:
%%sql

SELECT 
    COUNT(*) AS Number_of_observations
FROM 
    united_nations.access_to_basic_services;

 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


Number_of_observations
1048


### **2. What are the earliest and latest years for which we have data?**
Determine the earliest and latest years by calculating the minimum and maximum of the Time_period column using the MIN and MAX functions respectively. Use aliases to name the results.

In [8]:
%%sql

SELECT 
    MIN(Time_period) AS Min_time_period,
    MAX(Time_period) AS Max_time_period
FROM
    access_to_basic_services;

 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


Min_time_period,Max_time_period
2015,2020


### **3. How many countries are included in this dataset?**
Count the number of countries in the Country_name column. On its own, COUNT(Country_name) returns the number of non-NULL values in the column, duplicates included, so a country that appears once per year would be counted several times. Add the DISTINCT keyword to count each country name only once. Return the result with an alias.

In [9]:
%%sql

SELECT 
    COUNT(DISTINCT Country_name) AS Number_of_countries
FROM
    access_to_basic_services;

 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


Number_of_countries
182
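
The difference between the three forms of COUNT can be sketched on a tiny table. This is only an illustration: an in-memory SQLite database stands in for the MySQL server, the `services` table name is made up, and one row with a missing country name is added purely to show how NULL is handled (the real Country_name column does not allow NULL).

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (an assumption for
# illustration); COUNT behaves the same way in both for this case.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE services (Country_name TEXT, Time_period INTEGER)")

# The five Kazakhstan rows shown earlier (2015-2019), plus one hypothetical
# row with a missing name to show how NULL values are treated.
rows = [("Kazakhstan", year) for year in range(2015, 2020)] + [(None, 2015)]
conn.executemany("INSERT INTO services VALUES (?, ?)", rows)

count_all, count_col, count_distinct = conn.execute(
    """
    SELECT COUNT(*),                      -- every row
           COUNT(Country_name),           -- rows where the name is not NULL
           COUNT(DISTINCT Country_name)   -- unique non-NULL names
    FROM services
    """
).fetchone()

print(count_all, count_col, count_distinct)
conn.close()
```

Here the three counts are 6, 5 and 1: six rows in total, five with a name, and a single distinct country.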


### **4. What is the average percentage of people who have access to managed drinking water services across all years and all countries included in our dataset?**
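Use the AVG function to calculate the average of the Pct_managed_drinking_water_services column. AVG ignores NULL values and gives every country-year row equal weight, so the result is an unweighted mean rather than a population-weighted share. Use an alias.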

In [10]:
%%sql

SELECT 
    AVG(Pct_managed_drinking_water_services) AS Avg_managed_drinking_water_services
FROM
    access_to_basic_services;

 * mysql+pymysql://root:***@localhost:3306/united_nations
1 rows affected.


Avg_managed_drinking_water_services
87.189103
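
Because AVG treats every row equally, the figure above is not the same as the share of all people with access. A rough sketch of the difference, using only the five Kazakhstan rows shown earlier (so the numbers are illustrative, not a result for the whole dataset):

```python
# (pct with managed drinking water, population in millions) for
# Kazakhstan, 2015-2019, copied from the sample rows shown earlier.
rows = [
    (94.67, 17.542806),
    (94.67, 17.794055),
    (95.00, 18.037776),
    (95.00, 18.276452),
    (95.00, 18.513673),
]

# Unweighted mean: every row counts equally, which is what AVG() computes.
simple_mean = sum(pct for pct, _ in rows) / len(rows)

# Population-weighted mean: rows with more people count for more.
total_population = sum(pop for _, pop in rows)
weighted_mean = sum(pct * pop for pct, pop in rows) / total_population

print(f"unweighted: {simple_mean:.4f}")
print(f"weighted:   {weighted_mean:.4f}")
```

For a single country the two values are nearly identical; across countries with very different populations the gap can be much larger. In SQL, a weighted version would take the form `SUM(pct * population) / SUM(population)`.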


## **8.7 Transform Columns**

### **1. What is the GDP per year for each country?**
Retrieve the GDP per year for each country by selecting the Country_name, Time_period, and Est_gdp_in_billions columns.

In [11]:
%%sql

SELECT 
    Country_name,
    Time_period,
    Est_gdp_in_billions
FROM 
	access_to_basic_services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Time_period,Est_gdp_in_billions
Kazakhstan,2015,184.39
Kazakhstan,2016,137.28
Kazakhstan,2017,166.81
Kazakhstan,2018,179.34
Kazakhstan,2019,181.67


### **2. What are the rounded-off values of the Est_gdp_in_billions column?**
When looking at many billion-dollar figures, the decimal places can be a little distracting. We can round off the numbers in the Est_gdp_in_billions column to make them more manageable.

Using the same query executed above, round off the values in the Est_gdp_in_billions column using the ROUND function. Called with a single argument, ROUND rounds to the nearest whole number.

In [12]:
%%sql

SELECT 
    Country_name,
    Time_period,
    ROUND(Est_gdp_in_billions) AS Rounded_est_gdp_in_billions
FROM 
	access_to_basic_services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Time_period,Rounded_est_gdp_in_billions
Kazakhstan,2015,184
Kazakhstan,2016,137
Kazakhstan,2017,167
Kazakhstan,2018,179
Kazakhstan,2019,182
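
For exact DECIMAL values such as this column, MySQL's ROUND rounds halves away from zero. The sketch below reproduces the rounded values above in Python with the `decimal` module and contrasts that rule with Python's built-in `round`, which rounds halves to the nearest even number. It illustrates the rounding rule only and does not query the database.

```python
from decimal import Decimal, ROUND_HALF_UP

# GDP values for Kazakhstan from the output above, kept as exact decimals
# to mirror the decimal(8,2) column type.
gdp = [Decimal("184.39"), Decimal("137.28"), Decimal("166.81"),
       Decimal("179.34"), Decimal("181.67")]

# Round to whole billions; ROUND_HALF_UP sends ties away from zero.
rounded = [int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP)) for value in gdp]
print(rounded)

# An exact half shows where the two rules disagree.
half_away = int(Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP))
half_even = round(2.5)
print(half_away, half_even)
```

None of the five GDP values is an exact half, so both rules give the same result for them; the difference only matters for ties.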


### **3. What is the logarithm of GDP for each country over the time period?**
To analyse how GDP grows over the time period, we can work with the logarithm of GDP. A logarithmic transformation compresses large values into a more digestible range and captures proportional rather than absolute changes: the difference between the logarithms of two values depends only on their ratio, which is often more meaningful when analysing economic growth rates.

Calculate the logarithm of the Est_gdp_in_billions column using the LOG function. With a single argument, MySQL's LOG returns the natural logarithm.

In [13]:
%%sql

SELECT 
    Country_name,
    Time_period,
    LOG(Est_gdp_in_billions) AS Log_est_gdp_in_billions
FROM 
	access_to_basic_services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Time_period,Log_est_gdp_in_billions
Kazakhstan,2015,5.217053079717073
Kazakhstan,2016,4.922022635739652
Kazakhstan,2017,5.116855440165964
Kazakhstan,2018,5.189283445523902
Kazakhstan,2019,5.202191854450653
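
To see why logarithms are useful for growth rates, the sketch below takes the Kazakhstan GDP values from the output above and compares the year-on-year difference in log GDP with the actual proportional change. It runs in plain Python rather than SQL; `math.log` is the natural logarithm, matching single-argument LOG in MySQL.

```python
import math

# Est_gdp_in_billions for Kazakhstan, 2015-2019, from the output above.
gdp = {2015: 184.39, 2016: 137.28, 2017: 166.81, 2018: 179.34, 2019: 181.67}

years = sorted(gdp)
changes = []
for prev, curr in zip(years, years[1:]):
    # Difference of natural logs between consecutive years.
    log_diff = math.log(gdp[curr]) - math.log(gdp[prev])
    # Actual proportional change over the same period.
    pct_change = gdp[curr] / gdp[prev] - 1
    changes.append((prev, curr, log_diff, pct_change))
    print(f"{prev}->{curr}: log difference {log_diff:+.4f}, actual change {pct_change:+.4f}")
```

For small changes the log difference is close to the proportional change, and in every case `exp(log_diff) - 1` recovers the proportional change exactly, up to floating-point error.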


### **4. What is the square root of GDP for each country over the time period?**
Alternatively, we could use the SQRT function to calculate the square roots of the values in the Est_gdp_in_billions column. This also gives a smaller representation of these values, although, unlike the logarithm, differences between square roots do not correspond to proportional changes.

Calculate the square root of the Est_gdp_in_billions column using the SQRT function.

In [14]:
%%sql

SELECT 
    Country_name,
    Time_period,
    SQRT(Est_gdp_in_billions) AS Sqrt_est_gdp_in_billions
FROM 
	access_to_basic_services
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Time_period,Sqrt_est_gdp_in_billions
Kazakhstan,2015,13.579027947537334
Kazakhstan,2016,11.716654812701448
Kazakhstan,2017,12.915494570476191
Kazakhstan,2018,13.391788528796294
Kazakhstan,2019,13.478501400378308


### **5. Combine all into one query**

In [15]:
%%sql

SELECT 
    Country_name,
    Time_period,
    Est_gdp_in_billions,
    ROUND(Est_gdp_in_billions) AS Rounded_est_gdp_in_billions,
    LOG(Est_gdp_in_billions) AS Log_est_gdp_in_billions,
    SQRT(Est_gdp_in_billions) AS Sqrt_est_gdp_in_billions
FROM 
	access_to_basic_services
LIMIT 5;


 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Time_period,Est_gdp_in_billions,Rounded_est_gdp_in_billions,Log_est_gdp_in_billions,Sqrt_est_gdp_in_billions
Kazakhstan,2015,184.39,184,5.217053079717073,13.579027947537334
Kazakhstan,2016,137.28,137,4.922022635739652,11.716654812701448
Kazakhstan,2017,166.81,167,5.116855440165964,12.915494570476191
Kazakhstan,2018,179.34,179,5.189283445523902,13.391788528796294
Kazakhstan,2019,181.67,182,5.202191854450653,13.478501400378308


## **8.10 Create a summary statistic report in SQL**

### **1. What are the minimum, maximum, and average percentages of people who have access to managed drinking water services per region and sub_region?**

In [18]:
%%sql

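SELECT 
    Region,
    Sub_region,
    MIN(Pct_managed_drinking_water_services) AS Min_Pct_managed_drinking_water_services,
    MAX(Pct_managed_drinking_water_services) AS Max_Pct_managed_drinking_water_services,
    AVG(Pct_managed_drinking_water_services) AS Avg_Pct_managed_drinking_water_services
FROM 
    access_to_basic_services
GROUP BY Region, Sub_region;

 * mysql+pymysql://root:***@localhost:3306/united_nations
18 rows affected.


Region,Sub_region,Min_Pct_managed_drinking_water_services,Max_Pct_managed_drinking_water_services,Avg_Pct_managed_drinking_water_services
Central and Southern Asia,Central Asia,80.33,100.0,93.144667
Central and Southern Asia,Southern Asia,67.0,99.67,91.894074
Eastern and South-Eastern Asia,Eastern Asia,75.67,100.0,92.699667
Eastern and South-Eastern Asia,South-Eastern Asia,73.33,100.0,90.626061
Europe and Northern America,Northern America,91.0,100.0,97.911333
Latin America and the Caribbean,Caribbean,64.0,100.0,96.005
Latin America and the Caribbean,Central America,79.0,100.0,93.798125
Latin America and the Caribbean,South America,86.0,100.0,94.880952
Northern Africa and Western Asia,Northern Africa,61.33,100.0,88.906111
Northern Africa and Western Asia,Western Asia,59.0,100.0,95.031204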


### **2. What is the total number of countries within each region and sub_region?**

In [22]:
%%sql

SELECT 
    Region,
    Sub_region,
    COUNT(DISTINCT Country_name) AS Number_of_countries
FROM 
	access_to_basic_services
GROUP BY Region, Sub_region;

 * mysql+pymysql://root:***@localhost:3306/united_nations
18 rows affected.


Region,Sub_region,Number_of_countries
Central and Southern Asia,Central Asia,5
Central and Southern Asia,Southern Asia,9
Eastern and South-Eastern Asia,Eastern Asia,5
Eastern and South-Eastern Asia,South-Eastern Asia,11
Europe and Northern America,Northern America,5
Latin America and the Caribbean,Caribbean,27
Latin America and the Caribbean,Central America,8
Latin America and the Caribbean,South America,14
Northern Africa and Western Asia,Northern Africa,6
Northern Africa and Western Asia,Western Asia,18


### **3. What is the total GDP for each region and sub_region?**

In [23]:
%%sql

SELECT 
    Region,
    Sub_region,
    SUM(Est_gdp_in_billions) AS Est_total_gdp_in_billions
FROM 
	access_to_basic_services
GROUP BY Region, Sub_region;

 * mysql+pymysql://root:***@localhost:3306/united_nations
18 rows affected.


Region,Sub_region,Est_total_gdp_in_billions
Central and Southern Asia,Central Asia,1670.32
Central and Southern Asia,Southern Asia,19824.66
Eastern and South-Eastern Asia,Eastern Asia,107123.37
Eastern and South-Eastern Asia,South-Eastern Asia,15563.18
Europe and Northern America,Northern America,9905.96
Latin America and the Caribbean,Caribbean,2070.17
Latin America and the Caribbean,Central America,8524.66
Latin America and the Caribbean,South America,19959.58
Northern Africa and Western Asia,Northern Africa,2736.8
Northern Africa and Western Asia,Western Asia,13605.83


### **4. Combine all into one query**

In [24]:
%%sql

SELECT Region,
    Sub_region,
    MIN(Pct_managed_drinking_water_services) AS min_Pct_managed_drinking_water_services,
    MAX(Pct_managed_drinking_water_services) AS max_Pct_managed_drinking_water_services,
    AVG(Pct_managed_drinking_water_services) AS avg_Pct_managed_drinking_water_services,
    COUNT(DISTINCT Country_name) AS Number_of_countries,
    SUM(Est_gdp_in_billions) AS EST_total_gdp_in_billions
FROM united_nations.access_to_basic_services
GROUP BY Region, Sub_region;

 * mysql+pymysql://root:***@localhost:3306/united_nations
18 rows affected.


Region,Sub_region,min_Pct_managed_drinking_water_services,max_Pct_managed_drinking_water_services,avg_Pct_managed_drinking_water_services,Number_of_countries,EST_total_gdp_in_billions
Central and Southern Asia,Central Asia,80.33,100.0,93.144667,5,1670.32
Central and Southern Asia,Southern Asia,67.0,99.67,91.894074,9,19824.66
Eastern and South-Eastern Asia,Eastern Asia,75.67,100.0,92.699667,5,107123.37
Eastern and South-Eastern Asia,South-Eastern Asia,73.33,100.0,90.626061,11,15563.18
Europe and Northern America,Northern America,91.0,100.0,97.911333,5,9905.96
Latin America and the Caribbean,Caribbean,64.0,100.0,96.005,27,2070.17
Latin America and the Caribbean,Central America,79.0,100.0,93.798125,8,8524.66
Latin America and the Caribbean,South America,86.0,100.0,94.880952,14,19959.58
Northern Africa and Western Asia,Northern Africa,61.33,100.0,88.906111,6,2736.8
Northern Africa and Western Asia,Western Asia,59.0,100.0,95.031204,18,13605.83
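
GROUP BY collapses all rows that share the same (Region, Sub_region) pair into one output row, and each aggregate function is evaluated separately within that group. The sketch below illustrates this on a handful of rows. It is an assumption-laden toy: in-memory SQLite stands in for MySQL, the `demo` table and `Pct` column are invented, and every value is made up rather than taken from the dataset.

```python
import sqlite3

# SQLite in memory stands in for MySQL here (an assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE demo (Region TEXT, Sub_region TEXT, Country_name TEXT, Pct REAL)"
)

# Hypothetical rows with made-up values, not taken from the dataset.
conn.executemany(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    [
        ("Region X", "Sub 1", "Country A", 80.0),
        ("Region X", "Sub 1", "Country A", 90.0),
        ("Region X", "Sub 1", "Country B", 100.0),
        ("Region X", "Sub 2", "Country C", 60.0),
    ],
)

# One output row per (Region, Sub_region) pair.
summary = conn.execute(
    """
    SELECT Region, Sub_region,
           MIN(Pct), MAX(Pct), AVG(Pct),
           COUNT(DISTINCT Country_name)
    FROM demo
    GROUP BY Region, Sub_region
    ORDER BY Sub_region
    """
).fetchall()

for row in summary:
    print(row)
conn.close()
```

"Sub 1" has three rows but only two distinct countries, which is why the report uses COUNT(DISTINCT Country_name) rather than COUNT(*) to count countries.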


## **8.12 Filtering and analysing a summary statistic report**

### **1. Filter for the year 2020.**

In [26]:
%%sql

SELECT Region,
    Sub_region,
    MIN(Pct_managed_drinking_water_services) AS Min_Pct_managed_drinking_water_services,
    MAX(Pct_managed_drinking_water_services) AS Max_Pct_managed_drinking_water_services,
    AVG(Pct_managed_drinking_water_services) AS Avg_Pct_managed_drinking_water_services,
    COUNT(DISTINCT Country_name) AS Number_of_countries,
    SUM(Est_gdp_in_billions) AS EST_total_gdp_in_billions
FROM access_to_basic_services
WHERE Time_period = 2020
GROUP BY Region, Sub_region
ORDER BY EST_total_gdp_in_billions ASC;

 * mysql+pymysql://root:***@localhost:3306/united_nations
18 rows affected.


Region,Sub_region,Min_Pct_managed_drinking_water_services,Max_Pct_managed_drinking_water_services,Avg_Pct_managed_drinking_water_services,Number_of_countries,EST_total_gdp_in_billions
Oceania,Micronesia,77.0,100.0,94.5,6,6.67
Oceania,Polynesia,92.0,100.0,98.555556,9,7.84
Oceania,Melanesia,56.67,99.0,82.934,5,40.21
Sub-Saharan Africa,Middle Africa,38.33,77.33,59.3325,8,123.22
Central and Southern Asia,Central Asia,85.0,100.0,94.134,5,239.1
Latin America and the Caribbean,Caribbean,65.0,100.0,95.910667,15,343.26
Sub-Saharan Africa,Eastern Africa,48.33,100.0,70.018824,17,359.1
Sub-Saharan Africa,Southern Africa,76.33,92.0,83.668,5,369.34
Northern Africa and Western Asia,Northern Africa,62.33,100.0,90.053333,6,386.29
Sub-Saharan Africa,Western Africa,53.33,99.0,73.607059,17,631.91


### **2. Focus on countries where the percentage of managed drinking water services is below 60%**

In [27]:
%%sql

SELECT Region,
    Sub_region,
    MIN(Pct_managed_drinking_water_services) AS Min_Pct_managed_drinking_water_services,
    MAX(Pct_managed_drinking_water_services) AS Max_Pct_managed_drinking_water_services,
    AVG(Pct_managed_drinking_water_services) AS Avg_Pct_managed_drinking_water_services,
    COUNT(DISTINCT Country_name) AS Number_of_countries,
    SUM(Est_gdp_in_billions) AS EST_total_gdp_in_billions
FROM access_to_basic_services
WHERE Time_period = 2020
AND Pct_managed_drinking_water_services < 60
GROUP BY Region, Sub_region
ORDER BY EST_total_gdp_in_billions ASC;

 * mysql+pymysql://root:***@localhost:3306/united_nations
4 rows affected.


Region,Sub_region,Min_Pct_managed_drinking_water_services,Max_Pct_managed_drinking_water_services,Avg_Pct_managed_drinking_water_services,Number_of_countries,EST_total_gdp_in_billions
Oceania,Melanesia,56.67,56.67,56.67,1,23.85
Sub-Saharan Africa,Western Africa,53.33,57.33,55.33,2,31.67
Sub-Saharan Africa,Middle Africa,38.33,52.67,47.75,4,66.67
Sub-Saharan Africa,Eastern Africa,48.33,58.0,54.9975,4,127.59


### **3. Filter for the regions and sub-regions that have fewer than four such countries.**

In [29]:
%%sql

SELECT Region,
    Sub_region,
    MIN(Pct_managed_drinking_water_services) AS Min_Pct_managed_drinking_water_services,
    MAX(Pct_managed_drinking_water_services) AS Max_Pct_managed_drinking_water_services,
    AVG(Pct_managed_drinking_water_services) AS Avg_Pct_managed_drinking_water_services,
    COUNT(DISTINCT Country_name) AS Number_of_countries,
    SUM(Est_gdp_in_billions) AS EST_total_gdp_in_billions
FROM access_to_basic_services
WHERE Time_period = 2020
AND Pct_managed_drinking_water_services < 60
GROUP BY Region, Sub_region
HAVING Number_of_countries < 4
ORDER BY EST_total_gdp_in_billions ASC;

 * mysql+pymysql://root:***@localhost:3306/united_nations
2 rows affected.


Region,Sub_region,Min_Pct_managed_drinking_water_services,Max_Pct_managed_drinking_water_services,Avg_Pct_managed_drinking_water_services,Number_of_countries,EST_total_gdp_in_billions
Oceania,Melanesia,56.67,56.67,56.67,1,23.85
Sub-Saharan Africa,Western Africa,53.33,57.33,55.33,2,31.67
