# Primary keys: Create Geographic_Location table
In this notebook, we demonstrate how to divide a larger dataset into smaller tables and link them using relationships.

In [1]:
!pip install pymysql
!pip install jupysql 
!pip install sqlalchemy 



# Connecting to our MySQL database
Using our Access_to_Basic_Services table created in MySQL Workbench, we are interested in creating a table that contains only the geographic location data for each country. We can run the same queries in MySQL Workbench and in this notebook, provided the notebook is connected to our MySQL server. Since the data lives in a MySQL database, we connect to it with SQLAlchemy and the pymysql driver.



In [2]:
%load_ext sql 

In [5]:

# Load JupySQL extension
%load_ext sql

# Create SQLAlchemy engine
from sqlalchemy import create_engine

engine = create_engine("mysql+pymysql://root:Olisebinum@localhost:3306/united_nations")

# Connect JupySQL to the engine
%sql engine

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [6]:
%%sql

SELECT
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98.0,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98.0,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98.0,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98.0,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98.0,18.513673,181.67,2699700.0,4.8


## Exercise
We want to do the following:

Create a table named Geographic_Location.
Extract the relevant columns from the Access_to_Basic_Services table into the Geographic_Location table.

## 1. Create a table named Geographic_Location
Create a table named Geographic_Location with the columns Country_name, Sub_region, Region, and Land_area, with Country_name being the primary key.

In [None]:
%%sql

CREATE TABLE united_nations.Geographic_Location (
    Country_name VARCHAR(37) NOT NULL PRIMARY KEY,
    Sub_region VARCHAR(25),
    Region VARCHAR(32),
    Land_area DECIMAL(10,2)
);

    

In [10]:
%%sql 

SHOW TABLES ; 

Tables_in_united_nations
Access_to_Basic_Services
Geographic_Location
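
The second part of the exercise, extracting the relevant columns into Geographic_Location, can be sketched as an `INSERT ... SELECT DISTINCT`. The example below is a minimal, self-contained illustration that uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for the MySQL server (an assumption made so it runs without a server); the source table is trimmed to the columns needed here and holds two of the Kazakhstan rows from the preview above.

```python
import sqlite3

# In-memory SQLite stands in for the MySQL server (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Access_to_Basic_Services (
    Region TEXT, Sub_region TEXT, Country_name TEXT,
    Time_period INTEGER, Land_area REAL
);
CREATE TABLE Geographic_Location (
    Country_name TEXT NOT NULL PRIMARY KEY,
    Sub_region TEXT,
    Region TEXT,
    Land_area REAL
);
""")

# Two yearly rows for Kazakhstan, taken from the preview of the source table.
rows = [
    ("Central and Southern Asia", "Central Asia", "Kazakhstan", 2015, 2699700.0),
    ("Central and Southern Asia", "Central Asia", "Kazakhstan", 2016, 2699700.0),
]
conn.executemany("INSERT INTO Access_to_Basic_Services VALUES (?, ?, ?, ?, ?)", rows)

# DISTINCT collapses the yearly repeats so each country is inserted once;
# without it, the second Kazakhstan row would violate the primary key.
conn.execute("""
INSERT INTO Geographic_Location (Country_name, Sub_region, Region, Land_area)
SELECT DISTINCT Country_name, Sub_region, Region, Land_area
FROM Access_to_Basic_Services
""")

geo_rows = conn.execute("SELECT * FROM Geographic_Location").fetchall()
print(geo_rows)
```

This only works when the geographic columns are constant for a given country across years; if, say, Land_area changed between years, DISTINCT would still yield two rows for that country and the primary key would reject the insert.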


## Exercise
We want to do the following:

Create a table that contains data about access to basic services for each country and year.
Create a table that contains the economic indicators for each country and each year.

In [11]:
%%sql

CREATE TABLE united_nations.Basic_Services (
  Country_name VARCHAR(37),
  Time_period INTEGER,
  Pct_managed_drinking_water_services NUMERIC(5,2),
  Pct_managed_sanitation_services NUMERIC(5,2),
  PRIMARY KEY (Country_name, Time_period),
  FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
);
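
To see what the composite primary key and the foreign key in this definition enforce, here is a small sketch. It uses Python's built-in `sqlite3` module in place of MySQL (an assumption, so the example is runnable anywhere), a trimmed version of the table, and a made-up country name, `Atlantis`, that is deliberately absent from the parent table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Geographic_Location (Country_name TEXT NOT NULL PRIMARY KEY);
CREATE TABLE Basic_Services (
    Country_name TEXT,
    Time_period INTEGER,
    Pct_managed_drinking_water_services NUMERIC,
    PRIMARY KEY (Country_name, Time_period),
    FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
);
INSERT INTO Geographic_Location VALUES ('Afghanistan');
INSERT INTO Basic_Services VALUES ('Afghanistan', 2015, 67.0);
INSERT INTO Basic_Services VALUES ('Afghanistan', 2016, 69.67);
""")

def try_insert(row):
    """Return 'ok', or the constraint error message if the insert is rejected."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Basic_Services VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)

dup = try_insert(("Afghanistan", 2015, 70.0))        # repeats an existing (country, year) pair
orphan = try_insert(("Atlantis", 2015, 50.0))        # hypothetical country, not in parent table
new_year = try_insert(("Afghanistan", 2017, 72.33))  # new year for a known country

print(dup)
print(orphan)
print(new_year)
```

The same country may appear many times and the same year may appear many times; only the pair must be unique, which is why a single-column key would not work for yearly data.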

In [13]:
%%sql 

INSERT INTO united_nations.Basic_Services(Country_name,
    Time_period, Pct_managed_drinking_water_services,
    Pct_managed_sanitation_services)
SELECT 
    Country_name,
    Time_period,
    Pct_managed_drinking_water_services,
    Pct_managed_sanitation_services

FROM 
    united_nations.Access_to_Basic_Services; 

## Checking our Newly Created Table


In [14]:
%%sql 

SELECT 
    *
FROM
    united_nations.Basic_Services;

Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services
Afghanistan,2015,67.0,45.67
Afghanistan,2016,69.67,47.0
Afghanistan,2017,72.33,49.33
Afghanistan,2018,75.33,50.67
Afghanistan,2019,78.0,52.33
Afghanistan,2020,80.33,54.0
Algeria,2015,92.0,85.0
Algeria,2016,93.0,85.33
Algeria,2017,93.0,84.67
Algeria,2018,93.0,84.67


## 2. Create a table that contains the economic indicators
Create a table named Economic_Indicators with the columns Country_name, Time_period, Est_gdp_in_billions, Est_population_in_millions, and Pct_unemployment. Country_name and Time_period together form the composite primary key, and Country_name is also a foreign key that references Geographic_Location.

In [21]:
%%sql 


CREATE TABLE united_nations.Economic_Indicators (
  Country_name VARCHAR(37),
  Time_period INTEGER,
  Est_gdp_in_billions NUMERIC(8,2),
  Est_population_in_millions NUMERIC(11,6),
  Pct_unemployment NUMERIC(5,2),
  PRIMARY KEY (Country_name, Time_period),
  FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
);



In [22]:
%%sql

INSERT INTO united_nations.Economic_Indicators(
    Country_name,  
    Time_period,
    Est_gdp_in_billions,
    Est_population_in_millions,
    Pct_unemployment
    )
SELECT 
    Country_name,  
    Time_period,
    Est_gdp_in_billions,
    Est_population_in_millions,
    Pct_unemployment
FROM 
    united_nations.Access_to_Basic_Services; 

## Checking our Newly Created Table

In [23]:
%%sql

SELECT 
    * 
FROM
    Economic_Indicators; 

Country_name,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Afghanistan,2015,20.0,33.753499,
Afghanistan,2016,18.02,34.636207,
Afghanistan,2017,18.9,35.643418,11.18
Afghanistan,2018,18.42,36.686784,
Afghanistan,2019,18.9,37.769499,
Afghanistan,2020,20.14,38.97223,11.71
Algeria,2015,165.98,39.543154,11.21
Algeria,2016,160.03,40.339329,10.2
Algeria,2017,170.1,41.136546,12.0
Algeria,2018,174.91,41.927007,


## Exercise
Suppose we no longer have use for our newly created tables (Basic_Services, Geographic_Location, and Economic_Indicators) and want to drop them from our database.

## 1. Drop the Geographic_Location, Basic_Services, and Economic_Indicators tables
In the following three cells, write the queries that will drop all of these tables from our database in the correct order.

Because the Geographic_Location table's primary key is referenced as a foreign key by the other two tables, the foreign key constraint prevents us from dropping it first. This makes sense: if the table were dropped, the Economic_Indicators and Basic_Services tables would no longer have a parent table against which to validate their country names. A foreign key must always reference a primary key (or another unique key) in its parent table.

Therefore, we must either drop the tables that reference Geographic_Location first or remove their foreign key constraints before we can drop it.

Since we are dropping all of the tables anyway, we drop the referencing tables first.

We start by dropping the Economic_Indicators and Basic_Services tables, which can be dropped in either order, and finally drop the Geographic_Location table.
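
The drop order described above can be sketched as follows. This is an illustration using Python's built-in `sqlite3` module rather than MySQL (an assumption so that it runs without a server), with the tables reduced to their key columns. The exact rule differs by engine: SQLite objects when existing child rows would be orphaned, so the sketch inserts one row per child table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Geographic_Location (Country_name TEXT NOT NULL PRIMARY KEY);
CREATE TABLE Basic_Services (
    Country_name TEXT, Time_period INTEGER,
    PRIMARY KEY (Country_name, Time_period),
    FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
);
CREATE TABLE Economic_Indicators (
    Country_name TEXT, Time_period INTEGER,
    PRIMARY KEY (Country_name, Time_period),
    FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
);
INSERT INTO Geographic_Location VALUES ('Algeria');
INSERT INTO Basic_Services VALUES ('Algeria', 2015);
INSERT INTO Economic_Indicators VALUES ('Algeria', 2015);
""")

# Attempt 1: drop the parent table while child rows still reference it.
try:
    conn.execute("DROP TABLE Geographic_Location")
    parent_first = "dropped"
except sqlite3.Error as exc:
    parent_first = str(exc)
print("Parent first:", parent_first)

# Attempt 2: drop the referencing tables first, then the parent.
for table in ("Economic_Indicators", "Basic_Services", "Geographic_Location"):
    conn.execute(f"DROP TABLE {table}")

remaining = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print("Remaining tables:", remaining)
```

In the notebook itself, the equivalent would be three `DROP TABLE` statements in `%%sql` cells issued in the same child-before-parent order.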

Dropping the tables permanently removes Economic_Indicators, Basic_Services, and Geographic_Location from our database. We need to re-add them to the united_nations database because we will require them in future lessons. Running the code in the cell below will re-add the tables for us.



## Overview
Entity-Relationship diagrams play a valuable role in determining the table relationships and join strategies within a database. They provide the means to make informed decisions about which tables to join and the appropriate method for doing so.

Let’s recall our united_nations ERD, which has three entities: Geographic_Location, Basic_Services, and Economic_Indicators.

In [25]:
%%sql

SELECT * 
FROM
    united_nations.Geographic_Location;

Country_name,Sub_region,Region,Land_area
Afghanistan,Southern Asia,Central and Southern Asia,652230.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0
American Samoa,Polynesia,Oceania,200.0
Angola,Middle Africa,Sub-Saharan Africa,1246700.0
Anguilla,Caribbean,Latin America and the Caribbean,
Antigua and Barbuda,Caribbean,Latin America and the Caribbean,440.0
Argentina,South America,Latin America and the Caribbean,2736690.0
Armenia,Western Asia,Northern Africa and Western Asia,28470.0
Aruba,Caribbean,Latin America and the Caribbean,180.0
Australia,Australia and New Zealand,Oceania,7690400.0


## 1. First LEFT JOIN
Combine the Geographic_Location table with the Economic_Indicators table based on the Country_name column.

In [26]:
%%sql

SELECT * 
FROM
    united_nations.Geographic_Location as geo
LEFT JOIN 
    united_nations.Economic_Indicators as econ
    ON geo.Country_name = econ.Country_name; 


Country_name,Sub_region,Region,Land_area,Country_name_1,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.02,34.636207,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2017,18.9,35.643418,11.18
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2018,18.42,36.686784,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2019,18.9,37.769499,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2020,20.14,38.97223,11.71
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2015,165.98,39.543154,11.21
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2016,160.03,40.339329,10.2
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2017,170.1,41.136546,12.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2018,174.91,41.927007,
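
A LEFT JOIN keeps every row from the left table even when the right table has no match, filling the right-hand columns with NULL. The sketch below contrasts it with an INNER JOIN. It uses Python's built-in `sqlite3` module instead of MySQL and a toy dataset in which only Algeria has economic rows; both choices are assumptions made for illustration, not facts about the real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Geographic_Location (Country_name TEXT PRIMARY KEY, Region TEXT);
CREATE TABLE Economic_Indicators (
    Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions NUMERIC,
    PRIMARY KEY (Country_name, Time_period)
);
INSERT INTO Geographic_Location VALUES
    ('Algeria', 'Northern Africa and Western Asia'),
    ('Aruba', 'Latin America and the Caribbean');
-- Toy assumption: only Algeria has economic rows in this sketch.
INSERT INTO Economic_Indicators VALUES
    ('Algeria', 2015, 165.98),
    ('Algeria', 2016, 160.03);
""")

query = """
SELECT geo.Country_name, econ.Time_period, econ.Est_gdp_in_billions
FROM Geographic_Location AS geo
{join} Economic_Indicators AS econ
    ON geo.Country_name = econ.Country_name
ORDER BY geo.Country_name, econ.Time_period
"""
left_rows = conn.execute(query.format(join="LEFT JOIN")).fetchall()
inner_rows = conn.execute(query.format(join="INNER JOIN")).fetchall()

print(len(left_rows), left_rows[-1])  # the unmatched country survives with NULLs
print(len(inner_rows))                # the unmatched country is dropped
```

Starting from Geographic_Location with a LEFT JOIN therefore guarantees that every country appears in the result at least once, which matters later when missing values are imputed.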


## 2. Second LEFT JOIN
Combine the previously joined tables with the Basic_Services table, again based on the Country_name column.



In [27]:
%%sql

SELECT * 
FROM
    united_nations.Geographic_Location as geo
LEFT JOIN 
    united_nations.Economic_Indicators as econ
    ON geo.Country_name = econ.Country_name
LEFT JOIN
     united_nations.Basic_Services as esv
    ON geo.Country_name = esv.Country_name ; 

Country_name,Sub_region,Region,Land_area,Country_name_1,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment,Country_name_2,Time_period_1,Pct_managed_drinking_water_services,Pct_managed_sanitation_services
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,,Afghanistan,2015,67.0,45.67
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,,Afghanistan,2016,69.67,47.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,,Afghanistan,2017,72.33,49.33
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,,Afghanistan,2018,75.33,50.67
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,,Afghanistan,2019,78.0,52.33
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,,Afghanistan,2020,80.33,54.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.02,34.636207,,Afghanistan,2015,67.0,45.67
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.02,34.636207,,Afghanistan,2016,69.67,47.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.02,34.636207,,Afghanistan,2017,72.33,49.33
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.02,34.636207,,Afghanistan,2018,75.33,50.67


## 3. Refine the second LEFT JOIN
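At first glance, the results of the above query might seem fine, but take a closer look at the two Time_period columns. They don't align as we would expect: because the second join matches on Country_name only, every Economic_Indicators row is paired with every Basic_Services row for the same country, regardless of year.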

Refine the second LEFT JOIN query by adding an additional condition based on the Time_period column.

In [28]:
%%sql

SELECT * 
FROM
    united_nations.Geographic_Location as geo
LEFT JOIN 
    united_nations.Economic_Indicators as econ
    ON geo.Country_name = econ.Country_name
LEFT JOIN
     united_nations.Basic_Services as esv
    ON geo.Country_name = esv.Country_name 
    AND econ.Time_period = esv.Time_period; 

Country_name,Sub_region,Region,Land_area,Country_name_1,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment,Country_name_2,Time_period_1,Pct_managed_drinking_water_services,Pct_managed_sanitation_services
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,33.753499,,Afghanistan,2015,67.0,45.67
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.02,34.636207,,Afghanistan,2016,69.67,47.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2017,18.9,35.643418,11.18,Afghanistan,2017,72.33,49.33
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2018,18.42,36.686784,,Afghanistan,2018,75.33,50.67
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2019,18.9,37.769499,,Afghanistan,2019,78.0,52.33
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2020,20.14,38.97223,11.71,Afghanistan,2020,80.33,54.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2015,165.98,39.543154,11.21,Algeria,2015,92.0,85.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2016,160.03,40.339329,10.2,Algeria,2016,93.0,85.33
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2017,170.1,41.136546,12.0,Algeria,2017,93.0,84.67
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2018,174.91,41.927007,,Algeria,2018,93.0,84.67
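
The effect of the extra join condition is easiest to see by counting rows. The following sketch uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption), with the tables reduced to their key columns and one country with the six years 2015 to 2020 shown in the outputs above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Geographic_Location (Country_name TEXT PRIMARY KEY);
CREATE TABLE Economic_Indicators (
    Country_name TEXT, Time_period INTEGER,
    PRIMARY KEY (Country_name, Time_period)
);
CREATE TABLE Basic_Services (
    Country_name TEXT, Time_period INTEGER,
    PRIMARY KEY (Country_name, Time_period)
);
INSERT INTO Geographic_Location VALUES ('Afghanistan');
""")

years = [(year,) for year in range(2015, 2021)]  # 2015..2020 inclusive
for table in ("Economic_Indicators", "Basic_Services"):
    conn.executemany(f"INSERT INTO {table} VALUES ('Afghanistan', ?)", years)

base = """
SELECT COUNT(*)
FROM Geographic_Location AS geo
LEFT JOIN Economic_Indicators AS econ
    ON geo.Country_name = econ.Country_name
LEFT JOIN Basic_Services AS svc
    ON geo.Country_name = svc.Country_name
"""
# Join on country only: each of the 6 economic rows meets all 6 service rows.
country_only = conn.execute(base).fetchone()[0]
# Join on country and year: each economic row meets exactly one service row.
country_and_year = conn.execute(
    base + " AND econ.Time_period = svc.Time_period"
).fetchone()[0]

print(country_only, country_and_year)
```

With six years per country, the country-only join produces 6 × 6 = 36 rows per country, while the refined join produces 6, one per year.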


## Union
In this notebook, we will learn how to merge and consolidate data from multiple tables efficiently by combining query results and creating comprehensive summaries using the UNION operator.

In [30]:
%%sql

SELECT
	loc.Country_name
FROM
	united_nations.Geographic_Location AS loc
WHERE REGION LIKE '%Central and Southern Asia%';

Country_name
Afghanistan
Bangladesh
Bhutan
India
Iran (Islamic Republic of)
Kazakhstan
Kyrgyzstan
Maldives
Nepal
Pakistan


# Exercise


## 1. Join tables
Obtain Time_period and Pct_unemployment columns for each country by combining the Geographic_Location and Economic_Indicators tables based on the Country_name.



In [37]:
%%sql

SELECT 
    loc.Country_name,
    eco.Time_period,
    eco.Pct_unemployment 
FROM 
    united_nations.Geographic_Location as loc
LEFT JOIN
    united_nations.Economic_Indicators as eco
ON 
    loc.Country_name = eco.Country_name
WHERE
    loc.Region LIKE '%Central and Southern Asia%';
    

Country_name,Time_period,Pct_unemployment
Afghanistan,2015,
Afghanistan,2016,
Afghanistan,2017,11.18
Afghanistan,2018,
Afghanistan,2019,
Afghanistan,2020,11.71
Bangladesh,2015,
Bangladesh,2016,4.35
Bangladesh,2017,4.37
Bangladesh,2018,


## 2. Impute NULL values
To make our summary more complete, we'll address missing unemployment rate values. Wherever an unemployment rate is NULL, we'll replace it with the regional value, which is 19.59 for Central and Southern Asia. We will then name the resulting column Pct_unemployment_imputed.

In [39]:
%%sql

SELECT 
    loc.Country_name,
    eco.Time_period,
    IFNULL(eco.Pct_unemployment, 19.59) AS Pct_unemployment_imputed
FROM 
    united_nations.Geographic_Location as loc
LEFT JOIN
    united_nations.Economic_Indicators as eco
ON 
    loc.Country_name = eco.Country_name
WHERE  
    REGION LIKE '%Central and Southern Asia%';

Country_name,Time_period,Pct_unemployment_imputed
Afghanistan,2015,19.59
Afghanistan,2016,19.59
Afghanistan,2017,11.18
Afghanistan,2018,19.59
Afghanistan,2019,19.59
Afghanistan,2020,11.71
Bangladesh,2015,19.59
Bangladesh,2016,4.35
Bangladesh,2017,4.37
Bangladesh,2018,19.59
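
As a quick check of how `IFNULL` behaves, the sketch below runs it on two rows taken from the Afghanistan output above, one with a missing rate and one with a recorded rate. It uses Python's built-in `sqlite3` module in place of MySQL (an assumption; both engines provide `IFNULL` and the standard `COALESCE`), and the `via_coalesce` column name is made up for the comparison.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Economic_Indicators (
    Country_name TEXT, Time_period INTEGER, Pct_unemployment NUMERIC
);
INSERT INTO Economic_Indicators VALUES
    ('Afghanistan', 2016, NULL),
    ('Afghanistan', 2017, 11.18);
""")

imputed = conn.execute("""
SELECT Country_name,
       Time_period,
       IFNULL(Pct_unemployment, 19.59)   AS Pct_unemployment_imputed,
       COALESCE(Pct_unemployment, 19.59) AS via_coalesce  -- hypothetical column name
FROM Economic_Indicators
ORDER BY Time_period
""").fetchall()

for row in imputed:
    print(row)
```

Only the NULL is replaced; a recorded value such as 11.18 passes through unchanged. `COALESCE` gives the same result and also accepts more than two arguments, which can be handy when there are several fallback values.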


## 3. Repeat for other regions
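We can now repeat this process for the other regions, each with its own fill value, and use the UNION operator to combine the SELECT statements into a single result set. Every SELECT must return the same number of columns in the same order. UNION also removes duplicate rows; since each country belongs to a single region, the branches here should not overlap.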

In [40]:
%%sql

SELECT
	loc.Country_name,
	eco.Time_period,
	IFNULL(eco.Pct_unemployment, 19.59) as Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location as loc
LEFT JOIN
	united_nations.Economic_Indicators as eco
	ON eco.Country_name = loc.Country_name
WHERE REGION LIKE '%Central and Southern Asia%'

UNION

SELECT
	loc.Country_name,
	eco.Time_period,
	IFNULL(eco.Pct_unemployment, 22.64) as Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location as loc
LEFT JOIN
	united_nations.Economic_Indicators as eco
	ON eco.Country_name = loc.Country_name
WHERE REGION LIKE '%Eastern and South-Eastern Asia%'

UNION

SELECT
	loc.Country_name,
	eco.Time_period,
	IFNULL(eco.Pct_unemployment, 24.43) as Pct_unemployment_imputed
FROM united_nations.Geographic_Location as loc
LEFT JOIN
	united_nations.Economic_Indicators as eco
	ON eco.Country_name = loc.Country_name
WHERE REGION LIKE '%Europe and Northern America%'

UNION

SELECT
	loc.Country_name,
	eco.Time_period,
	IFNULL(eco.Pct_unemployment, 24.23) as Pct_unemployment_imputed
FROM united_nations.Geographic_Location as loc
LEFT JOIN
	united_nations.Economic_Indicators as eco
	ON eco.Country_name = loc.Country_name
WHERE REGION LIKE '%Latin America and the Caribbean%'

UNION

SELECT
	loc.Country_name,
	eco.Time_period,
	IFNULL(eco.Pct_unemployment, 17.84) as Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location as loc
LEFT JOIN
	united_nations.Economic_Indicators as eco
	ON eco.Country_name = loc.Country_name
WHERE REGION LIKE '%Northern Africa and Western Asia%'

UNION

SELECT
	loc.Country_name,
	eco.Time_period,
	IFNULL(eco.Pct_unemployment, 4.98) as Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location as loc
LEFT JOIN
	united_nations.Economic_Indicators as eco
	ON eco.Country_name = loc.Country_name
WHERE REGION LIKE '%Oceania%'

UNION

SELECT
	loc.Country_name,
	eco.Time_period,
	IFNULL(eco.Pct_unemployment, 33.65) as Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location as loc
LEFT JOIN united_nations.Economic_Indicators as eco
	ON eco.Country_name = loc.Country_name
WHERE REGION LIKE '%Sub-Saharan Africa%';

Country_name,Time_period,Pct_unemployment_imputed
Afghanistan,2015,19.59
Afghanistan,2016,19.59
Afghanistan,2017,11.18
Afghanistan,2018,19.59
Afghanistan,2019,19.59
Afghanistan,2020,11.71
Bangladesh,2015,19.59
Bangladesh,2016,4.35
Bangladesh,2017,4.37
Bangladesh,2018,19.59
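
The long UNION query repeats one SELECT with only the fill value and the region changing. As a minimal sketch of that pattern, the example below builds the same shape of query from a small mapping of region to fill value. It uses Python's built-in `sqlite3` module in place of MySQL and a toy dataset; the Australia row, the `fill_values` name, and the exact-match `WHERE` clause are assumptions for illustration, while the fill values 19.59 and 4.98 come from the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Geographic_Location (Country_name TEXT PRIMARY KEY, Region TEXT);
CREATE TABLE Economic_Indicators (
    Country_name TEXT, Time_period INTEGER, Pct_unemployment NUMERIC
);
INSERT INTO Geographic_Location VALUES
    ('Afghanistan', 'Central and Southern Asia'),
    ('Australia', 'Oceania');
-- Toy rows; the Australia row is invented for this sketch.
INSERT INTO Economic_Indicators VALUES
    ('Afghanistan', 2015, NULL),
    ('Afghanistan', 2017, 11.18),
    ('Australia', 2015, NULL);
""")

# Region -> fill value, using two of the values from the notebook query.
fill_values = {"Central and Southern Asia": 19.59, "Oceania": 4.98}

branch = """
SELECT loc.Country_name,
       eco.Time_period,
       IFNULL(eco.Pct_unemployment, ?) AS Pct_unemployment_imputed
FROM Geographic_Location AS loc
LEFT JOIN Economic_Indicators AS eco
    ON eco.Country_name = loc.Country_name
WHERE loc.Region = ?
"""
# One branch per region, glued together with UNION and sorted at the end.
sql = "\nUNION\n".join([branch] * len(fill_values)) + "\nORDER BY 1, 2"
params = [value for region, fill in fill_values.items() for value in (fill, region)]

union_rows = conn.execute(sql, params).fetchall()
for row in union_rows:
    print(row)
```

Each NULL is filled with the value for its own region, and the recorded 11.18 is left untouched. Because the branches do not overlap, `UNION ALL` would return the same rows here while skipping the duplicate-removal step.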
