<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/Logo blue_dark.png"  style="width:25px" align="right";/>
</div>

# Union
© ExploreAI Academy

In this notebook, we will learn how to use the `UNION` operator to combine the results of several queries, consolidating data from multiple tables into a single, comprehensive summary.



> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

By the end of this train, you will:
- Understand what the `UNION` operator does and why it is useful for combining the results of multiple queries.
- Know how to use the `UNION` operator to merge the results of queries that return the same number of columns with compatible data types.
- Know how to build a summary by merging the results of multiple `SELECT` statements.


## Overview

Suppose we want a summary of the estimated unemployment rate for each country in each time period, and we expect some of these values to be missing.

To make the summary more complete, we will replace each missing value with the estimated unemployment rate for the country's region, as given in Table 1 below.

|Region|Pct_regional_unemployment|
|---|---|
|Central and Southern Asia|     19.59    |
|Eastern and South-Eastern Asia|  22.64  |
|Europe and Northern America|     24.43   |
|Latin America and the Caribbean|  24.23 |
|Northern Africa and Western Asia| 17.84 |
|Oceania|                          4.98 |
|Sub-Saharan Africa|              33.65  |

*Table 1: Regional estimated unemployment rates.*


In other words, following the flow chart in Figure 1, for every country in the Central and Southern Asia region that has no recorded unemployment rate, we substitute the regional rate, which is `19.59` in this case.

The same procedure applies to the other regions: each missing unemployment rate is replaced with the rate for its own region.

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Union_flow_chart.png"  style="background-color:white;";/>
<br>
<br>
    <em>Figure 1: Union flow chart</em>
</div>
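The substitution rule in Figure 1 is essentially a lookup from region to fallback rate. As a minimal sketch in plain Python (separate from the SQL workflow), assuming each row carries its region and uses `None` for a missing rate, the rule looks like this. The names `REGIONAL_RATES` and `impute_unemployment`, and the country names, are hypothetical placeholders.

```python
# Regional estimated unemployment rates from Table 1 (percent).
REGIONAL_RATES = {
    "Central and Southern Asia": 19.59,
    "Eastern and South-Eastern Asia": 22.64,
    "Europe and Northern America": 24.43,
    "Latin America and the Caribbean": 24.23,
    "Northern Africa and Western Asia": 17.84,
    "Oceania": 4.98,
    "Sub-Saharan Africa": 33.65,
}


def impute_unemployment(region, pct_unemployment):
    """Return the recorded rate, or the regional rate if the value is missing (None)."""
    if pct_unemployment is None:
        return REGIONAL_RATES[region]
    return pct_unemployment


# Hypothetical rows: (country, region, recorded rate or None).
rows = [
    ("Country A", "Central and Southern Asia", None),
    ("Country B", "Central and Southern Asia", 7.5),
    ("Country C", "Oceania", None),
]

imputed = [(country, impute_unemployment(region, pct)) for country, region, pct in rows]
print(imputed)
```

Recorded values pass through unchanged; only missing ones fall back to their region's rate, which is exactly what each `SELECT` in the final SQL query does for one region.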


## Connecting to the MySQL database

We'll start by connecting to the `united_nations` database. To connect to the MySQL server, run the cells below.


In [None]:
# Load and activate the SQL extension so that we can execute SQL in a Jupyter notebook.
# If you get an error here, make sure that the SQL magic extension (e.g. ipython-sql) and pymysql are installed correctly.

%load_ext sql

In [None]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your MySQL root password and, if necessary, 'united_nations' with your database name.
# If you get an error here, please make sure the database name and password are correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations
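
If your password contains characters such as `@`, `:` or `/`, they need to be percent-encoded, because otherwise they are read as delimiters in the connection URL. A minimal sketch using the standard library's `urllib.parse.quote_plus`; the password shown is a hypothetical example.

```python
from urllib.parse import quote_plus

# Hypothetical credentials; substitute your own.
user = "root"
password = "p@ss:word/1"  # contains characters that are special in URLs
host = "localhost"
port = 3306
database = "united_nations"

# quote_plus percent-encodes '@', ':' and '/' so they are not read as URL delimiters.
connection_url = f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"
print(connection_url)
```

The printed string can then be used in place of the literal URL in the `%sql` connection cell.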


We'll be working with two tables: `Geographic_Location`, which contains information about countries and their regions, and `Economic_Indicators`, which holds data on unemployment rates for each country over time.


Let’s fetch a list of countries belonging to the **Central and Southern Asia region**. 

**NOTE:** Each region will eventually get its own `SELECT` statement; Central and Southern Asia is simply the starting point for illustrating the process.


In [None]:
%%sql

SELECT 
	loc.Country_name 
FROM 
	united_nations.Geographic_Location AS loc 
WHERE REGION LIKE '%Central and Southern Asia%';
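
In a `LIKE` pattern, `%` matches any sequence of characters, so the filter keeps every region name that *contains* the text between the wildcards. This is why the full region name is used: a shorter pattern such as `'%Asia%'` would also match two other regions. The sketch below checks this with Python's built-in `sqlite3` module as a stand-in for MySQL (both treat `%` the same way in `LIKE`), using only the region names from Table 1.

```python
import sqlite3

# In-memory table holding the seven region names from Table 1.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Regions (Region TEXT)")
conn.executemany(
    "INSERT INTO Regions (Region) VALUES (?)",
    [
        ("Central and Southern Asia",),
        ("Eastern and South-Eastern Asia",),
        ("Europe and Northern America",),
        ("Latin America and the Caribbean",),
        ("Northern Africa and Western Asia",),
        ("Oceania",),
        ("Sub-Saharan Africa",),
    ],
)


def matching_regions(pattern):
    """Return the region names that match a LIKE pattern, in alphabetical order."""
    cur = conn.execute(
        "SELECT Region FROM Regions WHERE Region LIKE ? ORDER BY Region", (pattern,)
    )
    return [row[0] for row in cur.fetchall()]


broad = matching_regions("%Asia%")
specific = matching_regions("%Central and Southern Asia%")
conn.close()

print(broad)     # three regions contain "Asia"
print(specific)  # only the intended region
```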

## Exercise


### 1. Join tables

Retrieve the `Time_period` and `Pct_unemployment` columns for each country in the Central and Southern Asia region by joining the `Geographic_Location` and `Economic_Indicators` tables on `Country_name`.

In [None]:
%%sql
# Add your code here

### 2. Impute `NULL` values

To make our summary more complete, we'll fill in missing unemployment rates. Wherever the unemployment rate is `NULL`, replace it with the regional rate for Central and Southern Asia, which is `19.59`, and name the resulting column `Pct_unemployment_imputed`.

In [None]:
%%sql
# Add your code here

### 3. Repeat for other regions

We can now repeat this process for each of the other regions in Table 1, using that region's rate as the replacement value, and combine all the `SELECT` statements with the `UNION` operator.

In [None]:
%%sql
# Add your code here

## Solutions

### 1. Join tables

In [None]:
%%sql

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	eco.Pct_unemployment 
FROM 
	united_nations.Geographic_Location AS loc 
LEFT JOIN 
	united_nations.Economic_Indicators AS eco 
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Central and Southern Asia%';
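
We use a `LEFT JOIN` so that every country in `Geographic_Location` is kept, even if it has no rows in `Economic_Indicators`; such countries get `NULL` in both `Time_period` and `Pct_unemployment`. A toy reproduction with Python's `sqlite3` module and hypothetical countries shows the difference from an inner `JOIN`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Geographic_Location (Country_name TEXT, Region TEXT);
    CREATE TABLE Economic_Indicators (Country_name TEXT, Time_period INTEGER, Pct_unemployment REAL);

    -- Hypothetical sample data: Country Y has no rows in Economic_Indicators.
    INSERT INTO Geographic_Location VALUES
        ('Country X', 'Central and Southern Asia'),
        ('Country Y', 'Central and Southern Asia');
    INSERT INTO Economic_Indicators VALUES
        ('Country X', 2020, 6.1),
        ('Country X', 2021, NULL);
    """
)

query = """
    SELECT loc.Country_name, eco.Time_period, eco.Pct_unemployment
    FROM Geographic_Location AS loc
    LEFT JOIN Economic_Indicators AS eco
        ON eco.Country_name = loc.Country_name
    WHERE loc.Region LIKE '%Central and Southern Asia%'
    ORDER BY loc.Country_name, eco.Time_period
"""
left_rows = conn.execute(query).fetchall()
# Same query with an inner join: countries without indicator rows disappear.
inner_rows = conn.execute(query.replace("LEFT JOIN", "JOIN")).fetchall()
conn.close()

print(left_rows)
print(inner_rows)
```

In both the unmatched row and the recorded `NULL`, `Pct_unemployment` is `NULL`, so the imputation step treats them the same way.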

### 2. Impute `NULL` values

In [None]:
%%sql

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 19.59) AS Pct_unemployment_imputed 
FROM 
	united_nations.Geographic_Location AS loc 
LEFT JOIN 
	united_nations.Economic_Indicators AS eco 
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Central and Southern Asia%';
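
`IFNULL(expr, fallback)` returns `expr` unless it is `NULL`, in which case it returns `fallback`. With two arguments it behaves like the standard SQL `COALESCE`, which MySQL also supports. A small check with Python's `sqlite3` module (SQLite implements both functions) on hypothetical rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Economic_Indicators (Country_name TEXT, Time_period INTEGER, Pct_unemployment REAL);
    -- Hypothetical rows: one recorded value and one missing value.
    INSERT INTO Economic_Indicators VALUES
        ('Country X', 2020, 6.1),
        ('Country X', 2021, NULL);
    """
)

rows = conn.execute(
    """
    SELECT
        Time_period,
        IFNULL(Pct_unemployment, 19.59) AS Pct_unemployment_imputed,
        COALESCE(Pct_unemployment, 19.59) AS Pct_unemployment_coalesced
    FROM Economic_Indicators
    ORDER BY Time_period
    """
).fetchall()
conn.close()

print(rows)  # the recorded 6.1 is kept; the NULL becomes 19.59 in both columns
```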

### 3. Repeat for other regions

In [None]:
%%sql

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 19.59) AS Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location AS loc
LEFT JOIN
	united_nations.Economic_Indicators AS eco
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Central and Southern Asia%' 

UNION 

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 22.64) AS Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location AS loc
LEFT JOIN
	united_nations.Economic_Indicators AS eco
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Eastern and South-Eastern Asia%' 

UNION 

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 24.43) AS Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location AS loc
LEFT JOIN
	united_nations.Economic_Indicators AS eco
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Europe and Northern America%' 

UNION 

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 24.23) AS Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location AS loc
LEFT JOIN
	united_nations.Economic_Indicators AS eco
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Latin America and the Caribbean%' 

UNION 

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 17.84) AS Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location AS loc
LEFT JOIN
	united_nations.Economic_Indicators AS eco
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Northern Africa and Western Asia%' 

UNION 

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 4.98) AS Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location AS loc
LEFT JOIN
	united_nations.Economic_Indicators AS eco
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Oceania%' 

UNION 

SELECT 
	loc.Country_name, 
	eco.Time_period, 
	IFNULL(eco.Pct_unemployment, 33.65) AS Pct_unemployment_imputed
FROM
	united_nations.Geographic_Location AS loc
LEFT JOIN
	united_nations.Economic_Indicators AS eco
	ON eco.Country_name = loc.Country_name 
WHERE REGION LIKE '%Sub-Saharan Africa%';
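
Two details of `UNION` matter for this query. Each `SELECT` must return the same number of columns, and the column names of the combined result are taken from the first `SELECT`. `UNION` also removes duplicate rows from the combined result, whereas `UNION ALL` keeps them; as long as the data contains no fully duplicated rows, `UNION ALL` would return the same rows here while skipping the deduplication step. A two-region toy version using Python's `sqlite3` module and hypothetical data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Geographic_Location (Country_name TEXT, Region TEXT);
    CREATE TABLE Economic_Indicators (Country_name TEXT, Time_period INTEGER, Pct_unemployment REAL);
    -- Hypothetical sample data covering two regions.
    INSERT INTO Geographic_Location VALUES
        ('Country X', 'Central and Southern Asia'),
        ('Country Z', 'Oceania');
    INSERT INTO Economic_Indicators VALUES
        ('Country X', 2021, NULL),
        ('Country Z', 2021, NULL),
        ('Country Z', 2022, 3.2);
    """
)

# One SELECT per region, each with its own regional fallback rate.
# ORDER BY 1, 2 sorts the combined result by its first and second columns.
union_query = """
    SELECT loc.Country_name, eco.Time_period,
           IFNULL(eco.Pct_unemployment, 19.59) AS Pct_unemployment_imputed
    FROM Geographic_Location AS loc
    LEFT JOIN Economic_Indicators AS eco ON eco.Country_name = loc.Country_name
    WHERE loc.Region LIKE '%Central and Southern Asia%'
    UNION
    SELECT loc.Country_name, eco.Time_period,
           IFNULL(eco.Pct_unemployment, 4.98) AS Pct_unemployment_imputed
    FROM Geographic_Location AS loc
    LEFT JOIN Economic_Indicators AS eco ON eco.Country_name = loc.Country_name
    WHERE loc.Region LIKE '%Oceania%'
    ORDER BY 1, 2
"""
summary = conn.execute(union_query).fetchall()

# UNION removes duplicate rows from the combined result; UNION ALL keeps them.
dedup = conn.execute("SELECT 1 AS n UNION SELECT 1").fetchall()
keep_all = conn.execute("SELECT 1 AS n UNION ALL SELECT 1").fetchall()
conn.close()

print(summary)
print(dedup, keep_all)
```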

#  

<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/ExploreAI_logos/EAI_Blue_Dark.png"  style="width:200px";/>
</div>