<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img
 src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/alx-courses/aice/assets/Content_page_banner_blue_dots.png"
 alt="ALX Content Header"
 class="full-width-image"
/>
</div>

# Recreating the Access_to_Basic_Services dataset 

In this notebook, we cover how ERDs help us understand database joins better. We also focus on the `LEFT JOIN` technique and highlight the importance of picking the right joining strategy, as incorrect joins can lead to inaccurate results.



> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

By the end of this train, you will:
- Understand how Entity-Relationship Diagrams can help us understand database joins better. 
- Understand the `LEFT JOIN` technique and how it is used to combine tables.
- Know the importance of picking the right joining strategy and how incorrect joins can lead to inaccurate results.


## Overview

Entity-Relationship diagrams play a valuable role in determining the table relationships and join strategies within a database. They provide the means to make informed decisions about which tables to join and the appropriate method for doing so. 


Let's recall our `united_nations` ERD, which has three entities: `Geographic_Location`, `Basic_Services`, and `Economic_Indicators`. 

<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/Northwind_ERD.png" alt="united_nations ERD" width="60%" height="60%">

One common joining technique involves selecting a central table that serves as the core of all relationships in the database and employing a `LEFT JOIN`. 
In our case, the `Geographic_Location` table would be the central table.  

With a `LEFT JOIN`, all records from the left table are returned, along with the matching records from the right table. Where a left-table record has no match, it still appears in the result, with `NULL` in every column that comes from the right table.
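
The behaviour described above can be tried without a MySQL server. The following is a minimal sketch, assuming Python's built-in `sqlite3` module in place of MySQL and two made-up countries (`Country A`, `Country B`) that are not taken from the `united_nations` database. Only `Country A` has an economic record, so `Country B` should come back with empty right-hand columns.

```python
import sqlite3

# Toy in-memory tables (hypothetical rows, not real united_nations data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Geographic_Location (Country_name TEXT, Region TEXT);
    CREATE TABLE Economic_Indicators (
        Country_name TEXT, Time_period INTEGER, Est_gdp_in_billions REAL
    );
    INSERT INTO Geographic_Location VALUES
        ('Country A', 'Region 1'), ('Country B', 'Region 2');
    INSERT INTO Economic_Indicators VALUES ('Country A', 2015, 10.0);
""")

# Every left-table row is kept; unmatched rows get NULL on the right.
rows = conn.execute("""
    SELECT geo.Country_name, econ.Time_period, econ.Est_gdp_in_billions
    FROM Geographic_Location AS geo
    LEFT JOIN Economic_Indicators AS econ
        ON geo.Country_name = econ.Country_name
    ORDER BY geo.Country_name
""").fetchall()

for row in rows:
    print(row)
# ('Country A', 2015, 10.0)
# ('Country B', None, None)
```

SQL `NULL` is returned to Python as `None`, which is why the second row shows `None` for both economic columns.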

## Connecting to our MySQL database

We'll start by connecting to the `united_nations` database. To connect to the MySQL server, run the cells below.

In [4]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

The sql extension is already loaded. To reload it, use:
  %reload_ext sql


In [5]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password. 
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations

'Connected: root@united_nations'

We'll then use a simple `SELECT` query to fetch all records from the `Geographic_Location` table.

In [6]:
%%sql
SELECT 
	* 
FROM 
	united_nations.Geographic_Location as geo
LIMIT 5;

 * mysql+pymysql://root:***@localhost:3306/united_nations
5 rows affected.


Country_name,Sub_region,Region,Land_area
Afghanistan,Southern Asia,Central and Southern Asia,652230.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0
American Samoa,Polynesia,Oceania,200.0
Angola,Middle Africa,Sub-Saharan Africa,1246700.0
Anguilla,Caribbean,Latin America and the Caribbean,


## Exercise


### 1. First `LEFT JOIN`

Combine the `Geographic_Location` table with the `Economic_Indicators` table based on the `Country_name` column. 

In [7]:
%%sql
SELECT
    *
FROM Geographic_Location
LEFT JOIN Economic_Indicators
ON Geographic_Location.Country_name = Economic_Indicators.Country_name;

 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


Country_name,Sub_region,Region,Land_area,Country_name_1,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.0,35.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2017,19.0,36.0,11.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2018,18.0,37.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2019,19.0,38.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2020,20.0,39.0,12.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2015,166.0,40.0,11.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2016,160.0,40.0,10.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2017,170.0,41.0,12.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2018,175.0,42.0,


### 2. Second `LEFT JOIN`

Combine the previously joined tables with the `Basic_Services` table, again based on the `Country_name` column.

In [9]:
%%sql
SELECT
    *
FROM 
    Geographic_Location
LEFT JOIN 
    Economic_Indicators
ON 
    Geographic_Location.Country_name = Economic_Indicators.Country_name
LEFT JOIN
    Basic_Services
ON
    Geographic_Location.Country_name = Basic_Services.Country_name;

 * mysql+pymysql://root:***@localhost:3306/united_nations
6156 rows affected.


Country_name,Sub_region,Region,Land_area,Country_name_1,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment,Country_name_2,Time_period_1,Pct_managed_drinking_water_services,Pct_managed_sanitation_services
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,,Afghanistan,2015,67,46.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,,Afghanistan,2016,70,47.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,,Afghanistan,2017,72,49.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,,Afghanistan,2018,75,51.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,,Afghanistan,2019,78,52.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,,Afghanistan,2020,80,54.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.0,35.0,,Afghanistan,2015,67,46.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.0,35.0,,Afghanistan,2016,70,47.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.0,35.0,,Afghanistan,2017,72,49.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.0,35.0,,Afghanistan,2018,75,51.0


### 3. Refine the second `LEFT JOIN`

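At first glance, the results of the above query might seem fine, but take a closer look at the two `Time_period` columns. They don't align as we would expect: each economic record is paired with the `Basic_Services` records for every year of that country, not only the matching year, which is why the result grew from 1048 to 6156 rows.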

Refine the second `LEFT JOIN` query by adding an additional condition based on the `Time_period` column.

In [11]:
%%sql
SELECT
    *
FROM 
    Geographic_Location
LEFT JOIN 
    Economic_Indicators
ON 
    Geographic_Location.Country_name = Economic_Indicators.Country_name
LEFT JOIN
    Basic_Services
ON
    Geographic_Location.Country_name = Basic_Services.Country_name
    AND
        Economic_Indicators.Time_period = Basic_Services.Time_period; 

 * mysql+pymysql://root:***@localhost:3306/united_nations
1048 rows affected.


Country_name,Sub_region,Region,Land_area,Country_name_1,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment,Country_name_2,Time_period_1,Pct_managed_drinking_water_services,Pct_managed_sanitation_services
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,,Afghanistan,2015,67,46.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.0,35.0,,Afghanistan,2016,70,47.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2017,19.0,36.0,11.0,Afghanistan,2017,72,49.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2018,18.0,37.0,,Afghanistan,2018,75,51.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2019,19.0,38.0,,Afghanistan,2019,78,52.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2020,20.0,39.0,12.0,Afghanistan,2020,80,54.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2015,166.0,40.0,11.0,Algeria,2015,92,85.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2016,160.0,40.0,10.0,Algeria,2016,93,85.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2017,170.0,41.0,12.0,Algeria,2017,93,85.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2018,175.0,42.0,,Algeria,2018,93,85.0


## Solutions

### 1. First `LEFT JOIN`

In [8]:
%%sql

SELECT 
	* 
FROM 
	united_nations.Geographic_Location as geo 
LEFT JOIN 
	united_nations.Economic_Indicators as econ 	
	ON geo.Country_name = econ.Country_name
LIMIT 50;

 * mysql+pymysql://root:***@localhost:3306/united_nations
50 rows affected.


Country_name,Sub_region,Region,Land_area,Country_name_1,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2015,20.0,34.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2016,18.0,35.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2017,19.0,36.0,11.0
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2018,18.0,37.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2019,19.0,38.0,
Afghanistan,Southern Asia,Central and Southern Asia,652230.0,Afghanistan,2020,20.0,39.0,12.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2015,166.0,40.0,11.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2016,160.0,40.0,10.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2017,170.0,41.0,12.0
Algeria,Northern Africa,Northern Africa and Western Asia,2381741.0,Algeria,2018,175.0,42.0,


With this `LEFT JOIN`, we will get all the records from the `Geographic_Location` table and only the matching records from the `Economic_Indicators` table. If there is no match, we will still get the data from the `Geographic_Location` table, and the columns from the `Economic_Indicators` table will be `NULL`.
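
Because unmatched rows carry `NULL` in the right-hand columns, filtering on `IS NULL` is one way to list the left-table records that found no partner. The sketch below is illustrative only: it assumes Python's built-in `sqlite3` module instead of MySQL, and the three countries are hypothetical rather than rows from the `united_nations` database.

```python
import sqlite3

# Toy in-memory tables (hypothetical rows, not real united_nations data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Geographic_Location (Country_name TEXT);
    CREATE TABLE Economic_Indicators (Country_name TEXT, Time_period INTEGER);
    INSERT INTO Geographic_Location VALUES
        ('Country A'), ('Country B'), ('Country C');
    INSERT INTO Economic_Indicators VALUES
        ('Country A', 2015), ('Country A', 2016);
""")

# Keep only the left-table rows whose right-hand side is NULL (no match).
unmatched = conn.execute("""
    SELECT geo.Country_name
    FROM Geographic_Location AS geo
    LEFT JOIN Economic_Indicators AS econ
        ON geo.Country_name = econ.Country_name
    WHERE econ.Country_name IS NULL
    ORDER BY geo.Country_name
""").fetchall()

print(unmatched)
# [('Country B',), ('Country C',)]
```

The same `WHERE ... IS NULL` filter could be appended to the `%%sql` query above to see which countries have no economic records.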


### 2. Second `LEFT JOIN`

In [None]:
%%sql

SELECT 
	* 
FROM 
	united_nations.Geographic_Location as geo 
LEFT JOIN 
	united_nations.Economic_Indicators as econ 	
	ON geo.Country_name = econ.Country_name 
LEFT JOIN 
	united_nations.Basic_Services as svc 	
	ON geo.Country_name = svc.Country_name
LIMIT 20;

### 3. Refine the second `LEFT JOIN`

In [None]:
%%sql

SELECT 
	* 
FROM 
	united_nations.Geographic_Location as geo 
LEFT JOIN 
	united_nations.Economic_Indicators as econ 	
	ON geo.Country_name = econ.Country_name 
LEFT JOIN 
	united_nations.Basic_Services as svc 	
	ON geo.Country_name = svc.Country_name
	AND econ.Time_period = svc.Time_period
LIMIT 20;

With the additional condition, each row pairs economic and basic-services data for the same country and the same year, so the two `Time_period` columns align and we get the desired output.
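
The effect of the extra condition on row counts can be reproduced on a small scale. This is a minimal sketch, assuming Python's built-in `sqlite3` module in place of MySQL and a single hypothetical country with three years of data in each table. Joining on `Country_name` alone pairs every economic year with every services year, while adding the `Time_period` condition keeps one row per year.

```python
import sqlite3

# Toy in-memory tables (hypothetical rows, not real united_nations data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Geographic_Location (Country_name TEXT);
    CREATE TABLE Economic_Indicators (Country_name TEXT, Time_period INTEGER);
    CREATE TABLE Basic_Services (Country_name TEXT, Time_period INTEGER);
    INSERT INTO Geographic_Location VALUES ('Country A');
    INSERT INTO Economic_Indicators VALUES
        ('Country A', 2015), ('Country A', 2016), ('Country A', 2017);
    INSERT INTO Basic_Services VALUES
        ('Country A', 2015), ('Country A', 2016), ('Country A', 2017);
""")

# Second join uses Country_name only; the year condition is appended below.
BASE_QUERY = """
    SELECT econ.Time_period, svc.Time_period
    FROM Geographic_Location AS geo
    LEFT JOIN Economic_Indicators AS econ
        ON geo.Country_name = econ.Country_name
    LEFT JOIN Basic_Services AS svc
        ON geo.Country_name = svc.Country_name
"""

country_only = conn.execute(BASE_QUERY).fetchall()
country_and_year = conn.execute(
    BASE_QUERY + " AND econ.Time_period = svc.Time_period"
).fetchall()

# Rows where the two Time_period values disagree are the misaligned ones.
mismatched = [row for row in country_only if row[0] != row[1]]

print(len(country_only), len(mismatched), len(country_and_year))
# 9 6 3
```

With three years per table, the country-only join yields 3 × 3 = 9 rows, 6 of which mix different years. The same multiplication explains why the exercise result jumped from 1048 to 6156 rows before the `Time_period` condition was added.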



## Summary

This notebook shows how Entity-Relationship Diagrams can help us understand database joins better. We specifically focused on the `LEFT JOIN` technique, which is widely used to combine tables. Additionally, we noticed the importance of picking the right joining strategy, as incorrect joins can lead to inaccurate results.


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/refs/heads/master/ALX_banners/ALX_Navy.png" style="width:100px" />
</div>