<div align="right" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img
 src="https://raw.githubusercontent.com/Explore-AI/Pictures/master/alx-courses/aice/assets/Content_page_banner_blue_dots.png"
 alt="ALX Content Header"
 class="full-width-image"
/>
</div>

# Composite and foreign keys: Basic_services and Economic_indicators tables

In this notebook, we demonstrate how to divide a larger dataset into smaller tables and link them using relationships. 

> ⚠️ This notebook will not run on Google Colab because it cannot connect to a local database. Please make sure that this notebook is running on the same local machine as your MySQL Workbench installation and MySQL `united_nations` database.

## Learning objectives

By the end of this train, you should:
- Understand how to create a composite primary key from more than one column.
- Understand how to create a foreign key that references a primary key in another table.

## Connecting to our MySQL database

Using the `Access_to_Basic_Services` table we created in MySQL Workbench, we want to create tables that hold the basic services and economic indicators data for each country and each year. Once we connect to our MySQL server, we can run the same queries in MySQL Workbench and in this notebook. Since we have a MySQL database, we can connect to it using mysql and pymysql.

In [1]:
# Load and activate the SQL extension to allow us to execute SQL in a Jupyter notebook. 
# If you get an error here, make sure that mysql and pymysql are installed correctly. 

%load_ext sql

In [2]:
# Establish a connection to the local database using the '%sql' magic command.
# Replace 'password' with your connection password; 'united_nations' is the database name.
# If you get an error here, please make sure the database name or password is correct.

%sql mysql+pymysql://root:password@localhost:3306/united_nations
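
A connection string of this form is a URL, so a password containing characters such as `@`, `:` or `/` needs to be percent-encoded before it is placed in the string. The sketch below assembles the URL with Python's standard library; the credentials shown are hypothetical placeholders, not values from this notebook.

```python
from urllib.parse import quote_plus

# Hypothetical credentials for illustration only; substitute your own.
user = "root"
password = "p@ss:word/1"   # contains characters that are special inside a URL
host = "localhost"
port = 3306
database = "united_nations"

# quote_plus percent-encodes '@', ':' and '/' so the URL parses unambiguously.
connection_url = (
    f"mysql+pymysql://{user}:{quote_plus(password)}@{host}:{port}/{database}"
)
print(connection_url)
```

The printed string is what would follow `%sql` in the connection cell.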

To run a query, we start a cell with the `%%sql` magic command, leave one blank line, write the query as shown below, and then run the cell.

In [3]:
%%sql

SELECT 
    *
FROM
    Access_to_Basic_Services
LIMIT 5;

Region,Sub_region,Country_name,Time_period,Pct_managed_drinking_water_services,Pct_managed_sanitation_services,Est_population_in_millions,Est_gdp_in_billions,Land_area,Pct_unemployment
Central and Southern Asia,Central Asia,Kazakhstan,2015,94.67,98.0,17.542806,184.39,2699700.0,4.93
Central and Southern Asia,Central Asia,Kazakhstan,2016,94.67,98.0,17.794055,137.28,2699700.0,4.96
Central and Southern Asia,Central Asia,Kazakhstan,2017,95.0,98.0,18.037776,166.81,2699700.0,4.9
Central and Southern Asia,Central Asia,Kazakhstan,2018,95.0,98.0,18.276452,179.34,2699700.0,4.85
Central and Southern Asia,Central Asia,Kazakhstan,2019,95.0,98.0,18.513673,181.67,2699700.0,4.8


## Exercise

We want to do the following:
1. Create a table that contains data about access to basic services for each country and year.
2. Create a table that contains the economic indicators for each country and each year.

### 1. Create a table that contains data about access to basic services for each country and year only.

Create a table named `Basic_services` with the columns `Country_name`, `Time_period`, `Pct_managed_drinking_water_services`, and `Pct_managed_sanitation_services`. `Country_name` and `Time_period` together form the composite primary key, and `Country_name` is also a foreign key that references the `Geographic_location` table.

In [6]:
%%sql
CREATE TABLE Basic_services
    (Country_name VARCHAR(45), 
    Time_period  DATETIME, 
    Pct_managed_drinking_water_services VARCHAR(45),
    Pct_managed_sanitation_services VARCHAR(45),
    PRIMARY KEY(Country_name, Time_period), 
    FOREIGN KEY(Country_name) REFERENCES Geographic_location(Country_name));

In [8]:
%%sql
ALTER TABLE Basic_services
MODIFY COLUMN Time_period INTEGER,
MODIFY COLUMN Pct_managed_drinking_water_services NUMERIC(5, 2),
MODIFY COLUMN Pct_managed_sanitation_services NUMERIC(5, 2);
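
`NUMERIC(5, 2)` stores at most five digits in total, two of them after the decimal point, so the largest value it can hold is 999.99. The sketch below is a rough Python model of that range check. The helper `fits_numeric` is a hypothetical name introduced here for illustration; it only approximates how a database rounds and validates values.

```python
from decimal import Decimal, ROUND_HALF_UP

def fits_numeric(value, precision, scale):
    """Return True if value, rounded to `scale` decimals, fits NUMERIC(precision, scale)."""
    step = Decimal(1).scaleb(-scale)                      # 0.01 when scale is 2
    rounded = Decimal(str(value)).quantize(step, rounding=ROUND_HALF_UP)
    largest = Decimal(10) ** (precision - scale) - step   # 999.99 for (5, 2)
    return abs(rounded) <= largest

print(fits_numeric(94.67, 5, 2))       # a percentage from the sample rows
print(fits_numeric(100, 5, 2))         # the largest possible percentage
print(fits_numeric(2699700.0, 5, 2))   # a Land_area value, far too large
```

Percentages never exceed 100, which is why five digits with two decimals is enough for the two `Pct_` columns.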

Extract the columns `Country_name`, `Time_period`, `Pct_managed_drinking_water_services`, and `Pct_managed_sanitation_services` from the `Access_to_Basic_Services` table into the newly created `Basic_services` table.

In [9]:
%%sql
INSERT INTO Basic_services 
    (SELECT Country_name, 
        Time_period, 
        Pct_managed_drinking_water_services, 
        Pct_managed_sanitation_services 
    FROM Access_to_Basic_Services); 

### 2. Create a table that contains the economic indicators.

Create a table named `Economic_indicators` with the columns `Country_name`, `Time_period`, `Est_gdp_in_billions`, `Est_population_in_millions`, and `Pct_unemployment`. `Country_name` and `Time_period` together form the composite primary key, and `Country_name` is also a foreign key that references the `Geographic_location` table.

In [14]:
%%sql
CREATE TABLE Economic_indicators
    (Country_name VARCHAR(45), 
    Time_period INTEGER,
    Est_gdp_in_billions NUMERIC(10, 2), 
    Est_population_in_millions NUMERIC(10, 2), 
    Pct_unemployment NUMERIC(5, 2), 
    PRIMARY KEY (Country_name, Time_period), 
    FOREIGN KEY (Country_name) REFERENCES Geographic_location(Country_name));

Extract the columns `Country_name`, `Time_period`, `Est_gdp_in_billions`, `Est_population_in_millions`, and `Pct_unemployment` from the `Access_to_Basic_Services` table into the newly created `Economic_indicators` table.

In [16]:
%%sql
INSERT INTO Economic_indicators(
    SELECT 
        Country_name, 
        Time_period, 
        Est_gdp_in_billions, 
        Est_population_in_millions, 
        Pct_unemployment 
    FROM Access_to_Basic_Services); 

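RuntimeError: (pymysql.err.IntegrityError) (1062, "Duplicate entry 'Kazakhstan-2015' for key 'economic_indicators.PRIMARY'")
[SQL: INSERT INTO Economic_indicators(
    SELECT
        Country_name,
        Time_period,
        Est_gdp_in_billions,
        Est_population_in_millions,
        Pct_unemployment
    FROM Access_to_Basic_Services);]
(Background on this error at: https://sqlalche.me/e/20/gkpj)

This error most likely means the `INSERT` cell was run more than once. The rows were already in `Economic_indicators` from an earlier run (the `SELECT` that follows returns data), so the composite primary key rejected the second copy of `Kazakhstan`, `2015`.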
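
A duplicate-entry error like this one can be reproduced in miniature. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a stand-in for MySQL (an assumption made so the example runs anywhere), and a trimmed-down `Economic_indicators` table with a single measurement column.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Economic_indicators (
        Country_name TEXT,
        Time_period INTEGER,
        Est_gdp_in_billions REAL,
        PRIMARY KEY (Country_name, Time_period)
    )
""")

row = ("Kazakhstan", 2015, 184.39)
insert_sql = "INSERT INTO Economic_indicators VALUES (?, ?, ?)"

conn.execute(insert_sql, row)        # first run succeeds
try:
    conn.execute(insert_sql, row)    # second run violates the composite primary key
    duplicate_rejected = False
except sqlite3.IntegrityError as err:
    duplicate_rejected = True
    print("Rejected:", err)

row_count = conn.execute("SELECT COUNT(*) FROM Economic_indicators").fetchone()[0]
print(row_count)
```

The table still holds a single row after the failed second insert. In the notebook, one way to rerun an `INSERT` cell cleanly is to empty the target table first, for example with `DELETE FROM Economic_indicators;`.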


In [18]:
%%sql
SELECT * FROM Economic_indicators;

Country_name,Time_period,Est_gdp_in_billions,Est_population_in_millions,Pct_unemployment
Afghanistan,2015,20.0,33.75,
Afghanistan,2016,18.02,34.64,
Afghanistan,2017,18.9,35.64,11.18
Afghanistan,2018,18.42,36.69,
Afghanistan,2019,18.9,37.77,
Afghanistan,2020,20.14,38.97,11.71
Algeria,2015,165.98,39.54,11.21
Algeria,2016,160.03,40.34,10.2
Algeria,2017,170.1,41.14,12.0
Algeria,2018,174.91,41.93,


## Solutions

### 1. Create a table that contains data about access to basic services for each country and year only.

In [None]:
%%sql

CREATE TABLE united_nations.Basic_Services (
  Country_name VARCHAR(37),
  Time_period INTEGER,
  Pct_managed_drinking_water_services NUMERIC(5,2),
  Pct_managed_sanitation_services NUMERIC(5,2),
  PRIMARY KEY (Country_name, Time_period),
  FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
);

Our main `Access_to_Basic_Services` table holds one set of basic services measurements for each country and year, so each country and year combination appears only once. Neither column is unique on its own: a country appears once for every year, and a year appears once for every country. Together, however, `Country_name` and `Time_period` identify each row, so they form the composite primary key.
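
The uniqueness argument can be checked by counting distinct values. The sketch below loads a few country and year pairs taken from the sample outputs in this notebook into an in-memory SQLite table (a stand-in for MySQL, assumed here for portability) and compares the counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Access_to_Basic_Services (Country_name TEXT, Time_period INTEGER)"
)
rows = [
    ("Kazakhstan", 2015), ("Kazakhstan", 2016), ("Kazakhstan", 2017),
    ("Afghanistan", 2015), ("Afghanistan", 2016),
]
conn.executemany("INSERT INTO Access_to_Basic_Services VALUES (?, ?)", rows)

# Count rows, distinct countries and distinct years in one pass.
total, countries, years = conn.execute("""
    SELECT COUNT(*),
           COUNT(DISTINCT Country_name),
           COUNT(DISTINCT Time_period)
    FROM Access_to_Basic_Services
""").fetchone()

# Count distinct (country, year) combinations.
pairs = conn.execute("""
    SELECT COUNT(*)
    FROM (SELECT DISTINCT Country_name, Time_period
          FROM Access_to_Basic_Services)
""").fetchone()[0]

print(total, countries, years, pairs)
```

Only the number of distinct pairs matches the number of rows, which is the property a primary key needs.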

Because we have already created a `Geographic_Location` table that holds the geographic location data for each country, this table also includes a foreign key that links its `Country_name` column to the `Country_name` primary key in the `Geographic_Location` table.
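
To see what the foreign key enforces, the sketch below builds a minimal parent and child table in an in-memory SQLite database (a stand-in for MySQL). Only the `Country_name` key column of `Geographic_Location` is modelled, and `Atlantis` is a made-up country name used to trigger the violation. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite-specific switch for enforcement

# Parent table: only the key column is modelled here (assumption).
conn.execute("CREATE TABLE Geographic_Location (Country_name TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE Basic_Services (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services NUMERIC,
        PRIMARY KEY (Country_name, Time_period),
        FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
    )
""")

conn.execute("INSERT INTO Geographic_Location VALUES ('Kazakhstan')")
conn.execute("INSERT INTO Basic_Services VALUES ('Kazakhstan', 2015, 94.67)")  # parent row exists

try:
    # 'Atlantis' is hypothetical and has no row in Geographic_Location.
    conn.execute("INSERT INTO Basic_Services VALUES ('Atlantis', 2015, 50.0)")
    orphan_rejected = False
except sqlite3.IntegrityError as err:
    orphan_rejected = True
    print("Rejected:", err)

child_rows = conn.execute("SELECT COUNT(*) FROM Basic_Services").fetchone()[0]
print(child_rows)
```

The child table accepts a row only when its `Country_name` already exists in the parent table.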

In [None]:
%%sql

INSERT INTO Basic_Services (Country_name, Time_period, Pct_managed_drinking_water_services, Pct_managed_sanitation_services)
SELECT Country_name,
    Time_period,
    Pct_managed_drinking_water_services,
    Pct_managed_sanitation_services
FROM united_nations.Access_to_Basic_Services;
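
After an `INSERT ... SELECT`, a quick sanity check is to compare row counts between the source and target tables. The sketch below runs the same pattern on an in-memory SQLite database (a stand-in for MySQL) with a reduced column set and three sample rows from this notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Access_to_Basic_Services (
        Region TEXT,
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL,
        Pct_managed_sanitation_services REAL,
        Est_gdp_in_billions REAL
    );
    INSERT INTO Access_to_Basic_Services VALUES
        ('Central and Southern Asia', 'Kazakhstan', 2015, 94.67, 98.0, 184.39),
        ('Central and Southern Asia', 'Kazakhstan', 2016, 94.67, 98.0, 137.28),
        ('Central and Southern Asia', 'Kazakhstan', 2017, 95.0, 98.0, 166.81);

    CREATE TABLE Basic_Services (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL,
        Pct_managed_sanitation_services REAL,
        PRIMARY KEY (Country_name, Time_period)
    );

    -- Copy only the columns that belong in the smaller table.
    INSERT INTO Basic_Services (Country_name, Time_period,
        Pct_managed_drinking_water_services, Pct_managed_sanitation_services)
    SELECT Country_name, Time_period,
        Pct_managed_drinking_water_services, Pct_managed_sanitation_services
    FROM Access_to_Basic_Services;
""")

source_count = conn.execute("SELECT COUNT(*) FROM Access_to_Basic_Services").fetchone()[0]
target_count = conn.execute("SELECT COUNT(*) FROM Basic_Services").fetchone()[0]
# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [info[1] for info in conn.execute("PRAGMA table_info(Basic_Services)")]

print(source_count, target_count)
print(columns)
```

Equal counts indicate that every source row was copied, while the column list confirms that only the basic services columns were carried over.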

### 2. Create a table that contains the economic indicators.

In [None]:
%%sql

CREATE TABLE united_nations.Economic_Indicators (
  Country_name VARCHAR(37),
  Time_period INTEGER,
  Est_gdp_in_billions NUMERIC(8,2),
  Est_population_in_millions NUMERIC(11,6),
  Pct_unemployment NUMERIC(5,2),
  PRIMARY KEY (Country_name, Time_period),
  FOREIGN KEY (Country_name) REFERENCES Geographic_Location (Country_name)
);

Again, since we have these metrics per country and per year, `Country_name` and `Time_period` together form the composite primary key, and a foreign key links the `Country_name` column to the `Country_name` primary key in the `Geographic_Location` table.

In [None]:
%%sql

INSERT INTO Economic_Indicators (Country_name, Time_period, Est_gdp_in_billions, Est_population_in_millions, Pct_unemployment)
SELECT Country_name,
    Time_period,
    Est_gdp_in_billions,
    Est_population_in_millions,
    Pct_unemployment
FROM united_nations.Access_to_Basic_Services;
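
Once the data is split, the two tables can be linked again through their shared composite key. The sketch below joins reduced versions of `Basic_Services` and `Economic_Indicators` in an in-memory SQLite database (a stand-in for MySQL), using two sample rows from this notebook. It also shows what happens if the join uses only one of the two key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Basic_Services (
        Country_name TEXT,
        Time_period INTEGER,
        Pct_managed_drinking_water_services REAL,
        PRIMARY KEY (Country_name, Time_period)
    );
    CREATE TABLE Economic_Indicators (
        Country_name TEXT,
        Time_period INTEGER,
        Est_gdp_in_billions REAL,
        PRIMARY KEY (Country_name, Time_period)
    );
    INSERT INTO Basic_Services VALUES
        ('Kazakhstan', 2015, 94.67), ('Kazakhstan', 2016, 94.67);
    INSERT INTO Economic_Indicators VALUES
        ('Kazakhstan', 2015, 184.39), ('Kazakhstan', 2016, 137.28);
""")

# Join on BOTH key columns so each country-year matches exactly one row.
joined = conn.execute("""
    SELECT b.Country_name, b.Time_period,
           b.Pct_managed_drinking_water_services, e.Est_gdp_in_billions
    FROM Basic_Services AS b
    JOIN Economic_Indicators AS e
      ON b.Country_name = e.Country_name
     AND b.Time_period = e.Time_period
    ORDER BY b.Time_period
""").fetchall()

# Joining on Country_name alone pairs every year with every other year.
loose_count = conn.execute("""
    SELECT COUNT(*)
    FROM Basic_Services AS b
    JOIN Economic_Indicators AS e
      ON b.Country_name = e.Country_name
""").fetchone()[0]

print(joined)
print(loose_count)
```

Joining on both columns returns one row per country and year, whereas joining on `Country_name` alone multiplies the rows.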


<div align="center" style=" font-size: 80%; text-align: center; margin: 0 auto">
<img src="https://raw.githubusercontent.com/Explore-AI/Pictures/refs/heads/master/ALX_banners/ALX_Navy.png" style="width:100px" />
</div>