In [1]:
import pandas as pd
import sqlite3 as sql
from pandasql import sqldf

In [2]:
cities = pd.read_csv('cities.csv')

In [3]:
countries = pd.read_csv('countries.csv')

In [4]:
economies = pd.read_csv('economies.csv')

In [5]:
languages = pd.read_csv('languages.csv')

In [6]:
currencies = pd.read_csv('currencies.csv')

In [7]:
populations = pd.read_csv('populations.csv')

In [17]:
economies2015 = pd.read_csv('economies2015.csv')

In [18]:
economies2019 = pd.read_csv('economies2019.csv')

# Set Theory for SQL Joins

- UNION vs. UNION ALL\
Nice work learning all about UNION and UNION ALL!\
Two tables, languages and currencies, are provided. Run the queries provided in the console and select the correct answer for the multiple-choice questions in this exercise.

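q = '''SELECT *
FROM languages
UNION
SELECT *
FROM currencies;'''
sqldf(q, env=None)

This query fails with the error "SELECTs to the left and right of UNION do not have the same number of result columns": `SELECT *` returns a different number of fields from languages than from currencies.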
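
The column-count rule can be reproduced without the course CSVs. This is a minimal sketch, assuming two made-up tables (`langs` and `currs` are hypothetical stand-ins, not the course data) and using the standard-library `sqlite3` module directly instead of `pandasql`:

```python
import sqlite3

# Hypothetical toy tables standing in for languages and currencies
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE langs (code TEXT, name TEXT, official INTEGER)")
conn.execute("CREATE TABLE currs (code TEXT, basic_unit TEXT)")

def try_union(query):
    """Return the query's rows, or the error text if SQLite rejects it."""
    try:
        return conn.execute(query).fetchall()
    except sqlite3.OperationalError as exc:
        return str(exc)

# Three columns on the left, two on the right: SQLite rejects the query
mismatch = try_union("SELECT * FROM langs UNION SELECT * FROM currs")

# One column on each side: accepted (the tables are empty, so no rows)
matched = try_union("SELECT code FROM langs UNION SELECT code FROM currs")

print(mismatch)
print(matched)
```

Selecting an explicit, equal-length field list on both sides, as the next cell does with `code`, avoids the error.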

In [9]:
c = '''SELECT code FROM
languages
UNION ALL
SELECT code FROM 
currencies;'''

In [10]:
sqldf(c,env=None)

Unnamed: 0,code
0,AFG
1,AFG
2,AFG
3,AFG
4,ALB
...,...
1174,ZWE
1175,ZWE
1176,ZWE
1177,ZWE


- An unordered list of each country code in languages and currencies, including duplicates

In [12]:
x = '''SELECT code 
FROM languages
UNION
SELECT curr_id 
FROM currencies;'''
sqldf(x,env=None)

Unnamed: 0,code
0,1
1,2
2,3
3,4
4,5
...,...
431,WSM
432,YEM
433,ZAF
434,ZMB


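- Correct! Both queries on the left and right of the set operation must return the same number of fields, and in most databases (PostgreSQL, for example) the corresponding fields must also have compatible data types. SQLite, which pandasql uses by default, does not enforce the type rule, which is why the query above ran and mixed integer curr_id values with text codes. The names of the fields do not need to be the same, as the result will always contain field names from the left query.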
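
Both points (field names come from the left query, and SQLite tolerates mixed types) can be checked on toy data. A small sketch, assuming hypothetical tables `langs` and `currs` rather than the course CSVs, and using `sqlite3` directly:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins: one text column, one integer column
conn.execute("CREATE TABLE langs (code TEXT)")
conn.execute("CREATE TABLE currs (curr_id INTEGER)")
conn.executemany("INSERT INTO langs VALUES (?)", [("AFG",), ("ALB",)])
conn.executemany("INSERT INTO currs VALUES (?)", [(1,), (2,)])

cur = conn.execute("SELECT code FROM langs UNION SELECT curr_id FROM currs")

# The result column is named after the left-hand query's field
column_names = [d[0] for d in cur.description]
rows = cur.fetchall()

# SQLite returns integers and strings in the same column
value_types = {type(value) for (value,) in rows}

print(column_names)
print(rows)
```

A stricter engine would be expected to reject this query because of the type mismatch, so it is worth casting explicitly if the query needs to be portable.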

# Comparing global economies
Are you ready to perform your first set operation?\
In this exercise, you have two tables, economies2015 and economies2019, loaded above as DataFrames. You'll perform a set operation to stack all records in these two tables on top of each other, excluding duplicates.\
When drafting queries containing set operations, it is often helpful to write the queries on either side of the operation first, and then call the set operator. The instructions are ordered accordingly.

In [20]:
# Select all fields from economies2015
u = '''SELECT *
FROM economies2015   
-- Set operation
UNION
-- Select all fields from economies2019
SELECT *
FROM economies2019
ORDER BY code, year;'''
sqldf(u,env=None)

Unnamed: 0,code,year,income_group,gross_savings
0,ABW,2015,High income,14.867852
1,AGO,2015,Lower middle income,25.021327
2,AGO,2019,Lower middle income,25.524848
3,ALB,2015,Upper middle income,16.863981
4,ALB,2019,Upper middle income,14.499826
...,...,...,...,...
312,ZAF,2019,Upper middle income,13.465737
313,ZMB,2015,Lower middle income,33.700215
314,ZMB,2019,Lower middle income,39.714393
315,ZWE,2015,Lower middle income,-0.107826


- Your first UNION! UNION can be helpful for consolidating data from multiple tables into one result, which as you have seen, can then be ordered in meaningful ways.
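
An `ORDER BY` placed after the last `SELECT` sorts the combined result, not just the right-hand query. A minimal sketch on made-up rows (the tables `econ2015` and `econ2019` are hypothetical, cut down to two columns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("econ2015", "econ2019"):
    conn.execute(f"CREATE TABLE {table} (code TEXT, year INTEGER)")

# Rows are inserted out of order on purpose
conn.executemany("INSERT INTO econ2015 VALUES (?, ?)",
                 [("ZMB", 2015), ("AGO", 2015)])
conn.executemany("INSERT INTO econ2019 VALUES (?, ?)",
                 [("AGO", 2019), ("ZMB", 2019)])

rows = conn.execute("""
    SELECT code, year FROM econ2015
    UNION
    SELECT code, year FROM econ2019
    ORDER BY code, year
""").fetchall()

print(rows)
# [('AGO', 2015), ('AGO', 2019), ('ZMB', 2015), ('ZMB', 2019)]
```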

# Comparing two set operations
You learned in the video exercise that UNION ALL returns duplicates, whereas UNION does not. In this exercise, you will dive deeper into this, looking at cases where UNION is more appropriate than UNION ALL.\
You will be looking at combinations of country code and year from the economies and populations tables.

In [21]:
# Query that determines all pairs of code and year from economies and populations, without duplicates
un = '''SELECT code, year
FROM economies
UNION
SELECT country_code, year
FROM populations;'''
sqldf(un,env=None)

Unnamed: 0,code,year
0,ABW,2010
1,ABW,2015
2,AFG,2010
3,AFG,2015
4,AGO,2010
...,...,...
429,ZAF,2015
430,ZMB,2010
431,ZMB,2015
432,ZWE,2010


In [22]:
un_a = '''SELECT code, year
FROM economies
-- Set theory clause
UNION ALL
SELECT country_code, year
FROM populations
ORDER BY code, year;'''
sqldf(un_a,env=None)

Unnamed: 0,code,year
0,ABW,2010
1,ABW,2015
2,AFG,2010
3,AFG,2010
4,AFG,2015
...,...,...
809,ZMB,2015
810,ZWE,2010
811,ZWE,2010
812,ZWE,2015


- Nicely done! UNION returned 434 records, whereas UNION ALL returned 814. Are you able to spot the duplicates in the UNION ALL?
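
One way to spot the duplicates without scanning the output by eye is to group the UNION ALL result and keep the pairs that occur more than once. A sketch on toy data (the tables `econ` and `pops` and their rows are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE econ (code TEXT, year INTEGER)")
conn.execute("CREATE TABLE pops (country_code TEXT, year INTEGER)")
conn.executemany("INSERT INTO econ VALUES (?, ?)",
                 [("AFG", 2010), ("AFG", 2015)])
conn.executemany("INSERT INTO pops VALUES (?, ?)",
                 [("AFG", 2010), ("ABW", 2010)])

union_rows = conn.execute(
    "SELECT code, year FROM econ UNION SELECT country_code, year FROM pops"
).fetchall()
union_all_rows = conn.execute(
    "SELECT code, year FROM econ UNION ALL SELECT country_code, year FROM pops"
).fetchall()

# Pairs that appear more than once in the stacked result
duplicates = conn.execute("""
    SELECT code, year, COUNT(*) AS n
    FROM (SELECT code, year FROM econ
          UNION ALL
          SELECT country_code, year FROM pops)
    GROUP BY code, year
    HAVING COUNT(*) > 1
""").fetchall()

print(len(union_rows), len(union_all_rows))  # 3 4
print(duplicates)                            # [('AFG', 2010, 2)]
```

The difference between the two row counts equals the number of extra copies that UNION removes.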

# At the INTERSECT

Well done getting through the material on INTERSECT!\
Let's say you are interested in those countries that share names with cities. Use this task as an opportunity to show off your knowledge of set theory in SQL!



In [25]:
# Return all cities with the same name as a country
i = '''SELECT name
FROM cities
INTERSECT 
SELECT country_name
FROM countries;'''
sqldf(i,env=None)

Unnamed: 0,name
0,Singapore


- Nice one! It looks as though Singapore is the only country in our database that has a city with the same name!

Which of the following definitions of set operations is correct?\
Correct! INTERSECT returns only the records that appear in both result sets, comparing every selected field. ✅
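
Because INTERSECT compares whole records, the number of selected fields changes what counts as a match. A small sketch, assuming two hypothetical tables `a` and `b` with made-up rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (code TEXT, year INTEGER)")
conn.execute("CREATE TABLE b (code TEXT, year INTEGER)")
conn.executemany("INSERT INTO a VALUES (?, ?)",
                 [("AFG", 2010), ("AFG", 2015), ("ALB", 2010)])
conn.executemany("INSERT INTO b VALUES (?, ?)",
                 [("AFG", 2010), ("ALB", 2015)])

# Both fields must match: only (AFG, 2010) is in both tables
both_fields = conn.execute(
    "SELECT code, year FROM a INTERSECT SELECT code, year FROM b"
).fetchall()

# Only the code must match: AFG and ALB both qualify
code_only = conn.execute(
    "SELECT code FROM a INTERSECT SELECT code FROM b ORDER BY code"
).fetchall()

print(both_fields)  # [('AFG', 2010)]
print(code_only)    # [('AFG',), ('ALB',)]
```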

# EXCEPT

- You've got it, EXCEPT...
Just as you were able to leverage INTERSECT to find the names of cities with the same names as countries, you can also do the reverse, using EXCEPT.\
In this exercise, you will find the names of cities that do not share a name with any country.

In [29]:
# Return all cities that do not have the same name as a country
e = '''SELECT name
FROM cities
EXCEPT  
SELECT country_name
FROM countries
ORDER BY name;'''
sqldf(e,env=None)

Unnamed: 0,name
0,Abidjan
1,Abu Dhabi
2,Abuja
3,Accra
4,Addis Ababa
...,...
230,Yerevan
231,Yokohama
232,Zhengzhou
233,Zhongshan


- EXCEPTional! Note that if countries had been on the left and cities on the right, you would have returned the opposite: all countries that do not have the same name as a city.
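
The effect of swapping the two sides can be seen on a pair of toy tables. This is a sketch with made-up rows (two cities and two countries), not the course data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (name TEXT)")
conn.execute("CREATE TABLE countries (country_name TEXT)")
conn.executemany("INSERT INTO cities VALUES (?)",
                 [("Singapore",), ("Tokyo",)])
conn.executemany("INSERT INTO countries VALUES (?)",
                 [("Singapore",), ("Japan",)])

# Cities on the left: city names that are not also country names
cities_only = conn.execute(
    "SELECT name FROM cities EXCEPT SELECT country_name FROM countries"
).fetchall()

# Countries on the left: country names that are not also city names
countries_only = conn.execute(
    "SELECT country_name FROM countries EXCEPT SELECT name FROM cities"
).fetchall()

print(cities_only)     # [('Tokyo',)]
print(countries_only)  # [('Japan',)]
```

Unlike UNION and INTERSECT, EXCEPT is not symmetric, so the order of the two queries determines which side's records are kept.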