# Join Examples

* Using Python and MySQL to work through https://blog.codinghorror.com/a-visual-explanation-of-sql-joins/, in which Jeff Atwood, co-founder of Stack Overflow, explains SQL joins.

In [6]:
from sqlalchemy import create_engine
import pandas as pd
from warnings import filterwarnings
import pymysql
filterwarnings('ignore', category=pymysql.Warning)
import os

In [7]:
engine = create_engine('mysql+pymysql://root:AQib.21Talib@localhost')  # connect to server
engine.execute("create database if not exists join_sample") #create db

<sqlalchemy.engine.result.ResultProxy at 0x1b6181e4f60>

In [8]:
engine = create_engine('mysql+pymysql://root:AQib.21Talib@localhost/join_sample') 

In [9]:
def RunSQL(sql_command):
    """Run each ';'-separated statement in sql_command against join_sample."""
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='AQib.21Talib',
                                 db='join_sample',
                                 charset='utf8mb4',
                                 cursorclass=pymysql.cursors.DictCursor)
    try:
        with connection.cursor() as cursor:
            # Naive split: assumes no ';' appears inside a string literal
            for command in sql_command.split(';'):
                if not command.strip():
                    continue  # skip empty or whitespace-only fragments
                cursor.execute(command + ';')
                connection.commit()
    except Exception as e:
        print(e)
    finally:
        connection.close()

In [10]:
sql_query = """
drop table if exists TableA;
drop table if exists TableB;
create table TableA(
 id integer,
 name varchar(1000)
);
create table TableB(
 id integer,
 name varchar(1000)
);

insert into TableA(id, name) values 
(1, 'Pirate'),
(2, 'Monkey'),
(3, 'Ninja'),
(4, 'Spaghetti');
insert into TableB(id, name) values 
(1, 'Rutabaga'),
(2, 'Pirate'),
(3, 'Darth Vader'),
(4, 'Ninja');
"""
RunSQL(sql_query)

In [6]:
table_a = pd.read_sql_query('select * from TableA', engine)
table_a.head()

Unnamed: 0,id,name
0,1,Pirate
1,2,Monkey
2,3,Ninja
3,4,Spaghetti


In [7]:
table_b = pd.read_sql_query('select * from TableB', engine)
table_b.head()

Unnamed: 0,id,name
0,1,Rutabaga
1,2,Pirate
2,3,Darth Vader
3,4,Ninja


## Inner Join

**Inner join** produces only the set of records that match in both Table A and Table B.

In [8]:
sql_query = """
SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name
"""
inner_join = pd.read_sql_query(sql_query, engine)
inner_join.head()

Unnamed: 0,id,name,id.1,name.1
0,1,Pirate,2,Pirate
1,3,Ninja,4,Ninja
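
For comparison, the same inner join can be sketched in pandas alone. This is a minimal sketch that rebuilds the two sample tables by hand instead of reading them from MySQL; the `_a`/`_b` suffixes are an illustrative choice, not something from the original notebook.

```python
import pandas as pd

# Same sample rows as TableA and TableB, built by hand (no database needed)
table_a = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Pirate", "Monkey", "Ninja", "Spaghetti"]})
table_b = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Rutabaga", "Pirate", "Darth Vader", "Ninja"]})

# how="inner" keeps only the names present in both frames;
# suffixes tell the two id columns apart
inner = pd.merge(table_a, table_b, how="inner", on="name", suffixes=("_a", "_b"))
print(inner)
```

Only Pirate and Ninja appear in both tables, so only those two rows survive the merge.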


## Full Outer Join

**Full outer join** produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.

**MySQL doesn't have FULL OUTER JOIN**

* Emulating one in MySQL gets tricky and is beyond the scope of this class. See https://stackoverflow.com/questions/4796872/how-to-do-a-full-outer-join-in-mysql for more details.

* But we have Pandas... We can just do a full outer join there!

In [9]:
merged = pd.merge(table_a, table_b, how='outer', on='name')
merged

Unnamed: 0,id_x,name,id_y
0,1.0,Pirate,2.0
1,2.0,Monkey,
2,3.0,Ninja,4.0
3,4.0,Spaghetti,
4,,Rutabaga,1.0
5,,Darth Vader,3.0
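
For the curious, one common way to emulate a full outer join uses only `LEFT JOIN` and `UNION ALL`: take every row of Table A with its match, then append the rows of Table B that have no match in Table A. The sketch below is an assumption-laden illustration, not the notebook's method: it runs the query against an in-memory SQLite database (Python's built-in `sqlite3`) so that it is self-contained, and the column aliases `a_id`, `a_name`, `b_id`, `b_name` are made up for readability. The SELECT itself uses only syntax that MySQL also accepts.

```python
import sqlite3

# In-memory SQLite stands in for MySQL here so the example needs no server
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table TableA(id integer, name varchar(1000));
create table TableB(id integer, name varchar(1000));
insert into TableA(id, name) values
(1, 'Pirate'), (2, 'Monkey'), (3, 'Ninja'), (4, 'Spaghetti');
insert into TableB(id, name) values
(1, 'Rutabaga'), (2, 'Pirate'), (3, 'Darth Vader'), (4, 'Ninja');
""")

emulated_full_outer = """
SELECT a.id AS a_id, a.name AS a_name, b.id AS b_id, b.name AS b_name
FROM TableA a LEFT JOIN TableB b ON a.name = b.name
UNION ALL
SELECT a.id, a.name, b.id, b.name
FROM TableB b LEFT JOIN TableA a ON a.name = b.name
WHERE a.id IS NULL
"""

# First half: all of TableA with matches; second half: TableB rows without a match
rows = conn.execute(emulated_full_outer).fetchall()
for row in rows:
    print(row)
conn.close()
```

The `WHERE a.id IS NULL` filter on the second half keeps matched rows from being counted twice, which is why `UNION ALL` is safe here.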


## Left Outer Join 

**Left Outer Join** produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.

In [10]:
sql_query = """
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
"""
left_outer = pd.read_sql_query(sql_query, engine)
left_outer.head()



Unnamed: 0,id,name,id.1,name.1
0,1,Pirate,2.0,Pirate
1,3,Ninja,4.0,Ninja
2,2,Monkey,,
3,4,Spaghetti,,
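
The left outer join also has a direct pandas counterpart. A minimal sketch, again rebuilding the sample tables by hand and using illustrative `_a`/`_b` suffixes:

```python
import pandas as pd

# Same sample rows as TableA and TableB, built by hand
table_a = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Pirate", "Monkey", "Ninja", "Spaghetti"]})
table_b = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Rutabaga", "Pirate", "Darth Vader", "Ninja"]})

# how="left" keeps every row of table_a; id_b is NaN where table_b has no match
left = pd.merge(table_a, table_b, how="left", on="name", suffixes=("_a", "_b"))
print(left)
```

Because missing values are stored as NaN, pandas converts the `id_b` column to floats, just as the SQL result above shows `2.0` and `4.0`.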


* To produce the set of records only in Table A, but not in Table B, we perform the same left outer join, then **exclude the records we don't want from the right side via a where clause**.

In [11]:
sql_query = """
SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableB.id IS null
"""
left_outer = pd.read_sql_query(sql_query, engine)
left_outer.head()

Unnamed: 0,id,name,id.1,name.1
0,2,Monkey,,
1,4,Spaghetti,,
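
In pandas, the same "only in Table A" filter can be sketched with the `indicator` option of `pd.merge`, which adds a `_merge` column recording where each row came from. The frames are rebuilt by hand here, and the variable name `only_in_a` is just illustrative.

```python
import pandas as pd

# Same sample rows as TableA and TableB, built by hand
table_a = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Pirate", "Monkey", "Ninja", "Spaghetti"]})
table_b = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Rutabaga", "Pirate", "Darth Vader", "Ninja"]})

# indicator=True adds a _merge column: "left_only", "right_only", or "both"
merged = pd.merge(table_a, table_b, how="left", on="name",
                  suffixes=("_a", "_b"), indicator=True)

# Keep the rows that found no partner in table_b
only_in_a = merged[merged["_merge"] == "left_only"]
print(only_in_a)
```

Filtering on `_merge` plays the role of `WHERE TableB.id IS null` without relying on a particular column being null.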


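* Symmetrically, to produce the set of records only in Table B, but not in Table A, we perform a right outer join and keep only the rows where the left side is null.

In [12]:
sql_query = """
SELECT * FROM TableA
RIGHT OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableA.id IS null
"""
right_outer = pd.read_sql_query(sql_query, engine)
right_outer.head()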

Unnamed: 0,id,name,id.1,name.1
0,,,1,Rutabaga
1,,,3,Darth Vader
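
When only the unmatched rows of one table are needed, pandas can also skip the join entirely. The sketch below, with hand-built frames and an illustrative variable name `only_in_b`, uses `Series.isin` to keep the Table B rows whose name never appears in Table A.

```python
import pandas as pd

# Same sample rows as TableA and TableB, built by hand
table_a = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Pirate", "Monkey", "Ninja", "Spaghetti"]})
table_b = pd.DataFrame({"id": [1, 2, 3, 4],
                        "name": ["Rutabaga", "Pirate", "Darth Vader", "Ninja"]})

# isin marks names that exist in table_a; ~ inverts the mask
only_in_b = table_b[~table_b["name"].isin(table_a["name"])]
print(only_in_b)
```

Unlike the SQL result, this keeps only Table B's own columns, since no columns from Table A are joined in.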
