# Concatenating Tables with Set-Like Operations in `sqlalchemy`

Finally, we look at combining tables with `union`, `intersect`, and `except` in `sqlalchemy`.

## Example - Auto Sales in SQL

In [41]:
import pandas as pd
from dfply import *

from sqlalchemy.ext.automap import automap_base
from sqlalchemy import select as selectq
from sqlalchemy import create_engine, func
from more_sqlalchemy import pprint

In [23]:
from sqlalchemy import union, union_all, intersect, intersect_all, except_, except_all

In [5]:
sales_eng = create_engine("sqlite:///databases/sales_2_8.db") 
Base = automap_base()
Base.prepare(sales_eng, reflect=True)
SalesApr = Base.classes.sales_apr
salesAprTbl = SalesApr.__table__
SalesMay = Base.classes.sales_may
salesMayTbl = SalesMay.__table__

In [8]:
pd.read_sql_query(selectq([SalesApr]), con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,20,14,6,24,1
2,Yolanda,19,10,28,17,2
3,Xerxes,11,27,17,9,3


In [9]:
pd.read_sql_query(selectq([SalesMay]), con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,19,12,17,20,1
2,Yolanda,19,8,32,15,2
3,Xerxes,12,23,18,9,3


## Notes on set concatenation in `sqlalchemy`

* Available functions: `union, union_all, intersect, intersect_all, except_, except_all`
* Used to combine full select statements
    * Example: `(SELECT * FROM T1) UNION (SELECT * FROM T2)`
    
**Consequence:** You need to

1. Make two or more select statements
2. *Then* combine them with `union`, etc.

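The two-step pattern can be sketched at the SQL level with the standard library's `sqlite3`, so it runs without the course database. This is only an illustration under stated assumptions: the in-memory tables are rebuilt from the April and May rows shown above, and the `SUV > 15` filter is a hypothetical condition, not part of the original example.

```python
import sqlite3

# In-memory stand-in for the sales database; rows are copied from the
# sales_apr and sales_may outputs shown earlier.
conn = sqlite3.connect(":memory:")
cols = '"Salesperson" TEXT, "Compact" INT, "Sedan" INT, "SUV" INT, "Truck" INT, id INT'
tables = {
    "sales_apr": [("Ann", 22, 18, 15, 12, 0), ("Bob", 20, 14, 6, 24, 1),
                  ("Yolanda", 19, 10, 28, 17, 2), ("Xerxes", 11, 27, 17, 9, 3)],
    "sales_may": [("Ann", 22, 18, 15, 12, 0), ("Bob", 19, 12, 17, 20, 1),
                  ("Yolanda", 19, 8, 32, 15, 2), ("Xerxes", 12, 23, 18, 9, 3)],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

# Step 1: each side is a complete SELECT with its own columns and filter.
# The SUV > 15 threshold is a hypothetical filter chosen for illustration.
select_apr = 'SELECT "Salesperson", "SUV" FROM sales_apr WHERE "SUV" > 15'
select_may = 'SELECT "Salesperson", "SUV" FROM sales_may WHERE "SUV" > 15'

# Step 2: only now are the finished statements joined by the set operator.
strong_suv = conn.execute(f"{select_apr} UNION {select_may}").fetchall()
print(sorted(strong_suv))
```

Each side could be any valid select, as long as both return the same number of columns.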
## Performing a `union`

In [24]:
sales_union = union(selectq([salesAprTbl]), selectq([salesMayTbl]))
print(sales_union)


SELECT sales_apr."Salesperson", sales_apr."Compact", sales_apr."Sedan", sales_apr."SUV", sales_apr."Truck", sales_apr.id 
FROM sales_apr UNION SELECT sales_may."Salesperson", sales_may."Compact", sales_may."Sedan", sales_may."SUV", sales_may."Truck", sales_may.id 
FROM sales_may


In [25]:
pd.read_sql_query(sales_union, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,19,12,17,20,1
2,Bob,20,14,6,24,1
3,Xerxes,11,27,17,9,3
4,Xerxes,12,23,18,9,3
5,Yolanda,19,8,32,15,2
6,Yolanda,19,10,28,17,2


## Performing a `union_all`

In [26]:
sales_union_all = union_all(selectq([salesAprTbl]), selectq([salesMayTbl]))
pd.read_sql_query(sales_union_all, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,20,14,6,24,1
2,Yolanda,19,10,28,17,2
3,Xerxes,11,27,17,9,3
4,Ann,22,18,15,12,0
5,Bob,19,12,17,20,1
6,Yolanda,19,8,32,15,2
7,Xerxes,12,23,18,9,3

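The difference between the two outputs above comes down to one row. The following sketch, which uses `sqlite3` and an in-memory copy of the April and May rows rather than the course database, checks which rows `UNION ALL` keeps that `UNION` collapses.

```python
import sqlite3
from collections import Counter

# In-memory stand-in; rows are copied from the sales_apr and sales_may outputs.
conn = sqlite3.connect(":memory:")
cols = '"Salesperson" TEXT, "Compact" INT, "Sedan" INT, "SUV" INT, "Truck" INT, id INT'
tables = {
    "sales_apr": [("Ann", 22, 18, 15, 12, 0), ("Bob", 20, 14, 6, 24, 1),
                  ("Yolanda", 19, 10, 28, 17, 2), ("Xerxes", 11, 27, 17, 9, 3)],
    "sales_may": [("Ann", 22, 18, 15, 12, 0), ("Bob", 19, 12, 17, 20, 1),
                  ("Yolanda", 19, 8, 32, 15, 2), ("Xerxes", 12, 23, 18, 9, 3)],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

union_rows = conn.execute(
    "SELECT * FROM sales_apr UNION SELECT * FROM sales_may").fetchall()
union_all_rows = conn.execute(
    "SELECT * FROM sales_apr UNION ALL SELECT * FROM sales_may").fetchall()

# Multiset difference: rows that appear more often under UNION ALL.
extra = Counter(union_all_rows) - Counter(union_rows)
print(len(union_rows), len(union_all_rows), dict(extra))
```

Ann's row is identical in both months, so `UNION` reports it once while `UNION ALL` reports it twice.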

## `union_all` and friends take any number of select statements

In [27]:
sales_union_all3 = union_all(selectq([salesAprTbl]), 
                             selectq([salesAprTbl]), 
                             selectq([salesMayTbl]))
pd.read_sql_query(sales_union_all3, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0
1,Bob,20,14,6,24,1
2,Yolanda,19,10,28,17,2
3,Xerxes,11,27,17,9,3
4,Ann,22,18,15,12,0
5,Bob,20,14,6,24,1
6,Yolanda,19,10,28,17,2
7,Xerxes,11,27,17,9,3
8,Ann,22,18,15,12,0
9,Bob,19,12,17,20,1
10,Yolanda,19,8,32,15,2
11,Xerxes,12,23,18,9,3

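When the number of statements grows, it can help to build them from a list. Here is a minimal SQL-level sketch with `sqlite3`, assuming an in-memory copy of the April and May rows; in `sqlalchemy` the analogous move would be unpacking a list of selects, as in `union_all(*stmts)`.

```python
import sqlite3
from collections import Counter

# In-memory stand-in; rows are copied from the sales_apr and sales_may outputs.
conn = sqlite3.connect(":memory:")
cols = '"Salesperson" TEXT, "Compact" INT, "Sedan" INT, "SUV" INT, "Truck" INT, id INT'
tables = {
    "sales_apr": [("Ann", 22, 18, 15, 12, 0), ("Bob", 20, 14, 6, 24, 1),
                  ("Yolanda", 19, 10, 28, 17, 2), ("Xerxes", 11, 27, 17, 9, 3)],
    "sales_may": [("Ann", 22, 18, 15, 12, 0), ("Bob", 19, 12, 17, 20, 1),
                  ("Yolanda", 19, 8, 32, 15, 2), ("Xerxes", 12, 23, 18, 9, 3)],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

# One complete SELECT per table name; April is listed twice on purpose,
# mirroring the three-way union_all above.
stmts = [f"SELECT * FROM {name}" for name in ["sales_apr", "sales_apr", "sales_may"]]

# Chain all of them with a single operator.
stacked = conn.execute(" UNION ALL ".join(stmts)).fetchall()
print(len(stacked), Counter(row[0] for row in stacked))
```

The same list-based construction scales to the six monthly tables in the exercise below.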

## Performing an `intersect`

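Note that `intersect` returns only distinct rows, while `intersect_all` (SQL `INTERSECT ALL`) keeps duplicates, so the two agree only when the inputs contain no repeated rows. SQLite, used here, supports `INTERSECT` but not `INTERSECT ALL`.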

In [28]:
sales_inter = intersect(selectq([salesAprTbl]), selectq([salesMayTbl]))
pd.read_sql_query(sales_inter, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Ann,22,18,15,12,0

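To see that `INTERSECT` works on distinct rows, the sketch below feeds the April rows in twice and intersects with May. It uses `sqlite3` with an in-memory copy of the two tables, which is an assumption made so the example runs on its own.

```python
import sqlite3

# In-memory stand-in; rows are copied from the sales_apr and sales_may outputs.
conn = sqlite3.connect(":memory:")
cols = '"Salesperson" TEXT, "Compact" INT, "Sedan" INT, "SUV" INT, "Truck" INT, id INT'
tables = {
    "sales_apr": [("Ann", 22, 18, 15, 12, 0), ("Bob", 20, 14, 6, 24, 1),
                  ("Yolanda", 19, 10, 28, 17, 2), ("Xerxes", 11, 27, 17, 9, 3)],
    "sales_may": [("Ann", 22, 18, 15, 12, 0), ("Bob", 19, 12, 17, 20, 1),
                  ("Yolanda", 19, 8, 32, 15, 2), ("Xerxes", 12, 23, 18, 9, 3)],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

# Plain intersection: rows present in both months.
inter = conn.execute(
    "SELECT * FROM sales_apr INTERSECT SELECT * FROM sales_may").fetchall()

# Left side now holds every April row twice; INTERSECT still reports
# the shared row only once.
doubled = conn.execute(
    "SELECT * FROM (SELECT * FROM sales_apr UNION ALL SELECT * FROM sales_apr) "
    "INTERSECT SELECT * FROM sales_may").fetchall()
print(inter, doubled)
```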

## Performing an `except_`

Note that the trailing `_` is needed because `except` is a reserved keyword in Python.

In [29]:
sales_except = except_(selectq([salesAprTbl]), selectq([salesMayTbl]))
pd.read_sql_query(sales_except, con=sales_eng)

Unnamed: 0,Salesperson,Compact,Sedan,SUV,Truck,id
0,Bob,20,14,6,24,1
1,Xerxes,11,27,17,9,3
2,Yolanda,19,10,28,17,2

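Unlike `union` and `intersect`, the order of the arguments matters for `except_`. A small SQL-level sketch with `sqlite3`, assuming an in-memory copy of the April and May rows, compares both directions.

```python
import sqlite3

# In-memory stand-in; rows are copied from the sales_apr and sales_may outputs.
conn = sqlite3.connect(":memory:")
cols = '"Salesperson" TEXT, "Compact" INT, "Sedan" INT, "SUV" INT, "Truck" INT, id INT'
tables = {
    "sales_apr": [("Ann", 22, 18, 15, 12, 0), ("Bob", 20, 14, 6, 24, 1),
                  ("Yolanda", 19, 10, 28, 17, 2), ("Xerxes", 11, 27, 17, 9, 3)],
    "sales_may": [("Ann", 22, 18, 15, 12, 0), ("Bob", 19, 12, 17, 20, 1),
                  ("Yolanda", 19, 8, 32, 15, 2), ("Xerxes", 12, 23, 18, 9, 3)],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

# Rows in April that do not appear in May ...
apr_only = set(conn.execute(
    "SELECT * FROM sales_apr EXCEPT SELECT * FROM sales_may").fetchall())
# ... and the reverse direction, which gives a different answer.
may_only = set(conn.execute(
    "SELECT * FROM sales_may EXCEPT SELECT * FROM sales_apr").fetchall())
print(sorted(apr_only))
print(sorted(may_only))
```

Ann's row is identical in both months, so it drops out of both results.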

## <font color="red"> Exercise 3 </font>

In the `databases` folder, you will find a database named `uber_samples_2_8.db` that contains the sample tables from the last 2 examples. The tables are named `sales_apr`, `sales_may`, `sales_jun`, `sales_jul`, `sales_aug`, and `sales_sep`.

1. Use `union_all` to create a `stmt` that combines these tables into one table.
2. Use `pandas` and `limit(5)` to get the first 5 rows of the table.
3. Use `selectq([func.count('*')]).select_from(stmt)` to count the total number of rows in the new table.

In [31]:
engine = create_engine("sqlite:///databases/uber_samples_2_8.db")

Base = automap_base()
Base.prepare(engine, reflect=True)

SalesApr = Base.classes.sales_apr
salesAprTbl = SalesApr.__table__
SalesMay = Base.classes.sales_may
salesMayTbl = SalesMay.__table__
SalesJun = Base.classes.sales_jun
salesJunTbl = SalesJun.__table__
SalesJul = Base.classes.sales_jul
salesJulTbl = SalesJul.__table__
SalesAug = Base.classes.sales_aug
salesAugTbl = SalesAug.__table__
SalesSep = Base.classes.sales_sep
salesSepTbl = SalesSep.__table__

In [32]:
sales_union_all = union_all(selectq([salesAprTbl]),
                            selectq([salesMayTbl]),
                            selectq([salesJunTbl]),
                            selectq([salesJulTbl]),
                            selectq([salesAugTbl]),
                            selectq([salesSepTbl]))

In [50]:
pd.read_sql_query(sales_union_all.limit(5), con=engine)

Unnamed: 0,date,Lat,Lon,Base,month,id
0,4/18/2014 21:38:00,40.7359,-73.9852,B02682,apr,0
1,4/23/2014 15:19:00,40.7642,-73.9543,B02598,apr,1
2,4/10/2014 7:15:00,40.7138,-74.0103,B02598,apr,2
3,4/11/2014 15:23:00,40.7847,-73.9698,B02682,apr,3
4,4/7/2014 17:26:00,40.646,-73.7767,B02598,apr,4


In [49]:
count = selectq([func.count('*')]).select_from(sales_union_all)
pd.read_sql_query(count, con=engine)

Unnamed: 0,count_1
0,600000

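At the SQL level, the last two steps amount to a `LIMIT` on the compound statement and a `count(*)` over it as a subquery. The sketch below illustrates that shape with `sqlite3`; since the Uber database is not reproduced here, it assumes a small in-memory stand-in built from the April and May auto-sales rows used earlier.

```python
import sqlite3

# In-memory stand-in (NOT the Uber data); rows are copied from the
# sales_apr and sales_may auto-sales outputs earlier in this section.
conn = sqlite3.connect(":memory:")
cols = '"Salesperson" TEXT, "Compact" INT, "Sedan" INT, "SUV" INT, "Truck" INT, id INT'
tables = {
    "sales_apr": [("Ann", 22, 18, 15, 12, 0), ("Bob", 20, 14, 6, 24, 1),
                  ("Yolanda", 19, 10, 28, 17, 2), ("Xerxes", 11, 27, 17, 9, 3)],
    "sales_may": [("Ann", 22, 18, 15, 12, 0), ("Bob", 19, 12, 17, 20, 1),
                  ("Yolanda", 19, 8, 32, 15, 2), ("Xerxes", 12, 23, 18, 9, 3)],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} ({cols})")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?, ?, ?, ?)", rows)

union_all_sql = "SELECT * FROM sales_apr UNION ALL SELECT * FROM sales_may"

# A trailing LIMIT applies to the whole compound statement.
first_five = conn.execute(f"{union_all_sql} LIMIT 5").fetchall()

# Counting rows: wrap the compound statement as a subquery in FROM.
(total,) = conn.execute(f"SELECT count(*) FROM ({union_all_sql})").fetchone()
print(len(first_five), total)
```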
