### Note: You may need to install `markdown` for full display functionality:

    pip install markdown
    
_If you can't install it, you will miss out on the rendered relational algebra expressions, but everything else should still work._

In [1]:
%load_ext sql
%sql sqlite://

%load_ext autoreload
%autoreload 2

# To help render markdown
from IPython.display import display, HTML
from markdown import markdown
def render_markdown_raw(m): return display(HTML(markdown(m))) # must be last element of cell.
def render_markdown(m): return render_markdown_raw(m.toMD())
def cost_markdown(q): 
    q.reset_count()
    get_result(q) # run the counters
    return display(HTML(markdown("Total Reads: {0}\n\n".format(q.total_count()) + q.toCount(0))))

# import the relational algebra operators
from relation_algebra import Select, Project, Union, Difference, NJoin, CrossProduct, BaseRelation
from relation_algebra import get_result, compare_results

import random



Relational Algebra: Practice Notebook
====================

Since we didn't get to cover relational algebra (RA) on a problem set, we're providing this notebook so you can get some practice in before the final exam. Solutions will be posted in a separate notebook. Try the problems on your own first, then check your understanding against the solutions!

In particular, you should understand:
* How to go from SQL query -> RA expression
* How to go from RA expression -> SQL query
* How to optimize an RA expression by commuting operators

**_Note that some of the problems here will be slightly more involved than what would be on the exam!_**

### Generating test instances

Below, we create four test relations and populate them with values. You can and should play around with different test instances!

In [2]:
%%sql
drop table if exists R; create table R(A int, B int);
drop table if exists S; create table S(B int, C int);
drop table if exists T; create table T(C int, D int);
drop table if exists U; create table U(D int, E int);

Done.
Done.
Done.
Done.
Done.
Done.
Done.
Done.


[]

In [3]:
for x in range(0,10,2):
    for y in range(0,10,3):
        %sql INSERT INTO R VALUES (:x, :y);
for x in range(0,20,4):
    for y in range(0,10,2):
        %sql INSERT INTO S VALUES (:x, :y);
for x in range(0,5,1):
    for y in range(0,10,2):
        %sql INSERT INTO T VALUES (:x, :y);
for x in range(0,10,2):
    for y in range(0,5,1):
        %sql INSERT INTO U VALUES (:x, :y);

1 rows affected.

## 1. Tutorial: Relational Algebra Python Toolkit

We'll use a Python toolkit we made to play around with RA. We'll get started with a quick tutorial, but the syntax should also be pretty intuitive (feel free to look at the source code too!).

**NOTE: Knowing how to use this toolkit is not necessary for the final, just for using this notebook!**

#### BaseRelation class

Recall that in our RA operations we'll deal with sets; to get started, we need to take SQL output and turn it into a `BaseRelation` object, which we can optionally name:

In [4]:
r = %sql SELECT * FROM R;
R = BaseRelation(r, name="R")

s = %sql SELECT * FROM S;
S = BaseRelation(s, name="S")

t = %sql SELECT * FROM T;
T = BaseRelation(t, name="T")

u = %sql SELECT * FROM U;
U = BaseRelation(u, name="U")

Done.
Done.
Done.
Done.


For **all operators in our toolkit**, we can use `get_result` to see the set we have:

In [5]:
print get_result(R)

[(0, 0), (0, 3), (0, 6), (0, 9), (2, 0), (2, 3), (2, 6), (2, 9), (4, 0), (4, 3), (4, 6), (4, 9), (6, 0), (6, 3), (6, 6), (6, 9), (8, 0), (8, 3), (8, 6), (8, 9)]
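
To make the set semantics concrete, here is a minimal sketch in plain Python that does not use the toolkit. The names `schema_R` and `rows_R` are hypothetical stand-ins for what a `BaseRelation` holds; the toolkit's internals may well differ.

```python
# Hypothetical stand-in for relation R(A, B): attribute names plus a set of tuples.
# The tuples mirror the INSERT loop used for R earlier in this notebook.
schema_R = ("A", "B")
rows_R = {(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)}

size_before = len(rows_R)
rows_R.add((0, 0))  # (0, 0) is already present, so the set does not grow
size_after = len(rows_R)

print(size_before == size_after)
```

Because a set has no duplicates and no ordering, two relations are equal exactly when they contain the same tuples, regardless of how the tuples are listed.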


And (again **for all operators in our toolkit**) we can use `render_markdown` to pretty-print the expression, e.g. `render_markdown(R)`:

**_NOTE: This function requires that you have installed the `markdown` Python library. It's only needed for pretty printing, so if you weren't able to install this library, don't worry!_**

In [6]:
render_markdown(R)

### Basic Operators

#### Selection

_Note that in the current version of our RA toolkit, only equality selection is supported_

In [7]:
x = Select("A", 2, R)
render_markdown(x)
print get_result(x)

[(2, 0), (2, 3), (2, 6), (2, 9)]
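
As a rough illustration of what equality selection computes, the sketch below filters a set of tuples by one attribute. `select_eq` is a hypothetical helper written for this note, not the toolkit's `Select`.

```python
def select_eq(schema, rows, attr, value):
    """Equality selection: keep the tuples whose `attr` column equals `value`."""
    i = schema.index(attr)  # position of the attribute in each tuple
    return {row for row in rows if row[i] == value}

schema_R = ("A", "B")
rows_R = {(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)}

selected = select_eq(schema_R, rows_R, "A", 2)
print(sorted(selected))
```

Selection never changes the schema; it only removes tuples.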


#### Projection

In [8]:
x = Project(["A"], R)
render_markdown(x)
print get_result(x)

[(2,), (8,), (0,), (6,), (4,)]
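
The result above has only five tuples, in no particular order, because projection under set semantics removes duplicates. A minimal sketch in plain Python, where `project` is a hypothetical helper and not the toolkit's `Project`:

```python
def project(schema, rows, attrs):
    """Projection: keep only the columns in `attrs`; the set drops duplicates."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}

schema_R = ("A", "B")
rows_R = {(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)}

out_schema, out_rows = project(schema_R, rows_R, ["A"])
print(sorted(out_rows))
```

Here 20 input tuples collapse to 5, one per distinct value of `A`.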


#### Cross-Product

_Note that the schemas of the two input expressions must be **distinct**_

In [9]:
x = CrossProduct(R,T)
render_markdown(x)
# Warning: generates a lot of output!
# print get_result(x)
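
To see why the output is large, the following sketch builds the cross product of R and T with plain Python sets. `cross_product` is a hypothetical helper; the schema check mirrors the distinct-schema requirement noted above, though the toolkit may enforce it differently.

```python
from itertools import product

def cross_product(schema1, rows1, schema2, rows2):
    """Pair every tuple of the first relation with every tuple of the second."""
    if set(schema1) & set(schema2):
        raise ValueError("cross product requires distinct schemas")
    return schema1 + schema2, {r1 + r2 for r1, r2 in product(rows1, rows2)}

schema_R, rows_R = ("A", "B"), {(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)}
schema_T, rows_T = ("C", "D"), {(c, d) for c in range(0, 5, 1) for d in range(0, 10, 2)}

out_schema, out_rows = cross_product(schema_R, rows_R, schema_T, rows_T)
print(len(out_rows))
```

The result size is the product of the input sizes: 20 tuples in R times 25 tuples in T.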

#### Union

_Note that the schemas of the two input expressions must be **the same**_

In [10]:
x = Union(Select("A",0,R), Select("A",4,R))
render_markdown(x)
print get_result(x)

[(0, 0), (4, 9), (4, 6), (0, 6), (4, 3), (0, 9), (0, 3), (4, 0)]


#### Difference

_Note that the schemas of the two input expressions must be **the same**_

In [11]:
x = Difference(R, Select("A", 0, R))
render_markdown(x)
print get_result(x)

[(6, 3), (6, 9), (2, 6), (4, 9), (4, 6), (2, 9), (6, 6), (8, 0), (6, 0), (8, 9), (2, 0), (2, 3), (8, 3), (4, 3), (8, 6), (4, 0)]
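
Union and difference map directly onto Python's set operators. The sketch below reproduces the two results above without the toolkit; `union` and `difference` are hypothetical helpers, and the schema check reflects the same-schema requirement.

```python
def union(schema1, rows1, schema2, rows2):
    """Set union; both inputs must have the same schema."""
    if schema1 != schema2:
        raise ValueError("union requires identical schemas")
    return schema1, rows1 | rows2

def difference(schema1, rows1, schema2, rows2):
    """Set difference; both inputs must have the same schema."""
    if schema1 != schema2:
        raise ValueError("difference requires identical schemas")
    return schema1, rows1 - rows2

schema_R = ("A", "B")
rows_R = {(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)}
a_is_0 = {t for t in rows_R if t[0] == 0}
a_is_4 = {t for t in rows_R if t[0] == 4}

_, u = union(schema_R, a_is_0, schema_R, a_is_4)
_, d = difference(schema_R, rows_R, schema_R, a_is_0)
print(len(u))
print(len(d))
```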


#### Natural Join

In [12]:
x = NJoin(R, S)
render_markdown(x)
print get_result(x)

[(0, 0, 0), (0, 0, 2), (0, 0, 4), (0, 0, 6), (0, 0, 8), (2, 0, 0), (2, 0, 2), (2, 0, 4), (2, 0, 6), (2, 0, 8), (4, 0, 0), (4, 0, 2), (4, 0, 4), (4, 0, 6), (4, 0, 8), (6, 0, 0), (6, 0, 2), (6, 0, 4), (6, 0, 6), (6, 0, 8), (8, 0, 0), (8, 0, 2), (8, 0, 4), (8, 0, 6), (8, 0, 8)]
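
Every tuple above has `B = 0` because 0 is the only `B` value that R and S share. The sketch below shows one way a natural join could be computed with a tuple nested loop; `natural_join` is a hypothetical helper, not the toolkit's `NJoin`.

```python
def natural_join(schema1, rows1, schema2, rows2):
    """Join on all attribute names the two schemas share, keeping each once."""
    shared = [a for a in schema1 if a in schema2]
    extra = [a for a in schema2 if a not in schema1]
    out_schema = tuple(schema1) + tuple(extra)
    out_rows = set()
    for r1 in rows1:
        for r2 in rows2:
            # Tuples match when they agree on every shared attribute.
            if all(r1[schema1.index(a)] == r2[schema2.index(a)] for a in shared):
                out_rows.add(r1 + tuple(r2[schema2.index(a)] for a in extra))
    return out_schema, out_rows

schema_R, rows_R = ("A", "B"), {(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)}
schema_S, rows_S = ("B", "C"), {(b, c) for b in range(0, 20, 4) for c in range(0, 10, 2)}

out_schema, out_rows = natural_join(schema_R, rows_R, schema_S, rows_S)
print(out_schema)
print(len(out_rows))
```

With no shared attributes, the `all(...)` test is vacuously true and this sketch degenerates to a cross product.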


### Compositionality

Most importantly, these operators are all compositional, so you can pass them in as inputs to each other (as we already did with passing `BaseRelation` into the operators above)!

In [13]:
x = Project(["A"], Select("A", 0, R))
render_markdown(x)
print get_result(x)

[(0,)]


#### Checking equivalence

You can use the `compare_results` function to check whether two different RA expressions produce the same result set. This also means you can compare an RA expression with a SQL query: just wrap the SQL query result in a `BaseRelation` object.

In [14]:
x = Project(["A"], Select("A", 0, R))
render_markdown(x)

y = Select("A", 0, Project(["A"], R))
render_markdown(y)

compare_results(x,y)

True
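
The equivalence checked above can be reproduced with plain Python sets. This is only a sketch of the idea (compare the two result sets), and it assumes nothing about how `compare_results` is implemented.

```python
rows_R = {(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)}  # R(A, B)

# Project(["A"], Select("A", 0, R)): select first, then project onto A.
x = {(a,) for (a, b) in rows_R if a == 0}

# Select("A", 0, Project(["A"], R)): project first, then select.
projected = {(a,) for (a, b) in rows_R}
y = {t for t in projected if t[0] == 0}

print(x == y)
```

The two orders agree here because the projection keeps `A`, the attribute the selection tests. If the projection dropped `A`, the selection could no longer be applied after it.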

## 2. SQL -> RA

Let's go through some examples where we'll translate SQL to relational algebra. Note that you can use the tools above to debug and test your answers.

**Translate each of the queries below from SQL into RA, then check for equivalence using the `compare_results` function.**
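
As a warm-up on a query that is not one of the exercises, the sketch below runs `SELECT DISTINCT R.B FROM R WHERE R.A = 4` with Python's standard `sqlite3` module and compares it against a hand-written evaluation of the RA expression π_B(σ_{A=4}(R)). It deliberately avoids the toolkit and the `%sql` magic, so the table setup here is an assumption that mirrors the test instance created earlier.

```python
import sqlite3

# Rebuild R(A, B) in a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R(A int, B int)")
rows_R = [(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)]
conn.executemany("INSERT INTO R VALUES (?, ?)", rows_R)

# SQL side: DISTINCT gives set semantics.
sql_result = set(conn.execute("SELECT DISTINCT R.B FROM R WHERE R.A = 4").fetchall())

# RA side: selection on A = 4, then projection onto B.
ra_result = {(b,) for (a, b) in rows_R if a == 4}

same = sql_result == ra_result
print(same)
conn.close()
```

The `WHERE` clause becomes a selection, the `SELECT DISTINCT` list becomes a projection, and the `FROM` clause supplies the base relation.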

### Exercise 2(a)

In [None]:
%%sql
SELECT DISTINCT *
FROM R
WHERE R.A = 2;

In [None]:
X = %sql SELECT DISTINCT * FROM R WHERE R.A = 2;
x = BaseRelation(X)

# Your RA expression here
y = 
render_markdown(y)

# Compare results here
compare_results(x,y)

### Exercise 2(b)

In [None]:
%%sql
SELECT DISTINCT S.B
FROM S
WHERE S.C = 4;

In [None]:
X = %sql SELECT DISTINCT S.B FROM S WHERE S.C = 4;
x = BaseRelation(X)

# Your RA expression here
y = 
render_markdown(y)

# Compare results here
compare_results(x,y)

### Exercise 2(c)

In [None]:
%%sql
SELECT DISTINCT R.A, S.C
FROM R, S
WHERE R.B = S.B;

In [None]:
X = %sql SELECT DISTINCT R.A, S.C FROM R, S WHERE R.B = S.B;
x = BaseRelation(X)

# Your RA expression here
y = 
render_markdown(y)

# Compare results here
compare_results(x,y)

### Exercise 2(d)

In [None]:
%%sql
SELECT DISTINCT R.A, T.D
FROM R, S, T
WHERE R.B = S.B AND S.C = T.C AND R.A = 2 AND S.B = 0;

In [None]:
X = %sql SELECT DISTINCT R.A, T.D FROM R, S, T WHERE R.B = S.B AND S.C = T.C AND R.A = 2 AND S.B = 0;
x = BaseRelation(X)

# Your RA expression here
y = 
render_markdown(y)

# Compare results here
compare_results(x,y)

### Exercise 2(e)

In [None]:
%%sql
SELECT DISTINCT R.A
FROM R
WHERE R.B = 0 OR R.B = 2;

In [None]:
X = %sql SELECT DISTINCT R.A FROM R WHERE R.B = 0 OR R.B = 2;
x = BaseRelation(X)

# Your RA expression here
y = 
render_markdown(y)

# Compare results here
compare_results(x,y)

### Exercise 2(f)

In [None]:
%%sql
SELECT DISTINCT R.A
FROM R
WHERE R.B <> 2;

In [None]:
X = %sql SELECT DISTINCT R.A FROM R WHERE R.B <> 2;
x = BaseRelation(X)

# Your RA expression here
y = 
render_markdown(y)

# Compare results here
compare_results(x,y)

### Exercise 2(g)

In [None]:
%%sql
SELECT DISTINCT R.B, U.E
FROM R, S, T, U
WHERE R.B = S.B AND S.C = T.C AND T.D = U.D
  AND (S.C = 2 OR T.D = 4) AND U.D <> 2;

In [None]:
X = %sql SELECT DISTINCT R.B, U.E FROM R, S, T, U WHERE R.B = S.B AND S.C = T.C AND T.D = U.D AND (S.C = 2 OR T.D = 4) AND U.D <> 2;
x = BaseRelation(X)

# Your RA expression here
y = 
render_markdown(y)
print get_result(y)

# Compare results here
compare_results(x,y)

## 3. RA -> SQL

Now we'll go through some examples where we'll translate relational algebra to SQL. Note that you can use the tools above to debug and test your answers!



### Exercise 3(a)

In [15]:
x = Select("B", 0, Project(["B"], BaseRelation(s, name="S")))
render_markdown(x)
print get_result(x)

[(0,)]


In [None]:
%%sql
--YOUR QUERY HERE!

### Exercise 3(b)

In [16]:
x = Project(["A","E"], Select("A", 2, Select("C", 0, NJoin(R, NJoin(S, NJoin(T,U))))))
render_markdown(x)
print get_result(x)

[(2, 0), (2, 3), (2, 4), (2, 1), (2, 2)]


In [None]:
%%sql
--YOUR QUERY HERE!

### Exercise 3(c)

In [17]:
x = Project(["A","C"],
        NJoin(
            NJoin(Select("B", 0, BaseRelation(r, name="R")), BaseRelation(s, name="S")),
            Select("C", 0, BaseRelation(t, name="T"))))
render_markdown(x)
print get_result(x)

[(8, 0), (2, 0), (0, 0), (6, 0), (4, 0)]


In [None]:
%%sql
--YOUR QUERY HERE

### Exercise 3(d)

In [18]:
x = NJoin(Union(Select("A",2,R), Select("A",4,R)), Difference(Select("C",2,S), Select("B",1,S)))
render_markdown(x)
print get_result(x)

[(2, 0, 2), (4, 0, 2)]


In [None]:
%%sql
--YOUR QUERY HERE

## 4. Optimization of RA Expressions

In this section, we'll optimize RA expressions, i.e. reduce the total IO cost of executing them.

**In these exercises, rewrite the RA expressions, then test their costs using the `cost_markdown` function.**

_Note that execution will usually be done in the most naive way possible (for example, a tuple nested loop join is used for `NJoin`); however, this doesn't affect our optimization decisions._
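
To illustrate why operator order matters under a naive nested loop join, the sketch below counts reads of base-relation tuples for two equivalent plans. The cost model (one read per base tuple scanned, intermediate results free) and the helpers `scan` and `nested_loop_join` are assumptions made for this example; the toolkit's `cost_markdown` may count differently.

```python
R = [(a, b) for a in range(0, 10, 2) for b in range(0, 10, 3)]  # R(A, B), 20 tuples
S = [(b, c) for b in range(0, 20, 4) for c in range(0, 10, 2)]  # S(B, C), 25 tuples

def scan(rows, reads):
    """Yield the tuples of a base relation, counting each one as a read."""
    for row in rows:
        reads[0] += 1
        yield row

def nested_loop_join(outer, inner_rows, reads):
    """Join (A, B) tuples with S(B, C) on B, rescanning S for every outer tuple."""
    for a, b in outer:
        for b2, c in scan(inner_rows, reads):
            if b == b2:
                yield (a, b, c)

# Plan 1: join first, then select A = 2.
reads1 = [0]
plan1 = {t for t in nested_loop_join(scan(R, reads1), S, reads1) if t[0] == 2}

# Plan 2: push the selection below the join, so fewer outer tuples reach it.
reads2 = [0]
filtered = (t for t in scan(R, reads2) if t[0] == 2)
plan2 = set(nested_loop_join(filtered, S, reads2))

print(plan1 == plan2)
print(reads1[0])
print(reads2[0])
```

Both plans return the same five tuples, but plan 1 rescans S for all 20 tuples of R (20 + 20 × 25 = 520 reads) while plan 2 rescans it only for the 4 tuples with `A = 2` (20 + 4 × 25 = 120 reads).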

### Exercise 4(a)

In [19]:
x = Project(["D"], NJoin(T,U))
render_markdown(x)
cost_markdown(x)

In [None]:
# Your more IO-efficient RA expression here:
y = 

# Print & compare
render_markdown(y)
print compare_results(x,y)
cost_markdown(y)

### Exercise 4(b)

In [20]:
x = Select("A", 2, Project(["A","C"], NJoin(R,S)))
render_markdown(x)
cost_markdown(x)

In [None]:
# Your more IO-efficient RA expression here:
y = 

# Print & compare
render_markdown(y)
print compare_results(x,y)
cost_markdown(y)

### Exercise 4(c)

In [21]:
x = Select("C", 0, Project(["A","C"], Select("B", 0, NJoin(NJoin(R, S), T))))
render_markdown(x)
cost_markdown(x)

In [None]:
# Your more IO-efficient RA expression here:
y = 

# Print & compare
render_markdown(y)
print compare_results(x,y)
cost_markdown(y)

### Exercise 4(d)

In [22]:
x = Select("C", 0, Project(["C"], Select("D", 2, Select("A",3,NJoin(R,NJoin(S,T))))))
render_markdown(x)
cost_markdown(x)

In [None]:
# Your more IO-efficient RA expression here:
y = 

# Print & compare
render_markdown(y)
print compare_results(x,y)
cost_markdown(y)