Activity 7-2
------------

First we'll load some data, some tools from last time, and some utilities for displaying tables side by side:

In [1]:
%load_ext sql
%sql sqlite://



'Connected: None@None'

In [14]:
%%sql
DROP TABLE IF EXISTS Player;
CREATE TABLE Player(uniform_number INT, team TEXT, position TEXT, first_name TEXT, last_name TEXT);
INSERT INTO Player VALUES (1, 'Stanford', 'WR', 'Bob', 'Jones');
INSERT INTO Player VALUES (2, 'Stanford', 'RB', 'Joe', 'Bobson');
INSERT INTO Player VALUES (1, 'UCLA', 'WR', 'Bob', 'Roberts');

Done.
Done.
1 rows affected.
1 rows affected.
1 rows affected.


[]

In [20]:
from IPython.display import display_html, HTML

def to_html_table(res, style=None):
    """Render an ipython-sql result set as a simple HTML table."""
    html = '<table' + (' style="' + style + '"' if style else '') + '><tr><th>'
    html += '</th><th>'.join(res.keys) + '</th></tr><tr><td>'
    html += '</td></tr><tr><td>'.join(['</td><td>'.join([str(cell) for cell in row]) for row in list(res)])
    return html + '</td></tr></table>'  # close the last cell as well as the last row

def display_side_by_side(l, r):
    """Show two result sets next to each other."""
    s = "display: inline-block;"
    html = to_html_table(l, style=s) + ' ' + to_html_table(r, style=s)
    display_html(HTML(data=html))

In [16]:
import itertools

def to_set(x):
    """Normalize a single attribute or a collection of attributes into a set."""
    if isinstance(x, set):
        return x
    elif isinstance(x, (list, tuple, frozenset)):
        return set(x)
    elif isinstance(x, (str, int)):
        return set([x])
    else:
        raise TypeError("Unrecognized type: %r" % type(x))
def fd_to_str(fd):
    lhs, rhs = fd  # Python 3 removed tuple unpacking in parameter lists
    return ",".join(sorted(to_set(lhs))) + " -> " + ",".join(sorted(to_set(rhs)))
def fds_to_str(fds): return "\n\t".join(map(fd_to_str, fds))
def set_to_str(x): return "{" + ",".join(sorted(x)) + "}"
def fd_applies_to(fd, x):
    """True if the FD's left-hand side is contained in the attribute set x."""
    lhs, rhs = map(to_set, fd)
    return lhs.issubset(x)
def compute_closure(x, fds, verbose=False):
    """Compute the attribute closure x^{+} under the FDs in fds."""
    bChanged = True                 # We will repeat until there are no changes.
    x_ret    = to_set(x).copy()     # Make a copy of the input to hold x^{+}
    while bChanged:
        bChanged = False            # Reset; set back to True if any FD adds attributes
        for fd in fds:              # loop through all the FDs.
            (lhs, rhs) = map(to_set, fd)  # recall: lhs -> rhs
            if fd_applies_to(fd, x_ret) and not rhs.issubset(x_ret):
                x_ret = x_ret.union(rhs)
                if verbose:
                    print("Using FD " + fd_to_str(fd))
                    print("\t Updated x to " + set_to_str(x_ret))
                bChanged = True
    return x_ret
def is_superkey_for(A, X, fds, verbose=False):
    """True if attribute set A functionally determines every attribute in X."""
    return to_set(X).issubset(compute_closure(A, fds, verbose=verbose))
def is_key_for(A, X, fds, verbose=False):
    """True if A is a minimal superkey: removing any one attribute breaks it."""
    subsets = set(itertools.combinations(A, len(A)-1))
    return is_superkey_for(A, X, fds) and \
        all([not is_superkey_for(set(SA), X, fds) for SA in subsets])

Suppose we have a table of football players:

In [17]:
%sql SELECT * FROM Player;

Done.


uniform_number,team,position,first_name,last_name
1,Stanford,WR,Bob,Jones
2,Stanford,RB,Joe,Bobson
1,UCLA,WR,Bob,Roberts


Suppose also that the following functional dependencies (FDs) hold, each written as an `(lhs, rhs)` pair:

In [18]:
F = [
    ('uniform_number','position'),
    (set(['position','last_name']),'uniform_number')
]

In other words, in this league:

1. A uniform number is always associated with one specific position (for example, number 1 always belongs to a WR).
2. _Across all teams_, a player's position plus their last name uniquely determines their uniform number.
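To see which of these FDs cause trouble for BCNF, here is a small self-contained sketch. It re-implements a closure function rather than reusing `compute_closure`, so it runs on its own; the names `ATTRS`, `FDS`, `closure`, `bcnf_violations`, and `candidate_keys` are hypothetical, chosen only to mirror the `Player` table and $F$. It reports every nontrivial FD whose left-hand side is not a superkey, plus the candidate keys found by brute force.

```python
from itertools import combinations

# Illustrative sketch: ATTRS, FDS and the helper names are hypothetical,
# chosen to mirror the Player table and the FDs in F.
ATTRS = {"uniform_number", "team", "position", "first_name", "last_name"}
FDS = [
    ({"uniform_number"}, {"position"}),
    ({"position", "last_name"}, {"uniform_number"}),
]

def closure(attrs, fds):
    """Return attrs+: every attribute implied by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attrs, fds):
    """Nontrivial FDs whose left-hand side is not a superkey of attrs."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != attrs]

def candidate_keys(attrs, fds):
    """Minimal superkeys, found by brute force from the smallest subsets up."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            # Skip supersets of keys already found: they are not minimal.
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys

print(sorted(closure({"uniform_number"}, FDS)))  # ['position', 'uniform_number']
for lhs, rhs in bcnf_violations(ATTRS, FDS):
    print("BCNF violation:", sorted(lhs), "->", sorted(rhs))
for key in candidate_keys(ATTRS, FDS):
    print("candidate key:", sorted(key))
```

With only these two FDs, neither left-hand side is a superkey, so both FDs violate BCNF in `Player`. Both candidate keys contain `team`, `first_name`, and `last_name`, since no FD determines those attributes.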

### Exercise 1

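Determine and carry out a BCNF decomposition that is _lossy_ in the sense of not being dependency-preserving, i.e. one that results in a 'lost' FD. (The standard BCNF decomposition always has a lossless join, so no tuples are lost; what is lost is the ability to check an FD within a single table.) Show that the FD is lost by inserting some tuples into the decomposed tables, **respecting the remaining local FDs**, then joining the decomposed tables back together: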
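The closure-based BCNF decomposition can be sketched as follows, assuming the same set-based FD representation (all names here are hypothetical): find a set $X$ whose closure restricted to the current relation, $X^+ \cap R$, is neither $X$ nor all of $R$; split $R$ into $X^+ \cap R$ and $X \cup (R - X^+)$; and recurse on both pieces.

```python
from itertools import combinations

# Hypothetical names; FDS mirrors F from the notebook.
ATTRS = {"uniform_number", "team", "position", "first_name", "last_name"}
FDS = [({"uniform_number"}, {"position"}),
       ({"position", "last_name"}, {"uniform_number"})]

def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_decompose(rel, fds):
    """Split rel until every subset X has closure (within rel) equal to X or rel.
    Brute force over all subsets, smallest first; fine for a handful of attributes."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:      # X is a BCNF violation in rel
                r1 = x_plus                         # X+ (within rel)
                r2 = x | (rel - x_plus)             # X plus what X does not determine
                return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)
    return [rel]

for table in bcnf_decompose(set(ATTRS), FDS):
    print(sorted(table))
# ['position', 'uniform_number']
# ['first_name', 'last_name', 'team', 'uniform_number']
```

Scanning subsets smallest-first in alphabetical order, the first violation found is $X = \{\mathrm{uniform\_number}\}$, which yields exactly the tables `A(uniform_number, position)` and `B(uniform_number, team, first_name, last_name)` built below. A different scan order could split on $\{\mathrm{position}, \mathrm{last\_name}\}$ first and produce a different, equally valid, BCNF decomposition.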

In [26]:
%sql DROP TABLE IF EXISTS A;
%sql CREATE TABLE A AS SELECT DISTINCT uniform_number, position FROM Player;
%sql DROP TABLE IF EXISTS B;
%sql CREATE TABLE B AS SELECT DISTINCT uniform_number, team, first_name, last_name FROM Player;
l = %sql SELECT * FROM A;
r = %sql SELECT * FROM B;
display_side_by_side(l,r)

Done.
Done.
Done.
Done.
Done.
Done.


uniform_number,position
1,WR
2,RB

uniform_number,team,first_name,last_name
1,Stanford,Bob,Jones
2,Stanford,Joe,Bobson
1,UCLA,Bob,Roberts


In [27]:
%%sql
INSERT INTO A VALUES (3, 'WR');
INSERT INTO B VALUES (3, 'UCLA', 'John', 'Jones');

1 rows affected.
1 rows affected.


[]

In [28]:
%%sql
SELECT a.uniform_number, b.team, a.position, b.first_name, b.last_name
FROM A AS a, B AS b
WHERE a.uniform_number = b.uniform_number;

Done.


uniform_number,team,position,first_name,last_name
1,Stanford,WR,Bob,Jones
1,UCLA,WR,Bob,Roberts
2,Stanford,RB,Joe,Bobson
3,UCLA,WR,John,Jones


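The tuples we inserted don't violate any of the _local_ FDs (the ones preserved by the BCNF decomposition), but the joined table now violates $\{\mathrm{position},\mathrm{last\_name}\}\rightarrow\mathrm{uniform\_number}$: the rows (1, Stanford, WR, Bob, Jones) and (3, UCLA, WR, John, Jones) agree on position and last name but have different uniform numbers!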
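Two quick checks make the loss concrete. The sketch below (helper names are hypothetical) first scans the joined rows shown above for pairs that agree on `position` and `last_name` but not on `uniform_number`. It then applies the standard dependency-preservation test, which avoids computing projected FDs explicitly: starting from $Z = X$, repeatedly add $(Z \cap R_i)^+ \cap R_i$ for each decomposed table $R_i$; the FD $X \rightarrow Y$ is preserved exactly when $Y \subseteq Z$ at the fixpoint.

```python
# Joined rows copied from the query output above (hypothetical variable names).
COLS = ["uniform_number", "team", "position", "first_name", "last_name"]
JOINED = [
    (1, "Stanford", "WR", "Bob", "Jones"),
    (1, "UCLA", "WR", "Bob", "Roberts"),
    (2, "Stanford", "RB", "Joe", "Bobson"),
    (3, "UCLA", "WR", "John", "Jones"),
]
FDS = [({"uniform_number"}, {"position"}),
       ({"position", "last_name"}, {"uniform_number"})]
DECOMP = [{"uniform_number", "position"},                         # table A
          {"uniform_number", "team", "first_name", "last_name"}]  # table B

def fd_violations(rows, cols, lhs, rhs):
    """Return (first row, conflicting row) pairs that agree on lhs but not on rhs."""
    li = [cols.index(c) for c in lhs]
    ri = [cols.index(c) for c in rhs]
    first_seen = {}  # lhs values -> first row seen with those values
    bad = []
    for row in rows:
        key = tuple(row[i] for i in li)
        if key not in first_seen:
            first_seen[key] = row
        elif tuple(first_seen[key][i] for i in ri) != tuple(row[i] for i in ri):
            bad.append((first_seen[key], row))
    return bad

def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, decomp, fds):
    """Dependency-preservation test for one FD against a decomposition."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for r in decomp:
            gained = closure(z & r, fds) & r  # what r alone can derive from z
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z

print(fd_violations(JOINED, COLS, ["position", "last_name"], ["uniform_number"]))
for fd in FDS:
    print(sorted(fd[0]), "->", sorted(fd[1]), "preserved:", is_preserved(fd, DECOMP, FDS))
```

The first check finds the Bob Jones / John Jones pair; the second reports `uniform_number -> position` as preserved (it lives entirely in `A`) and `position, last_name -> uniform_number` as not preserved, since neither table contains both `position` and `last_name`.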

### Exercise 2

Discuss how we might deal with this issue in practice while still using BCNF.
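As one possible starting point for the discussion, here is a sketch using Python's built-in `sqlite3` module. It assumes a single writer and that every write goes through one helper; `insert_checked` and `CHECK_LOST_FD` are hypothetical names. The idea is to keep the BCNF tables, enforce the local FD `uniform_number -> position` with a key on `A`, and re-check the lost FD with a join query inside each write transaction, rolling back if it fails.

```python
import sqlite3

# Hypothetical check query: (position, last_name) groups in A JOIN B that map
# to more than one uniform_number, i.e. violations of the lost FD.
CHECK_LOST_FD = """
    SELECT a.position, b.last_name, COUNT(DISTINCT a.uniform_number) AS n
    FROM A AS a JOIN B AS b ON a.uniform_number = b.uniform_number
    GROUP BY a.position, b.last_name
    HAVING COUNT(DISTINCT a.uniform_number) > 1
"""

def make_db():
    """Build the decomposed tables A and B holding the original three players."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- The key on A enforces the local FD uniform_number -> position.
        CREATE TABLE A(uniform_number INT PRIMARY KEY, position TEXT);
        CREATE TABLE B(uniform_number INT, team TEXT, first_name TEXT, last_name TEXT);
        INSERT INTO A VALUES (1, 'WR'), (2, 'RB');
        INSERT INTO B VALUES (1, 'Stanford', 'Bob', 'Jones'),
                             (2, 'Stanford', 'Joe', 'Bobson'),
                             (1, 'UCLA', 'Bob', 'Roberts');
    """)
    return conn

def insert_checked(conn, a_rows, b_rows):
    """Insert into A and B in one transaction; roll back if the join
    violates {position, last_name} -> uniform_number."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.executemany("INSERT INTO A VALUES (?, ?)", a_rows)
        conn.executemany("INSERT INTO B VALUES (?, ?, ?, ?)", b_rows)
        bad = conn.execute(CHECK_LOST_FD).fetchall()
        if bad:
            raise ValueError(f"lost FD violated: {bad}")

conn = make_db()

# The tuples from Exercise 1 are now rejected and rolled back...
try:
    insert_checked(conn, [(3, "WR")], [(3, "UCLA", "John", "Jones")])
except ValueError as err:
    print(err)  # lost FD violated: [('WR', 'Jones', 2)]

# ...while a compatible player is accepted.
insert_checked(conn, [(3, "WR")], [(3, "UCLA", "John", "Smith")])
print(conn.execute("SELECT COUNT(*) FROM B").fetchone()[0])  # 4
```

The price is a join on every write, which is exactly the overhead a dependency-preserving decomposition would avoid; triggers, or accepting the redundancy of a 3NF table, are alternatives with their own trade-offs.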