# Overview

- **normalization procedures** remove redundancy by decomposing a relation schema into multiple smaller schemas
- **lossless-join decomposition**: a schema $R$ is decomposed into $R_1$ and $R_2$ such that for every legal relation $r$ of schema $R$, the following holds: $r = \Pi_{R_1}(r)\bowtie\Pi_{R_2}(r)$
    - if it does not hold, it is a **lossy decomposition**
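
The condition above can be tried out on a concrete relation instance. The following is a minimal sketch, assuming rows are stored as Python dictionaries; the helper names (`project`, `natural_join`, `is_lossless_on`) and the sample data are hypothetical. Note that it checks only one instance, whereas the definition quantifies over every legal relation, so a `True` result does not prove the decomposition is lossless.

```python
def project(rows, attrs):
    """Projection onto attrs; the result is a set, so duplicate tuples collapse."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of tuples, each a frozenset of (attribute, value) pairs."""
    joined = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            # Two tuples combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(frozenset({**ld, **rd}.items()))
    return joined


def is_lossless_on(rows, r1, r2):
    """True if this particular instance equals the join of its two projections."""
    original = {frozenset(row.items()) for row in rows}
    return natural_join(project(rows, r1), project(rows, r2)) == original


# Hypothetical instance over (ID, name, dept_name, building).
rows = [
    {"ID": 1, "name": "Kim", "dept_name": "Physics", "building": "Watson"},
    {"ID": 2, "name": "Kim", "dept_name": "Music", "building": "Packard"},
]
split_on_name = is_lossless_on(rows, {"ID", "name"}, {"name", "dept_name", "building"})
split_on_dept = is_lossless_on(rows, {"ID", "name", "dept_name"}, {"dept_name", "building"})
print(split_on_name, split_on_dept)  # False True
```

Splitting on `name` produces spurious tuples because the two people share a name, while splitting on `dept_name` reproduces the original instance.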

# Dependency Preservation

- FDs are both an asset and a liability
    - they let us constrain the set of legal relations; in principle, such constraints can be checked and enforced by the database to prevent anomalies
    - checking that a relation satisfies a set $F$ of FDs carries a cost
    - we want to minimize that cost by avoiding joins and Cartesian products

- **Goal for decomposition procedures**: ensure that all FDs can be enforced in the decomposed schema by checking them against one table at a time

## Verification

- Let $R$ be a relation schema and let $F$ be a set of FDs for $R$
- consider a lossless-join decomposition of $R$ into $R_1, R_2, ..., R_n$
- let $F_i$ denote the set of dependencies in $F^+$ that include only attributes in $R_i$
- the decomposition is **dependency preserving** if and only if
    - $(F_1 \cup F_2 \cup ... \cup F_n)^+ = F^+$
- if the above does not hold, then checking updates for violation of FDs may require computing potentially expensive joins
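
Computing $F^+$ directly is exponential, so the condition above is usually tested another way. The sketch below uses the standard attribute-closure test rather than the definition itself: for each $\alpha \rightarrow \beta$ in $F$, it grows $\alpha$ using only what can be derived inside one $R_i$ at a time and then checks whether $\beta$ was reached. The function names and the representation of FDs as pairs of Python sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, schemas, fds):
    """True if fd follows from the FDs that can be checked inside single schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # Attributes derivable from the part of result visible in this schema.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def is_dependency_preserving(schemas, fds):
    return all(preserves(fd, schemas, fds) for fd in fds)


# Illustrative input: R(A,B,C,D,E) with F = {A -> B, BC -> D}.
fds = [(set("A"), set("B")), (set("BC"), set("D"))]
schemas = [set("AB"), set("ACD"), set("ACE")]
print(is_dependency_preserving(schemas, fds))  # False
```

Here $BC \rightarrow D$ is lost: no schema contains $B$, $C$, and $D$ together, and nothing checkable inside a single schema implies it.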

## Decomposing a Schema into BCNF

- suppose we have a relation schema $R$ and a set $F$ of FDs such that some non-trivial dependency $\alpha \rightarrow \beta$ in $F$ causes a BCNF violation, i.e. $\alpha$ is not a superkey for $R$
- to avoid this violation, we can decompose $R$ into two relations
    - $(\alpha \cup \beta)$
    - $(R - (\beta - \alpha))$

- **Example**: $R(\underline{\textrm{ID}}, name, salary, dept\_name, building, budget)$
    - $\alpha = dept\_name$
    - $\beta = building, budget$
    - $(\alpha \cup \beta) = (\underline{dept\_name}, building, budget)$
    - $(R - (\beta - \alpha)) = (\underline{\textrm{ID}}, name, salary, dept\_name)$
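
The single decomposition step is just two set expressions. A minimal sketch, assuming schemas and FD sides are plain Python sets of attribute names (the function name `split_on_violation` is hypothetical):

```python
def split_on_violation(schema, alpha, beta):
    """Split schema on alpha -> beta into (alpha ∪ beta) and (R − (beta − alpha))."""
    schema, alpha, beta = set(schema), set(alpha), set(beta)
    return alpha | beta, schema - (beta - alpha)


R = {"ID", "name", "salary", "dept_name", "building", "budget"}
department, instructor = split_on_violation(R, {"dept_name"}, {"building", "budget"})
print(sorted(department))  # ['budget', 'building', 'dept_name']
print(sorted(instructor))  # ['ID', 'dept_name', 'name', 'salary']
```

Both results keep $\alpha$ (`dept_name`), which is what makes the natural join on $\alpha$ recover the original tuples.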
    

    


## BCNF Decomposition Procedure

- **input** := $R, F$
- result := $\{R\}$
- **while** there is a schema $R_i$ in result that is not in BCNF w.r.t. $F$, **do**
    - let $\alpha \rightarrow \beta$ be a nontrivial FD that holds on $R_i$ such that $(\alpha \cap \beta) = \emptyset$ and $\alpha$ is not a superkey for $R_i$
    - remove relation $R_i$ from result
    - add a relation on the attributes $R_i - \beta$ to result
    - add a relation on the attributes $\alpha\beta$ to result
- **output** result

- each $R_i$ returned is in BCNF w.r.t. $F$, and the decomposition is lossless-join
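
The procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: FDs are pairs of sets, a violating FD is found by brute force over every subset $\alpha$ of $R_i$ (exponential, so only suitable for small schemas), and $\beta$ is chosen as everything in $R_i$ that $\alpha$ determines. The names `find_violation` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (alpha, beta) with alpha -> beta nontrivial on schema and alpha not a superkey, else None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            alpha = frozenset(combo)
            determined = closure(alpha, fds) & schema
            beta = determined - alpha  # disjoint from alpha by construction
            if beta and determined != schema:
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    result = []
    pending = [frozenset(schema)]
    while pending:
        current = pending.pop()
        violation = find_violation(current, fds)
        if violation is None:
            result.append(current)  # already in BCNF
        else:
            alpha, beta = violation
            pending.append(current - beta)  # R_i − beta
            pending.append(alpha | beta)    # alpha beta
    return result


fds = [(set("A"), set("B")), (set("BC"), set("D"))]
for schema in bcnf_decompose("ABCDE", fds):
    print("".join(sorted(schema)))
```

The result depends on which violating FD is picked first, so a different search order may yield a different (still valid) BCNF decomposition.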


### BCNF Decomposition Example 

- $R(A,B,C,D,E)$, $F=\{A\rightarrow B, BC \rightarrow D\}$
    - $A \rightarrow B$ violates BCNF because $A$ is not a superkey
    - so $R_1(A, B), R_2(A,C,D,E)$
    - $R_1$ is in BCNF because it only has two attributes
    - apply the general BCNF test to $R_2$
        - $(AC)^+ = ABCD$ => $AC \rightarrow D$ holds and violates BCNF
    - next, decompose $R_2$ into $R_{2.1}(A,C,D)$ and $R_{2.2}(A,C,E)$
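        - apply the general BCNF test to $R_{2.1}$ and $R_{2.2}$: in $R_{2.1}$ the only nontrivial dependencies have the superkey $AC$ on the left (e.g. $AC \rightarrow D$), and $R_{2.2}$ has only trivial dependencies, so both of them are in BCNF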
    - output: $\{R_1(A,B), R_{2.1}(A,C,D), R_{2.2}(A,C,E)\}$
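    - this is not dependency preserving because $BC\rightarrow D$ cannot be checked against any single relation: no $R_i$ contains all of $B$, $C$, and $D$, and the dependencies that can be checked locally ($A \rightarrow B$ and $AC \rightarrow D$) do not imply it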
    

## 3NF Decomposition Procedure

- **input**: $R$, $F$
- $F_c := \textrm{canonical cover for }F$
- $i := 0$
- **for each** FD $\alpha\rightarrow\beta$ in $F_c$ **do**
    - $i := i + 1$
    - $R_i := \alpha \beta$
- **if** none of the schemas $R_j$, for $1 \le j \le i$, contains a candidate key for $R$ **then**
    - $i := i+1$
    - $R_i :=$ any candidate key for $R$
    
/* optionally, remove redundant relations */
- **repeat**
    - **if** any schema $R_j$ is contained in another schema $R_k$, **then**
        - delete $R_j$ by overwriting it with the last schema: $R_j := R_i$, $i := i - 1$
- **until** no more $R_j$ can be deleted
- **return** $(R_1, R_2, ..., R_i)$
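
A compact Python sketch of the procedure above. It assumes the canonical cover $F_c$ and one candidate key for $R$ are supplied by the caller (computing them is not shown here), represents FDs as pairs of sets, and replaces the in-place deletion loop with a filter that drops any schema strictly contained in another. The function name `synthesize_3nf` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def synthesize_3nf(schema, canonical_cover, candidate_key):
    schema = frozenset(schema)
    schemas = []
    # One schema alpha beta per FD alpha -> beta in the canonical cover.
    for lhs, rhs in canonical_cover:
        r_i = frozenset(lhs | rhs)
        if r_i not in schemas:
            schemas.append(r_i)
    # R_j contains a candidate key exactly when its closure is all of R.
    if not any(closure(s, canonical_cover) >= schema for s in schemas):
        schemas.append(frozenset(candidate_key))
    # Optional clean-up: drop schemas strictly contained in another schema.
    return [s for s in schemas if not any(s < other for other in schemas)]


# Illustrative input: R(A,B,C,D) with F_c = {A -> B, B -> C} and candidate key AD.
fc = [(set("A"), set("B")), (set("B"), set("C"))]
for s in synthesize_3nf("ABCD", fc, "AD"):
    print("".join(sorted(s)))  # AB, then BC, then AD
```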

### Caveats in 3NF Decomposition

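- $F_c$ (the canonical cover) is not the same as $F^+$ (the closure): $F_c$ is a minimal set of FDs equivalent to $F$, whereas $F^+$ contains every FD implied by $F$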
- To check whether $R_j$ contains a candidate key, let $\alpha = R_j$ then compute $\alpha^+$ using $F_c$, and test whether $\alpha^+ = R$
- if none of the $R_j$ contains a candidate key for $R$, then we must find a candidate key $\alpha$ for $R$ and create a new relation over $\alpha$; to find $\alpha$:
    - first find any superkey $\gamma$ for $R$
    - then try to remove attributes from $\gamma$ (one by one) to make it minimal
    - let $\alpha$ be the final $\gamma$
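
The key search described above can be sketched as follows, assuming FDs are pairs of sets and taking $\gamma = R$ as the starting superkey (any other superkey would work as well). The name `find_candidate_key` is hypothetical. One pass over the attributes is enough: if dropping an attribute breaks the superkey property at some point, it would also break it for any smaller set reached later.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_candidate_key(schema, fds):
    schema = set(schema)
    key = set(schema)  # R itself is always a superkey
    for attr in sorted(schema):
        reduced = key - {attr}
        # Drop attr only if what remains still determines every attribute of R.
        if closure(reduced, fds) >= schema:
            key = reduced
    return key


fds = [(set("A"), set("B")), (set("B"), set("C"))]
print(sorted(find_candidate_key("ABCD", fds)))  # ['A', 'D']
```

Which candidate key is returned depends on the order in which attributes are tried when a schema has more than one.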

### 3NF Decomposition Example

- $R(A, B, C, D), F = \{AB \rightarrow CD, B \rightarrow C, AC \rightarrow B, B \rightarrow D\}$
    - candidate keys are $AB$ and $AC$
    - apply the simplified 3NF test since $F$ only refers to attributes in $R$
        - $R$ is not in 3NF w.r.t. $F$ because $B\rightarrow D$ holds and $B$ is not a superkey and $D$ is not part of any candidate key
    - compute a canonical cover
        - after removing extraneous attributes and applying the union rule, $F_c = \{B\rightarrow CD, AC \rightarrow B\}$
    - the 3NF decomposition first creates $R_1(B,C,D)$ and $R_2(A,B,C)$
    - no redundant relations and $R_2$ contains a candidate key
    - output $\{R_1, R_2\}$

- 3NF decomposition is **always dependency-preserving**
    - in this example, every FD in $F_c$ can be checked against one of $R_1$ and $R_2$

### 3NF Decomposition Example 2

- $R(A, B, C, D), F = \{A \rightarrow B, B \rightarrow C\}$
    - find the candidate keys
        - note that $A$ and $D$ must appear in every candidate key, since there is no functional dependency in $F$ where $A$ or $D$ appears on the right
        - $(AD)^+ = ABCD$ => $AD$ is the one and only candidate key
    - apply the simplified 3NF test since $F$ only refers to attributes in $R$
        - $R$ is not in 3NF, since $A\rightarrow B$ holds where $A$ is not a superkey and $B$ is not part of any candidate key
    - $F$ itself is a canonical cover, so $F_c = F$
    - the 3NF decomposition first creates $R_1(A, B)$ and $R_2(B,C)$
    - next, add $R_3(A,D)$ since neither $R_1$ nor $R_2$ contains a candidate key
    - none of the relations are redundant
    - output $\{R_1, R_2, R_3\}$