# Relational Database Design

- the goal of relational database design is to generate a set of relation schemas that allows us to store information without unnecessary redundancy, yet also allows easy information retrieval
- this is accomplished by designing schemas that are in an appropriate **normal form**

# Features of Good Relational Designs

- it is possible to generate a set of relation schemas directly from the E-R design
- the quality of the schemas depends on the quality of the E-R design

<img src="img/Snip20190927_26.png" width=80%/>



## Design Alternative: Larger Schemas

**Example** Instead of having the schemas *instructor* and *department*, we have the schema *inst_dept(ID, name, salary, dept_name, building, budget)*, which is equivalent to the result of a natural join on the relations corresponding to *instructor* and *department*

- we have repeated the department information ("building" and "budget") for each instructor in the department
- it is important that *all these tuples agree as to the budget amount*; otherwise the database will be inconsistent
    - in the original design, we store the amount of each budget exactly once in *department*; this suggests that using *inst_dept* is a bad idea, since it **stores the budget amounts redundantly and runs the risk that some user might update the budget amount in one tuple but not all**, thus creating an inconsistency
- also, suppose we are creating a new department in the university; in the design of *inst_dept*, we **cannot represent directly the information concerning a department** unless that department has at least one instructor


## Design Alternative: Smaller Schemas

**Example** Suppose we somehow started with *inst_dept*; how would we recognize that it requires repetition of information and should be split?

- by observing the contents of actual relations on schema *inst_dept*, we could note the repetition of info resulting from having to list the building and budget once for each instructor associated with a department
- however, this approach does not allow us to determine whether an observed lack of repetition is a special case or a manifestation of a general rule

**Functional Dependency**: a rule that lets us enforce constraints such as "each specific value for *dept_name* corresponds to at most one *budget*", even in cases where *dept_name* is not the primary key for the schema in question
- we need to write a rule that says "if there were a schema (*dept_name*, *budget*), then *dept_name* would be able to serve as the primary key"
- $\textrm{dept_name} \rightarrow \textrm{budget}$

Given a rule like this functional dependency, we can recognize the problem in the *inst_dept* schema: *dept_name* functionally determines *budget*, but *dept_name* is not the primary key for *inst_dept*, so the budget amount may have to be repeated






# Database Normalization Motivation

- redundancy in database leads to troublesome anomalies
    - **update anomalies**: a repeated value may be changed in one place but not in another place
    - **insertion anomalies**: in order to insert one value, it becomes necessary to insert some unrelated value
    - **deletion anomalies**: deleting one type of information leads to the loss of another, unrelated type of information

- **Example** *inst_dept(ID, name, salary, dept_name, building, budget)*
    - if one department changes its name, we need to update many rows
    - if an instructor is inserted, we must include the instructor's department budget in the same row
    - if all ECE instructors are deleted, the ECE department no longer has any representation in the database
    
- redundancy can be removed by decomposing attributes and relations, the price is
    - additional processing needed to compute joins
    - additional space needed for tables and indexes
    - must enforce referential integrity constraints
- usually the price is right
    - computation and memory/storage becoming less expensive
    - joins and referential integrity checks can be fast (with small tables and good indexes)
    - dealing with anomalies may require human intervention (slow and costly)

# Repetition Diagnosis and Remedy

- repetition within one tuple
    - the value domains of some attributes are not **atomic**, one value may encode multiple pieces of information
    - *section(course_id, sec_id, semester, year, building, room_number)*
    - ('ECE356', '001', 'W**13**', '**2013**', '**E2**', '**E2**-1303')
    - **Remedy**: break up value domains to create atomic domains
- repetition between tuples
    - attribute values in different tuples are related by **functional dependencies**: one subset of attributes **functionally determines** the values of another subset (a type of constraint present in real data)
    - *inst_dept(ID, name, salary, dept_name, building, budget)*
    - department name, building, and budget are repeated
    - *dept_name $\rightarrow$ building, budget*
    - **Remedy**: decompose relations to avoid specific types of FDs


# Relational Decomposition

- the goal is to produce a **lossless-join decomposition**: decomposition of relation schema $R$ into schemas $R_1$ and $R_2$ such that for every instance $r(R)$, letting $r_1(R_1)$ and $r_2(R_2)$ denote the corresponding decomposed instances, $r=r_1\bowtie r_2$ holds
    - $r=r_1\bowtie r_2$ must hold for every possible relation instance $r$ and corresponding instances $r_1$, $r_2$
- beware of **lossy decompositions**, in which relation $r$ cannot always be reconstructed by joining $r_1$ and $r_2$
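
The difference between a lossless and a lossy decomposition can be seen on a small instance. The sketch below is only an illustration: the relation `r(ID, name, salary)`, its values, and the helper names `project` and `natural_join` are made up for this example. It checks a single instance, whereas the definition requires $r=r_1\bowtie r_2$ for every legal instance.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; a tuple is a frozenset of (attr, value) pairs."""
    return {frozenset((a, t[a]) for a in attrs) for t in rows}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Hypothetical instance: two different instructors who share a name.
rows = [
    {"ID": 1, "name": "Kim", "salary": 50},
    {"ID": 2, "name": "Kim", "salary": 60},
]
r = project(rows, ["ID", "name", "salary"])

# Splitting on ID keeps the link between name and salary.
lossless = natural_join(project(rows, ["ID", "name"]), project(rows, ["ID", "salary"]))
# Splitting on name does not: the join pairs every Kim with every Kim salary.
lossy = natural_join(project(rows, ["ID", "name"]), project(rows, ["name", "salary"]))

print(lossless == r)        # True
print(len(lossy), len(r))   # 4 2
```

The lossy join returns more tuples than the original, yet it carries less information, because we can no longer tell which salary belongs to which instructor.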

<img src="img/Snip20190927_27.png" width=80%/>

<img src="img/Snip20190927_28.png" width=80%/>

# Roadmap

1. define a theory of functional dependencies
2. use functional dependencies to decide whether a particular relation schema $R$ is in "good" form
3. if $R$ is not in "good" form, we decompose it into a set of relation schemas $\{R_1, R_2, ..., R_n\}$ such that
    - each of the new relation schemas is in "good" form
    - the decomposition is a lossless-join decomposition
4. we will study the precise definitions of various levels of "goodness" called **normal forms**, which can be achieved by applying specific decomposition procedures

# First Normal Form (1NF)

- a value domain is **atomic** if its elements are considered to be *indivisible units*
    - examples of non-atomic domains:
        - multi-valued and composite attributes
        - identifiers that can be broken up into parts
- a relational schema $R$ is in **first normal form** if the domains of all attributes of $R$ are atomic
- non-atomic domains are bad
    - complicate storage
    - encourage redundancy
    - lead to information being encoded in business logic

# Theory of Functional Dependencies

- **Functional Dependencies** (FD) are constraints on the set of **legal relations**, ones that conform to some conceptual model of the data, which itself is guided by our informal understanding of the world
- FDs state that the value for a certain set of attributes determines (i.e., constrains) uniquely the value for another set of attributes
    - an FD is a generalization of the notion of a *key*
    - e.g., *dept_name* $\rightarrow$ *building, budget*
    - "dept_name **functionally determines** building and budget"


- Let $R$ be a relation schema where $\alpha \subseteq R$ and $\beta \subseteq R$
- the **functional dependency $\alpha \rightarrow \beta$ holds on a relation instance $r(R)$** if and only if whenever any two tuples $t_1$ and $t_2$ of $r$ agree on the attributes $\alpha$, they also agree on the attributes $\beta$
    - $t_1[\alpha] = t_2[\alpha] \Rightarrow t_1[\beta] = t_2[\beta]$
- the **functional dependency $\alpha \rightarrow \beta$ holds on $R$** if and only if for **any** legal relation instance $r(R)$, the functional dependency holds on $r$
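
The instance-level definition translates directly into a check over the tuples of $r$. The following is a minimal sketch, assuming an instance is represented as a list of dictionaries; the function name `fd_holds` and the sample data are hypothetical. It can only show that an FD holds on one instance, never that it holds on the schema $R$.

```python
def fd_holds(rows, alpha, beta):
    """Return True if alpha -> beta holds on the instance `rows` (a list of dicts)."""
    seen = {}  # maps each alpha-value to the beta-value first seen with it
    for t in rows:
        a = tuple(t[x] for x in alpha)
        b = tuple(t[x] for x in beta)
        if seen.setdefault(a, b) != b:
            return False  # two tuples agree on alpha but differ on beta
    return True

# Hypothetical inst_dept instance (values are made up for illustration).
inst_dept = [
    {"ID": 1, "name": "Kim",  "dept_name": "Physics", "building": "Watson",  "budget": 70000},
    {"ID": 2, "name": "Gold", "dept_name": "Physics", "building": "Watson",  "budget": 70000},
    {"ID": 3, "name": "Wu",   "dept_name": "Finance", "building": "Painter", "budget": 120000},
]

print(fd_holds(inst_dept, ["dept_name"], ["building", "budget"]))  # True
print(fd_holds(inst_dept, ["building"], ["ID"]))                   # False
```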


- $K$ is a superkey for relation schema $R$ if and only if $K\rightarrow R$
- $K$ is a candidate key for $R$ if and only if 
    - $K$ is a superkey, $K\rightarrow R$
    - $K$ is minimal, there is no $\alpha \subset K$ such that $\alpha \rightarrow R$


- FDs allow us to express constraints that cannot be expressed using superkeys

## Use of Functional Dependencies

- test relation instances to see if they are legal under a given set of functional dependencies
    - given a relation schema $R$, if a relation instance $r(R)$ is legal under a set $F$ of functional dependencies (i.e., every FD in $F$ holds on $r$) then we say **$r$ satisfies $F$**
- specify constraints on the set of legal relations
    - given a relation schema $R$, if every legal relation instance $r(R)$ satisfies a set $F$ of functional dependencies then we say **$F$ holds on $R$**
    
- a specific instance of a relation may satisfy a functional dependency even if the functional dependency does not hold on all legal instances of the relation

- a functional dependency is **trivial** if it is satisfied by all instances of a relation; in general, $\alpha \rightarrow \beta$ is trivial if $\beta \subseteq \alpha$


# Closure of a Set of FDs

- the set of **all** functional dependencies logically implied by $F$ is the *closure* of $F$
    - denoted by $F^+$
    - in general $F^+ \supseteq F$
- given a set $F$ of FDs, there may be certain other FDs that are not in $F$ but are logically implied by those in $F$

# Armstrong's Axioms

- $F^+$ can be found by repeatedly applying **Armstrong's Axioms**
    - **reflexivity**: if $\beta \subseteq \alpha$, then $\alpha \rightarrow \beta$
    - **augmentation**: if $\alpha \rightarrow \beta$, then $\gamma\alpha \rightarrow \gamma\beta$
    - **transitivity**: if $\alpha \rightarrow \beta$, and if $\beta \rightarrow \gamma$, then $\alpha \rightarrow \gamma$

- these axioms (**inference rules**) are
    - **sound**: generate only functional dependencies that hold
    - **complete**: generate *all* functional dependencies that hold

**Example** $R=(A,B,C,G,H,I)$, $F=\{A\rightarrow B, A\rightarrow C, CG \rightarrow H, CG \rightarrow I, B \rightarrow H\}$

- obtain additional members of $F^+$ by applying the axioms
    - $A\rightarrow H$
        - by transitivity from $A\rightarrow B$ and $B \rightarrow H$
    - $AG \rightarrow I$
        - augment $A \rightarrow C$ with $G$, then $AG \rightarrow CG$
        - since $CG \rightarrow I$, so $AG \rightarrow I$
    - $CG \rightarrow HI$
        - augment $CG \rightarrow I$ with $CG$, then $CG \rightarrow CGI$
        - augment $CG \rightarrow H$ with $I$, then $CGI \rightarrow HI$
        - then by transitivity, $CG \rightarrow HI$

# Procedure for Computing $F^+$

- input: $F$ (a set of FDs)
- $F^+ := F$

```
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting FD to F+
    for each pair of FD f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting FD to F+
until F+ stops growing
output F+
```

# Additional Inference Rules

- **union**:
    - given $\alpha \rightarrow \beta$ and $\alpha \rightarrow \gamma$
    - then $\alpha \rightarrow \beta\gamma$
- **decomposition**: 
    - given $\alpha \rightarrow \beta\gamma$
    - then $\alpha \rightarrow \beta$ and $\alpha \rightarrow \gamma$
- **pseudotransitivity**:
    - given $\alpha \rightarrow \beta$ and $\gamma\beta \rightarrow \delta$
    - then $\alpha\gamma \rightarrow \delta$
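
These rules add convenience rather than new power: each one can be derived from Armstrong's Axioms alone. A short derivation, in the same style as the worked example for $F^+$:

- **union**: augmenting $\alpha \rightarrow \beta$ with $\alpha$ gives $\alpha \rightarrow \alpha\beta$; augmenting $\alpha \rightarrow \gamma$ with $\beta$ gives $\alpha\beta \rightarrow \beta\gamma$; by transitivity, $\alpha \rightarrow \beta\gamma$
- **decomposition**: by reflexivity, $\beta\gamma \rightarrow \beta$ and $\beta\gamma \rightarrow \gamma$; transitivity with $\alpha \rightarrow \beta\gamma$ gives $\alpha \rightarrow \beta$ and $\alpha \rightarrow \gamma$
- **pseudotransitivity**: augmenting $\alpha \rightarrow \beta$ with $\gamma$ gives $\alpha\gamma \rightarrow \gamma\beta$; transitivity with $\gamma\beta \rightarrow \delta$ gives $\alpha\gamma \rightarrow \delta$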

# Closure of Attribute Sets

- given a set of attributes $\alpha$, the closure of $\alpha$ under $F$ (denoted by $\alpha^+$) is defined as the set of attributes that are functionally determined by $\alpha$ under $F$

**Algorithm**

<img src="img/Snip20190930_41.png" width=60%/>

**Example** $R=(A,B,C,G,H,I)$, $F=\{A\rightarrow B, A\rightarrow C, CG \rightarrow H, CG \rightarrow I, B \rightarrow H\}$, find $(AG)^+$

1. result = $AG$
    - add $B$ and $C$, since $A\rightarrow B$ and $A \rightarrow C$ and $A\subseteq AG$
2. result = $ABCG$
    - add $H$, since $CG \rightarrow H$ and $CG \subseteq ABCG$
3. result = $ABCGH$
    - add $I$, since $CG \rightarrow I$ and $CG \subseteq ABCGH$
4. result = $ABCGHI$
    - done since result = $R$
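
The walkthrough above can be reproduced with a small fixed-point loop. This is a minimal sketch, not necessarily identical to the algorithm in the figure: it assumes single-letter attribute names, represents each FD as a hypothetical `(lhs, rhs)` pair of strings, and uses the made-up function name `closure`.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD when its left side is already in the result
            # and it still contributes something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(closure("AG", F))))  # ABCGHI
```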


# Uses of Attribute Set Closure

- let $R$ denote a relation schema, and let $\alpha$ denote a set of attributes
- **Testing for a Superkey**
    - to test whether $\alpha$ is a superkey of $R$, compute $\alpha^+$ and check that $\alpha^+$ contains all attributes of $R$
- **Testing individual functional dependencies**
    - to test whether a FD $\alpha \rightarrow \beta$ holds on $R$, compute $\alpha^+$ and check that $\beta \subseteq \alpha^+$
- **Computing the closure $F^+$ of a set of FDs**
    - for each $\gamma \subseteq R$
         1. find the closure $\gamma^+$
         2. for each $S\subseteq \gamma^+$, output the FD $\gamma \rightarrow S$
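
The last use, computing $F^+$ from attribute closures, can be sketched as follows. The example schema $R=(A,B,C)$ with $F=\{A\rightarrow B, B\rightarrow C\}$, the `(lhs, rhs)` string encoding, and the helper names are assumptions made for illustration. The output is exponential in the number of attributes, so this is only practical for tiny schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def subsets(attrs):
    """Yield every subset of `attrs` as a sorted string (including the empty one)."""
    attrs = sorted(attrs)
    for k in range(len(attrs) + 1):
        for combo in combinations(attrs, k):
            yield "".join(combo)

def fd_closure(R, fds):
    """All FDs gamma -> S with gamma a subset of R and S a subset of gamma+."""
    return {(g, s) for g in subsets(R) for s in subsets(closure(g, fds))}

f_plus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print(("A", "C") in f_plus)  # True: implied by transitivity
print(("C", "A") in f_plus)  # False
```

Note that the result also contains every trivial dependency, such as $AB \rightarrow A$, which is why $F^+$ is so much larger than $F$.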

**Example** $R=(A,B,C,G,H,I)$, $F=\{A\rightarrow B, A\rightarrow C, CG \rightarrow H, CG \rightarrow I, B \rightarrow H\}$, we have $(AG)^+ = ABCGHI$, is $AG$ a candidate key?

1. is $AG$ a superkey?
    - test if $AG \rightarrow R$, which is equivalent to $(AG)^+ \supseteq R$
        - spoiler: yes, since $(AG)^+ = ABCGHI = R$
2. test if $AG$ is minimal (i.e., no proper subset of $AG$ is a superkey)
    - test whether $A \rightarrow R$, i.e., whether $(A)^+ \supseteq R$ holds: $(A)^+ = ABCH$, so it does not
    - test whether $G \rightarrow R$, i.e., whether $(G)^+ \supseteq R$ holds: $(G)^+ = G$, so it does not
    - neither proper subset is a superkey, so $AG$ is a candidate key
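
The two steps above can be sketched in code. The function names `is_superkey` and `is_candidate_key`, and the encoding of FDs as `(lhs, rhs)` strings over single-letter attributes, are assumptions made for this illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, R, fds):
    """attrs is a superkey iff its closure contains every attribute of R."""
    return closure(attrs, fds) >= set(R)

def is_candidate_key(attrs, R, fds):
    """A candidate key is a superkey with no proper subset that is also a superkey."""
    if not is_superkey(attrs, R, fds):
        return False
    # Dropping one attribute at a time is enough: any proper subset is contained
    # in one of these sets, and every superset of a superkey is a superkey.
    return not any(is_superkey(set(attrs) - {a}, R, fds) for a in attrs)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_candidate_key("AG", R, F))   # True
print(is_candidate_key("ABG", R, F))  # False: a superkey, but not minimal
```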


# Canonical Cover

- sets of functional dependencies may have redundant dependencies that can be inferred from the others
    - e.g., $A \rightarrow C$ is redundant in $\{A \rightarrow B, B \rightarrow C, A \rightarrow C\}$

- parts of a functional dependency may be redundant
    - e.g., on RHS: $\{A \rightarrow B, B \rightarrow C, A \rightarrow CD\}$
        - can be simplified to $\{A \rightarrow B, B \rightarrow C, A \rightarrow D\}$
    - e.g., on LHS: $\{A \rightarrow B, B \rightarrow C, AC \rightarrow D\}$
        - can be simplified to $\{A \rightarrow B, B \rightarrow C, A \rightarrow D\}$

**Canonical Cover** a canonical cover of $F$ is a "minimal" set of functional dependencies equivalent to $F$, having no redundant dependencies or redundant parts of dependencies

- formally, a **canonical cover** for a set $F$ of FDs is a set $F_c$ of FDs such that
    - 1) $F$ logically implies all dependencies in $F_c$
    - 2) $F_c$ logically implies all dependencies in $F$
    - 3) no functional dependency in $F_c$ contains an extraneous attribute
    - 4) each left side of a functional dependency in $F_c$ is unique

# Extraneous Attributes

- given a set $F$ of functional dependencies and a functional dependency $\alpha \rightarrow \beta$ in $F$
    - attribute $A$ is **extraneous** in $\alpha$ if $A\in \alpha$ and 
        - $F$ logically implies $(F - \{\alpha \rightarrow \beta\})\cup\{(\alpha - A) \rightarrow \beta\}$
    - attribute $B$ is **extraneous** in $\beta$ if $B\in \beta$ and the set of functional dependencies
        - $(F - \{\alpha \rightarrow \beta\})\cup\{\alpha \rightarrow (\beta - B)\}$ logically implies $F$

- logical implication ("$X$ logically implies $Y$") means that every FD in $Y$ can be obtained from the FDs in $X$ using Armstrong's Axioms
    - prove using attribute set closures

## Testing if an Attribute is Extraneous

Consider a set $F$ of FDs and an FD $\alpha\rightarrow\beta$ in $F$. We need two separate tests for extraneousness, depending on whether the attribute being tested is part of $\alpha$ or $\beta$ (left or right side)

- To test whether attribute $A\in \alpha$ is extraneous in $\alpha$
    - 1) compute $(\alpha - A)^+$ using the dependencies in $F$
    - 2) $A$ is extraneous in $\alpha$ if and only if $(\alpha - A)^+$ contains $\beta$

- To test whether attribute $B \in \beta$ is extraneous in $\beta$
    - 1) compute $\alpha^+$ using only the dependencies in $G = (F - \{\alpha \rightarrow \beta\})\cup\{\alpha \rightarrow (\beta - B)\}$
    - 2) $B$ is extraneous in $\beta$ if and only if $\alpha^+$ contains $B$

**Example** $R=(A,B,C,D)$, $F=\{A\rightarrow C, AB\rightarrow C\}$
- $B$ is extraneous in $AB\rightarrow C$ because $F$ logically implies $(F-\{AB\rightarrow C\})\cup\{A\rightarrow C\} = \{A\rightarrow C\}$

**Example** $R=(A,B,C,D)$, $F=\{A\rightarrow C, AB\rightarrow CD\}$
- $C$ is extraneous in $AB\rightarrow CD$ because $(F-\{AB\rightarrow CD\})\cup \{AB\rightarrow D\} = \{A\rightarrow C, AB\rightarrow D\}$ logically implies F
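
Both tests reduce to one attribute-closure computation each. The sketch below reruns the two examples; the function names and the `(lhs, rhs)` string encoding with single-letter attributes are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def extraneous_in_lhs(A, lhs, rhs, fds):
    """A is extraneous in lhs iff (lhs - A)+ computed under F contains rhs."""
    return set(rhs) <= closure(set(lhs) - {A}, fds)

def extraneous_in_rhs(B, lhs, rhs, fds):
    """B is extraneous in rhs iff lhs+ contains B when computed under G,
    where G is F with lhs -> rhs replaced by lhs -> (rhs - B)."""
    g = [fd for fd in fds if fd != (lhs, rhs)]
    g.append((lhs, "".join(sorted(set(rhs) - {B}))))
    return B in closure(lhs, g)

F1 = [("A", "C"), ("AB", "C")]
print(extraneous_in_lhs("B", "AB", "C", F1))   # True
print(extraneous_in_lhs("A", "AB", "C", F1))   # False

F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_rhs("C", "AB", "CD", F2))  # True
print(extraneous_in_rhs("D", "AB", "CD", F2))  # False
```

Note the asymmetry: the left-side test uses $F$ unchanged, while the right-side test must use the modified set $G$, since otherwise $\alpha \rightarrow \beta$ itself would trivially put $B$ in $\alpha^+$.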

# Computing a Canonical Cover

**input**: $F$ (set of FDs)

$F_c := F$

- **repeat**
    - use the union rule to replace any pair of dependencies $\alpha_1 \rightarrow \beta_1$ and $\alpha_1 \rightarrow \beta_2$ in $F_c$ with $\alpha_1 \rightarrow \beta_1 \beta_2$
    - look for a functional dependency $\alpha \rightarrow \beta$ in $F_c$ with an extraneous attribute either in $\alpha$ or in $\beta$
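    - if an extraneous attribute is found, delete it from $\alpha\rightarrow\beta$ (from $\alpha$ or from $\beta$, wherever it was found); the test for extraneous attributes uses the current $F_c$, not $F$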
    - **DO NOT DELETE ATTRIBUTES FROM BOTH $\alpha$ AND $\beta$ in the same iteration**
- **until** $F_c$ does not change
- **output** $F_c$

**Note**: if every attribute on the right-hand side of a functional dependency $\alpha \rightarrow \beta$ turns out to be extraneous, so that the dependency becomes $\alpha \rightarrow \emptyset$, then the dependency should be removed from $F_c$

**Example** $R=(A, B, C)$, $F=\{A\rightarrow BC, B\rightarrow C, A\rightarrow B, AB\rightarrow C\}$
- combine $A\rightarrow BC$ and $A\rightarrow B$ into $A\rightarrow BC$
    - $F_c$ becomes $\{A\rightarrow BC, B\rightarrow C, AB\rightarrow C\}$
- test if $A$ is extraneous in $AB\rightarrow C$
    - check whether $F_c=\{A\rightarrow BC, B\rightarrow C, AB\rightarrow C\}$ logically implies $B\rightarrow C$
    - yes, because $F_c$ already contains $B\rightarrow C$
    - deleting $A$ turns $AB\rightarrow C$ into a duplicate of $B\rightarrow C$, so $F_c$ becomes $\{A\rightarrow BC, B\rightarrow C\}$
- test if $C$ is extraneous in $A\rightarrow BC$
    - check whether $\{A\rightarrow B, B\rightarrow C\}$ logically implies $A\rightarrow BC$
    - yes: transitivity gives $A\rightarrow C$, and the union rule with $A\rightarrow B$ gives $A\rightarrow BC$
    - $F_c$ becomes $\{A\rightarrow B, B\rightarrow C\}$
- result $\{A\rightarrow B, B\rightarrow C\}$ is a canonical cover for $F$
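
The whole procedure can be sketched in a few dozen lines. This is one possible reading of the loop above, not a reference implementation: it assumes single-letter attributes, stores $F_c$ as a hypothetical dictionary from left sides to right sides (which applies the union rule automatically), and removes one extraneous attribute per pass. A canonical cover is not unique in general, although this example has only one.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, an iterable of (lhs, rhs) attribute collections."""
    fds = list(fds)
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_cover(fds):
    """Return a canonical cover as a set of (frozenset lhs, frozenset rhs) pairs."""
    fc = {}
    for lhs, rhs in fds:  # union rule: merge FDs that share a left side
        fc.setdefault(frozenset(lhs), set()).update(rhs)
    changed = True
    while changed:
        changed = False
        for lhs in list(fc):
            rhs = fc[lhs]
            # Left side: a is extraneous if (lhs - a)+ under F_c contains rhs.
            for a in lhs:
                if rhs <= closure(lhs - {a}, fc.items()):
                    del fc[lhs]
                    fc.setdefault(lhs - {a}, set()).update(rhs)
                    changed = True
                    break
            if changed:
                break  # rescan from the start with the updated F_c
            # Right side: b is extraneous if lhs+ contains b once b is dropped from rhs.
            for b in list(rhs):
                g = [(l, r - {b} if l == lhs else r) for l, r in fc.items()]
                if b in closure(lhs, g):
                    rhs.discard(b)
                    if not rhs:
                        del fc[lhs]  # empty right side: drop the dependency
                    changed = True
                    break
            if changed:
                break
    return {(l, frozenset(r)) for l, r in fc.items()}

F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in sorted(canonical_cover(F), key=lambda fd: sorted(fd[0])):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```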
