Relations take the form $R(A, B, ...)$ where
- $R$ is the name of the relation
- $A, B, ...$ is the set of attributes of the relation
  - We often write the set without commas, $A, B, \ldots \equiv AB\ldots$, and
    may refer to a set of attributes as $\vec{A}$ (vector A)
  - The number of attributes $n$ is the **arity** of the relation;
    $R(A_1, \ldots, A_n)$ is called an n-ary relation
  - $Domain(A)$ is the set of values (type) that the attribute can have
  - $Attrs(R)$ denotes the set of attributes $A, B, \ldots$ of $R$

  

- The extent of $R(A, B, \ldots)$ is a set of tuples:
  $\{\langle v^A_{1}, v^B_{1}, \ldots\rangle, \langle v^A_{2}, v^B_{2}, \ldots\rangle, \langle v^A_{3}, v^B_{3}, \ldots\rangle, \ldots\}$
    - $\forall x.\ v^A_{x} \in Domain(A)$ (every value of $A$ is drawn from its domain)
    - No duplicate tuples
    - Not ordered
    - All tuples have the same arity



### Set Semantics

- Order of columns not significant
- Order of rows not significant
- No duplicate rows

---

- Attribute $=$ Column
- Tuple     $=$ Row

### Relational Keys

##### Key
- A key of a relation $R(AB \ldots)$ is a subset of the attributes for which the
  values in any extent are unique across all tuples

---

- Every relation has at least one key, which is the entire set of attributes
- A key is **violated** if the extent contains two tuples that have the same
  values for the attributes of the key.
- If $A$ is a key, then $AB$ must also be a key (any superset of a key is a key)
- A **minimal** key is a set of attributes $AB \ldots$ for which no proper
  subset of the attributes is also a key
- The **primary key** is one of the keys of the relation, chosen to serve as
  the default key when no key is explicitly stated
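The key definitions above can be checked mechanically against one extent. This is a minimal sketch, assuming an extent is a Python `set` of tuples (so duplicate tuples cannot occur) and that attributes are located by position in a hypothetical `ATTRS` tuple; the sample rows are made up. One extent can only *refute* a key: a key constrains every possible extent, so passing this check does not prove that the attributes form a key.

```python
# Hypothetical extent of account(no, type, sortcode); values are illustrative.
ATTRS = ("no", "type", "sortcode")
account = {
    (100, "deposit", 67),
    (101, "current", 67),
    (103, "current", 34),
}

def violates_key(extent, attrs, key):
    """True if two tuples in `extent` agree on every attribute in `key`."""
    positions = [attrs.index(a) for a in key]
    seen = set()
    for t in extent:
        values = tuple(t[i] for i in positions)
        if values in seen:
            return True  # a second tuple with the same key values
        seen.add(values)
    return False

print(violates_key(account, ATTRS, ["no"]))              # False
print(violates_key(account, ATTRS, ["sortcode"]))        # True: 100 and 101 share 67
print(violates_key(account, ATTRS, ["no", "sortcode"]))  # False: superset of a key
print(violates_key(account, ATTRS, list(ATTRS)))         # False: the full attribute set
```

The last call can never return `True` for a set-based extent, which is the "entire set of attributes is always a key" rule in code form.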

----

A **minimal key** can range anywhere from a single column all the way to
**every column in the table** (the full set of attributes, $Attrs(R)$).

1. The Single Column Case (Size = 1)

    If a single attribute is unique for every tuple (like a `Student_ID` or
    `Email`), that single attribute is a key.
    - Since you cannot have a subset smaller than one attribute (other than the
      empty set $\emptyset$, which tells you nothing), this single column is
      automatically a **minimal key**.

2. The Full Set Case (Size = All Attributes)

    As stated above: *"Every relation has at least one key, which is the
    entire set of attributes."*
    - However, the entire set is only a **minimal key** if **no smaller subset**
      of it is unique.

##### Important Distinction

While a minimal key can be the full set, it usually isn't if a smaller key exists.
- **Rule:** as stated above ("If $A$ is a key, then so must $AB$ be a key"),
  every superset of a key is itself a key (a superkey).
- **Consequence:** if you have a minimal key of size 1 (e.g., column `A`), then
  the full set (e.g., `ABC...`) is **not** a minimal key, because it contains
  `A`.

So the full set is only the minimal key in the "worst-case scenario" where no
smaller combination of columns is sufficient to identify a row uniquely.
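The search for minimal keys can be sketched by trying attribute subsets from smallest to largest and skipping any subset that contains one already found (by the superset rule it is unique but not minimal). This is a sketch under the same assumptions as before: extents are Python sets of tuples and the sample data is hypothetical. It finds sets that are unique *in this extent*, which are only candidates for minimal keys.

```python
from itertools import combinations

def is_unique(extent, attrs, subset):
    """True if no two tuples share the same values for `subset`."""
    idx = [attrs.index(a) for a in subset]
    return len({tuple(t[i] for i in idx) for t in extent}) == len(extent)

def minimal_unique_sets(extent, attrs):
    """Smallest attribute sets that are unique in this extent."""
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if any(set(m) <= set(subset) for m in found):
                continue  # contains a smaller unique set, so it cannot be minimal
            if is_unique(extent, attrs, subset):
                found.append(subset)
    return found

# Hypothetical account(no, type, sortcode) extent.
ATTRS = ("no", "type", "sortcode")
account = {(100, "deposit", 67), (101, "current", 67), (103, "current", 34)}
print(minimal_unique_sets(account, ATTRS))  # [('no',), ('type', 'sortcode')]

# Worst case: only the full attribute set is unique.
pairs = {(1, 1), (1, 2), (2, 1)}
print(minimal_unique_sets(pairs, ("a", "b")))  # [('a', 'b')]
```

`('type', 'sortcode')` only looks like a key because the sample is tiny; one more current account at sortcode 67 would break it. This is why keys are declared from the meaning of the data rather than inferred from a single extent.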

### Relational Foreign Keys

A **foreign key** $R(\vec{X}) \stackrel{fk}{\Rightarrow} S(\vec{Y})$ of a 
relation $R(AB \ldots)$ is a subset $\vec{X} \subseteq AB...$ of the attributes 
for which the values in the extent of $R$ also appear as values of attributes 
$\vec{Y}$ in the extent of $S$, and $\vec{Y}$ is a key of $S$.
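The foreign key condition amounts to a set-membership test: every $\vec{X}$ value in $R$ must be among the $\vec{Y}$ values of $S$. A minimal sketch, assuming extents are Python sets of tuples; the helper name `fk_violations`, the branch name `Branch34` and all rows are hypothetical. It also checks that $\vec{Y}$ is unique in the given extent of $S$, which is necessary (but not sufficient) for $\vec{Y}$ to be a key.

```python
def fk_violations(r_extent, r_attrs, x, s_extent, s_attrs, y):
    """Tuples of R whose X values do not appear as Y values in S."""
    yi = [s_attrs.index(a) for a in y]
    s_values = {tuple(t[i] for i in yi) for t in s_extent}
    if len(s_values) != len(s_extent):
        raise ValueError(f"{y} is not unique in S, so it cannot be a key of S")
    xi = [r_attrs.index(a) for a in x]
    return {t for t in r_extent if tuple(t[i] for i in xi) not in s_values}

# Hypothetical extents: branch(sortcode, bname) and account(no, sortcode).
branch = {(67, "Strand"), (34, "Branch34")}
account = {(100, 67), (101, 67), (103, 34)}
acc_attrs, br_attrs = ("no", "sortcode"), ("sortcode", "bname")

current = fk_violations(account, acc_attrs, ["sortcode"], branch, br_attrs, ["sortcode"])
print(current)  # set()

# Removing branch 67 leaves accounts 100 and 101 pointing nowhere.
without_strand = {b for b in branch if b[0] != 67}
orphans = fk_violations(account, acc_attrs, ["sortcode"], without_strand, br_attrs, ["sortcode"])
print(sorted(orphans))  # [(100, 67), (101, 67)]
```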

##### Quiz 1.4: Foreign Key Violation

$account(sortcode) \stackrel {fk}{\Rightarrow} branch(sortcode)$

Here is the breakdown of why B and D are safe, and why C is the violation.

##### The Golden Rule of Foreign Keys

The notation `account(sortcode) -> branch(sortcode)` means: "Every `sortcode`
listed in the `account` table **must already exist** in the `branch` table."

- Parent Table: `branch` (The source of truth for sortcodes).
- Child Table: `account` (It references the parent).

---

**Why C is the violation**

Option C: `DELETE FROM branch WHERE sortcode = 67`
- Action: You are trying to delete the "Strand" branch (sortcode 67).
- Check: Do any accounts currently use sortcode 67?
- Result: **Yes**. Look at the `account` table; the first two rows (Account 100
  and 101) both use sortcode *67*.
- The Problem: if you delete the branch, those two accounts become "orphans":
  they point to a branch that no longer exists. The database will block this to
  protect the data.


**Why B is Safe**

Option B: `INSERT INTO branch VALUES (78, 'Ealing', 1000.00)`
- Action: You are adding a new branch (Ealing, sortcode 78).
- Analysis: Adding a new valid parent is always fine. It doesn't matter that no
  accounts use it yet; it's just an empty branch waiting for customers.


**Why D is Safe**

Option D: `DELETE FROM account ...`
- Action: You are deleting a specific account (Account 103).
- Analysis: You are allowed to delete a child. "Closing an account" doesn't hurt
  the branch. The branch (sortcode 34) still exists; it just has one fewer
  customer. This does not break any links.
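Options B, C and D can be replayed against a real DBMS. This is a minimal sketch using Python's standard-library `sqlite3` module, assuming a cut-down schema: the `cash` column name, the branch name `Branch34` and the zero cash figures are hypothetical, and `account` keeps only `no` and `sortcode`. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, and option C is rejected because the default referential action applies (no `ON DELETE CASCADE`). The last statement replays the "sortcode 99" case described further on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when this is on

# Cut-down, hypothetical schema and data.
conn.executescript("""
CREATE TABLE branch (
    sortcode INTEGER PRIMARY KEY,
    bname    VARCHAR(20) NOT NULL,
    cash     DECIMAL(10,2) NOT NULL
);
CREATE TABLE account (
    "no"     INTEGER PRIMARY KEY,
    sortcode INTEGER NOT NULL REFERENCES branch(sortcode)
);
INSERT INTO branch VALUES (67, 'Strand', 0.00), (34, 'Branch34', 0.00);
INSERT INTO account VALUES (100, 67), (101, 67), (103, 34);
""")

def attempt(sql):
    """Run one statement and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected ({exc})"

results = {
    "B": attempt("INSERT INTO branch VALUES (78, 'Ealing', 1000.00)"),
    "C": attempt("DELETE FROM branch WHERE sortcode = 67"),
    "D": attempt('DELETE FROM account WHERE "no" = 103'),
    "orphan insert": attempt("INSERT INTO account VALUES (200, 99)"),
}
for option, outcome in results.items():
    print(option, outcome)
```

The column `no` is quoted as `"no"` only to be safe, since `NO` is an SQL keyword.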

This is one of the most common points of confusion in database design, so an
intuition that feels "weird" usually isn't: it is just looking at the
relationship from a functional perspective (business logic) rather than a
structural perspective (data integrity).

To clear this up, you have to flip your thinking from "Who provides the data?"
to "**Who must exist first?**"

---

1. The "Who Exists First?" Rule

In a database relationship involving a Foreign Key, the flow of dependency is
strict:
- Parent (Referenced Table): Must exist independently. It holds the master
  list of valid IDs.
- Child (Referencing Table): Cannot exist without the parent. It is
  dependent.

Apply this to the bank example:
- Can you have a **Branch** (e.g. "Strand") with zero accounts? **Yes.** It's
    just a new building waiting for customers.
- Can you have an **Account** attached to a **Branch** that doesn't exist?
    **No.** If you try to create an account at sortcode 99, and sortcode 99 
    isn't in the branch table, the database rejects it.

Because the `account` table relies on the `branch` table to validate its 
existence, `account` is the Child and `branch` is the Parent.

---

2. The "One-to-Many" Hint

In `account`, `sortcode` is not a key: its values repeat, because many accounts
share sortcode 67. This is actually the biggest clue that `account` is the
child.

- Parent side: Usually has the Primary Key (Unique). One distinct Branch.
- Child side: Has the Foreign Key (Non-Unique). Many different accounts pointing
  to that one branch.

Think of it like a mother duck (Parent) and her ducklings (Children). There is 
one mother, but she can have **many** ducklings following her. The ducklings
are "dependent" on the mother.

---

3. Decoding the Arrow

$account(sortcode) \stackrel{fk}{\Rightarrow} branch(sortcode)$

In database notation, the arrow always points *FROM* the Child *TO* the Parent.
- It reads: *"The account table points to the branch table for validation."*

---

Summary Table

| Feature | Branch Table | Account Table |
| --- | --- | --- |
| Role | Parent | Child |
| Key Type | Primary Key (PK) | Foreign Key (FK) |
| Uniqueness | Unique (Sortcode appears once) | Non-Unique (Sortcode repeats) |
| Dependency | Independent | Dependent on Branch |
| Deletion Risk | High Risk (deleting a referenced branch would orphan its accounts) | Safe (deleting an account breaks no references) |

##### Select Project Join (SPJ) queries

If a product of tables is formed, and a selection is then applied that compares
attributes of those tables, we say that a **join** has been performed.

Normally, not all columns of the product are returned, and therefore a project
is also required.

$\pi_{bname, no} \sigma_{branch.sortcode=account.sortcode \land account.type='current'} (branch \times account)$
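The SPJ query above can be evaluated literally, one operator at a time. A minimal sketch, assuming rows are Python dicts whose attributes are qualified with the relation name (e.g. `branch.sortcode`) so the product has no name clashes; the helper names and all sample rows (including the branch name `Branch34` and the account types) are hypothetical.

```python
# Hypothetical extents; attribute names follow the query above.
branch = [{"sortcode": 67, "bname": "Strand"},
          {"sortcode": 34, "bname": "Branch34"}]
account = [{"no": 100, "type": "deposit", "sortcode": 67},
           {"no": 101, "type": "current", "sortcode": 67},
           {"no": 103, "type": "current", "sortcode": 34}]

def qualify(name, rows):
    """Prefix attributes with the relation name, e.g. 'branch.sortcode'."""
    return [{f"{name}.{a}": v for a, v in row.items()} for row in rows]

def product(r, s):
    """R x S: every pairing of a row of R with a row of S."""
    return [{**x, **y} for x in r for y in s]

def select(pred, r):
    """sigma: keep the rows that satisfy the predicate."""
    return [row for row in r if pred(row)]

def project(attrs, r):
    """pi: keep the listed attributes; returning a set drops duplicate rows."""
    return {tuple(row[a] for a in attrs) for row in r}

result = project(
    ["branch.bname", "account.no"],
    select(lambda t: t["branch.sortcode"] == t["account.sortcode"]
           and t["account.type"] == "current",
           product(qualify("branch", branch), qualify("account", account))))
print(sorted(result))  # [('Branch34', 103), ('Strand', 101)]
```

A real DBMS would not materialise the full product; this only mirrors the algebra.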

### Relational Algebra: Union $\cup$

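- $\pi_{sortcode\;as\;id}\,account$: the sortcodes of all accounts, with the attribute renamed to `id`
- $\pi_{no\;as\;id}\,account$: the account numbers, with the attribute renamed to `id`
- $\pi_{sortcode\;as\;id}\,account \cup \pi_{no\;as\;id}\,account$: every `id` value that appears in either result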

---

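- relations must be **union compatible**: both inputs must have matching
  attributes, which is why both projections above rename their attribute to `id`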
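The union above can be sketched with Python sets, modelling a relation as a pair (attribute names, set of tuples). This assumes union compatibility means identical attribute names in the same order, which is one common convention; SQL's `UNION` itself only requires the same number of columns with compatible types, matched by position. The helper names and sample rows are hypothetical.

```python
# A relation is modelled as (attribute names, set of tuples).
account = (("no", "type", "sortcode"),
           {(100, "deposit", 67), (101, "current", 67), (103, "current", 34)})

def project_as(rel, attr, new_name):
    """pi_{attr as new_name}: a one-attribute projection with a rename."""
    attrs, rows = rel
    i = attrs.index(attr)
    return ((new_name,), {(t[i],) for t in rows})

def union(r, s):
    """R union S, refusing inputs that are not union compatible."""
    if r[0] != s[0]:
        raise ValueError(f"not union compatible: {r[0]} vs {s[0]}")
    return (r[0], r[1] | s[1])

ids = union(project_as(account, "sortcode", "id"), project_as(account, "no", "id"))
print(ids[0], sorted(ids[1]))  # ('id',) [(34,), (67,), (100,), (101,), (103,)]

# Without the rename the attribute names differ, so the union is refused.
try:
    union(project_as(account, "sortcode", "sortcode"), project_as(account, "no", "no"))
    compatible = True
except ValueError as exc:
    compatible = False
    print(exc)  # not union compatible: ('sortcode',) vs ('no',)
```

Sortcode 67 appears once in the result because set union removes duplicates.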

##### Rules for Combining Operators

Since all operators produce a relation as output, any operator may produce
one of the inputs to any other operator.

| well formed RA query |
| --- |
| the output of the nested operator must contain the attributes required by an outer $\pi$ or $\sigma$|
| the two inputs to a $\cup$ or $-$ must contain the same number of attributes|
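Both well-formedness rules can be checked without any data, by computing the attributes each nested operator outputs. A minimal sketch, assuming a hypothetical encoding of queries as nested tuples and simplified, hypothetical schemas; it covers only $\pi$, $\sigma$, $\cup$ and $-$, and for $\sigma$ it takes the list of attributes its predicate mentions.

```python
# Hypothetical, simplified schemas.
SCHEMAS = {"branch": ("sortcode", "bname"),
           "account": ("no", "type", "sortcode")}

def output_attrs(q):
    """Attributes produced by query q; raises ValueError if q is not well formed.

    q is ("rel", name), ("project", attrs, q), ("select", attrs, q),
    ("union", q1, q2) or ("diff", q1, q2).
    """
    op = q[0]
    if op == "rel":
        return SCHEMAS[q[1]]
    if op in ("project", "select"):
        needed, inner = q[1], output_attrs(q[2])
        missing = [a for a in needed if a not in inner]
        if missing:
            raise ValueError(f"{op} needs {missing} but its input has {inner}")
        return tuple(needed) if op == "project" else inner
    if op in ("union", "diff"):
        left, right = output_attrs(q[1]), output_attrs(q[2])
        if len(left) != len(right):
            raise ValueError(f"{op} inputs have {len(left)} and {len(right)} attributes")
        return left
    raise ValueError(f"unknown operator {op!r}")

ok = ("union",
      ("project", ["sortcode"], ("select", ["type"], ("rel", "account"))),
      ("project", ["sortcode"], ("rel", "branch")))
print(output_attrs(ok))  # ('sortcode',)

problems = []
for bad in [("project", ["bname"], ("rel", "account")),       # bname not in account
            ("diff", ("rel", "branch"), ("rel", "account"))]:  # 2 vs 3 attributes
    try:
        output_attrs(bad)
    except ValueError as exc:
        problems.append(str(exc))
print(problems)
```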

---


### Derived Relational Algebra: Natural Join $\Join$

| Natural Join |
| --- |
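| $R \Join S = \pi_{Attrs(R) \cup Attrs(S)}\,\sigma_{R.A_1 = S.A_1 \land \ldots \land R.A_m = S.A_m}(R \times S)$, where $A_1, \ldots, A_m$ are the attributes common to $R$ and $S$ |

$branch \Join account = \pi_{Attrs(branch) \cup Attrs(account)}\,\sigma_{branch.sortcode=account.sortcode}(branch \times account)$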
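The natural join definition can be sketched directly: form the product, keep pairs that agree on every shared attribute name, then project so each shared attribute appears once. A minimal sketch, assuming relations are (attribute names, set of tuples) pairs; the helper name and the rows are hypothetical.

```python
def natural_join(r, s):
    """R join S on all shared attribute names; each shared attribute appears once."""
    r_attrs, r_rows = r
    s_attrs, s_rows = s
    common = [a for a in r_attrs if a in s_attrs]
    out_attrs = r_attrs + tuple(a for a in s_attrs if a not in common)
    out_rows = set()
    for x in r_rows:  # the product R x S ...
        rx = dict(zip(r_attrs, x))
        for y in s_rows:
            sy = dict(zip(s_attrs, y))
            if all(rx[a] == sy[a] for a in common):  # ... the sigma on R.A_i = S.A_i ...
                row = {**sy, **rx}
                out_rows.add(tuple(row[a] for a in out_attrs))  # ... the pi drops repeats
    return out_attrs, out_rows

branch = (("sortcode", "bname"), {(67, "Strand"), (34, "Branch34")})
account = (("no", "sortcode"), {(100, 67), (101, 67), (103, 34)})

attrs, rows = natural_join(branch, account)
print(attrs)         # ('sortcode', 'bname', 'no')
print(sorted(rows))  # [(34, 'Branch34', 103), (67, 'Strand', 100), (67, 'Strand', 101)]
```

If both relations also had an unrelated attribute with the same name (say a hypothetical `name` column), it would silently join on it too: the accidental-match pitfall.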

---

QUICK PATTERNS AND PITFALLS
- `NATURAL JOIN` equals an equi-join on all same-named attributes, then drops
  the duplicate columns. Prefer explicit `USING` to avoid accidental matches.
- `USING(k1, k2, ...)` equals $\Join$ on equality of each listed column, with
  merged columns for those attributes.
- **Theta vs equi:** equi-join is a special case of theta join. Natural join is
  a special case of equi-join with column-name matching and de-duplication.
- **Duplicate rows:** SQL may output duplicates, while RA defaults to sets.
  Insert $\delta$ (duplicate elimination, `DISTINCT` in SQL) if you need
  distinct rows.
- **Join reorder:** inner joins are associative and commutative under sets.
  Outer joins are not; preserve their order.
- **Selection pushdown:** applying $\sigma$ before joins reduces cost. When the
  predicate splits as $\theta = \theta_R \land \theta_S \land \theta_{RS}$, with
  $\theta_R$ mentioning only $R$ and $\theta_S$ only $S$,
  $\sigma_{\theta}(R \times S)$ becomes
  $(\sigma_{\theta_R}R) \Join_{\theta_{RS}} (\sigma_{\theta_S}S)$.
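Selection pushdown can be checked on a small example: filtering each input first and then joining gives the same result as filtering the full product, while pairing far fewer rows. A minimal sketch with hypothetical data; the predicate $\theta_R$ (`sortcode > 50`) is made up purely for illustration.

```python
from itertools import product

branch = {(67, "Strand"), (34, "Branch34"), (78, "Ealing")}                   # (sortcode, bname)
account = {(100, "deposit", 67), (101, "current", 67), (103, "current", 34)}  # (no, type, sortcode)

def theta_r(b):
    return b[0] > 50  # mentions branch only

def theta_s(a):
    return a[1] == "current"  # mentions account only

def theta_rs(b, a):
    return b[0] == a[2]  # the join condition

# sigma_theta(R x S): filter the full product
naive = {(b, a) for b, a in product(branch, account)
         if theta_r(b) and theta_s(a) and theta_rs(b, a)}

# (sigma_theta_R R) join_theta_RS (sigma_theta_S S): filter first, then join
small_r = {b for b in branch if theta_r(b)}
small_s = {a for a in account if theta_s(a)}
pushed = {(b, a) for b, a in product(small_r, small_s) if theta_rs(b, a)}

print(naive == pushed)                                           # True
print(len(branch) * len(account), len(small_r) * len(small_s))  # 9 4
```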

In RA, $\rho$ stands for the Rename operator.

**What it does**

It allows you to assign a temporary name to a table (relation) or its columns.
This is crucial when you need to use the same table twice in a single query
(a self-join), because you need a way to distinguish the "left" version from
the "right" version.

- Without this renaming, the database wouldn't know which "version" of the 
  columns you are referring to (e.g. if $R$ has a column `ID`, asking for 
  `R.ID` would be ambiguous).
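A self-join with renaming can be sketched by qualifying every attribute of each copy with its own alias, mimicking $\rho$. A minimal sketch, assuming rows are Python dicts; the aliases `a1` and `a2`, the helper name and the rows are hypothetical. The query finds pairs of different accounts held at the same branch.

```python
# Hypothetical extent of account(no, sortcode).
account = [{"no": 100, "sortcode": 67},
           {"no": 101, "sortcode": 67},
           {"no": 103, "sortcode": 34}]

def rename(alias, rows):
    """rho_alias: give the relation a new name by qualifying every attribute."""
    return [{f"{alias}.{a}": v for a, v in row.items()} for row in rows]

a1, a2 = rename("a1", account), rename("a2", account)

# Pairs of different accounts held at the same branch (a self-join).
same_branch = {(x["a1.no"], y["a2.no"])
               for x in a1 for y in a2
               if x["a1.sortcode"] == y["a2.sortcode"] and x["a1.no"] < y["a2.no"]}
print(same_branch)  # {(100, 101)}

# Without renaming, combining two rows of account collides: the second 'no'
# silently overwrites the first.
print({**account[0], **account[1]})  # {'no': 101, 'sortcode': 67}
```

The `a1.no < a2.no` condition keeps each pair once and excludes an account paired with itself.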

---

| SQL Language Components |
| --- |
| **Data Definition Language (DDL)**: defines a relational schema (the tables that hold the data) |
| **Data Manipulation Language (DML)**: a relational query and update language |

##### SQL DDL: SQL Data Types

Some SQL data types:

| Keyword | Semantics |
| --- | --- |
| BOOLEAN | A logical value (TRUE, FALSE or UNKNOWN) |
| BIT | 1 bit integer (0, 1, or NULL) |
| SMALLINT | 16 bit integer |
| INTEGER | 32 bit integer |
| BIGINT | 64 bit integer |
| FLOAT(n) | An n-bit mantissa floating point number |
| REAL | 32 bit floating point number ($\equiv$ FLOAT(24)) |
| DOUBLE PRECISION | 64-bit floating point number ($\equiv$ FLOAT(53)) |
| DECIMAL(p,s) | A p digit number with s digits after the decimal point |
| CHAR(n) | A fixed length string of n characters |
| VARCHAR(n) | A varying length string of at most n characters |
| DATE | A calendar date (day, month and year) |
| TIME | A time of day (hours, minutes, seconds) |
| TIMESTAMP | A date and time of day together |
| ARRAY | An ordered list of a certain datatype |
| MULTISET | A bag (i.e. unordered list) of a certain datatype |
| XML | XML text |



---