### Week 3 - The Relational Model

![image.png](attachment:image.png)

#### 3-1 A Logical View of Data

- Shift from Physical to Logical: The relational model simplifies database design by focusing on the logical representation of data and relationships and hiding the physical storage details. This is analogous to an automatic transmission in a car: the user gets the benefit without managing the mechanics underneath.
- Tables as the Core: The relational model represents data in tables, which are conceptually similar to files. This logical simplicity makes databases easier to understand and design than under older models such as the hierarchical and network models.

3-1a Tables and Their Characteristics

- Tables (Relations): Tables are the fundamental building blocks of the relational model. They are perceived as two-dimensional structures with rows and columns.
- Rows (Tuples): Each row represents a single entity occurrence (e.g., a student in a STUDENT table).
- Columns (Attributes): Each column represents an attribute (e.g., student name, student ID).
- Data Types: All values in a column must conform to the same data format. Common data types include:
    - Numeric (for calculations)
    - Character (text)
    - Date (calendar dates)
    - Logical (true/false)
- Domains: Each column has a specific range of permissible values (e.g., GPA range of 0-4).
- Order Insignificance: The order of rows and columns doesn't matter to the database management system (DBMS).
- Primary Key (PK): Each table must have a primary key, which is an attribute or a combination of attributes that uniquely identifies each row.
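
The characteristics above can be tried out with a small sketch using Python's built-in `sqlite3` module. The STUDENT columns (`STU_ID`, `STU_NAME`, `STU_DOB`, `STU_GPA`) and the sample rows are assumptions made up for illustration, and the `CHECK` clause is just one possible way to express the GPA domain of 0-4.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENT (
        STU_ID   INTEGER PRIMARY KEY,                    -- primary key
        STU_NAME TEXT NOT NULL,                          -- character data
        STU_DOB  TEXT,                                   -- date kept as ISO text
        STU_GPA  REAL CHECK (STU_GPA BETWEEN 0 AND 4)    -- domain 0-4
    )
    """
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'Ana', '2004-05-17', 3.6)")
conn.execute("INSERT INTO STUDENT VALUES (2, 'Ben', '2003-11-02', 2.9)")


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


out_of_domain = try_insert((3, "Cy", "2004-01-01", 4.7))   # GPA outside 0-4
duplicate_pk = try_insert((1, "Dee", "2002-09-09", 3.0))   # STU_ID 1 already used
row_count = conn.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print(out_of_domain, duplicate_pk, row_count)
```

Both rejected rows leave the table unchanged, so only the two valid students remain.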

Key Takeaways:

- The relational model prioritizes a logical view of data using tables.
- Tables consist of rows (tuples) and columns (attributes).
- Data types, domains, and primary keys are essential concepts in relational databases.
- The relational model is conceptually simpler than earlier data models such as the hierarchical and network models.

#### 3-2 Keys

- Importance of Keys: Keys are crucial in relational databases for:
    - Uniquely identifying each row in a table.
    - Establishing relationships between tables.
    - Ensuring data integrity.
- Key Definition: A key is one or more attributes that determine other attributes.

3-2a Dependencies

- Determination: Knowing the value of one attribute allows you to determine the value of another.
- Functional Dependence: The value of one or more attributes determines the value of one or more other attributes.
    - Notation: A → B (A determines B)
    - Determinant (Key): The attribute that determines another (A).
    - Dependent: The attribute whose value is determined (B).
- Full Functional Dependence: The dependency needs the entire determinant; removing any attribute from the determinant breaks it.
- Composite Key: A key composed of more than one attribute.
- Key Attribute: An attribute that is part of a key.
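
As a rough illustration of these definitions, the sketch below tests whether a dependency holds in a set of sample rows. The helper names (`determines`, `fully_determines`) and the enrollment data are hypothetical. Note that a functional dependency is a rule about all possible data, so sample rows can only refute a dependency, never prove it.

```python
from itertools import combinations


def determines(rows, determinant, dependent):
    """True if, in these rows, each determinant value maps to one dependent value."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        value = tuple(row[a] for a in dependent)
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_determines(rows, determinant, dependent):
    """True if the dependency holds and no proper subset of the determinant works."""
    if not determines(rows, determinant, dependent):
        return False
    for size in range(1, len(determinant)):
        for subset in combinations(determinant, size):
            if determines(rows, subset, dependent):
                return False
    return True


# Hypothetical enrollment rows: one row per student per class.
rows = [
    {"STU_ID": 1, "CLASS_CODE": "DB101", "STU_NAME": "Ana", "GRADE": "A"},
    {"STU_ID": 1, "CLASS_CODE": "CS200", "STU_NAME": "Ana", "GRADE": "B"},
    {"STU_ID": 2, "CLASS_CODE": "DB101", "STU_NAME": "Ben", "GRADE": "B"},
    {"STU_ID": 2, "CLASS_CODE": "CS200", "STU_NAME": "Ben", "GRADE": "A"},
]

name_fd = determines(rows, ["STU_ID"], ["STU_NAME"])    # STU_ID -> STU_NAME
grade_fd = determines(rows, ["STU_ID"], ["GRADE"])      # fails: one student, two grades
grade_full = fully_determines(rows, ["STU_ID", "CLASS_CODE"], ["GRADE"])
name_full = fully_determines(rows, ["STU_ID", "CLASS_CODE"], ["STU_NAME"])
print(name_fd, grade_fd, grade_full, name_full)
```

Here the composite determinant (`STU_ID`, `CLASS_CODE`) fully determines `GRADE`, but it does not fully determine `STU_NAME`, because `STU_ID` alone is enough for that.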

3-2b Types of Keys

- Superkey: An attribute or combination of attributes that uniquely identifies each row in a table.
- Candidate Key: A minimal superkey (no unnecessary attributes).
- Primary Key (PK): The chosen candidate key used to uniquely identify rows.
    - Entity Integrity:
        - All PK values must be unique.
        - No PK attribute can be null.
- Null Values: A null is the absence of any data value. Nulls are not allowed in PKs (entity integrity) and should be avoided in other attributes where possible.
- Foreign Key (FK): The PK of one table placed into another table to create a common attribute.
    - Referential Integrity: Every FK value must either be null or a valid PK value in the related table.
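- Secondary Key: A key used strictly for data retrieval. It does not need to determine the other attributes, so it may match several rows. Its effectiveness depends on how restrictive it is, that is, how far it narrows the set of matching rows.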
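
The difference between a superkey and a candidate key can be sketched by brute force over a small sample. The functions `is_superkey` and `candidate_keys` and the student rows are hypothetical. Keys are properties of the table design rather than of one data sample, so this only shows which attribute sets happen to be unique in these rows.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs (in this sample)."""
    projected = [tuple(row[a] for a in attrs) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows):
    """Return the minimal superkeys of the sample, smallest first."""
    attributes = sorted(rows[0])
    found = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            # A superset of a key already found is not minimal, so skip it.
            if any(set(key) <= set(attrs) for key in found):
                continue
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found


# Hypothetical STUDENT rows.
rows = [
    {"STU_ID": 1, "STU_EMAIL": "ana@example.edu", "STU_LNAME": "Smith", "STU_MAJOR": "CIS"},
    {"STU_ID": 2, "STU_EMAIL": "ben@example.edu", "STU_LNAME": "Smith", "STU_MAJOR": "MATH"},
    {"STU_ID": 3, "STU_EMAIL": "cy@example.edu", "STU_LNAME": "Jones", "STU_MAJOR": "CIS"},
    {"STU_ID": 4, "STU_EMAIL": "dee@example.edu", "STU_LNAME": "Smith", "STU_MAJOR": "CIS"},
]

wide_key = is_superkey(rows, ("STU_ID", "STU_LNAME"))   # unique, but not minimal
keys = candidate_keys(rows)
print(wide_key, keys)
```

In this sample, (`STU_ID`, `STU_LNAME`) is a superkey but not a candidate key, because `STU_ID` alone already identifies each row. Either `STU_ID` or `STU_EMAIL` could then be chosen as the primary key.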

Key Takeaways:

- Keys are essential for data integrity and relationships.
- Understanding different types of keys (superkey, candidate key, primary key, foreign key, secondary key) is crucial for database design.
- Functional dependency and full functional dependency are important concepts for determining keys.
- Null values are not allowed in primary keys and should be avoided elsewhere where possible.
- Foreign keys enforce referential integrity between tables.

#### 3-3 Integrity Rules

- Importance: Integrity rules are fundamental for good database design.
- Enforcement: Relational Database Management Systems (RDBMS) can automatically enforce these rules.
- Key Rules:
    - Entity Integrity: Ensures each row has a unique identity.
        - Requirement: All primary key (PK) entries are unique and not null.
        - Purpose: Each row is uniquely identifiable, and foreign keys (FK) can properly reference PK values.
        - Example: Invoices have unique, non-null invoice numbers.
    - Referential Integrity: Ensures valid references between tables.
        - Requirement: FK values must either be null (if not part of the table's PK) or match a PK value in the related table.
        - Purpose: Every FK reference is valid. A row cannot be deleted from one table while mandatory matching FK values in another table still refer to its PK.
        - Example: Customers might not have an assigned sales representative (agent), but if they do, the agent number must exist in the AGENT table.

Figure 3.3 Illustration

- CUSTOMER Table:
    - CUS_CODE is the PK, with unique and non-null values.
    - AGENT_CODE is an FK referencing the AGENT table.
- AGENT Table:
    - AGENT_CODE is the PK, with unique and non-null values.
- Null Handling:
    - The AGENT_CODE in the CUSTOMER table can be null (e.g., customer without an assigned agent).
    - Flags: A special code such as -99 can be stored instead of a null to mean "no agent assigned". The related table then needs a dummy row with that code so the reference stays valid.
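
The CUSTOMER and AGENT rules above can be exercised with a minimal `sqlite3` sketch. The table layout is reduced to the two key columns, and the specific code values (10010, 501, and so on) are assumed sample data. SQLite only checks foreign keys when `PRAGMA foreign_keys` is switched on, which is why that line appears first.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on
conn.executescript(
    """
    CREATE TABLE AGENT (
        AGENT_CODE INTEGER PRIMARY KEY
    );
    CREATE TABLE CUSTOMER (
        CUS_CODE   INTEGER PRIMARY KEY,
        AGENT_CODE INTEGER REFERENCES AGENT (AGENT_CODE)  -- nullable FK
    );
    INSERT INTO AGENT VALUES (501), (502);
    INSERT INTO CUSTOMER VALUES (10010, 501);
    INSERT INTO CUSTOMER VALUES (10011, NULL);  -- no agent assigned yet
    """
)


def accepted(sql, params=()):
    """Run one statement; return False if an integrity rule rejects it."""
    try:
        conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


bad_fk = accepted("INSERT INTO CUSTOMER VALUES (?, ?)", (10012, 999))  # no agent 999
delete_referenced = accepted("DELETE FROM AGENT WHERE AGENT_CODE = ?", (501,))
delete_unreferenced = accepted("DELETE FROM AGENT WHERE AGENT_CODE = ?", (502,))

# Flag alternative to null: -99 needs a dummy AGENT row to stay valid.
conn.execute("INSERT INTO AGENT VALUES (-99)")
flag_ok = accepted("INSERT INTO CUSTOMER VALUES (?, ?)", (10013, -99))
print(bad_fk, delete_referenced, delete_unreferenced, flag_ok)
```

The null FK is accepted, the reference to a missing agent is rejected, and agent 501 cannot be deleted while customer 10010 still refers to it.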

Other Integrity Constraints

- NOT NULL: Ensures a column cannot have null values.
- UNIQUE: Ensures a column has no duplicate values.
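
A short `sqlite3` sketch of these two constraints follows. The columns `AGENT_LNAME` and `AGENT_PHONE` and the sample values are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE AGENT (
        AGENT_CODE  INTEGER PRIMARY KEY,
        AGENT_LNAME TEXT NOT NULL,   -- a value is always required
        AGENT_PHONE TEXT UNIQUE      -- no two agents share a phone number
    )
    """
)
conn.execute("INSERT INTO AGENT VALUES (501, 'Alby', '555-0101')")


def accepted(row):
    """Return True if the row is inserted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO AGENT VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


missing_name = accepted((502, None, "555-0102"))      # violates NOT NULL
repeated_phone = accepted((503, "Hahn", "555-0101"))  # violates UNIQUE
valid_row = accepted((504, "Okon", "555-0104"))
print(missing_name, repeated_phone, valid_row)
```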

Key Takeaways:

- Integrity rules ensure data consistency and accuracy in relational databases.
- Entity integrity focuses on the uniqueness and non-nullability of primary keys.
- Referential integrity ensures valid relationships between tables through foreign keys.
- Null values can be handled using flags in some cases.
- NOT NULL and UNIQUE constraints provide additional data integrity.

#### 3-4 Relational Algebra

Relational Algebra Operators

- PROJECT (π):
    - Selects specific columns (attributes) from a table.
    - Produces a vertical subset of the table.
    - Notation: π<sub>attribute1, attribute2</sub>(table)
- UNION (∪):
    - Combines rows from two union-compatible tables (same number of columns, compatible domains).
    - Removes duplicate rows.
    - Notation: table1 ∪ table2
- INTERSECT (∩):
    - Yields rows that exist in both union-compatible tables.
    - Notation: table1 ∩ table2
- DIFFERENCE (-):
    - Yields rows from one table that are not found in another union-compatible table.
    - Order matters (table1 - table2 is different from table2 - table1).
    - Notation: table1 - table2
- PRODUCT (×):
    - Creates all possible pairs of rows from two tables (Cartesian product).
    - If table1 has m rows and table2 has n rows, the result has m * n rows.
    - Notation: table1 × table2
- JOIN (⋈):
    - Combines information from two or more tables based on common attributes.
    - Natural Join:
        - Selects rows with matching values in common attributes.
        - Eliminates duplicate columns.
        - Notation: table1 ⋈ table2
    - Equijoin:
        - Links tables based on an equality condition.
        - May have duplicate columns.
        - A type of theta join.
    - Theta Join (⋈<sub>θ</sub>):
        - Links tables using any comparison operator (e.g., <, >, =, ≠).
        - Notation: table1 ⋈<sub>θ</sub> table2
    - Inner Join:
        - Returns only matched rows.
    - Outer Join:
        - Returns matched rows plus unmatched rows from one or both tables.
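        - Left Outer Join (⟕): Returns all rows from the left table, with nulls in the right table's columns where there is no match.
        - Right Outer Join (⟖): Returns all rows from the right table, with nulls in the left table's columns where there is no match.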
- DIVIDE (÷):
    - Answers queries about one set of data being associated with all values in another set.
    - Uses a double-column dividend table and a single-column divisor table.
    - Notation: r ÷ s
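
Several of the operators above can be sketched in plain Python by treating a relation as a set of rows. This representation, the helper names (`relation`, `project`, `product`, `natural_join`), and the sample tables are all assumptions for illustration. UNION, INTERSECT, and DIFFERENCE map directly onto Python's set operators `|`, `&`, and `-`.

```python
def relation(rows):
    """Build a relation: a set of rows, each a frozenset of (attribute, value) pairs."""
    return {frozenset(row.items()) for row in rows}


def project(table, attrs):
    """PROJECT: keep only the named attributes; duplicate rows collapse in the set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in table}


def product(left, right):
    """PRODUCT: pair every left row with every right row (attribute names must differ)."""
    return {l | r for l in left for r in right}


def natural_join(left, right):
    """Natural join: merge row pairs that agree on every shared attribute."""
    result = set()
    for l in left:
        for r in right:
            l_map, r_map = dict(l), dict(r)
            shared = l_map.keys() & r_map.keys()
            if all(l_map[a] == r_map[a] for a in shared):
                result.add(frozenset({**l_map, **r_map}.items()))
    return result


# Hypothetical union-compatible tables (same single attribute).
east = relation([{"P_CODE": "A1"}, {"P_CODE": "B2"}])
west = relation([{"P_CODE": "B2"}, {"P_CODE": "C3"}])
sizes = relation([{"SIZE": "S"}, {"SIZE": "L"}])

union = east | west          # 3 rows: the duplicate B2 appears once
intersect = east & west      # 1 row: B2
east_only = east - west      # 1 row: A1
west_only = west - east      # 1 row: C3, so order matters
pairs = product(east, sizes)  # 2 * 2 = 4 rows

# Hypothetical CUSTOMER and AGENT tables sharing AGENT_CODE.
customer = relation([
    {"CUS_CODE": 10010, "AGENT_CODE": 501},
    {"CUS_CODE": 10011, "AGENT_CODE": 502},
    {"CUS_CODE": 10012, "AGENT_CODE": 501},
])
agent = relation([
    {"AGENT_CODE": 501, "AGENT_LNAME": "Alby"},
    {"AGENT_CODE": 502, "AGENT_LNAME": "Hahn"},
    {"AGENT_CODE": 503, "AGENT_LNAME": "Okon"},
])

agent_codes = project(customer, {"AGENT_CODE"})  # vertical subset, 2 distinct rows
joined = natural_join(customer, agent)           # 3 rows, AGENT_CODE appears once
print(len(union), len(intersect), len(pairs), len(agent_codes), len(joined))
```

Agent 503 has no customers, so it does not appear in the natural join result.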

Key Takeaways:

- Relational algebra provides a formal way to manipulate and retrieve data from relational databases.
- Operators like PROJECT, UNION, INTERSECT, DIFFERENCE, and PRODUCT are fundamental.
- JOIN is a powerful operator for combining data from multiple tables, with different types (natural, equi, theta, inner, outer) serving specific purposes.
- DIVIDE is used for specific query types.
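
To make the outer join and DIVIDE operators from this section more concrete, here is a rough Python sketch using lists of dictionaries as tables. The function names, their parameters, and the sample data are hypothetical, and `None` stands in for a null.

```python
def left_outer_join(left, right, attr):
    """Left outer join on attr: unmatched left rows are kept, padded with None."""
    right_attrs = {a for r in right for a in r} - {attr}
    result = []
    for l in left:
        matches = [r for r in right if r[attr] == l[attr]]
        if matches:
            result.extend({**l, **r} for r in matches)
        else:
            result.append({**l, **{a: None for a in right_attrs}})
    return result


def divide(dividend, divisor, shared, other):
    """DIVIDE: values of `other` that are paired with every `shared` value in divisor."""
    required = {row[shared] for row in divisor}
    seen = {}
    for row in dividend:
        seen.setdefault(row[other], set()).add(row[shared])
    return sorted(value for value, found in seen.items() if required <= found)


# Hypothetical CUSTOMER rows; customer 10011 has no agent assigned.
customer = [
    {"CUS_CODE": 10010, "AGENT_CODE": 501},
    {"CUS_CODE": 10011, "AGENT_CODE": None},
]
agent = [
    {"AGENT_CODE": 501, "AGENT_LNAME": "Alby"},
    {"AGENT_CODE": 502, "AGENT_LNAME": "Hahn"},
]
outer = left_outer_join(customer, agent, "AGENT_CODE")

# Two-column dividend and single-column divisor.
dividend = [
    {"CODE": "A", "LOC": 5},
    {"CODE": "A", "LOC": 9},
    {"CODE": "B", "LOC": 5},
    {"CODE": "B", "LOC": 3},
    {"CODE": "C", "LOC": 9},
]
divisor = [{"CODE": "A"}, {"CODE": "B"}]
quotient = divide(dividend, divisor, shared="CODE", other="LOC")
print(outer, quotient)
```

Only `LOC` 5 is associated with both `CODE` values A and B, so it is the single value in the quotient. In the outer join, customer 10011 is kept with `AGENT_LNAME` set to `None`.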