# Normalization

Normalization is a fundamental principle in relational database design that protects data integrity, reduces redundancy, and keeps schemas maintainable. Understanding normalization helps you design databases that are robust, efficient, and accurately represent your domain.


## The Purpose of Normalization

Normalization addresses several critical problems that arise in poorly designed databases:


### Problems Solved by Normalization

**1. Update Anomalies**
```
Bad Design: Storing employee department info in every project record
┌────────────┬──────────┬────────────┬─────────────┐
│ project_id │ emp_name │ dept_name  │ dept_phone  │
├────────────┼──────────┼────────────┼─────────────┤
│ P1         │ Alice    │ Engineering│ 555-0100    │
│ P2         │ Alice    │ Engineering│ 555-0100    │  ← Redundant!
│ P3         │ Alice    │ Engineering│ 555-0100    │  ← Redundant!
└────────────┴──────────┴────────────┴─────────────┘

Problem: If department phone changes, must update multiple rows
Risk: Updates might be missed, creating inconsistencies
```
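The update anomaly can be reproduced in a few lines. This is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in database; the table name `project` and the new phone number are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE project (project_id TEXT PRIMARY KEY, emp_name TEXT, "
    "dept_name TEXT, dept_phone TEXT)"
)
conn.executemany(
    "INSERT INTO project VALUES (?, ?, ?, ?)",
    [
        ("P1", "Alice", "Engineering", "555-0100"),
        ("P2", "Alice", "Engineering", "555-0100"),
        ("P3", "Alice", "Engineering", "555-0100"),
    ],
)

# An incomplete update: only one of the three redundant copies is changed.
conn.execute("UPDATE project SET dept_phone = '555-0199' WHERE project_id = 'P1'")

# The same department now has two different phone numbers on record.
phones = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT dept_phone FROM project WHERE dept_name = 'Engineering'"
    )
)
print(phones)
```

Nothing in the schema prevents the inconsistent state, which is why the redundant design is risky.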


**2. Insertion Anomalies**
```
Bad Design: Cannot add a department without having projects
Problem: Department information only exists in project records
Result: Cannot represent departments that have no active projects
```

**3. Deletion Anomalies**
```
Bad Design: Deleting last project removes department information
Problem: Department data is tied to project existence
Result: Lose department records when all projects complete
```
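The insertion and deletion anomalies can be contrasted with a normalized layout in the same way. The sketch below again uses `sqlite3` as a stand-in; the table names and the `Research` department are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Denormalized: department facts live only inside project rows.
    CREATE TABLE project_flat (
        project_id TEXT PRIMARY KEY, dept_name TEXT, dept_phone TEXT
    );
    INSERT INTO project_flat VALUES ('P1', 'Engineering', '555-0100');

    -- Normalized: departments are stored on their own.
    CREATE TABLE department (dept_name TEXT PRIMARY KEY, dept_phone TEXT);
    CREATE TABLE project (
        project_id TEXT PRIMARY KEY,
        dept_name TEXT REFERENCES department(dept_name)
    );
    INSERT INTO department VALUES ('Engineering', '555-0100');
    INSERT INTO department VALUES ('Research', '555-0200');  -- no projects yet
    INSERT INTO project VALUES ('P1', 'Engineering');

    -- The last project completes in both designs.
    DELETE FROM project_flat WHERE project_id = 'P1';
    DELETE FROM project WHERE project_id = 'P1';
    """
)

# Count the departments each design still knows about.
flat_depts = conn.execute(
    "SELECT COUNT(DISTINCT dept_name) FROM project_flat"
).fetchone()[0]
normalized_depts = conn.execute("SELECT COUNT(*) FROM department").fetchone()[0]
print(flat_depts, normalized_depts)
```

In the flat table the department disappears with its last project, and a department without projects could never have been recorded. The separate `department` table keeps both.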

**4. Data Redundancy**
```
Bad Design: Same information repeated in multiple rows
Problem: Wastes storage, and every copy must be kept in sync
Result: Larger database and more opportunities for inconsistent copies
```


### Benefits of Normalization

When properly normalized, databases achieve:

* **Data Integrity**: Each fact is stored in exactly one place
* **Consistency**: Updates propagate correctly without anomalies
* **Maintainability**: Changes are localized to specific tables
* **Clarity**: Schema structure reflects real-world entities clearly
* **Query Correctness**: Relationships are explicitly defined and enforced


## Classical Normalization: Codd's Normal Forms

Edgar F. Codd, the inventor of the relational model, developed a formal theory of normalization based on **functional dependencies** between attributes. His work established a progression of "normal forms"—increasingly strict rules for database design [@10.1145/358024.358054].


### Historical Context

Codd introduced normalization in the early 1970s, **before** the Entity-Relationship (ER) model was developed by Peter Chen in 1976. At that time, database design was primarily concerned with:
- Mathematical properties of relations (tables)
- Functional dependencies between attribute domains
- Decomposition of relations to eliminate anomalies

The conceptual framework of "entities" and "relationships" came later, providing a more intuitive way to think about normalization.


### The Classical Normal Forms

Codd and his successors defined a series of normal forms, each addressing specific types of problems:


#### First Normal Form (1NF)

**Rule**: All attributes must contain atomic (indivisible) values—no repeating groups or arrays.

**Violation Example**:
```
┌────────────┬─────────────────────────┐
│ student_id │ courses                 │
├────────────┼─────────────────────────┤
│ 1          │ Math, Physics, Chemistry│  ← NOT atomic!
└────────────┴─────────────────────────┘
```

**Normalized (1NF)**:
```
┌────────────┬───────────┐
│ student_id │ course    │
├────────────┼───────────┤
│ 1          │ Math      │
│ 1          │ Physics   │
│ 1          │ Chemistry │
└────────────┴───────────┘
```
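Converting the non-atomic column into atomic rows is mechanical. A minimal sketch in plain Python, assuming the courses are stored as a comma-separated string as in the violation example:

```python
# Non-atomic rows: one string holds several courses.
unnormalized = [
    {"student_id": 1, "courses": "Math, Physics, Chemistry"},
]

# Emit one row per (student, course) pair so every value is atomic.
normalized = [
    {"student_id": row["student_id"], "course": course.strip()}
    for row in unnormalized
    for course in row["courses"].split(",")
]

for row in normalized:
    print(row)
```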


#### Second Normal Form (2NF)

**Rule**: Must be in 1NF, and all non-key attributes must depend on the **entire** primary key (not just part of it).

**Violation Example** (composite primary key: student_id, course_id):
```
┌─────────────┬────────────┬──────────────┬──────────────┐
│ *student_id │ *course_id │ student_name │ course_title │
├─────────────┼────────────┼──────────────┼──────────────┤
│ 1           │ CS101      │ Alice        │ Databases    │
│ 1           │ CS102      │ Alice        │ Algorithms   │  ← Alice repeated!
└─────────────┴────────────┴──────────────┴──────────────┘

Problem: student_name depends only on student_id (part of PK)
Problem: course_title depends only on course_id (part of PK)
```

**Normalized (2NF)**:
```
Student table:           Course table:           Enrollment table:
┌─────────────┬──────┐   ┌────────────┬──────┐   ┌─────────────┬────────────┐
│ *student_id │ name │   │ *course_id │ title│   │ *student_id │ *course_id │
├─────────────┼──────┤   ├────────────┼──────┤   ├─────────────┼────────────┤
│ 1           │ Alice│   │ CS101      │ DB   │   │ 1           │ CS101      │
│ 2           │ Bob  │   │ CS102      │ Algo │   │ 1           │ CS102      │
└─────────────┴──────┘   └────────────┴──────┘   └─────────────┴────────────┘
```
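The decomposition into 2NF amounts to three relational projections of the original table. The sketch below illustrates this in plain Python; the helper `project` is a hypothetical name, not part of any library.

```python
flat = [
    {"student_id": 1, "course_id": "CS101",
     "student_name": "Alice", "course_title": "Databases"},
    {"student_id": 1, "course_id": "CS102",
     "student_name": "Alice", "course_title": "Algorithms"},
]


def project(rows, *columns):
    """Relational projection: keep the given columns and drop duplicate rows."""
    seen = []
    for row in rows:
        item = tuple(row[c] for c in columns)
        if item not in seen:
            seen.append(item)
    return seen


students = project(flat, "student_id", "student_name")   # depends on student_id only
courses = project(flat, "course_id", "course_title")     # depends on course_id only
enrollments = project(flat, "student_id", "course_id")   # the full composite key

print(students)
print(courses)
print(enrollments)
```

After the projection, Alice's name is stored once instead of once per enrollment.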


#### Third Normal Form (3NF)

**Rule**: Must be in 2NF, and no non-key attribute depends on another non-key attribute (no transitive dependencies).

**Violation Example**:
```
┌─────────────┬──────┬─────────┬────────────┐
│ *student_id │ name │ dept_id │ dept_name  │
├─────────────┼──────┼─────────┼────────────┤
│ 1           │ Alice│ CS      │ Comp Sci   │
│ 2           │ Bob  │ CS      │ Comp Sci   │  ← Dept name repeated!
└─────────────┴──────┴─────────┴────────────┘

Problem: dept_name depends on dept_id, which depends on student_id
This is a transitive dependency: student_id → dept_id → dept_name
```

**Normalized (3NF)**:
```
Student table:                    Department table:
┌─────────────┬──────┬─────────┐   ┌──────────┬──────────┐
│ *student_id │ name │ dept_id │   │ *dept_id │ name     │
├─────────────┼──────┼─────────┤   ├──────────┼──────────┤
│ 1           │ Alice│ CS      │   │ CS       │ Comp Sci │
│ 2           │ Bob  │ CS      │   │ MATH     │ Math     │
└─────────────┴──────┴─────────┘   └──────────┴──────────┘
```


### Functional Dependencies

The classical normal forms are rooted in the concept of **functional dependencies**:

A functional dependency `A → B` means:
- "Attribute A functionally determines attribute B"
- If you know A, you can determine B uniquely
- Example: `student_id → student_name` (student ID determines name)
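A functional dependency can be tested against sample data. The sketch below uses a hypothetical helper, `holds`, and the rows from the 3NF violation example above. Sample data can only refute a dependency: whether it holds in general is a statement about the domain, not about the rows that happen to be present.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"student_id": 1, "name": "Alice", "dept_id": "CS", "dept_name": "Comp Sci"},
    {"student_id": 2, "name": "Bob", "dept_id": "CS", "dept_name": "Comp Sci"},
]

dept_determines_dept_name = holds(rows, ["dept_id"], ["dept_name"])
dept_determines_student = holds(rows, ["dept_id"], ["name"])
print(dept_determines_dept_name, dept_determines_student)
```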

**Normalization process**:
1. Identify all functional dependencies
2. Decompose relations to eliminate:
   - Partial dependencies (violate 2NF)
   - Transitive dependencies (violate 3NF)
3. Verify that each resulting relation contains only attributes that depend directly on its entire primary key

**The famous mnemonic**: "Every non-key attribute must depend on the key, the whole key, and nothing but the key, so help me Codd!"

This mathematical approach, while rigorous, can be complex and difficult to apply intuitively.


## DataJoint's Entity-Centric Normalization

DataJoint takes a different, more intuitive approach to normalization that emerged after the development of the Entity-Relationship model. Rather than focusing on functional dependencies between attribute domains, DataJoint emphasizes **entities and their attributes**.


### The DataJoint Normalization Principle

> **Every relation (table) must represent a well-defined entity type, and all attributes in that table must describe that entity type directly and only that entity type.**

This principle guides schema design, ensuring that each table represents a coherent entity type.


### What This Means in Practice

#### Rule 1: One Entity Type Per Table

Each table should represent exactly one class or type of entity:


**Good (Normalized)**:


```python
import datajoint as dj

schema = dj.Schema("university")  # hypothetical schema name used for illustration

@schema
class Professor(dj.Manual):
    definition = """
    professor_id : int
    ---
    name : varchar(100)
    hire_date : date
    """

@schema
class Office(dj.Manual):
    definition = """
    office_number : varchar(10)
    ---
    building : varchar(50)
    phone : varchar(20)
    """
```


**Bad (Not Normalized)**:


```python
@schema
class Professor(dj.Manual):
    definition = """
    professor_id : int
    ---
    name : varchar(100)
    hire_date : date
    office_number : varchar(10)   # Describes office, not professor!
    office_building : varchar(50)  # Describes office, not professor!
    office_phone : varchar(20)     # Describes office, not professor!
    """
```


**Why it's bad**: The office attributes describe the office, not the professor. A professor might change offices, but the office's building and phone don't change. These are properties of the office entity, not the professor entity.


#### Rule 2: All Attributes Must Describe ONLY That Entity

Every attribute in a table should be an intrinsic property of the entity represented by that table's primary key:

**Questions to ask**:
- Does this attribute describe the entity identified by this row's primary key?
- Would this attribute still apply if the entity's relationships changed?
- Is this attribute a permanent property of this entity?


**Example: Customer and Account**


```python
# Good: Attributes describe the customer
@schema
class Customer(dj.Manual):
    definition = """
    customer_id : int
    ---
    name : varchar(100)           # Property of customer
    date_of_birth : date          # Property of customer
    social_security : varchar(11) # Property of customer
    """

# Good: Attributes describe the account
@schema
class Account(dj.Manual):
    definition = """
    account_number : int
    ---
    -> Customer
    open_date : date              # Property of account
    balance : decimal(10,2)       # Property of account
    account_type : varchar(20)    # Property of account
    """
```


The foreign key `-> Customer` is not an attribute OF the account—it's a **relationship** between account and customer. The account belongs to a customer, but the customer's identity is not a property of the account itself.


#### Rule 3: Relationships in Separate Tables When Needed

When entities relate to each other, sometimes that relationship itself needs to be represented as a separate entity:

**Example: Professor Office Assignment**


```python
@schema
class Professor(dj.Manual):
    definition = """
    professor_id : int
    ---
    name : varchar(100)
    hire_date : date
    """

@schema
class Office(dj.Manual):
    definition = """
    office_number : varchar(10)
    ---
    building : varchar(50)
    phone : varchar(20)
    """

@schema
class ProfessorOfficeAssignment(dj.Manual):
    definition = """
    -> Professor
    ---
    -> Office
    assignment_date : date        # Property of the ASSIGNMENT
    """
```


**Why separate?**:
- The assignment is not a property of the professor (professors can change offices)
- The assignment is not a property of the office (offices can be reassigned)
- The assignment is an entity in itself, with its own properties (assignment_date)


### Normalization Requires Segregation

The process of normalization often requires **breaking a single table into multiple tables**, segregating information into distinct entity types:

#### Example: Denormalized Design


```python
# BAD: Mixing customer, account, and transaction data
@schema
class CustomerAccountTransaction(dj.Manual):
    definition = """
    transaction_id : int
    ---
    customer_name : varchar(100)       # Customer property
    customer_email : varchar(100)      # Customer property
    account_number : int               # Account identity
    account_type : varchar(20)         # Account property
    transaction_amount : decimal(10,2) # Transaction property
    transaction_date : date            # Transaction property
    """
```


**Problems**:
- Customer info repeated for every transaction (redundancy)
- Account info repeated for every transaction (redundancy)
- Can't have customers without transactions (insertion anomaly)
- Can't have accounts without transactions (insertion anomaly)

#### Normalized Design


```python
@schema
class Customer(dj.Manual):
    definition = """
    customer_id : int
    ---
    name : varchar(100)
    email : varchar(100)
    """

@schema
class Account(dj.Manual):
    definition = """
    account_number : int
    ---
    -> Customer
    account_type : varchar(20)
    """

@schema
class Transaction(dj.Manual):
    definition = """
    transaction_id : int
    ---
    -> Account
    amount : decimal(10,2)
    transaction_date : date
    """
```


**Benefits**:
- Each entity type is separate (Customer, Account, Transaction)
- No redundancy (customer info stored once)
- Can have customers without accounts, accounts without transactions
- Each table contains only attributes of that entity type
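Segregating the data loses nothing: the flat view can be rebuilt with joins whenever it is needed. The sketch below mirrors the three normalized tables in `sqlite3` as a stand-in database. The sample rows are hypothetical, and the transaction table is named `txn` because `TRANSACTION` is an SQL keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    CREATE TABLE account (
        account_number INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        account_type TEXT
    );
    CREATE TABLE txn (
        transaction_id INTEGER PRIMARY KEY,
        account_number INTEGER REFERENCES account(account_number),
        amount REAL,
        transaction_date TEXT
    );
    INSERT INTO customer VALUES (1, 'Alice', 'alice@example.com');
    INSERT INTO account VALUES (1001, 1, 'checking');
    INSERT INTO txn VALUES (1, 1001, 25.0, '2024-01-10');
    INSERT INTO txn VALUES (2, 1001, -5.0, '2024-01-11');
    """
)

# Rebuild the denormalized view on demand by following the foreign keys.
flat = conn.execute(
    """
    SELECT t.transaction_id, c.name, c.email, a.account_number,
           a.account_type, t.amount, t.transaction_date
    FROM txn AS t
    JOIN account AS a ON a.account_number = t.account_number
    JOIN customer AS c ON c.customer_id = a.customer_id
    ORDER BY t.transaction_id
    """
).fetchall()

# The customer's details appear in every joined row but are stored only once.
customer_rows = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
print(len(flat), customer_rows)
```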


## Comparing Classical and DataJoint Normalization

### Classical Approach: Functional Dependencies

- **Focus**: Mathematical properties of relations
- **Question**: "What attributes determine what other attributes?"
- **Method**: Identify functional dependencies, decompose to eliminate violations


**Example analysis**:
```
Relation: (student_id, course_id, student_name, dept_id, dept_name)

Functional dependencies:
- student_id → student_name, dept_id
- dept_id → dept_name
- course_id → (nothing in this relation)

Violations:
- Partial dependency: student_name depends on part of PK (violates 2NF)
- Transitive dependency: student_id → dept_id → dept_name (violates 3NF)

Decomposition:
- Student(student_id, student_name, dept_id)
- Department(dept_id, dept_name)
- Enrollment(student_id, course_id)
```
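Part of this analysis can be automated with the standard attribute-closure computation: starting from a set of attributes, keep adding everything the functional dependencies allow you to derive. The sketch below applies it to the dependencies listed above; the function name `closure` is chosen for illustration.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under the given FDs."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know the whole left side, we also know the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    (("student_id",), ("student_name", "dept_id")),
    (("dept_id",), ("dept_name",)),
]
all_attributes = {"student_id", "course_id", "student_name", "dept_id", "dept_name"}

# student_id alone reaches dept_name through dept_id (the transitive dependency)
student_closure = closure({"student_id"}, fds)

# Only the pair (student_id, course_id) determines every attribute, so it is the key
is_key = closure({"student_id", "course_id"}, fds) == all_attributes
print(sorted(student_closure), is_key)
```

Because `student_id` alone determines `student_name`, a non-key attribute depends on only part of the key, which is the 2NF violation identified above.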

**Complexity**: Requires formal analysis of all functional dependencies. Can be difficult to apply intuitively.


### DataJoint Approach: Entity-Centric Normalization

- **Focus**: Entities and their intrinsic properties
- **Question**: "Does each attribute describe the entity identified by this row's primary key?"
- **Method**: Design tables so each represents one entity type with only its own attributes


**Example analysis**:
```
Ask: What entity types do we have?
- Students (identified by student_id)
- Departments (identified by dept_id)
- Courses (identified by course_id)
- Enrollments (relationships between students and courses)

Design:
- Student table: Contains only student properties
- Department table: Contains only department properties
- Course table: Contains only course properties
- Enrollment table: Relates students to courses
```

**Simplicity**: Think about entities and what properties belong to them. Much more intuitive than analyzing functional dependencies.


### The Key Insight

In most practical cases, both approaches lead to the same decomposition, but through different reasoning:

- **Classical normalization**: "Eliminate functional dependency violations"
- **DataJoint normalization**: "Separate distinct entity types"

DataJoint's approach is easier to understand and apply because it maps directly to how we conceptualize domains: as collections of entities with properties, not as collections of attributes with dependencies.


## Immutability and Workflow: Core Design Principles

DataJoint's normalization philosophy extends beyond avoiding redundancy—it fundamentally shapes how you think about data manipulation and schema evolution. Two principles are central: **immutability of tuples** and **schemas as workflows**.


### Principle 1: Entities as Immutable Tuples

A well-normalized DataJoint schema is designed so that all data manipulations are accomplished through **insert and delete operations only**, without requiring updates to existing rows.

#### Rows as Immutable Objects

**Concept**: Each row (tuple) in a table represents an immutable entity. Once created, the entity cannot be modified—only created or destroyed.

**Operations allowed**:
- ✅ **INSERT**: Create a new entity
- ✅ **DELETE**: Remove an entity
- ❌ **UPDATE**: Modify an existing entity's attributes (avoided)


**Why immutability?**

1. **Referential integrity operates on entire tuples**: Foreign keys establish relationships between complete rows, not individual attributes. When you reference a parent row, you're referencing its entire identity.

2. **History preservation**: Immutable records create an audit trail. Instead of overwriting a value, you insert a new record with the changed value. The old record can be kept when history matters, or deleted when it does not.

3. **Consistency with constraints**: All constraints (foreign keys, unique indexes) relate entire tuples. Updates can violate constraints in subtle ways that inserts and deletes handle explicitly.

4. **Parallel processing**: Immutable data can be safely read by multiple processes without locks or race conditions.


#### Example: Mutable vs. Immutable Design

**Bad Design (Requires Updates)**:


```python
@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id : int
    ---
    date_of_birth : date
    sex : enum('M', 'F')
    cage_id : int                 # ✗ Will change over time!
    current_weight : decimal(5,2) # ✗ Changes frequently!
    """

# Typical operation: UPDATE mouse SET cage_id = 42 WHERE mouse_id = 1
# Problem: Requires modifying existing row
```


**Good Design (Insert/Delete Only)**:


```python
@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id : int
    ---
    date_of_birth : date          # ✓ Permanent property
    sex : enum('M', 'F')          # ✓ Permanent property
    """

@schema
class Cage(dj.Manual):
    definition = """
    cage_id : int
    ---
    location : varchar(50)        # ✓ Property of cage
    size : varchar(20)            # ✓ Property of cage
    """

@schema
class CageAssignment(dj.Manual):
    definition = """
    -> Mouse
    start_date : date
    ---
    -> Cage
    """

# Relocating a mouse: DELETE old assignment, INSERT new assignment
# Or simply add a new assignment, keeping history of all assignments
# No updates needed: records stay immutable
```
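The insert-only variant of relocation can be sketched with `sqlite3` as a stand-in database. The table mirrors `CageAssignment`; the dates and cage numbers are hypothetical. The current cage is the assignment with the latest `start_date`, so no row is ever modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cage_assignment ("
    "mouse_id INTEGER, start_date TEXT, cage_id INTEGER, "
    "PRIMARY KEY (mouse_id, start_date))"
)

# Each relocation is a new row; ISO dates sort correctly as text.
conn.executemany(
    "INSERT INTO cage_assignment VALUES (?, ?, ?)",
    [(1, "2024-01-05", 10), (1, "2024-03-20", 42)],
)

# Current cage: the most recent assignment for this mouse.
current_cage = conn.execute(
    "SELECT cage_id FROM cage_assignment WHERE mouse_id = ? "
    "ORDER BY start_date DESC LIMIT 1",
    (1,),
).fetchone()[0]

# The full history is still available.
history_length = conn.execute(
    "SELECT COUNT(*) FROM cage_assignment WHERE mouse_id = ?", (1,)
).fetchone()[0]
print(current_cage, history_length)
```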


#### Why This Works

**Foreign keys relate entire tuples**:
```
# When CageAssignment references Mouse:
-> Mouse

# It references the ENTIRE mouse entity (all its attributes together)
# Not just mouse_id in isolation
# The relationship is to the tuple: (mouse_id, date_of_birth, sex)
```

**Benefits**:
- Foreign key constraints check entire tuple existence
- No risk of partial updates breaking references
- Relationship semantics are clear: "this assignment relates to this specific mouse entity"


### Principle 2: Schemas as Workflows with Data Dependencies

DataJoint views schemas as **directed workflows** where downstream data depends on and is potentially derived from upstream data. This has profound implications for normalization.

#### Downstream Data Depends on Upstream Data

In a workflow schema:
- **Upstream tables**: Independent entities, source data
- **Downstream tables**: Derived results, computed values, dependent entities

**Critical insight**: Updating a secondary attribute in an upstream table may **invalidate** downstream data that was computed or derived from it.


#### Example: Data Invalidation

```
    RawImage                    ← Upstream: source data
       ↓
   PreprocessedImage            ← Derived from RawImage
       ↓
   SegmentedCells               ← Derived from PreprocessedImage
       ↓
   CellActivity                 ← Analyzed from SegmentedCells
```

**Scenario**: You discover an error in a `RawImage` record that downstream results were computed from

**If you UPDATE `RawImage`**:
- ✗ All downstream data (`PreprocessedImage`, `SegmentedCells`, `CellActivity`) is now invalid
- ✗ No automatic mechanism to detect or fix the inconsistency
- ✗ Results may be incorrect, and you might not notice

**If you DELETE and INSERT**:
- ✓ The foreign key constraints do not allow a `RawImage` row to be deleted while downstream rows still reference it
- ✓ You are forced to delete the dependent downstream rows as well (DataJoint's `delete()` cascades to them after asking for confirmation)
- ✓ You then reinsert the corrected data and recompute, which ensures consistency
- ✓ Data dependencies are explicit and enforced
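The contrast between the two paths can be reproduced outside DataJoint. The sketch below uses `sqlite3` as a stand-in, with hypothetical columns `exposure` and `normalized`, where the derived value is assumed to be `1.0 / exposure`. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE raw_image (image_id INTEGER PRIMARY KEY, exposure REAL);
    CREATE TABLE preprocessed_image (
        image_id INTEGER PRIMARY KEY REFERENCES raw_image(image_id),
        normalized REAL  -- derived value, assumed to be 1.0 / exposure
    );
    INSERT INTO raw_image VALUES (42, 2.0);
    INSERT INTO preprocessed_image VALUES (42, 0.5);
    """
)

# UPDATE: accepted silently, yet the derived row no longer matches its input.
conn.execute("UPDATE raw_image SET exposure = 4.0 WHERE image_id = 42")
exposure, normalized = conn.execute(
    "SELECT r.exposure, p.normalized FROM raw_image AS r "
    "JOIN preprocessed_image AS p ON p.image_id = r.image_id"
).fetchone()
stale = normalized != 1.0 / exposure

# DELETE: the dependency is enforced, so the upstream row cannot simply vanish.
try:
    conn.execute("DELETE FROM raw_image WHERE image_id = 42")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True

# Removing the downstream row first makes the upstream delete legal.
conn.execute("DELETE FROM preprocessed_image WHERE image_id = 42")
conn.execute("DELETE FROM raw_image WHERE image_id = 42")
remaining = conn.execute("SELECT COUNT(*) FROM raw_image").fetchone()[0]
print(stale, delete_blocked, remaining)
```

The update leaves a stale derived value without any error, while the delete path forces the dependent data to be dealt with first.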


#### Why Updates Are Dangerous in Workflows

**Updates bypass referential integrity**:


```python
PreprocessedImage.update1({"image_id": 42, "brightness": 1.5})

# But this image_id = 42 is referenced by downstream tables!
# Those downstream results are now based on outdated input
# No error raised, no warning given
```


**Delete-and-insert makes dependencies explicit**:


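```python
# Uses the Mouse and CageAssignment tables defined above.

# delete_quick() does not cascade: it fails with an error
# while CageAssignment rows still reference the mouse
(Mouse & {"mouse_id": 1}).delete_quick()

# Must delete downstream first
(CageAssignment & {"mouse_id": 1}).delete_quick()  # Delete dependent data
(Mouse & {"mouse_id": 1}).delete_quick()           # Then delete source

# Or use delete(), which cascades: after asking for confirmation,
# it deletes the mouse and all dependent data
(Mouse & {"mouse_id": 1}).delete()
```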


### Design Principle: Permanent vs. Changeable Attributes

A crucial normalization decision: **What attributes can change over time?**

#### The Rule

**Separate changeable attributes into their own entity tables**: Associate only **permanent, intrinsic attributes** with each entity.

- **Permanent attributes**: Properties that are fixed once the entity is created
- **Changeable attributes**: Properties that may be updated during the entity's lifetime


#### Example 1: Animal Housing (from Lecture)

**Bad Design**:


```python
@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id : int
    ---
    date_of_birth : date          # ✓ Permanent
    sex : enum('M', 'F')          # ✓ Permanent
    cage_id : int                 # ✗ CHANGEABLE! Mouse can be relocated
    """

# Problem: When mouse is relocated, must UPDATE the mouse table
# This modifies the mouse entity, but location is not intrinsic to the mouse
```


**Good Design**:


```python
@schema
class Mouse(dj.Manual):
    definition = """
    mouse_id : int
    ---
    date_of_birth : date          # ✓ Permanent property of mouse
    sex : enum('M', 'F')          # ✓ Permanent property of mouse
    """

@schema
class Cage(dj.Manual):
    definition = """
    cage_id : int
    ---
    location : varchar(50)        # ✓ Property of cage
    capacity : int                # ✓ Property of cage
    """

@schema
class CageAssignment(dj.Manual):
    definition = """
    -> Mouse
    assignment_date : date        # When assignment started
    ---
    -> Cage
    """

# Relocating a mouse:
# 1. DELETE from CageAssignment WHERE mouse_id = X
# 2. INSERT into CageAssignment (mouse_id, cage_id, assignment_date) VALUES (X, Y, today)
# No updates: the mouse entity remains immutable
```


**Why this is better**:
- Mouse entity is truly immutable (permanent attributes only)
- Cage assignment is a separate entity (the relationship)
- History can be preserved (keep old assignments, add new ones)
- No updates needed—just insert/delete operations

#### Example 2: Employee Department Assignment

**Bad Design**:


```python
@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int
    ---
    name : varchar(100)           # ✓ Permanent (rarely changes)
    hire_date : date              # ✓ Permanent
    department_id : int           # ✗ CHANGEABLE! Employee can transfer
    """

# Problem: Employee transfers require UPDATE
# Modifies the employee entity unnecessarily
```


**Good Design**:


```python
@schema
class Employee(dj.Manual):
    definition = """
    employee_id : int
    ---
    name : varchar(100)
    hire_date : date
    """

@schema
class Department(dj.Manual):
    definition = """
    department_id : int
    ---
    department_name : varchar(100)
    """

@schema
class DepartmentAssignment(dj.Manual):
    definition = """
    -> Employee
    assignment_date : date
    ---
    -> Department
    """

# Employee transfers:
# 1. End current assignment (DELETE or add end_date)
# 2. Create new assignment (INSERT)
# Employee entity never modified
```


#### Example 3: Product Pricing

**Bad Design**:


```python
@schema
class Product(dj.Manual):
    definition = """
    product_id : int
    ---
    name : varchar(100)           # ✓ Permanent
    description : varchar(500)    # ✓ Permanent
    current_price : decimal(10,2) # ✗ CHANGEABLE! Prices fluctuate
    """

# Problem: Price changes require UPDATE
# Loses price history
```


**Good Design**:


```python
@schema
class Product(dj.Manual):
    definition = """
    product_id : int
    ---
    name : varchar(100)           # ✓ Permanent
    description : varchar(500)    # ✓ Permanent
    """

@schema
class ProductPrice(dj.Manual):
    definition = """
    -> Product
    effective_date : date
    ---
    price : decimal(10,2)         # ✓ Price at this date
    """

# Price changes: INSERT new ProductPrice record
# Preserves history, no updates needed
```
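With price history stored as separate rows, the price in effect on any date is the row with the latest `effective_date` on or before that date. The sketch below uses `sqlite3` as a stand-in; the helper `price_on` and the sample prices are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_price ("
    "product_id INTEGER, effective_date TEXT, price REAL, "
    "PRIMARY KEY (product_id, effective_date))"
)

# Every price change is a new row; nothing is overwritten.
conn.executemany(
    "INSERT INTO product_price VALUES (?, ?, ?)",
    [(1, "2024-01-01", 9.99), (1, "2024-06-01", 12.49)],
)


def price_on(product_id, date):
    """Return the price in effect on `date`, or None if none was set yet."""
    row = conn.execute(
        "SELECT price FROM product_price "
        "WHERE product_id = ? AND effective_date <= ? "
        "ORDER BY effective_date DESC LIMIT 1",
        (product_id, date),
    ).fetchone()
    return row[0] if row else None


print(price_on(1, "2024-03-15"))
print(price_on(1, "2024-07-01"))
print(price_on(1, "2023-12-31"))
```

The same lookup on the bad design could only ever return the current price.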


### The Temporal Evolution Design Rule

**Design Rule**: Think about the typical evolution of your data over time. Attributes that change should be modeled as separate entities.

**Questions to ask**:
1. Will this attribute ever change during the entity's lifetime?
2. If it changes, does the entity become "a different entity" or just "the same entity with updated info"?
3. Do I need to preserve history of changes?

**Decision tree**:
```
Is this attribute permanent for the entity's lifetime?
│
├─ YES ──→ Include in entity table
│          Example: Mouse.date_of_birth, Mouse.sex
│
└─ NO ───→ Create separate association/time-series table
           Example: CageAssignment, WeightMeasurement
```


## Normalization in Schema Design

When designing a DataJoint schema, apply the normalization principle at each table:

### Step 1: Identify Entity Types

Ask: "What are the things (entities) in my domain?"

**Example domain**: University course management
- Students
- Professors  
- Courses
- Departments
- Enrollments (student taking a course)
- Course Offerings (course taught in a specific semester)


### Step 2: For Each Entity, List Its Intrinsic Properties

**Intrinsic properties**: Attributes that describe the entity itself, regardless of its relationships


```python
@schema
class Student(dj.Manual):
    definition = """
    student_id : int
    ---
    name : varchar(100)           # ✓ Property of student
    date_of_birth : date          # ✓ Property of student
    email : varchar(100)          # ✓ Property of student
    # NOT: major_name              ✗ Property of major, not student
    # NOT: advisor_name            ✗ Property of professor, not student
    """
```


## The "Nothing But The Entity" Rule

A helpful way to remember DataJoint's normalization principle:

> **"Each table should contain attributes about the entity, the whole entity, and nothing but the entity."**

This is a variation of the classical mnemonic "the key, the whole key, and nothing but the key," but focused on entities rather than functional dependencies.

**Applied**:
- **The entity**: All attributes must describe the entity identified by the primary key
- **The whole entity**: Include all relevant intrinsic properties (don't split unnecessarily)
- **Nothing but the entity**: Exclude properties of related entities (use foreign keys instead)


## Summary

### Classical Normalization (Codd)
- **Foundation**: Functional dependencies between attributes
- **Goal**: Eliminate update, insertion, deletion anomalies
- **Method**: Decompose relations based on dependency analysis
- **Era**: Pre-Entity-Relationship model (early 1970s)
- **Focus**: Mathematical properties of relations

### DataJoint Normalization
- **Foundation**: Entities and their intrinsic properties
- **Goal**: Separate distinct entity types, eliminate redundancy
- **Method**: Design tables to represent one entity type each
- **Era**: Post-ER model (leverages conceptual clarity)
- **Focus**: Semantic meaning of entities and relationships
- **Key principles**: Immutability of tuples, schemas as workflows, permanent vs. changeable attributes


### The Unified Principle

> **Every table must have a well-defined entity type, and all attributes must describe that entity type directly.**

When this principle is followed:
- Update anomalies are eliminated (each fact stored once)
- Insertion anomalies are eliminated (entities can exist independently)
- Deletion anomalies are eliminated (deleting one entity doesn't affect others)
- Schema structure is clear (one entity type per table)
- Data integrity is maintained through immutable tuples and explicit dependencies

### Practical Application

When designing or reviewing a schema:

1. **For each table, ask**: "What entity type does this table represent?"
2. **For each attribute, ask**: "Is this an intrinsic property of that entity?"
3. **If no**: Move the attribute to its proper entity table or create a relationship table

This intuitive approach achieves the same rigor as classical normal forms but is much easier to apply in practice, especially in complex scientific and computational workflows.
