# dbApps06c Task: Normalization & ER Diagrams

**Objective:** Understand database normalization (1NF, 2NF, 3NF) by taking a flat, denormalized dataset and transforming it into multiple normalized tables.

**Scenario:** A movie rental company stores customer and rental information in a single flat table. Your task is to identify redundancies, split the data into multiple tables, and create an ER (Entity-Relationship) diagram showing the relationships between tables.

---

## Setup
Import pandas and create the sample denormalized dataset.

```python
import pandas as pd

# Sample denormalized movie rental data
# This is a FLAT table with repeated information
rentalData = {
    "rentalId": [1, 2, 3, 4, 5],
    "customerName": ["Alice Smith", "Bob Jones", "Alice Smith", "Carol Lee", "Bob Jones"],
    "customerPhone": ["555-0101", "555-0202", "555-0101", "555-0303", "555-0202"],
    "movieTitle": ["The Matrix", "Inception", "Inception", "The Matrix", "Interstellar"],
    "movieGenre": ["Sci-Fi", "Sci-Fi", "Sci-Fi", "Sci-Fi", "Sci-Fi"],
    "movieYear": [1999, 2010, 2010, 1999, 2014],
    "rentalDate": ["2024-01-15", "2024-01-15", "2024-01-16", "2024-01-17", "2024-01-18"],
    "returnDate": ["2024-01-22", "2024-01-22", "2024-01-23", "2024-01-24", "2024-01-25"]
}

# Create a pandas DataFrame from the dictionary
dfFlat = pd.DataFrame(rentalData)
print(dfFlat)
```

---

## Task 1: Display and Identify Redundancy

**Instruction:** Run the cell above to display the flat dataset. Then, in the cell below, identify what data is repeated or redundant. (Hint: Look for customer names, movie information, etc. that appear more than once.)
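One way to spot repetition programmatically is to count how often each combination of values occurs. The sketch below uses a hypothetical library-loans table (`dfLoans` and its columns are invented for illustration and are not part of the rental data), so it shows the technique without answering the task.

```python
import pandas as pd

# Hypothetical flat table of library loans (not the rental data above)
dfLoans = pd.DataFrame({
    "loanId": [1, 2, 3, 4],
    "memberName": ["Dana Fox", "Eli Park", "Dana Fox", "Dana Fox"],
    "memberEmail": ["dana@example.com", "eli@example.com",
                    "dana@example.com", "dana@example.com"],
    "bookTitle": ["Dune", "Dune", "Emma", "Ulysses"],
})

# How many rows carry each member's details? Counts above 1 mean repetition.
memberCounts = dfLoans.groupby(["memberName", "memberEmail"]).size()
print(memberCounts)

# Flag rows that repeat an earlier (memberName, memberEmail) pair
repeatedRows = dfLoans.duplicated(subset=["memberName", "memberEmail"])
print(repeatedRows.sum(), "rows repeat member details seen in an earlier row")
```

The same two calls, `groupby(...).size()` and `duplicated(subset=...)`, can be pointed at any group of columns you suspect of being stored more than once.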

**Your Answer:**

[List the redundancies you observe in the flat table]

---

## Task 2: First Normal Form (1NF) Check

**Instruction:** Examine the flat table and answer the following:
- Does every field contain only atomic (single, indivisible) values?
- Does every row represent a unique record?
- Does the table satisfy 1NF requirements?

Explain your reasoning in the cell below.
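The first two questions can be checked roughly in code. The sketch below runs on a hypothetical contacts table that deliberately breaks 1NF (`dfContacts` and its columns are assumptions made for illustration). The comma test is only a heuristic: it catches packed lists that use commas and nothing else.

```python
import pandas as pd

# Hypothetical table that violates 1NF: 'phones' packs two values into one cell
dfContacts = pd.DataFrame({
    "contactId": [1, 2, 3],
    "name": ["Dana Fox", "Eli Park", "Gus Roy"],
    "phones": ["555-1000", "555-2000, 555-2001", "555-3000"],
})

# Heuristic atomicity check: a comma inside a cell suggests a packed list
hasPackedValues = dfContacts["phones"].str.contains(",")
print(dfContacts[hasPackedValues])

# Uniqueness checks: no two rows are identical, and the key never repeats
print("Duplicate rows:", dfContacts.duplicated().sum())
print("Key is unique:", dfContacts["contactId"].is_unique)
```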

**Your Answer:**

[Explain whether the flat table satisfies 1NF and why]

---

## Task 3: Identify the Entities

**Instruction:** In the flat table, there are THREE main entities (conceptual objects). Identify them and list the attributes that belong to each entity. Remember: an entity is a thing (person, object, or concept) that the database tracks.

Example format:
- **Entity 1 Name:** Attributes (attr1, attr2, ...)
- **Entity 2 Name:** Attributes (attr1, attr2, ...)

**Your Answer:**

[List the three entities and their attributes]

---

## Task 4: Create the Customers Table

**Instruction:** Create a new DataFrame called `dfCustomers` with the following structure:
- **customerId** (1, 2, 3, ... assigned sequentially for each unique customer)
- **customerName** (Alice Smith, Bob Jones, Carol Lee)
- **customerPhone** (corresponding phone numbers)

Use `.drop_duplicates()` on the `customerName` and `customerPhone` columns of `dfFlat` to find the unique customers, then assign `customerId` sequentially in order of first appearance (Alice=1, Bob=2, Carol=3).

Print the result.
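As a minimal sketch of the pattern described above, here it is on a hypothetical library-loans table (`dfLoans`, `dfMembers`, and their columns are invented for illustration). It is one possible approach, not the only one.

```python
import pandas as pd

# Hypothetical flat table; Dana Fox appears twice
dfLoans = pd.DataFrame({
    "loanId": [1, 2, 3],
    "memberName": ["Dana Fox", "Eli Park", "Dana Fox"],
    "memberEmail": ["dana@example.com", "eli@example.com", "dana@example.com"],
})

# Keep one row per distinct member, in order of first appearance
dfMembers = (
    dfLoans[["memberName", "memberEmail"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Add a surrogate key as the first column, numbered from 1
dfMembers.insert(0, "memberId", list(range(1, len(dfMembers) + 1)))
print(dfMembers)
```

Calling `reset_index(drop=True)` before numbering keeps the row labels contiguous after the duplicates are removed.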

```python
# Create the customers table
# Extract unique customer records, drop duplicates on customerName and customerPhone

# TODO: Create dfCustomers with customerId, customerName, customerPhone
```

---

## Task 5: Create the Movies Table

**Instruction:** Create a new DataFrame called `dfMovies` with the following structure:
- **movieId** (1, 2, 3, ... assigned sequentially for each unique movie)
- **movieTitle** (The Matrix, Inception, Interstellar)
- **movieGenre** (corresponding genres)
- **movieYear** (corresponding release years)

Extract the unique movies from `dfFlat` and assign `movieId` sequentially in order of first appearance (The Matrix=1, Inception=2, Interstellar=3).

Print the result.

```python
# Create the movies table
# Extract unique movie records

# TODO: Create dfMovies with movieId, movieTitle, movieGenre, movieYear
```

---

## Task 6: Create the Rentals Table

**Instruction:** Create a new DataFrame called `dfRentals` with the following structure:
- **rentalId** (from the original data: 1, 2, 3, 4, 5)
- **customerId** (foreign key referencing dfCustomers)
- **movieId** (foreign key referencing dfMovies)
- **rentalDate** (from the original data)
- **returnDate** (from the original data)

Map each rental record to the corresponding customerId and movieId.

Print the result.
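One common way to do this kind of mapping in pandas is a left merge against the lookup table, followed by selecting only the key columns. The sketch below uses hypothetical `dfLoans` and `dfMembers` tables (names and columns are assumptions). It also assumes the matching column identifies each lookup row uniquely.

```python
import pandas as pd

# Hypothetical flat records that still carry the member's name
dfLoans = pd.DataFrame({
    "loanId": [1, 2, 3],
    "memberName": ["Dana Fox", "Eli Park", "Dana Fox"],
    "loanDate": ["2024-02-01", "2024-02-02", "2024-02-03"],
})

# Hypothetical lookup table that owns the member details
dfMembers = pd.DataFrame({
    "memberId": [1, 2],
    "memberName": ["Dana Fox", "Eli Park"],
})

# Left merge: every loan keeps its row and gains the matching memberId
dfMerged = dfLoans.merge(dfMembers, on="memberName", how="left")

# Keep the keys and dates; drop the descriptive text the lookup table now owns
dfLoansNormalized = dfMerged[["loanId", "memberId", "loanDate"]]
print(dfLoansNormalized)
```

With two lookup tables, the same merge can be chained once per foreign key.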

```python
# Create the rentals table
# Map original rental records to customerId and movieId using the customers and movies tables

# TODO: Create dfRentals with rentalId, customerId, movieId, rentalDate, returnDate
```

---

## Task 7: Normalization Check (2NF & 3NF)

**Instruction:** Now that you've split the flat table into three normalized tables, verify that each table satisfies Second Normal Form (2NF) and Third Normal Form (3NF).

Answer the following:
1. Does each table have a single-column primary key? Name it for each table.
2. Are there any partial dependencies? (Does a non-key attribute depend on only part of a composite primary key?)
3. Are there any transitive dependencies? (Does a non-key attribute depend on the primary key only indirectly, through another non-key attribute?)
4. Do all three tables satisfy 2NF and 3NF?

Explain your reasoning.
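A dependency between two columns can be probed in code by checking whether each value of one column maps to exactly one value of the other. The sketch below uses a hypothetical courses table and a hypothetical helper, `determines` (neither is part of this notebook). Sample data can only refute a dependency, never prove it, so treat a `True` result as "not contradicted by these rows".

```python
import pandas as pd

# Hypothetical table with a transitive dependency:
# courseId -> instructorName -> instructorOffice
dfCourses = pd.DataFrame({
    "courseId": [101, 102, 103],
    "instructorName": ["Ng", "Ng", "Ortiz"],
    "instructorOffice": ["B12", "B12", "C07"],
})

def determines(df, lhs, rhs):
    """Return True if each value of column lhs maps to exactly one value of column rhs."""
    return bool((df.groupby(lhs)[rhs].nunique() <= 1).all())

# A non-key column determines another non-key column: a 3NF warning sign
print(determines(dfCourses, "instructorName", "instructorOffice"))

# An office does not determine a single course in this sample
print(determines(dfCourses, "instructorOffice", "courseId"))
```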

**Your Answer:**

[Explain whether each table satisfies 2NF and 3NF]

---

## Task 8: ER Diagram (ASCII Art)

**Instruction:** Create an ASCII art Entity-Relationship (ER) diagram showing the three tables (Customers, Movies, Rentals) and how they are connected. Include:
- Table names
- Primary keys (marked with PK)
- Foreign keys (marked with FK)
- Relationship lines and cardinality (1:M)

Example format:
```
+----------------+        +----------------+
|   Customers    |        |     Movies     |
+----------------+        +----------------+
| PK custId      |        | PK movieId     |
|    name        |        |    title       |
|    phone       |        |    genre       |
+----------------+        +----------------+
        | 1                        | 1
        |                          |
        +---------+      +---------+
                M |      | M
             +----------------+
             |    Rentals     |
             +----------------+
             | PK rentalId    |
             | FK custId      |
             | FK movieId     |
             |    rentalDate  |
             |    returnDate  |
             +----------------+
```

**Your Answer (ER Diagram):**

[Draw your ER diagram showing Customers, Movies, and Rentals tables with relationships]

---

## Task 9: Relationship Types

**Instruction:** For each relationship shown in your ER diagram, identify and describe the relationship type:
1. What is the relationship between Customers and Rentals?
2. What is the relationship between Movies and Rentals?
3. Explain what 1:M (one-to-many) means in your own words.

Example answer format:
- **Relationship 1:** [Entity A] — [relationship type] — [Entity B] because [reason]
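Cardinality can also be checked against data. In pandas, `merge` accepts a `validate` argument that raises `MergeError` when the keys do not match the stated relationship. The sketch below uses hypothetical `dfMembers` and `dfLoans` tables (assumed names, not the tables from this task).

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical parent ("one") and child ("many") tables
dfMembers = pd.DataFrame({"memberId": [1, 2], "memberName": ["Dana Fox", "Eli Park"]})
dfLoans = pd.DataFrame({"loanId": [10, 11, 12], "memberId": [1, 1, 2]})

# Many loans may point at one member, so this merge is many-to-one.
# validate="many_to_one" raises MergeError if memberId repeats in dfMembers.
dfJoined = dfLoans.merge(dfMembers, on="memberId", validate="many_to_one")
print(dfJoined)

# Count the "many" side: loans per member
loansPerMember = dfLoans.groupby("memberId").size()
print(loansPerMember)

# Swapping the tables puts the repeating key on the right, so the check fails
reversedCheckFailed = False
try:
    dfMembers.merge(dfLoans, on="memberId", validate="many_to_one")
except MergeError:
    reversedCheckFailed = True
print("Reversed direction rejected:", reversedCheckFailed)
```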


**Your Answer:**

[Describe the relationships and what 1:M means]

---

## Reflection

**Questions to think about:**
- What problems would occur if you kept all data in the original flat table?
- How does splitting into normalized tables improve data integrity?
- What would happen if a customer's phone number changed? (How would you update it in the normalized design vs. the flat design?)
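To make the last question concrete, the sketch below compares the same update in a flat and a normalized layout. It uses a hypothetical library example with an email address instead of a phone number (all table and column names are assumptions made for illustration).

```python
import pandas as pd

# Hypothetical flat design: member details repeat on every loan row
dfFlatLoans = pd.DataFrame({
    "loanId": [1, 2, 3],
    "memberName": ["Dana Fox", "Eli Park", "Dana Fox"],
    "memberEmail": ["dana@example.com", "eli@example.com", "dana@example.com"],
})

# Hypothetical normalized design: each member is stored once
dfMembers = pd.DataFrame({
    "memberId": [1, 2],
    "memberName": ["Dana Fox", "Eli Park"],
    "memberEmail": ["dana@example.com", "eli@example.com"],
})

newEmail = "dana.fox@example.com"

# Flat design: every row that repeats the member must be found and changed
flatMask = dfFlatLoans["memberName"] == "Dana Fox"
dfFlatLoans.loc[flatMask, "memberEmail"] = newEmail
print("Rows touched in flat table:", flatMask.sum())

# Normalized design: the email lives in exactly one row
memberMask = dfMembers["memberId"] == 1
dfMembers.loc[memberMask, "memberEmail"] = newEmail
print("Rows touched in members table:", memberMask.sum())
```

If the flat update misses a row, the table ends up holding two different emails for the same person. This is the update anomaly that normalization is meant to prevent.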