In [1]:
%load_ext sql
%sql duckdb://

# Normalization
Explanation:
This notebook demonstrates normalization in database design using SQL. Normalization is the process of organizing data so that each fact is stored only once, which eliminates redundancy and improves data integrity.

The code creates five related tables that together form a normalized schema. Here's a breakdown of the tables:

1. Customers table: Stores customer information such as customer_id, customer_name, customer_email, and customer_phone.
2. Orders table: Stores order information including order_id, customer_id (foreign key referencing Customers table), order_date, and order_total.
3. OrderItems table: Stores individual order items with order_item_id, order_id (foreign key referencing Orders table), product_id (foreign key referencing Products table), quantity, and price.
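4. Products table: Stores product information with product_id, product_name, category_id (foreign key referencing Categories table), and category_name. Note that category_name duplicates a value already held in the Categories table, so a strictly normalized (3NF) design would omit it.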
5. Categories table: Stores category information with category_id, category_name, and category_description.

Sample data is then inserted into the tables to demonstrate the relationships between them.

Finally, a query retrieves customer information along with the customer's orders and order items, using JOIN clauses to link the tables together.

The query returns the customer name, order ID and date, order item ID, product name, quantity, and price, with one row per order item.
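The foreign keys listed above are what back the "data integrity" claim: a row cannot point at a parent row that does not exist. The following is a minimal sketch of that behavior, not the notebook's own code. It assumes Python's standard-library `sqlite3` as a stand-in for the DuckDB session (so it runs on its own), uses trimmed-down versions of the Customers and Orders tables, and the customer ID `999` is a made-up value chosen because no such customer exists.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for the notebook's DuckDB session.
conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name VARCHAR(50)
);
CREATE TABLE Orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date DATE,
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
INSERT INTO Customers VALUES (1, 'John Doe');
INSERT INTO Orders VALUES (1, 1, '2022-01-01');
""")

# Hypothetical bad row: customer 999 was never inserted, so this order is an orphan.
try:
    conn.execute("INSERT INTO Orders VALUES (2, 999, '2022-01-02')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]
print("orphan rejected:", orphan_rejected, "| orders stored:", order_count)
```

Only the valid order remains in the table, because the orphan insert fails with a constraint error.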

## Creating Normalized Data

Note the lack of redundancy other than foreign keys that point into other tables (the one exception is `Products.category_name`, which repeats a value stored in `Categories`). Also note the large number of tables.

In [2]:
%%sql

CREATE OR REPLACE TABLE Customers (
    customer_id INT PRIMARY KEY,
    customer_name VARCHAR(50),
    customer_email VARCHAR(50),
    customer_phone VARCHAR(15)
);

CREATE OR REPLACE TABLE Orders (
    order_id INT PRIMARY KEY,
    customer_id INT,
    order_date DATE,
    order_total DECIMAL(10, 2),
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);

CREATE OR REPLACE TABLE Categories (
    category_id INT PRIMARY KEY,
    category_name VARCHAR(50),
    category_description TEXT
);

CREATE OR REPLACE TABLE Products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(50),
    category_id INT,
    category_name VARCHAR(50),  -- redundant copy of Categories.category_name; strict 3NF would omit it
    FOREIGN KEY (category_id) REFERENCES Categories(category_id)
);

CREATE OR REPLACE TABLE OrderItems (
    order_item_id INT PRIMARY KEY,
    order_id INT,
    product_id INT,
    quantity INT,
    price DECIMAL(10, 2),
    FOREIGN KEY (order_id) REFERENCES Orders(order_id),
    FOREIGN KEY (product_id) REFERENCES Products(product_id)
);

INSERT INTO Customers (customer_id, customer_name, customer_email, customer_phone)
VALUES (1, 'John Doe', 'john.doe@example.com', '123-456-7890');

INSERT INTO Orders (order_id, customer_id, order_date, order_total)
VALUES (1, 1, '2022-01-01', 100.00);

INSERT INTO Categories (category_id, category_name, category_description)
VALUES (1, 'Category A', 'Description of Category A');

INSERT INTO Products (product_id, product_name, category_id, category_name)
VALUES (1, 'Product A', 1, 'Category A');

INSERT INTO OrderItems (order_item_id, order_id, product_id, quantity, price)
VALUES (1, 1, 1, 2, 50.00);

## Querying

Note how many joins and table aliases are needed to bring the data back together.

In [3]:
%%sql

SELECT
    c.customer_name,
    o.order_id,
    o.order_date,
    oi.order_item_id,
    p.product_name,
    oi.quantity,
    oi.price
FROM
    Customers c
    JOIN Orders o ON c.customer_id = o.customer_id
    JOIN OrderItems oi ON o.order_id = oi.order_id
    JOIN Products p ON oi.product_id = p.product_id;

| customer_name | order_id | order_date | order_item_id | product_name | quantity | price |
|---|---|---|---|---|---|---|
| John Doe | 1 | 2022-01-01 | 1 | Product A | 2 | 50.0 |
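The same join pattern extends to aggregates. As a rough sketch (not part of the original notebook), the snippet below recomputes each order's total from its items and puts it next to the stored `order_total`, which is itself a derived value that could drift out of sync. It assumes Python's standard-library `sqlite3` as a stand-in for the DuckDB session, recreates only the Orders and OrderItems tables with the notebook's sample rows, and the column alias `computed_total` is a name chosen for illustration.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for the notebook's DuckDB session.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date DATE,
    order_total DECIMAL(10, 2)
);
CREATE TABLE OrderItems (
    order_item_id INTEGER PRIMARY KEY,
    order_id INTEGER,
    product_id INTEGER,
    quantity INTEGER,
    price DECIMAL(10, 2),
    FOREIGN KEY (order_id) REFERENCES Orders(order_id)
);
-- Same sample rows as the notebook.
INSERT INTO Orders VALUES (1, 1, '2022-01-01', 100.00);
INSERT INTO OrderItems VALUES (1, 1, 1, 2, 50.00);
""")

# Join each order to its items and sum quantity * price per order.
rows = conn.execute("""
    SELECT
        o.order_id,
        o.order_total,
        SUM(oi.quantity * oi.price) AS computed_total
    FROM Orders o
    JOIN OrderItems oi ON o.order_id = oi.order_id
    GROUP BY o.order_id, o.order_total
""").fetchall()

for order_id, stored_total, computed_total in rows:
    status = "consistent" if stored_total == computed_total else "MISMATCH"
    print(order_id, stored_total, computed_total, status)
```

With the sample data, the stored total and the recomputed total agree, since the single item is 2 units at 50.00 each.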


# Denormalization

Denormalization means keeping redundant copies of data in a table instead of foreign keys into other tables (e.g., the same customer details repeated 50 times because that customer has 50 orders). The table looks as if you had stored the result of a `JOIN` across the normalized tables.
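One way to picture this is to materialize a join into its own table. The sketch below is illustrative only: it assumes Python's standard-library `sqlite3` as a stand-in for the DuckDB session, the table name `OrdersDenormalized` is hypothetical, and orders 2 and 3 are extra made-up rows added so the repetition is visible.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for the notebook's DuckDB session.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    customer_id INTEGER PRIMARY KEY,
    customer_name VARCHAR(50),
    customer_email VARCHAR(50)
);
CREATE TABLE Orders (
    order_id INTEGER PRIMARY KEY,
    customer_id INTEGER,
    order_date DATE,
    FOREIGN KEY (customer_id) REFERENCES Customers(customer_id)
);
INSERT INTO Customers VALUES (1, 'John Doe', 'john.doe@example.com');
-- Orders 2 and 3 are hypothetical extras, not part of the notebook's sample data.
INSERT INTO Orders VALUES
    (1, 1, '2022-01-01'),
    (2, 1, '2022-02-01'),
    (3, 1, '2022-03-01');

-- Materialize the join: every order row now carries its own copy of the customer columns.
CREATE TABLE OrdersDenormalized AS
SELECT o.order_id, o.order_date, c.customer_name, c.customer_email
FROM Orders o
JOIN Customers c ON c.customer_id = o.customer_id;
""")

rows = conn.execute("""
    SELECT order_id, customer_name, customer_email
    FROM OrdersDenormalized
    ORDER BY order_id
""").fetchall()

for row in rows:
    print(row)
```

The customer's name and email appear once per order rather than once overall, which is exactly the redundancy that the normalized schema avoids.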

Some applications store data this way for reasons such as read performance (no joins at query time) or simpler maintenance.
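The trade-off can be sketched in a few lines: reads need no `JOIN`, but a single logical change has to rewrite every copy. This is a hedged illustration rather than a recommended design. It assumes Python's standard-library `sqlite3` as a stand-in for the DuckDB session, and the `OrdersDenormalized` table, its three rows, and the new email address are all made up for the example.

```python
import sqlite3

# In-memory SQLite database, used here as a stand-in for the notebook's DuckDB session.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical denormalized table: customer details are repeated on every order row.
CREATE TABLE OrdersDenormalized (
    order_id INTEGER PRIMARY KEY,
    order_date DATE,
    customer_name VARCHAR(50),
    customer_email VARCHAR(50)
);
INSERT INTO OrdersDenormalized VALUES
    (1, '2022-01-01', 'John Doe', 'john.doe@example.com'),
    (2, '2022-02-01', 'John Doe', 'john.doe@example.com'),
    (3, '2022-03-01', 'John Doe', 'john.doe@example.com');
""")

# Reading is simple: one table, no joins or aliases.
report = conn.execute(
    "SELECT order_id, customer_name FROM OrdersDenormalized ORDER BY order_id"
).fetchall()

# Writing is where the cost shows up: changing one customer's email touches every order row.
cursor = conn.execute(
    "UPDATE OrdersDenormalized SET customer_email = ? WHERE customer_name = ?",
    ("john@example.org", "John Doe"),
)
rows_touched = cursor.rowcount

print("rows read without a join:", len(report))
print("rows rewritten for one email change:", rows_touched)
```

In the normalized schema the same email change would update a single row in Customers; here it updates one row per order, and missing any of them leaves the data inconsistent.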