## Keys for Data Relationship

#### Terminology and functions overview 
- Relational data: Structured data organized into individual entities and keys that establish relationships between them
- ALTER TABLE: SQL command used to modify the structure of an existing entity
- ADD: SQL command, used with ALTER TABLE, to add new elements to the entity

In [None]:
ALTER TABLE table_name 
ADD COLUMN column_name column_datatype; 

ALTER TABLE table_name 
ADD PRIMARY KEY (column_name);

ALTER TABLE table_name 
ADD FOREIGN KEY (column_name) REFERENCES foreign_table(PK_from_foreign_table);

### Altering an entity
Imagine the business has decided to track more contact details for each fruit product supplier. This information is crucial for business functions such as quality control and supply chain management.

Your task is to adapt the suppliers entity so that it has all the required attributes and a well-identified key, which lets you relate suppliers to the rest of the data model later.

In [None]:
-- Alter suppliers table
ALTER TABLE suppliers
-- Add new column
ADD COLUMN IF NOT EXISTS region VARCHAR(255);

-- Alter suppliers table
ALTER TABLE suppliers
-- Add the new column
ADD COLUMN IF NOT EXISTS contact VARCHAR(255);

-- Alter suppliers table
ALTER TABLE suppliers
-- Assign the unique identifier
ADD PRIMARY KEY (supplier_id);
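
To confirm that the changes landed, the structure of the entity can be inspected afterwards. This is a minimal sketch, assuming the Snowflake-style dialect that the cells above appear to use; `DESCRIBE TABLE` and `SHOW PRIMARY KEYS` are Snowflake commands and may not be available in other databases.

```sql
-- List columns, data types, and key flags for the suppliers entity
DESCRIBE TABLE suppliers;

-- List the primary key constraint defined on suppliers
SHOW PRIMARY KEYS IN TABLE suppliers;
```

The first result should now include the region and contact columns, and the second should list supplier_id.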

### Adjusting the model
Cocofarm is a business specializing in chocolate review and analysis. They use the productqualityrating entity as part of their data model. This entity contains comprehensive data about various chocolate bars, including their manufacturers, place of production, reviews, bean origin, cocoa percentage, ingredients, and ratings. The business uses this data to provide detailed reviews and quality ratings of chocolate, helping consumers and professionals make informed choices.

The business has decided to enhance its review process by adding production batch details for each chocolate product. Your task is to adapt the existing data model to include these production batch details and establish a relationship between that new entity and the current product quality rating.

In [None]:
-- Create entity
CREATE OR REPLACE TABLE batchdetails (
	-- Add numerical attribute
	batch_id NUMBER(10, 0),
	-- Add characters attributes
    batch_number VARCHAR(255),
    production_notes VARCHAR(255)
); 

-- Modify the entity
ALTER TABLE productqualityrating
-- Add new column
ADD COLUMN IF NOT EXISTS batch_id NUMBER(10,0);
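
The cell above adds the batch_id column to both entities but does not yet declare the relationship between them. One way to do that, following the ALTER TABLE templates from the overview, is sketched below. It assumes batch_id is meant to be the unique identifier of batchdetails, which the exercise text suggests but does not state.

```sql
-- Assumption: batch_id uniquely identifies each row in batchdetails
ALTER TABLE batchdetails
ADD PRIMARY KEY (batch_id);

-- Relate each product quality rating to its production batch
ALTER TABLE productqualityrating
ADD FOREIGN KEY (batch_id) REFERENCES batchdetails(batch_id);
```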

## Normalizing Relational Data
- Unnormalized data (UNF): Data that might lack structure, be disorganized, or contain repetitions and/or anomalies

In [None]:
-- Identifying unnormalized data
SELECT manufacturer_id,
	manufacturer_name,
	location,
	COUNT(*) AS repetitions
FROM allproducts
GROUP BY manufacturer_id, manufacturer_name, location
HAVING COUNT(*) > 1;

In [None]:
-- Querying unique values while being filtered by a specific condition 
SELECT DISTINCT column_name
FROM table_name 
WHERE column_name condition value; 


-- Counting the values aggregated by a specific column while filtering the results
SELECT column_name, 
COUNT(*) AS alias_name 
FROM table_name 
GROUP BY column_name
HAVING COUNT(*) condition value;

### Identifying Data Redundancy
Businesses thrive on efficient data management, and identifying data redundancy is crucial for maintaining an organized and cost-effective data model. Redundant data can take up unnecessary space and complicate processes.

Your task is to identify redundancy in the product quality rating data that the business has collected about its products and their manufacturers. The products are many different chocolate bars, and each manufacturer has its own characteristics.

This step is essential to identifying unnormalized data, getting ready for the normalization step, and streamlining the storage of your data model.

In [None]:
-- List all values from the attribute
SELECT manufacturer,
	company_location
-- Read all these values from the entity 
FROM productqualityrating;

In [None]:
SELECT manufacturer, 
	company_location,
	-- Add a count of all the records, and set an alias for it
	COUNT(*) AS product_count
FROM productqualityrating 
-- Aggregate the results
GROUP BY manufacturer,
company_location;

In [None]:
SELECT manufacturer, 
	company_location, 
	COUNT(*) AS product_count
FROM productqualityrating
GROUP BY manufacturer, 
	company_location
-- Add a filter for occurrence count greater than 1
HAVING COUNT(*) > 1;

### Spotting Anomalies
The company has set a standard after 2006 that all 'Arriba' chocolate bars from the same manufacturer must have the same cocoa percentage and ingredients.

The business will need to update any record that is not compliant, and it wants to implement these updates quickly and efficiently.

Your task is to ensure that the data complies with this standard. You need to review the data to confirm that no anomalies would get in the way of updating the records.

In [None]:
-- Select the different values for the attributes list
SELECT DISTINCT manufacturer,
	cocoa_percent, 
    ingredients
FROM productqualityrating
-- Add filter for attribute referring to name
WHERE bar_name = 'Arriba'
	-- Add filter for attribute referring to year
	AND year_reviewed > 2006;

In [None]:
SELECT manufacturer, 
	-- Add count of distinct combinations, and add alias to it
	COUNT(DISTINCT cocoa_percent, ingredients) AS distinct_combinations
FROM productqualityrating
WHERE bar_name = 'Arriba' 
    AND year_reviewed > 2006 
-- Group the results    
GROUP BY manufacturer;

In [None]:
SELECT manufacturer, 
	COUNT(DISTINCT cocoa_percent, ingredients) AS distinct_combinations
FROM productqualityrating
WHERE bar_name = 'Arriba' 
    AND year_reviewed > 2006 
GROUP BY manufacturer
-- Keep only manufacturers with more than one distinct combination
HAVING COUNT(DISTINCT cocoa_percent, ingredients) > 1;
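
Once the non-compliant manufacturers are known, it helps to see which combinations each of them actually has, since those are the records that will need updating. The query below is one possible sketch, built only from the columns used in the cells above. It flags a manufacturer when it has more than one distinct combination of cocoa_percent and ingredients.

```sql
-- Show every distinct combination for manufacturers that have more than one
SELECT DISTINCT manufacturer,
	cocoa_percent,
	ingredients
FROM productqualityrating
WHERE bar_name = 'Arriba'
	AND year_reviewed > 2006
	AND manufacturer IN (
		-- Manufacturers whose 'Arriba' bars are not consistent
		SELECT manufacturer
		FROM productqualityrating
		WHERE bar_name = 'Arriba'
			AND year_reviewed > 2006
		GROUP BY manufacturer
		HAVING COUNT(DISTINCT cocoa_percent, ingredients) > 1
	)
ORDER BY manufacturer;
```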

## The First Normal Form

In [None]:
-- Fill an entity with data from a query result
INSERT INTO table_name (column_name, other_columns) 
SELECT
-- Generate a unique value using the row number 
ROW_NUMBER() OVER (ORDER BY TRIM(alias.value)),
TRIM(alias.value) 
FROM another_table,
-- Split a text attribute value based on a delimiter 
LATERAL FLATTEN(INPUT => SPLIT(another_table.column_name, 'delimiter_value')) alias
-- Aggregate the data to ensure uniqueness of values
GROUP BY TRIM(alias.value);

### Creating 1NF entities
Previously, you learned about the disadvantages of unnormalized data and the benefits of normalized relational data. It is time to apply that to the productqualityrating entity.

To make the data model comply with the first normal form, the repeating groups must be broken up and eliminated. The productqualityrating entity contains two attributes that hold multiple values: ingredients and reviews.

Your task is to start normalizing this data by creating new entities that conform to 1NF, ensuring each record holds an atomic, individual and unique piece of information.

In [None]:
-- Create a new entity
CREATE OR REPLACE TABLE ingredients(
	-- Add unique identifier 
    ingredient_id NUMBER(10,0) PRIMARY KEY,
  	-- Add other attributes 
    ingredient VARCHAR(255) 
);

In [None]:
-- Create a new entity
CREATE OR REPLACE TABLE reviews (
	-- Add unique identifier 
    review_id NUMBER(10,0) PRIMARY KEY,
  	-- Add other attributes 
    review VARCHAR(255)
);

### Applying 1NF
After creating the ingredients and reviews entities, it's time to populate them with data from productqualityrating.

This step is crucial to ensure that each value within the ingredients and review attributes is treated as a distinct entry, in line with the principles of 1NF.

Start by crafting a query to transform and input the unnormalized data into these entities, adhering to the first normal form principles.

In [None]:
SELECT
	-- Remove surrounding whitespace from each split value
	TRIM(f.value)
FROM productqualityrating,
-- Add function to split values separated by semicolons
LATERAL FLATTEN(INPUT => SPLIT(productqualityrating.ingredients, ';')) f;

In [None]:
SELECT
	-- Create a sequential number
	ROW_NUMBER() OVER (ORDER BY TRIM(f.value)),
	TRIM(f.value)
FROM productqualityrating,
LATERAL FLATTEN(INPUT => SPLIT(productqualityrating.ingredients, ';')) f
-- Group the data
GROUP BY TRIM(f.value);

In [None]:
-- Add command to insert data
INSERT INTO ingredients (ingredient_id, ingredient)
SELECT
	ROW_NUMBER() OVER (ORDER BY TRIM(f.value)),
	TRIM(f.value)
FROM productqualityrating,
LATERAL FLATTEN(INPUT => SPLIT(productqualityrating.ingredients, ';')) f
GROUP BY TRIM(f.value);

In [None]:
-- Modify script for review
INSERT INTO reviews (review_id, review)
SELECT
	ROW_NUMBER() OVER (ORDER BY TRIM(f.value)),
	TRIM(f.value)
FROM productqualityrating,
LATERAL FLATTEN(INPUT => SPLIT(productqualityrating.review, ';')) f
GROUP BY TRIM(f.value);
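
After the inserts run, a quick check can confirm that the new entity holds atomic, unique values. The queries below are a sketch for the ingredients entity; the same pattern would apply to reviews. Both are expected to return no rows if the split and grouping worked as intended.

```sql
-- Values that still contain the ';' delimiter were not split into atomic pieces
SELECT ingredient
FROM ingredients
WHERE ingredient LIKE '%;%';

-- Values that appear more than once would break uniqueness
SELECT ingredient,
	COUNT(*) AS occurrences
FROM ingredients
GROUP BY ingredient
HAVING COUNT(*) > 1;
```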

## 2NF and 3NF

### Applying 2NF
Your productqualityrating entity is a rich dataset that captures the essence of chocolate bars reviewed over various years. You noticed that the company_location attribute depends on the manufacturer attribute: the same company_location value is repeated on every record that carries the same manufacturer, which violates the second normal form's rule against partial dependencies.

By separating this data into a dedicated manufacturers entity, you can remove redundancies and prepare your data model for more efficient operations. This step is crucial for maintaining data integrity and supports scalable business intelligence solutions.

Your task is to normalize this data to align with 2NF principles, ensuring that each non-key attribute entirely depends on the primary key.
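
Before splitting the data, the claimed dependency can be checked with a query. The sketch below counts how many distinct locations appear for each manufacturer. If it returns no rows, every manufacturer maps to a single company_location, which supports moving both attributes into their own entity.

```sql
-- Manufacturers listed with more than one location would contradict the dependency
SELECT manufacturer,
	COUNT(DISTINCT company_location) AS location_count
FROM productqualityrating
GROUP BY manufacturer
HAVING COUNT(DISTINCT company_location) > 1;
```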

In [None]:
-- Add new entity
CREATE OR REPLACE TABLE manufacturers (
  	-- Assign unique identifier
  	manufacturer_id NUMBER(10,0) PRIMARY KEY,
  	-- Add other attributes
  	manufacturer VARCHAR(255),
  	company_location VARCHAR(255)
);

In [None]:
-- Add values to manufacturers
INSERT INTO manufacturers (manufacturer_id, manufacturer, company_location)
SELECT 
	-- Generate a sequential number
	ROW_NUMBER() OVER (ORDER BY manufacturer, company_location),
	manufacturer, 
	company_location
FROM productqualityrating
-- Aggregate data by the other attributes
GROUP BY manufacturer, 
company_location;

### Applying 3NF
With our manufacturers table successfully reflecting 2NF standards, we now set our sights on the third normal form. We've recognized that a manufacturer's location is an independent piece of data, so it belongs in an entity of its own rather than being stored as an attribute of manufacturers.

Your task is to eliminate this transitive dependency, refining our data model to support our business's dynamic data needs. This adjustment will significantly improve our data model's flexibility and minimize the impact of data modifications.

In [None]:
-- Create entity
CREATE OR REPLACE TABLE locations(
	-- Add unique identifier
  	location_id NUMBER(10,0) PRIMARY KEY,
  	-- Add main attribute
  	location VARCHAR(255)
);

In [None]:
-- Populate entity from other entity's data
INSERT INTO locations (location_id, location)
SELECT 
	-- Generate unique sequential number
	ROW_NUMBER() OVER(ORDER BY company_location),
    -- Select the main attribute
	company_location
FROM manufacturers
-- Aggregate data by main attribute
GROUP BY company_location;
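
Removing company_location from manufacturers also removes the only link between a manufacturer and its location. One way to keep that link is to store the location key on manufacturers while the original column still exists. This is a sketch, not part of the original exercise: the location_id column on manufacturers is an assumption, and the `UPDATE ... FROM` form follows Snowflake syntax.

```sql
-- Assumption: manufacturers gets its own location_id column
ALTER TABLE manufacturers
ADD COLUMN IF NOT EXISTS location_id NUMBER(10,0);

-- Copy the matching key from locations, using the text value as the join
UPDATE manufacturers
SET location_id = locations.location_id
FROM locations
WHERE manufacturers.company_location = locations.location;

-- Declare the relationship between the two entities
ALTER TABLE manufacturers
ADD FOREIGN KEY (location_id) REFERENCES locations(location_id);
```

Rows whose company_location is NULL will not match in the UPDATE and keep a NULL location_id.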

In [None]:
-- Modify entity
ALTER TABLE manufacturers
-- Remove attribute
DROP COLUMN IF EXISTS company_location;