## Normalization Stages (1NF, 2NF, 3NF, and BCNF)
 - Normalization is the process of organizing data in a database to reduce redundancy and avoid insertion, update, and deletion anomalies.
 - This is done by splitting large tables into smaller, related ones so that each fact is stored in exactly one place.
 - Let's go through each normal form (1NF, 2NF, 3NF, and BCNF) one by one.


### 1. First Normal Form (1NF)
A relation is in 1NF if it meets the following conditions:

- Atomicity: Each column contains atomic values (indivisible values). This means no repeating groups or arrays in a single column.

- Uniqueness: Each row in the table must be unique, which is normally guaranteed by a primary key.

Example of Non-1NF:

In [None]:
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_name VARCHAR(100),
    items VARCHAR(255)  -- This column has a list of items
);

INSERT INTO orders (order_id, customer_name, items)
VALUES (1, 'John Doe', 'Item1, Item2'),
       (2, 'Jane Smith', 'Item3, Item4');

Here, the items column is not atomic because it stores multiple values in a single field.

Converting to 1NF:
To bring the table to 1NF, we separate the repeating group (items) into individual rows.

In [None]:
CREATE TABLE orders (
    order_id INT,
    customer_name VARCHAR(100),
    item VARCHAR(100),  -- Atomic column for items
    PRIMARY KEY (order_id, item)  -- Composite primary key
);

INSERT INTO orders (order_id, customer_name, item)
VALUES (1, 'John Doe', 'Item1'),
       (1, 'John Doe', 'Item2'),
       (2, 'Jane Smith', 'Item3'),
       (2, 'Jane Smith', 'Item4');
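
As a rough illustration of why atomic values matter, the queries below assume the 1NF `orders` table above with its four sample rows. The commented-out query shows what the non-1NF design would have needed instead.

```sql
-- Non-1NF design: needs fragile pattern matching on the list column.
-- SELECT order_id FROM orders WHERE items LIKE '%Item2%';

-- 1NF design: a plain equality test on an atomic column.
SELECT order_id, customer_name
FROM orders
WHERE item = 'Item2';

-- Counting items per order becomes a simple aggregate.
SELECT order_id, count(*) AS item_count
FROM orders
GROUP BY order_id
ORDER BY order_id;
```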


### 2. Second Normal Form (2NF)
A relation is in 2NF if:

- It is in 1NF.

- It has no partial dependency; that is, all non-key attributes must be fully functionally dependent on the entire primary key, not just part of it.

Example of Non-2NF:

Let's say we have the following table, where order_id and product_id together form a composite primary key.

In [None]:
CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    product_name VARCHAR(100),
    quantity INT,
    PRIMARY KEY (order_id, product_id)
);

INSERT INTO order_items (order_id, product_id, product_name, quantity)
VALUES (1, 101, 'Laptop', 2),
       (1, 102, 'Smartphone', 3),
       (2, 103, 'Tablet', 1);


Here, product_name depends only on product_id, but the primary key is a composite of order_id and product_id. This is a partial dependency, which violates 2NF.

Converting to 2NF:

We remove the partial dependency by creating a new table for products and moving product_name into it.

In [None]:
CREATE TABLE products (
    product_id INT PRIMARY KEY,
    product_name VARCHAR(100)
);

CREATE TABLE order_items (
    order_id INT,
    product_id INT,
    quantity INT,
    PRIMARY KEY (order_id, product_id),
    FOREIGN KEY (product_id) REFERENCES products(product_id)
);

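INSERT INTO products (product_id, product_name)
VALUES (101, 'Laptop'),
       (102, 'Smartphone'),
       (103, 'Tablet');

-- Order lines now reference products by ID only
INSERT INTO order_items (order_id, product_id, quantity)
VALUES (1, 101, 2),
       (1, 102, 3),
       (2, 103, 1);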
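
With the product name stored once, the original wide view can still be rebuilt on demand. This is a minimal sketch, assuming the `products` and `order_items` tables above hold the sample rows from the non-2NF example. The new product name in the `UPDATE` is purely illustrative.

```sql
-- Rebuild the pre-normalization view by joining on product_id.
SELECT oi.order_id,
       oi.product_id,
       p.product_name,
       oi.quantity
FROM order_items AS oi
JOIN products AS p ON p.product_id = oi.product_id
ORDER BY oi.order_id, oi.product_id;

-- Renaming a product now touches exactly one row,
-- no matter how many order lines reference it.
UPDATE products
SET product_name = 'Laptop Pro'   -- hypothetical new name
WHERE product_id = 101;
```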


### 3. Third Normal Form (3NF)
A relation is in `3NF` if:

- It is in `2NF`.

- It has no `transitive dependency`; that is, non-key attributes should depend only on the primary key, not on other non-key attributes.

Example of Non-3NF:

Let's say we have the following table, where employee_id is the primary key:

In [None]:
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    department_id INT,
    department_name VARCHAR(100),
    department_head VARCHAR(100)
);

INSERT INTO employees (employee_id, employee_name, department_id, department_name, department_head)
VALUES (1, 'John Doe', 101, 'HR', 'Alice'),
       (2, 'Jane Smith', 102, 'IT', 'Bob');

Here, department_name and department_head depend on department_id, which in turn depends on employee_id (employee_id → department_id → department_name). This is a transitive dependency.

Converting to 3NF:

We remove the transitive dependency by creating a separate table for departments.

In [None]:
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(100),
    department_head VARCHAR(100)
);

CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    employee_name VARCHAR(100),
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

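INSERT INTO departments (department_id, department_name, department_head)
VALUES (101, 'HR', 'Alice'),
       (102, 'IT', 'Bob');

-- Employees now carry only the department reference
INSERT INTO employees (employee_id, employee_name, department_id)
VALUES (1, 'John Doe', 101),
       (2, 'Jane Smith', 102);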
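
The payoff of the 3NF split can be sketched as follows, assuming the `departments` and `employees` tables above contain the sample rows from the non-3NF example. The replacement head name is a made-up value.

```sql
-- Look up each employee's department details through the foreign key.
SELECT e.employee_id,
       e.employee_name,
       d.department_name,
       d.department_head
FROM employees AS e
JOIN departments AS d ON d.department_id = e.department_id
ORDER BY e.employee_id;

-- A change of department head is a single-row update,
-- regardless of how many employees belong to the department.
UPDATE departments
SET department_head = 'Carol'   -- hypothetical new head
WHERE department_id = 101;
```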


## Constraints in PostgreSQL

Constraints help ensure data integrity, prevent invalid data, and enforce business rules at the database level.

### ✅ 1. NOT NULL Constraint

🔍 Definition
- Ensures that a column cannot contain `NULL` values.

📌 Use Case
- When a field is mandatory—like email, username, or date_of_birth in a user profile.

🧪 Example

In [None]:
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    username VARCHAR(100) NOT NULL,
    email VARCHAR(150) NOT NULL
);

💡 Best Practices
- Use `NOT NULL` on all essential fields.

- Primary key columns are always `NOT NULL`; also declare foreign key columns `NOT NULL` when the relationship is mandatory.

🎯 Interview Question

Q: Can a `NOT NULL` column be part of a composite key?

A: Yes. In fact, all columns in a primary/composite key must be `NOT NULL`.
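
To see the constraint in action, here is a small sketch that assumes the `users` table from the example above (with `username` and `email` both `NOT NULL`). The sample values are invented.

```sql
-- Succeeds: both mandatory columns are supplied.
INSERT INTO users (username, email)
VALUES ('jdoe', 'jdoe@example.com');

-- Fails: email is omitted, so it would be NULL.
-- PostgreSQL rejects the row with a not-null constraint violation on "email".
INSERT INTO users (username)
VALUES ('asmith');
```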

----------

### ✅ 2. UNIQUE Constraint

🔍 Definition
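- Ensures that all values in a column (or a group of columns) are distinct. `NULL` values are not considered equal to each other, so by default a `UNIQUE` column can still hold several `NULL`s.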

📌 Use Case
- For fields like `email`, `phone_number`, `Aadhaar number`, etc., where duplicates aren't allowed.

🧪 Example

In [None]:
CREATE TABLE employees (
    emp_id SERIAL PRIMARY KEY,
    email VARCHAR(150) UNIQUE,
    phone_number VARCHAR(15) UNIQUE
);

💡 Best Practices
- Use `UNIQUE` constraints to prevent logical duplication.
- Use `UNIQUE (col1, col2)` for combined uniqueness (e.g., user_id + role).

🎯 Interview Question

Q: Can a table have multiple UNIQUE constraints?

A: Yes, you can have multiple UNIQUE constraints on different columns.
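
The combined-uniqueness idea (user_id + role) might look like the sketch below. The `user_roles` table and its column names are hypothetical and exist only for this illustration.

```sql
-- Hypothetical table: a user may hold several roles, but each role only once.
CREATE TABLE user_roles (
    user_id INT NOT NULL,
    role_name VARCHAR(50) NOT NULL,
    UNIQUE (user_id, role_name)
);

INSERT INTO user_roles (user_id, role_name) VALUES (1, 'admin');   -- ok
INSERT INTO user_roles (user_id, role_name) VALUES (1, 'editor');  -- ok: same user, different role
INSERT INTO user_roles (user_id, role_name) VALUES (1, 'admin');   -- fails: duplicate (user_id, role_name) pair
```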

-----------

### ✅ 3. PRIMARY KEY Constraint

🔍 Definition
- Uniquely identifies each row in the table. Combines `NOT NULL` + `UNIQUE`.

📌 Use Case
- Not strictly required by PostgreSQL, but almost every table should have one. Auto-incrementing `SERIAL` or `BIGSERIAL` columns are a common choice for surrogate keys.

🧪 Example


In [None]:
CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    product_name VARCHAR(100) NOT NULL
);

💡 Best Practices
- Prefer single-column primary keys when possible.

- Composite keys should be used when a natural key combination is better than surrogate keys.

🎯 Interview Question

Q: How is a `PRIMARY KEY` different from `UNIQUE + NOT NULL`?

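A: Functionally they are almost the same, but a table can have only one primary key, and it clearly marks the main identifier. It is also the default target when a foreign key references the table without naming columns.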

--------------

### ✅ 4. FOREIGN KEY Constraint

🔍 Definition
- Links a column (or columns) in one table to the `PRIMARY KEY` or `UNIQUE` constraint of another table.

📌 Use Case
- Maintains referential integrity. For example, orders should reference a valid customer.

🧪 Example

In [None]:
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
);


💡 Best Practices
- Index foreign key columns for performance; PostgreSQL does not create an index on the referencing column automatically.

- Use `ON DELETE CASCADE` or `SET NULL` to define dependent record behavior.


In [None]:
FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
    ON DELETE CASCADE


🎯 Interview Question

Q: What happens if you delete a parent row referenced by a foreign key?

A: By default, it throws an error unless you use `ON DELETE` rules like `CASCADE` or `SET NULL`.
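
Putting the `ON DELETE CASCADE` clause into a full table definition, a sketch could look like this. It assumes a fresh database where these tables do not exist yet, and the index name `idx_orders_customer_id` is an arbitrary choice.

```sql
CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name VARCHAR(100) NOT NULL
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INT NOT NULL,
    FOREIGN KEY (customer_id) REFERENCES customers(customer_id)
        ON DELETE CASCADE
);

-- The referencing column is not indexed automatically.
CREATE INDEX idx_orders_customer_id ON orders (customer_id);

INSERT INTO customers (name) VALUES ('John Doe');   -- gets customer_id 1 in a fresh table
INSERT INTO orders (customer_id) VALUES (1), (1);

-- Deleting the parent row also removes both dependent orders.
DELETE FROM customers WHERE customer_id = 1;
SELECT count(*) FROM orders;  -- 0
```

Without the `ON DELETE CASCADE` clause, the `DELETE` would be rejected while the two orders still reference the customer.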

------------

### ✅ 5. CHECK Constraint

🔍 Definition
- Allows you to enforce custom validations using expressions.

📌 Use Case
- Age must be at least 18, salary must be positive, quantity must be between 1 and 100, etc.

🧪 Example


In [None]:
CREATE TABLE employees (
    emp_id SERIAL PRIMARY KEY,
    age INT CHECK (age >= 18),
    salary NUMERIC(10, 2) CHECK (salary > 0)
);

💡 Best Practices
- Use `CHECK` for validations that are strictly rule-based.
- Avoid overcomplicating CHECK logic.

🎯 Interview Question

Q: Can you write a `CHECK` constraint to allow only specific departments?

A:

In [None]:
CHECK (department IN ('HR', 'IT', 'Finance'))
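
For context, the fragment above could sit inside a table definition like the following sketch. The `staff` table and the constraint name are hypothetical.

```sql
-- Hypothetical table showing the CHECK fragment in context.
CREATE TABLE staff (
    staff_id SERIAL PRIMARY KEY,
    department VARCHAR(50) NOT NULL
        CONSTRAINT staff_department_allowed
        CHECK (department IN ('HR', 'IT', 'Finance'))
);

INSERT INTO staff (department) VALUES ('IT');         -- ok
INSERT INTO staff (department) VALUES ('Marketing');  -- fails the CHECK constraint
```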

--------------

### ✅ 6. DEFAULT Constraint

🔍 Definition
- Automatically assigns a default value to a column when no value is provided.

📌 Use Case
- Set default status = 'active', created_at = current date, etc.

🧪 Example

In [None]:
CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    status VARCHAR(20) DEFAULT 'active',
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

💡 Best Practices

- Use `DEFAULT` for timestamps, status fields, or counters.
- Combine with `NOT NULL` to ensure the field is always populated.

🎯 Interview Question

Q: Can you override a `DEFAULT` value during insertion?

A: Yes, by explicitly providing the value.
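
A short sketch of the three ways a default can come into play, assuming the `users` table from the example above (`status` and `created_at` both have defaults). The inserted values are illustrative.

```sql
-- Omitted columns fall back to their defaults.
INSERT INTO users DEFAULT VALUES;

-- An explicit value overrides the default.
INSERT INTO users (status) VALUES ('suspended');

-- The DEFAULT keyword requests the default explicitly for one column.
INSERT INTO users (status, created_at)
VALUES (DEFAULT, '2024-01-01 00:00:00');

SELECT user_id, status, created_at
FROM users
ORDER BY user_id;
```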

---------------------

### ✅ 7. Composite / Multi-Column Constraints

🔍 Definition
- Constraints that involve multiple columns together, often used for `UNIQUE` or `PRIMARY KEY`.

📌 Use Case
- A student can enroll only once in a course → student_id + course_id must be unique.

🧪 Example

In [None]:
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id)  -- Composite Key
);

Or:

In [None]:
UNIQUE (email, phone_number)

💡 Best Practices
- Use composite keys only when natural uniqueness comes from multiple fields.
- Always document composite constraints for clarity.

🎯 Interview Question

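Q: What's the difference between a composite primary key and separate unique constraints?

A: A composite primary key treats the combination of columns as the row identifier: individual values may repeat as long as the combination is unique, and every key column is `NOT NULL`. Separate `UNIQUE` constraints on each column are stricter, because every column must be unique on its own, and they do not mark the row identifier.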
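
The difference is easiest to see side by side. In this sketch `enrollments` matches the example above, while `enrollments_strict` is a hypothetical comparison table.

```sql
-- Composite key: only the (student_id, course_id) pair must be unique.
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO enrollments VALUES (1, 10);  -- ok
INSERT INTO enrollments VALUES (1, 20);  -- ok: same student, another course
INSERT INTO enrollments VALUES (2, 10);  -- ok: same course, another student
INSERT INTO enrollments VALUES (1, 10);  -- fails: pair already exists

-- Separate UNIQUE constraints: each column must be unique on its own.
CREATE TABLE enrollments_strict (
    student_id INT UNIQUE,
    course_id INT UNIQUE
);
INSERT INTO enrollments_strict VALUES (1, 10);  -- ok
INSERT INTO enrollments_strict VALUES (1, 20);  -- fails: student_id 1 already used
```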

----------------



### 🔄 Bonus: Modify Constraints

In [None]:
-- Add NOT NULL
ALTER TABLE users ALTER COLUMN email SET NOT NULL;

-- Drop CHECK
ALTER TABLE employees DROP CONSTRAINT employees_age_check;

-- Add Composite Unique (assumes users has both email and phone_number columns)
ALTER TABLE users ADD CONSTRAINT unique_email_phone UNIQUE (email, phone_number);


If you are confused about: `What is employees_age_check and where did it come from?`

In [None]:
ALTER TABLE employees DROP CONSTRAINT employees_age_check;

✅ Explanation

When you create a CHECK constraint like this:

In [None]:
CREATE TABLE employees (
    emp_id SERIAL PRIMARY KEY,
    age INT CHECK (age >= 18)
);

PostgreSQL automatically gives a default name to that CHECK constraint if you don't provide one explicitly.
That name is usually generated as `<table>_<column>_check`.

So in our case:

- Table name: `employees`

- Column name: `age`

- Constraint type: `CHECK`

👉 PostgreSQL auto-generates the name `employees_age_check`.

That’s why, to remove that constraint, you write:

In [None]:
ALTER TABLE employees DROP CONSTRAINT employees_age_check;

✅ Want to avoid default names?

You can give your own custom constraint name like this:

In [None]:
CREATE TABLE employees (
    emp_id SERIAL PRIMARY KEY,
    age INT CONSTRAINT check_min_age CHECK (age >= 18)
);

Now if you want to drop it later, you’ll do:

In [None]:
ALTER TABLE employees DROP CONSTRAINT check_min_age;
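
If you are unsure what a constraint ended up being called, the system catalog can tell you. A minimal sketch, assuming an `employees` table exists in the current schema:

```sql
-- List constraint names and types for the employees table.
-- contype: c = check, p = primary key, u = unique, f = foreign key
SELECT conname, contype
FROM pg_constraint
WHERE conrelid = 'employees'::regclass
ORDER BY conname;
```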

## PostgreSQL Strong Data Types

### 🔹 1. Text Data Types
- ✅ TEXT
    - Stores variable-length text with no declared length limit (a single value can be up to about 1 GB).

In [None]:
CREATE TABLE blogs (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT
);

- ✅ VARCHAR(n)
    - Limits string length to n characters.

In [None]:
CREATE TABLE users (
  name VARCHAR(50),
  email VARCHAR(100)
);

📌 When to Use

`TEXT`: Use for general strings (e.g., descriptions, content).

`VARCHAR(n)`: When you need to validate or restrict string length (e.g., username, mobile number).

💡 Best Practices

Prefer `TEXT` unless you need to enforce length.

PostgreSQL does not optimize `VARCHAR(n)` any better than `TEXT`; both perform the same.

Q: Why not use `CHAR(n)`?

A: It is fixed-length and pads values with spaces, which wastes storage. Use `VARCHAR` or `TEXT` instead.

### 🔹 2. Numeric Data Types
- ✅ INTEGER, BIGINT

    - INTEGER: Use for counters, quantities (range is about ±2.1 billion).

    - BIGINT: Use when values can exceed about 2.1 billion.

In [None]:
CREATE TABLE products (
  id SERIAL PRIMARY KEY,
  stock_quantity INTEGER,
  total_orders BIGINT
);

- `✅ NUMERIC(precision, scale) / DECIMAL(precision, scale)` - Exact decimal arithmetic, well suited to money calculations.

    - NUMERIC: Ideal for currency, salaries, billing.

In [None]:
CREATE TABLE payments (
  id SERIAL,
  amount NUMERIC(10,2)
);

10 is the Precision (p) → Total number of digits allowed (before + after the decimal).

2 is the Scale (s) → Number of digits allowed after the decimal point.
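
The effect of precision and scale can be checked with a few casts. This is only a sketch; the literal values are arbitrary.

```sql
-- Extra decimal places are rounded to the declared scale.
SELECT 123.456::NUMERIC(10, 2);       -- 123.46

-- 8 digits before the decimal point + 2 after = 10 digits: fits.
SELECT 12345678.91::NUMERIC(10, 2);   -- 12345678.91

-- 9 digits before the decimal point + 2 after = 11 digits: too many.
SELECT 123456789.1::NUMERIC(10, 2);   -- error: numeric field overflow
```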



- ✅ REAL

    - Approximate decimal number (floating-point).
    
    - Use in scientific/engineering contexts where precision can be approximate.

In [None]:
CREATE TABLE sensors (
  id INT,
  temperature REAL
);

✅ How It Works

- REAL is stored in 4 bytes.

- Precision is approximately 6 decimal digits.

- Values with more significant digits are rounded to fit.

- Operates faster than NUMERIC, but with floating-point inaccuracies.

### Why REAL is faster than NUMERIC but less accurate

- ✅ Why REAL is faster
    - 🔧 How it works internally:

    - REAL uses binary floating-point arithmetic (the IEEE 754 standard).

    - CPUs have dedicated registers and instructions to handle these values.
    
    - Operations like +, -, *, / are done in hardware, not software.




In [None]:
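-- Floating-point addition runs directly in CPU hardware, super fast

SELECT (0.1::REAL + 0.2::REAL)::DOUBLE PRECISION;  -- widened to reveal the stored value
-- Output (PostgreSQL 12+): 0.30000001192092896


-- Think of this as Turbo Mode for math operations.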

Wait... shouldn’t that be 0.3?

- ✅ Why this happens:
    - Decimal 0.1 is like 0.0001100110011... in binary → it gets truncated or rounded in memory.

💥 This leads to rounding errors, especially in large calculations or comparisons.



🔁 Compare with NUMERIC:

In [None]:
SELECT 0.1::NUMERIC + 0.2::NUMERIC;
-- Output: 0.3  ✅ Accurate!


Why? Because NUMERIC stores exact decimal representation — no binary rounding.
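
The same contrast shows up in equality checks. The sketch below uses `DOUBLE PRECISION`, the 8-byte sibling of `REAL`, because the mismatch is easy to reproduce there. That swap is a choice made for this illustration.

```sql
-- Binary floating point: the sum is not exactly 0.3.
SELECT 0.1::DOUBLE PRECISION + 0.2::DOUBLE PRECISION
       = 0.3::DOUBLE PRECISION AS float_equal;    -- false

-- Exact decimal arithmetic: the comparison holds.
SELECT 0.1::NUMERIC + 0.2::NUMERIC
       = 0.3::NUMERIC AS numeric_equal;           -- true
```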

### 💡 Best Practices

- Don’t use `REAL` or `FLOAT` for money.

In [None]:
CREATE TABLE billing (
  item TEXT,
  price REAL
);
-- Risky: prices are stored as binary approximations, so totals can drift from the exact decimal result

- Use `NUMERIC` for exact precision.

In [None]:
CREATE TABLE billing (
  item TEXT,
  price NUMERIC(10, 2)
);
-- Exact: 0.10 + 0.20 is always 0.30

### 🔹 3. Date & Time Data Types
- `✅ DATE`

In [None]:
CREATE TABLE employees (
  id INT,
  hire_date DATE
);

- `✅ TIMESTAMP WITH TIME ZONE (timestamptz)`

In [None]:
CREATE TABLE meetings (
  id SERIAL,
  start_time TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);

📌 When to Use

- `DATE`: Birthdays, joining dates.

- `TIMESTAMPTZ`: Global applications with time zone awareness (e.g., online meetings, travel apps).

### 💡 Best Practices

- Always use `TIMESTAMPTZ` if your app runs across time zones.

- Use `CURRENT_TIMESTAMP` or `NOW()` for default values.
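
To illustrate time zone awareness, here is a sketch that assumes the `meetings` table above is empty before the insert. The timestamp and zone names are example values.

```sql
-- Store an instant; the +05:30 offset is converted to UTC internally.
INSERT INTO meetings (start_time)
VALUES ('2024-03-10 09:00:00+05:30');

-- Render the same instant in two different zones.
SELECT start_time AT TIME ZONE 'UTC'          AS utc_time,
       start_time AT TIME ZONE 'Asia/Kolkata' AS kolkata_time
FROM meetings;
-- utc_time: 2024-03-10 03:30:00, kolkata_time: 2024-03-10 09:00:00
```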



### 🔹 4. Boolean Data Type
- ✅ BOOLEAN

In [None]:
CREATE TABLE accounts (
  id INT,
  is_verified BOOLEAN DEFAULT FALSE
);

📌 When to Use
- Ideal for yes/no, true/false flags like is_active, has_access, deleted.

### 💡 Best Practices

- Don’t use CHAR(1) or INT for booleans — prefer BOOLEAN.

- Set DEFAULT TRUE/FALSE for clear behavior.

### 🔹 5. Special Data Types
- ✅ JSONB (Binary JSON)

In [None]:
CREATE TABLE users (
  id SERIAL,
  profile JSONB
);

INSERT INTO users(profile) VALUES ('{"name": "Hemendra", "skills": ["Python", "React"]}');


➕ Query Example:

In [None]:
SELECT profile->>'name' FROM users;

📌 When to Use
- When data is semi-structured, varies by record — e.g., metadata, preferences, configurations.

💡 Best Practices
- Use `JSONB` over `JSON` – faster and indexable.
- Add GIN index:

In [None]:
CREATE INDEX idx_profile ON users USING GIN(profile);
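
Queries that can benefit from that index typically use the containment operator. A minimal sketch, assuming the `users` table and sample profile row shown earlier in this section:

```sql
-- Containment (@>) is one of the operators a default GIN index on JSONB supports.
SELECT id, profile->>'name' AS name
FROM users
WHERE profile @> '{"skills": ["Python"]}';

-- Extract a nested array element as text (JSON arrays are zero-indexed).
SELECT profile->'skills'->>0 AS first_skill
FROM users;
```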

- `✅ UUID – Universally Unique Identifier`

In [None]:
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE sessions (
  id UUID DEFAULT uuid_generate_v4(),
  user_id INT
);

📌 When to Use
- Global uniqueness needed, e.g., API keys, session tokens, public IDs.

💡 Best Practices

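- Random UUIDs are twice the size of `BIGINT` and insert into indexes in random order, so avoid them as keys on very write-heavy tables unless you need them.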

- Use only where security & uniqueness is crucial.

❓ Interview Qs

Q: What is the size of a UUID?

A: 128 bits (16 bytes).

Q: What function generates UUID v4?

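A: `uuid_generate_v4()` from the `uuid-ossp` extension. PostgreSQL 13 and later also provide the built-in `gen_random_uuid()`.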

- `✅ ARRAY`

In [None]:
CREATE TABLE books (
  id INT,
  tags TEXT[]
);

INSERT INTO books(tags) VALUES (ARRAY['fiction', 'ai', 'tech']);

➕ Query ARRAY:

In [None]:
SELECT * FROM books WHERE tags @> ARRAY['ai'];

📌 When to Use
- Small, limited options like genres, skills, multiple tags.

💡 Best Practices
- Use separate table + foreign key for large/complex data.

- Keep arrays small and fixed-length if possible.

❓ Interview Qs

Q: Which operator checks whether an array contains a value?

A: `@>` (contains), `= ANY()` (matches any element).
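
The two operators from the answer, plus two related ones, can be compared in a quick sketch. It assumes the `books` table and sample row from the example above; the `'history'` tag is made up.

```sql
-- Rows whose tags include 'ai' (containment).
SELECT * FROM books WHERE tags @> ARRAY['ai'];

-- Same filter written with ANY().
SELECT * FROM books WHERE 'ai' = ANY(tags);

-- Overlap (&&): rows sharing at least one tag with the given list.
SELECT * FROM books WHERE tags && ARRAY['ai', 'history'];

-- Expand the array into one row per tag.
SELECT id, unnest(tags) AS tag FROM books;
```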

- `✅ BYTEA – Binary Data`

In [None]:
CREATE TABLE files (
  id SERIAL,
  name TEXT,
  data BYTEA
);


📌 When to Use
- Storing small binary files (images, PDFs, encrypted data).

💡 Best Practices
- For large files, save file path in DB and store files in object storage/cloud.

❓ Interview Qs

Q: Can BYTEA be indexed?

A: Not efficiently for large values. Store a hash (e.g., SHA-256) or an ID alongside the data and look up by that instead.

🚀 Bonus: Project Use Case Ideas

- Use all types together in a mini project DB:

In [None]:
CREATE TABLE users (
  id UUID PRIMARY KEY DEFAULT uuid_generate_v4(),
  name TEXT NOT NULL,
  email VARCHAR(100) UNIQUE,
  password BYTEA,
  is_active BOOLEAN DEFAULT TRUE,
  preferences JSONB,
  skills TEXT[],
  created_at TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP
);