# DML or CRUD Operations

Let us understand how to perform CRUD operations using PostgreSQL.

* Normalization Principles
* Tables as Relations
* Database Operations - Overview
* CRUD Operations
* Creating Table
* Inserting Data
* Updating Data
* Deleting Data
* Overview of Transactions
* Exercise - Database Operations

## Normalization Principles

Let us get an overview of normalization principles.

Here are the normal forms we commonly use. The links point to Wikipedia.
* [1st Normal Form](https://en.wikipedia.org/wiki/First_normal_form)
* [2nd Normal Form](https://en.wikipedia.org/wiki/Second_normal_form)
* [3rd Normal Form](https://en.wikipedia.org/wiki/Third_normal_form)
* [Boyce Codd Normal Form](https://en.wikipedia.org/wiki/Boyce–Codd_normal_form)

Most well-designed data models are in 3rd Normal Form. BCNF is used in the rarer cases where 3rd Normal Form does not eliminate all insertion, update and deletion anomalies.

### Reporting Environments
While normalization is extensively used for transactional systems, it is not ideal for reporting or decision support systems. We tend to use dimensional modeling for reporting systems, where tables contain data pre-processed as per the report requirements.

### Normal Forms - Key Terms

Let us understand some of the key terms we use while going through the normal forms.
* Domain
* Attribute
* Atomic (indivisible)
* Functionally Dependent
* Prime Attribute
* Candidate Key
* Data Anomalies - inconsistencies that redundant data allows when users or developers insert, update or delete rows incorrectly
* Transitive Dependency


## Tables as Relations

Let us understand details about relations and different types of relationships we typically use.

* In RDBMS - R stands for Relational.
* In transactional systems, tables are created using normalization principles. The resulting tables (relations) are linked by the relationships among the entities they represent.
* Here are the typical relationships among the tables.
  * 1 to 1
  * 1 to many or many to 1 (1 to n or n to 1)
  * many to many (m to n)
* To **enforce** relationships we typically define constraints such as **Primary Key** and **Foreign Key**.
* Here is the typical process we follow from requirements to physical database tables before building applications.
  * Identify entities based on the requirements.
  * Define relationships among them.
  * Create an ER Diagram (Entity Relationship Diagram). It is also called the Logical Data Model.
  * Apply Normalization Principles on the entities to identify tables and the constraints that manage relationships among them.
  * Come up with the Physical Data Model and generate the required DDL scripts.
  * Execute the scripts in the database on which applications will eventually be built based on business requirements.
* Logical modeling is typically done by Data Architects.
* Physical modeling is typically taken care of by the Application Architect or Development Lead.
* Let us go through the [data model](https://docs.oracle.com/cd/B28359_01/server.111/b28328/diagrams.htm) related to HR and OE systems.
  * Identify the relationships between the tables.
  * Differentiate between transactional tables and non transactional tables.
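
To make the role of **Primary Key** and **Foreign Key** constraints concrete, here is a minimal sketch of a 1 to many relationship. It is not part of the course database: the `departments` and `employees` tables are hypothetical, and Python's built-in `sqlite3` module is used as a stand-in for Postgres so that the example runs without a server. The constraint syntax shown should carry over to Postgres, which enforces foreign keys without the `PRAGMA` line.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
# SQLite needs this switch; Postgres enforces foreign keys by default.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: one department has many employees (1 to n).
conn.execute("""
    CREATE TABLE departments (
        department_id INTEGER PRIMARY KEY,
        department_name VARCHAR(30) NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        employee_name VARCHAR(30) NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments (department_id)
    )
""")

conn.execute("INSERT INTO departments (department_id, department_name) VALUES (10, 'Sales')")
conn.execute("INSERT INTO employees (employee_name, department_id) VALUES ('Scott', 10)")

# An employee pointing at a missing department violates the foreign key.
try:
    conn.execute("INSERT INTO employees (employee_name, department_id) VALUES ('Donald', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

employee_count = conn.execute("SELECT count(1) FROM employees").fetchone()[0]
print(orphan_rejected, employee_count)
```

The second insert is rejected by the database itself, so the relationship holds no matter which application writes to the tables.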

## Database Operations - Overview

Let us get an overview of the Database Operations we typically perform on a regular basis. They are broadly categorized into the following:

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/l91PFVnmSnI?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* DDL - Data Definition Language
  * CREATE/ALTER/DROP Tables
  * CREATE/ALTER/DROP Indexes
  * Add constraints to tables
  * CREATE/ALTER/DROP Views
  * CREATE/ALTER/DROP Sequences
* DML - Data Manipulation Language
  * Inserting new data into the table
  * Updating existing data in the table
  * Deleting existing data from the table
* DQL - Data Query Language
  * Read the data from the table

On top of these we also use TCL (Transaction Control Language), which includes **COMMIT** and **ROLLBACK**.

As part of this section in the subsequent topics we will primarily focus on basic DDL and DML.

## CRUD Operations

Let us get an overview of CRUD Operations. They correspond to the DML statements, plus the queries used to read data, that we run while performing database operations.

In [2]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/6XKfmClzBeQ?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* CRUD is a term widely used from the application development perspective.
* C - CREATE (INSERT)
* R - READ (SELECT)
* U - UPDATE (UPDATE)
* D - DELETE (DELETE)

As part of the application development process we perform CRUD Operations using REST APIs.
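
The mapping between CRUD and SQL can be sketched as four small functions. This is only an illustration under stated assumptions: the `users` table is a simplified, hypothetical version of the one used in this section, the function names are made up, and Python's built-in `sqlite3` module stands in for Postgres so the code runs without a server.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
# Simplified, hypothetical version of the users table.
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        user_first_name VARCHAR(30) NOT NULL,
        user_role VARCHAR(1) NOT NULL DEFAULT 'U'
    )
""")

def create_user(first_name):
    """C - CREATE maps to INSERT; returns the generated user_id."""
    cur = conn.execute("INSERT INTO users (user_first_name) VALUES (?)", (first_name,))
    return cur.lastrowid

def read_user(user_id):
    """R - READ maps to SELECT; returns one row or None."""
    return conn.execute(
        "SELECT user_id, user_first_name, user_role FROM users WHERE user_id = ?",
        (user_id,),
    ).fetchone()

def update_user_role(user_id, role):
    """U - UPDATE maps to UPDATE."""
    conn.execute("UPDATE users SET user_role = ? WHERE user_id = ?", (role, user_id))

def delete_user(user_id):
    """D - DELETE maps to DELETE."""
    conn.execute("DELETE FROM users WHERE user_id = ?", (user_id,))

new_id = create_user("Scott")
created = read_user(new_id)
update_user_role(new_id, "A")
updated = read_user(new_id)
delete_user(new_id)
deleted = read_user(new_id)
print(created, updated, deleted)
```

In a REST API, each of these functions would typically sit behind one HTTP method (POST, GET, PUT or PATCH, DELETE).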

## Creating Table

Before getting into action with respect to basic DML and queries or CRUD operations, we need to prepare tables.

In [3]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/O1IHKcF1bxw?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

We have not covered DDL yet. All database operations related to managing tables come under DDL.

For now, let's just create the table by copying and pasting the `CREATE TABLE` statement below. We will get into the concepts as part of the subsequent sections.

* Connect to the database.
* Create the table.

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db

In [None]:
%%sql 

SELECT * FROM information_schema.tables 
WHERE table_catalog = 'itversity_sms_db' AND table_schema = 'public'
LIMIT 10

In [None]:
%%sql

DROP TABLE IF EXISTS users;

In [None]:
%%sql 

SELECT * FROM information_schema.tables 
WHERE table_catalog = 'itversity_sms_db' AND table_schema = 'public'
LIMIT 10

In [None]:
%%sql

CREATE TABLE users (
    user_id SERIAL PRIMARY KEY,
    user_first_name VARCHAR(30) NOT NULL,
    user_last_name VARCHAR(30) NOT NULL,
    user_email_id VARCHAR(50) NOT NULL,
    user_email_validated BOOLEAN DEFAULT FALSE,
    user_password VARCHAR(200),
    user_role VARCHAR(1) NOT NULL DEFAULT 'U', --U and A
    is_active BOOLEAN DEFAULT FALSE,
    create_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    last_updated_ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)

* Let us validate the objects that are created in the underlying database. We can run a query against **information_schema**, use the Database Explorer in **SQL Workbench**, or use `psql`.

In [None]:
%%sql 

SELECT * FROM information_schema.tables 
WHERE table_catalog = 'itversity_sms_db' AND table_schema = 'public'
LIMIT 10

In [None]:
%%sql 

SELECT * FROM information_schema.columns 
WHERE table_name = 'users'
LIMIT 10

In [None]:
%sql SELECT * FROM users

## Inserting Data

Let us see how to insert the data into the table.

In [4]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Pk51SIqKeIU?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* We need to use the `INSERT` statement to insert the data. Here is the sample syntax.

```sql
INSERT INTO <table_name> (col1, col2, col3)
VALUES (val1, val2, val3)
```

* If we do not list columns after the table name, we need to specify values for all the columns in table order. It is not good practice to insert records without specifying column names.
* If we do not specify a value for a `SERIAL` field, a sequence-generated number will be used.
* It is not mandatory to pass values for fields where `DEFAULT` is specified. The value from the `DEFAULT` clause will be used.
* It is mandatory to specify columns and corresponding values for all columns where `NOT NULL` is specified without a `DEFAULT` (for example, `user_role` can be omitted because it has a default).

In [None]:
%load_ext sql

In [None]:
%env DATABASE_URL=postgresql://itversity_sms_user:sms_password@localhost:5432/itversity_sms_db

In [None]:
%sql TRUNCATE TABLE users

In [None]:
%%sql

INSERT INTO users (user_first_name, user_last_name, user_email_id)
VALUES ('Scott', 'Tiger', 'scott@tiger.com')

In [None]:
%sql SELECT * FROM users

In [None]:
%%sql

INSERT INTO users (user_first_name, user_last_name, user_email_id)
VALUES ('Donald', 'Duck', 'donald@duck.com')

In [None]:
%sql SELECT * FROM users

In [None]:
%%sql

INSERT INTO users (user_first_name, user_last_name, user_email_id, user_role, is_active)
VALUES ('Mickey', 'Mouse', 'mickey@mouse.com', 'U', true)

In [None]:
%sql SELECT * FROM users

In [None]:
%%sql

INSERT INTO users 
    (user_first_name, user_last_name, user_email_id, user_password, user_role, is_active) 
VALUES 
    ('Gordan', 'Bradock', 'gbradock0@barnesandnoble.com', 'h9LAz7p7ub', 'U', true),
    ('Tobe', 'Lyness', 'tlyness1@paginegialle.it', 'oEofndp', 'U', true),
    ('Addie', 'Mesias', 'amesias2@twitpic.com', 'ih7Y69u56', 'U', true)

In [None]:
%sql SELECT * FROM users
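
The rules about generated keys, `DEFAULT` and `NOT NULL` listed at the start of this section can also be checked from application code. The sketch below is an assumption-laden illustration rather than the course setup: it uses Python's built-in `sqlite3` module as a stand-in for Postgres, a cut-down `users` table, and `INTEGER PRIMARY KEY` in place of `SERIAL`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,             -- stands in for SERIAL
        user_first_name VARCHAR(30) NOT NULL,
        user_email_id VARCHAR(50) NOT NULL,
        user_role VARCHAR(1) NOT NULL DEFAULT 'U'
    )
""")

# user_id and user_role are omitted: the key is generated and the DEFAULT is applied.
conn.execute(
    "INSERT INTO users (user_first_name, user_email_id) VALUES (?, ?)",
    ("Scott", "scott@tiger.com"),
)
first_row = conn.execute(
    "SELECT user_id, user_first_name, user_role FROM users"
).fetchone()

# user_email_id is NOT NULL and has no DEFAULT, so leaving it out fails.
try:
    conn.execute("INSERT INTO users (user_first_name) VALUES (?)", ("Donald",))
    missing_email_rejected = False
except sqlite3.IntegrityError:
    missing_email_rejected = True

print(first_row, missing_email_rejected)
```

Passing values as parameters (`?` here, `%s` with most Postgres drivers) instead of building the SQL string by hand also protects against SQL injection.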

## Updating Data

Let us see how we can update data in the table.

In [5]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/vGrx2HMHEJ0?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Typical syntax

```sql
UPDATE <table_name>
SET
    col1 = val1,
    col2 = val2
WHERE <condition>
```

* If the `WHERE` condition is not specified, all rows in the table will be updated.
* For now we will see basic examples of update. One needs good knowledge of the `WHERE` clause to handle complex conditions. Using `WHERE` will be covered extensively as part of filtering the data at a later point in time.

* Set user role for user_id 1 as 'A'

In [None]:
%sql SELECT * FROM users

In [None]:
%%sql

UPDATE users 
    SET user_role = 'A' 
WHERE user_id = 1

In [None]:
%sql SELECT * FROM users

* Set user_email_validated as well as is_active to true for all users

In [None]:
%sql SELECT user_id, user_email_validated, is_active FROM users

In [None]:
%%sql

UPDATE users
SET
    user_email_validated = true,
    is_active = true

In [None]:
%sql SELECT user_id, user_email_validated, is_active FROM users

* Convert case of user_email_id to upper for all the records

In [None]:
%sql SELECT user_id, user_email_id FROM users

In [None]:
%%sql

UPDATE users
SET
    user_email_id = upper(user_email_id)

In [None]:
%sql SELECT user_id, user_email_id FROM users

* Add new column by name **user_full_name** and update it by concatenating **user_first_name** and **user_last_name**.

In [None]:
%%sql

ALTER TABLE users ADD COLUMN user_full_name VARCHAR(50)

In [None]:
%sql SELECT user_id, user_first_name, user_last_name, user_full_name FROM users

In [None]:
%sql SELECT concat(user_first_name, ' ', user_last_name) FROM users

In [None]:
%%sql 

UPDATE users
    SET user_full_name = upper(concat(user_first_name, ' ', user_last_name))

In [None]:
%sql SELECT user_id, user_first_name, user_last_name, user_full_name FROM users
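
The difference between an `UPDATE` with and without `WHERE` is easy to verify by looking at the number of affected rows. Here is a minimal sketch, assuming Python's built-in `sqlite3` module as a stand-in for Postgres and a cut-down, hypothetical `users` table with three rows.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        user_role VARCHAR(1) NOT NULL DEFAULT 'U',
        is_active INTEGER DEFAULT 0
    )
""")
conn.executemany("INSERT INTO users (user_id) VALUES (?)", [(1,), (2,), (3,)])

# With WHERE: only the matching row changes.
filtered = conn.execute("UPDATE users SET user_role = 'A' WHERE user_id = 1")
rows_with_where = filtered.rowcount

# Without WHERE: every row in the table changes.
unfiltered = conn.execute("UPDATE users SET is_active = 1")
rows_without_where = unfiltered.rowcount

roles = conn.execute("SELECT user_id, user_role FROM users ORDER BY user_id").fetchall()
print(rows_with_where, rows_without_where, roles)
```

Checking the affected row count before committing is a cheap safeguard against an accidentally missing `WHERE` clause.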

## Deleting Data

Let us understand how to delete the data from a table.

In [1]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/Wt10A5s9wqk?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Typical syntax - `DELETE FROM <table> WHERE <condition>`.
* If we do not specify a condition, it will delete all the data from the table.
* It is not recommended to use `DELETE` without a `WHERE` condition to delete all the data (instead we should use `TRUNCATE`).
* For now we will see basic examples of delete. One needs good knowledge of the `WHERE` clause to handle complex conditions.
* Let's see how we can delete all those records from users where the password is not set. We need to use `IS NULL` as the condition to compare against null values.

In [None]:
%sql SELECT user_id, user_password FROM users

In [None]:
%sql DELETE FROM users WHERE user_password IS NULL

In [None]:
%sql SELECT user_id, user_password FROM users

In [None]:
%sql SELECT count(1) FROM users
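
The reason `IS NULL` is needed becomes clearer when it is compared with `= NULL`, which never evaluates to true under standard SQL semantics. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for Postgres and a hypothetical two-column `users` table. Postgres should behave the same way with its default settings.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_password VARCHAR(200))")
conn.executemany(
    "INSERT INTO users (user_id, user_password) VALUES (?, ?)",
    [(1, None), (2, "h9LAz7p7ub"), (3, None)],
)

# "= NULL" never evaluates to true, so no rows are deleted.
deleted_with_equals = conn.execute(
    "DELETE FROM users WHERE user_password = NULL"
).rowcount

# "IS NULL" is the correct test for missing values.
deleted_with_is_null = conn.execute(
    "DELETE FROM users WHERE user_password IS NULL"
).rowcount

remaining = conn.execute("SELECT user_id FROM users").fetchall()
print(deleted_with_equals, deleted_with_is_null, remaining)
```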

## Overview of Transactions

Let us go through the details related to Transactions.

In [2]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/SoIzfoeJY8s?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* We typically perform operations such as `COMMIT` and `ROLLBACK` via the applications.
* `COMMIT` will persist the changes in the database.
* `ROLLBACK` will revert the uncommitted changes in the database.
* We typically roll back the uncommitted changes in a transaction if there is any exception as part of the application logic flow.
* For example, if the credit card payment fails while an order is being placed, the changes made for the items in that order are rolled back.
* By default, each statement in Postgres is committed automatically unless it runs inside an explicit transaction block (started with `BEGIN`). We will get into the details related to transactions as part of application development later.
* Commands such as `COMMIT` and `ROLLBACK` come under TCL (Transaction Control Language).
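
The commit-or-rollback flow described above can be sketched as follows. Everything here is illustrative: the `cart_items` table and the `place_order` function are hypothetical, the payment is simulated with a flag, and Python's built-in `sqlite3` module stands in for Postgres. With its default settings, `sqlite3` opens a transaction before the first `INSERT`, which plays the role of `BEGIN`.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cart_items (item_id INTEGER PRIMARY KEY, item_name VARCHAR(30) NOT NULL)"
)
conn.commit()

def place_order(items, payment_succeeds):
    """Insert all items, then COMMIT or ROLLBACK based on the simulated payment."""
    try:
        conn.executemany(
            "INSERT INTO cart_items (item_name) VALUES (?)", [(item,) for item in items]
        )
        if not payment_succeeds:
            raise RuntimeError("payment failed")
        conn.commit()      # persist the inserts
        return True
    except RuntimeError:
        conn.rollback()    # revert the uncommitted inserts
        return False

def item_count():
    return conn.execute("SELECT count(1) FROM cart_items").fetchone()[0]

first_attempt = place_order(["book", "pen"], payment_succeeds=False)
count_after_failure = item_count()
second_attempt = place_order(["book", "pen"], payment_succeeds=True)
count_after_success = item_count()
print(first_attempt, count_after_failure, second_attempt, count_after_success)
```

After the failed attempt the table is still empty, because both inserts belonged to the transaction that was rolled back.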

## Exercise - Database Operations

Let's create a table and perform database operations using direct SQL.

In [3]:
%%HTML
<iframe width="560" height="315" src="https://www.youtube.com/embed/pqtKUUI5cCo?rel=0&amp;controls=0&amp;showinfo=0" frameborder="0" allowfullscreen></iframe>

* Create table - **courses**
  * course_id - sequence generated integer and primary key
  * course_name - which holds alpha numeric or string values up to 60 characters
  * course_author - which holds the name of the author up to 40 characters
  * course_status - which holds one of these values (published, draft, inactive). 
  * course_published_dt - which holds date type value. 
* Insert data into courses using the data provided. Make sure id is system generated.

|Course Name                      |Course Author         |Course Status|Course Published Date|
|---------------------------------|----------------------|-------------|---------------------|
|Programming using Python         |Bob Dillon            |published    |2020-09-30           |
|Data Engineering using Python    |Bob Dillon            |published    |2020-07-15           |
|Data Engineering using Scala     |Elvis Presley         |draft        |                     |
|Programming using Scala          |Elvis Presley         |published    |2020-05-12           |
|Programming using Java           |Mike Jack             |inactive     |2020-08-10           |
|Web Applications - Python Flask  |Bob Dillon            |inactive     |2020-07-20           |
|Web Applications - Java Spring   |Mike Jack             |draft        |                     |
|Pipeline Orchestration - Python  |Bob Dillon            |draft        |                     |
|Streaming Pipelines - Python     |Bob Dillon            |published    |2020-10-05           |
|Web Applications - Scala Play    |Elvis Presley         |inactive     |2020-09-30           |
|Web Applications - Python Django |Bob Dillon            |published    |2020-06-23           |
|Server Automation - Ansible      |Uncle Sam             |published    |2020-07-05           |

* Update the status of all the **draft courses** related to Python and Scala to **published** along with the **course_published_dt using system date**. 
* Delete all the courses which are neither in draft mode nor published.
* Validation - Get count of all published courses by author and make sure output is sorted in descending order by count. 

```sql
SELECT course_author, count(1) AS course_count
FROM courses
WHERE course_status = 'published'
GROUP BY course_author
ORDER BY course_count DESC
```

|Course Author   |Course Count|
|----------------|------------|
|Bob Dillon      |5           |
|Elvis Presley   |2           |
|Uncle Sam       |1           |