# Data Definition Language (DDL)

## 1. What is DDL?
---

DDL is a subset of SQL. Up until now you have only been reading data from different tables in the database. 
This is just the R in CRUD (Create, Read, Update, Delete). 

Data Definition Language enables:
* The creation of tables -- (Create)
* The insertion of data -- (Create)
* Altering tables -- (Update)
* Updating data -- (Update)
* Dropping tables -- (Delete)
* Deleting data -- (Delete)




In [None]:
# Imports
from tabulate import tabulate
import mysql.connector

In [6]:
# Database connection parameters
dbname = 'webshop_db'
user = 'user'
password = 'user_password'
host = 'mysql_db'  # This should be the service name defined in docker-compose.yml
port = '3306'  

In [62]:
# Establish connection to the MySQL database
conn = mysql.connector.connect(user=user, password=password, host=host, port=port, database=dbname)

# Create a cursor object to interact with the database
cur = conn.cursor()

## Create
---
We start of with a completely empty database

In [58]:
cur.execute("SHOW TABLES;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))

+------------------------+
| Tables_in_webshop_db   |
| products               |
+------------------------+


Of course an empty database is useless. We aim to create a database for a webshop, thus we would like to store our products in the databse. However, before we can insert our products we need to create a schema (table).

Creating a table is done using the following SQL:
```SQL
CREATE TABLE table_name (
    col_1 data_type_1 optional_constraint_1,
    col_2 data_type_2 optional_constraint_2,
    col_3 data_type_3 optional_constraint_3,
    ...
    multi_column_constraint
);
```

The first parameters are simple: `table_name` will be the name of your table in the database and `col_i` the name of the columns. The `data_type_i`, `optional_constraint_i`, and `multi_column_constraint` fields require more thought however.

### Data Types

We introduce some of the most essential data types (in MySQL) here:

**Numeric**
* Integers: they come in different sizes from `TINYINT` (1 Bytes), `SMALLINT` (2 Bytes), `MEDIUMINT` (3 Bytes), `INT` (4 Bytes), to `BIGINT` (8 Bytes)
* Floating Point: should be familier from other programming languages, namely `FLOAT` and `DOUBLE`
* Fixed point: unlike floating point numbers, the number of digits to the left and right of the point is fixed in data type `DECIMAL(M,D)`. This is a parameterised data type where `M` is the total number of digits and `D` the number of digits after the decimal point.

You will sometimes also see data types like `INT` parameterised as `INT(11)`. The number here indicates the number of digits that are displayed back to the user when querying the data, but does not affect the range of values that can be stored. Moreover we can also add the `UNSIGNED` keyword after to ensure the numbers are positive

**Strings**
* `CHAR(M)`: stores `M` characters. If fewer than `M` characters are provided, the string will be padded and if more are provided the string will be truncated
* `VARCHAR(M)`: stores any number of characters up to `M`. If more characters are inserted, the string will be truncated.
* `BLOB/TEXT`: used to store *large* data. `BLOB` stores binary data, whereas `TEXT` is used to store large strings
* `ENUM('A','B','C')`: just like in other programming languages, this allows the storage of only a set number of different strings.

**Date & Time**
* `DATE`: stores a date in 'YYYY-MM-DD'
* `DATETIME`: stores a date together with a time on that day in 'YYYY-MM-DD hh:mm:ss'
* `TIMESTAMP`: stores the time as the number of seconds since January 1st 1970

**Others**
* JSON
* Spatial Data

Note that the exact data types and how they are stored may differ per DBMS. For example, the `ENUM` type does not exist in PostgreSQL.

## Constraints

We introduce some of the most essential constraints (in MySQL) here:
* `PRIMARY KEY`: used to set a column as the primary key
* `UNIQUE`: used to ensure that all values in the column are unique
* `NOT NULL`: used to disallow NULLs in a column
* `FOREIGN KEY REFERENCES table_name(col)`: ensures that values in this column match a column in another table
* `DEFAULT value`: used to set a default value for a column if data is not provided for the column when inserting a row
* `AUTO_INCREMENT`: used to create a simple ID that is just an integer which increase withe every data row insertion
* `CHECK (condition)`: used to add a custom constraint/condition on a column
* `GENERATED ALWAYS AS`: used to derive a columns value from data in other columns

Note that some of these constraints may also be added after the column definitions. This is necessary if a constraint spans multiple columns:
* `PRIMARY KEY (col_1, col_2)`: added after column definitons to add a primary constraint on multiple columns
* `UNIQUE (col_1, col_2)`: added after column definitions to ensure that the combination of 2 columns is always unique
* `FOREIGN KEY (col_1, col_2) REFERENCES table_name(col_1, col_2)`: added after column definitions to add a foreign key constraint on multiple columns 

## Creating our table

Now that we have all our building blocks, let's create a table!

We want the following data to be stored:
* The product ID
* The product name
* The product description
* The product price
* The stock

In [None]:
table_def = '''
CREATE TABLE products (
  ID INT PRIMARY KEY AUTO_INCREMENT,
  Name VARCHAR(255) NOT NULL UNIQUE,
  Description TEXT,
  Price DECIMAL(10, 2) NOT NULL,
  Stock Int UNSIGNED 
);
'''

cur.execute(table_def)

We created our first table, but for now, it is empty:

In [54]:
cur.execute("SELECT * FROM products;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))

+------+-----------+-----------------------------------------------+---------+---------+
|   ID | Name      | Description                                   |   Price |   Stock |
|    1 | product 1 | product 1 is of very high quality             |   12.5  |     100 |
+------+-----------+-----------------------------------------------+---------+---------+
|    2 | product 2 |                                               |   25    |      25 |
+------+-----------+-----------------------------------------------+---------+---------+
|    3 | product 3 | An improved version of our popular product 1! |   99.99 |      50 |
+------+-----------+-----------------------------------------------+---------+---------+


## Insert Data

We would like to insert data into our table. This can be done using the following syntax
```SQL
INSERT INTO table_name (col_1, col_2, col_3) VALUES
(value_1, value_2, value_3),
(value_1, value_2, value_3),
(value_1, value_2, value_3),
...;
```
As you can see from the syntax, we can choose what columns to insert data into. When an *incomplete* row is inserted, all the columns not selected are filled in with their default values which are dependent on the data types for the column and the constraints placed on the column 

In [12]:
insert = '''
INSERT INTO products (Name, Description, Price, Stock) VALUES
('product 1', 'product 1 is of very high quality', 12.50, 100),
('product 2', NULL, 25.00, 25),
('product 3', 'An improved version of our popular product 1!', 99.99, 50);
'''

cur.execute(insert)

In [29]:
cur.execute("SELECT * FROM products;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))

+------+-----------+-----------------------------------------------+---------+---------+
|   ID | Name      | Description                                   |   Price |   Stock |
|    1 | product 1 | product 1 is of very high quality             |   12.5  |     100 |
+------+-----------+-----------------------------------------------+---------+---------+
|    2 | product 2 |                                               |   25    |      25 |
+------+-----------+-----------------------------------------------+---------+---------+
|    3 | product 3 | An improved version of our popular product 1! |   99.99 |      50 |
+------+-----------+-----------------------------------------------+---------+---------+


As you can see the products were successfully inserted and the ID was automatically incremented!

All the data we inserted satisfied the constraints we put on the columns, but what would happen if we broke a constraint?

In [16]:
duplicate_insert = '''
INSERT INTO products (Name, Description, Price, Stock) VALUES
('product 1', 'this is a duplicate of product 1', 00.00, 0),
'''

try:
  cur.execute(insert)
except Exception as exc:
  print(f"ERROR: {exc}")

ERROR: 1062 (23000): Duplicate entry 'product 1' for key 'products.Name'


We correctly run into an error because we insert a second product with the name 'product 1', which is not possible as per our `UNIQUE` constraint placed on the column *name*

# Update
---

There may be times where we want to alter the schema. We want to add or remove a constraint, or add an additional column to store more data.
In these case we can make use of the following SQL to do this

```SQL
ALTER TABLE table_name
ADD col_name data_type col_constraint,
MODIFY col_name data_type col_constraint,
RENAME COLUMN old_col_name TO new_col_name,
DROP constraint_name,
DROP col_name
```

Let's imagine that we would like to handle our product IDs differently. Instead of a number, we want to use a 10 character product code.

We could do this in two ways:
* We can remove the ID column and add a new ID column
* We can modify the ID column

We can try both out and see what happens

In [32]:
alter_table = '''
ALTER TABLE products
DROP ID,
ADD ID CHAR(10)
'''

try:
  cur.execute(alter_table)
except Exception as exc:
  print(f"ERROR: {exc}")

ERROR: 1062 (23000): Duplicate entry '' for key 'products.PRIMARY'


The code above threw an error because when removing the ID column and adding it again all rows (now without an ID) are given the empty string as their ID, which violates the `PRIMARY KEY` constraint.

Note that dropping the column ID did not remove the `PRIMARY KEY` constraint. Removing constraints, like the `PRIMARY KEY`, has to be done separately

In [33]:
alter_table = '''
ALTER TABLE products 
MODIFY ID CHAR(10);
'''

cur.execute(alter_table)

With the code above we were succesfully able to alter the table and changed the type of the column. The data inside the table remains the same as shown below:

In [55]:
cur.execute("SELECT * FROM products;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))

+------+-----------+-----------------------------------------------+---------+---------+
|   ID | Name      | Description                                   |   Price |   Stock |
|    1 | product 1 | product 1 is of very high quality             |   12.5  |     100 |
+------+-----------+-----------------------------------------------+---------+---------+
|    2 | product 2 |                                               |   25    |      25 |
+------+-----------+-----------------------------------------------+---------+---------+
|    3 | product 3 | An improved version of our popular product 1! |   99.99 |      50 |
+------+-----------+-----------------------------------------------+---------+---------+


## Updating the Data

Now that we altered the table, we would like to change the product IDs of our products to their new 10 character identifiers. For this we utilise the following code:
```SQL
UPDATE table_name SET
col_1 = value_1,
col_2 = value_2,
...
WHERE condition
```

We can now update our products as follows

In [None]:
cur.execute("UPDATE products SET ID = 'PRD0000001' WHERE Name = 'product 1';")
cur.execute("UPDATE products SET ID = 'PRD0000002' WHERE Name = 'product 2';")
cur.execute("UPDATE products SET ID = 'PRD0000003' WHERE Name = 'product 3';")

KeyboardInterrupt: 

In [59]:
cur.execute("SELECT * FROM products;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))

+------+-----------+-----------------------------------------------+---------+---------+
|   ID | Name      | Description                                   |   Price |   Stock |
|    1 | product 1 | product 1 is of very high quality             |   12.5  |     100 |
+------+-----------+-----------------------------------------------+---------+---------+
|    2 | product 2 |                                               |   25    |      25 |
+------+-----------+-----------------------------------------------+---------+---------+
|    3 | product 3 | An improved version of our popular product 1! |   99.99 |      50 |
+------+-----------+-----------------------------------------------+---------+---------+


# Delete
---

It may happen that we want to remove data from our database or maybe even a whole table.

In terms of our simple database, we might want to remove a product once we are not able to sell it anymore. To remove a row we can make use of the following SQL:
```SQL
DELETE FROM table_name WHERE condition;
```

Let's say that we cannot sell product 2 anymore then we can run the following SQL code

In [63]:
cur.execute("DELETE FROM products WHERE ID = 'PRD0000002'")

KeyboardInterrupt: 

As you can see below, product 2 was removed from the table:

In [None]:
cur.execute("SELECT * FROM products;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))

Now that we are not selling product 2 anymore, our most popular product, the webshop unfortunately went out of business and we want to clean it up.

What we can do first is remove all the data in our table using the `TRUNCATE` keyword as follows:

In [None]:
cur.execute("TRUNCATE products;")

As can be seen below, all data was removed from the table

In [64]:
cur.execute("SELECT * FROM products;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))

DatabaseError: 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

Finally we would like to remove the table itself as well. For this we can make use of the `DROP` keyword as follows

In [None]:
cur.execute("DROP TABLE products;")

To see that the `products` table was removed we can run `SHOW TABLES` again

In [None]:
cur.execute("SHOW TABLES;")

records = cur.fetchall()

# Fetch column names
col_names = [desc[0] for desc in cur.description]

# Print the records in table format
print(tabulate(records, headers=col_names, tablefmt="grid"))