# Lecture 16. Set Up Delta Tables

## Learning Objectives

- We will learn how to use **CTAS statements** to create Delta tables.
- We will learn how to add **table constraints** to an existing table.
- Lastly, we will see how to make a **copy of a Delta table**.

## CTAS

- In addition to regular `CREATE TABLE` statements, 
  we can use CTAS statements to create Delta tables.

  *CTAS statements* or *Create Table As Select statements* create and populate data tables using the output of a SELECT statement.

- Here is an example. We create `table_1` and fill it with the data retrieved from `table_2`.

  ```SQL
  CREATE TABLE table_1
  AS SELECT * FROM table_2
  ```
  
- CTAS statements automatically infer schema information from query results, and do not support manual schema declaration.

- With CTAS statements, we can do simple transformations like changing column names or omitting columns from target tables during table creation.

  * **[CTAS: Filtering and Renaming Columns]** 
    In this example, the statement creates a new table, `table_1`, containing a subset of the columns from `table_2`. It also renames `col_3` to `new_col_3`.

    ```SQL
    CREATE TABLE table_1
    AS SELECT col_1, col_3 AS new_col_3 FROM table_2
    ```

  * **[CTAS: Additional Options]** 
    In addition, the `CREATE TABLE` clause contains several options.
    You can provide a descriptive comment for the table. This allows for easier discovery of table contents.

    ```SQL
    CREATE TABLE new_table
    COMMENT "Contains PII"
    PARTITIONED BY (city, birth_date)
    LOCATION '/some/path'
    AS SELECT id, name, email, birth_date, city FROM users
    ```

    Here we are adding a comment indicating that the table contains Personally Identifiable Information (PII), such as the name and the email of the user.

    The underlying data of a table can be partitioned into subfolders by the values of one or more columns.
    Here we are partitioning by `city` and `birth_date`.
    Partitioning can improve the performance of huge Delta tables.

    On the other hand, small to medium-sized tables will not benefit from partitioning, because partitioning physically separates data files, which results in a small-files problem.
    This can prevent file compaction and efficient data skipping.

    As a best practice, you should default to non-partitioned tables for most use cases when working with Delta tables.

    Lastly, a table created with a CTAS statement can be an external table, in which case the data is stored in the external location specified by the `LOCATION` keyword.
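
Because CTAS infers the schema from the query output, one way to influence the resulting column types is to cast inside the `SELECT` list and then inspect the table. The following is a minimal sketch, assuming the `users` table from the example above; `users_typed` is a hypothetical target name and the chosen types are only illustrative.

```sql
-- users_typed is a hypothetical table name; column types come from the SELECT output
CREATE TABLE users_typed
COMMENT "Contains PII"
AS SELECT
  CAST(id AS BIGINT) AS id,
  name,
  email,
  CAST(birth_date AS DATE) AS birth_date,
  city
FROM users;

-- Shows the resulting column types, plus table details such as the comment and location
DESCRIBE EXTENDED users_typed;
```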

### CREATE TABLE vs. CTAS

Let us compare regular `CREATE TABLE` statements with CTAS statements.

<div style="text-align: center;">
<img src="../../assets/images/CREATE TABLE vs. CTAS.jpg" style="width:640px" >
</div> 

- Regular CREATE TABLE statements need manual schema declaration.

  * Here, for example, column 1 is declared as an Integer, column 2 as a String, and column 3 as a Double.

  * CTAS statements, by contrast, do not support manual schema declaration. They automatically infer schema information from the query results.

- Regular CREATE TABLE statements create an empty table.

  * So, you need an INSERT INTO statement to load data into the table.

  * On the other hand, with CTAS statements, data will be inserted during table creation from the output of the SELECT statement.
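
The two approaches can be sketched side by side. This is only an illustration: it assumes `table_2` has columns `col_1`, `col_2`, and `col_3` with types compatible with the declared schema, and `table_3` is a hypothetical name used to keep the two targets apart.

```sql
-- Regular CREATE TABLE: the schema is declared by hand and the table starts empty
CREATE TABLE table_1 (col_1 INT, col_2 STRING, col_3 DOUBLE);

-- Data is loaded in a separate step
INSERT INTO table_1
SELECT col_1, col_2, col_3 FROM table_2;

-- CTAS: the schema is inferred and the data is loaded in one statement
-- (table_3 is a hypothetical name)
CREATE TABLE table_3
AS SELECT col_1, col_2, col_3 FROM table_2;
```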

## Table Constraints

Now, once you have created your Delta table, either with a regular `CREATE TABLE` statement or with a CTAS statement, you can add constraints to it.

```sql
ALTER TABLE table_name ADD CONSTRAINT constraint_name constraint_details
```

Databricks currently supports two types of enforced table constraints:
  * NOT NULL constraints and 
  * CHECK constraints.

In both cases, you must ensure that the table does not already contain data violating the constraint before you define it.

Once a constraint has been added to a table, any write containing data that violates the constraint will fail.
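
The generic `ADD CONSTRAINT` form is the one used for CHECK constraints. On Databricks, NOT NULL constraints are typically set per column with `ALTER COLUMN ... SET NOT NULL` instead. A minimal sketch, assuming a table named `orders` with a hypothetical `order_id` column:

```sql
-- order_id is a hypothetical column; the statement fails if it already contains NULLs
ALTER TABLE orders ALTER COLUMN order_id SET NOT NULL;

-- The constraint can be removed again later
ALTER TABLE orders ALTER COLUMN order_id DROP NOT NULL;
```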

In this example, we add a CHECK constraint to the `date` column of our table.

```sql
ALTER TABLE orders ADD CONSTRAINT valid_date CHECK (date > '2020-01-01');
```

Note that CHECK constraints look like the standard WHERE clauses you might use to filter a dataset.
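
Since the condition reads like a WHERE clause, it can be reused to look for violating rows before the constraint is defined. The sketch below assumes the `orders` table and the `valid_date` constraint from the example above. Rows where `date` is NULL are not counted by this filter and may need a separate check.

```sql
-- Before adding the constraint: count rows that would violate it
SELECT count(*) AS violations
FROM orders
WHERE NOT (date > '2020-01-01');

-- After adding it: CHECK constraints are listed among the table properties
SHOW TBLPROPERTIES orders;

-- Remove the constraint if it is no longer needed
ALTER TABLE orders DROP CONSTRAINT valid_date;
```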

## Cloning Delta Lake Tables

Lastly, what if you want to back up or make a copy of your Delta table?

For this, Delta Lake has two options for efficiently copying Delta Lake tables: deep clone and shallow clone.

### Deep Cloning

*Deep clone* fully copies both data and metadata from a source table to a target.

The command is pretty simple:
`CREATE TABLE` followed by the name of the new target table,
then the `DEEP CLONE` keyword and the name of the source table.

```sql
CREATE TABLE table_clone
DEEP CLONE source_table
```

This copy can occur incrementally: executing the command again synchronizes changes from the source to the target location.

Because all the data must be copied over, a deep clone can take a while for large datasets.
This is why you may need a shallow clone.
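
One caveat when re-running the clone: a plain `CREATE TABLE` fails if the target already exists, so a re-runnable version generally uses `CREATE OR REPLACE TABLE`. A minimal sketch, reusing the `table_clone` and `source_table` names from above:

```sql
-- Safe to re-run: creates the clone the first time and refreshes it on later runs
CREATE OR REPLACE TABLE table_clone
DEEP CLONE source_table;

-- Each run is recorded in the target table's history
DESCRIBE HISTORY table_clone;
```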



### Shallow Cloning

With *shallow clone*, you can quickly create a copy of a table 
since it just copies the Delta transaction logs.
That means no data is moved during shallow cloning.

```sql
CREATE TABLE table_clone
SHALLOW CLONE source_table
```

Shallow clone is a good option, for example, to test out applying changes on a table without the risk of modifying the current table.

### Cloning Delta Lake Tables

- Cloning is a great way to copy production tables for testing your code in development.

- In either case, deep or shallow, data modifications applied to the cloned version of the table are tracked and stored separately from the source, so they do not affect the source table.
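
This isolation can be checked with a quick experiment. The sketch below is only an illustration; `source_table_dev` is a hypothetical name for a development copy.

```sql
-- Hypothetical development copy; only the transaction log is copied
CREATE TABLE source_table_dev
SHALLOW CLONE source_table;

-- Try a destructive change on the clone only
DELETE FROM source_table_dev;

-- The source is untouched and still returns its original row count
SELECT count(*) FROM source_table;

-- The clone keeps its own history, separate from the source
DESCRIBE HISTORY source_table_dev;
```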