# **Lab 01: Creating Tables within a Database**

Welcome to our hands-on exercise designed to guide you through the fundamental process of creating tables within a database using SQL. This tutorial is aimed at beginners and those looking to reinforce their understanding of how databases are structured and manipulated. Whether you're a student, a professional looking to switch to a tech-oriented role, or just someone curious about database management, this exercise will equip you with the skills necessary to start shaping your own databases.

In this session, we will explore various SQL commands, particularly focusing on the CREATE TABLE statement, which serves as the foundation for storing data in a structured way. You'll learn how to define tables with different types of fields, set constraints to ensure data integrity, and understand how these tables become the building blocks for robust database systems.

By the end of this exercise, you will not only be able to create your own tables but also appreciate the importance of precise data structuring in facilitating effective data storage, retrieval, and management. Get ready to take your first steps into the world of databases, where a well-structured table can make a world of difference in data handling!

## **Exploring Dataset**
The dataset is designed around a fictitious eCommerce company with millions of records to let a data analyst analyze multiple tables related to different aspects of a business. The database "dualcore" is set up, and six tables are created: employees, products, customers, orders, order_details, and suppliers.

1. Employees Table:
    - Columns: emp_id, fname, lname, address, city, state, zipcode, job_title, email, active, salary

2. Products Table:
   - Columns: prod_id, brand, name, price, cost, shipping_wt

3. Customers Table:
   - Columns: cust_id, fname, lname, address, city, state, zipcode

4. Orders Table:
   - Columns: order_id, cust_id, order_date
   - Description: This table manages order information, including order IDs, customer IDs (linked to the customers table), and order dates.

5. Order_details Table:
   - Columns: order_id, prod_id
   - Description: This table handles the details of each order, linking order IDs to the products purchased (linked to the products table).

6. Suppliers Table:
   - Columns: supp_id, company, contact, address, city, state, zipcode, phone


You can download the delimited text files from this link: https://drive.google.com/drive/folders/1JdZxGaeRddmhmEYNOs7p79f_3zEc9wuM?usp=sharing


The `LOAD DATA` statement populates these tables with data from the corresponding delimited text files. Its basic syntax is as follows:

```sql
LOAD DATA LOCAL INFILE '/path/to/data/file/{filename}.txt' INTO TABLE dualcore.{tablename};
```

The above command loads data from the specified text file into the respective table in the "dualcore" database. The columns in each file match the columns of the corresponding table. By default, `LOAD DATA` expects tab-separated fields and newline-terminated lines; a `FIELDS TERMINATED BY` clause can declare a different delimiter. This process efficiently imports large datasets into the database, allowing data analysts to perform various analyses and queries on the data.
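
To make the loading step more concrete, here is a minimal Python sketch of what a bulk load does conceptually: read one record per line, split it on tabs, and insert the fields in column order. It is an illustration under stated assumptions, not the lab's procedure: it uses an in-memory SQLite database instead of the MySQL server, and the table `demo_products` and the three sample rows are hypothetical.

```python
import csv
import os
import sqlite3
import tempfile

# Hypothetical sample standing in for one of the lab's tab-delimited text files.
sample = "1\tAcme\twidget\n2\tAcme\tgadget\n3\tDualcore\ttablet\n"

with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as f:
    f.write(sample)
    path = f.name

# In-memory SQLite database (an assumption; the lab itself uses MySQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_products (prod_id INTEGER, brand TEXT, name TEXT)")

# Read tab-separated fields, one record per line, and insert them in column order.
with open(path, newline="") as f:
    rows = list(csv.reader(f, delimiter="\t"))
conn.executemany("INSERT INTO demo_products VALUES (?, ?, ?)", rows)
conn.commit()
os.remove(path)

loaded = conn.execute("SELECT COUNT(*) FROM demo_products").fetchone()[0]
print(loaded)
```

The file's field order must match the table's column order, which is the same requirement `LOAD DATA` places on the lab's text files.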

# The CREATE TABLE Statement

The `CREATE TABLE` syntax is used to create a new table in a database.  

There are three forms of the statement, each allowing a SQL developer to create tables in a different way.


1. **The Basic `CREATE TABLE` Statement**: This is the standard SQL method for creating a table. It allows you to define the table's structure, including column names, data types, and any constraints. Its syntax is covered in detail later in this lab.

2. **Using `CREATE TABLE LIKE` Statement** (Table Creation Based on Existing Table): This method creates a new table with the same structure as an existing table, including column names, data types, column attributes, and indexes. It does not copy any data from the existing table, and in MySQL it does not copy foreign key definitions either.

3. **The CTAS or `CREATE TABLE AS SELECT` Statement** (Table Creation with Data): This method creates a new table and populates it with data from an existing table or the result of a query. It is useful when you want to clone an existing table or create a new table based on specific data from an existing table.

Let's look at them one by one.

## **Connecting to and Showing the Database**

The following command loads the SQL extension for Jupyter, which provides the `%sql` and `%%sql` magics used to send statements to the MySQL database server running on your machine.

In [None]:
%load_ext sql

The following commands will help you connect to the specific database schema within your MySQL instance. In this case the name of the schema is **dualcore**.

Connect to the database using the following command. The general syntax is:
```
%sql mysql://user:password@server[:port]/dbname
```

In [None]:
%sql mysql://root:root@localhost:3306/dualcore



## The basic `CREATE TABLE` Statement

This is a simplified syntax of the `CREATE TABLE` statement, the form data analysts use most often when working with database tables.
```
CREATE [ TEMPORARY | TEMP ] TABLE [IF NOT EXISTS] [<database_name>.]<table_name>
(
    <col1> <type> [{NOT NULL | NULL | UNIQUE | PRIMARY KEY | DEFAULT <value>}]
    [,<col2> <type> [<col_constraint>]…]
    [,<table_constraint>
    [,<table_constraint>… ]]
);
```

Where `<table_constraint>` can be:
```
[ CONSTRAINT <constraint_name> ]
    {
    UNIQUE (<col>[,<col>…] ) |
    PRIMARY KEY (<pkcol_name>[,<pkcol_name>…] ) |
    FOREIGN KEY (<fkcol_name>[,<fkcol_name>…] )
        REFERENCES <parent_table> (<col>[,<col>…] )
    }
```

Let's break down the various components of the syntax:

1. `[ TEMPORARY | TEMP ]`: This optional clause creates a temporary table that is visible only to the current session and is dropped automatically at the end of the session or when the connection is closed. MySQL, which this lab uses, accepts only the `TEMPORARY` keyword; `TEMP` is a shorthand available in some other systems, such as PostgreSQL.

2. `TABLE`: This keyword indicates that you are creating a new table.

3. `[IF NOT EXISTS]`: This is an optional clause that ensures the table is created only if it does not already exist. If the table already exists and this clause is specified, the `CREATE TABLE` command will not produce an error.

4. `[<database_name>.]`: This optional prefix names the database in which the table will be created (in MySQL, "schema" is a synonym for "database"). If it is omitted, the table is created in the current database.

5. `<table_name>`: This is the name of the table you want to create.

6. `<col1> <type> [{NOT NULL | NULL | UNIQUE | PRIMARY KEY | DEFAULT <value>}]`: These are the column definitions for the table. Each column is defined by its name (`<col1>`) and data type (`<type>`). You can also specify optional constraints for each column like `NOT NULL` (meaning the column cannot have a NULL value), `UNIQUE` (ensuring that the values in the column are unique), `PRIMARY KEY` (defining the column as the primary key for the table), or `DEFAULT <value>` (providing a default value for the column if no value is specified during an insert).

7. `[,<col2> <type> [<col_constraint>]…]`: Additional columns and their constraints can be added to the table definition separated by commas.

8. `[,<table_constraint> [,<table_constraint>… ]]`: These are optional table-level constraints that can be applied to the table. They can include `UNIQUE`, `PRIMARY KEY`, or `FOREIGN KEY` constraints.

9. `[ CONSTRAINT <constraint_name> ]`: This optional clause allows you to assign a name to the constraint for better manageability.

10. `{ UNIQUE (<col>[,<col>…] ) | PRIMARY KEY (<pkcol_name>[,<pkcol_name>…] ) | FOREIGN KEY (<fkcol_name>[,<fkcol_name>…] ) REFERENCES <parent_table> (<col>[,<col>…] ) }`: These are the definitions for the table-level constraints that can be applied to the table.

    - `UNIQUE`: This constraint ensures that the combination of values in the specified column(s) is unique for each row in the table.
    - `PRIMARY KEY`: This constraint designates the specified column(s) as the primary key for the table, which means it will uniquely identify each row in the table.
    - `FOREIGN KEY`: This constraint establishes a relationship between the column(s) in the current table and the column(s) in another table, enforcing referential integrity.
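
The three table-level constraints described above can be tried out in a small sandbox. The sketch below is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory SQLite database rather than the lab's MySQL server, and the tables `demo_orders` and `demo_order_details` and all constraint names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE demo_orders (
        order_id  INTEGER PRIMARY KEY,
        order_ref TEXT NOT NULL,
        CONSTRAINT uq_order_ref UNIQUE (order_ref)
    )
""")
conn.execute("""
    CREATE TABLE demo_order_details (
        order_id INTEGER NOT NULL,
        prod_id  INTEGER NOT NULL,
        CONSTRAINT pk_details PRIMARY KEY (order_id, prod_id),
        CONSTRAINT fk_details_order FOREIGN KEY (order_id)
            REFERENCES demo_orders (order_id)
    )
""")

conn.execute("INSERT INTO demo_orders VALUES (1, 'A-001')")
conn.execute("INSERT INTO demo_order_details VALUES (1, 10)")


def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same order_ref twice breaks the UNIQUE constraint.
duplicate_ref = violates("INSERT INTO demo_orders VALUES (2, 'A-001')")
# Same (order_id, prod_id) pair twice breaks the composite PRIMARY KEY.
duplicate_pk = violates("INSERT INTO demo_order_details VALUES (1, 10)")
# order_id 99 has no parent row, which breaks the FOREIGN KEY.
missing_parent = violates("INSERT INTO demo_order_details VALUES (99, 10)")
print(duplicate_ref, duplicate_pk, missing_parent)
```

Naming each constraint with `CONSTRAINT <constraint_name>` is optional, but it makes a constraint easier to identify if it needs to be changed or dropped later.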

### Example 1: Creating a Table

Let's create an example table called `payment_transactions` to record payment transactions. We'll include columns to store transaction details such as transaction ID, customer ID, payment amount, payment date/time, and payment method.

In [None]:
%%sql

CREATE TABLE payment_transactions (
    txn_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    txn_amount DECIMAL(10, 2) NOT NULL,
    txn_method VARCHAR(50) NOT NULL DEFAULT 'Credit Card',
    txn_date DATE NOT NULL,
    txn_time TIME NOT NULL,
    txn_datetime TIMESTAMP NOT NULL
--  , FOREIGN KEY (customer_id) REFERENCES t_customers (customer_id)
--  FOREIGN KEY is commented as there is no t_customers table as of now
);

-- We are using the name t_customers for the customers table referenced here
-- so as not to confuse it with the customers table in the dualcore database.
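
To see what the `DEFAULT` and `NOT NULL` column constraints in a table like `payment_transactions` do at insert time, here is a small sketch. It rests on assumptions: it runs against an in-memory SQLite database through Python's `sqlite3` module instead of the lab's MySQL server, and `demo_payments` is a hypothetical, cut-down version of the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE demo_payments (
        txn_id     INTEGER PRIMARY KEY,
        txn_amount DECIMAL(10, 2) NOT NULL,
        txn_method VARCHAR(50) NOT NULL DEFAULT 'Credit Card'
    )
""")

# txn_method is omitted, so the DEFAULT value is stored.
conn.execute("INSERT INTO demo_payments (txn_id, txn_amount) VALUES (1, 19.99)")
method = conn.execute(
    "SELECT txn_method FROM demo_payments WHERE txn_id = 1"
).fetchone()[0]

# txn_amount is omitted and has no default, so NOT NULL rejects the row.
try:
    conn.execute("INSERT INTO demo_payments (txn_id) VALUES (2)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(method, rejected)
```

A default only fills in a column that the `INSERT` leaves out. Inserting an explicit `NULL` into `txn_method` would still be rejected by `NOT NULL`.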

Now let's create the t_customers table to store the data for customers in the sample database.

In [None]:
%%sql

CREATE TABLE t_customers (
    customer_id INT PRIMARY KEY,
    fname VARCHAR(50) NOT NULL,
    lname VARCHAR(50) NOT NULL,
    address VARCHAR(100) NOT NULL,
    city VARCHAR(50) NOT NULL,
    state VARCHAR(50) NOT NULL,
    zipcode VARCHAR(10) NOT NULL,
    email VARCHAR(100) NOT NULL
);

### Checking if tables have been created

Using the `SHOW TABLES` statement we can check all the tables that have been created in the database.

In [None]:
%%sql

SHOW TABLES;

Here you can see that the `payment_transactions` and `t_customers` tables have been created.

Now, to see the structure of the tables use the `DESCRIBE table_name` statement as follows:

In [None]:
%%sql

DESCRIBE payment_transactions;

In [None]:
%%sql

DESCRIBE t_customers;

As the two tables above are linked to each other, we will later see how to create a `FOREIGN KEY` reference between them using the `ALTER TABLE` statement.

## The `CREATE TABLE LIKE` Statement

The `CREATE TABLE LIKE` statement creates a new table (here called `new_table`) with the same structure as an existing table (here called `existing_table`). It does not copy any data from `existing_table`; it only replicates the structure, including column names, data types, column attributes, and indexes. In MySQL, foreign key definitions are not carried over to the new table.

The following is the syntax of the statement:
```
CREATE TABLE new_table LIKE existing_table;
```

Here's a breakdown of the statement:

- `CREATE TABLE new_table`: This part of the statement indicates that a new table named new_table will be created.
- `LIKE existing_table`: This is the crucial part of the statement. The LIKE keyword is used to specify that the structure of the new table should be based on an existing table named existing_table. Instead of explicitly defining the columns, data types, and constraints for the new table, the LIKE keyword tells SQL to copy these details from the specified existing_table.
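
The "structure only, no data" behaviour can be checked in a sandbox. SQLite has no `CREATE TABLE ... LIKE`, so the sketch below only imitates its effect by replaying the stored `CREATE TABLE` text under a new name; this is a swapped-in technique, not how MySQL implements the statement. The tables `demo_employees` and `t_demo_employees` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE demo_employees (
        emp_id INTEGER PRIMARY KEY,
        fname  TEXT NOT NULL,
        salary INTEGER
    )
""")
conn.execute("INSERT INTO demo_employees VALUES (1, 'Ada', 50000)")

# Fetch the stored CREATE statement and replay it under a new table name.
ddl = conn.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'demo_employees'"
).fetchone()[0]
conn.execute(ddl.replace("demo_employees", "t_demo_employees", 1))


def column_info(table):
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return conn.execute(f"PRAGMA table_info({table})").fetchall()


same_structure = column_info("demo_employees") == column_info("t_demo_employees")
copied_rows = conn.execute("SELECT COUNT(*) FROM t_demo_employees").fetchone()[0]
print(same_structure, copied_rows)
```

The new table has the same columns and column constraints as the original but starts empty, which is the outcome `CREATE TABLE new_table LIKE existing_table` gives in MySQL.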

### Example 2: Creating table using `CREATE TABLE LIKE`

Now, let's create a copy of the existing `employees` table to see how the `CREATE TABLE LIKE` statement works. We will name our new table `t_employees`.

In [None]:
%%sql

CREATE TABLE t_employees LIKE employees;

Describe the new table `t_employees`.

In [None]:
%%sql

DESCRIBE t_employees;

Check if there are any records in the new table by using a simple `SELECT` query.

In [None]:
%%sql

SELECT *
FROM t_employees
LIMIT 10;

To insert data into the `t_employees` table from the existing `employees` table, we will use the `INSERT INTO` statement later.

Let's now focus on the CTAS statement.

## The CTAS or `CREATE TABLE AS SELECT` Statement

The `CREATE TABLE AS SELECT` statement in SQL allows you to create a new table and populate it with data from an existing table or the result of a query. This is a convenient way to create a new table based on specific data from an existing table or view. The name of this command is sometimes abbreviated to CTAS.

```
CREATE [ TEMPORARY | TEMP ] TABLE <table> [ (<col>[,<col>…] ) ] AS
    SELECT {* | <col_list>}
    FROM {table_name | view_name}
    [WHERE condition]
    [GROUP BY <col_name>]
    [HAVING condition]
    [ORDER BY <col_name>];
```

- `CREATE [ TEMPORARY | TEMP ] TABLE <table>`: This part of the statement indicates that a new table with the name `<table>` will be created. The optional `TEMPORARY` or `TEMP` keyword signifies that the table is temporary, and it will be automatically dropped at the end of the session or when the connection is closed. MySQL accepts only `TEMPORARY`; `TEMP` is a shorthand in some other systems, such as PostgreSQL.
- `[ (<col>[,<col>…] ) ]`: This optional part allows you to specify the columns for the new table. If not provided, the columns will be automatically determined based on the columns selected in the SELECT statement.
- `AS SELECT {* | <col_list>}`: The `AS SELECT` clause tells SQL that the data for the new table will be derived from the SELECT statement that follows. You can use `*` to select all columns from the source table or specify a `<col_list>` to select specific columns.
- `FROM {table_name | view_name}`: This part specifies the source table or view from which data will be selected to populate the new table.
- `[WHERE condition]`: An optional `WHERE` clause allows you to filter the rows from the source table or view before inserting them into the new table.
- `[GROUP BY <col_name>]`: If you want to perform grouping, you can use the `GROUP BY` clause followed by the column name (<col_name>) on which you want to group the data.
- `[HAVING condition]`: If you use the `GROUP BY` clause, an optional `HAVING` clause allows you to further filter the grouped data based on aggregate conditions.
- `[ORDER BY <col_name>]`: An optional `ORDER BY` clause allows you to sort the result set before inserting it into the new table.
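
The clauses listed above can be combined in one small CTAS statement. The following is a sketch under stated assumptions: it runs on an in-memory SQLite database through Python's `sqlite3` module rather than on the lab's MySQL server, and the tables `demo_sales` and `brand_profit` and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_sales (brand TEXT, price REAL, cost REAL)")
conn.executemany(
    "INSERT INTO demo_sales VALUES (?, ?, ?)",
    [
        ("Dualcore", 100.0, 60.0),
        ("Dualcore", 50.0, 30.0),
        ("Acme", 20.0, 15.0),
        ("Acme", 10.0, 9.0),
    ],
)

# Create and populate a temporary summary table in a single statement.
conn.execute("""
    CREATE TEMPORARY TABLE brand_profit AS
        SELECT brand, SUM(price - cost) AS total_profit
        FROM demo_sales
        WHERE price > 0
        GROUP BY brand
        HAVING SUM(price - cost) > 10
        ORDER BY brand
""")

rows = conn.execute(
    "SELECT brand, total_profit FROM brand_profit ORDER BY brand"
).fetchall()
print(rows)
```

The column names and types of `brand_profit` come from the `SELECT` list, which is why the aggregate is given an alias. Only the brand whose total profit passes the `HAVING` filter ends up in the new table.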


### Example 3: using the CTAS statement to create an analytical table

As an example, we will write a CTAS query involving the customers, orders, order_details, and products tables to store
100 purchase records for customers who have bought any product of the brand "Dualcore".  
The table should include the customer id, fname, lname, city, and zipcode from the customers table, the order id,
the product id, the product name, brand, product price, product cost, and profit margin on each product sold.

In [None]:
%%sql

CREATE TABLE first_100_customers_with_dualcore AS
    SELECT
        c.cust_id, c.fname, c.lname, c.city, c.zipcode,
        o.order_id, p.prod_id, p.name AS product_name,
        p.brand, p.price AS product_price,
        p.cost AS product_cost, (p.price - p.cost) AS profit_margin
    FROM customers c
        JOIN orders o ON c.cust_id = o.cust_id
        JOIN order_details od ON o.order_id = od.order_id
        JOIN products p ON od.prod_id = p.prod_id
    WHERE p.brand = 'Dualcore'
LIMIT 100;

In this query, the `CREATE TABLE ... AS` statement creates a new table `first_100_customers_with_dualcore` whose structure matches the result of the `SELECT` query. The `SELECT` statement retrieves the necessary information from the relevant tables (customers, orders, order_details, and products) and calculates the profit margin for each product sold using the expression `(p.price - p.cost) AS profit_margin`. Note that the `LIMIT 100` clause caps the result at 100 rows, not at 100 distinct customers: a customer who bought several "Dualcore" products appears in several rows. Because the query has no `ORDER BY` clause, which 100 rows are kept is up to the database server.

After running this query, the table `first_100_customers_with_dualcore` will contain up to 100 purchase records for the brand "Dualcore", each with the customer ID, first name, last name, city, zipcode, order ID, product ID, product name, brand, product price, product cost, and profit margin for the product sold. The table is created and populated in a single step.
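
One detail worth checking when combining CTAS with `LIMIT` is that `LIMIT` counts rows, not customers. The sketch below illustrates this on a tiny example. It is an assumption-laden stand-in: it uses an in-memory SQLite database through Python's `sqlite3` module, and the tables `demo_purchases` and `first_3_rows` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo_purchases (cust_id INTEGER, order_id INTEGER)")
conn.executemany(
    "INSERT INTO demo_purchases VALUES (?, ?)",
    [(1, 101), (1, 102), (1, 103), (2, 201), (3, 301)],
)

# ORDER BY makes the choice of rows deterministic; LIMIT then keeps three rows.
conn.execute("""
    CREATE TABLE first_3_rows AS
        SELECT cust_id, order_id
        FROM demo_purchases
        ORDER BY cust_id, order_id
        LIMIT 3
""")

row_count = conn.execute("SELECT COUNT(*) FROM first_3_rows").fetchone()[0]
customer_count = conn.execute(
    "SELECT COUNT(DISTINCT cust_id) FROM first_3_rows"
).fetchone()[0]
print(row_count, customer_count)
```

Here all three kept rows belong to the same customer. If a fixed number of distinct customers is what you need, one option is to select those customer IDs in a subquery first and then join the remaining tables to it.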

Let's describe the table.

In [None]:
%%sql

DESCRIBE first_100_customers_with_dualcore;

Now let's look at the data as well.

In [None]:
%%sql

SELECT *
FROM first_100_customers_with_dualcore
ORDER BY cust_id, order_id
LIMIT 10;

Voila! We have successfully created tables using the different forms of the `CREATE TABLE` statement.