<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_05_Design.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Database Design
### Database and SQL Through Pop Culture | Brendan Shea, PhD

Welcome to the exciting world of database design! In this chapter, we'll embark on a journey through the process of creating a database from scratch. We'll begin with the conceptual modeling phase, where we'll identify the key entities, attributes, and relationships in our spy academy database. From there, we'll move on to logical modeling, where we'll learn about the importance of keys, normalization, and join tables in ensuring data integrity and avoiding redundancy. Finally, we'll roll up our sleeves and dive into physical modeling, where we'll use SQL statements to actually create, modify, and delete tables in our database. Throughout this chapter, we'll use the spy academy scenario to provide concrete examples and bring the concepts to life. By the end of this chapter, you'll have a solid understanding of how to design and implement a database that can handle even the most covert operations. So, let's put on our secret agent hats and get started on this thrilling database design mission!

Learning Outcomes:

1.  Understand the three main phases of database design: conceptual, logical, and physical modeling
2.  Learn how to identify entities, attributes, and relationships in the conceptual modeling phase
3.  Discover the importance of keys in ensuring uniqueness and establishing relationships between tables
4.  Understand the role of normalization in reducing data redundancy and improving data integrity
5.  Learn how to resolve many-to-many relationships using join tables in the logical modeling phase
6.  Gain hands-on experience with SQL statements like CREATE TABLE, DROP TABLE, and ALTER TABLE in the physical modeling phase
7.  Understand the importance of constraints in enforcing data integrity and consistency
8.  Learn how to use SQL data types to define the structure of columns in a table
9. Understand the role of supertypes and subtypes in modeling inheritance relationships between entities

Keywords: database design, conceptual modeling, logical modeling, physical modeling, entities, attributes, relationships, keys, primary key, foreign key, normalization, normal forms, join tables, many-to-many relationships, SQL, CREATE TABLE, DROP TABLE, ALTER TABLE, constraints, data types, CASCADE, supertypes, subtypes, inheritance

## Conceptual Design at the Covert Academy

Welcome to the Covert Academy, the world's premier educational institution for aspiring secret agents! Our state-of-the-art facility is hidden beneath the streets of London and equipped with the latest in espionage technology. We offer a wide range of classes to train our students in the arts of surveillance, infiltration, disguise, and more.

To keep track of our students and classes, we need a well-designed database. Let's walk through the process of conceptual design for this database.

### What is Conceptual Design?

Conceptual design is the first step in creating a database. It involves **formulating business rules**, which define how the database should work and what constraints it should have. It also involves creating a **preliminary list of entities and attributes**, which represent the key concepts and data elements in the system. Finally, it involves creating a **preliminary Entity-Relationship Diagram (ERD)** to visually represent the entities and their relationships.

### Formulating Business Rules

**Business rules** are concise, unambiguous statements that define or constrain some aspect of the database. They are derived through a combination of **descriptive processes**, such as interviewing stakeholders to understand their needs and requirements, and **normative processes**, which involve making decisions about how the system should work.

For the Covert Academy, we have the following business rules:

1.  Each **student** must be enrolled in at least one **class**, but may be enrolled in multiple classes.
2.  Each **class** must have at least one **student** enrolled, but may have many students.

To write effective business rules:

-   Keep them concise and unambiguous
-   Ensure they are testable (you can determine if the system is complying with the rule)
-   Involve all relevant stakeholders in formulating and reviewing them
-   Prioritize them (some may be must-haves, others may be nice-to-haves)
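
To make "testable" concrete, here is a minimal sketch that checks rule 1 against a tiny in-memory SQLite database using Python's built-in `sqlite3` module. The table and column names (`student`, `enrollment`, `student_id`, `class_name`) and the sample rows are assumptions made purely for illustration; the real schema is developed later in the chapter.

```python
import sqlite3

# In-memory database with hypothetical tables, used only for this illustration
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE enrollment (student_id INTEGER, class_name TEXT);
    INSERT INTO student (id, name) VALUES (1, 'James'), (2, 'Natasha');
    INSERT INTO enrollment (student_id, class_name) VALUES (1, 'Spy Gadgets');
""")

# Rule 1: every student must be enrolled in at least one class.
# Any row returned here is a student who violates the rule.
violations = conn.execute("""
    SELECT s.name
    FROM student AS s
    LEFT JOIN enrollment AS e ON e.student_id = s.id
    WHERE e.student_id IS NULL
""").fetchall()

print(violations)
```

If the query returns no rows, the data complies with the rule; here it reports the one student who has no enrollment.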

### Preliminary List of Entities and Attributes

Based on our business rules, we can identify our preliminary **entities**, which are the key concepts or objects in our system, and their **attributes**, which are the data elements that describe each entity.

For the Covert Academy, our entities and attributes are:

**Student**

-   Name
-   Codename
-   Nationality
-   Specialization

**Class**

-   Name
-   Description
-   Instructor
-   Location

Note that at this stage, we don't include data types, identifier columns, or join tables. We're just focusing on the core entities and their key attributes.

### Preliminary ERD

Finally, we can create a preliminary ERD to visualize our entities and their relationship. Here's what it looks like for the Covert Academy:

In [None]:
import base64
from IPython.display import Image, display, HTML

def mm(graph):
    graphbytes = graph.encode("utf8")
    base64_bytes = base64.b64encode(graphbytes)
    base64_string = base64_bytes.decode("ascii")
    display(Image(url="https://mermaid.ink/img/" + base64_string))

mm("""
    erDiagram
    STUDENT }|--|{ CLASS : enrolls
    """)

This diagram shows that there is a many-to-many relationship between **STUDENT** and **CLASS** (a student can enroll in many classes, a class can have many students).

We'll refine this ERD in the next step, logical modeling, where we'll resolve the many-to-many relationship with a join table. But this preliminary ERD gives us a good starting point to visualize our system.


## What is Logical Modeling?
In the previous section, we laid the conceptual foundation for the Covert Academy's database. Now it's time to take that conceptual model and transform it into a **logical model** using the **relational model**.

Logical modeling is the process of taking a conceptual model and adapting it to fit a specific logical data model. In our case, we'll be using the **relational model**, which organizes data into **tables** (also known as relations) with **rows** (also known as tuples) and **columns** (also known as attributes).

In the relational model:

-   Each table should represent a single entity or concept
-   Each row in a table represents a specific instance of that entity
-   Each column in a table represents an attribute of that entity
-   Each table should have a **primary key**, a unique identifier for each row

### Resolving Many-to-Many Relationships

In our conceptual model, we had a many-to-many relationship between **STUDENT** and **CLASS**. However, in the relational model, we can't directly represent many-to-many relationships. Instead, we need to introduce a **join table**.

A join table is a table that breaks down a many-to-many relationship into two one-to-many relationships. It does this by having foreign keys to both of the original tables.

For the Covert Academy, our join table will be called **ENROLLMENT**. It will have the following structure:

**ENROLLMENT**

-   StudentID (Foreign Key to STUDENT)
-   ClassID (Foreign Key to CLASS)
-   EnrollmentDate

Now, instead of a direct many-to-many relationship, we have:

-   A one-to-many relationship between **STUDENT** and **ENROLLMENT**
-   A one-to-many relationship between **CLASS** and **ENROLLMENT**
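
The two one-to-many relationships can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the lowercase table names, the trimmed-down columns, and the in-memory database are choices made for this sketch, and the sample rows mirror the example data used later in the normalization section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (id INTEGER PRIMARY KEY, name TEXT);
    -- The join table: one row per (student, class) pairing
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student(id),
        class_id INTEGER REFERENCES class(id),
        enrollment_date TEXT,
        PRIMARY KEY (student_id, class_id)
    );
    INSERT INTO student VALUES (1, 'James'), (2, 'Natasha');
    INSERT INTO class VALUES (1, 'Spy Gadgets'), (2, 'Disguise 101');
    INSERT INTO enrollment VALUES
        (1, 1, '2023-01-01'), (1, 2, '2023-01-15'), (2, 1, '2023-01-01');
""")

# Walk from the join table out to both sides to list who takes what
pairs = conn.execute("""
    SELECT s.name, c.name
    FROM enrollment AS e
    JOIN student AS s ON s.id = e.student_id
    JOIN class AS c ON c.id = e.class_id
    ORDER BY s.name, c.name
""").fetchall()

print(pairs)
```

James appears in two enrollment rows and Spy Gadgets appears in two enrollment rows, which is exactly the many-to-many relationship expressed through the join table.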

### Choosing Primary Keys

Each table in our database needs a primary key. A primary key is a unique identifier for each row in a table. There are two main options for primary keys:

1.  **Natural Key**: A natural key is a key that uses one of the existing attributes of the entity. For example, we could use a student's email as the primary key for the STUDENT table. Natural keys can be convenient because they don't require an additional column. However, they can also be problematic if the natural key ever needs to change.
2.  **Surrogate Key**: A surrogate key is an artificial key that is created specifically to be the primary key. It's usually a simple integer or a universally unique identifier (UUID). Surrogate keys are often preferable because they are guaranteed to be unique and they never need to change.

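For the Covert Academy, we'll use surrogate keys for the STUDENT and CLASS tables. We'll call these `ID` columns. The ENROLLMENT join table doesn't need one: the combination of its two foreign keys (StudentID, ClassID) already identifies each row, so together they can serve as a composite primary key.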
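
Here is a small sketch of why a surrogate key is convenient, using Python's built-in `sqlite3` module. The `email` column, the example addresses, and the simplified tables are assumptions for illustration only; they are not part of the Covert Academy schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Surrogate key: id carries no meaning, so it never has to change
    CREATE TABLE student (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        email TEXT UNIQUE      -- hypothetical natural-key candidate
    );
    CREATE TABLE enrollment (student_id INTEGER, class_name TEXT);
""")

cur = conn.execute(
    "INSERT INTO student (name, email) VALUES (?, ?)",
    ("James", "james@example.org"),
)
student_id = cur.lastrowid  # SQLite assigned the surrogate key for us
conn.execute("INSERT INTO enrollment VALUES (?, ?)", (student_id, "Spy Gadgets"))

# The email changes, but rows that reference the student by id are unaffected
conn.execute(
    "UPDATE student SET email = ? WHERE id = ?",
    ("jb@example.org", student_id),
)

row = conn.execute("""
    SELECT s.email, e.class_name
    FROM enrollment AS e
    JOIN student AS s ON s.id = e.student_id
""").fetchone()

print(student_id, row)
```

Had `email` been the primary key, every row referencing the old address would have needed updating as well.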

### Updated ERD

With these changes in mind, here's our updated ERD:

In [None]:
mm("""
erDiagram
    STUDENT ||--o{ ENROLLMENT : has
    ENROLLMENT }o--|| CLASS : is_for
""")

The tables are as follows:

**STUDENT** (ID (Primary Key), Name, Codename, Nationality, Specialization)

**CLASS** (ID (Primary Key), Name, Description, Instructor, Location)

**ENROLLMENT** (StudentID (Foreign Key to STUDENT), ClassID (Foreign Key to CLASS), EnrollmentDate)

## Normalization: Keeping Your Data in Line

Before we move on to physical modeling, let's take a moment to discuss a crucial concept in database design: **normalization**. (As it turns out, our tables are already in 3NF. However, this won't always be the case!)

Normalization is the process of organizing data in a database to avoid data redundancy and improve data integrity. It involves dividing large tables into smaller tables and defining relationships between them based on the rules of **normal forms**.

Normalization is important because it:

1.  Minimizes data redundancy
2.  Avoids data anomalies (such as update and deletion anomalies)
3.  Simplifies data management
4.  Reduces data inconsistencies

There are several normal forms, each with its own set of rules. The most common are 1NF, 2NF, and 3NF. Let's dive into each one!

### First Normal Form (1NF)

A database is in 1NF if:

1.  Each column contains atomic values (indivisible values)
2.  There are no repeating groups of columns

For example, consider this non-normalized STUDENT table:

| ID | Name | Codename | Nationality | Specialization |
| --- | --- | --- | --- | --- |
| 1 | James | 007 | British | Espionage, Combat |
| 2 | Natasha | Black Widow | Russian | Combat, Infiltration |

This table is not in 1NF because the Specialization column contains multiple values. To bring it to 1NF, we would create a separate SPECIALIZATION table and establish a one-to-many relationship:

**STUDENT**

| ID | Name | Codename | Nationality |
| --- | --- | --- | --- |
| 1 | James | 007 | British |
| 2 | Natasha | Black Widow | Russian |

**SPECIALIZATION**

| ID | StudentID | Specialization |
| --- | --- | --- |
| 1 | 1 | Espionage |
| 2 | 1 | Combat |
| 3 | 2 | Combat |
| 4 | 2 | Infiltration |
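
The split into atomic values can be sketched in a few lines of plain Python. The tuple layout below is just an assumption for illustration; the data is copied from the non-normalized table above.

```python
# Non-1NF rows as (id, name, specializations), copied from the table above
students = [
    (1, "James", "Espionage, Combat"),
    (2, "Natasha", "Combat, Infiltration"),
]

# One row per (student, specialization) pair, so every value is atomic
specialization_rows = [
    (student_id, spec.strip())
    for student_id, _name, specs in students
    for spec in specs.split(",")
]

print(specialization_rows)
```

Each resulting pair corresponds to one row of the SPECIALIZATION table (minus its own ID column).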

### Second Normal Form (2NF)

A database is in 2NF if:

1.  It is in 1NF
2.  All non-key columns are fully dependent on the primary key

Consider this 1NF ENROLLMENT table:

| StudentID | ClassID | EnrollmentDate | ClassName |
| --- | --- | --- | --- |
| 1 | 1 | 2023-01-01 | Spy Gadgets |
| 1 | 2 | 2023-01-15 | Disguise 101 |
| 2 | 1 | 2023-01-01 | Spy Gadgets |

This table is not in 2NF because ClassName is dependent on ClassID, which is only part of the primary key (StudentID, ClassID). To bring it to 2NF, we move ClassName to the CLASS table:

**ENROLLMENT**

| StudentID | ClassID | EnrollmentDate |
| --- | --- | --- |
| 1 | 1 | 2023-01-01 |
| 1 | 2 | 2023-01-15 |
| 2 | 1 | 2023-01-01 |

**CLASS**

| ID | ClassName |
| --- | --- |
| 1 | Spy Gadgets |
| 2 | Disguise 101 |

### Third Normal Form (3NF)

A database is in 3NF if:

1.  It is in 2NF
2.  There are no transitive dependencies

A transitive dependency is when a non-key column depends on another non-key column.

Consider this 2NF CLASS table:

| ID | ClassName | InstructorID | InstructorName |
| --- | --- | --- | --- |
| 1 | Spy Gadgets | 1 | Q |
| 2 | Disguise 101 | 2 | M |

This table is not in 3NF because InstructorName depends on InstructorID (a non-key column), and so depends on the primary key only transitively. To bring it to 3NF, we move InstructorName to a new INSTRUCTOR table:

**CLASS**

| ID | ClassName | InstructorID |
| --- | --- | --- |
| 1 | Spy Gadgets | 1 |
| 2 | Disguise 101 | 2 |

**INSTRUCTOR**

| ID | Name |
| --- | --- |
| 1 | Q |
| 2 | M |

And there you have it! By applying these normal forms, we ensure our database is well-structured, minimizes redundancy, and avoids anomalies.
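
To see one of these anomalies being avoided, here is a minimal sketch using Python's built-in `sqlite3` module and the 3NF tables above. The lowercase names, the in-memory database, the third class row, and the new instructor name are all assumptions added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE class (
        id INTEGER PRIMARY KEY,
        class_name TEXT NOT NULL,
        instructor_id INTEGER REFERENCES instructor(id)
    );
    INSERT INTO instructor VALUES (1, 'Q'), (2, 'M');
    -- The third class is a hypothetical extra row so that Q teaches two classes
    INSERT INTO class VALUES
        (1, 'Spy Gadgets', 1), (2, 'Disguise 101', 2), (3, 'Advanced Gadgets', 1);
""")

# The name is stored once, so a single-row update is enough
updated = conn.execute(
    "UPDATE instructor SET name = 'Quartermaster' WHERE id = 1"
).rowcount

classes = conn.execute("""
    SELECT c.class_name, i.name
    FROM class AS c
    JOIN instructor AS i ON i.id = c.instructor_id
    ORDER BY c.id
""").fetchall()

print(updated, classes)
```

In the 2NF version of CLASS, the same rename would have required changing every class row taught by that instructor, with the risk of missing one.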

## Physical Modeling: Bringing the Database to Life

Welcome back, future data masters! We've conceptually designed our database and transformed it into a logical model using the relational model and normalization techniques. Now it's time for the exciting part - actually creating our database!

### What is Physical Modeling?

Physical modeling is the process of taking the logical model and implementing it in a specific database management system (DBMS). This involves defining the actual tables, columns, data types, and constraints in the database using the Data Definition Language (DDL) of the chosen DBMS.

For the Covert Academy's database, we'll be using **SQLite**.

### What is SQLite?

SQLite is a lightweight, file-based DBMS. It's a popular choice for many applications because:

1.  It's serverless (the database is just a file)
2.  It's self-contained (no external dependencies)
3.  It's cross-platform (works on all major operating systems)
4.  It's open-source and free to use

SQLite, like most modern DBMSs, uses **SQL (Structured Query Language)** for defining and manipulating databases.

### What is ANSI Standard SQL?
**ANSI (American National Standards Institute)** is an organization that defines standards for various industries, including database systems. ANSI SQL is a standard that defines the SQL language.  Most modern DBMSs (Database Management Systems), including SQLite, follow the ANSI SQL standard to a large extent. This means that the core SQL syntax is the same across different DBMSs. However, each DBMS also has its own extensions and quirks.

While ANSI initiated the SQL standardization process, the **International Organization for Standardization (ISO)** has also played a crucial role, and typically adopts a standard very close to that of ANSI.

### ANSI Standard SQL Data Types

ANSI SQL defines a set of standard data types, and most DBMSs add a few of their own. Some of the most common types you'll encounter are:

1.  **INTEGER**: A whole number.
2.  **REAL**: A floating-point number.
3.  **VARCHAR(n)**: A string of characters with a maximum length of n.
4.  **CHAR(n)**: A string of characters with a fixed length of n.
5.  **BLOB**: Binary Large Object, used for storing large amounts of binary data.
6.  **DATE**: A date value (YYYY-MM-DD).
7.  **TIME**: A time value (HH:MM:SS).
8.  **TEXT**: A long section of string data. This is a widely supported vendor extension rather than part of the standard.
9.  **JSONB**: A data type that holds JSON data in a binary format (to speed up queries). This one is specific to certain DBMSs, such as PostgreSQL.

SQLite supports most of these data types, although it has some of its own quirks:

-   SQLite has a small number of underlying "storage classes" (NULL, INTEGER, REAL, TEXT, and BLOB) that it uses to store all values, regardless of the declared SQL data type.
-   SQLite is "dynamically typed": it determines the storage class of each value as it is inserted and, unlike most RDBMSs, doesn't strictly enforce column data types by default.
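
A quick way to see dynamic typing in action is to insert mismatched values into an `INTEGER` column and ask SQLite what it actually stored. This is a minimal sketch using Python's built-in `sqlite3` module; the `demo` table and its column `n` are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (n INTEGER)")

# A text value that looks like a number is converted; other text is stored as-is
conn.executemany(
    "INSERT INTO demo (n) VALUES (?)",
    [(42,), ("42",), ("forty-two",)],
)

# typeof() reports the storage class SQLite chose for each value
types = conn.execute("SELECT n, typeof(n) FROM demo ORDER BY rowid").fetchall()

print(types)
```

The last row shows that a plain string was accepted into a column declared as `INTEGER`, which most other RDBMSs would reject.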

## The CREATE TABLE Statement

Now that we know about data types, we can start creating our tables! In SQL, we use the `CREATE TABLE` statement for this.

The general syntax is:

```sql
CREATE TABLE table_name (
    column1 datatype constraint,
    column2 datatype constraint,
    ...
    PRIMARY KEY (one or more columns)
);
```

For example, let's create the STUDENT table:

In [None]:
# connect to sqlite using sql magic
%load_ext sql
%sql sqlite:///covert_academy.db

In [None]:
%%sql
CREATE TABLE STUDENT (
    ID INTEGER PRIMARY KEY,  -- The primary key, an auto-incrementing integer
    Name VARCHAR(100) NOT NULL,  -- The student's name, cannot be NULL
    Codename VARCHAR(50),  -- The student's codename, can be NULL
    Nationality VARCHAR(50),  -- The student's nationality
    Specialization VARCHAR(100)  -- The student's specialization
);

 * sqlite:///covert_academy.db
Done.


[]

This creates a table named STUDENT with five columns. Note the use of ANSI standard data types and the comments explaining each column.

We can create the CLASS and ENROLLMENT tables similarly:

In [None]:
%%sql
CREATE TABLE CLASS (
    ID INTEGER PRIMARY KEY,  -- The primary key
    Name VARCHAR(100) NOT NULL,  -- The class name, cannot be NULL
    Description VARCHAR(200),  -- The class description
    Instructor VARCHAR(100),  -- The class instructor
    Location VARCHAR(100)  -- The class location
);

CREATE TABLE ENROLLMENT (
    StudentID INTEGER,  -- Foreign key to STUDENT table
    ClassID INTEGER,  -- Foreign key to CLASS table
    EnrollmentDate DATE,  -- The date of enrollment
    PRIMARY KEY (StudentID, ClassID),  -- Composite primary key
    FOREIGN KEY (StudentID) REFERENCES STUDENT(ID),  -- Foreign key constraint
    FOREIGN KEY (ClassID) REFERENCES CLASS(ID)  -- Foreign key constraint
);

 * sqlite:///covert_academy.db
Done.
Done.


[]

Note how in the ENROLLMENT table, we define a composite primary key (StudentID, ClassID) and also specify the foreign key constraints. Let's now use PRAGMA to see how our tables appear:

In [None]:
%%sql
PRAGMA table_info(STUDENT);

 * sqlite:///covert_academy.db
Done.


cid,name,type,notnull,dflt_value,pk
0,ID,INTEGER,0,,1
1,Name,VARCHAR(100),1,,0
2,Codename,VARCHAR(50),0,,0
3,Nationality,VARCHAR(50),0,,0
4,Specialization,VARCHAR(100),0,,0


In [None]:
%%sql
PRAGMA table_info(CLASS);

 * sqlite:///covert_academy.db
Done.


cid,name,type,notnull,dflt_value,pk
0,ID,INTEGER,0,,1
1,Name,VARCHAR(100),1,,0
2,Description,VARCHAR(200),0,,0
3,Instructor,VARCHAR(100),0,,0
4,Location,VARCHAR(100),0,,0


In [None]:
%%sql
PRAGMA table_info(ENROLLMENT);

 * sqlite:///covert_academy.db
Done.


cid,name,type,notnull,dflt_value,pk
0,StudentID,INTEGER,0,,1
1,ClassID,INTEGER,0,,2
2,EnrollmentDate,DATE,0,,0


## Column Constraints

Before we start inserting data into our database, let's discuss some important concepts that help maintain the integrity and consistency of our data: **constraints**.

Column constraints are rules applied to individual columns in a table. They restrict the type of data that can be stored in a column. The most common constraints are:

1.  **CHECK**: Ensures that a column's value satisfies a boolean expression.
2.  **DEFAULT**: Specifies a default value for a column when no value is provided.
3.  **NOT NULL**: Ensures that a column cannot have a NULL value.
4.  **UNIQUE**: Ensures that each value in a column is unique across the whole table.

Let's see how we can apply these constraints to our tables.

### The DROP TABLE Statement

But first, let's discuss how to drop a table. The `DROP TABLE` statement is used to remove a table definition and all its data. The syntax is simple:

```sql
DROP TABLE table_name;
```

This statement is irreversible, so use it with caution!

Now, let's recreate our tables with some constraints.

### Recreating the STUDENT Table

In [None]:
%%sql
DROP TABLE IF EXISTS STUDENT;  -- Drop the table if it already exists

CREATE TABLE STUDENT (
    ID INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Codename VARCHAR(50) UNIQUE,  -- Codenames must be unique
    Nationality VARCHAR(50) DEFAULT 'Unknown',  -- Default nationality is 'Unknown'
    Specialization VARCHAR(100),
    Age INTEGER CHECK (Age >= 18)  -- Students must be at least 18 years old
);

 * sqlite:///covert_academy.db
Done.
Done.


[]

Here, we've added a UNIQUE constraint to the Codename column, a DEFAULT constraint to the Nationality column, and a CHECK constraint to ensure that all students are at least 18 years old.
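
To watch these constraints at work, here is a small sketch using Python's built-in `sqlite3` module. It recreates the same STUDENT definition in a separate in-memory database (so it doesn't touch `covert_academy.db`); the helper `try_insert` and the sample rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        ID INTEGER PRIMARY KEY,
        Name VARCHAR(100) NOT NULL,
        Codename VARCHAR(50) UNIQUE,
        Nationality VARCHAR(50) DEFAULT 'Unknown',
        Specialization VARCHAR(100),
        Age INTEGER CHECK (Age >= 18)
    )
""")

def try_insert(sql):
    """Run one INSERT; return True if accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    # Valid row
    try_insert("INSERT INTO STUDENT (Name, Codename, Age) VALUES ('James', '007', 35)"),
    # Duplicate codename violates UNIQUE
    try_insert("INSERT INTO STUDENT (Name, Codename, Age) VALUES ('Jim', '007', 40)"),
    # Age below 18 violates CHECK
    try_insert("INSERT INTO STUDENT (Name, Age) VALUES ('Kid', 12)"),
    # Missing name violates NOT NULL
    try_insert("INSERT INTO STUDENT (Codename, Age) VALUES ('Ghost', 30)"),
]

# No nationality was supplied for James, so the DEFAULT applies
nationality = conn.execute(
    "SELECT Nationality FROM STUDENT WHERE Name = 'James'"
).fetchone()[0]

print(results, nationality)
```

Only the first insert succeeds; each of the others is rejected by a different constraint.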

### Recreating the CLASS Table
For the CLASS table, we've added a NOT NULL constraint to the Instructor column and a CHECK constraint to ensure that the StartDate is always before the EndDate.

In [None]:
%%sql
DROP TABLE IF EXISTS CLASS;

CREATE TABLE CLASS (
    ID INTEGER PRIMARY KEY,
    Name VARCHAR(100) NOT NULL,
    Description VARCHAR(200),
    Instructor VARCHAR(100) NOT NULL,  -- Every class must have an instructor
    Location VARCHAR(100),
    StartDate DATE,
    EndDate DATE,
    CHECK (StartDate < EndDate)  -- The start date must be before the end date
);

 * sqlite:///covert_academy.db
Done.
Done.


[]

### Recreating the ENROLLMENT Table
For the ENROLLMENT table, we've added a DEFAULT constraint to set the EnrollmentDate to the current date if no value is provided.

In [None]:
%%sql
DROP TABLE IF EXISTS ENROLLMENT;

CREATE TABLE ENROLLMENT (
    StudentID INTEGER,
    ClassID INTEGER,
    EnrollmentDate DATE DEFAULT CURRENT_DATE,  -- Default is the current date
    PRIMARY KEY (StudentID, ClassID),
    FOREIGN KEY (StudentID) REFERENCES STUDENT(ID),
    FOREIGN KEY (ClassID) REFERENCES CLASS(ID)
);

 * sqlite:///covert_academy.db
Done.
Done.


[]

And there we have it! Our Covert Academy database is now set up with constraints to ensure data integrity.

### Syntax for Creating Tables: A Quick Reference

As we've seen, SQL provides a variety of tools for defining the structure and constraints of our database tables. Let's summarize the syntax for some of the key concepts we've encountered.

#### Defining a Primary Key

To define a primary key for a table, you can use the `PRIMARY KEY` constraint. This can be done in two ways:

1.  As part of the column definition:

    ```sql
    CREATE TABLE table_name (
        column_name datatype PRIMARY KEY,
        ...
    );
    ```

2.  As a separate table constraint:

    ```sql
    CREATE TABLE table_name (
        column_name datatype,
        ...,
        PRIMARY KEY (column_name)
    );
    ```

If the primary key is composed of multiple columns (a composite key), you must use the second form and list all the columns in the primary key:

```sql
CREATE TABLE table_name (
    column1 datatype,
    column2 datatype,
    ...,
    PRIMARY KEY (column1, column2)
);
```

#### Defining a Foreign Key

To define a foreign key, you use the `FOREIGN KEY` constraint. This can be done as part of the column definition (by writing `REFERENCES referenced_table(referenced_column)` after the column's datatype) or, as shown here, as a separate table constraint:

```sql
CREATE TABLE table_name (
    ...,
    foreign_key_column datatype,
    ...,
    FOREIGN KEY (foreign_key_column) REFERENCES referenced_table(referenced_column)
);
```
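
One SQLite-specific detail is worth knowing: foreign key constraints are parsed but, in a typical build, not enforced unless you turn enforcement on for each connection with `PRAGMA foreign_keys = ON`. The sketch below uses Python's built-in `sqlite3` module to show the difference. The function `insert_orphan`, the simplified tables, and the student ID 999 are assumptions made for illustration.

```python
import sqlite3

def insert_orphan(enforce):
    """Try to enroll a student ID that does not exist; return True if SQLite allows it."""
    conn = sqlite3.connect(":memory:")
    # Foreign key enforcement is a per-connection setting in SQLite
    conn.execute("PRAGMA foreign_keys = " + ("ON" if enforce else "OFF"))
    conn.executescript("""
        CREATE TABLE STUDENT (ID INTEGER PRIMARY KEY, Name VARCHAR(100) NOT NULL);
        CREATE TABLE ENROLLMENT (
            StudentID INTEGER,
            ClassName VARCHAR(100),
            FOREIGN KEY (StudentID) REFERENCES STUDENT(ID)
        );
    """)
    try:
        # No student with ID 999 exists, so this row is an "orphan"
        conn.execute("INSERT INTO ENROLLMENT VALUES (999, 'Spy Gadgets')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

allowed_without_enforcement = insert_orphan(enforce=False)
allowed_with_enforcement = insert_orphan(enforce=True)

print(allowed_without_enforcement, allowed_with_enforcement)
```

With enforcement off, the orphan row is silently accepted; with it on, SQLite rejects the insert.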

#### The NOT NULL Constraint
To specify that a column cannot hold NULL values, use the `NOT NULL` constraint:

```sql
CREATE TABLE table_name (
    column_name datatype NOT NULL,
    ...
);
```

#### The UNIQUE Constraint

To ensure that all values in a column are different, use the `UNIQUE` constraint:

```sql
CREATE TABLE table_name (
    column_name datatype UNIQUE,
    ...
);
```

#### The CHECK Constraint
To ensure that a column's value satisfies a boolean expression, use the `CHECK` constraint:

```sql
CREATE TABLE table_name (
    ...,
    column_name datatype CHECK (boolean_expression),
    ...
);
```

#### The DEFAULT Constraint

To specify a default value for a column when no value is provided, use the `DEFAULT` constraint:

```sql
CREATE TABLE table_name (
    column_name datatype DEFAULT default_value,
    ...
);
```

These are the basic building blocks for defining the structure and integrity of your database tables. With these tools, you can ensure that your data is consistent, valid, and maintains the relationships you've defined.

## The DROP Statement: Cascading Effects and SQLite Specifics

In the last section, we briefly introduced the `DROP TABLE` statement as a way to remove a table from the database. While dropping tables is a straightforward concept, it can have some unexpected consequences, especially when dealing with tables that are related through foreign key constraints.

### The DROP TABLE IF EXISTS Syntax

In our previous examples, we used the syntax `DROP TABLE IF EXISTS` instead of just `DROP TABLE`. What does this do?

`DROP TABLE IF EXISTS` is a safer way to drop tables. If the specified table doesn't exist, `DROP TABLE` would normally throw an error. However, with `IF EXISTS`, the statement doesn't throw an error if the table doesn't exist; it simply does nothing.

This is useful in scripts where you want to ensure that a table doesn't exist before creating it, but you don't want the script to fail if the table doesn't exist to begin with.
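
As for the cascading effects mentioned in this section's title, one way to illustrate them is with an `ON DELETE CASCADE` clause on a foreign key. The sketch below, written with Python's built-in `sqlite3` module, is an assumption-laden illustration: the chapter's own tables do not declare `ON DELETE CASCADE`, and the simplified tables and sample rows here are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # cascades only fire when enforcement is on
conn.executescript("""
    CREATE TABLE STUDENT (ID INTEGER PRIMARY KEY, Name VARCHAR(100) NOT NULL);
    CREATE TABLE ENROLLMENT (
        StudentID INTEGER,
        ClassName VARCHAR(100),
        FOREIGN KEY (StudentID) REFERENCES STUDENT(ID) ON DELETE CASCADE
    );
    INSERT INTO STUDENT VALUES (1, 'James'), (2, 'Natasha');
    INSERT INTO ENROLLMENT VALUES
        (1, 'Spy Gadgets'), (1, 'Disguise 101'), (2, 'Spy Gadgets');
""")

# Deleting the parent row also removes the child rows that reference it
conn.execute("DELETE FROM STUDENT WHERE ID = 1")
remaining = conn.execute("SELECT StudentID, ClassName FROM ENROLLMENT").fetchall()

print(remaining)
```

Removing one student quietly removed two enrollment rows as well. SQLite's foreign key documentation describes `DROP TABLE` on a parent table (with enforcement on) as performing a similar implicit delete of its rows first, so related tables deserve a careful look before anything is dropped.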

## ALTER TABLE: Modifying Your Database Structure

As your database evolves, you may find yourself needing to modify the structure of existing tables. Maybe you need to add a new column, change a column's data type, or add a new constraint. This is where the `ALTER TABLE` statement comes in.

The `ALTER TABLE` statement allows you to change the structure of an existing table without deleting and recreating it. This is useful because it preserves the existing data in the table.

### Adding a New Column

To add a new column to an existing table, you use the `ADD COLUMN` clause:

```sql
ALTER TABLE table_name
ADD COLUMN new_column_name datatype constraint;
```
For example, let's say we want to add a 'DateOfBirth' column to our STUDENT table:

In [None]:
%%sql
ALTER TABLE STUDENT
ADD COLUMN DateOfBirth DATE;

 * sqlite:///covert_academy.db
Done.


[]

We can now view the table schema to ensure it has been added.

In [None]:
%%sql
PRAGMA table_info(STUDENT);

 * sqlite:///covert_academy.db
Done.


cid,name,type,notnull,dflt_value,pk
0,ID,INTEGER,0,,1
1,Name,VARCHAR(100),1,,0
2,Codename,VARCHAR(50),0,,0
3,Nationality,VARCHAR(50),0,'Unknown',0
4,Specialization,VARCHAR(100),0,,0
5,Age,INTEGER,0,,0
6,DateOfBirth,DATE,0,,0


### Dropping a Column
We can drop (remove) a column using the `DROP COLUMN` clause of `ALTER TABLE` (in SQLite, this requires version 3.35 or later):

```sql
ALTER TABLE table_name
DROP COLUMN column_name;
```
Let's now drop the DateOfBirth column.  

In [None]:
%%sql
ALTER TABLE STUDENT
DROP COLUMN DateOfBirth;

 * sqlite:///covert_academy.db
Done.


[]

### Renaming Tables and Columns
We can also rename tables and columns. For example, let's suppose we prefer table names like `Students`, `Classes`, and `Enrollments`.

In [None]:
%%sql
ALTER TABLE STUDENT
RENAME TO Students;

 * sqlite:///covert_academy.db
Done.


[]

In [None]:
%%sql
ALTER TABLE CLASS
RENAME TO Classes;

 * sqlite:///covert_academy.db
Done.


[]

In [None]:
%%sql
ALTER TABLE ENROLLMENT
RENAME TO Enrollments;

 * sqlite:///covert_academy.db
Done.


[]

### Modifying a Column's Definition

To change a column's data type, most other databases provide a dedicated clause, although the spelling varies: MySQL uses `MODIFY COLUMN`, Oracle uses `MODIFY`, and PostgreSQL and SQL Server use `ALTER COLUMN`. The MySQL form looks like this:

```sql
ALTER TABLE table_name
MODIFY COLUMN column_name new_datatype;
```

For example, if you wanted to change the 'Specialization' column in the `STUDENT` table to have a maximum length of 200 characters:

```sql
ALTER TABLE STUDENT
MODIFY COLUMN Specialization VARCHAR(200);
```

You can also add constraints to an existing table in these databases:

```sql
ALTER TABLE STUDENT
ADD CONSTRAINT CHK_Age CHECK (Age < 100);
```

However, SQLite has limited support for `ALTER TABLE`. While you can rename, add, or drop columns, you can't directly modify a column's datatype or constraints. To achieve this in SQLite, you need to use a workaround:


1.  Create a new column with the desired constraints (or new data type).
2.  Copy the data from the old column to the new column.
3.  Drop the old column.
4.  Rename the new column to the original name

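For example, here's how we can replace the constraint on the Age column with one requiring ages below 100. Note that the original `CHECK (Age >= 18)` is removed along with the old column, so repeat it in the new definition if it should still apply.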

In [None]:
%%sql
-- Step 1: Add a new column with the desired constraints
ALTER TABLE Students ADD COLUMN Age_NEW INTEGER CHECK (Age_NEW < 100);

-- Step 2: Copy the data from the old column to the new column
-- (Our table is empty, so this affects 0 rows, but it is required when the table holds data)
UPDATE Students SET Age_NEW = Age;

-- Step 3: Drop the old column
ALTER TABLE Students DROP COLUMN Age;

-- Step 4: Rename the new column to the original name
ALTER TABLE Students RENAME COLUMN Age_NEW TO Age;


 * sqlite:///covert_academy.db
Done.
0 rows affected.
Done.
Done.


[]
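
Another common workaround in SQLite is to rebuild the whole table: create a replacement with the desired definition, copy the rows across, drop the original, and rename the replacement. The sketch below shows the idea with Python's built-in `sqlite3` module. It is a simplified illustration, not the chapter's method: the trimmed-down `Students` table, the name `Students_new`, and the sample rows are assumptions, and a real migration would also need to account for indexes, triggers, and foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Students (
        ID INTEGER PRIMARY KEY,
        Name VARCHAR(100) NOT NULL,
        Age INTEGER CHECK (Age >= 18)
    );
    INSERT INTO Students (Name, Age) VALUES ('James', 35), ('Natasha', 29);

    -- 1. Create a replacement table with the desired definition
    CREATE TABLE Students_new (
        ID INTEGER PRIMARY KEY,
        Name VARCHAR(100) NOT NULL,
        Age INTEGER CHECK (Age >= 18 AND Age < 100)
    );
    -- 2. Copy the existing rows across
    INSERT INTO Students_new (ID, Name, Age) SELECT ID, Name, Age FROM Students;
    -- 3. Drop the old table and give the new one its name
    DROP TABLE Students;
    ALTER TABLE Students_new RENAME TO Students;
""")

# The data survived the rebuild
rows = conn.execute("SELECT ID, Name, Age FROM Students ORDER BY ID").fetchall()

# The new constraint is now enforced
try:
    conn.execute("INSERT INTO Students (Name, Age) VALUES ('Methuselah', 150)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rows, rejected)
```

Unlike the column-swap approach, this keeps both age checks in place and preserves the original column order.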

## Supertypes and Subtypes at the Covert Academy

As our Covert Academy database grows, we might realize that some of our entities have common attributes. For example, both students and instructors are people, and they likely share attributes like name, date of birth, and nationality. In database design, we can model this relationship using **supertypes and subtypes**.

A **supertype** is an entity that contains the common attributes and relationships of one or more other entities, which are called **subtypes**. The subtypes inherit the attributes and relationships of the supertype and can also have their own specific attributes and relationships.

In our Covert Academy example, we could have a `persons` supertype, with `students` and `instructors` as subtypes. Both students and instructors would inherit attributes like name and date of birth from the `persons` supertype, but they would also have their own specific attributes (like specialization for students and office number for instructors).

There are several ways to implement supertypes and subtypes in a relational database. We'll use the "Class Table Inheritance" approach, where each entity (supertype and subtypes) gets its own table, and the subtype tables reference the supertype table with a foreign key.

### Implementing Supertypes and Subtypes

Let's start by creating the `Persons` table, which will be our supertype:

In [None]:
%%sql
DROP TABLE IF EXISTS Persons;

CREATE TABLE Persons (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    date_of_birth DATE,
    nationality VARCHAR(50) DEFAULT 'Unknown'
);

 * sqlite:///covert_academy.db
Done.
Done.


[]

In this table, we've added a `DEFAULT` constraint to set the default nationality to 'Unknown' if not provided.

Now, let's create the `Students` and `Instructors` tables as subtypes. They will reference the `Persons` table with a foreign key:


In [None]:
%%sql

DROP TABLE IF EXISTS Students;
CREATE TABLE Students (
    person_id INTEGER PRIMARY KEY,
    codename VARCHAR(50) UNIQUE,
    specialization VARCHAR(200),
    FOREIGN KEY (person_id) REFERENCES Persons (id)
);

DROP TABLE IF EXISTS Instructors;
CREATE TABLE Instructors (
    person_id INTEGER PRIMARY KEY,
    office_number VARCHAR(20),
    FOREIGN KEY (person_id) REFERENCES Persons (id)
);

 * sqlite:///covert_academy.db
Done.
Done.
Done.
Done.


[]

In these tables, the `person_id` column is a foreign key that references the `id` column of the `Persons` table. This establishes the inheritance relationship. Because `person_id` is also the primary key of each subtype table, a given person can appear at most once in `Students` and at most once in `Instructors`.

Next, let's recreate the `Enrollments` table so that it references `Persons` and includes a `role` column, which can be either 'S' (for student) or 'I' (for instructor):

In [None]:
%%sql
DROP TABLE IF EXISTS Enrollments;
CREATE TABLE Enrollments (
    person_id INTEGER,
    class_id INTEGER,
    role CHAR(1) CHECK (role IN ('S', 'I')),
    PRIMARY KEY (person_id, class_id),
    FOREIGN KEY (person_id) REFERENCES Persons (id),
    FOREIGN KEY (class_id) REFERENCES Classes (id)
);

 * sqlite:///covert_academy.db
Done.
Done.


[]

Finally, let's recreate the `Classes` table. We'll no longer need to store an "instructor" attribute, since this will be handled by enrollment.

In [None]:
%%sql
DROP TABLE IF EXISTS Classes;
CREATE TABLE Classes (
    id INTEGER PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    description VARCHAR(200),
    location VARCHAR(100),
    start_date DATE,
    end_date DATE,
    CHECK (start_date < end_date)
);

 * sqlite:///covert_academy.db
Done.
Done.


[]

## Final ERD for the Covert Academy
And here is the final ERD for our Covert Academy database.

In [None]:
mm("""
    erDiagram
    Persons {
        INTEGER id PK
        VARCHAR name
        DATE date_of_birth
        VARCHAR nationality    }

    Students {
        INTEGER person_id PK, FK
        VARCHAR codename
        VARCHAR specialization    }

    Instructors {
        INTEGER person_id PK, FK
        VARCHAR office_number     }

    Classes {
        INTEGER id PK
        VARCHAR name
        VARCHAR description
        VARCHAR location
        DATE start_date
        DATE end_date     }

    Enrollments {
        INTEGER person_id PK, FK
        INTEGER class_id PK, FK
        CHAR role     }

    Persons ||--o| Students : "is a"
    Persons ||--o| Instructors : "is a"
    Persons ||--o{ Enrollments : "teaches or takes"
    Classes ||--o{ Enrollments : "has"
    """
)

### Querying Supertypes and Subtypes

Querying supertypes and subtypes involves using joins. For example, to get the names and codenames of all students, we would do:

```sql
-- The tables are still empty at this point, so this returns no rows yet.
SELECT p.name, s.codename
FROM persons p
JOIN students s ON p.id = s.person_id;
```

And to get all people (both students and instructors) enrolled in a specific class:

```sql
SELECT p.name, e.role
FROM persons p
JOIN enrollments e ON p.id = e.person_id
WHERE e.class_id = 1;
```

Supertypes and subtypes allow us to model inheritance relationships in our database, reducing data redundancy and making our schema more flexible. They're a powerful tool in database design when used appropriately.
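
An inner join only returns people who have a matching subtype row. To list every person along with the subtype they belong to, a `LEFT JOIN` to each subtype table is one option. The sketch below uses Python's `sqlite3` module with an in-memory database; the simplified tables and the three sample people are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified supertype and subtype tables with made-up sample rows.
conn.executescript("""
    CREATE TABLE Persons (id INTEGER PRIMARY KEY, name VARCHAR(100) NOT NULL);
    CREATE TABLE Students (
        person_id INTEGER PRIMARY KEY,
        codename VARCHAR(50) UNIQUE,
        FOREIGN KEY (person_id) REFERENCES Persons (id)
    );
    CREATE TABLE Instructors (
        person_id INTEGER PRIMARY KEY,
        office_number VARCHAR(20),
        FOREIGN KEY (person_id) REFERENCES Persons (id)
    );
    INSERT INTO Persons (id, name) VALUES (1, 'Ada'), (2, 'Ben'), (3, 'Cy');
    INSERT INTO Students VALUES (1, 'Sparrow');
    INSERT INTO Instructors VALUES (2, '12C');
""")

# LEFT JOIN keeps every person; the CASE labels which subtype row was found.
query = """
    SELECT p.name,
           CASE
               WHEN s.person_id IS NOT NULL THEN 'student'
               WHEN i.person_id IS NOT NULL THEN 'instructor'
               ELSE 'unclassified'
           END AS kind
    FROM Persons p
    LEFT JOIN Students s ON p.id = s.person_id
    LEFT JOIN Instructors i ON p.id = i.person_id
    ORDER BY p.id
"""
kinds = conn.execute(query).fetchall()
print(kinds)  # [('Ada', 'student'), ('Ben', 'instructor'), ('Cy', 'unclassified')]
```

Nothing in this schema stops one person from appearing in both subtype tables; in that case the `CASE` above would report only 'student', since that branch is checked first.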

In [None]:
%%sql
-- Inspect the column definitions of the Classes table
PRAGMA table_info(Classes);

 * sqlite:///covert_academy.db
Done.


cid,name,type,notnull,dflt_value,pk
0,id,INTEGER,0,,1
1,name,VARCHAR(100),1,,0
2,description,VARCHAR(200),0,,0
3,location,VARCHAR(100),0,,0
4,start_date,DATE,0,,0
5,end_date,DATE,0,,0


## Inserting Sample Data
In the next chapter, we'll cover inserting data in much more detail. For now, though, here are a few rows of data so that we can test our database.

In [None]:
%%sql
-- Insert data into the Persons table
DELETE FROM Persons;
INSERT INTO Persons (name, date_of_birth, nationality)
VALUES
    ('James Bond', '1920-11-11', 'British'),
    ('Ethan Hunt', '1964-08-18', 'American'),
    ('Jason Bourne', '1970-09-13', 'American'),
    ('Natasha Romanoff', '1984-11-22', 'Russian'),
    ('Harry Hart', '1960-09-17', 'British'),
    ('Lorraine Broughton', '1990-11-10', 'British'),
    ('Evelyn Salt', '1980-09-14', 'American');

-- Insert data into the Students table
DELETE FROM Students;
INSERT INTO Students (person_id, codename, specialization)
VALUES
    (1, '007', 'Covert Operations'),
    (2, 'Raptor', 'Infiltration'),
    (4, 'Black Widow', 'Assassination'),
    (6, 'Wisdom', 'Counter-Intelligence');

-- Insert data into the Instructors table
DELETE FROM Instructors;
INSERT INTO Instructors (person_id, office_number)
VALUES
    (3, '101A'),
    (5, '221B'),
    (7, 'X-7');

-- Insert data into the Classes table
DELETE FROM Classes;
INSERT INTO Classes (name, description, start_date, end_date)
VALUES
    ('Espionage 101', 'Introduction to Spying', '2023-09-01', '2023-12-15'),
    ('Advanced Disguise', 'Becoming Anyone', '2023-09-01', '2023-11-30'),
    ('Infiltration Tactics', 'Getting In and Out', '2023-10-15', '2024-01-15'),
    ('Weapons Training', 'From Pens to Rocket Launchers', '2024-01-15', '2024-04-30');

-- Insert data into the Enrollments table
DELETE FROM Enrollments;
INSERT INTO Enrollments (person_id, class_id, role)
VALUES
    (1, 1, 'S'),
    (2, 1, 'S'),
    (3, 1, 'I'),
    (4, 2, 'S'),
    (5, 2, 'I'),
    (6, 3, 'S'),
    (7, 3, 'I'),
    (1, 4, 'S'),
    (2, 4, 'S'),
    (4, 4, 'S'),
    (5, 4, 'I');

 * sqlite:///covert_academy.db
7 rows affected.
7 rows affected.
4 rows affected.
4 rows affected.
3 rows affected.
3 rows affected.
4 rows affected.
4 rows affected.
0 rows affected.
11 rows affected.


[]

## Sample Queries
Now, we can run some sample queries against the database. First, let's get a list of the students and the classes they are enrolled in.

In [None]:
%%sql
SELECT
  s.codename,
  c.name AS "class_name"
FROM
  Students s
  JOIN Enrollments e ON s.person_id = e.person_id
  JOIN Classes c ON e.class_id = c.id
WHERE e.role = 'S';

 * sqlite:///covert_academy.db
Done.


codename,class_name
007,Espionage 101
Raptor,Espionage 101
Black Widow,Advanced Disguise
Wisdom,Infiltration Tactics
007,Weapons Training
Raptor,Weapons Training
Black Widow,Weapons Training


We could also alter this query to include the information about the student from the `Persons` table.

In [None]:
%%sql
SELECT
  p.name AS "student_name",
  s.codename,
  c.name AS "class_name"
FROM
  Students s
  JOIN Enrollments e ON s.person_id = e.person_id
  JOIN Classes c ON e.class_id = c.id
  JOIN Persons p ON s.person_id = p.id
WHERE e.role = 'S';

 * sqlite:///covert_academy.db
Done.


student_name,codename,class_name
James Bond,007,Espionage 101
Ethan Hunt,Raptor,Espionage 101
Natasha Romanoff,Black Widow,Advanced Disguise
Lorraine Broughton,Wisdom,Infiltration Tactics
James Bond,007,Weapons Training
Ethan Hunt,Raptor,Weapons Training
Natasha Romanoff,Black Widow,Weapons Training


Or, we could get a count of the classes taught by each instructor.

In [None]:
%%sql
SELECT
  p.name AS "instructor_name",
  COUNT(e.class_id) AS "classes_taught"
FROM
  Instructors i
  JOIN Enrollments e ON i.person_id = e.person_id
  JOIN Persons p ON i.person_id = p.id
WHERE e.role = 'I'
GROUP BY p.id, p.name;

 * sqlite:///covert_academy.db
Done.


instructor_name,classes_taught
Jason Bourne,1
Harry Hart,2
Evelyn Salt,1
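
One caveat about this design: the `role` column in `Enrollments` is not tied to the subtype tables, so nothing prevents a row with role 'I' for someone who has no `Instructors` entry. A query like the following can flag such mismatches. This is a sketch using Python's `sqlite3` module, an in-memory database, and stripped-down tables with made-up IDs, not the chapter's actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stripped-down tables: only the key columns needed for the check.
conn.executescript("""
    CREATE TABLE Students (person_id INTEGER PRIMARY KEY);
    CREATE TABLE Instructors (person_id INTEGER PRIMARY KEY);
    CREATE TABLE Enrollments (
        person_id INTEGER,
        class_id INTEGER,
        role CHAR(1) CHECK (role IN ('S', 'I')),
        PRIMARY KEY (person_id, class_id)
    );
    INSERT INTO Students VALUES (1), (2);
    INSERT INTO Instructors VALUES (3);
    -- Person 2 is a student but is enrolled as an instructor in class 11.
    INSERT INTO Enrollments VALUES (1, 10, 'S'), (3, 10, 'I'), (2, 11, 'I');
""")

# Find enrollments whose role has no matching row in the corresponding subtype table.
mismatches = conn.execute("""
    SELECT e.person_id, e.class_id, e.role
    FROM Enrollments e
    LEFT JOIN Students s ON e.person_id = s.person_id
    LEFT JOIN Instructors i ON e.person_id = i.person_id
    WHERE (e.role = 'S' AND s.person_id IS NULL)
       OR (e.role = 'I' AND i.person_id IS NULL)
""").fetchall()
print(mismatches)  # [(2, 11, 'I')]
```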


## Lab: Create Your Own Database

In this lab, you will design and implement a database related to a personal interest of your choice. This exercise will take you through the entire process of database design and implementation, from conceptualization to data insertion. **You can complete this lab directly in the Colab notebook.** Just add "text cells" to write text, and use code cells starting with `%%sql` to write SQL queries.

### Step 1: Description of idea

Choose a topic for your database based on a personal interest. Your database should involve at least two related entities. Here are some ideas to get you started:

- A music collection (artists, albums, songs)
- A recipe book (recipes, ingredients, categories)
- A personal movie library (movies, directors, genres)
- A fitness tracker (workouts, exercises, muscle groups)
- A plant care diary (plants, care instructions, watering schedule)

Write 1-3 paragraphs describing your chosen topic and what kind of information your database will store. Consider what questions your database should be able to answer.

Output: A brief description of your database concept.

### Step 2: Business rules

Define at least 5 business rules for your database. These rules should describe the relationships between your entities and any constraints on the data.

Output: A list of at least 5 business rules.

### Step 3: Conceptual model (ERD)

Create an Entity-Relationship Diagram (ERD) for your database using Mermaid syntax. Include at least two entities and show the relationships between them.

Output: A Mermaid ERD diagram. You can use the following syntax:

mm("""
erDiagram
    STUDENT ||--o{ ENROLLMENT : has
    ENROLLMENT }o--|| CLASS : is_for
""")

### Step 4: Simple create table (SQLite)

Write SQLite commands to create the tables for your database. At this stage, don't worry about adding constraints - just define the basic structure of your tables.

Output: SQLite CREATE TABLE statements for each of your entities.

### Step 5: Drop tables, create again with full constraints

Now, enhance your table creation scripts. First, write statements to drop the existing tables. Then, recreate them with proper constraints, including primary keys, foreign keys, and any other constraints that enforce your business rules.

Output: SQLite DROP TABLE and CREATE TABLE statements with full constraints.

### Step 6: Practice alter table

Choose one of your tables and write an ALTER TABLE statement to add a new column or constraint that wasn't in your original design.

Output: An ALTER TABLE statement for one of your tables.
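
A note for this step: SQLite's `ALTER TABLE` cannot attach a new table-level constraint to an existing table, but a new column can carry its own `DEFAULT` and `CHECK`. Here is a hedged sketch using Python's `sqlite3` module and an in-memory database. The `Albums` table and `rating` column are hypothetical examples, not part of the spy academy schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for a music-collection database.
conn.execute("CREATE TABLE Albums (id INTEGER PRIMARY KEY, title VARCHAR(200) NOT NULL)")
conn.execute("INSERT INTO Albums (title) VALUES ('First Album')")

# Add a column that brings its own default and CHECK constraint.
conn.execute(
    "ALTER TABLE Albums "
    "ADD COLUMN rating INTEGER DEFAULT 3 CHECK (rating BETWEEN 1 AND 5)"
)

# Column names are the second field of each PRAGMA table_info row.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Albums)")]
# Rows that existed before the ALTER pick up the default value.
existing_rating = conn.execute("SELECT rating FROM Albums").fetchone()[0]

print(columns)          # ['id', 'title', 'rating']
print(existing_rating)  # 3
```

In your own notebook, the `ALTER TABLE` statement would go in a `%%sql` cell like the other statements in this chapter.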

### Step 7: Simple insert

Write INSERT statements to add some sample data to your tables. Include at least two rows of data for each table.

Output: INSERT statements with sample data for each of your tables.

Remember, the goal of this lab is to apply the database design and SQL concepts you've learned to a topic that interests you. Be creative, and don't hesitate to ask for clarification if you're unsure about any step.

## Summary
-   Database design is a crucial process that ensures the efficient storage, retrieval, and management of data in a database.
-   The process of database design consists of three main phases: conceptual modeling, logical modeling, and physical modeling.
-   Conceptual modeling focuses on identifying the key entities, attributes, and relationships in the database, without worrying about the specific implementation details.
-   Logical modeling involves refining the conceptual model by introducing concepts like keys, normalization, and join tables to ensure data integrity and avoid redundancy.
-   Physical modeling involves using SQL statements to actually create the tables and define their structure, constraints, and relationships in the database.
-   Keys, such as primary keys and foreign keys, are essential for ensuring uniqueness and establishing relationships between tables.
-   Normalization is a technique used to reduce data redundancy and improve data integrity by organizing data into multiple tables based on their dependencies.
-   Join tables are used to resolve many-to-many relationships between entities in the logical modeling phase.
-   SQL statements like CREATE TABLE, ALTER TABLE, and DROP TABLE are used to create, modify, and delete tables in the physical modeling phase.
-   Constraints, such as NOT NULL, UNIQUE, and CHECK, are used to enforce data integrity and consistency in the database.
-   Supertypes and subtypes are used to model inheritance relationships between entities, so that shared attributes are stored once in the supertype table instead of being repeated in each subtype.
-   Designing a database is an iterative process that requires careful planning, attention to detail, and a good understanding of the business requirements and data relationships.

## Review With Quizlet

In [None]:
%%html
<iframe src="https://quizlet.com/930419891/learn/embed?i=psvlh&x=1jj1" height="600" width="100%" style="border:0"></iframe>

## Glossary

| Term | Definition |
|------|------------|
| ALTER TABLE t ADD COLUMN c | SQL command to add a new column to an existing table. |
| ALTER TABLE t DROP COLUMN c | SQL command to remove a column from an existing table. |
| ALTER TABLE t RENAME TO t2 | SQL command to change the name of an existing table. |
| ANSI | American National Standards Institute, which sets standards for various technologies, including SQL. |
| Attributes (Crow's foot ERD) | Characteristics or properties of entities that appear as text within boxes. |
| BLOB | Binary Large Object, a data type for storing large binary objects such as images or files in a database. |
| Business rule | A statement that defines or constrains some aspect of business structure or behavior. |
| CHAR(n) | Fixed-length character string data type, where n specifies the number of characters. |
| CHECK | A constraint that specifies a condition that must be true for each row in a table. |
| col_name datatype PRIMARY KEY | SQL syntax to define a column as the primary key when creating a table. |
| Conceptual model | High-level representation of data and relationships in a system, independent of physical implementation details. |
| CREATE TABLE | SQL command used to create a new table in a database. |
| DATE | Data type for storing date values (typically year, month, and day). |
| Descriptive process | Approach to database design that starts with existing data structures and refines them. |
| DROP TABLE | SQL command to remove an existing table and all its data from the database. |
| DROP TABLE IF EXISTS t | SQL command to remove a table only if it exists, preventing errors if the table doesn't exist. |
| Entities (Crow's foot ERD) | Distinct objects or concepts in a Crow's Foot Entity-Relationship Diagram, represented as boxes. |
| First normal form (1NF) | A set of rules in database normalization that eliminates repeating groups and ensures atomic values. |
| FOREIGN KEY (c1) REFERENCES t (c2) | SQL constraint defining a column as a foreign key referencing another table's column. |
| INTEGER | Data type for storing whole numbers without fractional components. |
| ISO | International Organization for Standardization, which publishes standards for various technologies, including SQL. |
| JSONB | PostgreSQL data type for storing JSON data in a binary format, allowing for efficient querying and indexing. |
| Logical model | Detailed representation of data structures and relationships, independent of a specific database management system. |
| Natural key | A unique identifier for a database record that is formed from existing attributes with real-world meaning. |
| Normalization | Process of organizing data to reduce redundancy and improve data integrity. |
| Normative process | Approach to database design that starts with requirements and builds the structure from scratch. |
| NOT NULL | Constraint that ensures a column cannot contain null (empty) values. |
| Physical model | Representation of database design as implemented in a specific database management system. |
| PRIMARY KEY | Constraint that uniquely identifies each record in a database table. |
| REAL | Data type for storing floating-point numbers (single precision in many systems; SQLite stores REAL values as 8-byte floating-point numbers). |
| Relationships (Crow's foot ERD) | Connections between entities in a Crow's Foot Entity-Relationship Diagram, represented as lines. |
| Second normal form (2NF) | Normalization level that builds on 1NF and eliminates partial dependencies on the primary key. |
| Surrogate key | Artificial identifier assigned to an entity as a substitute for a natural key. |
| TEXT | Data type for storing variable-length character strings, typically with no specified maximum length. |
| Third normal form (3NF) | Normalization level that builds on 2NF and eliminates transitive dependencies. |
| TIME | Data type for storing time values (typically hour, minute, second). |
| UNIQUE | Constraint ensuring all values in a column or set of columns are distinct from one another. |
| VARCHAR(n) | Variable-length character string data type, where n specifies the maximum number of characters. |