<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_DatabaseDesign.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## What is Conceptual Modeling?

Conceptual modeling is the first step in designing a database. It involves understanding the data requirements without focusing on technical details. Think of it as sketching a rough draft before creating a detailed blueprint. This process helps us determine what data needs to be stored and how it should be organized to reflect real-world scenarios accurately.

### Determining Business Rules

**Business rules** are the guidelines that define how data can be created, stored, and modified within the database. They ensure that the data accurately represents the business operations. In Wednesday Addams' Web Shop, several business rules need to be established to ensure the database supports the shop's operations effectively.

#### Examples of Business Rules

1.  Each order must be linked to a specific customer. This means that every time an order is created, it must reference an existing customer in the database.

2. Each product should have a stock count that decreases with each sale. This ensures that the database keeps track of inventory accurately.

3.  Each customer must have a unique email address to prevent duplicate accounts and ensure efficient communication.

4.  Orders should have a status (e.g., pending, shipped, delivered) to track their progress through the system.

These business rules help us understand the structure and constraints needed for the database.
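
Rules like these eventually become constraints in the physical database. As a rough preview, the sketch below uses Python's built-in `sqlite3` module to express the four rules as constraints and then checks that invalid changes are rejected. The tables are pared down to the columns each rule needs, the sample email address is made up, and the helper `is_rejected` is a hypothetical name introduced only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked

conn.executescript("""
CREATE TABLE Customer (
    CustomerID INTEGER PRIMARY KEY,
    Email      TEXT NOT NULL UNIQUE                        -- rule 3
);
CREATE TABLE "Order" (                                     -- ORDER is a reserved word
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL
               REFERENCES Customer(CustomerID),            -- rule 1
    Status     TEXT NOT NULL DEFAULT 'pending'
               CHECK (Status IN ('pending', 'shipped', 'delivered'))  -- rule 4
);
CREATE TABLE Product (
    ProductID  INTEGER PRIMARY KEY,
    StockCount INTEGER NOT NULL CHECK (StockCount >= 0)    -- rule 2: never oversell
);
INSERT INTO Customer VALUES (1, 'wednesday@example.com');
INSERT INTO Product  VALUES (1, 1);                        -- one unit in stock
""")


def is_rejected(sql):
    """Return True if SQLite refuses the statement with a constraint error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


sell_one = "UPDATE Product SET StockCount = StockCount - 1 WHERE ProductID = 1"

results = {
    "order for unknown customer": is_rejected('INSERT INTO "Order" (CustomerID) VALUES (999)'),
    "duplicate email": is_rejected("INSERT INTO Customer (Email) VALUES ('wednesday@example.com')"),
    "unknown status": is_rejected("""INSERT INTO "Order" (CustomerID, Status) VALUES (1, 'lost')"""),
    "valid order": is_rejected('INSERT INTO "Order" (CustomerID) VALUES (1)'),
    "first sale": is_rejected(sell_one),
    "sale with no stock": is_rejected(sell_one),
}
print(results)
```

Only the valid order and the first sale go through. Note that the `CHECK` constraint covers just part of rule 2: it stops the stock count from going negative, while the decrement itself still has to be issued by the application with each sale.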

### Identifying Entities and Attributes

The next step in conceptual modeling is identifying the entities and their attributes. An entity is a distinct object or thing in the system, and an attribute is a property or characteristic of an entity.

#### Example Entities and Attributes for Wednesday's Web Shop

1.  Customer

    -   Attributes:
        -   `CustomerID`: A unique identifier for each customer.
        -   `Name`: The full name of the customer.
        -   `Email`: The email address of the customer (must be unique).
        -   `Address`: The postal address of the customer.
        -   `PhoneNumber`: The contact number of the customer.
    -   Business Rule: Each customer must have a unique `Email`.
2.  Order

    -   Attributes:
        -   `OrderID`: A unique identifier for each order.
        -   `OrderDate`: The date the order was placed.
        -   `CustomerID`: A reference to the customer who placed the order.
        -   `TotalAmount`: The total cost of the order.
        -   `Status`: The current status of the order (e.g., pending, shipped, delivered).
    -   Business Rule: Each order must be linked to a valid `CustomerID`.
3.  Product

    -   Attributes:
        -   `ProductID`: A unique identifier for each product.
        -   `ProductName`: The name of the product.
        -   `Description`: A brief description of the product.
        -   `Price`: The cost of the product.
        -   `StockCount`: The number of units available in stock.
    -   Business Rule: `StockCount` must be updated with each sale.
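
One way to make the entity/attribute distinction concrete is to sketch each entity as a Python class whose fields are its attributes. This is only an illustration under assumed types (for example, `float` for money and `str` for phone numbers); the snake_case field names and the sample values are invented here, and a real design would choose types during logical modeling.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class Customer:
    customer_id: int
    name: str
    email: str          # must be unique across all customers
    address: str
    phone_number: str


@dataclass
class Order:
    order_id: int
    order_date: date
    customer_id: int    # reference to the Customer who placed the order
    total_amount: float  # float keeps the sketch short; exact decimals suit money better
    status: str = "pending"


@dataclass
class Product:
    product_id: int
    product_name: str
    description: str
    price: float
    stock_count: int


# An entity describes a kind of thing; each instance is one concrete example of it.
customer = Customer(1, "Pugsley Addams", "pugsley@example.com", "1 Example Lane", "555-0100")
order = Order(1001, date(2024, 10, 31), customer.customer_id, 13.00)

attribute_names = [f.name for f in fields(Order)]
print(attribute_names)
```

Here `Order` is the entity, `attribute_names` lists its attributes, and `order` is a single instance whose `customer_id` points back to the customer who placed it.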

### Relationships

Once we've identified the entities, we need to determine how they relate to each other. Relationships help us understand how data is connected within the database.

First, we need to consider relationship **cardinality**, which describes how many instances of one entity can be associated with a single instance of another entity.

1.   In a **one-to-one relationship**, a single instance of one entity is associated with a single instance of another entity. For example, if each customer has one loyalty card, the relationship between `Customer` and `LoyaltyCard` would be one-to-one.
2.  In a **one-to-many relationship**, a single instance of one entity is associated with multiple instances of another entity. For example, each customer can place multiple orders, but each order is placed by only one customer. Here, the relationship between `Customer` and `Order` is one-to-many.
3.  In a **many-to-many relationship**, multiple instances of one entity are associated with multiple instances of another entity. For example, each order can contain multiple products, and each product can be part of multiple orders. This relationship cannot be directly represented in a relational database and usually requires an intermediary entity to resolve it. We'll discuss this in the next section.

Along with cardinality, we must consider **optionality**. Here:

1. In an **optional relationship**, an entity can exist without being associated with another entity. For example, a student may not be enrolled in any courses yet, but they still exist in the system. Optional participation is depicted with an open circle at the end of the relationship line; in Mermaid the circle is written as `o`, as in `|o--` (zero or one) or `--o{` (zero or more).

2. In a **mandatory relationship**, an entity must be associated with another entity. For example, each order must be associated with exactly one customer. Mandatory participation is depicted with a bar in place of the circle; in Mermaid the bar is written as `|`, as in `||--` (exactly one) or `--|{` (one or more).
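
To see how cardinality and optionality show up in actual data, the following sketch inspects a list of observed pairings. The function names and the sample data are hypothetical. Keep in mind that sample data can only suggest a cardinality; the real answer comes from the business rules, since a customer who has placed one order so far may still be allowed to place many.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify a relationship from observed (left, right) pairs."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)

    left_has_many = any(len(rights) > 1 for rights in rights_per_left.values())
    right_has_many = any(len(lefts) > 1 for lefts in lefts_per_right.values())

    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"


def participation_is_optional(all_instances, pairs):
    """True if at least one left-hand instance appears in no pair at all."""
    related = {left for left, _ in pairs}
    return any(instance not in related for instance in all_instances)


customer_cards = [("Pugsley", "Card1"), ("Morticia", "Card2")]
customer_orders = [("Pugsley", "Order1"), ("Pugsley", "Order2"), ("Morticia", "Order3")]
order_products = [("Order1", "Candle"), ("Order1", "Umbrella"), ("Order2", "Candle")]

print(classify_cardinality(customer_cards))   # one-to-one
print(classify_cardinality(customer_orders))  # one-to-many
print(classify_cardinality(order_products))   # many-to-many

# Gomez is a customer with no orders, so placing an order is optional for customers.
print(participation_is_optional(["Pugsley", "Morticia", "Gomez"], customer_orders))  # True
```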

### Creating an Initial List of Entities and Attributes

Based on the business rules and identified entities, let's create an initial list of entities and their attributes for Wednesday's Web Shop:

1.  Customer:

    -   `CustomerID`: A unique identifier for each customer.
    -   `Name`: The full name of the customer.
    -   `Email`: The email address of the customer (must be unique).
    -   `Address`: The postal address of the customer.
    -   `PhoneNumber`: The contact number of the customer.
2.  Order:

    -   `OrderID`: A unique identifier for each order.
    -   `OrderDate`: The date the order was placed.
    -   `CustomerID`: A reference to the customer who placed the order.
    -   `TotalAmount`: The total cost of the order.
    -   `Status`: The current status of the order (e.g., pending, shipped, delivered).
3.  Product:

    -   `ProductID`: A unique identifier for each product.
    -   `ProductName`: The name of the product.
    -   `Description`: A brief description of the product.
    -   `Price`: The cost of the product.
    -   `StockCount`: The number of units available in stock.

With these entities and attributes identified, we have a solid foundation for our database. Next, we will explore how these entities relate to each other and how to represent these relationships visually. This prepares us for the logical and physical modeling stages, where we will refine and implement our database design.

## Creating and Interpreting ER Diagrams

An **Entity-Relationship Diagram (ERD)** is a visual representation of the entities within a database and the relationships between them. ERDs help us understand the structure and flow of data, making it easier to design and implement databases. In this section, we will introduce the basics of creating and interpreting ER diagrams using Mermaid in Python.

An ER diagram consists of the following components:

1.  **Entities** are represented by rectangles. They are the objects or things in the system that have data stored about them. For example, `Customer`, `Order`, and `Product` are entities in Wednesday Addams' Web Shop.

2.  **Attributes** are represented by ovals or listed within the entity rectangles. They are properties or details of the entities. For instance, a `Customer` entity might have attributes like `name`, `custNumber`, and `email`.

3.  **Relationships** are represented by diamonds or by lines connecting entities. They illustrate how entities are connected to each other. For example, a `Customer` places an `Order`, indicating a relationship between these two entities.

### Example ER Diagram

Let's consider an example ER diagram for a simple e-commerce system, including `Customer`, `Order`, and `Product` entities. We'll use **Mermaid** syntax to create this diagram.

The basic rules of Mermaid (when it comes to ER diagrams) are:

0. We start with the keyword `erDiagram`.

1. We can define entities with their attributes like this:
```mermaid
ENTITY_NAME {
    attributeType attributeName
    attributeType attributeName
}
```
2. We can connect entities with relationship lines:

-   One-to-One: `||--||` (mandatory on both sides)
-   One-to-Many: `||--o{` (mandatory on the "one" side, optional on the "many" side)
-   Many-to-Many: `}o--o{` (optional on both sides)
-   Optional One-to-One: `|o--||` (optional on one side, mandatory on the other)
-   Optional One-to-Many: `|o--o{` (optional on the "one" side, optional on the "many" side)
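
The notation above is regular enough to be assembled from its parts. The sketch below builds Mermaid lines in Python from a cardinality ("one" or "many") and an optionality ("mandatory" or "optional") for each end. The function and dictionary names are hypothetical; the marker strings are the ones Mermaid uses on the left and right ends of a relationship line.

```python
# Markers as they are written on the left end of a relationship line.
LEFT_MARKERS = {
    ("one", "mandatory"): "||",
    ("one", "optional"): "|o",
    ("many", "mandatory"): "}|",
    ("many", "optional"): "}o",
}
# The same markers mirrored for the right end.
RIGHT_MARKERS = {
    ("one", "mandatory"): "||",
    ("one", "optional"): "o|",
    ("many", "mandatory"): "|{",
    ("many", "optional"): "o{",
}


def relationship_line(left, left_end, right, right_end, label):
    """Return one Mermaid relationship line, e.g. 'A ||--o{ B : label'."""
    return f"{left} {LEFT_MARKERS[left_end]}--{RIGHT_MARKERS[right_end]} {right} : {label}"


def entity_block(name, attributes):
    """Return a Mermaid entity definition from (type, name) attribute pairs."""
    lines = [f"    {name} {{"]
    lines += [f"        {attr_type} {attr_name}" for attr_type, attr_name in attributes]
    lines.append("    }")
    return "\n".join(lines)


places = relationship_line(
    "CUSTOMER", ("one", "mandatory"), "ORDER", ("many", "optional"), "places"
)
diagram = "\n".join([
    "erDiagram",
    "    " + places,
    entity_block("CUSTOMER", [("string", "name"), ("string", "email")]),
])
print(diagram)
```

The printed text is ordinary Mermaid source, so it can be pasted into any Mermaid renderer.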

To create ER diagrams using Mermaid syntax in Python, we can use a helper function. Here's an example of how to define the function and create an ER diagram:

```python
import base64
from IPython.display import Image, display


def er_diagram(graph):
    """Render a Mermaid diagram by sending its source to the mermaid.ink service."""
    graphbytes = graph.encode("utf8")
    # URL-safe base64 avoids "+" and "/" characters, which would break the URL.
    base64_bytes = base64.urlsafe_b64encode(graphbytes)
    base64_string = base64_bytes.decode("ascii")
    display(Image(url="https://mermaid.ink/img/" + base64_string))


er_diagram("""
erDiagram
    CUSTOMER ||--o{ ORDER : places
    CUSTOMER {
        string name
        string custNumber
        string email
        string address
        string phoneNumber
    }
    ORDER ||--o{ PRODUCT : contains
    ORDER {
        int orderID
        string orderDate
        string status
        float totalAmount
    }
    PRODUCT {
        int productID
        string productName
        string description
        float price
        int stockCount
    }
""")
```


Here's how to read this diagram:
-   In an ER diagram, entities are represented by rectangles. Inside each rectangle, you will find the attributes of the entity, listed as lines within the rectangle. For example, the `CUSTOMER` entity includes attributes like `name`, `custNumber`, `email`, `address`, and `phoneNumber`.

-   Relationships between entities are shown by lines connecting the rectangles. Each relationship has a name that describes the nature of the relationship. For instance, the line labeled "places" connects `CUSTOMER` to `ORDER`, indicating that customers place orders.

-  Cardinality describes the numerical relationship between entities. It is depicted using symbols at the ends of the relationship lines. A `crow's foot` (three branching lines) represents "many," indicating that one instance of an entity can be related to many instances of another entity. For example, the `crow's foot` on the `ORDER` side of the "places" relationship shows that one customer can place many orders.

-    Optionality indicates whether participation in a relationship is mandatory or optional. It is read from the inner symbol at each end of the relationship line: a small circle means that zero is allowed (optional), while a bar means that at least one is required (mandatory). For example, in the "places" relationship, the double bar on the `CUSTOMER` side means that each order must be placed by exactly one customer, making the relationship mandatory from the order's perspective. The circle next to the `crow's foot` on the `ORDER` side means that a customer may not have placed any orders yet.

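For instance, in our diagram:

-   A `CUSTOMER` can place many `ORDER`s, or none (one-to-many).
-   Each `ORDER` must belong to exactly one `CUSTOMER` (mandatory).
-   An `ORDER` can contain many `PRODUCT`s (one-to-many).
-   Each `PRODUCT` must be part of exactly one `ORDER` (mandatory). This is a simplification: as noted earlier, orders and products really have a many-to-many relationship, which requires an intermediary entity.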

Understanding these symbols and their meanings will help you accurately interpret the relationships and constraints within an ER diagram.
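
The reading rules above can be turned into a small translator from a Mermaid relationship line to plain English. This is a sketch that handles only the simple `ENTITY marker--marker ENTITY : label` form used in this chapter; the names `describe`, `LEFT`, `RIGHT`, and `PATTERN` are hypothetical.

```python
import re

# Meaning of each end marker. The marker next to an entity says how many
# instances of THAT entity can be linked to one instance of the other entity.
LEFT = {"||": "exactly one", "|o": "zero or one", "}|": "one or more", "}o": "zero or more"}
RIGHT = {"||": "exactly one", "o|": "zero or one", "|{": "one or more", "o{": "zero or more"}

PATTERN = re.compile(
    r"^\s*(\w+)\s+(\|\||\|o|\}\||\}o)--(\|\||o\||\|\{|o\{)\s+(\w+)\s*:\s*(\w+)\s*$"
)


def describe(line):
    """Describe one Mermaid ER relationship line in plain English."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError(f"not a recognized relationship line: {line!r}")
    left, left_marker, right_marker, right, label = match.groups()
    return (
        f"Each {left} {label} {RIGHT[right_marker]} {right}; "
        f"each {right} is linked to {LEFT[left_marker]} {left}."
    )


print(describe("CUSTOMER ||--o{ ORDER : places"))
# Each CUSTOMER places zero or more ORDER; each ORDER is linked to exactly one CUSTOMER.
```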

### An E-R Diagram: Wednesday Goes to School
For a more complex E-R diagram, let's take a look at the database for Wednesday's school.

```python
er_diagram("""
erDiagram
    STUDENT ||--o{ ENROLLMENT : enrolls_in
    STUDENT ||--|| ADVISOR : assigned_to
    STUDENT {
        int StudentID
        string Name
        string Email
        string Major
        string Year
    }
    ENROLLMENT {
        int EnrollmentID
        int StudentID
        int CourseID
        string EnrollmentDate
        string Grade
    }
    COURSE ||--o{ ENROLLMENT : includes
    COURSE {
        int CourseID
        string CourseName
        int Credits
        string Department
    }
    ADVISOR {
        int AdvisorID
        string Name
        string Email
    }
""")
```

Here, we see the following:

-   `STUDENT` enrolls in many `ENROLLMENT`s (one-to-many, optional for students).
-   `COURSE` includes many `ENROLLMENT`s (one-to-many, optional for courses).
-   `ENROLLMENT` has attributes linking it to both `STUDENT` and `COURSE`.
-   `STUDENT` is assigned to one `ADVISOR`, and each `ADVISOR` oversees one `STUDENT` (one-to-one, mandatory).
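
To illustrate how `ENROLLMENT` ties `STUDENT` and `COURSE` together, here is a minimal sketch using Python's built-in `sqlite3` module. It keeps only a few of the attributes from the diagram, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when asked
conn.executescript("""
CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE Course  (CourseID  INTEGER PRIMARY KEY, CourseName TEXT NOT NULL);
CREATE TABLE Enrollment (
    EnrollmentID INTEGER PRIMARY KEY,
    StudentID    INTEGER NOT NULL REFERENCES Student(StudentID),
    CourseID     INTEGER NOT NULL REFERENCES Course(CourseID)
);
INSERT INTO Student VALUES (1, 'Wednesday'), (2, 'Enid');
INSERT INTO Course  VALUES (10, 'Botany'), (20, 'Fencing');
-- Wednesday takes both courses; Enid has not enrolled in anything yet.
INSERT INTO Enrollment (StudentID, CourseID) VALUES (1, 10), (1, 20);
""")

# LEFT JOIN keeps students with no enrollments, which is what "optional" allows.
rows = conn.execute("""
    SELECT s.Name, COUNT(e.EnrollmentID)
    FROM Student AS s
    LEFT JOIN Enrollment AS e ON e.StudentID = s.StudentID
    GROUP BY s.StudentID
    ORDER BY s.StudentID
""").fetchall()
print(rows)  # [('Wednesday', 2), ('Enid', 0)]
```

Each `Enrollment` row points at exactly one student and one course, while a student such as Enid can exist with zero enrollments.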

## ANSI-standard SQL Datatypes
ANSI-standard SQL refers to the standardized version of SQL (Structured Query Language) that is defined by the American National Standards Institute (ANSI). This standardization ensures that SQL implementations across different database management systems (DBMS) follow a consistent set of rules and syntax, making SQL code more portable and interoperable across systems.


The following table outlines common SQL datatypes, most of which are defined in the ANSI SQL standard, along with their descriptions and the type affinity SQLite assigns to each:

| ANSI Datatype | Description | SQLite Affinity |
| --- | --- | --- |
| CHAR(n) | Fixed-length character string. `n` specifies the length. | TEXT |
| VARCHAR(n) | Variable-length character string. `n` specifies the maximum length. | TEXT |
| TEXT | Large variable-length character string. Not part of the ANSI standard (which defines `CLOB`), but widely supported. | TEXT |
| INT | Integer number. | INTEGER |
| SMALLINT | Small integer number. | INTEGER |
| BIGINT | Large integer number. | INTEGER |
| FLOAT | Floating-point number. | REAL |
| DOUBLE PRECISION | Double-precision floating-point number (abbreviated `DOUBLE` in some systems). | REAL |
| DECIMAL(p,s) | Exact fixed-point number. `p` specifies precision (total digits), and `s` specifies scale (digits after the decimal point). | NUMERIC |
| NUMERIC(p,s) | Exact fixed-point number, like `DECIMAL`. The standard requires `NUMERIC` to use exactly the declared precision, while `DECIMAL` may use more. | NUMERIC |
| DATE | Calendar date (year, month, day). | NUMERIC (values are typically stored as TEXT, REAL, or INTEGER) |
| TIME | Time of day (hour, minute, second). | NUMERIC (values are typically stored as TEXT, REAL, or INTEGER) |
| TIMESTAMP | Combination of `DATE` and `TIME`. | NUMERIC (values are typically stored as TEXT, REAL, or INTEGER) |
| BOOLEAN | Logical TRUE or FALSE value. | NUMERIC |
| BLOB | Binary large object, used to store binary data. | BLOB |


### SQLite and "Type Affinities"
Unlike many SQL databases that enforce strict datatype constraints, SQLite uses a concept called **type affinity**. This means that a column's declared datatype is treated as a preference rather than a strict rule: SQLite converts a value to the preferred type when it can and otherwise stores the value as it is. For instance, you can store a string in a column declared as INTEGER.

Under the hood, SQLite stores every value (regardless of the declared datatype) in one of five **storage classes**:

-   NULL: The value is a NULL value.
-   INTEGER: The value is a signed integer.
-   REAL: The value is a floating-point number.
-   TEXT: The value is a text string, stored using the database encoding (UTF-8, UTF-16BE, or UTF-16LE).
-   BLOB: The value is a blob of data, stored exactly as it was input.
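
A quick way to watch type affinity at work is SQLite's built-in `typeof()` function, which reports the storage class of a value. The sketch below uses Python's `sqlite3` module; the table `demo` and its values are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (n INTEGER, t TEXT, r REAL)")

# Row 1: values that can be converted to each column's preferred type.
conn.execute("INSERT INTO demo VALUES ('42', 42, 7)")
# Row 2: strings that do not look like numbers are stored as text anyway.
conn.execute("INSERT INTO demo VALUES ('forty-two', 'hello', 'seven')")
# Row 3: NULL and BLOB values are never converted.
conn.execute("INSERT INTO demo VALUES (NULL, x'CAFE', 1.5)")

rows = conn.execute(
    "SELECT typeof(n), typeof(t), typeof(r) FROM demo ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
# ('integer', 'text', 'real')
# ('text', 'text', 'text')
# ('null', 'blob', 'real')
```

In the first row the string `'42'` becomes an integer and the integer `7` becomes a real number, because those conversions are lossless. In the second row nothing can be converted, so SQLite quietly stores text in the INTEGER and REAL columns instead of raising an error.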

Enterprise-scale RDBMSs (Oracle, SQL Server, MySQL, Postgres) have more complex ways of storing data and enforce declared datatypes more strictly. However, so long as you write your code in ANSI-standard SQL, it should work with most of them with little or no modification.
