<a href="https://colab.research.google.com/github/brendanpshea/database_sql/blob/main/Database_05_Data_Modeling_and_ER_Diagrams.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Data Modeling and E-R Diagrams
### Brendan Shea, PhD


**Data modeling** is like building the blueprints for a house. But instead of creating spaces for people to live in, we're making space for data to live in a database. This chapter will guide you through the steps of drawing up these blueprints for your database, starting from your initial concept to the final design.

To make things more interesting and practical, we're going to walk through this process using a case study. We'll be helping Wednesday Addams, who wants to build a database for her online shop selling quirky magical items. By the end of this chapter, you'll be equipped with the knowledge to create your own robust and efficient database designs.

## Introduction to the Case Study: Wednesday's "Web Shop"
To aid us in our exploration of data modeling, we're going to enlist the help of the famous Wednesday Addams.

No stranger to the peculiar, Wednesday has a fascinating project in her hands - she's creating an e-commerce "Web shop" for trading all things magic. Not your everyday magic rabbits or hats, mind you, but the kind of magic that makes the hairs on the back of your neck stand up. Be it a broom that sweeps itself or a teapot that sings cryptic riddles, Wednesday's website will be a marketplace for items steeped in enchantment.

Now, magic items aren't your usual commodities. Each possesses its own set of peculiar traits, so Wednesday will need a well-planned database to keep track of them. So, where does she start? What does she need to consider?

For starters, Wednesday will need a Data Model - a kind of visual map that outlines how all the data within her website will relate to each other. This model will serve as the blueprint for her entire operation. Will she need a separate category for potions and amulets, or should they all be under 'items'? The Data Model can help decide that.

Next, she will consider Conceptual Modeling. This is a way of simplifying the complexity of her data into a higher-level diagram. With an array of enchanted items, from self-pouring teapots to mirrors that tell uncomfortable truths, a conceptual model will help Wednesday and her team to get a general understanding of the data without getting entangled in details.

Then, there are the Business Rules. These are the laws that govern her database, defining how it operates. For instance, should an item be held by multiple owners at the same time? How would trades be conducted? Business rules help answer these questions and provide structure to the data. They can be Structural, which dictates how the data is organized, or Procedural, which dictates how the operations are performed.

To visualize her database and its relationships more intuitively, Wednesday will rely on Entity-Relationship Diagrams (ERDs). Using ERDs, she can represent entities - like witches, potions, or flying brooms - and show the relationships between them. Whether she uses Chen's notation or Crow's Foot, these diagrams will be invaluable in understanding the structure of her data.

Distinguishing between entities and their attributes is also key. Entities are the things she wants to keep track of - like a magic item or a customer. Attributes are the specific pieces of data associated with each entity - like the item's age or the customer's name. Some attributes may be mandatory, like the name of an item, while others may be optional, like its curse status.

Understanding relationships between entities and deciding on Unique Identifiers (UIDs) and Primary Keys will also be part of her journey. Relationships determine how one entity is associated with another. UIDs and primary keys, meanwhile, help in uniquely identifying each record in her database.

Wednesday's challenge doesn't stop at the Conceptual model. She will also have to build a Physical Model, which includes specific technical details about how her database will be implemented.

This chapter will delve into these concepts, using Wednesday's peculiar case as a backdrop. We’ll uncover the intricacies of managing many-to-many relationships, choosing appropriate data types, and much more. We hope you're ready for a fun, macabre, and informative ride! After all, as Wednesday would remind us, "Life's not all lovely thorns and singing vultures, you know."

## Data Models as Blueprints

"Imagine building a haunted house without a blueprint," Wednesday Addams might say, "Sure, you could probably throw something together, but will it withstand the first ghoulish storm or the weight of an attic full of bewitched artifacts?" Similarly, a **data model** serves as the blueprint in the world of databases. It's a simplified representation---usually visual, mathematical, or symbolic---of complex real-world data structures and the relationships between them.

British statistician George E.P. Box once said, "All models are wrong, but some are useful." He meant that while no model can perfectly encapsulate the complexity and unpredictability of the real world, good models can simplify that complexity into something understandable and useful. They help us predict outcomes, make informed decisions, and understand systems better.

When it comes to databases, data models are indispensable tools for several reasons:

1.  *Planning and Structuring Data:* A data model helps organize data systematically and logically. This aids in understanding what data is needed and how it should be organized. For Wednesday, this could mean figuring out how to categorize her array of magic items and their various attributes in a coherent way.

2.  *Increasing Efficiency:* A well-planned data model enhances the efficiency of a database by reducing data redundancy and improving data integrity. It helps ensure that the system can effectively store, retrieve, and manage data. This means that when a witch in Timbuktu orders a self-sweeping broom at midnight, the order is processed smoothly without any hitches.

3.  *Improving Communication:* Data models provide business and technical teams a shared language. They serve as a collaboration tool, helping stakeholders visualize data structures and validate requirements. So, whether it's Wednesday's coven of web developers or the users of her magical marketplace, everyone can better understand how the data works.

So how might a data model aid Wednesday in designing her magic item shopping site? Well, imagine she's got an ancient wand listed on her website. This wand could have numerous attributes: its material, its age, the spells it can cast, its price, and so on. Now, how does she link this wand to its previous owners, or to the customers interested in buying it? And what about the transaction data? The answers to these questions lie in creating a robust data model, which will help her design an efficient, user-friendly website that even the most discerning witches and wizards would enjoy using.

Remember, a data model is not an end in itself but a means to an end. As Wednesday's peculiar sense of humor might prompt her to say, "A good data model is like a good tombstone. It stands the test of time, and even if it doesn't tell the whole story, it provides enough information to pique one's curiosity."

## Conceptual Modeling: A Bird's-Eye View of Your Data Universe

Just as Wednesday Addams might lay out the pieces of a disquieting jigsaw puzzle of a graveyard on a cold, gloomy evening, conceptual modeling involves arranging complex data into a clear, comprehensive picture. It's all about creating a high-level, simplified diagram of a database. This model doesn't delve into the nitty-gritty of technical details but instead focuses on the big picture: the critical entities in the database and their relationships.

Let's take a moment to decode that a bit. When we speak about **entities**, we're referring to the significant objects or concepts in our data universe. In Wednesday's case, an entity could be a 'magic item', a 'customer', or a 'transaction'.  On the other hand, **relationships** are the associations or interactions between these entities. For example, a 'customer' might 'purchase' a 'magic item', forming a relationship between those two entities.

A **conceptual (logical)** model captures these entities and relationships in a manner that's easily understood, not just by database designers or developers, but by non-technical stakeholders too. It's an invaluable tool that fosters better team communication and understanding.

But how does Wednesday Addams, purveyor of all things magical and mysterious, create a conceptual model for her magic item website?

- First, she'd identify the key entities in her database. These could be 'magic items', 'customers', 'transactions', 'suppliers', or anything else significant to her website. Each entity would be a major category of data that she needs to keep track of.

- Next, she'd identify the relationships between these entities. Does a customer 'buy' a magic item? Does a supplier 'supply' a magic item? These relationships would illustrate how the different entities interact with each other.

- Then, she'd represent all these entities and relationships in a high-level, easy-to-understand diagram. This might look like a bunch of boxes (representing entities) connected by lines (representing relationships), but to Wednesday, it would be an essential roadmap for her website's data structure.
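
The three steps above can be sketched in a few lines of Python. This is only an illustration, not a modeling tool: the `ConceptualModel` class and its method names are hypothetical, and the entity and verb names are taken from the examples in this section.

```python
from dataclasses import dataclass, field


@dataclass
class ConceptualModel:
    """A tiny in-memory stand-in for a boxes-and-lines diagram."""
    entities: set = field(default_factory=set)
    # Each relationship is a (subject entity, verb, object entity) triple.
    relationships: list = field(default_factory=list)

    def add_entity(self, name):
        self.entities.add(name)

    def relate(self, subject, verb, obj):
        # A line can only connect boxes that already exist.
        for name in (subject, obj):
            if name not in self.entities:
                raise ValueError(f"Unknown entity: {name}")
        self.relationships.append((subject, verb, obj))

    def describe(self):
        return [f"{s} --{v}--> {o}" for s, v, o in self.relationships]


# Step 1: identify the key entities.
model = ConceptualModel()
for entity_name in ("Customer", "Magic Item", "Transaction", "Supplier"):
    model.add_entity(entity_name)

# Step 2: identify the relationships between them.
model.relate("Customer", "buys", "Magic Item")
model.relate("Supplier", "supplies", "Magic Item")

# Step 3: render a (text-only) picture of the model.
for line in model.describe():
    print(line)
```

Notice that nothing here mentions data types, keys, or tables. That omission is deliberate: at the conceptual level, only the boxes and the lines matter.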

By creating this conceptual model, Wednesday would establish a clear, coherent view of her website's data. It would be a simplified snapshot of her entire database, giving her and her team a strong foundation to build upon. And as Wednesday might dryly note, "Just like planning a good prank, the success of a project often lies in the preparation."

## Business Rules: The Laws Governing Your Data Universe

Imagine a rule at the Addams Family dinner table: "Always compliment Morticia's deadly nightshade salad." That's the household equivalent of what databases call a business rule. Precisely defined, **business rules** are explicit, actionable, and business-specific criteria that provide a robust framework for an organization's decision-making, behavior, and operations. They act as the structural and operational directives guiding a system. (Note: In the context of data modeling, a "business" might be anything from a scientific research project to a video game; it doesn't necessarily have to involve money or sales.)

In conceptual modeling, business rules play a fundamental role. They shape the structure and flow of data and precisely define business operations, directly influencing how entities and relationships are formed and manipulated. Comprehending these business rules aids in creating a data model that mirrors the realities of the business.

Business rules can be of two types:

1.  **Structural Rules:** These rules pertain to the organization of data within the system. They outline the relationships between entities and stipulate how data elements correlate. An example in Wednesday's case could be "Each magic item must be affiliated with one category."

2.  **Procedural Rules:** These rules govern the operations performed within the system. They outline the guidelines for processes and transactions that interact with the data. For instance, a rule like "An item must be paid for before it can be shipped" is a procedural rule.

Business rules function as **constraints** on data: specific boundaries placed on the data or operations within the system. They serve to maintain data integrity and reliability. For instance, Wednesday could have a rule such as "The listed price of a magic item must be a positive number."

Now, considering Wednesday's uniquely peculiar magic item shop, here are five potential business rules:

1. Each magic item should have at least one associated magical property. (Structural Rule)
2.  A customer must provide a valid magical identification number to make a purchase. (Procedural Rule)
3.  A magic item cannot belong to multiple categories (e.g., potions, amulets). (Structural Rule)
4.  Customers can only purchase items if their magical credit score exceeds a certain threshold. (Procedural Rule)
5.  All transactions must be recorded with the customer's ID, item ID, and the date of purchase. (Structural Rule)
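
Structural rules often end up as constraints in the eventual table definitions. The sketch below uses Python's built-in `sqlite3` module to show one way the "positive price" and "exactly one category" rules might be enforced. The table and column names (`category`, `magic_item`, and so on) and the helper `try_insert` are assumptions made for illustration; they are not part of Wednesday's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default

conn.executescript("""
CREATE TABLE category (
    category_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE magic_item (
    item_id     INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    -- Structural rule: each item belongs to exactly one existing category.
    category_id INTEGER NOT NULL REFERENCES category(category_id),
    -- Constraint: the listed price must be a positive number.
    price       REAL NOT NULL CHECK (price > 0)
);
INSERT INTO category (category_id, name) VALUES (1, 'Potions');
""")


def try_insert(name, category_id, price):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO magic_item (name, category_id, price) VALUES (?, ?, ?)",
            (name, category_id, price),
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Self-Sweeping Broom", 1, 120.0))   # satisfies every rule
print(try_insert("Free Curse", 1, -5.0))             # breaks CHECK (price > 0)
print(try_insert("Orphan Amulet", 99, 10.0))         # category 99 does not exist
print(try_insert("Uncategorized Wand", None, 10.0))  # breaks NOT NULL on category_id
```

Procedural rules, such as "an item must be paid for before it can be shipped," usually live in application logic or stored procedures rather than in a column definition, so they are not shown here.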

These business rules, whether invisible directives or explicit constraints, are the governing principles for Wednesday's magic item shop. Like a well-spun spiderweb, they provide a consistent, reliable, and efficient framework for managing data. Or as Wednesday might put it, "In a world where the norm is peculiar, rules are your only constant."

## Entity-Relationship Diagrams: Mapping the Data Landscape

If the Addams Family mansion were a database, an Entity-Relationship Diagram (ERD) would be its floor plan. ERDs are graphical tools that allow us to map out and visualize the various entities in a database and the relationships between them. In Wednesday's case, creating an ERD for her magic item shop would help her better understand how different data elements interact with one another.

An ERD is essentially a collection of symbols representing entities (think: tables), attributes (think: columns), and relationships (think: connections). Now, there are several ways we can draw an ERD, and two popular notations are Chen's and Crow's Foot.

### Chen's Notation

Invented by Peter Chen in the 1970s, this notation represents entities as rectangles, attributes as ovals, and relationships as diamonds. Lines are used to link these symbols together. The relationship symbols also contain verbs that describe the interaction between entities. The basic symbols are:

- Entities are represented by RECTANGLES.
- Attributes are represented by OVALS.
- Attributes that are keys are UNDERLINED.
- Relationships are represented by DIAMONDS.
- Cardinality and optionality (more on these later) are represented by LABELS on LINES. "1" means "one," while "M" and "N" each mean "many."

Suppose Wednesday were using Chen's notation. She might have rectangles labeled 'Customer' and 'Item', an oval labeled 'Name', and a diamond labeled 'Buys' to represent the relationship between a customer and the items they purchase.

![Simple_Chen](https://github.com/brendanpshea/database_sql/raw/main/images/Wed_Chen_Simple.svg)



Here, the "M" and "N" indicate a "many-to-many" relationship between Customers and Items (this is the relationship's cardinality). That is, one customer can buy many items, and an item of a given type can be bought by many customers.
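
When a many-to-many relationship like 'Buys' is eventually implemented, a common approach is to give the diamond its own table, with one row per customer-item pair. The sketch below illustrates this with Python's built-in `sqlite3` module. The table names, column names, and sample rows are assumptions chosen for the example, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE item     (item_id     INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- One row per (customer, item) pair stands in for the 'Buys' diamond.
CREATE TABLE buys (
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    item_id     INTEGER NOT NULL REFERENCES item(item_id),
    PRIMARY KEY (customer_id, item_id)
);

INSERT INTO customer VALUES (1, 'Morticia'), (2, 'Gomez');
INSERT INTO item     VALUES (10, 'Self-Sweeping Broom'), (20, 'Singing Teapot');
INSERT INTO buys     VALUES (1, 10), (1, 20), (2, 10);
""")


def items_bought_by(customer_name):
    """One customer can buy many items."""
    rows = conn.execute(
        """SELECT item.name
           FROM buys
           JOIN customer ON customer.customer_id = buys.customer_id
           JOIN item     ON item.item_id = buys.item_id
           WHERE customer.name = ?
           ORDER BY item.name""",
        (customer_name,),
    )
    return [name for (name,) in rows]


def customers_who_bought(item_name):
    """One type of item can be bought by many customers."""
    rows = conn.execute(
        """SELECT customer.name
           FROM buys
           JOIN customer ON customer.customer_id = buys.customer_id
           JOIN item     ON item.item_id = buys.item_id
           WHERE item.name = ?
           ORDER BY customer.name""",
        (item_name,),
    )
    return [name for (name,) in rows]


print(items_bought_by("Morticia"))
print(customers_who_bought("Self-Sweeping Broom"))
```

Reading the `buys` table in one direction answers "what did this customer buy?", and reading it in the other answers "who bought this item?", which is exactly what the M and N labels promise.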

### Crow's Foot Notation

This notation, with its simplified design, is widely used for its ease of interpretation. In this notation:

- Entities are represented by RECTANGLES
- Relationships are represented by LINES directly connecting those rectangles
- Attributes are written inside the rectangles
- The 'Crow's Foot' - three lines at the end of the relationship line - represents the 'many' in a 'one-to-many' or 'many-to-many' relationship.
- A "one" (as opposed to "many") is a single line.

For example, Wednesday might connect a rectangle labeled 'Customer' to another labeled 'Item' with a line that has a crow's foot near the 'Item' rectangle. This would represent that one customer can buy many items.


Entity-Relationship diagrams can also be described by the use of **"ERDish" sentences**. These sentences allow us to express the elements of an ERD in a simple, English-like language. For example, some ERDish sentences might be:

1. "A Customer (entity) can Buy (relationship) many Items (entity)."
2. "A Customer (entity) Has a Name (attribute)."
3. "An Item (entity) can be Bought (relationship) by many Customers (entity)."

You don't necessarily need to include the labels such as "(entity)"; I've simply used them here to make the roles more explicit. This form of representation aids in making ERDs more readable and understandable, especially for stakeholders without a technical background.
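
Because ERDish sentences follow a fixed pattern, they can be generated mechanically from a description of a relationship. The following is a minimal sketch: the function `erdish` is hypothetical, it handles only "one" and "many," and it pluralizes naively by appending "s."

```python
def erdish(entity_a, verb, cardinality, entity_b):
    """Build one ERDish relationship sentence.

    cardinality must be "one" or "many"; plurals are formed by adding "s",
    which is good enough for names like Item or Customer but not for all nouns.
    """
    if cardinality not in ("one", "many"):
        raise ValueError("cardinality must be 'one' or 'many'")
    article = "An" if entity_a[0].lower() in "aeiou" else "A"
    if cardinality == "many":
        target = f"many {entity_b}s"
    else:
        target = f"one {entity_b}"
    return f"{article} {entity_a} can {verb} {target}."


print(erdish("Customer", "Buy", "many", "Item"))
print(erdish("Item", "be Bought by", "many", "Customer"))
```

A many-to-many relationship always produces a pair of sentences like these, one read in each direction.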


![Simple_Crow](https://github.com/brendanpshea/database_sql/raw/main/images/Wed_Crow_Simple.svg)

As Wednesday Addams starts designing her ERD, she'd choose the notation that suits her best and start defining her entities (e.g., Customers, Items, Transactions), their attributes (e.g., Name, Price, Date), and their relationships (e.g., Buys, Sells, Contains). Her ERD would help her visualize her database structure, enabling her to understand how data flows and interacts within her system.

## Entities and Attributes: Defining the Characters of Your Database Story

As Wednesday knows well, every member of the Addams Family has their unique quirks, just as entities in a database have their distinctive attributes. Entities represent the 'things' or objects we want to collect data about, while attributes are the specific data points associated with each entity.

In the context of Wednesday's magic item shop, potential entities could include 'Customers', 'Magic Items', 'Transactions', and 'Suppliers'. Each of these entities would have its unique attributes. For example, the 'Customer' entity might possess attributes such as 'Name', 'Contact Information', and 'Magical Affiliation', whereas the 'Magic Items' entity could have attributes like 'Item Name', 'Price', 'Category', and 'Cursed Status'.

During the identification process of attributes, Wednesday would distinguish between:

1.  **Mandatory** vs **Optional**: Mandatory attributes must have a value for every instance of an entity. In contrast, optional attributes might not always have a value. For example, the 'Customer Name' could be a mandatory attribute, while 'Magical Affiliation' might be optional, given not all customers are magical beings.
    - In SQL, a mandatory attribute will eventually be coded with a NOT NULL constraint during the table creation process.
    - On Chen diagrams, optional attributes are sometimes indicated with dotted (as opposed to solid) lines.

2.  **Volatile** vs **Non-volatile**: Volatile attributes change frequently, like 'Item Quantity' for the 'Magic Items' entity, whereas non-volatile attributes are relatively stable, such as 'Item Name'.
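
To see the mandatory/optional distinction in action, here is a small sketch using Python's built-in `sqlite3` module. The table layout, column names, and sample values are assumptions made for illustration; only the `NOT NULL` behavior is the point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    customer_id         INTEGER PRIMARY KEY,
    name                TEXT NOT NULL,  -- mandatory attribute
    magical_affiliation TEXT            -- optional attribute: NULL is allowed
)
""")

# Optional attribute omitted: the row is accepted and the value is stored as NULL.
conn.execute("INSERT INTO customer (name) VALUES ('Pugsley')")

# Mandatory attribute omitted: the NOT NULL constraint rejects the row.
try:
    conn.execute(
        "INSERT INTO customer (magical_affiliation) VALUES ('Coven of the North')"
    )
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)

rows = conn.execute("SELECT name, magical_affiliation FROM customer").fetchall()
print(rows)
```

Volatility, by contrast, has no dedicated SQL keyword. It is a design consideration (for example, when choosing keys) rather than something declared on a column.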

After pinpointing her entities and their attributes, Wednesday could craft ERDish sentences to illustrate her data model more understandably:

1.  "A Customer (entity) [Mandatory] has a Name (attribute)."
2.  "A Magic Item (entity) [Optional] may have a Cursed Status (attribute)."
3.  "A Supplier (entity) [Mandatory] has Contact Information (attribute)."
4.  "A Transaction (entity) [Volatile] can change its Total Amount (attribute)."
5.  "A Customer (entity) [Non-volatile] maintains a Stable Magical Affiliation (attribute)."

In database design, like the crafting of a mysterious potion, every ingredient matters. By meticulously identifying entities and their attributes, Wednesday lays the foundation for an efficient and effective system. After all, even in the peculiar world of the Addams Family, a well-structured plan can make all the difference.


## Unique Identifiers (UIDs) and Primary Keys: The Magical Keys to Unlocking Organized Data

Let's delve back into the mystical world of Wednesday Addams, where every potion has a unique formula, and each customer in her magic item shop has a unique identity. In database terms, these distinct identities are known as Unique Identifiers (UIDs) or Primary Keys.

A **Unique Identifier (UID)** is a specific attribute that sets each record apart within a table. The UID that primarily identifies records within a table is recognized as a **Primary Key.** Remember, these are not just ordinary keys - they hold the power to uniquely identify each record, and no two records can share the same value. EVERY TABLE MUST HAVE A PRIMARY KEY, MADE UP OF ONE OR MORE ATTRIBUTES (the caps are because this is so important!).

Take Wednesday's 'Customer' entity for example, which has attributes like 'Name', 'Contact Information', 'Email', 'Magical Affiliation', 'Favorite Item', 'Last Purchase Date', and 'Referral Source'. At first glance, 'Email' stands out as a potential UID since it is unique for each customer. However, emails can change over time or even become invalid, leading to potential instability in our data. For this reason, Wednesday decides against using 'Email' as a primary key.

So, what's next for Wednesday?

1.  **Candidate UIDs** and **Primary Keys:** Given that none of the existing attributes perfectly fit the role of a UID, Wednesday chooses to create an artificial 'Customer ID'. This ID is an alphanumeric combination, such as 'CUST1234', and serves as the ideal candidate for both a UID and Primary Key.

2.  **Natural** vs **Artificial UIDs/Keys**: Wednesday had natural key options, like 'Email', derived from the data itself. However, considering their potential to change and cause instability, she opts for an artificial key. The 'Customer ID', an alphanumeric value, provides stable identification as it remains constant and isn't subject to change.

3.  **Simple** vs **Composite Keys:** A primary key can be simple (based on a single attribute) or composite (based on multiple attributes). Although 'Name' and 'Email' together could form a composite key, it could still suffer from potential instability if the customer changes their name or email. Hence, Wednesday selects a simple key, 'Customer ID', as her primary key.

4.  **Integer Keys** vs **Alphanumeric Keys:** When creating artificial keys, Wednesday could choose integers (e.g., 1, 2, 3) or alphanumeric values (e.g., CUST1234). She opts for the alphanumeric route because it adds context ('CUST' signals a customer ID) and because, for a fixed number of characters, mixing letters and digits allows more unique values than digits alone. The trade-off is that integer keys are typically more compact and simpler to generate.

In the process of choosing keys, Wednesday considers these guiding principles:

-   Uniqueness: The key should uniquely identify each record.
-   Stability: The key should remain stable over time.
-   Simplicity: Ideally, the key should be a single attribute or a minimal combination of attributes.
-   Consistency: The same key should be used consistently across different tables to represent the same data.

After considering all these factors, Wednesday decides that 'Customer ID' is the perfect Primary Key for her 'Customer' table. It's as unique as a magical sigil, as stable as a centuries-old spell, and as simple and consistent as the elemental symbols in her spellbook.
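
The sketch below shows how these principles might play out, using Python's built-in `sqlite3` module. The ID scheme ('CUST' plus a zero-padded counter), the helper functions `next_customer_id` and `add_customer`, and the sample email addresses are all assumptions for illustration. Counting rows to produce the next ID is a simplification that would break once customers are deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE customer (
    -- Artificial, simple, alphanumeric primary key. NOT NULL is spelled out
    -- because SQLite would otherwise permit NULL in a non-integer primary key.
    customer_id TEXT PRIMARY KEY NOT NULL,
    name        TEXT NOT NULL,
    email       TEXT NOT NULL UNIQUE  -- natural candidate key, allowed to change
)
""")


def next_customer_id():
    """Hypothetical ID scheme: 'CUST' followed by a zero-padded counter."""
    (count,) = conn.execute("SELECT COUNT(*) FROM customer").fetchone()
    return f"CUST{count + 1:04d}"


def add_customer(name, email):
    customer_id = next_customer_id()
    conn.execute(
        "INSERT INTO customer (customer_id, name, email) VALUES (?, ?, ?)",
        (customer_id, name, email),
    )
    return customer_id


first = add_customer("Wednesday", "wednesday@example.com")
second = add_customer("Pugsley", "pugsley@example.com")

# Stability: the email can change while the primary key stays the same.
conn.execute(
    "UPDATE customer SET email = ? WHERE customer_id = ?",
    ("w.addams@example.com", first),
)

# Uniqueness: a second row with an existing key is rejected.
try:
    conn.execute(
        "INSERT INTO customer (customer_id, name, email) VALUES (?, ?, ?)",
        (first, "Impostor", "impostor@example.com"),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first, second, duplicate_rejected)
```

Keeping `email` as `UNIQUE` rather than as the primary key preserves the fact that it is a candidate key while leaving other tables free to reference the stable `customer_id`.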

As every sorcerer needs a unique name to control their magic, every record in a database needs a unique identifier. And so, Wednesday, armed with her carefully chosen UIDs and Primary Keys, is one step closer to mastering the arcane art of databases.

On ERDs, keys are indicated by being UNDERLINED (in Chen's notation) or explicitly labeled (in Crow's Foot notation).