# Data Modeling

### What is data modeling?

* Process of creating a visual representation of a whole information system or parts of it to communicate connections between data points and structures
* Goal: Illustrate the types of data used/stored, the relationships among the data, and the ways the data can be grouped/organized
* Data Models are built around business needs!
* Data model rules and requirements are defined up front for a new model or incorporated into an existing one
* Collect information about business requirements -> translate into data structures
* Data modeling employs standardized schemas and formal techniques
  * Common, consistent, and predictable way of defining and managing data resources

### Types of Data Models

#### Conceptual Data Models (Highest Level / Least Specific)
* AKA Domain Models
* Offers a big-picture view!
  * What will the system contain?
  * How will this be organized?
  * What business rules are involved?
* Typically this is created when gathering initial project requirements
* Kept very basic, just enough to show the general relationships
```
  TIME   
    |
  SALES - PRODUCT
    |
  STORE
```
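A minimal sketch of the diagram above as data, assuming nothing beyond entity names and undirected links between them. The names `ENTITIES`, `RELATIONSHIPS`, and `related_to` are illustrative only.

```python
# Conceptual model: entity names and the links between them, nothing else.
ENTITIES = {"TIME", "SALES", "PRODUCT", "STORE"}

# Each pair is an undirected "is related to" link taken from the diagram.
RELATIONSHIPS = [
    ("TIME", "SALES"),
    ("SALES", "PRODUCT"),
    ("SALES", "STORE"),
]


def related_to(entity):
    """Return the sorted names of every entity directly linked to `entity`."""
    linked = set()
    for left, right in RELATIONSHIPS:
        if left == entity:
            linked.add(right)
        elif right == entity:
            linked.add(left)
    return sorted(linked)


print(related_to("SALES"))  # ['PRODUCT', 'STORE', 'TIME']
```

No attributes, keys, or data types appear yet; those belong to the logical and physical models.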

#### Logical Data Models
* Less abstract than Conceptual Data Models
* Generally provides greater detail about the concepts and relationships
* May indicate data attributes
* Will not specify technical system requirements
* Typically used in data warehouse design or reporting system development

```

| Date              |
|-------------------|
| Date Description  |
| Month             |
| Month Description |
| Year              |
| Week              |

         |

| Product ID (FK) |                    | Product ID           |
| Store ID (FK)   |                    |----------------------|
| Date (FK)       |                    | Product Description  |
|-----------------|          -         | Category             |
| Item Sold       |                    | Category Description |
| Sales Amount    |                    | Unit Price           |
| Week            |                    | Created              |
  
         |

| Store ID          |
|-------------------|
| Store Description |
| Region            |
| Region Name       |
| Created           |
```

#### Physical Data Models (Lowest Level / Most Specific)
* Will provide a schema for how the data is physically stored in a database
* Typically a finalized design that can be implemented as a relational database

```
| Date              | INTEGER      |
|-------------------|--------------|
| Date Description  | VARCHAR(30)  |
| Month             | INTEGER      |
| Month Description | VARCHAR(30)  |
| Year              | INTEGER      |
| Week              | INTEGER      |

              |

| Product ID (FK) | INTEGER  |             | Product ID           | INTEGER     |
| Store ID (FK)   | INTEGER  |             |----------------------|-------------|
| Date (FK)       | INTEGER  |             | Product Description  | VARCHAR(50) |
|-----------------|----------|       -     | Category             | INTEGER     |
| Item Sold       | INTEGER  |             | Category Description | VARCHAR(50) |
| Sales Amount    | FLOAT    |             | Unit Price           |   FLOAT     |
                                           | Created              |   DATE      |
  
               |
               
| Store ID          | INTEGER     |
|-------------------|-------------|
| Store Description | VARCHAR(50) |
| Region            | INTEGER     |
| Region Name       | VARCHAR(50) |
| Created           | DATE        |
```
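One way the physical model above could be implemented, sketched with Python's built-in `sqlite3` module. The table names (`date_dim`, `product`, `store`, `sales`), the snake_case column names, and the composite primary key on `sales` are assumptions made for this example; the column types are copied from the diagram.

```python
import sqlite3

# DDL for the four tables in the diagram (table names are assumed).
DDL = """
CREATE TABLE date_dim (
    date              INTEGER PRIMARY KEY,
    date_description  VARCHAR(30),
    month             INTEGER,
    month_description VARCHAR(30),
    year              INTEGER,
    week              INTEGER
);
CREATE TABLE product (
    product_id           INTEGER PRIMARY KEY,
    product_description  VARCHAR(50),
    category             INTEGER,
    category_description VARCHAR(50),
    unit_price           FLOAT,
    created              DATE
);
CREATE TABLE store (
    store_id          INTEGER PRIMARY KEY,
    store_description VARCHAR(50),
    region            INTEGER,
    region_name       VARCHAR(50),
    created           DATE
);
CREATE TABLE sales (
    product_id   INTEGER REFERENCES product(product_id),
    store_id     INTEGER REFERENCES store(store_id),
    date         INTEGER REFERENCES date_dim(date),
    item_sold    INTEGER,
    sales_amount FLOAT,
    PRIMARY KEY (product_id, store_id, date)  -- assumed composite key
);
"""

conn = sqlite3.connect(":memory:")        # throwaway in-memory database
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(DDL)

tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(tables)  # ['date_dim', 'product', 'sales', 'store']
```

Other database engines may need different type names or key syntax, so treat this as an illustration rather than a portable schema.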

### Data Modeling Process
* General Steps:
 1. Identify the entities
   * Every entity should be cohesive and logically discrete
 2. Identify key properties of each entity
   * What attributes are important to maintain?
 3. Identify relationships among entities
   * Specify the nature of these relationships
   * Ex: Customer ABC lives at 123 Street and uses XYZ payment method
 4. Map attributes to entities completely
 5. Assign keys as needed and decide on normalization requirements
   * Normalization reduces redundancy and storage space, but it can cost query performance, so the goal is to balance the two
 6. Finalize and validate the data model
   * Repeat and refine the data model
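
The steps above can be sketched as plain data plus a validation pass. Everything here is hypothetical: the entity names follow the customer example from step 3, and the `model`, `relationships`, and `validate` names exist only for this illustration.

```python
# Steps 1, 2, 4 and 5: entities, their key, and their attributes.
model = {
    "customer": {"key": "customer_id", "attributes": ["name"]},
    "address": {"key": "address_id", "attributes": ["street", "customer_id"]},
    "payment_method": {
        "key": "payment_method_id",
        "attributes": ["kind", "customer_id"],
    },
}

# Step 3: relationships as (child entity, foreign-key attribute, parent entity).
relationships = [
    ("address", "customer_id", "customer"),
    ("payment_method", "customer_id", "customer"),
]


def validate(model, relationships):
    """Step 6: return a list of problems; an empty list means the checks passed."""
    problems = []
    for child, fk, parent in relationships:
        if child not in model:
            problems.append(f"unknown entity: {child}")
            continue
        if parent not in model:
            problems.append(f"unknown entity: {parent}")
            continue
        if fk not in model[child]["attributes"]:
            problems.append(f"{child} is missing foreign key {fk}")
        if fk != model[parent]["key"]:
            problems.append(f"{fk} is not the key of {parent}")
    return problems


print(validate(model, relationships))  # []
```

A real validation step would also involve stakeholders reviewing the model; this only checks that the relationships are internally consistent.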
   
### Types of Data Modeling

#### Hierarchical Data Models
* Represents one-to-many relationships in a treelike format
```

          o
        /   \
      x       y
    /  \       \
   1    2       3
```
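A small sketch of the tree above as nested dictionaries, where each parent owns its children (one-to-many) and each child has exactly one parent. The `tree` and `paths` names are made up for this example.

```python
# Each key is a node; its value holds that node's children.
tree = {
    "o": {
        "x": {"1": {}, "2": {}},
        "y": {"3": {}},
    }
}


def paths(node, prefix=()):
    """Yield the path from the root to every leaf, depth first."""
    for name, children in node.items():
        here = prefix + (name,)
        if children:
            yield from paths(children, here)
        else:
            yield here


for path in paths(tree):
    print(" > ".join(path))
# o > x > 1
# o > x > 2
# o > y > 3
```

Reaching any record means walking down from the root, which is why this model suits strict parent-child data and little else.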

#### Relational Data Models
* Data is stored in tables, and rows in different tables are explicitly related through shared key columns
* Example: the physical data model above, with its tables joined on their foreign keys

* Entity-relationship (ER) data models:
  * Uses formal diagrams to represent the relationships between entities in a database
  * Emphasizes efficient storage
* Object-oriented data models
  * Objects are abstractions of real-world entities
  * Objects are grouped in class hierarchies with associated features
* Dimensional data models
  * Designed for optimizing data retrieval in a data warehouse
  * Makes it easier to locate information for reporting/retrieval

* Popular Dimensional Data Models
  * Star schema:
      * Data is organized into facts and dimensions
      * Each fact is surrounded by its dimensions in a star-like pattern
```
      a   b
       \ /
        o 
       / \
     c    d
```
  * Snowflake schema:
      * Like star but with additional layers of associated dimensions
      
```
      a  -  b
       \   /
      |  o  |
        /  \
      c  -  d
```
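To make the star pattern concrete, here is a minimal sketch using Python's built-in `sqlite3` module: one fact table (`sales`) joined to two dimension tables (`product`, `store`). The table names, columns, and sample rows are all invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
-- Dimension tables: descriptive context for each fact.
CREATE TABLE product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE store   (store_id   INTEGER PRIMARY KEY, region   TEXT);

-- Fact table: one row per sale, pointing at its dimensions.
CREATE TABLE sales (
    product_id   INTEGER REFERENCES product(product_id),
    store_id     INTEGER REFERENCES store(store_id),
    sales_amount REAL
);

INSERT INTO product VALUES (1, 'Toys'), (2, 'Books');
INSERT INTO store   VALUES (1, 'East'), (2, 'West');
INSERT INTO sales   VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5), (2, 1, 2.5);
""")

# A typical reporting query: join the fact to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT s.region, p.category, SUM(f.sales_amount)
    FROM sales AS f
    JOIN product AS p ON p.product_id = f.product_id
    JOIN store   AS s ON s.store_id   = f.store_id
    GROUP BY s.region, p.category
    ORDER BY s.region, p.category
""").fetchall()

print(rows)
# [('East', 'Books', 10.0), ('East', 'Toys', 10.0), ('West', 'Toys', 5.0)]
```

In a snowflake version of this sketch, `category` or `region` would move into its own table referenced from the dimension, adding one more join per extra layer.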

### Benefits of Data Modeling
* Makes it easier for stakeholders to understand data relationships
* Reduces errors in database development
* Increases consistency in documentation and system design
* Improves application and database performance
* Eases data mapping throughout the organization
* Speeds up and simplifies database design
