# Worksheet 2: Data modelling with MongoDB

### Exercise 1: Identify documents and collections

#### Case Study: Online Bookstore

You are tasked with designing a MongoDB database for an online bookstore. The bookstore sells books, and each book can have multiple authors. Customers can purchase books, and each purchase can contain multiple books. Additionally, customers can leave reviews for books they have purchased.

#### Objectives:
1. Design the data model for the online bookstore.
2. Define the relationships between different entities.
3. Create sample documents for each collection.

#### Step-by-Step Instructions:

1. **Identify Entities and Relationships:**
    - **Books**: Each book has a title, ISBN, publication date, and a list of authors.
    - **Authors**: Each author has a name and a list of books they have written.
    - **Customers**: Each customer has a name, email, and a list of purchases.
    - **Purchases**: Each purchase has a date, customer reference, and a list of books purchased.
    - **Reviews**: Each review has a rating, comment, customer reference, and book reference.

2. **Design the Data Model:**
    - **Books Collection**:
        ```json
        {
            "_id": ObjectId,
            "title": String,
            "ISBN": String,
            "publication_date": Date,
            "authors": [ObjectId]  // References to Authors
        }
        ```
    - **Authors Collection**:
        ```json
        {
            "_id": ObjectId,
            "name": String,
            "books": [ObjectId]  // References to Books
        }
        ```
    - **Customers Collection**:
        ```json
        {
            "_id": ObjectId,
            "name": String,
            "email": String,
            "purchases": [ObjectId]  // References to Purchases
        }
        ```
    - **Purchases Collection**:
        ```json
        {
            "_id": ObjectId,
            "date": Date,
            "customer_id": ObjectId,  // Reference to Customer
            "books": [ObjectId]  // References to Books
        }
        ```
    - **Reviews Collection**:
        ```json
        {
            "_id": ObjectId,
            "rating": Number,
            "comment": String,
            "customer_id": ObjectId,  // Reference to Customer
            "book_id": ObjectId  // Reference to Book
        }
        ```

3. **Create Sample Documents:**

    - **Books Collection**:
        ```json
        {
            "_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1a"),
            "title": "MongoDB Basics",
            "ISBN": "1234567890",
            "publication_date": ISODate("2021-01-01T00:00:00Z"),
            "authors": [ObjectId("60c72b2f9b1d8b3a4c8e4d1b")]
        }
        ```

    - **Authors Collection**:
        ```json
        {
            "_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1b"),
            "name": "John Doe",
            "books": [ObjectId("60c72b2f9b1d8b3a4c8e4d1a")]
        }
        ```

    - **Customers Collection**:
        ```json
        {
            "_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1c"),
            "name": "Jane Smith",
            "email": "jane.smith@example.com",
            "purchases": [ObjectId("60c72b2f9b1d8b3a4c8e4d1d")]
        }
        ```

    - **Purchases Collection**:
        ```json
        {
            "_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1d"),
            "date": ISODate("2021-06-01T00:00:00Z"),
            "customer_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1c"),
            "books": [ObjectId("60c72b2f9b1d8b3a4c8e4d1a")]
        }
        ```

    - **Reviews Collection**:
        ```json
        {
            "_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1e"),
            "rating": 5,
            "comment": "Great book on MongoDB!",
            "customer_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1c"),
            "book_id": ObjectId("60c72b2f9b1d8b3a4c8e4d1a")
        }
        ```
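The sample documents above point at each other through their `_id` values, so a quick consistency check can catch a mistyped reference. The following is a minimal sketch, not part of the required solution. It assumes the 24-character hex strings stand in for `ObjectId` values (a real notebook would more likely use PyMongo's `bson.ObjectId`) and omits date fields for brevity. The `REFERENCES` table and the helper `unresolved_references` are hypothetical names introduced for illustration.

```python
# Plain-dict mirror of the sample documents.
# Assumption: hex strings stand in for ObjectId values; date fields are omitted.
BOOK_ID = "60c72b2f9b1d8b3a4c8e4d1a"
AUTHOR_ID = "60c72b2f9b1d8b3a4c8e4d1b"
CUSTOMER_ID = "60c72b2f9b1d8b3a4c8e4d1c"
PURCHASE_ID = "60c72b2f9b1d8b3a4c8e4d1d"
REVIEW_ID = "60c72b2f9b1d8b3a4c8e4d1e"

db = {
    "books": [{"_id": BOOK_ID, "title": "MongoDB Basics",
               "ISBN": "1234567890", "authors": [AUTHOR_ID]}],
    "authors": [{"_id": AUTHOR_ID, "name": "John Doe", "books": [BOOK_ID]}],
    "customers": [{"_id": CUSTOMER_ID, "name": "Jane Smith",
                   "email": "jane.smith@example.com",
                   "purchases": [PURCHASE_ID]}],
    "purchases": [{"_id": PURCHASE_ID, "customer_id": CUSTOMER_ID,
                   "books": [BOOK_ID]}],
    "reviews": [{"_id": REVIEW_ID, "rating": 5,
                 "comment": "Great book on MongoDB!",
                 "customer_id": CUSTOMER_ID, "book_id": BOOK_ID}],
}

# (collection, field) -> collection that the field refers to
REFERENCES = {
    ("books", "authors"): "authors",
    ("authors", "books"): "books",
    ("customers", "purchases"): "purchases",
    ("purchases", "customer_id"): "customers",
    ("purchases", "books"): "books",
    ("reviews", "customer_id"): "customers",
    ("reviews", "book_id"): "books",
}


def unresolved_references(database):
    """Return (collection, doc_id, field, ref) for every reference with no target."""
    ids = {name: {doc["_id"] for doc in docs} for name, docs in database.items()}
    missing = []
    for (collection, field), target in REFERENCES.items():
        for doc in database[collection]:
            value = doc[field]
            refs = value if isinstance(value, list) else [value]
            for ref in refs:
                if ref not in ids[target]:
                    missing.append((collection, doc["_id"], field, ref))
    return missing


print(unresolved_references(db))  # prints [] when every reference resolves
```

An empty list means every reference in the sample data has a matching document in the target collection.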

### Exercise 2: Identify database workload

In this exercise, you will extend your data modeling skills by identifying entities and attributes, quantifying entities, and analyzing read and write operations for different types of application users. You will also map these operations to application flows and quantify them.

#### Objectives:
1. Identify entities and their attributes.
2. Quantify the number of entities.
3. Identify read and write operations for different types of application users.
4. Map read and write operations to application flows.
5. Quantify the read and write operations.

#### Step-by-Step Instructions:

1. **Identify Entities and Attributes:**
    - **Books**: title, ISBN, publication_date, authors
    - **Authors**: name, books
    - **Customers**: name, email, purchases
    - **Purchases**: date, customer_id, books
    - **Reviews**: rating, comment, customer_id, book_id

2. **Quantify Entities:**
    - Estimate the number of each entity in the system:
        - **Books**: 10,000
        - **Authors**: 2,000
        - **Customers**: 50,000
        - **Purchases**: 200,000
        - **Reviews**: 500,000

3. **Identify Reads and Writes for Different App Users:**
    - **Customers**:
        - **Reads**:
            - Browse books
            - View book details
            - View purchase history
            - Read reviews
        - **Writes**:
            - Make a purchase
            - Leave a review
    - **Admin**:
        - **Reads**:
            - View all books
            - View all customers
            - View all purchases
        - **Writes**:
            - Add new books
            - Update book details
            - Remove books

4. **Map Reads and Writes to Application Flows:**
    - **Customer Flows**:
        - **Browse Books**:
            - Read operation
        - **View Book Details**:
            - Read operation for book details
            - Read operation for reviews
        - **View Purchase History**:
            - Read operation for purchase history
        - **Read Reviews**:
            - Read operation for reviews
        - **Make a Purchase**:
            - Write operation for purchase
        - **Leave a Review**:
            - Write operation for review
    - **Admin Flows**:
        - **View All Books**:
            - Read operation for all books
        - **View All Customers**:
            - Read operation for all customers
        - **View All Purchases**:
            - Read operation for all purchases
        - **Add New Books**:
            - Write operation for new book
        - **Update Book Details**:
            - Write operation for updating book details
        - **Remove Books**:
            - Write operation for removing book

5. **Quantify Reads and Writes:**
    - **Customers**:
        - **Reads**:
            - Browse Books: 100,000 reads/day
            - View Book Details: 50,000 reads/day
            - View Purchase History: 10,000 reads/day
            - Read Reviews: 30,000 reads/day
        - **Writes**:
            - Make a Purchase: 5,000 writes/day
            - Leave a Review: 2,000 writes/day
    - **Admin**:
        - **Reads**:
            - View All Books: 500 reads/day
            - View All Customers: 200 reads/day
            - View All Purchases: 300 reads/day
        - **Writes**:
            - Add New Books: 50 writes/day
            - Update Book Details: 100 writes/day
            - Remove Books: 20 writes/day
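The estimates in step 5 can be rolled up into totals and a read-to-write ratio, which is usually the first number consulted when deciding how to shape a data model. The sketch below simply copies the figures from the list above. The dictionary keys are hypothetical labels, and the per-second rates assume traffic spread evenly over 24 hours, which real workloads rarely are.

```python
# Daily operation counts copied from step 5 (keys are illustrative labels).
reads_per_day = {
    "browse_books": 100_000,
    "view_book_details": 50_000,
    "view_purchase_history": 10_000,
    "read_reviews": 30_000,
    "admin_view_all_books": 500,
    "admin_view_all_customers": 200,
    "admin_view_all_purchases": 300,
}
writes_per_day = {
    "make_purchase": 5_000,
    "leave_review": 2_000,
    "admin_add_book": 50,
    "admin_update_book": 100,
    "admin_remove_book": 20,
}

SECONDS_PER_DAY = 24 * 60 * 60

total_reads = sum(reads_per_day.values())
total_writes = sum(writes_per_day.values())
read_write_ratio = total_reads / total_writes

# Assumption: load is uniform across the day; peak rates will be higher.
reads_per_second = total_reads / SECONDS_PER_DAY
writes_per_second = total_writes / SECONDS_PER_DAY

print(f"reads/day:        {total_reads}")
print(f"writes/day:       {total_writes}")
print(f"read:write ratio: {read_write_ratio:.1f}")
print(f"avg reads/sec:    {reads_per_second:.2f}")
print(f"avg writes/sec:   {writes_per_second:.3f}")
```

With these estimates, reads outnumber writes by roughly 27 to 1, so the workload is read-heavy and the data model should favour cheap reads for the browse and book-detail flows.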

#### Submission:
- Provide a detailed list of entities and their attributes.
- Estimate the number of each entity in the system.
- Identify and list read and write operations for customers and admins.
- Map these operations to specific application flows.
- Quantify the read and write operations for each flow.

This exercise will help you understand the workload and data access patterns in a MongoDB-based application, enabling you to optimize your data model and queries for better performance.

### Exercise 3: Identifying and Modeling Relationships in MongoDB

#### Case Study: Online Bookstore (Continued)

In this exercise, you will identify one-to-one, one-to-many, and many-to-many relationships between entities in an online bookstore. You will analyze these entities to determine whether to embed or reference them using common guidelines. Finally, you will model these relationships using both embedded and referenced approaches.

#### Objectives:
1. Identify one-to-one, one-to-many, and many-to-many relationships between entities.
2. Analyze entities to determine whether to embed or reference using common guidelines.
3. Model embedded and referenced one-to-one, one-to-many, and many-to-many relationships.

#### Step-by-Step Instructions:

1. **Identify Relationships:**
    - **One-to-One**:
        - Each customer has one profile.
    - **One-to-Many**:
        - Each book can have multiple reviews.
        - Each customer can have multiple purchases.
    - **Many-to-Many**:
        - Each book can have multiple authors, and each author can write multiple books.

2. **Analyze Entities to Determine Embed or Reference:**
    - **One-to-One**:
        - **Customer and Profile**: Embed the profile within the customer document if the profile data is small and frequently accessed together with the customer data.
    - **One-to-Many**:
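        - **Book and Reviews**: Embed reviews within the book document if the number of reviews per book is small and bounded and the reviews are usually read together with the book. An array that grows without limit can push a document toward MongoDB's 16 MB document size limit, in which case referencing is the safer choice.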
        - **Customer and Purchases**: Reference purchases from the customer document if the number of purchases is large or if purchases are frequently accessed independently.
    - **Many-to-Many**:
        - **Books and Authors**: Use references to model the relationship since both books and authors are likely to be accessed independently and the relationship is many-to-many.

3. **Model Relationships:**

    - **One-to-One (Embedded)**:
        - **Customer and Profile**:
            ```json
            {
                "_id": ObjectId,
                "name": "Jane Smith",
                "email": "jane.smith@example.com",
                "profile": {
                    "address": "123 Main St",
                    "phone": "555-1234"
                }
            }
            ```

    - **One-to-One (Referenced)**:
        - **Customer and Profile**:
            ```json
            // Customer Document
            {
                "_id": ObjectId,
                "name": "Jane Smith",
                "email": "jane.smith@example.com",
                "profile_id": ObjectId("profileId")
            }

            // Profile Document
            {
                "_id": ObjectId("profileId"),
                "address": "123 Main St",
                "phone": "555-1234"
            }
            ```

    - **One-to-Many (Embedded)**:
        - **Book and Reviews**:
            ```json
            {
                "_id": ObjectId,
                "title": "MongoDB Basics",
                "ISBN": "1234567890",
                "reviews": [
                    {
                        "rating": 5,
                        "comment": "Great book on MongoDB!",
                        "customer_id": ObjectId("customerId")
                    },
                    {
                        "rating": 4,
                        "comment": "Very informative.",
                        "customer_id": ObjectId("anotherCustomerId")
                    }
                ]
            }
            ```

    - **One-to-Many (Referenced)**:
        - **Customer and Purchases**:
            ```json
            // Customer Document
            {
                "_id": ObjectId,
                "name": "Jane Smith",
                "email": "jane.smith@example.com",
                "purchases": [ObjectId("purchaseId1"), ObjectId("purchaseId2")]
            }

            // Purchase Document
            {
                "_id": ObjectId("purchaseId1"),
                "date": ISODate("2021-06-01T00:00:00Z"),
                "customer_id": ObjectId("customerId"),
                "books": [ObjectId("bookId1"), ObjectId("bookId2")]
            }
            ```

    - **Many-to-Many (Referenced)**:
        - **Books and Authors**:
            ```json
            // Book Document
            {
                "_id": ObjectId,
                "title": "MongoDB Basics",
                "ISBN": "1234567890",
                "authors": [ObjectId("authorId1"), ObjectId("authorId2")]
            }

            // Author Document
            {
                "_id": ObjectId("authorId1"),
                "name": "John Doe",
                "books": [ObjectId("bookId1"), ObjectId("bookId2")]
            }
            ```
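The practical difference between the two approaches shows up at read time: embedded data arrives with the parent document, while referenced data needs a second lookup. The sketch below imitates this with in-memory dictionaries instead of a live database. It assumes plain strings in place of `ObjectId` values, and the function names `book_with_authors` and `book_reviews` are hypothetical.

```python
# In-memory stand-ins for two collections, keyed by _id.
# Assumption: plain strings replace ObjectId values.
authors = {
    "authorId1": {"_id": "authorId1", "name": "John Doe", "books": ["bookId1"]},
    "authorId2": {"_id": "authorId2", "name": "Jane Smith", "books": ["bookId1"]},
}
books = {
    "bookId1": {
        "_id": "bookId1",
        "title": "MongoDB Basics",
        "authors": ["authorId1", "authorId2"],  # referenced (many-to-many)
        "reviews": [                             # embedded (one-to-many)
            {"rating": 5, "comment": "Great book on MongoDB!"},
            {"rating": 4, "comment": "Very informative."},
        ],
    },
}


def book_with_authors(book_id):
    """Resolve author references with a second lookup, as a follow-up query would."""
    book = books[book_id]
    names = [authors[author_id]["name"] for author_id in book["authors"]]
    return {"title": book["title"], "authors": names}


def book_reviews(book_id):
    """Embedded reviews come back with the book itself; no second lookup is needed."""
    return books[book_id]["reviews"]


print(book_with_authors("bookId1"))
print(len(book_reviews("bookId1")), "reviews read with the book document")
```

Against a real deployment, the second lookup would be an extra query or a `$lookup` aggregation stage, which is the cost that embedding avoids.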

#### Submission:
- Identify and list one-to-one, one-to-many, and many-to-many relationships between entities.
- Analyze each relationship to determine whether to embed or reference using common guidelines.
- Provide models for embedded and referenced one-to-one, one-to-many, and many-to-many relationships.

This exercise will help you understand how to model different types of relationships in MongoDB and make informed decisions about embedding or referencing data based on access patterns and data size.

### Exercise 4: Schema Design Patterns in MongoDB

#### Case Study: Online Bookstore (Continued)

In this exercise, you will explore advanced data modeling patterns in MongoDB, including the Inheritance Pattern, the Computed Pattern, the Approximation Pattern, the Extended Reference Pattern, and the Schema Versioning Pattern. You will apply these patterns to the online bookstore case study.

#### Objectives:
1. Understand and apply the Inheritance Pattern.
2. Understand and apply the Computed Pattern.
3. Understand and apply the Approximation Pattern.
4. Understand and apply the Extended Reference Pattern.
5. Understand and apply the Schema Versioning Pattern.

#### Step-by-Step Instructions:

1. **The Inheritance Pattern:**
    - **Scenario**: The bookstore sells different types of books, such as eBooks and printed books. Each type of book has some common attributes (e.g., title, ISBN) and some specific attributes (e.g., file format for eBooks, weight for printed books).
    - **Task**: Model the inheritance pattern for books.

    ```json
    // eBook document: common fields plus eBook-specific fields
    {
        "_id": ObjectId,
        "title": "MongoDB Basics",
        "ISBN": "1234567890",
        "type": "eBook",
        "file_format": "PDF"
    }

    // Printed book document, stored in the same collection
    {
        "_id": ObjectId,
        "title": "MongoDB Basics",
        "ISBN": "1234567890",
        "type": "Printed",
        "weight": "500g"
    }
    ```

2. **The Computed Pattern:**
    - **Scenario**: The bookstore wants to store the average rating of each book to avoid recalculating it every time a book is queried.
    - **Task**: Model the computed pattern for storing the average rating of books.

    ```json
    {
        "_id": ObjectId,
        "title": "MongoDB Basics",
        "ISBN": "1234567890",
        "average_rating": 4.5 // Computed field
    }
    ```

3. **The Approximation Pattern:**
    - **Scenario**: The bookstore wants to track how many times each book has been viewed. An exact count would require a write on every page view, so an approximate count that is updated less often is acceptable.
    - **Task**: Model the approximation pattern for storing the view count of books.

    ```json
    {
        "_id": ObjectId,
        "title": "MongoDB Basics",
        "ISBN": "1234567890",
        "view_count": 1000 // Approximate count
    }
    ```

4. **The Extended Reference Pattern:**
    - **Scenario**: The bookstore wants to copy the frequently read author fields (such as name and a short bio) into the book document, so that displaying a book does not require a second query against the authors collection.
    - **Task**: Model the extended reference pattern for storing author information within the book document.

    ```json
    {
        "_id": ObjectId,
        "title": "MongoDB Basics",
        "ISBN": "1234567890",
        "authors": [
            {
                "_id": ObjectId("authorId1"),
                "name": "John Doe",
                "bio": "Author bio here"
            },
            {
                "_id": ObjectId("authorId2"),
                "name": "Jane Smith",
                "bio": "Author bio here"
            }
        ]
    }
    ```

5. **The Schema Versioning Pattern:**
    - **Scenario**: The bookstore's schema for books has evolved over time, and you need to handle multiple versions of the schema.
    - **Task**: Model the schema versioning pattern for handling different versions of the book schema.

    ```json
    // Version 1
    {
        "_id": ObjectId,
        "title": "MongoDB Basics",
        "ISBN": "1234567890",
        "version": 1
    }

    // Version 2
    {
        "_id": ObjectId,
        "title": "MongoDB Basics",
        "ISBN": "1234567890",
        "publication_date": ISODate("2021-01-01T00:00:00Z"),
        "version": 2
    }
    ```
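With the Schema Versioning Pattern, the application reads the `version` field and adapts older documents to the latest shape. The following is a minimal sketch of that idea using plain dictionaries. The helper name `upgrade_book`, the choice of `None` as the placeholder for an unknown `publication_date`, and treating a missing `version` field as version 1 are all assumptions made for illustration.

```python
CURRENT_VERSION = 2


def upgrade_book(doc):
    """Return a copy of `doc` shaped like the latest schema version."""
    upgraded = dict(doc)
    # Assumption: documents written before versioning carry no field and count as version 1.
    version = upgraded.get("version", 1)
    if version < 2:
        # Version 2 added publication_date; old documents do not know it, so use None.
        upgraded.setdefault("publication_date", None)
        upgraded["version"] = 2
    return upgraded


v1_book = {"title": "MongoDB Basics", "ISBN": "1234567890", "version": 1}
v2_book = {
    "title": "MongoDB Basics",
    "ISBN": "1234567890",
    "publication_date": "2021-01-01T00:00:00Z",  # ISO string stands in for ISODate
    "version": 2,
}

print(upgrade_book(v1_book))
print(upgrade_book(v2_book) == v2_book)  # a current document passes through unchanged
```

Upgrading on read lets old and new documents coexist in one collection, and the upgraded copy can be written back later if a full migration is wanted.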

#### Submission:
- Apply the Inheritance Pattern to model different types of books.
- Apply the Computed Pattern to store the average rating of books.
- Apply the Approximation Pattern to store the view count of books.
- Apply the Extended Reference Pattern to store author information within the book document.
- Apply the Schema Versioning Pattern to handle different versions of the book schema.

This exercise will help you understand and apply advanced data modeling patterns in MongoDB, enabling you to design more efficient and scalable data models.
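The Computed and Approximation patterns from steps 2 and 3 both move work from read time to write time, and a short sketch can make the mechanics concrete. The code below works on a plain dictionary rather than a live collection. The `rating_count` field, the function names, and the batch size of 100 are assumptions that do not appear in the worksheet's sample documents.

```python
import random


def add_rating(book, rating):
    """Computed Pattern: refresh the stored average whenever a review is written."""
    count = book.get("rating_count", 0)  # assumed helper field
    average = book.get("average_rating", 0.0)
    book["rating_count"] = count + 1
    book["average_rating"] = (average * count + rating) / (count + 1)


def record_view(book, rng, batch=100):
    """Approximation Pattern: write on about 1 in `batch` views, adding `batch` each time."""
    if rng.randrange(batch) == 0:
        book["view_count"] = book.get("view_count", 0) + batch


book = {"title": "MongoDB Basics", "ISBN": "1234567890"}

for rating in (5, 4):
    add_rating(book, rating)

rng = random.Random(0)  # seeded so that repeated runs give the same result
for _ in range(10_000):
    record_view(book, rng)

print(book["average_rating"])  # 4.5
print(book["view_count"])      # a multiple of 100 near 10,000, not an exact count
```

In this sketch the view counter is written about 100 times instead of 10,000, at the cost of a count that is only statistically close to the true value.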

## Submission instructions

{rubric: mechanics = 5}

- Make sure the notebook can run from top to bottom without any error. Restart the kernel and run all cells.
- Commit and push your notebook to the GitHub repo.
- Double-check that your notebook renders properly on GitHub and that all outputs are clearly visible.
- Submit the URL of the GitHub repo that contains this worksheet to Moodle.