# Northwind Database Overview

Let's dive into the Northwind database, a classic sample dataset that simulates a small business's operations. We'll begin with **SQLite Browser** (officially named DB Browser for SQLite), a graphical tool for working with SQLite databases outside of Jupyter Notebooks.

* * * 

<div class="alert alert-success">  

## Learning Objectives

- Navigate and understand the structure of a relational database.
- Become familiar with relational database concepts such as primary keys, foreign keys, and junction tables.
</div>

### Sections
1. [Exploring the Database Schema](#ex1)
2. [Diagram of Northwind Database Schema](#ex2)
3. [Observations About the Schema](#ex3)
4. [Primary Key (PK) vs. Foreign Key (FK)](#ex4)
5. [Additional Resources](#resources)

# About the SQLite version of the Northwind Database

This workshop uses [the SQLite3 version of Microsoft's Northwind Database](https://github.com/jpwhite3/northwind-SQLite3/):

> This is a version of the Microsoft Access 2000 Northwind sample database, re-engineered for SQLite3.
> 
> The Northwind sample database was provided with Microsoft Access as a tutorial schema for managing small business customers, orders, inventory, purchasing, suppliers, shipping, and employees. Northwind is an excellent tutorial schema for a small-business ERP, with customers, orders, inventory, purchasing, suppliers, shipping, employees, and single-entry accounting.
>
> All the TABLES and VIEWS from the MSSQL-2000 version have been converted to Sqlite3 

<a id='ex1'></a>
## Exploring the Database Schema

**Objective:** Familiarize yourself with the structure of the Northwind database.

**Tasks:**
  - Click on the **"Database Structure"** tab in SQLite Browser.
  - Browse through the list of tables (e.g., `Customers`, `Orders`, `Products`, `Employees`, `Suppliers`).
  - For each table, view the schema to understand the columns and data types.

#### 🔔 Questions
  - How many tables are in the database?
  - What are the primary keys of the `Customers` and `Orders` tables?
  - Identify foreign key relationships between tables.
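
The same questions can also be checked from Python with the standard-library `sqlite3` module. The sketch below is only an illustration: it builds a tiny in-memory stand-in with two simplified tables (an assumption, not the full Northwind schema) so that it runs without the workshop's database file. The helper names `list_tables`, `primary_key_columns`, and `foreign_keys` are hypothetical.

```python
import sqlite3

# Hypothetical stand-in schema: two simplified Northwind-style tables in an
# in-memory database, so the sketch runs without the workshop's database file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (
        CustomerID  TEXT PRIMARY KEY,
        CompanyName TEXT
    );
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID TEXT REFERENCES Customers (CustomerID),
        OrderDate  TEXT
    );
""")


def list_tables(conn):
    """Return the names of all user tables, sorted alphabetically."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [name for (name,) in rows]


def primary_key_columns(conn, table):
    """Return the primary-key column names of a table, in key order."""
    # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk);
    # pk is 0 for non-key columns, otherwise the 1-based position in the key.
    info = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    key_cols = sorted((row[5], row[1]) for row in info if row[5] > 0)
    return [name for _, name in key_cols]


def foreign_keys(conn, table):
    """Return (column, referenced_table, referenced_column) for each FK."""
    # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
    fks = conn.execute(f'PRAGMA foreign_key_list("{table}")').fetchall()
    return [(row[3], row[2], row[4]) for row in fks]


print(list_tables(conn))
print(primary_key_columns(conn, "Orders"))
print(foreign_keys(conn, "Orders"))
```

To try this against the real database, replace `":memory:"` with the path to your local copy of the Northwind file and drop the `executescript` call. The double quotes around the table name in the `PRAGMA` statements let the helpers handle names that contain a space, such as `Order Details`.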



<a id='ex2'></a>
## Diagram of Northwind Database Schema

```mermaid
erDiagram
    CustomerCustomerDemo }o--|| CustomerDemographics : have
    CustomerCustomerDemo }o--|| Customers : through
    Employees }o--o| Employees : "reports to"
    Employees ||--o{ EmployeeTerritories : through
    Orders }o--|| Shippers : "ships via"
    "Order Details" }o--|| Orders : have
    "Order Details" }o--|| Products : contain
    Products }o--|| Categories : in
    Products }o--|| Suppliers : "supplied by"
    Territories }o--|| Regions : in
    EmployeeTerritories }o--|| Territories : have
    Orders }o--|| Customers : place
    Orders }o--|| Employees : "sold by"


    Categories {
        int CategoryID PK
        string CategoryName
        string Description
        blob Picture
    }
    CustomerCustomerDemo {
        string CustomerID PK, FK
        string CustomerTypeID PK, FK
    }
    CustomerDemographics {
        string CustomerTypeID PK
        string CustomerDesc
    }
    Customers {
        string CustomerID PK
        string CompanyName
        string ContactName
        string ContactTitle
        string Address
        string City
        string Region
        string PostalCode
        string Country
        string Phone
        string Fax
    }
    Employees {
        int EmployeeID PK
        string LastName
        string FirstName
        string Title
        string TitleOfCourtesy
        date BirthDate
        date HireDate
        string Address
        string City
        string Region
        string PostalCode
        string Country
        string HomePhone
        string Extension
        blob Photo
        string Notes
        int ReportsTo FK
        string PhotoPath
    }
    EmployeeTerritories {
        int EmployeeID PK, FK
        string TerritoryID PK, FK
    }
    "Order Details" {
        int OrderID PK, FK
        int ProductID PK, FK
        float UnitPrice
        int Quantity
        real Discount
    }
    Orders {
        int OrderID PK
        string CustomerID FK
        int EmployeeID FK
        datetime OrderDate
        datetime RequiredDate
        datetime ShippedDate
        int ShipVia FK
        numeric Freight
        string ShipName
        string ShipAddress
        string ShipCity
        string ShipRegion
        string ShipPostalCode
        string ShipCountry
    }
    Products {
        int ProductID PK
        string ProductName
        int SupplierID FK
        int CategoryID FK
        string QuantityPerUnit
        float UnitPrice
        int UnitsInStock
        int UnitsOnOrder
        int ReorderLevel
        string Discontinued
    }
    Regions {
        int RegionID PK
        string RegionDescription
    }
    Shippers {
        int ShipperID PK
        string CompanyName
        string Phone
    }
    Suppliers {
        int SupplierID PK
        string CompanyName
        string ContactName
        string ContactTitle
        string Address
        string City
        string Region
        string PostalCode
        string Country
        string Phone
        string Fax
        string HomePage
    }
    Territories {
        string TerritoryID PK
        string TerritoryDescription
        int RegionID FK
    }

```

#### Sidebar: Mermaid Diagrams
The diagram above was created with [Mermaid](https://github.com/mermaid-js/mermaid#mermaid), an open-source diagram generator that uses a Markdown-like text syntax. For example:

```
erDiagram
    CustomerCustomerDemo }o--|| CustomerDemographics : have
    CustomerCustomerDemo }o--|| Customers : through

    Categories {
        int CategoryID PK
        string CategoryName
        string Description
        blob Picture
    }
    CustomerCustomerDemo {
        string CustomerID PK, FK
        string CustomerTypeID PK, FK
    }
```

Mermaid rendering is also built into [JupyterLab 4.1 and Notebook 7.1](https://blog.jupyter.org/jupyterlab-4-1-and-notebook-7-1-are-here-20bfc3c102170).


---
<a id='ex3'></a>
### Observations About the Schema 

1. **Many-to-Many Relationships Through Linking Tables:**  
   Some entities are not directly related in a simple one-to-many manner. Instead, they use intermediate tables (junction tables) to manage many-to-many relationships. For example:
   - **EmployeeTerritories** links Employees and Territories. An employee can be responsible for multiple territories, and a territory can be assigned to multiple employees.
   - **CustomerCustomerDemo** links Customers and CustomerDemographics, allowing one customer to match multiple demographic segments and vice versa.

2. **Multiple Interconnected Entities Reflecting a Business Domain:**  
   The schema models a fairly realistic business scenario similar to what might be found in a supply chain or order management system:
   - **Customers** place **Orders**.
   - **Orders** are fulfilled by **Employees** (“sold by”) and **Shippers** (“ships via”).
   - **Orders** contain details referencing **Products**, which belong to certain **Categories** and are supplied by **Suppliers**.

   This structure shows how different aspects of a business (customers, suppliers, products, orders, shipping) interconnect in a relational database.

3. **Hierarchies and Self-Joins:**  
   The **Employees** table includes a `ReportsTo` column, which is a foreign key referencing another EmployeeID. This creates a self-referential relationship that can represent an organizational hierarchy within the same table.

4. **Use of Codes as IDs (Surrogate Keys):**  
   Many tables use integer primary keys (like `CategoryID`, `EmployeeID`, `ProductID`), and a few use short string codes (like `CustomerID`), instead of descriptive business values such as a company name. Business data can change over time, isn’t guaranteed to be unique, and is often long, whereas an ID assigned purely for identification is stable, unique, and efficient to index. This simplifies maintenance, reduces the need for cascading updates, and keeps the database structure independent of changing business logic.

5. **Data Integrity and Constraints:**  
   Foreign keys (FK) declare referential integrity rules: an Order cannot reference a non-existent Customer, and a Product cannot reference a Category that doesn’t exist. Note that SQLite enforces these rules only on connections where foreign key support has been turned on (`PRAGMA foreign_keys = ON`); it is off by default. When the rules are enforced, every reference between tables points at a real row, so analysts spend less time cleaning up broken relationships and more time extracting meaningful insights.
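
As a rough illustration of observations 1 and 3, the sketch below joins through a junction table and then joins a table to itself. It uses Python's standard-library `sqlite3` module with an in-memory database. The table and column names follow the diagram, but the schema is cut down and the sample rows are made up for this example; they are not actual Northwind data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employees (
        EmployeeID INTEGER PRIMARY KEY,
        LastName   TEXT,
        ReportsTo  INTEGER REFERENCES Employees (EmployeeID)
    );
    CREATE TABLE Territories (
        TerritoryID          TEXT PRIMARY KEY,
        TerritoryDescription TEXT
    );
    CREATE TABLE EmployeeTerritories (
        EmployeeID  INTEGER REFERENCES Employees (EmployeeID),
        TerritoryID TEXT REFERENCES Territories (TerritoryID),
        PRIMARY KEY (EmployeeID, TerritoryID)
    );
    -- Made-up sample rows, not actual Northwind data.
    INSERT INTO Employees VALUES (1, 'Lee', NULL), (2, 'Ortiz', 1), (3, 'Khan', 1);
    INSERT INTO Territories VALUES ('T1', 'North'), ('T2', 'South');
    INSERT INTO EmployeeTerritories VALUES (2, 'T1'), (2, 'T2'), (3, 'T1');
""")

# Many-to-many: go from Employees to Territories through the junction table.
employee_territories = conn.execute("""
    SELECT e.LastName, t.TerritoryDescription
    FROM Employees AS e
    JOIN EmployeeTerritories AS et ON et.EmployeeID = e.EmployeeID
    JOIN Territories AS t ON t.TerritoryID = et.TerritoryID
    ORDER BY e.LastName, t.TerritoryDescription
""").fetchall()

# Self-join: Employees appears twice, once as the employee (e) and once as
# the manager (m). LEFT JOIN keeps the employee whose ReportsTo is NULL.
reporting_lines = conn.execute("""
    SELECT e.LastName AS employee, m.LastName AS manager
    FROM Employees AS e
    LEFT JOIN Employees AS m ON m.EmployeeID = e.ReportsTo
    ORDER BY e.EmployeeID
""").fetchall()

print(employee_territories)
print(reporting_lines)
```

With these sample rows, one employee covers two territories and one territory is shared by two employees, which is exactly the case a junction table exists to handle.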

---
<a id='ex4'></a>
### Primary Key (PK) vs. Foreign Key (FK)

Primary keys and foreign keys are important building blocks of relational databases. Primary keys give each record a stable and unique identity, while foreign keys define how entities are connected, supporting the relational structure that allows complex queries and reliable data integrity.

- **Primary Key (PK):**  
  A primary key is a column (or set of columns) that uniquely identifies each record in a table. Every table should have a primary key to differentiate one row from another. For example:
  - `ProductID` is the PK in the **Products** table.
  - `CustomerID` is the PK in the **Customers** table.

  No two products or customers can share the same primary key value, ensuring uniqueness.

- **Foreign Key (FK):**  
  A foreign key is a column (or set of columns) in one table that refers to the primary key of another table, or occasionally of the same table, as with `Employees.ReportsTo`. It establishes a relationship between the referencing rows and the referenced rows. For example:
  - In the **Products** table, `CategoryID` is a FK referencing `Categories.CategoryID`.
  - In the **Orders** table, `CustomerID` is a FK referencing `Customers.CustomerID`.

**Difference Between PK and FK:**

- The primary key identifies a record within its own table. It is unique within that table.
- A foreign key references a primary key, usually in another table. It does not need to be unique in its own table, but it must match an existing primary key value in the referenced table (unless it’s allowed to be NULL for optional relationships).
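
The difference can be seen in a small experiment. This is a minimal sketch using Python's standard-library `sqlite3` module with a cut-down in-memory schema; the customer ID `CUST1` and the company names are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID TEXT PRIMARY KEY, CompanyName TEXT);
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID TEXT REFERENCES Customers (CustomerID)
    );
    -- Invented sample rows: one customer with two orders.
    INSERT INTO Customers VALUES ('CUST1', 'Example Co');
    INSERT INTO Orders VALUES (1, 'CUST1'), (2, 'CUST1');
""")

# The foreign key value 'CUST1' appears in two Orders rows: FKs may repeat.
orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE CustomerID = 'CUST1'"
).fetchone()[0]

# A second Customers row with the same primary key value is rejected.
try:
    conn.execute("INSERT INTO Customers VALUES ('CUST1', 'Another Co')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(orders_for_customer, duplicate_rejected)
```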

**Purpose of PK and FK in a Database System:**

- **Primary Keys:**  
  PKs ensure each row in a table can be uniquely retrieved. This is fundamental for indexing, efficient look-ups, and maintaining data integrity. Without a PK, it would be difficult to update, delete, or relate specific rows without ambiguity.

- **Foreign Keys:**  
  FKs maintain referential integrity by ensuring that relationships between tables are consistent. If a table references an entity in another table, the FK ensures that the entity actually exists. For example, you cannot place an order for a non-existent product if the `ProductID` in the order details is a foreign key referencing the products table. This maintains logical coherence and prevents “orphan” records that reference non-existent parents.
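
One caveat worth trying out: SQLite checks foreign keys only on connections where `PRAGMA foreign_keys = ON` has been issued. The sketch below, using Python's standard-library `sqlite3` module, attempts the same orphan insert with enforcement on and off. The helper `try_orphan_insert`, the cut-down schema, and the sample row are assumptions made for illustration.

```python
import sqlite3


def try_orphan_insert(enforce):
    """Insert an order line for a missing product; return True if accepted."""
    conn = sqlite3.connect(":memory:")
    # SQLite decides per connection whether foreign keys are enforced.
    conn.execute(f"PRAGMA foreign_keys = {'ON' if enforce else 'OFF'}")
    conn.executescript("""
        CREATE TABLE Products (
            ProductID   INTEGER PRIMARY KEY,
            ProductName TEXT
        );
        CREATE TABLE "Order Details" (
            OrderID   INTEGER,
            ProductID INTEGER REFERENCES Products (ProductID),
            Quantity  INTEGER,
            PRIMARY KEY (OrderID, ProductID)
        );
        INSERT INTO Products VALUES (1, 'Sample product');
    """)
    try:
        # ProductID 999 does not exist in Products, so this row is an orphan.
        conn.execute('INSERT INTO "Order Details" VALUES (10, 999, 1)')
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()


print(try_orphan_insert(enforce=True))   # orphan row rejected
print(try_orphan_insert(enforce=False))  # orphan row silently accepted
```

With enforcement off, the orphan row is stored without complaint, so a declared foreign key in the schema does not by itself guarantee that existing data is consistent.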

<a id='resources'></a>
## Additional Resources

- **SQLite Documentation:** [https://www.sqlite.org/docs.html](https://www.sqlite.org/docs.html)
- **SQLite Command Line Reference:** [https://www.sqlite.org/cli.html](https://www.sqlite.org/cli.html)
- **SQL Tutorial Cheatsheet:** [https://www.sqltutorial.org/sql-cheat-sheet/](https://www.sqltutorial.org/sql-cheat-sheet/)
- **SQLite Browser Wiki:** [SQLite Browser GitHub Wiki](https://github.com/sqlitebrowser/sqlitebrowser/wiki)
- **Northwind SQLite Repo:** [GitHub Repo](https://github.com/jpwhite3/northwind-SQLite3/)
- **Northwind Database Schema:** [Schema Diagram](https://github.com/jpwhite3/northwind-SQLite3/blob/main/docs/Northwind_ERD.png)