# Introduction to NoSQL: Concepts

## A brief history of databases

What's a database?

> A _database_ is an organized collection of data. -- [Wikipedia](https://en.wikipedia.org/wiki/Database)

By this definition, a book is a database in physical form, and one of the earliest examples: its table of contents and index let you find information quickly.


![Punch card and 1890 census](https://upload.wikimedia.org/wikipedia/commons/7/7c/1890_Census_Hollerith_Electrical_Counting_Machines_Sci_Amer.jpg)

> The Hollerith tabulating machine and its punch cards were among the first examples of an electromechanical database. They were used to process the 1890 US Census, the first census to be tabulated by machine: census data was stored on punch cards and retrieved by counting machines.


The evolution of databases has been driven by application demands and by the storage media available at the time.

```mermaid
timeline
    title Database Innovations Driven by Storage Eras

    section Pre-relational (~1950s-1960s)
        Magnetic Disk : 1968 Hierarchical model (IBM IMS)
               : 1969 Network model (CODASYL)

    section Relational (~1970s-Now)
        Codd's Relational Model : 1974 IBM System R 
        : 1979 Oracle
        : 1980 Ingres
        : 1986 SQL becomes an ANSI standard
        : 1989 Postgres
        : 1995 MySQL

    section Next Generation (~Late 2000s-Now)
        Object Programming & Cloud : 2004 MapReduce
        : 2007 Neo4j
        : 2009 MongoDB
```


### Pre-relational Era

The magnetic disk made the database as we know it practical. Unlike tape, which has to be read in sequence, a disk supports direct access:
- You could now seek directly to a particular sector or block.
- This made random access feasible and efficient.
- It allowed databases to scale in complexity without sacrificing access speed.
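
The difference between sequential and direct access can be sketched in a few lines of Python. This is only an illustration, not how any historical system worked: the fixed-size record layout and the in-memory "disk" are assumptions made for the example.

```python
import io
import struct

# Hypothetical fixed-size record layout: a 4-byte integer id plus a 12-byte name.
RECORD = struct.Struct("<i12s")

def build_file(names):
    """Write one fixed-size record per name into an in-memory 'disk'."""
    buf = io.BytesIO()
    for i, name in enumerate(names):
        buf.write(RECORD.pack(i, name.encode("ascii")))
    return buf

def read_sequential(buf, target_id):
    """Tape-style access: read records from the start until the id matches."""
    buf.seek(0)
    reads = 0
    while chunk := buf.read(RECORD.size):
        reads += 1
        rec_id, raw = RECORD.unpack(chunk)
        if rec_id == target_id:
            return raw.rstrip(b"\x00").decode("ascii"), reads
    return None, reads

def read_direct(buf, target_id):
    """Disk-style access: compute the byte offset and seek straight to it."""
    buf.seek(target_id * RECORD.size)
    rec_id, raw = RECORD.unpack(buf.read(RECORD.size))
    return raw.rstrip(b"\x00").decode("ascii"), 1

disk = build_file(["alice", "bob", "carol", "dave"])
print(read_sequential(disk, 3))  # ('dave', 4)
print(read_direct(disk, 3))      # ('dave', 1)
```

The sequential reader touches every record before the target, while the direct reader performs a single read regardless of where the record sits.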

Two dominant models emerged:
- Hierarchical model (IBM IMS)
- Network model (CODASYL)

Both models are "navigational": to reach a piece of data, you must navigate from one object to another along **explicitly defined** pointers.

```mermaid
---
title : Hierarchical model & Network model
---
graph TD
  subgraph Hierarchical model
    PROJ1[Project: RNA Folding] --> LEAD1[Lead: Alice]
    LEAD1 --> DEPT1[Department: CAB]
    LEAD1 --> EMAIL1[Email: alice@stjude.org]
    PROJ1 --> STOR1[Storage: /research/groups/alicegrp/projects/]
    STOR1 --> SIZE1[Size: 5.2 TB]
  end

  subgraph Network model
    PROJ2[Project: RNA Folding] --> LEAD2[Lead: Alice]
    STOR2[Storage: /research/groups/alicegrp/projects/] --> LEAD2

    LEAD2 --> DEPT2[Department: CAB]
    LEAD2 --> EMAIL2[Email: alice@stjude.org]
    STOR2 --> SIZE2[Size: 5.2 TB]
  end
```

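In the pre-relational era the data schema was fixed: the access paths were built into the structure of the database, so a question the designers had not anticipated usually meant writing new application code or restructuring the data.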
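
As a rough sketch of what "navigational" means in practice, the hierarchical tree from the diagram above can be modeled in Python as records that hold explicit pointers to their children. The `Node` class and `navigate` helper are hypothetical names invented for this illustration; they are not part of IMS or CODASYL.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A record that holds explicit pointers to its child records."""
    label: str
    value: str
    children: list = field(default_factory=list)

    def child(self, label):
        """Follow one predefined pointer; fail if no such pointer exists."""
        for node in self.children:
            if node.label == label:
                return node
        raise KeyError(f"no pointer from {self.label} to {label}")

# The hierarchical tree from the diagram.
project = Node("Project", "RNA Folding", [
    Node("Lead", "Alice", [
        Node("Department", "CAB"),
        Node("Email", "alice@stjude.org"),
    ]),
    Node("Storage", "/research/groups/alicegrp/projects/", [
        Node("Size", "5.2 TB"),
    ]),
])

def navigate(root, *path):
    """Walk a fixed access path, one pointer at a time."""
    node = root
    for label in path:
        node = node.child(label)
    return node.value

print(navigate(project, "Lead", "Email"))    # alice@stjude.org
print(navigate(project, "Storage", "Size"))  # 5.2 TB

# A path the designers did not define cannot be followed.
try:
    navigate(project, "Lead", "Size")
except KeyError as exc:
    print("unsupported path:", exc)
```

Every query has to spell out the route through the structure, which is why unanticipated questions were expensive in these models.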

### Relational Era
In 1970, Edgar F. Codd published his relational theory in the paper "A Relational Model of Data for Large Shared Data Banks". Its core concepts are:

| Concept | Description |
| ------- | ----------- |
| Table (Relation) | A set of rows and columns, like a spreadsheet. Each table represents an entity (e.g., User, Project). |
| Row (Tuple) | A single record in the table — a complete set of values for one entity instance. |
| Column (Attribute) | A field or property of the entity (e.g., name, email, size_tb). |
| Primary Key (PK) | A unique identifier for each row in a table. |
| Foreign Key (FK) | A reference to the primary key of another table — to link related data. |
| Normalization | A design process to eliminate redundancy and ensure data integrity. |

```mermaid
classDiagram
  class Project {
    +int project_id
    +string project_name
    +int lead_id
    +int storage_id
  }

  class Lead {
    +int lead_id
    +string name
    +string email
    +int department_id
  }

  class Department {
    +int department_id
    +string name
  }

  class Storage {
    +int storage_id
    +string path
    +float size_tb
  }

  Project --> Lead : belongs to
  Project --> Storage : uses
  Lead --> Department : works in

```
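
The class diagram above could be expressed as a relational schema roughly as follows. This is a minimal sketch using Python's built-in `sqlite3` module; the lowercase table names, the column types, and the sample row are assumptions based on the diagram and the earlier example, not a schema taken from a real system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE department (
    department_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE lead (
    lead_id       INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    email         TEXT NOT NULL,
    department_id INTEGER NOT NULL REFERENCES department(department_id)
);
CREATE TABLE storage (
    storage_id INTEGER PRIMARY KEY,
    path       TEXT NOT NULL,
    size_tb    REAL NOT NULL
);
CREATE TABLE project (
    project_id   INTEGER PRIMARY KEY,
    project_name TEXT NOT NULL,
    lead_id      INTEGER NOT NULL REFERENCES lead(lead_id),
    storage_id   INTEGER NOT NULL REFERENCES storage(storage_id)
);
INSERT INTO department VALUES (1, 'CAB');
INSERT INTO lead VALUES (1, 'Alice', 'alice@stjude.org', 1);
INSERT INTO storage VALUES (1, '/research/groups/alicegrp/projects/', 5.2);
INSERT INTO project VALUES (1, 'RNA Folding', 1, 1);
""")

# A join follows the foreign keys declaratively; no access path is hard-coded.
rows = conn.execute("""
    SELECT p.project_name, l.name, d.name, s.size_tb
    FROM project AS p
    JOIN lead AS l       ON l.lead_id = p.lead_id
    JOIN department AS d ON d.department_id = l.department_id
    JOIN storage AS s    ON s.storage_id = p.storage_id
""").fetchall()
print(rows)  # [('RNA Folding', 'Alice', 'CAB', 5.2)]

# The foreign key rejects a project that points at a lead that does not exist.
try:
    conn.execute("INSERT INTO project VALUES (2, 'Orphan', 99, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

Unlike the navigational models, the query states which rows are wanted and leaves it to the database to decide how to reach them.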

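Relational databases are normalized: each fact is stored once, in one table, and related rows are linked through foreign keys instead of being duplicated.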
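
To see why this matters, the sketch below contrasts a denormalized layout with a normalized one using plain Python dictionaries. The second project ("Protein Docking") and the new email address are made up for the illustration.

```python
# Denormalized: the lead's email is repeated on every project row.
flat_projects = [
    {"project": "RNA Folding", "lead": "Alice", "email": "alice@stjude.org"},
    {"project": "Protein Docking", "lead": "Alice", "email": "alice@stjude.org"},
]

# Updating only one copy leaves the data inconsistent (an update anomaly).
flat_projects[0]["email"] = "alice@example.org"
emails = {row["email"] for row in flat_projects if row["lead"] == "Alice"}
print(len(emails))  # 2

# Normalized: each fact lives in one place; projects refer to the lead by key.
leads = {1: {"name": "Alice", "email": "alice@stjude.org"}}
projects = [
    {"project": "RNA Folding", "lead_id": 1},
    {"project": "Protein Docking", "lead_id": 1},
]
leads[1]["email"] = "alice@example.org"  # one update, visible everywhere
resolved = {leads[p["lead_id"]]["email"] for p in projects}
print(resolved)  # {'alice@example.org'}
```

In the normalized layout there is no second copy of the email that could drift out of sync.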


```mermaid
timeline
    title Storage Media Driving Database Innovation

    section Punch Cards (1890s–1950s)
        1890 : Punch cards used in U.S. Census – early data storage

    section Magnetic Tape (1950s–1960s)
        1952 : IBM 726 magnetic tape drive – sequential-access storage

    section Hard Disk Drives (1956–1980s)
        1956 : IBM RAMAC – first HDD, about 5 MB of storage
        1968 : IBM IMS – hierarchical DBMS
        1969 : CODASYL network model – more complex data relationships
        1970 : Codd’s relational model – abstracted storage access
        1979 : Oracle v2 – commercial RDBMS

    section Floppy & Optical Storage (1980s–1990s)
        1982 : 3.5" floppy – affordable personal computing
        1985 : CD-ROM – distribution of large DB software
        1986 : SQL becomes ANSI standard
        1995 : Object-relational databases emerge

    section Flash Memory & USB (2000s)
        2006 : Google Bigtable – scalable, distributed storage system for structured data
        2009 : MongoDB released – flexible schema, JSON-like docs

    section Cloud Storage (2010s)
        2012 : Amazon DynamoDB – NoSQL DB with auto-scaling
        2014 : Snowflake – cloud-native, separation of storage & compute

    section SSDs & Persistent Memory (2010s–2020s)
        2010 : SSDs begin to replace HDDs in performance-critical DB workloads
        2015 : Redis, MemSQL (now SingleStore) – rise of in-memory DBs
        2020 : NVMe, Intel Optane – persistent memory enables hybrid DB systems

```



```mermaid
timeline
    title Database Innovations Timeline

    section Pre-Relational Era
        1968 : IBM's IMS introduces the hierarchical database model
        1969 : CODASYL develops the network database model

    section Relational Era
        1970 : Edgar F. Codd proposes the relational model
        1974 : IBM begins development of System R
        1979 : Oracle releases the first commercial RDBMS
        1980s : SQL becomes the standard query language
        1990s : Emergence of object-relational databases and parallel processing

    section NoSQL Era
        2006 : Google introduces Bigtable
        2007 : Neo4j is released
        2008 : Apache HBase is released
        2009 : MongoDB is released
        2011 : Term "NewSQL" is coined to describe scalable relational databases
        2012 : Amazon launches DynamoDB
```



## Reference
- Next Generation Databases: NoSQL, NewSQL, and Big Data. Guy Harrison, 2015