# NoSQL - Document-oriented

> A document-oriented data store keeps information in documents encoded in formats such as XML, YAML or JSON, rather than storing data as rows and columns. Each document is assigned a unique key that identifies it within the store. Document stores are currently one of the most popular types of NoSQL database used by global companies.

A _document_ is a record in a document data store. A document typically stores all the information about _one object_ and any of its related metadata. Documents store data in key-value pairs. The values can be a variety of types and structures, including strings, numbers, dates, arrays, or objects. Documents can be stored in formats like JSON, BSON (binary JSON), and XML. We can think of this type of NoSQL as a more complex version of the key-value data store: the key identifies the document, and the value is a structured document rather than an opaque value.
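
To make the key-value structure concrete, here is a minimal sketch of one document as a Python dictionary, serialized to JSON. The field names and values are hypothetical. Note that JSON has no native date type, so the date is stored as an ISO-8601 string here; BSON, by contrast, does have a date type.

```python
import json
from datetime import date

# A hypothetical user document; field names and values are illustrative only.
user = {
    "name": "Tom",                                        # string
    "age": 32,                                            # number
    "joined": date(2021, 5, 4).isoformat(),               # date as an ISO-8601 string
    "interests": ["cycling", "chess"],                    # array
    "address": {"city": "London", "postcode": "N1 9GU"},  # nested object
}

# Serialize to JSON text, the form a document store might receive.
text = json.dumps(user, indent=2)
print(text)

# Parsing the text gives back the same key-value structure.
assert json.loads(text) == user
```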

To help visualise what this looks like, below is a JSON document that stores information about a user named _Tom_:

<p align="center">
  <img src="images/document.png" width=600>
  <figcaption align="center"><cite>JSON Document</cite></figcaption>
</p>


Even though document stores do not have a unified schema, they are usually organized in a way that enables easy access and analysis of the data. This means the data they hold can be considered semi-structured. Because each complete object is commonly stored in a single document, there is generally no need to define relationships between documents.

Documents differ from the rows of a relational table: they do not have a set number of fields or strict rules on data types. Missing data is simply omitted rather than stored as an empty field or a NULL value. Data can be added, edited, removed and queried relatively easily.
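
The difference in how missing data is represented can be sketched as follows, assuming two hypothetical user records. In the document version the absent phone number is just an absent key; in the relational version (an in-memory SQLite table) every row has every column, so the gap has to be filled with NULL.

```python
import sqlite3

# Two hypothetical documents in the same collection with different fields.
users = [
    {"name": "Tom", "email": "tom@example.com", "phone": "555-0100"},
    {"name": "Ana", "email": "ana@example.com"},  # no phone: the key is simply absent
]

# In a document, missing data is just a missing key.
print([sorted(u) for u in users])
# → [['email', 'name', 'phone'], ['email', 'name']]

# In a relational table every row has every column, so the gap becomes NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT, phone TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (:name, :email, :phone)",
    [{"phone": None, **u} for u in users],  # default phone to None (NULL) when absent
)
rows = conn.execute("SELECT name, phone FROM users ORDER BY rowid").fetchall()
print(rows)
# → [('Tom', '555-0100'), ('Ana', None)]
```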

The _keys_ assigned to each document are unique identifiers required to access data within the data store, usually a path, a string or a uniform resource identifier. If no key is supplied, many document stores generate one automatically: MongoDB, for example, uses a 12-byte ObjectId as the default `_id`, and CouchDB generates UUIDs. These generated IDs are not simple auto-incrementing row numbers as in many relational tables.
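
The key-based access described above can be sketched with a hypothetical in-memory `DocumentStore` class. It uses the caller's `_id` if one is given and otherwise generates a random UUID; this is an illustration only and does not reproduce MongoDB's ObjectId format.

```python
import uuid


class DocumentStore:
    """Hypothetical in-memory store: each document is looked up by a unique key."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        # Use the caller's _id if present, otherwise generate a random UUID as the key.
        key = doc["_id"] if "_id" in doc else uuid.uuid4().hex
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = {**doc, "_id": key}
        return key

    def get(self, key):
        # Direct lookup by key, as in a key-value store; None if the key is unknown.
        return self._docs.get(key)


store = DocumentStore()
tom_id = store.insert({"name": "Tom"})             # key generated automatically
store.insert({"_id": "users/ana", "name": "Ana"})  # a path-like string key

print(store.get(tom_id)["name"])       # Tom
print(store.get("users/ana")["name"])  # Ana
```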

The content of documents within a document store can also be described by _metadata_ associated with each document. This metadata helps the data store "understand" the structure of the corresponding document, for example whether a field contains addresses, phone numbers, social security numbers and so on.

## Querying in Document Data Stores vs SQL
For improved efficiency and user experience, many document stores have query languages, which allow querying documents based on the metadata or the actual document content. 

To help us better understand how querying works in a document-oriented data store, let's look at an example of how to retrieve data from a SQL database and the equivalent script from MongoDB, one of the most popular document data stores.

Let's assume we have a table called `inventory`. To select all records from `inventory`, we would use the following SQL statement:

```sql
SELECT * FROM inventory;
```

In MongoDB, the corresponding code to select all _documents_ in a collection would be:

```javascript
db.inventory.find( {} )
```

Now, let's assume we want to add a filter to our query so that it selects only the records where `name` is `AiCore`.

In SQL, we would use the following code:

```sql
SELECT * FROM inventory
WHERE name = 'AiCore';
```

The corresponding code in MongoDB would be:

```javascript
db.inventory.find( { name: "AiCore" } )
```
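
The two queries can be compared side by side in a self-contained sketch. The SQL half runs against an in-memory SQLite table; the document half uses a hypothetical `find` helper that imitates only the simplest part of MongoDB's `find`, namely top-level equality matching, where an empty filter matches every document. It does not support MongoDB's query operators, nested paths or array matching.

```python
import sqlite3

# Hypothetical inventory data, used both as documents and as table rows.
inventory = [
    {"name": "AiCore", "qty": 25},
    {"name": "Acme", "qty": 10, "tags": ["sale"]},  # extra field only in this document
]


def find(collection, query):
    """Toy imitation of db.collection.find(query): top-level equality matches only."""
    return [doc for doc in collection
            if all(doc.get(field) == value for field, value in query.items())]


print(len(find(inventory, {})))             # 2, like db.inventory.find( {} )
print(find(inventory, {"name": "AiCore"}))  # [{'name': 'AiCore', 'qty': 25}]

# The same filter in SQL; the "tags" field has no column, so it is not stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (name TEXT, qty INTEGER)")
conn.executemany("INSERT INTO inventory VALUES (:name, :qty)", inventory)
rows = conn.execute("SELECT * FROM inventory WHERE name = 'AiCore'").fetchall()
print(rows)                                 # [('AiCore', 25)]
```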

## Strengths of Document-Oriented Data Stores

- __Flexibility__: 
    - Documents in the same data store do not need to follow a specific schema or share the same structure
    - A flexible schema means that the data model can evolve as the requirements change
<p></p>

- __Easy to update__:
    -  With document stores, you can easily add new pieces of information to specific documents only
    -  In contrast, in a relational database, adding new information may require a schema change that affects every row of a table, and possibly other tables as well
<p></p>

- __Improved read and write speed compared to relational databases__:
    -   In NoSQL document stores, everything you need about an object is usually found within one document. With everything kept in a single location, the data can be reached and retrieved faster.
        - One reason for this is the schemaless architecture. Without an enforced schema, adding or updating data does not require the upfront validation that a SQL database performs, which can allow more write operations per second.
        - Another reason is that, due to data normalization in SQL databases, many joins might be required to retrieve data, and joins are resource-intensive operations. In document data stores, related data is generally stored together in one document, so no joins are required.
    - This, of course, is a trade-off: storing related data together can duplicate the same data across many documents, so updating that shared data means changing every copy.
<p></p>

- __Rich APIs and query languages__:
    -   Due to the popularity of document-oriented data stores, there is a wide variety of industry-grade APIs and querying tools available. Other types of NoSQL stores often have less mature tooling.
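
The read-speed and easy-update arguments above can be sketched with hypothetical customer and order data. The relational version (in-memory SQLite) needs a `JOIN` to read a customer together with their orders; the document version embeds the orders, so a single key lookup returns everything, and a new field can be added to one document without changing any schema.

```python
import sqlite3

# Relational version: customers and orders live in separate, normalized tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Tom');
    INSERT INTO orders VALUES (1, 1, 'keyboard'), (2, 1, 'monitor');
""")

# Reading a customer together with their orders needs a JOIN.
rows = conn.execute("""
    SELECT c.name, o.item FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE c.id = 1 ORDER BY o.id
""").fetchall()
print(rows)  # [('Tom', 'keyboard'), ('Tom', 'monitor')]

# Document version: orders are embedded in the customer document,
# so one key lookup returns everything.
customers = {
    1: {"_id": 1, "name": "Tom",
        "orders": [{"item": "keyboard"}, {"item": "monitor"}]},
}
tom = customers[1]
print([o["item"] for o in tom["orders"]])  # ['keyboard', 'monitor']

# Adding a field touches this one document only; no ALTER TABLE is needed.
tom["loyalty_tier"] = "gold"
```

The flip side is the trade-off mentioned above: if the same item details were embedded in many customer documents, changing them would mean updating every copy.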

## Limitations of Document-Oriented Data Stores
- __Document size limit__:
    -   Popular document data stores usually limit the size of each document they can store. For example, MongoDB allows a maximum of 16 MB per document. Data that exceeds this limit has to be split across multiple documents (MongoDB provides GridFS for storing large files in chunks), which adds complexity.
<p></p>

- __Difficulty joining documents__:
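    -   Joining data across documents is generally harder than in a relational database and, depending on the store and how the data is structured, may not be practical at all. Some stores offer limited support; MongoDB, for example, has a `$lookup` aggregation stage.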
<p></p>

- __High disk storage usage__:
    -   Storing related data together often duplicates the same data across many documents, and replication for backups adds further copies. This increase in data redundancy requires more disk storage and is therefore more costly
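
The document size limit can be checked before writing. The sketch below uses hypothetical helpers `approx_size` and `fits`, and estimates a document's size as the length of its compact UTF-8 JSON encoding. This is only an approximation: MongoDB's 16 MB limit applies to the BSON encoding, which adds type tags and length prefixes, so the real size differs.

```python
import json

MAX_BYTES = 16 * 1024 * 1024  # MongoDB's maximum BSON document size (16 MB)


def approx_size(doc):
    """Approximate size as compact UTF-8 JSON; the real BSON size differs."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


def fits(doc, limit=MAX_BYTES):
    # True if the approximate size is within the per-document limit.
    return approx_size(doc) <= limit


small = {"name": "Tom", "bio": "x" * 1_000}
huge = {"name": "Tom", "log": "x" * (17 * 1024 * 1024)}  # about 17 MB of text
print(fits(small), fits(huge))  # True False
```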

## Top Use Cases

- __Content management systems (CMS)__:

Due to their flexible schema, document data stores can store and query many kinds of content, from text to images and videos, without every item having to share the same structure. This makes them a good choice for storing and querying media content efficiently, like you might find in an online store such as eBay or Amazon.
<p></p>

- __Mobile apps__:

Due to their ability to support real-time big data, and the ease of scaling out horizontally across many servers, document data stores are an ideal choice for companies that need to collect mobile application data from millions of users. One such company is the Weather Channel, which uses a MongoDB data store to handle millions of requests per minute while simultaneously processing user data and weather update information obtained from thousands of data sources globally.

## Popular Document Data Stores

- [MongoDB](https://www.mongodb.com/)
- [CouchDB](https://couchdb.apache.org/)

## Key Takeaways

- Document-oriented data stores are currently one of the most popular types of NoSQL used in industry
- A document-oriented data store maintains information within documents such as XML, YAML or JSON rather than storing data as rows and columns
- A document typically stores all the information about _one object_ and any of its related metadata
- Document-oriented data stores are flexible, easy to update, fast and have a wide variety of APIs available to connect with other tools
- On the other hand, they have some limitations, especially the document size limit and limited support for joining documents (which SQL provides easily)
- MongoDB and CouchDB are currently two of the most widely used document-oriented data stores in industry
