# NoSQL - Overview

## What is NoSQL?

NoSQL is shorthand for: Not Only SQL. 

> They are distributed data stores used to store, manage and interact with semi-structured and unstructured data (but can also manage structured data).

NoSQL stores are distributed, meaning that they run on many machines rather than just one. Storage capacity therefore grows with the number of machines in the system, so these stores can scale to very large datasets: a single dataset might be spread across 100 computers. Often, these data stores also keep replicas of the data on several machines, so that nothing is lost if one machine fails.

Traditionally, organizations have used relational databases to capture their data and query it.  

However, companies started seeing limitations of traditional relational databases:

-   Emergence of new data types such as videos and images due to the rapid adoption of the Internet, social media and IoT devices (variety)
-   Increasing rate of data generation (velocity)
-   Growing size of data files (for instance, video files can be much larger than CSVs) (volume)
-   Relational databases have limited schema flexibility. If you want to add a column, you need to consider how it relates to the other data and populate its values for all existing rows. This can make the database schema difficult to maintain as it changes.

Accordingly, a new group of technologies, termed NoSQL, started to emerge to address these limitations. The term NoSQL was first used in 1998 by Carlo Strozzi, who gave the name to his lightweight, open-source database.

__History of NoSQL__:
- 1998 - Carlo Strozzi used the term NoSQL for his lightweight, open-source database
- 2000 - Graph database Neo4j launched
- 2004 - Google BigTable launched
- 2005 - CouchDB launched
- 2007 - The research paper on Amazon Dynamo was released
- 2008 - Facebook open-sourced the Cassandra project
- 2009 - The term NoSQL was reintroduced and NoSQL tools began to be rapidly adopted



## Characteristics of NoSQL data stores:
-   __Non-relational__:
    -   They are mainly designed for semi-structured and unstructured data
<p></p>
    
-   __Schema free__:
    -   Don't require the data to have a pre-determined model or schema (which is not the case in SQL databases)
    -   Can store different types of data in the same place. E.g. JSON and images and CSV
    -   There exists a number of different types of NoSQL data stores to handle the numerous data types (more on that in the next section)
<p></p>

-   __Replication of data stores to avoid single point of failure__:
    -   Stored data is spread out on many computers, known as nodes. This group of interconnected nodes makes up a _cluster_.
    -   This avoids having a centralized architecture that could lead to a single point of failure if the server fails
<p></p>

-   __Easy to scale__:
    -   Are able to scale across (adding more columns) easily as a result of the schemaless architecture. This is a key difference between NoSQL and SQL.
    -   Are able to scale up and down (adding or removing data) by adding or removing nodes (computers) to the distributed cluster (group of computers). This means they can easily handle billions of data points.
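
The replication and scaling points above can be illustrated with a minimal sketch. It assumes a toy cluster in which each node is a plain Python dictionary, keys are assigned to nodes with a simple hash-modulo rule, and each key is copied to the next node(s) in ring order. The names `ToyCluster`, `node_for_key` and `replica_nodes` are made up for this illustration and do not correspond to any particular NoSQL tool.

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a key to a node index with a stable hash.

    Python's built-in hash() is salted per process, so sha256 is used instead.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


def replica_nodes(key: str, num_nodes: int, replication_factor: int) -> list:
    """The primary node plus the next nodes in ring order hold copies of the key."""
    primary = node_for_key(key, num_nodes)
    return [(primary + i) % num_nodes for i in range(replication_factor)]


class ToyCluster:
    """Hypothetical cluster: each dict stands in for one machine (node)."""

    def __init__(self, num_nodes: int, replication_factor: int = 2):
        self.replication_factor = replication_factor
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key, value):
        # Write the value to every replica of the key.
        for n in replica_nodes(key, len(self.nodes), self.replication_factor):
            self.nodes[n][key] = value

    def get(self, key, failed=frozenset()):
        """Read from the first replica that is not in the set of failed node indices."""
        for n in replica_nodes(key, len(self.nodes), self.replication_factor):
            if n not in failed:
                return self.nodes[n][key]
        raise KeyError(f"all replicas for {key!r} are unavailable")


cluster = ToyCluster(num_nodes=4, replication_factor=2)
cluster.put("user:42", {"name": "Tom"})
primary = node_for_key("user:42", 4)
# Simulate the primary node failing: the value is still served by the second replica.
print(cluster.get("user:42", failed={primary}))
```

Hash-modulo placement is used here only because it is short; when the number of nodes changes it moves most keys to a different node, which is why real distributed stores generally use more careful partitioning schemes.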


## SQL vs NoSQL

There are some key differences between SQL and NoSQL tools.  The main ones are highlighted in the table below:

<p align="center">
  <img src="images/sql-vs-nosql2.png" width=600>
</p>

## Types of NoSQL

> The rapidly growing range of use cases means that tools need to be able to handle different scenarios. Within NoSQL, technologies can be divided into 4 main groups according to the scenarios they target.

__Types of NoSQL data stores__:
-   Document-oriented
-   Graph-oriented
-   Key-value pair 
-   Columnar-oriented 

A high-level diagram of how each one of them can be visualized is below:

<p align="center">
  <img src="images/nosql-types2.png" width=600>
</p>

### A comparison of the different types of NoSQL

Before diving into the details of each of the 4 types of NoSQL, let's look at the high-level characteristics of each one:

<p align="center">
  <img src="images/comparing-nosql.png" width=600>
</p>

### Document-oriented

> A document-oriented data store maintains information within documents in formats such as XML, YAML or JSON rather than storing data as rows and columns. To organize these documents in one unit, a specific key is assigned to each document. Document stores are currently one of the most popular types of NoSQL used by global companies.

A _document_ is a record in a document data store. A document typically stores all the information about _one object_ and any of its related metadata. Documents store data in key-value pairs. The values can be a variety of types and structures, including strings, numbers, dates, arrays, or objects. Documents can be stored in formats like JSON, BSON, and XML. We can consider this type of NoSQL as a more complex version of the key-value data store.

To help visualise what this looks like, below is a JSON document that stores information about a user named Tom:

<p align="center">
  <img src="images/document.png" width=600>
</p>


Even though document stores do not have a unified schema, they are usually organized in a way that enables easy access to and analysis of the data. This means their contents can be considered semi-structured data. Since each complete object is commonly stored in a single document, there is generally no need to define relationships between documents. 

These documents are in no way similar to tables of a relational database; they do not have a set number of fields, strict rules on data types, etc. Missing data is simply omitted rather than there being an empty field or NULL values. Data can be added, edited, removed and queried relatively easily.
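
As a rough illustration of the difference described above, the sketch below stores the same user in a relational table (using Python's built-in `sqlite3` module) and as a document (a plain Python dictionary standing in for a JSON document). The table name, field names and values are invented for the example.

```python
import sqlite3

# Relational side: every row has every column, so a missing value becomes NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT, age INTEGER)")
conn.execute("INSERT INTO users (name, email) VALUES ('Tom', 'tom@example.com')")
row = conn.execute("SELECT name, email, age FROM users").fetchone()
print(row)  # the age column is present but holds None (SQL NULL)

# Document side: each document carries only the fields it actually has,
# and two documents in the same collection can have different fields.
documents = [
    {"name": "Tom", "email": "tom@example.com"},
    {"name": "Ana", "age": 31, "languages": ["en", "pt"]},
]
fields_per_document = [sorted(doc) for doc in documents]
print(fields_per_document)
```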

The _keys_ assigned to each document are unique identifiers required to access data within the data store, usually a path, string or a uniform resource identifier. How IDs are generated depends on the tool: some setups use automatically incrementing indices (the 3rd document will have an id = 3), while MongoDB, for example, generates a unique `ObjectId` for each document by default.

The content of documents within a document store is usually specified in _metadata_ files corresponding to each document. They allow document data stores to "understand" the structure of the corresponding document information: whether a field contains addresses, phone numbers, social security numbers and so on. 

#### Querying in document data stores vs SQL
For improved efficiency and user experience, many document stores have query languages, which allow querying documents based on the metadata or the actual document content. 

To help us better understand how querying works in a document-oriented data store, let's look at an example of how to retrieve data from a SQL database and the equivalent script from MongoDB, one of the most popular document data stores.

Let's assume we have a table called `inventory`.  To select all records from `inventory`, we would use the following SQL statement:

In [None]:
SELECT * FROM inventory 

In MongoDB, the corresponding code to select all _documents_ in a collection would be:

In [None]:
db.inventory.find( {} )

Now, let's assume we want to add a filter to our query to select only the data which has `name = AiCore`. 

In SQL, we would use the following code:

In [None]:
SELECT * FROM inventory
WHERE name = 'AiCore'

The corresponding code in MongoDB would be:

In [None]:
db.inventory.find( { name: "AiCore" } )
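
To make the matching behaviour of the two queries above more concrete, here is a minimal pure-Python sketch. The `find` function is a hypothetical helper written for this illustration (it is not the MongoDB driver API), and it only handles top-level equality filters; the `inventory` documents are invented.

```python
def find(collection, query):
    """Return the documents whose fields equal every key/value pair in `query`.

    An empty query matches everything, mirroring find({}) in the text.
    Only top-level equality is handled; query operators are not supported.
    """
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


inventory = [
    {"name": "AiCore", "qty": 10},
    {"name": "Other", "qty": 3},
    {"qty": 7},  # a document with no "name" field at all
]

all_docs = find(inventory, {})                    # like SELECT * FROM inventory
aicore_docs = find(inventory, {"name": "AiCore"})  # like WHERE name = 'AiCore'
print(aicore_docs)
```

Note that the document without a `name` field is simply skipped by the filter, rather than being compared against a NULL value.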


#### Strengths of document-oriented data stores

- __Flexibility__: 
    - Documents of one data store do not require a specific schema or have to be of the same type
    - A flexible schema means that the data model can evolve as the requirements change
<p></p>

- __Easy to update__:
    -  With document stores, you can add new pieces of information easily to specific documents only 
    -  In contrast, in a relational database, new pieces of information might affect other tables as well
<p></p>

- __Improved read and write speed compared to relational databases__:
    -   In NoSQL document stores you can find everything you need within one document. With everything kept in a single location, it is much faster to reach and retrieve the data.
        - One reason for that is the schemaless architecture. As there is no schema, adding or updating data doesn't require any upfront validations (as is the case in a SQL database). This provides a larger count of write operations per second.
        - Another reason is that due to data normalization in SQL databases, many joins might be required to retrieve data. Joins are resource-intensive operations. In document data stores, no joins are required as the related data is generally stored as it is in one big document.
    - This, of course, is a trade-off.
<p></p>

- __Rich APIs and query languages__:
    -   Due to the popularity of document-oriented data stores, there is a wide variety of industry-grade APIs and querying tools available to use. Other types of NoSQL store tend to have fewer such tools.
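
The point made under read and write speed, that normalized relational data needs joins while a document keeps related data together, can be sketched as follows. The example uses Python's built-in `sqlite3` module for the relational side and a plain dictionary as a stand-in for a document collection; all table names, fields and values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users  VALUES (1, 'Tom');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'lamp');
""")

# Normalized data: assembling one user's orders needs a join across two tables.
joined = conn.execute(
    "SELECT u.name, o.item FROM users u "
    "JOIN orders o ON o.user_id = u.id WHERE u.id = 1 ORDER BY o.id"
).fetchall()

# Document model: the orders are embedded, so one lookup by key returns everything.
user_documents = {
    1: {"name": "Tom", "orders": [{"item": "book"}, {"item": "lamp"}]},
}
doc = user_documents[1]
embedded = [(doc["name"], order["item"]) for order in doc["orders"]]

print(joined == embedded)
```

Both approaches return the same information; the trade-off is that embedding can duplicate data that a normalized schema would store only once.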

#### Limitations of document-oriented data stores
- __Document size limit__:
    -   The popular document data stores usually have a limit on the size of each document they can store. For example, MongoDB has a limit of 16 MB as the maximum size per document. If the size exceeds this limit, we'll need to split the data across additional documents, which can be a hassle.
<p></p>

- __Difficulty joining documents__:
    -   Implementing joins in document data stores can be very difficult or even impossible (depending on how the data is structured)
<p></p>

- __High disk storage usage__:
    -   Due to data replication for backups, there is an increase in data redundancy which requires more disk storage and is obviously more costly

#### Top use cases

- __Content management systems (CMS)__:

Due to their flexible schema, document data stores are ideal for storing and analysing any type of data including images and videos in real-time. This makes them a perfect choice for storing and querying media-content (like images, text, etc) efficiently, like you might find in an online store such as eBay or Amazon.
<p></p>

- __Mobile apps__:

Due to their ability to support real-time big data, and the ease of scaling both vertically and horizontally, document data stores are an ideal choice for companies that need to collect mobile application data from millions of users. One such company is The Weather Channel, which uses a MongoDB data store to handle millions of requests per minute while simultaneously processing user data and weather update information obtained from thousands of data sources globally.

#### Popular document data stores

- [MongoDB](https://www.mongodb.com/)
- [CouchDB](https://couchdb.apache.org/)

### Graph-oriented

> The graph data store is one that stores data in a graph format

<p align="center">
  <img src="images/nosql-graph-example.png" width = 800>
</p> 

Graphs consist of nodes and edges. A graph data store uses those to represent data as follows:

- __Node__:
  - Stores the data entities
  - This entity stores the actual data itself, such as the number of people who read a certain tweet, or the number of people who watched a Youtube video
  - Node data is usually structured as key-value pairs, and each value is usually atomic. It's also possible to import CSV and JSON files as input data.
  
- __Edge__:
  - Stores the relationship between the various nodes
  - For example, an attribute of a tweet such as the number of retweets would have a direct relationship connecting it to the text of the tweet
  - Can also contain the direction showing how the data will flow between the nodes

  For example, below is JSON that represents the graph above.

In [None]:
{
    "Training": [
        {
            "termName": "NoSQLModule",
            "link": "/terms/Training/NoSQLModule",
            "info": "This module contains 2 notebooks on NoSQL",
            "relatedTerms": [
                {
                    "name": "Fundamentals",
                    "link": "/terms/go_to_term/2"
                },
                {
                    "name": "Hbase",
                    "link": "/terms/go_to_term/3"
                }
            ]
        }
    ],
    "category": "Training"
}

This advanced model allows for storing highly-connected data and for complex querying of that data. 

Despite the fact that graph data stores can represent even the most complex interconnected data structures, they are not widely used compared to other data stores. This is because many use cases can be handled with a simpler data storage tool that makes writing queries easier (many people are already used to SQL). For example, if you have a set of data records which simply map user id to user name, then a traditional relational database will suffice (and be highly performant); there is no need for a complex graph data store.

#### What does manipulating data in a graph data store do?
- New relationships between existing data are added by creating new edges between existing nodes
  - An edge always has a _start node_, _end node_, _type_, and _direction_
  - There is no limit to the number and kind of relationships a node can have
- New data is inserted by adding a new node.
  - Instead of creating tables or columns for each new data type, we can add a new node with a specific relationship to others

#### What does querying a graph data store look like?
A graph in a graph data store can be traversed along specific edge types or across the entire graph. 
Traversing the relationships is very fast because the relationships between nodes are not calculated at query times, but are persisted in the data store itself. 
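
The ideas above (nodes holding key-value properties, directed edges with a type, and traversal along stored relationships) can be sketched in a few lines of Python. `ToyGraph`, the `FOLLOWS` and `LIKES` edge types and the data are all invented for this illustration; a real graph data store persists and indexes these structures very differently.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical in-memory graph: not the API of any real graph data store."""

    def __init__(self):
        self.nodes = {}                    # node id -> properties (key-value pairs)
        self.outgoing = defaultdict(list)  # start node id -> [(edge type, end node id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, start, edge_type, end):
        """Edges are directed (start -> end) and labelled with a type."""
        self.outgoing[start].append((edge_type, end))

    def neighbours(self, node_id, edge_type):
        """Follow only the stored edges of one type; nothing is computed by joining."""
        return [end for etype, end in self.outgoing[node_id] if etype == edge_type]

    def friends_of_friends(self, node_id):
        """Follow FOLLOWS edges two hops out, excluding the start node and direct friends."""
        direct = set(self.neighbours(node_id, "FOLLOWS"))
        second = {fof for f in direct for fof in self.neighbours(f, "FOLLOWS")}
        return second - direct - {node_id}


g = ToyGraph()
for name in ("ann", "bob", "cat", "dan"):
    g.add_node(name, name=name.title())
g.add_edge("ann", "FOLLOWS", "bob")
g.add_edge("bob", "FOLLOWS", "cat")
g.add_edge("bob", "FOLLOWS", "ann")
g.add_edge("cat", "FOLLOWS", "dan")
g.add_edge("ann", "LIKES", "dan")  # a different edge type, ignored by the traversal

print(g.friends_of_friends("ann"))
```

Adding new data is just another `add_node` call, and a new relationship is another `add_edge` call; no table or column has to be created first.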

To get a better understanding of how queries operate in a graph data store, let's look at an example. Assume we have a CSV file with data about actors. In SQL, we can load that data into a table called `actors`. To query the table for movies that `Tom Hanks` starred in, we'd use the following SQL query:

In [None]:
SELECT * FROM actors
WHERE actor_name = 'Tom Hanks'

The equivalent query in a graph data store (using the Cypher query language) would be as follows:

In [None]:
MATCH (p {actor_name: 'Tom Hanks'})
RETURN p

The above Cypher query will return the node with the actor name `Tom Hanks`.  The expected result would look something like the below:

<p align="center">
  <img src="images/nosql-graph-output.png">
</p> 



Graph-oriented data stores are ideal for mapping social media type of relationships and hence this is their most popular use case in industry.

#### Strengths of graph data stores

- __High performance vs SQL databases for graph data__:
  - Very fast in creating relationships between data and querying them
  - One recent experiment found that Neo4j (one of the most popular types of graph data stores) was 60% faster than a MySQL database when running a friends of friends query.  Here is the [link to the experiment results](https://neo4j.com/news/how-much-faster-is-a-graph-database-really/#:~:text=For%20the%20simple%20friends%20of,on%20the%20depth%205%20query.)

<p></p>

- __Query and manipulate any part of the data__:
  - Graph data stores allow you to select and edit any data stored in any node with a query language
  - Key-value stores, for example, do not allow you to query attributes of records

<p></p>

- __Can represent even the most complex relationships between data__
  - Any node can connect to any other

<p></p>

- __Flexibility__:
  - New nodes can easily be added at any time - there's no need for updating a schema
  - Can support more complex data models when compared to key-value data stores. For example, in key-value stores the values cannot link to any other parts of records, whereas with graph data stores, any node can link to any other node.

<p></p>

- __ACID guarantees__:
  - Some types of graph data stores can provide ACID (Atomicity, Consistency, Isolation, Durability) properties similar to an RDBMS, which helps maintain data integrity
  - For example, Neo4j provides ACID guarantees (as do newer versions of MongoDB, which is a document data store rather than a graph data store)

#### Limitations of graph data stores

- __May be overly complicated for your use case__
  - Most data manipulation and analysis can be done easily without needing to represent it in a graph
  - Because of this, other types of data stores are more widely used, meaning less demand for more graph data stores, resulting in a limited number of them on the market

<p></p>

- __Slow for common queries__:
  - Queries that span the entire dataset (scans) are slow for graph data stores compared to other data stores
    - For example, calculating the average transaction value for each user would require you to get the value node from each transaction node from each user node. Doing the same thing with a relational database would be as simple as joining the user and transaction tables, and running an average aggregated by user - no need to traverse a graph.

<p></p>

- __No unified query language__:
  - There isn't yet a universal query language, and there may be a need to learn tool-specific languages to interact with the data
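
For the relational half of the "average transaction value for each user" example above, a minimal sketch using Python's built-in `sqlite3` module might look like this. The table names, column names and values are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, user_id INTEGER, value REAL);
    INSERT INTO users VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO transactions VALUES (1, 1, 10.0), (2, 1, 30.0), (3, 2, 5.0);
""")

# One join plus one aggregation; no per-user graph traversal is needed.
averages = conn.execute("""
    SELECT u.name, AVG(t.value)
    FROM users AS u
    JOIN transactions AS t ON t.user_id = u.id
    GROUP BY u.id, u.name
    ORDER BY u.name
""").fetchall()
print(averages)
```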

#### Top use cases 

- __Social media networks__:
  - Social media networks are naturally thought of as interconnected nodes representing people, so this type of data is a natural fit for a graph data store. Instead of having to convert this type of data into a table structure for analysis, a tool like Neo4j can be used.
<p></p>

- __Recommendation engines__:
  - Real-time recommendation engines are key to the success of many online businesses. One type of recommendation engine, called collaborative filtering, works by recommending similar things to similar people (e.g. products, movies, music). Graphs make it easy to see who is similar to whom by looking for nodes with similar connections.
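
One simple way to express "nodes with similar connections" is to compare the sets of things each person is connected to. The sketch below uses Jaccard similarity for this, which is just one possible measure chosen for illustration; the `likes` data and the `recommend` helper are invented, and production recommendation engines are considerably more involved.

```python
def jaccard(a: set, b: set) -> float:
    """Share of connections two nodes have in common (0.0 = none, 1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# person -> set of film nodes they are connected to (made-up data)
likes = {
    "ann": {"film_a", "film_b", "film_c"},
    "bob": {"film_a", "film_b", "film_d"},
    "cat": {"film_e"},
}


def recommend(person: str) -> set:
    """Suggest what the most similar other person likes that `person` has not seen."""
    others = [p for p in likes if p != person]
    most_similar = max(others, key=lambda p: jaccard(likes[person], likes[p]))
    return likes[most_similar] - likes[person]


print(recommend("ann"))
```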

#### Popular graph data stores

- [Neo4j](https://neo4j.com/)
- [Amazon Neptune](https://aws.amazon.com/neptune/)
- [Redis Graph](https://oss.redis.com/redisgraph/)
- [OrientDB](https://orientdb.org/)


### Key-value pair 

> Key-value data stores use a simple key-value design to store data.  Each key has an associated value or set of values. 

Before getting into the details of the key-value NoSQL data store, let's first review our understanding of what exactly is a key-value pair by looking at the below (very simple) JSON data containing an object with some flight information:


In [None]:
 
  {
    "id": "fc704c16fd79",
    "company": "US Airlines",
    "points": 25000,
    "duration": 590
  }


We'll notice that we have certain _keys_ such as id, company, points and duration.  For each key, we have an associated _value_.  The value can be retrieved by querying the file using its key.

For example:
-   If we parse the JSON for `id`, we'll get `fc704c16fd79` as the result
-   If we parse the JSON for `company`, we'll get `US Airlines` as the return value
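
In Python, the two lookups above can be reproduced with the standard `json` module, as in this short sketch:

```python
import json

flight_json = """
{
  "id": "fc704c16fd79",
  "company": "US Airlines",
  "points": 25000,
  "duration": 590
}
"""

flight = json.loads(flight_json)  # parse the JSON text into a Python dict
print(flight["id"])       # look up the value stored under the key "id"
print(flight["company"])  # look up the value stored under the key "company"
```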



Key-value data stores use the key-value method described above. The _keys_ are unique identifiers for the values. The _values_ can be any type of object - a number or a string, or even another key-value pair in which case the structure of the data grows more nested.

Unlike relational databases, which store data in tables where each column has an assigned data type, key-value data stores do not have a specified structure. They differ in both the keys and values:
- Keys: In key-value data stores, keys do not specifically have to all be the same type. However, as it's the only way of retrieving the value associated with it, naming/assigning the keys should be done strategically.
- Values: As with most NoSQL data stores, the values do not have to have a consistent schema. They may contain attributes not present in other records, or they may have different data types for some attributes.

Key names can range from automatically incrementing numbers to semantic descriptions of the value that it represents (e.g. `unit/module/lesson`).

The key-value store is one of the least complex types of NoSQL databases. This is precisely what makes this design so attractive. It uses very simple functions to store, retrieve and remove data. Apart from those main functions, key-value stores do not have a universal querying language. 
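
The "very simple functions" mentioned above can be sketched as a small in-memory class. `ToyKeyValueStore` and its `put`, `get` and `delete` methods are hypothetical names chosen for this illustration; each real key-value store has its own commands.

```python
class ToyKeyValueStore:
    """In-memory stand-in for a key-value store: put, get and delete by key only."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value  # overwrites any existing value for the key

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)  # deleting a missing key is a no-op


store = ToyKeyValueStore()
store.put("unit/module/lesson", {"title": "NoSQL overview"})  # semantic key
store.put(1, "keys do not all have to be the same type")      # numeric key
lesson = store.get("unit/module/lesson")
store.delete(1)
print(lesson, store.get(1))
```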

Below is an example diagram of what key-value data looks like.  Note that the columns are not of equal length (unlike relational databases):

<p align="center">
  <img src="images/key-value.png" width=600>
</p>

#### Querying key-value data stores vs SQL

To get a better understanding of what querying key-value data stores looks like, let's review an example. Assume we have a table called `Music` which stores song and artist information. In SQL, we would retrieve all song records that belong to a specific artist using the following command:


In [None]:
-- Return all songs by Alesso
SELECT * FROM Music
WHERE artist = 'Alesso'

In a key-value data store, such as Amazon's DynamoDB (one of the most popular tools), the corresponding query would be:

In [None]:
# Return all songs by Alesso

{
    TableName: "Music",
    KeyConditionExpression: "Artist = :a",
    ExpressionAttributeValues: {
        ":a": "Alesso"
    }
}

#### Strengths of key-value data stores

- __Simplicity__:
  - Key-value data stores are quite simple to use. The straightforward commands make work easier for data engineers.
  - This simplicity allows data to assume any type, or even multiple types, when needed
<p></p>

- __Speed__:
  - This simple architecture makes key-value data stores quick to respond, provided that the infrastructure is optimized
<p></p>

- __Scalability__:
  - This is a key advantage of NoSQL over relational databases in general, and of key-value stores in particular. Relational databases typically scale vertically (by moving to a more powerful server), whereas key-value stores also scale horizontally (by adding nodes to the cluster and spreading the keys across them).
<p></p>

- __Reliability__:
  -  Built-in redundancy automatically manages the restoration of data on lost nodes by using replication

#### Limitations of key-value data stores

- __Simplicity__:
  - Although this was also listed as a strength, the simplicity of key-value data stores can also make certain things hard. For example, there is no language nor straightforward means that allows querying with anything else other than the key.
<p></p>

- __No unified query language__:
  - Unlike SQL, which is (roughly) the same across all databases, different key-value data stores have their own way to query keys. Without a unified query language to use, queries from one data store may not be transportable into a different key-value store.
  - Values can't be filtered. Filtering by value is a common operation, which is hard to do with key-value data stores
  - All attributes of an entry matching a specific key queried are returned, rather than a specific attribute
  - When values get updated, the entire value section needs to be updated rather than just a specific part of it
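
The limitations listed above can be seen in a minimal sketch in which a plain Python dictionary stands in for a key-value store. The session data and the `scan_by_value` helper are invented for the illustration.

```python
sessions = {
    "s1": {"user": "ann", "theme": "dark"},
    "s2": {"user": "bob", "theme": "light"},
    "s3": {"user": "cat", "theme": "dark"},
}

# Lookup by key: one direct access, and the whole value comes back.
by_key = sessions["s2"]


def scan_by_value(store, field, wanted):
    """Filtering by value: without a secondary index, every entry must be inspected."""
    return sorted(key for key, value in store.items() if value.get(field) == wanted)


dark_theme_keys = scan_by_value(sessions, "theme", "dark")

# Changing one attribute: read the whole value, modify it, write the whole value back.
value = dict(sessions["s1"])
value["theme"] = "light"
sessions["s1"] = value

print(by_key, dark_theme_keys, sessions["s1"])
```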

#### Top use cases

- __Web-session storage__:
  - A session-oriented application, such as a web application, starts a session when a user logs in and is active until the user logs out or the session times out
  - During this period, the application stores all session-related data either in the main memory or in a data store
  - Session data may include: user profile information, messages, personalized data and themes, recommendations, targeted promotions, and discounts
  - Each user session has a _unique identifier_. Session data is never queried by anything other than a primary key, so a fast key-value store is a better fit for session data. 

- __Shopping cart__:
  - During the holiday shopping season, an e-commerce website may receive millions of orders in seconds. Key-value stores can handle the scaling of large amounts of data and extremely high volumes of state changes while servicing millions of simultaneous users through distributed processing and storage. 

#### Popular key-value data stores

- [Amazon DynamoDB](https://aws.amazon.com/dynamodb/)
  - DynamoDB is a data store trusted by many large-scale users in major companies
  - It is fully managed and reliable, with built-in backup and security options
  - It is able to endure high loads and handle up to trillions of requests daily with single-digit millisecond read times
<p></p>

- [Redis](https://redis.io/)
  - Redis is an open source key-value data store
  - With keys containing lists, hashes, strings and sets, Redis is known as a data structure server

### Columnar-oriented

> Column-based (sometimes called wide-column) data stores are another type of NoSQL data store. In column-oriented stores, data is stored in _cells_ grouped in columns of data rather than as rows of data, like we are used to in relational databases.

Using a columnar data store provides certain benefits over traditional relational databases.  One of the main benefits is the faster read-query performance because the column design keeps the data physically closer together, which reduces the seek time especially when only certain columns are required.  Another benefit is that columnar data stores are more efficient at compressing data (since it's closer together) and can have better compression ratios over relational databases.  
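
The read-performance argument above can be illustrated by holding the same records in a row layout and in a column layout. This is only a sketch with invented data and plain Python lists; it shows which values a query has to touch, not how a real columnar store lays data out on disk.

```python
# The same three records in two layouts (made-up data).
rows = [
    {"id": 1, "country": "UK", "amount": 10.0},
    {"id": 2, "country": "UK", "amount": 12.5},
    {"id": 3, "country": "FR", "amount": 7.5},
]

# Column layout: one contiguous list per column.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# Row layout: every whole record is visited even though only "amount" is needed.
total_from_rows = sum(row["amount"] for row in rows)

# Column layout: only the "amount" column is read; "id" and "country" are never touched.
total_from_columns = sum(columns["amount"])

print(columns["country"], total_from_rows, total_from_columns)
```

Because each column holds values of one kind next to each other (for example the repeated `"UK"` entries), the column layout also lends itself to compression.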


Relational databases have a set schema and they function as tables of rows and columns. Wide-column data stores have a similar, but different schema. They also have rows and columns. However, they are not fixed within a table, but rather have a _dynamic_ schema. Each column is stored separately. If there are similar (related) columns, they are joined into _column families_ and then the column families are stored separately from other column families. 

To help visualise this, take a look at the below diagram (note how the data is stored):

<p align="center">
  <img src="images/columnar.png" width=600>
</p>

The _row key_ is the first column in each column family, and it serves as an identifier of a row. Furthermore, each column after that has a column key (name). It identifies columns within rows and thus enables the querying of the columns. Column families can contain a virtually unlimited number of columns that can be created at runtime or while defining the schema. Reads and writes are done using columns rather than rows. Column families are groups of similar data that are usually accessed together. As an example, we often access customer names and profile information at the same time, but not the information on their orders. 

Here is an example of what a row key looks like:

<p align="center">
  <img src="images/columnar-row-key.png" width=600>
</p>

Now, let's look at a real example with some data. Below are 3 records containing `User Profile` information. We can see that we have 3 rowkeys (Bob, Britney and Tori). For each _rowkey_, we have a number of columns. Each column consists of a name (key) and some values. Note that for the first rowkey (Bob), we have 3 columns (emailAddress, gender and age), while for the second rowkey (Britney), we only have 2 columns (emailAddress and gender), and for the third rowkey (Tori), we have 3 columns, but _2 are different_ from those of the other rowkeys (country and hairColor). This column flexibility is an example of the schema-less architecture mentioned earlier. In a relational database, such a scenario could not happen, as the data must match a pre-defined schema or it will not be loaded into the table.
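
As a rough sketch, the three records described above can be modelled as nested Python dictionaries: row key, then column family, then column name. The column names follow the text, but the values, the column family name `profile` and the assumption that Tori's third column is `emailAddress` are all invented placeholders. The `get_row` and `column_names` helpers are hypothetical and only loosely mirror a row-key lookup.

```python
# row key -> column family -> column name -> value
# Values are invented placeholders; only the column names come from the text.
user_profile = {
    "Bob": {"profile": {"emailAddress": "bob@example.com", "gender": "M", "age": 35}},
    "Britney": {"profile": {"emailAddress": "britney@example.com", "gender": "F"}},
    "Tori": {"profile": {"emailAddress": "tori@example.com", "country": "Sweden", "hairColor": "Blue"}},
}


def get_row(table, row_key):
    """Fetch every column family stored under one row key."""
    return table.get(row_key, {})


def column_names(table, row_key, family):
    """List the columns that actually exist for this row in one column family."""
    return sorted(get_row(table, row_key).get(family, {}))


for key in user_profile:
    print(key, column_names(user_profile, key, "profile"))
```

Each row lists only the columns it really has, so no placeholder values are needed for the columns that Britney or Tori lack.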

<p align="center">
  <img src="images/columnar-user-profile.png" width=600>
</p>

<p></p>

Columns are logically grouped into _column families_. As mentioned earlier, this architecture optimizes the data store for fast retrieval of columns of data, rather than rows of data. Columnar data stores use the concept of _keyspace_, which is similar to a schema in relational models. This keyspace contains all the column families, which then contain rows, which then contain columns. See the below diagram to help visualise this:

<p align="center">
  <img src="images/columnar-keyspace.png" width=600>
</p>


Columnar data stores are most often utilized when there is a need for a large data model. Because they can store such a large data model, they are often used as data warehouses. They can also be appropriate when there is a need to run complex queries that, for example, filter based on values. This is hard to do with a key-value data store.

#### Querying a column-oriented data store vs SQL

For improved efficiency and user experience, many column-oriented data stores have query languages, which allow querying data stored in the table using the _column family_.

Let's look at an example to better understand this point. Assume we have a SQL table called `employee` which stores staff data. In SQL, to select all records that belong to a particular employee who has an id = 10, we would use the following command:


In [None]:
SELECT * FROM employee
WHERE id = 10

The equivalent query in HBase (assuming that the _rowkey_ is the employee id) would become:

In [None]:
get 'employee', '10'

#### Strengths of column-oriented data stores

- __Efficient in online analytical processing (OLAP) scenarios__:
  - Due to the nature of their design, columnar data stores are very fast in OLAP systems
  - This is because OLAP operates on data aggregations, which fits nicely with a column-based design
<p></p>

- __Able to store complex data models__:
  - Able to act as a data warehouse
  - Enables complex queries to be run faster. This is because instead of looking row by row, we can skip over non-required fields and focus only on the relevant columns.
<p></p>

- __Compression__:
  - Not only are they highly scalable, but they are also good at compressing data and thus saving storage
  - Column data is of a uniform type, which provides an opportunity for storage-size optimizations. For instance, missing values and duplicates can be represented by a 2-bit marker which helps to save space.
  - Self-indexing features use less disk space than indexing in relational databases, which helps to save disk storage space
  - Popular compression techniques include Dictionary encoding and Run-length encoding. When used on columnar data stores, they provide much better compression results when compared to row-based relational databases.
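
The two compression techniques named above can be sketched in a few lines of Python. The function names and the sample `country` column are invented for the illustration, and real columnar stores implement these encodings at the storage level rather than on Python lists.

```python
from itertools import groupby


def run_length_encode(column):
    """Collapse runs of equal adjacent values into (value, run length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def run_length_decode(pairs):
    """Expand (value, run length) pairs back into the original column."""
    return [value for value, count in pairs for _ in range(count)]


def dictionary_encode(column):
    """Replace each value with a small integer code plus a lookup table."""
    codes = {}
    encoded = [codes.setdefault(value, len(codes)) for value in column]
    return encoded, codes


country = ["UK", "UK", "UK", "FR", "FR", "UK"]

rle = run_length_encode(country)
encoded, codes = dictionary_encode(country)
print(rle)
print(encoded, codes)
```

Both encodings work well here because a single column holds values of one type with many repeats, which is much less common when whole rows are stored together.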
 

#### Limitations of column-oriented data stores

- __Poor Online transactional processing (OLTP) performance__:
  - Columnar NoSQL data stores are not as efficient at online transactional processing (OLTP) as they are at online analytical processing (OLAP). OLTP workloads are best served by row-based structures such as the traditional relational database.
  - This means they are not quick at updating (writing) data but rather are designed to quickly analyze them
<p></p>

- __Incremental data loading__:
  - While incremental data loads are not impossible, columnar data stores do not perform them in the most efficient way.
  - The columns first need to be scanned to identify the right rows and scanned further to locate the modified data which requires overwriting.
<p></p>

- __Row-specific queries__:
  -  Frequent queries that involve data existing in an entire row might cause performance issues by slowing down a column-oriented data store
  
#### Top use cases

- __Telecom call detail records__:
  - Telecom companies need to store billions of logs of recorded customer phone calls. One telecom operator had a requirement to store this continuously growing dataset (20 terabytes are added monthly) and query this data in real-time. As a mature technology that is extremely cheap but still provides the ability to query data, HBase was selected as the tool of choice.
<p></p>

- __IoT data monitoring__:
  - BlackBerry had a requirement to set up a cutting-edge IoT platform that was reliable, secure and scalable. This platform was to be used to store and analyze real-time machine data. Apache Cassandra (a columnar store) was selected as the tool of choice.

#### Popular columnar data stores

- [HBase](https://hbase.apache.org/)
- [Cassandra](https://cassandra.apache.org/_/index.html)


## Strengths of NoSQL over relational databases

- __Schema flexibility__:
    -   NoSQL data stores do not require a schema
    -   This makes it much easier to change the data model and add new data
<p></p>

- __Un/semi-structured data support__:
    -   NoSQL provides different types of data stores that can handle multiple data formats easily
    -   Examples of big data types that can be handled include video, images and documents such as JSON
    -   Structured data can also be stored
<p></p>

- __Ability to handle large volumes of data at high speed__:
    -   NoSQL data stores generally follow a scale-out architecture. This means that scalability is achieved by distributing the data storage over more nodes in the cluster. When extra storage is needed, we simply add additional computer nodes to the cluster.
    -   Can easily handle big data (billions or even trillions of data points).  
<p></p>

- __High availability__:
    -   The vast majority of NoSQL data stores leverage distributed computing, which means the data is spread out among many computers in the cluster. Moreover, data is generally replicated multiple times to ensure continuous availability
    -   This design avoids having a single point of failure (which was an issue in relational databases) and makes the system robust      

<p></p>


## Limitations of NoSQL compared to relational databases

- __Not as well optimised for analytical queries__
    - Having no schema can make it harder and slower to query your data
<p></p>

- __Immaturity__:
    -   When compared to relational databases, NoSQL is still novel and not that popular
    -   There might be integration issues with other tools as the technologies are still evolving
<p></p>

- __Less support__:
    -   NoSQL systems are mostly open-source and new, so support is limited compared to relational databases
    -   Moreover, finding the right talent to work on these tools may be somewhat challenging
<p></p>

- __Lack of standardisation__:
    -   Being a novel technology, there aren't clear-cut standards yet
    -   There isn't one scripting language to learn for querying (like SQL), so there may be a need to learn multiple languages


## Key Takeaways

- NoSQL is shorthand for Not Only SQL.  These are a family of data stores that manage data but operate very differently from relational (SQL) databases.
- There are 4 main groups of NoSQL data stores: key-value, columnar, document and graph. Each type is better suited for certain use cases.
    - Key-value data stores capture data in a simple format using a unique key and grouping the remaining data as the value parameter. This model is quite simple and is best suited for use cases involving web sessions and shopping carts. 
    - Columnar (also known as wide-column) data stores organise data by columns (as opposed to the row arrangement approach of relational databases). They are very popular in industry and are used as data warehouses in many global companies.
    - Document data stores capture and store related data in a single document (such as JSON). This makes it easier to group related information together.  They are commonly used for handling real-time data in organizations.
    - Graph data stores are a specialised type that is efficient at storing and querying graph data, such as data from social networks.  They are the least commonly used NoSQL data store.
- NoSQL technology provides numerous benefits such as schema flexibility, scalability, lower costs and the ability to manage structured and unstructured data.
- Nonetheless, the tools have some drawbacks especially since most of them are novel, immature, lack wide adoption in industry and don't have the support maturity that relational databases currently have.

