# NoSQL - Overview

> NoSQL is shorthand for: Not Only SQL

## What is NoSQL?

> NoSQL data stores are distributed data stores used to store, manage and interact with semi-structured and unstructured data (they can also manage structured data).

NoSQL stores are distributed, meaning that they run on many machines rather than one. Storage capacity therefore grows with the number of machines in the system, so they can scale to very large datasets: a single dataset might be spread across 100 computers. These data stores often also keep replicas (copies) of the data on different machines, so that data is not lost if one machine fails.
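
The idea of spreading data over machines and keeping replicas can be sketched in a few lines of Python. This is a minimal illustration, not how any particular NoSQL product places data: the function `replica_nodes`, the node names and the replication factor of 3 are all assumptions made for the example.

```python
import hashlib

def replica_nodes(key: str, nodes: list[str], replication_factor: int = 3) -> list[str]:
    """Pick the nodes that hold copies of `key` (hypothetical placement rule)."""
    # A stable hash means every client maps the same key to the same nodes.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    # Place copies on consecutive nodes, wrapping around the end of the list.
    return [nodes[(start + i) % len(nodes)] for i in range(replication_factor)]

# Hypothetical five-machine cluster.
nodes = [f"node-{i}" for i in range(5)]
placement = {key: replica_nodes(key, nodes) for key in ("user:1", "user:2", "user:3")}

# Simulate one machine failing: every key still has live copies elsewhere.
failed = "node-2"
survivors = {
    key: [node for node in holders if node != failed]
    for key, holders in placement.items()
}
print(survivors)
```

Because each key is stored on three different machines, losing any single machine still leaves at least two copies of every record.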

Traditionally, organizations have used relational databases to capture their data and query it.  

However, companies started seeing limitations of traditional relational databases:

-   Emergence of new data types such as videos and images due to the rapid adoption of the Internet, social media and IoT devices (variety)
-   Increasing rate of data generation (velocity)
-   Growing size of data files (for instance, video files can be much larger than CSVs) (volume)
-   Limited schema flexibility: relational databases require a predefined schema. If you want to add a column, you need to consider how it relates to other data and decide what value it takes for every existing row. This can make the schema difficult to maintain as requirements change.
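
The schema point in the last bullet can be made concrete with a small comparison. The sketch below uses Python's built-in `sqlite3` module for the relational side and plain dictionaries to stand in for schemaless records. The table, column and field names are invented for illustration.

```python
import sqlite3

# Relational: the table has a fixed schema, so a new attribute needs a schema change.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")
# The new column applies to every row; existing rows get NULL until populated.
conn.execute("ALTER TABLE users ADD COLUMN nickname TEXT")
conn.execute("INSERT INTO users (name, nickname) VALUES ('Grace', 'Amazing Grace')")
rows = conn.execute("SELECT id, name, nickname FROM users ORDER BY id").fetchall()
conn.close()

# Schemaless (document-style): each record carries only the fields it has.
documents = [
    {"id": 1, "name": "Ada"},
    {"id": 2, "name": "Grace", "nickname": "Amazing Grace"},  # new field, no migration
]

print(rows)
print(documents)
```

In the relational case the older row now has an empty `nickname` that someone must decide how to fill, whereas the older document is simply left as it was.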

Accordingly, a new group of technologies, termed NoSQL, started to emerge to address these limitations. The term NoSQL was first used in 1998 by Carlo Strozzi, as the name of his lightweight, open-source relational database, which did not use SQL as its query language.

__History of NoSQL__:
- 1998 - Carlo Strozzi used the term NoSQL for his lightweight, open-source database
- 2000 - Development of the graph database Neo4j began
- 2004 - Google began developing Bigtable (the research paper followed in 2006)
- 2005 - CouchDB launched
- 2007 - The research paper on Amazon Dynamo was released
- 2008 - Facebook open-sourced the Cassandra project
- 2009 - The term NoSQL was reintroduced, and NoSQL tools began to be rapidly adopted



## Characteristics of NoSQL Data Stores
-   __Non-relational__:
    -   They are mainly designed for semi-structured and unstructured data
<p></p>
    
-   __Schema-free__:
    -   Don't require the data to have a pre-determined model or schema (unlike SQL databases)
    -   Can store different types of data in the same place, e.g. JSON, images and CSV
    -   There are a number of different types of NoSQL data stores to handle the numerous data types (more on that in the next section)
<p></p>

-   __Replication of data stores to avoid single point of failure__:
    -   Stored data is spread out over many computers, known as nodes. This group of interconnected nodes makes up a _cluster_.
    -   This avoids having a centralized architecture that could lead to a single point of failure if the server fails
<p></p>

-   __Easy to scale__:
    -   Can grow "across" (adding more fields or columns) easily as a result of the schemaless architecture. This is a key difference between NoSQL and SQL.
    -   Can scale out and in (handling more or less data) by adding or removing nodes (computers) in the distributed cluster (group of computers). This means they can handle billions of data points.
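
To see why adding nodes helps, the sketch below distributes 10,000 made-up record keys over a small and a larger cluster and counts how many records each node ends up holding. The modulo-based placement in `node_for` is an assumption chosen for brevity; it is not taken from any specific NoSQL system.

```python
import hashlib
from collections import Counter

def node_for(key: str, node_count: int) -> int:
    """Map a key to a node index using a stable hash (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count

# Hypothetical dataset of 10,000 record keys.
keys = [f"record-{i}" for i in range(10_000)]

def records_per_node(node_count: int) -> Counter:
    """Count how many records land on each node for a given cluster size."""
    return Counter(node_for(key, node_count) for key in keys)

small_cluster = records_per_node(2)
large_cluster = records_per_node(8)

# The busiest node in the larger cluster holds far fewer records.
print(max(small_cluster.values()), max(large_cluster.values()))
```

One caveat on this design: with plain modulo placement, changing the number of nodes moves most keys to a different node. Real systems often use schemes such as consistent hashing to limit how much data has to move when the cluster grows or shrinks.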


## SQL vs NoSQL

There are some key differences between SQL and NoSQL tools. The main ones are highlighted in the table below:

<p align="center">
  <img src="images/sql-vs-nosql2.png" width=600>
  <figcaption align="center"><cite>Comparing SQL to NoSQL</cite></figcaption>

</p>

## Types of NoSQL

> The rapidly growing range of use cases means that tools need to handle different scenarios. Within NoSQL, technologies can be divided into 4 main groups, each suited to particular scenarios.

__Types of NoSQL data stores__:
-   Document-oriented
-   Graph-oriented
-   Key-value pair 
-   Columnar-oriented 

A high-level diagram of how each one of them can be visualised is below:

<p align="center">
  <img src="images/nosql-types2.png" width=600>
  <figcaption align="center"><cite>The 4 Types of NoSQL Data Stores</cite></figcaption>

</p>
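
As a rough complement to the diagram, the four models can be imitated with ordinary Python data structures. This is only a sketch of the shape of the data in each model, using invented records. Real data stores add indexing, persistence and distribution on top.

```python
# Key-value: an opaque value looked up by a unique key.
key_value = {"session:42": '{"user": "ada", "cart": ["book"]}'}

# Document: a self-describing, nested record.
document = {"_id": 42, "name": "Ada", "orders": [{"item": "book", "qty": 1}]}

# Columnar (wide-column): each row key maps to column families,
# and each row may hold a different, sparse set of columns.
wide_column = {
    "user:42": {"profile": {"name": "Ada"}, "activity": {"last_login": "2024-01-01"}},
    "user:43": {"profile": {"name": "Grace", "city": "New York"}},
}

# Graph: nodes plus explicit relationships (edges) between them.
graph = {
    "nodes": {"ada": {"label": "Person"}, "grace": {"label": "Person"}},
    "edges": [("ada", "FOLLOWS", "grace")],
}

# A relationship query is a walk over the edges.
followers_of_grace = [
    source
    for source, relation, target in graph["edges"]
    if relation == "FOLLOWS" and target == "grace"
]
print(followers_of_grace)
```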

## Comparing Different Types of NoSQL

Before diving into the details of each of the 4 types of NoSQL, let's look at the high-level characteristics of each one:

<p align="center">
  <img src="images/comparing-nosql.png" width=600>
  <figcaption align="center"><cite>Characteristics of the 4 NoSQL Data Stores</cite></figcaption>

</p>

## Strengths of NoSQL over Relational Databases

- __Schema flexibility__:
    -   NoSQL data stores do not require a schema
    -   This makes it much easier to change the data model and add new data
<p></p>

- __Un/semi-structured data support__:
    -   NoSQL provides different types of data stores that can handle multiple data formats easily
    -   Examples of big data types that can be handled include video, images and documents such as JSON
    -   Structured data can also be stored
<p></p>

- __Ability to handle large volumes of data at high speed__:
    -   NoSQL data stores generally follow a scale-out architecture. This means that scalability is achieved by distributing the data storage over more nodes in the cluster. When extra storage is needed, we simply add additional computer nodes to the cluster.
    -   Can easily handle big data (billions or even trillions of data points).  
<p></p>

- __High availability__:
    -   The vast majority of NoSQL data stores leverage distributed computing, which means the data is spread out among many computers in the cluster. Moreover, data is generally replicated multiple times to ensure continuous availability.
    -   This design avoids having a single point of failure (which can be an issue for a relational database running on a single server) and makes the system robust

<p></p>


## Limitations of NoSQL Compared to Relational Databases

- __Not as well optimised for analytical queries__:
    - Having no schema can make it harder and slower to query your data
<p></p>

- __Immaturity__:
    -   Compared to relational databases, NoSQL technologies are newer and less widely adopted
    -   There might be integration issues with other tools, as the technologies are still evolving
<p></p>

- __Less support__:
    -   NoSQL systems are mostly open-source and new, so support is limited compared to relational databases
    -   Moreover, finding the right talent to work on these tools may be somewhat challenging
<p></p>

- __Lack of standardisation__:
    -   As the technology is relatively new, there aren't clear-cut standards yet
    -   There isn't one query language to learn (like SQL), so there may be a need to learn a different language for each tool
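
The first limitation above, that having no schema can make querying harder, can be illustrated with a small sketch. The records and the helper `ages` are hypothetical. The point is that without a schema the query code has to cope with missing fields and inconsistent types itself.

```python
# Hypothetical schemaless records: nothing guarantees the shape of each one.
documents = [
    {"name": "Ada", "age": 36},
    {"name": "Grace"},              # no "age" field at all
    {"name": "Alan", "age": "41"},  # age stored as text rather than a number
]

def ages(docs):
    """Yield usable integer ages, skipping records that lack one."""
    for doc in docs:
        value = doc.get("age")
        if value is None:
            continue
        try:
            yield int(value)
        except (TypeError, ValueError):
            # The value exists but cannot be interpreted as a number.
            continue

valid_ages = list(ages(documents))
average_age = sum(valid_ages) / len(valid_ages)
print(valid_ages, average_age)
```

In a relational table with an integer `age` column, the database enforces the type on write, so an equivalent aggregate query would not need this defensive handling.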


## Key Takeaways

- NoSQL is shorthand for Not Only SQL. These are a family of data stores that manage data but operate very differently from relational (SQL) databases.
- There are 4 main groups of NoSQL data stores: key-value, columnar, document and graph. Each type is better suited for certain use cases.
    - Key-value data stores capture data in a simple format using a unique key and grouping the remaining data as the value parameter. This model is quite simple and is best suited for use cases involving web sessions and shopping carts. 
    - Columnar (also known as wide-column) data stores organise data by columns (as opposed to the row arrangement approach of relational databases). They are very popular in industry and are used as data warehouses in many global companies.
    - Document data stores capture and store related data in a single document (such as JSON). This makes it easier to group related information together. They are commonly used for handling real-time data in organizations.
    - Graph data stores are a specialised type that is efficient at storing and querying graph data, such as data from social networks. They are the least commonly used NoSQL data store.
- NoSQL technology provides numerous benefits such as schema flexibility, scalability, lower costs and the ability to manage structured and unstructured data.
- Nonetheless, NoSQL tools have some drawbacks, especially since most of them are relatively new and immature, lack wide adoption in industry and don't have the level of support that relational databases currently have.