## NoSQL - Graph-Oriented

> A graph-oriented data store is one that stores data as a graph: a set of entities and the relationships between them

<p align="center">
  <img src="images/nosql-graph-example.png" width = 800>
  <figcaption align="center"><cite>Graph Data Example</cite></figcaption>


</p> 

Graphs consist of _nodes_ and _edges_. A graph data store uses them to represent data as follows:

- __Node__:
  - Stores the data entities
  - This entity stores the actual data itself, such as the number of people who read a certain tweet, or the number of people who watched a YouTube video
  - Node data is usually structured as key-value pairs (often called properties), each holding an atomic value. It's also possible to import CSV and JSON files as input data.
  
- __Edge__:
  - Stores the relationship between the various nodes
  - For example, an attribute of a tweet such as the number of retweets would have a direct relationship connecting it to the text of the tweet
  - Can also have a direction, showing which node the relationship starts from and which node it points to

For example, below is JSON that represents the graph above:

```json
{
    "Training": [
        {
            "termName": "NoSQLModule",
            "link": "/terms/Training/NoSQLModule",
            "info": "This module contains 2 notebooks on NoSQL",
            "relatedTerms": [
                {
                    "name": "Fundamentals",
                    "link": "/terms/go_to_term/2"
                },
                {
                    "name": "Hbase",
                    "link": "/terms/go_to_term/3"
                }
            ]
        }
    ],
    "category": "Training"
}
```

This advanced model allows for storing highly connected data and for running complex queries over that data.
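
The following Python sketch shows roughly how nested JSON like the example above maps onto nodes and edges. It loads the same data, with the trailing commas removed so it parses. Each term becomes a node, and each `relatedTerms` entry becomes an edge. The `json_to_graph` helper and the `RELATED_TO` edge type are hypothetical names chosen for this example. Real graph stores typically provide their own import tooling for CSV and JSON.

```python
import json

# The JSON document shown above, as a string (valid JSON, no trailing commas)
DOC = """
{
  "Training": [
    {
      "termName": "NoSQLModule",
      "link": "/terms/Training/NoSQLModule",
      "info": "This module contains 2 notebooks on NoSQL",
      "relatedTerms": [
        {"name": "Fundamentals", "link": "/terms/go_to_term/2"},
        {"name": "Hbase", "link": "/terms/go_to_term/3"}
      ]
    }
  ],
  "category": "Training"
}
"""


def json_to_graph(doc):
    """Hypothetical helper: each term becomes a node, each related term an edge."""
    data = json.loads(doc)
    nodes = {}  # node id -> dict of key-value properties
    edges = []  # (start node, edge type, end node)
    for term in data["Training"]:
        nodes[term["termName"]] = {"link": term["link"], "info": term["info"]}
        for related in term["relatedTerms"]:
            # Related terms become nodes that only carry a link property
            nodes.setdefault(related["name"], {"link": related["link"]})
            edges.append((term["termName"], "RELATED_TO", related["name"]))
    return nodes, edges


nodes, edges = json_to_graph(DOC)
print(sorted(nodes))  # ['Fundamentals', 'Hbase', 'NoSQLModule']
print(edges)          # two RELATED_TO edges starting at NoSQLModule
```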

Although graph data stores can represent even the most complex interconnected data structures, they are not as widely used as other data stores. Many use cases can be handled by a simpler data storage tool whose queries are easier to write, and most developers already know SQL. For example, if your data records simply map `user id` to `username`, a traditional relational database will suffice (and be highly performant); there is no need for a complex graph data store.

## Data Manipulation in a Graph Data Store
- New relationships between existing data are added by creating new edges between existing nodes
  - An edge always has a _start node_, _end node_, _type_, and _direction_
  - There is no limit to the number and kind of relationships a node can have
- New data is inserted by adding a new node
  - Instead of creating tables or columns for each new data type, we can add a new node with a specific relationship to others
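
The two operations above can be sketched with a tiny in-memory property graph in Python. This is not how any particular graph database stores data. The `Graph` and `Edge` classes and the `FOLLOWS`/`POSTED` relationship types are hypothetical. Each edge records a start node, an end node, and a type, and its direction is implied by reading from start to end.

```python
from dataclasses import dataclass, field


@dataclass
class Edge:
    start: str     # id of the start node
    end: str       # id of the end node
    rel_type: str  # relationship type, e.g. "FOLLOWS"
    # The direction is implied: the edge points from `start` to `end`.


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)  # node id -> key-value properties
    edges: list = field(default_factory=list)  # list of Edge objects

    def add_node(self, node_id, **properties):
        # Inserting new data = adding a node; no table or column is declared first
        self.nodes[node_id] = properties

    def add_edge(self, start, end, rel_type):
        # A new relationship between existing nodes = a new edge
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must already exist")
        self.edges.append(Edge(start, end, rel_type))


g = Graph()
g.add_node("alice", kind="Person", name="Alice")
g.add_node("bob", kind="Person", name="Bob")
g.add_edge("alice", "bob", "FOLLOWS")
# A new kind of entity needs no schema change: just a node and an edge
g.add_node("t1", kind="Tweet", text="Hello graphs")
g.add_edge("alice", "t1", "POSTED")
print(len(g.nodes), len(g.edges))  # 3 2
```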


## Querying a Graph Data Store
A graph in a graph data store can be traversed along specific edge types or across the entire graph. Traversing relationships is very fast because the relationships between nodes are not calculated at query time; they are persisted in the data store itself.
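
The following minimal Python sketch makes the traversal idea concrete. It assumes a toy in-memory store in which each node keeps its outgoing edges grouped by type, a simplified take on what Neo4j calls index-free adjacency. The `connect` and `traverse` helpers are hypothetical. Passing `rel_type` restricts the walk to one edge type, while omitting it follows every edge.

```python
from collections import defaultdict, deque

# Relationships are stored up front: node -> edge type -> list of target nodes,
# so traversal follows stored links instead of computing joins at query time.
adjacency = defaultdict(lambda: defaultdict(list))


def connect(start, rel_type, end):
    adjacency[start][rel_type].append(end)


def traverse(start, rel_type=None):
    """Breadth-first walk from `start`; follow only `rel_type` edges if given."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        node = queue.popleft()
        order.append(node)
        types = [rel_type] if rel_type else list(adjacency[node])
        for t in types:
            for nxt in adjacency[node][t]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return order


connect("alice", "FOLLOWS", "bob")
connect("bob", "FOLLOWS", "carol")
connect("alice", "LIKES", "t1")
print(traverse("alice", "FOLLOWS"))  # ['alice', 'bob', 'carol']
print(traverse("alice"))             # ['alice', 'bob', 't1', 'carol']
```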

To get a better understanding of how queries operate in a graph data store, let's look at an example. Assume we have a CSV file with data about actors. In SQL, we can load that data into a table called `actors`. To retrieve the rows for `Tom Hanks` (for example, the movies he starred in), we'd use the following SQL query:

```sql
SELECT * FROM actors
WHERE actor_name = 'Tom Hanks';
```

The equivalent query in a graph data store, written in the Cypher query language, would be as follows:

```cypher
MATCH (p {actor_name: 'Tom Hanks'})
RETURN p
```

The Cypher query above returns the node whose `actor_name` property is `Tom Hanks`. The result would look something like this:

<p align="center">
  <img src="images/nosql-graph-output.png">
  <figcaption align="center"><cite>NoSQL Graph Query Output</cite></figcaption>
</p> 
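
The Python sketch below mimics both queries on a few made-up rows, since the original CSV is not given, for readers without a database at hand. It uses the standard-library `sqlite3` module for the SQL version. A hypothetical `match` helper serves as a rough stand-in for Cypher's `MATCH (p {actor_name: 'Tom Hanks'}) RETURN p`. This is not a Cypher implementation; it only shows that the pattern filters nodes by their key-value properties. Note that the SQL version returns one row per movie, while the graph version returns a single actor node.

```python
import sqlite3

# Illustrative sample rows; the actual CSV from the text is not available
ROWS = [
    ("Tom Hanks", "Forrest Gump"),
    ("Tom Hanks", "Cast Away"),
    ("Meg Ryan", "When Harry Met Sally"),
]

# Relational version: one row per (actor, movie) pair in an `actors` table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE actors (actor_name TEXT, movie TEXT)")
conn.executemany("INSERT INTO actors VALUES (?, ?)", ROWS)
# Parameterised form of the SQL query shown above
sql_result = conn.execute(
    "SELECT * FROM actors WHERE actor_name = ?", ("Tom Hanks",)
).fetchall()

# Graph version: every actor and every movie is a node holding key-value properties
graph_nodes = [{"actor_name": a} for a in sorted({a for a, _ in ROWS})]
graph_nodes += [{"title": m} for _, m in ROWS]


def match(nodes, **props):
    """Keep nodes whose properties contain every given key-value pair."""
    return [n for n in nodes if all(n.get(k) == v for k, v in props.items())]


graph_result = match(graph_nodes, actor_name="Tom Hanks")
print(len(sql_result))  # 2 (one row per Tom Hanks movie)
print(graph_result)     # [{'actor_name': 'Tom Hanks'}]
```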

Graph-oriented data stores are well suited to modelling social-media-style relationships, which is why this is one of their most common use cases in industry.

## Strengths of Graph Data Stores

- __High performance vs SQL databases for graph data__:
  - Very fast in creating relationships between data and querying them
  - One experiment found that Neo4j (one of the most popular graph data stores) was 60% faster than a MySQL database when running a friends-of-friends query. Here is the [link to the experiment results](https://neo4j.com/news/how-much-faster-is-a-graph-database-really/#:~:text=For%20the%20simple%20friends%20of,on%20the%20depth%205%20query.)
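
The benchmark itself isn't reproduced here. However, a small Python sketch on made-up friendship data can compare the shape of a friends-of-friends query in each model. The relational version needs a self-join on a hypothetical `friends` table, run with `sqlite3`. The graph-style version just follows stored adjacency two hops out. Each extra hop would add another self-join in SQL but only another loop level in the traversal. The sketch shows query shapes only, not performance.

```python
import sqlite3
from collections import defaultdict

# Hypothetical undirected friendships, listed once per pair
FRIENDS = [("ann", "ben"), ("ben", "cat"), ("ben", "dan"), ("cat", "eve")]

# Relational: store both directions, then self-join the table once per extra hop
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friends (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO friends VALUES (?, ?)", FRIENDS + [(b, a) for a, b in FRIENDS]
)
sql_fof = {row[0] for row in conn.execute(
    """SELECT DISTINCT f2.b FROM friends f1
       JOIN friends f2 ON f1.b = f2.a
       WHERE f1.a = ? AND f2.b <> ?""", ("ann", "ann"))}

# Graph-style: each person keeps a set of neighbours
adj = defaultdict(set)
for a, b in FRIENDS:
    adj[a].add(b)
    adj[b].add(a)


def friends_of_friends(person):
    # Follow stored edges two hops out, excluding the person themself
    return {fof for friend in adj[person] for fof in adj[friend] if fof != person}


graph_fof = friends_of_friends("ann")
print(sorted(graph_fof))  # ['cat', 'dan']
```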

<p></p>

- __Query and manipulate any part of the data__:
  - Graph data stores allow you to select and edit any data stored in any node with a query language
  - Key-value stores, for example, do not allow you to query attributes of records

<p></p>

- __Can represent even the most complex relationships between data__:
  - Any node can connect to any other

<p></p>

- __Flexibility__:
  - New nodes can easily be added at any time; there's no need to update a schema
  - Can support more complex data models when compared to key-value data stores. For example, in key-value stores the values cannot link to any other parts of records, whereas with graph data stores, any node can link to any other node.

<p></p>

- __ACID guarantees__:
  - Some graph data stores provide ACID (Atomicity, Consistency, Isolation, Durability) properties similar to an RDBMS, which helps maintain data integrity
  - For example, Neo4j provides ACID transactions (newer versions of MongoDB, which is a document store rather than a graph store, also offer multi-document ACID transactions)
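
Neo4j's own transaction API is not shown here. Instead, the following sketch uses Python's standard-library `sqlite3` as a relational stand-in to illustrate what atomicity means for a graph-style write that creates a node and an edge together. The `nodes`/`edges` tables and the `add_user_who_follows` helper are hypothetical. If the edge cannot be created, the new node is rolled back as well.

```python
import sqlite3

# Relational stand-in (not a graph database): nodes and edges as tables
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make edges reference existing nodes
conn.execute("CREATE TABLE nodes (id TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE edges (start_id TEXT NOT NULL REFERENCES nodes(id), "
    "end_id TEXT NOT NULL REFERENCES nodes(id), rel_type TEXT NOT NULL)"
)
conn.execute("INSERT INTO nodes VALUES ('alice')")
conn.commit()


def add_user_who_follows(conn, new_user, target):
    """Create a node and a FOLLOWS edge in one transaction: both or neither."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("INSERT INTO nodes VALUES (?)", (new_user,))
        conn.execute(
            "INSERT INTO edges VALUES (?, ?, 'FOLLOWS')", (new_user, target)
        )


add_user_who_follows(conn, "bob", "alice")  # succeeds
try:
    add_user_who_follows(conn, "carol", "nobody")  # edge target does not exist
except sqlite3.IntegrityError:
    pass  # the 'carol' node insert was rolled back together with the edge

print([r[0] for r in conn.execute("SELECT id FROM nodes ORDER BY id")])  # ['alice', 'bob']
```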

## Limitations of Graph Data Stores

- __May be overly complicated for your use case__:
  - Most data manipulation and analysis can be done easily without representing the data as a graph
  - Because of this, other types of data stores are more widely used; lower demand has meant fewer graph data stores on the market

<p></p>

- __Slow for common queries__:
  - Queries that span the entire dataset (scans) are slower in graph data stores than in other data stores
    - For example, calculating the average transaction value for each user would require visiting every user node, then every transaction node connected to it, to get each transaction's value. With a relational database, the same result comes from joining the user and transaction tables and computing an average (`AVG`) grouped by user; there is no graph to traverse.
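
Both approaches are sketched below in Python on made-up data. `sqlite3` runs the join with `AVG ... GROUP BY`, while the graph-style version walks every user and each transaction linked to it. To keep the sketch short, the transaction value is modelled as a property on the transaction node rather than as a separate value node. The table names and the `MADE` relationship are assumptions.

```python
import sqlite3
from collections import defaultdict

# Hypothetical data
USERS = [(1, "ann"), (2, "ben")]
TRANSACTIONS = [(1, 10.0), (1, 30.0), (2, 5.0)]  # (user_id, value)

# Relational: one join plus an aggregate grouped by user
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE transactions (user_id INTEGER, value REAL)")
conn.executemany("INSERT INTO users VALUES (?, ?)", USERS)
conn.executemany("INSERT INTO transactions VALUES (?, ?)", TRANSACTIONS)
sql_avg = dict(conn.execute(
    """SELECT u.name, AVG(t.value) FROM users u
       JOIN transactions t ON t.user_id = u.id
       GROUP BY u.name"""))

# Graph-style: each user node links (via hypothetical MADE edges) to transaction nodes
user_name = {uid: name for uid, name in USERS}
made = defaultdict(list)  # user name -> list of transaction nodes
for uid, value in TRANSACTIONS:
    made[user_name[uid]].append({"value": value})


def avg_per_user():
    # Full traversal: visit every user, then every transaction hanging off it
    return {name: sum(tx["value"] for tx in txs) / len(txs)
            for name, txs in made.items()}


graph_avg = avg_per_user()
print(graph_avg)  # {'ann': 20.0, 'ben': 5.0}
```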

<p></p>

- __No unified query language__:
  - No single query language is supported by every graph data store (Cypher, Gremlin, and SPARQL are all in use), so you may need to learn tool-specific languages to interact with the data

## Top Use Cases 

- __Social media networks__:
  - Social media networks are naturally thought of as interconnected nodes representing people, so this type of data is a natural fit for a graph data store. Instead of converting this data into a table structure for analysis, you can store and query it directly in a tool like Neo4j.

<p></p>

- __Recommendation engines__:
  - Real-time recommendation engines are key to the success of many online businesses. One type of recommendation engine, called collaborative filtering, works by recommending to each user the things (e.g. products, movies, music) that similar users liked. Graphs make it easy to find similar users by looking for nodes with similar connections.
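
Here is a minimal sketch of the idea in Python. It assumes a bipartite graph of hypothetical `LIKES` edges from users to items. Walking from a user to a shared item, then to another user, then to that user's other items finds candidate recommendations. Each candidate is scored by how many items the two users have in common. This simple neighbourhood-based variant was chosen for illustration; production collaborative filtering usually uses more careful similarity measures.

```python
from collections import Counter

# Hypothetical LIKES edges: user -> set of liked items
LIKES = {
    "ann": {"matrix", "alien", "heat"},
    "ben": {"matrix", "alien", "up"},
    "cat": {"up", "frozen"},
}


def recommend(user, likes=LIKES):
    """Score items liked by users who share at least one item with `user`."""
    mine = likes[user]
    scores = Counter()
    for other, theirs in likes.items():
        if other == user:
            continue
        overlap = len(mine & theirs)  # similarity = number of shared items
        if overlap == 0:
            continue
        for item in theirs - mine:    # only items `user` hasn't liked yet
            scores[item] += overlap
    # Highest score first; ties broken alphabetically so the output is stable
    return sorted(scores, key=lambda item: (-scores[item], item))


print(recommend("ann"))  # ['up']
print(recommend("ben"))  # ['heat', 'frozen']
```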

## Popular Graph Data Stores

- [Neo4j](https://neo4j.com/)
- [Amazon Neptune](https://aws.amazon.com/neptune/)
- [RedisGraph](https://oss.redis.com/redisgraph/)
- [OrientDB](https://orientdb.org/)


## Key Takeaways

- The graph-oriented data store is another type of NoSQL data store; it stores data in a graph structure
- It is not as widely used as other types, as it is designed for specialised data such as social media information
- Graphs consist of _nodes_, which store the data entities themselves, and _edges_, which store the relationships between the nodes
- Data is added by creating a new node. New relationships are added by creating new edges between existing nodes 
- Graph data stores have several strengths, including high performance for connected data, the ability to represent complex data easily, flexibility, and (in some products) ACID guarantees
- On the other hand, Graph data stores may be overly complex for simple scenarios, can be slow for common queries that scan the entire dataset, and don't have a standard query language
- Popular graph data stores used in industry include Neo4j, Amazon Neptune, RedisGraph and OrientDB
