## Chapter 2: Data Models and Query Languages
Different data models (relational, document, and graph-based) have different characteristics, and each is best suited to particular use cases.

---

### Relational model vs Document model
The relational model, proposed by Edgar Codd in 1970, was initially a theoretical proposal, and many people at the time
doubted whether it could be implemented efficiently. However, by the mid-1980s,
relational database management systems (RDBMSes) and SQL had become the tools
of choice for most people who needed to store and query data with some kind of regular
structure. The dominance of relational databases has lasted around 25 to 30 years,
an eternity in computing history.

The roots of relational databases lie in business data processing, which was performed
on mainframe computers in the 1960s and ’70s. The use cases appear mundane from
today’s perspective: typically transaction processing (entering sales or banking transactions,
airline reservations, stock-keeping in warehouses) and batch processing (customer
invoicing, payroll, reporting).

Relational databases turned out to
generalize very well, beyond their original scope of business data processing, to a
broad variety of use cases. Much of what you see on the web today is still powered by
relational databases, be it online publishing, discussion, social networking, ecommerce,
games, software-as-a-service productivity applications, or much more.

#### The Birth of NoSQL

There are several driving forces behind the adoption of NoSQL databases, including:
* A need for greater scalability than relational databases can easily achieve, including
very large datasets or very high write throughput
* A widespread preference for free and open source software over commercial
database products
* Specialized query operations that are not well supported by the relational model
* Frustration with the restrictiveness of relational schemas, and a desire for a more
dynamic and expressive data model

---

### Graph-Like Data Models

If your application has mostly one-to-many relationships
(tree-structured data) or no relationships between records, the document
model is appropriate.

But what if many-to-many relationships are very common in your data? The relational
model can handle simple cases of many-to-many relationships, but as the connections
within your data become more complex, it becomes more natural to start
modeling your data as a graph.

A graph consists of two kinds of objects: _vertices_ (also known as nodes or entities) and
_edges_ (also known as relationships or arcs). Many kinds of data can be modeled as a
graph. Typical examples include:
* Social graphs
    - Vertices are people, and edges indicate which people know each other.
* The web graph
    - Vertices are web pages, and edges indicate HTML links to other pages.
* Road or rail networks
    - Vertices are junctions, and edges represent the roads or railway lines between
them.

![](images/image_5.png)

Two common ways of structuring graph data are the **property graph** model and the **triple-store** model.

#### Property graph
In the property graph model, each vertex consists of:
* A unique identifier
* A set of outgoing edges
* A set of incoming edges
* A collection of properties (key-value pairs)

Each edge consists of:
* A unique identifier
* The vertex at which the edge starts (the tail vertex)
* The vertex at which the edge ends (the head vertex)
* A label to describe the kind of relationship between the two vertices
* A collection of properties (key-value pairs)

Some important aspects of this model are:
1. Any vertex can have an edge connecting it with any other vertex. There is no
schema that restricts which kinds of things can or cannot be associated.
2. Given any vertex, you can efficiently find both its incoming and its outgoing
edges, and thus traverse the graph—i.e., follow a path through a chain of vertices
—both forward and backward.
3. By using different labels for different kinds of relationships, you can store several
different kinds of information in a single graph, while still maintaining a clean
data model.
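
The structure described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not how any particular graph database stores its data: the class and method names (`PropertyGraph`, `successors`, `predecessors`) and the sample vertices and edges are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Vertex:
    vertex_id: str                                            # unique identifier
    properties: dict[str, Any] = field(default_factory=dict)  # key-value pairs
    outgoing: list[str] = field(default_factory=list)         # ids of edges that start here
    incoming: list[str] = field(default_factory=list)         # ids of edges that end here


@dataclass
class Edge:
    edge_id: str   # unique identifier
    tail: str      # id of the vertex at which the edge starts
    head: str      # id of the vertex at which the edge ends
    label: str     # kind of relationship between the two vertices
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.vertices: dict[str, Vertex] = {}
        self.edges: dict[str, Edge] = {}

    def add_vertex(self, vertex_id: str, **properties: Any) -> None:
        self.vertices[vertex_id] = Vertex(vertex_id, dict(properties))

    def add_edge(self, edge_id: str, tail: str, head: str, label: str,
                 **properties: Any) -> None:
        # No schema check: any vertex may be connected to any other vertex.
        self.edges[edge_id] = Edge(edge_id, tail, head, label, dict(properties))
        self.vertices[tail].outgoing.append(edge_id)
        self.vertices[head].incoming.append(edge_id)

    def successors(self, vertex_id: str, label: Optional[str] = None) -> list[str]:
        """Follow outgoing edges forward, optionally filtered by label."""
        edges = (self.edges[e] for e in self.vertices[vertex_id].outgoing)
        return [e.head for e in edges if label is None or e.label == label]

    def predecessors(self, vertex_id: str, label: Optional[str] = None) -> list[str]:
        """Follow incoming edges backward, optionally filtered by label."""
        edges = (self.edges[e] for e in self.vertices[vertex_id].incoming)
        return [e.tail for e in edges if label is None or e.label == label]


# Made-up sample data: people and a company stored in the same graph.
g = PropertyGraph()
g.add_vertex("lucy", kind="person", age=33)
g.add_vertex("alain", kind="person")
g.add_vertex("acme", kind="company")
g.add_edge("e1", "lucy", "alain", "marriedTo")
g.add_edge("e2", "lucy", "acme", "worksAt", since=2015)
g.add_edge("e3", "alain", "acme", "worksAt")

print(g.successors("lucy"))                      # all outgoing edges
print(g.successors("lucy", label="worksAt"))     # forward, one label only
print(g.predecessors("acme", label="worksAt"))   # backward traversal
```

Keeping both an `outgoing` and an `incoming` list on each vertex is what makes traversal cheap in both directions, and filtering by `label` shows how several kinds of relationship can share one graph.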

#### Triple-store
The triple-store model is mostly equivalent to the property graph model, using different
words to describe the same ideas. It is nevertheless worth discussing, because
there are various tools and languages for triple-stores that can be valuable additions
to your toolbox for building applications.

In a triple-store, all information is stored in the form of very simple three-part statements:
(subject, predicate, object). For example, in the triple (Jim, likes, bananas), Jim
is the subject, likes is the predicate (verb), and bananas is the object.

The subject of a triple is equivalent to a vertex in a graph. The object is one of two
things:
1. A value in a primitive datatype, such as a string or a number. In that case, the
predicate and object of the triple are equivalent to the key and value of a property
on the subject vertex. For example, (lucy, age, 33) is like a vertex lucy with properties
{"age":33}.
2. Another vertex in the graph. In that case, the predicate is an edge in the graph,
the subject is the tail vertex, and the object is the head vertex. For example, in
(lucy, marriedTo, alain) the subject and object lucy and alain are both vertices,
and the predicate marriedTo is the label of the edge that connects them.
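
The two cases above can be illustrated with a small Python sketch that turns a list of triples into vertex properties and labeled edges. The `Ref` wrapper is a hypothetical device used here to mark an object that refers to another vertex rather than a primitive value, and the helper names `to_property_graph` and `match` are likewise assumptions for this example.

```python
from typing import Any, NamedTuple


class Ref(NamedTuple):
    """Hypothetical marker: the object is another vertex, not a primitive value."""
    vertex_id: str


# (subject, predicate, object) statements taken from the examples above.
triples = [
    ("lucy", "age", 33),                   # object is a primitive value
    ("lucy", "marriedTo", Ref("alain")),   # object is another vertex
]


def to_property_graph(triples):
    """Split triples into vertex properties and (tail, label, head) edges."""
    properties: dict[str, dict[str, Any]] = {}
    edges: list[tuple[str, str, str]] = []
    for subject, predicate, obj in triples:
        properties.setdefault(subject, {})
        if isinstance(obj, Ref):
            # Case 2: the predicate is an edge label, subject is the tail, object the head.
            properties.setdefault(obj.vertex_id, {})
            edges.append((subject, predicate, obj.vertex_id))
        else:
            # Case 1: predicate and object are the key and value of a property.
            properties[subject][predicate] = obj
    return properties, edges


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that match the given parts; None acts as a wildcard."""
    return [
        (s, p, o) for s, p, o in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


properties, edges = to_property_graph(triples)
print(properties)
print(edges)
print(match(triples, subject="lucy", predicate="marriedTo"))
```

The `match` helper hints at how triple-store queries tend to work: a pattern with some parts fixed and others left open is compared against every stored statement.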

---

Examples of database systems using a graph-like data model:

| Property graph | Triple-store |
| -------------- | ------------ |
| Neo4j          | Datomic      |
| Titan          | AllegroGraph |
| InfiniteGraph  |              |

Declarative query languages for graphs include Cypher, SPARQL, and Datalog.

---

Other data models are designed for specific use cases. For example:

* Researchers working with genome data often need to perform sequence-similarity
searches, which means taking one very long string (representing a
DNA molecule) and matching it against a large database of strings that are similar,
but not identical. None of the databases described here can handle this kind
of usage, which is why researchers have written specialized genome database
software like GenBank.

* Particle physicists have been doing Big Data–style large-scale data analysis for
decades, and projects like the Large Hadron Collider (LHC) now work with hundreds
of petabytes! At such a scale, custom solutions are required to stop the
hardware cost from spiraling out of control.

* Full-text search is arguably a kind of data model that is frequently used alongside
databases.