Additional documentation updates
mpredli01 committed May 25, 2021
1 parent 56c2f33 commit b82f8a1
Showing 2 changed files with 77 additions and 57 deletions.
70 changes: 35 additions & 35 deletions spec/src/main/asciidoc/introduction.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -12,18 +12,18 @@
//
// SPDX-License-Identifier: EPL-2.0 OR GPL-2.0 WITH Classpath-exception-2.0

== Let's talk about standard to NoSQL database in Java
== Let's Talk About Standard to NoSQL Databases in Java

NoSQL DB are databases that provide mechanisms for storage and retrieval of unstructured data (non-relational), in stark contrast of the tabular relations used in relational databases. NoSQL databases, comparing to relational databases, have better performance and high scalability. They are becoming more popular in several industry verticals, such as finance and streaming. As a result of this increased usage, the number of users and database vendors are increasing too.
NoSQL databases provide mechanisms for storage and retrieval of unstructured (non-relational) data, in stark contrast to the tabular relations used in relational databases. Compared to relational databases, NoSQL databases can offer better performance and higher scalability. They are becoming more popular in several industry verticals, such as finance and streaming. As a result of this increased usage, the number of users and database vendors is increasing.

A NoSQL database is defined basically by a model of storage. There are four types:
A NoSQL database is basically defined by a model of storage. There are four types:

=== Key-value
=== Key-Value

.Key-value structure
.Key-Value structure
image::key-value.svg[alt=key-value structure, width=300%]

This database has a structure that looks like a java.util.Map API, values are mapped to keys.
This database type has a structure that resembles the java.util.Map API: values are mapped to keys.
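
The Map analogy can be sketched in plain Java. This is only an illustration built on `java.util.HashMap`; the class name `KeyValueBucketSketch` and its methods are hypothetical and are not part of Jakarta NoSQL or of any database driver.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration: a key-value "bucket" modelled as an in-memory Map.
final class KeyValueBucketSketch {

    private final Map<String, String> bucket = new HashMap<>();

    // Store a value under its key, replacing any previous value.
    void put(String key, String value) {
        bucket.put(key, value);
    }

    // Look a value up by its key; empty when the key is absent.
    Optional<String> get(String key) {
        return Optional.ofNullable(bucket.get(key));
    }

    // Remove the pair stored under the key, if any.
    void delete(String key) {
        bucket.remove(key);
    }

    public static void main(String[] args) {
        KeyValueBucketSketch deities = new KeyValueBucketSketch();
        deities.put("diana", "hunt");
        System.out.println(deities.get("diana").orElse("not found"));
        deities.delete("diana");
        System.out.println(deities.get("diana").isPresent());
    }
}
```

Every operation goes through the key, which is why this model is the simplest of the four.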

*Examples:*

Expand All @@ -33,7 +33,7 @@ This database has a structure that looks like a java.util.Map API, values are ma
• Voldemort


.Key-Value vs Relational structure
.Key-Value vs. Relational Structure
|===
| Relational structure | Key-value structure
| Table | Bucket
Expand All @@ -42,12 +42,12 @@ This database has a structure that looks like a java.util.Map API, values are ma
| Relationship | ----
|===

=== Document collection
=== Document

.Document structure
.Document Structure
image::document.svg[alt=document structure,width=160%]

This model can store documents without a predefined structure. This document may be composed of numerous fields with many different kinds of data, including a document inside another document. This model works either with XML or JSON file.
This model can store documents without a predefined structure. A document may be composed of numerous fields with different kinds of data, including a document inside another document. This model works with either XML or JSON files.
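
As a rough sketch, a document can be pictured as a map of field names to values, where a value may itself be another document. The class `DocumentSketch`, the field names and the sample values below are assumptions made for illustration only; real document databases use their own document types.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration: a schemaless document as nested maps.
final class DocumentSketch {

    // Build a document whose fields hold different kinds of data,
    // including another document nested under "address".
    static Map<String, Object> sampleDocument() {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Rome");

        Map<String, Object> document = new LinkedHashMap<>();
        document.put("_id", "diana");
        document.put("name", "Diana");
        document.put("powers", Arrays.asList("hunt", "moon"));
        document.put("address", address);
        return document;
    }

    public static void main(String[] args) {
        Map<String, Object> document = sampleDocument();
        @SuppressWarnings("unchecked")
        Map<String, Object> address = (Map<String, Object>) document.get("address");
        System.out.println(document.get("name") + " lives in " + address.get("city"));
    }
}
```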

*Examples:*

Expand All @@ -56,7 +56,7 @@ This model can store documents without a predefined structure. This document may
* MongoDB


.Document vs Relational structure
.Document vs Relational Structure
|===
| Relational structure | Document Collection structure
| Table | Collection
Expand All @@ -67,7 +67,7 @@ This model can store documents without a predefined structure. This document may

=== Column Family

.Column Family structure
.Column Family Structure
image::column.svg[alt=column family structure, width=300%]

This model became popular with the Bigtable paper by Google, with the goal of being a distributed storage system for structured data, designed for both high scalability and high data volume.
Expand All @@ -81,7 +81,7 @@ This model became popular with the Bigtable paper by Google, with the goal of be
* SimpleDB


.Column Family vs Relational structure
.Column Family vs Relational Structure
|===
| Relational structure | Column Family structure
| Table | Column Family
Expand All @@ -92,16 +92,16 @@ This model became popular with the Bigtable paper by Google, with the goal of be

=== Graph

.Graph structure
.Graph Structure
image::graph.svg[alt=Graph structure, width=300%]

A graph database uses graph structures for semantic queries with nodes, edges and properties to represent and store data.

* *Vertex*: A node in the graph, it stores data like the table in SQL or a Document in a Document database;
* *Edge*: The element that establishes the relationship between vertices;
* *Property*: A key-value pair that can be at
* *Vertex*: A node in the graph. It stores data like the table in SQL or a Document in a Document database;
* *Edge*: An element that establishes the relationship between vertices;
* *Property*: A key-value pair that defines an edge's properties.

.Graph with vertex, edge and properties
.Graph with Vertex, Edge and Properties
image::graph_deep.svg[alt=Graph structure, width=460%]

The graph direction is an important concept in a graph structure. For example, you can know a person despite this person not knowing you. This is stored in the relationship (edge) direction of the graph.
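
Edge direction can be illustrated with a small adjacency map. This sketch is not tied to any graph database or to Apache TinkerPop; the class `DirectedGraphSketch`, the method names and the vertex names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration: directed "knows" edges between person vertices.
final class DirectedGraphSketch {

    // Outgoing "knows" edges, keyed by the source vertex.
    private final Map<String, Set<String>> outgoing = new HashMap<>();

    // Add a directed edge: 'from' knows 'to'. The reverse edge is not implied.
    void addKnows(String from, String to) {
        outgoing.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    // True only when an edge exists in the from -> to direction.
    boolean knows(String from, String to) {
        return outgoing.getOrDefault(from, Collections.emptySet()).contains(to);
    }

    public static void main(String[] args) {
        DirectedGraphSketch graph = new DirectedGraphSketch();
        graph.addKnows("you", "celebrity");
        System.out.println(graph.knows("you", "celebrity"));
        System.out.println(graph.knows("celebrity", "you"));
    }
}
```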
Expand All @@ -123,7 +123,7 @@ The graph direction is an important concept in a graph structure. For example, y
|===


=== Multi-model database
=== Multi-Model Database

Some databases support more than one storage model. These are known as multi-model databases.

Expand All @@ -134,27 +134,27 @@ Some databases have support for more than one kind of model storage. This is the

=== Scalability vs Complexity

Every database type has specific persistence structures to solve particular problems. There is a balance regarding model complexity; more complicated models are less scalable. E.g., a key-value NoSQL database is more scalable and there is a simple complexity because all queries and operations are key-based.
Every database type has specific persistence structures to solve particular problems. There is a balance regarding model complexity; more complicated models are less scalable. For example, as shown in Figure 6, a key-value NoSQL database is more scalable and has low complexity because all queries and operations are key-based.

.Scalability vs Complexity
.Scalability vs. Complexity
image::scalability_vs_complexity.svg[alt=Scalability vs Complexity, width=300%]

=== BASE vs ACID
.BASE vs ACID
.BASE vs. ACID
image::acid_vs_base.png[alt=BASE vs ACID, width=300%]

Relational persistence technologies have as key characteristics: Atomicity, Consistency, Isolation, Durability (ACID):
Key characteristics of relational persistence technologies are defined as: Atomicity, Consistency, Isolation, Durability (ACID):

* *Atomicity*: Either all transaction operations complete, or none will.
* *Consistency*: The database is in a consistent state when a transaction begins and ends.
* *Isolation*: A transaction will behave as if it is the only operation being performed upon the database.
* *Isolation*: A transaction will behave as if it is the only operation being performed on the database.
* *Durability*: Upon completion of a transaction, an operation will not be reversed.

In the NoSQL world, the key characteristics are Basic Availability, Soft-state and Eventual consistency (BASE):
In the NoSQL world, the key characteristics are defined as: Basic Availability, Soft-State and Eventual consistency (BASE):

* *Basic Availability*: The database appears to work most of the time.
* *Soft-state*: Stores dont have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
* *Eventual consistency*: Stores exhibit consistency at some later point (e.g., lazily at read time).
* *Soft-state*: Data stores don't have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
* *Eventual consistency*: Data stores exhibit consistency at some point later (e.g., lazily at read time).
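
One way to picture soft state and consistency reached "lazily at read time" is a toy model with two in-memory replicas. This is a simplified sketch under stated assumptions (a single process, a counter used as a version, reconciliation on read); it is not how any particular database implements replication, and the class `ReadRepairSketch` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration: two replicas that converge when a key is read.
final class ReadRepairSketch {

    // A value together with the version of the write that produced it.
    static final class Versioned {
        final String value;
        final long version;

        Versioned(String value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final Map<String, Versioned> replicaA = new HashMap<>();
    private final Map<String, Versioned> replicaB = new HashMap<>();
    private long clock = 0;

    // A write reaches only replica A; replica B is left stale (soft state).
    void write(String key, String value) {
        replicaA.put(key, new Versioned(value, ++clock));
    }

    // Read replica B directly, without any reconciliation.
    String staleRead(String key) {
        Versioned v = replicaB.get(key);
        return v == null ? null : v.value;
    }

    // Read both replicas, keep the newest version and copy it to both.
    String read(String key) {
        Versioned a = replicaA.get(key);
        Versioned b = replicaB.get(key);
        Versioned newest = (b == null || (a != null && a.version >= b.version)) ? a : b;
        if (newest != null) {
            replicaA.put(key, newest);
            replicaB.put(key, newest);
        }
        return newest == null ? null : newest.value;
    }

    public static void main(String[] args) {
        ReadRepairSketch store = new ReadRepairSketch();
        store.write("diana", "hunt");
        System.out.println(store.staleRead("diana")); // replica B has not seen the write yet
        System.out.println(store.read("diana"));      // reconciling read
        System.out.println(store.staleRead("diana")); // replica B has now converged
    }
}
```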

=== CAP Theorem
.CAP Theorem
Expand All @@ -166,24 +166,24 @@ The CAP theorem is applied to distributed systems that store state. Eric Brewer,
* *Availability*: Every non-failing node returns a response for all read and write requests in a reasonable amount of time. The key word here is "every". To be available, every node (on either side of a network partition) must be able to respond in a reasonable amount of time.
* *Partition Tolerance*: The system continues to function and uphold its consistency guarantees in spite of network partitions. Network partitions are a fact of life. Distributed systems guaranteeing partition tolerance can gracefully recover from partitions once the partition heals.

=== The diversity in NoSQL
=== The Diversity in NoSQL

There are around two hundred and twenty-five NoSQL databases (at time of writing). These databases usually support one or more types of structures. They also have specific behavior. Particular features make developer’s life more comfortable in different ways, such as Cassandra Query language in Cassandra databases, a search engine in Elasticsearch, live query in OrientDB, N1QL in Couchbase, and so on. Such aspects matter when the topic is NoSQL databases.
There are approximately 225 NoSQL databases (at time of writing). These databases usually support one or more types of structures. They also have specific behavior. Particular features make a developer's experience more comfortable in different ways, such as the Cassandra Query Language in Cassandra databases, a search engine in Elasticsearch, live queries in OrientDB, N1QL in Couchbase, and so on. Such aspects matter with NoSQL databases.

=== Standard in SQL

Java applications that use relational databases have as good practice a layer between business logic and data. This is known as DAO - Data Access Object. There are also APIs, such as JPA and JDBC providing advantages to developers:
Java applications that use relational databases have, as a good practice, a layer between business logic and data. This is known as a Data Access Object (DAO). There are also APIs, such as JPA and JDBC, that provide advantages to developers:


* Avoid vendor lock-in. With a standard (such as JDBC), changing databases has less impact and is easier to implement because only the driver needs to change.
* There is no need to learn a new API to work with a new database - that's implemented into the driver.
* There is less code change when changing to a new vendor. There may be some code changes, but not all code that talks to the database is lost.
* There is no need to learn a new API to work with a new database - that's implemented in the driver.
* There is less code change when changing to a new vendor. There may be some code changes, but not all code that communicates with the database is lost.
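
The DAO layer described above might look like the following minimal sketch. All names here (`DaoSketch`, `Person`, `PersonDao`, `InMemoryPersonDao`) are hypothetical, and the in-memory implementation merely stands in for a JDBC- or JPA-backed one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of the DAO pattern; not taken from any real API.
final class DaoSketch {

    // Entity used only for this sketch.
    static final class Person {
        final String id;
        final String name;

        Person(String id, String name) {
            this.id = id;
            this.name = name;
        }
    }

    // The DAO contract: business logic depends on this interface only.
    interface PersonDao {
        void save(Person person);

        Optional<Person> findById(String id);
    }

    // One possible implementation. A class backed by a real database could
    // implement the same interface without changes to the calling code.
    static final class InMemoryPersonDao implements PersonDao {
        private final Map<String, Person> rows = new HashMap<>();

        @Override
        public void save(Person person) {
            rows.put(person.id, person);
        }

        @Override
        public Optional<Person> findById(String id) {
            return Optional.ofNullable(rows.get(id));
        }
    }

    // Business logic sees only the PersonDao interface.
    static String greet(PersonDao dao, String id) {
        return dao.findById(id).map(p -> "Hello, " + p.name).orElse("Unknown person");
    }

    public static void main(String[] args) {
        PersonDao dao = new InMemoryPersonDao();
        dao.save(new Person("1", "Ada"));
        System.out.println(greet(dao, "1"));
    }
}
```

Because `greet` only knows the interface, swapping the storage technology means supplying a different `PersonDao` implementation rather than rewriting the business logic.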

Currently, there are no NoSQL standards for Java. This causes a Java developer to:

* Be locked-in to a vendor
* Learn a new API every time it needs to use a new database. Every database vendor change has a high impact, because a rewrite of the communication layer is needed. This happens even when changing to a new database that is the same kind of NoSQL database.
* Learn a new API every time an application needs to use a new database. Every database vendor change has a high impact because a rewrite of the communication layer is required. This happens even when changing to a new database that is the same kind of NoSQL database.

There are initiatives to create NoSQL APIs, such as Spring Data, Hibernate ORM, and TopLink. JPA is a popular API in the Java world, which is why all these initiatives try to use it. However, this API is created for SQL and not for NoSQL and, as such, doesn't support all behaviors in NoSQL databases. Many NoSQL databases have no transaction concecpt, and many NoSQL database don't support to asynchronous insertion either.
There are initiatives to create NoSQL APIs, such as Spring Data, Hibernate OGM, and TopLink. JPA is a popular API in the Java world, which is why all these initiatives try to use it. However, this API was created for SQL and not for NoSQL and, as such, doesn't support all behaviors in NoSQL databases. Many NoSQL databases have no transaction concept, and many NoSQL databases also don't support asynchronous insertion.

The solution, in this case, is the creation of a specification that covers the four kinds of NoSQL database; as described, each kind has specific structures that must be recognized. This new API should resemble JPA because of its popularity amongst Java developers. It should also be extensible, to support cases when a database has more than one particular behavior.
The solution, in this case, is to create a specification that covers the four types of NoSQL databases described earlier in this chapter. Each database type has specific structures that must be recognized. This new API should resemble JPA because of its popularity amongst Java developers. It should also be extensible, to support cases when a database has more than one particular behavior.
64 changes: 42 additions & 22 deletions spec/src/main/asciidoc/jakarta_nosql_spec.adoc
Original file line number Diff line number Diff line change
Expand Up @@ -31,12 +31,12 @@ include::license-alv2.asciidoc[]

Jakarta NoSQL is a Java framework that streamlines the integration of Java applications with NoSQL databases. It defines a set of APIs and provides a standard implementation for most NoSQL databases. This helps keep applications loosely coupled to the underlying NoSQL technologies they use.

The project has two layers that define communication with NoSQL databases through API's. These are:
The project has two layers that define communication with NoSQL databases through APIs. These are:

1. *Communication Layer*: Contains four modules, one for each NoSQL database type: Key-Value, Column Family, Document and Graph.
1. *Communication Layer*: This layer contains four modules, one for each NoSQL database type: Key-Value, Column Family, Document and Graph.
In the traditional RDBMS world, this layer may be compared to the JDBC API.

2. *Mapping Layer*: This layer is annotation-driven and uses technologies like Jakarta Contexts and Dependency Injection and Jakarta Bean Validation, making it simple for developers to use.
2. *Mapping Layer*: This layer is annotation-driven and uses technologies such as Jakarta Contexts and Dependency Injection and Jakarta Bean Validation, making it simple for developers to use.
In the traditional RDBMS world, this layer may be compared to the Java Persistence API or object-relational mapping frameworks such as Hibernate.

Jakarta NoSQL defines an API for each NoSQL database type. However, it uses the same annotations to map Java objects.
Expand Down Expand Up @@ -76,7 +76,11 @@ Service service = container.select(Service.class).get();
DeityRepository repository = service.getDeityRepository();
Deity diana = Deity.builder().withId("diana").withName("Diana").withPower("hunt").builder();
Deity diana = Deity.builder()
.withId("diana")
.withName("Diana")
.withPower("hunt")
.builder();
repository.save(diana);
Expand All @@ -87,44 +91,60 @@ Optional<Deity> nameResult = repository.findByName("Diana");

=== Beyond JPA

JPA is a good API for object-relational mapping and it's already a standard in the Java world defined in JSRs. It would be ideal to use the same API for both SQL and NoSQL, but there are behaviors in NoSQL that SQL does not cover, such as time to live and asynchronous operations. JPA was simply not made to handle those features.
JPA is a good API for object-relational mapping and has established itself as a standard in the Java world defined in JSRs. It would be ideal to use the same API for both SQL and NoSQL, but there are behaviors in NoSQL that SQL does not cover, such as time to live and asynchronous operations. JPA was simply not designed to handle those features.


[source,java]
----
ColumnTemplate template = …;
Deity diana = Deity.builder().withId("diana").withName("Diana")
.withPower("hunt").builder();
ColumnTemplate template = ...;
Deity diana = Deity.builder()
.withId("diana")
.withName("Diana")
.withPower("hunt")
.builder();
Duration ttl = Duration.ofSeconds(1);
template.insert(diana, ttl);
----


=== A Fluent API

Jakarta NoSQL is a fluent API that makes it easier for Java developers create queries that either retrieve or delete information in a Document type, for example.
Jakarta NoSQL provides a fluent API that makes it easier for Java developers to create queries that either retrieve or delete information, for example in a Document database type.

[source,java]
----
DocumentTemplate template = //;//a template to document nosql operations
Deity diana = Deity.builder().withId("diana").withName("Diana")
.withPower("hunt").builder();
DocumentTemplate template = ...; // a template to document nosql operations
Deity diana = Deity.builder()
.withId("diana")
.withName("Diana")
.withPower("hunt")
.builder();
template.insert(diana);//insert an entity
DocumentQuery query = select().from(Deity.class).where("name")
.eq("Diana").build();//select Deity where name equals “Diana”
List<Deity> deities = template.select(query);//execute query
DocumentDeleteQuery delete = delete().from("deity").where("name")
.eq("Diana").build();//delete query
DocumentQuery query = select()
.from(Deity.class)
.where("name")
.eq("Diana")
.build(); // select Deity where name equals “Diana”
List<Deity> deities = template.select(query); // execute query
DocumentDeleteQuery delete = delete()
.from("deity")
.where("name")
.eq("Diana")
.build(); // delete query
template.delete(delete);
----

=== Let's not reinvent the wheel: Graph
=== Let's Not Reinvent the Wheel: Graph Database Type

The Communication Layer defines three new APIs: Key-Value, Document and Column Family. It does not have a new Graph API, because a very good one already exists. Apache TinkerPop is a graph computing framework for both graph databases (OLTP) and graph analytic systems (OLAP). Because Apache TinkerPop is used as the Communication API for Graph databases, the Mapping API has a tight integration with it.

=== Particular behavior matters in NoSQL database
=== Particular Behavior Matters in NoSQL Databases

Particular behavior matters. Even within the same type, each NoSQL database has a unique feature that is a considerable factor when choosing a database over another. This ‘’feature’’ might make it easier to develop, make it more scaleable or consistent from a configuration standpoint, have the desired consistency level or search engine, etc. Some examples are Cassandra and its Cassandra Query Language and consistency level, OrientDB with live queries, ArangoDB and its Arango Query Language, Couchbase with N1QL - the list goes on. Each NoSQL has a specific behavior and this behavior matters, so Jakarta NoSQL is extensible enough to capture this substantiality different feature elements.
Particular behavior matters. Even within the same type, each NoSQL database has a unique feature that may be a considerable factor when choosing one database over another. This "feature" might make it easier to develop, make it more scalable or consistent from a configuration standpoint, have the desired consistency level or search engine, etc. Some examples include: Cassandra and its Cassandra Query Language and consistency level; OrientDB with live queries; ArangoDB and its Arango Query Language; Couchbase with N1QL; etc. Each NoSQL database has a specific behavior and this behavior matters, so Jakarta NoSQL was designed to be extensible enough to capture these substantially different feature elements.

[source,java]
----
Expand All @@ -143,11 +163,11 @@ ConsistencyLevel level = ConsistencyLevel.THREE;
template.save(person, level);
----

=== Key features
=== Key Features

* Simple APIs supporting all well-known NoSQL storage types - Column Family, Key-Value Pair, Graph and Document databases.
* Use of Convention Over Configuration
* Easy-to-implement API Specification and Test Compatibility Kit (TCK) for NoSQL Vendors
* Easy-to-implement API Specification and Technology Compatibility Kit (TCK) for NoSQL Vendors
* The API’s focus is on simplicity and ease of use. Developers should only have to know a minimal set of artifacts to work with Jakarta NoSQL. The API is built on Java 8 features like Lambdas and Streams, and therefore fits perfectly with the functional features of Java 8+.

