# Aerospike Java Client – Data Modeling 101

The goal of this tutorial is to explain basic intuitions about modeling with Aerospike.
The key to getting the most out of Aerospike is to find the right way to match an application’s data model to Aerospike’s data model.

This notebook contains:
* A modeling-oriented overview of Aerospike’s architecture.
* Questions about an application to help determine how to align to Aerospike’s data model and/or data types. 
* Simple example API calls associated with each.

This notebook does not include:
* Discussion of Normalizing and Denormalizing Data.
* Detailed examples for each data model or data type.
* Techniques for efficient reads or updates.

Other tutorials will focus on these facets of modeling in more detail.

This [Jupyter Notebook](https://jupyter-notebook.readthedocs.io/en/stable/notebook.html) requires the Aerospike Database running locally, a Java kernel, and the Aerospike Java Client. To create a Docker container that satisfies these requirements and holds a copy of these notebooks, visit the [Aerospike Notebooks Repo](https://github.com/aerospike/aerospike-dev-notebooks.docker).

## Notebook Setup

### Import Jupyter Java Integration 

Make it easier to work with Java in Jupyter.

```java
import io.github.spencerpark.ijava.IJava;
import io.github.spencerpark.jupyter.kernel.magic.common.Shell;

IJava.getKernelInstance().getMagics().registerMagics(Shell.class);
```

### Start Aerospike

Ensure Aerospike Database is running locally.

```
%sh asd
```

### Download the Aerospike Java Client

Ask Maven to download and install the project object model (POM) of the Aerospike Java Client.

```
%%loadFromPOM
<dependencies>
  <dependency>
    <groupId>com.aerospike</groupId>
    <artifactId>aerospike-client</artifactId>
    <version>5.0.0</version>
  </dependency>
</dependencies>
```

### Start the Aerospike Java Client and Connect

Create an instance of the Aerospike Java Client, and connect to the demo cluster.

The default cluster location for the Docker container is *localhost* port *3000*. If your cluster is not running on your local machine, modify *localhost* and *3000* to the values for your Aerospike cluster.

```java
import com.aerospike.client.AerospikeClient;

AerospikeClient client = new AerospikeClient("localhost", 3000);
System.out.println("Initialized the client and connected to the cluster.");
```

```
Initialized the client and connected to the cluster.
```


## A KV Store with Deliberate Structure

Aerospike differentiates itself from other Key-Value Stores through its architecture and the structure and tools that follow from it. One can write loosely organized documents to Aerospike and still achieve enough performance for most applications. However, when applications must deliver high performance at scale, expert use of Aerospike's structure, which other Key-Value Stores do not provide, is what produces those results.

### Aerospike Uses All Storage Media Types to Achieve High Performance 

Aerospike was architected to efficiently store document-oriented data.
Priorities:
* Efficient parallel use of all of a machine’s storage media, especially flash storage (SSD, PCIe, NVMe).
* Reads with sub-millisecond latencies at very high throughput (100K to 1M operations per second), even under a heavy write load.

The Aerospike data model is a direct result of these priorities. 


### Schema-less Relational Database

The pieces of the Aerospike data model can be thought of as a mirror of the anatomy of a relational database.
* Namespace → Relational Database
* Primary Index → Primary Index
* Set → Table
* Record → Database Row
* Bin → Field

Despite the similarities to their RDBMS counterparts, each of these elements has a well-defined purpose and characteristics that make it scale differently.

## Match App Data Model Elements to Aerospike's Model

The best practice is to consider the dimensions along which the application data must scale and to match the application's data model elements to the Aerospike data model elements. Because Aerospike is built for scalability, a good match lets the application scale with high performance.

To accomplish this, first learn about the elements of the Aerospike data model.

## Elements of the Aerospike Data Model

The following are the elements of the [Aerospike Data Model](https://www.aerospike.com/docs/architecture/data-model.html): 
* Namespace and Primary Index
* Set
* Key and Digest
* Record
* Bin
    * Collection Data Types
        * List
        * Map

At low read and write volumes, the above may seem like unnecessary complexity. However, as the application scales, the structure of the Aerospike data model lets Aerospike be used precisely at petabyte scale, more efficiently (measured as ROI × performance) than most kinds of database products.

The following sections share modeling-related details and API code for working with those elements.

### Namespace and Primary Index
The **Namespace** is a server configuration. It associates index and data with specific storage media. Because each type of data in a data model has a different read/write profile, it is common to divide data across multiple namespaces. For example, an ecommerce app might store its hottest sale items in RAM and the rest in flash.

Each Aerospike server in a cluster has a **Primary Index** per namespace that records the location of every record stored on that node, across all storage media. Each record takes 64 bytes in the index, a substantial footprint at scale. Optionally, the index can hold the record data itself instead of the data's location.
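To see why the 64-byte index footprint matters, here is a back-of-the-envelope sizing sketch in plain Java. The replication factor of 2 and the four-node cluster are illustrative assumptions, not values from this notebook; the arithmetic also assumes records are spread evenly across nodes and ignores any other index overhead.

```java
import java.util.Locale;

public class IndexSizingSketch {
    // Per-record primary index footprint stated in the text above.
    static final long INDEX_BYTES_PER_RECORD = 64;

    // Approximate primary-index memory per node, assuming every copy of every record
    // (master and replicas) is indexed and records are spread evenly across nodes.
    static long indexBytesPerNode(long records, int replicationFactor, int nodes) {
        return records * INDEX_BYTES_PER_RECORD * replicationFactor / nodes;
    }

    public static void main(String[] args) {
        // Illustrative inputs: 1 billion records, replication factor 2, 4 nodes.
        long perNode = indexBytesPerNode(1_000_000_000L, 2, 4);
        double gib = perNode / (double) (1L << 30);
        System.out.println(String.format(Locale.ROOT, "%.1f GiB of index per node", gib)); // 29.8 GiB of index per node
    }
}
```

Doubling the record count or the replication factor doubles the index memory, which is why the index footprint is an early input to namespace and cluster sizing.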

### Set

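A **Set** is an optional segment of a Namespace. Unlike a namespace, a set needs no server configuration; the client names the set when it writes a record.
A set makes it easy to read or delete a group of related records together, for example by scanning or truncating the set.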

### Key and Digest
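A **Record** is uniquely identified by its namespace and **Digest**. The digest is a 20-byte RIPEMD-160 hash of the set name and the user key, generated by the client. The digest is always stored in the Aerospike Database; the user key itself is stored only if the client requests it (for example, by setting `sendKey` in the write policy).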

### Creating a Key using Namespace, Set, and User Key

The following is Java Client code to create a key using the namespace, set, and user key.

```java
import com.aerospike.client.Key;

String namespaceName = "test";
String setName = "dm101set";

Integer theKey = 0;  // A user key can be an integer, a string, or a byte array.

Key key = new Key(namespaceName, setName, theKey);
System.out.println("Key created.");
```

```
Key created.
```


### Record

Aerospike offers a form of record-level *ACID compliance*. That is, Aerospike can execute multiple operations on a single record as one atomic transaction.

The structure of a record is a Map containing:
* Metadata
   * Expiration
   * Last Update Time
   * Generation Counter
* Map of Bins
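As a mental model only, the record structure above can be sketched as a plain Java class: metadata fields plus a map of bins. The `SketchRecord` class and `put` helper below are hypothetical and are not part of the Aerospike client. They mimic the default put behavior seen later in this notebook, where a put adds or replaces the named bins, keeps the other bins, and increments the generation counter.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RecordModelSketch {
    // Hypothetical stand-in for an Aerospike record: metadata plus a map of bins.
    public static final class SketchRecord {
        public int generation;          // incremented on every write
        public long lastUpdateMillis;   // last update time, Unix epoch milliseconds
        public long expiration;         // left at 0; expiration is not modeled here
        public final Map<String, Object> bins = new LinkedHashMap<>();
    }

    // Hypothetical in-memory store keyed by user key.
    public static final Map<Object, SketchRecord> store = new LinkedHashMap<>();

    // Mimics a default put: create the record if needed, merge bins, bump the generation.
    public static SketchRecord put(Object userKey, Map<String, Object> newBins) {
        SketchRecord rec = store.computeIfAbsent(userKey, k -> new SketchRecord());
        rec.bins.putAll(newBins);
        rec.generation++;
        rec.lastUpdateMillis = System.currentTimeMillis();
        return rec;
    }

    public static void main(String[] args) {
        put(0, Map.of("str", "modeling", "int", 8));
        SketchRecord rec = put(0, Map.of("tuple", List.of(9.92, "Carl Lewis")));
        // The second put kept "str" and "int", so this prints generation=2, bins=3.
        System.out.println("generation=" + rec.generation + ", bins=" + rec.bins.size());
    }
}
```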

### Bin
A **Bin** is a flexible container that holds data of one type. That type can be either a scalar or a collection data type; however, a bin's data type is not formally declared.
When one or more operations are applied to a record as a transaction, Aerospike returns the return values grouped by bin. A bin can be the unit of data replication across an Aerospike cluster. The bin name is stored in the Aerospike Database.

### Creating a Simple Record Containing an Integer and a String Bin

The following Java client code uses the key created above to put integer and string data into a record in Aerospike.

```java
import com.aerospike.client.Bin;
import com.aerospike.client.policy.ClientPolicy;

String aString = "modeling";
Integer anInteger = 8;

String stringBinName = "str";
String integerBinName = "int";
ClientPolicy clientPolicy = new ClientPolicy();

Bin bin0 = new Bin(stringBinName, aString);
Bin bin1 = new Bin(integerBinName, anInteger);

// Write both bins to the record in one put, using the default write policy.
client.put(clientPolicy.writePolicyDefault, key, bin0, bin1);

System.out.println("Put data into Aerospike: " + stringBinName + "=" + aString + ", " + integerBinName + "=" + anInteger);
```

```
Put data into Aerospike: str=modeling, int=8
```


### Reading the Record

Use the same key to read the record.

```java
import com.aerospike.client.Record;

// A null policy uses the client's default read policy.
Record record = client.get(null, key);
System.out.println("Generation count: " + record.generation);
System.out.println("Record expiration: " + record.expiration);
System.out.println("Bins: " + record.bins);
```

```
Generation count: 1
Record expiration: 359403813
Bins: {str=modeling, int=8}
```
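The `expiration` value printed above is not a Unix timestamp. The Java client documents `Record.expiration` as seconds since 2010-01-01 00:00:00 GMT, so converting it to a calendar time means adding that epoch offset. Below is a minimal plain-Java sketch of the conversion; the `ExpirationSketch` class is illustrative only. The client also offers `Record.getTimeToLive()` for the remaining lifetime in seconds, which this sketch does not use.

```java
import java.time.Instant;

public class ExpirationSketch {
    // 2010-01-01T00:00:00Z expressed in Unix epoch seconds; Aerospike expiration
    // values count seconds from this point rather than from 1970.
    static final long AEROSPIKE_EPOCH_SECONDS = 1262304000L;

    static Instant expirationToInstant(long aerospikeExpiration) {
        return Instant.ofEpochSecond(AEROSPIKE_EPOCH_SECONDS + aerospikeExpiration);
    }

    public static void main(String[] args) {
        // The expiration value from the output above.
        System.out.println(expirationToInstant(359403813L)); // 2021-05-22T18:23:33Z
    }
}
```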


### Collection Data Types

Lists and maps are **Collection Data Types (CDTs)**. These are flexible, schema-less data types that can contain nested scalar data and/or nested collection data types. For data efficiency, lists are frequently used to create [tuples](https://en.wikipedia.org/wiki/Tuple), a lightweight record structure that uses position instead of field names.

Maps are commonly used to hold JSON-like data structures. Because a bin can contain either a scalar or a collection data type, a common question when creating a data model is whether to store a level of record data in one bin containing a CDT or across multiple bins.
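The trade-off between a tuple and a map can be sketched in plain Java, using the same world-record data as the next example. The index constants and field names below are hypothetical: with a tuple, the meaning of each position lives in application code, while a map stores its field names alongside the values.

```java
import java.util.List;
import java.util.Map;

public class TupleSketch {
    // Hypothetical positional schema; these names are never stored with the data.
    static final int TIME = 0, ATHLETE = 1, PLACE = 2, DATE = 3;

    static String describeTuple(List<Object> tuple) {
        return tuple.get(ATHLETE) + " ran " + tuple.get(TIME) + " in " + tuple.get(PLACE);
    }

    static String describeMap(Map<String, Object> named) {
        return named.get("athlete") + " ran " + named.get("time") + " in " + named.get("place");
    }

    public static void main(String[] args) {
        List<Object> tuple = List.of(9.92, "Carl Lewis", "Seoul, South Korea", "September 24, 1988");
        Map<String, Object> named = Map.of("time", 9.92, "athlete", "Carl Lewis",
                "place", "Seoul, South Korea", "date", "September 24, 1988");
        System.out.println(describeTuple(tuple)); // Carl Lewis ran 9.92 in Seoul, South Korea
        System.out.println(describeMap(named));   // Carl Lewis ran 9.92 in Seoul, South Korea
    }
}
```

Both forms carry the same information; the tuple saves the space of the field names at the cost of relying on every reader agreeing on the positions.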

#### Lists

Create a tuple and put it in Aerospike.

```java
import com.aerospike.client.Value;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// A tuple: each element's meaning comes from its position in the list.
ArrayList<Value> aTuple = new ArrayList<Value>();
aTuple.add(Value.get(9.92));
aTuple.add(Value.get("Carl Lewis"));
aTuple.add(Value.get("Seoul, South Korea"));
aTuple.add(Value.get("September 24, 1988"));

String tupleBinName = "tuple";
Bin bin2 = new Bin(tupleBinName, aTuple);

client.put(clientPolicy.writePolicyDefault, key, bin2);
Record record = client.get(null, key);

System.out.println("Put data into Aerospike: " + tupleBinName + "=" + aTuple);
System.out.println("After operation, Bins: " + record.bins);
System.out.println( tupleBinName + ": " + record.getValue(tupleBinName));
```

```
Put data into Aerospike: tuple=[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]
After operation, Bins: {str=modeling, int=8, tuple=[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]}
tuple: [9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]
```


#### Maps

Suppose that, rather than a simple tuple, the model needs a map containing a list of tuples. Reuse the tuple bin.

```java
import java.util.HashMap;

// A map whose "world-records" entry holds a list of tuples.
String tupleMapKey = "world-records";
ArrayList<Value> tupleList = new ArrayList<Value>();
tupleList.add(Value.get(aTuple));
HashMap<String, ArrayList<Value>> wrMap = new HashMap<String, ArrayList<Value>>();
wrMap.put(tupleMapKey, tupleList);

// Replace the contents of the existing tuple bin with the map.
Bin bin2 = new Bin(tupleBinName, wrMap);

client.put(clientPolicy.writePolicyDefault, key, bin2);
Record record = client.get(null, key);

System.out.println("After operation, " + tupleBinName + ": " + record.getValue(tupleBinName));
```

```
After operation, tuple: {world-records=[[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]]}
```


## Distinguishing Questions

From a modeling perspective, each Aerospike data model element is a potential mesh point with the application data model. The following questions offer broad guidance on how to fit them together. The *italic* text after each question explains the intuition to apply.

### Questions related to Storage Media

**Q:** Does the application require a specific storage medium for a particular type of data, to achieve a necessary scale and frequency of reads or writes?

*The easiest way to match data to hardware is to assign it to the right namespace. Namespaces associate index and data with storage media.*


**Q:** Is there a class of data that is very small and is read or written extremely frequently?

*Aerospike can store such data in the index itself, instead of storing the data's location.*

### Questions related to Reads
**Q:** Is there a class of data where a large number of similar application records are frequently read by the application?

*The fastest way to read a group of application records is either:*
* *Scanning an Aerospike namespace or set of records.*
* *Creating a secondary index and querying against that secondary index.*

*At large scale, the time a scan spends parsing every record in a namespace is significant, so it is worth considering when building an Aerospike data model.*

**Q:** Are reads interspersed with writes as part of complex transactions on a record? Does the application require mid-transaction reads?
Best practice is to store data operated on by different threads of transactional work in separate bins.

*When operating on a record, the Aerospike client returns a map of bins containing return values for each bin’s operations. These return values can be accessed by index, making transaction results easy to work with.*

**Q:** Is the size of a set of application records small (e.g., closer to 1 MiB than to 1 GiB)?

*Consider storing the set of app records in one or more Aerospike records, rather than in a set or namespace of records.*
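A rough way to answer this question is to estimate how many Aerospike records the data would occupy if packed together. The sketch below assumes a 1 MiB maximum record size, which matches the default `write-block-size` for SSD-backed namespaces in Aerospike 5.x; check your own namespace configuration. It ignores per-record and per-bin overhead, so treat the result as a lower bound.

```java
public class RecordPackingSketch {
    // Assumption: 1 MiB maximum record size (default write-block-size for SSD namespaces).
    static final long MAX_RECORD_BYTES = 1L << 20;

    // Lower bound on the number of Aerospike records needed to hold all app records,
    // ignoring per-record and per-bin overhead.
    static long recordsNeeded(long appRecordCount, long bytesPerAppRecord) {
        long totalBytes = appRecordCount * bytesPerAppRecord;
        return (totalBytes + MAX_RECORD_BYTES - 1) / MAX_RECORD_BYTES; // ceiling division
    }

    public static void main(String[] args) {
        System.out.println(recordsNeeded(5_000, 200));   // 1: about 1 MB fits in one record
        System.out.println(recordsNeeded(20_000, 200));  // 4: about 4 MB needs several records
    }
}
```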


**Q:** Is the timestamp when data is written a significant factor for data reads?

*When data is large, appending a timestamp to a set name allows efficient reads. When data is small, consider appending a timestamp to a user key.*     
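Both approaches amount to a naming convention. The formats below (a `yyyyMM` suffix on set names, a `yyyyMMddHH` suffix on user keys, both in UTC) and the helper names are hypothetical choices for illustration; Aerospike does not prescribe them.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

public class TimeBucketSketch {
    // Hypothetical formats; any sortable, unambiguous UTC format would do.
    static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyyMM").withZone(ZoneOffset.UTC);
    static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyyMMddHH").withZone(ZoneOffset.UTC);

    // Large data: one set per month, e.g. "events-202105".
    static String monthlySetName(String base, Instant t) {
        return base + "-" + MONTH.format(t);
    }

    // Small data: one record per entity per hour, e.g. "sensor42:2021052218".
    static String hourlyUserKey(String entityId, Instant t) {
        return entityId + ":" + HOUR.format(t);
    }

    public static void main(String[] args) {
        Instant t = Instant.parse("2021-05-22T18:23:33Z");
        System.out.println(monthlySetName("events", t));  // events-202105
        System.out.println(hourlyUserKey("sensor42", t)); // sensor42:2021052218
    }
}
```

A reader that knows the time range of interest can compute the set names or keys directly instead of scanning for them.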


**Q:** Does the application do more reads on the data than writes?

*Common practice is to architect around the highest-volume activities. If there are more reads than writes, consider applying a sort order to the data so that it is returned to the client in a usable order.*


### Questions related to Writes

**Q:** Do writes occur grouped into transactions or are individual pieces of data updated one by one?

*Aerospike provides single-record atomic transactions. Store data requiring atomic updates in one or more bins of the same Aerospike record, and use the Operate API to execute a multi-operation transaction. If updates occur element by element, data can be stored in one or more records, or in one or more bins.*

**Q:** Does the application write data more frequently than data is read?

*When an application writes data more frequently than it reads it, consider not applying a server-side sort order to the data. Aerospike will sort the data just in time, when an operation requires it.*

### Questions related to Deletes

**Q:** What dimensions of the data model expand over time to the extent that they will need to be rotated out or dropped from the database?

*The most efficient way to rotate data is to create and truncate sets.* 
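When data is split into time-bucketed sets, rotation reduces to choosing which sets fall outside the retention window and truncating each one, for example with the `client.truncate(infoPolicy, namespaceName, setName, null)` call used at the end of this notebook. The sketch below performs only the selection step; the `<base>-yyyyMM` set-naming scheme and the retention rule are hypothetical.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

public class SetRotationSketch {
    // Returns sets named "<base>-yyyyMM" that are older than the newest monthsToKeep months.
    static List<String> setsToTruncate(String base, List<String> existingSets,
                                       YearMonth current, int monthsToKeep) {
        YearMonth oldestKept = current.minusMonths(monthsToKeep - 1);
        String prefix = base + "-";
        List<String> expired = new ArrayList<>();
        for (String set : existingSets) {
            String suffix = set.startsWith(prefix) ? set.substring(prefix.length()) : "";
            if (suffix.length() != 6 || !suffix.chars().allMatch(Character::isDigit)) {
                continue; // not a monthly set for this base name
            }
            int year = Integer.parseInt(suffix.substring(0, 4));
            int month = Integer.parseInt(suffix.substring(4, 6));
            if (month < 1 || month > 12) {
                continue; // malformed month; leave the set alone
            }
            if (YearMonth.of(year, month).isBefore(oldestKept)) {
                expired.add(set);
            }
        }
        return expired;
    }

    public static void main(String[] args) {
        List<String> sets = List.of("events-202101", "events-202102", "events-202103",
                                    "events-202104", "events-202105", "users");
        // Keep the three newest months (March through May 2021).
        System.out.println(setsToTruncate("events", sets, YearMonth.of(2021, 5), 3));
        // [events-202101, events-202102]
    }
}
```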

### Questions related to Application Scale

**Q:** Does your application run at a scale where multiple servers routinely experience simultaneous downtime for repair or maintenance?

*When the model is architected properly, it is common for Aerospike clusters to replace competing databases at a 1:5 (Aerospike:Other) ratio. When planning for downtime, it is important to choose whether Aerospike runs in AP mode or SC mode.*

* **AP or Availability Mode** *– Prioritizes availability over consistency.*
* **SC or Strong Consistency Mode** *– Prioritizes consistency, so some reads and writes can fail while data replicas are unavailable.*

See [Data Consistency](https://docs.aerospike.com/docs/architecture/consistency.html) for more information.

## Deleting the Records and Closing the Server Connection

### Truncate the Set
Truncate the set from the Aerospike Database.

```java
import com.aerospike.client.policy.InfoPolicy;
InfoPolicy infoPolicy = new InfoPolicy();

// A null "before last update" time truncates every record in the set.
client.truncate(infoPolicy, namespaceName, setName, null);
System.out.println("Set Truncated.");
```

```
Set Truncated.
```


### Close the Client connections to Aerospike

```java
client.close();
System.out.println("Server connection(s) closed.");
```

```
Server connection(s) closed.
```


## Code Summary

### Overview
Here is a collection of all of the non-Jupyter code from this tutorial.
1. Import Java Libraries.
2. Import Aerospike Client Libraries.
3. Start the Aerospike Client.
4. Create a Key using Namespace, Set, and User Key.
5. Create Bins of Data.
    1. Integer
    2. String
    3. List
    4. Map
6. Put Bins into an Aerospike Record.    
7. Get the Record from Aerospike.
8. Truncate the Set.
9. Close Client Connections.

```java
// Import Java Libraries

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.HashMap;


// Import Aerospike Client Libraries

import com.aerospike.client.AerospikeClient;
import com.aerospike.client.Key;
import com.aerospike.client.Bin;
import com.aerospike.client.policy.ClientPolicy;
import com.aerospike.client.Value;
import com.aerospike.client.Record;
import com.aerospike.client.policy.InfoPolicy;


// Start the Aerospike Client.

AerospikeClient client = new AerospikeClient("localhost", 3000);
System.out.println("Initialized the client and connected to the cluster.");


// Create a Key using Namespace, Set, and User Key

String namespaceName = "test";
String setName = "dm101set";

Integer theKey = 0;  // A user key can be an integer, a string, or a byte array.

Key key = new Key(namespaceName, setName, theKey);
System.out.println("Key created.");


// Create Bins of Data.
//    A. Integer

Integer anInteger = 8;

String integerBinName = "int";
ClientPolicy clientPolicy = new ClientPolicy();

Bin bin0 = new Bin(integerBinName, anInteger);


//    B. String

String aString = "modeling";

String stringBinName = "str";

Bin bin1 = new Bin(stringBinName, aString);


//    C. List

ArrayList<Value> aTuple = new ArrayList<Value>();
aTuple.add(Value.get(9.92));
aTuple.add(Value.get("Carl Lewis"));
aTuple.add(Value.get("Seoul, South Korea"));
aTuple.add(Value.get("September 24, 1988"));

String tupleBinName = "tuple";
Bin bin2 = new Bin(tupleBinName, aTuple);

// This extra put writes the list bin on its own. The combined put below writes the
// record again, which is why the output shows a generation count of 2.
client.put(clientPolicy.writePolicyDefault, key, bin2);


//    D. Map

String mapTupleBinName = "maptuple";

String tupleMapKey = "world-records";
ArrayList<Value> tupleList = new ArrayList<Value>();
tupleList.add(Value.get(aTuple));
HashMap<String, ArrayList<Value>> wrMap = new HashMap<String, ArrayList<Value>>();
wrMap.put(tupleMapKey, tupleList);

Bin bin3 = new Bin(mapTupleBinName, wrMap);


// Put the Bins into Aerospike

client.put(clientPolicy.writePolicyDefault, key, bin0, bin1, bin2, bin3);


// Get the Record from Aerospike.

Record record = client.get(null, key);
System.out.println("Read from Aerospike –");
System.out.println("Generation count: " + record.generation);
System.out.println("Record expiration: " + record.expiration);
System.out.println( integerBinName + ": " + record.getValue(integerBinName));
System.out.println( stringBinName + ": " + record.getValue(stringBinName));
System.out.println( tupleBinName + ": " + record.getValue(tupleBinName));
System.out.println( mapTupleBinName + ": " + record.getValue(mapTupleBinName));


// Truncate the Set.

InfoPolicy infoPolicy = new InfoPolicy();
client.truncate(infoPolicy, namespaceName, setName, null);
System.out.println("Set Truncated.");


// Close Client Connections.

client.close();
System.out.println("Server connection(s) closed.");
```

```
Initialized the client and connected to the cluster.
Key created.
Read from Aerospike –
Generation count: 2
Record expiration: 359403815
int: 8
str: modeling
tuple: [9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]
maptuple: {world-records=[[9.92, Carl Lewis, Seoul, South Korea, September 24, 1988]]}
Set Truncated.
Server connection(s) closed.
```


## Data Modeling is an Art and a Science

Data modeling with Aerospike is a science, but one deep enough that it will seem like an art at first. An intuitive matching of application data model elements to Aerospike's elements will generally result in a successful application.

When pushing the envelope of performance, do not hesitate to use additional resources. A great way to learn more about modeling is to post questions to the [data modeling discussion forum](https://discuss.aerospike.com/c/how-developers-are-using-aerospike/data-modeling/143). This is especially worthwhile when optimizing Aerospike performance for an application. In addition, discussing requirements with Aerospike's solutions architect team can *still* yield further performance improvements and a higher return on your Aerospike investment.

## Knowing the Right Questions to Ask is the First Step

By nature, the guidance above on modeling is incomplete. This notebook may be updated with additional questions over time. Please [submit feedback](mailto:devhub-feedback@aerospike.com) to help refine it.

## Next steps

Have questions? Don't hesitate to reach out about data modeling at https://discuss.aerospike.com/c/how-developers-are-using-aerospike/data-modeling/143.

Want to check out other Java notebooks?
1. [Intro to Transactions](./java-intro_to_transactions.ipynb)
2. [Modeling Using Lists](./java-modeling_using_lists.ipynb)
3. [Working with Maps](./java-working_with_maps.ipynb)
4. [Aerospike Query and UDF](query_udf.ipynb)


Are you running this from Binder? [Download the Aerospike Notebook Repo](https://github.com/aerospike/aerospike-dev-notebooks.docker) and work with Aerospike Database and Jupyter locally using a Docker container.

### Additional Resources

* Want to get started with Java? [Download](https://www.aerospike.com/download/client/) or [install](https://github.com/aerospike/aerospike-client-java) the Aerospike Java Client.
* Working with maps? See the Java client [MapOperation API reference](https://www.aerospike.com/apidocs/java/com/aerospike/client/cdt/MapOperation.html).
* What are Namespaces, Sets, and Bins? Check out the [Aerospike Data Model](https://www.aerospike.com/docs/architecture/data-model.html). 
* How robust is the Aerospike Database? Browse the [Aerospike Database Architecture](https://www.aerospike.com/docs/architecture/index.html).