Add experimental support for Amazon Managed KeySpace
Closes #2370

Signed-off-by: Boxuan Li <liboxuan@connect.hku.hk>
li-boxuan committed Jun 5, 2021
1 parent 6f5719a commit d6e6880
Showing 5 changed files with 194 additions and 32 deletions.
6 changes: 6 additions & 0 deletions docs/changelog.md
@@ -82,6 +82,12 @@ For more information on features and bug fixes in 0.6.0, see the GitHub milestone

#### Upgrade Instructions

##### Experimental support for Amazon Keyspaces

[Amazon Keyspaces](https://aws.amazon.com/keyspaces/) is a serverless managed Apache Cassandra-compatible
database service provided by Amazon. See [Deploying on Amazon Keyspaces](https://docs.janusgraph.org/storage-backend/cassandra/#deploying-on-amazon-keyspaces-experimental)
for more details.

##### Breaking change for Configuration objects

Prior to JanusGraph 0.6.0, `Configuration` objects were from the Apache `commons-configuration` library.
128 changes: 128 additions & 0 deletions docs/storage-backend/cassandra.md
@@ -186,6 +186,131 @@ However, note that all these vertices and/or edges will be loaded into
memory which can cause `OutOfMemoryException`. Use [JanusGraph with TinkerPop’s Hadoop-Gremlin](../advanced-topics/hadoop.md) to
iterate over all vertices or edges in large graphs effectively.

## Deploying on Amazon Keyspaces (Experimental)

> Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and managed
> Apache Cassandra–compatible database service. Amazon Keyspaces is serverless, so you
> pay for only the resources you use and the service can automatically scale tables up
> and down in response to application traffic.
>
> [Amazon Keyspaces](https://aws.amazon.com/keyspaces/)

!!! note
    The support for Amazon Keyspaces is experimental. We discourage usage in production
    systems unless you have thoroughly tested it against your use case.

Follow these steps to set up an Amazon Keyspaces cluster and deploy JanusGraph on it.
Before following these steps, make sure you have already followed
[this guide](https://docs.aws.amazon.com/keyspaces/latest/devguide/accessing.html)
to sign up for AWS and set up your identity and access management.

### Creating Credentials

You need to generate service-specific credentials. See
[this guide](https://docs.aws.amazon.com/keyspaces/latest/devguide/programmatic.credentials.html#programmatic.credentials.ssc)
for more details.
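
If you use the AWS CLI, a service-specific credential can be generated with a command
along these lines (the user name `alice` here is just an example):

```bash
aws iam create-service-specific-credential \
    --user-name alice \
    --service-name cassandra.amazonaws.com
```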

Once your service-specific credential is generated, you will get output similar to
the following:

```json
{
    "ServiceSpecificCredential": {
        "CreateDate": "2019-10-09T16:12:04Z",
        "ServiceName": "cassandra.amazonaws.com",
        "ServiceUserName": "alice-at-111122223333",
        "ServicePassword": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
        "ServiceSpecificCredentialId": "ACCAYFI33SINPGJEBYESF",
        "UserName": "alice",
        "Status": "Active"
    }
}
```

Save `ServiceUserName` and `ServicePassword` in a secure location; you will need
them later for the JanusGraph configuration.

### Setting up SSL/TLS

1. Download the Starfield digital certificate using the following command:

```bash
curl https://certs.secureserver.net/repository/sf-class2-root.crt -O
```

2. Convert the Starfield digital certificate to a truststore file:

```bash
openssl x509 -outform der -in sf-class2-root.crt -out temp_file.der
keytool -import -alias cassandra -keystore cassandra_truststore.jks -file temp_file.der
```

You should now see a `cassandra_truststore.jks` file generated locally. You will
need this file and your truststore password later.

For more details, see [this doc](https://docs.aws.amazon.com/keyspaces/latest/devguide/using_java_driver.html).
Note that you don't need to follow every step of that doc, since it is written for users who connect to Amazon
Keyspaces directly through a Java client.

### Configurations

Below is a complete sample configuration. Unlike standard Apache Cassandra or
ScyllaDB, Amazon Keyspaces only supports a subset of their functionality, so
some specific settings are needed.

```properties
# Basic settings for CQL
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=cql
storage.hostname=cassandra.<your-datacenter, e.g. ap-east-1>.amazonaws.com
storage.port=9142
storage.username=<your-service-username>
storage.password=<your-service-password>
storage.cql.keyspace=janusgraph
storage.cql.local-datacenter=<your-datacenter, e.g. ap-east-1>

# SSL related settings
storage.cql.ssl.enabled=true
storage.cql.ssl.truststore.location=<your-trust-store-location>
storage.cql.ssl.truststore.password=<your-trust-store-password>

# Amazon Keyspaces does not support user-generated timestamps
# Thus, the below config must be turned off
graph.assign-timestamp=false

# Amazon Keyspaces only supports LOCAL QUORUM consistency
storage.cql.only-use-local-consistency-for-system-operations=true
storage.cql.read-consistency-level=LOCAL_QUORUM
storage.cql.write-consistency-level=LOCAL_QUORUM
log.janusgraph.key-consistent=true
log.tx.key-consistent=true

# Amazon Keyspaces does not have metadata available to clients
# Thus, we need to tell JanusGraph that metadata are disabled,
# and provide a hint of which partitioner AWS is using.
storage.cql.metadata-schema-enabled=false
storage.cql.metadata-token-map-enabled=false
storage.cql.partitioner-name=DefaultPartitioner
```

Now you should be able to open the graph via the Gremlin Console or Java code,
using the above configuration file.
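
As a minimal sketch (the properties file path below is an assumption; point it
at wherever you saved the configuration above):

```java
import org.janusgraph.core.JanusGraph;
import org.janusgraph.core.JanusGraphFactory;

public class OpenKeyspacesGraph {
    public static void main(String[] args) {
        // Hypothetical path to the configuration file shown above.
        JanusGraph graph = JanusGraphFactory.open("conf/janusgraph-keyspaces.properties");
        System.out.println("Graph open: " + graph.isOpen());
        graph.close();
    }
}
```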

### Known Problems

- Amazon Keyspaces creates tables on demand. If you are connecting to it for the first time, you
  will likely see an error message like
  ```
  unconfigured table janusgraph.system_properties
  ```
  At the same time, you should see the same table being created in the Amazon Keyspaces console UI.
  The creation process typically takes a few seconds. Once it is done, you can open
  the graph again, and the error will be gone. Unfortunately, you *have to* repeat this process
  for all tables. Alternatively, if you are familiar with JanusGraph, you can create the tables on
  AWS manually. Typically nine tables are needed: `edgestore`, `edgestore_lock_`, `graphindex`, `graphindex_lock_`,
  `janusgraph_ids`, `system_properties`, `system_properties_lock_`, `systemlog`, and `txlog`.
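
If you script the initial bootstrap, the reopen-until-tables-exist dance can be
sketched with a small retry helper like the following. This helper is hypothetical
(not part of JanusGraph), and the graph-open call is simulated here so the sketch
is self-contained:

```java
import java.util.function.Supplier;

public class KeyspacesRetry {
    // Retries an operation until it succeeds or attempts run out. Intended as a
    // workaround sketch for Amazon Keyspaces creating each table on demand the
    // first time it is touched.
    public static <T> T retry(Supplier<T> op, int maxAttempts, long delayMillis)
            throws InterruptedException {
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // e.g. "unconfigured table janusgraph.system_properties"
                Thread.sleep(delayMillis);
            }
        }
        throw last;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated graph open: fails twice while tables are created, then succeeds.
        int[] calls = {0};
        String result = retry(() -> {
            if (++calls[0] < 3) {
                throw new RuntimeException("unconfigured table janusgraph.system_properties");
            }
            return "graph opened";
        }, 10, 1L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In a real deployment the `Supplier` body would call `JanusGraphFactory.open(...)`,
and the delay would be a few seconds rather than milliseconds.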

## Deploying on Amazon EC2

> Amazon Elastic Compute Cloud (Amazon EC2) is a web service that
@@ -194,6 +319,9 @@ iterate over all vertices or edges in large graphs effectively.
>
> [Amazon EC2](http://aws.amazon.com/ec2/)

!!! note
    The documentation below might be partially out-of-date.

Follow these steps to setup a Cassandra cluster on EC2 and deploy
JanusGraph over Cassandra. To follow these instructions, you need an
Amazon AWS account with established authentication credentials and some
@@ -73,4 +73,7 @@ public EntryMetaData[] getMetaDataSchema(String storeName) {
return schemaBuilder.toArray(new EntryMetaData[schemaBuilder.size()]);
}

public boolean isAssignTimestamp() {
return this.assignTimestamp;
}
}
@@ -26,6 +26,8 @@
import com.datastax.oss.driver.api.core.metadata.TokenMap;
import com.datastax.oss.driver.api.core.servererrors.QueryValidationException;
import com.datastax.oss.driver.api.core.type.DataTypes;
import com.datastax.oss.driver.api.querybuilder.delete.DeleteSelection;
import com.datastax.oss.driver.api.querybuilder.insert.Insert;
import com.datastax.oss.driver.api.querybuilder.relation.Relation;
import com.datastax.oss.driver.api.querybuilder.schema.CreateTableWithOptions;
import com.datastax.oss.driver.api.querybuilder.schema.compaction.CompactionStrategy;
@@ -165,49 +167,70 @@ public CQLKeyColumnValueStore(final CQLStoreManager storeManager, final String t
.limit(bindMarker(LIMIT_BINDING));
this.getSlice = this.session.prepare(addTTLFunction(addTimestampFunction(getSliceSelect)).build());

final Select getKeysRangedSelect = selectFrom(this.storeManager.getKeyspaceName(), this.tableName)
.column(KEY_COLUMN_NAME)
.column(COLUMN_COLUMN_NAME)
.column(VALUE_COLUMN_NAME)
.allowFiltering()
.where(
Relation.token(KEY_COLUMN_NAME).isGreaterThanOrEqualTo(bindMarker(KEY_START_BINDING)),
Relation.token(KEY_COLUMN_NAME).isLessThan(bindMarker(KEY_END_BINDING))
)
.whereColumn(COLUMN_COLUMN_NAME).isGreaterThanOrEqualTo(bindMarker(SLICE_START_BINDING))
.whereColumn(COLUMN_COLUMN_NAME).isLessThanOrEqualTo(bindMarker(SLICE_END_BINDING));
this.getKeysRanged = this.session.prepare(addTTLFunction(addTimestampFunction(getKeysRangedSelect)).build());
if (this.storeManager.getFeatures().hasOrderedScan()) {
final Select getKeysRangedSelect = selectFrom(this.storeManager.getKeyspaceName(), this.tableName)
.column(KEY_COLUMN_NAME)
.column(COLUMN_COLUMN_NAME)
.column(VALUE_COLUMN_NAME)
.allowFiltering()
.where(
Relation.token(KEY_COLUMN_NAME).isGreaterThanOrEqualTo(bindMarker(KEY_START_BINDING)),
Relation.token(KEY_COLUMN_NAME).isLessThan(bindMarker(KEY_END_BINDING))
)
.whereColumn(COLUMN_COLUMN_NAME).isGreaterThanOrEqualTo(bindMarker(SLICE_START_BINDING))
.whereColumn(COLUMN_COLUMN_NAME).isLessThanOrEqualTo(bindMarker(SLICE_END_BINDING));
this.getKeysRanged = this.session.prepare(addTTLFunction(addTimestampFunction(getKeysRangedSelect)).build());
} else {
this.getKeysRanged = null;
}

final Select getKeysAllSelect = selectFrom(this.storeManager.getKeyspaceName(), this.tableName)
.column(KEY_COLUMN_NAME)
.column(COLUMN_COLUMN_NAME)
.column(VALUE_COLUMN_NAME)
.allowFiltering()
.whereColumn(COLUMN_COLUMN_NAME).isGreaterThanOrEqualTo(bindMarker(SLICE_START_BINDING))
.whereColumn(COLUMN_COLUMN_NAME).isLessThanOrEqualTo(bindMarker(SLICE_END_BINDING));
this.getKeysAll = this.session.prepare(addTTLFunction(addTimestampFunction(getKeysAllSelect)).build());
if (this.storeManager.getFeatures().hasUnorderedScan()) {
final Select getKeysAllSelect = selectFrom(this.storeManager.getKeyspaceName(), this.tableName)
.column(KEY_COLUMN_NAME)
.column(COLUMN_COLUMN_NAME)
.column(VALUE_COLUMN_NAME)
.allowFiltering()
.whereColumn(COLUMN_COLUMN_NAME).isGreaterThanOrEqualTo(bindMarker(SLICE_START_BINDING))
.whereColumn(COLUMN_COLUMN_NAME).isLessThanOrEqualTo(bindMarker(SLICE_END_BINDING));
this.getKeysAll = this.session.prepare(addTTLFunction(addTimestampFunction(getKeysAllSelect)).build());
} else {
this.getKeysAll = null;
}

this.deleteColumn = this.session.prepare(deleteFrom(this.storeManager.getKeyspaceName(), this.tableName)
.usingTimestamp(bindMarker(TIMESTAMP_BINDING))
final DeleteSelection deleteSelection = addUsingTimestamp(deleteFrom(this.storeManager.getKeyspaceName(), this.tableName));
this.deleteColumn = this.session.prepare(deleteSelection
.whereColumn(KEY_COLUMN_NAME).isEqualTo(bindMarker(KEY_BINDING))
.whereColumn(COLUMN_COLUMN_NAME).isEqualTo(bindMarker(COLUMN_BINDING))
.build());

this.insertColumn = this.session.prepare(insertInto(this.storeManager.getKeyspaceName(), this.tableName)
final Insert insertColumnInsert = insertInto(this.storeManager.getKeyspaceName(), this.tableName)
.value(KEY_COLUMN_NAME, bindMarker(KEY_BINDING))
.value(COLUMN_COLUMN_NAME, bindMarker(COLUMN_BINDING))
.value(VALUE_COLUMN_NAME, bindMarker(VALUE_BINDING))
.usingTimestamp(bindMarker(TIMESTAMP_BINDING)).build());
.value(VALUE_COLUMN_NAME, bindMarker(VALUE_BINDING));
this.insertColumn = this.session.prepare(addUsingTimestamp(insertColumnInsert).build());

this.insertColumnWithTTL = this.session.prepare(insertInto(this.storeManager.getKeyspaceName(), this.tableName)
.value(KEY_COLUMN_NAME, bindMarker(KEY_BINDING))
.value(COLUMN_COLUMN_NAME, bindMarker(COLUMN_BINDING))
.value(VALUE_COLUMN_NAME, bindMarker(VALUE_BINDING))
.usingTimestamp(bindMarker(TIMESTAMP_BINDING))
.usingTtl(bindMarker(TTL_BINDING)).build());
if (storeManager.getFeatures().hasCellTTL()) {
this.insertColumnWithTTL = this.session.prepare(addUsingTimestamp(insertColumnInsert).usingTtl(bindMarker(TTL_BINDING)).build());
} else {
this.insertColumnWithTTL = null;
}
// @formatter:on
}

private DeleteSelection addUsingTimestamp(DeleteSelection deleteSelection) {
if (storeManager.isAssignTimestamp()) {
return deleteSelection.usingTimestamp(bindMarker(TIMESTAMP_BINDING));
}
return deleteSelection;
}

private Insert addUsingTimestamp(Insert insert) {
if (storeManager.isAssignTimestamp()) {
return insert.usingTimestamp(bindMarker(TIMESTAMP_BINDING));
}
return insert;
}

/**
* Add WRITETIME function into the select query to retrieve the timestamp that the data was written to the database,
* if {@link STORE_META_TIMESTAMPS} is enabled.
@@ -453,7 +476,7 @@ public KeyIterator getKeys(final KeyRangeQuery query, final StoreTransaction txh

@Override
public KeyIterator getKeys(final SliceQuery query, final StoreTransaction txh) throws BackendException {
if (this.storeManager.getFeatures().hasOrderedScan()) {
if (!this.storeManager.getFeatures().hasUnorderedScan()) {
throw new PermanentBackendException("This operation is only allowed when a random partitioner (md5 or murmur3) is used.");
}

@@ -231,6 +231,8 @@ public CQLStoreManager(final Configuration configuration) throws BackendException
"server, please check %s and %s options", PARTITIONER_NAME.getName(), METADATA_TOKEN_MAP_ENABLED.getName()));
}
switch (partitioner) {
case "DefaultPartitioner": // Amazon managed KeySpace uses com.amazonaws.cassandra.DefaultPartitioner
fb.timestamps(false).cellTTL(false);
case "RandomPartitioner":
case "Murmur3Partitioner": {
fb.keyOrdered(false).orderedScan(false).unorderedScan(true);
