Proposed features

Martin Andersson edited this page Apr 12, 2014 · 24 revisions

Notes on Proposed Features

NOTE This set of notes is drawn from various discussions (e.g. blogs, internal discussions) and represents a starting point for discussion; it is not intended to set anything in stone! We intend to assign owners to each feature, who will bring forward a much fuller proposal for consideration as time progresses.

NOTE There is a bias towards Infinispan here simply because this is the system with which the spec leads are most familiar! This does not mean that an alternative product cannot have a superior approach, or that such approaches should not be explored and discussed.

Package

All interfaces and APIs in this specification would use the javax.datagrid namespace.

1. Async API (Future based)

This API aims to allow users to perform operations on the data grid (PUT, GET, REMOVE) in an asynchronous and non-blocking manner. Such an API would be an alternative to JSR-107's Cache, providing versions of Cache#put(), Cache#get() and Cache#remove() that return java.util.concurrent.Futures rather than the actual return values.

The proposed interface to contain these methods is AsyncCache. E.g.,

import java.util.concurrent.Future;

public interface AsyncCache<K, V> {
    Future<V> get(K key);
    Future<Void> put(K key, V value);
    Future<V> getAndPut(K key, V value);
    Future<Boolean> remove(K key);
    Future<V> getAndRemove(K key);

    // ... etc ...
}

See Infinispan's Async API which inspired this feature.
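As a rough illustration of how such an API might be used, here is a minimal single-JVM sketch. It assumes that put() takes both a key and a value, and InMemoryAsyncCache and AsyncCacheDemo are hypothetical names invented for this example; nothing here is part of the proposal itself.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

// Trimmed-down copy of the proposed interface (assumption: put takes a value).
interface AsyncCache<K, V> {
    Future<V> get(K key);
    Future<Void> put(K key, V value);
    Future<V> getAndRemove(K key);
}

// Hypothetical in-memory implementation. A real grid would complete each
// future once the owning nodes have acknowledged the operation.
class InMemoryAsyncCache<K, V> implements AsyncCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();

    @Override
    public Future<V> get(K key) {
        return CompletableFuture.supplyAsync(() -> store.get(key));
    }

    @Override
    public Future<Void> put(K key, V value) {
        return CompletableFuture.runAsync(() -> store.put(key, value));
    }

    @Override
    public Future<V> getAndRemove(K key) {
        return CompletableFuture.supplyAsync(() -> store.remove(key));
    }
}

class AsyncCacheDemo {
    // Writes a value, waits for the write to finish, then reads it back.
    static String roundTrip(String key, String value) {
        AsyncCache<String, String> cache = new InMemoryAsyncCache<>();
        try {
            Future<Void> pending = cache.put(key, value); // returns immediately
            // ... the caller is free to do unrelated work here ...
            pending.get(); // block only at the point where the result is needed
            return cache.get(key).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("k", "v"));
    }
}
```

The point of the sketch is that the caller decides when to block, rather than the cache forcing a wait on every operation.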

2. Distributed code execution

In a data grid it may be desirable to ship tasks (code) to data rather than move data to code for processing. The proposed API for this is inspired by a similar feature available in Infinispan. See Infinispan's documentation on distributed executors for more details.
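To make the idea of shipping code to data concrete, the following sketch runs one task against every partition of a data set. It is only a local simulation: each thread stands in for a grid node, and LocalGridExecutor and executeEverywhere are hypothetical names that do not come from Infinispan or from this proposal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class LocalGridExecutor {
    // Submits the same task once per partition and gathers the partial results.
    // Only the (small) results travel back to the caller, not the data itself.
    static <K, V, R> List<R> executeEverywhere(List<Map<K, V>> partitions,
                                               Function<Map<K, V>, R> task) {
        ExecutorService nodes = Executors.newFixedThreadPool(Math.max(1, partitions.size()));
        try {
            List<Future<R>> pending = new ArrayList<>();
            for (Map<K, V> partition : partitions) {
                Callable<R> call = () -> task.apply(partition);
                pending.add(nodes.submit(call));
            }
            List<R> results = new ArrayList<>();
            for (Future<R> future : pending) {
                results.add(future.get()); // results keep the partition order
            }
            return results;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            nodes.shutdown();
        }
    }
}
```

A caller could, for example, pass a task that counts the entries in each partition and then sum the returned counts locally.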

2.1. Map/Reduce

Just as with distributed executors, Map/Reduce (inspired by Google's paper on the same subject) is a popular mechanism for selecting and transforming entries across a distributed data store. Open source implementations of Map/Reduce exist in Hadoop, Cassandra and others. Infinispan too has its own Java-centric, fluent and DSL-like API for performing Map/Reduce tasks. We propose to follow an API similar to that of Infinispan for this JSR.

2.2. Alternate proposal

Given that JSR 347 will only target Java EE 8, which in turn would rely on Java SE 8, we could make use of the closures defined in JSR 335 as a mechanism to provide mappers and reducers.
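A sketch of what lambda-based mappers and reducers could look like follows. The Mapper and Reducer interfaces and the LocalMapReduce runner are hypothetical and exist only for illustration; the runner executes both phases in a single JVM, whereas a grid would run the map phase on the nodes that own each entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical functional interfaces; the names are illustrative only.
@FunctionalInterface
interface Mapper<K, V, KOut, VOut> {
    void map(K key, V value, BiConsumer<KOut, VOut> collector);
}

@FunctionalInterface
interface Reducer<KOut, VOut> {
    VOut reduce(KOut key, List<VOut> values);
}

class LocalMapReduce {
    // Map phase: emit intermediate pairs. Reduce phase: fold each key's values.
    static <K, V, KOut, VOut> Map<KOut, VOut> run(Map<K, V> entries,
                                                  Mapper<K, V, KOut, VOut> mapper,
                                                  Reducer<KOut, VOut> reducer) {
        Map<KOut, List<VOut>> intermediate = new HashMap<>();
        entries.forEach((key, value) -> mapper.map(key, value,
                (outKey, outValue) ->
                        intermediate.computeIfAbsent(outKey, k -> new ArrayList<>()).add(outValue)));
        Map<KOut, VOut> result = new HashMap<>();
        intermediate.forEach((outKey, values) -> result.put(outKey, reducer.reduce(outKey, values)));
        return result;
    }

    // Word count, with the mapper and reducer supplied as lambdas.
    static Map<String, Integer> wordCount(Map<String, String> documents) {
        return LocalMapReduce.<String, String, String, Integer>run(documents,
                (id, text, out) -> {
                    for (String word : text.split("\\s+")) {
                        out.accept(word, 1);
                    }
                },
                (word, counts) -> counts.stream().mapToInt(Integer::intValue).sum());
    }

    public static void main(String[] args) {
        Map<String, String> docs = new HashMap<>();
        docs.put("doc1", "a b a");
        docs.put("doc2", "b");
        System.out.println(wordCount(docs));
    }
}
```

Because both interfaces have a single abstract method, callers can supply them as lambdas or method references without writing dedicated classes.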

3. Group API

In a partially replicated grid, colocation is important to ensure related entries are stored on the same node, to prevent unnecessary RPC. Users control how entries are grouped together (and hence colocated) via a Group API.

See Infinispan's Group API which inspired this feature.
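The effect of grouping on placement can be sketched as follows. This is a simplified illustration, not Infinispan's algorithm: GroupRouting, ownerNode and orderGroup are hypothetical names, and the key format "order:<id>:item:<n>" is an assumption made for the example. The owner is derived from the hash of the group rather than the hash of the key, so all keys in one group land on the same node.

```java
import java.util.function.Function;

class GroupRouting {
    // Chooses the owning node from the group's hash instead of the key's hash.
    static <K> int ownerNode(K key, Function<K, String> grouper, int nodeCount) {
        String group = grouper.apply(key);
        return Math.floorMod(group.hashCode(), nodeCount);
    }

    // Example grouper: "order:42:item:7" belongs to the group "order:42".
    static String orderGroup(String key) {
        String[] parts = key.split(":");
        return parts[0] + ":" + parts[1];
    }

    public static void main(String[] args) {
        int first = ownerNode("order:42:item:7", GroupRouting::orderGroup, 5);
        int second = ownerNode("order:42:item:9", GroupRouting::orderGroup, 5);
        System.out.println(first == second); // both items share one owner
    }
}
```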

4. Annotations

This section covers additional annotations as well as the containers used to process such annotations. JSR 107, for example, tests annotation behaviour when deployed in a CDI, Spring and Google Guice environment. We could follow a similar pattern.

Also, see Infinispan's integration with CDI.

  • Bridging of cache notifications to CDI events
  • Cache configuration and injection

5. Non-API features

  • TODO= NOTE: "Non-API" does not suggest that this section's features lack APIs, only that their APIs are not part of JSR 347. E.g. most of these Non-API features will re-use the existing JSR-107 JCACHE API.

5.1. Transactions (JTA) integration

  • JSR-107 provides local transactions and optional participation in a distributed transaction
  • The ability to participate in transactions is necessary, both as an XA resource and as a simple cache to front an RDBMS, via JPA.
  • Define behaviour at certain stages of a tx's lifecycle, particularly with regard to recovery
  • Should play nice with JPA's second level cache SPI
  • TODO=Provide detailed prose for Transactions use cases accommodated by JSR-347
  • TODO=Identify any/all use case distinctions that are 347 grid Transactions specific (i.e. might not be accommodated by 107's base Transactions capability)
  • TODO=Even though this 5.1 section is a non-API section, identify potential extensions/additions to 107 base API that should be provided in 347 API
  • TODO=Demonstrate that 347 specifies a Transactions capability that is both 100% sound and 100% complete for all data grid transactions use cases.
  • TODO=Provide tests that confirm the demonstration of a 100% sound and 100% complete capability.
  • TODO=Investigate/Sample the tactics and capabilities of R&D et al. cutting-edge grid transactions providers (e.g. Total Order, CloudTM) re: potential to propose standardizing certain features via 347

5.2. Operation Mode

Characteristics such as high availability and the removal of single points of failure become increasingly important, since cloud infrastructure is inherently unreliable and can be re-provisioned with minimal notice; applications deployed on the cloud need to be resilient to this. Further, one of the major benefits of cloud-style deployments is elasticity: the ability to scale out (and back in) quickly and easily.

The following would be driven by configuration options; implementations may support one or more of them.

  • Define total replication and partial replication, as well as synchronous and asynchronous versions of network communications.
  • Define num_owners, which, when used with partial replication, dictates how many copies of data are maintained in the grid.
  • Define whether state_transfer is enabled or not. This feature would allow new joiners to populate state from neighbouring nodes in the cluster.
  • TODO = Solicit other interested 347 EG members to help take custody of section 5.2
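To illustrate how num_owners could interact with partial replication, here is a deliberately naive placement sketch. It is an assumption made for illustration, not the algorithm of any particular product: OwnerSelection and owners are hypothetical names, and real grids typically use a consistent-hashing scheme rather than a plain modulo. Setting num_owners to the cluster size or more degenerates into total replication.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OwnerSelection {
    // Picks up to numOwners distinct nodes for a key: the node selected by the
    // key's hash, followed by its successors in the node list.
    static List<String> owners(Object key, List<String> nodes, int numOwners) {
        int copies = Math.min(numOwners, nodes.size());
        int start = Math.floorMod(key.hashCode(), nodes.size());
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            owners.add(nodes.get((start + i) % nodes.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> cluster = Arrays.asList("A", "B", "C", "D");
        System.out.println(owners("some-key", cluster, 2));
    }
}
```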

5.3. Drop-In Replacement for JSR-107 Compliant Caches

To facilitate the adoption of JSR-347 compliant data grids, they should be able to act as drop-in replacements for JSR-107 compliant caches. While data grid users in this situation might not be able to leverage all of the features of a JSR-347 compliant implementation, this will allow for much quicker adoption of data grids and for easy trial runs of data grids where JSR-107 caches are already in use.

This will require that JSR-347 implementers also provide the same published package, class, and method members as the JSR-107 spec. However, these could simply be wrappers that delegate to their corresponding javax.datagrid members. These would likely be limited to annotations and static factory classes, as interfaces and classes would extend their JSR-107 counterparts.

To ensure that JSR-347 compliant data grids are backwards compatible with JSR-107 compliant caches, the test cases in the TCK for JSR-107 will need to be included in the TCK for JSR-347 once the TCK for JSR-107 is finalized.

  • TODO = Solicit other interested 347 EG members to help take custody of section 5.3

6. Eventually Consistent API

Many distributed systems make use of eventual consistency to provide partition tolerance. This is an optional operation mode in this specification where the implementation may be configured to be either strongly or eventually consistent. If an eventually consistent mode is configured, the same interfaces are used (javax.cache.Cache and javax.datagrid.AsyncCache), except that all methods may throw the following unchecked exception:

class VersionConflictException<K, V> {
    VersionResolver<K, V> getVersionResolver();
}

interface VersionResolver<K, V> {
    Set<Version> getConcurrentVersions();
    void resolveVersion(Version correctVersion);
    Cache.Entry<K, V> getEntry(Version version);
}
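A caller-side sketch of resolving such a conflict might look like the following. Several parts are assumptions: Java does not allow a generic class to extend Throwable, so the exception here is non-generic and exposes VersionResolver<?, ?>; Version is given a hypothetical counter field; java.util.Map.Entry stands in for Cache.Entry; and the "highest counter wins" policy is chosen purely for illustration.

```java
import java.util.Comparator;
import java.util.Map;
import java.util.Set;

// Hypothetical version type; the proposal does not define its contents.
final class Version {
    final long counter;

    Version(long counter) {
        this.counter = counter;
    }
}

interface VersionResolver<K, V> {
    Set<Version> getConcurrentVersions();
    void resolveVersion(Version correctVersion);
    Map.Entry<K, V> getEntry(Version version); // stand-in for Cache.Entry<K, V>
}

// Non-generic, because a generic class cannot be a subclass of Throwable.
class VersionConflictException extends RuntimeException {
    private final transient VersionResolver<?, ?> resolver;

    VersionConflictException(VersionResolver<?, ?> resolver) {
        super("concurrent versions detected");
        this.resolver = resolver;
    }

    VersionResolver<?, ?> getVersionResolver() {
        return resolver;
    }
}

class ConflictHandling {
    // Resolves the conflict in favour of the version with the highest counter.
    static Version resolveToNewest(VersionConflictException conflict) {
        VersionResolver<?, ?> resolver = conflict.getVersionResolver();
        Version newest = resolver.getConcurrentVersions().stream()
                .max(Comparator.comparingLong((Version v) -> v.counter))
                .orElseThrow(IllegalStateException::new);
        resolver.resolveVersion(newest);
        return newest;
    }
}
```

In application code this would sit in a catch block around a cache operation, after which the operation could be retried.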

TODO Should we be using exceptions here? Split brains are, after all, exceptional ... however resolving such a version conflict is a bit more involved. The alternative is a new API, analogous to Cache and AsyncCache, providing methods like:

Map<Version, V> get(K key);
void resolve(K key, Version v);

which seems like a real pain to use in day-to-day life. :-)

7. Querying

In addition to map/reduce, many data grids offer alternate ways to search for data, including index-based querying or simple filtering (which may be translated into index-based querying). This section addresses an appropriate mechanism to express such queries. One approach (as used by Infinispan) is Hibernate Search's query builder DSL. See this document for examples.

8. Configuration

  • XML-based config file standardisation (including an XSD)
    • Extension of what is defined in JSR-107?
  • Standardise programmatic config interfaces
    • Extension of what is defined in JSR-107?

9. Client/Server access

  • Most of the available data grid products allow two kinds of access:
    • embedded: the client is colocated (same VM) with the grid node
    • client/server: the grid is accessed in C/S mode, with various protocols available: REST, custom (binary), etc.
  • Many large-scale deployments prefer client/server access, so it is worth considering standardizing such a C/S protocol as well.