
First Major Release - Expressions, Storage, Async queries, no more JSON!

0aix released this on 02 Oct at 18:40

Bullet 1.0 is here! This release marks the first major version of Bullet.

The two high-level changes in this release are:

  1. No more JSON queries. Until now, queries were mainly parsed from JSON and sent from the web service to the backend as JSON strings, where they were deserialized, configured, initialized, and validated. In this release, we stop using JSON serialization and instead construct and send query objects directly. (BQL will be the main method of query construction going forward.)

  2. Expressions. Expressions enable first-order logic in virtually all parts of a Bullet query (except Aggregations, which still use field names). This lets us filter on, project, and consequently aggregate on more than just individual fields.
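To make the "no more JSON" change concrete, here is a minimal sketch of shipping a query object itself rather than a JSON string. It assumes Java's built-in serialization purely for illustration; `SimpleQuery` and `QueryBytes` are hypothetical names, not Bullet's actual `Query` class or wire format. The constructor does its error-checking up front, echoing the note further down that queries are now validated at construction.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a query object; NOT Bullet's actual Query class.
class SimpleQuery implements Serializable {
    private static final long serialVersionUID = 1L;

    final List<String> fields; // fields to project
    final int limit;           // maximum number of records to return

    SimpleQuery(List<String> fields, int limit) {
        // Error-check at construction time rather than in a separate init step.
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.fields = new ArrayList<>(fields);
        this.limit = limit;
    }
}

// Hypothetical helper: ships the object itself as bytes instead of a JSON string.
final class QueryBytes {
    private QueryBytes() {
    }

    static byte[] toBytes(SimpleQuery query) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(query);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static SimpleQuery fromBytes(byte[] data) {
        // The receiver gets a ready-to-use object; there is no JSON parsing step.
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (SimpleQuery) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A round trip is `QueryBytes.fromBytes(QueryBytes.toBytes(query))`. Note that Java deserialization does not rerun the constructor, so in this sketch validation happens once, on the sending side.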

PubSub

  1. PubSubMessage
    1.1. Content changed to byte[] from String; added getContentAsString() which replaces the old getContent()
    1.2. Sequence number removed since it was not used
  2. Metadata
    2.1. Creation timestamp added. This is used in Querier as the query start time.
    2.2. Added Metadata.copy(), which PubSubs with their own Metadata subclasses are expected to override; e.g., the subclass RESTMetadata's copy() returns a RESTMetadata.
  3. Publisher
    3.1. PubSubMessage send(PubSubMessage) and PubSubMessage send(String, String) now return the sent PubSubMessage (previously void). Since the message may be modified before it is sent, this makes the sent message available to any configured StorageManagers.
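The PubSub changes above can be sketched with simplified stand-ins: byte[] content with a string accessor, a creation timestamp in the metadata, a copy() that subclasses override with a covariant return type, and send methods that return the message actually sent. All class names here (`SimpleMetadata`, `SimpleRestMetadata`, `SimpleMessage`, `SimplePublisher`, `StampingPublisher`) are hypothetical, and UTF-8 decoding is an assumption of the sketch.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified shapes; NOT the real Bullet PubSub classes.
class SimpleMetadata {
    final long created; // creation timestamp, e.g. usable as a query start time

    SimpleMetadata(long created) {
        this.created = created;
    }

    // Subclasses override copy() so that copies keep their own type.
    SimpleMetadata copy() {
        return new SimpleMetadata(created);
    }
}

class SimpleRestMetadata extends SimpleMetadata {
    final String url; // hypothetical extra field for a REST-style PubSub

    SimpleRestMetadata(long created, String url) {
        super(created);
        this.url = url;
    }

    @Override
    SimpleRestMetadata copy() { // covariant return type
        return new SimpleRestMetadata(created, url);
    }
}

class SimpleMessage {
    final String id;
    final byte[] content; // raw bytes rather than a String
    final SimpleMetadata metadata;

    SimpleMessage(String id, byte[] content, SimpleMetadata metadata) {
        this.id = id;
        this.content = content;
        this.metadata = metadata;
    }

    // Decodes the bytes; UTF-8 is an assumption of this sketch.
    String getContentAsString() {
        return new String(content, StandardCharsets.UTF_8);
    }
}

abstract class SimplePublisher {
    // Returns the message actually sent, which may differ from the one passed in.
    abstract SimpleMessage send(SimpleMessage message);

    SimpleMessage send(String id, String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        return send(new SimpleMessage(id, bytes, new SimpleMetadata(System.currentTimeMillis())));
    }
}

// Example publisher that modifies the message by re-stamping its metadata.
class StampingPublisher extends SimplePublisher {
    private final long now;

    StampingPublisher(long now) {
        this.now = now;
    }

    @Override
    SimpleMessage send(SimpleMessage message) {
        // A real publisher would transmit here; this sketch only returns the modified message.
        return new SimpleMessage(message.id, message.content, new SimpleMetadata(now));
    }
}
```

Because the caller receives the returned message rather than reusing its own input, anything downstream (such as a storage layer) sees the version that was actually sent.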

New Interfaces

  1. Added the PubSubResponder abstract class, which is extended by any class that responds to a PubSubMessage.
    1.1. PubSubResponder is used in Bullet Service 1.0.0 both synchronously and asynchronously.
    1.2. Provided a BulletPubSubResponder implementation, which publishes results to a configured PubSub.
  2. Added the StorageManager abstract class, which is used to store and retrieve PubSubMessages. It is primarily used in Bullet Service for sending queries and receiving results, and for persistent queries (reference implementations landing soon!).
    2.1. MemoryStorageManager implementation that stores objects in memory
    2.2. NullStorageManager implementation that stores nothing (in the event you do not want to use a StorageManager)
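The storage and responder split above can be sketched as follows. This is a simplified, synchronous illustration under assumed names (`SimpleStorage`, `InMemoryStorage`, `NoOpStorage`, `SimpleResponder`, `StoringResponder`); the real StorageManager and PubSubResponder APIs may differ in method names, signatures, and asynchrony.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified storage contract; the real StorageManager API may differ.
abstract class SimpleStorage<V> {
    abstract boolean put(String id, V value);

    abstract V get(String id);

    abstract V remove(String id);
}

// Keeps entries in a thread-safe in-memory map (values must be non-null).
class InMemoryStorage<V> extends SimpleStorage<V> {
    private final Map<String, V> data = new ConcurrentHashMap<>();

    @Override
    boolean put(String id, V value) {
        data.put(id, value);
        return true;
    }

    @Override
    V get(String id) {
        return data.get(id);
    }

    @Override
    V remove(String id) {
        return data.remove(id);
    }
}

// Stores nothing, for deployments that do not want storage at all.
class NoOpStorage<V> extends SimpleStorage<V> {
    @Override
    boolean put(String id, V value) {
        return false;
    }

    @Override
    V get(String id) {
        return null;
    }

    @Override
    V remove(String id) {
        return null;
    }
}

// Hypothetical responder base: anything that reacts to a message extends it.
abstract class SimpleResponder {
    abstract void respond(String id, String message);
}

// Example responder that records each response in storage for later retrieval.
class StoringResponder extends SimpleResponder {
    private final SimpleStorage<String> storage;

    StoringResponder(SimpleStorage<String> storage) {
        this.storage = storage;
    }

    @Override
    void respond(String id, String message) {
        storage.put(id, message);
    }
}
```

Swapping `InMemoryStorage` for `NoOpStorage` changes nothing for the responder, which is the point of having a null implementation.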

Metrics (New)

  1. Added MetricEvent, a simple wrapper that represents a metric event.
  2. Added MetricCollector, a utility class for storing frequency and average metrics for string keys.
  3. Added the MetricPublisher abstract class.
    3.1. MetricEventPublisher, an abstract class that extends MetricPublisher to publish MetricEvents.
    3.2. HTTPMetricEventPublisher, an implementation of MetricEventPublisher that publishes MetricEvents to a given URL and can retry multiple times.
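As a rough illustration of the two metric types MetricCollector is described as storing, here is a hypothetical collector that keeps per-key counts and running averages, plus a generic retry loop of the kind an HTTP publisher could use. `SimpleMetricCollector` and `Retries` are invented names; this is not Bullet's implementation, and no HTTP call is made.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BooleanSupplier;

// Hypothetical sketch: frequency and average metrics keyed by strings.
class SimpleMetricCollector {
    private final Map<String, Long> counts = new HashMap<>();
    private final Map<String, double[]> sums = new HashMap<>(); // key -> {sum, count}

    // Frequency metric: bump the counter for key.
    synchronized void increment(String key) {
        counts.merge(key, 1L, Long::sum);
    }

    // Average metric: accumulate a sample for key.
    synchronized void add(String key, double value) {
        double[] acc = sums.computeIfAbsent(key, k -> new double[2]);
        acc[0] += value;
        acc[1] += 1;
    }

    synchronized long count(String key) {
        return counts.getOrDefault(key, 0L);
    }

    // Returns NaN when no samples were recorded for key.
    synchronized double average(String key) {
        double[] acc = sums.get(key);
        return acc == null ? Double.NaN : acc[0] / acc[1];
    }
}

// Hypothetical retry helper of the kind an HTTP publisher could use.
final class Retries {
    private Retries() {
    }

    // Calls attempt up to maxAttempts times and stops at the first success.
    static boolean withRetries(int maxAttempts, BooleanSupplier attempt) {
        for (int i = 0; i < maxAttempts; i++) {
            if (attempt.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

Storing a running sum and count, rather than every sample, keeps memory constant per key.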

Expressions (New)

  1. Added Expressions and Evaluators. Expressions enable first-order logic in queries, and evaluators are constructed from expressions and evaluated on Bullet records.
    1.1. Supported expressions: Field, Value, List, Unary, Binary, NAry, and Cast
    1.2. Supported operations:
    • Arithmetic: +, -, *, /
    • Comparators: =, !=, >, <, >=, <= (with ANY/ALL for value-to-list comparisons)
    • Boolean logic: AND, OR, XOR
    • Unary: NOT, SIZEOF, IS [NOT] NULL
    • Binary: CONTAINSKEY, CONTAINSVALUE, SIZEIS
    • If-then-else: IF
    • Regex LIKE: RLIKE
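The idea of building an evaluator from an expression tree and running it on a record can be sketched with a handful of node types (field, value, binary, IS NULL) over a map-based record. `Expr` and `Exprs` are hypothetical names; only a few operators are implemented, and the choice that null operands yield null is an assumption of this sketch, not a statement about Bullet's semantics.

```java
import java.util.Map;

// Hypothetical mini expression tree; Bullet's Expression and Evaluator classes are richer.
interface Expr {
    Object eval(Map<String, Object> rec);
}

final class Exprs {
    private Exprs() {
    }

    // Field expression: reads a field from the record (null if absent).
    static Expr field(String name) {
        return rec -> rec.get(name);
    }

    // Value expression: a constant.
    static Expr value(Object constant) {
        return rec -> constant;
    }

    // Binary expression over two sub-expressions; only a few operators are sketched.
    static Expr binary(String op, Expr left, Expr right) {
        return rec -> {
            Object l = left.eval(rec);
            Object r = right.eval(rec);
            if (l == null || r == null) {
                return null; // in this sketch, null operands yield null
            }
            switch (op) {
                case "+":
                    return num(l) + num(r);
                case "*":
                    return num(l) * num(r);
                case ">":
                    return num(l) > num(r);
                case "AND":
                    return (Boolean) l && (Boolean) r;
                default:
                    throw new IllegalArgumentException("unsupported operator: " + op);
            }
        };
    }

    // Unary IS NULL.
    static Expr isNull(Expr operand) {
        return rec -> operand.eval(rec) == null;
    }

    private static double num(Object o) {
        return ((Number) o).doubleValue();
    }
}
```

For example, `Exprs.binary(">", Exprs.binary("*", Exprs.field("price"), Exprs.field("qty")), Exprs.value(25))` filters on a computed product rather than on a single field, which is the kind of query that expressions make possible.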

Query and Querier

  1. Projection and Filter now use expressions.
  2. Projection now explicitly supports three projection types, COPY, NO_COPY, and PASS_THROUGH, denoting how fields should be projected.
    2.1. COPY - the record is copied before new fields are projected, e.g. in the computation post-aggregation.
    2.2. NO_COPY - fields are projected onto an empty record
    2.3. PASS_THROUGH - the original record is passed on with no projection
  3. Computation and OrderBy post-aggregations now use expressions.
  4. Added Having post-aggregation which is a filter applied after aggregation.
  5. Added Culling post-aggregation which removes specified fields. This takes the place of the implicit transient fields previously generated in Querier.
  6. Querier now takes a Query object. Consequently, the Querier no longer performs any initialization or error-checking.
  7. The query start time is now taken from metadata rather than set at Querier initialization; therefore, the query start time is set when the query is initially sent.
  8. Querier result metadata now includes the original query string in addition to the query object’s JSON.
  9. Queries are now error-checked in their constructors, whereas previously they were initialized and validated after construction.
  10. Added Aggregation subclasses for the different types of aggregations.
  11. Renamed the previous “parsing” package to “query”, since we no longer parse queries from JSON.
  12. Aggregation operations and strategies moved to the “querying” package.
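The three projection types and the Culling post-aggregation described above can be sketched on map-based records. `ProjectionType` and `Projector` are hypothetical names that mirror the notes; they are not Bullet's Projection or Culling classes, and each projected field is modeled as a plain function of the input record.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical names mirroring the projection types in the notes; not Bullet's Projection class.
enum ProjectionType { COPY, NO_COPY, PASS_THROUGH }

final class Projector {
    private Projector() {
    }

    // Each entry maps an output field name to a function that computes it from the input record.
    static Map<String, Object> project(Map<String, Object> input, ProjectionType type,
                                       Map<String, Function<Map<String, Object>, Object>> fields) {
        if (type == ProjectionType.PASS_THROUGH) {
            return input; // original record passed on, no projection
        }
        Map<String, Object> out = new HashMap<>();
        if (type == ProjectionType.COPY) {
            out.putAll(input); // COPY: start from a copy of the record
        }                      // NO_COPY: start from an empty record
        fields.forEach((name, fn) -> out.put(name, fn.apply(input)));
        return out;
    }

    // Culling: returns a copy of the record without the named fields.
    static Map<String, Object> cull(Map<String, Object> input, List<String> fieldsToRemove) {
        Map<String, Object> out = new HashMap<>(input);
        fieldsToRemove.forEach(out::remove);
        return out;
    }
}
```

Given a record `{a=1, b=2}` and a projected field `c = a + b`, COPY yields `{a=1, b=2, c=3}`, NO_COPY yields `{c=3}`, and PASS_THROUGH returns the original record unchanged. An explicit cull step then removes helper fields that only earlier stages needed.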

Miscellaneous

  1. Added a constructor BulletError(String, String)
  2. BulletException is now a RuntimeException and also takes a single BulletError rather than a list of errors.
  3. Removed Initializable interface
  4. Some strategies renamed
    4.1. Raw to RawStrategy
    4.2. TopK to FrequentItemsSketchingStrategy
    4.3. CountDistinct to ThetaSketchingStrategy
    4.4. GroupBy to TupleSketchingStrategy
  5. ThetaSketch now rounds the resulting count
  6. Updated SimpleEqualityPartitioner to work with Expressions and TypedObjects
  7. Fixed a bug in SimpleEqualityPartitioner where the partitioner did not differentiate properly between missing fields and null fields.
  8. The type system moved to Bullet Record 1.0.0
  9. The default BulletRecordProvider class changed from “com.yahoo.bullet.record.AvroBulletRecordProvider” to “com.yahoo.bullet.record.avro.TypedAvroBulletRecordProvider”
  10. A configured Bullet Record Schema can now be retrieved from BulletConfig.
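The SimpleEqualityPartitioner fix concerns telling a missing field apart from a field that is present with a null value. A minimal sketch of that distinction, assuming map-based records and a string partition key, is below; `PartitionKeys` and its marker strings are hypothetical and not the partitioner's actual code. A production version would also need an unambiguous value encoding, since a real value whose string form equals a marker would collide in this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of an equality-partitioning key; not SimpleEqualityPartitioner's code.
final class PartitionKeys {
    private PartitionKeys() {
    }

    // Distinct markers so "field absent" and "field present but null" never share a key.
    static final String MISSING = "<missing>";
    static final String NULL = "<null>";

    static String key(Map<String, Object> rec, List<String> fields) {
        StringJoiner joiner = new StringJoiner("|");
        for (String field : fields) {
            if (!rec.containsKey(field)) {
                joiner.add(MISSING); // the record has no such field at all
            } else {
                Object value = rec.get(field);
                joiner.add(value == null ? NULL : value.toString());
            }
        }
        return joiner.toString();
    }
}
```

The key detail is checking `containsKey` before `get`, because `Map.get` returns null in both cases.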