From b0a31dca624205779196a670198d39bbb49d9f46 Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 18:57:32 -0700 Subject: [PATCH 1/8] Fixed front page again --- docs/index.md | 36 ++++++++++++++++++++++++------------ docs/pubsub/storm-drpc.md | 3 +++ mkdocs.yml | 2 +- 3 files changed, 28 insertions(+), 13 deletions(-) diff --git a/docs/index.md b/docs/index.md index 32d8dc20..349150ad 100644 --- a/docs/index.md +++ b/docs/index.md @@ -20,7 +20,7 @@ * Big-data scale-tested - used in production at Yahoo and tested running 500+ queries simultaneously on up to 2,000,000 rps -# How is this useful +# How is Bullet useful How Bullet is used is largely determined by the data source it consumes. Depending on what kind of data you put Bullet on, the types of queries you run on it and your use-cases will change. As a look-forward query system with no persistence, you will not be able to repeat your queries on the same data. The next time you run your query, it will operate on the different data that arrives after that submission. If this usage pattern is what you need and you are looking for a light-weight system that can tap into your streaming data, then Bullet is for you! @@ -40,15 +40,15 @@ This instance of Bullet also powers other use-cases such as letting analysts val See [Quick Start](quick-start/bullet-on-spark.md) to set up Bullet locally using spark-streaming. You will generate some synthetic streaming data that you can then query with Bullet. -# Setting up Bullet on your streaming data +# Setup Bullet on your streaming data To set up Bullet on a real data stream, you need: -1. To setup the Bullet Backend on a stream processing framework. Currently, we support [Bullet on Storm](backend/storm-setup.md): +1. To setup the Bullet Backend on a stream processing framework. Currently, we support [Bullet on Storm](backend/storm-setup.md) and [Bullet on Spark](backend/spark-setup.md). 1. Plug in your source of data. 
See [Getting your data into Bullet](backend/ingestion.md) for details 2. Consume your data stream 2. The [Web Service](ws/setup.md) set up to convey queries and return results back from the backend -3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka.md) on any Backend and [Storm DRPC](pubsub/storm-drpc.md) for the Storm Backend. +3. To choose a [PubSub implementation](pubsub/architecture.md) that connects the Web Service and the Backend. We currently support [Kafka](pubsub/kafka.md) and a [REST PubSub](pubsub/rest.md) on any Backend and [Storm DRPC](pubsub/storm-drpc.md) for the Storm Backend. 4. The optional [UI](ui/setup.md) set up to talk to your Web Service. You can skip the UI if all your access is programmatic !!! note "Schema in the UI" @@ -59,9 +59,9 @@ To set up Bullet on a real data stream, you need: # Querying in Bullet -Bullet queries allow you to filter, project and aggregate data. It lets you fetch raw (the individual data records) as well as aggregated data. +Bullet queries allow you to filter, project and aggregate data. You can also specify a window to get incremental results. Bullet lets you fetch raw (the individual data records) as well as aggregated data. -* See the [UI Usage section](ui/usage.md) for using the UI to build Bullet queries. This is the same UI you will build in the [Quick Start](quick-start.md) +* See the [UI Usage section](ui/usage.md) for using the UI to build Bullet queries. 
This is the same UI you will build in the [Quick Start](quick-start/bullet-on-spark.md) * See the [API section](ws/api.md) for building Bullet API queries @@ -111,6 +111,16 @@ Currently we support ```GROUP``` aggregations with the following operations: | MAX | Returns the maximum of the non-null values in the provided field for all the elements in the group | | AVG | Computes the average of the non-null values in the provided field for all the elements in the group | +## Windows + +Windows in a Bullet query allow you to specify how often you'd like Bullet to return results. + +For example, you could launch a query for 2 minutes, and have Bullet return a COUNT DISTINCT on a particular field every 3 seconds: + +![Time-Based Tumbling Windows](../img/time-based-tumbling.png) + +See documentation on [the Web Service API](ws/api.md) for more info. + # Results The Bullet Web Service returns your query result as well as associated metadata information in a structured JSON format. The UI can display the results in different formats. @@ -145,17 +155,19 @@ The Bullet Backend can be split into three main conceptual sub-systems: 2. Data Processor - reads data from a input stream, converts it to an unified data format and matches it against queries 3. Combiner - combines results for different queries, performs final aggregations and returns results -The core of Bullet querying is not tied to the Backend and lives in a core library. This allows you implement the flow shown above in any stream processor you like. We are currently working on Bullet on [Spark Streaming](https://spark.apache.org/streaming). +The core of Bullet querying is not tied to the Backend and lives in a core library. This allows you to implement the flow shown above in any stream processor you like. -## PubSub +Implementations of [Bullet on Storm](backend/storm-architecture.md) and [Bullet on Spark](backend/spark-architecture.md) are currently supported. 
-The PubSub is responsible for transmitting queries from the API to the Backend and returning results back from the Backend to the clients. It decouples whatever particular Backend you are using with the API. We currently provide a PubSub implementation using Kafka as the transport layer. You can very easily [implement your own](pubsub/architecture.md#implementing-your-own-pubsub) by defining a few interfaces that we provide. +## PubSub -In the case of Bullet on Storm, there is an [additional simplified option](pubsub/storm-drpc.md) using [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as the PubSub. This layer is planned to only support a request-response model for querying in the future. +The PubSub is responsible for transmitting queries from the API to the Backend and returning results back from the Backend to the clients. It decouples whatever particular Backend you are using from the API. +We currently support two different PubSub implementations: -!!! note "DRPC PubSub" +* [Kafka](pubsub/kafka.md) +* [REST](pubsub/rest.md) - This was how Bullet was first implemented in Storm. Storm DRPC provided a really simple way to communicate with Storm that we took advantage of. We provide this as a legacy adapter or for users who use Storm but don't want a PubSub layer. +You can also very easily [implement your own](pubsub/architecture.md#implementing-your-own-pubsub) by defining a few interfaces that we provide. ## Web Service and UI diff --git a/docs/pubsub/storm-drpc.md b/docs/pubsub/storm-drpc.md index 7300f591..939fde3f 100644 --- a/docs/pubsub/storm-drpc.md +++ b/docs/pubsub/storm-drpc.md @@ -1,5 +1,8 @@ # Storm DRPC PubSub +!!! note "NOTE: This PubSub only works with old versions of the Storm Backend!" + Since DRPC is part of Storm, and can only support a single query/response model, this PubSub implementation can only be used with the Storm backend, and cannot support Windowed queries (bullet-storm 0.8.0 and later). 
+ Bullet on [Storm](https://storm.apache.org/) can use [Storm DRPC](http://storm.apache.org/releases/1.0.0/Distributed-RPC.html) as a PubSub layer. DRPC or Distributed Remote Procedure Call, is built into Storm and consists of a set of servers that are part of the Storm cluster. ## How does it work? diff --git a/mkdocs.yml b/mkdocs.yml index 395cf9e2..dbf3f964 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -44,7 +44,7 @@ markdown_extensions: extra: collapse_toc: true include_search: true - service_version: v0.4.0 + service_version: v0.5.0 extra_css: - css/extra.css From f0d73df733c632ab883b17895fe407ade30bf2fe Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 19:16:02 -0700 Subject: [PATCH 2/8] Updated pubsub architecture page --- docs/pubsub/architecture.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/pubsub/architecture.md b/docs/pubsub/architecture.md index bfe96a70..d9c848bb 100644 --- a/docs/pubsub/architecture.md +++ b/docs/pubsub/architecture.md @@ -4,11 +4,11 @@ This section describes how the Publish-Subscribe or [PubSub layer](../index.md#p ## Why a PubSub? -When we initially created Bullet, it was built on [Apache Storm](https://storm.apache.org) and leveraged a feature in it called [Storm DRPC](http://storm.apache.org/releases/1.0.3/Distributed-RPC.html) to deliver queries to and extract results from the Bullet Backend. Storm DRPC is supported by a set of clusters that are physically part of the Storm cluster and is a shared resource for the cluster. While many other stream processors support some form of RPC and we could support multiple versions of the Web Service for those, it quickly became clear that abstracting the transport layer from the Web Service to the Backend was needed. This was particularly highlighted when we wanted to switch Bullet queries from operating in a request-response model (one response at the end of the query) to a streaming model. 
Streaming responses back to the user for a query through DRPC would be cumbersome and require a lot of logic to handle. A PubSub system was a natural solution to this. Since DRPC was a shared resource per cluster, we also were [tying the Backend's scalability](../backend/storm-performance.md#test-4-improving-the-maximum-number-of-simultaneous-raw-queries) to a resource that we didn't control. +When we initially created Bullet, it was built on [Apache Storm](https://storm.apache.org) and leveraged a feature in it called Storm DRPC to deliver queries to and extract results from the Bullet Backend. Storm DRPC is supported by a set of clusters that are physically part of the Storm cluster and is a shared resource for the cluster. While many other stream processors support some form of RPC and we could support multiple versions of the Web Service for those, it quickly became clear that abstracting the transport layer from the Web Service to the Backend was needed. This was particularly highlighted when we wanted to switch Bullet queries from operating in a request-response model (one response at the end of the query) to a streaming model. Streaming responses back to the user for a query through DRPC would be cumbersome and require a lot of logic to handle. A PubSub system was a natural solution to this. Since DRPC was a shared resource per cluster, we also were [tying the Backend's scalability](../backend/storm-performance.md#test-4-improving-the-maximum-number-of-simultaneous-raw-queries) to a resource that we didn't control. However, we didn't want to pick a particular PubSub like Kafka and restrict a user's choice. So, we added a PubSub layer that was generic and entirely pluggable into both the Backend and the Web Service. We would support a select few like [Kafka](https://github.com/yahoo/bullet-kafka) or [Storm DRPC](https://github.com/yahoo/bullet-storm). See [below](#implementing-your-own-pubsub) for how to create your own. 
-With the transport mechanism abstracted out, it opens up a lot of possibilities like implementing Bullet on other stream processors ([Apache Spark](https://spark.apache.org) is in the works) and adding streaming, incremental results, sharding and much more. +With the transport mechanism abstracted out, it opens up a lot of possibilities like implementing Bullet on other stream processors, allowing for the development of [Bullet on Spark](../backend/spark-architecture.md) along with other possible implementations in the future. ## What does it do? @@ -28,7 +28,8 @@ The PubSub layer does not deal with queries and results and just works on instan If you want to use an implementation already built, we currently support: 1. [Kafka](kafka.md#setup) for any Backend -2. [Storm DRPC](storm-drpc.md#setup) if you're using Bullet on Storm as your Backend +2. [REST](rest.md#setup) for any Backend +3. [Storm DRPC](storm-drpc.md#setup) if you're using Bullet on Storm as your Backend ## Implementing your own PubSub From 3c70c15aac21ac65b61f439cfe7036a60eae8626 Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 19:20:32 -0700 Subject: [PATCH 3/8] Fixed TOC in spark quickstart --- docs/quick-start/bullet-on-spark.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/quick-start/bullet-on-spark.md b/docs/quick-start/bullet-on-spark.md index 385d9083..e71ac5ed 100644 --- a/docs/quick-start/bullet-on-spark.md +++ b/docs/quick-start/bullet-on-spark.md @@ -13,8 +13,6 @@ At the end of this section, you will have: * You will need to be on an Unix-based system (Mac OS X, Ubuntu ...) with ```curl``` installed * You will need [JDK 8](http://www.oracle.com/technetwork/java/javase/downloads/index.html) installed -## To Install and Launch Bullet Locally: - ### Setup Kafka For this instance of Bullet we will use the kafka PubSub implementation found in [bullet-spark](https://github.com/bullet-db/bullet-spark). 
So we will first download and run Kafka, and setup a couple Kafka topics. @@ -180,7 +178,7 @@ Visit [http://localhost:8800](http://localhost:8800) to query your topology with If you access the UI from another machine than where your UI is actually running, you will need to edit ```config/env-settings.json```. Since the UI is a client-side app, the machine that your browser is running on will fetch the UI and attempt to use these settings to talk to the Web Service. Since they point to localhost by default, your browser will attempt to connect there and fail. An easy fix is to change ```localhost``` in your env-settings.json to point to the host name where you will hosting the UI. This will be the same as the UI host you use in the browser. You can also do a local port forward on the machine accessing the UI by running: ```ssh -N -L 8800:localhost:8800 -L 9999:localhost:9999 hostname-of-the-quickstart-components 2>&1``` -## Congratulations!! Bullet is all setup! +### Congratulations!! Bullet is all setup! #### Playing around with the instance: From e0b49a90c66f85e627cb1946e3872a65df6a9699 Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 19:31:54 -0700 Subject: [PATCH 4/8] Updated ingestion --- docs/backend/ingestion.md | 18 +++++++++++++----- 1 file changed, 13 insertions(+), 5 deletions(-) diff --git a/docs/backend/ingestion.md b/docs/backend/ingestion.md index d2e2d86c..6368bd7e 100644 --- a/docs/backend/ingestion.md +++ b/docs/backend/ingestion.md @@ -8,7 +8,13 @@ Bullet operates on a generic data container that it understands. In order to get ## Bullet Record -The Bullet Record is a serializable data container based on [Avro](http://avro.apache.org). It is typed and has a generic schema. You can refer to the [Avro Schema](https://github.com/yahoo/bullet-record/blob/master/src/main/avro/BulletAvro.avsc) file for details if you wish to see the internals of the data model. 
The Bullet Record is also lazy and only deserializes itself when you try to read something from it. So, you can pass it around before sending to Bullet with minimal cost. Partial deserialization is being considered if performance is key. This will let you deserialize a much narrower chunk of the Record if you are just looking for a couple of fields. +The Bullet backend processes data that must be stored in a [Bullet Record](https://github.com/bullet-db/bullet-record/blob/master/src/main/java/com/yahoo/bullet/record/BulletRecord.java) which is an abstract Java class that can +be implemented so as to be optimized for different backends or use-cases. + +There are currently two concrete implementations of BulletRecord: + +1. [SimpleBulletRecord](https://github.com/bullet-db/bullet-record/blob/master/src/main/java/com/yahoo/bullet/record/SimpleBulletRecord.java) which is based on a simple Java HashMap +2. [AvroBulletRecord](https://github.com/bullet-db/bullet-record/blob/master/src/main/java/com/yahoo/bullet/record/AvroBulletRecord.java) which uses [Avro](http://avro.apache.org) for serialization ## Types @@ -17,9 +23,11 @@ Data placed into a Bullet Record is strongly typed. We support these types curre ### Primitives 1. Boolean -2. Long -3. Double -4. String +2. Integer +3. Long +4. Float +5. Double +6. String ### Complex @@ -31,7 +39,7 @@ With these types, it is unlikely you would have data that cannot be represented ## Installing the Record directly -Generally, you depend on the Bullet Core artifact for your Stream Processor when you plug in the piece that gets your data into the Stream processor. The Bullet Core artifact already brings in the Bullet Record container as well. See the usage for the [Storm](storm-setup.md#installation) for an example. +Generally, you depend on the Bullet Core artifact for your Stream Processor when you plug in the piece that gets your data into the Stream processor. 
The Bullet Core artifact already brings in the Bullet Record containers as well. See the usage for the [Storm](storm-setup.md#installation) for an example. However, if you need it, the artifacts are available through JCenter to depend on them in code directly. You will need to add the repository. Below is a Maven example: From 46eed117b1b856ad6a3e700313dc07684d1c824f Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 19:49:08 -0700 Subject: [PATCH 5/8] Fixed api --- docs/ws/api.md | 26 +++++++++++--------------- 1 file changed, 11 insertions(+), 15 deletions(-) diff --git a/docs/ws/api.md b/docs/ws/api.md index 75352553..ece75b85 100644 --- a/docs/ws/api.md +++ b/docs/ws/api.md @@ -284,7 +284,7 @@ Note that the ```K``` in ```TOP K``` is specified using the ```size``` field in The "window" field is **optional** and allows you to instruct Bullet to return incremental results. For example you might want to return the COUNT of a field and return that count every 2 seconds. -If "window" is ommitted Bullet will emit only a single result at the very end of the query. +If "window" is omitted Bullet will emit only a single result at the very end of the query. 
An example window might look like this: @@ -293,22 +293,18 @@ An example window might look like this: "include": { "type": "TIME/RECORD/ALL", "first": 5000 } }, ``` -* The __emit__ field is used to specify when a window should be emmitted and the current results sent back to the user - * The __type__ subfield for "emit" can have two values: - * __"TIME"__ specifies that the window will emit after a specific number of milliseconds - * __"RECORD"__ specifies that the window will emit after consuming a specific number of records - * The __every__ subfield for "emit" specifies how many records/milliseconds (depending on "type") will be counted before the window is emmitted -* The __include__ field is used to specify what will be included in the emmitted window - * The __type__ subfield for "include" can have three values: - * __"TIME"__ specifies that the window will include all records seen in a certain time period in the window - * e.g. All records seen in the first 2 seconds of a 10 second window - * __"RECORD"__ specifies that the window will include the first n records, where n is specified in the "first" field below - * __"ALL"__ specifies that the window will include ALL results accumulated since the very beginning of the __query__ (not just this window) - * the __first__ subfield for "include" specifies the number of records/milliseconds at the beginning of this window to include in the emmitted result - it should be ommitted if "type" is "ALL". 
+| Field | SubField | Meaning | +| ------- | --------- | ------- | +| emit | | This object specifies when a window should be emitted and the current results sent back to the user | +| | type | Must be "TIME" or "RECORD" - specifying if the window should be emitted after X number of milliseconds, or X number of records | +| | every | The number of milliseconds or records (determined by "type" above) that will be contained in the emitted window | +| include | | This object specifies what will be included in the emitted window | +| | type | Must be "TIME", "RECORD" or "ALL" - specifying if the window should include X number of milliseconds, X number of records, or all results since the beginning of the whole query | +| | first | Specifies the number of records/milliseconds at the beginning of this window to include in the emitted result - it should be omitted if "type" is "ALL" | **NOTE: Not all windowing types are supported at this time.** -### **Currently Bullet supports the following window types**: +**Currently Bullet supports the following window types**: * Time-Based Tumbling Windows * Additive Tumbling Windows @@ -368,7 +364,7 @@ The above example would be specified with the window: #### **No Window** -If the "window" field is optional. If it is ommitted, the query will only emit when the entire query is finished. +The "window" field is optional. If it is omitted, the query will only emit when the entire query is finished. 
## Results From fc05cbbbbe280271939d7340c60b91c7213d5ecf Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 19:57:37 -0700 Subject: [PATCH 6/8] Fixed examples page --- docs/ws/examples.md | 628 ++++++++++++++++++++++---------------------- 1 file changed, 308 insertions(+), 320 deletions(-) diff --git a/docs/ws/examples.md b/docs/ws/examples.md index 45869d74..b49f4e34 100644 --- a/docs/ws/examples.md +++ b/docs/ws/examples.md @@ -1441,34 +1441,33 @@ This query specifies a tumbling window that will emit every 5 seconds and contai the user will receive a total of 4 results. Since the aggregation size is set to 5, each returned window will contain only 5 groups (which will be chosen randomly). The result might look like this: ```javascript -data:{ - "records":[ - { - "country":"Germany", - "count":1, - "averageAge":25.0 - }, - { - "country":"Canada", - "count":106, - "averageAge":22.58490566037736 - }, - { - "country":"USA", - "count":1, - "averageAge":28.0 - }, - { - "country":"England", - "count":8, - "averageAge":34.25 - }, - { - "country":"Peru", - "count":9, - "averageAge":30.0 - } - ], +"records":[ + { + "country":"Germany", + "count":1, + "averageAge":25.0 + }, + { + "country":"Canada", + "count":106, + "averageAge":22.58490566037736 + }, + { + "country":"USA", + "count":1, + "averageAge":28.0 + }, + { + "country":"England", + "count":8, + "averageAge":34.25 + }, + { + "country":"Peru", + "count":9, + "averageAge":30.0 + } +], "meta":{ "Window":{ "Number":1, @@ -1504,191 +1503,187 @@ data:{ } -data:{ - "records":[ - { - "country":"Canada", - "count":101, - "averageAge":32.742574257425744 - }, - { - "country":"ht", - "count":2, - "averageAge":32.0 - }, - { - "country":"England", - "count":16, - "averageAge":27.0625 - }, - { - "country":"Peru", - "count":8, - "averageAge":23.625 - }, - { - "country":"Bangladesh", - "count":3, - "averageAge":27.66666666666667 - } - ], - "meta":{ - "Window":{ - "Number":2, - "Emit Time":1529458408036, - "Expected 
Emit Time":1529458408023, - "Name":"Tumbling" - }, - "Query":{ - "ID":"448d228a-1eed-471f-8777-c800cc866535", - "Receive Time":1529458398023, - "Body":"...(query body)..." - }, - "Sketch":{ - "Was Estimated":false, - "Uniques Estimate":98.0, - "Family":"TUPLE", - "Theta":1.0, - "Standard Deviations":{ - "1":{ - "upperBound":98.0, - "lowerBound":98.0 - }, - "2":{ - "upperBound":98.0, - "lowerBound":98.0 - }, - "3":{ - "upperBound":98.0, - "lowerBound":98.0 - } +"records":[ + { + "country":"Canada", + "count":101, + "averageAge":32.742574257425744 + }, + { + "country":"ht", + "count":2, + "averageAge":32.0 + }, + { + "country":"England", + "count":16, + "averageAge":27.0625 + }, + { + "country":"Peru", + "count":8, + "averageAge":23.625 + }, + { + "country":"Bangladesh", + "count":3, + "averageAge":27.66666666666667 + } +], +"meta":{ + "Window":{ + "Number":2, + "Emit Time":1529458408036, + "Expected Emit Time":1529458408023, + "Name":"Tumbling" + }, + "Query":{ + "ID":"448d228a-1eed-471f-8777-c800cc866535", + "Receive Time":1529458398023, + "Body":"...(query body)..." 
+ }, + "Sketch":{ + "Was Estimated":false, + "Uniques Estimate":98.0, + "Family":"TUPLE", + "Theta":1.0, + "Standard Deviations":{ + "1":{ + "upperBound":98.0, + "lowerBound":98.0 + }, + "2":{ + "upperBound":98.0, + "lowerBound":98.0 + }, + "3":{ + "upperBound":98.0, + "lowerBound":98.0 } } } } -data:{ - "records":[ - { - "country":"Canada", - "count":121, - "averageAge":27.97520661157025 - }, - { - "country":"Haiti", - "count":3, - "averageAge":39.0 - }, - { - "country":"Cabuyao laguna", - "count":2, - "averageAge":28.0 - }, - { - "country":"USA", - "count":1, - "averageAge":20.0 - }, - { - "country":"England", - "count":23, - "averageAge":40.869565217391305 - } - ], - "meta":{ - "Window":{ - "Number":3, - "Emit Time":1529458413031, - "Expected Emit Time":1529458413023, - "Name":"Tumbling" - }, - "Query":{ - "ID":"448d228a-1eed-471f-8777-c800cc866535", - "Receive Time":1529458398023, - "Body":"...(query body)..." - }, - "Sketch":{ - "Was Estimated":false, - "Uniques Estimate":104.0, - "Family":"TUPLE", - "Theta":1.0, - "Standard Deviations":{ - "1":{ - "upperBound":104.0, - "lowerBound":104.0 - }, - "2":{ - "upperBound":104.0, - "lowerBound":104.0 - }, - "3":{ - "upperBound":104.0, - "lowerBound":104.0 - } + +"records":[ + { + "country":"Canada", + "count":121, + "averageAge":27.97520661157025 + }, + { + "country":"Haiti", + "count":3, + "averageAge":39.0 + }, + { + "country":"Cabuyao laguna", + "count":2, + "averageAge":28.0 + }, + { + "country":"USA", + "count":1, + "averageAge":20.0 + }, + { + "country":"England", + "count":23, + "averageAge":40.869565217391305 + } +], +"meta":{ + "Window":{ + "Number":3, + "Emit Time":1529458413031, + "Expected Emit Time":1529458413023, + "Name":"Tumbling" + }, + "Query":{ + "ID":"448d228a-1eed-471f-8777-c800cc866535", + "Receive Time":1529458398023, + "Body":"...(query body)..." 
+ }, + "Sketch":{ + "Was Estimated":false, + "Uniques Estimate":104.0, + "Family":"TUPLE", + "Theta":1.0, + "Standard Deviations":{ + "1":{ + "upperBound":104.0, + "lowerBound":104.0 + }, + "2":{ + "upperBound":104.0, + "lowerBound":104.0 + }, + "3":{ + "upperBound":104.0, + "lowerBound":104.0 } } } } -data:{ - "records":[ - { - "country":"Canada", - "count":117, - "averageAge":21.82051282051282 - }, - { - "country":"Azerbaijan", - "count":1, - "averageAge":30.0 - }, - { - "country":"England", - "count":13, - "averageAge":30.923076923076923 - }, - { - "country":"Congo", - "count":1, - "averageAge":32.0 - }, - { - "country":"Bangladesh", - "count":3, - "averageAge":24.333333333333336 - } - ], - "meta":{ - "Window":{ - "Number":4, - "Emit Time":1529458418030, - "Expected Emit Time":1529458418023, - "Name":"Tumbling" - }, - "Query":{ - "Finish Time":1529458418030, - "ID":"448d228a-1eed-471f-8777-c800cc866535", - "Receive Time":1529458398023, - "Body":"...(query body)..." - }, - "Sketch":{ - "Was Estimated":false, - "Uniques Estimate":108.0, - "Family":"TUPLE", - "Theta":1.0, - "Standard Deviations":{ - "1":{ - "upperBound":108.0, - "lowerBound":108.0 - }, - "2":{ - "upperBound":108.0, - "lowerBound":108.0 - }, - "3":{ - "upperBound":108.0, - "lowerBound":108.0 - } + +"records":[ + { + "country":"Canada", + "count":117, + "averageAge":21.82051282051282 + }, + { + "country":"Azerbaijan", + "count":1, + "averageAge":30.0 + }, + { + "country":"England", + "count":13, + "averageAge":30.923076923076923 + }, + { + "country":"Congo", + "count":1, + "averageAge":32.0 + }, + { + "country":"Bangladesh", + "count":3, + "averageAge":24.333333333333336 + } +], +"meta":{ + "Window":{ + "Number":4, + "Emit Time":1529458418030, + "Expected Emit Time":1529458418023, + "Name":"Tumbling" + }, + "Query":{ + "Finish Time":1529458418030, + "ID":"448d228a-1eed-471f-8777-c800cc866535", + "Receive Time":1529458398023, + "Body":"...(query body)..." 
+ }, + "Sketch":{ + "Was Estimated":false, + "Uniques Estimate":108.0, + "Family":"TUPLE", + "Theta":1.0, + "Standard Deviations":{ + "1":{ + "upperBound":108.0, + "lowerBound":108.0 + }, + "2":{ + "upperBound":108.0, + "lowerBound":108.0 + }, + "3":{ + "upperBound":108.0, + "lowerBound":108.0 } } } @@ -1742,92 +1737,88 @@ data:{ The above query will run for 20 seconds and emit a result every 5 seconds. The result will contain the average age and the count of the records seen since the very beginning of the query. Results might look like this: ```javascript -data:{ - "records":[ - { - "count":8493, - "averageAge":28.8828796983622 - } - ], - "meta":{ - "Window":{ - "Number":1, - "Emit Time":1529522392188, - "Expected Emit Time":1529522392089, - "Name":"Tumbling" - }, - "Query":{ - "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", - "Receive Time":1529522387089, - "Body":"...(query body)..." - } +"records":[ + { + "count":8493, + "averageAge":28.8828796983622 + } +], +"meta":{ + "Window":{ + "Number":1, + "Emit Time":1529522392188, + "Expected Emit Time":1529522392089, + "Name":"Tumbling" + }, + "Query":{ + "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", + "Receive Time":1529522387089, + "Body":"...(query body)..." } } -data:{ - "records":[ - { - "count":17580, - "averageAge":29.842629482071715 - } - ], - "meta":{ - "Window":{ - "Number":2, - "Emit Time":1529522397191, - "Expected Emit Time":1529522397089, - "Name":"Tumbling" - }, - "Query":{ - "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", - "Receive Time":1529522387089, - "Body":"...(query body)..." - } + +"records":[ + { + "count":17580, + "averageAge":29.842629482071715 + } +], +"meta":{ + "Window":{ + "Number":2, + "Emit Time":1529522397191, + "Expected Emit Time":1529522397089, + "Name":"Tumbling" + }, + "Query":{ + "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", + "Receive Time":1529522387089, + "Body":"...(query body)..." 
} } -data:{ - "records":[ - { - "count":26317, - "averageAge":29.86675792835957 - } - ], - "meta":{ - "Window":{ - "Number":3, - "Emit Time":1529522402185, - "Expected Emit Time":1529522402089, - "Name":"Tumbling" - }, - "Query":{ - "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", - "Receive Time":1529522387089, - "Body":"...(query body)..." - } + +"records":[ + { + "count":26317, + "averageAge":29.86675792835957 + } +], +"meta":{ + "Window":{ + "Number":3, + "Emit Time":1529522402185, + "Expected Emit Time":1529522402089, + "Name":"Tumbling" + }, + "Query":{ + "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", + "Receive Time":1529522387089, + "Body":"...(query body)..." } } -data:{ - "records":[ - { - "count":35259, - "averageAge":29.8303102557552 - } - ], - "meta":{ - "Window":{ - "Number":4, - "Emit Time":1529522407182, - "Expected Emit Time":1529522407089, - "Name":"Tumbling" - }, - "Query":{ - "Finish Time":1529522407182, - "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", - "Receive Time":1529522387089, - "Body":"...(query body)..." - } + + +"records":[ + { + "count":35259, + "averageAge":29.8303102557552 + } +], +"meta":{ + "Window":{ + "Number":4, + "Emit Time":1529522407182, + "Expected Emit Time":1529522407089, + "Name":"Tumbling" + }, + "Query":{ + "Finish Time":1529522407182, + "ID":"12e48fbd-a20f-4f5e-8135-0f012d9ba3ef", + "Receive Time":1529522387089, + "Body":"...(query body)..." } } ``` @@ -1878,49 +1869,46 @@ This is a query that will capture raw data, and has a sliding window of size 1. will only match records from a particular browser. The query will run for 20 seconds, and the results might look like this: ```javascript -data:{ - "records":[ - { - "country":"USA", - "event":"page", - "browser-id":"2siknmdd6kaqm" - } - ], - "meta":{ - "Window":{ - "Number":1, - "Size":1, - "Emit Time":1529521479235, - "Name":"Sliding" - }, - "Query":{ - "ID":"31d65a12-ed56-4cc8-81ec-6a8bfe9301ba", - "Receive Time":1529521475015, - "Body":"...(query body)... 
" - } +"records":[ + { + "country":"USA", + "event":"page", + "browser-id":"2siknmdd6kaqm" + } +], +"meta":{ + "Window":{ + "Number":1, + "Size":1, + "Emit Time":1529521479235, + "Name":"Sliding" + }, + "Query":{ + "ID":"31d65a12-ed56-4cc8-81ec-6a8bfe9301ba", + "Receive Time":1529521475015, + "Body":"...(query body)... " } } -data:{ - "records":[ - { - "country":"USA", - "event":"click", - "browser-id":"2siknmdd6kaqm" - } - ], - "meta":{ - "Window":{ - "Number":6, - "Size":1, - "Emit Time":1529521764875, - "Name":"Sliding" - }, - "Query":{ - "ID":"e9595eb4-ea95-418b-8cff-d00736bf216f", - "Receive Time":1529521757459, - "Body":"...(query body)..." - } + +"records":[ + { + "country":"USA", + "event":"click", + "browser-id":"2siknmdd6kaqm" + } +], +"meta":{ + "Window":{ + "Number":6, + "Size":1, + "Emit Time":1529521764875, + "Name":"Sliding" + }, + "Query":{ + "ID":"e9595eb4-ea95-418b-8cff-d00736bf216f", + "Receive Time":1529521757459, + "Body":"...(query body)..." } } From ec100eda79a86f25486f2c9e448101f960fcbb24 Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 20:29:11 -0700 Subject: [PATCH 7/8] Updated releases --- docs/releases.md | 26 ++++++++++++++++++++++---- 1 file changed, 22 insertions(+), 4 deletions(-) diff --git a/docs/releases.md b/docs/releases.md index 82318b87..640476cd 100644 --- a/docs/releases.md +++ b/docs/releases.md @@ -24,8 +24,14 @@ The core Bullet logic (a library) that can be used to implement Bullet on differ ### Releases -| Date | Release | Highlights | -| ------------ | --------------------------------------------------------------------------------- | ---------- | +| Date | Release | Highlights | +| ------------ | ------------------------------------------------------------------------------------- | ---------- | +| 2018-06-18 | [**0.4.0**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.4.0) | Added support for Integer and Float data types, and configurable BulletRecordProvider class used to 
instantiate BulletRecords in bullet-core | +| 2018-04-11 | [**0.3.4**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.3.4) | Pre-Start delaying and Buffering changes - queries are now buffered at the start of a query instead of start of each window | +| 2018-03-30 | [**0.3.3**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.3.3) | Bug fix for com.yahoo.bullet.core.querying.Querier#isClosedForPartition | +| 2018-03-20 | [**0.3.2**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.3.2) | Added headers to RESTPubSub http requests | +| 2018-03-16 | [**0.3.1**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.3.1) | Added RESTPubSub implementation | +| 2018-02-22 | [**0.3.0**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.3.0) | Supports windowing / incremental updates | | 2017-10-04 | [**0.2.5**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.2.5) | Supports an in-memory BufferingSubscriber implementation for reliable subscribing | | 2017-10-03 | [**0.2.4**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.2.4) | Helpers added to Config, PubSubMessage, Metadata and JSONFormatter. FAIL signal in Metadata. PubSubMessage is JSON serializable | | 2017-09-20 | [**0.2.3**](https://github.com/bullet-db/bullet-core/releases/tag/bullet-core-0.2.3) | PubSub is no longer required to be Serializable. Makes PubSubMessage fully serializable. Utility classes and checked exceptions for PubSub | @@ -60,6 +66,11 @@ The implementation of Bullet on Storm. 
Due to major API changes between Storm <= | Date | Storm 1.0 | Storm 0.10 | Highlights | | ------------ | ---------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------- | ---------- | +| 2018-06-18 | [**0.8.3**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.8.3) | [**0.8.3**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.8.3) | Using new bullet-record and bullet-core supporting Integer and Float data types | +| 2018-04-12 | [**0.8.2**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.8.2) | [**0.8.2**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.8.2) | Delaying query start in Join Bolt | +| 2018-04-04 | [**0.8.1**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.8.1) | [**0.8.1**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.8.1) | Fixed bug in JoinBolt | +| 2018-03-30 | [**0.8.0**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.8.0) | [**0.8.0**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.8.0) | Supports windowing / incremental updates | +| 2017-11-07 | [**0.7.0**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.7.0) | [**0.7.0**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.7.0) | Merge Query and Metadata Streams | | 2017-10-24 | [**0.6.2**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.6.2) | [**0.6.2**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.6.2) | Adds a fat jar for using the DRPC PubSub in the Web Service | | 2017-10-18 | [**0.6.1**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.6.1) | [**0.6.1**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.6.1) | DRPC PubSub | | 2017-08-30 | 
[**0.6.0**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.6.0) | [**0.6.0**](https://github.com/bullet-db/bullet-storm/releases/tag/bullet-storm-0.10-0.6.0) | New PubSub architecture, removes DRPC components and settings | @@ -90,6 +101,7 @@ The implementation of Bullet on Spark Streaming. | Date | Release | Highlights | | ------------ | --------------------------------------------------------------------------------- | ---------- | +| 2018-06-18 | [**0.1.2**](https://github.com/bullet-db/bullet-spark/releases/tag/bullet-spark-0.1.2) | Uses SimpleBulletRecord to avoid some Spark serialization issues with Avro | | 2018-06-08 | [**0.1.1**](https://github.com/bullet-db/bullet-spark/releases/tag/bullet-spark-0.1.1) | Adds a command flag to pass custom setting file | | 2018-05-25 | [**0.1.0**](https://github.com/bullet-db/bullet-spark/releases/tag/bullet-spark-0.1.0) | The first release | @@ -113,6 +125,9 @@ The Web Service implementation that can serve a static schema from a file and ta | Date | Release | Highlights | | ------------ | -------------------------------------------------------------------------------------- | ---------- | +| 2018-06-14 | [**0.2.2**](https://github.com/bullet-db/bullet-service/releases/tag/bullet-service-0.2.2) | Adding settings to configure websocket | +| 2018-04-02 | [**0.2.1**](https://github.com/bullet-db/bullet-service/releases/tag/bullet-service-0.2.1) | Moved and renamed settings | +| 2018-03-30 | [**0.2.0**](https://github.com/bullet-db/bullet-service/releases/tag/bullet-service-0.2.0) | Supporting windowing / incremental updates | | 2017-10-19 | [**0.1.1**](https://github.com/bullet-db/bullet-service/releases/tag/bullet-service-0.1.1) | New PubSub architecture. 
Switching to Spring Boot and executable JAR instead of WAR | | 2016-12-16 | [**0.0.1**](https://github.com/bullet-db/bullet-service/releases/tag/bullet-service-0.0.1) | The first release with support for DRPC and the file-based schema | @@ -135,6 +150,7 @@ The Bullet UI that lets you build, run, save and visualize results from Bullet. | Date | Release | Highlights | | ------------ | -------------------------------------------------------------------------------------- | ---------- | +| 2018-06-18 | [**0.5.0**](https://github.com/bullet-db/bullet-ui/releases/tag/v0.5.0) | Supports windowing, uses IndexedDB and Ember 3! | | 2017-08-22 | [**0.4.0**](https://github.com/bullet-db/bullet-ui/releases/tag/v0.4.0) | Query sharing, collapsible Raw view, and unsaved/error indicators. Settings rename and other bug fixes| | 2017-05-22 | [**0.3.2**](https://github.com/bullet-db/bullet-ui/releases/tag/v0.3.2) | Exporting to TSV in Pivot table. Fixes unselectability bug in Raw view | | 2017-05-15 | [**0.3.1**](https://github.com/bullet-db/bullet-ui/releases/tag/v0.3.1) | Adds styles to the Pivot table. 
Fixes some minor UI interactions | @@ -160,6 +176,7 @@ The AVRO container that you need to convert your data into to be consumed by Bul | Date | Release | Highlights | | ------------ | ------------------------------------------------------------------------------------ | ---------- | +| 2018-06-14 | [**0.2.0**](https://github.com/bullet-db/bullet-record/releases/tag/bullet-record-0.2.0) | Makes BulletRecord pluggable, adds simple record and avro record implementations | | 2017-05-19 | [**0.1.2**](https://github.com/bullet-db/bullet-record/releases/tag/bullet-record-0.1.2) | Reduces the memory footprint needed to serialize itself by a factor of 128 for small records | | 2017-04-17 | [**0.1.1**](https://github.com/bullet-db/bullet-record/releases/tag/bullet-record-0.1.1) | Helper methods to remove, rename, check presence and count fields in the Record | | 2017-02-09 | [**0.1.0**](https://github.com/bullet-db/bullet-record/releases/tag/bullet-record-0.1.0) | Map constructor | @@ -170,8 +187,8 @@ A PubSub implementation using Kafka as the backing PubSub. 
Can be used with any | | | | -------------------------- | --------------- | -| **Repository** | [https://github.com/bullet-db/bullet-record](https://github.com/bullet-db/bullet-kafka) | -| **Issues** | [https://github.com/bullet-db/bullet-record/issues](https://github.com/bullet-db/bullet-kafka/issues) | +| **Repository** | [https://github.com/bullet-db/bullet-kafka](https://github.com/bullet-db/bullet-kafka) | +| **Issues** | [https://github.com/bullet-db/bullet-kafka/issues](https://github.com/bullet-db/bullet-kafka/issues) | | **Last Tag** | [![Latest tag](https://img.shields.io/github/release/bullet-db/bullet-kafka/all.svg)](https://github.com/bullet-db/bullet-kafka/tags) | | **Latest Artifact** | [![Download](https://api.bintray.com/packages/yahoo/maven/bullet-kafka/images/download.svg)](https://bintray.com/yahoo/maven/bullet-kafka/_latestVersion) | | **Package Manager Setup** | [Setup for Maven, Gradle etc](https://bintray.com/bintray/jcenter?filterByPkgName=bullet-kafka) | @@ -180,6 +197,7 @@ A PubSub implementation using Kafka as the backing PubSub. Can be used with any | Date | Release | Highlights | | ------------ | ------------------------------------------------------------------------------------ | ---------- | +| 2018-02-27 | [**0.3.0**](https://github.com/bullet-db/bullet-kafka/releases/tag/bullet-kafka-0.3.0) | Uses bullet-core-0.3.0 - windows / incremental updates | | 2017-10-19 | [**0.2.0**](https://github.com/bullet-db/bullet-kafka/releases/tag/bullet-kafka-0.2.0) | Refactors and re-releases. Pass-through settings to Kafka. 
Manual offset committing bug fix | | 2017-09-27 | [**0.1.2**](https://github.com/bullet-db/bullet-kafka/releases/tag/bullet-kafka-0.1.2) | Fixes a bug with config loading | | 2017-09-22 | [**0.1.1**](https://github.com/bullet-db/bullet-kafka/releases/tag/bullet-kafka-0.1.1) | First release using the PubSub interfaces | From 15edc7b31c821a4f37776995f330d2a4b89b688c Mon Sep 17 00:00:00 2001 From: Nathan Speidel Date: Wed, 20 Jun 2018 20:31:57 -0700 Subject: [PATCH 8/8] Fixed contributing --- docs/about/contributing.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/docs/about/contributing.md b/docs/about/contributing.md index 012b25fb..e9656a5a 100644 --- a/docs/about/contributing.md +++ b/docs/about/contributing.md @@ -16,13 +16,8 @@ This list is neither comprehensive nor in any particular order. | Feature | Components | Description | Status | |-------------------- | ----------- | ------------------------- | ------------- | -| Incremental updates | BE, WS, UI | Push results back to users during the query lifetime. Micro-batching, windowing and other features need to be implemented | In Progress | -| Bullet on Spark | BE | Implement Bullet on Spark Streaming. Compared with SQL on Spark Streaming which stores data in memory, Bullet will be light-weight | In Progress | | Security | WS, UI | The obvious enterprise security for locking down access to the data and the instance of Bullet. Considering SSL, Kerberos, LDAP etc. Ideally, without a database | Planning | -| In-Memory PubSub | PubSub | For users who don't want a PubSub like Kafka, we could add REST based in-memory PubSub layer that runs in the WS. 
The backend will then communicate directly with the WS | Planning | -| LocalForage | UI | Migration the UI to LocalForage to distance ourselves from the relatively small LocalStorage space | [#9](https://github.com/yahoo/bullet-ui/issues/9) | | Bullet on X | BE | With the pub/sub feature, Bullet can be implemented on other Stream Processors like Flink, Kafka Streaming, Samza etc | Open | | Bullet on Beam | BE | Bullet can be implemented on [Apache Beam](https://beam.apache.org) as an alternative to implementing it on various Stream Processors | Open | -| SQL API | BE, WS | WS supports an endpoint that converts a SQL-like query into Bullet queries | Open | +| SQL API | BE, WS | WS supports an endpoint that converts a SQL-like query into Bullet queries | In Progress | | Packaging | UI, BE, WS | Github releases and building from source are the only two options for the UI. Docker images or the like for quick setup and to mix and match various pluggable components would be really useful | Open | -| Spring Boot Reactor | WS | Migrate the Web Service to use Spring Boot reactor instead of servlet containers | Open |