Merged
58 changes: 26 additions & 32 deletions docs/pubsub/rest.md
@@ -1,16 +1,12 @@
# REST PubSub

The REST PubSub implementation is included in bullet-core, and can be launched along with the Web Service. If it is enabled the Web Service will expose two additional REST endpoints, one for reading/writing Bullet queries, and one
for reading/writing results.
The REST PubSub implementation is included in bullet-core, and can be launched along with the Web Service. If it is enabled the Web Service will expose two additional REST endpoints, one for reading/writing Bullet queries, and one for reading/writing results.

## How does it work?

When the Web Service receives a query from a user, it will create a PubSubMessage and write the message to the "query" RESTPubSub endpoint. This PubSubMessage will contain not only the query, but also some metadata, including the
appropriate host/port to which the response should be sent (this is done to allow for multiple Web Services running simultaneously). The query is then stored in memory until the backend does a GET from this endpoint, at which
time the query will be served to the backend, and dropped from the queue in memory.
When the Web Service receives a query from a user, it will create a PubSubMessage and write the message to the "query" RESTPubSub endpoint. This PubSubMessage will contain not only the query, but also some metadata, including the appropriate host/port to which the response should be sent (this is done to allow for multiple Web Services running simultaneously). The query is then stored in memory until the backend does a GET from this endpoint, at which time the query will be served to the backend, and dropped from the queue in memory.

Once the backend has generated the results of the query, it will wrap those results in a PubSubMessage. The backend extracts the URL to send the results to from the metadata and writes the results PubSubMessage to the
"results" REST endpoint with a POST. This result will then be stored in memory until the Web Service does a GET to that endpoint, at which time the Web Service will have the results of the query to send back to the user.
Once the backend has generated the results of the query, it will wrap those results in a PubSubMessage. The backend extracts the URL to send the results to from the metadata and writes the results PubSubMessage to the "results" REST endpoint with a POST. This result will then be stored in memory until the Web Service does a GET to that endpoint, at which time the Web Service will have the results of the query to send back to the user.
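
The flow described above can be modeled with two in-memory queues standing in for the "query" and "results" endpoints. This is only an illustrative sketch, not the Bullet implementation: the `InMemoryEndpoint` class, its `post`/`get` methods, and the message fields (`id`, `content`, `metadata`, `result_url`) are hypothetical names chosen for this example.

```python
from collections import deque


class InMemoryEndpoint:
    """Hypothetical stand-in for one REST PubSub endpoint backed by an in-memory queue."""

    def __init__(self):
        self._queue = deque()

    def post(self, message):
        # A POST stores the message in memory until a reader asks for it.
        self._queue.append(message)

    def get(self):
        # A GET serves the oldest message and drops it from the queue.
        # Returns None when nothing is waiting.
        return self._queue.popleft() if self._queue else None


query_endpoint = InMemoryEndpoint()
result_endpoint = InMemoryEndpoint()

# Web Service: wrap the user's query with metadata naming where results should go.
query_endpoint.post({
    "id": "q1",
    "content": "example query",
    "metadata": {"result_url": "http://localhost:9901/api/bullet/pubsub/result"},
})

# Backend: read the query, compute a result, and write it to the results endpoint.
query_message = query_endpoint.get()
result_endpoint.post({"id": query_message["id"], "content": "example result"})

# Web Service: read the result so it can be sent back to the user.
result_message = result_endpoint.get()
```

Once the backend has read the query, a second `query_endpoint.get()` returns `None`, which mirrors the query being dropped from the in-memory queue after it is served.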

## Setup

@@ -20,13 +16,13 @@ To enable the RESTPubSub and expose the two additional necessary REST endpoints,
bullet.pubsub.builtin.rest.enabled: true
```

...in the Web Service Application.yaml file. This can also be done from the command line when launching the Web Service jar file by adding the command-line option:
...in the Web Service ```application.yaml``` configuration file. This can also be done from the command line when launching the Web Service jar file by adding the command-line option:

```bash
--bullet.pubsub.builtin.rest.enabled=true
```

This will enable the two necessary REST endpoints, the paths for which can be configured in the Application.yaml file with the settings:
This will enable the two necessary REST endpoints, the paths for which can be configured in the ```application.yaml``` file with the settings:

```yaml
bullet.pubsub.builtin.rest.query.path: /pubsub/query
@@ -39,46 +35,44 @@ Configure the backend to use the REST PubSub:

```yaml
bullet.pubsub.context.name: "QUERY_PROCESSING"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"

bullet.pubsub.class.name: "com.yahoo.bullet.pubsub.rest.RESTPubSub"
bullet.pubsub.rest.connect.timeout.ms: 5000
bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100
bullet.pubsub.rest.result.subscriber.min.wait.ms: 10
bullet.pubsub.rest.query.subscriber.min.wait.ms: 10
bullet.pubsub.rest.query.urls:
- "http://webServiceHostNameA:9901/api/bullet/pubsub/query"
- "http://webServiceHostNameB:9902/api/bullet/pubsub/query"
- "http://<API_HOST_A>:9901/api/bullet/pubsub/query"
- "http://<API_HOST_B>:9901/api/bullet/pubsub/query"
```

* __bullet.pubsub.context.name: "QUERY_PROCESSING"__ - tells the PubSub that it is running in the backend
* __bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"__ - tells Bullet to use this class for it's PubSub
* __bullet.pubsub.rest.connect.timeout.ms: 5000__ - sets the HTTP connect timeout to a half second
* __bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100__ - this is the maxiumum number of uncommitted messages allowed before blocking
* __bullet.pubsub.rest.query.subscriber.min.wait.ms: 10__ - this setting is used to avoid making an http request too rapidly and overloading the http endpoint. It will force the backend to poll the query endpoint at most once every 10ms.
* __bullet.pubsub.rest.query.urls__ - this should be a list of all the query rest enpoint URLs. If you are only running one Web Service this will only contain one url (the url of your Web Service followed by the full path of the query endpoint).
| Setting Name | Default Value | Meaning |
| ----------------------------------------------------------------- | --------------------------------------- | ---------------- |
| bullet.pubsub.context.name | QUERY_PROCESSING | Tells the PubSub that it is running in the backend |
| bullet.pubsub.class.name | com.yahoo.bullet.pubsub.rest.RESTPubSub | Tells Bullet to use this class for its PubSub |
| bullet.pubsub.rest.connect.timeout.ms | 5000 | Sets the HTTP connect timeout to 5 s |
| bullet.pubsub.rest.subscriber.max.uncommitted.messages | 100 | This is the maximum number of uncommitted messages allowed to be read by the subscriber before blocking |
| bullet.pubsub.rest.query.subscriber.min.wait.ms | 10 | This is used to avoid making an HTTP request too rapidly and overloading the HTTP endpoint. It will force the backend to poll the query endpoint at most once every 10ms |
| bullet.pubsub.rest.query.urls | <EXAMPLE DEFAULTS> | This should be a list of all the query REST endpoint URLs. If you are only running one Web Service this will only contain one URL (the URL of your Web Service followed by the full path of the query endpoint) |
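
The effect of a `min.wait.ms` setting can be illustrated with a small throttle that allows at most one request per interval. This is a sketch under assumptions: `ThrottledPoller`, its `poll` method, and the injectable `clock` are hypothetical names, and skipping a too-early poll (rather than sleeping) is a choice made for this example, not a statement about how the RESTPubSub subscriber behaves.

```python
import time


class ThrottledPoller:
    """Hypothetical helper that allows at most one poll per min_wait_ms."""

    def __init__(self, min_wait_ms, clock=time.monotonic):
        self._min_wait_s = min_wait_ms / 1000.0
        self._clock = clock
        self._last_poll = None

    def poll(self, fetch):
        # Skip the request if the previous one was made too recently.
        now = self._clock()
        if self._last_poll is not None and now - self._last_poll < self._min_wait_s:
            return None
        self._last_poll = now
        return fetch()


# Drive the poller with a fake clock so the example is deterministic.
fake_time = [0.0]
poller = ThrottledPoller(10, clock=lambda: fake_time[0])

first = poller.poll(lambda: "message")   # allowed: first poll
second = poller.poll(lambda: "message")  # skipped: no time has passed
fake_time[0] = 0.02                      # 20 ms later
third = poller.poll(lambda: "message")   # allowed again
```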

### Plug into the Web Service

Configure the Web Service to use the REST PubSub:

```yaml
bullet.pubsub.context.name: "QUERY_SUBMISSION"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"

bullet.pubsub.class.name: "com.yahoo.bullet.pubsub.rest.RESTPubSub"
bullet.pubsub.rest.connect.timeout.ms: 5000
bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100
bullet.pubsub.rest.result.subscriber.min.wait.ms: 10
bullet.pubsub.rest.query.subscriber.min.wait.ms: 10
bullet.pubsub.rest.result.url: "http://localhost:9901/api/bullet/pubsub/result"
bullet.pubsub.rest.query.urls:
- "http://localhost:9901/api/bullet/pubsub/query"
```

* __bullet.pubsub.context.name: "QUERY_SUBMISSION"__ - tells the PubSub that it is running in the Web Service
* __bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"__ - tells Bullet to use this class for it's PubSub
* __bullet.pubsub.rest.connect.timeout.ms: 5000__ - sets the HTTP connect timeout to a half second
* __bullet.pubsub.rest.subscriber.max.uncommitted.messages: 100__ - this is the maxiumum number of uncommitted messages allowed before blocking
* __bullet.pubsub.rest.query.subscriber.min.wait.ms: 10__ - this setting is used to avoid making an http request too rapidly and overloading the http endpoint. It will force the backend to poll the query endpoint at most once every 10ms.
* __bullet.pubsub.rest.result.url: "http://localhost:9901/api/bullet/pubsub/result"__ - this is the endpoint from which the WebService should read results - it should generally be the hostname of that machine the Web Service is running on (or "localhost").
* __bullet.pubsub.rest.query.urls__ - in the Web Service this setting should contain __exactly one__ url - the url to which queries should be written - it should generally be the hostname of that machine the Web Service is running on (or "localhost").

| Setting Name | Default Value | Meaning |
| ----------------------------------------------------------------- | ---------------------------------------------- | ---------------- |
| bullet.pubsub.context.name | QUERY_SUBMISSION | Tells the PubSub that it is running in the Web Service |
| bullet.pubsub.class.name | com.yahoo.bullet.pubsub.rest.RESTPubSub | Tells Bullet to use this class for its PubSub |
| bullet.pubsub.rest.connect.timeout.ms | 5000 | Sets the HTTP connect timeout to 5 s |
| bullet.pubsub.rest.subscriber.max.uncommitted.messages | 100 | This is the maximum number of uncommitted messages allowed to be read by the subscriber before blocking |
| bullet.pubsub.rest.result.subscriber.min.wait.ms                  | 10                                             | This is used to avoid making an HTTP request too rapidly and overloading the HTTP endpoint. It will force the Web Service to poll the result endpoint at most once every 10ms |
| bullet.pubsub.rest.result.url                                     | http://localhost:9901/api/bullet/pubsub/result | This is the endpoint from which the Web Service should read results. It should generally use the hostname of the machine the Web Service is running on (or ```localhost```) |
| bullet.pubsub.rest.query.urls                                     | http://localhost:9901/api/bullet/pubsub/query  | In the Web Service, this should contain *exactly one* URL (the URL to which queries should be written). It should generally use the hostname of the machine the Web Service is running on (or ```localhost```) |
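
The constraints in the table above can be checked before launching the Web Service. The sketch below is a hypothetical helper, not part of Bullet: `check_web_service_settings` and the plain dictionary of settings are assumptions made for illustration, and only the setting names and expected values come from the table.

```python
def check_web_service_settings(settings):
    """Return a list of problems found in a hypothetical settings dictionary."""
    problems = []
    if settings.get("bullet.pubsub.context.name") != "QUERY_SUBMISSION":
        problems.append("context name should be QUERY_SUBMISSION in the Web Service")
    # The Web Service writes queries to exactly one URL.
    query_urls = settings.get("bullet.pubsub.rest.query.urls", [])
    if len(query_urls) != 1:
        problems.append("query.urls should contain exactly one URL in the Web Service")
    if not settings.get("bullet.pubsub.rest.result.url"):
        problems.append("result.url is missing")
    return problems


good_settings = {
    "bullet.pubsub.context.name": "QUERY_SUBMISSION",
    "bullet.pubsub.rest.result.url": "http://localhost:9901/api/bullet/pubsub/result",
    "bullet.pubsub.rest.query.urls": ["http://localhost:9901/api/bullet/pubsub/query"],
}

# Same settings, but with a second query URL, which is only valid in the backend.
bad_settings = dict(good_settings)
bad_settings["bullet.pubsub.rest.query.urls"] = [
    "http://localhost:9901/api/bullet/pubsub/query",
    "http://localhost:9902/api/bullet/pubsub/query",
]

good_problems = check_web_service_settings(good_settings)
bad_problems = check_web_service_settings(bad_settings)
```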
5 changes: 2 additions & 3 deletions docs/ui/usage.md
@@ -147,7 +147,7 @@ This example gets the Top 3 most popular ```type``` values (there are only 6 but

### Approximate

By adding ```duration``` into the fields, the number of unique values for ```(type, duration)``` is increased. However, because ```duration``` has a tendency to have low values, we will have some *frequent items*. The counts are now estimated.
By adding ```duration``` into the fields, the number of unique values for ```(type, duration)``` is increased. However, because ```duration``` has a tendency to have low values, we will have some *frequent items*. The counts are now estimated.

<iframe width="900" height="508" src="https://www.youtube.com/embed/hCHWy229Yhw?autoplay=0&loop=0&playlist=hCHWy229Yhw" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>

@@ -183,7 +183,7 @@ In this example we compute bucket'ed frequency for the "gaussian" field. As the

If the regular chart option is insufficient for your result (for instance, you have too many groups and metrics or you want to post-aggregate your results or remove outliers etc), then there is an advanced Pivot mode available when you are in the Chart option.

The Pivot option provides a drag-and-drop interface to drag fields to break down and aggregate by their values. Operations such as finding standard deviations, variance, etc are available as well as easily viewing them as tables and charts.
The Pivot option provides a drag-and-drop interface to drag fields to break down and aggregate by their values. Operations such as finding standard deviations, variance, etc are available as well as easily viewing them as tables and charts.

The following example shows a ```Group``` query with multiple groups and metrics and some interactions with the Pivot table.

@@ -192,4 +192,3 @@ The following example shows a ```Group``` query with multiple groups and metrics
!!! note "Raw data does not have a regular chart mode option"

This is deliberate since the Chart option tries to infer your independent and dependent columns. When you fetch raw data, this is prone to errors so only the Pivot option is allowed.