BigchainDB Events API specification #862

Closed
2 tasks done
TimDaub opened this issue Nov 24, 2016 · 6 comments

@TimDaub
Contributor

TimDaub commented Nov 24, 2016

Definition

This project is deliberately called "Event APIs" because we didn't want to introduce a tool-driven bias (e.g. I'd consider "WebSocket API" a biased name). By Event APIs we mean interfaces between

  • a single node in BigchainDB; or
  • multiple nodes in BigchainDB (e.g. in a cluster)

communicating with an external entity (two-way communication is not a hard requirement). An external entity is one that is not in any way associated or connected with the entities running the node(s).

Examples of such interfaces that were already mentioned:

  • Communication primitives like WebSockets
  • Pub-Sub patterns like PubSubHubbub
  • Federated event distribution systems (e.g. Jabber?)
  • Intra-cluster communication protocols of the provisioned database (MongoDB, RethinkDB)
  • (Please extend this list if you have more examples)

Problem statement

For given actions that happen on a single node or on multiple nodes in a BigchainDB cluster, we'd like to PUSH information about that action, in the form of an event, to a client (assuming they've registered upfront, stating that they'd like to receive that information).

Collected user stories so far

This is a list of stories we've already collected from users, ordered by priority. If you know of users who are interested in a specific aspect, feel free to update this list without changing the prioritization. If you're a user reading this ticket, feel free to comment below and let us know what's important to you.

Confirmed stories by actual users of BigchainDB (high priority)

  1. As a BigchainDB/IPDB client I want to be notified of all transactions currently being persisted that follow a "specific schema"[1], so that I can process them centrally in my backend while my users submit them in a decentralized fashion without my involvement.
  2. As a BigchainDB/IPDB client I want to be notified when, and why (with a reason), transactions I've sent get discarded by one or more nodes, so that I can construct them validly next time.
  3. As a BigchainDB/IPDB client I want to be notified when transactions for a specific asset occur so that I can monitor the asset throughout its life.

Suggested stories by us (not confirmed by actual users YET; not mandatory for the completion of this project; lower priority)

  1. As a BigchainDB/IPDB client I want to be notified when transactions occur that include a specific challenge to be resolved[3].
  2. As a BigchainDB/IPDB client I want to be notified when transactions I've sent have arrived in a node's backlog.
  3. As a BigchainDB/IPDB client I want to be notified when transactions I've sent get included in a block.
    1. As a BigchainDB/IPDB client I want to be notified when a transaction I've sent gets included in a valid[2] block.
  4. As a BigchainDB/IPDB client I want to be notified when a new block gets into the bigchain table.
  5. As a BigchainDB/IPDB client I want to be notified when a block is considered valid by a majority of the votes.
  6. As a BigchainDB/IPDB client I want to be notified when a node casts a vote on a block.
  7. (there is and will be more here. Feel free to add...)

Deliverables

  • A list of tasks for the short term (for the implementation part of this project)
  • A proposal in the form of a specification and/or presentation outlining where we should be going with this in the next 1-3 and 3-6 months.

See #1086.

Ideas from the kick-off meeting

  • Explore available communication primitives (e.g. WebSockets)
  • Explore different pub-sub architectures (e.g. PubSubHubbub)
  • Explore federated event APIs (e.g. Jabber)
  • Explore how this project relates to the HTTP API (implementation currently in progress) and which stories can be extracted from the driver
  • Explore "MongoDB-follower"-idea and how it could (or if it should) be implemented in the future
  • (there is more, please add)

Glossary

  1. "Specific schema": The solution space is open here; a few examples that immediately come to mind:

The 'hashtag' approach:

  • Tag a transaction with a string
  • Allow a client to retrieve all transactions being written into "bigchain" that match that specific string

The asset type approach:

  • For a certain asset type (matched by the asset id?), all future transactions can be pushed to a client.

and so on.

  2. "Valid block", meaning that, as a user, I only want to be notified when > 50% of the votes are in?

  3. "Specific challenge": Mainly thinking about specific crypto-conditions occurring on the chain (e.g. monitor all hashlock conditions).

@diminator
Contributor

As a BigchainDB client user, I want to be notified of the latest transaction of a specific asset; typically this asset would be a stream of data (from devices/sensors).

@TimDaub TimDaub changed the title [Collecting use cases] BigchainDB Websocket API [Collecting use cases] BigchainDB Events API Jan 9, 2017
@sohkai
Contributor

sohkai commented Jan 10, 2017

Some notes:

As a BigchainDB/IPDB client I want to be notified for all transactions currently being persisted that follow a "specific schema"

As I think you noted, this is really about querying transaction schemas and may be difficult to address without more thought on that matter. I'm not sure what you mean by "asset type" (policies?), but the "hashtag" approach is a good way of bypassing any custom document indexing: users could add an array of strings to tag a transaction and later query for transactions by those tags (which could probably be indexed as well?).
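To make the "hashtag" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `tags` list on each transaction (e.g. carried in its metadata); no such field exists in BigchainDB today. It builds a tag-to-transaction-id index for lookups and shows the check a subscription would run against each newly persisted transaction.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Transaction:
    # Hypothetical, heavily simplified transaction; "tags" is the assumed
    # array of strings a user attaches to tag the transaction.
    id: str
    tags: list[str] = field(default_factory=list)


def build_tag_index(txs: Iterable[Transaction]) -> dict[str, set[str]]:
    """Map each tag to the set of ids of transactions carrying it."""
    index: dict[str, set[str]] = {}
    for tx in txs:
        for tag in tx.tags:
            index.setdefault(tag, set()).add(tx.id)
    return index


def matches_any_tag(tx: Transaction, wanted: set[str]) -> bool:
    """Subscription check: does the transaction carry any wanted tag?"""
    return not wanted.isdisjoint(tx.tags)
```

A subscriber interested in `{"sensor"}` would only be pushed transactions for which `matches_any_tag` returns `True`, while the index serves after-the-fact queries.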

Explore how this project relates to the HTTP API

Some discussion with @libscott produced a few ideas (still to be fleshed out) on how we could reuse the work on making the transaction list API performant for this Events API.

@krish7919
Contributor

I'd divide the problem into two subsets (as discussed with Tim yesterday):

  1. Notification of events at the cluster level:
  • These are events that are synced across the cluster (e.g. an asset being created). One way to approach this is a distributed, centralized pub-sub mechanism: a module listens to the change feed/oplog of the database and publishes this feed to subscribed users.
  • This process maintains a list of filters describing the events each user is interested in.
  • The prerequisite is that users subscribe to the events they are interested in.
  • This addresses the centralization concern: even though the subscriber/publisher modules are centralized, the source of the events (IPDB/BigchainDB) is decentralized, so we can consider the events to be decentralized too.
  2. Notification of events at the node level:
  • This is a more complicated situation where we want every event occurring on a node, at both the cluster level and the node level.
  • This can probably be solved by having the BigchainDB processes talk to each other and share their individual feeds. The cluster-level feeds are already synced via the change feed.
  • This would involve each node receiving the events of every other node, which can then be filtered and presented to the subscribed user. The pub/sub system is still centralized, as proposed in the option above.
  • Issues to consider along this path include building a BigchainDB cluster protocol (say, using FlatBuffers and/or gRPC) that sits above the database protocol, security, efficient filtering, etc.
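A rough Python sketch of the cluster-level option, under loudly stated assumptions: the change-feed record shape (`{"table": ..., "new_val": ...}`), the event dict, and the `Publisher` class are all hypothetical and not part of BigchainDB. A single module consumes the feed, turns each new block written to the `bigchain` table into per-transaction events, and hands each event to every subscriber whose filter matches.

```python
from typing import Any, Callable, Iterable

Event = dict[str, Any]
Filter = Callable[[Event], bool]
Callback = Callable[[Event], None]


class Publisher:
    """Hypothetical cluster-level publisher driven by a database change feed."""

    def __init__(self) -> None:
        # Users subscribe up front; each subscription is a (filter, callback) pair.
        self._subs: list[tuple[Filter, Callback]] = []

    def subscribe(self, flt: Filter, callback: Callback) -> None:
        self._subs.append((flt, callback))

    def handle_change(self, change: dict[str, Any]) -> None:
        # Assumed record shape: {"table": name, "new_val": document or None}.
        block = change.get("new_val")
        if change.get("table") != "bigchain" or block is None:
            return  # only newly written blocks produce events in this sketch
        for tx in block.get("transactions", []):
            event = {"type": "tx_in_block", "block_id": block["id"], "tx": tx}
            for flt, callback in self._subs:
                if flt(event):
                    callback(event)

    def run(self, feed: Iterable[dict[str, Any]]) -> None:
        # In practice `feed` would be a blocking iterator over the change feed.
        for change in feed:
            self.handle_change(change)
```

Keeping the publisher separate from the nodes is what makes it "centralized"; the events it relays still originate from the decentralized federation.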

@TimDaub TimDaub changed the title [Collecting use cases] BigchainDB Events API BigchainDB Events API Jan 11, 2017
@sohkai
Contributor

sohkai commented Jan 23, 2017

(For posterity)

In discussion with @TimDaub and @vrde, we've narrowed down our main use case and a few desired properties for the Event API.

Main use case

Server to (client) server data sync

The Events API is designed to sync (filtered) data from a BigchainDB federation to an end-client's servers for processing, lookup, or association (with their own data).

Desired properties

  • Bidirectional communication: Optional; a client of the Events API can use the REST API to send data to a BigchainDB federation. Bidirectional communication could, for example, be used for declaring subscription interests or for in-channel RPC.
  • Traffic: Lots; a BigchainDB federation may eventually process hundreds of thousands of valid transactions per second, so we need something performant and lightweight.
  • Reliability: As reliable as possible; as clients may be dependent on the Events API for processing, and in general for data sync applications, reliability is a key feature. However, event ordering is optional depending on the type of event (e.g. dependent valid transactions' order can be reconstructed). Missed event retrieval is optional, although highly desirable.
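On reconstructing order: if each event says which earlier transactions it spends, a client can restore a valid order with a topological sort. Below is a sketch using Python's standard `graphlib` module (Python 3.9+). The `spends` field is a hypothetical simplification of a transaction's inputs, not the real transaction format.

```python
from graphlib import TopologicalSorter


def reorder(events: list[dict]) -> list[dict]:
    """Order transaction events so each one follows the transactions it spends.

    Assumes every event looks like {"id": ..., "spends": [ids of earlier txs]}.
    """
    by_id = {e["id"]: e for e in events}
    sorter = TopologicalSorter()
    for e in events:
        # Only dependencies present in this batch constrain the order;
        # older transactions are assumed to have been processed already.
        deps = [d for d in e.get("spends", []) if d in by_id]
        sorter.add(e["id"], *deps)
    # static_order() raises graphlib.CycleError if the inputs form a cycle.
    return [by_id[tx_id] for tx_id in sorter.static_order()]
```

This is why ordering can be optional for dependent valid transactions: the dependency graph itself carries the order.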

Another aspect to consider is the scope of an implementation. In the near term, the amount of custom client-side software should be limited. If we can piggyback on existing primitives, architectures, or protocols, we should do so until that starts to hurt us (e.g. at scale).

A further requirement is the ability to rate-limit and monitor the Events API, given that IPDB (and likely other federations) will need to bill usage. An implementation would ideally play well with existing solutions, like 3scale.
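Rate limiting would most likely be delegated to an existing solution such as 3scale. Purely to illustrate the underlying technique, here is a generic token bucket in Python; it is not 3scale's API. The `delivered` counter is a hypothetical hook for per-client monitoring or billing, and the injectable clock exists only to make the sketch testable.

```python
import time
from typing import Callable


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()
        self.delivered = 0  # hypothetical usage counter for billing/monitoring

    def allow(self, cost: float = 1.0) -> bool:
        """Return True (and consume tokens) if an event may be pushed now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            self.delivered += 1
            return True
        return False
```

One bucket per API key would let a federation cap bursts per client while the counter feeds usage reports.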


Design choices

Transport

What type of communication primitive?

  • Webhooks
  • HTTP long polling
  • Server sent events
  • WebSockets
  • Custom protocols
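To give a feel for how lightweight one of these options is, here is a Python sketch of the Server-Sent Events wire format. The framing (`id:`, `event:`, `data:` fields and a blank-line terminator) follows the SSE `text/event-stream` format; the event names and the `encode_event` helper are hypothetical.

```python
import json
from typing import Any, Optional


def sse_frame(data: str, event: Optional[str] = None,
              event_id: Optional[str] = None) -> str:
    """Encode one message in the Server-Sent Events (text/event-stream) format."""
    lines = []
    if event_id is not None:
        # EventSource clients send the last seen id back in a
        # Last-Event-ID header when they reconnect.
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


def encode_event(payload: dict[str, Any], event: str, event_id: str) -> str:
    # Hypothetical helper: JSON-encode a BigchainDB event for an SSE stream.
    return sse_frame(json.dumps(payload), event=event, event_id=event_id)
```

SSE is one-way (server to client) over plain HTTP, which fits the "bidirectional communication is optional" property; WebSockets would be the choice if in-channel subscription requests are needed.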

Subscription interest filtering

To what detail can a client subscribe for events? What types of filtering rules are available on the total set of data (i.e. all transactions, blocks, and votes processing)?

  • No filtering
  • Based on event type (e.g. valid transactions, new vote)
  • Advanced filtering (e.g. asset schema, metadata schema, specific condition details)
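The three filtering levels can be expressed as predicates over an event. A Python sketch, assuming events are plain dicts with a `type` key; type names such as `valid_tx` and `new_vote`, and the nested field path in the example, are hypothetical.

```python
from typing import Any, Callable

Event = dict[str, Any]
Predicate = Callable[[Event], bool]


def no_filter() -> Predicate:
    # Every event goes to every subscriber.
    return lambda event: True


def by_type(*types: str) -> Predicate:
    # e.g. by_type("valid_tx", "new_vote")
    wanted = frozenset(types)
    return lambda event: event.get("type") in wanted


def by_field(path: tuple[str, ...], expected: Any) -> Predicate:
    """'Advanced' filter: match one nested field, e.g. an asset data attribute."""
    def predicate(event: Event) -> bool:
        node: Any = event
        for key in path:
            if not isinstance(node, dict) or key not in node:
                return False  # missing field means no match
            node = node[key]
        return node == expected
    return predicate
```

Real advanced filtering (schemas, condition details) would need a richer query language; the point here is only that each level narrows what a subscriber receives.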

Subscription management

How does the server manage client subscriptions? How would it work within a federated model?

  • No management (requires no filtering, above)
  • Per-connection, per-federation-node: each node manages its own connections, and each connection would be stateful
  • Per-connection, per-federation: the entire federation would manage all of its connections, likely requiring a reverse proxy (that could deal with decentralized result sets)
  • Via external intermediary / proxy not involved in federation voting

Event payload

What does the payload for events look like?

  • Full-sized (e.g. a full transaction)
  • Limited (e.g. only a transaction id)
  • Hint (requires the ability to retrieve events; e.g. tell the client there are new valid transactions)
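A sketch of what the three payload options might look like for a transaction event, in Python; all field names are hypothetical. The trade-off is bandwidth versus round trips: a hint is tiny but forces the client to fetch the data afterwards.

```python
from enum import Enum
from typing import Any


class PayloadMode(Enum):
    FULL = "full"        # the whole transaction
    LIMITED = "limited"  # only the transaction id
    HINT = "hint"        # only "something new happened"


def make_payload(tx: dict[str, Any], mode: PayloadMode,
                 event_type: str) -> dict[str, Any]:
    if mode is PayloadMode.FULL:
        return {"type": event_type, "transaction": tx}
    if mode is PayloadMode.LIMITED:
        # Client can look the transaction up by id (e.g. via the HTTP API).
        return {"type": event_type, "transaction_id": tx["id"]}
    # A hint carries no data, so the client must be able to retrieve events.
    return {"type": event_type}
```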

Event processing

How does the federation process events and broadcast them to interested parties?

  • No processing (requires no filtering, above)
  • Based on event type (see filtering)
  • Per-connection (see subscription management): each event that occurs would be checked against all subscribers to determine if that subscriber is interested
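Checking every event against every subscriber scales with the number of connections. One common mitigation, combining the two options above, is to index subscriptions by event type first and only run per-connection predicates on that subset. A Python sketch; the `Dispatcher` class, connection ids, and event shape are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable

Event = dict[str, Any]
Predicate = Callable[[Event], bool]


class Dispatcher:
    """Index subscriptions by event type, then apply per-connection predicates."""

    def __init__(self) -> None:
        self._by_type: dict[str, list[tuple[str, Predicate]]] = defaultdict(list)

    def subscribe(self, conn_id: str, event_type: str,
                  predicate: Predicate = lambda e: True) -> None:
        self._by_type[event_type].append((conn_id, predicate))

    def recipients(self, event: Event) -> list[str]:
        # Only connections subscribed to this event type are examined;
        # .get() avoids creating empty entries for unknown types.
        candidates = self._by_type.get(event.get("type"), [])
        return [conn for conn, pred in candidates if pred(event)]
```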

Event retrieval

How could a client retrieve missed events?

  • None: forces clients to query all of BigchainDB for missed events
  • Built into BigchainDB: an internal mechanism could be added to cache events (e.g. a table or separate database)
  • Via external intermediary that caches events (e.g. via a read-only replica set connected to the BigchainDB instance)
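For the built-in option, here is a minimal Python sketch of a bounded in-memory event log with sequence numbers. Everything in it is hypothetical (a real implementation might use a table or separate database, as suggested above); it shows how a client could resume from the last sequence number it saw, and how a gap is detected once old events have been evicted.

```python
from collections import deque
from typing import Any, Optional


class EventLog:
    """Bounded log that lets reconnecting clients replay missed events."""

    def __init__(self, max_events: int) -> None:
        # Oldest entries are dropped automatically once max_events is reached.
        self._events: deque[tuple[int, Any]] = deque(maxlen=max_events)
        self._next_seq = 1

    def append(self, event: Any) -> int:
        seq = self._next_seq
        self._events.append((seq, event))
        self._next_seq += 1
        return seq

    def since(self, last_seen: int) -> Optional[list[tuple[int, Any]]]:
        """Events after `last_seen`, or None if some were already evicted."""
        oldest = self._events[0][0] if self._events else self._next_seq
        if last_seen + 1 < oldest:
            # Gap: the client must fall back to querying BigchainDB directly.
            return None
        return [(s, e) for s, e in self._events if s > last_seen]
```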

Future topics, not in scope for now:

  • Decentralized storage of subscriptions (e.g. as assets / tokens)
  • Advanced filtering for clients

@krish7919
Contributor

@sohkai Have we decided what we are going ahead with for now?

@TimDaub TimDaub changed the title BigchainDB Events API BigchainDB Events API specification Jan 24, 2017
This was referenced Feb 16, 2017
@TimDaub
Contributor Author

TimDaub commented Feb 17, 2017

Done in #1086.


5 participants