Document how a user can create their own feed to create triggers from an external service. #559

Closed
csantanapr opened this Issue Jun 3, 2016 · 21 comments

Contributor

csantanapr commented Jun 3, 2016

cc @sjfink @rabbah

Outline the steps to create a feed and then create triggers from the feed.

The feed will be a running service that the user is responsible for. We will use a nodejs server app as an example of how to achieve this, but any programming language can be used.

sjfink Jun 3, 2016

Contributor

some initial notes coming -- very rough descriptions. I will break this over several comments

1. Feed Architecture Choices

There are at least 3 strategies for creating a feed: "Hooks", "Polling" and "Connections".

Hooks

"Hooks" means we set up a feed using a webhook facility from another service. In this strategy, we configure a webhook on a service to POST directly to a whisk URL to fire a trigger. This is by far the easiest and most attractive option for implementing low-frequency feeds.

  • The github feed is implemented using webhooks.

Polling

"Polling" means that we arrange a whisk action to poll an endpoint periodically to fetch new data.
This is easy to build, but is low performance, and is limited by the polling interval.

  • I have a prototype of a MessageHub feed using polling.

Connections

"Connections" means that we stand up a separate service somewhere that maintains a persistent connection to a feed source. The connection-based implementation might interact with a service endpoint via long polling, or set up a push notification.

  • Our cloudant changes feed is connection based. We are working on a high-performance MessageHub connection-based feed using Kafka consumers.

sjfink Jun 3, 2016

Contributor

2. Difference between Feed and Trigger

Some definitions:

  • Whisk processes events which flow into the system.
  • A trigger is simply a name for a class of events. Each event belongs to exactly one trigger. A good analogy is a "topic" in the pub-sub world. A Rule T -> A means "whenever an event from trigger T arrives, invoke action A with the trigger payload."
  • A feed is a stream of events which all belong to some trigger T. A feed is controlled by a feed action which handles creating, deleting, pausing, and resuming the stream of events which comprise a feed. The feed action typically interacts with external services which produce the events, via a REST API that manages notifications.

sjfink Jun 3, 2016

Contributor

3. Implementing Feed Actions

The feed action is a normal OpenWhisk action, but it should accept the following parameters:

  • lifecycleEvent: one of 'CREATE', 'DELETE', 'PAUSE', or 'UNPAUSE'
  • triggerName: the fully-qualified name of the trigger which contains events produced from this feed.
  • authKey: the Basic auth credentials of the OpenWhisk user who owns the trigger just mentioned

The feed action can also accept any other parameters it needs to manage the feed. For example, the cloudant changes feed action expects to receive parameters including 'dbName', 'username', etc.

When the user creates a trigger with the --feed parameter, the system automatically invokes the feed action with the appropriate parameters.

For example, assume the user has created a mycloudant binding for cloudant with their username and password as bound parameters. Then when the user issues:

wsk trigger create T --feed mycloudant/changes -p dbName myTable

then under the covers the system will do something equivalent to:
wsk action invoke mycloudant/changes -p lifecycleEvent CREATE -p triggerName T -p authKey <userAuthKey> -p password <password value from mycloudant binding> -p username <username value from mycloudant binding> -p dbName myTable

The feed action takes these parameters, and is expected to take whatever action is necessary to set up a stream of events from cloudant, with the appropriate configuration, directed to the trigger T. For cloudant, the action happens to talk directly to a cloudanttrigger service we've implemented with a connection-based architecture. We'll discuss the other architectures below.

A similar feed action protocol occurs for wsk trigger delete. We have not yet implemented pause and unpause, but they will be similar.
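
To make the protocol above a bit more concrete, here is a minimal sketch of what a Node.js feed action could look like. It only dispatches on lifecycleEvent; the helpers createFeed and deleteFeed and their return shapes are hypothetical placeholders, not code from any existing feed. A real feed action would call the external service (or a provider service) at those points.

```javascript
// Sketch of a feed action. Only the parameter names lifecycleEvent,
// triggerName and authKey come from the protocol described above;
// everything else here is an illustrative assumption.
function main(params) {
  const { lifecycleEvent, triggerName, authKey } = params;
  if (!triggerName || !authKey) {
    return { error: 'triggerName and authKey are required' };
  }
  switch (lifecycleEvent) {
    case 'CREATE':
      return createFeed(params);
    case 'DELETE':
      return deleteFeed(params);
    case 'PAUSE':
    case 'UNPAUSE':
      // Pause and unpause are not implemented yet, so report that explicitly.
      return { error: lifecycleEvent + ' is not implemented yet' };
    default:
      return { error: 'unknown lifecycleEvent: ' + lifecycleEvent };
  }
}

// Hypothetical placeholder: a real feed would ask the event source
// to start sending events for params.triggerName here.
function createFeed(params) {
  return { status: 'created', trigger: params.triggerName, dbName: params.dbName };
}

// Hypothetical placeholder: a real feed would tear the event stream down here.
function deleteFeed(params) {
  return { status: 'deleted', trigger: params.triggerName };
}
```

Any extra parameters (such as dbName in the cloudant example) simply arrive in params alongside the three standard ones.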

mbehrendt Jun 3, 2016

i think it'll be very important to include a description of how event data is going to flow into whisk via triggers. as far as i see, we often need a long-running process (lrp) between whisk and the event source, where this lrp receives the event data (as a result of the feed-based trigger creation), and then translates the data into wsk trigger fire calls.

sjfink Jun 3, 2016

Contributor

(Moving to briefer text now to get the initial outline in before discussion starts)

4. Implementing Feeds with hooks.

Setting up a feed via a hook is by far the easiest way to start, and should be the recommended way to encourage users to create feeds (according to SJF. System MessageHub devotees often argue differently -- but SJF does not believe Kafka consumers can ever be as easy as webhooks, and SJF believes we should stress simple UX above all else).

With this method there is no need to stand up any persistent service outside of whisk. All feed management happens naturally through stateless whisk actions.

When invoked with CREATE, the feed action simply installs a webhook for some other service, asking the remote service to POST notifications to the appropriate fireTrigger URL in whisk.

The webhook should be directed to send notifications to a URL such as:
POST /namespaces/{namespace}/triggers/{triggerName}
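
As a rough sketch of what the remote service ends up sending, the helper below builds (but does not send) the fire-trigger request. Assumptions that are not from the notes above: the function name buildFireTriggerRequest, the example host, the '/api/v1' path prefix, and an authKey of the form 'user:password' sent as a Basic auth header with a JSON body.

```javascript
// Build the HTTP request a webhook would issue to fire a trigger.
// ASSUMPTIONS: apiHost and the '/api/v1' prefix are placeholders for your
// deployment; authKey is 'user:password' and is sent via Basic auth.
function buildFireTriggerRequest(apiHost, namespace, triggerName, authKey, payload) {
  const path = '/api/v1/namespaces/' + encodeURIComponent(namespace) +
               '/triggers/' + encodeURIComponent(triggerName);
  return {
    method: 'POST',
    url: 'https://' + apiHost + path,
    headers: {
      'Content-Type': 'application/json',
      'Authorization': 'Basic ' + Buffer.from(authKey).toString('base64')
    },
    // The event data becomes the trigger payload.
    body: JSON.stringify(payload)
  };
}
```

The feed action would hand the resulting URL and credentials to the remote service when it installs the webhook on CREATE. This only works if that service lets you configure the target URL, headers, and a JSON body.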

sjfink Jun 3, 2016

Contributor

5. Implementing Feeds with polling.

It is possible to set up an action to poll a feed source entirely within whisk, without the need to stand up any persistent connections or external service.

For feeds where a webhook is not available, and which do not need high volume or very quick response times, polling is an attractive option and should be recommended.

To set up a polling-based feed, the feed action takes the following steps when called for CREATE:

  1. The feed action sets up a periodic trigger (T) with the desired frequency, using the whisk.system/alarms feed. (meta!)
  2. The feed developer creates a `pollMyService` action which simply polls the remote service and returns any new events.
  3. The feed action sets up a rule T -> pollMyService.

That's it. We've implemented a polling-based trigger entirely using whisk actions, without any need for a separate service.
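
Here is one possible shape for the polling action from step 2, as a hedged sketch. The fetchEvents argument is a hypothetical stand-in for the call to the remote service (injected so the sketch runs without a network), and the since cursor and timestamp field are assumptions about what the remote service exposes. Because actions are stateless, the cursor has to be passed in or stored somewhere outside the action; where to keep it is left open here.

```javascript
// Sketch of a pollMyService action body (all names besides pollMyService
// are assumptions). It returns only events newer than the `since` cursor,
// plus the updated cursor for the next poll.
function pollMyService(params, fetchEvents) {
  const since = params.since || 0;
  // Keep only events that arrived after the last poll.
  const fresh = fetchEvents().filter(e => e.timestamp > since);
  // Advance the cursor to the newest event seen, or keep it unchanged.
  const newest = fresh.reduce((max, e) => Math.max(max, e.timestamp), since);
  return { events: fresh, since: newest };
}

// Usage with a fake remote service:
const fakeService = () => [
  { id: 'a', timestamp: 5 },
  { id: 'b', timestamp: 12 }
];
console.log(pollMyService({ since: 10 }, fakeService));
```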

mbehrendt Jun 3, 2016

With this method there is no need to stand up any persistent service outside of whisk.

i agree we should offer the ability to integrate as easily as possible with webhooks.

however, today that requires webhooks to send their payload with an application/json msg type.

several webhooks also use other msg types, so we need to document that constraint and open an issue to allow more msg types. also, we need to document the webhook based approach, as you did above (need to add auth information)

mbehrendt Jun 3, 2016

re the polling approach -- i think we'll have to work through in which context the action doing the polling is being executed. if it's done in the context of the user, they'll get charged for the resource consumption, which will come across as weird, since the user shouldn't have to know that the impl of a feed is using actions and shouldn't have to pay for it.

sjfink Jun 3, 2016

Contributor

6. Implementing Feeds via Connections

The previous 2 methods are easy .. but if you want a high-performance feed, there is no substitute for persistent connections and long-polling or similar techniques.

Since OpenWhisk actions are stateless, right now there is no way to keep a persistent connection open to a third party. So instead, we are forced to stand up a separate service (outside of whisk) that runs all the time. We call these provider services. A provider service can maintain connections to third party event sources that support long polling or other connection-based notifications.

The provider service should provide a REST API that allows the whisk feed action to control the feed. The provider service acts as a proxy between the event provider and whisk -- when it receives events from the third party, it sends them on to whisk by firing a trigger.

The cloudant built-in feed is the canonical example -- it stands up a cloudanttrigger service which mediates between cloudant notifications over a persistent connection, and whisk triggers.

The alarm feed is implemented with a similar pattern.

We will soon provide a MessageHub feed with a similar pattern.

For the moment, these provider services must POST events to whisk in order to fire triggers. Eventually, we will hook up MessageHub events directly into whisk to avoid the POST overhead.

The connection-based architecture is the highest performance option -- but it's far more difficult to operate and maintain than the polling and hook architectures. The provider service must be production-quality: it must be highly-available and fault-tolerant. Our current providers do not meet this requirement, and will need work to reach production quality standards.
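
For illustration only, the bookkeeping at the core of a provider service could look like the in-memory sketch below. The class FeedProvider, its method names, and the injected fireTrigger callback are all hypothetical; this is not how the cloudant or alarm providers are written. A real provider would put a REST API in front of create/remove, hold the persistent connections itself, and persist its state so it can survive restarts.

```javascript
// Minimal in-memory sketch of a provider service's bookkeeping.
// All names are hypothetical; fireTrigger stands in for the HTTP POST
// that fires a trigger in whisk.
class FeedProvider {
  constructor(fireTrigger) {
    this.fireTrigger = fireTrigger; // function(triggerName, authKey, payload)
    this.feeds = new Map();         // triggerName -> { authKey, source }
  }

  // Called by the feed action on CREATE (for example via the provider's REST API).
  create(triggerName, authKey, source) {
    this.feeds.set(triggerName, { authKey, source });
  }

  // Called by the feed action on DELETE; returns true if the feed existed.
  remove(triggerName) {
    return this.feeds.delete(triggerName);
  }

  // Called when the persistent connection delivers an event from `source`.
  // Fires every trigger registered for that source and returns the count.
  onEvent(source, payload) {
    let fired = 0;
    for (const [triggerName, feed] of this.feeds) {
      if (feed.source === source) {
        this.fireTrigger(triggerName, feed.authKey, payload);
        fired += 1;
      }
    }
    return fired;
  }
}
```

Keeping the registry in memory is the simplest choice for a sketch, but it is exactly the part that has to become highly available and fault-tolerant in a production provider.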

sjfink Jun 3, 2016

Contributor

7. Finished

Ok I'm done with the initial draft. @rabbah feel free to edit my comments directly if you like to clarify or improve.

@mbehrendt is correct that we will need to support other payload formats (not just application/json) to expand the set of webhooks we can play with.

mbehrendt Jun 3, 2016

thanks @sjfink -- excellent summary of all key information, thanks for putting this together so quickly.

csantanapr Jun 3, 2016

Contributor

Thanks @sjfink you are a fast typer 😄
@mbehrendt @sjfink I was not sure whether new issues should be created from the above comments about payload formats.
Should I create a new issue, so you guys can expand on what work is needed?

mbehrendt Jun 3, 2016

@csantanapr i think it would be good to have a thread about payload format / scheme, would be great if you could open that up.

csantanapr Jun 3, 2016

Contributor

@mbehrendt @sjfink issue #567 created for payload formats

jthomas Jun 15, 2016

Member

Since I've just been through the process of creating my own feed provider, here's my feedback...

  • Good section about the architectures, I hadn't thought about using the alarm package to handle polling. I spent most of my time digging through the existing catalogue feeds to understand how they work. Would be useful to include code samples with each of the architecture types to help users see how this works in reality. Either that or link directly to one of the catalogue samples that implements that pattern.
  • This section should include the commands needed to register a feed, I had to find this from the setup scripts.
wsk action create -a feed true feed_name feed_action.js

csantanapr Jun 15, 2016

Contributor

@jthomas good suggestion

Would be useful to include code samples with each of the architecture types to help users see how this works in reality. Either that or link directly to one of the catalogue samples that implements that pattern.

We have issues open to add to the catalog the cloudant/couchdb and alarm packages and we could use those as examples in the docs.

skaegi Oct 4, 2016

Contributor

That last bit from @jthomas about annotating an action really needs to be better documented and perhaps given first class syntax.

-a feed true -- adds a feed annotation to the action and is required to register it as a feed provider

aarora91 Jan 5, 2017

I want to explore the third approach for implementing feeds: via connections. I have a Slack outgoing webhook which should hit an OW action's REST endpoint. Slack doesn't allow me to customize its request header, so I cannot pass in my OpenWhisk creds via Basic Auth, and requests to OW get rejected. I might be misunderstanding this but I think a "connection" intermediate will help.

Your explanation is nice, but it would be great if you could provide some code samples or snippets at the very least, as I don't know where to start.

Thanks in advance!

jthomas Jan 6, 2017

Member

@aarora91 In theory you could use the API Gateway feature in OpenWhisk to configure a public endpoint which resolves this issue. Unfortunately at the moment the content format isn't supported by this service, see #1655

aarora91 Jan 6, 2017

Thanks @jthomas for the quick response. I see you too ran into the same problem when using Slack's outgoing webhooks. Hope we get this feature in OpenWhisk.
Meanwhile, I did some digging and it looks like API Connect can be used for this: https://www.youtube.com/watch?v=WP6D47KxSrs
The Bluemix GUI has changed since the video was made so I had some trouble navigating around the page.

Contributor

markusthoemmes commented Mar 10, 2017