AWS S3 Kamelet (source) - fetch only meta-data #372
One possibility is to avoid fetching the body of the S3 object through the includeBody option: "If it is true, the S3Object exchange will be consumed, put into the body, and closed. If false, the S3Object stream will be put raw into the body and the headers will be set with the S3 object metadata. This option is strongly related to the autocloseBody option: when includeBody is true, the S3Object stream will be consumed and then also closed; when includeBody is false, it will be up to the caller to close the S3Object stream. However, setting autocloseBody to true when includeBody is false will schedule closing the S3Object stream automatically on exchange completion." It is true by default, so we can provide it as an option and you can set it to false.
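A minimal KameletBinding sketch of that idea, assuming the source exposes the option under the name includeBody; the bucket name, credentials placeholders, and log-sink are illustrative, not taken from this thread:

```yaml
apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: s3-metadata-only
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-s3-source
    properties:
      bucketNameOrArn: my-bucket      # hypothetical bucket
      accessKey: "{{aws.accessKey}}"
      secretKey: "{{aws.secretKey}}"
      region: eu-west-1
      includeBody: false              # keep the object payload out of the exchange body
  sink:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: log-sink
```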
That sounds legit. And also maybe copy over the metadata into a JSON payload, maybe including an access URL (or at least the AWS-specific coordinates), so that a consumer could pick up the file on their own if needed. And tbh, I think that would be a good and safe default mode, too (imagine people using the InMemoryChannel with a 50GB image tar ;-). As I don't think the CE can contain a stream (but @slinkydeveloper, I might be wrong), closing the stream immediately might be useful, too, then.
Well, not all users will have to use Knative. You can bind the source to an FTP server, or to Dropbox. So it shouldn't be the default mode, it should be configurable. Also, we have a producer operation in the S3 component to generate a remote URL to download the S3 object. To transform the payload into JSON there is an action for this purpose, which you may use as a step between source and sink. By the way, I will provide an example if I have the time.
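A sketch of such a binding with an intermediate transform step; the action name json-serialize-action, the includeBody parameter name, and the sink are assumptions on my side, not confirmed here:

```yaml
apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: s3-to-json
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-s3-source
    properties:
      bucketNameOrArn: my-bucket     # hypothetical bucket
      includeBody: false
  steps:
    - ref:
        kind: Kamelet
        apiVersion: camel.apache.org/v1alpha1
        name: json-serialize-action  # assumed name of the JSON transform action
  sink:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: log-sink
```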
You may also think about leveraging EventBridge. You can target a particular bucket and push its events somewhere (SQS, SNS, or to a Lambda), and then consume those events. So in theory you could chain S3 → EventBridge → SQS and consume from there.
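The last hop of that chain could be a plain aws-sqs-source binding; the queue name, credentials placeholders, and sink below are purely illustrative:

```yaml
apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: s3-events-via-sqs
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-sqs-source
    properties:
      queueNameOrArn: s3-events      # queue targeted by the EventBridge rule (hypothetical)
      accessKey: "{{aws.accessKey}}"
      secretKey: "{{aws.secretKey}}"
      region: eu-west-1
  sink:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: log-sink
```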
I'll add the includeBody option by the way.
I added the includeBody. I'm going to add an example too and I'll link it here. |
There is still some work to do for this, because setting includeBody to false will still add the body, just as a raw stream.
If I understand correctly, the S3 Kamelet as a source always fetches the whole data, puts it as binary into the cloud event, and deletes it (by default). So it's mostly suited for kind of an 'inbox' pattern.
But what about leveraging the source for post-processing of certain (maybe not all) files, triggered by certain S3 lifecycle events (create/update/list/delete)? Also, files can get huge; transmitting them all the time over the network might not be efficient.
What I really would love to see is a source that generates CloudEvents only with metadata around the S3 action, maybe including a URL to retrieve the data, but without automatically adding the data to the event. Also, it really should only 'monitor' the bucket, not modify it (e.g. not delete objects). I guess this would really be possible if one hooked into the AWS event system instead of polling the bucket itself (but I'm not an AWS expert here).
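Such a metadata-only event might look roughly like this; every field name, the type string, and the download URL are purely illustrative sketches of the idea, not an existing format:

```json
{
  "specversion": "1.0",
  "type": "aws.s3.object.created",
  "source": "aws:s3:eu-west-1:my-bucket",
  "id": "example-id",
  "datacontenttype": "application/json",
  "data": {
    "bucket": "my-bucket",
    "key": "images/huge-image.tar",
    "size": 53687091200,
    "etag": "example-etag",
    "downloadUrl": "https://my-bucket.s3.eu-west-1.amazonaws.com/images/huge-image.tar"
  }
}
```

The downloadUrl could, for instance, be a presigned link produced by the S3 component's remote-URL producer operation mentioned earlier in the thread.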
wdyt?