AWS S3 Kamelet (source) - fetch only meta-data #372

Closed
rhuss opened this issue Jun 14, 2021 · 7 comments · Fixed by #434

Comments

@rhuss
Contributor

rhuss commented Jun 14, 2021

If I understand correctly, the S3 Kamelet as a source always fetches the whole object, puts it as binary data into the CloudEvent, and (by default) deletes it. So it's mostly suited to a kind of 'inbox' pattern.

But what about leveraging the source for post-processing certain files (maybe not all of them), triggered by specific S3 lifecycle events (create/update/list/delete)? Also, files can get huge, so transmitting them over the network every time might not be efficient.

What I would really love to see is a source that generates CloudEvents with only metadata about the S3 action, maybe including a URL to retrieve the data, but without automatically adding the data itself to the event. Also, it really should only 'monitor' the bucket, not modify it (e.g. not delete objects). I guess this would really be possible if one hooked into the AWS event system instead of polling the bucket itself (but I'm not an AWS expert here).

wdyt?

@oscerd
Contributor

oscerd commented Jun 14, 2021

One possibility is to avoid the body of the S3 object through the includeBody option:

"If it is true, the S3Object exchange will be consumed and put into the body and closed. If false the S3Object stream will be put raw into the body and the headers will be set with the S3 object metadata. This option is strongly related to autocloseBody option. In case of setting includeBody to true because the S3Object stream will be consumed then it will also be closed in case of includeBody false then it will be up to the caller to close the S3Object stream. However setting autocloseBody to true when includeBody is false it will schedule to close the S3Object stream automatically on exchange completion."

It is true by default, so we can expose it as an option and you can set it to false.
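For illustration, here is a minimal sketch of the underlying Camel endpoint URI for a metadata-oriented consumer. It assumes the camel-aws2-s3 options `includeBody`, `autocloseBody` and `deleteAfterRead`. The bucket name is hypothetical, credentials and region are omitted, and the Kamelet may expose these settings under its own property names.

```python
from urllib.parse import urlencode


def s3_endpoint_uri(bucket: str, include_body: bool = False,
                    autoclose_body: bool = True,
                    delete_after_read: bool = False) -> str:
    """Build a camel-aws2-s3 consumer URI (sketch; credentials/region omitted)."""
    options = {
        # false: the S3Object stream goes raw into the body, metadata into headers
        "includeBody": str(include_body).lower(),
        # close the raw stream automatically when the exchange completes
        "autocloseBody": str(autoclose_body).lower(),
        # keep the object in the bucket instead of deleting it after consumption
        "deleteAfterRead": str(delete_after_read).lower(),
    }
    return f"aws2-s3://{bucket}?{urlencode(options)}"


print(s3_endpoint_uri("my-bucket"))
# aws2-s3://my-bucket?includeBody=false&autocloseBody=true&deleteAfterRead=false
```

Setting `deleteAfterRead=false` addresses the "only monitor, don't modify" concern. `autocloseBody=true` keeps the raw stream from leaking when nothing downstream reads it.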

@rhuss
Contributor Author

rhuss commented Jun 14, 2021

That sounds legit. It might also be useful to copy the metadata into a JSON payload, maybe including an access URL (or at least the AWS-specific coordinates), so that a consumer could fetch the file on their own if needed.
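A rough sketch of such a metadata-only payload: it maps S3 message headers to a small JSON document and adds an `s3://bucket/key` coordinate instead of the object data. The header names (`CamelAwsS3BucketName`, `CamelAwsS3Key`, etc.) are assumed from the Camel AWS2 S3 component and should be checked against the Camel version in use. The output field names and the sample values are hypothetical.

```python
import json

# Assumed Camel AWS2 S3 header names -> hypothetical JSON field names.
HEADER_TO_FIELD = {
    "CamelAwsS3BucketName": "bucket",
    "CamelAwsS3Key": "key",
    "CamelAwsS3ContentLength": "size",
    "CamelAwsS3ContentType": "contentType",
    "CamelAwsS3ETag": "etag",
}


def metadata_payload(headers: dict) -> str:
    """Turn S3 object headers into JSON metadata, leaving the object data out."""
    payload = {field: headers[name]
               for name, field in HEADER_TO_FIELD.items() if name in headers}
    if "bucket" in payload and "key" in payload:
        # Coordinate a consumer can use to fetch the object on its own.
        payload["uri"] = f"s3://{payload['bucket']}/{payload['key']}"
    return json.dumps(payload, sort_keys=True)


headers = {
    "CamelAwsS3BucketName": "my-bucket",
    "CamelAwsS3Key": "images/big.tar",
    "CamelAwsS3ContentLength": 53687091200,  # 50 GiB, never put on the wire
}
print(metadata_payload(headers))
# {"bucket": "my-bucket", "key": "images/big.tar", "size": 53687091200, "uri": "s3://my-bucket/images/big.tar"}
```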

And tbh, I think that would be a good and safe default mode, too (imagine people using the InMemoryChannel with a 50 GB image tarball ;-)

As I don't think a CloudEvent can contain a stream (but @slinkydeveloper, I might be wrong), closing the stream immediately might be useful too.

@oscerd
Contributor

oscerd commented Jun 14, 2021

Well, not all users will use Knative. You can bind the source to an FTP server or to Dropbox. So it shouldn't be the default mode; it should be configurable.

Also, the S3 component has a producer operation (createDownloadLink) that generates a remote URL for downloading the S3 object.

To transform the payload into JSON, there is an action for this purpose, which you can use as a step between source and sink. I will provide an example when I have time.

@oscerd
Contributor

oscerd commented Jun 14, 2021

You might also consider leveraging EventBridge. You can target a particular bucket and push its events somewhere (SQS, SNS, or Lambda), then consume the events from there. So in theory you could do something like:

  • Create an EventBridge rule with a particular event pattern
  • Send the EventBridge events to an SQS queue
  • Subscribe to and consume from the SQS queue
  • Look at the payload and extract the key and bucket name
  • Invoke the Camel AWS S3 createDownloadLink operation as a middle step
  • A URL will be provided as the body, and you could consume it on the Knative side
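The "extract key and bucket name" step could be sketched as below. This assumes each SQS message body is the EventBridge event JSON itself, with the shape used by S3's native EventBridge notifications (`detail.bucket.name`, `detail.object.key`). Events routed through CloudTrail use a different structure, and the key may need URL-decoding depending on the delivery path. The sample event is trimmed and its values are hypothetical. The subsequent createDownloadLink call is not shown.

```python
import json


def extract_bucket_and_key(sqs_body: str) -> tuple[str, str]:
    """Return (bucket, key) from an EventBridge S3 event carried in an SQS body."""
    event = json.loads(sqs_body)
    if event.get("source") != "aws.s3":
        raise ValueError(f"not an S3 event: {event.get('source')!r}")
    detail = event["detail"]
    # Only coordinates are read here; the object data itself is never fetched.
    return detail["bucket"]["name"], detail["object"]["key"]


sample = json.dumps({
    "source": "aws.s3",
    "detail-type": "Object Created",
    "detail": {
        "bucket": {"name": "my-bucket"},
        "object": {"key": "images/big.tar", "size": 1024},
    },
})
print(extract_bucket_and_key(sample))  # ('my-bucket', 'images/big.tar')
```

Because the bucket is only observed through events, this path never polls or deletes objects, which matches the "monitor only" requirement from the opening comment.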

@oscerd
Contributor

oscerd commented Jun 14, 2021

I'll add the includeBody option, by the way.

@oscerd
Copy link
Contributor

oscerd commented Jun 15, 2021

I added the includeBody option. I'm going to add an example too, and I'll link it here.

@oscerd
Contributor

oscerd commented Jun 15, 2021

There is still some work to do here, because setting includeBody to false still adds the body, just as a raw stream.
