AWS S3 Kamelet (source) - fetch only meta-data #372
One possibility is to avoid fetching the body of the S3 object through the includeBody option: "If it is true, the S3Object exchange will be consumed, put into the body, and closed. If false, the S3Object stream will be put raw into the body and the headers will be set with the S3 object metadata. This option is strongly related to the autocloseBody option: when includeBody is true, the S3Object stream will be consumed and then also closed; when includeBody is false, it will be up to the caller to close the S3Object stream. However, setting autocloseBody to true when includeBody is false will schedule closing the S3Object stream automatically on exchange completion." It is true by default, so we can provide it as an option and you can set it to false.
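A minimal KameletBinding sketch of that idea, assuming the source exposes the option under the name includeBody; the bucket name, credentials placeholders, and log-sink are illustrative, not taken from this thread:

```yaml
apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: s3-metadata-only
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-s3-source
    properties:
      bucketNameOrArn: my-bucket      # hypothetical bucket
      accessKey: "{{aws.accessKey}}"
      secretKey: "{{aws.secretKey}}"
      region: eu-west-1
      includeBody: false              # keep the object payload out of the exchange body
  sink:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: log-sink
```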
That sounds legit. And also maybe copy over the metadata into a JSON payload, maybe including an access URL (or at least the AWS-specific coordinates), so that a consumer could pick up the file on their own if needed. And tbh, I think that would be a good and safe default mode, too (imagine people using the InMemoryChannel with a 50GB image tar ;-). As I don't think the CE can contain a stream (but @slinkydeveloper, I might be wrong), closing the stream immediately might be useful, too, then.
Well, not all users will have to use Knative. You can bind the source to an FTP server, or to Dropbox. So it shouldn't be the default mode, it should be configurable. Also, we have a producer operation in the S3 component to generate a remote URL to download the S3 object. To transform the payload into JSON there is an action for this purpose, which you may use as a step between source and sink. By the way, I will provide an example if I have the time.
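A sketch of such a binding with an intermediate transform step; the action name json-serialize-action, the includeBody parameter name, and the sink are assumptions on my side, not confirmed here:

```yaml
apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: s3-to-json
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-s3-source
    properties:
      bucketNameOrArn: my-bucket     # hypothetical bucket
      includeBody: false
  steps:
    - ref:
        kind: Kamelet
        apiVersion: camel.apache.org/v1alpha1
        name: json-serialize-action  # assumed name of the JSON transform action
  sink:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: log-sink
```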
You may also think about leveraging EventBridge. You can target a particular bucket and push its events somewhere (SQS, SNS, or to a Lambda), and then consume those events. So in theory you could chain S3 → EventBridge → SQS and consume from there.
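The last hop of that chain could be a plain aws-sqs-source binding; the queue name, credentials placeholders, and sink below are purely illustrative:

```yaml
apiVersion: camel.apache.org/v1alpha1
kind: KameletBinding
metadata:
  name: s3-events-via-sqs
spec:
  source:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: aws-sqs-source
    properties:
      queueNameOrArn: s3-events      # queue targeted by the EventBridge rule (hypothetical)
      accessKey: "{{aws.accessKey}}"
      secretKey: "{{aws.secretKey}}"
      region: eu-west-1
  sink:
    ref:
      kind: Kamelet
      apiVersion: camel.apache.org/v1alpha1
      name: log-sink
```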
I'll add the includeBody option by the way.
I added the includeBody. I'm going to add an example too and I'll link it here. |
There is still some work to do for this, because setting includeBody to false will still add the body, just as a raw stream.
If I understand correctly, the S3 Kamelet as a source always fetches the whole data, puts it as binary into the cloud event, and deletes it (by default). So it's mostly suited for kind of an 'inbox' pattern.
But what about leveraging the source for post-processing of certain (maybe not all) files, triggered by certain S3 lifecycle events (create/update/list/delete)? Also, files can get huge; transmitting them all the time over the network might not be efficient.
What I really would love to see is a source that generates CloudEvents only with metadata around the S3 action, maybe including a URL to retrieve the data, but without automatically adding the data to the event. Also, it really should only 'monitor' the bucket, not modify it (e.g. not delete objects). I guess this would really be possible if one hooked into the AWS event system instead of polling the bucket itself (but I'm not an AWS expert here).
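Such a metadata-only event might look roughly like this; every field name, the type string, and the download URL are purely illustrative sketches of the idea, not an existing format:

```json
{
  "specversion": "1.0",
  "type": "aws.s3.object.created",
  "source": "aws:s3:eu-west-1:my-bucket",
  "id": "example-id",
  "datacontenttype": "application/json",
  "data": {
    "bucket": "my-bucket",
    "key": "images/huge-image.tar",
    "size": 53687091200,
    "etag": "example-etag",
    "downloadUrl": "https://my-bucket.s3.eu-west-1.amazonaws.com/images/huge-image.tar"
  }
}
```

The downloadUrl could, for instance, be a presigned link produced by the S3 component's remote-URL producer operation mentioned earlier in the thread.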
wdyt?