Allow a "newer than" timestamp to be specified for blob trigger #1327

Open
mathewc opened this issue Sep 6, 2017 · 18 comments

@mathewc
Member

mathewc commented Sep 6, 2017

Currently our blob scan algorithm will process ALL blobs in the target container that don't have blob receipts. We should investigate whether we can allow the start date for the scan to be specified.

Scenario: assume all blobs in a container have been processed by a blob trigger function in a particular app (WebJob host). Now, if that function is moved to a different app (different host/host ID) all the blobs will be reprocessed, because there are no receipts for those blobs for that host ID.
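
The scenario above can be illustrated with a toy model. This is only a sketch: `ReceiptStore`, `blobs_to_process`, and the host IDs are hypothetical names made up for illustration, not the SDK's actual types, and the only assumption taken from the issue is that receipts are looked up per host ID.

```python
from dataclasses import dataclass, field


@dataclass
class ReceiptStore:
    """Toy stand-in for blob receipts, keyed by (host ID, blob name)."""
    processed: set[tuple[str, str]] = field(default_factory=set)

    def has_receipt(self, host_id: str, blob_name: str) -> bool:
        return (host_id, blob_name) in self.processed

    def add_receipt(self, host_id: str, blob_name: str) -> None:
        self.processed.add((host_id, blob_name))


def blobs_to_process(store: ReceiptStore, host_id: str, blob_names: list[str]) -> list[str]:
    """Return the blobs for which this host has no receipt."""
    return [name for name in blob_names if not store.has_receipt(host_id, name)]


store = ReceiptStore()
container = ["a.csv", "b.csv", "c.csv"]

# The original app processes every blob and records receipts under its own host ID.
for name in blobs_to_process(store, "host-old", container):
    store.add_receipt("host-old", name)

# Same host: nothing is left to do. New host ID: every blob looks unprocessed.
remaining_old = blobs_to_process(store, "host-old", container)
remaining_new = blobs_to_process(store, "host-new", container)
```

Because the lookup includes the host ID, the receipts written by the old host do nothing for the new one, which is why the whole container gets reprocessed.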

mathewc changed the title from 'Allow an "older than" timestamp to be specified for blob trigger' to 'Allow a "newer than" timestamp to be specified for blob trigger' on Sep 6, 2017

@brettsam
Member

brettsam commented Sep 6, 2017

I agree we should expose something to control this. Since we already maintain the blob scan pointer for tracking our last processed blob, we'd need to make sure that the behavior makes sense when these two interact.

For the sake of argument -- let's call this new property newerThan.

For example:

  • newerThan = Jan 1, 2017. Last blob scan was Dec 1, 2016 -- We'd skip over all of December when we start processing.
  • newerThan = Jan 1, 2017. Last blob scan was Feb 1, 2017 -- We wouldn't want to re-process all of January, would we?

In other words -- we'd start our scan from whichever is more recent: newerThan or the stored blob scan pointer.

As a side note -- I think writing out informational logs (like we do for Timer) would be very helpful here. Something like "Found blob scan pointer of {date} and NewerThan value of {date}. Starting scan at {date} because it is the most recent. To change this, ....". It'd only be written once, at Listener start, and could go a long way towards explaining the logic without needing to look up docs.
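
The rule described above (start from whichever of the two timestamps is more recent) could be sketched as follows. This is a minimal illustration, not the SDK's implementation: `choose_scan_start` and `describe_scan_start` are hypothetical helper names, and the log text only mirrors the wording proposed in this comment, leaving out the elided "To change this" tail.

```python
from datetime import datetime, timezone
from typing import Optional


def choose_scan_start(newer_than: Optional[datetime],
                      scan_pointer: Optional[datetime]) -> Optional[datetime]:
    """Pick the later of the two timestamps; None means 'scan everything'."""
    candidates = [t for t in (newer_than, scan_pointer) if t is not None]
    return max(candidates) if candidates else None


def describe_scan_start(newer_than: Optional[datetime],
                        scan_pointer: Optional[datetime]) -> str:
    """Build the one-time informational message (hypothetical wording)."""
    start = choose_scan_start(newer_than, scan_pointer)
    return (f"Found blob scan pointer of {scan_pointer} and NewerThan value of "
            f"{newer_than}. Starting scan at {start} because it is the most recent.")


newer_than = datetime(2017, 1, 1, tzinfo=timezone.utc)
pointer_dec = datetime(2016, 12, 1, tzinfo=timezone.utc)
pointer_feb = datetime(2017, 2, 1, tzinfo=timezone.utc)

# Pointer older than newerThan: December is skipped, the scan starts at Jan 1.
start_after_dec = choose_scan_start(newer_than, pointer_dec)
# Pointer newer than newerThan: January is not reprocessed, the scan starts at Feb 1.
start_after_feb = choose_scan_start(newer_than, pointer_feb)
```

Treating a missing value as "no lower bound" keeps today's behavior when neither a pointer nor a newerThan value exists.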

@ransagy

ransagy commented Oct 9, 2017

This would be very helpful in a few scenarios I came across. My current case: scanning over SQL audit blobs generated by Azure's SQL Blob Auditing feature.
We have a pretty high retention rate for those but only need to process the logs going forward, which sounds perfect for an Azure Function with a Blob Trigger, until you realize you have to let it run in a NOOP style over all of them, for each host, before it's usable.

This would really help similar scenarios.

@paulbatum
Member

One possibility that could help here is using Event Grid's support for routing storage events to Azure Functions. This approach does not involve any blob scanning, which is the main cause of the issue here.

https://docs.microsoft.com/en-us/azure/event-grid/resize-images-on-storage-blob-upload-event
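
To show why the event-based route sidesteps the problem, here is a rough sketch that handles a batch of storage events using only the standard library. `created_blob_urls` and the sample payload (including the `example` storage account and the `audit` container) are made up for illustration; the only fields relied on are `eventType` and `data.url` from the Event Grid blob storage event schema. A real function would use the Event Grid trigger binding instead of parsing JSON by hand.

```python
import json

BLOB_CREATED = "Microsoft.Storage.BlobCreated"


def created_blob_urls(payload: str) -> list[str]:
    """Return the URLs of blobs announced by BlobCreated events in the payload."""
    events = json.loads(payload)
    return [e["data"]["url"] for e in events if e.get("eventType") == BLOB_CREATED]


# Hypothetical batch: one newly created blob and one deletion.
sample_payload = json.dumps([
    {
        "eventType": "Microsoft.Storage.BlobCreated",
        "subject": "/blobServices/default/containers/audit/blobs/new.log",
        "data": {"url": "https://example.blob.core.windows.net/audit/new.log"},
    },
    {
        "eventType": "Microsoft.Storage.BlobDeleted",
        "subject": "/blobServices/default/containers/audit/blobs/old.log",
        "data": {"url": "https://example.blob.core.windows.net/audit/old.log"},
    },
])

urls = created_blob_urls(sample_payload)
```

Since events are only raised when something happens to a blob, blobs that already sat in the container before the subscription was created never reach the function, so there is nothing to skip.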

paulbatum added this to the Backlog milestone on Oct 18, 2017

@jaltin

jaltin commented Jun 12, 2018

Resurfacing this as this is something I would love to be able to do. Any idea on if/when this might be looked at?

Thx!

@paulbatum
Member

paulbatum commented Jun 12, 2018

No idea at this time (that's what the "unknown" milestone means).

@jtlz2

jtlz2 commented Jun 25, 2019

Another year has passed - any update?

@rollsch

rollsch commented Aug 5, 2019

Any update? This is kind of annoying, as I have to sit there waiting for 10 minutes for the trigger to reprocess each blob. I'm not sure why, but the receipts get reset sometimes, which means it will reprocess everything.

@pablosguajardo

pablosguajardo commented Sep 18, 2020

Hello, I have the same problem.
It fires 3 times.
These are surely the events I created for testing.
But how do I remove them all so that I can create a new one and run only that one?

In the console there are 3 events that are triggered at the same time:
2020-09-18T15:28:13.541 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:13.2184099+00:00', Id=595fc416-0280-43e1-8dc5-f285640e986c)
2020-09-18T15:28:13.569 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:13.0399176+00:00', Id=f1796f9a-f06c-47e8-8e67-be34950629a3)
2020-09-18T15:28:13.570 [Information] Executing 'D2KEventGrid' (Reason='EventGrid trigger fired at 2020-09-18T15:28:12.7600042+00:00', Id=c34bb611-2c99-4c1d-ba4b-5675cc87236c)

@paulbatum
Member

@pablosguajardo I think you're talking about something different from what is being discussed here, because it looks like you are using Event Grid, while this issue is discussing the behavior of the built-in blob trigger.

@Floriszz

What about this additional parameter 'Start time'? Does this have anything to do with this? I can't find documentation about this parameter.

@santi-paz

Any update on this? This feature would be very useful.

@tbasallo

tbasallo commented Apr 2, 2022

Is this related to the same blob being triggered for multiple hosts? For example, a blob already processed by a production Function is also triggered when a dev machine runs the function/project locally. We've seen files from YEARS ago start to trigger for processing.

@bdlb77

bdlb77 commented Jun 17, 2022

Any Updates on this functionality?

@nicm-CC

nicm-CC commented Sep 5, 2023

Any update on the above discussion?

@v-bafa

v-bafa commented Sep 22, 2023

Really need this feature!! Please help add it : )

@abouroubi

More than 6 years later, can we at least have some news about it?

@ToniPR

ToniPR commented Mar 17, 2024

I am also hoping for this feature. When you first publish the trigger in the function app and you have an existing blob container with files, and you upload one file for testing, it doesn't take just the test file: it takes the test file plus every other file already in there. After that, it's fine: the next file you upload will only trigger for that folder.

https://stackoverflow.com/questions/51675455/stop-azure-blob-trigger-function-from-being-triggered-on-existing-blobs-when-fun

@mario-dnet

This would be really helpful!
