
Binding/trigger for Azure Data Lake Store #353

Open
ahelland opened this issue Jun 21, 2017 · 13 comments

@ahelland

Much like it's useful to process incoming blobs on their way into Blob Storage, it could be useful to process files landing in Azure Data Lake Store. One possible scenario: a file for which there is no ADLA extractor lands in the store, and an Azure Function processes it and creates a metadata file more suitable for the purpose.

Anything in the pipeline for this?
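
For reference, a minimal sketch of the existing Blob Storage trigger pattern that this request asks to be mirrored for Data Lake Store, using the Python programming model. The container path, output binding, and metadata format here are hypothetical; the actual bindings would be declared in function.json.

```python
import json
import logging

import azure.functions as func

# Hypothetical bindings in function.json:
#   blobTrigger -> path "incoming/{name}", connection "AzureWebJobsStorage"
#   blob output -> name "metadata", direction "out"

def main(myblob: func.InputStream, metadata: func.Out[str]) -> None:
    """Runs whenever a blob lands in the watched container."""
    logging.info("Processing blob %s (%s bytes)", myblob.name, myblob.length)

    # Emit a small metadata document describing the file, e.g. for a
    # downstream job that has no extractor for the original format.
    metadata.set(json.dumps({"source": myblob.name, "size": myblob.length}))
```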

@pavankvd

Yes. Can we get a trigger for files placed in Data Lake Store, similar to blob triggers for Azure Blob Storage?

@roundbatman

Yes, longing for this one also.

@dreadeddev

Yes, yes, yes! Do custom WebJobs trigger extensions work for Functions? If so, how?
https://github.com/Azure/azure-webjobs-sdk-extensions/wiki/Binding-Extensions-Overview

@syedhassaanahmed

+1

@dufain

dufain commented Mar 6, 2018

This would be incredibly helpful. Any updates on this?

@dkmiller

Also agree that this would be super useful. Any progress / updates?

@paulbatum
Member

This feature is not a high priority for us right now, but I will note that the announcement for Azure Event Grid listed Data Lake as one of the integrations they are building. Once you can subscribe to Data Lake updates through Event Grid, running an Azure Function would be trivial (see here for some info). So I think this is the most likely way that the scenario would be enabled. I noticed that Event Grid integration for Data Lake is tracked in UserVoice here - I would encourage everyone following this issue to comment there to help the Data Lake team prioritize this work.
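
For illustration, a minimal sketch of that Event Grid route, assuming the Python programming model and an eventGridTrigger binding in function.json. The event fields read here follow the Azure Storage BlobCreated event shape, which is an assumption about how Data Lake events would look.

```python
import logging

import azure.functions as func

def main(event: func.EventGridEvent) -> None:
    # event_type would be something like "Microsoft.Storage.BlobCreated"
    # (assumed shape); subject identifies the container/path of the file.
    data = event.get_json()
    logging.info("Received %s for %s", event.event_type, event.subject)

    # data["url"] points at the created file; the function could then read it
    # and kick off whatever processing is needed.
    logging.info("File URL: %s", data.get("url"))
```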

@paulbatum paulbatum added this to the Unknown milestone Mar 23, 2018
@zhangruiskyline

Hi, I want to follow up on this thread: is there an Azure Functions integration with ADLS yet, or do we still need to handle it via Event Grid?

@paulbatum
Member

@zhangruiskyline There has been some progress, courtesy of @joescars, see here. Note that there are no official releases of this yet (you would need to clone and build yourself, and there is no ETA for this to change).

@alexgman

Is this possible to do yet?

@paulbatum
Member

@alexgman The state is unchanged, see above for the relevant links.

@shamshuddeeen

Does anyone have a solution for this yet?

@SeaDude

SeaDude commented Sep 2, 2020

You can now use an ADLS Gen2 directory (namespace) as a trigger for an Azure Function. The problem comes in with permissions for the Function: I have not found a way to grant the Function permissions to ONLY the directory from which it's triggered. The Function requires permissions to the ENTIRE DataLake.

Example:

  • Azure Function w/ Python runtime
  • System-assigned Managed Identity (SAMI) turned on for the Function
  • local.settings.json must have a triggerStorage value set to <the entire DataLake connection string/key> (<-- this is the problem)
    • Better practice: set this to a Key Vault reference to keep the connection string/key out of your code; the problem above still persists
  • In function.json, set the binding path to path/to/trigger/directory/{name} (see the sketch after this list)
  • The Function will trigger ONLY when files are uploaded to the path above, BUT the SAMI requires the connection string/key (aka root access) to the entire DataLake. This is not good.
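
A minimal sketch of the setup described in the list above, reusing the binding path and the triggerStorage setting name from the example. The function.json binding shown in the comment is illustrative; the connection setting still has to hold an account-level key, which is exactly the problem described next.

```python
import logging

import azure.functions as func

# Illustrative function.json binding for this trigger:
#   { "type": "blobTrigger", "name": "incoming",
#     "path": "path/to/trigger/directory/{name}",
#     "connection": "triggerStorage" }
# 'triggerStorage' must still hold a connection string/key for the whole
# account, which is the permissions problem described below.

def main(incoming: func.InputStream) -> None:
    logging.info("New file in watched directory: %s (%s bytes)",
                 incoming.name, incoming.length)
```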

Problems:

  • The Function requires the keys to the kingdom in order to monitor a single directory (namespace) within the DataLake
  • Access Control Lists don't help here, as the Function must have permissions on parent directories (path/to/trigger in the example above) in order to write to child directories (/directory/{name}), AND ACLs don't generate any type of "token" or connection string to substitute into your local.settings.json file
  • SAS tokens are only scopable to the container level (top level, path/ in the example above), so this still gives the Function permissions to the entire DataLake

Desired state:

  • The ability to scope permissions down to (generate a SAS token for) a single directory within the DataLake, so that the Function does not require root access to the DataLake.
    • Azure Function 1's SAMI has permissions only on directory external/vendors/vendor1/usecase1 and is provided with a connection string/key that allows it to be triggered when blobs are uploaded there.
    • Azure Function 2's SAMI has permissions only on directory external/vendors/vendor2/usecase2 and is provided with a connection string/key that allows it to be triggered when blobs are uploaded there.
