
Untar Azure File With Azure Function Sample

Introduction

This sample illustrates an Azure Data Factory pipeline that iterates through the tar files in an Azure File Share and extracts their contents. The basic flow is:

  1. Get the metadata from a dataset associated with the Azure File Share
  2. Loop through the children of the dataset metadata
  3. Pass each file name to an Azure Function
  4. The Function downloads the file
  5. The contents of the file are extracted to local storage using Adam Hathcock's SharpCompress library
  6. The extracted files are uploaded to the file share
  7. The Function returns a list of the URLs of the files that have been created to the Data Factory
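The extract-and-report part of this flow (steps 5 to 7) can be sketched locally. The sketch below is an illustration only, not the sample's implementation: it uses Python's standard `tarfile` module in place of SharpCompress, and plain local directories in place of the Azure File Share. The helper names `make_sample_tar` and `extract_tar` are hypothetical, and naming the output directory after the archive is an assumption.

```python
import io
import tarfile
import tempfile
from pathlib import Path


def make_sample_tar(archive_path: Path, files: dict[str, bytes]) -> None:
    """Write a small tar archive so the sketch can run without Azure."""
    with tarfile.open(archive_path, "w") as archive:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))


def extract_tar(archive_path: Path, output_root: Path) -> list[str]:
    """Extract regular files into a directory named after the archive.

    Returns the extracted paths relative to that directory, which stands in
    for the list of URLs the Function returns (an assumption of this sketch).
    """
    target_dir = (output_root / archive_path.stem).resolve()
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(archive_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other special entries
            destination = (target_dir / member.name).resolve()
            # Refuse entries that would be written outside the target directory.
            if not destination.is_relative_to(target_dir):
                raise ValueError(f"Unsafe path in archive: {member.name}")
            destination.parent.mkdir(parents=True, exist_ok=True)
            with archive.extractfile(member) as source:
                destination.write_bytes(source.read())
            extracted.append(destination.relative_to(target_dir).as_posix())
    return sorted(extracted)


# Usage: build a throwaway archive and extract it next to itself.
work_dir = Path(tempfile.mkdtemp())
sample = work_dir / "TestData1.tar"
make_sample_tar(sample, {"a.txt": b"alpha", "sub/b.txt": b"beta"})
print(extract_tar(sample, work_dir / "out"))
```

The path check before writing is a design choice of this sketch: archives can contain entries such as `../x`, so each destination is resolved and verified to stay inside the target directory.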

Running and debugging

Prerequisites

You will need Git, an Azure subscription, PowerShell, and the Az PowerShell module to run the sample. You can run it locally or from the Azure Cloud Shell. For local debugging, you will also need Visual Studio 2017 or Visual Studio Code, with the Azure Functions extension installed.

Deployment

  1. Clone this git repository to your machine with git.
  2. Navigate to the folder {repository location}\Azure-DataFactory\Samples\UntarAzureFilesWithAzureFunction\env
  3. Login to Azure by running Connect-AzAccount
  4. Select your desired subscription using Select-AzContext '{subscription name}'
  5. Run the following command in PowerShell: .\deploy.ps1. You can specify the resource group to use with the -resourcegroupname parameter. A random prefix is automatically generated for all resources to ensure unique names. To prevent new resources from being created on each run, you can override this prefix with a fixed value by specifying the -uniqueresourcenameprefix parameter. Example: .\deploy.ps1 -resourcegroupname 'dffunctionssample' -uniqueresourcenameprefix 'e1064086576241d39'

The deployment script will do the following:

  1. Create all the resources you need in the specified resource group. These include a function app, storage account, and data factory.
  2. Deploy a pre-existing function app package to the created Function App. You can override this by deploying the app in the \src directory, using the Azure Functions Core Tools or Visual Studio, to the Function App that the script creates.
  3. Upload the tar files in the env directory to the newly created storage account.
  4. Create a sample pipeline and connected services in the created Data Factory.

Running the sample

  1. Open the Azure portal and navigate to the newly created Resource Group.
  2. The resource group will contain the Azure Function App, a Storage Account and a Data Factory.
  3. Open the Storage account in a new window.
  4. Click on Files.
  5. Click on the filedrop share.
  6. You should see two tar files in here. You can click the connect button for instructions to mount the file share on your local machine.
  7. In your previous window, open the Data Factory, and click Author and Monitor.
  8. Click the Author button in the left menu, and select the UntarPipeline pipeline.
  9. Click the Debug button.
  10. Once the run is completed, the run output should contain an entry for each step.
  11. Click on the output button of each Azure Function step. It will display the URL of the extracted files for each tar.
  12. Switch back to the window displaying the Azure Files content, and hit the refresh button. The list will now contain two directories containing the content of the tar files.
  13. Open the directory to view the contents of the tar files.

Debugging the Azure Function locally

You can debug the Azure Function on your local machine, by setting up a tunnel using ngrok.

First, you need to be able to run the Function App in the src folder locally. You will need either Visual Studio 2017 with the Function Apps extension, or Visual Studio Code with the Azure Functions extension installed. Follow the instructions here.

Once the app is running, make a note of the URI on which it is exposed. This is usually http://localhost:7071/api/DecompressFile. You can test it by sending a POST request to this URI using Postman. Set the file name in the body of the request, using the following request body:

{
    "fileName": "TestData1.tar"
}

Executing the request will give you a response containing the URLs of the files it created. If you are debugging the Function App, it will stop at any breakpoints you've set.
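If you prefer a script to Postman, the same request can be sketched with Python's standard library. This is a minimal sketch under stated assumptions: the URL is the usual local address mentioned above, the helper names `build_decompress_request` and `call_function` are hypothetical, and the response is returned as raw text because its exact format is not described here.

```python
import json
import urllib.request

# Usual local address of the Function; adjust if your host uses another port.
LOCAL_FUNCTION_URL = "http://localhost:7071/api/DecompressFile"


def build_decompress_request(
    file_name: str, url: str = LOCAL_FUNCTION_URL
) -> urllib.request.Request:
    """Build a POST request whose JSON body names the file to decompress."""
    body = json.dumps({"fileName": file_name}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_function(file_name: str, url: str = LOCAL_FUNCTION_URL) -> str:
    """Send the request and return the response body as text."""
    request = build_decompress_request(file_name, url)
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read().decode("utf-8")
```

With the Function App running locally, `print(call_function("TestData1.tar"))` sends the request. Passing a different `url` lets you reuse the same helper against a tunnelled address.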

To test your local function using the data factory, you need to set up a tunnel using ngrok:

  1. Download ngrok to your local machine.
  2. Open a command prompt in the folder that contains the ngrok executable and run ngrok http 7071, where 7071 matches the port in the URI you are currently debugging.
  3. Copy the https ngrok URI created by ngrok. It will look like https://e7c0d779.ngrok.io
  4. Replace the address in Postman with the ngrok address, and repeat your test.
  5. Now open the Azure Data Factory, and click the Connections button.
  6. Click the AzureFunctionService Edit button.
  7. Replace the Function App URL with the address that ngrok created.
  8. Type anything in the Function Key field. This function is set to allow anonymous access, so the key doesn't matter.
  9. Click Finish.
  10. Open the pipeline again, and click Debug. After a few moments the breakpoint in your debugger will be hit.

Notes

Rerunning the factory will result in some failures, because the directories created by the function are also sent to the Function App as if they were tar files. You can prevent this by either deleting the directories between runs, or updating the file path filter in the Data Factory's FileShareDataset to only look for tar files.
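The effect of such a filter can be illustrated with a short Python sketch. It assumes a wildcard like `*.tar` and a share listing after one run; the pattern, the helper name `select_tar_files`, and every entry name other than TestData1.tar are hypothetical.

```python
from fnmatch import fnmatchcase


def select_tar_files(names: list[str], pattern: str = "*.tar") -> list[str]:
    """Keep only the entries whose lower-cased name matches the wildcard."""
    return [name for name in names if fnmatchcase(name.lower(), pattern)]


# Hypothetical share listing after a first run: two archives plus the
# directories that the function created for them.
listing = ["TestData1.tar", "TestData2.tar", "TestData1", "TestData2"]
print(select_tar_files(listing))
```

Only the two archive names pass the filter, so the directories are never handed to the function.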

The Azure Function will work with any compression and archive format supported by the SharpCompress library.

More info on the Azure Functions activity is available here
