
how do you specify blob storage source where year/month/day.csv storage is involved #6117

Closed
myusrn opened this issue Mar 21, 2018 — with docs.microsoft.com · 29 comments

Comments


myusrn commented Mar 21, 2018

With adf v1, if you were specifying a blob storage source where log files were stored in a year/month/day.csv hierarchy, you could specify the container source folder as "mylogdata/{Year}/{Month}" and the file as "{Day}.csv", provided you included a partitionedBy section outlining how those token placeholders are calculated. I'm finding that an adf v2 dataset based on a blob storage linked service doesn't like this. Has the syntax for date-based folder and file hierarchies changed?

The error I get when I try this in adf v2 is
Activity Copy Points Data failed: Failure happened on 'Source' side. ErrorCode=UserErrorSourceBlobNotExist,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The required Blob is missing.
ContainerName: https://mystoracct.blob.core.windows.net/mylogdata, ContainerExist: True,
BlobPrefix: {Day}.csv, BlobCount: 0.,Source=Microsoft.DataTransfer.ClientLibrary,'



@MohitGargMSFT
Member

Thanks for your feedback! We will investigate and update as appropriate.

@myusrn
Author

myusrn commented Mar 22, 2018

. . . and here is an example of the dataset json that worked with adf v1 but doesn't appear to work when I try it with adf v2

"type": "AzureBlob", "linkedServiceName": "blob-store", "typeProperties": { "folderPath": "adf-data/gamedata/{Year}/{Month}", "fileName": "{Day}.csv", "format": { "type": "TextFormat", "columnDelimiter": ",", "firstRowAsHeader": true }, "partitionedBy": [ { "name": "Year", "value": { "type": "DateTime", "date": "SliceStart", "format": "yyyy" } }, { "name": "Month", "value": { "type": "DateTime", "date": "SliceStart", "format": "MM" } }, { "name": "Day", "value": { "type": "DateTime", "date": "SliceStart", "format": "dd" } } ] },

@jason-j-MSFT
Contributor

@myusrn
Author

myusrn commented Mar 27, 2018

Thanks for the followup. I had received a pointer to the first of the above two documents in response to a datafactoryv2.azure.com feedback submission I had also created. Using it I arrived at the following settings to enable parameterized file processing. The outstanding issue it left me with is that the files I needed to process lived in a /{Year}/{Month}/{Day}.csv blob storage structure where the date range was in the past, i.e. 2016/01/01.csv -> 2016/06/30.csv content. So a scheduled trigger couldn't be set up to fire daily for these dates in the past, and a Trigger Now where I provided a scheduledRunTime input parameter value would only allow me to process one of those past-date files at a time. Is a TumblingWindow trigger setup what is needed to kick off a pipeline that processes a date range of parameterized blob storage folder/file hierarchy content, or is some pipeline FOR loop option the answer?

/*** pipeline setting ***/
"properties": {
    "parameters": {
        "scheduledRunTime": {
            "type": "String",
            "defaultValue": ""
        }
    }
}

/*** scheduled [ vs tumbling window ] daily trigger run parameters ***/
scheduledRunTime String @trigger().scheduledTime

/*** dataset with blob storage linked service connection settings ***/
"properties": {
    . . . 
    "typeProperties": {
        . . . 
        "fileName": {                
            "value": "@concat(formatDateTime(pipeline().parameters.scheduledRunTime, '%d'), '.csv')",                
            "type": "Expression"            
        },   
        "folderPath": {                
            "value": "@concat('adf-data/gamedata/', formatDateTime(pipeline().parameters.scheduledRunTime, 'yyyy'), '/', formatDateTime(pipeline().parameters.scheduledRunTime, '%M'))",
            "type": "Expression"           
        }
    }
}

@linda33wj
Contributor

@myusrn please use a Tumbling Window trigger instead for this and set the interval to 24h (daily), which can do the backfill you expected. For the difference between the schedule trigger and the tumbling window trigger, check this: https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#trigger-type-comparison
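
For reference, a minimal tumbling window trigger definition along those lines could look like the sketch below (the trigger name, pipeline name, and backfill start/end dates are placeholders to adjust for your scenario; the parameter binding assumes your pipeline keeps the scheduledRunTime parameter):

{
    "name": "TR-Tumbling-Daily-Backfill",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,
            "startTime": "2016-01-01T00:00:00Z",
            "endTime": "2016-07-01T00:00:00Z",
            "maxConcurrency": 10
        },
        "pipeline": {
            "pipelineReference": {
                "referenceName": "<your pipeline name>",
                "type": "PipelineReference"
            },
            "parameters": {
                "scheduledRunTime": "@trigger().outputs.windowStartTime"
            }
        }
    }
}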

@terpie

terpie commented Mar 28, 2018

I'm trying to accomplish the same thing, however it seems the parameter 'ScheduledRunTime' is not passed through from the trigger to the pipeline. When I create the trigger, it recognizes the pipeline parameter 'ScheduledRunTime' and the trigger parameter is created with the value @trigger().scheduledTime.
When I 'Trigger Now' the trigger, the pipeline pops up asking for the value of the 'ScheduledRunTime' parameter, where I would assume it should be passed through from the trigger. Am I doing anything wrong?
Martin.

@myusrn
Author

myusrn commented Mar 28, 2018

@terpie when I used 'Trigger Now' for a Scheduled Trigger setup, I found that the parameter prompt was asking me to provide a datetime value, e.g. 01/01/2016 12:00 AM, that would get used in pipeline execution in lieu of that value being passed by an actual firing of the Scheduled Trigger setup. I'm about to test the Tumbling Window option to determine how I can use 'Trigger Now' to kick off date-based parameterized blob folder/file path reading for an entire date range in the past, e.g. 01/01/2016 12:00 AM -> 06/30/2016 11:59 PM.

@terpie

terpie commented Mar 28, 2018

Yes, I'm trying the same thing, but the (tumbling window) trigger should pass the value @trigger().scheduledTime through to the pipeline via the parameter 'ScheduledRunTime' so the CSV dataset can use it to select the correct .csv file. But I can't get this passing through of the parameter to work...

@myusrn
Author

myusrn commented Mar 28, 2018

@linda33wj I'm reviewing the document you referenced and am not seeing how the Tumbling Window Trigger option addresses this need, any differently than a Scheduled Trigger, to execute a pipeline Copy activity for a range of past dates. In fact that document suggests that with v2 the only way for me to accomplish this is to manually trigger the pipeline from a powershell script or c# program that issues rest api calls in a loop, passing in the scheduledRunTime parameter value for each of the past dates I need covered, 01/01/2016 12:00 AM -> 06/30/2016 11:59 PM.

Alternatively, it appears one might be able to set up an Iterations and Conditionals | ForEach activity that calls my Copy activity and have the ForEach parameters configured to pass into the child Copy activity each of the past datetime values I need that Copy activity to execute, along the lines of the sketch below. Not sure if that is possible, but this aspect of v2 appears to be significantly different from what v1 supported wrt executing an activity over a past date range.
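
For illustration, the kind of pipeline json I have in mind is roughly the following sketch (the pipeline, activity, and parameter names are placeholders I made up, and I haven't verified this end to end):

{
    "name": "run-copy-for-date-range",
    "properties": {
        "parameters": {
            "datesToProcess": {
                "type": "Array",
                "defaultValue": [ "2016-01-01T00:00:00Z", "2016-01-02T00:00:00Z", "2016-01-03T00:00:00Z" ]
            }
        },
        "activities": [
            {
                "name": "ForEachDate",
                "type": "ForEach",
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.datesToProcess",
                        "type": "Expression"
                    },
                    "activities": [
                        {
                            "name": "RunCopyForOneDay",
                            "type": "ExecutePipeline",
                            "typeProperties": {
                                "pipeline": {
                                    "referenceName": "<existing copy pipeline>",
                                    "type": "PipelineReference"
                                },
                                "parameters": {
                                    "scheduledRunTime": "@item()"
                                }
                            }
                        }
                    ]
                }
            }
        ]
    }
}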

@jason-j-MSFT
Contributor

@myusrn & @terpie
Thanks again for the feedback. I think we have enough information here to understand that you are looking for a documentation enhancement that includes examples of how to use, in ADFv2, some of the functionality found in ADFv1. The content team will evaluate this feedback and make enhancements where appropriate in the ADFv2 documentation going forward. ADFv2 is still in preview status and will continue to evolve.

In the meantime, I invite you to leave feedback for the product team here:
https://feedback.azure.com/forums/270578-data-factory

If you post a link to your feedback here, I will leave a comment on your feedback with a link back to this issue. We really appreciate your interest in ADF and hope to improve your experience with the service.

@myusrn
Author

myusrn commented Mar 29, 2018

thanks @jason-j-MSFT for the closure and next steps suggestion on this matter. I created the product team feedback entry https://feedback.azure.com/forums/270578-data-factory/suggestions/33787015-story-for-running-a-pipeline-for-a-range-of-dates to try and ensure this scenario is covered and if not that it gets looked at.

@linda33wj
Contributor

linda33wj commented Mar 29, 2018

@myusrn & @terpie I should have suggested this simpler path to help you understand this earlier: since you are using the copy data tool, please try the built-in scheduled copy, which will auto-generate all the parameters and the corresponding tumbling window trigger. After deployment, you can then check how things are chained together through the generic authoring UI.

On the copy data tool's first page, choose "Run regularly on schedule" -> select the start date and end date (I suggest you go with a shorter period for testing first), and check https://docs.microsoft.com/en-us/azure/data-factory/copy-data-tool#filter-data-in-an-azure-blob-folder on how to configure a datetime-partitioned path in the copy data tool.

@terpie

terpie commented Mar 29, 2018

@linda33wj : thanks, now it works! I used the built-in 'copy data' as you proposed to create the dataset/pipeline/trigger with scheduling and that one works. I compared the json code with the one I created before, and the difference is in the parameters. So I changed my earlier-created dataset/trigger to use these parameters and that pipeline works as well!

in trigger:
"parameters": {
    "windowStart": "@trigger().outputs.windowStartTime",
    "windowEnd": "@trigger().outputs.windowEndTime"
}

in dataset:
"fileName": {
    "value": "@concat(formatDateTime(pipeline().parameters.windowStart, 'dd'),'.csv')",
    "type": "Expression"
},
"folderPath": {
    "value": "@concat('adf-data/gamedata/', formatDateTime(pipeline().parameters.windowStart, 'yyyy'), '/', formatDateTime(pipeline().parameters.windowStart, 'MM'))",
    "type": "Expression"
}

in pipeline:
"parameters": {
    "windowStart": {
        "type": "String"
    },
    "windowEnd": {
        "type": "String"
    }
}

So apparently the trigger parameter 'ScheduledRunTime' set to @trigger().scheduledTime didn't work and did not pass through a value to the pipeline. That was all.

@linda33wj
Contributor

@terpie great to know! The tumbling window trigger has different system variable names compared to the schedule trigger, as mentioned at https://docs.microsoft.com/en-us/azure/data-factory/concepts-pipeline-execution-triggers#trigger-type-comparison.
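
To make the contrast concrete, the trigger run parameter bindings differ roughly like this (the pipeline parameter names on the left are whatever you defined on your pipeline):

schedule trigger:
"parameters": {
    "scheduledRunTime": "@trigger().scheduledTime"
}

tumbling window trigger:
"parameters": {
    "windowStart": "@trigger().outputs.windowStartTime",
    "windowEnd": "@trigger().outputs.windowEndTime"
}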

@myusrn hope you can make your case work as well. :)

@jason-j-MSFT jason-j-MSFT removed the cxp label Mar 29, 2018
@jason-j-MSFT jason-j-MSFT removed their assignment Mar 29, 2018
@myusrn
Author

myusrn commented Mar 29, 2018

@linda33wj & @terpie

I used the adfV2 overview | let's get started | copy data wizard and followed the instructions in the https://docs.microsoft.com/en-us/azure/data-factory/copy-data-tool#filter-data-in-an-azure-blob-folder link shared earlier to swap out the browse-and-choose-a-specific-file selection for a datetime-parameterized blob storage source folder/file setting. I also confirmed that the "run regularly on a schedule" selection up front created an associated tumbling window trigger with the trigger run parameters windowStart = @trigger().outputs.windowStartTime and windowEnd = @trigger().outputs.windowEndTime.

What I found using that process is that the blob storage linked service dataset was created with

folderPath = adf-data/gamedata/@{formatDateTime(pipeline().parameters.windowStart,'yyyy')}/@{formatDateTime(pipeline().parameters.windowStart,'MM')}
fileName = @{formatDateTime(pipeline().parameters.windowStart,'dd')}.csv 

and not the following settings that I used when manually creating this dataset based on earlier issue suggestions.

folderPath = @concat('adf-data/gamedata/', formatDateTime(pipeline().parameters.windowStart, 'yyyy'), '/', formatDateTime(pipeline().parameters.windowStart, 'MM'))
fileName = @concat(formatDateTime(pipeline().parameters.windowStart, 'dd'), '.csv')

Regardless of which folderPath/fileName expression syntax is used, it's currently cranking away on what appears to be each blob storage file in my manually defined tumbling window trigger date range 01/01/2016 12:00 am -> 06/30/2016 11:59 pm. I'll know if I have success when those entries in the monitoring window stop showing up and I can do a count of all rows created by the copy process to confirm I have the same 262080 rows that resulted when using the adfV1-based setup.

q1. once the tumbling window is activated via an adfV2 "publish all", is it essentially firing a bunch of pipeline instances where @trigger().outputs.windowStartTime = <date> 12:00 am and @trigger().outputs.windowEndTime = <date> 11:59 pm, such that each firing of the pipeline can use the pipeline().parameters.windowStart <date> to construct a date-parameterized blob storage path in /yyyy/MM/dd.csv format?

q2. what if I wanted to rerun this pipeline/trigger setup to debug the resulting differences after making some edits, optionally clearing out all the target data sink results from an earlier run? To do that in adfV2, does one deactivate the trigger and select "publish all", then reactivate the trigger and select "publish all"?

q3. it would seem that use of the pipeline "trigger now" is not viable in this scenario except for the case where you want to debug execution one day at a time, correct?

q4. when looking at monitor | <pipeline execution instance> | actions | view activity runs | actions input/output/details, I'm unable to find any details telling me which tumbling window iteration (@trigger().outputs.windowStartTime and @trigger().outputs.windowEndTime) a particular monitoring entry is associated with. Am I overlooking how to see that?

@terpie are you also taking the msft academy big data track [ https://aka.ms/bdMsa ], specifically the dat223.3x orchestrating big data with azure data factory course's lab 3, and trying to get an adfV2-based pipeline processing setup working for the game points blob2sql copy lab in lieu of the adfV1-based one covered in the lab? I started down this path because I found the adfV1 approach really challenging when it came to terminating failed runs, restarting new runs, and making sense of what was currently executing and what was done executing in the adfV1 monitoring views . . . so the hope was that adfV2, even in preview state, would be more approachable for this date-parameterized blob storage input scenario that I expect is pretty common, especially when processing application log files.

@terpie

terpie commented Mar 29, 2018

q1. yes
q2. deactivating and reactivating the trigger does not work to rerun the whole period again. It seems the trigger keeps track of which day.csv files were processed successfully. I am trying to find a way to clear/reset the trigger so I can process the whole period again. Maybe @linda33wj knows?
q3. yes, it just runs the pipeline for one day where you have to enter the pipeline parameters
q4. you can see the value of the parameters for each run.

I am following a training with edx (currently at Lab 3):
https://www.edx.org/course/orchestrating-big-data-with-azure-data-factory
But it seems this training was created for the V1 data factory, so it is a challenge to use V2 for it.
However, it is good learning; it is better than just following the pre-cooked steps ;-)

Martin.

@myusrn
Author

myusrn commented Mar 30, 2018

@terpie thanks for those details that helps.

wrt q2. I found a similar experience with adfV1 in that, if I managed to pause/stop and restart an existing pipeline definition, it remembered what was executed on prior pass(es) and simply tried to execute what hadn't finished in prior runs or had been in a waiting-to-execute state. With adfV1 I could only get a pipeline/trigger setup to re-execute from scratch by duplicating it in an entirely different adfV1 environment. With adfV2, to make mods and rerun the pipeline/trigger combination from scratch in this case, I'm going to test removing the trigger association, creating a new replica of that trigger associated with the pipeline, and then "publish all" to get another test pass going from the beginning. If that works then I'd say it's an improvement over the adfV1 case of having to create a whole new adfV1 service replica of the setup.

wrt q3. so this would seem to be the appropriate way to debug/test your date-parameterized blob storage folder/file path input one day at a time, vs having a bad/incorrect setup try to execute across the entire date range you want the final pass to operate on.

wrt q4. thanks for the pointer to where to find the trigger-passed parameter values associated with each monitor entry . . . I was completely overlooking that "Parameters" column data on the far right side and focusing instead just on what the "Actions" column's "View Activity Runs" exposed.

wrt the msft academy / edX.org training lab, yes that's the same course and lab 3 that I found challenging to debug and fix errors with using adfV1. If I had had the above understanding of how to make key aspects of that exercise work in adfV2, I think it would have been easier and quicker to debug and initiate the final complete pass, which generated 262080 rows when I did it using adfV1 and 260640 rows when I did it using the adfV2 approach discussed above. Since each day's set of records consisted of 1440 rows and (262080 - 260640) / 1440 = 1, I'm guessing my adfV1 run perhaps had an extra day's worth of rows present, perhaps from a single-day execution test pass. I'd be curious what your "SELECT count(*) FROM dbo.points" produced in the adfV1 and adfV2 cases.

@linda33wj I might suggest that the adfV2 documentation have an example of how to set up and debug/run the msft academy big data track [ https://aka.ms/bdMsa ] dat223.3x orchestrating big data with azure data factory course's lab 3, where the lab 3 instructions pdf download can be found at https://aka.ms/edx-dat223.3x-lab3 .

@myusrn
Author

myusrn commented Mar 30, 2018

I tested how to get a tumbling window trigger-enabled pipeline like this to re-run the entire date range, and what I found I had to do was deactivate and delete the existing trigger and recreate it for things to re-run. From a product documentation perspective it might be helpful to have this documented in the appropriate place, and from a product design perspective it would be nice to have a "refire trigger" button that automated these steps for you.

to delete adfV2 triggers see #6163, e.g.

Login-AzureRmAccount
# Get-AzureRmSubscription -SubscriptionId <subId> -TenantId <tenId> | Set-AzureRmContext
Get-AzureRmDataFactoryV2Trigger -ResourceGroupName <my rg name> -DataFactoryName <my adfv2 name>
Stop-AzureRmDataFactoryV2Trigger -ResourceGroupName <my rg name> -DataFactoryName <my adfv2 name> -Name <unwanted trigger to remove>  # this changes the Activated property to unchecked
Remove-AzureRmDataFactoryV2Trigger -ResourceGroupName <my rg name> -DataFactoryName <my adfv2 name> -Name <unwanted trigger to remove>

Also, reviewing the monitored pipeline runs executed by the tumbling window trigger, I found that I needed to specify the date range 01/01/2016 -> 07/01/2016 in order to get it to process date-parameterized blob storage files from 01/01/2016 -> 06/30/2016. This was because the windowStart value for the last firing, when I used 06/30/2016 11:59 pm as the tumbling window trigger end date, was 06/29/2016 12:00 am, not 06/30/2016 12:00 am. That aspect of this scenario's configuration didn't seem intuitive.

Also, I found that a successful no-errors configuration pass of the 01/01/2016 -> 06/30/2016 files, consisting of 180 files each with 1440 lines, took 1 hour to start and finish using the copy data wizard tumbling window trigger settings of max concurrency = 10 | retry policy count = 3 | retry policy interval in seconds = 120. I did another pass using max concurrency = 50 (the default setting for a new tw trigger) | retry policy count = 1 | retry policy interval in seconds = 10, and that ripped through the same dataset in 2 minutes, more aligned with what I would have expected big data etl processing like adfV2 to deliver with this quantity of data and a simple 1:1 mapping into the target azure sql db. So documentation-wise I'll have to look for guidance on this front, given that with my naïve big data experience I'd be inclined to run this in the future with max concurrency = total number of blob storage files to process, if they can all be processed concurrently w/o any dependency on ordering, in order to start/finish this pipeline/trigger processing very quickly and in turn optimize the developer fail/fix/retry cycle.
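
For reference, the knobs I was adjusting between those two passes sit in the tumbling window trigger's typeProperties and look roughly like the sketch below (values shown are from my faster second pass; the frequency/interval reflect the daily windows discussed above, not a general recommendation):

"typeProperties": {
    "frequency": "Hour",
    "interval": 24,
    "startTime": "2016-01-01T00:00:00Z",
    "endTime": "2016-07-01T00:00:00Z",
    "maxConcurrency": 50,
    "retryPolicy": {
        "count": 1,
        "intervalInSeconds": 10
    }
}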

@terpie

terpie commented Mar 30, 2018

@myusrn For being able to re-run the trigger for the whole date range I was hoping to find something in PowerShell. I am new in this area, so I just installed the Azure PowerShell package and was able to log into Azure and run some adf trigger commands like the ones above. So I think this PowerShell route is the way to go, with json scripts to create/start/stop etc.
BTW I accidentally truncated the table, so SELECT count(*) FROM dbo.points is zero ;-) (that's the reason I want to re-run).
Concerning the speed of processing, that depends on the chosen speed of the storage and Azure SQL database. For this training I am using the slowest/cheapest option, so I will check how fast it performs. That will be next week. Thanks!

Martin.

@myusrn
Author

myusrn commented Mar 31, 2018

@terpie note that I simply installed the current vs17 15.6.4 bits with the workloads | web & cloud | azure development option enabled, and this appears to have installed the azure powershell modules for me. Presumably at some point we'll get adfV2 project template support as part of the azure development workload that will enable not only defining but also executing and monitoring adfV2 pipeline/trigger setups from within the vs17 ide, given there is an adfV1 project support story in the azure sdk for the vs15 ide environment.

@concat

concat commented Apr 2, 2018

@myusrn my github account name is “concat”; it makes for some interesting emails to me at times - like this one, which has a code snippet that triggered an email to me!

@terpie

terpie commented Apr 3, 2018

@myusrn Indeed the trigger with
"startTime": "2016-01-01T00:00:00Z",
"endTime": "2016-06-30T23:59:00Z",
does not process the last day, 2016-06-30 12:00 AM. After manually processing this last day my count(*) is 262080. To re-run the trigger for the whole period I had to stop-remove-set-start the trigger, so in effect remove and recreate it. I did this using PowerShell, but I would be interested in a command to clear the trigger's history instead of removing and re-creating the trigger.
It looks like with Set-AzureRmDataFactorySliceStatus you can reset the status of each dataset slice so it will run again, but there doesn't seem to be an adf-V2 version of this command?

@myusrn
Author

myusrn commented Apr 5, 2018

fyi, I received a pointer from another channel that deleting adfV2 triggers can be accomplished in the UI using azure portal < adfV2 instance > | author & monitor | author | triggers [ bottom left, just below connections ] | < your trigger > | actions | delete

@myusrn
Author

myusrn commented Apr 12, 2018

. . . a related documentation issue: how would one define an adfV2 tumbling window trigger to do a backfill scenario where the dates are every week on Saturday [ or every 7 days ] starting 2009/01/03 thru 2009/12/26?

Using adfV1 I tried setting up a schedule of 2009/01/03 thru 2009/12/26 and "availability": { "frequency": "Week", "interval": 1 } or { "frequency": "Day", "interval": 7 }, but neither of those generated start/end tumbling window dates that aligned with my 2009/01/03 starting date and every Saturday after that.

So I switched to "availability": { "frequency": "Day", "interval": 1 } with the plan of just accepting all the dates that would have “Waiting” status due to no data, which doesn't seem optimal.

I've now moved over to adfV2 to try to solve this same backfill scenario with a tumbling window trigger, where the dates are every week on Saturday [ or every 7 days ] starting 2009/01/03 thru 2009/12/26.

  • image of blob storage source needing tumbling window backfill solution can be found here

@linda33wj
Contributor

@myusrn for V2, the scheduler becomes quite flexible. Authoring the trigger through the UI would be much easier: go to your pipeline -> new/edit trigger -> you can then choose the start date (2009/01/03), the end date (an optional field, 2009/12/26), the recurrence (weekly), and also the advanced settings there (e.g. every Sat & Sun).

@myusrn
Author

myusrn commented Apr 15, 2018

@linda33wj thanks the pointer.

I am only seeing "Every Minute" and "Hourly" options in the new/edit trigger UI for TumblingWindow, but I do see the full set of options you mention in the new/edit trigger UI for Schedule, specifically Weekly with the advanced “Run on these days” option and Monthly with an even more extensive set of advanced recurrence settings. I set up the TumblingWindow with "Hourly" and a 168-hour interval [ 7 days * 24 hours ] and that did the trick.

Also, it appears that the adfV2 Copy Data wizard UI only exposes Run Now and a combined experience for the Schedule and TumblingWindow trigger configuration options. In the trigger case it appeared that if you picked weekly, it pinned the start day to the Sunday of whatever date you entered, e.g. I entered 01/03/2009 as the start date, which is a Saturday, and the copy data wizard-created trigger was set up with 12/28/2008 as the start date, which is the Sunday of the week containing the date I entered. So I had to re-create the copy data wizard trigger setup using the authoring environment's Triggers section to get one that started on the actual Saturday and ran every 7 days / 168 hours after that for the backfill dates I needed to cover.
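
For anyone else needing this backfill shape, a sketch of the kind of tumbling window trigger definition corresponding to those UI settings is below (the trigger and pipeline names are placeholders, and per the earlier end-date observation the endTime is pushed one 168-hour interval past the last Saturday so that 2009/12/26 actually gets processed):

{
    "name": "TR-Tumbling-Weekly-Backfill",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 168,
            "startTime": "2009-01-03T00:00:00Z",
            "endTime": "2010-01-02T00:00:00Z",
            "maxConcurrency": 10
        },
        "pipeline": {
            "pipelineReference": {
                "referenceName": "<your pipeline name>",
                "type": "PipelineReference"
            },
            "parameters": {
                "windowStart": "@trigger().outputs.windowStartTime"
            }
        }
    }
}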


I cannot use pipeline().TriggerTime when trying to grab a folder formatted as dd mm yyyy.
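
For context, the kind of dataset folder path expression being attempted is roughly this (the container/prefix is a placeholder):

"folderPath": {
    "value": "@concat('<container/prefix>/', formatDateTime(pipeline().TriggerTime, 'dd MM yyyy'))",
    "type": "Expression"
}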

@JosHaemers

Hi everybody,
We have a pipeline that should copy data from blob storage to a SQL Database. The folder structure of the blob is:
../resourceId=/SUBSCRIPTIONS/[SubscriptionID]/RESOURCEGROUPS/[ResourceGroup]/PROVIDERS/MICROSOFT.DATAFACTORY/FACTORIES/[Datafactory]/Y=9999/M=99/D=99/H=99/M=9
So, in order to get data from the past, we use a tumbling window trigger. But we always receive the following error message when we 'Trigger Now' the pipeline:
ErrorCode=UserErrorSourceBlobNotExist,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The required Blob is missing.

We’ve no idea what is going wrong, because we use the same credentials as always.

We've set up the trigger, pipeline, and copy activity as follows:
Trigger:
{
    "name": "TR-Tumbling-LOG-Pipeline",
    "properties": {
        "description": "",
        "runtimeState": "Stopped",
        "pipeline": {
            "pipelineReference": {
                "referenceName": "PL-Master-Logging-PipelineRun",
                "type": "PipelineReference"
            },
            "parameters": {
                "windowStart": "@trigger().outputs.windowStartTime",
                "windowEnd": "@trigger().outputs.windowEndTime"
            }
        },
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 1,
            "startTime": "2018-12-01T13:10:00.000Z",
            "endTime": "2019-01-23T13:10:00.000Z",
            "delay": "00:00:00",
            "maxConcurrency": 50,
            "retryPolicy": {
                "intervalInSeconds": 30
            }
        }
    }
}

in dataset:
"fileName": "PT1H.json",
"folderPath": {
    "value": "@concat('insights-logs-pipelineruns/resourceId=/SUBSCRIPTIONS/999999999/RESOURCEGROUPS/OBVRG00012/PROVIDERS/MICROSOFT.DATAFACTORY/FACTORIES/XXXXXXXXXXX/y=',formatDateTime(dataset().windowStart, 'yyyy'), '/m=', formatDateTime(dataset().windowStart, 'MM'), '/d=', formatDateTime(dataset().windowStart, 'dd'), '/h=', formatDateTime(dataset().windowStart, 'HH'), '/m=00')",
    "type": "Expression"
}

in pipeline:
"parameters": {
    "windowStart": {
        "type": "String",
        "defaultValue": "2018-12-01T13:10:00Z"
    },
    "windowEnd": {
        "type": "String",
        "defaultValue": "2019-01-23T13:10:00Z"
    }
},

in copy activity:
"inputs": [
    {
        "referenceName": "src_blb_log_pipelinerun",
        "type": "DatasetReference",
        "parameters": {
            "windowStart": {
                "value": "@pipeline().parameters.windowStart",
                "type": "Expression"
            },
            "windowEnd": {
                "value": "@pipeline().parameters.windowEnd",
                "type": "Expression"
            }
        }
    }
],

@linda33wj
Contributor

@JosHaemers the error message should return the exact path the Copy activity was looking for and could not find, so you can double-check that the blob exists. Per your config, I'd suggest you double-check the folder path setting on the dataset, especially uppercase vs lowercase letters; note that the blob path is case-sensitive.

#please-close
