Task Factory File/Blob Enumeration is Slow #303
Comments
This is not unexpected if you have lots of blobs to enumerate. You can optimize your file-based task factory by prefix filtering directly to the blob virtual directory (on the server side to eliminate filtering on the client side):

    task_factory:
      file:
        azure_storage:
          storage_account_settings: mystorageaccount
          remote_path: projects/shipyard_eval/pgus/sku_eval_file
          include:
          - '*.csv'
          is_file_share: false
The above is the expected behavior. After reviewing the code, I found a defect preventing prefix matching from filtering on the server side.
@alfpark Thanks for your suggestion. In my case, I only have around 10 files in the blob. Would it still take such a long time to enumerate all the files?
The entire container only has 10 blobs total?
@alfpark I mean I only have around 10 files to enumerate, which would create 10 tasks. The whole container contains around ten thousand files.
That is expected behavior in this case (with the defect). I'll attempt to fix the prefix filter not being applied; then, if you use my suggested yaml above, the task generation should be quick.
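To illustrate why enumeration is slow even when only about ten blobs match, here is a minimal Python sketch of the two filtering strategies discussed above. It is not Batch Shipyard's code and does not call any Azure SDK: `FakeContainer`, the blob names, and the `eval/inputs/` prefix are all hypothetical stand-ins, and the `returned` counter only approximates how many entries a listing call would send back to the client.

```python
import fnmatch


class FakeContainer:
    """In-memory stand-in for a blob container (hypothetical, not an SDK class)."""

    def __init__(self, names):
        self._names = sorted(names)
        self.returned = 0  # number of blob names handed back to the client

    def list_blobs(self, prefix=None):
        """Yield blob names, optionally restricted to a prefix on the 'server'."""
        for name in self._names:
            if prefix is None or name.startswith(prefix):
                self.returned += 1
                yield name


def enumerate_client_side(container, prefix, include):
    """List the whole container, then discard non-matching names locally."""
    return [
        name for name in container.list_blobs()
        if name.startswith(prefix) and fnmatch.fnmatchcase(name, include)
    ]


def enumerate_server_side(container, prefix, include):
    """Request only the prefix, then apply the include glob locally."""
    return [
        name for name in container.list_blobs(prefix=prefix)
        if fnmatch.fnmatchcase(name, include)
    ]


def build_names():
    """Roughly ten thousand unrelated blobs plus ten CSV files under one prefix."""
    names = [f"other/blob_{i:05d}.bin" for i in range(9990)]
    names += [f"eval/inputs/file_{i}.csv" for i in range(10)]
    return names


PREFIX = "eval/inputs/"  # hypothetical virtual directory
INCLUDE = "*.csv"

slow = FakeContainer(build_names())
slow_matches = enumerate_client_side(slow, PREFIX, INCLUDE)

fast = FakeContainer(build_names())
fast_matches = enumerate_server_side(fast, PREFIX, INCLUDE)

print(len(slow_matches), slow.returned)  # 10 10000
print(len(fast_matches), fast.returned)  # 10 10
```

Both strategies produce the same ten matches, but the client-side version has to receive every entry in the container first. Against a real storage account each batch of listed entries is a network round trip, which is consistent with the long submission times reported here.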
@alfpark I attempted your suggested yaml format. However, it looks like it is enumerating the whole container (which is named as
It's not fixed yet; the fix is coming right now.
Please check the devops build for the commit above, you can use the
Thanks @alfpark. I tried to submit the pool under the
My pool.yaml is the following:
You need to install/upgrade: https://github.com/Azure/batch-shipyard/blob/master/docs/01-batch-shipyard-installation.md#upgrading-to-new-releases
@alfpark Thanks, I have fixed this issue. However, my task submission still gave me the following output:
My config for the task factory looks like:
Sorry, I forgot to switch back to
@JadenLy, thanks for testing! It'll be rolled up into the next hotfix release.
Problem Description
I have been using shipyard for a while. There is one issue that I noticed, which is the uploading speed. It takes a long time for me to complete the job submission process.
Batch Shipyard Version
3.7.0
Steps to Reproduce
For my job, I used a jobs.yaml similar to the following:
Submitting pool is pretty quick. Submitting a job is sometimes pretty quick when the job does not iterate over files in blob storage, but it takes a long time in this case.
Expected Results
I hope that the submission can be done in one minute.
Actual Results
It takes about 20 minutes or more to complete the process.
Redacted Configuration
Additional Logs
Additional Comments
Let me know if there is any additional information I can provide to help.