[BUG] PagedIterable<BlobItem> stream().parallel() is behaving as sequential #40768

Open
3 tasks done
varenyavv opened this issue Jun 21, 2024 · 2 comments
Labels
Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files)

Comments

@varenyavv

Describe the bug
PagedIterable<BlobItem> stream().parallel() is behaving as sequential.

Exception or Stack Trace
The log shows a single ForkJoinPool worker performing every task, even though the pool has 20 workers.

2024-06-21 17:59:21,275 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939632.xml
2024-06-21 17:59:22,208 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939814.xml
2024-06-21 17:59:23,143 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939895.xml
2024-06-21 17:59:24,065 INFO  [ForkJoinPool.commonPool-worker-1] c.o.g.f.v.s.o.IndividualRequestXmlAggregator gpfh-vendor-outbound-job batchId: '2000' sourceSystem: 'GPS': Downloading blob 789/generated/2023-12-03/1232939934.xml

To Reproduce
Steps to reproduce the behavior:
Use the code snippet below to list the blobs. The logs show the work running serially, which makes processing slow when the blob count is in the millions.

Code Snippet

BlobServiceClient blobServiceClient = new BlobServiceClientBuilder().connectionString(connectionString).buildClient();
BlobContainerClient blobContainerClient = blobServiceClient.getBlobContainerClient(blobContainerName);
String blobPrefix = "789/generated/";
ListBlobsOptions options = new ListBlobsOptions().setPrefix(blobPrefix).setMaxResultsPerPage(5000);
PagedIterable<BlobItem> blobItems = blobContainerClient.listBlobs(options, null, null);
blobItems.stream()
        .parallel()
        .filter(this::isAnXml)
        .forEach(
            blobItem -> {
              LOGGER.info("Downloading blob {}", blobItem.getName());
              //more business logic
            });

Expected behavior
Execution should be spread across multiple worker threads in parallel rather than running sequentially on a single thread.
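
For comparison, here is a minimal sketch of one way to get the expected fan-out without relying on the stream's own parallelism: iterate the listing on the calling thread and hand each matching item to an explicit thread pool. This is only an illustration and has not been confirmed by the maintainers as a recommended approach. The class name `ParallelBlobProcessing`, the method `processInParallel`, and the suffix check inside `isAnXml` are hypothetical. Plain `String` names stand in for `BlobItem` so the example runs without the Azure SDK.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ParallelBlobProcessing {

    // Hypothetical stand-in for the isAnXml filter in the snippet above;
    // the real method body is not shown in this report.
    public static boolean isAnXml(String blobName) {
        return blobName.endsWith(".xml");
    }

    // Walks the listing sequentially on the calling thread and submits each
    // matching item to a fixed-size pool, so the per-item work runs concurrently.
    public static Set<String> processInParallel(Iterable<String> blobNames, int threads) {
        Set<String> processed = ConcurrentHashMap.newKeySet();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String name : blobNames) {
                if (isAnXml(name)) {
                    // Placeholder for the download and business logic.
                    pool.execute(() -> processed.add(name));
                }
            }
        } finally {
            // Stop accepting new tasks; already submitted tasks still run.
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("Timed out waiting for workers");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for workers", e);
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "789/generated/a.xml", "789/generated/b.txt", "789/generated/c.xml");
        System.out.println(processInParallel(names, 4));
    }
}
```

With a real `PagedIterable<BlobItem>`, the loop would iterate over `blobItems` and use `blobItem.getName()`. Note that `Executors.newFixedThreadPool` uses an unbounded task queue, so for listings in the millions a bounded queue or batching per page would likely be needed to keep memory in check.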

Setup (please complete the following information):
OS: Ubuntu 22.04.4 LTS
IDE: IntelliJ
Library/Libraries: com.azure:azure-storage-blob:12.25.0
Java version: OpenJDK 17.0.10
App Server/Environment: Tomcat embedded in Spring Boot
Frameworks: Spring Boot v3.2.3

Information Checklist
Kindly make sure that you have added all of the following information above and checked off the required fields; otherwise we will treat the issue as an incomplete report.

  • Bug Description Added
  • Repro Steps Added
  • Setup information Added
@github-actions github-actions bot added Client This issue points to a problem in the data-plane of the library. customer-reported Issues that are reported by GitHub users external to the Azure organization. needs-team-attention This issue needs attention from Azure service team or SDK team question The issue doesn't require a change to the product in order to be resolved. Most issues start as that Storage Storage Service (Queues, Blobs, Files) labels Jun 21, 2024

Thank you for your feedback. Tagging and routing to the team member best able to assist.
