feat(v2): parallel batch processing #1620

Merged: 15 commits, Jul 3, 2024

Conversation

jeromevdl
Contributor

Issue #, if available: #1540

Description of changes:

The batch module can now process items in parallel rather than sequentially. This PR introduces a new method, processBatchInParallel, in BatchMessageHandler. It works with SQS, Kinesis, and DynamoDB Streams, but not with SQS FIFO queues, where parallel processing would defeat the ordering guarantee.
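
To make the idea concrete, here is a minimal, self-contained sketch of processing a batch in parallel while collecting the items that failed, assuming plain JDK parallel streams. It is not the Powertools implementation and does not use the Powertools API: `ParallelBatchSketch` and `failedItems` are hypothetical names made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;

// Hypothetical illustration only: not the Powertools batch module.
final class ParallelBatchSketch {

    /**
     * Runs the handler on every item concurrently and returns the items whose
     * handler threw. The returned list keeps the order of the input batch,
     * even though the handler calls themselves may finish in any order.
     */
    static <T> List<T> failedItems(List<T> batch, Consumer<T> handler) {
        return batch.parallelStream()
                .filter(item -> !succeeds(item, handler))
                .collect(Collectors.toList());
    }

    /** Returns true if the handler completed, false if it threw. */
    private static <T> boolean succeeds(T item, Consumer<T> handler) {
        try {
            handler.accept(item);
            return true;
        } catch (RuntimeException e) {
            return false; // one failing item must not abort the rest of the batch
        }
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("a", "bad-1", "b", "bad-2");
        List<String> failed = failedItems(batch, item -> {
            if (item.startsWith("bad")) {
                throw new IllegalStateException("cannot process " + item);
            }
        });
        System.out.println(failed); // prints [bad-1, bad-2]
    }
}
```

Because the handler calls may complete in any order, an approach like this cannot honour in-order processing, which is why SQS FIFO is excluded.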

Checklist

Breaking change checklist

RFC issue #:

  • Migration process documented
  • Implement warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

github-actions bot commented Apr 5, 2024

💾 Artifacts Size Report

Module                           Version         Size (KB)
powertools-common                2.0.0-SNAPSHOT  9.59
powertools-serialization         2.0.0-SNAPSHOT  17.23
powertools-logging               2.0.0-SNAPSHOT  33.07
powertools-logging-log4j         2.0.0-SNAPSHOT  20.69
powertools-logging-logback       2.0.0-SNAPSHOT  16.91
powertools-tracing               2.0.0-SNAPSHOT  14.01
powertools-metrics               2.0.0-SNAPSHOT  14.05
powertools-parameters            2.0.0-SNAPSHOT  17.49
powertools-validation            2.0.0-SNAPSHOT  19.94
powertools-cloudformation        2.0.0-SNAPSHOT  16.58
powertools-idempotency-core      2.0.0-SNAPSHOT  34.63
powertools-idempotency-dynamodb  2.0.0-SNAPSHOT  12.37
powertools-large-messages        2.0.0-SNAPSHOT  17.45
powertools-batch                 2.0.0-SNAPSHOT  21.49
powertools-parameters-ssm        2.0.0-SNAPSHOT  10.70
powertools-parameters-secrets    2.0.0-SNAPSHOT  9.90
powertools-parameters-dynamodb   2.0.0-SNAPSHOT  11.95
powertools-parameters-appconfig  2.0.0-SNAPSHOT  11.45

@jeromevdl changed the title from "feat: parallel batch processing" to "feat(v2): parallel batch processing" on Apr 5, 2024
@jeromevdl added the v2 (Version 2) and enhancement (New feature or request) labels on Apr 5, 2024
Contributor

@scottgerring left a comment

Nice! Some feedback inline

Review thread on docs/utilities/batch.md (outdated, resolved)

sonarcloud bot commented Apr 21, 2024

Quality Gate passed

Issues
6 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

@humanzz
Contributor

humanzz commented Apr 29, 2024

Hi folks,

I'm quite happy to see you're working on this. I know my team would love to have this ability.

I have not read the PR yet, but thought I'd leave a couple of comments to consider - if you haven't already.

In some of our Lambda functions, where we use Powertools (more specifically logging and tracing), we leverage concurrency to speed up our Lambda requests. Depending on the use case, this concurrency is achieved using (1) AWS SDK async clients (which return CompletableFutures), (2) other types of clients that might have their own thread pools, or (3) explicit executor services (and sometimes virtual threads from Java 21).

The core challenges we tend to see are in logging and tracing:

  1. At the beginning of handling a request, we like to add some properties to both the Powertools logging context and the metrics context. This allows us to easily debug issues and trace requests via business-relevant identifiers.
  2. We see challenges with X-Ray tracing, e.g. either no trace is emitted or the trace has the wrong parent. All of this is likely due to the X-Ray library depending on thread locals for this lineage tracking, which becomes very challenging with existing thread pools we don't control. Most of the time we end up having to write custom code to try to pass relevant info across threads when possible.
  3. We also see challenges with the logging context, again with thread pools we don't control.
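
The thread-local behaviour behind points 2 and 3 can be reproduced with plain JDK classes. The sketch below assumes a single `ThreadLocal` standing in for a logging context or trace parent; it does not use the X-Ray or Powertools APIs, and `ContextPropagationSketch`, `REQUEST_ID`, `withContext` and `demo` are hypothetical names made up for illustration. It shows a pool thread seeing no value by default, and a small wrapper that captures the value on the submitting thread and restores it around the task.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only: not X-Ray or Powertools code.
final class ContextPropagationSketch {

    // Stand-in for thread-bound state such as a logging context or trace parent.
    static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    /** Captures the caller's value now and installs it around the task on the worker thread. */
    static <T> Callable<T> withContext(Callable<T> task) {
        String captured = REQUEST_ID.get(); // read on the submitting thread
        return () -> {
            String previous = REQUEST_ID.get();
            REQUEST_ID.set(captured);
            try {
                return task.call();
            } finally {
                // Restore the worker's previous state so pooled threads do not leak context.
                if (previous == null) {
                    REQUEST_ID.remove();
                } else {
                    REQUEST_ID.set(previous);
                }
            }
        };
    }

    /** Reads the value on a pool thread, first unwrapped and then wrapped. */
    static String[] demo() {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            REQUEST_ID.set("order-42");
            Callable<String> readId = REQUEST_ID::get;
            String plain = pool.submit(readId).get();
            String wrapped = pool.submit(withContext(readId)).get();
            return new String[] {plain, wrapped};
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            REQUEST_ID.remove();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] seen = demo();
        System.out.println("without wrapper: " + seen[0]); // prints without wrapper: null
        System.out.println("with wrapper: " + seen[1]);    // prints with wrapper: order-42
    }
}
```

A wrapper like this only helps where the task can be wrapped at submission time, which is exactly what is hard with thread pools owned by other libraries.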

I feel this change is likely the first time that Powertools is handling concurrency. I wonder if, as part of that or as a follow-up, the challenges with concurrency in tracing/logging could be looked at, to see if there's any level of help Powertools can provide.

codecov bot commented Jun 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.08%. Comparing base (82d4b30) to head (789b91b).
Report is 75 commits behind head on v2.

Current head 789b91b differs from pull request most recent head 395f37a

Please upload reports for the commit 395f37a to get more accurate results.

Additional details and impacted files
@@              Coverage Diff              @@
##                 v2    #1620       +/-   ##
=============================================
- Coverage     89.79%   76.08%   -13.71%     
- Complexity      406      426       +20     
=============================================
  Files            44       40        -4     
  Lines          1274     1560      +286     
  Branches        165      240       +75     
=============================================
+ Hits           1144     1187       +43     
- Misses           88      288      +200     
- Partials         42       85       +43     

@scottgerring
Contributor

> We see challenges with X-Ray tracing, e.g. either no trace is emitted or the trace has the wrong parent. All of this is likely due to the X-Ray library depending on thread locals for this lineage tracking, which becomes very challenging with existing thread pools we don't control. Most of the time we end up having to write custom code to try to pass relevant info across threads when possible.

Hey @humanzz ! Sorry for the slow reply; it's been busy over here.
@jeromevdl and I were speaking about this this morning. The error mode you describe from your experience with pooling - missing traces and strange parents - is the sort of thing I am keen to see us avoid here.

I think the broader issue of "things that need to propagate information across thread boundaries don't work well" is something we may well be positioned to help with - certainly with logging and tracing at least. I have opened #1670 to track this.

@scottgerring
Contributor

@jeromevdl how about we merge this without the fix for x-ray and handle that on another PR?

Given the x-ray and log correlation is likely going to be a bigger thing, and we are talking about a SNAPSHOT here, I think the downside of merging this into v2 is low - some x-ray traces won't correlate well on a preview release - and it will let us deal with the bigger issue asynchronously rather than scope-creeping this PR.

@jeromevdl
Contributor Author

> @jeromevdl how about we merge this without the fix for x-ray and handle that on another PR?
>
> Given the x-ray and log correlation is likely going to be a bigger thing, and we are talking about a SNAPSHOT here, I think the downside of merging this into v2 is low - some x-ray traces won't correlate well on a preview release - and it will let us deal with the bigger issue asynchronously rather than scope-creeping this PR.

I agree, let's review it as it is today and create a new issue for multithreading support.

@jeromevdl
Contributor Author

See #1671

@scottgerring
Contributor

@jeromevdl I'll review this tomorrow morning.

@scottgerring self-requested a review on June 25, 2024 15:17
Review thread on docs/utilities/batch.md (outdated, resolved)

sonarcloud bot commented Jul 2, 2024

Contributor

@scottgerring left a comment

🤝

@scottgerring merged commit 1fa11c1 into v2 on Jul 3, 2024
12 checks passed
@scottgerring deleted the feat/parallel-batch branch on July 3, 2024 13:47
Labels
enhancement (New feature or request), size/XXL, v2 (Version 2)
3 participants