Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUM-3796: Optimise BatchFileOrchestator performance #1968

Merged

Conversation

0xnm
Copy link
Contributor

@0xnm 0xnm commented Apr 5, 2024

What does this PR do?

This PR optimise BatchFileOrchestator performance by doing the following changes:

  • Remove File.isFile check - this is expensive syscall and it should be only files in the particular folder anyway (it is only BatchFileOrchestrator who works with that folder) and we check file names anyway. Also File.isFile doc says: Any non-directory file created by a Java application is guaranteed to be a normal file.
  • Remove regex for the batch file name check in favor of conversion to Long.
  • Don't sort files where it is not needed
  • Reuse files collection where it is possible to minimise the number of syscalls.
  • Remove file cache (it is not really needed after File.isFile check is removed as benchmarks show below)
  • Increase cleanup check frequency from 1s to 5s.

Benchmarks

The testing approach was the following: generate 500 files of size ranging from 1Kb to 32Kb, evenly distributed in the last 24h by their names as timestamps. Then do 100 calls to getWritableFile passing forceNewFile = true at every even call, and forceNewFile = false at every odd call.

Tests:

  1. no cache, no cleanup frequency threshold (meaning cleanup can be executed at every call to getWritableFile)
  2. no cache, cleanup frequency threshold = 1s
  3. cache for 400 files, cleanup frequency threshold = 1s
  4. cache for 800 files, cleanup frequency threshold = 1s

Table below shows the average duration of each getWritableFile call in milliseconds.

1 2 3 4
Baseline 136 40 23 23
File.isFile removed 95 19 22 22
Remove regex 88 20 21 21
Remove sorting if not needed 86 21 22 22
Reuse files collection 65 20 21 21

NB: it shouldn't be considered as a proper benchmarking methodology (JMH is not compatible with AGP, there is some variability, etc.), but can give an idea of the impact for the improvements.

Last 2 changes (cache removal and change of the cleanup frequency) are not listed, because they directly impact tests.

All changes are in separate commits.

Review checklist (to be filled by reviewers)

  • Feature or bugfix MUST have appropriate tests (unit, integration, e2e)
  • Make sure you discussed the feature or bugfix with the maintaining team in an Issue
  • Make sure each commit and the PR mention the Issue number (cf the CONTRIBUTING doc)

@0xnm 0xnm requested review from a team as code owners April 5, 2024 09:51
Copy link
Collaborator

@xgouchet xgouchet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really nice job there !

get() = File("${this.path}_metadata")
private fun listSortedBatchFiles(): List<File> {
// note: since it is using File#compareTo, lexicographical sorting will be used, meaning "10" comes before "9".
// but for our needs it is fine, because the moment when Unix timestamp adds one more digit will be in 2286.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few, I guess it'll be for others to deal with this then…

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless SDK is installed on time machine device

@0xnm 0xnm force-pushed the nogorodnikov/rum-3796/optimise-batch-file-orchestrator branch from 916d402 to 949f2c9 Compare April 5, 2024 12:49
@0xnm 0xnm force-pushed the nogorodnikov/rum-3796/optimise-batch-file-orchestrator branch from 949f2c9 to 63471a2 Compare April 5, 2024 12:50
@codecov-commenter
Copy link

codecov-commenter commented Apr 5, 2024

Codecov Report

Merging #1968 (63471a2) into develop (f5aebeb) will decrease coverage by 0.01%.
Report is 2 commits behind head on develop.
The diff coverage is 92.86%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1968      +/-   ##
===========================================
- Coverage    83.40%   83.38%   -0.01%     
===========================================
  Files          489      489              
  Lines        17923    17902      -21     
  Branches      2669     2669              
===========================================
- Hits         14947    14927      -20     
- Misses        2237     2241       +4     
+ Partials       739      734       -5     
Files Coverage Δ
...internal/persistence/file/FilePersistenceConfig.kt 100.00% <ø> (ø)
...al/persistence/file/batch/BatchFileOrchestrator.kt 94.04% <92.86%> (+0.37%) ⬆️

... and 27 files with indirect coverage changes

@0xnm 0xnm merged commit 1b75c02 into develop Apr 8, 2024
23 checks passed
@0xnm 0xnm deleted the nogorodnikov/rum-3796/optimise-batch-file-orchestrator branch April 8, 2024 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants