Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Replace ByteArrayInputStream and ByteArrayOutputStream usage with something more efficient #2049

Closed
punktilious opened this issue Mar 10, 2021 · 1 comment
Assignees

Comments

@punktilious
Copy link
Collaborator

punktilious commented Mar 10, 2021

Issues with ByteArrayOutputStream:

  1. Access to the underlying byte[] copies the array which can increase memory pressure in some high throughput scenarios where large buffers are involved;
  2. Many methods are synchronized.

Issues with ByteArrayInputStream:

  1. Many methods are synchronized.

Implement a new class which can provide InputStream and OutputStream accessors, using a single byte[] which can be resized if needed, or avoid array copies if an ideal size is given to begin with.

Replace ByteArrayOutputStream and ByteArrayInputStream where they are used to compress/decompress resource payloads.

@punktilious punktilious self-assigned this Mar 16, 2021
@punktilious punktilious added this to the Sprint 2021-04 milestone Mar 16, 2021
@lmsurpre
Copy link
Member

lmsurpre commented Mar 23, 2021

I ran an export after getting this change deployed and the performance looks good to me:

INFO bulkexportfastjob[164] processed 138753 resources in 21.34 seconds (rate=6501.5 resources/second)
INFO bulkexportfastjob[164]                          Patient      32618
INFO bulkexportfastjob[164]                        Condition     360376
INFO bulkexportfastjob[164]                      Observation    6359819
INFO bulkexportfastjob[164] -------------------------------- ----------
INFO bulkexportfastjob[164]                            TOTAL    6752813

Prior to this change, I'd seen export times over 10,000 resources/second, but back then we were also leaking memory (due to #2040 ).

Lets call this one 'good' and we can revisit in the future as part of a broader performance analysis.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants