Fast export implementation fails to close GZIPInputStream #2040
Labels: bug (Something isn't working)

Comments
lmsurpre added a commit that referenced this issue on Mar 9, 2021
The main one was in ResourcePayload.transferTo because, without this, we never free the non-heap memory being held by the GZIPInputStream.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
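A minimal sketch of the shape of that fix (the class and method names below are illustrative stand-ins, not the actual ResourcePayload code): wrapping the GZIPInputStream in try-with-resources guarantees close() runs even if the copy fails, and close() is what ends the internal Inflater and releases its native zlib buffers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPInputStream;

public class PayloadTransferSketch {
    /**
     * Decompress the payload stream into the target output stream.
     * try-with-resources guarantees GZIPInputStream.close() is called,
     * which ends the internal Inflater and frees its native zlib memory.
     */
    public static long transferTo(InputStream compressedPayload, OutputStream out)
            throws IOException {
        try (InputStream in = new GZIPInputStream(compressedPayload)) {
            long total = 0;
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }
}
```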
lmsurpre added a commit that referenced this issue on Mar 10, 2021

Remove the dataStream.close() from BulkDataUtils.multiPartUpload and add javadoc to make it clear this isn't closed. Also some formatting/cleanup in classes which call this (while I was making sure none of them need to be closed). Additionally, I updated the ImportTransientUserData.Builder builder signatures to return ImportTransientUserData builders and objects instead of ImportCheckPointData; I accomplished this by following the same pattern we use in fhir-model.

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
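A sketch of the ownership convention that change establishes (the signature below is hypothetical, not the real BulkDataUtils API): the method reads from the stream it is given but never closes it, so responsibility for closing stays with the caller.

```java
import java.io.InputStream;

public class MultiPartUploadSketch {
    /**
     * Upload the next part of a multipart upload from the given stream.
     * <p>
     * Note: this method deliberately does NOT close {@code dataStream}.
     * The caller owns the stream and must close it (ideally with
     * try-with-resources) once the whole upload is finished.
     */
    public static void multiPartUpload(InputStream dataStream, String bucketName,
            String itemName, String uploadId, int partNumber) {
        // read up to the part size from dataStream and send it as one part;
        // intentionally no dataStream.close() here
    }
}
```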
lmsurpre added a commit that referenced this issue on Mar 10, 2021

I think that we inadvertently changed the S3Provider.writeResources logic as part of a late change in #1999 (the one where we set `chunkData.isFinishCurrentUpload` when getPageNum > lastPageNum). Specifically, in the NDJSON case, we still want to start writing parts as soon as we have crossed the partUploadTriggerSize. This fixes that and removes some now-unnecessary `or` expressions from S3Provider. I also made a number of config-related changes (some of which are probably more opinion-based than strictly needed):
* Reconciled the Fast impl's `partUploadTriggerSize` with the legacy impl's `minPartSize` (by adopting the fast impl's name)
* Updated the maxFileSizeThreshold and minPartSize params to `objectSizeThreshold` and `objectResourceCountThreshold` respectively
* Added "MB" after the size names to make it clear what the units are without needing to look at the docs
* Changed ExportCheckpointAlgorithm to handle threshold values of 0 as "no threshold" (formerly these were -1, but that is now impossible to set because we multiply the config values in our getters)
* Fixed the javadoc in ConfigurationAdapter accordingly

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
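A sketch of the restored trigger logic (the field names are stand-ins for the real chunkData state, not the actual S3Provider code): in the NDJSON path, a part is written as soon as the in-memory buffer crosses partUploadTriggerSize, independent of the page bookkeeping.

```java
import java.io.ByteArrayOutputStream;

public class PartTriggerSketch {
    private final ByteArrayOutputStream bufferStream = new ByteArrayOutputStream();
    private final long partUploadTriggerSize = 10L * 1024 * 1024; // e.g. 10 MB
    private boolean isFinishCurrentUpload;

    /** Write a part once enough NDJSON bytes are buffered, or when finishing. */
    boolean shouldWritePart() {
        return bufferStream.size() >= partUploadTriggerSize || isFinishCurrentUpload;
    }
}
```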
lmsurpre added a commit that referenced this issue on Mar 10, 2021

1. coreCosObjectResourceCountThreshold is now consistently an int
2. systemExportImpl is now "fast" in our default config

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
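The previous commit message notes that the getters multiply the configured values; a sketch of why that makes 0 (rather than -1) the natural "no threshold" sentinel (the method name and scaling are hypothetical):

```java
public class SizeConfigSketch {
    /**
     * Scale a size configured in MB up to bytes. A configured 0 passes
     * through unchanged and can still mean "no threshold", whereas a -1
     * sentinel would be distorted by the multiplication.
     */
    public static long toBytesThreshold(int configuredMB) {
        return configuredMB * 1024L * 1024L;
    }
}
```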
lmsurpre added a commit that referenced this issue on Mar 10, 2021

1. Previously we were looping through all of the ReadResultDTOs (presumably 1 per page of search results) and entering our logic for each page. However, the chunkData buffer stream contains the content from all the results, so this doesn't make much sense to me.
2. Renamed lastWritePageNum to lastWrittenPageNum to make its purpose more clear (more distinct from lastPageNum)
3. Updated ExportOperationTest to properly look for an 'output' key in the response JSON, not just anywhere in the response body

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
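A sketch of the point in item 1 (the types and method names here are invented for illustration): since the chunk's buffer stream already aggregates every page's content, the upload check belongs after the loop, once per chunk, not once per DTO.

```java
import java.io.ByteArrayOutputStream;
import java.util.List;

public class ChunkWriteSketch {
    static class ReadResultDTO { /* one page of search results */ }

    static class ChunkData {
        final ByteArrayOutputStream bufferStream = new ByteArrayOutputStream();
    }

    void writeResources(ChunkData chunkData, List<ReadResultDTO> dtos) {
        // before: the threshold check ran inside a loop over dtos, once per page
        // after: the shared buffer holds all pages' content, so check it once
        maybeUploadPart(chunkData);
    }

    void maybeUploadPart(ChunkData chunkData) {
        // upload the buffered bytes as a part if a threshold has been crossed
    }
}
```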
prb112 added a commit that referenced this issue on Mar 10, 2021
issue #2040 - close input streams in finally blocks
After finding and fixing the leak, I've launched numerous exports in our test environment and the pods are properly staying within their limits.
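A sketch of the pattern that commit title describes, for places where try-with-resources doesn't fit (e.g. the stream is opened earlier or handed in): close() lands in a finally block so an exception mid-read cannot leak the Inflater's native memory.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

public class FinallyCloseSketch {
    static void consume(InputStream raw) throws IOException {
        InputStream in = new GZIPInputStream(raw);
        try {
            byte[] buffer = new byte[8192];
            while (in.read(buffer) != -1) {
                // process the decompressed bytes
            }
        } finally {
            in.close(); // runs even on exception; ends the Inflater
        }
    }
}
```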
lmsurpre added a commit that referenced this issue on Mar 13, 2021

1. Previously we were looping through all of the ReadResultDTOs (presumably 1 per page of search results) and entering our logic for each page. However, the chunkData buffer stream contains the content from all the results, so this doesn't make much sense to me.
2. Renamed lastWritePageNum to lastWrittenPageNum to make its purpose more clear (more distinct from lastPageNum)
3. Updated ExportOperationTest to properly look for an 'output' key in the response JSON, not just anywhere in the response body
4. Added logging to ExportCheckpointAlgorithm
5. ExportTransientUserData.Builder methods now return ExportTransientUserData instead of the super type (the same change was made to ImportTransientUserData previously)
6. The page number now matches the actual page number read; previously it was always 1 higher
7. Set the checkpoint-policy of Group and Patient export to 'custom' to indicate that the ExportCheckpointAlgorithm will be used to control checkpointing

Signed-off-by: Lee Surprenant <lmsurpre@us.ibm.com>
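For item 5, a sketch of the covariant-return builder pattern being referenced (these classes are illustrative stand-ins, mirroring the fhir-model style mentioned in an earlier commit): the subclass builder overrides each setter so chained calls keep the subtype, and build() returns the subtype too.

```java
class CheckPointDataSketch {
    static class Builder {
        protected long pageNum;

        Builder pageNum(long pageNum) {
            this.pageNum = pageNum;
            return this;
        }

        CheckPointDataSketch build() {
            return new CheckPointDataSketch();
        }
    }
}

class TransientUserDataSketch extends CheckPointDataSketch {
    static class Builder extends CheckPointDataSketch.Builder {
        @Override
        Builder pageNum(long pageNum) { // covariant return: stays the subtype
            super.pageNum(pageNum);
            return this;
        }

        @Override
        TransientUserDataSketch build() {
            return new TransientUserDataSketch();
        }
    }
}
```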
lmsurpre added a commit that referenced this issue on Mar 15, 2021
issue #2040 - cleanup in S3Provider.writeResources
Describe the bug
In our Kubernetes deployment, our pod kept getting killed for going over its memory limit. This was very strange because we have the max heap size set to 4 GB.
To Reproduce
Run multiple large exports at the same time.
Expected behavior
Export may slow down with load, but pods shouldn't be leaking memory like this.
Additional context
It looks like we're failing to close the GZIPInputStream that is used in this fast export impl; GZIPInputStream uses an Inflater internally, which holds native zlib memory that is only freed on an explicit call to close().
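A synthetic sketch of that failure mode (not the actual export code): each unclosed GZIPInputStream pins native zlib buffers through its Inflater, so process memory grows while the Java heap stays comfortably under -Xmx, which matches a container OOM kill despite a 4 GB heap cap. Keeping the streams reachable in a list here stands in for the real code path that simply never called close().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class NativeLeakSketch {
    public static void main(String[] args) throws IOException {
        byte[] gz = gzip("{\"resourceType\":\"Patient\"}".getBytes());
        List<InputStream> neverClosed = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            // each stream allocates an Inflater with native zlib buffers;
            // only close() releases them, and close() is never called
            InputStream in = new GZIPInputStream(new ByteArrayInputStream(gz));
            in.read();
            neverClosed.add(in);
            // process RSS climbs while reported heap usage stays flat
        }
    }

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }
}
```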