Changes to get Azure Blob Support working with large GenomicsDB datasets #107
…ic loading of codecs and allow zstd to use compress/decompress with context
Codecov Report
@@ Coverage Diff @@
## develop #107 +/- ##
===========================================
- Coverage 62.65% 62.61% -0.04%
===========================================
Files 59 60 +1
Lines 17697 17703 +6
===========================================
- Hits 11088 11085 -3
- Misses 6609 6618 +9
…ently than az blob storage on empty folders
These changes look good to me, but I wonder if we should give the Azure benchmarks a whirl with these changes. Not that I necessarily expect any perf changes, but given that we didn't find a smoking gun for the issues there, that might serve as a good stress test to make sure the refactoring doesn't uncover any issues with the Azure client. What do you think, @nalinigans?
Agreed. I have pulled these changes into GenomicsDB; see this branch: https://github.com/GenomicsDB/GenomicsDB/tree/ng_debug_azure_0822. @aoblebea and I chatted yesterday about testing these changes. Meanwhile, I am testing importing the TCGA dataset and the workspace from Azure Blob from my laptop at home, to use the faster network.
The GenomicsDB branch (ng_debug_azure_0822), which uses this pull request, ran fine on Azure (except for the no-compression case).
looks good
The significant codec change is to register names specifically for the supported algorithms. This allows fragment reads/writes to output the name of the codec in case of compression errors. Also, changed all tabs to spaces in the codec code for readability and consistency.
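To illustrate why a registered name helps, here is a minimal sketch, not the code from this pull request. The `Codec`, `PassThroughCodec`, `FailingCodec`, and `write_tile` names are hypothetical and exist only for this example; the point is that the write path can report which codec failed.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical codec base class: each codec carries the name it was
// registered under so that callers can include it in error messages.
class Codec {
 public:
  explicit Codec(std::string name) : name_(std::move(name)) {}
  virtual ~Codec() = default;
  const std::string& name() const { return name_; }
  // Returns 0 on success and non-zero on failure.
  virtual int compress(const std::string& in, std::string& out) = 0;

 private:
  std::string name_;
};

// Illustrative codec that copies its input unchanged.
class PassThroughCodec : public Codec {
 public:
  using Codec::Codec;
  int compress(const std::string& in, std::string& out) override {
    out = in;
    return 0;
  }
};

// Illustrative codec that always fails, to exercise the error path.
class FailingCodec : public Codec {
 public:
  using Codec::Codec;
  int compress(const std::string&, std::string&) override { return -1; }
};

// Hypothetical write path: on failure, the error names the codec.
std::string write_tile(Codec& codec, const std::string& tile) {
  std::string out;
  if (codec.compress(tile, out) != 0) {
    throw std::runtime_error("Could not compress tile with codec " +
                             codec.name());
  }
  return out;
}
```

With this shape, a failed write surfaces the codec name in the exception text instead of a generic compression error.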
For Azure Blob Support to handle large GenomicsDB datasets:
debug mode. Basically, we stash away the file sizes while writing and then confirm the file sizes from cloud storage after committing the files.
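The size check described above could be sketched roughly as follows. This is an assumption-laden illustration rather than the pull request's implementation: `WriteSizeTracker`, `record_write`, and `verify` are hypothetical names, and the `remote_size` callback stands in for whatever query the storage backend provides for the size of a committed file.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper: remembers how many bytes were written to each path
// and later compares those totals with what the storage backend reports.
class WriteSizeTracker {
 public:
  // Accumulate the number of bytes written to `path`.
  void record_write(const std::string& path, std::size_t nbytes) {
    expected_[path] += nbytes;
  }

  // After the files are committed, return every path whose reported size
  // differs from the size stashed while writing.
  std::vector<std::string> verify(
      const std::function<std::size_t(const std::string&)>& remote_size)
      const {
    std::vector<std::string> mismatched;
    for (const auto& entry : expected_) {
      if (remote_size(entry.first) != entry.second) {
        mismatched.push_back(entry.first);
      }
    }
    return mismatched;
  }

 private:
  std::unordered_map<std::string, std::size_t> expected_;
};
```

A caller would invoke `record_write` on every write, then pass a size lookup to `verify` after the commit and treat a non-empty result as an error. Restricting the check to debug mode avoids paying for the extra size queries in normal runs.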