
Add wrapper scripts to pipe training tensors directly to Tensor2Bin #55

Merged · 12 commits · Sep 29, 2021

Conversation

ftostevin-ont (Collaborator) commented:

This MR adds a wrapper script CreateTrainingTensor that calls CreateTensor{Pileup|FullAlignment} and pipes the output to Tensor2Bin, in the same way that CallVarBam does with CallVariants. Similarly, UnifyRepresentation now calls CreateTensorFullAlignment and consumes the piped output directly.
This avoids writing the uncompressed tensors to disk and rereading them, and allows tensor extraction and compression to run in parallel.
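The piping pattern described above can be sketched with Python's subprocess module. This is an illustrative sketch only: the two commands below are placeholders standing in for CreateTensorPileup and Tensor2Bin, not the project's actual CLI.

```python
import subprocess

# Illustrative sketch: stream a producer's stdout straight into a consumer
# process, avoiding an intermediate uncompressed file on disk. The
# "python3 -c ..." producer and "wc -l" consumer are placeholders for
# CreateTensorPileup and Tensor2Bin respectively.
producer = subprocess.Popen(
    ["python3", "-c", "print('tensor_row_1'); print('tensor_row_2')"],
    stdout=subprocess.PIPE,
)
consumer = subprocess.run(
    ["wc", "-l"],              # the downstream step reads from stdin
    stdin=producer.stdout,
    capture_output=True,
    text=True,
)
producer.stdout.close()        # lets the producer see SIGPIPE if the consumer exits early
producer.wait()
print(consumer.stdout.strip())  # prints "2": both rows passed through the pipe
```

Because the two processes run concurrently, extraction and compression overlap in time, which is the parallelism gain mentioned above.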

I also add a second script, MergeBin, that simply merges the individual chunk binaries into one without changing their contents. This is mainly to limit the number of binary files that need to be passed to training.
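As a rough illustration of the merge step, here is a hypothetical `merge_chunks` helper that concatenates chunk files byte-for-byte. Note this is only a sketch of the idea: the real MergeBin operates on Tensor2Bin's binary format, which may require format-aware merging rather than raw concatenation.

```python
from pathlib import Path

def merge_chunks(chunk_paths, merged_path):
    """Concatenate chunk files into one output file without modifying
    their contents. Illustrative only; not the project's MergeBin."""
    with open(merged_path, "wb") as out:
        for p in chunk_paths:
            # Append each chunk's bytes unchanged, in the given order.
            out.write(Path(p).read_bytes())
```

The point of the merge is purely operational: training then only needs a handful of merged files instead of one file per chunk.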

One functional change introduced here is that non-variant site subsampling is now done at a variable rate (though targeting a constant variant:non-variant ratio), determined per chunk of shuffle_bin_size sites rather than at a global rate computed over all tensor details files. I have not seen this significantly affect the resulting output tensors, though in theory the number of non-variant sites included will be slightly more variable.
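The chunk-level subsampling idea can be sketched as follows. The function name `subsample_chunk` and the site dicts are hypothetical, not the project's data structures: within each chunk, all variant sites are kept and non-variant sites are downsampled to target a fixed variant:non-variant ratio, so the realized global rate varies slightly from chunk to chunk.

```python
import random

def subsample_chunk(sites, target_ratio, rng):
    """Keep all variant sites in a chunk; downsample non-variant sites so
    that roughly variants : non-variants = 1 : target_ratio.
    Illustrative sketch only."""
    variants = [s for s in sites if s["is_variant"]]
    non_variants = [s for s in sites if not s["is_variant"]]
    keep = int(len(variants) * target_ratio)
    if keep >= len(non_variants):
        # Chunk already has fewer non-variants than the target; keep all.
        return variants + non_variants
    # Randomly downsample non-variant sites to hit the target ratio.
    return variants + rng.sample(non_variants, keep)

# Example: a chunk with 10 variants and 100 non-variants, targeting 1:2,
# yields 10 variants plus 20 sampled non-variants (30 sites total).
```

Because `keep` is computed from each chunk's own variant count, the effective subsampling rate differs between chunks even though the target ratio is constant, which matches the behavior described above.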

This should be backwards-compatible with the previous functionality of Tensor2Bin, though I have not tested this extensively.

@aquaskyline (Member) commented:

#53 and #55 testing in progress.

@zhengzhenxian zhengzhenxian merged commit cc313f7 into HKU-BAL:main Sep 29, 2021
@ftostevin-ont ftostevin-ont deleted the training_features_pipes branch April 29, 2022 12:19