New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat/support passing index files #978
Feat/support passing index files #978
Conversation
@jmarshall I found that I was trying to do the same thing when I came across your post at #485. My current implementation can already take multiple index files as arguments but your proposal seems to be more elegant. I'm wondering may I use your approach to refactor my code? |
Using something like foo.bam:foo.bai has positives and negatives. It (theoretically, if not in practice) can cause ambiguities, especially when used as an output filename. Perhaps foo.bam/foo.bai makes it less ambiguous. It also doesn't work with wild-card expansion unlike say /bam_dir/.bam /bai_dir/.bai would. It does however give minimal change to interface, which is appealing. Although I'm unsure how beneficial that is as you don't automatically gain any benefit without changing your code to use the new syntax anyway. For what it's worth regarding this specific PR, I personally dislike the change in -X behaviour from single file to multi file. I feel it would be cleaner to use e.g.
So -X is always a flag and takes no argument, with the flag stating that after the 1 or more files specified (depending on the nature of the program) there will be an identical number of index files. |
@jkbonfield I've refactored my pr. Now for all cases (single file or multiple files), the -X option has a consistent behavior. I agree that the foo.bam:foo.bai approach has both positives and negatives. We can keep refining our solutions on the road. |
A quick look over the code, so far it looks good with attention to detail in error handling (divisible by 2, etc). Thank you. I have a few comments though on things I would like to change:
We can look at doing these changes ourselves if needs be, but it may mean it takes a little while to schedule time to work on this pull request. Otherwise anything you can do to help in this regard would be most welcome, even if it's just a partial fix. Thanks |
Please take a look again @jkbonfield. I'm not sure we will be able to handle the test portion but we updated the docs. |
Thanks. It looks great on cursory inspection. I'll take a more complete look soon and hope we can get this merged without too much delay. |
I'm curious if there have been any additional reviews or updates on potential merging @jkbonfield ? |
@tshao87 have you explicitly tested this with bam/bai files in GCS buckets? |
Apologies for getting side-tracked. Yes I forgot! I've now rebased, squashed and merged as 651bf14 with an extra commit of my own. This fixes a small memory leak and removes explicit ".bai" filename checks in order to permit .crai and .csi to work. I also added a couple rudimentary tests. Many thanks for your work. |
The samtools has be configured with the ability to take index files at customized locations using -X flag.
Affected samtools functionalities:
sample usage:
samtools view -X in1.bam index1.bai chrM:1-1000
or
samtools mpileup -X in1.bam [in2.bam [...]] index1.bai [index2.bai [...]]