Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat/support passing index files #978

Conversation

mfshao
Copy link
Contributor

@mfshao mfshao commented Nov 12, 2018

The samtools has be configured with the ability to take index files at customized locations using -X flag.

Affected samtools functionalities:

  • view
  • stats
  • bedcov
  • depth
  • merge/sort
  • tview
  • mpileup

sample usage:
samtools view -X in1.bam index1.bai chrM:1-1000
or
samtools mpileup -X in1.bam [in2.bam [...]] index1.bai [index2.bai [...]]

@mfshao
Copy link
Contributor Author

mfshao commented Nov 14, 2018

@jmarshall I found that I was trying to do the same thing when I came across your post at #485. My current implementation can already take multiple index files as arguments but your proposal seems to be more elegant. I'm wondering may I use your approach to refactor my code?

@jkbonfield
Copy link
Contributor

jkbonfield commented Nov 15, 2018

Using something like foo.bam:foo.bai has positives and negatives. It (theoretically, if not in practice) can cause ambiguities, especially when used as an output filename. Perhaps foo.bam/foo.bai makes it less ambiguous. It also doesn't work with wild-card expansion unlike say /bam_dir/.bam /bai_dir/.bai would.

It does however give minimal change to interface, which is appealing. Although I'm unsure how beneficial that is as you don't automatically gain any benefit without changing your code to use the new syntax anyway.

For what it's worth regarding this specific PR, I personally dislike the change in -X behaviour from single file to multi file. I feel it would be cleaner to use e.g.

samtools view -X -s 0.5 -h [etc] in.bam in.bai chr1

So -X is always a flag and takes no argument, with the flag stating that after the 1 or more files specified (depending on the nature of the program) there will be an identical number of index files.

@mfshao
Copy link
Contributor Author

mfshao commented Nov 16, 2018

@jkbonfield I've refactored my pr. Now for all cases (single file or multiple files), the -X option has a consistent behavior.

I agree that the foo.bam:foo.bai approach has both positives and negatives. We can keep refining our solutions on the road.

@jkbonfield
Copy link
Contributor

A quick look over the code, so far it looks good with attention to detail in error handling (divisible by 2, etc). Thank you.

I have a few comments though on things I would like to change:

  • Update the man pages (mostly boilerplate text which can be identical to every command).

  • I feel it is misleading to put the [index ...] in the usage statement as this is only legal when a specific option is used. Normally commands that want to do this show two lines. Eg:

    samtools mpileup [options] in1.bam [in2.bam ...]
    samtools mpileup [options] -X in1.bam [in2.bam ...] [index1 [index2 ...]]

    I don't think this is really appropriate to single out the one option here and it's best put this in the man page or, if possible, in the -X description part of the usage.

  • There are no tests. This is perhaps a rather cruel criticism given the obscurity of the test harness!

We can look at doing these changes ourselves if needs be, but it may mean it takes a little while to schedule time to work on this pull request. Otherwise anything you can do to help in this regard would be most welcome, even if it's just a partial fix.

Thanks

@zflamig
Copy link

zflamig commented Nov 30, 2018

Please take a look again @jkbonfield. I'm not sure we will be able to handle the test portion but we updated the docs.

@jkbonfield
Copy link
Contributor

Thanks. It looks great on cursory inspection. I'll take a more complete look soon and hope we can get this merged without too much delay.

@trevars
Copy link

trevars commented Jan 9, 2019

I'm curious if there have been any additional reviews or updates on potential merging @jkbonfield ?

@smrgit
Copy link

smrgit commented Jan 14, 2019

@tshao87 have you explicitly tested this with bam/bai files in GCS buckets?

@jkbonfield
Copy link
Contributor

Apologies for getting side-tracked. Yes I forgot!

I've now rebased, squashed and merged as 651bf14 with an extra commit of my own. This fixes a small memory leak and removes explicit ".bai" filename checks in order to permit .crai and .csi to work. I also added a couple rudimentary tests.

Many thanks for your work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants