
Another Error: too may BAM features... question #1524

Closed
bioinfornatics opened this issue Jun 29, 2020 · 9 comments

Comments

@bioinfornatics

bioinfornatics commented Jun 29, 2020

On release 16.7 (I have not tried other releases yet) I get this error:

Error: too may BAM features. Bam chunk size 68865847 bytest exceeds chunkSizeLimit of 20000000

I read the FAQ

I tried to view a small region (394 bases) but I still got the error.
In fact, the message is displayed regardless of the region size.

I tried adding this to the track configuration:

{
  "maxExportSpan": 500000,
  "autoscale": "local",
  "logScaleOption": false,
  "style": {
    "pos_color": "blue",
    "neg_color": "red",
    "origin_color": "#888",
    "variance_band_color": "rgba(0,0,0,0.3)"
  },
  "min_score": 0,
  "mismatchScale": 0.1,
  "indicatorProp": 0.5,
  "indicatorDepth": 1,
  "hideDuplicateReads": true,
  "hideQCFailingReads": true,
  "hideSecondary": true,
  "hideSupplementary": true,
  "hideMissingMatepairs": false,
  "hideImproperPairs": false,
  "hideUnmapped": true,
  "label": "tlrtjntjnqnqrnq",
  "key": "foo",
  "storeClass": "JBrowse/Store/SeqFeature/BAM",
  "urlTemplate": "sorted.realign.bam",
  "type": "JBrowse/View/Track/SNPCoverage",
  "metadata": {
    "organism": "foo",
    "sample": "foo",
    "mapping": "bwa"
  },
  "baseUrl": "http://somewhere/data/",
  "chunkSizeLimit": 1000000000
}

but I still get the error.

So, how do I fix it?

The BAM file is small (238 MB).

@cmdcolin
Contributor

Editing the chunkSizeLimit in the "Edit config" panel inside JBrowse does not work (xref #895), but you can edit the trackList.json entry for the track. Alternatively, on initial track import, the "Open track" dialog also has a config editor you can use if you are not in control of the trackList.json.
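For example, assuming jq is available and the track is picked out by its "label", a one-off edit of trackList.json might look like this (the label below is just the one from the config pasted above):

# Set chunkSizeLimit on a single track, selected by its "label" (sketch, assumes jq is installed)
jq '(.tracks[] | select(.label == "tlrtjntjnqnqrnq")).chunkSizeLimit = 1000000000' trackList.json > trackList.json.tmp
mv trackList.json.tmp trackList.json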

@cmdcolin
Contributor

Also, it is interesting that a small BAM file results in very large chunk sizes like this. I had another user report something similar, and from what I could tell even tools like samtools were requesting the same large amount of data.

It is kind of an annoying situation, and it's possible we should just get rid of chunkSizeLimit.

@bioinfornatics
Author

bioinfornatics commented Jun 29, 2020

Thanks @cmdcolin, I will try your recommendation.

With samtools tview I can display the alignment:

[screenshot: samtools tview output]
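For reference, a typical tview invocation looks like the line below; the reference FASTA name and the region are placeholders, not values from the original report:

# Open the text alignment viewer at a given position (file names and region are examples)
samtools tview -p ctg1:1000 sorted.realign.bam reference.fa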

@bioinfornatics
Author

bioinfornatics commented Jun 29, 2020

I have around 100 tracks, so:

  1. If I set "chunkSizeLimit": 1000000000 on only one track, JBrowse is able to display it (the other tracks still show the error).

  2. So I added "chunkSizeLimit": 1000000000 to every track (a scripted way to do this is sketched after this comment).

But now, to display 96 bases, JBrowse needs more than 100 000 000 bytes!

[screenshot: JBrowse]
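Rather than editing ~100 track stanzas by hand, the limit can be applied to every track in one pass. A minimal sketch, assuming jq is available and the tracks sit under the usual "tracks" array of trackList.json:

# Add (or overwrite) chunkSizeLimit on every track entry
jq '.tracks |= map(. + {chunkSizeLimit: 1000000000})' trackList.json > trackList.json.tmp
mv trackList.json.tmp trackList.json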

@cmdcolin
Contributor

I agree this is not ideal at all. If you are interested in helping, we could try to fix it. We would need to find out why it's doing this, so if you have a sample test file please let me know. If not, I may have a similar test file in my email from another user that I can check.

Furthermore, I believe CRAM behaves better here, not just because it is more compressed but because the index format is different. If you can, please consider using CRAM; JBrowse has supported it for a while.
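A minimal conversion sketch with samtools, assuming the same reference FASTA the BAM was aligned to (file names here are placeholders):

# Convert the coordinate-sorted BAM to CRAM against its reference, then index it
samtools view -C -T reference.fa -o sorted.realign.cram sorted.realign.bam
samtools index sorted.realign.cram

The track's storeClass and urlTemplate would then need to point at the CRAM file (in JBrowse 1 the CRAM store class is, I believe, JBrowse/Store/SeqFeature/CRAM).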

@bioinfornatics
Author

We agree with you that CRAM should be the main file format to use; unfortunately, some tools still only accept BAM files, like some sub-commands in GATK.

@cmdcolin
Contributor

cmdcolin commented Jul 4, 2020

I think we resolved this. It is an interesting issue and is fundamentally difficult to handle due to how the BAI index operates.

One takeaway was that a custom CSI index can improve performance:

samtools index -c -m 6 yourfile.bam

Since BAI has a minimum bin size of ~16 kb, many reads can end up in a single bin, so for small genomes this can result in very large bin sizes (with -m 6 the CSI index above has a minimum interval of 2^6 = 64 bases, so the bins are much smaller).

Using CRAM is also a good alternative.

Continued improvements in BAM performance would improve this situation even with the large BAI bins, and we might also want to get rid of chunkSizeLimit, because from the user's perspective it is generally just unwanted behavior.

@bioinfornatics
Author

Thanks @cmdcolin

@cmdcolin
Contributor

Maybe we can close this for now with the workarounds reported.
