IGV out of memory when traversing chromosomes larger than 1.073.741.824 bp #977

Closed
diego-rt opened this issue Jun 3, 2021 · 9 comments

@diego-rt

diego-rt commented Jun 3, 2021

While inspecting a long-read BAM file aligned to the giant axolotl genome, I realised that it's impossible to traverse past the 1,073,741,824 bp (1024^3) coordinate in any of its chromosomes because of a memory issue. It would be super useful for those of us working on giant genomes if IGV could visualise these regions, which are substantial in organisms like this one, where chromosome lengths can stretch up to 3 Gb.

If that's unreasonable, it would already be quite useful if IGV could reach coordinates of up to 1.6 Gb or so, since that alone would let us explore the entirety of this genome when chromosome sequences are split into p/q arms.

[Screenshot attached: 2021-06-03 at 18:43:58]

@jrobinso
Contributor

jrobinso commented Jun 3, 2021

For sure, IGV is not going to be able to support chromosomes > 2,147,483,647 bp in length; you will need to look for another solution if that is required. I don't see any reason 1.6 Gb chromosomes could not be supported. There is a limit on sequence length in the BAM format if the "bai" index is used, but I assume you know that.

How are you starting IGV? You can increase the memory used by IGV; it looks like you are using the default.

If you have a case that fails for a chromosome < 2.1 Gb in size, please package up a test case I can download and I will look at it.
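
For anyone checking whether a raised memory limit actually took effect, here is a minimal sketch, not part of IGV. It relies only on the standard JVM heap option `-Xmx` (for example `-Xmx8g`) and the standard `Runtime.maxMemory()` call; the class name `HeapCheck` is made up for illustration, and how IGV's own launcher passes the option is not shown here.

```java
// Hypothetical helper, not IGV code: reports the heap ceiling of the running JVM.
class HeapCheck {
    // Maximum heap the JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // Run with e.g. `java -Xmx8g HeapCheck.java` and compare against the default.
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```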

@diego-rt
Author

diego-rt commented Jun 4, 2021

Hi Jim,

Yes, I'm using a CSI index, so that should not be a problem. I start IGV through its thumbnail icon, but I have also tried increasing the memory limit to 8 GB and it does not help. I don't think the issue is there, actually, since memory consumption is typically quite low (in the 100-200 MB range) when browsing pretty much any region, and it only shoots up to the maximum as soon as you enter a coordinate above 1024^3 bp. If it's OK with you, I can send you a test dataset via the IGV contact form. Thanks a lot for looking into it!
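
As background on the index limit mentioned above: both BAI and CSI use a binning scheme whose addressable span is 2^(min_shift + 3 * depth). BAI fixes min_shift = 14 and depth = 5, which gives 2^29 bp (about 512 Mb); CSI makes both values configurable. The sketch below only does that arithmetic. The class and method names are hypothetical and it does not read or write any index file.

```java
// Hypothetical helper, not htslib or IGV code: arithmetic for binning-index limits.
class IndexLimits {
    // Span in bp addressable by a binning index with these parameters.
    static long maxSpan(int minShift, int depth) {
        return 1L << (minShift + 3 * depth);
    }

    // Smallest depth whose span covers a reference of the given length.
    static int depthFor(long refLength, int minShift) {
        int depth = 0;
        while (maxSpan(minShift, depth) < refLength) {
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("BAI span (min_shift 14, depth 5): " + maxSpan(14, 5));
        System.out.println("Depth needed for a 1.6 Gb arm:    " + depthFor(1_600_000_000L, 14));
        System.out.println("Depth needed for a 3 Gb chromosome: " + depthFor(3_000_000_000L, 14));
    }
}
```

With min_shift = 14, a depth of 5 is too small for either a 1.6 Gb arm or a 3 Gb chromosome, which is why a CSI index is needed for this genome.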

@jrobinso
Contributor

jrobinso commented Jun 5, 2021

OK, I found the problem. The center of the window in view was being computed from the formula center = (start + end) / 2. When the coordinates were large enough this sum would overflow, which led to an attempt to query sequence for a huge range. This fix didn't make the next release, 2.10.0, which is already packaged, but it will be in the next bug fix release. These occur approximately monthly. If you need it sooner you can use the snapshot build.
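
To illustrate the overflow described above, here is a minimal sketch. It is not the actual IGV code or the actual fix; the class, method names, and coordinates are made up. It also shows why the failure started at 1024^3: with Java's signed 32-bit `int`, start + end exceeds 2^31 - 1 once the window is centered at or beyond 2^30 = 1,073,741,824.

```java
// Hypothetical illustration, not IGV code: midpoint overflow with 32-bit ints.
class MidpointOverflow {
    // Naive midpoint: the sum wraps around once start + end > Integer.MAX_VALUE.
    static int naiveCenter(int start, int end) {
        return (start + end) / 2;
    }

    // Overflow-safe midpoint for 0 <= start <= end: the difference always fits in an int.
    static int safeCenter(int start, int end) {
        return start + (end - start) / 2;
    }

    public static void main(String[] args) {
        int start = 1_073_741_000;
        int end = 1_073_742_648; // start + end == 2^31, one past Integer.MAX_VALUE

        System.out.println("naive: " + naiveCenter(start, end)); // negative, wrapped around
        System.out.println("safe:  " + safeCenter(start, end));  // 1073741824
    }
}
```

A negative center turns into a nonsensical query range downstream, which is consistent with memory use spiking only once the view crosses that coordinate.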

@maximilianh

maximilianh commented Jun 5, 2021 via email

@jrobinso
Contributor

jrobinso commented Jun 5, 2021

@maximilianh The test data they sent me did have the chromosomes split in half (P and Q arms); this was a genuine IGV bug, as I noted above. IGV, and I assume other software, still has a 32-bit max size limit, which for IGV is in effect 31-bit since Java does not have unsigned ints.
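
The 31-bit limit can be seen directly in Java. This is a small sketch with made-up names, not IGV code: `int` is signed, so the largest representable coordinate is `Integer.MAX_VALUE` (2,147,483,647), and a 3 Gb length cast to `int` silently becomes negative.

```java
// Hypothetical illustration, not IGV code: why signed 32-bit coordinates cap at 2^31 - 1.
class IntLimit {
    // True if a chromosome length can be stored in a Java int.
    static boolean fitsInInt(long length) {
        return length >= 0 && length <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long arm = 1_600_000_000L;   // one p/q arm, as discussed in this thread
        long whole = 3_000_000_000L; // an unsplit 3 Gb chromosome

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println("1.6 Gb fits: " + fitsInInt(arm));
        System.out.println("3 Gb fits:   " + fitsInInt(whole));
        System.out.println("(int) 3 Gb = " + (int) whole); // narrowing cast wraps to a negative value
    }
}
```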

@maximilianh

maximilianh commented Jun 7, 2021 via email

@diego-rt
Author

diego-rt commented Jun 7, 2021

Hi Jim, thanks for looking into the issue!

I gave the snapshot build a try, but there still seems to be an issue, this time an index out of bounds error. Please find the log below for details. I did not observe the same error with BAM files aligned to the mouse genome with this snapshot build, so it might still be related to the chromosome size.

@maximilianh, as Jim noted, we did indeed split chromosomes into p/q arms to get around that problem. In any case, pretty much all of the upcoming amphibian, lungfish, pine tree, etc. genomes are going to face these problems, so they will likely have to be addressed at some point.

INFO [2021-06-07T14:03:37,703] [Main.java:206]  Startup  IGV Version snapshot 06/05/2021 02:53 AM
INFO [2021-06-07T14:03:37,704] [Main.java:207]  Java 15.0.2 (build 15.0.2+7-27) 2021-01-19
INFO [2021-06-07T14:03:37,705] [Main.java:210]  Java Vendor: Oracle Corporation https://java.oracle.com/
INFO [2021-06-07T14:03:37,705] [Main.java:212]  JVM: Java HotSpot(TM) 64-Bit Server VM    
INFO [2021-06-07T14:03:37,868] [Main.java:215]  Default User Directory: /Users/diego.terrones
INFO [2021-06-07T14:03:37,869] [Main.java:216]  OS: Mac OS X 10.16 x86_64
INFO [2021-06-07T14:03:39,303] [GenomeManager.java:183]  Loading genome: /Users/diego.terrones/Documents/Projects/0_Global/data/genomes/Ambystoma_mexicanum/Amex6.0/reference_sequence/AmexG_v6.DD.corrected.round2.chr.fa.gz
INFO [2021-06-07T14:03:39,742] [CommandListener.java:121]  Listening on port 60151
INFO [2021-06-07T14:04:10,127] [IGV.java:1318]  Loading 1 resources.
INFO [2021-06-07T14:04:10,133] [TrackLoader.java:118]  Loading resource, path /Users/diego.terrones/Flongle_alignment.sorted.bam
ERROR [2021-06-07T14:04:18,095] [AlignmentTileLoader.java:312]  Error loading alignment data
java.lang.ArrayIndexOutOfBoundsException: Index 15347 out of bounds for length 0
	at org.broad.igv.sam.ByteSubarray.getByte(ByteSubarray.java:26) ~[igv.jar:?]
	at org.broad.igv.sam.AlignmentUtils.reverseComplementCopy(AlignmentUtils.java:178) ~[igv.jar:?]
	at org.broad.igv.sam.BisulfiteCounts.incrementCounts(BisulfiteCounts.java:82) ~[igv.jar:?]
	at org.broad.igv.sam.BaseAlignmentCounts.incCounts(BaseAlignmentCounts.java:128) ~[igv.jar:?]
	at org.broad.igv.sam.AlignmentTileLoader$AlignmentTile.addRecord(AlignmentTileLoader.java:495) ~[igv.jar:?]
	at org.broad.igv.sam.AlignmentTileLoader.loadTile(AlignmentTileLoader.java:244) [igv.jar:?]
	at org.broad.igv.sam.AlignmentDataManager.loadInterval(AlignmentDataManager.java:418) [igv.jar:?]
	at org.broad.igv.sam.AlignmentDataManager.load(AlignmentDataManager.java:365) [igv.jar:?]
	at org.broad.igv.sam.CoverageTrack.load(CoverageTrack.java:176) [igv.jar:?]
	at org.broad.igv.ui.IGV.lambda$repaint$12(IGV.java:2442) [igv.jar:?]
	at java.util.concurrent.CompletableFuture$AsyncRun.run(CompletableFuture.java:1800) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]
INFO [2021-06-07T14:04:18,099] [MessageUtils.java:76]  <html>Error encountered querying alignments: java.lang.ArrayIndexOutOfBoundsException: Index 15347 out of bounds for length 0
INFO [2021-06-07T14:04:47,137] [TrackLoader.java:118]  Loading resource, path /Users/diego.terrones/Flongle_alignment.sorted.bam
INFO [2021-06-07T14:04:47,360] [MessageUtils.java:76]  <html>Unexpected error: null.<br>See igv.log for more details
ERROR [2021-06-07T14:04:50,429] [LongRunningTask.java:75]  Exception running task
java.lang.NullPointerException: null
	at java.io.File.<init>(File.java:279) ~[?:?]
	at org.broad.igv.util.FileUtils.getAbsolutePath(FileUtils.java:412) ~[igv.jar:?]
	at org.broad.igv.session.IGVSessionReader.processTrack(IGVSessionReader.java:877) ~[igv.jar:?]
	at org.broad.igv.session.IGVSessionReader.processPanel(IGVSessionReader.java:805) ~[igv.jar:?]
	at org.broad.igv.session.IGVSessionReader.process(IGVSessionReader.java:364) ~[igv.jar:?]
	at org.broad.igv.session.IGVSessionReader.process(IGVSessionReader.java:1044) ~[igv.jar:?]
	at org.broad.igv.session.IGVSessionReader.processRootNode(IGVSessionReader.java:286) ~[igv.jar:?]
	at org.broad.igv.session.IGVSessionReader.loadSession(IGVSessionReader.java:169) ~[igv.jar:?]
	at org.broad.igv.ui.IGV.restoreSessionFromStream(IGV.java:1143) ~[igv.jar:?]
	at org.broad.igv.ui.action.ReloadTracksMenuAction.lambda$actionPerformed$0(ReloadTracksMenuAction.java:81) ~[igv.jar:?]
	at org.broad.igv.util.LongRunningTask.call(LongRunningTask.java:72) [igv.jar:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:832) [?:?]

brainstorm pushed a commit to umccr/igv that referenced this issue Jun 11, 2021
@jrobinso
Contributor

@diego-rt The bug for this issue is fixed. I don't know what the cause of the second issue you found is, but if it persists, could you open a new issue, preferably with a test case I can run? Thanks.

@jrobinso
Contributor

@diego-rt I think I found the cause of this. It's unrelated to this issue; it's triggered by BAMs that do not store read sequence, due to an unrelated change. So no new issue is required.
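
For readers who hit the same `ArrayIndexOutOfBoundsException` ("out of bounds for length 0") in their own tools, here is a sketch of the kind of guard that avoids it when a record carries no read sequence (SEQ is `*` in SAM, a zero-length sequence in BAM). This is an assumption about the general pattern only. It is not the change made in IGV, and the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration, not IGV code: reverse-complement that tolerates missing sequence.
class SequenceGuard {
    // Complement of a single base; anything unrecognised maps to N.
    static byte complement(byte base) {
        switch (base) {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return 'N';
        }
    }

    // Returns an empty array instead of indexing into a sequence that was never stored.
    static byte[] reverseComplement(byte[] bases) {
        if (bases == null || bases.length == 0) {
            return new byte[0];
        }
        byte[] out = new byte[bases.length];
        for (int i = 0; i < bases.length; i++) {
            out[i] = complement(bases[bases.length - 1 - i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] read = "AACG".getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(reverseComplement(read), StandardCharsets.US_ASCII));
        System.out.println(reverseComplement(new byte[0]).length); // no exception for an empty read
    }
}
```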

3 participants