IGV out of memory when traversing chromosomes larger than 1.073.741.824 bp #977
Comments
For sure IGV is not going to be able to support chromosomes > 2,147,483,647 bytes in length, you will need to look for another solution if that is required. I don't see any reason 1.6GB chromosomes could not be supported. There is a limit on sequence length in the BAM format if the "bai" index is used, but I assume you know that. How are you starting IGV? You can increase the memory used by IGV, it looks like you are using the default. If you have a case that fails for a chromosome < 2.1 GB in size please package up a test case I can download and I will look at it. |
Hi Jim, Yes I'm using a csi index so that should not be a problem. I start IGV through its thumbnail icon, but I have also tried to increase the memory limit to 8 GB and it does not help. I don't think the issue is there actually, since the memory consumption is typically quite low (in the 100-200 MB range) when browsing pretty much any region and it only shoots up to the maximum as soon as you enter a coordinate above 1024^3 bp. If it's ok with you, I can send you a test dataset via the igv contact form. Thanks a lot for looking into it! |
OK I found the problem. The center of the window in view was being computed from the formula center = (start + end) / 2. When the interval was large enough this sum would overflow, which led to an attempt to query sequence for a huge range. This fix didn't make the next release, 2.10.0, which is already packaged, but will be in the next bug fix release. These occur approximately monthly. If you need it sooner you can use the snapshot build. |
The Axolotl genome is breaking every genome software and data archive. I don’t understand why the Axolotl genome project didn’t simply split the sequences in two. That could have saved hundreds of work hours spent on finding various 32-bit related bugs. Yes, it would not be perfect, but so much more practical... |
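The overflow described above, `center = (start + end) / 2`, can be reproduced in a few lines of Java. This is only a minimal sketch, not IGV's actual code: the class and method names are hypothetical, and the overflow-safe form shown is one common way to avoid the problem, not necessarily the fix that went into IGV. With `int` coordinates the sum exceeds `Integer.MAX_VALUE` (2^31 - 1) once the center passes 2^30 = 1024^3, which would be consistent with the 1,073,741,824 bp threshold reported in this issue.

```java
public class MidpointOverflow {
    // Naive midpoint: the int sum start + end wraps around to a negative
    // value once it exceeds Integer.MAX_VALUE (2,147,483,647).
    static int naiveCenter(int start, int end) {
        return (start + end) / 2;
    }

    // Overflow-safe midpoint (hypothetical alternative): for 0 <= start <= end
    // the difference end - start always fits in an int.
    static int safeCenter(int start, int end) {
        return start + (end - start) / 2;
    }

    public static void main(String[] args) {
        int start = 1_073_741_800; // just below 1024^3
        int end = 1_073_741_900;   // just above 1024^3
        System.out.println(naiveCenter(start, end)); // negative: the sum overflowed
        System.out.println(safeCenter(start, end));  // 1073741850
    }
}
```

A negative center can turn into a nonsensical query range downstream, which fits the symptom of memory shooting up only past this coordinate.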
@maximilianh The test data they sent me did have the chromosomes split in 1/2 (P and Q arms), this was a genuine IGV bug as I noted above. IGV and I assume other software still has a 32bit max size limit, which for IGV is in effect 31 bit since Java does not have unsigned ints. |
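To make the "in effect 31 bit" limit concrete, the sketch below checks whether a chromosome length can be addressed with a Java `int`. The helper name `fitsInInt` is hypothetical and used only for illustration. The lengths are the approximate figures mentioned in this thread (about 1.6 Gb for a p/q arm, up to about 3 Gb for a whole chromosome).

```java
public class SignedIntLimit {
    // Hypothetical helper: true when every coordinate up to chromosomeLength
    // can be stored in a signed 32-bit Java int.
    static boolean fitsInInt(long chromosomeLength) {
        return chromosomeLength >= 0 && chromosomeLength <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);         // 2147483647, i.e. 2^31 - 1
        System.out.println(fitsInInt(1_600_000_000L)); // true: a ~1.6 Gb arm fits
        System.out.println(fitsInInt(3_000_000_000L)); // false: a ~3 Gb chromosome does not
    }
}
```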
Ah - Java has 31 bits. I forgot about that. Thanks!
|
Hi Jim, thanks for looking into the issue! I gave the snapshot build a try but there still seems to be an issue with an index out of bounds. Please find the log below for details. I did not observe the same error with BAM files in the mouse genome with this snapshot build, so it might still be related to the chromosome size. @maximilianh, as Jim noted, we did indeed split chromosomes into p/q arms to get around that problem. In any case, pretty much all of the up-and-coming amphibian, lungfish, pine tree, etc. genomes are going to face these problems, so they will likely have to be addressed at some point.
|
…rge chromosomes. Fixes igvteam#977
@diego-rt The bug for this issue is fixed, I don't know what the cause of the second issue you found is but if it persists could you open a new issue with preferably a test case I can run? Thanks. |
@diego-rt I think I found the cause of this. It's unrelated to this issue; it's triggered by BAMs that do not store read sequence, due to an unrelated change. So no new issue required. |
While inspecting a long-read BAM file aligned to the giant Axolotl genome, I realised that it's impossible to traverse past the 1.073.741.824 bp (1024^3) coordinate in any of its chromosomes because of a memory issue. It would be super useful for us working on giant genomes if IGV could visualise these regions, which are substantial in organisms like this one where chromosome lengths can stretch up to 3 Gb.
If that's unreasonable, it would already be quite useful if IGV could reach regions of up to 1.6 GB or so, since already then it would enable us to explore the entirety of this genome when chromosome sequences are split into p/q arms.