Node stack overflow error in dendro_data #27

Closed
GuptaPriyanshu opened this issue Aug 1, 2018 · 20 comments

@GuptaPriyanshu

Hi,

We are trying to plot a dendrogram, but in some cases we get an error from the dendro_data function.

hcdata <- dendro_data(hc, type="rectangle")

Error in : node stack overflow
No stack trace available

Is there any way I can get rid of this error?

Thanks,
Priyanshu

@andrie
Owner

andrie commented Aug 1, 2018

Please provide a reproducible example

@GuptaPriyanshu
Author

Hi,

I have attached two sample files.

  1. mydata1.csv works fine
  2. mydata.csv gives error as mentioned above

Below is the code I use to read the data from the CSV file:

scaledData <- read.csv("mydata1.csv", header = TRUE)
dhc <- hclust(dist(scaledData), method = "average")
ddata <- dendro_data(dhc, type = "rectangle")

files.zip

@andrie
Owner

andrie commented Aug 2, 2018

Yes, I can replicate the problem. I'm not going to be able to investigate any time soon.

Meanwhile you can try:

  • Use plot(dhc) from the package rpart()
  • Reduce your data set (maybe using sampling) and then plot using ggdendro
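
The second suggestion (reducing the data set by sampling) could look roughly like the sketch below. It uses synthetic data in place of the attached CSV files, and the sample size of 200 is an arbitrary assumption. The dendro_data() call is only attempted when ggdendro is installed.

```r
# Synthetic stand-in for the attached data (assumption: numeric columns only).
set.seed(1)
scaledData <- as.data.frame(matrix(rnorm(5000 * 4), ncol = 4))

# Keep a random subset of rows so the resulting tree stays small.
n_keep <- 200  # arbitrary illustrative size
subset_rows <- sample(nrow(scaledData), n_keep)
dhc_small <- hclust(dist(scaledData[subset_rows, ]), method = "average")

# plot(dhc_small) would draw the tree with base R graphics.

# Only convert for ggplot2 if ggdendro is available.
if (requireNamespace("ggdendro", quietly = TRUE)) {
  ddata_small <- ggdendro::dendro_data(dhc_small, type = "rectangle")
}
```

Sampling changes the tree, so this is only useful when an overview of the structure is enough.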

@thommohr

The issue is still present when dealing with large dendrograms.

@thommohr

Hi, the issue still persists. It is extremely problematic when analyzing genetic data. Any chance it will be resolved soon?

best,
Thomas

@andrie
Owner

andrie commented Dec 16, 2019

Do you get the same problem when using rpart() ?

@thommohr

thommohr commented Feb 2, 2020

Sorry for the late reply, I have been very busy these last months.

Quote: "Do you get the same problem when using rpart() ?"

I am not sure how to use rpart, but will look it up.

The problem is with co-expression network analyses, which typically result in dendrograms with 20,000+ leaves. Plotting the dendrogram with the normal plotting function is not a problem. However, I would love to integrate this into ggplot2, since I think this is a really great way to write papers.

@thommohr

The issue is still present in the latest release. I tried to plot a dendrogram generated by hclust with roughly 14695 leaves. I think the problem lies with a recursive function call within the as.dendrogram function.

@andrie
Owner

andrie commented Sep 15, 2020

Correct. Since ggdendro doesn't do this computation, but rpart does, this can't really be fixed in ggdendro. If you can demonstrate that rpart can do the computation but ggdendro cannot, then I will investigate.

@andrie
Owner

andrie commented Sep 15, 2020

Sorry, I said rpart but of course you're using hclust. In either case, the dendrogram itself is generated by as.dendrogram in the stats package (part of base R). If the base stats package is causing the problem, then ggdendro cannot help you solve this, unfortunately.

Still, if you can find a solution using base R, this will demonstrate that the problem lies with ggdendro, and then I can investigate.

@thommohr

Hi Andrie,

thanks for the prompt reply. Plotting the dendrogram with plot(hclustobject) is no problem at all. The function as.dendrogram works fine.

The culprit is the gg.plotNode function defined inside the dendrogram_data function; I believe it is the recursive function call on lines 282-285 of dendrogram.R.

I hope that helps. Dendrograms of this size are nothing unusual in -omics data analysis, and it would be really nice to be able to plot dendrograms in ggplot. Plotting them using plot and reimporting the png file is - let's put it this way - suboptimal.

best, and thanks for your work !
Thomas

@ljwharbers

Hi,

I'm running into the same issue. Is there any update on this issue?

Thanks a lot for your work!
Luuk

@andrewGhazi

Hi there, just wanted to bump this issue as I've had problems with it as well. As thommohr pointed out, the issue is caused by this recursive call to gg.plotNode().

Attached is a plot from plot.hclust() on an hclust result that causes the issue. Not a very interesting tree, but I'm using dendro_data() in a package that sometimes runs into examples like this (which is also why I want the lighter dependency footprint of ggdendro vs dendextend). Many leaves branch off from the same node at height = 0, so I think what's going on in this case is that the recursive function adds a new layer to the call stack for each leaf until it hits the limit. The final error message is:

> ggdendro::dendro_data(path_clust)
Error: node stack overflow
Error during wrapup: node stack overflow
Error: no more error handlers available (recursive errors?); invoking 'abort' restart
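
One way to check whether a given hclust result is likely to stress a recursive tree walk is to measure how deeply its merges are nested. The sketch below is an illustration only, not ggdendro code: merge_depth() is a hypothetical helper that walks hc$merge with a loop, and the chain-shaped input is a deliberately degenerate stand-in for the real data.

```r
# Depth of the merge tree stored in an hclust object, computed with a loop
# over hc$merge instead of recursion (so it cannot itself overflow).
merge_depth <- function(hc) {
  n_merges <- nrow(hc$merge)
  depth <- integer(n_merges)
  for (i in seq_len(n_merges)) {
    children <- hc$merge[i, ]
    # Negative entries are single observations (depth 0);
    # positive entries refer to earlier merge steps.
    child_depth <- ifelse(children < 0, 0L, depth[pmax(children, 1)])
    depth[i] <- 1L + max(child_depth)
  }
  depth[n_merges]
}

# Deliberately degenerate input: single linkage on points 2, 4, 8, ...
# joins one new leaf per merge, giving a chain-shaped tree.
n <- 50
chain_hc <- hclust(dist(2^(1:n)), method = "single")
chain_depth <- merge_depth(chain_hc)   # n - 1 for a pure chain

# Random data usually gives a much shallower tree.
set.seed(1)
random_hc <- hclust(dist(rnorm(n)), method = "average")
random_depth <- merge_depth(random_hc)
```

A depth close to the number of leaves means a recursive traversal needs roughly one nested call per leaf, which matches the behaviour described above.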

GitHub doesn't allow uploading RData files to issues, so I ran the hclust result through dput() and put it into a txt file.

I appreciate any help, ggdendro is awesome!

path_clust_dput.txt

dendro_data_clust

@crotoc

crotoc commented May 15, 2023

This problem happens because there are too many identical branches, so the recursive function calls overflow the stack limit. A workaround is to add a small random value to the original matrix so that there are not so many identical rows.

https://support.bioconductor.org/p/125023/
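
A minimal sketch of this workaround, using a synthetic matrix with many identical rows. The noise scale is an assumption and should be chosen small relative to the real data; whether it avoids the overflow in a particular case is not guaranteed.

```r
# Synthetic matrix with many identical rows (stand-in for real data).
set.seed(42)
base_rows <- matrix(rnorm(5 * 3), nrow = 5)
m <- base_rows[rep(1:5, each = 100), ]   # 500 rows, only 5 distinct

# Add noise that is tiny relative to the spread of the data, so that
# exact ties between rows are broken. The scale is an assumption.
noise_sd <- 1e-6 * sd(m)
m_jittered <- m + matrix(rnorm(length(m), sd = noise_sd), nrow = nrow(m))

n_dup_before <- sum(duplicated(m))           # duplicated rows before
n_dup_after  <- sum(duplicated(m_jittered))  # duplicated rows after

hc <- hclust(dist(m_jittered), method = "average")
```

Note that the jitter slightly changes the merge heights, so the resulting tree is an approximation of the original clustering.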

@leonfodoulian

Hi @andrie,

The problem lies with ggdendro. Creating a dendrogram using base R works just fine. I have tried to look into the problem, and as it was already stated, the problem lies with gg.plotNode calling itself. Adding noise to the data (as suggested by @crotoc) fixes the problem, but is not really ideal for me. Do you think you can look into this issue anytime soon?

Thank you in advance!

Best,
Leon

andrie added a commit that referenced this issue Sep 30, 2023
@andrie
Owner

andrie commented Sep 30, 2023

I have pushed a fix to the main branch on GitHub. Could you please install the latest version from GitHub and let me know if this works on your real-world data?

Using the example provided by @GuptaPriyanshu, I now get this plot output:

image

andrie self-assigned this Sep 30, 2023
andrie added the Bug label Sep 30, 2023
andrie closed this as completed Sep 30, 2023
@andrie
Owner

andrie commented Oct 1, 2023

Re-opening, since there are some performance optimisation gains possible.

andrie reopened this Oct 1, 2023
andrie added a commit that referenced this issue Oct 1, 2023
@andrie
Owner

andrie commented Oct 1, 2023

The example now runs in ~1.5 seconds on my machine.

@leonfodoulian

Hi @andrie,

Thank you for fixing the issue. The latest GitHub version of ggdendro now works without returning an error. It runs in 864.351 seconds (about 14.4 minutes) on a dataset composed of 58723 observations.

Best,
Leon
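
For anyone who wants to compare timings on their own data, a rough sketch is below. The data is synthetic and its size is an arbitrary assumption, so the numbers are not comparable to the figures reported in this thread. The dendro_data() step is only timed when ggdendro is installed.

```r
# Synthetic data; the size is an arbitrary assumption, scale it up gradually.
set.seed(1)
x <- matrix(rnorm(2000 * 5), ncol = 5)

# Time the clustering step (base R only).
hclust_time <- system.time(
  hc <- hclust(dist(x), method = "average")
)[["elapsed"]]

# Time the conversion step only when ggdendro is available.
dendro_time <- NA_real_
if (requireNamespace("ggdendro", quietly = TRUE)) {
  dendro_time <- system.time(
    ddata <- ggdendro::dendro_data(hc, type = "rectangle")
  )[["elapsed"]]
}

timings <- c(hclust = hclust_time, dendro_data = dendro_time)
```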

@andrie
Owner

andrie commented Feb 23, 2024

I am delighted to finally close this bug, after 6 years. Version 0.2.0 was accepted by CRAN on 2023-02-24.
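
A quick way to check whether an installation already includes this release is sketched below. has_fixed_ggdendro() is a hypothetical helper, not part of ggdendro; the version string "0.2.0" is the one named above.

```r
# Returns TRUE if ggdendro is installed at or above the given version.
# "0.2.0" is the release named in the comment above.
has_fixed_ggdendro <- function(min_version = "0.2.0") {
  requireNamespace("ggdendro", quietly = TRUE) &&
    utils::packageVersion("ggdendro") >= min_version
}

if (!has_fixed_ggdendro()) {
  message("ggdendro >= 0.2.0 not found; try install.packages(\"ggdendro\")")
}
```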

andrie closed this as completed Feb 23, 2024