Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue with algorithm fges-mb #74

Open
Zarmas opened this issue Aug 1, 2022 · 28 comments
Open

Issue with algorithm fges-mb #74

Zarmas opened this issue Aug 1, 2022 · 28 comments

Comments

@Zarmas
Copy link

Zarmas commented Aug 1, 2022

Hello,

I tried using the fges-mb algorithm, in an attempt to work around the issue I had when running the fges algorithm. The system I use has 14 threads and 128gb RAM. The data I enter are approximately 20000 continuous variables and 3000 samples. One of the commands I used was the following:

java -Xmx128G -jar ~/bin/causal-cmd/causal-cmd-1.4.1-SNAPSHOT-jar-with-dependencies.jar --data-type continuous --default --delimiter tab --json-graph --algorithm fges-mb --dataset dataset.txt --target targetname --score sem-bic-score --penaltyDiscount 10.0 --maxDegree 100

As with fges, the largest data size it could handle was 2000 variables with 100 samples, printing this additional message: "heuristicSpeedup = false".
The --parallelized option was not available for this algorithm, but it could run parallel when using the --default switch. The issue with that was that I could not change the other options for the algorithm, even if I specified them in the command (e.g. in the log it is "verbose: yes" even if I specify "--verbose no" in the command). Is there a way to get parallel execution while being able to choose the other algorithm options?

After trying to run the fges-mb algorithm with a larger data size I got errors. When used a 2000 variable-2000sample dataset, I got "Exception in thread "main" java.util.ConcurrentModificationException".
When I used an even larger dataset(i.e. 10000variables-3000samples) I got the following error line: Exception in thread "main" java.lang.NullPointerException. After displaying the errors, the executable exits.

I attach the full error messages to this ticket. I tried running the fges-mb algorithm with multiple different variables as target. Is there a way to get around these errors?

One other issue I had was when I was using the fges-mb algorithm on the whole dataset, it took too long to even display the message, approximately one hour, meaning it takes too long and repeating it for every variable will take years.

All the tests were done on the same 14 thread and 128gb RAM system, but the errors on the fges-mb algorithm were also presented on a different system with less ram and fewer threads. On the paper accompanying the caucal-cmd executable(Ramsey Et al. 2017), it is stated that it can be used for a million variables and more, but it seems impossible to use it on more than 2000 variables, on an above average system.

Thank you in advance,
George
ConcurrentModificationException.txt
NullPointerException.txt

@jdramsey
Copy link
Contributor

jdramsey commented Aug 1, 2022 via email

@jdramsey
Copy link
Contributor

jdramsey commented Aug 1, 2022 via email

@Zarmas
Copy link
Author

Zarmas commented Aug 2, 2022

Yes, I am using the latest version and I have increased the heap size. You can see the entire command I use on the first message of the post.

@jdramsey
Copy link
Contributor

jdramsey commented Aug 2, 2022

Hmmmm.... 20,000 variables should not be a problem, unless the data loader can't load it (@kvb2univpitt ?). The execution time may be due to the density of the graph. Let me try a 20,000 variable problem with N = 3000 in the Tetrad GUI...

@jdramsey
Copy link
Contributor

jdramsey commented Aug 2, 2022

Wait, is this continuous or discrete or mixed?

@Zarmas
Copy link
Author

Zarmas commented Aug 2, 2022

Continuous.

@jdramsey
Copy link
Contributor

jdramsey commented Aug 2, 2022

Oh wow, this is breaking the GUI. Hmm.. let me think how I did that before for the million variable problem. I was using different data structures at the time, a different simulator ('large scale simulator", which we still have), and a different covariance matrix class (which calculated covariances on the fly instead of storing them in a covariance matrix). The data loading is @kvb2univpitt's, so hopefully he can comment on that. I suppose I could give you a version that uses the covariance matrix on the fly. For 3000 samples it might be a bit slow but it should finish for you, depending on the density of your true model.

@jdramsey
Copy link
Contributor

jdramsey commented Aug 2, 2022

I'm attending UAI virtually in the Netherlands since 3 AM this morning so am a bit sleep deprived, but if I can catch some sleep maybe I could work on this for you--that is, the parts that I can.

@jdramsey
Copy link
Contributor

jdramsey commented Aug 2, 2022

Wait, you should be able to tell if the data loading is successful. Is it?

@Zarmas
Copy link
Author

Zarmas commented Aug 2, 2022

This is the last message on the .txt:
Start search: Tue, August 02, 2022 02:28:44 PM

@kvb2univpitt
Copy link
Contributor

Sorry for joining in the conversation late. If the search has started, that means the data has already been read in. So, it seems like the data loading part is fine. Based on the attached error files, it seems to break when running FGES. The reason why you're getting ConcurrentModificationException is because the data structure, HashMap in this case, is being modified by one thread and being read by another thread. @jdramsey It seems like parallelization might not be working correctly.

@jdramsey
Copy link
Contributor

jdramsey commented Feb 8, 2023

@Zarmas Sorry just coming back to looking at these causal-cmd issues now. I will try this again tomorrow.

@jdramsey
Copy link
Contributor

jdramsey commented Feb 8, 2023

@Zarmas Actually we were having some concurrency problems with FGES in another project and I think I just fixed them there. I'm thinking I just fixed them for your application as well. I looked at your error messages and think I've addressed those problem. (Due to concurrency, some nodes were being removed and then "removed" again, causing issues.)

We're going to do a new release soon of Tetrad and causal-cmd; hopefully these problems will go away.

@Zarmas
Copy link
Author

Zarmas commented Mar 1, 2023

@jdramsey Hello again, I am still getting the "ConcurrentModificationException" error message when using the fges-mb algorithm. The full error message is shown in the original message of this post. I have also encountered a problem when using the fges algorithm and posted it on a closed relative to fges issue, but can't re-open it since I wasn't the one that closed it.

@jdramsey
Copy link
Contributor

jdramsey commented Mar 1, 2023 via email

@jdramsey
Copy link
Contributor

jdramsey commented Mar 1, 2023 via email

@jdramsey
Copy link
Contributor

jdramsey commented Mar 1, 2023 via email

@Zarmas
Copy link
Author

Zarmas commented Mar 1, 2023

I am using the latest version, the original error was in a previous one, but I am still getting it in the latest one. This is the new error message:

Exception in thread "main" java.util.ConcurrentModificationException
at java.base/java.util.HashMap$HashIterator.nextNode(HashMap.java:1584)
at java.base/java.util.HashMap$KeyIterator.next(HashMap.java:1607)
at edu.cmu.tetrad.graph.EdgeListGraph.isAdjacentTo(EdgeListGraph.java:414)
at edu.cmu.tetrad.search.FgesMb.calculateArrowsForward(FgesMb.java:1021)
at edu.cmu.tetrad.search.FgesMb.access$600(FgesMb.java:57)
at edu.cmu.tetrad.search.FgesMb$1AdjTask.compute(FgesMb.java:995)
at edu.cmu.tetrad.search.FgesMb$1AdjTask.compute(FgesMb.java:933)
at java.base/java.util.concurrent.RecursiveTask.exec(RecursiveTask.java:95)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinTask.tryExternalHelp(ForkJoinTask.java:381)
at java.base/java.util.concurrent.ForkJoinTask.externalAwaitDone(ForkJoinTask.java:323)
at java.base/java.util.concurrent.ForkJoinTask.doJoin(ForkJoinTask.java:398)
at java.base/java.util.concurrent.ForkJoinTask.join(ForkJoinTask.java:721)
at java.base/java.util.concurrent.ForkJoinPool.invoke(ForkJoinPool.java:2432)
at edu.cmu.tetrad.search.FgesMb.reevaluateForward(FgesMb.java:1016)
at edu.cmu.tetrad.search.FgesMb.fes(FgesMb.java:810)
at edu.cmu.tetrad.search.FgesMb.search(FgesMb.java:233)
at edu.cmu.tetrad.algcomparison.algorithm.oracle.cpdag.FgesMb.search(FgesMb.java:70)
at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.lambda$runSearch$3(TetradRunner.java:214)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.runSearch(TetradRunner.java:213)
at edu.pitt.dbmi.causal.cmd.tetrad.TetradRunner.runAlgorithm(TetradRunner.java:130)
at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.runTetrad(CausalCmdApplication.java:149)
at edu.pitt.dbmi.causal.cmd.CausalCmdApplication.main(CausalCmdApplication.java:113)

@jdramsey
Copy link
Contributor

jdramsey commented Mar 1, 2023

I'm curious why I can't reproduce the error. A couple of questions. Can you tell me what platform you're on and what version of Java you're using? Maybe there's a known issue

@jdramsey
Copy link
Contributor

jdramsey commented Mar 1, 2023

Another thing I could try to make that a ConcurrentHashMap instead of a HashMap...

@jdramsey
Copy link
Contributor

jdramsey commented Mar 1, 2023

@jdramsey
Copy link
Contributor

jdramsey commented Mar 1, 2023

@Zarmas also could you give me some hints as to how to reproduce the error? I mean aside from sending me your data?

@Zarmas
Copy link
Author

Zarmas commented Mar 2, 2023

I am using it on ubuntu. The java version of the error message is 15.0.2 but I also got it on java 18.0.2 and java 11.0.18. This error appears when I am trying to run a sample size of 150 or more. It runs with no problems with 100 samples.

@Zarmas
Copy link
Author

Zarmas commented Mar 3, 2023

While I ran it on 20000 variables and a low sample size with no problems, when I tried to run it on a smaller variable size but with a larger sample size and got the concurrentModificationException error. Seems that the large sample size is what is causing it.

@Zarmas
Copy link
Author

Zarmas commented Mar 6, 2023

Another issue I came across while running the fges-mb algorithm, is that changing the maxDegree parameter changed the results, even if the degree of the graph didn't reach the number I set. For example with max degree of 10 I got a graph with 4 nodes and 3 edges, and with max degree 20 I got a graph with 2 nodes and 1 edge. All the other parameters and target were the same.

@jdramsey
Copy link
Contributor

jdramsey commented Mar 8, 2023

@Zarmas Sorry, I'm still thinking about this last one. I don't think it would solve anything, but I should update the FGES-MB code to use all of the updates to FGES since the last time I did that. I'm in the middle of other things, but I'll get to that.

@jdramsey
Copy link
Contributor

jdramsey commented Mar 16, 2023

@Zarmas I'm considering redoing the FGES-MB algorithm to use the latest FGES code. I may do that, not for the next release next week but for a subsequent release. But in light of your comments, I'm considering removing the max degree parameter from the code. Is that parameter helpful? If you set the parameter too low, you may impact the results, but it may be misleading if you don't know what to set it to.

@Zarmas
Copy link
Author

Zarmas commented Mar 16, 2023

I use the max degree parameter in order to make it easier to run with a lower penalty. For example I could run with penalty 4 only if I set maxDegree to 10. I understand that if there are more than 10 degrees associated with a node it may not show them, but it is curious to me why changing the max degree changes the results, with the same target and both results having less degrees than what was set in the parameter as said here: #74 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants