
TETRAD cmd search doesn't work for dataset with more than 100 nodes #122

Closed
biotech25 opened this issue Dec 27, 2015 · 17 comments

@biotech25

Hi Dr. Ramsey,

I have a question about running the command-line TETRAD PC search algorithm. I am running it on a dataset that has several hundred nodes, but it doesn't work. I found that command-line TETRAD fails if the dataset has more than 100 nodes (predictors and target); in other words, it works only for datasets with up to 100 nodes. The number of cases doesn't matter: it works well on a dataset with more than 1000 cases as long as the number of nodes stays at or below 100. However, it seems to fail once the number of nodes exceeds 100. Innumerable error messages were printed and scrolled past in the Windows command window. I took a screenshot of the error messages and attached it here. Does command-line TETRAD have a limitation restricting it to datasets with at most 100 nodes, or something like that?

Sanghoon

errormessage_morethan100nodes

@jdramsey
Collaborator

There's no such limitation that I know of. The error messages didn't come through.


Joseph D. Ramsey
Special Faculty and Director of Research Computing
Department of Philosophy
143 Baker Hall
Carnegie Mellon University
Pittsburgh, PA 15213

jsph.ramsey@gmail.com
Office: (412) 268-8063
http://www.andrew.cmu.edu/user/jdramsey

@biotech25
Author

Hmm... I tested the dataset with different numbers of nodes, such as 90, 100, 101, 105, 110, and 4136. I got the error messages only when the dataset had more than 100 nodes.

I attached a dataset that has 502 nodes (columns). Could you test it when you have time? It won't work. Then cut the dataset down to fewer than 100 nodes (columns) and run it again; it will work. I also tested different numbers of nodes in the TETRAD GUI and got the same result: it works only with 100 or fewer nodes.

The error messages can't be captured because innumerable messages scroll past very quickly.

tgen_imputed_apoe_ChiSqaure_501SNPs_p0.001.txt

@jdramsey
Collaborator

That file loads fine for me in the interface, and the method used is the same as for the command-line tool. I'm wondering if you're running out of memory. Try adding -Xmx4g to your java command, i.e. something like

java -cp ... -Xmx4g ...


@biotech25
Author

Thank you for your advice. I followed it and tested several things. First, I tested the heap-size extension (-cp -Xmx4g) on the dataset with 79 nodes and confirmed it was working. Then I applied the heap-size extension to the dataset with 502 nodes:

java -cp -Xmx4096m -jar lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar -data tgen_imputed_apoe_ChiSqaure_501SNPs_p0.001.txt -datatype discrete -algorithm pc -depth -1 -significance 0.01

When I set the heap-size extension and ran command-line TETRAD on the dataset with 502 nodes, I still got the same error messages scrolling past quickly, but I just waited until it ended. Despite the error messages, I surprisingly got TETRAD output, and the output correctly detected direct causes as I expected.

So this time I didn't set the heap-size extension, ran the TETRAD search, and again waited until it ended. I got the output files, but the output content (direct causes and the number of edge pairs) was different from the first output. That is strange.

I set the heap-size extension and ran the TETRAD search several more times. Strangely, I got different output each time I ran TETRAD under the same conditions. I tested this on a Linux workstation and got the same problem. I don't think the heap-size extension is working, at least on my computer, since I still get the same error messages and different output in each TETRAD run.

I am going to test more tomorrow and will let you know if I find something.

@jdramsey
Collaborator

Sorry, you can leave out the -cp the way you have it; you already have a -jar tag. I don't think it will make a difference.

java -Xmx4096m -jar lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar -data tgen_imputed_apoe_ChiSqaure_501SNPs_p0.001.txt -datatype discrete -algorithm pc -depth -1 -significance 0.01


@biotech25
Author

Thank you for your reply. I think you are right. When I left out -cp, I got this error message:

Invalid maximum heap size: -Xmx4096m
The specified size exceeds the maximum representable size.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.

So I set just "-Xmx1024m" and ran it again. Then I got the same innumerable error messages as yesterday. When I set "-Xmx3000m" or "-Xmx2000m", I got a single error message like the ones below:

Could not reserve enough space for 3072000KB object heap

Error occurred during initialization of VM
Could not reserve enough space for 2048000KB object heap
Start

I googled for a solution; I may have to change a system environment variable so the JVM can reserve a large enough heap at startup. I am going to try that and will keep you updated.

Thank you,
Sanghoon
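For reference, the "Could not reserve enough space for ... object heap" message is the classic symptom of a 32-bit JVM, which on Windows typically cannot reserve a contiguous heap much above roughly 1.5 GB regardless of installed RAM. A quick way to check, as a sketch (the exact wording of the version banner varies by vendor):

```shell
# A 64-bit JVM reports "64-Bit" in its version banner and can accept
# -Xmx4g; a 32-bit JVM generally cannot go much above ~1.5g on Windows.
if java -version 2>&1 | grep -q '64-Bit'; then
    echo "64-bit JVM: large -Xmx values should be accepted"
else
    echo "no 64-bit JVM found: -Xmx is capped well below 4g"
fi
```

If the banner does not report 64-Bit, installing a 64-bit JRE usually resolves these heap-size errors more directly than changing environment variables.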

@jdramsey
Collaborator

You don't have enough memory on your machine, I don't think.


@biotech25
Author

I agree with you. I need to reboot the Windows server or test it on a machine that I can reboot. I will update you later.

@biotech25
Author

I rebooted the computer and re-ran it, and I also tried it on a Mac and on another Windows server. It still doesn't work. I got the same error messages plus "java.lang.NullPointerException". I don't understand why I wouldn't have enough memory on every machine. I am going to test more.

@jdramsey
Collaborator

Tell me about the null pointer exception.

J


@biotech25
Author

I attached screenshots of the error messages. The null pointer exception doesn't tell me a lot.

The first screenshot shows the first part of the error output, printed as soon as I execute command-line TETRAD. Those messages scroll past very quickly for 1-2 seconds. Then, as you can see in the second screenshot, innumerable lines of 'java.lang.NullPointerException' scroll past for about 2 minutes; at that point output is still being written. Please let me know if there is more I can explain.

errormessage_part1
errormessage_part2
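When output scrolls past too quickly to read, redirecting both stdout and stderr to a file makes the first stack trace (usually the informative one) easy to inspect afterwards. A sketch reusing the command from earlier in this thread; the jar and data file names are assumed to match that setup:

```shell
# Capture everything the run prints (stdout and stderr) into one log file;
# continue even if the run fails, since the log is what we want.
java -Xmx1024m -jar lib-tetrad-5.3.0-20151113.150857-1-tetradcmd.jar \
    -data tgen_imputed_apoe_ChiSqaure_501SNPs_p0.001.txt \
    -datatype discrete -algorithm pc -depth -1 -significance 0.01 \
    > tetrad_run.log 2>&1 || true
# The first exception in the log is usually the root cause.
head -n 40 tetrad_run.log
```

A full log is also much easier to attach to an issue than screenshots of a scrolling console.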

@jdramsey
Collaborator

Do you have any missing values in your data?


@biotech25
Author

Well, I had suspected that and tested it in several ways. When I open a txt file in Excel, if there is a missing value I can find it with the 'Find and Replace' function (I verified this after deleting one value on purpose), but I didn't find any missing values. So I duplicated columns from a dataset with 78 SNPs, which was working well, to make a new dataset with more than 100 nodes; I got the same problem, with innumerable error messages scrolling past. I made the dataset have 104 nodes: still the same problem. I deleted 4 nodes to bring it to 100 nodes: then it works well, without error messages.
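Excel's Find and Replace can miss the kinds of problems a parser chokes on, such as rows whose column count differs from the header, or blank fields hidden at line ends. As a sketch, a small shell helper for checking this in a tab-delimited file (the tab delimiter is an assumption about the attached data):

```shell
# check_table FILE: report rows whose column count differs from the
# header row, and any empty fields, in a tab-delimited text file.
check_table() {
    awk -F'\t' '
        NR == 1 { n = NF }                 # remember the header width
        NF != n { printf "line %d: %d columns, expected %d\n", NR, NF, n }
        { for (i = 1; i <= NF; i++)
              if ($i == "") printf "line %d: field %d is empty\n", NR, i }
    ' "$1"
}
```

For example, `check_table tgen_imputed_apoe_ChiSqaure_501SNPs_p0.001.txt` prints nothing if every row matches the header width and no field is empty.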

@jdramsey
Collaborator

Let me try asking a different way. Can you add any other nodes to the working set and still have it work? I mean, other than the ones you just deleted? Maybe the problem is with those 4 nodes?


@biotech25
Author

Well, the 4 nodes I deleted are ones I duplicated from a dataset that ran successfully. But I followed your advice, added other nodes instead, and tested that; still the same problem. Once I delete enough nodes to bring the total down to 100, it works well.

@jdramsey
Collaborator

I wish I could dig further into this right now, but I'm busy with other things. Will you have a chance to look at the code?

J


@biotech25
Author

I am going to look at the code. I understand that you are busy and I appreciate your help!

@jdramsey jdramsey closed this as completed Jan 3, 2016