Make test/score that will work for algebraically defined nonlinear models. #1669

Open
uvnikgupta opened this issue Jul 26, 2023 · 42 comments

Comments

@uvnikgupta

Loading the attached csv throws the following exception:

Infer demiliter for file: 20_nodes_normal.csv
Exception in thread "AWT-EventQueue-0" java.lang.NoSuchMethodError: java.nio.ByteBuffer.clear()Ljava/nio/ByteBuffer;
    at edu.pitt.dbmi.data.reader.util.TextFileUtils.inferDelimiter(TextFileUtils.java:135)
    at edu.cmu.tetradapp.editor.LoadDataSettings.getInferredDelimiter(LoadDataSettings.java:882)
    at edu.cmu.tetradapp.editor.LoadDataSettings.basicSettings(LoadDataSettings.java:503)
    at edu.cmu.tetradapp.editor.LoadDataDialog.showDataLoaderDialog(LoadDataDialog.java:165)
    at edu.cmu.tetradapp.editor.LoadDataAction.actionPerformed(LoadDataAction.java:91)
    at javax.swing.AbstractButton.fireActionPerformed(AbstractButton.java:2022)
    at javax.swing.AbstractButton$Handler.actionPerformed(AbstractButton.java:2348)
    at javax.swing.DefaultButtonModel.fireActionPerformed(DefaultButtonModel.java:402)
    at javax.swing.DefaultButtonModel.setPressed(DefaultButtonModel.java:259)
    at javax.swing.AbstractButton.doClick(AbstractButton.java:376)
    at javax.swing.plaf.basic.BasicMenuItemUI.doClick(BasicMenuItemUI.java:842)
    at javax.swing.plaf.basic.BasicMenuItemUI$Handler.mouseReleased(BasicMenuItemUI.java:886)
    at java.awt.Component.processMouseEvent(Component.java:6539)
    at javax.swing.JComponent.processMouseEvent(JComponent.java:3324)
    at java.awt.Component.processEvent(Component.java:6304)
    at java.awt.Container.processEvent(Container.java:2239)
    at java.awt.Component.dispatchEventImpl(Component.java:4889)
    at java.awt.Container.dispatchEventImpl(Container.java:2297)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.LightweightDispatcher.retargetMouseEvent(Container.java:4904)
    at java.awt.LightweightDispatcher.processMouseEvent(Container.java:4535)
    at java.awt.LightweightDispatcher.dispatchEvent(Container.java:4476)
    at java.awt.Container.dispatchEventImpl(Container.java:2283)
    at java.awt.Window.dispatchEventImpl(Window.java:2746)
    at java.awt.Component.dispatchEvent(Component.java:4711)
    at java.awt.EventQueue.dispatchEventImpl(EventQueue.java:760)
    at java.awt.EventQueue.access$500(EventQueue.java:97)
    at java.awt.EventQueue$3.run(EventQueue.java:709)
    at java.awt.EventQueue$3.run(EventQueue.java:703)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:84)
    at java.awt.EventQueue$4.run(EventQueue.java:733)
    at java.awt.EventQueue$4.run(EventQueue.java:731)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.security.ProtectionDomain$JavaSecurityAccessImpl.doIntersectionPrivilege(ProtectionDomain.java:74)
    at java.awt.EventQueue.dispatchEvent(EventQueue.java:730)
    at java.awt.EventDispatchThread.pumpOneEventForFilters(EventDispatchThread.java:205)
    at java.awt.EventDispatchThread.pumpEventsForFilter(EventDispatchThread.java:116)
    at java.awt.EventDispatchThread.pumpEventsForHierarchy(EventDispatchThread.java:105)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:101)
    at java.awt.EventDispatchThread.pumpEvents(EventDispatchThread.java:93)
    at java.awt.EventDispatchThread.run(EventDispatchThread.java:82)

[20_nodes_normal.csv](https://github.com/cmu-phil/tetrad/files/12176485/20_nodes_normal.csv)

@jdramsey
Collaborator

Actually your file didn't come through; you may need to zip it before attaching it (I've found)...

@jdramsey
Collaborator

One second I found your link...

@jdramsey
Collaborator

jdramsey commented Jul 26, 2023

Ah. It's not a covariance matrix. You can load it as tabular data--see the picture I took.

[Screenshot 2023-07-26 at 3 26 00 PM]

@jdramsey
Collaborator

Hold on, sorry, you didn't actually say it was a covariance matrix. But huh, it loads for me..... can you tell me more about how you're trying to load it?

@kvb2univpitt
Collaborator

@uvnikgupta What version of Java are you using?

@uvnikgupta
Author

Java version:
openjdk version "1.8.0_332"
OpenJDK Runtime Environment (Temurin)(build 1.8.0_332-b09)
OpenJDK 64-Bit Server VM (Temurin)(build 25.332-b09, mixed mode)

I am launching the jar using :
java -Xmx2G -jar tetrad-gui-7.4.0-launch.jar
[three images attached]

@jdramsey
Collaborator

Thanks for the update. Sorry, I was multitasking yesterday. This is a bug we know about (thanks @kvb2univpitt). The issue (if you want to know) is that Oracle changed the implementation of the ByteBuffer class so that it's incompatible between version 1.8 and versions > 1.8. It's this bug:

https://www.morling.dev/blog/bytebuffer-and-the-dreaded-nosuchmethoderror/

except in your case it's the clear() method that's the problem and not the position() method. You're using OpenJDK 1.8, I'm guessing on a Linux box? (Actually, can you confirm that?) What I'll do (sorry, just trying different things here) is try the casting they suggest in the article to see if it works in OpenJDK 1.8 for me. (Unfortunately it needs to work both for 1.8 and for > 1.8, which is the issue.) I'm on a Mac at the moment and the only JDK 1.8 I can get anymore is Amazon's, and it's not a problem there. When I get back home today I'll try installing OpenJDK 1.8 on my Windows laptop (I think I can still do that, though I can no longer get it from M$) and test it there. But really what I need to do is test it on Linux, using OpenJDK 1.8, and I don't have a Linux box currently.

If I made you a version (or maybe two versions) to test, would you be willing to try them out on your machine? That would help a lot.

@uvnikgupta
Author

@jdramsey, Thanks a lot for explaining the issue.
I am using Windows 10.
Yes, I am ok to try the test versions

@jdramsey
Collaborator

Awesome--Let me grab the Mac version now and test it, and then I can download the Windows one later and test it there. Fingers crossed! We (well @kvb2univpitt) were thinking of rewriting that section of code without using ByteBuffer, but hopefully this fixes it without that effort.

@jdramsey
Collaborator

Actually they're not providing any Mac options--it's in their selector but you only get Windows options in the list. I'm at the office right now but can do this later when I get home; my Windows laptop is there.

I just tested it using Amazon's Corretto 1.8 on Mac and it works there, though I suspect Amazon may have gone in and fixed the issue internally.

@jdramsey
Collaborator

Oh hold on, they did have it! It's just that their dropdown was broken; I had to select "all" and then the Mac options showed up. I tested it--it works! That gives me some confidence that it will work on Windows as well using a Windows 1.8 download from this site, but I can test it later.

@kvb2univpitt
Collaborator

The problem goes away if you use Java 11 and above.

@jdramsey
Collaborator

@kvb2univpitt I am motivated to figure it out because we have users who are not in a position to grab a newer version of Java. I may have figured it out though--I'll let you know! I'm going to test it now on Windows.

@uvnikgupta
Author

> @kvb2univpitt I am motivated to figure it out because we have users who are not in a position to grab a newer version of Java. I may have figured it out though--I'll let you know! I'm going to test it now on Windows.

I am one of those in that group :)

@kvb2univpitt
Collaborator

kvb2univpitt commented Jul 27, 2023

@jdramsey We definitely need to get rid of the ByteBuffer. By "we" I mean "me".

@jdramsey
Collaborator

jdramsey commented Jul 28, 2023

@uvnikgupta @kvb2univpitt Could you both try to break this version? I.e., launch it, try to load a dataset...

https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.4.0-SNAPSHOT/tetrad-gui-7.4.0-20230728.001143-5-launch.jar

If it works I will tell you what I did.

@uvnikgupta
Author

uvnikgupta commented Jul 28, 2023

> @uvnikgupta @kvb2univpitt Could you both try to break this version? I.e., launch it, try to load a dataset...
>
> https://s01.oss.sonatype.org/content/repositories/snapshots/io/github/cmu-phil/tetrad-gui/7.4.0-SNAPSHOT/tetrad-gui-7.4.0-20230728.001143-5-launch.jar
>
> If it works I will tell you what I did.

Sure. On it :)

Tried different datasets and it seems to work pretty fine now 👍
Thanks for the quick fix

@uvnikgupta
Author

Tried a few more, and data loading + Search works flawlessly. The only issue now is that the resulting graph is nowhere close to the actual graph :( I guess that is the state of the existing discovery algorithms, due to the nature of the problem.

@jdramsey
Collaborator

I'm very curious what experience Kevin has. I compiled this under Corretto 1.8 and have no trouble running under 1.8 or 11 on my Mac, so if you have no trouble on Windows, I'll try under 11 under Windows.

Not sure what to say about the content. Maybe if you tell me the general nature of the problem and what you've tried I could comment?

@uvnikgupta
Author

I am loading the data and connecting it to the search box, then executing search using different algorithms, and finally comparing the result with the actual DAG. The data and the actual DAG are attached for your reference.
20_nodes_normal.csv
[image]

BTW, I encountered a Null pointer issue when I tried to use the "Regression"
[image]

@cg09

cg09 commented Jul 28, 2023 via email

@uvnikgupta
Author

I am not able to attach my data generator .py file, so the formulae are below:

"A1": "0.0",
"A2": "0.0",
"A3": "0.0",
"A4": "0.0",
"A5": "0.0",
"A6": "0.0",
"A7": "0.0",
"A8": "0.0",
"B1": 'data_2["A1"]**2',
"B2": 'data_2["A1"]',
"C2": 'np.sqrt(np.abs(data_2["B1"]))',
"C3": 'data_2["B1"] * data_2["B2"]',
"D2": 'data_2["C2"]**2 + data_2["C3"] - data_2["A2"]**2',
"C4": 'data_2["B2"]**3',
"D3": 'np.sqrt(np.abs(data_2["C4"]))',
"B3": 'data_2["A4"]**2 + data_2["A5"]',
"C1": 'data_2["B3"]**2',
"D1": 'np.round(np.mod(1000*data_2["C1"], 10), 3)',
"E1": 'np.abs(data_2["A3"])**2/(data_2["D1"] + .001)',
"F1": '2*data_2["D2"] + data_2["D3"] - data_2["E1"]*data_2["A6"] + 8*data_2["A7"]/data_2["A8"]'

I add np.random.normal(loc=5, scale=1, size=self.size) to each of the variables above
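
For readers who want to reproduce data of this shape, here is a minimal sketch of how the formulae above could be evaluated. It is not the author's generator, which was not attached. The names `FORMULAE`, `generate`, and `true_edges` are hypothetical. Two assumptions are made: the `*` operators that the page rendering appears to have dropped in `D1` and `F1` are restored, and the noise is added to each variable before its children are computed, so children see noisy parents.

```python
import re

import numpy as np

# Formulae as posted in the thread. The "*" operators in D1 and F1 are
# restored by assumption. Dict order is already a valid causal order.
FORMULAE = {
    "A1": "0.0", "A2": "0.0", "A3": "0.0", "A4": "0.0",
    "A5": "0.0", "A6": "0.0", "A7": "0.0", "A8": "0.0",
    "B1": 'data_2["A1"]**2',
    "B2": 'data_2["A1"]',
    "C2": 'np.sqrt(np.abs(data_2["B1"]))',
    "C3": 'data_2["B1"] * data_2["B2"]',
    "D2": 'data_2["C2"]**2 + data_2["C3"] - data_2["A2"]**2',
    "C4": 'data_2["B2"]**3',
    "D3": 'np.sqrt(np.abs(data_2["C4"]))',
    "B3": 'data_2["A4"]**2 + data_2["A5"]',
    "C1": 'data_2["B3"]**2',
    "D1": 'np.round(np.mod(1000*data_2["C1"], 10), 3)',
    "E1": 'np.abs(data_2["A3"])**2/(data_2["D1"] + .001)',
    "F1": '2*data_2["D2"] + data_2["D3"] - data_2["E1"]*data_2["A6"]'
          ' + 8*data_2["A7"]/data_2["A8"]',
}


def generate(size, seed=0):
    """Evaluate each formula in order and add N(5, 1) noise to the result.

    Assumption: noise is added before children are computed.
    """
    rng = np.random.default_rng(seed)
    data_2 = {}
    for name, expr in FORMULAE.items():
        signal = eval(expr, {"np": np, "data_2": data_2})
        data_2[name] = signal + rng.normal(loc=5, scale=1, size=size)
    return data_2


def true_edges():
    """Read the (parent, child) edges of the true DAG off the formulae."""
    return {
        (parent, child)
        for child, expr in FORMULAE.items()
        for parent in re.findall(r'data_2\["(\w+)"\]', expr)
    }
```

`generate(1000)` returns a dict of 20 NumPy arrays keyed by variable name, and `true_edges()` gives the edge set that a search result can be compared against.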

@jdramsey
Collaborator

They are not terribly Gaussian. By the way @uvnikgupta if you'd like to switch to email I'm happy. @cg09 if you load up the data that was sent in the version of Tetrad given above and use the Plot Matrix tool you can see the distributions of the variables.

@uvnikgupta
Author

uvnikgupta commented Jul 28, 2023

> They are not terribly Gaussian. By the way @uvnikgupta if you'd like to switch to email I'm happy. @cg09 if you load up the data that was sent in the version of Tetrad given above and use the Plot Matrix tool you can see the distributions of the variables.

Yes, I can share my data generation Python code then. Please DM me at

@jdramsey
Collaborator

That's what I thought--nonlinear algebraic functions generated them...You know we were just thinking of how to incorporate this sort of nonlinear additivity into a fast score...

@cg09

cg09 commented Jul 28, 2023 via email

@uvnikgupta
Author

uvnikgupta commented Jul 28, 2023 via email

@jdramsey
Collaborator

jdramsey commented Aug 3, 2023

Sorry I haven't gotten back to you--we're all at the UAI conference here in Pittsburgh. I thought about the 1.8 issue and think the thing to do is to publish a separate version compiled under 1.8. I'm going to try to get this done today.

@uvnikgupta
Author

uvnikgupta commented Aug 5, 2023 via email

@jdramsey
Collaborator

Sorry for the delay--we had a couple of dissertation defenses in the last week. Getting back to this.

I need to look at your Python code more carefully to see what assumptions are being honored. It wasn't clear to me on my first gander.

We had made a nonlinear simulator using Gaussian processes (and additive simulation) and GRaSP/BOSS did pretty well on that, but when we looked at the distributions, all of the functions had linear trends. It's been noticed in the past (I can get you a reference) that linear Gaussian scores like LG BIC tend to do OK whenever there are linear trends, and besides this, GRaSP/BOSS tend to do OK under a rather significant weakening of the faithfulness assumption, so some "sins" can be forgiven by the procedure. What I know will give the procedure difficulty are the square and absolute value functions you use, which give dependencies but not because of linear trends. I'm wondering, if you took those out, how well would the algorithms do?
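
The point about linear trends can be checked numerically. The sketch below is illustrative only and uses made-up variables, not the thread's dataset. It compares the Pearson correlation between an input and its square (or absolute value) for an input centred at 0 and for one centred at 5. The second case is included because the generator described earlier adds noise with `loc=5`, and over that range a square is close to a straight line.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
noise = rng.normal(size=n)


def pearson(a, b):
    """Sample Pearson correlation, i.e. the strength of the linear trend."""
    return float(np.corrcoef(a, b)[0, 1])


x_centered = rng.normal(loc=0, scale=1, size=n)
x_shifted = rng.normal(loc=5, scale=1, size=n)

# Square and absolute value of a zero-centred input: strong dependence,
# but essentially no linear trend for a linear Gaussian score to pick up.
r_centered = pearson(x_centered, x_centered**2 + noise)
r_abs = pearson(x_centered, np.abs(x_centered) + noise)

# Square of an input centred at 5: nearly linear over the sampled range.
r_shifted = pearson(x_shifted, x_shifted**2 + noise)

print(f"square, centred input: r = {r_centered:.3f}")
print(f"abs,    centred input: r = {r_abs:.3f}")
print(f"square, shifted input: r = {r_shifted:.3f}")
```

So whether a given square or absolute value term hides its dependence from a linear score depends on where its input is centred, which is worth checking variable by variable in the Plot Matrix before removing any functions.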

@jdramsey
Collaborator

@uvnikgupta Wondering, have you had a chance to look at this?

@cg09

cg09 commented Aug 22, 2023 via email

@jdramsey
Collaborator

Oh, I'm just trying to review outstanding issues and see what needs to be done. This particular issue involves trying to generalize to more algebraic functional forms for larger models, something I'm interested in and thinking of how to do.

@jdramsey
Collaborator

I mean we do have the KCI general independence test, but it won't scale far enough for the problems suggested here. Also, it would be good to have a general score, and we've never implemented Biwei's general score in Tetrad, but Biwei's score won't handle these problems; there are too many variables, and the sample sizes are too large. I've been thinking about scores that are more general than LG but perhaps not completely general, which could handle a variety of distributions (but perhaps not all) and might be fast. I ask everyone I talk to whether they can think of such scores but no takers so far. I agree though it would be nice to have and a contribution to the literature.
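
To make "more general than LG but not completely general" concrete, here is one hypothetical candidate, offered only as an illustration. It is not a Tetrad API and not a method anyone proposed in this thread. It is a Gaussian BIC in which each parent is expanded into polynomial features before the regression. The functions `bic_score` and `poly_features` are invented for this sketch, and the data are a toy pair with `y = x**2 + noise`.

```python
import numpy as np


def bic_score(y, features):
    """Gaussian BIC of regressing y on an intercept plus feature columns.

    Higher is better. `features` is a list of 1-D arrays (may be empty).
    """
    n = len(y)
    design = np.column_stack([np.ones(n)] + list(features))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ coef) ** 2))
    k = design.shape[1]
    return -n * np.log(rss / n) - k * np.log(n)


def poly_features(x, degree):
    """Expand one parent into the powers x, x**2, ..., x**degree."""
    return [x**d for d in range(1, degree + 1)]


rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = x**2 + rng.normal(size=n)

null = bic_score(y, [])
linear_gain = bic_score(y, poly_features(x, 1)) - null
quadratic_gain = bic_score(y, poly_features(x, 2)) - null

print(f"gain from adding x as a linear parent:    {linear_gain:.1f}")
print(f"gain from adding x as a quadratic parent: {quadratic_gain:.1f}")
```

On this toy pair the quadratic expansion rewards the edge strongly while the linear score barely moves. The obvious cost is that the feature count grows quickly once products of parents (such as the `B1 * B2` term above) or functions like `mod` and `sqrt` have to be covered, so this sketch does not answer the scaling question raised here.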

@cg09

cg09 commented Aug 22, 2023 via email

@jdramsey
Collaborator

Interesting....

@jdramsey
Collaborator

jdramsey commented Oct 2, 2023

@uvnikgupta Sorry, I ended up with so many things to do at the beginning of term that I was losing track of them in my head. Let me write this one down so I can work on it some.

(I made a long to-do list recently and ordered it in terms of priorities. I think this is going to help.)

@jdramsey
Collaborator

jdramsey commented Oct 2, 2023

@uvnikgupta Let me characterize the problem this way. Is there a test/score that could be used that would recover at least approximately the correct DAG when the data are generated with simple combinations of functions? What combinations can work and which can't?

Is that fair?

@jdramsey
Collaborator

jdramsey commented Oct 2, 2023

@uvnikgupta Perhaps one of us should look to see if there's any literature on this already.

@uvnikgupta
Author

@jdramsey Sorry, I am not sure I understand your question completely. Are we trying to find a score that would compare a set of equations to the generated DAG? If yes, then I do not understand why: if I know the equations, I can already create the original DAG and then use scores like SHD to compare the generated graph against the original.
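
For reference, the SHD comparison mentioned here can be sketched as follows. This is a minimal version under stated assumptions: both graphs are given as sets of directed `(parent, child)` pairs, a missing or extra adjacency counts 1, and a reversed edge counts 1. Conventions differ between papers and tools, and this sketch does not handle the undirected edges of a CPDAG. The function name `shd` is hypothetical.

```python
def shd(true_edges, est_edges):
    """Structural Hamming distance between two directed graphs.

    Each graph is an iterable of (parent, child) pairs. A missing or extra
    adjacency counts 1; a shared adjacency with reversed direction counts 1.
    """
    true_edges, est_edges = set(true_edges), set(est_edges)
    true_adj = {frozenset(edge) for edge in true_edges}
    est_adj = {frozenset(edge) for edge in est_edges}

    # Adjacencies present in exactly one of the two graphs.
    skeleton_diff = len(true_adj ^ est_adj)

    # Adjacencies present in both graphs but oriented differently.
    shared = true_adj & est_adj
    flipped = sum(
        1
        for edge in true_edges
        if frozenset(edge) in shared and edge not in est_edges
    )
    return skeleton_diff + flipped


truth = {("A", "B"), ("B", "C")}
estimate = {("B", "A"), ("B", "C"), ("A", "C")}
print(shd(truth, estimate))  # one reversed edge plus one extra edge
```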

@jdramsey
Collaborator

jdramsey commented Oct 5, 2023

@uvnikgupta That is, does anyone have a strategy for searching a dataset with > 20 variables where the variables are generated by an SEM with the kinds of functions you're using? Also, with the sample sizes you have in mind?

You could use a general test like KCI, but it won't scale that far.

@jdramsey changed the title from "Loading csv throws exception" to "Make test/score that will work for algebraically defined nonlinear models." on Nov 19, 2023