Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MDSVALUE in MATLAB breaks when reading Signals #2660

Open
ModestMC opened this issue Dec 7, 2023 · 20 comments · Fixed by #2661
Open

MDSVALUE in MATLAB breaks when reading Signals #2660

ModestMC opened this issue Dec 7, 2023 · 20 comments · Fixed by #2661
Assignees
Labels
api/matlab Relates to the Matlab API branch/alpha This is present on or relates to the alpha branch bug An unexpected problem or unintended behavior US Priority

Comments

@ModestMC
Copy link

ModestMC commented Dec 7, 2023

Affiliation
DIII-D / General Atomics

Version(s) Affected
alpha-7.139.59, best we can confirm is that it worked in 6.1.84 (modifying MATLAB installs is a good amount of work)

Platform
RHEL 8

Describe the bug
While working on compiling our main tokamak simulation for our new cluster, we encountered an error with the MATLAB API. In short, it appears the current logic fails to correctly do the handling necessary to read signals.

To Reproduce
Steps to reproduce the behavior:

  1. Open MATLAB (error first identified on MATLAB 2023b)
  2. mdsconnect to the server and mdsopen the tree/shot of interest
  3. mdsvalue(the problematic node)

Expected behavior
The signal data stored in the node should be returned, instead the Java Connection class fails to properly handle the signal before passing it into the JavatoMATLAB case statement.

Screenshots
The error in action
Screenshot 2023-12-07 at 2 14 49 AM

Old version working
image

Problem Signal's inner contents
image

Lazy incomplete workaround that appears to correctly resolve
image

The apparent section of the code that needs to be amended/fixed/modified
image

Additional context
It may also be worth looking at the actual MTIME signal (can provide more information since you guys should have the necessary access) to confirm whether the syntax used in that node is not problematic, but I'm fairly certain it is not.

I also hoped to look at your existing MATLAB test suite, but your Jenkins instance appears to be down.

@ModestMC ModestMC added the bug An unexpected problem or unintended behavior label Dec 7, 2023
@GabrieleManduchi
Copy link
Contributor

The problem is due to a recent improvement of the Connection object that now is also able to transfer data types other than scalars and array. However the signal that now shows up as result of Connection.get() was not properly handled by the matlab routine JavaToMatlab.m I will fix this in a couple of days.

@mwinkel-dev mwinkel-dev added US Priority api/matlab Relates to the Matlab API branch/alpha This is present on or relates to the alpha branch labels Dec 7, 2023
@ModestMC
Copy link
Author

I attempted a revert of the improvement described in PR #2620 and triggered a rebuild and the error still occurs. What from the changes you introduced at this time would have broken the feature?

I tried loading an older version that predates any of these feature sets (alpha_release-7-108-1) and my autobuilder broke, but in reading over the code I'm not seeing too much.
image
image

@GabrieleManduchi
Copy link
Contributor

Looking into the code, I noted that in the Java Connection there is a (wrong) possibility that in some condition the expression gets directly evaluated (without data()). This would explain what you see and is not related to the APD stuff. I will issue later a PR so that you cat try it

@GabrieleManduchi
Copy link
Contributor

You may try branch gm-fix-apd that should contain, among others, a fix to the problem (basically due to the fact that the java Connection.get() method did not force a data() evaluation).
The checks for the PR are in progress now, but meanwhile you can check that branch in order to verify the problem.

@joshStillerman
Copy link
Contributor

The Jenkins server is undergoing maintenance. In the meanwhile, we will run the tests manually. Can you GA verify that Gabriele's branch fixes this problem?

@ModestMC
Copy link
Author

ModestMC commented Dec 11, 2023

I pulled the new branch and attempted to use it, but got the same error. What did you guys use to test the fix on your end?

@GabrieleManduchi
Copy link
Contributor

Hmmm.. really strange, we tested it on our side on a whole set of MATLAB tests (not yet on jenkins). Could you send me the failing MATLAB code and possibly a description (e.g. via TCL show data) of the accessed node?

@GabrieleManduchi
Copy link
Contributor

BTW the fix is not in the MATLAB MDSplus library, but in the Java mdsobjects library. Did you update it?

@ModestMC
Copy link
Author

ModestMC commented Dec 12, 2023

I believe these are the changes to the Java mdsobjects library you're describing.
Screenshot 2023-12-12 at 10 08 10 AM

Here is the data inside of the node itself that breaks when mdsvalue is called.
Screenshot 2023-12-12 at 10 15 16 AM
It also fails when the node is not self-referential.
Screenshot 2023-12-12 at 10 25 07 AM
Screenshot 2023-12-12 at 10 26 05 AM

@GabrieleManduchi
Copy link
Contributor

Are you sure that your MATLAB is using the right jar? Changing jars version on MATLAB is sometimes tricky....
We tested our version (branch gm-fix-apd) exactly in the same condition of your example and everything works as expected (while the current alpha HAS the problem)

@GabrieleManduchi
Copy link
Contributor

Did you get a chance for checking this?

@ModestMC
Copy link
Author

ModestMC commented Dec 18, 2023

@GabrieleManduchi Not exactly. I've been having to debug our clusterwide MATLAB installation's way of loading MDSplus. For the past several years, we have made MDSplus available to MATLAB by having one symlink that's used for the java modifications MDSplus needs (two lines in classpath.txt and one line in the librarypath.txt). It turns out there's no good way to point the symlink at a user development version of MDSplus (also it means our MATLAB version is functionally pinned).

Anyways I've now spent probably half a day's worth of time collectively trying to figure out if I can somehow get lmod to be able to load in a particular version of MDSplus (we have a way of doing this for the development versions as well) and have MATLAB pull the necessary paths from the environment variables. I'm not sure there's a way to do this (see here), so for now I'm willing to accept defeat.

If you're sure about your fix, I think our only option is going to be to take it on faith that it works and release a new build for our cluster. I got official blessing to monkey with the clusterwide installation to test out the fix and it correctly retrieved signals. I couldn't do any serious/robust testing, but I think this will eliminate the main blocker.

@ModestMC ModestMC mentioned this issue Dec 18, 2023
@mwinkel-dev
Copy link
Contributor

The gm-fix-apd branch passed manual testing on Ubuntu 20 for this fix. See posts in PR #2661.

@mwinkel-dev
Copy link
Contributor

mwinkel-dev commented Dec 22, 2023

Gabriele's posts of 11-Dec-2023 (above) indicate that only some of the PR #2661 files are needed to fix this MATLAB bug. Thus, I created an experimental (private) branch with just the following files from the PR.

deploy/packaging/debian/kernel.noarch
deploy/packaging/redhat/kernel.noarch
include/mdsobjects.h
java/mdsobjects/src/main/java/MDSplus/Connection.java
javamds/mdsobjects.c
mdsobjects/cpp/mdsipobjects.cpp
tdi/treeshr/TreePutDeserialized.fun

When using the experimental branch, MATLAB was able to retrieve and plot a signal. (And for the sake of completeness, I also confirmed that the current alpha branch does fail with the errors shown in the initial bug report.)

Note though that the test-tab.ans file still needs to be updated with the new TreePutDeserialized.fun function. However, the experimental branch passed all other automated tests, and also passed the IDL tests.

If the decision is made to split PR #2661 into new PRs -- one for MATLAB, and the other for APD (aka "Array of Pointers to Descriptors") -- it would be best to have Gabriele review the proposed split.

@joshStillerman
Copy link
Contributor

I thought they were complaining that MATLAB did. not deal with APDs.

@GabrieleManduchi
Copy link
Contributor

Please keep the whole branch since it contains a set of(related) improvements for handling apds over connection, a feature requested by tcv and already extensively tested.

@mwinkel-dev
Copy link
Contributor

Hi @joshStillerman and @GabrieleManduchi -- Thanks for both of your posts. I was merely doing research so that PSFC can discuss the options when the software team meets in early January.

Josh -- General Atomics was not reporting an issue with "array of pointers to descriptors" (APD). This bug #2660 is a more fundamental issue, namely that the MATLAB API cannot retrieve data from a signal. I reproduced that bug by attempting to retrieve a signal from a C-Mod shot. The retrieval failed with the same errors shown in the initial bug report. Note that Gabriele posted above that the fix for the MATLAB signal issue has been included in PR #2661 for the APD issue.

Gabriele -- Thanks for the advice. I do not intend to split PR #2661 into two PRs. Note though that GA has requested its own branch. I was merely researching options in case GA requests that their branch have a MATLAB-only fix (i.e., if GA wants to exclude the APD fix).

Note -- The long-term goal is to run MATLAB tests as part of the build system (however that requires adding some infrastructure). A good starting point will be to write MATLAB test cases that are comparable to those in the IDL Test Harness.

@mwinkel-dev
Copy link
Contributor

mwinkel-dev commented Jan 2, 2024

Results of additional experiments . . .

alpha_release-7-139-49 (PR #2622, 22-Sep-2023, 21ead65) = MATLAB is able to retrieve and plot a signal

alpha_release-7-139-50 (PR #2620, 23-Sep-2023, d996b0c) = MATLAB throws errors when attempt to retrieve a signal

@ModestMC
Copy link
Author

ModestMC commented Jan 2, 2024

@mwinkel-dev can you tell me what all these APD fixes will impact in the codebase (eg. what all we'd want to test internally before using them)? I assume you've already looked over the same files I was about to go read for myself, figure this is simpler.

@mwinkel-dev
Copy link
Contributor

Hi @ModestMC -- I have not yet reviewed the "Array of Pointers to Descriptors" (APD) changes in detail. They originated with PR #2620 (and also require pending PR #2661). My assumption (perhaps incorrect) is that the APD feature applies to all APIs.

If GA uses APDs in its queries, then it would be best to take the entire PR #2661.

However, if GA never uses APDs, then it might be possible to use "alpha 7-139-49" as the base for the GA branch. Note though that "7-139-49" predates the IDL fixes made in October, thus there would be several cherry picks that GA would probably want.

I am presently working on porting the IDL tests to MATLAB. Initially, the MATLAB test suite will be run manually. But after the maintenance work on the build server is completed, the MATLAB tests will run automatically as part of the PR review process.

@mwinkel-dev mwinkel-dev linked a pull request Jan 10, 2024 that will close this issue
joshStillerman added a commit that referenced this issue Feb 20, 2024
#2661 changed the expr
to be evaluated enclosing it in a call to `serializeout`

When connecting to older mdsip servers - like:
```
TCL> show version

MDSplus version: 7.78.6
----------------------
  Release:  alpha_release-7-78-6
  Browse:   https://github.com/MDSplus/mdsplus/tree/alpha_release-7-78-6
  Download: https://github.com/MDSplus/mdsplus/archive/alpha_release-7-78-6.tar.gz
  Build date: Wed Jun 26 18:51:44 UTC 2019

TCL>
```
This causes calls like TreeOpen('tree', shot) to generate an infinte
recursion error,

This PR removes line 208 from connection.py

GABRIELE:  Does this break #2660/#2661 ??
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api/matlab Relates to the Matlab API branch/alpha This is present on or relates to the alpha branch bug An unexpected problem or unintended behavior US Priority
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants