jScope freezes during loading with a "broken pipe" error #2704
Comments
This is likely a network issue regarding the We will attempt to reproduce the issue using GA's computers. And then using MIT PSFC's computers. It is likely that eventually the troubleshooting will require the assistance of GA's networking specialists.
Brian reported (via email) that this problem started ~4 months ago. Prior to that he used jScope for years without any issues.
Client computer details:
Hi @victorbs -- Thanks for the information. Much appreciated.
Hi @ModestMC -- Am hoping you can provide some additional context regarding this issue.
Thanks,
Just for information, jScope has not been changed at least in the last year.
Hi there, just a hunch, but since it's a broken pipe: what protocol are you using to connect to the server (plain mdsip, via tunnel, ssh)? I am not familiar with NoMachine.
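As a rough aid for the protocol question above, a plain TCP reachability check can help separate "the port is not reachable at all" from "the connection opens but later drops". This is a minimal sketch using only the Python standard library. The helper name `can_connect` is made up for illustration, and the host and port in the usage comment are assumptions (mdsip servers commonly listen on port 8000, but the actual setup at GA is not stated in this thread).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Hypothetical usage; replace the host and port with the real server details:
# print(can_connect("mdsip-server.example.org", 8000))
```

A successful check only shows that the port accepts connections at that moment. It says nothing about a tunnel or firewall dropping an idle session later on.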
Hi @ModestMC, Would also appreciate it if you can check the And as per @zack-vii's post above, would also be good to check the various system logs on the Atlas server for networking issues. Thanks, -MarkW
Iris uses 6.1.84 as its default, Atlas was updated in November from 7.96.?? to 7.139.59 (also the OS went from RHEL6 to RHEL8). Without an exact timeframe of the change from Brian, it's hard to say whether this was or was not what gave rise to the issue. We will not be updating the version on Iris, so this might not be worth trying to reproduce. Our recommendation is that @victorbs try using JScope on Omega (which also runs 7.139.59) to see if the bug persists. Many users here also use Reviewplus or OMFIT for visualization. As for the log files, we tried looking but there are too many entries to have any idea who is associated with what (see #2683).
That change in November seems like it would be consistent with when the problems started to occur. My understanding from an earlier email that Sterling sent out to all users is that there are also problems with data visualization using reviewplus. I am able to reproduce this error. @ModestMC Is it possible to look at the log files at the time the error occurs?
I will also test if similar issues occur on Omega.
Sincerely,
Brian Victor
Lawrence Livermore National Laboratory
13-352
***@***.***
Phone: 858-455-3098
@victorbs the simplest way for me to attempt (though I'm not optimistic) to reproduce your errors would be for you to give me a basic example that breaks, and then I try to run it at a time when Atlas usage is minimal (e.g., wee hours on a weekend or something) until I can see something interesting. Realistically, I think this is a good sign that we should find you a more stable long-term workflow. The reviewplus issues I'm recalling were the result of network changes which have since been patched, but Sterling would know better than I would. Definitely let me know what happens when you try using JScope on Omega, as it's a datapoint worth having. Feel welcome to email me from the original email thread if you'd like. As for @mwinkel-dev, my hunch is that this is some kind of incompatibility between 6.x.x and 7.x.x in a manner like what @zack-vii described, specifically when the server is updated. If Brian has no issue with the same versions communicating (Omega <--> Atlas), I think this bug can be closed as a known version incompatibility.
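One way to turn the "run a basic example until something interesting happens" idea into a script is a loop that repeats a load and records when it first fails, so the timestamp can be matched against server logs. The sketch below is only an illustration: `run_until_failure` and `load_once` are hypothetical names, and `load_once` stands in for whatever client call fetches the signals (it is not specified in this thread, so it is left to the caller).

```python
import time
from datetime import datetime, timezone
from typing import Callable, Optional, Tuple


def run_until_failure(
    load_once: Callable[[], None],
    max_attempts: int = 100,
    pause_s: float = 1.0,
) -> Optional[Tuple[int, str, str]]:
    """Call `load_once` repeatedly; return (attempt, UTC timestamp, error) for the first failure.

    Returns None if every attempt succeeds.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            load_once()
        except Exception as exc:  # broad on purpose: any failure is worth recording
            stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
            return attempt, stamp, repr(exc)
        time.sleep(pause_s)
    return None
```

Recording the time in UTC avoids ambiguity when the client and server logs use different time zones.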
I haven't had time to test jScope on Omega extensively, but in a couple days of use, I haven't had any problems with the data loading. I will continue to use jScope on Omega and will keep you posted if I begin to have any issues.
Hi @victorbs -- Thanks for the update. If jScope on Omega works well for you during the next two weeks or so, then let me know if this issue should be closed.
Hi @mwinkel-dev, I'm starting to have problems using jScope on Omega similar to those I was having on Iris. I get an error saying that the connection to atom.gat.com was lost. After I get that error, signals will no longer load.
Hi @victorbs -- That is unfortunate news. But thanks for the update. Hi @ModestMC -- What is the
@mwinkel-dev: ATOM is a Linux server similar to Omega and is restricted to the team that operates D3D. It does not have an MDSplus server at all, only clients. Perhaps the use case is misunderstood? The available versions are
Hi @margomw -- Thanks for explaining the purpose of the Hi @sflanagan and @ModestMC -- Any idea why jScope on Omega would be connecting to Atom? For details, see the post from @victorbs. Note though that jScope (from Omega to Atlas) has apparently worked well for about a month.
Hi @victorbs -- Thanks for the clarification. According to a previous post, both Omega and Atlas are running MDSplus
Hi. Is there any update on this? There was a period about a month ago when I wasn't having data loading issues. In the last week or two, I have had more connection issues than usual.
Hi @victorbs, Thanks for reminding us to look at this. (We've been swamped with tasks associated with the startup of DIII-D.) Hi @sflanagan and @ModestMC, Have there been any changes regarding Omega and/or GA's networking that would explain why jScope is freezing for @victorbs? When he switched to Omega (instead of Iris) the problem vanished for a month or so. Strikes me as odd that the problem has arisen again. (My guess is that we'll probably have to fix Issue #2683 to troubleshoot this jScope issue at GA.)
Affiliation
LLNL / DIII-D
(submitted by @mwinkel-dev of MIT PSFC on behalf of Brian V. of LLNL)
Version(s) Affected
Client MDSplus: TBD
Server MDSplus: TBD
Platform
Client: GA's Iris cluster, CentOS 6.10 (Final)
Server: GA's Atlas cluster, TBD
Describe the bug
Intermittent socket failures when using jScope to display DIII-D data. Causes jScope to freeze when loading / displaying data.
To Reproduce
This description is from the email of 8-Feb-2024 that reported the issue.
Here are four screenshots that explain what happens.
Typical jScope display showing many signals. This is working correctly.
Many of the signals fail to load. jScope loads signals top to bottom, left to right, so in this case it failed to load ‘i_boot’ from the automatic onetwo tree. After that failure, none of the other signals load.
notes: The failure doesn’t always occur on the same signal. Often it fails on Ip. I don’t know why it failed on this shot. Anecdotally, some shots seem to fail more often than others. This morning took longer than usual to reproduce this error.
If I try any other shot in the same jScope session, I get this error. The only thing I can do is restart jScope.
The error indicates a 'socket Exception: broken pipe'.
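For readers unfamiliar with the error, the behavior described above (every later load fails until jScope is restarted) is what one would expect if the client keeps writing to a socket whose other end has gone away. The following is a generic illustration in Python and is not jScope's code. It uses a local socket pair in place of a real server, and the function names are invented for this example.

```python
import socket
from typing import List


def send_after_peer_close(attempts: int = 3) -> List[str]:
    """Write to a socket whose peer has closed and record the outcome of each attempt."""
    client, server = socket.socketpair()
    server.close()  # simulate the server side of the connection disappearing
    outcomes = []
    for _ in range(attempts):
        try:
            client.sendall(b"request")
            outcomes.append("ok")
        except ConnectionError as exc:
            # BrokenPipeError and ConnectionResetError are both ConnectionError subclasses.
            outcomes.append(type(exc).__name__)
    client.close()
    return outcomes


def fresh_connection_works() -> bool:
    """Show that a newly created connection carries data normally."""
    client, server = socket.socketpair()
    try:
        client.sendall(b"x")
        return server.recv(1) == b"x"
    finally:
        client.close()
        server.close()
```

The old socket never recovers on its own; only a new connection works, which matches the need to restart the application. Over TCP the first write after the peer disappears may still appear to succeed, so the failure can surface one request later than the actual drop.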
Expected behavior
All signals should load and display in jScope.
Screenshots
These screenshots are from the 8-Feb-2024 email (and presented in the same order).
Additional context
n/a