Bad file descriptor when using VPN #391
Comments
Another related open issue: CDAT/vcdat#295
A workaround for the LLNL VPN
A longer-term fix may be to identify where MPI calls gethostname() and replace it with MPI_Get_processor_name(), which is part of the MPI standard and portable. See https://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed
@downiec @jasonb5 @muryanto1 @gabdulla @painter1 @doutriaux1 Hey, guys, I talked to some of you. I'm pinging you in case someone has been looking into this. I don't have much expertise in how MPI works in CDAT, or in whether a faulty MPI library version is pinned and needs to be updated. This issue has also been seen randomly on a compute node of a cluster.
Possibly related:
Running
Here's a little update on the progress of this issue. The issue is definitely caused by DNS not being able to resolve the system's hostname. Best guess is that connecting to VPN reconfigures DNS and prevents the hostname from resolving. Interestingly enough, I was never able to reproduce this on VPN until I purposely configured my DNS settings incorrectly. I've traced the source of the crash to the following line: cdms/regrid2/Lib/mvESMFRegrid.py, line 18 in 753fd7a
This can be verified with `python -c "import ESMF; ESMF.Manager()"`. I'll be opening up an issue with ESMF. For the time being, the solution here will work: #391 (comment), or you can run `export MPICH_INTERFACE_HOSTNAME=localhost`
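To check the DNS diagnosis without involving ESMF or MPI at all, something like the following may help. It is only a sketch using the Python standard library; the helper name `hostname_resolves` is made up for illustration and is not part of CDMS or ESMF. It asks the resolver for the machine's own hostname, which is roughly the lookup that appears to fail on VPN.

```python
import socket


def hostname_resolves(hostname=None):
    """Return (hostname, ip_address), with ip_address None if lookup fails.

    hostname defaults to this machine's own name, which is the lookup
    that seems to break when VPN reconfigures DNS.
    """
    name = hostname or socket.gethostname()
    try:
        return name, socket.gethostbyname(name)
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        return name, None


if __name__ == "__main__":
    name, addr = hostname_resolves()
    if addr is None:
        print(f"hostname {name!r} does not resolve; MPI init may fail")
    else:
        print(f"hostname {name!r} resolves to {addr}")
```

If this reports that the hostname does not resolve while on VPN and resolves fine off VPN, that would be consistent with the explanation above.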
@jasonb5 Thanks for looking into this! I confirmed that
Thank you Jason!
Ghaleb
On Wednesday, April 1, 2020 at 7:20 PM, Jason Boutte wrote:
Here's a little update on the progress of this issue.
The issue is definitely caused by DNS not being able to resolve the system's hostname. Best guess is that connecting to VPN reconfigures DNS and prevents the hostname from resolving. Interestingly enough, I was never able to reproduce this on VPN until I purposely configured my DNS settings incorrectly.
I've traced the source of the crash to the following line: https://github.com/CDAT/cdms/blob/753fd7a3441e5f073fbb8beb6ab0723d379eec54/regrid2/Lib/mvESMFRegrid.py#L18
This can be verified with python -c "import ESMF; ESMF.Manager()"
I'll be opening up an issue with ESMF.
For the time being, the solution here will work: #391 (comment), or you can run `export MPICH_INTERFACE_HOSTNAME=localhost`
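For anyone who would rather not export the variable in every shell, the same workaround can probably be applied from Python before anything initializes MPI. This is a sketch under assumptions: the function name `apply_hostname_workaround` and its parameters are hypothetical, and only the shell `export` has been confirmed in this thread; setting the variable in-process before importing ESMF is assumed to have the same effect.

```python
import os
import socket


def apply_hostname_workaround(env=None, resolve=socket.gethostbyname):
    """Set MPICH_INTERFACE_HOSTNAME=localhost when the hostname won't resolve.

    Returns True if hostname resolution failed (workaround applied),
    False if the hostname resolved and nothing was changed.
    """
    env = os.environ if env is None else env
    try:
        resolve(socket.gethostname())
        return False
    except OSError:
        # Do not override a value the user has already chosen.
        env.setdefault("MPICH_INTERFACE_HOSTNAME", "localhost")
        return True


if __name__ == "__main__":
    # Assumption: this must run before ESMF (and therefore MPI) is imported.
    if apply_hostname_workaround():
        print("hostname did not resolve; set MPICH_INTERFACE_HOSTNAME=localhost")
```

The `env` and `resolve` parameters exist only so the behavior can be exercised without touching the real environment or DNS.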
Describe the bug
Running `e3sm_diags` on a Mac while on VPN causes a `Bad File Descriptor` error, printed below. @zshaheen explained that this error does not come from `e3sm_diags` code, but rather from a problem in CDMS (see E3SM-Project/e3sm_diags#287 for the discussion). The bug is easily worked around by turning off VPN, but it would be nice to be able to stay on VPN.
To Reproduce
Steps to reproduce the behavior:
Run `e3sm_diags` code, for example `./tests/test.sh`
Expected behavior
The code should run.
Desktop (please complete the following information):
Environment Information
`conda info`
`conda config --show-sources`
`conda list --show-channel-urls`