Bad file descriptor when using VPN #391

Closed
forsyth2 opened this issue Mar 16, 2020 · 9 comments · Fixed by #393

Comments

@forsyth2
Contributor

Describe the bug
Running e3sm_diags on a Mac while on VPN causes a Bad file descriptor error, printed below. @zshaheen explained that this error comes not from e3sm_diags code but from a problem in CDMS (see E3SM-Project/e3sm_diags#287 for the discussion). The bug is easy to work around by turning off VPN, but it would be nice to be able to stay on VPN.

Fatal error in MPI_Init_thread: Other MPI error, error stack:
MPIR_Init_thread(572)..............: 
MPID_Init(224).....................: channel initialization failed
MPIDI_CH3_Init(105)................: 
MPID_nem_init(324).................: 
MPID_nem_tcp_init(178).............: 
MPID_nem_tcp_get_business_card(425): 
MPID_nem_tcp_init(384).............: gethostbyname failed, ml-9624328 (errno 1)
[unset]: write_line error; fd=-1 buf=:cmd=abort exitcode=3191567
:
system msg for write_line failure : Bad file descriptor

To Reproduce
Steps to reproduce the behavior:

  1. Be on VPN (unsure if you have to be on non-LLNL Wi-Fi)
  2. Run e3sm_diags code, for example ./tests/test.sh

Expected behavior
The code should run.

Desktop (please complete the following information):

  • OS: macOS

Environment Information

`conda info`

     active environment : e3sm_diags_env_dev
    active env location : /usr/local/anaconda3/envs/e3sm_diags_env_dev
            shell level : 2
       user config file : /Users/forsyth2/.condarc
 populated config files : /Users/forsyth2/.condarc
          conda version : 4.7.12
    conda-build version : 3.18.9
         python version : 3.7.4.final.0
       virtual packages : 
       base environment : /usr/local/anaconda3  (writable)
           channel URLs : https://conda.anaconda.org/cdat/label/latest_vtk/osx-64
                          https://conda.anaconda.org/cdat/label/latest_vtk/noarch
                          https://conda.anaconda.org/cdat/label/new_vtk_project_vectors/osx-64
                          https://conda.anaconda.org/cdat/label/new_vtk_project_vectors/noarch
                          https://conda.anaconda.org/cdat/label/nightly/osx-64
                          https://conda.anaconda.org/cdat/label/nightly/noarch
                          https://conda.anaconda.org/conda-forge/osx-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /usr/local/anaconda3/pkgs
                          /Users/forsyth2/.conda/pkgs
       envs directories : /usr/local/anaconda3/envs
                          /Users/forsyth2/.conda/envs
               platform : osx-64
             user-agent : conda/4.7.12 requests/2.22.0 CPython/3.7.4 Darwin/18.7.0 OSX/10.14.6
                UID:GID : 26501:26501
             netrc file : None
           offline mode : False

`conda config --show-sources`

==> /Users/forsyth2/.condarc <==
ssl_verify: False
channel_priority: strict
channels:
  - cdat/label/latest_vtk
  - cdat/label/new_vtk_project_vectors
  - cdat/label/nightly
  - conda-forge
  - defaults

`conda list --show-channel-urls`

# packages in environment at /usr/local/anaconda3/envs/e3sm_diags_env_dev:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                        main    defaults
asn1crypto                1.2.0                    py37_0    conda-forge
attrs                     19.3.0                     py_0    conda-forge
beautifulsoup4            4.8.1                    py37_0    conda-forge
bokeh                     1.3.4                    py37_0    conda-forge
bzip2                     1.0.8                h0b31af3_2    conda-forge
ca-certificates           2019.9.11            hecc5488_0    conda-forge
cartopy                   0.17.0          py37h95120c7_1007    conda-forge
cdat_info                 8.2                        py_7    conda-forge
cdms2                     3.1.4                    pypi_0    pypi
cdp                       1.6.0                      py_0    conda-forge
cdtime                    3.1.2            py37ha91d4f2_6    conda-forge
cdutil                    8.2                        py_2    cdat/label/v82
certifi                   2019.9.11                py37_0    conda-forge
cffi                      1.13.1           py37h33e799b_0    conda-forge
chardet                   3.0.4                 py37_1003    conda-forge
click                     7.0                        py_0    conda-forge
cloudpickle               1.2.2                      py_1    conda-forge
conda                     4.8.2                    py37_0    conda-forge
conda-package-handling    1.6.0            py37h0b31af3_1    conda-forge
cryptography              2.7              py37hafa8578_0    conda-forge
curl                      7.65.3               h22ea746_0    conda-forge
cycler                    0.10.0                     py_2    conda-forge
cytoolz                   0.10.0           py37h0b31af3_0    conda-forge
dask                      2.6.0                      py_0    conda-forge
dask-core                 2.6.0                      py_0    conda-forge
dbus                      1.13.6               h2f22bb5_0    conda-forge
decorator                 4.4.1                      py_0    conda-forge
distarray                 2.12.2                     py_1    conda-forge
distributed               2.6.0                      py_0    conda-forge
dv3d                      8.2                        py_0    cdat/label/v82
e3sm-diags                2.0.0                    pypi_0    pypi
esmf                      7.1.0             h963e782_1008    conda-forge
esmpy                     7.1.0            py37h5ca1d4c_3    conda-forge
expat                     2.2.5             h4a8c4bd_1004    conda-forge
ffmpeg                    4.2                  h5c2b479_0    conda-forge
fontconfig                2.13.1            h6b1039f_1001    conda-forge
freetype                  2.10.0               h24853df_1    conda-forge
fsspec                    0.5.2                      py_0    conda-forge
future                    0.18.1                   py37_0    conda-forge
g2clib                    1.6.0                h4e57d6e_9    conda-forge
genutil                   8.2              py37h3b54f70_3    conda-forge
geos                      3.7.2                h6de7cb9_2    conda-forge
gettext                   0.19.8.1          h46ab8bc_1002    conda-forge
ghostscript               9.22              h0a44026_1001    conda-forge
glib                      2.58.3          py37h577aef8_1002    conda-forge
gmp                       6.1.2             h0a44026_1000    conda-forge
gnutls                    3.6.5             h53004b3_1002    conda-forge
gst-plugins-base          1.14.5               hb4a159a_2    conda-forge
gstreamer                 1.14.5               h06b91d7_2    conda-forge
hdf4                      4.2.13            h84186c3_1003    conda-forge
hdf5                      1.10.5          nompi_h3e39495_1104    conda-forge
heapdict                  1.0.1                      py_0    conda-forge
icu                       64.2                 h6de7cb9_1    conda-forge
idna                      2.8                   py37_1000    conda-forge
importlib_metadata        0.23                     py37_0    conda-forge
ipython_genutils          0.2.0                      py_1    conda-forge
jasper                    1.900.1           h636a363_1006    conda-forge
jinja2                    2.10.3                     py_0    conda-forge
jpeg                      9c                h1de35cc_1001    conda-forge
jsonschema                3.1.1                    py37_0    conda-forge
jupyter_core              4.5.0                      py_0    conda-forge
kiwisolver                1.1.0            py37ha1b3eb9_0    conda-forge
krb5                      1.16.3            hcfa6398_1001    conda-forge
lame                      3.100             h1de35cc_1001    conda-forge
lazy-object-proxy         1.4.3            py37h0b31af3_1    conda-forge
libblas                   3.8.0               11_openblas    conda-forge
libcblas                  3.8.0               11_openblas    conda-forge
libcdms                   3.1.2                hbe35099_5    conda-forge
libcf                     1.0.3            py37h00f410c_1    conda-forge
libclang                  8.0.1                h770b8ee_1    conda-forge
libcurl                   7.65.3               h16faf7d_0    conda-forge
libcxx                    9.0.1                         1    conda-forge
libdrs                    3.1.2                h1ddc27c_7    conda-forge
libdrs_f                  3.1.2                hb052ab9_6    conda-forge
libedit                   3.1.20170329      hcfe32e1_1001    conda-forge
libffi                    3.2.1             h6de7cb9_1006    conda-forge
libgcc                    4.8.5                         1    conda-forge
libgfortran               4.0.0                         2    conda-forge
libiconv                  1.15              h01d97ff_1005    conda-forge
liblapack                 3.8.0               11_openblas    conda-forge
libllvm8                  8.0.1                h770b8ee_0    conda-forge
libllvm9                  9.0.0                h770b8ee_3    conda-forge
libnetcdf                 4.6.2             h1a02027_1003    conda-forge
libopenblas               0.3.6                h4bb4525_6    conda-forge
libpng                    1.6.37               h2573ce8_0    conda-forge
libssh2                   1.8.2                hcdc9a53_2    conda-forge
libtiff                   4.0.10            h3527a1b_1004    conda-forge
libuuid                   2.32.1            h1de35cc_1000    conda-forge
libxcb                    1.13              h1de35cc_1002    conda-forge
libxml2                   2.9.9                h12c6b28_5    conda-forge
libxslt                   1.1.33               h320ff13_0    conda-forge
llvm-openmp               9.0.1                h28b9765_2    conda-forge
locket                    0.2.0                      py_2    conda-forge
lxml                      4.4.1            py37h08abf6f_0    conda-forge
lz4-c                     1.8.3             h6de7cb9_1001    conda-forge
markupsafe                1.1.1            py37h0b31af3_0    conda-forge
matplotlib                3.1.1                    py37_2    conda-forge
matplotlib-base           3.1.1            py37h11da6c2_2    conda-forge
more-itertools            7.2.0                      py_0    conda-forge
mpi                       1.0                       mpich    conda-forge
mpich                     3.3.1                hc856adb_1    conda-forge
msgpack-python            0.6.2            py37ha1b3eb9_0    conda-forge
nbformat                  4.4.0                      py_1    conda-forge
ncurses                   6.1               h0a44026_1002    conda-forge
netcdf-fortran            4.4.5             h1993a31_1004    conda-forge
nettle                    3.4.1             h3efe00b_1002    conda-forge
nspr                      4.20              h0a44026_1000    conda-forge
nss                       3.47                 hc0980d9_0    conda-forge
numpy                     1.17.3           py37hde6bac1_0    conda-forge
olefile                   0.46                       py_0    conda-forge
openblas                  0.3.6                h4bb4525_6    conda-forge
openh264                  1.8.0             hd9629dc_1000    conda-forge
openssl                   1.1.1c               h01d97ff_0    conda-forge
output_viewer             1.3.1                      py_1    conda-forge
owslib                    0.18.0                     py_0    conda-forge
packaging                 19.2                       py_0    conda-forge
pandas                    0.25.2           py37h4f17bb1_0    conda-forge
partd                     1.0.0                      py_0    conda-forge
pcre                      8.43                 h4a8c4bd_0    conda-forge
pillow                    6.2.1            py37hb6f49c9_0    conda-forge
pip                       19.3.1                   py37_0    conda-forge
proj4                     6.1.1                hca663eb_1    conda-forge
psutil                    5.6.3            py37h0b31af3_0    conda-forge
pthread-stubs             0.4               h1de35cc_1001    conda-forge
pycosat                   0.6.3           py37h0b31af3_1002    conda-forge
pycparser                 2.19                     py37_1    conda-forge
pyepsg                    0.4.0                      py_0    conda-forge
pykdtree                  1.3.1           py37h3b54f70_1002    conda-forge
pyopenssl                 19.0.0                   py37_0    conda-forge
pyparsing                 2.4.2                      py_0    conda-forge
pyproj                    2.3.1            py37h9bb365a_0    conda-forge
pyqt                      5.12.3           py37he22c54c_1    conda-forge
pyqt5-sip                 4.19.18                  pypi_0    pypi
pyqtwebengine             5.12.1                   pypi_0    pypi
pyrsistent                0.15.5           py37h0b31af3_0    conda-forge
pyshp                     2.1.0                      py_0    conda-forge
pysocks                   1.7.1                    py37_0    conda-forge
python                    3.7.3                h93065d6_1    conda-forge
python-dateutil           2.8.0                      py_0    conda-forge
pytz                      2019.3                     py_0    conda-forge
pyyaml                    5.1.2            py37h0b31af3_0    conda-forge
qt                        5.12.5               h1b46049_0    conda-forge
readline                  8.0                  hcfe32e1_0    conda-forge
regrid2                   3.1.4                    pypi_0    pypi
requests                  2.22.0                   py37_1    conda-forge
ruamel_yaml               0.15.71         py37h1de35cc_1000    conda-forge
scipy                     1.3.1            py37h7e0e109_2    conda-forge
setuptools                41.4.0                   py37_0    conda-forge
shapely                   1.6.4           py37h5c88e11_1006    conda-forge
six                       1.12.0                py37_1001    conda-forge
sortedcontainers          2.1.0                      py_0    conda-forge
soupsieve                 1.9.4                    py37_0    conda-forge
sqlite                    3.30.1               h93121df_0    conda-forge
tblib                     1.4.0                      py_0    conda-forge
tk                        8.6.9             h2573ce8_1003    conda-forge
toolz                     0.10.0                     py_0    conda-forge
tornado                   6.0.3            py37h0b31af3_4    conda-forge
tqdm                      4.42.1                     py_0    conda-forge
traitlets                 4.3.3                    py37_0    conda-forge
udunits2                  2.2.27.6          h776b7f1_1001    conda-forge
urllib3                   1.25.6                   py37_0    conda-forge
vcs                       8.2                        py_2    cdat/label/v82
vcsaddons                 8.2              py37h1de35cc_1    cdat/label/v82
vtk-cdat                  8.2.0.8.2        py37h3a4d124_0    cdat/label/v82
wheel                     0.33.6                   py37_0    conda-forge
x264                      1!152.20180806       h1de35cc_0    conda-forge
xorg-libxau               1.0.9                h1de35cc_0    conda-forge
xorg-libxdmcp             1.1.3                h01d97ff_0    conda-forge
xz                        5.2.4             h1de35cc_1001    conda-forge
yaml                      0.1.7             h1de35cc_1001    conda-forge
zict                      1.0.0                      py_0    conda-forge
zipp                      0.6.0                      py_0    conda-forge
zlib                      1.2.11            h0b31af3_1006    conda-forge
zstd                      1.4.3                he7fca8b_0    conda-forge

@chengzhuzhang

Another related open issue: CDAT/vcdat#295
Some of the discussion there might be helpful for troubleshooting.

@davidcbaderatllnl

davidcbaderatllnl commented Mar 26, 2020

A workaround for the LLNL VPN:
Add this line to /etc/hosts (you need root privileges):
127.0.0.1 computer_name.llnl.gov
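
The /etc/hosts workaround above can be sketched in Python as follows. This is only an illustration: the helper names `hosts_entry` and `has_entry` are hypothetical, and it assumes the name MPICH fails to resolve is the one `socket.gethostname()` returns. It prints the line to add rather than editing the file, since writing to /etc/hosts needs root.

```python
import socket
from pathlib import Path


def hosts_entry(hostname: str, address: str = "127.0.0.1") -> str:
    """Return the /etc/hosts line that maps `hostname` to `address`."""
    return f"{address} {hostname}"


def has_entry(hosts_text: str, hostname: str) -> bool:
    """Return True if a non-comment line in `hosts_text` already names `hostname`."""
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split into "address name [alias ...]".
        fields = line.split("#", 1)[0].split()
        if hostname in fields[1:]:
            return True
    return False


if __name__ == "__main__":
    name = socket.gethostname()
    hosts_path = Path("/etc/hosts")
    text = hosts_path.read_text() if hosts_path.exists() else ""
    if has_entry(text, name):
        print(f"{name} is already listed in /etc/hosts")
    else:
        print("Add this line to /etc/hosts (requires root):")
        print(hosts_entry(name))
```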

A longer-term fix may be to identify where MPI calls gethostname() and replace it with MPI_Get_processor_name(), which is part of the MPI standard and portable.

See https://stackoverflow.com/questions/23112515/mpich2-gethostbyname-failed

@chengzhuzhang

chengzhuzhang commented Mar 30, 2020

@downiec @jasonb5 @muryanto1 @gabdulla @painter1 @doutriaux1 Hey guys, I talked to some of you already. I'm pinging you in case someone has been looking into this. I don't have much expertise in how MPI works in CDAT, or in whether a faulty MPI library version is pinned and needs to be updated. This issue has also been seen randomly on a compute node of a cluster.

@forsyth2
Contributor Author

Possibly related:
In early February I was trying to run e3sm_diags on my Mac for the first time (instead of on Cori or Compy). I was getting ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain when onsite at LLNL. Running e3sm_diags offsite allowed it to pull down whatever files or resources it needed, and I've since been able to run e3sm_diags onsite. (However, running offsite on VPN causes the error described in this issue).

@forsyth2
Contributor Author

forsyth2 commented Mar 30, 2020

Running python -c "import cdms2" is sufficient to produce the error (on my Mac and in this environment) -- no need to actually run e3sm_diags itself.

@jasonb5
Contributor

jasonb5 commented Apr 2, 2020

Here's a little update on the progress of this issue.

The issue is definitely caused by DNS not being able to resolve the system's hostname. My best guess is that connecting to VPN reconfigures DNS and prevents the hostname from resolving. Interestingly enough, I was never able to reproduce this on VPN until I purposely misconfigured my DNS settings.
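
A quick way to check for this condition without involving MPI at all is to ask the resolver for the machine's own hostname. The sketch below is a rough diagnostic, not what MPICH does internally: `hostname_resolves` is a hypothetical helper, and Python's `socket.gethostbyname` is assumed to fail in the same situations as the `gethostbyname` call in the error stack.

```python
import socket


def hostname_resolves(hostname: str) -> bool:
    """Return True if the resolver can map `hostname` to an IPv4 address."""
    try:
        socket.gethostbyname(hostname)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    return True


if __name__ == "__main__":
    name = socket.gethostname()
    status = "resolves" if hostname_resolves(name) else "does NOT resolve"
    print(f"{name} {status}")
```

If this reports that the hostname does not resolve while on VPN, the MPI initialization failure is likely to follow.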

I've traced the source of the crash to the following line:

ESMF.Manager(debug=False)

This can be verified with python -c "import ESMF; ESMF.Manager()"

I'll be opening up an issue with ESMF.

For the time being, the workaround in #391 (comment) will work, or you can run export MPICH_INTERFACE_HOSTNAME=localhost
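
The same environment variable can presumably be set from inside a Python script, as long as that happens before anything initializes MPI. This is an untested sketch: only the shell `export` was tried in this thread, and `use_localhost_for_mpich` is a hypothetical helper name.

```python
import os


def use_localhost_for_mpich(env=os.environ) -> None:
    """Set MPICH_INTERFACE_HOSTNAME=localhost unless it is already set."""
    env.setdefault("MPICH_INTERFACE_HOSTNAME", "localhost")


# Must run before MPI is initialized, i.e. before importing cdms2 or ESMF.
use_localhost_for_mpich()

try:
    import cdms2  # noqa: F401  (the import that triggered the error in this issue)
except ImportError:
    print("cdms2 is not installed in this environment")
```

Using `setdefault` leaves any value already exported in the shell untouched.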

@forsyth2
Contributor Author

forsyth2 commented Apr 2, 2020

@jasonb5 Thanks for looking into this! I confirmed that export MPICH_INTERFACE_HOSTNAME=localhost allows me to run e3sm_diags while on VPN.

@gabdulla

gabdulla commented Apr 2, 2020 via email
