python3 $PIPEDIR/DAN-msa/ErrorPredictorMSA.py --roll -p $CPU $WDIR/t000_.3track.npz $WDIR/pdb-3track $WDIR/pdb-3track failure #94

Closed
truatpasteurdotfr opened this issue Oct 6, 2021 · 3 comments

Comments

@truatpasteurdotfr

Hello,

I cannot work out why the last step of run_pyrosetta_ver.sh fails:

Singularity> /app/RoseTTAFold/run_pyrosetta_ver.sh Sulfolobus_S_Layer.fasta Sulfolobus_S_Layer.d2
Running HHblits
Running PSIPRED
Running hhsearch
Predicting distance and orientations
Running parallel RosettaTR.py
Running DeepAccNet-msa

The log reads:

PyRosetta-4 2021 [Rosetta PyRosetta4.Release.python37.ubuntu 2021.34+release.5eb89ef1fc1a9146e2c7aa29194bc6267733596c 2021-08-23T13:12:24] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
/app/RoseTTAFold/DAN-msa/models/smTr
Traceback (most recent call last):
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node 3d_conv/conv3d_1/Conv3D}}]]
         [[2d_conv/lddt/truediv/_1161]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node 3d_conv/conv3d_1/Conv3D}}]]
0 successful operations.
0 derived errors ignored.
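
The traceback above repeats the same root error for each failing branch of the graph. As a reading aid, here is a minimal sketch that pulls the status, message, and failing node out of such a log and drops duplicates. The function name `summarize_tf_errors` and the two regular expressions are illustrative assumptions based only on the log format shown in this thread; they are not part of TensorFlow or RoseTTAFold.

```python
import re

# Matches root-error lines such as
#   "  (0) Unknown: Failed to get convolution algorithm. ..."
ROOT_ERROR = re.compile(r"^\s*\((\d+)\)\s+(\w+):\s+(.*)$")

# Matches the failing node in either form seen in the log:
#   [[{{node 3d_conv/conv3d_1/Conv3D}}]]
#   [[node 3d_conv/conv3d_1/Conv3D (defined at ...) ]]
NODE = re.compile(r"\[\[(?:\{\{)?node\s+([^\s}\]]+)")


def summarize_tf_errors(log_text):
    """Return unique (status, message, node) tuples found in a TF 1.x log.

    A root error is only reported once a node line follows it; root errors
    without a node line are skipped in this sketch.
    """
    found = []
    current = None
    for line in log_text.splitlines():
        root = ROOT_ERROR.match(line)
        if root:
            # Start a new record: [status, message, node-not-yet-seen]
            current = [root.group(2), root.group(3), None]
            continue
        node = NODE.search(line)
        if node and current is not None and current[2] is None:
            current[2] = node.group(1)
            entry = tuple(current)
            if entry not in found:
                found.append(entry)
    return found
```

Run on the log above, this should reduce the output to a single entry pointing at the `3d_conv/conv3d_1/Conv3D` node, which is the first place the model needs cuDNN.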
@truatpasteurdotfr
Author

Trying to run the last stage manually:

(folding) Singularity>  python3 $PIPEDIR/DAN-msa/ErrorPredictorMSA.py --roll -p $CPU $WDIR/t000_.3track.npz $WDIR/pdb-3track $WDIR/pdb-3track
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
...
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
PyRosetta-4 2021 [Rosetta PyRosetta4.Release.python37.ubuntu 2021.34+release.5eb89ef1fc1a9146e2c7aa29194bc6267733596c 2021-08-23T13:12:24] retrieved from: http://www.pyrosetta.org
(C) Copyright Rosetta Commons Member Institutions. Created in JHU by Sergey Lyskov and PyRosetta Team.
/app/RoseTTAFold/DAN-msa/models/smTr
Traceback (most recent call last):
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node 3d_conv/conv3d_1/Conv3D}}]]
         [[2d_conv/lddt/truediv/_1161]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[{{node 3d_conv/conv3d_1/Conv3D}}]]
0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/app/RoseTTAFold/DAN-msa/ErrorPredictorMSA.py", line 222, in <module>
    main()
  File "/app/RoseTTAFold/DAN-msa/ErrorPredictorMSA.py", line 181, in main
    verbose=args.verbose)
  File "/app/RoseTTAFold/DAN-msa/pyErrorPred/predict.py", line 84, in predict
    lddt, estogram, mask = model.predict2(batch)
  File "/app/RoseTTAFold/DAN-msa/pyErrorPred/model.py", line 467, in predict2
    return self.sesh.run(operations, feed_dict=feed_dict)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 950, in run
    run_metadata_ptr)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1173, in _run
    feed_dict_tensor, options, run_metadata)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1350, in _do_run
    run_metadata)
  File "/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/client/session.py", line 1370, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node 3d_conv/conv3d_1/Conv3D (defined at app/RoseTTAFold/DAN-msa/pyErrorPred/model.py:116) ]]
         [[2d_conv/lddt/truediv/_1161]]
  (1) Unknown: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
         [[node 3d_conv/conv3d_1/Conv3D (defined at app/RoseTTAFold/DAN-msa/pyErrorPred/model.py:116) ]]
0 successful operations.
0 derived errors ignored.

Errors may have originated from an input operation.
Input Source operations connected to node 3d_conv/conv3d_1/Conv3D:
 3d_conv/conv3d/Conv3D (defined at app/RoseTTAFold/DAN-msa/pyErrorPred/model.py:113)

Input Source operations connected to node 3d_conv/conv3d_1/Conv3D:
 3d_conv/conv3d/Conv3D (defined at app/RoseTTAFold/DAN-msa/pyErrorPred/model.py:113)

Original stack trace for '3d_conv/conv3d_1/Conv3D':
  File "app/RoseTTAFold/DAN-msa/ErrorPredictorMSA.py", line 222, in <module>
    main()
  File "app/RoseTTAFold/DAN-msa/ErrorPredictorMSA.py", line 181, in main
    verbose=args.verbose)
  File "app/RoseTTAFold/DAN-msa/pyErrorPred/predict.py", line 77, in predict
    verbose=False)
  File "app/RoseTTAFold/DAN-msa/pyErrorPred/model.py", line 69, in __init__
    self.ops = self.build()
  File "app/RoseTTAFold/DAN-msa/pyErrorPred/model.py", line 116, in build
    layers.append(tf.layers.conv3d(layers[-1], 20, 3, padding='valid', use_bias=True))
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 324, in new_func
    return func(*args, **kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/layers/convolutional.py", line 632, in conv3d
    return layer.apply(inputs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1479, in apply
    return self.__call__(inputs, *args, **kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/layers/base.py", line 537, in __call__
    outputs = super(Layer, self).__call__(inputs, *args, **kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 634, in __call__
    outputs = call_fn(inputs, *args, **kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 146, in wrapper
    ), args, kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 446, in converted_call
    return _call_unconverted(f, args, kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/autograph/impl/api.py", line 253, in _call_unconverted
    return f(*args, **kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/keras/layers/convolutional.py", line 196, in call
    outputs = self._convolution_op(inputs, self.kernel)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 1079, in __call__
    return self.conv_op(inp, filter)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 635, in __call__
    return self.call(inp, filter)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/ops/nn_ops.py", line 234, in __call__
    name=self.name)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/ops/gen_nn_ops.py", line 1553, in conv3d
    dilations=dilations, name=name)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/util/deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 3616, in create_op
    op_def=op_def)
  File "opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/ops.py", line 2005, in __init__
    self._traceback = tf_stack.extract_stack()

(folding) Singularity>
(folding) Singularity> python3
Python 3.7.11 (default, Jul 27 2021, 14:32:16)
[GCC 7.5.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from tensorflow.python.platform import build_info as tf_build_info
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
>>> print(tf_build_info.cuda_version_number)
10.1
>>> print(tf_build_info.cudnn_version_number)
7.6
>>> print(tf_build_info.is_cuda_build)
True
>>> print(tf_build_info.print_function)
_Feature((2, 6, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 65536)
>>>
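
Note that the last command in the session above prints `print_function`, which is just the `__future__` feature that `build_info` imports, not a property of the build. The three fields that do describe the build can be gathered in one step. This is a minimal sketch: `summarize_build_info` and `BUILD_FIELDS` are hypothetical names, the attribute names are the ones used in the session above (TensorFlow 1.14), and other TensorFlow versions may expose this information differently.

```python
from types import SimpleNamespace

# Attribute names as queried in the session above (TensorFlow 1.14).
BUILD_FIELDS = ("is_cuda_build", "cuda_version_number", "cudnn_version_number")


def summarize_build_info(build_info):
    """Collect the build fields into a dict; missing attributes become None."""
    return {name: getattr(build_info, name, None) for name in BUILD_FIELDS}


# Stand-in object carrying the values printed in the session above, so the
# sketch runs without TensorFlow installed.
reported = SimpleNamespace(
    is_cuda_build=True,
    cuda_version_number="10.1",
    cudnn_version_number="7.6",
)
print(summarize_build_info(reported))
```

Inside the container, the real module could be passed instead of the stand-in: `from tensorflow.python.platform import build_info` followed by `summarize_build_info(build_info)`.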

@truatpasteurdotfr
Author

Maybe my GPU is not supported by the conda-installed TensorFlow:

(folding) Singularity> conda env export|grep tens
  - tensorboard=1.14.0=py37hf484d3e_0
  - tensorflow=1.14.0=gpu_py37h74c33d7_0
  - tensorflow-base=1.14.0=gpu_py37he45bfe2_0
  - tensorflow-estimator=1.14.0=py_0
  - tensorflow-gpu=1.14.0=h0d30ee6_0
  - typing_extensions=3.10.0.2=pyh06a4308_0
(folding) Singularity> nvidia-smi 
Wed Oct  6 10:21:32 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.73.01    Driver Version: 460.73.01    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:0A:00.0  On |                  N/A |
| 26%   40C    P8    19W / 250W |    494MiB /  7981MiB |      1%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

But everything seems to be OK:

(folding) Singularity> python3.7 -c 'import tensorflow as tf; print(tf.test.is_gpu_available())'
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint16 = np.dtype([("qint16", np.int16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  _np_qint32 = np.dtype([("qint32", np.int32, 1)])
/opt/miniconda3/envs/folding/lib/python3.7/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
  np_resource = np.dtype([("resource", np.ubyte, 1)])
2021-10-06 10:23:04.732244: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2021-10-06 10:23:04.744797: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3792885000 Hz
2021-10-06 10:23:04.745493: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x565558800620 executing computations on platform Host. Devices:
2021-10-06 10:23:04.745524: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): <undefined>, <undefined>
2021-10-06 10:23:04.746179: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcuda.so.1
2021-10-06 10:23:04.751264: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-06 10:23:04.751626: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1640] Found device 0 with properties: 
name: GeForce RTX 2080 SUPER major: 7 minor: 5 memoryClockRate(GHz): 1.815
pciBusID: 0000:0a:00.0
2021-10-06 10:23:04.751887: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2021-10-06 10:23:04.753041: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcublas.so.10
2021-10-06 10:23:04.754371: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcufft.so.10
2021-10-06 10:23:04.754544: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcurand.so.10
2021-10-06 10:23:04.755758: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusolver.so.10
2021-10-06 10:23:04.756391: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcusparse.so.10
2021-10-06 10:23:04.759109: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudnn.so.7
2021-10-06 10:23:04.759214: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-06 10:23:04.759613: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-06 10:23:04.759943: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1763] Adding visible gpu devices: 0
2021-10-06 10:23:04.759966: I tensorflow/stream_executor/platform/default/dso_loader.cc:42] Successfully opened dynamic library libcudart.so.10.1
2021-10-06 10:23:04.828521: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1181] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-10-06 10:23:04.828553: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1187]      0 
2021-10-06 10:23:04.828563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1200] 0:   N 
2021-10-06 10:23:04.828734: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-06 10:23:04.829106: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-06 10:23:04.829450: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1005] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2021-10-06 10:23:04.829779: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1326] Created TensorFlow device (/device:GPU:0 with 7004 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:0a:00.0, compute capability: 7.5)
2021-10-06 10:23:04.830984: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x565558800d00 executing computations on platform CUDA. Devices:
2021-10-06 10:23:04.830998: I tensorflow/compiler/xla/service/service.cc:175]   StreamExecutor device (0): GeForce RTX 2080 SUPER, Compute Capability 7.5
True
(folding) Singularity> 
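
One caveat on the check above: `tf.test.is_gpu_available()` only confirms that TensorFlow can create the GPU device. As far as I know it does not run a convolution, so it would not exercise the cuDNN initialization that fails in the traceback. The startup log does carry the device name and compute capability, which is handy when comparing machines. Below is a minimal sketch of extracting them. `parse_gpu_devices` and the regular expression are assumptions based only on the log line format shown above.

```python
import re

# Matches lines such as:
#   Created TensorFlow device (/device:GPU:0 with 7004 MB memory) -> physical GPU
#   (device: 0, name: GeForce RTX 2080 SUPER, pci bus id: 0000:0a:00.0,
#   compute capability: 7.5)
DEVICE_LINE = re.compile(
    r"Created TensorFlow device \((?P<tf_device>\S+) with (?P<memory_mb>\d+) MB memory\)"
    r" -> physical GPU \(device: \d+, name: (?P<name>[^,]+),"
    r" pci bus id: (?P<pci>[^,]+), compute capability: (?P<cc>\d+\.\d+)\)"
)


def parse_gpu_devices(log_text):
    """Return one dict per GPU device that TensorFlow reports creating."""
    devices = []
    for match in DEVICE_LINE.finditer(log_text):
        devices.append({
            "tf_device": match.group("tf_device"),
            "memory_mb": int(match.group("memory_mb")),
            "name": match.group("name"),
            "pci_bus_id": match.group("pci"),
            "compute_capability": match.group("cc"),
        })
    return devices
```

Applied to the log above, this should report a single device named GeForce RTX 2080 SUPER with compute capability 7.5.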

@truatpasteurdotfr
Author

This issue seems to be related to my RTX 2080 SUPER card; the same container on an older GTX 1080 Ti does not cause the error.
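
A workaround often reported elsewhere for this cuDNN message is to let TensorFlow allocate GPU memory on demand rather than reserving most of it at startup. Nothing in this thread confirms that it helps here, so treat the following as an experiment. The sketch sets the `TF_FORCE_GPU_ALLOW_GROWTH` environment variable for a child process. Whether the TensorFlow 1.14 build in this container honours that variable is an assumption to verify, and the helper names `env_with_allow_growth` and `run_with_allow_growth` are hypothetical.

```python
import os
import subprocess
import sys


def env_with_allow_growth(base_env=None):
    """Return a copy of the environment that requests on-demand GPU memory."""
    env = dict(os.environ if base_env is None else base_env)
    # Assumption: the installed TensorFlow build reads this variable.
    env["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"
    return env


def run_with_allow_growth(command):
    """Run `command` (a list of arguments) with the variable set.

    Returns the exit code of the child process.
    """
    return subprocess.run(command, env=env_with_allow_growth()).returncode


# Harmless demonstration: the child process only echoes the variable.
exit_code = run_with_allow_growth(
    [sys.executable, "-c",
     "import os; print(os.environ['TF_FORCE_GPU_ALLOW_GROWTH'])"]
)
```

The same helper could wrap the ErrorPredictorMSA.py command from this thread, with its arguments passed as a list. The in-code equivalent in TensorFlow 1.x is setting `gpu_options.allow_growth = True` on a `tf.ConfigProto` passed to the session, but that would require editing how the DAN-msa model creates its session.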
