
"NCHW" data_format in Conv not working with latest CUDA #83

Closed
mil-ad opened this issue Nov 17, 2020 · 6 comments

mil-ad commented Nov 17, 2020

I'm not able to use the NCHW data format in conv layers:

import os
import numpy as np
import jax
import haiku as hk

# os.environ["XLA_PYTHON_CLIENT_PREALLOCATE"] = "false"
os.environ["XLA_FLAGS"] = "--xla_gpu_cuda_data_dir=/opt/cuda"

def net(x):
    model = hk.Sequential([hk.Conv2D(2, 5, padding="VALID", data_format="NCHW")])
    return model(x)

key = jax.random.PRNGKey(42)
net_transformed = hk.without_apply_rng(hk.transform(net))
params = net_transformed.init(key, np.zeros((1, 1, 28, 28)))

The snippet above works fine on the CPU, but on the GPU it produces the TensorFlow-style spew of errors below. The problem goes away if I change data_format to NHWC (a working variant is shown after the version list). I'm running fairly recent versions of the NVIDIA driver and CUDA, and the same snippet runs fine on older versions (according to a few people I sent it to), so I'm pretty sure it's related to those. My versions are:

cuda 11.1.0-2
nvidia driver: 455.38
jax 0.2.5
jaxlib 0.1.57+cuda111 
dm-haiku 0.0.2
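
For completeness, here is the NHWC variant that runs fine for me on the same machine (only data_format and the input layout change; the net_nhwc name is just for illustration):

def net_nhwc(x):
    model = hk.Sequential([hk.Conv2D(2, 5, padding="VALID", data_format="NHWC")])
    return model(x)

net_nhwc_transformed = hk.without_apply_rng(hk.transform(net_nhwc))
# Same zero image, but laid out as (batch, height, width, channels).
params = net_nhwc_transformed.init(key, np.zeros((1, 28, 28, 1)))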

Error:

2020-11-17 12:18:03.717098: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:349] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-11-17 12:18:03.718623: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:349] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-11-17 12:18:03.719796: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:349] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-11-17 12:18:03.719969: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:772] Failed to determine best cudnn convolution algorithm: Internal: All algorithms tried for convolution %custom-call = (f32[1,20,24,24]{3,2,1,0}, u8[0]{0}) custom-call(f32[1,1,28,28]{3,2,1,0} %parameter.1, f32[5,5,1,20]{1,0,2,3} %copy.1), window={size=5x5}, dim_labels=bf01_01io->bf01, custom_call_target="__cudnn$convForward", metadata={op_type="conv_general_dilated" op_name="conv_general_dilated[ batch_group_count=1\n                      dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 1, 2, 3), rhs_spec=(3, 2, 0, 1), out_spec=(0, 1, 2, 3))\n                      feature_group_count=1\n                      lhs_dilation=(1, 1)\n                      lhs_shape=(1, 1, 28, 28)\n                      padding=((0, 0), (0, 0))\n                      precision=None\n                      rhs_dilation=(1, 1)\n                      rhs_shape=(5, 5, 1, 20)\n                      window_strides=(1, 1) ]"}, backend_config="{\"algorithm\":\"0\",\"tensor_ops_enabled\":false,\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}" failed. Falling back to default algorithm. 

Convolution performance may be suboptimal.
2020-11-17 12:18:03.800681: E external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:349] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2020-11-17 12:18:03.800721: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_client.cc:1809] Execution of replica 0 failed: Unimplemented: DNN library is not found.
Traceback (most recent call last):
  File "scratch.py", line 17, in <module>
    params = net_transformed.init(key, np.zeros((1, 1, 28, 28)))
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/transform.py", line 111, in init_fn
    params, state = f.init(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/transform.py", line 277, in init_fn
    f(*args, **kwargs)
  File "scratch.py", line 12, in net
    return model(x)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 406, in wrapped
    out = f(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 263, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/basic.py", line 124, in __call__
    out = layer(out, *args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 406, in wrapped
    out = f(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 263, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/conv.py", line 195, in __call__
    out = lax.conv_general_dilated(inputs,
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 571, in conv_general_dilated
    return conv_general_dilated_p.bind(
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/core.py", line 266, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/core.py", line 576, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/interpreters/xla.py", line 234, in apply_primitive
    return compiled_fun(*args)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/interpreters/xla.py", line 349, in _execute_compiled_primitive
    out_bufs = compiled.execute(input_bufs)
RuntimeError: Unimplemented: DNN library is not found.
@tomhennigan (Collaborator) commented:

Hey @mil-ad, do you have cuDNN installed? I think you need to install it separately from the GPU drivers.

I've tested your snippet in Google Colab on an Nvidia P100 GPU and it seems to work fine: https://colab.research.google.com/gist/tomhennigan/6fd38842b05d46b8418cf44a4083be9d/nchw-test.ipynb
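
As a quick sanity check (a minimal sketch, not Haiku-specific; the library name libcudnn.so.8 is an assumption about your cuDNN version), you could confirm that JAX sees the GPU and that the dynamic loader can find cuDNN at all:

import ctypes
import jax

# Should list a GPU device if the CUDA backend initialised correctly.
print(jax.devices())

# If this raises OSError, the loader cannot locate cuDNN, which would
# be consistent with the CUDNN_STATUS_INTERNAL_ERROR messages above.
ctypes.CDLL("libcudnn.so.8")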

mil-ad commented Nov 17, 2020

I do have cuDNN installed, but perhaps, similar to CUDA, I should explicitly point JAX to it?

mil-ad commented Nov 17, 2020

Also, the NHWC version works fine. Does that one not need cuDNN?

@hawkinsp (Contributor) commented:

See google/jax#4920. Can you try setting LD_LIBRARY_PATH?

My guess about NHWC vs. NCHW is that XLA may be able to lower that convolution to a matrix multiplication without using cuDNN.
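
As a rough sketch of the first suggestion (the /opt/cuda/lib64 path is just a placeholder; point it at the directory that actually contains libcudnn.so.8): the dynamic loader typically reads LD_LIBRARY_PATH when the process starts, so set it in the launching environment rather than via os.environ inside the script.

import os
import subprocess

# Placeholder path; replace with the directory containing libcudnn.so.8.
cudnn_dir = "/opt/cuda/lib64"

env = dict(os.environ)
env["LD_LIBRARY_PATH"] = cudnn_dir + ":" + env.get("LD_LIBRARY_PATH", "")

# Relaunch the repro script with cuDNN on the loader's search path.
subprocess.run(["python", "scratch.py"], env=env, check=True)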

mil-ad commented Nov 17, 2020

Thanks @hawkinsp, that does make it go further, although it still fails:

2020-11-17 14:46:54.615564: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-17 14:46:54.645871: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
2020-11-17 14:46:54.838008: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560d34013d90 initialized for platform Interpreter (this does not guarantee that XLA will be used). Devices:
2020-11-17 14:46:54.838028: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Interpreter, <undefined>
2020-11-17 14:46:54.852763: I external/org_tensorflow/tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 3594550000 Hz
2020-11-17 14:46:54.853134: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560d33f81c10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-11-17 14:46:54.853147: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2020-11-17 14:46:54.853861: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcuda.so.1
2020-11-17 14:46:54.925446: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:941] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2020-11-17 14:46:54.925833: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560d33f71230 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-11-17 14:46:54.925847: I external/org_tensorflow/tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce GTX 1660 SUPER, Compute Capability 7.5
2020-11-17 14:46:54.926142: I external/org_tensorflow/tensorflow/compiler/xla/pjrt/nvidia_gpu_device.cc:119] XLA backend allocating 3926340403 bytes on device 0 for BFCAllocator.
2020-11-17 14:46:54.926262: I external/org_tensorflow/tensorflow/stream_executor/tpu/tpu_platform_interface.cc:74] No TPU platform found.
2020-11-17 14:46:55.522465: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublas.so.11
2020-11-17 14:46:55.824222: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcublasLt.so.11
2020-11-17 14:46:55.824629: I external/org_tensorflow/tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudnn.so.8
2020-11-17 14:46:56.303814: I external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc:344] Loaded cuDNN version 8004
2020-11-17 14:46:56.325354: W external/org_tensorflow/tensorflow/compiler/xla/service/gpu/gpu_conv_algorithm_picker.cc:772] Failed to determine best cudnn convolution algorithm: Internal: All algorithms tried for convolution %custom-call = (f32[1,20,24,24]{3,2,1,0}, u8[0]{0}) custom-call(f32[1,1,28,28]{3,2,1,0} %parameter.1, f32[5,5,1,20]{1,0,2,3} %copy.1), window={size=5x5}, dim_labels=bf01_01io->bf01, custom_call_target="__cudnn$convForward", metadata={op_type="conv_general_dilated" op_name="conv_general_dilated[ batch_group_count=1\n                      dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 1, 2, 3), rhs_spec=(3, 2, 0, 1), out_spec=(0, 1, 2, 3))\n                      feature_group_count=1\n                      lhs_dilation=(1, 1)\n                      lhs_shape=(1, 1, 28, 28)\n                      padding=((0, 0), (0, 0))\n                      precision=None\n                      rhs_dilation=(1, 1)\n                      rhs_shape=(5, 5, 1, 20)\n                      window_strides=(1, 1) ]"}, backend_config="{\"algorithm\":\"0\",\"tensor_ops_enabled\":false,\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}" failed. Falling back to default algorithm. 

Convolution performance may be suboptimal.
2020-11-17 14:46:56.386934: E external/org_tensorflow/tensorflow/compiler/xla/pjrt/pjrt_client.cc:1809] Execution of replica 0 failed: Unknown: CUDNN_STATUS_EXECUTION_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(3118): 'cudnnConvolutionForward( cudnn.handle(), alpha, input_nd.handle(), input_data.opaque(), filter_nd.handle(), filter_data.opaque(), conv.handle(), ToConvForwardAlgo(algorithm_desc), scratch_memory.opaque(), scratch_memory.size(), beta, output_nd.handle(), output_data.opaque())'
Traceback (most recent call last):
  File "scratch.py", line 17, in <module>
    params = net_transformed.init(key, np.zeros((1, 1, 28, 28)))
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/transform.py", line 111, in init_fn
    params, state = f.init(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/transform.py", line 277, in init_fn
    f(*args, **kwargs)
  File "scratch.py", line 12, in net
    return model(x)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 406, in wrapped
    out = f(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 263, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/basic.py", line 124, in __call__
    out = layer(out, *args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 406, in wrapped
    out = f(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/module.py", line 263, in run_interceptors
    return bound_method(*args, **kwargs)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/haiku/_src/conv.py", line 195, in __call__
    out = lax.conv_general_dilated(inputs,
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/_src/lax/lax.py", line 571, in conv_general_dilated
    return conv_general_dilated_p.bind(
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/core.py", line 266, in bind
    out = top_trace.process_primitive(self, tracers, params)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/core.py", line 576, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/interpreters/xla.py", line 234, in apply_primitive
    return compiled_fun(*args)
  File "/home/milad/miniconda3/envs/my_project/lib/python3.8/site-packages/jax/interpreters/xla.py", line 349, in _execute_compiled_primitive
    out_bufs = compiled.execute(input_bufs)
RuntimeError: Unknown: CUDNN_STATUS_EXECUTION_FAILED
in external/org_tensorflow/tensorflow/stream_executor/cuda/cuda_dnn.cc(3118): 'cudnnConvolutionForward( cudnn.handle(), alpha, input_nd.handle(), input_data.opaque(), filter_nd.handle(), filter_data.opaque(), conv.handle(), ToConvForwardAlgo(algorithm_desc), scratch_memory.opaque(), scratch_memory.size(), beta, output_nd.handle(), output_data.opaque())'

@tomhennigan (Collaborator) commented:

Let's keep the discussion in the JAX bug; I think this is not Haiku-specific.
