Could not load library libcudnn_ops_infer.so.8. #212

Closed
kadirnar opened this issue Sep 16, 2022 · 18 comments · Fixed by #525
Labels
bug Something isn't working

Comments

@kadirnar

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Code:

import yolov5

def pip_load():
    return yolov5.load('yolov5s.pt')

pip_load()

Error Message:

Writing profile results into speed/memray-pip_speed.py.15691.bin
Memray WARNING: Correcting symbol for aligned_alloc from 0x7f6d26393cc0 to 0x7f6d279a4250
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Please make sure libcudnn_ops_infer.so.8 is in your library path!

Expected Behavior

No response

Steps To Reproduce

pip install memray

Code:

memray run file.py 

Memray Version

1.3.1

Python Version

3.8

Operative System

Linux

Anything else?

No response

@kadirnar kadirnar added the bug Something isn't working label Sep 16, 2022
@pablogsal
Member

pablogsal commented Sep 16, 2022

Thanks, @kadirnar, for opening an issue.

This doesn't seem like a memray error. This error seems to be coming from the library itself, because you are missing a shared object.

Can you paste here the output of the program when you run python file.py and python -m memray file.py?

@kadirnar
Author

I solved the problem, thank you.

@ash2703

ash2703 commented Aug 22, 2023

@kadirnar Could you please post the solution? I'm facing the same issue when profiling a pytorch model.

@bnawras

bnawras commented Dec 9, 2023

@pablogsal

  • Ubuntu 20.04.5
  • python3.8
  • memray 1.11.0
$ python3.8 -m memray train.py

Memray WARNING: Correcting symbol for malloc from 0x425490 to 0x7fd52516e0e0
Memray WARNING: Correcting symbol for free from 0x425910 to 0x7fd52516e6d0
Memray WARNING: Correcting symbol for aligned_alloc from 0x7fd524941ca0 to 0x7fd52516f250
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
Aborted (core dumped)

@pablogsal
Member


As I mentioned in my previous comment, this doesn't look like an issue with memray but an issue with your environment or the packages you are using.

In order for us to check what's going on, can you please provide the contents of "train.py" and all the dependencies you are using?

@bnawras

bnawras commented Dec 9, 2023

@pablogsal

Dockerfile

FROM nvidia/cuda:11.1.1-runtime-ubuntu20.04

ENV DEBIAN_FRONTEND noninteractive
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y \
        build-essential git python3 python3-pip \
        ffmpeg libsm6 libxext6 libxrender1 libglib2.0-0

WORKDIR /app
COPY requirements.txt .
RUN pip install --ignore-installed -r requirements.txt

RUN mkdir /datasets

CMD jupyter lab --ip 0.0.0.0 --port 1110 --allow-root

requirements

absl-py==2.0.0
aiohttp==3.8.6
aiosignal==1.3.1
albumentations==1.3.0
anyio==3.6.2
argon2-cffi==21.3.0
argon2-cffi-bindings==21.2.0
asttokens==2.2.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==22.2.0
augraphy==8.2.4
azure-core==1.29.4
azure-storage-blob==12.18.3
Babel==2.11.0
backcall==0.2.0
beautifulsoup4==4.11.1
bleach==5.0.1
cachetools==5.3.1
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==3.0.1
clearml==1.13.1
cloudpickle==3.0.0
coloredlogs==15.0.1
comm==0.1.2
contourpy==1.0.7
cryptography==41.0.4
cssutils==2.9.0
cycler==0.11.0
dataframe-image==0.2.2
dbus-python==1.2.16
debugpy==1.6.5
decorator==5.1.1
defusedxml==0.7.1
efficientnet-pytorch==0.7.1
entrypoints==0.4
executing==1.2.0
fastjsonschema==2.16.2
filelock==3.12.4
filprofiler==2023.3.1
flatbuffers==23.5.26
fonttools==4.38.0
frozenlist==1.4.0
fsspec==2023.9.2
furl==2.1.3
google-auth==2.23.3
google-auth-oauthlib==1.0.0
grpcio==1.59.0
html2image==2.0.4.3
huggingface-hub==0.18.0
humanfriendly==10.0
idna==3.4
imageio==2.25.0
importlib-metadata==6.0.0
importlib-resources==5.10.2
ipykernel==6.20.2
ipython==8.8.0
ipython-genutils==0.2.0
ipywidgets==8.0.4
isodate==0.6.1
jedi==0.18.2
Jinja2==3.1.2
joblib==1.2.0
json5==0.9.11
jsonschema==4.17.3
jupyter-client==7.4.9
jupyter-core==5.1.3
jupyter-events==0.6.3
jupyter-server==2.1.0
jupyter-server-terminals==0.4.4
jupyterlab==3.5.2
jupyterlab-pygments==0.2.2
jupyterlab-server==2.19.0
jupyterlab-widgets==3.0.5
kiwisolver==1.4.4
llvmlite==0.41.0
lmdb==0.94
lxml==4.9.3
Markdown==3.5
markdown-it-py==3.0.0
MarkupSafe==2.1.2
matplotlib==3.6.3
matplotlib-inline==0.1.6
mdurl==0.1.2
memory-profiler==0.61.0
memray==1.11.0
mistune==2.0.4
mpmath==1.3.0
multidict==6.0.4
munch==4.0.0
nbclassic==0.4.8
nbclient==0.7.2
nbconvert==7.2.8
nbformat==5.7.3
nest-asyncio==1.5.6
networkx==3.0
notebook==6.5.2
notebook-shim==0.2.2
numba==0.58.0
numpy==1.24.1
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
oauthlib==3.2.2
onnxruntime-gpu==1.9.0
opencv-python==4.8.1.78
opencv-python-headless==4.7.0.68
orderedmultidict==1.0.1
packaging==23.0
pandas==1.5.3
pandocfilters==1.5.0
parso==0.8.3
pathlib2==2.3.7.post1
pexpect==4.8.0
pickleshare==0.7.5
Pillow==9.4.0
pkgutil-resolve-name==1.3.10
platformdirs==2.6.2
pretrainedmodels==0.7.4
prometheus-client==0.15.0
prompt-toolkit==3.0.36
protobuf==4.24.4
psutil==5.9.4
ptyprocess==0.7.0
pure-eval==0.2.2
pyasn1==0.5.0
pyasn1-modules==0.3.0
pycparser==2.21
Pygments==2.14.0
PyGObject==3.36.0
PyJWT==2.4.0
pynvml==11.4.1
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
python-json-logger==2.0.4
pytz==2022.7.1
PyWavelets==1.4.1
PyYAML==6.0
pyzmq==25.0.0
qudida==0.0.4
requests==2.28.2
requests-oauthlib==1.3.1
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rich==13.7.0
rsa==4.9
safetensors==0.4.0
scalene==1.5.31.1
scikit-image==0.19.3
scikit-learn==1.2.0
scipy==1.10.0
seaborn==0.13.0
segmentation-models-pytorch==0.3.3
Send2Trash==1.8.0
shapely==2.0.0
six==1.16.0
sniffio==1.3.0
soupsieve==2.3.2.post1
stack-data==0.6.2
svgpathtools==1.3.3
svgwrite==1.4.3
sympy==1.12
tensorboard==2.14.0
tensorboard-data-server==0.7.1
terminado==0.17.1
textual==0.44.1
threadpoolctl==3.1.0
tifffile==2023.1.23.1
timm==0.9.2
tinycss2==1.2.1
tomli==2.0.1
torch==1.13.1
torchvision==0.14.1
tornado==6.2
tqdm==4.64.1
traitlets==5.8.1
typing-extensions==4.4.0
urllib3==1.26.14
wcwidth==0.2.6
webencodings==0.5.1
websocket-client==1.4.2
werkzeug==3.0.0
widgetsnbextension==4.0.5
yarl==1.9.2
zipp==3.11.0

code

import torch
import torchvision

model = torchvision.models.regnet_x_1_6gf()
model.cuda()

output = model(torch.rand(3, 3, 200, 200).cuda())

@bnawras

bnawras commented Dec 9, 2023

@pablogsal

This code works without memray; it also works with scalene.

@pablogsal
Member

It seems that this is because pytorch is doing something weird with their dlopen handles:

pytorch/pytorch@198a3e4

See also voicepaw/so-vits-svc-fork#364

I think you need to bring this to the pytorch developers as their "workaround" conflicts with dlopen interposition.

@pablogsal
Member

@godlygeek I can confirm that not patching torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8 and torch/lib/libtorch_cuda.so fixes the problem, but it's unclear if this is something we should do.

@pablogsal
Member

Hmm, when libcudnn_ops_infer.so.8 is first dlopen-ed by cudnnCreate in lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8, this is the linker load:

   968020:
    968020:     file=libcudnn_ops_infer.so.8 [0];  dynamically loaded by /home/pablogsal/.pyenv/versions/3.11.1/envs/memray/lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8 [0]
    968020:     find library=libcudnn_ops_infer.so.8 [0]; searching
    968020:      search path=/opt/cuda/lib64:glibc-hwcaps/x86-64-v4:glibc-hwcaps/x86-64-v3:glibc-hwcaps/x86-64-v2:              (LD_LIBRARY_PATH)
    968020:       trying file=/opt/cuda/lib64/libcudnn_ops_infer.so.8
    968020:       trying file=glibc-hwcaps/x86-64-v4/libcudnn_ops_infer.so.8
    968020:       trying file=glibc-hwcaps/x86-64-v3/libcudnn_ops_infer.so.8
    968020:       trying file=glibc-hwcaps/x86-64-v2/libcudnn_ops_infer.so.8
    968020:       trying file=libcudnn_ops_infer.so.8
    968020:      search path=/home/pablogsal/.pyenv/versions/3.11.1/envs/memray/lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib           (RPATH from file /home/pablogsal/.pyenv/versions/3.11.1/envs/memray/lib/python3.11/site-packages/torch/lib/libtorch_global_deps.so)
    968020:       trying file=/home/pablogsal/.pyenv/versions/3.11.1/envs/memray/lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn_ops_infer.so.8
    968020:

but when loaded via memray in memray::intercept::dlopen, this is the linker load:

    990508:
    990508:     file=libcudnn_ops_infer.so.8 [0];  dynamically loaded by /home/pablogsal/github/memray/src/memray/_memray.cpython-311-x86_64-linux-gnu.so [0]
    990508:     find library=libcudnn_ops_infer.so.8 [0]; searching
    990508:      search path=/opt/cuda/lib64:glibc-hwcaps/x86-64-v4:glibc-hwcaps/x86-64-v3:glibc-hwcaps/x86-64-v2:              (LD_LIBRARY_PATH)
    990508:       trying file=/opt/cuda/lib64/libcudnn_ops_infer.so.8
    990508:       trying file=glibc-hwcaps/x86-64-v4/libcudnn_ops_infer.so.8
    990508:       trying file=glibc-hwcaps/x86-64-v3/libcudnn_ops_infer.so.8
    990508:       trying file=glibc-hwcaps/x86-64-v2/libcudnn_ops_infer.so.8
    990508:       trying file=libcudnn_ops_infer.so.8
    990508:      search path=/home/pablogsal/.pyenv/versions/3.11.1/lib         (RUNPATH from file /home/pablogsal/.pyenv/versions/3.11.1/envs/memray/bin/python)
    990508:       trying file=/home/pablogsal/.pyenv/versions/3.11.1/lib/libcudnn_ops_infer.so.8
    990508:      search cache=/etc/ld.so.cache
    990508:      search path=/usr/lib/glibc-hwcaps/x86-64-v4:/usr/lib/glibc-hwcaps/x86-64-v3:/usr/lib/glibc-hwcaps/x86-64-v2:/usr/lib           (system search path)
    990508:       trying file=/usr/lib/glibc-hwcaps/x86-64-v4/libcudnn_ops_infer.so.8
    990508:       trying file=/usr/lib/glibc-hwcaps/x86-64-v3/libcudnn_ops_infer.so.8
    990508:       trying file=/usr/lib/glibc-hwcaps/x86-64-v2/libcudnn_ops_infer.so.8
    990508:       trying file=/usr/lib/libcudnn_ops_infer.so.8

Somehow the RPATH of /home/pablogsal/.pyenv/versions/3.11.1/envs/memray/lib/python3.11/site-packages/torch/lib/libtorch_global_deps.so was not considered.

@pablogsal
Member

Ah, it's not being considered because the dlopen happens in /home/pablogsal/github/memray/src/memray/_memray.cpython-311-x86_64-linux-gnu.so and not in /home/pablogsal/.pyenv/versions/3.11.1/envs/memray/lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib/libcudnn.so.8.

Here is a reproducer:

$ cat mypreload.c

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

typedef void* (*dlopen_t)(const char* filename, int flags);

static void* (*real_dlopen)(const char* filename, int flags) = NULL;

void*
dlopen(const char* filename, int flags)
{
    if (!real_dlopen) {
        real_dlopen = (dlopen_t)dlsym(RTLD_NEXT, "dlopen");
        if (!real_dlopen) {
            fprintf(stderr, "Error: Unable to find real dlopen function\n");
            return NULL;
        }
    }

    printf("Intercepted: Loading library: %s\n", filename);

    return real_dlopen(filename, flags);
}

$ gcc -shared -fPIC -ldl mypreload.c -o mypreload.so

$ LD_PRELOAD=./mypreload.so python example.py

...
...
Intercepted: Loading library: libcudnn_ops_infer.so.8
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory

@pablogsal
Member

This means that RPATHs won't be considered when dlopen calls are intercepted. This affects other profilers as well, for example:

$ heaptrack /home/pablogsal/.pyenv/versions/memray/bin/python example.py
heaptrack output will be written to "/home/pablogsal/github/memray/heaptrack.python.1180606.zst"
starting application, this might take some time...
...
Could not load library libcudnn_ops_infer.so.8. Error: libcudnn_ops_infer.so.8: cannot open shared object file: No such file or directory
/usr/bin/heaptrack: line 361: 1180621 Aborted                 (core dumped) LD_PRELOAD="$LIBHEAPTRACK_PRELOAD${LD_PRELOAD:+:$LD_PRELOAD}" DUMP_HEAPTRACK_OUTPUT="$pipe" "$client" "$@"
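
For context, here is a minimal sketch (this is not memray's actual fix, and the names and output are purely illustrative) of one way an interposing dlopen could at least discover which shared object made the call, by applying dladdr to the return address. With that information the interposer could, in principle, honor the caller's RPATH/RUNPATH instead of its own:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>

typedef void* (*dlopen_t)(const char* filename, int flags);

static dlopen_t real_dlopen = NULL;

void*
dlopen(const char* filename, int flags)
{
    if (!real_dlopen) {
        real_dlopen = (dlopen_t)dlsym(RTLD_NEXT, "dlopen");
        if (!real_dlopen) {
            fprintf(stderr, "Error: Unable to find real dlopen function\n");
            return NULL;
        }
    }

    /* Identify the shared object containing the call site: its
     * DT_RPATH/DT_RUNPATH is what the dynamic loader would normally
     * consult to resolve a bare name like libcudnn_ops_infer.so.8. */
    Dl_info info;
    void* caller = __builtin_return_address(0);
    if (dladdr(caller, &info) && info.dli_fname) {
        fprintf(stderr, "dlopen(%s) requested by %s\n",
                filename ? filename : "(self)", info.dli_fname);
        /* A real fix would read that object's RPATH/RUNPATH entries and
         * try those directories before falling back to the default search. */
    }

    return real_dlopen(filename, flags);
}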

@albertz

albertz commented Feb 8, 2024

Is there a workaround? Or should I just wait for PR #525?

@pablogsal
Member

pablogsal commented Feb 8, 2024

Is there a workaround? Or should I just wait for PR #525?

There is a workaround while we work on merging PR #525. You need to set the LD_LIBRARY_PATH environment variable to also include the rpath of the library. For example, on my system I can do:

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/home/pablogsal/.pyenv/versions/3.11.1/envs/memray/lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib
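
If the exact path differs on your machine, an untested sketch of a more portable form of the same workaround (assuming a pip-installed torch that bundles the nvidia-cudnn wheel, as in the RPATH shown above) is to derive the directory from torch's own install location:

# Locate site-packages/nvidia/cudnn/lib relative to the torch package.
CUDNN_LIB_DIR="$(python -c 'import os, torch; print(os.path.join(os.path.dirname(os.path.dirname(torch.__file__)), "nvidia", "cudnn", "lib"))')"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${CUDNN_LIB_DIR}"
memray run train.py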

@pablogsal
Member

@albertz can you confirm this works for you?

@albertz

albertz commented Feb 8, 2024

export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:.../virtualenv/.../lib/python3.11/site-packages/torch/lib/../../nvidia/cudnn/lib

Thanks, yes, that works.

@pablogsal
Member

pablogsal commented Feb 10, 2024

@albertz you mentioned here https://news.ycombinator.com/reply?id=39327452&goto=item%3Fid%3D39325983%2339327452 that there seem to be some things that don't make sense in your experience when you used memray.

Could you give us a small reproducer or explain the problem a bit so we can look into it?

@albertz

albertz commented Feb 10, 2024

I will add some details in a separate issue: #547
