Unable to use socket in containerized PyTorch app #89
Comments
duckontheweb, we will have a look and get back to you.
We have updated the docs to add the `unix:` prefix to the socket environment variable. Thanks for letting us know.
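For readers hitting the same problem, here is a minimal sketch of the fix described in this comment: make sure the socket address carries the `unix:` scheme before the app reads it. The variable name `NEURON_RTD_ADDRESS`, the path `/sock/neuron.sock`, and the helper `with_unix_scheme` are assumptions for illustration and do not appear in this thread; check the updated docs for the exact names.

```python
import os

UNIX_SCHEME = "unix:"


def with_unix_scheme(address: str) -> str:
    """Return the address with a leading 'unix:' scheme, adding it if missing."""
    if address.startswith(UNIX_SCHEME):
        return address
    return UNIX_SCHEME + address


# Assumed variable name and socket path; adjust both to match your setup.
os.environ["NEURON_RTD_ADDRESS"] = with_unix_scheme("/sock/neuron.sock")
```

The helper leaves an address alone if it already has the scheme, so it is safe to apply to values that come from a Dockerfile `ENV` line or a `docker run -e` flag.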
Thanks, that worked! I did still have to run
You are correct. We missed the `chmod` operation in the example. Pull request #91 will correct our tutorial to reflect this. Thank you!
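The `chmod` step mentioned here is the `chmod o+x /tmp/neuron_rtd_sock` from the original report. As a rough sketch, the same check and fix can be done from Python; the helper names below are hypothetical, and whether this permission is sufficient depends on which user the app container runs as.

```python
import os
import stat


def others_can_traverse(directory: str) -> bool:
    """True if the execute (search) bit for 'others' is set on the directory."""
    return bool(os.stat(directory).st_mode & stat.S_IXOTH)


def allow_others_to_traverse(directory: str) -> None:
    """Equivalent of `chmod o+x <directory>`: add the execute bit for others."""
    mode = stat.S_IMODE(os.stat(directory).st_mode)
    os.chmod(directory, mode | stat.S_IXOTH)
```

On the host, one would call `others_can_traverse("/tmp/neuron_rtd_sock")` and, if it returns `False`, `allow_others_to_traverse("/tmp/neuron_rtd_sock")`. Only the one bit is added; the existing owner and group permissions are left unchanged.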
@duckontheweb when I try to run the neuron-rtd container, I get the error below:

    ubuntu@ip-192-168-19-125:~$ sudo docker run --device=/dev/neuron0 --cap-add IPC_LOCK -v /tmp/neuron_rtd_sock/:/sock -it neuron-rtd
    sh: lspci: command not found
    nrtd[1]: [TDRV:tdrv_init_mla_phase1] Could not open the device index:0
    nrtd[1]: [TDRV:tdrv_destroy_one_mla] close device failed
    nrtd[1]: [TDRV:tdrv_destroy] TDRV not initialized
    nrtd[1]: [NRTD:InitTongas] Failed to initialize devices, error:1
    nrtd[1]: [NRTD:nrtd_main] Failed to initialize devices: , attempt: 1

Did you face this error? If so, can you please point me in the right direction?
To be honest, it's been so long that I don't recall. We were experimenting with the SDK at my previous employer but ended up not using it, so I'm not sure I'll be of much help. Sorry!
If you're seeing this error, make sure you stop the neuron runtime running on your instance outside of the container!
I have been following the docs for Docker environment setup for Neuron and Run containerized neuron application to set up a containerized app using the Inferentia chip.

I am able to get the `neuron-rtd` container running and using a socket in `/tmp/neuron_rtd_sock` as described, but I had to add the following modification to that folder in order for the container to be able to use the socket: `chmod o+x /tmp/neuron_rtd_sock`.

I tried using the `trace_resnet50.py` script described here to test whether a container could get access to the chip. I used the following Dockerfile and run command:

Dockerfile (built as `pytorch-inf1` image)

Docker run command

Here are the permissions on the socket directory:

Is there a step I'm missing to allow the app container to access that socket? I tried running the app with the same elevated privileges as the `neuron-rtd` container, but got the same results.

Thanks!
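One way to narrow down a problem like this is to test the socket from inside the app container before running the model script. The sketch below is only a reachability check under stated assumptions: the function name `check_unix_socket` is hypothetical, the socket file name is not given in this thread, and a successful connect does not prove that the runtime will answer requests.

```python
import os
import socket
import stat


def check_unix_socket(path: str, timeout: float = 2.0) -> str:
    """Return 'ok' if a stream connection succeeds, otherwise a short reason."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing: no file at this path"
    except PermissionError:
        return "permission denied: cannot stat path (check directory execute bit)"
    if not stat.S_ISSOCK(mode):
        return "not a socket"

    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
    except PermissionError:
        return "permission denied: cannot connect (check socket file permissions)"
    except ConnectionRefusedError:
        return "refused: socket file exists but nothing is listening"
    except OSError as exc:
        return f"error: {exc}"
    finally:
        client.close()
    return "ok"
```

Inside the container, pass the path without the `unix:` scheme, for example the socket file under the mounted `/sock` directory. A "permission denied" result at the `stat` step points at the directory permissions, while one at the connect step points at the socket file itself.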