[Detector Support]: Doesn't work GPU with OpenVINO #9574

Closed
Shlyakhoff opened this issue Feb 1, 2024 · 32 comments

Comments

@Shlyakhoff

Describe the problem you are having

When I try to enable the OpenVINO detector with device=GPU, I see an error in the log and Frigate becomes inactive for a while. If I set device=AUTO it starts to work, but when detect is enabled I see high CPU utilization and no activity in intel_gpu_top, which shows that GPU acceleration is not working. Hardware acceleration for the camera stream works fine.

I set LIBVA_DRIVER_NAME=iHD because with i965 I see the blitter engine become active in intel_gpu_top. I'm not sure, but that doesn't seem right, and OpenVINO doesn't work with it either.

I have a Celeron J4105 and an Unraid system.

Version

0.13

Frigate config file

database:
  path: /config/frigate.db

mqtt:
  enabled: true
  host: homeassistant.local
  port: 1883
  user: admin
  password: password
  topic_prefix: frigate
  client_id: frigate

ui:
  live_mode: mse
  use_experimental: true

birdseye:
  enabled: false

detect:
  enabled: true

detectors:
  ov:
    type: openvino
    device: GPU
    model:
      path: /openvino-model/ssdlite_mobilenet_v2.xml

model:
  width: 300
  height: 300
  input_tensor: nhwc
  input_pixel_format: bgr
  labelmap_path: /openvino-model/coco_91cl_bkgr.txt

snapshots:
  enabled: true

record:
  enabled: true
  expire_interval: 60
  retain:
    days: 12
    mode: all
  events:
    retain:
      default: 12
      mode: motion

ffmpeg:
  hwaccel_args: preset-vaapi

logger:
  default: error
  logs:
    frigate.mqtt: error
    frigate.app: error
    frigate.ffmpeg: critical

go2rtc:
  log:
    format: text
    level: error
#    api: trace
#    exec: debug
#    ngrok: info
#    rtsp: warn
#    streams: error
#    webrtc: fatal
  rtsp:
    default_query: mp4
  streams:
    cam3:
    - rtsp://admin:password@192.168.1.62:554
    cam3_sub:
    - rtsp://admin:password@192.168.1.62:554/Streaming/channels/2

cameras:
  cam3:
    ffmpeg:
      inputs:
      - path: rtsp://127.0.0.1:8554/cam3
        input_args: preset-rtsp-restream-low-latency
        roles:
        - record
      - path: rtsp://127.0.0.1:8554/cam3_sub
        input_args: preset-rtsp-restream
        roles:
        - detect
      output_args:
        record: preset-record-generic-audio-copy
    motion:
      mask:
      - 0,176,110,176,110,155,0,155

docker-compose file or Docker CLI command

docker run \
  -d \
  --name='frigate' \
  --net='bridge' \
  --privileged=true \
  -e TZ="Europe/Moscow" \
  -e HOST_OS="Unraid" \
  -e HOST_HOSTNAME="SERVER" \
  -e HOST_CONTAINERNAME="frigate" \
  -e 'FRIGATE_RTSP_PASSWORD'='password' \
  -e 'LIBVA_DRIVER_NAME'='iHD' \
  -l net.unraid.docker.managed=dockerman \
  -l net.unraid.docker.webui='http://[IP]:[PORT:5000]' \
  -l net.unraid.docker.icon='https://raw.githubusercontent.com/yayitazale/unraid-templates/main/frigate.png' \
  -p '5000:5000/tcp' \
  -p '8554:8554/tcp' \
  -p '8555:8555/tcp' \
  -p '8555:8555/udp' \
  -p '1984:1984/tcp' \
  -v '/mnt/user/appdata/frigate':'/config':'rw' \
  -v '/mnt/disks/cameras/':'/media/frigate':'rw,slave' \
  -v '/etc/localtime':'/etc/localtime':'rw' \
  --device='/dev/dri' \
  --shm-size=256mb \
  --mount type=tmpfs,target=/tmp/cache,tmpfs-size=2726297600 \
  --restart unless-stopped 'ghcr.io/blakeblackshear/frigate:stable'

Relevant log output

2024-02-01 19:33:04.112890288  [INFO] Preparing Frigate...
2024-02-01 19:33:04.140557029  [INFO] Starting Frigate...
2024-02-01 19:33:06.708554578  [2024-02-01 19:33:06] frigate.app                    INFO    : Starting Frigate (0.13.0-01e2d20)
2024-02-01 19:33:10.350863638  [2024-02-01 19:33:10] frigate.config                 WARNING : Customizing more than a detector model path is unsupported.
2024-02-01 19:33:12.743140257  Process detector:ov:
2024-02-01 19:33:12.746019189  Traceback (most recent call last):
2024-02-01 19:33:12.746050491    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-02-01 19:33:12.746053039      self.run()
2024-02-01 19:33:12.746055072    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-02-01 19:33:12.746061394      self._target(*self._args, **self._kwargs)
2024-02-01 19:33:12.746064435    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-02-01 19:33:12.746091310      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-02-01 19:33:12.746111650    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-02-01 19:33:12.746114358      self.detect_api = create_detector(detector_config)
2024-02-01 19:33:12.746116276    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-02-01 19:33:12.746117776      return api(detector_config)
2024-02-01 19:33:12.746119702    File "/opt/frigate/frigate/detectors/plugins/openvino.py", line 32, in __init__
2024-02-01 19:33:12.746121416      self.interpreter = self.ov_core.compile_model(
2024-02-01 19:33:12.746123545    File "/usr/local/lib/python3.9/dist-packages/openvino/runtime/ie_api.py", line 399, in compile_model
2024-02-01 19:33:12.746126909      super().compile_model(model, device_name, {} if config is None else config),
2024-02-01 19:33:12.746181038  RuntimeError: cldnn program build failed! [GPU] clWaitForEvents, error code: -14

Operating system

UNRAID

Install method

Docker Compose

Coral version

CPU (no coral)

Any other information that may be helpful

No response

@NickM-27
Collaborator

NickM-27 commented Feb 1, 2024

@NateMeyer have you seen this before?

@Shlyakhoff
Author

Shlyakhoff commented Feb 1, 2024

No, I'm only just trying to set it up. Actually, I did try to set up OpenVINO a month ago when Frigate was at version 0.12, and it was the same.

@Shlyakhoff
Author

Shlyakhoff commented Feb 1, 2024

This is what I see when I set device=AUTO, but as I mentioned earlier, it doesn't work when detect is enabled.

image

image

@NickM-27
Collaborator

NickM-27 commented Feb 1, 2024

to be clear I tagged someone else with that question

@NateMeyer
Contributor

No that is a new one to me. I'll see what I can dig up later today.

@Shlyakhoff
Author

It looks like the container is missing something like this one

@NickM-27
Collaborator

NickM-27 commented Feb 2, 2024

That page is outdated. Also, many users are using this in GPU mode so it will likely be something host specific

@Shlyakhoff
Author

Understood. Please let me know if I need to share more information about the case.

@Shlyakhoff
Author

Shlyakhoff commented Feb 6, 2024

I just wanted to add that I found a message in syslog when I try to launch Frigate with OpenVINO and device GPU:
image

@Shlyakhoff
Author

It seems the error is caused by an OpenCL bug in the kernel, because there are similar reports on the Internet about hardware acceleration misbehaving, for example in Plex. Hopefully a solution to this problem will be found.

@dsolva

dsolva commented Feb 23, 2024

I have the exact same issue, with the same CPU in a NUC but running Proxmox.

Indeed, Plex struggled for a long time with some of these processors after an update, but that was resolved a few weeks back. As far as I know, their issue was related to changes in the drivers used.

@jjak0b

jjak0b commented Mar 7, 2024

Same issue in a Docker container nested inside an unprivileged Proxmox container, using an Intel N3350.
Do the container and the host both need some packages? If so, which ones?

@a-bali

a-bali commented Mar 28, 2024

I seem to have the very same issue, is there any solution already?

@Shlyakhoff
Author

> I seem to have the very same issue, is there any solution already?

No, I didn't find one and just bought a Google Coral TPU.

@henryouly

I have an Intel J4105 with Proxmox 8.1 and a Debian 12 Docker LXC, and I ran into the same "GPU Hang" syslog message. After reading a similar issue, #5799, which suggests a kernel issue, I eventually replaced Proxmox 8.1 with 7.4 and a Debian 11 LXC, and the issue is resolved. My kernel in the LXC is 5.15.102-1-pve. I think the kernel in Proxmox 8.1 is probably 6.5.11-8-pve.
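
Several reports in this thread come down to comparing Proxmox kernel releases. A minimal Python sketch of that comparison follows. The function names `parse_kernel_release` and `newer_than_baseline` are hypothetical, and the 5.15 baseline is only an assumption taken from the report above, not a confirmed boundary.

```python
import re


def parse_kernel_release(release):
    """Turn a release string such as '6.8.4-2-pve' into a tuple of ints, e.g. (6, 8, 4)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch level is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def newer_than_baseline(release, baseline=(5, 15)):
    """True if the major.minor of `release` is above `baseline` (assumed last known-good series)."""
    return parse_kernel_release(release)[:2] > baseline
```

On a running host the input would typically come from `platform.release()` (the same string `uname -r` prints). A `True` result only says the kernel is newer than the series reported as working here; it does not by itself show that the GPU will hang.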

@Zanadar

Zanadar commented Apr 23, 2024

Same issue here with proxmox 8.1.10 (kernel 6.5.13-5-pve) and frigate running in LXC

@esand

esand commented Apr 24, 2024

> Same issue here with proxmox 8.1.10 (kernel 6.5.13-5-pve) and frigate running in LXC

I am running Proxmox as well and it was working on kernel 6.5.13-3. I just updated to Proxmox 8.2.2 today and it uses kernel 6.8.4-2 and Frigate is unable to detect my GPU:

2024-04-24 16:31:55.145428974  Process detector:ov:
2024-04-24 16:31:55.148000145  Traceback (most recent call last):
2024-04-24 16:31:55.148038397    File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-04-24 16:31:55.148044573      self.run()
2024-04-24 16:31:55.148051060    File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-04-24 16:31:55.148063664      self._target(*self._args, **self._kwargs)
2024-04-24 16:31:55.148072342    File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-04-24 16:31:55.148131608      object_detector = LocalObjectDetector(detector_config=detector_config)
2024-04-24 16:31:55.148163715    File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-04-24 16:31:55.148169848      self.detect_api = create_detector(detector_config)
2024-04-24 16:31:55.148175713    File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-04-24 16:31:55.148180348      return api(detector_config)
2024-04-24 16:31:55.148185865    File "/opt/frigate/frigate/detectors/plugins/openvino.py", line 32, in __init__
2024-04-24 16:31:55.148190875      self.interpreter = self.ov_core.compile_model(
2024-04-24 16:31:55.148196953    File "/usr/local/lib/python3.9/dist-packages/openvino/runtime/ie_api.py", line 399, in compile_model
2024-04-24 16:31:55.148205619      super().compile_model(model, device_name, {} if config is None else config),
2024-04-24 16:31:55.148213584  RuntimeError: Failed to create plugin /usr/local/lib/python3.9/dist-packages/openvino/libs/libopenvino_intel_gpu_plugin.so for device GPU
2024-04-24 16:31:55.148286612  Please, check your environment
2024-04-24 16:31:55.148294444  Check 'error_code == 0' failed at src/plugins/intel_gpu/src/runtime/ocl/ocl_device_detector.cpp:194:
2024-04-24 16:31:55.148303564  [GPU] No supported OCL devices found or unexpected error happened during devices query.
2024-04-24 16:31:55.148309311  [GPU] Please check OpenVINO documentation for GPU drivers setup guide.
2024-04-24 16:31:55.148365420  [GPU] clGetPlatformIDs error code: -1001

Nothing has changed in my config, and it's rather simple for detectors:

detectors:
  ov:
    type: openvino
    device: GPU

I'm guessing it's something to do with how the kernel may be exposing devices, and/or with an update required to some libs in the Frigate container?

FYI - if I change to device: AUTO, it works fine. I can't tell if it's actually using the GPU or not since vainfo doesn't work for me (no privs), but I'm guessing not... inference is up at around double what it used to be.
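
One way to see what `device: AUTO` has to choose from is to ask OpenVINO which devices it can see from inside the Frigate container. The sketch below assumes the `openvino.runtime` module path that appears in the traceback above; `list_openvino_devices` is a hypothetical helper name, and the sketch has not been checked against every OpenVINO release.

```python
def list_openvino_devices():
    """Return the device names OpenVINO reports, or None if openvino.runtime cannot be imported."""
    try:
        # Module path taken from the traceback (openvino/runtime/ie_api.py).
        from openvino.runtime import Core
    except ImportError:
        return None
    # available_devices is a property on Core listing names such as "CPU" or "GPU".
    return list(Core().available_devices)


if __name__ == "__main__":
    print(list_openvino_devices())
```

If "GPU" is absent from the printed list, AUTO would be expected to fall back to the CPU, which would be consistent with the higher inference times described above.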

@Zanadar

Zanadar commented Apr 24, 2024

@esand I think it is most likely that your issue is not related to frigate. You probably need to reconfigure the GPU passthrough to LXC/VM after Proxmox upgrade.

@esand

esand commented Apr 25, 2024

> @esand I think it is most likely that your issue is not related to frigate. You probably need to reconfigure the GPU passthrough to LXC/VM after Proxmox upgrade.

The /dev/dri devices are still visible in both the linux container and the frigate container. Permissions are correct and I've still got hwaccel working just fine. As far as I'm aware, no changes in Proxmox 8.2.2 impacted hardware passthrough configurations and my LXC still boots up just fine (it would error out on a bad config). I also have other devices that I do passthrough with in other containers and those are still functioning fine.
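
The visibility and permission check described above can be sketched in Python as follows. `describe_dri_nodes` is a hypothetical helper, and the default path `/dev/dri` is the one passed to the container in the original report.

```python
import os


def describe_dri_nodes(dri_dir="/dev/dri"):
    """List entries in dri_dir with whether the current process may read and write each one."""
    if not os.path.isdir(dri_dir):
        return []
    report = []
    for name in sorted(os.listdir(dri_dir)):
        path = os.path.join(dri_dir, name)
        if os.path.isdir(path):
            # Skip subdirectories such as by-path; only device nodes matter here.
            continue
        report.append((name, os.access(path, os.R_OK), os.access(path, os.W_OK)))
    return report
```

Run inside the Frigate container, this typically lists entries such as `card0` and `renderD128`. As this thread shows, accessible nodes and working hwaccel do not guarantee that the OpenCL runtime can use the GPU, so the check only rules out a passthrough or permission problem.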

@dsolva

dsolva commented Apr 25, 2024

Same observation as @esand. Hwaccel is working fine, and other containers are working with the GPU (e.g. Plex transcoding).

@Zanadar

Zanadar commented Apr 26, 2024

I just upgraded proxmox to 8.2.2 and have the same issue as described above with intel gpu passthrough

2024-04-26 10:58:11.974474134 [2024-04-26 10:58:11] detector.ov INFO : Starting detection process: 306
2024-04-26 10:58:12.048736394 Process detector:ov:
2024-04-26 10:58:12.049568318 Traceback (most recent call last):
2024-04-26 10:58:12.049570624 File "/usr/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
2024-04-26 10:58:12.049571729 self.run()
2024-04-26 10:58:12.049572957 File "/usr/lib/python3.9/multiprocessing/process.py", line 108, in run
2024-04-26 10:58:12.049574044 self._target(*self._args, **self._kwargs)
2024-04-26 10:58:12.049575166 File "/opt/frigate/frigate/object_detection.py", line 102, in run_detector
2024-04-26 10:58:12.049576263 object_detector = LocalObjectDetector(detector_config=detector_config)
2024-04-26 10:58:12.049594351 File "/opt/frigate/frigate/object_detection.py", line 53, in __init__
2024-04-26 10:58:12.049595625 self.detect_api = create_detector(detector_config)
2024-04-26 10:58:12.049596786 File "/opt/frigate/frigate/detectors/__init__.py", line 18, in create_detector
2024-04-26 10:58:12.049613578 return api(detector_config)
2024-04-26 10:58:12.049614822 File "/opt/frigate/frigate/detectors/plugins/openvino.py", line 32, in __init__
2024-04-26 10:58:12.049615982 self.interpreter = self.ov_core.compile_model(
2024-04-26 10:58:12.049617125 File "/usr/local/lib/python3.9/dist-packages/openvino/runtime/ie_api.py", line 399, in compile_model
2024-04-26 10:58:12.049628693 super().compile_model(model, device_name, {} if config is None else config),
2024-04-26 10:58:12.049829684 RuntimeError: Failed to create plugin /usr/local/lib/python3.9/dist-packages/openvino/libs/libopenvino_intel_gpu_plugin.so for device GPU
2024-04-26 10:58:12.049831211 Please, check your environment
2024-04-26 10:58:12.049832366 Check 'error_code == 0' failed at src/plugins/intel_gpu/src/runtime/ocl/ocl_device_detector.cpp:194:
2024-04-26 10:58:12.049833459 [GPU] No supported OCL devices found or unexpected error happened during devices query.
2024-04-26 10:58:12.049834541 [GPU] Please check OpenVINO documentation for GPU drivers setup guide.
2024-04-26 10:58:12.049835571 [GPU] clGetPlatformIDs error code: -1001

@Zanadar

Zanadar commented Apr 26, 2024

looks like this is an issue with the kernel included in the new proxmox 8.2.2
#10785

loading the previous kernel in proxmox with the following guide solved my issue until a new proxmox release comes out.
https://engineerworkshop.com/blog/how-to-revert-a-proxmox-kernel-update/


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

github-actions bot added the stale label May 27, 2024
github-actions bot closed this as not planned May 30, 2024
@luke3butler

Not sure when this was fixed, but I just commented out the GRUB_DEFAULT setting I had in place to use the previous kernel, ran update-grub, rebooted, and everything is working as it should.

I'm on version 6.8.8-3-pve now.
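
To confirm whether a kernel pin like the one described above is still active, the GRUB defaults file can be inspected. The sketch below is an illustration only: `active_grub_default` is a hypothetical helper, and it assumes the host boots through GRUB with its settings in `/etc/default/grub`, which is not the case for every Proxmox install.

```python
import re


def active_grub_default(grub_text):
    """Return the value of the last uncommented GRUB_DEFAULT line, or None if there is none."""
    value = None
    for line in grub_text.splitlines():
        # Lines beginning with '#' do not match, so commented-out pins are ignored.
        match = re.match(r"\s*GRUB_DEFAULT=(.*)", line)
        if match:
            value = match.group(1).strip().strip("\"'")
    return value
```

Usage would look like `active_grub_default(open("/etc/default/grub").read())`. A result of `None` or `"0"` means no specific older kernel is pinned, since GRUB boots the first menu entry by default.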

@dsolva

dsolva commented Jul 19, 2024

> Not sure when this was fixed, but I just commented out the GRUB_DEFAULT setting I had in place to use the previous kernel, ran update-grub, rebooted, and everything is working as it should.
>
> I'm on version 6.8.8-3-pve now.

Just updated to 6.8.8-3-pve to test, with no success. Do you have a similar setup to OP's?

@luke3butler

> > Not sure when this was fixed, but I just commented out the GRUB_DEFAULT setting I had in place to use the previous kernel, ran update-grub, rebooted, and everything is working as it should.
> > I'm on version 6.8.8-3-pve now.
>
> Just updated to 6.8.8-3-pve to test and no success. Do you have similar setup that OP?

Yes, similar to OP. The host is Proxmox 8 with an i7-8700 CPU.

I was experiencing the same exact issue, prior to updating the kernel to 6.8.8-3. I've rebooted several times, verified that the newer kernel was actually being used, and haven't experienced any issues.

ffmpeg:
  hwaccel_args: preset-intel-qsv-h264

detectors:
  ov:
    type: openvino
    device: GPU
    model:
      path: /openvino-model/ssdlite_mobilenet_v2.xml

@Fahmula

Fahmula commented Jul 27, 2024

I recently updated from Proxmox 7.4 with kernel 5.15.158-1-pve to Proxmox 8.2.4 with kernel 6.8.8-4, and I'm experiencing the same issue. I tried kernel 6.8.8-3 and even 6.5 with no luck.

I decided to set up a Proxmox 7.4 VM just to test it, and it works perfectly. I tried what was suggested in #12266 on Proxmox 8, but it didn't solve my issue.

@henryouly

I'm pretty sure this is an upstream issue in the compatibility between intel-compute-engine and the kernel GPU hang check functionality.

Here is a related discussion with the exact same CPU (J4105)
intel/compute-runtime#679

As far as I'm aware of, downgrading to kernel 5.15 seems to be the only solution.

@esand

esand commented Aug 3, 2024

> As far as I'm aware of, downgrading to kernel 5.15 seems to be the only solution.

I don't believe that intel/compute-runtime#679 is the culprit, but rather intel/compute-runtime#710. If you put some ENV variables in to override some GPU settings it works, or if you update the openvino libraries (#10785).

There's supposedly a fix in the works to the kernel code to correct the issue, but until then either ENV variables or updating openvino appear to solve the problem.

It might be best to close this and other related issues and point them all to #10785 which documents both potential fixes.

@Fahmula

Fahmula commented Aug 5, 2024

> > As far as I'm aware of, downgrading to kernel 5.15 seems to be the only solution.
>
> I don't believe that intel/compute-runtime#679 is the culprit, but rather intel/compute-runtime#710. If you put some ENV variables in to override some GPU settings it works, or if you update the openvino libraries (#10785).
>
> There's supposedly a fix in the works to the kernel code to correct the issue, but until then either ENV variables or updating openvino appear to solve the problem.
>
> It might be best to close this and other related issues and point them all to #10785 which documents both potential fixes.

None of these solutions works for me. It seems my J4125 just isn't supported.

@henryouly

> I don't believe that intel/compute-runtime#679 is the culprit, but rather intel/compute-runtime#710. If you put some ENV variables in to override some GPU settings it works, or if you update the openvino libraries (#10785).
>
> There's supposedly a fix in the works to the kernel code to correct the issue, but until then either ENV variables or updating openvino appear to solve the problem.
>
> It might be best to close this and other related issues and point them all to #10785 which documents both potential fixes.

The issue you mentioned is about being unable to detect the GPU, and I think #10785 is the right thread to merge it with. OP, @Fahmula and I experienced a different one. It is specific to the J4105/J4125, and the relevant logs are clearly different from the one you posted in #9574 (comment). As @Fahmula mentioned, none of the solutions work.
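
The difference between the two failure modes is visible in the OpenCL error codes at the end of each traceback. The Python sketch below pulls the code out of a log line and maps it to its symbolic name from the Khronos OpenCL headers. `classify_opencl_error` is a hypothetical helper, and the regular expression assumes only the two message shapes that appear in this thread.

```python
import re

# Symbolic names as defined in the Khronos OpenCL headers (cl.h and cl_ext.h).
OPENCL_ERRORS = {
    -14: "CL_EXEC_STATUS_ERROR_FOR_EVENTS_IN_WAIT_LIST",
    -1001: "CL_PLATFORM_NOT_FOUND_KHR",
}

# Matches "[GPU] clWaitForEvents, error code: -14" and "[GPU] clGetPlatformIDs error code: -1001".
ERROR_RE = re.compile(r"\[GPU\]\s+(cl\w+),? error code: (-?\d+)")


def classify_opencl_error(log_line):
    """Return (opencl_call, code, symbolic_name), or None if the line carries no OpenCL error."""
    match = ERROR_RE.search(log_line)
    if match is None:
        return None
    call, code = match.group(1), int(match.group(2))
    return call, code, OPENCL_ERRORS.get(code, "unknown")
```

Code -14 from `clWaitForEvents` means a device was found but a submitted command failed while executing, which fits the original report's failure during model compilation. Code -1001 from `clGetPlatformIDs` means no OpenCL platform was found at all, which is the detection problem tracked in #10785.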

@esand

esand commented Aug 5, 2024

> @Fahmula and myself experienced a different one. The one is related to J4105/J4125 specifically, and relevant logs are clearly different than the one you posted

My apologies - it seems you are indeed correct. I think what suckered me in to posting on this thread was that my error was almost identical to OP and I thought they were the same thing initially.
