Problem
GitNexus embeddings run on CPU by default on Linux systems with Intel integrated/discrete GPUs (UHD, Arc), because the current device selection only covers CUDA (NVIDIA) on Linux. The relevant discussion: "[Idea] is there any way i can use my main gpu for embeddings instead of integrated Intel GPU?" (#93)
Current State
Transformers.js (@huggingface/transformers v4.1) wraps ONNX Runtime and exposes these device strings: cuda, dml, cpu, wasm, webgpu, coreml. OpenVINO is not exposed by transformers.js — it cannot be selected via the device parameter passed to pipeline().
The current device selection lives in devices.ts (per PR #1385, which is still open):
darwin → ['webgpu', 'coreml', 'cpu']
win32 → ['dml', 'cpu']
linux → ['cuda', 'cpu'] // ← no Intel GPU path
ONNX Runtime itself does support Intel GPUs via its OpenVINO Execution Provider on Linux. The provider works with Intel integrated GPUs (UHD) and discrete GPUs (Arc) via OpenCL/Level Zero.
Options
Option A: Add an OpenVINO probe to devices.ts (requires upstream transformers.js support)
Add a probe function isOpenVinoAvailable() to devices.ts that:
- checks for the OpenVINO EP binary in onnxruntime-node
- checks for the Intel GPU runtime (OpenCL/Level Zero)
Then update the Linux device ordering to ['cuda', 'openvino', 'cpu'] or ['openvino', 'cuda', 'cpu'].
Problem: transformers.js does not expose openvino as a device string. This would require a transformers.js PR first.
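A minimal sketch of what the probe and the revised Linux ordering could look like. Everything here is hypothetical: the library file name `libonnxruntime_providers_openvino.so`, the OpenCL ICD and Level Zero paths, and the `ortBinDir` parameter are assumptions to be checked against the actual onnxruntime-node package layout and the target distro. The file-existence check is injectable so the logic can be tested without an Intel GPU.

```typescript
import { existsSync } from "node:fs";
import { posix } from "node:path";

// HYPOTHETICAL: shared library name of the OpenVINO EP in an ONNX Runtime Linux build.
const OPENVINO_EP_LIB = "libonnxruntime_providers_openvino.so";

// HYPOTHETICAL: places where an Intel GPU runtime typically leaves a trace (distro-dependent).
const INTEL_GPU_RUNTIME_PATHS = [
  "/etc/OpenCL/vendors/intel.icd", // OpenCL ICD registered by Intel's compute runtime
  "/usr/lib/x86_64-linux-gnu/libze_intel_gpu.so.1", // Level Zero GPU driver (Debian/Ubuntu layout)
];

export type FileExists = (path: string) => boolean;

// True only if the OpenVINO EP library sits in the onnxruntime-node binary
// directory AND at least one Intel GPU runtime artifact is present.
export function isOpenVinoAvailable(
  ortBinDir: string,
  fileExists: FileExists = existsSync,
): boolean {
  const hasEp = fileExists(posix.join(ortBinDir, OPENVINO_EP_LIB));
  const hasGpuRuntime = INTEL_GPU_RUNTIME_PATHS.some((p) => fileExists(p));
  return hasEp && hasGpuRuntime;
}

// Linux candidate list with the probe folded in. Note that transformers.js
// does not accept 'openvino' as a device string today.
export function linuxDeviceOrder(openVinoAvailable: boolean, preferOpenVino = false): string[] {
  if (!openVinoAvailable) return ["cuda", "cpu"];
  return preferOpenVino ? ["openvino", "cuda", "cpu"] : ["cuda", "openvino", "cpu"];
}
```

Keeping CUDA first by default means NVIDIA users see no behavior change; `preferOpenVino` covers machines where a discrete Arc card should win.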
Option B: Fork transformers.js to add openvino device support
Fork @huggingface/transformers, add 'openvino' to the device string mapping, configure ONNX Runtime SessionOptions with OpenVINOExecutionProvider, and point GitNexus at the fork.
Effort: Medium-high. Requires maintaining a fork indefinitely.
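The core of such a fork is a table from device strings to ONNX Runtime execution-provider configs. The sketch below is hypothetical and does not reproduce transformers.js internals: the `EpConfig` shape, the provider name `"openvino"`, and its `device_type: "GPU"` option are assumptions about what a custom onnxruntime-node build with the OpenVINO EP would accept.

```typescript
// HYPOTHETICAL config shape: a provider name plus optional provider options.
type EpConfig = { name: string; options?: Record<string, string> };

// A Map (rather than a plain object) avoids matching inherited keys like "toString".
const DEVICE_TO_EPS = new Map<string, EpConfig[]>([
  ["cpu", [{ name: "cpu" }]],
  ["cuda", [{ name: "cuda" }, { name: "cpu" }]],
  ["dml", [{ name: "dml" }, { name: "cpu" }]],
  // New entry the fork would add: OpenVINO targeting the Intel GPU, CPU last.
  ["openvino", [{ name: "openvino", options: { device_type: "GPU" } }, { name: "cpu" }]],
]);

export function executionProvidersFor(device: string): EpConfig[] {
  const eps = DEVICE_TO_EPS.get(device);
  if (!eps) {
    throw new Error(`Unsupported device "${device}"`);
  }
  return eps;
}
```

Listing the CPU provider last in each entry lets ONNX Runtime place any operators the preferred provider cannot run on the CPU instead of failing session creation.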
Option C: Replace transformers.js with ONNX Runtime directly
Bypass transformers.js for inference entirely and call the onnxruntime-node API directly. This gives full control over execution providers, including OpenVINO on Linux for Intel GPUs.
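Without transformers.js, the caller has to build the model's input tensors itself. A minimal sketch, assuming a BERT-style ONNX export whose inputs are named `input_ids`, `attention_mask` and `token_type_ids` (verify against the session's `inputNames`) and a pad token id of 0; the function and type names are hypothetical. Token ids are assumed to come from a separate tokenizer.

```typescript
// Flat row-major int64 buffer plus its [batch, seqLen] shape, e.g. for
// wrapping as `new ort.Tensor("int64", feed.data, feed.dims)`.
export interface Int64Feed {
  data: BigInt64Array;
  dims: [number, number];
}

export interface BertFeeds {
  input_ids: Int64Feed;
  attention_mask: Int64Feed;
  token_type_ids: Int64Feed;
}

// Right-pads every sequence to the longest one in the batch.
export function buildBertFeeds(batch: number[][], padId = 0): BertFeeds {
  const rows = batch.length;
  const seqLen = Math.max(0, ...batch.map((ids) => ids.length));
  const inputIds = new BigInt64Array(rows * seqLen).fill(BigInt(padId));
  const attentionMask = new BigInt64Array(rows * seqLen); // 0 marks padding
  const tokenTypeIds = new BigInt64Array(rows * seqLen); // single segment: all zeros
  batch.forEach((ids, r) => {
    ids.forEach((id, c) => {
      inputIds[r * seqLen + c] = BigInt(id);
      attentionMask[r * seqLen + c] = BigInt(1);
    });
  });
  const dims: [number, number] = [rows, seqLen];
  return {
    input_ids: { data: inputIds, dims },
    attention_mask: { data: attentionMask, dims },
    token_type_ids: { data: tokenTypeIds, dims },
  };
}
```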
Tradeoffs:
❌ Must handle tokenization manually (currently done by transformers.js's built-in tokenizers)
❌ Must manage ONNX model download/cache yourself
✅ Full control over ONNX Runtime execution providers, so Intel GPU via OpenVINO becomes possible, provided the onnxruntime-node build includes the OpenVINO EP
✅ Works with the existing ONNX model (snowflake-arctic-embed-xs)
✅ Eliminates transformers.js as a dependency layer
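Post-processing also moves into GitNexus. A minimal sketch, assuming the model output is the last hidden state of shape [batch, seqLen, hidden] and that the embedding is the L2-normalized first ([CLS]) token vector; this pooling choice is an assumption to check against the model's pooling config. The function name is hypothetical.

```typescript
// lastHidden: flat row-major output of shape [batch, seqLen, hidden].
export function clsPoolNormalize(
  lastHidden: Float32Array,
  batch: number,
  seqLen: number,
  hidden: number,
): Float32Array[] {
  if (lastHidden.length !== batch * seqLen * hidden) {
    throw new Error("lastHidden size does not match [batch, seqLen, hidden]");
  }
  const out: Float32Array[] = [];
  for (let b = 0; b < batch; b++) {
    const start = b * seqLen * hidden; // token 0 of sequence b
    const vec = lastHidden.slice(start, start + hidden); // copy, input stays untouched
    let sumSq = 0;
    for (let i = 0; i < hidden; i++) sumSq += vec[i] * vec[i];
    const norm = Math.sqrt(sumSq) || 1; // leave an all-zero vector as zeros
    for (let i = 0; i < hidden; i++) vec[i] /= norm;
    out.push(vec);
  }
  return out;
}
```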
Option D: HTTP embedding mode (works today, no code changes)
GitNexus already has a full HTTP embedding mode via env vars:
GITNEXUS_EMBEDDING_URL=http://localhost:11434/v1 # Ollama, or any OpenAI-compatible API
GITNEXUS_EMBEDDING_MODEL=snowflake/snowflake-arctic-embed-xs
GITNEXUS_EMBEDDING_API_KEY=unused
Run Ollama with an OpenVINO-backed model on the same machine or a GPU server. Zero GitNexus code changes.
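For reference, a request built from these variables against an OpenAI-compatible server could look like the sketch below. It is not GitNexus's implementation: the helper and interface names are hypothetical, and it assumes the standard OpenAI embeddings shape (POST `{base}/embeddings` with `model` and `input`, response `data[].embedding` with an `index`).

```typescript
// Mirrors the env vars listed above.
export interface EmbeddingEnv {
  GITNEXUS_EMBEDDING_URL?: string;
  GITNEXUS_EMBEDDING_MODEL?: string;
  GITNEXUS_EMBEDDING_API_KEY?: string;
}

export interface EmbeddingRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

export function buildEmbeddingRequest(env: EmbeddingEnv, inputs: string[]): EmbeddingRequest {
  const base = env.GITNEXUS_EMBEDDING_URL;
  const model = env.GITNEXUS_EMBEDDING_MODEL;
  if (!base || !model) {
    throw new Error("GITNEXUS_EMBEDDING_URL and GITNEXUS_EMBEDDING_MODEL must be set");
  }
  const headers: Record<string, string> = { "Content-Type": "application/json" };
  if (env.GITNEXUS_EMBEDDING_API_KEY) {
    headers["Authorization"] = `Bearer ${env.GITNEXUS_EMBEDDING_API_KEY}`;
  }
  return {
    url: `${base.replace(/\/+$/, "")}/embeddings`, // OpenAI-style path under the /v1 base
    headers,
    body: JSON.stringify({ model, input: inputs }),
  };
}

// OpenAI-style response: { data: [{ index, embedding }, ...] }; order by index.
export function parseEmbeddingResponse(json: {
  data: { index: number; embedding: number[] }[];
}): number[][] {
  return [...json.data].sort((a, b) => a.index - b.index).map((d) => d.embedding);
}
```

Usage would be along the lines of `const req = buildEmbeddingRequest(process.env, chunks)` followed by `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })` and `parseEmbeddingResponse(await res.json())`. The GPU choice then lives entirely in the server process.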
Questions for the community
Is anyone already running embeddings via the HTTP mode with Ollama/OpenVINO?
Would Option C (direct ONNX Runtime) be worth the added complexity, or is HTTP mode sufficient?
Is there interest in co-maintaining a transformers.js fork with OpenVINO support (Option B)?
… webgpu/coreml path meet your needs?
References
… devices.ts is the right place to add platform-specific GPU probes.
Context: This is a fork-specific discussion to align on approach before opening a feature PR. The upstream repo is microsoft/onnxruntime.