Intel GPU support for embeddings via OpenVINO #1953

enihcam · 2026-05-31T15:28:04Z

Problem

GitNexus embeddings run on CPU by default on Linux systems with Intel integrated/discrete GPUs (UHD, Arc), because the current device selection only covers CUDA (NVIDIA) on Linux. The relevant discussion: "[Idea] is there any way i can use my main gpu for embeddings instead of integrated Intel GPU?" #93

Current State

Transformers.js (@huggingface/transformers v4.1) wraps ONNX Runtime and exposes these device strings: cuda, dml, cpu, wasm, webgpu, coreml. OpenVINO is not exposed by transformers.js — it cannot be selected via the device parameter passed to pipeline().

The current device selection lives in devices.ts (per PR #1385, which is still open):

darwin  → ['webgpu', 'coreml', 'cpu']
win32   → ['dml', 'cpu']
linux   → ['cuda', 'cpu']  // ← no Intel GPU path
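
To illustrate how a per-platform list like this is typically consumed, here is a minimal sketch that tries each device in order and moves on to the next one when loading fails. The names `DEVICE_ORDER`, `devicesFor`, and `loadWithFallback`, and the injected `load` callback, are hypothetical and not taken from devices.ts.

```javascript
// Hypothetical sketch: the table mirrors the mapping above; all names are illustrative.
const DEVICE_ORDER = {
  darwin: ['webgpu', 'coreml', 'cpu'],
  win32: ['dml', 'cpu'],
  linux: ['cuda', 'cpu'],
};

// Returns the ordered device candidates for a platform, defaulting to CPU only.
function devicesFor(platform) {
  return DEVICE_ORDER[platform] ?? ['cpu'];
}

// Tries each device in order. `load(device)` is an injected async loader that is
// assumed to throw when the device is unavailable.
async function loadWithFallback(platform, load) {
  const errors = [];
  for (const device of devicesFor(platform)) {
    try {
      const model = await load(device);
      return { device, model };
    } catch (err) {
      errors.push(`${device}: ${err.message}`);
    }
  }
  throw new Error(`No usable device (${errors.join('; ')})`);
}
```

With transformers.js, `load` might be something like `(device) => pipeline('feature-extraction', modelId, { device })`, where `modelId` is whatever model GitNexus is configured to use.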

ONNX Runtime itself does support Intel GPUs via its OpenVINO Execution Provider on Linux. The provider works with Intel integrated GPUs (UHD) and discrete GPUs (Arc) via OpenCL/Level Zero.

Options

Option A: Add OpenVINO probe to devices.ts (upstream transformers.js)

Add a probe function isOpenVinoAvailable() to devices.ts that:

Checks for the OpenVINO EP binary in onnxruntime-node
Checks for Intel GPU runtime (OpenCL/Level Zero)
Updates Linux device ordering: ['cuda', 'openvino', 'cpu'] or ['openvino', 'cuda', 'cpu']

Problem: transformers.js does not expose openvino as a device string. This would require a transformers.js PR first.
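
The probe described above could be sketched roughly as follows. Several things here are assumptions rather than confirmed details: the shared-library file name `libonnxruntime_providers_openvino.so`, the idea of searching the onnxruntime-node package directory recursively, and the use of an Intel DRM device in sysfs as a proxy for "Intel GPU present". The sketch does not verify that the OpenCL or Level Zero runtime is installed, and apart from `isOpenVinoAvailable` all function names are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed file name of the OpenVINO EP shared library on Linux.
const OPENVINO_EP_LIB = 'libonnxruntime_providers_openvino.so';
const INTEL_VENDOR_ID = '0x8086'; // PCI vendor ID for Intel

// True if any file under `dir` (searched recursively) is the OpenVINO EP library.
function hasOpenVinoProvider(dir) {
  let entries;
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return false; // directory missing or unreadable
  }
  return entries.some((e) =>
    e.isDirectory()
      ? hasOpenVinoProvider(path.join(dir, e.name))
      : e.name === OPENVINO_EP_LIB
  );
}

// True if an entry under the DRM sysfs root reports the Intel vendor ID.
function hasIntelGpu(drmRoot = '/sys/class/drm') {
  let cards;
  try {
    cards = fs.readdirSync(drmRoot);
  } catch {
    return false;
  }
  return cards.some((card) => {
    try {
      const vendor = fs.readFileSync(path.join(drmRoot, card, 'device', 'vendor'), 'utf8');
      return vendor.trim().toLowerCase() === INTEL_VENDOR_ID;
    } catch {
      return false; // entry without a vendor file
    }
  });
}

// Combined probe: Linux only, EP library present, Intel GPU present.
function isOpenVinoAvailable(ortPackageDir, drmRoot) {
  return process.platform === 'linux' && hasOpenVinoProvider(ortPackageDir) && hasIntelGpu(drmRoot);
}

// Linux ordering with the probe applied (the first of the two orderings proposed above).
function linuxDevices(openvinoAvailable) {
  return openvinoAvailable ? ['cuda', 'openvino', 'cpu'] : ['cuda', 'cpu'];
}
```

Even with a probe like this in place, the `openvino` entry only helps once the inference layer accepts it as a device.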

Option B: Fork transformers.js to add `openvino` device support

Fork @huggingface/transformers, add 'openvino' to the device string mapping, configure ONNX Runtime SessionOptions with OpenVINOExecutionProvider, and point GitNexus at the fork.

Effort: Medium-high. Requires maintaining a fork indefinitely.

Option C: Replace transformers.js with ONNX Runtime directly

Bypass transformers.js for inference entirely. Call onnxruntime-node API directly:

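const ort = require('onnxruntime-node');
// Sessions are created with the async create() factory, not a constructor.
// The Node binding takes short provider names under `executionProviders`.
// Assumption: 'openvino' needs an onnxruntime-node build with the OpenVINO EP.
const createSession = (modelPath) =>
  ort.InferenceSession.create(modelPath, {
    executionProviders: ['openvino', 'cuda', 'cpu'],
  });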

This gives full control over execution providers, including OpenVINO on Linux for Intel GPUs.

Tradeoffs:

❌ Must handle tokenization manually (currently done by transformers.js's JavaScript tokenizers)
❌ Must manage ONNX model download/cache yourself
✅ Full control over ONNX Runtime execution providers; Intel GPU via OpenVINO becomes selectable, provided the onnxruntime-node build includes the OpenVINO EP
✅ Works with the existing ONNX model (snowflake-arctic-embed-xs)
✅ Eliminates transformers.js as a dependency layer
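
To give a sense of the manual work involved, the post-processing that transformers.js otherwise performs might look like the sketch below: take the first-token (CLS) vector of each input from the model's last hidden state and L2-normalize it. This assumes the model expects CLS pooling with normalization, which should be checked against the model card, and that the hidden state arrives as a flat Float32Array with dims [batch, seqLen, hidden]. The function name is hypothetical.

```javascript
// Hypothetical helper: CLS pooling + L2 normalization over a flat hidden-state buffer.
// `data` is a Float32Array holding batch * seqLen * hidden values in row-major order.
function clsPoolAndNormalize(data, [batch, seqLen, hidden]) {
  if (data.length !== batch * seqLen * hidden) {
    throw new Error('data length does not match dims');
  }
  const out = [];
  for (let b = 0; b < batch; b++) {
    const start = b * seqLen * hidden; // offset of the first token of sequence b
    const vec = Float32Array.from(data.subarray(start, start + hidden));
    let norm = 0;
    for (const v of vec) norm += v * v;
    norm = Math.sqrt(norm) || 1; // avoid dividing by zero for an all-zero vector
    for (let i = 0; i < hidden; i++) vec[i] /= norm;
    out.push(vec);
  }
  return out;
}
```

With onnxruntime-node, the two arguments would come from an output tensor's `data` and `dims` properties; the output name to read depends on how the model was exported.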

Option D: HTTP embedding mode (works today, no code changes)

GitNexus already has a full HTTP embedding mode via env vars:

GITNEXUS_EMBEDDING_URL=http://localhost:11434/v1   # Ollama, or any OpenAI-compatible API
GITNEXUS_EMBEDDING_MODEL=snowflake/snowflake-arctic-embed-xs
GITNEXUS_EMBEDDING_API_KEY=unused

Run Ollama with an OpenVINO-backed model on the same machine or a GPU server. Zero GitNexus code changes.
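
Before pointing GitNexus at a server, it can help to confirm that the endpoint answers in the OpenAI embeddings format. The sketch below is not GitNexus's own client code; it only reuses the env var names above and assumes the server exposes `POST <base>/embeddings` with a `{ model, input }` body and returns `data[i].embedding`. The function names are hypothetical.

```javascript
// Builds a request for an OpenAI-compatible /embeddings endpoint from the env vars above.
function buildEmbeddingRequest(env, texts) {
  const baseUrl = (env.GITNEXUS_EMBEDDING_URL || '').replace(/\/+$/, '');
  if (!baseUrl) throw new Error('GITNEXUS_EMBEDDING_URL is not set');
  return {
    url: `${baseUrl}/embeddings`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${env.GITNEXUS_EMBEDDING_API_KEY || 'unused'}`,
      },
      body: JSON.stringify({ model: env.GITNEXUS_EMBEDDING_MODEL, input: texts }),
    },
  };
}

// Sends the request and returns one vector per input text.
// `fetchImpl` defaults to the global fetch (available in Node 18+).
async function embed(texts, env = process.env, fetchImpl = fetch) {
  const { url, init } = buildEmbeddingRequest(env, texts);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`Embedding request failed: HTTP ${res.status}`);
  const json = await res.json();
  return json.data.map((item) => item.embedding);
}
```

Calling `embed(['hello world'])` with the three variables exported should return a single vector if the server and model name are set up correctly.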

Questions for the community

Is anyone already running embeddings via the HTTP mode with Ollama/OpenVINO?
Would Option C (direct ONNX Runtime) be worth the added complexity, or is HTTP mode sufficient?
Is there interest in co-maintaining a transformers.js fork with OpenVINO support (Option B)?
For macOS users: does the webgpu/coreml path in PR #1385 (feat(embeddings) Support macOS accelerated device fallback) meet your needs?

References

ONNX Runtime OpenVINO EP docs: https://cassiebreviu.github.io/onnxruntime/docs/execution-providers/OpenVINO-ExecutionProvider.html
Transformers.js device mapping: has no openvino entry (upstream limitation)
PR #1385, feat(embeddings) Support macOS accelerated device fallback (open): macOS WebGPU/CoreML device selection refactor; devices.ts is the right place to add platform-specific GPU probes
Issue #695, "Use ollama for embeddings": HTTP mode is the answer there
Issue #93, "[Idea] is there any way i can use my main gpu for embeddings instead of integrated Intel GPU?": Intel integrated GPU question

Context: This is a fork-specific discussion to align on approach before opening a feature PR. The upstream repo is abhigyanpatwari/GitNexus.
