I'm searching my way through local ai #102

caplam · 2026-05-07T09:33:48Z

Hello,
I'm very new to AI and I'm struggling to choose the components for a local AI stack.
For now I don't have a dGPU.
My hardware is a Minisforum MS02 with an Intel Core Ultra 285HX and 256 GB of RAM in total, of which 32 GB is assigned to the iGPU (32 GB is the maximum allowed for the iGPU).
The PC runs Proxmox, and for now all the AI software runs in Docker containers (the Docker host is an LXC container).
The drivers are installed and the devices (/dev/dri and /dev/accel) are usable inside the Docker containers.

OpenArc seems very promising for my use case (Intel hardware), as I plan to add an Intel dGPU later if I succeed in building a useful local AI stack.

But for now I know very little about inference, agents, LLMs, and so on.
I have built OpenArc and Open WebUI and run them locally. I also have an ollama-intel container, but that project doesn't seem to have had recent updates and is not compatible with recent models.
I have read this repo, but I still can't figure out how to start.
From a beginner's perspective it's still hard to understand.
For example, once the container is running, the first thing to do is to add a model.
I mapped a folder on my host to /models in the container. But how do I put a model there? What should the /models folder structure be, and which files are needed?
Is this structure right?
/models/model_name1 with files model.xml model.bin
/models/model_name2 with files model.xml model.bin
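
One way to sanity-check a layout like this before registering anything is to confirm that every .xml file in a model folder has a .bin file next to it. The sketch below assumes an OpenVINO IR model is stored as an .xml/.bin pair sharing the same stem; the function name `check_ir_dir` is hypothetical, and the exact file names a given model repo uses may differ.

```python
from pathlib import Path


def check_ir_dir(model_dir):
    """Report .xml files in model_dir that have no sibling .bin file.

    Assumption: an OpenVINO IR model is an .xml/.bin pair with the same stem.
    """
    root = Path(model_dir)
    xml_files = sorted(root.glob("*.xml"))
    missing = [x.name for x in xml_files if not x.with_suffix(".bin").is_file()]
    return {"xml_files": len(xml_files), "missing_bin": missing}
```

A folder with no .xml files at all reports `xml_files: 0`, which is also a hint that the path points at the wrong directory level.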

How can I use OpenArc with agents?
Most of the available content is about Ollama, which has features that OpenArc does not, such as downloading models.

caplam · 2026-05-07T14:12:16Z

I found out how to download models, but I ran into some trouble.
On my Docker host, in the models directory, I cloned a model repo (Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov)
and then added the model with:
openarc add --model-name Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov --model-path /models --engine openvino --model-type llm --device GPU
and loaded it with:
openarc load Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov
I got this error:
loading Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov ...working error: 500 Response: {"detail":"Failed to load model: Model loading failed: Combination 'openvino/llm' not supported. Available: ovgenai/llm, ovgenai/vlm, ovgenai/whisper, openvino/qwen3_asr, openvino/kokoro, openvino/qwen3_tts_custom_voice, openvino/qwen3_tts_voice_design, openvino/qwen3_tts_voice_clone, optimum/emb, optimum/rerank"}

So I removed the model:
`openarc list --remove Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov

Usage: openarc list [OPTIONS] [MODEL_NAME]

Try 'openarc list --help' for help
╭─ Error ─────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ No such option: --remove Did you mean --rm? │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

root@d3a3cad88907:/app# openarc list --rm Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov
Model configuration removed: Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov`

Note that the documentation gives this command as openarc list --remove.

Then I added the model back with a different engine:
openarc add --model-name Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov --model-path /models --engine ovgenai --model-type llm --device GPU
Model configuration saved: Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov

and loaded it:
`openarc load Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov
loading Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov
...working
error: 400
Response: {"detail":"model_name 'Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov' already registered"}

────────────────────────────────────────────────────────────
All models failed to load! (0/1)
✗ Failed: Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov`

openarc status
Getting model status...
Loaded Models (1)
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ model_name ┃ device ┃ model_type ┃ engine ┃ status ┃ time_loaded ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ Echo9Zulu/Qwen3-8B-… │ GPU │ llm │ openvino │ failed │ 2026-05-07T13:45:08… │
└──────────────────────┴────────────┴─────────────────┴────────────┴────────────┴──────────────────────┘

How do you unload a model?

Edit: found it. It's trivial (but not in the documentation): replace load with unload :)
I tried again to register and load this model, but without success.
`openarc load Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov
loading Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov
...working
error: 500
Response: {"detail":"Failed to load model: Model loading failed: Exception from
/home/jenkins/agent/workspace/private-ci/ie/build-linux-manylinux_2_28/b/repos/openvino.genai/src/cpp/src/utils
.cpp:455:\nCould not find a model in the directory '"/models"'\n"}

────────────────────────────────────────────────────────────
All models failed to load! (0/1)
✗ Failed: Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov
Use 'openarc status' to see loaded models.
root@d3a3cad88907:/app# openarc status
Getting model status...
Loaded Models (1)
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┓
┃ model_name ┃ device ┃ model_type ┃ engine ┃ status ┃ time_loaded ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━┩
│ Echo9Zulu/Qwen3-8B-… │ GPU │ llm │ ovgenai │ failed │ 2026-05-07T14:05:17… │
└──────────────────────┴────────────┴─────────────────┴────────────┴────────────┴──────────────────────┘

Total models loaded: 1`

I guess ovgenai is not the correct engine.

5 replies

SearchSavior May 7, 2026
Maintainer

Hey there @caplam, thanks for your feedback so far. To help you out:

The model name is like an alias; I'm not sure it needs a slash. The model path must be an absolute path to the directory where your model weights live, i.e. /models is wrong. Try openarc add --help; for an LLM, choose ovgenai as the engine, llm as the model type, and either CPU or GPU.0 as the device. Download the weights with the Hugging Face Hub CLI tool, like this:

hf download Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov --local-dir Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov

Then set the model path to the absolute path of wherever --local-dir points.
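
The advice above can be sketched as a small helper that turns a download directory into an absolute path and assembles the matching `openarc add` arguments. This is only an illustration: the flags are the ones used earlier in this thread, while the helper name `build_add_command` and the alias and directory in the usage note are made up.

```python
from pathlib import Path


def build_add_command(alias, weights_dir, device="GPU.0"):
    """Assemble `openarc add` arguments for an LLM served by the ovgenai engine.

    weights_dir may be relative; it is resolved to an absolute path because
    the model path has to point at the directory holding the weights.
    """
    model_path = Path(weights_dir).expanduser().resolve()
    return [
        "openarc", "add",
        "--model-name", alias,
        "--model-path", str(model_path),
        "--engine", "ovgenai",
        "--model-type", "llm",
        "--device", device,
    ]
```

For example, `build_add_command("qwen3-8b", "Echo9Zulu/Qwen3-8B-ShiningValiant3-int4-asym-ov")` returns a list that can be passed to `subprocess.run` from the directory where the download was started.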

caplam May 8, 2026
Author

Thank you for your answer.
I've just realized that hf stands for huggingface-cli.
I tried the command you gave and got "hf: command unknown" 🫨

SearchSavior May 8, 2026
Maintainer

@caplam Run it from inside the container after the Python environment is built. The huggingface_hub package is installed as a dependency of transformers now, so it should work.

caplam May 9, 2026
Author

I meant that I tried "hf download ....." from inside the container and it didn't work (unknown command),
but "huggingface-cli download ...." worked. Is it an alias I have to set up?

SearchSavior May 9, 2026
Maintainer

@caplam huggingface-hub might need an update. huggingface-cli was the old way. Sometimes transformers and its dependencies can be finicky. I think some of these issues come from how the nightly wheels for optimum pin transformers versions.
uv pip install huggingface-hub -U should resolve to 1.14.0

caplam · 2026-05-07T14:43:08Z

Step by step I'm making progress, but I'm not there yet.

I tried another model and changed the parameters of the add command (the model path was not correct):
`openarc load DeepSeek-R1-0528-Qwen3-8B-int8_asym-ov
loading DeepSeek-R1-0528-Qwen3-8B-int8_asym-ov
...working
error: 500
Response: {"detail":"Failed to load model: Model loading failed: Exception from
src/inference/src/cpp/core.cpp:84:\nCheck 'm_weights->size() >= offset + size' failed at
src/core/xml_util/src/xml_deserialize_util.cpp:895:\nIncorrect weights in bin file!\n\n"}

────────────────────────────────────────────────────────────
All models failed to load! (0/1)
✗ Failed: DeepSeek-R1-0528-Qwen3-8B-int8_asym-ov`
Looking at the model folder, it appears that after the clone command (copied from the repo) the model.bin file is not correct: it is only 135 bytes.
In fact, none of the files marked with the "xet" tag in the repo were downloaded. Do we have to download them manually?
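
A 135-byte model.bin is about the size of a Git LFS pointer file, the small text stub that a plain clone can leave in place of a large file when the real content is not fetched. Below is a minimal sketch for spotting such stubs, assuming they begin with the standard pointer header; `find_lfs_pointers` and the 1024-byte cutoff are illustrative choices, not part of OpenArc.

```python
from pathlib import Path

# First line of a Git LFS pointer file.
LFS_HEADER = b"version https://git-lfs.github.com/spec/v1"


def find_lfs_pointers(model_dir, max_size=1024):
    """Return files under model_dir that look like LFS pointers, not real data."""
    pointers = []
    for path in sorted(Path(model_dir).rglob("*")):
        # Pointer stubs are tiny, so larger files are skipped without reading them.
        if path.is_file() and path.stat().st_size <= max_size:
            with path.open("rb") as fh:
                if fh.read(len(LFS_HEADER)) == LFS_HEADER:
                    pointers.append(path)
    return pointers
```

If this returns any paths, those files still need their real content, for example by downloading the repo with the Hugging Face CLI instead of a plain clone.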

0 replies

caplam · 2026-05-08T07:37:48Z

I finally got it working :)
For now only in a chat window with Open WebUI.
Only 8 W on the GPU.
I now have to figure out how to add capabilities (agents, MCP servers, permanent instructions, ...), evaluate models for different use cases, and so on.

0 replies

caplam · 2026-05-08T07:46:29Z

And here is the result of the first benchmark with the default config (using the GPU):
`openarc bench Qwen3-8B-ShiningValiant3-int4-asym-ov
working...

depth (prior): 0
input tokens: [512]
max tokens: [128]
runs: 5

benching... (5/5)

Qwen3-8B-ShiningValiant3-int4-asym-ov

┏━━━━━┳━━━┳━━━━━┳━━━━━┳━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━┓
┃ run ┃ d ┃ p ┃ n ┃ ttft(s) ┃ tpot(ms) ┃ prefill(t/s) ┃ decode(t/s) ┃ duration(s) ┃
┡━━━━━╇━━━╇━━━━━╇━━━━━╇━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━┩
│ 1 │ 0 │ 512 │ 128 │ 2.50 │ 83.42 │ 205.0 │ 12.0 │ 13.09 │
│ 2 │ 0 │ 512 │ 128 │ 2.49 │ -1.00 │ 206.0 │ -1000.0 │ 2.49 │
│ 3 │ 0 │ 512 │ 128 │ 2.48 │ 82.83 │ 206.6 │ 12.1 │ 13.00 │
│ 4 │ 0 │ 512 │ 128 │ 2.49 │ 83.35 │ 205.9 │ 12.0 │ 13.07 │
│ 5 │ 0 │ 512 │ 128 │ 2.48 │ 82.56 │ 206.7 │ 12.1 │ 12.96 │
└─────┴───┴─────┴─────┴─────────┴──────────┴──────────────┴─────────────┴─────────────┘`

Does this look correct to you?
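
One way to read the table is to average only the runs with positive timings and to cross-check decode speed against tpot, since tokens per second is 1000 divided by milliseconds per token. The sketch below does that with the numbers copied from the table. Treating the -1.00 / -1000.0 values in run 2 as "no measurement" markers is an assumption, not something confirmed in this thread.

```python
from statistics import mean

# (ttft_s, tpot_ms, prefill_tps, decode_tps), copied from the bench table above.
RUNS = [
    (2.50, 83.42, 205.0, 12.0),
    (2.49, -1.00, 206.0, -1000.0),
    (2.48, 82.83, 206.6, 12.1),
    (2.49, 83.35, 205.9, 12.0),
    (2.48, 82.56, 206.7, 12.1),
]


def summarize(runs):
    """Average decode metrics over runs whose decode timings are positive."""
    # Assumption: negative tpot/decode values mark a run with no decode measurement.
    valid = [r for r in runs if r[1] > 0 and r[3] > 0]
    mean_tpot_ms = mean(r[1] for r in valid)
    return {
        "valid_runs": len(valid),
        "mean_tpot_ms": mean_tpot_ms,
        "mean_decode_tps": mean(r[3] for r in valid),
        # Cross-check: tokens per second derived from time per output token.
        "decode_tps_from_tpot": 1000.0 / mean_tpot_ms,
    }
```

With these figures the four valid runs agree with each other to within about one percent, and the reported decode rate matches the value derived from tpot.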

0 replies

caplam · 2026-05-08T08:20:12Z

If that's OK with you, I will continue to experiment and post my results here.
I tried to load a model optimized for the NPU, without success. I got:
`openarc load Qwen3-8B-int4-cw-ov
loading Qwen3-8B-int4-cw-ov
...working
error: 500
Response: {"detail":"Failed to load model: Model loading failed: Exception from
src/inference/src/cpp/core.cpp:117:\nException from src/inference/src/dev/plugin.cpp:54:\nException from
src/plugins/intel_npu/src/plugin/src/properties.cpp:908:\nUnsupported configuration key: NPU_MAX_TILES\n\n\n"}

────────────────────────────────────────────────────────────
All models failed to load! (0/1)
✗ Failed: Qwen3-8B-int4-cw-ov`

I guess it's a driver incompatibility, but I don't know how to confirm that.
The device properties are:
AVAILABLE_DEVICES: 3720
CACHE_DIR:
CACHE_ENCRYPTION_CALLBACKS: UNSUPPORTED TYPE
COMPILATION_NUM_THREADS: 24
DEVICE_ARCHITECTURE: 3720
DEVICE_GOPS: {<Type: 'bfloat16'>: 0.0, <Type: 'float16'>: 6553.60009765625, <Type: 'float32'>: 0.0, <Type: 'int8_t'>: 13107.2001953125, <Type: 'uint8_t'>: 13107.2001953125}
DEVICE_ID:
DEVICE_PCI_INFO: {domain: 0 bus: 0 device: 0xb function: 0}
DEVICE_TYPE: Type.INTEGRATED
DEVICE_UUID: 80d1d11eb73811eab3de0242ac130004
ENABLE_CPU_PINNING: False
EXECUTION_DEVICES: NPU
FULL_DEVICE_NAME: Intel(R) AI Boost
LOG_LEVEL: Level.ERR
MODEL_PRIORITY: Priority.MEDIUM
MODEL_PTR: None
NPU_BYPASS_UMD_CACHING: False
NPU_COMPILATION_MODE_PARAMS:
NPU_COMPILER_TYPE: CompilerType.DRIVER
NPU_COMPILER_VERSION: 0
NPU_DEFER_WEIGHTS_LOAD: False
NPU_DEVICE_ALLOC_MEM_SIZE: 0
NPU_DEVICE_TOTAL_MEM_SIZE: 235899301888
NPU_DISABLE_IDLE_MEMORY_PRUNING: False
NPU_DRIVER_VERSION: 1777742308
NPU_PLATFORM: AUTO_DETECT
NPU_RUN_INFERENCES_SEQUENTIALLY: False
NPU_TURBO: False
NUM_STREAMS: 1
OPTIMAL_NUMBER_OF_INFER_REQUESTS: 1
OPTIMIZATION_CAPABILITIES: FP16, INT8, EXPORT_IMPORT
PERFORMANCE_HINT: PerformanceMode.LATENCY
PERFORMANCE_HINT_NUM_REQUESTS: 1
PERF_COUNT: False
RANGE_FOR_ASYNC_INFER_REQUESTS: 1, 10, 1
RANGE_FOR_STREAMS: 1, 4
WEIGHTS_PATH:
WORKLOAD_TYPE: WorkloadType.DEFAULT
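
A quick way to compare the error message with this listing is to pull the property names out of the pasted text and check whether the rejected key is among them. The sketch below is a plain-text helper, assuming property names are written in upper case followed by a colon; `reported_keys` is a made-up name, and the result only shows that the key is absent from the dump, not why it is being passed to the NPU plugin.

```python
import re

# Property names look like NPU_TURBO or NUM_STREAMS, followed by a colon.
KEY_PATTERN = re.compile(r"\b([A-Z][A-Z0-9_]+):")


def reported_keys(dump):
    """Extract upper-case property names from a pasted device-properties dump."""
    return set(KEY_PATTERN.findall(dump))
```

Running `"NPU_MAX_TILES" in reported_keys(dump)` on the listing above gives `False`, which is consistent with the "Unsupported configuration key" error.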

0 replies
