Alpaca uses my CPU instead of my GPU (AMD) #139

Open

frandavid100 opened this issue Jul 10, 2024 · 108 comments
Labels: bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@frandavid100

I have noticed that Alpaca uses my CPU instead of my GPU. Here's a screenshot showing how it's using almost 40% of my CPU, and only 1% of my GPU.

Captura desde 2024-07-10 06-51-39

I'm using an AMD Radeon RX 6650 XT GPU, which is properly detected by the OS and used by other Flatpak apps like Steam. As you can see in this other screenshot:

Captura desde 2024-07-10 06-54-34

frandavid100 added the bug label Jul 10, 2024
@Jeffser (Owner) commented Jul 10, 2024

Hi, yes, this is a problem with ROCm and Flatpaks; I believe Blender has the same issue.

Whilst any Flatpak can detect and use the GPU, for some reason ROCm doesn't work out of the box. There must be a way, but I haven't figured it out, and it's a bit hard to test since I have an incompatible GPU.

For now I suggest you host an Ollama instance using Docker and connect it to Alpaca using the remote connection option.
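For reference, a minimal sketch of that workaround, taken from Ollama's Docker documentation (the image tag and port are upstream defaults; adjust as needed):

docker run -d --device /dev/kfd --device /dev/dri -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama:rocm

Then point Alpaca's remote connection at http://localhost:11434.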

@frandavid100 (Author)

There's no hurry; I use it sparingly and can afford to let it use the CPU for the time being.

Is there any way I can help test a possible fix? Is my GPU supposed to be compatible?

@loulou64490 (Contributor) commented Jul 10, 2024 via email

@Jeffser (Owner) commented Jul 11, 2024

Yeah, that word does exist. Though the problem isn't exactly the fact that it is inside a container; the problem is that ROCm doesn't work out of the box.

@olumolu (Contributor) commented Jul 23, 2024

I think ROCm needs to be loaded separately.
https://github.com/ollama/ollama/releases/download/v0.2.8/ollama-linux-amd64-rocm.tgz
This contains the ROCm driver. This is a real issue that needs to be fixed.

@Jeffser (Owner) commented Jul 24, 2024

Adding that as-is would make Alpaca four times heavier, and not everybody even needs ROCm. The real fix is for either the Freedesktop runtime or the GNOME runtime to include ROCm; that, or there's a better solution I don't know about yet, since I'm still new to Flatpak packaging.

@Jeffser (Owner) commented Jul 24, 2024

I might finally have a solution where the Flatpak accesses the ROCm libraries from the system itself.

@0chroma commented Jul 24, 2024

Adding that as-is would make Alpaca four times heavier, and not everybody even needs ROCm. The real fix is for either the Freedesktop runtime or the GNOME runtime to include ROCm; that, or there's a better solution I don't know about yet, since I'm still new to Flatpak packaging.

You could always package it as an extension in that case

@Jeffser (Owner) commented Jul 24, 2024

Yeah, the problem with that is that I would need to make a separate package for Flathub.

Jeffser changed the title from "Alpaca uses my CPU instead of my GPU" to "Alpaca uses my CPU instead of my GPU (AMD)" on Jul 30, 2024
@TacoCake commented Aug 4, 2024

Any progress on this? Anything you need help with in getting this done?

@Jeffser (Owner) commented Aug 4, 2024

Do you have ROCm installed on your system? I think I can make Ollama use the system installation.

@Jeffser (Owner) commented Aug 4, 2024

If someone has ROCm installed and wants to test this, run these commands:

flatpak override --filesystem=/opt/rocm com.jeffser.Alpaca
flatpak override --env=LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm/lib64:/app/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/sdk/llvm15/lib:/usr/lib/sdk/openjdk11/lib:/usr/lib/sdk/openjdk17/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib com.jeffser.Alpaca

The first command gives the Flatpak access to /opt/rocm; the second adds its lib directories to the library search path. The rest of the entries are just the default Flatpak library paths, so you can ignore those.
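If the overrides cause trouble, they can be cleared again; a one-liner, assuming you want to drop all per-app overrides for Alpaca:

flatpak override --reset com.jeffser.Alpaca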

Jeffser added the help wanted label Aug 4, 2024
@frandavid100 (Author)

How can I install ROCm on my Silverblue machine? I tried to run "rpm-ostree install rocm" but I get a "packages not found" error.

@Jeffser (Owner) commented Aug 5, 2024

@olumolu (Contributor) commented Aug 5, 2024

How can I install ROCm on my Silverblue machine? I tried to run "rpm-ostree install rocm" but I get a "packages not found" error.

Ask on https://discussion.fedoraproject.org/ for help; they actually help with this kind of thing.

@Jeffser (Owner) commented Aug 5, 2024

I was looking around at what Flatpaks include, and they have all the stuff needed to run an app with OpenCL (a Mesa alternative to ROCm, as far as I'm aware), but Ollama can't use it. My recommendation for now is to run Ollama separately from Alpaca and just connect to it as a remote connection.

@TacoCake commented Aug 8, 2024

Could you use Vulkan instead of trying to use ROCm? Kinda how GPT4All does it: https://github.com/nomic-ai/gpt4all

Genuine question

@TacoCake commented Aug 8, 2024

Do you have ROCm installed on your system? I think I can make Ollama use the system installation.

I don't have ROCm on my system, since it's kind of a headache to install on openSUSE Tumbleweed

@olumolu (Contributor) commented Aug 8, 2024

As far as I know, the Ollama backend uses ROCm rather than Vulkan, so this isn't easy to implement from the front end.

@Shished commented Aug 8, 2024

GPT4All uses the llama.cpp backend, while this app uses Ollama.

@olumolu (Contributor) commented Aug 8, 2024

Yes, I don't know much about llama.cpp, but Ollama uses ROCm. It really is an issue that ROCm is not installed on many systems.

@TacoCake commented Aug 8, 2024

As far as I know, the Ollama backend uses ROCm rather than Vulkan, so this isn't easy to implement from the front end.

GPT4All uses the llama.cpp backend, while this app uses Ollama.

Ahhh, I see, sorry for the confusion. If anyone wants to track Vulkan support in Ollama:

@olumolu (Contributor) commented Aug 8, 2024

Yes, if that gets merged I hope it will bring Vulkan to this one as well.
For the time being I don't think we can do much.

@Jeffser (Owner) commented Aug 8, 2024

Do you have ROCm installed on your system? I think I can make Ollama use the system installation.

I don't have ROCm on my system, since it's kind of a headache to install on openSUSE Tumbleweed.

I know, it's a headache everywhere, including in the Flatpak sandbox.

@francus11

flatpak override --filesystem=/opt/rocm com.jeffser.Alpaca
flatpak override --env=LD_LIBRARY_PATH=/opt/rocm/lib:/opt/rocm/lib64:/app/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/sdk/llvm15/lib:/usr/lib/sdk/openjdk11/lib:/usr/lib/sdk/openjdk17/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib com.jeffser.Alpaca

I installed ROCm on Fedora using this tutorial:
https://fedoraproject.org/wiki/SIGs/HC#Installation
Still, my GPU usage is 0%. Any other suggestions?
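For anyone else going that route, the linked wiki boils down to something like the following sketch (package names are taken from the Fedora HC SIG page and may have changed; on Silverblue, use rpm-ostree install with the same names):

sudo dnf install rocminfo rocm-hip rocm-opencl
rocminfo | grep -i gfx    # should print your GPU's gfx target, e.g. gfx1030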

@Jeffser (Owner) commented Aug 13, 2024

Today I learned that ROCm is actually bundled with the Ollama binary... So I have no idea what to try now lol

image

(third line)

@Jeffser (Owner) commented Aug 13, 2024

Ollama says that AMD users should try the proprietary driver, though:

https://github.com/ollama/ollama/blob/main/docs/linux.md#amd-radeon-gpu-support

@Jeffser (Owner) commented Sep 11, 2024

I tried with every specific device type that Flatpak supports; for some reason it only works when I use all.

Could be related to https://gitlab.com/freedesktop-sdk/freedesktop-sdk/-/issues/1535#note_1310656438

Nice find, I think that's exactly what we need to make this work. For now I'll use all so that it at least works; once that becomes available I'll change it.
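For context, that device permission can also be granted per-user from the command line; a hedged sketch (all is Flatpak's catch-all device class, since at the time of this thread dri alone did not expose /dev/kfd):

flatpak override --device=all com.jeffser.Alpaca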

@Jeffser (Owner) commented Sep 11, 2024

By the way, I just pushed an update to the extension that adds support for GFX1010 cards (mine is included hehe)

AFAIK it covers the RX 5600 XT and RX 5700 XT.

If you have one of those cards you'll need to set HSA_OVERRIDE_GFX_VERSION to 10.1.0.
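A hedged one-liner for setting that variable from the CLI, for anyone who prefers it over Alpaca's preferences dialog (add --user if Alpaca is installed per-user):

flatpak override --env=HSA_OVERRIDE_GFX_VERSION=10.1.0 com.jeffser.Alpaca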

@P-Jay357 commented Sep 11, 2024

This might be a basic/stupid question, but how do I update the extension? Do I have to do it manually via the terminal, or will it get picked up as a software update? I'm pretty new to Linux/Flatpaks

@Jeffser (Owner) commented Sep 11, 2024

This might be a basic/stupid question, but how do I update the extension? Do I have to do it manually via the terminal, or will it get picked up as a software update? I'm pretty new to Linux/Flatpaks

Don't worry, it's not a stupid question; there aren't a lot of extensions on Flathub anyway.

It should appear in the updates section of your software center; I believe this is the case with both GNOME Software and KDE Discover.

It sometimes takes a couple of minutes to get picked up by your Flatpak installation; if you want to force an update, use the flatpak update command.
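For the terminal route, something like this should be enough:

flatpak update                      # updates everything, extensions included
flatpak update com.jeffser.Alpaca   # or target just Alpaca (related extension refs ride along)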

@czhang03 commented Sep 11, 2024

It seems like Alpaca now runs fine on a dedicated GPU. However, due to Ollama limitations, it doesn't yet "run well" on an integrated GPU, as it does not request more VRAM (GTT memory) and simply falls back to the CPU.

For future readers using an AMD iGPU (APU), see the threads here: ollama/ollama#6282, ROCm/ROCm#2014, ollama/ollama#2637

@frandavid100 (Author)

By the way, I just pushed an update to the extension that adds support for GFX1010 cards (mine is included hehe)

AFAIK it covers the RX 5600 XT and RX 5700 XT.

Should it work with an RX 6650 XT card? Because it's still using my CPU instead.

@TheRsKing

RX 6700 XT also not working

@Jeffser (Owner) commented Sep 12, 2024

As far as I know those cards don't need an override; they should just be supported out of the box.

@TheRsKing

Maybe a user-wide Alpaca installation is the problem. I'll test again in the evening.

@Jeffser (Owner) commented Sep 12, 2024

No no, I just found out: they are not supported. Could you guys give me the output of rocminfo? I'll figure out what override you should use.

@frandavid100
@TheRsKing

@daniwhal

Could you guys give me the output of rocminfo? I'll figure out what override you should use.

I have a 6800 XT and am experiencing the same issue; I hope I can be useful:

ROCk module is loaded
=====================    
HSA System Attributes    
=====================    
Runtime Version:         1.1
Runtime Ext Version:     1.4
System Timestamp Freq.:  1000.000000MHz
Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)
Machine Model:           LARGE                              
System Endianness:       LITTLE                             
Mwaitx:                  DISABLED
DMAbuf Support:          YES

==========               
HSA Agents               
==========               
*******                  
Agent 1                  
*******                  
  Name:                    AMD Ryzen 7 5700X3D 8-Core Processor
  Uuid:                    CPU-XX                             
  Marketing Name:          AMD Ryzen 7 5700X3D 8-Core Processor
  Vendor Name:             CPU                                
  Feature:                 None specified                     
  Profile:                 FULL_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        0(0x0)                             
  Queue Min Size:          0(0x0)                             
  Queue Max Size:          0(0x0)                             
  Queue Type:              MULTI                              
  Node:                    0                                  
  Device Type:             CPU                                
  Cache Info:              
    L1:                      32768(0x8000) KB                   
  Chip ID:                 0(0x0)                             
  ASIC Revision:           0(0x0)                             
  Cacheline Size:          64(0x40)                           
  Max Clock Freq. (MHz):   3000                               
  BDFID:                   0                                  
  Internal Node ID:        0                                  
  Compute Unit:            16                                 
  SIMDs per CU:            0                                  
  Shader Engines:          0                                  
  Shader Arrs. per Eng.:   0                                  
  WatchPts on Addr. Ranges:1                                  
  Features:                None
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: FINE GRAINED        
      Size:                    32753640(0x1f3c7e8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED
      Size:                    32753640(0x1f3c7e8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
    Pool 3                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    32753640(0x1f3c7e8) KB             
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:4KB                                
      Alloc Alignment:         4KB                                
      Accessible by all:       TRUE                               
  ISA Info:                
*******                  
Agent 2                  
*******                  
  Name:                    gfx1030                            
  Uuid:                    GPU-8f89fe13d8ed7bda               
  Marketing Name:          AMD Radeon RX 6800 XT              
  Vendor Name:             AMD                                
  Feature:                 KERNEL_DISPATCH                    
  Profile:                 BASE_PROFILE                       
  Float Round Mode:        NEAR                               
  Max Queue Number:        128(0x80)                          
  Queue Min Size:          64(0x40)                           
  Queue Max Size:          131072(0x20000)                    
  Queue Type:              MULTI                              
  Node:                    1                                  
  Device Type:             GPU                                
  Cache Info:              
    L1:                      16(0x10) KB                        
    L2:                      4096(0x1000) KB                    
    L3:                      131072(0x20000) KB                 
  Chip ID:                 29631(0x73bf)                      
  ASIC Revision:           1(0x1)                             
  Cacheline Size:          128(0x80)                          
  Max Clock Freq. (MHz):   2575                               
  BDFID:                   2048                               
  Internal Node ID:        1                                  
  Compute Unit:            72                                 
  SIMDs per CU:            2                                  
  Shader Engines:          4                                  
  Shader Arrs. per Eng.:   2                                  
  WatchPts on Addr. Ranges:4                                  
  Coherent Host Access:    FALSE                              
  Features:                KERNEL_DISPATCH 
  Fast F16 Operation:      TRUE                               
  Wavefront Size:          32(0x20)                           
  Workgroup Max Size:      1024(0x400)                        
  Workgroup Max Size per Dimension:
    x                        1024(0x400)                        
    y                        1024(0x400)                        
    z                        1024(0x400)                        
  Max Waves Per CU:        32(0x20)                           
  Max Work-item Per CU:    1024(0x400)                        
  Grid Max Size:           4294967295(0xffffffff)             
  Grid Max Size per Dimension:
    x                        4294967295(0xffffffff)             
    y                        4294967295(0xffffffff)             
    z                        4294967295(0xffffffff)             
  Max fbarriers/Workgrp:   32                                 
  Packet Processor uCode:: 118                                
  SDMA engine uCode::      83                                 
  IOMMU Support::          None                               
  Pool Info:               
    Pool 1                   
      Segment:                 GLOBAL; FLAGS: COARSE GRAINED      
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 2                   
      Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED
      Size:                    16760832(0xffc000) KB              
      Allocatable:             TRUE                               
      Alloc Granule:           4KB                                
      Alloc Recommended Granule:2048KB                             
      Alloc Alignment:         4KB                                
      Accessible by all:       FALSE                              
    Pool 3                   
      Segment:                 GROUP                              
      Size:                    64(0x40) KB                        
      Allocatable:             FALSE                              
      Alloc Granule:           0KB                                
      Alloc Recommended Granule:0KB                                
      Alloc Alignment:         0KB                                
      Accessible by all:       FALSE                              
  ISA Info:                
    ISA 1                    
      Name:                    amdgcn-amd-amdhsa--gfx1030         
      Machine Models:          HSA_MACHINE_MODEL_LARGE            
      Profiles:                HSA_PROFILE_BASE                   
      Default Rounding Mode:   NEAR                               
      Default Rounding Mode:   NEAR                               
      Fast f16:                TRUE                               
      Workgroup Max Size:      1024(0x400)                        
      Workgroup Max Size per Dimension:
        x                        1024(0x400)                        
        y                        1024(0x400)                        
        z                        1024(0x400)                        
      Grid Max Size:           4294967295(0xffffffff)             
      Grid Max Size per Dimension:
        x                        4294967295(0xffffffff)             
        y                        4294967295(0xffffffff)             
        z                        4294967295(0xffffffff)             
      FBarrier Max Size:       32                                 
*** Done ***

@Shished commented Sep 12, 2024

Works for me with an RX 6700 XT after installing the extension and setting HSA_OVERRIDE_GFX_VERSION="10.3.0".

@TheRsKing commented Sep 12, 2024

Works for me with an RX 6700 XT after installing the extension and setting HSA_OVERRIDE_GFX_VERSION="10.3.0".

How do I set this override? (An environment variable?)

@Shished commented Sep 12, 2024

In the Alpaca settings, second tab.

@P-Jay357

I'm not sure if this is related, but while Llama 3.1 (8B) works great, when I try to run Mistral Nemo (12B) or Gemma 2 (27B) Ollama just crashes:


time=2024-09-12T19:25:21.136+01:00 level=INFO source=server.go:391 msg="starting llama server" cmd="/home/gareth/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama57975444/runners/rocm_v60102/ollama_llama_server --model /home/gareth/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94 --ctx-size 2048 --batch-size 512 --embedding --log-disable --n-gpu-layers 37 --verbose --parallel 1 --port 43643"
time=2024-09-12T19:25:21.136+01:00 level=DEBUG source=server.go:408 msg=subprocess environment="[LD_LIBRARY_PATH=/app/plugins/AMD/lib/ollama:/home/gareth/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama57975444/runners/rocm_v60102:/app/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/x86_64-linux-gnu/openh264/extra:/usr/lib/sdk/llvm15/lib:/usr/lib/x86_64-linux-gnu/GL/default/lib:/usr/lib/ollama:/app/plugins/AMD/lib/ollama PATH=/app/bin:/usr/bin HIP_VISIBLE_DEVICES=0]"
time=2024-09-12T19:25:21.137+01:00 level=INFO source=sched.go:450 msg="loaded runners" count=1
time=2024-09-12T19:25:21.137+01:00 level=INFO source=server.go:591 msg="waiting for llama runner to start responding"
time=2024-09-12T19:25:21.137+01:00 level=INFO source=server.go:625 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="1e6f655" tid="140547961254656" timestamp=1726165521
INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 0 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140547961254656" timestamp=1726165521 total_threads=16
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="43643" tid="140547961254656" timestamp=1726165521
llama_model_loader: loaded meta data with 35 key-value pairs and 363 tensors from /home/gareth/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Mistral Nemo Instruct 2407
llama_model_loader: - kv   3:                            general.version str              = 2407
llama_model_loader: - kv   4:                           general.finetune str              = Instruct
llama_model_loader: - kv   5:                           general.basename str              = Mistral-Nemo
llama_model_loader: - kv   6:                         general.size_label str              = 12B
llama_model_loader: - kv   7:                            general.license str              = apache-2.0
llama_model_loader: - kv   8:                          general.languages arr[str,9]       = ["en", "fr", "de", "es", "it", "pt", ...
llama_model_loader: - kv   9:                          llama.block_count u32              = 40
llama_model_loader: - kv  10:                       llama.context_length u32              = 1024000
llama_model_loader: - kv  11:                     llama.embedding_length u32              = 5120
llama_model_loader: - kv  12:                  llama.feed_forward_length u32              = 14336
llama_model_loader: - kv  13:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv  14:              llama.attention.head_count_kv u32              = 8
llama_model_loader: - kv  15:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  16:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  17:                 llama.attention.key_length u32              = 128
llama_model_loader: - kv  18:               llama.attention.value_length u32              = 128
llama_model_loader: - kv  19:                          general.file_type u32              = 2
llama_model_loader: - kv  20:                           llama.vocab_size u32              = 131072
llama_model_loader: - kv  21:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv  22:            tokenizer.ggml.add_space_prefix bool             = false
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = tekken
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,131072]  = ["<unk>", "<s>", "</s>", "[INST]", "[...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,131072]  = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
ERROR	[window.py | connection_error] Connection error
INFO	[connection_handler.py | reset] Resetting Alpaca's Ollama instance
INFO	[connection_handler.py | stop] Stopping Alpaca's Ollama instance
INFO	[connection_handler.py | stop] Stopped Alpaca's Ollama instance
INFO	[connection_handler.py | start] Starting Alpaca's Ollama instance...
INFO	[connection_handler.py | start] Started Alpaca's Ollama instance
2024/09/12 19:25:22 routes.go:1125: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:true OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/gareth/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-09-12T19:25:22.415+01:00 level=INFO source=images.go:753 msg="total blobs: 15"
time=2024-09-12T19:25:22.415+01:00 level=INFO source=images.go:760 msg="total unused blobs removed: 0"
time=2024-09-12T19:25:22.415+01:00 level=INFO source=routes.go:1172 msg="Listening on 127.0.0.1:11435 (version 0.3.9)"
INFO	[connection_handler.py | start] client version is 0.3.9
INFO	[window.py | show_toast] There was an error with the local Ollama instance, so it has been reset

@P-Jay357

I did manage to get Gemma 2 (27B) working once and it was really slow (as expected), but I can't get it or Nemo to work at all now, while Llama 3.1 (8B) still works fine. I'm not sure if it's VRAM-related, but I've also noticed in the resource monitor that the model stays in VRAM for quite some time before it gets flushed:

Screenshot from 2024-09-12 19-30-25

If I'm using Llama 3.1 (8B) then the VRAM stays at around 6 GB, even if I close Alpaca.

@Jeffser (Owner) commented Sep 12, 2024

Models are kept alive for 5 minutes by default; you can change that in the preferences.
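(OLLAMA_KEEP_ALIVE:5m0s in the log above is that same default.) A model can also be unloaded on demand through Ollama's API; a sketch against Alpaca's local instance, assuming the port 11435 shown in the log and a hypothetical model name (keep_alive: 0 tells Ollama to release the model immediately):

curl http://127.0.0.1:11435/api/generate -d '{"model": "mistral-nemo", "keep_alive": 0}'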

@P-Jay357

If Ollama crashes though, the VRAM doesn't go back down unless I shut down or restart my PC.

@P-Jay357

I just tested it now: I ran Mistral Nemo and Ollama crashed as I mentioned above. I waited 10 minutes and the VRAM still had not gone down. I then tried to close Alpaca, and after a few seconds I got the option to Force Quit as it wasn't responding. 10 minutes after that, the VRAM still has not gone down.

@ghost commented Sep 12, 2024

@P-Jay357 That's because the llama server is still running after Alpaca crashes. You need to kill its two processes, then you can use Alpaca again. I experienced crashes as described in #298.
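A hedged way to do that from a terminal; ollama_llama_server is the runner name visible in the log earlier in this thread, and flatpak kill tears down the whole sandbox if the app itself is hung:

flatpak kill com.jeffser.Alpaca
pkill -f ollama_llama_server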

@AlgorithmArtist commented Oct 2, 2024

Well, same issue here: while the RX 7900 XTX is listed as supported, I just cannot get Alpaca/Llama to use my GPU :(
alpaca-debug.txt
I have no idea why it wouldn't consider the ROCm library, which is even installed locally, and sometimes the log just straight up tells me that the GPU is not considered.
Maybe I am missing something obvious?
Using Alpaca 2.0.6

@Jeffser (Owner) commented Oct 3, 2024

Well, same issue here: while the RX 7900 XTX is listed as supported, I just cannot get Alpaca/Llama to use my GPU :( alpaca-debug.txt I have no idea why it wouldn't consider the ROCm library, which is even installed locally, and sometimes the log just straight up tells me that the GPU is not considered. Maybe I am missing something obvious? Using Alpaca 2.0.6

It seems like you don't have the extension installed (or it might be outdated).

You can check all the installed apps and extensions using flatpak list.
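E.g. (the AMD extension's exact ID isn't shown in this thread, so the filter below is just a convenient guess; whatever it's called, it should show up next to Alpaca itself):

flatpak list | grep -i alpaca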

@0chroma commented Oct 4, 2024

I also seem to still have issues with the extension: alpaca-debug.txt

The most relevant line seems to be this one:

time=2024-10-04T01:39:47.736-07:00 level=ERROR source=amd_linux.go:364 msg="amdgpu devices detected but permission problems block access" error="permissions not set up properly.  Either run ollama as root, or add you user account to the render group. open /dev/kfd: permission denied"

I added myself to the render group and logged out/in again, but no dice. I run an immutable OS, so may need to reboot.

Edit: after rebooting I'm happy to report that it's working for me ^^
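For anyone else hitting that permission error: the usual fix is the one from the log message, and group membership only takes effect on a fresh login (or, on immutable distros, apparently a reboot):

sudo usermod -aG render $USER
ls -l /dev/kfd    # sanity check: should be owned by group render and group-writable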

@olumolu (Contributor) commented Oct 4, 2024

If your model is big enough that the VRAM required exceeds your GPU's VRAM, Ollama will use the CPU instead of the GPU.

@Jeffser (Owner) commented Oct 13, 2024

Hi, I have a small update on AMD support: I added this indicator to the preferences dialog.

image

8c98be6


Also, if you run a model that's too big for your VRAM (or RAM, if you are using the CPU), instead of just giving a generic crash notification it will say "Model request too large for system".

image

115e22e

Also, happy 100 comments to this issue 🎉

@AlgorithmArtist

This has fixed all issues at once, you are amazing! ♥️

For everyone still having issues: check that you have the Alpaca AMD Support Flatpak extension installed.

@Jeffser (Owner) commented Oct 14, 2024

Final Update (I guess)

I added a link to the repo wiki; I'm working on writing it.

image

@ndonkersloot

I'm not sure if I'm missing something, but it still seems to use the CPU in my case. I'm running Fedora Silverblue 41 on a Lenovo T14 with an AMD GPU.

The application and extension are installed and up-to-date.

Screenshot From 2024-11-04 21-17-01
Screenshot From 2024-11-04 21-16-10
Screenshot From 2024-11-04 21-16-06
Screenshot From 2024-11-04 21-15-07

@frandavid100 (Author)

Yeah, the same thing happens to me. I have the application and extension installed but for some reason Alpaca is still using my CPU instead of my GPU.

@Jeffser (Owner) commented Nov 5, 2024

Not all GPUs are supported; please check this page.
