Description
Expected Behavior
Calling model.save_state() or model.load_state(existing_state) should not cause crashes.
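For reference, the intended round trip looks roughly like this (a minimal sketch; the model path is a placeholder):
# Minimal sketch of the expected save/load round trip
from llama_cpp import Llama

model = Llama("path/to/model.gguf", n_gpu_layers=-1, n_ctx=4096)  # placeholder path
state = model.save_state()   # serialize the current context
model.load_state(state)      # restore it, without crashing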
Current Behavior
Trying to save a state using the Vulkan backend:
from dotenv import load_dotenv
import os
from llama_cpp import Llama

load_dotenv()
llama_root = os.getenv("LLAMA_MODEL_PATH")
model = Llama(
    os.path.join(llama_root, "models", "Mistral", "dolphin-2.6-mistral-7b.Q4_K_M.gguf"),
    n_gpu_layers=-1, n_threads=6, n_threads_batch=12, n_ctx=4096, verbose=True,
)
clean_state = model.save_state()
[ MODEL LOADING LOGS ]
Llama.save_state: saving llama state
Llama.save_state: got state size: 602474532
Llama.save_state: allocated state
GGML_ASSERT: C:\Users\[redacted]\AppData\Local\Temp\pip-install-h5eywwf3\llama-cpp-python_186ee02d0df144c2858e6721192eb88b\vendor\llama.cpp\ggml-vulkan.cpp:1666: width > 0
Trying to load a state that was previously saved with the CLBlast backend:
from dotenv import load_dotenv
import os
import pickle
from llama_cpp import Llama

load_dotenv()
llama_root = os.getenv("LLAMA_MODEL_PATH")
model = Llama(
    os.path.join(llama_root, "models", "Mistral", "dolphin-2.6-mistral-7b.Q4_K_M.gguf"),
    n_gpu_layers=-1, n_threads=6, n_threads_batch=12, n_ctx=4096, verbose=True,
)
statePath = "./cache/save.pickle"
with open(statePath, "rb") as stateCache:
    state = pickle.load(stateCache)
print("loading state...")
model.load_state(state)
[ MODEL LOADING LOGS ]
loading state...
GGML_ASSERT: C:\Users\[redacted]\AppData\Local\Temp\pip-install-h5eywwf3\llama-cpp-python_186ee02d0df144c2858e6721192eb88b\vendor\llama.cpp\llama.cpp:10993: kv_self.total_size() == kv_buf_si
(Not sure if this is a bug or if states saved with different backends are simply incompatible.)
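For context, the pickle loaded above was produced on the CLBlast build with the save side of the same round trip; roughly (a sketch, not the exact original script):
# Sketch of how ./cache/save.pickle was produced (on a CLBlast build)
import pickle
with open("./cache/save.pickle", "wb") as stateCache:
    pickle.dump(model.save_state(), stateCache)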
Environment and Context
$ vulkaninfo --summary
Results: (TLDR: RX 5700 XT)
WARNING: [Loader Message] Code 0 : Layer VK_LAYER_RTSS uses API version 1.1 which is older than the application specified API version of 1.3. May cause issues.
==========
VULKANINFO
==========
Vulkan Instance Version: 1.3.261
Instance Extensions: count = 13
-------------------------------
VK_EXT_debug_report : extension revision 10
VK_EXT_debug_utils : extension revision 2
VK_EXT_swapchain_colorspace : extension revision 4
VK_KHR_device_group_creation : extension revision 1
VK_KHR_external_fence_capabilities : extension revision 1
VK_KHR_external_memory_capabilities : extension revision 1
VK_KHR_external_semaphore_capabilities : extension revision 1
VK_KHR_get_physical_device_properties2 : extension revision 2
VK_KHR_get_surface_capabilities2 : extension revision 1
VK_KHR_portability_enumeration : extension revision 1
VK_KHR_surface : extension revision 25
VK_KHR_win32_surface : extension revision 6
VK_LUNARG_direct_driver_loading : extension revision 1
Instance Layers: count = 17
---------------------------
VK_LAYER_AMD_switchable_graphics AMD switchable graphics layer 1.3.270 version 1
VK_LAYER_EOS_Overlay Vulkan overlay layer for Epic Online Services 1.2.136 version 1
VK_LAYER_EOS_Overlay Vulkan overlay layer for Epic Online Services 1.2.136 version 1
VK_LAYER_KHRONOS_profiles Khronos Profiles layer 1.3.275 version 1
VK_LAYER_KHRONOS_shader_object Khronos Shader object layer 1.3.275 version 1
VK_LAYER_KHRONOS_synchronization2 Khronos Synchronization2 layer 1.3.275 version 1
VK_LAYER_KHRONOS_validation Khronos Validation Layer 1.3.275 version 1
VK_LAYER_LUNARG_api_dump LunarG API dump layer 1.3.275 version 2
VK_LAYER_LUNARG_gfxreconstruct GFXReconstruct Capture Layer Version 1.0.2 1.3.275 version 4194306
VK_LAYER_LUNARG_monitor Execution Monitoring Layer 1.3.275 version 1
VK_LAYER_LUNARG_screenshot LunarG image capture layer 1.3.275 version 1
VK_LAYER_OBS_HOOK Open Broadcaster Software hook 1.3.216 version 1
VK_LAYER_RENDERDOC_Capture Debugging capture layer for RenderDoc 1.2.131 version 17
VK_LAYER_ROCKSTAR_GAMES_social_club Rockstar Games Social Club Layer 1.0.70 version 1
VK_LAYER_RTSS RTSS overlay hook bootstrap 1.1.73 version 1
VK_LAYER_VALVE_steam_fossilize Steam Pipeline Caching Layer 1.3.207 version 1
VK_LAYER_VALVE_steam_overlay Steam Overlay Layer 1.3.207 version 1
Devices:
========
GPU0:
apiVersion = 1.3.270
driverVersion = 2.0.294
vendorID = 0x1002
deviceID = 0x731f
deviceType = PHYSICAL_DEVICE_TYPE_DISCRETE_GPU
deviceName = AMD Radeon RX 5700 XT
driverID = DRIVER_ID_AMD_PROPRIETARY
driverName = AMD proprietary driver
driverInfo = 24.1.1 (AMD proprietary shader compiler)
conformanceVersion = 1.3.3.1
deviceUUID = 00000000-2800-0000-0000-000000000000
driverUUID = 414d442d-5749-4e2d-4452-560000000000
$ bash -c "lscpu"
Results: (TLDR: Ryzen 9 5900X)
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 48 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
Vendor ID: AuthenticAMD
CPU family: 25
Model: 33
Model name: AMD Ryzen 9 5900X 12-Core Processor
Stepping: 2
CPU MHz: 3699.990
BogoMIPS: 7399.98
Hypervisor vendor: Microsoft
Virtualization type: full
L1d cache: 384 KiB
L1i cache: 384 KiB
L2 cache: 6 MiB
L3 cache: 32 MiB
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; safe RET, no microcode
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Retpolines, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 erms rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero xsaveerptr arat umip vaes vpclmulqdq rdpid fsrm
Operating System: Windows 10 22H2
$ python3 --version
Python 3.12.0
$ cmake --version
cmake version 3.26.0
CMake suite maintained and supported by Kitware (kitware.com/cmake).
Using MSVC v143
Failure Information (for bugs)
Happens when trying to save or load states from Python. When saving, the low-level API call that fails is llama_copy_state_data().
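The failure can presumably be isolated by driving the low-level bindings directly (a sketch; assumes access to the raw llama_context pointer, which depending on the llama-cpp-python version is model.ctx or model._ctx.ctx):
# Sketch: trigger the failing call via the low-level bindings
import ctypes
import llama_cpp

ctx = model._ctx.ctx  # or model.ctx on older versions
state_size = llama_cpp.llama_get_state_size(ctx)
buf = (ctypes.c_uint8 * int(state_size))()
# This is the call that trips GGML_ASSERT ... width > 0 on the Vulkan backend:
llama_cpp.llama_copy_state_data(ctx, buf)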
Saving and loading states seem to work on upstream llama.cpp with Vulkan enabled:
.\buildVulkan\bin\Release\save-load-state.exe -m .\models\Mistral\dolphin-2.6-mistral-7b.Q4_K_M.gguf -ngl 33 -t 6 -tb 12 --temp 0
Output: (TLDR: success)
main: build = 2038 (97008e61)
main: built with MSVC 19.37.32825.0 for x64
ggml_vulkan: Using AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from .\models\Mistral\dolphin-2.6-mistral-7b.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = cognitivecomputations_dolphin-2.6-mis...
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 15
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32001] = ["<unk>", "<s>", "<|im_end|>", "<0x00...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32001] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32001] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,58980] = ["Ôûü t", "i n", "e r", "Ôûü a", "h e...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 260/32001 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32001
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
llm_load_print_meta: general.name = cognitivecomputations_dolphin-2.6-mistral-7b
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '<|im_end|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '<|im_end|>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 70.31 MiB
llm_load_tensors: Vulkan buffer size = 4095.06 MiB
.................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Vulkan KV buffer size = 64.00 MiB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_new_context_with_model: Vulkan_Host input buffer size = 9.01 MiB
llama_new_context_with_model: Vulkan compute buffer size = 80.30 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 8.80 MiB
llama_new_context_with_model: graph splits (measure): 3
main : serialized state into 927335 out of a maximum of 132712484 bytes
first run: The quick brown fox jumps over the lazy dog. Nowthis is more like it. If you
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Vulkan KV buffer size = 64.00 MiB
llama_new_context_with_model: KV self size = 64.00 MiB, K (f16): 32.00 MiB, V (f16): 32.00 MiB
llama_new_context_with_model: Vulkan_Host input buffer size = 9.01 MiB
llama_new_context_with_model: Vulkan compute buffer size = 80.30 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 8.80 MiB
llama_new_context_with_model: graph splits (measure): 3
second run: The quick brown foxmain : deserialized state from 927335 out of a maximum of 132712484 bytes
jumps over the lazy dog. Nowthis is more like it. If you
main : success
Steps to Reproduce
- step 1: install/reinstall llama-cpp-python with Vulkan enabled:
$env:CMAKE_ARGS="-DLLAMA_VULKAN=ON -A x64"; pip install llama-cpp-python --upgrade --force-reinstall --no-cache-dir
- step 2: load a model with some layers offloaded to the GPU and run any inference with it
- step 3: try to save the current state using the high-level API (model.save_state()); a minimal script is sketched after this list
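A minimal script for step 3 (a sketch; the model path is a placeholder, any GGUF model with GPU offload should do):
# Minimal repro sketch for step 3
from llama_cpp import Llama

model = Llama(
    "path/to/any-model.gguf",  # placeholder path
    n_gpu_layers=-1,           # offload layers to the Vulkan device
    n_ctx=4096,
    verbose=True,
)
model.save_state()  # crashes with GGML_ASSERT: ... ggml-vulkan.cpp:1666: width > 0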
Failure Logs
Full logs (redacted above for readability):
ggml_vulkan: Using AMD Radeon RX 5700 XT | uma: 0 | fp16: 1 | warp size: 64
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from C:\LLM\models\Mistral\dolphin-2.6-mistral-7b.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = cognitivecomputations_dolphin-2.6-mis...
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 11: general.file_type u32 = 15
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32001] = ["<unk>", "<s>", "<|im_end|>", "<0x00...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32001] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32001] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.merges arr[str,58980] = ["Ôûü t", "i n", "e r", "Ôûü a", "h e...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 2
llama_model_loader: - kv 21: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 22: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 260/32001 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32001
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 7.24 B
llm_load_print_meta: model size = 4.07 GiB (4.83 BPW)
llm_load_print_meta: general.name = cognitivecomputations_dolphin-2.6-mistral-7b
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '<|im_end|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 2 '<|im_end|>'
llm_load_print_meta: LF token = 13 '<0x0A>'
llm_load_tensors: ggml ctx size = 0.22 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: CPU buffer size = 70.31 MiB
llm_load_tensors: Vulkan buffer size = 4095.06 MiB
.................................................................................................
llama_new_context_with_model: n_ctx = 4096
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Vulkan KV buffer size = 512.00 MiB
llama_new_context_with_model: KV self size = 512.00 MiB, K (f16): 256.00 MiB, V (f16): 256.00 MiB
llama_new_context_with_model: Vulkan_Host input buffer size = 16.02 MiB
llama_new_context_with_model: Vulkan compute buffer size = 316.80 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 8.80 MiB
llama_new_context_with_model: graph splits (measure): 3
AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 0 | VSX = 0 |
Model metadata: {'general.name': 'cognitivecomputations_dolphin-2.6-mistral-7b', 'general.architecture': 'llama', 'llama.context_length': '32768', 'llama.rope.dimension_count': '128', 'llama.embedding_length': '4096', 'llama.block_count': '32', 'llama.feed_forward_length': '14336', 'llama.attention.head_count': '32', 'tokenizer.ggml.eos_token_id': '2', 'general.file_type': '15', 'llama.attention.head_count_kv': '8', 'llama.attention.layer_norm_rms_epsilon': '0.000010', 'llama.rope.freq_base': '10000.000000', 'tokenizer.ggml.model': 'llama', 'general.quantization_version': '2', 'tokenizer.ggml.bos_token_id': '1', 'tokenizer.ggml.unknown_token_id': '0', 'tokenizer.ggml.padding_token_id': '2', 'tokenizer.ggml.add_bos_token': 'true', 'tokenizer.ggml.add_eos_token': 'false'}
Llama.save_state: saving llama state
Llama.save_state: got state size: 602474532
Llama.save_state: allocated state
GGML_ASSERT: C:\Users\[redacted]\AppData\Local\Temp\pip-install-h5eywwf3\llama-cpp-python_186ee02d0df144c2858e6721192eb88b\vendor\llama.cpp\ggml-vulkan.cpp:1666: width > 0