This is the error I get when running the moondream2 example in Colab:
llama_model_loader: loaded meta data with 19 key-value pairs and 245 tensors from /root/.cache/huggingface/hub/models--vikhyatk--moondream2/snapshots/92d3d73b6fd61ab84d9fe093a9c7fd8c04bf2c0d/./moondream2-text-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi2
llama_model_loader: - kv 1: general.name str = moondream2
llama_model_loader: - kv 2: phi2.context_length u32 = 2048
llama_model_loader: - kv 3: phi2.embedding_length u32 = 2048
llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 8192
llama_model_loader: - kv 5: phi2.block_count u32 = 24
llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: general.file_type u32 = 1
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", """, "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50256
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - type f32: 147 tensors
llama_model_loader: - type f16: 98 tensors
llm_load_vocab: missing pre-tokenizer type, using: 'default'
llm_load_vocab:
llm_load_vocab: ************************************
llm_load_vocab: GENERATION QUALITY WILL BE DEGRADED!
llm_load_vocab: CONSIDER REGENERATING THE MODEL
llm_load_vocab: ************************************
llm_load_vocab:
llama_model_load: error loading model: error loading model vocabulary: Failed to process regex
llama_load_model_from_file: failed to load model
ValueError Traceback (most recent call last)
in <cell line: 9>()
7 )
8
----> 9 llm = Llama.from_pretrained(
10 repo_id="vikhyatk/moondream2",
11 filename="text-model",
/usr/local/lib/python3.10/dist-packages/llama_cpp/_internals.py in __init__(self, path_model, params, verbose)
53
54 if self.model is None:
---> 55 raise ValueError(f"Failed to load model from file: {path_model}")
56
57 def free_model():
ValueError: Failed to load model from file: /root/.cache/huggingface/hub/models--vikhyatk--moondream2/snapshots/92d3d73b6fd61ab84d9fe093a9c7fd8c04bf2c0d/./moondream2-text-model-f16.gguf
The code I ran is the example provided in this repo's README:
from llama_cpp import Llama
from llama_cpp.llama_chat_format import MoondreamChatHandler

chat_handler = MoondreamChatHandler.from_pretrained(
    repo_id="vikhyatk/moondream2",
    filename="mmproj",
)

llm = Llama.from_pretrained(
    repo_id="vikhyatk/moondream2",
    filename="text-model",
    chat_handler=chat_handler,
    n_ctx=2048,  # n_ctx should be increased to accommodate the image embedding
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url", "image_url": {"url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"}},
            ],
        }
    ]
)

print(response["choices"][0]["text"])
Please fix the code in the README.
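Separate from the load failure above, the last line of the README example also looks wrong to me: `create_chat_completion` returns an OpenAI-style chat dict, where the reply sits under `"message"` → `"content"` rather than under `"text"` (the `"text"` key belongs to plain, non-chat completions). A minimal sketch of reading the reply, using a hypothetical response dict (the content string is made up, since the model never loads for me):

```python
# Hypothetical response dict shaped like llama-cpp-python's
# create_chat_completion output (OpenAI chat-completion format).
response = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "A wooden boardwalk crossing a grassy wetland.",
            },
        }
    ]
}

# Chat completions nest the reply under "message" -> "content";
# response["choices"][0]["text"] would raise a KeyError here.
reply = response["choices"][0]["message"]["content"]
print(reply)
```

If this is right, the README's final line should probably be `print(response["choices"][0]["message"]["content"])`.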