Llamacpp backend #81
Conversation
…ckends; Add context limit setting and checking for generation in llamacpp_api.py
…bench for now, added comments on this
…e BOS/EOS string for template from model metadata
…U layer offload; Code improvements
backends/llamacpp_api.py
Outdated
pass
else:
    if self.model.chat_format == "chatml":
        # get BOS/EOS strings for chatml from llama.cpp:
the comment seems unnecessary
Why?
Because it is exactly what is written in the code.
backends/llamacpp_api.py
Outdated
self.context_size = 512

# get various model settings from metadata:
for key, value in self.model.metadata.items():
Why loop if it's a dict? Better: `self.context_length = metadata.get("context_length", 512)`
I can change it to that. I wrote it this way because the metadata is not thoroughly standardized and I was checking with prints during development.
This does not work, as the key is not just `context_length`, but has a prefix for the model architecture. For example, openchat_3.5-GGUF has the key `llama.context_length`. So it either also needs to get the model architecture from the metadata or remain in the loop with the `if "context_length" in key` check.
I need to test a lot more models to make sure that the second solution would work.
OK, I feared that. Then stick to the loop.
I'll rely on what is specified (https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#required), which gets rid of the loop. Specific models need testing and setup if they are missing something (i.e. a chat template) anyway, so if a model file does not follow the basic specification, I assume it'll be noticed then.
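A minimal sketch of the spec-based lookup, assuming `self.model.metadata` is the GGUF key/value dict that llama-cpp-python exposes (values arrive as strings; the default of 512 mirrors the snippet above):

```python
# Per the GGUF spec, "general.architecture" is a required key, and
# architecture-specific keys are prefixed with its value, e.g.
# "llama.context_length" for openchat_3.5-GGUF.
metadata = self.model.metadata  # GGUF key/value pairs (string-valued)

architecture = metadata["general.architecture"]  # e.g. "llama"
context_length_key = f"{architecture}.context_length"

# Fall back to the conservative default from the snippet above if missing:
self.context_size = int(metadata.get(context_length_key, 512))
```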
backends/llamacpp_api.py
Outdated
self.eos_string = model_spec.eos_string

# init llama-cpp-python jinja chat formatter:
self.chat_formatter = llama_cpp.llama_chat_format.Jinja2ChatFormatter(
I suggest extracting a `load_formatter` method which goes below `load_model`, and a test case for loading a formatter. And `self.bos_string` also seems not to be accessed anymore after this line, so why put it in `self`?
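A minimal sketch of the suggested extraction, assuming the `model_spec` attribute names from the snippet above (`chat_template` is a hypothetical attribute name; `Jinja2ChatFormatter` takes the template plus the EOS/BOS token strings):

```python
import llama_cpp

def load_formatter(self, model_spec):
    """Initialize the llama-cpp-python Jinja chat formatter (sketch)."""
    self.chat_formatter = llama_cpp.llama_chat_format.Jinja2ChatFormatter(
        template=model_spec.chat_template,  # hypothetical attribute name
        eos_token=model_spec.eos_string,
        bos_token=model_spec.bos_string,  # passed through; no self.bos_string needed
    )
```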
backends/llamacpp_api.py
Outdated
logger.info(f"Context token limit for {self.model_spec.model_name} exceeded: "
            f"{context_check[1]}/{context_check[3]}")
# fail gracefully:
raise backends.ContextExceededError(f"Context token limit for {self.model_spec.model_name} exceeded",
I suggest raising this already in `check_context_limit` (see comment below). The brackets are also hard to read here.
It's raised here for easy access to the model name, so that it gets logged properly. If it's raised in the checking function, the model name would need to be passed to it as well. I can make that change, but it would mean changing this in the HF backend as well.
I don't know what you mean by hard-to-read brackets here.
Returning multiple variables can be done, but `context_check[3]` is hard to read. Then better return a named tuple: `check_results.max_tokens` or something. That this line needs a comment is a bad sign: `if not context_check[0]:  # if context is exceeded, context_check[0] is False`.
If you need access to the model name (for logging), maybe move this method to Model.
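A minimal sketch of the named-tuple return the reviewer suggests (field names are illustrative and follow the utility snippet further down):

```python
from typing import NamedTuple

class ContextCheckResult(NamedTuple):
    """Result of a context-limit check (illustrative field names)."""
    fits: bool         # True if prompt plus generation fit into the context
    tokens_used: int   # prompt tokens + tokens to be generated
    tokens_left: int   # context_size - tokens_used (negative if exceeded)
    context_size: int  # the model's context window

# Call sites then read without index comments:
#   check = check_context_limit(...)
#   if not check.fits:
#       ...
```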
backends/utils.py
Outdated
tokens_used = prompt_size + max_new_tokens  # context includes tokens to be generated
tokens_left = context_size - tokens_used
fits = tokens_used <= context_size
return fits, tokens_used, tokens_left, context_size
`context_size` is an input parameter and does not need to be returned. And instead of returning two extra arguments, `tokens_used` and `tokens_left` should be in the error message. Maybe rename the method to `assert_context_limit`.
Context size is returned here so that it can be used when the exception is caught in custom Player/GameMaster classes. The idea is to leave it to the game developers to decide how to handle their messages if the context would be exceeded; providing all details about the context this way allows for flexibility.
Nah, here we talk about the check utility. And it does not throw an exception (yet)? So if game devs are supposed to catch the exception (and e.g. try re-prompting) then these details should become part of the exception as class attributes (not the return params here).
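A minimal sketch of what that could look like, with the check details carried as exception attributes (attribute names are illustrative; `ContextExceededError` is the exception named in the snippet above):

```python
class ContextExceededError(Exception):
    """Raised when a prompt plus requested generation exceeds the context window.

    Carries the check details as attributes so game developers can catch it
    and decide how to handle their messages (e.g. truncate and re-prompt).
    """
    def __init__(self, message, tokens_used, tokens_left, context_size):
        super().__init__(message)
        self.tokens_used = tokens_used
        self.tokens_left = tokens_left
        self.context_size = context_size

def check_context_limit(prompt_size, max_new_tokens, context_size):
    """Raise ContextExceededError if the request would not fit (sketch)."""
    tokens_used = prompt_size + max_new_tokens  # context includes tokens to be generated
    tokens_left = context_size - tokens_used
    if tokens_used > context_size:
        raise ContextExceededError(
            f"Context token limit exceeded: {tokens_used}/{context_size}",
            tokens_used=tokens_used,
            tokens_left=tokens_left,
            context_size=context_size,
        )
```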
see comments in code
I've found an issue with custom chat template application (after excising the chat formatter part); I'll rework the generation method to work as intended. We can close the PR until that is done.
No need to close this (from my side). You can continue using this branch.
Tested and working with a full benchmark run using openchat_3.5-GGUF-Q5. Should be ready to merge from my view.
* Add context limit check function to backends/util.py
* Add model entries to registry
* Add handling of optional model loading flags for CPU/GPU usage and GPU layer offload
* Add openchat_3.5-GGUF-q5 to model registry
* Add llama.cpp backend howto

(cherry picked from commit 94493c2)
A `llama-cpp-python`-based backend to allow running GGUF/GGML models.
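A minimal usage sketch of what such a backend wraps, assuming a hypothetical local GGUF file path (`n_gpu_layers` and `n_ctx` are llama-cpp-python loading flags matching the CPU/GPU and context settings discussed above):

```python
from llama_cpp import Llama

model = Llama(
    model_path="models/openchat_3.5.Q5_K_M.gguf",  # hypothetical local path
    n_gpu_layers=-1,  # offload all layers to the GPU; 0 keeps everything on the CPU
    n_ctx=0,          # 0 = take the context length from the model's metadata
)

output = model.create_chat_completion(
    messages=[{"role": "user", "content": "Say hello."}],
    max_tokens=64,
)
print(output["choices"][0]["message"]["content"])
```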