Llamacpp backend #81

Merged 32 commits into clp-research:main on Apr 25, 2024

Conversation

@Gnurro Gnurro (Collaborator) commented Apr 18, 2024

llama-cpp-python-based backend to allow running GGUF/GGML models.

…ckends; Add context limit setting and checking for generation in llamacpp_api.py
…e BOS/EOS string for template from model metadata
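
For orientation, a minimal sketch of the kind of loading and generation this backend wraps, assuming llama-cpp-python's Llama class; the file path and parameter values are illustrative, not the backend's actual defaults:

import llama_cpp

# Load a local GGUF model file with llama-cpp-python.
model = llama_cpp.Llama(
    model_path="models/openchat_3.5.Q5_K_M.gguf",  # hypothetical path
    n_ctx=4096,        # context window size in tokens
    n_gpu_layers=-1,   # offload all layers to the GPU if available
)

# Plain completion call; returns an OpenAI-style response dict.
output = model.create_completion("Say hello:", max_tokens=16)
print(output["choices"][0]["text"])
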
pass
else:
    if self.model.chat_format == "chatml":
        # get BOS/EOS strings for chatml from llama.cpp:

phisad (Collaborator): The comment seems unnecessary.

Gnurro (Author): Why?

phisad (Collaborator): Because it states exactly what is written in the code.
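
For reference, the strings at issue are fixed; a minimal sketch of the standard ChatML delimiters, hardcoded as constants rather than read from llama.cpp (an alternative to the excerpt above; constant names are illustrative):

# The standard ChatML turn delimiters used by llama.cpp's "chatml"
# chat format; hardcoding them is an alternative to reading them
# from llama.cpp (constant names are illustrative).
CHATML_BOS_STRING = "<|im_start|>"
CHATML_EOS_STRING = "<|im_end|>"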

self.context_size = 512

# get various model settings from metadata:
for key, value in self.model.metadata.items():

phisad (Collaborator): Why loop if it's a dict? Better: self.context_length = metadata.get("context_length", 512)

Gnurro (Author): I can change it to that. I wrote it this way because the metadata is not thoroughly standardized and I was checking it with prints during development.

Gnurro (Author, Apr 19, 2024): This does not work, as the key is not just context_length but has a prefix for the model architecture. For example, openchat_3.5-GGUF has the key llama.context_length. So it either also needs to get the model architecture from the metadata, or remain a loop with if "context_length" in key. I need to test a lot more models to make sure that the second solution would work.

phisad (Collaborator): OK, I feared that. Then stick to the loop.

Gnurro (Author): I'll rely on what is specified (https://github.com/ggerganov/ggml/blob/master/docs/gguf.md#required), which gets rid of the loop. Specific models need testing and setup if they are missing something (e.g. a chat template) anyway, so if a model file does not follow the basic specification, I assume it'll be noticed then.
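
A minimal sketch of that spec-based lookup, assuming the metadata is the flat string-keyed dict seen in the loop above; per the GGUF spec, general.architecture is required and prefixes the architecture-specific keys:

# Spec-based lookup instead of looping over all metadata keys.
# Values in the metadata dict may be strings, hence the int() cast.
def read_context_length(metadata: dict, default: int = 512) -> int:
    architecture = metadata.get("general.architecture")  # e.g. "llama"
    if architecture is None:
        return default
    return int(metadata.get(f"{architecture}.context_length", default))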

self.eos_string = model_spec.eos_string

# init llama-cpp-python jinja chat formatter:
self.chat_formatter = llama_cpp.llama_chat_format.Jinja2ChatFormatter(

phisad (Collaborator): I suggest extracting a load_formatter method which goes below load_model, and a test case for loading a formatter. And self.bos_string also seems not to be accessed anymore after this line, so why keep it in self?
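
A sketch of the suggested extraction, assuming the module's llama_cpp import and llama-cpp-python's Jinja2ChatFormatter keyword names (template, bos_token, eos_token); taking the strings as plain arguments also addresses the remark about self.bos_string:

def load_formatter(self, chat_template: str, bos_string: str, eos_string: str):
    # Build the Jinja chat formatter in its own method below load_model.
    # Passing bos_string/eos_string as arguments avoids keeping them
    # on self when they are only needed here.
    self.chat_formatter = llama_cpp.llama_chat_format.Jinja2ChatFormatter(
        template=chat_template,
        bos_token=bos_string,
        eos_token=eos_string,
    )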

logger.info(f"Context token limit for {self.model_spec.model_name} exceeded: "
            f"{context_check[1]}/{context_check[3]}")
# fail gracefully:
raise backends.ContextExceededError(f"Context token limit for {self.model_spec.model_name} exceeded",

phisad (Collaborator): I suggest raising this already in check_context_limit (see comment below). The brackets are also hard to read here.

Gnurro (Author): It's raised here for easy access to the model name, so that it gets logged properly. If it were raised in the checking function, the model name would need to be passed to it as well. I can make that change, but it would mean changing this in the HF backend as well.

I don't know what you mean by hard-to-read brackets here.

phisad (Collaborator): Returning multiple values can be done, but context_check[3] is hard to read. Better to return a named tuple: check_results.max_tokens or something. That this line needs a comment is a bad sign: if not context_check[0]:  # if context is exceeded, context_check[0] is False

If you need access to the model name (for logging), maybe move this method to Model.
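
A sketch of the named-tuple variant being suggested; type and field names are illustrative:

from typing import NamedTuple

class ContextCheck(NamedTuple):
    # Named fields instead of positional context_check[i] indexing.
    fits: bool
    tokens_used: int
    tokens_left: int
    max_tokens: int

def check_context_limit(prompt_size: int, max_new_tokens: int,
                        context_size: int) -> ContextCheck:
    tokens_used = prompt_size + max_new_tokens  # includes tokens to be generated
    return ContextCheck(fits=tokens_used <= context_size,
                        tokens_used=tokens_used,
                        tokens_left=context_size - tokens_used,
                        max_tokens=context_size)

# Call sites then read `if not check.fits:` instead of `if not context_check[0]:`.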

tokens_used = prompt_size + max_new_tokens # context includes tokens to be generated
tokens_left = context_size - tokens_used
fits = tokens_used <= context_size
return fits, tokens_used, tokens_left, context_size

phisad (Collaborator): context_size is an input parameter and does not need to be returned. And instead of returning two extra values, tokens_used and tokens_left should be part of the error message. Maybe rename the method to assert_context_limit.

Gnurro (Author): Context size is returned here so that it can be used when the exception is caught in custom Player/GameMaster classes. The idea is to leave it to game developers how to handle their messages if the context would be exceeded; providing all details about the context this way allows for flexibility.

phisad (Collaborator): Nah, here we talk about the check utility, and it does not throw an exception (yet)? So if game devs are supposed to catch the exception (and e.g. try re-prompting), then these details should become part of the exception as class attributes (not the return params here).
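
A sketch of that proposal, modeled on the ContextExceededError seen above; attribute names are illustrative:

class ContextExceededError(Exception):
    # The check details ride on the exception as attributes, so game code
    # that catches it (e.g. to re-prompt with a shorter history) can read
    # them instead of unpacking positional return values.
    def __init__(self, message: str, tokens_used: int, tokens_left: int,
                 context_size: int):
        super().__init__(f"{message} ({tokens_used}/{context_size} tokens)")
        self.tokens_used = tokens_used
        self.tokens_left = tokens_left
        self.context_size = context_size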

@phisad phisad (Collaborator) left a review: see comments in code

@Gnurro Gnurro (Author) commented Apr 19, 2024

I've found an issue with custom chat template application (after excising the chat formatter part); I'll rework the generation method to work as intended. We can close the PR until that is done.

FYI: llama-cpp-python silently defaults to the llama2 template, and it took me hours to find that part in its source code. It's thoroughly hardcoded, so create_chat_completion can't be used for any model that does not use the llama2, mistral or chatml templates. Formatting with the already-implemented chat formatter and then passing the prompt to create_completion will solve this. Models that ship a chat template should supposedly use it, but the actual prompt used in create_chat_completion can't be inspected without hacking major parts of llama-cpp-python, so I'll fall back to the basic create_completion for these as well, to be sure that formats are applied properly.

There is an open issue on the llama-cpp-python repository about this: abetlen/llama-cpp-python#717
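
A sketch of the described workaround; the parameter names follow the excerpts above, and the formatter returning an object with a .prompt string is an assumption about llama-cpp-python's chat-format internals:

def generate_response(model, chat_formatter, messages, max_new_tokens, eos_string):
    # Apply the model's own template ourselves, then use the plain
    # completion API, bypassing create_chat_completion's silent
    # llama2 fallback.
    formatted = chat_formatter(messages=messages)
    response = model.create_completion(
        prompt=formatted.prompt,
        max_tokens=max_new_tokens,
        stop=[eos_string],  # cut generation at the template's EOS string
    )
    return response["choices"][0]["text"]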

@phisad phisad (Collaborator) commented Apr 20, 2024

> We can close the PR until that is done

No need to close this (from my side). You can continue using this branch.

@Gnurro Gnurro (Author) commented Apr 24, 2024

Tested and working with a full benchmark run using openchat_3.5-GGUF-Q5. Should be ready to merge, in my view.

@phisad phisad self-requested a review April 25, 2024 07:46
@phisad phisad merged commit 94493c2 into clp-research:main Apr 25, 2024
Gnurro added a commit to Gnurro/clembench that referenced this pull request Apr 25, 2024
* Add context limit check function to backends/util.py
* Add model entries to registry
* Add handling of optional model loading flags for CPU/GPU usage and GPU layer offload
* Add openchat_3.5-GGUF-q5 to model registry
* Add llama.cpp backend howto

(cherry picked from commit 94493c2)
kushal-10 added a commit to kushal-10/clembench that referenced this pull request May 6, 2024
commit eccb468
Merge: daf8195 00f4eaf
Author: kushal-10 <83510614+kushal-10@users.noreply.github.com>
Date:   Mon May 6 15:46:10 2024 +0200

    Merge branch 'clp-research:main' into mergefork

commit daf8195
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Mon May 6 15:45:26 2024 +0200

    minor changes

commit 630e184
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Mon May 6 15:21:38 2024 +0200

    check wihtout use_fast

commit 9484030
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Mon May 6 15:11:56 2024 +0200

    add llama-llava-3-8b

commit 00f4eaf
Author: Sherzod Hakimov <sherzodhakimov@gmail.com>
Date:   Mon May 6 12:05:58 2024 +0200

    added Llama-3-70B model_id for Together.ai

commit b5364f5
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 22:00:11 2024 +0200

    rm processor log

commit 8393903
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 21:56:39 2024 +0200

    rm idefics log

commit 1bc7c36
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 21:55:49 2024 +0200

    check hit value

commit 3e3e2e0
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 21:47:26 2024 +0200

    debug

commit 3f4cb98
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 21:39:39 2024 +0200

    test idefics9b

commit 8200938
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 21:33:28 2024 +0200

    update logger info

commit 8dc5b29
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 20:18:30 2024 +0200

    add idefics9b in backend

commit 4247026
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 20:14:14 2024 +0200

    add idefics9binstruct, rm load_tokenizer

commit da9f38d
Merge: 19c844e afccf94
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 19:59:30 2024 +0200

    Merge branch 'mergefork' of https://github.com/kushal-10/clembench into mergefork

commit 19c844e
Author: kushal-10 <kushalkoshti11@gmail.com>
Date:   Sun May 5 19:56:26 2024 +0200

    Squashed commit of the following:

    commit 0bb1755
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun May 5 19:25:11 2024 +0200

        rm instances 1-8 matchit

    commit 2419425
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun May 5 09:15:50 2024 +0200

        update multimodal games

    commit 34a201f
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Thu May 2 16:43:15 2024 +0200

        Remove Emu2-related code

    commit 4122d37
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Thu May 2 14:43:55 2024 +0200

        More Emu2 experiments

    commit 7507a5a
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Wed May 1 16:01:46 2024 +0200

        Use Emu2 tokenizer directly to get prompt_tokens; Load Emu2 tokenizer with trust_remote_code=True to prevent CLI prompt for executing it

    commit c7fa93f
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Wed May 1 15:48:49 2024 +0200

        Use Emu2 tokenizer directly to get prompt_tokens

    commit ca7f84f
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Wed May 1 15:40:55 2024 +0200

        Remove Emu2 entry in model_registry.json, will use custom model registry to store machine-dependent file path; Change Emu2 loading and generation based on working example code

    commit 7b34cb5
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Wed May 1 14:48:49 2024 +0200

        Add more specific file path to model loading in Emu2 example script

    commit ffedbfb
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Wed May 1 14:45:23 2024 +0200

        Add specific file path to model loading in Emu2 example script

    commit 8cf0b12
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Wed May 1 14:40:36 2024 +0200

        Add example two-GPU code to Emu2 example script

    commit fe9ef50
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Wed May 1 14:28:00 2024 +0200

        Move Emu2 example script, use cloudgame images in it

    commit db4fd44
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Wed May 1 11:30:07 2024 +0200

        Merged changes from add_feat/context_check

    commit 2b87594
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Wed May 1 05:49:26 2024 +0200

        update store_score method, throws error in bencheval

    commit bcb174b
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Wed May 1 05:22:27 2024 +0200

        update max input tokens to 2048

    commit 38bf017
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Wed May 1 01:27:34 2024 +0000

        update mm_mapworld

    commit 3a27970
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 19:49:55 2024 +0200

        Add Emu2 HF example code script

    commit 3b04ad5
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 19:21:45 2024 +0200

        Change Emu2 template to contain image placeholder token

    commit bf6a3b9
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:45:05 2024 +0200

        Change Emu2 template to contain 255 image tokens (this is not random)

    commit b7db926
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:42:30 2024 +0200

        Change Emu2 template to contain 256 image tokens (again)

    commit 20ce386
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:40:09 2024 +0200

        Change Emu2 template to contain 128 image tokens

    commit 0692401
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:33:37 2024 +0200

        Change Emu2 template to contain 256 image tokens

    commit 31f02f9
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:25:21 2024 +0200

        Change Emu2 template to contain 64 image tokens

    commit 92c6a00
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:15:48 2024 +0200

        Change padding setting back to 'false' in registry for Emu2

    commit 5d27a75
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:08:13 2024 +0200

        Change padding setting in registry for Emu2

    commit de7a8e4
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 18:00:37 2024 +0200

        Add low_cpu_mem_usage=True flag to Emu2 weight loading

    commit d191b80
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 17:10:24 2024 +0200

        Change Emu2 weight loading to use proper torch.bfloat16

    commit ff86080
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Tue Apr 30 17:07:52 2024 +0200

        Add specific Emu2 loading: bfloat16 weights

    commit ddf47d5
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 08:23:58 2024 +0000

        Revert "test commit"

        This reverts commit 7e2c507.

    commit 7e2c507
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 08:22:25 2024 +0000

        test commit

    commit 95ee5fb
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 09:09:50 2024 +0200

        comment print statements

    commit aef7e81
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 09:08:24 2024 +0200

        set max_tokens to 1024 for idefics

    commit f26846b
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 07:05:12 2024 +0000

        Revert "update mm_mapworld"

        This reverts commit 50aed8d.

    commit 52dd5e4
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 06:39:39 2024 +0000

        Revert "set max_tokens value"

        This reverts commit 43dfac1.

    commit 50aed8d
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 06:31:39 2024 +0000

        update mm_mapworld

    commit 43dfac1
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 08:00:39 2024 +0200

        set max_tokens value

    commit 6180bb9
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 07:57:15 2024 +0200

        rm max_length

    commit 0defca1
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Apr 30 07:12:01 2024 +0200

        comment store_scores

    commit 5241261
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Mon Apr 29 16:50:07 2024 +0200

        Proper Emu2 assistant tag as eos_to_cull in model registry

    commit 7a27b28
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Mon Apr 29 16:31:52 2024 +0200

        Docstrings/comments

    commit 5ec7253
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Mon Apr 29 16:25:16 2024 +0200

        Change Emu2 registry chat template to assumed format; Use Emu2 template for proper input prompt text formatting

    commit 8bd896e
    Author: Gnurro <knawurzelkopp@hotmail.de>
    Date:   Mon Apr 29 15:49:37 2024 +0200

        Add prototype Emu2 support

    commit c571df1
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 13:03:08 2024 +0000

        add matchit

    commit cf72ff5
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 12:38:39 2024 +0200

        handle no image in history case

    commit 5572119
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 12:08:21 2024 +0200

        if else image inputs

    commit 7b8bdaf
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 12:05:46 2024 +0200

        no image case

    commit f659d61
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 11:55:44 2024 +0200

        print response

    commit bbd06f8
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 09:36:27 2024 +0000

        test

    commit 94acf6f
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 07:53:33 2024 +0000

        add mm_mapworld, update requirements_hf

    commit 325ec0a
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 05:51:03 2024 +0000

        update max_tokens in idefics, minor changes

    commit 71257c1
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 05:12:33 2024 +0000

        Update idefics user assistant tags

    commit 9ddeb69
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Apr 29 05:02:21 2024 +0000

        update idefics response

    commit f02afbf
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun Apr 28 20:05:06 2024 +0000

        remove setting pad_token_id warning

    commit f52fa63
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun Apr 28 15:16:42 2024 +0000

        update llava mistral, vicuna templates

    commit c93b2c1
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun Apr 28 13:41:00 2024 +0000

        update llava1.6 34b template, rm image_token, clean cloudgame resp

    commit 9e1c15e
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun Apr 28 11:32:20 2024 +0000

        rm vip-llava

    commit 6018cfc
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun Apr 28 10:50:09 2024 +0000

        update get_images, update llava1.5 template, update image_token tag

    commit 1567b1a
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun Apr 28 10:30:48 2024 +0000

        update llava1.5 template

    commit d9002ea
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sun Apr 28 09:03:10 2024 +0000

        add model_type in model registry

    commit ea68246
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Sat Apr 27 08:06:31 2024 +0000

        add Idefics80binstruct, update cloudgame

    commit b19530a
    Author: kushal-10 <83510614+kushal-10@users.noreply.github.com>
    Date:   Fri Apr 26 14:15:22 2024 +0200

        Update requirements_hf.txt

    commit 50207c9
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Fri Apr 26 08:18:56 2024 +0000

        rm clean_messages, logging

    commit 510980f
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Thu Apr 25 06:53:21 2024 +0200

        add llava1.6, update cloudgame

    commit ac8b268
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Thu Feb 29 11:28:16 2024 +0000

        add image,assistant keys in model registry, minor changes in backend

    commit 8f76d77
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Wed Feb 28 18:46:06 2024 +0000

        update scoring in cloudgame

    commit 3b2a7fb
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Wed Feb 28 13:41:48 2024 +0000

        add clean_messages, add docstrings

    commit e23923f
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Tue Feb 27 17:54:22 2024 +0000

        add programmatic judge (type Model), test messages

    commit 00873e1
    Author: kushal-10 <kushalkoshti11@gmail.com>
    Date:   Mon Feb 26 17:46:35 2024 +0000

        upgrade cloudgame speaker to v1-beta, multimodal backend refactor to v0.1

commit afccf94
Author: Philipp Sadler <philipp.sadler@gmail.com>
Date:   Fri Apr 26 16:26:24 2024 +0200

    remove double file_handler for backend logger's

commit 94493c2
Author: Jonathan Jordan <knawurzelkopp@hotmail.de>
Date:   Thu Apr 25 09:51:28 2024 +0200

    Add Llamacpp backend (clp-research#81)

    * Add context limit check function to backends/util.py
    * Add model entries to registry
    * Add handling of optional model loading flags for CPU/GPU usage and GPU layer offload
    * Add openchat_3.5-GGUF-q5 to model registry
    * Add llama.cpp backend howto