
Merge repeng to NousResearch/llama.cpp/master #1

Open · wants to merge 2 commits into master
Conversation

vgel (Collaborator) commented Mar 10, 2024

Many thanks to Nous Research, whose support and collaboration made this work possible!

This PR introduces a new activations hacking technique, control vectors (also known as steering vectors, concept vectors, representation engineering, etc.). Control vectors are an easy-to-train (~60s on a 4090 for a 7B parameter model) way to modify the behavior of an LLM without finetuning or inference-time prompting, using a synthetic dataset of prompt pairs and PCA to generate a set of per-layer vectors that are added to the model activations.
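
To make the training idea concrete, here is a minimal sketch (an illustration, not repeng's actual implementation; the function and variable names are hypothetical): collect hidden states for paired positive/negative prompts, then take the first principal component of the per-pair differences at each layer as that layer's control vector.

```python
import numpy as np

def train_control_vectors(pos_hidden, neg_hidden):
    """pos_hidden / neg_hidden: {layer: (n_pairs, n_embd) array} of hidden states
    from paired positive/negative prompts."""
    vectors = {}
    for layer in pos_hidden:
        diffs = pos_hidden[layer] - neg_hidden[layer]   # (n_pairs, n_embd)
        diffs -= diffs.mean(axis=0)                     # center before PCA
        # first right-singular vector of the centered diffs == first principal component
        _, _, vt = np.linalg.svd(diffs, full_matrices=False)
        direction = vt[0]
        # PCA leaves the sign arbitrary; orient toward the positive examples
        if np.mean(pos_hidden[layer] @ direction) < np.mean(neg_hidden[layer] @ direction):
            direction = -direction
        vectors[layer] = direction.astype(np.float32)
    return vectors
```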

They've been described in a few recent papers, such as Representation Engineering: A Top-Down Approach to AI Transparency. I also have a blog post that covers them in a more grounded way, with a library for easily creating them and examples of their use: https://vgel.me/posts/representation-engineering/

[Image: an example from the blog post of a laziness/diligence vector being trained and applied to mistral-7b-instruct-0.1.]

This PR adds the ability to use control vectors, in GGUF format, with Llama-architecture models in llama.cpp. (Support for other architectures hasn't been implemented yet.) Currently, these control vectors can only be exported from repeng, but the format is simple, so my hope is that it can become a common export format for other libraries that generate representation engineering vectors with different techniques.
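
For a sense of what that export looks like, here is a sketch using the `gguf` Python package. The layout assumed here (architecture string "controlvector", one float32 tensor named direction.<layer> of length n_embd per layer) is my reading of what the loader expects; treat it as an assumption and check this PR's loader and repeng's exporter for the authoritative details.

```python
# Sketch of exporting per-layer vectors to GGUF. Assumed layout: one float32
# tensor named "direction.<layer>" per layer -- verify against the PR's loader.
import numpy as np
from gguf import GGUFWriter

def export_control_vector(vectors, path, arch="controlvector"):
    """vectors: {layer_index: (n_embd,) float32 array}."""
    writer = GGUFWriter(path, arch)
    for layer, direction in sorted(vectors.items()):
        writer.add_tensor(f"direction.{layer}", direction.astype(np.float32))
    writer.write_header_to_file()
    writer.write_kv_data_to_file()
    writer.write_tensors_to_file()
    writer.close()
```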

CLI / Usage

Along with changes to llama.cpp / llama.h that support loading control vectors, doing arithmetic on them, and applying a control vector to (or removing one from) a llama_context *, this PR also adds arguments to the common CLI:

  --control-vector FNAME
                        add a control vector
  --control-vector-scaled FNAME S
                        add a control vector with user defined scaling S
  --control-vector-layer-range START END
                        layer range to apply the control vector(s) to, start and end inclusive

As an example, the following command loads a Q4_K_M mistral-7b-instruct-0.1 and applies a pretrained happiness vector at the default strength of 1, plus a pretrained honesty vector at strength -2 (producing a strength-2 dishonesty vector), for the combined effect of a somewhat happy, very dishonest model. Note that the prompt doesn't mention a persona at all; the behavior comes purely from the control vectors.

$ ./main -m mistral-7b-instruct-v0.1.Q4_K_M.gguf \
    --control-vector happy.gguf \
    --control-vector-scaled honest.gguf -2 \
    --control-vector-layer-range 14 26 \
    --color -c 4096 --temp 0 --repeat_penalty 1.1 -p '[INST] How does it feel to be an AI? [/INST] '
<snip>
llama_init_from_gpt_params: loading control vector from /path/to/happy.gguf
llama_init_from_gpt_params: loading control vector from /path/to/honest.gguf
<snip>

 [INST] How does it feel to be an AI? [/INST] 😂! The sky is so blue today, the birds are singing on the moon, the sun is dancing on the moon, the moon is dancing on the moon,
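
Conceptually, what those flags do at inference time is sum the scaled vectors and add the result to each layer's activations within the chosen range. Roughly, in NumPy pseudocode of the arithmetic (the actual implementation lives in llama.cpp's C++ graph build; the function name here is hypothetical):

```python
import numpy as np

def apply_control_vectors(hidden, layer, vectors, scales, layer_range=(14, 26)):
    """hidden: (n_tokens, n_embd) activations at `layer`;
    vectors: list of {layer: (n_embd,)} control vectors;
    scales: matching list of floats, e.g. [1.0, -2.0] for the command above."""
    lo, hi = layer_range
    if lo <= layer <= hi:
        for vec, scale in zip(vectors, scales):
            if layer in vec:
                hidden = hidden + scale * vec[layer]  # broadcast over tokens
    return hidden
```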

If you'd like to test this PR but don't have a machine that can run repeng, I've uploaded the pretrained vectors to my website: happy.gguf and honest.gguf. Please let me know if there are any other vectors you'd be interested in testing, and I can upload those as well. These vectors were trained on mistral-7b-instruct-0.1, but have also been tested on mistral-7b-0.1 (base), and may also work on other Mistral finetunes or merges (testing appreciated).

This is my first llama.cpp PR (and my first C++ PR to any project), so any feedback on code style or implementation strategy is appreciated!
