Integrate Functionary v2.5 + Refactor Functionary Code #1509

Open · jeffrey-fong wants to merge 4 commits into main

Conversation

jeffrey-fong
Contributor

  • Implemented abstract base classes for FunctionaryChatHandler, specialized per version, for easier maintenance.
  • Changed chat_format to functionary instead of functionary-v1, functionary-v2, and functionary-v2.5. The old handler's chat_format is now functionary-old. I hope this does not break any code on your side. Also, can we remove the old handler function already?
  • Fully integrated Functionary v2.5 for both streaming and non-streaming.
  • Added a code_interpreter feature for v2 and v2.5. It activates automatically when the user passes {"type": "code_interpreter"} in tools, and returns a tool call to a tool named python whose argument is the generated code (see the sketch after this list).
  • Functionary v1 also supports streaming now.
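
A minimal sketch of a code_interpreter request, for context (the base URL, API key, and model alias here are assumptions taken from the test config below, not values mandated by this PR):

from openai import OpenAI

client = OpenAI(base_url="http://0.0.0.0:8080/v1", api_key="sk-xxx")

# Passing {"type": "code_interpreter"} in tools activates the feature; the model
# replies with a tool call named "python" whose arguments are the generated code.
response = client.chat.completions.create(
    model="functionary-small-v2.5-auto-tokenizer",
    messages=[{"role": "user", "content": "Print the first 10 Fibonacci numbers."}],
    tools=[{"type": "code_interpreter"}],
)
print(response.choices[0].message.tool_calls)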

@allenporter

I have tested this PR and it seems to be working just fine.

I used this models.json:

{
    "host": "0.0.0.0",
    "port": 8080,
    "models": [
      {
        "model": "functionary-small-v2.5.Q4_0.gguf",
        "model_alias": "functionary-small-v2.5-auto-tokenizer",
        "hf_model_repo_id": "meetkai/functionary-small-v2.5-GGUF",
        "hf_pretrained_model_name_or_path": "meetkai/functionary-small-v2.5-GGUF",
        "chat_format": "functionary",
        "n_gpu_layers": 32,
        "offload_kqv": true,
        "n_ctx": 8192,
        "use_mlock": false
      }
    ]
}

Start the server with CONFIG_FILE=models.json python3 -m llama_cpp.server, then call it with an OpenAI client library, something like this:

import yaml
from openai import OpenAI

BASE_URL = "http://0.0.0.0:8080/v1"
API_KEY = "sk-xxx"
MODEL_ID = "functionary-small-v2.5-auto-tokenizer"

client = OpenAI(base_url=BASE_URL, api_key=API_KEY)

messages = [
    {"role": "user", "content": "what's the weather like in Hanoi?"}
]
tools = [ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA",
                    }
                },
                "required": ["location"]
            }
        }
    }
]

chat_response = client.chat.completions.create(
    model=MODEL_ID,
    messages=messages,
    tools=tools,
)
print(yaml.dump(chat_response.model_dump(), sort_keys=False, explicit_start=True))

Gives the following response:

---
id: chatcmpl-0a698bb0-d38a-4a77-b2e6-2226e09f71c9
choices:
- finish_reason: tool_calls
  index: 0
  logprobs: null
  message:
    content: null
    role: assistant
    function_call: null
    tool_calls:
    - id: call_ePVfb4Gg0aNbCV41cKc9tkAi
      function:
        arguments: '{"location": "Hanoi"}'
        name: get_current_weather
      type: function
created: 1720904370
model: functionary-small-v2.5-auto-tokenizer
object: chat.completion
service_tier: null
system_fingerprint: null
usage:
  completion_tokens: 12
  prompt_tokens: 126
  total_tokens: 133

@allenporter left a comment

I'm not super familiar with the code base, but I'm motivated to use this feature, so I thought I would help with review. Great work, and thank you for contributing this; I am very eager to have working function calling (I had trouble with earlier versions, so I'm hoping this version is more robust).

@@ -1328,7 +1329,7 @@ def format_gemma(
# Tricky chat formats that require custom chat handlers


@register_chat_completion_handler("functionary")
@register_chat_completion_handler("functionary-old")

Which models can still use this version?

Maybe functionary-legacy is a better name?

Comment on lines +1702 to +1703
completion = cast(
llama_types.Completion, self.llama.create_completion(

Could this be written as

        return self.llama.create_completion(

given that this function itself has no mypy return type annotation? My impression is that the cast isn't really doing anything.

If the idea is to have the typing be correct, then consider typing the return value.
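
For what it's worth, a rough sketch of the typed alternative (the wrapper name and parameters are inferred from the snippets in this PR, not verified):

def create_completion(self, prompt, stop, grammar) -> llama_types.Completion:
    # With a declared return type, callers can drop the cast() and return this directly.
    return self.llama.create_completion(
        prompt=prompt, stop=stop, grammar=grammar, stream=False
    )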


def get_grammar(self, function_name: str):
function_body = None
for function in self.functions or []:

@allenporter Jul 13, 2024

My impression is that the functions vs. tools distinction was something used in older versions. Perhaps only support the newer API for streaming rather than both? Or do the conversion between formats once.

Edit: Swapped tools vs functions
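
A hedged sketch of the "convert once" idea -- normalize legacy functions into the tools shape up front so the rest of the handler only deals with one format (the helper name is hypothetical):

from typing import Any, Dict, List, Optional

def functions_to_tools(functions: Optional[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Wrap legacy `functions` entries in the newer `tools` format."""
    if not functions:
        return []
    return [{"type": "function", "function": func} for func in functions]

# [{"name": "get_current_weather", ...}] becomes
# [{"type": "function", "function": {"name": "get_current_weather", ...}}]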

@@ -1858,11 +2575,12 @@ def prepare_messages_for_inference(
),
)
)

if tools is not None and tool_choice != "none" and any([tool["type"] == "code_interpreter" for tool in tools]):

The code_interpreter support seems like a great feature, though perhaps it would be faster to review this PR if it were added in a follow-up PR.

)
for chunk in completion:
if chunk["choices"][0]["text"]:
delta_text = chunk["choices"][0]["text"].strip()

This is overwritten on each iteration of the loop. Is that intentional? Same in v2, I think. v2.5 seems to append them all into a list. Given max_tokens=1 in all of them, perhaps it's not worth having the list -- do you want to just return either None or a token always? That could reduce complexity in generate_streaming, since it has a big loop to handle this, including a double break with to_break, etc.
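
A minimal sketch of the single-token alternative being suggested, assuming max_tokens=1 means at most one non-empty chunk (the helper name is hypothetical):

from typing import Optional

def first_token_text(completion) -> Optional[str]:
    """Return the text of the single generated chunk, stripped, or None if empty."""
    for chunk in completion:
        text = chunk["choices"][0]["text"]
        if text:
            return text.strip()
    return None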

generator = self.generate_streaming_tool_call(prompt=prompt)
tool_id = "".join([random.choice(string.ascii_letters + string.digits) for _ in range(24)])

for response, finish_reason, logprobs, chunk, prompt in generator:

From what I read, finish_reason and logprobs are always None, so they can be removed (and set explicitly in yield_response).

prompt += completion_text + "\n"
# Generate function args
stops = [self.tool_call_token, self.stop_token]
grammar = self.get_grammar(function_name) if function_name != "python" else None

This appears different from the logic in prepare_for_generation_with_tool_func_choice -- is it necessary to diverge?

content = ""
calls = []
completion_tokens = 0
# If tool_choice/function_call is provided

Consider pushing this logic inside generate_tool_call with some extra arguments for an already chosen tool call, given that the second half of the function is identical and this is repeated a few times (prepare_for_generation_with_tool_func_choice can essentially become a separate branch in there). While it would make that function more complex, it may reduce the total number of things happening overall by collapsing a few cases together.

I think this could also simplify the streaming state management.
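
A rough sketch of the shape being suggested -- generate_tool_call takes an optional pre-chosen tool and branches internally, so the duplicated second half is written once (the signature and helper names are hypothetical):

def generate_tool_call(self, prompt: str, chosen_tool: Optional[str] = None):
    if chosen_tool is not None:
        # Forced tool/function choice -- what prepare_for_generation_with_tool_func_choice
        # handles today becomes a branch here.
        function_name = chosen_tool
    else:
        # "auto" path: let the model pick the tool name first.
        function_name = self.generate_function_name(prompt)
    # ...the argument-generation logic that is currently repeated follows, written once...
    return self.generate_function_args(prompt, function_name)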

)

chunk_id, chunk_created = None, None
delta = {"role": None, "content": None, "function_call": None, "tool_calls": None}

Consider simplifying how delta is managed, given that it is mutable and a lot of care needs to be taken to make sure that all other fields are cleared. It might be simpler to fully create the delta each time, then fill in the missing fields inside yield_response when creating the response. This would also simplify the final response.
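
A small sketch of the alternative (the helper is hypothetical; the idea is just to build a fresh delta per chunk and let yield_response fill in anything missing):

def make_delta(**fields) -> dict:
    """Build a fresh delta for each chunk so stale fields cannot leak between yields."""
    delta = {"role": None, "content": None, "function_call": None, "tool_calls": None}
    delta.update(fields)
    return delta

# e.g. yield_response(make_delta(content=token)) instead of mutating one shared dict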

],
)

chunk_id, chunk_created = None, None

Given these are only used in the "auto" case, push the scope of the variables inside that if statement.

@allenporter

Also, I realize some of my comments are on existing code that was intentionally left unchanged to make review easier -- but given how much is changing already, it may be worth simplifying where possible.

completion_tokens += completion["usage"]["completion_tokens"]
prompt += completion_text
content_prefix = "<|start_header_id|>assistant<|end_header_id|>\n\n"
content += prompt[prompt.rindex(content_prefix) + len(content_prefix):] + completion_text

When testing this out I see responses like "The smart curtains have been closed. smart curtains have been closed.".

I am not sure, but from what I can see here this first appends the completion_text to the prompt, then gets the tail of the prompt and adds completion_text again to the content. Does this mean completion_text was added twice?

Applying this fix seems to address the issue:

Suggested change
- content += prompt[prompt.rindex(content_prefix) + len(content_prefix):] + completion_text
+ content += prompt[prompt.rindex(content_prefix) + len(content_prefix):]

stops = [self.tool_call_token, self.stop_token]
completion = self.create_completion(prompt=prompt, stop=stops, grammar=None)
for chunk in completion:
delta_text = chunk["choices"][0]["text"]

Does generate_streaming_content for v2.5 also need to strip the content_prefix assistant message?

@axel7083 left a comment

Amazing job 🚀! Tested with the Instructor library; I just encountered a problem with passing the function result back to the server.

When using the example from Functions.ipynb, I am getting the following error on the server side.

Exception: 'dict object' has no attribute 'name'
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/errors.py", line 171, in custom_route_handler
    response = await original_route_handler(request)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/server/app.py", line 483, in create_chat_completion
    ] = await run_in_threadpool(llama.create_chat_completion, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
    return await anyio.to_thread.run_sync(func, *args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 2177, in run_sync_in_worker_thread
    return await future
           ^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/anyio/_backends/_asyncio.py", line 859, in run
    result = context.run(func, *args)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama.py", line 1748, in create_chat_completion
    return handler(
           ^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama_chat_format.py", line 2610, in functionary_new_chat_handler
    prompt = prepare_messages_for_inference(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/llama_cpp/llama_chat_format.py", line 2598, in prepare_messages_for_inference
    tokenizer.hf_tokenizer.apply_chat_template(all_messages, add_generation_prompt=True, tokenize=False)
  File "/opt/app-root/lib64/python3.11/site-packages/transformers/tokenization_utils_base.py", line 1812, in apply_chat_template
    rendered_chat = compiled_template.render(
                    ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/jinja2/environment.py", line 1304, in render
    self.environment.handle_exception()
  File "/opt/app-root/lib64/python3.11/site-packages/jinja2/environment.py", line 939, in handle_exception
    raise rewrite_traceback_stack(source=source)
  File "<template>", line 9, in top-level template code
jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'name'

Debugger

Running the function calling example in the debugger, the first call works perfectly; I get the expected response with the right formatting.

However, the second call, where we add the tool response to the messages, hits the error above.

Here is the messages content before sending it for the second call:

[
  {
    'role': 'user',
    'content': "what's the weather like in Hanoi?"
  },
  ChatCompletionMessage(
    content=None,
    role='assistant',
    function_call=None,
    tool_calls=[
      ChatCompletionMessageToolCall(
        id='call_L4tj8aG7VYqlL1XCLzyHeH8z',
        function=Function(
          arguments='{"location": "Hanoi"}',
          name='get_current_weather'
        ),
        type='function'
      )
    ]
  ),
  {
    'tool_call_id': 'call_L4tj8aG7VYqlL1XCLzyHeH8z',
    'role': 'tool',
    'name': 'get_current_weather',
    'content': '{"location": "Hanoi", "temperature": "unknown"}'
  }
]

Reproduce

To reproduce without using the Jupyter notebook, here is the Python script:

import json
import openai

BASE_URL = "http://0.0.0.0:8000/v1"
API_KEY = "sk-xxx"
MODEL_ID = "/models/functionary-small-v2.5.Q4_0.gguf"

def get_current_weather(location, unit="fahrenheit"):
    return json.dumps({"location": location, "temperature": "unknown"})

client = openai.OpenAI(base_url=BASE_URL, api_key=API_KEY)

messages = [
    {"role": "user", "content": "what's the weather like in Hanoi?"}
]
tools = [ # For functionary-7b-v2 we use "tools"; for functionary-7b-v1.4 we use "functions" = [{"name": "get_current_weather", "description":..., "parameters": ....}]
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The city and state, e.g., San Francisco, CA",
                    }
                },
                "required": ["location"]
            }
        }
    }
]

response = client.chat.completions.create(
    model=MODEL_ID,
    messages=messages,
    tools=tools,
)

response_message = response.choices[0].message
tool_calls = response_message.tool_calls

if not tool_calls:
    exit(1)

available_functions = {
    "get_current_weather": get_current_weather,
}

messages.append(response_message)

for tool_call in tool_calls:
    function_name = tool_call.function.name
    function_to_call = available_functions[function_name]
    function_args = json.loads(tool_call.function.arguments)
    function_response = function_to_call(
        location=function_args.get("location"),
        unit=function_args.get("unit"),
    )
    messages.append(
        {
            "tool_call_id": tool_call.id,
            "role": "tool",
            "name": function_name,
            "content": function_response,
        }
    )  # extend conversation with function response
second_response = client.chat.completions.create(
    model=MODEL_ID,
    messages=messages,
)

Edit

After some investigation, the name property is filtered out because it is not on the ChatCompletionRequestToolMessage class. It must be added to fix the problem.

class ChatCompletionRequestToolMessage(TypedDict):
    role: Literal["tool"]
    content: Optional[str]
    tool_call_id: str
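
A hedged sketch of the fix being described -- adding an optional name field to the TypedDict (whether it should be required or NotRequired is up to the maintainers):

from typing import Literal, Optional
from typing_extensions import NotRequired, TypedDict

class ChatCompletionRequestToolMessage(TypedDict):
    role: Literal["tool"]
    content: Optional[str]
    tool_call_id: str
    name: NotRequired[str]  # needed by the Functionary chat template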

@@ -1826,7 +2544,6 @@ def generate_schema_from_functions(functions, namespace="functions") -> str:
def prepare_messages_for_inference(
messages: List[llama_types.ChatCompletionRequestMessage],

For some very weird reason, the messages do not contain the name property when the role is tool.

Here, in the example I provided in #1509 (review), the messages content is equal to:

[
  ...
  {
    "role": "tool",
    "content": "{\"location\": \"Hanoi, Vietnam\", \"temperature\": \"unknown\"}",
    "tool_call_id": "call_yQZk3sIQsIwGuco1lV4QVbTN"
  }
]

In the tool message we do not have the name, which is required by the chat format, leading to the error 'dict object' has no attribute 'name' raised by Jinja.
