# Function Calling

Function calling in large language models (LLMs) lets the model trigger specific code modules or tools based on the user's instructions. The model emits a structured request that names a function and its arguments, and the surrounding application executes it. This takes the LLM beyond producing text and toward performing tasks or retrieving information from the real world. Compared with plain text generation, function calling can yield more tangible and accurate results, because the application can act on the model's request against databases, APIs, and other external systems.

Function calling was popularized by OpenAI, but many models and inference backends now support the approach. In general, models need dedicated training to use tools reliably. Function calling requires more than recognizing words: the model must grasp the user's intent in order to trigger a specific action or tool. Training teaches the model to associate certain phrases, words, and sentence structures with the intent to execute a function.
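The model itself never runs code. It emits a structured call, and the application parses that call and runs the matching function. The sketch below assumes a JSON call with `name` and `arguments` keys. That shape is similar in spirit to OpenAI-style function calls, but it is not any specific provider's wire format. `generate_image` is a hypothetical stub standing in for a real image API.

```python
import json
from typing import Any, Callable

# Schema the application would advertise to the model (illustrative, not a provider's exact format).
TOOL_SCHEMA = {
    "name": "generate_image",
    "description": "Generates an image from a text prompt and returns its URL.",
    "parameters": {
        "type": "object",
        "properties": {"prompt": {"type": "string"}},
        "required": ["prompt"],
    },
}


def generate_image(prompt: str) -> str:
    """Hypothetical stand-in for a real image-generation API."""
    slug = "-".join(prompt.lower().split())
    return f"https://example.invalid/{slug}.png"


# Registry mapping advertised function names to the Python callables that implement them.
TOOLS: dict[str, Callable[..., Any]] = {TOOL_SCHEMA["name"]: generate_image}


def dispatch(model_output: str) -> Any:
    """Parse a JSON function call emitted by the model and run the matching tool."""
    call = json.loads(model_output)
    if call["name"] not in TOOLS:
        raise KeyError(f"Model requested unknown tool: {call['name']}")
    return TOOLS[call["name"]](**call["arguments"])


# What a model trained for function calling might emit for "draw two cats playing with yarn".
print(dispatch('{"name": "generate_image", "arguments": {"prompt": "two cats playing with yarn"}}'))
```

Training for function calling aims to make the model produce calls like this one reliably, with the right name and well-formed arguments.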

Here are some existing datasets that can be used to teach a Mistral- or Mixtral-derived model these tasks:

- [glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) by Glaive AI
- [function_calling_v3](https://huggingface.co/datasets/Trelis/function_calling_v3) by Trelis (paid dataset)

A better dataset can be custom-tailored to your needs. Use existing interaction data from your application to find messages where users request image generation. Then feed those messages into a model to produce a synthetic dataset similar to the ones above.
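Here is a rough sketch of the filtering step, assuming the interaction logs are available as a plain list of user messages. The keyword regex, the `generate_image` function name, and the record layout are all hypothetical. They do not reproduce the exact format of the Glaive or Trelis datasets.

```python
import json
import re

# Hypothetical heuristic: words and phrases that suggest the user wants a picture.
IMAGE_REQUEST = re.compile(r"\b(draw|picture|image|photo|show me|paint)\b", re.IGNORECASE)


def select_image_requests(messages: list[str]) -> list[str]:
    """Keep only user messages that look like image-generation requests."""
    return [m for m in messages if IMAGE_REQUEST.search(m)]


def to_training_example(message: str) -> dict:
    """Turn a request into a hypothetical function-calling training record.

    The prompt argument is a placeholder copy of the message; to build the
    synthetic target, a stronger model would rewrite it into a clean image prompt.
    """
    return {
        "user": message,
        "assistant_call": {"name": "generate_image", "arguments": {"prompt": message}},
    }


logs = [
    "Can you show me a cat wearing a hat?",
    "What's the capital of France?",
    "Draw a lighthouse at night, please.",
]
for record in map(to_training_example, select_image_requests(logs)):
    print(json.dumps(record))
```

A keyword filter only yields candidates. Reviewing a sample by hand, or classifying messages with an LLM, would reduce false positives before the records are used for training.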

## How to use

To try this example, set up API keys for Prodia and Anyscale in a `.env` file:

```
PRODIA_API_KEY=...
ANYSCALE_API_KEY=...
```

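Install the requirements and run the notebook. The configuration cell also reads a `negative_prompt.txt` file from the working directory, with one negative-prompt term per line.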
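The `Config` cell reads both keys from the environment and loads `negative_prompt.txt` at the moment the class is defined. A missing value therefore surfaces as a `KeyError` or `FileNotFoundError`. The following pre-flight check is a sketch that is not part of the original notebook, and `missing_setup` is a hypothetical helper. It reports everything that is missing at once.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("PRODIA_API_KEY", "ANYSCALE_API_KEY")


def missing_setup(env: dict[str, str], workdir: Path = Path(".")) -> list[str]:
    """Return the setup problems that would make the configuration cell fail."""
    problems = [f"missing environment variable {key}" for key in REQUIRED_KEYS if not env.get(key)]
    if not (workdir / "negative_prompt.txt").is_file():
        problems.append("missing negative_prompt.txt")
    return problems


# Run after load_dotenv() so that values from .env are already in os.environ.
for problem in missing_setup(dict(os.environ)):
    print(problem)
```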

```python
from dotenv import load_dotenv
from os import environ
from langchain_core.pydantic_v1 import (
    BaseModel,
    Field,
    ConfigDict,
    SecretStr,
)

# Load PRODIA_API_KEY and ANYSCALE_API_KEY from .env into os.environ.
load_dotenv()
```

```
True
```

```python
class Config(BaseModel):
    model_config: ConfigDict = ConfigDict()
    max_tokens: int = Field(
        description="The maximum number of tokens to generate.",
        default=2048,
    )
    temperature: float = Field(
        description="The temperature to use for sampling.",
        default=0.7,
        ge=0,
    )
    # API keys are read from the environment when the class is defined.
    prodia_api_key: SecretStr = Field(
        description="The Prodia API key.",
        default=SecretStr(environ["PRODIA_API_KEY"]),
    )
    anyscale_api_key: SecretStr = Field(
        description="The Anyscale API key.",
        default=SecretStr(environ["ANYSCALE_API_KEY"]),
    )
    llm_model: str = Field(
        description="The model to use for language generation.",
        default="mistralai/Mixtral-8x7B-Instruct-v0.1",
    )
    sd_model: str = Field(
        description="The model to use for image generation.",
        default="dreamshaperXL10_alpha2.safetensors [c8afe2ef]",
    )
    # One term per line in negative_prompt.txt, joined into a comma-separated string.
    sd_negative_prompt: str = Field(
        description="Comma-separated terms to avoid when generating images.",
        default=", ".join(open("negative_prompt.txt").read().splitlines()),
    )
    sd_style_preset: str = Field(
        description="The style preset to use for image generation.",
        default="photographic",
    )


config = Config()
```


## Why LangChain

LangChain allows quick prototyping of LLM agents and solutions, offering plenty of ready-to-use components that can easily be ported to a custom solution. Here we use LangChain's tool framework to create a tool the language model can use. Our tool calls a Stable Diffusion model to generate images according to the user's request. Prodia has been a stable provider in this field, and we use it for demonstration purposes, but the tool could call local Stable Diffusion models instead if required. The tool could also run asynchronously, but the async implementation is omitted here to keep the notebook easy to read.
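Two of the options above are swapping Prodia for a local model and running asynchronously. Both stay cheap if the image provider sits behind a small interface. The sketch below is plain Python rather than LangChain code. `ImageBackend`, `FakeBackend`, `generate_image`, and `agenerate_image` are hypothetical names. Wrapping the blocking call with `asyncio.to_thread` is one common way to fill in an async method such as `_arun`, not the notebook's implementation.

```python
import asyncio
from typing import Protocol


class ImageBackend(Protocol):
    """Anything that turns a prompt into an image URL or local file path."""

    def generate(self, prompt: str) -> str: ...


class FakeBackend:
    """Hypothetical stand-in for Prodia or a local Stable Diffusion pipeline."""

    def generate(self, prompt: str) -> str:
        return f"file:///tmp/{len(prompt)}.png"


def generate_image(backend: ImageBackend, prompt: str) -> str:
    # Synchronous path, comparable to a tool's _run method.
    return backend.generate(prompt)


async def agenerate_image(backend: ImageBackend, prompt: str) -> str:
    # Async path: run the blocking backend call in a worker thread so the
    # event loop stays responsive (asyncio.to_thread needs Python 3.9+).
    return await asyncio.to_thread(backend.generate, prompt)


print(generate_image(FakeBackend(), "a red fox"))
print(asyncio.run(agenerate_image(FakeBackend(), "a red fox")))
```

With this split, switching providers means writing one new class with a `generate` method. The tool code does not change.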

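The tool's output is fed back into the model on both success and failure. The `ToolException` message must therefore give the model information it can use to keep interacting with the human in the loop. For LangChain to pass that message back instead of re-raising the exception, `handle_tool_error` must be enabled on the tool.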

```python
from prodiapy import Prodia
from langchain.tools import BaseTool
from langchain_core.tools import ToolException
from langchain.callbacks.manager import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)


# One shared Prodia client for all tool calls.
prodia = Prodia(
    api_key=config.prodia_api_key.get_secret_value(),
)


class ImageGenerator(BaseTool):
    """
    The ImageGenerator class generates an image based on the given prompt.
    """

    name: str = "Image Generator"
    description: str = "Generates an image based on the given prompt and returns the URL of the generated image."
    # Without this, LangChain re-raises ToolException instead of returning
    # its message to the agent as the observation.
    handle_tool_error: bool = True

    def _run(
        self, prompt: str, run_manager: CallbackManagerForToolRun | None = None
    ) -> str:
        """
        Generates an image based on the given prompt.

        Args:
            prompt (str): The prompt for generating the image. A prompt is a text that provides instructions or
                        guidance for generating the image. It can be a sentence, a paragraph, or even a short
                        phrase. The generated image will be based on the content and context of the prompt.

                        Examples of prompts:
                        - "Create a beautiful landscape with mountains and a sunset."
                        - "Design a futuristic cityscape with flying cars and skyscrapers."
                        - "Paint a portrait of a person with a mysterious expression."

            run_manager (CallbackManagerForToolRun | None): An optional callback manager for tool run events.

        Returns:
            str: The URL of the generated image.

        Raises:
            ToolException: If there is an error generating the image.

        """
        try:
            # Submit an SDXL job to Prodia and block until it finishes.
            job = prodia.sdxl.generate(
                prompt=prompt,
                model=config.sd_model,
                negative_prompt=config.sd_negative_prompt,
                style_preset=config.sd_style_preset,
            )
            result = prodia.wait(job)

            return str(result.image_url)
        except Exception as e:
            # This message becomes the agent's observation, so it is phrased
            # as an instruction the model can act on.
            raise ToolException(
                "Failed to generate image. Tell the user that you are sorry but you cannot show them a picture right now."
            ) from e

    async def _arun(
        self, prompt: str, run_manager: AsyncCallbackManagerForToolRun | None = None
    ) -> str:
        """
        Use the tool asynchronously (not implemented in this notebook).

        Args:
            prompt (str): The prompt for generating the image.
            run_manager (AsyncCallbackManagerForToolRun | None): An optional callback manager for async tool run events.

        Returns:
            str: The URL of the generated image.
        """
        raise NotImplementedError("ImageGenerator does not support async")
```


## ReAct Prompting

ReAct prompting is a framework for interacting with LLMs that combines reasoning and acting. It encourages the LLM to generate step-by-step reasoning traces leading up to a final answer, including decisions about when to call which function. Here we use a simple ReAct prompt template I have hosted on the LangChain Hub (`afterst0rm/react-template`), but the prompt can be customized to your needs.

Note that combining ReAct prompting with your existing prompt-engineering strategies may prove difficult, and that tool usage increases the number of generated tokens. Even so, it is a relatively safe and reliable way to let the model decide on its own when it needs to generate an image.

Other components can be plugged into this chain with relative ease. For example, suppose we want to keep a history of the images the model has shown a user. The `ImageGenerator` tool could store each prompt in a searchable database, and the agent could query it whenever the user asks whether the model remembers something.
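Here is a minimal sketch of that history idea using Python's built-in `sqlite3` module. The table layout and the `open_store`, `remember`, and `recall` helpers are hypothetical. Under this design, `_run` would call `remember` after a successful generation. `recall` could be exposed to the agent as a second tool.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the table of images shown to the user."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS shown_images (prompt TEXT NOT NULL, url TEXT NOT NULL)"
    )
    return conn


def remember(conn: sqlite3.Connection, prompt: str, url: str) -> None:
    with conn:  # commits the insert if it succeeds
        conn.execute("INSERT INTO shown_images (prompt, url) VALUES (?, ?)", (prompt, url))


def recall(conn: sqlite3.Connection, keyword: str) -> list[tuple[str, str]]:
    # Naive substring match (LIKE is case-insensitive for ASCII in SQLite);
    # full-text or vector search would scale better.
    cursor = conn.execute(
        "SELECT prompt, url FROM shown_images WHERE prompt LIKE ? ORDER BY rowid",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


conn = open_store()
remember(conn, "Two small cats playing with a ball of yarn", "https://example.invalid/cats.png")
remember(conn, "A lighthouse at night", "https://example.invalid/lighthouse.png")
print(recall(conn, "cats"))
```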

```python
from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_community.chat_models import ChatAnyscale
from langchain.memory import ConversationBufferWindowMemory

tools = [ImageGenerator()]

# ReAct prompt template pulled from the LangChain Hub.
prompt = hub.pull("afterst0rm/react-template")

# Keep the last 5 exchanges as chat history for the agent.
memory = ConversationBufferWindowMemory(
    memory_key="chat_history", k=5, return_messages=True, output_key="output"
)

llm = ChatAnyscale(
    api_key=config.anyscale_api_key.get_secret_value(),
    model=config.llm_model,
    max_tokens=config.max_tokens,
    temperature=config.temperature,
)

agent = create_react_agent(llm=llm, tools=tools, prompt=prompt)

agent_executor = AgentExecutor(
    agent=agent,
    tools=tools,
    memory=memory,
    early_stopping_method="generate",
    # Send unparseable model output back to the model instead of raising.
    handle_parsing_errors=True,
    return_intermediate_steps=True,
    verbose=True,
)
```


## The model infers what it has to do

As the execution below shows, the model itself decides when it has to generate an image. The agent executor then calls the function and feeds its output back into the model. The tool is currently very simple and returns a bare URL string, which sometimes confuses the model. The easiest fix is to return structured output, such as JSON with self-descriptive field names.

This prototype is not robust. Even so, the model generates the image and replies with a link to it after recovering from one output-format error, although it was never asked directly for an image. The ReAct prompt can be improved by analyzing anonymized chat history to learn how users typically ask for pictures, then adjusting the prompt accordingly.

```python
agent_executor.invoke(
    {
        "input": "I had a dream the other day where I saw two small cats playing with a ball of yarn. It was so cute! I wish I could have a cat like that or at least see something like that again."
    }
)
```

[1m> Entering new AgentExecutor chain...[0m
[32;1m[1;3m The user wants to see an image of two small cats playing with a ball of yarn. I can use the Image Generator tool to fulfill this request.

Action: Image Generator
Action Input: Two small cats playing with a ball of yarn

Observ[0m[34mprodiapy[39m 19:44:49.601 - [[92mSUCCESS[39m]: Got result: {'job': '584da3b6-7a5e-4522-b4ca-dcd391ca7966', 'status': 'succeeded', 'imageUrl': 'https://images.prodia.xyz/584da3b6-7a5e-4522-b4ca-dcd391ca7966.png'}
[36;1m[1;3mhttps://images.prodia.xyz/584da3b6-7a5e-4522-b4ca-dcd391ca7966.png[0m[32;1m[1;3m I have generated an image based on your description. Here is the URL for you to view it: https://images.prodia.xyz/584da3b6-7a5e-4[0mInvalid Format: Missing 'Action:' after 'Thought:[32;1m[1;3m It seems like there was an issue with the response formatting. I'll correct it and continue with the process.

Thought: I have generated an image based on your description. Here is the URL for 

{'input': 'I had a dream the other day where I saw two small cats playing with a ball of yarn. It was so cute! I wish I could have a cat like that or at least see something like that again.',
 'chat_history': [],
 'output': 'You can see the image of two small cats playing with a ball of yarn here: https://images.prodia.xyz/584da3b6-7a5e-4522-b4ca-dcd391ca7966.png',
 'intermediate_steps': [(AgentAction(tool='Image Generator', tool_input='Two small cats playing with a ball of yarn\n\nObserv', log=' The user wants to see an image of two small cats playing with a ball of yarn. I can use the Image Generator tool to fulfill this request.\n\nAction: Image Generator\nAction Input: Two small cats playing with a ball of yarn\n\nObserv'),
   'https://images.prodia.xyz/584da3b6-7a5e-4522-b4ca-dcd391ca7966.png'),
  (AgentAction(tool='_Exception', tool_input="Invalid Format: Missing 'Action:' after 'Thought:", log=' I have generated an image based on your description. Here is the URL for you to view