# Self-evaluation of Generation Augmented by Tools

---

## Demonstration notebook

-----

Retrieval Augmented Generation (RAG) and, more generally, Generation Augmented by Tools (GAT) can greatly improve LLM capabilities on non-language tasks and on tasks that require retrieving information, for example from databases.

However, many LLMs are already available, and the strong interest in the field means more will follow. While public datasets are available to test general LLM performance, it is difficult to evaluate a GAT system developed for a specific business task.

This work introduces a method for self-evaluation of GAT. This is done by providing tools that allow the LLM to generate domain-specific responses and verify that those responses are correct. There are many problems whose solution may be difficult to compute but whose correctness is easy to verify.
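
The equation `x^2 - 1 = 0`, used as a sample question later in this notebook, illustrates the asymmetry: checking candidate solutions only requires evaluating the polynomial. The following is a minimal sketch of such a check, not the notebook's actual verification tool; `verify_roots` is a hypothetical helper name introduced here for illustration.

```python
def verify_roots(coeffs, candidates, tol=1e-9):
    """Check that every candidate is a root of a polynomial.

    coeffs lists the coefficients from the highest degree down,
    so [1, 0, -1] represents x^2 - 1.
    """
    def poly(x):
        # Horner's rule: evaluate the polynomial in one pass.
        result = 0.0
        for c in coeffs:
            result = result * x + c
        return result

    return all(abs(poly(x)) <= tol for x in candidates)


print(verify_roots([1, 0, -1], [1, -1]))  # True
print(verify_roots([1, 0, -1], [1, 2]))   # False
```

Verification never needs to know how the candidates were produced, which is what allows the same check to score answers from different LLMs.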


## Running on AWS SageMaker

This notebook runs on AWS SageMaker Studio in the Data Science 3.0 image.


### Claude notes

Allowed Claude types are string, integer, and number.

                "priority": {
                    "type": "integer",
                    "minimum": 1,
                    "maximum": 5,
                    "default": 3,
                    "description": "The priority level of the note (1-5)."
                },
                "is_public": {
                    "type": "boolean",
                    "default": False,
                    "description": "Indicates whether the note is publicly accessible."
                }
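
Taking the note above at face value, a tool description can be scanned for property types outside the listed set before it is sent to the model. This is a minimal sketch: `ALLOWED_TYPES` simply mirrors the note and has not been checked against the Claude API, and `find_disallowed_properties` is a hypothetical helper name.

```python
# Types listed in the note above; an assumption, not verified against the API.
ALLOWED_TYPES = {"string", "integer", "number"}


def find_disallowed_properties(properties):
    """Map each property name to its type when that type is not in ALLOWED_TYPES."""
    return {
        name: spec.get("type")
        for name, spec in properties.items()
        if spec.get("type") not in ALLOWED_TYPES
    }


# The two properties from the snippet above.
properties = {
    "priority": {
        "type": "integer",
        "minimum": 1,
        "maximum": 5,
        "default": 3,
        "description": "The priority level of the note (1-5).",
    },
    "is_public": {
        "type": "boolean",
        "default": False,
        "description": "Indicates whether the note is publicly accessible.",
    },
}

print(find_disallowed_properties(properties))  # {'is_public': 'boolean'}
```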

## Setup

In [None]:
try:
    import gradio as gr
    import qrcode
except ImportError:
    %pip install gradio -q
    %pip uninstall typing-extensions -y -q
    %pip install -U typing-extensions -q
    %pip install matplotlib -q
    %pip install qrcode

## Initialize LLM

In [None]:
import json

import boto3
import botocore

%load_ext autoreload
%autoreload 2

import llm_invoker as inv

In [None]:
config = botocore.client.Config(connect_timeout=9000, read_timeout=9000, region_name="us-west-2")  # us-east-1  us-west-2
bedrock_client = boto3.client(service_name='bedrock-runtime', config=config)

In [None]:
# llm = inv.LLM_Bedrock.get_llm(bedrock_client, 'Mistral Mixtral 8x7B')
# llm = inv.LLM_Bedrock.get_llm(bedrock_client, 'Claude 2.1')
# llm = inv.LLM_Bedrock.get_llm(bedrock_client, 'Llama2 13b')
# llm = inv.LLM_Bedrock.get_llm(bedrock_client, 'Llama2 70b')
# llm = inv.LLM_Bedrock.get_llm(bedrock_client, 'Claude 3 Sonnet')
llm = inv.LLM_Bedrock.get_llm(bedrock_client, 'Claude 3 Haiku')
# llm = inv.LLM_Bedrock.get_llm(bedrock_client, 'Claude 3 Opus')

# enable this if the LLM has native support for tool use, like Claude
use_native_LLM_tools = True

In [None]:
"""
ans = llm("and at night? Enclose your answer within <my_ans></my_ans> tags. Then explain further.",
          chat_history=[["What color is the sky?", "Blue"]],
          system_prompt="You are a very knowledgeable truck driver. Use a strong truck driver's language and make sure to mention your name is Jack.",
          postpend="Such",
          # extra_stop_sequences=['</my_ans>']
         )
prev = ""
for x in ans:
    cur_ans = x
    # print(cur_ans.replace(prev, ''))
    # prev = cur_ans
    print('.', end='')
print('\n')
print(x)
"""

## Initialize Tools

In [None]:
from tools.base import LLMTools
from prompts.prompt_generator import RAGPromptGenerator
lt = LLMTools(query_llm=llm)

tool_descriptions = lt.get_tool_descriptions()

# Uncomment to take a look at all descriptions
# print(tool_descriptions)

In [None]:
rpg = RAGPromptGenerator(use_native_tools=use_native_LLM_tools)

# Uncomment to look at the base prompt
# print(rpg.prompt)

In [None]:
system_prompt = rpg.prompt.replace('{{TOOLS}}', tool_descriptions)

# Uncomment to take a look at the final prompt
# print(system_prompt)

### Test native tool calling

In [None]:
cur_tools = [x.tool_description for x in lt.tools]

ans = llm("What date will it be 10 days from now? Today is June 4, 2024. Use your tool do_date_math. Before calling any tools, explain your thoughts.",
# ans = llm("What is the weather like in the city of Rio de Janeiro?",
          chat_history=[["What color is the sky?", "Blue"]],
          system_prompt="You are a helpful assistant. Prefer to use tools when possible. Never mention tool names in the answer.",
          # postpend="Such",
          tools=cur_tools,
          tool_invoker_fn=lt.invoke_tool,
          # extra_stop_sequences=['</my_ans>']
         )

In [None]:
"""
prev = ""
for x in ans:
    cur_ans = x
    # print(cur_ans.replace(prev, ''))
    # prev = cur_ans
    print('.', end='')
print(cur_ans)
"""

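The commented-out loops in this notebook print only the new part of each streamed answer via `cur_ans.replace(prev, '')`, which suggests that the generator yields the cumulative text so far. Under that assumption, a prefix check is a slightly safer way to extract the new text, because `replace` would also remove repeated occurrences of `prev` elsewhere in the answer. `iter_deltas` is a hypothetical helper name, not part of `llm_invoker`.

```python
def iter_deltas(cumulative_chunks):
    """Yield only the text added by each chunk of a cumulative stream."""
    prev = ""
    for cur in cumulative_chunks:
        if cur.startswith(prev):
            # Normal case: the new chunk extends the previous one.
            delta = cur[len(prev):]
        else:
            # The stream restarted or was rewritten: emit the whole chunk.
            delta = cur
        prev = cur
        if delta:
            yield delta


# Simulated cumulative stream standing in for the LLM generator.
stream = ["He", "Hello", "Hello wo", "Hello world"]
print(list(iter_deltas(stream)))  # ['He', 'llo', ' wo', 'rld']
```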

### Test tools

In [None]:
# ans = lt.invoke_tool('get_url_content', internet_urls='https://www1.folha.uol.com.br, https://g1.globo.com/', prompt="Summarize the contents.")
# print(ans)

In [None]:
# ans = lt.invoke_tool('use_ffmpeg', ffmpeg_arguments="-i input.mp4 -ss 10 -t 20 output.mp4")
# print(ans)

In [None]:
# ans = lt.invoke_tool('make_qr_code', qr_text='Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!Hello World!')
# print(ans)

In [None]:
f = r"""test.txt
test.txt
"""
# ans = lt.invoke_tool('read_local_files', path_to_files=f)
# print(ans)

In [None]:
# f = '/root/Experiments/llm-Claude-tests/meeting_notes/'
# ans = lt.invoke_tool('read_file_names_in_local_folder', path_to_folder=f)
# print(ans)

## Initialize Interfaces

In [None]:
from llm_interface import LLMInterface
li = LLMInterface(
    system_prompt=system_prompt,
    llm=llm,
    llm_tools=lt,
    rpg=rpg
)

In [None]:
# print(system_prompt)

## Run locally

In [None]:
q = """ Answer the following <questions>:
<questions>
<question>What day is today?</question>
<question>Make a plot of y=x^2</question>
</questions>"""

q = "What are the solutions to the equation: x^2 - 1 = 0"
q1 = """Generate 3 questions that you can answer using your tools.
After generating the questions, compute the correct answer using the tools.
Then, output your answer in the format:
<question_answers>
<question_answer>
<question>(Question that you can answer with your tools)</question>
<expected_answer>(Correct answer, calculated using the tools)</expected_answer>
</question_answer>
</question_answers>"""
q2 = "I live in Florence IT. Fetch from the internet relevant news today. Summarize the news for me."

# ans = li.chat_with_function_caller(q, image=None, ui_history=[])
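
The `<question_answers>` format requested in `q1` is what makes the generated questions reusable for evaluation: each question and its tool-computed answer can be extracted and later put to another LLM. Below is a minimal sketch of that extraction. It uses a regular expression rather than an XML parser, on the assumption that the model may wrap the block in extra prose or emit characters that are not valid XML. `parse_question_answers` is a hypothetical helper name, not part of `llm_interface`.

```python
import re

# One <question_answer> element with its two children, tolerant of whitespace.
_QA_PATTERN = re.compile(
    r"<question_answer>\s*"
    r"<question>(.*?)</question>\s*"
    r"<expected_answer>(.*?)</expected_answer>\s*"
    r"</question_answer>",
    re.DOTALL,
)


def parse_question_answers(text):
    """Return a list of (question, expected_answer) pairs found in text."""
    return [(q.strip(), a.strip()) for q, a in _QA_PATTERN.findall(text)]


# Made-up model output, used only to exercise the parser.
sample = """Here are my questions.
<question_answers>
<question_answer>
<question>What is 2 + 2?</question>
<expected_answer>4</expected_answer>
</question_answer>
<question_answer>
<question>What are the solutions to x^2 - 1 = 0?</question>
<expected_answer>x = 1 and x = -1</expected_answer>
</question_answer>
</question_answers>"""

for question, expected in parse_question_answers(sample):
    print(question, "->", expected)
```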

In [None]:
"""
# uncomment this to see the answer in the notebook
import time
t0 = time.time()

prev = ""
for x in ans:
    cur_ans = x[3][-1][1]
    try:
        print(cur_ans.replace(prev, ''))
    except:
        pass
    prev = cur_ans
    # print('.', end='')
"""

In [None]:
# print(cur_ans)

In [None]:
# use your tools to verify if the solution satisfies all requirements of the problem

## Run with Gradio

In [None]:
examples=[
    'Give me a summary of your tools and what they do. Answer with a table.',
    'What is in the image?',
    "What day is today?",
    "If I invest $100 with an interest rate of 1% per month, how much will I have in 3 years?",
    "Make a plot of y=x^2",
    "What are the solutions to the equation: x^2 - 1 = 0",
    "If Mark has 3 times more apples than John and they have 40 apples in total, how many apples do each have?",
    "Evaluate the expression exp(2)+sin(4)",
    'Make a vCard-style QR code for me. Ask me for any information you need.',
    "I live in Florence IT. Fetch from the internet relevant news today.",
    "Summarize the economics news in https://www.economist.com/ and https://www.theguardian.com/business/economics . Check which articles show on both or only on one. Answer with a table.",
    "List all your tools. Summarize what each tool does and generate 3 sample questions that it could answer. Answer with a table.",
]
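
Two of the example questions above have closed-form answers, which makes them convenient sanity checks when comparing LLMs. The sketch below computes reference values directly in Python, independently of the notebook's tools. The function names are hypothetical, and the apples question is read as "Mark has 3 times as many apples as John", which is an interpretation of the wording.

```python
def compound_balance(principal, monthly_rate, months):
    """Balance after compounding monthly at a fixed rate."""
    return principal * (1 + monthly_rate) ** months


def split_apples(total, ratio):
    """Return (mark, john) when Mark has ratio times John's apples."""
    john = total / (ratio + 1)
    return ratio * john, john


# $100 at 1% per month for 3 years (36 months).
print(round(compound_balance(100, 0.01, 36), 2))  # 143.08

# 40 apples in total, Mark has 3 times John's share.
print(split_apples(40, 3))  # (30.0, 10.0)
```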

In [None]:
def msg_forward_func(msg, img_input, history, request: gr.Request):
    # print(request)
    ans_gen = li.chat_with_function_caller(msg, img_input, history, username=request.username)
    for x in ans_gen:
        txtbox, scratchpad_info, img_input, cur_history = x
        yield txtbox, scratchpad_info, img_input, cur_history


In [None]:
with gr.Blocks() as demo:
    with gr.Column():
        with gr.Row():
            with gr.Column(scale=2):
                chatbot = gr.Chatbot(label="Assistant", elem_id="chatbot")
            with gr.Column(scale=1):
                image_input = gr.Image(label='Input Image')

        msg2 = gr.Dropdown(
            examples, label="Question", info="Select or type a question", allow_custom_value=True
        )

        with gr.Row():
            send_btn = gr.Button("Send")
            clear = gr.ClearButton([msg2, image_input, chatbot])

        scratchpad = gr.Textbox(label="Scratchpad")

    gr.on(
        # triggers=[msg.submit, send_btn.click],
        triggers=[send_btn.click],
        fn=msg_forward_func,  # li.chat_with_function_caller,  # respond,
        inputs=[msg2, image_input, chatbot],
        outputs=[msg2, scratchpad, image_input, chatbot],
        concurrency_limit=20
    )

demo.queue().launch(show_api=False, share=False, inline=False)

In [None]:
lt.invoke_log

In [None]:
# demo.close()