62 changes: 62 additions & 0 deletions LLM_TOOL_GUIDANCE.md
@@ -0,0 +1,62 @@
# Guidance for LLM Tool Usage (ToolBench)

This document provides guidance on how current Large Language Models (LLMs) such as LLaMA-2, GPT-3.5, and GPT-4 can leverage the ToolBench dataset to learn to use tools effectively.

## Introduction
ToolBench provides a large-scale, high-quality instruction tuning dataset. The dataset includes real-world API documentation, complex multi-tool queries, and complete reasoning traces using methods like ReAct or Depth-First Search-based Decision Trees (DFSDT).

## 1. Defining Tools in the System Prompt
To enable an LLM to use tools, the available functions must be explicitly defined in the system prompt. ToolBench uses a standard JSON schema format to define tools.

### Example Tool Definition
```json
{
  "name": "transitaire_for_transitaires",
  "description": "This is the subfunction for tool \"transitaires\", you can use this tool.The description of this function is: \"R\u00e9cup\u00e8re un transitaire donn\u00e9e\"",
  "parameters": {
    "type": "object",
    "properties": {
      "is_id": {
        "type": "string",
        "description": "",
        "example_value": "DOUANE_AGENCE_GONDRAND"
      }
    },
    "required": ["is_id"],
    "optional": []
  }
}
```

### Prompt Engineering
Your system prompt should instruct the model to use the tools effectively. A common structure is:
1. Define the persona (e.g., "You are an AI assistant that can use tools...").
2. Provide the tools schema as shown above.
3. Enforce an execution format (e.g., "Output your thought process, followed by an Action, and then the Action Input in JSON format").
4. Specify a "Finish" tool or action when the task is complete.
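The structure above can be sketched as a small prompt-building helper. This is a minimal illustration, not a prescribed ToolBench prompt: the `build_system_prompt` function, the abbreviated tool schema, and the exact wording of the instructions are all assumptions for the example.

```python
import json

# Abbreviated, hypothetical tool schema following the ToolBench JSON format.
TOOLS = [
    {
        "name": "transitaire_for_transitaires",
        "description": "Retrieves a given freight forwarder.",
        "parameters": {
            "type": "object",
            "properties": {"is_id": {"type": "string"}},
            "required": ["is_id"],
        },
    }
]


def build_system_prompt(tools):
    """Assemble a ReAct-style system prompt from tool schemas."""
    return (
        # 1. Persona
        "You are an AI assistant that can use tools to answer user queries.\n"
        # 2. Tool schemas
        "Available tools:\n"
        f"{json.dumps(tools, indent=2)}\n"
        # 3. Execution format
        "For each step, output:\n"
        "Thought: your reasoning\n"
        "Action: the tool name\n"
        "Action Input: the arguments as JSON\n"
        # 4. Termination
        "Call the special `Finish` action when the task is complete."
    )


print(build_system_prompt(TOOLS))
```

The four numbered elements map directly onto the four commented sections of the prompt string.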

## 2. Using the Dataset for Supervised Fine-Tuning (SFT)
The provided script `scripts/copy_datasets_to_jsonl.py` extracts the `query`, `function` schemas, and `train_messages` from the ToolBench annotations.

### Data Structure
The `training_data.jsonl` output contains each instance as a JSON record with the conversational trace. To fine-tune an LLM, this must be mapped to your chosen chat format (e.g., ChatML).

**Example Mapping (ChatML Style):**
- **System Role**: Inject the tool schemas (`function` key) and instructions here.
- **User Role**: The human's `query`.
- **Assistant Role**: The model's reasoning (`Thought`), tool selection (`Action`), and parameters (`Action Input`), or the `final_answer`.
- **Function/Tool Role**: The simulated response from the API.
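The mapping above can be sketched in a few lines. This assumes the record layout (`query`, `functions`, `train_messages`) produced by the extraction script in this PR, and that each entry in `train_messages` is a flat message dict with a `role` key; the role names chosen for the output are one plausible ChatML-style convention, not a fixed standard.

```python
import json


def to_chatml(record):
    """Map one training_data.jsonl record to a ChatML-style message list."""
    messages = [
        {
            # System role: tool schemas plus instructions.
            "role": "system",
            "content": "You can call these tools:\n" + json.dumps(record["functions"]),
        },
        {
            # User role: the human's query.
            "role": "user",
            "content": record["query"],
        },
    ]
    for msg in record["train_messages"]:
        # Map ToolBench's "function" role onto ChatML's "tool" role;
        # assistant turns (Thought / Action / Action Input) pass through.
        role = "tool" if msg.get("role") == "function" else msg.get("role", "assistant")
        messages.append({"role": role, "content": msg.get("content", "")})
    return messages
```

If your ToolBench annotations store `train_messages` in a nested form (e.g. one list per reasoning step), flatten or select the final trajectory before applying this mapping.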

By training on these complete trajectories, the LLM learns to reason (CoT/ReAct) and output valid JSON for function calls.

## 3. The Inference Loop
During inference, the model cannot execute tools itself; it relies on an execution loop:

1. **Prompt the Model:** Provide the user query and the available tools in the system prompt.
2. **Model Generates Action:** The LLM outputs a `function_call` (or text indicating an action and parameters).
3. **Execution:** Your system parses the action, calls the real API or mocked environment, and receives a result.
4. **Provide Result:** Append the result as a new message (role: `function` or `tool`) to the conversation history.
5. **Iterate:** Feed the updated history back to the model. It will either take another action or use the `Finish` tool to return the `final_answer`.
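The five steps above can be sketched as a skeleton loop. Everything here is a placeholder: `model` is any callable mapping a message list to an assistant message dict, `tools` maps tool names to Python callables, and `FakeModel` is a stand-in used only to demonstrate the control flow.

```python
import json


def run_inference_loop(model, tools, user_query, max_steps=8):
    """Drive the model/executor loop until Finish or max_steps."""
    messages = [
        {"role": "system", "content": "Use the available tools. Call Finish when done."},
        {"role": "user", "content": user_query},
    ]
    for _ in range(max_steps):
        reply = model(messages)                 # step 2: model generates an action
        messages.append(reply)
        call = reply.get("function_call")
        if call is None or call["name"] == "Finish":
            return reply                        # step 5: final answer reached
        result = tools[call["name"]](**json.loads(call["arguments"]))  # step 3: execute
        messages.append({                       # step 4: feed the result back
            "role": "function",
            "name": call["name"],
            "content": json.dumps(result),
        })
    return messages[-1]


class FakeModel:
    """Toy model: calls `echo` once, then finishes."""

    def __init__(self):
        self.step = 0

    def __call__(self, messages):
        self.step += 1
        if self.step == 1:
            return {"role": "assistant", "content": "",
                    "function_call": {"name": "echo",
                                      "arguments": json.dumps({"x": 1})}}
        return {"role": "assistant", "content": "done",
                "function_call": {"name": "Finish", "arguments": "{}"}}
```

A real deployment would replace `FakeModel` with an LLM call and add parsing/validation of the generated `Action Input` before executing it.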

## 4. Advanced: DFSDT vs ReAct
ToolBench uses DFSDT (Depth-First Search-based Decision Tree), which allows the model to explore multiple paths and backtrack if an API fails or returns poor results. Simple fine-tuning often uses the standard ReAct loop (Thought -> Action -> Observation); more advanced setups run a tree search at inference time, using the model's self-evaluation capabilities to rank candidate actions.
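The backtracking idea can be illustrated with a generic depth-first search. This is a conceptual sketch, not ToolBench's implementation: `propose`, `evaluate`, and `is_terminal` are placeholders for the LLM calls that would generate candidate actions, score them via self-evaluation, and detect a finished trajectory.

```python
def dfs_decision_tree(state, propose, evaluate, is_terminal, depth=0, max_depth=5):
    """Expand candidate actions best-first and backtrack on dead ends."""
    if is_terminal(state):
        return state
    if depth >= max_depth:
        return None
    # Try the highest-scoring candidate first (self-evaluation ordering).
    for action in sorted(propose(state), key=evaluate, reverse=True):
        result = dfs_decision_tree(state + [action], propose, evaluate,
                                   is_terminal, depth + 1, max_depth)
        if result is not None:          # first successful branch wins
            return result
    return None                         # dead end: backtrack to the parent
```

In the real setting, a "dead end" corresponds to a failed API call or an unhelpful observation, and `evaluate` would be the model judging its own candidate actions.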
47 changes: 47 additions & 0 deletions scripts/copy_datasets_to_jsonl.py
@@ -0,0 +1,47 @@
import argparse
import json
import os


def process_datasets(input_dir, output_file):
    print(f"Reading dataset files from {input_dir}")
    print(f"Writing parsed jsonl output to {output_file}")

    # Validate the input directory before opening (and truncating) the output file.
    if not os.path.exists(input_dir):
        print(f"Error: {input_dir} does not exist.")
        return

    total_lines_written = 0
    with open(output_file, 'w', encoding='utf-8') as outfile:
        # Walk the whole tree so subdirectories such as G1_answer, G2_answer,
        # and G3_answer are processed along with the directory itself.
        for root, _dirs, files in os.walk(input_dir):
            for filename in files:
                if not filename.endswith(".json"):
                    continue
                filepath = os.path.join(root, filename)
                try:
                    with open(filepath, 'r', encoding='utf-8') as infile:
                        data = json.load(infile)

                    answer_gen = data.get("answer_generation")
                    if answer_gen and "train_messages" in answer_gen and answer_gen.get("valid_data", False):
                        # Extract the fields needed for supervised fine-tuning.
                        record = {
                            "query": answer_gen.get("query", ""),
                            "functions": answer_gen.get("function", []),
                            "train_messages": answer_gen["train_messages"],
                        }
                        outfile.write(json.dumps(record, ensure_ascii=False) + '\n')
                        total_lines_written += 1
                except Exception as e:
                    print(f"Error processing {filepath}: {e}")

    print(f"Successfully processed and wrote {total_lines_written} lines to {output_file}.")


if __name__ == '__main__':
    parser = argparse.ArgumentParser(description="Convert ToolBench datasets to JSONL")
    parser.add_argument("--input_dir", type=str, default="data_example/answer",
                        help="Input directory containing JSON files")
    parser.add_argument("--output_file", type=str, default="training_data.jsonl",
                        help="Output JSONL file path")

    args = parser.parse_args()
    process_datasets(args.input_dir, args.output_file)