<a href="https://colab.research.google.com/github/Arpit1118/Post-Training-LLMs-with-RL/blob/main/LLM_Tool_Calling_and_RLHF.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
import torch
import sympy as sp
import json
import re
from transformers import AutoTokenizer, AutoModelForCausalLM

# --- Global Variables ---
model_name = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = None
model = None

# --- Math Solver Class (Tool Implementation) ---
class MathSolver:
    def __init__(self, variable='x'):
        # Define the variable 'x' for symbolic math
        self.x = sp.Symbol(variable)

    def solve_equation(self, equation_str):
        """Solves an algebraic equation for 'x'."""
        try:
            # Separate LHS and RHS, then move all terms to the left
            if '=' in equation_str:
                lhs, rhs = equation_str.split('=')
                expr = sp.sympify(lhs) - sp.sympify(rhs)
            else:
                expr = sp.sympify(equation_str) # Assume it's already an expression equal to zero

            roots = sp.solve(expr, self.x)
            numeric = [sp.N(r) for r in roots]

            return {
                "success": True,
                "symbolic": [str(r) for r in roots],
                "numeric": [str(n) for n in numeric]
            }
        except Exception as e:
            return {"success": False, "error": str(e)}

    def evaluate_expression(self, expr_str):
        """Evaluates a basic math expression numerically."""
        try:
            # Use evalf() for numeric evaluation
            result = sp.sympify(expr_str).evalf()
            return {"success": True, "result": str(result)}
        except Exception as e:
            return {"success": False, "error": str(e)}

math_solver_instance = MathSolver()

In [2]:
# --- Function to load the model ---
def load_qwen_model():
    """Loads the Qwen model and tokenizer."""
    global tokenizer, model
    try:
        print(f"Loading Qwen model: {model_name}...")
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        # Use float32 for CPU compatibility
        model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float32)
        model.to('cpu').eval()
        print("Model loaded successfully.")
    except Exception as e:
        print(f"ERROR: Failed to load Qwen model/tokenizer. Error: {e}")

# Map the function names to their executable counterparts
AVAILABLE_TOOLS = {
    "solve_equation": math_solver_instance.solve_equation,
    "evaluate_expression": math_solver_instance.evaluate_expression,
}

# Define the tool specifications in Qwen's expected JSON format
MATH_TOOL_DEFINITION = """
[
    {
        "name": "solve_equation",
        "description": "Solves an algebraic equation for the variable 'x'. Use this for problems containing an equals sign, e.g., 'x**2 - 4 = 0'.",
        "parameters": {
            "type": "object",
            "properties": {
                "equation_str": {"type": "string", "description": "The equation to solve, e.g., 'x**2 - 4 = 0'."}
            },
            "required": ["equation_str"]
        }
    },
    {
        "name": "evaluate_expression",
        "description": "Calculates the numeric result of a math expression. Use this for calculations without an equals sign, e.g., '5*6' or 'sqrt(9)'.",
        "parameters": {
            "type": "object",
            "properties": {
                "expr_str": {"type": "string", "description": "The expression to evaluate, e.g., '2 + 3 * 4' or 'sqrt(9)'."}
            },
            "required": ["expr_str"]
        }
    }
]
"""

In [3]:
# --- Tool Execution Function ---
def execute_tool_call(tool_name, tool_args):
    """Executes the specified tool with arguments."""
    tool_func = AVAILABLE_TOOLS.get(tool_name)
    if tool_func:
        try:
            # Pass arguments directly to the MathSolver functions
            return tool_func(**tool_args)
        except Exception as e:
            return {"success": False, "error": str(e)}
    else:
        return {"success": False, "error": f"Tool '{tool_name}' not found."}

# --- Tool Call Extraction Function (Robust Parsing) ---
def extract_tool_call_json(response_text):
    """
    Attempts to extract the tool call JSON, searching first for Qwen tags,
    then falling back to looking for bare JSON content.
    Returns (tool_call_text, tool_call_match_object).
    """
    # 1. Search for the standard Qwen action tags
    primary_match = re.search(r"(<\|action_start\|>)(.*?)(\<\|action_end\|>)", response_text, re.DOTALL)
    if primary_match:
        return primary_match.group(0), primary_match

    # 2. Fallback: Search for standalone JSON with "name" and "arguments"
    json_search = re.search(r"(\{[\s\n]*\"name\".*?\"arguments\".*?\}(?:\n|\s|\}))", response_text, re.DOTALL)
    if json_search:
        raw_json_content = json_search.group(1).strip().replace("<|im_end|>", "").strip()
        try:
            # Validate JSON and construct a proper tool call string for the main logic
            json.loads(raw_json_content)
            tool_call_text = f"<|action_start|>\n{raw_json_content}\n<|action_end|>"

            # Create a simplified mock match object
            class MockMatch:
                def group(self, index):
                    if index == 0: return tool_call_text
                    if index == 2: return raw_json_content
                    raise IndexError

            print("[Warning: Fallback JSON parsing successful. Model output was missing action tags.]")
            return tool_call_text, MockMatch()

        except json.JSONDecodeError:
            pass

    return None, None

In [4]:
SYSTEM_PROMPT = f"""
You are a helpful and precise assistant. You have access to the following math-solving tools:
{MATH_TOOL_DEFINITION}
When the user asks a mathematical question (equation solving or calculation), you **must** call the appropriate tool.
You **must** respond with the tool call exactly in the following format:
<|action_start|>
{{
  "name": "tool_name",
  "arguments": {{
    "arg1": "value1",
    "arg2": "value2"
  }}
}}
<|action_end|>
Do not output any introductory or conversational text before the tool call. Only after receiving the tool's result should you provide a natural language answer.
If the user's request is not a math problem, answer directly without a tool call.
"""

def generate_response(prompt):
    """Generates the Qwen model's response, handling tool calls in a ReAct loop."""

    if not model or not tokenizer:
        return "ERROR: Model not loaded."

    messages = [{"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": prompt}]

    # --- Step 1: Initial Generation (Tool Call) ---
    input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
    response_text = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=False).strip()

    tool_call_text, tool_call_match = extract_tool_call_json(response_text)

    if tool_call_match:
        print("\n[--- Tool Call Detected ---]")
        try:
            tool_call_json_str = tool_call_match.group(2).strip()
            tool_call_json = json.loads(tool_call_json_str)
            tool_name = tool_call_json["name"]
            tool_args = tool_call_json.get("arguments", {})

            print(f"  Tool: {tool_name}, Args: {tool_args}")
            tool_output = execute_tool_call(tool_name, tool_args)
            print(f"  Tool Result: {tool_output}")

            # --- Step 2: Rerun with Tool Output (Observation) ---
            # 1. Add model's tool-call message (Action)
            messages.append({"role": "assistant", "content": tool_call_text})
            # 2. Add tool's result message (Observation)
            messages.append({"role": "assistant", "content": f"The result of calling {tool_name} is: {tool_output}"})

            print("[--- Rerunning model for final answer ---]")
            final_input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
            final_output = model.generate(final_input_ids, max_new_tokens=512, do_sample=False, pad_token_id=tokenizer.eos_token_id)
            final_response_text = tokenizer.decode(final_output[0][final_input_ids.shape[1]:], skip_special_tokens=False).strip()

            # Clean special tokens from the final response
            final_response_text = re.sub(r"<\|action_start\|>.*?<\|action_end\|>", "", final_response_text, flags=re.DOTALL).strip()
            return final_response_text.replace("<|im_end|>", "").replace("<|im_start|>", "").strip()

        except (json.JSONDecodeError, KeyError) as e:
            print(f"[Warning: Tool execution/parsing failed. Returning raw output. Error: {e}]")

    # Fallback: Clean and return the initial raw response
    return response_text.replace("<|im_end|>", "").replace("<|im_start|>", "").strip()


if __name__ == "__main__":
    load_qwen_model()
    print("\nQwen Assistant with Math Solver Tool Ready. Type 'exit' to quit.")

    while True:
        try:
            user_input = input("\nUser >>> ")
            if user_input.lower() in ['exit', 'quit']:
                break

            response = generate_response(user_input)
            print(f"Qwen <<< {response}")

        except KeyboardInterrupt:
            print("\nExiting...")
            break
        except Exception as e:
            print(f"\nAn unexpected error occurred: {e}")
            break

Loading Qwen model: Qwen/Qwen2.5-1.5B-Instruct...


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

config.json:   0%|          | 0.00/660 [00:00<?, ?B/s]

`torch_dtype` is deprecated! Use `dtype` instead!


model.safetensors:   0%|          | 0.00/3.09G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/242 [00:00<?, ?B/s]

Model loaded successfully.

Qwen Assistant with Math Solver Tool Ready. Type 'exit' to quit.

User >>> What is the capital of France?


The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Qwen <<< France's capital is Paris.

User >>> What is the capital of India and Germany?
Qwen <<< India's capital is New Delhi, and Germany's capital is Berlin.

User >>> John has 5 pencils. He buys 3 more packs, each containing 4 pencils. How many pencils does John have in total?
Qwen <<< To determine how many pencils John has in total, we need to calculate the number of pencils he initially had and then add the number of pencils from the new packs he bought.

First, let's find out how many pencils John bought in total. Since he bought 3 packs and each pack contains 4 pencils, we multiply the number of packs by the number of pencils per pack:

\[ \text{Total pencils from packs} = 3 \times 4 = 12 \]

Now, we add these to the initial number of pencils John had:

\[ \text{Total pencils} = \text{Initial pencils} + \text{Pencils from packs} \]
\[ \text{Total pencils} = 5 + 12 \]
\[ \text{Total pencils} = 17 \]

So, John has a total of 17 pencils.

User >>> solve this: sqrt(2)**2 + log(exp(1