# Testing Fine-tuned PLLuM Model for Function Calling

This notebook demonstrates how to load and use the fine-tuned PLLuM model for function calling. After completing the fine-tuning process in `fine_tuning.ipynb`, you can use this notebook to test the model with your own queries and tool definitions.

The model has been fine-tuned to generate function calls in response to user queries in both Polish and English.

## Setup and Imports

In [None]:
import os
import json
import torch
import glob
from pathlib import Path
from dotenv import load_dotenv

# Import our fine-tuning utilities
from src.fine_tuning import (
    load_fine_tuned_model,
    generate_function_call,
)
from src.auth import login  # For Hugging Face authentication

# Load environment variables and authenticate with Hugging Face
load_dotenv()
login()

## Check for Available Models

Let's check for available fine-tuned models in the models directory.

In [None]:
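# Locate fine-tuned models, oldest first, so that model_dirs[-1] is the most recent
# (glob.glob returns paths in arbitrary order, so sort by modification time)
MODELS_DIR = "../models"
model_dirs = sorted(
    (d for d in glob.glob(f"{MODELS_DIR}/pllum-function-calling-*") if os.path.isdir(d)),
    key=os.path.getmtime,
)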

print("Available fine-tuned models:")
for i, model_dir in enumerate(model_dirs):
    # Try to load the training summary if available
    summary_path = os.path.join(model_dir, "training_summary.json")
    if os.path.exists(summary_path):
        with open(summary_path, 'r', encoding='utf-8') as f:
            summary = json.load(f)
        print(f"{i+1}. {os.path.basename(model_dir)} - Trained on: {summary.get('training_date', 'Unknown')}")
    else:
        print(f"{i+1}. {os.path.basename(model_dir)}")

if not model_dirs:
    print("No fine-tuned models found in the models directory.")
    print("Please run the fine_tuning.ipynb notebook first.")

In [None]:
# Select and load a model
if model_dirs:
    # Choose the most recent model (or you can select a specific one)
    MODEL_PATH = model_dirs[-1]  # Most recent model
    print(f"Loading model from: {MODEL_PATH}")
    
    model, tokenizer = load_fine_tuned_model(MODEL_PATH)
    print("Model loaded successfully.")
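else:
    # No models found, alert the user
    print("No fine-tuned models available.")
    MODEL_PATH = None
    # Define these names so later cells that reference them do not raise NameError
    model = tokenizer = None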

## Test with Dataset Examples

First, let's load the dataset and test with some examples from it.

In [None]:
# Load the dataset
DATASET_PATH = "../data/translated_dataset.json"

if os.path.exists(DATASET_PATH) and MODEL_PATH:
    with open(DATASET_PATH, 'r', encoding='utf-8') as f:
        dataset = json.load(f)
    print(f"Dataset loaded with {len(dataset)} examples.")
    
    # Test with a random example from the dataset
    import random
    example_idx = random.randint(0, len(dataset) - 1)
    example = dataset[example_idx]
    
    print(f"\nTesting with example {example_idx}:")
    print(f"Query: {example['query']}")
    print("\nAvailable tools:")
    for i, tool in enumerate(example['tools']):
        print(f"{i+1}. {tool['name']}: {tool['description']}")
    
    print("\nExpected answer:")
    print(json.dumps(example['answers'], indent=2, ensure_ascii=False))
    
    print("\nGenerating function call...")
    generated = generate_function_call(
        model=model,
        tokenizer=tokenizer,
        query=example['query'],
        tools=example['tools'],
        temperature=0.1
    )
    
    print("\nGenerated answer:")
    print(json.dumps(generated, indent=2, ensure_ascii=False))
elif not MODEL_PATH:
    print("Please load a model first.")
else:
    print(f"Dataset not found at {DATASET_PATH}")

## Test with Custom Examples

Now let's test with our own custom examples to see how the model generalizes.

In [None]:
# Define some custom tools for testing
weather_tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a location",
        "parameters": {
            "location": {
                "type": "string",
                "description": "The city and state or country (e.g., 'Warsaw, Poland')",
                "required": True
            },
            "unit": {
                "type": "string",
                "description": "The unit of temperature: 'celsius' or 'fahrenheit'",
                "required": False
            }
        }
    },
    {
        "name": "get_forecast",
        "description": "Get the weather forecast for the next N days",
        "parameters": {
            "location": {
                "type": "string",
                "description": "The city and state or country",
                "required": True
            },
            "days": {
                "type": "integer",
                "description": "Number of days for the forecast (1-10)",
                "required": True
            },
            "unit": {
                "type": "string",
                "description": "The unit of temperature: 'celsius' or 'fahrenheit'",
                "required": False
            }
        }
    }
]

calculator_tools = [
    {
        "name": "calculator",
        "description": "Perform basic arithmetic calculations",
        "parameters": {
            "expression": {
                "type": "string",
                "description": "The mathematical expression to evaluate",
                "required": True
            }
        }
    }
]

calendar_tools = [
    {
        "name": "create_calendar_event",
        "description": "Create a new calendar event",
        "parameters": {
            "title": {
                "type": "string",
                "description": "Title of the event",
                "required": True
            },
            "start_time": {
                "type": "string",
                "description": "Start time in ISO format",
                "required": True
            },
            "end_time": {
                "type": "string",
                "description": "End time in ISO format",
                "required": True
            },
            "description": {
                "type": "string",
                "description": "Description of the event",
                "required": False
            },
            "location": {
                "type": "string",
                "description": "Location of the event",
                "required": False
            }
        }
    }
]
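
The tool definitions above give each parameter a type and mark it as required or optional, which makes it possible to check a generated call mechanically before executing it. The following is a minimal sketch of such a check. `validate_function_call` and `demo_tools` are hypothetical names that are not part of `src.fine_tuning`, and the sketch assumes the model returns a dict with `name` and `arguments` keys, as the integration example in this notebook does.

```python
from typing import Any

# Maps the type names used in the tool definitions to Python types.
# Assumption: only these JSON-style type names appear in the definitions.
_TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
}


def validate_function_call(call: dict[str, Any], tools: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in `call`; an empty list means none were found."""
    tool = next((t for t in tools if t["name"] == call.get("name")), None)
    if tool is None:
        return [f"unknown tool: {call.get('name')!r}"]

    problems = []
    params = tool["parameters"]
    args = call.get("arguments", {})

    # Every parameter marked as required must be present.
    for name, spec in params.items():
        if spec.get("required") and name not in args:
            problems.append(f"missing required argument: {name}")

    # Every supplied argument must be declared and have the declared type.
    for name, value in args.items():
        if name not in params:
            problems.append(f"unexpected argument: {name}")
            continue
        expected = _TYPE_MAP.get(params[name]["type"])
        # bool is a subclass of int, so reject it explicitly for numeric types.
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (isinstance(value, bool) and expected is not bool)
        )
        if wrong_type:
            problems.append(f"argument {name} should be of type {params[name]['type']}")
    return problems


# Hypothetical tool definition in the same shape as the ones above.
demo_tools = [
    {
        "name": "get_forecast",
        "description": "Get the weather forecast for the next N days",
        "parameters": {
            "location": {"type": "string", "required": True},
            "days": {"type": "integer", "required": True},
        },
    }
]

print(validate_function_call(
    {"name": "get_forecast", "arguments": {"location": "Kraków"}}, demo_tools
))
```

A check like this could run on the output of `generate_function_call` before the call is dispatched, so that missing or mistyped arguments are reported instead of executed.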

In [None]:
# Define a function to test with custom queries
def test_custom_query(query, tools, temperature=0.1):
    if not MODEL_PATH:
        print("Please load a model first.")
        return
    
    print(f"Query: {query}")
    print("\nAvailable tools:")
    for i, tool in enumerate(tools):
        print(f"{i+1}. {tool['name']}: {tool['description']}")
    
    print("\nGenerating function call...")
    generated = generate_function_call(
        model=model,
        tokenizer=tokenizer,
        query=query,
        tools=tools,
        temperature=temperature
    )
    
    print("\nGenerated answer:")
    print(json.dumps(generated, indent=2, ensure_ascii=False))
    return generated

In [None]:
# Test with weather tools (English query)
english_weather_query = "What's the weather like in Warsaw today?"
test_custom_query(english_weather_query, weather_tools)

In [None]:
# Test with weather tools (Polish query)
polish_weather_query = "Jaka jest dzisiaj pogoda w Warszawie?"
test_custom_query(polish_weather_query, weather_tools)

In [None]:
# Test with calculator tools (English query)
english_calc_query = "What is 145 multiplied by 37?"
test_custom_query(english_calc_query, calculator_tools)

In [None]:
# Test with calculator tools (Polish query)
polish_calc_query = "Ile to jest 145 razy 37?"
test_custom_query(polish_calc_query, calculator_tools)

In [None]:
# Test with calendar tools (English query)
english_calendar_query = "Schedule a meeting with John tomorrow at 3pm for 1 hour to discuss the project"
test_custom_query(english_calendar_query, calendar_tools)

In [None]:
# Test with calendar tools (Polish query)
polish_calendar_query = "Zaplanuj spotkanie z Janem jutro o 15:00 na godzinę, aby omówić projekt"
test_custom_query(polish_calendar_query, calendar_tools)

## Test with Combination of Tools

Let's see whether the model picks the right tool when tools from several domains are offered at once.

In [None]:
# Combine multiple tools
combined_tools = weather_tools + calculator_tools + calendar_tools

# Test with combined tools (English)
combined_query_english = "Calculate 25% of 840"
test_custom_query(combined_query_english, combined_tools)

In [None]:
# Test with combined tools (Polish)
combined_query_polish = "Dodaj do kalendarza spotkanie z zespołem na jutro od 10:00 do 11:30 w sali konferencyjnej"
test_custom_query(combined_query_polish, combined_tools)

## Evaluate Model Performance

Let's evaluate the model's responses qualitatively by rating them by hand on a 1-5 scale.
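
Ratings given by hand can be complemented with an automatic check. The sketch below compares a generated call with the expected answer by exact match after normalising key order. `normalize_calls` and `exact_match` are hypothetical helpers, and the assumption that `answers` holds a list of calls (or a JSON string encoding one) with `name` and `arguments` keys should be verified against the dataset before relying on the result.

```python
import json


def normalize_calls(calls):
    """Convert calls to a canonical, order-independent form for comparison."""
    # Assumption: calls may arrive as a JSON string, a single dict, or a list of dicts.
    if isinstance(calls, str):
        calls = json.loads(calls)
    if isinstance(calls, dict):
        calls = [calls]
    # Serialise each call with sorted keys so that key order does not matter,
    # then sort the serialised calls so that call order does not matter either.
    return sorted(json.dumps(c, sort_keys=True, ensure_ascii=False) for c in calls)


def exact_match(generated, expected):
    """Return True when both sides contain the same calls with the same arguments."""
    try:
        return normalize_calls(generated) == normalize_calls(expected)
    except (TypeError, ValueError):
        # Output that cannot be parsed or iterated counts as a miss.
        return False
```

Exact match is strict: a call with equivalent but differently formatted arguments (for example `"Warsaw"` versus `"Warsaw, Poland"`) is counted as a miss, so the manual rating remains useful alongside it.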

In [None]:
# Function to rate model responses
def rate_response(generated, expected=None):
    print("Please rate the model's response (1-5):")
    print("1: Completely incorrect")
    print("2: Mostly incorrect")
    print("3: Somewhat correct but with errors")
    print("4: Mostly correct with minor issues")
    print("5: Perfectly correct")
    
    if expected:
        print("\nExpected response:")
        print(json.dumps(expected, indent=2, ensure_ascii=False))
    
    print("\nGenerated response:")
    print(json.dumps(generated, indent=2, ensure_ascii=False))
    
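    # Re-prompt until the rating is a whole number from 1 to 5
    rating = input("Your rating (1-5): ").strip()
    while rating not in {"1", "2", "3", "4", "5"}:
        rating = input("Please enter a whole number from 1 to 5: ").strip()
    comment = input("Comments (optional): ")
    
    return {
        "rating": int(rating),
        "comment": comment
    }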

In [None]:
# Test and rate some responses
if MODEL_PATH:
    # Load dataset examples
    if os.path.exists(DATASET_PATH):
        with open(DATASET_PATH, 'r', encoding='utf-8') as f:
            dataset = json.load(f)
        
        # Select up to three random examples (fewer if the dataset is smaller)
        import random
        sample_indices = random.sample(range(len(dataset)), min(3, len(dataset)))
        
        ratings = []
        for idx in sample_indices:
            example = dataset[idx]
            print(f"\nExample {idx}:")
            print(f"Query: {example['query']}")
            
            generated = generate_function_call(
                model=model,
                tokenizer=tokenizer,
                query=example['query'],
                tools=example['tools'],
                temperature=0.1
            )
            
            rating_result = rate_response(generated, example['answers'])
            ratings.append({
                "example_idx": idx,
                "query": example['query'],
                "rating": rating_result
            })
            print("\n" + "-"*50)
        
        # Custom examples
        custom_queries = [
            ("Jaka będzie pogoda w Krakowie przez następne 3 dni?", weather_tools),
            ("Oblicz wynik wyrażenia (156 + 37) * 2.5", calculator_tools)
        ]
        
        for query, tools in custom_queries:
            print(f"\nCustom query: {query}")
            generated = test_custom_query(query, tools)
            rating_result = rate_response(generated)
            ratings.append({
                "example_idx": "custom",
                "query": query,
                "rating": rating_result
            })
            print("\n" + "-"*50)
        
        # Save the ratings
        import datetime
        rating_file = f"../models/ratings_{datetime.datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        with open(rating_file, 'w', encoding='utf-8') as f:
            json.dump(ratings, f, indent=2, ensure_ascii=False)
        
        print(f"Ratings saved to {rating_file}")
    else:
        print(f"Dataset not found at {DATASET_PATH}")
else:
    print("Please load a model first.")
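
Once ratings have been collected, a short summary makes separate runs easier to compare. The sketch below assumes each entry has the shape built in the cell above, with the score stored under `entry["rating"]["rating"]`. `summarize_ratings` is a hypothetical helper, not part of the project code.

```python
from collections import Counter
from statistics import mean


def summarize_ratings(ratings):
    """Summarise a list of rating entries: count, mean score, and score distribution."""
    scores = []
    for entry in ratings:
        try:
            # The score may be stored as a string (raw input) or as an int.
            scores.append(int(entry["rating"]["rating"]))
        except (KeyError, TypeError, ValueError):
            continue  # Skip entries without a usable score
    if not scores:
        return {"count": 0, "mean": None, "distribution": {}}
    return {
        "count": len(scores),
        "mean": round(mean(scores), 2),
        "distribution": dict(sorted(Counter(scores).items())),
    }
```

With only a handful of rated examples the mean is a rough indicator at best, so the distribution and the comments are worth reading alongside it.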

## Integration Example

Here's an example of how you could integrate the fine-tuned model into a function calling application.

In [None]:
def process_function_call(function_call):
    """Simulates processing a function call from the model."""
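    if isinstance(function_call, list) and len(function_call) > 0:
        # Handle case where model returns a list of function calls
        # (only the first call is processed in this demo)
        function_call = function_call[0]
    
    # Reject anything that is not a dict with both expected keys (e.g. None or an empty list)
    if not isinstance(function_call, dict) or 'name' not in function_call or 'arguments' not in function_call:
        return {"error": "Invalid function call format"}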
    
    name = function_call['name']
    args = function_call['arguments']
    
    # Simulate function execution
    if name == "get_weather":
        location = args.get('location', 'Unknown')
        return {"result": f"Current weather for {location}: 22°C, Partly Cloudy"}
    
    elif name == "get_forecast":
        location = args.get('location', 'Unknown')
        days = args.get('days', 1)
        return {"result": f"{days}-day forecast for {location}: Sunny, temperatures between 18-25°C"}
    
    elif name == "calculator":
        expression = args.get('expression', '')
        try:
            # CAUTION: eval is used here for demo purposes only
            # In a real application, use a safer method to evaluate expressions
            result = eval(expression)
            return {"result": f"The result of {expression} is {result}"}
        except Exception:  # Avoid a bare except, which would also swallow KeyboardInterrupt
            return {"error": f"Could not evaluate expression: {expression}"}
    
    elif name == "create_calendar_event":
        title = args.get('title', 'Untitled')
        start = args.get('start_time', 'Unknown')
        end = args.get('end_time', 'Unknown')
        return {"result": f"Created calendar event: {title} from {start} to {end}"}
    
    else:
        return {"error": f"Unknown function: {name}"}
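
As the comment in the calculator branch notes, `eval` is not safe on model-generated input. One possible replacement, sketched here under the assumption that only plain arithmetic is needed, parses the expression with the standard `ast` module and permits only numbers and a fixed set of operators. `safe_eval` is a hypothetical helper, not part of the project code.

```python
import ast
import operator

# Operators the evaluator accepts; anything else is rejected.
# Exponentiation is left out on purpose, since huge powers can exhaust CPU and memory.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def safe_eval(expression):
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # Accept int and float literals only (this excludes bool, str, and so on).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError(f"Unsupported expression element: {type(node).__name__}")

    # mode="eval" restricts parsing to a single expression (no statements).
    return _eval(ast.parse(expression, mode="eval"))


print(safe_eval("(156 + 37) * 2.5"))
```

Names, function calls, and attribute access all raise `ValueError`, and malformed input raises `SyntaxError`, so a caller would catch both (plus `ZeroDivisionError`) in the same place where the demo currently catches errors from `eval`.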

In [None]:
# Example of a complete function calling pipeline
def handle_user_query(query, available_tools, model=model, tokenizer=tokenizer):
    # Generate function call
    function_call = generate_function_call(
        model=model,
        tokenizer=tokenizer,
        query=query,
        tools=available_tools,
        temperature=0.1
    )
    
    # Process the function call
    result = process_function_call(function_call)
    
    return {
        "query": query,
        "function_call": function_call,
        "result": result
    }

In [None]:
# Test the complete pipeline
if MODEL_PATH:
    # Test queries
    test_queries = [
        ("What's the weather in London?", weather_tools),
        ("Jaka jest pogoda w Warszawie?", weather_tools),
        ("Calculate 250 * 0.15", calculator_tools),
        ("Utwórz spotkanie z zespołem na jutro o 14:00 na 2 godziny", calendar_tools)
    ]
    
    for query, tools in test_queries:
        print(f"\nQuery: {query}")
        result = handle_user_query(query, tools)
        print("Function call:")
        print(json.dumps(result["function_call"], indent=2, ensure_ascii=False))
        print("\nResult:")
        print(json.dumps(result["result"], indent=2, ensure_ascii=False))
        print("\n" + "-"*50)
else:
    print("Please load a model first.")

## Conclusion

In this notebook, we've demonstrated how to use the fine-tuned PLLuM model for function calling in both Polish and English. The model has been trained to understand queries and generate structured function calls according to the provided tool definitions.

Key capabilities demonstrated:
1. Loading a fine-tuned PLLuM model
2. Generating function calls for both Polish and English queries
3. Handling various types of tools and parameters
4. Integrating the model into a complete function calling pipeline

The fine-tuned model can be integrated into applications that need to parse natural language queries into structured function calls, especially for Polish-language applications.