# Testing Translation Functionality

This notebook demonstrates how to use the translation utilities to translate queries from English to Polish.

## Setup

First, let's set up our environment and import the necessary libraries.

In [None]:
import sys
sys.path.append('..')  # Add parent directory to path so the `src` package can be imported

import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from src.auth import login_to_huggingface
from src.dataset import load_function_calling_dataset, parse_json_entry
from src.translator import translate_text, translate_query_in_sample

# Visual settings
plt.rcParams['figure.figsize'] = (12, 8)
sns.set_style('whitegrid')

## Loading the Dataset

Let's load a small sample of the function calling dataset to test the translation functionality.

In [None]:
# Login to Hugging Face
login_to_huggingface()

# Load dataset
dataset = load_function_calling_dataset()
print(f'Dataset contains {len(dataset["train"])} examples')

## Testing Translation Function

Let's test the basic translation function with a few simple examples.

In [None]:
# Test with a few simple sentences
test_sentences = [
    "Hello, how are you?",
    "I need to find a restaurant nearby.",
    "What's the weather forecast for tomorrow?",
    "Calculate the sum of 23 and 45.",
    "Find the nearest gas station."
]

# Translate and display results
results = []
for sentence in test_sentences:
    polish = translate_text(sentence, src='en', dest='pl')
    results.append({"English": sentence, "Polish": polish})
    
pd.DataFrame(results)
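
Reading the table by eye works for five sentences, but a small automated check helps catch obvious failures. The sketch below assumes that `translate_text` returns a plain string; the helper `check_translation` and the example pairs are hypothetical and only illustrate the idea.

```python
def check_translation(original: str, translated) -> list[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if not isinstance(translated, str) or not translated.strip():
        problems.append("empty or non-string result")
    elif translated.strip().lower() == original.strip().lower():
        problems.append("output identical to input (possibly untranslated)")
    return problems


# Hypothetical (English, Polish) pairs for illustration only.
# In the notebook these would come from the `results` list built above.
example_pairs = [
    ("Hello, how are you?", "Cześć, jak się masz?"),
    ("Find the nearest gas station.", "Find the nearest gas station."),
    ("Calculate the sum of 23 and 45.", ""),
]

report = {en: check_translation(en, pl) for en, pl in example_pairs}
for en, problems in report.items():
    print(f"{en!r}: {problems or 'ok'}")
```

This check cannot judge translation quality. It only flags results that are empty or that came back unchanged.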

## Testing with Real Dataset Samples

Now, let's test the translation with actual dataset samples.

In [None]:
# Get a few samples from the dataset
num_samples = 5
samples = [dataset['train'][i] for i in range(num_samples)]

# Parse the JSON entries
parsed_samples = [parse_json_entry(sample) for sample in samples]

# Display the original queries
for i, sample in enumerate(parsed_samples):
    print(f"Sample {i+1} Query: {sample['query']}")

In [None]:
# Test translating the queries
translated_results = []

for i, sample in enumerate(parsed_samples):
    # Get the original query
    original_query = sample['query']
    
    # Translate the query
    translated_query = translate_text(original_query, src='en', dest='pl')
    
    translated_results.append({
        "Sample": i+1,
        "Original (English)": original_query,
        "Translated (Polish)": translated_query
    })

# Create a DataFrame for better display
translate_df = pd.DataFrame(translated_results)
translate_df
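
Queries in a function calling dataset often carry numeric arguments, so it may be worth checking that numbers survive translation. The following is a minimal sketch under that assumption; `extract_numbers` and `numbers_preserved` are hypothetical helpers and not part of `src.translator`.

```python
import re

# Matches integers and simple decimals written with "." or "," (e.g. 23, 3.5, 3,5).
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)?")


def extract_numbers(text: str) -> list[str]:
    """Return the numeric literals in `text`, sorted, with "," normalized to "."."""
    return sorted(token.replace(",", ".") for token in NUMBER_PATTERN.findall(text))


def numbers_preserved(original: str, translated: str) -> bool:
    """True if both strings contain the same numeric literals."""
    return extract_numbers(original) == extract_numbers(translated)


# Hypothetical translations for illustration only.
print(numbers_preserved("Calculate the sum of 23 and 45.", "Oblicz sumę 23 i 45."))
print(numbers_preserved("Book 2 tickets.", "Zarezerwuj dwa bilety."))
```

The comma normalization is there because Polish writes decimals with a comma. Thousands separators and numbers spelled out as words are not handled, so a `False` result is a prompt to look at the row, not proof of a bad translation.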

## Testing the Sample Translation Function

Now, let's test the `translate_query_in_sample` function, which translates only the query field of a dataset sample and should leave the other fields untouched.

In [None]:
# Test the sample translation function
sample = dataset['train'][0]
parsed_sample = parse_json_entry(sample)

print("Original Sample:")
print(f"Query: {parsed_sample['query']}\n")

# Translate the sample
translated_sample = translate_query_in_sample(sample, src='en', dest='pl')
parsed_translated = parse_json_entry(translated_sample)

print("Translated Sample:")
print(f"Query: {parsed_translated['query']}\n")

# Print the first tool from each sample; the two should match if only the query was translated
print("Tools from original sample:")
tools_original = parsed_sample['tools']
for tool in tools_original[:1]:  # Just showing the first tool to keep output manageable
    print(f"- {tool['name']}: {tool['description']}")

print("\nTools from translated sample:")
tools_translated = parsed_translated['tools']
for tool in tools_translated[:1]:
    print(f"- {tool['name']}: {tool['description']}")
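
Printing one tool from each sample only supports a visual comparison. A stricter check compares every field of the two parsed samples. The sketch below assumes that `parse_json_entry` returns a dictionary; the helper `changed_fields` and the two example dictionaries are hypothetical.

```python
_MISSING = object()  # Sentinel so a missing key differs from a key set to None


def changed_fields(original: dict, translated: dict) -> set:
    """Return the set of top-level keys whose values differ between the two dicts."""
    keys = set(original) | set(translated)
    return {
        key
        for key in keys
        if original.get(key, _MISSING) != translated.get(key, _MISSING)
    }


# Hypothetical parsed samples for illustration only.
original_example = {
    "query": "Find the nearest gas station.",
    "tools": [{"name": "find_place", "description": "Find a place near a location."}],
}
translated_example = {
    "query": "Znajdź najbliższą stację benzynową.",
    "tools": [{"name": "find_place", "description": "Find a place near a location."}],
}

diff = changed_fields(original_example, translated_example)
print(diff)
assert diff == {"query"}, f"Unexpected fields changed: {diff - {'query'}}"
```

In the notebook, the same call could be made as `changed_fields(parsed_sample, parsed_translated)`.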

## Conclusion

In this notebook, we tested the translation functionality and checked that:

1. The basic translation function, `translate_text`, translates simple English sentences to Polish
2. Queries from parsed dataset samples can be translated the same way
3. The `translate_query_in_sample` function translates only the query field; the tool definitions we inspected were unchanged

In the next notebook, we'll use these functions to create a dataset with 40% of the queries translated to Polish.
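
One possible way to choose which 40% of the examples to translate is a seeded random sample of indices, sketched below. The helper `pick_indices_to_translate` and the seed value are assumptions for illustration; the next notebook may select examples differently.

```python
import random


def pick_indices_to_translate(n_examples: int, fraction: float = 0.4, seed: int = 42) -> list[int]:
    """Return a sorted, reproducible random subset of indices covering `fraction` of the data."""
    if n_examples < 0:
        raise ValueError("n_examples must be non-negative")
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    k = round(n_examples * fraction)
    rng = random.Random(seed)  # Local generator, so the global random state is untouched
    return sorted(rng.sample(range(n_examples), k))


indices = pick_indices_to_translate(10)
print(len(indices), "of 10 examples selected")
```

Fixing the seed keeps the selection stable across runs, which makes it easier to compare the translated dataset between experiments.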