# Testing Translation Functionality

This notebook demonstrates how to use the translation utilities to translate queries from English to Polish.

## Setup

First, let's set up our environment and import the necessary libraries.

In [1]:
import sys
sys.path.append('..')  # Add parent directory to path

import json
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from src.auth import login_to_huggingface
from src.dataset import load_function_calling_dataset, parse_json_entry
from src.translator import translate_text, translate_query_in_sample

# Visual settings
plt.rcParams['figure.figsize'] = (12, 8)
sns.set_style('whitegrid')

## Loading the Dataset

Let's load a small sample of the function calling dataset to test the translation functionality.

In [2]:
# Login to Hugging Face
login_to_huggingface()

# Load dataset
dataset = load_function_calling_dataset()
print(f'Dataset contains {len(dataset["train"])} examples')

Note: Environment variable`HF_TOKEN` is set and is the current active token independently from the token you've just configured.
Successfully logged in to Hugging Face
Dataset contains 60000 examples

## Testing Translation Function

Let's test the basic translation function with a few simple examples.

In [3]:
# Test with a few simple sentences
test_sentences = [
    "Hello, how are you?",
    "I need to find a restaurant nearby.",
    "What's the weather forecast for tomorrow?",
    "Calculate the sum of 23 and 45.",
    "Find the nearest gas station."
]

# Translate and display results
results = []
for sentence in test_sentences:
    polish = translate_text(sentence, src='en', dest='pl')
    results.append({"English": sentence, "Polish": polish})
    
pd.DataFrame(results)

| | English | Polish |
|---|---|---|
| 0 | Hello, how are you? | Witam, jak się masz? |
| 1 | I need to find a restaurant nearby. | Muszę znaleźć w pobliżu restaurację. |
| 2 | What's the weather forecast for tomorrow? | Jaka jest prognoza pogody na jutro? |
| 3 | Calculate the sum of 23 and 45. | Oblicz sumę 23 i 45. |
| 4 | Find the nearest gas station. | Znajdź najbliższą stację benzynową. |
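
Eyeballing the table works for five sentences, but a cheap automated check helps once more text is translated. The sketch below is not part of `src.translator`; `numbers_preserved` is a hypothetical helper that compares the numeric tokens in a source sentence and its translation, using two sentence pairs copied from the output above.

```python
import re

# Matches integers and simple decimals such as 23, 3.5 or 3,5.
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")


def numbers_preserved(source: str, translated: str) -> bool:
    """Return True if both strings contain the same numeric tokens.

    Hypothetical helper for illustration; order is ignored, counts are not.
    """
    return sorted(NUMBER.findall(source)) == sorted(NUMBER.findall(translated))


# Sentence pairs taken from the translation output above.
pairs = [
    ("Calculate the sum of 23 and 45.", "Oblicz sumę 23 i 45."),
    ("Find the nearest gas station.", "Znajdź najbliższą stację benzynową."),
]

checks = [numbers_preserved(en, pl) for en, pl in pairs]
print(checks)
```

One limitation to keep in mind: Polish writes decimals with a comma, so a translator that localizes `3.5` to `3,5` would be flagged by this check even though the value is unchanged.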


## Testing with Real Dataset Samples

Now, let's test the translation with actual dataset samples.

In [4]:
# Get a few samples from the dataset
num_samples = 5
samples = [dataset['train'][i] for i in range(num_samples)]

# Parse the JSON entries
parsed_samples = [parse_json_entry(sample) for sample in samples]

# Display the original queries
for i, sample in enumerate(parsed_samples):
    print(f"Sample {i+1} Query: {sample['query']}")

Sample 1 Query: Where can I find live giveaways for beta access and games?
Sample 2 Query: I need to understand the details of the Ethereum blockchain for my cryptocurrency project. Can you fetch the details for 'ethereum'?
Sample 3 Query: What is the T3MA for 'ETH/BTC' using a 1h interval and a time period of 14?
Sample 4 Query: List titles originally aired on networks '1' and '8', released after 2010, sorted by release date in descending order.
Sample 5 Query: Fetch the competitor standings for the recently concluded stage 98765.


In [5]:
# Test translating the queries
translated_results = []

for i, sample in enumerate(parsed_samples):
    # Get the original query
    original_query = sample['query']
    
    # Translate the query
    translated_query = translate_text(original_query, src='en', dest='pl')
    
    translated_results.append({
        "Sample": i+1,
        "Original (English)": original_query,
        "Translated (Polish)": translated_query
    })

# Create a DataFrame for better display
translate_df = pd.DataFrame(translated_results)
translate_df

| | Sample | Original (English) | Translated (Polish) |
|---|---|---|---|
| 0 | 1 | Where can I find live giveaways for beta acces... | Gdzie mogę znaleźć prezenty na żywo dla dostęp... |
| 1 | 2 | I need to understand the details of the Ethere... | Muszę zrozumieć szczegóły blockchaina Ethereum... |
| 2 | 3 | What is the T3MA for 'ETH/BTC' using a 1h inte... | Jaki jest T3MA dla „ETH/BTC” przy użyciu inter... |
| 3 | 4 | List titles originally aired on networks '1' a... | Listy tytuły pierwotnie wyemitowane w sieciach... |
| 4 | 5 | Fetch the competitor standings for the recentl... | Pobierz klasyfikację konkurencji dla niedawno ... |
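
Sample 3 shows that the translator rewrites the quotation marks around `'ETH/BTC'` as Polish „…” quotes. Because quoted values in these queries often correspond to function-call arguments, it may be worth checking that the text inside the quotes survives translation. The following is a minimal sketch under that assumption; `quoted_literals` and `literals_preserved` are hypothetical helpers, not functions from this project, and the translated string is the truncated prefix exactly as displayed above.

```python
import re

PATTERNS = [
    r"(?<!\w)'([^']+)'(?!\w)",  # 'straight single', skipping apostrophes as in What's
    r'"([^"]+)"',               # "straight double"
    r"„([^”]+)”",               # Polish low-high quotes
    r"“([^”]+)”",               # English curly quotes
]
QUOTED = re.compile("|".join(PATTERNS))


def quoted_literals(text: str) -> list:
    """Return the contents of every quoted span, regardless of quote style."""
    return [next(g for g in m.groups() if g) for m in QUOTED.finditer(text)]


def literals_preserved(source: str, translated: str) -> bool:
    """Return True if both strings quote the same literals (order ignored)."""
    return sorted(quoted_literals(source)) == sorted(quoted_literals(translated))


source = "What is the T3MA for 'ETH/BTC' using a 1h interval and a time period of 14?"
# Truncated prefix as shown in the DataFrame output above.
translated = "Jaki jest T3MA dla „ETH/BTC” przy użyciu inter..."

print(quoted_literals(source))
print(literals_preserved(source, translated))
```

The check compares only the quoted contents, so a change of quote style alone does not count as a failure, while a translated or dropped literal does.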


## Testing the Sample Translation Function

Now, let's test the `translate_query_in_sample` function, which translates only the query field of a dataset sample and leaves the other fields as they are.

In [6]:
# Test the sample translation function
sample = dataset['train'][0]
parsed_sample = parse_json_entry(sample)

print("Original Sample:")
print(f"Query: {parsed_sample['query']}\n")

# Translate the sample
translated_sample = translate_query_in_sample(sample, src='en', dest='pl')
parsed_translated = parse_json_entry(translated_sample)

print("Translated Sample:")
print(f"Query: {parsed_translated['query']}\n")

# Print the first tool from each sample to check that tool metadata was not translated
print("Tools from original sample:")
tools_original = parsed_sample['tools']
for tool in tools_original[:1]:  # Just showing the first tool to keep output manageable
    print(f"- {tool['name']}: {tool['description']}")

print("\nTools from translated sample:")
tools_translated = parsed_translated['tools']
for tool in tools_translated[:1]:
    print(f"- {tool['name']}: {tool['description']}")

Original Sample:
Query: Where can I find live giveaways for beta access and games?

Translated Sample:
Query: Gdzie mogę znaleźć prezenty na żywo dla dostępu do beta i gier?

Tools from original sample:
- live_giveaways_by_type: Retrieve live giveaways from the GamerPower API based on the specified type.

Tools from translated sample:
- live_giveaways_by_type: Retrieve live giveaways from the GamerPower API based on the specified type.
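
The printout above compares only the first tool by eye. A stricter check is to diff every field of the two parsed samples. The sketch below assumes that `parse_json_entry` returns a plain dictionary with at least `query` and `tools` keys, as the cell above suggests; `changed_fields` is a hypothetical helper, and the two dictionaries are hand-built stand-ins using the values printed above rather than real parser output.

```python
def changed_fields(original: dict, translated: dict) -> set:
    """Return the set of top-level keys whose values differ between two samples."""
    keys = original.keys() | translated.keys()
    return {key for key in keys if original.get(key) != translated.get(key)}


# Stand-ins for parsed_sample and parsed_translated, built from the output above.
original = {
    "query": "Where can I find live giveaways for beta access and games?",
    "tools": [
        {
            "name": "live_giveaways_by_type",
            "description": "Retrieve live giveaways from the GamerPower API "
                           "based on the specified type.",
        }
    ],
}
translated = {
    **original,
    "query": "Gdzie mogę znaleźć prezenty na żywo dla dostępu do beta i gier?",
}

diff = changed_fields(original, translated)
print(diff)
```

In the notebook, calling `changed_fields(parsed_sample, parsed_translated)` and asserting that the result equals `{"query"}` would turn the visual comparison into a test that fails if any other field is altered.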


## Conclusion

In this notebook, we've tested the translation functionality on a handful of examples and observed that:

1. `translate_text` returns Polish translations for simple English sentences
2. The same function translates queries taken from real dataset samples
3. `translate_query_in_sample` translates only the query field: for the sample inspected, the first tool's name and description are identical before and after translation