## Translating queries

In this notebook we translate the queries generated earlier, again using the Gemini API.

In [None]:
import pandas as pd
import numpy as np

In [None]:
# load the dataframe that contains the queries

stratified_sample = pd.read_csv('data/stratified_sample_with_queries.csv')

In [4]:
master_prompt = """

You are a bilingual translator fluent in both English and Dutch.
Please translate each search query into the other language.

Provided search queries:
    English query: {text_en}
    Dutch query: {text_nl}

Only provide a response in the following format:
    Dutch translation: <Dutch translation>
    English translation: <English translation>

"""
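
A minimal sketch of how the two placeholders get filled in, using a shortened stand-in template and made-up queries (both are assumptions for illustration, not rows from the dataset):

```python
# shortened stand-in for master_prompt (hypothetical, for illustration only)
demo_template = """
Provided search queries:
    English query: {text_en}
    Dutch query: {text_nl}
"""

# made-up example queries, not taken from the dataset
demo_prompt = demo_template.format(
    text_en="minimum selling price for butter",
    text_nl="invoer van verse zure kersen",
)
print(demo_prompt)
```

`str.format` substitutes the named fields `{text_en}` and `{text_nl}`; any other literal braces in a template would need to be doubled.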

In [None]:
# replace the earlier generation prompts with the translation master_prompt;
# the placeholders are filled in with the right queries for each row later on

stratified_sample['formatted_prompt'] = master_prompt

In [None]:
import time
import google.generativeai as genai  # provides genai.configure and genai.GenerativeModel used below
import keys  # store your own API key in a file named keys.py

api_key = keys.GEMINI_API_KEY

if not api_key:
    raise ValueError("Gemini API key not found in keys.py.")

genai.configure(api_key=api_key)

# Select the Gemini model
model_name = "gemini-2.0-flash-lite"

model = genai.GenerativeModel(model_name)

In [None]:
# fill in the prompt template with the search queries to be translated, then prompt Gemini

def get_raw_response(text_en, text_nl, prompt_template):
    """Format the prompt with both queries and return the raw Gemini response (None on failure)."""

    prompt = prompt_template.format(text_en=text_en, text_nl=text_nl)

    try:
        response = model.generate_content(
            prompt,
            generation_config=genai.types.GenerationConfig(temperature=0.2),
        )
        return response

    except Exception as e:
        print(f"Error in get_raw_response: {e}")
        return None


In [None]:
# wrapper around get_raw_response that extracts the translations from the raw Gemini response

def generate_translation(text_en, text_nl, prompt_template):
    """Return (translation_en, translation_nl) parsed from the raw Gemini response."""

    # every return value is a 2-tuple, because the caller unpacks the result into two variables
    if pd.isna(text_en) or pd.isna(text_nl) or not text_en or not text_nl:  # ensure content is valid
        return ("N/A", "N/A")

    response = get_raw_response(text_en, text_nl, prompt_template)  # call the API

    if response is None:  # the API call failed
        return ("Error: No response from the API", "Error: No response from the API")

    try:
        raw_text = response.text.strip()
        print(raw_text)
        # split by newline, then find the lines that carry each label
        parts = raw_text.split("\n")
        english_part = next((s for s in parts if "English translation:" in s), None)
        dutch_part = next((s for s in parts if "Dutch translation:" in s), None)

        if english_part and dutch_part:
            translation_en = english_part.replace("English translation:", "").strip()
            translation_nl = dutch_part.replace("Dutch translation:", "").strip()
            return (translation_en, translation_nl)
        else:
            return ("Error: Could not parse response", "Error: Could not parse response")

    except Exception as e:
        return (str(e), str(e))  # return the error message for debugging
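
The label-based parsing can be tried without calling the API. The sketch below is a standalone variant, not the notebook's own function: `parse_translations` and the sample strings are hypothetical, and it matches a label only at the start of a line rather than anywhere in it.

```python
def parse_translations(raw_text):
    """Return (translation_en, translation_nl), or (None, None) if a label is missing."""
    translation_en = None
    translation_nl = None
    for line in raw_text.splitlines():
        line = line.strip()
        if line.startswith("English translation:"):
            translation_en = line[len("English translation:"):].strip()
        elif line.startswith("Dutch translation:"):
            translation_nl = line[len("Dutch translation:"):].strip()
    if translation_en is None or translation_nl is None:
        return (None, None)
    return (translation_en, translation_nl)


# made-up response text in the format requested by the prompt
sample_response = (
    "Dutch translation: prijs van boter\n"
    "English translation: import of sour cherries"
)
print(parse_translations(sample_response))
print(parse_translations("unexpected output"))
```

Returning a pair in every case keeps the result safe to unpack into two variables, whichever branch is taken.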

In [None]:
# translate queries in both languages and put them in new columns

# empty lists to store the results
english_translations = []
dutch_translations = []

for idx, row in stratified_sample.iterrows():
    translation_en, translation_nl = generate_translation(row['query_en'], row['query_nl'], row['formatted_prompt'])
    english_translations.append(translation_en)
    dutch_translations.append(translation_nl)
    
    # sleep for 2 seconds to stay within the rate limit of 30 requests per minute
    time.sleep(2)

# add the results to the dataframe
stratified_sample["translation_en"] = english_translations
stratified_sample["translation_nl"] = dutch_translations

Dutch translation: Een zuivelbedrijf bereidt zich voor om te bieden op een aanbesteding voor boter die door interventiebureaus wordt gehouden, en het bedrijf moet de prijsstructuur begrijpen. Specifiek is het bedrijf geïnteresseerd in de minimumverkoopprijs voor boter in de 35e individuele uitnodiging tot inschrijving die is uitgegeven in het kader van de permanente uitnodiging tot inschrijving. Het bedrijf moet de exacte minimumverkoopprijs per 100 kg weten, zoals bepaald door de relevante regelgeving, en de datum waarop de termijn voor het indienen van inschrijvingen is verstreken.
English translation: A dairy company wants to submit a bid on a tender for butter held by intervention agencies, and the company needs to understand the pricing structure. Specifically, the company is interested in the minimum selling price for butter in the 35th individual invitation to tender issued under the standing invitation to tender. The company wants to know the exact minimum selling price per 100
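
The 2-second pause follows from the limit named in the loop comment: 60 seconds / 30 requests = 2 seconds per request. A minimal sketch of that arithmetic and of a throttled loop, where `min_delay_seconds` and `run_throttled` are hypothetical helper names and `str.upper` stands in for the API call:

```python
import time


def min_delay_seconds(max_requests_per_minute):
    """Smallest pause between requests that stays within the per-minute limit."""
    return 60.0 / max_requests_per_minute


def run_throttled(items, func, delay, sleep=time.sleep):
    """Call func on each item, pausing `delay` seconds between consecutive calls."""
    results = []
    for i, item in enumerate(items):
        if i > 0:  # no pause needed before the first call
            sleep(delay)
        results.append(func(item))
    return results


delay = min_delay_seconds(30)
print(delay)

# stand-in for the API call; sleep is stubbed out so the demo runs instantly
print(run_throttled(["a", "b"], str.upper, delay, sleep=lambda seconds: None))
```

Since each real request also takes some time to complete, a fixed pause of this length errs on the safe side of the limit.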

In [None]:
# manual check on a random sample of 20 rows

few_translations = stratified_sample.sample(n=20)

for row in few_translations.values:
    print(row[7])
    print()
    print(row[11])
    print()
    print()
    print(row[8])
    print()
    print(row[12])
    print()
    print()


A fruit importer is planning to bring in a large shipment of fresh sour cherries from the former Yugoslav Republic of Macedonia. They are unsure about the specific requirements and procedures they need to follow to ensure the import complies with all relevant regulations. What specific documentation and steps are required, and what are the potential financial implications if the import is not completed within a certain timeframe?

A fruit importer wants to import a large shipment of fresh sour cherries from the former Yugoslav Republic of Macedonia. They are unsure about the specific requirements and procedures they need to follow to ensure the import complies with all relevant regulations. What specific documentation and steps are required, and what are the potential financial consequences if the import is not completed within a certain timeframe?


Een fruitimporteur wil een grote zending verse zure kersen uit de voormalige Joegoslavische Republiek Macedonië importeren. Ze zijn onzek

In [None]:
# save the new dataframe so it can be used as input in notebook 4_BM25_application_evaluation

stratified_sample.to_csv('data/stratified_sample_with_translated_queries.csv', index=False)