<img src="./images/DLI_Header.png" style="width: 400px;">

# 4. Generating High-Quality Questions and Answers with NeMo Curator

This notebook demonstrates how to create a high-quality question-answer dataset for supervised fine-tuning (SFT) using **NVIDIA NeMo Curator**. Questions and answers are generated with an instruction-tuned LLM and then scored with the **Llama-3.1-Nemotron-70B-Reward** model, a leaderboard-topping reward model designed to support RLHF for better alignment with human preferences.

### Importance of an SFT Dataset

Supervised fine-tuning is critical for tailoring pre-trained large language models (LLMs) to specialized applications. While pre-trained models possess broad general knowledge, they often lack precision in specific domains. An SFT dataset provides labeled question-answer pairs that guide the model during fine-tuning, enabling it to learn domain-specific patterns and improve its accuracy and reliability.

Without such datasets, models may fail to meet user expectations or perform consistently in specialized use cases. High-quality question-answer pairs are essential for effective fine-tuning, enhancing the model's real-world applicability.

### Advantages of Synthetic Data for SFT

Using synthetic data to build an SFT dataset offers several benefits:

- **Scalability**: Enables rapid generation of diverse question-answer pairs across various topics, especially when human-curated datasets are unavailable.
- **Cost-effectiveness**: Reduces the expense of manual annotation while allowing iterative refinement to meet evolving requirements.
- **Comprehensive Coverage**: Ensures alignment with predefined subtopics and includes rare or edge-case scenarios that may not be present in existing datasets.
- **Improved Robustness**: Enhances the model's ability to handle complex or uncommon situations effectively.

By generating synthetic question-answer pairs aligned with specific subtopics, this approach makes it easier to build tailored SFT datasets. This helps fine-tuned models perform well in their intended domains while shortening the development process.

**[4.1 NeMo Curator OpenAI Client](#4.1-NeMo-Curator-OpenAI-Client)<br>**
**[4.2 Developing a Q&A Dataset for SFT in English and Spanish](#4.2-Developing-a-Q&A-Dataset-for-SFT-in-English-and-Spanish)<br>**
**[4.3 Evaluating Question/Answer Pairs Using a Reward Model](#4.3-Evaluating-Question/Answer-Pairs-Using-a-Reward-Model)<br>**


---

## Connecting to the NVIDIA API Catalog

NeMo Curator supports connecting to [OpenAI API](https://github.com/openai/openai-python?tab=readme-ov-file#openai-python-api-library) compatible services and [NeMo Deploy](https://docs.nvidia.com/nemo-framework/user-guide/latest/deployingthenemoframeworkmodel.html#use-nemo-export-and-deploy-module-apis-to-run-inference) services.

In this notebook, we rely on the `build.nvidia.com` API endpoints. The same flow also works with a model deployed as an NVIDIA NIM for LLMs; see the [NeMo Curator documentation on connecting to an LLM service](https://github.com/NVIDIA/NeMo-Curator/blob/main/docs/user-guide/syntheticdata.rst#connecting-to-an-llm-service) for details.

Your environment already has an NVIDIA API key installed for you. For work outside of this workshop environment, please see the instructions below for how to obtain your own free NVIDIA API key.

### Obtaining Your Own NVIDIA API Key

If you would like an NVIDIA API key for your own work outside this workshop environment, you can generate one for free using the following steps:

1. Login (or sign up) through [build.nvidia.com](https://build.nvidia.com/explore/discover).
2. Click the `Get API Key` button on the `Llama 3.1 Nemotron 70B Reward` page, found [here](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-reward).

---

## 4.1 NeMo Curator OpenAI Client

We will begin by initializing NeMo Curator's `OpenAIClient`.

Please note that this step is identical to the process outlined in the previous notebook.

### Loading NVIDIA API Credentials

Before connecting to NVIDIA's API, we need to load the required credentials. The next cell checks the following locations in priority order and uses the first one it finds:

1. **Project directory** (priority 1): `./secrets.env` (in the same folder as this notebook)
2. **Home directory** (priority 2): `~/.nvidia/secrets.env`
3. **Environment variables** (priority 3): pre-set in some workshop environments

**Required credentials:**
- `NVIDIA_API_KEY`: Your NVIDIA API key from build.nvidia.com
- `NVIDIA_BASE_URL`: The NVIDIA API endpoint (https://integrate.api.nvidia.com/v1)

**Get your free API key:**
Visit https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-reward and click "Get API Key"
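For reference, the file format the loader accepts can be expressed as a small pure function. The helper `parse_secrets` below is hypothetical (it is not part of NeMo Curator); it is a sketch that mirrors the parsing rules of the loader cell in this section: blank lines and `#` comments are skipped, only the first `=` separates key from value, and surrounding whitespace and quotes are stripped. The key values shown are placeholders.

```python
from typing import Dict


def parse_secrets(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines using the same rules as the credential loader."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an "=".
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


example = """
# NVIDIA credentials (placeholder values)
NVIDIA_API_KEY="nvapi-xxxxxxxxxxxxx"
NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1
"""
print(parse_secrets(example))
```

Quoted and unquoted values both work, which is why either style in `secrets.env` loads correctly.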


In [None]:
# Load NVIDIA API credentials from secrets file
import os
from pathlib import Path

# Path to secrets file - check multiple locations
# Priority: 1) Local project directory, 2) Home directory, 3) Environment variables
try:
    project_secrets = Path("secrets.env")
    home_secrets = Path.home() / ".nvidia" / "secrets.env"
except Exception as e:
    print(f"Warning: Path setup issue: {e}")
    project_secrets = None
    home_secrets = None

def load_secrets_from_file(filepath):
    """Load environment variables from a secrets file"""
    try:
        if not filepath or not filepath.exists():
            return False
        
        print(f"Loading secrets from {filepath}")
        with open(filepath, 'r') as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith('#') and '=' in line:
                    key, value = line.split('=', 1)
                    os.environ[key.strip()] = value.strip().strip('"').strip("'")
        print("✓ NVIDIA API credentials loaded")
        return True
    except Exception as e:
        print(f"Error loading secrets: {e}")
        return False

# Try loading from different locations
loaded = False
if project_secrets and project_secrets.exists():
    loaded = load_secrets_from_file(project_secrets)
elif home_secrets and home_secrets.exists():
    loaded = load_secrets_from_file(home_secrets)
elif "NVIDIA_API_KEY" in os.environ:
    print("✓ Using NVIDIA_API_KEY from environment variables")
    loaded = True
else:
    print("⚠️  NVIDIA_API_KEY not found!")
    print("\nSearched locations:")
    print(f"  1. ./secrets.env")
    print(f"  2. ~/.nvidia/secrets.env")
    print(f"  3. Environment variables")
    print("\nPlease create a secrets.env file with:")
    print("   NVIDIA_API_KEY=nvapi-xxxxxxxxxxxxx")
    print("   NVIDIA_BASE_URL=https://integrate.api.nvidia.com/v1")

# Verify credentials are available
if "NVIDIA_API_KEY" in os.environ and "NVIDIA_BASE_URL" in os.environ:
    print(f"\n✓ API Key: {os.environ['NVIDIA_API_KEY'][:10]}...")
    print(f"✓ Base URL: {os.environ['NVIDIA_BASE_URL']}")
else:
    print("\n❌ Missing required environment variables!")
    print("   Required: NVIDIA_API_KEY, NVIDIA_BASE_URL")


Loading secrets from secrets.env
✓ NVIDIA API credentials loaded

✓ API Key: nvapi-vMsg...
✓ Base URL: https://integrate.api.nvidia.com/v1


In [None]:
import os

from nemo_curator import OpenAIClient
from nemo_curator.synthetic import NemotronGenerator
from openai import OpenAI

# Initialize OpenAI's client with the NVIDIA API endpoint and the API key.
openai_client = OpenAI(
    # Fall back to the public NVIDIA API endpoint if NVIDIA_BASE_URL is not set.
    base_url=os.environ.get("NVIDIA_BASE_URL", "https://integrate.api.nvidia.com/v1"),
    api_key=os.environ["NVIDIA_API_KEY"],
)

# Initialize NeMo Curator's OpenAIClient by passing the OpenAI client instance.
# This wraps the OpenAI client to provide additional functionality specific to NeMo Curator.
curator_openai_client = OpenAIClient(openai_client)

# Create an instance of NemotronGenerator, which facilitates synthetic data generation.
generator = NemotronGenerator(curator_openai_client)

# Model used to generate synthetic data.
model = "mistralai/mistral-7b-instruct-v0.3"
model_kwargs = {
    "temperature": 0.1,
    "top_p": 0.9,
    "max_tokens": 1024,
}

## 4.2 Developing a Q&A Dataset for SFT in English and Spanish

In this section, we will use the subtopics generated earlier to create a supervised fine-tuning (SFT) dataset of questions and answers.

These subtopics provide a structured foundation for generating diverse and relevant queries, ensuring the dataset comprehensively covers the target domain. 

To create the SFT dataset, we will follow a systematic process to ensure the quality and relevance of the generated question-and-answer pairs. The steps include:  

- **Generating a List of Questions**  
- **Revising Questions to Add Detail**
- **Generating a List of Answers**
- **Evaluating Question/Answer Pairs**

By following these steps, we aim to produce a robust and well-structured SFT dataset that enhances the fine-tuning process and improves the model's performance in its target domain.

### 4.2.1 Generating a List of Questions in English and Spanish

Using the subtopics as a foundation, we will create an initial set of questions that broadly address the key aspects of each subtopic. To do this, we define a helper function, `generate_questions_from_topic`, which asks a language model for a list of diverse, relevant questions on a given topic. The resulting questions are meant to stay within the intended domain and serve as a starting point for further refinement.

The `generate_questions_from_topic` function calls the `generate_open_qa_from_topic` method of the `generator` object and then converts the model's response into a Python list using `convert_response_to_yaml_list`.
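The internals of `convert_response_to_yaml_list` are not shown in this notebook. As an offline illustration of the same idea (raw list-style text in, a Python list of strings out, and an error when no list can be recovered), here is a hypothetical regex-based parser. It is not how NeMo Curator performs the conversion, and `ListParseError` is only a stand-in for the library's `YamlConversionError`.

```python
import re
from typing import List

# Matches list lines such as "1. text", "2) text", or "- text".
_ITEM_PATTERN = re.compile(r"^\s*(?:\d+[.)]|[-*])\s+(.*\S)\s*$")


class ListParseError(ValueError):
    """Hypothetical stand-in for nemo_curator's YamlConversionError."""


def parse_numbered_list(response: str) -> List[str]:
    """Extract list items from a model response, ignoring any preamble lines."""
    items = []
    for line in response.splitlines():
        match = _ITEM_PATTERN.match(line)
        if match:
            items.append(match.group(1))
    if not items:
        raise ListParseError(f"No list items found in: {response[:40]!r}")
    return items


raw = "Here are some questions:\n1. What is agroecology?\n2. How does it protect pollinators?"
print(parse_numbered_list(raw))
```

Raising an exception instead of returning an empty list is what makes a retry loop around the conversion possible.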

For English-language topics, we will use the following default prompt template:

```python
"Can you generate {n_openlines} questions or requests related to {topic}? The questions and requests should be as diverse possible. Your answer should be a list."
```

This prompt instructs the model to generate a specified number of diverse questions or requests related to the given topic and ensures that the output is formatted as a list. By focusing on diversity, this approach captures various perspectives and nuances within each subtopic.
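To see what the model actually receives, the template can be filled in by hand. This sketch assumes the generator substitutes the placeholders with Python's `str.format` (an assumption; the library code is not shown here). The template text is copied from the default quoted above, including its original wording.

```python
# Copied from the default open-QA template quoted above.
OPEN_QA_TEMPLATE = (
    "Can you generate {n_openlines} questions or requests related to {topic}? "
    "The questions and requests should be as diverse possible. Your answer should be a list."
)

# Fill the placeholders with the values used later in this notebook.
prompt = OPEN_QA_TEMPLATE.format(
    n_openlines=4,
    topic="Agroecology and Biodiversity Conservation",
)
print(prompt)
```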

If the model's response cannot be converted into a list (a `YamlConversionError`), the function retries, up to `n_retries` times. If every attempt fails, it returns an empty list instead of raising, so it is worth checking the result before using it in later steps.

Applied to each subtopic, this approach yields questions that together cover the breadth of the domain, giving us a base for the remaining steps of building the supervised fine-tuning (SFT) dataset.

In [4]:
from typing import List

from nemo_curator.synthetic.error import YamlConversionError
from nemo_curator.synthetic.prompts import DEFAULT_OPEN_QA_FROM_TOPICS_PROMPT_TEMPLATE


def generate_questions_from_topic(
    generator: NemotronGenerator,
    model: str,
    model_kwargs: dict,
    topic: str,
    n_questions: int,
    prompt_template: str = DEFAULT_OPEN_QA_FROM_TOPICS_PROMPT_TEMPLATE,
    n_retries: int = 5,
) -> List[str]:
    """
    Generate a list of questions based on a given topic using a language model.

    Args:
        generator (NemotronGenerator): An instance of the `NemotronGenerator` class responsible for
            generating and processing LLM responses.
        model (str): The name or identifier of the language model to use.
        model_kwargs (dict): Configuration parameters for the language model.
        topic (str): The topic from which questions will be generated.
        n_questions (int): The number of questions to generate.
        prompt_template (str, optional): Template for the prompt sent to the language model. Defaults to `DEFAULT_OPEN_QA_FROM_TOPICS_PROMPT_TEMPLATE`.
        n_retries (int, optional): Number of retries in case of errors during YAML conversion. Defaults to 5.

    Returns:
        List[str]: A list of generated questions as strings, or an empty list if the
            response could not be converted to a YAML list within `n_retries` attempts.
    """

    # Initialize an empty list to store the generated questions.
    questions = []

    # Attempt to generate questions up to `n_retries` times in case of errors.
    for _ in range(n_retries):
        try:
            # Generate Open QA responses from the specified topic using the language model.
            llm_response = generator.generate_open_qa_from_topic(
                model=model,
                model_kwargs=model_kwargs,
                topic=topic,
                n_openlines=n_questions,
                prompt_template=prompt_template,
            )

            # Convert the response from the language model into a YAML list format.
            questions = generator.convert_response_to_yaml_list(
                llm_response=llm_response[0], model=model
            )

            # Exit the retry loop if successful.
            break

        except YamlConversionError as e:
            # Print an error message and retry if YAML conversion fails.
            print(f"Hit: {e}, Retrying...")

    # Return the generated questions (empty if all retries fail).
    return questions


# Define the topic for question generation.
topic = "Agroecology and Biodiversity Conservation"

# Number of questions to generate
n_questions = 4

# Generate a list of questions in English based on the given topic.
questions_english = generate_questions_from_topic(
    generator=generator,
    model=model,
    model_kwargs=model_kwargs,
    topic=topic,
    n_questions=n_questions,
    prompt_template=DEFAULT_OPEN_QA_FROM_TOPICS_PROMPT_TEMPLATE,
)

# Output the generated English questions.
questions_english

['Can you explain the role of agroecology in promoting sustainable agriculture and enhancing biodiversity conservation?',
 'Provide examples of agroecological practices that effectively support biodiversity conservation and increase agricultural productivity.',
 'How does the integration of agroecology and biodiversity conservation contribute to food security and resilience in farming systems?',
 'Discuss the challenges faced in implementing agroecological approaches for biodiversity conservation and suggest potential solutions to overcome these obstacles.']

Let’s replicate the process in Spanish. To do this, we translate the prompt template (adding an instruction to answer only in Spanish) and the topic:

In [5]:
# Define a prompt template in Spanish for generating questions.
questions_prompt_template_spanish = (
    "¿Puedes generar {n_openlines} preguntas o solicitudes relacionadas con {topic}? "
    "Las preguntas y solicitudes deben ser lo más diversas posible. Tu respuesta debe ser una lista. "
    "Responde usando sólo el idioma Español."
)

# Define the topic for question generation.
topic = "Agroecología y conservación de la biodiversidad"

# Generate a list of questions in Spanish based on the given topic.
questions_spanish = generate_questions_from_topic(
    generator=generator,
    model=model,
    model_kwargs=model_kwargs,
    topic=topic,
    n_questions=n_questions,
    prompt_template=questions_prompt_template_spanish,
)

# Output the generated Spanish questions.
questions_spanish

['¿Cómo se define la agroecología y cuáles son sus principios clave?',
 '¿Cómo se relaciona la agroecología con la conservación de la biodiversidad y la sostenibilidad de los sistemas agrícolas?',
 '¿Qué son los beneficios ambientales, sociales y económicos de la agroecología y cómo se pueden medir?',
 '¿Cómo se pueden implementar estrategias de agroecología para mejorar la resiliencia de los sistemas agrícolas frente a los cambios climáticos y la disminución de la biodiversidad?']
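When translating a prompt template, it is easy to drop or rename a placeholder, which would only surface as an error (or a silently wrong prompt) at generation time. A quick check with the standard library's `string.Formatter` compares the named fields of the English and Spanish question templates. The helper `placeholders` is hypothetical; the template strings are copied from this section.

```python
from string import Formatter
from typing import Set


def placeholders(template: str) -> Set[str]:
    """Return the named fields referenced by a str.format-style template."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for trailing literal text.
    return {field for _, field, _, _ in Formatter().parse(template) if field}


english = (
    "Can you generate {n_openlines} questions or requests related to {topic}? "
    "The questions and requests should be as diverse possible. Your answer should be a list."
)
spanish = (
    "¿Puedes generar {n_openlines} preguntas o solicitudes relacionadas con {topic}? "
    "Las preguntas y solicitudes deben ser lo más diversas posible. Tu respuesta debe ser una lista. "
    "Responde usando sólo el idioma Español."
)

assert placeholders(spanish) == placeholders(english), "translated template lost or added a field"
print(sorted(placeholders(spanish)))
```

The same check applies to the revision template translated in the next section, which must keep `{openline}` and `{n_revisions}`.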

### 4.2.2 Revising Questions to Add Detail  

The generated questions will be refined to ensure they are specific, clear, and detailed enough to elicit precise and informative answers. To achieve this, we will use the `revise_questions` function, which iteratively processes a list of questions and revises them using a language model. The goal is to enhance each question by adding context, improving clarity, or rephrasing it into a more elaborate or diverse format.

The `revise_questions` function utilizes the `revise_open_qa` method from the `generator` object. For each question in the input list, it attempts to generate a revised version by applying the specified prompt template.

For English-language questions, we will use the following default prompt template:

```python
"Question: {openline}\n\nCan you revise the question above to include more contexts or details? The revised questions can be any of the follows:\n1. Adding some context to the original question. The context might state the importance of the question, explain background knowledge, or add other reasonable information.\n2. Change the questions into a different format or style, e.g., imperative statements, length requirements for the answer, etc.\n3. Elongated questions that require to elaborate on specific topic or discuss a certain point.\n4. Any other related questions or statements.\n\nThe revised question should contain two, three, or four sentences. You should generate {n_revisions} revised questions or statements in a list. Make them as diverse as possible."
```

This prompt guides the model to revise each question in various ways, such as adding context, rephrasing into different styles, or elaborating on specific aspects. The diversity of revisions ensures that each question is enriched with additional detail while maintaining relevance to its original intent.

Each question is revised independently. If a `YamlConversionError` is raised, the function retries that question up to `n_retries` times; a question that fails every attempt is skipped, so the returned list can be shorter than the input. The function strips only a leading `1. ` from each response, so if the model ignores `n_revisions=1` and returns several numbered revisions, they all remain in a single string, as the outputs below show.

In [6]:
from nemo_curator.synthetic.prompts import DEFAULT_REVISE_OPEN_QA_PROMPT_TEMPLATE


def revise_questions(
    generator: NemotronGenerator,
    model: str,
    model_kwargs: dict,
    questions: List[str],
    prompt_template: str = DEFAULT_REVISE_OPEN_QA_PROMPT_TEMPLATE,
    n_retries: int = 5,
) -> List[str]:
    """
    Revise a list of Open QA questions using a specified model and prompt template.

    This method takes a list of questions and uses a `NemotronGenerator` to revise each question.
    If a `YamlConversionError` occurs while revising a question, it retries that question up to
    `n_retries` times. Questions that fail every attempt are skipped. Revised questions are returned as a list.

    Args:
        generator (NemotronGenerator): The generator instance used to perform question revisions.
        model (str): The name or identifier of the model to use for revision.
        model_kwargs (dict): Additional keyword arguments to pass to the model during revision.
        questions (List[str]): A list of questions to be revised.
        prompt_template (str, optional): The prompt template used by the generator. Defaults to
            `DEFAULT_REVISE_OPEN_QA_PROMPT_TEMPLATE`.
        n_retries (int, optional): The number of retry attempts for each question in case of errors.
            Defaults to 5.

    Returns:
        List[str]: A list of revised questions. It may be shorter than `questions` if some
            questions could not be revised within `n_retries` attempts.
    """

    # Initialize an empty list to store revised questions.
    revised_questions = []

    # Iterate through each question in the input list.
    for question in questions:
        # Attempt to revise the question up to `n_retries` times in case of errors.
        for _ in range(n_retries):
            try:
                # Use the generator's `revise_open_qa` method to revise the question.
                revised_question = generator.revise_open_qa(
                    model=model,
                    model_kwargs=model_kwargs,
                    openline=question,
                    n_revisions=1,
                    prompt_template=prompt_template,
                )[0]

                # Remove "1. " from the beginning of the revised question, if present.
                if revised_question.startswith("1. "):
                    revised_question = revised_question[3:]

                # Add the cleaned-up revised question to the results list.
                revised_questions.append(revised_question)

                # Exit retry loop upon successful revision.
                break
            except YamlConversionError as e:
                # Print an error message and retry if an exception occurs.
                print(f"Hit: {e}, Retrying...")

    # Return the list of all revised questions.
    return revised_questions


# Generate a list of revised English questions from the original list of questions.
revised_questions_english = revise_questions(
    generator=generator,
    model=model,
    model_kwargs=model_kwargs,
    questions=questions_english,
    prompt_template=DEFAULT_REVISE_OPEN_QA_PROMPT_TEMPLATE,
)

# Print first two revised questions.
print(revised_questions_english[:2])

["In the context of modern agriculture, which is often criticized for its reliance on chemical inputs and monoculture, understanding the role of agroecology is crucial. Agroecology promotes sustainable agriculture by emphasizing ecological processes, biodiversity, and local knowledge, thereby enhancing biodiversity conservation and ensuring long-term food security. How can we implement agroecological practices more effectively to transform our current agricultural systems and protect our planet's biodiversity? (Imperative statement, length requirement for the answer, and a call to action)\n\n2. Given the challenges of climate change, declining soil health, and loss of biodiversity, it is essential to explore alternative agricultural models. Agroecology, with its focus on ecological processes, biodiversity, and local knowledge, offers a promising solution for promoting sustainable agriculture and enhancing biodiversity conservation. Can you discuss the key principles of agroecology and 

For Spanish, we translate the revision prompt template and add an instruction to respond only in Spanish:

In [7]:
revise_question_prompt_template_spanish = (
    "Pregunta: {openline}\n\n¿Puedes revisar la pregunta anterior para incluir más contexto o detalles? "
    "Las preguntas revisadas pueden ser cualquiera de las siguientes:\n1. Añadir algo de contexto a la pregunta original. "
    "El contexto podría indicar la importancia de la pregunta, explicar conocimientos previos o añadir otra información "
    "razonable.\n2. Cambiar la pregunta a un formato o estilo diferente, por ejemplo, declaraciones imperativas, requisitos "
    "de longitud para la respuesta, etc.\n3. Preguntas más largas que requieran elaborar sobre un tema específico o discutir "
    "un punto en particular.\n4. Cualquier otra pregunta o declaración relacionada.\n\nLa pregunta revisada debe contener dos, "
    "tres o cuatro frases. Debes generar {n_revisions} preguntas o declaraciones revisadas en una lista. Hazlas lo más diversas posible. "
    "Responde usando sólo el idioma Español."
)

# Generate a list of revised Spanish questions from the original list of questions.
revised_questions_spanish = revise_questions(
    generator=generator,
    model=model,
    model_kwargs=model_kwargs,
    questions=questions_spanish,
    prompt_template=revise_question_prompt_template_spanish,
)

# Print first two revised questions.
print(revised_questions_spanish[:2])

['¿Qué es la agroecología y qué principios clave la caracterizan?\n2. ¿En qué consiste la agroecología y cuáles son sus principios fundamentales?\n3. ¿Cuáles son los principios clave de la agroecología y cómo se define esta disciplina?\n4. ¿Qué significa la agroecología y cuáles son sus principios esenciales?\n5. ¿Cómo se define la agroecología y cuáles son sus principios básicos?\n6. ¿Qué es la agroecología y cuáles son sus principios clave para su práctica?\n7. ¿Qué es la agroecología y cuáles son sus principios esenciales para su aplicación?\n8. ¿Cuáles son los principios clave de la agroecología y cómo se relacionan con su definición?\n9. ¿Qué significa la agroecología y cuáles son sus principios esenciales para su desarrollo?\n10. ¿Cuáles son los principios clave de la agroecología y cómo se relacionan con su enfoque?', '¿Cómo la agroecología contribuye a la conservación de la biodiversidad y la sostenibilidad de los sistemas agrícolas, y cómo se relaciona con la producción alimen
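As the outputs above show, the model sometimes returns several numbered revisions in one string even though `n_revisions=1`, and the whole string is later used as a single question. A minimal post-processing sketch, using a hypothetical helper `keep_first_item`, keeps only the first item. It assumes later items start on a new line with a marker such as `2. `, which matches the outputs above but is not guaranteed for every response. It is meant for questions only, since answers may legitimately contain numbered lists.

```python
import re

# A newline, optional blank lines, then a list marker such as "2. ".
_NEXT_ITEM = re.compile(r"\n\s*\d+\.\s")


def keep_first_item(response: str) -> str:
    """Keep only the first item of a numbered-list response (hypothetical helper)."""
    text = response.strip()
    # Drop a leading "1. " marker, as revise_questions already does.
    if text.startswith("1. "):
        text = text[3:]
    match = _NEXT_ITEM.search(text)
    return text[: match.start()].strip() if match else text


print(keep_first_item("¿Qué es la agroecología?\n2. ¿En qué consiste la agroecología?"))
```

Applied to the lists above, this would be a one-line cleanup such as `[keep_first_item(q) for q in revised_questions_spanish]` before generating answers.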

### 4.2.3 Generating a List of Answers  

For each question, we will generate corresponding answers that are accurate, comprehensive, and aligned with the intended domain knowledge. To achieve this, we will use the `generate_dialogue` method to simulate a dialogue between a user and an assistant. In this simulation, the user asks a question, and the assistant provides an answer. The dialogue is restricted to only one turn to ensure simplicity and focus on generating precise question-answer pairs.

The `generate_dialogue` method takes the revised question as the opening line (`openline`) and accepts separate user and assistant models; here both roles use the same model.

We pass the library's default user-turn template, shown below, for both the English and the Spanish runs:

```python
"Here is a conversation between a user and an assistant.\n<|The Start of Assistant's Conversation with User|>\n{conversation_history}\n<|The End of Assistant's Conversation with User|>\n\nGiven the conversation above, generate a followup request or question in the tone of User. Directly give me the question without extraneous words."
```

This template asks the model to play the user and write a follow-up question based on the conversation so far. Because we set `n_user_turns=1`, the conversation should consist only of the revised question and the assistant's single reply, so a follow-up user turn is not needed. Keeping the dialogue to one turn gives us exactly one answer per question, which we can add directly to our supervised fine-tuning (SFT) dataset.
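To make the data flow concrete, here is a sketch of the validation `generate_answers` applies to each dialogue, written as a standalone, hypothetical helper `extract_single_turn_answer`. It assumes, as `generate_answers` does, that a dialogue is a list of message dicts with `"content"` keys. It also checks the `"role"` keys, an assumption based on the user/assistant message format used for the reward model in section 4.3; this is stricter than the notebook's own check. The sample dialogue is made up, not model output.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def extract_single_turn_answer(dialogue: List[Message]) -> Optional[str]:
    """Return the assistant reply from a [user, assistant] dialogue, else None."""
    if len(dialogue) != 2:
        return None
    user_msg, assistant_msg = dialogue
    if user_msg.get("role") != "user" or assistant_msg.get("role") != "assistant":
        return None
    answer = assistant_msg.get("content")
    if answer is None:
        return None
    # Same clean-up as generate_answers: drop a leading "1. " list marker.
    return answer[3:] if answer.startswith("1. ") else answer


# Hypothetical one-turn dialogue in the expected shape.
dialogue = [
    {"role": "user", "content": "What is agroecology?"},
    {"role": "assistant", "content": "1. Agroecology applies ecological principles to farming."},
]
print(extract_single_turn_answer(dialogue))
```

Returning `None` instead of raising lets the caller decide whether to retry, which is how `generate_answers` treats a malformed dialogue.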

In [8]:
from nemo_curator.synthetic.prompts import DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE


def generate_answers(
    user_model: str,
    user_model_kwargs: dict,
    assistant_model: str,
    assistant_model_kwargs: dict,
    questions: List[str],
    prompt_template: str = DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE,
    n_retries: int = 5,
) -> List[dict]:
    """
    Generate answers to a list of questions using a dialogue-based model interaction.

    This function simulates a dialogue between a user model and an assistant model to generate
    answers for the given questions. It retries the generation process up to `n_retries` times
    if an error occurs. The output is a list of dictionaries, each containing the original question
    and its corresponding generated answer.

    Args:
        user_model (str): The name or identifier of the user model.
        user_model_kwargs (dict): Additional keyword arguments to configure the user model.
        assistant_model (str): The name or identifier of the assistant model.
        assistant_model_kwargs (dict): Additional keyword arguments to configure the assistant model.
        questions (List[str]): A list of questions for which answers are to be generated.
        prompt_template (str, optional): The prompt template used for dialogue generation.
            Defaults to `DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE`.
        n_retries (int, optional): The number of retry attempts for each question in case of errors.
            Defaults to 5.

    Returns:
        List[dict]: A list of dictionaries, where each dictionary contains:
            - "question" (str): The original question.
            - "answer" (str): The generated answer.
        Questions for which no valid answer was produced within `n_retries` attempts
        are omitted from the result.
    """

    # Initialize an empty list to store question-answer pairs.
    questions_answers = []

    # Iterate through each question in the input list.
    for question in questions:
        # Attempt to generate an answer up to `n_retries` times in case of errors.
        for _ in range(n_retries):
            try:
                # Simulate a one-turn dialogue with `generate_dialogue`. This uses the
                # module-level `generator` created earlier; it is not a parameter of this function.
                dialogue = generator.generate_dialogue(
                    openline=question,
                    user_model=user_model,
                    user_model_kwargs=user_model_kwargs,
                    assistant_model=assistant_model,
                    assistant_model_kwargs=assistant_model_kwargs,
                    n_user_turns=1,
                    prompt_template=prompt_template,
                )

                # Check if the dialogue contains a valid response from the assistant.
                if len(dialogue) == 2 and "content" in dialogue[1]:
                    answer = dialogue[1]["content"]  # Extract the answer content.

                    # Remove "1. " from the beginning of the answer, if present.
                    if answer.startswith("1. "):
                        answer = answer[3:]

                    # Append the question-answer pair as a dictionary to the results list.
                    questions_answers.append({"question": question, "answer": answer})

                    # Exit retry loop upon successful generation.
                    break
            except YamlConversionError as e:
                # Print an error message and retry if an exception occurs.
                print(f"Hit: {e}, Retrying...")

    # Return the list of all question-answer pairs.
    return questions_answers


# Generate a list of English question-answer pairs derived from the revised questions.
questions_answers_english = generate_answers(
    user_model=model,
    user_model_kwargs=model_kwargs,
    assistant_model=model,
    assistant_model_kwargs=model_kwargs,
    questions=revised_questions_english,
    prompt_template=DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE,
)

# Print first two question-answer pairs.
print(questions_answers_english[:2])

[{'question': "In the context of modern agriculture, which is often criticized for its reliance on chemical inputs and monoculture, understanding the role of agroecology is crucial. Agroecology promotes sustainable agriculture by emphasizing ecological processes, biodiversity, and local knowledge, thereby enhancing biodiversity conservation and ensuring long-term food security. How can we implement agroecological practices more effectively to transform our current agricultural systems and protect our planet's biodiversity? (Imperative statement, length requirement for the answer, and a call to action)\n\n2. Given the challenges of climate change, declining soil health, and loss of biodiversity, it is essential to explore alternative agricultural models. Agroecology, with its focus on ecological processes, biodiversity, and local knowledge, offers a promising solution for promoting sustainable agriculture and enhancing biodiversity conservation. Can you discuss the key principles of agr

We repeat the process for the revised Spanish questions, reusing the same function and default template. Because each Spanish question is the opening user turn, the model answers in Spanish, as the output below shows.

In [9]:
# Generate a list of Spanish question-answer pairs derived from the revised questions.
questions_answers_spanish = generate_answers(
    user_model=model,
    user_model_kwargs=model_kwargs,
    assistant_model=model,
    assistant_model_kwargs=model_kwargs,
    questions=revised_questions_spanish,
    prompt_template=DIALOGUE_NORMAL_USER_TURN_PROMPT_TEMPLATE,
)

# Print first two question-answer pairs.
print(questions_answers_spanish[:2])

[{'question': '¿Qué es la agroecología y qué principios clave la caracterizan?\n2. ¿En qué consiste la agroecología y cuáles son sus principios fundamentales?\n3. ¿Cuáles son los principios clave de la agroecología y cómo se define esta disciplina?\n4. ¿Qué significa la agroecología y cuáles son sus principios esenciales?\n5. ¿Cómo se define la agroecología y cuáles son sus principios básicos?\n6. ¿Qué es la agroecología y cuáles son sus principios clave para su práctica?\n7. ¿Qué es la agroecología y cuáles son sus principios esenciales para su aplicación?\n8. ¿Cuáles son los principios clave de la agroecología y cómo se relacionan con su definición?\n9. ¿Qué significa la agroecología y cuáles son sus principios esenciales para su desarrollo?\n10. ¿Cuáles son los principios clave de la agroecología y cómo se relacionan con su enfoque?', 'answer': 'Agroecología es una disciplina científica interdisciplinaria que se ocupa del estudio y la práctica de los sistemas agroecosistémicos, con 

## 4.3 Evaluating Question/Answer Pairs Using a Reward Model
Each question-answer pair is assigned a reward score that reflects its quality, including correctness, clarity, and alignment with the subtopic. Scoring every pair lets us filter out weak ones, so that the dataset meets high-quality standards and is suitable for fine-tuning.

We can use the same `openai_client` to query NVIDIA's reward model, [llama-3.1-nemotron-70b-reward](https://build.nvidia.com/nvidia/llama-3_1-nemotron-70b-reward), which has ranked at the top of the [RewardBench leaderboard](https://huggingface.co/spaces/allenai/reward-bench).

The model expects a conversation between a User and an Assistant, following the pattern:

    messages = [
        {
            "role": "user", 
            "content": "User input"
        },
        {
            "role": "assistant",
            "content": "Assistant output"
        }
    ]

The Llama-3.1-Nemotron-70B-Reward model provides a single floating-point reward score that reflects the overall quality of the assistant's response, returned as message text in the form `reward:<score>`. This score evaluates attributes such as helpfulness, factual correctness, coherence, and alignment with the prompt. Scores are typically negative, with higher (less negative) values indicating better-quality responses for the same prompt. Scores are not directly comparable across different prompts and can vary widely, sometimes reaching values like -60 or lower for poor responses.

The score can be used to:
- **Rank responses**: Compare multiple responses to the same prompt, with higher scores indicating better quality.
- **Filter responses**: Set a threshold to exclude low-quality responses, adjusting based on your use case.
- **Guide optimization**: Use in reinforcement learning from human feedback (RLHF) to improve model alignment.
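Because scores are only meaningful relative to other responses to the same prompt, ranking should be done per prompt. The sketch below assumes the scores have already been obtained (for example, by calling the reward model once per response); the `rank_responses` and `best_per_prompt` names and all sample scores are hypothetical and purely illustrative.

```python
from typing import Dict, List, Tuple


def rank_responses(scored: List[Tuple[str, float]]) -> List[Tuple[str, float]]:
    """Sort (response, score) pairs for ONE prompt from best to worst.

    Higher (less negative) reward scores indicate better responses.
    """
    return sorted(scored, key=lambda item: item[1], reverse=True)


def best_per_prompt(
    scores_by_prompt: Dict[str, List[Tuple[str, float]]]
) -> Dict[str, str]:
    """Pick the highest-scoring response separately for each prompt.

    Scores are compared only within a prompt, never across prompts.
    """
    return {
        prompt: rank_responses(candidates)[0][0]
        for prompt, candidates in scores_by_prompt.items()
        if candidates  # skip prompts with no candidate responses
    }


# Hypothetical scores for illustration only (not real model outputs).
scores_by_prompt = {
    "I am going to Paris, what should I see?": [
        ("Visit the Louvre, the Eiffel Tower, and Montmartre.", -12.5),
        ("Paris is a city.", -31.0),
    ],
    "What is agroecology?": [
        ("A short, vague answer.", -25.0),
        ("A detailed answer covering principles and practices.", -9.75),
    ],
}

best = best_per_prompt(scores_by_prompt)
for prompt, response in best.items():
    print(f"{prompt!r} -> {response!r}")
```

Keeping the comparison inside each prompt's candidate list is what makes this safe: a -25.0 for one prompt and a -12.5 for another say nothing about which answer is better overall.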

For general use, the raw floating-point score is recommended for precise evaluation.

Here is an example demonstrating how to use that model:

In [10]:
messages = [
    {"role": "user", "content": "I am going to Paris, what should I see?"},
    {
        "role": "assistant",
        "content": "Ah, Paris, the City of Light! There are so many amazing things to see and do in this beautiful city...",
    },
]


response = openai_client.chat.completions.create(
    model="nvidia/llama-3.1-nemotron-70b-reward",
    messages=messages,
)

reward = response.choices[0].message.content
print(reward)

reward:-18.625


The following helper function wraps this call and parses the score into a float, which makes the rest of our work more concise.

In [11]:
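from typing import Dict, List


def get_reward(messages: List[Dict[str, str]]) -> float:
    # Query the reward model with a user/assistant conversation.
    response = openai_client.chat.completions.create(
        model="nvidia/llama-3.1-nemotron-70b-reward",
        messages=messages,
    )

    # The model replies with text such as "reward:-18.625"; keep only the number.
    reward = response.choices[0].message.content
    return float(reward.split(":")[1])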

In [12]:
get_reward(messages)

-18.625

In [13]:
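from typing import Dict, List


def compose_messages(user_content: str, assistant_content: str) -> List[Dict[str, str]]:
    # Build the two-turn user/assistant conversation expected by the reward model.
    return [
        {"role": "user", "content": user_content},
        {"role": "assistant", "content": assistant_content},
    ]


# Invoke compose_messages with the first question-answer pair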
messages = compose_messages(
    user_content=questions_answers_english[0]["question"],
    assistant_content=questions_answers_english[0]["answer"],
)

messages

[{'role': 'user',
  'content': "In the context of modern agriculture, which is often criticized for its reliance on chemical inputs and monoculture, understanding the role of agroecology is crucial. Agroecology promotes sustainable agriculture by emphasizing ecological processes, biodiversity, and local knowledge, thereby enhancing biodiversity conservation and ensuring long-term food security. How can we implement agroecological practices more effectively to transform our current agricultural systems and protect our planet's biodiversity? (Imperative statement, length requirement for the answer, and a call to action)\n\n2. Given the challenges of climate change, declining soil health, and loss of biodiversity, it is essential to explore alternative agricultural models. Agroecology, with its focus on ecological processes, biodiversity, and local knowledge, offers a promising solution for promoting sustainable agriculture and enhancing biodiversity conservation. Can you discuss the key 

In the code cell above, the `compose_messages` function creates a list of dictionaries representing a user-assistant exchange. Each dictionary has two keys: `"role"`, which specifies whether the message comes from the user or the assistant, and `"content"`, which holds the message text. The function accepts two arguments: `user_content` for the user's input and `assistant_content` for the assistant's response.

The example then calls `compose_messages` with the first question-answer pair from the `questions_answers_english` list: `questions_answers_english[0]["question"]` provides the user's question, and `questions_answers_english[0]["answer"]` supplies the assistant's response.

The resulting `messages` variable holds a structured list of dictionaries that can be passed directly to our reward model.
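Since the reward model expects a user turn followed by an assistant turn, a small check can catch malformed conversations (for example, swapped roles or an empty answer) before an API call is spent on them. The hypothetical `validate_messages` helper below is a sketch that assumes exactly the two-turn format produced by `compose_messages`; it is not part of NeMo Curator or the reward model's API.

```python
from typing import Dict, List


def validate_messages(messages: List[Dict[str, str]]) -> List[str]:
    """Return a list of problems found in a user/assistant exchange.

    Assumes the two-turn format produced by compose_messages:
    one "user" message followed by one "assistant" message.
    An empty list means no problems were found.
    """
    problems: List[str] = []
    expected_roles = ["user", "assistant"]
    actual_roles = [message.get("role") for message in messages]
    if actual_roles != expected_roles:
        problems.append(f"expected roles {expected_roles}, got {actual_roles}")
    # Every turn must carry non-blank text content.
    for position, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            problems.append(f"message {position} has empty or missing content")
    return problems


good = [
    {"role": "user", "content": "What is agroecology?"},
    {"role": "assistant", "content": "Agroecology applies ecological principles to farming."},
]
bad = [
    {"role": "assistant", "content": "Agroecology applies ecological principles to farming."},
    {"role": "user", "content": "   "},
]
print(validate_messages(good))
print(validate_messages(bad))
```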

Let's assess the quality of the provided question and answer pair:

In [14]:
reward = get_reward(messages)
print(reward)

-16.875


To decide whether a question-answer pair belongs in our SFT dataset, we compare its reward score against a minimum threshold of -20.0 and retain only pairs that score at or above it.

We encapsulate this logic in a function named `evaluate_question_answers` to streamline the evaluation process and keep it consistent across the dataset.
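The threshold rule itself does not depend on the API call. The sketch below applies it to precomputed scores using a hypothetical helper, `partition_by_threshold`; the pairs and scores are placeholders. Note the `>=` comparison: a pair scoring exactly -20.0 is retained.

```python
from typing import Dict, List, Tuple


def partition_by_threshold(
    pairs: List[Dict[str, str]],
    scores: List[float],
    min_threshold: float = -20.0,
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split pairs into (retained, discarded) using precomputed reward scores.

    A pair is retained when its score is greater than or equal to min_threshold.
    """
    if len(pairs) != len(scores):
        raise ValueError("pairs and scores must have the same length")
    retained: List[Dict[str, str]] = []
    discarded: List[Dict[str, str]] = []
    for pair, score in zip(pairs, scores):
        if score >= min_threshold:
            retained.append(pair)
        else:
            discarded.append(pair)
    return retained, discarded


# Illustrative placeholder pairs and scores.
pairs = [{"question": f"Q{i}", "answer": f"A{i}"} for i in range(1, 5)]
scores = [-16.875, -20.0, -25.5, -10.25]
retained, discarded = partition_by_threshold(pairs, scores)
print(len(retained), "retained,", len(discarded), "discarded")  # 3 retained, 1 discarded
```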

In [16]:
from typing import Dict, List, Tuple


def get_reward(messages: List[Dict[str, str]]) -> float:
    """Retrieve the reward score for a conversation from the Llama-3.1-Nemotron-70B-Reward model.

    Args:
        messages: A list of message dictionaries with 'role' and 'content' keys,
            representing a conversation between user and assistant.

    Returns:
        The floating-point reward score extracted from the model's response.

    Raises:
        ValueError: If the response format is invalid or the reward cannot be parsed.
    """
    response = openai_client.chat.completions.create(
        model="nvidia/llama-3.1-nemotron-70b-reward",
        messages=messages,
    )
    reward_str = response.choices[0].message.content
    try:
        return float(reward_str.split(":")[1])
    except (IndexError, ValueError) as e:
        raise ValueError(f"Failed to parse reward from response: {reward_str}") from e


def evaluate_question_answers(
    model: str,
    question_answer_pairs: List[Dict[str, str]],
    min_threshold: float = -20.0,
    max_retries: int = 5,
) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Evaluate question-answer pairs using a reward model and classify them based on a threshold.

    This function queries the Llama-3.1-Nemotron-70B-Reward model to score the quality of each
    question-answer pair. Pairs with a reward score meeting or exceeding the specified threshold
    are retained; others are discarded. Retries are attempted for failed evaluations.

    Args:
        model: Currently unused; the reward model identifier is fixed inside get_reward.
        question_answer_pairs: A list of dictionaries, each containing 'question' and 'answer' keys.
        min_threshold: The minimum reward score required to retain a pair. Defaults to -20.0.
        max_retries: The maximum number of retry attempts for failed evaluations. Defaults to 5.

    Returns:
        A tuple of two lists:
        - List of retained question-answer pairs (score >= min_threshold).
        - List of discarded question-answer pairs (score < min_threshold).

    Raises:
        RuntimeError: If evaluation fails for a pair after all retries are exhausted.
    """
    retained_pairs: List[Dict[str, str]] = []
    discarded_pairs: List[Dict[str, str]] = []

    for index, pair in enumerate(question_answer_pairs):
        for attempt in range(max_retries):
            try:
                # Prepare messages for the reward model
                messages = compose_messages(
                    user_content=pair["question"],
                    assistant_content=pair["answer"],
                )

                # Query the reward model
                reward = get_reward(messages)

                # Classify based on threshold
                if reward >= min_threshold:
                    print(
                        f"Reward: {reward:.2f}. Question-answer pair {index + 1} retained."
                    )
                    retained_pairs.append(pair)
                else:
                    print(
                        f"Reward: {reward:.2f}. Question-answer pair {index + 1} discarded."
                    )
                    discarded_pairs.append(pair)

                break  # Exit retry loop on success
            except Exception as e:
                print(
                    f"Error on pair {index + 1}, attempt {attempt + 1}: {e}. Retrying..."
                )
                if attempt == max_retries - 1:
                    raise RuntimeError(
                        f"Failed to evaluate pair {index + 1} after {max_retries} attempts."
                    )

    return retained_pairs, discarded_pairs
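`evaluate_question_answers` retries a failed call immediately. When failures come from rate limits or transient network errors, waiting between attempts is often more effective. The hypothetical `call_with_backoff` helper below sketches exponential backoff around any zero-argument callable; the default delay values are assumptions, not recommendations from the API provider.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    func: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying with exponentially growing delays on failure.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last exception once max_retries attempts have failed.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))
    # Only reached when max_retries <= 0, so func was never called.
    raise ValueError("max_retries must be at least 1")


# Usage with a stand-in function that fails twice before succeeding.
calls = {"count": 0}


def flaky_reward() -> float:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return -16.875


delays = []
result = call_with_backoff(flaky_reward, sleep=delays.append)
print(result, delays)  # -16.875 [1.0, 2.0]
```

Inside the notebook, the same helper could wrap the real request, for example `call_with_backoff(lambda: get_reward(messages))`. Injecting `sleep` keeps the helper easy to test without actually waiting.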

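Evaluating the Spanish question-answer pairs requires no prompt changes, because the pairs are sent to the reward model as-is and the model can understand Spanish. The Spanish pairs can be passed to `evaluate_question_answers` in exactly the same way as `questions_answers_english`.

The code for the English pairs is as follows: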

In [17]:
# Evaluate English question-answer pairs
retained_english_pairs, discarded_english_pairs = evaluate_question_answers(
    model=model,
    question_answer_pairs=questions_answers_english,
    min_threshold=-20.0,
)

# Print summary of results
print(
    f"Questions and Answers: {len(retained_english_pairs)} retained, "
    f"{len(discarded_english_pairs)} discarded."
)

# Print first two retained question-answer pairs
print(retained_english_pairs[:2])

Reward: -16.88. Question-answer pair 1 retained.
Reward: -20.00. Question-answer pair 2 retained.
Reward: -16.12. Question-answer pair 3 retained.
Reward: -10.25. Question-answer pair 4 retained.
Questions and Answers: 4 retained, 0 discarded.
[{'question': "In the context of modern agriculture, which is often criticized for its reliance on chemical inputs and monoculture, understanding the role of agroecology is crucial. Agroecology promotes sustainable agriculture by emphasizing ecological processes, biodiversity, and local knowledge, thereby enhancing biodiversity conservation and ensuring long-term food security. How can we implement agroecological practices more effectively to transform our current agricultural systems and protect our planet's biodiversity? (Imperative statement, length requirement for the answer, and a call to action)\n\n2. Given the challenges of climate change, declining soil health, and loss of biodiversity, it is essential to explore alternative agricultural 
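The retained pairs are the raw material for the SFT dataset. One common way to store them is JSON Lines, one conversation per line, in the same user/assistant message format used for the reward model. The sketch below assumes that format; the `to_sft_records` and `write_jsonl` names and the `sft_dataset.jsonl` filename are hypothetical, and the exact schema your fine-tuning framework expects may differ.

```python
import json
from typing import Dict, List


def to_sft_records(pairs: List[Dict[str, str]]) -> List[Dict[str, List[Dict[str, str]]]]:
    """Convert question-answer pairs into chat-style SFT records."""
    return [
        {
            "messages": [
                {"role": "user", "content": pair["question"]},
                {"role": "assistant", "content": pair["answer"]},
            ]
        }
        for pair in pairs
    ]


def write_jsonl(records: List[dict], path: str) -> None:
    """Write one JSON object per line; ensure_ascii=False keeps Spanish accents readable."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


# Placeholder pair; in the notebook this could be retained_english_pairs instead.
sample_pairs = [
    {
        "question": "¿Qué es la agroecología?",
        "answer": "Es una disciplina que aplica principios ecológicos a la agricultura.",
    },
]
records = to_sft_records(sample_pairs)
write_jsonl(records, "sft_dataset.jsonl")
```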

---
<h2 style="color:green;">Congratulations!</h2>

In this notebook, you used NeMo Curator to generate question-answer pairs for a specific subtopic in both English and Spanish, and used a reward model to retain only the high-quality pairs.

In the next notebook, we will explore how to generate **Math Problems Along With Their Solutions** on a specific topic, provided in both languages.