# Create a Feedback Dataset

Feedback Task datasets let you combine multiple questions of different kinds, so the first step is to define the aim of your project and the kind of data and feedback you need to get there. With this information, you can start configuring a dataset and formatting records using the Python client.

This guide walks you through all the elements you need to configure to create a `FeedbackDataset` and add records to it.

<div class="alert alert-info">

Note

To follow the steps in this guide, you will first need to connect to Argilla. Check how to do so in our [cheatsheet](../../../getting_started/cheatsheet.md#connect-to-argilla).
</div>


## Define record `fields`
A record in Argilla is a data item that requires annotation. It can consist of one or multiple fields, i.e., the pieces of information that are shown to the user in the UI so that they can complete the annotation task. This can be, for example, a prompt and output pair in the case of instruction datasets.

As part of the `FeedbackDataset` configuration, you will need to specify the list of fields to show in the record card. As of Argilla 1.8.0, we only support one type of field, `TextField`, which is a plain text field. We have plans to expand the range of supported field types in future releases of Argilla.

You can define the fields using the Python SDK by providing the following arguments:

- `name`: The name of the field, as it will be seen internally.
- `title` (optional): The name of the field, as it will be displayed in the UI. Defaults to the `name` value, but capitalized.
- `required` (optional): Whether the field is required or not. Defaults to `True`. Note that at least one field must be required.

In [None]:
import argilla as rg

# fields shown in each record card, in display order
fields = [
    rg.TextField(name="question", required=True),
    rg.TextField(name="answer", required=True),
]

<div class="alert alert-info">

Note

The order of the fields in the UI follows the order in which these are added to the `fields` attribute in the Python SDK.
</div>
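
The rule that at least one field must be required can be checked before you build the dataset. The following is a minimal sketch in plain Python: the dictionaries are stand-ins for field settings, not Argilla objects, and `check_field_specs` is a hypothetical helper name. The uniqueness check on names is an assumption of this sketch and is not stated in this guide.

```python
# Plain-Python stand-ins for field settings; these dicts are NOT Argilla objects.
field_specs = [
    {"name": "question", "required": True},
    {"name": "answer", "required": True},
]


def check_field_specs(specs):
    """Return a list of problems found in the field specs (empty if none)."""
    problems = []
    names = [spec["name"] for spec in specs]
    # Assumption of this sketch: each field name should appear only once.
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        problems.append(f"duplicate field names: {duplicates}")
    # `required` is treated as True when omitted, mirroring the default above.
    if not any(spec.get("required", True) for spec in specs):
        problems.append("at least one field must be required")
    return problems


print(check_field_specs(field_specs))
```

The same kind of check could be applied to a list of question settings, since at least one question must be required as well.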

## Define `questions`

To collect feedback for your dataset, you need to formulate questions. The Feedback Task currently supports the following types of questions:

- `RatingQuestion`: These questions require annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.
- `TextQuestion`: These questions offer annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

We have plans to expand the range of supported question types in future releases of the Feedback Task.

You can define your questions using the Python SDK and set up the following configurations:

- `name`: The name of the question, as it will be seen internally.
- `title` (optional): The name of the question, as it will be displayed in the UI. Defaults to the `name` value, but capitalized.
- `required` (optional): Whether the question is required or not. Defaults to `True`. Note that at least one question must be required.
- `description` (optional): The text to be displayed in the question tooltip in the UI. You can use it to give more context or information to annotators.

Additionally, if the question is a `RatingQuestion`, you'll also need to specify:

- `values`: The rating options to answer the `RatingQuestion`. It can be any list of unique integers. It doesn't matter whether these are positive, negative, sequential or not.

In [None]:
# list of questions to display in the feedback form
questions = [
    rg.RatingQuestion(
        name="rating",
        title="Rate the quality of the response:",
        description="1 = very bad - 5 = very good",
        required=True,
        values=[1, 2, 3, 4, 5]
    ),
    rg.TextQuestion(
        name="corrected-text",
        title="Provide a correction to the response:",
        required=False
    )
]

<div class="alert alert-info">

Note

The order of the questions in the UI follows the order in which these are added to the `questions` attribute in the Python SDK.
</div>
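
The constraint on `values` (any list of unique integers) can be checked up front. This is a minimal sketch in plain Python; `check_rating_values` is a hypothetical helper, not part of the Argilla SDK, and rejecting an empty list is an assumption of this sketch.

```python
def check_rating_values(values):
    """Raise ValueError unless `values` is a non-empty list of unique integers."""
    # Assumption of this sketch: an empty list is not a usable set of ratings.
    if not isinstance(values, list) or not values:
        raise ValueError("values must be a non-empty list")
    for value in values:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            raise ValueError(f"not an integer: {value!r}")
    if len(set(values)) != len(values):
        raise ValueError("values must be unique")
    return values


check_rating_values([1, 2, 3, 4, 5])  # sequential values
check_rating_values([-1, 0, 10])      # negative and non-sequential values also pass
```

A list such as `[1, 1, 2]` or `[1, 2.5]` would raise a `ValueError` in this sketch.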

## Define `guidelines`

Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:

- In the dataset guidelines: this is added as an argument when you create your dataset in the Python SDK (see [below](#configure-the-dataset)). It will appear in the dataset settings in the UI.
- As question descriptions: these are added as an argument when you create questions in the Python SDK (see [above](#define-questions)). This text will appear in a tooltip next to the question in the UI.

It is good practice to use at least the dataset guidelines, if not both methods. Question descriptions should be short and provide context for a specific question. They can be a summary of the guidelines for that question, but often that is not sufficient to align the whole annotation team. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc.
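
One way to keep the dataset guidelines organized is to assemble them from their parts. The following is a minimal sketch in plain Python; `build_guidelines`, its parameters, and the example wording are all hypothetical and only illustrate the three ingredients mentioned above (project description, per-question instructions, discard rule).

```python
def build_guidelines(project, question_notes, discard_rule):
    """Join the parts of the guidelines into one plain-text string."""
    lines = [project, ""]
    # One line per question, keyed by the question name.
    for question_name, note in question_notes.items():
        lines.append(f"- {question_name}: {note}")
    lines += ["", f"Discard a record when: {discard_rule}"]
    return "\n".join(lines)


guidelines = build_guidelines(
    project="Rate and, if needed, correct answers to open questions.",
    question_notes={
        "rating": "1 = very bad, 5 = very good. Judge accuracy first, then clarity.",
        "corrected-text": "Only fill this in when the answer contains an error.",
    },
    discard_rule="the question is empty or unintelligible.",
)
print(guidelines)
```

The resulting string is what you would pass as the `guidelines` argument when creating the dataset.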

## Configure the dataset

Once you have defined the scope of the project, that is, the `fields`, `questions` and `guidelines` (if applicable), you can create the `FeedbackDataset`. To do so, you will need to define the following arguments:

- `fields`: The list of fields to show in the record card. The order in which the fields will appear in the UI matches the order of this list.
- `questions`: The list of questions to show in the form. The order in which the questions will appear in the UI matches the order of this list.
- `guidelines` (optional): A set of guidelines for the annotators. These will appear in the dataset settings in the UI.

If you haven't done so already, check the sections above to learn about each of them.

Below you can find a quick example where we locally create a `FeedbackDataset` to assess the quality of a response in a question-answering task. The `FeedbackDataset` contains two fields, question and answer, and two questions: one to measure the quality of the answer and one to correct it, if needed.

In [None]:
import argilla as rg

dataset = rg.FeedbackDataset(
    guidelines="Please, read the question carefully and try to answer it as accurately as possible.",
    fields=[
        rg.TextField(name="question"),
        rg.TextField(name="answer"),
    ],
    questions=[
        rg.RatingQuestion(
            name="answer_quality",
            description="How would you rate the quality of the answer?",
            values=[1, 2, 3, 4, 5],
        ),
        rg.TextQuestion(
            name="answer_correction",
            description="If you think the answer is not accurate, please, correct it.",
            required=False,
        ),
    ]
)

## Add records

At this point, we just need to add records to our `FeedbackDataset`. Take some time to explore and find data that fits the purpose of your project. If you are planning to use public data, the [Datasets page](https://huggingface.co/datasets) of the Hugging Face Hub is a good place to start.

<div class="alert alert-info">

Tip

If you are using a public dataset, remember to always check the license to make sure you can legally use it for your specific use case.
</div>

In [None]:
from datasets import load_dataset

# load and inspect a dataset from the Hugging Face Hub
hf_dataset = load_dataset('databricks/databricks-dolly-15k', split='train')
df = hf_dataset.to_pandas()
df

<div class="alert alert-info">

Hint

Take some time to inspect the data before adding it to the dataset, in case it prompts changes to the `questions` or `fields`.
</div>

The next step is to create records following Argilla's `FeedbackRecord` format. These are the attributes of a `FeedbackRecord`:

- `fields`: A dictionary with the name (key) and content (value) of each of the fields in the record. These will need to match the fields set up in the dataset configuration (see [Define record fields](#define-record-fields)).
- `external_id` (optional): An ID of the record defined by the user. If there is no external ID, this will be `None`.
- `responses` (optional): A list of all responses to a record. There is no need to configure this when creating a record; it will be filled automatically with the responses collected from the Argilla UI.

In [None]:
# create a single Feedback Record
record = rg.FeedbackRecord(
    fields={
        "question": "Why can camels survive long without water?",
        "answer": "Camels use the fat in their humps to keep them filled with energy and hydration for long periods of time."
    },
    external_id=None
)
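
Because the keys of `fields` need to match the fields configured in the dataset, it can help to compare them before creating records. This is a minimal sketch in plain Python: `configured_fields` is a hand-written copy of the field names and `required` flags from the configuration above, and `check_record_fields` is a hypothetical helper. Treating an empty string as a missing value is an assumption of this sketch.

```python
# Field name -> required flag, copied by hand from the dataset configuration.
# This is a plain dict, not an Argilla object.
configured_fields = {"question": True, "answer": True}


def check_record_fields(record_fields, configured):
    """Return a list of mismatches between a record's fields and the configuration."""
    problems = []
    # Keys that the dataset does not know about.
    for name in record_fields:
        if name not in configured:
            problems.append(f"unknown field: {name}")
    # Required fields that are absent or empty (assumption: empty counts as missing).
    for name, required in configured.items():
        if required and not record_fields.get(name):
            problems.append(f"missing required field: {name}")
    return problems


ok = {"question": "Why can camels survive long without water?", "answer": "..."}
print(check_record_fields(ok, configured_fields))
print(check_record_fields({"instruction": "x"}, configured_fields))
```

The second call reports the unknown `instruction` key and both missing required fields, which is the kind of mismatch that appears when source column names are not renamed.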

As an example, here is how you can transform a whole dataset into records at once, renaming the fields and optionally filtering the original dataset:

In [None]:
records = [
    rg.FeedbackRecord(
        fields={"question": row["instruction"], "answer": row["response"]}
    )
    for row in hf_dataset
    if row["category"] == "open_qa"
]
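
The renaming and filtering step can also be written as a small generator, which is easier to extend with extra conditions. This is a minimal sketch in plain Python: `rows_to_fields` is a hypothetical helper, and `sample_rows` contains made-up rows that only mimic the column names used above (`instruction`, `response`, `category`). Skipping rows with an empty response is an extra choice of this sketch, not something the guide requires.

```python
# Made-up rows that only mimic the column names used above;
# they are not taken from the actual dataset.
sample_rows = [
    {"instruction": "What is a field?", "response": "A piece of text.", "category": "open_qa"},
    {"instruction": "Summarize this.", "response": "A summary.", "category": "summarization"},
    {"instruction": "What is a question?", "response": "", "category": "open_qa"},
]


def rows_to_fields(rows, category):
    """Yield one `fields` dictionary per usable row of the given category."""
    for row in rows:
        if row["category"] != category:
            continue
        # Extra choice of this sketch: skip rows whose response is empty.
        if not row["response"].strip():
            continue
        yield {"question": row["instruction"], "answer": row["response"]}


field_dicts = list(rows_to_fields(sample_rows, "open_qa"))
print(field_dicts)
```

Each resulting dictionary has the shape expected by the `fields` argument of `rg.FeedbackRecord`.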

Now, we simply add our records to the dataset we configured [above](#configure-the-dataset):

In [None]:
# add records to the dataset
dataset.add_records(records)