# Create a Feedback Dataset

The `FeedbackTask` datasets allow you to combine multiple questions of different kinds, so the first step will be to define the aim of your project and the kind of data and feedback you will need to get there.

With the Python client, you can create a `FeedbackTask` dataset either locally with `rg.FeedbackDataset` or by pushing it directly to Argilla with `rg.create_feedback_dataset()`. This guide shows both options. Before you start, define the scope of the project: the `fields`, the `questions`, and the guidelines (if applicable) that you'll need.

The `rg.create_feedback_dataset()` function expects the following arguments:

- `name`: The name of the dataset in Argilla.
- `workspace` (optional): The name of the workspace where the dataset will be created. If you don't provide one, it will be placed in the default workspace attached to the API key used in `rg.init()`.
- `guidelines` (optional): A set of guidelines for the annotators. These will appear in the dataset settings in the UI.
- `fields`: The list of fields to show in the record card. The order in which the fields will appear in the UI matches the order of this list.
- `questions`: The list of questions to show in the form. The order in which the questions will appear in the UI matches the order of this list.

Alternatively, to create the dataset locally, use `rg.FeedbackDataset`, which expects the following arguments:

- `guidelines` (optional): A set of guidelines for the annotators. These will appear in the dataset settings in the UI.
- `fields`: The list of fields to show in the record card. The order in which the fields will appear in the UI matches the order of this list.
- `questions`: The list of questions to show in the form. The order in which the questions will appear in the UI matches the order of this list.

A quick example is presented below, where we create a `FeedbackDataset` locally to assess the quality of a response in a question-answering task. The `FeedbackDataset` contains two fields (question and answer), two questions (one to rate the quality of the answer and one to correct it, if needed), and a guideline for the annotators.

In [None]:
import argilla as rg

dataset = rg.FeedbackDataset(
    guidelines="Please, read the question carefully and try to answer it as accurately as possible.",
    fields=[
        rg.TextField(name="question"),
        rg.TextField(name="answer"),
    ],
    questions=[
        rg.RatingQuestion(
            name="answer_quality",
            description="How would you rate the quality of the answer?",
            values=[1, 2, 3, 4, 5],
        ),
        rg.TextQuestion(
            name="answer_correction",
            description="If you think the answer is not accurate, please, correct it.",
            required=False,
        ),
    ]
)

Each of these elements is explained in detail in the following subsections.

## Format records

A record in Argilla refers to a data item that requires annotation and can consist of one or multiple fields. For example, your records can include a pair of a prompt and an output. Currently, we only support plain text fields, but we plan to introduce support for markdown and images in the future.

Take some time to explore and find data that fits the purpose of your project. If you are planning to use public data, the [Datasets page](https://huggingface.co/datasets) of the Hugging Face Hub is a good place to start.

<div class="admonition hint">
Always check the license of a dataset to make sure you can legally use it for your specific use case.
</div>

Once you have a dataset, load it and inspect it to find the fields that you want to use in your Feedback dataset. A quick overview of the data will also help you formulate the right questions later.

In [None]:
from datasets import load_dataset

dataset = load_dataset('databricks/databricks-dolly-15k', split='train')
dataset

In [None]:
import pandas as pd

# turn it into a pandas dataframe to get a quick overview of a few examples
df = pd.DataFrame(dataset)
df
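
Before filtering, it can also help to count how many examples fall into each category. The following is a minimal sketch using only the standard library. The `sample_rows` list is made-up data standing in for the loaded dataset, and the sketch assumes each row has a `category` key, as in the dataset used in this guide.

```python
from collections import Counter

# Made-up rows standing in for the loaded dataset
# (assumption: each row has a "category" key).
sample_rows = [
    {"instruction": "Who wrote Hamlet?", "response": "William Shakespeare.", "category": "open_qa"},
    {"instruction": "Summarize the text.", "response": "A short summary.", "category": "summarization"},
    {"instruction": "What is the capital of France?", "response": "Paris.", "category": "open_qa"},
]

# Count how many rows belong to each category.
category_counts = Counter(row["category"] for row in sample_rows)

# Print the categories from most to least frequent.
for category, count in category_counts.most_common():
    print(f"{category}: {count}")
```

A tally like this makes it easier to decide which subset of the data is worth turning into records.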

The next step is to create records following Argilla's Feedback Record format [link to Python reference].

The names of the fields must match the fields set up in the dataset configuration (see [Create and import a dataset](../practical_guides/create_and_import_dataset.ipynb)).

In [None]:
# as we create the records, we can rename the fields and optionally filter the original dataset
records = [
    rg.FeedbackRecord(fields={"question": record["instruction"], "answer": record["response"]})
    for record in dataset
    if record["category"] == "open_qa"
]
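
Because the record keys must match the field names configured in the dataset, a quick check before building the records can catch mismatches early. The sketch below uses plain dictionaries rather than `rg.FeedbackRecord`, so it runs without Argilla. `FIELD_NAMES`, `to_fields`, and the sample rows are hypothetical names and data introduced only for illustration.

```python
# Field names assumed to be configured in the dataset (as in the example in this guide).
FIELD_NAMES = {"question", "answer"}


def to_fields(row):
    """Rename the source columns to the field names used in the dataset."""
    return {"question": row["instruction"], "answer": row["response"]}


# Made-up rows standing in for the loaded dataset.
sample_rows = [
    {"instruction": "Who wrote Hamlet?", "response": "William Shakespeare.", "category": "open_qa"},
    {"instruction": "Summarize the text.", "response": "A short summary.", "category": "summarization"},
]

# Keep only the "open_qa" rows and rename their columns.
record_fields = [to_fields(row) for row in sample_rows if row["category"] == "open_qa"]

# In this sketch, every record is expected to provide exactly the configured field names.
mismatched = [fields for fields in record_fields if set(fields) != FIELD_NAMES]
print(f"{len(record_fields)} records, {len(mismatched)} with mismatched field names")
```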

## Define `fields`

Before adding records to the `FeedbackDataset`, you need to define the fields of the record, that is, the data to annotate. The fields are provided as a list of `FieldSchema` objects, each defining the configuration of one field. As of Argilla 1.8.0, only `TextField` is supported, which, as its name suggests, is a plain text field.

We have plans to expand the range of supported field types in future releases of Argilla.

You can define the fields using the Python SDK by providing the following arguments:

- `name`: The name of the field, as it will be seen internally.
- `title` (optional): The name of the field, as it will be displayed in the UI. Defaults to the `name` value, but capitalized.
- `required` (optional): Whether the field is required or not. Defaults to `True`.

Note that at least one field must be required, which implies `required=True` for at least one of the fields.

In [None]:
fields = [
    rg.TextField(name="question", required=True),
    rg.TextField(name="answer", required=True),
]
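
The rule that at least one field must be required can be illustrated without Argilla. This is a minimal sketch: `FieldSpec` and `has_required` are hypothetical stand-ins, not part of the Argilla SDK, and only mirror the `name` and `required` arguments described above.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    """Hypothetical stand-in for a field definition (not an Argilla class)."""

    name: str
    required: bool = True  # mirrors the default described above


def has_required(specs):
    """Return True if at least one definition is marked as required."""
    return any(spec.required for spec in specs)


# One required field and one optional field: the rule is satisfied.
valid = [FieldSpec(name="question"), FieldSpec(name="answer", required=False)]
# Every field is optional: the rule is violated.
invalid = [FieldSpec(name="question", required=False), FieldSpec(name="answer", required=False)]

print(has_required(valid))
print(has_required(invalid))
```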

## Define `questions`

To collect feedback for your dataset, you need to formulate questions. The Feedback Task currently supports the following types of questions:

- Rating: These questions require annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.
- Text: These questions offer annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

<div class="admonition note">
We have plans to expand the range of supported question types in future releases of the Feedback Task.
</div>

You can define your questions using the Python SDK and set up the following configurations:

- `name`: The name of the question, as it will be seen internally.
- `title` (optional): The name of the question, as it will be displayed in the UI. Defaults to the `name` value, but capitalized.
- `required` (optional): Whether the question is required or not. Defaults to `True`.
- `description` (optional): The text to be displayed in the question tooltip in the UI. You can use it to give more context or information to annotators.

Additionally, if the question is a `RatingQuestion`, you'll also need to specify:
- `values`: The rating options to answer the `RatingQuestion`. It must be a list of unique integers; the values don't need to be positive or sequential.
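
The constraint on `values` can be sketched as a small check. `check_rating_values` is a hypothetical helper written for this guide, not an Argilla function. It only illustrates that any list of unique integers is acceptable.

```python
def check_rating_values(values):
    """Return True if `values` contains only unique integers."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    all_ints = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
    all_unique = len(set(values)) == len(values)
    return all_ints and all_unique


print(check_rating_values([1, 2, 3, 4, 5]))  # sequential positive integers
print(check_rating_values([-2, 0, 7]))       # negative and non-sequential integers
print(check_rating_values([1, 1, 2]))        # contains a duplicate
print(check_rating_values([1, 2.5]))         # contains a non-integer
```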

Note that at least one question must be required, which implies `required=True` for at least one of the questions.

<div class="admonition note">
The order of the questions in the UI follows the order in which these are added to the dataset in the Python SDK.
</div>

In [None]:
# list of questions to display in the feedback form
questions = [
    rg.RatingQuestion(
        name="rating",
        title="Rate the quality of the response:",
        description="1 = very bad - 5 = very good",
        required=True,
        values=[1, 2, 3, 4, 5]
    ),
    rg.TextQuestion(
        name="corrected-text",
        title="Provide a correction to the response:",
        required=False
    )
]

## Define `guidelines`

Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:

- In the dataset guidelines: this is added as an argument when you create your dataset in the Python SDK (see below). It will appear in the dataset settings in the UI.
- As question descriptions: these are added as an argument when you create questions in the Python SDK (see above). This text will appear in a tooltip next to the question in the UI.

It is good practice to use at least the dataset guidelines, if not both methods. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc. Question descriptions should be short and provide context for a specific question. They can be a summary of the guidelines for that question, but often that is not sufficient to align the whole annotation team.

## Create `FeedbackDataset`

* TODO: show both alternatives `rg.FeedbackDataset` with the fields, questions, and guidelines defined above, and `rg.create_feedback_dataset()`
* TODO: link to `import_export_dataset.ipynb` to see how to upload it to Argilla or HuggingFace Hub.
* TODO: add/format records once the dataset has been created not before