# Practical Guide

This guide will help you with all the practical aspects of setting up an annotation project for training and fine-tuning LLMs using Argilla's Feedback Task Datasets. It covers everything from defining your task to collecting, organizing, and using the feedback effectively.

[Add a snapshot of the task]

In [1]:
import argilla as rg

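# to connect to your Argilla instance, uncomment the lines below
# and replace API_URL and API_KEY with your own values
# rg.init(
#     api_url=API_URL, api_key=API_KEY
# )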

## Define the task
Feedback Task Datasets allow you to combine multiple questions of different kinds, so the first step is to define the aim of your project and the kind of data and feedback you will need to get there.

### Format records
A record in Argilla refers to a data item that requires annotation and can consist of one or multiple fields. For example, your records can include a pair of a prompt and an output. Currently, we only support plain text fields, but we plan to introduce support for markdown and images in the future.

Take some time to explore and find data that fits the purpose of your project. If you are planning to use public data, the [Datasets page](https://huggingface.co/datasets) of the Hugging Face Hub is a good place to start.

```{hint}
Always check the licenses of the datasets to make sure you can legally use them for your specific use case.
```

Once you have a dataset, load it and inspect it to find the fields that you want to use in your Feedback dataset. A quick overview of the data will also help you formulate the right questions later.

In [17]:
from datasets import load_dataset
import pandas as pd

dataset = load_dataset('databricks/databricks-dolly-15k', split='train')
dataset


Dataset({
    features: ['instruction', 'context', 'response', 'category'],
    num_rows: 15011
})

In [19]:
# turn it into a pandas dataframe to get a quick overview of a few examples
df = pd.DataFrame(dataset)
df

| |instruction|context|response|category|
|---|---|---|---|---|
|0|When did Virgin Australia start operating?|Virgin Australia, the trading name of Virgin A...|Virgin Australia commenced services on 31 Augu...|closed_qa|
|1|Which is a species of fish? Tope or Rope| |Tope|classification|
|2|Why can camels survive for long without water?| |Camels use the fat in their humps to keep them...|open_qa|
|3|Alice's parents have three daughters: Amy, Jes...| |The name of the third daughter is Alice|open_qa|
|4|When was Tomoaki Komorida born?|Komorida was born in Kumamoto Prefecture on Ju...|Tomoaki Komorida was born on July 10,1981.|closed_qa|
|...|...|...|...|...|
|15006|How do i accept the change| |Embrace the change and see the difference|brainstorming|
|15007|What is a laser and who created it?|A laser is a device that emits light through a...|A laser is a device that emits light from an e...|summarization|
|15008|What is the difference between a road bike and...| |Road bikes are built to be ridden on asphalt a...|open_qa|
|15009|How does GIS help in the real estate investmen...| |Real estate investors depend on precise, accur...|general_qa|


The next step is to create records following Argilla's Feedback Record format [link to Python reference].

The names of the fields in each record must match the names of the fields set up in the dataset configuration (below).

In [20]:
# list of records
records = []
for r in dataset:
    if r["category"] == "open_qa":
        records.append(
            rg.FeedbackRecord(
                fields = {
                    "question": r["instruction"],
                    "answer": r["response"]
                }
            )
        )


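
Because a mismatch between record fields and configured fields is easy to introduce, it can help to check the names before creating the records. The following is a minimal sketch in plain Python, not part of the Argilla SDK: the helper `mismatched_fields` and the example values are hypothetical, and the field names `question` and `answer` are assumed to be the ones used in the dataset configuration.

```python
def mismatched_fields(record_fields, configured_names):
    """Return the field names missing from a record and the ones not configured."""
    record_names = set(record_fields)
    configured = set(configured_names)
    return configured - record_names, record_names - configured


# field names assumed to match the dataset configuration
configured_names = ["question", "answer"]

# hypothetical record fields with a wrongly named key ("response" instead of "answer")
missing, unexpected = mismatched_fields(
    {"question": "Why is the sky blue?", "response": "Rayleigh scattering."},
    configured_names,
)
```

Here `missing` contains the configured names that the record lacks and `unexpected` contains the keys that the configuration does not know about.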

### Define questions
To collect feedback for your dataset, you need to formulate questions. The Feedback Task currently supports the following types of questions:

- Rating: These questions require annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.
- Text: These questions offer annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

```{note}
We have plans to expand the range of supported question types in future releases of the Feedback Task.
```

You can define your questions using the Python SDK and set up the following configurations:
- `name`: A short name for the question.
- `title`: The text displayed in the UI.
- `description`: The text to be displayed in the question tooltip in the UI. You can use it to give more context or information to annotators.
- `required`: Set your question as required or optional. Annotators must answer all required questions to submit a response, but they have the choice to answer optional questions or not.
- `values`: In a RatingQuestion, these are the rating options represented as a list of integer values.

```{note}
The order of the questions in the UI follows the order in which these are added to the dataset in the Python SDK.
```

In [None]:
# list of questions to display in the feedback form
questions = [
    rg.RatingQuestion(
        name="rating",
        title="Rate the quality of the response:",
        description="1 = very bad - 5 = very good",
        required=True,
        values=[1, 2, 3, 4, 5]
    ),
    rg.TextQuestion(
        name="corrected-text",
        title="Provide a correction to the response:",
        description="",
        required=False
    )
]


### Write guidelines
Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:
- In the dataset guidelines: this is added as an argument when you create your dataset in the Python SDK (see below). It will appear in the dataset settings in the UI.
- As question descriptions: these are added as an argument when you create questions in the Python SDK (see above). This text will appear in a tooltip next to the question in the UI.

It is good practice to use at least the dataset guidelines, if not both methods. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc. Question descriptions should be short and provide context for a specific question. They can summarize the guidelines for that question, but a summary alone is often not enough to align the whole annotation team.

## Set up your annotation team
Depending on the nature of your project and the size of your annotation team, you may want to control annotation overlap, i.e., having multiple annotations for a single record. You need to decide on this before pushing your dataset to Argilla, as it affects how your dataset is set up. Let's explore a few overlap options.
### Full overlap
The Feedback Task supports having multiple annotations for your records. This means that all users with access to the dataset can give responses to all the records in the dataset. Learn more about managing user access to workspaces here [link!].
### Zero overlap
If you only want one annotation per record, we recommend that you split your records into batches and assign each batch to a single annotator. Then, you can create several datasets, one in each annotator's personal workspace, and add to each dataset the records assigned to that annotator.
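
As a minimal sketch of the batching step, assuming your records are held in a plain Python list, a round-robin split gives each record to exactly one annotator. The helper `split_into_batches` and the annotator names are hypothetical and not part of the Argilla SDK; the integers stand in for real records.

```python
def split_into_batches(records, annotators):
    """Assign each record to exactly one annotator, round-robin."""
    batches = {annotator: [] for annotator in annotators}
    for i, record in enumerate(records):
        batches[annotators[i % len(annotators)]].append(record)
    return batches


# hypothetical annotator names; integers stand in for real records
batches = split_into_batches(list(range(10)), ["ann_a", "ann_b", "ann_c"])
```

Each value in `batches` can then be added to the dataset in the corresponding annotator's workspace.
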
### Controlled overlap
This option is best when you want annotation overlap, but only up to a certain number of annotators rather than the whole team. For example, you may want your team to be more efficient, or you may want to calculate the agreement between pairs of annotators. In this case, you also need to create several datasets and push them to the annotators' personal workspaces, but each record will appear in multiple datasets.
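
One way to build such an assignment, sketched here in plain Python under the assumption that records are held in a list, is to rotate through the team and give each record to a fixed number of distinct annotators. The helper `assign_with_overlap`, its `overlap` parameter, and the annotator names are hypothetical and not part of the Argilla SDK.

```python
def assign_with_overlap(records, annotators, overlap):
    """Give each record to `overlap` distinct annotators, rotating through the team."""
    if not 1 <= overlap <= len(annotators):
        raise ValueError("overlap must be between 1 and the number of annotators")
    assignments = {annotator: [] for annotator in annotators}
    for i, record in enumerate(records):
        for j in range(overlap):
            annotator = annotators[(i + j) % len(annotators)]
            assignments[annotator].append(record)
    return assignments


# hypothetical annotator names; integers stand in for real records
assignments = assign_with_overlap(
    list(range(6)), ["ann_a", "ann_b", "ann_c"], overlap=2
)
```

With this rotation, only annotators that are adjacent in the list share records, so a different scheme would be needed if you want every pair of annotators in a larger team to overlap.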



## Create your dataset
Now we are ready to create our dataset and push it to Argilla. To configure the dataset, set the following arguments:

- `name`: The name of the dataset.
- `workspace`: The workspace where the dataset will be created.
- `guidelines`: A set of guidelines for the annotators. These will appear in the dataset settings in the UI.
- `fields`: The list of fields to show in the record card. The order in which the fields will appear matches the order of this list.
- `questions`: The list of questions to show in the form.

In [None]:
# configure the Feedback dataset
dataset = rg.create_feedback_dataset(
    name="my_dataset",
    workspace="my_workspace",
    guidelines="You will see a collection of question and answer records.",
    # list of fields to display in the records
    fields = [
        rg.TextField(name="question"),
        rg.TextField(name="answer")
    ],
    questions=questions
)

# add the records to the dataset
dataset.add_records(records)

# push the dataset to Argilla
dataset.push_to_argilla()

## Annotating in the Feedback Task
Once you open the dataset, by default you will see the records with `Pending` responses, i.e., records that still don't have a response, in a single-record view. The record to annotate is on the left, and the form with all the questions to answer is on the right.

We highly recommend that you read the annotation guidelines before you start annotating. If there are any, you can find them on the dataset settings page. [describe how to get there] If a question has a description, you will find an info icon next to it. Click the icon to read the description.

In the annotation view, you can provide your responses. Once all required questions have responses, the `Submit` button is enabled and you can submit your response. If you prefer not to give a response for a record, you can move to the next record or discard it using the `Discard` button.

If you need to review your submitted or discarded responses, you can select the corresponding queue. From there, you can modify, submit or discard responses. You can also use the `Clear` button to remove the response and send the record back to the `Pending` queue.

You can track your progress and the number of `Pending`, `Submitted` and `Discarded` responses by clicking the `Progress` icon in the sidebar.

You can speed up the process by using shortcuts:
|Action|Keys|
|------|----|
|Clear|&#8679; `Shift` + &blank; `Space`|
|Discard|&#x232B; `Backspace`|
|Discard (from text area)|&#8679; `Shift` + &#x232B; `Backspace`|
|Submit|&crarr; `Enter`|
|Submit (from text area)|&#8679; `Shift` + &crarr; `Enter`|
|Go to previous page|&larr; `Left arrow`|
|Go to next page|&rarr; `Right arrow`|



## Collect responses
- Use the Python client to collect the responses to the dataset.
- Unify responses (?) -> IAA (inter-annotator agreement) techniques.
    - Majority vote, average...
    - How to calculate IAA for text fields? BLEU, ROUGE? Rating of the proposed texts.
    Make a dataset to collect human text, then rate the human text and use it for a rating exercise to get an annotator score or clean the dataset.
- Export / publish the dataset.
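
As an illustration of unifying rating responses by majority vote or average, here is a minimal sketch in plain Python. It assumes the ratings for each record have already been collected into lists of integers. The helper `unify_ratings`, the record ids, and the tie-breaking rule are assumptions for illustration and not part of the Argilla SDK.

```python
from collections import Counter
from statistics import mean


def unify_ratings(ratings, strategy="majority"):
    """Collapse several integer ratings for one record into a single value."""
    if not ratings:
        return None  # no responses for this record yet
    if strategy == "mean":
        return mean(ratings)
    if strategy == "majority":
        counts = Counter(ratings)
        top = max(counts.values())
        # ties are broken by picking the lowest tied rating (an arbitrary choice)
        return min(value for value, count in counts.items() if count == top)
    raise ValueError(f"unknown strategy: {strategy}")


# hypothetical responses: record id -> ratings from different annotators
responses = {"rec-1": [4, 5, 4], "rec-2": [2, 3], "rec-3": []}
unified = {rec_id: unify_ratings(r) for rec_id, r in responses.items()}
```

Majority vote keeps the result on the original rating scale, while the mean preserves more information when annotators disagree; which one fits better depends on how the unified value will be used.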



## Training?