# How-to Guide

This guide will help you with all the practical aspects of setting up an annotation project for training and fine-tuning LLMs using Argilla's Feedback Task Datasets. It covers everything from defining your task to collecting, organizing, and using the feedback effectively.

![Feedback dataset snapshot](../_static/llms/snapshot-feedback-demo.png)

In [1]:
import argilla as rg
import os

rg.init(
    api_url=os.environ.get("ARGILLA_API_URL_DEV"),
    api_key=os.environ.get("ARGILLA_API_KEY")
)

## Define the task
Feedback Task Datasets allow you to combine multiple questions of different kinds, so the first step is to define the aim of your project and the kind of data and feedback you will need to get there.

### Format records
A record in Argilla refers to a data item that requires annotation and can consist of one or multiple fields. For example, your records can include a pair of a prompt and an output. Currently, we only support plain text fields, but we plan to introduce support for markdown and images in the future.

Take some time to explore and find data that fits the purpose of your project. If you are planning to use public data, the [Datasets page](https://huggingface.co/datasets) of the Hugging Face Hub is a good place to start.

```{hint}
Always check the license of each dataset to make sure you can legally use it for your specific use case.
```

Once you have a dataset, load it and inspect it to find the fields that you want to use in your Feedback dataset. A quick overview of the data will also help you formulate the right questions later.

In [2]:
from datasets import load_dataset

dataset = load_dataset('databricks/databricks-dolly-15k', split='train')
dataset


Dataset({
    features: ['instruction', 'context', 'response', 'category'],
    num_rows: 15011
})

In [None]:
import pandas as pd

# turn it into a pandas dataframe to get a quick overview of a few examples
df = pd.DataFrame(dataset)
df

The next step is to create records following Argilla's Feedback Record format.

The names of the fields must match the fields set up in the dataset configuration (see [below](#create-your-dataset)).
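
Because a mismatch between record fields and configured fields is easy to introduce, it can help to check the names before building the records. The snippet below is a minimal sketch in plain Python, not part of the Argilla SDK: `find_field_mismatches` is a hypothetical helper, and the field names and values are placeholders.

```python
def find_field_mismatches(record_fields, configured_names):
    """Return (missing, unexpected) field names for one record."""
    expected = set(configured_names)
    actual = set(record_fields)
    return sorted(expected - actual), sorted(actual - expected)


# Hypothetical configuration: the dataset expects "question" and "answer"
configured_names = ["question", "answer"]
good_record = {"question": "example question", "answer": "example answer"}
bad_record = {"prompt": "example question", "answer": "example answer"}

good_result = find_field_mismatches(good_record, configured_names)
bad_result = find_field_mismatches(bad_record, configured_names)
print(good_result)  # no missing or unexpected names
print(bad_result)   # "question" is missing, "prompt" is unexpected
```

Running a check like this on the `fields` dictionary of each record before pushing can surface naming problems early.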

In [3]:
# list of records
records = []
for r in dataset:
    if r["category"] == "open_qa":
        records.append(
            rg.FeedbackRecord(
                fields = {
                    "question": r["instruction"],
                    "answer": r["response"]
                }
            )
        )


### Define questions
To collect feedback for your dataset, you need to formulate questions. The Feedback Task currently supports the following types of questions:

- Rating: These questions require annotators to select one option from a list of integer values. This type is useful for collecting numerical scores.
- Text: These questions offer annotators a free-text area where they can enter any text. This type is useful for collecting natural language data, such as corrections or explanations.

```{note}
We have plans to expand the range of supported question types in future releases of the Feedback Task.
```

You can define your questions using the Python SDK and set up the following configurations:
- `name`: A shortname for the question.
- `title`: The text displayed in the UI.
- `description` (optional): The text to be displayed in the question tooltip in the UI. You can use it to give more context or information to annotators.
- `required`: Set your question as required or optional. Annotators must answer all required questions to submit a response, but they have the choice to answer optional questions or not.
- `values`: In a RatingQuestion, these are the rating options represented as a list of integer values.

```{note}
The order of the questions in the UI follows the order in which these are added to the dataset in the Python SDK.
```

In [10]:
# list of questions to display in the feedback form
questions = [
    rg.RatingQuestion(
        name="rating",
        title="Rate the quality of the response:",
        description="1 = very bad - 5 = very good",
        required=True,
        values=[1, 2, 3, 4, 5]
    ),
    rg.TextQuestion(
        name="corrected-text",
        title="Provide a correction to the response:",
        required=False
    )
]


### Write guidelines
Once you have decided on the data to show and the questions to ask, it's important to provide clear guidelines to the annotators. These guidelines help them understand the task and answer the questions consistently. You can provide guidelines in two ways:
- In the dataset guidelines: this is added as an argument when you create your dataset in the Python SDK (see below). It will appear in the dataset settings in the UI.
- As question descriptions: these are added as an argument when you create questions in the Python SDK (see above). This text will appear in a tooltip next to the question in the UI.

It is good practice to use at least the dataset guidelines, if not both methods. In the guidelines, you can include a description of the project, details on how to answer each question with examples, instructions on when to discard a record, etc. Question descriptions should be short and provide context for a specific question. They can summarize the guidelines for that question, but a summary alone is often not enough to align the whole annotation team.

## Set up your annotation team
Depending on the nature of your project and the size of your annotation team, you may want to control annotation overlap, i.e., how many annotations each record receives. You need to decide on this before pushing your dataset to Argilla, as it affects how your dataset is set up. Let's explore a few overlap options.

### Full overlap
The Feedback Task supports multiple annotations per record. This means that all users with access to the dataset can respond to all the records in the dataset. To get this full overlap, just push the dataset (as detailed in [Create your dataset](#create-your-dataset)) to a workspace that all team members can access. Learn more about managing user access to workspaces [here](../getting_started/installation/configurations/user_management.md#creating-an-annotator-user-assigned-to-a-workspace).

### Zero overlap
If you only want one annotation per record, we recommend that you split your records into chunks and assign each chunk to a single annotator. Then, you can create several datasets, one in each annotator's personal workspace, and add to each dataset the records assigned to that annotator.

In [None]:

import httpx
import random
from collections import defaultdict

# make a request using your Argilla Client
rg_client = rg.active_client().client
auth_headers = {"X-Argilla-API-Key": rg_client.token}
http = httpx.Client(base_url=rg_client.base_url, headers=auth_headers)
users = http.get("/api/users").json()

# optional: filter users to get only those with annotator role
users = [u for u in users if u["role"] == "annotator"]

# optional: shuffle the records to get a random assignment
random.shuffle(records)

# build a dictionary where the key is the username and the value is the list of records assigned to them
assignments = defaultdict(list)

# divide your records in chunks of the same length as the users list and make the assignments
# you will need to follow the instructions to create and push a dataset for each of the key-value pairs in this dictionary
n = len(users)
chunked_records = [records[i:i + n] for i in range(0, len(records), n)]
for chunk in chunked_records:
    for idx, record in enumerate(chunk):
        assignments[users[idx]['username']].append(record)

### Controlled overlap
This option is useful when you want annotation overlap, but only up to a certain number of annotators rather than the whole team. You may want this to make your team more efficient, or to calculate the agreement between pairs of annotators. In this case, you also need to create several datasets and push them to the annotators' personal workspaces, with the difference that each record will appear in multiple datasets.

In [None]:
# code to assign with overlap
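
One possible way to fill in this step is a round-robin assignment in which each record goes to a fixed number of annotators. The following is a minimal sketch under stated assumptions, not an official Argilla recipe: `assign_with_overlap`, the usernames, and the placeholder records are all hypothetical. In practice, `usernames` would come from the users list and `records` from the list of `rg.FeedbackRecord` objects built earlier.

```python
from collections import defaultdict


def assign_with_overlap(records, usernames, overlap):
    """Assign each record to `overlap` different annotators, round-robin."""
    if not 1 <= overlap <= len(usernames):
        raise ValueError("overlap must be between 1 and the number of annotators")
    assignments = defaultdict(list)
    n = len(usernames)
    for i, record in enumerate(records):
        # start at a different annotator for each record and take the next
        # `overlap` annotators, wrapping around the end of the list
        for k in range(overlap):
            assignments[usernames[(i + k) % n]].append(record)
    return dict(assignments)


# hypothetical usernames and placeholder records
usernames = ["annotator_a", "annotator_b", "annotator_c"]
example_records = [f"record_{i}" for i in range(6)]
overlap_assignments = assign_with_overlap(example_records, usernames, overlap=2)

for username, assigned in overlap_assignments.items():
    print(username, assigned)
```

As with zero overlap, each key-value pair in the resulting dictionary corresponds to one dataset to create and push to that annotator's workspace. Shuffling the records beforehand randomizes which annotators share a record.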

## Create your dataset
Now we are ready to create our dataset. To do that, first you'll need to define the following configurations:
- `name`: The name of the dataset.
- `workspace`: The workspace where the dataset will be created. If you don't provide one, it will be placed in the default workspace attached to the API key used in `rg.init()`.
- `guidelines` (optional): A set of guidelines for the annotators. These will appear in the dataset settings in the UI.
- `fields`: The list of fields to show in the record card. The order in which the fields will appear matches the order of this list.
- `questions`: The list of questions to show in the form.

Once the dataset is created locally, add the records and, when you're happy with the result, push the dataset to Argilla. At that point, you will be able to see the dataset from the UI.

In [None]:
# create a dataset locally
dataset = rg.FeedbackDataset(
    guidelines="You will see a collection of records with a question and an answer.\nYou will be asked to rate the answer from 1 (very bad) to 5 (very good).\nIf your rating is below 5, please provide a correction to the output.",
    fields = [
        rg.TextField(name="question"),
        rg.TextField(name="answer")
    ],
    questions=questions
)

# add the records to the dataset
dataset.add_records(records)

# push the dataset and records to Argilla
dataset.push_to_argilla(name='my_dataset', workspace='my_workspace')

In [20]:
# create a dataset directly in Argilla
dataset = rg.create_feedback_dataset(
    name="my_dataset",
    workspace="my_workspace",
    guidelines="You will see a collection of records with a question and an answer.\nYou will be asked to rate the answer from 1 (very bad) to 5 (very good).\nIf your rating is below 5, please provide a correction to the output.",
    fields = [
        rg.TextField(name="question"),
        rg.TextField(name="answer")
    ],
    questions=questions
)

# add the records to the dataset
dataset.add_records(records)

# push the records to Argilla
dataset.push_to_argilla()

## Annotating a Feedback Dataset
When you open the dataset, you will see by default the records in the `Pending` queue, i.e., records that you have not responded to yet, in a single-record view. The record to annotate is on the left, and the form with all the questions to answer is on the right.

We highly recommend that you read the annotation guidelines, if there are any, before you start annotating. You can find them on the dataset settings page. If a question has a description, you will find an info icon next to it. Click the icon to read the description.

In the annotation view, you can provide your responses. Once all required questions have responses, the `Submit` button is enabled and you can submit your response. If you prefer not to give a response for a record, you can move to the next record or discard it using the `Discard` button.

If you need to review your submitted or discarded responses, you can select the queue you need. From there, you can modify, submit or discard responses. You can also use the `Clear` button to remove the response and send the record back to the `Pending` queue.

You can speed up the annotation process by using shortcuts:
|Action|Keys|
|------|----|
|Clear|&#8679; `Shift` + &blank; `Space`|
|Discard|&#x232B; `Backspace`|
|Discard (from text area)|&#8679; `Shift` + &#x232B; `Backspace`|
|Submit|&crarr; `Enter`|
|Submit (from text area)|&#8679; `Shift` + &crarr; `Enter`|
|Go to previous page|&larr; `Left arrow`|
|Go to next page|&rarr; `Right arrow`|

![Snapshot of the Submitted queue and the progress bar in a Feedback dataset](../_static/llms/snapshot-feedback-submitted.png)

You can track your progress and the number of `Pending`, `Submitted` and `Discarded` responses by clicking the `Progress` icon in the sidebar.

## Collect responses

- Use the Python client to collect the responses to the dataset.
- Unify responses (?) using inter-annotator agreement (IAA) techniques:
    - Majority vote, average...
    - How to calculate IAA for text fields? BLEU, ROUGE? Rating of the proposed texts: make a dataset to collect human text, then rate the human text and use it for a rating exercise to get an annotator score or clean the dataset.
- Export / publish the dataset.
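
For rating questions, one simple starting point for agreement is the share of shared records on which two annotators gave the same rating. The snippet below is a hedged sketch of that idea only: `pairwise_agreement`, the annotator ids, and the ratings are hypothetical, and raw percent agreement does not correct for chance the way measures such as Cohen's kappa do.

```python
from itertools import combinations


def pairwise_agreement(ratings_by_annotator):
    """Share of co-annotated records on which each annotator pair agrees.

    `ratings_by_annotator` maps an annotator id to a {record_id: rating} dict.
    """
    agreement = {}
    for a, b in combinations(sorted(ratings_by_annotator), 2):
        shared = ratings_by_annotator[a].keys() & ratings_by_annotator[b].keys()
        if not shared:
            continue  # the pair has no records in common
        same = sum(
            ratings_by_annotator[a][r] == ratings_by_annotator[b][r] for r in shared
        )
        agreement[(a, b)] = same / len(shared)
    return agreement


# hypothetical ratings, keyed by annotator and then by record id
example_ratings = {
    "annotator_a": {"rec_1": 5, "rec_2": 3, "rec_3": 4},
    "annotator_b": {"rec_1": 5, "rec_2": 4, "rec_3": 4, "rec_4": 2},
    "annotator_c": {"rec_4": 2},
}
agreement = pairwise_agreement(example_ratings)
for pair, score in agreement.items():
    print(pair, round(score, 2))
```

Pairs without any record in common are left out of the result rather than reported as zero agreement.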

In [2]:
feedback = rg.FeedbackDataset.from_argilla("demo_feedback", workspace="recognai")
# explore responses (based on question name and annotator)


In [30]:
feedback[0]['responses']

[{'id': '9ef3a4c2-58f3-44aa-b39e-098b3318c088',
  'values': {'rating': {'value': 5}},
  'status': 'submitted',
  'user_id': '3e760b76-e19a-480a-b436-a85812b98843',
  'inserted_at': '2023-05-18T11:08:49.765680',
  'updated_at': '2023-05-18T11:08:49.765680'},
 {'id': '4b02532a-3f80-49e8-b0cf-f6f6f6401d47',
  'values': {'rating': {'value': 5}},
  'status': 'submitted',
  'user_id': '2a0c8da5-f385-46a2-9932-a0e5f2ada50d',
  'inserted_at': '2023-05-18T11:38:35.131957',
  'updated_at': '2023-05-18T11:38:35.131957'}]
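
Responses shaped like the ones printed above can be unified per record with a mean or a majority vote. The sketch below assumes each response is a dictionary with `status` and `values` keys, as in that output. The helper names `submitted_ratings` and `unify_ratings` are hypothetical, and the example responses are trimmed-down placeholders rather than real data.

```python
from collections import Counter
from statistics import mean


def submitted_ratings(responses, question_name="rating"):
    """Collect the values of one question from submitted responses only."""
    return [
        r["values"][question_name]["value"]
        for r in responses
        if r["status"] == "submitted" and question_name in r["values"]
    ]


def unify_ratings(responses, question_name="rating"):
    """Return (mean, majority vote) for one record, or (None, None) if empty."""
    ratings = submitted_ratings(responses, question_name)
    if not ratings:
        return None, None
    # on ties, most_common(1) returns the value that was seen first
    majority = Counter(ratings).most_common(1)[0][0]
    return mean(ratings), majority


# placeholder responses (ids, user ids and timestamps omitted)
example_responses = [
    {"status": "submitted", "values": {"rating": {"value": 5}}},
    {"status": "submitted", "values": {"rating": {"value": 4}}},
    {"status": "submitted", "values": {"rating": {"value": 5}}},
    {"status": "discarded", "values": {}},
]
average, majority = unify_ratings(example_responses)
print(average, majority)
```

Discarded responses and responses that skip the question are ignored, so a record without any submitted rating yields `(None, None)` instead of raising an error.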

In [26]:
from pprint import pprint
print(feedback[4]['fields']['question'])
print(feedback[4]['fields']['answer'])
pprint(feedback[4]['responses'])

Which episodes of season four of Game of Thrones did Michelle MacLaren direct?
She directed "Oathkeeper" and "First of His Name" the fourth and fifth episodes of season four, respectively.
[{'id': 'facb4547-c725-458a-81c1-bf1a1c164f0d',
  'inserted_at': '2023-05-18T11:09:27.835649',
  'status': 'discarded',
  'updated_at': '2023-05-18T11:09:27.835649',
  'user_id': '3e760b76-e19a-480a-b436-a85812b98843',
  'values': {}},
 {'id': 'd46b08d3-18c6-46ef-b8f8-bc706a98b998',
  'inserted_at': '2023-05-18T11:41:14.846758',
  'status': 'submitted',
  'updated_at': '2023-05-18T11:41:14.846758',
  'user_id': '2a0c8da5-f385-46a2-9932-a0e5f2ada50d',
  'values': {'corrected-text': {'value': '\xa0In\xa0Season 3 she directed the '
                                         'episodes "The Bear and the Maiden '
                                         'Fair" and "Second Sons". In Season 4 '
                                         ', she directed another two '
                                         'epis

## Fine-tuning?