[FEATURE] Order labels by confidence score #4647

Closed
2 tasks done
nataliaElv opened this issue Mar 12, 2024 · 6 comments · Fixed by #4731
Labels: area: ui, severity: minor, team: frontend, type: community request, type: enhancement

Comments

@nataliaElv
Member

nataliaElv commented Mar 12, 2024

Can I set the order of labels shown for examples in a FeedbackDataset with a MultiLabelQuestion for each individual example?
In a previous question, using the code in #4615, I set up a FeedbackDataset to label examples with a large number of labels.

After creating the dataset in Argilla, I get something like the screenshot below in the front-end, and there are a couple of things I would like to improve:

  • If I set the number of visible labels to (say) 10, the front-end always shows me the first elements in my list of labels (i.e. labels[:10]). These are (in general) irrelevant for the example to annotate, and they appear before the suggested labels, which are the ones likely to fit my example. Is there a way to hide these irrelevant labels, or at least show them after the suggested ones? At the moment it's impossible to hide them, as the minimum number of visible labels is 3.

  • In the same screenshot, if I deselect a suggested label (say "label 53") it disappears instantly if it's not in labels[:10], and I need to go back to the search box if I want to re-select it. Ideally, I would like suggested labels to stay visible even if they are deselected. In other words, suggested labels (in my example "label 28", "label 53", "label 14") should have "visual priority" over the irrelevant-if-not-suggested labels[:10]: they should come first and should stay visible even if unselected. Can I set the order of labels so that suggested labels come first and do not disappear if unselected?

Summarizing, I would be happy if Argilla could display the visible labels in a FeedbackDataset with a MultiLabelQuestion in a custom order that depends on the example, rather than the order dictated by the labels argument of the MultiLabelQuestion, and keep suggested labels visible if unselected. Is this feasible?

P.S. If I am not mistaken, this is somewhat related to #4638, but I don't care much about displaying the score, I care about ordering the labels according to the predicted score (i.e. suggested labels, with a high score, should be shown first).
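
To make the requested ordering concrete, here is a minimal sketch in plain Python. It is not Argilla code: `order_labels_for_record` is a hypothetical helper, and the `scores` mapping (with made-up values) stands in for whatever per-record suggestion scores a model would produce.

```python
from typing import Dict, List


def order_labels_for_record(labels: List[str], scores: Dict[str, float]) -> List[str]:
    """Return labels with suggested ones first, highest score first.

    Hypothetical helper for illustration only; not part of Argilla.
    Labels without a score keep their original relative order.
    """
    suggested = sorted(
        (label for label in labels if label in scores),
        key=lambda label: scores[label],
        reverse=True,
    )
    rest = [label for label in labels if label not in scores]
    return suggested + rest


labels = [f"label {i}" for i in range(250)]
# Made-up scores for a single record.
scores = {"label 28": 0.91, "label 53": 0.74, "label 14": 0.62}

ordered = order_labels_for_record(labels, scores)
print(ordered[:5])
# ['label 28', 'label 53', 'label 14', 'label 0', 'label 1']
```

With an ordering like this applied per record, the suggested labels would occupy the first visible slots regardless of where they sit in the question's labels list.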

image (1)

Tasks

  1. jfcalvo
  2. area: python sdk team: backend type: enhancement
    jfcalvo
nataliaElv added the "type: community request" label on Mar 12, 2024

dosubot bot commented Mar 12, 2024

Hey @nataliaElv! Great to see you diving into the depths of Argilla again 🌊. I remember our last tech adventure together! Hold tight, I'm checking out your latest quest for order in the label universe and will get back to you with some insights shortly. 🚀

dosubot bot added the "area: ui", "severity: minor", "team: frontend" and "type: enhancement" labels on Mar 12, 2024
@nataliaElv
Member Author

Related to #3269


dosubot bot commented Mar 12, 2024

Based on the current implementation of Argilla, it is possible to customize the order of visible labels in a FeedbackDataset with a MultiLabelQuestion, so that suggested labels with a high predicted score are displayed first and remain visible even if unselected. This can be achieved by specifying the visible_labels parameter in the MultiLabelQuestion settings.

Here is an example of how you can use the visible_labels parameter:

(
    {
        "name": "a",
        "title": "B",
        "description": "b",
        "required": False,
        "labels": {"a": "A", "b": "B", "c": "C"},
        "visible_labels": 3,
    },
    {
        "name": "a",
        "title": "B",
        "description": "b",
        "required": False,
        "settings": {
            "type": "multi_label_selection",
            "options": [{"text": "A", "value": "a"}, {"text": "B", "value": "b"}, {"text": "C", "value": "c"}],
            "visible_options": 3,
        },
    },
)

In this example, the visible_labels parameter is used to specify the number of labels that should remain visible. Although the parameter name suggests controlling the count rather than the order, combining this with a sorted list of labels (based on predicted scores) before passing it to the question can achieve the desired outcome. The labels should be sorted in the desired order (e.g., by predicted score) externally before being passed to the MultiLabelQuestion.

The visible_labels parameter in the MultiLabelQuestion settings in Argilla is implemented to control the number of labels that are visible in the UI. By default, it is set to 20, and it must be 3 or greater. The implementation ensures that if the visible_labels parameter is not explicitly provided (i.e., it is set to UNDEFINED), and the total number of labels is greater than 20, a warning is issued, and visible_labels is automatically set to 20. If the total number of labels is 20 or fewer, visible_labels is set to None, meaning all labels will be shown. If visible_labels is provided but is greater than the total number of labels, a warning is issued, and visible_labels is adjusted to match the total number of labels if there are at least 3 labels. If there are fewer than 3 labels, visible_labels is set to None, indicating that all labels will be visible.
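
The rules in the paragraph above can be sketched as a standalone function. This is only an illustration of the described behavior, not Argilla's validator: `resolve_visible_labels`, the `_UNDEFINED` sentinel, and the constants are hypothetical names, and raising `ValueError` for values below 3 and passing an explicit `None` through unchanged are both assumptions.

```python
import warnings
from typing import Optional

_UNDEFINED = object()  # hypothetical sentinel standing in for "not provided"
DEFAULT_VISIBLE = 20   # default described above
MIN_VISIBLE = 3        # minimum described above


def resolve_visible_labels(total_labels: int, visible_labels=_UNDEFINED) -> Optional[int]:
    """Return the effective number of visible labels, or None to show all."""
    if visible_labels is _UNDEFINED:
        if total_labels > DEFAULT_VISIBLE:
            warnings.warn(f"visible_labels not set; defaulting to {DEFAULT_VISIBLE}")
            return DEFAULT_VISIBLE
        return None  # 20 labels or fewer: show them all
    if visible_labels is None:
        return None  # assumption: an explicit None means "show all"
    if visible_labels < MIN_VISIBLE:
        # Assumption: the "3 or greater" constraint is surfaced as an error.
        raise ValueError(f"visible_labels must be at least {MIN_VISIBLE}")
    if visible_labels > total_labels:
        warnings.warn("visible_labels exceeds the number of labels; adjusting")
        return total_labels if total_labels >= MIN_VISIBLE else None
    return visible_labels


print(resolve_visible_labels(250, 10))  # 10
print(resolve_visible_labels(10))       # None
```

Note that nothing in these rules touches ordering: the value only controls how many labels are shown.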

This behavior is enforced through a root validator named visible_labels_must_be_valid within the _LabelQuestion class, from which MultiLabelQuestion inherits. This validator checks the value of visible_labels and adjusts it based on the conditions described above. The adjusted value of visible_labels is then used in the server_settings property to set the "visible_options" key, which determines how many labels are visible in the UI.

You can find more details about this in the Argilla source code.

Regarding your second point, the current implementation does not support keeping deselected labels visible. This would require a change in the Argilla codebase. I recommend creating a new issue for this feature request, detailing your use case and the desired behavior. This will allow the Argilla maintainers to evaluate the request and potentially include it in a future update.

I hope this helps! If you have any other questions or need further clarification, feel free to ask.

About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

@davidefiocco
Contributor

davidefiocco commented Mar 12, 2024

Until this is implemented, a workaround (thanks @c23996 for the idea!) that allows users to access suggestions

  • in the desired order
  • without the risk of "losing" them in the label list

is a bit crude, but in the end acceptable: we display suggestions (in the desired order) in a dedicated field

image

and require users to confirm suggestions one by one by copying and pasting them from the text field.

import argilla as rg

# Connection details for a local Argilla instance.
api_url = "http://localhost:6900"
api_key = "<api-key>"

rg.init(api_url=api_url, api_key=api_key)

# rg.Workspace.create("argilla")  # uncomment if the workspace does not exist yet

rg.set_workspace("argilla")

num_labels = 250

labels = [f"label {i}" for i in range(num_labels)]

dataset = rg.FeedbackDataset(
    fields=[
        rg.TextField(name="text", required=True, use_markdown=True),
        # Extra field that displays the suggested labels in the desired order.
        rg.TextField(name="suggestions", required=True, use_markdown=True),
    ],
    questions=[
        rg.MultiLabelQuestion(
            name="question-multi",
            title="Which labels are correct?",
            labels=labels,
            required=True,
            visible_labels=3,  # minimum allowed, so few irrelevant labels are shown
        )
    ],
)

records = []

record = rg.FeedbackRecord(
    fields={
        "text": "tierra finamente dividida, constituida por agregados de silicatos de aluminio hidratados",
        "suggestions": "label 28, label 34, label 53, label 14, label 95"
    },
)
records.append(record)

dataset.add_records(records)

dataset.push_to_argilla(name="workaround-dataset", workspace="argilla")
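
The workaround above hard-codes the text of the "suggestions" field. As a rough sketch, assuming per-record scores are available as a dict, that text could be built by sorting labels by score. `format_suggestions` is a hypothetical helper, and the scores below are made up for illustration.

```python
from typing import Dict


def format_suggestions(scores: Dict[str, float], top_k: int = 5) -> str:
    """Join the top_k highest-scoring labels into a comma-separated string.

    Hypothetical helper for illustration only; not part of Argilla.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ", ".join(ranked[:top_k])


# Made-up scores for a single record.
scores = {
    "label 28": 0.91,
    "label 34": 0.85,
    "label 53": 0.74,
    "label 14": 0.62,
    "label 95": 0.40,
    "label 7": 0.05,
}

suggestions_text = format_suggestions(scores)
print(suggestions_text)
# label 28, label 34, label 53, label 14, label 95
```

The resulting string could then be passed as the value of the "suggestions" field when building each record.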

@jfcalvo
Member

jfcalvo commented Apr 19, 2024

@nataliaElv I have two questions:

  1. Does this change also apply to span questions?
  2. I understand that this would change the default behavior in which the labels are shown. If that's the case, should we think about adding a configuration setting to the question so the previous behavior can be enabled? (Maybe it's not useful at all, I'm only asking in case it is.)

@nataliaElv
Member Author

Hi @jfcalvo! This won't apply to Span questions, only Label and Multilabel, and yes, the idea is to make this a configuration at the question level that can be changed at any time. I'll send over more detailed specifications soon 🙂

damianpumar linked a pull request on Apr 23, 2024 that will close this issue
jfcalvo closed this as completed on May 3, 2024
6 participants