# 🩹 Delete labels from a Token or Text Classification dataset

You may need to delete one of the labels in your dataset, either because you no longer use it or because you want to correct its name. If the dataset already has annotations, however, this has implications down the line.

In this tutorial, you will learn how to deal with this situation depending on the ... Token & Text Classification

Let's get started!

<img src="../../../_static/images/llms/curating-feedback-instructiondataset/snapshot_dolly_curation.png" alt="A Feedback Task setting for the curation of Databricks' Dolly dataset" style="width: 1100px;">

<div class="alert alert-info">

Note 

This tutorial is a Jupyter Notebook. There are two options to run it:

- Use the Open in Colab button at the top of this page. This option allows you to run the notebook directly on Google Colab. Don't forget to change the runtime type to GPU for faster model training and inference.
- Download the .ipynb file by clicking on the View source link at the top of the page. This option allows you to download the notebook and run it on your local machine or on a Jupyter notebook tool of your choice.

</div>


## Setup

For this tutorial, you will need to have an Argilla server running. If you don't have one already, check out our [Quickstart](../../../getting_started/quickstart.md) or [Installation](../../../getting_started/installation/installation.md) pages. Once you do, complete the following steps:

1. Install the Argilla client and the required third-party libraries using `pip`:

In [None]:
%pip install --upgrade argilla -qqq

2. Let's make the necessary imports:

In [1]:
import argilla as rg

3. If you are running Argilla using the Docker quickstart image or Hugging Face Spaces, you need to initialize the Argilla client with the `URL` and `API_KEY`:

In [None]:
# Replace api_url with the url to your HF Spaces URL if using Spaces
# Replace api_key if you configured a custom API key
rg.init(
    api_url="http://localhost:6900", 
    api_key="admin.apikey"
)

In [2]:
rg.init(
    api_url="https://nataliaelv-argilla-tutorials.hf.space", 
    api_key="admin.apikey"
)



## First steps

In [46]:
# set the workspace that we will be working in
rg.set_workspace("argilla")
ds = "multi_label_ds"

In [None]:
# optional: create a new workspace for the backups.
workspace = rg.Workspace.create("backups")

In [None]:
# optional: if you want users other than the owner to have access to this space
# change the username with the username from the user and run this cell.
user = rg.User.from_name("username")
workspace.add_user(user.id)

In [35]:
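# back up the dataset before changing it; the dataset copied here is an example,
# replace its name with the dataset you are about to modify
rg.copy("gutenberg_spacy-ner", name_of_copy="gutenberg_spacy-ner_backup", workspace="backups")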

BulkResponse(dataset='gutenberg_spacy-ner_backup', processed=100, failed=0)

In [47]:
settings = rg.load_dataset_settings(ds)

In [48]:
settings.label_schema

{'Alcantarillado/Pluviales',
 'Alta',
 'Aplazamiento de pago',
 'Atención recibida',
 'Baja',
 'Baja presión',
 'Calidad del servicio',
 'Cambio de titular',
 'Consulta administrativa oficinas',
 'Contratación',
 'Cortes falta de pago',
 'Error de lectura',
 'Facturación errónea',
 'Filtración en garaje/bajo',
 'Fuga en instalación interior',
 'Fuga en la vía pública',
 'Funcionamiento del contador',
 'Información/Consultas',
 'No tiene agua',
 'Otros',
 'Presupuestos',
 'Problema calidad agua',
 'Protección de datos',
 'Recibos',
 'Refacturación por fuga',
 'Reparto de correspondencia',
 'Reposición obra civil',
 'Rotura provocada',
 'Solicitan cierre agua maniobras instalación abonado',
 'Solicitan presencia personal FACSA instalación',
 'Vulnerabilidad',
 'descartado',
 'popopo'}

In [None]:
# set the old and new labels as variables, to avoid errors down the line
old_label = ""
# set to None if you want to remove the label instead of renaming it
new_label = ""
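
Before touching any records, it can help to check that the label you are about to change actually exists in the schema printed above. The following is a minimal sketch, not part of the Argilla API: `check_labels` is a hypothetical helper, and the `schema` set is a made-up stand-in for `settings.label_schema`.

```python
def check_labels(schema, old_label, new_label=None):
    """Raise early if the requested change does not make sense for this schema."""
    if old_label not in schema:
        raise ValueError(f"{old_label!r} is not in the label schema")
    if new_label is not None and new_label == old_label:
        raise ValueError("new_label must differ from old_label")
    # True means "rename", False means "delete"
    return new_label is not None


# toy schema standing in for settings.label_schema (assumption for illustration)
schema = {"Alta", "Baja", "popopo"}

is_rename = check_labels(schema, old_label="popopo", new_label=None)
```

Failing here, with a clear message, is cheaper than discovering a typo in `old_label` after the records have been reloaded and logged.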

## Remove label from the records

In [None]:
# get all records with the old label in the annotations or predictions
# (the label is quoted so that names containing spaces are matched as a whole)
records = rg.load(ds, query=f'annotated_as:"{old_label}" OR predicted_as:"{old_label}"')
len(records)
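
Several labels in the schema above contain spaces or slashes, and an unquoted label of that kind may be split into separate terms by the query parser. Below is a minimal sketch of building the query string, assuming the Lucene-style syntax in which a phrase is wrapped in double quotes. `build_label_query` is a hypothetical helper, not an Argilla function.

```python
def build_label_query(label):
    """Build a query matching records annotated or predicted with `label`."""
    # escape backslashes and double quotes so the label stays one quoted phrase
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return f'annotated_as:"{escaped}" OR predicted_as:"{escaped}"'


query = build_label_query("Aplazamiento de pago")
print(query)
```

The resulting string could then be passed as the `query` argument of `rg.load`.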

In [None]:
def cleaning_function(labels, old_label, new_label):
    """Rename old_label to new_label, or remove it when new_label is None."""

    # replaces / removes string labels (e.g. TextClassification)
    if isinstance(labels, str):
        if labels == old_label:
            labels = new_label

    # lists are updated in place, so the change is also visible through the record
    elif isinstance(labels, list) and labels:
        # replaces / removes labels in a list (e.g. multi-label TextClassification)
        if isinstance(labels[0], str):
            if new_label is None:
                labels[:] = [label for label in labels if label != old_label]
            else:
                labels[:] = [new_label if label == old_label else label for label in labels]

        # replaces / removes labels in a list of tuples (e.g. Predictions, TokenClassification)
        elif isinstance(labels[0], tuple):
            if new_label is None:
                labels[:] = [label for label in labels if label[0] != old_label]
            else:
                # keep the rest of each tuple (score or span offsets) and swap only the label
                labels[:] = [
                    (new_label, *label[1:]) if label[0] == old_label else label
                    for label in labels
                ]

    return labels

In [75]:

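# loop over the records and make the correction
for record in records:
    # assign the result back, because string labels cannot be changed in place
    if record.prediction:
        record.prediction = cleaning_function(record.prediction, old_label, new_label)
    if record.annotation:
        record.annotation = cleaning_function(record.annotation, old_label, new_label)

    # reset the record status to Default
    record.status = "Default"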
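
Before logging the records back, it may be worth confirming that the old label no longer appears anywhere. The sketch below is one way to do that under stated assumptions: `has_label` is a hypothetical helper covering the same label shapes as the cleaning function above, and `samples` holds toy values standing in for `record.annotation` and `record.prediction`.

```python
def has_label(labels, target):
    """Return True if `target` appears in any of the label shapes used above."""
    if labels is None:
        return False
    if isinstance(labels, str):
        return labels == target
    # lists may hold plain strings or tuples whose first item is the label
    return any(
        (item[0] if isinstance(item, tuple) else item) == target
        for item in labels
    )


# toy values standing in for record.annotation / record.prediction
samples = [
    "Alta",                            # single-label annotation
    ["Alta", "Baja"],                  # multi-label annotation
    [("Alta", 0.9), ("popopo", 0.1)],  # prediction with scores
    [("Baja", 0, 4)],                  # token-level spans
    None,                              # nothing annotated yet
]

leftovers = [s for s in samples if has_label(s, "popopo")]
print(len(leftovers))  # prints 1
```

With real data, the same check could be run over every record's annotation and prediction, and an empty `leftovers` list would suggest the cleanup is complete.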
    

In [None]:
# log the corrected records
rg.log(records, name=ds)

## Update dataset settings

In [44]:
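settings.label_schema.remove(old_label)
# when renaming, make sure the new label is part of the schema
if new_label:
    settings.label_schema.add(new_label)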

In [45]:
rg.configure_dataset(name=ds, settings=settings)

Now the label should be gone from our annotations, predictions and the dataset settings.

## Summary

In this tutorial, we learned how to ...