In [1]:
from tqdm.auto import tqdm
import datasets

from legalbench.tasks import TASKS, ISSUE_TASKS
from legalbench.utils import generate_prompts

In [2]:
# Suppress progress bars which appear every time a task is downloaded
datasets.utils.logging.set_verbosity_error()

### Task organization

`tasks.py` provides data structures which organize all LegalBench tasks. For instance, `TASKS` lists all LegalBench tasks, and `ISSUE_TASKS` lists all tasks in the issue-spotting reasoning category.

In [3]:
print(len(TASKS), TASKS[:10])
print()
print(len(ISSUE_TASKS), ISSUE_TASKS)

162 ['abercrombie', 'canada_tax_court_outcomes', 'citation_prediction_classification', 'citation_prediction_open', 'consumer_contracts_qa', 'contract_nli_confidentiality_of_agreement', 'contract_nli_explicit_identification', 'contract_nli_inclusion_of_verbally_conveyed_information', 'contract_nli_limited_use', 'contract_nli_no_licensing']

17 ['corporate_lobbying', 'learned_hands_benefits', 'learned_hands_business', 'learned_hands_consumer', 'learned_hands_courts', 'learned_hands_crime', 'learned_hands_divorce', 'learned_hands_domestic_violence', 'learned_hands_education', 'learned_hands_employment', 'learned_hands_estates', 'learned_hands_family', 'learned_hands_health', 'learned_hands_housing', 'learned_hands_immigration', 'learned_hands_torts', 'learned_hands_traffic']
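
Because the task lists are plain Python lists of strings, ordinary list and string operations are enough to slice them. The sketch below groups a handful of task names (copied from the output above) by family prefix. The `FAMILY_PREFIXES` tuple and the `group_by_family` helper are illustrative assumptions and are not part of `tasks.py`.

```python
from collections import defaultdict

# A few task names copied from the outputs above; the real TASKS list has 162 entries.
sample_tasks = [
    "abercrombie",
    "contract_nli_limited_use",
    "contract_nli_no_licensing",
    "learned_hands_benefits",
    "learned_hands_torts",
]

# Hypothetical family prefixes, chosen only for illustration.
FAMILY_PREFIXES = ("contract_nli_", "learned_hands_")


def group_by_family(task_names):
    """Group task names by the first matching prefix; unmatched names form their own group."""
    groups = defaultdict(list)
    for name in task_names:
        family = next(
            (prefix.rstrip("_") for prefix in FAMILY_PREFIXES if name.startswith(prefix)),
            name,
        )
        groups[family].append(name)
    return dict(groups)


families = group_by_family(sample_tasks)
for family, members in families.items():
    print(family, len(members))
```

The same helper could be applied to the full `TASKS` list once `legalbench.tasks` is imported.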


### Loading task data

LegalBench can be downloaded from Hugging Face: https://huggingface.co/datasets/nguha/legalbench. Each LegalBench dataset comes with a `train` split and a `test` split.

- The `train` split is small (usually fewer than 10 samples). Following the [RAFT](https://raft.elicit.org/) benchmark, it is intended to provide labeled samples that can be used as few-shot demonstrations in prompts.
- The `test` split is larger and contains the samples on which an LLM is evaluated.

Documentation for each task can be found in the GitHub repository, under the task-specific folder. For instance, the documentation for the `abercrombie` task can be found at <https://github.com/HazyResearch/legalbench/tree/main/tasks/abercrombie>.

In [4]:
dataset = datasets.load_dataset("nguha/legalbench", "unfair_tos")
dataset["train"].to_pandas()

| | answer | index | text |
|---|---|---|---|
| 0 | Other | 0 | last updated date : may 15 , 2017 |
| 1 | Arbitration | 1 | arbitration notice : unless you opt out of arb... |
| 2 | Contract by using | 2 | you acknowledge and agree that , by accessing ... |
| 3 | Unilateral change | 3 | academia.edu reserves the right , at its sole ... |
| 4 | Unilateral termination | 4 | academia.edu reserves the right to suspend or ... |
| 5 | Limitation of liability | 5 | neither academia.edu nor any other person or e... |
| 6 | Choice of law | 6 | these terms and any action related thereto wil... |
| 7 | Jurisdiction | 7 | the exclusive jurisdiction and venue of any ip... |
| 8 | Content removal | 8 | amazon reserves the right ( but not the obliga... |
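
To illustrate how the `train` split can serve as few-shot demonstrations, here is a minimal sketch that renders labeled rows in the `Clause: ... / Label: ...` layout used by the `unfair_tos` prompt. The two rows are taken from this notebook; the helper `build_few_shot_block` is a hypothetical name and not part of LegalBench.

```python
# Two labeled rows that appear in this notebook for the `unfair_tos` task.
train_rows = [
    {"text": "last updated date : may 15 , 2017", "answer": "Other"},
    {
        "text": (
            "academia.edu reserves the right , at its sole discretion , to modify "
            "the site , services and these terms , at any time and without prior notice ."
        ),
        "answer": "Unilateral change",
    },
]


def build_few_shot_block(rows, text_key="text", label_key="answer"):
    """Render labeled rows as 'Clause: ... / Label: ...' demonstrations separated by blank lines."""
    return "\n\n".join(
        f"Clause: {row[text_key]}\nLabel: {row[label_key]}" for row in rows
    )


few_shot_block = build_few_shot_block(train_rows)
print(few_shot_block)
```

With real data, `dataset["train"].to_pandas().to_dict("records")` should produce rows in the same shape.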


### Loading and applying prompts

Each task folder also stores prompt templates that can be used with different models. In LegalBench, prompt templates are plain-text files in which `{{col_name}}` denotes a placeholder for the value of the column `col_name`.

For instance:

In [5]:
# Load base prompt
with open("legalbench/tasks/unfair_tos/base_prompt.txt") as in_file:
    prompt_template = in_file.read()
print(prompt_template)

Classify each clause by type.
Options: Arbitration, Unilateral change, Content removal, Jurisdiction, Choice of law, Limitation of liability, Unilateral termination, Contract by using, Other

Clause: arbitration notice : unless you opt out of arbitration within 30 days of the date you first agree to these terms by following the opt-out procedure specified in the `` arbitration '' section below , and except for certain types of disputes described in the `` arbitration `` section below , you agree that disputes between you and academia.edu will be resolved by binding , individual arbitration and you are waiving your right to a trial by jury or to participate as a plaintiff or class member in any purported class action or representative proceeding . 
Label: Arbitration

Clause: academia.edu reserves the right , at its sole discretion , to modify the site , services and these terms , at any time and without prior notice . 
Label: Unilateral change

Clause: amazon reserves the right ( but n

The script `utils.py` provides a simple function for generating prompts for a dataset given a template.

In [6]:
test_df = dataset["test"].to_pandas()
prompts = generate_prompts(prompt_template=prompt_template, data_df=test_df)
print(prompts[0])

Classify each clause by type.
Options: Arbitration, Unilateral change, Content removal, Jurisdiction, Choice of law, Limitation of liability, Unilateral termination, Contract by using, Other

Clause: arbitration notice : unless you opt out of arbitration within 30 days of the date you first agree to these terms by following the opt-out procedure specified in the `` arbitration '' section below , and except for certain types of disputes described in the `` arbitration `` section below , you agree that disputes between you and academia.edu will be resolved by binding , individual arbitration and you are waiving your right to a trial by jury or to participate as a plaintiff or class member in any purported class action or representative proceeding . 
Label: Arbitration

Clause: academia.edu reserves the right , at its sole discretion , to modify the site , services and these terms , at any time and without prior notice . 
Label: Unilateral change

Clause: amazon reserves the right ( but n
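
To make the placeholder mechanism concrete, the sketch below shows one way a `{{col_name}}` template could be filled from a single row. It is an illustration only: `fill_template` is a hypothetical helper, the one-line template is made up, and the actual implementation of `generate_prompts` in `utils.py` may differ.

```python
import re


def fill_template(template: str, row: dict) -> str:
    """Replace every {{column}} placeholder with the matching value from `row`."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"Template references unknown column: {column!r}")
        return str(row[column])

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


# Hypothetical one-line template; real templates also carry instructions and examples.
template = "Clause: {{text}}\nLabel:"
row = {"text": "last updated date : may 15 , 2017", "answer": "Other"}

filled = fill_template(template, row)
print(filled)
```

Raising on an unknown column is a design choice in this sketch: a silent miss would leave `{{...}}` in the prompt sent to the model.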

### Evaluation

The majority of LegalBench tasks are evaluated using balanced accuracy. A handful of tasks that involve extraction or multilabel classification are evaluated using F1. To simplify evaluation, we provide an evaluation function that scores performance.
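
For readers unfamiliar with the metric, balanced accuracy is the mean of the per-class recalls, so a rare label counts as much as a common one. The following is a minimal pure-Python sketch of that definition. It is not the code in `legalbench/evaluation.py`, which may normalize generations or rely on a library implementation, and the toy labels are made up.

```python
from collections import defaultdict


def balanced_accuracy(predictions, answers):
    """Average the per-class recall so that rare classes weigh as much as common ones."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for prediction, answer in zip(predictions, answers):
        total[answer] += 1
        if prediction == answer:
            correct[answer] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Toy example: a model that always answers "Other".
answers = ["Other", "Other", "Other", "Arbitration"]
predictions = ["Other", "Other", "Other", "Other"]

score = balanced_accuracy(predictions, answers)
print(score)  # 0.5: recall is 1.0 for "Other" and 0.0 for "Arbitration"
```

Plain accuracy on the same toy data would be 0.75, which is why balanced accuracy is the more informative choice when label frequencies are skewed.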

In [None]:
import re

import datasets
import openai
from tqdm.auto import tqdm

from legalbench.evaluation import evaluate
from legalbench.utils import generate_prompts

# --- 1. Configure the OpenAI-compatible client (Ollama) and the LLM ---
GENERATIVE_MODEL = "qwen3:8b"  # Or whichever model you are running in Ollama
TASK_NAME = "unfair_tos"

client = openai.OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # Ollama accepts "ollama" or any string when no auth is required
)

# --- 2. Load the task data and prompt template ---
# Load the dataset for this task from Hugging Face
print(f"Loading dataset for task: {TASK_NAME}...")
dataset = datasets.load_dataset("nguha/legalbench", TASK_NAME)
test_df = dataset["test"].to_pandas()
test_df = test_df[:100]  # Evaluate on the first 100 test samples only
# train_df = dataset["train"].to_pandas()  # For few-shot examples, if you want to add them to the base prompt

# Load the base prompt template for the task.
# The path is relative to the directory you run the script/notebook from.
prompt_template_path = f"legalbench/tasks/{TASK_NAME}/base_prompt.txt"
print(f"Loading prompt template from: {prompt_template_path}...")
try:
    with open(prompt_template_path) as f:
        prompt_template = f.read()
except FileNotFoundError:
    print(f"Error: prompt file not found at {prompt_template_path}")
    print("Make sure 'legalbench/tasks' exists relative to the directory you run the script/notebook from,")
    print("or adjust the path as needed.")
    raise  # Re-raise instead of exit(), which would shut down a notebook kernel

# --- 3. Generate prompts for the test data ---
# generate_prompts fills the template with the rows of test_df
print("Generating prompts for the test set...")
prompts_for_llm = generate_prompts(prompt_template=prompt_template, data_df=test_df)
print(f"Generated {len(prompts_for_llm)} prompts.")
if not prompts_for_llm:
    raise ValueError("No prompts were generated. Check your template and data.")

# Show an example of a generated prompt (optional)
# print("\nExample generated prompt:")
# print(prompts_for_llm[0])

# --- 4. Get generations (predictions) from the LLM ---
POSSIBLE_LABELS = [
    "Arbitration", "Unilateral change", "Content removal", "Jurisdiction",
    "Choice of law", "Limitation of liability", "Unilateral termination",
    "Contract by using", "Other",
]

# A general system prompt can help if your model responds better with one.
# LegalBench prompts are usually self-contained, with instructions and examples;
# the base_prompt.txt for this task already includes examples,
# so an empty or very generic system prompt may be enough.
SYSTEM_PROMPT = (
    "You are a clause classification assistant. "
    "Given a clause, classify it into one of the following categories: "
    "Arbitration, Unilateral change, Content removal, Jurisdiction, Choice of law, "
    "Limitation of liability, Unilateral termination, Contract by using, Other. "
    "Respond with ONLY the category name and nothing else."
)
# Or leave it empty if the base prompt is enough:
# SYSTEM_PROMPT = ""


def last_mentioned_label(text, whole_word=False):
    """Return the known label whose last occurrence is furthest right in `text`, or ""."""
    lowered = text.lower()
    best_match, best_pos = "", -1
    for known_label in POSSIBLE_LABELS:
        # With whole_word=True, require a whole-word match (case-insensitive)
        # to avoid unwanted partial matches.
        pattern = r"\b" + re.escape(known_label.lower()) + r"\b"
        if whole_word and not re.search(pattern, lowered):
            continue
        # Heuristic: if several labels are mentioned, keep the one mentioned last.
        pos = lowered.rfind(known_label.lower())
        if pos > best_pos:
            best_pos, best_match = pos, known_label
    return best_match


def extract_label(generated_text):
    """Post-process the raw LLM output into one of POSSIBLE_LABELS, or "" on failure."""
    lowered = generated_text.lower()
    if "label:" not in lowered:
        # No "Label:" marker: fall back to the last known label mentioned.
        return last_mentioned_label(generated_text, whole_word=True)

    # Try to extract the label when the LLM prefixes it with "Label: "
    potential_label = lowered.split("label:")[-1].strip()
    cleaned = ""
    for known_label in POSSIBLE_LABELS:
        if known_label.lower() == potential_label:  # Exact match wins
            return known_label
        # Otherwise accept the first label that appears as a substring
        # (e.g. the LLM says "Label: Arbitration clause" instead of "Arbitration").
        if not cleaned and known_label.lower() in potential_label:
            cleaned = known_label
    # Less precise fallback that may yield false positives: search the whole text.
    return cleaned or last_mentioned_label(generated_text)


llm_generations = []
print(f"\nRequesting predictions from the LLM ({GENERATIVE_MODEL}) for {len(prompts_for_llm)} instances...")

# prompts_for_llm = prompts_for_llm[:15]

for user_prompt_content in tqdm(prompts_for_llm):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_prompt_content},
    ]
    try:
        response = client.chat.completions.create(
            model=GENERATIVE_MODEL,
            messages=messages,
            temperature=0.0,  # Usually a good choice for classification/extraction tasks
        )
        generated_text = response.choices[0].message.content.strip()
        cleaned_generation = extract_label(generated_text)

        if not cleaned_generation:  # No valid label could be extracted
            print(f"WARN: Could not extract a valid label from: '{generated_text[:100]}...' Falling back to the raw output.")
            # evaluate() will try to normalize the raw text, but it will probably be scored as wrong.
            llm_generations.append(generated_text)
        else:
            llm_generations.append(cleaned_generation)
    except Exception as e:
        print(f"Error calling the Ollama API: {e}")
        # Append an empty string (scored as incorrect) so the evaluation does not break.
        llm_generations.append("")

if not llm_generations or len(llm_generations) != len(prompts_for_llm):
    raise RuntimeError("Could not obtain all generations from the LLM.")


# --- 5. Evaluate the generations ---
ground_truth_answers = test_df["answer"].tolist()

print("\nEvaluating predictions...")
# evaluate takes the task name, the LLM predictions, and the correct answers.
score = evaluate(TASK_NAME, llm_generations, ground_truth_answers)

print(f"\nEvaluation result for task '{TASK_NAME}' with model '{GENERATIVE_MODEL}':")
print(f"Score: {score}")

Loading dataset for task: unfair_tos...
Loading prompt template from: legalbench/tasks/unfair_tos/base_prompt.txt...
Generating prompts for the test set...
Generated 100 prompts.

Requesting predictions from the LLM (qwen3:8b) for 100 instances...



In [8]:
target_license = "CC BY 4.0"
tasks_with_target_license = []
for task in tqdm(TASKS):
    dataset = datasets.load_dataset("nguha/legalbench", task, split="train")
    if dataset.info.license == target_license:
        tasks_with_target_license.append(task)
print("Tasks with target license:", tasks_with_target_license)

ValueError: Unknown split "train". Should be one of ['test'].
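
The `ValueError` above indicates that at least one task ships only a `test` split, so requesting `split="train"` for every task fails. One possible workaround, sketched below, is to load each task without a `split` argument and read the license from whichever split comes first. The helper `tasks_with_license` and the stand-in objects (`fake_split`, `fake_hub`, and the task names and licenses inside it) are hypothetical and exist only so that the sketch runs offline.

```python
from types import SimpleNamespace


def tasks_with_license(task_names, load_task, target_license):
    """Return the tasks whose first available split reports `target_license`.

    `load_task` is any callable mapping a task name to a dict-like of splits,
    where each split exposes `.info.license`.
    """
    matches = []
    for task in task_names:
        splits = load_task(task)
        first_split = next(iter(splits.values()), None)
        if first_split is not None and first_split.info.license == target_license:
            matches.append(task)
    return matches


def fake_split(license_name):
    """Stand-in for a loaded split; mimics only the `.info.license` attribute."""
    return SimpleNamespace(info=SimpleNamespace(license=license_name))


# Made-up tasks and licenses, used only to exercise the helper.
fake_hub = {
    "task_a": {"train": fake_split("CC BY 4.0"), "test": fake_split("CC BY 4.0")},
    "task_b": {"test": fake_split("CC BY 4.0")},  # test-only, like the task that failed above
    "task_c": {"train": fake_split("MIT"), "test": fake_split("MIT")},
}

result = tasks_with_license(fake_hub, fake_hub.__getitem__, "CC BY 4.0")
print("Tasks with target license:", result)
```

Against the real data, the loader could be `lambda task: datasets.load_dataset("nguha/legalbench", task)`, which returns a dictionary of splits when no `split` is given, and `task_names` would be `TASKS`.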