# Tutorial: Preparing the submission file and running the evaluation for Task B

This notebook provides a step-by-step tutorial for preparing the submission file for Task B of the shared task. We will download the Task B data hosted on [Zenodo](https://doi.org/10.5281/zenodo.14002665), prepare a file in the required [submission format](https://talentclef.github.io/talentclef/docs/talentclef-2025/evaluation/), and evaluate it with the [task's evaluation script](https://github.com/TalentCLEF/talentclef25_evaluation_script). The same format is also compatible with the Codabench benchmark, where the test set data will be uploaded.



-----------------------------
TalentCLEF is an initiative to advance Natural Language Processing (NLP) in Human Capital Management (HCM). It aims to create a public benchmark for model evaluation and promote collaboration to develop fair, multilingual, and flexible systems that improve Human Resources (HR) practices across different industries.

The inaugural edition of this shared task is part of the [Conference and Labs of the Evaluation Forum (CLEF)](https://clef2025.clef-initiative.eu/index.php?page=Pages/labs.html), scheduled to be held in Madrid in 2025. If you are interested in registering, you can find the registration form [here](https://clef2025-labs-registration.dei.unipd.it/).

<img src="https://github.com/TalentCLEF/talentclef/blob/main/logo_talentclef.png?raw=true" alt="TalentCLEF logo" width="200"/>
<img src="https://talentclef.github.io/talentclef/docs/talentclef-2025/workshop/logo_clef_madrid.png" alt="CLEF Madrid logo" width="150"/>


## Imports

In [1]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer, util
import subprocess
from codecarbon import EmissionsTracker

## Download Task B files

First, let's download the Task B zip file directly from Zenodo and unzip it.



In [None]:
# Download
!wget https://zenodo.org/records/15038364/files/TaskB.zip
!unzip TaskB.zip -d taskB

## Generate relevant files using a simple model

**For now, upload the data manually until the release.**

Load queries and corpus elements in English from the Validation folder:

In [8]:
queries = "/content/taskB/validation/queries"
corpus_elements = "/content/taskB/validation/corpus_elements"

In [9]:
queries = pd.read_csv(queries,sep="\t")
corpus_elements = pd.read_csv(corpus_elements, sep="\t")


Transform `skill_aliases` column to a list of strings:

In [10]:
import ast
corpus_elements["skill_aliases"] = corpus_elements["skill_aliases"].apply(lambda x: ast.literal_eval(x))
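As a minimal sketch of what this conversion does, assuming each `skill_aliases` cell holds a Python list literal stored as text (the sample value below is hypothetical, not taken from the dataset):

```python
import ast

# Hypothetical cell value: a Python list literal stored as text in the TSV file.
raw_value = "['data analysis', 'analysing data']"

# literal_eval only evaluates Python literals (lists, strings, numbers, ...),
# so it does not execute arbitrary code the way eval() would.
aliases = ast.literal_eval(raw_value)

print(type(aliases).__name__, aliases)
```

After the conversion, each cell is a real list, which is what `explode` needs in order to produce one row per alias.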

Generate a dictionary that maps each query ID to its text:

In [11]:
queries_ids = queries.q_id.to_list()
queries_texts = queries.jobtitle.to_list()
map_queries = dict(zip(queries_ids,queries_texts))

Before creating a mapping dictionary of texts to corpus_ids, explode the `skill_aliases` column:

In [12]:
list_aliases_df = corpus_elements.explode("skill_aliases")

In [14]:
corpus_ids = list_aliases_df.c_id.to_list()
corpus_texts = list_aliases_df.skill_aliases.to_list()
map_corpus = dict(zip(corpus_texts,corpus_ids))
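One property of a text-to-ID dictionary is worth keeping in mind: if the same alias string were to appear under more than one `c_id`, only the last ID would be kept. The sketch below illustrates this with hypothetical data; whether such duplicates actually occur in the Task B corpus is an assumption that has not been checked here.

```python
# Hypothetical exploded data: one (c_id, alias) pair per row.
corpus_ids = ["c1", "c1", "c2"]
corpus_texts = ["python", "python programming", "python"]

# Text -> ID mapping: a repeated alias keeps only the last ID seen.
map_corpus = dict(zip(corpus_texts, corpus_ids))

# The parallel lists keep every pair, so position-based lookups lose nothing.
pairs = list(zip(corpus_ids, corpus_texts))

print(map_corpus)  # "python" now points to "c2" only
print(len(pairs))  # all three pairs are still available
```

The ranking loop later in this notebook looks up IDs by position in `corpus_ids`, so it does not depend on this dictionary.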

Load a simple embedding model:

In [None]:
model = SentenceTransformer("all-MiniLM-L6-v2")

In [None]:
tracker = EmissionsTracker()
tracker.start_task("all-MiniLM-L6-v2")

Encode queries and corpus elements:

In [16]:
query_embeddings = model.encode(queries_texts, convert_to_tensor=True)
corpus_embeddings = model.encode(corpus_texts, convert_to_tensor=True)

Compute similarities:

In [17]:
similarities = util.cos_sim(query_embeddings, corpus_embeddings).cpu().numpy()
emissions = tracker.stop_task("all-MiniLM-L6-v2")
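To make the shape of the result concrete, here is a minimal pure-Python sketch of a cosine similarity matrix with one row per query and one column per corpus text. The vectors are hypothetical toy values, and the helper `cosine_similarity` is written for illustration only; it is not the implementation used by `util.cos_sim`.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical 2-dimensional embeddings: 2 queries and 3 corpus texts.
query_vecs = [[1.0, 0.0], [1.0, 1.0]]
corpus_vecs = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]

# similarity_matrix[i][j] is the similarity between query i and corpus text j.
similarity_matrix = [
    [cosine_similarity(q, c) for c in corpus_vecs] for q in query_vecs
]

for row in similarity_matrix:
    print([round(value, 4) for value in row])
```

Row `i` of the matrix is what gets sorted to rank the corpus texts for query `i`.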

## Prepare submission file

Submissions must follow the TREC Run File format, including headers in the output file. This means that the file has 6 space-separated columns per line, with the following information:

- q_id: Query ID.
- Q0: A constant identifier, usually "Q0".
- doc_id: ID of the retrieved document.
- rank: Position of the document in the ranking.
- score: Relevance score assigned by the model.
- tag: Experiment name.
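The six columns can be sketched as a pair of small helpers that build and read back one line. The function names `format_trec_line` and `parse_trec_line` and the sample IDs are hypothetical, introduced only for illustration; the four-decimal score format mirrors the one used in this notebook.

```python
def format_trec_line(q_id, doc_id, rank, score, tag):
    """Build one run line: q_id, Q0, doc_id, rank, score, tag."""
    return f"{q_id} Q0 {doc_id} {rank} {score:.4f} {tag}"

def parse_trec_line(line):
    """Split one run line back into its six named fields."""
    q_id, q0, doc_id, rank, score, tag = line.split()
    return {
        "q_id": q_id,
        "Q0": q0,
        "doc_id": doc_id,
        "rank": int(rank),
        "score": float(score),
        "tag": tag,
    }

line = format_trec_line("q1", "c42", 1, 0.87654, "baseline_model")
parsed = parse_trec_line(line)

print(line)
print(parsed)
```

Because the columns are separated by spaces, IDs and the tag should not contain whitespace themselves.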

In [18]:
import numpy as np
results = []
results_name = []

for q_idx, q_id in enumerate(queries_ids):
    sorted_indices = np.argsort(-similarities[q_idx])
    used_doc_ids = set()
    rank_counter = 0
    for c_idx in sorted_indices:  # Consider the full list.
        doc_id = corpus_ids[c_idx]
        # If doc_id was already processed, go to the next one.
        if doc_id in used_doc_ids:
            continue
        used_doc_ids.add(doc_id)
        rank_counter += 1

        query_name = map_queries[q_id]
        doc_name = corpus_texts[c_idx]
        score = similarities[q_idx, c_idx]

        results.append(f"{q_id} Q0 {doc_id} {rank_counter} {score:.4f} baseline_model")
        results_name.append(f"{query_name} Q0 {doc_name} {rank_counter} {score:.4f} baseline_model")

The list has this structure

Let's save the list as a file:

In [21]:
with open("evaluation_baseline_taskB.trec", "w", encoding="utf-8") as f:
    f.write("\n".join(results))

# Save the emissions measured by the tracker as JSON.
import json

with open("./emissions.json", "w", encoding="utf-8") as f:
    json.dump(dict(emissions.values), f, ensure_ascii=False, indent=4)
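Before evaluating, it can help to sanity-check the run lines. The sketch below uses a hypothetical helper, `check_run_lines`, with three checks: six fields per line, no repeated document per query, and consecutive ranks starting at 1. These checks reflect how the file is built in this notebook; they are assumptions, not official requirements of the evaluation script.

```python
from collections import defaultdict

def check_run_lines(lines):
    """Return a list of human-readable problems found in TREC-style run lines."""
    problems = []
    seen_docs = defaultdict(set)  # q_id -> doc_ids already ranked
    last_rank = defaultdict(int)  # q_id -> last rank seen (0 before the first line)
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 6:
            problems.append(f"line {number}: expected 6 fields, found {len(fields)}")
            continue
        q_id, _, doc_id, rank, _, _ = fields
        if doc_id in seen_docs[q_id]:
            problems.append(f"line {number}: {doc_id} repeated for {q_id}")
        seen_docs[q_id].add(doc_id)
        if int(rank) != last_rank[q_id] + 1:
            problems.append(f"line {number}: rank {rank} is not consecutive for {q_id}")
        last_rank[q_id] = int(rank)
    return problems

# Hypothetical run lines: one clean example and one with two problems.
good = ["q1 Q0 c1 1 0.9000 baseline_model", "q1 Q0 c2 2 0.5000 baseline_model"]
bad = ["q1 Q0 c1 1 0.9000 baseline_model", "q1 Q0 c1 3 0.5000 baseline_model"]

print(check_run_lines(good))
print(check_run_lines(bad))
```

The same function could be applied to the saved file by reading it and passing the result of `splitlines()`.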

## Evaluation

For the evaluation, we will use the official [TalentCLEF evaluation script](https://github.com/TalentCLEF/talentclef25_evaluation_script), which uses the ranx library under the hood.

First, clone the repo and install the requirements file:

In [None]:
!git clone https://github.com/TalentCLEF/talentclef25_evaluation_script.git
!pip install -r /content/talentclef25_evaluation_script/requirements.txt


Then, select the Qrels file and the Run file to perform the evaluation.


In [23]:
qrels_file = "/content/taskB/validation/qrels.tsv"
run_file = "/content/evaluation_baseline_taskB.trec"

In [24]:
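command = ["python", "/content/talentclef25_evaluation_script/talentclef_evaluate.py", "--qrels", qrels_file, "--run", run_file]
result = subprocess.run(command, capture_output=True, text=True)
print(result.stdout)
# Show the error output too, so that a failing script is not silently ignored.
if result.returncode != 0:
    print(result.stderr)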

Received parameters:
  qrels: /content/taskB/validation/qrels.tsv
  run: /content/evaluation_baseline_taskB.trec
Loading qrels...
Loading run...
Running evaluation...

=== Evaluation Results ===
map: 0.1874
mrr: 0.6752
ndcg: 0.6660
precision@5: 0.4500
precision@10: 0.3852
precision@100: 0.1964
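To give a rough intuition for two of the metric names above, here is a minimal pure-Python sketch of the usual definitions of mean reciprocal rank and precision@k on hypothetical toy data. The official script relies on ranx, whose exact handling of details may differ, so this is an illustration and not a reimplementation of the evaluation.

```python
def reciprocal_rank(ranked_doc_ids, relevant):
    """1 / position of the first relevant document, or 0.0 if none is retrieved."""
    for position, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            return 1.0 / position
    return 0.0

def precision_at_k(ranked_doc_ids, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    top_k = ranked_doc_ids[:k]
    return sum(doc_id in relevant for doc_id in top_k) / k

# Hypothetical ranked lists and relevance judgments for two queries.
run = {"q1": ["c3", "c1", "c2"], "q2": ["c5", "c4", "c6"]}
qrels = {"q1": {"c1", "c2"}, "q2": {"c5"}}

# Average each per-query value over all queries.
mrr = sum(reciprocal_rank(run[q], qrels[q]) for q in run) / len(run)
p_at_2 = sum(precision_at_k(run[q], qrels[q], 2) for q in run) / len(run)

print(f"mrr: {mrr:.4f}")
print(f"precision@2: {p_at_2:.4f}")
```

In this toy example the first relevant document appears at rank 2 for `q1` and rank 1 for `q2`, and each query has one relevant document in its top two.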

