# Project: NIMs Based Task Assignment System

## Overview

This project implements an intelligent task assignment system using ranking models and embeddings to match tasks with performers based on their profiles and capabilities.

## Key Components

1. Task Assignment System
    - Uses ranking models to evaluate task-performer compatibility
    - Supports multiple assignments per task
    - Based on skills and experience matching

2. Multilingual Description Categorization 
    - Processes task descriptions in multiple languages
    - Groups similar tasks using semantic similarity
    - Implements embeddings and cosine similarity

### Technical Components

- [NvidiaRanker](https://docs.haystack.deepset.ai/docs/nvidiaranker) for task-performer matching
- [NvidiaTextEmbedder](https://docs.haystack.deepset.ai/docs/nvidiatextembedder) for multilingual processing  
- [DocumentMRREvaluator](https://docs.haystack.deepset.ai/docs/documentmrrevaluator) for performance metrics

### Required Setup

- NVIDIA AI Platform account
- API key from NVIDIA
- Python environment with required packages

## Project Structure

1. Task assignment implementation
2. Performance evaluation
3. Multilingual categorization
4. Results analysis


## Pre-Requisites


### Account Setup

1. Create a free account at [NVIDIA AI Platform](https://build.nvidia.com/)
2. Generate API key:
    - Click any model
    - Select "Build this with NIM"
    - Copy generated API key


## Environment Downloads


In [1]:
!uv pip install nvidia-haystack gdown

[2mAudited [1m2 packages[0m [2min 40ms[0m[0m


## Dataset Description


1. `task1_elf_profiles.csv`: a short description of each elf's expertise, traits, and preferences
2. `task1_descriptions.csv`: a description of what each task involves
3. `task1_solution_key.csv`: the optimal elf-task assignment, to be used for evaluation.
4. `task2_parent_notes.xlsx`: the messages that Santa got from the parents all over the world requesting to leave the gifts in specific places
5. `task2_solution_key.xlsx`: the right assignment of messages-places, to be used for evaluation.


In [2]:
import gdown

url = "https://drive.google.com/drive/folders/1Oxn9M7cpIeuDlQ4tn4wUvqAKVRW0bNTC?usp=sharing"
output_dir = "challenge_files"

gdown.download_folder(url, quiet=True, output=output_dir)

['challenge_files/task1_descriptions.csv',
 'challenge_files/task1_elf_profiles.csv',
 'challenge_files/task1_solution_key.csv',
 'challenge_files/task2_parent_notes.xlsx',
 'challenge_files/task2_solution_key.xlsx']

## Add `API` key
Choosing variable `NVIDIA_API_KEY`


In [3]:
import os
from getpass import getpass
#from google.colab import userdata

# os.environ["NVIDIA_API_KEY"] = "<YOUR_KEY_HERE>"
os.environ["NVIDIA_API_KEY"] = getpass("Enter your NVIDIA API key: ")

### **Task 1**

> Reference for ranking model from [NVIDIA](https://build.nvidia.com/explore/retrieval)


### Data Loading


In [4]:
import pandas as pd

elf_profiles_df = pd.read_csv("challenge_files/task1_elf_profiles.csv")
elf_profiles_df.head(3)

Unnamed: 0,Elf Name,Profile Description
0,Jack,Loves theatrics and excels at bringing magic t...
1,Sally,Fantastic at sewing and enjoys creating colorf...
2,Zero,Knows all about Christmas history and is very ...


In [5]:
task_descriptions_df = pd.read_csv("challenge_files/task1_descriptions.csv")
task_descriptions_df.head(3)

Unnamed: 0,Task ID,Task Description
0,1,Paint intricate details on 100 wooden toys.
1,2,Act as a tour guide for visitors to Christmas ...
2,3,Move heavy boxes of toys to the sleigh loading...


In [6]:
elf_profiles = elf_profiles_df.to_dict()

elf_names = list(elf_profiles["Elf Name"].values())
elf_descriptions = list(elf_profiles["Profile Description"].values())

task_descriptions = task_descriptions_df.to_dict()
task_descriptions = list(task_descriptions["Task Description"].values())

### Ranker Implementation

> Ranker Reference: https://build.nvidia.com/explore/retrieval

In [7]:
from haystack_integrations.components.rankers.nvidia import NvidiaRanker
from haystack import Document
from haystack.utils import Secret

tasks = []

for idx, t in enumerate(task_descriptions):
    tasks.append(Document(content=t, id=idx + 1))

ranker = NvidiaRanker(
    model="nvidia/nv-rerankqa-mistral-4b-v3",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)

ranker.warm_up()

all_elf_compatibility = []
for i, elf in enumerate(elf_descriptions):
    elf_compatibility = ranker.run(elf, tasks)

    all_elf_compatibility.append(elf_compatibility)

In [8]:
for i in range(len(all_elf_compatibility)):
    print(
        f"The best task for elf {elf_names[i]} is {all_elf_compatibility[i]['documents'][0].content}"
    )

The best task for elf Jack is Design and sew 100 elf uniforms for the holiday season.
The best task for elf Sally is Design and sew 100 elf uniforms for the holiday season.
The best task for elf Zero is Act as a tour guide for visitors to Christmas Village.
The best task for elf Mayor is Carve and assemble 100 custom wooden trains.
The best task for elf Oogie is Host storytime sessions for children visiting Christmas Village.
The best task for elf Lock is Prepare and decorate 200 holiday cookies for Santa's snack station.
The best task for elf Shock is Fix the glitchy conveyor belt in the toy assembly line.
The best task for elf Barrel is Move heavy boxes of toys to the sleigh loading area.
The best task for elf Finkelstein is Paint intricate details on 100 wooden toys.
The best task for elf Sandy is Groom and care for Santa's reindeer, ensuring they're ready for the big night.


In [9]:
# Repeating the steps with an alternative ranker model in build.nvidia.com
alternative_ranker = NvidiaRanker(
    model="nvidia/llama-3.2-nv-rerankqa-1b-v1",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)

alternative_ranker.warm_up()

alternative_all_elf_compatibility = []
for i, elf in enumerate(elf_descriptions):
    elf_compatibility = ranker.run(elf, tasks)

    alternative_all_elf_compatibility.append(elf_compatibility)

In [10]:
for i in range(len(alternative_all_elf_compatibility)):
    print(
        f"The best task for elf {elf_names[i]} is {alternative_all_elf_compatibility[i]['documents'][0].content}"
    )

The best task for elf Jack is Design and sew 100 elf uniforms for the holiday season.
The best task for elf Sally is Design and sew 100 elf uniforms for the holiday season.
The best task for elf Zero is Act as a tour guide for visitors to Christmas Village.
The best task for elf Mayor is Carve and assemble 100 custom wooden trains.
The best task for elf Oogie is Host storytime sessions for children visiting Christmas Village.
The best task for elf Lock is Prepare and decorate 200 holiday cookies for Santa's snack station.
The best task for elf Shock is Fix the glitchy conveyor belt in the toy assembly line.
The best task for elf Barrel is Move heavy boxes of toys to the sleigh loading area.
The best task for elf Finkelstein is Paint intricate details on 100 wooden toys.
The best task for elf Sandy is Groom and care for Santa's reindeer, ensuring they're ready for the big night.


### Ranker Performance Comparison


In [11]:
from haystack.components.evaluators import DocumentMRREvaluator

# Getting the ground truth: optimal task-elf placement
ground_truth = pd.read_csv("challenge_files/task1_solution_key.csv").to_dict()
ground_truth = list(ground_truth["Task ID"].values())

# Getting the ranking and alternative_ranking actual results
ranking = [
    all_elf_compatibility[index]["documents"][0].id
    for index in range(len(all_elf_compatibility))
]
alternative_ranking = [
    alternative_all_elf_compatibility[index]["documents"][0].id
    for index in range(len(alternative_all_elf_compatibility))
]

In [12]:
# Comparing the results of the two ranker models using the Mean Reciprocal Rank (MRR) score. This is a ranking quality metric that only considers the position of the first relevant item in the ranked list.
mrr_evaluator = DocumentMRREvaluator()

ground_truth_documents = []
ranking_documents = []
alternative_ranking_documents = []

for idx, gt in enumerate(ground_truth):
  ground_truth_documents.append([Document(content=gt)])
  ranking_documents.append([Document(content=ranking[idx])])
  alternative_ranking_documents.append([Document(content=alternative_ranking[idx])])


# Running the mrr_evaluator over the original and the alternative rankers' results
result_ranker = mrr_evaluator.run(ground_truth_documents, ranking_documents)
result_alternative_ranker = mrr_evaluator.run(ground_truth_documents, alternative_ranking_documents)

print(f"MRR for the first ranker is {result_ranker['score']}")
print(f"MRR for the second ranker is {result_alternative_ranker['score']}")

MRR for the first ranker is 0.9
MRR for the second ranker is 0.9


## Task 2 - Santa's Workshop: Delivery Organiser

Congratulations! With your help, elf Rita managed to assign the best task for each elf, making everyone happy and preserving the spirit of Christmas.

Santa is very happy with this outcome. To reward elf Rita for doing a great job, he makes her responsible for more work (with the vague promise of a promotion in the near future).

The new task is this: every year, Santa receives Christmas letters from children. On the backs of the letters, he gets notes from parents with instructions on where to leave the gifts. These notes, written in languages from all over the word, give indications of where best to leave the gifts. The options are:

1. Under the Christmas Tree
2. Under the Children's Beds
3. Near the Fireplace
4. In the Stockings

Elf Rita does not know all the languages in the world - however, she knows AI!
Help her by using a multilingual embedder and a cosine similarity algorithm to match the diverse descriptions to their categories.


### **Task 2**


**Complete the code snippets below to use Haystack and a multilingual embedding model from [NVIDIA](https://build.nvidia.com/nvidia/llama-3_2-nv-embedqa-1b-v1) to group these notes based on their semantic meaning, regardless of the language they are written in.**<br>
**Again, your tasks are marked with ### COMPLETE ### comments.**


### Read the Task Data

In [13]:
parent_notes_df = pd.read_excel("challenge_files/task2_parent_notes.xlsx")
parent_notes_df.head(3)

Unnamed: 0,Notes
0,Just leave everything under the tree; it’s tra...
1,Mijn dochter zal het geweldig vinden om 's och...
2,"Deja los regalos junto a la chimenea, por favor."


In [14]:
parent_notes = parent_notes_df.to_dict()
parent_notes = list(parent_notes["Notes"].values())

# inputting the possible locations for the presents, based on the task description above.
location_options = ["Under the Christmas Tree", "Under the Children's Beds", "Near the Fireplace", "In the Stockings"]

### Embedding the text

> Embedding model Reference: https://build.nvidia.com/explore/retrieval

In [15]:
from haystack_integrations.components.embedders.nvidia import NvidiaTextEmbedder

embedder = NvidiaTextEmbedder(
    model="nvidia/llama-3.2-nv-embedqa-1b-v2",
    api_url="https://integrate.api.nvidia.com/v1",
    api_key=Secret.from_env_var("NVIDIA_API_KEY"),
)

embedder.warm_up()

In [16]:
embedded_notes = []
for note in parent_notes:
    embedded_notes.append(embedder.run(note))

embedded_locations = []
for l in location_options:
    embedded_locations.append(embedder.run(l))

### Computing the similarity between notes and location options


In [21]:
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

max_cos_similarity_per_note = []
for i in range(len(embedded_notes)):
  cos_similarity_per_note = []
  for j in range(len(embedded_locations)):
    # Using cosine_similarity to compute the similarity score between embedded_notes and embedded_locations
    cos_similarity_per_note.append(cosine_similarity([embedded_notes[i]["embedding"]],[embedded_locations[j]["embedding"]])) # TODO

  # Getting the best match with argmax
  max_cos_similarity_per_note.append(np.argmax(cos_similarity_per_note))

### Embedder Performance Evaluation


In [25]:
# Read the ground truth data
ground_truth_data = pd.read_excel("challenge_files/task2_solution_key.xlsx").to_dict()
ground_truth_data = list(ground_truth_data["Key"].values())

print(ground_truth_data)
print(max_cos_similarity_per_note)

# Computing the accuracy of the solution by comparing max_cos_similarity_per_note to ground_truth_data

correct_predictions = 0

total_predictions = len(ground_truth_data)

for predicted, actual in zip(max_cos_similarity_per_note, ground_truth_data):
    if predicted == actual:
        correct_predictions += 1
accuracy = correct_predictions / total_predictions

[0, 0, 2, 3, 2, 2, 1, 1, 1, 2, 0, 2, 3, 3, 0, 1, 0, 1, 2, 2, 3, 1, 2, 1, 3, 0, 3, 3, 0, 0]
[np.int64(0), np.int64(0), np.int64(2), np.int64(3), np.int64(2), np.int64(2), np.int64(0), np.int64(1), np.int64(0), np.int64(2), np.int64(0), np.int64(2), np.int64(3), np.int64(3), np.int64(0), np.int64(1), np.int64(0), np.int64(1), np.int64(2), np.int64(2), np.int64(3), np.int64(1), np.int64(2), np.int64(1), np.int64(2), np.int64(0), np.int64(3), np.int64(3), np.int64(0), np.int64(0)]


In [26]:
print(f"The accuracy of this embedder is {accuracy}")

The accuracy of this embedder is 0.9


## Conclusion:

With the help of Nvidia Inference Microservices, We were able to manage both Santa's Workshop and gift deliveries. Everything should now be ready for Christmas!🎄

<sub><sup>PS - Due to her customer obsession and ability to innovate and disrupt using AI, Elf Rita got her promotion after all!🌟
