# About
Use the similarity score between an input text and each category description to predict the input text's category.

Initial phase:
1. Create a text description for each category.
2. Get the embedding of each category description.

Inference phase:
1. Get the embedding of the input text.
2. Calculate the similarity between the input text's embedding and each category description embedding.
3. Predict the category whose description has the highest similarity score.
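
The two phases above can be sketched without any embedding model. The vectors below are hypothetical 3-dimensional stand-ins (real embedding models return far longer vectors), and `cosine` and `predict` are illustrative names rather than part of this notebook's code.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings", one per category description.
toy_category_embeddings = {
    "affirm": [0.9, 0.1, 0.2],
    "deny": [0.1, 0.9, 0.2],
    "not_sure": [0.2, 0.2, 0.9],
}


def predict(input_embedding, category_embeddings):
    """Return the category whose embedding is most similar to the input."""
    scores = {
        category: cosine(input_embedding, embedding)
        for category, embedding in category_embeddings.items()
    }
    return max(scores, key=scores.get)


print(predict([0.8, 0.2, 0.1], toy_category_embeddings))  # affirm
```

The input vector points mostly along the first axis, so it scores highest against the toy `affirm` vector.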

Fine tuning:
1. Improve the category descriptions.
2. Use a better model for getting embeddings.

# Data

In [11]:
# categories = {
#     "affirm": "Text that expresses agreement, confirmation, or a positive response. Examples: 'Yes, I agree,' 'That’s correct,' or 'Sure, I’ll do it.'",
#     "deny": "Text that expresses disagreement, refusal, or a negative response. Examples: 'No, that’s not right,' 'I disagree,' or 'I can’t do it.'",
#     "not_sure": "Text that expresses uncertainty, doubt, or hesitation. Examples: 'I’m not sure,' 'Maybe,' or 'I need more information.'"
# }
categories = {
    "affirm": "agreement confirmation positive Yes, I agree That’s correct or Sure I’ll do it.",
    "deny": "disagreement refusal negative No, that’s not right I disagree I can’t do it.",
    "not_sure": "uncertainty doubt hesitation I’m not sure Maybe I need more information."
}

In [12]:
test_samples = [
    # Affirmative Examples
    {"text": "Yes, that makes perfect sense.", "category": "affirm"},
    {"text": "I completely agree with you.", "category": "affirm"},
    {"text": "Absolutely, that’s correct.", "category": "affirm"},
    {"text": "Sure thing, I’ll handle it.", "category": "affirm"},

    # Negative Examples
    {"text": "No, I don’t think so.", "category": "deny"},
    {"text": "That’s not what I meant at all.", "category": "deny"},
    {"text": "I can’t agree with this.", "category": "deny"},
    {"text": "No way, that’s incorrect.", "category": "deny"},

    # Uncertainty Examples
    {"text": "I’m not sure if this is right.", "category": "not_sure"},
    {"text": "Maybe, but I need more information.", "category": "not_sure"},
    {"text": "It could be true, but I’m not confident.", "category": "not_sure"},
    {"text": "I don’t know for certain.", "category": "not_sure"},

    # Ambiguous/Edge Cases
    {"text": "Possibly, but I’ll have to think about it.", "category": "not_sure"},  # Uncertain tone
    {"text": "No, wait... actually, yes.", "category": "affirm"},  # Affirmative conclusion
    {"text": "I don’t think I agree with that, but I’m not certain.", "category": "not_sure"},  # Mixed tone
    {"text": "I’m sorry, I can’t do that.", "category": "deny"},  # Negative with apology
    {"text": "Yes, but I’m still a bit unsure.", "category": "affirm"},  # Affirmative with slight doubt
]

# Initial Phase
Getting embeddings of category descriptions.

## Using ollama
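Ollama exposes an HTTP API endpoint that we can use to get embeddings. The `ollama.embed` call used below maps to `/api/embed`; `/api/embeddings` is the older endpoint, wrapped by `ollama.embeddings`.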
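
For illustration, here is a minimal sketch of calling the HTTP API directly with the standard library instead of the `ollama` client. It assumes a local Ollama server on its default port and the `/api/embed` endpoint, which takes an `input` field and returns an `embeddings` list (the same shape `ollama.embed` returns in this notebook). `build_embed_request` and `fetch_embedding` are hypothetical helper names.

```python
import json
import urllib.request

HOST = "http://localhost:11434"  # assumption: local Ollama server, default port
MODEL = "mxbai-embed-large"


def build_embed_request(text, model=MODEL, host=HOST):
    """Build (but do not send) a POST request for the /api/embed endpoint."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/api/embed",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_embedding(text, model=MODEL, host=HOST):
    """Send the request and return the first embedding vector in the response."""
    request = build_embed_request(text, model, host)
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return body["embeddings"][0]
```

Calling `fetch_embedding("Yea sure!")` requires the server to be running and the model to be pulled; the `ollama` client used in the rest of this notebook hides these details.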

In [13]:
import ollama  # pip install ollama

# MODEL = "nomic-embed-text"
MODEL = "mxbai-embed-large"
HOST = "http://localhost:11434"

In [14]:
category_embeddings = {
    category: ollama.embed(model=MODEL, input=description)["embeddings"][0]
    for category, description in categories.items()
}

# Inference
Predict the category of the input text.

In [15]:
from sklearn.metrics.pairwise import cosine_similarity


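def cal_similarity(v1, v2):
    # cosine_similarity returns a 1x1 matrix here; extract the scalar score.
    return cosine_similarity([v1], [v2])[0][0]


def classify(input_text: str) -> str:
    """
    Predict the category whose description is most similar to the input text.

    Args:
        input_text (str): The input text that is to be classified.
    Returns:
        str: The predicted category of the input text.
    """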
    input_text_embedding = ollama.embed(model=MODEL, input=input_text)["embeddings"][0]
    similarities = {
        category: cal_similarity(input_text_embedding, category_embedding)
        for category, category_embedding in category_embeddings.items()
    }
    return max(similarities, key=similarities.get)


classify("Yea sure!")

'affirm'

# Evaluation

In [16]:
for sample in test_samples:
    predicted = classify(sample["text"])
    print(f'{predicted} : {sample["category"]} : {sample["text"]}')

affirm : affirm : Yes, that makes perfect sense.
affirm : affirm : I completely agree with you.
affirm : affirm : Absolutely, that’s correct.
affirm : affirm : Sure thing, I’ll handle it.
deny : deny : No, I don’t think so.
deny : deny : That’s not what I meant at all.
deny : deny : I can’t agree with this.
deny : deny : No way, that’s incorrect.
not_sure : not_sure : I’m not sure if this is right.
not_sure : not_sure : Maybe, but I need more information.
not_sure : not_sure : It could be true, but I’m not confident.
not_sure : not_sure : I don’t know for certain.
not_sure : not_sure : Possibly, but I’ll have to think about it.
affirm : affirm : No, wait... actually, yes.
not_sure : not_sure : I don’t think I agree with that, but I’m not certain.
deny : deny : I’m sorry, I can’t do that.
not_sure : affirm : Yes, but I’m still a bit unsure.
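
The printed comparison above can be condensed into an accuracy figure and a list of misses. The sketch below is one way to do that; `evaluate` is a hypothetical helper, and `keyword_stub` is a stand-in classifier so the example runs without an Ollama server (in the notebook, the real `classify` would be passed instead).

```python
def evaluate(samples, classify_fn):
    """Return (accuracy, misclassified) for a classifier over labelled samples."""
    misclassified = []
    for sample in samples:
        predicted = classify_fn(sample["text"])
        if predicted != sample["category"]:
            misclassified.append((sample["text"], sample["category"], predicted))
    accuracy = (len(samples) - len(misclassified)) / len(samples)
    return accuracy, misclassified


# Stand-in classifier for demonstration only; it is not the embedding classifier.
def keyword_stub(text):
    return "affirm" if "yes" in text.lower() else "deny"


demo_samples = [
    {"text": "Yes, that works.", "category": "affirm"},
    {"text": "No, that is wrong.", "category": "deny"},
    {"text": "Yes, but I'm unsure.", "category": "not_sure"},
]

accuracy, errors = evaluate(demo_samples, keyword_stub)
print(f"accuracy = {accuracy:.2f}")
for text, expected, predicted in errors:
    print(f"expected {expected}, got {predicted}: {text}")
```

In the run shown above, 16 of the 17 test samples are predicted correctly. The one miss is "Yes, but I’m still a bit unsure.", which is labelled `affirm` but predicted as `not_sure`.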
