# Embedding-Based Classification

Embeddings from large language models offer a powerful approach to text classification.
In this method, each example of each class is converted into a vector using the language model's embeddings.
These vectors capture the semantic content of the text.
The embeddings of each class's examples then form a cluster, which can be summarized by its centroid, i.e., the average meaning of the examples in that class.
When a new piece of text needs to be classified, it is first embedded using the same language model.
The resulting vector is then compared to each class's cluster using cosine similarity.
The text is assigned the class whose cluster is most similar to its embedding.
In this way, the method draws on the semantic understanding of large language models, so texts are classified by meaning rather than by surface wording.
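
To make the procedure above concrete, here is a minimal sketch of centroid-based classification with cosine similarity. It uses made-up three-dimensional vectors in place of real embeddings, and the helper names (`cosine_similarity`, `centroid`, `classify_by_centroid`) are hypothetical. It illustrates the idea described in the text and is not necessarily how `EmbeddingBasedClassify` works internally.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def classify_by_centroid(vector, examples_by_label):
    """Return the label whose centroid is most similar to `vector`, plus all scores."""
    scores = {
        label: cosine_similarity(vector, centroid(vectors))
        for label, vectors in examples_by_label.items()
    }
    return max(scores, key=scores.get), scores


# Toy "embeddings" (assumption: invented values, not output of a real model).
toy_examples = {
    "positive": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "negative": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}

predicted_label, scores = classify_by_centroid([0.2, 0.7, 0.1], toy_examples)
print(predicted_label)  # the query lies much closer to the "negative" centroid
```

Averaging the examples keeps one vector per class, which makes classification cheap. An alternative design would compare the new text against every individual example and take the best match per class, which can help when a class covers several distinct meanings.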

### When should you use embedding-based classification?

We recommend using this type of classification when...
- ...proper classification requires fine-grained control over the classes' definitions.
- ...the labels can be defined mostly or purely by the semantic meaning of the examples.
- ...examples for each label are readily available.


Let's start by instantiating a classifier for sentiment classification.

In [3]:
from os import getenv

from aleph_alpha_client import Client

from intelligence_layer.use_cases.classify.embedding_based_classify import EmbeddingBasedClassify, LabelWithExamples


# Read the Aleph Alpha API token from the AA_TOKEN environment variable.
client = Client(getenv("AA_TOKEN"))
labels_with_examples = [
    LabelWithExamples(
        name="positive",
        examples=[
            "I really like this.",
            "Wow, your hair looks great!",
            "We're so in love.",
            "That truly was the best day of my life!",
            "What a great movie."
        ],
    ),
    LabelWithExamples(
        name="negative",
        examples=[
            "I really dislike this.",
            "Ugh, your hair looks horrible!",
            "We're not in love anymore.",
            "My day was very bad, I did not have a good time.",
            "They make terrible food."
        ],
    ),
]
# Build the classifier from the labeled examples defined above.
classify = EmbeddingBasedClassify(labels_with_examples, client)


Alright, let's classify a new example! First, we wrap the text and the candidate labels in a `ClassifyInput`.

In [None]:
from intelligence_layer.use_cases.classify.classify import ClassifyInput


# The input pairs the text to classify with the set of candidate label names.
ClassifyInput(
    chunk="It was very awkward with him, I did not enjoy it.",
    labels=frozenset(label.name for label in labels_with_examples),
)