# Classification

Language models offer unprecedented capabilities in understanding and generating human-like text.
One of the pressing issues in their application is the classification of vast amounts of data.
Traditional methods often require manual labeling and can be time-consuming and prone to errors.
LLMs, on the other hand, can swiftly process and categorize enormous datasets with minimal human intervention.
By leveraging LLMs for classification tasks, organizations can unlock insights from their data more efficiently, streamline their workflows, and harness the full potential of their information assets.

In this notebook, we present two alternative ways of classifying text using Aleph Alpha's Luminous models.
First, let's have a look at single-label classification using prompting.

### Prompt-based single-label classification

Single-label classification refers to the task of categorizing data points into one of n distinct categories or classes.
In this type of classification, each input is assigned to only one class, ensuring that no overlap exists between categories.
Common applications of single-label classification include email spam detection, where emails are classified as either "spam" or "not spam", or sentiment classification, where a text can be "positive", "negative" or "neutral".
When trying to solve this issue in a prompt-based manner, our primary goal is to construct a prompt that instructs the model to accurately predict the correct class for any given input.

### When should you use prompt-based classification?

We recommend using this type of classification when...
- ...the labels are easily understood (they don't require explanation or examples).
- ...the labels cannot be recognized purely by their semantic meaning.
- ...many examples for each label aren't readily available.

### Example snippet

Running the following code will instantiate a `PromptBasedClassify` that leverages a prompt for classification.
We can now enter any `ClassifyInput` so that the task returns each label along with its probability.
In addition, note the `tracer`, which will give a comprehensive overview of the result.


In [3]:
from os import getenv
from intelligence_layer.connectors.limited_concurrency_client import LimitedConcurrencyClient

from intelligence_layer.use_cases.classify.prompt_based_classify import ClassifyInput, PromptBasedClassify
from intelligence_layer.core.chunk import Chunk
from intelligence_layer.core.tracer import InMemoryTracer

text_to_classify = Chunk("In the distant future, a space exploration party embarked on a thrilling journey to the uncharted regions of the galaxy. \n\
With excitement in their hearts and the cosmos as their canvas, they ventured into the unknown, discovering breathtaking celestial wonders. \n\
As they gazed upon distant stars and nebulas, they forged unforgettable memories that would forever bind them as pioneers of the cosmos.")
labels = ["happy", "angry", "sad"]
client = LimitedConcurrencyClient.from_token(getenv("AA_TOKEN"))
task = PromptBasedClassify(client)
input = ClassifyInput(
    chunk=text_to_classify,
    labels=labels
)

tracer = InMemoryTracer()
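try:
    output = task.run(input, tracer)
except Exception:
    # Print the trace collected so far for debugging, then re-raise,
    # so that later cells do not continue without a defined `output`.
    print(tracer.entries)
    raise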
for label, score in output.scores.items():
    print(f"{label}: {round(score, 4)}")



### How does this implementation work?

We prompt the model multiple times, each time supplying the text, or chunk, and one label at a time.
Note that we also supply each label, rather than letting the model generate it.

To further explain this, let's start with a more familiar case.
Intuitively, one would probably prompt a model like so:

In [None]:
from aleph_alpha_client import PromptTemplate

prompt_template = PromptTemplate(PromptBasedClassify.PROMPT_TEMPLATE)
print(prompt_template.to_prompt(text=text_to_classify, label="").items[0].text)


The model would then complete our instruction, thus generating a matching label.

In the case of single-label classification, however, we already know all possible classes beforehand.
Because of this, all we are interested in is the probability that the model would have generated our specific classes.
To get this probability, we can prompt the model with each of our classes and ask it to return the "logprobs" for the text.

In the case of prompt-based classification, the base prompt looks something like this:

In [None]:
prompt_template = PromptTemplate(PromptBasedClassify.PROMPT_TEMPLATE)
print(prompt_template.to_prompt(text=text_to_classify, label=" " + labels[0]).items[0].text)


As you can see, we have the same prompt, but with a potential label candidate already filled in.
Now, we will ask the model to evaluate the likelihood of this label, i.e., of this completion.

Our request will not generate any tokens, but instead return the log probability of this completion given the previous tokens.
This is called an `EchoTask`.
Let's have a look at just one of these tasks triggered by our classification run.

In particular, note the `expected_completion` in the `Input` and the `prob` for the token " angry" in the `Output`.
Feel free to ignore the big `Complete` task dump in the middle.

In [None]:
tracer.entries[-1].entries[0].entries[0]
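
To make the idea of a completion's log probability more concrete, here is a minimal sketch of how per-token log probabilities combine into the probability of a whole label. The numbers are made up for illustration and the helper `sequence_probability` is hypothetical; it is not part of the Intelligence Layer.

```python
import math


def sequence_probability(token_logprobs):
    """Probability of a whole completion, given the log probability of each of its tokens."""
    # Adding log probabilities is equivalent to multiplying the probabilities.
    return math.exp(sum(token_logprobs))


# Hypothetical log probabilities, not taken from a real model response.
single_token_label = [math.log(0.6)]
two_token_label = [math.log(0.5), math.log(0.4)]

print(round(sequence_probability(single_token_label), 4))  # 0.6
print(round(sequence_probability(two_token_label), 4))  # 0.2
```

A label that spans several tokens is only as likely as the product of its token probabilities, which is why longer labels tend to receive lower raw probabilities.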

Now that we have the logprobs, we need a few calculations to turn them into final scores.

First, we normalize the probabilities.
For this, we use a probability tree.

In [None]:
from intelligence_layer.use_cases.classify.prompt_based_classify import TreeNode
from intelligence_layer.core.tracer import LogEntry

task_log = tracer.entries[-1]
normalized_probs_logs = [log_entry.value for log_entry in task_log.entries if isinstance(log_entry, LogEntry) and log_entry.message == "Normalized Probs"]
log = normalized_probs_logs[-1]

root = TreeNode()
for probs in log.values():
    root.insert_without_calculation(probs)


Finally, we take the product of the probabilities along each path to get the following results:

In [None]:
for label, score in output.scores.items():
    print(f"{label}: {round(score, 5)}")


The previous example is rather straightforward, but there are situations where a label is not as simple as a single token.

What if we take some labels that have overlapping tokens?
This makes the calculation a bit more complicated:

In [None]:
from intelligence_layer.use_cases.classify.prompt_based_classify import PromptBasedClassify, ClassifyInput
from intelligence_layer.core.tracer import LogEntry


labels = ["Space party", "Space exploration", "Space exploration party"]
task = PromptBasedClassify(client)
input = ClassifyInput(
    chunk=text_to_classify,
    labels=labels
)
tracer = InMemoryTracer()
output = task.run(input, tracer)
task_log = tracer.entries[-1]
normalized_probs_logs = [log_entry.value for log_entry in task_log.entries if isinstance(log_entry, LogEntry) and log_entry.message == "Normalized Probs"]
log = normalized_probs_logs.pop()

root = TreeNode()
for probs in log.values():
    root.insert_without_calculation(probs)

print("End scores:")
for label, score in output.scores.items():
    print(f"{label}: {round(score, 4)}")


Here, the three classes have some overlapping tokens, namely "Space" and "exploration".
"party" is not overlapping, because it occurs in two different places (after "Space" and after "exploration").
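
The following is a rough sketch of how such a probability tree could be evaluated. It assumes that probabilities are normalized among sibling tokens that share the same prefix and that a label's score is the product of the normalized values along its path. The function `normalized_scores`, the `END` marker and all numbers are hypothetical; the library's `TreeNode` may differ in detail.

```python
from collections import defaultdict
from math import prod

END = "<end>"  # hypothetical end marker, so one label can be a prefix of another


def normalized_scores(label_token_probs):
    """Map {label: [(token, prob), ...]} to {label: score} via sibling normalization."""
    # Collect the raw probabilities of all tokens that follow the same prefix.
    siblings = defaultdict(dict)
    for tokens in label_token_probs.values():
        prefix = ()
        for token, prob in tokens:
            siblings[prefix][token] = prob
            prefix += (token,)

    # A label's score is the product of its normalized token probabilities.
    scores = {}
    for label, tokens in label_token_probs.items():
        prefix, factors = (), []
        for token, prob in tokens:
            factors.append(prob / sum(siblings[prefix].values()))
            prefix += (token,)
        scores[label] = prod(factors)
    return scores


# Made-up raw token probabilities for the three overlapping labels.
raw = {
    "Space party": [("Space", 0.9), ("party", 0.2), (END, 0.8)],
    "Space exploration": [("Space", 0.9), ("exploration", 0.6), (END, 0.3)],
    "Space exploration party": [
        ("Space", 0.9), ("exploration", 0.6), ("party", 0.5), (END, 0.9)
    ],
}
scores = normalized_scores(raw)
for label, score in scores.items():
    print(f"{label}: {round(score, 4)}")
```

In this sketch, "party" after "Space" and "party" after "exploration" are separate nodes because they follow different prefixes, and the resulting scores sum to 1 up to floating-point rounding.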

Cool, so we have now figured out how to do prompt-based classification.
Let's have a look at another classification use-case!

### Embedding-based multi-label classification

Large language model embeddings offer a powerful approach to text classification.
In particular, such embeddings can be seen as a numerical representation of the meaning of a text.
Utilizing this, we can provide textual examples for each label and embed them to create a representation of each label in vector space.

**Or, in more detail**:
In this method, each example from various classes is transformed into a vector representation using the embeddings from the language model.
These embedded vectors capture the semantic essence of the text.
Once this is done, clusters of embeddings are formed for each class, representing the centroid or the average meaning of the examples within that class.
When a new piece of text needs to be classified, it is first embedded using the same language model.
This new embedded vector is then compared to the pre-defined clusters for each class using cosine similarity.
The class whose cluster is closest to the new text's embedding is then assigned to the text, thereby achieving classification.
This method leverages the deep semantic understanding of large language models to classify texts with high accuracy and nuance.
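
The comparison step described above can be sketched in a few lines of plain Python. The three-dimensional vectors below are made-up stand-ins for real embeddings, and the helpers `cosine_similarity`, `centroid` and `classify_embedding` are hypothetical; `EmbeddingBasedClassify` itself relies on a retriever and may compute its scores differently.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, ranging from -1 to 1."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def centroid(vectors):
    """Component-wise average of a list of equally sized vectors."""
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Hypothetical 3-dimensional "embeddings" of the examples for each label.
example_embeddings = {
    "positive": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "negative": [[-0.7, 0.3, 0.1], [-0.9, 0.1, 0.0]],
}
centroids = {label: centroid(vs) for label, vs in example_embeddings.items()}


def classify_embedding(embedding):
    """Score an embedded text against the centroid of each label."""
    return {label: cosine_similarity(embedding, c) for label, c in centroids.items()}


scores = classify_embedding([0.7, 0.2, 0.1])  # made-up embedding of a new text
best_label = max(scores, key=scores.get)
print(best_label, scores)
```

Note that cosine similarities are not probabilities: they need not sum to 1 and can be negative, as the "negative" score is in this toy example.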

### When should you use embedding-based classification?

We recommend using this type of classification when...
- ...proper classification requires fine-grained control over the classes' definitions.
- ...the labels can be defined mostly or purely by the semantic meaning of the examples.
- ...examples for each label are readily available.

### Example snippet

Let's start by instantiating a classifier for sentiment classification.

In [None]:
from intelligence_layer.use_cases.classify.embedding_based_classify import EmbeddingBasedClassify, LabelWithExamples


labels_with_examples = [
    LabelWithExamples(
        name="positive",
        examples=[
            "I really like this.",
            "Wow, your hair looks great!",
            "We're so in love.",
            "That truly was the best day of my life!",
            "What a great movie."
        ],
    ),
    LabelWithExamples(
        name="negative",
        examples=[
            "I really dislike this.",
            "Ugh, Your hair looks horrible!",
            "We're not in love anymore.",
            "My day was very bad, I did not have a good time.",
            "They make terrible food."
        ],
    ),
]
classify = EmbeddingBasedClassify(labels_with_examples, client)


There are several things to note here, in particular:
- This time, we instantiated our classification task with a number of `LabelWithExamples`.
- The examples provided should reflect the spectrum of texts expected in the intended usage domain of this classifier.
- This cell took some time to run.
This is because we instantiate a retriever in the background, which also requires us to embed the provided examples.

With that being said, let's run an unknown example!

In [None]:
classify_input = ClassifyInput(
    chunk="It was very awkward with him, I did not enjoy it.",
    labels=frozenset(l.name for l in labels_with_examples)
)
tracer = InMemoryTracer()
result = classify.run(classify_input, tracer)
result

Nice, we correctly identified the new example.

Again, let's appreciate the difference of this result compared to `PromptBasedClassify`'s result.
- The probabilities do not add up to 1.
In fact, we have no way of predicting what the sum of all scores will be.
In some cases, individual scores may even be negative.
All we know is that the highest score is likely to correspond to the best fitting label, provided we delivered good examples.
- We were much quicker to obtain a result.

Because all examples are pre-embedded, this classifier is much cheaper to operate as it only requires a single embedding-task to be sent to the Aleph Alpha API.

Let's try another example. This time, we expect the outcome to be positive.


In [None]:
classify_input = ClassifyInput(
    chunk="We used to be not like each other, but this changed a lot.",
    labels=frozenset(l.name for l in labels_with_examples)
)
tracer = InMemoryTracer()
result = classify.run(classify_input, tracer)
result

Unfortunately, we wrongly classify this text as negative.
To be fair, it is a difficult example.
But no worries, let's simply include this failing example in our list of label examples and try again!

In [None]:
from intelligence_layer.use_cases.classify.embedding_based_classify import EmbeddingBasedClassify, LabelWithExamples


labels_with_examples = [
    LabelWithExamples(
        name="positive",
        examples=[
            "I really like this.",
            "Wow, your hair looks great!",
            "We're so in love.",
            "That truly was the best day of my life!",
            "What a great movie.",
            "We used to be not like each other, but this changed a lot." # failing example
        ],
    ),
    LabelWithExamples(
        name="negative",
        examples=[
            "I really dislike this.",
            "Ugh, Your hair looks horrible!",
            "We're not in love anymore.",
            "My day was very bad, I did not have a good time.",
            "They make terrible food."
        ],
    ),
]
classify = EmbeddingBasedClassify(labels_with_examples, client)

tracer = InMemoryTracer()
result = classify.run(classify_input, tracer)
result

Nice, we now correctly classify this example!

One advantage of the `EmbeddingBasedClassify` approach is that we can easily tweak our labels by adding new examples.
In essence, a known failure can be fixed by adding the failing text as an example for the correct label.
As we increase the number of examples, the method typically becomes more precise.

You now have an overview of these two main methods of classification!
Feel free to tweak these methods and play around with their parameters to fine-tune them to your specific use case.