## Prompt-based classification

Prompt-based classification is a methodology that relies purely on prompting the LLM in a specific way.

### When should you use prompt-based classification?

Some situations in which you would use this methodology are:
- The labels are easily understood (they don't require explanation or examples)
    
    An example is sentiment analysis
- The labels are not recognized by their semantic meaning
    
    E.g. Reasoning tasks like classifying contradictions
- You don't have many examples

### Example snippet 
The following code instantiates a prompt-based classifier with the log level set to `"info"`.
It then classifies the text passed in `ClassifyInput` and prints the score for each label.
The contents of the debug log are shown below the scores.
The debug log gives an overview of the steps taken to reach the result.

```python
from os import getenv

from aleph_alpha_client import Client
from intelligence_layer.classify import SingleLabelClassify, ClassifyInput

# Implicit string concatenation keeps stray indentation out of the text.
text_to_classify = (
    "In the distant future, a space exploration party embarked on a thrilling journey to the uncharted regions of the galaxy. \n"
    "With excitement in their hearts and the cosmos as their canvas, they ventured into the unknown, discovering breathtaking celestial wonders. \n"
    "As they gazed upon distant stars and nebulas, they forged unforgettable memories that would forever bind them as pioneers of the cosmos."
)
labels = ["happy", "angry", "sad"]

# The API token is read from the AA_TOKEN environment variable.
client = Client(getenv("AA_TOKEN"))
task = SingleLabelClassify(client, "info")
# Named classify_input to avoid shadowing the built-in input().
classify_input = ClassifyInput(text=text_to_classify, labels=labels)

output = task.run(classify_input)
for label, score in output.scores.items():
    print(f"{label}: {round(score, 4)}")
# As the last expression in a notebook cell, this displays the debug log.
output.debug_log
```

### How does this implementation work?
For prompt-based classification, we prompt the model once per class, each time with the text we want to classify and one of our classes.
Instead of letting the model generate the class it thinks fits the text best, we ask it for the probability of each class.

To explain this further, let's start with a more familiar case.
The intuitive way to ask an LLM to label a text could be something like this:

```python
from aleph_alpha_client import PromptTemplate

# Render the classification prompt with the label left empty.
prompt_template = PromptTemplate(SingleLabelClassify.PROMPT_TEMPLATE)
print(prompt_template.to_prompt(text=text_to_classify, label="").items[0].text)
```

The model would then answer our question and give us the class that it thinks fits the text.

In the case of classification, however, we already know the classes beforehand.
Because of this, all we are interested in is the probability that the model would have produced each of our specific classes.
To get this probability, we can prompt the model once with each of our classes and ask it to return the logprobs of the prompt's tokens.

For prompt-based classification, the prompt looks something like this:

```python
# Render the same prompt, this time with the first label filled in.
prompt_template = PromptTemplate(SingleLabelClassify.PROMPT_TEMPLATE)
print(prompt_template.to_prompt(text=text_to_classify, label=labels[0]).items[0].text)
```

As you can see, we have the same prompt, but with the class already filled in as a response.

Our request will now not generate any tokens, but instead we ask the model: "How likely is it that you would have given the following response, given the text?". 

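```python
from aleph_alpha_client import CompletionRequest

CompletionRequest(
    # Prompt with the text and one candidate label filled in.
    prompt=prompt_template.to_prompt(text=text_to_classify, label=labels[0]),
    maximum_tokens=0,  # do not generate any new tokens
    log_probs=0,  # return the logprob of each token, without alternatives
    tokens=True,  # include the token strings in the response
    echo=True,  # include the prompt itself in the response
)
```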
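
To make the idea concrete, here is a minimal sketch of how a label's score could be read from such an echoed response. The logprob values are made up, and `label_logprob` is a hypothetical helper that is not part of `aleph_alpha_client` or `intelligence_layer`. It assumes that the label occupies the last tokens of the echoed prompt and that the response contains one logprob per token.

```python
import math


def label_logprob(token_logprobs: list[float], num_label_tokens: int) -> float:
    """Sum the log-probabilities of the last `num_label_tokens` tokens."""
    if not 0 < num_label_tokens <= len(token_logprobs):
        raise ValueError("num_label_tokens must be between 1 and len(token_logprobs)")
    return sum(token_logprobs[-num_label_tokens:])


# Made-up logprobs for an echoed prompt whose last two tokens spell the label.
example_logprobs = [-2.0, -1.5, -0.5, -0.25]
score = label_logprob(example_logprobs, num_label_tokens=2)

# Adding logprobs corresponds to multiplying the token probabilities.
print(score, math.exp(score))
```

Summing in log space is numerically safer than multiplying many small probabilities directly.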

For the labels defined above, the raw logprobs per label can be pulled from the debug log with the code snippet below.

```python
from intelligence_layer.task import InfoEnabledLog

# Keep only the debug-log entries that hold the raw logprobs per label.
result_objects = [log_entry for log_entry in output.debug_log.root if log_entry.message == "Raw log probs per label"]
InfoEnabledLog(root=result_objects)
```

Now that we have our logprobs, all we have to do is exponentiate them and normalize the resulting scores to get our end results.

```python
# Print the final normalized score for each label.
for label, score in output.scores.items():
    print(f"{label}: {round(score, 5)}")
```
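
The exponentiate-and-normalize step described above can be sketched in plain Python. This is an illustration and not the library's own code: `normalize_logprobs` is a hypothetical helper and the raw logprobs are made up.

```python
import math


def normalize_logprobs(logprobs: dict[str, float]) -> dict[str, float]:
    """Exponentiate each log-probability and rescale so the scores sum to 1."""
    probs = {label: math.exp(lp) for label, lp in logprobs.items()}
    total = sum(probs.values())
    return {label: p / total for label, p in probs.items()}


# Made-up raw log-probabilities, one per label.
raw_logprobs = {
    "happy": math.log(0.06),
    "angry": math.log(0.01),
    "sad": math.log(0.03),
}
normalized = normalize_logprobs(raw_logprobs)
for label, value in normalized.items():
    print(f"{label}: {round(value, 4)}")
```

Because only the ratios between the raw probabilities matter, the normalized scores are comparable across labels even when every raw probability is small.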

The example mentioned before is rather straightforward, but there are some situations when it isn't as obvious as normalizing a single token per class.

What if we take some classes that contain multiple tokens with some overlap?

The following example shows how the calculations are done in the case of three classes: 
```python 
["Space party", "Space exploration", "Space exploration party"]
```
The overlap in the tokens makes the calculation a bit less straightforward. 

The following graph shows how the tokens relate to each other and their possible probabilities:

![Alt text](classify_tokens.png)

Let's go through the graph:

1. Starting at the top of the tree, when there's only one token to choose from, the normalized score for that token will always be 1. This means that when there's no ambiguity, the score is definitive.

2. Moving down the tree, the first pivotal decision arises, where one must choose between "exploration" and "party." This choice sets the direction for subsequent evaluations.

3. If "exploration" is chosen, we encounter a final decision point, where we select between the "endoftext" token and "party."

    * It's important to note that the "endoftext" token is an internal component used by large language models for their calculations. Typically, end users aren't exposed to this token.

    * In our context, choosing "endoftext" yields "Space exploration", while choosing "party" yields "Space exploration party".

If we now take the product of the normalized probabilities along the path to each end class, the results are:

Space party: 0.0

Space exploration: 0.4

Space exploration party: 0.6
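
The walk through the tree can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `path_scores` is a hypothetical helper, the labels are split into tokens by hand, and the raw token probabilities are made up so that the output reproduces the results above. At every branching point the raw probabilities of the competing tokens are normalized, and a label's score is the product of the normalized values along its path.

```python
from collections import defaultdict

END = "<endoftext>"  # stand-in name for the end-of-text token


def path_scores(
    labels: dict[str, tuple[str, ...]],
    token_probs: dict[tuple[tuple[str, ...], str], float],
) -> dict[str, float]:
    """Normalize among sibling tokens at each prefix, then multiply along each path."""
    # For every prefix, collect the tokens that can follow it in any label.
    children: dict[tuple[str, ...], set[str]] = defaultdict(set)
    for tokens in labels.values():
        path = tokens + (END,)
        for i, token in enumerate(path):
            children[path[:i]].add(token)

    scores = {}
    for label, tokens in labels.items():
        path = tokens + (END,)
        score = 1.0
        for i, token in enumerate(path):
            prefix = path[:i]
            total = sum(token_probs[(prefix, t)] for t in children[prefix])
            # Guard against division by zero when all siblings have probability 0.
            score *= token_probs[(prefix, token)] / total if total > 0 else 0.0
        scores[label] = score
    return scores


labels = {
    "Space party": ("Space", "party"),
    "Space exploration": ("Space", "exploration"),
    "Space exploration party": ("Space", "exploration", "party"),
}
# Made-up raw probabilities, keyed by (tokens so far, next token).
token_probs = {
    ((), "Space"): 0.5,
    (("Space",), "party"): 0.0,
    (("Space",), "exploration"): 0.2,
    (("Space", "party"), END): 0.9,
    (("Space", "exploration"), END): 0.2,
    (("Space", "exploration"), "party"): 0.3,
    (("Space", "exploration", "party"), END): 0.7,
}
scores = path_scores(labels, token_probs)
for label, score in scores.items():
    print(f"{label}: {round(score, 4)}")
```

A token with no competing sibling always normalizes to 1, which is why the single "Space" token at the top of the tree does not affect any score.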