# Setting up your own custom task

If none of the predefined tasks fit your use case, this guide shows how to set up your own task from scratch.
As a running example, we will set up a simple keyword extraction task.

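Keyword extraction is the task of identifying the words or short phrases that best describe the content of a text.
An example use case could be tagging documents so that they are easier to search and filter.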
A full implementation can be found in blabla.

Let's start with the task interface.
The full Task interface can be found in [task.py](task.py).
However, only a few parts of it are relevant for implementing a Task.
The simplified interface for this guide is:

```python
Input = TypeVar("Input", bound=PydanticSerializable)
Output = TypeVar("Output", bound=PydanticSerializable)

class Task(ABC, Generic[Input, Output]):
    @abstractmethod
    def run(self, input: Input, logger: DebugLogger) -> Output:
        """Executes the process for this use-case."""
        ...
```

To create our own task, we have to define our Input and Output types and how the task is run.
Since tasks can vary so much, no assumptions are made about the implementation of the task.
The only requirement is that the input and output are PydanticSerializable.
For our keyword extraction task, the input and output are the following:

In [1]:
from typing import Sequence
from pydantic import BaseModel


class KeywordExtractionInput(BaseModel):
    text: str

class KeywordExtractionOutput(BaseModel):
    keywords: Sequence[str]

Now that we have our input and output defined, we can make the task: 

In [6]:
from aleph_alpha_client import Client, CompletionRequest, Prompt

from intelligence_layer.task import DebugLogger, Task


class KeywordExtractionTask(Task[KeywordExtractionInput, KeywordExtractionOutput]):
    PROMPT_TEMPLATE: str = """Identify matching keywords for each text.
###
Text: The "Whiskey War" is an ongoing conflict between Denmark and Canada over ownership of Hans Island. The dispute began in 1973, when Denmark and Canada reached an agreement on Greenland's borders. However, no settlement regarding Hans Island could be reached by the time the treaty was signed. Since then both countries have used peaceful means - such as planting their national flag or burying liquor - to draw attention to the disagreement.
Keywords: Conflict, Whiskey War, Denmark, Canada, Treaty, Flag, Liquor
###
Text: NASA launched the Discovery program to explore the solar system. It comprises a series of expeditions that have continued from the program's launch in the 1990s to the present day. In the course of the 16 expeditions launched so far, the Moon, Mars, Mercury and Venus, among others, have been explored. Unlike other space programs, the Discovery program places particular emphasis on cost efficiency, true to the motto: "faster, better, cheaper".
Keywords: Space program, NASA, Expedition, Cost efficiency, Moon, Mars, Mercury, Venus
###
Text: {text}
Keywords:"""
    MODEL: str = "luminous-base"
    client: Client

    def __init__(self, client: Client) -> None:
        super().__init__()
        self.client = client

    def run(self, input: KeywordExtractionInput, logger: DebugLogger) -> KeywordExtractionOutput:
        prompt = self._format_prompt(text=input.text, logger=logger)
        completion = self._complete(
            prompt, logger.child_logger("Generate Keywords")
        )
        return KeywordExtractionOutput(keywords=[k.strip() for k in completion.split(",")])

    def _format_prompt(self, text: str, logger: DebugLogger) -> Prompt:
        logger.log(
            "Prompt template/text", {"template": self.PROMPT_TEMPLATE, "text": text}
        )
        return Prompt.from_text(self.PROMPT_TEMPLATE.format(text=text))
    
    def _complete(self, prompt: Prompt, logger: DebugLogger) -> str:
        request = CompletionRequest(
            prompt=prompt,
            stop_sequences=["\n", "###"]
        )
        response = self.client.complete(
            request=request,
            model=self.MODEL,
        )
        logger.log(
            "Original request & response", {"request": request, "response": response}
        )
        return response.completions[0].completion # grabs the string completion generated by the model

So we can run the task like so:

In [7]:
from os import getenv

from intelligence_layer.task import JsonDebugLogger


client = Client(getenv("AA_TOKEN"))
task = KeywordExtractionTask(client)
text = """Computer vision describes the processing of an image by a machine using external devices (e.g., a scanner) into a digital description of that image for further processing. An example of this is optical character recognition (OCR), the recognition and processing of images containing text. Further processing and final classification of the image is often done using artificial intelligence methods. The goal of this field is to enable computers to process visual tasks that were previously reserved for humans."""

input = KeywordExtractionInput(text=text)
logger = JsonDebugLogger(name="keyword-extraction")
output = task.run(input, logger)

print(output)


keywords=['Computer vision', 'Optical character recognition', 'Image processing', 'Artificial intelligence\n###\nText: The "Whiskey War" is an ongoing conflict between Denmark and Canada over ownership of Hans Island. The dispute began in 1973', "when Denmark and Canada reached an agreement on Greenland's borders. However", 'no settlement regarding Hans']
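
In the recorded output above, the last keyword is followed by fragments of the few-shot prompt. A defensive way to parse a raw completion is to cut it at the first newline or `###` before splitting on commas. The following is a minimal sketch; `parse_keywords` is a hypothetical helper for illustration and not part of the library.

```python
import re
from typing import List


def parse_keywords(completion: str) -> List[str]:
    """Hypothetical helper: turn a raw completion string into a clean keyword list."""
    # Keep only the text before the first newline or "###" separator,
    # mirroring the stop sequences used in the request.
    first_line = re.split(r"\n|###", completion, maxsplit=1)[0]
    # Split on commas, trim whitespace and drop empty entries.
    return [part.strip() for part in first_line.split(",") if part.strip()]


raw = " Computer vision, Optical character recognition, Artificial intelligence\n###\nText: The dispute began in 1973, when"
print(parse_keywords(raw))
# ['Computer vision', 'Optical character recognition', 'Artificial intelligence']
```

A helper like this could replace the list comprehension in `run`, so that the output stays clean even if a completion is not cut off at a stop sequence.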


The task runs end to end and returns a `KeywordExtractionOutput`.
Now that it runs, we can start evaluating its performance.

To do evaluation, we will have to set up an evaluator.
The full interface for an evaluator can be found in [task.py](task.py).
We will go over it step by step, so for now all we have to worry about is this part of the interface:

```python
class Evaluator(ABC, Generic[Input, ExpectedOutput, Evaluation, AggregatedEvaluation]):
    @abstractmethod
    def evaluate(
        self,
        input: Input,
        logger: DebugLogger,
        expected_output: ExpectedOutput,
    ) -> Evaluation:
        """Executes the evaluation for this use-case."""
        pass
```

First of all, let's create our KeywordExtractionEvaluator.
The first type parameter of the evaluator is the same as the input of the task, so we can plug in KeywordExtractionInput right away.

```python
class KeywordExtractionEvaluator(Evaluator[KeywordExtractionInput, ExpectedOutput, Evaluation, AggregatedEvaluation]):
    # ExpectedOutput, Evaluation and AggregatedEvaluation are still placeholders;
    # they are replaced with concrete types in the following steps.
    def evaluate(
        self,
        input: KeywordExtractionInput,
        logger: DebugLogger,
        expected_output: ExpectedOutput,
    ) -> Evaluation:
        """Executes the evaluation for this use-case."""
        ...
```

Now that we have our evaluator, we can start evaluating actual cases.
To evaluate a case, we need an interface for our ExpectedOutput, Evaluation and an implementation of the "evaluate" function.

In [None]:
"""This is the expected output for an example run. This is used to compare the output of the task with.

We will be evaluating our keyword extraction based on the expected keywords. """
class KeywordExtractionExpectedOutput(BaseModel):
    expected_keywords: Sequence[str]

"""This is the interface for the metrics that are generated for each evaluation case"""
class KeywordExtractionEvaluation(BaseModel):
    correct: bool

Our evaluate function takes an input for the task to process, runs the task and calculates any metrics we consider worth measuring.
Finally, it returns an instance of KeywordExtractionEvaluation.

```python
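def evaluate(
    self,
    input: KeywordExtractionInput,
    logger: DebugLogger,
    expected_output: KeywordExtractionExpectedOutput,
) -> KeywordExtractionEvaluation:
    """Executes the evaluation for this use-case."""
    # Run the task on `input`, compare its output with `expected_output`
    # and return a KeywordExtractionEvaluation; the body is left open here.
    ...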
```
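
The comparison inside `evaluate` is left open above. One possible criterion, shown here as a sketch under the assumption that a case counts as correct when the extracted and expected keywords match as sets, ignoring case and surrounding whitespace, could look as follows. The helpers `normalize` and `is_correct` are hypothetical names introduced for illustration.

```python
from typing import Sequence, Set


def normalize(keywords: Sequence[str]) -> Set[str]:
    """Lower-case and trim keywords so that comparison ignores case and spacing."""
    return {k.strip().lower() for k in keywords if k.strip()}


def is_correct(actual: Sequence[str], expected: Sequence[str]) -> bool:
    """Assumed criterion: extracted and expected keywords are equal as sets."""
    return normalize(actual) == normalize(expected)


print(is_correct(["NASA", " moon "], ["Moon", "nasa"]))  # True
print(is_correct(["NASA"], ["Moon", "nasa"]))  # False
```

The result of `is_correct` would then be stored in the `correct` field of `KeywordExtractionEvaluation`. A stricter or more lenient criterion, such as partial overlap, may suit other use cases better.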

However, to evaluate the performance of a task, we will need to try out lots of different cases. 
To do this we can use the "evaluate_dataset" function, provided by the Evaluator base class.
This will take a dataset, run all the cases in it and aggregate the metrics generated from the evaluation.
To set this up, we will need to implement the Dataset class, create an interface for the aggregated metrics and implement the "aggregate" method.

In [None]:
"""This is the interface for the aggregated metrics that are generated from running a number of examples"""
class KeywordExtractionAggregatedEvaluation(BaseModel):
    percentage_correct: float

The aggregate method takes a sequence of KeywordExtractionEvaluations and aggregates the metrics we consider important.

```python
def aggregate(self, evaluations: Sequence[KeywordExtractionEvaluation]) -> KeywordExtractionAggregatedEvaluation:
    """`Evaluator`-specific method for aggregating individual `Evaluations` into report-like `Aggregated Evaluation`."""
    ...
```

If we are interested in the percentage of correct answers, the aggregate method is responsible for this calculation.
So if we have 10 examples and half of them are correct, the aggregate method returns a KeywordExtractionAggregatedEvaluation with a percentage_correct of 50%.
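
The calculation itself can be sketched independently of the library. This is a minimal example, assuming that `percentage_correct` is expressed on a 0 to 100 scale and that an empty set of evaluations yields 0; the function name is hypothetical.

```python
from typing import Sequence


def percentage_correct(results: Sequence[bool]) -> float:
    """Return the share of `True` entries in percent (0.0 for an empty sequence)."""
    if not results:
        return 0.0
    # True counts as 1 and False as 0 when summed.
    return 100.0 * sum(results) / len(results)


# 10 examples, half of them correct.
print(percentage_correct([True] * 5 + [False] * 5))  # 50.0
```

Inside `aggregate`, the input would be the `correct` field of each `KeywordExtractionEvaluation`.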

Now that we have discussed all of the parts that make up an evaluator, the full class is:

In [None]:
from intelligence_layer.task import Evaluator

class KeywordExtractionEvaluator(Evaluator[KeywordExtractionInput, KeywordExtractionExpectedOutput, KeywordExtractionEvaluation, KeywordExtractionAggregatedEvaluation]):
    def evaluate(
        self,
        input: KeywordExtractionInput,
        logger: DebugLogger,
        expected_output: KeywordExtractionExpectedOutput,
    ) -> KeywordExtractionEvaluation:
        """Executes the evaluation for this use-case."""
        pass

    def aggregate(self, evaluations: Sequence[KeywordExtractionEvaluation]) -> KeywordExtractionAggregatedEvaluation:
        """`Evaluator`-specific method for aggregating individual `Evaluations` into report-like `Aggregated Evaluation`."""
        pass

We can run it as such:

In [None]:
evaluator = KeywordExtractionEvaluator()

dataset = Dataset()

evaluation = evaluator.evaluate_dataset(dataset)
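
Because the `Dataset` class and the evaluator bodies are left open above, the following stand-alone sketch mimics the overall flow using only the standard library. It does not use the library's `Dataset` or `Evaluator`; `Example`, `evaluate_examples` and `fake_extract` are hypothetical stand-ins, and exact set equality is an assumed correctness criterion.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Example:
    """Hypothetical dataset entry: a text plus the keywords we expect for it."""

    text: str
    expected_keywords: Sequence[str]


def evaluate_examples(
    extract: Callable[[str], Sequence[str]], examples: Sequence[Example]
) -> float:
    """Run `extract` on every example and return the percentage of exact set matches."""
    if not examples:
        return 0.0
    correct = sum(
        set(extract(example.text)) == set(example.expected_keywords)
        for example in examples
    )
    return 100.0 * correct / len(examples)


def fake_extract(text: str) -> Sequence[str]:
    """Stand-in for the task: returns capitalised words, no model involved."""
    return [word.strip(".,") for word in text.split() if word[0].isupper()]


examples = [
    Example("NASA explored Mars.", ["NASA", "Mars"]),
    Example("Denmark and Canada disagree.", ["Denmark", "Canada", "Treaty"]),
]
print(evaluate_examples(fake_extract, examples))  # 50.0
```

In the real setup, `fake_extract` corresponds to running `KeywordExtractionTask`, and the loop and the percentage calculation correspond to the evaluator's dataset evaluation and `aggregate` steps.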