# Quickstart

## Installation

```bash
pip install fastrepl
```

You can find all releases [here](https://pypi.org/project/fastrepl).

## Setup FastREPL

```python
# When using it in a script
import fastrepl

# When using it in a notebook
import fastrepl.repl as fastrepl

fastrepl.LLMCache.disable()
```

## Prepare Dataset

We will use [daily_dialog](https://huggingface.co/datasets/daily_dialog) from Hugging Face.

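```python
import re
from datasets import load_dataset

# Take a fixed, reproducible sample of 300 test dialogs
ds = load_dataset("daily_dialog", split="test")
ds = ds.shuffle(seed=12)
ds = ds.select(range(300))


def clean(text):
    # Remove whitespace that precedes punctuation, e.g. "Hi , there !" -> "Hi, there!"
    return re.sub(r"\s+([,.'!?])", r"\1", text.strip())


def get_input(row):
    # Join the cleaned utterances of one dialog into a single string
    msgs = [clean(msg) for msg in row["dialog"]]
    row["input"] = "\n".join(msgs)
    return row


ds = ds.map(get_input, remove_columns=["dialog", "act", "emotion"])
ds
```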

```
Dataset({
    features: ['input'],
    num_rows: 300
})
```
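To see what `clean` does before it is applied to the whole dataset, here is a small standalone check. The sample strings are made up for illustration and are not taken from `daily_dialog`; they only imitate the space-before-punctuation pattern that the regex targets.

```python
import re


def clean(text):
    # Same helper as above: drop whitespace that precedes , . ' ! or ?
    return re.sub(r"\s+([,.'!?])", r"\1", text.strip())


# Hypothetical sample utterances, not taken from the dataset.
samples = ["  Hello , how are you ?  ", "Great ! See you soon ."]
cleaned = [clean(s) for s in samples]
print("\n".join(cleaned))
```

Note that only whitespace *before* the listed punctuation marks is removed; spacing after them is left untouched.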

## Define Evaluator

Here, we are doing simple classification, but there are two interesting points.

1. You can pass nearly any model for evaluation, thanks to [LiteLLM](https://github.com/BerriAI/litellm).
2. **`fastrepl` enhances accuracy by reducing [bias](/guides/dealing_with_bias.md)**. `position_debias_strategy` is one example, which ensures that the order of labels doesn't affect the outcome.

```python
evaluator = fastrepl.Evaluator(
    pipeline=[
        fastrepl.LLMClassificationHead(
            model="gpt-3.5-turbo",
            context="You will receive casual conversation between two people.",
            labels={
                "FUN": "at least one of the two people try to be funny and entertain.",
                "NOT_FUN": "given conversation lacks humor or entertainment value.",
            },
            position_debias_strategy="consensus",
        ),
    ]
)
```
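One plausible way to picture a consensus-style position debiasing step is sketched below: classify the same input under two label orderings and keep the answer only when both runs agree. This is an illustration of the idea, not `fastrepl`'s implementation; `consensus_classify`, `first_label_biased`, and `keyword_based` are hypothetical names, and the two toy classifiers stand in for real LLM calls.

```python
from typing import Callable, Optional, Sequence


def consensus_classify(
    classify: Callable[[str, Sequence[str]], str],
    text: str,
    labels: Sequence[str],
) -> Optional[str]:
    """Return a label only if it is stable under reversing the label order."""
    forward = classify(text, list(labels))
    backward = classify(text, list(reversed(labels)))
    return forward if forward == backward else None


# Hypothetical stand-ins for an LLM call.
def first_label_biased(text, labels):
    # Always picks whichever label is listed first (pure position bias).
    return labels[0]


def keyword_based(text, labels):
    # Ignores label order entirely.
    return "FUN" if "haha" in text.lower() else "NOT_FUN"


labels = ["FUN", "NOT_FUN"]
biased_result = consensus_classify(first_label_biased, "Haha, nice one!", labels)
stable_result = consensus_classify(keyword_based, "Haha, nice one!", labels)
print(biased_result)  # None
print(stable_result)  # FUN
```

The trade-off in this sketch is that every input costs two classification calls, in exchange for discarding answers that depend on label position.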

## Run Evaluator

Here are some notes about running the evaluator:

1. `ThreadPool` is used to speed up the run; the number of threads is controlled by the `NUM_THREADS` environment variable.
2. Any errors from different LLM providers are properly handled and retried with backoff if necessary.
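The two notes above can be pictured with a generic sketch that combines a thread pool with retry-and-backoff. It uses only the Python standard library and does not reproduce `fastrepl`'s internals: `with_retry` and `flaky_upper` are hypothetical helpers, the exponential delay schedule is an assumption, and the default of 4 threads is chosen only for this example.

```python
import os
import time
from concurrent.futures import ThreadPoolExecutor


def with_retry(fn, attempts=3, base_delay=0.01):
    """Wrap fn so that failures are retried with exponentially growing delays."""
    def wrapped(item):
        for attempt in range(attempts):
            try:
                return fn(item)
            except RuntimeError:
                if attempt == attempts - 1:
                    raise  # out of attempts: surface the error
                time.sleep(base_delay * 2 ** attempt)
    return wrapped


# Hypothetical flaky worker: fails the first time it sees each item.
seen = set()


def flaky_upper(item):
    if item not in seen:
        seen.add(item)
        raise RuntimeError("simulated transient provider error")
    return item.upper()


# Assumption: the pool size is read from NUM_THREADS, defaulting to 4 here.
num_threads = int(os.environ.get("NUM_THREADS", "4"))
with ThreadPoolExecutor(max_workers=num_threads) as pool:
    results = list(pool.map(with_retry(flaky_upper), ["a", "b", "c"]))

print(results)
```

`pool.map` returns results in input order, so the output lines up with the dataset rows even though the calls complete at different times.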

```python
result = fastrepl.LocalRunner(evaluator=evaluator, dataset=ds).run()
result
```

```
2023-09-10 12:11:09,697 - 11377176576 - _common.py-_common:105 - INFO: Backing off completion(...) for 2.5s (fastrepl.llm.RetryConstantException)

Dataset({
    features: ['input', 'prediction'],
    num_rows: 300
})
```

One interesting point to note is that, because of `position_debias_strategy="consensus"`, `fastrepl` returns `None` whenever the order of the labels affects the result. It will return a more meaningful value in the future.

```python
result["prediction"].count(None)
```

```
91
```

Now we can compute the share of `FUN` conversations among those that received a definite label.

```python
f = result["prediction"].count("FUN")
nf = result["prediction"].count("NOT_FUN")

f / (f + nf)
```

```
0.4449760765550239
```
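Because rows with a `None` prediction are excluded from that ratio, it can help to report coverage alongside it. The sketch below rebuilds a prediction list from counts that are inferred rather than reported: with 300 rows and 91 `None` values, 209 rows are labeled, and a ratio of 0.4449760765550239 corresponds to 93 `FUN` out of 209 (hence 116 `NOT_FUN`). Treat that split as an assumption derived from the outputs shown above.

```python
from collections import Counter

# Counts inferred from the outputs above: 300 rows, 91 None,
# and FUN / (FUN + NOT_FUN) = 93 / 209. The 93/116 split is an inference.
predictions = ["FUN"] * 93 + ["NOT_FUN"] * 116 + [None] * 91

counts = Counter(predictions)
decided = counts["FUN"] + counts["NOT_FUN"]

fun_among_decided = counts["FUN"] / decided        # the ratio computed above
coverage = decided / len(predictions)              # share of rows with a label
fun_among_all = counts["FUN"] / len(predictions)   # treats None as "not FUN"

print(f"{fun_among_decided:.3f} {coverage:.3f} {fun_among_all:.3f}")
# prints: 0.445 0.697 0.310
```

Under these inferred counts, roughly 30% of the rows received no label, so the 0.445 figure describes only the remaining 70%.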