# Online learning with Transformers 🤝 BentoML

In this Jupyter notebook, we perform online learning with a fine-tuned model produced by [our fine-tuning guide](./fine_tune_roberta.ipynb).

## Install requirements

In [None]:
!pip install -r requirements.txt

In [None]:
# sanity check
import bentoml

import transformers

TASKS = "text-classification"
FT_NAME = "drobert_ft"
PRETRAINED = "emotion_distilroberta_base"

## Import the fine-tuned model from the BentoML modelstore

There are two ways to do this.

### 1. Import from HuggingFace Hub

Users who have already run our fine-tuning notebook can skip to [Afterwards](#Afterwards); otherwise, run the two cells below.

Users can also import the fine-tuned model produced by [this notebook](./fine_tune_roberta.ipynb) from [HuggingFace Hub](https://huggingface.co/aarnphm/finetune_emotion_distilroberta), and then save it to the BentoML modelstore. The second cell does the same for the pretrained model it is compared against:

In [None]:
FINETUNE_MODEL = "aarnphm/finetune_emotion_distilroberta"
m1 = transformers.AutoModelForSequenceClassification.from_pretrained(FINETUNE_MODEL)
t1 = transformers.AutoTokenizer.from_pretrained(FINETUNE_MODEL)
_ = bentoml.transformers.save(FT_NAME, m1, tokenizer=t1)

In [None]:
PRETRAINED_MODEL = "j-hartmann/emotion-english-distilroberta-base"
m2 = transformers.AutoModelForSequenceClassification.from_pretrained(PRETRAINED_MODEL)
t2 = transformers.AutoTokenizer.from_pretrained(PRETRAINED_MODEL)
_ = bentoml.transformers.save(PRETRAINED, m2, tokenizer=t2)

### 2. Running [`fine_tune_roberta.ipynb`](./fine_tune_roberta.ipynb)
Refer to [`fine_tune_roberta.ipynb`](./fine_tune_roberta.ipynb) to see how to fine-tune this model with Transformers.

### Afterwards
Load the model for testing with `bentoml.transformers.load`:

In [1]:
config, model, tokenizer = bentoml.transformers.load(f"{FT_NAME}:latest", return_config=True)  # type: ignore

### Offline serving
One can load the above `model`, `tokenizer`, and `config` into a `text-classification` pipeline to test offline serving:

In [2]:
clf_pipeline = transformers.pipeline(TASKS, model=model, tokenizer=tokenizer, config=config, return_all_scores=True)  # type: ignore
clf_pipeline("I love you so much.")



[[{'label': 'sadness', 'score': 0.059696026146411896},
  {'label': 'joy', 'score': 0.08176055550575256},
  {'label': 'love', 'score': 0.8277080059051514},
  {'label': 'anger', 'score': 0.017906058579683304},
  {'label': 'fear', 'score': 0.007731563411653042},
  {'label': 'surprise', 'score': 0.00519789382815361}]]
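
With `return_all_scores=True`, the pipeline returns one list of label/score dictionaries per input sentence. A minimal sketch of reducing that structure to the single best label, using the scores printed above; the helper name `top_label` is hypothetical and not part of Transformers or BentoML:

```python
import typing as t

# Scores copied from the pipeline output above (one inner list per input sentence).
results = [[
    {"label": "sadness", "score": 0.059696026146411896},
    {"label": "joy", "score": 0.08176055550575256},
    {"label": "love", "score": 0.8277080059051514},
    {"label": "anger", "score": 0.017906058579683304},
    {"label": "fear", "score": 0.007731563411653042},
    {"label": "surprise", "score": 0.00519789382815361},
]]


def top_label(scores: t.List[t.Dict[str, t.Any]]) -> t.Dict[str, t.Any]:
    """Return the entry with the highest score (hypothetical helper)."""
    return max(scores, key=lambda entry: entry["score"])


best = top_label(results[0])
print(best["label"], round(best["score"], 4))  # love 0.8277
```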

One can also verify this model in a runner for offline serving:

In [7]:
runner = bentoml.transformers.load_runner(
    f"{FT_NAME}:latest", tasks=TASKS, return_all_scores=False
)

runner.run_batch(["Hello World", "I love you", "I hate you"])

[{'label': 'love', 'score': 0.26683616638183594},
 {'label': 'love', 'score': 0.8373624086380005},
 {'label': 'anger', 'score': 0.48438775539398193}]

<b>NOTE:</b> `run_batch` should only be used for offline serving.

In the context of a BentoML Service, `run_batch` or `async_run_batch` shouldn't
be used, as BentoML's dynamic batching is <b>NOT ENABLED</b>.

If users want to handle multiple inputs in a single request, BentoML supports _composing an inference graph_, which is demonstrated below.
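
The composition pattern mentioned here amounts to awaiting several runner calls concurrently. A minimal, self-contained sketch of that fan-out with `asyncio.gather` follows; `fake_runner` is a hypothetical stand-in for a real runner's `async_run`, and it only echoes its input, so no model is involved:

```python
import asyncio
import typing as t


async def fake_runner(name: str, text: str) -> t.Dict[str, str]:
    """Hypothetical stand-in for `runner.async_run`; it just echoes its input."""
    await asyncio.sleep(0.01)  # simulate inference latency
    return {"runner": name, "input": text}


async def fan_out(text: str) -> t.List[t.Dict[str, str]]:
    # Both calls are scheduled together; gather returns results in argument order.
    return await asyncio.gather(
        fake_runner("drobert_ft", text),
        fake_runner("emotion_distilroberta_base", text),
    )


results = asyncio.run(fan_out("I love you"))
print([r["runner"] for r in results])
# ['drobert_ft', 'emotion_distilroberta_base']
```

Inside a notebook cell an event loop is already running, so `results = await fan_out("I love you")` would be used there instead of `asyncio.run`.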

## Create a BentoML service
<b>NOTE:</b> `%%writefile` is used here because the `bentoml.Service` instance must be created in a separate `.py` file.

In [15]:
%%writefile service.py
import re
import typing as t
import asyncio
import unicodedata
from pydantic import BaseModel

import bentoml
from bentoml.io import JSON
from bentoml.io import Text

FT_MODEL_TAG = "drobert_ft"
PRETRAINED_MODEL_TAG = "emotion_distilroberta_base"
TASKS = "text-classification"

ft_runner = bentoml.transformers.load_runner(FT_MODEL_TAG, tasks=TASKS, return_all_scores=True)

pretrained_runner = bentoml.transformers.load_runner(PRETRAINED_MODEL_TAG, tasks=TASKS, return_all_scores=True)

svc = bentoml.Service(name="online_learning_ft", runners=[ft_runner, pretrained_runner])

class Prediction(BaseModel):
    input: str
    sadness: float
    joy: float
    love: float
    anger: float
    fear: float
    surprise: float

class Outputs(BaseModel):
    drobert_ft: Prediction
    emotion_distilroberta_base: Prediction

def normalize(s: str) -> str:
    s = "".join(
        c
        for c in unicodedata.normalize("NFD", s.lower().strip())
        if unicodedata.category(c) != "Mn"
    )
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s


def convert_result(res) -> t.Dict[str, t.Any]:
    if isinstance(res, list):
        return {l["label"]: l["score"] for l in res}
    return {res["label"]: res["score"]}


@svc.api(input=Text(), output=JSON(pydantic_model=Outputs))
async def compare(sentences: str) -> t.Dict[str, t.Dict[str, t.Union[str, float]]]:
    processed = normalize(sentences)
    outputs = await asyncio.gather(
        ft_runner.async_run(processed),
        pretrained_runner.async_run(processed)
    )
    # include the normalized input so each entry matches the Prediction schema
    return {
        name: {"input": processed, **convert_result(pred)}
        for name, pred in zip(svc.runners.keys(), outputs)
    }

@svc.api(input=Text(), output=Text())
async def online_learning(sentence: str) -> str:
    ...  # placeholder: this endpoint is not implemented yet

Overwriting service.py



We defined two separate endpoints, `/compare` and `/online_learning`:
1. `/compare` shows the results of our fine-tuned model vs. the pretrained model.
2. `/online_learning` takes a `sentence` as input and performs [online learning](https://en.wikipedia.org/wiki/Online_machine_learning).

NOTE: `/online_learning` is currently a work in progress. Stay tuned!
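
To make the pre- and post-processing around `/compare` concrete, the sketch below copies `normalize` and `convert_result` from `service.py` and runs them on sample values. The sample sentence and the two scores are made up for illustration; no model is involved.

```python
import re
import typing as t
import unicodedata


def normalize(s: str) -> str:
    # Lowercase, trim, and drop combining accents (Unicode category "Mn").
    s = "".join(
        c
        for c in unicodedata.normalize("NFD", s.lower().strip())
        if unicodedata.category(c) != "Mn"
    )
    s = re.sub(r"([.!?])", r" \1", s)  # put a space before sentence punctuation
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)  # collapse everything else to one space
    return s


def convert_result(res) -> t.Dict[str, t.Any]:
    # Flatten pipeline-style output into a {label: score} mapping.
    if isinstance(res, list):
        return {l["label"]: l["score"] for l in res}
    return {res["label"]: res["score"]}


print(normalize("  Café, déjà vu?  "))  # cafe deja vu ?

# Made-up scores, shaped like one runner result with all scores returned.
sample = [{"label": "joy", "score": 0.25}, {"label": "love", "score": 0.75}]
print(convert_result(sample))  # {'joy': 0.25, 'love': 0.75}
```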


Start a service with reload enabled:

In [None]:
!bentoml serve service:svc --reload

With the `--reload` flag, the API server automatically restarts whenever the source file `service.py` is updated.

One can then navigate to `127.0.0.1:3000` and interact with Swagger UI.
One can also verify the endpoints locally with `curl`:

In [None]:
!curl -X POST "http://localhost:3000/compare" \
     -H "accept: application/json" \
     -H "Content-Type: text/plain" \
     -d "\" I love you\""
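
The same request can be issued from Python. A minimal sketch using only the standard library, assuming the service is listening on `localhost:3000` as in the `curl` call; `build_compare_request` and `compare` are hypothetical helper names:

```python
import json
import urllib.request


def build_compare_request(
    text: str, host: str = "http://localhost:3000"
) -> urllib.request.Request:
    """Build a POST request for /compare with a plain-text body (hypothetical helper)."""
    return urllib.request.Request(
        url=f"{host}/compare",
        data=text.encode("utf-8"),
        headers={"accept": "application/json", "Content-Type": "text/plain"},
        method="POST",
    )


def compare(text: str) -> dict:
    """Send the request; the server started with `bentoml serve` must be running."""
    with urllib.request.urlopen(build_compare_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_compare_request("I love you")
print(req.get_method(), req.full_url)  # POST http://localhost:3000/compare
# With the server running: print(compare("I love you"))
```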

We can also run a simple local benchmark with [locust](https://locust.io/); the command below expects a `locustfile.py` in the current directory:
```bash
locust --headless -u 100 -r 1000 --run-time 2m --host http://127.0.0.1:3000
```

## Build a Bento for deployment

Create a `bentofile.yaml` in the current directory so that `bentoml build` can build a Bento:
```yaml
service: "service:svc"
description: "file: ./README.md"
labels:
  owner: bentoml-team
  stage: demo
include:
- "*.py"
exclude:
- "locustfile.py"
- "tests/"
- "*.ipynb"
python:
  lock_packages: false
  packages:
    - -f https://download.pytorch.org/whl/cpu/torch_stable.html
    - torch==1.10.2+cpu
    - git+https://github.com/huggingface/transformers
    - datasets
    - pydantic
```

Build the Bento with `bentoml build`:

In [None]:
!bentoml build

The Bento can now be served with `--production`:
```bash
bentoml serve online_learning_ft:latest --production
```

## Containerize a Bento

Make sure Docker is installed and its daemon is running; `bentoml containerize` will then build
a Docker image for the model server above:

In [None]:
!bentoml containerize online_learning_ft:latest
# an example docker tag: online_learning_ft:zt4vvsurw63thgxi

Test the newly built Docker image, substituting the tag printed by `bentoml containerize`:
```bash
docker run -p 3000:3000 online_learning_ft:zt4vvsurw63thgxi
```