# Transfer learning with Transformers 🤝 BentoML

In this Jupyter notebook, we perform transfer learning with the fine-tuned model produced in [our fine-tune guide](./fine_tune_roberta.ipynb).

## Install requirements

In [None]:
!pip install -r requirements.txt

## Import the fine-tuned model from the BentoML model store

In [1]:
import bentoml
import transformers

TASKS = "text-classification"
FT_TAG = "drobert_ft:latest"

config, model, tokenizer = bentoml.transformers.load(FT_TAG, return_config=True)  # type: ignore

One can load the above `model`, `tokenizer`, and `config` into a `text-classification` pipeline to test the model offline:

In [2]:
clf_pipeline = transformers.pipeline(TASKS, model=model, tokenizer=tokenizer, config=config, return_all_scores=True)  # type: ignore
clf_pipeline("I love you so much.")



[[{'label': 'sadness', 'score': 0.059696026146411896},
  {'label': 'joy', 'score': 0.08176055550575256},
  {'label': 'love', 'score': 0.8277080059051514},
  {'label': 'anger', 'score': 0.017906058579683304},
  {'label': 'fear', 'score': 0.007731563411653042},
  {'label': 'surprise', 'score': 0.00519789382815361}]]
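
With `return_all_scores=True`, the pipeline returns one list of label/score dicts per input. As a minimal sketch, the top prediction can be picked out with plain Python. The scores below are copied from the output above, and `top_label` is a hypothetical helper, not part of Transformers or BentoML:

```python
import typing as t

# Scores copied from the pipeline output above (one inner list per input sentence).
all_scores: t.List[t.List[t.Dict[str, t.Any]]] = [[
    {"label": "sadness", "score": 0.059696026146411896},
    {"label": "joy", "score": 0.08176055550575256},
    {"label": "love", "score": 0.8277080059051514},
    {"label": "anger", "score": 0.017906058579683304},
    {"label": "fear", "score": 0.007731563411653042},
    {"label": "surprise", "score": 0.00519789382815361},
]]


def top_label(scores: t.List[t.Dict[str, t.Any]]) -> t.Tuple[str, float]:
    """Hypothetical helper: return the (label, score) pair with the highest score."""
    best = max(scores, key=lambda item: item["score"])
    return best["label"], best["score"]


label, score = top_label(all_scores[0])
print(label, round(score, 3))  # love 0.828
```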

One can also verify this model in a runner for offline serving:

In [None]:
runner = bentoml.transformers.load_runner(FT_TAG, tasks=TASKS, return_all_scores=True)

runner.run_batch(["Hello World", "I love you", "I hate you"])

<b>NOTE:</b> `run_batch` should only be used for offline serving.

Inside a BentoML Service, avoid `run_batch` and `async_run_batch`, because
BentoML's dynamic batching is <b>NOT ENABLED</b> for those calls.

To handle multiple inputs in a single request, BentoML supports _composing an inference graph_, which is demonstrated below.
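
The fan-out pattern behind such an inference graph can be sketched without BentoML at all: issue one `async_run` call per input and collect the results with `asyncio.gather`. The `FakeRunner` class below is a hypothetical stand-in with made-up labels, used only so the sketch runs on its own; it is not a BentoML API.

```python
import asyncio
import typing as t


class FakeRunner:
    """Hypothetical stand-in for a runner; NOT part of BentoML."""

    async def async_run(self, sentence: str) -> t.Dict[str, t.Any]:
        # Simulate the latency of a single inference call.
        await asyncio.sleep(0.01)
        label = "love" if "love" in sentence.lower() else "anger"
        return {"label": label, "score": 1.0}


async def classify_all(
    runner: FakeRunner, sentences: t.List[str]
) -> t.List[t.Dict[str, t.Any]]:
    # Schedule one call per sentence; gather returns results in input order.
    return await asyncio.gather(*(runner.async_run(s) for s in sentences))


results = asyncio.run(classify_all(FakeRunner(), ["I love you", "I hate you"]))
print([r["label"] for r in results])  # ['love', 'anger']
```

Because every call is awaited concurrently, the single-input calls are in flight at the same time rather than one after another.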

## Create a BentoML service
<b>NOTE:</b> `%%writefile` is used here because the `bentoml.Service` instance must be created in a separate `.py` file.

In [None]:
%%writefile service.py
import re
import typing as t
import asyncio
import unicodedata

import bentoml
from bentoml.io import JSON
from bentoml.io import Text

MODEL_NAME = "roberta_text_classification"
TASKS = "text-classification"

clf_runner = bentoml.transformers.load_runner(MODEL_NAME, tasks=TASKS)

all_runner = bentoml.transformers.load_runner(
    MODEL_NAME, name="all_score_runner", tasks=TASKS, return_all_scores=True
)

svc = bentoml.Service(name="pretrained_clf", runners=[clf_runner, all_runner])


def normalize(s: str) -> str:
    s = "".join(
        c
        for c in unicodedata.normalize("NFD", s.lower().strip())
        if unicodedata.category(c) != "Mn"
    )
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s


def preprocess(sentence: t.Dict[str, t.List[str]]) -> t.Dict[str, t.List[str]]:
    assert "text" in sentence, "Given JSON does not contain `text` field"
    if not isinstance(sentence["text"], list):
        sentence["text"] = [sentence["text"]]
    return {k: [normalize(s) for s in v] for k, v in sentence.items()}


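def convert_result(res) -> t.Dict[str, t.Any]:
    # A list holds one label/score dict per class (all scores);
    # otherwise `res` is a single label/score dict.
    if isinstance(res, list):
        return {item["label"]: item["score"] for item in res}
    return {res["label"]: res["score"]}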


def postprocess(
    inputs: t.Dict[str, t.List[str]], outputs: t.List[t.Dict[str, t.Any]]
) -> t.Dict[int, t.Dict[str, t.Union[str, float]]]:
    return {
        i: {"input": sent, **convert_result(pred)}
        for i, (sent, pred) in enumerate(zip(inputs["text"], outputs))
    }


@svc.api(input=Text(), output=JSON())
async def sentiment(sentence: str) -> t.Dict[str, t.Any]:
    res = await clf_runner.async_run(sentence)
    return {"input": sentence, "label": res["label"]}


@svc.api(input=JSON(), output=JSON())
async def batch_sentiment(
    sentences: t.Dict[str, t.List[str]]
) -> t.Dict[int, t.Dict[str, t.Union[str, float]]]:
    processed = preprocess(sentences)
    outputs = await asyncio.gather(
        *[clf_runner.async_run(s) for s in processed["text"]]
    )
    return postprocess(processed, outputs)  # type: ignore


@svc.api(input=JSON(), output=JSON())
async def batch_all_scores(
    sentences: t.Dict[str, t.List[str]]
) -> t.Dict[int, t.Dict[str, t.Union[str, float]]]:
    processed = preprocess(sentences)
    outputs = await asyncio.gather(
        *[all_runner.async_run(s) for s in processed["text"]]
    )
    return postprocess(processed, outputs)  # type: ignore


We defined two separate endpoints, `/batch_sentiment` and `/batch_all_scores`, which both create an inference graph to make use of BentoML's dynamic batching.

We also created a `/sentiment` endpoint, which accepts a single sentence as input.
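
Both batch endpoints pass every sentence through `normalize` before it reaches the runner. The snippet below copies that function from `service.py` so its effect can be inspected in isolation; the two sample sentences are illustrative only.

```python
import re
import unicodedata


def normalize(s: str) -> str:
    # Lowercase, trim, and drop combining accent marks (category "Mn").
    s = "".join(
        c
        for c in unicodedata.normalize("NFD", s.lower().strip())
        if unicodedata.category(c) != "Mn"
    )
    # Put a space before sentence punctuation so it becomes its own token.
    s = re.sub(r"([.!?])", r" \1", s)
    # Collapse every run of other non-letter characters into one space.
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s


print(repr(normalize("Café, s'il vous plaît!")))            # 'cafe s il vous plait !'
print(repr(normalize("I love you with all my hearts :)")))  # 'i love you with all my hearts '
```

Note that digits, emoticons, and apostrophes are all replaced by spaces, so the text the model sees can differ noticeably from what the client sent.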


Start a service with reload enabled:

In [None]:
!bentoml serve service:svc --reload

With the `--reload` flag, the API server automatically restarts whenever the source file `service.py` is updated.

One can then navigate to `127.0.0.1:3000` and interact with Swagger UI.
One can also verify the endpoints locally with `curl`:

In [None]:
!curl -X POST "http://localhost:3000/batch_sentiment" \
      -H "accept: application/json" \
      -H "Content-Type: application/json" \
      -d "{\"text\":[\"I love you with all my hearts :)\",\"Our path diverges\"]}"
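
The same request can be built from Python. This is a minimal sketch using only the standard library; `build_request` and `send` are hypothetical helper names, and `send` assumes the service started above is listening on `localhost:3000`.

```python
import json
import typing as t
import urllib.request


def build_request(
    texts: t.List[str], host: str = "http://localhost:3000"
) -> urllib.request.Request:
    # Same JSON body and headers as the curl command above.
    body = json.dumps({"text": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{host}/batch_sentiment",
        data=body,
        headers={"accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> t.Any:
    # Requires a running server; not called here so the sketch works offline.
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


request = build_request(["I love you with all my hearts :)", "Our path diverges"])
print(request.get_method(), request.full_url)
# To query a running server: print(send(request))
```

Following `postprocess` in `service.py`, the response should map each input's index to its normalized text plus the predicted label and score.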

We can also do a simple local benchmark with [locust](https://locust.io/):
```bash
locust --headless -u 100 -r 1000 --run-time 2m --host http://127.0.0.1:3000
```

## Build a Bento for deployment

A `bentofile.yaml` in the current directory tells `bentoml build` how to create the Bento:
```yaml
service: "service:svc"
description: "file: ./README.md"
labels:
  owner: bentoml-team
  stage: demo
include:
- "*.py"
exclude:
- "locustfile.py"
- "tests/"
- "*.ipynb"
python:
  lock_packages: false
  packages:
    - -f https://download.pytorch.org/whl/cpu/torch_stable.html
    - torch==1.10.2+cpu
    - git+https://github.com/huggingface/transformers
    - datasets
```

Build the Bento with `bentoml build`:

In [None]:
!bentoml build

The Bento can now be served with `--production`:
```bash
bentoml serve pretrained_clf:latest --production
```

## Containerize a Bento

Make sure the Docker daemon is running; `bentoml containerize` will then build
a Docker image for the model server above:

In [None]:
!bentoml containerize pretrained_clf:latest
# an example docker tag: pretrained_clf:zt4vvsurw63thgxi

Test the newly built Docker image:
```bash
docker run -p 3000:3000 pretrained_clf:zt4vvsurw63thgxi
```