# Dependencies and imports

In [None]:
# Requirements.
%pip install "build>=1.0" "torch>=2.0" "transformers" "../../libs/buildlib"

In [None]:
import shutil
from pathlib import Path

import buildlib
from transformers.models.bert.modeling_bert import BertModel
from transformers.models.bert.tokenization_bert_fast import BertTokenizerFast

from build import ProjectBuilder

# Introduction

To create a custom text embedding service, we need to build a Docker image and deploy it. To build this custom image, we put custom data (a `data/` directory) and logic (an `embed.py`, plus a `requirements.txt` specifying its dependencies) into the `build/` directory, then use `buildlib` to run the build process (`buildlib.build()`).
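
The layout described above can be checked before triggering a build. The following is a minimal sketch and not part of `buildlib`: the helper name `missing_build_inputs` is hypothetical, and it assumes only the three inputs named above (`data/`, `embed.py`, `requirements.txt`).

```python
from pathlib import Path

# Inputs the introduction says the build expects inside `build/`.
EXPECTED_INPUTS = ("data", "embed.py", "requirements.txt")


def missing_build_inputs(build_dir: Path) -> list[str]:
    """Return the expected inputs that are absent from `build_dir` (hypothetical helper)."""
    return [name for name in EXPECTED_INPUTS if not (build_dir / name).exists()]
```

Once every piece is in place, calling this helper on the `build/` directory should return an empty list.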

## Clear out `build/`

Let's start with a clean slate by deleting and recreating an empty `build/`.

In [None]:
BUILD_DIR = Path(".").resolve().parents[1] / "build"
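# `ignore_errors=True` keeps this cell from failing when `build/` does not exist yet.
shutil.rmtree(BUILD_DIR, ignore_errors=True)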
BUILD_DIR.mkdir()

## Prepare the data

For this example, we will use the [`e5-base-v2`](https://huggingface.co/intfloat/e5-base-v2) model from Microsoft. We'll demonstrate how to preload the model weights directly into our Docker image for simpler deployment.

Rather than just using the [`transformers`](https://pypi.org/project/transformers/) library directly, we'll also demonstrate how to include a custom library in the Docker image by building a [pure Python wheel](https://packaging.python.org/en/latest/guides/distributing-packages-using-setuptools/#pure-python-wheels) and placing it in `data/` alongside the model weights. The library used for this demonstration is `embed_lib/`, a small example library we wrote to call the E5 model via `transformers`.
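
As a rough illustration of what "pure Python" means for a wheel, the sketch below inspects a wheel filename. It relies on the standard wheel naming convention, in which the name ends with `-{python tag}-{abi tag}-{platform tag}.whl`, and treats an ABI tag of `none` together with a platform tag of `any` as the sign of a pure Python wheel. The helper name `is_pure_python_wheel` is hypothetical, and this is a filename heuristic only; it does not open the wheel.

```python
from pathlib import Path


def is_pure_python_wheel(wheel_path: str) -> bool:
    """Heuristic: a pure Python wheel has ABI tag `none` and platform tag `any`."""
    name = Path(wheel_path).name
    if not name.endswith(".whl"):
        return False
    # Wheel names end with `-{python tag}-{abi tag}-{platform tag}.whl`.
    parts = name[: -len(".whl")].split("-")
    if len(parts) < 5:
        return False
    abi_tag, platform_tag = parts[-2], parts[-1]
    return abi_tag == "none" and platform_tag == "any"
```

A filename such as `embed_lib-1.0.0-py3-none-any.whl` passes this check, while a wheel carrying a platform-specific tag does not.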

### Put model weights into `build/data/`

In [None]:
# Downloading the model weights.
MODEL_NAME = "intfloat/e5-base-v2"
DATA_DIR = BUILD_DIR / "data"
TOKENISER_DIR = DATA_DIR / "tokenizer"
MODEL_DIR = DATA_DIR / "model"

print(f"Downloading {MODEL_NAME} and saving tokenizer and weights to {DATA_DIR}")

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
model = BertModel.from_pretrained(MODEL_NAME)
assert isinstance(model, BertModel)  # This is for typechecking.
tokenizer.save_pretrained(TOKENISER_DIR)
model.save_pretrained(MODEL_DIR)

# Validate that our saved files work by loading from them.
tokenizer = BertTokenizerFast.from_pretrained(TOKENISER_DIR)
model = BertModel.from_pretrained(MODEL_DIR)

### Put packaged code into `build/data/`

In [None]:
# Package our Python package into a wheel in the `data/` directory.
wheel_filename = ProjectBuilder(Path("") / "embed_lib").build(
    distribution="wheel", output_directory=DATA_DIR
)
print(f"Built {wheel_filename}")

## Prepare the logic

To use our custom package, load our model weights, and perform text embedding, we need to implement the core embedding logic. This is done by implementing `get_embed_fn()` in a file called `embed.py`. `get_embed_fn` should load the model weights and return a function that maps a single `Sequence[str]` input to a single 2-D `numpy` array of dtype `np.float32`. Below we give an example.
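
The return contract described above can be made explicit with a small check. This is a sketch only: `check_embed_fn` and `fake_embed` are hypothetical names that are not part of `buildlib` or `embed_lib`, and the stand-in embedder assumes one output row per input text, which the contract above does not state.

```python
from typing import Callable, Sequence

import numpy as np

EmbedFn = Callable[[Sequence[str]], np.ndarray]


def check_embed_fn(embed_fn: EmbedFn, texts: Sequence[str]) -> np.ndarray:
    """Call `embed_fn` and verify the result is a 2-D float32 array (hypothetical helper)."""
    result = embed_fn(texts)
    if not isinstance(result, np.ndarray):
        raise TypeError(f"expected np.ndarray, got {type(result).__name__}")
    if result.ndim != 2:
        raise ValueError(f"expected a 2-D array, got {result.ndim} dimension(s)")
    if result.dtype != np.float32:
        raise ValueError(f"expected dtype float32, got {result.dtype}")
    return result


def fake_embed(texts: Sequence[str]) -> np.ndarray:
    """Stand-in embedder for illustration only: one row of zeros per text (assumption)."""
    return np.zeros((len(texts), 8), dtype=np.float32)
```

Running `check_embed_fn(fake_embed, ["query: hello"])` exercises the check without loading any model weights; the same helper could be pointed at a real embedding function during local testing.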

### Implement `get_embed_fn` inside `build/embed.py`

In [None]:
%%writefile ../../build/embed.py
import logging
from typing import Callable
from typing import cast
from typing import Sequence

import embed_lib.e5
import numpy as np
from transformers.models.bert.modeling_bert import BertModel
from transformers.models.bert.tokenization_bert_fast import BertTokenizerFast

MAX_BATCH_SIZE = 4


def get_embed_fn(logger: logging.Logger) -> Callable[[Sequence[str]], np.ndarray]:
    # Load the model into memory.
    logger.info("[get_embed_fn] Loading model from disk to memory")
    tokenizer = BertTokenizerFast.from_pretrained(
        "/root/data/tokenizer", local_files_only=True
    )
    model = cast(
        BertModel, BertModel.from_pretrained("/root/data/model", local_files_only=True)
    )
    e5_model = embed_lib.e5.E5Model(tokenizer, model)

    def _embed(texts: Sequence[str]) -> np.ndarray:
        result_tensor = embed_lib.e5.embed(
            e5_model=e5_model,
            texts=texts,
            batch_size=MAX_BATCH_SIZE,
            normalize=True,
            progress_bar=False,
        )
        result_array = result_tensor.numpy().astype(np.float32)
        return result_array

    return _embed

### Add a `build/requirements.txt`

We also need to specify the requirements for our embedding logic. During the build, the `BUILD_ROOT` environment variable is populated, which lets us reference custom packages (like our `embed_lib` wheel) in `requirements.txt` by absolute filepath.
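
The wheel filename in `requirements.txt` has to match the file that was built into `data/`. The sketch below derives the entry from a wheel path (for instance, the path returned by `ProjectBuilder.build`) instead of typing it by hand. pip expands `${VAR}`-style environment variables in requirements files; the `preview` function here only imitates that substitution with `string.Template` so the result can be inspected. Both helper names are hypothetical, and the build root value used in any preview is a placeholder, not the real value set during the build.

```python
from pathlib import Path
from string import Template


def requirement_line(wheel_path: str) -> str:
    """Build a requirements.txt entry that locates the wheel under `${BUILD_ROOT}/data/`."""
    return "${BUILD_ROOT}/data/" + Path(wheel_path).name


def preview(line: str, build_root: str) -> str:
    """Show what the entry looks like once `${BUILD_ROOT}` is substituted (illustration only)."""
    return Template(line).substitute(BUILD_ROOT=build_root)
```

For a wheel named `embed_lib-1.0.0-py3-none-any.whl`, `requirement_line` produces the same entry that is written by hand in the cell that follows.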

In [None]:
%%writefile ../../build/requirements.txt

${BUILD_ROOT}/data/embed_lib-1.0.0-py3-none-any.whl

## Build!

Now that all the pieces are in place inside `build/`, we can trigger a build via `buildlib`.

In [None]:
# We now have all the pieces we need to build our service!
list(BUILD_DIR.iterdir()) + list(DATA_DIR.iterdir())

In [None]:
BUILD_DIR = Path(".").resolve().parents[1] / "build"
buildlib.build(build_dir=BUILD_DIR)