This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Deploy embedding model on Kubernetes using Seldon #623

Open
1 of 2 tasks
FrancescoCasalegno opened this issue Aug 31, 2022 · 3 comments · May be fixed by #632
Labels
🔍 semantic-search Semantic search tools using ML models for STS

Comments

@FrancescoCasalegno
Contributor

FrancescoCasalegno commented Aug 31, 2022

Context

  • For simplicity (and to leverage our GPUs) we can manually compute the paragraph embeddings of the articles in our Elasticsearch database.
  • But at query time, it makes sense to have our sentence-transformer embedding model deployed on Kubernetes, so it can scale and avoid downtime when users make their queries.
  • Seldon seems to be a good solution to easily deploy our model on Kubernetes and provide a RESTful API to address requests from users.
  • Note that the goal of this stage of the Information Retrieval pipeline is to quickly retrieve a certain number of potentially relevant documents (e.g. ~1000); we don't care too much about these results being ranked very accurately (that happens in the following re-ranking stage). So it could be a good idea to use a smaller, faster sentence embedding model.
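The offline step described above could be sketched as follows. This is a hedged sketch, not the project's actual code: the model name is the one discussed later in this issue, but the index name, document fields, and the `index_paragraph_embeddings` helper are hypothetical.

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks (the last one may be shorter)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_paragraph_embeddings(paragraphs, es_client, index="paragraphs"):
    """Embed paragraphs in batches and write the vectors back to Elasticsearch.

    Assumes the `sentence-transformers` and `elasticsearch` packages are
    installed; field and index names are illustrative only.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
    for batch in batched(paragraphs, 32):
        # Encoding a full batch at once is what makes GPU inference worthwhile.
        embeddings = model.encode([p["text"] for p in batch])
        for paragraph, vector in zip(batch, embeddings):
            es_client.update(index=index, id=paragraph["id"],
                             doc={"embedding": vector.tolist()})
```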

Actions

  • Investigate whether there are any (better?) alternatives to Seldon.
  • Deploy our sentence embedding model on Kubernetes using the best framework that we found.
@drsantos89
Contributor

Seldon-core seems to be the most recommended tool for deploying ML models on Kubernetes (1st Google result and 3.3k+ stars on GitHub).
https://www.datarevenue.com/en-blog/why-you-need-a-model-serving-tool-such-as-seldon

Other options are available:
https://medium.com/everything-full-stack/machine-learning-model-serving-overview-c01a6aa3e823

Deploy the model as a Flask App:
https://opensource.com/article/20/9/deep-learning-model-kubernetes
Or using FastAPI (better than Flask!?):
https://betterprogramming.pub/3-reasons-to-switch-to-fastapi-f9c788d017e5

BentoML / Yatai:
https://github.com/bentoml/BentoML (3.9k+ stars)
https://github.com/bentoml/Yatai (300+ stars)

Flask and FastAPI might not be a good solution, as they do not scale well and might have performance issues.
I'm currently testing Seldon and Yatai.

@drsantos89 drsantos89 linked a pull request Sep 20, 2022 that will close this issue
@drsantos89
Contributor

The default model for sentence embedding was deployed on a local Seldon server using the configuration below:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: minilm
  namespace: seldon
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: feature-extraction
      - name: pretrained_model
        type: STRING
        value: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
    name: default
    replicas: 1

A request to the model can be sent using the bluesearch.k8s.embeddings.embed_seldon function.
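Since the deployment uses `protocol: v2`, a raw request without the `bluesearch` helper could be sketched as below. This is an illustrative sketch, not the project's implementation: the base URL is hypothetical, and the input name `"args"` and route follow the usual MLServer/Open Inference (V2) protocol conventions, which may need adjusting for a given cluster.

```python
def build_v2_request(texts):
    """Build an Open Inference (V2) protocol payload for a BYTES text input."""
    return {
        "inputs": [
            {
                "name": "args",
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": texts,
            }
        ]
    }


def embed(texts, base_url="http://localhost:8080"):
    """POST the texts to the deployed model and return its raw outputs."""
    import requests

    # V2 route pattern: /v2/models/<graph-name>/infer ("transformer" above).
    url = f"{base_url}/v2/models/transformer/infer"
    response = requests.post(url, json=build_v2_request(texts), timeout=30)
    response.raise_for_status()
    return response.json()["outputs"]
```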

@drsantos89
Contributor

[Screenshot 2022-09-30: response-time measurements]
The average response time is 74 ± 70 ms.
