This repository has been archived by the owner on Jan 29, 2024. It is now read-only.

Deploy embedding model on Kubernetes using Seldon #623

Open
1 of 2 tasks
FrancescoCasalegno opened this issue Aug 31, 2022 · 3 comments · May be fixed by #632
Labels
🔍 semantic-search Semantic search tools using ML models for STS

Comments

@FrancescoCasalegno
Contributor

FrancescoCasalegno commented Aug 31, 2022

Context

  • For simplicity (and to leverage our GPUs) we can manually compute the paragraph embeddings of the articles in our Elasticsearch database.
  • But at query time, it makes sense to have our sentence-transformer embedding model deployed on Kubernetes, so it can scale and avoid downtime when users make their queries.
  • Seldon seems to be a good solution to easily deploy our model on Kubernetes and provide a RESTful API to address requests from users.
  • Note that the goal of this stage of the Information Retrieval pipeline is to quickly retrieve a certain number of potentially relevant documents (e.g. ~1000); we don't care too much about these results being ranked very accurately (that happens in the following re-ranking stage). So it could be a good idea to use a smaller, faster sentence embedding model.
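The offline step described above could be sketched as follows. This is a hedged sketch, not the project's actual code: the model name is the one discussed later in this issue, but the index name, document fields, and the `index_paragraph_embeddings` helper are hypothetical.

```python
def batched(items, batch_size):
    """Yield successive fixed-size chunks (the last one may be shorter)."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def index_paragraph_embeddings(paragraphs, es_client, index="paragraphs"):
    """Embed paragraphs in batches and write the vectors back to Elasticsearch.

    Assumes the `sentence-transformers` and `elasticsearch` packages are
    installed; field and index names are illustrative only.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("sentence-transformers/multi-qa-MiniLM-L6-cos-v1")
    for batch in batched(paragraphs, 32):
        # Encoding a full batch at once is what makes GPU inference worthwhile.
        embeddings = model.encode([p["text"] for p in batch])
        for paragraph, vector in zip(batch, embeddings):
            es_client.update(index=index, id=paragraph["id"],
                             doc={"embedding": vector.tolist()})
```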

Actions

  • Investigate whether there are any (better?) alternatives to Seldon.
  • Deploy our sentence embedding model on Kubernetes using the best framework that we found.
@drsantos89
Contributor

Seldon-core seems to be the most recommended tool for deploying ML models on Kubernetes (1st Google result and 3.3k+ stars on GitHub).
https://www.datarevenue.com/en-blog/why-you-need-a-model-serving-tool-such-as-seldon

Other options are available:
https://medium.com/everything-full-stack/machine-learning-model-serving-overview-c01a6aa3e823

Deploy the model as a Flask App:
https://opensource.com/article/20/9/deep-learning-model-kubernetes
Or using FastAPI (better than Flask!?):
https://betterprogramming.pub/3-reasons-to-switch-to-fastapi-f9c788d017e5

BentoML / Yatai:
https://github.com/bentoml/BentoML (3.9k+ stars)
https://github.com/bentoml/Yatai (300+ stars)

Flask and FastAPI might not be a good solution, as they do not scale well and might have performance issues.
I'm currently testing Seldon and Yatai.

@drsantos89 drsantos89 linked a pull request Sep 20, 2022 that will close this issue
@drsantos89
Contributor

The default model for sentence embedding was deployed on a local Seldon server using the configuration below:

apiVersion: machinelearning.seldon.io/v1alpha2
kind: SeldonDeployment
metadata:
  name: minilm
  namespace: seldon
spec:
  protocol: v2
  predictors:
  - graph:
      name: transformer
      implementation: HUGGINGFACE_SERVER
      parameters:
      - name: task
        type: STRING
        value: feature-extraction
      - name: pretrained_model
        type: STRING
        value: sentence-transformers/multi-qa-MiniLM-L6-cos-v1
    name: default
    replicas: 1

A request to the model can be sent using the bluesearch.k8s.embeddings.embed_seldon function.
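Since the deployment uses `protocol: v2`, a raw request without the `bluesearch` helper could be sketched as below. This is an illustrative sketch, not the project's implementation: the base URL is hypothetical, and the input name `"args"` and route follow the usual MLServer/Open Inference (V2) protocol conventions, which may need adjusting for a given cluster.

```python
def build_v2_request(texts):
    """Build an Open Inference (V2) protocol payload for a BYTES text input."""
    return {
        "inputs": [
            {
                "name": "args",
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": texts,
            }
        ]
    }


def embed(texts, base_url="http://localhost:8080"):
    """POST the texts to the deployed model and return its raw outputs."""
    import requests

    # V2 route pattern: /v2/models/<graph-name>/infer ("transformer" above).
    url = f"{base_url}/v2/models/transformer/infer"
    response = requests.post(url, json=build_v2_request(texts), timeout=30)
    response.raise_for_status()
    return response.json()["outputs"]
```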

@drsantos89
Contributor

[Screenshot 2022-09-30: response-time measurements]
The average response time is 74 ± 70 ms.
