This repository was archived by the owner on Jan 29, 2024. It is now read-only.

Deploy embedding model on Kubernetes using Seldon #623

@FrancescoCasalegno

Description

Context

  • For simplicity (and to leverage our GPUs), we can compute the paragraph embeddings of the articles in our Elasticsearch database manually, ahead of time.
  • But at query time, it makes sense to deploy our sentence-transformer embedding model on Kubernetes, so that it can scale and avoid downtime when users make queries.
  • Seldon seems to be a good solution for easily deploying our model on Kubernetes and exposing a RESTful API to serve user requests.
  • Note that the goal of this stage of the Information Retrieval pipeline is to quickly retrieve a certain number of potentially relevant documents (e.g. ~1000); we don't care too much about these results being ranked very accurately, since that happens in the following re-ranking stage. So it could be a good idea to use a smaller, faster sentence embedding model.
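To make the deployment idea above concrete, here is a minimal sketch of the model class that Seldon Core's Python wrapper could serve behind a REST API. The class name `SentenceEmbedder`, the model choice `all-MiniLM-L6-v2` (picked as an example of a smaller, faster model), and the injectable `model` argument are all illustrative assumptions, not details from this issue.

```python
class SentenceEmbedder:
    """Sketch of a Seldon Core Python-wrapper model.

    Seldon wraps a class exposing ``predict(X, features_names)``
    behind a RESTful endpoint; this follows that convention.
    """

    def __init__(self, model=None):
        # ``model`` is injectable for testing; in the deployed container
        # it would default to loading a sentence-transformers model once
        # at startup. The specific checkpoint is an assumption.
        if model is None:
            from sentence_transformers import SentenceTransformer
            model = SentenceTransformer("all-MiniLM-L6-v2")
        self._model = model

    def predict(self, X, features_names=None):
        # X: an iterable of query strings.
        # Returns one embedding vector per input string.
        return self._model.encode(list(X))
```

At query time, Seldon would route each user query through `predict` and return the embedding, which is then used to retrieve candidate documents from Elasticsearch.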

Actions

  • Investigate whether there are any (better?) alternatives to Seldon.
  • Deploy our sentence embedding model on Kubernetes using the best framework that we found.
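If Seldon turns out to be the best framework, the deployment step could look roughly like the following `SeldonDeployment` manifest. The resource name, replica count, and container image are hypothetical placeholders; only the overall CRD structure follows Seldon Core's conventions.

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sentence-embedder        # hypothetical name
spec:
  predictors:
    - name: default
      replicas: 2                # >1 replica for scaling / avoiding downtime
      graph:
        name: embedder
        type: MODEL
      componentSpecs:
        - spec:
            containers:
              - name: embedder
                image: our-registry/sentence-embedder:latest  # hypothetical image
```

Applying this with `kubectl apply` would have Seldon Core create the deployment and expose the model's `predict` endpoint over REST.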

    Labels

    🔍 semantic-search: Semantic search tools using ML models for STS