This repository was archived by the owner on Jan 29, 2024. It is now read-only.

Deploy embedding model on Kubernetes using Seldon #623

@FrancescoCasalegno

Description

Context

  • For simplicity (and to leverage our GPUs), we can compute the paragraph embeddings of the articles in our Elasticsearch database manually, ahead of time.
  • But at query time, it makes sense to deploy our sentence-transformer embedding model on Kubernetes, so that it can scale and avoid downtime when users make queries.
  • Seldon seems to be a good solution for easily deploying our model on Kubernetes and exposing a RESTful API to serve user requests.
  • Note that the goal of this stage of the Information Retrieval pipeline is to quickly retrieve a certain number of potentially relevant documents (e.g. ~1000); we don't care too much about these results being ranked very accurately, since that happens in the following re-ranking stage. So it could be a good idea to use a smaller, faster sentence embedding model.
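To make the deployment idea above concrete, here is a minimal sketch of the model class that Seldon Core's Python wrapper could serve behind a REST API. The class name `SentenceEmbedder`, the model choice `all-MiniLM-L6-v2` (picked as an example of a smaller, faster model), and the injectable `model` argument are all illustrative assumptions, not details from this issue.

```python
class SentenceEmbedder:
    """Sketch of a Seldon Core Python-wrapper model.

    Seldon wraps a class exposing ``predict(X, features_names)``
    behind a RESTful endpoint; this follows that convention.
    """

    def __init__(self, model=None):
        # ``model`` is injectable for testing; in the deployed container
        # it would default to loading a sentence-transformers model once
        # at startup. The specific checkpoint is an assumption.
        if model is None:
            from sentence_transformers import SentenceTransformer
            model = SentenceTransformer("all-MiniLM-L6-v2")
        self._model = model

    def predict(self, X, features_names=None):
        # X: an iterable of query strings.
        # Returns one embedding vector per input string.
        return self._model.encode(list(X))
```

At query time, Seldon would route each user query through `predict` and return the embedding, which is then used to retrieve candidate documents from Elasticsearch.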

Actions

  • Investigate whether there are any (better?) alternatives to Seldon.
  • Deploy our sentence embedding model on Kubernetes using the best framework that we found.
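If Seldon turns out to be the best framework, the deployment step could look roughly like the following `SeldonDeployment` manifest. The resource name, replica count, and container image are hypothetical placeholders; only the overall CRD structure follows Seldon Core's conventions.

```yaml
apiVersion: machinelearning.seldon.io/v1
kind: SeldonDeployment
metadata:
  name: sentence-embedder        # hypothetical name
spec:
  predictors:
    - name: default
      replicas: 2                # >1 replica for scaling / avoiding downtime
      graph:
        name: embedder
        type: MODEL
      componentSpecs:
        - spec:
            containers:
              - name: embedder
                image: our-registry/sentence-embedder:latest  # hypothetical image
```

Applying this with `kubectl apply` would have Seldon Core create the deployment and expose the model's `predict` endpoint over REST.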

    Labels

    🔍 semantic-search: Semantic search tools using ML models for STS