Option to Use GPU, CUDA #8

hieuhthh · 2024-02-20T09:01:42Z

I really appreciate this repository. I hope the reranking model can optionally run on a GPU to take full advantage of the performance increase, potentially even with multi-GPU support.

Thank you.

PrithivirajDamodaran · 2024-02-20T09:30:03Z

Thanks for raising this; it is on our list.

prashantg445 · 2024-04-17T10:30:45Z

Hey @PrithivirajDamodaran,
Can you publish this list of next action items somewhere, so that people interested in contributing can get started?

P.S.: I am interested in contributing.

PrithivirajDamodaran · 2024-04-30T07:56:53Z

Thanks for reaching out, @prashantg445

@prabhkaran is working on a few optimisations. He will share those.

Besides that, we are going to work on extending FlashRank to support listwise rerankers. Today we support pointwise / pairwise rerankers, which frame reranking as a classification task. Given a query q and a passage p, a pointwise reranker produces a real-valued score indicating the relevance of the passage to the query. The model is optimized using cross-entropy or contrastive loss based on binary relevance judgments from human annotators. At inference time, the top-k passages returned by the first-stage retriever are scored independently. The final passages are then ranked in decreasing order of their relevance scores. Listwise rerankers, by contrast, consider all the candidate passages together.
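
As a rough illustration of the pointwise setup described above, here is a minimal sketch in plain Python. It is not FlashRank's API: `toy_score` and `rerank_pointwise` are hypothetical names, and the token-overlap scorer is only a stand-in for a trained cross-encoder. The point is just the shape of the procedure: each (query, passage) pair is scored on its own, then the passages are sorted by decreasing score.

```python
from typing import Callable, List, Tuple


def toy_score(query: str, passage: str) -> float:
    """Hypothetical stand-in for a cross-encoder.

    Returns the fraction of query tokens that also appear in the passage.
    A real pointwise reranker would run a trained model here instead.
    """
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    if not q_tokens:
        return 0.0
    return len(q_tokens & p_tokens) / len(q_tokens)


def rerank_pointwise(
    query: str,
    passages: List[str],
    score_fn: Callable[[str, str], float] = toy_score,
) -> List[Tuple[str, float]]:
    """Score every passage independently, then sort by decreasing score."""
    scored = [(passage, score_fn(query, passage)) for passage in passages]
    # sorted() is stable, so passages with equal scores keep retriever order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Assumed example data, standing in for the top-k output of a first-stage retriever.
passages = [
    "Bananas are rich in potassium",
    "FlashRank reranks passages on CPU",
    "GPU support can speed up reranking of passages",
]
ranked = rerank_pointwise("gpu reranking speed", passages)

for passage, score in ranked:
    print(f"{score:.2f}  {passage}")
```

Because each pair is scored independently, the scoring loop is easy to batch or parallelize. A listwise reranker would instead take the query and the whole candidate list as a single input.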

YVMVN · 2024-08-17T11:32:29Z

Good day!
I really appreciate this repo. However, listwise is too slow on CPUs with llama-cpp. Is there any update on GPU support?
