Information Retrieval System For Brazilian Scientific Literature

decarv/labrador

labrador - A Semantic Information Retriever

Description

Labrador is a semantic information retriever that uses embeddings to search for similar documents.

Labrador is designed to be a simple and efficient tool for searching through a small collection of scientific literature (~100,000 documents).

In this implementation, I use the sentence-transformers library to encode text documents into embeddings, store them in Qdrant, and run similarity search with simple vector operations.

The main components of the system are:

  • Indexer: Encodes the text documents into embeddings and stores them in Qdrant.
  • Searcher: Retrieves the most similar documents to a given query.
  • Evaluator: Evaluates the performance of the retriever.
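
The division of labour between the three components can be sketched as follows. This is a minimal illustration, not the code in this repository: `toy_encode` is a hypothetical word-count stand-in for a sentence-transformers model, a plain Python list stands in for Qdrant, and precision@k is an assumed metric chosen only to make the Evaluator concrete. All class and function names are illustrative.

```python
import math
import re
from collections import Counter


def toy_encode(text):
    """Hypothetical stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(token, 0) for token, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Indexer:
    """Encodes documents and stores (id, vector) pairs; a list stands in for Qdrant."""

    def __init__(self, encode=toy_encode):
        # In a real setup, `encode` could be the encode method of a
        # sentence-transformers model instead of the toy function.
        self.encode = encode
        self.store = []

    def add(self, doc_id, text):
        self.store.append((doc_id, self.encode(text)))


class Searcher:
    """Ranks stored documents by cosine similarity to the encoded query."""

    def __init__(self, indexer):
        self.indexer = indexer

    def search(self, query, k=3):
        q = self.indexer.encode(query)
        scored = [(cosine(q, vec), doc_id) for doc_id, vec in self.indexer.store]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


class Evaluator:
    """Scores a ranking against relevance judgements (assumed metric: precision@k)."""

    @staticmethod
    def precision_at_k(ranked_ids, relevant_ids, k):
        top = ranked_ids[:k]
        return sum(1 for d in top if d in relevant_ids) / k if k else 0.0


indexer = Indexer()
indexer.add("t1", "Dense retrieval of theses with sentence embeddings")
indexer.add("t2", "Soil chemistry in the Amazon basin")
indexer.add("t3", "Sparse retrieval baselines for scientific literature")

results = Searcher(indexer).search("dense retrieval with embeddings", k=2)
score = Evaluator.precision_at_k(results, {"t1"}, k=2)
print(results, score)
```

Passing the encoder into `Indexer` keeps the Searcher and Evaluator independent of the embedding model, so the same ranking and scoring code could be reused when comparing a dense encoder against a sparse one.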

I compare the performance of the dense retriever with that of a sparse retriever and of the retriever currently used in USP's system; the results are presented in this article.

As stated in the article, "Performance analysis was carried out with 1800 user evaluations on 45 queries. Preliminary results show that the system implemented with dense vectors (a) outperformed the performance of the system used by the thesis repository of the University of São Paulo by approximately 12 points; and (b) surpassed the system based on sparse vectors by 36 points. These results indicate a potential advancement in the efficacy of information retrieval using language models."
