Semantic Search Engine

This API returns the top 10 similar results for a user query. It uses 10% of Stack Overflow's data, which you can find here.

Getting Started

P.S => For notebook.ipynb, you can directly run the entire notebook and a Flask API will be deployed. Use that for testing purposes. All the instructions are mentioned in the notebook.

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

  • docker
  • python
  • tensorflow
  • flask / fastapi

Installation

  1. Clone the repo
https://github.com/budukhyash/semantic-search-engine
  2. Start Open Distro's Elasticsearch instance by running the command below. It maps the container's port 9200 to port 8200 on the host. Read more about it here
docker run -p 8200:9200 -p 8600:9600 -e "discovery.type=single-node" amazon/opendistro-for-elasticsearch:1.8.0
  3. Download the dataset and extract it, then download USE4, Google's Universal Sentence Encoder. Make sure the downloaded files are in the directory of the repository.

  4. Run the data ingestion script. X denotes the number of documents to be indexed.

python elastic_search_ingestion.py X
example => python elastic_search_ingestion.py 20000

  5. After the ingestion is complete, start the server by running

uvicorn server:app --reload --port 9999
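
To make the ingestion step more concrete, here is a minimal sketch of how the first X documents could be turned into a request body for Elasticsearch's `_bulk` API, which expects newline-delimited JSON with an action line followed by a source line per document. This is not the repository's `elastic_search_ingestion.py`; the index name `posts`, the field names `title` and `title_vector`, and the `fake_embed` stand-in for the sentence encoder are all assumptions made for illustration.

```python
import json
from itertools import islice
from typing import Callable, Iterable, List


def build_bulk_payload(
    docs: Iterable[dict],
    limit: int,
    embed: Callable[[str], List[float]],
    index_name: str = "posts",  # hypothetical index name
) -> str:
    """Return an NDJSON body for the _bulk API covering the first `limit` docs."""
    lines = []
    for doc_id, doc in enumerate(islice(docs, limit)):
        # Action line: which index and document id to write to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": str(doc_id)}}))
        # Source line: the text plus its embedding (field names are assumptions).
        lines.append(json.dumps({"title": doc["title"], "title_vector": embed(doc["title"])}))
    # Every line, including the last one, must be terminated by a newline.
    return "".join(line + "\n" for line in lines)


def fake_embed(text: str) -> List[float]:
    """Stand-in for the sentence encoder. NOT a real embedding."""
    return [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    sample = [
        {"title": "How do I reverse a list?"},
        {"title": "What is a closure?"},
        {"title": "This one is skipped because limit=2"},
    ]
    print(build_bulk_payload(sample, limit=2, embed=fake_embed), end="")
```

In a real run, `embed` would wrap the downloaded encoder and the payload would be sent to the Elasticsearch instance started in step 2, in batches rather than as one large request.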

Documentation

  • Postman Docs
  • After starting the server, the interactive docs can be found at http://localhost:9999/docs#/
  • You should see something like this (screenshot on Imgur).
  • /semantic returns the top 10 most similar results. It considers the semantic meaning of the query and uses cosine similarity to rank the documents.
  • /keywords returns the most similar results using the traditional keyword approach with an inverted index. Elasticsearch uses a TF-IDF based scheme to rank these documents.
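
The ranking idea behind /semantic can be sketched in a few lines: embed the query, compare it to every document vector with cosine similarity, and keep the best k. The brute-force scan and the toy 2-dimensional vectors below are assumptions for illustration only; the actual service presumably delegates the vector search to Elasticsearch, and real USE4 embeddings have 512 dimensions.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_semantic(
    query_vec: Sequence[float],
    doc_vecs: Dict[str, Sequence[float]],
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Score every document against the query and return the k best matches."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for sentence embeddings.
    docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    for doc_id, score in top_k_semantic([1.0, 0.0], docs, k=2):
        print(doc_id, round(score, 3))
```

Cosine similarity ignores vector length and compares direction only, which is why two questions with different wording but similar meaning can still rank close together.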
Response time (with 1 lakh, i.e. 100,000, documents ingested):
  • under 300 ms for semantic search
  • under 150 ms for keyword-based search
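
For comparison with the semantic endpoint, the keyword approach behind /keywords can be sketched as an inverted index scored with plain TF-IDF. This is a simplified illustration, not Elasticsearch's scoring: Elasticsearch 7.x uses BM25 by default, a TF-IDF-style formula with extra term-frequency saturation and length normalization. The whitespace tokenizer, the IDF formula, and the sample documents are assumptions.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs: Dict[str, str]) -> Dict[str, Dict[str, int]]:
    """Map each term to {doc_id: term frequency in that document}."""
    index: Dict[str, Dict[str, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search_keywords(
    query: str,
    index: Dict[str, Dict[str, int]],
    n_docs: int,
    k: int = 10,
) -> List[Tuple[str, float]]:
    """Sum tf * idf over the query terms and return the k best-scoring documents."""
    scores: Dict[str, float] = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term, {})
        if not postings:
            continue  # term occurs in no document
        # Rarer terms get a higher weight.
        idf = math.log(1 + n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    docs = {
        "d1": "python list reverse",
        "d2": "java list sort",
        "d3": "python decorator",
    }
    index = build_inverted_index(docs)
    print(search_keywords("python list", index, n_docs=len(docs)))
```

Only documents that share at least one literal term with the query can be returned, which is why the keyword lookup is cheap but misses paraphrases that the semantic endpoint can catch.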

Watch the video

Built With
