Skip to content

00AR/semantic_search

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Semantic Search Engine API

This project allows a user to make semantic search on the database of case studies. User can upload more case studies in the pdf format. The case studies are stored as index of vectors in a vector database hosted on cloud, namely, Pinecone.

Setup

To setup the project on your local machine, do the following

  • Clone the repo
    git clone https://github.com/00AR/semantic_search.git && cd semantic_search
  • Make a python virtual environment
    python -m venv .env
  • Activate the virtual environment
    source .env/bin/activate
  • Install Requirements
    pip install -r requirements.txt
  • Setup Environment Variables
    • Create a new file named .config.env and add the following environment variables with required values:
      BASE_DIR=/path/to/the/repo/semantic_search
      MEDIA=media
      PINECONE_API_KEY=your_pinecone_api_key
  • Run app using
    uvicorn app.main:app

Setup using Docker(Note: consider using sudo in case of permission denied error):

  • build the image using
    docker build -t semantic-search-app .
  • Run the docker image
    docker run -p 8000:8000 -e pinecone_api_key=your_api_key semantic-search-app

APIs

apis

  • A record in pinecone index record

Working

The project is built using fastapi. It uses Pinecone Vector database for storing embeddings along with metadata for each case study. Metadata of a case study includes industry, use case and geography.

/search endpoint

When the user enters a search term on /search endpoint, the query is converted into an embedding. The embedding is then matched with the embeddings of the case studies using cosine similarity that are stored in pinecone. The best matches are returned as response. The response includes a title and filename.

/upload endpoint

User can upload a case study file in pdf format through /upload endpoint.

/media/{filename} endpoint

Additionally user can download the case of interest from /media/{filename} endpoint. The filename from the search results of /search must supplied as filename to this endpoint.

/generate_db endpoint

This will regenerate the embeddings and metadata for each case study and store them on an empty pinecone database index. It uses case studies stored in samples folder. samples store one pdf per one case study.

Deployment

The app is deloyed in a docker container at huggingface.

Deployment Link

TODO

  • Extract industry, use-case, etc metadata from each case study and store it on pinecone index along with embeddings
  • Extract similar metadata from user search query and filter results according to it.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published