The project aims to run an LLM-based question-answering chatbot on enterprise/private data using CPU only.
You can ask questions about your private txt documents without an internet connection, using an open-source LLM.
Note: This project uses a quantized LLM model designed to run on CPU only, so performance may not match a SOTA LLM (Falcon or similar) and response speed will depend on your available CPU compute.
- LangChain - Orchestration of the retrieval and QA pipeline
- Milvus - Vector database for embedding storage
- InstructorEmbeddings - Vector embeddings of documents
- llama.cpp - CPU LLM inference
- Gradio - Web interface
- W&B (Weights & Biases) - Prompt/experiment tracking
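The components above wire together roughly as follows: embed the question, retrieve the nearest document chunks from the vector store, and stuff them into the LLM prompt as context. A minimal stdlib-only sketch of that retrieval step is below; the toy cosine search and plain lists are illustrative stand-ins for Milvus and InstructorEmbeddings, and function names are hypothetical:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, store, k=2):
    # `store` is a list of (chunk_text, embedding) pairs --
    # a stand-in for a Milvus similarity search.
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, chunks):
    # Concatenate retrieved chunks into the LLM prompt, in the spirit
    # of a "stuff"-style LangChain QA chain.
    context = "\n".join(chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In the real app, `query_vec` would come from the Instructor embedding model, `store` would be Milvus, and the prompt would go to a llama.cpp model.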
Install conda and create an environment:

```shell
conda create -n localChatbot python=3.9
conda activate localChatbot
```

To set up your environment to run the code here, install all requirements:

```shell
pip install -r requirements.txt
```
The command below automatically loads the embedding model and saves vector embeddings of the txt files present in the data/ directory:

```shell
python data_to_vector_ingestion.py
```
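An ingestion script like this typically splits each txt file into overlapping chunks before embedding them. A sketch of that step, assuming character-based chunking (the chunk sizes and helper names here are illustrative, not the script's actual implementation):

```python
from pathlib import Path

def chunk_text(text, size=500, overlap=50):
    # Split text into fixed-size character chunks with overlap, so
    # content cut at a boundary still appears whole in one chunk.
    chunks = []
    step = size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + size]
        if chunk:
            chunks.append(chunk)
    return chunks

def load_chunks(data_dir="data"):
    # Gather chunks from every .txt file in data/; each chunk would
    # then be embedded and inserted into the vector database.
    chunks = []
    for path in Path(data_dir).glob("*.txt"):
        chunks.extend(chunk_text(path.read_text(encoding="utf-8")))
    return chunks
```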
Start the Gradio prediction instance:

```shell
python app.py
```
It may also ask for your W&B API key; please follow the guide shown in the terminal.
Now you can access the LLM app from your localhost:
- Navigate to the address shown in the terminal and start asking your chatbot questions.
- Using Weights & Biases to keep track of all prompt results and manually checking the outputs' performance
- Using BERTScore as a performance metric
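BERTScore computes precision, recall, and F1 by greedily matching candidate and reference tokens by embedding similarity. As a shape illustration only, here is the same matching done on exact tokens; the real metric (the `bert-score` package) uses contextual BERT embeddings instead of exact matches:

```python
def token_f1(candidate, reference):
    # Exact-token stand-in for BERTScore's greedy matching:
    # precision = share of candidate tokens found in the reference,
    # recall = share of reference tokens found in the candidate.
    cand, ref = candidate.split(), reference.split()
    if not cand or not ref:
        return 0.0
    precision = sum(tok in ref for tok in cand) / len(cand)
    recall = sum(tok in cand for tok in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With the actual library, scoring looks roughly like `from bert_score import score; P, R, F1 = score(candidates, references, lang="en")`.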
Author @adityaadarsh