This project enhances the capabilities of a Large Language Model (LLM) by integrating external memory using Retrieval-Augmented Generation (RAG) techniques. The system allows the LLM to access and utilize external data stored in a vector database, enabling it to provide more informed, accurate, and contextually relevant responses. The project is structured using the Poetry package manager for dependency management and includes components for data processing, API creation, and a user-friendly web interface.
The system is designed to handle multiple file formats (txt, pdf, pptx, docx), process them into a standardized format, and store them in a Qdrant vector database. A FastAPI backend handles user prompts, retrieves relevant context from the database, and interacts with the meta/llama3-70b-instruct model hosted on the Nvidia cloud. A Streamlit web application provides an intuitive interface for users to interact with the system.
The project is organized into the following directories and files:
.
├── README.md # Project documentation
├── data/ # Directory containing external data files
├── poetry.lock # Poetry lock file for dependency versions
├── pyproject.toml # Poetry project configuration file
├── app/ # Application source code
│ ├── vectordb_rag.py # Script for creating and managing the vector database
│ ├── api.py # FastAPI application for handling prompts and responses
│ ├── start_api.py # Script to initialize the FastAPI server
│ └── webapp.py # Streamlit web application for user interaction
└── notebooks/ # Directory for exploratory analyses and notebooks
This script is the backbone of the project, responsible for creating and managing the Qdrant vector database. It performs the following tasks:
- Data Extraction: Reads data from multiple file formats (txt, pdf, pptx, docx) located in the data/ directory.
- Data Transformation: Processes the extracted data into a standardized format suitable for vectorization.
- Vectorization: Converts the processed data into embeddings using a pre-trained model.
- Database Storage: Stores the vectorized data in the Qdrant database for efficient similarity searches.
- Load Data: Iterates through the data/ directory and loads files based on their format.
- Transform Data: Cleans and standardizes the data (e.g., removing special characters, splitting text into chunks).
- Generate Embeddings: Uses a pre-trained embedding model to convert text into vectors.
- Store in Qdrant: Saves the embeddings and metadata in the Qdrant database.
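The four steps above can be sketched in miniature. Note that the real script uses a pre-trained embedding model and a Qdrant collection; here a hashed bag-of-words vector stands in for the model and a plain Python list stands in for the database, purely for illustration (function names below are hypothetical, not taken from vectordb_rag.py):

```python
import hashlib
import re


def transform_text(raw: str, chunk_size: int = 200) -> list[str]:
    """Transform step: clean the text and split it into fixed-size chunks."""
    cleaned = re.sub(r"[^\w\s.,;:?!-]", "", raw)    # drop special characters
    cleaned = re.sub(r"\s+", " ", cleaned).strip()  # collapse whitespace
    return [cleaned[i:i + chunk_size] for i in range(0, len(cleaned), chunk_size)]


def toy_embedding(text: str, dim: int = 32) -> list[float]:
    """Embedding step: toy stand-in for the real pre-trained model."""
    vec = [0.0] * dim
    for token in text.lower().split():
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[idx] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]


def build_index(documents: dict[str, str]) -> list[dict]:
    """Store step: in the real script each (vector, payload) pair is upserted
    into Qdrant; here a list plays the role of the collection."""
    index = []
    for filename, raw in documents.items():
        for chunk in transform_text(raw):
            index.append({"vector": toy_embedding(chunk),
                          "payload": {"source": filename, "text": chunk}})
    return index
```

Keeping the source filename in the payload is what later lets the web application offer the "download source" feature.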
This file contains the FastAPI application that serves as the backend for the project. It handles user prompts, retrieves relevant context from the Qdrant database, and interacts with the LLM hosted on the Nvidia cloud.
- Prompt Handling: Accepts user prompts via an API endpoint.
- Context Retrieval: Searches the Qdrant database for the most relevant context based on the prompt.
- Enhanced Prompt Creation: Combines the original prompt with the retrieved context to create a detailed instruction for the LLM.
- LLM Interaction: Sends the enhanced prompt to the meta/llama3-70b-instruct model via the Nvidia cloud API.
- Response Delivery: Returns the LLM's response to the user.
- Receive Prompt: The API accepts a prompt from the user.
- Retrieve Context: Searches the Qdrant database for the most relevant context.
- Create Enhanced Prompt: Combines the prompt and context into a detailed instruction.
- Send to LLM: Sends the enhanced prompt to the LLM and waits for the response.
- Return Response: Sends the LLM's response back to the user.
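The "Create Enhanced Prompt" step can be illustrated as follows. The exact template used by api.py is not documented here, so this layout is an assumption; the idea is simply to splice the retrieved chunks ahead of the user's question:

```python
def build_enhanced_prompt(prompt: str, contexts: list[str]) -> str:
    """Combine the user's prompt with retrieved context into one instruction.
    The template below is illustrative, not the one hard-coded in api.py."""
    context_block = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {prompt}"
    )
```

Numbering the context chunks makes it easy to trace which retrieved passage the model drew on, which in turn supports the source-download feature.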
This script initializes the FastAPI server using the Uvicorn package. It is responsible for starting the API service that the web application interacts with.
- Server Initialization: Starts the FastAPI server on a specified port.
- Asynchronous Support: Uses async and await for efficient handling of requests.
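The payoff of async and await here is that while one request is waiting on the Qdrant search or the Nvidia API call, the event loop can serve other requests. A minimal stdlib sketch of this (the sleep stands in for the I/O-bound database and LLM calls):

```python
import asyncio
import time


async def handle_request(prompt: str) -> str:
    """Simulate one request whose slow part (DB search + LLM call) awaits I/O."""
    await asyncio.sleep(0.1)  # stands in for the Qdrant search and Nvidia API call
    return f"response to: {prompt}"


async def main() -> list[str]:
    # Three requests overlap: while one awaits I/O, the loop serves the others.
    return await asyncio.gather(*(handle_request(p) for p in ["a", "b", "c"]))


start = time.perf_counter()
responses = asyncio.run(main())
elapsed = time.perf_counter() - start
```

Run sequentially the three requests would take about 0.3 s; with the event loop they complete in roughly the time of one.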
This file contains the Streamlit web application that provides a user-friendly interface for interacting with the LLM. The web application is designed to be simple yet efficient, allowing users to enter prompts, view responses, and download source files.
- Prompt Input: A text input field for users to enter their prompts.
- Response Display: Displays the LLM's response to the entered prompt.
- Source Download: Provides an option to download the file from which the LLM retrieved the context for its response.
- User Input: The user enters a prompt in the input field.
- Send to API: The prompt is sent to the FastAPI backend for processing.
- Display Response: The LLM's response is displayed on the screen.
- Download Source: If applicable, the user can download the source file used for context retrieval.
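The "Send to API" step amounts to a JSON POST from the Streamlit app to the FastAPI backend. A stdlib sketch of that call is below; the endpoint path and payload key are assumptions, so check api.py for the actual route:

```python
import json
import urllib.request

# Assumed endpoint and payload shape; the real route is defined in api.py.
API_URL = "http://localhost:8000/prompt"


def build_request(prompt: str) -> urllib.request.Request:
    """Build the POST request the Streamlit app sends to the backend."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_prompt(prompt: str) -> dict:
    """Send the prompt and decode the JSON reply (requires the API to be running)."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In webapp.py the response dict would then be rendered with Streamlit widgets (e.g., the answer text plus a download button for the source file).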
- Python 3.12 or higher
- Poetry package manager
- Qdrant (can be set up using Docker)
- Nvidia API key (for accessing the meta/llama3-70b-instruct model)
- https://huggingface.co/meta-llama/Meta-Llama-3-70B/tree/main
- https://build.nvidia.com/meta/llama-3_3-70b-instruct
- Clone the Repository:
  git clone https://github.com/Krupique/rag-search
  cd rag-search
- Install Dependencies:
  poetry install
- Set Up Qdrant:
- Ensure Qdrant is installed and running. You can use Docker to set up Qdrant:
docker run --name vectordb -dit -p 6333:6333 qdrant/qdrant
- Add Data Files:
  - Place your data files (txt, pdf, pptx, docx) in the data/ directory.
- Initialize the Vector Database:
  poetry run python app/vectordb_rag.py data_path
- Start the FastAPI Server:
  poetry run python app/start_api.py
- Run the Streamlit Web Application:
  poetry run streamlit run app/webapp.py
Note: start_api.py and webapp.py must run in separate terminal sessions.
- Open your web browser and navigate to http://localhost:8501 to access the Streamlit web interface.
- Enter a Prompt: Type your question or prompt into the input field.
- Submit: Press the "Submit" button to send the prompt to the API.
- View Response: The LLM's response will be displayed on the screen.
- Download Source: If available, you can download the file from which the LLM retrieved the context.
Questions:
- Was the Ninja 300 designed to be rider-friendly?
- Have the Ninja 250 and 300 won great popularity?
This project is licensed under the MIT License. See the LICENSE file for more details.
- Qdrant: For providing the vector database solution.
- FastAPI: For enabling the creation of a robust and efficient API.
- Streamlit: For simplifying the development of the web interface.
- Nvidia: For hosting the meta/llama3-70b-instruct model.
For any questions or feedback, please contact the project maintainer, Henrique Krupck.
This detailed documentation provides a comprehensive guide to the project, its components, setup instructions, and usage guidelines. It is designed to help users and contributors understand and effectively utilize the system.