- In this project I built an end-to-end LangChain application using open-source Hugging Face LLMs such as Mistral, together with open-source embedding models.
- You can check out the live project here
- The project showcases an advanced RAG (Retrieval-Augmented Generation) system that uses a Hugging Face LLM to answer questions from a collection of PDF documents.
Steps I followed:
- Used the `PyPDFDirectoryLoader` from the `langchain_community` document loaders to load the PDF documents from the `us-census-data` directory.
- Split the text into chunks of 1000 characters using the `RecursiveCharacterTextSplitter` imported from `langchain.text_splitter`.
- Stored the vector embeddings, generated with `HuggingFaceBgeEmbeddings`, in a `FAISS` vector store.
- Set up the LLM via `HuggingFaceEndpoint` with the model name `mistralai/Mistral-7B-Instruct-v0.2`.
- Set up a `PromptTemplate`.
- Set up a `vector_embedding` function to embed the documents and store them in the `FAISS` vector store.
- Finally, created a `RetrievalQA` chain tying together the `llm`, `prompt`, and `retriever`.
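The steps above can be sketched roughly as follows. This is a minimal sketch, not the repository's exact code: the chunk overlap, the BGE model name, and the prompt wording are assumptions, and running it requires the pinned libraries plus a `HUGGINGFACEHUB_API_TOKEN` in the environment. Imports are kept inside the function so the file loads even before the dependencies are installed.

```python
def build_qa_chain(pdf_dir: str = "us-census-data"):
    """Assemble the RAG pipeline described above (illustrative sketch)."""
    # Local imports: only needed when the chain is actually built.
    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter
    from langchain_community.embeddings import HuggingFaceBgeEmbeddings
    from langchain_community.vectorstores import FAISS
    from langchain_huggingface import HuggingFaceEndpoint
    from langchain.prompts import PromptTemplate
    from langchain.chains import RetrievalQA

    # 1. Load every PDF in the directory.
    docs = PyPDFDirectoryLoader(pdf_dir).load()

    # 2. Split into ~1000-character chunks (overlap value is an assumption).
    chunks = RecursiveCharacterTextSplitter(
        chunk_size=1000, chunk_overlap=200
    ).split_documents(docs)

    # 3. Embed the chunks and index them in FAISS
    #    (the BGE model name here is an assumed default).
    embeddings = HuggingFaceBgeEmbeddings(model_name="BAAI/bge-small-en-v1.5")
    store = FAISS.from_documents(chunks, embeddings)

    # 4. Point at the hosted Mistral model (needs HUGGINGFACEHUB_API_TOKEN).
    llm = HuggingFaceEndpoint(repo_id="mistralai/Mistral-7B-Instruct-v0.2")

    # 5. Prompt template; the wording is illustrative, not the repo's.
    prompt = PromptTemplate(
        input_variables=["context", "question"],
        template="Answer using only this context:\n{context}\n\nQuestion: {question}",
    )

    # 6. Chain llm + prompt + retriever into a RetrievalQA pipeline.
    return RetrievalQA.from_chain_type(
        llm=llm,
        retriever=store.as_retriever(),
        chain_type_kwargs={"prompt": prompt},
    )
```

Once built, the chain can be queried with something like `build_qa_chain().invoke({"query": "..."})`.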
- langchain==0.1.20
- langchain-community==0.0.38
- langchain-huggingface==0.0.1
- faiss-cpu==1.8.0
- python-dotenv==1.0.1
- Prerequisites
- Git
- Command line familiarity
- Clone the repository:

```shell
git clone https://github.com/NebeyouMusie/End-to-End-Gen-AI-Powered-App.git
```

- Navigate to the project directory using your terminal:

```shell
cd ./End-to-End-Gen-AI-Powered-App
```

- Create and activate a virtual environment (recommended):

```shell
python -m venv venv
source venv/bin/activate
```

- Install the required libraries:

```shell
pip install -r requirements.txt
```

- Navigate to the app directory and run the app:

```shell
cd ./app
streamlit run app.py
```
- Open the link displayed in the terminal in your preferred browser
- Click the `Embed Documents` button and wait until the documents are processed
- Enter your question about the PDFs found in the `us-census-data` directory
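The usage flow above maps onto a small Streamlit script. The sketch below is hypothetical (the repository's `app.py` may differ): `qa_chain_factory` stands in for the project's `vector_embedding`/`RetrievalQA` setup, and the body is wrapped in a function only so the sketch loads without Streamlit installed; in a real `app.py` these statements run at module level.

```python
def render_app(qa_chain_factory):
    """Hypothetical sketch of the Streamlit flow described above."""
    import streamlit as st  # local import: sketch loads even without streamlit

    st.title("Chat with the US Census PDFs")

    # The "Embed Documents" button builds the vector index once and
    # caches the resulting QA chain in session state.
    if st.button("Embed Documents"):
        st.session_state["qa_chain"] = qa_chain_factory()
        st.success("Documents processed.")

    # Questions are answered only after the index exists.
    question = st.text_input("Enter your question from the PDFs")
    if question and "qa_chain" in st.session_state:
        answer = st.session_state["qa_chain"].invoke({"query": question})
        st.write(answer["result"])
```

Caching the chain in `st.session_state` matters because Streamlit re-runs the whole script on every interaction; without it, the documents would be re-embedded on each question.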
- Collaborations are welcome ❤️
- I would like to thank Krish Naik
- LinkedIn: Nebeyou Musie
- Gmail: nebeyoumusie@gmail.com
- Telegram: Nebeyou Musie