We are going to build a user-friendly news research tool for easy information retrieval. Users can enter article URLs and ask questions to get relevant insights from the real-estate domain. (Its features can be extended to any domain.)
- Load URLs to fetch article content.
- Process article content through LangChain's UnstructuredURLLoader.
- Build embedding vectors with HuggingFace embeddings and use FAISS as the vector store for fast, effective retrieval of relevant information.
- Interact with the LLM (Llama3 via Groq) by entering queries and receiving answers along with source URLs.
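To make the retrieval idea in the list above concrete, here is a minimal, dependency-free sketch: split article text into overlapping chunks, turn each chunk into a vector, and return the chunks closest to a query together with their source URL. It is a toy stand-in, not the project's code. The bag-of-words `embed` function takes the place of the HuggingFace embedding model, the linear scan in `top_k` takes the place of FAISS, and every function name, chunk size, and sample URL is an assumption made for illustration.

```python
import math
import re
from collections import Counter


def split_text(text, chunk_size=200, overlap=50):
    """Split text into character chunks that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current chunk already reaches the end of the text
        start += step
    return chunks


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(docs, chunk_size=200, overlap=50):
    """docs maps source URL -> article text; returns (vector, chunk, url) triples."""
    index = []
    for url, text in docs.items():
        for chunk in split_text(text, chunk_size, overlap):
            index.append((embed(chunk), chunk, url))
    return index


def top_k(index, query, k=2):
    """Return the k (chunk, url) pairs most similar to the query (linear scan)."""
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    return [(chunk, url) for _, chunk, url in ranked[:k]]


# Hypothetical sample data, not real articles.
docs = {
    "https://example.com/rates": "Mortgage rates rose to seven percent this week.",
    "https://example.com/homes": "New home sales fell as inventory stayed tight.",
}
index = build_index(docs)
best_chunk, best_url = top_k(index, "What happened to mortgage rates?", k=1)[0]
print(best_url)
```

Keeping the source URL next to each chunk is what lets the tool report where an answer came from. A real embedding model captures meaning rather than shared words, and FAISS avoids scanning every vector, but the flow of data is the same.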
- Run the following command to install all dependencies:
pip install -r requirements.txt
- Create a .env file with your Groq credentials as follows:
GROQ_API_KEY=GROQ_API_KEY_HERE
- Run the Streamlit app with the following command:
streamlit run main.py
The web app will open in your browser after the set-up is complete.
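For the .env step in the set-up, the sketch below shows one way an app could read `GROQ_API_KEY` and fail early if it is missing or still set to the placeholder. This README does not say how `main.py` loads the file (the python-dotenv package is a common choice), so the stdlib-only helpers here, `parse_env`, `load_env_file`, and `require_api_key`, are hypothetical names used only for illustration.

```python
import os


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env_file(path=".env"):
    """Copy variables from the file into os.environ without overriding existing ones."""
    with open(path, encoding="utf-8") as handle:
        for key, value in parse_env(handle.read()).items():
            os.environ.setdefault(key, value)


def require_api_key(env=None):
    """Return the Groq key, or raise with a clear message if it is unset."""
    env = os.environ if env is None else env
    key = env.get("GROQ_API_KEY", "")
    if not key or key == "GROQ_API_KEY_HERE":
        raise RuntimeError("Set GROQ_API_KEY in your .env file")
    return key


# Hypothetical file contents; a real key should never be committed to a repository.
sample = "# Groq credentials\nGROQ_API_KEY=example-key\n"
settings = parse_env(sample)
print(require_api_key(settings))
```

Checking for the placeholder value catches the common mistake of copying the template line without replacing it.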
- On the sidebar, you can input URLs directly.
- Initiate the data loading and processing by clicking "Process URLs."
- Observe the system as it splits the text and generates embedding vectors using HuggingFace's embedding model.
- The embeddings will be stored in FAISS.
- You can now ask a question and get an answer based on those news articles. Make sure the URLs are accessible: if a site denies access when the article is fetched, the LLM will have no information to answer from.
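The last step above depends on article text actually having been fetched. As a hedged sketch, the helpers below separate URLs that yielded text from those that did not (a blocked request can leave the loader with little or no content), then assemble a prompt from retrieved passages while collecting their source URLs. The names `usable_documents` and `build_prompt`, the prompt wording, and the sample URLs are all assumptions, not the project's implementation.

```python
def usable_documents(fetched):
    """fetched maps URL -> extracted text; returns (usable dict, skipped URL list)."""
    usable, skipped = {}, []
    for url, text in fetched.items():
        if text and text.strip():
            usable[url] = text.strip()
        else:
            skipped.append(url)  # nothing was extracted, e.g. access was denied
    return usable, skipped


def build_prompt(question, passages):
    """passages is a list of (text, url) pairs; returns (prompt, unique source URLs)."""
    if not passages:
        raise ValueError("No article text available; check that the URLs are accessible")
    context = "\n\n".join(f"[{i}] {text}" for i, (text, _) in enumerate(passages, 1))
    # dict.fromkeys removes duplicate URLs while keeping their first-seen order.
    sources = list(dict.fromkeys(url for _, url in passages))
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return prompt, sources


# Hypothetical fetch results: the second URL returned no text.
fetched = {
    "https://example.com/rates": "Mortgage rates rose this week.",
    "https://example.com/blocked": "",
}
usable, skipped = usable_documents(fetched)
passages = [(text, url) for url, text in usable.items()]
prompt, sources = build_prompt("What happened to rates?", passages)
print(skipped)
print(sources)
```

Surfacing the skipped URLs to the user before they ask a question makes an empty or unhelpful answer easier to diagnose.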