RockyBot is a simple Streamlit application that answers questions about news articles using LangChain, OpenAI embeddings, and FAISS vector search. Enter up to 3 news URLs, and RockyBot will retrieve relevant answers based on the content of those pages.
- Load news articles from provided URLs
- Split content into meaningful text chunks
- Generate embeddings using OpenAI
- Store/retrieve embeddings using FAISS vector index
- Ask questions and get source-based answers using a RetrievalQA chain
- Source tracking for transparent results
- Clone the repo
git clone https://github.com/yourusername/rockybot.git
cd rockybot
- Create a virtual environment
python -m venv venv
source venv/bin/activate # On Windows use: venv\Scripts\activate
- Install dependencies
pip install -r requirements.txt
- Set up the .env file
Create a .env file in the root directory with your OpenAI API key:
OPENAI_API_KEY=your_openai_api_key_here
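How app.py reads this file is not shown here; projects like this often call `load_dotenv` from python-dotenv, but that is an assumption. As a rough illustration of what such loading does, here is a standard-library sketch. The helper names `parse_env` and `load_env_file` are hypothetical and not part of RockyBot.

```python
import os


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_env_file(path: str = ".env") -> None:
    """Copy entries from the file into os.environ without overriding existing ones."""
    try:
        with open(path, encoding="utf-8") as handle:
            parsed = parse_env(handle.read())
    except FileNotFoundError:
        return  # No .env file: rely on whatever is already in the environment.
    for key, value in parsed.items():
        os.environ.setdefault(key, value)
```

After `load_env_file()`, the key would be available as `os.environ.get("OPENAI_API_KEY")`.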
Run the app with:
streamlit run app.py
Then:
- Input up to 3 news URLs in the sidebar.
- Click Process URLs to fetch and vectorize the content.
- Type your question in the input box.
- RockyBot will answer your question with sources.
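Before the URLs from the sidebar are fetched, it can help to tidy them up. The sketch below is one possible way to do that, assuming the inputs arrive as a list of strings; `clean_urls` and `MAX_URLS` are hypothetical names, not taken from app.py.

```python
from urllib.parse import urlparse

MAX_URLS = 3  # mirrors the "up to 3 news URLs" limit in the steps above


def clean_urls(raw_inputs: list[str], limit: int = MAX_URLS) -> list[str]:
    """Return stripped, de-duplicated http(s) URLs, at most `limit` of them."""
    cleaned: list[str] = []
    for raw in raw_inputs:
        url = raw.strip()
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue  # skip blanks and anything that is not a web URL
        if url not in cleaned:
            cleaned.append(url)
    return cleaned[:limit]
```

Blank sidebar fields and duplicates are dropped, so only usable URLs reach the loader.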
- WebBaseLoader scrapes the content from the URL.
- RecursiveCharacterTextSplitter splits the content into chunks.
- OpenAIEmbeddings generates vector representations of those chunks.
- FAISS indexes the vectors for fast retrieval.
- RetrievalQAWithSourcesChain uses a language model (via ChatOpenAI) to answer user queries with source tracking.
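The steps above can be illustrated with a toy, standard-library version of the pipeline. This is not RockyBot's code: word-count vectors stand in for OpenAIEmbeddings, a brute-force scan stands in for FAISS, fixed-size chunks stand in for RecursiveCharacterTextSplitter, and the language-model answer step is left out. All names (`split_text`, `embed`, `ToyIndex`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Fixed-size character chunks with overlap (a simplification of recursive splitting)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
        start += chunk_size - overlap
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. Stands in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyIndex:
    """Brute-force vector store that remembers which source each chunk came from."""

    def __init__(self) -> None:
        self.entries: list[tuple[Counter, str, str]] = []

    def add(self, text: str, source: str) -> None:
        for chunk in split_text(text):
            self.entries.append((embed(chunk), chunk, source))

    def search(self, query: str, k: int = 2) -> list[tuple[str, str]]:
        """Return the k most similar (chunk, source) pairs for the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [(chunk, source) for _, chunk, source in ranked[:k]]


index = ToyIndex()
index.add("The central bank raised interest rates to curb inflation.", "https://example.com/rates")
index.add("The home team won the championship final on penalties.", "https://example.com/sport")
top_chunk, top_source = index.search("Why did interest rates rise?", k=1)[0]
print(top_source)
```

Keeping the source URL next to every chunk is what makes source tracking possible: whatever chunks the retriever returns, the answer can cite where they came from.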
rockybot/
├── app.py                  # Main Streamlit app
├── requirements.txt        # Python dependencies
├── .env                    # Your OpenAI API key (not included in repo)
└── faiss_store_openai.pkl  # Saved FAISS index (generated after first run)
- "What is the main topic of the article?"
- "Who is mentioned in the news?"
- "What are the key takeaways?"
- Only the first URL is currently processed (URLs 2 and 3 are ignored).
- The FAISS index is saved locally as faiss_store_openai.pkl.
- You may need to delete or refresh the FAISS index if input URLs change.
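One possible way to avoid deleting the index by hand is to store a fingerprint of the URLs next to it and rebuild when the fingerprint no longer matches. The sketch below assumes the index is any picklable object; RockyBot does not currently do this, and `url_fingerprint`, `save_index`, and `load_index_if_fresh` are hypothetical helpers.

```python
import hashlib
import json
import pickle
from pathlib import Path


def url_fingerprint(urls: list[str]) -> str:
    """Order-insensitive SHA-256 digest of the URL list."""
    canonical = json.dumps(sorted(u.strip() for u in urls))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def save_index(index: object, urls: list[str], path: Path) -> None:
    """Pickle the index together with the fingerprint of the URLs it was built from."""
    with path.open("wb") as handle:
        pickle.dump({"fingerprint": url_fingerprint(urls), "index": index}, handle)


def load_index_if_fresh(urls: list[str], path: Path):
    """Return the cached index, or None when it is missing or built from other URLs."""
    if not path.exists():
        return None
    # Only unpickle files you created yourself; pickle can execute arbitrary code.
    with path.open("rb") as handle:
        payload = pickle.load(handle)
    if not isinstance(payload, dict) or payload.get("fingerprint") != url_fingerprint(urls):
        return None  # stale or unrecognized file: the caller should rebuild the index
    return payload["index"]
```

A caller would try `load_index_if_fresh(urls, Path("faiss_store_openai.pkl"))` first and rebuild and call `save_index` only when it returns `None`.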