A Streamlit-powered RAG (Retrieval-Augmented Generation) application that enables users to search and query across multiple codebases. The app features code screenshot analysis capabilities and provides context-aware responses using Pinecone for vector search.
- Search across multiple GitHub repositories simultaneously
- Upload and analyze code screenshots using OCR
- RAG-powered responses using Pinecone vector database
- Interactive chat interface
- Support for switching between different codebases
- OCR capability for code screenshots
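Searching several codebases at once amounts to querying one index per codebase and merging the hits. The merge step might look like the minimal sketch below. It assumes each index returns matches as dicts with `id` and `score` keys; `merge_matches` is a hypothetical helper for illustration, not a function from this repository.

```python
from typing import Dict, List


def merge_matches(results_by_index: Dict[str, List[dict]], top_k: int = 5) -> List[dict]:
    """Combine per-index matches into one list ranked by similarity score."""
    merged = []
    for index_name, matches in results_by_index.items():
        for match in matches:
            # Tag each hit with the index it came from so the answer can cite its source.
            merged.append({**match, "index": index_name})
    # A higher score means a closer match; keep only the best top_k overall.
    merged.sort(key=lambda m: m["score"], reverse=True)
    return merged[:top_k]


if __name__ == "__main__":
    sample = {
        "codebase1-rag": [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.40}],
        "codebase2-rag": [{"id": "c", "score": 0.75}],
    }
    for hit in merge_matches(sample, top_k=2):
        print(hit["index"], hit["id"], hit["score"])
```

Scores from different indexes are only comparable when the indexes share the same embedding model and similarity metric, so this ranking should be treated as a heuristic otherwise.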
- Python 3.8+
- Tesseract OCR
- API keys for:
  - Pinecone
  - Groq
- GitHub repositories indexed in Pinecone
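The first two prerequisites can be verified before launching the app. The sketch below is one way to do that with the standard library only; `check_prerequisites` is a hypothetical helper, and it assumes the Tesseract binary is named `tesseract` and is discoverable on `PATH`.

```python
import shutil
import sys
from typing import List, Optional, Tuple


def check_prerequisites(python_version: Tuple[int, int], tesseract_path: Optional[str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version) < (3, 8):
        problems.append("Python 3.8+ is required")
    if tesseract_path is None:
        problems.append("Tesseract OCR was not found on PATH")
    return problems


if __name__ == "__main__":
    # shutil.which returns None when the executable cannot be located.
    found = check_prerequisites(sys.version_info[:2], shutil.which("tesseract"))
    if found:
        for problem in found:
            print("Missing prerequisite:", problem)
    else:
        print("All local prerequisites found.")
```

This does not check the API keys or the Pinecone indexes, which depend on your account setup.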
- Install Tesseract OCR:
# Ubuntu
sudo apt-get install tesseract-ocr
# macOS
brew install tesseract
# Windows
# Download and install from https://github.com/UB-Mannheim/tesseract/wiki
- Install Python dependencies:
pip install -r requirements.txt
- Create a .env file in the project root with your API keys:
OPENAI_API_KEY=your-key
GROQ_API_KEY=your-key
PINECONE_API_KEY=your-key
PINECONE_ENV=us-east-1
PINECONE_INDEX_SECURE=codebase1-rag
PINECONE_INDEX_CHATBOT=codebase2-rag
- Create a .streamlit/secrets.toml file:
OPENAI_API_KEY = "your-key"
GROQ_API_KEY = "your-key"
PINECONE_API_KEY = "your-key"
PINECONE_ENV = "us-east-1"
PINECONE_INDEX_SECURE = "codebase1-rag"
PINECONE_INDEX_CHATBOT = "codebase2-rag"
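Because the same six values appear in both files, it can help to resolve them in one place and fail early when any is missing. The following is a minimal sketch, not the code in chat.py: `load_settings` and `REQUIRED_KEYS` are hypothetical names, and the sketch assumes the .env values have already been loaded into the process environment (for example with a tool such as python-dotenv).

```python
import os
from typing import Dict, Mapping, Optional

# Key names taken from the configuration files shown above.
REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "GROQ_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_ENV",
    "PINECONE_INDEX_SECURE",
    "PINECONE_INDEX_CHATBOT",
]


def load_settings(
    secrets: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Resolve each required key from `secrets` first, then from the environment."""
    secrets = secrets or {}
    environ = os.environ if environ is None else environ
    settings, missing = {}, []
    for key in REQUIRED_KEYS:
        value = secrets.get(key) or environ.get(key)
        if value:
            settings[key] = value
        else:
            missing.append(key)
    if missing:
        # Report every missing key at once rather than failing on the first.
        raise ValueError("Missing configuration values: " + ", ".join(missing))
    return settings
```

Inside a Streamlit app, a mapping such as `st.secrets` could be passed as `secrets`; when running elsewhere, omitting it falls back to environment variables.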
- Create and activate a virtual environment (optional):
python -m venv venv
source venv/bin/activate # On Windows: .\venv\Scripts\activate
- Run the Streamlit app:
streamlit run chat.py
- Open your browser and navigate to http://localhost:8501
- Select one or both codebases to search from the available options
- (Optional) Upload a code screenshot for specific code-related questions
- Enter your query in the chat input
- View the AI-powered response based on the selected codebases and uploaded screenshot
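The steps above can be sketched as two small functions: one that turns a screenshot into text, and one that assembles the final prompt from the retrieved context, the OCR text, and the query. This is an illustrative sketch rather than the logic in chat.py; `extract_code_text` and `build_prompt` are hypothetical names, and the OCR function assumes the `pytesseract` and `Pillow` packages are installed alongside Tesseract.

```python
from typing import List


def extract_code_text(image_path: str) -> str:
    """Run Tesseract OCR on a screenshot and return the recognized text."""
    # Imported lazily so the rest of the module works without the OCR stack installed.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image).strip()


def build_prompt(query: str, contexts: List[str], screenshot_text: str = "") -> str:
    """Assemble the text sent to the language model from the available pieces."""
    parts = []
    if contexts:
        # Separate retrieved snippets so the model can tell them apart.
        parts.append("Context from the selected codebases:\n" + "\n---\n".join(contexts))
    if screenshot_text:
        parts.append("Code from the uploaded screenshot:\n" + screenshot_text)
    parts.append("Question: " + query)
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("What does foo do?", ["def foo():\n    return 1"], "foo()"))
```

Keeping the screenshot text in its own labeled section lets the model distinguish OCR output, which may contain recognition errors, from code retrieved out of the index.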
- Fork this repository
- Create a new app on Streamlit Cloud
- Connect your GitHub repository
- Add your secrets in the Streamlit Cloud dashboard:
  - Go to App Settings > Secrets
  - Add all the required API keys and configuration values
- Deploy the app
.
├── chat.py # Main Streamlit application
├── embeddings.py # Embedding generation utilities
├── rag_utils.py # RAG search and processing functions
├── requirements.txt # Python dependencies
└── packages.txt # System dependencies (Tesseract)
.env                     # not included; create this yourself
.streamlit/secrets.toml  # not included; create this yourself
- Requires pre-indexed repositories in Pinecone
- OCR quality depends on screenshot clarity
- API keys required for full functionality