A modern, AI-powered document question and answer system built with Python and OpenAI's API.
- Modern GUI: Sleek CustomTkinter interface with dark theme
- Multiple AI Models: Choose from GPT-4o Mini, GPT-4o, GPT-3.5 Turbo, or GPT-4
- Smart PDF Processing: Intelligent text chunking with overlap for better context
- Vector Search: Advanced similarity search for relevant document sections
- Real-time Q&A: Interactive question answering with progress indicators
document_searcher/
├── main.py # Original command-line interface
├── modern_gui.py # Modern GUI application
├── run_gui.py # GUI launcher script
├── query_manager.py # Command-line query interface
├── qa_agent.py # AI model interface
├── document_loader.py # PDF processing and chunking
├── embeddings_manager.py # Vector embeddings management
├── vector_search.py # Similarity search
├── pdf_metadata.py # PDF metadata extraction
├── model_comparison.py # Model performance testing
├── gui_app.py # Original GUI (legacy)
├── api_testing.py # API testing utilities
├── view_usage.py # Usage monitoring
└── README.md # This file
-
Clone the repository:
git clone https://github.com/JamesKruik/document-qa-system.git cd document-qa-system/document_searcher
-
Install dependencies:
pip install openai python-dotenv PyPDF2 numpy customtkinter
-
Set up environment variables:
- Create a
.env
file in thedocument_searcher
directory - Add your OpenAI API key:
OPENAI_API_KEY=your_api_key_here
- Create a
python modern_gui.py
python main.py
python query_manager.py
python model_comparison.py
Model | Cost | Accuracy | Best For |
---|---|---|---|
GPT-4o Mini | Low | Good | Simple questions |
GPT-4o | Medium | Excellent | Most tasks |
GPT-3.5 Turbo | Very Low | Medium | Technical content |
GPT-4 | High | Outstanding | Complex reasoning |
- Load PDFs: Select and load your PDF documents
- Process Documents: Create vector embeddings for intelligent search
- Ask Questions: Type questions and get accurate answers
- Choose Models: Select the AI model that fits your needs
- PDF Processing: PyPDF2 for text extraction
- Text Chunking: Smart chunking with sentence boundary detection
- Embeddings: OpenAI's text-embedding-3-small model
- Vector Search: Cosine similarity with multiple chunk retrieval
- GUI Framework: CustomTkinter for modern interface
- "How do I configure the device settings?"
- "What are the technical specifications?"
- "What troubleshooting steps are available?"
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
This project is open source and available under the MIT License.
- OpenAI for the powerful API
- CustomTkinter for the modern GUI framework
- PyPDF2 for PDF processing capabilities