Fane-Nathan/Research-Assistant

Research Assistant 🔍

An AI research assistant that combines hybrid search with Retrieval-Augmented Generation (RAG) to help you find, understand, and synthesize academic content.

🌐 Live Demo

Try it now: Research Assistant

Experience the full functionality through our interactive web interface:

  • 📚 Research paper search and analysis
  • 🔍 Hybrid search capabilities (semantic + keyword)
  • 🤖 Real-time RAG-powered question answering
  • 📊 Document processing and knowledge base management

✨ Features

🎯 Intelligent Search

  • Hybrid Retrieval: Combines semantic (vector) and keyword (BM25) search
  • Multi-Source: arXiv papers, web articles, custom documents
  • Smart Ranking: Fuses the semantic and keyword result lists into a single relevance-ordered ranking
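The README does not name the fusion method it uses. As one common option, here is a minimal sketch of Reciprocal Rank Fusion (RRF), which merges the ranked lists from vector and BM25 search by giving each document a score of 1 / (k + rank) per list and summing. The function name `reciprocal_rank_fusion` and the constant k = 60 (a widely used default) are assumptions for illustration, not the project's code.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Hypothetical helper: each document scores sum(1 / (k + rank)) over every
    list it appears in, where rank starts at 1 for the top hit.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    vector_hits = ["paper_a", "paper_b", "paper_c"]  # e.g. from embedding search
    bm25_hits = ["paper_b", "paper_d", "paper_a"]    # e.g. from keyword search
    for doc_id, score in reciprocal_rank_fusion([vector_hits, bm25_hits]):
        print(f"{doc_id}: {score:.4f}")
```

Because RRF only looks at ranks, it does not require cosine similarities and BM25 scores to be on comparable scales, which is why it is a popular default for hybrid search.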

🧠 AI-Powered Analysis

  • RAG Integration: Context-aware answers using retrieved documents
  • Multiple LLM Support: Google Gemini, OpenAI, and more
  • Flexible Modes: Strict RAG (document-only) or hybrid (document + general knowledge)
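The difference between the two modes largely comes down to the instructions sent to the LLM along with the retrieved passages. A minimal sketch, assuming retrieved chunks arrive as plain strings: `build_prompt` and the mode name `"strict_rag"` are hypothetical (only `"hybrid_rag"` appears in the Python API example below), and the prompt wording is illustrative.

```python
def build_prompt(question: str, chunks: list[str], mode: str = "hybrid_rag") -> str:
    """Assemble an LLM prompt from retrieved chunks.

    Hypothetical helper. "strict_rag" restricts the model to the supplied
    context; "hybrid_rag" also allows general knowledge, clearly labeled.
    """
    instructions = {
        "strict_rag": (
            "Answer using ONLY the numbered context passages below and cite them as [n]. "
            "If they do not contain the answer, say that you don't know."
        ),
        "hybrid_rag": (
            "Answer using the numbered context passages below and cite them as [n]. "
            "You may add general knowledge, but label it as such."
        ),
    }
    if mode not in instructions:
        raise ValueError(f"unknown mode: {mode!r}")
    # Number each chunk so the model can cite its sources.
    context = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return f"{instructions[mode]}\n\nContext:\n{context}\n\nQuestion: {question}\nAnswer:"
```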

📊 Data Processing

  • Smart Chunking: Sentence-boundary aware document segmentation
  • High-Quality Embeddings: Google Gemini text embeddings
  • Efficient Indexing: Optimized storage and retrieval
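Sentence-boundary-aware chunking means chunks end at sentence breaks instead of mid-sentence, so each embedded piece of text stays readable on its own. A minimal sketch, assuming a simple regex sentence splitter and a character budget; `chunk_by_sentences`, `max_chars`, and `overlap` are hypothetical names, not the project's API.

```python
import re

# Naive sentence splitter (an assumption): break on whitespace that follows
# ".", "!" or "?". A dedicated sentence tokenizer handles abbreviations better.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_by_sentences(text: str, max_chars: int = 500, overlap: int = 1) -> list[str]:
    """Pack whole sentences into chunks of roughly max_chars characters.

    Sentences are never cut; a single sentence longer than max_chars becomes
    its own chunk. The last `overlap` sentences of a chunk are repeated at
    the start of the next one when they fit, to preserve context.
    """
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        if current and len(" ".join(current + [sentence])) > max_chars:
            chunks.append(" ".join(current))
            # Start the next chunk with the trailing sentences as overlap...
            current = current[-overlap:] if overlap > 0 else []
            # ...unless the overlap alone would push it over the budget.
            if current and len(" ".join(current + [sentence])) > max_chars:
                current = []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Overlapping a sentence between neighboring chunks is a common trade-off: it costs some index size but reduces the chance that a retrieved chunk starts without the context it depends on.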

🖥️ Multiple Interfaces

  • Web App: Beautiful Streamlit interface
  • CLI Tool: Powerful command-line interface
  • Python API: Programmatic access for integration

🚀 Quick Start

Prerequisites

  • Python 3.9+
  • Google Gemini API key (free tier available)

Installation

  1. Clone the repository

    git clone https://github.com/Fane-Nathan/Research-Assistant.git
    cd Research-Assistant
  2. Install dependencies

    pip install -r requirements.txt
  3. Configure environment

    cp .env.example .env
    # Edit .env with your API keys
  4. Run the web interface

    streamlit run app.py

📖 Documentation

πŸ› οΈ Usage Examples

Web Interface

Visit our live demo or run locally:

streamlit run app.py

Command Line

# Search arXiv papers
python scripts/cli.py search "machine learning transformers"

# Process custom documents
python scripts/cli.py process --url "https://example.com/paper.pdf"

# Ask questions about your documents
python scripts/cli.py query "What are the main contributions of this paper?"

Python API

from hybrid_search_rag import ResearchAssistant

# Initialize
assistant = ResearchAssistant()

# Search and get answers
results = assistant.search_and_answer(
    query="What are recent advances in neural networks?",
    mode="hybrid_rag"
)

πŸ—οΈ Architecture

Research Assistant is built with a modular, scalable architecture:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Data Sources  β”‚    β”‚   Processing    β”‚     β”‚   Retrieval     β”‚
β”‚                 β”‚    β”‚                 β”‚     β”‚                 β”‚
β”‚ β€’ arXiv API     │───▢│ β€’ Text Chunking │───▢│ β€’ Vector Search β”‚
β”‚ β€’ Web Crawling  β”‚    β”‚ β€’ Embeddings    β”‚     β”‚ β€’ BM25 Search   β”‚
β”‚ β€’ Custom Docs   β”‚    β”‚ β€’ Indexing      β”‚     β”‚ β€’ Hybrid Fusion β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”               β”‚
β”‚  User Interface β”‚    β”‚   Generation    β”‚               β”‚
β”‚                 β”‚    β”‚                 β”‚               β”‚
β”‚ β€’ Web App       │◀───│ β€’ LLM Reasoning β”‚β—€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
β”‚ β€’ CLI Tool      β”‚    β”‚ β€’ RAG Synthesis β”‚
β”‚ β€’ Python API    β”‚    β”‚ β€’ Answer Format β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
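The Retrieval stage runs keyword (BM25) search alongside vector search. As a sketch of what the BM25 side computes, here is a minimal in-memory Okapi BM25 scorer. The `BM25` class, the regex tokenizer, and the defaults k1 = 1.5 and b = 0.75 are illustrative assumptions, not the project's implementation, which may rely on a library instead.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BM25:
    """Minimal Okapi BM25 scorer over an in-memory list of documents."""

    def __init__(self, docs: list[str], k1: float = 1.5, b: float = 0.75) -> None:
        self.k1, self.b = k1, b
        self.doc_tokens = [tokenize(d) for d in docs]
        self.term_counts = [Counter(toks) for toks in self.doc_tokens]
        self.avgdl = sum(len(t) for t in self.doc_tokens) / max(len(docs), 1)
        # Document frequency: how many documents contain each term.
        df = Counter(term for toks in self.doc_tokens for term in set(toks))
        n = len(docs)
        # IDF variant with +1 inside the log so it never goes negative.
        self.idf = {t: math.log((n - f + 0.5) / (f + 0.5) + 1) for t, f in df.items()}

    def score(self, query: str, index: int) -> float:
        """BM25 score of document `index` for `query`."""
        tf = self.term_counts[index]
        dl = len(self.doc_tokens[index])
        total = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            freq = tf[term]
            # Term-frequency saturation (k1) with document-length normalization (b).
            norm = freq + self.k1 * (1 - self.b + self.b * dl / self.avgdl)
            total += self.idf[term] * freq * (self.k1 + 1) / norm
        return total

    def search(self, query: str, top_k: int = 3) -> list[tuple[int, float]]:
        """Return (document index, score) pairs, best first."""
        scored = [(i, self.score(query, i)) for i in range(len(self.doc_tokens))]
        scored.sort(key=lambda pair: (-pair[1], pair[0]))
        return scored[:top_k]
```

`k1` controls how quickly repeated occurrences of a term stop adding to the score, and `b` controls how strongly longer documents are penalized. The ranked indices that `search` returns are the kind of list a hybrid fusion step consumes.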

🧪 Testing

Run the test suite:

# Unit tests
python -m pytest tests/unit/

# Integration tests  
python -m pytest tests/integration/

# Deployment verification
python deployment_verification.py

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes
  4. Add tests for your changes
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

πŸ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Links

πŸ™ Acknowledgments

  • Google Gemini for embeddings and LLM capabilities
  • The open-source community for amazing libraries
  • arXiv for providing free access to research papers

Built with ❤️ for the research community

⭐ Star this repo | 🐛 Report Bug | 💡 Request Feature

About

Research Assistant brings together modern NLP, information retrieval, and user-experience design to make academic research more accessible, efficient, and insightful. Whether you're a researcher, educator, or curious learner, it helps you discover and understand academic knowledge.
