πŸ₯ LoopHealth - Voice-Enabled Hospital Network Assistant


LoopHealth is an intelligent voice-enabled hospital network assistant powered by RAG (Retrieval-Augmented Generation) technology. Users can interact naturally through voice to find hospitals, get information about medical facilities, and receive spoken responses in real-time.

## ✨ Features

- 🎤 **Voice-First Interface** - natural voice interaction with speech-to-text and text-to-speech
- 🔍 **Intelligent Hospital Search** - RAG-based semantic search using a FAISS vector database
- 💬 **Conversational Memory** - context-aware conversations with session management
- 🤖 **AI-Powered Responses** - Google Gemini 2.5 Flash for natural language understanding
- 🌐 **Modern Web UI** - clean, responsive React interface with real-time feedback
- 🔄 **Multi-City Support** - smart disambiguation for hospitals across different cities
- ⚡ **Real-Time Processing** - fast audio processing and response generation

πŸ—οΈ Architecture

```text
LoopHealth/
├── backend/                  # FastAPI backend server
│   ├── main.py               # Main API endpoints and orchestration
│   ├── rag/                  # RAG implementation
│   │   ├── vector_store.py   # FAISS vector database interface
│   │   ├── retriever.py      # Hospital retrieval logic
│   │   ├── build_index.py    # Index-building utilities
│   │   ├── hospitals.faiss   # Pre-built FAISS index
│   │   └── hospitals_meta.pkl # Hospital metadata
│   ├── data/                 # Source data
│   │   └── List of GIPSA Hospitals - Sheet1.csv
│   └── requirements.txt      # Python dependencies
│
└── frontend/                 # React frontend application
    ├── src/
    │   ├── App.jsx           # Main application component
    │   ├── App.css           # Styling
    │   └── index.js          # Entry point
    ├── public/               # Static assets
    └── package.json          # Node dependencies
```

πŸ› οΈ Technology Stack

### Backend

- **FastAPI** - high-performance async web framework
- **Faster Whisper** - offline speech-to-text (a faster reimplementation of OpenAI's Whisper)
- **Google Gemini 2.5 Flash** - large language model for natural responses
- **gTTS** - Google Text-to-Speech for voice synthesis
- **FAISS** - Facebook AI Similarity Search, used as the vector database
- **Sentence Transformers** - semantic embeddings (`all-MiniLM-L6-v2`)
- **Python 3.8+** - core programming language

### Frontend

- **React 19.2** - modern UI library
- **React Icons** - icon components
- **TailwindCSS** - utility-first CSS framework
- **Web Audio API** - audio recording and playback

## 📋 Prerequisites

- Python 3.8+ installed
- Node.js 16+ and npm installed
- Google Gemini API key (available from Google AI Studio)

## 🚀 Installation & Setup

### 1. Clone the Repository

```bash
git clone https://github.com/yourusername/LoopHealth.git
cd LoopHealth
```

### 2. Backend Setup

```bash
cd backend

# Create a virtual environment
python -m venv venv

# Activate the virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Create the .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env
```

> **Important:** Replace `your_api_key_here` with your actual Google Gemini API key.

### 3. Frontend Setup

```bash
cd ../frontend

# Install dependencies
npm install
```

## ▶️ Running the Application

### Start the Backend Server

```bash
cd backend
# Activate the virtual environment if it is not already active
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS/Linux

# Run the FastAPI server
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

The backend API will be available at http://localhost:8000.

### Start the Frontend Development Server

```bash
cd frontend

# Run the React development server
npm start
```

The frontend will open automatically at http://localhost:3000.

## 📖 Usage

1. **Open the application** - navigate to http://localhost:3000 in your browser
2. **Listen to the introduction** - Loop AI greets you automatically
3. **Click the microphone** - press the microphone button to start recording
4. **Ask your question** - speak naturally, e.g., "Find me hospitals in Mumbai"
5. **Receive the response** - Loop AI replies with relevant information as voice and text
6. **Ask follow-ups** - continue the conversation with context-aware follow-up questions

### Example Queries

- "Show me hospitals in Bangalore"
- "Which hospitals are near Andheri?"
- "Tell me about Apollo Hospital"
- "What's the address of the hospital you mentioned?"

## 🔌 API Endpoints

### GET /introduction

Returns the Loop AI introduction as audio and text.

**Response:**

```json
{
  "text_response": "Hello, I am Loop AI...",
  "audio_base64": "base64_encoded_audio",
  "audio_format": "mp3"
}
```
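A client consuming this endpoint has to base64-decode `audio_base64` before it can play the MP3. A minimal stdlib-only sketch with a made-up payload (the project's actual frontend does the equivalent in JavaScript):

```python
import base64

# Hypothetical payload mimicking the /introduction response schema above
response = {
    "text_response": "Hello, I am Loop AI...",
    "audio_base64": base64.b64encode(b"\xff\xf3fake-mp3-bytes").decode("ascii"),
    "audio_format": "mp3",
}

# Decode the base64 string back into raw MP3 bytes, ready for playback
audio_bytes = base64.b64decode(response["audio_base64"])
```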

### POST /voice

The main voice-interaction endpoint.

**Request:**

- **Headers:** `X-Session-ID` (optional) - session identifier for conversation continuity
- **Body:** `multipart/form-data` with the audio file

**Response:**

```json
{
  "text_response": "I found 3 hospitals in Mumbai...",
  "audio_base64": "base64_encoded_audio",
  "audio_format": "mp3",
  "session_id": "uuid-session-id"
}
```

## 🧠 How It Works

### Voice Processing Pipeline

1. **Speech-to-Text (STT)**
   - The user speaks into the microphone
   - Audio is recorded in WebM format
   - Faster Whisper transcribes it to text
2. **Retrieval-Augmented Generation (RAG)**
   - The user query is embedded using Sentence Transformers
   - FAISS searches the vector database for the top-k most similar hospitals
   - The retrieved hospital records provide context
3. **LLM Processing**
   - Conversation history is retrieved from session memory
   - Google Gemini generates a natural-language response
   - The prompt combines the conversation history, retrieved hospitals, and user query
4. **Text-to-Speech (TTS)**
   - The response text is stripped of markdown formatting
   - gTTS converts it to MP3 audio
   - The audio is base64-encoded and sent to the frontend
5. **Frontend Playback**
   - The audio is decoded and played automatically
   - The text is displayed in the response box
   - The session ID is kept for follow-up questions
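Step 4's markdown cleanup can be sketched with a couple of regular expressions; the patterns below are illustrative stand-ins, not the backend's actual cleaning rules:

```python
import re

def clean_for_tts(text: str) -> str:
    """Strip common markdown so the TTS engine doesn't read symbols aloud."""
    text = re.sub(r"[*_`#]+", "", text)                   # bold/italic/code/heading markers
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # [label](url) -> label
    return re.sub(r"\s+", " ", text).strip()              # collapse whitespace

print(clean_for_tts("**Apollo Hospital** is in [Mumbai](https://example.com)."))
# -> Apollo Hospital is in Mumbai.
```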

### Session Management

- Each conversation has a unique session ID
- The last 5 exchanges (10 messages) are stored per session
- This enables context-aware follow-up questions
- Hospitals present in multiple cities are disambiguated automatically
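A five-exchange window like the one described above can be kept with a bounded deque. This is a hypothetical stdlib-only sketch, not the project's actual session store (`sessions` and `remember` are made-up names):

```python
import uuid
from collections import deque

# In-memory session store: session_id -> bounded message history.
# maxlen=10 keeps the last 5 exchanges (5 user + 5 assistant messages).
sessions = {}

def remember(session_id, role, content):
    """Append a message, creating a session ID and evicting old turns as needed."""
    if not session_id:
        session_id = str(uuid.uuid4())
    history = sessions.setdefault(session_id, deque(maxlen=10))
    history.append({"role": role, "content": content})
    return session_id

sid = remember(None, "user", "Find me hospitals in Mumbai")
remember(sid, "assistant", "I found 3 hospitals in Mumbai...")
```

Because the deque is bounded, older turns fall out automatically once the window is full, which keeps prompts to the LLM short.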

πŸ—‚οΈ Data

The system uses the GIPSA Hospitals dataset containing:

  • Hospital names
  • Cities and locations
  • Addresses
  • Contact information

Data is pre-processed and indexed using FAISS for fast semantic search.
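FAISS and the MiniLM embeddings do the heavy lifting in the real system; to show the retrieval idea itself, here is a dependency-free sketch that ranks toy 3-dimensional "hospital embeddings" by cosine similarity (all names and vectors below are made up, and the real index stores 384-dimensional vectors):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings standing in for the FAISS index contents
hospitals = {
    "Apollo Hospital, Mumbai":    [0.9, 0.1, 0.0],
    "Fortis Hospital, Bangalore": [0.1, 0.9, 0.0],
    "AIIMS, Delhi":               [0.0, 0.1, 0.9],
}

query_vec = [0.8, 0.2, 0.1]  # pretend embedding of "hospitals in Mumbai"
top_k = sorted(hospitals, key=lambda n: cosine(query_vec, hospitals[n]), reverse=True)[:2]
```

The production code replaces the linear scan with a FAISS index lookup, which scales to much larger datasets.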

## 🔧 Configuration

### Backend Configuration

Edit `backend/.env`:

```env
GEMINI_API_KEY=your_gemini_api_key
```

### Model Configuration

In `backend/main.py`:

- **Whisper model:** `base` (can be upgraded to `small`, `medium`, or `large`)
- **LLM model:** `gemini-2.5-flash` (free-tier friendly)
- **Embedding model:** `all-MiniLM-L6-v2` (384 dimensions)
- **Max conversation history:** 5 exchanges

### Frontend Configuration

In `frontend/src/App.jsx`:

- **Backend URL:** `http://localhost:8000`
- **Audio format:** WebM for recording, MP3 for playback

## 🧪 Testing

### Backend Testing

```bash
cd backend
# Test the introduction endpoint
curl http://localhost:8000/introduction

# Test the voice endpoint (requires an audio file)
curl -X POST http://localhost:8000/voice \
  -F "file=@test_audio.webm" \
  -H "X-Session-ID: test-session"
```

### Frontend Testing

```bash
cd frontend
npm test
```

## 🚧 Troubleshooting

### Backend Issues

**Issue:** `GEMINI_API_KEY not found`

- **Solution:** ensure a `.env` file with a valid API key exists in the `backend/` directory

**Issue:** `ModuleNotFoundError`

- **Solution:** activate the virtual environment and reinstall the dependencies

**Issue:** FAISS index not found

- **Solution:** run `python rag/build_index.py` to rebuild the index

### Frontend Issues

**Issue:** Cannot connect to the backend

- **Solution:** ensure the backend is running on port 8000

**Issue:** Microphone not working

- **Solution:** grant the browser microphone permission

**Issue:** Audio not playing

- **Solution:** check the browser console for errors and make sure audio autoplay is allowed

## 🤝 Contributing

Contributions are welcome! Please follow these steps:

1. Fork the repository
2. Create a feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push the branch (`git push origin feature/AmazingFeature`)
5. Open a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

πŸ™ Acknowledgments

  • OpenAI Whisper - Speech recognition model
  • Google Gemini - Large language model
  • Facebook FAISS - Vector similarity search
  • Sentence Transformers - Semantic embeddings
  • GIPSA - Hospital dataset

## 📞 Contact

For questions or support, please open an issue on GitHub.

---

*Built with ❤️ for better healthcare accessibility*
