Master132408/Reinforcement-Learning

Reinforcement Learning for Hallucination Prevention in GenAI Models for Medical-Domain QA

Project Description

This project applies reinforcement learning, specifically a contextual bandit approach using LinUCB, to mitigate hallucinations when generative language models answer medical questions. For each question, the bandit dynamically selects among Google's Gemini 2.0 Flash, Phi-3 Mini, and Qwen 4B based on the question and its category, with the goal of improving the factual consistency of the answers.
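
As a concrete illustration, the LinUCB policy can be sketched in a few lines of Python. This is a minimal sketch under assumptions of our own (per-arm ridge regression, a one-hot category context, and illustrative arm names), not the repository's actual implementation in contextual_bandit_llm_selector.py:

```python
import numpy as np

class LinUCB:
    """Minimal LinUCB contextual bandit: one linear reward model per arm."""

    def __init__(self, arms, dim, alpha=1.0):
        self.arms = list(arms)
        self.alpha = alpha  # exploration strength
        # Per-arm ridge-regression state: A is the d x d design matrix,
        # b accumulates reward-weighted contexts.
        self.A = {a: np.eye(dim) for a in self.arms}
        self.b = {a: np.zeros(dim) for a in self.arms}

    def select(self, x):
        """Pick the arm with the highest upper confidence bound for context x."""
        best, best_ucb = None, -np.inf
        for a in self.arms:
            A_inv = np.linalg.inv(self.A[a])
            theta = A_inv @ self.b[a]            # ridge estimate of arm weights
            ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
            if ucb > best_ucb:
                best, best_ucb = a, ucb
        return best

    def update(self, arm, x, reward):
        """Incorporate the observed reward for the chosen arm."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In this project's setting, each arm would be one of the candidate LLMs and the reward would reflect how factually consistent the chosen model's answer was.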

Team Members

  • Agaaz Singhal
  • Krishnav Mahansaria
  • Mudit Surana

Technologies & Libraries Used

  • Python (see requirements.txt for the full package list)
  • LinUCB contextual bandit algorithm for model selection
  • LLMs: Google Gemini 2.0 Flash, Phi-3 Mini, Qwen 4B
  • JSON Lines (dataset.jsonl) for training data

Project Structure

  • ask_medical_question.py: Interactive script that takes a user's medical question and selects the best model for its category.
  • contextual_bandit_llm_selector.py: Implements the LinUCB algorithm for contextual bandit training and evaluation.
  • dataset.jsonl: Input dataset used for training, formatted in JSON Lines.
  • requirements.txt: List of all required Python packages.
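
To show how the pieces fit together, here is a hypothetical sketch of the routing step in ask_medical_question.py: given a question's category and a table of learned per-category model scores, pick the model to answer with. The names and data structure are assumptions for illustration, not the script's real interface:

```python
# Arm names assumed from the dataset fields (gemini_ans, phi3mini_ans, qwen4b_ans).
MODELS = ["gemini", "phi3mini", "qwen4b"]

def pick_model(category, scores_by_category):
    """Return the best-scoring model for this category; fall back to the
    first model when the category has not been seen during training."""
    scores = scores_by_category.get(category)
    if not scores:
        return MODELS[0]
    return max(scores, key=scores.get)
```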

Interactive Live Demo

Link: https://www.loom.com/share/15707f4c9ef34202814c8662784092ee?sid=3e6aa55f-bb40-4f82-a2a7-777dffbc22fc

Dataset Format

Each entry in dataset.jsonl should include:

{
  "input": "What is the first-line treatment for asthma?",
  "output": "Inhaled corticosteroids.",
  "subject_name": "Medicine",
  "gemini_ans": "...",
  "phi3mini_ans": "...",
  "qwen4b_ans": "..."
}
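
A minimal loader for this format might look as follows. This is a sketch: load_dataset and subject_to_context are illustrative helper names, not functions from the repository, and the one-hot category encoding is one possible way to build the bandit's context vector:

```python
import json

def load_dataset(path="dataset.jsonl"):
    """Read JSON Lines records of the format shown above."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records

def subject_to_context(subject, subjects):
    """One-hot encode the question category as a bandit context vector."""
    vec = [0.0] * len(subjects)
    if subject in subjects:
        vec[subjects.index(subject)] = 1.0
    return vec
```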
