This project applies reinforcement learning, specifically a contextual bandit approach using LinUCB, to mitigate hallucinations when generative language models answer medical questions. By dynamically selecting among models such as Google's Gemini 2.0 Flash, Phi 3 Mini, and Qwen 4B based on the question and its category, we aim to improve the factual consistency of the outputs.
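As a rough illustration of the selection mechanism, a disjoint LinUCB bandit keeps a per-arm ridge-regression estimate of expected reward plus an exploration bonus, where each arm is one of the candidate LLMs and the context is a feature vector for the question. This is a minimal sketch of the standard algorithm, not the project's exact implementation; class and parameter names here are illustrative.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB contextual bandit (illustrative sketch).

    Each arm (here: one of the three candidate LLMs) keeps its own
    ridge-regression statistics (A, b). The context x would be a
    feature vector for the question, e.g. a sentence embedding.
    """

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha                               # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # d x d Gram matrix per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted contexts

    def select(self, x: np.ndarray) -> int:
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of reward weights
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold the observed reward for the chosen arm into its statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In a training loop, `select` picks which model answers the question and `update` is called with a reward scoring how good that model's answer was.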
- Agaaz Singhal
- Krishnav Mahansaria
- Mudit Surana
- Python 3.x
- APIs:
- Libraries:
  `requests`, `numpy`, `pandas`, `matplotlib`, `scikit-learn`, `sentence-transformers` (plus the standard-library modules `json` and `sys`)
- `ask_medical_question.py`: Script that lets users input a medical question and selects the best model based on its category.
- `contextual_bandit_llm_selector.py`: Implements the LinUCB algorithm for contextual bandit training and evaluation.
- `dataset.jsonl`: Input dataset used for training, formatted as JSON Lines.
- `requirements.txt`: List of all required Python packages.
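Training the bandit requires a scalar reward for each model's answer. The repository lists `sentence-transformers`, which suggests the reward compares each model's answer to the reference `output` via embedding similarity. The sketch below uses TF-IDF cosine similarity from scikit-learn instead so it runs without downloading an embedding model; the function name and exact scoring are assumptions, not the project's actual reward.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def answer_reward(model_answer: str, reference: str) -> float:
    """Hypothetical reward: cosine similarity between a model's answer
    and the reference answer, in [0, 1].

    A sentence-transformers embedding (as listed in requirements.txt)
    would be a drop-in upgrade over TF-IDF here.
    """
    # Fit a TF-IDF vocabulary on just this pair and compare the vectors.
    vecs = TfidfVectorizer().fit_transform([model_answer, reference])
    return float(cosine_similarity(vecs[0], vecs[1])[0, 0])
```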
Link: https://www.loom.com/share/15707f4c9ef34202814c8662784092ee?sid=3e6aa55f-bb40-4f82-a2a7-777dffbc22fc
Each entry in `dataset.jsonl` should include:

```json
{
  "input": "What is the first-line treatment for asthma?",
  "output": "Inhaled corticosteroids.",
  "subject_name": "Medicine",
  "gemini_ans": "...",
  "phi3mini_ans": "...",
  "qwen4b_ans": "..."
}
```
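Since the dataset is JSON Lines (one JSON object per line), it can be loaded with the standard-library `json` module. The helper name below is hypothetical, not a function from the repository.

```python
import json

def load_dataset(path: str) -> list[dict]:
    """Read a JSON Lines file: one record per non-blank line (hypothetical helper)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Each returned dict then exposes the question (`input`), the reference answer (`output`), the category (`subject_name`), and the three candidate model answers.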