This project applies reinforcement learning, specifically a contextual bandit approach using LinUCB, to mitigate hallucinations when generative language models answer medical questions. By dynamically selecting among models such as Google's Gemini 2.0 Flash, Phi 3 Mini, and Qwen 4B based on the question and its category, we aim to improve the factual consistency of the outputs.
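As a rough illustration of the selection mechanism, a disjoint LinUCB bandit keeps a per-arm ridge-regression estimate of expected reward plus an exploration bonus, where each arm is one of the candidate LLMs and the context is a feature vector for the question. This is a minimal sketch of the standard algorithm, not the project's exact implementation; class and parameter names here are illustrative.

```python
import numpy as np

class LinUCB:
    """Disjoint LinUCB contextual bandit (illustrative sketch).

    Each arm (here: one of the three candidate LLMs) keeps its own
    ridge-regression statistics (A, b). The context x would be a
    feature vector for the question, e.g. a sentence embedding.
    """

    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha                               # exploration strength
        self.A = [np.eye(dim) for _ in range(n_arms)]    # d x d Gram matrix per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # reward-weighted contexts

    def select(self, x: np.ndarray) -> int:
        """Return the arm with the highest upper confidence bound for context x."""
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # ridge estimate of reward weights
            scores.append(theta @ x + self.alpha * np.sqrt(x @ A_inv @ x))
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        """Fold the observed reward for the chosen arm into its statistics."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

In a training loop, `select` picks which model answers the question and `update` is called with a reward scoring how good that model's answer was.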
- Agaaz Singhal
- Krishnav Mahansaria
- Mudit Surana
- Python 3.x
- APIs:
- Libraries:
  `requests`, `numpy`, `pandas`, `matplotlib`, `scikit-learn`, `sentence-transformers` (plus the standard-library modules `json` and `sys`)
- `ask_medical_question.py`: Script that lets users input a medical question and selects the best model based on its category.
- `contextual_bandit_llm_selector.py`: Implements the LinUCB algorithm for contextual bandit training and evaluation.
- `dataset.jsonl`: Input dataset used for training, formatted as JSON Lines.
- `requirements.txt`: List of all required Python packages.
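Training the bandit requires a scalar reward for each model's answer. The repository lists `sentence-transformers`, which suggests the reward compares each model's answer to the reference `output` via embedding similarity. The sketch below uses TF-IDF cosine similarity from scikit-learn instead so it runs without downloading an embedding model; the function name and exact scoring are assumptions, not the project's actual reward.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def answer_reward(model_answer: str, reference: str) -> float:
    """Hypothetical reward: cosine similarity between a model's answer
    and the reference answer, in [0, 1].

    A sentence-transformers embedding (as listed in requirements.txt)
    would be a drop-in upgrade over TF-IDF here.
    """
    # Fit a TF-IDF vocabulary on just this pair and compare the vectors.
    vecs = TfidfVectorizer().fit_transform([model_answer, reference])
    return float(cosine_similarity(vecs[0], vecs[1])[0, 0])
```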
Link: https://www.loom.com/share/15707f4c9ef34202814c8662784092ee?sid=3e6aa55f-bb40-4f82-a2a7-777dffbc22fc
Each entry in `dataset.jsonl` should include:

```json
{
  "input": "What is the first-line treatment for asthma?",
  "output": "Inhaled corticosteroids.",
  "subject_name": "Medicine",
  "gemini_ans": "...",
  "phi3mini_ans": "...",
  "qwen4b_ans": "..."
}
```
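Since the dataset is JSON Lines (one JSON object per line), it can be loaded with the standard-library `json` module. The helper name below is hypothetical, not a function from the repository.

```python
import json

def load_dataset(path: str) -> list[dict]:
    """Read a JSON Lines file: one record per non-blank line (hypothetical helper)."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Each returned dict then exposes the question (`input`), the reference answer (`output`), the category (`subject_name`), and the three candidate model answers.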