Once28/CLARA

CLARA: CLinical Audit & Regulatory Assistant 🩺💜

CLARA, whose name derives from the Latin clarus ("clear"), is an agentic platform built for the MedGemma Impact Challenge. CLARA automates the regulatory cross-examination of clinical trial protocols and has been tested on 21 CFR (Parts 11, 50, 56, 58, 211, 312, 314, etc.) and 45 CFR Part 46 (the Common Rule). The name reinforces what we stand for: clarity in complex decisions, trust in high-stakes clinical environments, and a human presence within AI that feels supportive rather than technical. In healthcare, intelligence must be clear, reliable, and approachable, and CLARA embodies all three.

Figure 1: CLARA System Architecture

CLARA tackles the critical bottleneck in clinical trials: regulatory compliance checking. By combining RAG with an FDA-auditor LLM (MedGemma via Vertex AI), CLARA automates the cross-examination of clinical trial protocols against FDA and HHS regulations.

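CLARA uses a reversed RAG design: the uploaded protocol is what gets indexed, and each regulation is used as a query against it. The protocol is therefore the source of truth in the index, and regulations are checked against it (rather than the other way around).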

Getting Started

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • A Gemini API key (free at aistudio.google.com) or a Google Cloud project with a deployed MedGemma Vertex AI endpoint
  • Run gcloud auth application-default login if using Vertex AI

1. Backend Setup

cd backend

# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate        # macOS / Linux
# venv\Scripts\activate         # Windows

# Install Python dependencies
pip install -r requirements.txt

# Copy and fill environment variables
cp .env_sample .env
# At minimum, set GEMINI_API_KEY for the Gemini Flash placeholder LLM.
# For MedGemma (Vertex AI), set GCP_PROJECT_ID, GCP_REGION, VERTEX_ENDPOINT_ID instead.

# Start the FastAPI backend
uvicorn src.server:app --reload --port 8000

2. Frontend Setup

In a separate terminal:

cd frontend
npm install
npm run dev

The React frontend is served at http://localhost:5173. The Vite dev server proxies all /api/* requests to the backend at http://localhost:8000, so no CORS configuration is required.

📁 Directory Guide

CLARA/
├── README.md
├── WRITEUP.md              # MedGemma Impact Challenge submission
├── backend/
│   ├── src/
│   │   ├── server.py       # FastAPI server: upload, audit, list, delete endpoints
│   │   ├── app.py          # Uvicorn entry point (alternative to uvicorn CLI)
│   │   ├── gemini_llm.py   # Gemini 1.5 Flash LLM wrapper (free placeholder)
│   │   ├── medgemma_llm.py # MedGemma via Vertex AI (production LLM)
│   │   ├── vector_store.py # Reversed RAG: protocol chunks as KB, CFR as query
│   │   ├── ecfr_client.py  # Live eCFR API client (21/45 CFR)
│   │   ├── graph.py        # LangGraph workflow definition
│   │   ├── nodes.py        # Retrieval and audit node implementations
│   │   ├── state.py        # Agent state schema (TypedDict)
│   │   └── prompts.py      # System prompts for FDA auditor persona
│   ├── test/
│   │   ├── evaluate_retrieval.py   # Retrieval benchmarking (precision, recall, NDCG)
│   │   ├── generate_ground_truth.py
│   │   ├── plot_results.py
│   │   ├── ground_truth/   # Annotated retrieval ground truth
│   │   └── results/        # Benchmark output (HTML report, CSV, PNG curves)
│   ├── data/
│   │   ├── chroma_db/      # Persistent vector database
│   │   └── documents/      # Sample protocols (compliant & non-compliant)
│   ├── requirements.txt
│   └── .env                # GEMINI_API_KEY, rate limits, Vertex AI config
├── frontend/
│   ├── src/
│   │   ├── App.jsx         # Root application component
│   │   ├── components/     # UI components (Header, Sidebar, modals, tutorial)
│   │   ├── hooks/          # Custom React hooks (useAudits)
│   │   ├── services/       # API client (api.js)
│   │   └── styles/         # Global CSS
│   ├── .env                # VITE_USE_MOCK=false (leave VITE_API_URL empty for proxy)
│   └── package.json
└── assets/                 # Logos and static images

🔧 Core Components

backend/src/server.py - FastAPI Backend (primary)

  • On startup: auto-detects the LLM, using Gemini 1.5 Flash if GEMINI_API_KEY is set and MedGemma via Vertex AI otherwise, then fetches all CFR parts from the eCFR API.
  • On protocol upload: enforces rate limits (3/min, 50/day) and a file size cap (10 MB), then extracts the text and chunks and embeds it via vector_store.index_protocol. For each CFR regulation, it runs query_protocol_for_regulation (reversed RAG), builds the context, and runs the LLM audit with structured output.
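
The upload limits above could be enforced along these lines. This is a minimal sketch, not the code in server.py: the names UploadLimiter and check_upload are hypothetical, 10 MB is assumed to mean 10 * 1024 * 1024 bytes, and the limits are assumed to be sliding windows.

```python
import time
from collections import deque

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # assumption: 10 MB means 10 MiB
PER_MINUTE_LIMIT = 3                 # limits quoted in this README
PER_DAY_LIMIT = 50


class UploadLimiter:
    """Sliding-window limiter that tracks timestamps of accepted uploads."""

    def __init__(self, per_minute=PER_MINUTE_LIMIT, per_day=PER_DAY_LIMIT):
        self.per_minute = per_minute
        self.per_day = per_day
        self._stamps = deque()  # accepted upload times, oldest first

    def allow(self, now=None):
        """Return True and record the upload if both windows have room."""
        now = time.time() if now is None else now
        # Entries older than 24 hours no longer count toward either limit.
        while self._stamps and now - self._stamps[0] >= 86400:
            self._stamps.popleft()
        last_minute = sum(1 for t in self._stamps if now - t < 60)
        if last_minute >= self.per_minute or len(self._stamps) >= self.per_day:
            return False
        self._stamps.append(now)
        return True


def check_upload(limiter, size_bytes, now=None):
    """Return (accepted, reason) for one upload attempt."""
    if size_bytes > MAX_UPLOAD_BYTES:
        return False, "file too large"
    if not limiter.allow(now):
        return False, "rate limit exceeded"
    return True, "ok"
```

The size check runs first in this sketch so that an oversized file does not consume rate-limit quota; the real server may order these checks differently.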

backend/src/gemini_llm.py / medgemma_llm.py - LLM Wrappers

  • GeminiFlashLLM: free placeholder using gemini-1.5-flash via the Gemini API. Activated when GEMINI_API_KEY is present in .env.
  • MedGemmaVertexLLM: production LLM using MedGemma deployed on Vertex AI. Used when no Gemini key is set.
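
The selection rule described above can be sketched as a small function. The function name select_llm_backend and the returned labels are hypothetical; only the environment variable names come from this README, and the actual wrappers may validate their settings differently.

```python
import os


def select_llm_backend(env=None):
    """Pick a backend label from environment variables.

    A non-empty GEMINI_API_KEY selects Gemini Flash; otherwise all three
    Vertex AI settings must be present for MedGemma.
    """
    env = os.environ if env is None else env
    if env.get("GEMINI_API_KEY", "").strip():
        return "gemini-flash"
    required = ("GCP_PROJECT_ID", "GCP_REGION", "VERTEX_ENDPOINT_ID")
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        # Fail early with a message that names what is missing.
        raise RuntimeError("Missing Vertex AI settings: " + ", ".join(missing))
    return "medgemma-vertex"
```

Passing a plain dict instead of os.environ makes the rule easy to check, for example select_llm_backend({"GEMINI_API_KEY": "abc"}) returns "gemini-flash".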

backend/src/graph.py - Workflow Engine

  • Defines LangGraph state machine
  • Connects retrieval → audit nodes
  • Compiles executable graph

backend/src/nodes.py - Processing Nodes (LangGraph / standalone app)

  • retrieval_node: Uses a retriever for the graph-based flow.
  • audit_node: Performs LLM-based regulatory analysis.

Note: the main API flow in server.py uses its own reversed RAG path (protocol index + CFR-as-query) and structured prompt rather than this graph-based flow.

backend/src/state.py - State Management

AgentState:
  - protocol_text: str              # Input protocol section
  - retrieved_regulations: List[str] # Relevant CFR sections
  - audit_results: str               # Compliance analysis
  - compliance_score: int            # 1-100 score (future)
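
To show how the state schema and the retrieval → audit flow fit together, here is a dependency-free sketch. It is a stand-in for the LangGraph wiring, not the code in graph.py or nodes.py: the node bodies are placeholders and run_pipeline is a hypothetical helper. Each node returns a partial update that is merged into the state, which is the same convention LangGraph nodes follow.

```python
from typing import Callable, List, TypedDict


class AgentState(TypedDict, total=False):
    protocol_text: str                 # input protocol section
    retrieved_regulations: List[str]   # relevant CFR sections
    audit_results: str                 # compliance analysis
    compliance_score: int              # 1-100 score (marked "future" above)


def retrieval_node(state: AgentState) -> AgentState:
    # Placeholder retrieval: a real node would call a retriever here.
    return {"retrieved_regulations": ["21 CFR Part 11 (placeholder text)"]}


def audit_node(state: AgentState) -> AgentState:
    # Placeholder audit: a real node would send a prompt to the LLM.
    count = len(state.get("retrieved_regulations", []))
    return {"audit_results": f"Reviewed protocol against {count} regulation(s)."}


def run_pipeline(
    state: AgentState, nodes: List[Callable[[AgentState], AgentState]]
) -> AgentState:
    """Run nodes in order, merging each partial update into the state."""
    current: AgentState = dict(state)  # copy so the caller's dict is untouched
    for node in nodes:
        current.update(node(current))
    return current


final = run_pipeline(
    {"protocol_text": "Section 9: electronic signatures ..."},
    [retrieval_node, audit_node],
)
print(final["audit_results"])  # Reviewed protocol against 1 regulation(s).
```

compliance_score is left unset here because the schema marks it as a future field; total=False lets the TypedDict describe states where some keys are not yet populated.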

backend/src/ecfr_client.py - Regulatory Data

  • Fetches live 21 CFR (Parts 11, 50, 56, 58, 211, 312, 314) and 45 CFR Part 46 from the eCFR.gov API
  • Generic get_part(title, part) for any CFR title/part
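
As an illustration of what a generic title/part lookup might build, the sketch below constructs request URLs for the parts named in the introduction. The endpoint path is an assumption about the public eCFR versioner API and should be checked against the eCFR API documentation; build_part_url and all_part_urls are hypothetical names, not the signatures in ecfr_client.py. No network call is made.

```python
from urllib.parse import urlencode

# Parts named in this README's introduction, keyed by CFR title.
CFR_PARTS = {21: [11, 50, 56, 58, 211, 312, 314], 45: [46]}

# Assumption: base path of the eCFR versioner "full" endpoint.
ECFR_BASE = "https://www.ecfr.gov/api/versioner/v1/full"


def build_part_url(title, part, date):
    """Build the URL for one CFR part as of an ISO date (YYYY-MM-DD)."""
    query = urlencode({"part": part})
    return f"{ECFR_BASE}/{date}/title-{title}.xml?{query}"


def all_part_urls(date):
    """Return {(title, part): url} for every part listed in CFR_PARTS."""
    return {
        (title, part): build_part_url(title, part, date)
        for title, parts in CFR_PARTS.items()
        for part in parts
    }
```

Keeping URL construction separate from the HTTP call makes the title/part mapping easy to test without touching the live API.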

backend/src/vector_store.py - RAG

  • Protocol as knowledge base: Uploaded protocols are chunked (RecursiveCharacterTextSplitter), embedded (HuggingFace sentence-transformers), and stored in Chroma (protocol_chunks collection).
  • CFR as query: For each CFR regulation, the regulation text is used as the search query; the retriever returns the top-k protocol chunks that address it (MMR for diversity).
  • No CFR text is stored in the vector store; only protocol chunks are indexed.
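
The reversed RAG query with MMR can be illustrated without any external libraries. This sketch swaps in a toy bag-of-words embedding for the sentence-transformers model and a plain list for Chroma, so it shows only the selection logic; embed, cosine, and mmr_query are hypothetical names, and the lam and k defaults are arbitrary.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_query(regulation_text, protocol_chunks, k=2, lam=0.5):
    """Return k protocol chunks for one regulation using MMR.

    Each step picks the chunk that maximises
    lam * sim(chunk, query) - (1 - lam) * max sim(chunk, already selected).
    """
    query = embed(regulation_text)
    vectors = [embed(c) for c in protocol_chunks]
    selected = []
    candidates = list(range(len(protocol_chunks)))
    while candidates and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * cosine(vectors[i], query) - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [protocol_chunks[i] for i in selected]


chunks = [
    "Electronic signatures are linked to their records.",
    "Electronic signatures are linked to their respective records.",
    "Audit trails record the date and time of entries.",
    "Participants receive a travel stipend.",
]
regulation = (
    "Electronic signatures must be linked to their electronic records "
    "and audit trails."
)
print(mmr_query(regulation, chunks, k=2))
```

With these inputs the first and third chunks are returned: the second chunk is nearly identical to the first, so the redundancy penalty pushes it below the audit-trail chunk. Setting lam=1.0 disables the penalty and returns the two near-duplicate chunks instead.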

backend/src/prompts.py - Prompt Engineering

  • FDA Regulatory Auditor persona
  • Structured instructions for compliance checking
  • Focus on electronic signatures and audit trails
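
For a sense of how a regulation and the retrieved protocol chunks might be combined into one audit prompt, here is a hypothetical sketch. The wording is illustrative only and is not the text in prompts.py; build_audit_prompt is an assumed name.

```python
def build_audit_prompt(regulation_id, regulation_text, protocol_chunks):
    """Assemble one audit prompt from a regulation and retrieved chunks."""
    if protocol_chunks:
        # Number the excerpts so the model can cite them as evidence.
        evidence = "\n".join(
            f"[{i}] {chunk}" for i, chunk in enumerate(protocol_chunks, 1)
        )
    else:
        evidence = "(no relevant protocol text was retrieved)"
    return (
        "You are an FDA regulatory auditor reviewing a clinical trial protocol.\n"
        f"Regulation {regulation_id}:\n{regulation_text}\n\n"
        f"Protocol excerpts:\n{evidence}\n\n"
        "State whether the excerpts satisfy the regulation, cite excerpt "
        "numbers as evidence, and list any gaps."
    )
```

Handling the empty-retrieval case explicitly matters in a reversed RAG setup, since a regulation with no matching protocol text is itself a possible compliance gap.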

📄 License

This project is built for the MedGemma Impact Challenge. Contributions are welcome! Please open an issue or submit a pull request. For questions about this project, please refer to WRITEUP.md, the technical documentation used for the MedGemma submission.

About

CLARA: CLinical Audit & Regulatory Assistant 🩺💜 is an agentic platform that automates the regulatory cross-examination of clinical trial protocols, ensuring alignment with federal regulations and global ethical standards. Beyond the acronym, "Clara" comes from the Latin clarus, meaning clear. This reinforces what we stand for: clear and reliable AI.
