CLARA, derived from the Latin Clarus, is an agentic platform built for the MedGemma Impact Challenge. CLARA automates the regulatory cross-examination of clinical trial protocols and has been tested on 21 CFR (Parts 11, 50, 56, 58, 211, 312, 314, etc.) and 45 CFR Part 46 (the Common Rule). The name CLARA and the term Clarus reinforce what we stand for: clarity in complex decisions, trust in high-stakes clinical environments, and a human presence within AI that feels supportive rather than technical. In healthcare, intelligence must be clear, reliable, and approachable, and CLARA embodies all three.
Figure 1: CLARA System Architecture
CLARA tackles the critical bottleneck in clinical trials: regulatory compliance checking. By combining RAG with an FDA-auditor LLM (MedGemma via Vertex AI), CLARA automates the cross-examination of clinical trial protocols against FDA and HHS regulations.
The protocol is the source of truth in the index: each regulation is used as a query against the indexed protocol chunks, rather than the other way around.
- Python 3.11+
- Node.js 18+
- A Gemini API key (free at aistudio.google.com) or a Google Cloud project with a deployed MedGemma Vertex AI endpoint
- Run gcloud auth application-default login if using Vertex AI
cd backend
# Create and activate a virtual environment
python -m venv venv
source venv/bin/activate # macOS / Linux
# venv\Scripts\activate # Windows
# Install Python dependencies
pip install -r requirements.txt
# Copy and fill environment variables
cp .env_sample .env
# At minimum, set GEMINI_API_KEY for the Gemini Flash placeholder LLM.
# For MedGemma (Vertex AI), set GCP_PROJECT_ID, GCP_REGION, VERTEX_ENDPOINT_ID instead.
# Start the FastAPI backend
uvicorn src.server:app --reload --port 8000
In a separate terminal:
cd frontend
npm install
npm run dev
The React frontend is served at http://localhost:5173. The Vite dev server proxies all /api/* requests to the backend at http://localhost:8000, so no CORS configuration is required.
CLARA/
├── README.md
├── WRITEUP.md  # MedGemma Impact Challenge submission
├── backend/
│   ├── src/
│   │   ├── server.py  # FastAPI server: upload, audit, list, delete endpoints
│   │   ├── app.py  # Uvicorn entry point (alternative to uvicorn CLI)
│   │   ├── gemini_llm.py  # Gemini 1.5 Flash LLM wrapper (free placeholder)
│   │   ├── medgemma_llm.py  # MedGemma via Vertex AI (production LLM)
│   │   ├── vector_store.py  # Reversed RAG: protocol chunks as KB, CFR as query
│   │   ├── ecfr_client.py  # Live eCFR API client (21/45 CFR)
│   │   ├── graph.py  # LangGraph workflow definition
│   │   ├── nodes.py  # Retrieval and audit node implementations
│   │   ├── state.py  # Agent state schema (TypedDict)
│   │   └── prompts.py  # System prompts for FDA auditor persona
│   ├── test/
│   │   ├── evaluate_retrieval.py  # Retrieval benchmarking (precision, recall, NDCG)
│   │   ├── generate_ground_truth.py
│   │   ├── plot_results.py
│   │   ├── ground_truth/  # Annotated retrieval ground truth
│   │   └── results/  # Benchmark output (HTML report, CSV, PNG curves)
│   ├── data/
│   │   ├── chroma_db/  # Persistent vector database
│   │   └── documents/  # Sample protocols (compliant & non-compliant)
│   ├── requirements.txt
│   └── .env  # GEMINI_API_KEY, rate limits, Vertex AI config
├── frontend/
│   ├── src/
│   │   ├── App.jsx  # Root application component
│   │   ├── components/  # UI components (Header, Sidebar, modals, tutorial)
│   │   ├── hooks/  # Custom React hooks (useAudits)
│   │   ├── services/  # API client (api.js)
│   │   └── styles/  # Global CSS
│   ├── .env  # VITE_USE_MOCK=false (leave VITE_API_URL empty for proxy)
│   └── package.json
└── assets/  # Logos and static images
backend/src/server.py - FastAPI Backend (primary)
- On startup: auto-detects the LLM, using Gemini 1.5 Flash if GEMINI_API_KEY is set and MedGemma via Vertex AI otherwise. Fetches all CFR parts from the eCFR API.
- On protocol upload: enforces rate limits (3/min, 50/day) and a file size cap (10 MB), then extracts the text and chunks and embeds it via vector_store.index_protocol. For each CFR regulation, it runs query_protocol_for_regulation (reversed RAG), builds context, and runs the LLM audit with structured output.
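The upload guards described above (3 uploads per minute, 50 per day, 10 MB cap) could look roughly like the sketch below. It is a minimal single-process illustration, not the code in server.py: UploadRateLimiter and check_upload are hypothetical names, the limiter is global rather than per-client, and checking the size before the rate limit is an assumption.

```python
import time
from collections import deque

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # 10 MB cap, as stated above


class UploadRateLimiter:
    """Sliding-window limiter (hypothetical helper, not the project's code)."""

    def __init__(self, per_minute=3, per_day=50, clock=time.monotonic):
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.events = deque()  # timestamps of accepted uploads, oldest first

    def allow(self):
        now = self.clock()
        # Forget uploads that are more than 24 hours old.
        while self.events and now - self.events[0] >= 86400:
            self.events.popleft()
        in_last_minute = sum(1 for t in self.events if now - t < 60)
        if in_last_minute >= self.per_minute or len(self.events) >= self.per_day:
            return False
        self.events.append(now)
        return True


def check_upload(limiter, payload):
    """Return (accepted, reason) for a raw upload body."""
    # Assumption: oversized files are rejected first so they do not use up quota.
    if len(payload) > MAX_UPLOAD_BYTES:
        return False, "file too large"
    if not limiter.allow():
        return False, "rate limit exceeded"
    return True, "ok"
```

In a FastAPI handler, a rejected upload would typically be mapped to an HTTP 413 or 429 response; that mapping is left out here to keep the sketch dependency-free.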
backend/src/gemini_llm.py / medgemma_llm.py - LLM Wrappers
- GeminiFlashLLM: free placeholder using gemini-1.5-flash via the Gemini API. Activated when GEMINI_API_KEY is present in .env.
- MedGemmaVertexLLM: production LLM using MedGemma deployed on Vertex AI. Used when no Gemini key is set.
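The selection rule between the two wrappers can be sketched as a small function over the environment variables named in the setup steps. select_llm_backend is a hypothetical helper that returns the wrapper name as a string; raising an error when the Vertex AI settings are incomplete is an assumption, not confirmed behavior of server.py.

```python
import os


def select_llm_backend(env=None):
    """Return the name of the wrapper the startup logic would pick."""
    env = os.environ if env is None else env
    # A Gemini key takes priority, as described above.
    if env.get("GEMINI_API_KEY"):
        return "GeminiFlashLLM"
    # Otherwise fall back to MedGemma on Vertex AI, which needs all three settings.
    required = ("GCP_PROJECT_ID", "GCP_REGION", "VERTEX_ENDPOINT_ID")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing Vertex AI settings: " + ", ".join(missing))
    return "MedGemmaVertexLLM"
```

Passing a plain dict instead of os.environ makes the rule easy to check in isolation, for example select_llm_backend({"GEMINI_API_KEY": "test"}).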
backend/src/graph.py - Workflow Engine
- Defines LangGraph state machine
- Connects retrieval → audit nodes
- Compiles executable graph
backend/src/nodes.py - Processing Nodes (LangGraph / standalone app)
- retrieval_node: Uses a retriever for the graph-based flow.
- audit_node: Performs LLM-based regulatory analysis. The main API flow in server.py uses its own reversed RAG path (protocol index + CFR-as-query) and structured prompt.
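To illustrate how the retrieval → audit flow passes state between nodes, here is a plain-Python stand-in. It does not use LangGraph's API; run_pipeline, make_retrieval_node, and make_audit_node are hypothetical helpers, the retriever and LLM are injected callables, and the state fields follow the AgentState schema in state.py.

```python
from typing import Callable, List, TypedDict


class AgentState(TypedDict):
    protocol_text: str                # input protocol section
    retrieved_regulations: List[str]  # relevant CFR sections
    audit_results: str                # compliance analysis
    compliance_score: int             # 1-100 score (future)


def make_retrieval_node(retriever: Callable[[str], List[str]]):
    def retrieval_node(state: AgentState) -> AgentState:
        # Look up regulations for the protocol text and store them on the state.
        regulations = retriever(state["protocol_text"])
        return {**state, "retrieved_regulations": regulations}
    return retrieval_node


def make_audit_node(llm: Callable[[str], str]):
    def audit_node(state: AgentState) -> AgentState:
        # Assumed prompt layout; the real prompt lives in prompts.py.
        prompt = (
            "Protocol section:\n" + state["protocol_text"]
            + "\n\nRegulations:\n" + "\n".join(state["retrieved_regulations"])
        )
        return {**state, "audit_results": llm(prompt)}
    return audit_node


def run_pipeline(state: AgentState, nodes) -> AgentState:
    # Linear stand-in for the compiled graph: each node returns a new state.
    for node in nodes:
        state = node(state)
    return state
```

Each node returns a fresh dict rather than mutating its input, which keeps the initial state reusable across runs and makes the nodes easy to test with stub callables.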
backend/src/state.py - State Management
AgentState:
- protocol_text: str # Input protocol section
- retrieved_regulations: List[str] # Relevant CFR sections
- audit_results: str # Compliance analysis
- compliance_score: int # 1-100 score (future)
backend/src/ecfr_client.py - Regulatory Data
- Fetches live 21 CFR (Parts 11, 50, 56, 58, 211, 312, 314) and 45 CFR Part 46 from the eCFR API (ecfr.gov)
- Generic get_part(title, part) for any CFR title/part
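A generic fetch of one CFR part might be sketched as follows. This assumes the public eCFR versioner endpoint (/api/versioner/v1/full/{date}/title-{title}.xml with a part query parameter); the project's ecfr_client.py may use a different endpoint or response format. The on_date argument and the build_part_url helper are additions for illustration, and the date must be one for which eCFR has published the title.

```python
import urllib.request
from urllib.parse import urlencode

# Assumed base URL of the public eCFR versioner API.
ECFR_BASE = "https://www.ecfr.gov/api/versioner/v1"


def build_part_url(title: int, part: int, on_date: str) -> str:
    """Build the URL for the full XML of one part as of on_date (YYYY-MM-DD)."""
    query = urlencode({"part": part})
    return f"{ECFR_BASE}/full/{on_date}/title-{title}.xml?{query}"


def get_part(title: int, part: int, on_date: str, timeout: float = 30.0) -> str:
    """Download the XML text of a CFR part (requires network access)."""
    url = build_part_url(title, part, on_date)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Under these assumptions, get_part(21, 11, on_date) would fetch 21 CFR Part 11 and get_part(45, 46, on_date) the Common Rule, for a suitable publication date.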
backend/src/vector_store.py - RAG
- Protocol as knowledge base: Uploaded protocols are chunked (RecursiveCharacterTextSplitter), embedded (HuggingFace sentence-transformers), and stored in Chroma (protocol_chunks collection).
- CFR as query: For each CFR regulation, the regulation text is used as the search query; the retriever returns the top-k protocol chunks that address it (MMR for diversity).
- No CFR text is stored in the vector store; only protocol chunks are indexed.
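The reversed RAG idea (index the protocol, query with the regulation, pick results with MMR) can be shown with a dependency-free toy. This is a sketch under loud assumptions: a bag-of-words Counter stands in for sentence-transformers embeddings, an in-memory list stands in for Chroma, fixed-size chunking stands in for RecursiveCharacterTextSplitter, and chunk_text, embed, and mmr_query are hypothetical names rather than functions from vector_store.py.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=200, overlap=50):
    """Fixed-size character chunks with overlap (simplified splitter)."""
    step = size - overlap
    pieces = (text[i:i + size] for i in range(0, len(text), step))
    return [p for p in pieces if p.strip()]


def embed(text):
    """Toy bag-of-words vector; a real system would use a sentence encoder."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_query(regulation_text, protocol_chunks, k=3, lam=0.7):
    """Return up to k protocol chunks, trading relevance against redundancy."""
    query = embed(regulation_text)  # the regulation is the query
    vectors = [embed(c) for c in protocol_chunks]  # only protocol chunks are indexed
    relevance = [cosine(query, v) for v in vectors]
    selected = []
    candidates = list(range(len(protocol_chunks)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize similarity to chunks that were already picked.
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            return lam * relevance[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [protocol_chunks[i] for i in selected]
```

The lam parameter controls the trade-off: values near 1 rank purely by similarity to the regulation, while lower values push near-duplicate protocol chunks down in favor of passages that cover different ground.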
backend/src/prompts.py - Prompt Engineering
- FDA Regulatory Auditor persona
- Structured instructions for compliance checking
- Focus on electronic signatures and audit trails
This project is built for the MedGemma Impact Challenge. Contributions are welcome! Please open an issue or submit a pull request. For questions about this project, please refer to WRITEUP.md, the technical documentation used for the MedGemma submission.
