RAG-powered drug interaction safety checker built on AWS. Uses Retrieval Augmented Generation over open FDA pharmacology data to return grounded, cited safety reports. Built as a hands-on demonstration of pgvector semantic search, citation enforcement, confidence thresholds, and the concrete difference between grounded and ungrounded LLM responses.
MEDICAL DISCLAIMER: This software is for educational and demonstration purposes only. It is not intended for clinical use. Do not use this software to make medical decisions. Always consult a licensed healthcare professional.
- Ingestion: 300 drug labels are fetched from the openFDA API, split into 512-token chunks, embedded with Amazon Titan Embed Text v2 (1024 dimensions), and loaded into Aurora PostgreSQL with pgvector.
- Query: A user asks a drug interaction question via the Streamlit UI. The question is embedded, matched against stored chunks using HNSW cosine similarity search, and the top passages are sent to Amazon Nova Lite with a system prompt that enforces passage-level citations.
- Comparison: A RAG ON/OFF toggle lets users see the same question answered with and without retrieval, side by side. RAG ON responses are grounded in FDA label text with citations. RAG OFF responses come purely from LLM training data with no traceability.
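The 512-token chunking in the ingestion step can be sketched as a fixed-size window over a token list. This is a minimal illustration, not the code in `ingest_drugs.py`: the function name `chunk_tokens`, the `overlap` parameter, and the whitespace tokenizer are all assumptions made for the example (the test dependencies mention tiktoken, so the real pipeline likely counts model tokens instead).

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 512, overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of at most chunk_size tokens.

    Hypothetical helper: consecutive windows share `overlap` tokens.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), step)]


# Whitespace splitting is a stand-in tokenizer for illustration only.
label_text = "warfarin " * 1200
chunks = chunk_tokens(label_text.split(), chunk_size=512)
print([len(c) for c in chunks])  # [512, 512, 176]
```

Each resulting chunk would then be embedded and stored as one row, so the chunk size directly controls how much label text a single citation can cover.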
                    ┌─────────────────┐
                    │   CloudFront    │  HTTPS, AWS Shield
                    └────────┬────────┘
                             │
                    ┌────────▼────────┐
                    │   ALB (HTTP)    │  SG: CloudFront prefix list only
                    └────────┬────────┘
                             │
              ┌──────────────▼───────────────┐
              │  ECS Fargate (Streamlit)     │  Private subnet, 1 task
              │  Python 3.12, port 8501      │  1 vCPU, 2 GB
              └──────────────┬───────────────┘
                             │ HTTPS via NAT Gateway
              ┌──────────────▼───────────────┐
              │  API Gateway (HTTP API v2)   │  Public endpoint
              │  POST /query                 │  Lambda integration
              └──────────────┬───────────────┘
                             │
              ┌──────────────▼───────────────┐
              │  Lambda (Python 3.13)        │  Private subnet, 512 MB
              │  RAG Query Handler           │  30s timeout, X-Ray
              │                              │
              │  1. Embed question           ├──▶ Bedrock Titan Embed v2
              │  2. pgvector similarity      ├──▶ Aurora PostgreSQL
              │  3. Generate cited answer    ├──▶ Bedrock Nova Lite v1
              └────┬──────────────┬──────────┘
                   │              │
      ┌────────────▼──┐    ┌──────▼──────────────┐
      │ Aurora        │    │ Bedrock             │
      │ Serverless v2 │    │ Titan Embed v2      │
      │ PostgreSQL 15 │    │ (1024-dim vectors)  │
      │ pgvector HNSW │    │ Nova Lite v1        │
      │ 0.5–4 ACU     │    │ (cross-region EU)   │
      └───────────────┘    └─────────────────────┘
| Service | Role |
|---|---|
| CloudFront | HTTPS CDN in front of ALB. Viewer-to-edge encryption, AWS Shield Standard. |
| ALB | Routes HTTP traffic to ECS. Security group restricts inbound to CloudFront managed prefix list (pl-a3a144ca in eu-central-1). |
| ECS Fargate | Runs the Streamlit container. Single task, private subnet, no auto-scaling. Image pulled from ECR via VPC endpoints. |
| ECR | Private container registry for the Streamlit image. KMS-encrypted, scan-on-push, keeps last 5 images. |
| API Gateway (HTTP API v2) | Public HTTP API with a single POST /query route backed by Lambda. |
| Lambda | RAG query handler. Embeds the question, searches pgvector, calls Nova Lite for generation. VPC-attached (private subnet), X-Ray tracing enabled. Bundled with psycopg, pgvector, pydantic, structlog. |
| Aurora Serverless v2 | PostgreSQL 15 with the vector extension. HNSW index for cosine similarity search. 0.5–4 ACU, KMS-encrypted, private subnet only. Data API enabled for ingestion scripts. |
| Bedrock Titan Embed Text v2 | Embedding model. Converts text to 1024-dimensional vectors. Used by both the ingestion pipeline and the Lambda query handler. |
| Bedrock Nova Lite v1 | LLM for answer generation. Called via cross-region inference profile (eu.amazon.nova-lite-v1:0). System prompt enforces passage-level citations. |
| S3 | Stores chunked JSONL zip files from ingestion. Versioned, KMS-encrypted, lifecycle deletes non-current versions after 30 days. |
| KMS | Single customer-managed key (pharma-guard/main) encrypts Aurora, S3, ECR, Secrets Manager, and CloudWatch log groups. Annual rotation enabled. |
| Secrets Manager | Stores Aurora database credentials. Auto-generated during cluster creation. Never exposed as environment variables. |
| VPC | 2 AZs, 2 public + 2 private subnets, 1 NAT Gateway. |
| VPC Endpoints | Interface endpoints: ecr.api, ecr.dkr, secretsmanager, bedrock-runtime, logs. Gateway endpoint: s3. Minimizes NAT costs and keeps traffic private where possible. |
| CloudWatch | Log groups for Lambda, ECS, API Gateway, VPC Flow Logs (3-day retention). Dashboard with Lambda/API GW/Aurora metrics. Alarms for error rates, latency, and connection counts. |
| CloudWatch Synthetics | Canary runs every 5 minutes, invokes the Lambda with a test query, checks for the citations key in the response. Alarms on 2 consecutive failures. |
| X-Ray | Distributed tracing on Lambda and API Gateway. Subsegments around embed, retrieve, and generate steps. |
| SSM Session Manager | Shell access to ECS tasks without SSH or bastion hosts. |
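As a rough sketch of the embedding step listed in the table, the snippet below builds a Titan Embed Text v2 request and reads the vector back from an `invoke_model` response. The function names and the `FakeBedrockClient` stand-in are hypothetical and exist only so the example runs offline; in a real deployment the client would be something like `boto3.client("bedrock-runtime", region_name="eu-central-1")`.

```python
import io
import json
from typing import Any, Dict, List

EMBED_MODEL_ID = "amazon.titan-embed-text-v2:0"


def build_embed_request(text: str, dimensions: int = 1024) -> str:
    """Serialize the JSON body for a Titan Embed Text v2 request."""
    return json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})


def embed_text(client: Any, text: str) -> List[float]:
    """Call invoke_model on a bedrock-runtime style client and return the vector."""
    response = client.invoke_model(
        modelId=EMBED_MODEL_ID,
        contentType="application/json",
        accept="application/json",
        body=build_embed_request(text),
    )
    # The real response body is a stream, so it is read before JSON decoding.
    return json.loads(response["body"].read())["embedding"]


class FakeBedrockClient:
    """Offline stand-in (hypothetical) so the sketch runs without AWS access."""

    def invoke_model(self, **kwargs: Any) -> Dict[str, Any]:
        dims = json.loads(kwargs["body"])["dimensions"]
        payload = json.dumps({"embedding": [0.0] * dims}).encode("utf-8")
        return {"body": io.BytesIO(payload)}


vector = embed_text(FakeBedrockClient(), "Can warfarin be taken with aspirin?")
print(len(vector))  # 1024
```

Using the same helper for ingestion and for queries keeps both sides in the same 1024-dimensional space, which is what makes the cosine comparison meaningful.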
- No public database endpoint. Aurora is in private subnets only. Lambda connects via the VPC.
- Single intended API caller. The HTTP API endpoint is technically public, but the Streamlit frontend is the only intended caller (via NAT). The API could be restricted further with IAM authorization or a Lambda or JWT authorizer.
- Minimal security groups. Each component (ALB, ECS, Lambda, Aurora, VPC endpoints) has its own SG with least-privilege rules. Aurora only accepts connections from the Lambda SG on port 5432.
- KMS encryption everywhere. A single CMK encrypts all data at rest: Aurora storage, S3 objects, ECR images, Secrets Manager values, and CloudWatch logs.
- No secrets in code or environment. Aurora credentials are in Secrets Manager. The Lambda reads them at runtime via the AWS SDK.
- Pre-commit hooks. detect-secrets blocks committed credentials, ruff lints and formats, bandit scans for security issues.
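The "no secrets in code or environment" point can be illustrated with a small sketch of reading the Aurora credentials at runtime. It assumes the secret is a JSON document with the keys that RDS-generated secrets typically carry (`username`, `password`, `host`, `port`, `dbname`); the function name and the fake client are hypothetical, and a real handler would pass `boto3.client("secretsmanager")` plus the ARN from its configuration.

```python
import json
from typing import Any, Dict


def load_db_credentials(client: Any, secret_arn: str) -> Dict[str, Any]:
    """Fetch and parse database credentials from Secrets Manager."""
    response = client.get_secret_value(SecretId=secret_arn)
    secret = json.loads(response["SecretString"])
    return {
        "host": secret["host"],
        "port": int(secret.get("port", 5432)),
        "user": secret["username"],
        "password": secret["password"],
        "dbname": secret.get("dbname", "postgres"),
    }


class FakeSecretsClient:
    """Offline stand-in (hypothetical) returning a demo secret."""

    def get_secret_value(self, SecretId: str) -> Dict[str, str]:
        demo = {
            "username": "demo_user",
            "password": "not-a-real-password",
            "host": "db.example.invalid",
            "port": 5432,
        }
        return {"SecretString": json.dumps(demo)}


creds = load_db_credentials(FakeSecretsClient(), "arn:aws:secretsmanager:example")
print(creds["host"], creds["port"])  # the password is deliberately not printed
```

Caching the parsed credentials outside the handler function would avoid one Secrets Manager call per invocation, at the cost of needing a refresh path if the secret rotates.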
pharma-guard/
├── .env.example                    # Template with placeholder values
├── .pre-commit-config.yaml         # detect-secrets, ruff, bandit, yamllint
├── pyproject.toml                  # ruff + pytest config
│
├── infra/                          # CDK Python (7 stacks)
│   ├── app.py                      # Stack wiring and dependency order
│   ├── cdk.json
│   ├── requirements.txt
│   └── stacks/
│       ├── kms_stack.py            # CMK with rotation
│       ├── network_stack.py        # VPC, subnets, SGs, VPC endpoints
│       ├── storage_stack.py        # S3 bucket, ECR repository
│       ├── database_stack.py       # Aurora Serverless v2, pgvector schema
│       ├── api_stack.py            # Lambda + API Gateway
│       ├── frontend_stack.py       # ECS Fargate + ALB + CloudFront
│       └── observability_stack.py  # Dashboard, alarms, synthetics canary
│
├── src/query_handler/              # Lambda source code
│   ├── handler.py                  # Entry point, RAG pipeline orchestration
│   ├── rag.py                      # Embed, retrieve, confidence check
│   ├── llm.py                      # Nova Lite generation, citation parsing
│   ├── db.py                       # Aurora connection, pgvector queries
│   ├── models.py                   # Pydantic request/response models
│   └── requirements.txt            # Bundled into Lambda deployment package
│
├── scripts/                        # Phase 1: local ingestion pipeline
│   ├── drug_list.txt               # 300 drug names
│   ├── ingest_drugs.py             # openFDA API → chunked JSONL → S3
│   ├── load_aurora.py              # S3 → Bedrock embed → Aurora (Data API)
│   ├── verify_ingestion.py         # Row counts, vector dims, sample query
│   ├── requirements.txt
│   └── README.md
│
├── docker/                         # Phase 2: Streamlit container
│   ├── Dockerfile                  # Python 3.12-slim, amd64
│   ├── app.py                      # UI with RAG ON/OFF toggle
│   ├── api_client.py               # Calls API Gateway
│   ├── requirements.txt
│   └── README.md
│
└── tests/                          # Unit tests (mocked AWS calls)
    ├── conftest.py
    ├── test_ingest_drugs.py
    ├── test_load_aurora.py
    ├── test_rag.py
    ├── test_llm.py
    └── test_handler.py

- AWS CLI v2 configured for eu-central-1
- AWS account with Bedrock model access enabled:
  - amazon.titan-embed-text-v2:0
  - eu.amazon.nova-lite-v1:0 (cross-region inference)
- Python 3.12+ (3.14 for local scripts, 3.12 for Docker)
- Docker Desktop (with buildx for cross-platform builds)
- Node.js 20+ (for CDK CLI)
- AWS CDK CLI: npm install -g aws-cdk@2.170.0

git clone <repo-url> && cd pharma-guard
cp .env.example .env
pip install pre-commit && pre-commit install

cd infra
pip install -r requirements.txt
cdk bootstrap aws://<ACCOUNT_ID>/eu-central-1 # first time only
cdk deploy --all --require-approval never

Stacks deploy in dependency order: KMS → Network → Storage → Database → API → Frontend → Observability. The first deploy takes about 15-20 minutes.

After deployment, copy the stack outputs into your .env:
PHARMA_GUARD_BUCKET=<BucketName output from StorageStack>
AURORA_SECRET_ARN=<SecretArn output from DatabaseStack>
AURORA_CLUSTER_ARN=<from: aws rds describe-db-clusters --query "DBClusters[?starts_with(DBClusterIdentifier,'pharmaguard')].DBClusterArn" --output text>
API_GATEWAY_URL=<ApiGatewayUrl output from ApiStack>

Build and push the Streamlit image before the ECS service can start (it needs an image in ECR).
cd docker
# Authenticate Docker with ECR
aws ecr get-login-password --region eu-central-1 | \
docker login --username AWS --password-stdin <ACCOUNT_ID>.dkr.ecr.eu-central-1.amazonaws.com
# Build for amd64 (required: the Fargate task runs on x86_64)
docker buildx build --platform linux/amd64 --provenance=false \
-t pharma-guard/streamlit . --load
# Tag and push
docker tag pharma-guard/streamlit:latest \
<ACCOUNT_ID>.dkr.ecr.eu-central-1.amazonaws.com/pharma-guard/streamlit:latest
docker push \
<ACCOUNT_ID>.dkr.ecr.eu-central-1.amazonaws.com/pharma-guard/streamlit:latest

The ingestion pipeline runs from your laptop and uses the RDS Data API (no VPC access needed).
cd scripts
pip install -r requirements.txt
# Fetch FDA drug labels, chunk, upload to S3
python ingest_drugs.py
# ~3 minutes for 300 drugs. Note the S3 key in the output.
# Embed chunks via Bedrock and load into Aurora
python load_aurora.py --s3-key ingestion/drug_chunks_<DATE>.jsonl.zip
# ~3 minutes for ~1700 chunks.
# Verify: row counts, vector dimensions, sample similarity search
python verify_ingestion.py

Open the CloudFront URL from the FrontendStack output:
https://<distribution-id>.cloudfront.net
Enter drug names, type a clinical question, and click Check Interactions. Toggle RAG Mode off in the sidebar to see the ungrounded comparison.
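For scripted checks without the UI, the same question can be posted to the `POST /query` route directly. The sketch below only builds the request; the payload field names (`question`, `drug_names`, `rag_enabled`) are assumptions for illustration, and the real schema lives in `src/query_handler/models.py`. The URL is a placeholder for the `API_GATEWAY_URL` value.

```python
import json
import urllib.request
from typing import List


def build_query_request(
    api_url: str,
    question: str,
    drug_names: List[str],
    rag_enabled: bool = True,
) -> urllib.request.Request:
    """Build a POST /query request. Field names are hypothetical."""
    payload = {
        "question": question,
        "drug_names": drug_names,
        "rag_enabled": rag_enabled,
    }
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_query_request(
    "https://example.invalid",  # placeholder for API_GATEWAY_URL
    "Can warfarin be taken with aspirin?",
    ["warfarin", "aspirin"],
)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of `urllib.request.urlopen(request, timeout=30)` and decoding the JSON body; building the request once with `rag_enabled=True` and once with `False` reproduces the side-by-side comparison from the UI.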
aws ecs update-service \
--cluster <cluster-name> \
--service <service-name> \
--force-new-deployment \
--region eu-central-1

     User question
           │
           ▼
┌─────────────────────┐
│  Titan Embed v2     │  Convert question to 1024-dim vector
│  (1024 dimensions)  │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│  pgvector HNSW      │  Cosine similarity search
│  WHERE drug_name    │  Filter by drug names (if provided)
│    = ANY(...)       │  Threshold: configurable (default 0.50)
│  ORDER BY cosine    │  Return top 5 chunks
│  LIMIT 5            │
└──────────┬──────────┘
           │
           ▼
Chunks above threshold?
      │             │
     YES            NO
      │             │
      ▼             ▼
┌───────────┐  ┌──────────────┐
│ Nova Lite │  │ Fallback     │
│ + system  │  │ "Insufficient│
│   prompt  │  │  data..."    │
│ + context │  └──────────────┘
└─────┬─────┘
      │
      ▼
Response contains
[Passage N] citations?
      │             │
     YES            NO
      │             │
      ▼             ▼
 Return with     Fallback
 interactions    response
 + citations

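The retrieval box in the flow above can be sketched as a parameterized query builder. This is an illustration rather than the contents of `db.py`: the table name `drug_chunks` and the columns `chunk_text` and `embedding` are hypothetical. The `<=>` operator is pgvector's cosine distance, so `1 - distance` gives the cosine similarity that is later compared against the threshold.

```python
from typing import Any, List, Optional, Sequence, Tuple


def build_similarity_query(
    embedding: Sequence[float],
    drug_names: Optional[List[str]] = None,
    top_k: int = 5,
) -> Tuple[str, List[Any]]:
    """Return (sql, params) for a top-k cosine search with an optional drug filter."""
    # pgvector accepts a text literal such as '[0.1,0.2]' cast to vector.
    vector_literal = "[" + ",".join(f"{x:.6f}" for x in embedding) + "]"
    sql = (
        "SELECT drug_name, chunk_text, "
        "1 - (embedding <=> %s::vector) AS similarity "
        "FROM drug_chunks "  # hypothetical table name
    )
    params: List[Any] = [vector_literal]
    if drug_names:
        sql += "WHERE drug_name = ANY(%s) "
        params.append(drug_names)
    sql += "ORDER BY embedding <=> %s::vector LIMIT %s"
    params.extend([vector_literal, top_k])
    return sql, params


sql, params = build_similarity_query([0.1, 0.2, 0.3], ["warfarin", "aspirin"])
print(sql)
print(len(params))  # 4
```

Ordering by the raw distance expression (rather than the computed `similarity` alias) is what lets PostgreSQL use the HNSW index; the pair would be passed to a psycopg cursor as `cursor.execute(sql, params)`.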
Fallback conditions (any one triggers it):
- No chunks retrieved from pgvector
- All chunks below the confidence threshold
- Nova Lite returns INSUFFICIENT_DATA
- Nova Lite response contains no [Passage N] markers

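The four fallback conditions listed above can be sketched as two small checks, one before generation and one after. The names `Chunk`, `has_confident_chunks`, `cited_passages`, and `answer_is_grounded` are hypothetical, and treating a similarity exactly equal to the threshold as passing is an assumption; the actual logic lives in `rag.py` and `llm.py`.

```python
import re
from typing import List, NamedTuple

PASSAGE_MARKER = re.compile(r"\[Passage (\d+)\]")


class Chunk(NamedTuple):
    text: str
    similarity: float


def has_confident_chunks(chunks: List[Chunk], threshold: float = 0.50) -> bool:
    """False when nothing was retrieved or every chunk is below the threshold."""
    return any(chunk.similarity >= threshold for chunk in chunks)


def cited_passages(answer: str) -> List[int]:
    """Return the distinct passage numbers cited in the answer, sorted."""
    return sorted({int(n) for n in PASSAGE_MARKER.findall(answer)})


def answer_is_grounded(answer: str) -> bool:
    """False when the model declined or produced no [Passage N] marker."""
    return "INSUFFICIENT_DATA" not in answer and bool(cited_passages(answer))


retrieved = [Chunk("Warfarin label text", 0.71), Chunk("Aspirin label text", 0.42)]
answer = "Bleeding risk increases [Passage 1]. Monitor INR closely [Passage 1]."
print(has_confident_chunks(retrieved), cited_passages(answer), answer_is_grounded(answer))
# True [1] True
```

Running the first check before calling Nova Lite avoids paying for a generation that would be discarded anyway; the second check is what enforces citations on the response itself.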
cd infra
cdk destroy --all

If destroy fails due to non-empty resources:
# Delete ECR images
aws ecr batch-delete-image \
--repository-name pharma-guard/streamlit \
--image-ids "$(aws ecr list-images --repository-name pharma-guard/streamlit --query 'imageIds[*]' --output json)" \
--region eu-central-1
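# Empty S3 bucket (it is versioned, so this leaves old versions and delete markers behind; remove those too if destroy still fails)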
aws s3 rm s3://<bucket-name> --recursive
# Retry destroy
cdk destroy --all

Estimated cost for a weekend build and demo session:
| Service | Estimate |
|---|---|
| Aurora Serverless v2 (0.5 ACU, ~8 hrs) | ~$0.48 |
| NAT Gateway (data transfer) | ~$1.00 |
| Bedrock Titan Embed (~1700 chunks + queries) | ~$0.30 |
| Bedrock Nova Lite (demo queries) | ~$0.50 |
| ECS Fargate (1 task, ~8 hrs) | ~$0.40 |
| KMS (1 key, prorated) | ~$0.03 |
| Secrets Manager (1 secret, prorated) | ~$0.01 |
| VPC Endpoints (7 interface endpoints, ~8 hrs) | ~$0.80 |
| Lambda, API Gateway, S3, CloudFront, CloudWatch | < $0.10 |
| Total (weekend) | ~$4-6 |
Cost tip: VPC interface endpoints and NAT Gateway are the largest ongoing costs. Tear down promptly after your demo session.
pip install pytest pydantic structlog tiktoken
python -m pytest tests/ -v

All 22 tests run without AWS credentials (mocked AWS calls).

- Add more drug data sources (DailyMed, EU EMA)
- Tune HNSW parameters (m, ef_construction) for larger datasets
- Add RDS Proxy for Lambda connection pooling at scale
- Add Cognito authentication for the Streamlit frontend
- Multi-region deployment with Aurora Global Database
- EventBridge scheduled re-ingestion for FDA label updates
- Feedback loop to track and improve citation quality over time
- Swap Nova Lite for Claude for higher-quality generation
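
For the HNSW tuning item in the list above, the two parameters are set when the index is created. The sketch below generates the DDL as a string; the table and column names are hypothetical, the defaults of 16 and 64 match pgvector's documented defaults, and the `ef_construction >= 2 * m` check reflects pgvector's constraint as I understand it, so treat it as an assumption to verify against the installed version.

```python
def hnsw_index_ddl(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Build a CREATE INDEX statement for a pgvector HNSW cosine index."""
    # Identifiers are interpolated into SQL, so only plain names are accepted.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    # Assumed pgvector constraint: ef_construction must be at least 2 * m.
    if m < 2 or ef_construction < 2 * m:
        raise ValueError("expected m >= 2 and ef_construction >= 2 * m")
    return (
        f"CREATE INDEX IF NOT EXISTS {table}_{column}_hnsw "
        f"ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


print(hnsw_index_ddl("drug_chunks", "embedding", m=24, ef_construction=128))
```

Larger values generally trade slower index builds and more memory for better recall, so it is worth measuring recall on a sample of real queries before and after changing them.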