AvikshithReddy/EHR

Intelligent Care Continuity Archive

HIPAA-aware ETL + Patient 360 + NLP/RAG search for continuity of care.

What you get

  • ETL pipeline: cleans and standardizes raw CSVs, de-identifies patients, and builds a Patient 360 mart.
  • Population dashboard: overall trends and utilization (Streamlit).
  • Patient 360 UI + Q/A: patient summary + grounded answers from notes (Streamlit + RAG).

De-identification approach

  • Drops direct identifiers (name, address, SSN, etc.).
  • Removes full birth/death dates; keeps year only.
  • Buckets ages into ranges and masks birth years implying age 90+, per Safe Harbor practice.
  • Hashes patient/encounter identifiers with a configurable salt.

Quickstart (recommended)

  1. Phase 1: ETL + Dashboard (Notebook)
pip install -r requirements.txt
jupyter lab

Open notebooks/EHR_ETL_and_Dashboard.ipynb and run all cells.

  2. Phase 2: NLP + RAG (Notebook)
pip install -r requirements-phase2.txt
jupyter lab

Open notebooks/EHR_Phase2_NLP_RAG.ipynb and run all cells. Note: some clinical NLP packages are not yet compatible with Python 3.13. If you need scispaCy/medspaCy/Presidio, use Python 3.11.

  3. Build relational and vector datastores: builds a SQLite relational DB and a FAISS vector index (TF‑IDF fallback if FAISS is unavailable).
python src/build_datastores.py

This creates:

  • ehr.db (relational tables)
  • note_chunks_fts (keyword index, if FTS5 is available)
  • notes.faiss (vector index) or tfidf.pkl (fallback)
  • Structured “patient summary” chunks are also embedded for semantic retrieval.
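As a rough sketch of what the keyword index looks like, the snippet below builds a tiny in-memory FTS5 table and ranks matches with SQLite's built-in bm25() function. The table name `note_chunks_fts` comes from the list above, but the column layout and data are assumptions; the real schema is defined in src/build_datastores.py.

```python
import sqlite3

# In-memory stand-in for the note_chunks_fts keyword index.
# Requires a SQLite build with FTS5 (most Python distributions include it).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE note_chunks_fts USING fts5(chunk_id, text)")
conn.executemany(
    "INSERT INTO note_chunks_fts VALUES (?, ?)",
    [("c1", "patient reports chest pain and dyspnea"),
     ("c2", "follow-up visit for hypertension management")],
)
# FTS5 ships a bm25() ranking function; lower scores rank higher.
rows = conn.execute(
    "SELECT chunk_id FROM note_chunks_fts WHERE note_chunks_fts MATCH ? "
    "ORDER BY bm25(note_chunks_fts)",
    ("chest pain",),
).fetchall()
```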
  4. Run Patient 360 + Q/A UI
pip install -r requirements-ui.txt
streamlit run dashboard/patient_chatbot.py

Retrieval uses hybrid fusion: keyword (BM25) + semantic (FAISS) with weighted score merging.
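The weighted score merging could look like the sketch below: min-max normalize each retriever's scores, then combine with fixed weights. The weights, normalization scheme, and function name are assumptions, not the repo's actual implementation.

```python
def fuse_scores(bm25_scores, semantic_scores, w_kw=0.4, w_sem=0.6):
    """Merge keyword (BM25) and semantic (FAISS) scores into one ranking.

    Both inputs are dicts mapping chunk_id -> raw score. Each is min-max
    normalized to [0, 1] so the two scales are comparable, then combined
    with the given weights. Returns chunk_ids sorted best-first.
    """
    def normalize(scores):
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        span = (hi - lo) or 1.0  # avoid division by zero when all equal
        return {k: (v - lo) / span for k, v in scores.items()}

    kw, sem = normalize(bm25_scores), normalize(semantic_scores)
    fused = {
        cid: w_kw * kw.get(cid, 0.0) + w_sem * sem.get(cid, 0.0)
        for cid in set(kw) | set(sem)
    }
    return sorted(fused, key=fused.get, reverse=True)
```

A chunk that appears in both result lists gets credit from both retrievers, so it tends to outrank chunks found by only one of them.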

LLM Options (Grounded Answers)

  • Ollama (local):
export OLLAMA_MODEL="llama3.1:8b"
  • OpenAI (hosted):
export OPENAI_API_KEY="your_key"
export OPENAI_MODEL="gpt-4o-mini"

If no LLM is configured, the UI returns evidence-only answers.
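The fallback behavior above suggests dispatch logic along these lines. The env var names (`OLLAMA_MODEL`, `OPENAI_API_KEY`) come from this README; the function name and the precedence order are illustrative assumptions.

```python
import os

def choose_answer_mode() -> str:
    """Pick an answer backend from the environment, falling back gracefully."""
    if os.environ.get("OLLAMA_MODEL"):
        return "ollama"        # local model served by Ollama
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"        # hosted model via the OpenAI API
    return "evidence-only"     # no LLM configured: return retrieved passages
```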

Optional: set a salt for hashing.

export EHR_HASH_SALT="your_secret_salt"

Population Dashboard (optional)

pip install -r requirements-ui.txt
streamlit run dashboard/app.py

Outputs

  • data/processed/dim_patient.csv
  • data/processed/fact_*
  • data/processed/mart_patient_360.csv
  • data/processed/ehr.db (SQLite)
  • data/processed/notes.faiss or data/processed/tfidf.pkl (vector index)
