🌠 Hephaestus - Data analytics tools for Digital Health!

Hephaestus was the god of fire, metalworking, stone masonry, forges and the art of sculpture.



Efficient and effective health data warehousing and analysis require a common data model.

The OHDSI - OMOP Common Data Model allows for the systematic analysis of disparate observational databases and EMRs. The data from disparate systems needs to be extracted, transformed and loaded into a CDM database. Once a database has been converted to the OMOP CDM, evidence can be generated using standardized analytics tools.
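As a rough illustration of the "transform" step, the sketch below maps one record from a hypothetical EMR export onto a few columns of the OMOP CDM `person` table. The source field names (`patient_id`, `sex`, `birthdate`) and the helper `to_omop_person` are assumptions for illustration, not Hephaestus code. The gender concept IDs 8507 (male) and 8532 (female) are OMOP standard concepts, and 0 is the CDM convention for "no matching concept".

```python
from datetime import date

# OMOP standard gender concepts: 8507 = MALE, 8532 = FEMALE.
# concept_id 0 is the OMOP convention for "no matching concept".
GENDER_CONCEPTS = {"M": 8507, "F": 8532}


def to_omop_person(source: dict) -> dict:
    """Transform one source-EMR patient record into an OMOP person row.

    The source field names (patient_id, sex, birthdate) are hypothetical;
    every EMR needs its own field mapping.
    """
    birth = date.fromisoformat(source["birthdate"])
    sex = source.get("sex", "")
    return {
        "person_id": int(source["patient_id"]),
        "gender_concept_id": GENDER_CONCEPTS.get(sex.upper(), 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
        "race_concept_id": 0,       # not captured in this toy source
        "ethnicity_concept_id": 0,  # not captured in this toy source
        "person_source_value": str(source["patient_id"]),
        "gender_source_value": sex,  # keep the raw value for traceability
    }


row = to_omop_person({"patient_id": "42", "sex": "f", "birthdate": "1980-05-17"})
print(row["gender_concept_id"], row["year_of_birth"])  # 8532 1980
```

Keeping the raw value in `gender_source_value` alongside the mapped concept follows the CDM pattern of storing both the source value and the standard concept.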

Each data source requires customized ETL tools for this conversion. 🌠 Hephaestus is a tool for this ETL process, organized into modules to allow code reuse between ETL tools for various open-source EMR systems and data sources. 🌠 Hephaestus uses SQLAlchemy for database connections and for automapping tables to classes, and bonobo for managing ETL. Hephaestus also aims to support common machine learning workflows, such as model building with Apache Spark.
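A minimal sketch of the extract, transform and load stages that bonobo chains together, written with plain Python generators and an in-memory SQLite database so it runs without bonobo or SQLAlchemy installed. The staging table `person_stage` and all function names are hypothetical. Rather than a hand-written INSERT, the real tool would likely load through the classes SQLAlchemy automaps from the target tables.

```python
import sqlite3
from typing import Iterable, Iterator


def extract(rows: Iterable[tuple]) -> Iterator[tuple]:
    """Extract: yield raw rows from the source (an in-memory list here)."""
    yield from rows


def transform(rows: Iterable[tuple]) -> Iterator[tuple]:
    """Transform: clean and type-convert each row for the target table."""
    for patient_id, sex, year in rows:
        yield int(patient_id), sex.strip().upper(), int(year)


def load(rows: Iterable[tuple], conn: sqlite3.Connection) -> int:
    """Load: insert all rows into the staging table and return the count."""
    batch = list(rows)
    conn.executemany(
        "INSERT INTO person_stage (person_id, sex, year_of_birth) VALUES (?, ?, ?)",
        batch,
    )
    conn.commit()
    return len(batch)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person_stage (person_id INTEGER, sex TEXT, year_of_birth INTEGER)")
source = [("1", " f", "1980"), ("2", "M ", "1975")]
n = load(transform(extract(source)), conn)
print(n, conn.execute("SELECT sex FROM person_stage ORDER BY person_id").fetchall())
# 2 [('F',), ('M',)]
```

Because each stage only consumes and produces rows, the same `transform` can be reused across sources, which is the kind of code reuse between ETL tools described above.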

Design principles

  • Support common functions such as creating OMOP table structure from the command line.
  • Use ORM (SQLAlchemy) and ETL (bonobo) libraries to reduce boilerplate code and make code extensible and reusable.
  • Support ETL, from the command line, for common open-source EMRs such as OpenMRS and OSCAR EMR, and for national-level health databases such as the Discharge Abstract Database (Canada).
  • Create libraries to support common use cases such as vocabulary lookup and Cui2Vec-based concept similarity search.
  • Support patient-level predictions.
  • Extend OMOP for public health use cases and support cohort-level predictions using MLlib (Spark's machine learning library).
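Creating the OMOP table structure from the command line could look roughly like the sketch below. It is an illustration under stated assumptions, not Hephaestus's actual CLI: the `--db` flag, the function names and the use of SQLite are hypothetical, and only a subset of the CDM `person` columns is created.

```python
import argparse
import sqlite3

# A subset of the OMOP CDM person table; the full DDL has more columns.
PERSON_DDL = """
CREATE TABLE IF NOT EXISTS person (
    person_id            INTEGER PRIMARY KEY,
    gender_concept_id    INTEGER NOT NULL,
    year_of_birth        INTEGER NOT NULL,
    month_of_birth       INTEGER,
    day_of_birth         INTEGER,
    race_concept_id      INTEGER NOT NULL,
    ethnicity_concept_id INTEGER NOT NULL,
    person_source_value  TEXT,
    gender_source_value  TEXT
);
"""


def create_tables(db_path: str) -> list:
    """Run the DDL against db_path and return the table names that exist."""
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(PERSON_DDL)
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
        return [name for (name,) in rows]
    finally:
        conn.close()


def main(argv=None) -> list:
    """Hypothetical command-line entry point."""
    parser = argparse.ArgumentParser(description="Create OMOP CDM tables (sketch).")
    parser.add_argument("--db", default="omop.sqlite", help="SQLite file to create tables in")
    args = parser.parse_args(argv)
    names = create_tables(args.db)
    print("created:", ", ".join(names))
    return names
```

Calling `main(["--db", "omop.sqlite"])`, or wiring `main` to a console-script entry point, would create the table in a file. The full CDM DDL published by OHDSI covers every table and column.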


Features (Expected)

ETL tools for open-source EMRs (OpenMRS and OSCAR EMR) and the Discharge Abstract Database (Canada)

Cui2Vec-based concept similarity search (hephaestus/vocabulary folder)
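Concept similarity search over cui2vec embeddings typically ranks concepts by the cosine similarity of their vectors. The sketch below shows that ranking with toy three-dimensional vectors and placeholder CUIs (`CUI_A` and so on). The data and the function names are illustrative assumptions, not values or code from Hephaestus or the cui2vec release.

```python
import math

# Toy 3-dimensional vectors standing in for cui2vec embeddings (real
# cui2vec vectors are much higher-dimensional); keys are placeholder CUIs.
EMBEDDINGS = {
    "CUI_A": [0.9, 0.1, 0.0],
    "CUI_B": [0.8, 0.2, 0.1],
    "CUI_C": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(cui, k=2):
    """Rank every other concept by cosine similarity to `cui`; return the top k."""
    query = EMBEDDINGS[cui]
    scores = [(other, cosine(query, vec)) for other, vec in EMBEDDINGS.items() if other != cui]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:k]


print([c for c, _ in most_similar("CUI_A")])  # ['CUI_B', 'CUI_C']
```

With tens of thousands of concepts, a linear scan like this would usually be replaced by a matrix multiplication or an approximate nearest-neighbour index.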

Spark ML-based model building, with tools for deploying models on a serverless framework

How to contribute and use:

Hephaestus is a work in progress. Please read for more information on joining this project.

What it does


  • Work in progress



Deployment artifacts

  • Work in progress

How to install

Work in progress

How to Use

  • Use OHDSIconceptid2cui to create the mapping table ohdsi_to_cui in the vocabulary schema for all cui2vec functions of Hephaestus.
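Once the `ohdsi_to_cui` table exists, a cui2vec function would first translate an OHDSI `concept_id` into a CUI. The sketch below shows such a lookup with SQLite, where an attached in-memory database stands in for the `vocabulary` schema. The column names `concept_id` and `cui`, the placeholder rows and the helper `concept_to_cui` are assumptions, not the table layout that OHDSIconceptid2cui actually produces.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no schemas, so an attached database plays the role of the
# `vocabulary` schema. The column names below are assumptions.
conn.execute("ATTACH DATABASE ':memory:' AS vocabulary")
conn.execute("CREATE TABLE vocabulary.ohdsi_to_cui (concept_id INTEGER, cui TEXT)")
conn.executemany(
    "INSERT INTO vocabulary.ohdsi_to_cui VALUES (?, ?)",
    [(1001, "CUI_A"), (1002, "CUI_B")],  # placeholder rows, not real mappings
)


def concept_to_cui(conn, concept_id):
    """Return the CUI mapped to an OHDSI concept_id, or None if unmapped."""
    row = conn.execute(
        "SELECT cui FROM vocabulary.ohdsi_to_cui WHERE concept_id = ?",
        (concept_id,),
    ).fetchone()
    return row[0] if row else None


print(concept_to_cui(conn, 1001), concept_to_cui(conn, 9999))  # CUI_A None
```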
  • WIP

Command-line options

Work in progress

Command   Alternate   Description
--inp     -i          Input file in the text format with Topic
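The documented option could be declared with Python's `argparse` roughly as follows. This is a sketch: only `--inp`/`-i` and its description come from the table above, while the program name `hephaestus` and the `build_parser` helper are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser exposing the documented --inp / -i option."""
    parser = argparse.ArgumentParser(prog="hephaestus")
    # The first long option ("--inp") determines the attribute name: args.inp
    parser.add_argument("--inp", "-i", help="Input file in the text format with Topic")
    return parser


args = build_parser().parse_args(["-i", "topics.txt"])
print(args.inp)  # topics.txt
```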

Contributors and other projects


Eapen, Bell Raj and contributors. Hephaestus - Data warehouse and ETL tools for open source EMRs. GitHub repository.


Parts of this material are based on the Canadian Institute for Health Information Discharge Abstract Database Research Analytic Files (sampled from fiscal years 2014-15). However, the analysis, conclusions, opinions and statements expressed herein are those of the author(s) and not those of the Canadian Institute for Health Information.