
hl7Spark

Analyze HL7 data using Spark Streaming

This is an implementation of HL7 data analysis using Spark. The architecture is as follows:

HL7 message simulator -> TCP port/remote disk -> Python parser (converts to JSON) -> Kafka -> Spark Streaming -> HBase (persistent storage)
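
For context, an HL7 v2 message is a set of pipe-delimited segments, and the parser emits one JSON document per message. The Python sketch below only illustrates that idea on a single PID segment; the field names and output shape are illustrative assumptions, not the actual output of hl72JSON.py:

    import json

    # One raw HL7 v2 PID segment (made-up sample data).
    segment = "PID|1||12345^^^HOSP^MR||Doe^John||19800101|M|||123 Main St^^Metropolis^NY^10001"

    fields = segment.split("|")
    name_parts = fields[5].split("^")      # PID-5: patient name (family^given)
    address_parts = fields[11].split("^")  # PID-11: patient address components

    record = {
        "patient_id": fields[3].split("^")[0],                # PID-3: patient identifier
        "name": " ".join(reversed(name_parts[:2])),           # "John Doe"
        "address": ", ".join(p for p in address_parts if p),
        "segment": fields[0],
    }
    print(json.dumps(record))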

In order to generate sample data onto local disk, run this file:

    ./HL7DataParser/simulator-hl7/run.sh

Once you have the data on the local disk, you can use this python program to parse the individual messages:

    python HL7DataParser/hl72JSON.py

Spark Streaming reads the data from a Kafka topic and converts the JSON payload into rows in two HBase tables. You will need to integrate the Python parser so that it writes its output to a Kafka topic; a sketch of that follows.
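
One straightforward way to do that integration is with the kafka-python client. This is a minimal sketch, assuming a broker on localhost:9092, the hl7-messages topic used later in this README, and a parsed_messages list standing in for whatever hl72JSON.py produces; none of these names are part of the repo itself:

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    # Broker address is an assumption; point it at your cluster.
    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Stand-in for the parser's output: one dict per HL7 message.
    parsed_messages = [
        {"pid": "12345", "segment": "PID", "value": "John Doe", "date": "2016-01-01"},
    ]

    for msg in parsed_messages:
        producer.send("hl7-messages", msg)

    producer.flush()  # block until all messages are actually delivered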

The HBase tables have the following schema:

            Patient Contact Details: 
                 - Patient ID
                 - Patient Name
                 - Address 


            HL7 Events Table 
                 - Patient ID 
                 - Segment
                 - Value
                 - Date

You will need to create those tables in HBase. The following are the DDLs I used, via Phoenix:

            DROP TABLE IF EXISTS DETAILS.PATIENT;
            create table DETAILS.PATIENT (
                    id VARCHAR not null,
                    "cf".name VARCHAR,
                    "cf".address VARCHAR,
                    CONSTRAINT PK PRIMARY KEY(id));


            DROP TABLE IF EXISTS DETAILS.HL7MESSAGES;
            create table DETAILS.HL7MESSAGES (
                    pid VARCHAR not null,
                    segment VARCHAR,
                    value VARCHAR,
                    date DATE,
                    CONSTRAINT PK PRIMARY KEY(pid, date, segment));


Alternatively, a more granular events table that stores each segment field as its own row:

            DROP TABLE IF EXISTS HL7MESSAGES;
            create table HL7MESSAGES (
                 pid VARCHAR not null,
                 segmentsid VARCHAR not null,
                 segmentsseq smallint not null,
                 segmentsfieldsid VARCHAR not null,
                 datevalue DATE not null,
                 segmentsvalue VARCHAR,
                 segmentsfieldsvalue VARCHAR, 
                 CONSTRAINT PK PRIMARY KEY(pid, segmentsid, segmentsseq, segmentsfieldsid, datevalue));
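
Once the tables exist and the streaming job is running, one way to spot-check that rows are arriving is to read a table back through the phoenix-spark connector. The PySpark snippet below is only a sketch for that check, not part of the repo (the actual writes are done by poc.SparkStreamingTest2 in Spark-POC-0.0.2.jar); it assumes pyspark was launched with the same Phoenix and HBase jars used in the spark-submit command further down, and it reuses the zkUrl from that command:

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="hl7-table-check")
    sqlContext = SQLContext(sc)

    # Table name comes from the DDL above, zkUrl from the spark-submit example below.
    df = (sqlContext.read
          .format("org.apache.phoenix.spark")
          .option("table", "DETAILS.HL7MESSAGES")
          .option("zkUrl", "localhost:2181:/hbase-unsecure")
          .load())

    df.show(10)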

Create a Kafka topic named hl7-messages for the parsed messages.
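If the topic does not exist yet, it can be created first; a minimal sketch for a single-node test setup (the partition and replication counts are placeholder assumptions) is:

    kafka-topics.sh --zookeeper localhost:2181 --create --topic hl7-messages --partitions 1 --replication-factor 1

With the topic in place, the command below shortens its retention so old test messages are purged quickly: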

    kafka-topics.sh --zookeeper localhost:2181   --alter --topic hl7-messages --config retention.ms=1000

Before you start the Spark Streaming program, make sure all the dependencies and HBase jars are on the classpath. For example:

    spark-submit --class poc.SparkStreamingTest2 \
      --conf "spark.executor.extraJavaOptions=-Dsun.io.serialization.extendedDebugInfo=true" \
      --master yarn-cluster \
      --files=/usr/hdp/current/hive/conf/hive-site.xml,/usr/hdp/current/hbase/conf/hbase-site.xml \
      --jars /usr/hdp/current/hbase/lib/hbase-server.jar,/usr/hdp/current/hbase/lib/hbase-client-1.1.2.current.jar,/usr/hdp/current/hbase/lib/hbase-common-1.1.2.current.jar,/usr/hdp/current/hbase/lib/hbase-protocol-1.1.2.current.jar,/usr/hdp/current/phoenix/lib/phoenix-core-4.4.0.current.jar,/usr/hdp/current/phoenix/lib/phoenix-spark-4.4.0.current.jar,/usr/hdp/current/spark/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark/lib/datanucleus-core-3.2.10.jar,/usr/hdp/current/spark/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/hive/lib/hive-exec-1.2.1000.current.jar \
      /hl7Spark/Spark-POC-0.0.2.jar hdfs:///user/spark/sparkJobJsonInput localhost:2181:/hbase-unsecure 15