System requirements: 4 GB RAM, 20 GB HDD, Ubuntu 14.04 LTS
Software stack: Java 1.8, Hadoop 2.6.0, Pig 0.15.0, Hive 1.2.1, HBase 0.98.4-hadoop2, MySQL database, Sqoop 1.4.6
STEP 1: Install the compatible version of HBase (0.98.4-hadoop2) on the machine and add its bin directory to the PATH in the .bashrc file.
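The .bashrc update might look like the lines below; the install path is an assumption, so adjust it to wherever HBase was actually extracted:

```shell
# Assumed HBase install location -- adjust to your actual extract directory.
export HBASE_HOME=/home/user/INSTALL/hbase-0.98.4-hadoop2
# Put the HBase binaries on the PATH so the hbase command resolves.
export PATH=$PATH:$HBASE_HOME/bin
```

After appending these lines to ~/.bashrc, run `source ~/.bashrc` and verify the install with `hbase version`.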
STEP 2: Place the following jar files in (a) /home/user/INSTALL/hadoop-2.6.0/lib and (b) /home/user/INSTALL/hadoop-2.6.0/share/hadoop/common:
- commons-logging-1.1.1
- fontbox-1.8.10
- hadoop-common-2.6.0
- hadoop-mapreduce-client-core-2.6.0
- hadoop-mapreduce-examples-2.6.0
- jempbox-1.8.10
- pdfbox-1.8.10
- pdfbox-app-1.8.10
- preflight-1.8.10
- preflight-app-1.8.10
- xmpbox-1.8.10
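The copy step above can be sketched as a small loop; the Hadoop location is taken from the paths listed, and the loop warns rather than aborts if a jar is missing from the current directory:

```shell
# Assumed Hadoop location from the paths above; override via HADOOP_HOME if different.
HADOOP_HOME=${HADOOP_HOME:-/home/user/INSTALL/hadoop-2.6.0}

# The eleven jars listed above (.jar extension implied).
JARS="commons-logging-1.1.1 fontbox-1.8.10 hadoop-common-2.6.0 \
hadoop-mapreduce-client-core-2.6.0 hadoop-mapreduce-examples-2.6.0 \
jempbox-1.8.10 pdfbox-1.8.10 pdfbox-app-1.8.10 preflight-1.8.10 \
preflight-app-1.8.10 xmpbox-1.8.10"

# Copy each jar into both target directories; warn instead of aborting
# if a jar is not present in the current directory.
for j in $JARS; do
  for d in "$HADOOP_HOME/lib" "$HADOOP_HOME/share/hadoop/common"; do
    cp "$j.jar" "$d/" 2>/dev/null || echo "WARN: $j.jar not copied to $d"
  done
done
```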
STEP 3: Go to the /home/user directory and place the following files there:
- connectiondetails.txt
- healthcareprocessing.sh
- copyToHdfs.sh
- mapreduce.sh
- healthscript.pig
- pig.sh
- hive.hql
- hive.sh
- hbase.sh
- hbase2.sh
- mysql
- mysql.sql
- mysql1.sh
- sqoop.sh
- INPUTDATA.pdf
- parameter.properties
- HealthCare.jar(runnable jar)
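Before launching the driver script it is worth confirming that every file listed above is actually in place; a minimal sanity check, assuming the working directory is /home/user:

```shell
# Sanity-check that every file the driver script depends on is present.
# WORKDIR is assumed to be /home/user; override if the files live elsewhere.
WORKDIR=${WORKDIR:-/home/user}
FILES="connectiondetails.txt healthcareprocessing.sh copyToHdfs.sh \
mapreduce.sh healthscript.pig pig.sh hive.hql hive.sh hbase.sh hbase2.sh \
mysql mysql.sql mysql1.sh sqoop.sh INPUTDATA.pdf parameter.properties \
HealthCare.jar"

missing=0
for f in $FILES; do
  if [ ! -e "$WORKDIR/$f" ]; then
    echo "MISSING: $WORKDIR/$f"
    missing=1
  fi
done
if [ $missing -eq 0 ]; then
  echo "All files in place."
fi
```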
STEP 4: Run the driver script (sh healthcareprocessing.sh). It performs the following stages:
--Loads the unprocessed data (PDF) from the local file system (LFS) into HDFS.
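This LFS-to-HDFS copy (the copyToHdfs.sh step) might look like the sketch below; the HDFS target directory /healthcare/input is an assumption, not confirmed by the source:

```shell
# Write a sketch of the LFS-to-HDFS copy step to a temp file so it can be
# syntax-checked without a running cluster. The HDFS path is an assumption.
cat > /tmp/copyToHdfs_sketch.sh <<'EOF'
#!/bin/sh
# Create the input directory in HDFS (no-op if it already exists).
hdfs dfs -mkdir -p /healthcare/input
# Push the unprocessed PDF from the local file system into HDFS.
hdfs dfs -put -f /home/user/INPUTDATA.pdf /healthcare/input/
EOF
# Syntax-check only; actually running it requires a live Hadoop cluster.
sh -n /tmp/copyToHdfs_sketch.sh && echo "sketch OK"
```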
--Processes the input (unprocessed) data using Hadoop MapReduce.
--Filters the MapReduce output in Pig: the mapper output is loaded into Pig, duplicates are removed with DISTINCT, the data is grouped by HospitalName, the top 50 records per hospital with ages between 20 and 55 are taken, and the unique data is sorted by PatientID.
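The Pig stage just described could look roughly like the sketch below; the field names (PatientID, HospitalName, age), the delimiter, and the HDFS paths are all assumptions inferred from the description, and the real script is healthscript.pig:

```shell
# Emit a Pig Latin sketch of the filtering stage to a temp file.
# Schema, delimiter, and paths are assumptions; see healthscript.pig.
cat > /tmp/healthscript_sketch.pig <<'EOF'
-- Load the MapReduce output (assumed comma-delimited schema).
raw = LOAD '/healthcare/mr-output' USING PigStorage(',')
      AS (PatientID:int, HospitalName:chararray, age:int);
-- Remove duplicate records.
uniq = DISTINCT raw;
-- Keep patients aged between 20 and 55.
aged = FILTER uniq BY age >= 20 AND age <= 55;
-- Group by hospital and take 50 records per hospital.
byhosp = GROUP aged BY HospitalName;
top50 = FOREACH byhosp {
    s = LIMIT aged 50;
    GENERATE FLATTEN(s);
};
-- Sort the result by PatientID and store it.
sorted = ORDER top50 BY s::PatientID;
STORE sorted INTO '/healthcare/pig-output' USING PigStorage(',');
EOF
echo "wrote /tmp/healthscript_sketch.pig"
```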
--Stores the processed data in MySQL, HBase, and Hive tables: a MySQL table is created and the Pig output is exported to it using Sqoop; a Hive external table is created and the Pig output is loaded into it for ad hoc query processing.
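The export and Hive steps might be sketched as below; the table name, column list, database, and credentials are placeholders (the real values live in connectiondetails.txt and parameter.properties), and the real scripts are sqoop.sh and hive.hql:

```shell
# Sqoop export sketch: push the Pig output into MySQL.
# Database, table, and credentials are placeholders, not the real values.
cat > /tmp/sqoop_sketch.sh <<'EOF'
#!/bin/sh
sqoop export \
  --connect jdbc:mysql://localhost/healthcare \
  --username root -P \
  --table patient_records \
  --export-dir /healthcare/pig-output \
  --input-fields-terminated-by ','
EOF

# Hive external table sketch over the same Pig output directory.
# Column names mirror the assumed Pig schema above.
cat > /tmp/hive_sketch.hql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS patient_records (
  PatientID INT,
  HospitalName STRING,
  age INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/healthcare/pig-output';
EOF
# Syntax-check the shell wrapper only; running it needs live services.
sh -n /tmp/sqoop_sketch.sh && echo "sketches written"
```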