System requirements: 4 GB RAM, 20 GB HDD, Ubuntu 14.04 LTS
Software stack: Java 1.8, Hadoop 2.6.0, Pig 0.15.0, Hive 1.2.1, HBase 0.98.4-hadoop2, MySQL database, Sqoop 1.4.6
STEP 1: Install the compatible version of HBase (0.98.4-hadoop2) on the machine and add its bin directory to the PATH in the .bashrc file.
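The .bashrc update might look like the lines below; the install path is an assumption, so adjust it to wherever HBase was actually extracted:

```shell
# Assumed HBase install location -- adjust to your actual extract directory.
export HBASE_HOME=/home/user/INSTALL/hbase-0.98.4-hadoop2
# Put the HBase binaries on the PATH so the hbase command resolves.
export PATH=$PATH:$HBASE_HOME/bin
```

After appending these lines to ~/.bashrc, run `source ~/.bashrc` and verify the install with `hbase version`.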
STEP 2: Place the following jar files in (a) /home/user/INSTALL/hadoop-2.6.0/lib and (b) /home/user/INSTALL/hadoop-2.6.0/share/hadoop/common:
- commons-logging-1.1.1
- fontbox-1.8.10
- hadoop-common-2.6.0
- hadoop-mapreduce-client-core-2.6.0
- hadoop-mapreduce-examples-2.6.0
- jempbox-1.8.10
- pdfbox-1.8.10
- pdfbox-app-1.8.10
- preflight-1.8.10
- preflight-app-1.8.10
- xmpbox-1.8.10
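The copy step above can be sketched as a small loop; the Hadoop location is taken from the paths listed, and the loop warns rather than aborts if a jar is missing from the current directory:

```shell
# Assumed Hadoop location from the paths above; override via HADOOP_HOME if different.
HADOOP_HOME=${HADOOP_HOME:-/home/user/INSTALL/hadoop-2.6.0}

# The eleven jars listed above (.jar extension implied).
JARS="commons-logging-1.1.1 fontbox-1.8.10 hadoop-common-2.6.0 \
hadoop-mapreduce-client-core-2.6.0 hadoop-mapreduce-examples-2.6.0 \
jempbox-1.8.10 pdfbox-1.8.10 pdfbox-app-1.8.10 preflight-1.8.10 \
preflight-app-1.8.10 xmpbox-1.8.10"

# Copy each jar into both target directories; warn instead of aborting
# if a jar is not present in the current directory.
for j in $JARS; do
  for d in "$HADOOP_HOME/lib" "$HADOOP_HOME/share/hadoop/common"; do
    cp "$j.jar" "$d/" 2>/dev/null || echo "WARN: $j.jar not copied to $d"
  done
done
```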
STEP 3: Go to the /home/user directory and place the following files there:
- connectiondetails.txt
- healthcareprocessing.sh
- copyToHdfs.sh
- mapreduce.sh
- healthscript.pig
- pig.sh
- hive.hql
- hive.sh
- hbase.sh
- hbase2.sh
- mysql
- mysql.sql
- mysql1.sh
- sqoop.sh
- INPUTDATA.pdf
- parameter.properties
- HealthCare.jar(runnable jar)
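Before launching the driver script it is worth confirming that every file listed above is actually in place; a minimal sanity check, assuming the working directory is /home/user:

```shell
# Sanity-check that every file the driver script depends on is present.
# WORKDIR is assumed to be /home/user; override if the files live elsewhere.
WORKDIR=${WORKDIR:-/home/user}
FILES="connectiondetails.txt healthcareprocessing.sh copyToHdfs.sh \
mapreduce.sh healthscript.pig pig.sh hive.hql hive.sh hbase.sh hbase2.sh \
mysql mysql.sql mysql1.sh sqoop.sh INPUTDATA.pdf parameter.properties \
HealthCare.jar"

missing=0
for f in $FILES; do
  if [ ! -e "$WORKDIR/$f" ]; then
    echo "MISSING: $WORKDIR/$f"
    missing=1
  fi
done
if [ $missing -eq 0 ]; then
  echo "All files in place."
fi
```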
STEP 4: Run the driver script (sh healthcareprocessing.sh). It performs the following stages:
--Loads the unprocessed data (PDF) from the local file system (LFS) into HDFS.
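This LFS-to-HDFS copy (the copyToHdfs.sh step) might look like the sketch below; the HDFS target directory /healthcare/input is an assumption, not confirmed by the source:

```shell
# Write a sketch of the LFS-to-HDFS copy step to a temp file so it can be
# syntax-checked without a running cluster. The HDFS path is an assumption.
cat > /tmp/copyToHdfs_sketch.sh <<'EOF'
#!/bin/sh
# Create the input directory in HDFS (no-op if it already exists).
hdfs dfs -mkdir -p /healthcare/input
# Push the unprocessed PDF from the local file system into HDFS.
hdfs dfs -put -f /home/user/INPUTDATA.pdf /healthcare/input/
EOF
# Syntax-check only; actually running it requires a live Hadoop cluster.
sh -n /tmp/copyToHdfs_sketch.sh && echo "sketch OK"
```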
--Processes the input (unprocessed) data using Hadoop MapReduce.
--Filters the MapReduce output in Pig: the mapper output is loaded into Pig, duplicates are removed with DISTINCT, the data is grouped by HospitalName, the top 50 records per hospital with ages between 20 and 55 are taken, and the unique data is sorted by PatientID.
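The Pig stage just described could look roughly like the sketch below; the field names (PatientID, HospitalName, age), the delimiter, and the HDFS paths are all assumptions inferred from the description, and the real script is healthscript.pig:

```shell
# Emit a Pig Latin sketch of the filtering stage to a temp file.
# Schema, delimiter, and paths are assumptions; see healthscript.pig.
cat > /tmp/healthscript_sketch.pig <<'EOF'
-- Load the MapReduce output (assumed comma-delimited schema).
raw = LOAD '/healthcare/mr-output' USING PigStorage(',')
      AS (PatientID:int, HospitalName:chararray, age:int);
-- Remove duplicate records.
uniq = DISTINCT raw;
-- Keep patients aged between 20 and 55.
aged = FILTER uniq BY age >= 20 AND age <= 55;
-- Group by hospital and take 50 records per hospital.
byhosp = GROUP aged BY HospitalName;
top50 = FOREACH byhosp {
    s = LIMIT aged 50;
    GENERATE FLATTEN(s);
};
-- Sort the result by PatientID and store it.
sorted = ORDER top50 BY s::PatientID;
STORE sorted INTO '/healthcare/pig-output' USING PigStorage(',');
EOF
echo "wrote /tmp/healthscript_sketch.pig"
```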
--Stores the processed data in MySQL, HBase, and Hive tables: a MySQL table is created and the Pig output is exported to it using Sqoop; a Hive external table is created and the Pig output is loaded into it for ad hoc query processing.
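The export and Hive steps might be sketched as below; the table name, column list, database, and credentials are placeholders (the real values live in connectiondetails.txt and parameter.properties), and the real scripts are sqoop.sh and hive.hql:

```shell
# Sqoop export sketch: push the Pig output into MySQL.
# Database, table, and credentials are placeholders, not the real values.
cat > /tmp/sqoop_sketch.sh <<'EOF'
#!/bin/sh
sqoop export \
  --connect jdbc:mysql://localhost/healthcare \
  --username root -P \
  --table patient_records \
  --export-dir /healthcare/pig-output \
  --input-fields-terminated-by ','
EOF

# Hive external table sketch over the same Pig output directory.
# Column names mirror the assumed Pig schema above.
cat > /tmp/hive_sketch.hql <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS patient_records (
  PatientID INT,
  HospitalName STRING,
  age INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/healthcare/pig-output';
EOF
# Syntax-check the shell wrapper only; running it needs live services.
sh -n /tmp/sqoop_sketch.sh && echo "sketches written"
```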