
Hybrid Hadoop


On existing Hadoop installations, an alternative approach is to add virtual machines that interact with the Hadoop components (Spark, HDFS) through a gateway node. This approach is recommended for customers whose Hadoop environment hosts heterogeneous use cases and who want minimal deviation from existing node roles. The trade-off is that the virtual machines must be sized correctly for their workloads.
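As a rough illustration of what "interacting through a gateway node" means in practice, the sketch below starts a Spark session from a gateway VM that carries only Hadoop client configuration and submits the work to the existing cluster over YARN. It assumes a recent PySpark client, a YARN-managed cluster, and that HADOOP_CONF_DIR points at the cluster's *-site.xml files; the configuration directory, HDFS path, and application name are placeholders, not part of the ONI installation itself.

```python
# Minimal sketch: run Spark from a gateway VM against the existing cluster.
# Assumptions (not from the ONI docs): Spark and the Hadoop client configs
# are installed on the gateway VM, and the cluster resource manager is YARN.
import os
from pyspark.sql import SparkSession

# The gateway VM holds only client configuration; execution happens on the
# existing Hadoop cluster via YARN.
os.environ.setdefault("HADOOP_CONF_DIR", "/etc/hadoop/conf")  # hypothetical path

spark = (
    SparkSession.builder
    .appName("gateway-smoke-test")                  # hypothetical name
    .master("yarn")                                 # submit to the existing cluster
    .config("spark.submit.deployMode", "client")    # driver stays on the gateway VM
    .getOrCreate()
)

# Read a path on the cluster's HDFS through the gateway configuration.
df = spark.read.text("hdfs:///tmp/sample.txt")      # hypothetical path
print(df.count())

spark.stop()
```

If this job runs and returns a count, the gateway VM's client configuration can reach both YARN and the cluster's HDFS, which is the core requirement of the hybrid layout.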


In addition to the services deployed on the existing cluster, additional virtual machines (VMs) are required to host the non-Hadoop functions of the solution. The gateway service must be installed on some of these VMs so they can interact with Spark, Hive, and HDFS.
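To make the gateway requirement concrete, the hedged sketch below checks from one of these VMs that the standard HDFS and Hive client tools can reach the cluster. It shells out to the `hdfs` and `beeline` clients, which the gateway packages provide; the HiveServer2 URL and HDFS path are hypothetical and would be replaced with the values for the existing cluster.

```python
# Minimal connectivity check from a gateway VM (sketch, not ONI code).
# Assumptions: the HDFS and Hive (beeline) client packages are installed on
# this VM and HiveServer2 listens at the hypothetical host/port below.
import subprocess

def check_hdfs(path="/user"):
    # `hdfs dfs -ls` succeeds only if the gateway's Hadoop configuration
    # can reach the cluster's NameNode.
    return subprocess.run(["hdfs", "dfs", "-ls", path]).returncode == 0

def check_hive(jdbc_url="jdbc:hive2://hiveserver2.example.com:10000"):
    # beeline runs a trivial query against HiveServer2 over JDBC.
    return subprocess.run(
        ["beeline", "-u", jdbc_url, "-e", "SHOW DATABASES;"]
    ).returncode == 0

if __name__ == "__main__":
    print("HDFS reachable:", check_hdfs())
    print("Hive reachable:", check_hive())
```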

Note: While the above is the recommended layout for production, pilot deployments may combine these roles into fewer VMs. Each component of the Open Network Insight solution interacts closely with Hadoop, but with this approach its non-Hadoop processing and memory requirements can be kept separate from the cluster.