On existing Hadoop installations, an alternative approach is to deploy additional virtual machines that interact with Hadoop components (Spark, HDFS) through a gateway node. This approach is recommended for customers whose Hadoop environment hosts heterogeneous use cases and who want minimal deviation from existing node roles. The disadvantage is that the virtual machines must be sized properly for their workloads.
In addition to the services deployed on the existing cluster, extra virtual machines (VMs) are required to host the non-Hadoop functions of the solution. Some of these VMs need the gateway service so that they can interact with Spark, Hive, and HDFS.
Note: While the layout above is recommended for production, pilot deployments may combine these roles into fewer VMs. Each component of the Open Network Insight solution interacts closely with Hadoop, but with this approach its non-Hadoop processing and memory requirements are separable from the cluster.
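
One way to confirm that a VM has the gateway service in place is to check that the Hadoop client commands are on its PATH. The following is a minimal sketch, not part of the Open Network Insight tooling: the function name and the list of commands (`hdfs`, `spark-submit`, `hive`) are assumptions chosen for illustration and may differ depending on the Hadoop distribution.

```python
import shutil

# Hypothetical list of client commands a gateway VM would expose;
# adjust it to match the Hadoop distribution actually in use.
REQUIRED_CLIENTS = ("hdfs", "spark-submit", "hive")


def missing_gateway_clients(required=REQUIRED_CLIENTS, which=shutil.which):
    """Return the commands from `required` that are not found on PATH.

    `which` is injectable so the check can be exercised without Hadoop installed.
    """
    return [cmd for cmd in required if which(cmd) is None]


if __name__ == "__main__":
    missing = missing_gateway_clients()
    if missing:
        print("Gateway clients not found on PATH:", ", ".join(missing))
    else:
        print("All gateway clients are available.")
```

Running the script on each VM that needs to reach Spark, Hive, or HDFS shows which client commands are absent. It only verifies that the commands exist, not that they are configured against the correct cluster.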