
Chef Bach Development Structure


This wiki page drives the future direction for how we want to structure our code base and carry out development. The initial idea was posted by Clay Baenziger under Issue #89. This document is an expansion of the original idea with some additional desired features.

Background

Chef-bach is Bloomberg's open source project to deploy and manage distributed clusters. The key component of chef-bach is Chef, an automation tool for infrastructure deployment and management. The current chef-bach code is available on GitHub. As of today the code deploys the following distributed clusters:

  • Hadoop cluster, which includes
    • HDFS
    • YARN
    • HBase
    • Hive
    • Pig
    • Spark
    • ZooKeeper
    • Oozie
    • HTTPFS
  • Kafka cluster, which includes
    • ZooKeeper
    • Kafka broker

Apart from deploying distributed clusters, chef-bach also provides cookbooks and recipes for lower-level configuration such as the Ubuntu OS, networking, and disk layout, which in turn can be used to run any other desired distributed cluster (for example, Storm).

Current Code Structure

The current chef-bach code is organized as follows:

  • chef-bach - Top-level directory that contains everything else
    • bin - Common Ruby methods
    • cookbooks
      • bach_common - Cookbook to deploy common configuration; currently deploys motd
      • bach_krb5 - A wrapper cookbook to set up/configure/deploy Kerberos on the Hadoop cluster
      • bach_spark - Cookbook to install and configure Spark
      • bcpc-hadoop - Cookbook to configure the Hadoop cluster (HDFS, YARN, HBase, Hive, Pig)
      • bcpc - Cookbook to configure the OS, networking, Cobbler, HAProxy, Keepalived, and other OS-level components
      • bcpc_jmxtrans - A wrapper cookbook that deploys/configures jmxtrans on the cluster
      • hannibal - Cookbook to deploy Hannibal
      • kafka-bcpc - A wrapper cookbook to deploy/configure the Kafka cluster
    • data_bags - Currently not in use
    • files/default - Contains the gPXE ROM for PXE boot of VMs
    • nodes - Currently not in use
    • site-cookbooks - Currently not in use
    • stub-environment - Directory that stores cluster-specific information
      • environments - Cluster-specific environments
      • roles - Chef roles
      • cluster.txt - Information about cluster nodes and the roles tied to them
    • test/integration - Contains serverspec tests
    • tests - Contains the automated_install script that does orchestration work
    • vbox - Stores cluster VMs
    • Ruby script files
    • shell scripts
    • documentation files
    • Berksfile - Maintains cookbook dependencies
    • .kitchen.yml - Test Kitchen integration

Challenges with Current Structure

  • Code maintenance is a challenge
  • No way to configure or test a single component in isolation
  • No coding standards
  • Unstructured code

Proposed Code Structure

The new code structure breaks the entire code base into four major parts:

Library Cookbook

  • Methods for generating nested XML
  • Methods for generating properties files
  • Methods for performing node searches
  • Methods for calculating network configuration
  • Test-Kitchen code

Core Cookbooks

  • Depends on library cookbook
  • Recipes for component installation
  • Recipes for component upgrade
  • Basic configuration for the components
  • ChefSpec code
  • Serverspec code
  • Test-Kitchen code

Implementation Cookbook

  • Depends on a core cookbook
  • Recipes to override core cookbook attributes
  • Calls the core cookbook's install/upgrade recipes (a minimal sketch follows this list)
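
To make the split concrete, here is a minimal sketch of what an implementation cookbook might look like, assuming a hypothetical `bach_zookeeper` core cookbook with `install` and `configure` recipes; the cookbook and attribute names are illustrative only, not existing chef-bach code.

```ruby
# bach_zookeeper_impl/attributes/default.rb
# Override core-cookbook defaults for this deployment (names are hypothetical).
override['bach_zookeeper']['version'] = '3.4.6'
override['bach_zookeeper']['zoo_cfg']['tickTime'] = 2000

# bach_zookeeper_impl/recipes/default.rb
# Call the core cookbook's install and configure/upgrade recipes.
include_recipe 'bach_zookeeper::install'
include_recipe 'bach_zookeeper::configure'
```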

Integration Cookbook

  • Integrates the implementation cookbooks
  • Runs Test Kitchen to set up a cluster
  • Provides the orchestration layer
  • Can deploy a single component or the entire stack (a sketch follows this list)
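
As a rough illustration, the single integration cookbook could pull the implementation cookbooks together through its Berksfile and a thin default recipe; the cookbook names below are assumptions made for the sake of the example.

```ruby
# Berksfile of the integration cookbook
source 'https://supermarket.chef.io'
metadata

cookbook 'bach_zookeeper_impl', path: '../bach_zookeeper_impl'
cookbook 'bach_hdfs_impl',      path: '../bach_hdfs_impl'

# recipes/default.rb
# The ordering of include_recipe calls acts as a simple orchestration layer:
# ZooKeeper converges before HDFS. Converging only one implementation
# cookbook deploys a single component instead of the whole stack.
include_recipe 'bach_zookeeper_impl::default'
include_recipe 'bach_hdfs_impl::default'
```

Test Kitchen would then be pointed at this cookbook's suites to converge either a single component or the full cluster.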

Notes:

  • Only a single library cookbook
  • Each Hadoop component has a separate core cookbook
  • Each core cookbook has a corresponding implementation cookbook
  • Only a single integration cookbook
  • Removal of software will be done via the re-PXE process
  • We will come up with coding standards

Library Cookbook

  • Provide a library for generating an XML document
  • Provide a library for generating space-delimited or '='-delimited properties files
  • Provide a library for performing Chef node searches
  • Provide a library for determining network information (a helper sketch follows this list)
  • Provide code for testing all the libraries in a VM using Test Kitchen
  • Implementation cookbooks will call the core cookbook and any upstream cookbooks.
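
A minimal sketch of the kind of helpers the library cookbook could expose, assuming a hypothetical `BachLibrary` module; the module, method, and mixin names are illustrative, not existing chef-bach code.

```ruby
module BachLibrary
  module Helpers
    # Render a hash of settings as a properties-file body, using either
    # 'key=value' or 'key value' lines depending on the delimiter.
    def generate_properties(settings, delimiter: '=')
      settings.map { |key, value| "#{key}#{delimiter}#{value}" }.join("\n") + "\n"
    end

    # Find nodes carrying a given Chef role, falling back to the local node
    # when the search index is still empty (e.g. on the very first run).
    def nodes_with_role(role)
      results = search(:node, "role:#{role}")
      results.empty? ? [node] : results
    end
  end
end

# Recipes could then mix the helpers into the recipe DSL, for example:
# Chef::Recipe.include(BachLibrary::Helpers)
```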

Core Cookbooks

  • All of the core cookbooks will support secure and non-secure modes (a recipe sketch follows this list)
  • HDFS - Configures NameNode (HA/non-HA), DataNode, HDFS client
  • YARN - Configures ResourceManager, NodeManager, YARN client
  • HBASE - Configures HBase master and RegionServers, HBase client
  • HIVE - Configures Hive Metastore, HiveServer2, Hive client, Tez
  • OOZIE - Configures Oozie (HA/non-HA), Oozie client
  • ZOOKEEPER - Configures ZooKeeper servers
  • SPARK - Configures Spark
  • KAFKA - Deploys Kafka
  • PIG - Configures Pig
  • OS - Disk layout, OS configuration, network setup, Java, MySQL
  • OPENSTACK - Deploys the cluster on OpenStack
  • COMMON - Common configuration across the cluster
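
For illustration, a core cookbook's recipes might look roughly like the sketch below, again using a hypothetical `bach_zookeeper` cookbook; the package name, file paths, and attribute keys are assumptions rather than the project's actual layout.

```ruby
# bach_zookeeper/recipes/install.rb
package 'zookeeper' do
  version node['bach_zookeeper']['version']
  action :install
end

# bach_zookeeper/recipes/configure.rb
template '/etc/zookeeper/conf/zoo.cfg' do
  source 'zoo.cfg.erb'
  owner 'zookeeper'
  group 'zookeeper'
  mode '0644'
  variables(settings: node['bach_zookeeper']['zoo_cfg'])
  notifies :restart, 'service[zookeeper]', :delayed
end

service 'zookeeper' do
  action [:enable, :start]
end
```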

Key Deliverables

  • Coding standards document
  • Library cookbook

Style Guide

  • All Ruby code needs to follow the Ruby Style Guide
  • To ensure compliance, make sure all code is verified by RuboCop
    • Rules that we decide to ignore should be added to a global .rubocop.yml file
  • Chef recipes must pass FoodCritic checks (a Rakefile sketch wiring both tools together follows this list)
    • Rules that we decide to ignore should be added to a global .foodcritic file
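
One way to enforce both checks from a single entry point is a Rakefile along these lines; the task names and options shown are a suggestion, assuming the standard RuboCop and FoodCritic Rake tasks.

```ruby
require 'rubocop/rake_task'
require 'foodcritic'

RuboCop::RakeTask.new(:rubocop) do |task|
  # Use the repository-wide .rubocop.yml containing the agreed exceptions.
  task.options = ['--config', '.rubocop.yml']
end

FoodCritic::Rake::LintTask.new(:foodcritic) do |task|
  # Fail on any rule not excluded via the global .foodcritic file.
  task.options = { fail_tags: ['any'] }
end

task lint: %i[rubocop foodcritic]
```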