
Chef Bach Development Structure


This wiki page drives the future direction for how we want to structure our code base and carry out development. The initial idea was posted by Clay Baenziger under Issue #89. This document is an expansion of the original idea with some additional desired features.

Background

Chef-bach is Bloomberg's open source project to deploy and manage distributed clusters. The key component of chef-bach is Chef, an automation tool for infrastructure deployment and management. The current chef-bach code is available on GitHub. As of today the code deploys the following distributed clusters:

  • Hadoop cluster, which includes
    • HDFS
    • YARN
    • HBase
    • Hive
    • Pig
    • Spark
    • ZooKeeper
    • Oozie
    • HTTPFS
  • Kafka cluster, which includes
    • ZooKeeper
    • Kafka broker

Apart from deploying distributed clusters, chef-bach also provides cookbooks and recipes for lower-level configuration such as the Ubuntu OS, networking, and disk layout, which in turn can be used to run any other desired distributed cluster (for example, Storm).

Current Code Structure

The current chef-bach code is organized as follows:

  • chef-bach - Top-level directory that contains everything else
    • bin - Common Ruby methods
    • cookbooks
      • bach_common - Cookbook to deploy common configuration; currently deploys motd
      • bach_krb5 - A wrapper cookbook to set up/configure/deploy Kerberos on the Hadoop cluster
      • bach_spark - Cookbook to install and configure Spark
      • bcpc-hadoop - Cookbook to configure the Hadoop cluster (HDFS, YARN, HBase, Hive, Pig)
      • bcpc - Cookbook to configure the OS, networking, Cobbler, HAProxy, Keepalived, and other OS-level components
      • bcpc_jmxtrans - A wrapper cookbook that deploys/configures jmxtrans on the cluster
      • hannibal - Cookbook to deploy Hannibal
      • kafka-bcpc - A wrapper cookbook to deploy/configure the Kafka cluster
    • data_bags - Currently not in use
    • files/default - Contains the gPXE ROM for PXE boot of VMs
    • nodes - Currently not in use
    • site-cookbooks - Currently not in use
    • stub-environment - Directory that stores cluster-specific information
      • environments - Cluster-specific environments
      • roles - Chef roles
      • cluster.txt - Information about cluster nodes and the roles tied to them
    • test/integration - Contains serverspec tests
    • tests - Contains the automated_install script that does orchestration work
    • vbox - Stores cluster VMs
    • Ruby script files
    • shell scripts
    • documentation files
    • Berksfile - Maintains cookbook dependencies
    • .kitchen.yml - Test Kitchen integration

Challenges with Current Structure

  • Code maintenance is a challenge
  • No way to configure or test a single component in isolation
  • No coding standards
  • Unstructured code

Proposed Code Structure

The new code structure breaks the entire code base into four major parts:

Library Cookbook

  • Methods for generating nested XML
  • Methods for generating properties files
  • Methods for performing node searches
  • Methods for calculating network configuration
  • Test-Kitchen code

Core Cookbooks

  • Depends on library cookbook
  • Recipes for component installation
  • Recipes for component upgrade
  • Basic configuration for the components
  • ChefSpec code
  • Serverspec code
  • Test-Kitchen code

Implementation Cookbook

  • Depends on a core cookbook
  • Recipes to override core cookbook attributes
  • Calls the core cookbook's install/upgrade recipes (a minimal sketch follows this list)
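
To make the split concrete, here is a minimal sketch of what an implementation cookbook might look like, assuming a hypothetical `bach_zookeeper` core cookbook with `install` and `configure` recipes; the cookbook and attribute names are illustrative only, not existing chef-bach code.

```ruby
# bach_zookeeper_impl/attributes/default.rb
# Override core-cookbook defaults for this deployment (names are hypothetical).
override['bach_zookeeper']['version'] = '3.4.6'
override['bach_zookeeper']['zoo_cfg']['tickTime'] = 2000

# bach_zookeeper_impl/recipes/default.rb
# Call the core cookbook's install and configure/upgrade recipes.
include_recipe 'bach_zookeeper::install'
include_recipe 'bach_zookeeper::configure'
```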

Integration Cookbook

  • Integrates the implementation cookbooks
  • Runs Test Kitchen to set up a cluster
  • Provides the orchestration layer
  • Can deploy a single component or the entire stack (a sketch follows this list)
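
As a rough illustration, the single integration cookbook could pull the implementation cookbooks together through its Berksfile and a thin default recipe; the cookbook names below are assumptions made for the sake of the example.

```ruby
# Berksfile of the integration cookbook
source 'https://supermarket.chef.io'
metadata

cookbook 'bach_zookeeper_impl', path: '../bach_zookeeper_impl'
cookbook 'bach_hdfs_impl',      path: '../bach_hdfs_impl'

# recipes/default.rb
# The ordering of include_recipe calls acts as a simple orchestration layer:
# ZooKeeper converges before HDFS. Converging only one implementation
# cookbook deploys a single component instead of the whole stack.
include_recipe 'bach_zookeeper_impl::default'
include_recipe 'bach_hdfs_impl::default'
```

Test Kitchen would then be pointed at this cookbook's suites to converge either a single component or the full cluster.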

Notes:

  • Only a single library cookbook
  • Each Hadoop component has a separate core cookbook
  • Each core cookbook has a corresponding implementation cookbook
  • Only a single integration cookbook
  • Removal of software will be done via the re-PXE process
  • We will come up with coding standards

Library Cookbook

  • Provide a library for generating an XML document
  • Provide a library for generating space-delimited or '='-delimited properties files
  • Provide a library for performing Chef node searches
  • Provide a library for determining network information (a helper sketch follows this list)
  • Provide code for testing all the libraries in a VM using Test Kitchen
  • Implementation cookbooks will call the core cookbook and any upstream cookbooks.
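
A minimal sketch of the kind of helpers the library cookbook could expose, assuming a hypothetical `BachLibrary` module; the module, method, and mixin names are illustrative, not existing chef-bach code.

```ruby
module BachLibrary
  module Helpers
    # Render a hash of settings as a properties-file body, using either
    # 'key=value' or 'key value' lines depending on the delimiter.
    def generate_properties(settings, delimiter: '=')
      settings.map { |key, value| "#{key}#{delimiter}#{value}" }.join("\n") + "\n"
    end

    # Find nodes carrying a given Chef role, falling back to the local node
    # when the search index is still empty (e.g. on the very first run).
    def nodes_with_role(role)
      results = search(:node, "role:#{role}")
      results.empty? ? [node] : results
    end
  end
end

# Recipes could then mix the helpers into the recipe DSL, for example:
# Chef::Recipe.include(BachLibrary::Helpers)
```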

Core Cookbooks

  • All of the core cookbooks will support secure and non-secure modes (a recipe sketch follows this list)
  • HDFS - Configures NameNode (HA/non-HA), DataNode, HDFS client
  • YARN - Configures ResourceManager, NodeManager, YARN client
  • HBASE - Configures HBase master and RegionServers, HBase client
  • HIVE - Configures Hive Metastore, HiveServer2, Hive client, Tez
  • OOZIE - Configures Oozie (HA/non-HA), Oozie client
  • ZOOKEEPER - Configures ZooKeeper servers
  • SPARK - Configures Spark
  • KAFKA - Deploys Kafka
  • PIG - Configures Pig
  • OS - Disk layout, OS configuration, network setup, Java, MySQL
  • OPENSTACK - Deploys the cluster on OpenStack
  • COMMON - Common configuration across the cluster
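
For illustration, a core cookbook's recipes might look roughly like the sketch below, again using a hypothetical `bach_zookeeper` cookbook; the package name, file paths, and attribute keys are assumptions rather than the project's actual layout.

```ruby
# bach_zookeeper/recipes/install.rb
package 'zookeeper' do
  version node['bach_zookeeper']['version']
  action :install
end

# bach_zookeeper/recipes/configure.rb
template '/etc/zookeeper/conf/zoo.cfg' do
  source 'zoo.cfg.erb'
  owner 'zookeeper'
  group 'zookeeper'
  mode '0644'
  variables(settings: node['bach_zookeeper']['zoo_cfg'])
  notifies :restart, 'service[zookeeper]', :delayed
end

service 'zookeeper' do
  action [:enable, :start]
end
```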

Key Deliverables

  • Coding standards document
  • Library cookbook

Style Guide

  • All Ruby code needs to follow the Ruby Style Guide
  • To ensure compliance, make sure all code is verified by RuboCop
    • Rules that we decide to ignore should be added to a global .rubocop.yml file
  • Chef recipes must pass FoodCritic checks (a Rakefile sketch wiring both tools together follows this list)
    • Rules that we decide to ignore should be added to a global .foodcritic file
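
One way to enforce both checks from a single entry point is a Rakefile along these lines; the task names and options shown are a suggestion, assuming the standard RuboCop and FoodCritic Rake tasks.

```ruby
require 'rubocop/rake_task'
require 'foodcritic'

RuboCop::RakeTask.new(:rubocop) do |task|
  # Use the repository-wide .rubocop.yml containing the agreed exceptions.
  task.options = ['--config', '.rubocop.yml']
end

FoodCritic::Rake::LintTask.new(:foodcritic) do |task|
  # Fail on any rule not excluded via the global .foodcritic file.
  task.options = { fail_tags: ['any'] }
end

task lint: %i[rubocop foodcritic]
```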