CSD Overview

philipl edited this page Feb 27, 2014 · 10 revisions

Custom Service Descriptors

Cloudera Manager (CM) 4.5 introduced parcels, a mechanism for distributing software to a managed cluster. Parcels only distribute software across the cluster; they do not manage processes. Cloudera Manager 5 introduces the ability to add your own managed service through Custom Service Descriptors (CSDs). A third-party service that uses a CSD can leverage Cloudera Manager features such as monitoring, resource management, configuration, distribution, and life-cycle management. The service shows up in Cloudera Manager just like any other service, e.g. HDFS or HBase.

Note: This documentation assumes you are familiar with the basic operating principles of Cloudera Manager.

Guiding Principles

  • CSDs can be written by non-programmers using documentation and developer tooling.
  • The service descriptor language (SDL) should be declarative and not require a specialized programming language.
  • In Cloudera Manager, a service backed by a CSD should look and feel like a first-party service, e.g. HDFS.
  • A baseline of functionality is provided to CSDs for free, e.g. process-level monitoring.
  • CSDs should work well with parcels but not require them.
  • If you have your own way of laying down bits, you can still use a CSD for configuration and process life-cycle management.

What exactly is a CSD?

A CSD is linked to one service type in Cloudera Manager and is packaged and distributed as a jar file. The jar is self-contained and encases all the description and logic needed to manage the service type in CM. For example, the Spark CSD layout is shown below:

$ jar -tf SPARK-1.0.jar 
descriptor/service.sdl
scripts/control.sh
images/icon.png

More examples including the Spark CSD are available in our git repo.

The descriptor/service.sdl file is a JSON file that declaratively describes the service type in Cloudera Manager. A CSD also has a scripts/ directory containing the executables that control how the service is started. See The Structure of a CSD for more details.
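Because a CSD is an ordinary jar (a zip archive) with a JSON descriptor inside, its layout can be checked with a few lines of code. The following is a minimal sketch in Python, not part of the CSD tooling. The function names `inspect_csd` and `build_demo_jar` are made up for illustration, and the `{"name": "DEMO"}` descriptor is a placeholder that says nothing about the real SDL schema. The only layout facts it relies on are the ones in the Spark listing above: a descriptor/service.sdl entry and a scripts/ directory.

```python
import io
import json
import zipfile

# Paths taken from the Spark CSD listing above.
DESCRIPTOR = "descriptor/service.sdl"
SCRIPTS_PREFIX = "scripts/"


def inspect_csd(jar):
    """Return (descriptor, script_names) for a CSD jar.

    `jar` is a path or a binary file-like object. Raises ValueError if
    the descriptor entry is missing or does not parse as JSON.
    """
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
        if DESCRIPTOR not in names:
            raise ValueError("missing " + DESCRIPTOR)
        try:
            descriptor = json.loads(zf.read(DESCRIPTOR).decode("utf-8"))
        except ValueError as exc:  # JSON and UTF-8 errors both subclass ValueError
            raise ValueError(DESCRIPTOR + " is not valid JSON") from exc
    # Keep files under scripts/, skipping any explicit directory entries.
    scripts = sorted(
        n for n in names if n.startswith(SCRIPTS_PREFIX) and not n.endswith("/")
    )
    return descriptor, scripts


def build_demo_jar():
    """Build an in-memory jar with the same layout as the listing above.

    The descriptor content is a placeholder, not a real service definition.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(DESCRIPTOR, json.dumps({"name": "DEMO"}))
        zf.writestr("scripts/control.sh", "#!/bin/sh\n")
        zf.writestr("images/icon.png", b"")
    buf.seek(0)
    return buf


if __name__ == "__main__":
    descriptor, scripts = inspect_csd(build_demo_jar())
    print("descriptor keys:", sorted(descriptor))
    print("scripts:", scripts)
```

To check a real CSD, pass its path instead, for example `inspect_csd("SPARK-1.0.jar")`. This only confirms that the archive has the expected entries and that the descriptor parses as JSON; it does not validate the descriptor against the SDL.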

CSDs vs. Parcels

Both CSDs and parcels extend Cloudera Manager, but in different ways. Parcels aid in distributing software to the cluster. A parcel is essentially a tarball with added metadata, so when it is distributed the Cloudera Manager agent simply unpacks it on the host; there is no mechanism to manage or configure processes. Using only a parcel is valid when there is no configuration or process to manage, such as when distributing a library to the cluster. An example is the LZO plugin for Hadoop, which only needs to modify the HADOOP_CLASSPATH.

CSDs pick up where parcels leave off. Once the bits are distributed to the cluster, Cloudera Manager uses the CSD to know how to manage the deployed software: start/stop, configuration, resource management, etc. A CSD is what lets a partner's service show up in the wizard and status pages.

The most turnkey way to integrate with Cloudera Manager is to use a parcel to distribute the software and a CSD to manage it. In some cases you might prefer an out-of-band software deployment system to lay down the bits on the cluster. A CSD can then still be used, without a parcel, to manage the software.