
This project has been deprecated. Please use Google Cloud Dataproc to create managed Apache Hadoop and Apache Spark instances on Google Compute Engine.


bdutil is a command-line script used to manage Apache Hadoop and Apache Spark instances on Google Compute Engine. bdutil manages deployment, configuration, and shutdown of your Hadoop instances.


bdutil depends on the Google Cloud SDK. It is supported in any POSIX-compliant shell running Bash v3 or later.
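Before running bdutil, you can confirm that the local Bash meets the version requirement. The following is a minimal Python sketch, not part of bdutil. It only checks the major version reported by `bash --version` against the v3 minimum stated above and does not test POSIX compliance. The helper names are hypothetical. From inside Bash itself, `${BASH_VERSINFO[0]}` gives the same major version.

```python
import re
import shutil
import subprocess
from typing import Optional

# Minimum Bash major version stated in this README.
MIN_BASH_MAJOR = 3


def bash_major_version(version_text: str) -> Optional[int]:
    """Extract the major version from `bash --version` output (hypothetical helper)."""
    match = re.search(r"version (\d+)\.", version_text)
    return int(match.group(1)) if match else None


def bash_is_supported(version_text: str) -> bool:
    """True if the reported major version is at least MIN_BASH_MAJOR."""
    major = bash_major_version(version_text)
    return major is not None and major >= MIN_BASH_MAJOR


def local_bash_version_text() -> Optional[str]:
    """Run the local `bash --version`, or return None if bash is not on PATH."""
    bash = shutil.which("bash")
    if bash is None:
        return None
    result = subprocess.run([bash, "--version"], capture_output=True, text=True, check=False)
    return result.stdout


if __name__ == "__main__":
    text = local_bash_version_text()
    if text is None:
        print("bash not found on PATH")
    else:
        print("supported" if bash_is_supported(text) else "unsupported")
```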


See the QUICKSTART file in the docs directory to learn how to set up your Hadoop instances using bdutil.

  1. Install and configure the Google Cloud SDK if you have not already done so.
  2. Clone this repository with git clone.
  3. Modify the following variables in the file:
      • PROJECT - Set to the project ID used by all bdutil commands. The project ID is resolved in the following order of precedence (highest first):
          1. the -p flag value, if specified; otherwise
          2. the PROJECT value in that file, if specified; otherwise
          3. the gcloud default project.
      • CONFIGBUCKET - Set to a Google Cloud Storage bucket that your project has read/write access to.
  4. Run bdutil --help for a list of commands.
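The precedence rule for PROJECT amounts to "take the first value that is set." The following is a minimal Python sketch of that rule, not bdutil's own code (bdutil is a Bash script). The function names are hypothetical. The file parser is deliberately simplified: it assumes a single shell-style `PROJECT=value` line with optional quotes. The gcloud default is passed in as a parameter rather than queried. Interactively, it can be read with `gcloud config get-value project`.

```python
from typing import Optional


def read_file_project(env_text: str) -> Optional[str]:
    """Return the PROJECT value from shell-style KEY=value lines (simplified, hypothetical)."""
    for line in env_text.splitlines():
        line = line.strip()
        if line.startswith("PROJECT="):
            # Drop surrounding single or double quotes; an empty value counts as unset.
            value = line[len("PROJECT="):].strip().strip("'\"")
            return value or None
    return None


def resolve_project(flag_value: Optional[str],
                    file_value: Optional[str],
                    gcloud_default: Optional[str]) -> str:
    """Apply the precedence: -p flag, then PROJECT in the file, then the gcloud default."""
    for candidate in (flag_value, file_value, gcloud_default):
        if candidate:  # None and "" both mean "not specified"
            return candidate
    raise ValueError("no project ID configured")


file_value = read_file_project('PROJECT="my-project"\nCONFIGBUCKET=my-bucket')
print(resolve_project(None, file_value, "gcloud-default-project"))
```

With no -p flag, the example resolves to the file's value, `my-project`. Raising an error when nothing is set mirrors the fact that bdutil commands need some project ID to act on.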

The script implements the following commands, which are very similar:

  • bdutil create creates and starts instances but does not apply most configuration settings. You can run bdutil run_command_steps on those instances afterward to apply the settings. Typically you would not use this command; use bdutil deploy instead.
  • bdutil deploy creates and starts instances with all the configuration options specified in the command line and any included configuration scripts.
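The difference between the two commands can be pictured with a toy model. This model assumes, as the descriptions above suggest, that deploy amounts to create followed by the configuration steps that run_command_steps would otherwise apply separately. It is a hypothetical Python sketch: the Instance class, function names, instance names, and step names are illustrative only and do not call bdutil or Compute Engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Instance:
    """Toy stand-in for a Compute Engine instance (hypothetical)."""
    name: str
    running: bool = False
    applied_steps: List[str] = field(default_factory=list)


def create(names: List[str]) -> List[Instance]:
    """Model of `bdutil create`: instances start, but no configuration is applied."""
    return [Instance(name, running=True) for name in names]


def run_command_steps(instances: List[Instance], steps: List[str]) -> None:
    """Model of `bdutil run_command_steps`: apply configuration to existing instances."""
    for instance in instances:
        instance.applied_steps.extend(steps)


def deploy(names: List[str], steps: List[str]) -> List[Instance]:
    """Model of `bdutil deploy`: create, then apply all configuration in one go."""
    instances = create(names)
    run_command_steps(instances, steps)
    return instances


cluster = deploy(["example-master", "example-worker-0"], ["install-step", "configure-step"])
for instance in cluster:
    print(instance.name, instance.applied_steps)
```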

Components installed

The latest release of bdutil is 1.3.5. This bdutil release installs the following versions of open source components:

  • Apache Hadoop - 1.2.1 (2.7.1 if you use the -e argument)
  • Apache Spark - 1.5.0
  • Apache Pig - 0.12
  • Apache Hive - 1.2.1
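Scripts that need to know what a 1.3.5 cluster contains can capture the version list above as a lookup table. The following is a minimal Python sketch. The table only restates the versions listed in this README. The `hadoop2` parameter is a hypothetical stand-in for passing the -e argument; what value -e takes is not covered here.

```python
from typing import Dict

# Component versions installed by bdutil 1.3.5, as listed in this README.
DEFAULT_VERSIONS: Dict[str, str] = {
    "Apache Hadoop": "1.2.1",
    "Apache Spark": "1.5.0",
    "Apache Pig": "0.12",
    "Apache Hive": "1.2.1",
}

# Hadoop version installed when the -e argument is used.
HADOOP2_VERSION = "2.7.1"


def installed_versions(hadoop2: bool = False) -> Dict[str, str]:
    """Return component -> version; `hadoop2` (hypothetical) models the -e argument."""
    versions = dict(DEFAULT_VERSIONS)  # copy so the shared table is never mutated
    if hadoop2:
        versions["Apache Hadoop"] = HADOOP2_VERSION
    return versions


for name, version in installed_versions(hadoop2=True).items():
    print(f"{name}: {version}")
```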


The following documentation covers common bdutil tasks:

  • Quickstart - A guide on how to get started with bdutil quickly.
  • Jobs - How to submit jobs (work) to a bdutil cluster.
  • Monitoring - How to monitor a bdutil cluster.
  • Shutdown - How to shut down a bdutil cluster.