@jwang47 released this on Jan 10, 2015

API Changes

  • API for specifying Services and MapReduce Jobs has been changed to use a "configurer"
    style; this will require modification of user classes implementing either MapReduce
    or Service as the interfaces have changed (CDAP-335).
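
    The shape of the change can be sketched as follows. This is a minimal, hypothetical
    stand-in for illustration only (the real interfaces ship in the CDAP API jars; the
    class and method names here are not the actual API): instead of implementing an
    interface that returns a specification object, a user class now extends an abstract
    base and describes itself imperatively inside a configure() method.

    ```java
    // Hypothetical stand-in for the configurer-style base class; the real
    // CDAP classes live in the CDAP API jars.
    abstract class AbstractService {
        private String name;
        private String description;

        // User classes describe themselves here instead of returning
        // a specification object from an interface method.
        protected abstract void configure();

        protected void setName(String name) { this.name = name; }
        protected void setDescription(String description) { this.description = description; }

        // Illustrative accessor, not part of any real API.
        String buildName() {
            configure();
            return name;
        }
    }

    // A user Service under the configurer style overrides configure().
    class WordCountService extends AbstractService {
        @Override
        protected void configure() {
            setName("WordCountService");
            setDescription("Serves word counts over HTTP");
        }
    }
    ```

    Existing Service and MapReduce classes written against the old specification-returning
    interfaces must be migrated to this style.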

New Features

  • Health checks are now available for CDAP system services

  • Jar deployment now uses a chunked request and writes to a local temp file

  • MapReduce jobs can now read binary stream data

  • Added FileSet, a new core dataset type for working with sets of files

  • Spark programs now emit system and custom user metrics
  • Services can be called from Spark programs and their worker nodes
  • Spark programs can now read from Streams
  • Added Spark support to the CDAP CLI (Command-line Interface)
  • Improved the speed of Spark unit tests
  • Spark programs now display system metrics in the CDAP Console

  • Procedures have been deprecated in favor of Services

  • Added an HTTP endpoint that returns the endpoints a particular Service exposes
  • Added an HTTP endpoint that lists all Services
  • Default metrics for Services have been added to the CDAP Console
  • The annotations @QueryParam and @DefaultValue are now supported in custom Service handlers
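
    A handler method using the newly supported annotations might look like the sketch
    below. To keep the sketch self-contained, it defines stand-in annotations with the
    same names as the javax.ws.rs ones (real handlers use the javax.ws.rs versions, and
    CountHandler and its method are hypothetical):

    ```java
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    // Stand-ins for the javax.ws.rs annotations so this sketch compiles
    // on its own; real CDAP service handlers use the javax.ws.rs versions.
    @Retention(RetentionPolicy.RUNTIME)
    @interface QueryParam { String value(); }

    @Retention(RetentionPolicy.RUNTIME)
    @interface DefaultValue { String value(); }

    // Hypothetical handler method: "limit" falls back to 10 when the
    // query string omits it.
    class CountHandler {
        String count(@QueryParam("word") String word,
                     @DefaultValue("10") @QueryParam("limit") int limit) {
            return word + ":" + limit;
        }
    }
    ```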

  • System and user metrics now support gauge metrics
  • Metrics can be queried using a Program's run-ID

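A gauge differs from the existing count metrics in that it records the latest observed value rather than accumulating deltas. The sketch below illustrates that distinction with a hypothetical in-memory stand-in (the real Metrics interface is provided by CDAP program contexts; SketchMetrics and its map are not the actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for a metrics collector: count()
// adds a delta to a running total, while gauge() records the latest
// value outright.
class SketchMetrics {
    final Map<String, Long> values = new HashMap<>();

    void count(String name, int delta) {
        values.merge(name, (long) delta, Long::sum);
    }

    void gauge(String name, long value) {
        values.put(name, value); // last write wins, no accumulation
    }
}
```
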
CDAP Bug Fixes

  • Fixed a problem with readless increments not being used when they were enabled in a Dataset
  • Fixed a problem where applications whose Spark or Scala user classes did not extend
    either JavaSparkProgram or ScalaSparkProgram failed with a class-loading error
  • Fixed a problem where the CDAP upgrade tool did not preserve the coprocessor
    configuration of tables with readless increments enabled during an upgrade
  • Fixed a problem with the readless increment implementation dropping increment cells when
    a region flush or compaction occurred (CDAP-1062).

Known Issues

  • When running secure Hadoop clusters, metrics and debug logs from MapReduce programs are
    not available (CDAP-64, CDAP-797).

  • When upgrading a cluster from an earlier version of CDAP, warning messages may appear in
    the master log indicating that in-transit (emitted, but not yet processed) metrics
    system messages could not be decoded (Failed to decode message to MetricsRecord). This
    is because of a change in the format of emitted metrics, and can result in a small
    number of metric data points being lost (CDAP-745).

  • Writing to datasets through Hive is not supported in CDH4.x

  • A race condition resulting in a deadlock can occur when a TwillRunnable container
    shuts down while it still has ZooKeeper events to process. This occasionally surfaces
    when running with OpenJDK or JDK7, though not with Oracle JDK6. It is caused by a change
    in the ThreadPoolExecutor implementation between Oracle JDK6 and OpenJDK/JDK7. Until
    Twill is updated in a future version of CDAP, a workaround is to kill the errant
    process. The YARN command to list all running applications and their app-ids is

    yarn application -list -appStates RUNNING

    The command to kill a process is

    yarn application -kill <app-id>

    All versions of CDAP running Twill version 0.4.0 with this configuration can exhibit this
    problem (TWILL-110).