gsps1 released this on Aug 4, 2015

New Features

MapR 4.1 Support, HDP 2.2 Support, CDH 5.4 Support

  • CDAP-1614 -Added HBase 1.0 support.
  • CDAP-2318 -Made CDAP work on the HDP 2.2 distribution.
  • CDAP-2786 -Added support to CDAP 3.1.0 for the MapR 4.1 distro.
  • CDAP-2798 -Added Hive 0.14 support.
  • CDAP-2801 -Added CDH 5.4 Hive 1.1 support.
  • CDAP-2836 -Added support for restarting specific CDAP System
    Service instances.
  • CDAP-2853 -Completed certification process for MapR on CDAP.
  • CDAP-2879 -Added Hive 1.0 in Standalone.
  • CDAP-2881 -Added support for HDP 2.2.x.
  • CDAP-2891 -Documented cdap-env.sh and the OPTS settings for HDP 2.2.
  • CDAP-2898 -Added Hive 1.1 in Standalone.
  • CDAP-2953 -Added HiveServer2 support in a secure cluster.


Spark

  • CDAP-344 -Users can now run Spark in distributed mode.
  • CDAP-1993 -Added ability to manipulate the SparkConf.
  • CDAP-2700 -Added the ability for Spark programs to discover CDAP
    services in distributed mode.
  • CDAP-2701 -Spark programs are able to collect Metrics in
    distributed mode.
  • CDAP-2703 -Users are able to collect/view logs from Spark programs
    in distributed mode.
  • CDAP-2705 -Added examples, guides and documentation for Spark in
    distributed mode, including the LogAnalysis application, which
    demonstrates parallel execution of Spark and MapReduce programs
    using Workflows.
  • CDAP-2923 -Added support for the WorkflowToken in Spark programs.
  • CDAP-2936 -Spark programs can now specify resource usage for the
    driver and executor processes in distributed mode.


Workflows

  • CDAP-1983 -Added example application for processing and analyzing
    Wikipedia data using Workflows.
  • CDAP-2709 -Added ability to add generic keys to the WorkflowToken.
  • CDAP-2712 -Added ability to update the WorkflowToken in MapReduce
    and Spark programs.
  • CDAP-2713 -Added ability to persist the WorkflowToken per run of
    the Workflow.
  • CDAP-2714 -Added ability to query the WorkflowToken for the past
    as well as currently running Workflow runs.
  • CDAP-2752 -Added ability for custom actions to access the CDAP
    datasets and services.
  • CDAP-2894 -Added an API to retrieve the system properties (e.g.
    MapReduce counters in the case of a MapReduce program) from
    the WorkflowToken.
  • CDAP-2923 -Added support for the WorkflowToken in Spark programs.
  • CDAP-2982 -Added verification that all programs and custom actions
    in a Workflow have unique names.


Datasets

  • CDAP-347 -Users can now use datasets in beforeSubmit and afterFinish.
  • CDAP-585 -Changed the Spark program runner to support file datasets.
    Spark programs can now use file-based datasets.
  • CDAP-2734 -Added PartitionedFileSet support for setting and getting
    properties at the partition level.
  • CDAP-2746 -PartitionedFileSets now record the creation time of
    each partition in the metadata.
  • CDAP-2747 -PartitionedFileSets now index the creation time of
    partitions to allow selection of partitions that were created after
    a given time. Introduced BatchPartitionConsumer as a way to
    incrementally consume new data in a PartitionedFileSet.
  • CDAP-2752 -Added ability for custom actions to access the CDAP
    datasets and services.
  • CDAP-2758 -FileSets now support existing HDFS locations.

Base paths that start with “/” are now treated as absolute in the
file system. Previously, an absolute base path for a
(Partitioned)FileSet was interpreted as relative to the namespace’s
data directory. Newly created FileSets interpret absolute base paths
as absolute in the file system.

Introduced a new property for (Partitioned)FileSets named
“data.external”. If true, the base path of the FileSet is assumed to
be managed by some external process. That is, the FileSet will not
attempt to create the directory, it will not delete any files when
the FileSet is dropped or truncated, and it will not allow adding or
deleting files or partitions. In other words, the FileSet
is read-only.
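
The two FileSet behaviors described above can be illustrated with a small,
self-contained model. This is a sketch, not CDAP's implementation: the class
FileSetSketch, its methods, and the namespace data directory passed in are
hypothetical names chosen for illustration. Only the “data.external” property
name and the leading “/” rule come from these notes.

```java
import java.util.Map;

/** Illustrative model only; this is not CDAP's FileSet implementation. */
final class FileSetSketch {
    /** Property name taken from the release note. */
    static final String DATA_EXTERNAL = "data.external";

    private final String basePath;
    private final boolean external;

    FileSetSketch(String namespaceDataDir, String configuredBasePath,
                  Map<String, String> properties) {
        this.basePath = resolveBasePath(namespaceDataDir, configuredBasePath);
        // A missing property is treated as "false" in this sketch (an assumption).
        this.external = Boolean.parseBoolean(
            properties.getOrDefault(DATA_EXTERNAL, "false"));
    }

    /**
     * Base paths starting with "/" are absolute in the file system; any other
     * base path is resolved under the namespace's data directory.
     */
    static String resolveBasePath(String namespaceDataDir, String configuredBasePath) {
        if (configuredBasePath.startsWith("/")) {
            return configuredBasePath;
        }
        String dir = namespaceDataDir.endsWith("/")
            ? namespaceDataDir
            : namespaceDataDir + "/";
        return dir + configuredBasePath;
    }

    String basePath() {
        return basePath;
    }

    boolean isExternal() {
        return external;
    }

    /** Guards operations that would add or delete files or partitions. */
    void ensureWritable(String operation) {
        if (external) {
            throw new UnsupportedOperationException(
                "FileSet at " + basePath + " is external (read-only): " + operation);
        }
    }
}
```

In this model, a mutating operation would call ensureWritable first, so an
external FileSet rejects the change before any file is touched.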

  • CDAP-2784 -Added support to write to PartitionedFileSet Partition
    metadata from MapReduce.
  • CDAP-2822 -IndexedTable now supports scans on the indexed field.
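
The idea behind CDAP-2746 and CDAP-2747 (record each partition's creation
time, then select partitions created after a given time to consume new data
incrementally) can be sketched as follows. This is a simplified model under
stated assumptions, not the BatchPartitionConsumer API: the class
IncrementalPartitionSketch and all of its members are hypothetical, and
creation times are plain long values.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model only; not CDAP's PartitionedFileSet or BatchPartitionConsumer. */
final class IncrementalPartitionSketch {

    /** A partition together with the creation time recorded in its metadata. */
    static final class Partition {
        final String key;
        final long creationTime;

        Partition(String key, long creationTime) {
            this.key = key;
            this.creationTime = creationTime;
        }
    }

    private final List<Partition> partitions = new ArrayList<>();
    // Creation time of the newest partition consumed so far.
    private long lastConsumedTime = Long.MIN_VALUE;

    void addPartition(String key, long creationTime) {
        partitions.add(new Partition(key, creationTime));
    }

    /** Selects the partitions created strictly after the given time. */
    List<Partition> partitionsCreatedAfter(long time) {
        List<Partition> result = new ArrayList<>();
        for (Partition p : partitions) {
            if (p.creationTime > time) {
                result.add(p);
            }
        }
        return result;
    }

    /** Returns the partitions not yet consumed and advances the cursor past them. */
    List<Partition> consumeNew() {
        List<Partition> fresh = partitionsCreatedAfter(lastConsumedTime);
        for (Partition p : fresh) {
            lastConsumedTime = Math.max(lastConsumedTime, p.creationTime);
        }
        return fresh;
    }
}
```

Each call to consumeNew returns only the partitions added since the previous
call. One limitation of this simple cursor: a partition added later with the
same creation time as the cursor would be skipped, so a real consumer needs a
more careful notion of progress.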


Metrics

  • CDAP-2975 -Added pre-split FactTables.
  • CDAP-2326 -Added better unit-test coverage for Cube dataset.
  • CDAP-1853 -Metrics processor scaling no longer needs a master
    services restart.
  • CDAP-2844 -MapReduce metrics collection no longer uses counters,
    and instead reports directly to Kafka.
  • CDAP-2701 -Spark programs are able to collect Metrics in
    distributed mode.
  • CDAP-2466 -Added CLI for metrics search and query.
  • CDAP-2236 -The new CDAP UI now uses the newer
    search/query APIs.
  • CDAP-1998 -Removed the deprecated Context query parameter in the
    Metrics v3 API.

Miscellaneous New Features

  • CDAP-332 -Added a RESTful endpoint for deleting Streams.
  • CDAP-1483 -QueueAdmin now uses Id.Namespace instead of
    a plain String.
  • CDAP-1584 -CDAP CLI now shows the username in the CLI prompt.
  • CDAP-2139 -Removed a duplicate Table of Contents on the
    Documentation Search page.
  • CDAP-2515 -Added a metrics client for search and query by tags.
  • CDAP-2582 -Documented the licenses of the shipped
    CDAP-UI components.
  • CDAP-2595 -Added data modelling of flows.
  • CDAP-2596 -Added data modelling of MapReduce.
  • CDAP-2617 -Added the capability to get logs for a given time range
    from the CLI.
  • CDAP-2618 -Simplified the Cube sink configurations.
  • CDAP-2670 -Added a Parquet sink with a time-partitioned file dataset.
  • CDAP-2739 -Added an S3 batch source for ETLBatch.
  • CDAP-2802 -Stopped using HiveConf.ConfVars.defaultValue, to
    support Hive >0.13.
  • CDAP-2847 -Added ability to add custom filters to FileBatchSource.
  • CDAP-2893 -Custom Transform now parses log formats for ETL.
  • CDAP-2913 -Provided installation method for EMR.
  • CDAP-2915 -Added an SQS realtime plugin for ETL.
  • CDAP-3022 -Added a CloudFront format option to LogParserTransform.
  • CDAP-3032 -Documented TestConfiguration class usage in
    unit-test framework.