Cask Data Application Platform v3.1.0
New Features
MapR 4.1 Support, HDP 2.2 Support, CDH 5.4 Support
- CDAP-1614 - Added HBase 1.0 support.
- CDAP-2318 - Made CDAP work on the HDP 2.2 distribution.
- CDAP-2786 - Added support for the MapR 4.1 distribution to CDAP 3.1.0.
- CDAP-2798 - Added Hive 0.14 support.
- CDAP-2801 - Added support for Hive 1.1 on CDH 5.4.
- CDAP-2836 - Added support for restarting specific instances of CDAP System Services.
- CDAP-2853 - Completed the certification process for MapR on CDAP.
- CDAP-2879 - Added Hive 1.0 support in Standalone CDAP.
- CDAP-2881 - Added support for HDP 2.2.x.
- CDAP-2891 - Documented cdap-env.sh and setting OPTS for HDP 2.2.
- CDAP-2898 - Added Hive 1.1 support in Standalone CDAP.
- CDAP-2953 - Added HiveServer2 support in a secure cluster.
Spark
- CDAP-344 - Users can now run Spark in distributed mode.
- CDAP-1993 - Added the ability to manipulate the SparkConf.
- CDAP-2700 - Added the ability for Spark programs to discover CDAP services in distributed mode.
- CDAP-2701 - Spark programs can now collect metrics in distributed mode.
- CDAP-2703 - Users can now collect and view logs from Spark programs in distributed mode.
- CDAP-2705 - Added examples, guides, and documentation for Spark in distributed mode, including a LogAnalysis application that demonstrates parallel execution of Spark and MapReduce programs using Workflows.
- CDAP-2923 - Added support for the WorkflowToken in Spark programs.
- CDAP-2936 - Spark programs can now specify resource usage for the driver and executor processes in distributed mode.
Workflows
- CDAP-1983 - Added an example application for processing and analyzing Wikipedia data using Workflows.
- CDAP-2709 - Added the ability to add generic keys to the WorkflowToken.
- CDAP-2712 - Added the ability to update the WorkflowToken in MapReduce and Spark programs.
- CDAP-2713 - Added the ability to persist the WorkflowToken per run of the Workflow.
- CDAP-2714 - Added the ability to query the WorkflowToken for past as well as currently running Workflow runs.
- CDAP-2752 - Added the ability for custom actions to access CDAP datasets and services.
- CDAP-2894 - Added an API to retrieve system properties (e.g. MapReduce counters in the case of a MapReduce program) from the WorkflowToken.
- CDAP-2923 - Added support for the WorkflowToken in Spark programs.
- CDAP-2982 - Added verification that all programs and custom actions in a Workflow have unique names.
Datasets
- CDAP-347 - Users can now use datasets in beforeSubmit and afterFinish.
- CDAP-585 - Changed the Spark program runner to support File datasets in Spark. Spark programs can now use file-based datasets.
- CDAP-2734 - Added PartitionedFileSet support for setting and getting properties at the Partition level.
- CDAP-2746 - PartitionedFileSets now record the creation time of each partition in the metadata.
- CDAP-2747 - PartitionedFileSets now index the creation time of partitions to allow selection of partitions that were created after a given time. Introduced BatchPartitionConsumer as a way to incrementally consume new data in a PartitionedFileSet.
- CDAP-2752 - Added the ability for custom actions to access CDAP datasets and services.
- CDAP-2758 - FileSets now support existing HDFS locations. Base paths that start with “/” are now treated as absolute in the file system. Previously, an absolute base path for a (Partitioned)FileSet was interpreted as relative to the namespace’s data directory; newly created FileSets interpret absolute base paths as absolute in the file system.
Introduced a new property for (Partitioned)FileSets named “data.external”. If true, the base path of the FileSet is assumed to be managed by some external process. That is, the FileSet will not attempt to create the directory, it will not delete any files when the FileSet is dropped or truncated, and it will not allow adding or deleting files or partitions. In other words, the FileSet is read-only.
- CDAP-2784 - Added support for writing PartitionedFileSet Partition metadata from MapReduce.
- CDAP-2822 - IndexedTable now supports scans on the indexed field.
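The base-path and “data.external” rules from CDAP-2758 can be summarized as a small model. This is a hypothetical sketch, not the CDAP API: the class, its methods, and the namespace data directory used in the usage note are illustrative assumptions. It only encodes the behavior stated above: a leading “/” makes the base path absolute, any other path resolves under the namespace’s data directory, and an external FileSet neither creates nor deletes files and rejects new partitions.

```python
import posixpath


class ExternalFileSetError(Exception):
    """Raised when a write is attempted on an external (read-only) FileSet."""


class FileSetModel:
    """Hypothetical model of the CDAP-2758 rules; NOT the real CDAP FileSet API."""

    def __init__(self, namespace_data_dir, base_path, properties=None):
        props = properties or {}
        # "data.external" = true: the location is managed by an external process.
        self.external = str(props.get("data.external", "false")).lower() == "true"
        # A base path starting with "/" is absolute in the file system;
        # any other base path resolves under the namespace's data directory.
        if base_path.startswith("/"):
            self.location = posixpath.normpath(base_path)
        else:
            self.location = posixpath.normpath(
                posixpath.join(namespace_data_dir, base_path))

    def should_create_directory(self):
        # An external FileSet never creates its base directory.
        return not self.external

    def deletes_files_on_drop_or_truncate(self):
        # Dropping or truncating an external FileSet leaves its files in place.
        return not self.external

    def add_partition(self, key):
        # External FileSets are read-only: adding partitions is rejected.
        if self.external:
            raise ExternalFileSetError("external FileSet is read-only")
        return posixpath.join(self.location, key)
```

For example, with a hypothetical namespace data directory of `/ns/default/data`, the base path `logs/raw` resolves to `/ns/default/data/logs/raw`, while `/incoming/logs` with `data.external` set to `true` stays at `/incoming/logs` and rejects `add_partition`.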
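CDAP-2746 and CDAP-2747 together make incremental consumption possible: each partition records its creation time, and a consumer can select only partitions created after the last one it processed. The sketch below illustrates that watermark idea in isolation. `Partition` and `IncrementalPartitionConsumer` are hypothetical names, and this is not how BatchPartitionConsumer is actually implemented.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Partition:
    key: str
    creation_time: int  # e.g. epoch milliseconds, as recorded in partition metadata


class IncrementalPartitionConsumer:
    """Hypothetical sketch: keeps the latest creation time it has consumed and
    returns only partitions created strictly after it on each run."""

    def __init__(self) -> None:
        self.last_seen = -1  # no partitions consumed yet

    def consume(self, partitions: Iterable[Partition]) -> List[Partition]:
        # Select partitions newer than the watermark, oldest first.
        new = sorted(
            (p for p in partitions if p.creation_time > self.last_seen),
            key=lambda p: p.creation_time,
        )
        if new:
            # Advance the watermark so the next run skips these partitions.
            self.last_seen = new[-1].creation_time
        return new
```

A strict "created after" watermark has one known weakness: a partition that is committed later but carries the same creation time as the last consumed partition would be skipped. A production consumer has to account for that case.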
Metrics
- CDAP-2975 - Added pre-split FactTables.
- CDAP-2326 - Added better unit-test coverage for the Cube dataset.
- CDAP-1853 - Scaling the metrics processor no longer requires a restart of the master services.
- CDAP-2844 - MapReduce metrics collection no longer uses counters, and instead reports directly to Kafka.
- CDAP-2701 - Spark programs can now collect metrics in distributed mode.
- CDAP-2466 - Added CLI support for metrics search and query.
- CDAP-2236 - The new CDAP UI now uses the newer search/query APIs.
- CDAP-1998 - Removed the deprecated context query parameter in the Metrics v3 API.
Miscellaneous New Features
- CDAP-332 - Added a RESTful endpoint for deleting Streams.
- CDAP-1483 - QueueAdmin now uses Id.Namespace instead of a plain String.
- CDAP-1584 - The CDAP CLI now shows the username in the CLI prompt.
- CDAP-2139 - Removed a duplicate Table of Contents on the Documentation Search page.
- CDAP-2515 - Added a metrics client for search and query by tags.
- CDAP-2582 - Documented the licenses of the shipped CDAP UI components.
- CDAP-2595 - Added data modelling of flows.
- CDAP-2596 - Added data modelling of MapReduce.
- CDAP-2617 - Added the capability to get logs for a given time range from the CLI.
- CDAP-2618 - Simplified the Cube sink configurations.
- CDAP-2670 - Added a Parquet sink with a time-partitioned file dataset.
- CDAP-2739 - Added an S3 batch source for ETLBatch.
- CDAP-2802 - Stopped using HiveConf.ConfVars.defaultValue, to support Hive versions later than 0.13.
- CDAP-2847 - Added the ability to add custom filters to FileBatchSource.
- CDAP-2893 - A custom Transform now parses log formats for ETL.
- CDAP-2913 - Provided an installation method for EMR.
- CDAP-2915 - Added an SQS real-time plugin for ETL.
- CDAP-3022 - Added a CloudFront format option to LogParserTransform.
- CDAP-3032 - Documented TestConfiguration class usage in the unit-test framework.