sreevatsanraman released this Dec 21, 2016 · 292 commits to release/4.0 since this release

New Features

  • Added a transactional messaging system for reliable communication of messages between components. In CDAP 4.0.0, the transactional messaging system replaces Kafka for publishing and subscribing to the audit logs that are used within CDAP for computing data lineage. (CDAP-7211)
  • Added a pluggable extension to retrieve operational statistics in CDAP. Provided extensions for operational stats from YARN, HDFS, HBase, and CDAP. (CDAP-7670) (CDAP-7703) (CDAP-7704)
  • Added support for dynamically updating or resetting log levels for the program types worker, flow, and service using REST endpoints. (CDAP-5479) (CDAP-7214)
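
As an illustration of the dynamic log-level endpoints (CDAP-5479), the following minimal sketch builds, but does not send, a request that changes logger levels for a running program. The URL layout, the JSON body shape, and the application, program, and run names are assumptions made for this example and are not stated in these release notes; confirm them against the Logging HTTP RESTful API reference before use. The port 11015 is the default Router port noted under Improvements.

```python
import json
import urllib.request


def build_log_level_request(base_url, namespace, app, program_type,
                            program, run_id, levels):
    """Build (without sending) a PUT request that sets logger levels for one run."""
    # ASSUMPTION: this path layout is illustrative only; verify it against
    # the Logging HTTP RESTful API documentation for your CDAP version.
    url = (f"{base_url}/v3/namespaces/{namespace}/apps/{app}/"
           f"{program_type}/{program}/runs/{run_id}/loglevels")
    # ASSUMPTION: the body is a JSON map of logger name to level.
    body = json.dumps(levels).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical application, worker, and run identifiers.
request = build_log_level_request(
    "http://localhost:11015", "default", "MyApp", "workers",
    "MyWorker", "run-123", {"ROOT": "DEBUG"},
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with urllib.request.urlopen) is left out on purpose, so the sketch can be inspected without a running CDAP instance.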

Improvements

  • New menu option in Cloudera Manager when running the CDAP CSD enables running utilities such as the HBaseQueueDebugger. (CDAP-5632)
  • Added support for impersonation with CDAP Explore (Hive) operations, including enabling exploring of a dataset or running queries against it. (CDAP-6587)
  • Added support for enabling client certificate-based authentication to the CDAP Authentication server. (CDAP-7287)
  • Merged various shell scripts into a single script to interface with CDAP, called cdap, shipped with both the SDK and Distributed CDAP.(CDAP-1280)
  • Updated the default CDAP Router port to 11015 to avoid conflicting with HiveServer2's default port.(CDAP-1696)
  • Fixed an issue with the CDAP scripts under Windows not handling a JAVA_HOME path with spaces in it correctly. CDAP SDK home directories with spaces in the path are not supported (due to issues with the product) and the scripts now exit if such a path is detected.(CDAP-3262)
  • For MapReduce programs using a PartitionedFileSet as input, the partition key corresponding to the input split is now exposed to the mapper.(CDAP-4322)
  • Fixed an issue where an exception from an HttpContentConsumer was being silently ignored.(CDAP-4901)
  • Added pagination for the search RESTful API. Pagination is achieved via the offset, limit, numCursors, and cursor parameters in the RESTful API. (CDAP-5068)
  • Added the property program.container.dist.jars to set extra jars to be localized to every program container and to be added to classpaths of CDAP programs.(CDAP-6183)
  • Fixed an issue that allowed a FileSet to be created if its corresponding directory already existed.(CDAP-6425)
  • The namespace that integration test cases run against by default has been made configurable.(CDAP-6572)
  • Added a feature that implements caching of user credentials in CDAP system services.(CDAP-6635)
  • Fixed an issue in WorkerContext that did not properly implement the contract of the Transactional interface. Note that this fix may cause incompatibilities with previous releases in certain cases. See API Changes, CDAP-6837 for more details.(CDAP-6837)
  • Updated more system services to respect the cdap-site parameter "master.service.memory.mb".(CDAP-6862)
  • Added support for concurrent runs of a Spark program.(CDAP-6885)
  • Added support for running CDAP on Apache HBase 1.2.(CDAP-6937)
  • Added support for Amazon EMR 4.6.0+ installation of CDAP via a bootstrap action script.(CDAP-6938)
  • Added support for enabling SSL between the CDAP Router and CDAP Master.(CDAP-6984)
  • Added the capability to clean up log files that do not have corresponding metadata. (CDAP-6995)
  • Added support for checkpointing in Spark Streaming programs to persist checkpoints transactionally.(CDAP-7117)
  • Updated the Windows start scripts to match the new shell script functionality.(CDAP-7181)
  • Added the ability to specify an announce address and port for the CDAP AppFabric and Dataset services. Deprecated the properties app.bind.address and dataset.service.bind.address, replacing them with master.services.bind.address as the bind address for master services. Added the properties master.services.announce.address, app.announce.port, and dataset.service.announce.port for use as announce addresses that are different from the bind address.(CDAP-7192)
  • Improved CDAP Master logging of events related to programs that it launches.(CDAP-7208)
  • Fixed a NullPointerException being logged on closing network connection.(CDAP-7240)
  • Upgraded the Apache Tephra version to 0.10-incubating.(CDAP-7284)
  • Added support for CDH 5.9.(CDAP-7291)
  • Provided programs more control over when and how transactions are executed.(CDAP-7319)
  • The Log HTTP Handler and Router have been fixed to allow the streaming of larger log files. (CDAP-7385)
  • Revised the documentation on the recommended setting for yarn.nodemanager.delete.debug-delay-sec.(CDAP-7393)
  • Removed the requirement in the documentation of running kinit prior to running the CDAP Upgrade Tool when upgrading a package installation of CDAP on a secure Hadoop cluster.(CDAP-7439)
  • Improved how MapReduce configures its inputs, so that failures surface immediately. (CDAP-7476)
  • Fixed an issue in MapReduce that caused the destroy() method to be skipped if committing any of the dataset outputs failed. (CDAP-7477)
  • DynamicPartitioner can now limit the number of open RecordWriters to one, if the output partition keys are grouped.(CDAP-7557)
  • Added support for specifying the Hive execution engine at runtime (dynamically).(CDAP-7659)
  • Added the cluster.name property that identifies a cluster; this property can be set in the cdap-site.xml file. (CDAP-7761)
  • Added a step in the CDAP Upgrade Tool to upgrade the specification of the MetadataDataset.(CDAP-7797)
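
The pagination parameters added to the search RESTful API (CDAP-5068) could be combined into a request URL as in the following minimal sketch. Only the parameter names offset, limit, numCursors, and cursor come from these release notes; the endpoint path, the query parameter, and the sample values are assumptions for illustration, and the exact semantics of numCursors and cursor should be taken from the API reference.

```python
from urllib.parse import urlencode


def build_search_url(base_url, namespace, query, offset, limit,
                     num_cursors=None, cursor=None):
    """Return a search URL carrying the pagination parameters."""
    params = {"query": query, "offset": offset, "limit": limit}
    # The optional parameters are sent only when the caller supplies them.
    if num_cursors is not None:
        params["numCursors"] = num_cursors
    if cursor is not None:
        params["cursor"] = cursor
    # ASSUMPTION: the path below is illustrative, not taken from the notes.
    return (f"{base_url}/v3/namespaces/{namespace}/metadata/search"
            f"?{urlencode(params)}")


# Hypothetical request for the third page of 20 results.
print(build_search_url("http://localhost:11015", "default", "purchase",
                       offset=40, limit=20))
```

Using urlencode keeps the parameter values correctly escaped, which matters once search terms contain characters such as ":" or "*".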

Bug Fixes

  • A MapReduce job using either a FileSet or PartitionedFileSet as input no longer fails if there are no input partitions.(CDAP-2945)
  • The Authentication server announce address is now configurable.(CDAP-4535)
  • Fixed a problem with downloading of large (multiple gigabyte) CDAP Explore queries.(CDAP-5012)
  • Fixed an issue where the metadata of streams was not being updated when the stream's schema was altered.(CDAP-5061)
  • Fixed an issue where a warning was logged instead of an error when a MapReduce job failed in the CDAP SDK.(CDAP-5372)
  • Updated the default CDAP UI port to 11011 to avoid conflicting with Accumulo and Cloudera Manager's Activity Monitor.(CDAP-5897)
  • Authentication handler APIs have been updated to restrict which cdap-site.xml and cdap-security.xml properties are available to authentication handlers. (CDAP-6398)
  • Fixed an issue with searching for an entity in Cask Tracker by metadata after a tag with the same prefix has been removed.(CDAP-6404)
  • Fixed an issue with misleading log messages from the RunRecord corrector.(CDAP-7031)
  • Significantly reduced the chance of a schedule misfire in the case where the CPU cannot trigger a schedule within a certain time threshold. (CDAP-7116)
  • Fixed a problem with duplicate logs showing for a running program.(CDAP-7138)
  • With an incorrect ZooKeeper quorum configuration, the CDAP Upgrade Tool and other services such as Master, Router, and Kafka now time out with an error instead of hanging indefinitely. (CDAP-7154)
  • Fixed an issue in the CDAP Upgrade Tool to allow it to run on a CDAP instance with authorization enabled.(CDAP-7175)
  • Fixed an issue where macros were not being substituted for postaction plugins.(CDAP-7177)
  • Lineage information is now returned for deleted datasets.(CDAP-7204)
  • Fixed an issue with the FileBatchSource not working with Azure Blob Storage.(CDAP-7248)
  • Fixed an issue with CDAP Explore using Tez on Azure HDInsight.(CDAP-7249)
  • Fixed an issue where dataset usage was not being recorded after an application was deleted.(CDAP-7250)
  • Fixed an issue with the leaking of Hive classes to programs in the CDAP SDK.(CDAP-7256)
  • Added a warning when a PartitionFilter addresses a non-existent field.(CDAP-7259)
  • Fixed an issue that prevented launching of MapReduce jobs on a Hadoop-2.7 cluster.(CDAP-7285)
  • Fixed an issue in the KMeans example that caused it to calculate the wrong cluster centroids.(CDAP-7292)
  • Fixed an issue with the documentation example links to the CDAP ETL Guide.(CDAP-7314)
  • Fixed a misleading error message that occurred when the updating of a CDAP Explore table for a dataset failed.(CDAP-7317)
  • Fixed an issue that would cause MapReduce and Spark programs to fail if too many macros were being used.(CDAP-7318)
  • Fixed an issue with upgrading CDAP using the CDAP Upgrade Tool.(CDAP-7321)
  • Fixed an issue with the CDAP Upgrade Tool while upgrading HBase coprocessors.(CDAP-7324)
  • Fixed an issue with log file corruption if the log saver container crashed due to being killed by YARN.(CDAP-7361)
  • Fixed an issue with Hydrator Studio in the Windows version of Chrome that prevented users from opening and editing a node configuration.(CDAP-7374)
  • Fixed an issue that prevented impersonation in flows from working correctly, by not re-using HBaseAdmin across different UGI.(CDAP-7394)
  • Fixed an issue where the partitions of a PartitionedFileSet were not cleaned up properly after a transaction failure. (CDAP-7417)
  • Fixed an issue that prevented CustomAction and Spark programs from being defined as inner classes. (CDAP-7428)
  • CDAP Ambari Service's required version of Ambari Server was increased to 2.2 to support the empty-value-valid configuration attribute.(CDAP-7442)
  • Fixed the logback-container.xml to work on clusters with multiple log directories configured for YARN. (CDAP-7473)
  • Fixed an issue in CDAP logging that caused system logs from Kafka to not be saved after an upgrade and for previously-saved logs to become inaccessible.(CDAP-7482)
  • Fixed an issue where a MapReduce program using DynamicPartitioner would leave behind output files if it failed. (CDAP-7483)
  • Fixed an issue where a MapReduce classloader was closed prematurely. (CDAP-7500)
  • Fixed an issue preventing proper class loading isolation for explicit transactions executed by programs.(CDAP-7514)
  • Improved the documentation for read-less increments.(CDAP-7522)
  • Added a missing @Override annotation for the WorkerContext.execute() method. (CDAP-7524)
  • Fixed an issue that prevented the use of the logback.xml from an application JAR. (CDAP-7527)
  • Fixed an issue in integration tests to allow JDBC connections against authorization-enabled and SSL-enabled CDAP instances.(CDAP-7548)
  • Improved the usability of ServiceManager in integration tests. The getServiceURL() method now waits for the service to be discoverable before returning the service's URL.(CDAP-7566)
  • Fixed an issue where Spark programs could not be started after a master failover or restart.(CDAP-7612)
  • Fixed an issue where read-less increments from different MapReduce tasks cancelled each other out. (CDAP-7624)
  • Added additional tests for read-less increments in HBase.(CDAP-7629)
  • Added support for Amazon EMR 4.6.0. (CDAP-7648) (CDAP-7663)
  • Startup checks now validate the HBase version and error out if the HBase version is not supported.(CDAP-7652)
  • The CDAP Ambari service was updated to use scripts for Auth Server/Router alerts in Ambari due to Ambari not supporting CDAP's /status endpoint with WEB check.(CDAP-7660)
  • CDAP Quick Links in the CDAP Ambari Service now correctly link to the CDAP UI.(CDAP-7664)
  • Fixed the YARN startup check to fail instead of warning if the cluster does not have enough capacity to run CDAP services.(CDAP-7666)
  • Fixed an issue in the CDAP Sentry Extension by which privileges were not being deleted when the CDAP entity was deleted.(CDAP-7680)
  • Files installed by the "cdap" package under /etc are now properly marked as config files for RPM packages.(CDAP-7707)
  • Fixed an issue that could cause Spark and MapReduce programs to stop improperly, resulting in a failed run record instead of a killed run record.(CDAP-7724)
  • Fixed the cdap-data-pipeline-plugins-archetype to export everything in the provided groupId and fixed the archetype to use the provided groupId as the Java package instead of using a hardcoded value.(CDAP-7737)
  • Fixed the ordering of search results by relevance in the search RESTful API.(CDAP-7742)
  • Redistributable images, such as Docker and Virtual Machine images, now use OpenJDK. (CDAP-7757)
  • The Node.js version check in the CDAP SDK was updated to properly handle patch-level comparisons.(CDAP-7819)
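
To illustrate what a patch-level version comparison such as the Node.js check (CDAP-7819) involves, here is a minimal sketch that compares versions numerically, component by component. It is not the check shipped in the CDAP SDK scripts; the function names and the minimum version used below are hypothetical, and the parser assumes plain numeric versions without pre-release suffixes.

```python
def parse_version(text):
    """Turn a version such as 'v4.5.0' into a tuple of ints: (4, 5, 0)."""
    parts = [int(part) for part in text.strip().lstrip("v").split(".")]
    # Pad missing components with zeros so that '4.5' equals '4.5.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    # Tuples compare element by element: major, then minor, then patch.
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical minimum version, chosen only for this example.
print(meets_minimum("v4.5.1", "4.5.0"))
print(meets_minimum("v4.10.0", "4.5.0"))
print(meets_minimum("v4.4.9", "4.5.0"))
```

Comparing integer tuples rather than raw strings avoids the classic mistake where "4.10.0" sorts before "4.5.0" in a character-by-character comparison.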