
@CuriousVini released this May 22, 2019 · 4 commits to release/6.0 since this release


This release introduces a number of new features, improvements, bug fixes, and feature removals to CDAP. Some of the main highlights of the release are:

  1. Portable CDAP Runtime

    • Provide a runtime architecture for CDAP to support both Hadoop and Hadoop-free environments, such as Kubernetes, in a distributed and secure fashion.
  2. Storage SPIs

    • Provide an abstraction for all CDAP system storage so that CDAP is more portable across runtime environments - Hadoop or Hadoop-free environments.
  3. Pipeline Enhancements

    • Improve the experience of building pipelines with features such as copy and paste and a minimap of the pipeline.

Please note that upgrades to CDAP are not supported in this release. Please see the list of incompatible changes.

New Features

  • Added Google Cloud Storage copy and move action plugins. (CDAP-14330)
  • New pipeline list user interface. (CDAP-14533)
  • Added minimap to pipeline canvas. (CDAP-14613)
  • Added support for running CDAP system services in a Kubernetes environment. (CDAP-14645)
  • Added the ability to copy and paste a node in pipeline studio. (CDAP-14657)
  • Added the ability to limit the number of concurrent pipeline runs. (CDAP-15058)
  • Added support for toggling Stackdriver integration in Google Cloud Dataproc clusters. (CDAP-15095)
  • Added support for Numeric and Array types in Google BigQuery plugins. (CDAP-15256)
  • Added support for showing decimal field types in plugin schemas in pipeline view. (CDAP-15339)

Improvements

  • Added support for CDH 5.15. (CDAP-13632)
  • Revamped the top navbar of the CDAP UI based on material design. (CDAP-14653)
  • The secure store now supports integration with other KMS systems, such as Google Cloud KMS, using the new Secure Store SPIs. (CDAP-14667)
  • Improved CDAP Master logging of events related to programs that it launches. (CDAP-7208)
  • Used a shared thread pool for provisioning tasks to increase thread utilization. (CDAP-14343)
  • Improved performance of the LevelDB-backed Table implementation. (CDAP-14569)
  • Wrangler now supports secure macros in connections. (CDAP-14571)
  • Significantly improved performance of the Transactional Messaging System. (CDAP-14617)
  • Added early validation of the properties of the Google BigQuery sink, to fail during pipeline deployment instead of at runtime. (CDAP-14821)
  • Improved the error message when a null value is read for a non-nullable field in Avro file sources. (CDAP-14823)
  • Improved loading of system artifacts to occur in parallel instead of sequentially. (CDAP-15047)
  • Improved the Google Cloud Dataproc provisioner to allow configuring the default project ID from the CDAP configuration. (CDAP-15059)
  • Added support for using runtime arguments to pass extra configurations to the Google Cloud Dataproc provisioner. (CDAP-15318)
  • Added support for spaces in file paths for the Google Cloud Storage plugin. (CDAP-14579)
  • The Google BigQuery source now validates the schema when the pipeline is deployed. (CDAP-14897)

Bug Fixes

  • Fixed a casting bug in the DB source where unsigned integer columns were incorrectly being treated as integers instead of longs. (CDAP-12211)
  • Removed the need for ZooKeeper for service discovery in the remote runtime environment. (CDAP-13410)
  • Fixed an issue with recording lineage for realtime sources. (CDAP-7230)
  • Fixed the dynamic Spark plugin to use the appropriate context classloader for loading dynamic Spark code. (CDAP-12941)
  • Fixed a bug that caused MapReduce pipelines to fail when using too many macros. (CDAP-13554)
  • Fixed an issue that caused pipelines with too many macros to fail when running in MapReduce. (CDAP-13982)
  • Fixed an issue with publishing metadata changes for profile assignments. (CDAP-14666)
  • Fixed a bug that would cause workspace ids to clash when wrangling items of the same name. (CDAP-14691)
  • Fixed a bug in the secure store caused by breaking changes in Java update 171. Users should be able to get secure keys on Java 8u171. (CDAP-14702)
  • Fixed a bug that caused Google Cloud Dataproc clusters to fail provisioning if a firewall rule that denies ingress traffic existed in the project. (CDAP-14708)
  • Fixed a bug that would cause data preparation to fail when preparing a large file in Google Cloud Storage. (CDAP-14709)
  • Fixed a bug that caused action-only pipelines to fail when running using a cloud profile. (CDAP-14724)
  • Fixed an issue with adding business tags to an entity. (CDAP-14744)
  • Fixed an issue in handling metadata search parameters. (CDAP-14778)
  • Fixed a bug that would cause pipelines to fail on remote clusters if the very first pipeline run was an action-only pipeline. (CDAP-14779)
  • Fixed the standard deviation aggregate functions to work even if there is only one element in a group. (CDAP-14857)
  • Fixed a bug in the Google BigQuery sink that would cause pipelines to fail when writing to a dataset in a different region. (CDAP-14951)
  • Fixed a race condition in processing profile assignments. (CDAP-15001)
  • Fixed an issue that could cause inconsistencies in metadata. (CDAP-15013)
  • Fixed an issue with displaying workspace metadata in the UI. (CDAP-15069)
  • Fixed a race condition in the remote runtime scp implementation that could cause the process to hang. (CDAP-15127)
  • Fixed an issue with metadata search result pagination. (CDAP-15196)
  • Fixed the Wrangler DB connection, where a bad JDBC driver could stay in the cache for 60 minutes, making the DB connection unusable. (CDAP-15223)
  • Fixed a NullPointerException in the Google Cloud Dataproc provisioner when no network was configured. (CDAP-15249)
  • Fixed a bug that caused some aggregator and joiner keys to be dropped if they hashed to the same value as another key. (CDAP-15299)
  • Fixed a bug in the RuntimeMonitor that prevented it from reconnecting through SSH correctly, causing failures in monitoring the correct program state. (CDAP-15332)
  • Fixed the Google Cloud Dataproc runtime for Google Cloud Platform projects where OS Login is enabled. (CDAP-15369)

Deprecated and Removed Features

  • Deprecated the HDFSMove and HDFSDelete plugins from core plugins. (CDAP-15241)
  • Removed Streams and Stream Views, which were deprecated in CDAP 5.0. (CDAP-14591)
  • Removed Flows, which were deprecated in CDAP 5.0. (CDAP-14592)
  • Removed the deprecated HDFSSink plugin. (CDAP-14529)
  • Removed the plugin endpoints feature to prevent execution of plugin code in the CDAP Master. Endpoints were only used for schema propagation, which has moved to the pipeline system service. (CDAP-14772)
  • Removed support for custom routing for user services. (CDAP-14886)
Jan 29, 2019
Merge pull request #10987 from cdapio/bugfix_release/remove-snapshot
Bugfix release/remove snapshot

@rohitsinha54 released this Nov 30, 2018 · 19 commits to release/5.1 since this release

Improvements

  • Improved performance of Apache Spark pipelines that write to multiple sinks. (CDAP-13430)

Bug Fixes

  • Fixed a bug where pipeline checkpointing was always on in realtime pipelines, regardless of the value set by the user. (CDAP-14558)

  • Fixed a bug where artifacts could not be uploaded through UI. (CDAP-14578)


@yaojiefeng released this Nov 16, 2018 · 3265 commits to develop since this release

New Features


  • Improved performance of Spark pipelines that write to multiple sinks. (CDAP-13430)

Bug Fixes

  • Fixed macro-enabled properties in plugin configuration to only have macro behavior if the entire value is a macro. (CDAP-13331)

  • Fixed a bug where the upgrade tool did not upgrade the owner meta table (CDAP-13372)

  • Fixed a bug where pipelines with conditions on different branches could not be deployed. (CDAP-13463)

  • Fixed an issue that prevented user runtime arguments from being used in CDAP programs (CDAP-13532)

  • Fixed a bug where, under a race condition, running a pipeline preview could cause the CDAP process to shut down. (CDAP-13593)

  • Fixed a bug that could prevent CDAP startup in case the metadata tables were disabled. (CDAP-14019)

  • Fixed a bug so that pipeline checkpointing in realtime pipelines is turned off based on the configured value. (CDAP-14558)
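The macro-evaluation fix above (CDAP-13331) can be illustrated with a hypothetical plugin property; the property name and values below are made up for illustration:

```
connection.string = ${conn}        entire value is a macro: evaluated at runtime
connection.string = jdbc:${conn}   only part of the value is a macro: now treated as a literal string
```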


@rohitsinha54 released this Nov 13, 2018 · 31 commits to release/5.1 since this release

Improvements

  • The Google Cloud Spanner sink will create the database and table if they do not exist. (CDAP-14490)

  • Added a Dataset Project config property to the Google BigQuery source to allow reading from a dataset in another project. (CDAP-14542)

Bug Fixes

  • Fixed an issue that caused avro, parquet, and orc classes across file, Google Cloud Storage, and S3 plugins to clash and cause pipeline failures. (CDAP-12229)

  • Fixed a bug where plugins that register other plugins would not use the correct id when using the PluginSelector API. (CDAP-14511)

  • Fixed a bug where upgraded CDAP instances were not able to load artifacts. (CDAP-14515)

  • Fixed an issue where the configuration of a sink was overwritten by the source. (CDAP-14524)

  • Fixed a packaging bug in kafka-plugins that prevented the plugins from being visible. (CDAP-14538)

  • Fixed a bug where plugins created by other plugins would not have their macros evaluated. (CDAP-14549)

  • Removed LZO as a compression option for snapshot and time partitioned fileset sinks since the codec cannot be packaged with the plugin. (CDAP-14560)


@sreevatsanraman released this Oct 12, 2018 · 57 commits to release/5.1 since this release


This release introduces a number of new features, improvements and bug fixes to CDAP. Some of the main highlights of the release are:

  1. Date and Time Support

    • Support for Date, Time, and Timestamp data types in the CDAP schema. In addition, this support is now available in pipeline plugins and Data Preparation directives.
  2. Plugin Requirements

    • A way for plugins to specify certain runtime requirements, and the ability to filter available plugins based on those requirements.
  3. Bootstrapping

    • A method to automatically bootstrap CDAP with a given state, such as a set of deployed apps, artifacts, namespaces, and preferences.
  4. UI Customization

    • A way to customize the display of the CDAP UI by enabling or disabling certain features.

New Features

  • Added support for Date/Time in Preparation. Also added a new directive, parse-timestamp, to convert a Unix timestamp (as a long or string) into a Timestamp object. (CDAP-14244)

  • Added Date, Time, and Timestamp support in plugins (Wrangler, Google Cloud BigQuery, Google Cloud Spanner, Database). (CDAP-14245)

  • Added Date, Time, and Timestamp support in CDAP Schema. (CDAP-14021)

  • Added Date, Time, and Timestamp support in UI. (CDAP-14028)

  • Added Google Cloud Spanner source and sink plugins in Pipeline and Google Cloud Spanner connection in Preparation. (CDAP-14053)

  • Added Google Cloud PubSub realtime source. (CDAP-14185)

  • Added a new user onboarding tour to CDAP. (CDAP-14088)

  • Added the ability to customize UI through theme. (CDAP-13990)

  • Added a framework that can be used to bootstrap a CDAP instance. (CDAP-14022)

  • Added the ability to configure system wide provisioner properties that can be set by admins but not by users. (CDAP-13746)

  • Added capability to allow specifying requirements by plugins and filter them on the basis of their requirements. (CDAP-13924)

  • Added REST endpoints to query the run counts of a program. (CDAP-13975)

  • Added a REST endpoint to get the latest run record of multiple programs in a single call. (CDAP-14260)

  • Added support for Apache Spark 2.3. (CDAP-13653)
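The parse-timestamp directive mentioned above might be applied like this in a Preparation recipe; the column name and the exact argument syntax are assumptions, not taken from the release notes:

```
parse-timestamp :create_time 'milliseconds'
```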

Improvements

  • Improved runtime monitoring (which fetches program states, metadata and logs) of remotely launched programs from the CDAP Master by using dynamic port forwarding instead of HTTPS for communication. (CDAP-13566)

  • Removed duplicate classes to reduce the size of the sandbox by a couple hundred megabytes. (CDAP-13977)

  • Added the ability to configure JVM options when launching the Sandbox. (CDAP-14461)

  • Added support for bidirectional Field Level Lineage. (CDAP-14003)

  • Added the capability for external datasets to record their schema. (CDAP-14013)

  • The Dataproc provisioner will try to pick up the project id and credentials from the environment if they are not specified. (CDAP-14091)

  • The Dataproc provisioner will use internal IP addresses when CDAP is in the same network as the Dataproc cluster. (CDAP-14104)

  • Added capability to always display current dataset schema in Field Level Lineage. (CDAP-14168)

  • Improved error handling in Preparation. (CDAP-13886)

  • Added a FileSink batch sink, FileMove action, and FileDelete action to replace their HDFS counterparts. (CDAP-14023)

  • Added a configurable jvm option to kill CDAP process immediately on sandbox when an OutOfMemory error occurs. (CDAP-14097)

  • Added better trace logging for dataset service. (CDAP-14135)

  • Made the Google Cloud Storage, Google Cloud BigQuery, and Google Cloud Spanner connection properties (project id, service account keyfile path, temporary GCS bucket) optional. (CDAP-14386)

  • Google Cloud PubSub sink will try to create the topic if it does not exist while preparing for the run. (CDAP-14401)

  • Added csv, tsv, delimited, json, and blob as formats to the S3 source and sink. (CDAP-14475)

  • Added csv, tsv, delimited, json, and blob as formats to the File source. (CDAP-14321)

  • Added a button on external sources and sinks to jump to the dataset detail page. (CDAP-9048)

  • Added format and suppress query params to the program logs endpoint to match the program run logs endpoint. (CDAP-14040)

  • Made all CDAP examples compatible with Spark 2. (CDAP-14132)

  • Added worker and master disk size properties to the Dataproc provisioner. (CDAP-14220)

  • Improved operational behavior of the dataset service. (CDAP-14298)

  • Fixed the Wrangler transform to make directives optional. If none are given, the transform is a no-op. (CDAP-14372)

  • Fixed Preparation to treat files without extensions as text files. (CDAP-14397)

  • Limited the number of files shown in the S3 and Google Cloud Storage browsers to 1000. (CDAP-14398)

  • Enhanced the Google Cloud BigQuery sink to create the dataset if the specified dataset does not exist. (CDAP-14482)

  • Increased log levels for the CDAP Sandbox so that only CDAP classes are at debug level. (CDAP-14489)

Bug Fixes

  • Fixed the 'distinct' plugin to use a drop down for the list of fields and to have a button to get the output schema. (CDAP-14468)

  • Ensured that destroy() is always called for MapReduce, even if initialize() fails. (CDAP-7444)

  • Fixed a bug where the Alert Publisher would not work if there was a space in the label. (CDAP-13008)

  • Fixed a bug that caused Preparation to fail while parsing avro files. (CDAP-13230)

  • Fixed a misleading error message about hbase classes in cloud runtimes. (CDAP-13878)

  • Fixed a bug where the metric for failed profile program runs was not getting incremented when the run failed due to provisioning errors. (CDAP-13887)

  • Fixed a bug where querying metrics by time series would return incorrect results after a certain amount of time. (CDAP-13894)

  • Fixed a bug where profile metrics were incorrect if an app was deleted. (CDAP-13959)

  • Fixed a deprovisioning bug that occurred when cluster creation failed. (CDAP-13965)

  • Fixed an error where TMS publishing was retried indefinitely if the first attempt failed. (CDAP-13988)

  • Fixed a race condition in MapReduce that can cause a deadlock. (CDAP-14076)

  • Fixed a resource leak in preview feature. (CDAP-14098)

  • Fixed a bug that would cause RDD versions of the dynamic scala spark plugins to fail. (CDAP-14107)

  • Fixed a bug where profiles were getting applied to all program types instead of only workflows. (CDAP-14154)

  • Fixed a race condition by ensuring that a program is started before starting runtime monitoring for it. (CDAP-14203)

  • Fixed the run count for pipelines in the UI to show the correct number instead of capping at 100. (CDAP-14211)

  • Fixed an issue where Dataproc client was not being closed, resulting in verbose error logs. (CDAP-14223)

  • Fixed a bug that could cause the provisioning state of stopped program runs to be corrupted. (CDAP-14261)

  • Fixed a bug that caused Preparation to be unable to list buckets in a Google Cloud Storage connection in certain environments. (CDAP-14271)

  • Fixed a bug where the Dataproc provisioner was not able to provision a single-node cluster. (CDAP-14303)

  • Fixed a bug where Preparation could not read json or xml files on Google Cloud Storage. (CDAP-14390)

  • Fixed dataproc provisioner to use full API access scopes so that Google Cloud Spanner and Google Cloud PubSub are accessible by default. (CDAP-14395)

  • Fixed a bug where profile metrics were not deleted when a profile was deleted. (CDAP-14435)

Deprecated and Removed Features

  • Removed old and buggy dynamic Spark plugins. (CDAP-14108)

  • Dropped support for MapR 4.1. (CDAP-14456)


@prinam released this Jul 31, 2018 · 10 commits to release/5.0 since this release


  1. Cloud Runtime

    • Cloud Runtimes allow you to configure batch pipelines to run in a cloud environment. Before the pipeline runs, a cluster is provisioned in the cloud; the pipeline is executed on that cluster, and the cluster is deleted after the run finishes. Cloud Runtimes let you use compute resources only when you need them, enabling you to make better use of your resources.
  2. Metadata

    • Metadata Driven Processing: annotate metadata on custom entities, such as fields in a dataset, partitions of a dataset, or files in a fileset, and access metadata from a program or plugin at runtime to facilitate metadata-driven processing. Field Level Lineage: APIs to register operations being performed on fields from a program or a pipeline plugin, and a platform feature to compute field level lineage based on those operations.
  3. Analytics

    • A simple, interactive, UI-driven approach to machine learning. It lowers the bar for machine learning, allowing users of any level to understand their data and train models while preserving the switches and levers that advanced users might want to tweak.
  4. Operational Dashboard

    • A real-time interactive interface that visualizes program run statistics, with reporting for comprehensive insights into program runs over large periods of time.

New Features

Cloud Runtime

  • Added Cloud Runtimes, which allow users to assign profiles to batch pipelines that control what environment the pipeline will run in. For each program run, a cluster in a cloud environment can be created for just that run, allowing efficient use of resources. (CDAP-13089)

  • Added a way for users to create compute profiles from UI to run programs in remote (cloud) environments using one of the available provisioners. (CDAP-13213)

  • Allowed users to specify a compute profile in UI to run the pipelines in cloud environments. Compute profiles can be specified either while running a pipeline manually or via a time schedule or via a pipeline state based trigger. (CDAP-13206)

  • Added a provisioner that allows users to run pipelines on Google Cloud Dataproc clusters. (CDAP-13094)

  • Added a provisioner that can run pipelines on remote Apache Hadoop clusters (CDAP-13774)

  • Added an Amazon Elastic MapReduce provisioner that can run pipelines on AWS EMR. (CDAP-13709)

  • Added support for viewing logs in CDAP for programs executing using the Cloud Runtime. (CDAP-13380)

  • Added metadata, such as the pipelines, schedules, and triggers associated with a profile. Also added metrics, such as the total number of runs of a pipeline using a profile. (CDAP-13432)

  • Added the ability to disable and enable a profile (CDAP-13494)

  • Added the capability to export or import compute profiles (CDAP-13276)

  • Added the ability to set the default profile at namespace and instance levels. (CDAP-13359)

Metadata

  • Added support for annotating metadata to custom entities. For example, a field in a dataset can now be annotated with metadata. (CDAP-13260)

  • Added programmatic APIs for users to register field level operations from programs and plugins. (CDAP-13264)

  • Added REST APIs to retrieve the fields which were updated for a given dataset in a given time range, a summary of how those fields were computed, and details about the operations which were responsible for updating those fields. (CDAP-13269)

  • Added the ability to view Field Level Lineage for datasets. (CDAP-13511)

Analytics

  • Added CDAP Analytics, an interactive, UI-driven application that allows users to train machine learning models and use them in their pipelines to make predictions. (CDAP-13921)

Operational Dashboard

  • Added a Dashboard for real-time monitoring of programs and pipelines (CDAP-12865)

  • Added a UI to generate reports on programs and pipelines that ran over a period of time (CDAP-12901)

  • Added a feature to support Reports and Dashboard. Dashboard provides realtime status of program runs and future schedules. Reports are a tool for administrators to take a historical look at their applications' program runs, statistics, and performance. (CDAP-13147)

Other New Features

Data Pipelines

  • Added 'Error' and 'Alert' ports for plugins that support this functionality. To enable this functionality in a plugin, in addition to emitting alerts and errors from the plugin code, users have to set "emit-errors: true" and "emit-alerts: true" in their plugin JSON. Users can create connections from the 'Error' port to Error Handler plugins, and from the 'Alert' port to Alert plugins. (CDAP-12839)

  • Added support for Apache Phoenix as a source in Data Pipelines. (CDAP-13045)

  • Added support for Apache Phoenix database as a sink in Data Pipelines. (CDAP-13499)

  • Added the ability to support macro behavior for all widget types (CDAP-12944)

  • Added the ability to view all the concurrent runs of a pipeline (CDAP-13057)

  • Added the ability to view the runtime arguments, logs and other details of a particular run of a pipeline. (CDAP-13006)

  • Added UI support for Splitter plugins (CDAP-13242)
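The plugin JSON settings referenced in the 'Error' and 'Alert' ports note above might look like the following sketch. Only the emit-errors and emit-alerts keys come from the note; the surrounding "metadata"/"spec-version" structure is an assumption about typical plugin widget JSON:

```json
{
  "metadata": {
    "spec-version": "1.5"
  },
  "emit-errors": true,
  "emit-alerts": true
}
```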

Data Preparation

  • Added a Google BigQuery connection for Data Preparation (CDAP-13100)

  • Added a point-and-click interaction to change the data type of a column in the Data Preparation UI (CDAP-12880)


  • Added a page to view and manage a namespace. Users can click on the current namespace card in the namespace dropdown to go to the namespace's detail page. On this page, they can see entities and profiles created in the namespace, as well as preferences, mapping, and security configurations for the namespace. (CDAP-13180)

  • Added the ability to restart CDAP programs to make it resilient to YARN outages. (CDAP-12951)

  • Implemented a new Administration page, with two tabs, Configuration and Management. In the Configuration tab, users can view and manage all namespaces, system preferences and system profiles. In the Management tab, users can get an overview of system services in CDAP and scale them. (CDAP-13242)


  • Added Spark 2 support for Kafka realtime source (CDAP-13280)

  • Added support for CDH 5.13 and 5.14. (CDAP-12727)

  • Added support for EMR 5.4 through 5.7 (CDAP-11805)

  • Upgraded CDAP Router to use Netty 4.1 (CDAP-6308)

  • Added support for automatically restarting long running program types (Service and Flow) upon application master process failure in YARN (CDAP-13179)

  • Added support for specifying custom consumer configs in Kafka source (CDAP-12549)

  • Added support for specifying recursive schemas (CDAP-13143)

  • Added support for passing the YARN application ID in the logging context. This can help in correlating the ID of a program run in CDAP with the ID of the corresponding YARN application, thereby facilitating better debugging. (CDAP-12275)

  • Added the ability to deploy plugin artifacts without requiring a parent artifact. Such plugins are available for use in any parent artifacts (CDAP-9080)

  • Added the ability to import pipelines from the add entity modal (plus button) (CDAP-12274)

  • Added the ability to save the runtime arguments of a pipeline as preferences, so that they do not have to be entered again. (CDAP-11844)

  • Added the ability to specify dependencies to ScalaSparkCompute Action (CDAP-12724)

  • Added the ability to update the keytab URI for namespace's impersonation configuration. (CDAP-12426)

  • Added the ability to upload a User Defined Directive (UDD) using the plus button (CDAP-12279)

  • Allowed CDAP user programs to talk to Kerberos enabled HiveServer2 in the cluster without using a keytab (CDAP-12963)

  • Allowed users to configure the transaction isolation level in database plugins (CDAP-11096)

  • Configured sandbox to have secure store APIs enabled by default (CDAP-13573)

  • Improved robustness of unit test framework by fixing flaky tests (CDAP-13411)

  • Increased default Twill reserved memory from 300 MB to 768 MB in order to prevent YARN from killing containers in standard cluster setups. (CDAP-13405)

  • Made all fields in the HTTP Callback plugin macro-enabled. (CDAP-13116)

  • Removed concurrent upgrades of HBase coprocessors, since they could lead to regions getting stuck in transition. (CDAP-12974)

  • Updated the CDAP sandbox to use Spark 2.1.0 as the default Spark version. (CDAP-13409)

  • Improved the documentation for defining Apache Ranger policies for CDAP entities (CDAP-13157)

  • Improved resiliency of router to zookeeper outages. (CDAP-12992)

  • Improved the performance of metadata upgrade by adding a dataset cache. (CDAP-13756)

  • Added CLI command to fetch service logs (CDAP-7644)

  • Added rate limiting to router logs in the event of zookeeper outages (CDAP-12989)

  • Renamed the system metadata tables to v2.system.metadata_index.d and v2.system.metadata_index.i, and also renamed the business metadata tables. (CDAP-13759)

  • Reduced CDAP Master's local storage usage by deleting temporary directories created for programs as soon as programs are launched on the cluster. (CDAP-6032)

Bug Fixes

  • Fixed a bug in TMS that prevented consumers from correctly consuming multiple events emitted in the same transaction. (CDAP-13033)

  • Fixed a bug that caused errors in the File source if it read parquet files that were not generated through Hadoop. (CDAP-12875)

  • Fixed a bug that caused PySpark to fail to run with Spark 2 in local sandbox. (CDAP-12693)

  • Fixed a bug that could cause the status of a running program to be falsely returned as stopped if the run happened to change state in the middle of calculating the program state. Also fixed a bug where the state for a suspended workflow was stopped instead of running. (CDAP-13296)

  • Fixed a bug that prevented MapReduce AM logs from YARN from showing the right URI. (CDAP-7052)

  • Fixed a bug that prevented Spark jobs from running after CDAP upgrade due to caching of jars. (CDAP-12973)

  • Fixed a bug that prevented a parquet snapshot source and sink from being used in the same pipeline. (CDAP-13026)

  • Fixed a bug where, under a race condition, running a pipeline preview could cause the CDAP process to shut down. (CDAP-13593)

  • Fixed a bug where a Spark program would fail to run when Spark authentication was turned on. (CDAP-12752)

  • Fixed a bug where an ad-hoc exploration query on streams would fail in an impersonated namespace. (CDAP-13123)

  • Fixed a bug where pipelines with conditions on different branches could not be deployed. (CDAP-13463)

  • Fixed a bug where the Scala Spark compiler had classes missing from its classloader, causing compilation failures. (CDAP-12743)

  • Fixed a bug where the upgrade tool did not upgrade the owner meta table (CDAP-13372)

  • Fixed a bug with artifact counts: when getting the artifact count for a namespace, system artifacts were also counted, causing the total artifact count to be much larger than the real count. (CDAP-12647)

  • Fixed a class loading issue and a schema mismatch issue in the whole-file-ingest plugin. (CDAP-13364)

  • Fixed a dependency bug that could cause HBase region servers to deadlock during a cold start (CDAP-12970)

  • Fixed an issue that caused pipeline failures if a Spark plugin tried to read or write a DataFrame using csv format. (CDAP-12742)

  • Fixed an issue that prevented user runtime arguments from being used in CDAP programs (CDAP-13532)

  • Fixed an issue where Spark 2.2 batch pipelines with HDFS sinks would fail with delegation token issue error (CDAP-13281)

  • Fixed an issue that caused the HBase sink to fail when used alongside other sinks with the Spark execution engine. (CDAP-12731)

  • Fixed an issue with the retrieval of non-ASCII strings from Table datasets. (CDAP-13002)

  • Fixed avro fileset plugins so that reserved hive keywords can be used as column names (CDAP-13040)

  • Fixed macro enabled properties in plugin configuration to only have macro behavior if the entire value is a macro. (CDAP-13331)

  • Fixed the logs REST API to return a valid json object when filters are specified (CDAP-12988)

  • Fixed an issue where a dataset's classloader was closed before the dataset itself, preventing the dataset from closing properly. (CDAP-13110)

Deprecated and Removed Features

  • Deprecated the aggregation of metadata annotated on all the entities (application, programs, datasets, streams) associated with a run. From this release onwards, metadata for program runs behaves like any other entity: metadata can be directly annotated to it and retrieved from it. For backward compatibility, the aggregated behavior remains the default; to get the new behavior, an additional query parameter 'runAggregation' should be set to false when making the REST call to retrieve metadata of program runs. (CDAP-13721)

  • Dropped support for CDH 5.1, 5.2, 5.3 and HDP 2.0, 2.1 due to security vulnerabilities identified in them (CDAP-8141)

  • Removed HDFS, YARN, and HBase operational stats. These stats were not very useful, could generate confusing log warnings, and were confusing when used in conjunction with cloud profiles. (CDAP-13493)

  • Removed analytics plugins such as decision tree, naive bayes and logistic regression from Hub. The new Analytics flow in the UI should be used as a substitute for this functionality. (CDAP-13720)

  • Removed deprecated cdap sdk commands. Use cdap sandbox commands instead. (CDAP-12584)

  • Removed deprecated scripts. Use cdap sandbox or cdap cli instead. (CDAP-13680)

  • Removed deprecated error datasets from pipelines. Error transforms should be used instead of error datasets, as they offer more functionality and flexibility. (CDAP-11870)

  • Deprecated HDFS Sink. Use the File sink instead. (CDAP-13353)

  • Removed deprecated stream size based schedules (CDAP-12692)

  • Deprecated streams and flows. Use Apache Kafka as a replacement technology for streams and Spark Streaming as a replacement technology for flows. Streams and flows will be removed in the 6.0 release. (CDAP-13419)

  • Removed multiple deprecated programmatic and RESTful APIs in CDAP.
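As an example of the 'runAggregation' parameter described in the metadata deprecation note above, a request for a program run's metadata might look like the following. The URL path shown is illustrative only, not the exact CDAP endpoint; just the parameter name comes from the note:

```
GET /v3/namespaces/<namespace>/apps/<app>/workflows/<workflow>/runs/<run-id>/metadata?runAggregation=false
```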

Known Issues

  • Updating the compute profile used to manually run a pipeline through the UI can remove the existing schedules and triggers of the pipeline. (CDAP-13853)

  • The reports feature does not currently work with Apache Spark 2.0. As a workaround, upgrade to Spark 2.1 or later to use reports. (CDAP-13919)

  • Plugins that are not supported when running a pipeline using a cloud runtime throw unclear error messages at runtime. (CDAP-13896)

  • While some built-in plugins have been updated to emit operations for capturing field level lineage, a number of them do not yet emit these operations. (CDAP-13274)

  • Pipelines cannot propagate dynamic schemas at runtime. (CDAP-13326)

  • Reading metadata is not supported when pipelines or programs run using a cloud runtime. (CDAP-13963)

  • Creating a pipeline from Data Preparation when using an Apache Kafka plugin fails. As a workaround, after clicking the Create Pipeline button, manually update the schema of the Kafka plugin to set a single field named body as a non-nullable string. (CDAP-13971)

  • Metadata for custom entities is not deleted if its nearest known ancestor entity (parent) is deleted. (CDAP-13910)

Apr 13, 2018
CDAP 4.3 for Azure HDInsights
Apr 13, 2018
CDAP 4.2 for Azure HDInsights
Apr 13, 2018
CDAP 4.3 for EMR