@yaojiefeng released this Nov 16, 2018 · 2131 commits to develop since this release

Improvements

  • Improved performance of Spark pipelines that write to multiple sinks. (CDAP-13430)

Bug Fixes

  • Fixed macro enabled properties in plugin configuration to only have macro behavior if the entire value is a macro (see the illustration after this list). (CDAP-13331)

  • Fixed a bug where the upgrade tool did not upgrade the owner meta table. (CDAP-13372)

  • Fixed a bug where pipelines with conditions on different branches could not be deployed. (CDAP-13463)

  • Fixed an issue that prevented user runtime arguments from being used in CDAP programs. (CDAP-13532)

  • Fixed a bug where, under a race condition, running a pipeline preview could cause the CDAP process to shut down. (CDAP-13593)

  • Fixed a bug that could prevent CDAP startup in case the metadata tables were disabled. (CDAP-14019)

  • Fixed a bug where pipeline checkpointing was not turned off for a realtime pipeline whose configuration disables it. (CDAP-14558)
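
To illustrate the macro fix above (CDAP-13331), using CDAP's ${...} macro syntax and a hypothetical plugin property named "table":

    table = ${tableName}         the entire value is a macro, so macro behavior applies
    table = prod_${tableName}    the value merely contains a macro, so it is treated as a plain literal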

@rohitsinha54 released this Nov 13, 2018 · 4 commits to release/5.1 since this release

Improvements

  • The Google Cloud Spanner sink will create the database and table if they do not exist. (CDAP-14490)

  • Added a Dataset Project config property to the Google BigQuery source to allow reading from a dataset in another project. (CDAP-14542)

Bug Fixes

  • Fixed an issue that caused avro, parquet, and orc classes across file, Google Cloud Storage, and S3 plugins to clash and cause pipeline failures. (CDAP-12229)

  • Fixed a bug where plugins that register other plugins would not use the correct id when using the PluginSelector API. (CDAP-14511)

  • Fixed a bug where upgraded CDAP instances were not able to load artifacts. (CDAP-14515)

  • Fixed an issue where the configuration of a sink was overwritten by that of a source. (CDAP-14524)

  • Fixed a packaging bug in kafka-plugins that prevented the plugins from being visible. (CDAP-14538)

  • Fixed a bug where plugins created by other plugins would not have their macros evaluated. (CDAP-14549)

  • Removed LZO as a compression option for snapshot and time partitioned fileset sinks since the codec cannot be packaged with the plugin. (CDAP-14560)

@sreevatsanraman released this Oct 12, 2018 · 30 commits to release/5.1 since this release

Summary

This release introduces a number of new features, improvements, and bug fixes to CDAP. Some of the main highlights are:

  1. Date and Time Support

    • Support for Date, Time, and Timestamp data types in the CDAP schema. This support is also available in pipeline plugins and Data Preparation directives.

  2. Plugin Requirements

    • A way for plugins to specify certain runtime requirements, and the ability to filter available plugins based on those requirements.

  3. Bootstrapping

    • A method to automatically bootstrap CDAP with a given state, such as a set of deployed apps, artifacts, namespaces, and preferences.

  4. UI Customization

    • A way to customize the display of the CDAP UI by enabling or disabling certain features.

New Features

  • Added support for Date/Time in Preparation. Also added a new directive, parse-timestamp, to convert Unix timestamps in long or string form to Timestamp objects. (CDAP-14244)

  • Added Date, Time, and Timestamp support in plugins (Wrangler, Google Cloud BigQuery, Google Cloud Spanner, Database). (CDAP-14245)

  • Added Date, Time, and Timestamp support in CDAP Schema. (CDAP-14021)

  • Added Date, Time, and Timestamp support in UI. (CDAP-14028)

  • Added Google Cloud Spanner source and sink plugins in Pipeline and Google Cloud Spanner connection in Preparation. (CDAP-14053)

  • Added Google Cloud PubSub realtime source. (CDAP-14185)

  • Added a new user onboarding tour to CDAP. (CDAP-14088)

  • Added the ability to customize UI through theme. (CDAP-13990)

  • Added a framework that can be used to bootstrap a CDAP instance. (CDAP-14022)

  • Added the ability to configure system-wide provisioner properties that can be set by admins but not by users. (CDAP-13746)

  • Added the capability for plugins to specify requirements, and the ability to filter available plugins based on those requirements. (CDAP-13924)

  • Added REST endpoints to query the run counts of a program (see the example after this list). (CDAP-13975)

  • Added a REST endpoint to get the latest run record of multiple programs in a single call. (CDAP-14260)

  • Added support for Apache Spark 2.3. (CDAP-13653)
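
As an illustration of the new run-count endpoint mentioned above, a request might look like the following; the exact path is an assumption based on CDAP's v3 REST conventions rather than something these notes spell out:

    GET /v3/namespaces/<namespace-id>/apps/<app-id>/workflows/<program-id>/runcount

The response contains the number of runs of the given program.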

Improvements

  • Improved runtime monitoring (which fetches program states, metadata and logs) of remotely launched programs from the CDAP Master by using dynamic port forwarding instead of HTTPS for communication. (CDAP-13566)

  • Removed duplicate classes to reduce the size of the sandbox by a couple hundred megabytes. (CDAP-13977)

  • Added cdap-env.sh to allow configuring JVM options while launching the Sandbox (see the sketch after this list). (CDAP-14461)

  • Added support for bidirectional Field Level Lineage. (CDAP-14003)

  • Added the capability for external datasets to record their schema. (CDAP-14013)

  • The Dataproc provisioner will try to pick up the project id and credentials from the environment if they are not specified. (CDAP-14091)

  • The Dataproc provisioner will use internal IP addresses when CDAP is in the same network as the Dataproc cluster. (CDAP-14104)

  • Added the capability to always display the current dataset schema in Field Level Lineage. (CDAP-14168)

  • Improved error handling in Preparation. (CDAP-13886)

  • Added a FileSink batch sink, FileMove action, and FileDelete action to replace their HDFS counterparts. (CDAP-14023)

  • Added a configurable JVM option to kill the CDAP process immediately on the sandbox when an OutOfMemory error occurs. (CDAP-14097)

  • Added better trace logging for dataset service. (CDAP-14135)

  • Made the Google Cloud Storage, Google Cloud BigQuery, and Google Cloud Spanner connection properties (project id, service account keyfile path, temporary GCS bucket) optional. (CDAP-14386)

  • Google Cloud PubSub sink will try to create the topic if it does not exist while preparing for the run. (CDAP-14401)

  • Added csv, tsv, delimited, json, and blob as formats to the S3 source and sink. (CDAP-14475)

  • Added csv, tsv, delimited, json, and blob as formats to the File source. (CDAP-14321)

  • Added a button on external sources and sinks to jump to the dataset detail page. (CDAP-9048)

  • Added format and suppress query params to the program logs endpoint to match the program run logs endpoint. (CDAP-14040)

  • Made all CDAP examples compatible with Spark 2. (CDAP-14132)

  • Added worker and master disk size properties to the Dataproc provisioner. (CDAP-14220)

  • Improved operational behavior of the dataset service. (CDAP-14298)

  • Fixed the Wrangler transform to make directives optional. If none are given, the transform is a no-op. (CDAP-14372)

  • Fixed Preparation to treat files without an extension as text files. (CDAP-14397)

  • Limited the number of files shown in the S3 and Google Cloud Storage browsers to 1000. (CDAP-14398)

  • Enhanced Google Cloud BigQuery sink to create dataset if the specified dataset does not exist. (CDAP-14482)

  • Increased log levels for the CDAP Sandbox so that only CDAP classes are at debug level. (CDAP-14489)
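
A minimal sketch of what a cdap-env.sh might contain, tying together the two JVM-related improvements above; the JAVA_OPTS variable name and the flag values here are illustrative assumptions, not confirmed by these notes:

    # cdap-env.sh -- sourced when launching the Sandbox to customize the JVM
    # ExitOnOutOfMemoryError makes the JVM exit immediately when an OutOfMemory error occurs
    export JAVA_OPTS="-Xmx4096m -XX:+ExitOnOutOfMemoryError"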

Bug Fixes

  • Fixed the 'distinct' plugin to use a dropdown for the list of fields and to have a button to get the output schema. (CDAP-14468)

  • Ensured that destroy() is always called for MapReduce, even if initialize() fails. (CDAP-7444)

  • Fixed a bug where the Alert Publisher would not work if there was a space in the label. (CDAP-13008)

  • Fixed a bug that caused Preparation to fail while parsing avro files. (CDAP-13230)

  • Fixed a misleading error message about HBase classes in cloud runtimes. (CDAP-13878)

  • Fixed a bug where the metric for failed profile program runs was not getting incremented when the run failed due to provisioning errors. (CDAP-13887)

  • Fixed a bug where querying metrics by time series would return incorrect results after a certain amount of time. (CDAP-13894)

  • Fixed a bug where profile metrics were incorrect if an app was deleted. (CDAP-13959)

  • Fixed a deprovisioning bug that occurred when cluster creation failed. (CDAP-13965)

  • Fixed an error where TMS publishing was retried indefinitely if the first attempt failed. (CDAP-13988)

  • Fixed a race condition in MapReduce that can cause a deadlock. (CDAP-14076)

  • Fixed a resource leak in the preview feature. (CDAP-14098)

  • Fixed a bug that would cause RDD versions of the dynamic Scala Spark plugins to fail. (CDAP-14107)

  • Fixed a bug where profiles were getting applied to all program types instead of only workflows. (CDAP-14154)

  • Fixed a race condition by ensuring that a program is started before starting runtime monitoring for it. (CDAP-14203)

  • Fixed the run count for pipelines in the UI to show the correct number instead of capping at 100. (CDAP-14211)

  • Fixed an issue where Dataproc client was not being closed, resulting in verbose error logs. (CDAP-14223)

  • Fixed a bug that could cause the provisioning state of stopped program runs to be corrupted. (CDAP-14261)

  • Fixed a bug that caused Preparation to be unable to list buckets in a Google Cloud Storage connection in certain environments. (CDAP-14271)

  • Fixed a bug where the Dataproc provisioner was not able to provision a single-node cluster. (CDAP-14303)

  • Fixed a bug where Preparation could not read json or xml files on Google Cloud Storage. (CDAP-14390)

  • Fixed the Dataproc provisioner to use full API access scopes so that Google Cloud Spanner and Google Cloud PubSub are accessible by default. (CDAP-14395)

  • Fixed a bug where profile metrics were not deleted when a profile was deleted. (CDAP-14435)

Deprecated and Removed Features

  • Removed the old and buggy dynamic Spark plugins. (CDAP-14108)

  • Dropped support for MapR 4.1. (CDAP-14456)

@prinam released this Jul 31, 2018 · 10 commits to release/5.0 since this release

Summary

  1. Cloud Runtime

    • Cloud Runtimes allow you to configure batch pipelines to run in a cloud environment.
      - Before the pipeline runs, a cluster is provisioned in the cloud. The pipeline is executed on that cluster, and the cluster is deleted after the run finishes.
      - Cloud Runtimes allow you to only use compute resources when you need them, enabling you to make better use of your resources.

  2. Metadata

    • Metadata Driven Processing
      - Annotate metadata to custom entities such as fields in a dataset, partitions of a dataset, files in a fileset
      - Access metadata from a program or plugin at runtime to facilitate metadata driven processing
    • Field Level Lineage
      - APIs to register operations being performed on fields from a program or a pipeline plugin
      - Platform feature to compute field level lineage based on operations

  3. Analytics

    • A simple, interactive, UI-driven approach to machine learning.
      - Lowers the bar for machine learning, allowing users of any level to understand their data and train models while preserving the switches and levers that advanced users might want to tweak.

  4. Operational Dashboard

    • A real-time interactive interface that visualizes program run statistics
    • Reporting for comprehensive insights into program runs over large periods of time

New Features

Cloud Runtime
........................

  • Added Cloud Runtimes, which allow users to assign profiles to batch pipelines that control what environment the pipeline will run in. For each program run, a cluster in a cloud environment can be created for just that run, allowing efficient use of resources. (CDAP-13089)

  • Added a way for users to create compute profiles from the UI to run programs in remote (cloud) environments using one of the available provisioners. (CDAP-13213)

  • Allowed users to specify a compute profile in the UI to run pipelines in cloud environments. Compute profiles can be specified while running a pipeline manually, via a time schedule, or via a pipeline state based trigger. (CDAP-13206)

  • Added a provisioner that allows users to run pipelines on Google Cloud Dataproc clusters. (CDAP-13094)

  • Added a provisioner that can run pipelines on remote Apache Hadoop clusters (CDAP-13774)

  • Added an Amazon Elastic MapReduce provisioner that can run pipelines on AWS EMR. (CDAP-13709)

  • Added support for viewing logs in CDAP for programs executing using the Cloud Runtime. (CDAP-13380)

  • Added metadata, such as the pipelines, schedules, and triggers associated with a profile. Also added metrics, such as the total number of runs of a pipeline using a profile. (CDAP-13432)

  • Added the ability to disable and enable a profile (CDAP-13494)

  • Added the capability to export or import compute profiles (CDAP-13276)

  • Added the ability to set the default profile at namespace and instance levels. (CDAP-13359)

Metadata
................

  • Added support for annotating metadata to custom entities. For example, a field in a dataset can now be annotated with metadata. (CDAP-13260)

  • Added programmatic APIs for users to register field level operations from programs and plugins. (CDAP-13264)

  • Added REST APIs to retrieve the fields which were updated for a given dataset in a given time range, a summary of how those fields were computed, and details about the operations responsible for updating those fields. (CDAP-13269)

  • Added the ability to view Field Level Lineage for datasets. (CDAP-13511)

Analytics
...............

  • Added CDAP Analytics, an interactive, UI-driven application that allows users to train machine learning models and use them in their pipelines to make predictions. (CDAP-13921)

Operational Dashboard
......................................

  • Added a Dashboard for real-time monitoring of programs and pipelines (CDAP-12865)

  • Added a UI to generate reports on programs and pipelines that ran over a period of time (CDAP-12901)

  • Added a feature to support Reports and Dashboard. Dashboard provides the realtime status of program runs and future schedules. Reports is a tool for administrators to take a historical look at their applications' program runs, statistics, and performance (CDAP-13147)

Other New Features
.................................

Data Pipelines
^^^^^^^^^^^^^^

  • Added 'Error' and 'Alert' ports for plugins that support this functionality. To enable it, in addition to emitting alerts and errors from the plugin code, users have to set "emit-errors: true" and "emit-alerts: true" in their plugin json (see the sketch after this list). Users can create connections from the 'Error' port to Error Handler plugins, and from the 'Alert' port to Alert plugins (CDAP-12839)

  • Added support for Apache Phoenix as a source in Data Pipelines. (CDAP-13045)

  • Added support for Apache Phoenix database as a sink in Data Pipelines. (CDAP-13499)

  • Added the ability to support macro behavior for all widget types (CDAP-12944)

  • Added the ability to view all the concurrent runs of a pipeline (CDAP-13057)

  • Added the ability to view the runtime arguments, logs and other details of a particular run of a pipeline. (CDAP-13006)

  • Added UI support for Splitter plugins (CDAP-13242)
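
A minimal sketch of the plugin json described in the 'Error' and 'Alert' ports item above; only the emit-errors and emit-alerts keys come from these notes, and the surrounding structure is illustrative:

    {
      "metadata": { "spec-version": "1.5" },
      "emit-errors": true,
      "emit-alerts": true,
      "configuration-groups": [ ... ]
    }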

Data Preparation
^^^^^^^^^^^^^^^^

  • Added a Google BigQuery connection for Data Preparation (CDAP-13100)

  • Added a point-and-click interaction to change the data type of a column in the Data Preparation UI (CDAP-12880)

Miscellaneous
^^^^^^^^^^^^^

  • Added a page to view and manage a namespace. Users can click on the current namespace card in the namespace dropdown to go to the namespace's detail page. On this page, they can see entities and profiles created in this namespace, as well as preferences, mapping, and security configurations for this namespace. (CDAP-13180)

  • Added the ability to restart CDAP programs to make it resilient to YARN outages. (CDAP-12951)

  • Implemented a new Administration page, with two tabs, Configuration and Management. In the Configuration tab, users can view and manage all namespaces, system preferences and system profiles. In the Management tab, users can get an overview of system services in CDAP and scale them. (CDAP-13242)

Improvements

  • Added Spark 2 support for Kafka realtime source (CDAP-13280)

  • Added support for CDH 5.13 and 5.14. (CDAP-12727)

  • Added support for EMR 5.4 through 5.7 (CDAP-11805)

  • Upgraded CDAP Router to use Netty 4.1 (CDAP-6308)

  • Added support for automatically restarting long-running program types (Service and Flow) upon application master process failure in YARN (CDAP-13179)

  • Added support for specifying custom consumer configs in Kafka source (CDAP-12549)

  • Added support for specifying recursive schemas (see the example after this list) (CDAP-13143)

  • Added support for passing the YARN application ID in the logging context. This can help in correlating the ID of the program run in CDAP to the ID of the corresponding YARN application, thereby facilitating better debugging. (CDAP-12275)

  • Added the ability to deploy plugin artifacts without requiring a parent artifact. Such plugins are available for use in any parent artifacts (CDAP-9080)

  • Added the ability to import pipelines from the add entity modal (plus button) (CDAP-12274)

  • Added the ability to save the runtime arguments of a pipeline as preferences, so that they do not have to be entered again. (CDAP-11844)

  • Added the ability to specify dependencies to ScalaSparkCompute Action (CDAP-12724)

  • Added the ability to update the keytab URI for namespace's impersonation configuration. (CDAP-12426)

  • Added the ability to upload a User Defined Directive (UDD) using the plus button (CDAP-12279)

  • Allowed CDAP user programs to talk to Kerberos-enabled HiveServer2 in the cluster without using a keytab (CDAP-12963)

  • Allowed users to configure the transaction isolation level in database plugins (CDAP-11096)

  • Configured sandbox to have secure store APIs enabled by default (CDAP-13573)

  • Improved robustness of unit test framework by fixing flaky tests (CDAP-13411)

  • Increased the default Twill reserved memory from 300 MB to 768 MB in order to prevent YARN from killing containers in standard cluster setups. (CDAP-13405)

  • Macro-enabled all fields in the HTTP Callback plugin (CDAP-13116)

  • Removed concurrent upgrades of HBase coprocessors since they could lead to regions getting stuck in transition. (CDAP-12974)

  • Updated the CDAP sandbox to use Spark 2.1.0 as the default Spark version. (CDAP-13409)

  • Improved the documentation for defining Apache Ranger policies for CDAP entities (CDAP-13157)

  • Improved resiliency of router to zookeeper outages. (CDAP-12992)

  • Improved the performance of metadata upgrade by adding a dataset cache. (CDAP-13756)

  • Added CLI command to fetch service logs (CDAP-7644)

  • Added rate limiting to router logs in the event of zookeeper outages (CDAP-12989)

  • Renamed the system metadata tables to v2.system.metadata_index.d and v2.system.metadata_index.i, and the business metadata tables to v2.business.metadata_index.d and v2.business.metadata_index.i (CDAP-13759)

  • Reduced CDAP Master's local storage usage by deleting temporary directories created for programs as soon as programs are launched on the cluster. (CDAP-6032)
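
As an example of a recursive schema, assuming CDAP's Avro-style JSON schema format, a record can refer to itself by name (the Node record below is hypothetical):

    {
      "type": "record",
      "name": "Node",
      "fields": [
        { "name": "value", "type": "string" },
        { "name": "children", "type": { "type": "array", "items": "Node" } }
      ]
    }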

Bug Fixes

  • Fixed a bug in TMS that prevented correctly consuming multiple events emitted in the same transaction. (CDAP-13033)

  • Fixed a bug that caused errors in the File source if it read parquet files that were not generated through Hadoop. (CDAP-12875)

  • Fixed a bug that caused PySpark to fail to run with Spark 2 in local sandbox. (CDAP-12693)

  • Fixed a bug that could cause the status of a running program to be falsely returned as stopped if the run happened to change state in the middle of calculating the program state. Also fixed a bug where the state of a suspended workflow was reported as stopped instead of running. (CDAP-13296)

  • Fixed a bug that prevented MapReduce AM logs in YARN from showing the right URI. (CDAP-7052)

  • Fixed a bug that prevented Spark jobs from running after CDAP upgrade due to caching of jars. (CDAP-12973)

  • Fixed a bug that prevented a parquet snapshot source and sink from being used in the same pipeline (CDAP-13026)

  • Fixed a bug where, under a race condition, running a pipeline preview could cause the CDAP process to shut down. (CDAP-13593)

  • Fixed a bug where a Spark program would fail to run when Spark authentication was turned on (CDAP-12752)

  • Fixed a bug where an ad-hoc exploration query on streams would fail in an impersonated namespace. (CDAP-13123)

  • Fixed a bug where pipelines with conditions on different branches could not be deployed. (CDAP-13463)

  • Fixed a bug where the Scala Spark compiler was missing classes from the classloader, causing compilation failures (CDAP-12743)

  • Fixed a bug where the upgrade tool did not upgrade the owner meta table (CDAP-13372)

  • Fixed a bug where the artifact count for a namespace also included system artifacts, causing the total artifact count to be much larger than the real count. (CDAP-12647)

  • Fixed a class loading issue and a schema mismatch issue in the whole-file-ingest plugin. (CDAP-13364)

  • Fixed a dependency bug that could cause HBase region servers to deadlock during a cold start (CDAP-12970)

  • Fixed an issue that caused pipeline failures if a Spark plugin tried to read or write a DataFrame using csv format. (CDAP-12742)

  • Fixed an issue that prevented user runtime arguments from being used in CDAP programs (CDAP-13532)

  • Fixed an issue where Spark 2.2 batch pipelines with HDFS sinks would fail with a delegation token error (CDAP-13281)

  • Fixed an issue that caused the HBase sink to fail when used alongside other sinks with the Spark execution engine. (CDAP-12731)

  • Fixed an issue with the retrieval of non-ASCII strings from Table datasets. (CDAP-13002)

  • Fixed avro fileset plugins so that reserved hive keywords can be used as column names (CDAP-13040)

  • Fixed macro enabled properties in plugin configuration to only have macro behavior if the entire value is a macro. (CDAP-13331)

  • Fixed the logs REST API to return a valid json object when filters are specified (CDAP-12988)

  • Fixed an issue where a dataset's class loader was closed before the dataset itself, preventing the dataset from closing properly. (CDAP-13110)

Deprecated and Removed Features

  • Deprecated the aggregation of metadata from all the entities (application, programs, datasets, streams) associated with a program run. From this release onwards, metadata for program runs behaves like any other entity: metadata can be annotated directly on a run and retrieved from it. For backward compatibility, the old aggregating behavior remains the default; to get the new behavior, set the additional query parameter 'runAggregation' to false when making the REST call to retrieve the metadata of program runs (see the example after this list). (CDAP-13721)

  • Dropped support for CDH 5.1, 5.2, 5.3 and HDP 2.0, 2.1 due to security vulnerabilities identified in them (CDAP-8141)

  • Removed HDFS, YARN, and HBase operational stats. These stats were not very useful, could generate confusing log warnings, and were confusing when used in conjunction with cloud profiles. (CDAP-13493)

  • Removed analytics plugins such as decision tree, naive bayes and logistic regression from Hub. The new Analytics flow in the UI should be used as a substitute for this functionality. (CDAP-13720)

  • Removed deprecated cdap sdk commands. Use cdap sandbox commands instead. (CDAP-12584)

  • Removed deprecated cdap.sh and cdap-cli.sh scripts. Use cdap sandbox or cdap cli instead. (CDAP-13680)

  • Removed deprecated error datasets from pipelines. Error transforms should be used instead of error datasets, as they offer more functionality and flexibility. (CDAP-11870)

  • Deprecated HDFS Sink. Use the File sink instead. (CDAP-13353)

  • Removed deprecated stream size based schedules (CDAP-12692)

  • Deprecated streams and flows. Use Apache Kafka as a replacement technology for streams and Spark Streaming as a replacement technology for flows. Streams and flows will be removed in the 6.0 release. (CDAP-13419)

  • Removed multiple deprecated programmatic and RESTful APIs in CDAP. (CDAP-5966)
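
An illustrative request for the 'runAggregation' parameter mentioned above; the endpoint path is an assumption based on CDAP's v3 REST conventions, not confirmed by these notes:

    GET /v3/namespaces/<namespace-id>/apps/<app-id>/workflows/<program-id>/runs/<run-id>/metadata?runAggregation=false

With runAggregation set to false, only the metadata annotated directly on the program run is returned, rather than an aggregation across all entities associated with the run.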

Known Issues

  • Updating the compute profile used to manually run a pipeline via the UI can remove the pipeline's existing schedules and triggers. (CDAP-13853)

  • The reports feature currently does not work with Apache Spark 2.0. As a workaround, upgrade to Spark 2.1 or later. (CDAP-13919)

  • Plugins that are not supported while running a pipeline using a cloud runtime throw unclear error messages at runtime. (CDAP-13896)

  • While some built-in plugins have been updated to emit operations for capturing field level lineage, a number of them do not yet emit these operations. (CDAP-13274)

  • Pipelines cannot propagate dynamic schemas at runtime. (CDAP-13326)

  • Reading metadata is not supported when pipelines or programs run using a cloud runtime. (CDAP-13963)

  • Creating a pipeline from Data Preparation when using an Apache Kafka plugin fails. As a workaround, after clicking the Create Pipeline button, manually update the schema of the Kafka plugin to set a single field named body as a non-nullable string. (CDAP-13971)

  • Metadata for custom entities is not deleted if its nearest known ancestor entity (parent) is deleted. (CDAP-13910)

@prinam released this Mar 30, 2018 · 2131 commits to develop since this release

Improvements

  • Macro-enabled all fields in the HTTP Callback plugin (CDAP-13116)

  • Optimized the planner to reduce the amount of temporary data required in certain types of MapReduce pipelines. (CDAP-13119)

  • Minor optimization to reduce the number of mappers used to read intermediate data in MapReduce pipelines (CDAP-13122)

  • Improved the schema generation for database sources. (CDAP-13139)

  • Added automatic restart of long-running program types (Service and Flow) upon application master process failure in YARN (CDAP-13179)

Bug Fixes

  • Fixed a bug that caused errors in the File source if it read parquet files that were not generated through Hadoop. (CDAP-12875)

  • Fixed an issue where a dataset's class loader was closed before the dataset itself, preventing the dataset from closing properly. (CDAP-13110)

  • Fixed a bug that caused directories to be left around if a workflow used a partitioned fileset as a local dataset (CDAP-13120)

  • Fixed a bug that caused Hive Explore queries on Streams to fail. (CDAP-13123)

  • Fixed a planner bug to ensure that sinks are never placed in two different MapReduce phases in the same pipeline. (CDAP-13129)

  • Fixed a race condition that could lead to workflow failure when running multiple Spark programs concurrently at a Workflow fork (CDAP-13158)

  • Fixed an issue with creating a namespace if the namespace principal is not a member of the namespace home's group. (CDAP-13171)

  • Fixed a bug that caused completed run records to be missed when storing run state, resulting in misleading log messages about ignoring killed states. (CDAP-13191)

  • Fixed a bug in FileBatchSource that prevented the ignoreFolders property from working with avro and parquet inputs (CDAP-13192)

  • Fixed an issue where inconsistencies in the schedulestore caused the scheduler service to keep exiting. (CDAP-13205)

  • Fixed an issue that would cause changes in program state to be ignored if the program no longer existed, resulting in the run record corrector repeatedly failing to correct run records (CDAP-13217)

  • Fixed the state of Workflow, MapReduce, and Spark programs to be correctly reflected as KILLED when a user explicitly terminates the running program (CDAP-13218)

  • Fixed directive syntax in point-and-click interactions for some date formats (CDAP-13223)

@prinam released this Jan 12, 2018 · 2211 commits to develop since this release

Improvements

  • GroupBy aggregator plugin fields are now macro-enabled. (CDAP-12942)
  • Allowed CDAP user programs to talk to Kerberos-enabled HiveServer2 in the cluster without using a keytab. (CDAP-12963)
  • Removed concurrent upgrades of HBase coprocessors since they could lead to regions getting stuck in transition. (CDAP-12974)

Bug Fixes

  • Fixed a bug that prevented MapReduce AM logs in YARN from showing the right URI. (CDAP-7052)
  • Added CLI command to fetch service logs. (CDAP-7644)
  • Increased the dataset changeset size and limit to the integer maximum by default. (CDAP-12774)
  • Fixed a bug where macro for output schema of a node was not saved when the user closed the node properties modal. (CDAP-12900)
  • Fixed a bug where explore queries would fail against paths in HDFS encryption zones, for certain Hadoop distributions. (CDAP-12930)
  • Fixed a bug where the old connection was not removed from the pipeline config when moving the connection's pointer to another node. (CDAP-12945)
  • Fixed a bug in the pipeline planner where pipelines that used an action before multiple sources would either fail to deploy or deploy with an incorrect plan. (CDAP-12946)
  • Fixed a dependency bug that could cause HBase region servers to deadlock during a cold start. (CDAP-12970)
  • Fixed an issue with the retrieval of non-ASCII strings from Table datasets. (CDAP-13002)
  • Messaging table coprocessor now gets upgraded when the underlying HBase version is changed without any change in the CDAP version. (CDAP-13021)
  • Fixed a bug that prevented a parquet snapshot source and sink from being used in the same pipeline. (CDAP-13026)
  • Fixed a bug in TMS that prevented correctly consuming multiple events emitted in the same transaction. (CDAP-13033)
  • Made TransactionContext resilient against getTransactionAwareName() failures. (CDAP-13037)
  • Fixed avro fileset plugins so that reserved hive keywords can be used as column names. (CDAP-13040)