
Conversation

@yangsongbai

What is the purpose of the change

(For example: This pull request makes task deployment go through the blob server, rather than through RPC. That way we avoid re-transferring them on each deployment (during recovery).)

Brief change log

(for example:)

  • The TaskInfo is stored in the blob store at job creation time as a persistent artifact
  • Deployment RPC transmits only the blob storage reference
  • TaskManagers retrieve the TaskInfo from the blob cache (see the sketch below)
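
A minimal sketch of this example scenario, with entirely hypothetical names (Flink's real blob-server API is different):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of the example above: upload once, ship only a small key over RPC,
 * resolve the key on the receiving side. All names here are invented and do
 * not reflect Flink's actual blob-server API.
 */
final class BlobDeploymentSketch {
    private final Map<String, byte[]> blobStore = new ConcurrentHashMap<>();

    /** At job creation time: persist the serialized TaskInfo once. */
    String uploadTaskInfo(byte[] serializedTaskInfo) {
        String key = UUID.randomUUID().toString();
        blobStore.put(key, serializedTaskInfo);
        return key;
    }

    /** On each (re)deployment: the RPC message carries only this key. */
    byte[] fetchTaskInfo(String key) {
        // A real implementation would check a local blob cache first and
        // download from the blob server only on a cache miss.
        return blobStore.get(key);
    }
}
```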

Verifying this change

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

  • Added integration tests for end-to-end deployment with large payloads (100MB)
  • Extended integration test for recovery after master (JobManager) failure
  • Added test that validates that TaskInfo is transferred only once across recoveries
  • Manually verified the change by running a 4-node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

azagrebin and others added 30 commits November 16, 2018 21:14
…tests

This avoids issues with eventual consistency/visibility on S3.
In order to not leak secrets we should not print the configuration in docker-entrypoint.sh.
Wait until all Executions reach the state DEPLOYING instead of having a resource assigned.
…onfigOptions from documentation

The annotation Documentation.ExcludeFromDocumentation can be used to annotate ConfigOptions
that should not be included in the documentation.
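
A hedged usage sketch of the annotation; the option below is invented for illustration, and the exact builder API may differ by Flink version:

```java
import org.apache.flink.annotation.docs.Documentation;
import org.apache.flink.configuration.ConfigOption;
import org.apache.flink.configuration.ConfigOptions;

public class InternalOptions {
    // Hypothetical option; the annotation keeps it out of the generated docs.
    @Documentation.ExcludeFromDocumentation
    public static final ConfigOption<String> HIDDEN_OPTION =
        ConfigOptions.key("internal.hidden-option").noDefaultValue();
}
```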
…rom documentation

This commit excludes the JobManagerOptions#EXECUTION_FAILOVER_STRATEGY from Flink's
configuration documentation.
This commit removes the Scala-based Kafka010Example from the flink-end-to-end-tests/
flink-streaming-kafka010-test module. Moreover, it adds relative paths to the
parent poms and cleans up the flink-streaming-kafka*/pom.xml files.
…link-connector-kafka

In order to satisfy dependency convergence we need to exclude the kafka-clients from the base
flink-connector-kafka-x dependency in every flink-connector-kafka-y module.
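
A hedged pom.xml sketch of such an exclusion; the artifact id and version below are illustrative:

```xml
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-kafka-base_2.11</artifactId>
  <version>${project.version}</version>
  <exclusions>
    <!-- Exclude the transitively pulled-in client so that this module's own
         kafka-clients version wins and dependency convergence is satisfied. -->
    <exclusion>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```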
This commit fixes the wrong result type of interval joins in
the DataStream Scala API. The API did not call the Scala type
extraction stack but was using the default one offered by the
DataStream Java API.

This closes #7141.
Because state clean up happens in processing time, it might be
the case that retractions are arriving after the state has
been cleaned up. Before these changes, a new accumulator was
created and invalid retraction messages were emitted. This
change drops retraction messages for which no accumulator
exists. It also refactors the tests and adds more test
cases and explanations.

This closes #7147.
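
A minimal sketch of the dropping logic, using a plain map instead of Flink's keyed state and invented names throughout:

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch (not Flink's actual aggregation code) of dropping late retractions. */
public class RetractionDropSketch {
    private final Map<String, Long> accumulators = new HashMap<>();

    /** Processes a record; isAccumulate=false means a retraction message. */
    public void processElement(String key, long value, boolean isAccumulate) {
        Long acc = accumulators.get(key);
        if (acc == null) {
            if (!isAccumulate) {
                // State was cleaned up in processing time; a retraction that
                // arrives afterwards has nothing to retract, so drop it rather
                // than creating a new accumulator and emitting invalid results.
                return;
            }
            acc = 0L;
        }
        accumulators.put(key, isAccumulate ? acc + value : acc - value);
    }
}
```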
…actSet

Instead of including every dependency and then limiting the set of included
files via a filter condition of the maven-shade-plugin, this commit defines
an artifact set of included dependencies. That way we will properly include
all files belonging to the listed dependencies (e.g. also the NOTICE file).

This closes #7176.
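
A hedged maven-shade-plugin sketch of the artifactSet approach; the included coordinate is a placeholder:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <!-- List the dependencies to bundle explicitly, instead of including
             everything and filtering files out afterwards. This keeps all
             files of each listed artifact, e.g. its NOTICE file. -->
        <artifactSet>
          <includes>
            <include>com.example:bundled-dependency</include>
          </includes>
        </artifactSet>
      </configuration>
    </execution>
  </executions>
</plugin>
```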
Do not use the /tmp directory for log files or the HDFS data directory.

Reconfigure dfs.replication to 1 because file availability is irrelevant in
tests.

Increase heap size of HDFS DataNodes and NameNode.

Change the find-files! function to not fail if the directory does not exist.
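
For illustration, a hedged hdfs-site.xml fragment matching the replication change:

```xml
<!-- hdfs-site.xml (sketch): a single replica suffices for test clusters,
     since availability of the test data does not matter. -->
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
```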
dawidwys and others added 27 commits March 5, 2019 11:13
The test had not actually run since the class was refactored to use JUnit's Parameterized runner: it always ran into an NPE, which was then silently swallowed in a shutdown catch-block.
…nd MiniCluster

The io executor is responsible for running io operations like discarding checkpoints.
By using the io executor, we don't risk that the RpcService is blocked by blocking
io operations.

This closes #7926.
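
A minimal sketch of the pattern with a plain JDK executor; the real change wires Flink's io executor into the Dispatcher and MiniCluster:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sketch: offload blocking io so the RPC main thread stays responsive. */
final class IoExecutorSketch {
    // Dedicated pool for blocking io such as discarding checkpoint files.
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(4);

    CompletableFuture<Void> discardCheckpointAsync(Runnable discardAction) {
        // The potentially slow filesystem operation runs on the io pool, so
        // the RpcService can keep processing other messages meanwhile.
        return CompletableFuture.runAsync(discardAction, ioExecutor);
    }
}
```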
… CoGroupedStreams.UnionSerializer

UnionSerializer did not perform a proper duplication of inner serializers. It also violated the assumption that createInstance never produces null.
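
A hedged sketch of the duplication contract on an invented wrapper serializer (not the actual UnionSerializer): inner serializers must be deep-duplicated, and this may only be returned if all duplicates are identical to the originals:

```java
import org.apache.flink.api.common.typeutils.TypeSerializer;

/** Sketch of proper duplication for a serializer wrapping two inner ones. */
final class PairSerializerSketch<A, B> {
    private final TypeSerializer<A> first;
    private final TypeSerializer<B> second;

    PairSerializerSketch(TypeSerializer<A> first, TypeSerializer<B> second) {
        this.first = first;
        this.second = second;
    }

    PairSerializerSketch<A, B> duplicate() {
        TypeSerializer<A> firstDup = first.duplicate();
        TypeSerializer<B> secondDup = second.duplicate();
        // Share this instance only if both inner serializers are stateless,
        // i.e. their duplicate() returned the very same objects.
        if (firstDup == first && secondDup == second) {
            return this;
        }
        return new PairSerializerSketch<>(firstDup, secondDup);
    }
}
```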
The original intent was to never reach this code path except on programmer errors, but it has turned into an accepted code path for unhandled exceptions.
We've seen quite some flakiness in the end-to-end tests on Travis
lately. In most tests it takes about 5-8 seconds for the dispatcher to come
up, so 10 seconds might be too low.
…shipCall

Fix the race condition between executing EmbeddedLeaderService#GrantLeadershipCall
and a concurrent shutdown of the leader service by making GrantLeadershipCall not
access mutable state outside of a lock.

This closes #7937.
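
A minimal sketch of the locking discipline, with hypothetical names rather than the EmbeddedLeaderService code:

```java
/** Sketch: check the shutdown flag and mutate state under one lock. */
final class GrantLeadershipSketch {
    private final Object lock = new Object();
    private boolean running = true;
    private String currentLeaderId;

    void grantLeadership(String leaderId) {
        synchronized (lock) {
            // Reading 'running' and writing the leader id inside the same
            // lock closes the race with a concurrent shutdown().
            if (!running) {
                return;
            }
            currentLeaderId = leaderId;
        }
    }

    void shutdown() {
        synchronized (lock) {
            running = false;
            currentLeaderId = null;
        }
    }
}
```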
Due to changes how the slot futures are completed and due to the fact that the
ResultConjunctFuture does not maintain the order in which the futures were specified,
it could happen that executions were not deployed in topological order. This commit
fixes this problem by changing the ResultConjunctFuture so that it maintains the order
of the specified futures in its result collection.

This closes #8065.
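
A hedged sketch of an order-preserving conjunct future built on plain CompletableFuture; the names are invented:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

/** Sketch: results are collected by input position, not completion order. */
final class OrderPreservingConjunct {
    static <T> CompletableFuture<List<T>> combineAll(List<CompletableFuture<T>> futures) {
        CompletableFuture<Void> all =
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]));
        return all.thenApply(ignored -> {
            // Iterating the input list keeps the original (e.g. topological)
            // order in the result, whatever order the futures completed in.
            List<T> results = new ArrayList<>(futures.size());
            for (CompletableFuture<T> future : futures) {
                results.add(future.join());
            }
            return results;
        });
    }
}
```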
1. Rename the section title from `standalone` to `Flink session cluster on Mesos`.
2. Add one new section `Flink job cluster on Mesos` which describes the usage of mesos-appmaster-job.sh.

This closes #8084.
…ault

This commit updates the flink-conf.yaml to contain the new rest options and comments
out rest.port by default.
This commit changes YarnEntrypointUtils#loadConfiguration so that it only
sets RestOptions.BIND_PORT to 0 if it has not been specified. This allows
explicitly setting a port range for Yarn applications which run behind a
firewall, for example.
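
A hedged sketch of the set-only-if-absent logic; the surrounding class and method are invented, while Configuration and RestOptions.BIND_PORT are Flink's real types:

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.RestOptions;

final class BindPortDefaulting {
    static void applyDefault(Configuration configuration) {
        // Fall back to an ephemeral port only when the user has not set an
        // explicit port or port range (needed e.g. behind firewalls on Yarn).
        if (!configuration.contains(RestOptions.BIND_PORT)) {
            configuration.setString(RestOptions.BIND_PORT, "0");
        }
    }
}
```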

flinkbot commented Apr 6, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier
