[FLINK-29749][client] Make 'flink info' command could support dynamic properties. #21165

liuyongvs · 2022-10-26T12:32:46Z

What is the purpose of the change

Brief change log

(for example:)

The TaskInfo is stored in the blob store on job creation time as a persistent artifact
Deployments RPC transmits only the blob storage reference
TaskManagers retrieve the TaskInfo from the blob cache

Verifying this change

Please make sure both new and modified tests in this PR follow the conventions defined in our code quality guide: https://flink.apache.org/contributing/code-style-and-quality-common.html#testing

(Please pick either of the following options)

This change is a trivial rework / code cleanup without any test coverage.

(or)

This change is already covered by existing tests, such as (please describe tests).

(or)

This change added tests and can be verified as follows:

(example:)

Added integration tests for end-to-end deployment with large payloads (100MB)
Extended integration test for recovery after master (JobManager) failure
Added test that validates that TaskInfo is transferred only once across recoveries
Manually verified the change by running a 4 node cluster with 2 JobManagers and 4 TaskManagers, a stateful streaming program, and killing one JobManager and two TaskManagers during the execution, verifying that recovery happens correctly.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
The serializers: (yes / no / don't know)
The runtime per-record code paths (performance sensitive): (yes / no / don't know)
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: (yes / no / don't know)
The S3 file system connector: (yes / no / don't know)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

…binary/character string Revert the behaviour of binary to string casts and vice versa so that they no longer use hex encoding/decoding, but a simple UTF-8 byte transformation from a byte[] to a string and vice versa. (cherry picked from commit 4cdafff)
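A minimal Java sketch of the two cast directions this commit restores, assuming the cast is a plain UTF-8 byte transformation as described. `BinaryStringCastSketch` and its methods are illustrative names, not Flink's `CastRule` API; the hex helper is included only to show the encoding the commit moves away from.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not Flink's BinaryToStringCastRule / StringToBinaryCastRule.
final class BinaryStringCastSketch {

    // BINARY/VARBINARY -> STRING: interpret the raw bytes as UTF-8 text.
    static String binaryToString(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // STRING -> BINARY/VARBINARY: take the UTF-8 bytes of the string.
    static byte[] stringToBinary(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // The hex encoding the commit moves away from, shown only for contrast.
    static String hexEncode(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = stringToBinary("Flink");
        System.out.println(binaryToString(bytes)); // Flink
        System.out.println(hexEncode(bytes));      // 466c696e6b
    }
}
```

For valid UTF-8 input the round trip is lossless; byte arrays that are not valid UTF-8 are decoded with replacement characters by `new String(byte[], Charset)`.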

… `x'ab3234f0'` format Add an `isPrinting()` method to the `CastRule.Context`, set to true when used by `RowDataToStringConverterImpl`. The flag selects a different casting behaviour for `BinaryToStringCastRule` when printing binary columns, so that columns of binary type are output as (for example) `x'ab03f98e'`, which can easily be copy-pasted as a binary literal into another SQL query. (cherry picked from commit 75007f6) This closes apache#19516.
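A hedged sketch of the printing behaviour described above: when a hypothetical `isPrinting` flag (standing in for `CastRule.Context#isPrinting()`) is true, the bytes are rendered as a SQL binary literal; otherwise they fall back to the UTF-8 transformation. `PrintingCastSketch` is an illustrative name, not Flink code.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; not Flink's BinaryToStringCastRule.
final class PrintingCastSketch {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    // isPrinting stands in for CastRule.Context#isPrinting(); here it is a plain parameter.
    static String castBinaryToString(byte[] bytes, boolean isPrinting) {
        if (!isPrinting) {
            // Regular cast: plain UTF-8 byte transformation.
            return new String(bytes, StandardCharsets.UTF_8);
        }
        // Printing: render as a binary literal that can be pasted into SQL.
        StringBuilder sb = new StringBuilder(bytes.length * 2 + 3).append("x'");
        for (byte b : bytes) {
            int v = b & 0xff;         // treat the byte as unsigned
            sb.append(HEX[v >>> 4]);  // high nibble
            sb.append(HEX[v & 0x0f]); // low nibble
        }
        return sb.append('\'').toString();
    }

    public static void main(String[] args) {
        byte[] value = {(byte) 0xab, 0x03, (byte) 0xf9, (byte) 0x8e};
        System.out.println(castBinaryToString(value, true)); // x'ab03f98e'
    }
}
```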

…ointCommittableManagerImpl during deserialization. When we recover the `CheckpointCommittableManager` we were ignoring the subtaskId it is recovered on. This becomes a problem when a sink uses a post-commit topology because multiple committer operators might forward committable summaries coming from the same subtaskId. This ticket implements a fix to use the subtaskId already present in the CommittableCollectorSerializer when recreating CheckpointCommittableManagerImpl during recovery. Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>

…bleWithLinages with the CommittableSummary. Before this change, during recovery of the CommittableCollector we always initialized the SubtaskCommittableManager with the initial checkpoint id (1), but the CheckpointCommittable holding it with the checkpoint id from state. As a result, the emitted CommittableWithLineages could update SubtaskCommittableManagers they do not belong to, causing "Unknown subtask for <id>" failures.

… more than 1 committable. Recovering more than one committable causes an `IllegalStateException` and prevents the cluster from starting. When we recover the `CheckpointCommittableManager`, we deserialize SubtaskCommittableManager instances from the recovery state and put them into a `Map<Integer, SubtaskCommittableManager<CommT>>` keyed by the subtaskId of each recovered manager. This fails if more than one committable has to be recovered. The fix calls `SubtaskCommittableManager::merge` when a manager for the same subtaskId has already been deserialized. Signed-off-by: Krzysztof Chmielewski <krzysiek.chmielewski@gmail.com>
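The fix can be sketched with `Map.merge`, which combines a new value with the existing one instead of rejecting the duplicate key. `SubtaskManagerSketch` is a hypothetical stand-in for `SubtaskCommittableManager` that only collects committable names; Flink's real merge logic is not reproduced here. (For comparison, `Collectors.toMap` without a merge function throws `IllegalStateException` on duplicate keys, although the commit message does not say that is where the exception originated.)

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for SubtaskCommittableManager; not Flink's class.
final class SubtaskManagerSketch {
    final int subtaskId;
    final List<String> committables = new ArrayList<>();

    SubtaskManagerSketch(int subtaskId, List<String> committables) {
        this.subtaskId = subtaskId;
        this.committables.addAll(committables);
    }

    // Fold another manager for the same subtask into this one.
    SubtaskManagerSketch merge(SubtaskManagerSketch other) {
        if (other.subtaskId != subtaskId) {
            throw new IllegalArgumentException("subtaskId mismatch");
        }
        committables.addAll(other.committables);
        return this;
    }

    // Rebuild the per-subtask map from recovered managers, merging duplicates
    // instead of failing on a repeated subtaskId.
    static Map<Integer, SubtaskManagerSketch> restore(List<SubtaskManagerSketch> recovered) {
        Map<Integer, SubtaskManagerSketch> bySubtask = new HashMap<>();
        for (SubtaskManagerSketch m : recovered) {
            bySubtask.merge(m.subtaskId, m, SubtaskManagerSketch::merge);
        }
        return bySubtask;
    }

    public static void main(String[] args) {
        Map<Integer, SubtaskManagerSketch> restored = restore(List.of(
                new SubtaskManagerSketch(0, List.of("c1")),
                new SubtaskManagerSketch(0, List.of("c2")),
                new SubtaskManagerSketch(1, List.of("c3"))));
        System.out.println(restored.get(0).committables); // [c1, c2]
    }
}
```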

zentol and others added 30 commits April 14, 2022 16:36

[hotfix][tests] Allow retrieval of termination future for running jobs

5f4178b

[hotfix][tests] Wait for JobManagerRunner termination

df874ad

Need to make sure the job manager runner is complete, because the test runner does not implement required methods to query job details.

[FLINK-27140][coordination] Write job result in ioExecutor

2d5bebb

[FLINK-27231][licence] Fix the SQL Pulsar licence issue

9d996eb

[FLINK-27230][licence] Remove the unused licence entries from Kinesis connector

1ebfe85

[FLINK-27233][licence] Remove the unused licence entries from Elasticsearch7 connector

15d409b

[FLINK-27222][coordination] Decouple last (al)location from execution history

4727a20

[FLINK-25716][docs-zh] Translate "Streaming Concepts" page of "Application Development > Table API & SQL" to Chinese.

c1ff584

This closes apache#19401 Co-authored-by: Roc Marshal <flinker@126.com>

[FLINK-27272][table-planner] Fix the incorrect plan for query with local sort is incorrect if adaptive batch scheduler is enabled

1556c3c

(cherry picked from commit 456ceb2) This closes apache#19497

[FLINK-25867][docs-zh] translate ChangelogBackend documentation to Chinese

0965060

[hotfix][docs] fix anchor mistake in changelog monitoring

2606b25

[hotfix][docs-zh] fix missing link tag in State Backends document

40108d0

[hotfix][docs-zh] fix title level in State Backends document

98c3817

[FLINK-27229][cassandra][build] Remove test netty dependency

b62a39e

[FLINK-27263][table] Rename the metadata column to the user specified name in DDL

5e9ccf1

This closes apache#19490.

[FLINK-25694][Filesystem][S3] Upgrade Presto to resolve GSON/Alluxio Vulnerability.

229c5f0

This closes apache#19479 Signed-off-by: David N Perkins <David.N.Perkins@ibm.com>

[FLINK-27319] Duplicated '-t' option for savepoint format and deployment target

bacce4e

[FLINK-27287][tests] Migrate tests to MiniClusterResource

141671b

[FLINK-27287][tests] Use random ports

4fad121

[FLINK-27315][docs] Fix the demo of MemoryStateBackendMigration

eb0f5c3

[FLINK-27247][table-planner] ScalarOperatorGens.numericCasting is not compatible with legacy behavior

b9727d5

This closes apache#19470.

[hotfix][docs] Misprint in types.md

88d440e

[examples][python] Add examples on how to use json/csv/avro formats in Python DataStream API

fa8fbf5

[FLINK-22984][python] Don't pushdown Calc containing Python UDFs into table source

703b10c

This closes apache#19551.

[FLINK-27218] fix the problem that the internal Serializer in OperatorState may have not been updated when schema changes

3d4b3a4

[FLINK-27369][table-planner] Fix the type mismatch error in RemoveUnreachableCoalesceArgumentsRule

a75cef3

(cherry picked from commit 2e632af) This closes apache#19566

[FLINK-24491][runtime] Make the job termination wait until the archiving of ExecutionGraphInfo finishes

12db772

[FLINK-27367][table-planner] Support cast int to date in legacy cast behavior

3b790d1

This closes apache#19574

dawidwys and others added 28 commits October 6, 2022 17:07

[FLINK-29500] InitializeOnMaster uses wrong parallelism with AdaptiveScheduler

74bc6d2

We change the signature of JobVertex#initializeOnMaster/finalizeOnMaster to pass a Context object. In this context we can pass the actual parallelism the vertex will be run with.
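A hypothetical sketch of the signature change: instead of relying on a parallelism fixed when the job graph was built, `initializeOnMaster` receives a context object carrying the parallelism the scheduler actually chose. `VertexInitContext` and `Vertex` are illustrative names; Flink's real context interfaces may expose different methods.

```java
// Hypothetical sketch; not Flink's JobVertex API.
final class InitializeOnMasterSketch {

    // Hypothetical context object passed in by the scheduler.
    interface VertexInitContext {
        int executionParallelism();
    }

    static final class Vertex {
        private final int configuredParallelism;
        private int initializedWith = -1;

        Vertex(int configuredParallelism) {
            this.configuredParallelism = configuredParallelism;
        }

        // Uses the parallelism the vertex will actually run with,
        // not the one configured up front.
        void initializeOnMaster(VertexInitContext context) {
            initializedWith = context.executionParallelism();
        }

        int initializedWith() { return initializedWith; }

        int configuredParallelism() { return configuredParallelism; }
    }

    public static void main(String[] args) {
        Vertex vertex = new Vertex(128);
        // e.g. an adaptive scheduler decides to run the vertex with 4 subtasks
        vertex.initializeOnMaster(() -> 4);
        System.out.println(vertex.initializedWith()); // 4
    }
}
```

Passing a context object rather than adding a bare `int` parameter leaves room to add further fields later without changing the method signature again.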

[hotfix] Fix broken link

ea4f9aa

[FLINK-29483][python] Fix vectorized python scalar function failed in thread mode

5d748f3

[FLINK-29395][Connector/Kinesis] Handle empty deaggregated records in FanOutRecordPublisher

ae20e52

[FLINK-29504][rest] Add schema to jar upload content

383d451

[FLINK-29503][rest] Add backpressureLevel field

5ad5618

Enum fields have naming restrictions in some languages.

[FLINK-22243] Remove adaptive scheduler Web UI limitation from docs

d5921dd

[FLINK-29495][Connector/Pulsar] Refactor Pulsar tests from JUnit4 annotation to JUnit5 annotation for disabling tests on Java 11

aa78e3f

[FLINK-27384][hive] Fix the modified partitions are missed in temporal table with create-time mode

bcff5af

This closes apache#20437.

[FLINK-26726][hive] Hive enumerators do not assign splits to unregistered (failed) readers

a826fe8

This closes apache#21038

[FLINK-29477][python] Fix ClassCastException when collect primitive array

507b93e

This closes apache#21016.

[FLINK-29645] BatchExecutionKeyedStateBackend is using incorrect ExecutionConfig when creating serializer

f19f032

[FLINK-29658][python] Fix the missing LocalTime support when converting Table to DataStream

5e4da2c

This closes apache#21086.

[FLINK-25554][test] Remove timeout to enable a thread dump in case the test times out again.

8181abe

Add jaxb-api back to pulsar-client-all dependencies. (apache#21092)

962b6e0

Co-authored-by: Yufan Sheng <yufan@streamnative.io>

[BP-1.15][FLINK-29613][Connector/Pulsar] Fix wrong batch size assertion (apache#21100)

f44ffe5

This cherry-picks apache#21069. Co-authored-by: Yufan Sheng <yufan@streamnative.io>

[FLINK-29468][connectors][filesystems][formats] Update Jackson-BOM to 2.13.4.

40b3aab

[hotfix] Extract Jackson BOM version into a property

7712d4b

[FLINK-29638][connectors][filesystems][formats] Update Jackson-BOM to 2.13.4.2

c500e97

[FLINK-29479][python] Fix system env cause conflict with users python dependency

91ccde9

This closes apache#21110.

[hotfix][python] Fix the compile error in PythonOptionsTest

2fa7114

[hotfix][tests] Migrate MetricFetcherTest to JUnit5

723f6a2

[FLINK-29134][metrics] Do not repeatedly add useless metric updating tasks to avoid wasting resources

eeda260

This closes apache#21132.

[FLINK-29479][python][hotfix] Fix the testPythonSystemEnvEnabled in PythonOptionsTest

f84d039

This closes apache#21157.

[FLINK-29749][client] Make 'flink info' command could support dynamic properties.

7a5e3b0

liuyongvs closed this Oct 26, 2022

flinkbot added the component=<none> label Oct 26, 2022
