
Conversation


@lpn666 lpn666 commented Jul 7, 2022

What is the purpose of the change

When I use the sql-jdbc connector to transfer a big table from MySQL to another database, the Flink program loads the entire table into memory. The source table is too big (16 GB), and the TaskManager crashed.
So what can I do? Or how about adding a new option to limit the speed of reading data (or to read it in batches)?
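For reference, the Flink JDBC SQL connector already exposes options that avoid loading the whole table at once: the `scan.partition.*` options split the scan into parallel range queries, and `scan.fetch-size` asks the driver to fetch rows in chunks. A minimal sketch in batch mode (URL, table, column names, and bounds are placeholders; with MySQL the fetch-size hint typically only takes effect when `useCursorFetch=true` is set on the JDBC URL):

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class JdbcPartitionedScanExample {

    public static void main(String[] args) {
        TableEnvironment tEnv =
                TableEnvironment.create(EnvironmentSettings.newInstance().inBatchMode().build());

        // 'scan.partition.*' splits the source table into parallel range queries,
        // 'scan.fetch-size' asks the driver to fetch rows in chunks instead of all at once.
        tEnv.executeSql(
                "CREATE TABLE big_source (\n"
                        + "  id BIGINT,\n"
                        + "  payload STRING\n"
                        + ") WITH (\n"
                        + "  'connector' = 'jdbc',\n"
                        + "  'url' = 'jdbc:mysql://localhost:3306/db?useCursorFetch=true',\n"
                        + "  'table-name' = 'big_table',\n"
                        + "  'scan.partition.column' = 'id',\n"
                        + "  'scan.partition.num' = '100',\n"
                        + "  'scan.partition.lower-bound' = '0',\n"
                        + "  'scan.partition.upper-bound' = '100000000',\n"
                        + "  'scan.fetch-size' = '1000'\n"
                        + ")");

        // ... then define a sink table and run: INSERT INTO sink SELECT * FROM big_source
    }
}
```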

Brief change log

Verifying this change

Does this pull request potentially affect one of the following parts:

Documentation

HuangXingBo and others added 30 commits December 7, 2021 10:06
Use a dedicated thread to run each jar, so that pooled threads can't keep references to user-code (e.g., in a ThreadLocal).
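A minimal sketch of that idea (class and method names here are illustrative, not the actual Flink code):

```java
import java.util.concurrent.atomic.AtomicReference;

public final class DedicatedThreadRunner {

    /**
     * Runs the given user code in a freshly created thread and waits for it to finish.
     * Any ThreadLocal state the user code installs dies with the thread instead of
     * lingering in a reused pool worker.
     */
    public static void runInDedicatedThread(Runnable userCode, ClassLoader userClassLoader)
            throws InterruptedException {
        AtomicReference<Throwable> error = new AtomicReference<>();
        Thread thread = new Thread(userCode, "user-jar-runner");
        thread.setContextClassLoader(userClassLoader);
        thread.setUncaughtExceptionHandler((t, e) -> error.set(e));
        thread.start();
        thread.join();
        Throwable failure = error.get();
        if (failure != null) {
            throw new RuntimeException("User code failed", failure);
        }
    }
}
```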
… TableSourceTable

This closes #18040

Co-authored-by: guanghxu <xuguangheng1995@gmail.com>
…s not the same with query result's changelog upsert key

This closes #18048
…rofile to `reservedAllocations` to avoid confusion
…kable out to be a standalone test class in flink-runtime
…r batch jobs even if local recovery is enabled
…to recover after losing and regaining leadership.
…perLeaderElectionITCase#testJobExecutionOnClusterWithLeaderChange` (FLINK-25235)

This closes #18066.
…ternalProducer if transaction finalization fails

In the KafkaCommitter we retry transactions if they failed during
committing. Since we reuse the KafkaProducers, we update the used
transactionalId to continue committing other transactions. To prevent
accidental overwrites, we track the transaction state inside the
FlinkKafkaInternalProducer.
Before this change, the state was not reset on a failure during
transaction finalization, and setting a new transactionalId then failed.
The state is now always reset, regardless of whether finalizing the
transaction (commit, abort) fails.
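A rough sketch of the resulting pattern (not the actual FlinkKafkaInternalProducer code, just the shape of the fix):

```java
/** Illustrative only: the tracked transaction flag is cleared in a finally block,
 *  so a failing commit/abort no longer blocks switching to a new transactional id. */
final class TransactionalState {

    private boolean inTransaction;
    private String transactionalId;

    void beginTransaction(String id) {
        this.transactionalId = id;
        this.inTransaction = true;
    }

    void commitTransaction(Runnable doCommit) {
        try {
            doCommit.run(); // may throw, e.g. on broker-side errors
        } finally {
            inTransaction = false; // reset even if finalization fails
        }
    }

    void setTransactionalId(String newId) {
        if (inTransaction) {
            throw new IllegalStateException("Cannot change transactional id mid-transaction");
        }
        this.transactionalId = newId;
    }
}
```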
…explicitly for job to finish.

(cherry picked from commit 7976be0)
While using TableEnvironment in the ITCase, a Flink MiniCluster is started/stopped automatically in the background. Since the shutdown of the MiniCluster is called asynchronously, CollectResultFetcher sometimes loses data due to race conditions, and an unchecked java.lang.IllegalStateException is thrown that we were not aware of.

The solution is to control the lifecycle of the MiniCluster manually in this test. The MiniClusterWithClientResource is a good fit in this case.

(cherry picked from commit fca04c3)
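For context, a minimal sketch of pinning the MiniCluster lifecycle to the test with MiniClusterWithClientResource (class name and sizes are illustrative):

```java
import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
import org.apache.flink.test.util.MiniClusterWithClientResource;
import org.junit.ClassRule;

public class MyITCase {

    // The cluster lifecycle is bound to the JUnit rule instead of being
    // started/stopped implicitly (and asynchronously) by the TableEnvironment.
    @ClassRule
    public static final MiniClusterWithClientResource MINI_CLUSTER =
            new MiniClusterWithClientResource(
                    new MiniClusterResourceConfiguration.Builder()
                            .setNumberTaskManagers(1)
                            .setNumberSlotsPerTaskManager(4)
                            .build());

    // ... test methods run TableEnvironment jobs against this shared cluster ...
}
```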
…CKPOINTS as default for externalized-checkpoint-retention
This consistency level is only available on write, so we need to create one builder for reading and one for writing. Some sinks are used for both reading and writing; in that case, the reading builder is used.

(cherry picked from commit c40bbf1)
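Assuming this refers to the Cassandra connector and the DataStax 3.x driver it uses, a rough sketch of the read/write builder split (names are illustrative):

```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.QueryOptions;

public final class ConsistencyBuilders {

    // Writer builder: ConsistencyLevel.ANY is valid for writes only.
    static Cluster.Builder writeBuilder(String host) {
        return Cluster.builder()
                .addContactPoint(host)
                .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ANY));
    }

    // Reader builder: a consistency level that is also valid for reads.
    static Cluster.Builder readBuilder(String host) {
        return Cluster.builder()
                .addContactPoint(host)
                .withQueryOptions(
                        new QueryOptions().setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM));
    }

    // A sink that also reads falls back to the read builder,
    // since the write-only consistency level would be rejected on reads.
    static Cluster.Builder builderFor(boolean alsoReads, String host) {
        return alsoReads ? readBuilder(host) : writeBuilder(host);
    }
}
```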
We suspect that the NetworkFailureProxy is causing constant connectivity
problems to the brokers during testing, resulting in either network
timeouts or corrupted results.
Since the NetworkFailureProxy is only used for testing the deprecated
FlinkKafkaProducer/Consumer, we can safely remove it, because we will not
add new features to these connectors.
MartijnVisser and others added 28 commits May 11, 2022 14:21
…e avro schema (#19705)

Co-authored-by: Haizhou Zhao <haizhou_zhao@apple.com>
…to-end tests to avoid flooding the disk space
…kages to clean up more disk space before starting the E2E tests. Also removing the line that removes `^ghc-8.*`, since that no longer exists on the machines.

(cherry picked from commit db6baf4)
… PyFlink Table API jobs in batch mode

This closes #19816.
…verride method 'merge' is used in cases where 'merge' is used

This closes #19817.
…rtitionSplitReader.fetch() to handle no valid partition case

This closes #19979.
…onsumer invocations in split assignment

This closes #19982.
…ttl enabled in RetractableTopNFunction

This closes #19997
…an continue to use Kubernetes 1.24+ and the `none` driver, since Kubernetes 1.24 has dropped support for Dockershim.
… DataStream and SQL connector (#19904)

(cherry picked from commit 5d564b1)

@lpn666 lpn666 closed this Jul 7, 2022