From 7a8b8e49c0a8838647373ed23d545d87df693ace Mon Sep 17 00:00:00 2001 From: Hang Ruan Date: Tue, 25 Nov 2025 15:23:42 +0800 Subject: [PATCH 1/2] Add Flink 2.2.0 release --- docs/config.toml | 4 +- .../content/posts/2025-11-30-release-2.2.0.md | 312 ++++++++++++++++++ docs/data/flink.yml | 9 + docs/data/release_archive.yml | 4 + 4 files changed, 327 insertions(+), 2 deletions(-) create mode 100644 docs/content/posts/2025-11-30-release-2.2.0.md diff --git a/docs/config.toml b/docs/config.toml index c609b7be5b..52ffcabde7 100644 --- a/docs/config.toml +++ b/docs/config.toml @@ -43,8 +43,8 @@ posts = "/:year/:month/:day/:title/" # 4. Copy the invitation link by clicking on "Copy invite link". FlinkSlackInviteUrl = "https://join.slack.com/t/apache-flink/shared_invite/zt-354yyl9vm-LJBlnpbcU~8K_TyIWUhfag" - FlinkStableVersion = "2.1.1" - FlinkStableShortVersion = "2.1" + FlinkStableVersion = "2.2.0" + FlinkStableShortVersion = "2.2" FlinkLTSShortVersion = "1.20" StateFunStableVersion = "3.3.0" StateFunStableShortVersion = "3.3" diff --git a/docs/content/posts/2025-11-30-release-2.2.0.md b/docs/content/posts/2025-11-30-release-2.2.0.md new file mode 100644 index 0000000000..efe3142967 --- /dev/null +++ b/docs/content/posts/2025-11-30-release-2.2.0.md @@ -0,0 +1,312 @@ +--- +title: "Apache Flink 2.2.0: Advancing Real-Time Data + AI and Empowering Stream Processing for the AI Era" +date: "2025-11-30T00:00:00.000Z" +aliases: +- /news/2025/11/30/release-2.2.0.html +authors: +- Hang: + name: "Hang Ruan" + +--- + +The Apache Flink PMC is proud to announce the release of Apache Flink 2.2.0. Flink 2.2.0 further enriches AI capabilities, enhances materialized tables and the Connector framework, and improves batch processing and PyFlink support. This release brings together 73 global contributors, implements 9 FLIPs (Flink Improvement Proposals), and resolves over 220 issues. We extend our gratitude to all contributors for their invaluable support! + +Let's dive into the highlights. + +# Flink SQL Improvements + +## Realtime AI Function +Apache Flink has supported leveraging LLM capabilities through the `ML_PREDICT` function in Flink SQL +since version 2.1, enabling users to perform semantic analysis in a simple and efficient way. In Flink + 2.2, the Table API also supports model inference operations that allow you to integrate machine learning +models directly into your data processing pipelines. You can create models with specific providers (like + OpenAI) and use them to make predictions on your data. + +Example: +- Creating and Using Models +```java +// 1. Set up the local environment +EnvironmentSettings settings = EnvironmentSettings.inStreamingMode(); +TableEnvironment tEnv = TableEnvironment.create(settings); + +// 2. Create a source table from in-memory data +Table myTable = tEnv.fromValues( + ROW(FIELD("text", STRING())), + row("Hello"), + row("Machine Learning"), + row("Good morning") +); + +// 3. Create model +tEnv.createModel( + "my_model", + ModelDescriptor.forProvider("openai") + .inputSchema(Schema.newBuilder().column("input", STRING()).build()) + .outputSchema(Schema.newBuilder().column("output", STRING()).build()) + .option("endpoint", "https://api.openai.com/v1/chat/completions") + .option("model", "gpt-4.1") + .option("system-prompt", "translate to chinese") + .option("api-key", "") + .build() +); + +Model model = tEnv.fromModel("my_model"); + +// 4. Use the model to make predictions +Table predictResult = model.predict(myTable, ColumnList.of("text")); + +// 5. 
Async prediction example
+Table asyncPredictResult = model.predict(
+    myTable,
+    ColumnList.of("text"),
+    Map.of("async", "true")
+);
+```
+
+**More Information**
+* [FLINK-38104](https://issues.apache.org/jira/browse/FLINK-38104)
+* [FLIP-526](https://cwiki.apache.org/confluence/display/FLINK/FLIP-526%3A+Model+ML_PREDICT%2C+ML_EVALUATE+Table+API)
+
+## Vector Search
+
+Apache Flink has supported leveraging LLM capabilities through the `ML_PREDICT` function in Flink SQL
+since version 2.1. This integration has been technically validated in scenarios such as log classification
+and real-time question-answering systems. However, this architecture only allows Flink to use
+embedding models to convert unstructured data (e.g., text, images) into high-dimensional vector features,
+which are then persisted to downstream storage systems; it lacks real-time online querying and similarity
+analysis capabilities for vector spaces. Flink 2.2 introduces the VECTOR_SEARCH function to enable users
+to perform streaming vector similarity searches and real-time context retrieval
+directly within Flink.
+
+Take the following SQL statements as an example:
+```sql
+-- Basic usage
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+    TABLE vector_table,
+    input_table.vector_column,
+    DESCRIPTOR(index_column),
+    10
+));
+
+-- With configuration options
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+    TABLE vector_table,
+    input_table.vector_column,
+    DESCRIPTOR(index_column),
+    10,
+    MAP['async', 'true', 'timeout', '100s']
+));
+
+-- Using named parameters
+SELECT * FROM
+input_table, LATERAL TABLE(VECTOR_SEARCH(
+    SEARCH_TABLE => TABLE vector_table,
+    COLUMN_TO_QUERY => input_table.vector_column,
+    COLUMN_TO_SEARCH => DESCRIPTOR(index_column),
+    TOP_K => 10,
+    CONFIG => MAP['async', 'true', 'timeout', '100s']
+));
+
+-- Searching with a constant value
+SELECT *
+FROM TABLE(VECTOR_SEARCH(
+    TABLE vector_table,
+    ARRAY[10, 20],
+    DESCRIPTOR(index_column),
+    10
+));
+```
+
+**More Information**
+* [FLINK-38422](https://issues.apache.org/jira/browse/FLINK-38422)
+* [FLIP-540](https://cwiki.apache.org/confluence/display/FLINK/FLIP-540%3A+Support+VECTOR_SEARCH+in+Flink+SQL)
+* [Vector Search](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/sql/queries/vector-search/)
+
+## Materialized Table
+
+Materialized Table is a new table type introduced in Flink SQL, aimed at simplifying both batch and
+stream data pipelines and providing a consistent development experience. By specifying the data freshness
+and the query when creating a materialized table, the engine automatically derives the schema for the
+materialized table and creates a corresponding data refresh pipeline to achieve the specified freshness.
+
+From Flink 2.2, the FRESHNESS clause is no longer a mandatory part of the CREATE MATERIALIZED TABLE and
+CREATE OR ALTER MATERIALIZED TABLE DDL statements. Flink 2.2 also introduces a new MaterializedTableEnricher
+interface. This provides a formal extension point for customizable default logic, allowing advanced
+users and vendors to implement "smart" default behaviors (e.g., inferring freshness from upstream tables).
+
+Besides this, users can use the `DISTRIBUTED BY` and `DISTRIBUTED INTO` clauses to apply the bucketing
+concept to materialized tables, and `SHOW MATERIALIZED TABLES` to list all materialized tables.
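+
+For illustration, here is a sketch of the resulting DDL; the table, columns, and bucket count are
+hypothetical, and the optional FRESHNESS clause is simply omitted so that the default is derived:
+```sql
+-- Bucketed materialized table; FRESHNESS is optional from Flink 2.2 on.
+CREATE MATERIALIZED TABLE customer_order_stats
+DISTRIBUTED BY (customer_id) INTO 4 BUCKETS
+AS SELECT customer_id, COUNT(*) AS order_cnt
+FROM orders
+GROUP BY customer_id;
+
+-- List all materialized tables.
+SHOW MATERIALIZED TABLES;
+```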
+
+**More Information**
+* [FLINK-38532](https://issues.apache.org/jira/browse/FLINK-38532)
+* [FLINK-38311](https://issues.apache.org/jira/browse/FLINK-38311)
+* [FLIP-542](https://cwiki.apache.org/confluence/display/FLINK/FLIP-542%3A+Make+materialized+table+DDL+consistent+with+regular+tables)
+* [FLIP-551](https://cwiki.apache.org/confluence/display/FLINK/FLIP-551%3A+Make+FRESHNESS+Optional+for+Materialized+Tables)
+
+## SinkUpsertMaterializer V2
+
+SinkUpsertMaterializer is an operator in Flink that reconciles out-of-order changelog events before
+sending them to an upsert sink. The performance of this operator can degrade exponentially in some cases.
+Flink 2.2 introduces a new implementation that is optimized for such cases.
+
+**More Information**
+* [FLINK-38459](https://issues.apache.org/jira/browse/FLINK-38459)
+* [FLIP-544](https://cwiki.apache.org/confluence/display/FLINK/FLIP-544%3A+SinkUpsertMaterializer+V2)
+
+## Delta Join
+
+Flink 2.1 introduced a new delta join operator to mitigate the challenges caused by big state in regular joins. It replaces the large state maintained by regular joins with a bidirectional lookup-based join that directly reuses data from the source tables.
+
+Flink 2.2 enhances support for converting more SQL patterns into delta joins. Delta joins now support consuming CDC sources without DELETE operations, and allow projection and filter operations after the source. Additionally, delta joins include support for caching, which helps reduce requests to external storage.
+
+**More Information**
+* [Delta Joins](https://nightlies.apache.org/flink/flink-docs-release-2.2/docs/dev/table/tuning/#delta-joins)
+
+# Runtime
+## Balanced Tasks Scheduling
+
+Flink's default scheduling does not take the resulting load of each slot and TaskManager into account,
+so tasks can end up unevenly distributed across the cluster. Flink 2.2 introduces a balanced tasks
+scheduling strategy that spreads tasks evenly across slots and TaskManagers, reducing load skew and
+improving overall resource utilization.
+
+**More Information**
+* [FLINK-31757](https://issues.apache.org/jira/browse/FLINK-31757)
+* [FLIP-370](https://cwiki.apache.org/confluence/display/FLINK/FLIP-370%3A+Support+Balanced+Tasks+Scheduling)
+
+## Enhanced Job History Retention Policies for HistoryServer
+
+Before Flink 2.2, the HistoryServer supported only a quantity-based job archive retention policy,
+which is insufficient for scenarios requiring time-based retention or combined rules. Users can now
+combine the new configuration option `historyserver.archive.retained-ttl` with `historyserver.archive.retained-jobs`
+to cover such scenarios.
+
+**More Information**
+* [FLINK-38229](https://issues.apache.org/jira/browse/FLINK-38229)
+* [FLIP-490](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=332499857)
+
+# Connectors
+## Introduce RateLimiter for Scan Source
+
+Flink jobs frequently exchange data with external systems, which consumes their network bandwidth
+and CPU. When these resources are scarce, pulling data too aggressively can disrupt other workloads.
+In Flink 2.2, we introduce a RateLimiter interface that provides request rate limiting for Scan Sources,
+so connector developers can integrate with rate-limiting frameworks and implement their own
+read-throttling strategies. This feature is currently only available in the DataStream API.
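+
+As a rough sketch, a limiter only has to decide when the next read may proceed. The interface below
+restates the FLIP-535-style asynchronous contract for illustration only; the actual interface location
+and the way a connector wires the limiter into its reader are connector-specific:
+```java
+import java.util.concurrent.CompletableFuture;
+import java.util.concurrent.CompletionStage;
+import java.util.concurrent.TimeUnit;
+
+/** Assumed FLIP-535-style contract: acquire() completes when the next read may proceed. */
+interface RateLimiter {
+    CompletionStage<Void> acquire();
+}
+
+/** Hypothetical limiter that hands out one read permit per fixed delay. */
+class FixedDelayRateLimiter implements RateLimiter {
+
+    private final long delayMs;
+
+    FixedDelayRateLimiter(long delayMs) {
+        this.delayMs = delayMs;
+    }
+
+    @Override
+    public CompletionStage<Void> acquire() {
+        // Completes after the configured delay, capping the read rate at
+        // roughly 1000 / delayMs requests per second.
+        return CompletableFuture.runAsync(
+                () -> { },
+                CompletableFuture.delayedExecutor(delayMs, TimeUnit.MILLISECONDS));
+    }
+}
+```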
+
+**More Information**
+* [FLINK-38497](https://issues.apache.org/jira/browse/FLINK-38497)
+* [FLIP-535](https://cwiki.apache.org/confluence/display/FLINK/FLIP-535%3A+Introduce+RateLimiter+to+Source)
+
+## Balanced Splits Assignment
+
+The SplitEnumerator is responsible for assigning splits, but it used to lack visibility into the actual
+runtime status and distribution of those splits. This made it impossible for the SplitEnumerator to
+guarantee an even split distribution, so data skew was very likely to occur. From Flink 2.2, the
+SplitEnumerator has visibility into the split distribution and can assign splits evenly at runtime.
+
+**More Information**
+* [FLINK-38564](https://issues.apache.org/jira/browse/FLINK-38564)
+* [FLIP-537](https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=373886480)
+
+# Other Improvements
+
+## PyFlink
+
+In Flink 2.2, we added support for asynchronous functions in the Python DataStream API. This enables Python users to efficiently query external services in their Flink jobs, e.g., a large LLM that is typically deployed in a standalone GPU cluster.
+
+Furthermore, we provide comprehensive support for keeping access to external services stable. On one hand, the number of concurrent requests sent to the external service can be limited to avoid overwhelming it. On the other hand, retry support tolerates temporary unavailability that may be caused by network jitter or other transient issues.
+
+Here is a simple example showing how to use it:
+```python
+from typing import List
+from ollama import AsyncClient
+
+from pyflink.common import Types, Time, Row
+from pyflink.datastream import (
+    StreamExecutionEnvironment,
+    AsyncDataStream,
+    AsyncFunction,
+    RuntimeContext,
+    CheckpointingMode,
+)
+
+
+class AsyncLLMRequest(AsyncFunction[Row, str]):
+
+    def __init__(self, host, port):
+        self._host = host
+        self._port = port
+
+    def open(self, runtime_context: RuntimeContext):
+        self._client = AsyncClient(host='{}:{}'.format(self._host, self._port))
+
+    async def async_invoke(self, value: Row) -> List[str]:
+        message = {"role": "user", "content": value.question}
+        question_id = value.id
+        ollama_response = await self._client.chat(model="qwen3:4b", messages=[message])
+        return [
+            f"Question ID {question_id}: response: {ollama_response['message']['content']}"
+        ]
+
+    def timeout(self, value: Row) -> List[str]:
+        # return a default value in case of timeout
+        return [f"Timeout for this question: {value.question}"]
+
+
+def main():
+    env = StreamExecutionEnvironment.get_execution_environment()
+    env.enable_checkpointing(30000, CheckpointingMode.EXACTLY_ONCE)
+    ds = env.from_collection(
+        [
+            ("Who are you?", 0),
+            ("Tell me a joke", 1),
+            ("Tell me the result of comparing 0.8 and 0.11", 2),
+        ],
+        type_info=Types.ROW_NAMED(["question", "id"], [Types.STRING(), Types.INT()]),
+    )
+
+    result_stream = AsyncDataStream.unordered_wait(
+        data_stream=ds,
+        # assumes an Ollama server on the default local endpoint
+        async_function=AsyncLLMRequest("localhost", 11434),
+        timeout=Time.seconds(100),
+        capacity=1000,
+        output_type=Types.STRING(),
+    )
+
+    # define the sink
+    result_stream.print()
+
+    # submit for execution
+    env.execute()
+
+
+if __name__ == "__main__":
+    main()
+```
+
+**More Information**
+* [FLINK-38190](https://issues.apache.org/jira/browse/FLINK-38190)
+
+## Upgrade commons-lang3 to version 3.18.0
+
+Upgrade commons-lang3 from 3.12.0 to 3.18.0 to mitigate CVE-2025-48924.
+ +**More Information** +* [FLINK-38193](https://issues.apache.org/jira/browse/FLINK-38193) + +# Upgrade Notes + +The Flink community tries to ensure that upgrades are as seamless as possible. +However, certain changes may require users to make adjustments to certain parts +of the program when upgrading to version 2.2. Please refer to the +[release notes](https://nightlies.apache.org/flink/flink-docs-release-2.2/release-notes/flink-2.2/) +for a comprehensive list of adjustments to make and issues to check during the +upgrading process. + +# List of Contributors + +The Apache Flink community would like to express gratitude to all the contributors who made this release possible: + +Alan Sheinberg, Aleksandr Iushmanov, AlexYinHan, Arvid Heise, CuiYanxiang, David Hotham, David Radley, Dawid Wysakowicz, Dian Fu, Etienne Chauchot, Ferenc Csaky, Gabor Somogyi, Gustavo de Morais, Hang Ruan, Hao Li, Hongshun Wang, Jackeyzhe, Jakub Stejskal, Jiaan Geng, Jinkun Liu, Juntao Zhang, Kaiqi Dong, Khaled Hammouda, Kumar Mallikarjuna, Kunni, Laffery, Mario Petruccelli, Martijn Visser, Mate Czagany, Maximilian Michels, Mika Naylor, Mingliang Liu, Myracle, Naci Simsek, Natea Eshetu Beshada, Niharika Sakuru, Pan Yuepeng, Piotr Nowojski, Poorvank,Ramin Gharib, Roc Marshal, Roman Khachatryan, Ron, Rosa Kang, Rui Fan, Sergey Nuyanzin, Shengkai, Stefan Richter, Stepan Stepanishchev, Swapnil Aher, Timo Walther, Xingcan Cui, Xuyang, Yuepeng Pan, Yunfeng Zhou, Zakelly, Zhanghao Chen, dylanhz, gong-flying, hejufang, lincoln lee, lincoln-lil, mateczagany, morvenhuang, noorall, r-sidd, sxnan, voonhous, xia rui, xiangyu0xf, yangli1206, yunfengzhou-hub, zhou diff --git a/docs/data/flink.yml b/docs/data/flink.yml index 18ec7ff7aa..ad360bd6fc 100644 --- a/docs/data/flink.yml +++ b/docs/data/flink.yml @@ -15,6 +15,15 @@ # specific language governing permissions and limitations # under the License +2.2: + name: "Apache Flink 2.2.0" + binary_release_url: "https://www.apache.org/dyn/closer.lua/flink/flink-2.2.0/flink-2.2.0-bin-scala_2.12.tgz" + binary_release_asc_url: "https://downloads.apache.org/flink/flink-2.2.0/flink-2.2.0-bin-scala_2.12.tgz.asc" + binary_release_sha512_url: "https://downloads.apache.org/flink/flink-2.2.0/flink-2.2.0-bin-scala_2.12.tgz.sha512" + source_release_url: "https://www.apache.org/dyn/closer.lua/flink/flink-2.2.0/flink-2.2.0-src.tgz" + source_release_asc_url: "https://downloads.apache.org/flink/flink-2.2.0/flink-2.2.0-src.tgz.asc" + source_release_sha512_url: "https://downloads.apache.org/flink/flink-2.2.0/flink-2.2.0-src.tgz.sha512" + 2.1: name: "Apache Flink 2.1.1" binary_release_url: "https://www.apache.org/dyn/closer.lua/flink/flink-2.1.1/flink-2.1.1-bin-scala_2.12.tgz" diff --git a/docs/data/release_archive.yml b/docs/data/release_archive.yml index c56247172f..7c79ac6964 100644 --- a/docs/data/release_archive.yml +++ b/docs/data/release_archive.yml @@ -1,5 +1,9 @@ release_archive: flink: + - + version_short: "2.2" + version_long: 2.2.0 + release_date: 2025-11-30 - version_short: "2.0" version_long: 2.0.1 From 47ae17d2fe715578a094a6b4f9625d9b4402ff40 Mon Sep 17 00:00:00 2001 From: Hang Ruan Date: Tue, 25 Nov 2025 15:44:53 +0800 Subject: [PATCH 2/2] Rebuild website --- .../26/apache-flink-0.6-available/index.html | 4 +- .../apache-flink-0.6.1-available/index.html | 4 +- content/2014/10/03/upcoming-events/index.html | 4 +- .../apache-flink-0.7.0-available/index.html | 4 +- .../hadoop-compatibility-in-flink/index.html | 4 +- .../index.html | 4 +- .../apache-flink-0.8.0-available/index.html | 4 +- 
.../index.html | 4 +- .../09/introducing-flink-streaming/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../juggling-with-bits-and-bytes/index.html | 4 +- .../index.html | 4 +- .../announcing-apache-flink-0.9.0/index.html | 4 +- .../index.html | 4 +- .../apache-flink-0.9.1-available/index.html | 4 +- .../announcing-flink-forward-2015/index.html | 4 +- .../index.html | 4 +- .../announcing-apache-flink-0.10.0/index.html | 4 +- .../11/27/flink-0.10.1-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../02/11/flink-0.10.2-released/index.html | 4 +- .../announcing-apache-flink-1.0.0/index.html | 4 +- .../04/06/flink-1.0.1-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../04/22/flink-1.0.2-released/index.html | 4 +- .../05/11/flink-1.0.3-released/index.html | 4 +- .../index.html | 4 +- .../announcing-apache-flink-1.1.0/index.html | 4 +- .../08/04/flink-1.1.1-released/index.html | 4 +- .../index.html | 4 +- .../05/apache-flink-1.1.2-released/index.html | 4 +- .../12/apache-flink-1.1.3-released/index.html | 4 +- .../index.html | 4 +- .../21/apache-flink-1.1.4-released/index.html | 4 +- .../announcing-apache-flink-1.2.0/index.html | 4 +- .../23/apache-flink-1.1.5-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../26/apache-flink-1.2.1-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../23/apache-flink-1.3.1-released/index.html | 4 +- .../index.html | 4 +- .../05/apache-flink-1.3.2-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../15/apache-flink-1.4.1-released/index.html | 4 +- .../index.html | 4 +- .../08/apache-flink-1.4.2-released/index.html | 4 +- .../15/apache-flink-1.3.3-released/index.html | 4 +- .../index.html | 4 +- .../12/apache-flink-1.5.1-released/index.html | 4 +- .../31/apache-flink-1.5.2-released/index.html | 4 +- .../index.html | 4 +- .../21/apache-flink-1.5.3-released/index.html | 4 +- .../20/apache-flink-1.5.4-released/index.html | 4 +- .../20/apache-flink-1.6.1-released/index.html | 4 +- .../29/apache-flink-1.5.5-released/index.html | 4 +- .../29/apache-flink-1.6.2-released/index.html | 4 +- .../index.html | 4 +- .../21/apache-flink-1.7.1-released/index.html | 4 +- .../22/apache-flink-1.6.3-released/index.html | 4 +- .../26/apache-flink-1.5.6-released/index.html | 4 +- .../index.html | 4 +- .../15/apache-flink-1.7.2-released/index.html | 4 +- .../index.html | 4 +- .../25/apache-flink-1.6.4-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../02/apache-flink-1.8.1-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../11/apache-flink-1.8.2-released/index.html | 4 +- .../index.html | 4 +- .../18/apache-flink-1.9.1-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../11/apache-flink-1.8.3-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../30/apache-flink-1.9.2-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../flink-community-update-april20/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- 
.../24/apache-flink-1.9.3-released/index.html | 4 +- .../index.html | 4 +- .../flink-community-update-may20/index.html | 4 +- .../apache-flink-1.10.1-released/index.html | 4 +- .../index.html | 4 +- .../flink-community-update-june20/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.11.1-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../flink-community-update-july20/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../the-state-of-flink-on-docker/index.html | 4 +- .../apache-flink-1.10.2-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.11.2-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.11.3-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.12.1-released/index.html | 4 +- .../apache-flink-1.10.3-released/index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.12.2-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.12.3-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.12.4-released/index.html | 4 +- .../apache-flink-1.13.1-released/index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.12.5-released/index.html | 4 +- .../apache-flink-1.13.2-released/index.html | 4 +- .../apache-flink-1.11.4-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../apache-flink-1.13.3-released/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../20/pravega-flink-connector-101/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../22/scala-free-in-one-fifteen/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- 
.../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 852 ++++++++ .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- content/404.html | 2 +- content/categories/index.html | 4 +- .../flink-agents-master/index.html | 4 +- .../flink-agents-stable/index.html | 4 +- .../documentation/flink-cdc-master/index.html | 4 +- .../documentation/flink-cdc-stable/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- content/documentation/flink-lts/index.html | 4 +- content/documentation/flink-master/index.html | 4 +- content/documentation/flink-stable/index.html | 8 +- .../index.html | 4 +- .../index.html | 4 +- .../documentation/flinkml-master/index.html | 4 +- .../documentation/flinkml-stable/index.html | 4 +- content/documentation/index.html | 4 +- content/downloads/index.html | 36 +- ...708de728e6be83e0311f12e26ed0236e1d22f2.js} | 571 +++--- ...4ec36cf016aaa908fadbbc122e9ad9252401fc.js} | 2 +- content/en/sitemap.xml | 7 +- content/flink-packages/index.html | 4 +- content/getting-started/index.html | 4 +- .../training-course/index.html | 4 +- .../with-flink-agents/index.html | 4 +- .../getting-started/with-flink-cdc/index.html | 4 +- .../with-flink-kubernetes-operator/index.html | 4 +- .../getting-started/with-flink-ml/index.html | 4 +- .../with-flink-stateful-functions/index.html | 4 +- content/getting-started/with-flink/index.html | 4 +- .../code-style-and-quality-common/index.html | 32 +- .../index.html | 32 +- .../index.html | 32 +- .../code-style-and-quality-java/index.html | 4 +- .../index.html | 32 +- .../index.html | 32 +- .../code-style-and-quality-scala/index.html | 32 +- .../contribute-code/index.html | 4 +- .../contribute-documentation/index.html | 4 +- .../documentation-style-guide/index.html | 4 +- .../how-to-contribute/getting-help/index.html | 4 +- .../improve-website/index.html | 4 +- content/how-to-contribute/index.html | 4 +- content/how-to-contribute/overview/index.html | 4 +- .../reviewing-prs/index.html | 4 +- .../dynamic-iceberg-sink.png | Bin 0 -> 121206 bytes .../multiple-dag-pipeline.png | Bin 0 -> 133243 bytes .../simple-single-kafka-topic-to-iceberg.png | Bin 0 -> 78809 bytes content/index.html | 4 +- content/index.xml | 9 +- content/material/index.html | 4 +- ...2025-10-14-kafka-dynamic-iceberg-sink.html | 10 + content/posts/index.html | 59 +- content/posts/index.xml | 7 + content/posts/page/10/index.html | 71 +- content/posts/page/11/index.html | 62 +- content/posts/page/12/index.html | 61 +- content/posts/page/13/index.html | 65 +- content/posts/page/14/index.html | 65 +- content/posts/page/15/index.html | 61 +- content/posts/page/16/index.html | 66 +- content/posts/page/17/index.html | 70 +- content/posts/page/18/index.html | 64 +- content/posts/page/19/index.html | 59 +- content/posts/page/2/index.html | 60 +- content/posts/page/20/index.html | 58 +- content/posts/page/21/index.html | 70 +- content/posts/page/22/index.html | 68 +- content/posts/page/23/index.html | 54 +- content/posts/page/24/index.html | 53 +- content/posts/page/25/index.html | 57 +- content/posts/page/26/index.html | 63 +- content/posts/page/27/index.html | 1723 +++++++++++++++++ content/posts/page/3/index.html | 73 +- 
content/posts/page/4/index.html | 74 +- content/posts/page/5/index.html | 61 +- content/posts/page/6/index.html | 61 +- content/posts/page/7/index.html | 63 +- content/posts/page/8/index.html | 68 +- content/posts/page/9/index.html | 75 +- content/sitemap.xml | 2 +- content/tags/index.html | 4 +- content/what-is-flink-ml/index.html | 4 +- content/what-is-flink-table-store/index.html | 4 +- content/what-is-flink/community/index.html | 4 +- .../flink-applications/index.html | 4 +- .../flink-architecture/index.html | 4 +- .../what-is-flink/flink-operations/index.html | 4 +- content/what-is-flink/index.html | 4 +- content/what-is-flink/powered-by/index.html | 4 +- content/what-is-flink/roadmap/index.html | 4 +- content/what-is-flink/security/index.html | 4 +- .../what-is-flink/special-thanks/index.html | 4 +- content/what-is-flink/use-cases/index.html | 4 +- content/what-is-stateful-functions/index.html | 4 +- .../index.html | 4 +- ...3cf9eac9b58a3a9d9711e074b45088935fef5a.js} | 8 +- ...358beefdfebd45202382952d1e75b713d490ed.js} | 2 +- content/zh/404.html | 2 +- content/zh/categories/index.html | 4 +- .../flink-agents-master/index.html | 4 +- .../flink-agents-stable/index.html | 4 +- .../documentation/flink-cdc-master/index.html | 4 +- .../documentation/flink-cdc-stable/index.html | 4 +- .../index.html | 4 +- .../index.html | 4 +- content/zh/documentation/flink-lts/index.html | 4 +- .../zh/documentation/flink-master/index.html | 4 +- .../zh/documentation/flink-stable/index.html | 8 +- .../index.html | 4 +- .../index.html | 4 +- .../documentation/flinkml-master/index.html | 4 +- .../documentation/flinkml-stable/index.html | 4 +- content/zh/documentation/index.html | 4 +- content/zh/downloads/index.html | 34 +- content/zh/flink-packages/index.html | 4 +- content/zh/getting-started/index.html | 4 +- .../training-course/index.html | 4 +- .../with-flink-agents/index.html | 4 +- .../getting-started/with-flink-cdc/index.html | 4 +- .../with-flink-kubernetes-operator/index.html | 4 +- .../getting-started/with-flink-ml/index.html | 4 +- .../with-flink-stateful-functions/index.html | 4 +- .../zh/getting-started/with-flink/index.html | 4 +- .../code-style-and-quality-common/index.html | 32 +- .../index.html | 32 +- .../index.html | 32 +- .../code-style-and-quality-java/index.html | 4 +- .../index.html | 32 +- .../index.html | 32 +- .../code-style-and-quality-scala/index.html | 32 +- .../contribute-code/index.html | 4 +- .../contribute-documentation/index.html | 4 +- .../documentation-style-guide/index.html | 4 +- .../how-to-contribute/getting-help/index.html | 4 +- .../improve-website/index.html | 4 +- content/zh/how-to-contribute/index.html | 4 +- .../zh/how-to-contribute/overview/index.html | 4 +- .../reviewing-prs/index.html | 4 +- content/zh/index.html | 4 +- content/zh/index.xml | 2 +- content/zh/material/index.html | 4 +- content/zh/tags/index.html | 4 +- content/zh/what-is-flink-ml/index.html | 4 +- .../zh/what-is-flink-table-store/index.html | 4 +- content/zh/what-is-flink/community/index.html | 4 +- .../flink-applications/index.html | 4 +- .../flink-architecture/index.html | 4 +- .../what-is-flink/flink-operations/index.html | 4 +- content/zh/what-is-flink/index.html | 4 +- .../zh/what-is-flink/powered-by/index.html | 4 +- content/zh/what-is-flink/roadmap/index.html | 4 +- content/zh/what-is-flink/security/index.html | 4 +- .../what-is-flink/special-thanks/index.html | 4 +- content/zh/what-is-flink/use-cases/index.html | 4 +- .../zh/what-is-stateful-functions/index.html | 4 +- .../index.html | 4 
+- 417 files changed, 4764 insertions(+), 1998 deletions(-) create mode 100644 content/2025/10/14/from-stream-to-lakehouse-kafka-ingestion-with-the-flink-dynamic-iceberg-sink/index.html rename content/{en.search-data.min.9569f37cda97a86fe426f11fcc661d63a088fedfed0b7ceea9f69a734bca5b54.js => en.search-data.min.7a6789e629a5854619e721c43a708de728e6be83e0311f12e26ed0236e1d22f2.js} (98%) rename content/{en.search.min.2d5cbf7adccdae7fc84bdb981d1fd004104716e01aadef148ccaff27d940b82b.js => en.search.min.85ed47dbdefd0564d05408fe544ec36cf016aaa908fadbbc122e9ad9252401fc.js} (89%) create mode 100644 content/img/blog/2025-10-14-kafka-dynamic-iceberg-sink/dynamic-iceberg-sink.png create mode 100644 content/img/blog/2025-10-14-kafka-dynamic-iceberg-sink/multiple-dag-pipeline.png create mode 100644 content/img/blog/2025-10-14-kafka-dynamic-iceberg-sink/simple-single-kafka-topic-to-iceberg.png create mode 100644 content/news/2025/10/14/2025-10-14-kafka-dynamic-iceberg-sink.html create mode 100644 content/posts/page/27/index.html rename content/{zh.search-data.min.20d1cccd9455aaa40df8493b6cfbb53fc8809adef7a3d57054295a0c40d734ee.js => zh.search-data.min.8a1dfca9717c8d53dc92e206fc3cf9eac9b58a3a9d9711e074b45088935fef5a.js} (91%) rename content/{zh.search.min.b3f47ab7e189decd7531e4d2d2741caff85e1b194ecb6de42884e91b843ba0ad.js => zh.search.min.63698e7224146c2a657a9c22de358beefdfebd45202382952d1e75b713d490ed.js} (89%) diff --git a/content/2014/08/26/apache-flink-0.6-available/index.html b/content/2014/08/26/apache-flink-0.6-available/index.html index ebc972b993..c7e60687d3 100644 --- a/content/2014/08/26/apache-flink-0.6-available/index.html +++ b/content/2014/08/26/apache-flink-0.6-available/index.html @@ -28,7 +28,7 @@ - + + + + + + + + + + + + + +
+ + +
+ + +
+
+ +
+

+ From Stream to Lakehouse: Kafka Ingestion with the Flink Dynamic Iceberg Sink +

October 14, 2025 - Swapna Marru

Ingesting thousands of evolving Kafka topics into a lakehouse often creates complex, brittle pipelines that require constant manual intervention as schemas and write patterns change. But what if your ingestion pipeline could adapt on its own, with zero downtime?

+

Enter the Flink Dynamic Iceberg Sink, a powerful pattern for Apache Flink that allows users to write streaming data into multiple Iceberg tables—dynamically, efficiently, and with full schema evolution support. The sink can create and write to new tables based on instructions within the records themselves. As the schema of incoming records evolves, the dynamic sink automatically evolves the Iceberg table schema in the lakehouse. It can even adapt to changes in the table’s partitioning scheme. The key is that all of this happens in real-time, without a single job restart.

+

In this post, we’ll guide you through building this exact system. We will start by exploring the limitations of traditional, static pipelines and then demonstrate how the dynamic sink pattern provides a robust, scalable solution. We’ll focus on a common use case: ingesting Kafka data with dynamic Avro schemas sourced from a Confluent Schema Registry. By the end, you’ll have a blueprint for building a scalable, self-adapting ingestion layer that eliminates operational toil and truly bridges your streams and your lakehouse.

+

+ The Building Block: A Simple Kafka-to-Iceberg Pipeline + # +

+

Let’s start with the basics. Our goal is to get data from a single Kafka topic into a corresponding Iceberg table.

+
+ +
+ +

A standard Flink job for this task consists of three main components.

+
1. Kafka Source: Connects to Kafka, subscribes to the topic, and distributes records for parallel processing.

2. A RowData Converter: This component is responsible for schema-aware processing. It takes the raw byte[] records from Kafka, deserializes them, and transforms them into Flink’s internal RowData format. The logic here can vary:

   • Specific Logic: The converter could contain custom deserialization logic tailored to a single, known topic schema.

   • Generic Logic: More powerful, generic pipelines use a schema registry. For formats like Avro or Protobuf, the converter fetches the correct writer’s schema from the registry (using an ID from the message header or payload). It then deserializes the bytes into a generic structure like Avro’s GenericRecord or Protobuf’s DynamicMessage.

   This post will focus on the generic pipeline approach, which is designed for evolving data environments. Even in this generic deserialization approach, a standard job then maps these records to a static RowData schema that is hardcoded or configured in the Flink job. This static schema must correspond to the target Iceberg table’s schema.

3. Iceberg Sink: Writes the stream of RowData to a specific Iceberg table, managing transactions and ensuring exactly-once semantics.

This setup is simple, robust, and works perfectly for a single topic with a stable schema.
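
To make this concrete, here is a minimal sketch of the static wiring. The converter, the passthrough Kafka deserializer, and the table location are placeholders; FlinkSink and TableLoader come from the Iceberg Flink runtime.

+KafkaSource<byte[]> source = KafkaSource.<byte[]>builder()
+        .setBootstrapServers("broker:9092")
+        .setTopics("topic-a")
+        .setValueOnlyDeserializer(new PassthroughBytesSchema()) // hypothetical byte[] passthrough
+        .build();
+
+DataStream<RowData> rows = env
+        .fromSource(source, WatermarkStrategy.noWatermarks(), "kafka")
+        .map(new TopicARowDataConverter()); // schema-aware conversion with a hardcoded schema
+
+// Write to one predefined Iceberg table with exactly-once guarantees.
+FlinkSink.forRowData(rows)
+        .tableLoader(TableLoader.fromHadoopTable("hdfs://warehouse/db/topic_a"))
+        .append();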

+

+ Scaling Up: The Naive Approach + # +

+

Now, what if we have thousands of topics? The logical next step is to create a dedicated processing graph (or DAG) for each topic-to-table mapping within a single Flink application.

+
+ +
+

This looks good, but this static architecture cannot adapt to change: an Iceberg sink can only write to one predefined table, the table must exist beforehand, and its schema is fixed for the lifetime of the job.

+

+ Scaling Up: Problems Ahead + # +

+

This static model becomes an operational bottleneck when faced with real-world scenarios.

+

Scenario 1: Schema Evolution +When a producer adds a new field to an event schema, the running Flink job, which was configured with the old schema, cannot process the new field. It continues to write the events with the older schema. +This requires a manual update to the job’s code or configuration, followed by a job restart.

+

The target Iceberg table’s schema must also be updated before the job is restarted. Once restarted, the Flink job’s Iceberg sink will use the new table schema, allowing it to write events correctly from both the old and new schemas.

+

Scenario 2: A New Topic Appears +A new microservice starts producing events. The Flink job has no idea this topic exists. You must add a new DAG to the job code and restart the application.

+

Scenario 3: Dynamic Routing from a Single Topic +A single Kafka topic contains multiple event types that need to be routed to different Iceberg tables. A static sink, hard-wired to one table, can’t do this.

+

All these scenarios require complex workarounds and a way to automatically restart the application whenever something changes.

+ +

The Dynamic Iceberg Sink removes these constraints. Here’s the new architecture:

+
+ +
+

This single, unified pipeline can ingest from any number of topics and write to any number of tables, automatically handling new topics and schema changes without restarts.

+

+ A Look at the Implementation + # +

+

Let’s dive into the key components that make this dynamic behavior possible.

+
+ Step 1: Preserving Kafka Metadata with KafkaRecord + # +
+

To make dynamic decisions, our pipeline needs metadata. For example, to use the topic name as the table name, we need access to the topic! Standard deserializers often discard this, returning only the deserialized payload.

+

To solve this, we first pass the raw Kafka ConsumerRecord through a lightweight wrapper. This wrapper converts it into a simple POJO, KafkaRecord, that preserves all the essential metadata for downstream processing.

+

Here is the structure of our KafkaRecord class:

+
public class KafkaRecord {
+    public final String topic;
+    public final byte[] key;
+    public final byte[] value;
+    public final Headers headers;
+    ....
+}
+

Now, every record flowing through our Flink pipeline is a KafkaRecord object, giving our converter access to the topic for table routing and the value (the raw byte payload) for schema discovery.
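
The wrapper itself is tiny. Here is one possible sketch, assuming the Kafka connector’s KafkaRecordDeserializationSchema and a matching KafkaRecord constructor:

+public class KafkaRecordWrapper implements KafkaRecordDeserializationSchema<KafkaRecord> {
+
+    @Override
+    public void deserialize(ConsumerRecord<byte[], byte[]> record, Collector<KafkaRecord> out) {
+        // No payload parsing here: just preserve topic, key, value, and headers.
+        out.collect(new KafkaRecord(record.topic(), record.key(), record.value(), record.headers()));
+    }
+
+    @Override
+    public TypeInformation<KafkaRecord> getProducedType() {
+        return TypeInformation.of(KafkaRecord.class);
+    }
+}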

+
+ Step 2: The KafkaRecordToDynamicRecordGenerator + # +
+

The generator takes a KafkaRecord and performs the “late binding” of schema and table information. For each message, it:

+
1. Extracts the schema ID from the value byte array (using the Confluent wire format).

2. Fetches the writer’s schema from Schema Registry.

3. Deserializes the Avro payload and converts it into Flink’s RowData.

4. Bundles everything into a DynamicRecord, which contains:

   • The TableIdentifier (created from kafkaRecord.topic).
   • The org.apache.iceberg.Schema (converted from the Avro schema).
   • The PartitionSpec (e.g., based on a timestamp field).
   • The RowData payload itself.
public class KafkaRecordToDynamicRecordGenerator implements DynamicRecordGenerator<KafkaRecord> {
+    @Override
+    public DynamicRecord convert(KafkaRecord kafkaRecord) throws Exception {
+        // 1. Get schema ID from kafkaRecord.value and fetch schema
+        int schemaId = getSchemaId(kafkaRecord.value);
+        Schema writerSchema = schemaRegistryClient.getById(schemaId);
+
+        // 2. Deserialize and convert to RowData
+        GenericRecord genericRecord = deserialize(kafkaRecord.value); // helper assumed to strip the wire-format header
+        RowData rowData = avroToRowDataMapper.map(genericRecord);
+
+        // 3. Dynamically create table info from the KafkaRecord
+        TableIdentifier tableId = TableIdentifier.of(kafkaRecord.topic);
+        org.apache.iceberg.Schema icebergSchema = AvroSchemaUtil.toIceberg(writerSchema);
+        PartitionSpec spec = buildPartitionSpec(icebergSchema, kafkaRecord);
+
+        // 4. Return the complete DynamicRecord
+        return new DynamicRecord(
+                tableId,
+                "branch",
+                icebergSchema,
+                rowData,
+                spec,
+                DistributionMode.NONE,
+                1);
+    }
+}
+

Find more details on options passed to DynamicRecord here

+ +
// A single stream from ALL topics, producing KafkaRecord objects
+DataStream<KafkaRecord> sourceStream = env.fromSource(kafkaSource, ...);
+
+// A single sink that handles everything
+DynamicIcebergSink.forInput(sourceStream)
+    .generator(new KafkaRecordToDynamicRecordGenerator())
+    .catalogLoader(getCatalogLoader())
+    .append();
+

+ Project Details: Availability, Credits, and Development + # +

+

This powerful capability is now officially available as part of the Apache Iceberg project.

+

Find more details on the Dynamic Sink here.

+

+ Supported Versions + # +

+

You can start using the dynamic sink with the following versions:

+
• Apache Iceberg 1.10.0 or newer. Full release notes here.
• Apache Flink 1.20, 2.0, and 2.1.

+ Contributors + # +

+

The journey began with a detailed project proposal authored by Peter Vary, which laid the groundwork for this development. You can read the original proposal here.

+

Major development efforts were led by Maximilian Michels, with contributions from several community members.

+

+ Conclusion + # +

+

In conclusion, the choice between a dynamic and a static Iceberg sink represents a trade-off between operational agility and the performance benefits of static bindings. While a simple, static Kafka-to-Iceberg sink is a performant and straightforward solution for stable data environments, the Dynamic Iceberg Sink pattern helps manage the complexity and velocity of frequently changing data.

+

The most significant advantage of the Dynamic Sink is its ability to reduce operational burden by automating schema evolution. By leveraging a central schema registry, new schema versions can be published without any direct intervention in the Flink application. The dynamic sink detects these changes and adapts the downstream Iceberg table schema on the fly, eliminating the need for manual code changes, configuration updates, and disruptive job restarts. This creates a truly resilient and hands-off data ingestion pipeline.

+

Furthermore, the dynamic sink enables powerful, data-driven logic, such as routing records to different tables based on signals within the data itself. This facilitates advanced use cases like multi-tenant data segregation or event-type-based routing without needing to pre-configure every possible outcome.

+

+
+ + + + + + + + + +
+ + + + +
+ + + + + + + + + + + diff --git a/content/2025/10/15/apache-flink-agents-0.1.0-release-announcement/index.html b/content/2025/10/15/apache-flink-agents-0.1.0-release-announcement/index.html index 914558c6ff..8c9423304f 100644 --- a/content/2025/10/15/apache-flink-agents-0.1.0-release-announcement/index.html +++ b/content/2025/10/15/apache-flink-agents-0.1.0-release-announcement/index.html @@ -28,7 +28,7 @@ - + + + + + + + + + + + + + +
+ + +
+ + +
+
+ + +
+

+ Apache Flink 0.6 available +

+ + + + August 26, 2014 - + + + + + +

We are happy to announce the availability of Flink 0.6. This is the first release of the system inside the Apache Incubator and under the name Flink. Releases up to 0.5 were under the name Stratosphere, the academic and open source project that Flink originates from. +What is Flink? # Apache Flink is a general-purpose data processing engine for clusters. It runs on YARN clusters on top of data stored in Hadoop, as well as stand-alone. + ... + +

+ Continue reading » +
+ + + + + + + + + +
+ + + + +
+ + + + + + + + + + + diff --git a/content/posts/page/3/index.html b/content/posts/page/3/index.html index 2816a3dad4..26684f7fef 100644 --- a/content/posts/page/3/index.html +++ b/content/posts/page/3/index.html @@ -24,7 +24,7 @@ - +