Add Flink quickstart #15062
Conversation
…pl will commonly look for it
MartijnVisser left a comment:
Thnx for the PR, I've checked the Flink input and left some minor nits, but overall +1
```diff
@@ -0,0 +1,51 @@
+# - Licensed to the Apache Software Foundation (ASF) under one or more
```
The Spark quickstart includes the YAML file in the documentation page itself.
Is there a specific reason we decided to do otherwise?
Yes, because there's a Dockerfile too, and that's a lot of code to put in a docs page when it could just be linked to :)
My fear is that it will be missed, and we will forget to update it after a new release.
The fewer hops we have, the less likely we are to make these kinds of mistakes.
Unless we release a Docker image too.
Is the `flink/quickstart/overview.excalidraw.svg` used somewhere?
@mxm, @Guosmilesmile: Could you please review?
Co-authored-by: pvary <peter.vary.apache@gmail.com>
I've added it to the doc as a reference image: 68b53bc
- Change to use flink 2.1 (apache#15062 (comment))
- Updated version variables for easier upgrades (apache#15062 (comment))
…sion variable handling
SeaweedFS is S3-compatible local storage that I was using in place of MinIO, which has been moved to maintenance mode.
```yaml
# - WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# - See the License for the specific language governing permissions and
# - limitations under the License.
services:
```
I think it would be good to ask the community on the dev list about creating an official Flink quickstart Docker image.
We already have a `/docker/iceberg-rest-fixture` which is released by `.github/workflows/publish-iceberg-rest-fixture-docker.yml`.
Maybe we could have the same for Flink and Spark there.
If the community is not interested in having that there, I'm OK with adding this to Flink.
Yes, I will start this discussion, and would be happy to contribute a PR for it too.
#15114
mxm left a comment:
Thank you @rmoff for the PR! This is a great addition.
```diff
@@ -1,5 +1,5 @@
 ---
-title: "Flink Getting Started"
+title: "Getting Started"
```
Should we keep the Flink context?
Let's create a table using `iceberg_catalog.nyc.taxis` where `iceberg_catalog` is the catalog name, `nyc` is the database name, and `taxis` is the table name.

```sql
CREATE TABLE iceberg_catalog.nyc.taxis
```
I'm curious, why are we fully-qualifying the table name here when we set the default catalog and database name above?
Good point. Copy pasta from the Spark quickstart.
Fixed: bdd5763
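(For illustration only, since the actual schema lives in the quickstart doc itself: the fixed form names the catalog and database inline, along the lines of this sketch, where the columns are invented placeholders.)

```sql
-- Hedged sketch: column names/types are placeholders, not the quickstart's real schema.
-- Fully qualifying catalog.database.table means no prior USE CATALOG / USE is needed.
CREATE TABLE iceberg_catalog.nyc.taxis (
    vendor_id BIGINT,
    trip_distance DOUBLE
);
```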
site/docs/flink-quickstart.md (Outdated)
Then make this the active catalog in your Flink SQL session:

```sql
USE CATALOG iceberg_catalog;
```

Create a database in the catalog:

```sql
CREATE DATABASE IF NOT EXISTS nyc;
```

and set it as active:

```sql
USE nyc;
```
For brevity and to avoid confusion, I would remove changing the default catalog / database and continue to use fully-qualified table names (like below).
Fixed bdd5763
site/docs/flink-quickstart.md (Outdated)
First, switch to the default catalog (otherwise the table would be created using the Iceberg details that we configured in the catalog definition above):

```sql
USE CATALOG default_catalog;
```
I would prefer to avoid changing the default catalog because that would make these examples easier to read.
Fixed bdd5763
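(As an aside, the same fully-qualified style carries over to reads and writes while staying in `default_catalog`. A minimal sketch, using the invented placeholder columns from the earlier sketch:)

```sql
-- Hedged sketch: the rows are invented, for illustration only.
INSERT INTO iceberg_catalog.nyc.taxis VALUES (1, 3.5), (2, 1.2);

SELECT * FROM iceberg_catalog.nyc.taxis;
```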
```dockerfile
RUN echo "-> Install JARs: Hadoop" && \
    mkdir -p ./lib/hadoop && pushd $_ && \
    curl https://repo1.maven.org/maven2/org/apache/commons/commons-configuration2/2.1.1/commons-configuration2-2.1.1.jar -O && \
    curl https://repo1.maven.org/maven2/commons-logging/commons-logging/1.1.3/commons-logging-1.1.3.jar -O && \
    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-auth/${HADOOP_VERSION}/hadoop-auth-${HADOOP_VERSION}.jar -O && \
    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-common/${HADOOP_VERSION}/hadoop-common-${HADOOP_VERSION}.jar -O && \
    curl https://repo1.maven.org/maven2/org/apache/hadoop/thirdparty/hadoop-shaded-guava/1.1.1/hadoop-shaded-guava-1.1.1.jar -O && \
    curl https://repo1.maven.org/maven2/org/codehaus/woodstox/stax2-api/4.2.1/stax2-api-4.2.1.jar -O && \
    curl https://repo1.maven.org/maven2/com/fasterxml/woodstox/woodstox-core/5.3.0/woodstox-core-5.3.0.jar -O && \
    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-hdfs-client/${HADOOP_VERSION}/hadoop-hdfs-client-${HADOOP_VERSION}.jar -O && \
    curl https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-mapreduce-client-core/${HADOOP_VERSION}/hadoop-mapreduce-client-core-${HADOOP_VERSION}.jar -O && \
    popd
```
Do we have to use Hadoop in 2026? :)
We could use S3 without Hadoop.
that's the dream, right? ;)
```
Flink SQL> CREATE CATALOG iceberg_catalog WITH (
>   'type' = 'iceberg',
>   'catalog-impl' = 'org.apache.iceberg.rest.RESTCatalog',
>   'uri' = 'http://iceberg-rest:8181',
>   'warehouse' = 's3://warehouse/',
>   'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
>   's3.endpoint' = 'http://minio:9000',
>   's3.access-key-id' = 'admin',
>   's3.secret-access-key' = 'password',
>   's3.path-style-access' = 'true'
> );
[ERROR] Could not execute SQL statement. Reason:
java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration

Flink SQL>
```
Yes, that would be the dream 😄 I just checked the code, and yes, there is a dependency on at least Hadoop's Configuration, even with a custom catalog / IO. I think it should suffice to only include hadoop-common. We can remove all the HDFS, Guava, etc.
Would you mind giving that a try?
I have iterated over them before, but let me try again and log the details. Stand by…
OK, managed to strip three out (e0f619e):
- commons-logging
- hadoop-aws (S3 handled by iceberg-aws-bundle)
- flink-s3-fs-hadoop (S3FileIO used instead)
The others are needed though:
| JAR Name | Error point | Error |
|---|---|---|
| commons-configuration2-2.1.1.jar | jobmanager startup | java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration |
| hadoop-auth-${HADOOP_VERSION}.jar | jobmanager startup | java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName |
| hadoop-common-${HADOOP_VERSION}.jar | CREATE CATALOG | java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration |
| hadoop-shaded-guava-1.1.1.jar | CREATE CATALOG | java.lang.ClassNotFoundException: org.apache.hadoop.thirdparty.com.google.common |
| stax2-api-4.2.1.jar | CREATE CATALOG | java.lang.ClassNotFoundException: org.codehaus.stax2.XMLInputFactory2 |
| woodstox-core-5.3.0.jar | CREATE CATALOG | java.lang.ClassNotFoundException: com.ctc.wstx.io.InputBootstrapper |
| hadoop-hdfs-client-${HADOOP_VERSION}.jar | CREATE CATALOG | java.lang.ClassNotFoundException: org.apache.hadoop.hdfs.HdfsConfiguration |
| hadoop-mapreduce-client-core-${HADOOP_VERSION}.jar | SELECT | java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.lib.input.FileInputFormat |
Thanks for checking! The outcome is a bit sad; we have some cleanup to do.
```yaml
services:
  jobmanager:
    build:
      context: .
      dockerfile: Dockerfile.flink
    hostname: jobmanager
    container_name: jobmanager
    depends_on:
```
Kubernetes seems to be a more typical setup from my experience, even for local testing, e.g. via Minikube.
I think everyone who does k8s does Docker, but not everyone who does Docker does k8s… so for the sake of making it accessible to as many people as possible, I'd suggest we stick with Docker.
Could we link this file from the docs page or remove it?
I've brought it into the page itself: 68b53bc
Once you have those, save these two files into a new folder:

* [`docker-compose.yml`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/docker-compose.yml)
I haven’t tried this myself, so I’d like to double‑check whether a v2.0 path will actually be created in this case.
* MinIO (local S3 storage)
* AWS CLI (to create the S3 bucket)

* [`Dockerfile.flink`](https://raw.githubusercontent.com/apache/iceberg/refs/heads/main/flink/v2.0/quickstart/Dockerfile.flink) - base Flink image, plus some required JARs for S3 and Iceberg.
Same as above.
site/docs/flink-quickstart.md (Outdated)
```sql
USE CATALOG default_catalog;
```
I also think it would be better to avoid changing the default catalog.
Fixed bdd5763
mxm left a comment:
Thanks for the update @rmoff! I wonder if we can reduce the Hadoop dependencies (see https://github.com/apache/iceberg/pull/15062/files#r2720565630).
mxm left a comment:
LGTM. Thank you @rmoff!
Removed an unnecessary blank line in the documentation.
Let's see where we land on the image location.

This is modelled on the existing Spark quickstart.
It uses a Docker Compose file and a Dockerfile; I've put these under
`/flink/v2.0/quickstart` in the repo, but not sure if that's the right location :)