@dongjoon-hyun (Member) commented Oct 16, 2025

What changes were proposed in this pull request?

This PR aims to add an Apache Iceberg example.

Why are the changes needed?

To provide a working example of Apache Spark 4.0.1 with Apache Iceberg 1.10.0.

  1. Prepare the storage
$ kubectl apply -f examples/localstack.ym
  1. Launch the Spark Connect Server with the Apache Iceberg settings.
$ kubectl apply -f examples/spark-connect-server-iceberg.yaml
  1. Set up port forwarding for testing.
$ kubectl port-forward spark-connect-server-iceberg-0-driver 15002
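
`kubectl port-forward` keeps running in the foreground, so the shell in the next step is started from a second terminal. The following is a minimal sketch for checking that the forwarded port accepts connections before connecting. `wait_for_port` is a hypothetical helper name, not part of this PR, and the sketch assumes `bash` is installed with `/dev/tcp` support.

```shell
# Hypothetical helper (not part of this PR): return 0 once HOST:PORT accepts
# a TCP connection, or 1 after ATTEMPTS failed tries (default 10).
wait_for_port() {
  host="$1"
  port="$2"
  attempts="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Assumption: bash with /dev/tcp support is available. Opening
    # /dev/tcp/HOST/PORT makes bash attempt a TCP connect.
    if bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  return 1
}

# Example (second terminal, while the port-forward is running):
#   wait_for_port localhost 15002 30 && \
#     bin/spark-connect-shell --remote sc://localhost:15002
```

The check only confirms that something is listening on the local port; it does not verify that the Spark Connect Server behind it is healthy.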
  1. Test by following the Apache Iceberg Spark Quickstart guide.
$ bin/spark-connect-shell --remote sc://localhost:15002

scala> sql("""CREATE TABLE taxis(vendor_id bigint, trip_id bigint, trip_distance float, fare_amount double, store_and_fwd_flag string) PARTITIONED BY (vendor_id);""").show()

scala> sql("""INSERT INTO taxis VALUES (1, 1000371, 1.8, 15.32, 'N'), (2, 1000372, 2.5, 22.15, 'N'), (2, 1000373, 0.9, 9.01, 'N'), (1, 1000374, 8.4, 42.13, 'Y');""").show()

scala> sql("SELECT * FROM taxis").show(false)
+---------+-------+-------------+-----------+------------------+
|vendor_id|trip_id|trip_distance|fare_amount|store_and_fwd_flag|
+---------+-------+-------------+-----------+------------------+
|1        |1000374|8.4          |42.13      |Y                 |
|1        |1000371|1.8          |15.32      |N                 |
|2        |1000372|2.5          |22.15      |N                 |
|2        |1000373|0.9          |9.01       |N                 |
+---------+-------+-------------+-----------+------------------+

scala> sql("SELECT * FROM taxis.history").show(false)
+-----------------------+-------------------+---------+-------------------+
|made_current_at        |snapshot_id        |parent_id|is_current_ancestor|
+-----------------------+-------------------+---------+-------------------+
|2025-10-16 03:53:04.063|6463217948421571140|NULL     |true               |
+-----------------------+-------------------+---------+-------------------+

scala> sql("SELECT * FROM taxis VERSION AS OF 6463217948421571140").show(false)
+---------+-------+-------------+-----------+------------------+
|vendor_id|trip_id|trip_distance|fare_amount|store_and_fwd_flag|
+---------+-------+-------------+-----------+------------------+
|1        |1000374|8.4          |42.13      |Y                 |
|1        |1000371|1.8          |15.32      |N                 |
|2        |1000372|2.5          |22.15      |N                 |
|2        |1000373|0.9          |9.01       |N                 |
+---------+-------+-------------+-----------+------------------+
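
The literal passed to `VERSION AS OF` above is the `snapshot_id` value reported by `taxis.history`. If the shell output has been saved as text, the ID can be extracted instead of copied by hand. The following is a sketch under that assumption; `latest_snapshot_id` is a hypothetical helper name, and it simply takes the last data row of the `show()` table.

```shell
# Hypothetical helper: read a Spark show() table of `taxis.history` on stdin
# and print the snapshot_id column of the last data row.
latest_snapshot_id() {
  awk -F'|' '
    # Data rows start with "|"; the third "|"-separated field is snapshot_id.
    # The header row is skipped because its third field is not numeric.
    /^\|/ && $3 ~ /^[0-9]+ *$/ { id = $3 }
    END { gsub(/ /, "", id); print id }
  '
}

# The history table exactly as printed in the session above.
id="$(latest_snapshot_id <<'EOF'
+-----------------------+-------------------+---------+-------------------+
|made_current_at        |snapshot_id        |parent_id|is_current_ancestor|
+-----------------------+-------------------+---------+-------------------+
|2025-10-16 03:53:04.063|6463217948421571140|NULL     |true               |
+-----------------------+-------------------+---------+-------------------+
EOF
)"

# Build the time-travel query for that snapshot.
printf 'SELECT * FROM taxis VERSION AS OF %s\n' "$id"
# prints: SELECT * FROM taxis VERSION AS OF 6463217948421571140
```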
  1. Check the data in the storage.
root@localstack:/opt/code/localstack# awslocal s3 ls s3://warehouse/ --recursive
2025-10-16 03:53:03       1545 taxis/data/vendor_id=1/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00002.parquet
2025-10-16 03:53:03       1590 taxis/data/vendor_id=2/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00001.parquet
2025-10-16 03:53:04       7559 taxis/metadata/9f629d40-ce91-4822-aeee-283d53ec5ef6-m0.avro
2025-10-16 03:53:04       4446 taxis/metadata/snap-6463217948421571140-1-9f629d40-ce91-4822-aeee-283d53ec5ef6.avro
2025-10-16 03:52:47       1006 taxis/metadata/v1.metadata.json
2025-10-16 03:53:04       1970 taxis/metadata/v2.metadata.json
2025-10-16 03:53:04          1 taxis/metadata/version-hint.text
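
The listing shows the usual Iceberg table layout: Parquet data files under `data/`, partitioned by `vendor_id`, and under `metadata/` a manifest (`*-m0.avro`), a manifest list (`snap-<snapshot_id>-*.avro`), the table metadata versions (`v1.metadata.json`, `v2.metadata.json`), and `version-hint.text`. The following is a small sketch that totals the listing per directory; `summarize_listing` is a hypothetical helper name, and the input is the listing above pasted verbatim.

```shell
# Hypothetical helper: summarize `awslocal s3 ls --recursive` output on stdin
# by the second path component of each key ("data" or "metadata").
summarize_listing() {
  awk '
    NF >= 4 {
      # Column 3 is the object size in bytes; column 4 is the object key.
      n = split($4, parts, "/")
      kind = (n >= 2) ? parts[2] : "other"
      count[kind]++
      bytes[kind] += $3
    }
    END {
      for (k in count) printf "%s %d files %d bytes\n", k, count[k], bytes[k]
    }
  ' | sort
}

summarize_listing <<'EOF'
2025-10-16 03:53:03       1545 taxis/data/vendor_id=1/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00002.parquet
2025-10-16 03:53:03       1590 taxis/data/vendor_id=2/00000-3-749fe2e5-bbe3-4f0e-b976-a21749550705-0-00001.parquet
2025-10-16 03:53:04       7559 taxis/metadata/9f629d40-ce91-4822-aeee-283d53ec5ef6-m0.avro
2025-10-16 03:53:04       4446 taxis/metadata/snap-6463217948421571140-1-9f629d40-ce91-4822-aeee-283d53ec5ef6.avro
2025-10-16 03:52:47       1006 taxis/metadata/v1.metadata.json
2025-10-16 03:53:04       1970 taxis/metadata/v2.metadata.json
2025-10-16 03:53:04          1 taxis/metadata/version-hint.text
EOF
# prints:
#   data 2 files 3135 bytes
#   metadata 5 files 14982 bytes
```

The two data files correspond to the two `vendor_id` partitions written by the single `INSERT`, and the snapshot ID in the manifest list name matches the one returned by `taxis.history`.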

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Manually tested.

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun (Member Author)

cc @viirya, @peter-toth, @jiangzho

@dongjoon-hyun (Member Author)

Thank you, @viirya. Merged to main.

@dongjoon-hyun deleted the SPARK-53933 branch on October 16, 2025 06:28

@peter-toth (Contributor) left a comment

Late LGTM.

@dongjoon-hyun (Member Author)

Thank you, @peter-toth.
