Closed
Description
I'm testing with Spark 3.0.1, CDH 5.14, and Iceberg 0.9.1, using spark-shell.
The catalog config is:
spark.sql.catalog.hadoop_prod org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hadoop_prod.type hadoop
spark.sql.catalog.hadoop_prod.warehouse hdfs://hdfsnamespace/user/hive/warehouse
I created a table:
scala> spark.sql("CREATE TABLE hadoop_prod.ice.icetest (id bigint, data string) USING iceberg PARTITIONED BY (id) ")
Then I inserted some values:
scala> spark.sql("INSERT INTO hadoop_prod.ice.icetest VALUES (1, 'a'), (2, 'b'), (3, 'c')")
scala> spark.sql("select * from hadoop_prod.ice.icetest ").show(false)
+---+----+
|id |data|
+---+----+
|1  |a   |
|2  |b   |
|3  |c   |
+---+----+
Then I deleted a partition:
scala> spark.sql("delete from hadoop_prod.ice.icetest where id=1").show(false)
scala> spark.sql("select * from hadoop_prod.ice.icetest ").show(false)
+---+----+
|id |data|
+---+----+
|2  |b   |
|3  |c   |
+---+----+
Then I inserted some more values:
scala> spark.sql("INSERT INTO hadoop_prod.ice.icetest VALUES (1, 'a'), (1, 'b'), (1, 'c')")
scala> spark.sql("select * from hadoop_prod.ice.icetest ").show(false)
+---+----+
|id |data|
+---+----+
|2  |b   |
|3  |c   |
|1  |a   |
|1  |b   |
|1  |c   |
+---+----+
The snapshots table shows all three commits:
scala> spark.sql("select committed_at, snapshot_id, parent_id, operation from hadoop_prod.ice.icetest.snapshots").show(false)
+-----------------------+-------------------+------------------+---------+
|committed_at           |snapshot_id        |parent_id         |operation|
+-----------------------+-------------------+------------------+---------+
|2020-09-16 13:32:39.952|628886310322778010 |null              |append   |
|2020-09-16 13:42:34.109|598127609483871079 |628886310322778010|delete   |
|2020-09-16 13:43:14.415|6880502734717374864|598127609483871079|append   |
+-----------------------+-------------------+------------------+---------+
But whichever snapshot I read, I get the same latest state of the table:
scala> val df2 = spark.read.option("snapshot-id", 628886310322778010L).table("hadoop_prod.ice.icetest")
df2: org.apache.spark.sql.DataFrame = [id: bigint, data: string]
scala> df2.show
+---+----+
| id|data|
+---+----+
|  2|   b|
|  3|   c|
|  1|   a|
|  1|   b|
|  1|   c|
+---+----+
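A possible workaround, offered as a sketch rather than a confirmed fix: if the cause is that `DataFrameReader.table` in Spark 3.0 does not pass read options through to the source (an assumption, not verified in this report), loading the table by path with `format("iceberg")` should keep the `snapshot-id` option attached to the read. The path below is also an assumption. It is derived from the warehouse setting above on the expectation that a Hadoop catalog lays tables out as `<warehouse>/<namespace>/<table>`.

```scala
// Run inside the same spark-shell session, where `spark` is already defined.

// Assumption: the Hadoop catalog stores the table at <warehouse>/<namespace>/<table>.
val tablePath = "hdfs://hdfsnamespace/user/hive/warehouse/ice/icetest"

// Load by path instead of by catalog name so the read option is not dropped.
// 628886310322778010 is the first (append) snapshot from the snapshots table above.
val firstSnapshot = spark.read
  .format("iceberg")
  .option("snapshot-id", 628886310322778010L)
  .load(tablePath)

firstSnapshot.show(false)
```

If the option is honored, this read should return the three rows from the first insert, not the latest state of the table.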