
Time travel does not work #1463

@lordk911

Description


I'm testing with Spark 3.0.1, CDH 5.14, and Iceberg 0.9.1, using spark-shell.
The catalog config is:

spark.sql.catalog.hadoop_prod               org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.hadoop_prod.type          hadoop
spark.sql.catalog.hadoop_prod.warehouse     hdfs://hdfsnamespace/user/hive/warehouse

I created a table:
scala> spark.sql("CREATE TABLE hadoop_prod.ice.icetest (id bigint, data string) USING iceberg PARTITIONED BY (id) ")

Insert some values:

scala> spark.sql("INSERT INTO hadoop_prod.ice.icetest VALUES (1, 'a'), (2, 'b'), (3, 'c')")
scala> spark.sql("select * from hadoop_prod.ice.icetest ").show(false)
+---+----+                                                                      
|id |data|
+---+----+
|1  |a   |
|2  |b   |
|3  |c   |
+---+----+

Delete a partition:

scala> spark.sql("delete from hadoop_prod.ice.icetest where id=1").show(false)
scala> spark.sql("select * from hadoop_prod.ice.icetest ").show(false)
+---+----+
|id |data|
+---+----+
|2  |b   |
|3  |c   |
+---+----+

Insert some values:

scala> spark.sql("INSERT INTO hadoop_prod.ice.icetest VALUES (1, 'a'), (1, 'b'), (1, 'c')")
scala> spark.sql("select * from hadoop_prod.ice.icetest ").show(false)
+---+----+
|id |data|
+---+----+
|2  |b   |
|3  |c   |
|1  |a   |
|1  |b   |
|1  |c   |
+---+----+

Show the snapshots:

scala> spark.sql("select committed_at, snapshot_id, parent_id, operation from hadoop_prod.ice.icetest.snapshots").show(false)
+-----------------------+-------------------+------------------+---------+
|committed_at           |snapshot_id        |parent_id         |operation|
+-----------------------+-------------------+------------------+---------+
|2020-09-16 13:32:39.952|628886310322778010 |null              |append   |
|2020-09-16 13:42:34.109|598127609483871079 |628886310322778010|delete   |
|2020-09-16 13:43:14.415|6880502734717374864|598127609483871079|append   |
+-----------------------+-------------------+------------------+---------+

But every snapshot I read shows the same, latest state of the table:

scala> val df2 = spark.read.option("snapshot-id", 628886310322778010L).table("hadoop_prod.ice.icetest")
df2: org.apache.spark.sql.DataFrame = [id: bigint, data: string]

scala> df2.show
+---+----+                                                                      
| id|data|
+---+----+
|  2|   b|
|  3|   c|
|  1|   a|
|  1|   b|
|  1|   c|
+---+----+
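A possible explanation: in Spark 3.0, DataFrameReader.table() does not forward read options to the data source (this was only addressed in later Spark versions, SPARK-32592), so the snapshot-id option is silently dropped and the read falls back to the current snapshot. A path-based load does pass options through. A sketch of the workaround, assuming the table path is derived from the warehouse config above (hdfs://hdfsnamespace/user/hive/warehouse plus the ice.icetest namespace/table):

scala> // read a specific snapshot via the table path instead of the catalog name;
scala> // the snapshot-id option reaches the Iceberg source this way
scala> val df3 = spark.read.
     |   option("snapshot-id", 628886310322778010L).
     |   format("iceberg").
     |   load("hdfs://hdfsnamespace/user/hive/warehouse/ice/icetest")

scala> df3.show

If this is indeed the cause, df3 should show only the three rows from the first append.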
