-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Kernel] Add monotonic inCommitTimestamp and read support for inCommitTimestamp #3282
Conversation
This PR is based on #3276 merge it first. |
...l/kernel-api/src/main/java/io/delta/kernel/exceptions/MissingInCommitTimestampException.java
Outdated
Show resolved
Hide resolved
...l/kernel-api/src/main/java/io/delta/kernel/exceptions/MissingInCommitTimestampException.java
Outdated
Show resolved
Hide resolved
@@ -121,4 +125,22 @@ public Path getLogPath() { | |||
public Path getDataPath() { | |||
return dataPath; | |||
} | |||
|
|||
public long getTimestamp(Engine engine) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a docstring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it looks like the Commit file be read every time getTimestamp
is invoked. This is inefficient.
- We should read it once and store it in a variable.
- As a future optimization, this should be read as part of
LogReplay.loadTableProtocolAndMetadata
and then passed by SnapshotManager to Snapshot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it will be only read once? inCommitTimestampOpt
will be present for the following reads except the first read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right, thanks for pointing that out.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionImpl.java
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala
Outdated
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala
Outdated
Show resolved
Hide resolved
a1d1196
to
1f69550
Compare
kernel/kernel-api/src/main/java/io/delta/kernel/internal/DeltaHistoryManager.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/DeltaHistoryManager.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/DeltaHistoryManager.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/DeltaHistoryManager.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/TransactionImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/SnapshotImpl.java
Show resolved
Hide resolved
kernel/kernel-defaults/src/test/scala/io/delta/kernel/defaults/InCommitTimestampSuite.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks nice! Few comments!
kernel/kernel-api/src/main/java/io/delta/kernel/exceptions/MissingCommitInfoException.java
Outdated
Show resolved
Hide resolved
...l/kernel-api/src/main/java/io/delta/kernel/exceptions/MissingInCommitTimestampException.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/DeltaHistoryManager.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/DeltaHistoryManager.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/DeltaHistoryManager.java
Outdated
Show resolved
Hide resolved
@@ -41,6 +43,7 @@ public class SnapshotImpl implements Snapshot { | |||
private final Protocol protocol; | |||
private final Metadata metadata; | |||
private final LogSegment logSegment; | |||
private Optional<Long> inCommitTimestampOpt; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use Lazy
instance to load it atmost once. See the examples of Lazy
in Metadata.java
@@ -121,4 +125,33 @@ public Path getLogPath() { | |||
public Path getDataPath() { | |||
return dataPath; | |||
} | |||
|
|||
/** | |||
* Returns the timestamp of the latest commit of this snapshot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it say: Returns the timestamp of when this table snapshot version is created
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can also be the file modification time as a fallback? So it's not necessarily of the time when this table snapshot version is created?
} | ||
return inCommitTimestampOpt.get(); | ||
} else { | ||
return logSegment.lastCommitTimestamp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the LogSegment
always initialized with the correct lastCommitTimestamp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's the modification time of the file. When we couldn't get ICT, it's a fallback.
kernel/kernel-api/src/main/java/io/delta/kernel/internal/TableImpl.java
Outdated
Show resolved
Hide resolved
kernel/kernel-api/src/main/java/io/delta/kernel/internal/actions/CommitInfo.java
Outdated
Show resolved
Hide resolved
1f69550
to
ea40a0f
Compare
0e8b844
to
7aba5e9
Compare
next #3283 |
if (TableConfig.IN_COMMIT_TIMESTAMPS_ENABLED.fromMetadata(metadata)) { | ||
if (!inCommitTimestampOpt.isPresent()) { | ||
try { | ||
Optional<CommitInfo> commitInfoOpt = CommitInfo.getCommitInfoOpt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, the ICT should be read part of the protocol and metadata query in LogReplay.java. Can you create an issue to track this improvement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EstherBear please create an issue for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created #3321
if (TableConfig.IN_COMMIT_TIMESTAMPS_ENABLED.fromMetadata(metadata)) { | ||
if (!inCommitTimestampOpt.isPresent()) { | ||
try { | ||
Optional<CommitInfo> commitInfoOpt = CommitInfo.getCommitInfoOpt( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@EstherBear please create an issue for this.
|
||
val ver2Snapshot = table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl] | ||
val ver2Timestamp = ver2Snapshot.getTimestamp(engine) | ||
assert(ver2Timestamp == ver1Timestamp + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to verify that the tables are readable. I will merge this, but could you please address it in the next PR?
Which Delta project/connector is this regarding?
Description
Add read support for inCommitTimestamp and ensure the increasing monotonicity of inCommitTimestamp assuming that there are no conflicts to prepare for the complete inCommitTimestamp support with conflict resolution in Kernel.
How was this patch tested?
Add unit tests to verify that the read of inCommitTimestamp is correct and inCommitTimestamp is monotonic.
Does this PR introduce any user-facing changes?
Yes, user can enable monotonic inCommitTimestamp assuming that there are no conflicts by enabling its property.