Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Kernel] Add monotonic inCommitTimestamp and read support for inCommitTimestamp #3282

Merged
merged 1 commit into from
Jun 28, 2024

Conversation

EstherBear
Copy link
Contributor

Which Delta project/connector is this regarding?

  • Spark
  • Standalone
  • Flink
  • Kernel
  • Other (fill in here)

Description

Add read support for inCommitTimestamp and ensure the increasing monotonicity of inCommitTimestamp assuming that there are no conflicts to prepare for the complete inCommitTimestamp support with conflict resolution in Kernel.

How was this patch tested?

Add unit tests to verify that the read of inCommitTimestamp is correct and inCommitTimestamp is monotonic.

Does this PR introduce any user-facing changes?

Yes, user can enable monotonic inCommitTimestamp assuming that there are no conflicts by enabling its property.

@EstherBear
Copy link
Contributor Author

This PR is based on #3276 merge it first.

@EstherBear EstherBear changed the title Add monotonic inCommitTimestamp and read support for inCommitTimestamp [Kernel] Add monotonic inCommitTimestamp and read support for inCommitTimestamp Jun 18, 2024
@@ -121,4 +125,22 @@ public Path getLogPath() {
public Path getDataPath() {
return dataPath;
}

public long getTimestamp(Engine engine) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a docstring.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, it looks like the Commit file be read every time getTimestamp is invoked. This is inefficient.

  1. We should read it once and store it in a variable.
  2. As a future optimization, this should be read as part of LogReplay.loadTableProtocolAndMetadata and then passed by SnapshotManager to Snapshot

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it will be only read once? inCommitTimestampOpt will be present for the following reads except the first read.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, thanks for pointing that out.

Copy link
Collaborator

@vkorukanti vkorukanti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks nice! Few comments!

@@ -41,6 +43,7 @@ public class SnapshotImpl implements Snapshot {
private final Protocol protocol;
private final Metadata metadata;
private final LogSegment logSegment;
private Optional<Long> inCommitTimestampOpt;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Lazy instance to load it atmost once. See the examples of Lazy in Metadata.java

@@ -121,4 +125,33 @@ public Path getLogPath() {
public Path getDataPath() {
return dataPath;
}

/**
* Returns the timestamp of the latest commit of this snapshot.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it say: Returns the timestamp of when this table snapshot version is created?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can also be the file modification time as a fallback? So it's not necessarily of the time when this table snapshot version is created?

}
return inCommitTimestampOpt.get();
} else {
return logSegment.lastCommitTimestamp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the LogSegment always initialized with the correct lastCommitTimestamp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the modification time of the file. When we couldn't get ICT, it's a fallback.

@EstherBear
Copy link
Contributor Author

next #3283

if (TableConfig.IN_COMMIT_TIMESTAMPS_ENABLED.fromMetadata(metadata)) {
if (!inCommitTimestampOpt.isPresent()) {
try {
Optional<CommitInfo> commitInfoOpt = CommitInfo.getCommitInfoOpt(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally, the ICT should be read part of the protocol and metadata query in LogReplay.java. Can you create an issue to track this improvement?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EstherBear please create an issue for this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #3321

if (TableConfig.IN_COMMIT_TIMESTAMPS_ENABLED.fromMetadata(metadata)) {
if (!inCommitTimestampOpt.isPresent()) {
try {
Optional<CommitInfo> commitInfoOpt = CommitInfo.getCommitInfoOpt(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@EstherBear please create an issue for this.


val ver2Snapshot = table.getLatestSnapshot(engine).asInstanceOf[SnapshotImpl]
val ver2Timestamp = ver2Snapshot.getTimestamp(engine)
assert(ver2Timestamp == ver1Timestamp + 1)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to verify that the tables are readable. I will merge this, but could you please address it in the next PR?

@vkorukanti vkorukanti merged commit 87f0685 into delta-io:master Jun 28, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants