
[FLINK-24064][connector/common] HybridSource restore from savepoint #17111

Merged: 1 commit merged into apache:master on Sep 3, 2021

Conversation

@tweise (Contributor) commented on Sep 2, 2021

What is the purpose of the change

Restore from a savepoint fails because the underlying splits are deserialized before the underlying enumerator has been restored (details in JIRA). With this change, deserialization is deferred and performed explicitly in the HybridSource enumerator/reader.
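
To illustrate the approach, here is a minimal sketch (not the actual PR code; class, field, and method names are hypothetical): the wrapper split keeps the underlying split as raw bytes together with the serializer version that wrote them, and the enumerator/reader unwraps it explicitly once the underlying source and its serializer are available.

    import org.apache.flink.api.connector.source.SourceSplit;
    import org.apache.flink.core.io.SimpleVersionedSerializer;

    import java.io.IOException;

    /** Hypothetical wrapper split: keeps raw bytes until the source is known. */
    class LazyWrappedSplit implements SourceSplit {
        private final String splitId;
        private final int sourceIndex;
        private final byte[] wrappedBytes; // serialized underlying split
        private final int wrappedSerializerVersion; // version that wrote the bytes

        LazyWrappedSplit(String splitId, int sourceIndex, byte[] wrappedBytes, int version) {
            this.splitId = splitId;
            this.sourceIndex = sourceIndex;
            this.wrappedBytes = wrappedBytes;
            this.wrappedSerializerVersion = version;
        }

        @Override
        public String splitId() {
            return splitId;
        }

        /** Deferred deserialization, invoked by enumerator/reader after restore. */
        <T extends SourceSplit> T unwrap(SimpleVersionedSerializer<T> serializer)
                throws IOException {
            return serializer.deserialize(wrappedSerializerVersion, wrappedBytes);
        }
    }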

Verifying this change

Existing tests don't cover restore from a savepoint (the ITCase performs recovery from initial state). Deserialization of the hybrid split and the enumerator checkpoint is covered by unit tests. The changes were verified with an internal deployment. Planning to add a unit test that just deserializes HybridSourceSplit before merging.

@tweise requested a review from AHeise on September 2, 2021 03:31
@flinkbot (Collaborator) commented on Sep 2, 2021

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 8b30569 (Thu Sep 02 03:35:04 UTC 2021)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

@tweise (Author) commented on Sep 2, 2021

@AHeise @stevenzwu please take a look at the deserialization change in general. I'm planning some more cleanup work on this PR tomorrow, but would also like this to go into the 1.14 release.

@flinkbot (Collaborator) commented on Sep 2, 2021

CI report:

Bot commands
The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@@ -92,13 +90,13 @@

     private final List<SourceListEntry> sources;
     // sources are populated per subtask at switch time
-    private final Map<Integer, Source> switchedSources;
+    private final HybridSourceSplitSerializer.SwitchedSources switchedSources;


Why is SwitchedSources nested inside HybridSourceSplitSerializer?


Did I miss something? I don't see switchedSources used anywhere.

Contributor:

Yes, please remove. switchedSources acted like a shared cache, which is no longer necessary. (Not sure how I missed that in the initial review; I guess I was too focused on the API.)

This should now just be a field in enumerator/reader that caches the sources.

@tweise (Author) Sep 2, 2021

That came in after moving away from the fixed source sequence that originally both the enumerator and the serializer had access to. They still needed access to the underlying serializer, and therefore to the source that provided it. Now that the serializers are decoupled, this hacky construct is no longer needed. I just missed it in the refactor; thanks @stevenzwu for catching it.

@AHeise (Contributor) left a comment:

Thanks for providing this fix! I left some comments, see below.

        this.switchedSources = switchedSources;
        this.cachedSerializers = new HashMap<>();
    }

    public HybridSourceEnumeratorStateSerializer() {}
Contributor:

Much cleaner now!

            return source.getSplitSerializer();
        }));

    /** Sources that participated in switching with cached serializers. */
    public static class SwitchedSources implements Serializable {
@AHeise (Contributor) Sep 2, 2021

I don't see why this is a nested class here.


        return wrappedStateBytes;
    }

    public int wrappedStateSerializerVersion() {


Nit: wrappedStateSerializerVersion -> getWrappedStateSerializerVersion, just to be consistent with Flink style.

    out.writeInt(enumStateBytes.length);
    out.write(enumStateBytes);
    out.writeInt(enumState.wrappedStateSerializerVersion());
    out.writeInt(enumState.getWrappedState().length);


An integer would limit the state size to 2 GB. Not sure if we need to worry about that or not. It can happen if the historical storage (like HDFS or Iceberg) has many files/splits for the bootstrap scan.


Each Iceberg split contains data files, delete files (for upsert), and a schema string. Each data file also carries stats for every column. If the table is wide (many columns), each split may go over 10 KB.


This discussion is probably outside the scope of this PR.

Contributor:

Please note that the limit would not be the integer used to represent the size, but rather the byte[] array itself, which cannot grow beyond that. We do not have 64-bit arrays in Java: https://www.nayuki.io/page/large-arrays-proposal-for-java


Yes, understood. It is the choice of byte[], which is then a limitation of the SimpleVersionedSerializer API.

@tweise (Author):

I also wonder whether we would hit other issues with such large state serialized in the coordinator. Can IcebergSource limit the number of splits it keeps in the checkpoint and only add more once some have been processed?


@tweise I have thought about adding that optimization (limiting the number of splits) for streaming reads in the future. For a bounded job we can't. On the other hand, a bounded job may not need checkpointing enabled.
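
For context on the framing discussed above, here is a minimal sketch of the corresponding read side (hypothetical code, mirroring the quoted writeInt/write calls); both the int length prefix and Java's byte[] cap a single payload at roughly 2 GiB:

    import java.io.DataInputStream;
    import java.io.IOException;

    // Hypothetical read side matching writeInt(length) followed by write(bytes):
    static byte[] readFrame(DataInputStream in) throws IOException {
        int length = in.readInt(); // int prefix: at most Integer.MAX_VALUE
        byte[] bytes = new byte[length]; // a Java array cannot exceed ~2^31-1 entries
        in.readFully(bytes);
        return bytes;
    }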

@stevenzwu:

@tweise do we need a MiniCluster unit test for the savepoint trigger and restore?

    }

    public SimpleVersionedSerializer<SourceSplit> serializerOf(int sourceIndex) {
        return cachedSerializers.computeIfAbsent(


Why do we need to cache the SplitSerializer? Seems unnecessary to me.

@tweise (Author):

To avoid creating a new serializer instance per split; instead it is created once per coordinator/operator (matching how it works for the top-level source).


I see. Originally I was imagining the singleton pattern from the file source, in which case this caching is not necessary.

    @Override
    public SimpleVersionedSerializer<FileSourceSplit> getSplitSerializer() {
        return FileSourceSplitSerializer.INSTANCE;
    }

I guess it depends on the implementation. Some source implementations may construct a new object in this method, and then this caching might be beneficial, as in the sketch below.
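
To make the trade-off concrete, a hedged sketch of the caching variant (class and field names are hypothetical): one serializer instance per source index, created lazily on first use. If getSplitSerializer() returns a singleton, as in the FileSource example above, the cache is merely a cheap no-op.

    import org.apache.flink.api.connector.source.Source;
    import org.apache.flink.api.connector.source.SourceSplit;
    import org.apache.flink.core.io.SimpleVersionedSerializer;

    import java.util.HashMap;
    import java.util.Map;

    /** Hypothetical per-coordinator cache: one split serializer per source index. */
    class SplitSerializerCache {
        private final Map<Integer, Source<?, ?, ?>> switchedSources = new HashMap<>();
        private final Map<Integer, SimpleVersionedSerializer<SourceSplit>> cachedSerializers =
                new HashMap<>();

        @SuppressWarnings({"unchecked", "rawtypes"})
        SimpleVersionedSerializer<SourceSplit> serializerOf(int sourceIndex) {
            // computeIfAbsent creates the serializer once per source index,
            // even if the source constructs a new instance on every call.
            return cachedSerializers.computeIfAbsent(
                    sourceIndex,
                    idx ->
                            (SimpleVersionedSerializer)
                                    switchedSources.get(idx).getSplitSerializer());
        }
    }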

@tweise (Author) commented on Sep 3, 2021

@tweise do we need a MiniCluster unit test for the savepoint trigger and restore?

I'm going to look into adding that to HybridSourceITCase, probably outside of this PR, because I want to backport this change to the release branches without risking test instability.
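
For reference, a rough outline of what such a MiniCluster savepoint test might look like (a sketch only, against the Flink 1.14 test APIs as I understand them; the job wiring around the HybridSource is elided and the savepoint directory is a placeholder):

    import org.apache.flink.api.common.JobID;
    import org.apache.flink.client.program.ClusterClient;
    import org.apache.flink.runtime.jobgraph.JobGraph;
    import org.apache.flink.runtime.jobgraph.SavepointRestoreSettings;
    import org.apache.flink.runtime.testutils.MiniClusterResourceConfiguration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.test.util.MiniClusterWithClientResource;

    public class HybridSourceSavepointSketch {

        public static void main(String[] args) throws Exception {
            MiniClusterWithClientResource miniCluster =
                    new MiniClusterWithClientResource(
                            new MiniClusterResourceConfiguration.Builder()
                                    .setNumberTaskManagers(1)
                                    .setNumberSlotsPerTaskManager(1)
                                    .build());
            miniCluster.before();
            try {
                ClusterClient<?> client = miniCluster.getClusterClient();

                // Submit a job built around the HybridSource under test.
                JobID jobId = client.submitJob(buildJobGraph()).get();

                // A real test would first wait for the job to reach RUNNING;
                // that polling is elided here. Then take a savepoint and cancel.
                String savepointPath =
                        client.triggerSavepoint(jobId, "file:///tmp/savepoints").get();
                client.cancel(jobId).get();

                // Resubmit the same job, restored from the savepoint.
                JobGraph restored = buildJobGraph();
                restored.setSavepointRestoreSettings(
                        SavepointRestoreSettings.forPath(savepointPath));
                client.submitJob(restored).get();
            } finally {
                miniCluster.after();
            }
        }

        private static JobGraph buildJobGraph() {
            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // ... add the HybridSource under test plus a sink here ...
            return env.getStreamGraph().getJobGraph();
        }
    }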

@@ -21,18 +21,25 @@

     /** The state of hybrid source enumerator. */
     public class HybridSourceEnumeratorState {
         private final int currentSourceIndex;
-        private final Object wrappedState;
+        private byte[] wrappedStateBytes;
+        private final int wrappedStateSerializerVersion;
Member:

Suggested change:

-    private final int wrappedStateSerializerVersion;
+    private final int serializerVersion;

@tweise (Author):

I considered that as well, but prefer the verbose name to make clear that this is the serializer version for the underlying state, as opposed to that of HybridSourceEnumeratorState.

     import java.util.List;
     import java.util.Objects;

     /** Source split that wraps the actual split type. */
     public class HybridSourceSplit implements SourceSplit {

-        private final SourceSplit wrappedSplit;
+        private final byte[] wrappedSplitBytes;
+        private final int wrappedSplitSerializerVersion;
Member:

Suggested change:

-    private final int wrappedSplitSerializerVersion;
+    private final int serializerVersion;

@tweise (Author):

I considered that as well, but prefer the verbose name to make clear that this is the serializer version for the underlying split, as opposed to that of HybridSourceSplit.

@@ -57,38 +69,64 @@ public boolean equals(Object o) {
             return false;
         }
         HybridSourceSplit that = (HybridSourceSplit) o;
-        return sourceIndex == that.sourceIndex && wrappedSplit.equals(that.wrappedSplit);
+        return sourceIndex == that.sourceIndex
+                && Arrays.equals(wrappedSplitBytes, that.wrappedSplitBytes);
Member:

Don't we need splitId in equals?

@tweise (Author):

Not needed because splitId is already part of wrappedSplitBytes.
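
For intuition, a hypothetical underlying split serializer (illustrative only, not Flink code): the split id is written into the serialized form, so comparing the wrapped bytes already compares split ids.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Hypothetical underlying split serializer: the id is part of the bytes,
    // so Arrays.equals(wrappedSplitBytes, ...) implicitly covers splitId.
    static byte[] serializeSplit(String splitId, long offset) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(baos)) {
            out.writeUTF(splitId); // split id embedded in the serialized form
            out.writeLong(offset);
        }
        return baos.toByteArray();
    }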

     }

     @Override
     public String toString() {
         return "HybridSourceSplit{"
                 + "realSplit="
-                + wrappedSplit
+                + wrappedSplitBytes
                 + ", sourceIndex="
                 + sourceIndex
Member:

Add the splitId field.

@tweise (Author):

Done. I also removed wrappedSplitBytes because it doesn't provide meaningful information.

@stevenzwu left a comment:

LGTM after @SteNicholas's comments are addressed.

@tweise merged commit 2984d87 into apache:master on Sep 3, 2021
@tweise deleted the hybridsource-savepoint branch on September 3, 2021 23:23
@tweise (Author) commented on Sep 3, 2021

@stevenzwu @AHeise @SteNicholas thanks for the review!
