[FLINK-9601][state]Try to reuse the snapshotData array as the partitioned destination only when its capacity is enough #6174

Closed
wants to merge 1 commit

Conversation

sihuazhou
Contributor

What is the purpose of the change

In CopyOnWriteStateTableSnapshot, we only reuse the snapshotData as the partitioned destination array when its capacity is enough.
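
For illustration, a minimal self-contained Java sketch of the capacity check described above; the class, method, and parameter names are hypothetical placeholders, not the actual CopyOnWriteStateTable code:

```java
/**
 * Sketch of the proposed capacity check. The names used here
 * (selectDestination, snapshotData, numberOfEntries) are illustrative
 * placeholders, not the real CopyOnWriteStateTable API.
 */
final class PartitioningDestinationSketch {

    static Object[] selectDestination(Object[] snapshotData, int numberOfEntries) {
        if (snapshotData.length >= numberOfEntries) {
            // The snapshot array can hold all flattened entries,
            // so reuse it directly as the partitioning destination.
            return snapshotData;
        }
        // Otherwise allocate a separate destination array that is large enough.
        return new Object[numberOfEntries];
    }
}
```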

Brief change log

  • Try to reuse the snapshotData array as the partitioned destination only when its capacity is enough.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes)
  • The S3 file system connector: (no)

Documentation

No

@sihuazhou
Contributor Author

CC @StefanRRichter

@sihuazhou sihuazhou changed the title [FLINNK-9601][state]Try to reuse the snapshotData array as the partitioned destination on… [FLINK-9601][state]Try to reuse the snapshotData array as the partitioned destination on… Jun 16, 2018
@sihuazhou sihuazhou changed the title [FLINK-9601][state]Try to reuse the snapshotData array as the partitioned destination on… [FLINK-9601][state]Try to reuse the snapshotData array as the partitioned destination only when its capacity is enough Jun 16, 2018
@StefanRRichter
Contributor

Thanks for the contribution. I think the general idea is good, but the implementation can be improved. How about changing CopyOnWriteStateTable::snapshotTableArrays() in such a way that the returned snapshot array is already created with a size of max(whateverTheCurrentCodeDoes, size())? This prevents us from ever having to create another array, in particular in this corner case where we are dealing with huge arrays. Then the remaining code should automagically work without further changes. Adding a comment noting that our optimized snapshotting algorithm assumes the array can hold the flattened entries might also help. What do you think?
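
A rough Java sketch of this sizing suggestion, assuming a hypothetical parameter that stands in for whatever length the current code computes (this is not the actual Flink implementation):

```java
/**
 * Sketch of the suggested allocation in snapshotTableArrays().
 * currentlyComputedLength stands in for the length the existing code
 * would choose today; stateTableSize corresponds to size().
 */
final class SnapshotArraySizingSketch {

    static Object[] allocateSnapshotArray(int currentlyComputedLength, int stateTableSize) {
        // Always make the snapshot array large enough to hold the flattened
        // entries, so later partitioning can reuse it without another allocation.
        int length = Math.max(currentlyComputedLength, stateTableSize);
        return new Object[length];
    }
}
```

With the array sized this way, the capacity check sketched earlier always succeeds and the snapshot array can be reused unconditionally.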

@sihuazhou
Contributor Author

@StefanRRichter Thanks for the review. That makes a lot of sense; I will change it according to your suggestion.

@sihuazhou sihuazhou force-pushed the FLINK-9601 branch 2 times, most recently from 8a33522 to 6a16846 on June 17, 2018 01:46
…leArrays()` is big enough to hold the flattened entries.
@StefanRRichter
Contributor

LGTM now 👍 Will merge.

@asfgit asfgit closed this in 0e9b066 Jun 18, 2018
sampathBhat pushed a commit to sampathBhat/flink that referenced this pull request Jul 26, 2018
…ateTable#snapshotTableArrays()` is big enough to hold all (flattened) entries

This closes apache#6174.