Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-10325] [State TTL] Refactor TtlListState to use only loops, no java stream API for performance #6683

Closed
wants to merge 2 commits into from

Conversation

Projects
None yet
5 participants
@azagrebin
Copy link
Contributor

commented Sep 12, 2018

What is the purpose of the change

Refactor TtlListState to use only loops instead of java stream API to exclude any performance impact

Brief change log

  • Refactor TtlListState.updateTs() to use loop
  • Refactor TtlListState.collect() to use loop
  • auxiliary methods in TtlUtils

Verifying this change

existing tests

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (yes)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not applicable)

@azagrebin azagrebin force-pushed the azagrebin:FLINK-10325 branch from 312b665 to 3b7ce70 Sep 12, 2018

@azagrebin azagrebin force-pushed the azagrebin:FLINK-10325 branch from 3b7ce70 to 8259e5d Sep 12, 2018

@Aitozi

This comment has been minimized.

Copy link
Contributor

commented Sep 13, 2018

Hi @azagrebin ,does for loop perform better than stream api ? And I have not found an existing test for the performance of TtlListState, may be we should add one to verify this change ?

@Clarkkkkk

This comment has been minimized.

Copy link
Contributor

commented Sep 14, 2018

Hi @Aitozi, loop does perform better than stream api, but the gain by using loop seems so little with the operations like update and collect. However, for the simplicity, I prefer the stream api. What do you think? @azagrebin

@tillrohrmann
Copy link
Contributor

left a comment

Thanks for the improvement @azagrebin. The changes look good to me. I'll address my own comments while merging your PR.

.map(this::rewrapWithNewTs)
.collect(Collectors.toList());
private void updateTs(List<TtlValue<T>> ttlValues) throws Exception {
List<TtlValue<T>> unexpiredWithUpdatedTs = new ArrayList<>();

This comment has been minimized.

Copy link
@tillrohrmann

tillrohrmann Sep 14, 2018

Contributor

Would it make sense to initialize the capacity of this ArrayList to ttlValues.size? That way we would save resizing work.

@@ -116,7 +125,12 @@ public void updateInternal(List<T> valueToStore) throws Exception {
}

private List<TtlValue<T>> withTs(List<T> values) {
return values.stream().map(this::wrapWithTs).collect(Collectors.toList());
List<TtlValue<T>> withTs = new ArrayList<>();

This comment has been minimized.

Copy link
@tillrohrmann

tillrohrmann Sep 14, 2018

Contributor

Same here with initial capacity.

}

static <V> boolean expired(TtlValue<V> ttlValue, long ttl, long currentTimestamp) {
return ttlValue != null && expired(ttlValue.getLastAccessTimestamp(), ttl, currentTimestamp);

This comment has been minimized.

Copy link
@tillrohrmann

tillrohrmann Sep 14, 2018

Contributor

If ttlValue can be null, then we should annotate it with @Nullable.

@tillrohrmann

This comment has been minimized.

Copy link
Contributor

commented Sep 14, 2018

@Clarkkkkk the TtlListState is somewhat performance critical and, thus, it makes sense imo to improve it as much as possible. One of the problems was also that we queried the timestamp for each element when updating it. With Andrey's changes this won't happen.

@asfgit asfgit closed this in 2e21582 Sep 14, 2018

asfgit pushed a commit that referenced this pull request Sep 14, 2018

[FLINK-10325] [State TTL] Refactor TtlListState to use only loops, no…
… java stream API for performance

This closes #6683.
@Clarkkkkk

This comment has been minimized.

Copy link
Contributor

commented Sep 16, 2018

@tillrohrmann Thanks for your reply. Sounds reasonable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.