New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BEAM-12333] Changing TimerKey to include TimerFamilyId #14802
Conversation
// encode a byte (1/0) to indicate the presence/absence of timerFamilyId | ||
// We can use this approach to add additional fields in the future | ||
if (!Strings.isNullOrEmpty(value.getTimerFamilyId())) { | ||
BOOLEAN_CODER.encode(true, outStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you help me understand why do we need a boolean flag? I suppose inStream.available() > 0
could be sufficient to indicate there will be extra fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was just for future proofing in the event we add more fields to TimerKey
. TimerFamilyId
is optional and in the case it is null/empty, I wanted to encode a flag to indicate the absence of a timer family id. If we add more fields in the future, we can easily decode in a backward compatible way using the flag.
The only way we can probably omit encoding this boolean flag is to encode an empty string ("") whenever family id is not present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got you, actually we do not need that because when timer family id is not present, we will still encode null
in the bytes to actually indicate that timer family id is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajothomas : I think we should be able to blindly encode timeFamilyId as it's going to be defaulted as blank "". See TimerInternals: line 198. The new TimerDataCoder (TimerDataCoderV2) also encode it directly without the need to do null check. We should be able to follow that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching this. As long as it defaults to empty string, StringUtf8Coder
will be able to encode/decode it. Will remove the boolean flag.
// encode a byte (1/0) to indicate the presence/absence of timerFamilyId | ||
// We can use this approach to add additional fields in the future | ||
if (!Strings.isNullOrEmpty(value.getTimerFamilyId())) { | ||
BOOLEAN_CODER.encode(true, outStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ajothomas : I think we should be able to blindly encode timeFamilyId as it's going to be defaulted as blank "". See TimerInternals: line 198. The new TimerDataCoder (TimerDataCoderV2) also encode it directly without the need to do null check. We should be able to follow that.
// check if the stream has more available bytes. This is to ensure backward compatibility with | ||
// old rocksdb state | ||
// which does not encode timer family data | ||
if (inStream.available() > 0 && BOOLEAN_CODER.decode(inStream)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As commented above, we can ignore this bool flag.
c1c5ecc
to
48c17e4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
return new TimerKey<>(key, namespace, timerId); | ||
return timerKeyBuilder.build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we do not need to declare the builder at the begining
we can do
return TimerKey.builder().setX().setY().setZ().build();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RIght, it seems nicer to have the builder in the end instead of cutting it in half.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed it.
} | ||
|
||
STRING_CODER.encode(value.getTimerFamilyId(), outStream); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the answer is Yes but just to double check whether we want to encode it if timer family id is an empty string.
48c17e4
to
3eeb6bc
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Hi there, this PR breaks https://ci-beam.apache.org/job/beam_PostCommit_Java_ValidatesRunner_Samza/. I just filed https://issues.apache.org/jira/browse/BEAM-12358 for tracking and please consider either reverting this change or having a fix. |
thanks, @boyuanzz, we are looking into it. |
Thank you : ) |
SamzaTimerInternalsFactory
stores timer state, event time and processing time timers alike, in the form of aMapState<TimerKey<K>, Long>
.TimerKey
, however, doesn't include theTimerFamilyId
and therefore the timer family information is not persisted. On the other hand, event time timers uses an additional time sorted set to persist the TimerData(which contains TimerFamilyId). We need to include timer family id in the TimerKey to ensure that it is persisted for Processing time timers.This PR aims to add
TimerFamilyId
toTimerKey
. Additionally, I am switching to the use ofAutoValue
to reduce boilerplate code.ValidatesRunner
compliance status (on master branch)Examples testing status on various runners
Post-Commit SDK/Transform Integration Tests Status (on master branch)
SamzaTimerInternalsFactory
stores timer state, event time and processing time timers alike, in the form of a MapState<TimerKey, Long>. TimerKey, however, doesn't include theTimerFamilyId
and is therefore not persisted. On the other hand, event time timers uses an additional time sorted set to persist the TimerData(which contains TimerFamilyId). We need to include timer family id in the TimerKey to ensure that it is persisted.Pre-Commit Tests Status (on master branch)
See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI.