Handle inaccurate information in backplane for fmb call #1381
Conversation
// ignore this, the worker will update the backplane eventually
} else if (status.getCode() != Code.DEADLINE_EXCEEDED
    && SHARD_IS_RETRIABLE.test(status)) {
  // why not, always
  workers.addLast(worker);
} else {
  log.log(
      configs.getServer().isEnsureOutputsPresent() ? Level.SEVERE : Level.WARNING,
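To make the branch logic in the snippet above easier to follow, here is a minimal, self-contained sketch of a retriability predicate in the spirit of SHARD_IS_RETRIABLE. The Code enum and isRetriable are hypothetical stand-ins for illustration; the real code uses io.grpc.Status codes and Buildfarm's own predicate, and DEADLINE_EXCEEDED is excluded separately by the caller as in the snippet:

```java
public class RetriableCheck {
  // Hypothetical stand-in for gRPC status codes; the actual implementation
  // uses io.grpc.Status.Code.
  enum Code { OK, UNAVAILABLE, UNKNOWN, DEADLINE_EXCEEDED, NOT_FOUND }

  // A sketch of a retriability predicate: transient transport-style failures
  // are retriable; DEADLINE_EXCEEDED is handled by a separate branch upstream.
  static boolean isRetriable(Code code) {
    return code == Code.UNAVAILABLE || code == Code.UNKNOWN;
  }

  public static void main(String[] args) {
    System.out.println(isRetriable(Code.UNAVAILABLE));       // true
    System.out.println(isRetriable(Code.DEADLINE_EXCEEDED)); // false
  }
}
```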
this is not a severe error, even if we're checking for an AC presence via EOP. SEVERE logs are reserved for bugs or consistency failures.
With the bwob flag enabled, this one failure causes the whole build to fail, and I believe the EOP flag was added to support the bwob feature. So I thought SEVERE might be the better classification.
Build failures caused by expected behavior or conditions are still not SEVERE. Please change this, and the log elevation in bytestream, to remain a warning.
Reverted log level as per your suggestion.
List<String> workerSet = client.call(jedis -> state.storageWorkers.mget(jedis, workerNames));

return workerSet.stream()
    .filter(Objects::nonNull)
why would you get null entries from a workerSet? (also calling it workerSet and being a list is a little confusing)
Redis.hmget returns null if the field doesn't exist in the map. Also updated unit test to cover this case.
https://redis.io/commands/hmget/
For every field that does not exist in the hash, a nil value is returned. Because non-existing keys are treated as empty hashes, running HMGET against a non-existing key will return a list of nil values.
Renamed from workerSet to workerList, thanks!
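The HMGET behavior discussed above is why the nonNull filter is needed: the raw result list is positional and contains a null for every missing field. A minimal sketch of that filtering step, with filterPresent as a hypothetical helper name for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

public class WorkerListFilter {
  // HMGET returns the values in request order and substitutes null (nil)
  // for every field missing from the hash, so nulls must be stripped
  // before the result can be used as a list of live workers.
  public static List<String> filterPresent(List<String> rawMgetResult) {
    return rawMgetResult.stream()
        .filter(Objects::nonNull)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Simulate an HMGET response where the second worker's field is absent.
    List<String> raw = Arrays.asList("worker-a-data", null, "worker-c-data");
    System.out.println(filterPresent(raw)); // [worker-a-data, worker-c-data]
  }
}
```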
@werkt Can you please take a look again? I ran the shadow experiment for two days. For us (with 120 workers), the average latency of the FMB call dropped 15x: 4.5 ms with this change vs 64 ms without.
@@ -660,7 +661,7 @@ public ListenableFuture<Iterable<Digest>> findMissingBlobs(
   // risk of returning expired workers despite filtering by active workers below. This is because
   // the strategy may return workers that have expired in the last 30 seconds. However, checking
   // workers directly is not a guarantee either since workers could leave the cluster after being
-  // queried. Ultimitely, it will come down to the client's resiliency if the backplane is
+  // queried. Ultimately, it will come down to the client's resiliency if the backplane is
   // out-of-date and the server lies about which blobs are actually present. We provide this
   // alternative strategy for calculating missing blobs.
This needs a refactor, whether before or after landing, to properly partition FMB into backplane and worker segments for clarity. If it's to be after, please follow up with an issue.
Created followup task, I will work on this next. #1393
Not the biggest fan of passing epoch seconds around instead of Instant, but this seems well compartmentalized. Looking forward to the refactor.
A naive question: why don't workers have a unique id that changes with each restart, so we can uniquely identify them without referring to IP and start time?
That was also an option, but I chose this approach for the reasons below.
Problem
When workers die, their stored references are not removed from the backplane. This creates the possibility that new workers may come up with the same IP address or use an IP address previously used by another terminated host. As a result, the backplane becomes unreliable, requiring us to query each worker individually to find missing blobs. Clearly, this approach is not scalable since any problems encountered by a single worker can significantly impact the performance of the buildfarm.
Past Work
We made code modifications for the findMissingBlobs function to exclusively query the backplane (PRs: #1310, #1333, and #1342). This update implemented the findMissingViaBackplane flag. However, the above issues made the findMissingViaBackplane flag ineffective.
Solution
To address the issue of imposter workers, the code now compares the start time of each worker (first_registered_at) with the insertion time of the digest. Any worker whose start time is later than the digest insertion time is considered an imposter worker. The code also removes imposter workers associated with the digest in the same function call.
first_registered_at: A new field added to the worker data type that stores the initial start time of the worker. The worker informs the backplane of this start time, which matches the creation time of the cache directory (where all digests are stored) on the worker's disk.
digest insert time: The digest insertion time is calculated from the remaining Time to Live (TTL) of the digest and the configured casExpire window. The formula is: insert_time = now() - configured casExpire + remaining TTL. In the current implementation, each worker refreshes the TTL of the digest upon completing a write, so the CAS insert time in the backplane corresponds to the time when the last worker finished writing the digest to its disk.
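The two checks above can be sketched as a pair of pure functions. This is a minimal illustration under the PR's stated convention of passing epoch seconds; the method names digestInsertTimeEpochSec and isImposter are hypothetical, not the actual Buildfarm API:

```java
public class ImposterWorkerCheck {
  // Recover the approximate insertion time of a digest from its remaining
  // TTL and the configured casExpire window (all values in epoch seconds):
  //   insertTime = now - casExpire + remainingTtl
  static long digestInsertTimeEpochSec(long nowEpochSec, long casExpireSec, long remainingTtlSec) {
    return nowEpochSec - casExpireSec + remainingTtlSec;
  }

  // A worker that first registered *after* the digest was inserted cannot
  // have written that digest; treat it as an imposter.
  static boolean isImposter(long workerFirstRegisteredAtEpochSec, long digestInsertTimeEpochSec) {
    return workerFirstRegisteredAtEpochSec > digestInsertTimeEpochSec;
  }

  public static void main(String[] args) {
    long now = 1_700_000_000L;
    long casExpire = 86_400L;   // hypothetical 24h configured expiry
    long remainingTtl = 3_600L; // digest has 1h of TTL left
    long insertTime = digestInsertTimeEpochSec(now, casExpire, remainingTtl);
    // The digest was inserted ~23h ago; a worker registered 1h ago is an imposter.
    System.out.println(isImposter(now - 3_600L, insertTime));   // true
    // A worker registered 2 days ago predates the digest: not an imposter.
    System.out.println(isImposter(now - 172_800L, insertTime)); // false
  }
}
```

This also makes the Future Improvement caveat below concrete: insertTime shifts linearly with casExpire, so changing cas_expire between deployments skews every recovered insertion time.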
Testing
Deployed the change to our buildfarm staging and ran a full monorepo build. To make sure the code change solves the terminated-worker problem, we terminated a bunch of workers in the middle of the build. This caused temporary not_found errors, which eventually faded away (the FMB call auto-corrects blob locations). In the graph above, workers were terminated during the first build.
Future Improvement
The above solution might not work if a user updates the cas_expire time between two deployments, since the algorithm that calculates digest_insert_time depends on the cas_expire time.

closes #1371