Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

set expire and drop invocationId #1589

Merged
merged 14 commits into from
Jan 25, 2024
Merged

Conversation

coder1363691
Copy link
Contributor

i found there are some bad case when jedis.sadd for invocationId:

  1. if invocationId is null(""), then all operations will be put in the same redis set with key name ""
  2. if we don't expire invocationId, then memory seems to leak because more and more invocationId and operations in redis

Copy link

google-cla bot commented Dec 21, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@80degreeswest
Copy link
Collaborator

Coincidentally, we just noticed a memory leak in the Redis keys in our cluster last week.

Copy link
Collaborator

@80degreeswest 80degreeswest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@coder1363691
Copy link
Contributor Author

Can you update the docs with the new config please. https://github.com/bazelbuild/bazel-buildfarm/blob/main/_site/docs/configuration/configuration.md

ok, have done update configuration, but unittest in Xcode failed somehow, i will check later

@werkt
Copy link
Collaborator

werkt commented Dec 22, 2023

@luxe why is the oss_audit ruleset trying to install python packages onto the host OS on mac?

@luxe
Copy link
Collaborator

luxe commented Dec 22, 2023

@luxe why is the oss_audit ruleset trying to install python packages onto the host OS on mac?

This is the behavior of rules_python. I'm guessing the mac env has also changed causing unhermetic issues.
I'd recommend we avoid auditing the jars on mac entirely:
#1592

@coder1363691
Copy link
Contributor Author

@luxe why is the oss_audit ruleset trying to install python packages onto the host OS on mac?

This is the behavior of rules_python. I'm guessing the mac env has also changed causing unhermetic issues. I'd recommend we avoid auditing the jars on mac entirely: #1592

Great, after use your patch, now ci passed.
i will merge this after your patch merged

Comment on lines 105 to 107
if (jedis.scard(invocationId) == 1) {
jedis.expire(invocationId, configs.getBackplane().getMaxInvocationIdTimeout());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change this to check the return value for sadd and set expiry if the addition was novel. Even better to do this in a serverside script, but I'll accept the hit of two reqs (rather than three) for now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

addition

OK, but I didn't find a method to judge whether the addition is novel based on the return value of jedis.sadd, seems

  • return 1, successful but maybe not first addition
  • return 0, failed because value already exist
  • others, error occured
    do you have any better suggestion? thanks

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the redis docs, the integer return value is:

the number of elements that were added to the set, not including all the elements already present in the set.

And it says only that an error is returned if the value is not a set (so there may be other error behaviors), but very likely a JedisException (or some such) is thrown in this and any other problem case. The integer return doesn't seem to indicate any possibility of an error return as an integer (per 'others' here).

If you're only adding one element, the novel return is 1, the duplicate return is 0.

This sequence does represent a risk if this key is created and an error occurs before expiration set, leading me to prefer a serverside script, but we run this risk in several other places, so I'll say it's not incredibly scary. A periodic scanner to validate expirations for invocation prefixes may be in order eventually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the redis docs, the integer return value is:

the number of elements that were added to the set, not including all the elements already present in the set.

And it says only that an error is returned if the value is not a set (so there may be other error behaviors), but very likely a JedisException (or some such) is thrown in this and any other problem case. The integer return doesn't seem to indicate any possibility of an error return as an integer (per 'others' here).

If you're only adding one element, the novel return is 1, the duplicate return is 0.

This sequence does represent a risk if this key is created and an error occurs before expiration set, leading me to prefer a serverside script, but we run this risk in several other places, so I'll say it's not incredibly scary. A periodic scanner to validate expirations for invocation prefixes may be in order eventually.

Here is my test:

127.0.0.1:8888> smembers myset
(empty list or set)
127.0.0.1:8888> sadd myset foo
(integer) 1
127.0.0.1:8888> sadd myset bar
(integer) 1
127.0.0.1:8888> sadd myset bar
(integer) 0
127.0.0.1:8888> 
127.0.0.1:8888> scard nullset
(integer) 0
127.0.0.1:8888> 

Our expectation is, when we add an OperationName to redis.set named "invocationId", we should set and only set expiration of this redis.set when this redis.set newly created. In the future, if we need to add more OperationName to redis.set, we should not refresh its expiration time.
As shown in the previous test, both sadd myset foo and sadd myset bar returned 1. Therefore, in order to determine whether this redis.set is newly created, I used scard to check this. In this way, sadd, scard, and expire will be called only when redis.set is newly created for the first time, and only sadd and scard will be called when adding more OperationName to redis.set in the future.
If an error occurs when sadd trying to newly create a redis.set, then expire will not be executed because it is only executed once when the result of scard equals 1.
As for the issue of serverside scripts or pipeline(), I think it is a good method to handle batch tasks. However, in the current scenario, it seems relatively simple, so do we need to use scripts to operate this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our expectation is, when we add an OperationName to redis.set named "invocationId", we should set and only set expiration of this redis.set when this redis.set newly created.

I see that was your intent, but I prefer the policy that the last operation to run (i.e. be created) under an invocation should delimit the invocation expiration, not the first.

As for the issue of serverside scripts or pipeline(), I think it is a good method to handle batch tasks. However, in the current scenario, it seems relatively simple, so do we need to use scripts to operate this?

The critical section here is the case where a newly created invocation set receives an operation, but does not get an expiry set, letting it sit forever. The risk is low on this, so what you have is fine, pending the last operation expiry requested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

invocation

yeah, you are right. When there is a multithreading competition, there will indeed be a situation where an invocation will never be set with expiration. Sorry I missed this before.
So, we should set expiration every time we try to add an operation to invocation set which ensures it will never be expired before all operations are executed. And the expiration time is calculated after the last operation is added.
So would this be OK?

-      if (jedis.scard(invocationId) == 1) {
-        jedis.expire(invocationId, configs.getBackplane().getMaxInvocationIdTimeout());
-      }
+      jedis.expire(invocationId, configs.getBackplane().getMaxInvocationIdTimeout());

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works either way, whether trying to detect novel operations or just setting expiry on every potential sadd. I'd accept a check for sadd == 1, or just this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works either way, whether trying to detect novel operations or just setting expiry on every potential sadd. I'd accept a check for sadd == 1, or just this.

OK, had updated the MR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Works either way, whether trying to detect novel operations or just setting expiry on every potential sadd. I'd accept a check for sadd == 1, or just this.

is there any update about this? so we can merge this PR, thanks

@werkt werkt merged commit fe2c51c into bazelbuild:main Jan 25, 2024
2 checks passed
@coder1363691 coder1363691 deleted the fix_invocationId branch January 26, 2024 06:45
amishra-u pushed a commit to amishra-u/bazel-buildfarm that referenced this pull request Feb 1, 2024
* set expire and drop invocationId

* fix format

* patch configuration doc

* fix typo

* re-run ci use luxe's patch

* set expire when every active operation insert

* fix return code

* fix ci if statement

---------

Co-authored-by: wangpengfei.pfwang <wangpengfei.pfwang@bytedance.com>
Co-authored-by: George Gensure <werkt@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants