Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up knife vault refresh #178

Closed
wants to merge 2 commits into from
Closed

Speed up knife vault refresh #178

wants to merge 2 commits into from

Conversation

tfheen
Copy link

@tfheen tfheen commented Jul 28, 2015

Use partial search and threads to speed up.

Fixes: #177

Doing a full search on the node index means the entire node object is
downloaded for each hit. As we are only interested in the name, only
ask for that.
@tfheen tfheen force-pushed the speedups branch 2 times, most recently from 288e941 to efc947c Compare July 28, 2015 06:55
@sysbot
Copy link

sysbot commented Jul 28, 2015

+1

@tfheen
Copy link
Author

tfheen commented Jul 28, 2015

I'm not entirely sure how to get the specs to run properly here, help appreciated. :-)

@jf647
Copy link
Contributor

jf647 commented Jul 28, 2015

Thanks for the idea/contribution.

I am however concerned about thread safety here. I don't believe that Chef guarantees that the chef-client code is thread safe, and even if the current implementation is on MRI, there's no guarantee that it will always be that way.

Inside of the thread, you're calling add_client, which in turn calls add on the ChefVault::ItemKeys object. That mutates the ItemKeys object (pushing entries onto the clients array), but without any protection around the update. On MRI that may work because all user code is implicitly single threaded, but if MRI ever improved its threading to match that of JRuby or Rubinius, it would be subject to race collisions.

It seems like the better way to speed things up is to not search for nodes names, then re-issue the search query to get the public key of each node. Is the public key something we can get from the original search query?

Otherwise, I'd want to see race protection and unit tests to exercise the edge cases in order to consider merging this.

@sysbot
Copy link

sysbot commented Jul 28, 2015

I was thinking about batching as well with the suggestion above (search for nodes, then search for keys from list of nodes). I'm concerned over possible race condition as well in between batch updates as there's no guarantee in between batch that the underlaying client info haven't changed.

@tfheen
Copy link
Author

tfheen commented Jul 30, 2015

]] James FitzGibbon

Inside of the thread, you're calling add_client, which in turn calls
add on the ChefVault::ItemKeys object. That mutates the ItemKeys
object (pushing entries onto the clients array), but without any
protection around the update. On MRI that may work because all user
code is implicitly single threaded, but if MRI ever improved its
threading to match that of JRuby or Rubinius, it would be subject to
race collisions.

Aren't pushing items onto an array thread safe in Ruby? Oh well. :-)

It seems like the better way to speed things up is to not search for
nodes names, then re-issue the search query to get the public key of
each node. Is the public key something we can get from the original
search query?

Nodes don't have keys. Clients have keys, so that's basically my other
suggestion, search the client index for

client_name = 'foo' OR client_name = 'bar' OR client_name = 'baz' OR …

This could potentially be even quicker, since fewer roundtrips.

I can whip up a patch to do this and you can see what you think?

Tollef Fog Heen
UNIX is user friendly, it's just picky about who its friends are

@jf647
Copy link
Contributor

jf647 commented Jul 30, 2015

In MRI, only one user thread can run at a time (more than one system thread can run simultaneously), so you get synchroniziation "for free", but it's implementation dependent and not guaranteed. There are also edge cases in which MRI isn't thread safe:

http://www.rubyinside.com/does-the-gil-make-your-ruby-code-thread-safe-6051.html

But even if we put a mutex around the updates to the item keys, there is no guarantee that the chef client code is thread safe or will remain so. The patch as written issues multiple API operations using the same underlying Chef::REST object in multiple threads. I'm not aware of any chef-authored code that does this, and Chef could choose to break this without warning (because it's not guaranteed by the API).

So, to do any of this multithreaded, there would need to be a mutex around the chef API operations and an mutex around the item key mutations, at which point why bother trying to multithread?

So trying to expand the clients into the "NAME or NAME or NAME" style may be a better route, but what happens with multi-thousand node installations? Does SOLR have a limit on how many clauses can be in the query? What's the algorithmic complexity of SOLR searches? I have no idea off the top of my head. Anything up to linear then your approach is worth exploring, but quadratic or worse is probably a show stopper.

I guess what I'm saying is that I don't see a safe, obvious solution to speed this up. Even the new Chef server API v1 doesn't fix this, as you don't get the key back from a client query - you have to pull a sub-resource (e.g. /clients/foo/keys/1) to get it.

I'm open to changes that make chef-vault better, but only if they don't introduce stability problems or require people to use Chef client or server v12 or Ruby 2. There's a huge installed base of people still using chef-client v11, and preventing them from using chef-vault would have to be paired with a major version bump.

Thanks

This will use a partial search for the client names for smaller
deployments, but use a full client search for large deployments.  It's
still a win due to the much lower number of roundtrips.
@tfheen
Copy link
Author

tfheen commented Aug 4, 2015

I've pushed a non-threaded solution. This runs in ~30s for me, as compared with ~15 minutes with master. Again, this will likely require test adjustments.

Happy for input on the approach.

@tfheen
Copy link
Author

tfheen commented Aug 11, 2015

Have you had a chance to think about the approach in the latest patch here? I believe it is correct (but will of course require fixing up the tests so it actually passes still)

@sysbot
Copy link

sysbot commented Aug 11, 2015

@tfheen I'll go ahead and verify the tests with the new patch (is this upstream now or still a branch?). I think it should still work out of the box since partial_search just go through the same chef-zero search mechanism that emulated by test-kitchen databags injecting databags into the local chef-zero environment.

@tfheen
Copy link
Author

tfheen commented Aug 11, 2015

@sysbot it's not merged yet (even if everything else was great, I don't think upstream would merge a patch with failing tests). As you can see from the status on the pull request, the tests fail (but fixing it should be fairly easy, from what I can see).

@jf647
Copy link
Contributor

jf647 commented Aug 21, 2015

The API for Chef::Search::Query changed between Chef 11 and Chef 12:

https://github.com/chef/chef/blob/11.18.14/lib/chef/search/query.rb

https://github.com/chef/chef/blob/12.4.1/lib/chef/search/query.rb

Using v11-style args against the v12 args works, but I don't think it works the other way. The hash of options that you're passing will be interpreted as the sort arg in 11, and it will attempt to URL-encode the .to_s representation of the Hash.

Support for chef-client v11 will not be dropped until the next major release of chef-vault.

I appreciate the attempt to solve this problem, and I'm not trying to be purposefully negative, but the speedup you're trying to achive is not trivial to do without breaking existing installations.

Perhaps what is required here is some ancillary library for managing chef-vault data bags, with its own cli. Then you could use threads with ridley or chef-api (not sure if that's threadsafe, so don't quote me) and get the speedup you desire without the change also affecting runtime decryption of secrets, which is the part that has to be compatible with Ruby 1.9 and chef-client v11.

thommay pushed a commit that referenced this pull request Nov 4, 2016
This is based on @tfheen's work in #178, and a suggestion from
@rveznaver in #237

Signed-off-by: Thom May <thom@chef.io>
thommay pushed a commit that referenced this pull request Nov 7, 2016
This is based on @tfheen's work in #178, and a suggestion from
@rveznaver in #237

Signed-off-by: Thom May <thom@chef.io>
@thommay
Copy link
Contributor

thommay commented Jan 24, 2017

Some of this landed in #240, and it's unlikely we'll rewrite to do threads any time soon. Thanks for the PR!

@tas50 tas50 added Triage: Needs Information Indicates an issue needs more information in order to work on it. and removed Status: Pending Contributor Response labels Dec 22, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Triage: Needs Information Indicates an issue needs more information in order to work on it. Type: Enhancement Adds new functionality.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants