-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Perform auto-completion refresh in background. #345
Conversation
What happens if a second refresh is triggered while an earlier refresh is still occurring? |
I didn't think about that. But there is at least two other enhancements I need to do:
Alternatively, we scrap the idea of triggering the completion refresh by an |
The thread that runs every 30 seconds does sound like a lot of overhead, especially if you're working with a remote server, and especially if you have those 40K tables (where did you get a DB like that)? I would rather stick to refreshing on CREATE/DROP/ALTER, but managing things with wait locks. |
I think it might be better to have each new refresh call kill any extant refresh threads, instead of waiting on them. After all, you're only really interested in the results of the most recent refresh. I also wonder if it would make sense to switch from doing refreshes in-place, to instead create a new pgcompleter object each time, so the parent pgcli doesn't have access to a partially refreshed completer. That way refreshes are atomic. |
@j-bennet I wrote a script to generate tables in a loop and the loop ran for 40k iterations. I agree that it will be a strain if we keep doing it every 30seconds over a slow network on a really large database. But we can make that 30 second interval to be adaptive. We time how long it took to refresh the completions and then multiply it by 10 and use that as the interval. So on a fast network with a tiny table set the refresh will happen every 10seconds, but on a large db it can happen as slow as once every 5 mins. Just throwing out ideas at this point. We can table this idea of refreshing periodically for now. We'll come back to it once we figure out how to do the refresh correctly. @darikg In python there is no reliable way to stop an active thread. But I do agree that we only need the results from the very last refresh and I like the idea of having the refresh atomic by replacing the pgcompleter object with a new one. So let's marry those two ideas. In each thread we make multiple queries to fill the table names, column names, view names etc. Between each query we check a flag or a semaphore. If the flag is set it means some other thread is waiting do the completion, so we quit early and let the next thread start over. Whichever thread finishes will do the atomic switching of pgcompleter object. There is one caveat, a user might request a manual refresh using It's getting a bit complicated, I'm not sure I like this idea yet. So please feel free to poke holes in it. I'll try and make prototypes tonight. Feel free to make prototypes yourself if you have other ideas. |
I've made 3 more changes since our discussion.
I'm not particularly proud of the code that is added to address point 3. (Contained in this commit: 4f74c2c). I'd love to hear your thoughts and objections to this approach. |
I feel like the code for 3 could be less repetitive if we were able to express the refreshing as an array of functions, something like:
I think the difficulty here is that you did a really good job early on of isolating your pgexecute and pgcompleter classes, so you end up writing a bunch of boilerplate to interface between them. Maybe move that code in to a third class, kinda like:
|
It was a conscious decision to isolate pgexecute and pgcompleter. I hate tight coupling of classes. I think your idea of a class called CompletionRefresher is what we need. I can see this completion logic getting even more complicated. For example: We might choose to refresh only certain completion candidates (say Tables only or Functions only etc) depending on the scenario. So encapsulating the individual refreshers in nice atomic methods is a good idea. I'll give that a shot. |
This branch is ready for review. I've added a new CompletionRefresher class as suggested by Darik. I had to add an OrderedDict module in the packages since OrderedDict is a Python 2.7 feature. It took a while because it wasn't quite as straight forward as I had expected. But I think I've covered all the bases. Please review the code and provide feedback. |
I think your CompletionRefresher needs some unit tests @amjith. On another note, why does it matter in what order you refresh completions? |
The order matters because we need to populate the schema before we can populate the tables. Otherwise the hierarchy falls apart. I'll try to add some tests for that class. |
Fair enough. This is nicer than trying to implement dependency resolution between handlers. Thanks for explaining. |
I'll be honest with you, I don't know how to write tests for this class. I can try to mock some of the refresh functions but then I'm not really sure what I'm testing. If you have some thoughts on testing it, please pitch in. |
Let me take a stab at it, I may have some time tomorrow.
|
@@ -88,6 +91,8 @@ def __init__(self, force_passwd_prompt=False, never_passwd_prompt=False, | |||
smart_completion = c['main'].as_bool('smart_completion') | |||
completer = PGCompleter(smart_completion, pgspecial=self.pgspecial) | |||
self.completer = completer | |||
self._completer_lock = threading.Lock() | |||
self._restart_completion = threading.Event() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't seem to be used. (completion_refresher
initializes its own Event)
This is awesome. I left some trivial suggestions in a couple spots. I agree that unit-testing the refresher is pretty tricky -- we'd have to mock out pgexecute and pgcompleter, and then there's not much left to actually test. Adding |
I've updated the PR to address the comments. I'm hopeful that @j-bennet might be able to write some unit-tests for it. Behavioral tests would be nice but I don't think they'll capture the nuances of the background thread or competing refresh calls. But if it turns out to be too hard, we can merge without tests. |
Just for kicks I hacked in amjith/completion-refresh-background...darikg/show-refreshing-status that prints "Refreshing completions..." in the statusbar while the refreshing thread is running. It blinks in and out pretty quickly for my biggest database, but might be useful for larger/slower databases? |
😄 That's a clever way to use the callbacks. I knew having that callbacks as a list would come in handy some day. I didn't realize it was going to be useful this quickly. ;) Well done! I noticed that at first launch of pgcli the 'Refreshing completions...' doesn't go away until press enter. But if I enter \refresh it does go away when the refresh is done. I'd say merge your branch into this PR and we'll merge it all together to master. :) |
It's not much but I added some tests: |
The tests seem to be ok now. I agree, it's tricky, there needs to be much mocking and not that many things you're able to verify, but covering some of it with tests is better than none. |
Thank you very much for adding the tests. I appreciate it. I think you've done an excellent job of testing the thread interaction which is the important part of the refresher class without worrying about the database interactions. I like it! There was one thing that didn't make sense for me, so I've requested some explanation from you, but other than that we're ready to roll that into this PR for merge. |
I rebased my stuff on top of your latest to get rid of the merge conflict. I just need to fix that bug you mentioned about 'Refreshing completions...' showing up on launch. |
This commit adds end-to-end test, for completeness. |
Thank you for adding the end to end test. Can you merge your branch to mine and push it up? So this PR will have all the changes in one place. |
I won't be able to do it until tomorrow. But feel free to do the merge yourself if you'd rather have it sooner. |
Need to check if the cli exists *after* finishing refreshing, not before
@amjith Merged my tests into your branch. |
@darikg Any update on merging your branch? If you don't have the time to work on that corner case issue, let me know, I'll take a look. |
It should be good to go - can you test it and make sure the refreshing status disappears after startup like it's supposed to? I'm not sure how to merge it in github - make a PR against this branch and then accept it?
|
…li/pgcli into amjith/completion-refresh-background
Thanks for fixing the bug. Everything is merged in. I've checked it out one last time to make sure it all works. I'm happy with it. So merging. /cc @dbcli/vcli-core The changes in this PR is what is needed in vcli to get completion refresh running in background. Let us know if you have any questions. |
Perform auto-completion refresh in background.
@dbcli/vcli-core @dbcli/pgcli-core
This is my first attempt at making the auto-completion refresh in a background thread. I tried it on a database with 40,000 tables in them and I can see a noticeable different in the startup time. I could get more granular by spawning a thread for each type of completion (eg: tables, views, columns etc), but that'll require some granular locking on data-structures.
Since we get a tremendous speed boost with just one thread, I'm not motivated to do multiple threads.
Please pitch in and let me know if there are better ways of doing this.