Perform auto-completion refresh in background. #345

amjith · 2015-08-28T06:14:40Z

@dbcli/vcli-core @dbcli/pgcli-core

This is my first attempt at making the auto-completion refresh in a background thread. I tried it on a database with 40,000 tables in them and I can see a noticeable different in the startup time. I could get more granular by spawning a thread for each type of completion (eg: tables, views, columns etc), but that'll require some granular locking on data-structures.

Since we get a tremendous speed boost with just one thread, I'm not motivated to do multiple threads.

Please pitch in and let me know if there are better ways of doing this.

darikg · 2015-08-28T15:13:59Z

What happens if a second refresh is triggered while an earlier refresh is still occurring?

amjith · 2015-08-28T16:30:39Z

I didn't think about that. But there is at least two other enhancements I need to do:

Instantiate a separate pgexecute object inside the background_refresh() method. This will make sure the original pgexecute object is free to run queries entered by a user, while the refresh is happening in the background.
I need to make that thread a daemon thread, so the user can quit pgcli even if the auto-completion is running. The way it is right now, while the auto-completion is in progress an exit command will tie up the terminal and wait until that thread quits.
To address the issue raised by Darik. In background_refresh we set an Event when completion starts, all the threads that try to start a completion must wait on that event and only proceed when that event is cleared. That way only one of the threads is ever running the completion and additional threads are waiting for their turn.

Alternatively, we scrap the idea of triggering the completion refresh by an CREATE or ALTER statement, but instead have one thread that wakes up every 30 seconds to refresh completion. That way there is only one completion thread. Although this solves the problem of having up to date completions, I'm worried about the strain it might add to the database.

j-bennet · 2015-08-28T16:34:39Z

The thread that runs every 30 seconds does sound like a lot of overhead, especially if you're working with a remote server, and especially if you have those 40K tables (where did you get a DB like that)? I would rather stick to refreshing on CREATE/DROP/ALTER, but managing things with wait locks.

darikg · 2015-08-28T17:19:27Z

I think it might be better to have each new refresh call kill any extant refresh threads, instead of waiting on them. After all, you're only really interested in the results of the most recent refresh.

I also wonder if it would make sense to switch from doing refreshes in-place, to instead create a new pgcompleter object each time, so the parent pgcli doesn't have access to a partially refreshed completer. That way refreshes are atomic.

amjith · 2015-08-29T02:24:41Z

@j-bennet I wrote a script to generate tables in a loop and the loop ran for 40k iterations. I agree that it will be a strain if we keep doing it every 30seconds over a slow network on a really large database. But we can make that 30 second interval to be adaptive. We time how long it took to refresh the completions and then multiply it by 10 and use that as the interval. So on a fast network with a tiny table set the refresh will happen every 10seconds, but on a large db it can happen as slow as once every 5 mins. Just throwing out ideas at this point. We can table this idea of refreshing periodically for now. We'll come back to it once we figure out how to do the refresh correctly.

@darikg In python there is no reliable way to stop an active thread. But I do agree that we only need the results from the very last refresh and I like the idea of having the refresh atomic by replacing the pgcompleter object with a new one. So let's marry those two ideas. In each thread we make multiple queries to fill the table names, column names, view names etc. Between each query we check a flag or a semaphore. If the flag is set it means some other thread is waiting do the completion, so we quit early and let the next thread start over. Whichever thread finishes will do the atomic switching of pgcompleter object. There is one caveat, a user might request a manual refresh using \# and get impatient and keep doing it over and over again, which means the refresh routine will never get a chance to finish. So let's ignore requests for manual refresh if a refresh thread is already active.

It's getting a bit complicated, I'm not sure I like this idea yet. So please feel free to poke holes in it.

I'll try and make prototypes tonight. Feel free to make prototypes yourself if you have other ideas.

amjith · 2015-09-01T07:26:06Z

I've made 3 more changes since our discussion.

Use a new pgexecute to run the completion refresh queries.
Create a new completer object and atomically swap the new pgcompleter object with the existing one.
Restart the completion refresh when a new refresh request comes in while the first one is still inflight.

I'm not particularly proud of the code that is added to address point 3. (Contained in this commit: 4f74c2c).

I'd love to hear your thoughts and objections to this approach.

darikg · 2015-09-01T11:11:36Z

I feel like the code for 3 could be less repetitive if we were able to express the refreshing as an array of functions, something like:

def _schemata(completer, pgexecute):
    completer.set_search_path(pgexecute.search_path())
    completer.extend_schemata(pgexecute.schemata())

def background_refresh(self):
        completer = PGCompleter(smart_completion=True, pgspecial=self.pgspecial)
        pgexecute = PGExecute(e.dbname, e.user, e.password, e.host, e.port, e.dsn)

        refresh_funcs = [_schemata, _tables, _views, ] #...etc

        while True:
            for f in refresh_funcs:
                f(completer, pgexecute)

                if self._restart_completion.is_set():
                    self._restart_completion.clear()
                    continue

I think the difficulty here is that you did a really good job early on of isolating your pgexecute and pgcompleter classes, so you end up writing a bunch of boilerplate to interface between them. Maybe move that code in to a third class, kinda like:

class PGCompleterRefresher():
    def __init__(self, completer, pgexecute):
        self.completer = completer
        self.pgexecute = pgexecute

    def get_functions(self):
        # returns a list of functions required for refreshing
        return [self.schemata, self.tables, self.views]

    def schemata(self):
        self.completer.set_search_path(self.pgexecute.search_path())
        self.completer.extend_schemata(self.pgexecute.schemata())

    def tables(self):
        self.completer.extend_relations(self.pgexecute.tables(), kind='tables')
        self.completer.extend_columns(self.pgexecute.table_columns(), kind='tables')

    def views(self):
        self.completer.extend_relations(self.pgexecute.views(), kind='views')
        self.completer.extend_columns(self.pgexecute.view_columns(), kind='views')

amjith · 2015-09-01T18:03:47Z

It was a conscious decision to isolate pgexecute and pgcompleter. I hate tight coupling of classes.

I think your idea of a class called CompletionRefresher is what we need. I can see this completion logic getting even more complicated. For example: We might choose to refresh only certain completion candidates (say Tables only or Functions only etc) depending on the scenario. So encapsulating the individual refreshers in nice atomic methods is a good idea. I'll give that a shot.

amjith · 2015-09-04T05:54:52Z

This branch is ready for review. I've added a new CompletionRefresher class as suggested by Darik.

I had to add an OrderedDict module in the packages since OrderedDict is a Python 2.7 feature.

It took a while because it wasn't quite as straight forward as I had expected. But I think I've covered all the bases.

Please review the code and provide feedback.

j-bennet · 2015-09-04T19:39:28Z

I think your CompletionRefresher needs some unit tests @amjith. On another note, why does it matter in what order you refresh completions?

amjith · 2015-09-05T00:40:44Z

The order matters because we need to populate the schema before we can populate the tables. Otherwise the hierarchy falls apart.

I'll try to add some tests for that class.

j-bennet · 2015-09-05T00:44:59Z

Fair enough. This is nicer than trying to implement dependency resolution between handlers. Thanks for explaining.

amjith · 2015-09-05T03:03:00Z

I'll be honest with you, I don't know how to write tests for this class.

I can try to mock some of the refresh functions but then I'm not really sure what I'm testing. If you have some thoughts on testing it, please pitch in.

j-bennet · 2015-09-05T03:04:00Z

Let me take a stab at it, I may have some time tomorrow.
On Sep 4, 2015 8:03 PM, "Amjith Ramanujam" notifications@github.com wrote:

I'll be honest with you, I don't know how to write tests for this class.

I can try to mock some of the refresh functions but then I'm not really
sure what I'm testing. If you have some thoughts on testing it, please
pitch in.

—
Reply to this email directly or view it on GitHub
#345 (comment).

darikg · 2015-09-05T12:39:47Z

pgcli/main.py

@@ -88,6 +91,8 @@ def __init__(self, force_passwd_prompt=False, never_passwd_prompt=False,
        smart_completion = c['main'].as_bool('smart_completion')
        completer = PGCompleter(smart_completion, pgspecial=self.pgspecial)
        self.completer = completer
+        self._completer_lock = threading.Lock()
+        self._restart_completion = threading.Event()


This doesn't seem to be used. (completion_refresher initializes its own Event)

darikg · 2015-09-05T12:51:14Z

This is awesome. I left some trivial suggestions in a couple spots. I agree that unit-testing the refresher is pretty tricky -- we'd have to mock out pgexecute and pgcompleter, and then there's not much left to actually test. Adding \refresh to the behavioral tests would be cool though.

amjith · 2015-09-05T15:14:03Z

I've updated the PR to address the comments. I'm hopeful that @j-bennet might be able to write some unit-tests for it. Behavioral tests would be nice but I don't think they'll capture the nuances of the background thread or competing refresh calls.

But if it turns out to be too hard, we can merge without tests.

darikg · 2015-09-05T15:14:30Z

Just for kicks I hacked in amjith/completion-refresh-background...darikg/show-refreshing-status that prints "Refreshing completions..." in the statusbar while the refreshing thread is running. It blinks in and out pretty quickly for my biggest database, but might be useful for larger/slower databases?

amjith · 2015-09-05T16:27:54Z

😄 That's a clever way to use the callbacks. I knew having that callbacks as a list would come in handy some day. I didn't realize it was going to be useful this quickly. ;)

Well done!

I noticed that at first launch of pgcli the 'Refreshing completions...' doesn't go away until press enter. But if I enter \refresh it does go away when the refresh is done.

I'd say merge your branch into this PR and we'll merge it all together to master. :)

j-bennet · 2015-09-06T01:06:44Z

It's not much but I added some tests:
https://github.com/dbcli/pgcli/blob/j-bennet/completion-refresh-test/tests/test_completion_refresher.py

j-bennet · 2015-09-06T01:39:56Z

The tests seem to be ok now. I agree, it's tricky, there needs to be much mocking and not that many things you're able to verify, but covering some of it with tests is better than none.

amjith · 2015-09-06T02:00:53Z

Thank you very much for adding the tests. I appreciate it.

I think you've done an excellent job of testing the thread interaction which is the important part of the refresher class without worrying about the database interactions.

I like it!

There was one thing that didn't make sense for me, so I've requested some explanation from you, but other than that we're ready to roll that into this PR for merge.

darikg · 2015-09-06T11:50:59Z

I rebased my stuff on top of your latest to get rid of the merge conflict. I just need to fix that bug you mentioned about 'Refreshing completions...' showing up on launch.

j-bennet · 2015-09-06T17:19:44Z

This commit adds end-to-end test, for completeness.
6664337

amjith · 2015-09-07T04:30:25Z

Thank you for adding the end to end test. Can you merge your branch to mine and push it up? So this PR will have all the changes in one place.

j-bennet · 2015-09-07T04:53:41Z

I won't be able to do it until tomorrow. But feel free to do the merge yourself if you'd rather have it sooner.

Need to check if the cli exists *after* finishing refreshing, not before

j-bennet · 2015-09-08T16:16:32Z

@amjith Merged my tests into your branch.

amjith · 2015-09-12T02:39:07Z

@darikg Any update on merging your branch? If you don't have the time to work on that corner case issue, let me know, I'll take a look.

darikg · 2015-09-12T12:19:07Z

It should be good to go - can you test it and make sure the refreshing status disappears after startup like it's supposed to? I'm not sure how to merge it in github - make a PR against this branch and then accept it?

On Sep 11, 2015, at 10:39 PM, Amjith Ramanujam notifications@github.com wrote:

@darikg Any update on merging your branch? If you don't have the time to work on that corner case issue, let me know, I'll take a look.

—
Reply to this email directly or view it on GitHub.

…li/pgcli into amjith/completion-refresh-background

amjith · 2015-09-12T22:02:14Z

Thanks for fixing the bug. Everything is merged in. I've checked it out one last time to make sure it all works. I'm happy with it. So merging.

/cc @dbcli/vcli-core The changes in this PR is what is needed in vcli to get completion refresh running in background. Let us know if you have any questions.

Perform auto-completion refresh in background.

Perform auto-completion refresh in background.

9577671

amjith added the in progress label Aug 28, 2015

eliangcs mentioned this pull request Aug 30, 2015

Refresh auto-completion data in background dbcli/vcli#4

Closed

amjith added 3 commits August 30, 2015 21:02

Use a new pgexecute object to refresh completions.

f6d2c61

Atomically replace the completer object after a refresh.

5602163

Restart logic for completion refresh for concurrent requests.

4f74c2c

amjith added 3 commits September 3, 2015 22:37

Add a CompletionRefresher class.

b8d2b2a

Add a ordered dict to the packages for python 2.6 compatibility.

43494a0

Fix a race condition.

705d7e0

darikg reviewed Sep 5, 2015
View reviewed changes

Document callbacks, rename restart_completion and cleanup.

c9262ad

Added refresher tests.

62af9d3

darikg and others added 4 commits September 6, 2015 07:39

token argument wasn't being used

27d6cb3

Print refreshing in toolbar

7ade315

Initialize pgcli.cli in the constructor instead of using hasattr

278b7c3

Redraw the cli after the refreshing finishes

ef1886a

Functional test for refresh command.

6664337

Fix bug where refreshing status never disappeared

dfa7596

Need to check if the cli exists *after* finishing refreshing, not before

Merge branch 'amjith/completion-refresh-background' of github.com:dbc…

5f522b6

…li/pgcli into amjith/completion-refresh-background

amjith added a commit that referenced this pull request Sep 12, 2015

Merge pull request #345 from dbcli/amjith/completion-refresh-background

60a8d1b

Perform auto-completion refresh in background.

amjith merged commit 60a8d1b into master Sep 12, 2015

amjith removed the in progress label Sep 12, 2015

amjith mentioned this pull request Oct 19, 2015

Amjith/background completion dbcli/mycli#176

Merged

amjith deleted the amjith/completion-refresh-background branch October 31, 2015 04:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perform auto-completion refresh in background. #345

Perform auto-completion refresh in background. #345

amjith commented Aug 28, 2015

darikg commented Aug 28, 2015

amjith commented Aug 28, 2015

j-bennet commented Aug 28, 2015

darikg commented Aug 28, 2015

amjith commented Aug 29, 2015

amjith commented Sep 1, 2015

darikg commented Sep 1, 2015

amjith commented Sep 1, 2015

amjith commented Sep 4, 2015

j-bennet commented Sep 4, 2015

amjith commented Sep 5, 2015

j-bennet commented Sep 5, 2015

amjith commented Sep 5, 2015

j-bennet commented Sep 5, 2015

darikg Sep 5, 2015

darikg commented Sep 5, 2015

amjith commented Sep 5, 2015

darikg commented Sep 5, 2015

amjith commented Sep 5, 2015

j-bennet commented Sep 6, 2015

j-bennet commented Sep 6, 2015

amjith commented Sep 6, 2015

darikg commented Sep 6, 2015

j-bennet commented Sep 6, 2015

amjith commented Sep 7, 2015

j-bennet commented Sep 7, 2015

j-bennet commented Sep 8, 2015

amjith commented Sep 12, 2015

darikg commented Sep 12, 2015

amjith commented Sep 12, 2015

Perform auto-completion refresh in background. #345

Perform auto-completion refresh in background. #345

Conversation

amjith commented Aug 28, 2015

darikg commented Aug 28, 2015

amjith commented Aug 28, 2015

j-bennet commented Aug 28, 2015

darikg commented Aug 28, 2015

amjith commented Aug 29, 2015

amjith commented Sep 1, 2015

darikg commented Sep 1, 2015

amjith commented Sep 1, 2015

amjith commented Sep 4, 2015

j-bennet commented Sep 4, 2015

amjith commented Sep 5, 2015

j-bennet commented Sep 5, 2015

amjith commented Sep 5, 2015

j-bennet commented Sep 5, 2015

darikg Sep 5, 2015

Choose a reason for hiding this comment

darikg commented Sep 5, 2015

amjith commented Sep 5, 2015

darikg commented Sep 5, 2015

amjith commented Sep 5, 2015

j-bennet commented Sep 6, 2015

j-bennet commented Sep 6, 2015

amjith commented Sep 6, 2015

darikg commented Sep 6, 2015

j-bennet commented Sep 6, 2015

amjith commented Sep 7, 2015

j-bennet commented Sep 7, 2015

j-bennet commented Sep 8, 2015

amjith commented Sep 12, 2015

darikg commented Sep 12, 2015

amjith commented Sep 12, 2015