Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perform auto-completion refresh in background. #345

Merged
merged 16 commits into from
Sep 12, 2015

Conversation

amjith
Copy link
Member

@amjith amjith commented Aug 28, 2015

@dbcli/vcli-core @dbcli/pgcli-core

This is my first attempt at making the auto-completion refresh in a background thread. I tried it on a database with 40,000 tables in them and I can see a noticeable different in the startup time. I could get more granular by spawning a thread for each type of completion (eg: tables, views, columns etc), but that'll require some granular locking on data-structures.

Since we get a tremendous speed boost with just one thread, I'm not motivated to do multiple threads.

Please pitch in and let me know if there are better ways of doing this.

@darikg
Copy link
Contributor

darikg commented Aug 28, 2015

What happens if a second refresh is triggered while an earlier refresh is still occurring?

@amjith
Copy link
Member Author

amjith commented Aug 28, 2015

I didn't think about that. But there is at least two other enhancements I need to do:

  1. Instantiate a separate pgexecute object inside the background_refresh() method. This will make sure the original pgexecute object is free to run queries entered by a user, while the refresh is happening in the background.
  2. I need to make that thread a daemon thread, so the user can quit pgcli even if the auto-completion is running. The way it is right now, while the auto-completion is in progress an exit command will tie up the terminal and wait until that thread quits.
  3. To address the issue raised by Darik. In background_refresh we set an Event when completion starts, all the threads that try to start a completion must wait on that event and only proceed when that event is cleared. That way only one of the threads is ever running the completion and additional threads are waiting for their turn.

Alternatively, we scrap the idea of triggering the completion refresh by an CREATE or ALTER statement, but instead have one thread that wakes up every 30 seconds to refresh completion. That way there is only one completion thread. Although this solves the problem of having up to date completions, I'm worried about the strain it might add to the database.

@j-bennet
Copy link
Contributor

The thread that runs every 30 seconds does sound like a lot of overhead, especially if you're working with a remote server, and especially if you have those 40K tables (where did you get a DB like that)? I would rather stick to refreshing on CREATE/DROP/ALTER, but managing things with wait locks.

@darikg
Copy link
Contributor

darikg commented Aug 28, 2015

I think it might be better to have each new refresh call kill any extant refresh threads, instead of waiting on them. After all, you're only really interested in the results of the most recent refresh.

I also wonder if it would make sense to switch from doing refreshes in-place, to instead create a new pgcompleter object each time, so the parent pgcli doesn't have access to a partially refreshed completer. That way refreshes are atomic.

@amjith
Copy link
Member Author

amjith commented Aug 29, 2015

@j-bennet I wrote a script to generate tables in a loop and the loop ran for 40k iterations. I agree that it will be a strain if we keep doing it every 30seconds over a slow network on a really large database. But we can make that 30 second interval to be adaptive. We time how long it took to refresh the completions and then multiply it by 10 and use that as the interval. So on a fast network with a tiny table set the refresh will happen every 10seconds, but on a large db it can happen as slow as once every 5 mins. Just throwing out ideas at this point. We can table this idea of refreshing periodically for now. We'll come back to it once we figure out how to do the refresh correctly.

@darikg In python there is no reliable way to stop an active thread. But I do agree that we only need the results from the very last refresh and I like the idea of having the refresh atomic by replacing the pgcompleter object with a new one. So let's marry those two ideas. In each thread we make multiple queries to fill the table names, column names, view names etc. Between each query we check a flag or a semaphore. If the flag is set it means some other thread is waiting do the completion, so we quit early and let the next thread start over. Whichever thread finishes will do the atomic switching of pgcompleter object. There is one caveat, a user might request a manual refresh using \# and get impatient and keep doing it over and over again, which means the refresh routine will never get a chance to finish. So let's ignore requests for manual refresh if a refresh thread is already active.

It's getting a bit complicated, I'm not sure I like this idea yet. So please feel free to poke holes in it.

I'll try and make prototypes tonight. Feel free to make prototypes yourself if you have other ideas.

@amjith
Copy link
Member Author

amjith commented Sep 1, 2015

I've made 3 more changes since our discussion.

  1. Use a new pgexecute to run the completion refresh queries.
  2. Create a new completer object and atomically swap the new pgcompleter object with the existing one.
  3. Restart the completion refresh when a new refresh request comes in while the first one is still inflight.

I'm not particularly proud of the code that is added to address point 3. (Contained in this commit: 4f74c2c).

I'd love to hear your thoughts and objections to this approach.

@darikg
Copy link
Contributor

darikg commented Sep 1, 2015

I feel like the code for 3 could be less repetitive if we were able to express the refreshing as an array of functions, something like:

def _schemata(completer, pgexecute):
    completer.set_search_path(pgexecute.search_path())
    completer.extend_schemata(pgexecute.schemata())

def background_refresh(self):
        completer = PGCompleter(smart_completion=True, pgspecial=self.pgspecial)
        pgexecute = PGExecute(e.dbname, e.user, e.password, e.host, e.port, e.dsn)

        refresh_funcs = [_schemata, _tables, _views, ] #...etc

        while True:
            for f in refresh_funcs:
                f(completer, pgexecute)

                if self._restart_completion.is_set():
                    self._restart_completion.clear()
                    continue

I think the difficulty here is that you did a really good job early on of isolating your pgexecute and pgcompleter classes, so you end up writing a bunch of boilerplate to interface between them. Maybe move that code in to a third class, kinda like:

class PGCompleterRefresher():
    def __init__(self, completer, pgexecute):
        self.completer = completer
        self.pgexecute = pgexecute

    def get_functions(self):
        # returns a list of functions required for refreshing
        return [self.schemata, self.tables, self.views]

    def schemata(self):
        self.completer.set_search_path(self.pgexecute.search_path())
        self.completer.extend_schemata(self.pgexecute.schemata())

    def tables(self):
        self.completer.extend_relations(self.pgexecute.tables(), kind='tables')
        self.completer.extend_columns(self.pgexecute.table_columns(), kind='tables')

    def views(self):
        self.completer.extend_relations(self.pgexecute.views(), kind='views')
        self.completer.extend_columns(self.pgexecute.view_columns(), kind='views')

@amjith
Copy link
Member Author

amjith commented Sep 1, 2015

It was a conscious decision to isolate pgexecute and pgcompleter. I hate tight coupling of classes.

I think your idea of a class called CompletionRefresher is what we need. I can see this completion logic getting even more complicated. For example: We might choose to refresh only certain completion candidates (say Tables only or Functions only etc) depending on the scenario. So encapsulating the individual refreshers in nice atomic methods is a good idea. I'll give that a shot.

@amjith
Copy link
Member Author

amjith commented Sep 4, 2015

This branch is ready for review. I've added a new CompletionRefresher class as suggested by Darik.

I had to add an OrderedDict module in the packages since OrderedDict is a Python 2.7 feature.

It took a while because it wasn't quite as straight forward as I had expected. But I think I've covered all the bases.

Please review the code and provide feedback.

@j-bennet
Copy link
Contributor

j-bennet commented Sep 4, 2015

I think your CompletionRefresher needs some unit tests @amjith. On another note, why does it matter in what order you refresh completions?

@amjith
Copy link
Member Author

amjith commented Sep 5, 2015

The order matters because we need to populate the schema before we can populate the tables. Otherwise the hierarchy falls apart.

I'll try to add some tests for that class.

@j-bennet
Copy link
Contributor

j-bennet commented Sep 5, 2015

Fair enough. This is nicer than trying to implement dependency resolution between handlers. Thanks for explaining.

@amjith
Copy link
Member Author

amjith commented Sep 5, 2015

I'll be honest with you, I don't know how to write tests for this class.

I can try to mock some of the refresh functions but then I'm not really sure what I'm testing. If you have some thoughts on testing it, please pitch in.

@j-bennet
Copy link
Contributor

j-bennet commented Sep 5, 2015

Let me take a stab at it, I may have some time tomorrow.
On Sep 4, 2015 8:03 PM, "Amjith Ramanujam" notifications@github.com wrote:

I'll be honest with you, I don't know how to write tests for this class.

I can try to mock some of the refresh functions but then I'm not really
sure what I'm testing. If you have some thoughts on testing it, please
pitch in.


Reply to this email directly or view it on GitHub
#345 (comment).

@@ -88,6 +91,8 @@ def __init__(self, force_passwd_prompt=False, never_passwd_prompt=False,
smart_completion = c['main'].as_bool('smart_completion')
completer = PGCompleter(smart_completion, pgspecial=self.pgspecial)
self.completer = completer
self._completer_lock = threading.Lock()
self._restart_completion = threading.Event()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to be used. (completion_refresher initializes its own Event)

@darikg
Copy link
Contributor

darikg commented Sep 5, 2015

This is awesome. I left some trivial suggestions in a couple spots. I agree that unit-testing the refresher is pretty tricky -- we'd have to mock out pgexecute and pgcompleter, and then there's not much left to actually test. Adding \refresh to the behavioral tests would be cool though.

@amjith
Copy link
Member Author

amjith commented Sep 5, 2015

I've updated the PR to address the comments. I'm hopeful that @j-bennet might be able to write some unit-tests for it. Behavioral tests would be nice but I don't think they'll capture the nuances of the background thread or competing refresh calls.

But if it turns out to be too hard, we can merge without tests.

@darikg
Copy link
Contributor

darikg commented Sep 5, 2015

Just for kicks I hacked in amjith/completion-refresh-background...darikg/show-refreshing-status that prints "Refreshing completions..." in the statusbar while the refreshing thread is running. It blinks in and out pretty quickly for my biggest database, but might be useful for larger/slower databases?

@amjith
Copy link
Member Author

amjith commented Sep 5, 2015

😄 That's a clever way to use the callbacks. I knew having that callbacks as a list would come in handy some day. I didn't realize it was going to be useful this quickly. ;)

Well done!

I noticed that at first launch of pgcli the 'Refreshing completions...' doesn't go away until press enter. But if I enter \refresh it does go away when the refresh is done.

I'd say merge your branch into this PR and we'll merge it all together to master. :)

@j-bennet
Copy link
Contributor

j-bennet commented Sep 6, 2015

@j-bennet
Copy link
Contributor

j-bennet commented Sep 6, 2015

The tests seem to be ok now. I agree, it's tricky, there needs to be much mocking and not that many things you're able to verify, but covering some of it with tests is better than none.

@amjith
Copy link
Member Author

amjith commented Sep 6, 2015

Thank you very much for adding the tests. I appreciate it.

I think you've done an excellent job of testing the thread interaction which is the important part of the refresher class without worrying about the database interactions.

I like it!

There was one thing that didn't make sense for me, so I've requested some explanation from you, but other than that we're ready to roll that into this PR for merge.

@darikg
Copy link
Contributor

darikg commented Sep 6, 2015

I rebased my stuff on top of your latest to get rid of the merge conflict. I just need to fix that bug you mentioned about 'Refreshing completions...' showing up on launch.

@j-bennet
Copy link
Contributor

j-bennet commented Sep 6, 2015

This commit adds end-to-end test, for completeness.
6664337

@amjith
Copy link
Member Author

amjith commented Sep 7, 2015

Thank you for adding the end to end test. Can you merge your branch to mine and push it up? So this PR will have all the changes in one place.

@j-bennet
Copy link
Contributor

j-bennet commented Sep 7, 2015

I won't be able to do it until tomorrow. But feel free to do the merge yourself if you'd rather have it sooner.

Need to check if the cli exists *after* finishing refreshing, not before
@j-bennet
Copy link
Contributor

j-bennet commented Sep 8, 2015

@amjith Merged my tests into your branch.

@amjith
Copy link
Member Author

amjith commented Sep 12, 2015

@darikg Any update on merging your branch? If you don't have the time to work on that corner case issue, let me know, I'll take a look.

@darikg
Copy link
Contributor

darikg commented Sep 12, 2015

It should be good to go - can you test it and make sure the refreshing status disappears after startup like it's supposed to? I'm not sure how to merge it in github - make a PR against this branch and then accept it?

On Sep 11, 2015, at 10:39 PM, Amjith Ramanujam notifications@github.com wrote:

@darikg Any update on merging your branch? If you don't have the time to work on that corner case issue, let me know, I'll take a look.


Reply to this email directly or view it on GitHub.

…li/pgcli into amjith/completion-refresh-background
@amjith
Copy link
Member Author

amjith commented Sep 12, 2015

Thanks for fixing the bug. Everything is merged in. I've checked it out one last time to make sure it all works. I'm happy with it. So merging.

/cc @dbcli/vcli-core The changes in this PR is what is needed in vcli to get completion refresh running in background. Let us know if you have any questions.

amjith added a commit that referenced this pull request Sep 12, 2015
Perform auto-completion refresh in background.
@amjith amjith merged commit 60a8d1b into master Sep 12, 2015
@amjith amjith deleted the amjith/completion-refresh-background branch October 31, 2015 04:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants