Skip to content

Cache Get-GitStatus results to speed up the prompt #52

Closed
wants to merge 3 commits into from

6 participants

@aroben
aroben commented Jun 4, 2012

Get-GitStatus now caches the last status result and reuses it when possible. We invalidate the cache when either:

  1. the .git directory changes, or
  2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we computed status and comparing it to the current .git directory. We implement (2) by listening for events from System.IO.FileSystemWatcher objects to find out when any files in the directories we care about have changed.

Note that this doesn't make Get-GitStatus any faster when it does have to compute status. (And presumably actually makes it a tiny bit slower because of the extra work to maintain the cache.) But it makes Get-GitStatus much faster overall since status often doesn't have to be computed at all.

aroben added some commits Jun 4, 2012
@aroben aroben Make Get-GitDirectory work inside submodules
`git rev-parse --git-dir` shows the relative path to the .git directory,
and even knows how to do this for submodules. This lets us get rid of
our custom logic for finding the .git directory entirely.
1f4b3c4
@aroben aroben Add Get-GitWorkingDirectory
This returns the full path to the root of the Git working directory that
contains the current directory. E.g., if your working directory root is
at C:\dev\MyRepo, and your current directory is
C:\dev\MyRepo\ProjectFoo\packages, Get-GitWorkingDirectory will return
"C:\dev\MyRepo".
a5e8305
@aroben aroben Cache Get-GitStatus results to speed up the prompt
Get-GitStatus now caches the last status result and reuses it when
possible. We invalidate the cache when either:
    1. the .git directory changes, or
    2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we
computed status and comparing it to the current .git directory. We
implement (2) by listening for events from System.IO.FileSystemWatcher
objects to find out when any files in the directories we care about have
changed.

Note that this doesn't make Get-GitStatus any faster when it *does* have
to compute status. (And presumably actually makes it a tiny bit slower
because of the extra work to maintain the cache.) But it makes
Get-GitStatus much faster overall since status often doesn't have to be
computed at all.
b824248
This was referenced Jun 4, 2012
@aroben
aroben commented Jun 5, 2012

I just noticed a bug with this: if you cd into a .git directory, then back out, your status is left as GIT_DIR!.

@robertream Is this a problem with #52 as well?

@robertream

@aroben I just checked this with #51 and do not see this behavior

@jamesmanning

I'd love to see this pull request merged in - I just started using it and it's sooo much nicer since git status takes around 1.8 seconds on a particular repo I work in (long enough to be annoying).

I didn't quite understand this part of the description, though - it makes it sound like the .git directory is copied off somewhere and there's a tree diff of sorts happening, but perhaps I'm just misparsing it?

We implement (1) by caching the last .git directory for which we computed status and comparing it to the current .git directory

@aroben
aroben commented Nov 21, 2012

@jamesmanning In that comment I should have said ".git directory's path" instead of ".git directory". We're just caching the path and doing a string comparison.

@aroben
aroben commented Nov 21, 2012

I just noticed a bug with this: if you cd into a .git directory, then back out, your status is left as GIT_DIR!.

Presumably this just means that comparing .git directory paths is not sufficient, since you can end up with different statuses for the same .git directory even when no files have changed.

@jamesmanning

@aroben I tried cd'ing into and then back out of the .git directory in my repo and everything seemed to work fine - with $global:GitPromptSettings.Debug I could see that it was still (correctly) using the cached status ("Reusing old status"). I'm not sure what "your status is left as GIT_DIR!" means, though - is there a particular set of repro steps I can use to see the issue? I'd like to try and help debug it if that's what's preventing Keith from merging in this pull request. :)

@aroben
aroben commented Nov 21, 2012

Here's what I see in current posh-git:

> cd c:\my-repo
[master] > cd .git
> cd objects
[GIT_DIR!] > cd c:\my-repo
[master] >

That behavior should be preserved, I think, unless we can think of a new, better behavior.

@ebelew
ebelew commented Apr 1, 2013

I saw a couple issues.
1> git rev-parse requires an actual process execution. It is (on my machine), 1/10th the speed of the current get-gitdirectory method. This is of course on a relatively normal path, not edge cases > 15 directories deep
2> I don't see anywhere in the caching code that takes the branch into account. I can stay in the same folder and change branches, but according to the caching code, the status would be identical

@aroben
aroben commented Apr 1, 2013

2> I don't see anywhere in the caching code that takes the branch into account.

Switching branches is taken care of automatically because we invalidate the cache on any filesystem changes beneath the working directory. Switching branches updates .git/HEAD, so the cache will get invalidated.

@Haacked
Haacked commented Apr 22, 2013

Any updates on getting this change in?

@dahlbyk
Owner
dahlbyk commented Apr 22, 2013

Any updates on getting this change in?

I had been planning to revisit this after switching to use LibGit2Sharp under the hood and neither has happened yet. This has waited far too long, let's get this merged.

Between this and #51, I think I would prefer to start with this lighter-weight solution, especially because it never has to deal with showing status that's out-of-date. I'll use this a bit with Debug=$true set and if nothing alarming comes up we shall run with it.

@dahlbyk dahlbyk added a commit that referenced this pull request Apr 23, 2013
@aroben aroben Cache Get-GitStatus results to speed up the prompt
Get-GitStatus now caches the last status result and reuses it when
possible. We invalidate the cache when either:
    1. the .git directory changes, or
    2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we
computed status and comparing it to the current .git directory. We
implement (2) by listening for events from System.IO.FileSystemWatcher
objects to find out when any files in the directories we care about have
changed.

Note that this doesn't make Get-GitStatus any faster when it *does* have
to compute status. (And presumably actually makes it a tiny bit slower
because of the extra work to maintain the cache.) But it makes
Get-GitStatus much faster overall since status often doesn't have to be
computed at all.

Closes #52
3cb54e4
@dahlbyk
Owner
dahlbyk commented Apr 23, 2013

https://github.com/dahlbyk/posh-git/compare/gh52 includes a trivial refactoring and a fix for the .git directory behavior (by clearing and disabling cache, since there's no working directory status to check), if anyone wants to try it out.

@jamesmanning

can someone have/use a modified version of Get-GitStatus that they wanted to maintain outside of this file? I had made my own changes to it in the past (unfortunately, actually modifying the GitUtils.ps1), only to have a GitHub ClickOnce upgrade overwrite it, so I'm hoping to avoid making that mistake again. :)

random thoughts from looking at that diff - i've never used submodules, so this may just be from my misunderstanding of what a tree with submodules looks like and how it should be handled. :) None of these are bugs/blockers, so feel free to ignore. :)

  • rev-parse --git-dir redirects stderr to null, but the rev-parse --show-cdup doesn't - should it?
  • the Get-GitWorkingDirectory behavior seems odd/inconsistent - if you're inside the .git directory or under it, it returns the pwd - otherwise, it returns the root of the working tree.
  • it looks like the existing code already does a "are we inside the .git directory?" check via git rev-parse --is-inside-git-dir (which certainly seems like the right tool for the job AFAICT) but we don't seem to leverage that same approach elsewhere - could these places just use that

    • Looking at the rest of the diff, it seems like these places could use the --is-inside-git-dir result instead?
    • !(Get-Location).Path.StartsWith($gitDir) (this one seems problematic from a case-sensitivity POV?)
    • if ($workingDirectory.StartsWith($gitDir)) (seems fragile IMHO since it's case-sensitive and it depends on Get-GitWorkingDirectory returning the pwd when under .git dir - if someone ever 'fixed' Get-GitWorkingDirectory to return the root of the working tree regardless of whether you're under the git subdir or not, this would break.
  • probably a stupid question, but WRT "Stop listening for changes before running git-status so we don't pick up changes from that command." - does running status really change contents of files? I could see it updating the 'last accessed time' on files (for filesystems with that enabled - recent NTFS doesn't by default IIRC) but the filesystem watcher events we're looking for don't seem like they would hit that - if that's the issue, it seems like it'd be better to just mask out the LastAccess bit in the NotifyFilters instead of unhooking/rehooking events?

@aroben
aroben commented Apr 23, 2013

does running status really change contents of files?

I was seeing file changed notifications when running git status, which is why I added that comment and adjusted when we stop listening for notifications. I can't remember exactly which files I saw change, though, and didn't write it down. :-(

@dahlbyk
Owner
dahlbyk commented Apr 23, 2013

While trying some potential refactorings (moving the Get-GitWorkingDirectory call higher up, for example), I started seeing some Changed notifications for the .git directory, likely due to the LastAccess. NotifyFilters definitely looks like it's worth using here.

@jamesmanning

Nevermind - I got a chance to just run a watch from linqpad and saw it creates and deletes a lock file:

var watcher = new FileSystemWatcher(@"c:\github\Main")
{
    IncludeSubdirectories = true,
    EnableRaisingEvents = true,
    NotifyFilter = 
        NotifyFilters.CreationTime | 
        NotifyFilters.DirectoryName |
        NotifyFilters.FileName |
        NotifyFilters.LastWrite |
        NotifyFilters.Size,
};

while (true)
{
    var result = watcher.WaitForChanged(WatcherChangeTypes.All);
    Console.WriteLine ("Change {0}, Name={1}, OldName={2}",
                       result.ChangeType,
                       result.Name,
                       result.OldName);
}

resulting in this during a 'git status'

Change Created, Name=.git\index.lock, OldName=
Change Created, Name=.git\index.lock, OldName=
Change Deleted, Name=.git\index.lock, OldName=
Change Changed, Name=.git, OldName=
@dahlbyk
Owner
dahlbyk commented Apr 23, 2013

can someone have/use a modified version of Get-GitStatus that they wanted to maintain outside of this file? I had made my own changes to it in the past (unfortunately, actually modifying the GitUtils.ps1), only to have a GitHub ClickOnce upgrade overwrite it, so I'm hoping to avoid making that mistake again. :)

I've not heard of anyone else doing this, and it's not really supported at the moment. What did you need to change?

@dahlbyk
Owner
dahlbyk commented Apr 23, 2013

Nevermind - I got a chance to just run a watch from linqpad and saw it creates and deletes a lock file

We could potentially ignore lock files - if something interesting happens other files will change as a result.

@aroben
aroben commented Apr 23, 2013

It still seems conceptually right to me not to listen for filesystem changes while we're doing internal work in Get-GitStatus. We know that we're not doing anything to modify the status, so why even listen for changes?

@jamesmanning

@dahlbyk - in the future, I could see someone attempting to use their own caching (for instance, if someone is trying to add caching libgit2sharp), but in my case I was modifying it to add @aroben's patch since git status takes over 3 seconds in one of my repos :) If it's not a simple change, it's certainly not worth it, just wanted to ask in case it was an easy thing to do :)

@dahlbyk
Owner
dahlbyk commented Apr 23, 2013

@dahlbyk - in the future, I could see someone attempting to use their own caching (for instance, if someone is trying to add caching libgit2sharp), but in my case I was modifying it to add @aroben's patch since git status takes over 3 seconds in one of my repos :) If it's not a simple change, it's certainly not worth it, just wanted to ask in case it was an easy thing to do :)

If you're going to hack on posh-git, I'd suggest just cloning it yourself and installing it stand-alone. posh-git is smart enough not to load itself twice, and AFAIK GH4W loads its posh-git after loading $PROFILE.

It still seems conceptually right to me not to listen for filesystem changes while we're doing internal work in Get-GitStatus. We know that we're not doing anything to modify the status, so why even listen for changes?

These seems sound to me, though we might lose less performance if we don't tear down and rebuild watchers each time.

@aroben
aroben commented Apr 23, 2013

I guess another option would be to check for pending events before running git status, and then clear them after running git status, in case any new events were generated by that command. (You'd also have to clear them out before returning if we decided to use the cached status.)

@jamesmanning

@aroben - it's introducing a race condition that seems problematic IMHO - sure, this particular chunk of code isn't doing anything to modify the status, but we don't know what else is running in the system (or even in the same shell). A text editor could be saving files, there could be a 'git checkout' running in another shell to switch the current branch, a 'git gui' window could be open and staging a change, etc.

If there are certain changes that are known to occur but don't mean anything (like lock files), then it seems to make more sense to either not listen to those, or if the API doesn't allow ignoring particular files/patterns (as an exclusion), then either have a function that determines whether a given changed file path is ignorable, or keep a list of such files (admittedly problematic since you'd eventually want to duplicate .gitignore behavior).

To be clear, I'm not talking about changing this behavior right now - what is there is more than good enough for the vast majority of cases, and has worked great for me. :)

@dahlbyk
Owner
dahlbyk commented Apr 23, 2013

FWIW, #51 ignores .git, .git\index.lock, and .git\objects\*.

@dahlbyk
Owner
dahlbyk commented May 1, 2013

Another caveat: checkouts with many changes (e.g. checking out a rather old branch) can result in a slow prompt that can only be fixed by allowing all recorded events to be purged:

C:\Dev\GitHub\libgit2sharp
VERBOSE: 49760:Found 64536 status events
VERBOSE: 49763:Getting status
VERBOSE: 50077:Parsing status
VERBOSE: 50078:Status: ## gh372
VERBOSE: 50080:Status:  M LibGit2Sharp.Tests/StatusFixture.cs
VERBOSE: 50081:Status:  M libgit2
VERBOSE: 50082:Building status object
VERBOSE: 50103:Watching for changes in C:\Dev\GitHub\libgit2sharp
VERBOSE: 50107:Finished
 [gh372 +0 ~2 -0]>

Yes, that's 49 seconds. Perhaps it would be better to turn off the event listener as soon as one interesting change is detected?

@aroben
aroben commented May 1, 2013
@dahlbyk
Owner
dahlbyk commented May 30, 2013

Perhaps it would be better to turn off the event listener as soon as one interesting change is detected?

This is turning out to be more challenging than I had hoped. If we continue to use the event queue, we never really have a chance to turn off the listener until the next new prompt, at which point it may be too late. Unless there's a way to clear out the status events asynchronously, that's not going to work.

The other option I've explored is using Register-ObjectEvent -Action { ... } to execute code immediately when a change is detected (specifically, to clear out the cached Status and GitDirectory). This seems to work well enough, but it doesn't like a job's action unregistering the job. And more importantly, PowerShell is crashing and I've not been able to figure out why... Any ideas?

@aroben
aroben commented May 30, 2013

What specifically is slow? Is it the Get-Event call?

@dahlbyk
Owner
dahlbyk commented May 30, 2013

The slowness is from clearing out thousands of items in the event queue.

@aroben
aroben commented May 30, 2013

Maybe it would be faster if we kept track of the SourceIdentifiers we've used when registering events, and then do:

foreach ($id in $SourceIdentifiers) {
    Remove-Event -SourceIdentifier $id
}

The list of SourceIdentifiers will be the same length as $Global:GitStatusCache.Events, rather than scaling with the number of events.

@dahlbyk
Owner
dahlbyk commented May 30, 2013

If Remove-Event clears everything that's queued for that event too, that would be much better. I'll give it a try.

@aroben
aroben commented Nov 4, 2014

This is pretty out of date now.

@aroben aroben closed this Nov 4, 2014
@aroben aroben deleted the aroben:cache-git-status branch Nov 4, 2014
@dahlbyk
Owner
dahlbyk commented Nov 16, 2014

Yeah, thanks for taking the time to explore this option... ultimately I think I've settled on an (optional) external watcher (a la #145) as the only reasonable way to address this.

@jamesmanning

I love the idea of that external watcher, and hopefully @fieryorc will be able to get the pull request for that done soon, as I'd definitely prefer it to the 'async git status call' that my fork currently does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.