Cache Get-GitStatus results to speed up the prompt #52

Open
wants to merge 3 commits into from

6 participants

Adam Roben Robert Ream James Manning ebelew Phil Haack Keith Dahlby
Adam Roben
aroben commented June 04, 2012

Get-GitStatus now caches the last status result and reuses it when possible. We invalidate the cache when either:

  1. the .git directory changes, or
  2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we computed status and comparing it to the current .git directory. We implement (2) by listening for events from System.IO.FileSystemWatcher objects to find out when any files in the directories we care about have changed.

Note that this doesn't make Get-GitStatus any faster when it does have to compute status. (And presumably actually makes it a tiny bit slower because of the extra work to maintain the cache.) But it makes Get-GitStatus much faster overall since status often doesn't have to be computed at all.
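The two invalidation conditions above can be sketched as a single check. This is only an illustration, not the code in the pull request: `Test-GitStatusCacheValid` and its parameters are hypothetical names, and the count of pending watcher events is passed in rather than read from the event queue.

```powershell
# Hypothetical helper: decides whether a cached status can be reused.
function Test-GitStatusCacheValid {
    param(
        [hashtable]$Cache,        # expects GitDirectory and Status keys
        [string]$GitDirectory,    # path of the current .git directory
        [int]$PendingEventCount   # file-change events seen since the last status
    )
    # Condition 1: same .git directory path as last time (plain string comparison).
    $sameRepo = $GitDirectory -eq $Cache.GitDirectory
    # Condition 2: the watchers reported no file changes.
    $noChanges = $PendingEventCount -eq 0
    # A status must actually have been cached for reuse to make sense.
    return [bool]($sameRepo -and $noChanges -and $Cache.Status)
}

$cache = @{ GitDirectory = 'C:\dev\MyRepo\.git'; Status = @{ Branch = 'master' } }
$reuse        = Test-GitStatusCacheValid $cache 'C:\dev\MyRepo\.git' 0
$otherRepo    = Test-GitStatusCacheValid $cache 'C:\dev\Other\.git' 0
$filesChanged = Test-GitStatusCacheValid $cache 'C:\dev\MyRepo\.git' 3
```

Only the first call should allow reuse; a different `.git` path or any pending event forces a fresh `git status`.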

added some commits June 04, 2012
Adam Roben Make Get-GitDirectory work inside submodules
`git rev-parse --git-dir` shows the relative path to the .git directory,
and even knows how to do this for submodules. This lets us get rid of
our custom logic for finding the .git directory entirely.
1f4b3c4
Adam Roben Add Get-GitWorkingDirectory
This returns the full path to the root of the Git working directory that
contains the current directory. E.g., if your working directory root is
at C:\dev\MyRepo, and your current directory is
C:\dev\MyRepo\ProjectFoo\packages, Get-GitWorkingDirectory will return
"C:\dev\MyRepo".
a5e8305
Adam Roben Cache Get-GitStatus results to speed up the prompt
Get-GitStatus now caches the last status result and reuses it when
possible. We invalidate the cache when either:
    1. the .git directory changes, or
    2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we
computed status and comparing it to the current .git directory. We
implement (2) by listening for events from System.IO.FileSystemWatcher
objects to find out when any files in the directories we care about have
changed.

Note that this doesn't make Get-GitStatus any faster when it *does* have
to compute status. (And presumably actually makes it a tiny bit slower
because of the extra work to maintain the cache.) But it makes
Get-GitStatus much faster overall since status often doesn't have to be
computed at all.
b824248
Adam Roben aroben referenced this pull request June 04, 2012
Closed

Posh-git is too slow #50

Adam Roben
aroben commented June 05, 2012

I just noticed a bug with this: if you cd into a .git directory, then back out, your status is left as GIT_DIR!.

@robertream Is this a problem with #51 as well?

Robert Ream

@aroben I just checked this with #51 and do not see this behavior

James Manning

I'd love to see this pull request merged in - I just started using it and it's sooo much nicer since git status takes around 1.8 seconds on a particular repo I work in (long enough to be annoying).

I didn't quite understand this part of the description, though - it makes it sound like the .git directory is copied off somewhere and there's a tree diff of sorts happening, but perhaps I'm just misparsing it?

> We implement (1) by caching the last .git directory for which we computed status and comparing it to the current .git directory

Adam Roben

@jamesmanning In that comment I should have said ".git directory's path" instead of ".git directory". We're just caching the path and doing a string comparison.

Adam Roben

> I just noticed a bug with this: if you cd into a .git directory, then back out, your status is left as GIT_DIR!.

Presumably this just means that comparing .git directory paths is not sufficient, since you can end up with different statuses for the same .git directory even when no files have changed.

James Manning

@aroben I tried cd'ing into and then back out of the .git directory in my repo and everything seemed to work fine - with $global:GitPromptSettings.Debug I could see that it was still (correctly) using the cached status ("Reusing old status"). I'm not sure what "your status is left as GIT_DIR!" means, though - is there a particular set of repro steps I can use to see the issue? I'd like to try and help debug it if that's what's preventing Keith from merging in this pull request. :)

Adam Roben

Here's what I see in current posh-git:

> cd c:\my-repo
[master] > cd .git
> cd objects
[GIT_DIR!] > cd c:\my-repo
[master] >

That behavior should be preserved, I think, unless we can think of a new, better behavior.

ebelew

I saw a couple issues.
1> git rev-parse requires an actual process execution. It is (on my machine), 1/10th the speed of the current get-gitdirectory method. This is of course on a relatively normal path, not edge cases > 15 directories deep
2> I don't see anywhere in the caching code that takes the branch into account. I can stay in the same folder and change branches, but according to the caching code, the status would be identical

Adam Roben

> 2> I don't see anywhere in the caching code that takes the branch into account.

Switching branches is taken care of automatically because we invalidate the cache on any filesystem changes beneath the working directory. Switching branches updates .git/HEAD, so the cache will get invalidated.
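To illustrate why a branch switch shows up as a file change: when a branch is checked out, `.git/HEAD` holds a line of the form `ref: refs/heads/<branch>`, so switching branches rewrites that file. The sketch below only parses such content; `Get-BranchFromHeadContent` is a hypothetical helper and not part of posh-git.

```powershell
# Hypothetical helper: extracts the branch name from the text of .git/HEAD.
function Get-BranchFromHeadContent([string]$HeadContent) {
    if ($HeadContent -match '^ref: refs/heads/(?<branch>.+?)\s*$') {
        return $Matches['branch']
    }
    # Detached HEAD: the file holds a commit id instead of a ref.
    return $null
}

# The content differs between branches, so a checkout has to write the file.
$before = Get-BranchFromHeadContent "ref: refs/heads/master`n"
$after  = Get-BranchFromHeadContent "ref: refs/heads/gh372`n"
```

Because the watcher covers the `.git` directory, that write is enough to invalidate the cached status.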

Phil Haack

Any updates on getting this change in?

Keith Dahlby
Owner

> Any updates on getting this change in?

I had been planning to revisit this after switching to use LibGit2Sharp under the hood and neither has happened yet. This has waited far too long, let's get this merged.

Between this and #51, I think I would prefer to start with this lighter-weight solution, especially because it never has to deal with showing status that's out-of-date. I'll use this a bit with Debug=$true set and if nothing alarming comes up we shall run with it.

Keith Dahlby dahlbyk referenced this pull request from a commit June 04, 2012
Adam Roben Cache Get-GitStatus results to speed up the prompt
Get-GitStatus now caches the last status result and reuses it when
possible. We invalidate the cache when either:
    1. the .git directory changes, or
    2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we
computed status and comparing it to the current .git directory. We
implement (2) by listening for events from System.IO.FileSystemWatcher
objects to find out when any files in the directories we care about have
changed.

Note that this doesn't make Get-GitStatus any faster when it *does* have
to compute status. (And presumably actually makes it a tiny bit slower
because of the extra work to maintain the cache.) But it makes
Get-GitStatus much faster overall since status often doesn't have to be
computed at all.

Closes #52
3cb54e4
Keith Dahlby
Owner

https://github.com/dahlbyk/posh-git/compare/gh52 includes a trivial refactoring and a fix for the .git directory behavior (by clearing and disabling cache, since there's no working directory status to check), if anyone wants to try it out.

James Manning

can someone have/use a modified version of Get-GitStatus that they wanted to maintain outside of this file? I had made my own changes to it in the past (unfortunately, actually modifying the GitUtils.ps1), only to have a GitHub ClickOnce upgrade overwrite it, so I'm hoping to avoid making that mistake again. :)

random thoughts from looking at that diff - i've never used submodules, so this may just be from my misunderstanding of what a tree with submodules looks like and how it should be handled. :) None of these are bugs/blockers, so feel free to ignore. :)

  • rev-parse --git-dir redirects stderr to null, but the rev-parse --show-cdup doesn't - should it?
  • the Get-GitWorkingDirectory behavior seems odd/inconsistent - if you're inside the .git directory or under it, it returns the pwd - otherwise, it returns the root of the working tree.
  • it looks like the existing code already does an "are we inside the .git directory?" check via git rev-parse --is-inside-git-dir (which certainly seems like the right tool for the job AFAICT) but we don't seem to leverage that same approach elsewhere. Looking at the rest of the diff, it seems like these places could use the --is-inside-git-dir result instead:

    • !(Get-Location).Path.StartsWith($gitDir) (this one seems problematic from a case-sensitivity POV?)
    • if ($workingDirectory.StartsWith($gitDir)) (seems fragile IMHO since it's case-sensitive and it depends on Get-GitWorkingDirectory returning the pwd when under .git dir - if someone ever 'fixed' Get-GitWorkingDirectory to return the root of the working tree regardless of whether you're under the git subdir or not, this would break.)
  • probably a stupid question, but WRT "Stop listening for changes before running git-status so we don't pick up changes from that command." - does running status really change contents of files? I could see it updating the 'last accessed time' on files (for filesystems with that enabled - recent NTFS doesn't by default IIRC) but the filesystem watcher events we're looking for don't seem like they would hit that - if that's the issue, it seems like it'd be better to just mask out the LastAccess bit in the NotifyFilters instead of unhooking/rehooking events?
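For the case-sensitivity concern in the list above, one possible alternative to a bare `StartsWith` is an ordinal, case-insensitive prefix test. This is a minimal sketch that assumes Windows-style paths as used in this thread; `Test-PathIsUnder` is a hypothetical name.

```powershell
# Hypothetical helper: is $Path equal to, or located beneath, $Parent?
function Test-PathIsUnder([string]$Path, [string]$Parent) {
    # Append a separator to both sides so 'C:\repo\.git' does not match 'C:\repo\.github'.
    $parentWithSep = $Parent.TrimEnd('\') + '\'
    $pathWithSep = $Path.TrimEnd('\') + '\'
    return $pathWithSep.StartsWith($parentWithSep, [StringComparison]::OrdinalIgnoreCase)
}

$naive   = 'C:\my-repo\.git\objects'.StartsWith('c:\MY-REPO\.git')       # case-sensitive
$inside  = Test-PathIsUnder 'C:\my-repo\.git\objects' 'c:\MY-REPO\.git'
$sibling = Test-PathIsUnder 'C:\my-repo\.github' 'C:\my-repo\.git'
```

The naive call misses the match when only the casing differs, while the helper accepts it and still rejects the similarly named sibling directory.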

Adam Roben

> does running status really change contents of files?

I was seeing file changed notifications when running git status, which is why I added that comment and adjusted when we stop listening for notifications. I can't remember exactly which files I saw change, though, and didn't write it down. :-(

Keith Dahlby
Owner

While trying some potential refactorings (moving the Get-GitWorkingDirectory call higher up, for example), I started seeing some Changed notifications for the .git directory, likely due to LastAccess updates. NotifyFilters definitely looks like it's worth using here.
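A minimal sketch of setting `NotifyFilter` from PowerShell, using the same `New-Object ... -Property` form as the diff in this pull request. The temporary directory and the chosen flags are assumptions for illustration. As far as I know, the .NET default filter is `LastWrite, FileName, DirectoryName`, which already leaves out `LastAccess`, so an explicit filter mainly documents the intent.

```powershell
# Assumption: a throwaway directory stands in for the repository root.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null

$watcher = New-Object IO.FileSystemWatcher $dir -Property @{
    IncludeSubdirectories = $true
    # Name every change type we care about; LastAccess is deliberately absent.
    NotifyFilter = [IO.NotifyFilters]'FileName, DirectoryName, LastWrite, Size'
}

# Check that access-time updates are masked out of the filter.
$ignoresLastAccess = [int]($watcher.NotifyFilter -band [IO.NotifyFilters]::LastAccess) -eq 0

$watcher.Dispose()
Remove-Item $dir -Recurse
```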

James Manning

Nevermind - I got a chance to just run a watch from linqpad and saw it creates and deletes a lock file:

var watcher = new FileSystemWatcher(@"c:\github\Main")
{
    IncludeSubdirectories = true,
    EnableRaisingEvents = true,
    NotifyFilter = 
        NotifyFilters.CreationTime | 
        NotifyFilters.DirectoryName |
        NotifyFilters.FileName |
        NotifyFilters.LastWrite |
        NotifyFilters.Size,
};

while (true)
{
    var result = watcher.WaitForChanged(WatcherChangeTypes.All);
    Console.WriteLine ("Change {0}, Name={1}, OldName={2}",
                       result.ChangeType,
                       result.Name,
                       result.OldName);
}

resulting in this during a 'git status'

Change Created, Name=.git\index.lock, OldName=
Change Created, Name=.git\index.lock, OldName=
Change Deleted, Name=.git\index.lock, OldName=
Change Changed, Name=.git, OldName=
Keith Dahlby
Owner

> can someone have/use a modified version of Get-GitStatus that they wanted to maintain outside of this file? I had made my own changes to it in the past (unfortunately, actually modifying the GitUtils.ps1), only to have a GitHub ClickOnce upgrade overwrite it, so I'm hoping to avoid making that mistake again. :)

I've not heard of anyone else doing this, and it's not really supported at the moment. What did you need to change?

Keith Dahlby
Owner

> Nevermind - I got a chance to just run a watch from linqpad and saw it creates and deletes a lock file

We could potentially ignore lock files - if something interesting happens other files will change as a result.

Adam Roben

It still seems conceptually right to me not to listen for filesystem changes while we're doing internal work in Get-GitStatus. We know that we're not doing anything to modify the status, so why even listen for changes?

James Manning

@dahlbyk - in the future, I could see someone attempting to use their own caching (for instance, if someone is trying to add caching libgit2sharp), but in my case I was modifying it to add @aroben's patch since git status takes over 3 seconds in one of my repos :) If it's not a simple change, it's certainly not worth it, just wanted to ask in case it was an easy thing to do :)

Keith Dahlby
Owner

> @dahlbyk - in the future, I could see someone attempting to use their own caching (for instance, if someone is trying to add caching libgit2sharp), but in my case I was modifying it to add @aroben's patch since git status takes over 3 seconds in one of my repos :) If it's not a simple change, it's certainly not worth it, just wanted to ask in case it was an easy thing to do :)

If you're going to hack on posh-git, I'd suggest just cloning it yourself and installing it stand-alone. posh-git is smart enough not to load itself twice, and AFAIK GH4W loads its posh-git after loading $PROFILE.

> It still seems conceptually right to me not to listen for filesystem changes while we're doing internal work in Get-GitStatus. We know that we're not doing anything to modify the status, so why even listen for changes?

This seems sound to me, though we might lose less performance if we don't tear down and rebuild watchers each time.
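One way to avoid tearing watchers down might be to toggle `EnableRaisingEvents` around the internal work instead of disposing and recreating them. This is a sketch under assumptions: `Invoke-WithWatcherPaused` is a hypothetical helper, the temporary directory stands in for the working directory, and changes made by other processes while the watcher is paused would go unnoticed.

```powershell
# Assumption: a throwaway directory stands in for the working directory.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
New-Item -ItemType Directory -Path $dir | Out-Null
$watcher = New-Object IO.FileSystemWatcher $dir -Property @{
    IncludeSubdirectories = $true
    EnableRaisingEvents = $true
}

# Hypothetical helper: run $Work with notifications suspended, then resume.
function Invoke-WithWatcherPaused([IO.FileSystemWatcher]$Watcher, [scriptblock]$Work) {
    $Watcher.EnableRaisingEvents = $false
    try {
        & $Work
    } finally {
        # Resume even if the work throws, so the cache keeps being invalidated.
        $Watcher.EnableRaisingEvents = $true
    }
}

# The script block is a stand-in for the git status call.
$pausedDuringWork = Invoke-WithWatcherPaused $watcher { -not $watcher.EnableRaisingEvents }
$activeAfterWork = $watcher.EnableRaisingEvents

$watcher.Dispose()
Remove-Item $dir -Recurse
```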

Adam Roben

I guess another option would be to check for pending events before running git status, and then clear them after running git status, in case any new events were generated by that command. (You'd also have to clear them out before returning if we decided to use the cached status.)

James Manning

@aroben - it's introducing a race condition that seems problematic IMHO - sure, this particular chunk of code isn't doing anything to modify the status, but we don't know what else is running in the system (or even in the same shell). A text editor could be saving files, there could be a 'git checkout' running in another shell to switch the current branch, a 'git gui' window could be open and staging a change, etc.

If there are certain changes that are known to occur but don't mean anything (like lock files), then it seems to make more sense to either not listen to those, or if the API doesn't allow ignoring particular files/patterns (as an exclusion), then either have a function that determines whether a given changed file path is ignorable, or keep a list of such files (admittedly problematic since you'd eventually want to duplicate .gitignore behavior).

To be clear, I'm not talking about changing this behavior right now - what is there is more than good enough for the vast majority of cases, and has worked great for me. :)

Keith Dahlby
Owner

FWIW, #51 ignores .git, .git\index.lock, and .git\objects\*.
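The "is this changed path ignorable?" idea discussed above could look roughly like the following. The patterns are the ones listed for #51; `Test-IgnorableChange` is a hypothetical name, and paths are assumed to be relative to the watched root, as in the watcher output shown earlier.

```powershell
# Patterns taken from the comment above; paths are relative to the watched root.
$ignoredPatterns = '.git', '.git\index.lock', '.git\objects\*'

# Hypothetical helper: returns $true when a change can be skipped.
function Test-IgnorableChange([string]$RelativePath) {
    foreach ($pattern in $ignoredPatterns) {
        if ($RelativePath -like $pattern) { return $true }
    }
    return $false
}

$lockFile   = Test-IgnorableChange '.git\index.lock'
$objectFile = Test-IgnorableChange '.git\objects\ab\cdef'
$headFile   = Test-IgnorableChange '.git\HEAD'
$sourceFile = Test-IgnorableChange 'LibGit2Sharp.Tests\StatusFixture.cs'
```

Lock and object files are skipped, while a write to `.git\HEAD` or to a tracked source file still counts as an interesting change.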

Keith Dahlby
Owner

Another caveat: checkouts with many changes (e.g. checking out a rather old branch) can result in a slow prompt that can only be fixed by allowing all recorded events to be purged:

C:\Dev\GitHub\libgit2sharp
VERBOSE: 49760:Found 64536 status events
VERBOSE: 49763:Getting status
VERBOSE: 50077:Parsing status
VERBOSE: 50078:Status: ## gh372
VERBOSE: 50080:Status:  M LibGit2Sharp.Tests/StatusFixture.cs
VERBOSE: 50081:Status:  M libgit2
VERBOSE: 50082:Building status object
VERBOSE: 50103:Watching for changes in C:\Dev\GitHub\libgit2sharp
VERBOSE: 50107:Finished
 [gh372 +0 ~2 -0]>

Yes, that's 49 seconds. Perhaps it would be better to turn off the event listener as soon as one interesting change is detected?

Adam Roben
aroben commented May 01, 2013
Keith Dahlby
Owner
dahlbyk commented May 29, 2013

> Perhaps it would be better to turn off the event listener as soon as one interesting change is detected?

This is turning out to be more challenging than I had hoped. If we continue to use the event queue, we never really have a chance to turn off the listener until the next new prompt, at which point it may be too late. Unless there's a way to clear out the status events asynchronously, that's not going to work.

The other option I've explored is using Register-ObjectEvent -Action { ... } to execute code immediately when a change is detected (specifically, to clear out the cached Status and GitDirectory). This seems to work well enough, but it doesn't like a job's action unregistering the job. And more importantly, PowerShell is crashing and I've not been able to figure out why... Any ideas?

Adam Roben
aroben commented May 30, 2013

What specifically is slow? Is it the Get-Event call?

Keith Dahlby
Owner
dahlbyk commented May 30, 2013

The slowness is from clearing out thousands of items in the event queue.

Adam Roben
aroben commented May 30, 2013

Maybe it would be faster if we kept track of the SourceIdentifiers we've used when registering events, and then do:

foreach ($id in $SourceIdentifiers) {
    Remove-Event -SourceIdentifier $id
}

The list of SourceIdentifiers will be the same length as $Global:GitStatusCache.Events, rather than scaling with the number of events.

Keith Dahlby
Owner
dahlbyk commented May 30, 2013

If Remove-Event clears everything that's queued for that event too, that would be much better. I'll give it a try.
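A small experiment along these lines can check the behavior without a FileSystemWatcher. It uses `New-Event` to queue a few custom events under one source identifier and then removes them with a single `Remove-Event -SourceIdentifier` call. The identifier `GitStatusEventDemo` is made up for the demo.

```powershell
$id = 'GitStatusEventDemo'   # hypothetical identifier, used only for this demo

# Queue three events under the same source identifier.
1..3 | ForEach-Object { New-Event -SourceIdentifier $id | Out-Null }
$queuedBefore = @(Get-Event | Where-Object { $_.SourceIdentifier -eq $id }).Count

# One call per identifier, regardless of how many events are queued for it.
Remove-Event -SourceIdentifier $id
$queuedAfter = @(Get-Event | Where-Object { $_.SourceIdentifier -eq $id }).Count
```

If `$queuedAfter` is zero, the cost of clearing the queue scales with the number of identifiers rather than the number of events, which is the point of the suggestion above.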

Showing 3 unique commits by 1 author.

72  GitUtils.ps1
@@ -2,11 +2,22 @@
 # http://www.markembling.info/view/my-ideal-powershell-prompt-with-git-integration
 
 function Get-GitDirectory {
-    if ($Env:GIT_DIR) {
-        $Env:GIT_DIR
-    } else {
-        Get-LocalOrParentPath .git
+    $gitDir = git rev-parse --git-dir 2>$null
+    if ($LastExitCode -ne 0) {
+        return
+    }
+    (Resolve-Path $gitDir).Path
+}
+
+function Get-GitWorkingDirectory {
+    $cdup = git rev-parse --show-cdup
+    if ($LastExitCode -ne 0) {
+        return
     }
+    if (!$cdup) {
+        $cdup = "."
+    }
+    (Resolve-Path $cdup).Path
 }
 
 function Get-GitBranch($gitDir = $(Get-GitDirectory), [Diagnostics.Stopwatch]$sw) {
@@ -77,6 +88,14 @@ function Get-GitBranch($gitDir = $(Get-GitDirectory), [Diagnostics.Stopwatch]$sw
     }
 }
 
+$Global:GitStatusCache = @{
+    GitDirectory = $null
+    Status = $null
+    Watchers = $null
+    Events = "Changed", "Created", "Deleted", "Renamed"
+    NextSubscriberId = 0
+}
+
 function Get-GitStatus($gitDir = (Get-GitDirectory)) {
     $settings = $Global:GitPromptSettings
     $enabled = (-not $settings) -or $settings.EnablePromptStatus
@@ -87,6 +106,29 @@ function Get-GitStatus($gitDir = (Get-GitDirectory)) {
         } else {
             $sw = $null
         }
+
+        $events = @(Get-Event | ?{ $_.SourceIdentifier -like "GitStatusEvent*" })
+        foreach ($event in $events) {
+            Remove-Event $event.EventIdentifier
+        }
+        if (($gitDir -eq $Global:GitStatusCache.GitDirectory) -and $Global:GitStatusCache.Status) {
+            if (!$events) {
+                dbg 'Reusing old status' $sw
+                return $Global:GitStatusCache.Status
+            }
+        }
+
+        # Stop listening for changes before running git-status so we don't pick
+        # up changes from that command.
+        if ($Global:GitStatusCache.Watchers) {
+            foreach ($watcher in $Global:GitStatusCache.Watchers) {
+                $watcher.Dispose()
+            }
+            foreach ($event in (Get-EventSubscriber | ?{ $_.SourceIdentifier -like "GitStatusEvent*" })) {
+                Unregister-Event $event.SourceIdentifier
+            }
+        }
+
         $branch = $null
         $aheadBy = 0
         $behindBy = 0
@@ -170,6 +212,28 @@ function Get-GitStatus($gitDir = (Get-GitDirectory)) {
             HasUntracked    = [bool]$filesAdded
         }
 
+        $Global:GitStatusCache.GitDirectory = $gitDir
+        $workingDirectory = Get-GitWorkingDirectory
+        $directoriesToWatch = @($workingDirectory)
+        # Submodules' .git directories are not contained within their working
+        # directory, so we have to watch them separately.
+        if ((Split-Path $gitDir) -ne $workingDirectory) {
+            $directoriesToWatch += $gitDir
+        }
+        $Global:GitStatusCache.Watchers = $directoriesToWatch | %{
+            dbg "Watching for changes in $_" $sw
+            $watcher = New-Object IO.FileSystemWatcher $_ -Property @{
+                IncludeSubdirectories = $true
+                EnableRaisingEvents = $true
+            }
+            foreach ($event in $Global:GitStatusCache.Events) {
+                $id = $Global:GitStatusCache.NextSubscriberId++
+                Register-ObjectEvent $watcher $event -SourceIdentifier "GitStatusEvent$id"
+            }
+            $watcher
+        }
+        $Global:GitStatusCache.Status = $result
+
         dbg 'Finished' $sw
         if($sw) { $sw.Stop() }
         return $result
13  Utils.ps1
@@ -15,19 +15,6 @@ function Coalesce-Args {
 
 Set-Alias ?? Coalesce-Args -Force
 
-function Get-LocalOrParentPath($path) {
-    $checkIn = Get-Item .
-    while ($checkIn -ne $NULL) {
-        $pathToTest = [System.IO.Path]::Combine($checkIn.fullname, $path)
-        if (Test-Path $pathToTest) {
-            return $pathToTest
-        } else {
-            $checkIn = $checkIn.parent
-        }
-    }
-    return $null
-}
-
 function dbg ($Message, [Diagnostics.Stopwatch]$Stopwatch) {
     if($Stopwatch) {
         Write-Verbose ('{0:00000}:{1}' -f $Stopwatch.ElapsedMilliseconds,$Message) -Verbose # -ForegroundColor Yellow
1  posh-git.psm1
@@ -18,6 +18,7 @@ Export-ModuleMember -Function @(
         'Get-GitStatus', 
         'Enable-GitColors', 
         'Get-GitDirectory',
+        'Get-GitWorkingDirectory',
         'TabExpansion',
         'Get-AliasPattern',
         'Start-SshAgent',