Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with
or
.
Download ZIP

Loading…

Cache Get-GitStatus results to speed up the prompt #52

Closed
wants to merge 3 commits into from

6 participants

@aroben

Get-GitStatus now caches the last status result and reuses it when possible. We invalidate the cache when either:

  1. the .git directory changes, or
  2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we computed status and comparing it to the current .git directory. We implement (2) by listening for events from System.IO.FileSystemWatcher objects to find out when any files in the directories we care about have changed.

Note that this doesn't make Get-GitStatus any faster when it does have to compute status. (And presumably actually makes it a tiny bit slower because of the extra work to maintain the cache.) But it makes Get-GitStatus much faster overall since status often doesn't have to be computed at all.

aroben added some commits
@aroben aroben Make Get-GitDirectory work inside submodules
`git rev-parse --git-dir` shows the relative path to the .git directory,
and even knows how to do this for submodules. This lets us get rid of
our custom logic for finding the .git directory entirely.
1f4b3c4
@aroben aroben Add Get-GitWorkingDirectory
This returns the full path to the root of the Git working directory that
contains the current directory. E.g., if your working directory root is
at C:\dev\MyRepo, and your current directory is
C:\dev\MyRepo\ProjectFoo\packages, Get-GitWorkingDirectory will return
"C:\dev\MyRepo".
a5e8305
@aroben aroben Cache Get-GitStatus results to speed up the prompt
Get-GitStatus now caches the last status result and reuses it when
possible. We invalidate the cache when either:
    1. the .git directory changes, or
    2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we
computed status and comparing it to the current .git directory. We
implement (2) by listening for events from System.IO.FileSystemWatcher
objects to find out when any files in the directories we care about have
changed.

Note that this doesn't make Get-GitStatus any faster when it *does* have
to compute status. (And presumably actually makes it a tiny bit slower
because of the extra work to maintain the cache.) But it makes
Get-GitStatus much faster overall since status often doesn't have to be
computed at all.
b824248
This was referenced
@aroben

I just noticed a bug with this: if you cd into a .git directory, then back out, your status is left as GIT_DIR!.

@robertream Is this a problem with #52 as well?

@robertream

@aroben I just checked this with #51 and do not see this behavior

@jamesmanning

I'd love to see this pull request merged in - I just started using it and it's sooo much nicer since git status takes around 1.8 seconds on a particular repo I work in (long enough to be annoying).

I didn't quite understand this part of the description, though - it makes it sound like the .git directory is copied off somewhere and there's a tree diff of sorts happening, but perhaps I'm just misparsing it?

We implement (1) by caching the last .git directory for which we computed status and comparing it to the current .git directory

@aroben

@jamesmanning In that comment I should have said ".git directory's path" instead of ".git directory". We're just caching the path and doing a string comparison.

@aroben

I just noticed a bug with this: if you cd into a .git directory, then back out, your status is left as GIT_DIR!.

Presumably this just means that comparing .git directory paths is not sufficient, since you can end up with different statuses for the same .git directory even when no files have changed.

@jamesmanning

@aroben I tried cd'ing into and then back out of the .git directory in my repo and everything seemed to work fine - with $global:GitPromptSettings.Debug I could see that it was still (correctly) using the cached status ("Reusing old status"). I'm not sure what "your status is left as GIT_DIR!" means, though - is there a particular set of repro steps I can use to see the issue? I'd like to try and help debug it if that's what's preventing Keith from merging in this pull request. :)

@aroben

Here's what I see in current posh-git:

> cd c:\my-repo
[master] > cd .git
> cd objects
[GIT_DIR!] > cd c:\my-repo
[master] >

That behavior should be preserved, I think, unless we can think of a new, better behavior.

@ebelew

I saw a couple issues.
1> git rev-parse requires an actual process execution. It is (on my machine), 1/10th the speed of the current get-gitdirectory method. This is of course on a relatively normal path, not edge cases > 15 directories deep
2> I don't see anywhere in the caching code that takes the branch into account. I can stay in the same folder and change branches, but according to the caching code, the status would be identical

@aroben

2> I don't see anywhere in the caching code that takes the branch into account.

Switching branches is taken care of automatically because we invalidate the cache on any filesystem changes beneath the working directory. Switching branches updates .git/HEAD, so the cache will get invalidated.

@Haacked

Any updates on getting this change in?

@dahlbyk
Owner

Any updates on getting this change in?

I had been planning to revisit this after switching to use LibGit2Sharp under the hood and neither has happened yet. This has waited far too long, let's get this merged.

Between this and #51, I think I would prefer to start with this lighter-weight solution, especially because it never has to deal with showing status that's out-of-date. I'll use this a bit with Debug=$true set and if nothing alarming comes up we shall run with it.

@dahlbyk dahlbyk referenced this pull request from a commit
@aroben aroben Cache Get-GitStatus results to speed up the prompt
Get-GitStatus now caches the last status result and reuses it when
possible. We invalidate the cache when either:
    1. the .git directory changes, or
    2. any files within the .git or working directories change.

We implement (1) by caching the last .git directory for which we
computed status and comparing it to the current .git directory. We
implement (2) by listening for events from System.IO.FileSystemWatcher
objects to find out when any files in the directories we care about have
changed.

Note that this doesn't make Get-GitStatus any faster when it *does* have
to compute status. (And presumably actually makes it a tiny bit slower
because of the extra work to maintain the cache.) But it makes
Get-GitStatus much faster overall since status often doesn't have to be
computed at all.

Closes #52
3cb54e4
@dahlbyk
Owner

https://github.com/dahlbyk/posh-git/compare/gh52 includes a trivial refactoring and a fix for the .git directory behavior (by clearing and disabling cache, since there's no working directory status to check), if anyone wants to try it out.

@jamesmanning

can someone have/use a modified version of Get-GitStatus that they wanted to maintain outside of this file? I had made my own changes to it in the past (unfortunately, actually modifying the GitUtils.ps1), only to have a GitHub ClickOnce upgrade overwrite it, so I'm hoping to avoid making that mistake again. :)

random thoughts from looking at that diff - i've never used submodules, so this may just be from my misunderstanding of what a tree with submodules looks like and how it should be handled. :) None of these are bugs/blockers, so feel free to ignore. :)

  • rev-parse --git-dir redirects stderr to null, but the rev-parse --show-cdup doesn't - should it?
  • the Get-GitWorkingDirectory behavior seems odd/inconsistent - if you're inside the .git directory or under it, it returns the pwd - otherwise, it returns the root of the working tree.
  • it looks like the existing code already does a "are we inside the .git directory?" check via git rev-parse --is-inside-git-dir (which certainly seems like the right tool for the job AFAICT) but we don't seem to leverage that same approach elsewhere - could these places just use that

    • Looking at the rest of the diff, it seems like these places could use the --is-inside-git-dir result instead?
    • !(Get-Location).Path.StartsWith($gitDir) (this one seems problematic from a case-sensitivity POV?)
    • if ($workingDirectory.StartsWith($gitDir)) (seems fragile IMHO since it's case-sensitive and it depends on Get-GitWorkingDirectory returning the pwd when under .git dir - if someone ever 'fixed' Get-GitWorkingDirectory to return the root of the working tree regardless of whether you're under the git subdir or not, this would break.
  • probably a stupid question, but WRT "Stop listening for changes before running git-status so we don't pick up changes from that command." - does running status really change contents of files? I could see it updating the 'last accessed time' on files (for filesystems with that enabled - recent NTFS doesn't by default IIRC) but the filesystem watcher events we're looking for don't seem like they would hit that - if that's the issue, it seems like it'd be better to just mask out the LastAccess bit in the NotifyFilters instead of unhooking/rehooking events?

@aroben

does running status really change contents of files?

I was seeing file changed notifications when running git status, which is why I added that comment and adjusted when we stop listening for notifications. I can't remember exactly which files I saw change, though, and didn't write it down. :-(

@dahlbyk
Owner

While trying some potential refactorings (moving the Get-GitWorkingDirectory call higher up, for example), I started seeing some Changed notifications for the .git directory, likely due to the LastAccess. NotifyFilters definitely looks like it's worth using here.

@jamesmanning

Nevermind - I got a chance to just run a watch from linqpad and saw it creates and deletes a lock file:

var watcher = new FileSystemWatcher(@"c:\github\Main")
{
    IncludeSubdirectories = true,
    EnableRaisingEvents = true,
    NotifyFilter = 
        NotifyFilters.CreationTime | 
        NotifyFilters.DirectoryName |
        NotifyFilters.FileName |
        NotifyFilters.LastWrite |
        NotifyFilters.Size,
};

while (true)
{
    var result = watcher.WaitForChanged(WatcherChangeTypes.All);
    Console.WriteLine ("Change {0}, Name={1}, OldName={2}",
                       result.ChangeType,
                       result.Name,
                       result.OldName);
}

resulting in this during a 'git status'

Change Created, Name=.git\index.lock, OldName=
Change Created, Name=.git\index.lock, OldName=
Change Deleted, Name=.git\index.lock, OldName=
Change Changed, Name=.git, OldName=
@dahlbyk
Owner

can someone have/use a modified version of Get-GitStatus that they wanted to maintain outside of this file? I had made my own changes to it in the past (unfortunately, actually modifying the GitUtils.ps1), only to have a GitHub ClickOnce upgrade overwrite it, so I'm hoping to avoid making that mistake again. :)

I've not heard of anyone else doing this, and it's not really supported at the moment. What did you need to change?

@dahlbyk
Owner

Nevermind - I got a chance to just run a watch from linqpad and saw it creates and deletes a lock file

We could potentially ignore lock files - if something interesting happens other files will change as a result.

@aroben

It still seems conceptually right to me not to listen for filesystem changes while we're doing internal work in Get-GitStatus. We know that we're not doing anything to modify the status, so why even listen for changes?

@jamesmanning

@dahlbyk - in the future, I could see someone attempting to use their own caching (for instance, if someone is trying to add caching libgit2sharp), but in my case I was modifying it to add @aroben's patch since git status takes over 3 seconds in one of my repos :) If it's not a simple change, it's certainly not worth it, just wanted to ask in case it was an easy thing to do :)

@dahlbyk
Owner

@dahlbyk - in the future, I could see someone attempting to use their own caching (for instance, if someone is trying to add caching libgit2sharp), but in my case I was modifying it to add @aroben's patch since git status takes over 3 seconds in one of my repos :) If it's not a simple change, it's certainly not worth it, just wanted to ask in case it was an easy thing to do :)

If you're going to hack on posh-git, I'd suggest just cloning it yourself and installing it stand-alone. posh-git is smart enough not to load itself twice, and AFAIK GH4W loads its posh-git after loading $PROFILE.

It still seems conceptually right to me not to listen for filesystem changes while we're doing internal work in Get-GitStatus. We know that we're not doing anything to modify the status, so why even listen for changes?

These seems sound to me, though we might lose less performance if we don't tear down and rebuild watchers each time.

@aroben

I guess another option would be to check for pending events before running git status, and then clear them after running git status, in case any new events were generated by that command. (You'd also have to clear them out before returning if we decided to use the cached status.)

@jamesmanning

@aroben - it's introducing a race condition that seems problematic IMHO - sure, this particular chunk of code isn't doing anything to modify the status, but we don't know what else is running in the system (or even in the same shell). A text editor could be saving files, there could be a 'git checkout' running in another shell to switch the current branch, a 'git gui' window could be open and staging a change, etc.

If there are certain changes that are known to occur but don't mean anything (like lock files), then it seems to make more sense to either not listen to those, or if the API doesn't allow ignoring particular files/patterns (as an exclusion), then either have a function that determines whether a given changed file path is ignorable, or keep a list of such files (admittedly problematic since you'd eventually want to duplicate .gitignore behavior).

To be clear, I'm not talking about changing this behavior right now - what is there is more than good enough for the vast majority of cases, and has worked great for me. :)

@dahlbyk
Owner

FWIW, #51 ignores .git, .git\index.lock, and .git\objects\*.

@dahlbyk
Owner

Another caveat: checkouts with many changes (e.g. checking out a rather old branch) can result in a slow prompt that can only be fixed by allowing all recorded events to be purged:

C:\Dev\GitHub\libgit2sharp
VERBOSE: 49760:Found 64536 status events
VERBOSE: 49763:Getting status
VERBOSE: 50077:Parsing status
VERBOSE: 50078:Status: ## gh372
VERBOSE: 50080:Status:  M LibGit2Sharp.Tests/StatusFixture.cs
VERBOSE: 50081:Status:  M libgit2
VERBOSE: 50082:Building status object
VERBOSE: 50103:Watching for changes in C:\Dev\GitHub\libgit2sharp
VERBOSE: 50107:Finished
 [gh372 +0 ~2 -0]>

Yes, that's 49 seconds. Perhaps it would be better to turn off the event listener as soon as one interesting change is detected?

@aroben
@dahlbyk
Owner

Perhaps it would be better to turn off the event listener as soon as one interesting change is detected?

This is turning out to be more challenging than I had hoped. If we continue to use the event queue, we never really have a chance to turn off the listener until the next new prompt, at which point it may be too late. Unless there's a way to clear out the status events asynchronously, that's not going to work.

The other option I've explored is using Register-ObjectEvent -Action { ... } to execute code immediately when a change is detected (specifically, to clear out the cached Status and GitDirectory). This seems to work well enough, but it doesn't like a job's action unregistering the job. And more importantly, PowerShell is crashing and I've not been able to figure out why... Any ideas?

@aroben

What specifically is slow? Is it the Get-Event call?

@dahlbyk
Owner

The slowness is from clearing out thousands of items in the event queue.

@aroben

Maybe it would be faster if we kept track of the SourceIdentifiers we've used when registering events, and then do:

foreach ($id in $SourceIdentifiers) {
    Remove-Event -SourceIdentifier $id
}

The list of SourceIdentifiers will be the same length as $Global:GitStatusCache.Events, rather than scaling with the number of events.

@dahlbyk
Owner

If Remove-Event clears everything that's queued for that event too, that would be much better. I'll give it a try.

@aroben

This is pretty out of date now.

@aroben aroben closed this
@aroben aroben deleted the aroben:cache-git-status branch
@dahlbyk
Owner

Yeah, thanks for taking the time to explore this option... ultimately I think I've settled on an (optional) external watcher (a la #145) as the only reasonable way to address this.

@jamesmanning

I love the idea of that external watcher, and hopefully @fieryorc will be able to get the pull request for that done soon, as I'd definitely prefer it to the 'async git status call' that my fork currently does.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Commits on Jun 4, 2012
  1. @aroben

    Make Get-GitDirectory work inside submodules

    aroben authored
    `git rev-parse --git-dir` shows the relative path to the .git directory,
    and even knows how to do this for submodules. This lets us get rid of
    our custom logic for finding the .git directory entirely.
  2. @aroben

    Add Get-GitWorkingDirectory

    aroben authored
    This returns the full path to the root of the Git working directory that
    contains the current directory. E.g., if your working directory root is
    at C:\dev\MyRepo, and your current directory is
    C:\dev\MyRepo\ProjectFoo\packages, Get-GitWorkingDirectory will return
    "C:\dev\MyRepo".
  3. @aroben

    Cache Get-GitStatus results to speed up the prompt

    aroben authored
    Get-GitStatus now caches the last status result and reuses it when
    possible. We invalidate the cache when either:
        1. the .git directory changes, or
        2. any files within the .git or working directories change.
    
    We implement (1) by caching the last .git directory for which we
    computed status and comparing it to the current .git directory. We
    implement (2) by listening for events from System.IO.FileSystemWatcher
    objects to find out when any files in the directories we care about have
    changed.
    
    Note that this doesn't make Get-GitStatus any faster when it *does* have
    to compute status. (And presumably actually makes it a tiny bit slower
    because of the extra work to maintain the cache.) But it makes
    Get-GitStatus much faster overall since status often doesn't have to be
    computed at all.
This page is out of date. Refresh to see the latest.
Showing with 69 additions and 17 deletions.
  1. +68 −4 GitUtils.ps1
  2. +0 −13 Utils.ps1
  3. +1 −0  posh-git.psm1
View
72 GitUtils.ps1
@@ -2,11 +2,22 @@
# http://www.markembling.info/view/my-ideal-powershell-prompt-with-git-integration
function Get-GitDirectory {
- if ($Env:GIT_DIR) {
- $Env:GIT_DIR
- } else {
- Get-LocalOrParentPath .git
+ $gitDir = git rev-parse --git-dir 2>$null
+ if ($LastExitCode -ne 0) {
+ return
+ }
+ (Resolve-Path $gitDir).Path
+}
+
+function Get-GitWorkingDirectory {
+ $cdup = git rev-parse --show-cdup
+ if ($LastExitCode -ne 0) {
+ return
}
+ if (!$cdup) {
+ $cdup = "."
+ }
+ (Resolve-Path $cdup).Path
}
function Get-GitBranch($gitDir = $(Get-GitDirectory), [Diagnostics.Stopwatch]$sw) {
@@ -77,6 +88,14 @@ function Get-GitBranch($gitDir = $(Get-GitDirectory), [Diagnostics.Stopwatch]$sw
}
}
+$Global:GitStatusCache = @{
+ GitDirectory = $null
+ Status = $null
+ Watchers = $null
+ Events = "Changed", "Created", "Deleted", "Renamed"
+ NextSubscriberId = 0
+}
+
function Get-GitStatus($gitDir = (Get-GitDirectory)) {
$settings = $Global:GitPromptSettings
$enabled = (-not $settings) -or $settings.EnablePromptStatus
@@ -87,6 +106,29 @@ function Get-GitStatus($gitDir = (Get-GitDirectory)) {
} else {
$sw = $null
}
+
+ $events = @(Get-Event | ?{ $_.SourceIdentifier -like "GitStatusEvent*" })
+ foreach ($event in $events) {
+ Remove-Event $event.EventIdentifier
+ }
+ if (($gitDir -eq $Global:GitStatusCache.GitDirectory) -and $Global:GitStatusCache.Status) {
+ if (!$events) {
+ dbg 'Reusing old status' $sw
+ return $Global:GitStatusCache.Status
+ }
+ }
+
+ # Stop listening for changes before running git-status so we don't pick
+ # up changes from that command.
+ if ($Global:GitStatusCache.Watchers) {
+ foreach ($watcher in $Global:GitStatusCache.Watchers) {
+ $watcher.Dispose()
+ }
+ foreach ($event in (Get-EventSubscriber | ?{ $_.SourceIdentifier -like "GitStatusEvent*" })) {
+ Unregister-Event $event.SourceIdentifier
+ }
+ }
+
$branch = $null
$aheadBy = 0
$behindBy = 0
@@ -170,6 +212,28 @@ function Get-GitStatus($gitDir = (Get-GitDirectory)) {
HasUntracked = [bool]$filesAdded
}
+ $Global:GitStatusCache.GitDirectory = $gitDir
+ $workingDirectory = Get-GitWorkingDirectory
+ $directoriesToWatch = @($workingDirectory)
+ # Submodules' .git directories are not contained within their working
+ # directory, so we have to watch them separately.
+ if ((Split-Path $gitDir) -ne $workingDirectory) {
+ $directoriesToWatch += $gitDir
+ }
+ $Global:GitStatusCache.Watchers = $directoriesToWatch | %{
+ dbg "Watching for changes in $_" $sw
+ $watcher = New-Object IO.FileSystemWatcher $_ -Property @{
+ IncludeSubdirectories = $true
+ EnableRaisingEvents = $true
+ }
+ foreach ($event in $Global:GitStatusCache.Events) {
+ $id = $Global:GitStatusCache.NextSubscriberId++
+ Register-ObjectEvent $watcher $event -SourceIdentifier "GitStatusEvent$id"
+ }
+ $watcher
+ }
+ $Global:GitStatusCache.Status = $result
+
dbg 'Finished' $sw
if($sw) { $sw.Stop() }
return $result
View
13 Utils.ps1
@@ -15,19 +15,6 @@ function Coalesce-Args {
Set-Alias ?? Coalesce-Args -Force
-function Get-LocalOrParentPath($path) {
- $checkIn = Get-Item .
- while ($checkIn -ne $NULL) {
- $pathToTest = [System.IO.Path]::Combine($checkIn.fullname, $path)
- if (Test-Path $pathToTest) {
- return $pathToTest
- } else {
- $checkIn = $checkIn.parent
- }
- }
- return $null
-}
-
function dbg ($Message, [Diagnostics.Stopwatch]$Stopwatch) {
if($Stopwatch) {
Write-Verbose ('{0:00000}:{1}' -f $Stopwatch.ElapsedMilliseconds,$Message) -Verbose # -ForegroundColor Yellow
View
1  posh-git.psm1
@@ -18,6 +18,7 @@ Export-ModuleMember -Function @(
'Get-GitStatus',
'Enable-GitColors',
'Get-GitDirectory',
+ 'Get-GitWorkingDirectory',
'TabExpansion',
'Get-AliasPattern',
'Start-SshAgent',
Something went wrong with that request. Please try again.