Cache all packages.Load calls in a central object #988

vektah · 2020-01-15T13:30:20Z

The idea here is to have a central proxy to packages.Load that can be invalidated as files are written out, reducing the number of package.Load calls way down.

Would love to get your thoughts @vikstrous.

Alternative to #945 and #943 that includes some fairly heavy refactor.

Updated timing for one of our apps:

v0.10

real 0m7.437s
user 0m17.978s
sys 0m1.751s

master

real 0m5.677s
user 0m10.742s
sys 0m1.205s

package-cache

real 0m2.987s
user 0m5.286s
sys 0m0.608s

coveralls · 2020-01-16T04:54:34Z

Coverage increased (+1.1%) to 64.465% when pulling 9c87c28 on package-cache into c6b3e2a on master.

vikstrous · 2020-01-19T00:01:08Z

I'm seeing pretty different results from you:

package-cache
37.77user 18.94system 0:23.36elapsed 242%CPU (0avgtext+0avgdata 197848maxresident)k
0inputs+0outputs (1major+387685minor)pagefaults 0swaps
43.19user 19.77system 0:26.45elapsed 238%CPU (0avgtext+0avgdata 201328maxresident)k
0inputs+0outputs (1major+372326minor)pagefaults 0swaps

master (ae79e75)
39.62user 22.05system 0:26.89elapsed 229%CPU (0avgtext+0avgdata 195872maxresident)k
0inputs+0outputs (1major+428290minor)pagefaults 0swaps
45.03user 23.43system 0:29.11elapsed 235%CPU (0avgtext+0avgdata 201928maxresident)k
0inputs+0outputs (1major+408359minor)pagefaults 0swaps

v0.10.2
37.58user 21.98system 0:25.15elapsed 236%CPU (0avgtext+0avgdata 196608maxresident)k
0inputs+0outputs (1major+434955minor)pagefaults 0swaps
37.08user 21.68system 0:24.70elapsed 237%CPU (0avgtext+0avgdata 194676maxresident)k
0inputs+0outputs (1major+435000minor)pagefaults 0swaps

v0.10.1 (without skip_validation and without disabling model generation)
132.90user 77.27system 1:24.02elapsed 250%CPU (0avgtext+0avgdata 605844maxresident)k
0inputs+0outputs (1major+1054116minor)pagefaults 0swaps
111.52user 62.92system 1:13.44elapsed 237%CPU (0avgtext+0avgdata 187124maxresident)k
0inputs+0outputs (1major+760962minor)pagefaults 0swaps

no-double-load (rebased on master)
26.90user 10.25system 0:16.76elapsed 221%CPU (0avgtext+0avgdata 197608maxresident)k
0inputs+0outputs (1major+200472minor)pagefaults 0swaps
25.54user 10.30system 0:16.09elapsed 222%CPU (0avgtext+0avgdata 188176maxresident)k
0inputs+0outputs (1major+195789minor)pagefaults 0swaps

vektah · 2020-01-19T02:42:12Z

I'm seeing pretty different results from you:

The app I'm testing against has:

modelgen on: turning it off would be a fairly large chore. Lots of manual wiring would need to be done even to just bring in existing models.
autobind off: are you using wildcards in your autoload path? Could that be causing a large difference?
validation off

master (ae79e75)

master should include c6b3e2a, I was testing with ec4f6b1

I cant see any reason this should be any slower than no-double-load, it should be loading the same packages but fewer times. Could you try running this branch with this patch applied to log each call to packages.Load?

diff --git a/internal/code/packages.go b/internal/code/packages.go
index b14c45a..2bf10ba 100644
--- a/internal/code/packages.go
+++ b/internal/code/packages.go
@@ -2,6 +2,7 @@ package code
 
 import (
        "bytes"
+       "fmt"
        "path/filepath"
 
        "github.com/pkg/errors"
@@ -43,6 +44,7 @@ func (p *Packages) LoadAll(importPaths ...string) []*packages.Package {
 
        if len(missing) > 0 {
                p.numLoadCalls++
+               fmt.Println("LOAD", missing)
                pkgs, err := packages.Load(&packages.Config{Mode: mode}, missing...)
                if err != nil {
                        p.loadErrors = append(p.loadErrors, err)
@@ -85,6 +87,7 @@ func (p *Packages) LoadWithTypes(importPath string) *packages.Package {
        pkg := p.Load(importPath)
        if pkg == nil || pkg.TypesInfo == nil {
                p.numLoadCalls++
+               fmt.Println("LOAD", importPath)
                pkgs, err := packages.Load(&packages.Config{Mode: mode}, importPath)
                if err != nil {
                        p.loadErrors = append(p.loadErrors, err)
@@ -118,6 +121,7 @@ func (p *Packages) NameForPackage(importPath string) string {
        if pkg == nil {
                // otherwise do a name only lookup for it but dont put it in the package cache.
                p.numNameCalls++
+               fmt.Println("LOAD NAMES", importPath)
                pkgs, err := packages.Load(&packages.Config{Mode: packages.NeedName}, importPath)
                if err != nil {
                        p.loadErrors = append(p.loadErrors, err)
-- 
2.20.1

to see whats actually getting loaded when? compare with something similar on no-double-load?

vikstrous · 2020-01-19T03:18:51Z

Here's a redacted version of what I'm seeing with your print statements. I added "MISSING" on line 47 to differentiate the two places where loads happen.

LOAD MISSING [x/models]
LOAD MISSING [github.com/99designs/gqlgen/graphql/introspection x/service1resolver x/service2resolver x/service3resolver x/service4resolver github.com/99designs/gqlgen/graphql/introspection]
LOAD github.com/99designs/gqlgen/graphql
LOAD x/redacted1
LOAD x/redacted2
LOAD x/redacted3
LOAD x/redacted4
LOAD x/redacted5
LOAD x/redacted6
LOAD x/redacted7
LOAD x/redacted8
LOAD NAMES github.com/vektah/gqlparser
LOAD MISSING [x/resolverroot x/resolverroot]

I think the big difference between multiple load calls and one big load call is that shared dependencies of the various packages. Our codebase is more tightly coupled than I would like and there's a lot of overlap between dependencies from each call.

I did an experiment:

multiLoadTest:

package main

import "golang.org/x/tools/go/packages"

var mode = packages.NeedName |
	packages.NeedFiles |
	packages.NeedImports |
	packages.NeedTypes |
	packages.NeedSyntax |
	packages.NeedTypesInfo

func main() {
	packages.Load(&packages.Config{Mode: mode}, "redacted")
	...
	packages.Load(&packages.Config{Mode: packages.NeedName}, "github.com/vektah/gqlparser")
	packages.Load(&packages.Config{Mode: mode}, "redacted")
}

singleLoadTest:

package main

import "golang.org/x/tools/go/packages"

var mode = packages.NeedName |
	packages.NeedFiles |
	packages.NeedImports |
	packages.NeedTypes |
	packages.NeedSyntax |
	packages.NeedTypesInfo |
	packages.NeedName

func main() {
	_, err := packages.Load(&packages.Config{Mode: mode}, redacted)
	if err != nil {
		panic(err)
	}
}

multiLoadTest:

29.04user 15.36system 0:17.14elapsed 259%CPU (0avgtext+0avgdata 91152maxresident)k
0inputs+0outputs (1major+377223minor)pagefaults 0swaps
39.74user 20.35system 0:22.61elapsed 265%CPU (0avgtext+0avgdata 89468maxresident)k
0inputs+0outputs (1major+348847minor)pagefaults 0swaps
32.08user 17.35system 0:19.00elapsed 260%CPU (0avgtext+0avgdata 90116maxresident)k
0inputs+0outputs (1major+359011minor)pagefaults 0swaps

singleLoadTest:

6.54user 3.19system 0:03.79elapsed 256%CPU (0avgtext+0avgdata 154904maxresident)k
0inputs+0outputs (1major+109919minor)pagefaults 0swaps
6.70user 3.35system 0:03.98elapsed 252%CPU (0avgtext+0avgdata 153816maxresident)k
0inputs+0outputs (1major+102084minor)pagefaults 0swaps

The conclusion is that this PR actually makes the situation worse, not better. Could you share the results of adding the print statements for your repo? Can you try running my experiment with your set of loads?

vikstrous · 2020-01-19T03:20:15Z

btw, we have autobind on, validation off and model generation off

vektah · 2020-01-19T22:38:04Z

Interesting. This PR should be doing a single load. Why aren't redacted being covered by ReferencedPackages in this branch but they are in package-cache?

vektah · 2020-02-04T11:29:18Z

I'm going to roll forward with this - This gives us speed gains for the apps we can test with and centralizes a bunch of loading logic.

Would love to get to the bottom of why ReferencedPackages isnt covering your deps but that can happen on master.

Cache all packages.Load calls in a central object

Cache all packages.Load calls in a central object

9ccd7ed

vektah force-pushed the package-cache branch from 9c87c28 to 9ccd7ed Compare February 4, 2020 11:22

vektah merged commit 8ed2ec5 into master Feb 4, 2020

vektah deleted the package-cache branch February 4, 2020 11:29

vektah mentioned this pull request Feb 4, 2020

deduplicate package load between AutoBind and Binder #945

Closed

1 task

cgxxv pushed a commit to cgxxv/gqlgen that referenced this pull request Mar 25, 2022

Merge pull request 99designs#988 from 99designs/package-cache

30d7814

Cache all packages.Load calls in a central object

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache all packages.Load calls in a central object #988

Cache all packages.Load calls in a central object #988

vektah commented Jan 15, 2020 •

edited

Loading

coveralls commented Jan 16, 2020

vikstrous commented Jan 19, 2020 •

edited

Loading

vektah commented Jan 19, 2020

vikstrous commented Jan 19, 2020

vikstrous commented Jan 19, 2020

vektah commented Jan 19, 2020 •

edited

Loading

vektah commented Feb 4, 2020

Cache all packages.Load calls in a central object #988

Cache all packages.Load calls in a central object #988

Conversation

vektah commented Jan 15, 2020 • edited Loading

v0.10

master

package-cache

coveralls commented Jan 16, 2020

vikstrous commented Jan 19, 2020 • edited Loading

vektah commented Jan 19, 2020

vikstrous commented Jan 19, 2020

vikstrous commented Jan 19, 2020

vektah commented Jan 19, 2020 • edited Loading

vektah commented Feb 4, 2020

vektah commented Jan 15, 2020 •

edited

Loading

vikstrous commented Jan 19, 2020 •

edited

Loading

vektah commented Jan 19, 2020 •

edited

Loading