gb vendor fetch: do not check out same remote repository for different import paths #645

seh · 2016-09-22T19:32:06Z

When one runs gb vendor fetch, gb calls main.fetch to acquire the remote repository, copy a portion of it, and then discard its copy. After that, so long as its -no-recurse flag is false, it proceeds to fetch the missing transitive dependencies of the source it has acquired thus far.

The problem arises when one requests an import path from a repository whose files in turn import other paths within that same repository. Consider a hypothetical repository:

example.com/org/repo/.git
example.com/org/repo/p1
- file1.go

package p1

import "example.com/org/repo/p2"

var P p2.Something

example.com/org/repo/p2
- file2.go

package p2

type Something string

If one runs

gb vendor fetch example.com/org/repo/p1

then gb will fetch the repository example.com/org/repo, copy the p1 path within it, then proceed to fetch the same repository again, then copy the p2 path within it.

This doesn't matter much for small repositories, but for large ones it can take many hours, wasting bandwidth and churning the disk unnecessarily. Consider augmenting main.fetch to remember the set of repositories it has downloaded since its initial top-level invocation, and to destroy them all only when unwinding back up to the top level. Intermediate recursive invocations could share that repository cache to avoid downloading the same repository more than once.
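To make the suggestion concrete, here is a minimal sketch of such a shared repository cache. It is not gb's actual code: repoCache, checkout, repoRoot, and this simplified fetch are hypothetical names made up for illustration, the download step is a stub, and the repository root is assumed to be the first three elements of the import path.

```go
package main

import (
	"fmt"
	"strings"
)

// repoCache is a hypothetical cache shared by every recursive fetch
// that stems from one top-level invocation.
type repoCache struct {
	download  func(root string) (string, error) // stub: clones root, returns a local directory
	dirs      map[string]string                 // repository root -> local checkout directory
	downloads int                               // number of real downloads performed
}

func newRepoCache(download func(string) (string, error)) *repoCache {
	return &repoCache{download: download, dirs: map[string]string{}}
}

// checkout returns the local directory for root, downloading it only
// the first time that root is requested.
func (c *repoCache) checkout(root string) (string, error) {
	if dir, ok := c.dirs[root]; ok {
		return dir, nil
	}
	dir, err := c.download(root)
	if err != nil {
		return "", err
	}
	c.dirs[root] = dir
	c.downloads++
	return dir, nil
}

// repoRoot is an assumption for this sketch: it treats the first three
// path elements (host/org/repo) as the repository root.
func repoRoot(importPath string) string {
	parts := strings.SplitN(importPath, "/", 4)
	if len(parts) > 3 {
		parts = parts[:3]
	}
	return strings.Join(parts, "/")
}

// fetch vendors importPath and then recurses into its imports, reusing
// the cache rather than downloading the repository again.
func fetch(c *repoCache, importPath string, imports map[string][]string, vendored map[string]bool) error {
	if vendored[importPath] {
		return nil
	}
	if _, err := c.checkout(repoRoot(importPath)); err != nil {
		return err
	}
	vendored[importPath] = true // stand-in for copying the package directory
	for _, dep := range imports[importPath] {
		if err := fetch(c, dep, imports, vendored); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	// p1 imports p2, as in the hypothetical repository above.
	imports := map[string][]string{
		"example.com/org/repo/p1": {"example.com/org/repo/p2"},
	}
	c := newRepoCache(func(root string) (string, error) {
		return "/tmp/" + root, nil // pretend the clone succeeded
	})
	vendored := map[string]bool{}
	if err := fetch(c, "example.com/org/repo/p1", imports, vendored); err != nil {
		fmt.Println("fetch failed:", err)
		return
	}
	fmt.Println("packages vendored:", len(vendored), "repository downloads:", c.downloads)
}
```

With the stub in place, both p1 and p2 are vendored from a single download of example.com/org/repo. A real implementation would also remove every cached checkout once the top-level call returns.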


davecheney · 2016-09-22T21:18:28Z

Yes, this is something I need to fix. It's not just inefficient, it's actually wrong to cherry-pick parts of a repo.


seh changed the title ~~gb vendor fetch: do not checkout same remote repository for different import paths~~ gb vendor fetch: do not check out same remote repository for different import paths Sep 22, 2016
