New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

unnecessary git clone when updating package commit sha #1620

Open
liskin opened this Issue Jan 7, 2016 · 13 comments

Comments

Projects
None yet
4 participants
@liskin
Contributor

liskin commented Jan 7, 2016

Steps to reproduce:

  1. put this in some project's stack.yaml:

    packages:
    - '.'
    - location:
        git: git://github.com/tfausak/strive.git
        commit: cfbd995dacafb1242482c0b6ce23826fad505338
      extra-dep: true
    
  2. stack build

  3. change commit to c39704d5f008af42f5c3c2d5319476c7aa12e52d

  4. stack build again

Expected results:

  • the second stack build does just git remote update; git reset --hard

Actual results:

2016-01-07 14:06:07.661449: [debug] Run process: git clone git://github.com/tfausak/strive.git /home/tomi/src/strava-gear/.stack-work/downloaded/5727e3e6b7fab4ac5241b42c7d83482fbddc6f05b1b698540df4ab681225a8ee.git.tmp/ @(stack_1RV4odVDW2e8SkMJA1Z8Zb:System.Process.Read src/System/Process/Read.hs:255:3)
2016-01-07 14:06:13.978009: [debug] Run process: git reset --hard c39704d5f008af42f5c3c2d5319476c7aa12e52d @(stack_1RV4odVDW2e8SkMJA1Z8Zb:System.Process.Read src/System/Process/Read.hs:255:3)

This means whenever I want to update the dep, the entire git repository is cloned over the network again. That's not very good. :-(

I may use git submodules as a workaround, but it's not nice.

@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented Jan 8, 2016

Hmm, yeah, I'm surprised it doesn't already do it this way. Looks like it's naming the package's directory based on the SHA. Since we don't know the previous SHA, we'll need to change this naming schema in order to reuse the prior git repo.

mgsloan added a commit that referenced this issue May 13, 2016

Use shorter hash for downloaded git repos #1620
Makes it consistent with snapshot hashes, and potentially reduces
windows MAX_PATH problems
@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented May 13, 2016

Fixed on master!

@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented May 21, 2016

Re-opening this, because my change only fixed the case where the commit was already in the repo.

To fix properly we'll need to use git fetch

@mgsloan mgsloan reopened this May 21, 2016

@sjakobi

This comment has been minimized.

Contributor

sjakobi commented May 21, 2016

Re-opening this, because my change only fixed the case where the commit was already in the repo.

Oh, and could we also get some logging? Depending on repo size and internet connection the wait can be a surprising.

@da-x

This comment has been minimized.

Contributor

da-x commented May 21, 2016

About git fetch, maybe you can do something like git fetch origin -f <git-commit-hash>:sync-for-build && git reset --hard sync-for-build (which is agnostic to local branch name), to ensure 'fetch' only brings the requested commit from the remote and not all changes from all remote branches.

@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented May 21, 2016

I think it also makes sense to use --shallow to avoid downloading the full history, I see only upsides, no downsides.

@da-x

This comment has been minimized.

Contributor

da-x commented May 21, 2016

For fetch? It only has --unshallow. you mean --update-shallow? At any case, it would the most efficient for the shallow clone invocation to bring only the requested commit. So overall, history would not be brought either for fetch or clone. Should also verify that these operations work well for repos with submodules.

@da-x

This comment has been minimized.

Contributor

da-x commented May 21, 2016

I'm missing something - when you edit the git-hash for a dependency in the stack.yaml and re-run Stack, how does Stack knows to reuse the previous clone? What prevents the previous clone to sit idly in a directory bearing the previous hash?

@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented May 21, 2016

I meant git clone --depth 1 to avoid downloading full history. I just found out why we don't do that, though http://docs.anybox.fr/anybox.recipe.openerp/trunk/devnotes/git-sparse-shallow.html / http://thread.gmane.org/gmane.comp.version-control.git/115811

Summary is that it's too computationally intensive to allow fetching specific SHAs, as it requires determining if it's accessible via a ref. Skipping the reachability check is considered to be a potential security issue and it's a non-default repo option.

We could have heuristics to determine if something looks like a tag, and do a shallow clone. Dunno if this complication is worth it.

@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented May 21, 2016

I'm missing something - when you edit the git-hash for a dependency in the stack.yaml and re-run Stack, how does Stack knows to reuse the previous clone? What prevents the previous clone to sit idly in a directory bearing the previous hash?

It doesn't, actually. I made some changes towards doing this, but currently it just deletes the old repo and reclones.

At least changing things to use git fetch instead of deleting the repo would be good.

mgsloan added a commit that referenced this issue May 21, 2016

Send git clone output to user #1620
+ Better logging
+ Tweak error handling

mgsloan added a commit that referenced this issue Nov 27, 2016

@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented Nov 27, 2016

This turns out to be tricky to get right! Downgrading to P3

@mgsloan mgsloan modified the milestones: P3: Optional, P2: Should Nov 27, 2016

@liskin

This comment has been minimized.

Contributor

liskin commented Nov 27, 2016

Does it really ever remove the old repo?

(Does stack ever remove anything? :-) My experience is that it basically expects an unlimited space and breaks in unexpected ways if someone manually tries to remove old things.)

@mgsloan

This comment has been minimized.

Collaborator

mgsloan commented Nov 27, 2016

No, the old repo isn't removed, and yes that could build up garbage. Removing .stack-work will clear it out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment