Checking in the vendor folder in Git #112

Closed
fervic opened this issue Oct 20, 2015 · 32 comments

Comments

@fervic

fervic commented Oct 20, 2015

Hi there,

This is more of a question than an issue...

I just started using glide and I noticed a couple of things:

  • Pinning does not occur by default
  • Dependency repos were added in a nested fashion

So, when I added the changed files, Git reported the entries inside the vendor folder as files that just contained a commit id. I thought that was great: first, it doesn't take much extra space, and second, if something breaks after somebody runs glide up because a dependency's version changed, you can look through your commit history for the commit id at which that dependency last worked.

But then I noticed that if the vendor folder already exists, glide up doesn't work well, so I may have to ignore the whole folder.

I don't know exactly why it doesn't work or what it is trying to do; I just know that after removing the folder everything worked fine.

So my question is ... can I make it work having an existing vendor folder that only contains references to the commit ids of the dependencies?

@technosophos
Member

Can you give the particular error? We've been chatting about this for a while. Since Glide keeps the repos as they are, if you try to check in the vendor folder, Git will treat this as a submodule (is that right, @mattfarina ?), which gets weird. We've talked about a few solutions, but nothing has emerged as particularly easy for users. So if you have some ideas, we'd be happy to hear.

We're re-working pinning to make that easier to do, and the big commit that happened today switched from nested vendor folders to a flatter architecture (because nesting causes some bad behaviors). The 0.7 version will be a pretty big change.

@fervic
Author

fervic commented Oct 20, 2015

I'll play with it a little in order to provide more details. My first impression is that treating it as a submodule is a great approach, but I don't know what side effects or complications that brings.

I can't classify it as an error. I initialized a glide project, added some dependencies and committed the changes (including the vendor folder); then a co-worker cloned it, and when he ran glide up it asked for Git credentials for each of the dependencies it was trying to download, but even after he entered the credentials, it didn't download anything.

But, again, I'll play with it a little bit more and provide better detail. I just wanted to know if I was missing some steps or if that was already expected/known behavior.

Thanks!

@mattfarina
Member

I think the issues are with the git submodule workflow.

I typically suggest two different workflows.

  1. Don't check the vendor folder in and set the ref in the glide.yaml file. This lets glide manage the versions of dependencies in each environment.
  2. Check in the vendor/ folder but don't keep the VCS (e.g., .git) directories. This avoids submodules. Then glide.yaml can hold the references and Glide will still manage the updates: it will detect that the VCS directory is missing and update to the latest or the referenced version when glide up is run. That takes longer, since it needs to fetch the whole repo rather than just the changes.

Going down the submodule route will bring in the submodule commands. I'm not entirely sure of the best way to do that, since I don't use submodules. As I read it, that's what the original issue description is referring to.

Does that help? I know I'm not answering your exact question. I'm not sure the best way to use submodules.

@sdboyer
Member

sdboyer commented Oct 21, 2015

best way to use submodules: don't use submodules.

no, but really. there's almost nothing you can't accomplish with subtrees, and submodules are so fraught with traps, unexpected behaviors, necessary workflow changes...there is a small, almost negligible subset of workflows and use cases for which they are appropriate, but including them into a general purpose tool like glide is a recipe for heartache.

here's a pretty nice, fairly recent overview.

@fervic
Author

fervic commented Oct 21, 2015

Based on @mattfarina's comment and reading the article @sdboyer shared:

"Finally, the submodule commit referenced by the container is stored using its SHA1, not a volatile reference (such as a branch name). Because of this, a submodule does not automatically upgrade which is a blessing in disguise when it comes to reliability, maintenance and QA."

So if vendor/ is checked in as I initially wanted, project collaborators would have to use submodule commands to pull or update the dependencies, unless Glide becomes smart (complex) enough to do that on glide up.

Will submodules continue being the preferred road?

@sdboyer
Member

sdboyer commented Oct 21, 2015

yeah, that line stuck out to me too - i actually had the feeling that i should note it in responding here. here's a slightly less glib breakdown (sorry, i was tired last night).

because submodules track an immutable commit, not a mutable ref (branch or tag), they never move unless you explicitly tell them to. and when you tell them to, that's actually a change in the parent repository. what the parent repository tracks is a special, git-only filetype object, as evidenced by the diff when you add a submodule:

diff --git a/testrepo b/testrepo
new file mode 160000
index 0000000..4d59fb5
--- /dev/null
+++ b/testrepo
@@ -0,0 +1 @@
+Subproject commit 4d59fb584b15a94d7401e356d2875c472d76ef45

the file's "contents" are that string. and that file is of a special, different type, as indicated by git-ls-tree:

$ git ls-tree HEAD .
100644 blob 737c29c80fe97b04d2d2d1e3559a519c621f766f    README.txt
040000 tree 0676c8fbd93260d51519c3b61528a01343d62bdc    somedir
160000 commit 4d59fb584b15a94d7401e356d2875c472d76ef45  testrepo

i point this out just to say that "there be magic here." the primary purpose of that magic is to allow git to manage multiple discrete repositories together in a lockstep, controlled way. in fact, the case it's really designed for is managing external, vendor dependencies - they don't update often, and when they do, it's a significant event that should be reflected by a commit in the main repository.

that should sound familiar - it's a subset of the responsibility of a tool like glide (or npm, or composer, or bundler, or cargo, or...). which is actually an argument AGAINST their use here.

git submodules try to be a general solution to the problem of vendoring, but operate with fewer constraints AND less information than modern package mgmt tooling. they have their own manifests (.gitmodules), and lockfiles (invisible on disk - they're the magical 'commit' object type from the git-ls-tree output above). and, like all projects in this space, there's a whole mess to be dealt with when it comes to synchronizing the myriad states that can exist on disk vis-a-vis the manifests and lockfiles. they have a set of commands for dealing with this, but those commands have always failed to deliver because (IMO) they operate within the constraints of git's lifecycle - e.g., there's unexpected extra work to be done after a git clone.

all of this is to say that, unlike many other parts of git, submodules should not be looked at as just another piece of git plumbing that can effectively act as a library to deliver the functionality you want in your tool. they're a standalone thing, and if you've already got your own manifests (which glide does), then they fundamentally don't offer anything except an integration headache.
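The 160000 "commit" entries in the git ls-tree output above are how a submodule pointer (a "gitlink") is recorded in the parent tree, and they are what shows up when a vendor/ folder full of nested clones gets committed. As a rough sketch, assuming the default "<mode> <type> <object><TAB><path>" line format and paths without special characters (Git quotes those unless -z is passed), a small Go helper could pick such entries out of `git ls-tree -r HEAD vendor/` output. The name gitlinkPaths is hypothetical and not part of Glide or Git.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// gitlinkPaths is a hypothetical helper: it scans `git ls-tree` output and
// returns the paths recorded as gitlinks (mode 160000, type "commit"),
// mapped to the commit each one pins. Assumes the default line format
// "<mode> <type> <object>\t<path>" with unquoted paths.
func gitlinkPaths(lsTree string) map[string]string {
	links := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(lsTree))
	for sc.Scan() {
		meta, path, ok := strings.Cut(sc.Text(), "\t")
		if !ok {
			continue // not an ls-tree entry line
		}
		f := strings.Fields(meta)
		if len(f) == 3 && f[0] == "160000" && f[1] == "commit" {
			links[path] = f[2]
		}
	}
	return links
}

func main() {
	// The ls-tree output from the comment above, with real tab separators.
	out := "100644 blob 737c29c80fe97b04d2d2d1e3559a519c621f766f\tREADME.txt\n" +
		"040000 tree 0676c8fbd93260d51519c3b61528a01343d62bdc\tsomedir\n" +
		"160000 commit 4d59fb584b15a94d7401e356d2875c472d76ef45\ttestrepo\n"
	for path, sha := range gitlinkPaths(out) {
		fmt.Println(path, "->", sha)
	}
}
```

On the sample output only testrepo is reported; blobs and trees are ignored.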

@mattfarina
Member

@fervic if you use glide to track your versions (a reference, or in the near future a semantic version), then I would not use submodules. Instead, let glide manage your versions.

I would be wary of checking in vendor/. Be careful how you do so, because Glide is going to try to flatten out vendor items. Let me explain...

- $GOPATH/foo
    - vendor/
        - a
        - bar/vendor/a
        - baz/vendor/a

In the above structure the package a shows up 3 times: once for the project foo and once each for two of the projects foo includes (bar and baz). Go will see these as 3 different packages, and each one will be included in the binary. Instances of objects/types from one are not directly compatible with the others, even if it's the same version in each case.

Go finds the right a package by looking up the directory tree for it in a vendor/ directory. If it doesn't find it there, it tries $GOROOT and then $GOPATH. If a is removed from the bar and baz projects, they will use the one in foo/vendor. This is what we call flattening.
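The vendor lookup order described above can be sketched as a function that lists the vendor/ directories checked for an import, innermost first. This is a simplified model of GOPATH-mode vendoring (Go 1.5 with GO15VENDOREXPERIMENT=1, on by default in Go 1.6): vendorCandidates is a hypothetical helper, it only lists candidate paths, and it leaves out the $GOROOT and $GOPATH fallbacks that come afterwards. Go uses the first candidate that exists and contains Go files.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// vendorCandidates is a hypothetical helper modelling GOPATH-mode vendor
// lookup: for a package in pkgDir importing importPath, it lists the
// <dir>/vendor/<importPath> paths for pkgDir and each ancestor up to and
// including srcRoot ($GOPATH/src), innermost first. Paths use forward slashes.
func vendorCandidates(srcRoot, pkgDir, importPath string) []string {
	srcRoot, pkgDir = path.Clean(srcRoot), path.Clean(pkgDir)
	if pkgDir != srcRoot && !strings.HasPrefix(pkgDir, srcRoot+"/") {
		return nil // outside this GOPATH: vendor lookup doesn't apply
	}
	var out []string
	for dir := pkgDir; ; dir = path.Dir(dir) {
		out = append(out, path.Join(dir, "vendor", importPath))
		if dir == srcRoot {
			return out
		}
	}
}

func main() {
	// Package bar, vendored inside foo, importing a.
	for _, c := range vendorCandidates("/gopath/src", "/gopath/src/foo/vendor/bar", "a") {
		fmt.Println(c)
	}
}
```

For the foo example, bar's import of a is checked against foo/vendor/bar/vendor/a before foo/vendor/a, so deleting bar's (and baz's) nested copy leaves the single top-level one in use.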

If all the projects check in their dependencies to the vendor/ folder and are then included by other projects there is potential for issues. We suggest just keeping dependencies at the top application level. Glide can flatten dependencies recorded from Godep, GPM, and gb as well.

If you do choose to check dependencies in I would suggest removing the VCS (e.g., .git) directory and letting Glide manage the version.

Does that help?

@fervic
Author

fervic commented Oct 21, 2015

@mattfarina yes that helps, thanks!

But the thing is, in a normal workflow, Glide will use Git submodules by default; that's what happened to me. And the vendor/ folder is not ignored unless the user does it.

Let me explain what I did (this is all using v0.6.1):

I had an existing project, so I went into its root folder and issued the following commands:

$ glide create
$ glide get github.com/<user A>/<project>
$ glide get github.com/<user B>/<project>
...
$ glide get github.com/<user N>/<project>
$ glide install

Then:

$ git add -A # to add everything Glide created for me
$ git commit
$ git push

In my local repo it downloaded all the files from the external packages, but to the remote (GitHub) it pushed the following files:
vendor/github.com/<user A>/<project>
vendor/github.com/<user B>/<project>
...
vendor/github.com/<user N>/<project>

Each of these is a text file containing the commit hash (this is how Git records submodules).

I see this as Glide's current default behavior, so I wonder whether the README should state some of this more clearly, or whether glide create should be smarter and create or append to the .gitignore file. And should each glide get pin to the latest master commit by default, leaving the user to decide whether to change it? Maybe I'm missing the reasoning behind not pinning by default?

Regards!

P.S. I can help by contributing code, but I wouldn't want to go against your ideas. For example, what about @sdboyer's suggestion of not using submodules? He makes a good point that Glide shouldn't rely on them to deliver certain functionality.

@mattfarina
Member

I see where you're coming from now. There's a difference between theory and practice. Let me explain and possibly provide a path forward.

Glide is doing this for you because you're using Git. Glide works with git, svn, bzr, and hg. It's not tightly coupled to git but works with it. So, in theory it's up to the user to manage their vendor/ directory the way they would like for their VCS of choice.

Glide doesn't rely on submodules. Instead it's like composer or npm in that storing the packages or not is left as an exercise for the end user.

In practice there are cases where things like this can happen that aren't entirely clear. Given the VCS agnostic nature of Glide, how would you suggest conveying some good practices? There is a section in the FAQ about checking the vendor/ folder into your VCS.

Note, pinning to a commit will become less necessary, because we're going to start supporting semantic versioning. Thoughts?

@fervic
Copy link
Author

fervic commented Oct 22, 2015

@mattfarina

Just as a quick side note, I remember reading somewhere in the Go documentation that the Go authors didn't want to impose any versioning scheme for packages (like semver). I like semver, but Glide would depend on the community that publishes packages to use semver. So maybe sticking to a VCS reference is not such a bad choice.

Regarding Git and submodules, please forgive my ignorance and misunderstanding, I thought you were using the submodule feature on purpose, now I see this behavior is just the result of basic nested git cloning and actually taking the submodule route is one of the approaches users could take.

That said, my only thought is to make some small edits to the README: explain more emphatically why checking in the vendor/ folder is discouraged, and then add to the troubleshooting section the specific problems that may arise with the different VCSs if it is done. For example, in the case of Git, extra space doesn't seem to be a concern, but extra work is. I have no idea what problems arise with other VCSs.

Thanks again!

@mattfarina
Copy link
Member

@fervic there is a proposal for the Go community to adopt semver. Before moving forward they want some systems to implement semver. We already have gopkg.in but need more to move this forward. Some things already use it.

Commit ids really aren't useful enough. If someone releases a security release or bug fix and you only have commit ids will you know if you have a version with that security release or bug fix? This is an issue for package maintainers on systems like Ubuntu (according to Dave Cheney). So, we are looking to help move this along.
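To make the point about commit ids concrete: with semantic versions, checking whether an installed dependency already includes a fix is a plain comparison, while two commit ids cannot be ordered without consulting the repository history. A minimal Go sketch, assuming strict MAJOR.MINOR.PATCH strings (no pre-release or build suffixes) and assuming a fix is carried into every later release, which projects that backport fixes to older branches don't guarantee. parseVersion and includesFix are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion is a hypothetical helper accepting only "MAJOR.MINOR.PATCH",
// optionally prefixed with "v"; pre-release and build suffixes are rejected.
func parseVersion(s string) ([3]int, error) {
	var v [3]int
	parts := strings.Split(strings.TrimPrefix(s, "v"), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("not MAJOR.MINOR.PATCH: %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil || n < 0 {
			return v, fmt.Errorf("bad component %q in %q", p, s)
		}
		v[i] = n
	}
	return v, nil
}

// includesFix reports whether installed is the same as or newer than fixedIn.
// Assumes a fix released in fixedIn is present in every later version.
func includesFix(installed, fixedIn string) (bool, error) {
	a, err := parseVersion(installed)
	if err != nil {
		return false, err
	}
	b, err := parseVersion(fixedIn)
	if err != nil {
		return false, err
	}
	for i := range a {
		if a[i] != b[i] {
			return a[i] > b[i], nil
		}
	}
	return true, nil // identical versions
}

func main() {
	ok, err := includesFix("v1.4.2", "v1.4.1")
	fmt.Println(ok, err)
	// A bare commit id carries no ordering, so it can't be compared this way.
	_, err = includesFix("4d59fb5", "v1.4.1")
	fmt.Println(err)
}
```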

@albrow
Contributor

albrow commented Dec 31, 2015

I too was frustrated by the default git submodule behavior. See this StackOverflow question for more context.

Basically, I'm writing a library and wanted to check in the vendor directory so that users of the library would not necessarily need to install Glide. The goal is to get vendoring working with go get by default. I tried to avoid using submodules for a long time, but eventually just decided to stick with the default git behavior.

As far as I can tell, all the commands I want to run (e.g. go get, glide install, and git add) will work fine when using submodules. I have not run into any instances where I needed to run any submodule commands directly. That's not to say that I won't run into problems eventually, but it seems to be working for now.

EDIT: I spoke too soon. After completely removing the project locally, attempting to install again with go get produced an error. I'm trying to figure out how to resolve this now.

@technosophos
Member

@albrow keep us posted, please. I, too, have been frustrated that it seems like go get becomes unusable for glide projects.

I'm pondering what it would take to build a remote service along the lines of gopkg.in that could do this interim step of pulling a set of glide dependencies and then making them go get-able. I was recently told that Bundler and npm both do some server-side magic to allow their respective platforms to more quickly perform dependency resolution, and I think that could be something we look into.

What I'm imagining would be some service that could answer a go get ... request, and in the background check out a project, run a glide install, and then return the combined results to the requesting client.

@fwip

fwip commented Jan 18, 2016

@albrow Was the error something along the lines of No submodule mapping found in .gitmodules for path 'vendor/...' ? If so, this is due to a bug in Go 1.5's submodule handling. It should be fixed for 1.6 (commit here: golang/go@761ac75 ). In the meantime, you should be able to workaround the issue by executing go get mypackage ; go get -u mypackage - the update command successfully updates the submodules, it's just the initial get that's broken.

Personally, I like the idea of using git submodules to track dependencies - it's basically just an association of "package = url + revision". A tool like glide could assist you with picking out the correct revision, whether by comparing semver, tracking a branch, etc, while restoring go-gettability (with the above caveats).

@mattfarina
Member

First, this isn't about my opinion. More a matter of something we need to take into account for anything we craft.

"Now that you’ve seen the difficulties of the submodule system, let’s look at an alternate way to solve the same problem."

This is from the Git documentation. Many find submodules to be difficult to work with and they can be a source of confusion. I learned this answering questions about submodules in the past. Anyone attempting to leverage them I would suggest paying attention to your audience to make sure they are a good solution.

Again, I'm neither for nor against them. Rather, I'm aware of the usability issues that can happen and how they affect a project.

@stevvooe

@mattfarina Shouldn't glide be removing the .git portion of the dependency? This behavior is fairly surprising. Otherwise, we'd need to have an extra step to remove these files, making glide less useful. We want to vendor the dependencies.

An option can be added for those who wish to experience the pain of submodules.

@sym3tri

sym3tri commented Feb 22, 2016

For what it's worth, I tried out glide and found it to be a really awesome tool. Great work so far!

However this is the single issue that's blocking me and my company from using it in all of our projects. Removing the .git stuff is all that should be needed for us. Seems like a trivial change, and we'd be happy to help with the change if the maintainers are ok with this direction.

@mattfarina
Member

@stevvooe and @sym3tri A few things...

  1. You can't just remove the .git directories. Go and Glide support SVN, hg, and Bzr as well. Any tool that removes VCS directories should support all four.
  2. If the VCS directory isn't present Glide detects that and can update the code.
  3. That means the missing element is something to strip the VCS directories in the first place. I'm not sure if that should be in Glide proper or a plugin. @technosophos and I need to chat about it. Glide can have plugins in the same way Git does. If someone created a plugin I would link to it. I would ask that it be built to be cross platform for Windows support. If you want to make a case for it to be in Glide proper I'm ready to listen.

@stevvooe

@mattfarina I understand the issue here. glide opts not to remove source control directories, since other glide users may not check dependencies into source control.

To clarify, our requirements for a dependency management system are somewhat like the following:

  1. The project should work with go get and should adopt all the implications that come with this.
  2. Dependencies should be checked into the vendor/ folder.
  3. Building the project shouldn't require more than basic knowledge of git, go build tools or make.

From some of what you've written, it seems glide may not agree with number 2, although it is supported via submodules. Given number 3, however, submodules or plugins are a non-starter. If glide provided an option, possibly enabled by default, to remove the source control directories from vendored dependencies, numbers 2 and 3 would be satisfied and we could probably use glide.

How would this look? The project maintainer could set up glide to remove .git, .bzr, etc.:

glide init --remove-repository-directories 

The configuration file would have an option in the glide.yaml file:

package: github.com/docker/swarm-v2
vendor:
   repository_directories: remove
import:
- package: github.com/gogo/protobuf
  subpackages:
  - /gogoproto
...

Then, glide would be used normally by those submitting PRs. The options could also be set at a later date, where glide would remove the directories on the next run.

This is probably not an ideal solution, but I hope it helps to provide a little clarity. If you choose to take the project in another direction, I completely understand.

@dnephin

dnephin commented Feb 26, 2016

I don't understand why this is an issue.

Can't you just find vendor/ -type f -not -path "*/.git*" -exec git add {} \; ?

That way git won't add submodules. I believe the .git directories are even "gitignored" by default.

@sdboyer
Copy link
Member

sdboyer commented Feb 26, 2016

@dnephin

because:

  1. windows
  2. bzr
  3. hg
  4. svn

FWIW, i've switched my position on this - vendor should always be committable, and SCM metadata shouldn't be left around. it's just not trivial to do.

@dnephin

dnephin commented Feb 26, 2016

So I guess I can expand that find to exclude directories for bzr, hg, and svn, but windows I don't know about. Is there no equivalent to find on windows?

@LK4D4

LK4D4 commented Feb 26, 2016

@dnephin nooooo, no more bash pls :)

@sdboyer
Member

sdboyer commented Feb 26, 2016

what @LK4D4 said. But also:

find vendor/ -type f -not -path "*/.git*" -exec git add {} \;
                            not just ^  but also ^

Now there's a 4x4 matrix. Plus x2 for windows. And, does illumos have GNU find? Well, seems like this bug is well-tended...

@dnephin

dnephin commented Feb 26, 2016

@sdboyer I don't think that's true. The target SCM is always consistent for any given user, so the second git is fine, assuming this is just something that you add yourself as a script to the repo.

I guess it would be nice to have this as a plugin, or a go binary, but it feels like it's not absolutely necessary that it's part of glide core, because it's pretty easy to implement externally.

@LK4D4 at least it's just one line of bash that you only run when you add entirely new dependencies! if you're adding a single dependency, you can probably just do git add vendor/path/to/dep/* no real need for a script.

@sdboyer
Member

sdboyer commented Feb 26, 2016

@dnephin that's a fair point. i'm kinda just swooping in dickishly right now, anyway. sorry :)

really, i'm doing so because, while working on my writeup, my view has shifted to where I don't believe that any user choice or interaction is necessary here, at all. it just muddies the water. both the commit-deps and don't-commit-deps cases can work equally well, and transparently, using what i call the "sync-based" approach. you shouldn't HAVE to write that script.

@mattfarina and i were maybe gonna chat about that today, but it didn't come together. hopefully tomorrow. we'll see where things land.

@jrick
Contributor

jrick commented Feb 26, 2016

If you need that find functionality on Windows it's easily added in powershell:

Get-ChildItem vendor -File -Recurse | Where-Object { $_.FullName -notmatch '\\\.git\\' } | ForEach-Object { git add (Resolve-Path -Relative $_.FullName) }

Might be a better way to write that (I'm still a bit inexperienced in PS) but it beats relying on a compatible unix find being installed.

@stevvooe

Anything that requires bash or powershell fails requirement 3 above:

  3. Building the project shouldn't require more than basic knowledge of git, go build tools or make.

A little bit of find is okay for a personal project, but having to instruct others to do it correctly, on varying platforms, is a problem. It is extremely disruptive to the contribution process, open source or not. We already have issues trying to help contributors use godep. If we switch to something else, we'd rather not trade one confusion for another.

@sdboyer I had a quick run through of your write up. It seems that your head is on straight. Just let us know which direction you intend to take glide. Keep up the great work!

@sdboyer
Member

sdboyer commented Feb 26, 2016

@stevvooe thanks! i really do think glide has the potential to be best-in-class, across languages. It's exciting :) we'll certainly be public about whatever our plans are.

@tanji

tanji commented Jun 30, 2016

Has a clear decision been made on this subject? The problem still exists right now.
And FWIW, I see that Masterminds has committed the vendor directory to the repo.

@sdboyer
Copy link
Member

sdboyer commented Jul 1, 2016

@tanji yes. Currently, you can either strip vcs dirs out of vendor or not, depending on the opts you pass (--strip-vcs). Doing so makes things nicely compatible with committing vendor/. once #384 lands, vendor/ will never have any vcs metadata in it, so it'll always be committable. Those opts will go away.

@mattfarina
Member

Glide now:

  • No longer stores any VCS metadata in the vendor/ directory, only exported sources.
  • Previous commands to manipulate the metadata, such as --strip-vcs, are removed.
  • If you want to mangle the material in vendor/ there is the glide-vc plugin.

Because of that I'm going to close this issue. If there are continued discussions please open a new issue and we can continue that conversation.

Thanks for all the thoughts here.
