
proposal: Vendor specification and experimental repository fetch code #13517

Closed
kardianos opened this issue Dec 7, 2015 · 75 comments

Comments

@kardianos
Contributor

Proposal: Vendor specification and experimental repository fetch code

Author(s): Daniel Theophanes

Last updated: 2015-12-06

Abstract

Establish a specification file format that lists dependency revisions and
a package in the golang.org/x/exp repository that discovers, reads, and downloads
packages at a given revision. Tools may continue to use other formats to generate
this file.

Background

Many developers wish to specify revisions of vendor dependencies without copying
them into the repository. As case studies I will bring up two projects:

A) https://github.com/cockroachdb/cockroach

B) https://github.com/gluster/glusterd2

(A) uses github.com/robfig/glock, which specifies revisions for each remote repository
in a file in the project root called "GLOCKFILE". A partial listing of the file:

cmd golang.org/x/tools/cmd/stress
cmd golang.org/x/tools/cmd/stringer
github.com/agtorre/gocolorize f42b554bf7f006936130c9bb4f971afd2d87f671
github.com/biogo/store 3b4c041f52c224ee4a44f5c8b150d003a40643a0
github.com/cockroachdb/c-rocksdb bf15ead80bdc205a19b3d33415b23c156a3cf371
github.com/cockroachdb/c-snappy 5c6d0932e0adaffce4bfca7bdf2ac37f79952ccf
github.com/cockroachdb/yacc 443154b1852a8702b07d675da6cd97cd9177a316
github.com/coreos/etcd a423a55b142c2b9a82811604204cddbccd0a9cf9

(B) uses github.com/Masterminds/glide, which specifies revisions for each remote
repository in a file in the project root called "glide.yaml". This file contains:

parent: null
package: github.com/gluster/glusterd2
import:
- package: github.com/gorilla/context
  version: 1c83b3eabd45b6d76072b66b746c20815fb2872d
- package: gopkg.in/tylerb/graceful.v1
  version: 48afeb21e2fcbcff0f30bd5ad6b97747b0fae38e
- package: github.com/pborman/uuid
  version: cccd189d45f7ac3368a0d127efb7f4d08ae0b655
- package: github.com/gorilla/mux
  version: ad4d7a5882b961e07e2626045eb995c022ac6664
- package: golang.org/x/net
  version: b4e17d61b15679caf2335da776c614169a1b4643
- package: github.com/docker/libkv
  version: 93099f38de7421e6979983652730a81e2bafd578
- package: github.com/codegangsta/negroni
  version: c7477ad8e330bef55bf1ebe300cf8aa67c492d1b
- package: golang.org/x/sys
  subpackages:
  - /unix
- package: github.com/meatballhat/negroni-logrus
  version: dd89490b0057cca7fe3fa3885f82935dfd430c2e
- package: github.com/Sirupsen/logrus
  version: v0.8.7
- package: github.com/hashicorp/consul
  version: v0.5.2

I would like to point out a few features these tools provide:

  • Specify commands to fetch.
  • Specify repositories at a given revision.
  • Specify repositories at a given version.
  • Specify a sub-tree of packages in a given repository.

Right now each vendor tool specifies these same properties in different formats.
A common tool cannot be built that reads a single file and downloads the needed
dependencies. This isn't a huge burden on a dedicated developer, but for a
user passing by who just wants to build the source quickly, it is an impediment.

Proposal

I propose specifying a single file format that will describe packages sourced
outside the project repository. I also propose adding a package to the
golang.org/x/exp repository that discovers, reads, and optionally downloads
third party packages.

Furthermore I propose using the specification found at
https://github.com/kardianos/vendor-spec with one addition as the basis for this
specification. The addition is:

Package []struct {
    ...

    // Tree indicates that the specified folder, along with all sub-folders
    // are required.
    Tree bool `json:"tree"`

    ...
}

Both the specification and the proposed package will be considered experimental
and subject to change or retraction until at least go1.7. This process will be
done with an eye to possibly adding this feature to go get.

Rationale

The vendor file format needs to be readable and writable with standard
Go packages. This adds to the possibility that go get could fetch packages
automatically.

Vendor tools exist today that download packages from a specification. They are just
incompatible with each other despite using the same information to download the
dependencies. If we can agree on a single format for tools to write to, even if
it isn't the primary format for that tool, all tools and possibly go get can
download dependencies.

Existing vendor tools and their formats don't always handle corner cases or
different approaches. For example current tool file formats can't handle the
case of vendoring a patched version of a standard library package (this
would have been useful for crypto/tls forks for detecting the heartbleed
attack and for accessing MS Azure).

I am proposing a file format that "govendor" uses. I'm not trying to put my own
tool forward as central. In fact, "govendor" was built to validate the "vendor-spec"
proposal. The "vendor-spec" has received significant external contributions,
and as such "govendor" has changed to match the spec (and will continue to do so).

Compatibility

This will be a standardization of existing practices. There are no Go 1 compatibility
issues. Existing tools can treat the specification as a write-only file.

Implementation

A file format to describe vendor packages would be accepted along with this
proposal. Should this proposal be accepted, a new package
would be added to the "golang.org/x/exp" repository to support reading
the vendor file and downloading packages. The author of this proposal
offers to create, or assist in creating, this package
within 2 months of the proposal being accepted.

Risks

It would be ideal if other vendor tool package authors could agree to at least
write to a standard file format, informally and collaboratively. Indeed, the largest
risk is that vendor tools fail to write the common file format. However, I think that
unless there is a tangible benefit (such as go get support), there will continue
to be no reason to collaborate on a standard.

Open issues

The proposed standard file format uses JSON, which might be better than XML, but
is harder to write by hand than something like TOML. Tools that want the vendor file
to be hand-created will be forced to generate this file from a different file.

The file format specifies packages, not repositories. Repositories can be specified
by using the root path to the repository and specifying "tree": true, but it isn't
the default for the format. Some people may take issue with that as they are used
to or desire tools that only work at the repository level. This could be a point
of division. From experience I absolutely love vendoring at the package level
(this is what github.com/kardianos/govendor does by default).

@kardianos
Contributor Author

Responses welcome.
@robfig, @freeformz, @mattfarina

@mattfarina

@kardianos thanks for including me here.

/cc @davecheney

@mattfarina

For anyone reading this I want to provide some background material. While I develop Glide, this comment is less about my opinions and more about making sure to add contextually relevant information.

  • We currently have more than 5 specification files floating around the Go ecosystem. They exist today.
  • If we're going to use specification files to solve problems, it's useful to know the kinds of use cases they help solve. Some of us crafted a number of use cases to try and document these.
  • Most of the widely used and popular languages have package specifications. This isn't a new idea. Even Swift, which was just open sourced by Apple, has a spec of sorts. I know some Go developers aren't familiar with these, so I wrote up an overview of a number of them. Those looked at include both compiled and interpreted languages, along with those that use dynamic and static typing.

I do ask that anyone who jumps into the discussion on this with opinions take a little time to come up to speed on this space. Outside of Go the specs and tooling are a fairly mature topic.

This is also one of those topics with an impact on developer experience so it's worth looking at that as well.

While I have my own opinions, which I will detail soon, if anyone has questions or pointers aside from my opinions I'm happy to inform. I'd like anyone who wants to discuss the topic to be well informed on the space.

@sparrc

sparrc commented Dec 8, 2015

Yes, this would be fantastic. Currently there are many different file formats, to list a few:

All the examples are remarkably similar. To me it seems like an import path and revision hash/tag are all that are necessary, although others probably would like something more complicated. This is why I opened #13483, because for me getting a dependency at a specified rev using standard Go tools is all I want.

The capability to easily create the simple Godeps (gpm) file is almost in the go/build and vcs packages already. What we still need are:

@freeformz

I am +1 wrt this. I've spent some time basically re-implementing parts of go list and go get (not to mention fighting go/build) for godep.

@rsc
Contributor

rsc commented Dec 28, 2015

It would be great for tools to use the same vendor spec. I thought that was the goal of the vendor-spec work.

I am concerned that tools are not already using it. We've said that's what we want tools to use, it's there for using, and yet they are inventing their own. Why? Perhaps vendor-spec is not good enough?

@rsc rsc changed the title Proposal: Vendor specification and experimental repository fetch code proposal: Vendor specification and experimental repository fetch code Dec 28, 2015
@rsc
Contributor

rsc commented Dec 28, 2015

@kardianos, there's not enough detail here. You wrote "I propose specifying a single file format that will describe packages sourced outside the project repository." That's the vendor-spec, right? Yes, we think there should be just one, but we really want the tool authors to converge semi-organically rather than mandate something. We've done a bad job here at mandating in the past (basically what I wrote in my last comment).

But then you wrote "I also propose adding a package to the golang.org/x/exp repository that discovers, reads, and optionally downloads third party packages." I don't know what this means. More detail needed.

@mattfarina

First, I'm glad we're entertaining this conversation, and thanks to @kardianos for putting in a bunch of work on this.

I have a number of concerns over the data structure outlined here. I believe it is insufficient for our needs. Let me explain.

  1. In some venues this has been called a lockfile. But the Revision property can be multiple things, including a tag (e.g., v1.3.5), and the description says it can be used to fetch the same or a similar version. A lock file needs to pin the exact same version down to the commit. This is needed for reproducible builds.
  2. There are cases where you have trees of dependencies. Those trees could list the same dependency more than once and have slightly different compatibility requirements. Any automation tooling needs to resolve the latest version that meets all the requirements. Handling this is usually done by specifying acceptable version ranges (e.g., >= 1.2.3, < 2.0.0). There needs to be a field to specify these ranges for resolution in addition to a locked revision field. In most modern systems these two types of information are captured in two different files (a config and a lock file).
  3. There are times where you don't know the VCS type. For example, the url https://example.com/foo/bar could be the path to a package, but there isn't enough detail to capture which VCS is behind it. Is it Git, Svn, or something else? There really should be an opt-in property to specify the VCS, since Go supports 4 out of the box. This is needed as part of the setup to reproducibly set up the environment on different systems.
  4. To produce a reproducible build you really need to capture the complete dependency tree and the pinned versions (commit ids) for everything. At the top level of an application you only want the packages for your application. I'm not sure how to deal with both using this spec.
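To make the config/lock distinction in point 2 concrete, here is how the two files typically divide the information in other ecosystems (the contents below are invented for illustration, loosely following glide.yaml / glide.lock conventions):

```yaml
# hypothetical config file (human-edited): acceptable version ranges
import:
- package: github.com/gorilla/mux
  version: ">= 1.2.0, < 2.0.0"

# hypothetical lock file (tool-generated): exact pinned commits
imports:
- name: github.com/gorilla/mux
  version: ad4d7a5882b961e07e2626045eb995c022ac6664
```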

These are just a few of my concerns. I really want to see something that allows for:

  • Provides a user friendly way to capture dependency information.
  • A nested dependency tree to be handled well with automation and variations in needed versions in that tree.
  • No requirement on packages being in the GOPATH at any point (other than the parent application being worked on). This is often requested.
  • Deals with renaming, private repos, multiple VCS, and lots of variation.

To illustrate the needs I've collected a number of use cases that need to be satisfied by any spec. I understand that a number of people come from C/C++ here. Other languages, where many Go developers are coming from, have already solved many of these problems. I wrote up how they handle a number of common cases. Building something with a similar experience or one they can understand with that background would be useful.

Note, in full disclosure I worked on a competing spec attempting to solve these use cases. This data structure is what Glide is roughly moving to and is influenced by our work there.

@kardianos
Contributor Author

@rsc, yes, this is effectively the goal of vendor-spec. As you noted, I haven't seen convergence on a single spec for vendor packages. Perhaps another way to phrase this proposal is "give tentative blessing to a format from the Go team and ask for feedback from tool authors". I'm completely aware this is putting the cart before the horse.

I've asked for feedback in the past on why tool authors couldn't adopt it. I've heard:

  • silence
  • the spec is per package rather than per repo
  • it wouldn't work

To address the second point, I propose adding the (probably poorly named) "tree" parameter that says everything under this point is also included. It could be that the vendor-spec isn't good enough; I just don't know in which ways it is deficient.

At this point I'm not sure if the existing variety is due to a lack of consensus or just a lack of interest in changing existing and working tools. Thus if it were proposed that a command like "go get" read and used the vendor-spec file (not 100% a good idea), then I think many more people would care about having and using a common format. As it is, it is a nuisance when exploring or auditing many different Go packages, but not a complete show stopper; they are all machine readable, they all contain the same information, and many large projects have Makefiles that hide which vendor tool they use to some degree.

RE the /x/exp/ package: You're correct, more detail would be needed. Mainly I'm here to say that this proposal has two parts, a spec and a package that handles the spec. What that API looks like would need to be defined. I would love to add this if the fate of this proposal gets to that point.


I suppose what I could try to find out next is why vendor tool authors are not using this:

  • Is it not workable for that tool?
  • Do they already have a format and don't see a reason to change it?
  • Do they just want to do their own thing?

@freeformz, I think, is open to using something like this.
@mattfarina has said it won't work and has promised more detailed info.

I'll try to ask around.

@kardianos
Contributor Author

But, the Revision property can be multiple things including a tag (e.g., v1.3.5) and the description says it can be used to fetch the same or similar version. A lock file needs to be the exact same version down to the commit. This is needed to reproducible builds.

Agreed. For a distributed VCS the revision field is the hash. I could perhaps specify that more clearly. I think we agree here.

There are cases where you have trees of dependencies. Those trees could list the same dependency more than once and have slightly different compatibility requirements. Any automation tooling needs to resolve the latest version that meets all the requirements.

The vendor-spec defines the content as everything that is or should be in a single level "vendor" folder. I think that should be sufficient for a lock file, correct?

Handling this is usually done by specifying acceptable version ranges (e.g., >= 1.2.3, < 2.0.0). There needs to be a field to specify these ranges for resolution in addition to a locked revision field. In most modern systems these two types of information are captured in two different files (a config and a lock file).

I'm only interested in specifying what we know as the lock file. I think the "version >= 1.2.3" would be fine in a different config file.

There are times where you don't know the VCS type. For example, the url https://example.com/foo/bar could be the path to a package but there isn't enough detail to capture which VCS is behind it. Is it Git, Svn, or something else? There really should be an opt-in property to specify the VCS since Go supports 4 out of the box. This is needed as part of the setup to reproducibly setup the environment in different systems.

Go get handles this with probing. I'm also fine adding a well known optional field that specifies the vcs type ("git", "ssh+git", "hg"). I don't see this as a show stopper.
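For example, a well-known optional field might look like this in a vendor-spec entry (the "vcs" key and the revision value here are hypothetical, not part of the current spec):

```json
{
    "path": "example.com/foo/bar",
    "revision": "0123456789abcdef0123456789abcdef01234567",
    "vcs": "git"
}
```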

To produce a reproducible build you really need to capture the complete dependency tree and the pinned versions (commit ids) for everything. At the top level of an application you only want the packages for your application. I'm not sure how to deal with both using this spec.

I'm not sure I understand your concern. If you have or want a package in the vendor folder, have the tool write down the package path and revision in the vendor-spec file and it will be captured. Could you help me see what I might be missing? To be concrete, in govendor, does it not have enough information for reproducible builds?

Provides a user friendly way to capture dependency information.

Sure, I would choose to use a CLI command in govendor; glide and glock could use a config file. We all write down what we fetch in a single lock-type file.

No requirement on packages being in the GOPATH at any point (other than the parent application being worked on). This is often requested.

This is tool specific, not spec specific. I'm working on adding this to govendor and there are no issues with adding it.

Deals with renaming, private repos, multiple VCS, and lots of variation.

I'm not sure what you mean by renaming. Origin? Multiple VCS can be handled just fine; that's a tool issue. Private repos are worth talking about, but they might be handled with a stored ssh key and saying "use ssh"? But again, I don't see a conflict with the given spec.


To make sure we are talking about the same thing, I will copy and paste in the glide.lock file for glide and the vendor.json file for govendor:

glide glide.lock:

hash: 1fdfb16656a1b4a1664afdc9e2a5fc8040165bc5b6e85812df2affceacb7fbc8
updated: 2015-12-21T09:29:33.170992254-05:00
imports:
- name: github.com/codegangsta/cli
  version: b5232bb2934f606f9f27a1305f1eea224e8e8b88
- name: github.com/Masterminds/cookoo
  version: 78aa11ce75e257c51be7ea945edb84cf19c4a6de
  subpackages:
  - .
- name: github.com/Masterminds/semver
  version: 6333b7bd29aad1d79898ff568fd90a8aa533ae82
- name: github.com/Masterminds/vcs
  version: eaee272c8fa4514e1572e182faecff5be20e792a
- name: gopkg.in/yaml.v2
  version: f7716cbe52baa25d2e9b0d0da546fcf909fc16b4
devImports: []

govendor vendor.json

{
    "comment": "",
    "ignore": "",
    "package": [
        {
            "path": "github.com/dchest/safefile",
            "revision": "33aeb10e4bb6edb4016c53b6140fc9a603346e04",
            "revisionTime": "2015-07-03T18:05:53+02:00"
        }
    ]
}

We are really talking about lock files, not a package specification. In other words, I don't think your pkg spec and the vendor-spec are competing; they are doing completely different things. Your glide lock file is pretty much exactly what the vendor-spec is trying to do, as far as I can tell.

There are corner cases to discuss, but every tool that I've seen has something like a lock file that contains an import path and a revision (a hash if using a dvcs). Perhaps we can't agree on all the other metadata, but maybe we can at least write those two bits of info, and maybe a few others, into the same machine format.

@mattfarina

@kardianos thank you for clearing some things up.

I think it would be useful to clarify that you're attempting to create a lock file rather than create a package specification. The current title says, "Vendor specification".

With that in mind...

  1. What use cases does the lock file fill in its current state?
  2. Why is there a RevisionTime on each package? What use case does that help to solve?
  3. What use case(s) do the comment fields at the top and package levels support?
  4. What are your thoughts on extensions to the spec? For example, in Glide I have more data than this in those files, getting into filtering, etc.

Note, I'm asking on 2 and 3 because they do not fit into the use cases I've previously worked out. Trying to understand the details.

My issues at a high level, and I'm sorry I have to be so brief as I have to go for now, are...

  1. This doesn't provide a solution to my use cases. It's insufficient to automate package management.
  2. There is still a need for a package specification to capture relevant information.

Is the goal to solve what's needed for package management for the majority of developers, or is it to do one small slice of the puzzle that others still need to build on?

@kardianos
Contributor Author

@mattfarina

End Goal: A tool provided with the vendor spec file should be able to fetch all packages at a given revision from their original repository (if available).

This would enable standard user tooling for fetching remote packages at a given revision. This also enables machine analysis of dependencies across the board, such as looking for vulnerable revisions (dvcs hashes) and mapping dependency usages.

...

Revision Time: I've worked on projects that are 15+ years old. Code bases sometimes lose touch with original source and sometimes I just want to know what year or decade it is from.

Comment: JSON sucks, but it is simple and well supported. If you want to write down a comment, with a tool or by hand, put that human note there. JSON doesn't support // comments by itself, so comments turn into fields.

As per the spec, all unrecognized fields are required to be persisted by other tools modifying a file. In other words, extensions are expected.

...

While I don't want to quibble with words, it is called the "vendor specification" because it is specifying vendor package revisions. It isn't called a package specification because it knows nothing about the package it is used in.

You are correct. The vendor-spec file lives in the vendor folder and talks about the vendor packages. If you want a package meta-data file it should live in a package directory and tell you about the package folder it is in.

The goal is to write down what revisions are in the vendor folder, what you call a lock file. If the source of the package isn't from the "go get" location, then it provides the origin field to write down which location it does come from. (Useful if a project has a modified package from another location, like github.com/microsoft/azure/go/vendor/crypto/tls.) By default it (currently) works at the package level unless you specify the proposed "tree": true parameter.

@freeformz

FWIW: I have not added support for the vendor-spec to godep because I've had to work on other things instead and it hasn't been a priority (i.e. users aren't asking for it). I do want to support the vendor spec but instead have been working on replacing our use of go get and go list because they don't really work for all of our use cases as is.

@mattfarina

@kardianos another problem that would be useful to address: moving the vendor-spec outside the vendor/ directory. That way the vendor/ directory can be put in an ignore file (e.g., .gitignore) by those not going to store dependencies in their VCS.

@freeformz it sounds like you'd like to take the concept of Godep and move it into the Go toolchain. While Godep has been around for a while and has been able to fill a number of use cases, there are numerous use cases people have been asking for that cannot be easily implemented in its flow. I would prefer to see something that enables those as well.

To map dependencies well is a problem in this setup. For example, a lock file should really only be at the top level and not throughout the tree. Dependencies shouldn't be in multiple vendor/ directories in the tree unless you really know what you're doing. Otherwise you can end up with binary bloat and errors. So, knowing the version compatibility and mapping that in a tree is missing.

If a vendor-spec file is present throughout a tree, there can be cases with many instances of a common dependency at different versions, all mapped by commit id. This doesn't allow automated tooling to work out the best version to use or map a tree. This can be a problem in practice. For example, if you look at kubernetes, the same dependency can be referenced many times in packages and sub-packages, all at different commit ids. Resolving versions becomes difficult.

In tool chains for other languages a lock file isn't used to figure out or map the tree. Instead, a config file that knows more (e.g., a semantic version range) is used.

@mattfarina

How would tools used by go:generate be specified in this setup?

@akavel
Contributor

akavel commented Dec 29, 2015

Responding to some of the concerns/issues raised above:

@kardianos:

  • As to feedback on the vendor-spec: I've actually used it in a tool (https://github.com/zpas-lab/vendo, not yet advertised/announced because OSSed just recently and not polished/readmed yet, eh), and it worked great for me! Thus, I didn't have a need to complain, so I just used it without seeing any need to provide feedback... So, sorry to be late, but a _BIG THANKS_ for the spec and all the work that's gone to build it, I really found it very well thought out...
    • To elaborate a bit: I did add a few custom fields, but from what I understand that's totally in line with the spec, it describes only the mandatory fields. Specifically, I added:

      {
          ...
          "platforms": ["linux_amd64", "windows_amd64", ... ],
          "package": [
              {
                  "repositoryRoot": "_vendor/src/github.com/spf13/cobra",
                  ...
              }
          ]
      }
      

      where the "repositoryRoot" is described in more detail in the sources, but in short words it generally helps to find the .git/.hg/.bzr/... dir (i.e. the "tree root"?). See also the example resulting vendor.json in my repo.

    • With that said, I now notice I've apparently used an older version of the spec (built the tool some time ago), e.g. where the vendor.json file was located in the repo's root dir, not in vendor/ (or did I misunderstand something?). Seems I have to review what's changed. To tell the truth, personally I'd prefer vendor.json to stay in the root dir (one of the reasons being that for now I use "_vendor/", not "vendor/"; but not only).

_edit:_ uh, oh; I've glanced over the changes since Jun 12, and my initial impression is that I think I wouldn't be able to build my tool with the spec as it looks today, unfortunately :(

@mattfarina:

  • There are times where you don't know the VCS type. For example, the url https://example.com/foo/bar could be the path to a package but there isn't enough detail to capture which VCS is behind it. [...]

Personally, as of now I'm not really convinced this is actually needed/useful. What if the repo owner changes the VCS used? And even if not, I'm not quite sure why one can't autodetect it the same way the go tool does. But even if I'm wrong in this regard, the vendor-spec specifically allows adding any custom fields to the JSON file, so I don't see why a tool couldn't just go on and do that?

  • To produce a reproducible build you really need to capture the complete dependency tree and the pinned versions (commit ids) for everything. [...]

Uh, that's exactly what I'm doing with https://github.com/zpas-lab/vendo using vendor-spec; thus I believe it is totally easy to do with vendor-spec. Did you have some specific trouble with that? Could you elaborate?

  • Why is there a RevisionTime on each package? What use case does that help to solve? [...]

One recent event that I believe is a perfect illustration of how RevisionTime is awesome is the migration of code.google.com projects to github. Given that it often involved migration from hg to git as a side effect, you're effectively losing the information the hash ID gave you (that is, the Revision field becomes useless), but the RevisionTime should stay perfectly relevant. Thus giving a trivial way to find a corresponding commit in the new (github) repo, and also to check what new commits were introduced since last time you checked/pinned.

@robfig
Contributor

robfig commented Dec 29, 2015

@kardianos If I could snap my fingers and vendor-spec would be supported by glock I would do it, but glock has been stable / unchanged for a while, I haven't had a need to do it, and it seems like a lot of work.

But also, I think that the manifest format is not all that differs between tools - for example, glock supports commands, it only supports lockfiles for the end user's application (not for intermediate libraries), and it doesn't vendor dependencies. Seems to me that the vendoring zeitgeist ended up at a tool that is nothing like glock, so I didn't see much of a point in trying to keep up.

I'm looking forward to a tool that finally gains widespread adoption though! Seems like "gb" is in the best spot for that?

@technosophos

I have several issues with the original proposal, @kardianos. Two are architectural, and one is polemical.

  1. Why operate at package level instead of repo level? The only reason given is preference:
From experience I absolutely love vendoring at the package level

But not even the experiences are relayed. This seems odd to me for one very clear reason: versions are not an attribute of packages, they are an attribute of repositories. Therefore, at even the most rudimentary level, vendor-spec suffers from what is called "level confusion" -- the assigning of attributes to the wrong "level" of abstraction.

This is clearly evidenced by the fact that the proposed file format would allow setting different versions to two co-located packages. Doing so would clearly allow unintended side effects and difficult state resolution.

  2. I also object to vague and misleading statements in your original proposal like this:
For example current tool file formats can't handle the
case of vendoring a patched version of a standard library package (this
would have been useful for crypto/tls forks for detecting the heartbleed
attack and for accessing MS Azure).

Given that your proposal does nothing to address this case, and that you are conflating "tool" and "file format", this seems to me to be more FUD than useful commentary. Not to mention that at least one of the tools that you point out in your proposal has handled that situation elegantly since its inception.

  3. As I read your proposal, I see no advantages in using your new solution over existing formats like Glide's lock file. You don't seem to give any reasons. You just seem to assert that we need your spec instead of just standardizing on one of the existing ones. In fact, your comparison to the Glide.lock file points out that the Glide.lock file has some important features that your spec is missing, like dev imports and a hash. (Worthwhile note: Glide.lock also has an ignore list. It's just omitted when empty.)

@dmitshur
Contributor

There was one comment I wanted to make in this thread, and luckily @technosophos has just made an identical comment. So I will quote it and say I agree with it:

Versions are not an attribute of packages, they are an attribute of repositories. Therefore, at even the most rudimentary level, the vendor spec suffers from what is called "level confusion" -- the assigning of attributes to the wrong "level" of abstraction.

At the very least, I would want to hear good arguments for doing it another way. But assigning versions to repo roots seems like the most natural and effective solution.

@kardianos
Contributor Author

@mattfarina RE Vendor file location: I'm fine either way. I think @rsc wanted to keep all the vendor stuff in the vendor folder, including the vendor file. If you want a gitignore line that ignores the content of the vendor folder but not the vendor/vendor.json file, use vendor/*/ in your .gitignore file.
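As a sketch of that suggestion: a gitignore pattern ending in / matches only directories, so something like the following would drop the vendored package trees while keeping the manifest (assuming the vendor/vendor.json layout discussed here):

```gitignore
# Ignore vendored package trees (the trailing / matches directories only),
# but keep plain files such as vendor/vendor.json.
vendor/*/
```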

@mattfarina and @robfig RE commands: I'm open to suggestions here but how govendor uses the vendor-spec to support commands is to just include them as a normal package. The tool itself then discovers it is a program (it finds a package main) and can decide what to do with it. I could however be missing something here. I'm assuming the developer, tool, or script could then run go install vendor/... and it would install the packages and commands so that go:generate would work. Again, let me know what I missed.

@robfig RE existing tools: Yeah, I understand. It might be easier to let projects move off of it if they choose to. I hear gb is a great tool for building Go using workspaces; it also has a gb-vendor sub-command that comes with it. In this case I think we are looking at vendor tools that complement the go command.

@technosophos and @shurcooL RE recording at the package level: I agree I'm the odd man out on this and as such might lose :). But I will try to explain my rationale. Let me break this down into two parts:

  1. The ability to specify individual packages from a repository.
  2. The ability to specify a revision per package.

I have a package that vendors files from vitess. Now vitess is a large repository and I only want two packages out of the entire thing. For this I would like to specify which two packages I want and leave the rest behind. For this I need (1).

Point (2) is mainly due to this: you have a stable package github.com/u/p/{a,b,c}, perhaps a utility repository with several packages, much like golang.org/x/crypto, and we want to update the bcrypt package but leave the ssh package where it is. With the current design tools can allow for this. This is what govendor allows.

There are times where you want to note an entire repository or sub-tree for either C files, resources, or maybe that's just how your tool works. That is why I am proposing adding the "tree": true field.

I use property (1) all the time and like selecting packages out of a repo. I would like to retain (2), but I do understand objections to it. I would be interested in others' opinions on this too (try out govendor, not to use it, but to see how it works in this context).

@technosophos RE origin / std library patches: This is not FUD. This is an example from the vendor-spec itself:

        {
            "origin": "github.com/MSOpenTech/azure-sdk-for-go/vendor/crypto/tls",
            "path": "crypto/tls",
            "revision": "80a4e93853ca8af3e273ac9aa92b1708a0d75f3a",
            "revisionTime": "2015-04-07T09:07:15-07:00",
            "comment": "located on disk at $GOPATH/src/github.com/kardianos/mypkg/vendor/crypto/tls"
        },

This allows representing that the package import path is crypto/tls, but fetching it from the azure repository (in this case a patched version that allows Go to connect to Azure). govendor can handle this situation today because the vendor-spec allows it. It shouldn't be common, but it should be supported. This is part of the file format: the specification needs both a path and an origin field, with constant semantics assigned to each.

@technosophos RE existing formats: Early this year the core developers stipulated that the manifest file should be reasonably readable with the Go std library. Either we create an ad-hoc format, or we use something kinda gross but well supported, like JSON or XML. The fields the glide.lock file has by and large seem fine. I'm not sure if relevant, but the vendor-spec didn't come from govendor; govendor came from the vendor-spec. So the glide.lock file looks fine, but not YAML. That is a huge format to support and isn't in the std library.

@akavel RE changes to spec: I'm glad it is useful. Yes, before it noted down the relative path from the vendor file, so you could place it in many places and have it resolve. It has been locked down some to just the vendor folder. The current method is slightly simpler but more restrictive. I'd love to hear others' thoughts on the matter. Relevant issue: kardianos/vendor-spec#39

@akavel RE additions: That was added early on as a suggestion. So yes, that is encouraged.

@mattfarina

@kardianos a few things and i'll break them into bullets for easier reference:

  1. Part of the reason we're having some of these discussions is a lack of clarity on what is being solved. What are the requirements or use cases? Any spec needs to be crafted to solve those requirements or use cases. I've listed the use cases I believe need to be solved, which this format does not do. Can you please list your requirements or use cases so we can evaluate them and look at how a spec works against them?
  2. Using vendor/*/ in an ignore file would cause user-experience issues for numerous users. In the package management setups of the languages Go users are coming from, the lock files are not in the same directory the packages are stored in. They are used to ignoring vendor/ or something similar. This minor difference will be a cause of headaches.
  3. Using go install ./... for tools with go:generate doesn't work well. If two different applications within the GOPATH require two different versions of the same tool the singular GOPATH/bin isn't capable of handling that. This is a problem beyond the scope of the vendor-spec but relates here as well.
  4. Can you explain the technical environment and reason for package-level versions instead of repo-level versions? What requirements are they needed for, how do they fulfill them, and how do you handle the dangers of multiple packages from the same repo with differing versions (since that breaks known, testable stability within the imported repo)? Looking for technical details to back up your opinion and understand where it is coming from.

@kardianos @rsc Something occurred to me while reviewing this material. As I work on Glide, watch requirements come in there, and discuss package management with those inside and outside the Go community, I realized that this "spec" isn't born out of experience in the community. We're still learning what people need and are adapting. The vendor-spec has not taken off organically. In my opinion it's premature to put this or any other spec in the go toolchain.

I would suggest waiting for the GO15VENDOREXPERIMENT to be on by default, collect the requirements needed by its users, and make sure any spec meets those.

@kardianos
Contributor Author

@mattfarina RE use cases:

  1. Know what revision was copied into the vendor folder.
  2. Enable standard user tooling for fetching remote packages at a given revision.
  3. Enables machine analysis of dependencies across the board, such as looking for vulnerable revisions (dvcs hashes) and mapping dependency usages.

@mattfarina RE go:generate tools: I'm all ears. What would you propose instead? Are you wanting a bin folder in the vendor folder or something similar?

@mattfarina RE experience: Most of us have many years of experience with some type of vendoring in Go. We are transitioning to the vendor folder now. If you want varied experience and use cases, let's talk to people now, not later. @goinggo Bill, thoughts on this? You interact with many more people than I do.

@mattfarina

@kardianos thanks for sharing your use cases. That helps me better understand where you're coming from. I have some comments on them.

  • For (2) there are problems when you fetch versions at a package level rather than a repo level. For example, if you fetch 2 packages in the repo at two different versions and they rely on a 3rd common package from the repo. Which version do you use there? And, this concept breaks atomic tested commits for the imported repo.
  • What tooling does machine analysis based on commit ids for vulnerabilities? Tools I'm aware of, such as David for node.js do analysis based on versions and version ranges (e.g., ^1.2.3). It's easier to know the latest non-vulnerable version number and evaluate if a version is after that than to know the last non-vulnerable commit id (hash) and if the current one is after that. With version numbers you don't need the complete commit history.
  • What kind of dependency usage (3) are you looking to map? You can already map the dependency graph today on godoc.org. For example, see the kubernetes apiserver package. With a lock file you can't really map versions required within the tree because you've stepped past that to the single commit in use for the tree. So, what kind of mapping are you looking to do that requires this?

I don't have a proposal for go:generate. That's still something to be worked out. It's one of the things we've not worked out making it premature to have a solution.

Package management in most programming language ecosystems has settled on one tool. Rust has Cargo. PHP has Composer (formerly Pear was used). Node.js has npm. You get the idea. The Go ecosystem has become quite divided. The tooling supplied by Godep (a longtime solution) was insufficient for many. Now the ecosystem is fractured and a wide array of solutions are being worked on in order to meet all the use cases people have. If the vendoring setup you talked about worked for the masses, GB and Glide would not be gaining the following they are, and there would have been no call for them in the first place. Those are just two of the many tools being created.

Trying to push this solution in without adapting it to meet the use cases many need could cause further community issues. Package management has become a hot topic. Many elements of the go toolchain nailed it, so there was no need to debate them. Package management is something we need to get right for the majority of developers, or not bring into the go tool at all.

@kardianos Could you possibly expand on how this spec would address my use cases? Then I could see how it would fit into the broader package management situation.

@freeformz

@mattfarina FWIW: I have no intention of taking the concepts of godep and moving them into the Go toolchain. I was stating that I can no longer rely on the go toolchain and am +1 on a library so that I don't have to maintain my own internal versions of the tools as libraries.

@freeformz

I share a bunch of those use cases with you, but I do not want to support the "I want to use a version range" use case. I've been there, done that, and it's a pita that usually ends up with very, very narrow version specs and/or relying on the lock file. Maybe I've just had a lot of bad experiences.

Missing from that list is "I'd like to vendor a related tool (some other repo's main package)". I see this mostly with tools like migrate and the like.

I also need to know a little about the user's Go environment (mainly ATM the go version), so that I can build the code again using the same / similar version of go (think Go primary version; minor should not be relevant).

Beyond that, and generally speaking, my main use case is: I want a tool to populate vendor/ with the code I need to build / run my app. That tool should help me maintain vendor/ over time, allowing me to diff / list / update / remove / add to it. vendor/ should always be checked in (for various reasons). vendor/ should contain only the packages I need, not entire repos, to keep sizes down. vendor/ should optionally contain the tests of the dependencies I place there so that they can be run.

My main response to this thread was a "+1" for a common library for tools to share, not a specific tool implementation. I have mixed feelings / technical motivations for converging on a specific tool. I am "+1" though about converging on a shared library and a vendor spec.

On Tue, Dec 29, 2015 at 3:46 PM, Matt Farina notifications@github.com
wrote:

@freeformz https://github.com/freeformz I would be curious to hear your
take on the use cases
https://github.com/mattfarina/pkg/tree/mast%E2%80%A6 I'd previously
worked with others on.

I also want to make it clear that I don't hold anything against Godep. I
just know that it does not cover all the use cases developers have and want
to see the go tool handler more of them than Godep covers today.



@freeformz

BTW: WRT version ranges and updates ...

If two packages (a+b) rely on the same separate package (p) and p makes a new release, you need to do integration testing when you upgrade your copy of p. Anything else is just hoping it will work. When doing Ruby in the past (and other languages as well), I hated having to update a dependency because it didn't really matter what the released version number was in the end. Yes, the version number gives you a clue / hint wrt compatibility, but that's it.

Because of that I'm +1 wrt version numbers (semver specifically), but in the end it just doesn't matter that package a uses version 2.4.1 of p and package b uses version 2.4.2 of p. 2.4.5 of p was released and it needs to be re-validated to work with both the version of a and b that you have. I've had to patch/upgrade either package a and/or b to work with the new p (which for arguments sake fixes a bug that I'm experiencing) more times than I care to reflect back on.

Also, just because p released 2.4.5 doesn't mean I need to upgrade any code I have using package p to the new version. I may need to (because of the aforementioned bugfix example), but that's on a case by case basis.

After reading this entire thread again I can understand why the use cases call for version ranges. However I still do not believe they are necessary in go, when using tooling like godep/govendor/etc and the vendor/ directory to record and check in your dependent packages. I do not want to inflict this pain onto the entirety of go when we can avoid it.

@freeformz

Note: using something like govendor + vendor/ you would only have a single copy of "p" in use anyway, so there wouldn't be a state where a was using p @ 2.4.1 and b was using p @ 2.4.2. When you vendored them you would pick a version to record+copy or the tool would error and you would have to resolve it.

@sdboyer
Member

sdboyer commented Jan 4, 2016

Yes, the version number gives you a clue / hint wrt compatibility, but that's it.

Yep, that's all they do. Wanting any more from them is a poor expectation in the first place.

That does not make them useless. It just means that instead of being a tool for the machine to use in making a final decision about what works, they're a tool to help you go through the process of figuring out what works.

you would pick a version to record+copy

Any proposed solution, including one with version ranges, still requires you to make such a choice. The only difference is that, when version ranges are permitted, the machine can help you with that choice; it needn't just make it for you and pretend everything is fine.

That choice is hard. It will always be hard. The real benefit of the ranges is the additional information such ranges can express when you're dependent on a library (or two, or three) where one of these situations arises. If all they have is the commit id that they're pinned to, you (typically) have no idea why they're pinned to that version, and so have to go in and understand their code well enough to figure out whether or not you can move them to a different version.

If, however, they can specify a version range, then you're taking advantage of the fact that their knowledge > your knowledge when it comes to their library. Again, these are difficult decisions - I can't understand why you'd want less information on hand to resolve them. Sure, it's possible that the lib author did a bad job and put in an unhelpful or incorrect version range, but:

  • It's still probably faster to figure that out and go from there than it is to go in blind with only the commit, and
  • Once you figure out a saner range for them to use, you can send a PR to fix their poor choice. And then, the open source cherubim break out into song, as you'll have done some work that potentially helps someone else when they're in this same difficult situation in the future.

@freeformz

@sdboyer I'm not sure how version ranges help the machine make that choice? Version ranges are not required for the machine to help me make that choice. If I can fetch the current code, or any arbitrary revision after the one I have recorded, via vcs, then I have everything I need to determine compatibility (aside from a tool to do it). I do think people need to version their packages / repositories, though, as it will help a developer make a decision when that tool says that versions (or revisions, if version information is missing) 1, 2 and 3 are compatible, but 4 and 5 aren't, because the public interfaces / structs / function signatures that the developer's code is using have changed.

@sdboyer
Member

sdboyer commented Jan 5, 2016

Version ranges are not required
via vcs then I have everything I need to determine compatibility

(pulling from the article I'm working on...)

You're right - they are not. Which is why I never said they were "required," or "needed." Necessity is not at issue; the question is whether or not they have supplemental information that makes it easier to determine compatibility.

I do think people need to version their packages / repositories though as it will help a developer make a decision when that tool says that versions (or revisions if version information is missing) 1, 2 and 3 are compatible

...well, this weirds me out, because this is pretty close to what I'm arguing for. Not sure where the communication is going wrong, so let me try to be more concrete. Say you have this dependency graph:
[diagram "diamond-fail": main depends on A and B, which pin different versions of C]

main is your package, A, B, and C are dependencies written by other people that you've pulled in. Because A and B are pointing at different versions of C, compatibility now needs to be worked out. If the authors of A and B haven't specified version ranges, then all the information you have is that they want different versions of C, and you have to go in and figure out a compromise. Once you've found an appropriate compromise - let's say C-1.0.4 - you have to test the integration of it all together for your particular main package.

If, however, A and B do provide version ranges for their dependency on C - because the authors of those packages are good stewards, and have figured out which versions of C they actually can work with - then that's an automated step a tool can take and either present the result to us, or just accept it:

[diagram "diamond-auto": the tool picks a version of C satisfying both A's and B's ranges]

...at which point, we test. Same as if we figured out the compatibility on our own. The difference is, in making this decision, we get to benefit from the knowledge the authors of A and B have about their own packages' requirements (expressed in the form of those version ranges), which is almost certainly more than we know about it.

@Crell

Crell commented Jan 5, 2016

Quoting from the YC post that @sdboyer linked:

You don't pin versions in libraries, you pin them in applications.
Almost every package in the dependency graph should be using version ranges. That avoids version lock and makes it easier to satisfy shared dependencies. It also means libraries don't all have to bump patches that just change dependency versions.
But, in your application, in the root of the dependency graph, you pin everything that you're using right now. You check that in to ensure that everyone on your team and every machine gets the same versions of all of the dependencies.
You get good code reuse and deterministic builds.
This is exactly why Bundler separates out the lockfile from the Gemspec. It's unintuitive at first, but it works better than any other system I've seen once you grok it.

^^ That's the same point I'm making. It's the same conclusion that Composer reached for PHP. It's the same conclusion that Ruby reached. It's the same conclusion that the Glide team reached for Go, after fighting that conclusion for a while.

So if the languages that have built successful packaging tools have all reached the same conclusion (version-range manifest file on libraries, pinned lock file on applications), what about Go is so inherently different that it shouldn't adopt a known-successful model? I don't mean Go's status quo today (the status quo is obviously insufficient in this regard or we wouldn't be having this conversation), but what is intrinsic to Go that makes it so different?

That's what I don't get. When we know there's a model that's proven to work, gives everyone the flexibility they want/need, and solves the problem space successfully, why wouldn't we go with that and benefit from everyone else's experience? (Inquiring baby Gophers want to know!)

@freeformz

To me, in the end, it's about API compatibility, which computers are much better at figuring out than humans. With Go I believe we could use code analysis to determine API-compatible versions and then let the developer choose which one to vendor instead of guessing. I'm fine with package metadata version ranges if they're just used by developers to provide hints on what to do during a conflict: will I need to edit my code, my deps' code, their deps' code, or some combination of all of them?

In the end whatever standard gets adopted I'll have to support it, so here's hoping I've made my case.

Here is my original response: https://gist.github.com/freeformz/bd0d167dece99e210747. I aborted it, though, since I felt we'd just keep talking past one another.

@kostya-sh
Contributor

Another possible (though probably not very common) use case to consider is supporting binary dependencies (when the source code is not available). E.g. see #2775, #12186. I can imagine such libraries even being distributed as versioned zip files (similar to jar files in Java).

@kostya-sh
Contributor

Two more related use-cases:

  1. As an application developer I want to use a single repository for the application code and its client library (example: a database app and a client library to talk to this database). To build the application I want to use pinned versions of dependencies. For the client library dependencies I want to specify supported version ranges.
  2. As a consumer of a client library (e.g. from the use-case 1) I want to vendor only the client package without the application code.

@kardianos
Contributor Author

@Crell I agree that applications should pin/copy and "libraries" (packages) should use version ranges. I agree that it is good if packages are released.

The difference is static analysis and GOPATH.

If the application should pin dependencies, then a design file isn't required for the application, just the revision and specific version it uses.

If the "library" should contain version ranges it should have a version range for each dependency it uses. Now let me constrain the problem of version ranges into two categories: (1) "I want my package to use a compatible API", and (2) "I want my package to use all the required features it needs". (Remember your engineering design class, user stories must not contain a technical implementation or technical requirements). In Go you can denote API compatibility with either a unique import path or a "major" release tag. In order to satisfy compatibility, you cannot remove a feature or API once added. If package authors choose to give a unique path to each "major" release, the feature set is a function of the statically knowable API or just the revision time. If a package author just uses a tag, then all we need to know is what the version tag is currently to know the major version we need. And if we can just use the current version as a range spec, then that is machine discoverable, again removing the need for a human editable design file.

govendor already pins revisions for end applications. It would be simple to inform govendor that this is a "library" and just write down what is in the environment, including revisions and any versions package authors have provided. The versions it uses should automatically give any end application using it more first-hand information.

If a package author really had an exceptional amount of knowledge of a needed package version range or wanted to blacklist a particular version, it would be trivial to add a field with a well defined interpretation of that field for human use that could be presented to any down-stream users of the package.

The main difference between glide and what I'm proposing here is I'm letting the machine do more of the work. If you want to write the design file yourself for everything, that seems silly to me, but again fine. I continue to see no technical reason why we could not write versions and version ranges to the same file.

@sdboyer
Member

sdboyer commented Jan 5, 2016

@kostya-sh - re: binary deps, my gut is that that's mostly, though not completely, orthogonal, as we've mostly been focused on getting and arranging source code here. I'd have to research that more, though.

If I'm understanding your first use case, then yep, that makes a lot of sense.

If I'm understanding the second use case, then I have the same question as I've asked before: why do you care about getting rid of code that the compiler is going to ignore, anyway?

@freeformz -

I think our positions are actually quite close, though yes, we're talking past each other. That's at least partly my fault - I was assuming the disconnect was over a lack of understanding as to what performing a resolution with a range would actually look like, and so was trying to clarify that. But, looking at your gisted response, I think maybe we've reached the kernel of it:

I do not believe that we should rely on some arbitrary meta-data when code analysis and revision history can determine which versions (indicated by semver tags; or failing that which revisions) satisfy every package's usage (your main, A & B) of the dependency's (C) API.

Sadly, code analysis + revision history cannot do that. (If they could, I'd agree with you - no question, they'd be the way to go) At best, they can determine that code is not incompatible, not that it is compatible. Annoyingly, these are different things. Here's an example.

All of which should be taken to mean that static analysis is certainly helpful, but not sufficient, for answering this question. Trying to make it sufficient brings you into a full-contact brawl with type theory (on which I'm still quite a newbie) as you try to compute type equivalencies. That's not what Go's type system was designed to do - but it IS a goal of Hindley-Milner-like type systems (of which some variant is used in langs like Rust, Haskell, OCaml, SML). So yes, Go is different: its type system is simplistic, but sound, and that was very much the goal (as I understand it). Trying to do too much more will be swimming upstream against the design.

The reason I advocate for version ranges is because they are a sufficiently flexible system to accommodate both the helpful insights from the static analysis you want, and the insights about logical compatibility that an upstream author is more likely to have. Run your tool, and encode the results, along with whatever else you know, into a version range.

We're talking past each other because we're imagining...well, I guess different workflows, though I'm loath to call it that. The article I'm writing tries to break it down into necessary states and necessary phases, largely without regard for workflow. We'll see how that pans out.

@sdboyer
Member

sdboyer commented Jan 5, 2016

I continue to see no technical reason why we could not write versions and version ranges to the same file.

Yep, probably could. But "could" isn't the question. "Should" is the question.

@kostya-sh
Contributor

If I'm understanding the second use case, then I have the same question as I've asked before: why do you care about getting rid of code that the compiler is going to ignore, anyway?

As @mattfarina has mentioned many times, it is important that the spec addresses as many real use cases as possible. This is a real use case describing how some developers vendor their dependencies (vendoring the sync2 package from the vitess repository has been described earlier in this issue). Besides, many golang.org/x repositories contain multiple packages that can be used independently (e.g. golang.org/x/net/ipv6 and golang.org/x/net/context).

I guess the main reason for doing this is efficiency. If I decided to check in vendored dependencies to my application repository, I would rather check in a 100kB client library than the whole 10MB of source code. Additionally, some VCSes (e.g. Subversion) are quite efficient at checking out a single directory (unlike Git). This might speed up build times in cases where vendored dependencies are checked out at build time.

It is also not very difficult to come up with a scenario when checking out the whole repository simply won't work. E.g. if I want to use two different packages from the same repository pinned to different revisions.

To be honest I don't care too much what the final spec will look like, but it would be unfortunate if some of the use cases I described weren't covered.

@freeformz

@sdboyer

Sadly, code analysis + revision history cannot do that. (If they could, I'd agree with you - no question, they'd be the way to go) At best, they can determine that code is not incompatible, not that it is compatible. Annoyingly, these are different things. Here's an example.

Semver may not catch any of those, but code analysis would at least catch the v2 issue (as you stated, code analysis can only tell me what's incompatible). Tests, as you, I, and/or others have pointed out above, would be required to catch the v3 issue, semver or not.

This is the crux of our disagreement AFAICT: You have faith in semver being meaningful beyond stating intent. I don't. In my mind semver is just intent and I would prefer to consider actual API changes and leave the rest to integration testing. We both view the world very differently apparently. Your article will be an interesting read for me I'm sure. :-)

I would love to get some sort of higher throughput (video / in person / etc) discussion wrt this issue. It's obvious that we all care deeply about it. Barring that I'll probably start bringing it up with every go developer I cross paths with.

@mattfarina

I love the great conversation over the past couple days.

@freeformz I agree that some form of video, in person, or other better method of discussion would be useful. Let's see if we can figure out how to get that going. I'm happy to start figuring out the logistics of that.

To add some thoughts to the ongoing commentary:

  • @kostya-sh I agree with @sdboyer on the client and server in the same repo being an orthogonal issue. It's worth noting I've heard a lot of complaints when this happens on projects. In particular from those who want to consume the client without dealing with the server.
  • @freeformz There are a bunch of people and organizations who do not want to, for various reasons, check in dependent packages to their projects repo. Where you store packages is a slightly different problem from managing the versions you use. To make a widely usable solution we should support multiple methods of storing dependencies (in the parent projects repo and rebuilding from a lock file or other configuration file).
  • @freeformz I love the idea of parsing a codebase to know API compatibility. But I don't think it's enough; I wonder about combining that with SemVer. I say it's not enough for several reasons, but I'll share one glaring example. I can't tell you how many times I've had to specify a range of ^1.2.3, != 1.3.4 because there's a buggy implementation sitting behind an otherwise compliant API. Looking at the API programmatically won't tell you this. Putting this information in some form of file communicates something a library author knows to both a management program and the consuming application author. It also lets application authors working on the same codebase communicate explicitly with each other. I don't see how parsing the API can do it all today, but it's a great direction to start heading in. Do you see something else?
  • @kardianos The GOPATH can be a problem point as well. There are two issues I'll share. First, it's a point of confusion for many new to Go. Helping people get past build issues caused by misunderstanding the GOPATH is the single largest topic I spend time on with Go. Anything we can do to lower this entry-level burden will be useful in on-boarding people to Go. For those who know Go it's often considered a pain point; that's why GB exists and is gaining in popularity. If a solution here can help pull people back from that, it would be useful in unifying the community. Second, if two applications being built in the GOPATH rely on two different versions of a shared dependency, that can be a problem to manage in the GOPATH. I, and many others, have experienced the problem where the right version is checked out for project A, then I go to project B and do a build without updating the version, only to have a problem.
  • @kostya-sh Pulling two different versions from the same repo is generally considered a bad idea. It breaks any notion of atomic commits, and it creates diamond dependency problems. For example, if you pull package A at version 1 and package B at version 2 while both A and B rely on C, how do you determine which version of C to check out? At no point will this combination of package versions have gone through a test system. We should try to make it difficult for end users to screw up, and make the complexity simple for the majority. Unless there is something I'm missing?
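
The kind of range constraint mentioned above can be written in a glide.yaml-style file. The sketch below is illustrative only (the package names are invented); glide's constraint syntax accepts comma-separated conditions like these:

```yaml
# Sketch of a glide.yaml expressing "any release compatible with 1.2.3,
# but never the buggy 1.3.4". Package names are hypothetical.
package: github.com/example/myapp
import:
- package: github.com/example/somelib
  version: "^1.2.3, != 1.3.4"
```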

In this problem space there are, at least, a couple distinct roles. Those who produce a package and those who consume it. If I were going to prioritize them I would prioritize the consumer slightly over the producer. What do y'all think of that?

@kostya-sh
Contributor

@mattfarina

  1. @sdboyer "orthogonal comment" was about binary dependencies. This is something that currently doesn't exist in Go but might appear in the future. See cmd/go: work when binaries are available but source is missing #2775, cmd/go: provide a way to distribute library without source #12186.
  2. Two packages from the same repo do not have to be related. E.g. golang.org/x/net/ipv6 @ 0d2c2e17 and golang.org/x/net/context @ 3b90a77d2 - both come from the same repo. If I tested my application with certain pinned revisions, then updating these dependencies separately is safer.
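
For what it's worth, per-package pins like that are representable in a vendor-spec-style file. A minimal sketch (the shape loosely follows the vendor-spec drafts; the revisions are the ones from the example above):

```json
{
	"package": [
		{"path": "golang.org/x/net/ipv6", "revision": "0d2c2e17"},
		{"path": "golang.org/x/net/context", "revision": "3b90a77d2"}
	]
}
```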

@sdboyer
Member

sdboyer commented Jan 6, 2016

@kostya-sh - ah right, yes, sorry. I'm always going to struggle with splitting up an upstream repository, because it undermines commit atomicity of the upstream repository - and given how hard a problem space this is to build something both sane and usable, I like taking advantage of every bit of upstream information we can get.

I don't think golang.org/x repos following such a structure should be an example to follow. The Go authors wrote with a monorepo background, and a monorepo in mind, which is why we're having these problems in the first place. (The preceding comments here discuss this issue extensively).

I still struggle with the performance argument, though. It seems to me that exploring caching more would be preferable to carving up what amounts to generated code. Particularly for Go, where it's not necessary to fetch those packages beyond the build server (unlike an interpreted lang). And if the build server is ephemeral (e.g., hosted CI), at least some of them provide support for caching across ephemeral instances.

So, I can entirely see being convinced about it. But some (not all) of what I've seen about that so far seems to amount to complaints that "the tool doesn't currently do as well as I can manually." Well, of course not. But...cmon. Disk is very cheap. Network is relatively cheap. There is a point where it becomes preferable to eat it on those in order to reduce complexity of a real implementation.

@freeformz

Semver may not catch any of those, but code analysis would at least catch the v2 issue (as you stated, code analysis can only tell me what's incompatible). Tests, as you, I, and/or others have pointed out above, would be required to catch the v3 issue, semver or not.

This is the crux of our disagreement AFAICT: You have faith in semver being meaningful beyond stating intent. I don't. In my mind semver is just intent and I would prefer to consider actual API changes and leave the rest to integration testing.

And even tests aren't sufficient, of course (Dijkstra: "Testing can only prove the presence of bugs, never their absence!"). But yes, you're absolutely right - semver ranges carry no guarantees whatsoever. They could be completely right, or completely wrong. What's important is that they're not mutually exclusive with static analysis.

If you're working and pull in a new project (A), which in turn has a dependency on another project (C) specified in a range, but you already had another dep (B) which also had a dependency on C, then when attempting to resolve the diamond, your tooling should ABSOLUTELY run static analysis on the A->C relationship to ensure that all the versions the semver range indicates are acceptable, actually are. Because yes - you shouldn't just take A's maintainer at their word. You'd be no better off than we are now in the unreasonable "just ensure tip always works" world.

So, let's say that main in my previous example is A, and C is the package offering the Square() func. Static analysis has knocked out v2 - great. You're left with staying with v1, or going to v3, or to some v4 (which isn't in my example, but it's easy to imagine one), any of which is permitted by the semver range.

So you go in, do the work, and figure out that A is actually incompatible with Cv3, but is compatible with Cv4.

This work you just did is extremely valuable. It should be recorded, so that no one ever has to do it again. Which you can do by filing a patch against A that further restricts the semver range to exclude v3. And now, when the next user of A comes along, they'll never hit that v3 pothole. They'll never even need to know it exists. (And the FLOSS cherubim sing.)

I think we all understand that there's a ton of uncertainty in software development. Superficially, semver may appear to just blithely ride that uncertainty train, or even make things worse. But all it's actually doing is taking a whole lot of awful, complicated shit that can happen, and providing a lens for seeing it all within a single field of view. (If you’re a fan of Cynefin, semver is an excellent example of an organizing system that moves a problem out of the complex space, into the complicated space.) While our individual builds must be reproducible and deterministic, the broader ecosystem will always be (in practice, from any one person's perspective) uncertain. All real software must constantly move back and forth between these two spaces. Semver facilitates a process by which we can inject more certainty into the ecosystem incrementally, as we learn more about it.

We both view the world very differently apparently.

Most people do :) Though I still tend to think, in this regard, maybe not so far off.

Your article will be an interesting read for me I'm sure. :-)

With any luck! Discussing over here has gotten me enmeshed in too much detail over there now, I think...I'm a bit stuck. Trying to pull back from the trees for the forest. Hopefully I'll have it done soon.

I would love to get some sort of higher throughput (video / in person / etc) discussion wrt this issue.

+1 from me.

@kardianos
Contributor Author

My understanding of where we stand is as follows: I would like to try to determine a single file format that might allow different workflows to work together. People on the Glide team don't want that because it would be a suboptimal design, it would be different from other languages, and copying the version range from a tool's design file to the standard lock file would "hugely complicate the tool".

Here is my response to @mattfarina 's use cases:

  • application_information: Name, description, keywords are already present in source. Homepage and license are not. However, I don't see how this affects vendoring. (Looks like something for a central package manager). Maybe we could improve godoc.org, but not in this issue.
  • consistent_team_setup_with_private: Maybe you can't do that today with go get or similar tools that just work off an import path. You don't need a package spec for that. If there is a problem with private repo creds, then we could leverage existing stored credential files or make our own. Doesn't appear to relate to this issue.
  • contact_owners: Unless you need to automate the contacting and message writing, then why can't you just read the README or private message their github profile? Doesn't appear to relate to this issue anyway.
  • license_scan: If you vendor something or use a dependency, you had better review the license first. Sarbanes-Oxley reviews will require more than just "oh, it is MIT". You need to know copyright assignment and the actual full license text. Doesn't really relate to this issue anyway.
  • lock_version: Ah, here we go. You want to allow non-main packages to declare their allowed version ranges on each dependency and pin an exact version for main packages. We have established this. I can at least roughly agree with this. I think the vendor-spec + additions talked about could write this information down.
  • managed_vendored_dependencies: When building specific versions, I agree: use the vendor folder to put in specific versions of your dependencies. Organizations can choose to either check them in or fetch them after cloning or updating a repo. That's great. The above use case needs a meta-data file. This is roughly the same, but needs support from the tools.
  • single_import: I agree. The tool govendor has always done this and govendor uses vendor-spec so no problem here.
  • use_specific_version: looks like a more specific version of "lock_version". Same response. I'm fine with a meta-data file having the ability to note version ranges for packages of some sort and exact revisions for main packages. Defining a single field on vendor-spec should allow this.
  • work_with_private_packages: looks like a partial duplicate of "consistent_team_setup_with_private". Similar response. The answer isn't more meta-data files I would argue. However, I don't think it is relevant to this issue.
  • working_with_forks: Can't do that in Go and unless you want to build your own build system distinct from "go" and "gb", it will never do that. But it doesn't relate to this issue anyway.

So of the user stories you wrote down that relate to this issue, I really don't have a problem with them. I continue to not understand why vendor ranges can't live in a vendor-spec (lock typeish) file for those who wish to use them.
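
A sketch of how a single additional range field could sit alongside a pinned revision in a vendor-spec-style file. Everything here is hypothetical: the "version" field is the proposed addition under discussion, and the path and revision values are invented:

```json
{
	"package": [
		{
			"path": "github.com/example/somelib",
			"version": "^1.2.3",
			"revision": "a1b2c3d4"
		}
	]
}
```

Tools that only consume pinned revisions could ignore the range field; tools that resolve ranges could use it to regenerate the pins.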

@freeformz

I've been talking to a lot of people about this, both Gophers and not and of course opinions are all over the place.

I think I've come to the conclusion that semver+ranges are important socially more so than anything else. ATM a lot of packages don't release versions and/or change things up drastically on master at times. So basically anything that forces people to think more about releases is ++. With that said, my opinion atm is that ranges should be limited to non-main packages/libraries.

@mhoglan

mhoglan commented Feb 10, 2016

Not sure if this conversation has moved on elsewhere, but I enjoyed reading through it as it is at the heart of the exact problems I have been struggling to deal with. Feel free to point me elsewhere if it has moved on in the last month.

@kardianos I would disagree with that working_with_forks is not related to this issue.

This is precisely the problem I keep having. Our product is using a 3rd party dependency (it doesn't even matter which, really; this happens with internal ones too), and there is a bug or hotfix in that dependency affecting our product that has to be fixed immediately so we can release a new version of our product. One of the typical ways you do this is to fork the dependency, fix the bug, build the product using the forked dependency, and release. You then push the change upstream and later close the loop by switching your application back to the mainline after the fix is merged.

I know there are multiple ways to solve this, but the easiest way would be to update a spec file that says use the following URL (fork) for import X:

  • I do not want to have to go rewrite (mutate...) import paths,
  • I do not want to have to copy code into some subdirectory
  • I do not want to have to play games with routing hostnames or other similar tricks at the environment level (think what people do with SSH configs to work with multiple GitHub accounts and different SSH keys...)
  • I do not want to have to script around it with the build process by having it clone the dependency into the path representing the upstream repository and then manage the remotes of the workspace and checkout the fork changes

I want to be able to make the changes on the forked dependency, make the fork available. Then update the application using the dependency. Ideally, all I should need to do is update a spec file that says, use version blah of this dependency. Since golang ties source, import paths, and other things related to projects so tightly together, it hinders these pivot points that almost every other language provides.

This is most evident when it comes to 'what is a version' of a dependency. Because golang ties the import path to the repo home (URL) of the dependency, it implies that the version of a dependency is only scoped within that repo URL. I believe that is not ideal.

A version of a dependency should be an 'instance' of that dependency, and an 'instance' of that dependency should be able to originate from multiple places, and thus that origin should be part of the scoping of the version. In golang, we are saying that origin should be URL addressable so it can be retrieved as an import. That would allow using forks.
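
A sketch of the kind of spec entry described above: fetch code from the fork while keeping the upstream import path. Field names and values are hypothetical, echoing the origin-aliasing idea that the govendor and glide formats both propose:

```json
{
	"package": [
		{
			"path": "github.com/upstream/somelib",
			"origin": "github.com/myfork/somelib",
			"revision": "a1b2c3d4"
		}
	]
}
```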

@mhoglan

mhoglan commented Feb 10, 2016

btw I do realize that the spec formats proposed in govendor and glide both address this origin aliasing capability. I brought up the point above because I believe it to be a primary use case for using a manifest file to specify dependencies.

@sdboyer
Member

sdboyer commented Feb 12, 2016

Finally finished the article I kept mentioning.

@kardianos
Contributor Author

@sdboyer I finished reading the article you wrote. I'm having a hard time getting past the "LOLZ CATZ" tone in it. There are many assertions of fact. For instance, I believe Dave's proposal was rejected not because people don't want to encourage semver, but because it wasn't actionable by any mainline go tool. I commend Dave for the proposal, but presenting Dave as the valiant hero who was shot down without good cause doesn't do anyone any good.

I think most of the technical points present in the article have already been presented here. Though from the writing style it is difficult for me to unravel when you are presenting a point of view, an assertion of fact, or a proposal for action; I may have not accurately understood everything you intended to convey.

A few responses:

  • A tool should and can work with any size of repo, monorepo or microrepos.
  • Using a dvcs to download source code doesn't limit the ability to work with individual packages.
  • Who uses a package manager is greatly determined by the language itself. For instance, users of programs written in Go shouldn't ever touch a package manager; they should touch end binaries. Developers of a given project should think about package managers, but only when updating dependencies. This is much different from PHP, Python, or Ruby.
  • In go, the build system will never know anything about the package manager, as it is the package manager's responsibility to put packages in the correct location for the build system, just as the compiler knows nothing about the build system.
  • I'm not a fan of JSON, but it is in the std lib where TOML is not (nor has it reached 1.0 yet). And YAML is sooo much more than a static configuration file, the spec is huge and extremely hard to implement. If you want to have a chance at someday integrating with the go tool, I would recommend against using YAML.

Some of your points don't seem to be founded in actual issues: you have a paragraph emotionally targeting people who don't think we need reproducible builds. In the Go ecosystem I don't see that attitude to begin with, so even aside from your tone, there isn't anything to be argued there: we all want reproducible builds at some level, depending on our exact needs.

You do offer a good summary of different issues present in specifying version ranges and a good point in that the developer can treat them as a suggestion and override them.

Thank you for your work on glide. I would encourage you to continue exploring what benefits you can get from doing static analysis on a project's dependencies that can augment or assist a manually created list of declared dependencies.


I don't see this issue going forward and will probably close it soon.

In govendor this conversation has pushed me to plan to support version ranges despite the pain I've seen them bring. I already plan to support directly fetching remotes and that is closer than it was before.

@sdboyer
Member

sdboyer commented Feb 12, 2016

I finished reading the article you wrote. I'm having a hard time getting past the "LOLZ CATZ" tone in it.

There are a variety of strategies out there for getting people to read almost 13000 words. You get to make your stylistic choices, I get to make mine. The substantive points remain.

For instance, I believe Dave's proposal was rejected not because people don't want to encourage semver, but because it wasn't actionable by any mainline go tool.

I think that's an inference you made, not something I said. I simply said that it failed; I didn't say why.

I commend Dave for the proposal, but presenting Dave as the valiant hero who was shot down without good cause doesn't do anyone any good.

I've amended the wording there to be explicit that it failed because it lacked concrete outcomes, but again, I don't think I actually said that. What I DID say was that it probably wasn't incorrect that it failed.

The valiant-ness refers to the willingness to jump into what was sure to be a fractious discussion. I'd ascribe the same to you for this thread, even though I don't agree with your approach.

A tool should and can work with any size of repo, monorepo or microrepos.

And I said as much. In fact, I was quite careful about saying it. What I said was that monorepos were harmful for sharing - not that they should be neglected by a tool.

Using a dvcs to download source code doesn't limit the ability to work with individual packages.

Not much to say here except that I don't think you really understood the constraints presented in the article.

Who uses a package manager is greatly determined by the language itself. For instance, users of programs written in Go shouldn't ever touch a package manager; they should touch end binaries. Developers of a given project should think about package managers, but only when updating dependencies. This is much different from PHP, Python, or Ruby.

The differences are not so big, as...well, the entire article more or less lays out. But directly to your point: Cargo/Rust.

But again, now for the third time, this isn't inconsistent with what I wrote. Right from the outset, I indicated that go get, being an LPM, is a tool at least in part for end users. The issue is having an LPM that's not underpinned by a PDM - the developer tool.

In go, the build system will never know anything about the package manager, as it is the package manager's responsibility to put packages in the correct location for the build system, just as the compiler knows nothing about the build system.

Again, now for the fourth time...this is basically the text on one of the captions.

I'm not a fan of JSON, but it is in the std lib where TOML is not (nor has it reached 1.0 yet). And YAML is sooo much more than a static configuration file, the spec is huge and extremely hard to implement. If you want to have a chance at someday integrating with the go tool, I would recommend against using YAML.

Yep. That's why I didn't touch this in the Go section, but only in the general section. @bradfitz outlined this preference a year ago. It doesn't change my stance on what the right general decision is, of course, but it's a relatively minor issue that would have distracted from the main point.

Ironically, using a non-stdlib library for tooling is the kind of thing having a proper PDM would make easier.

Some of your points don't seem to be founded in actual issues: you have paragraph emotionally targeting people who don't think we need reproducible builds.

I do indeed. In part for levity, and in part because, as I was explicit about in paragraph three, the article is targeted at more than just Go. So yes, that is an actual issue - just not for Go.

In the Go ecosystem I don't see that attitude to begin with, so even aside from your tone, there isn't anything to be argued there: we all want reproducible builds at some level depending on our exact needs.

Nor do I see that attitude. ...and, also, I said as much in the article:

While there’s some appreciation of the need for harm reduction, too much focus has been on reducing harm through reproducible builds, and not enough on mitigating the risks and uncertainties developers grapple with in day-to-day work.

The value of including it all, even the stuff that doesn't immediately narrowly apply to your particular language of concern, is that it can help expand your perspective on what the overall problem looks like. Which was the high-level goal of the article.

You do offer a good summary of different issues present in specifying version ranges and a good point in that the developer can treat them as a suggestion and override them.

Thanks. I'm glad you found that useful.

I don't see this issue going forward and will probably close it soon.

That's a shame; per the article, my sense is that we could indeed make incremental progress by defining a proper lock file. Perhaps it would be best to start a clean issue for that, though.

@golang golang locked and limited conversation to collaborators Jul 26, 2017