
Non-lying versions with convenient api evolution #1185

LasseBlaauwbroek opened this issue Jan 6, 2015 · 39 comments

@LasseBlaauwbroek

This proposal combines elements from #833 and a discussion on the IRC channel about versions. The goal of the proposal is to allow for "non-lying" versioning of modules and to enable convenient API evolution.

For the sake of simplicity in this discussion, I'm going to assume that the APIs of two versions of a program are compatible if and only if the ABIs of those versions are compatible. If this is not the case, some additional complexity might need to be introduced.

In this proposal, we do not derive any semantic meaning from version indicators. This means that version 1.1.0 is not necessarily "better" or "newer" than version 1.0.0. We just treat version indicators as strings that uniquely identify versions.


Providing for old modules

Assume we have the following modules:

module org.example.library "1.0.0" {}
module org.example.application "1.0.0" {
    shared import org.example.library "1.0.0";
}

When a new version 1.1.0 of the library module is published, there are by default no guarantees of compatibility with older versions of the module. This means that the application module cannot load library version 1.1.0 at runtime instead of 1.0.0. The maintainer of the application has to explicitly change the dependency to 1.1.0 in order to use it.

In order to provide backwards compatibility with older versions, one has to use the provides clause:

module org.example.library "1.1.0" provides "1.0.0" {}

The provides clause indicates that version 1.1.0 can be freely and without consequences substituted for version 1.0.0 if this is necessary. When the new module version is uploaded to a repository (Herd), the validity of the provides clause is checked by verifying that version 1.0.0 actually exists, and then checking that the APIs of the two versions are actually compatible. This is of course not a perfect guarantee that the versions are fully compatible and no bugs have been introduced, but it should go a long way.

A module can have multiple provides clauses. A provides clause is transitive. This means that if module 1.2.0 provides 1.1.0, it will also provide 1.0.0.
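As a minimal sketch of this transitive rule, in Python rather than anything Ceylon-specific (the version strings and the `provides` table are hypothetical stand-ins for repository metadata):

```python
# Hypothetical metadata: each version maps to the versions its
# provides clauses name directly.
provides = {
    "1.0.0": [],
    "1.1.0": ["1.0.0"],
    "1.2.0": ["1.1.0"],
}

def all_provided(version):
    """Every version that `version` may be substituted for,
    following provides clauses transitively."""
    seen = set()
    stack = [version]
    while stack:
        for p in provides.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# 1.2.0 provides 1.1.0 directly and therefore 1.0.0 transitively.
print(sorted(all_provided("1.2.0")))  # ['1.0.0', '1.1.0']
```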

Replacing old modules

When a bug or security issue is fixed in a new version, one generally wants to stop using the old version, because it is dangerous. This can be automatically accomplished with the replaces clause:

module org.example.library "1.1.1" replaces "1.1.0" provides "1.1.0" {}

When library version 1.1.1 has been released, version 1.1.0 may not be used anymore because it has been replaced. This is determined at runtime. This means that a client needs to periodically check for new module releases to see if the current modules are still valid (the time between checks is probably client-dependent, and possibly user-driven).

The replaces clause is transitive in the sense that if 1.1.2 replaces 1.1.1 and 1.1.1 replaces 1.1.0, the only currently valid module is 1.1.2. I would, however, not call this a transitive property. Note that in the above example, version 1.0.0 is not replaced. If the fixed bug is also present in that version, the new module should look like

module org.example.library "1.1.1"
    replaces "1.1.0" 
    replaces "1.0.0"
    provides "1.1.0" {}
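The invalidation rule can be sketched in Python (hypothetical version strings): a version stays usable only as long as no published version replaces it, which is why chaining replaces clauses leaves a single valid version without needing a genuinely transitive property:

```python
# Hypothetical metadata: which versions each release replaces.
replaces = {
    "1.1.1": ["1.1.0"],
    "1.1.2": ["1.1.1"],
}
published = ["1.1.0", "1.1.1", "1.1.2"]

def active_versions(published, replaces):
    """Versions not replaced by any published version."""
    replaced = {old for new in published for old in replaces.get(new, [])}
    return [v for v in published if v not in replaced]

# 1.1.2 replaces 1.1.1, which replaces 1.1.0: only 1.1.2 remains valid.
print(active_versions(published, replaces))  # ['1.1.2']
```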

Note that the replaces and provides clauses are completely separate constructs. This can be useful when a critical security bug has been found that cannot be fixed without changing the API of the module. In this case one publishes a new module version 1.2.0 that does not provide 1.1.0:

module org.example.library "1.2.0" replaces "1.1.0" {}

This means that if an application relies on library version 1.1.0, it will fail at runtime with an error because no suitable library version can be found. This then needs to be fixed by the maintainer of the application by explicitly upgrading the library to version 1.2.0.

This can also be useful when the support cycle of an old release comes to an end. One then publishes a completely empty module that replaces the old releases.

API deprecation

The above constructs allow for API expansion (adding new classes and functions) and for creating a completely new API that breaks from the old one. There is, however, no way to deprecate and replace an old API. In order to do this I would like to adapt the deprecated annotation, to be used in approximately the same way as the protocol annotation in #833:

module org.example.library "1.2.0" replaces "1.1.1" provides-deprecated "1.1.1" {}

shared object system {
    shared deprecated("1.1.1") Integer now => ... ;
    shared Instant now => Instant(now);
}
shared deprecated("1.1.1") class Foo {}

For each deprecated annotation, it is checked that the argument version is present in the provides-deprecated clause of the module descriptor. (TODO: it might be a good idea to use something somewhat more typesafe than strings for version indicators.) The provides-deprecated clause is a semi-transitive version of provides. Since 1.2.0 provides-deprecated for 1.1.1, and 1.1.1 provides for 1.1.0, version 1.2.0 also provides-deprecated for 1.1.0. However, a future version 1.3.0 that provides 1.2.0 only has to support the non-deprecated API parts, and does not have to support 1.1.1 in any way (it can do so, however, by also including a provides-deprecated "1.1.1" clause).
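The "semi-transitive" rule might be sketched like this in Python (hypothetical version tables mirroring the example above): deprecated support extends through the provides clauses of the versions named in provides-deprecated, but a later version that merely provides 1.2.0 inherits none of it:

```python
# Hypothetical metadata for the example above.
provides = {"1.1.1": ["1.1.0"], "1.2.0": [], "1.3.0": ["1.2.0"]}
provides_deprecated = {"1.2.0": ["1.1.1"], "1.3.0": []}

def transitive_provides(version):
    """Versions reachable through provides clauses."""
    seen, stack = set(), [version]
    while stack:
        for p in provides.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def deprecated_support(version):
    """Versions whose deprecated API this version still carries."""
    out = set()
    for d in provides_deprecated.get(version, []):
        out |= {d} | transitive_provides(d)
    return out

# 1.2.0 carries the deprecated API of 1.1.1 and, through it, 1.1.0;
# 1.3.0 provides 1.2.0 but carries no deprecated support.
print(sorted(deprecated_support("1.2.0")))  # ['1.1.0', '1.1.1']
print(sorted(deprecated_support("1.3.0")))  # []
```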

When an application imports version 1.2.0, by default, it has no access to the deprecated parts. In order to access the deprecated API parts, one has to import using enable:

module org.example.application "1.0.0" {
    shared enable("1.1.1") import org.example.library "1.2.0";
}

Now, the application can access the deprecated API in approximately the same way as mooted in #833 (again something more typesafe than strings may be needed):

Integer now = system."1.1.1"::now;
"1.1.1"::Foo foo = "1.1.1"::Foo();

Note that if the application upgrades from 1.1.1 to 1.2.0, all references to system.now need to be fixed, but since this is a statically typed language, this should be manageable. Besides, I think this will encourage migration to the new API. Usage of deprecated APIs should result in a warning, unless deactivated.

When a new version 1.3.0 that provides 1.2.0 is introduced, the application can only use 1.3.0 if it does not use the deprecated API. If, however, 1.3.0 also has a provides-deprecated "1.1.1" clause, the application can use the new version even if version 1.1.1 is enabled.

Note that an application that imports version 1.1.1 can still, without any issues, be combined at runtime with version 1.2.0.

Module version resolution

Because new versions of the library have possibly not been tested by the maintainers of applications that use the library, we want to use the library version that is closest to the specified version. This way, we minimize the chance that new bugs are introduced into the application. In order to do this, we record in the module a timestamp of the moment it was uploaded to the repository.

When the application has multiple shared imports of the library module with different version indicators, a list of possible versions that provide all version indicators is computed, and then the version with the earliest timestamp is selected.
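A sketch of that selection step in Python (an integer upload counter stands in for the timestamp; all names are hypothetical):

```python
# Hypothetical repository metadata.
provides = {"1.0.0": [], "1.1.0": ["1.0.0"], "1.2.0": ["1.1.0"]}
uploaded = {"1.0.0": 1, "1.1.0": 2, "1.2.0": 3}  # upload timestamps

def satisfies(candidate, requested):
    """True if `candidate` is, or transitively provides, `requested`."""
    seen, stack = {candidate}, [candidate]
    while stack:
        for p in provides.get(stack.pop(), []):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return requested in seen

def resolve(requested_versions):
    """Earliest-uploaded version providing every requested indicator."""
    candidates = [c for c in uploaded
                  if all(satisfies(c, r) for r in requested_versions)]
    return min(candidates, key=uploaded.get) if candidates else None

# Shared imports asking for 1.0.0 and 1.1.0 both resolve to 1.1.0,
# the earliest version that provides both.
print(resolve(["1.0.0", "1.1.0"]))  # 1.1.0
```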

When a module has multiple non-shared imports of the library module, the library is loaded separately for each import, and the closest version is used each time.

In order to triage a bug that arises when a new module is loaded, the application user should be able to run a tool that outputs the module versions that have been loaded. It should also be possible to force the loading of a specific module version at runtime, for testing purposes, and possibly also in order to keep the chosen version stable if library authors abuse their power to publish new versions.

@gavinking
Member

I like this proposal. We should seriously consider this approach.

@gavinking gavinking added this to the 1.5 milestone Jan 11, 2015
@tombentley
Member

I appreciate that some thought has gone into trying to mitigate the situation when a library author erroneously publishes a new module that provides/replaces an older version, but accidentally breaks an application.

What's suggested gives a way of dealing with the problem locally (e.g. within an organization that uses an application that uses the library), but for a popular application that could still amount to a lot of peeved sysadmins who each have to spend time figuring out that it was library X, buried deep within the dependencies of this application (that they know nothing about), that caused the problem.

It would be nice if, once someone, somewhere, has figured out that X is to blame for the new problem, there were a standard way for them to share that the provides clause is in fact erroneous, perhaps by notifying the repository (e.g. Herd).

If other people knew that this supposed security update actually introduced a bug they might decide to ignore the update until the issue is fixed. Also, the library author needs a way to find out that they messed up. Maybe doing this in an automated way is over engineering the whole thing, but I thought it was at least worth mentioning...

@FroMage
Member

FroMage commented Jan 12, 2015

Parts of this proposal, like provides and replaces, exist in the Debian package metadata.

@LasseBlaauwbroek
Author

@tombentley
I'm not really sure how the Ceylon team envisions the usage of the package repository once Ceylon really starts to take off. How should modules be used by operating systems like Linux distributions? Is the idea to put only a wrapper script of the application into the specific distribution repositories, and let all the modules be downloaded from the central Herd? Or should each operating system distribution start maintaining their own Herd repository? Or, alternatively, should all necessary modules be packaged into the native repository system of the distributions?

If the idea is to let major Linux distributions maintain their own Herd repositories, the problem is essentially solved, because they will probably also have a test repository (as they do for their native repositories). I would, however, question whether these Linux distributions want to commit to the overhead of maintaining an extra server, and all the modules, just for one specific programming language.

If this is not the idea, and all users should download their modules from the central Herd repository, it might be a good idea to create an additional "trusted" repository managed by the Ceylon team. This repository contains the most popular and essential applications, and their dependencies. If a developer pushes an update of a module to the "non-trusted" repository, the Ceylon team will test it (possibly through a "testing" repository), and then upload it to the "trusted" repository. This way, sysadmins have some guarantee of always having non-buggy applications if they use the trusted repository, and can choose to use applications from the non-trusted repository at their own risk.

I would be very interested in hearing how the Ceylon team sees the whole repository story growing in the future.

@tombentley
Member

How should modules be used by operating systems like linux distributions?

Well, that's a problem for those distributions (even if we figured out what we thought was best, we couldn't force that onto those distributions). And anyway they're better placed to know what their users ultimately want.

If a developer pushes an update of a module to the "non-trusted" repository, the ceylon team will test it (possibly through a "testing" repository), and then upload it to the "trusted" repository.

I don't think that really works, as it requires a lot of testing effort by the Ceylon team, who are then in the firing line if/when they get it wrong. No one really wants that job, do they? Personally I think it works "better" if a repo can act as an impartial aggregator of people's solutions to version-hell problems. But this vague idea is in danger of derailing discussion of your proposal.

@LasseBlaauwbroek
Author

No one really wants that job, do they?

Well, I would say that a big chunk of the work done in Linux distributions is exactly that job, so clearly some people are willing to do it, even for free :-)

If there is not going to be a trusted repository in one way or another, there are only two options in my view:

  1. Do not implement the proposal above and keep versions fully static. Application developers will have to migrate dependencies by hand. This will ensure applications keep working properly, but comes with all the obvious downsides.
  2. Implement some sort of social construct in Herd like you propose, where people can flag bad modules. This essentially turns Herd into a bug tracker. But I think this solution will never be perfect, because it will always react to problems after they have occurred.

@luolong
Member

luolong commented Jan 22, 2015

I would say, keep these two issues separate for now.

  1. Declarative non-lying versions as outlined in the original proposal, and module resolver changes to make this possible.
  2. Updating/upgrading Herd to support redirecting requests for a specific version of a module to the newest version that replaces it is another issue that can be implemented separately.

Whether we need testing/validation around Herd modules to ensure version compatibility is a social problem first, and a technical and organizational one second, so I suggest we postpone this decision until we know what the actual issues are.

@LasseBlaauwbroek
Author

@luolong I agree, with the exception that I think it would be a good idea to add the possibility for

  1. Giving the programmer the option of opting out of this versioning system by only using static module versions (by using the static annotation or something like that on module imports)
  2. Giving the sysadmins the option of opting out of the versioning system at runtime using a configuration option

This way, if problems arise, there is already a simple mechanism in place to remedy the situation.

@luolong
Member

luolong commented Jan 22, 2015

I think what is missing here is the assemblies spec that Gavin has been talking about.

The way I understand it, assemblies would be a way of solidifying the selection of module versions in a deployable/runnable artifact.

Maybe even to the point where dynamic module resolution would be disabled, or restricted by the assembly to a well-controlled, stable/fixed set of module/version mappings.

Maybe even deployed as a single executable unit.

@LasseBlaauwbroek
Author

From the available information on GitHub it is not clear to me whether the purpose of assemblies is to package all dependencies and not use an external module repository at all. If that is the case, the module resolution is, by design, static.

However, the point @tombentley makes is that it could be very difficult for a sysadmin or app developer to know which dependency is buggy and should be replaced by an earlier version if it is buried very deep. The assembly level may therefore not be the correct place for this.

@LasseBlaauwbroek
Author

Note that I'm assuming that #1134 is not needed anymore if this proposal is implemented, because if there is a version conflict that cannot be resolved, the two modules are really incompatible (or the library author does not make correct use of replaces, provides and provides-deprecated). So the only reason to force a module version is because of a (temporary) mistake by a module author.

@gavinking
Member

From the available information on GitHub it is not clear to me whether the purpose of assemblies is to package all dependencies and not use an external module repository at all. If that is the case, the module resolution is, by design, static.

Yes, that's what I have in my head at least...

@jvasileff
Member

Well, I would say that a big chunk of the work done in linux distributions is exactly that job, so clearly some people are willing to do it, even for free :-)

I think Herd is a very different thing than a specific Linux distribution, such as RHEL v.Latest. And even if it isn't, would the Linux community succeed with only one distribution? Would it be ok to force early adopters and laggards into using exactly the same versions of all software?

Likewise cars are very different than RPMs. Consider that RPMs include patches and customizations, and when included in a repository, are carefully reviewed, patched, built, etc, by trusted vendors such as Red Hat. They are also usually tailored to a specific version of a distribution, whereas cars are universal.

Do not implement the proposal above and keep versions fully static. Application developers will have to migrate dependencies by hand. This will ensure applications keep working properly, but comes with all the obvious downsides.

I don’t see the proposal as a whole being inseparable from the replaces functionality. In fact, it seems that replaces could be removed entirely, relying on options from only versions explicitly specified in the dependency tree. Or, perhaps replaces could be re-imagined as a sort of version advisor, providing upgrade recommendations.


A module can have multiple provides clauses. A provides clause is transitive. This means that if module 1.2.0 provides 1.1.0, it will also provide 1.0.0.

I think this makes sense, but in practice the provides list will need to be denormalized and made available in each package. Otherwise, it would be necessary for all versions of all required modules to always be available, which isn’t feasible when you consider internal vs. external repositories, offline mode, etc.

And given the lack of a central authority, it would be impossible to validate the denormalized provides list, which makes it unsafe to rely on a claimed transitive property.

That’s not to say this shouldn’t be a recommended practice. Herd could even produce a warning when a violation is detected.

we introduce a timestamp in the module of the moment it was uploaded to the repository

I understand the intent, but this introduces a problem that critical metadata is:

  • generated arbitrarily (what if I upload version “2” before version “1”?)
  • generated from a potentially unreliable source (the server’s clock)
  • generated without consistency guarantees across repositories

When the application has multiple shared imports of the library module with different version indicators, a list of possible versions that provide all version indicators is computed. And then the version with the earliest timestamp is selected.

Interesting. I guess this would usually result in the “newest explicitly specified version”, but could also be a later version, if the later version is more compatible in a way that matters.

While powerful, I think a concern here is that the repository itself becomes a dependency, in that the composition of the application will change based on the repository it is compiled or executed against.

In practice, I think it would be very rare that a newer-than-newest-specified module would be required, so this more advanced feature may not be worth the tradeoff.

So the only reason to force a module version is because of a (temporary) mistake by a module author.

I think any scheme that requires ultimate trust be extended to dozens of module maintainers will cause significant problems. Just think of the number of attack vectors and failure opportunities!

@jvasileff
Member

Are repeatable builds and runs considered to be a goal in all of this?

What are the actual module dependencies? Are they just the items listed in module.ceylon, or do they include the point-in-time state of enabled repositories?

What is the definition of a Ceylon repository? Is each repository just some subset of the universe of cars, where builds will either succeed or fail based on the completeness of enabled repositories (as in the Maven world), or are they curated selections of cars that help shape build and execution environments?

Who will need to maintain repositories? Just organizations concerned about licensing issues, or anyone desiring control over build and execution environments?

@LasseBlaauwbroek
Author

it seems that replaces could be removed entirely

I think replaces is quite essential here. It keeps a balance in which version is selected, together with the fact that we always select the earliest possible version. Selecting the earliest version minimizes the chance of introducing changes to a library that the library user has not anticipated. On the other hand, replaces provides the author of a library with the option to force an upgrade if a (critical) bug has been found. I think this can be a powerful mechanism.

Another very convenient thing about replaces is that library authors can keep only the most recent versions "active", by replacing all older versions. Suppose the library has a current API and some older legacy API. Within both APIs, only bugs are fixed, or new functionality is introduced using the provides-deprecated mechanism. Then they can make sure that at all times only two modules are "active". This way, it can be much easier to debug a problem, because you can (more or less) guarantee that one specific module is used.

A third important usage of replaces is the "oh crap, we made a mistake, roll it back" use case, when a module is published that turns out to be completely wrong. One can then immediately publish a new module that replaces it.

the provides list will need to be denormalized and made available in each package
[...]
And given the lack of a central authority, it would be impossible to validate the denormalized provides list

The transitive list would indeed have to be expanded. I don't really see a problem here. As far as I can see, there is a central authority: the repository to which the module is first uploaded (probably Herd). Everyone (including users and repositories relying on this repository) that uses the repository is assumed to trust it. The provides list can be expanded against the modules present in the repository on upload (either by the publish tool or by Herd). At this point the provides list is assumed to be trusted and does not need to be checked anymore. What security issues do you see arising from this?

I understand the intent, but this introduces a problem that critical metadata is:

  • generated arbitrarily (what if I upload version “2” before version “1”?)
  • generated from a potentially unreliable source (the server’s clock)
  • generated without consistency guarantees across repositories

I agree that this could be a problem. An easy solution is, instead of a timestamp, to use an integer that is incremented every time a version is uploaded. This integer is then a linearization of the version graph, and can be used instead of the timestamp. This solves your second objection.
As for the first objection: I don't really see this as a problem, because why would you upload two versions that provide the same older version at (approximately) the same time? The only time you want to upload two versions at the same time is if they are both from separate version 'branches'.
The third objection: I assume each library has a master repository, to which it is always uploaded. All subsequent repos should simply copy the timestamp (integer) from this master repo. Even if multiple master repos are used for the library, this is no problem as long as they are not both used by the same user. I think the chance of this becoming a real problem is fairly small. And as far as I know there are no better options.

Interesting. I guess this would usually result in the “newest explicitly specified version”, but could also be a later version, if the later version is more compatible in a way that matters.

It would result in a later version when the "newest explicitly specified version" has been replaced.

While powerful, I think a concern here is that the repository itself becomes a dependency, in that the composition of the application will change based on the repository it is compiled or executed against.

Well, this is always the case with repositories that provide any kind of resolution scheme. The only way to prevent this is to keep the static model currently in use.

In practice, I think it would be very rare that a newer-than-newest-specified module would be required

I think it would happen frequently: whenever a bug has been fixed in a library and the old version is replaced.

I think any scheme that requires ultimate trust be extended to dozens of module maintainers will cause significant problems. Just think of the number of attack vectors and failure opportunities!

If some entity (organisation, sysadmin, user) wants more security, it should set up a personal repository in which only tested modules are included. In practice, everyone is always at a security risk when using third-party applications, because you never know what kind of backdoor is installed.

Are repeatable builds and runs considered to be a goal in all of this?

I would say no. This is more in the scope of assemblies because, as @gavinking says above, in that case the assembly is its own repo and does not use any other repos.

What are the actual module dependencies? Are they just the items listed in module.ceylon, or do they include the point-in-time state of enabled repositories?

I'm not sure what you mean by this, exactly.

What is the definition of a Ceylon repository?

As far as I know, except for Herd, Ceylon repositories are actually just HTTP (WebDAV) servers. They do not build or check anything. However, the publish tool should check that when a module is uploaded, all its dependencies are actually in that repo. This way, unless you circumvent the publish tool, a repo should be in a consistent state.

Who will need to maintain repositories?

Anyone who wants to... In principle every user maintains a local repo on their computer.

@jvasileff
Member

@LasseBlaauwbroek, thanks for the reply. But despite trying to convince myself otherwise, I believe most will demand control and repeatability. I say this after thinking about the various concerns:

Developers like to know "what changed?" and A/B test things when problems occur. QA testers like to do the same, but even before problems occur. Ops folks don’t want anything to change without their knowledge. Lawyers require that the licenses of all dependencies be reviewed. Security policies prohibit running code from unknown sources. Customer support folks want to have input into upgrade scheduling. The list goes on.

I can, however, think of two scenarios where “latest version” would be nice:

  1. As an upgrade advisor feature, to analyze dependencies, report vulnerabilities, etc.
  2. At runtime, for consumers (acknowledging that we all can be identified as consumers at times)

Many of the same reliability, trust, and safety concerns are relevant for 2, but they may not always apply, and are certainly more subjective in this case.

So, at its foundation, I believe this should be built with a repeatability guarantee and be repository-agnostic. Support for replaces and any other features that conflict with these goals should be optional and tailored to the two use cases above.

I believe that it should be possible to adjust the original proposal to meet these requirements without sacrificing its novel and interesting aspects. And, in fact, this might help it sidestep some potential problems, such as what to do in the case that replacement modules make incompatible changes in their shared dependencies.

Now, you won’t be surprised that my preference would be for the “latest module” feature to be turned off by default at runtime. But regardless, with adequate configuration options, everyone’s needs can be met. I suggest offering:

  • A global setting (CEYLON_OPTS?)
  • An assembly descriptor setting (fine grained/per module options would be nice here)
  • A command line option

I’ll hold off on responding to the other items, as most of them are impacted by the direction on repeatability and repository agnosticism.

@LasseBlaauwbroek
Author

So, I think our opinions on this matter are fairly close. Some of the things you propose are already in my proposal or in later comments. The things we disagree on are a bit subtle, so I'm having a little trouble properly materializing them on paper and connecting the various concerns here. But here it goes anyway (note that if I seem aggressive here, that is not my intention; I'm just trying to be clear):

First, you seem to somewhat downplay the whole replaces concept to "latest version". I think this is an oversimplification that makes the concept seem very bad for repeatability and whatnot. Although I acknowledge that it does bear certain risks, I think it is by far not as bad as you make it out to be. Some differences between replaces and "latest version":

  • replaces, in combination with provides, makes sure the APIs of the replacer and replacee (nice words ;-) ) are actually compatible. This can and should be mechanically checked by the publishing tool. "latest version" is a much blunter instrument.
  • The author can also be much more specific about the intention of a new version:
    • If a version replaces and provides another version, the author explicitly says the new version is a replacement for the old version, and can (and should/must) be used without problems instead of the old version;
    • If a version replaces but does not provide another version, the author explicitly says the old version contains some critical problem that cannot be fixed in a backwards-compatible way. Depending on the policy, an application using the old version either won't run at all, or will issue a severe warning. This situation should obviously be used sparingly
    • If a version provides but does not replace another version, the author is not quite sure whether the new version could cause subtle problems; in this case it is up to the programmers that use the library to upgrade and make sure everything is OK

In light of the points above, I argue that replaces can also increase security and reliability if used properly. Therefore I would argue that it should be on by default. This is also because the kind of company you are describing, which wants to keep a very tight grip on its dependencies, already has a plethora of options to do so:

  • Publish their applications using assemblies only
  • Maintain their own online repository and distribute only a run script that executes their application, like the following snippet. I think this is a very powerful option that allows a company to upgrade dependencies at its own pace, by using the machinery of this proposal.
ceylon run --no-default-repositories --cacherep="~/app-specific/local/repo/" --rep="http://company.org/app-specific/repo"
  • I already proposed the static annotation on module imports to force the usage of that specific version

And to respond to some more specific things you wrote:

So, at its foundation, I believe this should be built with a repeatability guarantee and be repository-agnostic.

I think that if a company needs the repeatability guarantee, they should just maintain their own repositories. That way, they can easily have their repeatability. As for being repository-agnostic, I don't really see how you could ever create a repository-agnostic system. I can simply upload two different modules with the same name/version to two repos, and you are already not repo-agnostic anymore. And besides, I do not really see what you want to accomplish with this property. Could you elaborate on this a bit?

And, in fact, this might help it sidestep some potential problems

I do not immediately see a problem with changes in shared dependencies, a shared dependency is simply part of the public API of the library. Besides, I think that objections/problems of a more technical nature should not be part of the more philosophical/social discussion we are having about this issue right now.

As an upgrade advisor feature, to analyze dependencies, report vulnerabilities, etc.

I believe there should indeed be a tool that can be given a module, and two (collections of) repo's. This tool then returns a report about the differences between the dependency versions that are loaded if the module is run using the two repo collections. This way, a company can detect new relevant module versions in upstream repositories and include them in their own repo.

I believe I have responded to all your points, but if I'm forgetting something, please let me know.

@LasseBlaauwbroek
Author

I'm sorry, I accidentally hit the comment button. Let me edit my comment further.

@LasseBlaauwbroek
Author

Okay, done.

Note that if the Ceylon team thinks that this proposal is the general direction they want to take, I would be happy to work on a more detailed design document that elaborates on implementation and potential issues. That way, the intricacies of this proposal can be better developed and individual issues can be discussed more easily.
Please let me know if this is wanted, or if it is too early for this kind of thing.

@jvasileff
Member

@LasseBlaauwbroek, my intention was not to simplify or downplay the sophistication of your proposal, but instead to suggest that repeatability and repository agnosticism be worked in as foundational principles. If you reject this, then we lack a shared context to debate or refine many of the details.

maintain their own repositories

Given that every project, even for individuals, has different needs at different times, the logical conclusion of this is that each project will need to maintain its own repository. So, put to a not-so-unreasonable extreme, we either trust an unknown developer to know what’s best for us, or we go back to the old-school method of committing dependencies to source control.

I can simply upload two different modules with the same name/version to two repos

Of course. That would be against the “rules”.

Could you elaborate on this a bit?

It’s a key component of repeatability.

The question is whether repositories should be simply a storage resource, or a meaningful dependency of the application in its own right. A repository agnostic system will produce identical results regardless of the repository in use (the run/compile will simply fail if a required module is not available).

A helpful exercise may be to try to explain exactly what it means to compile and run the code from git tag “T” on date “D” using repositories “R1” through “R5”. With a repository agnostic system, you can disregard “D” and “R1”–“R5”.

Examples:

Maven is repository agnostic and supports repeatable builds. The makeup of a repository will have no impact on dependency resolution other than for dependency resolution to fail if the resolved versions are not available.

Yum is not necessarily repository agnostic, as it generally aims to find the “latest” version of packages. Of course, that is also the point, yum repositories are used to keep systems up to date. To make up for the lack of repeatability, most yum repositories come with certain commitments, such as promising no major version upgrades, no breaking changes in dependencies, or things like security patches only. Most yum repositories do not rely on the actual library/application developers to make those determinations.

The proposal is not repository agnostic, with version resolution being highly dependent on its contents. And as originally presented, there are factors that affect resolution beyond what is present with yum, such as the order of upload and the presence/absence of “bridge” versions.

As it stands, the proposal is neither repository agnostic like Maven, nor does it provide a patched and curated offering as is standard with yum.

I’ll be happy to clarify any of this if you’d like, but otherwise I think I’ve made my point, and I’ll just sit back and watch this evolve for a while.

@LasseBlaauwbroek
Author

Okay, so I agree that repeatability is important, and your git example is very compelling. Let me try to decompose this some more. If at any point you see a problem in my reasoning, please say so.

First the agnosticism. What are the use cases you have for using multiple repos with the same modules in them? The only one I see is that you decide at some point to create and use your own repo. In this case you should simply copy the exact metadata from the old repo to the new repo. I agree that there may be a transition period where you use both repos, and this can cause some trouble. However, this should be a very rare occurrence, and I would say not important enough to warrant strict agnosticism.


Then on to repeatability. As I see it, there are mainly two use cases.

  1. To make sure all users are using the same dependencies at the same point in time. The exact dependencies can evolve over time (controlled by the developer).
  2. To enable developers to pull out an old version of the program and examine it just as it was at the point in time it was created/used, for the purpose of bisection or reproducing an old bug.

The second scenario is not possible with the current proposal, so let me try to extend it. When a module is published to a repo, we add a timestamp. We can now easily reconstruct the state of the repo at a specific time-point. We then add an option to the ceylon run-tool to:

  • Supply a timestamp, so that the dependency resolution is performed with the then-available versions
  • Supply a specific set of dependency versions to be used; when a bug is reported, the user can supply exactly which module versions were in use. This method should be even more precise than the timestamp method.

Using this system, we are back to the unreliable clocks problem. However, I feel that it should be possible to let the publish tool pull the time from a time server. Even then there may be slight discrepancies, because other systems like git are not time-synchronized. But do you really think this will be a very major problem?


For the first scenario we get back to the whole "manage your own repository" and "assemblies" discussion. I think that what you mean by "different needs at different times" is that some projects need "cutting edge" and others need "stable"? I don't see why it would not be possible for a library to have both a "stable" and an "unstable/beta" version branch. In any case, it would be very stupid for a library to just commit unstable features to a repo without making it clear they are not stable. So I think that most projects that use well-maintained libraries should be able to just use Herd. If they have to use a library that is not well-behaved, they can use the static annotation.

And if that is not enough, it should be possible to use assemblies. Given the above timestamp-proposal, you can specify a timestamp in the assembly descriptor. The dependencies for the assembly are then resolved with that specific date in mind. This way, you don't have to explicitly specify each dependency version, and you don't have to commit blobs to source control.
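As a sketch only — assembly descriptors do not actually exist in this form, and the timestamp clause is pure invention for illustration — such a timestamped assembly might look like:

```ceylon
// Hypothetical assembly descriptor; the "timestamp" clause is invented
// for illustration and is not part of any existing Ceylon syntax.
assembly org.example.app "2.0.0" {
    // Resolve all dependency versions as the repositories stood at this instant.
    timestamp "2015-01-20T00:00:00Z";
    import org.example.application "1.0.0";
}
```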

And if that is not flexible enough, then you would have to resort to using a custom repo.

I think this system should at least alleviate some of your objections. Please let me know what you think of it.

nor does it provide a patched and curated offering as is standard with yum.

Offering curation and patching is a service, and I think beyond the scope of this proposal. However, I think it is very possible for someone to actually offer that service using this proposal.

@luolong
Member

luolong commented Jan 24, 2015

Wouldn't generating some sort of 'Bill Of Materials' from module dependency resolution solve this repeatability issue?

If you need a repeatable setup, you direct the Ceylon Module Runtime to use bill of materials instead of dynamic module resolution and problem is solved.

On the other hand, I do tend to think that while replaces keyword proposes an interesting concept, enforcing it is somewhat questionable. At best I would treat it as a hint and issue a warning that some module I depend on is replaced by another.

I would definitely not like to see replaces keyword silently overriding my declared dependencies...

@jvasileff
Member

Wouldn't generating some sort of 'Bill Of Materials' from module dependency resolution solve this repeatability issue?

Well, I don’t think so. You would then have to track and synchronize this additional artifact among developer workstations, build servers, shipping product, etc., which would add points of failure and be a hassle. It would be better for module.ceylon to hold the canonical description of the module’s dependencies, on its own.

In addition, if you wanted to upgrade or add a dependency, you’d have to re-generate the bill of materials. But this would upgrade all of your dependencies to whatever is current at the time (like it or not). So then what, hand edits? The bill of material generation should be repeatable too!

It occurs to me that the possible implementation of this hasn’t been obvious. The idea is that for a repeatable build, you have to a) use the same inputs each time, i.e. don’t rely on a mutable database, and b) have a deterministic algorithm, which is necessary anyway. The steps would be:

  1. Download/acquire metadata (module.ceylon) for all dependencies (and their dependencies recursively) listed in module.ceylon, using the exact version identifiers specified in each module.ceylon, or fail if metadata for any dependency cannot be found.
  2. Perform version resolution using the metadata from step one, based on provides, replaces, serial no, etc. logic as envisioned in the original proposal. (I previously thought replaces might not apply here, but I suppose it would.)

As mentioned, the metadata for each module in step 1 would have to be complete on its own; it would not be permissible to use data from sources such as unlisted "bridge" modules.

For the non-repeatable-give-me-the-latest-replacements scenario, the only difference is that the metadata in step 1 would be augmented with additional module metadata available in whatever repositories are configured for the build (i.e. replacements not explicitly referenced in the original dependency tree could be considered.)

treat it as a hint and issue a warning

Yes!

@LasseBlaauwbroek
Author

I would like to summarize the discussion. I think the text below is a fair and objective description. If anyone does not agree, I would like to hear that.

It comes down to two choices, both with different tradeoffs:

  1. Disable the replaces functionality by default.
    • This gives you, by default, repeatability

    • This does not give things like "automatic" security upgrades, unless:

    • We add an option to enforce replaces by

      • Adding an indicator to the module/assembly descriptor
      • Adding an indicator to the run-tool
      • Adding a global option (I think it is questionable if it is a good idea to do this)

      The downside of this option is that you permanently disable repeatability (you could mitigate this with features from (2), but I think this would make everything simply too complicated for developers)

    • In order to be repository agnostic, we have to add an additional restriction: For each dependency, given that provides imposes a partial order on the versions, there must be a greatest element in the set of explicitly declared versions.

  2. Enable the replaces functionality by default.
    • Repeatability is always available, but to use it we have to include an explicit timestamp by:

      • Adding it to the module/assembly descriptor
      • Adding it to the run-tool

      Note that if you use a timestamp in the module/assembly descriptor, you can still upgrade a dependency past the specified time, by just updating its version specifier. The dependency is then simply resolved using the method in (1).

    • Unless we turn it off by using timestamps, this gives things like "automatic" security upgrades

    • This is not repository agnostic

All other features of this proposal will be the same for both choices (such as using replaces as an upgrade advisor).
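The repository-agnosticism restriction under option (1) can be illustrated with a hypothetical descriptor (invented module name and versions): if two modules in the dependency graph explicitly import org.example.lib "1.0.0" and "1.1.0" respectively, the restriction holds only if one declared version provides the other, making it the greatest element:

```ceylon
// Hypothetical syntax from this proposal. Because "1.1.0" provides
// "1.0.0", it is the greatest element of the explicitly declared set
// {"1.0.0", "1.1.0"} and can be selected without any repository query.
module org.example.lib "1.1.0" {
    provides "1.0.0";
}
```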

Note to @jvasileff: In your algorithm replaces is actually useless, because there must always be an explicitly defined greatest version. Otherwise it will not compile. Given a greatest version, that one will always be selected, without intervention of replaces.

@luolong
Member

luolong commented Jan 25, 2015

Well, I don’t think so. You would then have to track and synchronize this additional artifact among developer workstations, build servers, shipping product, etc., which would add points of failure and be a hassle. It would be better for module.ceylon to hold the canonical description of the module’s dependencies, on its own.

No you wouldn't. A bill of materials is just a list of the URLs of the actual modules that the dependencies were resolved to. I would make them absolute URLs, and these would necessarily be pointing at the particular repositories they originated from when first resolved. Thus, you only need those repositories, containing the same artifacts, to be available to you when resolving from the BOM.

You would choose to generate that BOM when first building your module, store it as your build artifact and use it when you need to recall a repeatable build.

@jvasileff
Member

store it as your build artifact and use it when you need to recall a repeatable build

Right. That's what I meant.

@jvasileff
Member

Note to @jvasileff: In your algorithm replaces is actually useless, because there must always be an explicitly defined greatest version. Otherwise it will not compile. Given a greatest version, that one will always be selected, without intervention of replaces.

Right, at least for shared imports. I haven't worked through the details, but I'm guessing that on a constrained set of modules, where each is listed as a dependency, provides alone will usually result in "latest", at least for well-behaved modules. But, I think replaces would factor in for non-shared imports, when we have a replacement package available within our selection pool, due to some other module's import of the same dependency with a different version. This would of course be fine - upgrading to a replacement package is fine here since a) it is explicitly mentioned (for repeatability), and b) because of a, we can infer that it has been vetted.

Of the two options:

  1. Disable the replaces functionality by default
  2. Enable the replaces functionality by default

I would instead make the distinction by enabling or disabling arbitrary queries against the repositories. Lookup by module name + version would always be available, but searching for replacements or extra metadata would only be available in what would correspond to mode 2. This distinction would imply further properties on packaging that would ensure repeatability.

Sorry if these notes are a bit rough - busy day and GTG!

@LasseBlaauwbroek
Author

But, I think replaces would factor in for non-shared imports

In theory, it could indeed be factored in. I think, however, it would be an exceedingly bad idea, for much the same reasons you give in favor of repeatability and control over dependencies. Different non-shared dependencies should never have an impact on each other. This means that they are resolved completely separately. Otherwise it could happen that on upgrading one module, a completely unrelated module also gets upgraded. That seems to go against all the properties you argue in favor of. I also think that even if different dependencies resolve to exactly the same module version, they should still be loaded twice, in different classloaders. This ensures complete safety. Perhaps this can be relaxed if the dependency itself indicates that it has no shared state...

I would instead make the distinction by enabling or disabling arbitrary queries against the repositories.

I believe that characterization is functionally completely equivalent to mine, and both imply complete repeatability.

@luolong
Member

luolong commented Jan 26, 2015

store it as your build artifact and use it when you need to recall a repeatable build

Right. Thats what I meant.

If you mean to imply that this BOM needs to be synchronized across developers' workstations, making this a hassle, then I disagree.

Yes, for recalling a repeatable build you need this BOM. And you need to store it in some place, to be able to recall it. But then at least you have a precise list of artifacts and their download locations (preferably with their MD5 hash signatures, for further validation).

But without such an artifact, how can you really say, that the recalled build is exactly same as the one you are trying to recall?

Without a BOM you are reduced to trusting maintainers of the repository never to modify their artifacts. And throwing replaces into the mix, you cannot really guarantee any kind of repeatability.

But I have a feeling that this discussion has gone off the rails. The original proposal was about resolving module versions in a way that introduces the least possible conflict between different versions of transitive module dependencies.

I believe that the provides construct (I would actually prefer to re-use satisfies in this context) will give the module resolver enough information to make smart decisions about which version of a module to download, allowing it to reduce the number of binary artifacts to download and the number of modules to load, while still remaining true to the intent of the developer.

The replaces keyword seems useful, but I would not use it as a directive to silently replace a module unless specifically allowed by the user (via a command line option or an API). This is a slippery slope here, that I would like to avoid.

Repeatable builds are an issue but I believe they are an orthogonal to the problem of reliable module version resolution.

@LasseBlaauwbroek
Author

But I have a feeling that this discussion has gone off the rail. The original proposal was about resolving module versions in a way that would introduce least possible conflict between different versions of transitive module dependencies.

I would like to make clear that the intent of my original proposal was to provide a framework that allows repositories much in the style of various Linux distribution repositories. The exact social contract of these repositories (curation, patching, stability, whatever) is outside the scope and should be determined on a per-repository basis. replaces is an essential and powerful (as explained above) part of that framework.
I believe that such a framework is very important, because while you don't trust other developers to just automatically replace a dependency, the user (being a person, company, or OS vendor) may not trust you to always make sure your application is patched with the most up-to-date version.

If it is decided that there is no place for such a framework in Ceylon, I'm okay with that. But I think that decision has to be made now, because it will be very difficult to change the policy of the module system later.

(I realize that you made the same point in your first post, and then I kind of agreed with you after not reading it correctly. Sorry about that)

Without BOM you are reduced to trusting maintainers of the repository never to modify their artifacts.

I think that if you have such low trust in the people that maintain your repo, you should not be using that repo. Note that Herd enforces immutability.

@FroMage
Member

FroMage commented Feb 16, 2015

Note that even though you're shifting the knowledge of compatibility from users (Debian, RPM, Maven) to providers (with provides) we're still limited by two things:

  • humans: library writers are famous for not specifying anything about compatibility other than binary compat. The overwhelming majority of library authors ATM rely on third parties (packagers) to specify how they interact. Perhaps we can teach module authors to care, though.
  • version ranges: provides "1.0", "1.1", "1.2", "1.2.1", ... gets tiring really fast. We're still better off with provides (1.0 < 1.3).

Otherwise, in Linux, provides is used for something else, which we're also missing: implementation of specs. For example, I think hibernate and toplink each provide jpa 2, and we should really be able to do that, because then others can import jpa 2 (hibernate 3) and get jpa 2 from whatever other provider is already imported specifically (possibly toplink), or hibernate if nothing is explicitly specified.

In Linux, replaces is used for something else as well: replacing older versions of oneself. For example, org.hibernate:hibernate-core has a replaces org.hibernate:hibernate. Or pidgin has a replaces gaim, because they changed name, and since the two offer equal functionality, both versions should not be installed. I suspect there's a use for that too in our module system.
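These two Linux-style uses could be sketched with hypothetical descriptors along these lines (module names and versions invented; note this stretches provides/replaces across module names, not just versions of a single module):

```ceylon
// 1. Implementation of a spec: two competing implementations
//    each claim to provide the API-only module.
module example.hibernate "3" {
    provides example.jpa "2";
}
module example.toplink "2.5" {
    provides example.jpa "2";
}

// 2. Replacing an older self under a different name.
module example.pidgin "2.0" {
    replaces example.gaim "1.5";
}
```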

@quintesse
Member

I'm not sure about "replaces". IMO that's meant for a dynamic system where at one point you might have "gaim" installed, and when "pidgin" came along the system gets told that it needs to remove "gaim" if you want to install "pidgin", because having both not only doesn't make sense, they might actually cause conflicts (they might have services running, for example).

This makes sense in that dynamic system I mentioned where new packages are introduced each day and you want to keep things up-to-date.

I see an application as something much more static. You install it, exactly as the developer intended it, and it works.

In the case that you are a developer creating an application it's your job to find all the "magic incantations" (overrides) that make things work. I think the more we try to do things automatically the deeper we'll sink into the swamp that is malformed/erroneous metadata.

@luolong
Member

luolong commented Feb 16, 2015

I see an application as something much more static. You install it, exactly as the developer intended it, and it works.

Not necessarily. I have in my previous life used systems (OSGi) where you installed it in one configuration and then gradually upgraded it with new versions of bundles as the project evolved. All that with virtually no downtime...

Admittedly though, the way I understand it, hotswapping modules in the runtime is a non-goal...

@luolong
Member

luolong commented Feb 16, 2015

version ranges: provides "1.0", "1.1", "1.2", "1.2.1", ... gets tiring really fast. We're still better off with provides (1.0 < 1.3).

Not really.

The way I read into the spec, provides is meant to be transitive.
Meaning, if I have a module M version "1.0", and a module M version "1.1" that provides "1.0",
and I then create another module M version "1.2" that provides "1.1", then it transitively also provides "1.0".

This can also be effectively used for the case you mentioned for jpa, where a generic jpa version "2" module defines just the API, and both the hibernate and toplink modules can provide jpa "2".

@luolong luolong closed this as completed Feb 16, 2015
@luolong luolong reopened this Feb 16, 2015
@luolong
Member

luolong commented Feb 16, 2015

Accidentally closed the issue. Reopening...

@FroMage
Member

FroMage commented Feb 17, 2015

Transitivity is bad: it requires loading of N module descriptors.

@LasseBlaauwbroek
Author

As said earlier, the provides list can be mechanically expanded at publishing time into a full list.

@luolong
Member

luolong commented Feb 18, 2015

Transitivity is bad: it requires loading of N module descriptors.

N being the depth of the actual transitive dependencies in each particular case.

To alleviate this on the side of the Ceylon module runtime, the repository format may need to provide metadata that lets the module runtime easily query those transitive relationships.

As said earlier, the provides list can be mechanically expanded during publishing time to a full list.

And no, I do not support any repository features that rewrite module metadata...

@LasseBlaauwbroek
Author

And no, I do not support any repository features that rewrite module metadata...

Just to be clear: I, of course, don't mean to literally change the source code of the module by expanding the list inline. What I mean is to add machine-generated information to the compiled module during publishing time. This seems to be the same thing you are suggesting.

Note that this expansion is a strict requirement because not every repository will have access to all module versions all the time (the local user repo for example), and will therefore not be able to resolve the modules correctly without an expanded provides list.
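The expansion described here might look as follows (hypothetical syntax; the expanded list would live in the published artifact's metadata, not in the author's source):

```ceylon
// As written by the author:
module org.example.lib "1.3" {
    provides "1.2";  // "1.2" in turn provides "1.1", which provides "1.0"
}

// As recorded in the compiled artifact at publishing time, so that a
// repository holding only "1.3" can still resolve correctly:
//     provides = { "1.2", "1.1", "1.0" }
```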
