Package registration process #849

Closed
StefanKarpinski opened this issue Oct 22, 2018 · 35 comments

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Oct 22, 2018

Before we can move from METADATA being the source of truth for registered packages to JuliaRegistries/General being the source of truth, we need a process and infrastructure for registering packages. Here's an outline of some of my thoughts on what the process should look like.

1. Request

Who: package maintainer

Package maintainer proposes a new version tag

  • perhaps via an API endpoint?

Possible data to include:

  • uuid
  • repo url
  • branch
  • tree hash
  • commit hash
  • version number

Or maybe the request just specifies one of patch, minor or major, and the version number is derived automatically from the existing version numbers. This is almost certainly too much data but I wanted to mention all the possible info one might want. Ideally we'd like to automate as much of this as possible from the repo.
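To make the "patch, minor or major" idea concrete, here is a minimal sketch of deriving the next version number from the versions already registered. The function names are hypothetical, and starting from 0.0.0 when nothing is registered yet is an assumption, not something decided in this issue.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    """Parse "1.2.3" or "v1.2.3" into a (major, minor, patch) tuple."""
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return (major, minor, patch)


def next_version(existing: Iterable[str], bump: str) -> str:
    """Return the version that follows the highest existing one.

    Assumption: with no registered versions we bump from 0.0.0.
    """
    versions = [parse_version(v) for v in existing]
    major, minor, patch = max(versions, default=(0, 0, 0))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    if bump == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump kind: {bump!r}")
```

With this, a maintainer would only need to say "minor" and the request could fill in the version number itself.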

2. Check

Who: automated

Automated validation of whether a proposed version tag is acceptable.

Check that:

  • version number is sensible (next patch, minor or major)
  • install the package version by itself
  • run the tests for the installed package
  • test variations on dependencies
  • check if compat claims are plausible
  • reverse dependency testing
    • check semver compliance?

Produces report of results
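As a sketch of the first check ("version number is sensible"), one possible rule is that a proposed version must be the next patch, minor or major of some already-registered version and must not be registered yet. Allowing successors of older versions (for backport releases) is an assumption here, and the function names are hypothetical.

```python
from typing import Iterable, Set, Tuple

Version = Tuple[int, int, int]


def parse_version(text: str) -> Version:
    major, minor, patch = (int(part) for part in text.lstrip("v").split("."))
    return (major, minor, patch)


def allowed_next_versions(existing: Iterable[str]) -> Set[Version]:
    """All versions that are a direct successor of a registered version."""
    registered = {parse_version(v) for v in existing}
    if not registered:
        # Assumption: a brand-new package may start at any first release.
        return {(0, 0, 1), (0, 1, 0), (1, 0, 0)}
    candidates: Set[Version] = set()
    for major, minor, patch in registered:
        candidates.add((major, minor, patch + 1))  # next patch
        candidates.add((major, minor + 1, 0))      # next minor
        candidates.add((major + 1, 0, 0))          # next major
    return candidates - registered


def is_sensible_version(existing: Iterable[str], proposed: str) -> bool:
    return parse_version(proposed) in allowed_next_versions(existing)
```

A skipped version such as 1.3.0 on top of 1.1.0 would be flagged in the report rather than silently accepted.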

3. Review

Who:

  • package maintainer (always)
  • registry manager (sometimes)

Maintainer review:

  • accept or reject based on report

If maintainer accepts, go to registry manager review:

  • if meets auto-approval criteria, it is fully approved without manual review
  • otherwise proposer can request manual review from a registry manager
    • abilities needed:
      • maintainer to give reasoning
      • manager to give feedback
      • manager to make a decision
    • probably good to take place on a PR somewhere

4. Tagging

Who:

  • ideally automated via GitHub API
  • otherwise manually done by package maintainer

Propagate git tag to package repo

  • should use signed tags, signed by some authorized entity

Can we use the github API to create tags automatically?

What would the workflow be for non-github packages?

  • even if it’s very manual, we should have one
  • have a git repo endpoint that we create tags at
    • then document how to pull tags from it?

The tagged tree also needs to be publicly accessible but I think tagging guarantees that in git. If the tag does not match the version approved in the review step above then the version is not properly tagged and the rest of the process is blocked until the tag is fixed.

5. Registration

Who: automated

Once a new version has been approved and tagged, it can finally be registered. This consists of making the appropriate updates to the registry repo. This will be completely automatic.

@simonbyrne
Contributor

Can we use the github API to create tags automatically?

No: unfortunately this requires full write access to the repo: https://developer.github.com/v3/apps/permissions/#permission-on-contents

As part of step 2 or 4, you will probably also want to ensure the Project.toml file is up-to-date: it is possible to ask for restricted access to just that file: https://developer.github.com/v3/apps/permissions/#permission-on-single-file

@KristofferC
Sponsor Member

I think the steps here make sense as a long-term goal to strive for but aiming for all of this as a start feels a bit ambitious. I think we should start off with something as similar as possible to the current workflow we have. That would be the following:

  1. Request

A git tag. That is all.
Based on that, the git commit can be found, the tree hash computed, and the version determined (by looking at the Project.toml file). The registration script that creates the commit in JuliaLang/PkgDev.jl#144 can then be run, the resulting registry checked for consistency, and after that a PR to the registry is created.

  2. Check

CIBot runs on the registry PR. The long-term goal is to move this to be report-based.

  3. Review

If CIBot + diff looks ok, merge. Otherwise, delete tag and redo the process.

My goal with this is to be able to remove METADATA as quickly as possible and with as little work as possible. I think it is fair to say that we are extremely starved for work when it comes to these things and we should have that in mind when deciding what to do next.

@kescobo
Contributor

kescobo commented Oct 22, 2018

I know it's early to start bikeshedding, but this sounds like Package version registration process, as opposed to Package registration. To me, the latter implies new packages, no? Or is the idea that there won't be much daylight between registering a new version and registering a new package? It seems like there are a couple of additional steps for a new package, though I suppose a lot of things (like checking for name conflicts etc) are less critical with the new system.

Anyway, I agree with @KristofferC w/r/t getting moving as quickly as possible except where moving too quickly blocks paths for future improvement. I continue to love Pkg3, thanks for all the hard (and thoughtful) work!

Edit: aaaaand I now see that your message on slack was to specific people, not a general call. Sorry for butting in 😳

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Oct 22, 2018

Edit: aaaaand I now see that your message on slack was to specific people, not a general call. Sorry for butting in 😳

No worries, I wanted feedback from anyone, but wanted to make sure specific people saw it.

I know it's early to start bikeshedding, but this sounds like Package version registration process, as opposed to Package registration. To me, the latter implies new packages, no? Or is the idea that there won't be much daylight between registering a new version and registering a new package? It seems like there are a couple of additional steps for a new package, though I suppose a lot of things (like checking for name conflicts etc) are less critical with the new system.

Registering versions of packages was what was intended. Registering a package is just registering the first version of it. We can have a rule like the first version always requires manual registry manager approval, which would give a chance to review the name and license and whatever.

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Oct 22, 2018

I think the steps here make sense as a long-term goal to strive for but aiming for all of this as a start feels a bit ambitious.

My goal with this is to be able to remove METADATA as quickly as possible and with as little work as possible. I think it is fair to say that we are extremely starved for work when it comes to these things and we should have that in mind when deciding what to do next.

Personally, once I can see a big picture that makes sense to me and I know what each piece needs to do and be, filling in the parts becomes easier. These parts don't need to be very complex or do very much initially. The key is to get the skeleton in place so that when we want to add features, there's a clear and well-defined place to do it. Getting the shape of the thing wrong for the sake of getting something out there a bit faster strikes me as a false economy.

  1. Request

A git tag. That is all.

We already know that tagging first has serious problems. People don't and won't delete tags, and git intentionally makes tag deletion quite difficult. Saying "A git tag. That is all." also sweeps under the rug things you still need, such as the repo in question. A tag is just git's way of associating a symbolic version name with a commit. Why is making an API request with a repo and a tag name simpler than sending an API request with a repo, a version number and a tree hash? They're both equally easy for a bot to deal with; the version number + tree hash request can be retried easily if it fails, without screwing around with deleting and retagging in git (which you're not really supposed to do: once a tag is public it's supposed to be written in stone, which is why git makes deleting them so annoying and difficult).

  2. Check

CIBot runs on the registry PR. The long-term goal is to move this to be report-based.

All that a "report" means is "the output of whatever check processes we run". Initially it can just be the output of CIBot. However, in the future when we want to check more things, this is where we put those checks and the report includes the output of whatever verification processes we run.

Perhaps you've interpreted my outline as a document of all the things we need to do in the first version? That's not the intention: it's an outline of how all the things we eventually want to do fit into a coherent process. The initial version will be a minimal subset of this outline.

  3. Review

If CIBot + diff looks ok, merge. Otherwise, delete tag and redo the process.

This can be in the form of a PR, the merging of which triggers the rest of the process. The key point is that it should require package maintainer approval and mostly not require any manual intervention by a registry manager.

The main simplification of your three-step proposal is by not putting tagging between review and registration. This allows the review to be done on a registration PR, which is nice and is how we do it today. That does seem like a good simplification and collapses the review and register steps into one step since review approval is indicated by merging the PR which is the registration step.

I still think it's better to put tagging at the end once we know that a version is valid, correct and registered. I originally had it after registration but then started thinking "but what if the package maintainer doesn't do the tagging correctly?" But that's probably a silly worry. We can give package maintainers the option of either giving the bot write access and letting it create the tag or there can be a manual alternative if they don't want to give write access. If they don't do it right, it can be fixed. Honestly, it hardly even matters if tags get created at all—they really only exist for convenience when working with git tools in a cloned package repo.

@vtjnash
Sponsor Member

vtjnash commented Oct 22, 2018

Thinking further out, I think we might want to further separate this into official "channels". For example, common channels might include:

  • test / bleeding edge
  • LTS
  • community

Where the first two imply a different rate of updates, and strictness of criteria—and corresponding reliability differences—and the third largely bypasses step 3 (human review).

@StefanKarpinski
Sponsor Member Author

I'm not really sure how that fits into this registration process proposal?

@vtjnash
Sponsor Member

vtjnash commented Oct 22, 2018

A tag is just git's way of associating a symbolic version name with a commit. Why is making an API request with a repo and a tag name simpler than sending an API request with a repo, a version number and a tree hash?

I think of the tag as the canonical record of the existence of the version, as it is under the direct control of the project. The registration process is simply an external reference to that existing fact (and not the other way around). Similar to how the canonical copy of other artifacts is project-local (url, manifest, dependency list, uuid), but can then also get duplicated across an arbitrary set of other registries, webpages, applications, git repos, etc.

@vtjnash
Sponsor Member

vtjnash commented Oct 22, 2018

I'm not really sure how that fits into this registration process proposal?

By "thinking further out", I meant to imply this was a list of non-goals for the current implementation (which is only intended to cover the primary use case—directly replacing the functionality of METADATA.jl—and not yet providing new features).

@StefanKarpinski
Sponsor Member Author

I think of the tag as the canonical record of the existence of the version, as it is under the direct control of the project. The registration process is simply an external reference to that existing fact (and not the other way around).

Yes, and this is why tagging has to happen after version validation, not before.

@simonbyrne
Contributor

Yes, and this is why tagging has to happen after version validation, not before.

I agree, unfortunately it is kind of hard to fit that into the GitHub model (they really need a "Release Request" workflow).

@00vareladavid
Contributor

FWIW I tried to outline a protocol independently and I came up with essentially the same thing:

  • REQUEST: Open a PR at a registry
  • CHECK: Registry runs consistency checks (can be minimal at first) (In an ideal world can even test semver against other packages which depend on the proposed change)
  • REVIEW + TAG: Checking that a tag exists could be the final part of the review. So a PR having all green lights except for tagging signals that the registry is willing to merge. The maintainer can approve this by tagging: at which point the registry gets all green lights and merges.
  • REGISTER: Merge the registry PR

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Oct 22, 2018

I think we can have three basic components for this whole thing:

Registration bot

  • triggered by a request to tag a new version
  • immediately creates a PR to register the new version
  • runs CIBot and whatever else we want to verify
  • posts a report of the results into the registration PR
  • pings the package maintainer to review
  • approving PR review indicates approval
  • [requires no special permissions]

Merge bot

  • merges registration PRs that have:
    • approval by the package maintainer and
    • has passed all automated checks, or
    • has approval from a registry manager
  • [requires merge permissions to registry repo]

Tag bot

  • tags versions in package repos when new versions are registered
  • [requires commit access to package repos]

The system is already usable with just the registration bot. The merge bot removes the need for registry managers to manually merge all registration PRs: ones that pass checks and are approved by the package maintainer are merged automatically. The tag bot is only necessary to eliminate the need for package maintainers to manually tag versions.
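The merge bot's rule can be written down as a small predicate. This is only a sketch: the `RegistrationPR` type is hypothetical, and it reads the criteria above as "maintainer approval AND (all automated checks passed OR registry manager approval)", which is one interpretation of the bullet list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegistrationPR:
    """Hypothetical summary of the state of one registration PR."""
    maintainer_approved: bool
    checks_passed: bool
    manager_approved: bool


def should_merge(pr: RegistrationPR) -> bool:
    # Assumed reading: the maintainer must always approve; after that,
    # either green checks or a registry manager's approval is enough.
    return pr.maintainer_approved and (pr.checks_passed or pr.manager_approved)
```

Under this reading a registry manager can override failing checks, but nobody can register a version the maintainer has not approved.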

@c42f
Member

c42f commented Oct 23, 2018

From a maintainer point of view, I think the current AttoBot based workflow is pretty good, the only real place it falls down is having to rewrite git tags. This is definitely an abuse of git and can easily (and by design) result in different users having the same tag pointing to different commits.

However, I'd like to point out that deleting or writing new tags with a unique name is not such a big deal. We could drive the workflow off a simple release candidate tag naming convention as follows:

  1. Maintainer writes a tag for a release candidate, with naming convention something like v1.0.0-rc1.
  2. AttoBot or equivalent picks up release candidate tag, and performs automated checks.
  3. Maintainer reviews the results of automated checks on a PR.
    • If broken, maintainer increments rc1 to rc2 and goes back to step 1.
    • If good, go to step 4
  4. Maintainer pushes a final tag for the version, eg v1.0.0
  5. Bot picks up version tag v1.0.0, checks that appropriate release candidate tag exists, points to the same commit, and that checks have passed for it, and updates the registry with the new version (by merging the associated PR?)

The benefits of this are that it is all "native git", and I believe uses tags as they are meant to be used, unlike our current workflow. It also allows maintainers to have complete control over their own tags which is great for people like myself who like to write release notes in the tag annotation.

@tpapp
Contributor

tpapp commented Oct 23, 2018

Would the "bots" be Github-specific? Can the process be later generalized to other (public) repository hosting services, like Gitlab or Bitbucket?

@c42f
Member

c42f commented Oct 23, 2018

There's no inherent reason the release automation bot would be github specific. For example all of bitbucket, gitlab and gitea have webhooks for git pushes so integration with those systems would presumably be a "simple matter of work".

With a purely tag-based workflow, presumably even a plain git server could be integrated with the release bot using a few standard git hooks POSTing to the bot API.

@StefanKarpinski
Sponsor Member Author

The rc-tag based workflow had occurred to me as well but it seems like potentially a lot of unnecessary tags littering the repo. Note also that this workflow requires the bot to maintain state whereas the other workflow does not. It's worth considering but it's a pretty large departure from the PR-based workflow that we have already been using and which is discussed above. Note that if you want to write tag notes, then you can still do so in a PR-based workflow by tagging manually.

@c42f
Member

c42f commented Oct 23, 2018

Well, the rc-related tags could be deleted after release by maintainers who don't like them. It's ok to delete tags from a repository, it's just not ok to delete them and replace them by another tag with the same name.

I'm not sure where this requires the bot to maintain state? Let me restate the order of events from the bot's point of view:

  • Bot receives a POST (likely from a webhook) saying that user has created a tag
  • Bot pattern matches it and determines it's a version number, either $version-rc$N or $version.
  • Bot finds PR for $package-$version in the list of registry PRs if it exists
    • If new tag is an rc, create or update the PR
    • If new tag is a bare version and the -rc PR doesn't exist or checks have failed, fail and report an error in a visible way
    • If new tag is a bare version, the PR exists and checks have passed, update registry by auto merging the PR.

Seems to me like the PR on the registry can still be used to hold the state. The main difference here is that we'd be driving the process via immutable tags, rather than mutable github releases. I think?
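A rough sketch of that event handling, with the registry PR holding the only state. The tag pattern (`v1.0.0` / `v1.0.0-rc1`) follows the naming convention suggested earlier; the action names returned here are hypothetical placeholders for whatever the bot would actually do.

```python
import re
from typing import Optional, Tuple

# vMAJOR.MINOR.PATCH with an optional -rcN suffix.
TAG_PATTERN = re.compile(r"^v(\d+\.\d+\.\d+)(?:-rc(\d+))?$")


def parse_tag(tag: str) -> Optional[Tuple[str, Optional[int]]]:
    """Return (version, rc_number or None), or None if not a version tag."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        return None
    version, rc = match.group(1), match.group(2)
    return version, (int(rc) if rc is not None else None)


def decide_action(tag: str, pr_exists: bool, checks_passed: bool) -> str:
    """Map a pushed tag plus the registry PR's state to a bot action."""
    parsed = parse_tag(tag)
    if parsed is None:
        return "ignore"            # not a version tag at all
    _, rc = parsed
    if rc is not None:             # release candidate: open or refresh the PR
        return "update_pr" if pr_exists else "create_pr"
    if pr_exists and checks_passed:
        return "merge_pr"          # bare version tag and everything is green
    return "report_error"          # no PR yet, or checks have not passed
```

Because `decide_action` only needs the tag name and what can be read off the PR, the bot itself stays stateless between webhook calls.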

@tkf
Member

tkf commented Oct 26, 2018

Why not use a release branch instead of the release tags? Git branches are much more fluid than tags so I think it fits well with the release workflow.

This way, the registration bot can watch branches (say) release/v* in github.com/me/MyPackage.jl and initiate a PR to the registry if a new branch is created. If there is a problem during the check or review process, the package author can simply do git push [-f] after the fix. Once the registration bot detects a push to the branch, it updates the registration PR and the check and review processes are restarted. Having a "post-registration bot" that sends a PR to github.com/me/MyPackage.jl for merging release/v1.2.3 back to master after registering MyPackage.jl v1.2.3 would be nice.

@c42f
Member

c42f commented Oct 26, 2018

That's a pretty good idea, it would be super easy to use as a maintainer and also addresses Stefan's worry about having a lot of useless tags lying around.

So this would address the pre-release workflow very nicely. What about finalizing the release and creating the tag? Some ideas:

  • Finalize the release (automerge the PR) when the maintainer tags the head of the version branch with the matching tag name, and all the automated tests have passed.
  • Finalize the release (automerge the PR) when an associated github release is made. (Problem: how does the repo get the appropriate tag? As discussed above a tagging bot would need full push access.)

[edited for clarity]

@tkf
Member

tkf commented Oct 26, 2018

It probably is a dumb question, but why would you need git tag or github release? Since the registry is the source of truth, git tag and github release are not required for Pkg.jl to work, right?

It certainly is nice to have git tags to be a well-behaving git repository. But wouldn't it make sense to exclude it from the minimal requirement for the registration process, if it's not required?

Even if it's excluded from the registration process, I think we can have something like (say) PkgDev.synctags() to synchronize tags of the packages you own with ~/.julia/registries/General/. (You can even put julia -e "using PkgDev; PkgDev.synctags()" in ~/.julia/registries/General/.git/hooks/post-merge so that sync is more or less automated. edit: actually it looks like libgit2 does not run git hooks)
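To illustrate what such a tag sync would have to compute (this is not an existing PkgDev function), here is a sketch that plans which tags to create. It assumes the registry maps each version to a tree hash, that we can list each commit's tree hash in the package repo, and that the first commit found with a matching tree is an acceptable tag target; all names are hypothetical.

```python
from typing import Dict, Set


def plan_missing_tags(
    registered: Dict[str, str],    # version -> tree hash, from the registry
    commit_trees: Dict[str, str],  # commit hash -> tree hash, from the repo
    existing_tags: Set[str],       # tag names already present in the repo
) -> Dict[str, str]:
    """Return {tag name: commit hash} for versions that lack a tag."""
    tree_to_commit: Dict[str, str] = {}
    for commit, tree in commit_trees.items():
        # Several commits can share a tree; keep the first one seen.
        tree_to_commit.setdefault(tree, commit)

    plan: Dict[str, str] = {}
    for version, tree in registered.items():
        tag = f"v{version}"
        if tag in existing_tags:
            continue
        commit = tree_to_commit.get(tree)
        if commit is not None:     # skip versions whose tree is not in this clone
            plan[tag] = commit
    return plan
```

The actual tag creation (and any signing) would then be a separate step driven by the returned plan.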

@simonbyrne
Contributor

The lack of tags could be an issue if we want to support downloading tarballs from arbitrary git servers: by default they restrict the downloading of archives that aren't pointed to by a ref (i.e. a current branch or tag), e.g. https://git-scm.com/docs/git-upload-archive#_security

This isn't an issue with GitHub, as it doesn't even support git archive, but instead uses its own API, which doesn't seem restricted in this way.

@c42f
Member

c42f commented Oct 26, 2018

why would you need git tag or github release

You don't, strictly speaking. But you need some way for the package maintainer to signal that they've reviewed the release. And it would be extremely useful if the registration process ensured that versions were reflected natively and reliably in git tags across the ecosystem.

@tkf
Member

tkf commented Oct 26, 2018

@simonbyrne Thanks. I didn't know about git-upload-archive. In a way, I think it's more like an optimization, as I suppose you already have a libgit2-based function to download the source tree. Also, from this view, another "optimization" (= get_archive_url_for_version you linked) already works for github which covers most of the cases. But I guess it's nice to support git archive to use it for other git servers...

@c42f Yes, I totally agree that having git tags is a good practice. I'm just suggesting to exclude it from the hard requirements since the permission requirement in github complicates the implementation of the process. Regarding the "way for the package maintainer to signal that they've reviewed the release", I think by requesting the registration the intent is pretty clear (= "please register it if there is no problem"). Is there a reason to have human intervention between the approval (review) and merging the registration PR? The package maintainer has to fix the problem after a rejection, but why not just go ahead and merge the PR if the registry manager (or a bot) says yes?

@c42f
Member

c42f commented Oct 27, 2018

why not just go ahead and merge the PR if the registry manager (or a bot) says yes

Yes, I wondered about that. It would be nice if the registration process could be completely fire-and-forget in the case that all tests pass, given sufficiently good tests.

Having the tagging bot do the work could be optional so if you want a fire-and-forget process, you can give the tagging bot permission to do the tag for you once the tests pass. If you don't want to give it permission just write the tag yourself. If you have your own infrastructure you can run your own tagging bot. etc.

@c42f
Member

c42f commented Oct 27, 2018

It would be nice if the registration process could be completely fire-and-forget

On the other hand, the release bot can provide valuable insight in its report which is not easily available on the developer's local machine when they create the release candidate branch/tag. (eg, interactions with other packages in the ecosystem).

@tkf
Member

tkf commented Oct 27, 2018

the release bot can provide valuable insight in its report

I had the impression that CIBot is limited by its computing resource at the moment (e.g., by reading https://discourse.julialang.org/t/the-current-metadata-release-process/16672/15). I thought you'd want to avoid people casually invoking CIBot for, e.g., testing an alpha version. (Though it would be great if that's OK.)

Thinking about the tag-invoked release model, it's probably not so hard to use if the registration bot directly responds to the push of a tag of the right format. Then re-registration after check/review failure can be done by just a few git commands (instead of manual re-release in github UI). Then creating PkgDev.reregister() or something to automate such git operations is not so hard.

@StefanKarpinski
Sponsor Member Author

StefanKarpinski commented Nov 13, 2018

I also started implementing the core process, which I'm calling Registrator.jl. Here are some design notes I've made on this so far:


Server is at the center

  • public: register.julialang.org
  • private: register.juliateam.com
  • client API talks to server
    • server farms out work to other machines
  • server talks to GitHub/GitLab/...
    • avoids client needing to be able to talk to all services
    • we control authentication, avoiding many possible issues

HTTP request: PkgDev client ==> register server

  • registry URL
  • package name
  • package UUID
  • version number
  • tree hash

register server:

  1. clone/update registry
  2. create branch “register/$pkg/v$ver”
  3. change the relevant files
  4. commit the changes
  5. push branch
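As a sketch of the request data and branch naming listed above: the field names, the JSON encoding and the class itself are assumptions for illustration, not the actual Registrator.jl wire format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RegistrationRequest:
    """Hypothetical payload sent from the PkgDev client to the register server."""
    registry_url: str
    package_name: str
    package_uuid: str
    version: str
    tree_hash: str

    def branch_name(self) -> str:
        # Mirrors the "register/$pkg/v$ver" convention from the notes.
        return f"register/{self.package_name}/v{self.version}"

    def to_json(self) -> str:
        # Assumed encoding; sorted keys keep the body deterministic.
        return json.dumps(asdict(self), sort_keys=True)
```

Since the request carries a version number and a tree hash rather than a tag, a failed registration can simply be re-sent with corrected values.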

register server ==> GitHub/GitLab API

  1. find registration PR
  2. if it doesn’t exist
    • create registration PR with info as text
  3. if it does exist
    • post info as new PR comment

HTTP reply: register server ==> PkgDev client

  • pull request link, client prints it

register server: queue up registration checks

register server ==> GitHub/GitLab API

  • create check run on PR (queued)

register server ==> CI server (request checks)

CI server: run checks

CI server ==> register server (check response)

register server ==> GitHub/GitLab API

  • update check run on PR (success/failure/timed out)

register server ==> GitHub/GitLab API

  • post comment on PR
    • instructing package maintainer how to tag the release
  • PkgDev client can have easy command for this
    • can get relevant info from the PR

Process that pulls tags from each repo with an open PR

  • when a tag appears
    • check that it matches the right tree
    • if it does, merge the registration PR

To find out what repos to pull from:

  • list open registry PRs
  • get a repo for each one
    • extract from PR text
    • (or maybe commit text?)
    • (or maybe from registry metadata?)
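The tag check in that polling process could look roughly like this. It is a sketch only: `tag_trees` stands in for whatever the server learns by fetching tags from the package repo, and the "mismatch" outcome (leave the PR open until the tag is fixed) is an assumption based on the tagging step in the original outline.

```python
from typing import Dict


def check_tag(version: str, expected_tree: str, tag_trees: Dict[str, str]) -> str:
    """Decide what to do with an open registration PR.

    tag_trees maps tag names found in the package repo to the tree hash
    each tag resolves to.
    """
    actual_tree = tag_trees.get(f"v{version}")
    if actual_tree is None:
        return "wait"      # tag has not appeared yet; poll again later
    if actual_tree == expected_tree:
        return "merge"     # tag matches the reviewed tree: merge the PR
    return "mismatch"      # tag points at a different tree: do not merge
```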

Authentication

  • Create GitHub OAuth token

Maybe kick off with CheckRunEvent

Or maybe kick off with PR that changes version number in project file?

  • can drive the entire process from the package repo
  • linked to registry PR but no need to go there

@StefanKarpinski
Sponsor Member Author

@vchuravy has asked

What happens if you have several packages in the same GitHub repository?

Which is a good question and one we should keep in mind.

@StefanKarpinski
Sponsor Member Author

Possible process for triggering Registrator, from discussion on triage:

  • listen to all PRs on registered package repos
  • if a PR changes a version number in a project file, it triggers registrator
    • the tree is the directory containing the project file
    • the name, uuid and version are taken from the project file
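A sketch of the trigger condition: compare the project file before and after a PR and fire only when the version changed. The helper below is a deliberately simplified reader that only looks at top-level `key = "value"` lines before the first table header; it is not a full TOML parser, and the function names are hypothetical.

```python
from typing import Dict, Optional


def top_level_fields(project_toml: str) -> Dict[str, str]:
    """Collect top-level key/value pairs that appear before any [table]."""
    fields: Dict[str, str] = {}
    for raw_line in project_toml.splitlines():
        line = raw_line.strip()
        if line.startswith("["):
            break  # the first table header ends the top-level section
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        fields[key.strip()] = value.strip().strip('"')
    return fields


def registration_trigger(old_toml: str, new_toml: str) -> Optional[Dict[str, str]]:
    """Return name/uuid/version if the PR changed the version, else None."""
    before = top_level_fields(old_toml)
    after = top_level_fields(new_toml)
    new_version = after.get("version")
    if new_version is None or before.get("version") == new_version:
        return None
    return {key: after[key] for key in ("name", "uuid", "version") if key in after}
```

Running this per project file also covers the case of several packages in one repository, since each file identifies its own directory as the tree to register.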

@timholy
Sponsor Member

timholy commented Feb 3, 2019

Unless I'm doing it the hard way, currently the process of tagging a new release for one's own private registries is a bit painful and error-prone (e.g., I forget to update the version number in the Deps.toml, or miss an entry in the Compat.toml file, and suddenly there are weird breakages...).

I'm guessing there's some script somewhere for mapping new METADATA.jl tags to changes to the General registry. Is it possible to get one's hands on that?

@StefanKarpinski
Sponsor Member Author

The script that runs in a loop and converts METADATA to General is here:

https://github.com/JuliaLang/Pkg.jl/blob/master/bin/loop.sh

It calls lots of stuff in the bin directory, so it's a bit complex. If you let me know what you need I may be able to facilitate something since I'm probably the person most familiar with all the sync machinery.

@iamed2
Contributor

iamed2 commented Feb 4, 2019

At Invenia we do all operations on our METADATA fork and convert to registry format every time we sync with upstream METADATA or perform any operations on our fork. All the steps on the registry are done automatically through GitLab CI pipelines and @StefanKarpinski's code mixed with scripts that @ararslan wrote.

tpapp added a commit to tpapp/skeleton.jl that referenced this issue Feb 9, 2019
@StefanKarpinski
Sponsor Member Author

WIP implementation of a registration server: https://github.com/JuliaComputing/Registrator.jl

@fredrikekre
Member

Seems like this can be closed now?

janlisse added a commit to JuliaEnergy/PowerDynamics.jl that referenced this issue Aug 29, 2019