Proposal: Release artifact build and import process #957
Comments
Good stuff. Here are my comments after my first review:
I'm in strong opposition to introducing yet another file format and dependency. The Ansible core team has standardized on YAML and JSON and is transitioning away from INI. (YAML is now the preferred format for configuration and inventory.) There are also too many different file formats (JSON this, YAML that, TOML over here) in this proposal already. I can only see it complicating things with no apparent value to the user or function of this system.
This file needs an extension and is too generic a filename. Also, there are too many forms of "meta" flying around galaxy/mazer, and it is getting confusing. Can we just call it what it is -- a manifest?
I agree with this EXCEPT it should be pushed to Galaxy rather than GitHub. Galaxy can talk to GitHub or whatever other backend storage mechanisms we decide to support. That needs to be transparent to the user and administered by Galaxy as needed. Also, should we even help publish artifacts before they've been verified as something Galaxy can import?
Is this really necessary if the push goes through Galaxy? Galaxy can inspect it and collect all of that info and then push the artifact to the GitHub backend or whatever.
I'd like to see more in this proposal on how "subsequent updates" will be handled and resolved, since Galaxy will be entirely dependent on an external system that does not have the same constraints. For example, I ask for
Tend to agree about file format proliferation in general. Though for the manifest, it could be useful to create it in a format with a canonical representation. That could make it simpler to externally verify the validity of a manifest. If a tool could reproduce the manifest from the artifact contents bit-perfectly, that could make the manifest more powerful. Hard to do with yaml/json/toml though.
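A minimal sketch of the canonical-representation idea, assuming the manifest is JSON and that sorted keys plus fixed separators are an acceptable canonical form (those choices are illustrative, not something agreed in this thread):

```python
import hashlib
import json


def canonical_manifest_bytes(manifest: dict) -> bytes:
    # Sorted keys and fixed separators yield one byte-exact encoding for a
    # given manifest dict, so two tools that agree on the rules should
    # produce identical output.
    return json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")


def manifest_digest(manifest: dict) -> str:
    # A verifier could rebuild the manifest from the artifact contents and
    # compare digests instead of diffing files field by field.
    return hashlib.sha256(canonical_manifest_bytes(manifest)).hexdigest()
```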
Is this something supported by the GitHub auth scheme? The plus would be that in theory galaxy could create a slightly stronger link between the "source" and the built "release" (at least if you trust galaxy). One downside is that then galaxy becomes a single point of failure. If galaxy is compromised, then potentially all published releases could be compromised.
Overall proposal sounds good to me.
One related issue that has had some discussion is how mazer/ansible can use galaxy content under development. Essentially, whether ansible can use mazer style content directly out of a git working tree.
File types and names

I think we're all in agreement on the format of the repository level metadata and manifest files. We'll stick with YAML and JSON: YAML for files that humans edit/maintain, JSON for anything generated. Agree on naming of the generated manifest file too -- calling it a manifest works for me.

Pushing to GitHub vs Galaxy
I like this idea, pushing to Galaxy, rather than GitHub. Travis CI takes this approach, kind of, if you look at their docs on GitHub release pushing. Travis requires the user to provide an OAuth token that has a scope of either

Different from Travis, Galaxy could actually inspect the artifact, and perform static analysis on it, prior to pushing to GitHub. We just need to be very clear on what our criteria are for analyzing/testing an artifact. We may even want to give the user the ability via Mazer to analyze and test the artifact prior to handing it off to Galaxy. Once the artifact passes through Galaxy's static analysis/testing, the Galaxy server can use the OAuth token to push it to GitHub.

I think the pushing process also needs to live in Mazer so that it can be run within the context of the CI session (Travis or otherwise), and the user can see immediate status feedback. It's important that the user knows whether or not the push succeeded, so we can't simply make a web hook API call to the Galaxy server and hope for the best.
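As a rough illustration of the kind of pre-push inspection mentioned above, here is a sketch of a basic manifest sanity check on an artifact tarball; the manifest filename and the required fields are assumptions, not an agreed list of checks:

```python
import json
import tarfile

# Assumed minimum set of fields; the real criteria are still to be defined.
REQUIRED_FIELDS = {"namespace", "name", "version", "license"}


def inspect_artifact(artifact_path, manifest_name="METADATA"):
    """Open a release tarball and run basic sanity checks on its manifest."""
    with tarfile.open(artifact_path, "r:gz") as archive:
        member = next((m for m in archive.getmembers()
                       if m.name.split("/")[-1] == manifest_name), None)
        if member is None:
            raise ValueError(f"artifact is missing its {manifest_name} file")
        manifest = json.load(archive.extractfile(member))
    missing = REQUIRED_FIELDS - manifest.keys()
    if missing:
        raise ValueError(f"manifest is missing required fields: {sorted(missing)}")
    return manifest
```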
It makes sense, we can use either YAML or JSON. It doesn't really matter. I proposed TOML in my example because it's user friendly and easy to write.
This workflow would have major drawbacks and disadvantages for the end user:
I have some concerns regarding the overall direction of this discussion. I mostly agree with @cutwater. But anyway I want to add my 2c.

Traditional way to organise repos

At the moment, it's normal to organise repositories in the most simple and reliable way. Here is the direction you are going in this discussion: you propose to build a repository on top of a distributed virtual file storage system, where you are not responsible for either consistency or data accessibility. You have limited ACLs for this storage and you may lose access at any time. You also need to consider these questions:
My proposal

Wrap all these things with python setuptools and distribute them as python packages. Use PyPI or your own repo or both.
The idea of using python packages and PyPI is a good one and has a lot of benefits. Some of the previous discussions have hit on some concerns though, some more valid than others. In no particular order:
I think the list of questions to consider is good. I'll see what I can answer in another comment.
Here's the full process we're thinking of, with examples of the tooling we would provide to make it fairly simple. Not sure this got stated at the outset, but here are some benefits of having a more formal release process:
For the purposes of what follows, we’ll assume the following:
The process consists of 3 steps: build the artifact, push the artifact to GitHub, and publish the artifact on Galaxy. The following describes these steps in detail:

Build the release artifact
Push the release artifact to GitHub

The artifact will be hosted on GitHub, and indexed in Galaxy. To make it easier to push the artifact to GitHub, Mazer will provide a command that interacts with the GitHub API, and performs the push. Other mechanisms for pushing an artifact to GitHub are available, and thus it's not strictly required that Mazer be used for this purpose. Adding support into Mazer is for convenience only. To perform the push to GitHub using Mazer, the user will run the command
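For reference, a rough sketch of the underlying GitHub Releases API interaction such a command would wrap (using the requests library; the tag handling, asset content type, and error handling are simplified assumptions, not Mazer's actual implementation):

```python
import os

import requests


def push_release_asset(owner, repo, tag, artifact_path, token):
    """Create a GitHub release for an existing tag and attach the artifact to it."""
    headers = {"Authorization": f"token {token}"}

    # Step 1: create the release for an existing git tag.
    release = requests.post(
        f"https://api.github.com/repos/{owner}/{repo}/releases",
        headers=headers,
        json={"tag_name": tag, "name": tag},
    )
    release.raise_for_status()
    release_id = release.json()["id"]

    # Step 2: upload the artifact as a release asset.
    with open(artifact_path, "rb") as f:
        asset = requests.post(
            f"https://uploads.github.com/repos/{owner}/{repo}/releases/{release_id}/assets",
            headers={**headers, "Content-Type": "application/gzip"},
            params={"name": os.path.basename(artifact_path)},
            data=f,
        )
    asset.raise_for_status()
    return asset.json()["browser_download_url"]
```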
Publish the release artifact on Galaxy

Once the archive is available on GitHub, the user will run the command

Mazer will do the following:
Galaxy server will do the following:
Galaxy server will store the following information:
As stated at the beginning, using the above information, Galaxy will:
This is no different than where we are today. Today GitHub hosts the content, Galaxy indexes it. It seems to work OK. It's not perfect, but it solves the problem of making it easy to find and download content. Where we want to get to is a consistent way of formatting and versioning content, and thus the desire to add a more formal process to creating releases. If our content is more consistent, and reliably versioned, then we can start building an

We could pivot to hosting the content on Galaxy. It's been discussed. There are infrastructure and cost challenges with making Galaxy the host. But regardless, we still need a process for turning content into a versioned release artifact that Galaxy can work with.
Maybe. Maybe not. GitHub seems to work for 90% or more of the community. There's been a small number of users that have asked for other public SCMs, but not enough to drive us to build support for them.
Nothing horrible has gone wrong to date, while keeping the content on GitHub. If we add support for additional SCMs, then yes, things could get more complex. Another reason to only support GitHub, or at least keep the number that we do support very small.
@alikins This is kind of off-topic for this discussion, but anyway I want to comment on your concerns regarding distribution with python packages.
Yes, but you say that
So this concern does not count 😉 And PyPI already has packages with foreign code, for example, JS extensions for Django. It's very convenient and cool (see PROS below)
This is mostly false because
Not true.
And people do this already, f.ex. https://pypi.org/project/ansible-alicloud-module-utils/ and https://pypi.org/project/django.js/

PROS.
And I'm not forcing you to publish everything to pip (which is cool), but you can use existing utilities without writing your own stuff. E.g. import pip and just override the backend url -> you automagically get search, install, update, requirements files and so on. Or you can even propose a patch to pip so it is also able to install ansible stuff. And this will be super cool!
@chouseknecht if it's just an improvement to the current release process then it looks sane, but if the author uses mazer already, then you may check the artifact before uploading
— you may execute an (optional?) pre-publish step:
And after
.. why not store it locally (at least for backup)? 🙂
@akaRem Thank you for participating in this discussion.
I like that idea, it allows us to run checks before a user publishes a package and doesn't require Galaxy to manage the user's repository. It'll kill two birds with one stone.
And I want to recommend that you make backups. You may not serve stored packages, but it's better not to be at the center of a scandal .. or at least it's better to be able to recover in a reasonable time.
Storing content for backup purposes only will definitely be less expensive than maintaining full featured infrastructure for content delivery. I think it's worth consideration.
@akaRem Thank you for taking the time to make such detailed and impassioned suggestions. We really do appreciate it. Using pip has been researched and considered a few times over the years, including by @alikins and myself. We've all come to the same conclusion that it is not a good fit for the Ansible and Red Hat community as a whole. That matter is settled. This proposal is about the merits of continuing with a pull based, SCM backed system for storing and distributing artifacts that better supports providing verifiable, versioned content in a way that is reliable and consistent with the Ansible way. You make some excellent points there that we need to take into consideration and work into whatever we come up with. It would be really helpful if we could focus the conversation there. Thanks for your continued feedback.
Thanks for writing up the proposal. I was discussing it with @bmbouter today and we had a few questions around how it will work.
Thanks. |
Good questions. Here are my thoughts...
Yes, it will coexist. Don't know for how long, but definitely at the start, and possibly for a good while.
I think what you're suggesting is, 'Why not use an RPM spec file, or something similar?' I think we're trying to hold true to the original concept of a role, which is that a role = an SCM repository of YAML files. However, instead of limiting a project or repository to one single role, we're allowing for it to contain multiple roles, and potentially Ansible modules and plugins. Here's an example of what we're contemplating. As an Ansible content developer, I should be able to create such a project, and consume its contents in an Ansible playbook directly, without having to un-package or install it. I think that's why using a packaging tool like RPM doesn't fit. Otherwise, I think you're right.
I think we specifically don't want to encourage pulling pieces of repos together. We may enable repository level dependencies, where you could, for example, have repository B dependent on repository A, and reasonably expect the installer to also install all of repository A when asked to install repository B. What we're trying to solve is a world where third party partners can deliver their collection of Ansible content in a package. One example might be the Microsoft Azure modules that currently live in ansible/ansible. We would like it to be possible, and easy, for Microsoft to ship that suite or collection of modules via Galaxy, rather than have them baked into the Ansible source. They should be able to ship a suite of modules, plugins, and roles together, in which case we think it makes sense to have it all versioned in unison.
We want the tooling to move independently and potentially faster than Ansible. We also want room to experiment with stuff, and not have it end up baked into a supported release of Ansible that we then have to live with for a number of years. Once something lands in an official Ansible release, it's hard to remove it.
Following up on the discussion with @daviddavis, @bmbouter, @alikins and @cutwater... We decided the following:
Just to be clear, we're not forcing contributors to use this process day one. Galaxy will continue to support the existing import process that relies only on GitHub repositories. This new process will be optional. Consider it the first phase in moving Galaxy toward hosting content. |
There is an ansible proposal for standardizing how role install requirements files work, notably establishing that install-time role requirements will live in a requirements file at meta/requirements.yml. The requirements file mentioned here is based on the requirements file as described at https://galaxy.ansible.com/docs/using/installing.html#installing-multiple-roles-from-a-file. For example, a role that now includes a meta/main.yml and some requirements YAML file somewhere in the role would standardize on meta/requirements.yml. Since the info in meta/requirements.yml is installation requirements, that info could also be included in galaxy collection artifacts. The install requirements for a set of roles could be populated into the collection's MANIFEST.json. For example, a collection like this:
{
"collection_info": {
"namespace": "some_namespace",
"name": "my_collection",
"version": "11.11.11",
"format_version": 0.0,
"author": "Cowboy King Buzzo Lightyear",
"license": "GPLv2"
},
"# Don't sweat the details of the requirements data,
# this is just a strawman example. It could end up
# being simpler or more complicated.": [],
"install_requirements": [{"requirement": {"name": "geerlingguy.nginx",
"version": "1.2.3"},
"needed_by": "roles/some_role_a"},
{"requirement": {"name": "testing.ansible-test-content",
"version": "1.2.3"},
"needed_by": "roles/some_role_b"}],
"format_version": 0.0,
"files": [
"# lots of file info here"
]
}

So ansible/proposals#57 is something that we likely want to support with mazer build, and eventually by 'mazer install' as a way for it to resolve collection requirements and deps.
'meta/requirements.yml' may also be something we want to support at the collection level. For example, a collection:
In that case, my_collection/meta/requirements.yml could be:

- some_namespace.my_collection_utils

and my_collection/roles/some_role_b_that_needs_module_from_collection_foo_bar/meta/requirements.yml:

- foo.bar

(or perhaps have a way to indicate whether an install requirement is a collection or a specific role, if we want to). Those would get combined into 'install_requires' in MANIFEST.json by 'mazer build'.
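A rough sketch of how a build step could collect those per-role and collection-level requirements into the manifest; the directory layout, entry shapes, and helper name are assumptions based on the examples above, not a description of mazer's actual code:

```python
import glob
import os

import yaml  # PyYAML


def collect_install_requirements(collection_dir):
    """Gather meta/requirements.yml entries from a collection and its roles."""
    requirements = []
    patterns = [
        os.path.join(collection_dir, "meta", "requirements.yml"),
        os.path.join(collection_dir, "roles", "*", "meta", "requirements.yml"),
    ]
    for pattern in patterns:
        for path in glob.glob(pattern):
            # "needed_by" is the role (or the collection root) that declared it.
            needed_by = os.path.relpath(os.path.dirname(os.path.dirname(path)), collection_dir)
            with open(path) as f:
                entries = yaml.safe_load(f) or []
            for entry in entries:
                # Entries may be plain strings ("geerlingguy.nginx") or dicts
                # with name/version, as in galaxy requirements files.
                if isinstance(entry, str):
                    entry = {"name": entry}
                requirements.append({"requirement": entry, "needed_by": needed_by})
    return requirements


# The build step could then merge this into the generated manifest, e.g.:
# manifest["install_requirements"] = collect_install_requirements("my_collection")
```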
Think we resolved this with the introduction of Collections. Closing. |
Background
There are two models of building content: “push” and “pull”. In a “push” model, the user builds an artifact (e.g., software package, content archive, container image, etc.) locally, and pushes it to a content server. In a “pull” model, the content server downloads or pulls the source code, and builds the artifact for the user. In both models, there are defined procedures, formats, metadata, and supporting tooling to aid in producing a release artifact.
Most popular content services use a “push” model, including: PyPI (Python packages), Crates.io (Rust packages), and npm (Node.js packages). For these services, the content creator transforms the source code into a package artifact, and takes on the responsibility of testing, building, and pushing the artifact to the content server.
In rare cases content services take on the process of building artifacts. Docker Hub is one such example, where a content creator is able to configure an automated build process. The build process is triggered by a notification from a source code hosting service (i.e., GitHub or Bitbucket), when new code is merged. In response to the notification, Docker Hub downloads the new code, and generates a new image.
Problem Description
The Galaxy import process works as a “pull” model that can be initiated manually via the Galaxy website, or triggered automatically via a webhook from the Travis CI platform. However, unlike other content services, Galaxy does not enforce an artifact format, does not provide a specification for artifact metadata, and does not provide tooling to aid in building a release artifact.
When it comes to versioning content, Galaxy relies on git tags stored in the source code hosting service (GitHub). These tags point to a specific commit within the source code history. Each tag represents a point in time within the source code lifecycle, and is only useful within the context of a git repository. Removing the source code from the repository and placing it in an artifact causes the git tags to be lost, and with it any notion of the content version.
Galaxy provides no concept of repository level metadata, where information such as a version number, name and namespace might be located and associated with a release artifact. Metadata is currently only defined at the content level. For example, Ansible roles contain metadata stored in a meta/main.yml file, and modules contain metadata within their source code. Combine multiple content items and types into a single release artifact, and the metadata becomes ambiguous.

The Galaxy import process does not look for a release artifact, but instead clones the GitHub repository, and inspects the local clone. This means that any notion of content version it discovers and records comes directly from git tags. It’s not able to detect when a previously recorded version of the content has been altered, nor is it able to help an end user verify that the content being downloaded is the expected content. It’s also not able to inspect and test release artifacts, and therefore can offer no assurances to the end user of the content.
Since it doesn’t interact with release artifacts, as you might expect, Galaxy offers no prescribed process and procedures for creating a release archive, nor does it offer any tooling to assist in the creation of a release archive. The good news is, Galaxy is a blank canvas in this regard.
Proposed Solution
Define repository metadata and build manifest
A repository metadata file, galaxy.toml, will be placed at the root of the project directory tree, and contain information such as: author, license, name, namespace, etc. It will hold any attributes required to create a release artifact from the repository source tree.
The archive build process (defined later) will package the repository source contents (e.g., roles, modules, plugins, etc.), and generate a build manifest file. The generated manifest file will contain the metadata found in galaxy.yml, plus information about the package structure and contents, and information about the release, including the version number.
The generated manifest file will be a JSON formatted file called METADATA that will be added to the root of the release artifact during the build process. Consumers of the release artifact, such as the Galaxy CLI, and the Galaxy import process, will be able to read the manifest file, and verify information about the release and its contents.
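As a sketch of what generating that manifest could involve (the field names, checksum algorithm, and helper function here are illustrative assumptions, not the proposal's final format):

```python
import hashlib
import json
import os


def build_metadata(source_dir, repo_metadata, version):
    """Write a METADATA manifest describing the release contents."""
    files = []
    for root, _, names in os.walk(source_dir):
        for name in sorted(names):
            path = os.path.join(root, name)
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            files.append({
                "name": os.path.relpath(path, source_dir),
                "chksum_sha256": digest,
            })
    # Repository metadata (author, license, name, namespace, ...) plus the
    # release version and the file listing collected above.
    manifest = dict(repo_metadata, version=version, files=files)
    with open(os.path.join(source_dir, "METADATA"), "w") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)
    return manifest
```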
Enable Mazer to build packages
Given a defined package structure and a process for building a release artifact, it makes sense to build the necessary components into Mazer that automate the artifact build process.
Use GitHub Releases as content storage
GitHub Releases will be the mechanism for storing and sharing release archives. GitHub provides an API that can be used by CI platforms and Mazer to push release artifacts to GitHub.
Mazer will be extended with the ability to push a release artifact to GitHub. This provides a single, consistent method for content creators to automate release pushes that can be called from any CI platform.
Notify the Galaxy server when new release artifacts are available
On the Galaxy server, add the ability for users to generate an API token that can be used by clients, such as Mazer, to authenticate with the API.
Extend Mazer with the ability to trigger an import process. Mazer will authenticate with the API via a user’s API token, and trigger an import of the newly available release.
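A hedged sketch of what that client-side trigger might look like; the endpoint path, payload fields, and header format below are placeholders for illustration, not Galaxy's actual API:

```python
import requests

GALAXY_API = "https://galaxy.ansible.com/api"  # server URL; configurable in practice


def trigger_import(api_token, github_user, github_repo, release_version):
    """Ask the Galaxy server to import a newly pushed release artifact."""
    response = requests.post(
        f"{GALAXY_API}/imports/",  # hypothetical endpoint, for illustration only
        headers={"Authorization": f"Token {api_token}"},
        json={
            "github_user": github_user,
            "github_repo": github_repo,
            "release_version": release_version,
        },
    )
    response.raise_for_status()
    # The server would respond with an import task the client can poll for status.
    return response.json()
```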
Verify release artifacts
Enable Mazer to verify the integrity of release artifacts downloaded from GitHub at the time of installation.
There are several solutions widely used for verifying the integrity of a downloaded artifact, including checksums and digital signatures. In general, a checksum guarantees integrity, but not authenticity. A digital signature guarantees both integrity and authenticity.
Using a digital signature for user content requires a complex process of maintaining a trusted keychain, and still does not guarantee perfect authenticity. Since release artifacts are not hosted by Galaxy, but rather by a third party, it’s impossible to perfectly guarantee authenticity.
However, since Galaxy is a centralized packages index, and data transfer between the Galaxy server and client is secured via TLS encryption, Galaxy can be considered a trusted source of metadata, and integrity verification can be achieved by storing release artifact checksums on the Galaxy server.
During import of a repository, Galaxy will store metadata, including the checksum, for a specific content version only once. Any subsequent updates to a version will be prohibited.
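On the client side, the install-time check could be as simple as the following sketch, assuming Galaxy returns a SHA-256 checksum with the version metadata (the specific hash algorithm is an assumption; the proposal doesn't pin one down):

```python
import hashlib


def verify_artifact(artifact_path, expected_sha256):
    """Compare a downloaded artifact against the checksum recorded by Galaxy."""
    digest = hashlib.sha256()
    with open(artifact_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    if digest.hexdigest() != expected_sha256:
        raise ValueError(
            f"Checksum mismatch for {artifact_path}: the artifact may have been "
            "altered since it was imported into Galaxy."
        )
```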
Import Workflow
Install Workflow
mazer install command to install an Ansible collection