Future of the "ansible" package #82
Comments
Very early on in the discussions about the collection split, the intention of the I am in favor of actually deprecating the I've witnessed a large number of cases where users were happier with smaller intentionally installed collections. In fact, we've seen that even just with |
Copying my Reddit comment over here: No matter what, I still see great value in a kitchen sink, because being able to install Ansible, then run a playbook, no additional steps, is pretty huge for me, and very beginner friendly. If eventually I'd have to introduce execution environments, dependency management, etc just to introduce a new user to Ansible automation... they'd more than likely stick to shell scripts or raw python/perl/whatever because those just work and you don't have to cosplay as a node.js dev just to get your first automation running. I think it becomes too big when it takes more than a minute or two to install on an average (100 Mbps?) internet connection on an average computer (eg i3 or MacBook Air type computer). I think we need to be careful with how far we swing the pendulum to comfortable/existing Ansible users' use cases, unless we're happy with the size of the Ansible community and want to slow down growth any more. From a technical standpoint, I would rather there just be core + collections. I was also an early advocate for Drupal doing something similar with a "smallcore" initiative, where all modules were ripped out of Drupal core. Drupal ultimately decided against that but had its own different issues with growth. In either case there's a huge tradeoff between maintenance burden and beginner-friendliness. |
Thanks for opening this @Andersson007 and for thoroughly laying out the options with pros and cons. You've already covered a lot of points I would have made, so it's nice to start from that point. I feel strongly against options 2 & 3. I would be in favor of option 1, or of @sivel's suggestion of ceasing the ansible package's existence.
Ansible Package Pros
I should point out that in terms of usage, both what I do and what I recommend is to install core and separately install collections. That being said, as mentioned, the package is a very strong motivator for creating Ansible content and creating it in a way that adheres to a set of standards designed for stability and long term maintenance. In that way, inclusion becomes a strong motivator for the community to ensure that the ecosystem around Ansible is full and high quality. We are all power users of Ansible, so I think we naturally tend toward installing collections separately, but for many users, that's not as palatable; even if they are already installing separately, I think users feel much more comfortable installing from the curated list of collections than they do from collections that aren't included. In environments where taking new dependencies or installing new packages falls under security review, this is huge: it's likely to be far easier to have the ansible package and its included collections reviewed and approved at once than it would be to get a number of independent collections reviewed (whether you think that's a good security move or not is up for debate, I'm only speaking practically).
Summary
In my opinion the pros of the package are by far more about encouraging participation, contribution, and trust in the community and ecosystem, and less about the convenience of having everything in one package. However, it's the "one package" that motivates contributors: knowing that their content will be in users' hands, and that it's more likely to be used.
Cons of Option 2
The cons already stated pretty much cover it. We'd keep the "trust" aspect for end users, but would significantly stifle growth and contribution. We would kind of go back to everything being in core; from a community POV, it's just that the package will be the new core. People will endeavor to contribute to an existing included collection or not at all, and collections will expand in scope, which goes against the point of them. As stated, this will disproportionately affect the likes of c.g, making it more of a kitchen sink than it is. The "blessed" thing really doesn't feel good, from a community POV.
Cons of Option 3
For me, the big question with narrowing the criteria is all about subjectivity vs objectivity. The current inclusion criteria are very objective, and my subjective opinion is that that is a good thing. I think that introducing subjective criteria is a big decision that can't be made lightly, and I'd be against that personally, but I think if we wanted to add subjective criteria, there should be a vote specifically for that, even without considering an actual criterion; the reason being that we would need to change the way we review collections. I'll try not to go too far down that road, but for example, we'd want to be sure that any subjective criteria are decided and voted on early, and quickly, because putting maintainers through the paces of meeting all the objective criteria only to be blocked later by opinion is not a good look. I'm especially wary of the example requirement of being "generic"; as already stated, it's easy to look at tech areas we aren't familiar with and decide they're too niche. I'm also not a fan of popularity being used as an inclusion criterion, because the trajectory of a collection being developed and published to galaxy independently and becoming super popular before being included seems rather unlikely. It's far more likely that something a few people find useful will be used by more people as they need it, because they found there was already content for it alongside the rest of the Ansible content they use. On option 3, I'd still be open to adding more objective requirements if we think that's useful, but it's important that we do so purposefully.
Concerns with maintaining the package
IMO, size (as in bytes) is not a problem. Install time is the main thing I've heard as a negative, and that is a concern.
This is a very real concern: it takes a lot of time and effort that we'd probably rather be spending on other things. There's no getting around the burden to review or deal with unmaintained collections. What we're already doing about that:
Some points about inclusion review:
Other thoughts
Alternatives
Instead of a package, "inclusion" can be publishing in a "trusted"/"official" galaxy space/server, somehow separate from unincluded collections. This would be akin to the "official" package repositories in a distro, for example. I realize this isn't feasible without changes to galaxy or other things, but I'm looking to think about this conceptually and not get into the technical hurdles.
Pros
Cons
|
I agree with @geerlingguy.
Cons of Option 2
I cannot think of any valid pro argument here. When a new widespread, heavily used collection comes along, it won't be included... just because it did not exist in time?
Option 3
The inclusion criteria shouldn't only focus on code quality etc. |
From using I'd rather see I'd much rather see better tooling around testing collections and getting more data and quality there than a "just install |
Considering option 3, I think the cons you've mentioned outweigh the pros. So I think this would be the worst approach. Although this depends on how you "narrow the door", that is, what the new requirements would look like exactly. To be fair, they should also be applied to the collections already included. That is, some might have to be removed. But this leads to a new discussion. So for me it's a choice between option 1 and 2 at the moment. Option 2 sounds a bit unfair, since it means some collections are part of Ansible because they're old enough, and some aren't because they're too young. In the long run, I think option 2 should only be considered if you plan to deprecate the package. The steering committee spending a lot of time reviewing collections might be a problem. On the other hand, this might improve collection quality (what you called "Motivation to create content satisfying the collection requirements."). I think this is a very strong point in favor of this option. Although it's easy for me to say this because I'm not part of the steering committee that has to do the work. I also agree with @geerlingguy that an "all batteries included" package is more beginner friendly. One additional thought about installation via galaxy: some environments (like ours) don't have direct internet access. We wouldn't be able to do this. Luckily we do have a PyPI mirror, so having all collections in one package is very convenient for us. Spinning up a galaxy mirror wouldn't be that hard... technically. But there's a lot of politics involved (IT security, compliance...) which takes time. So if you want to go in this direction, that is, force users to install collections via galaxy, you should give people enough time to find a way to do it in their special environment. |
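To make the two install paths contrasted in this comment concrete, here is a minimal Python sketch that only builds the command lines and does not run them. The mirror URLs and helper names are hypothetical placeholders; `--index-url` (pip) and `--server` (`ansible-galaxy collection install`) are the existing flags for pointing each tool at an alternative source.

```python
from typing import List

# Hypothetical internal endpoints; replace with whatever your site provides.
PYPI_MIRROR = "https://pypi.example.internal/simple"
GALAXY_MIRROR = "https://galaxy.example.internal/"


def pip_install_cmd(package: str, index_url: str = PYPI_MIRROR) -> List[str]:
    """Build the command that installs a Python package from a PyPI mirror."""
    return ["python3", "-m", "pip", "install", "--index-url", index_url, package]


def galaxy_install_cmd(collections: List[str], server: str = GALAXY_MIRROR) -> List[str]:
    """Build the command that installs collections from a Galaxy-compatible server."""
    return ["ansible-galaxy", "collection", "install", "--server", server, *collections]


if __name__ == "__main__":
    # Path A: the all-in-one package needs only the PyPI mirror.
    print(" ".join(pip_install_cmd("ansible")))
    # Path B: core from the PyPI mirror, collections from a second (Galaxy) mirror.
    print(" ".join(pip_install_cmd("ansible-core")))
    print(" ".join(galaxy_install_cmd(["community.general", "ansible.posix"])))
```

Path A touches one mirror, while path B needs a second, Galaxy-compatible endpoint to exist and be approved, which is the organisational cost the comment describes.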
First and foremost: I cannot even consider the possibility of the I am convinced it would harm adoption as well as the community if we did that. The numbers speak for themselves: Source: https://pypistats.org/ -- I think it's OK if there are users who would rather install No one is forcing users to install the "kitchen sink" if they don't want it and so I don't see the ansible-core and ansible packages as mutually exclusive. These options can and should continue to exist to cater to the different needs and requirements of our users. I could make a parallel to Linux distributions: some are minimal, some are batteries-included, some have a different focus, some are stable and some are "bleeding edge" -- you know, sometimes you end up with the entire suite of LibreOffice installed even though you don't need it but it's good to have options. Now, I don't mean to say that the current state of the ansible package is perfect. I must address the concerns around bloat and installation performance some have mentioned because those are legitimate.
That doesn't mean we should stop there -- we should always strive to improve and do better.
In other words, we should be more proactive in making sure that what we ship is maintained and I hope we can do a better job with that in the future. I have other thoughts but this comment is already long enough, I will pause for now :) |
I don't really need to get into this here, but I don't think those numbers are really showing us anything, other than that people still are using Many people also likely don't even realize that the packages are split, and have just been doing the same ol' thing they've always done. On a different topic, there is always the time and resources required to manage the |
I'm not sure about this. It's possible that a lot of people realize this, but they still keep installing ansible from PyPI because it's more convenient for them. It's hard to be sure here.
That's definitively an important point. |
This is a pretty big assumption to make: the data does not specify which versions of the packages are downloaded, only the python versions. I am curious and I will try to find out if package versions are part of the PyPI package downloads dataset because it would be interesting to know.
This work is heavily sponsored by Red Hat but there are a number of ways where we limit the burden of maintenance and support in favor of sustainability, including:
As the one currently doing the releases of the ansible package, I must say that it's about 30 minutes of work to build, test, review, release and send an announcement email out. Building and testing the package itself takes less than five minutes and it's an ordinary ansible playbook. There are various improvement opportunities to make this even less work. I don't intend to discount the helpful contributions from the community and in fact I must particularly credit Felix Fontein and Toshio Kuratomi for their effort in laying the foundation and working on the tooling to make this process relatively easy. Alas, the topic is not just about building and releasing the package but I wanted to say that I am not worried about this particular part of the equation. We can talk about the time and resources spent reviewing collection inclusions, meetings, discussing issues like this one or initiatives such as the steering committee, sure. |
It appears that package versions are part of the data set but are unfortunately not implemented on pypistats.org (see this issue). Here's a breakdown for ansible-core (link) and for ansible (link): There's still a significant number of users on ansible 2.9 and it is interesting to see that a lot of people are actually holding onto ansible 4.10.0 as well as ansible-core 2.11. If I had to take a guess, it could be due to the new python3.8 requirement in ansible 5, which started shipping ansible-core 2.12. Edit: I figured I would validate my guess and it seems to correlate with the data from pypistats on python versions; there's still a lot of python2.7: |
Well, I don't usually need networking libraries when I'm doing cloud work. But, if I'm not doing cloud work I don't need those libraries either. The current install brings everything. This might actually be better handled with package Where And if someone really needs all the packages - like maybe setting up Tower - then I'd Anyway, just my opinion. |
@dmsimard I always wonder when seeing such download indicators how many of these come from CI systems (and which percentage comes from Ansible collection CIs :) ), and how many are actual users. |
It would be interesting to know, yes. IMO the absolute numbers aren't as relevant as the trends and the differences between packages and versions. Maybe 50% is CI, who knows, but it averages out so you can have a general idea. |
Well, Zuul uses Ansible as its CI/CD system so... https://zuul-ci.org/ |
I am familiar with Zuul, though I wonder how much of an impact it would have on the numbers considering that, at least for the openstack/opendev community, they run local mirrors. |
+1 to all of this. I firmly agree with @geerlingguy. Ansible got to where it is today in terms of popularity and organic usage growth via the kitchen-sink package. I think a large part of that is because the first-time user has a very low barrier of entry, and having baked-in support for a bunch of things the user cares about definitely contributes to that first impression and ease of use. Ansible Core is great, wonderful, and powerful, but without a bunch of content it doesn't actually do a whole lot for the functional automation that users know and love. The migration to Collections was necessary; I don't question their merit, nor would I attempt to debate that here. However, I think it would be a grave detriment to Ansible as a whole, the community, the userbase, and the project to get rid of the
Option 1 is my preferred option.
Option 2 shouldn't even be considered; this is intentionally stopping growth and it feels strange to discuss it.
Option 3 has the potential to be a slippery slope, because it is hard to quantify what is or isn't important or widely used enough to make the cut, purely because we can't know what specific thing might spark a new user's interest. If implemented correctly, though, this could be a reasonable option.
For whatever it's worth, that's my $0.02 |
I'd love to see some data to support this assertion. From my perspective and own research with publicly available data (at the time), the vast majority of playbooks use a small subset of modules and those were the ones we decided to leave in core. https://tannerjc.net/ansible/galaxy.html I'm sure non-public playbooks could paint a slightly different story though, so I'm open to being re-educated. I'm just not a fan of false correlations to ansible's success based on emotion. My vote obviously is for option 4 as proposed by sivel ... "ACD" as it was originally called, is unnecessary. I've seen multiple arguments made in this ticket and elsewhere that users have "difficulty" knowing what collections they need. Wouldn't those same users also have difficulty installing the python and system package dependencies for those collections they supposedly know nothing about? Sure execution environments help there ... but only in the context of modules that aren't run on remote systems. The execution environment doesn't magically ship python and system deps onto the remote machines for all those modules. At some point, the users have to understand what their playbooks are doing, what modules/collections are in use and how to provide the right dependencies for those plugins on the control host or all the remote hosts. There is no escaping that reality. The way things are going now [pardon my slippery slope] in 20 or 50 years, the ansible package will be a museum of modules that configure systems that are no longer manufactured, for companies long defunct and for technology nobody cares about anymore. If you all want to persist in maintaining a dumping ground for irrelevant tech and trying to maintain compatibility with dead systems, by all means. I hope you have the time and the budget to carry all those dead python libraries into the future. |
That's one take on it—the other take is look at how colossal Microsoft has become, maintaining support for things that are 20-30 years behind the times. And think of networking deployments, where I still see ancient Cisco switches in use from the late 90s ;) Breaking existing use cases and backwards compatibility are two ways to quickly disenfranchise users. I won't say 'don't do this, it's definitely the wrong thing', but I will say my prediction is if a batteries-included Ansible distribution goes away, growth will slow more rapidly than it already has. That might not be a bad thing, from the perspective of the business goals for Red Hat / IBM / Ansible, especially if the team of paid developers would mutiny were it not to happen (that's a serious consideration—Ansible as a whole dies if the ansible-core team disbands). But I do have a pretty strong feeling that it will slow adoption of Ansible where it's not already in use. |
I have no real desire to inflict emotion on it, but my point is that Ansible got to where it is as a batteries-included tool. This attribute is also something that Ansible Inc and Red Hat post-acquisition spent a considerable amount of marketing dollars highlighting. My point was simply that the thing that exists as kitchen-sink is literally the thing that got us where we are today. I don't want to decide to throw it away based on limited data sets that enforce a narrative that could be flawed purely because we don't have the right metrics. It could also be correct, but I don't know and I don't think we have enough data to know for certain. If we screw this up, it could be to the detriment of Ansible as a whole. I worry "what's in Galaxy as re-usable generic roles" is potentially selective sampling compared to what would actually be in a playbook to handle the more grungy aspects of the logic required to set up site-specific requirements.
This isn't a new problem though, this has been a known quantity since the dawn of Ansible. It's not a net new challenge that would be solved by throwing away the kitchen-sink, the same can be said for modules in Core... it's just a smaller set of things that need dependencies.
This is a completely fair concern. |
Fair point. My report also included every single repo on github that included what looked like either a playbook or a role. Your point still stands though and that's why I mentioned "publicly available" data.
Very true, and for the most part, modules in core typically only require dependencies already found on the system. Core also has a deprecation path that is used quite often to purge old features/functions. When something isn't useful anymore or it goes against the grain of how core wants something to work, it can be taken out. We went through the debate over "kitchen sink" numerous times during the collection migration journey. It was heated and contentious and most of the time, it felt like only core wanted it to happen. We even had a group or set of people or person(s) who were insistent that we couldn't leave -any- modules in core so that it would be "crippled" without at least one collection, thereby people might be more inclined to pay or acquire a fully assembled distro/solution. What I learned through that process is that there are -many- individualized definitions of what "ansible" is. I know that from the beginning of the company, Ansible's marketing exploited the module count as a growth hacking metric. I don't know whether that was because they didn't understand enough to talk about other things or thought their audience wouldn't understand any of the other cool stuff that got into ansible, but it was an over-valued metric in my opinion. I wonder if the linux kernel would have been served well by having marketing announcements that only touted the number of new drivers included, but that's an aside. I also experienced first hand the way including new modules in core was used as political capital to make inroads into other groups inside Red Hat. It led to strife over whose modules got merged and whose didn't and whose marketing deadlines would be met. It slowly became a choice the core team wasn't allowed to make anymore, even though the only choice being made was "I can't test this proprietary (or paid for) thing in CI, so how can I bring it into the project?". Marketing became the governing body of the project.
On top of all that, it was constantly assumed that the core team was responsible for any of the modules that didn't work. It didn't help that at Red Hat, we used to have posters on the wall that said "we ship it, we support it". It was a very untenable situation. Things had to change, and the move to collections was the first time we had a real opportunity to do something about it. We could have done the same thing with roles (minor technical issues aside), but there was too much resentment of roles due to the "1000 nginx roles" controversy. All in all, I see ACD aka "ansible" as a continuation of what some people think "ansible" is, with all of the same problems. I highly commend the community team and the volunteers for how they have curated the collection of collections and tried to ensure some standards. Although, I don't personally think the standards will ever be high enough to weed out the cruft that is going to recreate the "1000 nginx roles" problem. I asked in a community meeting once what criteria could actually cause a module or a collection to be taken out of the distro, but there was no answer. I really wish the community at large could see the problems like I do, but I know that's not realistic. I think it would be incredible though if less effort was spent on growing the module count, and instead more effort was spent on new language features for core and its related commands (ansible-playbook, ansible-galaxy, ansible-test, etc). Again, I think the community folks have been awesome with helping fix bugs in core in relation to collections. |
@jctanner you certainly have a perspective most of us don't and that's valuable; I appreciate the peek into how things were within Red Hat and how that may have affected Ansible. But I don't see how you can reasonably presume this statement if you've been involved in the community recently.
We've talked about removing collections from the package for a while, and as we've gotten to the point where we literally have broken / abandoned collections, we are filling in the details and preparing to do exactly that. Although today isn't the first meeting where it has been discussed, it was the major topic of today's meeting, spurring follow-up issues and reviving discussions in existing ones:
I find the idea that we're all going to inexplicably support and look to keep decades of accumulated cruft for long-dead companies and technologies kind of silly; I haven't seen anyone in the community trying to argue that we should keep unmaintained or broken collections in the package like some kind of technology hoarder.
Putting aside the fact that roles lend themselves to being opinionated which naturally leads to many variations, I think the package is just about the best option we have for preventing this problem. I think contributors are far more likely to want to add something to an existing collection that's included, than they are to publish a new one or contribute to a collection that's not included in the package. |
@jctanner I appreciate your candor and willingness to share some history for the broader community to consider, you have brought forth extremely valid concerns and highlighted the nuance of the module count being a magical metric that is unknown whether or not it actually contributes to success. Also, the ambiguity that has been introduced to the brand "Ansible" because some marketing team(s) chose to put the name on many things has absolutely conflated the issues outlined above. I also think there needs to be a proper deprecation process for the kitchen-sink to remove things, something as simple as health metrics on the code could work such that if working stale code continues working then it's fine to leave in but if not working code is abandoned then after some accepted period of time, it is deprecated. That's likely a bigger discussion for somewhere else though. This all brings up a very interesting question that I think the Ansible Community should somehow quantify and answer for our collective selves, "what does success of Ansible mean and what is the desired future of the project?" in terms of what metrics matter, what are we attempting to accomplish, what's the future beyond just the If we can answer the larger grand questions, then that can advise what should be done here in an objective and informed way. It's entirely possible the scope of the question, "what is the future of this software package?" is too narrow. |
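The health-metrics idea in this comment can be written down as a small decision rule. The following Python sketch is only an illustration under stated assumptions: the `CollectionHealth` fields, the function name, and the 180-day grace period are all hypothetical, standing in for whatever metrics and "accepted period of time" the community would actually agree on.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CollectionHealth:
    """Hypothetical health record for one included collection."""
    name: str
    tests_passing: bool              # does the code still work?
    last_maintainer_activity: date   # last sign that someone maintains it


def should_deprecate(health: CollectionHealth, today: date,
                     grace: timedelta = timedelta(days=180)) -> bool:
    """Apply the rule from the comment: stale but working code stays,
    broken code that has been abandoned longer than the grace period goes."""
    if health.tests_passing:
        return False  # working code is fine to leave in, however stale
    abandoned_for = today - health.last_maintainer_activity
    return abandoned_for > grace


if __name__ == "__main__":
    today = date(2022, 4, 1)
    stale_ok = CollectionHealth("example.stale", True, date(2019, 1, 1))
    broken_old = CollectionHealth("example.broken", False, date(2021, 1, 1))
    for h in (stale_ok, broken_old):
        print(h.name, "deprecate" if should_deprecate(h, today) else "keep")
```

The point of keeping the rule this mechanical is that it stays an objective criterion, in line with the preference for objective inclusion criteria voiced earlier in the thread.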
Just a quick note thrown into the round. My company needs and relies on "site local installs", which means caching/provisioning of packages. Ansible galaxy itself doesn't offer a convenient way to proxy such installs for us. Only scripted provisioning for collections is possible, which makes it cumbersome to keep versions up to date etc. Overall package installation should be easier to point to a proxied repository (e.g. remote location containing the |
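As one way to picture the "scripted provisioning" mentioned here, the sketch below renders a pinned collection requirements file from a plain mapping, so versions live in one place. The function name and the version numbers are hypothetical; the `collections:` list with `name` and `version` keys is the requirements file layout that `ansible-galaxy collection install -r requirements.yml` reads.

```python
from typing import Dict


def render_requirements(pins: Dict[str, str]) -> str:
    """Render a collection requirements file with one pinned entry per collection."""
    lines = ["---", "collections:"]
    for name in sorted(pins):  # sorted for stable, diff-friendly output
        lines.append(f"  - name: {name}")
        lines.append(f'    version: "{pins[name]}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder versions, not a recommendation of specific releases.
    pins = {"community.general": "1.2.3", "ansible.posix": "1.0.0"}
    print(render_requirements(pins), end="")
```

The resulting file could then be consumed with `ansible-galaxy collection install -r requirements.yml`, pointed at whatever server the site is allowed to reach.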
My 0.002 cents: I think a collection of active/popular/highly used/curated collections is very helpful for the time being, especially for new ansible users. I do think there needs to be a process to add/remove things from it to keep it useful. |
Here's a link to the Reddit discussion btw: https://www.reddit.com/r/ansible/comments/tjfbqx/future_of_ansible_package/ |
(Posting from Reddit as well, because I think it's important to have it here). I heavily agree with @geerlingguy. In the 7 years I've been using Ansible, having the batteries included approach has made it so much easier to just start using it and keep using it. I'm now trying to introduce it at my current company, where they have a custom written config management (written back around 2008ish). Having the batteries included option is a big selling point to them. So my preference is #1, and #2 is right out, with #3 as a distant second. |
Thanks everyone for your great feedback! The Community and Steering Committee will continue to include new collections! |
Summary
Future of Ansible package
Currently Ansible 5 consists of
ansible-core
plus ~100 collections.
The number of included collections per release:
So far we've been including new collections submitted for inclusion satisfying the collection requirements.
What is the future of the Ansible package?
Considering the past couple of years, we can expect the package will grow by 10-20 new collections every year.
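As a quick worked version of that estimate, the sketch below extrapolates linearly from the figures given above (~100 collections today, 10 to 20 new ones per year). The function name is hypothetical and the linear trend is an assumption, not a forecast.

```python
from typing import Tuple


def projected_collections(current: int, years: int,
                          per_year_low: int = 10,
                          per_year_high: int = 20) -> Tuple[int, int]:
    """Return the (low, high) number of included collections after `years`,
    assuming the package keeps growing at a constant yearly rate."""
    return (current + per_year_low * years, current + per_year_high * years)


if __name__ == "__main__":
    for years in (1, 3, 5):
        low, high = projected_collections(100, years)
        print(f"after {years} year(s): {low} to {high} collections")
```

Under these assumptions the package would hold roughly 150 to 200 collections after five years.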
Main questions
kitchen-sink
if all of the shipped collections can be easily installed with the ansible-galaxy one-line command? (besides the backwards compatibility issue with 2.9, but no new collections are needed for that)
Possible options:
1. Keep things as-is (as was mentioned, considering the current trend, we can expect up to several tens of new collections per year).
Pros:
Cons:
Questions:
2. New collections are not allowed to get in (users can easily install them with ansible-galaxy if they need them).
Pros:
Cons:
Questions:
3. Keep the possibility to get a collection included but "narrow the door" by introducing new requirements for collections, for example, to be generic. For example, a collection that works with DNS or another widely used service can get in, but a collection that serves rarely used devices in specific areas can't.
Pros:
Cons:
Questions:
4. (suggest your option)
Any ideas?