This repository has been archived by the owner on May 14, 2024. It is now read-only.

Future of the "ansible" package #82

Closed
Andersson007 opened this issue Mar 21, 2022 · 28 comments

Comments

@Andersson007
Contributor

Andersson007 commented Mar 21, 2022

Summary

Future of Ansible package

Currently Ansible 5 consists of ansible-core plus ~100 collections.
The number of included collections per release:

  • 2.10: 81
  • 3: 88
  • 4: 93
  • 5: 99

So far we've been including new collections submitted for inclusion satisfying the collection requirements.

What is the future of the Ansible package?

Considering the past couple of years, we can expect the package will grow by 10-20 new collections every year.
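As a rough check of that estimate, the per-release counts listed above can be differenced. This is only a sketch: it assumes roughly two major releases per year (an assumption, not something stated above), and the variable names are illustrative.

```python
# Included collections per release, taken from the list above.
counts = {"2.10": 81, "3": 88, "4": 93, "5": 99}

releases = list(counts)
# Growth between consecutive releases.
deltas = [counts[b] - counts[a] for a, b in zip(releases, releases[1:])]

# ASSUMPTION: about two major releases per year.
RELEASES_PER_YEAR = 2
avg_per_release = sum(deltas) / len(deltas)
avg_per_year = avg_per_release * RELEASES_PER_YEAR

print(deltas)        # [7, 5, 6]
print(avg_per_year)  # 12.0
```

Under that assumption the recent trend sits at the lower end of the 10-20 range.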

Main questions
  • What is the purpose of the Ansible package?
  • Is this a problem or not? Can this become a problem in the future? If yes, why (criteria)?
  • Is there a point in delivering a kitchen sink if all of the shipped collections can be easily installed with a one-line ansible-galaxy command? (Besides the backwards-compatibility issue with 2.9, but no new collections are needed for that.)
  • Should we stop including new collections or not? Pros and cons?
  • ...(suggest yours).
Possible options:
1. Keep things as-is (as mentioned, considering the current trend, we can expect up to several tens of new collections per year).

Pros:

  • Motivation for developers to create new collections which can become a part of Ansible package ("so popular and important in IT world", etc).
  • Motivation to create content satisfying the collection requirements.
  • Motivation to maintain included collections to avoid kicking them out from the package "I can't show off with my collection any more..:("
  • ...(suggest yours)

Cons:

  • Bigger and bigger package. Note: Is size a problem? If not yet, when can it become a problem? Do we have a size threshold?
  • Delivering a kitchen sink: people don't need most of the shipped collections.
  • Steering committee spends a lot of time reviewing collections and discussing what to do with unmaintained collections; package maintenance burden.
  • ...(suggest yours).

Questions:

  • ...(suggest yours)
2. New collections are not allowed to get in (users can easily install them with ansible-galaxy if they need it).

Pros:

  • The package will stop growing (with new collections; it'll continue to grow with new modules).
  • It would probably encourage developers and users to leverage/focus on Galaxy.
  • ...(suggest yours).

Cons:

  • Maybe many potential developers will lose motivation as their collections cannot become a part of the Ansible package any more.
  • No such strong motivation to create content satisfying collection requirements (Note: the Steering Committee can't afford to review collections super thoroughly, especially big ones, so the quality of the collections we're including can also vary from module to module).
  • The above two points can hypothetically lead to a loss of interest in contributing to the project in general. I.e., the impossibility of getting in can make new content contribution less attractive.
  • The above can lead to growth of community.general/community.network collections with new modules as the only way to get in.
  • ...(suggest yours)

Questions:

  • What to do with included collections? They will seem "blessed".
  • ...(suggest yours)
3. Keep the possibility to get a collection included but "narrow the door" by introducing new requirements for collections, for example, to be generic. For example, a collection that works with DNS or another widely used service can get in, but a collection that serves rarely used devices in specific areas can't.

Pros:

  • The package will grow much more slowly.
  • Probably only generic/very popular collections will get in.
  • Possibility to see your collection in the package exists, so the motivation to create new collections satisfying the requirements exists.
  • Motivation to maintain included collections to avoid kicking them out from the package "I can't show off with my collection any more..:("
  • It would probably encourage developers and users to leverage/focus on Galaxy.
  • ...(suggest yours)

Cons:

  • Any such criterion (e.g., a requirement to be generic) is pretty subjective. "I'm not familiar with this stuff, so for me it doesn't seem generic enough"
  • It's hard to measure abstract things.
  • Currently included non-generic collections will seem "blessed"
  • Can lead to growth of community.general/community.network collections with new modules as the only way for non-generic modules to get in.
  • ...(suggest yours)

Questions:

  • ...(suggest yours)
4. (suggest your option)

Any ideas?

@gundalow gundalow changed the title Future of Ansible package Future of the ansible package Mar 21, 2022
@gundalow gundalow changed the title Future of the ansible package Future of the "ansible" package Mar 21, 2022
@sivel

sivel commented Mar 21, 2022

Very early on in the discussions about the collection split, the intention of the ansible package was to be a temporary stopgap, to offer an easy backwards-compatible way to get you what pre-2.10 looked like. For various reasons, this plan fell by the wayside, and we kind of moved towards what the ansible package has become.

I am in favor of actually deprecating the ansible package, and moving towards explicit collection installs. At which point, we could also stop accepting new collections.

I've witnessed a large number of cases where users were happier with smaller, intentionally installed collections. In fact, we've seen that even just with community.general users wish it was split, and that there was no "dumping ground" or kitchen sink even at that level.

@geerlingguy

Copying my Reddit comment over here:

No matter what, I still see great value in a kitchen sink, because being able to install Ansible, then run a playbook, no additional steps, is pretty huge for me, and very beginner friendly.

If eventually I'd have to introduce execution environments, dependency management, etc just to introduce a new user to Ansible automation... they'd more than likely stick to shell scripts or raw python/perl/whatever because those just work and you don't have to cosplay as a node.js dev just to get your first automation running.

I think it becomes too big when it takes more than a minute or two to install on an average (100 Mbps?) internet connection on an average computer (eg i3 or MacBook Air type computer).

I think we need to be careful with how far we swing the pendulum to comfortable/existing Ansible users' use cases, unless we're happy with the size of the Ansible community and want to slow down growth any more.

From a technical standpoint, I would rather there just be core + collections. I was also an early advocate for Drupal doing something similar with a "smallcore" initiative, where all modules were ripped out of Drupal core. Drupal ultimately decided against that but had its own different issues with growth. In either case there's a huge tradeoff between maintenance burden and beginner-friendliness.

@briantist

Thanks for opening this @Andersson007 and for thoroughly laying out the options with pros and cons, you've already covered a lot of points I would have made so it's nice to start from that point.

I feel strongly against options 2 & 3.

I would be in favor of option 1, or of @sivel 's suggestion of ceasing the ansible package's existence.

Ansible Package Pros

I should point out that in terms of usage, both what I do and what I recommend, is to install core and separately install collections.

That being said, as mentioned, the package is a very strong motivator for creating Ansible content and creating it in a way that adheres to a set of standards designed for stability and long term maintenance. In that way, inclusion becomes a strong motivator for the community to ensure that the ecosystem around Ansible is full and high quality.

We are all power users of Ansible, so I think we naturally tend toward installing collections separately, but for many users, that's not as palatable; even if they are already installing separately, I think users feel much more comfortable installing from the curated list of collections than they do from collections that aren't included.

In environments where taking new dependencies or installing new packages falls under security review, this is huge: it's far more likely to be easier to have the ansible package and its included collections reviewed and approved at once than it would be to get a number of all independent collections reviewed (whether you think that's a good security move or not is up for debate, I'm only speaking practically).

Summary

In my opinion the pros of the package are by far more about encouraging participation, contribution, and trust in the community and ecosystem, and less about the convenience of having everything in one package. However, it's the "one package" that motivates contributors: knowing that their content will be in users' hands, and that it's more likely to be used.

Cons of Option 2

The cons already stated pretty much cover it. We'd keep the "trust" aspect for end users, but will significantly stifle growth and contribution. We will kind of go back to everything being in core, from a community POV, it's just that the package will be the new core.

People will endeavor to contribute to an existing included collection or not at all, collections will expand in scope, which goes against the point of them, and as stated, this will disproportionately affect the likes of c.g, making it more of a kitchen sink than it is.

The "blessed" thing really doesn't feel good, from a community POV.

Cons of Option 3

For me, the big question with narrowing the criteria, is all about subjectivity vs objectivity.

The current inclusion criteria are very objective, and my subjective opinion is that that is a good thing.

I think that introducing subjective criteria is a big decision that can't be made lightly, and I'd be against that personally, but I think if we wanted to add subjective criteria, there should be a vote specifically for that, even without considering an actual criterion; the reason being that we would need to change the way we review collections. I'll try not to go too far down that road, but for example, we'd want to be sure that any subjective criteria are decided and voted on early, and quickly, because putting maintainers through the paces of meeting all the objective criteria only to be blocked later by opinion is not a good look.

I'm especially wary of the example requirement of being "generic", as already stated it's easy to look at tech areas we aren't familiar with and decide it's too niche.

I'm also not a fan of popularity being used as an inclusion criterion because the trajectory of a collection being developed and published to galaxy independently and becoming super popular before being included seems rather unlikely. It's far more likely that something a few people find useful will be used by more people as they need it, because they found there was already content for it alongside the rest of the Ansible content they use.

On option 3, I'd still be open to adding more objective requirements if we think that's useful, but it's important that we do so purposefully.

Concerns with maintaining the package

Bigger and bigger package. Note: Is size a problem? If not yet, when can it become a problem? Do we have a size threshold?

IMO, size (as in bytes) is not a problem. Install time is the main thing I've heard as a negative, and that is a concern.
However we're already addressing that in other ways:

  • shipping wheels
  • excluding common sources of unneeded files (tests, etc.) from install: number of files affects install time more than size, at the sizes we are talking about (but this does reduce disk space too)
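To compare file count against byte size for a given install, something like the following could be used. This is only a sketch: the function name is made up for illustration, and the path you pass in (for example the `ansible_collections` directory under your site-packages) is an assumption about where the collections end up on your system.

```python
from pathlib import Path


def install_footprint(root):
    """Return (file_count, total_bytes) for all regular files under root."""
    files = [p for p in Path(root).rglob("*") if p.is_file()]
    return len(files), sum(p.stat().st_size for p in files)
```

Running it before and after excluding tests and similar files would show how much each of the two numbers actually moves.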

Steering committee spends a lot of time reviewing collections and discussing what to do with unmaintained collections; package maintenance burden.

This is a very real concern: it takes a lot of time and effort that we'd probably rather be spending on other things. There's no getting around the burden to review or deal with unmaintained collections.

What we're already doing about that:

  • Allowing inclusion during minor releases, which eases the crunch for both reviewers and maintainers. For reviewers, it means we can take the time we need to do it right and not to become too stressed. For maintainers, the same thing with making necessary changes.
  • We've already been considering when and how to eject collections. Luckily I think this will be a rare event, but accepting more collections will make it more likely, both statistically, and because collections that are used less are more likely to be abandoned.

Some points about inclusion review:

  • the more objective the criteria are, the easier reviews should be. For some criteria, maybe these tests can be automated.
  • more collections, more maintainers, is a pool from which we can expand the steering committee

Other thoughts

  • If there is no package, no "inclusion", how do we decide which docs get hosted on docs.ansible.com?
  • If there's no package, what happens to redirects for short names?
  • If there's no package or concept of inclusion, how do we decide who gets Red Hat resources for things like CI time? Who gets to stay under the ansible-collections organization?

Alternatives

Instead of a package, "inclusion" can be publishing in a "trusted"/"official" galaxy space/server, somehow separate from unincluded collections. This would be akin to the "official" package repositories in a distro for example.

I realize this isn't feasible without changes to galaxy or other things but looking to think about this conceptually and not get into the technical hurdles.

Pros

  • encourages use of galaxy and installation of needed packages
  • lessens the burden around building "the package"
  • ejection is less serious, making it easier to do: end users are already installing it, if they still want they just need to explicitly decide to trust it / install it from the other repo / etc.
  • still an incentive for contributors to "go official"
  • still a way for us to maintain standards
  • (could also be a con) for companies/projects that are big enough, they may never need inclusion: we readily add package sources for certain software without much thought, because we already trust the publisher and their repo; it could be the same in this case, and then there's no burden on reviewers
  • inclusion here can fit the criteria for certain Red Hat resources (docs generation and publishing, CI resources, etc.).

Cons

  • public galaxy is currently set up as a free for all, anyone can publish, and installs are by and large done from one source; I can't think of a simple way to segregate that without breaking install for all the non-included collections out there
  • It's not quite the same incentive for contribution and participation that I think the current package is, but I believe it comes pretty close in terms of user trust
  • No matter how much we like installing collections separately, not having a package will necessarily alienate some users, and that will disproportionately affect beginners and newcomers

@markuman
Contributor

I agree with @geerlingguy
Basically I don't see any problems with the growing number of collections.
When someone just wants a minimal ansible installation without hundreds of collections, the recommendation is imo just: pip install ansible-core
After that it's possible to selectively install only the required collections.
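The two steps can be sketched as follows. The helper name and the example collection names are only illustrative; the sketch builds the commands without running them.

```python
import sys


def minimal_install_commands(collections):
    """Build (but do not run) the commands for a core-only install plus selected collections."""
    # Step 1: the engine only, without the bundled collections.
    commands = [[sys.executable, "-m", "pip", "install", "ansible-core"]]
    # Step 2: one ansible-galaxy call per collection that is actually needed.
    for name in collections:
        commands.append(["ansible-galaxy", "collection", "install", name])
    return commands


# Example collection names, chosen for illustration.
for cmd in minimal_install_commands(["community.general", "ansible.posix"]):
    print(" ".join(cmd))
```

Each entry can be handed to `subprocess.run(cmd, check=True)` if you want to execute it.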

Cons of Option 2

Should we stop including new collections or not? Pros and cons?

I cannot think of any valid pro argument here. When a new widespread, heavily used collection comes along, it won't be included... just because it did not exist in time?
Or when there is a new trend that comes after cloud/container/foo, ansible cannot include it, will miss the train and lose users...

Option 3

The inclusion criteria shouldn't only focus on code quality etc.
As also mentioned here #34 (comment), it's worth thinking about some quantity characteristics (number of active maintainers, number of downloads, usage/feedback ...and all the things that are difficult to measure 🙂)

@MarkusTeufelberger

From using ansible-galaxy a bit, it also does not feel enough like a proper feature-complete package manager (yet?) and I'm unsure if that's the best way forward to focus development resources. An "everything that passes basic checks at the time of packaging" and a "minimal, install the rest on your own" package is probably better than starting to argue about what constitutes "generic enough" or "not widely used/developed a few months too late to be included, sorry".

I'd rather see ansible.builtin being also developed as a collection or at least some/most modules being removed from there into their own (ansible.official?) collection to slim down ansible-core even more (e.g. I don't see why https://docs.ansible.com/ansible/latest/collections/ansible/builtin/apt_module.html needs to be shipped version locked to ansible-core while I assume that modules like assert or debug might have more tight integration to the core engine itself).

I'd much rather see better tooling around testing collections and getting more data and quality there than a "just install ansible-not-just-core and use ansible-galaxy to install whatever else you need" approach. Yes, a "fully featured" package will carry a lot of dead weight for a lot of use cases, but it also gives a lot of options once it is present. IMHO there's not that much of a sprawling ecosystem yet(?) around Ansible that people start to re-implement the same functionality in different ways, the existing ones are usually "good enough" at least when it comes to modules and plugins (roles is a different issue and not in scope for this discussion I guess - those are far more opinionated usually).

@mariolenz
Contributor

Considering option 3, I think the cons you've mentioned outweigh the pros. So I think this would be the worst approach. Although this depends on how you "narrow the door", that is, what exactly the new requirements would look like. To be fair, they should also be applied to the collections already included; that is, some might have to be removed. But this leads to a new discussion.

So for me it's a choice between option 1 and 2 at the moment. Option 2 sounds a bit unfair, since it means some collections are part of Ansible because they're old enough, and some aren't because they're too young. In the long run, I think option 2 should only be considered if you plan to deprecate the package.

The steering committee spending a lot of time to review collections might be a problem. On the other hand, this might improve collection quality (what you called "Motivation to create content satisfying the collection requirements.") I think this is a very strong point in favor of this option. Although it's easy for me to say this because I'm not part of the steering committee that has to do the work.

I also agree with @geerlingguy that an "all batteries included" package is more beginner friendly.

One additional thought about installation via galaxy: Some environments (like ours) don't have direct internet access. We wouldn't be able to do this. Luckily we do have a PyPI mirror, so having all collections in one package is very convenient for us.

Spinning up a galaxy mirror wouldn't be that hard... technically. But there's a lot of politics involved (IT security, compliance...) which takes time. So if you want to go into this direction, that is force users to install collections via galaxy, you should give people enough time to find a solution how to do it in their special environment.

@dmsimard

dmsimard commented Mar 22, 2022

First and foremost: I cannot even consider the possibility of the ansible package going away because it is appreciated and useful for such a wide variety of users and use cases.

I am convinced it would harm adoption as well as the community if we did that. The numbers speak for themselves:

Source: https://pypistats.org/
[Image: download statistics for ansible]

[Image: download statistics for ansible-core]

I think it's OK if there are users who would rather install ansible-core and then cherry-pick the collections that they want and need. We should continue to support this and encourage it whenever it makes sense.

No one is forcing users to install the "kitchen sink" if they don't want it and so I don't see the ansible-core and ansible packages as mutually exclusive.
I feel the same way about execution environments: it's great that they exist, they are well suited for particular use cases but that doesn't mean everyone should be forced to use them.

These options can and should continue to exist to cater to the different needs and requirements of our users.
Some prefer simplicity, some need extra features or enterprise capabilities, some have security requirements; this is all fine.

I could make a parallel to Linux distributions: some are minimal, some are batteries-included, some have a different focus, some are stable and some are "bleeding edge" -- you know, sometimes you end up with the entire suite of LibreOffice installed even though you don't need it but it's good to have options.

Now, I don't mean to say that the current state of the ansible package is perfect.
The future of the ansible package is still an important discussion to have, even if just to collect input and feedback from the wider community.

I must address the concerns around bloat and installation performance some have mentioned because those are legitimate.
I share these concerns and I am happy to report that they will, in part, be addressed in the upcoming release of Ansible 6:

  • We'll start shipping python wheels (cutting installation time in more than half)
  • We'll exclude unnecessary files from the installation (further improving installation time)

That doesn't mean we should stop there -- we should always strive to improve and do better.
I've brought up another angle to address bloat in last week's meeting topic:

  • Some included collections should be removed because they have been superseded by a newer collection
  • Some included collections have, over time, grown inactive or neglected to a point where it is hard to vouch for their quality

In other words, we should be more proactive in making sure that what we ship is maintained and I hope we can do a better job with that in the future.

I have other thoughts but this comment is already long enough, I will pause for now :)
In the meantime, please note that we have a good amount of discussion on this topic on reddit as well.

@sivel

sivel commented Mar 22, 2022

The numbers speak for themselves:

I don't really need to get into this here, but I don't think those numbers are really showing us anything, other than that people still are using ansible<2.10 a lot, which is about to be EOL. In any world where we exclude ansible<2.10, ansible-core downloads should always exceed that of ansible. I doubt many people are downloading the ansible>=2.10 package without also fetching ansible-core in any useful manner.

Many people also likely don't even realize that the packages are split, and have just been doing the same ol' thing they've always done.


On a different topic, there is always the time and resources required to manage the ansible package that should be taken into account. It's highly reliant on a number of people who aren't paid to be doing what they do. Community desires aside, there are other points to take into consideration about the long term supportability of a package.

@mariolenz
Contributor

Many people also likely don't even realize that the packages are split, and have just been doing the same ol' thing they've always done.

I'm not sure about this. It's possible that a lot of people realize this, but they still keep installing ansible from PyPI because it's more convenient for them. It's hard to be sure here.

On a different topic, there is always the time and resources required to manage the ansible package that should be taken into account.

That's definitely an important point.

@dmsimard

dmsimard commented Mar 22, 2022

I don't think those numbers are really showing us anything, other than that people still are using ansible<2.10 a lot, which is about to be EOL.

This is a pretty big assumption to make: the data does not specify which versions of the packages are downloaded, only the python versions. I am curious and I will try to find out if package versions are part of the PyPI package downloads dataset because it would be interesting to know.

On a different topic, there is always the time and resources required to manage the ansible package that should be taken into account. It's highly reliant on a number of people who aren't paid to be doing what they do. Community desires aside, there are other points to take into consideration about the long term supportability of a package.

This work is heavily sponsored by Red Hat but there are a number of ways where we limit the burden of maintenance and support in favor of sustainability, including:

  • Settling on a cadence of a minor release every three weeks and a major release every 6 months
  • Halting maintenance on the previous major release once a new major release is out
  • Automating the packaging as well as changelog and porting guide generation

As the one currently doing the releases of the ansible package, I must say that it's about 30 minutes of work to build, test, review, release and send an announcement email out. Building and testing the package itself takes less than five minutes and it's an ordinary ansible playbook. There are various improvement opportunities to make this even less work.

I don't intend to discount the helpful contributions from the community and in fact I must particularly credit Felix Fontein and Toshio Kuratomi for their effort in laying the foundation and working on the tooling to make this process relatively easy.

Alas, the topic is not just about building and releasing the package but I wanted to say that I am not worried about this particular part of the equation.

We can talk about the time and resources spent reviewing collection inclusions, meetings, discussing issues like this one or initiatives such as the steering committee, sure.

@dmsimard

dmsimard commented Mar 22, 2022

I am curious and I will try to find out if package versions are part of the PyPI package downloads dataset because it would be interesting to know.

It appears that package versions are part of the data set but are unfortunately not implemented on pypistats.org (see this issue).
It is suggested in the issue to look at https://pepy.tech which does provide a breakdown by package version.

Here's a breakdown for ansible-core (link):
[Image: ansible-core downloads by package version]

and for ansible (link):
[Image: ansible downloads by package version]

There's still a significant number of users on ansible 2.9 and it is interesting to see that a lot of people are actually holding onto ansible 4.10.0 as well as ansible-core 2.11. If I had to take a guess, it could be due to the new python3.8 requirement in ansible 5, which started shipping ansible-core 2.12.

Edit: I figured I would validate my guess and it seems to correlate with the data from pypistats on python versions, there's still a lot of python2.7:
[Image: Screenshot from 2022-03-22 21-26-07]

@kuwv

kuwv commented Mar 22, 2022

Well, I don't usually need networking libraries when I'm doing cloud work. But, if I'm not doing cloud work I don't need those libraries either. The current install brings everything.

This might actually be better handled with package extras. Just package the collections individually and set up the Ansible package to be more dynamic in what gets installed.

Where pip install ansible[cloud] and pip install ansible[network] make sense in my mind.

And if someone really needs all the packages - like maybe setting up Tower - then I'd
expect pip install ansible[all] for that.
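A minimal sketch of what such an extras table could look like. This is hypothetical: the ansible package defines no such extras today, and the per-collection PyPI distribution names below are invented for illustration, as are `EXTRAS` and `requirements_for`.

```python
# HYPOTHETICAL: the "ansible" package defines no such extras today, and the
# per-collection PyPI distribution names below are invented for illustration.
EXTRAS = {
    "cloud": ["ansible-collection-amazon-aws", "ansible-collection-azure-azcollection"],
    "network": ["ansible-collection-cisco-ios", "ansible-collection-arista-eos"],
}
# "all" is the union of every other extra, de-duplicated and sorted.
EXTRAS["all"] = sorted({req for reqs in EXTRAS.values() for req in reqs})


def requirements_for(extra):
    """Return what `pip install ansible[<extra>]` would pull in under this scheme."""
    return ["ansible-core"] + EXTRAS[extra]


print(requirements_for("network"))
```

A table like this could be passed to setuptools as `extras_require=EXTRAS`, but it would depend on every collection being published as its own Python distribution, which is an assumption here.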

Anyway, just my opinion.

@felixfontein
Contributor

@dmsimard I always wonder when seeing such download indicators how many of these come from CI systems (and which percentage comes from Ansible collection CIs :) ), and how many are actual users.

@dmsimard

dmsimard commented Mar 22, 2022

@dmsimard I always wonder when seeing such download indicators how many of these come from CI systems (and which percentage comes from Ansible collection CIs :) ), and how many are actual users.

It would be interesting to know, yes.
The numbers include CI and there is no meaningful way to differentiate "manual" installations from automated ones. This is mentioned in the pypistats FAQ.

IMO the absolute numbers aren't as relevant as the trends and the differences between packages and versions.
Edit: by that, I mean I wouldn't go around saying ansible has 5 million users if we have 5 million downloads.

Maybe 50% is CI, who knows, but it averages out so you can have a general idea.

@kuwv

kuwv commented Mar 23, 2022

@dmsimard I always wonder when seeing such download indicators how many of these come from CI systems (and which percentage comes from Ansible collection CIs :) ), and how many are actual users.

Well, Zuul uses Ansible as its CI/CD system so...

https://zuul-ci.org/
https://opensource.com/article/20/2/zuul

@dmsimard

@dmsimard I always wonder when seeing such download indicators how many of these come from CI systems (and which percentage comes from Ansible collection CIs :) ), and how many are actual users.

Well, Zuul uses Ansible as its CI/CD system so...

https://zuul-ci.org/ https://opensource.com/article/20/2/zuul

I am familiar with Zuul, though I wonder how much of an impact it would have on the numbers considering that, at least for the openstack/opendev community, they run local mirrors.

@maxamillion

Copying my Reddit comment over here:

No matter what, I still see great value in a kitchen sink, because being able to install Ansible, then run a playbook, no additional steps, is pretty huge for me, and very beginner friendly.

If eventually I'd have to introduce execution environments, dependency management, etc just to introduce a new user to Ansible automation... they'd more than likely stick to shell scripts or raw python/perl/whatever because those just work and you don't have to cosplay as a node.js dev just to get your first automation running.

I think it becomes too big when it takes more than a minute or two to install on an average (100 Mbps?) internet connection on an average computer (eg i3 or MacBook Air type computer).

I think we need to be careful with how far we swing the pendulum to comfortable/existing Ansible users' use cases, unless we're happy with the size of the Ansible community and want to slow down growth any more.

From a technical standpoint, I would rather there just be core + collections. I was also an early advocate for Drupal doing something similar with a "smallcore" initiative, where all modules were ripped out of Drupal core. Drupal ultimately decided against that but had its own different issues with growth. In either case there's a huge tradeoff between maintenance burden and beginner-friendliness.

+1 to all of this. I firmly agree with @geerlingguy

Ansible got to where it is today in terms of popularity and organic usage growth via the kitchen-sink package. I think a large part of that is because the first-time user has a very low barrier to entry, and having baked-in support for a bunch of things the user cares about definitely contributes to that first impression and ease of use. Ansible Core is great, wonderful, and powerful but without a bunch of content it doesn't actually do a whole lot for functional automation that users know and love. The migration to Collections was necessary, I don't question their merit, nor would I attempt to debate that here. However, I think it would be a grave detriment to Ansible as a whole, the community, the userbase, and the project to get rid of the ansible package or to cripple it going forward. The broad support provided by the kitchen sink package has a catalyst effect: people find Ansible because it supports their thing, whatever that might be.

Option 1 is my preferred option

Option 2 shouldn't even be considered, this is intentionally stopping growth and it feels strange to discuss it

Option 3 has the potential to be a slippery slope: the criteria for what is or isn't important or widely used enough to make the cut are hard to quantify, because we can't know what specific thing might spark a new user's interest. If implemented correctly, though, this could be a reasonable option

For whatever it's worth, that's my $0.02

@jctanner

Ansible got to where it is today in terms of popularity and organic usage growth via the kitchen-sink package.

I'd love to see some data to support this assertion. From my perspective and own research with publicly available data (at the time), the vast majority of playbooks use a small subset of modules and those were the ones we decided to leave in core.

https://tannerjc.net/ansible/galaxy.html

I'm sure non-public playbooks could paint a slightly different story though, so I'm open to being re-educated. I'm just not a fan of false correlations to ansible's success based on emotion.

My vote obviously is for option 4 as proposed by sivel ...

"ACD" as it was originally called, is unnecessary. I've seen multiple arguments made in this ticket and elsewhere that users have "difficulty" knowing what collections they need. Wouldn't those same users also have difficulty installing the python and system package dependencies for those collections they supposedly know nothing about? Sure execution environments help there ... but only in the context of modules that aren't run on remote systems. The execution environment doesn't magically ship python and system deps onto the remote machines for all those modules. At some point, the users have to understand what their playbooks are doing, what modules/collections are in use and how to provide the right dependencies for those plugins on the control host or all the remote hosts. There is no escaping that reality.

The way things are going now [pardon my slippery slope] in 20 or 50 years, the ansible package will be a museum of modules that configure systems that are no longer manufactured, for companies long defunct and for technology nobody cares about anymore. If you all want to persist in maintaining a dumping ground for irrelevant tech and trying to maintain compatibility with dead systems, by all means. I hope you have the time and the budget to carry all those dead python libraries into the future.

@geerlingguy

The way things are going now [pardon my slippery slope] in 20 or 50 years, the ansible package will be a museum of modules that configure systems that are no longer manufactured, for companies long defunct and for technology nobody cares about anymore.

That's one take on it—the other take is look at how colossal Microsoft has become, maintaining support for things that are 20-30 years behind the times. And think of networking deployments, where I still see ancient Cisco switches in use from the late 90s ;)

Breaking existing use cases and backwards compatibility are two ways to quickly disenfranchise users. I won't say 'don't do this, it's definitely the wrong thing', but I will say my prediction is if a batteries-included Ansible distribution goes away, growth will slow more rapidly than it already has.

That might not be a bad thing, from the perspective of the business goals for Red Hat / IBM / Ansible, especially if the team of paid developers would mutiny were it not to happen (that's a serious consideration—Ansible as a whole dies if the ansible-core team disbands). But I do have a pretty strong feeling that it will slow adoption of Ansible where it's not already in use.

@maxamillion

maxamillion commented Mar 23, 2022

Ansible got to where it is today in terms of popularity and organic usage growth via the kitchen-sink package.

I'd love to see some data to support this assertion. From my perspective and own research with publicly available data (at the time), the vast majority of playbooks use a small subset of modules and those were the ones we decided to leave in core.

https://tannerjc.net/ansible/galaxy.html

I'm sure non-public playbooks could paint a slightly different story though, so I'm open to being re-educated. I'm just not a fan of false correlations to ansible's success based on emotion.

I have no real desire to inflict emotion on it, but my point is that Ansible got to where it is as a batteries-included tool. This attribute is also something that Ansible Inc and Red Hat post-acquisition spent a considerable amount of marketing dollars highlighting. My point was simply, the thing that exists as kitchen-sink is literally the thing that got us where we are today. I don't want to decide to throw it away with limited data sets that enforce a narrative that could be flawed purely because we don't have the right metrics. It could also be correct, but I don't know and I don't think we have enough data to know for certain. If we screw this up, it could be to the detriment of Ansible as a whole.

I worry "what's in Galaxy as re-usable generic roles" is potentially selective sampling compared to what would actually be in a playbook to handle the more grungy aspects of the logic required to set up site-specific requirements.

My vote obviously is for option 4 as proposed by sivel ...

"ACD" as it was originally called, is unnecessary. I've seen multiple arguments made in this ticket and elsewhere that users have "difficulty" knowing what collections they need. Wouldn't those same users also have difficulty installing the python and system package dependencies for those collections they supposedly know nothing about? Sure execution environments help there ... but only in the context of modules that aren't run on remote systems. The execution environment doesn't magically ship python and system deps onto the remote machines for all those modules. At some point, the users have to understand what their playbooks are doing, what modules/collections are in use and how to provide the right dependencies for those plugins on the control host or all the remote hosts. There is no escaping that reality.

This isn't a new problem though, this has been a known quantity since the dawn of Ansible. It's not a net new challenge that would be solved by throwing away the kitchen-sink, the same can be said for modules in Core... it's just a smaller set of things that need dependencies.

The way things are going now [pardon my slippery slope] in 20 or 50 years, the ansible package will be a museum of modules that configure systems that are no longer manufactured, for companies long defunct and for technology nobody cares about anymore. If you all want to persist in maintaining a dumping ground for irrelevant tech and trying to maintain compatibility with dead systems, by all means. I hope you have the time and the budget to carry all those dead python libraries into the future.

This is a completely fair concern.

@jctanner

jctanner commented Mar 23, 2022

I worry "what's in Galaxy as re-usable generic roles" is potentially selective sampling

Fair point. My report also included every single repo on github that included what looked like either a playbook or a role. Your point still stands though and that's why I mentioned "publicly available" data.

This isn't a new problem though, this has been a known quantity since the dawn of Ansible. It's not a net new challenge that would be solved by throwing away the kitchen-sink, the same can be said for modules in Core... it's just a smaller set of things that need dependencies.

Very true, and for the most part, modules in core typically only require dependencies already found on the system. Core also has a deprecation path that is used quite often to purge old features/functions. When something isn't useful anymore or it goes against the grain of how core wants something to work, it can be taken out.

We went through the debate over "kitchen sink" numerous times during the collection migration journey. It was heated and contentious and most of the time, it felt like only core wanted it to happen. We even had a group or set of people or person(s) who were insistent that we couldn't leave -any- modules in core so that it would be "crippled" without at least one collection, thereby people might be more inclined to pay or acquire a fully assembled distro/solution.

What I learned through that process, is that there are -many- individualized definitions of what "ansible" is.
If we were to ask Red Hat customers or sales reps, they'd often say it's the thing you point and click on, aka tower or awx or controller. People on github might say it's the thing you git clone or pip install or yum install and then run via ansible/ansible-playbook. The internal community team might say it's the ecosystem of modules (or now collections). In any case, all those different definitions give a different foundation to how those same groups might define ansible's "success".

I know that from the beginning of the company, Ansible's marketing exploited the module count as a growth hacking metric. I don't know whether that was because they didn't understand enough to talk about other things or thought their audience wouldn't understand any of the other cool stuff that got into ansible, but it was an over-valued metric in my opinion.

I wonder if the linux kernel would have been served well by having marketing announcements that only touted the number of new drivers included, but that's an aside.

I also experienced first hand the way including new modules in core was used as political capital to make inroads into other groups inside Red Hat. It led to strife over whose modules got merged and whose didn't and whose marketing deadlines would be met. It slowly became a choice the core team wasn't allowed to make anymore, even though the only choice being made was "I can't test this proprietary (or paid for) thing in CI, so how can I bring it into the project?". Marketing became the governing body of the project.

On top of all that, it was constantly assumed that the core team was responsible for any of the modules that didn't work. It didn't help that at Red Hat, we used to have posters on the wall that said "we ship it, we support it". It was a very untenable situation. Things had to change, and the move to collections was the first time we had a real opportunity to do something about it. Could have done the same thing with roles (minor technical issues aside), but there was too much resentment of roles due to the "1000 nginx roles" controversy.

All in all, I see ACD aka "ansible" as a continuation of what some people think "ansible" is and all of the same problems. I highly commend the community team and the volunteers for how they have curated the collection of collections and tried to ensure some standards. Although, I don't personally think the standards will ever be high enough to weed out the cruft that is going to recreate the "1000 nginx roles" problem. I asked in a community meeting once, what criteria could actually cause a module or a collection to be taken out of the distro, but there was no answer. I really wish the community at large could see the problems like I do, but I know that's not realistic. I think it would be incredible though if less effort was spent on growing the module count, and instead more effort was spent on new language features for core and its related commands (ansible-playbook, ansible-galaxy, ansible-test, etc). Again, I think the community folks have been awesome with helping fix bugs in core in relation to collections.

@briantist

@jctanner you certainly have a perspective most of us don't and that's valuable; I appreciate the peek into how things were within Red Hat and how that may have affected Ansible.

But I don't see how you can reasonably presume this statement if you've been involved in the community recently.

The way things are going now [pardon my slippery slope] in 20 or 50 years, the ansible package will be a museum of modules that configure systems that are no longer manufactured, for companies long defunct and for technology nobody cares about anymore. If you all want to persist in maintaining a dumping ground for irrelevant tech and trying to maintain compatibility with dead systems, by all means. I hope you have the time and the budget to carry all those dead python libraries into the future.

We've talked about removing collections from the package for a while, and as we've gotten to the point where we literally have broken / abandoned collections, we are filling in the details and preparing to do exactly that.

Although today isn't the first meeting where it has been discussed, it was the major topic of today's meeting, spurring follow-up issues and reviving discussions in existing ones:

I find the idea that we're all going to inexplicably support and look to keep decades of accumulated cruft for long-dead companies and technologies kind of silly; I haven't seen anyone in the community trying to argue that we should keep unmaintained or broken collections in the package like some kind of technology hoarder.

Although, I don't personally think the standards will ever be high enough to weed out the cruft that is going to recreate the "1000 nginx roles" problem.

Putting aside the fact that roles lend themselves to being opinionated which naturally leads to many variations, I think the package is just about the best option we have for preventing this problem.

I think contributors are far more likely to want to add something to an existing collection that's included, than they are to publish a new one or contribute to a collection that's not included in the package.

@maxamillion

@jctanner I appreciate your candor and willingness to share some history for the broader community to consider, you have brought forth extremely valid concerns and highlighted the nuance of the module count being a magical metric that is unknown whether or not it actually contributes to success. Also, the ambiguity that has been introduced to the brand "Ansible" because some marketing team(s) chose to put the name on many things has absolutely conflated the issues outlined above.

I also think there needs to be a proper deprecation process for the kitchen-sink to remove things. Something as simple as health metrics on the code could work: if stale code keeps working, it's fine to leave in, but if non-working code is abandoned, it is deprecated after some accepted period of time. That's likely a bigger discussion for somewhere else though.
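
As a rough illustration of the health-metric rule described above, here is a minimal Python sketch. Everything in it is hypothetical: the `CollectionHealth` record, the `should_deprecate` function, and the 180-day grace period are assumptions made up for the example, not an existing Ansible policy or tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CollectionHealth:
    """Hypothetical health record for one included collection."""
    name: str
    tests_passing: bool               # does the code still work?
    last_maintainer_activity: date    # last sign of life from maintainers


def should_deprecate(
    health: CollectionHealth,
    today: date,
    grace_period: timedelta = timedelta(days=180),  # assumed value
) -> bool:
    """Apply the rule: working code stays, abandoned broken code goes."""
    if health.tests_passing:
        # Stale but working code is fine to leave in.
        return False
    # Broken code is deprecated only once it has been abandoned
    # for longer than the accepted grace period.
    return today - health.last_maintainer_activity > grace_period
```

The point of the sketch is only that the rule needs two inputs (does it work, and is anyone still around) plus an agreed grace period; what counts as "working" and how long the period should be would be for the community to decide.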

This all brings up a very interesting question that I think the Ansible Community should somehow quantify and answer for our collective selves, "what does success of Ansible mean and what is the desired future of the project?" in terms of what metrics matter, what are we attempting to accomplish, what's the future beyond just the ansible pypi module state?

If we can answer the larger grand questions, then that can advise what should be done here in an objective and informed way. It's entirely possible the scope of the question, "what is the future of this software package?" is too narrow.

@Manuelraa

Manuelraa commented Mar 28, 2022

Just a quick note thrown into the round. My company needs and relies on "site local installs", which means caching/provisioning of packages.
So actual installs never go through the internet.

Ansible Galaxy itself doesn't offer a convenient way to proxy such installs for us. Only scripted provisioning of collections is possible, which makes keeping versions up to date cumbersome.

Overall, it should be easier to point package installation at a proxied repository (e.g. a remote location containing the tar.gz files),
similar to pip's find-links functionality.
Better than vendor lock-in.
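
To make the find-links comparison concrete, here is a minimal Python sketch of the kind of scripted provisioning described above: it scans a local directory of collection tarballs and picks the newest one for a given collection. The function names are hypothetical, and it assumes tarballs follow the `namespace-name-version.tar.gz` naming with plain `major.minor.patch` versions (pre-releases are ignored). It is not a feature of ansible-galaxy.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed filename layout: <namespace>-<name>-<major>.<minor>.<patch>.tar.gz
TARBALL_RE = re.compile(
    r"^(?P<namespace>[a-z0-9_]+)-(?P<name>[a-z0-9_]+)-"
    r"(?P<version>\d+\.\d+\.\d+)\.tar\.gz$"
)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '4.10.0' into (4, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def find_latest_tarball(
    directory: Path, namespace: str, name: str
) -> Optional[Path]:
    """Return the highest-versioned tarball for namespace.name, or None."""
    best_path: Optional[Path] = None
    best_version: Optional[Tuple[int, ...]] = None
    for path in directory.iterdir():
        match = TARBALL_RE.match(path.name)
        if match is None:
            continue  # not a collection tarball in the assumed layout
        if match["namespace"] != namespace or match["name"] != name:
            continue  # a different collection
        version = parse_version(match["version"])
        if best_version is None or version > best_version:
            best_path, best_version = path, version
    return best_path
```

The returned path could then be handed to `ansible-galaxy collection install`, which accepts a path to a collection tarball. Having to maintain a helper like this per site is the overhead the comment above is pointing at.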

@nirik

nirik commented Mar 29, 2022

My 0.002 cents: I think a collection of active/popular/highly used/curated collections is very helpful for the time being, especially for new ansible users. I do think there needs to be a process to add/remove things from it to keep it useful.
That said, I think eventually most users are going to move to a model of using ansible-core + just the collections they need and someday a large meta collection will not be as important. As to when that might be I am not sure. I'd also say there's been a lot of change in the ansible packaging and setup in recent years and many users are growing weary of all the changes, so another reason it might be good to avoid more changes now.

@felixfontein

Here's a link to the Reddit discussion btw: https://www.reddit.com/r/ansible/comments/tjfbqx/future_of_ansible_package/

@apple4ever

(Posting from Reddit as well, because I think it's important to have it here).

I heavily agree with @geerlingguy

In the 7 years I've been using Ansible, having the batteries-included approach has made it so much easier to just start using it and keep using it.

I'm now trying to introduce it at my current company, where they have a custom written config management (written back around 2008ish). Having the batteries included option is a big selling point to them.

So my preference is #1, and #2 is right out, with #3 as a distant second.

@Andersson007

Thanks everyone for your great feedback! The Community and Steering Committee will continue to include new collections!
