New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
No user interface to remove old cookbooks. #954
Comments
Something to limit the impact to Berkshelf of having many, many old releases all still available in Supermarket would be awesome. 👍 |
Also covered in chef/chef#3334 |
Copied from chef/chef#3334: This probably has the same problem that yanking gems has. If someone has their environment locked in Berksfile.lock and regardless of how you feel about it the cookbook is working for them, and you then yank it, then you potentially break their production deployments (suddenly, at 3am on a friday night their local time, etc). You might mark a cookbook so that it will not depsolve and be offered in the universe but it must still download to not break existing Berksfile.locks. |
This would be great as an alternative. |
So what is the "impact to berkshelf"? Why do we need to fix this? Just "having lots of old versions" is a bad reason to solve this because you can never fully get rid of them because of the yanking problem. If its largely a cosmetic issue then this issue should not get fixed because its going to be a yak with little practical payoff (or possibly disastrous payoff if it gets implemented incorrectly). If there's perf issues, there may be some other solution like having the server had down a pruned subset of the universe. |
I think there are two issues at play here. Regarding the impact to berkshelf, I've seen a number of circumstances where removing a cookbook from Supermarket and reuploading only the latest version (not my preferred solution, but I've definitely seen people do it) actually causes Berkshelf to successfully do dependency resolution, whereas before the change, it would just eventually time out and fail to ever resolve dependencies. Making additional versions available to Berkshelf appears to cause the search space for possible solutions to grow by orders of magnitude. I also think we may still want to yank a cookbook version under specific circumstances. Imagine a scenario where the openssh cookbook accidentally opens SSH for the root user with an empty string for the password. You know that no user will ever actually want that, and it's probably an issue where it's better to cause dependency resolution to fail than to supply an outdated version. I could see this even for something as simple as a cookbook that accidentally leaves in a In almost every scenario (Friday, 3am, etc), they will still have a version of the cookbook on the chef server. We aren't depriving someone of the use of an extant cookbook simply by making it no longer listed in the Berkshelf endpoint.
That would be a great solution. I want to limit the old versions available for depsolving, not necessarily make old versions unavailable in environments that are already version locked. Perhaps there's even something that can be done when feeding the constraints to gecode that would fix this particular timeout issue. Berkshelf issues (either timing out or simply failing outright) have been the major drivers I've seen for people wanting to remove old cookbooks. |
So your first perf problem is likely more involved than just having additional cookbook versions available. Where I've seen that before is when the cookbook has a dep on itself and every version of itself satisfies its own circular dependency. That could be addressed by making it illegal to upload a cookbook that depended on itself or stripping the self-dependency on the fly. I should actually commit a foodcritic test at least to start removing that problem. You are right that if a truly pathological cookbook is uploaded that there's a case for really yanking it. That should be reserved for truly stupidly dangerous code that code published or for secret key material that got accidentally published though. The problem is that if people don't understand the implications of yanking and then yank cookbook versions it will cause chaos. And I think you don't appreciate the impact if someone has a continuous delivery system or autoscaling in which case their depsolving may change unexpectedly or their Berkshelf.lock files (or Policyfile.locks in the future) may not be able to resolve. People who are building pipelines should be able to pick up a lock file that worked yesterday and apply it to production today without having the cookbook disappear and breaking their pipeline (in this case it doesn't necessarily break prod, but it breaks the ability to promote to prod which could bring deployments to a halt). I'd like to see the issues that were causing Berkshelf (and modern berks 3.x berkshelf with gecode, not the 2.x ruby depsolver) to timeout on depsolving and see what kind of pathological graphs were there. I strongly suspect that there's more going on than a lot of old versions (and pruning the old versions may fix the issue, but that doesn't mean that old versions were to blame for the root cause). |
And actually let me state that clearer: As long as people think that yanking cookbooks is the solution to long berkshelf depsolving timeouts, without any proof that it really is so, then implementing yanking is the LAST thing that we should do because that will guarantee that people who do not understand the implications of yanking cookbooks will start yanking them a lot, which will cause chaos. So, a BIG 👎 on implementing this until we actually dig into the depsolving issues you mentioned and understand them. |
Never open up a footgun manufacturing shop in the land of the people with the giant feet. |
Well, the berkshelf issue aside, I'd argue there's still a case for yanking, for the pathological case.
I feel that any depsolving or re-downloading cookbooks during autoscaling or CD is a really dangerous thing to do, in general. I do autoscaling with chef today, and I don't re-run any parts of Berkshelf just to add capacity or deliver software. It's just too risky. It's a constant battle today to keep it flowing (gem yanking breaks it, as you mention). I've designed my production workflows not to depend on Supermarket's availability/reachability (ditto for rubygems.org). I reserve depsolving and re-uploading cookbooks for CI, and everything after CI is just an artifact that does not change. It feels like we should recommend that people download cookbooks, and lock both the versions and the actual cookbook artifacts, at CI time, if they are concerned about stability (which you have to be in CD or Autoscaling). Yanking should not break that workflow. |
That's a good and paranoid way to design it, but I guarantee that people won't do that, and yanking cookbooks will lead to breakage in locked versions of cookbooks disappearing. |
added Foodcritic/foodcritic#328 to start addressing auto-dependency problems |
Reviving this discussion: What is being requested: Many users have requested the ability to "yank" a certain version of a Chef cookbook from the public Supermarket. Potential Problems: As soon as a cookbook version is uploaded to Supermarket, there is the potential that someone will have downloaded and used said cookbook version. If someone were to yank a version of a cookbook because they accidentally uploaded security credentials (people have accidentally uploaded security credentials in cookbooks they uploaded to Supermarket multiple times this year), it could create a false sense of security. Additionally, if someone is depending on the version of the cookbook, yanking it could break their stuff. Potential Solution:
I am absolutely open to discussion on this, but would like to decide yes or no within the next two weeks so we can either move forward or shelve this with clear reasons why. |
Hi @nellshamrell!
Do you have a link to that PR?
👍 I think your items 1, 2, and 3 applied together seem reasonable. As I mentioned in a previous comment, yanking versions would be useful for a number of reasons, even if dangerous to do without understanding the implications. I'd like to allow/enable the feature, and then make strong recommendations about how to protect yourself from mis-use. As well, your proposal would stop people from deleting the whole cookbook and then re-uploading only the latest version, which I see happening not-infrequently today. We had someone on IRC in #chef the other day who actually did this, and then lost the cookbook name when someone else claimed it in between his delete and re-upload (we should remove the ability to delete a cookbook, or have both delete and yank, so that users aren't having to choose something worse, and have, IMHO, a very not-delightful experience). |
Just make certain that the warning for item 2 is exceptionally clear. I got burned very badly by yanking an ohai gem version awhile back. I got a warning about it being bad practice but I thought to myself "Yes, but I'm a professional software developer, and I have $REASONS" and so I went ahead. The chaos that ensued as every Gemfile.lock that had it pinned exploded caught me completely off guard as to how bad it was. The warning should clearly state that it will break anyone's production deployments who has that version in a Berksfile.lock, pinned in an environment file, or used in a compiled Policyfile.lock.json, and that cleaning up old versions on the public supermarket is highly discouraged. |
The reason why being able to yank things is needed is that the resolver does an exhaustive depgraph to figure out what to pull in, if you have too many versions it simply times out... |
I don't think this is strictly true, and I think the statement needs to be as accurate as possible. I would change 'will break' to 'very likely to break' and I would also add something about "when using public Berkshelf endpoints for dependency solving and downloading." Or I'd say "will eventually break" if you don't want to have the context of Berkshelf caches, public Berkshelf endpoints, etc. |
That IS NOT a reason to yank supermarket versions, that is a reason to fix the dep-solver: @danielsdeleo asked people to test that patch in the berkshelf thread and then got crickets back and it dropped off his radar (plus Policyfiles are using a different depsolver and gecode is going to eventually go away, so that is ultimately going to be time wasted to fix how we're using gecode). Fixing depsolver bugs via pulling publicly released versions of an artifact when you have zero visibility into if someone is using them or not is an extremely bad approach and why I'm starting to lean back to 👎'ing this feature again. |
Yeah, okay so it won't break your production converges when you have your own copy of the cookbook uploaded to the chef-server. But if you ever want to run test-kitchen against pinned environments then it'll break. As a user running the command you have no visibility into if people are pulling equality pinned versions of that cookbook down, it is a VERY anti-social command to run. We also have actually no way to audit this or provide any information either since berks just pulls down /universe and we never see the dep constraints. |
wasn't stating that the dep solver shouldn't be fixed |
Here is the pull request to knife supermarket that I mentioned: chef-boneyard/knife-supermarket#16 |
I'm starting to realize we're talking about a few different issues in this thread:
|
ya, sorry for the distraction on 2, it was simply the pain point that caused the need for yanking in my experience. |
The problem is that you didn't have a 'need' to yank cookbooks. That is precisely the road that I went down with yanking the ohai gem. I had a very clear 'need' (so i thought) but in retrospect that was entirely the wrong solution. The fact that you keep on talking about 'needing' to yank cookbook versions to solve the problem you have just continues to reinforce to me how dangerous this feature is. And yes, yanking entire cookbooks is clearly horrible as well. But arguing we should allow yanking cookbook versions because we can yank entire cookbooks is like arguing the morality of cluster munitions because we already have nuclear weapons... |
Actually @nellshamrell slightly more constructively than arguing about removing cookbook versions... The rubygems community has talked about a better API than "yank" where it is just omitted from the rubygems depsolving, but can still be fetched if its already solved in a Gemfile.lock Instead of dropping the row out of the table for the cookbook version and purging it, we could add a flag which only has the effect of omitting it from /universe. That solution would mitigate all the issues that I have with it. While it would still let authors 'tidy' up the cookbook versions and would even 'solve' the depsolving issues by allowing removing old versions--without causing chaos. |
I like that solution 👍 |
I'm a fan. 👍 |
This just came up again in the rubygems context where celluloid 0.16.1 was yanked from rubygems. I just very nearly watched someone do a 'bundle update' to update their Gemfile.lock which pulled 0.16.1 from their gemset in their ruby which then changed their Gemfile.lock which then very nearly got uploaded to a project and would have badly broken the build. In the cookbook/Berkshelf work this will work similarly since the ~/.berskhelf cache will have cookbook versions in it which could have been removed from supermarket, and which will work locally but which will be broken on a clean system which needs to pull from supermarket. You basically have a massive, distributed cache invalidation problem which goes completely unaddressed when you delete cookbook versions. On the public supermarket, it should likely never be allowed to happen in any form, and cookbooks and cookbook versions should be considered like erlang variables and never deleted -- only marked 'hidden' to prevent depsolving/UI clutter. |
We will revisit this issue in 2016. |
Thanks @nellshamrell! 👍 |
Another good example of why cookbook version deletion should not be allowed, this time coming from the npm world: http://www.theregister.co.uk/2016/03/23/npm_left_pad_chaos/ The issue around the 'kik' module there is a legit legal problem. There needs to be a mechanism to delete cookbooks for legal reasons, but the supermarket admins can take care of that. The fact that the user then went and nuked all his modules including 'left-pad' is an interesting abuse of module deletion. Regardless of how anyone feels about the action that user took, if you put yourself in the feet of a system admin trying to keep their sites up and their CI/CD process humming smoothly having a random software dev delete critical modules that you depend on would come as a bit of a shock. The rust/cargo community also defines 'yank' similar to my proposal for 'hide':
The rust/cargo developers also handle a legal situation like the 'kik' one via administrative intervention and deliberately do not provide a command:
|
Just wanted to capture the fact that based on skimming that npm-left-pad-chaos thread that for security reasons I think we have a hard, absolute requirement that we MUST NOT implement a user-accessible delete function on supermarket. There is a risk of denial-of-service from a "bad actor". The people who use Hosted and Supermarket and pay for that service expect to be able to run production infrastructure without it being taken down. While running a private supermarket with an internal cache of cookbooks is obviously the 'Enterprise-class' way of doing this, we still cannot expose paying Hosted customers to the risk that any random supermarket dev with rights to a popular cookbook can yank it. There's also a security issue in that if cookbooks can go away or be replace or mutate that actual bad actors could hijack cookbooks and inject malicious code into production systems. This is not usually a problem with cookbook since it requires a cookbook both becoming popular (through good acting) and then maliciously injecting code (through bad acting). But if popular cookbooks could be completely removed and their namespaces obtained and new versions uploaded, then actual attackers could cheaply take over popular cookbooks and inject code without having to go through the time consuming process of building trust in the first place. This is a real risk in the case of the left-pad-chaos since it looks like npm is pretty much a wild-west and the user pulled the plug on over 250 modules leaving all kinds of gaping holes that attacks could upload code into to exploit. The ability for a supermarket admin, with privs, to nuke cookbooks and cookbook versions via the UI is probably a useful feature, that would allow private supermarket admins to delete versions -- but exposing that feature to users I don't think can ever be allowed to happen. |
I feel like there's a subtle distinction between removing a cookbook from Supermarket and having it removed from the Chef server that your nodes depend on. The two comments above make them sound like the same thing, but actually, removing from Supermarket does not take the code away from any Chef server that already has it. I feel like we should be sure to make that explicit in the debate, since production infrastructure depends on the cookbook artifacts being on Chef server, not on Supermarket. Unless I've missed something obvious here about the architectures people are using? (entirely possible) As far as the 'line in the sand', I'm in favor of a yank/hide exposed to users, but reserving a true delete for Supermarket admins. I think that means I agree with @lamont-granquist 👍 |
Problem is that CI/CD and TK may be considered "production services" and taking them down may be unacceptable and they may depend directly on supermarket cookbook availability. |
I'd think if CI/CD caught an upstream issue (issue = missing cookbook), they're working as intended, and catching a breaking change. I don't expect upstream to ensure my own CI/CD pipeline is working. I'll leave it alone, since I think we agree on the details, but I feel like this is probably an interesting question worth exploring overall in the scoping & constraints of Supermarket as it grows/changes/evolves. |
I've witnessed people being fired over stopping the developer workflow inside of a company. |
Bumping this as we have users running into this. Currently both public and private Chef server allows Expectation: Also, to @nellshamrell list of points, I do not think deleted version should be blocked from reuse due to use case below Use case: (Pending testing command in 1st comment against latest version of supermarket.) |
+1, at least for onprem supermarket. Not having control over my own supermarket is frustrating, especially when someone makes a mistake and the fix is either version bump it and all of its contingent cookbooks or simply undo the version and release the fixed version. Also, the documentation on https://docs.chef.io/plugin_knife_supermarket.html#unshare does not work. Simply errors out when I try to remove a version. |
again 'hide' handles the 'oops this is a bad version do not use it' use case, and does not have the sharp edges of 'delete'. |
Yeah, you will need the nuclear option of really deleting a cookbook, for cases like putting credentials in the code, but that's about it and it should be restricted to a few admins. Deleting the artifact because it had a bug just causes a ton of extra pain. |
Good summary on why |
Noting that it was Deciderated in RFC72 Artifact Yanking how Supermarket should proceed with artifact removal/hiding. I'll move this issue up in the queue for us to consider next week for the next round of work. |
OK, cookbook removal is disallowed now, but I don't see a command for hiding versions.:
So, how the version could be hidden if needed? |
@voroniys Hiding cookbooks as described by the RFC has not been implemented. It is not clear at the moment when the feature will be available. |
would be nice to be able to have a checkbox or something on each entry of the drop-down containing the list of versions of the cookbook along with the ability you delete selected versions.
Apparently this is supposed to work from command line.
Also, this might be a stretch, but having a max number of cookbook versions be something we set could be good. This could help with berkshelf's dep resoultion problems (seems to do an exhaustive search). Might help with berkshelf/berkshelf#1356
The text was updated successfully, but these errors were encountered: