
How to use Berkshelf to manage Organization repo like the Librarian-chef does? #535

Closed
millisami opened this issue May 18, 2013 · 77 comments

Comments

@millisami

I'm a huge fan of Berkshelf; I've released a few community cookbooks using it, and it's awesome.

Now, I'm starting a new chef project and I went ahead with Berkshelf for this too.

But I'm running into some confusion and difficulties using it for the project.

Following is in the Berksfile:

site :opscode

cookbook 'mediawiki', github: 'millisami/chef-mediawiki'
cookbook 'sp-mediawiki', path: 'site-cookbooks/sp-mediawiki'

I've generated my application cookbook inside the site-cookbooks folder.

When I do berks install, it errors out:

An error occurred while reading the Berksfile: no metadata.rb or metadata.json found at \
/Users/millisami/Code/chef-sp/site-cookbooks/sp-mediawiki

Now I'm wondering: where should I generate my sp-mediawiki application cookbook?

If I just create a new one with berks cookbook sp-mediawiki, it will look like a library cookbook.

This sort of flow works perfectly with librarian-chef, which I am using on another project.

So, I'm trying to draw a line:

  1. Berkshelf is good to develop individual cookbooks
  2. Librarian-chef is good to manage the top-level orchestration

Am I right or wrong? How do you folks use Berkshelf to manage your org's chef-repo?

@sethvargo
Contributor

@millisami If you have a Berksfile at the top of your chef-repo, you can use Berkshelf to manage all your cookbook dependencies for you.

For example, let's say I'm writing an application cookbook that depends on the apache2 cookbook. I would add to my Berksfile:

site :opscode

cookbook 'apache2', '~> x.x.x' # optional version constraint

And run the berks install command to install this cookbook. Because it's a library, it's installed on your machine "somewhere" and you shouldn't bother finding it. Now, you generate your application-cookbook (let's call it my-apache2):

$ berks cookbook my-apache2

And this will create the skeleton for you. Then you can add apache2 as a dependency on this new cookbook in the metadata.rb:

name 'my-apache2'
# ... other metadata omitted
version '1.0.0'

depends 'apache2'

And your directory structure looks like:

chef-repo
  |  Berksfile
  |_ cookbooks
    |_ my-apache2

Notice the apache cookbook is not there. The library cookbooks all live in ~/.berkshelf/cookbooks, but you shouldn't worry about that. They are automatically pulled in and added to your path for you.

If another teammate wants to use your chef-repo, simply have them run berks install and all the necessary dependencies will be installed on their machine as well.

When you run commands like berks upload, Berkshelf will automatically find and resolve all the necessary cookbooks for you.

Does this make sense?

@reset
Contributor

reset commented May 19, 2013

@millisami it is also worth noting that we strongly recommend against managing your cookbooks in a single Chef repository. You can achieve this behavior as @sethvargo pointed out above, but it's a bad practice.

Each cookbook should have its own Git repository, build process, and test suite. Cookbooks should be treated as software projects of their own.

@sethvargo
Contributor

@millisami, @reset is correct. This is not a recommended workflow.

@millisami
Author

Thanks for the clarification.

So, the structure suggested by @sethvargo looks like:

chef-repo
  |  Berksfile
  |_ cookbooks
    |_ my-apache2

Alright, but does that mean my app deployment chef-repo structure can be:

chef-repo
  |  Berksfile
  |_ cookbooks
    |_ my-wordpress
    |_ my-rails
    |_ my-sinatra

Or does it just apply to library cookbooks?
If not, do I have to have a separate repo for each of my my-wordpress and my-rails cookbooks?

@sethvargo
Contributor

Yes, each cookbook should be its own repo, just like each RubyGem should be its own repo.


@reset
Contributor

reset commented May 21, 2013

@millisami the suggested structure is to completely remove the idea of putting cookbooks in your Chef repo altogether. Every cookbook would be contained within its own Git repository, and every cookbook has its own Berksfile.
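For illustration, a per-cookbook Berksfile in this layout can be very small. The sketch below assumes the cookbook's own metadata.rb lists its dependencies (e.g. depends 'apache2'); the `metadata` line tells Berkshelf to read them from there. `site :opscode` is the source syntax used throughout this thread; newer Berkshelf releases use a `source` line instead.

```ruby
# Berksfile at the root of the my-apache2 cookbook's own Git repository
site :opscode

# Resolve this cookbook plus everything its metadata.rb `depends` on
metadata
```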

From there we suggest creating a build job on a CI server for every cookbook. This job would test and then upload the cookbook it is managing to your Chef Server.

I'm going to close this ticket since it is not an issue. We can of course continue discussion here if you need more help!

@reset reset closed this as completed May 21, 2013
@millisami
Author

Alright, but still I'm not clear enough here.

Even if each application cookbook is separated into its own repo, I should still have one repo that orchestrates the other app/library cookbooks, i.e. one that holds a .chef/knife.rb with the Chef Server creds files.
So which cookbook would these creds go in?

Is there an example cookbook I could look at to get my head clear?

@tknerr
Contributor

tknerr commented May 22, 2013

Fully agree that each cookbook should live in its own git repo and I think @millisami does too.

But that does not mean that a chef-repo is useless or no longer needed - it's just that the cookbooks should not live in there but rather be pulled in via dependency management tools like Berkshelf or librarian. Do we all agree here?

What I'm missing both in Berkshelf and librarian is the ability to dependency-manage application (aka top-level) cookbooks, as they need special treatment imho.

Let's take a simple example:

  • given you have my-app-foo and my-app-bar as the two application cookbooks in chef-repo/Berksfile
  • in my-app-foo/metadata.rb you have depends 'apache2', '= 0.9'
  • in my-app-bar/metadata.rb you have depends 'apache2', '= 1.0'

As of today you cannot resolve the two application cookbooks in one chef-repo/Berksfile because of their conflicting library cookbook dependencies. But this limitation is totally artificial, because my-app-foo and my-app-bar are both application cookbooks, i.e. each one is installed on its own node, and thus they (and their library cookbook dependencies) are totally independent of each other.

I know that this is also a limitation of the chef-repo structure, which assumes a single consistent set of cookbooks in the chef-repo/cookbooks/ directory.
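To make the conflict concrete, here is a toy resolver in plain Ruby. It is not Berkshelf's solver: it borrows RubyGems' Gem::Requirement and Gem::Version as a stand-in for Chef's version constraints, and the list of available apache2 versions is hypothetical. Resolving both application cookbooks in one shared graph finds no version, while resolving each one separately succeeds.

```ruby
require 'rubygems' # Gem::Requirement / Gem::Version ship with Ruby

# Hypothetical apache2 versions available on the community site
AVAILABLE = %w[0.8 0.9 1.0 1.1].map { |v| Gem::Version.new(v) }.freeze

# Toy resolver: highest available version that satisfies every constraint,
# or nil if the constraints cannot all hold at once.
def resolve(constraints, available = AVAILABLE)
  reqs = constraints.map { |c| Gem::Requirement.new(c) }
  available.select { |v| reqs.all? { |r| r.satisfied_by?(v) } }.max
end

FOO = ['= 0.9'].freeze # my-app-foo/metadata.rb: depends 'apache2', '= 0.9'
BAR = ['= 1.0'].freeze # my-app-bar/metadata.rb: depends 'apache2', '= 1.0'

# One shared graph for the whole chef-repo: both constraints must hold together
puts "shared graph: #{resolve(FOO + BAR).inspect}"
# One graph per application cookbook: each node gets its own answer
puts "my-app-foo: apache2 #{resolve(FOO)}"
puts "my-app-bar: apache2 #{resolve(BAR)}"
```

Running it prints `shared graph: nil`, then `my-app-foo: apache2 0.9` and `my-app-bar: apache2 1.0`.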

@millisami: I think we have exactly the same problem. My current approach is to give dependency resolution of the application cookbooks in the chef-repo special treatment:

  • instead of chef-repo/Berksfile I have a app-cookbooks.yml file which specifies the app cookbook's git location and branch
  • then I use a small script to resolve the application cookbooks. It basically does this:
    • git clone -b <branch> https://your.gitrepo.com/<app-cookbook>.git /var/tmp/<app-cookbook>-<branch> (where <branch> is a version tag, e.g. v0.1, and <app-cookbook> is the name of the cookbook, e.g. my-app-foo)
    • cd /var/tmp/<app-cookbook>-<branch>
    • berks install --path /path/to/chef-repo/cookbooks/<app-cookbook>-<branch>

This assumes that you have a Berksfile (or Cheffile) in your app cookbook, which is fair to assume in this context I guess.
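The three steps above could be scripted along these lines. This is a hypothetical dry-run sketch, not tknerr's actual script: the app-cookbooks.yml keys (git, branch) are assumptions, and it only builds and prints the git/berks commands rather than running them.

```ruby
require 'yaml'
require 'shellwords'

# Hypothetical app-cookbooks.yml contents: git location and branch/tag per app
SAMPLE = <<~YAML
  my-app-foo:
    git: https://your.gitrepo.com/my-app-foo.git
    branch: v0.1
  my-app-bar:
    git: https://your.gitrepo.com/my-app-bar.git
    branch: v0.2
YAML

# Build (but do not run) the clone + vendor steps for one application cookbook.
# Each step is { dir: working directory or nil, argv: command as an array }.
def vendor_steps(name, spec, chef_repo)
  branch = spec.fetch('branch')
  checkout = "/var/tmp/#{name}-#{branch}"
  target = File.join(chef_repo, 'cookbooks', "#{name}-#{branch}")
  [
    { dir: nil, argv: ['git', 'clone', '-b', branch, spec.fetch('git'), checkout] },
    # berks runs inside the checkout so it uses that cookbook's own Berksfile
    { dir: checkout, argv: ['berks', 'install', '--path', target] }
  ]
end

apps = YAML.safe_load(SAMPLE)
steps = apps.flat_map { |name, spec| vendor_steps(name, spec, '/path/to/chef-repo') }

# Dry run: print each command; pass argv to Kernel#system to actually run it.
steps.each do |step|
  cmd = step[:argv].shelljoin
  puts step[:dir] ? "cd #{step[:dir]} && #{cmd}" : cmd
end
```

`berks install --path` is the Berkshelf 2-era flag used in this thread; later Berkshelf releases moved vendoring to `berks vendor`.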

Note that this is only half of the story - now you have a structure like this...

chef-repo
  |  app-cookbooks.yml
  |_ cookbooks
    |_ my-app-foo-0.1
       |_ apache2               (--> v0.9)
    |_ my-app-bar-0.2
       |_ apache2               (--> v1.0)

...and you need to adjust your cookbook paths (e.g. in knife.rb, Vagrantfile, solo.rb, etc..) accordingly to pick up only the specific dependencies per application cook.
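For chef-solo, that adjustment might look like the sketch below: a hypothetical solo.rb at the chef-repo root for nodes running my-app-foo, pointing cookbook_path only at the directory vendored for that application cookbook, so chef-solo sees apache2 v0.9 and never the copy vendored for my-app-bar.

```ruby
# solo.rb for nodes running my-app-foo (hypothetical; one such file per app)
# Only this application's vendored cookbooks: my-app-foo plus its apache2 (v0.9)
cookbook_path [File.join(File.dirname(__FILE__), 'cookbooks', 'my-app-foo-0.1')]
```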

While this works for me now, it seems that others are having the same problem. I don't like having a special hack that only works for me; I'd rather see this integrated in Berkshelf, especially as Berkshelf / @reset introduced the differentiation between application and library cookbooks, which is an essential one IMHO.

But: as of today, Berkshelf is good at resolving library cookbooks but lacks the capability to properly resolve application cookbooks.

Would you agree with this view?

@reset
Contributor

reset commented May 22, 2013

@tknerr what you've got is close, and if it works for you then stick with it, but the problems you are having are because you've missed an important point.

Cookbooks should not live in your Chef repository - at all. Your Chef repository should never contain a cookbooks or a site-cookbooks directory, even if it's automatically generated by a Berksfile. Your Chef repository should never contain a Berksfile, either.

There are a number of different ways to put things together, though, so you quite often see people who continue to place a Berksfile at the root of their Chef repository to resolve/vendor their cookbooks into their Chef repo. This idea needs to be let go of.

Cookbooks should have their own CI (build) process where they are tested, their version number is incremented, and then they are uploaded to your Chef server(s). The Chef repository is not a part of this process at all.
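As a rough illustration of the version-increment step, a build job could bump the patch level in metadata.rb before uploading. The helper below is a hypothetical sketch: it assumes a plain `version 'x.y.z'` line in metadata.rb, and the surrounding job (running the tests, committing the change, then `berks upload`) is left to your CI server.

```ruby
# Hypothetical CI helper: bump the patch level of the `version 'x.y.z'` line in
# a cookbook's metadata.rb. Raises if no such line exists, so the build fails
# loudly instead of uploading an unchanged version.
VERSION_LINE = /^(\s*version\s+['"])(\d+)\.(\d+)\.(\d+)(['"])/

def bump_patch(metadata)
  bumped = false
  result = metadata.sub(VERSION_LINE) do
    bumped = true
    "#{$1}#{$2}.#{$3}.#{$4.to_i + 1}#{$5}"
  end
  raise ArgumentError, 'no x.y.z version line in metadata.rb' unless bumped
  result
end

metadata = <<~RUBY
  name    'my-apache2'
  version '1.0.0'
  depends 'apache2'
RUBY

puts bump_patch(metadata)
```

Only the first matching `version` line is changed; the `depends` line is left untouched.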

Is there a reason in particular that you feel you need to vendor all of your cookbooks into a cookbooks directory in your Chef repository?

@sethvargo
Contributor

@reset I think @millisami and @tknerr are just looking for the answer to the question "where do I run berks" - if I'm understanding correctly.

Jamie is correct - it's ideal that each cookbook is its own repository and is tested, uploaded, and verified by a build server (Jenkins).

However, if you don't have such a pipeline established, (1) you should investigate doing so, because it's awesome, but (2) you can use the "monolithic" chef-repo pattern as a "management console". It would have a Berksfile, .chef/knife.rb, environments, etc. All of your cookbooks would be defined in the Berksfile:

cookbook 'a'
cookbook 'b'
cookbook 'c'
# ...

And each of those would point to an individual repository or path location. From here, you could use Berkshelf to package, upload, version, etc. Does that make sense?

There's no need to vendor the cookbooks.
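A "management console" Berksfile of this kind might look like the sketch below. The cookbook names a, b and c come from the example above; the repository locations are hypothetical, and each one points at a cookbook that lives in its own repository (github: shorthand, a git URL with a branch, or a local path).

```ruby
# Top-level Berksfile in the chef-repo (locations are hypothetical)
site :opscode

cookbook 'a', github: 'example-org/a-cookbook'
cookbook 'b', git: 'https://git.example.com/ops/b-cookbook.git', branch: 'master'
cookbook 'c', path: '../c-cookbook' # a local checkout of c's own repository
```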

@tknerr
Contributor

tknerr commented May 22, 2013

@reset: you are right, I can't let go of that idea of having it all in one place :-)

The main reason is that I want to describe my infrastructure in code, and the chef repo as the single point of entry seems to be a good place for it. Having a top-level Berksfile in the chef repo is nice as well, as it gives me an overview of which applications are part of my infrastructure, and it allows me to easily resolve them into the file system for inspection and for use as cookbooks_path in vagrant, knife-solo, mccloud, knife-ec2, etc.

> Cookbooks should not live in your Chef repository

...is a strong opinion - not sure if I agree with that yet, but I definitely have to consider and think (and sleep :-)) over it.

@sethvargo: I'm totally aware that the typical use case is to run berks * from within the cookbook's repo. I'm just thinking that having a top-level Berksfile in your chef-repo for resolving application cookbooks is a valid use case too.

> you can use the "monolithic" chef-repo pattern as a "management console"

I wouldn't call it "monolithic chef-repo pattern" as the cookbooks are not version controlled in the chef-repo - they are just pulled in as dependencies, but that's just wording I guess...

> And each of those would point to an individual repository or path location. From here, you could use Berkshelf to package, upload, version, etc. Does that make sense?

Nope, I wouldn't want to package or version cookbooks from here. That feels wrong to me and should be done in the individual cookbook repos. My main concern is to resolve / pull in the dependencies as specified in Berksfile and treat them as read-only. Uploading all cookbooks that are part of my infrastructure / chef-repo would be a valid use case too.

I think vendoring gems in a Rails app is a good comparison. I'm not into Rails, but I guess the motivation for vendoring in rails would be similar.

But coming back to your example Berksfile above @sethvargo: the crucial point is that currently this does not work properly:

  • what if cookbook a in its metadata.rb depends 'd', '=1.0'...
  • ...and cookbook b in its metadata.rb depends 'd', '=2.0'?

@ivey
Contributor

ivey commented May 22, 2013

Which is why you don't want to do what you're doing. If some portions of your infrastructure use 1.0, and some use 2.0, you can't just use one "for your infrastructure". You need both. Berkshelf can't fix that for you.

@tknerr
Contributor

tknerr commented May 22, 2013

@ivey: I partly disagree here

If we consider a and b to be library cookbooks referenced in some application cookbook's Berksfile, then I fully agree with you: a and b cannot have conflicting dependencies on d.

If we consider a and b to be application cookbooks referenced in an "infrastructure" Berksfile, then I disagree: a and b are totally independent of each other (i.e. they have completely separate dependency graphs) and represent the applications to be installed on two different nodes. In this case it is perfectly valid to use a different version of d on the a node than on the b node.

There's nothing Berkshelf has to fix for me, but it would be nice if Berkshelf would support this case, especially considering the fact that the application vs. library cookbook pattern became popular with Berkshelf originally.

To put it bluntly: why would I have the distinction between application and library cookbooks if I can not reap its full benefits?

@reset
Contributor

reset commented May 22, 2013

@tknerr I think that there is some confusion in the community as to what the difference between an Application cookbook and a Library cookbook is. Let me try to explain:

An application cookbook is a cookbook which installs and configures an application. my-webapp is an application, but so is NGINX. Just because the my-webapp cookbook might depend on the NGINX cookbook to setup an http proxy for it, doesn't mean that NGINX is a library cookbook.

A library cookbook is a cookbook which provides another cookbook with additional functionality through LWRPs, Definitions, and Libraries. A good example of a library cookbook is the database cookbook. Typically these cookbooks don't include recipes, and if they do, it's probably just duplicating work that another cookbook should be doing.

Does this clear anything up for you?

@tknerr
Contributor

tknerr commented May 23, 2013

@reset: thanks, that clears up a lot of things :-)

Yep, agree that there is some confusion or at least a different understanding and different terms being used in the community (application cookbooks a.k.a. role cookbooks a.k.a. top-level cookbooks?). Maybe this is because an "official" definition was never published (did I miss that?) - or it was published by different people with different interpretations... ;-)

I for myself have a strict definition, but different from yours (so I should probably stop calling it application/library cookbooks). For me the properties of the two types of cookbooks are:

  • my definition of application cookbooks:
    • represent a whole node to be installed/managed
    • very specific, not likely to be reused due to its focus on managing a complete system
    • "combines" a set of library cookbooks into a coherent whole
    • uses strict versioning of dependencies (e.g. "= 1.0.0"), including transitive ones(!), so that the exact same set of library cookbooks is used on this node, even years later
  • my definition of library cookbooks:
    • represent a single piece of software to be installed/managed on a node
    • generally useful, highly reusable due to its focus on managing a single piece of software
    • may depend on other library cookbooks
    • uses lax / optimistic versioning of dependencies (e.g. ">=", "~>", etc or no version constraint at all) so that it can be easily combined with other library cookbooks
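The difference between the two versioning styles can be sketched with RubyGems' Gem::Requirement, used here as a stand-in on the assumption that Chef's `=` and `~>` operators behave the same way for simple x.y.z versions. The release history is made up.

```ruby
require 'rubygems' # Gem::Requirement / Gem::Version ship with Ruby

# Made-up release history of some library cookbook
PUBLISHED = %w[1.0.0 1.0.1 1.1.0 2.0.0].map { |v| Gem::Version.new(v) }.freeze

# Which published versions a constraint would accept
def accepted(constraint, versions = PUBLISHED)
  req = Gem::Requirement.new(constraint)
  versions.select { |v| req.satisfied_by?(v) }
end

# Application-cookbook style: pin exactly, so the node never drifts
puts "= 1.0.0 accepts: #{accepted('= 1.0.0').join(', ')}"
# Library-cookbook style: optimistic, any compatible 1.x release is fine
puts "~> 1.0  accepts: #{accepted('~> 1.0').join(', ')}"
```

The pin accepts only 1.0.0, while `~> 1.0` accepts 1.0.0, 1.0.1 and 1.1.0 but not 2.0.0. Note that the pin only protects the whole node if the transitive dependencies are pinned too, as the list above points out.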

So by my definition, probably all of the community cookbooks are library cookbooks. I don't care whether they carry only libraries / LWRPs or recipes as well. The important distinction for me is that they are created to be reused, and usually no single one of them makes up a whole system.

On the other hand you have the application cookbooks to set up a very specific system by reusing as much as possible from the library cookbooks, but this system is so specific that it would unlikely be reusable in other projects / customers / companies.

P.S.: I don't want to convince anyone that one definition / view is better than the other, but I think it's good to make it explicit (and note to self: stop calling it application / library cookbooks)

P.P.S.: I slept over it and it became a bit clearer why I want to vendor my "application cookbooks" to chef-repo/cookbooks: I'm using chef-solo, thus I have no chef server, thus I don't use berks upload, thus I need the cookbooks on the filesystem => totally different workflow, not compatible with "The Berkshelf Way"

@reset
Contributor

reset commented May 23, 2013

@tknerr I did speak at ChefConf about these patterns and have been openly talking and writing about them for the last year or so, but I don't think there's an "official" definition. I don't think we'd ever see something like that since Opscode doesn't want to prescribe solutions for us.

This actually makes communication difficult within our community. When I use words which are labels on ideas that have been reified as something for me and I use those words to express something to you, things just end up coming out all crossed and confused. Do you think it would be helpful if I were to create some sort of blog post/article about the three patterns I went over at ChefConf? Maybe you could use that as a base to put your ideas on top of? I'm sure there are patterns which we have not yet discovered or the ones which we have may not be 100% fleshed out.

I sent you a response on the Chef mailing list regarding your use case. I don't actually think Chef-Solo is a non-starter for "The Berkshelf Way". I love Chef-Solo and use it every single day. I'd set up a Chef Server to act as nothing more than an artifact server for my cookbooks even if I weren't planning on ever using Chef-Client. It's just so easy to do.

@tknerr
Contributor

tknerr commented May 23, 2013

@reset thanks for the pointer, I should watch your ChefConf talk I guess :-)

Do you have a link to the slides as well?

Regarding chef-solo, I still need to vendor the cookbooks. This might be because I'm not only using Vagrant (which has Berkshelf integration) but also knife-solo and mccloud. For the latter ones I need to specify 'cookbooks_path' explicitly and can't rely on Berkshelf integration magic :-/

But that's probably very specific to my use case...

@reset Thanks for the clarification!
@millisami sorry for hijacking the issue - hope your initial problem got resolved?

@reset
Contributor

reset commented May 23, 2013

@tknerr you can see the talk here. The slides haven't been uploaded yet by Opscode. No problem man, I hope this info helps a bit even if you don't immediately use it or need it ;)

@tknerr
Contributor

tknerr commented May 24, 2013

@reset thanks! I have watched the talk / slides - very nice!!

The application / library / wrapper cookbook distinction got clearer to me and I think our mental models are not that far away from each other.

From your myface example it looks though that application cookbooks are the top-level cookbooks representing the whole system on a node. I like this definition because it gives a clear distinction between application and "other" cookbooks.

For the "other" cookbooks I'm still missing a good name. I initially thought that these must be the "library cookbooks" then, but I see now that library cookbooks are just a special case of "other"...

So much for my confusion.

Trying to find out whether our mental models have a chance to converge, I have two questions for you:

  • if "application cookbooks" were strictly defined as the top-level cookbooks representing a node - would that restricted view still fit into your world?
  • if so, would you have a good name for the "other" cookbooks that are being reused by the application cookbooks (and where "library" and "wrapper" are two special cases of)?

@tknerr
Contributor

tknerr commented May 24, 2013

P.S.: and heck yes, I think we should write a blog post, however that discussion ends ;-)

@millisami
Author

I had the same mindset that @tknerr described in his comment above.

@reset Yes, a blog post would be wonderful for settling what to call what.

@millisami
Author

I re-watched the Berkshelf Way talk: http://www.youtube.com/watch?v=hYt0E84kYUI

This time I'm pretty sure I understand what @reset was trying to say.
The myface cookbook had the components for load_balancer, app_server, etc., and it makes perfect sense to go that way.

But the limitation I found is that it only suits an instance serving a single app.

What if I want to host another app, say a PHP app or even another Rails app named your_face?

So in this situation, I have to create a new cookbook with berks cookbook your_face and configure it with the Chef Server creds again.

How should I use Berkshelf in such a scenario?

@reset I searched for the myface cookbook and found https://github.com/reset/myface-cookbook
But it doesn't have any of the recipes that were shown during the talk. Is it elsewhere?

@sethvargo
Contributor

@millisami - you can put a single knife.rb in ~/.chef/knife.rb with all your credentials and it will use those.

@millisami
Author

@sethvargo That seems pretty limiting for me, since I'm using one open-source Chef Server and two Opscode org accounts.
In that case, putting only one set of creds in ~/.chef/knife.rb isn't practical; I'd have to change that file frequently.

And @sethvargo, do you know where @reset's myface cookbook repo is?

@johntdyer

Check out knife-block; it allows you to have multiple knife configs.


@sethvargo
Contributor

@millisami To the best of my knowledge, the myface cookbook is fictitious. It's only meant to serve as an example.

I would look into what @johntdyer recommended with knife-block for using multiple knife configs.
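Since knife.rb is evaluated as Ruby, another option besides a plugin is a single knife.rb that picks its settings from an environment variable. A sketch, in which the profile names, user name, key file names, and server URLs are all hypothetical:

```ruby
# .chef/knife.rb (hypothetical profiles, user, key names, and URLs)
# Choose the server/org per command, e.g.
#   KNIFE_PROFILE=opscode-org-a knife node list
current_dir = File.dirname(__FILE__)
profiles = {
  'private-server' => 'https://chef.example.com',
  'opscode-org-a'  => 'https://api.opscode.com/organizations/org-a',
  'opscode-org-b'  => 'https://api.opscode.com/organizations/org-b'
}
profile = ENV.fetch('KNIFE_PROFILE', 'private-server')

node_name       'millisami'
client_key      File.join(current_dir, "#{profile}.pem") # one key file per profile
chef_server_url profiles.fetch(profile)
```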

@jgerry

jgerry commented Jul 31, 2013

I just wanted to say -- this is the best discussion of these ideas that I've been able to find anywhere. It should absolutely be a blog post somewhere. I may start writing one.

I'm looking at moving our Chef workflow into Berkshelf from a monolithic Chef repo, and I'm pretty convinced the cookbook-per-repo pattern will work for our needs. Cool stuff.

@RSO

RSO commented Aug 6, 2013

@jgerry I was thinking the exact same thing, but wouldn't a wiki on the berkshelf repository be more suiting?

@corbesero

I understand that. But I am often working in several environments and projects at the same time, each with its own Chef server. A single cookbook might be used in more than one of these environments. I have always worked out of the chef-repo to take advantage of its .chef/knife.rb to know which Chef server to manage. It seems to me that a cookbook in its own project space is (and should be) independent of a single Chef server.

@sethvargo
Contributor

@corbesero #580. How would librarian solve this?

@seanodell

Sorry for resurrecting this ancient thread, but this seems the most relevant place to ask.

I work with an org which does NOT use Berkshelf and has a lot of cookbooks in a single git repo. I want to use Berkshelf to cherry pick the cookbooks out of that repo. I've tried specifying both git: and path: in my Berksfile, but Berkshelf always balks at the repo not having a metadata.rb at the root. I've played around a bit, and I can't figure out how do this short of git cloning the repo down as a separate step and simply using path: to locate the cookbook.

I don't have a lot of say in how that org manages their repo, so rather than take on that battle, I was hoping Berkshelf could make this work. Any ideas?
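For this case, Berkshelf's git location accepts a `rel:` option naming the subdirectory of the repository that holds the cookbook, so the cookbook's metadata.rb does not need to sit at the repository root. A sketch, assuming your Berkshelf version supports `rel:` (the repository URL and paths are hypothetical):

```ruby
# Berksfile: cherry-pick cookbooks from one shared Git repository
# (repository URL and directory names are hypothetical)
site :opscode

cookbook 'company-apache2',
  git: 'https://git.example.com/ops/chef-cookbooks.git',
  rel: 'cookbooks/company-apache2'

cookbook 'company-php',
  git: 'https://git.example.com/ops/chef-cookbooks.git',
  rel: 'cookbooks/company-php'
```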

@yoshiwaan

I'm not sure where else to ask this question either, and considering this seems to be the place of the best discussion around this I guess I'll ask it here.

I'm fully on board with having cookbooks be isolated objects with their own dependencies, using Berkshelf and testing cookbooks in isolation, but my manager balked when I suggested having one repo per cookbook, citing his previous experience where managing multiple repos for related code proved to be a hindrance rather than an advantage (for general software development; he has zero experience with Chef). He's told me I need to convince him otherwise if I'm to do it with one cookbook per repo.

The other way around this that I can see is to simply use directories under the cookbooks repo to do the same thing. So you would have a structure like this:
cookbooks_repo
  |_ company-apache2        (wrapper cookbook)
     |  Berksfile
  |_ company-php            (wrapper cookbook)
     |  Berksfile
  |_ company-cookbook       (role-containing cookbook)
     |  Berksfile
     |  recipes/webserver.rb
     |  metadata.rb         (depends on company-apache2 and company-php)

The Berksfiles for the downstream cookbooks could reference the others by git ref, using the rel: option to point at a directory. Each cookbook would also have its own Vagrantfile for testing.

I guess the problem I'm having is that I don't really see the disadvantage of this myself, so can anyone help me understand why I should use one cookbook per repo instead of managing them in directories like this (while still treating them as essentially separate objects otherwise)?

I should also add that these are all internal cookbooks that are company specific, they would never be shared in the community.

@flatrocks

@yoshiwaan, despite all the cheerleading for one-cookbook-per-repo, the only undeniable advantage is that Berkshelf and other tools expect it.

I know where your boss is coming from... it is a PITA to deal with a bunch of repos when one would do just fine, especially during development when (despite best efforts) everything's changing a lot. Consider the extra workflow steps for tracking down and fixing a problem in a cookbook that's a dependency of another of your cookbooks. What could have been a single commit probably affects three repos, requiring an exponential(? this may not be a mathematically rigorous statement) amount of git activity.

But my guess is that as time passes, alternate tools will fall behind and you'll spend more and more time trying to make things work under any other architecture. There's a lot of money and manpower behind the tools for doing this the "official" way, and I think that's the best argument.

@reset
Contributor

reset commented May 4, 2015

@flatrocks It's not fair for me to come in and say this since I'm the principal author of Berkshelf and the person who pushed this pattern into the spotlight, but the fact that tools expect one repo per cookbook is not where the advantage lies. The tools were designed around this pattern for a reason to begin with.

@yoshiwaan @flatrocks cookbooks are software and, as such, should follow the same rules and principles that we employ in other software projects. Keeping all of your cookbooks in a single repository breaks the Single Responsibility Principle. It adds noise to the commit history and makes the assumption that all developers should be granted access to all cookbook projects.

A lot of the modern tooling makes the assumption you are coming from a software engineering background or that you are ready for a taste of it. Chef is a big toolbox, however, and you can put things together in any way that you would like. I personally would recommend sticking to the most travelled path; it works well.

@yoshiwaan

Thanks Jamie. I looked around for your official reasoning behind this and couldn't find it, so it's good to hear it from the hors... developer's mouth.

Access control is definitely something I'd not considered. When I sat down and thought about it in depth I came up with quite a few other reasons to use one cookbook per repository, most of them fairly pragmatic. The point @flatrocks makes was pretty much at the top of the list.

  • The tools work that way, so it's a path of least resistance once set up
  • There's more help available as more people have implemented this way
  • Your git tags match your cookbook version, plus there's tooling to help with that, e.g. https://github.com/RiotGamesMinions/thor-scmversion, where both your metadata file and build job can reference the version file
  • You can write generic build jobs for all cookbooks as they have the same structure; there are no unique references or paths. Plus, if there's a separate org in SCM to house cookbooks, you can have your build server build them all without having to manually add anything
  • At some point SCM tags/branches matching cookbook version metadata SemVer numbers might be usable to parse multiple versions of a cookbook. For example in a Berksfile you could point to a github repo for a cookbook and then any tag that matches SemVer could be parsed and understood to be a particular version of a cookbook.
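The tag-related points above can be checked mechanically. A hypothetical sketch: it treats tags shaped like x.y.z (optionally prefixed with v) as cookbook versions, ignores everything else, and checks that a tag agrees with the version in metadata.rb. RubyGems' Gem::Version stands in for Chef's own version type.

```ruby
require 'rubygems' # Gem::Version ships with Ruby

# Tags shaped like x.y.z, optionally prefixed with "v", count as versions
SEMVER_TAG = /\Av?(\d+\.\d+\.\d+)\z/

# All tags that look like cookbook versions, oldest first; others are ignored
def versions_from_tags(tags)
  tags.map { |t| t[SEMVER_TAG, 1] }.compact.map { |v| Gem::Version.new(v) }.sort
end

# Build-time check: does this tag agree with the version in metadata.rb?
def tag_matches_metadata?(tag, metadata_version)
  v = tag[SEMVER_TAG, 1]
  !v.nil? && Gem::Version.new(v) == Gem::Version.new(metadata_version)
end

tags = %w[v1.0.0 v1.2.0 release-candidate 1.1.3 v2.0]
puts versions_from_tags(tags).join(', ')
puts tag_matches_metadata?('v1.2.0', '1.2.0')
```

Here `release-candidate` and `v2.0` (only two components) are skipped, so the first line lists 1.0.0, 1.1.3 and 1.2.0.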

@reset
Contributor

reset commented May 5, 2015

@yoshiwaan I think you've got a great list there. @flatrocks definitely makes a great point that was worth explaining why a lot of tools are built the way that they are.

@bridiver

My big issue with the one-repo-per-cookbook approach is the sheer number of repositories that you now have to manage. It seems like a lot of added complexity, and after reading through I'm not seeing anything that couldn't be done just as easily with a single repo. If we keep them separate, now we have to set up CI, phabricator, etc. for every single one of them individually. We've looked at similar things in the past, like keeping application code in one repo vs. multiple repos with gems, and development and debugging quickly became a nightmare. Of course there are big advantages to being able to pull in git-based cookbooks, but within an organization splitting them out just seems to complicate development. If you need one somewhere else you can easily extract it, but why do that proactively if it may never be necessary?

@bridiver

I also don't agree with "Keeping all of your cookbooks in a single repository breaks the Single Responsibility Principle". We don't keep each class or even each module of our application code in its own repo, so why is it a problem here? I think applying the Single Responsibility Principle to a repository really stretches its meaning.

@sethvargo
Contributor

Hi @bridiver

I would recommend taking your thoughts to the Chef mailing list or opening a Chef RFC if you disagree with this approach. Thanks!

@reset
Contributor

reset commented Jun 25, 2015

@bridiver you don't keep a class or module in its own repository because it is a component of a project. Projects, like cookbooks, are umbrellas for those components. Best practices in version control suggest that you version things at the project level.

You can organize your projects however you'd like; you absolutely don't need to follow any documented best practices. Sometimes, by going off the well-travelled road, you'll land on something that becomes a best practice.

Three years ago that's what the Berkshelf project did. It won't ever support the single-repository pattern as a first-class citizen, because the tool and its maintainers don't believe that is a best practice. You can absolutely use the tool to manage your cookbooks however you'd like, though!

I'd recommend reading The Pragmatic Programmer, which was a big influence on me and helped direct the development of this and all of the other open source tools I've created or been a part of.

@bridiver

But that's my point. We have a bunch of cookbooks and they are all part of a coherent project and I think that's the case more often than not. It feels like this is taking a pattern that might be necessary for a very large organization with lots of different unrelated projects sharing cookbooks and access control issues and pushing it on everyone. Of course everyone is free to do their own thing, but I think this pattern should be considered in the context of your organization and not held up as the "best solution". It's easy to extract cookbooks into their own repositories down the road if that becomes necessary, but otherwise wouldn't it make sense to start with something easier to manage?

@reset
Contributor

reset commented Jun 25, 2015

@bridiver it may appear that they are all part of the same project, but they aren't. You have one cookbook which is part of that software project and then you have dependencies of that. The dependencies, like any other software library, can be consumed by other projects.

If you are coming at this from a systems administration perspective then it is very easy to see a mass of cookbooks as being "one project". They just aren't.

@bridiver

@reset that is simply one way to think about it and it's not realistic to say that you have the one right view that applies in every case. I am not coming at this from a systems administration perspective, I am coming at this with substantial experience as a software developer. It's fine for software to be opinionated, but IMO good tools allow you to swap out those opinions.

@reset
Contributor

reset commented Jun 25, 2015

@bridiver I think it's important to address the idea that "this is hard" or "we should grow into this slowly" before we move anywhere else with this discussion. I'm not part of a large organization by any means - we have 12 engineers. Versioning software with dependencies that include projects created internally to an organization is something that software engineers are very used to in 2015.

@reset
Contributor

reset commented Jun 25, 2015

@bridiver like I had said before, you are of course free to follow whatever you believe is the right path for your organization. You can absolutely use Berkshelf to manage a single repository - there are a number of blog posts and guides that you can follow on how people are doing this. We just aren't doing you any favors by making it a first class citizen.

@bridiver

@reset I'm not saying "grow into this slowly"; I'm saying that it's added complexity that is unnecessary in a lot of cases. Like anything else it has its advantages and disadvantages, and not everyone wants to make the same tradeoffs.

@yoshiwaan

As someone who went through this whole process and had to make decisions about how to implement it without fully understanding everything, I think I'm in a decent position to comment on this.

I ended up following the best practices recommended here, namely having a single repository per cookbook, and I'm finally seeing the full value of those decisions. There is definitely extra work up front, but now that it's behind me I just don't see how I could have ended up in such a good position otherwise.

The advantages for me are:

  • Development is easy
  • Integration with a build server is easy
  • I have tags to match cookbook versions
  • Everything gets tested in the right place

I didn't see the full benefit until I moved to using a build job for cookbooks. Because each cookbook is a single repo, I can use the same build script for all of my cookbooks, with the only variable being the repository to build. The same test suite (Foodcritic -> ChefSpec -> Serverspec) gets run every time for every cookbook. I also tag my repository with the current cookbook build version as part of the build, so I have an easy reference in GitHub for my versions.
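A minimal sketch of such a shared build job, written in Ruby. It assumes each repo's `metadata.rb` contains a literal `version 'x.y.z'` line and that the suites use the Foodcritic -> ChefSpec -> Test Kitchen/Serverspec chain named above. The function names and the `v` tag prefix are hypothetical:

```ruby
# Hypothetical generic cookbook build script. Because every cookbook repo has
# the same layout, the only input is the directory of the checked-out repo.

# Extract the version from metadata.rb text, e.g. a line `version '1.2.3'`.
def cookbook_version(metadata_text)
  m = metadata_text.match(/^\s*version\s+['"]([^'"]+)['"]/)
  m && m[1]
end

# The same pipeline for every cookbook: lint, unit tests, integration tests,
# then tag the repository with the cookbook version.
def build_steps(version)
  [
    %w[foodcritic .],                        # lint (Foodcritic)
    %w[rspec],                               # unit tests (ChefSpec runs under RSpec)
    %w[kitchen test],                        # converge + verify (e.g. Serverspec)
    ['git', 'tag', "v#{version}"],           # tag matches the metadata version
    ['git', 'push', 'origin', "v#{version}"]
  ]
end

# Run each step inside the cookbook checkout, stopping at the first failure.
def run_build(dir)
  version = cookbook_version(File.read(File.join(dir, 'metadata.rb')))
  abort "no version found in #{dir}/metadata.rb" unless version
  build_steps(version).each do |cmd|
    system(*cmd, chdir: dir) || abort("step failed: #{cmd.join(' ')}")
  end
end
```

A CI job would call `run_build` with the path of whichever repository it just checked out. That path is the only per-cookbook variable.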

When I want to deploy a new version of a cookbook, I make my changes and push to a certain branch. My build server then automatically does its thing, and I can go and do something else. If I want to get more clever in the future, I could have a successful upstream cookbook build trigger a build of my environment cookbook (a slightly different job) and have that available with a new version too; I have the option. Right now I do that manually when needed, and all the new changes applied by other cookbook builds get tested in one go when the tests for that cookbook run.

Technically this is 'added complexity', because behind the scenes there's plenty going on, and it was a good investment of time to set up. But in terms of ease of use, it's never been easier for me or anyone else here to develop a cookbook. Everything works in such a simple way from the developer's perspective, and I don't see how that could have been possible with a bunch of cookbook directories in one repository. That being said, a single repo can definitely work, and it's something I seriously considered; you can put up artificial walls so each cookbook is maintained individually. It just seems suboptimal in the long run, as you're essentially trying to do the same thing you can do with a single repo per cookbook, but less successfully.

The most difficult part of the whole process, at least when you're new, is understanding why you're doing what you're doing in the first place as it takes such a long time and an intermediate understanding of all the components to see any benefit.

@bridiver

That's interesting, because development is the part I really worry about getting complicated. Sometimes you have to iterate over cookbooks to sort out issues with your application tests caused by breaking changes in upgrades, configuration, etc., and it can take a lot of changes to sort out the issue. That's fairly simple when you don't have to commit things to test them. It gets tedious when you have to commit, tag, update dependencies, etc., because your cookbook tests won't catch things you didn't know you needed to test. I have seen plenty of examples of a Berkshelf workflow for developing a cookbook. The workflow for dealing with unexpected issues when running application tests against VMs configured with these cookbooks, though, seems like it would be cumbersome. Is there a workflow that doesn't involve repeated commits to the remote repo, or making manual changes and then sorting back through them to update the cookbook?

@yoshiwaan

I'm not entirely sure I get what you're asking...
I use Test Kitchen to iterate on a test instance of the same type as the one the cookbook will be deployed to, running kitchen on my local machine while developing. Running kitchen converge applies the current cookbook to the node, and Berkshelf pulls in dependencies as needed if the cookbook has a Berksfile. If you use the chef_zero provisioner you can replicate a chef-client run, and if you use a cloud service there are Test Kitchen plugins to spin up instances there.
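For the Test Kitchen loop described above, a single-cookbook repository's Berksfile is often just the minimal sketch below (the Supermarket URL is the public one). The `metadata` line tells Berkshelf to read the cookbook's dependencies from its own metadata.rb. When Test Kitchen's Berkshelf integration runs during `kitchen converge`, those dependencies are vendored without being listed twice:

```ruby
# Berksfile in the root of a single-cookbook repository
source 'https://supermarket.chef.io'

metadata  # resolve dependencies declared with `depends` in metadata.rb
```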

I only push something to build when I've sorted out the kinks.

Maybe when I say developing is easy I should say the whole flow of development to deploy is easy.

@bridiver

An example from the past was upgrading from Postgres 8.x to 9.x. There were breaking changes in how Postgres handled timezones and we didn't discover them until we ran the app tests against the VM configured with the updated cookbook. Figuring out what needed to change took some back and forth with changes to both the app code and the cookbook. Since we had everything in a single repo for chef, the whole workflow only involved local changes to the chef repo and the app repo and wasn't pushed out to either until everything worked.

@sethvargo
Contributor

@bridiver you might find this post interesting: https://sethvargo.com/berkshelf-workflow/, it details the workflow @yoshiwaan is discussing a bit more. I think it would suit your upgrade of postgres just fine.
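One way to get the local, uncommitted iteration described in the Postgres example is to temporarily point a dependency at a local checkout with Berkshelf's `path:` option. You switch back to a versioned source once the fix is released. This sketch assumes the application has a top-level cookbook with its own Berksfile; the cookbook names, directory, GitHub org and tag are all hypothetical:

```ruby
# Berksfile of a hypothetical top-level cookbook for the application
source 'https://supermarket.chef.io'

metadata

# While iterating on the Postgres upgrade, resolve the dependency from a
# local checkout with uncommitted changes instead of a released version.
cookbook 'postgresql', path: '../postgresql'

# Once the change is committed and tagged, go back to a pinned source, e.g.:
# cookbook 'postgresql', github: 'example-org/postgresql', tag: 'v1.2.0'
```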

@glenjamin

Sorry for dredging this ol' chestnut up again, I'm building up a new project and facing the how-many-chef-repos question - and this thread seems to be a font of knowledge on that front.

My context is a relatively small "ops" team ~5 people, with a medium size dev team (~40), building, deploying and running a co-ordinated suite of applications.

Things I'm sold on:

  • Having library cookbooks be in their own repo (ideally on github, or githubbable)
  • Having application cookbooks be in their own repo (ideally on github, or githubbable)
  • Having an "infrastructure repository" which contains data_bags, roles, and environment files

The bit I'm struggling with is how and where wrapper and environment cookbooks fit into this. At the moment I've taken the path of least resistance and included some cookbooks in the infrastructure repository, along with a Berksfile to pull in library and application cookbooks from Supermarket.
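A sketch of the Berksfile such an infrastructure repository might contain, mixing the three kinds of source described above. The cookbook names, org, tag and directory are hypothetical. Note that each `path:` cookbook still needs its own metadata.rb, which is exactly what the error at the top of this issue complained about:

```ruby
# Berksfile at the root of a hypothetical infrastructure repository
source 'https://supermarket.chef.io'

# Library cookbook resolved from Supermarket
cookbook 'nginx'

# Application cookbook living in its own repository, pinned to a tag
cookbook 'myapp', github: 'example-org/chef-myapp', tag: 'v2.1.0'

# Organisation-specific wrapper cookbook kept inside this repository
cookbook 'example_base', path: 'site-cookbooks/example_base'
```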

The cookbooks in the main repo are mostly about doing things specific to our organisation and tying those together into a whole ecosystem. This is the bit where I'm struggling to see how to represent this across many independently versioned cookbooks.

When producing a bunch of independent but related software applications, the master chef repo has always been the piece I've used to stitch them all together.

Can anyone point me in the direction of some good examples of this done well?

@tknerr
Contributor

tknerr commented Sep 10, 2015

@glenjamin my personal preference is to stitch the individual pieces of one application / server together in a "top-level cookbook" (terminology varies, it's probably the same as a "role cookbook").

This means in my infrastructure repository I can now use different versions of it:
https://github.com/tknerr/sample-infrastructure-repo/blob/master/Vagrantfile#L30-L31
https://github.com/tknerr/sample-infrastructure-repo/blob/master/Vagrantfile#L51-L52

Works for me but YMMV... ;-)
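A hypothetical sketch of such a top-level cookbook, assuming its dependency versions are pinned in metadata.rb. The cookbook names and versions are made up, and tknerr's linked repo may do this differently. Each release of the top-level cookbook fixes the exact set of cookbook versions for that kind of server, so the infrastructure repo only chooses which top-level version to use:

```ruby
# --- metadata.rb of a hypothetical top-level cookbook ---
name    'example_app_server'
version '1.3.0'

# Exact pins: the node's cookbook set only changes when a new
# version of this top-level cookbook is released.
depends 'example_web', '= 2.7.0'
depends 'example_app', '= 2.1.0'

# --- recipes/default.rb ---
include_recipe 'example_web::default'
include_recipe 'example_app::default'
```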

@glenjamin

@tknerr Thanks! That's a helpful example.

The part I'm still struggling to get my head around is when an application's sub-components are spread across many different nodes. My general preference would be to use search so the bits of my app can find each other. Is this still a recommended approach?

Common examples that spring to mind: collectd daemons finding their server, application servers finding the database cluster, Jenkins finding its slaves, and a load balancer with a bunch of otherwise unrelated microservices attached as independent vhosts.
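For reference, search-based discovery in a recipe usually looks something like the sketch below. It uses Chef's recipe-level `search(:node, query)`. The role name, file paths and template are hypothetical, and it assumes a Chef server (or chef-zero with node fixtures) to search against. Sorting the results keeps the rendered file stable between runs, so the template is not rewritten just because search returned nodes in a different order:

```ruby
# recipes/app.rb in a hypothetical application cookbook

# Find the database nodes in this node's environment via Chef search.
db_nodes = search(:node, "role:myapp_database AND chef_environment:#{node.chef_environment}")
db_hosts = db_nodes.map { |n| n['ipaddress'] }.sort

# Render the discovered hosts into the application's config.
template '/etc/myapp/database.yml' do
  source 'database.yml.erb'
  variables(db_hosts: db_hosts)
end
```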

@patcon
Contributor

patcon commented Feb 1, 2016

Here's an issue related to something librarian-chef can handle but Berkshelf can't at the moment, in case anyone here has any thoughts:

#1505: Allow using a test cookbook from a subpath of another cookbook

@Mister-Meeseeks

Mister-Meeseeks commented Apr 29, 2016

Sorry to bump this thread, but Google seems to have deemed it the de facto discussion forum for Berksfile best practices. I'm not sure whether I'm missing some functionality in Berkshelf or not grokking the underlying philosophy. (I don't have much Ruby/gem experience, so that probably doesn't help.) Let's say we have three cookbooks {X, Y, Z}. Z depends on Y. Y depends on X. Z has only a transitive dependency on X. However, cookbook Z's Berksfile must contain this:

cookbook 'Y', path: '/cookbooks/Y'
cookbook 'X', path: '/cookbooks/X'

If the reference to X is left out, Berkshelf raises a "missing artifact" error after it evaluates Y's metadata.rb. Nearly every major package manager evaluates dependencies recursively; anything else is a major violation of the DRY principle. What if the path to cookbook X changes? What if cookbook Y is modified and requires another dependency? Or we want to freeze Y's version of X? It requires editing not just Y's Berksfile, but also Z's. What if hundreds of cookbooks depend on Y? Should we really have to edit 100+ Berksfiles just to modify one library cookbook?

I don't think DRY is an absolutely inviolable principle. But not evaluating transitive dependencies seems like a pretty limiting handicap for a package manager, and one that doesn't seem to come with any substantial upsides outside a few corner cases. I feel like I have to be missing something obvious, right?
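A sketch of the usual way around this. It assumes Berkshelf 3 or later (multiple `source` lines) and a private Supermarket hosting X and Y, whose URL below is hypothetical. The "missing artifact" error shows that Berkshelf does follow the `depends` lines in Y's metadata.rb. What it does not do is honour location options such as `path:` or `github:` from a dependency's own Berksfile; only the top-level Berksfile's locations and sources are used. Once X and Y are published where a `source` can find them, Z's Berksfile no longer needs to name them:

```ruby
# Berksfile for cookbook Z
source 'https://supermarket.example.com'  # hypothetical internal Supermarket hosting X and Y
source 'https://supermarket.chef.io'

metadata  # Z's metadata.rb declares `depends 'Y'`; Y's metadata.rb declares `depends 'X'`
```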

berkshelf locked and limited conversation to collaborators Jun 16, 2017