Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support a new YAML format for template settings #269

Closed
wants to merge 2 commits into from

Conversation

pfmoore
Copy link
Contributor

@pfmoore pfmoore commented Aug 21, 2014

This is the initial version of support for a YAML format settings file, to supersede cookiecutter.json. Before committing, I would like to refactor this further, to make the "internal" format completely encapsulated in settings.py so that client code need have no knowledge of how the settings are stored. But the PR is sufficiently complete that it's ready for initial review.

At the moment, it's only the context data that is in the settings, so the refactoring won't actually impact much, but it establishes the principle that following changes should follow. The main change at this stage will be in the tests, which currently know far too much about how the settings/context is stored.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.31%) when pulling bbab3e4 on pfmoore:new_config_format into a944be3 on audreyr:master.

@pfmoore
Copy link
Contributor Author

pfmoore commented Aug 21, 2014

The refactor is done for now. I'm aware that the function names in the settings module are a bit verbose. This is actually somewhere that I think would benefit from a class - used purely as a stateless user defined container, not as any sort of complex OOP design. I know @audreyr has explicitly disallowed class-based design in the core committers guide. I've assumed that prohibition would apply here - am I right?

To be clear, all I'm suggesting is an API that looks a bit more like:

settings = load_settings_from_file('name/of/file')
context = settings.context()
version = settings.version()

Nothing more complex than that - just saving a bit of typing and making the internal layout of the settings object explicitly opaque. But I won't do this without explicit approval, I just wanted to clarify the intention here.

Apart from this question, I think this is ready to go. The hard work of adding new features to actually use the flexibility can then start ;-)

@pfmoore pfmoore added this to the 0.8.0 - Extra Context milestone Aug 21, 2014
@khorn
Copy link

khorn commented Aug 22, 2014

This looks pretty good, though:

  1. As I mentioned in New config file format #249, maybe version should be a top-level key in the new format.
  2. What happens if you pass a settings object generated from a json file to get_version_from_settings? Wouldn't it return 2? And shouldn't it return 1? Or did I miss something?

Other than that, the settings code looks pretty good to me, though I haven't actually, you know, run it or tested it or anything. 😉

(Also, I didn't look too hard at the tests)

@khorn
Copy link

khorn commented Aug 22, 2014

@pfmoore Am i right in thinking that you intentionally left out something like get_config_from_settings?

with io.open(name, encoding='utf-8') as f:
# The JSON file just contains the context.
context = json.load(f, object_pairs_hook=OrderedDict)
settings = get_default_settings(version=1)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@khorn Here's where a JSON-format file gets a version of 1.

@pfmoore
Copy link
Contributor Author

pfmoore commented Aug 22, 2014

@khorn JSON files get a version of 1 in the line I pointed out with a line note. I see where your confusion comes from though. I'll see if I can modify the code to make it clearer what's going on - maybe apply the default in the loading code and not in the get function.

And yes, the lack of get_config_from_settings is deliberate. I'm expecting to add things like get_encoding_from_settings and get_root_from_settings as needed, so the fact that they are stored in a single config element is hidden from client code.

@khorn
Copy link

khorn commented Aug 22, 2014

@pfmoore Aha. That makes sense now.

@pfmoore
Copy link
Contributor Author

pfmoore commented Sep 11, 2014

@audreyr, @pydanny, @michaeljoseph - any comments on this PR? In particular, I'd like a ruling on whether having a settings object (the alternative API I mentioned above) would be acceptable.

This is blocking a number of other changes that depend on extra template settings, so it would be good if we could get it in.

@pfmoore
Copy link
Contributor Author

pfmoore commented Oct 5, 2014

@audreyr is there anything I can do to move this on? Could you comment on whether you prefer the function based API (which I think runs a risk that people will access the settings data structure directly) versus an explicitly encapsulated object API?

If you prefer the functional API, I'll look into adding some extra comments to the file and the docs to make it clearer that the settings data has to be treated as completely opaque outside of settings.py.

ctx = settings.get('context', {})

# Use the default_context argument to override values in the context
if default_context:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not obvious that a method called get_settings would also be responsible for merging context.
I would suggest moving the rest of this method to the calling code in cookiecutter.main

@michaeljoseph
Copy link
Contributor

@pfmoore, can you articulate why you want to prevent the settings data structure from being accessed directly? I can't see the benefit...

I'm 👎 on introducing a class for this (but I do think that stateless classes could be an exception to the rule, where it makes sense)

Otherwise, this pull looks good.
Needs a merge.

@pfmoore
Copy link
Contributor Author

pfmoore commented Oct 5, 2014

The idea behind making the settings opaque is that I want to protect us from needing to update all of the internal code if we introduce a new format. If we access the settings object directly, then the code is closely coupled to the data format, which is bad, as it means that a format change will potentially require extensive code changes. Consider for example if we need to add a new level of nesting in the file format. Ideally, we should be able to do that without having to modify any code outside of settings.py.

The main point of having a class would be to avoid ugly function names like get_context_from_settings. I'm pretty sure that someone will fairly quickly think the verbosity is unnecessary (like you did) and break the encapsulation.

If we don't mind close coupling of the file format with the code, this all goes away. Just do yaml.load on the config file, and change the code whenever the layout changes. But clearly I'm -1 on that.

I'll rebase the PR when I get a chance (or is merge considered better practice here?).

@michaeljoseph
Copy link
Contributor

🔔 merge is probably safest, but since (probably) no one else has your branch, I'm ok with rebasing.

@pfmoore
Copy link
Contributor Author

pfmoore commented Oct 23, 2014

@michaeljoseph Sigh. I've just had a look at this. It clashes hard with the "extra context" change, which is to "allow for injection of extra context via the Cookiecutter API" (from the history file). But this patch removes the generate_context function, which basically contradicts the whole idea that it's part of the "Cookiecutter API".

@pydanny and @audreyr both commented in the discussion on PR #260 that there were risks of a clash with this PR, and wanted to consider the changes carefully. It looks like the clash noted above was missed. It certainly never occurred to me that generate_context was a fixed part of the API, and I missed the implication while we were discussing #260 (my bad, I got distracted by the encoding and testing changes).

I'm tempted to say we drop this whole PR for now. My motivation for doing it was

But to be honest, I'm not likely to get the time or motivation to push this PR through and then go through the same process with the others as well, any time soon. So rather than leave this hanging, I feel like it would be better to close the PR and leave the idea for someone else to pick up.

Comments?

@michaeljoseph
Copy link
Contributor

Yeah, agreed, closing.

@michaeljoseph michaeljoseph removed this from the 0.8.0 - Extra Context milestone Oct 24, 2014
@pydanny
Copy link
Member

pydanny commented Nov 8, 2014

@pfmoore, I'm really sorry about not being able to properly address this pull request, not to mention some of your other incredible work. We've been very busy, but that's no excuse for not even explaining that we didn't have time to look at your work. I know an apology is only worth words, so let's see what I can do to fix things.

(This apology is also directed to @michaeljoseph, the two of you are awesome).

Going forward I have a bit of time to spend on cookiecutter. You might have noticed my activity on the project and the release of 0.8.0. If possible we want to push out 0.9.0 quickly, and continue this path of frequent, small, atomic releases.

Therefore, what we would like to do is focus an entire release on this feature (config changes) and related pull requests. We would focus 100% on this feature change (only bugfixes allowed otherwise). Personally I would love to see 0.10.0 focused on this effort (1.0 being when we drop support for Python 2.6).

This kind of focus is important because some of these new features touch on a lot of things.

If you want to work this feature (config changes in 0.10.0), then we'll (me, @michaeljoseph, @audreyr, et al) will focus on speedy, but thoughtful responses.

I hope this makes up for everything.

One more thing, the same idea of focusing on a particular feature for a release goes for other efforts championed by @michaeljoseph.

@pydanny pydanny reopened this Nov 8, 2014
@pfmoore
Copy link
Contributor Author

pfmoore commented Nov 8, 2014

@pydanny thanks for the reply. I appreciate your comments. I think the most crucial quiestions for this change are:

  1. Is it acceptable that we decouple the user interface (the cookiecutter.json/cookiecutter.yaml file layout) from the internal representation? I doubt this is controversial, but I'd like to be sure.
  2. Is it acceptable to use a class to encapsulate the internal settings format, (a) to avoid client code relying on the exact details of the internal format, and (b) to avoid long, ugly function names that no-one will want to use?
  3. The concept of a "cookiecutter API" came up in the discussion - what is that API? I don't see it documented anywhere, and I assume it's obvious that we can't assume that everything is fixed in stone.

I guess the whole point here is, I'm a huge fan of encapsulation - code should know as little as possible about the internals of other parts of the codebase, and all interaction should be via clearly defined interfaces. That doesn't necessarily mean class-based design, but exposing generic data types like dictionaries can make it needlessly hard to ensure encapsulation.

At the moment, the only clear boundary that matters to "ordinary users" is the CLI and the spec for writing templates. All other boundaries are only relevant to users of the code - i.e., the core developers. A programming API for cookiecutter implies we need to define further user-level interfaces.

All other (internal) encapsulation only matters to the core developers, and people who work on the cookiecutter codebase. But such internal interfaces are useful (IMO) because they act as clear agreements on how closely parts of the code are coupled. At the moment, things like the context dictionary are used throughout the code, and closely couple everything. I'd like to fix that. In the context of this change, we have to decouple the user input from the internal object (that's the point of the change!) and so I'd like to take the opportunity to enforce that decoupling internally at the same time.

Enough talk. I don't want this to be an endless debate, so if you can give me a general indication that the above ideas aren't totally unacceptable, and introducing a settings class won't get the change instantly rejected, then I'll put together a new patch (I'll probably just overwrite/rebase this one, so we don't need a new PR) and we can discuss an actual implementation.

BUT - can I please ask people to hold back on other changes once the new PR appears - the PR is bound to affect big chunks of the codebase, and it'll be hard to move this forward if I keep having to track conflicting changes hitting master.

@pfmoore
Copy link
Contributor Author

pfmoore commented Nov 8, 2014

Just to clarify my question about the API. The only documented function is cookiecutter.main. But there are a number of aspects of the code where functionality exists that is never used by the cookiecutter command. For example, generate_context decides on the name of the top level element of the context based on the name of the context file. But that file is only ever cookiecutter.json. So all that complexity is unused.

Are there 3rd party users of that functionality? Or, to put it another way, if I can no longer support that (or choose to not support it, or to support it differently) with this change, will anyone be affected?

context = generate_context(
context_file=context_file,
template_settings = get_settings(
name=context_file,
default_context=config_dict['default_context']
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, that instead of replacing the generate_context method here, you should be calling your new method here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But the whole point is that I want to isolate the management of the context from the reading of the file. generate_context only returns the context, so how would I retain all of the rest of the data from the settings file? Under the new format, the settings file will contain a lot more than just the context, and should not be managed by the context related functions.

I want one set of functions that reads the settings, and provides an API for client code to read the bits they need. Nothing outside of the settings module should be reading the file, or worrying about which bits of the settings object are the context, or anything like that.

generate_context is a bad function in the new model, because it does both reading of the settings file and managing of the context (merging the user config and extra_context data etc).

I may not have fully understood all of the implications of how the context works (as I said in the discussion about the cookiecutter API, there's a lot of complexity in here that's not used by the command line interface, and I don't know what use cases are as they aren't documented anywhere. Maybe we'll need some sort of compromise solution, but I don't know what until I know what can be refactored, and what is a user-visible API.

@michaeljoseph michaeljoseph added this to the 0.10.0- Config changes milestone Nov 9, 2014
logging.debug('context_file is {0}'.format(context_file))

context = generate_context(
context_file=context_file,
template_settings = get_settings(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am 👍 on encapsulating Templates as a module/class, that would allow us to write different template definition implementations (like Python?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I think we're mostly agreeing, but I think you might not be expecting to split things down as finely as I would. I was looking at things like:

  • The template settings (context, encoding, top-level object(s), etc)
  • The reader (filesystem, git, hg, and we could add more like Python or zipfiles)
  • The renderer (Jinja, but simple templates might not need that - also "none" would be useful for binary files)
  • The template - but this just contains a settings object and list of files linked to their renderers. The template's purpose is mostly about organising calling functions in the right order.

A reader would produce a template object, and then "rendering" the template would go through its files rendering each in turn with its associated renderer.

At the moment, that's more complexity than we really need. I'd rather introduce the separation as needed - so at the moment we're working on the user interface for the settings, so it makes sense to encapsulate that. When we want to add a new reader, we factor out the reader functionality then. Similarly when we want to add renderers other than Jinja, we refactor that part.

@pydanny
Copy link
Member

pydanny commented Nov 10, 2014

Just a note that we'll be digging into this massive conversation tomorrow. I can't wait until 0.10.0 when we get to factor it in. 😄

@pydanny
Copy link
Member

pydanny commented Nov 11, 2014

Whew. Have a whole day in one place. With broadband internet. So lovely! Okay, to answer @pfmoore's questions:

.1. Is it acceptable that we decouple the user interface (the cookiecutter.json/cookiecutter.yaml file layout) from the internal representation? I doubt this is controversial, but I'd like to be sure.

I don't see why not.

.2. Is it acceptable to use a class to encapsulate the internal settings format, (a) to avoid client code relying on the exact details of the internal format, and (b) to avoid long, ugly function names that no-one will want to use?

Right now the policy for classes is "classes only for exceptions and in tests". Therefore, is there a way that a dictionary could be used instead of a class?

Note: I personally think classes are quite dandy, but I recently assembled a project that used decorators, partials of decorators, and callables as arguments in both the decorators and partials of decorators. So what do I know?

.3. The concept of a "cookiecutter API" came up in the discussion - what is that API? I don't see it documented anywhere, and I assume it's obvious that we can't assume that everything is fixed in stone.

We actually use a number of the cookiecutter functions (i.e. the 'API') outside of cookiecutter. We aren't the only ones. That said, which functions are being used in this way isn't documented. I'll assemble some notes, be it a pull request or a simple gist. From there were can document a formal user backend interface.

IMPORTANT As soon as we release 0.9.0 and move towards the 0.10.0 release, we are on feature freeze in order to make @pfmoore's effort to get this pull request (or better yet, a series of smaller pull requests for this issue) into master easier. Small bugfixes for minor issues may be accepted, as well as emergency items (rare). Why? Because I think we all agree we would rather be coding than merging and rebasing.

@dnephin
Copy link

dnephin commented Dec 4, 2014

+1 , I'd like to be able to use yaml instead of json

Not being able to comment things in the config (with json) is unfortunate

@msabramo
Copy link
Contributor

👍 for YAML config

Let me know if I can help move this along.

@pydanny
Copy link
Member

pydanny commented Jun 6, 2016

I'm sad to close this PR, because @pfmoore deserves so much respect. However, #743 moves us away from YAML. From there we can explore other formats and techniques for configuration.

@pydanny pydanny closed this Jun 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants