
Add support for custom options #119

Open
adriangb opened this issue Jul 14, 2020 · 17 comments
Comments

@adriangb
Contributor

Custom options are a vanilla proto feature documented here. I personally have a use case for them to apply constraints, similar to protoc-gen-validate. I think supporting generic custom options would be a great addition to python-betterproto.

In order for this to happen, I see a couple changes that would be needed:

  1. The options proto needs to be compiled by protoc and imported into the namespace from which CodeGeneratorRequest.ParseFromString is called. This requirement is documented in protocolbuffers/protobuf#3321 (Python ParseFromString not parsing custom options).
  2. plugin.py needs the ability to parse the custom options and somehow store them.
  3. We'd need to figure out how to use the custom options: Python has no concept of applying "options" (aside from comments). There are many mechanisms that could be employed (getters/setters, pydantic, comments, etc.). My feeling is that to cover all use cases, plugin.py would need to be made extendable so that individual implementations can decide how to parse custom options and how to use them in the generation of the output .py files.
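For point (1), here is a minimal sketch of that import-order requirement. The `my_options_pb2` module name is hypothetical (it stands in for whatever module protoc generates from the options proto):

```python
import sys

from google.protobuf.compiler import plugin_pb2

# Hypothetical: importing the compiled options module registers its extensions
# with the default descriptor pool. Without this import, ParseFromString
# silently drops any custom options it encounters (see protobuf#3321).
# import my_options_pb2  # noqa: F401


def read_request() -> plugin_pb2.CodeGeneratorRequest:
    """Parse the CodeGeneratorRequest that protoc pipes to the plugin."""
    request = plugin_pb2.CodeGeneratorRequest()
    request.ParseFromString(sys.stdin.buffer.read())
    return request
```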

Is there any interest in this? I'd be willing to try and work on something.

@adriangb
Contributor Author

Separate comment to keep the first post clean.

Towards point (3) I started working on a refactor of plugin.py to make it object oriented since I think that would make it easier to extend/modify. Something like:

@dataclass
class Message:
    """Representation of a protobuf message."""

    parent: Union["OutputFile", "Message"]  # quoted: forward references
    proto_obj: DescriptorProto
    path: List[int]
    fields: List[Union["Field", "Message"]] = field(default_factory=list)

    @property  # really this should be functools.cached_property but that's >=3.8 only
    def proto_name(self) -> str:
        return self.proto_obj.name

    @property
    def py_name(self) -> str:
        return pythonize_class_name(self.proto_name)

    def process_options(self, ??) -> ??:
        pass

Where the parent attribute allows recursively retrieving an OutputFile object from the field level (to add imports, etc.), and the process_options method could be overridden to implement handling of custom options.
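To illustrate, a self-contained sketch (simplified names, not the actual proposal) of how that `parent` chain could be walked up to reach the `OutputFile`:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class OutputFile:
    imports: List[str] = field(default_factory=list)


@dataclass
class Message:
    parent: Union[OutputFile, Message]

    @property
    def output_file(self) -> OutputFile:
        # Walk up the parent chain until we reach the file-level object.
        node = self.parent
        while not isinstance(node, OutputFile):
            node = node.parent
        return node

    def add_import(self, module: str) -> None:
        # Any nested message/field can register imports on its output file.
        self.output_file.imports.append(module)
```

Usage: a message nested two levels deep can still call `add_import`, and the import lands on the shared `OutputFile`.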

@boukeversteegh
Collaborator

Hi Adrian!

Options and extensibility

Yes! It's a great idea, and supporting extensions to betterproto via options would be a welcome addition.

  1. I haven't had time to get into and properly understand the option parsing process, but if betterproto needs to do a bit of extra work to make it happen, I think that's fine.

  2. & 3. Gotcha. I agree that interpreting the options should not be betterproto's job, because that would mean introducing a meta-language on top of Protobuf, which is beyond the scope of betterproto; but ingesting them and compiling them in some way to Python would be OK. Would you have any idea (no matter how primitive) of how an extension could hook into betterproto?

    Either the extension could activate at compile time, as you seem to suggest, or alternatively it might work at runtime.

    Compile time by running multiple protoc plugins:

    I recall that protoc actually supports running multiple plugins in sequence, where the second plugin can somehow modify the output of the first plugin. How that really works, I have no idea, but that sounds like something we could apply, or was this already implied as part of your proposal?

    Runtime extensions by generating classes with hooks

    E.g. we add an @options(['some_validation_options', 'some_other_option'], OPTION_PARSERS) annotation to a generated class or property/method, and extensions can hook into the global OPTION_PARSERS.

    Or some other public betterproto.HOOKS object that is somehow used / called inside the generated classes at runtime.

    The question is, do you want to create extensions that generate code or add behavior? For the first one, compile-time is needed, for the second one, you could use either approach (second might be easier).
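As a concrete sketch of the runtime-hooks idea above (all names here are invented for illustration, not an actual betterproto API):

```python
from typing import Callable, Dict, Iterable

# Hypothetical global registry: extensions register one parser per option name.
OPTION_PARSERS: Dict[str, Callable[[type], type]] = {}


def options(option_names: Iterable[str]):
    """Class decorator that would be emitted into generated code; it applies
    any registered parser for each option attached to the message."""
    def decorate(cls: type) -> type:
        for name in option_names:
            parser = OPTION_PARSERS.get(name)
            if parser is not None:
                cls = parser(cls)
        return cls
    return decorate
```

An extension would then populate `OPTION_PARSERS` at import time, and unregistered options would simply be ignored.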

OOP

Great initiative! I was looking forward to seeing some progress towards making the project more OOP.

#110 still needs to be merged, so take care that your work doesn't become incompatible with it. Because I may not have time to look into that PR until next week, perhaps we could start by laying out some classes and their responsibilities?

I like the idea of a message object where we can store information about both the source (proto file) and the output (package, file path, needed imports, etc.). However, I would suggest separating this refactor from the support for extensions (i.e. OOP first, then extensibility), so that we can merge the OOP improvements even before we have worked out the extensions.

@adriangb
Contributor Author

adriangb commented Jul 14, 2020

Compile time by running multiple protoc plugins:
I recall that protoc actually supports running multiple plugins in sequence, where the second plugin can somehow modify the output of the first plugin. How that really works, I have no idea, but that sounds like something we could apply, or was this already implied as part of your proposal?

This does sound interesting; I just have no idea how it works and would have to look into it further. Even if one plugin were able to get the output from another, we would have to somehow encode the custom options into the Python source or the request/response objects, at which point I feel like we'd still be coming up with some meta-language.

I think the same applies to any type of hooks or pre-made structures: we would have to come up with (and justify) that syntax/structure for encoding and managing the custom options. Since it would be hard to justify a universal approach that works with all possible uses of custom options, I was leaning towards delegating that work to the implementer of the custom options.

In order to do that, I think plugin.py needs a cleaner and more extensible API, with a clear place where a custom options consumer could override a couple of methods to inject the source code that processes the custom options into the generated .py files. As suggested above, we could have an empty method that by default just does nothing with custom options, but at least presents them as Python objects with a consistent API so that they can easily be processed.

I actually have a somewhat working version of plugin.py structured as described above on my branch here. Needs a lot of work but it's promising :)

@adriangb
Contributor Author

@boukeversteegh I opened a PR at #121 to discuss modularization as a first step

@nat-n
Collaborator

nat-n commented Jul 18, 2020

I like the sound of starting with a pure refactor (maybe in multiple stages) to make the code easier to work with before adding functionality.

Concerning each of the three approaches to extending or building on top of betterproto for custom-option-based behavior:

A) Running multiple plugins in sequence has the advantage of minimal coupling between protoc and extensions. I'd suggest looking at protoc-docs-plugin as a relatively simple example of this. That's not to say it's a simple task, but it should still be possible to produce an easy-to-use template for creating new extensions.

B) Creating an architecture for directly extending betterproto at compile time would obviously require a greater number of explicit API design choices to make (and live with thereafter), but might have the advantage of being more powerful and maybe nicer in the end if it works well. One way of realising this (similar to how I've seen such things done before) would be for betterproto to look at compile time for packages on sys.path with names derived from the custom option. So for example on encountering a custom option betterproto would do something along the lines of:

import importlib

try:
    extension = importlib.import_module(f"betterproto_extension_{custom_option_name}")
    annotated_entity = extension.apply(annotated_entity)
except ImportError:
    pass  # no extension installed for this option; ignore it

where annotated_entity is the object modelling the message/field in the brave new OOP implementation. This mechanism has the advantage of being super simple to work with, but the potential disadvantage of being a bit implicit and uncontrolled with respect to option names vs. ownership of packages on PyPI.

C) Applying a generic annotation to generated code might lie somewhere in between options A & B in terms of coupling, required complexity, and power, though my (not very well substantiated) intuition is that it would be a worse tradeoff and less elegant solution than either of them overall.

@boukeversteegh
Collaborator

boukeversteegh commented Jul 18, 2020

@adriangb I would be very curious to see a working proof of concept of something that extends betterproto. I saw the empty method, but at the moment I don't see how it could be used by an extension. But all in due time, of course. I might prototype something myself if I get around to it. It's an interesting challenge in itself.

@adriangb @nat-n I agree with your observations that defining an API surface will require careful deliberation and that it will be a long-term commitment. Breaking the extension API would arguably have a higher impact than breaking betterproto's own API: users can always refactor their own code when upgrading betterproto, but if the plugins they depend on are broken, they will be stuck.

Not being able to upgrade means the library is at end of life, so that's not an option imo.

With that in mind, I am most positive towards suggestion 2 by @nat-n. The extension point is straightforward, although we will still need to define an API surface for the extension to interact with. The smaller the surface we can come up with, the better, in my opinion.

Anything we expose will be forever frozen and slow down development. At this stage I think we can't afford it, unless we expose literally nothing of betterproto internals.

I thought we could do this by passing the compiled result to the extension, along with a list of the options that were found (basically idea 1). The extension could then modify the code at will. But I realize that instead of exposing a few well-defined objects as the API surface, this would make plugins depend on the specific generated code (e.g. via search-and-replace or modifying the syntax tree). That would be even worse, since we then couldn't change the compiled output for fear of breaking extensions.

So in conclusion, it seems better to define some surface than none (i.e. using the entire output as the interface), and the smaller and more explicit the surface, the better.

@adriangb
Contributor Author

I definitely think this is a problem that requires careful deliberation.

One option is to simply attach the reference like this:

@dataclass
class MyMessage(betterproto.Message):
    myfield: str = betterproto.string_field(options={"package.message.field": "value"})

And then let the implementer figure out how to deal with that. We could make some suggestions, such as:

from xyz import MyMessage

class MessageWithOptions(MyMessage, MyOptionsParser):
    pass

Where MyOptionsParser takes control of __init__ and injects functionality at runtime, or something along those lines.
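One way such a mixin could inject behavior, sketched with an invented example parser that upper-cases string fields. It hooks `__post_init__` (which dataclass-generated `__init__` calls) rather than overriding `__init__` directly; all class names here are illustrative:

```python
from dataclasses import dataclass


class UppercaseParser:
    """Invented example 'options parser': upper-cases string fields after init."""

    def __post_init__(self) -> None:
        for name, value in vars(self).items():
            if isinstance(value, str):
                setattr(self, name, value.upper())


@dataclass
class MyMessage:
    myfield: str = ""


# Re-decorating with @dataclass makes the generated __init__ pick up
# the mixin's __post_init__ hook.
@dataclass
class MessageWithOptions(UppercaseParser, MyMessage):
    pass
```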

Another option, which would work at compile time, would be to give each message/service/etc. in these new OOP classes an attribute holding arbitrary text that gets appended to the betterproto-generated output. Then provide a single API point that allows modifying this field at compile time based on the custom options parsed.

I hope this all makes sense; I'm on mobile but can try to write up something more detailed later today.

@boukeversteegh
Collaborator

But also, it may be more practical to look at how other proto compilers allow their extensions to hook in, rather than reinventing the wheel 😝

@nat-n
Collaborator

nat-n commented Jul 18, 2020

@adriangb I'm not sure I can picture what your last point entails.

Contrary to my previous comment about what I labeled approach C (which your latest comment expanded on), maybe this approach is worth considering: it is quite low commitment, has a small footprint on the API, and does not preclude adding more powerful compile-time mechanisms later. Though I'm not sure how one would use it.

One problem with the way you suggested it is that overriding __init__ on dataclasses doesn't tend to work well.

Could you perhaps tell us a bit about your own use case to help make the discussion more concrete?

@adriangb
Contributor Author

adriangb commented Jul 18, 2020

But also, it may be more practical to look at how other proto compilers allow their extensions to hook in, rather than reinventing the wheel 😝

Of course! Do you know of any other Python-based proto compilers that have similar issues? I'd love to take a look and learn from others. I am too unfamiliar with the space to really know which ones are good to look at...

@adriangb
Contributor Author

adriangb commented Jul 18, 2020

@adriangb I'm not sure I can picture what your last point entails.

Very understandable, I do think I wasn't very clear.

Basically, I was suggesting that we give the new OOP objects an attribute which holds arbitrary text:

@dataclass
class Field(Message):
    options_code: str = ""

At some point, somehow, we'd let the custom options parser modify this text. Then in the Jinja template:

class {{ message.py_name }}(betterproto.Message):
    {% if message.options_code %}
    {{ message.options_code }}
    {% endif %}

But really that custom options parser would still need access to Field in this case, so at the end of the day we would need to expose at least some API of these intermediary OOP objects.

Could you perhaps tell us a bit about your own use case to help make the discussion more concrete?

My use case is to convert protoc-gen-validate custom options into Pydantic validators (see my comment in PGV). There are many ways to do this, as we have been discussing.

That said, I think it is important for this library to consider all use cases, not just my own. As far as I understand, custom options are little more than comments on a field/message. The implementer can choose to do with them whatever they want. I feel that the best solution would be one that fully allows the implementer to inject whatever behavior they want, be that a decorator on the class, a new method in the class, etc.
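As one sketch of "inject whatever behavior they want": a hypothetical class decorator that an implementer might generate from parsed options (the decorator, its name, and the constraint are all invented for illustration):

```python
from dataclasses import dataclass
from typing import Callable


def apply_constraints(**constraints: Callable[[object], bool]):
    """Wrap __init__ so each named field is checked after construction."""
    def decorate(cls: type) -> type:
        original_init = cls.__init__

        def __init__(self, *args, **kwargs):
            original_init(self, *args, **kwargs)
            for field_name, check in constraints.items():
                if not check(getattr(self, field_name)):
                    raise ValueError(f"constraint violated for {field_name!r}")

        cls.__init__ = __init__
        return cls
    return decorate


# e.g. generated from a hypothetical option like string.min_len = 1
@apply_constraints(myfield=lambda v: len(v) >= 1)
@dataclass
class MyMessage:
    myfield: str = ""
```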

Hope this helps clarify!

@adriangb
Contributor Author

adriangb commented Jul 19, 2020

Another option that might be nice but would require more extensive API planning and refactoring is to have each one of these OOP classes render itself:

class MessageCompiler:

    def render(self) -> str:
        header = f"class {self.py_name}(betterproto.Message):\n"
        body = f'    """{self.comment}"""\n'
        for field in self.fields:
            body += INDENT * 4 + field.render()  # INDENT assumed defined elsewhere
        return header + body

This could also be done with Jinja templates, I guess:

import jinja2

from .templates import MessageCompilerTemplate

class MessageCompiler:

    render_template: jinja2.Template = jinja2.Environment(...).from_string(MessageCompilerTemplate)

    def render(self) -> str:
        return self.render_template.render(message=self)

The downstream plugin/options handler would then override the render method, or swap in its own Jinja template, to inject whatever functionality/comments correspond to the option.
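For instance, a minimal sketch of the override route (class names and output are assumed for illustration, not taken from the actual codebase):

```python
class MessageCompiler:
    def __init__(self, py_name: str, comment: str = "") -> None:
        self.py_name = py_name
        self.comment = comment

    def render(self) -> str:
        return f"class {self.py_name}(betterproto.Message):\n    pass\n"


class ValidatingMessageCompiler(MessageCompiler):
    """Hypothetical downstream options handler: appends option-derived code."""

    def render(self) -> str:
        return super().render() + "    # validators generated from custom options\n"
```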

@adriangb
Contributor Author

Now that #121 is merged, I think tackling this is a good next step, since it will be very informative for designing any further changes to the plugin.

Currently, I am thinking the option I proposed in #119 (comment) makes the most sense within the context of #121, I'm going to attempt an implementation and see how it works out.

@nat-n
Collaborator

nat-n commented Jul 26, 2020

Another option that might be nice but would require more extensive API planning and refactoring is to have each one of these OOP classes render itself:

I like this idea. I had something along these lines in mind as well, sort of like how components compose in UI frameworks like ReactJS. The API will take some iteration to get right, but this could also be quite nice for handling a few other planned things, like nested classes.

@adriangb
Contributor Author

Do you have any strong feelings regarding the use of Jinja templates within the classes vs. just using f-strings and such (kind of how I did for the message fields in #121)? My gut feeling is to keep the rendering within the dataclasses pure Python and use a Jinja template to render the overall output file.

@upcFrost

Any progress so far?

I was just wondering: why can't options be attached to the field metadata? It might be a bit hard to get them back out of the dataclass, but at least it should be very easy to implement.
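For what it's worth, a sketch of that idea using standard dataclass field metadata (the option key and value here are hypothetical):

```python
from dataclasses import dataclass, field, fields


@dataclass
class MyMessage:
    # Custom options stashed in the field's (read-only) metadata mapping.
    myfield: str = field(
        default="",
        metadata={"options": {"validate.rules.string.min_len": 1}},
    )


def options_for(cls: type) -> dict:
    """Read the stored options back out of a dataclass, per field name."""
    return {f.name: dict(f.metadata).get("options", {}) for f in fields(cls)}
```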

@maciejstromich

Hi, awesome work, and I appreciate the effort! We have a bunch of option fields in our models, and it would be awesome if betterproto could render them in any form (personally I like the @property approach). So is there any progress on this, or is it just stalled without a clear path forward?
