
Prophy python codec cleanup #16

Merged
merged 19 commits into from
Jan 15, 2019
Conversation

kamichal
Collaborator

@kamichal kamichal commented Aug 2, 2018

A step ahead towards:

  • creating wire pattern stamps that can be hashed with md5 into a unique version identifier
  • making debuggable hex dumps in case of decode failures
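The stamping idea can be sketched roughly as follows. This is a toy model, not prophy's actual API: the `wire_stamp` function and the `(name, kind, size)` brick tuples are hypothetical placeholders for whatever the flattened wire description would contain.

```python
import hashlib

def wire_stamp(bricks):
    """Hash a flattened wire-pattern description into a version identifier.

    `bricks` is assumed to be an iterable of (name, kind, size) tuples
    describing every field and padding on the wire, in order.
    """
    digest = hashlib.md5()
    for name, kind, size in bricks:
        digest.update("{}:{}:{}".format(name, kind, size).encode())
    return digest.hexdigest()
```

Any change to a field's type or size changes a tuple, which changes the md5 digest, so two peers can compare stamps to detect wire-incompatible schemas.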

@kamichal kamichal self-assigned this Aug 2, 2018
@aurzenligl
Owner

I'll need a while to read through it, most probably on the weekend.

@coveralls

coveralls commented Aug 3, 2018

Coverage Status

Coverage increased (+0.05%) to 99.336% when pulling 7a2e7cd on wire_interface_versioning into 1c60fa2 on master.

@kamichal kamichal changed the base branch from pep8_propagation to master August 3, 2018 08:28
_SIZE = padding_size

@staticmethod
def encode_mock():
Owner

This seems unused.

@classmethod
def get_descriptor(cls):
"""
FIXME: I'm afraid it rapes YAGNI rule
Owner
It's a public mechanism to introspect struct/union layout. It's needed by users who want to implement non-standard algorithms on messages. I see that bricks_walk is essentially the same thing, but instead of returning a descriptor for the underlying struct/union, it just flattens everything. I suppose it's better to preserve the structure and return a descriptor for the nested type, instead of unpacking it into a set of "bricks".

Rationale: "bricking" may be good for stamping the schema of a prophy message, but it's not good for custom algorithms which print, transform, check, fill, clone, etc. messages. All algorithms currently implemented in the Python prophy codec use recursion and e.g. call the proper struct method if a member of a struct is another struct. The "descriptor" or - as you called it - "brick" mechanism should probably allow the user to do the same, so that boundaries between structs are not removed.

See ListFields and Descriptor for reference on protobuf reflection API, which this get_descriptor API is supposed to be modelled after:
https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.message.Message-class
https://developers.google.com/protocol-buffers/docs/reference/python/

So... I'd propose to stick to the "descriptor" terminology and design, without flattening, and just extend this mechanism, so that it's complete. "Bricking" (meaning: flattening all fields of message) is just one of many algorithms which can be implemented when one has message descriptors publicly available (and then it could be implemented as free function, because all details relevant to "bricking" algorithm would be available publicly in single message type).
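The "preserve the structure" approach can be sketched as a recursive visitor over nested descriptors. This is a toy model (plain lists standing in for prophy descriptors, and a hypothetical `walk` helper), not the real reflection API:

```python
def walk(descriptor, visit, depth=0):
    """Recursively visit a nested descriptor without flattening it.

    A descriptor is modeled as a list of (name, type) pairs, where a type
    is either another such list (a nested struct) or a scalar name.
    Struct boundaries are kept: nested structs are entered via recursion,
    mirroring how the codec's algorithms call into member structs.
    """
    for name, tp in descriptor:
        visit(depth, name, tp)
        if isinstance(tp, list):   # member is itself a struct: recurse
            walk(tp, visit, depth + 1)

struct = [("x", "u32"), ("y", "u32")]
host = [("a", struct), ("b", struct)]
walk(host, lambda depth, name, tp: print("  " * depth + name))
```

A flattening "bricking" pass is then just one possible `visit` callback, implementable as a free function on top of this, without a second parallel mechanism.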

Collaborator Author

  • 1st: In protobuf:

ListFields(self)
Returns a list of (FieldDescriptor, value) tuples for all fields in the message which are not empty.

Fields are returned with their values. get_descriptor returns neither values nor paddings, if any are present.

  • 2nd: To determine a unique identifier for the class stamp I need to iterate through each brick of each subclass and flatten it. Consider this:
class Struct(prophy.with_metaclass(prophy.struct_generator, prophy.struct)):
    _descriptor = [("x", prophy.u32),
                   ("y", prophy.u32)]

class Host(prophy.with_metaclass(prophy.struct_generator, prophy.struct)):
    _descriptor = [("a", Struct),
                   ("b", Struct)]

What if Struct.x changes type to, say, u64?
Iterating through Host.get_descriptor() will not notice any difference:
[('a', <class 'Struct'>, ('STRUCT', 4)), ('b', <class 'Struct'>, ('STRUCT', 4))]
Yes, then I could go recursively deeper, flatten the list, turn it into strings and then meld it up with hashlib.md5 to get a wire interface identifier. It's possible, but not satisfying, because my point is to also dig into byte-stream decode/encode debugging. Then getting information about paddings becomes essential: walk across the wire and tell/report what happened.
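The "go recursively deeper and flatten" step can be sketched like this. Again a toy model: plain lists stand in for prophy descriptors, and `flatten` is a hypothetical helper, not part of prophy:

```python
def flatten(descriptor, prefix=""):
    """Recursively flatten a nested descriptor into dotted leaf entries.

    A descriptor is modeled as a list of (name, type) pairs, where a type
    is either another such list (a nested struct) or a scalar name.
    """
    for name, tp in descriptor:
        path = prefix + "." + name if prefix else name
        if isinstance(tp, list):          # nested struct: recurse into it
            for leaf in flatten(tp, path):
                yield leaf
        else:
            yield (path, tp)

struct = [("x", "u32"), ("y", "u32")]
host = [("a", struct), ("b", struct)]
print(list(flatten(host)))
# [('a.x', 'u32'), ('a.y', 'u32'), ('b.x', 'u32'), ('b.y', 'u32')]
```

Because leaf types appear in the flattened output, a change of Struct.x from u32 to u64 is visible here even though Host's top-level descriptor looks unchanged.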

But going back to the Host.Struct case. What if Struct becomes optional? Host.get_descriptor will return exactly the same: [('a', <class 'Struct'>, ('STRUCT', 4)), ('b', <class 'Struct'>, ('STRUCT', 4))]
The optional field will not be included.

I don't know if get_descriptor is used by anybody. I assume it is, so I decided to make a similar mechanism alongside it - iterating via bricks_walk.

Owner

ListFields(self)
Returns a list of (FieldDescriptor, value) tuples for all fields in the message which are not empty.

Fields are returned with their values. get_descriptor returns neither values nor paddings, if any are present.

Yes, the current get_descriptor is quite limited and should be enhanced. In defence of protobuf - they don't have the problem of padding, so they don't have to describe it. And when it comes to optional fields... there is some API in protobuf which allows getting a list of all fields, not only the filled ones, but I don't quite remember which ;) I used protobuf as an example - to explain that it publicly exposes details about its schema in the Python codec, so that users can do virtually anything with protobuf messages without the necessity of changing the protobuf implementation (which would be a hassle).

But going back to the Host.Struct case. What if Struct becomes optional? Host.get_descriptor will return exactly the same: [('a', <class 'Struct'>, ('STRUCT', 4)), ('b', <class 'Struct'>, ('STRUCT', 4))]
The optional field will not be included.

This is a shortcoming which can be corrected in the get_descriptor mechanism - it can imho return all paddings and details like whether a field is optional or not. I don't see any problem with the metadata that you propose in the brick mechanism - the only thing that is questionable is flattening. My point is: it's easier to flatten a nested structure than to nest a flattened list, so it seems to me that keeping get_descriptor (but enhancing it with all necessary details) - and then flattening it - would:

  • fulfil all requirements that the bricking algorithm does as it is implemented now,
  • allow all kinds of usages which require nested descriptors - and that is a nice thing to have.

It's a bit more legwork, but only a bit ;)

I don't know if get_descriptor is used by anybody. I assume it is, so I decided to make a similar mechanism alongside it - iterating via bricks_walk.

I think it was added by @Jiish when he was working at Nokia as a trainee - for his master's thesis. I'm not sure if it's used in production, but I'm sure that there is a need for a fully fledged get_descriptor mechanism which would return all the greasy details about a prophy schema. Otherwise people would (and still do!) depend on private prophy details, which is obviously the wrong thing to do.

All in all - I wouldn't worry about changing the get_descriptor API, but I think that enhancing it to the point where other uses (like walking through flattened "bricks") are possible leaves us with a prophy that is more useful for further purposes, while adding a parallel mechanism similar to get_descriptor makes things more confusing and harder to maintain, because there'd be two competing mechanisms for doing essentially the same thing.

@aurzenligl
Owner

The refactoring of the Python codec is a very good idea, things got cleaner now. I have a couple of suggestions regarding the placement of functionalities throughout modules.

  • I'd probably move base_array to container.py.
  • I'd probably move (keep?) enum in scalar.py.
  • I'd probably retain the composite.py name instead of data_types.py, which doesn't define all data types.
  • I'd probably rename desc_item.py to descriptor.py, and store all descriptor-related things there (reflection API, etc.).
  • I'd probably rename descriptor_item_type to field, which sounds simpler and to the point (the statement "a descriptor is composed of fields" rings the bell).
  • I'd probably move codec_kind to descriptor.py - since that's the place with things common to all data types.

All in all - the separation of generators and descriptors from the big composite blob is definitely an improvement.

"""
Base metaclass type intended to validate, supplement and create all
prophy_data_object classes.
All of this magic happen at the very moment of the created class' import.
Owner

Well, all of this magic happens at the very moment the class definition gets interpreted, which usually happens at codec module import. In unit tests we delay the moment of class creation until a fixture function is called - which happens way after import.

@kamichal
Collaborator Author

kamichal commented Aug 6, 2018

Sometimes I struggle with class names. There are 38 class definitions (excluding tests) in the prophy package and only ProphyError is written in camel case.
You asked me to rename descriptor_item_type to field, but that's a common variable name used extensively in the package, so it collides.

I can either call it e.g. field_type or use the PEP8 convention, i.e. Field. Of course that breaks the existing convention, and even PEP8 recommends keeping an existing convention.

So I would like to ask to break the convention and rename the classes to CamelCase in the whole prophy package. Of course the external interface (prophy usage) will not change, because prophy/__init__.py allows renaming exported definitions (e.g. from .generators import EnumGenerator as enum_generator).
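The aliasing trick can be sketched in a self-contained way. The class body below is a stand-in (the real EnumGenerator lives in prophy's generators module); the point is only the rename-on-export pattern:

```python
class EnumGenerator(type):
    """CamelCase internal name, following PEP8 for class names."""

# Backwards-compatible snake_case alias, as prophy/__init__.py would
# expose it (e.g. `from .generators import EnumGenerator as enum_generator`):
enum_generator = EnumGenerator

# Both names refer to the very same class object.
assert enum_generator is EnumGenerator
```

Existing user code importing prophy.enum_generator keeps working unchanged while the package-internal names move to CamelCase.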

Do you, Krzysztof want me to do that? 💃

@aurzenligl
Owner

I can either call it e.g. field_type or use the PEP8 convention, i.e. Field. Of course that breaks the existing convention, and even PEP8 recommends keeping an existing convention.

That's definitely OK. I don't care whether it's field_type or field or Field or FieldType, I'd just want the core part of this type name to be some form of the word field :D

So I would like to ask to break the convention and rename the classes to CamelCase in the whole prophy package. Of course the external interface (prophy usage) will not change, because prophy/__init__.py allows renaming exported definitions (e.g. from .generators import EnumGenerator as enum_generator).

Both ways are OK - all in snake_case (against PEP8) or all in CamelCase (aligned with PEP8). I agree that keeping backwards compatibility by exposing the snake_case names is a good idea.

Do you, Krzysztof want me to do that?

If you feel like it - then do it ;) Code will become yet more clear.

@kamichal kamichal changed the title Wire interface versioning Prophy python codec cleanup Jan 13, 2019
@kamichal
Collaborator Author

I decided to discontinue the "wire interface versioning" implementation on this side (in prophy) and to implement it in prophyc instead. I removed the fishy part with the descriptor iterator (the so-called "wire bricks walk"). So this pull request consists only of refactoring.

@aurzenligl aurzenligl left a comment

A lot of good work @kamichal :) I've read the diff - looks fine for pulling.

@kamichal
Collaborator Author

kamichal commented Jan 14, 2019

Thanks. It's ready to be merged. Version bump and release can wait for "babel part 1".

@kamichal kamichal merged commit 094e29b into master Jan 15, 2019
@kamichal kamichal deleted the wire_interface_versioning branch January 15, 2019 21:43