Prophy python codec cleanup #16
Conversation
I'll need a while to read into it, most probably on the weekend.
prophy/data_types.py (Outdated):

```python
    _SIZE = padding_size

    @staticmethod
    def encode_mock():
```
This seems unused.
prophy/data_types.py (Outdated):

```python
    @classmethod
    def get_descriptor(cls):
        """
        FIXME: I'm afraid it rapes YAGNI rule
```
It's a public mechanism to introspect struct/union layout. It's needed for users who want to implement non-standard algorithms on messages. I see that `bricks_walk` is essentially the same thing, but instead of returning a descriptor for the underlying struct/union, it just flattens everything. I suppose it's better to preserve the structure and return a descriptor for the nested type, instead of unpacking it as a set of "bricks".

Rationale: "bricking" may be good for stamping the schema of a prophy message, but it's not good for custom algorithms which print, transform, check, fill, clone, etc. messages. All algorithms currently implemented internally in the Python prophy codec use recursion and e.g. call the proper struct method if a member of a struct is another struct. The "descriptor" or - as you called it - "brick" mechanism should probably allow the user to do the same, so that boundaries between structs are not removed.
See `ListFields` and `Descriptor` for reference on the protobuf reflection API, which this `get_descriptor` API is supposed to be modelled after:
https://developers.google.com/protocol-buffers/docs/reference/python/google.protobuf.message.Message-class
https://developers.google.com/protocol-buffers/docs/reference/python/
So... I'd propose to stick to the "descriptor" terminology and design, without flattening, and just extend this mechanism so that it's complete. "Bricking" (meaning: flattening all fields of a message) is just one of many algorithms which can be implemented when message descriptors are publicly available (and then it could be implemented as a free function, because all details relevant to the "bricking" algorithm would be publicly available on a single message type).
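To make that distinction concrete, here is a minimal sketch of the idea - names like `flatten_bricks` and the tuple shapes are illustrative only, not the real prophy API: a nested `get_descriptor` preserves struct boundaries, and "bricking" becomes a free function that flattens it recursively on demand.

```python
# Hypothetical sketch, not the actual prophy implementation.

class U32:
    _SIZE = 4  # stand-in for a prophy scalar type

class Struct:
    @classmethod
    def get_descriptor(cls):
        # nested descriptor: (field name, field type) pairs
        return [("x", U32), ("y", U32)]

class Host:
    @classmethod
    def get_descriptor(cls):
        return [("a", Struct), ("b", Struct)]

def flatten_bricks(cls, prefix=""):
    """Free function: flatten a nested descriptor into (dotted_name, leaf_type) bricks."""
    bricks = []
    for name, tp in cls.get_descriptor():
        path = prefix + name
        if hasattr(tp, "get_descriptor"):
            # recurse into nested struct/union instead of stopping at its boundary
            bricks.extend(flatten_bricks(tp, path + "."))
        else:
            bricks.append((path, tp))
    return bricks
```

With this shape, `flatten_bricks(Host)` yields the bricks `a.x`, `a.y`, `b.x`, `b.y`, while `Host.get_descriptor()` alone still exposes the nested structure for recursive algorithms.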
- 1st: In protobuf:

  > `ListFields(self)`
  > Returns a list of (FieldDescriptor, value) tuples for all fields in the message which are not empty.

  Fields are returned with their values. The `get_descriptor` returns neither values nor paddings, if any are present.
- 2nd: To determine a unique identifier of the class stamp I need to iterate through each brick of each subclass and flatten it. Consider this:

```python
class Struct(prophy.with_metaclass(prophy.struct_generator, prophy.struct)):
    _descriptor = [("x", prophy.u32),
                   ("y", prophy.u32)]

class Host(prophy.with_metaclass(prophy.struct_generator, prophy.struct)):
    _descriptor = [("a", Struct),
                   ("b", Struct)]
```
What if `Struct.x` changes type to, say, `u64`? Iterating through `Host.get_descriptor()` will not notice any difference:

```
[('a', <class 'Struct'>, ('STRUCT', 4)), ('b', <class 'Struct'>, ('STRUCT', 4))]
```
Yes, then I could go recursively deeper, flatten the list, turn it into strings and then meld them with `hashlib.md5` to get a wire interface identifier. It's possible, but not satisfying, because my point is also to dig into byte-stream decode/encode debugging. Then getting information about paddings becomes essential. Walk across the wire and tell/report what happened.
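As an illustration of that stamping idea - all names and tuple shapes here are hypothetical, not part of prophy - a flattened descriptor could be hashed like this, so that a type change deep inside a nested struct does change the identifier:

```python
import hashlib

def wire_stamp(bricks):
    """Hash a flattened descriptor: bricks is an iterable of
    (dotted_name, type_name, byte_size) tuples."""
    payload = ";".join("%s:%s:%d" % brick for brick in bricks)
    return hashlib.md5(payload.encode()).hexdigest()

# Flattened view of Host before and after Struct.x is widened to u64:
before = [("a.x", "u32", 4), ("a.y", "u32", 4),
          ("b.x", "u32", 4), ("b.y", "u32", 4)]
after = [("a.x", "u64", 8), ("a.y", "u32", 4),
         ("b.x", "u64", 8), ("b.y", "u32", 4)]

assert wire_stamp(before) != wire_stamp(after)  # the change is detected
```

Padding bricks could be fed into `wire_stamp` the same way, which is what the nested `get_descriptor` output alone would miss.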
But going back to the `Host.Struct` case. What if the `Struct` becomes optional? `Host.get_descriptor` will return exactly the same: `[('a', <class 'Struct'>, ('STRUCT', 4)), ('b', <class 'Struct'>, ('STRUCT', 4))]`. The optional field will not be included.
I don't know if `get_descriptor` is used by anybody. I assume it is, and I decided to make a similar mechanism alongside it - iterating the `bricks_walk`.
> `ListFields(self)`
> Returns a list of (FieldDescriptor, value) tuples for all fields in the message which are not empty.
> Fields are returned with their values. The `get_descriptor` returns neither values nor paddings, if any are present.
Yes, the current `get_descriptor` is quite limited and should be enhanced. In defence of protobuf - they don't have the problem of padding, so they don't have to describe it. And when it comes to optional fields... there is some API in protobuf which allows getting a list of all fields, not only the filled ones, but I don't quite remember which ;) I used protobuf as an example - to explain that it exposes details about its schema publicly in the Python codec, so that users can do virtually anything with protobuf messages without the necessity of changing the protobuf implementation (which would be a hassle).
> But going back to the `Host.Struct` case. What if the `Struct` becomes optional? `Host.get_descriptor` will return exactly the same: `[('a', <class 'Struct'>, ('STRUCT', 4)), ('b', <class 'Struct'>, ('STRUCT', 4))]`. The optional field will not be included.
This is a shortcoming which can be corrected in the `get_descriptor` mechanism - it can, imho, return all paddings and details like whether a field is optional or not. I don't see any problem with the metadata that you propose in the brick mechanism - the only thing that is questionable is the flattening. My point is: it's easier to flatten a nested structure than to nest a flattened list, so it seems to me that keeping `get_descriptor` (but enhancing it with all necessary details) - and then flattening it - would:

- fulfil all requirements that the bricking algorithm does as it is implemented now,
- allow all kinds of usages which require nested descriptors - and that is a nice thing to have.

It's a bit more legwork, but only a bit ;)
> I don't know if `get_descriptor` is used by anybody. I assume it is, and I decided to make a similar mechanism alongside it - iterating the `bricks_walk`.
I think it was added by @Jiish when he was working with Nokia as a trainee - for his master's thesis. I'm not sure if it's used in production, but I'm sure that there is a need for a fully fledged `get_descriptor` mechanism which would return all the gory details about a prophy schema. Otherwise people would (and still do!) depend on private prophy details, which is obviously a wrong thing to do.
All in all - I wouldn't worry about changing the `get_descriptor` API, but I think that enhancing it to a point where other uses (like walking through flattened "bricks") are possible leaves us with a prophy that is more useful for further purposes, while adding a parallel mechanism similar to `get_descriptor` makes things more confusing and harder to maintain, because there'd be two competing mechanisms for doing essentially the same thing.
The refactoring of the Python codec is a very good idea, things got cleaner now. I have a couple of suggestions regarding the placement of functionalities throughout modules.
prophy/generators.py (Outdated):

```python
    """
    Base metaclass type intended to validate, supplement and create all
    prophy_data_object classes.
    All of this magic happen at the very moment of the created class' import.
```
Well, all of this magic happens at the very moment when the class definition gets interpreted, which usually happens at codec module `import`. In unit tests we delay the moment of class creation until a fixture function is called - which happens way after import.
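The timing can be demonstrated with a toy metaclass - this is an illustration only, not prophy's actual metaclass, and `Meta`/`make_class` are made-up names:

```python
created = []  # records the order in which classes get built

class Meta(type):
    def __new__(mcs, name, bases, namespace):
        # validation/supplementing of the class would happen here
        created.append(name)
        return super().__new__(mcs, name, bases, namespace)

class AtImport(metaclass=Meta):  # runs as soon as this definition is interpreted
    pass

def make_class():
    # e.g. called from a test fixture, long after the module was imported
    class Delayed(metaclass=Meta):
        pass
    return Delayed

assert created == ["AtImport"]  # Delayed does not exist yet
make_class()
assert created == ["AtImport", "Delayed"]
```

So for module-level classes the metaclass work coincides with import, but the trigger is really class-definition time, which a fixture can postpone arbitrarily.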
Sometimes I bother with class names. There are 38 class definitions (excluding tests) in the codec. So I would like to ask to break the convention and rename the classes to CamelCase in the whole codebase. Do you, Krzysztof, want me to do that? 💃
That's definitely OK. I don't care whether it's field_type or field or Field or FieldType, I'd just want the core part of this type name to be some form of …
Both ways are ok - all in snake_case (against PEP8) or all in CamelCase (aligned with PEP8). I agree that keeping backwards compatibility by exposing the lowercase names is a good idea.
If you feel like it - then do it ;) Code will become yet clearer.
I decided to discontinue the "wire interface versioning" implementation on this side.
A lot of good work @kamichal :) I've read the diff - looks fine for pulling.
Thanks. It's ready to be merged. Version bump and release can wait for "babel part 1".
A step ahead to: wire pattern stamps that can be converted with `md5` into a unique version identifier.