-
Notifications
You must be signed in to change notification settings - Fork 2.5k
Auto-generate FormatSerialize/FormatDeserialize code from JSON that defines the schema #8156
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This allows to move back default constructors as private. Alternatives where all sort of ugly / unclear, like having secret token passed around or private factories. It's more wordly than make_uniq<>, but I guess given it's anyhow generated code it should not be too relevant.
This should allow stricter checks by the compiler on the fact that these classes needs to be default constructible only for internal convenience (of the FormatSerializer implementation). Should also allow to more easily change this interface if users are only internal to the class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
LGTM, I guess main comment is on public/private constructors plus a few other notes, but in any case seems solid, code looks the same as now, tests are a pass.
For me good to be merged, and we can iterate on details on the go.
| }, | ||
| { | ||
| "name": "child", | ||
| "optional": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a comment, I would potentially consider in a next step to remove 'optional' and move that as part of the type, so here for example:
"name": "child",
"type": "duckdb::optional<ParsedExpression*>"
with the appropriate serialization / deserialization deducted from the type instead than from the extra field, but I am not sure what's more expensive to maintain in the long run.
Also here likely to be re-evaluated later.
| @@ -0,0 +1,376 @@ | |||
| [ | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure exactly how, but I would try to enforce (probably via script run in CI) that only 'good' changes are done to this file, where by good I mean that members should not be removed / shuffled / renamed (= doing so triggers incompatibility).
Or via a script comparing two of these json and returning either "compatible" or not. That way we could use say json from 1.0 and json from 1.x, and check whether they are (forward!) compatible.
This partially could be reached by testing, but could be nice to have it also at the json level.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that is a good idea as well
Serializationrework
… serializationrework
This PR moves towards automatically generating the serialization code for various classes (expressions, query nodes, statements, etc) from a JSON file that defines the schema of these classes. The serialization code can be generated using the following command:
The goal of this change is to centralize the (de)serialization code generation, and to more easily detect changes that break backwards compatibility of serialization by locating the serialized schema of the objects in one file. In the future, we would also like to use this to generate
CopyandEqualsmethods for these classes. Below is a snippet of what the JSON looks like:{ "class": "ParsedExpression", "class_type": "expression_class", "extra_parameters": [ "type" ], "includes": [ "duckdb/parser/expression/list.hpp" ], "members": [ { "name": "class", "type": "ExpressionClass", "property": "expression_class" }, { "name": "type", "type": "ExpressionType" }, { "name": "alias", "type": "string" } ] }, { "class": "BetweenExpression", "base": "ParsedExpression", "enum": "BETWEEN", "members": [ { "name": "input", "type": "ParsedExpression*" }, { "name": "lower", "type": "ParsedExpression*" }, { "name": "upper", "type": "ParsedExpression*" } ], "constructor": [] }The FormatSerialize/FormatDeserialize code that gets generated is identical to the current code.