Skip to content

Auto-generate FormatSerialize/FormatDeserialize code from JSON that defines the schema #8156

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Jul 7, 2023

Conversation

Mytherin
Copy link
Collaborator

@Mytherin Mytherin commented Jul 5, 2023

This PR moves towards automatically generating the serialization code for various classes (expressions, query nodes, statements, etc) from a JSON file that defines the schema of these classes. The serialization code can be generated using the following command:

python3 scripts/generate_serialization.py

The goal of this change is to centralize the (de)serialization code generation, and to more easily detect changes that break backwards compatibility of serialization by locating the serialized schema of the objects in one file. In the future, we would also like to use this to generate Copy and Equals methods for these classes. Below is a snippet of what the JSON looks like:

  {
    "class": "ParsedExpression",
    "class_type": "expression_class",
    "extra_parameters": [
      "type"
    ],
    "includes": [
      "duckdb/parser/expression/list.hpp"
    ],
    "members": [
      {
        "name": "class",
        "type": "ExpressionClass",
        "property": "expression_class"
      },
      {
        "name": "type",
        "type": "ExpressionType"
      },
      {
        "name": "alias",
        "type": "string"
      }
    ]
  },
  {
    "class": "BetweenExpression",
    "base": "ParsedExpression",
    "enum": "BETWEEN",
    "members": [
      {
        "name": "input",
        "type": "ParsedExpression*"
      },
      {
        "name": "lower",
        "type": "ParsedExpression*"
      },
      {
        "name": "upper",
        "type": "ParsedExpression*"
      }
    ],
    "constructor": []
  }

The FormatSerialize/FormatDeserialize code that gets generated is identical to the current code.

@Mytherin Mytherin requested review from carlopi and Maxxen July 5, 2023 18:23
carlopi added 2 commits July 5, 2023 23:50
This allows to move back default constructors as private. Alternatives where all sort
of ugly / unclear, like having secret token passed around or private factories.

It's more wordly than make_uniq<>, but I guess given it's anyhow generated code it should
not be too relevant.
This should allow stricter checks by the compiler on the fact that these classes
needs to be default constructible only for internal convenience (of the FormatSerializer
implementation). Should also allow to more easily change this interface if users are only
internal to the class.
Copy link
Contributor

@carlopi carlopi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
LGTM, I guess main comment is on public/private constructors plus a few other notes, but in any case seems solid, code looks the same as now, tests are a pass.

For me good to be merged, and we can iterate on details on the go.

},
{
"name": "child",
"optional": true,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a comment, I would potentially consider in a next step to remove 'optional' and move that as part of the type, so here for example:

"name": "child",
"type": "duckdb::optional<ParsedExpression*>"

with the appropriate serialization / deserialization deducted from the type instead than from the extra field, but I am not sure what's more expensive to maintain in the long run.

Also here likely to be re-evaluated later.

@@ -0,0 +1,376 @@
[
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure exactly how, but I would try to enforce (probably via script run in CI) that only 'good' changes are done to this file, where by good I mean that members should not be removed / shuffled / renamed (= doing so triggers incompatibility).

Or via a script comparing two of these json and returning either "compatible" or not. That way we could use say json from 1.0 and json from 1.x, and check whether they are (forward!) compatible.
This partially could be reached by testing, but could be nice to have it also at the json level.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that is a good idea as well

@github-actions github-actions bot marked this pull request as draft July 6, 2023 06:19
@Mytherin Mytherin marked this pull request as ready for review July 6, 2023 06:58
@github-actions github-actions bot marked this pull request as draft July 6, 2023 17:21
@Mytherin Mytherin marked this pull request as ready for review July 6, 2023 17:22
@Mytherin Mytherin merged commit b3ba0da into duckdb:master Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants