Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPEC] Allow nested struct fields in SchemaDatasetFacet #2548

Merged

Conversation

dolfinus
Copy link
Contributor

@dolfinus dolfinus commented Mar 29, 2024

Problem

Closes: #2051

Solution

  • Your change modifies the core OpenLineage model
  • Your change modifies one or more OpenLineage facets

One-line summary:

Allow nested fields support to SchemaDatasetFacet

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (if necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

@boring-cyborg boring-cyborg bot added the area:spec Specifications and standards for the project label Mar 29, 2024
@dolfinus
Copy link
Contributor Author

It looks like TypeResolver currently cannot handle recursive types

@mobuchowski
Copy link
Member

@JDarDagran you had some ideas around nested types 🙂

@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch from 064cfb6 to 9dae896 Compare March 29, 2024 14:34
@JDarDagran
Copy link
Contributor

I did similar change to SchemaDatasetFacet. TypeResolver in Java for sure needs improvement to handle recursive types.

@dolfinus
Copy link
Contributor Author

TypeResolver in Java for sure needs improvement to handle recursive types.

Any suggestions? My specialization is mostly Python, not Java.

@pawel-big-lebowski
Copy link
Contributor

pawel-big-lebowski commented Apr 2, 2024

@JDarDagran @dolfinus It makes sense to support collection alike structures as well. I like the repeated keyword in protobuff. I think it would make sense to add repeated property in SchemaFieldItem with a default value equal to false.

@dolfinus
Copy link
Contributor Author

dolfinus commented Apr 2, 2024

@pawel-big-lebowski @JDarDagran Could we move this discussion to #2051?

@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch 4 times, most recently from 3acf056 to bdc04fe Compare April 2, 2024 11:44
@dolfinus dolfinus marked this pull request as draft April 3, 2024 16:16
@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch from bdc04fe to db2dabe Compare April 3, 2024 16:17
@boring-cyborg boring-cyborg bot added the area:client/java openlineage-java label Apr 3, 2024
@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch 4 times, most recently from aaa7da5 to c0a0115 Compare April 3, 2024 16:33
@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch 5 times, most recently from 6d4ee3b to 0c208dc Compare April 3, 2024 18:29
@codecov-commenter
Copy link

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (3a0d14c) to head (0c208dc).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #2548   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            5         5           
  Lines          173       173           
=========================================
  Hits           173       173           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch from 0c208dc to a5ee679 Compare April 3, 2024 18:48
@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch from a5ee679 to 573f0c9 Compare April 3, 2024 19:22
@dolfinus
Copy link
Contributor Author

dolfinus commented Apr 3, 2024

Ok, TypeResolver now can handle self-referencing types.

Current implementation can break Java client backwards compatibility because method newSchemaDatasetFacetFieldsBuilder is renamed to newSchemaDatasetFacetFieldBuilder. Should I keep prior naming?

Copy link
Contributor

@JDarDagran JDarDagran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dolfinus that's fantastic you've made it to fix bug with recursive resolving in Java generator. I'm leaving review of this to @mobuchowski and @pawel-big-lebowski.

For me the change from SchemaDatasetFacetFields to SchemaDatasetFacetField is natural. More than that - I believe generator should change plural forms to singular in such cases (you can check my PR as an example)

spec/tests/SchemaDatasetFacet/1.json Show resolved Hide resolved
spec/facets/SchemaDatasetFacet.json Outdated Show resolved Hide resolved
@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch from 573f0c9 to b2c3387 Compare April 4, 2024 17:53
@dolfinus dolfinus marked this pull request as ready for review April 4, 2024 17:56
@dolfinus
Copy link
Contributor Author

dolfinus commented Apr 4, 2024

I've removed repeated: boolean field from schema as a result of a discussion

Copy link
Contributor

@pawel-big-lebowski pawel-big-lebowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a test suite io.openlineage.client.OpenLineageTest which verifies how the generated OpenLineage object serialises to Json. Tests within that suite create programatically OL object, serialise & deserialise it, and then compare if those objects have same values.

Normally we don't such tests when modifying a facet. However, it makes sense now as new type is added to java poet generator. Additional advantage of this approach is that is shows how convenient it is to use the newly generated method (how to produce nested schema in java client)

@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch 8 times, most recently from 7dbe3b9 to 8b1f762 Compare April 5, 2024 11:29
Signed-off-by: Martynov Maxim <martinov_m_s_@mail.ru>
@dolfinus dolfinus force-pushed the feature/schema-dataset-facet-nested branch from 8b1f762 to 5287387 Compare April 5, 2024 11:34
@dolfinus
Copy link
Contributor Author

dolfinus commented Apr 5, 2024

I've added tests for serialization/deserialization to OpenLineageTest.factory. But IMHO it should be splitted to separated tests for each factory.

@mobuchowski mobuchowski merged commit bc335dc into OpenLineage:main Apr 8, 2024
72 checks passed
@dolfinus dolfinus deleted the feature/schema-dataset-facet-nested branch April 8, 2024 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:client/java openlineage-java area:integration/flink area:integration/spark area:spec Specifications and standards for the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[PROPOSAL] Nested schemaFields definition
5 participants