add subset dataset facets to spec#4008

Merged
pawel-big-lebowski merged 3 commits into main from
spec/subset-definition-facet
Sep 12, 2025

Conversation

@pawel-big-lebowski
Collaborator

@pawel-big-lebowski pawel-big-lebowski commented Sep 2, 2025

@pawel-big-lebowski pawel-big-lebowski requested a review from a team as a code owner September 2, 2025 11:50
@boring-cyborg boring-cyborg bot added area:client/java openlineage-java area:client/python openlineage-python area:spec Specifications and standards for the project area:tests Testing code language:java Uses Java programming language language:python Uses Python programming language labels Sep 2, 2025
@pawel-big-lebowski pawel-big-lebowski marked this pull request as draft September 2, 2025 11:52
@pawel-big-lebowski pawel-big-lebowski force-pushed the spec/subset-definition-facet branch 2 times, most recently from fdf951d to 18aed40 Compare September 2, 2025 11:58
@pawel-big-lebowski pawel-big-lebowski marked this pull request as ready for review September 2, 2025 12:31
@codecov-commenter

codecov-commenter commented Sep 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 85.05%. Comparing base (d10b39f) to head (6593b26).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #4008   +/-   ##
=======================================
  Coverage   85.05%   85.05%           
=======================================
  Files          57       57           
  Lines        3855     3855           
=======================================
  Hits         3279     3279           
  Misses        576      576           

@JDarDagran JDarDagran requested a review from Copilot September 2, 2025 13:45

Contributor

Copilot AI left a comment

Pull Request Overview

This PR adds subset dataset facets to the OpenLineage specification, introducing a new schema definition for filtering or subsetting datasets based on various conditions like location, partition, comparison, or binary operations.

  • Introduces a new BaseSubsetDatasetFacet.json schema with comprehensive condition types for dataset subsetting
  • Adds test cases demonstrating different subset condition scenarios (compare, binary, location, partition)
  • Updates Java code generation to support const fields in JSON Schema

Reviewed Changes

Copilot reviewed 12 out of 13 changed files in this pull request and generated 1 comment.

Show a summary per file
| File | Description |
| --- | --- |
| website/static/spec/facets/1-0-0/BaseSubsetDatasetFacet.json | Main schema definition for subset dataset facets with various condition types |
| spec/facets/BaseSubsetDatasetFacet.json | Duplicate schema file in spec directory |
| spec/tests/BaseSubsetDatasetFacet/*.json | Test cases for different subset condition scenarios |
| client/python/redact_fields.yml | Python client configuration for new subset dataset classes |
| client/python/openlineage/client/facet_v2.py | Python client import for new subset dataset module |
| client/java/src/test/java/io/openlineage/client/OpenLineageTest.java | Java test demonstrating subset facet usage |
| client/java/generator/src/main/java/io/openlineage/client/*.java | Java code generator updates to support const fields |

Member

@julienledem julienledem left a comment

That makes sense, I have added comments regarding the code generator portion.

Comment on lines +55 to +65
if (field.getValue().has("const")) {
  // only String const are supported
  fields.add(new Field(field.getKey(), new PrimitiveType("string", null), description, field.getValue().get("const").asText()));
} else {
Member

shouldn't this logic be in the parse() method instead?

Collaborator Author

I think not, because const shouldn't be returned as a Type (like a PrimitiveType of String with a constant value); it should be set directly on the Field member.

Comment on lines +318 to +322
public PrimitiveType(String name, String format, String constValue) {
  super();
  this.name = name;
  this.format = format;
}
Member

constValue is not used?

Collaborator Author

Thanks, constValue should just be a property of a Field instead.

Comment on lines +107 to +111
if (property.getConstValue().isPresent()) {
  resolvedField = new ResolvedField(property, visit(property.getType()), property.getConstValue().get());
} else {
  resolvedField = new ResolvedField(property, visit(property.getType()));
}
Member

This seems unnecessarily complicated. You are unwrapping an Optional to pass it to a constructor that will rewrap it in an Optional.
Maybe just add a ResolvedField constructor that takes an optional since this is how it stores it.
ResolvedField(Field field, ResolvedType type, Optional<String> constantValue)
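
A minimal sketch of the constructor suggested above. The `Field` and `ResolvedField` classes here are simplified stand-ins for the generator's types, and the `ResolvedFieldSketch` wrapper, the `resolve` helper, the `String` used in place of `ResolvedType`, and the example value `"compare"` are all assumptions made for illustration, not code from this PR:

```java
import java.util.Optional;

public class ResolvedFieldSketch {
  // Simplified stand-in for the generator's Field (assumption: it exposes an Optional const value).
  static final class Field {
    private final String name;
    private final Optional<String> constValue;

    Field(String name, Optional<String> constValue) {
      this.name = name;
      this.constValue = constValue;
    }

    String getName() { return name; }
    Optional<String> getConstValue() { return constValue; }
  }

  static final class ResolvedField {
    private final Field field;
    private final String type; // the real code holds a ResolvedType; a String keeps the sketch short
    private final Optional<String> constantValue;

    // Accepts the Optional as-is, so callers never unwrap and rewrap it.
    ResolvedField(Field field, String type, Optional<String> constantValue) {
      this.field = field;
      this.type = type;
      this.constantValue = constantValue;
    }

    Field getField() { return field; }
    String getType() { return type; }
    Optional<String> getConstantValue() { return constantValue; }
  }

  // The if/else at the call site collapses to a single expression.
  static ResolvedField resolve(Field property, String resolvedType) {
    return new ResolvedField(property, resolvedType, property.getConstValue());
  }

  public static void main(String[] args) {
    ResolvedField regular = resolve(new Field("name", Optional.empty()), "string");
    ResolvedField constant = resolve(new Field("type", Optional.of("compare")), "string");
    System.out.println(regular.getConstantValue().isPresent());
    System.out.println(constant.getConstantValue().orElse("<none>"));
  }
}
```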


@Override
public String toString() {
  return "ResolvedField{name: " + field.getName() + ", type: " + type + "}";
Member

should be updated.

Member

same for equals and hashCode below

Collaborator Author

@pawel-big-lebowski pawel-big-lebowski Sep 4, 2025

Changing the code so that constantValue is stored directly within field, so there's no need to change toString, equals, and hashCode for ResolvedField. Thanks.

@pawel-big-lebowski pawel-big-lebowski force-pushed the spec/subset-definition-facet branch 2 times, most recently from 5347ec7 to 6593b26 Compare September 4, 2025 11:50
fields.add(new Field(field.getKey(), fieldType, description));
if (field.getValue().has("const")) {
  // only String const are supported
  fields.add(new Field(field.getKey(), new PrimitiveType("string", null), description, field.getValue().get("const").asText()));
Member

to avoid surprises maybe we should still call parse() but check the type is String and throw an exception if not.
With this implementation, one can put whatever in the type and it will be replaced by string.
This could lead to some hair pulling on unexpected behavior.
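
The check proposed here could look roughly like the following sketch. It models a property definition as a flat `Map` rather than a parsed JSON node, and the class name `ConstTypeCheckSketch`, the method `readStringConst`, and the sample values are hypothetical; the actual generator code may structure this differently:

```java
import java.util.Map;
import java.util.Optional;

public class ConstTypeCheckSketch {
  /**
   * Reads the optional "const" of a property definition.
   * The property is modeled as a flat Map for brevity (an assumption of this sketch).
   */
  static Optional<String> readStringConst(String fieldName, Map<String, String> property) {
    if (!property.containsKey("const")) {
      return Optional.empty(); // regular field, nothing to validate
    }
    String declaredType = property.get("type");
    if (!"string".equals(declaredType)) {
      // Fail fast instead of silently replacing the declared type with string.
      throw new IllegalArgumentException(
          "const is only supported for string fields, but '" + fieldName
              + "' declares type: " + declaredType);
    }
    return Optional.of(property.get("const"));
  }

  public static void main(String[] args) {
    System.out.println(readStringConst("type", Map.of("type", "string", "const", "compare")));
    try {
      readStringConst("limit", Map.of("type", "integer", "const", "10"));
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Throwing at generation time surfaces a mistyped schema immediately, rather than producing a client class whose field type quietly differs from the spec.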

Collaborator Author

Makes sense, we should definitely avoid hair pulling. Second commit should resolve that.

@pawel-big-lebowski pawel-big-lebowski force-pushed the spec/subset-definition-facet branch from 6593b26 to 3eed2aa Compare September 8, 2025 09:37
Member

@mobuchowski mobuchowski left a comment

Approved, but I would appreciate more comments around the Java generator part. It's a less frequently accessed and reviewed part of the code, so this is a good opportunity to document it more. Thanks.

// only regular fields are used in the builder
@Override
public void onField(ResolvedField f) {
  if (f.getConstantValue().isPresent()) {
Member

instead of adding this if statement in every call to onField(), you could add an onConstantField() to the Visitor with the default implementation that does nothing.
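
One way to read this suggestion, as a hedged sketch: the visitor gains an `onConstantField()` hook with a no-op default, and the dispatcher decides once which hook to call. The interface name `FieldVisitor`, the `visitFields` dispatcher, the `BuilderParams` visitor, and the simplified `ResolvedField` are assumptions for illustration; the generator's real Visitor may be shaped differently:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class ConstantFieldVisitorSketch {
  // Simplified stand-in for the generator's ResolvedField.
  static final class ResolvedField {
    private final String name;
    private final Optional<String> constantValue;

    ResolvedField(String name, Optional<String> constantValue) {
      this.name = name;
      this.constantValue = constantValue;
    }

    String getName() { return name; }
    Optional<String> getConstantValue() { return constantValue; }
  }

  interface FieldVisitor {
    void onField(ResolvedField f);

    // Default no-op: visitors that ignore constants need no extra code.
    default void onConstantField(ResolvedField f) {}
  }

  // The constant check lives in one place, at dispatch time.
  static void visitFields(List<ResolvedField> fields, FieldVisitor visitor) {
    for (ResolvedField f : fields) {
      if (f.getConstantValue().isPresent()) {
        visitor.onConstantField(f);
      } else {
        visitor.onField(f);
      }
    }
  }

  // Example visitor: collects builder parameters, which only regular fields contribute.
  static final class BuilderParams implements FieldVisitor {
    final List<String> names = new ArrayList<>();

    @Override
    public void onField(ResolvedField f) {
      names.add(f.getName());
    }
  }

  public static void main(String[] args) {
    List<ResolvedField> fields = Arrays.asList(
        new ResolvedField("type", Optional.of("compare")),
        new ResolvedField("left", Optional.empty()),
        new ResolvedField("right", Optional.empty()));
    BuilderParams params = new BuilderParams();
    visitFields(fields, params);
    System.out.println(params.names);
  }
}
```

With this shape, a visitor such as `BuilderParams` never sees constant fields, so its `onField()` needs no `isPresent()` guard.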

Collaborator Author

Love this, this was exactly what I was missing. Thanks!

Member

@julienledem julienledem left a comment

I made an additional comment that can slightly simplify the section about handling constant fields.
Otherwise, this looks good.

@pawel-big-lebowski pawel-big-lebowski force-pushed the spec/subset-definition-facet branch from 3eed2aa to 9266d7a Compare September 11, 2025 16:02
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
Signed-off-by: Pawel Leszczynski <leszczynski.pawel@gmail.com>
@pawel-big-lebowski pawel-big-lebowski force-pushed the spec/subset-definition-facet branch from 9266d7a to 9623d1e Compare September 12, 2025 07:23
@pawel-big-lebowski pawel-big-lebowski merged commit d0944e9 into main Sep 12, 2025
67 checks passed
@pawel-big-lebowski pawel-big-lebowski deleted the spec/subset-definition-facet branch September 12, 2025 15:29

Labels

area:client/java openlineage-java area:client/python openlineage-python area:spec Specifications and standards for the project area:tests Testing code language:java Uses Java programming language language:python Uses Python programming language

5 participants