Improve audit & correctness checking/reporting processes #4050

@andygrove

Description

What is the problem the feature request solves?

It is important to keep track of which Spark expressions are implemented / not implemented yet. It is also important to understand all compatibility issues, and these can be specific to Spark versions. When new Spark versions are released, we need to re-audit against those versions.

We need a solid process around this, rather than the current best-effort approach. It is important to know current compatibility status and have this well documented, even if there are known gaps and issues.

At a high level, I think we need to capture current status in a format that serves as the single source-of-truth which can be used to generate documentation and to validate that we have sufficient test coverage.

We could do this in a file-based format, such as one large YAML file or one YAML file per expression. Another option would be to use annotations in the Scala code (such as in the serde framework). A file-based approach would put less strain on CI when we make changes.
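As a sketch of what a per-expression file might contain (the schema, field names, and values below are hypothetical illustrations, not an agreed format):

```yaml
# audit/expressions/cast.yaml (hypothetical schema)
expression: Cast
status: partial            # supported | partial | unsupported
spark_versions:
  "3.4": audited
  "3.5": audited
  "4.0": needs-audit       # flagged by the audit task for a new Spark version
known_issues:
  - description: differs from Spark for some string-to-numeric casts
    tracking_issue: null   # link to a GitHub issue once filed
tests:
  - CometCastSuite
```

A structured format like this could be validated against the implementation in CI and consumed by both the documentation generator and the audit task.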

We would also need good contributor guide documentation and/or Claude skills related to this.

Scripts / skills needed:

  • Verify that the YAML file(s) are consistent with the current implementation
  • Run an audit task against a new Spark version and update the YAML file(s) to track potential issues that need to be investigated
  • Generate user-facing documentation from this file (compatibility guide, list of supported expressions)
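To make the last script concrete, here is a minimal sketch of generating a supported-expressions table from audit entries. The entries are inlined as Python dicts to keep the example self-contained; a real script would load them from the YAML file(s) with a YAML parser, and every field name here is an assumption rather than an agreed schema:

```python
# Hypothetical sketch: render audit entries as a user-facing markdown table.
# In practice, entries would be loaded from the YAML file(s); they are
# inlined here so the example runs on its own.

ENTRIES = [
    {"expression": "Abs", "status": "supported",
     "spark_versions": ["3.4", "3.5"], "notes": ""},
    {"expression": "Cast", "status": "partial",
     "spark_versions": ["3.4", "3.5"],
     "notes": "differs from Spark for some string-to-numeric casts"},
    {"expression": "MakeInterval", "status": "unsupported",
     "spark_versions": [], "notes": ""},
]


def generate_table(entries):
    """Render entries as a markdown table, sorted by expression name."""
    lines = [
        "| Expression | Status | Spark versions | Notes |",
        "|------------|--------|----------------|-------|",
    ]
    for e in sorted(entries, key=lambda e: e["expression"]):
        versions = ", ".join(e["spark_versions"]) or "n/a"
        lines.append(
            f"| {e['expression']} | {e['status']} | {versions} | {e['notes']} |"
        )
    return "\n".join(lines)


print(generate_table(ENTRIES))
```

The same loaded entries could also drive the consistency check (compare expression names against what the serde framework actually registers), so documentation and validation stay anchored to one source of truth.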

Describe the potential solution

No response

Additional context

No response
