What is the problem the feature request solves?
It is important to keep track of which Spark expressions are implemented / not implemented yet. It is also important to understand all compatibility issues, and these can be specific to Spark versions. When new Spark versions are released, we need to re-audit against those versions.
We need a solid process around this, rather than the current best-effort approach. It is important to know current compatibility status and have this well documented, even if there are known gaps and issues.
At a high level, I think we need to capture current status in a format that serves as the single source-of-truth which can be used to generate documentation and to validate that we have sufficient test coverage.
We could do this in a file-based format, such as one large YAML file or one YAML file per expression. Another option would be annotations in the Scala code (such as in the serde framework). A file-based approach would put less strain on CI when we make changes.
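As a rough illustration of what a single per-expression record might hold, here is a minimal Python sketch. Everything in it is an assumption made for discussion: the field names (`name`, `status`, `issues`), the set of status values, and the expression name `ExprA` are hypothetical, and the input dict stands in for whatever a YAML parser would return.

```python
from dataclasses import dataclass, field

# Hypothetical status values; the real set would be decided by the project.
VALID_STATUSES = {"supported", "partial", "unsupported"}


@dataclass(frozen=True)
class ExpressionStatus:
    name: str    # expression name (hypothetical example: "ExprA")
    status: str  # one of VALID_STATUSES
    # Known compatibility notes keyed by Spark version string.
    issues_by_spark_version: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject typos early so the file stays a reliable source of truth.
        if self.status not in VALID_STATUSES:
            raise ValueError(f"{self.name}: unknown status {self.status!r}")


def from_record(record: dict) -> ExpressionStatus:
    """Build an entry from a plain dict, as a YAML parser would return it."""
    return ExpressionStatus(
        name=record["name"],
        status=record["status"],
        issues_by_spark_version=record.get("issues", {}),
    )


# Made-up record, for illustration only.
example = from_record(
    {
        "name": "ExprA",
        "status": "partial",
        "issues": {"4.0": ["placeholder: behavior not yet audited"]},
    }
)
```

Keying the notes by Spark version is one way to represent version-specific compatibility issues; a flat list with a version field per note would work as well.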
We would also need good contributor guide documentation and/or Claude skills related to this.
Scripts / skills needed:
- Verify that the YAML file(s) are consistent with the current implementation
- Run an audit task against a new Spark version and update the YAML file(s) to track potential issues that need to be investigated
- Generate user-facing documentation from the YAML file(s) (compatibility guide, list of supported expressions)
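The three scripts above could look roughly like the following Python sketch. It is only a starting point under stated assumptions: the registry is a plain dict standing in for the parsed YAML, the keys `status`, `audited_versions`, and `notes` are hypothetical, the set of implemented expressions is passed in rather than discovered from the Scala code, and the expression names are made up.

```python
def check_consistency(registry: dict, implemented: set) -> list:
    """Return mismatches between the status file and the implementation."""
    problems = []
    for name, entry in sorted(registry.items()):
        claims_support = entry["status"] in ("supported", "partial")
        if claims_support and name not in implemented:
            problems.append(f"{name}: marked {entry['status']} but not implemented")
        if not claims_support and name in implemented:
            problems.append(f"{name}: implemented but marked {entry['status']}")
    # Expressions present in the code but absent from the status file.
    for name in sorted(implemented - set(registry)):
        problems.append(f"{name}: implemented but missing from the status file")
    return problems


def pending_audit(registry: dict, spark_version: str) -> list:
    """List expressions not yet audited against the given Spark version."""
    return sorted(
        name
        for name, entry in registry.items()
        if spark_version not in entry.get("audited_versions", [])
    )


def render_markdown(registry: dict) -> str:
    """Render a simple compatibility table for user-facing documentation."""
    lines = ["| Expression | Status | Notes |", "| --- | --- | --- |"]
    for name, entry in sorted(registry.items()):
        lines.append(f"| {name} | {entry['status']} | {entry.get('notes', '')} |")
    return "\n".join(lines)


# Made-up data, for illustration only.
registry = {
    "ExprA": {"status": "supported", "audited_versions": ["3.5"], "notes": ""},
    "ExprB": {"status": "unsupported", "audited_versions": [], "notes": "not started"},
}
implemented = {"ExprA", "ExprC"}

problems = check_consistency(registry, implemented)
to_audit = pending_audit(registry, "4.0")
table = render_markdown(registry)
```

In CI, a non-empty `problems` list could fail the build, while `to_audit` would give the starting worklist when a new Spark version is released.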
Describe the potential solution
No response
Additional context
No response