
PARQUET-2489: Strawman guidance on feature releases #258

Open · wants to merge 19 commits into master

Conversation

emkornfield (Contributor)

Make sure you have checked all steps below.

Jira

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • All public functions and classes in the PR contain Javadoc that explains what they do

Co-authored-by: Rok Mihevc <rok@mihevc.org>
@etseidl (Contributor) left a comment

Thanks for taking the lead on this @emkornfield. It will be very helpful to have this codified.

emkornfield and others added 2 commits June 5, 2024 23:19
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
@emkornfield (Contributor Author)

@etseidl thank you for proofreading

@pitrou (Member) left a comment

I believe this is overly prescriptive wrt. timing of introducing and enabling new features. We should let every implementation choose their support policy, based on their user demographics and their respective targeted use cases.

cc @alamb

CONTRIBUTING.md Outdated

In some cases in addition to library level implementations it is
expected the changes will be justified via integration into a
processing engine to show their viability.
Member:

What is a "processing engine" in this context? Perhaps this is out of scope?

Contributor:

IIUC, the intent here is to demonstrate some real world benefit to justify the cost. For instance, for the recent size statistics addition we probably should have required some benchmark numbers showing a clear benefit to justify the cost of creating and storing them. My GPU workflow was made much faster due to the inclusion of the unencoded byte array size information and could have provided hard numbers demonstrating that. Similarly, it would have been nice to have a pushdown example that was made possible because of the histograms. Simply adding the feature and showing that two implementations produced them in a compatible way was probably insufficient.

Contributor Author:

I tried to rephrase and in a new pull request moved this to step 1.

CONTRIBUTING.md Outdated
library.

2. Forwards compatible. A file written by a new version of the library can be read by an older version
of the library.
Member:

Strictly speaking, this is true of almost no format enhancements, since the additional semantics couldn't be understood under an older version of the format.

Perhaps this should be relaxed a bit? For example:

Forwards compatible. A file written under a newer version of the format can be read under an older version of the format, but some information might be missing or performance might be suboptimal.
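The distinction being debated here can be sketched in a few lines. This is a hypothetical illustration, not a real Parquet API: an old reader can silently ignore an unknown optional metadata field (losing information), while an unknown required codec can only be detected and reported.

```python
# Hypothetical sketch of the compatibility classes under discussion; the
# field and codec names are illustrative, not the real Parquet metadata model.
KNOWN_CODECS = {"UNCOMPRESSED", "SNAPPY", "GZIP"}  # what the "old" reader knows

def read_column_chunk(meta: dict) -> str:
    # Forward compatible: an unknown optional field is simply ignored,
    # so the file is readable but some information may be missing.
    meta.pop("size_statistics", None)  # old reader does not know this field

    # Forward incompatible: an unknown required codec cannot be ignored;
    # the best an old reader can do is detect it and fail with a clear error.
    codec = meta["codec"]
    if codec not in KNOWN_CODECS:
        raise ValueError(f"unsupported compression codec: {codec}")
    return codec

print(read_column_chunk({"codec": "SNAPPY", "size_statistics": {"bytes": 123}}))
# prints "SNAPPY"; an unknown codec would raise instead
```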

Contributor Author:

changed to use your wording (will be present in next commit).

@emkornfield (Contributor Author)

I believe this is overly prescriptive wrt. timing of introducing and enabling new features. We should let every implementation choose their support policy, based on their user demographics and their respective targeted use cases.

Would it address most of your concerns if I replaced the "SHOULD" language? I'll revise to do so and we can hopefully iterate from there.

@alamb (Contributor) left a comment

Thank you very much for writing this up @emkornfield -- it is very much needed and helpful

CONTRIBUTING.md Outdated
Comment on lines 83 to 91
### Releases

The Parquet community aims to do releases of the format package only as needed
when new features are introduced. If multiple new features are being proposed
simultaneously some features might be consolidated into the same release.
Guidance is provided below on when implementations should enable features added
to the specification. Due to confusion in the past over parquet versioning it
is not expected that there will be a 3.0 release of the specification in the
foreseeable future.
Contributor:

Given there seems to be confusion about what even constitutes a parquet "release" maybe we simply should say that the parquet-format has no overall release version identifier and instead is a collection of individual features?

Or maybe we should give it a version date ("parquet-format as of 2024-06-07" etc) 🤔

Contributor Author:

I think at least for parquet-java we need a version identifier compatible with Maven, because I think that is how it is consumed. We can maybe discuss separately just inlining the generated code there as well, as I think that is typically what other distributions do?

Contributor Author:

CC @wgtmac thoughts on this?

Contributor Author:

Thinking more about this, I think releases are also potentially helpful for any third-party java implementations beyond parquet java. I don't have strong feelings here.

Member:

Although the version identifier of parquet-format is Maven-compatible, its versioning semantics are much weaker than those of typical Java libraries. IMO, a new version of parquet-format usually includes a set of new feature additions and format clarifications/corrections, meaning that a new version should be a superset of previous versions. Therefore we should always stick to the latest version of parquet-format wherever possible. An exception is the deprecation of features in the format, where a major version bump is required.

CONTRIBUTING.md Outdated
the feature enabled cannot be read under an older version of the format
(e.g. adding a new compression algorithm).

The Parquet community hopes that new features are widely beneficial to users of
@alamb (Contributor) Jun 7, 2024

I think trying to create a process to roll out a feature across all implementations is an admirable goal; however, I am not sure there is sufficient consensus to include it in this document. If we include this (aspirational) process, I fear there is a non-trivial chance it would not be followed in practice, and thus its inclusion would be confusing.

Thus I suggest we consider not including the rollout in docs

( I think this is similar to what @pitrou is saying here #258 (review))

Contributor Author:

I tried to remove almost all prescriptions that actually require effort out of the suggestions. I feel that if we don't provide some guidance the ecosystem will never really move forward. If possible it would be nice to focus on concrete concerns. Are there any parts of what is documented here that you see as problematic as a maintainer from arrow-rs? Also @pitrou anything you see as specifically problematic from a parquet-cpp perspective?

Contributor:

Thank you for the update. However, it is my opinion that the updated text still adds more potential for confusion than clarity. Specifically, trying to prescribe the rollout cadence across loosely (at best) coordinated implementations in a document is unlikely to change behaviors and is likely to create confusion.

Are there any parts of what is documented here that you see as problematic as a maintainer from arrow-rs?

One specific example is the mention of feature flags. While I am sure we could overcome the technical details, as a maintainer I would not want to be in the position of trying to dictate when a feature should be used by our users -- instead I would imagine that as new features were supported in parquet-rs there would be an appropriate writer configuration that users would decide to enable/disable as they deemed appropriate.

Another potential improvement (orthogonal to this discussion) would be to label new format additions with the date on which they were added to the spec, and perhaps track the date when support was added to other implementations.
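One way to read the point above: a "feature flag" need not be anything more than ordinary writer configuration. A minimal sketch with invented option names follows; no real parquet-rs or parquet-java API is implied.

```python
from dataclasses import dataclass

@dataclass
class WriterProperties:
    # Hypothetical writer configuration: new format features ship disabled,
    # and users opt in according to their own ecosystem's readiness.
    write_size_statistics: bool = False  # forward compatible: old readers ignore it
    use_new_codec: bool = False          # forward incompatible: old readers fail

    def enabled_features(self) -> list[str]:
        # Report which opt-in features this writer will actually use.
        return [name for name, on in vars(self).items() if on]

props = WriterProperties(write_size_statistics=True)
print(props.enabled_features())  # ['write_size_statistics']
```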

Contributor Author:

Specifically, trying to prescribe the rollout cadence across loosely (at best) coordinated implementations in a document is unlikely to change behaviors and is likely to create confusion

Is there specific language that you find problematic? With the latest revision, my intent was to make the recommendation only about preserving compatibility, i.e. forward incompatible changes must wait at least 2 years before an implementation decides to turn them on by default. I wonder if changing the phrasing from MAY to MUST NOT use before would help clarify?

instead I would imagine that as new features were supported in parquet-rs there would be an appropriate writer configuration that users would decide to enable/disable as they deemed appropriate.

I agree. Maybe this is a terminology issue. I view feature flag as a type of configuration, and I think I removed all recommendations for when, if ever, an implementation should turn something on by default (please let me know if something is still ambiguous or maybe I missed something in rephrasing).

Another potential improvement (orthogonal to this discussion) would be to label new format additions with the date on which they were added to the spec, and perhaps track the date when support was added to other implementations

Agreed, I'll add this to the process. I thought the feature matrix that was already under review would be a good place for this? I would like to avoid putting too many dates directly in the spec to avoid confusion.

Contributor:

I took another read -- I think it was easy to miss the sentence that says the rollout dates are guidelines.

Therefore, the Parquet PMC gives the following guidance for changing a feature to be "on" by default:

So I guess I got confused: the text says it is providing guidance and then uses MAY, which suggests to me that the process as a whole is required but the individual steps have options. I think we should just make it clear these are recommended best practices

What about toning it down a bit? Instead of

Forward compatible features/changes MAY be used by default in implementations
   once the parquet-format containing those changes has been formally released.
   For features that may pose a significant performance regression to prior
   format readers, libraries SHOULD consider delaying until 1 year after the
   release of the parquet-java implementation that contains the feature
   implementation.  Implementations MAY choose to do a major version bump when
   turning on a feature in this category.

Something like

Forward compatible features/changes may be used by default in implementations
   once the parquet-format containing those changes has been formally released.
   For features that may pose a significant performance regression to prior
   format readers, libraries should wait 1 year after the
   release of the first parquet-java implementation that contains the feature
   implementation.

I also think the major version bump suggestion is so library specific it is not worth putting in the guidance. The libraries have their own cadence (e.g. parquet-rs and parquet-cpp have major releases based on calendar cycles rather than feature content)

Contributor Author:

I'll try to rephrase to make the issues above clearer.

I also think the major version bump suggestion is so library specific it is not worth putting in the guidance. The libraries have their own cadence (e.g. parquet-rs and parquet-cpp have major releases based on calendar cycles rather than feature content)

My intent here was not to specify a release cadence but to specify that, when changing a forward incompatible feature to on-by-default, the next release must be a major version release; I will try to clarify. I think parquet-rs and parquet-cpp trivially satisfy this given their current release policies.

Contributor Author:

@alamb thanks for the feedback, I've updated to clarify and removed the upper-case SHOULD/MAY language; please let me know if the current iteration still raises concerns.

@pitrou (Member)

pitrou commented Jun 7, 2024

Would it address most of your concerns if I replaced the "SHOULD" language? I'll revise to do so and we can hopefully iterate from there.

I think it could be rephrased as an example of how to deal with feature additions, rather than a recommendation.

emkornfield and others added 3 commits June 7, 2024 09:29
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
Co-authored-by: Andrew Lamb <andrew@nerdnetworks.org>
@emkornfield (Contributor Author)

I think it could be rephrased as an example of how to deal with feature additions, rather than a recommendation.

My goal here is to provide something at least a little bit stronger than just an example; I think we do want to try to align implementations for compatibility purposes and to move the ecosystem forward. Maybe we can continue the conversation on @alamb's comment.

- Soften language on recommendations and add clarifications
- Fix some grammar issues.
discussion concrete. This step is complete when there is lazy consensus. Part
of the consensus is whether it is sufficient to provide 2 working
implementations as outlined in step 2 or if demonstration of the feature with
a down-stream query engine is necessary to justify the feature (e.g.
Member:

This is overly specific IMHO. Not all features are developed in relation with a "down-stream query engine". For example, the BYTE_STREAM_SPLIT additions were motivated by size measurements with different encodings and compression algorithms.

Contributor Author:

Right, I think a lot of features will not need this, per @etseidl's point on a previous revision; this helps avoid extraneous features that sound good in theory but don't provide much value in practice. Other areas where this is more likely to apply could be more complicated indexing schemes.

CONTRIBUTING.md Outdated
the source of truth for the encoding).

3. New compression mechanisms must have a pure Java implementation that can be
used as a dependency in parquet-java.
Member:

FTR, is it the case for all current compression algorithms?

Contributor Author:

I haven't audited it myself; in the discussion it was asserted that this is the case for parquet-java.

CONTRIBUTING.md Outdated
Comment on lines 130 to 131
format readers, libraries should consider delaying default enablement until 1
year after the release of the parquet-java implementation that contains the
Member:

I'm still skeptical that this should suggest a precise delay for third-party implementations. Or, alternatively, replace "should" with "may"?

Contributor Author:

I think this is a point of discussion. I'm OK changing the language either way but would like to get more opinions before doing so. I think specifying some timeline is important, even if non-binding for third party implementations as it gives third-parties a sense of timeline for adopting features.

Member:

Perhaps we can either invite more parties to this discussion, or raise it on the ML?

Contributor Author:

Yes, I'll plan on doing one more pass to address open comments then start a discuss thread on the mailing list (hopefully I will have time to do this today)

CONTRIBUTING.md Outdated
3. Forwards incompatible features/changes should not be turned on by default
until 2 years after the parquet-java implementation containing the feature is
released. It is recommended that changing the default value for a forward
incompatible feature flag be done as part of a major release of an
Member:

Given that "major release" is vague and potentially misleading, I would remove this recommendation as it is not really helpful for implementors, IMHO.

Contributor Author:

I'll see if I can rephrase to something more reasonable otherwise we can remove it.

Contributor Author:

I tried to rephrase by only giving examples, let me know if you think this language is still too strong or still not helpful.

feature implementation.

3. Forwards incompatible features/changes should not be turned on by default
until 2 years after the parquet-java implementation containing the feature is
Member:

Still skeptical about a delay.

Contributor Author:

Per above, I can understand the skepticism, I'd like to get more opinions here and we can update accordingly.

Member:

We could add a shortcut like "or a PMC vote was in favour of it". If something gets really fast adoption, then we could enable it sooner.

How did you come up with "2 years"? Was this a guess or is there deeper reasoning behind it?

Contributor:

My suggestion is to avoid this sentence and just start with "It is recommended that changing the default value for a forward incompatible feature flag should be clearly advertised to consumers..."

Otherwise, this could be read as "any change made to parquet will not even plausibly be used for at least two years". I think it would be a shame if people used this 2 years as a reason to slow down their adoption of new parquet features

I think parquet would be better off if we let market demand drive the proliferation of features rather than trying to control a rollout. The thinking is that if new files begin appearing that are not readable by some system that is lagging, it would be beneficial to add pressure on that system to upgrade its support

Contributor Author:

We could add a shortcut like "or a PMC vote was in favour of it". If something gets really fast adoption, then we could enable it sooner.

I think we should either do away with the guidance or release it. The PMC can always modify the language via a vote if necessary, but I'm not sure there is a good scenario where we would want to actively break the guidance?

How did you come up with "2 years"? Was this a guess or is there deeper reasoning behind it?

Mostly a guesstimate, and it might actually be too short. This assumes at least some downstream dependencies only do major releases on a yearly basis as well, so it leaves time for a downstream system to pick up a release in the following year and then ~1 year for rollout. Taking Spark as an example, it appears to average about 3.5 years between major releases. Minor releases are aimed at every six months, but not everyone is going to use a release immediately. If you look at EMR, ignoring the LTS release of 2.4.8, they are currently supporting 3.5.0 and 3.4.1, which were released ~9 months and ~1 year ago, respectively. Dataproc explicitly supports images for 24 months, but the actual Spark versions are older than on AWS (i.e. 3.3).

If we do a parquet 2.0 release with a breaking change, we will probably learn how long it has taken people to actually adopt some of the newer breaking features.

I think parquet would be better off if we let market demand drive the proliferation of features rather than trying to control a rollout. The thinking is that if new files begin appearing that are not readable by some system that is lagging, it would be beneficial to add pressure on that system to upgrade its support

I think this ignores how long it takes changes to roll through the ecosystem; I detailed more on why 2 years in my reply above. I'd welcome people experimenting with features early, but I think not recommending at least some delay will cause a lot of unnecessary pain.

Contributor:

I think this ignores how long it takes changes to roll through the ecosystem; I detailed more on why 2 years in my reply above

I am quite sure that stable mature enterprise platforms will take 2+ years to adopt any changes. However, this doesn't mean we should offer guidance that the entire ecosystem should follow that pace.

Instead I think we (the parquet community) should offer help / visibility into how far the feature has percolated through the ecosystem (e.g. by a compatibility matrix) rather than some prescriptive proposal of a rollout schedule.
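A compatibility matrix of the kind suggested here could be as simple as a table keyed by feature, recording when each implementation gained support. The sketch below uses an invented structure, and illustrative (not actual) feature names and dates:

```python
from datetime import date

# Hypothetical feature/implementation matrix: record when the spec added a
# feature and when each implementation shipped support, so adoption is visible.
# All names and dates here are placeholders, not real project history.
SUPPORT_MATRIX = {
    "SIZE_STATISTICS": {
        "spec_added": date(2023, 10, 1),
        "parquet-java": date(2024, 5, 1),
        "parquet-rs": None,  # None = not yet supported
    },
}

def implementations_supporting(feature: str) -> list[str]:
    # List implementations that have shipped support for the feature.
    row = SUPPORT_MATRIX[feature]
    return [impl for impl, d in row.items()
            if impl != "spec_added" and d is not None]

print(implementations_supporting("SIZE_STATISTICS"))  # ['parquet-java']
```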

Contributor Author:

The thinking is that if new files begin appearing that are not readable by some system that is lagging, it would be beneficial to add pressure on that system to upgrade its support

Having a not-before date published is a means of applying pressure without breaking anything. I think it is much more reasonable messaging for downstream consumers to say, "you are seeing files that you can't read because you haven't updated your software for a long time". It seems less reasonable to say, "you are seeing files you can't read because a new feature was designed and implemented last month and we decided to make it the default, so you should upgrade your software and/or contribute an implementation to the software you are using". Also, maintainers have differing priorities, motivations, and bandwidth. "Pressuring" them because things are broken immediately, rather than giving a reasonable time frame for implementation, does not seem to me to be a considerate thing to do.

I am quite sure that stable mature enterprise platforms will take 2+ years to adopt any changes. However, this doesn't mean we should offer guidance that the entire ecosystem should follow that pace.

Perhaps there is more word-smithing needed here. This guidance is about when forward incompatible files should start being written by default. I expect that early adopters will opt into features sooner, especially if new features are appropriately publicized. Early adopters are also usually more sophisticated and can better assess the risks for their use case. There is already language here saying users can decide for themselves if they are sure they are using Parquet in a closed ecosystem (maybe it can be strengthened?). I am more concerned about late adopters. I'd love to see new features "on" by default, but I just don't think it serves the broader ecosystem well to do so. As a point of reference, even two years might not be sufficient, as seen by trinodb/trino#7953 (this is likely partially caused by confusion over "V2").

Instead I think we (the parquet community) should offer help / visibility into how far the feature has percolated through the ecosystem (e.g. by a compatibility matrix) rather than some prescriptive proposal of a rollout schedule.

I think both are useful. A matrix doesn't help further adoption without a mechanism for guiding when a feature should be turned on by default. I think it is especially difficult without more contextualized information about the relative popularity of readers (e.g. if a Parquet Haskell implementation is lagging behind on a feature, should that give pause for enablement?).

Ultimately, implementations not under the governance of the Parquet PMC are free to do what they want but I think providing official and consistent guidance here to give some lead time for breaking changes is in the best interest of everyone using Parquet.

To annotate some of the options already suggested on the thread:

  1. Don't provide guidance. I think this will either result in the current status quo, which is to never turn anything on by default, or lead to many incompatibilities across the ecosystem.
  2. PMC votes on when to recommend a feature be turned on (possibly informed by the feature matrix?). I don't think this provides consistency (or, if there is a consistent means of making the decision, we should simply document it here and avoid the overhead of votes).
  3. Time-based guidance. Provides consistent criteria that other implementations can understand and take action on.

Co-authored-by: Antoine Pitrou <pitrou@free.fr>
CONTRIBUTING.md Outdated

3. Forwards incompatible. A file written under a newer version of the format with
the feature enabled cannot be read under an older version of the format (e.g.
adding and using a new compression algorithm).
Member:

One clarifying question here - does "cannot be read under an older version of the format" allow for "cannot be read correctly under an older version of the format"? I suspect we want that be considered a breaking change.

For reference I have the Arrow spec in mind which defines forward compatibility as "An older client library must be able to either read data generated from a new client library or detect that it cannot properly read the data." (emphasis mine). Is the "detect" case what we're calling "Forwards incompatible" here?

Contributor Author:

I tried to clarify this. I think forward incompatible features should provide a signal to old readers so that, ideally, an appropriate error message can be returned.

@emkornfield emkornfield changed the title DRAFT: PARQUET-2489: Strawman guidance on feature releases PARQUET-2489: Strawman guidance on feature releases Jun 26, 2024
@etseidl (Contributor) left a comment

A few more suggestions.

As to the most controversial section, I think the current wording strikes a good balance. Implementers should consider the ramifications of defaulting to breaking changes, and I like the idea of providing a deadline for downstream apps to adapt. Otherwise Parquet will always be trapped in amber, unable to steer users towards beneficial new features because some downstream project refuses to update for a decade. And these are now recommendations (unlike earlier revisions), so an implementation could still decide that a new feature is so beneficial for its user base that it's worth risking compatibility problems (so long as it clearly documents how to revert to non-breaking behavior). Most implementations already seem to be pretty conservative with their choice of defaults, so I don't think this guidance will be too burdensome.

Overall I'm +1 for the release process, and +0.5 for the last section.

emkornfield and others added 3 commits June 26, 2024 14:12
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>
Co-authored-by: Ed Seidl <etseidl@users.noreply.github.com>