Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-3093] Introduce annotations for interface stability in flink-core #1427

Closed

Conversation

rmetzger
Copy link
Contributor

@rmetzger rmetzger commented Dec 1, 2015

Please see #1426

@@ -33,6 +34,7 @@
* @param <U> Type of the pair's first element.
* @param <V> Type of the pair's second element.
*/
@PublicInterface
public abstract class Pair<U extends Key<U>, V extends Key<V>> implements Key<Pair<U, V>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most likely, yes. I've asked Stephan in the JIRA just to make sure.

@aljoscha
Copy link
Contributor

aljoscha commented Dec 7, 2015

Are inner classes of a class tagged as PublicInterface also marked as public/stable?

@StephanEwen
Copy link
Contributor

Thanks for the initiative, Robert. I had a look and here are some thoughts:

General Comments

  • The interface declarations should declare their targets:
    • @publicinterface should have @Target(ElementType.TYPE)
    • @PublicExperimental should have @Target({ ElementType.TYPE, ElementType.METHOD })
  • How about shortening the name @PublicInterface to @Public or @PublicAPI

Stability decisions to make before 1.0

  • We need to think about whether we change the RichGroupReduceFunction to implement the regular combine function (combining into one value) rather than the group combine function. While API breaking, it allows for more efficient pre-aggregations.

Classes that I suggest not to declare public

  • Key (I would suggest to entirely remove that class, separate Pull Requests)
  • Nothing (May even remove that and replace it by Void)
  • InputSplitAssigner : I think we need to change that one a bit to support more fine grained fault tolerance for data sources.
  • ConfigConstants : Not sure if we should encourage people to use this class.
  • AccumulatorHelper : No need to expose this, and might change.
  • IntCounter, LongCounter, DoubleCounter : These are actually "sums", not couners. If we declare them stable, we should at least fix their wrong name ;-)
  • Anything under org.apache.flink.api.common.aggregators. Would be nice to have aggregators unified with accumulators in the long run.
  • For the same reason, we need to mark the Aggregator releated methods in IterationRuntimeContext as experimental.
  • StrictlyLocalAssignment : This is still experimental, we have not yet had users of that and a chance to see it validated.
  • SemanticProperties could be declared public experimental.
  • TypeSerializer and TypeComparator : These are not touched by users usually, and we may want to change them a lot depending on future designs for serialization improvements and intermediate data layout.

The following classes should be declared as stable, in my opinion:

  • GenericInputSplit : Part of the publis stable GenericInputFormat
  • SplittableIterator : This is an exposed class for parallel collection sources.
  • TraversableOnceException : Simple (and virtually empty) exception class thrown into user code by the runtime.

Classes that are good candidates for public API:

  • NumberSequenceIterator : This is a good and stable utility class.
  • XORShiftRandom : This is a good and stable utility class.
  • MemorySegment : If in the future, we allow UDFs to reserve and use managed memory (as suggested by Chengxliang Li's pull request, this will need to be declared stable later anyways)

@hsaputra
Copy link
Contributor

Since this is primarily used to tag Interface or methods, could we just call the annotations as Public and Experimental rather than PublicInterface and PublicExperimental ?

Seemed redundant to me.

@rmetzger
Copy link
Contributor Author

Thank you all for the comments.
I'll rename the interfaces to @Public and @Experimental.

I'll keep the annotation on the ConfigConstants class to ensure that we are not breaking the configuration parameters.
The tool will detect changes to the variable names, not to the values.
Adding @deprecated annotations is not a breaking change, so we can rename configs by adding the new name and Deprecating the old one.

@rmetzger
Copy link
Contributor Author

I didn't make MemorySegment a public API. Lets declare it in the pull request as stable.
I decided to mark XORShiftRandom and NumberSequenceIterator as stable APIs.

@rmetzger
Copy link
Contributor Author

Are inner classes of a class tagged as PublicInterface also marked as public/stable?

Yes!

@rmetzger rmetzger force-pushed the interface_stability_no_maven-core branch from a08ee41 to 8dd0381 Compare December 15, 2015 14:15
@rmetzger
Copy link
Contributor Author

I addressed all issues you've mentioned and rebased the code to the current master. I would like to merge this soon!

@fhueske
Copy link
Contributor

fhueske commented Dec 16, 2015

Hi,

I had a look and have some comments as well

  • ExecutionConfig: Do we want to make the whole class public? Are we sure that we do not change the configuration of certain streaming features such as enableTimestamps(), disableTimestamps(), setAutoWatermarkInterval()? I find these options in combination with TimeCharacteristics very confusing and I'm not sure if we should mark them already stable. I talked to @aljoscha and he's not satisfied with the current implementation as well.
  • ExecutionMode: IMO, the name is quite generic for it's purpose. The mode is is only relevant for network shuffles in the batch programs.
  • JobExecutionResult.getIntCounterResult(): Do we need and want to keep this method? IMO, it should be rather deprecated than marked as stable.
  • Partitioner: Does it make sense to extend the partition(K key, int numPartition) method by a paramter for the current partitionId? I remember I had some use cases for this, but forgot about the details.

We need to carefully check the class hierarchy of public/stable classes. For example, if we make DefaultInputSplitAssigner a Public interface, we don't need to make InputSplitAssigner Experimental because we cannot touch it then. There might be more of these cases in this PR (and PR #1428).

@rmetzger
Copy link
Contributor Author

rmetzger commented Jan 6, 2016

Thanks a lot for looking into this!

In the ExecutionConfig I've made the following methods experimental:

  • setAutoWatermarkInterval
  • enableTimestamps
  • disableTimestamps
  • areTimestampsEnabled
  • getAutoWatermarkInterval
  • setCodeAnalysisMode
  • getCodeAnalysisMode

Regarding the ExecutionMode, I'll bring the issue to the mailing list.

JobExecutionResult.getIntCounterResult() marked Experimental and Deprecated

I also remember that there was an issue like this with the Partitioner. I'll also bring this to the mailing list.

I removed the @Public annotation from DefaultInputSplitAssigner.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants