Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-12881][ml] Add more functionalities for ML Params and ParamInfo class #8776

Closed
wants to merge 1 commit into from

Conversation

xuyang1706
Copy link
Contributor

@xuyang1706 xuyang1706 commented Jun 18, 2019

What is the purpose of the change

Add more functionalities for ML Params and ParamInfo class

Brief change log

  1. change Params$get to support aliases
  • support aliases
  • if the Params contains the specific parameter and alias, but has more than one value or
  • if the Params doesn't contains the specific parameter, while the ParamInfo is optional but has no default value, it will throw exception
  • when isOptional is true, contain(ParamInfo) is true, it will return the value found in Params, whether the value is null or not. when isOptional is true, contain(ParamInfo) is false: hasDefaultValue is true, it will return defaultValue. hasDefaultValue is false, it will throw exception. developer should use contain to check that Params has ParamInfo or not.
  1. add size, clear, isEmpty, contains, fromJson in Params
  2. fix ParamInfo, PipelineStage to adapt new Params
  • assert null in alias
  • use Params$loadJson in PipelineStage
  1. add test cases about aliases

Verifying this change

This change added tests and can be verified as follows:

(example:)

  • add put null in Params

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (yes)
  • If yes, how is the feature documented? (javaDocs)

@flinkbot
Copy link
Collaborator

flinkbot commented Jun 18, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ✅ 1. The [description] looks good.
  • ✅ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ✅ 4. The change fits into the overall [architecture].
  • ✅ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer of PMC member is required Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

for (String nameOrAlias : getParamNameAndAlias(info)) {
if (params.containsKey(nameOrAlias)) {
if (usedParamName != null) {
throw new IllegalArgumentException(String.format("Duplicate parameters of %s and %s",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure about the usefulness of the information in the error message? I suppose usedParamName and nameOrAlias are same here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usedParamName is the first found in the list returned by getParamNameAndAlias(info), so when the map contain the param name or the alias more than one, it will throw exception.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, thanks for the explanation.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please add test case for check it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added to o.a.f.ml.api.misc.ParamsTest$testGetAliasParam

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

*
* example:
*
* in {@link HasPredDetailColName}, it should be optional, and has not default value.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we should not link from flink-ml-api model to a param defined in flink-ml-lib. flink-ml-api should be independent. Please check.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@xuyang1706
Copy link
Contributor Author

@becketqin @walterddr @ex00 @c4emmmm @zentol Thanks for all your comments. It seems all comments have been addressed, please let me know if there are more pending questions.

cc @shaoxuan-wang

Copy link
Contributor

@becketqin becketqin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xuyang1706 Thanks for the patch. It LGTM overall. I gave a pass and left some comment. Can you take a look?


/**
* Definition of a parameter, including name, type, default value, validator and so on.
*
* <p>This class is provided to unify the interaction with parameters.
*
* <p>when isOptional is true, contain(ParamInfo) is true, it will return the value found in Params,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This class is only about definition. It might be better to keep the semantic of contains() and get() in there own method java doc. How about just mention the following here? Also, should we throw exception if a default value is specified for a non-optional parameter?

/**
 * Definition of a parameter, including name, type, default value, validator and so on.
 *
 * <p>A parameter can either be optional or non-optional.
 * <ul>
 *     <li>
 *         A non-optional parameter should not have a default value. Instead, its value must be provided by the users.
 *     </li>
 *     <li>
 *         An optional parameter may or may not have a default value.
 *     </li>
 * </ul>
 *
 * <p>Please see {@link Params#get(ParamInfo)} and {@link Params#contains(ParamInfo)} for more details about the behavior.
 *
 * <p>A parameter may have aliases in addition to the parameter name for convenience and compatibility purposes. One
 * should not set values for both parameter name and an alias. One and only one value should be set either under
 * the parameter name or one of the alias.
 *
 * @param <V> the type of the param value
 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, your description is more clear. The default value in the non-optional parameter will never be used, I think it is not necessary to check this by extra codes.

*/
public boolean isEmpty() {
return params.isEmpty();
}

/**
* Returns the value of the specific parameter, or default value defined in the {@code info} if
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved some of the java doc from ParamInfo to here. See if it is clearer.

	/**
	 * Returns the value of the specific parameter, or default value defined in the {@code info} if
	 * this Params doesn't have a value set for the parameter. An exception will be thrown in the
	 * following cases because no value could be found for the specified parameter.
	 * <ul>
	 *     <li>
	 *         Non-optional parameter: no value is defined in this params for a non-optional parameter.
	 *     </li>
	 *     <li>
	 *         Optional parameter: no value is defined in this params and no default value is defined.
	 *     </li>
	 * </ul>
	 *
	 * @param info the info of the specific parameter, usually with default value
	 * @param <V>  the type of the specific parameter
	 * @return the value of the specific parameter, or default value defined in the {@code info} if
	 * this Params doesn't contain the parameter
	 * @throws IllegalArgumentException if no value can be found for specified parameter
	 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed.

Params newParams = new Params();
newParams.paramMap.putAll(this.paramMap);
return newParams;
public <V> boolean contains(ParamInfo<V> paramInfo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument name is info everywhere else. Can we just use info instead of paramInfo here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your advice, info is better.

}

/**
* Creates and returns a deep clone of this Params.
* Returns <tt>true</tt> if this params has the specified paramInfo.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved some of the java doc from ParamInfo here. See if it is clearer.

	/**
	 * Check whether a value is set for the given {@code paramInfo}. A value is considered set in
	 * the following cases:
	 * <ul>
	 *     <li>
	 *         The given parameter has a value set in this Params, or
	 *     </li>
	 *     <li>
	 *         The given parameter is optional and has a default value defined in the {@code paramInfo}
	 *     </li>
	 * </ul>
	 *
	 * If this method returns false, {@link #get(ParamInfo)} will throw an exception.
	 *
	 * @return <tt>true</tt> if this params has a value set for the specified {@code paramInfo}, false otherwise.
	 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@becketqin , refer your feedback, I change the doc. Could you take a look?

newParams.paramMap.putAll(this.paramMap);
return newParams;
public <V> boolean contains(ParamInfo<V> paramInfo) {
return params.containsKey(paramInfo.getName()) ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This behavior is different from the java doc. The java doc in ParamInfo said the following:

"when isOptional is true, contain(ParamInfo) is true, it will return the value found in Params, ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, changed.

…o class

1. change Params$get to support aliases
 * support aliases
 * if the Params contains the specific parameter and alias, but has more than one value, it will be throw exception
 * if the Params doesn't contains the specific parameter, while the ParamInfo is optional but has no default value, it will throw exception
 * when isOptional is true, contain(ParamInfo) is true, it will return the value found in Params, whether the value is null or not. when isOptional is true, contain(ParamInfo) is false: hasDefaultValue is true, it will return defaultValue. hasDefaultValue is false, it will throw exception. developer should use contain to check that Params has ParamInfo or not.

2. add size, clear, isEmpty, contains, fromJson in Params
3. fix ParamInfo, PipelineStage to adapt new Params
 * assert null in alias
 * use Params$loadJson in PipelineStage

4. add test cases about aliases
@becketqin
Copy link
Contributor

@xuyang1706 Thanks for the update. The latest patch LGTM.
@shaoxuan-wang Do you want to take another look?

@shaoxuan-wang shaoxuan-wang self-assigned this Jun 25, 2019
shaoxuan-wang pushed a commit to shaoxuan-wang/flink that referenced this pull request Jun 25, 2019
@shaoxuan-wang
Copy link
Contributor

Thanks @xuyang1706 for the contribution, and @ex00 @becketqin @c4emmmm for the reviews.
The changes look good to me. I will merge it once the CI passes on my repo.

@flinkbot approve all

Copy link

@ex00 ex00 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thansk @xuyang1706
LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
6 participants