Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove duplicate NormalizeFeatures from FFM trainer #2964

Merged
merged 6 commits into from Mar 18, 2019
Merged
Diff settings

Always

Just for now

@@ -153,15 +153,24 @@ private static JArray BuildInputManifest(IExceptionContext ectx, Type inputType,

// Instantiate a value of the input, to pull defaults out of.
var defaults = Activator.CreateInstance(inputType);

var collectedFields = new HashSet<string>();
var inputs = new List<KeyValuePair<Double, JObject>>();
foreach (var fieldInfo in inputType.GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance))
{
var inputAttr = fieldInfo.GetCustomAttributes(typeof(ArgumentAttribute), false).FirstOrDefault() as ArgumentAttribute;
if (inputAttr == null || inputAttr.Visibility == ArgumentAttribute.VisibilityType.CmdLineOnly)
continue;
var name = inputAttr.Name ?? fieldInfo.Name;
// The order of GetFields is stable, meaning that
// fields are always returned in the same order and for this reason
// unit tests to compare manifest are passing. For the same reason
// duplicate name skipped are always in the same correct order.
// Same name field can bubble up from base class even though
// its overidden / hidden, skip it.
if (collectedFields.Contains(name))

This comment has been minimized.

Copy link
@wschin

wschin Mar 15, 2019

Member

If there are two Features columns and the second one is hidden, which one will be picked up here? I couldn't see any mechanism to filter out hidden fields. Do I miss something? #Resolved

This comment has been minimized.

Copy link
@ganik

ganik Mar 15, 2019

Author Member

this is not about columns, this is about fields in a class. Did you mean fields? If yes, then pls read extensive comment i wrote on top


In reply to: 266176048 [](ancestors = 266176048)

This comment has been minimized.

Copy link
@wschin

wschin Mar 15, 2019

Member

Sorry. Please replace columns with fields in my comment.


In reply to: 266176509 [](ancestors = 266176509,266176048)

This comment has been minimized.

Copy link
@ganik

ganik Mar 15, 2019

Author Member

yes, i got it, pls see the comment i wrote on top


In reply to: 266176749 [](ancestors = 266176749,266176509,266176048)

This comment has been minimized.

Copy link
@wschin

wschin Mar 16, 2019

Member

Do you think this comment is clearer?

                // The fields returned by GetFields are stably ordered starting from derived class' fields
                // and ending up with ones from base classes. Therefore,
                // unit tests to compare manifest are passing as long as the new field's type and name are identical to
                // what it overrides. For the same reason,
                // we can always skip the base filed overridden by another field introduced by "new".

In reply to: 266176749 [](ancestors = 266176749,266176509,266176048)

This comment has been minimized.

Copy link
@ganik

ganik Mar 18, 2019

Author Member

Its not exactly correct. I am not sure that the order of fields are returned from derivatives and then for base classes. I didnt make such statement. Moreover in documentation for GetFields() its said that fields could be returned in any order. The only statements I am making: 1. Order is stable (meaning always the same order of fields) 2. For "currently duplicate" field (and its only one in our case NormalizeFields) order is "correct" with the fix, and that is the derivative is returned first and then the base. Note the difference, I do not state in general that derivative fields are returned first and then base one.


In reply to: 266179947 [](ancestors = 266179947,266176749,266176509,266176048)

This comment has been minimized.

Copy link
@ganik

ganik Mar 18, 2019

Author Member

talked to Wei-Sheng offline, this looks good to him as it is. Thank you


In reply to: 266546642 [](ancestors = 266546642,266179947,266176749,266176509,266176048)

continue;
var jo = new JObject();
jo[FieldNames.Name] = inputAttr.Name ?? fieldInfo.Name;
jo[FieldNames.Name] = name;
jo[FieldNames.Type] = BuildTypeToken(ectx, fieldInfo, fieldInfo.FieldType, catalog);
jo[FieldNames.Desc] = inputAttr.HelpText;
if (inputAttr.Aliases != null)
@@ -262,6 +271,7 @@ private static JArray BuildInputManifest(IExceptionContext ectx, Type inputType,
}

inputs.Add(new KeyValuePair<Double, JObject>(inputAttr.SortOrder, jo));
collectedFields.Add(name);
}
return new JArray(inputs.OrderBy(x => x.Key).Select(x => x.Value).ToArray());
}
@@ -81,7 +81,7 @@ public sealed class Options : TrainerInputBaseWithWeight
/// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)]
public bool Normalize = true;

This comment has been minimized.

Copy link
@Ivanidzo4ka

Ivanidzo4ka Mar 14, 2019

Member

Normalize [](start = 24, length = 9)

I would double check with @wschin #Resolved

This comment has been minimized.

Copy link
@ganik

ganik Mar 14, 2019

Author Member

yep, did


In reply to: 265801014 [](ancestors = 265801014)

public new bool NormalizeFeatures = true;

This comment has been minimized.

Copy link
@wschin

wschin Mar 15, 2019

Member

Why do we need to hide it? Could we set the underlying field to always be false and expose Normalize as is? I feel your previous solution is just what we need (almost). #Closed

This comment has been minimized.

Copy link
@ganik

ganik Mar 15, 2019

Author Member

then there will be 2 Normalize* input fields mentioned in manifest.json and confusion over which one to use.


In reply to: 266176252 [](ancestors = 266176252)

This comment has been minimized.

Copy link
@wschin

wschin Mar 15, 2019

Member

No one cares about entry points this and next weeks, right? I roughly remember filtering out hidden fields is a bit difficult, so want to make your life easier..


In reply to: 266176427 [](ancestors = 266176427,266176252)

This comment has been minimized.

Copy link
@ganik

ganik Mar 15, 2019

Author Member

yes its difficult to filter out hidden fields, but its currently not needed in this PR. It would be good to fix this as NimbusML depends on manifest.json. This duplication will propagate confusion in set of params for FFM class in NimbusML as well as docs.


In reply to: 266176837 [](ancestors = 266176837,266176427,266176252)


/// <summary>
/// Extra feature column names. The column named <see cref="TrainerInputBase.FeatureColumnName"/> stores features from the first field.
@@ -137,7 +137,7 @@ public sealed class Options : TrainerInputBaseWithWeight
/// The <see cref="TrainerInfo"/> containing at least the training data for this trainer.
/// </summary>
TrainerInfo ITrainer.Info => _info;
private static readonly TrainerInfo _info = new TrainerInfo(supportValid: true, supportIncrementalTrain: true);
private static readonly TrainerInfo _info = new TrainerInfo(normalization: false, supportValid: true, supportIncrementalTrain: true);

private int _latentDim;
private int _latentDimAligned;
@@ -225,7 +225,7 @@ private void Initialize(IHostEnvironment env, Options options)
_lambdaLatent = options.LambdaLatent;
_learningRate = options.LearningRate;
_numIterations = options.NumberOfIterations;
_norm = options.Normalize;
_norm = options.NormalizeFeatures;
_shuffle = options.Shuffle;
_verbose = options.Verbose;
_radius = options.Radius;
@@ -10155,26 +10155,6 @@
},
{
"Name": "NormalizeFeatures",
"Type": {
"Kind": "Enum",
"Values": [
"No",
"Warn",
"Auto",
"Yes"
]
},
"Desc": "Normalize option for the feature column",
"Aliases": [
"norm"
],
"Required": false,
"SortOrder": 5.0,
"IsNullable": false,
"Default": "Auto"
},
{
"Name": "Normalize",
"Type": "Bool",
"Desc": "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length",
"Aliases": [
ProTip! Use n and p to navigate between commits in a pull request.
You can’t perform that action at this time.