
Remove duplicate NormalizeFeatures from FFM trainer #2964

Merged
merged 6 commits into from Mar 18, 2019

Conversation

@ganik
Member

ganik commented Mar 14, 2019

Fixes #2958. There are two flags controlling normalization: one applied right before the FFM trainer and one inside it. We decided to disable the former because FFM has its own built-in normalization that handles multiple feature columns, while the other normalization only works with a single feature column.

/// <summary>
/// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)]
public bool Normalize = true;


@Ivanidzo4ka

Ivanidzo4ka Mar 14, 2019

Member

> Normalize

I would double check with @wschin #Resolved


@ganik

ganik Mar 14, 2019

Author Member

yep, did


In reply to: 265801014

@ganik ganik requested a review from wschin Mar 14, 2019

@wschin

wschin approved these changes Mar 14, 2019

@wschin
Member

wschin left a comment

Wait... @ganik discussed it with me offline but it seems I reached a wrong conclusion. Let me think about it again..

@ganik

Member Author

ganik commented Mar 14, 2019

> Wait... @ganik discussed it with me offline but it seems I reached a wrong conclusion. Let me think about it again..

sure #Closed

Gani Nazirov added some commits Mar 15, 2019

@@ -225,7 +220,7 @@ private void Initialize(IHostEnvironment env, Options options)
_lambdaLatent = options.LambdaLatent;
_learningRate = options.LearningRate;
_numIterations = options.NumberOfIterations;
-_norm = options.Normalize;
+_norm = (options.NormalizeFeatures == NormalizeOption.Yes);
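The change above maps the shared tri-state NormalizeFeatures option onto the trainer's internal boolean. A minimal Python sketch of that mapping (the enum values and the helper name here are illustrative, not the actual ML.NET definitions):

```python
from enum import Enum

class NormalizeOption(Enum):
    # Mirrors the tri-state normalization flag discussed in this PR
    # (names are illustrative, not the exact ML.NET definitions).
    Auto = 0
    No = 1
    Yes = 2

def resolve_norm(normalize_features: NormalizeOption) -> bool:
    # The fix maps the shared tri-state flag onto the trainer's internal
    # boolean: only an explicit Yes turns normalization on, so Auto
    # defaults to off, per Wei-Sheng's guidance in this thread.
    return normalize_features == NormalizeOption.Yes

print(resolve_norm(NormalizeOption.Yes))   # True
print(resolve_norm(NormalizeOption.Auto))  # False
```

Note the design choice this encodes: Auto no longer silently enables normalization, which is what made the old `public bool Normalize = true;` default a duplicate of the upstream normalization step.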


@Ivanidzo4ka

Ivanidzo4ka Mar 15, 2019

Member

> NormalizeFeatures

Just curious: why did you decide not to have NormalizeOption.Auto here? #Resolved


@ganik

ganik Mar 15, 2019

Author Member

By default, normalization should be off, per Wei-Sheng.


In reply to: 266086395


@Ivanidzo4ka

Ivanidzo4ka Mar 15, 2019

Member

> public bool Normalize = true;

So this was a mistake which we are fixing. OK, if @wschin says so.


In reply to: 266129940

@Ivanidzo4ka
Member

Ivanidzo4ka left a comment

:shipit:

Gani Nazirov added some commits Mar 15, 2019

@ganik ganik requested a review from TomFinley Mar 15, 2019

@ganik

Member Author

ganik commented Mar 15, 2019

Please don't merge yet! #Closed

@codecov


codecov bot commented Mar 15, 2019

Codecov Report

Merging #2964 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master    #2964      +/-   ##
==========================================
+ Coverage   72.29%   72.29%   +<.01%     
==========================================
  Files         796      796              
  Lines      142349   142354       +5     
  Branches    16051    16052       +1     
==========================================
+ Hits       102905   102911       +6     
  Misses      35063    35063              
+ Partials     4381     4380       -1
Flag Coverage Δ
#Debug 72.29% <100%> (ø) ⬆️
#production 68.01% <100%> (ø) ⬆️
#test 88.48% <ø> (ø) ⬆️
Impacted Files Coverage Δ
...soft.ML.EntryPoints/JsonUtils/JsonManifestUtils.cs 83.66% <100%> (+0.23%) ⬆️
...actorizationMachine/FactorizationMachineTrainer.cs 88.47% <100%> (ø) ⬆️
src/Microsoft.ML.Transforms/CategoricalCatalog.cs 100% <0%> (ø) ⬆️
src/Microsoft.ML.Data/Training/TrainerInputBase.cs 100% <0%> (ø) ⬆️
...icrosoft.ML.FastTree/RandomForestClassification.cs 78.07% <0%> (ø) ⬆️
...soft.ML.FastTree/Training/EarlyStoppingCriteria.cs 71.73% <0%> (ø) ⬆️
src/Microsoft.ML.FastTree/TreeTrainersCatalog.cs 94.18% <0%> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeRanking.cs 48.19% <0%> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeTweedie.cs 56.29% <0%> (ø) ⬆️
src/Microsoft.ML.FastTree/FastTreeRegression.cs 54.5% <0%> (ø) ⬆️
... and 8 more
#Closed

@Ivanidzo4ka Ivanidzo4ka changed the title Remove duplicate NormalizeFeatures from FFM trainer WIP Remove duplicate NormalizeFeatures from FFM trainer Mar 15, 2019

@Ivanidzo4ka

Member

Ivanidzo4ka commented Mar 15, 2019

> Please don't merge yet!

Sure, I put WIP into the PR title; feel free to take it out after you're done. #Closed

// Fields are returned in a stable order, so any skipped
// duplicate-name fields are always skipped in the same, correct order.
// A same-name field can bubble up from a base class even though
// it is overridden / hidden; skip it.
if (collectedFields.Contains(name))
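The skip logic above can be sketched outside .NET reflection. A minimal Python analogue (the class names and the helper are hypothetical) that walks the hierarchy derived-first and keeps only the first field seen per name:

```python
def collect_fields(cls):
    """Walk a class hierarchy from the derived class toward its bases,
    keeping only the first field seen for each name.

    This mirrors the manifest-building logic discussed in this PR: when
    a derived class hides a base field (C#'s `new` modifier), the derived
    field is encountered first, so the base duplicate is skipped.
    Illustrative sketch only; the real code uses .NET reflection.
    """
    collected = []
    seen = set()
    for klass in cls.__mro__:          # derived class first, then bases
        for name in vars(klass):
            if name.startswith("_"):   # ignore private/dunder attributes
                continue
            if name in seen:           # duplicate bubbling up from a base
                continue               # class: it is hidden, skip it
            seen.add(name)
            collected.append((name, getattr(klass, name)))
    return collected

class TrainerInputBase:
    NormalizeFeatures = "Auto"         # stand-in for the base-class field

class FfmOptions(TrainerInputBase):
    NormalizeFeatures = True           # hides the base field

print(collect_fields(FfmOptions))      # → [('NormalizeFeatures', True)]
```

The key property the thread debates is ordering: this sketch gets derived-before-base for free from the MRO, whereas GetFields() documents no such guarantee, which is exactly ganik's caveat below.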


@wschin

wschin Mar 15, 2019

Member

If there are two Features columns and the second one is hidden, which one will be picked up here? I couldn't see any mechanism to filter out hidden fields. Am I missing something? #Resolved


@ganik

ganik Mar 15, 2019

Author Member

This is not about columns; it's about fields in a class. Did you mean fields? If yes, please read the extensive comment I wrote on top.


In reply to: 266176048


@wschin

wschin Mar 15, 2019

Member

Sorry. Please replace columns with fields in my comment.


In reply to: 266176509


@ganik

ganik Mar 15, 2019

Author Member

Yes, I got it; please see the comment I wrote on top.


In reply to: 266176749


@wschin

wschin Mar 16, 2019

Member

Do you think this comment is clearer?

                // The fields returned by GetFields are stably ordered starting from derived class' fields
                // and ending up with ones from base classes. Therefore,
                // unit tests to compare manifest are passing as long as the new field's type and name are identical to
                // what it overrides. For the same reason,
                // we can always skip the base field overridden by another field introduced by "new".

In reply to: 266176749


@ganik

ganik Mar 18, 2019

Author Member

That's not exactly correct. I am not sure that fields are returned derived class first and then base classes; I didn't make such a statement. Moreover, the documentation for GetFields() says that fields may be returned in any order. The only statements I am making are: 1. The order is stable (always the same order of fields). 2. For the currently duplicated field (and there is only one in our case, NormalizeFeatures), the order is "correct" with the fix, i.e. the derived field is returned first and then the base one. Note the difference: I do not state in general that derived fields are returned first and then base ones.


In reply to: 266179947


@ganik

ganik Mar 18, 2019

Author Member

Talked to Wei-Sheng offline; this looks good to him as it is. Thank you.


In reply to: 266546642

@@ -81,7 +81,7 @@ public sealed class Options : TrainerInputBaseWithWeight
/// Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length.
/// </summary>
[Argument(ArgumentType.AtMostOnce, HelpText = "Whether to normalize the input vectors so that the concatenation of all fields' feature vectors is unit-length", ShortName = "norm", SortOrder = 6)]
-public bool Normalize = true;
+public new bool NormalizeFeatures = true;


@wschin

wschin Mar 15, 2019

Member

Why do we need to hide it? Could we set the underlying field to always be false and expose Normalize as is? I feel your previous solution is just what we need (almost). #Closed


@ganik

ganik Mar 15, 2019

Author Member

Then there would be two Normalize* input fields in manifest.json, and confusion over which one to use.


In reply to: 266176252


@wschin

wschin Mar 15, 2019

Member

No one cares about entry points this week and next, right? I roughly remember that filtering out hidden fields is a bit difficult, so I want to make your life easier..


In reply to: 266176427


@ganik

ganik Mar 15, 2019

Author Member

Yes, it's difficult to filter out hidden fields, but it's not needed in this PR. It would still be good to fix, since NimbusML depends on manifest.json; the duplication would propagate confusion into the FFM class's parameter set in NimbusML, as well as into the docs.


In reply to: 266176837

@ganik ganik changed the title WIP Remove duplicate NormalizeFeatures from FFM trainer Remove duplicate NormalizeFeatures from FFM trainer Mar 18, 2019

@ganik ganik merged commit 0835c52 into dotnet:master Mar 18, 2019

3 checks passed

MachineLearning-CI #20190315.23 succeeded
Details
MachineLearning-CodeCoverage #20190315.23 succeeded
Details
license/cla All CLA requirements met.
Details