Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Normalize documentation #3244

Merged
merged 12 commits into from Apr 15, 2019

Conversation

Ivanidzo4ka
Copy link
Contributor

towards #1209

@codecov
Copy link

codecov bot commented Apr 8, 2019

Codecov Report

Merging #3244 into master will decrease coverage by <.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3244      +/-   ##
==========================================
- Coverage   72.65%   72.64%   -0.01%     
==========================================
  Files         807      807              
  Lines      145191   145191              
  Branches    16223    16223              
==========================================
- Hits       105485   105480       -5     
- Misses      35290    35293       +3     
- Partials     4416     4418       +2
Flag Coverage Δ
#Debug 72.64% <ø> (-0.01%) ⬇️
#production 68.17% <ø> (-0.01%) ⬇️
#test 88.97% <ø> (ø) ⬆️
Impacted Files Coverage Δ
src/Microsoft.ML.Transforms/NormalizerCatalog.cs 84.78% <ø> (ø) ⬆️
src/Microsoft.ML.Maml/MAML.cs 24.75% <0%> (-1.46%) ⬇️
...soft.ML.Data/DataLoadSave/Text/TextLoaderCursor.cs 84.7% <0%> (-0.21%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️

foreach (var row in columnFixZero)
Console.WriteLine(string.Join(", ", row.Select(x => x.ToString("f4"))));
// 1.0000, 0.0000, 1.0000, 0.0000
// 0.5000, 0.5000, 0.0000, 0.5000
Copy link
Contributor

@artidoro artidoro Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: could you add (here and in other files)
// Expected output: #Resolved

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect expected values are incorrect: input: 2, 2, 2, 0 minmaxnormalized returns 1,1,1,0 or am I missing something?

var transformParams = (normalizeTransform.GetNormalizerModelParameters(0) as Microsoft.ML.Transforms.NormalizingTransformer.BinNormalizerModelParameters<ImmutableArray<float>>);
Console.WriteLine($"Values for slot 0 would be transfromed by applying y = (Index(x) / {transformParams.Density[0]}) - {(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[0])}");
Console.WriteLine("Where Index(x) is index of bin to which x belongs");
Console.WriteLine($"Bins upper borders are: {string.Join(" ", transformParams.UpperBounds[0])}");
Copy link
Contributor

@artidoro artidoro Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

borders [](start = 43, length = 7)

Maybe would suggest to use "bounds" instead of "borders". #Resolved

// 0.0000, -0.5000, 0.0000, 1.0000
// 0.0000, -0.5000, 0.0000, 1.0000

// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for function.
Copy link
Contributor

@artidoro artidoro Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for function. [](start = 118, length = 13)

.... as parameter to GetNormalizerModelParameters. #Resolved

// 0.0000, -0.5000, 0.0000, 1.0000

// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for function.
// If case if we have multiple column transformations we need to pass index of InputOutputColumnPair.
Copy link
Contributor

@artidoro artidoro Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If case [](start = 15, length = 8)

Remove "If case", ere and in other files. #Resolved

// If case if we have multiple column transformations we need to pass index of InputOutputColumnPair.
var transformParams = (normalizeTransform.GetNormalizerModelParameters(0) as Microsoft.ML.Transforms.NormalizingTransformer.AffineNormalizerModelParameters<ImmutableArray<float>>);
Console.WriteLine($"Values for slot 1 would be transfromed by applying y= (x - ({(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[1])})) * {transformParams.Scale[1]}");
// Values for slot 1 would be transfromed by applying y= (x - (-1)) * 0.3333333
Copy link
Contributor

@artidoro artidoro Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

y= [](start = 66, length = 3)

could you add a space after "y"? #Resolved

var normalize = mlContext.Transforms.NormalizeMeanVariance("Features", useCdf: true);

// NormalizeMeanVariance normalizes the data based on the computed mean and variance of the data.
var normalizeNoCdf = mlContext.Transforms.NormalizeMeanVariance("Features", useCdf: false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not repeat what NormalizeMeanVariance does.


// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on
// to which bin original value belong but make sure zero values would remain zero after normalization.
// Helps preserve sparsity.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I would not repeat this explanation. Here and also in other samples.

Console.WriteLine("Where Index(x) is index of bin to which x belongs");
Console.WriteLine($"Bins upper borders are: {string.Join(" ", fixZeroParams.UpperBounds[1])}");
// Values for slot 1 would be transfromed by applying y = (Index(x) / 2) - 0.5
// Where Index(x) is index of bin to which x belongs
Copy link
Contributor

@artidoro artidoro Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where Index(x) is index of bin to which x belongs [](start = 15, length = 49)

// Where Index(x) is the index of the bin to which x belongs" #Resolved

var normalize = mlContext.Transforms.NormalizeBinning("Features", maximumBinCount: 4, fixZero: false);

// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on
// to which bin original value belong but make sure zero values would remain zero after normalization.
Copy link
Contributor

@zeahmed zeahmed Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duplicate comment...see line 26-27 #Closed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh I see you are making another estimator here...


In reply to: 273736107 [](ancestors = 273736107)


// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters.
// If we have multiple column transformations we need to pass index of InputOutputColumnPair.
var transformParams = (normalizeTransform.GetNormalizerModelParameters(0) as Microsoft.ML.Transforms.NormalizingTransformer.BinNormalizerModelParameters<ImmutableArray<float>>);
Copy link
Contributor

@zeahmed zeahmed Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Microsoft.ML.Transforms [](start = 89, length = 23)

Can we make it a using statement? This line is getting very long. #Resolved

// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters.
// If we have multiple column transformations we need to pass index of InputOutputColumnPair.
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as CdfNormalizerModelParameters<ImmutableArray<float>>;
Console.WriteLine($"Values for slot 1 would be transfromed by applying y = 0.5* (1 + ERF((Math.Log(x)- {transformParams.Mean[1]}) / ({transformParams.StandardDeviation[1]} * sqrt(2)))" );
Copy link
Contributor

@zeahmed zeahmed Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slot 1 [](start = 43, length = 6)

What are slot 0 and slot 1? In this particular sample this one and below are both slot 1? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rephrased


In reply to: 273739363 [](ancestors = 273739363)

// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters.
// If we have multiple column transformations we need to pass index of InputOutputColumnPair.
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as AffineNormalizerModelParameters<ImmutableArray<float>>;
Console.WriteLine($"Values for slot 1 would be transfromed by applying y = (x - ({(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[1])})) * {transformParams.Scale[1]}");
Copy link
Contributor

@zeahmed zeahmed Apr 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

slot 1 [](start = 43, length = 6)

see my comments in other sample. #Resolved


namespace Samples.Dynamic
{
public class NormalizeBinning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NormalizeBinning [](start = 17, length = 16)

one line comment about what this does.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see that pattern in our documentation


In reply to: 273786125 [](ancestors = 273786125)

// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on
// to which bin original value belong but make sure zero values would remain zero after normalization.
// Helps preserve sparsity.
var normalizeFixZero = mlContext.Transforms.NormalizeBinning("Features", maximumBinCount: 4, fixZero: true);
Copy link
Member

@sfilipi sfilipi Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixZero: true [](start = 105, length = 13)

love the variation on the parameter! #WontFix

// Expected output:
// 1.0000, 0.6667, 1.0000, 0.0000
// 0.6667, 1.0000, 0.6667, 0.0000
// 0.3333, 0.3333, 0.3333, 0.0000
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.3333 [](start = 24, length = 6)

I'd emphasize this value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get your comment.


In reply to: 273786583 [](ancestors = 273786583)

// 0.3333, 0.0000, 0.3333, 0.0000
// 0.0000, -0.3333, 0.0000, 1.0000

// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to pass 0, the index of this column in the dataview, as parameter for..

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No


In reply to: 273786759 [](ancestors = 273786759)

// 0.0000, -0.3333, 0.0000, 1.0000

// Let's get transformation parameters. Since we work with only one column we need to pass 0 as parameter for GetNormalizerModelParameters.
// If we have multiple column transformations we need to pass index of InputOutputColumnPair.
Copy link
Member

@sfilipi sfilipi Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

multiple column transformations [](start = 26, length = 31)

multiple columns #Resolved

Copy link
Member

@sfilipi sfilipi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

var mlContext = new MLContext();
var samples = new List<DataPoint>()
{
new DataPoint(){ Features = new float[4] { 1, 1, 3, 0} },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe no need to have two columns with the same exact values.

/// [!code-csharp[NormalizeMinMax](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/NormalizeMinMax.cs)]
/// ]]>
/// </format>
/// </example>
public static NormalizingEstimator NormalizeMinMax(this TransformsCatalog catalog, InputOutputColumnPair[] columns,
Copy link
Member

@sfilipi sfilipi Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NormalizeMinMax [](start = 43, length = 15)

you're not invoking this API. I would not put the example here. #Resolved

@@ -137,7 +165,7 @@ public static class NormalizationCatalog
/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[Normalize](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Normalizer.cs)]
/// [!code-csharp[NormalizeLogMeanVariance](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/NormalizeLogMeanVariance.cs)]
Copy link
Member

@sfilipi sfilipi Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[!code-csharp[NormalizeLogMeanVariance] [](start = 12, length = 39)

you're not invoking this API. I would not put the example here. #Resolved

@@ -175,6 +210,13 @@ public static class NormalizationCatalog
/// <param name="maximumExampleCount">Maximum number of examples used to train the normalizer.</param>
/// <param name="fixZero">Whether to map zero to zero, preserving sparsity.</param>
/// <param name="maximumBinCount">Maximum number of bins (power of 2 recommended).</param>
/// <example>
Copy link
Member

@sfilipi sfilipi Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're not invoking this API. I would not put the example here. #Resolved

/// [!code-csharp[NormalizeBinning](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Transforms/NormalizeSupervisedBinning.cs)]
/// ]]>
/// </format>
/// </example>
Copy link
Member

@sfilipi sfilipi Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[](start = 21, length = 1)

you're not invoking this API. I would not put the example here. #Resolved

using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.NormalizingTransformer;
Copy link
Contributor

@shmoradims shmoradims Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? #ByDesign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, otherwise I need to write Microsoft.ML.Transforms.NormalizingTransformer.BinNormalizerModelParameters which takes too much space


In reply to: 274057369 [](ancestors = 274057369)

// If we have multiple column transformations we need to pass index of InputOutputColumnPair.
var transformParams = normalizeTransform.GetNormalizerModelParameters(0) as BinNormalizerModelParameters<ImmutableArray<float>>;
Console.WriteLine($"Values for slot 0 would be transfromed by applying y = (Index(x) / {transformParams.Density[0]}) - {(transformParams.Offset.Length == 0 ? 0 : transformParams.Offset[0])}");
Console.WriteLine("Where Index(x) is the index of the bin to which x belongs");
Copy link
Contributor

@shmoradims shmoradims Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please break super-long lines into two lines #Resolved

using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;
using static Microsoft.ML.Transforms.NormalizingTransformer;
Copy link
Contributor

@shmoradims shmoradims Apr 10, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sing static Microsoft.ML.Transforms.NormalizingTransformer; [](start = 1, length = 59)

ditto #Resolved

Copy link
Contributor

@shmoradims shmoradims left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

var normalize = mlContext.Transforms.NormalizeBinning("Features", maximumBinCount: 4, fixZero: false);

// NormalizeBinning normalizes the data by constructing equidensity bins and produce output based on
// to which bin original value belong but make sure zero values would remain zero after normalization.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please move the details to similar to your other PR

@Ivanidzo4ka Ivanidzo4ka merged commit 96b3b2a into dotnet:master Apr 15, 2019
@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 23, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants