Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Internalize and cleanup recommender project #2451

Merged

Conversation

Ivanidzo4ka
Copy link
Contributor

@Ivanidzo4ka Ivanidzo4ka commented Feb 7, 2019

fixes #2276

@codecov
Copy link

codecov bot commented Feb 7, 2019

Codecov Report

❗ No coverage uploaded for pull request base (master@3b1358e). Click here to learn what that means.
The diff coverage is 76.06%.

@@            Coverage Diff            @@
##             master    #2451   +/-   ##
=========================================
  Coverage          ?   71.21%           
=========================================
  Files             ?      786           
  Lines             ?   140989           
  Branches          ?    16114           
=========================================
  Hits              ?   100408           
  Misses            ?    36112           
  Partials          ?     4469
Flag Coverage Δ
#Debug 71.21% <76.06%> (?)
#production 67.57% <70.21%> (?)
#test 85.3% <100%> (?)

@Ivanidzo4ka Ivanidzo4ka changed the title [WIP] Internalize recommender project Internalize recommender project Feb 7, 2019
/// <summary>
/// Returns right approximation matrix.
/// </summary>
/// <returns>Flattened into one dimension array with size equal to <see cref="ApproximationRank"/> * <see cref="NumberOfColumns"/></returns>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dimension [](start = 40, length = 9)

dimensional #Closed


/// <summary>
/// Approximation rank.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approximation rank. [](start = 16, length = 19)

Rank of approximation matrixes? #Closed

@Ivanidzo4ka Ivanidzo4ka changed the title Internalize recommender project Internalize and cleanup recommender project Feb 7, 2019
private readonly int _approximationRank;
// Packed _numberOfRows by _approximationRank matrix.
///<summary> The number of rows.</summary>
public readonly int NumberOfRows;
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is one line summary a new convention? I remember we used to have

<summary>
...
</summary>
``` #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can still use old one. This one is nice if you have short description.


In reply to: 254906400 [](ancestors = 254906400)

/// Returns left approximation matrix.
/// </summary>
/// <returns>Flattened into one-dimensional array with size equal to <see cref="NumberOfRows"/> * <see cref="ApproximationRank"/></returns>
public float[] GetLeftFactorMatrix()
{
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any test for the two public functions, GetLeftFactorMatrix and GetRightFactorMatrix? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I modified one of tests see MatrixFactorizationTests.cs


In reply to: 254907007 [](ancestors = 254907007)

/// Returns left approximation matrix.
/// </summary>
/// <returns>Flattened into one-dimensional array with size equal to <see cref="NumberOfRows"/> * <see cref="ApproximationRank"/></returns>
public float[] GetLeftFactorMatrix()
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float[] [](start = 15, length = 7)

Do we want to use IReadOnlyList to avoid active copy? #Resolved


public ColumnType OutputType { get { return NumberType.Float; } }
private ColumnType OutputType { get { return NumberType.Float; } }
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

{ get { return NumberType.Float; } } [](start = 38, length = 36)

Maybe => NumberType.Float? #Resolved

@wschin
Copy link
Member

wschin commented Feb 7, 2019

    {

We have a public Save here. Do we have a public Load or something similar? #Resolved


Refers to: src/Microsoft.ML.Recommender/MatrixFactorizationPredictor.cs:167 in 67e9f4d. [](commit_id = 67e9f4d, deletion_comment = False)


internal MatrixFactorizationPredictor(IHostEnvironment env, SafeTrainingAndModelBuffer buffer, KeyType matrixColumnIndexType, KeyType matrixRowIndexType)
internal MatrixFactorizationModelParameters(IHostEnvironment env, SafeTrainingAndModelBuffer buffer, KeyType matrixColumnIndexType, KeyType matrixRowIndexType)
{
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a public constructor for creating a MatrixFactorizationModelParameters from its public fields? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we don't.
See #1866


In reply to: 254909189 [](ancestors = 254909189)

/// <remarks>
/// Two values are allowed, <see cref="LossFunctionType.SquareLossRegression"/> or <see cref="LossFunctionType.SquareLossOneClass"/>.
/// The <see cref="LossFunctionType.SquareLossRegression"/> means traditional collaborative filtering problem with squared loss.
/// The <see cref="LossFunctionType.SquareLossOneClass"/> triggers one-class matrix factorization for implicit-feedback recommendation problem.
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add For the first case, please see Equation (1) in [https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf](https://www.csie.ntu.edu.tw/~cjlin/papers/libmf/mf_adaptive_pakdd.pdf) for the formulation considered. For the second case, please see Equation (3) in [http://yifanhu.net/PUB/cf.pdf](http://yifanhu.net/PUB/cf.pdf`). #Closed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add that to loss functions


In reply to: 254911243 [](ancestors = 254911243)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you check them as remarks on loss functions enum?


In reply to: 254911715 [](ancestors = 254911715,254911243)

/// </summary>
/// <remarks>
/// It's the weight of factor matrices' norms in the objective function minimized by matrix factorization's algorithm. A small value could cause over-fitting.
/// </remarks>
[Argument(ArgumentType.AtMostOnce, HelpText = "Regularization parameter. " +
"It's the weight of factor matrices' norms in the objective function minimized by matrix factorization's algorithm. " +
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

norms [](start = 53, length = 5)

Frobenius Norm #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you trying to summon demon in my PR?


In reply to: 254911433 [](ancestors = 254911433)

/// Rank of approximation matrixes.
/// </summary>
/// <remarks>
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank.
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m by n [](start = 42, length = 6)

m-by-n #Resolved

/// Rank of approximation matrixes.
/// </summary>
/// <remarks>
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank.
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

m by k [](start = 91, length = 6)

m-by-k #Resolved

/// Rank of approximation matrixes.
/// </summary>
/// <remarks>
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank.
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

k by n [](start = 102, length = 6)

k-by-n #Resolved

/// Rank of approximation matrixes.
/// </summary>
/// <remarks>
/// If input data has size of m by n we would build two approximation matrixes m by k and k by n where k is approxiation rank.
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approxiation [](start = 120, length = 12)

approximation. #Resolved


/// <summary>
/// Number of thread will be used during training. If unspecified all aviable threads will be use.
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thread [](start = 26, length = 6)

threads #Resolved

/// <![CDATA[
/// [!code-csharp[MatrixFactorization](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Recommendation/MatrixFactorization.cs)]
/// ]]></format>
/// </example>
public MatrixFactorizationTrainer MatrixFactorization(
Copy link
Member

@wschin wschin Feb 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MatrixFactorization [](start = 46, length = 19)

Loss function is an important parameter. Could we have it here? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as soon as I add loss function I need to add rest of option parameters.
We have another overload with options, if user want to specify loss functions, he can use it.


In reply to: 254913045 [](ancestors = 254913045)

@Ivanidzo4ka
Copy link
Contributor Author

    {

@artidoro has PR which would tackle public Save methods.


In reply to: 461635021 [](ancestors = 461635021)


Refers to: src/Microsoft.ML.Recommender/MatrixFactorizationPredictor.cs:167 in 67e9f4d. [](commit_id = 67e9f4d, deletion_comment = False)

_host.Assert(_leftFactorMatrix.Length == _numberOfRows * _approximationRank);
_host.Assert(_rightFactorMatrix.Length == _numberofColumns * _approximationRank);
buffer.Get(out NumberOfRows, out NumberOfColumns, out ApproximationRank, out var leftFactorMatrix, out var rightFactorMatrix);
LeftFactorMatrix = Array.AsReadOnly<float>(leftFactorMatrix);
Copy link
Member

@wschin wschin Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Array.AsReadOnly [](start = 31, length = 23)

Not sure if casting is necessary. #Resolved

Copy link
Member

@wschin wschin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

var rightMatrix = model.Model.RightFactorMatrix;
Assert.Equal(leftMatrix.Count, model.Model.NumberOfRows * model.Model.ApproximationRank);
Assert.Equal(rightMatrix.Count, model.Model.NumberOfColumns * model.Model.ApproximationRank);

Copy link
Member

@wschin wschin Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least Assert.Equal() the first and the last elements in the returned arrays. #Resolved

};

var pipeline = mlContext.Recommendation().Trainers.MatrixFactorization(options);

// Train a matrix factorization model.
var model = pipeline.Fit(data);

// Les's validate content of the model.
Copy link
Member

@wschin wschin Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Les's [](start = 15, length = 5)

Let's #Resolved

public ColumnType MatrixColumnIndexType { get; }
public ColumnType MatrixRowIndexType { get; }
internal ColumnType MatrixColumnIndexType { get; }
internal ColumnType MatrixRowIndexType { get; }
Copy link
Contributor

@artidoro artidoro Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in Tom's last internalization PR, he left the methods that return the types of the input/output matrices public.
It could make sense that one wants to access that after, say after having loaded a previously trained model to know what type of data it can score. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tom PR wasn't about INTENALIZE EVERYTHING I CAN SEE, right?
I ask him and this is his reply:

For now probably internal.
Unless you have some scenario that requires it.

I don't see such scenarios, @wschin do you see such?


In reply to: 255193733 [](ancestors = 255193733)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough :)


In reply to: 255211305 [](ancestors = 255211305,255193733)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for asking Tom!


In reply to: 255211957 [](ancestors = 255211957,255211305,255193733)

internal string MatrixColumnIndexColumnName { get; }
internal string MatrixRowIndexColumnName { get; }
internal ColumnType MatrixColumnIndexColumnType { get; }
internal ColumnType MatrixRowIndexColumnType { get; }
Copy link
Contributor

@artidoro artidoro Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark here. Although I am not sure if it makes sense to make it available in both places? #ByDesign

/// See <a href="http://yifanhu.net/PUB/cf.pdf">Equation</a> (3).
/// </remarks>
SquareLossOneClass = 12
};

public sealed class Options
Copy link
Contributor

@artidoro artidoro Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

public [](start = 8, length = 6)

Can we have a line of xml here too? #Resolved

@@ -403,7 +483,7 @@ private SafeTrainingAndModelBuffer PrepareBuffer()
/// <param name="validationData">The validation data set.</param>
public MatrixFactorizationPredictionTransformer Train(IDataView trainData, IDataView validationData = null)
{
Copy link
Contributor

@artidoro artidoro Feb 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this private? Or internal? #ByDesign

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this one needed for the case when you train with validation data.


In reply to: 255196397 [](ancestors = 255196397)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad thank you!


In reply to: 255197330 [](ancestors = 255197330,255196397)

Copy link
Contributor

@artidoro artidoro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

var rightMatrix = model.Model.RightFactorMatrix;
Assert.Equal(leftMatrix.Count, model.Model.NumberOfRows * model.Model.ApproximationRank);
Assert.Equal(rightMatrix.Count, model.Model.NumberOfColumns * model.Model.ApproximationRank);
Assert.Equal(leftMatrix[0], (double)0.3091519,5);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leftMatrix[0] [](start = 25, length = 13)

@wschin do you know why I get
on Mac:
Expected: 0.41953 (rounded from 0.419525027275085)
Actual: 0.30915 (rounded from 0.3091519)
on Linux:
Expected: 0.51506 (rounded from 0.515059947967529)
Actual: 0.30915 (rounded from 0.3091519)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and how you handle that discrepancy between mac/linux and windows before?


In reply to: 255225321 [](ancestors = 255225321)

@Ivanidzo4ka Ivanidzo4ka merged commit 48dffb0 into dotnet:master Feb 8, 2019
@dotnet dotnet locked as resolved and limited conversation to collaborators Mar 24, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lockdown Microsoft.ML.Recommender public surface
3 participants