New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lockdown HAL Project #2497

Merged
merged 12 commits into from Feb 15, 2019

Conversation

Projects
None yet
4 participants
@Ivanidzo4ka
Copy link
Member

Ivanidzo4ka commented Feb 11, 2019

Fixes ##2267

/// <param name="weights">The weights column.</param>
/// <param name="labelColumn">The name of label column.</param>
/// <param name="featureColumn">The name of feature column.</param>
/// <param name="weightColumn">The name of weight column.</param>

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

weight [](start = 51, length = 6)

optional! #Closed

/// <example>
/// <format type="text/markdown">
/// <![CDATA[
/// [!code-csharp[ScoreTensorFlowModel](~/../docs/samples/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/Regression/OrdinaryLeastSquaresWithOptions.cs)]

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 11, 2019

Author Member

ScoreTensorFlowModel [](start = 26, length = 20)

oops #Closed

@codecov

This comment has been minimized.

Copy link

codecov bot commented Feb 12, 2019

Codecov Report

❗️ No coverage uploaded for pull request base (master@b2587fa). Click here to learn what that means.
The diff coverage is 86.95%.

@@            Coverage Diff            @@
##             master    #2497   +/-   ##
=========================================
  Coverage          ?   71.45%           
=========================================
  Files             ?      801           
  Lines             ?   141840           
  Branches          ?    16139           
=========================================
  Hits              ?   101352           
  Misses            ?    36019           
  Partials          ?     4469
Flag Coverage Δ
#Debug 71.45% <86.95%> (?)
#production 67.74% <81.25%> (?)
#test 85.52% <100%> (?)

// Leave out 10% of data for testing
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);
// Get DefaultBinaryPipine for adult dataset and append SymSGD to it.

This comment has been minimized.

@zeahmed

zeahmed Feb 12, 2019

Member

DefaultBinaryPipine [](start = 19, length = 19)

typo. #Closed

return reader.Read(dataFile);
}

public static IEstimator<ITransformer> DefaultBinaryPipeline(MLContext mlContext)

This comment has been minimized.

@zeahmed

zeahmed Feb 12, 2019

Member

DefaultBinaryPipeline [](start = 47, length = 21)

DefaultBinaryPipelineForAdultDataSet???

Its confusing, unless I see this method definition here. The usage seems like we are creating binary pipeline in general. #Closed

"occupation", "relationship", "ethnicity", "native-country", "age", "education-num",
"capital-gain", "capital-loss", "hours-per-week"))
// Min-max normalized all the features
.Append(mlContext.Transforms.Normalize("Features"));

This comment has been minimized.

@zeahmed

zeahmed Feb 12, 2019

Member

(mlContext.Transforms.Normalize("Features") [](start = 22, length = 43)

There is no need to normalize OneHotEncoded variables as they already range between 0-1. Here we should only normalize numeric features. #Closed


// Leave out 10% of data for testing
var (trainData, testData) = mlContext.BinaryClassification.TrainTestSplit(data, testFraction: 0.1);
// Get DefaultBinaryPipine for adult dataset and append SymSGD to it.

This comment has been minimized.

@zeahmed

zeahmed Feb 12, 2019

Member

DefaultBinaryPipine [](start = 19, length = 19)

typo. #Closed

public static void Example()
{
// Downloading a regression dataset from github.com/dotnet/machinelearning
// this will create a housing.txt file in the filsystem this code will run

This comment has been minimized.

@zeahmed

zeahmed Feb 12, 2019

Member

this code will run [](start = 68, length = 18)

This sentence does not convey anything? #Closed

This comment has been minimized.

@zeahmed

zeahmed Feb 12, 2019

Member

I feel we should have some sort of template kind of thing for all the code and comments because most of the code and comments are shared across many different samples. What do you say? maybe tt template?
cc: @sfilipi, @rogancarr, @shmoradims


In reply to: 255788213 [](ancestors = 255788213)

{
public sealed class SymbolicStochasticGradientDescent
{
/// <summary>

This comment has been minimized.

@sfilipi

sfilipi Feb 12, 2019

Member

/// <su [](start = 8, length = 7)

no need for xml style comments #Resolved

// In this examples we will use the adult income dataset. The goal is to predict
// if a person's income is above $50K or not, based on different pieces of information about that person.
// For more details about this dataset, please see https://archive.ics.uci.edu/ml/datasets/adult

This comment has been minimized.

@sfilipi

sfilipi Feb 12, 2019

Member

i'd move this comment above the method. #Resolved

namespace Microsoft.ML.Samples.Dynamic
{
public sealed class SymbolicStochasticGradientDescent
{

This comment has been minimized.

@sfilipi

sfilipi Feb 12, 2019

Member

cc @shmoradims, @jwood803 so you don't work on the same. #Closed


printHelper(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), whitening);

// Features column obtained post-transformation.

This comment has been minimized.

@sfilipi

sfilipi Feb 12, 2019

Member

column [](start = 24, length = 6)

i think we should come up with a convention to print vector columns differently from columns of the dataset.
I'll take a look at what .Net does.
cc @shmoradims, @rogancarr

maybe put in [ ]

//[ -0.394 -0.318 -0.243 -0.168 0.209 0.358 0.433 0.589 0.873 2.047 ]

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

I like the suggestion of adding [...]


In reply to: 255982945 [](ancestors = 255982945)

/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
/// </summary>
public static void Example()

This comment has been minimized.

@artidoro

artidoro Feb 13, 2019

Contributor

Xml #Resolved

public sealed class SymbolicStochasticGradientDescent
{
/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>

This comment has been minimized.

@artidoro

artidoro Feb 13, 2019

Contributor

Could you replace "requires" instead of "require"? #Resolved

@artidoro

This comment has been minimized.

Copy link
Contributor

artidoro commented Feb 13, 2019

Should we make this a normal integer, with default value? Otherwise could you explain what the default is in the xml comment? #Resolved


Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False)

@artidoro

This comment has been minimized.

Copy link
Contributor

artidoro commented Feb 13, 2019

I see this applies to quite a few instances in this class, and probably also in many others.
Would you suggest to file this as an issue and work on it in a separate PR?


In reply to: 463122337 [](ancestors = 463122337)


Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False)

Ivanidzo4ka added some commits Feb 13, 2019

@Ivanidzo4ka

This comment has been minimized.

Copy link
Member Author

Ivanidzo4ka commented Feb 15, 2019

Think about this nullable values as something which can be tuned by algorithm. If it's specified we will use it, and if we not, we will run code to pick best possible value.


In reply to: 463122866 [](ancestors = 463122866,463122337)


Refers to: src/Microsoft.ML.HalLearners/SymSgdClassificationTrainer.cs:52 in 1527369. [](commit_id = 1527369, deletion_comment = False)

/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
/// </summary>
public static void Example()

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

Can you remove the xml? #Resolved

/// <summary>
/// This example require installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>
/// </summary>
public static void Example()

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

Here too, could you remove the xml? #Resolved

printHelper(nameof(SamplesUtils.DatasetUtils.SampleVectorOfNumbersData.Features), whitening);

// Features column obtained post-transformation.
// -0.979 0.867 1.449 1.236

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

0 [](start = 23, length = 1)

I think this is a vector column, if so, could you use senja's suggestion of printing the vector inside brackets: [....]

/// <param name="labelColumn">The labelColumn column.</param>
/// <param name="featureColumn">The features column.</param>
/// <param name="weights">The weights column.</param>
/// <param name="labelColumn">The name of label column.</param>

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

label [](start = 50, length = 5)

Could you add 'the' after of in these xml comments?
I think it sounds better like this: "The name of the label column.", and I think that's how we have done it in other xml doc. #Resolved

@@ -60,9 +74,18 @@ public static class HalLearnersCatalog
/// <param name="catalog">The <see cref="BinaryClassificationCatalog"/>.</param>
/// <param name="labelColumn">The labelColumn column.</param>

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

labelColumn [](start = 42, length = 11)

Could you make the change you did above here too? Specifying that this is the name of the label column, instead of just saying label column.
#Resolved

@@ -87,9 +89,6 @@ internal OlsLinearRegressionTrainer(IHostEnvironment env, Options options)
protected override RegressionPredictionTransformer<OlsLinearRegressionModelParameters> MakeTransformer(OlsLinearRegressionModelParameters model, DataViewSchema trainSchema)
=> new RegressionPredictionTransformer<OlsLinearRegressionModelParameters>(Host, model, trainSchema, FeatureColumn.Name);

public RegressionPredictionTransformer<OlsLinearRegressionModelParameters> Train(IDataView trainData, IPredictor initialPredictor = null)
=> TrainTransformer(trainData, initPredictor: initialPredictor);

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

Why are we removing this? #Resolved

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 15, 2019

Author Member

Because it's not used at all.


In reply to: 257348853 [](ancestors = 257348853)

score = float.MaxValue;
editor.Values[i] = score;
}
weights = editor.Commit();

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

It seems that this is lost in the base method? Do we need it? #Resolved

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 15, 2019

Author Member

See #1850


In reply to: 257350745 [](ancestors = 257350745)

// Setting the seed to a fixed number in this examples to make outputs deterministic.
var mlContext = new MLContext(seed: 0);

// Download and featurize the dataset

This comment has been minimized.

@zeahmed

zeahmed Feb 15, 2019

Member

Download and featurize the dataset [](start = 15, length = 34)

I saw @rogancarr comment regarding ending every comment with period. #Closed

@@ -0,0 +1,39 @@
namespace Microsoft.ML.Samples.Dynamic
{
public sealed class SymbolicStochasticGradientDescent

This comment has been minimized.

@zeahmed

zeahmed Feb 15, 2019

Member

SymbolicStochasticGradientDescent [](start = 24, length = 33)

We also need to decide the fate of the classes in sample project. I see some of the classes are static some are sealed.
cc: @rogancarr, @sfilipi, @shmoradims #Resolved

/// Continue to train <paramref name="initialPredictor"/> on <paramref name="trainData"/> and produce calibrated binary linear model.
/// </summary>
/// <param name="trainData">The training data set.</param>
/// <param name="initialPredictor">Accepts <see cref="LinearBinaryModelParameters"/> to continue training of this model</param>
public BinaryPredictionTransformer<TPredictor> Train(IDataView trainData, TPredictor initialPredictor = null)

This comment has been minimized.

@artidoro

artidoro Feb 15, 2019

Contributor

IDataView [](start = 61, length = 9)

You might need to merge with master. I should have added a doc for this. #Resolved

// Create a new ML context, for ML.NET operations. It can be used for exception tracking and logging,
// as well as the source of randomness.
var mlContext = new MLContext(seed: 3);
// Creating a data reader, based on the format of the data

This comment has been minimized.

@zeahmed

zeahmed Feb 15, 2019

Member

newline please. #Closed

public sealed class SymbolicStochasticGradientDescent
{

// This example requires installation of addition nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>

This comment has been minimized.

@zeahmed

zeahmed Feb 15, 2019

Member

addition [](start = 49, length = 8)

addition -> additional

correct it everywhere. #Closed

Console.WriteLine($"{colName} column obtained post-transformation.");
foreach (var row in column)
Console.WriteLine($"{string.Join(" ", row.DenseValues().Select(x => x.ToString("f3")))} ");
};

This comment has been minimized.

@zeahmed

zeahmed Feb 15, 2019

Member

Can it go to SamplesUtils? #Closed

This comment has been minimized.

@Ivanidzo4ka

Ivanidzo4ka Feb 15, 2019

Author Member

Eventually. Not in this PR


In reply to: 257356248 [](ancestors = 257356248)

This comment has been minimized.

@zeahmed

zeahmed Feb 15, 2019

Member

ok.


In reply to: 257356644 [](ancestors = 257356644,257356248)

Ivanidzo4ka added some commits Feb 15, 2019

@artidoro
Copy link
Contributor

artidoro left a comment

:shipit:

{
public static class SymbolicStochasticGradientDescent
{
// This example requires installation of additional nuget package <a href="https://www.nuget.org/packages/Microsoft.ML.HalLearners/">Microsoft.ML.HalLearners</a>

This comment has been minimized.

@zeahmed

zeahmed Feb 15, 2019

Member

[](start = 165, length = 4)

period here as well.

@zeahmed
Copy link
Member

zeahmed left a comment

Overall it LGTM. Thanks for doing it.

@Ivanidzo4ka Ivanidzo4ka merged commit 78b161b into dotnet:master Feb 15, 2019

3 checks passed

MachineLearning-CI #20190215.22 succeeded
Details
MachineLearning-CodeCoverage #20190215.22 succeeded
Details
license/cla All CLA requirements met.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment