Introduce examples for pipeline api. #677

Merged
merged 5 commits into from Aug 20, 2018
Changes from 2 commits
43 changes: 43 additions & 0 deletions test/Microsoft.ML.Tests/Scenarios/PipelineApi/CrossValidation.cs
@@ -0,0 +1,43 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using Xunit;

namespace Microsoft.ML.Tests.Scenarios.PipelineApi
{
    public partial class PipelineApiScenarioTests
    {
        /// <summary>
        /// Cross-validation: Have a mechanism to do cross validation, that is, you come up with
        /// a data source (optionally with stratification column), come up with an instantiable transform
        /// and trainer pipeline, and it will handle (1) splitting up the data, (2) training the separate
        /// pipelines on in-fold data, (3) scoring on the out-fold data, (4) returning the set of
        /// evaluations and optionally trained pipes. (People always want metrics out of xfold,
        /// they sometimes want the actual models too.)
        /// </summary>
        [Fact]
        void CrossValidation()
        {
            var dataPath = GetDataPath(SentimentDataPath);
            var testDataPath = GetDataPath(SentimentDataPath);
@Zruty0 (Contributor), Aug 14, 2018

testDataPath [](start = 16, length = 12)

remove line #Closed

            var pipeline = new LearningPipeline();

            pipeline.Add(MakeSentimentTextLoader(dataPath));
@Zruty0 (Contributor), Aug 14, 2018

Add [](start = 21, length = 3)

TextLoader ? #Closed


            pipeline.Add(MakeSentimentTextTransform());

            pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

            pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });

            var cv = new CrossValidator().CrossValidate<SentimentData, SentimentPrediction>(pipeline);
            var metrics = cv.BinaryClassificationMetrics[0];
            var singlePrediction = cv.PredictorModels[0].Predict(new SentimentData() { SentimentText = "Not big fan of this." });
            Assert.True(singlePrediction.Sentiment);
        }
    }
}
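Steps (1) to (3) of the summary in CrossValidation.cs depend on how rows are assigned to folds. The sketch below shows one simple way to do that split without ML.NET. It is only an illustration under stated assumptions, not what `CrossValidator` does internally: `KFoldSketch` and `Split` are hypothetical names, rows are assigned round-robin, and no stratification column is handled.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Microsoft.ML.
public static class KFoldSketch
{
    // Builds numFolds (Train, Test) pairs. Row i is held out in fold i % numFolds
    // and used for training in every other fold.
    public static List<(List<T> Train, List<T> Test)> Split<T>(IReadOnlyList<T> rows, int numFolds)
    {
        if (rows == null) throw new ArgumentNullException(nameof(rows));
        if (numFolds < 2) throw new ArgumentOutOfRangeException(nameof(numFolds));

        var folds = new List<(List<T> Train, List<T> Test)>();
        for (int fold = 0; fold < numFolds; fold++)
        {
            var train = new List<T>();
            var test = new List<T>();
            for (int i = 0; i < rows.Count; i++)
            {
                if (i % numFolds == fold)
                    test.Add(rows[i]);   // out-fold data: scored, never trained on
                else
                    train.Add(rows[i]);  // in-fold data: used to train this fold's pipeline
            }
            folds.Add((train, test));
        }
        return folds;
    }
}
```

Each pair would then be trained and scored separately, which is why the test above indexes into `cv.BinaryClassificationMetrics` and `cv.PredictorModels`: there is one entry per fold.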
41 changes: 41 additions & 0 deletions test/Microsoft.ML.Tests/Scenarios/PipelineApi/Metacomponents.cs
@@ -0,0 +1,41 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using Xunit;

namespace Microsoft.ML.Tests.Scenarios.PipelineApi
{
    public partial class PipelineApiScenarioTests
    {
        /// <summary>
        /// Meta-components: Meta-components (e.g., components that themselves instantiate components) should not be booby-trapped.
        /// When specifying what trainer OVA should use, a user will be able to specify any binary classifier.
        /// If they specify a regression or multi-class classifier ideally that should be a compile error.
        /// </summary>
        [Fact]
        void Metacomponents()
        {
            var dataPath = GetDataPath(IrisDataPath);
            var pipeline = new LearningPipeline(seed: 1, conc: 1);
            pipeline.Add(new TextLoader(dataPath).CreateFrom<IrisData>(useHeader: false));
            pipeline.Add(new Dictionarizer(new[] { "Label" }));
            pipeline.Add(new ColumnConcatenator(outputColumn: "Features",
                "SepalLength", "SepalWidth", "PetalLength", "PetalWidth"));

            pipeline.Add(OneVersusAll.With(new StochasticDualCoordinateAscentBinaryClassifier()));
@Zruty0 (Contributor), Aug 14, 2018

With [](start = 38, length = 4)

does it accept multiclass? #Closed

Contributor

write a comment that this is checked at Train time, so technically this example is not possible with LP


In reply to: 210040514 [](ancestors = 210040514)


            var model = pipeline.Train<IrisData, IrisPrediction>();

            var testData = new TextLoader(dataPath).CreateFrom<IrisData>(useHeader: false);
            var evaluator = new ClassificationEvaluator();
            ClassificationMetrics metrics = evaluator.Evaluate(model, testData);

            var prediction = model.Predict(new IrisData { PetalLength = 1, PetalWidth = 2, SepalLength = 1.4f, SepalWidth = 1.6f });
        }
    }
}
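The summary in Metacomponents.cs asks that passing a non-binary trainer to OVA be a compile error, while the review thread notes that `LearningPipeline` only checks this at Train time. The sketch below shows how a marker interface could move that check to compile time. Every type in it is hypothetical and exists only for this illustration; none of it is the `OneVersusAll` API.

```csharp
using System;

// All types below are hypothetical and exist only for this sketch.
public interface ITrainerSketch
{
    string Name { get; }
}

// Marker interface: only binary classifiers implement it.
public interface IBinaryTrainerSketch : ITrainerSketch
{
}

public sealed class BinaryTrainerSketch : IBinaryTrainerSketch
{
    public string Name => "binary";
}

public sealed class RegressionTrainerSketch : ITrainerSketch
{
    public string Name => "regression";
}

public sealed class OneVersusAllSketch
{
    public string InnerTrainerName { get; }

    private OneVersusAllSketch(string innerTrainerName)
    {
        InnerTrainerName = innerTrainerName;
    }

    // The parameter type restricts callers to binary trainers, so a mismatch
    // is reported by the compiler instead of at training time.
    public static OneVersusAllSketch With(IBinaryTrainerSketch trainer)
    {
        if (trainer == null) throw new ArgumentNullException(nameof(trainer));
        return new OneVersusAllSketch(trainer.Name);
    }
}
```

With these definitions, `OneVersusAllSketch.With(new BinaryTrainerSketch())` compiles, whereas `OneVersusAllSketch.With(new RegressionTrainerSketch())` is rejected by the compiler because `RegressionTrainerSketch` does not implement `IBinaryTrainerSketch`.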
@@ -0,0 +1,138 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Runtime.Api;
using Microsoft.ML.Runtime.Internal.Utilities;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Xunit;

namespace Microsoft.ML.Tests.Scenarios.PipelineApi
{
    public partial class PipelineApiScenarioTests
    {
        /// <summary>
        /// Multi-threaded prediction. A twist on "Simple train and predict", where we account for the fact that
        /// multiple threads may want predictions at the same time. Because we deliberately do not
        /// reallocate internal memory buffers on every single prediction, the PredictionEngine
        /// (or its estimator/transformer based successor) is, like most stateful .NET objects,
        /// fundamentally not thread safe. This is deliberate and as designed. However, some mechanism
        /// to enable multi-threaded scenarios (e.g., a web server servicing requests) should be possible
        /// and performant in the new API.
        /// </summary>
        [Fact]
        void MultithreadedPrediction()
        {
            var dataPath = GetDataPath(SentimentDataPath);
            var testDataPath = GetDataPath(SentimentDataPath);
            var pipeline = new LearningPipeline();

            pipeline.Add(MakeSentimentTextLoader(dataPath));

            pipeline.Add(MakeSentimentTextTransform());

            pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

            pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });
            var model = pipeline.Train<SentimentData, SentimentPrediction>();
            var modelName = "multithreadModel.zip";
            DeleteOutputPath(modelName);
            model.WriteAsync(modelName);
@Zruty0 (Contributor), Aug 14, 2018

model [](start = 12, length = 5)

await? #Closed

            var collection = new List<SentimentData>();
            int numExamples = 100;
            for (int i = 0; i < numExamples; i++)
            {
                collection.Add(new SentimentData() { SentimentText = "Let's predict this one!" });
            }

            var lockEngine = new LockBasedPredictionEngine(model);
            Parallel.ForEach(collection, (input) => lockEngine.Predict(input));
            var threadEngine = new ThreadLocalBasedPredictionEngine(modelName);
            var poolEngine = new PoolBasedPredictionEngine(modelName);
            Parallel.ForEach(collection, (input) => threadEngine.Predict(input));
            Parallel.ForEach(collection, (input) => poolEngine.Predict(input));
        }

        /// <summary>
        /// This is a trivial implementation of a thread-safe prediction engine that is just guarded by a lock.
        /// </summary>
        private sealed class LockBasedPredictionEngine
@Zruty0 (Contributor), Aug 14, 2018

class [](start = 23, length = 5)

remove custom classes #Closed

        {
            private readonly PredictionModel<SentimentData, SentimentPrediction> _model;

            public LockBasedPredictionEngine(PredictionModel<SentimentData, SentimentPrediction> model)
            {
                _model = model;
            }

            public SentimentPrediction Predict(SentimentData input)
            {
                lock (_model)
                {
                    return _model.Predict(input);
                }
            }
        }

        /// <summary>
        /// This is an implementation of a thread-safe prediction engine that works by instantiating one model per worker thread.
        /// </summary>
        private sealed class ThreadLocalBasedPredictionEngine
        {
            private readonly ThreadLocal<PredictionModel<SentimentData, SentimentPrediction>> _engine;

            public ThreadLocalBasedPredictionEngine(string modelFile)
            {
                _engine = new ThreadLocal<PredictionModel<SentimentData, SentimentPrediction>>(
                    () =>
                    {
                        var model = PredictionModel.ReadAsync<SentimentData, SentimentPrediction>(modelFile);
                        model.Wait();
                        return model.Result;
                    });
            }

            public SentimentPrediction Predict(SentimentData input)
            {
                return _engine.Value.Predict(input);
            }
        }

        /// <summary>
        /// This is an implementation of a thread-safe prediction engine that works by keeping a pool of allocated
        /// <see cref="PredictionModel{TInput, TOutput}"/> objects that is grown as needed.
        /// </summary>
        private sealed class PoolBasedPredictionEngine
        {
            private readonly MadeObjectPool<PredictionModel<SentimentData, SentimentPrediction>> _enginePool;

            public PoolBasedPredictionEngine(string modelFile)
            {
                _enginePool = new MadeObjectPool<PredictionModel<SentimentData, SentimentPrediction>>(
                    () =>
                    {
                        var model = PredictionModel.ReadAsync<SentimentData, SentimentPrediction>(modelFile);
                        model.Wait();
                        return model.Result;
                    });
            }

            public SentimentPrediction Predict(SentimentData features)
            {
                var engine = _enginePool.Get();
                try
                {
                    return engine.Predict(features);
                }
                finally
                {
                    _enginePool.Return(engine);
                }
            }
        }
    }
}
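`PoolBasedPredictionEngine` above relies on `MadeObjectPool`, which is not defined in this diff. The sketch below shows the get/return contract that engine appears to assume, using only the .NET base class library. It is an assumption-laden stand-in, not the actual `MadeObjectPool`: `SimplePoolSketch` is a hypothetical name, and the real type may grow or trim differently.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the pool used by PoolBasedPredictionEngine.
public sealed class SimplePoolSketch<T> where T : class
{
    private readonly ConcurrentBag<T> _items = new ConcurrentBag<T>();
    private readonly Func<T> _factory;

    public SimplePoolSketch(Func<T> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
    }

    // Number of idle objects currently held by the pool.
    public int Count => _items.Count;

    // Reuses an idle object if one exists; otherwise creates a new one,
    // so the pool grows only when all existing objects are in use.
    public T Get()
    {
        return _items.TryTake(out var item) ? item : _factory();
    }

    // Hands the object back so another caller can reuse it.
    public void Return(T item)
    {
        if (item == null) throw new ArgumentNullException(nameof(item));
        _items.Add(item);
    }
}
```

The `try`/`finally` in `PoolBasedPredictionEngine.Predict` matters with a pool of this shape: if `Return` were skipped when a prediction throws, that model would never be reused and every later call on that path would load a fresh one.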
@@ -0,0 +1,62 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Runtime.Api;
using Microsoft.ML.TestFramework;
using Xunit.Abstractions;

namespace Microsoft.ML.Tests.Scenarios.PipelineApi
{
    public partial class PipelineApiScenarioTests : BaseTestClass
    {
        public PipelineApiScenarioTests(ITestOutputHelper output) : base(output)
        {
        }

        public const string IrisDataPath = "iris.data";
        public const string SentimentDataPath = "wikipedia-detox-250-line-data.tsv";
        public const string SentimentTestPath = "wikipedia-detox-250-line-test.tsv";

        public class IrisData : IrisDataNoLabel
        {
            [Column("0")]
            public string Label;
        }

        public class IrisDataNoLabel
        {
            [Column("1")]
            public float SepalLength;

            [Column("2")]
            public float SepalWidth;

            [Column("3")]
            public float PetalLength;

            [Column("4")]
            public float PetalWidth;
        }

        public class IrisPrediction
        {
            public float[] Score;
        }

        public class SentimentData
        {
            [ColumnName("Label")]
            public bool Sentiment;
            public string SentimentText;
        }

        public class SentimentPrediction
        {
            [ColumnName("PredictedLabel")]
            public bool Sentiment;

            public float Score;
        }
    }
}
@@ -0,0 +1,88 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Runtime;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;
using Xunit;

namespace Microsoft.ML.Tests.Scenarios.PipelineApi
{
    public partial class PipelineApiScenarioTests
    {
        /// <summary>
        /// Start with a dataset in a text file. Run text featurization on text values.
        /// Train a linear model over that. (I am thinking sentiment classification.)
        /// Out of the result, produce some structure over which you can get predictions programmatically
        /// (e.g., the prediction does not happen over a file as it did during training).
        /// </summary>
        [Fact]
        void SimpleTrainAndPredict()
        {
            var dataPath = GetDataPath(SentimentDataPath);
            var testDataPath = GetDataPath(SentimentDataPath);
            var pipeline = new LearningPipeline();

            pipeline.Add(MakeSentimentTextLoader(dataPath));
@Zruty0 (Contributor), Aug 13, 2018

MakeSentimentTextLoader [](start = 25, length = 23)

can you not use TextLoader somehow? #Closed

Contributor Author

You want text loader with arguments? Or what?


In reply to: 209770697 [](ancestors = 209770697)


            pipeline.Add(MakeSentimentTextTransform());

            pipeline.Add(new FastTreeBinaryClassifier() { NumLeaves = 5, NumTrees = 5, MinDocumentsInLeafs = 2 });

            pipeline.Add(new PredictedLabelColumnOriginalValueConverter() { PredictedLabelColumn = "PredictedLabel" });
            var model = pipeline.Train<SentimentData, SentimentPrediction>();
            var testData = MakeSentimentTextLoader(testDataPath);
@Zruty0 (Contributor), Aug 14, 2018

MakeSentimentTextLoader [](start = 27, length = 23)

yuck #Closed

@Zruty0 (Contributor), Aug 14, 2018

rename testData to testHandle?.. testSomething. It's not data


In reply to: 210042025 [](ancestors = 210042025)

            var evaluator = new BinaryClassificationEvaluator();
            var metrics = evaluator.Evaluate(model, testData);

            var singlePrediction = model.Predict(new SentimentData() { SentimentText = "Not big fan of this."});
            Assert.True(singlePrediction.Sentiment);
        }

        private static TextLoader MakeSentimentTextLoader(string dataPath)
        {
            return new TextLoader(dataPath)
            {
                Arguments = new TextLoaderArguments
                {
                    Separator = new[] { '\t' },
                    HasHeader = true,
                    Column = new[]
                    {
                        new TextLoaderColumn()
                        {
                            Name = "Label",
                            Source = new [] { new TextLoaderRange(0) },
                            Type = Data.DataKind.Num
                        },

                        new TextLoaderColumn()
                        {
                            Name = "SentimentText",
                            Source = new [] { new TextLoaderRange(1) },
                            Type = Data.DataKind.Text
                        }
                    }
                }
            };
        }

        private static TextFeaturizer MakeSentimentTextTransform()
        {
            return new TextFeaturizer("Features", "SentimentText")
            {
                KeepDiacritics = false,
                KeepPunctuations = false,
                TextCase = TextNormalizerTransformCaseNormalizationMode.Lower,
                OutputTokens = true,
                StopWordsRemover = new PredefinedStopWordsRemover(),
                VectorNormalizer = TextTransformTextNormKind.L2,
                CharFeatureExtractor = new NGramNgramExtractor() { NgramLength = 3, AllLengths = false },
                WordFeatureExtractor = new NGramNgramExtractor() { NgramLength = 2, AllLengths = true }
            };
        }
    }
}