IGNITE-10480: [ML] Stacking for training and inference by artemmalykh · Pull Request #5635 · apache/ignite

artemmalykh · 2018-12-11T10:54:24Z

No description provided.

# Conflicts: # modules/ml/src/main/java/org/apache/ignite/ml/dataset/PartitionDataBuilder.java # modules/ml/src/main/java/org/apache/ignite/ml/dataset/UpstreamTransformerChain.java # modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/local/LocalDatasetBuilder.java

# Conflicts: # modules/ml/src/main/java/org/apache/ignite/ml/clustering/kmeans/KMeansTrainer.java # modules/ml/src/main/java/org/apache/ignite/ml/dataset/impl/cache/util/ComputeUtils.java # modules/ml/src/main/java/org/apache/ignite/ml/knn/regression/KNNRegressionTrainer.java # modules/ml/src/main/java/org/apache/ignite/ml/math/functions/IgniteFunction.java # modules/ml/src/main/java/org/apache/ignite/ml/math/primitives/vector/VectorUtils.java # modules/ml/src/main/java/org/apache/ignite/ml/regressions/logistic/multiclass/LogRegressionMultiClassTrainer.java # modules/ml/src/main/java/org/apache/ignite/ml/svm/SVMLinearMultiClassClassificationTrainer.java # modules/ml/src/main/java/org/apache/ignite/ml/trainers/DatasetTrainer.java # modules/ml/src/test/java/org/apache/ignite/ml/TestUtils.java # modules/ml/src/test/java/org/apache/ignite/ml/trainers/BaggingTest.java

avplatonov · 2018-12-11T11:31:36Z

modules/ml/src/main/java/org/apache/ignite/ml/Model.java

+     * @param <X> Type of input and output of identity model.
+     * @return Model equivalent to identity function.
+     */
+    public static <X> Model<X, X> identityModel() {


"No usages found in All Places"

Okay, let's add it on demand.

avplatonov · 2018-12-11T11:47:07Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedModel.java

+     */
+    void addSubmodel(Model<IS, IA> subMdl) {
+        submodels.add(subMdl);
+        subModelsLayer = subModelsLayer != null ? subModelsLayer.combine(subMdl, aggregatingInputMerger)


In my opinion such interface is not convenient. Combiner requires working with Monoids like List[Double], but use may expect working with Doubles.

Yeah, got StackedVectorDatasetTrainer#addTrainerWithDoubleOutput for this which essentially lifts Double to Vector monoid (with concatenation as mappend).

avplatonov · 2018-12-11T12:36:37Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedVectorModel.java

+import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
+import org.apache.ignite.ml.trainers.DatasetTrainer;
+
+public class StackedVectorModel<O, AM extends Model<Vector, O>, L> extends SimpleStackedModelTrainer<Vector, O, AM, L> {


Seems like this class in unnecessary, removed it completely.

avplatonov · 2018-12-11T12:38:13Z

modules/ml/src/main/java/org/apache/ignite/ml/math/functions/IgniteFunction.java

+        return new IgniteFunction<T, R>() {
+            /** {@inheritDoc} */
+            @Override public String toString() {
+                return "Constant function [c=" + r + "]";


it looks good but we don't use toString for functions)

Yes, intention was to make debugging less painful with at least some of the lambdas listed with meaningful names, but maybe we should make a ticket for that.

avplatonov · 2018-12-11T15:32:04Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedVectorTrainer.java

+     * @param <M1> Type of submodel trainer.
+     * @return This object.
+     */
+    public <M1 extends Model<Vector, Double>> StackedVectorTrainer<O, AM, L> withAddedDoubleValuedTrainer(


-> addModelTrainerWithDoubleOutput

Agree, done.

avplatonov · 2018-12-11T15:32:28Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedVectorTrainer.java

+     */
+    public <M1 extends Model<Vector, Double>> StackedVectorTrainer<O, AM, L> withAddedDoubleValuedTrainer(
+        DatasetTrainer<M1, L> trainer) {
+        return withAddedTrainer(AdaptableDatasetTrainer.of(trainer).afterTrainedModel(VectorUtils::num2Vec));


s/withAdded/add/g

Agree, done.

avplatonov · 2018-12-12T07:31:22Z

modules/ml/src/main/java/org/apache/ignite/ml/Model.java

+    }
+
+    /**
+     * Get a composition model of the form {@code after . mdl}.


you provide composition interface through andThen-function
"f1 andThen f2" works as "f2(f1(x))" but "f1 compose f2" as "f1(f2())"
In my opinion you should provide uniform semantic (preferably "andThen") in all comments to avoid problems of understanding and using of such interface

Agree, done.

avplatonov · 2018-12-12T07:39:00Z

.../ml/src/main/java/org/apache/ignite/ml/composition/stacking/SimpleStackedDatasetTrainer.java

+            submodelInput2AggregatingInputConverter);
+    }
+
+    @Override public SimpleStackedDatasetTrainer<I, O, AM, L> withAggregatorInputMerger(IgniteBinaryOperator<I> merger) {


avplatonov · 2018-12-12T07:39:06Z

.../ml/src/main/java/org/apache/ignite/ml/composition/stacking/SimpleStackedDatasetTrainer.java

+        return (SimpleStackedDatasetTrainer<I, O, AM, L>)super.withOriginalFeaturesDropped();
+    }
+
+    @Override public SimpleStackedDatasetTrainer<I, O, AM, L> withOriginalFeaturesKept(


avplatonov · 2018-12-12T07:41:27Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java

+ * </pre>
+ * During second step we can choose if we want to keep original features along with converted outputs of first layer
+ * models or use only converted results of first layer models. This choice will also affect inference.
+ * This class is a most abstract stacked trainer, there is a {@link StackedVectorDatasetTrainer}: a shortcut version of


I think "the most general stacked trainer" will be better

Agree, done.

avplatonov · 2018-12-12T07:48:28Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java

+     * @param trainer Submodel trainer.
+     * @return This object.
+     */
+    public <M1 extends Model<IS, IA>> StackedDatasetTrainer<IS, IA, O, AM, L> addTrainer(


it is just wrapper without additional logic
what is the reason of using it?

Reason is to convert everything into DatasetTrainer<Model<IS, IA>, L> (in contrast to DatasetTrainer<? extends Model<IS, IA>, L>) to make work with the list of submodelTrainers less painful. This is unsafe conversion, but since we have control of all sumbodelTrainers list usages inside our class, it's IMHO a reasanoble tradeoff.

avplatonov · 2018-12-12T07:52:01Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java

+    }
+
+    /** {@inheritDoc} */
+    @Override protected <K, V> StackedModel<IS, IA, O, AM> updateModel(StackedModel<IS, IA, O, AM> mdl,


maybe we should rewrite updateModel function from abstract form to form like this to avoid such code:

class DatasetTrainer {
....
protected ... updateModel(...) {
throw NotImplementedException()
}
....
}

?

Hmm... We will avoid boilerplate, but seems like it will make code more error prone. Developer can forget to override this method for trainer which potentially supports updating and get an error while trying to update this model in the future whereas keeping it abstract forces developer to think if this trainer supports update and insert NotImplementedException more cautiously.

avplatonov · 2018-12-12T07:55:57Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java

+            ensemble -> {
+                int i = 0;
+                List<IgniteSupplier<Model<IS, IA>>> res = new ArrayList<>();
+                for (Model<IS, IA> submodel : mdl.submodels()) {


May you merge update and fit logic?

Agree, done.

avplatonov · 2018-12-12T08:01:27Z

modules/ml/src/main/java/org/apache/ignite/ml/trainers/AdaptableDatasetTrainer.java

+
+/**
+ * Type used to adapt input and output types of wrapped {@link DatasetTrainer}.
+ * Produces model which is composition  of form {@code after . wMdl . before} where dot denotes functional composition


please avoid functional composition notation in Java world)
in this wild world there are a lot of non-functional programmers may to beat you))

Ok, since we have andThen, I'll use it.
~~P.S. I will camouflage in a State monad to look imperativish~~

zaleslaw · 2018-12-13T13:19:21Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java

+        IgniteFunction<IS, IA> submodelInput2AggregatingInputConverter,
+        IgniteFunction<IA, Vector> submodelOutput2VectorConverter,
+        IgniteFunction<Vector, IS> vector2SubmodelInputConverter) {
+        if (submodelInput2AggregatingInputConverter != null)


to (avoid numbers)

The reasons behind "2" are

I saw in code some methods with such naming, and decided to be consistent

"2" Is a good visual separator (at least for me) between "from" and "to" parts which (again, at least for me) rises readability,

zaleslaw · 2018-12-13T13:19:44Z

modules/ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedDatasetTrainer.java

+        IgniteFunction<IA, Vector> submodelOutput2VectorConverter,
+        IgniteFunction<Vector, IS> vector2SubmodelInputConverter,
+        Vector v) {
+        return vector2SubmodelInputConverter.andThen(mdl).andThen(submodelOutput2VectorConverter).apply(v);


the same thing

See #5635 (comment)

zaleslaw · 2018-12-13T13:21:47Z

.../ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedVectorDatasetTrainer.java

+     * Constructs instance of this class.
+     */
+    public StackedVectorDatasetTrainer() {
+        this(null);


This is a shortcut specially for people who prefer to pass all arguments via withs.
And the one with aggregator trainer as a single param is for IDE automatic type inference when introducing new variable.

zaleslaw · 2018-12-13T13:22:06Z

.../ml/src/main/java/org/apache/ignite/ml/composition/stacking/StackedVectorDatasetTrainer.java

+     * @param <M1> Type of submodel trainer model.
+     * @return This object.
+     */
+    public <M1 extends Model<Matrix, Matrix>> StackedVectorDatasetTrainer<O, AM, L> addMatrix2MatrixTrainer(


Do we really need this?

This is shortcut "specially" for MLP :)

zaleslaw · 2018-12-13T13:22:28Z

modules/ml/src/main/java/org/apache/ignite/ml/math/primitives/vector/VectorUtils.java

+     * @param val Value to wrap.
+     * @return Specified value wrapped into vector.
+     */
+    public static Vector num2Vec(double val) {


If you don't mind, I wouldn't change it now because otherwise for consistency I should also change other such methods. I think it should be done atomically after some discussion.

zaleslaw · 2018-12-13T13:22:54Z

modules/ml/src/main/java/org/apache/ignite/ml/math/primitives/vector/VectorUtils.java

+     * @param val Value to wrap in array.
+     * @return Number wrapped in 1-sized array.
+     */
+    public static double[] num2Arr(double val) {


See #5635 (comment)

zaleslaw · 2018-12-13T13:23:56Z

modules/ml/src/main/java/org/apache/ignite/ml/trainers/DatasetTrainer.java

+     * Creates {@link DatasetTrainer} with same training logic, but able to accept labels of given new type
+     * of labels.
+     *
+     * @param new2Old Converter of new labels to old labels.


new20 old? newToOld or something better

zaleslaw · 2018-12-13T13:24:46Z

modules/ml/src/test/java/org/apache/ignite/ml/trainers/StackingTest.java

+
+        StackedModel<Vector, Vector, Double, LinearRegressionModel> mdl = trainer
+            .withAggregatorTrainer(new LinearRegressionLSQRTrainer().withConvertedLabels(x -> x * factor))
+            .addTrainer(mlpTrainer)


withTrainer

StackedModel getting looks pretty, many thanks!

Thanks! As for withTrainer, it was a tough choice: withTrainer looks like we substitute new trainer in place of the old trainer, not adding it. Previously it was withAddedTrainer, which looked a bit clumsy, so I went with addTrainer.

zaleslaw · 2018-12-13T13:26:00Z

modules/ml/src/test/java/org/apache/ignite/ml/trainers/StackingTest.java

+        final double factor = 3;
+
+        StackedModel<Vector, Vector, Double, LinearRegressionModel> mdl = trainer
+            .withAggregatorTrainer(new LinearRegressionLSQRTrainer().withConvertedLabels(x -> x * factor))


ConvertedLabels seems like reinvention of labelExtractor sometimes

Maybe we need a preprocessor like LabelTransformer with map function to change them

This is for the case when we use one DatasetBuilder for many trainers like we do in StackedDatasetTrainer. In this case we want the ability to adapt each of the trainers to be able to work with this dataset, this method is just for that.

artemmalykh added 30 commits November 15, 2018 23:00

First version

765c440

LearningEnvironmentBuilder is now interface

78857ec

Started integrating learning environment into transformers

762568c

Learning environment included in bagging

74f9b09

Insterted learning environment into compute

eeefda2

Fixed javadocs

2ce75e0

Javadocs

13bfc70

Merge branch 'master-apache' into ignite-10272

8454f2d

Added javadocs

a143a0e

Removing environment on close

0093271

Javadocs

b337daa

Fixes in javadocs

80d67ba

Dependency on partition in LearningEnvironmentBuilder API

c835455

LEB is Serializable

9ec0896

Removed UpstreamTransformerBuildersChain

6d034d2

IGNITE-10272: Fixes in functional interfaces.

4a63f57

Some fixes

28b4931

Some fixes

7d3219a

IGNITE-10272: Fixes.

2184eb4

IGNITE-10272: Fixes.

cac6995

IGNITE-10272: Fixes.

8bfc274

IGNITE-10272: Added toto about fluent API.

f6236cb

IGNITE-10272: Fixed examples, added shortcuts for DatasetFactory.

58c985f

Draft

84a0d8b

WIP

dd7f555

WIP

8c2d295

Merge branch 'ignite-10272' into ignite-10480

6bb7423

WIP

c6efc02

artemmalykh added 8 commits December 10, 2018 18:15

IGNITE-10480: Refactoring.

e3be62f

IGNITE-10480: Reverted modifiers in checkState and updateMdl.

eef3235

IGNITE-10480: Some javadocs

7cea836

IGNITE-10480: Some javadocs

55d19bb

IGNITE-10480: Deleted mistakingly merged files

3562e2c

IGNITE-10480: Added check for merger function

2e46f14

IGNITE-10480: Reverted KMEansTrainer#updateModel access modifier

0c58b8b

avplatonov reviewed Dec 11, 2018

View reviewed changes

artemmalykh added 4 commits December 11, 2018 17:50

IGNITE-10480: Overrides in in "with" methods.

3e5e1ef

IGNITE-10480: Code formatting.

56c06fa

IGNITE-10480: Javadocs fixes.

97d7fa5

IGNITE-10480: Javadocs fixes.

63e3847

avplatonov reviewed Dec 11, 2018

View reviewed changes

artemmalykh added 3 commits December 11, 2018 18:46

IGNITE-10480: Naming changes.

3e3efbf

IGNITE-10480: Namings and "with" methods appropriate types.

0d62790

IGNITE-10480: Formatting.

fddad2c

avplatonov reviewed Dec 12, 2018

View reviewed changes

artemmalykh added 7 commits December 12, 2018 15:34

IGNITE-10480: Fixes.

539f3c0

IGNITE-10480: Fixes.

1e2cb3a

IGNITE-10480: Clarification comments.

33331e4

IGNITE-10480: Suppress unchecked warning.

857dd30

IGNITE-10480: Fix comment.

e06861d

IGNITE-10480: Merged update and fit.

2255747

IGNITE-10480: Added javadoc.

d490317

zaleslaw reviewed Dec 13, 2018

View reviewed changes

artemmalykh added 3 commits December 13, 2018 21:13

IGNITE-10480: Fix examples compilation issues.

fb376a4

Merge branch 'master-apache' into ignite-10480

81d0525

Merge branch 'master-apache' into ignite-10480

b28d4fc

asfgit closed this in 142648d Dec 14, 2018

Conversation

artemmalykh commented Dec 11, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

artemmalykh Dec 12, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

artemmalykh Dec 12, 2018 •

edited

Loading