
Conversation


@mob-ai mob-ai commented Oct 15, 2019

What changes were proposed in this pull request?

Implement Factorization Machines as an ml-pipeline component

  1. Supported loss functions: logloss, mse
  2. Optimizers: GD, AdamW

Why are the changes needed?

Factorization Machines are widely used in advertising and recommendation systems to estimate CTR (click-through rate).
Advertising and recommendation systems usually involve a lot of data, so we need Spark to estimate CTR at scale, and Factorization Machines are a common ML model for estimating CTR.
References:

  1. S. Rendle, “Factorization machines,” in Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 995–1000, 2010.
    https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf

Does this PR introduce any user-facing change?

No

How was this patch tested?

run unit tests

Author

mob-ai commented Oct 15, 2019

@srowen I don't know how to change a PR's target branch, so I opened a new PR. I have checked out a branch based on master (Spark 3.0.0) and implemented the FM Python API.
@zhengruifeng I plotted the convergence curves using plain GD and AdamW. https://issues.apache.org/jira/browse/SPARK-29224

Member

@srowen srowen left a comment

Looks promising

import org.apache.spark.storage.StorageLevel

/**
* Params for Factorization Machines
Member

Is there any paper you can link to to explain the implementation? or, just a few paragraphs about what the implementation does?

Author

Is there any paper you can link to to explain the implementation? or, just a few paragraphs about what the implementation does?

FM paper: S. Rendle, “Factorization machines,” in Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 995–1000, 2010.
https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf

Member

Yes, I mean, this should be in the docs along with a brief explanation of what it does ... enough that an informed reader understands how the parameters you expose map to the algorithm in the paper (which is probably straightforward)

*/
@Since("3.0.0")
final val regParam: DoubleParam = new DoubleParam(this, "regParam",
"regularization for L2")
Member

This could be clarified a bit: strength of regularization? Does this correspond to some parameter in a paper, some 'alpha'? etc. You can add a validator to ensure it's not negative, too. See some other DoubleParam for an example.
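A hedged sketch of the kind of definition being suggested (ParamValidators.gtEq is the existing helper used by other DoubleParams; the doc string here is only illustrative):

    final val regParam: DoubleParam = new DoubleParam(this, "regParam",
      "strength of the L2 regularization applied to the coefficients (>= 0)",
      ParamValidators.gtEq(0))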

* @group param
*/
@Since("3.0.0")
final val miniBatchFraction: DoubleParam = new DoubleParam(this, "miniBatchFraction",
Member

Likewise, minibatch of what? and can be validated to be nonnegative/positive?

(if ($(fitLinear)) Array.fill(numFeatures)(0.0) else Array.empty[Double]) ++
(if ($(fitBias)) Array.fill(1)(0.0) else Array.empty[Double]))

val data = instances.map{ case OldLabeledPoint(label, features) => (label, features) }
Member

Nit: space before brace

}
(0 until numFactors).foreach { f =>
var sumSquare = 0.0
var squareSum = 0.0
Member

Nit: this looks like a 'sum' not squared sum.
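For reference, the loop is accumulating the two halves of the O(kn) reformulation of the pairwise term from the cited Rendle paper, so one accumulator is a "square of a sum" and the other a "sum of squares":

$$
\sum_{i=1}^n \sum_{j=i+1}^n \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{f=1}^k \left( \Big( \sum_{i=1}^n v_{i,f} x_i \Big)^2 - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right)
$$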

}

private[ml] class AdamWUpdater(weightSize: Int) extends Updater with Logging {
var beta1: Double = 0.9
Member

Can these be val?

@Since("3.0.0")
final override val loss: Param[String] = new Param[String](this, "loss", "The loss function to" +
s" be optimized. Supported options: ${supportedLosses.mkString(", ")}. (Default logisticLoss)",
ParamValidators.inArray[String](supportedLosses))
Contributor

for a log-loss, should we put it into ml.classification?

Author

for a log-loss, should we put it into ml.classification?

I will add an FM Classifier later.

Member

I guess the question is then, do you want to add log-loss later? it won't make sense for a regression

Author

@srowen When loss is logloss, labels must be in {0, 1}. So I am planning to add FMClassifier for logloss, change the original FactorizationMachines to FMRegressor for mse, and then remove the loss parameter. What do you think?

Member

I would implement it all in one pull request then. My concern is that we're coming up on Spark 3 and we would not want to release part of this only to change the API after.

Author

OK, I will implement the Classifier and Regressor in this PR. Thank you for your suggestion.

Author

mob-ai commented Oct 25, 2019

I have resolved the requested changes. Next I will add the FM classifier (logloss in FM), which will take a few days.

import FactorizationMachines._

/**
* Param for dimensionality of the factors (>= 0)
Member

Nit: Must be greater than 0, right? The check below is correct.

*/
@Since("3.0.0")
final val numFactors: IntParam = new IntParam(this, "numFactors",
"dimensionality of the factor vectors, " +
Member

Nit: the doc strings should start with a capital letter

*/
@Since("3.0.0")
final val regParam: DoubleParam = new DoubleParam(this, "regParam",
"the parameter of l2-regularization term, " +
Member

Nit: l2 -> L2
Nit: can we say "strength" rather than "parameter" to clarify a little?
Rather than describe what L2 regularization is, what is being regularized? I'm just looking for a tiny bit more specificity.

*/
@Since("3.0.0")
def setLoss(value: String): this.type = set(loss, value)
setDefault(loss -> LogisticLoss)
Member

Maybe I'm just not familiar here, but if this is a regressor, can you even use logistic loss?

(if ($(fitBias)) 1 else 0)
val initialCoefficients =
Vectors.dense(Array.fill($(numFactors) * numFeatures)(Random.nextGaussian() * $(initStd)) ++
(if ($(fitLinear)) Array.fill(numFeatures)(0.0) else Array.empty[Double]) ++
Member

Nit: where you call Array.fill(...)(0.0), just use new Array[Double](...). It doesn't matter much, but there's no need to set the array to 0 as this is already its value. I don't feel strongly about it.

Contributor

another nit, Array.emptyDoubleArray

Author

@mob-ai mob-ai Oct 29, 2019

another nit, Array.emptyDoubleArray

What is the difference between Array.emptyDoubleArray and Array.empty[Double]?

Author

another nit, Array.emptyDoubleArray

Array.emptyDoubleArray is a val, while Array.empty[Double] creates a new object. So Array.emptyDoubleArray is more efficient, right?
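A small sketch of the difference, assuming Scala 2.12 semantics (which Spark 3.0 builds against):

    // Array.emptyDoubleArray is a cached val, so every call returns the same instance;
    // Array.empty[Double] allocates a fresh zero-length array on each call.
    val a = Array.emptyDoubleArray
    val b = Array.emptyDoubleArray
    assert(a eq b)        // same cached object
    val c = Array.empty[Double]
    assert(!(c eq a))     // a newly allocated (empty) array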

Contributor

@zhengruifeng zhengruifeng left a comment

In general, I think it may be better to move the impl to ml.regression as a private class, and add wrappers in both ml.regression & ml.classification.
And I think we need more test suites (such as withBias/withoutBias/withLinear/withoutLinear), since this will be an important impl.

train(dataset, handlePersistence)
}

protected[spark] def train(
Contributor

I guess we do not need to define this function. Other impls have this function so it can be called from the .mllib side.

Author

I guess we do not need to define this function. Other impls have this function so it can be called from the .mllib side.

Yes, ml's Estimator exposes the public .fit() method (not .train()). But the .fit() source code does some setup and then calls a train function, so I need to implement it. The train function is not public; users can only use fit().

(if ($(fitBias)) 1 else 0)
val initialCoefficients =
Vectors.dense(Array.fill($(numFactors) * numFeatures)(Random.nextGaussian() * $(initStd)) ++
(if ($(fitLinear)) Array.fill(numFeatures)(0.0) else Array.empty[Double]) ++
Contributor

another nit, Array.emptyDoubleArray

case AdamW => new AdamWUpdater(coefficientsSize)
}

val optimizer = new GradientDescent(gradient, updater)
Contributor

BTW, if both FM and MLP still need a mini-batch solver, we may move it to the .ml side in the future, to avoid vector conversion.

Author

BTW, if both FM and MLP still need a mini-batch solver, we may move it to the .ml side in the future, to avoid vector conversion.

I agree, and I think it would be better if GradientDescent supported aggregateDepth, seed, and weightCol parameters, and logged the loss value per epoch via logInfo.

@Since("3.0.0")
class FactorizationMachinesModel (
@Since("3.0.0") override val uid: String,
@Since("3.0.0") val coefficients: OldVector,
Contributor

I'd prefer an ml.Vector here; just call the .asML method.
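A sketch of the conversion being suggested, using the names already in this PR (the optimizer returns an old mllib vector, which .asML converts before it is stored on the model):

    import org.apache.spark.ml.linalg.{Vector => NewVector}
    import org.apache.spark.mllib.linalg.{Vector => OldVector}

    // Sketch, not the PR's exact code: convert the optimizer's result to ml.linalg.
    val oldCoefficients: OldVector = optimizer.optimize(data, initialCoefficients)
    val coefficients: NewVector = oldCoefficients.asML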


protected def getLoss(rawPrediction: Double, label: Double): Double

private val sumVX = Array.fill(numFactors)(0.0)
Contributor

Will this cause a potential thread-safety issue?

Author

Will this cause a potential thread-safety issue?

I think it is thread-safe.

The GD gradient-calculation code (spark/mllib/optimization/GradientDescent.scala:239):
GD uses treeAggregate to calculate the gradient.

        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
          seqOp = (c, v) => {
            // c: (grad, loss, count), v: (label, features)
            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
            (c._1, c._2 + l, c._3 + 1)
          },
          combOp = (c1, c2) => {
            // c: (grad, loss, count)
            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
          })

treeAggregate code (spark/rdd/RDD.scala:1201):
treeAggregate calls mapPartitions to calculate the local gradient per partition. Within a partition, it runs it.aggregate.

      val cleanSeqOp = context.clean(seqOp)
      val cleanCombOp = context.clean(combOp)
      val aggregatePartition =
        (it: Iterator[T]) => it.aggregate(zeroValue)(cleanSeqOp, cleanCombOp)
      var partiallyAggregated: RDD[U] = mapPartitions(it => Iterator(aggregatePartition(it)))

Scala aggregate code (scala/collection/TraversableOnce.scala:214):
aggregate only uses seqop, which I think means the cumulative gradient is calculated sequentially within a partition.

  def aggregate[B](z: =>B)(seqop: (B, A) => B, combop: (B, B) => B): B = foldLeft(z)(seqop)

Contributor

In this PR, the getRawPrediction method is used not only in training but also in prediction. What if a user calls predict concurrently?

Author

In this PR, the getRawPrediction method is used not only in training but also in prediction. What if a user calls predict concurrently?

When a user calls predict, the val sumVX is not used (sumVX is only used when calculating the gradient).

Author

But I will change it to avoid any further concern.
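A minimal sketch of the kind of change being promised: allocate the per-call buffer inside the method instead of sharing a mutable field, so concurrent predict() calls never touch shared state. Names such as bias, linear, and factors are illustrative, not the PR's exact fields:

    import org.apache.spark.ml.linalg.Vector

    def rawPredictionOf(
        features: Vector,
        bias: Double,
        linear: Array[Double],
        factors: Array[Double],   // layout: factors(i * numFactors + f)
        numFactors: Int): Double = {
      val sumVX = new Array[Double](numFactors)  // local buffer, no shared state
      var raw = bias
      features.foreachActive { (i, x) =>
        raw += linear(i) * x
        var f = 0
        while (f < numFactors) {
          val vx = factors(i * numFactors + f) * x
          sumVX(f) += vx
          raw -= 0.5 * vx * vx      // the "sum of squares" half of the pairwise term
          f += 1
        }
      }
      var f = 0
      while (f < numFactors) {
        raw += 0.5 * sumVX(f) * sumVX(f)  // the "square of the sum" half
        f += 1
      }
      raw
    }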

}

private[ml] class AdamWUpdater(weightSize: Int) extends Updater with Logging {
val beta1: Double = 0.9
Contributor

these params are not exposed to end users?

Author

these params are not exposed to end users?

I also considered this, but I was worried that if someone adds a better solver later, there would be too many solver parameters. And in AdamW these parameters are almost never tuned, so I am not exposing them for now.
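For context, a minimal sketch (not the PR's exact code) of the decoupled weight-decay step an AdamW updater performs; beta1 = 0.9 matches the PR, while the beta2 and eps values here are the usual defaults and are assumed:

    class AdamWSketch(size: Int, lr: Double, weightDecay: Double) {
      val beta1 = 0.9
      val beta2 = 0.999
      val eps = 1e-8
      private val m = new Array[Double](size)   // first-moment estimate
      private val v = new Array[Double](size)   // second-moment estimate
      private var t = 0

      def step(w: Array[Double], grad: Array[Double]): Unit = {
        t += 1
        var i = 0
        while (i < size) {
          m(i) = beta1 * m(i) + (1 - beta1) * grad(i)
          v(i) = beta2 * v(i) + (1 - beta2) * grad(i) * grad(i)
          val mHat = m(i) / (1 - math.pow(beta1, t))
          val vHat = v(i) / (1 - math.pow(beta2, t))
          // Decoupled weight decay: applied to the weight directly, not folded into the gradient.
          w(i) -= lr * (mHat / (math.sqrt(vHat) + eps) + weightDecay * w(i))
          i += 1
        }
      }
    }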

@zhengruifeng
Contributor

In practice I am using FM/FFM, and IMHO SSP or ASYNC solvers (like Difacto/PS-lite) seem more efficient than mini-batch solvers.
I'm wondering whether it is possible to include a simple PS framework (like glint) in the future.

Author

mob-ai commented Oct 31, 2019

I have resolved the requested changes, and I split FM into FMClassifier and FMRegressor.

extends ProbabilisticClassifier[Vector, FMClassifier, FMClassifierModel]
with FMClassifierParams with DefaultParamsWritable with Logging {

import org.apache.spark.ml.regression.BaseFactorizationMachinesGradient.{LogisticLoss, parseLoss}
Contributor

Can we put those imports above?

train(dataset, handlePersistence)
}

protected[spark] def train(
Contributor

I still think that this method is not needed; just create a single def train(dataset: Dataset[_]): FMClassifierModel?

Author

I still think that this method is not needed; just create a single def train(dataset: Dataset[_]): FMClassifierModel?

@zhengruifeng I referred to the implementation of LR. It handles persistence: if the dataset is not persisted, it persists the dataset and releases the cache after training finishes.
Or should I cache the dataset regardless of whether it is already cached?

LogisticRegression.scala: 481

  override protected[spark] def train(dataset: Dataset[_]): LogisticRegressionModel = {
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    train(dataset, handlePersistence)
  }

  protected[spark] def train(
      dataset: Dataset[_],
      handlePersistence: Boolean): LogisticRegressionModel = instrumented { instr =>
    val instances = extractInstances(dataset)

    if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)

    // train model code

    if (handlePersistence) instances.unpersist()

Contributor

@zhengruifeng zhengruifeng Nov 1, 2019

Yes, but this method def train(dataset: Dataset[_], handlePersistence: Boolean) was added for the old mllib.LogisticRegression.

As a new alg, FM does not need to have this method.

Or should I cache the dataset regardless of whether it is already cached?

You should cache the dataset; refer to LinearSVC.

Contributor

  override protected[spark] def train(dataset: Dataset[_]): FMClassifierModel = {
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    val data: RDD[(Double, OldVector)] = ...
    if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)
    ...
  }

.setRegParam($(regParam))
.setMiniBatchFraction($(miniBatchFraction))
.setConvergenceTol($(tol))
val coefficients = optimizer.optimize(data, initialCoefficients)
Contributor

If the real training dataset is data, not instances, why not cache data instead?

extends ProbabilisticClassificationModel[Vector, FMClassifierModel]
with FMClassifierParams with MLWritable {

import org.apache.spark.ml.regression.BaseFactorizationMachinesGradient.LogisticLoss
Contributor

I guess we can put this import above

* [i * numFactors + f] denotes i-th feature and f-th factor
* Following indices are 1-way coefficients and global bias.
*/
private val oldCoefficients: OldVector = coefficients
Contributor

I guess we can mark it lazy and transient?
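A sketch of the suggested tweak (assuming the model field becomes an ml.linalg.Vector per the .asML comment above; OldVectors is the mllib Vectors object):

    import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

    // Build the mllib copy lazily and keep it out of serialization.
    @transient private lazy val oldCoefficients: OldVector = OldVectors.fromML(coefficients)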

final def getInitStd: Double = $(initStd)

/** String name for "gd". */
private[ml] val GD = "gd"
Contributor

I think GD, AdamW, supportedSolvers should be defined in an object
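A sketch of the grouping being suggested (object name and string values follow what this PR already uses):

    private[ml] object FactorizationMachines {
      /** String name for the "gd" (mini-batch gradient descent) solver. */
      val GD = "gd"
      /** String name for the "adamW" solver. */
      val AdamW = "adamW"
      /** Set of solvers that FactorizationMachines supports. */
      val supportedSolvers = Array(GD, AdamW)
    }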

with FMRegressorParams with DefaultParamsWritable with Logging {

import org.apache.spark.ml.regression.BaseFactorizationMachinesGradient.{SquaredError, parseLoss}
import org.apache.spark.ml.regression.FMRegressor.initCoefficients
Contributor

ditto

*/
@Since("3.0.0")
def setMiniBatchFraction(value: Double): this.type = {
require(value > 0 && value <= 1.0,
Contributor

ParamValidators.inRange(0, 1, false, true) already checks input value
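For reference, a sketch of declaring the check on the param itself, which makes the setter's require redundant (the doc string is illustrative):

    final val miniBatchFraction: DoubleParam = new DoubleParam(this, "miniBatchFraction",
      "fraction of the input data used in each mini-batch of gradient descent, in (0, 1]",
      ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true))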

.setRegParam($(regParam))
.setMiniBatchFraction($(miniBatchFraction))
.setConvergenceTol($(tol))
val coefficients = optimizer.optimize(data, initialCoefficients)
Contributor

ditto

@mob-ai mob-ai requested review from srowen and zhengruifeng November 6, 2019 03:27
Member

srowen commented Dec 12, 2019

Looking reasonable to me. @zhengruifeng ?

@zhengruifeng
Contributor

LGTM. @mob-ai You can open another ticket for docs & examples after this PR gets merged.

Author

mob-ai commented Dec 13, 2019

LGTM. @mob-ai You can open another ticket for docs & examples after this PR gets merged.

OK, does "another ticket" mean "another PR"?

Member

srowen commented Dec 13, 2019

You can make another PR for the same JIRA, or really, just add it here.

Author

mob-ai commented Dec 18, 2019

Added examples and docs.

@zhengruifeng
Contributor

@mob-ai There still seem to be some comments from @srowen that have not been resolved?

Author

mob-ai commented Dec 18, 2019

@mob-ai There still seem to be some comments from @srowen that have not been resolved?

All comments are resolved... I think...

Should I click all the "Resolve conversation" buttons? I only just realized that...

Member

srowen commented Dec 18, 2019

#26124 (comment)
I think it'd be worthwhile to at least include a section in the MLlib docs on this new component. It can happen here rather than separately, if you're going to do it.

Author

mob-ai commented Dec 19, 2019

I added a new section to describe the background of FM, and the FM examples are placed in the classification and regression sections.

Member

@srowen srowen left a comment

Looking pretty good


Factorization machines are able to estimate interactions between features even in problems with huge sparsity.
For more background and more details about the implementation of factorization machines,
refer to [Factorization Machines section](ml-classification-regression.html#factorization-machines).
Member

to -> to the


The following examples load a dataset in LibSVM format, split it into training and test sets,
train on the first dataset, and then evaluate on the held-out test set.
We scale features between 0 and 1 to prevent the exploding gradient problem.
Member

between -> to be between

# Factorization Machines

[Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf) are able to estimate interactions
between features even in problems with huge sparsity (like advertising and recommendation system).
Member

You maybe don't have to repeat this sentence 3 times. You can omit it above?

\sum\limits^n_{i=1} \sum\limits^n_{j=i+1} \langle v_i, v_j \rangle x_i x_j
$$

First two terms denote intercept and linear term (as same as linear regression),
Member

First -> The first
as same as -> same as in

$$

First two terms denote intercept and linear term (as same as linear regression),
and last term denotes pairwise interactions term. $$v_i$$ describes the i-th variable
Member

last -> the last

and last term denotes pairwise interactions term. $$v_i$$ describes the i-th variable
with k factors.

FM can be used directly as the regression and optimization criterion is mean square error. FM also can be used as
Member

"used directly for regression because the optimization criterion is mean squared error"?

with k factors.

FM can be used directly as the regression and optimization criterion is mean square error. FM also can be used as
the binary classification through wrap sigmoid function, the optimization criterion is logit loss.
Member

the binary -> for binary
What is a "wrap sigmoid function"?
, the optimization -> . The optimization criterion
Is it "logistic loss" BTW?

This equation has only linear complexity in both k and n - i.e. its computation is in $$O(kn)$$.

In general, in order to prevent the exploding gradient problem, it is best to scale continuous features between 0 and 1,
or bin the continuous features and one-hot.
Member

one-hot encode them
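Since the docs will point users at the new classes, a minimal usage sketch along the lines the docs describe (assuming an active SparkSession named spark; FMClassifier is the class added in this PR, and the setter names mirror the params discussed here, so they may differ in the final API):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.FMClassifier
    import org.apache.spark.ml.feature.MinMaxScaler

    // Scale features to [0, 1] to avoid exploding gradients, as the doc recommends,
    // then fit the FM classifier on the scaled features.
    val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    val scaler = new MinMaxScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures")

    val fm = new FMClassifier()
      .setFeaturesCol("scaledFeatures")
      .setNumFactors(8)          // param names follow this PR and may change
      .setRegParam(0.01)
      .setMiniBatchFraction(1.0)

    val model = new Pipeline().setStages(Array(scaler, fm)).fit(train)
    model.transform(test).select("label", "prediction").show(5)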

Author

mob-ai commented Dec 20, 2019

Updated the FM docs.

@srowen srowen closed this in c6ab716 Dec 23, 2019
Member

srowen commented Dec 23, 2019

Merged to master

Member

srowen commented Dec 23, 2019

Ack, wait, this didn't pass tests after the last commit. Let me monitor

Contributor

cloud-fan commented Dec 24, 2019

All the PR builders are broken now, with pyspark ML test failures. I'm reverting it to unblock other PRs. Please re-open it later, thanks!

@HyukjinKwon
Member

Thanks, all!

@HeartSaVioR
Contributor

Ah OK, that's the reason for the pyspark failure. Nice find, and thanks for the quick action!

Author

mob-ai commented Dec 24, 2019

@srowen I used the following command to run the pyspark tests, and I fixed the FM Python doc error.

python/run-tests --modules pyspark-ml

I'm very sorry about the mistake.

I added a new commit, but it does not display in this PR. What should I do?

@HeartSaVioR
Contributor

@mob-ai
No worries. You can create a new PR with the same content (title/description/commits) plus the additional commit.

Author

mob-ai commented Dec 24, 2019

@mob-ai
No worries. You can create a new PR with the same content (title/description/commits) plus the additional commit.

Thanks!

Member

srowen commented Dec 24, 2019

@cloud-fan Yep, this was entirely my mistake - I looked at the wrong PR tab open in my browser to see if it had passed, and merged it when it had not. My bad; I meant to go check and revert if needed this morning, but you beat me to it.

fqaiser94 pushed a commit to fqaiser94/spark that referenced this pull request Mar 30, 2020
…omponent


Closes apache#26124 from mob-ai/ml/fm.

Authored-by: zhanjf <zhanjf@mob.com>
Signed-off-by: Sean Owen <srowen@gmail.com>