
Conversation


@mob-ai mob-ai commented Oct 15, 2019

What changes were proposed in this pull request?

Implement Factorization Machines as an ml-pipeline component

  1. Supported loss functions: logloss, mse
  2. Optimizers: GD, AdamW

Why are the changes needed?

Factorization Machines are widely used in advertising and recommendation systems to estimate CTR (click-through rate).
Advertising and recommendation systems usually involve a lot of data, so we need Spark to estimate CTR at scale, and Factorization Machines are a common ML model for estimating CTR.
References:

  1. S. Rendle, “Factorization machines,” in Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 995–1000, 2010.
    https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf

Does this PR introduce any user-facing change?

No

How was this patch tested?

run unit tests

Author

mob-ai commented Oct 15, 2019

@srowen I don't know how to change a PR's target branch, so I opened a new PR. I have checked out a branch based on master (Spark 3.0.0) and implemented the FM Python API.
@zhengruifeng I plotted the convergence curves using plain GD and AdamW. https://issues.apache.org/jira/browse/SPARK-29224

Member

@srowen srowen left a comment

Looks promising

import org.apache.spark.storage.StorageLevel

/**
* Params for Factorization Machines
Member

Is there any paper you can link to to explain the implementation? or, just a few paragraphs about what the implementation does?

Author

Is there any paper you can link to to explain the implementation? or, just a few paragraphs about what the implementation does?

FM paper: S. Rendle, “Factorization machines,” in Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 995–1000, 2010.
https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf

Member

Yes, I mean, this should be in the docs along with a brief explanation of what it does ... enough that an informed reader understands how the parameters you expose map to the algorithm in the paper (which is probably straightforward)

*/
@Since("3.0.0")
final val regParam: DoubleParam = new DoubleParam(this, "regParam",
"regularization for L2")
Member

This could be clarified a bit: strength of regularization? Does this correspond to some parameter in a paper, some 'alpha'? etc. You can add a validator to ensure it's not negative, too. See some other DoubleParam for an example.
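A hedged sketch of the kind of definition being suggested (ParamValidators.gtEq is the existing helper used by other DoubleParams; the doc string here is only illustrative):

    final val regParam: DoubleParam = new DoubleParam(this, "regParam",
      "strength of the L2 regularization applied to the coefficients (>= 0)",
      ParamValidators.gtEq(0))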

* @group param
*/
@Since("3.0.0")
final val miniBatchFraction: DoubleParam = new DoubleParam(this, "miniBatchFraction",
Member

Likewise, minibatch of what? and can be validated to be nonnegative/positive?

(if ($(fitLinear)) Array.fill(numFeatures)(0.0) else Array.empty[Double]) ++
(if ($(fitBias)) Array.fill(1)(0.0) else Array.empty[Double]))

val data = instances.map{ case OldLabeledPoint(label, features) => (label, features) }
Member

Nit: space before brace

}
(0 until numFactors).foreach { f =>
var sumSquare = 0.0
var squareSum = 0.0
Member

Nit: this looks like a 'sum' not squared sum.
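For reference, the loop is accumulating the two halves of the O(kn) reformulation of the pairwise term from the cited Rendle paper, so one accumulator is a "square of a sum" and the other a "sum of squares":

$$
\sum_{i=1}^n \sum_{j=i+1}^n \langle v_i, v_j \rangle x_i x_j
= \frac{1}{2} \sum_{f=1}^k \left( \Big( \sum_{i=1}^n v_{i,f} x_i \Big)^2 - \sum_{i=1}^n v_{i,f}^2 x_i^2 \right)
$$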

}

private[ml] class AdamWUpdater(weightSize: Int) extends Updater with Logging {
var beta1: Double = 0.9
Member

Can these be val?

@Since("3.0.0")
final override val loss: Param[String] = new Param[String](this, "loss", "The loss function to" +
s" be optimized. Supported options: ${supportedLosses.mkString(", ")}. (Default logisticLoss)",
ParamValidators.inArray[String](supportedLosses))
Contributor

for a log-loss, should we put it into ml.classification?

Author

for a log-loss, should we put it into ml.classification?

I will add an FM Classifier later.

Member

I guess the question is then, do you want to add log-loss later? it won't make sense for a regression

Author

@srowen When loss is logloss, labels must be in {0, 1}. So I am planning to add FMClassifier for logloss, change the original FactorizationMachines to FMRegressor for mse, and then remove the loss parameter. What do you think?

Member

I would implement it all in one pull request then. My concern is that we're coming up on Spark 3 and we would not want to release part of this only to change the API after.

Author

OK, I will implement the Classifier and Regressor in this PR. Thank you for your suggestion.

Author

mob-ai commented Oct 25, 2019

I have resolved the requested changes. Next I will add the FM classifier (logloss in FM), which will take a few days.

import FactorizationMachines._

/**
* Param for dimensionality of the factors (>= 0)
Member

Nit: Must be greater than 0, right? The check below is correct.

*/
@Since("3.0.0")
final val numFactors: IntParam = new IntParam(this, "numFactors",
"dimensionality of the factor vectors, " +
Member

Nit: the doc strings should start with a capital letter

*/
@Since("3.0.0")
final val regParam: DoubleParam = new DoubleParam(this, "regParam",
"the parameter of l2-regularization term, " +
Member

Nit: l2 -> L2
Nit: can we say "strength" rather than "parameter" to clarify a little?
Rather than describe what L2 regularization is, what is being regularized? I'm just looking for a tiny bit more specificity.

*/
@Since("3.0.0")
def setLoss(value: String): this.type = set(loss, value)
setDefault(loss -> LogisticLoss)
Member

Maybe I'm just not familiar here, but if this is a regressor, can you even use logistic loss?

(if ($(fitBias)) 1 else 0)
val initialCoefficients =
Vectors.dense(Array.fill($(numFactors) * numFeatures)(Random.nextGaussian() * $(initStd)) ++
(if ($(fitLinear)) Array.fill(numFeatures)(0.0) else Array.empty[Double]) ++
Member

Nit: where you call Array.fill(...)(0.0), just use new Array[Double](...). It doesn't matter much, but there's no need to set the array to 0 as this is already its value. I don't feel strongly about it.

Contributor

another nit, Array.emptyDoubleArray

Author

@mob-ai mob-ai Oct 29, 2019

another nit, Array.emptyDoubleArray

What is the difference between Array.emptyDoubleArray and Array.empty[Double]?

Author

another nit, Array.emptyDoubleArray

Array.emptyDoubleArray is a val, while Array.empty[Double] creates a new object. So Array.emptyDoubleArray is more efficient, right?
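A small sketch of the difference, assuming Scala 2.12 semantics (which Spark 3.0 builds against):

    // Array.emptyDoubleArray is a cached val, so every call returns the same instance;
    // Array.empty[Double] allocates a fresh zero-length array on each call.
    val a = Array.emptyDoubleArray
    val b = Array.emptyDoubleArray
    assert(a eq b)        // same cached object
    val c = Array.empty[Double]
    assert(!(c eq a))     // a newly allocated (empty) array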

Contributor

@zhengruifeng zhengruifeng left a comment

In general, I think it may be better to move the impl to ml.regression as a private class, and add wrappers in both ml.regression & ml.classification.
And I think we need more test suites (such as withBias/withoutBias/withLinear/withoutLinear), since this will be an important impl.

train(dataset, handlePersistence)
}

protected[spark] def train(
Contributor

I guess we do not need to define this function. Other impls have this function so it can be called from the .mllib side.

Author

I guess we do not need to define this function. Other impls have this function so it can be called from the .mllib side.

Yes, ml's Estimator exposes the public .fit() method (not .train()). But the .fit() source code does some setup and then calls a train function, so I need to implement it. The train function is not public; users can only use fit().

(if ($(fitBias)) 1 else 0)
val initialCoefficients =
Vectors.dense(Array.fill($(numFactors) * numFeatures)(Random.nextGaussian() * $(initStd)) ++
(if ($(fitLinear)) Array.fill(numFeatures)(0.0) else Array.empty[Double]) ++
Contributor

another nit, Array.emptyDoubleArray

case AdamW => new AdamWUpdater(coefficientsSize)
}

val optimizer = new GradientDescent(gradient, updater)
Contributor

BTW, if both FM and MLP still need a mini-batch solver, we may move it to the .ml side in the future, to avoid vector conversion.

Author

BTW, if both FM and MLP still need a mini-batch solver, we may move it to the .ml side in the future, to avoid vector conversion.

I agree, and I think it would be better if GradientDescent supported aggregateDepth, seed, and weightCol parameters, and logged the loss value per epoch via logInfo.

@Since("3.0.0")
class FactorizationMachinesModel (
@Since("3.0.0") override val uid: String,
@Since("3.0.0") val coefficients: OldVector,
Contributor

I'd prefer an ml.Vector here; just call the .asML method.
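A sketch of the conversion being suggested, using the names already in this PR (the optimizer returns an old mllib vector, which .asML converts before it is stored on the model):

    import org.apache.spark.ml.linalg.{Vector => NewVector}
    import org.apache.spark.mllib.linalg.{Vector => OldVector}

    // Sketch, not the PR's exact code: convert the optimizer's result to ml.linalg.
    val oldCoefficients: OldVector = optimizer.optimize(data, initialCoefficients)
    val coefficients: NewVector = oldCoefficients.asML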


protected def getLoss(rawPrediction: Double, label: Double): Double

private val sumVX = Array.fill(numFactors)(0.0)
Contributor

Will this cause a potential thread-safety issue?

Author

Will this cause a potential thread-safety issue?

I think it is thread-safe.

The GD gradient-calculation code (spark/mllib/optimization/GradientDescent.scala:239):
GD uses treeAggregate to calculate the gradient.

        .treeAggregate((BDV.zeros[Double](n), 0.0, 0L))(
          seqOp = (c, v) => {
            // c: (grad, loss, count), v: (label, features)
            val l = gradient.compute(v._2, v._1, bcWeights.value, Vectors.fromBreeze(c._1))
            (c._1, c._2 + l, c._3 + 1)
          },
          combOp = (c1, c2) => {
            // c: (grad, loss, count)
            (c1._1 += c2._1, c1._2 + c2._2, c1._3 + c2._3)
          })

treeAggregate code (spark/rdd/RDD.scala:1201):
treeAggregate calls mapPartitions to calculate the local gradient per partition. Within a partition, it runs it.aggregate.

      val cleanSeqOp = context.clean(seqOp)
      val cleanCombOp = context.clean(combOp)
      val aggregatePartition =
        (it: Iterator[T]) => it.aggregate(zeroValue)(cleanSeqOp, cleanCombOp)
      var partiallyAggregated: RDD[U] = mapPartitions(it => Iterator(aggregatePartition(it)))

Scala aggregate code (scala/collection/TraversableOnce.scala:214):
aggregate only uses seqop, which I think means the cumulative gradient is calculated sequentially within a partition.

  def aggregate[B](z: =>B)(seqop: (B, A) => B, combop: (B, B) => B): B = foldLeft(z)(seqop)

Contributor

In this PR, the getRawPrediction method is used not only in training but also in prediction. What if a user calls predict concurrently?

Author

In this PR, the getRawPrediction method is used not only in training but also in prediction. What if a user calls predict concurrently?

When a user calls predict, the val sumVX is not used (sumVX is only used when calculating the gradient).

Author

But I will change it to avoid any further concern.
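A minimal sketch of the kind of change being promised: allocate the per-call buffer inside the method instead of sharing a mutable field, so concurrent predict() calls never touch shared state. Names such as bias, linear, and factors are illustrative, not the PR's exact fields:

    import org.apache.spark.ml.linalg.Vector

    def rawPredictionOf(
        features: Vector,
        bias: Double,
        linear: Array[Double],
        factors: Array[Double],   // layout: factors(i * numFactors + f)
        numFactors: Int): Double = {
      val sumVX = new Array[Double](numFactors)  // local buffer, no shared state
      var raw = bias
      features.foreachActive { (i, x) =>
        raw += linear(i) * x
        var f = 0
        while (f < numFactors) {
          val vx = factors(i * numFactors + f) * x
          sumVX(f) += vx
          raw -= 0.5 * vx * vx      // the "sum of squares" half of the pairwise term
          f += 1
        }
      }
      var f = 0
      while (f < numFactors) {
        raw += 0.5 * sumVX(f) * sumVX(f)  // the "square of the sum" half
        f += 1
      }
      raw
    }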

}

private[ml] class AdamWUpdater(weightSize: Int) extends Updater with Logging {
val beta1: Double = 0.9
Contributor

these params are not exposed to end users?

Author

these params are not exposed to end users?

I also considered this, but I was worried that if someone adds a better solver later, there would be too many solver parameters. And in AdamW these parameters are almost never tuned, so I am not exposing them for now.
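For context, a minimal sketch (not the PR's exact code) of the decoupled weight-decay step an AdamW updater performs; beta1 = 0.9 matches the PR, while the beta2 and eps values here are the usual defaults and are assumed:

    class AdamWSketch(size: Int, lr: Double, weightDecay: Double) {
      val beta1 = 0.9
      val beta2 = 0.999
      val eps = 1e-8
      private val m = new Array[Double](size)   // first-moment estimate
      private val v = new Array[Double](size)   // second-moment estimate
      private var t = 0

      def step(w: Array[Double], grad: Array[Double]): Unit = {
        t += 1
        var i = 0
        while (i < size) {
          m(i) = beta1 * m(i) + (1 - beta1) * grad(i)
          v(i) = beta2 * v(i) + (1 - beta2) * grad(i) * grad(i)
          val mHat = m(i) / (1 - math.pow(beta1, t))
          val vHat = v(i) / (1 - math.pow(beta2, t))
          // Decoupled weight decay: applied to the weight directly, not folded into the gradient.
          w(i) -= lr * (mHat / (math.sqrt(vHat) + eps) + weightDecay * w(i))
          i += 1
        }
      }
    }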

@zhengruifeng
Contributor

In practice I am using FM/FFM, and IMHO SSP or ASYNC solvers (like Difacto/PS-lite) seem more efficient than mini-batch solvers.
I'm wondering whether it is possible to include a simple PS framework (like glint) in the future.

Author

mob-ai commented Oct 31, 2019

I have resolved the requested changes, and I split FM into FMClassifier and FMRegressor.

extends ProbabilisticClassifier[Vector, FMClassifier, FMClassifierModel]
with FMClassifierParams with DefaultParamsWritable with Logging {

import org.apache.spark.ml.regression.BaseFactorizationMachinesGradient.{LogisticLoss, parseLoss}
Contributor

Can we put those imports above?

train(dataset, handlePersistence)
}

protected[spark] def train(
Contributor

I still think that this method is not needed; just create a single def train(dataset: Dataset[_]): FMClassifierModel?

Author

I still think that this method is not needed; just create a single def train(dataset: Dataset[_]): FMClassifierModel?

@zhengruifeng I referred to the implementation of LR. It handles persistence: if the dataset is not persisted, it persists the dataset and releases the cache after training finishes.
Or should I cache the dataset regardless of whether it is already cached?

LogisticRegression.scala: 481

  override protected[spark] def train(dataset: Dataset[_]): LogisticRegressionModel = {
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    train(dataset, handlePersistence)
  }

  protected[spark] def train(
      dataset: Dataset[_],
      handlePersistence: Boolean): LogisticRegressionModel = instrumented { instr =>
    val instances = extractInstances(dataset)

    if (handlePersistence) instances.persist(StorageLevel.MEMORY_AND_DISK)

    // train model code

    if (handlePersistence) instances.unpersist()

Contributor

@zhengruifeng zhengruifeng Nov 1, 2019

Yes, but this method def train(dataset: Dataset[_], handlePersistence: Boolean) was added for the old mllib.LogisticRegression.

As a new alg, FM does not need to have this method.

Or should I cache the dataset regardless of whether it is already cached?

You should cache the dataset; refer to LinearSVC.

Contributor

  override protected[spark] def train(dataset: Dataset[_]): FMClassifierModel = {
    val handlePersistence = dataset.storageLevel == StorageLevel.NONE
    val data: RDD[(Double, OldVector)] = ...
    if (handlePersistence) data.persist(StorageLevel.MEMORY_AND_DISK)
    ...
  }

.setRegParam($(regParam))
.setMiniBatchFraction($(miniBatchFraction))
.setConvergenceTol($(tol))
val coefficients = optimizer.optimize(data, initialCoefficients)
Contributor

If the real training dataset is data, not instances, why not cache data instead?

extends ProbabilisticClassificationModel[Vector, FMClassifierModel]
with FMClassifierParams with MLWritable {

import org.apache.spark.ml.regression.BaseFactorizationMachinesGradient.LogisticLoss
Contributor

I guess we can put this import above

* [i * numFactors + f] denotes i-th feature and f-th factor
* Following indices are 1-way coefficients and global bias.
*/
private val oldCoefficients: OldVector = coefficients
Contributor

I guess we can mark it lazy and transient?
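A sketch of the suggested tweak (assuming the model field becomes an ml.linalg.Vector per the .asML comment above; OldVectors is the mllib Vectors object):

    import org.apache.spark.mllib.linalg.{Vector => OldVector, Vectors => OldVectors}

    // Build the mllib copy lazily and keep it out of serialization.
    @transient private lazy val oldCoefficients: OldVector = OldVectors.fromML(coefficients)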

final def getInitStd: Double = $(initStd)

/** String name for "gd". */
private[ml] val GD = "gd"
Contributor

I think GD, AdamW, supportedSolvers should be defined in an object
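A sketch of the grouping being suggested (object name and string values follow what this PR already uses):

    private[ml] object FactorizationMachines {
      /** String name for the "gd" (mini-batch gradient descent) solver. */
      val GD = "gd"
      /** String name for the "adamW" solver. */
      val AdamW = "adamW"
      /** Set of solvers that FactorizationMachines supports. */
      val supportedSolvers = Array(GD, AdamW)
    }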

with FMRegressorParams with DefaultParamsWritable with Logging {

import org.apache.spark.ml.regression.BaseFactorizationMachinesGradient.{SquaredError, parseLoss}
import org.apache.spark.ml.regression.FMRegressor.initCoefficients
Contributor

ditto

*/
@Since("3.0.0")
def setMiniBatchFraction(value: Double): this.type = {
require(value > 0 && value <= 1.0,
Contributor

ParamValidators.inRange(0, 1, false, true) already checks input value
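For reference, a sketch of declaring the check on the param itself, which makes the setter's require redundant (the doc string is illustrative):

    final val miniBatchFraction: DoubleParam = new DoubleParam(this, "miniBatchFraction",
      "fraction of the input data used in each mini-batch of gradient descent, in (0, 1]",
      ParamValidators.inRange(0, 1, lowerInclusive = false, upperInclusive = true))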

.setRegParam($(regParam))
.setMiniBatchFraction($(miniBatchFraction))
.setConvergenceTol($(tol))
val coefficients = optimizer.optimize(data, initialCoefficients)
Contributor

ditto

@mob-ai mob-ai requested review from srowen and zhengruifeng November 6, 2019 03:27
Member

srowen commented Dec 12, 2019

Looking reasonable to me. @zhengruifeng ?

@zhengruifeng
Contributor

LGTM. @mob-ai You can open another ticket for docs & examples after this PR gets merged.

Author

mob-ai commented Dec 13, 2019

LGTM. @mob-ai You can open another ticket for docs & examples after this PR gets merged.

OK, does "another ticket" mean "another PR"?

Member

srowen commented Dec 13, 2019

You can make another PR for the same JIRA, or really, just add it here.

Author

mob-ai commented Dec 18, 2019

Added examples and docs.

@zhengruifeng
Contributor

@mob-ai There still seem to be some comments from @srowen that have not been resolved?

Author

mob-ai commented Dec 18, 2019

@mob-ai There still seem to be some comments from @srowen that have not been resolved?

All comments are resolved... I think...

Should I click all the "Resolve conversation" buttons? I only just realized that...

Member

srowen commented Dec 18, 2019

#26124 (comment)
I think it'd be worthwhile to at least include a section in the MLlib docs on this new component. It can happen here rather than separately, if you're going to do it.

Author

mob-ai commented Dec 19, 2019

I added a new section to describe the background of FM, and the FM examples are placed in the classification and regression sections.

Member

@srowen srowen left a comment

Looking pretty good


Factorization machines are able to estimate interactions between features even in problems with huge sparsity.
For more background and more details about the implementation of factorization machines,
refer to [Factorization Machines section](ml-classification-regression.html#factorization-machines).
Member

to -> to the


The following examples load a dataset in LibSVM format, split it into training and test sets,
train on the first dataset, and then evaluate on the held-out test set.
We scale features between 0 and 1 to prevent the exploding gradient problem.
Member

between -> to be between

# Factorization Machines

[Factorization Machines](https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf) are able to estimate interactions
between features even in problems with huge sparsity (like advertising and recommendation system).
Member

You maybe don't have to repeat this sentence 3 times. You can omit it above?

\sum\limits^n_{i=1} \sum\limits^n_{j=i+1} \langle v_i, v_j \rangle x_i x_j
$$

First two terms denote intercept and linear term (as same as linear regression),
Member

First -> The first
as same as -> same as in

$$

First two terms denote intercept and linear term (as same as linear regression),
and last term denotes pairwise interactions term. $$v_i$$ describes the i-th variable
Member

last -> the last

and last term denotes pairwise interactions term. $$v_i$$ describes the i-th variable
with k factors.

FM can be used directly as the regression and optimization criterion is mean square error. FM also can be used as
Member

"used directly for regression because the optimization criterion is mean squared error"?

with k factors.

FM can be used directly as the regression and optimization criterion is mean square error. FM also can be used as
the binary classification through wrap sigmoid function, the optimization criterion is logit loss.
Member

the binary -> for binary
What is a "wrap sigmoid function"?
, the optimization -> . The optimization criterion
Is it "logistic loss" BTW?

This equation has only linear complexity in both k and n - i.e. its computation is in $$O(kn)$$.

In general, in order to prevent the exploding gradient problem, it is best to scale continuous features between 0 and 1,
or bin the continuous features and one-hot.
Member

one-hot encode them
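Since the docs will point users at the new classes, a minimal usage sketch along the lines the docs describe (assuming an active SparkSession named spark; FMClassifier is the class added in this PR, and the setter names mirror the params discussed here, so they may differ in the final API):

    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.classification.FMClassifier
    import org.apache.spark.ml.feature.MinMaxScaler

    // Scale features to [0, 1] to avoid exploding gradients, as the doc recommends,
    // then fit the FM classifier on the scaled features.
    val data = spark.read.format("libsvm").load("data/mllib/sample_libsvm_data.txt")
    val Array(train, test) = data.randomSplit(Array(0.8, 0.2), seed = 42L)

    val scaler = new MinMaxScaler()
      .setInputCol("features")
      .setOutputCol("scaledFeatures")

    val fm = new FMClassifier()
      .setFeaturesCol("scaledFeatures")
      .setNumFactors(8)          // param names follow this PR and may change
      .setRegParam(0.01)
      .setMiniBatchFraction(1.0)

    val model = new Pipeline().setStages(Array(scaler, fm)).fit(train)
    model.transform(test).select("label", "prediction").show(5)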

Author

mob-ai commented Dec 20, 2019

Updated the FM docs.

@srowen srowen closed this in c6ab716 Dec 23, 2019
Member

srowen commented Dec 23, 2019

Merged to master

Member

srowen commented Dec 23, 2019

Ack, wait, this didn't pass tests after the last commit. Let me monitor

Contributor

cloud-fan commented Dec 24, 2019

All the PR builders are broken now, with pyspark ML test failures. I'm reverting it to unblock other PRs. Please re-open it later, thanks!

@HyukjinKwon
Member

Thanks, all!

@HeartSaVioR
Contributor

Ah OK, that's the reason for the pyspark failure. Nice find, and thanks for the quick action!

Author

mob-ai commented Dec 24, 2019

@srowen I used the following command to run the pyspark tests, and I fixed the FM Python doc error.

python/run-tests --modules pyspark-ml

I'm very sorry about the mistake.

I added a new commit, but it does not display in this PR. What should I do?

@HeartSaVioR
Contributor

@mob-ai
No worries. You can create a new PR with the same content (title/description/commits) plus the additional commit.

Author

mob-ai commented Dec 24, 2019

@mob-ai
No worries. You can create a new PR with the same content (title/description/commits) plus the additional commit.

Thanks!

Member

srowen commented Dec 24, 2019

@cloud-fan Yep, this was entirely my mistake - I looked at the wrong PR tab open in my browser to see if it had passed, and merged it when it had not. My bad; I meant to go check and revert if needed this morning, but you beat me to it.

fqaiser94 pushed a commit to fqaiser94/spark that referenced this pull request Mar 30, 2020
…omponent


Closes apache#26124 from mob-ai/ml/fm.

Authored-by: zhanjf <zhanjf@mob.com>
Signed-off-by: Sean Owen <srowen@gmail.com>