[SPARK-29224][ML]Implement Factorization Machines as a ml-pipeline component #27000

mob-ai · 2019-12-24T09:43:37Z

What changes were proposed in this pull request?

Implement Factorization Machines as a ml-pipeline component

1. Loss functions: logloss, mse
2. Optimizers: GD, adamW
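The two loss functions and two optimizers listed above can be sketched in plain Python as follows. This is an illustrative sketch, not the code in this PR. It assumes labels in {0, 1} for logloss, uses a half squared error for mse so that its derivative is simply `pred - label`, and treats the model parameters as one flat list of floats. The names `logloss`, `mse`, `gd_step` and `AdamW` are hypothetical, and Spark's own scaling, regularization and mini-batching may differ.

```python
import math
from typing import List

# --- Per-example losses (hypothetical helpers, not Spark's API) ----------

def logloss(raw: float, label: float) -> float:
    """Logistic loss for label in {0, 1}, given the raw (pre-sigmoid) score.

    Equals log(1 + exp(raw)) - label * raw, written in a numerically stable form.
    """
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw))) - label * raw


def logloss_grad(raw: float, label: float) -> float:
    """Derivative of logloss w.r.t. the raw score: sigmoid(raw) - label."""
    if raw >= 0:
        p = 1.0 / (1.0 + math.exp(-raw))
    else:
        e = math.exp(raw)
        p = e / (1.0 + e)
    return p - label


def mse(pred: float, label: float) -> float:
    """Half squared error (assumed scaling), so the derivative is pred - label."""
    return 0.5 * (pred - label) ** 2


def mse_grad(pred: float, label: float) -> float:
    return pred - label


# --- Optimizer steps, applied in place to a flat parameter list ----------

def gd_step(w: List[float], grad: List[float], lr: float) -> None:
    """Plain gradient descent: w <- w - lr * grad."""
    for i, g in enumerate(grad):
        w[i] -= lr * g


class AdamW:
    """Adam with decoupled weight decay (Loshchilov & Hutter)."""

    def __init__(self, n: int, lr: float = 0.01, beta1: float = 0.9,
                 beta2: float = 0.999, eps: float = 1e-8,
                 weight_decay: float = 0.0) -> None:
        self.lr, self.beta1, self.beta2 = lr, beta1, beta2
        self.eps, self.weight_decay = eps, weight_decay
        self.m = [0.0] * n  # first-moment (mean) estimates
        self.v = [0.0] * n  # second-moment (uncentered variance) estimates
        self.t = 0          # step counter used for bias correction

    def step(self, w: List[float], grad: List[float]) -> None:
        self.t += 1
        b1t = 1.0 - self.beta1 ** self.t
        b2t = 1.0 - self.beta2 ** self.t
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1.0 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1.0 - self.beta2) * g * g
            m_hat = self.m[i] / b1t
            v_hat = self.v[i] / b2t
            # Weight decay acts on w directly instead of being added to the gradient.
            w[i] -= self.lr * (m_hat / (math.sqrt(v_hat) + self.eps)
                               + self.weight_decay * w[i])
```

The difference between AdamW and Adam with an L2 penalty is where the decay enters: here it is applied to the weights outside the adaptive term, so it is not rescaled by `sqrt(v_hat)`.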

Why are the changes needed?

Factorization Machines are widely used in advertising and recommendation systems to estimate CTR (click-through rate).
Advertising and recommendation systems usually work with large amounts of data, so CTR estimation needs a distributed engine such as Spark, and Factorization Machines are a common model for estimating CTR.
References:

S. Rendle, “Factorization machines,” in Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 995–1000, 2010.
https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf
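For reference, the degree-2 model defined in the cited paper, with bias w_0, linear weights w_i and k-dimensional factor vectors v_i, is given below, together with the reformulation shown in the paper that lets the pairwise term be computed in O(kn) time rather than O(kn^2). For CTR estimation with logloss, the predicted probability is typically the sigmoid of the model output.

```latex
\hat{y}(\mathbf{x}) = w_0 + \sum_{i=1}^{n} w_i x_i
  + \sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j,
\qquad w_0 \in \mathbb{R},\; \mathbf{w} \in \mathbb{R}^{n},\; \mathbf{V} \in \mathbb{R}^{n \times k}

\sum_{i=1}^{n} \sum_{j=i+1}^{n} \langle \mathbf{v}_i, \mathbf{v}_j \rangle \, x_i x_j
  = \frac{1}{2} \sum_{f=1}^{k} \left[ \Bigl( \sum_{i=1}^{n} v_{i,f}\, x_i \Bigr)^{2}
  - \sum_{i=1}^{n} v_{i,f}^{2}\, x_i^{2} \right]
```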

Does this PR introduce any user-facing change?

No

How was this patch tested?

run unit tests

mob-ai · 2019-12-24T10:23:35Z

@srowen I opened a new PR to resolve the PySpark unit test failures. I ran the PySpark tests with the following command, and I also fixed an error in the FM Python docs.

python/run-tests --modules pyspark-ml

SparkQA · 2019-12-24T15:03:29Z

Test build #4974 has finished for PR 27000 at commit d9cadc1.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

srowen

Looks good, tests pass now.

srowen · 2019-12-26T17:40:00Z

Merged to master

Closes apache#27000 from mob-ai/ml/fm.

Authored-by: zhanjf <zhanjf@mob.com>
Signed-off-by: Sean Owen <srowen@gmail.com>

mobai-zhanjf added 29 commits October 9, 2019 09:43

1734b62 add Factorization Machines
da23709 fix bug: Param[Boolean] change to BooleanParam
842112a update solver and loss api
edf26c3 update alias oldXXX to OldXXX
fc8ff75 update Since2.4.3 to Since3.0.0
71292c3 add more unit test
0724168 impl python FM
580a9fc update coeff init
f14884e update comments
9ecc33d update fm python style
9147406 fix python style
f401976 resolve change requested
a7b5c49 resolve change requests
5ee87a1 FM split to FMClassifier and FMRegressor
d69ff70 add FactorizationMachines class to control common code
30602cc coefficients split to bias/linearVector/factorMatrix
c305e3c update fm testcase
07f37da change some details
b3b260b change sumVX logic
4735fbf add handlePersistence
b96467c remove unused import
9bd6cbf solve change request
9f5ec47 solve change request
52c92a0 fitBias change to fitIntercept
b1086e7 resolve change requests
1a5f6a5 add fm examples and docs
facc011 update fm docs
8212e03 update docs
d9cadc1 fix python doc bug
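Several of these commits split the coefficients into a bias, a linear vector and a factor matrix, and rework the `sumVX` logic. A minimal pure-Python sketch of a prediction built that way is shown below, assuming sparse input given as a `{feature_index: value}` dict; the function names are hypothetical and this is not Spark's Scala code. For each latent dimension f it accumulates the sum of `v[i][f] * x_i` and the sum of its squares over the non-zero features, which is the O(kn) form of the pairwise term from the Rendle paper. A naive all-pairs version is included only to cross-check the result.

```python
from typing import Dict, List


def fm_raw_prediction(features: Dict[int, float], bias: float,
                      linear: List[float], factors: List[List[float]]) -> float:
    """Degree-2 FM raw score for one sparse example (hypothetical helper).

    features: {feature_index: value} for the non-zero entries only
    linear:   one weight per feature
    factors:  factors[i] is the length-k latent vector of feature i
    """
    raw = bias + sum(linear[i] * x for i, x in features.items())
    k = len(factors[0]) if factors else 0
    for f in range(k):
        sum_vx = 0.0     # sum_i v[i][f] * x_i
        sum_vx_sq = 0.0  # sum_i (v[i][f] * x_i)^2
        for i, x in features.items():
            vx = factors[i][f] * x
            sum_vx += vx
            sum_vx_sq += vx * vx
        raw += 0.5 * (sum_vx * sum_vx - sum_vx_sq)
    return raw


def fm_raw_prediction_naive(features: Dict[int, float], bias: float,
                            linear: List[float],
                            factors: List[List[float]]) -> float:
    """Reference version that sums <v_i, v_j> x_i x_j over every feature pair."""
    idx = sorted(features)
    raw = bias + sum(linear[i] * features[i] for i in idx)
    for a in range(len(idx)):
        for b in range(a + 1, len(idx)):
            i, j = idx[a], idx[b]
            dot = sum(vi * vj for vi, vj in zip(factors[i], factors[j]))
            raw += dot * features[i] * features[j]
    return raw


feats = {0: 1.0, 2: 2.0}
lin = [0.1, 0.2, 0.3]
fac = [[1.0, 0.0], [0.5, 0.5], [1.0, 2.0]]
print(fm_raw_prediction(feats, 0.5, lin, fac))
print(fm_raw_prediction_naive(feats, 0.5, lin, fac))
```

Keeping the bias and linear vector separate from the factor matrix makes it straightforward to switch either term off, which is the kind of control a `fitIntercept` flag provides.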

srowen approved these changes Dec 26, 2019

srowen closed this in 8d3eed3 Dec 26, 2019

zero323 mentioned this pull request Jan 15, 2020

Sync with changes merged after 6378d4bc06cd1bb1a209bd5fb63d10ef52d75eb4 zero323/pyspark-stubs#230 (Closed, 47 tasks)


4 participants
