@mob-ai mob-ai commented Dec 24, 2019

What changes were proposed in this pull request?

Implement Factorization Machines as an ML pipeline component.

  1. Supported loss functions: logloss, mse
  2. Supported optimizers: GD, adamW
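For readers unfamiliar with the model, here is a minimal plain-Python sketch of the Factorization Machine score from Rendle (2010), paired with the two loss functions listed above. It is not the code in this PR. The names `fm_raw_score`, `mse_loss`, and `log_loss` are hypothetical, and labels for logloss are assumed to be 0 or 1.

```python
import math


def fm_raw_score(x, w0, w, v):
    """Raw FM score: w0 + sum_i w_i*x_i + sum_{i<j} <v_i, v_j>*x_i*x_j.

    The pairwise term uses the O(k*n) identity from Rendle (2010):
    0.5 * sum_f [(sum_i v[i][f]*x[i])^2 - sum_i (v[i][f]*x[i])^2].
    """
    n = len(x)
    k = len(v[0])  # number of latent factors per feature
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(v[i][f] * x[i] for i in range(n))
        s_sq = sum((v[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def mse_loss(score, label):
    """Squared error for a single example (regression)."""
    return (score - label) ** 2


def log_loss(score, label):
    """Logistic loss for a single example; assumes label is 0 or 1
    and treats the raw score as a logit."""
    p = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Toy example with two features and two latent factors (made-up numbers).
x = [1.0, 2.0]
w0 = 0.5
w = [0.1, 0.2]
v = [[1.0, 0.0], [0.5, 1.0]]
score = fm_raw_score(x, w0, w, v)
```

In this toy example the only feature pair contributes `<v_0, v_1> * x_0 * x_1 = 0.5 * 1 * 2 = 1.0`, so the factorized form can be checked by hand against the direct pairwise sum.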

Why are the changes needed?

Factorization Machines are widely used in advertising and recommendation systems to estimate CTR (click-through rate).
These systems usually involve large volumes of data, so estimating CTR at that scale calls for Spark, and Factorization Machines are a common ML model for the task.
References:

  1. S. Rendle, “Factorization machines,” in Proceedings of IEEE International Conference on Data Mining (ICDM), pp. 995–1000, 2010.
    https://www.csie.ntu.edu.tw/~b97053/paper/Rendle2010FM.pdf

Does this PR introduce any user-facing change?

No

How was this patch tested?

Ran the unit tests.

Author

mob-ai commented Dec 24, 2019

@srowen I opened a new PR to resolve the PySpark unit test failures. I ran the PySpark tests with the following command, and I also fixed the FM Python doc error.

python/run-tests --modules pyspark-ml

SparkQA commented Dec 24, 2019

Test build #4974 has finished for PR 27000 at commit d9cadc1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@srowen srowen left a comment

Looks good, tests pass now.

Member

srowen commented Dec 26, 2019

Merged to master

@srowen srowen closed this in 8d3eed3 Dec 26, 2019
fqaiser94 pushed a commit to fqaiser94/spark that referenced this pull request Mar 30, 2020
…omponent

Closes apache#27000 from mob-ai/ml/fm.

Authored-by: zhanjf <zhanjf@mob.com>
Signed-off-by: Sean Owen <srowen@gmail.com>