[SPARK-12247] [ML] [DOC] Documentation for spark.ml's ALS and collaborative filtering in general #10411

Closed
wants to merge 30 commits into from

BenFradet
Contributor

This documents the implementation of ALS in spark.ml, with example code in Scala, Java and Python.

@SparkQA

SparkQA commented Dec 21, 2015

Test build #48109 has finished for PR 10411 at commit 0787362.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public class JavaALSExample

@BenFradet
Contributor Author

cc @thunterdb

@SparkQA

SparkQA commented Dec 21, 2015

Test build #48130 has finished for PR 10411 at commit ab0f301.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public class JavaALSExample
      • case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

@SparkQA

SparkQA commented Dec 22, 2015

Test build #48174 has finished for PR 10411 at commit 3a860b1.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public class JavaALSExample
      • public static class Rating implements Serializable
      • case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

@SparkQA

SparkQA commented Dec 22, 2015

Test build #48176 has finished for PR 10411 at commit b086ffd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
      • public class JavaALSExample
      • public static class Rating implements Serializable
      • case class Rating(userId: Int, movieId: Int, rating: Float, timestamp: Long)

@SparkQA

SparkQA commented Dec 23, 2015

Test build #48234 has finished for PR 10411 at commit e336ebd.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Dec 24, 2015

Test build #48308 has finished for PR 10411 at commit 4176788.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

DataFrame rawPredictions = model.transform(test);
DataFrame predictions = rawPredictions
  .withColumn("rating", rawPredictions.col("rating").cast(DataTypes.DoubleType))
  .withColumn("prediction", rawPredictions.col("prediction").cast(DataTypes.DoubleType));
Contributor Author

There might be a better way to do this; input welcome.

@BenFradet
Contributor Author

pinging @thunterdb and @jkbradley

@SparkQA

SparkQA commented Feb 9, 2016

Test build #50995 has finished for PR 10411 at commit 2603e42.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BenFradet
Contributor Author

Jenkins, retest this please.

@SparkQA

SparkQA commented Feb 10, 2016

Test build #51032 has finished for PR 10411 at commit 2603e42.

  • This patch fails Scala style tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Feb 10, 2016

Test build #51051 has finished for PR 10411 at commit 9021f36.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@mengxr
Contributor

mengxr commented Feb 11, 2016

@srowen @coderxiang Do you have time to review this PR?

import sqlContext.implicits._

// $example on$
val ratings = sc.textFile("data/mllib/als/sample_movielens_ratings.txt")
Member

It looks like this file was removed, though, right? Is it because we can't distribute even a sample of it?

Contributor Author

Nope, the one removed is sample_movielens_movies.txt, as it was only used in MovieLens.scala, which has been removed; cf. the discussion in the JIRA.
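
For readers following this thread, here is a minimal Python sketch of how one line of a ratings file could be turned into the `Rating(userId, movieId, rating, timestamp)` shape listed in the build reports above. The `::` field separator, the sample line, and the `parse_rating` helper are assumptions made for illustration; they are not taken from the patch itself.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    # Field names and types mirror the case class reported by SparkQA.
    userId: int
    movieId: int
    rating: float
    timestamp: int


def parse_rating(line: str, sep: str = "::") -> Rating:
    """Parse one 'user<sep>movie<sep>rating<sep>timestamp' line (assumed layout)."""
    fields = line.strip().split(sep)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    user, movie, rating, ts = fields
    return Rating(int(user), int(movie), float(rating), int(ts))


# Hypothetical sample line, not copied from the data file.
example = parse_rating("0::2::3::1424380312")
print(example)
```

Rejecting malformed lines early, rather than silently skipping them, keeps a bad separator from producing an empty training set without any warning.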

@BenFradet
Contributor Author

@srowen thanks for the review, will make the necessary changes.

It is common in many real-world use cases to only have access to *implicit feedback* (e.g. views,
clicks, purchases, likes, shares etc.). The approach used in `spark.mllib` to deal with such data is taken
from [Collaborative Filtering for Implicit Feedback Datasets](http://dx.doi.org/10.1109/ICDM.2008.22).
Essentially, instead of trying to model the matrix of ratings directly, this approach treats the data
Contributor Author

@srowen I tried to take your remarks into account; I don't know if it's clearer now, though.
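
As a rough illustration of the implicit-feedback idea in the quoted paragraph, the sketch below follows the formulation in the cited paper as I understand it: each raw observation `r` is split into a binary preference (was there any interaction?) and a confidence that grows with `r`. The function names, the sample observations, and the default `alpha` are hypothetical, and the code is not a description of Spark's internals.

```python
def preference(r: float) -> int:
    """Binary preference: 1 if any interaction was observed, else 0."""
    return 1 if r > 0 else 0


def confidence(r: float, alpha: float = 1.0) -> float:
    """Confidence in the preference, growing linearly with the observation."""
    return 1.0 + alpha * r


# Hypothetical observations: number of times a user viewed each item.
observations = {"item_a": 3.0, "item_b": 0.0}

for item, r in observations.items():
    print(item, preference(r), confidence(r, alpha=2.0))
```

The point of the split is that a zero observation still carries a small, non-zero confidence, so unobserved pairs act as weak negatives rather than as missing values.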

@SparkQA

SparkQA commented Feb 13, 2016

Test build #51242 has finished for PR 10411 at commit 7e72c60.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Feb 14, 2016

I'm OK merging this

@srowen
Member

srowen commented Feb 15, 2016

@BenFradet yeah, I like your last edit. If you're willing to make that change and the sentence fragment change, I'll merge.

@BenFradet
Contributor Author

Great, I'll do that later today.

@SparkQA

SparkQA commented Feb 15, 2016

Test build #51318 has finished for PR 10411 at commit 9b351e9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@srowen
Member

srowen commented Feb 16, 2016

Merged to master

@asfgit asfgit closed this in 00c72d2 Feb 16, 2016