Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-6845] [MLlib] [PySpark] Add isTranposed flag to DenseMatrix #5455

Closed
wants to merge 4 commits into from

Conversation

MechCoder
Copy link
Contributor

Since sparse matrices now support a isTransposed flag for row major data, DenseMatrices should do the same.

@MechCoder
Copy link
Contributor Author

ping @mengxr Would be great if you could have a look. Also, there is one existing test that fails due to serialization (which I have commented out)

@MechCoder MechCoder changed the title [SPARK-6845] [Mllib] [PySpark] Add isTranposed flag to DenseMatrix [SPARK-6845] [MLlib] [PySpark] Add isTranposed flag to DenseMatrix Apr 10, 2015
@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #30030 has finished for PR 5455 at commit ba02f6e.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class FPGrowthModel(JavaModelWrapper):
    • class FPGrowth(object):
    • class SparseMatrix(Matrix):
    • class FlattenedValuesSerializer(BatchedSerializer):
    • class ExternalList(object):
    • class ExternalListOfList(ExternalList):
    • class GroupByKey(object):
    • class ExternalGroupBy(ExternalMerger):
  • This patch does not change any dependencies.

@SparkQA
Copy link

SparkQA commented Apr 10, 2015

Test build #30038 has finished for PR 5455 at commit 4e6eaa1.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@MechCoder
Copy link
Contributor Author

The test failure seems unrelated.

@SparkQA
Copy link

SparkQA commented Apr 13, 2015

Test build #30168 has finished for PR 5455 at commit 0330540.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@mengxr
Copy link
Contributor

mengxr commented Apr 13, 2015

@MechCoder I think you need to update the matrix SerDe in PythonMLlibAPI. Could you double check?

@MechCoder
Copy link
Contributor Author

@mengxr I pushed a non-working update in the last commit just to get feedback.

  1. I went through the code of SerDe in PythonMLlibAPI. I'm unable to understand the entire codepath. I see that somehow, class DenseMatrixPickler is affected but what exactly happens after https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala#L1101 ?
  2. How do I use PickleUtils to store boolean values. Here I converted to an int, and used integer_to_bytes but which exactly is the right way?

@SparkQA
Copy link

SparkQA commented Apr 14, 2015

Test build #30262 has finished for PR 5455 at commit 3ba7e25.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@MechCoder
Copy link
Contributor Author

@mengxr It would be really helpful if you could guide me on my two questions?

@SparkQA
Copy link

SparkQA commented Apr 17, 2015

Test build #30457 has finished for PR 5455 at commit 151f3b6.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@jkbradley
Copy link
Member

@MechCoder I'm not an expert here, but I'll ask around. I'm wondering if the failure in PythonMLLibAPISuite indicates that the issue is with the tuple (or tuple code), rather than with how isTransposed is stored.

I'm pretty sure storing the Boolean as an Int is fine.

@@ -985,6 +985,7 @@ private[spark] object SerDe extends Serializable {
val m: DenseMatrix = obj.asInstanceOf[DenseMatrix]
val bytes = new Array[Byte](8 * m.values.size)
val order = ByteOrder.nativeOrder()
val isTransposed = if (m.isTransposed) 1 else 0
ByteBuffer.wrap(bytes).order(order).asDoubleBuffer().put(m.values)

out.write(Opcodes.BININT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this line, put out.write(Opcodes.MARK), which marks the start of the tuple.

@MechCoder
Copy link
Contributor Author

@mengxr fixed !

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30665 has finished for PR 5455 at commit 004a37f.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

}
val bytes = getBytes(args(2))
val n = bytes.length / 8
val values = new Array[Double](n)
val order = ByteOrder.nativeOrder()
val isTransposed = if (args(3).asInstanceOf[Int] == 1) true else false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move this line after ByteBuffer.wrap... and simplify it to val isTransposed = args(3).asInstanceOf[Int] == 1.

@mengxr
Copy link
Contributor

mengxr commented Apr 21, 2015

LGTM except one minor inline comment.

@MechCoder
Copy link
Contributor Author

done

@SparkQA
Copy link

SparkQA commented Apr 21, 2015

Test build #30682 has finished for PR 5455 at commit 525c370.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
  • This patch does not change any dependencies.

@mengxr
Copy link
Contributor

mengxr commented Apr 21, 2015

Merged into master. Thanks!

@asfgit asfgit closed this in 45c47fa Apr 21, 2015
@MechCoder MechCoder deleted the spark-6845 branch April 21, 2015 21:39
@MechCoder
Copy link
Contributor Author

@mengxr Thanks! I think we need to support serialization and deserialization for SparseMatrices also?

nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015
Since sparse matrices now support a isTransposed flag for row major data, DenseMatrices should do the same.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes apache#5455 from MechCoder/spark-6845 and squashes the following commits:

525c370 [MechCoder] minor
004a37f [MechCoder] Cast boolean to int
151f3b6 [MechCoder] [WIP] Add isTransposed to pickle DenseMatrix
cc0b90a [MechCoder] [SPARK-6845] Add isTranposed flag to DenseMatrix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants