Skip to content

[SPARK-16566][MLLib] sort sparseVector's indices before doing multiplication#14219

Closed
wilson-lauw wants to merge 10 commits intoapache:masterfrom
wilson-lauw:SPARK-16566
Closed

[SPARK-16566][MLLib] sort sparseVector's indices before doing multiplication#14219
wilson-lauw wants to merge 10 commits intoapache:masterfrom
wilson-lauw:SPARK-16566

Conversation

@wilson-lauw
Copy link

@wilson-lauw wilson-lauw commented Jul 15, 2016

What changes were proposed in this pull request?

https://issues.apache.org/jira/browse/SPARK-16566

sort sparseVector's indices before doing multiplication to make sure the result returned correctly

How was this patch tested?

manual and existing tests

@wilson-lauw wilson-lauw changed the title sort sparseVector's indices before doing multiplication [SPARK-16566][MLLib] sort sparseVector's indices before doing multiplication Jul 15, 2016
@AmplabJenkins
Copy link

Can one of the admins verify this patch?

@hhbyyh
Copy link
Contributor

hhbyyh commented Jul 16, 2016

IMO, order of the indices in SparseVector potentially affects many other BLAS operations. Perhaps we should have it in a more general/reusable pattern if we do want to impose the validation.

Yet the most efficient way as I see is still to perform the check during construction, maybe something like an optional validate parameter.

val xValues = x.values
val xIndices = x.indices

def isSorted(l:Array[Int]): Boolean = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many small style issues here. I think this is best as

private def isSorted(array: Array[Int]): Boolean = {
  var index = 1
  while (index < array.length) {
    if (array(index - 1) > array(index)) {
      return false
    }
    index += 1
  }
  true
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, will fix it.

@wilson-lauw
Copy link
Author

@hhbyyh I just checked, the implementation of def dot(x: SparseVector, y: SparseVector) also need to be modified.

@wilson-lauw
Copy link
Author

wilson-lauw commented Jul 17, 2016

Modifications also needed on module mllib-local. Once the changes is okay, I will also apply the changes there.

@jkbradley
Copy link
Member

SparseVector indices are assumed to be sorted already. We should not re-sort them during operations. The deeper issue which needs to be fixed is SPARK-14707
I commented in the JIRA as well.
Could you please close this issue?
Thanks!

@wilson-lauw
Copy link
Author

Closing this.

@wilson-lauw wilson-lauw closed this Sep 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants