MAHOUT-1691: iterable of vectors to matrix #138

Closed
alexeygrigorev wants to merge 1 commit into apache:master from alexeygrigorev:it2vec

@alexeygrigorev
Contributor

Some syntactic sugar that allows writing

val res = drmX.mapBlock(drmX.ncol) {
  case (keys, block) => {
    keys -> block.map(row => (row - mean) / std)
  }
}

instead of writing

val res = drmX.mapBlock(drmX.ncol) {
  case (keys, block) => {
    val copy = block.like
    copy := block.map(row => (row - mean) / std)
    (keys, copy)
  }
}

when side effects are undesirable.

@dlyubimov
Contributor

Alexey, there are a few problems here.

I believe a much more computationally efficient way to do this, as it stands, is
block.cloned := { (r, c, v) => (v - mean(c)) / std(c) }

(1) Creation followed by assignment is much slower.
(2) Functional assignments take the matrix structure into account and avoid inefficient iteration directions. E.g., if the block is really a column-wise sparse matrix consisting of sparse sequential columns, this iteration is 10...100x slower than it needs to be (as demonstrated by #135).
(3) This syntax already exists in the form of dense() or sparse() (if you want to assemble a matrix from a collection of vector rows).
(4) Finally, this code most likely misses your intent, because the row slices come from the iterator in an order that is not guaranteed. I.e., iterator() may return row number 20 first, then 5, then 31, etc. You assemble the matrix back in iteration order, which is probably not what you want. Note that iterators return a MatrixSlice, not just a vector, and the slice has an index() method that indicates its true row ordinal.
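
To illustrate point (4), here is a minimal sketch in plain Scala, without any Mahout dependency. The `Slice` case class is a hypothetical stand-in for MatrixSlice (a row plus its true ordinal), and both helper functions are illustrative names, not part of any library. It only shows why assembling rows in iteration order can scramble the result, while placing each row by its reported index does not.

```scala
// Hypothetical stand-in for MatrixSlice: a row's values plus its true row ordinal.
final case class Slice(index: Int, values: Vector[Double])

// Naive assembly: trusts that rows arrive in ordinal order.
def assembleInIterationOrder(slices: Iterator[Slice]): Vector[Vector[Double]] =
  slices.map(_.values).toVector

// Index-aware assembly: puts every row at the ordinal the slice reports.
def assembleByIndex(slices: Iterator[Slice], nrow: Int): Vector[Vector[Double]] = {
  val rows = Array.fill(nrow)(Vector.empty[Double])
  slices.foreach(s => rows(s.index) = s.values)
  rows.toVector
}

// An iterator that yields rows 2, 0, 1 instead of 0, 1, 2.
def shuffled: Iterator[Slice] = Iterator(
  Slice(2, Vector(3.0, 30.0)),
  Slice(0, Vector(1.0, 10.0)),
  Slice(1, Vector(2.0, 20.0))
)

val naive = assembleInIterationOrder(shuffled)   // row 2 ends up first
val ordered = assembleByIndex(shuffled, nrow = 3) // rows land at 0, 1, 2
```

In this sketch `naive` starts with the values of row 2, whereas `ordered` restores the original row order regardless of how the iterator delivers the slices.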

@alexeygrigorev
Contributor Author

Thanks for the feedback
