MAHOUT-1837: Fixed dense bug in drm/package.blockify() #252

Closed
@@ -60,26 +60,22 @@ package object drm {
      val keys = data.map(t => t._1).toArray[K]
      val vectors = data.map(t => t._2).toArray

-     // create the block by default as dense.
-     // would probably be better to sample a subset of these
-     // vectors first before creating the entire matrix.
-     // so that we don't have the overhead of creating a full second matrix in
-     // the case that the matrix is not dense.
-     val block = new DenseMatrix(vectors.length, blockncol)
-     var row = 0
-     while (row < vectors.length) {
-       block(row, ::) := vectors(row)
-       row += 1
-     }
+     // create the block by default as Sparse.
+     val block = new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)

-     // Test the density of the data. If the matrix does not meet the
-     // requirements for density, convert the Vectors to a sparse Matrix.
+     // Test the density of the data. If the matrix meets the
+     // requirements for density, convert the Vectors to a DenseMatrix.
      val resBlock = if (densityAnalysis(block)) {
-       block
+       val dBlock = new DenseMatrix(vectors.length, blockncol)
+       var row = 0
+       while (row < vectors.length) {
+         dBlock(row, ::) := vectors(row)
+         row += 1
+       }
+       dBlock
      } else {
-       new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)
+       block
      }

      Iterator(keys -> resBlock)
    }
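The strategy in this hunk (assemble the block as sparse first, then upgrade to dense only when a density test passes) can be sketched without Mahout's types. This is an illustrative sketch only: `SparseRow`, the 25% threshold, and this `densityAnalysis` stand-in are assumptions for the example, not Mahout's actual implementation.

```scala
// Sketch of the fixed blockify() strategy using plain Scala collections.
// A row is modeled as a sparse map from column index to value.
object BlockifySketch {
  type SparseRow = Map[Int, Double]

  // Stand-in for Mahout's densityAnalysis: call the block "dense" when
  // more than 25% of its cells are nonzero (threshold is illustrative).
  def densityAnalysis(rows: Array[SparseRow], ncol: Int,
                      threshold: Double = 0.25): Boolean = {
    val nonZeros = rows.iterator.map(_.size).sum
    nonZeros.toDouble / (rows.length.toLong * ncol) > threshold
  }

  // Build sparse by default; materialize a dense representation only if
  // the density test passes -- mirroring the new code above, which avoids
  // allocating a full dense matrix for sparse inputs.
  def blockify(rows: Array[SparseRow],
               ncol: Int): Either[Array[SparseRow], Array[Array[Double]]] =
    if (densityAnalysis(rows, ncol)) {
      val dense = Array.fill(rows.length, ncol)(0.0)
      for (r <- rows.indices; (c, v) <- rows(r)) dense(r)(c) = v
      Right(dense)
    } else Left(rows)
}
```

The point of the ordering is the same as in the patch: the old code always paid for a dense allocation up front, even when the data turned out to be sparse.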
Member Author:

Should the default here be a SparseRowMatrix of Random Access Sparse Vectors? Seems so. I.e.
This line should probably read:

new SparseRowMatrix(vectors.length, blockncol, vectors, true, true)

rather than:

new SparseRowMatrix(vectors.length, blockncol, vectors, true, false)

as is, correct?

Contributor:

No, I believe sequential should stay, as it will be more natural (ordered)
when converted to CSR, which hopefully will be our most common modus operandi
for exponential algorithms.

In fact, random access rarely makes sense at all for block-wise algorithms.
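The point about sequential (ordered) vectors converting naturally to CSR can be illustrated: when each row's entries are already column-sorted, building the CSR arrays is a straight concatenation with no per-row sorting. The types below are an assumption for the sketch, not Mahout's internals.

```scala
// Illustrative CSR build from rows whose entries are already column-sorted,
// as sequential-access sparse vectors would provide them.
object CsrSketch {
  // rows: each row is a column-sorted sequence of (colIndex, value) pairs.
  // Returns the classic CSR triple: row pointers, column indices, values.
  def toCsr(rows: Array[Array[(Int, Double)]])
      : (Array[Int], Array[Int], Array[Double]) = {
    val rowPtr = rows.scanLeft(0)((acc, r) => acc + r.length) // length rows+1
    val colIdx = rows.flatMap(_.map(_._1))
    val values = rows.flatMap(_.map(_._2))
    (rowPtr, colIdx, values)
  }
}
```

With random-access (unordered) vectors, each row would first need a sort by column index before this concatenation, which is the overhead the comment is pointing at.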


Member Author:

thx.

})