[SPARK-8823] [MLLIB] [PYSPARK] Optimizations for SparseVector dot products

Follow up for #5946

Currently we iterate over the indices and values of a SparseVector in a Python loop to compute the dot product; this can be vectorized.

Author: MechCoder <manojkumarsivaraj334@gmail.com>

Closes #7222 from MechCoder/sparse_optim and squashes the following commits:

dcb51d3 [MechCoder] [SPARK-8823] [MLlib] [PySpark] Optimizations for SparseVector dot product
MechCoder authored and mengxr committed Jul 7, 2015
1 parent 1dbc4a1 commit 738c107
Showing 1 changed file with 8 additions and 12 deletions.
20 changes: 8 additions & 12 deletions python/pyspark/mllib/linalg.py
@@ -590,18 +590,14 @@ def dot(self, other):
             return np.dot(other.array[self.indices], self.values)

         elif isinstance(other, SparseVector):
-            result = 0.0
-            i, j = 0, 0
-            while i < len(self.indices) and j < len(other.indices):
-                if self.indices[i] == other.indices[j]:
-                    result += self.values[i] * other.values[j]
-                    i += 1
-                    j += 1
-                elif self.indices[i] < other.indices[j]:
-                    i += 1
-                else:
-                    j += 1
-            return result
+            # Find out common indices.
+            self_cmind = np.in1d(self.indices, other.indices, assume_unique=True)
+            self_values = self.values[self_cmind]
+            if self_values.size == 0:
+                return 0.0
+            else:
+                other_cmind = np.in1d(other.indices, self.indices, assume_unique=True)
+                return np.dot(self_values, other.values[other_cmind])

         else:
             return self.dot(_convert_to_vector(other))
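
To see that the two branches in the diff above compute the same value, here is a minimal standalone sketch that works on plain NumPy arrays rather than on `SparseVector` objects. The function names `sparse_dot_loop` and `sparse_dot_vectorized` and the sample arrays are hypothetical, introduced only for illustration. The sketch uses `np.isin` in place of the commit's `np.in1d`, which newer NumPy releases deprecate. It assumes both index arrays are sorted and free of duplicates, which the merge loop also relies on; the boolean masks only line up the matching values under that assumption.

```python
import numpy as np


def sparse_dot_loop(idx_a, val_a, idx_b, val_b):
    """Two-pointer merge over sorted index arrays (the approach the commit removes)."""
    result = 0.0
    i, j = 0, 0
    while i < len(idx_a) and j < len(idx_b):
        if idx_a[i] == idx_b[j]:
            # Index present in both vectors: accumulate the product.
            result += val_a[i] * val_b[j]
            i += 1
            j += 1
        elif idx_a[i] < idx_b[j]:
            i += 1
        else:
            j += 1
    return result


def sparse_dot_vectorized(idx_a, val_a, idx_b, val_b):
    """Mask-based version; np.isin stands in for the commit's np.in1d."""
    # Boolean mask of the positions in a whose index also occurs in b.
    mask_a = np.isin(idx_a, idx_b, assume_unique=True)
    common_a = val_a[mask_a]
    if common_a.size == 0:
        # No shared indices, so the dot product is zero.
        return 0.0
    # Same mask from b's side; sorted indices keep both selections aligned.
    mask_b = np.isin(idx_b, idx_a, assume_unique=True)
    return float(np.dot(common_a, val_b[mask_b]))


# Hypothetical sample data: the vectors share indices 2 and 7.
idx_a = np.array([0, 2, 5, 7], dtype=np.int32)
val_a = np.array([1.0, 2.0, 3.0, 4.0])
idx_b = np.array([2, 3, 7], dtype=np.int32)
val_b = np.array([10.0, 20.0, 30.0])

loop_result = sparse_dot_loop(idx_a, val_a, idx_b, val_b)
vec_result = sparse_dot_vectorized(idx_a, val_a, idx_b, val_b)
print(loop_result, vec_result)  # 2.0*10.0 + 4.0*30.0 = 140.0 for both
```

The vectorized form replaces per-element Python bytecode with a few array operations, which is where a speedup would be expected for vectors with many nonzero entries; for very small vectors the fixed overhead of the NumPy calls may dominate.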