SPARK-7450 Use UNSAFE.getLong() to speed up BitSetMethods#anySet() #5897

Closed
wants to merge 11 commits into from

tedyu
Contributor

@tedyu tedyu commented May 4, 2015

@JoshRosen
Please take a look

@AmplabJenkins

Can one of the admins verify this patch?

@JoshRosen
Contributor

Jenkins, add to whitelist and test this please.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 5, 2015

Test build #31800 has started for PR 5897 at commit 4ca0ef6.

@SparkQA

SparkQA commented May 5, 2015

Test build #31800 has finished for PR 5897 at commit 4ca0ef6.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31800/
Test FAILed.

@@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) {
    * Returns {@code true} if any bit is set.
    */
   public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInBytes) {
-    for (int i = 0; i <= bitSetWidthInBytes; i++) {
+    long widthInLong = bitSetWidthInBytes / SIZE_OF_LONG;
+    for (long i = 0; i <= widthInLong; i++) {
Contributor

Actually, we should use int instead of long for i. The JIT injects a safepoint check into loops with a long counter, which prevents loop unrolling and adds an extra check on every iteration.
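
The loop shape requested here can be sketched as follows. This is a minimal illustration rather than the PR's code: `IntCountedScan` and its `long[]` input are hypothetical stand-ins for the memory that the patch reads through `UNSAFE.getLong()`. The only point is that both the counter and the bound are `int`. How much this matters in practice depends on the JVM version and the JIT compiler in use.

```java
// Hypothetical sketch: scan 64-bit words with an int loop counter.
final class IntCountedScan {
  private IntCountedScan() {}

  /** Returns true if any bit is set in the first numWords entries of words. */
  static boolean anySet(long[] words, int numWords) {
    // Counter and bound are both ints, which is the shape the reviewer asks for.
    for (int i = 0; i < numWords; i++) {
      if (words[i] != 0L) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long[] words = new long[4];
    System.out.println(anySet(words, words.length)); // prints false
    words[3] = 1L << 40;
    System.out.println(anySet(words, words.length)); // prints true
  }
}
```

Note the strict `i < numWords` bound: a `<=` bound would read one word past the end of the bitset.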

Contributor Author

Addressed the above comment.

@AmplabJenkins

Merged build triggered.

@@ -71,7 +72,13 @@ public static boolean isSet(Object baseObject, long baseOffset, int index) {
    * Returns {@code true} if any bit is set.
    */
   public static boolean anySet(Object baseObject, long baseOffset, long bitSetWidthInBytes) {
-    for (int i = 0; i <= bitSetWidthInBytes; i++) {
+    long widthInLong = bitSetWidthInBytes / SIZE_OF_LONG;
Contributor

widthInLong needs to be an int too :)

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 5, 2015

Test build #31808 has started for PR 5897 at commit 63ee050.

@SparkQA

SparkQA commented May 5, 2015

Test build #31808 has finished for PR 5897 at commit 63ee050.

  • This patch fails to build.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31808/
Test FAILed.

return true;
}
}
for (int i = (int)(SIZE_OF_LONG * widthInLong); i < bitSetWidthInBytes; i++) {
Contributor

It looks like this block now contains two for loops. A bad merge conflict resolution, perhaps?

Contributor Author

I don't think so.
The second loop is for the remaining bytes :-)

Contributor

Ah, sorry for overlooking that. I guess I was thinking of optimizing for the case where the bitset width was a multiple of the word size.

Contributor Author

That case is covered by the first loop.

Contributor

Do we need the second loop? I think the assumption is that all bitsets are word-aligned. We should definitely document that, though.
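
To make the trade-off in this thread concrete, here is a hedged sketch of the two-loop shape under discussion: whole 8-byte words first, then any leftover bytes. It reads from a plain `byte[]` through `java.nio.ByteBuffer` instead of `UNSAFE`, and `TailLoopSketch` is a hypothetical name, so treat it as an illustration of the idea and not as the patch itself.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch of the "words first, then remaining bytes" scan.
final class TailLoopSketch {
  private static final int SIZE_OF_LONG = 8;

  private TailLoopSketch() {}

  /** Returns true if any of the first widthInBytes bytes is non-zero. */
  static boolean anySet(byte[] bytes, int widthInBytes) {
    ByteBuffer buf = ByteBuffer.wrap(bytes);
    int widthInLong = widthInBytes / SIZE_OF_LONG;
    // First loop: one 8-byte read per iteration (absolute getLong by byte index).
    for (int i = 0; i < widthInLong; i++) {
      if (buf.getLong(i * SIZE_OF_LONG) != 0L) {
        return true;
      }
    }
    // Second loop: bytes left over when the width is not a multiple of 8.
    for (int i = widthInLong * SIZE_OF_LONG; i < widthInBytes; i++) {
      if (bytes[i] != 0) {
        return true;
      }
    }
    return false;
  }
}
```

If every bitset is word-aligned, `widthInBytes` is always a multiple of 8, the second loop never executes, and it can be removed once that assumption is documented.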

@tedyu
Contributor Author

tedyu commented May 5, 2015

From failed test:
bq. [error] oro#oro;2.0.8!oro.jar origin location must be absolute: file:/home/jenkins/.m2/repository/oro/oro/2.0.8/oro-2.0.8.jar

Pretty sure the above is not caused by my change

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 5, 2015

Test build #31809 has started for PR 5897 at commit 093b7a4.

@SparkQA

SparkQA commented May 5, 2015

Test build #31809 has finished for PR 5897 at commit 093b7a4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • abstract class ShuffleHandle(val shuffleId: Int) extends Serializable
    • * class SomethingNotSerializable
    • logDebug(s" + cloning the object $obj of class $
    • abstract class Evaluator extends Params
    • abstract class PipelineStage extends Params with Logging
    • class BinaryClassificationEvaluator extends Evaluator with HasRawPredictionCol with HasLabelCol
    • trait LDAOptimizer
    • class EMLDAOptimizer extends LDAOptimizer
    • class OnlineLDAOptimizer extends LDAOptimizer
    • class SaslEncryption
    • static class EncryptedMessage extends AbstractReferenceCounted implements FileRegion
    • class SaslRpcHandler extends RpcHandler
    • public class SaslServerBootstrap implements TransportServerBootstrap
    • public class SparkSaslClient implements SaslEncryptionBackend
    • public class SparkSaslServer implements SaslEncryptionBackend
    • public class ByteArrayWritableChannel implements WritableByteChannel
    • class ParamGridBuilder(object):
    • abstract class Dialect
    • class DialectException(msg: String, cause: Throwable) extends Exception(msg, cause)
    • case class HiveDatabase(
    • abstract class TableType
    • case class HiveStorageDescriptor(
    • case class HivePartition(
    • case class HiveColumn(name: String, hiveType: String, comment: String)
    • case class HiveTable(
    • trait ClientInterface
    • class ClientWrapper(
    • class IsolatedClientLoader(
    • protected trait ReflectionMagic
    • protected implicit class InstanceMagic(a: Any)
    • protected implicit class StaticMagic(c: Class[_])

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31809/
Test FAILed.

@tedyu
Contributor Author

tedyu commented May 5, 2015

From https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/31809/testReport/junit/org.apache.spark.deploy/SparkSubmitSuite/includes_jars_passed_in_through___jars/ :

sbt.ForkMain$ForkError: The code passed to failAfter did not complete within 60 seconds.
at org.scalatest.concurrent.Timeouts$$anonfun$failAfter$1.apply(Timeouts.scala:249)
at org.scalatest.concurrent.Timeouts$$anonfun$failAfter$1.apply(Timeouts.scala:249)

I think the above was not related to my change.

bq. This patch adds the following public classes (experimental)

This change doesn't add any public class.

@rxin
Contributor

rxin commented May 5, 2015

Jenkins, retest this please.

@SparkQA

SparkQA commented May 6, 2015

Test build #32026 has finished for PR 5897 at commit 1719c5b.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class Model(Transformer):
    • class PipelineModel(Model):
    • class CrossValidator(Estimator):
    • class CrossValidatorModel(Model):
    • class JavaModel(Model, JavaTransformer):

@AmplabJenkins

Merged build finished. Test FAILed.

@AmplabJenkins

Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32026/
Test FAILed.

@tedyu
Contributor Author

tedyu commented May 6, 2015

[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala:769: not found: type HiveCompatibilitySuite
[error] extends HiveCompatibilitySuite with BeforeAndAfter {
[error] ^
[error] /home/jenkins/workspace/SparkPullRequestBuilder@2/sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveWindowFunctionQuerySuite.scala:828: value testCases is not a member of org.scalatest.BeforeAndAfter
[error] override def testCases: Seq[(String, File)] = super.testCases.filter {
[error] ^

I don't think the above error was related to my change.
BitSetSuite passed:

[info] Test org.apache.spark.unsafe.bitset.BitSetSuite.basicOps started
[info] Test org.apache.spark.unsafe.bitset.BitSetSuite.traversal started
[info] Test run finished: 0 failed, 0 ignored, 2 total, 0.012s

@SparkQA

SparkQA commented May 6, 2015

Test build #771 has started for PR 5897 at commit 1719c5b.

* @return whether any bit in the BitSet is set
*/
public boolean anySet() {
return BitSetMethods.anySet(baseObject, baseOffset, numWords*BitSetMethods.WORD_SIZE);
Contributor

As I mentioned in an earlier comment, we can assume that BitSetMethods.anySet will be called with a size that's word aligned, so for consistency with nextSetBit I think we should change anySet to accept a size that's measured in numbers of words. This would let us avoid the mod calculations or multiplications by word size.
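
A rough sketch of what the suggested signature might look like, with the width counted in 8-byte words so that no division, multiplication, or remainder handling is needed at the call site. The class name `WordWidthSketch`, the reflective `theUnsafe` lookup, and the `long[]` backing store in `main` are assumptions made to keep the example self-contained; the merged patch may differ in detail. `sun.misc.Unsafe` is an unsupported API, and newer JDKs may warn about or restrict its use.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical sketch: anySet with the width measured in words, not bytes.
final class WordWidthSketch {
  private static final long WORD_SIZE = 8;
  private static final Unsafe UNSAFE = loadUnsafe();

  private WordWidthSketch() {}

  // Assumption: obtain the Unsafe singleton reflectively, as user code cannot call getUnsafe().
  private static Unsafe loadUnsafe() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      return (Unsafe) f.get(null);
    } catch (ReflectiveOperationException e) {
      throw new ExceptionInInitializerError(e);
    }
  }

  /** Returns true if any bit is set in the bitSetWidthInWords words starting at baseOffset. */
  static boolean anySet(Object baseObject, long baseOffset, int bitSetWidthInWords) {
    long addr = baseOffset;
    // int counter, strict upper bound, one 8-byte read per word.
    for (int i = 0; i < bitSetWidthInWords; i++, addr += WORD_SIZE) {
      if (UNSAFE.getLong(baseObject, addr) != 0L) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long[] words = new long[4];
    long base = Unsafe.ARRAY_LONG_BASE_OFFSET;
    System.out.println(anySet(words, base, words.length)); // prints false
    words[2] = 1L;
    System.out.println(anySet(words, base, words.length)); // prints true
  }
}
```

With this shape, a wrapper such as `BitSet#anySet()` could pass its word count directly instead of multiplying by the word size first.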

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented May 6, 2015

Test build #32037 has started for PR 5897 at commit 473bf9d.

@SparkQA

SparkQA commented May 6, 2015

Test build #771 has finished for PR 5897 at commit 1719c5b.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented May 7, 2015

Test build #32037 has finished for PR 5897 at commit 473bf9d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@AmplabJenkins

Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/32037/
Test PASSed.

@andrewor14
Contributor

@tedyu also please file a JIRA that tracks this improvement and put it in the title. See how other PRs are opened.

@tedyu tedyu changed the title from "Use UNSAFE.getLong() to speed up BitSetMethods#anySet()" to "SPARK-7450 Use UNSAFE.getLong() to speed up BitSetMethods#anySet()" on May 7, 2015
@tedyu
Contributor Author

tedyu commented May 7, 2015

SPARK-7450 has been filed

@JoshRosen
Contributor

This looks fine; I have two minor nits because I'm really pedantic, but I'll just deal with it on merge.

@@ -28,7 +28,7 @@
  */
 public final class BitSetMethods {

-  private static final long WORD_SIZE = 8;
+  static final long WORD_SIZE = 8;
Contributor

Could remain private.

@tedyu
Contributor Author

tedyu commented May 7, 2015

bq. I'm really pedantic

I like that :-)

asfgit pushed a commit that referenced this pull request May 7, 2015
Author: tedyu <yuzhihong@gmail.com>

Closes #5897 from tedyu/master and squashes the following commits:

473bf9d [tedyu] Address Josh's review comments
1719c5b [tedyu] Correct upper bound in for loop
b51dcaf [tedyu] Add unit test in BitSetSuite for BitSet#anySet()
83f9f87 [tedyu] Merge branch 'master' of github.com:apache/spark
817e3f9 [tedyu] Replace constant 8 with SIZE_OF_LONG
75a467b [tedyu] Correct offset for UNSAFE.getLong()
855374b [tedyu] Remove second loop since bitSetWidthInBytes is WORD aligned
093b7a4 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
63ee050 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
4ca0ef6 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()
3e9b691 [tedyu] Use UNSAFE.getLong() to speed up BitSetMethods#anySet()

(cherry picked from commit 88063c6)
Signed-off-by: Josh Rosen <joshrosen@databricks.com>
@asfgit asfgit closed this in 88063c6 May 7, 2015
@JoshRosen
Contributor

Merged to master and branch-1.4 (1.4.0). Thanks!

jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request May 28, 2015
jeanlyn pushed a commit to jeanlyn/spark that referenced this pull request Jun 12, 2015
nemccarthy pushed a commit to nemccarthy/spark that referenced this pull request Jun 19, 2015