Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-14613][ML] Add @Since into the matrix and vector classes in spark-mllib-local #12416

Closed
wants to merge 11 commits into from

Conversation

pravingadakh
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds @since tag into the matrix and vector classes in spark-mllib-local.

How was this patch tested?

Scala-style checks passed.

@@ -759,6 +769,8 @@ object SparseMatrix {

/**
* Factory methods for [[org.apache.spark.ml.linalg.Matrix]].
*
* @since 2.0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These need to be annotations not scaladoc, like @Since("2.0.0")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Jira description says that these can not be annotations:

In spark-mllib-local, we're no longer to be able to use @Since annotation. As a result, we will switch to standard java doc style using /* @Since /*. This task will add those new APIs as @Since 2.0

Not sure of reasoning behind it though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, because that's actually an annotation in Spark Core and it's not available. My mistake. For now fair enough then. But do write 2.0.0

@dbtsai
Copy link
Member

dbtsai commented Apr 15, 2016

ok to test

@dbtsai
Copy link
Member

dbtsai commented Apr 15, 2016

All the public methods need to have Since for the doc. See the following for reference. Thanks.

https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/Vectors.scala

@dbtsai
Copy link
Member

dbtsai commented Apr 15, 2016

@srowen I think using scala annotation is easier, and I like it more. What do you think that we have a copy of @Since annotation in mllib-local jar? Thanks.

@SparkQA
Copy link

SparkQA commented Apr 15, 2016

Test build #55938 has finished for PR 12416 at commit 4f4f789.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@holdenk
Copy link
Contributor

holdenk commented Apr 15, 2016

I think either moving the annotation to a common base (like sketch) or copying it would be a good idea.

@srowen
Copy link
Member

srowen commented Apr 15, 2016

Yeah, really it doesn't belong there though. It belongs in some super-core project behind all of the projects behind core. For one class, maybe not worth it; if there are many others, maybe so.

@holdenk
Copy link
Contributor

holdenk commented Apr 15, 2016

@srowen Maybe not sketch, we have some testing annotations in common/tags - we could extend that beyond just testing tags to include the documentation tags we want to access?

@MLnick
Copy link
Contributor

MLnick commented Apr 18, 2016

common/tags has a compile-time dependency on scalatest, will that not bring in that dep if used in mllib-local, or core?

@srowen
Copy link
Member

srowen commented Apr 18, 2016

If common/tags is supposed to be an annotation module, that's a good place. However, the current annotations under src/main are all actually test-related. Ideally these would move under src/test and dependencies would be updated to properly reference test scope everywhere. Then this could be part of src/main

@dbtsai
Copy link
Member

dbtsai commented Apr 18, 2016

I like @srowen 's idea. Having the shared annotation in common/tags, and move the current ones under src/test. @pravingadakh, can you update this PR? Thanks.

@holdenk
Copy link
Contributor

holdenk commented Apr 18, 2016

@pravingadakh let me know if you need any help :)

@pravingadakh
Copy link
Contributor Author

@dbtsai I'll update the PR accordingly. I just have one doubt before making any further changes, do you want to make a copy of @Since under common/tags or should I move @Since annotation from core to common/tags? @holdenk I'll definitely contact you if I am stuck anywhere :)

@srowen
Copy link
Member

srowen commented Apr 19, 2016

You would move the annotation. There's no sense in having two copies. Tagging @rxin since he made the tags module I believe.

@rxin
Copy link
Contributor

rxin commented Apr 19, 2016

yea just move it.

@pravingadakh
Copy link
Contributor Author

@dbtsai @srowen Please validate the changes.

@SparkQA
Copy link

SparkQA commented Apr 20, 2016

Test build #56373 has finished for PR 12416 at commit e617dd9.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • class DenseMatrix @Since(\"2.0.0\") (
    • class SparseMatrix @Since(\"2.0.0\") (
    • class DenseVector @Since(\"2.0.0\") (@Since(\"2.0.0\") val values: Array[Double]) extends Vector
    • class SparseVector @Since(\"2.0.0\") (

<goals><goal>add-source</goal></goals>
<configuration>
<sources>
<source>src/main/java</source>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually required? the scala compiler plugin should already know all of these source locations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only required by src/test/java/, but thought I should include all source directories.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What requires it? I wonder if this isn't actually a problem that needs to be solved by the classifier declarations below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We moved existing test annotations under common/tags from src/main package to src/test package, thus those annotation imports will not work as is, since by default src/test will be treated as test sources directory unless we explicitly mark it as sources directory.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plugin already knows about src/test though. I think your problem is something else, like what's below or this: you have a problem with the new directory structure. They're now under a directory called "org.apache.spark.tags" instead of nested dirs "org/apache/spark/tags"

@SparkQA
Copy link

SparkQA commented Apr 20, 2016

Test build #56378 has finished for PR 12416 at commit 05b755b.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member

dbtsai commented Apr 22, 2016

Ping @pravingadakh for update. Thanks.

@pravingadakh
Copy link
Contributor Author

@dbtsai I'll update the PR soon, I have been overwhelmed by office work :(

@SparkQA
Copy link

SparkQA commented Apr 23, 2016

Test build #56795 has finished for PR 12416 at commit 714afac.

  • This patch fails MiMa tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 23, 2016

Test build #56796 has finished for PR 12416 at commit 1da5e91.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

val values: Array[Double],
override val isTransposed: Boolean) extends Matrix {
@Since("2.0.0")
class DenseMatrix @Since("2.0.0") (
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be overkill, since the class is marked "since 2.0.0". Everything in it is implicitly "since 2.0.0" as well -- or later. But right now there's no later version.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@srowen Yeah it does look like a overkill, I was just referring this class.

@srowen
Copy link
Member

srowen commented Apr 23, 2016

This looks correct to my eyes. The Since annotation is in place as a non-test class; the existing test classes are moved into the test source root. The parent pom now declares available dependencies on the test or non-test artifacts from this module. Everything refers to the new module, and most dependencies are non-test (which implicitly includes the test artifacts) because they need @Since directly. I agree with renaming the module too. This LGTM as far as I know, except possibly the question of whether the class constructors are over-annotated (CC @mengxr). I wouldn't mind another look from @dbtsai or @vanzin or @JoshRosen.

@srowen
Copy link
Member

srowen commented Apr 25, 2016

@pravingadakh sorry this needs a rebase now

@pravingadakh
Copy link
Contributor Author

@srowen Done rebasing.

@vanzin
Copy link
Contributor

vanzin commented Apr 25, 2016

I was going to ask why do we need an annotation for this (instead of javadoc's @since), but since that's already used in other places in the code, it's kind of a moot point.

@SparkQA
Copy link

SparkQA commented Apr 28, 2016

Test build #57273 has finished for PR 12416 at commit f81616e.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Apr 28, 2016

Test build #57276 has finished for PR 12416 at commit 8b57dc0.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dbtsai
Copy link
Member

dbtsai commented Apr 28, 2016

Thanks. Merged into master.

@asfgit asfgit closed this in dae538a Apr 28, 2016
@yhuai
Copy link
Contributor

yhuai commented Apr 29, 2016

@yhuai
Copy link
Contributor

yhuai commented Apr 29, 2016

Sorry. I am going to revert it. I believe it breaks the build. Seems those build changes are not related to adding Since tag.

@yhuai
Copy link
Contributor

yhuai commented Apr 29, 2016

I have reverted this commit.

@srowen
Copy link
Member

srowen commented Apr 29, 2016

Hm, is the issue that the SBT build won't see these classes now that they're test-scope? they are properly test scope though. If really necessary we could, I suppose, bring them back into compile scope, but that would be unfortunate. Does anyone know enough SBT to see how to make it see the test classes?

@yhuai
Copy link
Contributor

yhuai commented Apr 29, 2016

Seems we are moving ExtendedYarnTest to src/test, but we are not making others depend on the test jar? Also, can we separate the build change with adding Since tag to those ML methods? I am not sure they should be done in a single PR.

@yhuai
Copy link
Contributor

yhuai commented Apr 29, 2016

In sql/hive's pom, we have

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_${scala.binary.version}</artifactId>
      <type>test-jar</type>
      <version>${project.version}</version>
      <scope>test</scope>
    </dependency>

I am wondering if we have to specify <type>test-jar</type>?

@srowen
Copy link
Member

srowen commented Apr 29, 2016

Yeah these are separable. Really, adding the annotation requires the build change, and the latter grew to dominate the change.

Hm, I had suggested using <classifier>tests</classifier> to depend on test artifacts. I wonder if that's not the standard way to do it? it's how I've expressed this in the past. But I also see examples of <type>test-jar</type> @vanzin you probably know this better than I. However I don't know if that's the issue here.

This particular failure is turning up from lots of tests, that are in modules with nothing to do with the test tags. Was the last change to declare all these 'tags' dependencies the problem? most modules have nothing to do with it. They're just annotations... they don't matter to any logic.

@vanzin
Copy link
Contributor

vanzin commented Apr 29, 2016

Seems we are moving ExtendedYarnTest to src/test, but we are not making others depend on the test jar?

Haven't looked in detail (and missed this in my not-so-careful read of the original patch), but that could be a problem, yes. The test tags need to be in every project's test classpath.

Regarding type vs. classifier, I believe classifier is the new recommendation, although both generally work the same for tests.

@pravingadakh
Copy link
Contributor Author

@srowen I undid the changes done in dev/sparktestsupport/modules.py, ran the sbt test build locally with build/sbt -Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test command, all tests passed except one from graphx/test. The graphx test failed is

[info] - Label Propagation *** FAILED *** (5 seconds, 618 milliseconds)
[info] org.apache.spark.SparkException: Job aborted due to stage failure: Failed to serialize task 65, not attempting to retry it. Exception during serialization: java.lang.UnsupportedOperationException: Accumulator must be registered before send to executor
[info] at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1448)

which I believe has nothing to do with python module changes (maybe I do not have updated code in my branch). So yes, I think build failure is due to changes (related to tags) done in dev/sparktestsupport/modules.py.

@dbtsai
Copy link
Member

dbtsai commented Apr 29, 2016

@pravingadakh The changes in modules.py is just helping Jenkins to understand the dependencies, and I don't get it why this will break the build.

Is there any way to reproduce it locally? Why the original Jenkins build worked?

@@ -66,7 +66,7 @@
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-test-tags_${scala.binary.version}</artifactId>
<artifactId>spark-tags_${scala.binary.version}</artifactId>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically, what you have to do is: everywhere you're changing this, you need to also add <classifier>tests</classifier>. In modules where both the test and non-test tags are used, you need both dependencies (with and without classifier).

It might be easier to just leave the test tags in the main source base...

@pravingadakh
Copy link
Contributor Author

@dbtsai @srowen My mistake, apparently changes in modules.py does not cause build failure. And I was unable to reproduce the locally (even after rebasing with latest master). Although I see difference in sbt test build command ran in both the builds.

Passed build sbt command:

[info] Running Spark tests using SBT with these arguments: -Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver test

Failed build sbt command:

[info] Running Spark tests using SBT with these arguments: -Pyarn -Phadoop-2.3 -Phive -Pkinesis-asl -Phive-thriftserver -Dtest.exclude.tags=org.apache.spark.tags.ExtendedHiveTest,org.apache.spark.tags.ExtendedYarnTest test

But both command works (weird) for me locally.

@srowen
Copy link
Member

srowen commented Apr 30, 2016

OK, let's try this PR again but only with the tag changes first. (You're even welcome to make a separate JIRA that blocks this one, since that much is a fairly separable change.)

I vote for just leaving these tags in src/main then. Although they're logically test-only, they're just annotations, so can plausibly stay there I guess. I think we still want to rename the module, ideally. It's just that we won't make the MLlib changes and won't move the annotations this time, and so won't need any classifier or type changes in Maven.

Thanks @pravingadakh , my suggestion ended up being the wrong one here.

@pravingadakh
Copy link
Contributor Author

OK, let's try this PR again but only with the tag changes first.

@srowen I'm little confused here, could you please elaborate a bit here? What tag changes specifically?
Thanks.

@srowen
Copy link
Member

srowen commented May 2, 2016

Meaning, let's leave aside the actual original change in the PR, which is to annotate some methods as @Since a release. Let's make the structural change to move the annotation first and rename the module.

@pravingadakh
Copy link
Contributor Author

pravingadakh commented May 2, 2016

Aah ok, got it thanks. I'll create new jira for that and make that a blocker for this one.

@pravingadakh pravingadakh deleted the SPARK-14613 branch May 2, 2016 18:37
@srowen
Copy link
Member

srowen commented May 6, 2016

@pravingadakh go ahead - would be good to get this in for 2.0

@srowen
Copy link
Member

srowen commented May 11, 2016

@pravingadakh if you're busy, I may resubmit this change on your behalf to try to get it in for 2.0

@pravingadakh
Copy link
Contributor Author

@srowen Yes, please go ahead. Thanks.

@srowen
Copy link
Member

srowen commented May 12, 2016

See #13074 for the module change. I'm also moving other common annotations

asfgit pushed a commit that referenced this pull request May 17, 2016
…nto spark-tags

## What changes were proposed in this pull request?

(See #12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.)

Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags`

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13074 from srowen/SPARK-15290.

(cherry picked from commit 122302c)
Signed-off-by: Sean Owen <sowen@cloudera.com>
asfgit pushed a commit that referenced this pull request May 17, 2016
…nto spark-tags

## What changes were proposed in this pull request?

(See #12416 where most of this was already reviewed and committed; this is just the module structure and move part. This change does not move the annotations into test scope, which was the apparently problem last time.)

Rename `spark-test-tags` -> `spark-tags`; move common annotations like `Since` to `spark-tags`

## How was this patch tested?

Jenkins tests.

Author: Sean Owen <sowen@cloudera.com>

Closes #13074 from srowen/SPARK-15290.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
9 participants