
SPARK-1556. jets3t dep doesn't update properly with newer Hadoop versions #629

Closed
srowen wants to merge 5 commits from srowen:SPARK-1556

Conversation

@srowen (Member) commented May 4, 2014

See related discussion at #468

This PR may still overstep what you have in mind, but let me put it on the table to start. Besides fixing the issue, it has one substantive change, and that is to manage Hadoop-specific things only in Hadoop-related profiles. This does not remove yarn.version.

  • Moves the YARN and Hadoop profiles together in pom.xml. Sorry that this makes the diff a little hard to grok but the changes are only as follows.
  • Removes hadoop.major.version
  • Introduces hadoop-2.2 and hadoop-2.3 profiles to control Hadoop-specific changes:
    • like the protobuf version issue, which until now was only 'solved' by enabling YARN for 2.2+, really an orthogonal issue
    • and now the jets3t version issue
  • Hadoop profiles set an appropriate default hadoop.version, which can be overridden
  • (YARN profiles in the parent now only exist to add the sub-module)
  • Fixes the jets3t dependency issue
    • and makes it a runtime dependency
    • and centralizes its configuration in the parent POM
  • Updates build docs
  • Updates the SBT build too (a sketch follows this list)
    • and fixes a regex problem along the way
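
For illustration, here is a minimal sketch of what the SBT side of such a change could look like in an sbt build definition. The default version, the jets3t fallback version, and the regex are assumptions for the example, not the actual contents of SparkBuild.scala.

```scala
// Illustrative sketch only (assumed values, not the real SparkBuild.scala diff):
// choose the jets3t version from the Hadoop version, using 0.9.0 for Hadoop 2.3+.
val hadoopVersion = sys.env.getOrElse("SPARK_HADOOP_VERSION", "1.0.4")

// A version-matching regex like this is the kind of thing the "regex problem"
// above refers to: it must match 2.3.x and later, but not 2.2.x or 0.23.x.
val jets3tVersion = if (hadoopVersion.matches("2\\.[3-9](\\..*)?")) "0.9.0" else "0.7.1"

// Declared as a runtime dependency, mirroring the Maven change described above.
libraryDependencies += "net.java.dev.jets3t" % "jets3t" % jets3tVersion % "runtime"
```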

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14641/

<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
</exclusion>
</exclusions>
Contributor commented on the diff above:

Why remove it?

Member Author replied:

It is only moved up to the parent where similar config resides.

@witgo (Contributor) commented May 4, 2014

Looks good to me.

$ mvn -Pyarn-alpha -Phadoop-0.23 -Dhadoop.version=0.23.7 -DskipTests clean package

# Apache Hadoop 2.2.X
$ mvn -Pyarn -Phadoop-2.2 -DskipTests clean package
Contributor commented on the diff above:

I think it might be better to always ask people to specify hadoop.version and then just explain that in some cases they need to add a profile to work around problems in the hadoop dependency graph. Otherwise sometimes we are relying on the profile to set hadoop.version and it could be a bit confusing to users what is going on.

I.e. for this example do:

mvn -Pyarn -Dhadoop.version=2.2.X -Phadoop-2.2  -DskipTests clean package

The header here says "Apache Hadoop 2.2.X" but the actual example can't be directly generalized to 2.2.X (as it stands now) without them digging around the build more.

Member Author replied:

Why doesn't this work for Hadoop 2.2.0? I can fix the example. It should be a matter of setting the version and profile, unless there is something deeper, in which case maybe the build needs to change in the hadoop-2.2 profile?

@pwendell (Contributor) commented May 5, 2014

Thanks @srowen! I tested this and it seems to work well (tests below).

Comments are inline. My main thoughts were:

  1. It might be nicer to ask people to always set hadoop.version; it's just a bit more explicit and, IMO, less confusing. Of course, advanced users can rely on the profile to set the version.
  2. It could be nice to make distinct profiles for 2.3 and 2.4... then it will be clear to users that if they want to build against 2.5+ they are in uncharted territory, since we can't officially support those builds until they come out.
mvn -Pyarn -Phadoop-2.3 -DskipTests clean package
./bin/spark-shell
scala> sc.textFile("s3n://big-data-benchmark/pavlo/text/tiny/rankings/").count
res1: Long = 1200

@srowen (Member, author) commented May 5, 2014

OK, but leave default hadoop.version=1.0.4? OK. I can make a hadoop-2.4 profile if you're not concerned about having the extra stanza to maintain.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

$ mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package

# Apache Hadoop 2.4.X
$ mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.4.0 -DskipTests clean package
Contributor commented on the diff above:

Should be -Phadoop-2.4

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14661/

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14662/

@pwendell (Contributor) commented May 5, 2014

@srowen looks good to me! Is there anything else you plan to add to this or is it ready to go?

@srowen (Member, author) commented May 5, 2014

All set here. I think it does fix the issue and makes future, similar changes easier to apply in Maven, so I'm for it.

asfgit pushed a commit that referenced this pull request May 5, 2014
SPARK-1556. jets3t dep doesn't update properly with newer Hadoop versions

See related discussion at #468

This PR may still overstep what you have in mind, but let me put it on the table to start. Besides fixing the issue, it has one substantive change, and that is to manage Hadoop-specific things only in Hadoop-related profiles. This does _not_ remove `yarn.version`.

- Moves the YARN and Hadoop profiles together in pom.xml. Sorry that this makes the diff a little hard to grok but the changes are only as follows.
- Removes `hadoop.major.version`
- Introduce `hadoop-2.2` and `hadoop-2.3` profiles to control Hadoop-specific changes:
  - like the protobuf version issue - this was only 'solved' now by enabling YARN for 2.2+, which is really an orthogonal issue
  - like the jets3t version issue now
- Hadoop profiles set an appropriate default `hadoop.version`, that can be overridden
- _(YARN profiles in the parent now only exist to add the sub-module)_
- Fixes the jets3t dependency issue
 - and makes it a runtime dependency
 - and centralizes config of this guy in the parent pom
- Updates build docs
- Updates SBT build too
  - and fixes a regex problem along the way

Author: Sean Owen <sowen@cloudera.com>

Closes #629 from srowen/SPARK-1556 and squashes the following commits:

c3fa967 [Sean Owen] Fix hadoop-2.4 profile typo in doc
a2105fd [Sean Owen] Add hadoop-2.4 profile and don't set hadoop.version in profiles
274f4f9 [Sean Owen] Make jets3t a runtime dependency, and bring its exclusion up into parent config
bbed826 [Sean Owen] Use jets3t 0.9.0 for Hadoop 2.3+ (and correct similar regex issue in SBT build)
f21f356 [Sean Owen] Build changes to set up for jets3t fix
(cherry picked from commit 73b0cbc)

Signed-off-by: Patrick Wendell <pwendell@gmail.com>
@asfgit asfgit closed this in 73b0cbc May 5, 2014
@srowen srowen deleted the SPARK-1556 branch May 6, 2014 06:41
@darose commented May 8, 2014

I tried building and running this code, but it failed with:

14/05/08 20:09:11 ERROR Remoting: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = -6451051318873184044, local class serialVersionUID = 5837456792360$
1411
java.io.InvalidClassException: org.apache.spark.deploy.ApplicationDescription; local class incompatible: stream classdesc serialVersionUID = -6451051318873184044, local class serialVersionUID = 5837456792360714$
1
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
at akka.serialization.JavaSerializer$$anonfun$1.apply(Serializer.scala:136)
...

  • I grabbed the source zip from https://github.com/apache/spark/tree/73b0cbcc241cca3d318ff74340e80b02f884acbd
  • I built it with "SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt assembly" followed by "SPARK_HADOOP_VERSION=2.3.0-cdh5.0.0 SPARK_YARN=true sbt/sbt package"
  • I grabbed shark 0.9.1 source
  • I updated the build script to use jets3t 0.9.0
  • I built shark with "SHARK_HADOOP_VERSION=2.3.0-cdh5.0.0 ./sbt/sbt assembly" and "SHARK_HADOOP_VERSION=2.3.0-cdh5.0.0 ./sbt/sbt package"
  • I deployed both spark and shark to a client, a master, and a worker machine

Master and worker can talk to each other, worker can register with master. I can launch a shark shell on the client, and do basic things like "show tables". But when I try a simple SELECT query, I get the above error.

Any idea what I might be doing wrong?

@srowen (Member, author) commented May 8, 2014

So this means that the Java-serialized format of a Spark class has changed (or at least, it may have; the serialVersionUID changed) between the versions of two Spark components that you are using. I doubt it is related to this PR per se, but is related to having two versions of Spark in play -- what you just compiled, and whatever else may be on your machines? I see you distributed the updated code to some machines, but maybe not all, or, are you sure the running processes are all the same new version? At the least, this error suggests it isn't.

If so, the fix (which is a separate issue) is really to run the same version of everything everywhere. It may be worth looking into whether the serialized form of this class actually changed, and whether managing its serialVersionUID explicitly would resolve what is otherwise not a real incompatibility, but that's less good than matching versions everywhere, which should ensure this can't happen.
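
(As a hedged aside on what "management of its serialVersionUID" could look like: the snippet below is a made-up Scala class, not Spark's actual ApplicationDescription, and the field list is illustrative only.)

```scala
// Hypothetical sketch: pinning an explicit serialVersionUID means the JVM no
// longer derives the ID from the class shape, so two builds whose serialized
// fields are in fact compatible will also agree on the ID. It does NOT make
// genuinely incompatible field changes safe.
@SerialVersionUID(1L)
class AppDescriptionLike(val name: String, val maxCores: Option[Int]) extends Serializable
```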

@darose commented May 9, 2014

I tried downloading the latest from the master branch of both spark and shark, and built and ran them (in the hopes of getting past that "class incompatible" issue) but no luck there either. Now it's dying on "Exception in thread "main" java.lang.NoClassDefFoundError: org/json4s/JsonAST$JValue"

I'm really at my wits' end here - I have spent days trying to get this to run. Is there a matched set of spark & shark binaries available somewhere that can run on CDH5? (I.e., one that includes this jets3t 0.9.0 fix.) Or if not, could someone provide instructions on how I might build them?

I'm going to have to fall back to slow-as-molasses Hive if I can't find a way through this.

@srowen (Member, author) commented May 9, 2014

json4s is a direct dependency of Spark core. It's not related to the changes here. It sounds like something is running but being pointed to an artifact that does not contain dependencies, and they are not otherwise on the classpath.

I am not sure how you're modifying/running this, but I suspect you continue to collide with existing 0.9.0 binaries and environment variables on your cluster. Building matching binaries is the easy part. It's deploying it manually that may be difficult. It should just be a matter of scraping out and replacing the stuff you find in /opt/cloudera/.../spark, but you may have to be sure to replace many things, not just one assembly jar. You'd have to restart the services too. And this has to happen on all workers.

Longer-term of course an updated Spark will be deployed with CDH updates anyway.

If you just want to move off Hive, there's another answer for that, but that's a different question.

@darose commented May 9, 2014

Thanks for the suggestions. I don't think this is a deployment issue though. I don't have any spark/shark remnants installed from packages on the client machine. (I don't even have an /opt/cloudera directory - my Cloudera packages seem to get installed under /usr/lib/hadoop, /usr/lib/hive, etc.). Rather, I was manually deploying the binaries I built to /usr/lib/spark and /usr/lib/shark, and I've been completely removing those directory trees each time I do a new build.

And similarly on the Hadoop cluster machines: these are Amazon EC2 AMIs that I'm building off a fresh, pristine Ubuntu 13.10 base, so there are no spark/shark remnants present before I start.

So I'm fairly certain this is an issue of me building incorrectly. I think what I did to build was:

  • grab the current master branch of spark & shark
  • update SparkBuild.scala to downgrade the version from 1.0-SNAPSHOT to 0.9.1
  • update SharkBuild.scala to use jets3t 0.9.0
  • build both with sbt assembly and sbt package
  • when done, copy the versions of spark-core_2.10-0.9.1.jar, spark-bagel_2.10-0.9.1.jar, spark-mllib_2.10-0.9.1.jar, and spark-repl_2.10-0.9.1.jar that were generated during the spark build and use them to replace the corresponding jars in lib_managed in the shark build. (My thinking here was that the shark build was pulling those jars from maven, and that perhaps the "class incompatible" issue was being caused by spark and shark using different versions of those jars.)

But result was the json4s issue I posted above.

In any case, as this is a build/deployment issue, probably best for me to take this off GitHub. Would be very grateful, though, if you might be able to assist in getting a working spark/shark build for us. Our company is a Cloudera support customer, so I'll try following up through those channels. If you don't mind, I'll suggest that the support team get in touch with you about this, as you're obviously the most well-versed on the issue.

@srowen (Member, author) commented May 9, 2014

Although there's now a 'fix' in, it is only going to do automatically what you are doing by hand.
My hunch is that perhaps the script, or env variable, already contains a classpath with "0.9.0" jars in it, which means that when they're replaced with differently-named jars, suddenly it can't find basic classes. Just a guess.

(Not sure what support will do with it given that this is into unsupported territory. You can ask. There's a reasonable question here about understanding the deployment. They'll reach out to the right people as needed.)

pdeyhim pushed a commit to pdeyhim/spark-1 that referenced this pull request Jun 25, 2014
SPARK-1556. jets3t dep doesn't update properly with newer Hadoop versions
gzm55 pushed a commit to MediaV/spark that referenced this pull request Jul 17, 2014
…s of features

There's a step in implicit ALS where the matrix `Yt * Y` is computed. It's computed as the sum of matrices; an f x f matrix is created for each of n user/item rows in a partition. In `ALS.scala:214`:

```
        factors.flatMapValues{ case factorArray =>
          factorArray.map{ vector =>
            val x = new DoubleMatrix(vector)
            x.mmul(x.transpose())
          }
        }.reduceByKeyLocally((a, b) => a.addi(b))
         .values
         .reduce((a, b) => a.addi(b))
```

Completely correct, but there's a subtle but quite large memory problem here. map() is going to create all of these matrices in memory at once, when they don't need to ever all exist at the same time.
For example, if a partition has n = 100000 rows, and f = 200, then this intermediate product requires 32GB of heap. The computation will never work unless you can cough up workers with (more than) that much heap.

Fortunately there's a trivial change that fixes it; just add `.view` in there.
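
For reference, a sketch of where that `.view` goes; this is the assumed shape of the fix, not the exact committed diff:

```scala
// Assumed shape of the fix: iterating a view means each f x f product is built,
// folded into the running sum by the reduce, and then discarded, instead of
// map() materializing all of them at once.
factors.flatMapValues { case factorArray =>
  factorArray.view.map { vector =>
    val x = new DoubleMatrix(vector)
    x.mmul(x.transpose())
  }
}.reduceByKeyLocally((a, b) => a.addi(b))
 .values
 .reduce((a, b) => a.addi(b))
```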

Author: Sean Owen <sowen@cloudera.com>

Closes apache#629 from srowen/ALSMatrixAllocationOptimization and squashes the following commits:

062cda9 [Sean Owen] Update style per review comments
e9a5d63 [Sean Owen] Avoid unnecessary out of memory situation by not simultaneously allocating lots of matrices

(cherry picked from commit c8a4c9b)
Signed-off-by: Reynold Xin <rxin@apache.org>
andrewor14 pushed a commit to andrewor14/spark that referenced this pull request Jan 8, 2015
bzhaoopenstack pushed a commit to bzhaoopenstack/spark that referenced this pull request Sep 11, 2019
* Update run.yaml

explicitly on gomodule, due to the bug
kubernetes/test-infra#14070

* Update run.yaml
bulldozer-bot bot referenced this pull request in palantir/spark Nov 7, 2019
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Aug 18, 2023
KE-41399 spark-42388 Avoid parquet footer reads twice in vectorized reader (apache#623) (apache#629)

KE-41399 spark-42388 Avoid parquet footer reads twice in vectorized reader
RolatZhang pushed a commit to RolatZhang/spark that referenced this pull request Dec 8, 2023
 KE-41399 spark-42388 Avoid parquet footer reads twice in vectorized reader (apache#623) (apache#629)

KE-41399 spark-42388 Avoid parquet footer reads twice in vectorized reader