Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gradle build cache miss due to volatility of SpotlessTask.steps input #2168

Open
jprinet opened this issue Jun 12, 2024 · 7 comments
Open

Gradle build cache miss due to volatility of SpotlessTask.steps input #2168

jprinet opened this issue Jun 12, 2024 · 7 comments
Labels

Comments

@jprinet
Copy link

jprinet commented Jun 12, 2024

Summary

  • Gradle version: any
  • Spotless version: 7.0.0.BETA1
  • OS: any

Spotless configuration

See this example in the JUnit5 project

Expected behavior

Run the build twice without changing the code and get a Gradle build cache hit on spotlessJava task

Current behavior

A cache miss occurs in the second build scan

First Build Scan®
Second Build Scan®

Due to the SpotlessTask.steps value being different in the two builds:
Screenshot 2024-06-12 at 4 06 22 PM

Steps to reproduce

cd /tmp
git clone git@github.com:junit-team/junit5.git junit5-1 --depth 1
git clone git@github.com:junit-team/junit5.git junit5-2 --depth 1
cd junit5-1
gw clean :junit-platform-console:spotlessJava -Dgradle.user.home=../guh -Dgradle.cache.remote.enabled=false --scan
cd ../junit5-2
gw clean :junit-platform-console:spotlessJava -Dgradle.user.home=../guh -Dgradle.cache.remote.enabled=false --scan

Reasons

The SpotlessTask.steps is declared as task input but deserializing/serializing the collection is not consistent

Fix

Use a consistent representation of the input by using the steps name.
This assumes that the step name is unique.

See this Build Scan®️ illustrating the build cache hit with the fix

@nedtwigg
Copy link
Member

Thanks for this report! We have gone to great lengths to make our step's equality correct, but 7.0.0.BETA1 is a huge change so I'm not shocked if we got one wrong.

Our current implementation puts all of each task's implementation details into account. Settings files, flags, etc. If you change a formatter's setting file, you want the formatter to rerun, you don't want to clear cache before you can find out what the new flag does.

I see this problem is present in spotlessJava, is it also present in spotlessKotlin? If not, then probably the issue is only in the importOrderFile or eclipse step, since they are unique to spotlessJava. If we comment out both steps, does the problem go away? Then add them one at a time until it comes back? If we can narrow down which step has the problem, then we can fix that step.

@nedtwigg nedtwigg added the bug label Jun 12, 2024
@jprinet
Copy link
Author

jprinet commented Jun 25, 2024

Sorry for the late reply @nedtwigg
I could eventually identify the root cause after investigating further the issue.

The cache key differs when running the build from different locations due to a change in the underlying fingerprint of FormatterStepSerializationRoundtrip.roundtripStateInternal

This is specific to the EclipseJdtFormatterStep (backed by EquoBasedStepBuilder).

Serializing EquoBasedStepBuilder has some references on a settings file with an absolute path.

This can be fixed by configuring the collection transient and adding a collection that would represent this input without volatility (Collection of file names?).

@nedtwigg
Copy link
Member

Huh. So the point of this:

private final transient ThrowingEx.Supplier<RoundtripState> initializer;
private @Nullable RoundtripState roundtripStateInternal;
private final SerializedFunction<RoundtripState, EqualityState> equalityStateExtractor;

Is that there is a RoundtripState which can have absolute paths in it. And then there is an EqualityState which is extracted from the RoundtripState, and that should not have absolute paths in it.

The state which is used to calculate .equals and .hashCode comes from here

@Override
protected EqualityState stateSupplier() throws Exception {
if (roundtripStateInternal == null) {
roundtripStateInternal = initializer.get();
}
return equalityStateExtractor.apply(roundtripStateInternal);
}

This can be fixed by configuring the collection transient and adding a collection that would represent this input without volatility (Collection of file names?).

This is what FileSignature and FileSignature.Promised do. FileSignature.Promised has the absolute path, and thus supports roundtrip serialization, and thus can be in the RoundtripState. For the EqualityState, it gets resolved into a regular FileSignature which loses the absolute paths.

I thought that the convoluted design above would let us satisfy the convoluted requirements the Gradle's configuration cache forces on us:

  • relocatable build cache -> we have to ignore absolute paths (EqualityState)
  • configuration cache -> we have to support roundtrip serialization -> we have to store absolute paths (RoundtripState)

One of two things is going on:

  1. there is an absolute path in the Equo-based steps that needs to become FileSignature/Promise ified that I missed
  2. Gradle is doing fingerprinting based on the serialized representation rather than based on hashcode (the hack that we used before configuration cache forced us to change)

Serializing EquoBasedStepBuilder has some references on a settings file with an absolute path

Any chance you can provide the filename in question?

@jprinet
Copy link
Author

jprinet commented Jun 26, 2024

I can confirm that the steps input property is fingerprinted with the Java serialization result.

You can see in this screenshot the absolute path to the settings file (in addition to my proposal of adding a collection containing only the filenames)
Screenshot 2024-06-26 at 9 39 43 AM

It is configured here in the consumer project.
Do you think it could work with a relative path being passed?

Unrelated to this issue, while investigating the problem I discovered that EndWithNewLineStep was not implementing Serializable although it should according to the method signature.

@jprinet
Copy link
Author

jprinet commented Jul 2, 2024

Do you think it could work with a relative path being passed?

any chance you could consider this option @nedtwigg ?

If not, will you implement a fix?

@nedtwigg
Copy link
Member

nedtwigg commented Jul 3, 2024

Re: EndWithNewlineStep, the code is correct. EndWithNewlineStep is not serializable, but EndWithNewlineStep.class is serializable. Because the step has no settings, it just needs a "key" for equality comparison.

Do you think it could work with a relative path being passed?

I just added a test to show a few things:

  • if you change the folder of a settings file
  • it does change the serialized representation
  • but it doesn't change .equals or .hashCode

public void test() throws Exception {
var settings1 = setFile("subfolder1/formatter.xml").toResource("java/eclipse/formatter.xml");
var settings2 = setFile("subfolder2/formatter.xml").toResource("java/eclipse/formatter.xml");
var step1 = withSettingsFile(settings1);
var step2 = withSettingsFile(settings2);
Assertions.assertTrue(step1.equals(step2));
Assertions.assertTrue(step1.hashCode() == step2.hashCode());
var serialized1 = toBytes(step1);
var serialized2 = toBytes(step2);
Assertions.assertFalse(serialized1.equals(serialized2));
}

From the docs for Gradle build cache:

A “value property” is a task input property that is not a file or set of files. ... Build scans capture a hash of the value for comparison, not the value itself

So now the question is: is Gradle doing the exact same hack we did before this gigantic 7.0 epic? As in, they are using the serialized representation to do .equals and .hashcode, and ignoring the actual .equals and .hashcode? That's reasonable thing to do, but doing that and requiring roundtrip serialization of all configuration values is insane... I hope that's not what's going on...

@nedtwigg
Copy link
Member

nedtwigg commented Jul 3, 2024

Yeah, looks like there's a fundamental conflict here. Gradle uses the serialized representation of your task to do build cache, it does not respect .equals. But configuration cache requires your task to support full roundtrip serialization. As argued in #987, the roundtrip serialization has a lot of downsides, and the only upside (cross-machine configuration cache) is imo never going to happen, and if it does it's going to smear secret keys even further across the internet than it already has.

As it stands in spotless 7.x, local build cache and local configuration cache both work, but shared build cache is broken. In 6.x, everything worked except that the local configuration cache barfed if you started a new daemon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants