Make EvalTask track resolved output paths #403
import org.pkl.cli.CliEvaluator;
import org.pkl.cli.CliEvaluatorOptions;

@UntrackedTask(because = "Output file names are known only after execution")
Maybe explain it a bit better here as well (that the output files contain placeholders)? 🤔
Done!
LGTM 👌
I see no indication that
You can read up on it here: So yes, it's a bit confusing that you can put a file into it that can contain a placeholder that doesn't work with Gradle, because the file location is unknown at that point.
This is a design mistake in
If we used a normal String as input, it would work, right? 🤔
I think
Great idea! Updated the PR to add new properties that are tracked.
Left two comments. Otherwise LGTM.
public abstract DirectoryProperty getMultipleFileOutputDir();

@Input
@Optional
public abstract Property<String> getExpression();

@Internal
public Provider<CliEvaluator> getCliEvaluator() {
I don't think `Provider` helps for this and the following methods, as they are only called if and when the information is needed.
Update: Maybe this is correct and enables task dependency tracking for `getOutputPaths()` and `getMultipleFileOutputPaths()`. I can't find any Gradle docs on this.
Update 2: Found this in the docs:
A provider may represent a task output. Such a provider carries information about the task producing its value. When this provider is attached to an input of another task, Gradle will automatically determine the task dependencies based on this connection.
Now the only question is if this works for a provider obtained with `getProject().provider()`. Another option is to inject a `ProviderFactory` into the task.
Now the only question is if this works for a provider obtained with getProject().provider(). Another option is to inject a ProviderFactory into the task.
Hm... probably not, in that case. I'm guessing we need to reference the inputs.
IIRC `getProject` would also fail if using configuration cache.
See https://docs.gradle.org/current/userguide/configuration_cache.html#config_cache:requirements:use_project_during_execution
Good to know. I think that `@Inject`ing a `ProviderFactory` might be the right solution. Could check task dependency tracking with an automated or manual test.
Thanks for the link. Switching to the property API (e.g. `SetProperty`) works!
LGTM. Two more naming suggestions for your consideration.

@OutputFiles
@Optional
public SetProperty<File> getOutputPaths() {
Would `getEffectiveOutputFiles()` be a better name? Makes clear that it’s related to `getOutputFile()` and not meant to be configured directly.

@OutputDirectories
@Optional
public SetProperty<File> getMultipleFileOutputPaths() {
Is `getEffectiveMultipleFileOutputDirs()` a better name?
public abstract ObjectFactory getObjectFactory();

private CliEvaluator getCliEvaluator() {
    return new CliEvaluator(
Note that this results in multiple `CliEvaluator` instances, each of which recomputes `CliEvaluator.outputFiles` and `CliEvaluator.outputDirectories`.
There was a reason why we originally made `EvalTask` untracked.
Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time. In other words, it is expected that Gradle knows exact values of all input and output properties of a task, in order to do its caching and invalidation.
This, in turn, implies that it is not possible (under the model constraints) to change or otherwise dynamically compute values of properties at the task execution time. As an example, this is not feasible:
class ExampleTask : DefaultTask() {
    @get:Output
    val someProperty: Property<String>

    @TaskAction
    fun doWork() {
        someProperty.set("xyz")
    }
}
To be absolutely precise, this is possible to write, and it will kind of work as expected. However, it will not play nicely with caching and invalidation: Gradle reads input and output property values before it executes the task, builds its task state and cache based on those values, and does not track any changes made to properties during task execution.
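The timing problem can be illustrated without Gradle. The following plain-Java sketch is only a toy model: `SnapshotDemo`, `ToyTask`, and `snapshotThenRun` are hypothetical names, not Gradle APIs, and real Gradle bookkeeping is far more involved. It records the declared value before the action runs, so a value assigned during execution never reaches the recorded state.

```java
import java.util.function.Consumer;

// Toy model only: none of these types are Gradle APIs.
final class SnapshotDemo {
    /** A task with one mutable "output property". */
    static final class ToyTask {
        String output = "initial";
    }

    /**
     * Mimics a build tool that records declared values before execution:
     * the snapshot is taken first, then the action runs.
     */
    static String snapshotThenRun(ToyTask task, Consumer<ToyTask> action) {
        String recorded = task.output; // read before execution
        action.accept(task);           // the action may mutate the property
        return recorded;               // what the cache bookkeeping saw
    }

    public static void main(String[] args) {
        ToyTask task = new ToyTask();
        String recorded = snapshotThenRun(task, t -> t.output = "xyz");
        // prints "recorded=initial, actual=xyz"
        System.out.println("recorded=" + recorded + ", actual=" + task.output);
    }
}
```

The recorded value and the actual value diverge, which is the inconsistency the comment warns about.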
There are tasks in Gradle which may produce files whose names are unknown before execution, like compilation or code generation. However, in these cases these tasks' output properties are not individual files but directories, whose names are statically known.
In `EvalTask` specifically, due to the nature of patterns, I originally assumed that the specific output file and directory names would not be known until evaluation. Additionally, because you often need to produce a file in a specific directory (for example, evaluate a `config.pkl` file into a `config.yml` file which must be present in the same directory), you cannot really declare the parent directory of the output pattern to be an "output directory", because it may interfere with other tasks and in general is not "controlled" by the eval task.
For this reason, I originally made this task untracked: it ensures that cache consistency is not violated and that the task is invalidated at appropriate times.
Given the above, it is only realistically possible to track the outputs of this task in the following cases:
- Computing output paths from a pattern does not, in fact, require Pkl evaluation. Looking at how `CliEvaluator` is used here, it seems like this might be the case. If this is true, then the idiomatic description of this behavior would be different; I will describe it below.
- We are okay with evaluating Pkl files at configuration time. I personally don't think this is a correct approach, and would vote for the task to remain untracked, because the whole point of splitting task configuration and evaluation is to avoid costly computations at configuration time and to have consistent inputs which do not depend on build script evaluation order. Still, if this is the case, I would recommend reimplementing the logic as described below.
So, if we assume scenario 1, the core idea would be to build `CliEvaluator` as a `Provider<CliEvaluator>` derived from other properties in the class, and then derive `Provider<File>` and `FileCollection` from that:
private Provider<CliEvaluator> getCliEvaluator() {
    return getOutputFile().flatMap(outputFile ->
        getOutputFormat().flatMap(outputFormat ->
            ...new CliEvaluator(...)));
}

@OutputFile
public Provider<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file.
    // Note: map() yields a Provider, so the return type is Provider<File>.
    return getCliEvaluator().map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();
{
    outputDirectoriesCollection.from(
        getCliEvaluator().map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}
We can also cache the `CliEvaluator` instance, but this will require some extra code. Following an example here, we can make a utility function:
public static <T> Provider<T> cache(ObjectFactory objects, Class<T> elementType, Provider<T> provider) {
    var property = objects.property(elementType);
    property.value(provider);
    property.disallowChanges();
    property.finalizeValueOnRead();
    return property;
}
Then, define the `CliEvaluator` property like this:
// cache() returns a Provider, so the field is typed as Provider<CliEvaluator>
private final Provider<CliEvaluator> cliEvaluator = cache(
    getObjects(),
    CliEvaluator.class,
    getOutputFile().flatMap(outputFile ->
        getOutputFormat().flatMap(outputFormat ->
            ...new CliEvaluator(...)))
);
and use that as the source of output properties:
@OutputFile
public Provider<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file.
    // Note: map() yields a Provider, so the return type is Provider<File>.
    return cliEvaluator.map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();
{
    outputDirectoriesCollection.from(
        cliEvaluator.map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}
I haven’t tested yet that this works fully correctly (I’m going to do it right now), and I do see that all of this is kind of clunky and more complicated than the original approach, but unfortunately that’s what is required here if we are to follow recommended Gradle practices.
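For readers unfamiliar with the caching idea, here is a plain-Java sketch of "compute once, then reuse". It is not the Gradle `Property` mechanism from the comment above, only an analogy: `MemoDemo` and `memoize` are hypothetical names, and a string stands in for the expensive evaluator object.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Analogy only: a memoizing Supplier, not a Gradle Property.
final class MemoDemo {
    /** Wraps a supplier so the underlying computation runs at most once. */
    static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private boolean computed;
            private T value;

            @Override
            public synchronized T get() {
                if (!computed) {
                    value = delegate.get();
                    computed = true;
                }
                return value;
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger constructions = new AtomicInteger();
        // Stand-in for building an expensive object such as an evaluator.
        Supplier<String> evaluator =
            memoize(() -> "evaluator#" + constructions.incrementAndGet());
        evaluator.get();
        evaluator.get();
        // prints "constructions=1"
        System.out.println("constructions=" + constructions.get());
    }
}
```

Every consumer that reads through the memoized supplier shares one instance, which is the effect the caching utility is meant to achieve for the output properties.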
Actually, I have just remembered another reason why it makes sense to keep the task untracked: Pkl evaluation can depend on external, non-file inputs, like environment variables. Thus an evaluation task can be treated as up to date when it should not be. For example: the user changes an env var and re-runs the task => the Pkl script depends on the env var, but there is no way for Gradle to track it => Pkl is not re-evaluated and old values are still used => the user is confused.
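The stale-result scenario can be shown with a toy up-to-date check in plain Java. This is only a sketch under simplifying assumptions: `UpToDateDemo`, `fingerprint`, and the input names are hypothetical, and a map hash stands in for whatever fingerprinting a real build tool performs.

```java
import java.util.Map;

// Toy model only: a map hash stands in for real input fingerprinting.
final class UpToDateDemo {
    /** Fingerprint built only from inputs the task declares. */
    static int fingerprint(Map<String, String> declaredInputs) {
        return declaredInputs.hashCode();
    }

    /** The task counts as up to date when the fingerprint did not change. */
    static boolean isUpToDate(int previous, Map<String, String> declaredInputs) {
        return previous == fingerprint(declaredInputs);
    }

    public static void main(String[] args) {
        Map<String, String> declared = Map.of("sourceModule", "config.pkl");
        int before = fingerprint(declared);
        // Imagine an environment variable read by the script changes here.
        // It is not part of `declared`, so the check cannot see the change.
        System.out.println(isUpToDate(before, declared)); // prints "true"

        // Once the variable is declared as an input, a change is detected.
        Map<String, String> dev = Map.of("sourceModule", "config.pkl", "env.MODE", "dev");
        Map<String, String> prod = Map.of("sourceModule", "config.pkl", "env.MODE", "prod");
        System.out.println(isUpToDate(fingerprint(dev), prod)); // prints "false"
    }
}
```

The check can only react to values it is given, so any input that bypasses the declared set leaves the old result in place.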
As a side note, I would recommend not making every property `@Optional` by default. For example, properties like `outputFormat` always contain some default value from the spec, thus they should not really be optional.
public SetProperty<File> getOutputPaths() {
    var ret = getObjectFactory().setProperty(File.class);
    ret.value(getCliEvaluator().getOutputFiles()).disallowChanges();
    return ret;
}
Unfortunately this approach has some issues.
Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.
In this case, however, just creating the `SetProperty` instance requires creating the `CliEvaluator` object, which in turn evaluates other properties in the task. This, unfortunately, does not follow Gradle's model of lazy properties.
Here is a limited and artificial example which illustrates the point:
// buildSrc/src/main/java/tasks/EvalTask.java
public abstract class EvalTask extends DefaultTask {
    @Internal
    public abstract RegularFileProperty getOutputFile();

    @Inject
    public abstract ObjectFactory getObjectFactory();

    private Set<File> getCliEvaluatorFiles() {
        return Set.of(getOutputFile().get().getAsFile());
    }

    @OutputFiles
    @Optional
    public SetProperty<File> getOutputPaths() {
        var ret = getObjectFactory().setProperty(File.class);
        ret.value(getCliEvaluatorFiles()).disallowChanges();
        return ret;
    }
}

// build.gradle.kts
import tasks.EvalTask

val evalSomething by tasks.creating(EvalTask::class) {
    // outputFile is not set
}

val copySomewhere by tasks.creating(Copy::class) {
    from(evalSomething.outputPaths)
    into("whatever")
}

evalSomething.run {
    outputFile = layout.projectDirectory.file("settings.gradle.kts")
}
This build will fail at configuration time with an exception like this:
* What went wrong:
Cannot query the value of task ':evalSomething' property 'outputFile' because it has no value available.
This is actually why the example linked in the earlier discussion (on the Gradle forums) does it differently, along these lines:
@OutputFile
abstract RegularFileProperty getOutputFile()

Producer() {
    outputFile.value(getOutputFileName().flatMap(name -> project.layout.buildDirectory.file(name))).disallowChanges()
}
Not only is the actual instance of the property instantiated by Gradle here (via abstract method injection), it is configured with a provider (derived from a property here), not with a directly computed value, like in this PR.
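The difference between wiring a derived value and computing it directly can be sketched in plain Java. This is a toy model, not Gradle code: `LazyWiringDemo` and `ToyProperty` are hypothetical stand-ins for a property that fails when queried while unset.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Toy model only: ToyProperty imitates a property that fails when read while unset.
final class LazyWiringDemo {
    static final class ToyProperty<T> {
        private T value;

        void set(T newValue) {
            value = newValue;
        }

        T get() {
            if (value == null) {
                throw new IllegalStateException("property has no value available");
            }
            return value;
        }

        /** Derives a value lazily: nothing is read until the result is queried. */
        <R> Supplier<R> map(Function<T, R> fn) {
            return () -> fn.apply(get());
        }
    }

    public static void main(String[] args) {
        ToyProperty<String> outputFile = new ToyProperty<>();

        // Lazy wiring: safe even though outputFile is not set yet.
        Supplier<String> effective = outputFile.map(path -> "build/" + path);

        // Eager wiring would call outputFile.get() at this point and throw.

        outputFile.set("config.yml");        // configured later
        System.out.println(effective.get()); // prints "build/config.yml"
    }
}
```

Because the derived value only holds a reference to its source, the order in which the build script sets the source no longer matters.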
Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time
Surely a task can take a `FileTree` as input whose files don't exist at configuration time?
Pkl evaluation can depend on external, non-file inputs, like environment variables.
I don't think that's an issue here because all such inputs are configured through the task (see superclasses of `EvalTask`).
Evaluating Pkl files at configuration time is not OK. But looking at `CliEvaluator`, I think that "computing output paths from a pattern does not, in fact, require Pkl evaluation" is correct.
Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.
Surely it must be possible for a computed input/output property to depend on another property of the same task? I don't see anything expensive here, with the possible exception of `CliEvaluator.outputFiles/outputDirectories`.
Surely a task can take a FileTree as input whose files don't exist at configuration time?
Yes, it most definitely can, but that does not contradict what I'm saying. I might not have expressed it clearly, but essentially what I wanted to say is that in Gradle's model, file collections must be fully configured before task execution time. That does not mean that their contents can't be dynamic: they most certainly can be, like class files produced by compilation tasks, and they can be absent before some task is executed. It means that their configuration can't be dynamic, i.e. you cannot add new sources to a file collection or a file tree at task execution time.
When I was implementing the first version of the evaluation task, I assumed that it was not possible to determine exact file names and directory names from patterns without evaluation, i.e. without executing the task logic first. Because patterns are flexible, it would have been impossible to statically configure a `FileCollection` representing output files/directories, and therefore I had to make the tasks untracked.
Since it is apparently possible to resolve patterns and determine outputs without running evaluation first, it should be possible to determine the full set of output files/directories at configuration time, and therefore we can make the task tracked.
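To make "resolving patterns without evaluation" concrete, here is a plain-Java sketch of placeholder substitution. It is an assumption-laden illustration, not the `CliEvaluator` logic: the placeholder names `%{moduleDir}`, `%{moduleName}`, and `%{outputFormat}` are assumed for the example (the thread only says that output paths contain placeholders), and `PlaceholderDemo.resolve` is a hypothetical helper.

```java
import java.nio.file.Path;

// Illustration only: placeholder names and resolution rules are assumptions.
final class PlaceholderDemo {
    /** Substitutes placeholders using only the source path and the chosen format. */
    static String resolve(String pattern, Path sourceModule, String outputFormat) {
        String fileName = sourceModule.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String moduleName = dot > 0 ? fileName.substring(0, dot) : fileName;
        Path parent = sourceModule.getParent();
        String moduleDir = parent == null ? "." : parent.toString();
        return pattern
            .replace("%{moduleDir}", moduleDir)
            .replace("%{moduleName}", moduleName)
            .replace("%{outputFormat}", outputFormat);
    }

    public static void main(String[] args) {
        String out = resolve(
            "%{moduleDir}/%{moduleName}.%{outputFormat}",
            Path.of("src", "config.pkl"),
            "yml");
        System.out.println(out); // prints "src/config.yml"
    }
}
```

Everything the substitution needs is already known from the task's configured inputs, which is why no module has to be evaluated to list the outputs.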
I don't think that's an issue here because all such inputs are configured through the task (see superclasses of EvalTask).
Fair point, forgot that these items were there. Yes, in this case evaluation should be self-contained and thus cached.
Btw, let me correct myself - it seems I forgot the code base a bit - of course there can be multiple output files with a single output pattern; then the logic would look like this:
@OutputFiles
@Optional
public Provider<Set<File>> getEffectiveOutputFilePaths() {
    return cliEvaluator.map(CliEvaluator::getOutputFiles);
}

@OutputDirectories
@Optional
public Provider<Set<File>> getOutputDirectories() {
    return cliEvaluator.map(CliEvaluator::getOutputDirectories);
}
LGTM!
1db413a
to
ca653da
Compare
@@ -501,7 +501,7 @@ class EvaluatorsTest : AbstractTest() {
  }

  val printEvalFiles by tasks.registering {
-   inputs.files(doEval)
+   inputs.files(doEval.map { it.effectiveOutputFiles })
This should not be necessary - it is idiomatic to depend on a task rather than on a specific output property, and also it is not really possible for these properties to be non-empty at the same time, so there is no ambiguity.
Fair; will undo this change
    return getObjects()
        .fileCollection()
        .from(cliEvaluator.map(e -> nullToEmpty(e.getOutputFiles())));
  }

  @OutputDirectories
  @Optional
- public FileCollection getOutputDirectories() {
+ public FileCollection getEffectiveOutputDirectories() {
Minor - I believe in all of the plugins we use `getSomeDir()` for directory properties, not `getSomeDirectory()`, so it might make sense to name this `getEffectiveOutputDirs()`. WDYT?
Non-blocking, but I agree with @netvl.
3b93eb6
to
c6d41e6
Compare
This reverts commit 3c6df1f. The reason that this task is untracked is because the output file and multiple file output dir both accept placeholder values, which get swapped out with actual values during eval.
Also added more tests for task caching behavior.
c6d41e6
to
a4e8765
Compare
This fixes an issue where absolute `File` objects in Gradle on Windows turn into an invalid URI. Also, renames `getEffectiveOutputDirectories()` to `getEffectiveOutputDirs()`.
a4e8765
to
25068dd
Compare
This is a port of a fix that was included in apple#403.
This is a port of a fix that was included in #403.
This makes Gradle track `getOutputPaths()` and `getMultipleFileOutputPaths()`. These represent the actual outputs of the EvalTask.