Make EvalTask track resolved output paths #403

bioball · 2024-04-05T17:25:33Z

This makes Gradle track getOutputPaths() and getMultipleFileOutputPaths. These represent the actual outputs of the EvalTask.

StefMa · 2024-04-05T17:28:16Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

 import org.pkl.cli.CliEvaluator;
 import org.pkl.cli.CliEvaluatorOptions;

+@UntrackedTask(because = "Output file names are known only after execution")


Maybe explain it a bit better here as well (that the output files contain placeholders)? 🤔

StefMa

LGTM 👌

translatenix · 2024-04-05T17:55:57Z

I see no indication that EvalTask supports placeholders. It only accepts RegularFileProperty getOutputFile() and DirectoryProperty getMultipleFileOutputDir(), which clearly shouldn't contain placeholders.

StefMa · 2024-04-05T18:01:22Z

You can read it up here:
https://pkl-lang.org/main/current/pkl-gradle/index.html#configuration-options

So yes it's a bit confusing that you can put a file into it that can contain a placeholder that doesn't work with Gradle because the file location is unknown at that point.

translatenix · 2024-04-05T18:05:08Z

This is a design mistake in EvalTask. Should try to fix that instead of making the task untracked.

StefMa · 2024-04-05T18:13:14Z

This is a design mistake in EvalTask. Should try to fix that instead of making the task untracked.

If we would use a normal String as input it would work right? 🤔

translatenix · 2024-04-05T18:28:34Z

I think EvalTask could have separate Property<String> properties and invalidate outputs if those are set.
But I see that CliEvaluator has outputFiles and outputDirectories properties, which might be all that's needed for accurate tracking.

bioball · 2024-04-05T20:36:42Z

But I see that CliEvaluator has outputFiles and outputDirectories properties, which might be all that's needed for accurate tracking.

Great idea! Updated the PR to add new properties that are tracked.

translatenix

Left two comments. Otherwise LGTM.

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

translatenix · 2024-04-05T21:47:59Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

  public abstract DirectoryProperty getMultipleFileOutputDir();

  @Input
  @Optional
  public abstract Property<String> getExpression();

+  @Internal
+  public Provider<CliEvaluator> getCliEvaluator() {


I don't think Provider helps for this and the following methods, as they are only called if and when the information is needed.

Update: Maybe this is correct and enables task dependency tracking for getOutputPaths() and getMultipleFileOutputPaths(). I can't find any Gradle docs on this.

Update 2: Found this in the docs:

A provider may represent a task output. Such a provider carries information about the task producing its value. When this provider is attached to an input of another task, Gradle will automatically determine the task dependencies based on this connection.

Now the only question is if this works for a provider obtained with getProject().provider(). Another option is to inject a ProviderFactory into the task.

Now the only question is if this works for a provider obtained with getProject().provider(). Another option is to inject a ProviderFactory into the task.

Hm... probably not, in that case. I'm guessing we need to reference the inputs.

IIRC getProject would also fail if using configuration cache.
See https://docs.gradle.org/current/userguide/configuration_cache.html#config_cache:requirements:use_project_during_execution

Good to know. I think that @Injecting a ProviderFactory might be the right solution. Could check task dependency tracking with an automated or manual test.

Related: https://discuss.gradle.org/t/auto-create-task-dependecies-in-case-of-computed-outputfile-provider-regularfile/45381

Thanks for the link. Switching to the property API (e.g. SetProperty) works!

translatenix

LGTM. Two more naming suggestions for your consideration.

translatenix · 2024-04-08T22:47:34Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+
+  @OutputFiles
+  @Optional
+  public SetProperty<File> getOutputPaths() {


Would getEffectiveOutputFiles() be a better name? Makes clear that it’s related to getOutputFile() and not meant to be configured directly.

translatenix · 2024-04-08T22:48:55Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+
+  @OutputDirectories
+  @Optional
+  public SetProperty<File> getMultipleFileOutputPaths() {


Is getEffectiveMultipleFileOutputDirs() a better name?

translatenix · 2024-04-08T23:18:21Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+  public abstract ObjectFactory getObjectFactory();
+
+  private CliEvaluator getCliEvaluator() {
+    return new CliEvaluator(


Note that this results in multiple CliEvaluator instances, each of which recomputes CliEvaluator.outputFiles and CliEvaluator.outputDirectories.

netvl

There was a reason why we originally made EvalTask untracked.

Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time. In other words, it is expected that Gradle knows exact values of all input and output properties of a task, in order to do its caching and invalidation.

This, in turn, implies that it is not possible (under the model constraints) to change or otherwise dynamically compute values of properties at the task execution time. As an example, this is not feasible:

class ExampleTask : DefaultTask() {
    @get:Output
    val someProperty: Property<String>

    @TaskAction
    fun doWork() {
        someProperty.set("xyz")
    }
}

To be absolutely precise, this is possible to write, and it will kind of work as expected, however, it will not play nicely with caching and invalidation, because Gradle will read input and output property values before it executes the task and build its task state and cache based on those values, and it will not track any changes happening to properties during the task execution.

There are tasks in Gradle which may produce files whose names are unknown before execution, like compilation or code generation. However, in these cases these tasks' output properties are not individual files but directories, whose names are statically known.

In the EvalTask specifically, due to the nature of patterns, I originally assumed that the specific output file and directory names will not be known until evaluation. Additionally, because often you need to produce a file into a specific directory (for example, evaluate a config.pkl file into a config.yml file which must be present in the same directory), you cannot really declare the parent directory of the output pattern to be an "output directory", because it may interfere with other tasks and in general is not "controlled" by the eval task.

Because of this, I originally made this task untracked, because it ensures that the cache consistency is not violated, and that the task is invalidated at appropriate times.

Given the above, it is only realistically possible to track the outputs of this task in the following cases:

Computing output paths from a pattern does not, in fact, require Pkl evaluation. Looking at how CliEvaluator is used here, it seems like it might be the case. If this is true, then idiomatic description of this behavior would be different - I will describe it below.
We are okay with evaluating Pkl files at configuration time. I personally don't think this is a correct approach, and would vote for the task to remain untracked, because the whole point of splitting task configuration and evaluation is to avoid costly computations at configuration time and to have consistent inputs which do not depend on build script evaluation order. Still, if this is the case, I would still recommend to reimplement the logic as described below.

So, if we assume scenario 1, the core idea would be to build CliEvaluator as Provider<CliEvaluator> derived from other properties in the class, and then derive Provider<File> and FileCollection from that:

private Provider<CliEvaluator> getCliEvaluator() {
    return getOutputFile().flatMap(outputFile ->
      getOutputFormat().flatMap(outputFormat ->
        ...new CliEvaluator(...)));
}

@OutputFile
public Property<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file
    return getCliEvaluator().map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();

{
    outputDirectoriesCollection.from(
        getCliEvaluator().map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}

We can also cache the CliEvaluator instance, but this will require some extra code. Following an example in here, we can make a utility function:

public static <T> Provider<T> cache(ObjectFactory objects, Class<T> elementType, Provider<T> provider) {
    var property = objects.property(elementType);
    property.value(provider);
    property.disallowChanges();
    property.finalizeValueOnRead();
    return property;
}

Then, define the CliEvaluator property like this:

private final Property<CliEvaluator> cliEvaluator = cache(
    getObjects(),
    CliEvaluator.class,
    getOutputFile().flatMap(outputFile ->
      getOutputFormat().flatMap(outputFormat ->
        ...new CliEvaluator(...)));
);

and use that as the source of output properties:

@OutputFile
public Property<File> getOutputFilePath() {
    // This logic should be changed to return only one file properly,
    // because there can be only one file
    return cliEvaluator.map(evaluator ->
        evaluator.getOutputFiles().iterator().next()
    );
}

@OutputDirectories
public FileCollection getOutputDirectories() {
    return outputDirectoriesCollection;
}

private final ConfigurableFileCollection outputDirectoriesCollection = getProject().files();

{
    outputDirectoriesCollection.from(
        cliEvaluator.map(evaluator ->
            evaluator.getOutputDirectories()
        )
    );
}

I haven’t tested yet that this works fully correctly (I’m going to do it right now), and I do see that all of this is kind of clunky and more complicated than the original approach, but unfortunately that’s what is required here if we are to follow recommended Gradle practices.

Actually I have just remembered another reason why it makes sense to keep the task untracked — Pkl evaluation can depend on external, non-file inputs, like environment variables. Thus it is possible for an evaluation task to become cached when it should not be, for example: the user changed an env var and re-run the task => Pkl script depends on the env var but there is no way for Gradle to track it => Pkl is not re-evaluated, old values are still used => the user is confused.

As a side note — I would recommend not making every property @Optional by default. For example, properties like outputFormat always contain some default value from the spec, thus they should not really be optional.

netvl · 2024-04-09T00:19:19Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

+  public SetProperty<File> getOutputPaths() {
+    var ret = getObjectFactory().setProperty(File.class);
+    ret.value(getCliEvaluator().getOutputFiles()).disallowChanges();
+    return ret;
+  }


Unfortunately this approach has some issues.

Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.

In this case, however, just creating the SetProperty instance requires creating the CliEvaluator object, which in turn evaluates other properties in the task. This is, unfortunately, does not follow Gradle's model of lazy properties properly.

Here is a limited and artificial example which illustrates the point:

// buildSrc/src/main/java/tasks/EvalTask.java public abstract class EvalTask extends DefaultTask { @Internal public abstract RegularFileProperty getOutputFile(); @Inject public abstract ObjectFactory getObjectFactory(); private Set<File> getCliEvaluatorFiles() { return Set.of(getOutputFile().get().getAsFile()); } @OutputFiles @Optional public SetProperty<File> getOutputPaths() { var ret = getObjectFactory().setProperty(File.class); ret.value(getCliEvaluatorFiles()).disallowChanges(); return ret; } }

// build.gradle.kts import tasks.EvalTask val evalSomething by tasks.creating(EvalTask::class) { // outputFile is not set } val copySomewhere by tasks.creating(Copy::class) { from(evalSomething.outputPaths) into("whatever") } evalSomething.run { outputFile = layout.projectDirectory.file("settings.gradle.kts") }

This build will fail at configuration time with an exception like this:

* What went wrong: Cannot query the value of task ':evalSomething' property 'outputFile' because it has no value available.

This is actually the reason why in that link that was submitted in the earlier discussion (on Gradle forums) it was done differently, along these lines:

@OutputFile abstract RegularFileProperty getOutputFile() Producer() { outputFile.value(getOutputFileName().flatMap(name -> project.layout.buildDirectory.file(name))).disallowChanges() }

Not only is the actual instance of the property instantiated by Gradle here (via abstract method injection), it is configured with a provider (derived from a property here), not with a directly computed value, like in this PR.

Unfortunately, Gradle's model of input and output properties and their caching expects that all properties have statically known values after configuration time

Surely a task can take a FileTree as input whose files don't exist at configuration time?

Pkl evaluation can depend on external, non-file inputs, like environment variables.

I don't think that's an issue here because all such inputs are configured through the task (see superclasses of EvalTask).

Evaluating Pkl files at configuration time is not OK. But looking at CliEvaluator, I think that "computing output paths from a pattern does not, in fact, require Pkl evaluation" is correct.

Gradle lazy property objects can be accessed at any time, including at the configuration phase. Therefore, their creation must not be predicated on any complex logic, and especially not on values of other lazy properties.

Surely it must be possible for a computed input/output property to depend on another property of the same task? I don't see anything expensive here, with the possible exception of CliEvaluator.outputFiles/outputDirectories.

Surely a task can take a FileTree as input whose files don't exist at configuration time?

Yes, it most definitely can, but it does not contradict what I'm saying. I might not have expressed it fully clearly, but essentially what I wanted to say is that in Gradle model, file collections must be fully configured before the task execution time. It does not mean that their contents can't be dynamic - they most certainly can, like class files produced by compilation tasks, and they can be absent before some task is executed - but that their configuration can't be dynamic, i.e. you cannot add new sources to a file collection or a file tree at task execution time.

When I was implementing the first version of the evaluation task, I assumed that it is not possible to determine exact file names and directory names from patterns without evaluation, i.e. without executing the task logic first. Because patterns are flexible, it would've been impossible to statically configure a FileCollection representing output files/directories, and therefore I had to make the tasks untracked.

Since apparently it is possible to resolve patterns and determine outputs without running evaluation first, therefore, it should be possible to determine the full set of output files/directories at configuration time, and therefore we can make the task tracked.

I don't think that's an issue here because all such inputs are configured through the task (see superclasses of EvalTask).

Fair point, forgot that these items were there. Yes, in this case evaluation should be self-contained and thus cached.

netvl · 2024-04-09T03:50:52Z

Btw, let me correct myself - it seems I forgot the code base a bit - of course there can be multiple output files with single output pattern; then the logic would look like this:

  @OutputFiles
  @Optional
  public Provider<Set<File>> getEffectiveOutputFilePaths() {
      return cliEvaluator.map(CliEvaluator::getOutputFiles);
  }
  
  @OutputDirectories
  @Optional
  public Provider<Set<File>> getOutputDirectories() {
      return cliEvaluator.map(CliEvaluator::getOutputDirectories);
  }

I also switched to Provider<Set<File>> instead of FileCollection, because unfortunately file collections are more cumbersome to create than providers, especially if empty providers is a possibility. Not true in the latest version of the PR, FileCollections are all right here.

netvl

LGTM!

netvl · 2024-06-29T00:33:04Z

pkl-gradle/src/test/kotlin/org/pkl/gradle/EvaluatorsTest.kt

@@ -501,7 +501,7 @@ class EvaluatorsTest : AbstractTest() {
      }

      val printEvalFiles by tasks.registering {
-        inputs.files(doEval)
+        inputs.files(doEval.map { it.effectiveOutputFiles })


This should not be necessary - it is idiomatic to depend on a task rather than on a specific output property, and also it is not really possible for these properties to be non-empty at the same time, so there is no ambiguity.

Fair; will undo this change

netvl · 2024-06-29T00:34:23Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

    return getObjects()
        .fileCollection()
        .from(cliEvaluator.map(e -> nullToEmpty(e.getOutputFiles())));
  }

  @OutputDirectories
  @Optional
-  public FileCollection getOutputDirectories() {
+  public FileCollection getEffectiveOutputDirectories() {


Minor - I believe in all of the plugins we use getSomeDir() for directory properties, not getSomeDirectory(), so it might make sense to name this getEffectiveOutputDirs(). WDYT?

Non-blocking, but I agree with @netvl.

holzensp · 2024-07-18T10:38:27Z

pkl-gradle/src/main/java/org/pkl/gradle/task/EvalTask.java

    return getObjects()
        .fileCollection()
        .from(cliEvaluator.map(e -> nullToEmpty(e.getOutputFiles())));
  }

  @OutputDirectories
  @Optional
-  public FileCollection getOutputDirectories() {
+  public FileCollection getEffectiveOutputDirectories() {


Non-blocking, but I agree with @netvl.

This reverts commit 3c6df1f. The reason that this task is untracked is because the output file and multiple file output dir both accept placeholder values, which get swapped out with actual values during eval.

Also added more tests for task caching behavior.

This fixes an issue where absolute `File` objects in Gradle on Windows turn into an invalid URI. Also, renames `getEffectiveOutputDirectories()` to `getEffectiveOutputDirs()`.

This is a port of a fix that was included in apple#403.

This is a port of a fix that was included in #403.

StefMa reviewed Apr 5, 2024

View reviewed changes

StefMa approved these changes Apr 5, 2024

View reviewed changes

holzensp approved these changes Apr 5, 2024

View reviewed changes

bioball changed the title ~~Revert "Turn EvalTask into a tracked Gradle task"~~ Make EvalTask track resolved output paths Apr 5, 2024

translatenix reviewed Apr 5, 2024

View reviewed changes

translatenix reviewed Apr 8, 2024

View reviewed changes

netvl reviewed Apr 9, 2024

View reviewed changes

bioball requested a review from holzensp June 29, 2024 00:09

netvl approved these changes Jun 29, 2024

View reviewed changes

bioball force-pushed the revert-tracked-eval branch from 1db413a to ca653da Compare June 29, 2024 00:10

netvl reviewed Jun 29, 2024

View reviewed changes

stackoverflow approved these changes Jul 18, 2024

View reviewed changes

holzensp approved these changes Jul 18, 2024

View reviewed changes

bioball force-pushed the revert-tracked-eval branch from 3b93eb6 to c6d41e6 Compare July 24, 2024 22:04

bioball added 7 commits July 24, 2024 15:07

Revert "Turn EvalTask into a tracked Gradle task"

929d882

This reverts commit 3c6df1f. The reason that this task is untracked is because the output file and multiple file output dir both accept placeholder values, which get swapped out with actual values during eval.

Update comment with detailed explanation

0ca03a7

Make EvalTask tracked again, and add derived output paths

55de0ae

Make getOutputFormat() a tracked input

70cfff7

Create a provider via provider factory

063a2c6

Use SetProperty to enable proper dependency tracking

1becaec

Style fix

a208562

netvl and others added 3 commits July 24, 2024 15:08

Reimplemented tracking of output files in EvalTask

b027056

Also added more tests for task caching behavior.

Cleanup: rename output properties

60c8043

Fix merge conflict

6ce42b5

bioball force-pushed the revert-tracked-eval branch from c6d41e6 to a4e8765 Compare July 24, 2024 22:08

Fix loading files on Windows and rename Gradle task property

25068dd

This fixes an issue where absolute `File` objects in Gradle on Windows turn into an invalid URI. Also, renames `getEffectiveOutputDirectories()` to `getEffectiveOutputDirs()`.

bioball force-pushed the revert-tracked-eval branch from a4e8765 to 25068dd Compare July 26, 2024 00:29

bioball merged commit 604b042 into apple:main Jul 26, 2024
5 checks passed

bioball deleted the revert-tracked-eval branch July 26, 2024 00:41

bioball mentioned this pull request Aug 1, 2024

Gradle Plugin Fails on Windows #608

Closed

bioball added a commit to bioball/pkl that referenced this pull request Aug 5, 2024

Fix usage of file() notation with Pkl Gradle plugin on Windows

f958396

This is a port of a fix that was included in apple#403.

bioball mentioned this pull request Aug 5, 2024

Fix usage of file() notation with Pkl Gradle plugin on Windows #611

Merged

bioball added a commit that referenced this pull request Aug 6, 2024

Fix usage of file() notation with Pkl Gradle plugin on Windows (#611)

f9a3fc8

This is a port of a fix that was included in #403.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make EvalTask track resolved output paths #403

Make EvalTask track resolved output paths #403

bioball commented Apr 5, 2024 •

edited

Loading

StefMa Apr 5, 2024

bioball Apr 5, 2024

StefMa left a comment

translatenix commented Apr 5, 2024

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024 •

edited

Loading

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024

bioball commented Apr 5, 2024

translatenix left a comment

translatenix Apr 5, 2024 •

edited

Loading

bioball Apr 6, 2024

StefMa Apr 6, 2024

translatenix Apr 6, 2024

translatenix Apr 6, 2024

bioball Apr 8, 2024

translatenix left a comment

translatenix Apr 8, 2024 •

edited

Loading

translatenix Apr 8, 2024

translatenix Apr 8, 2024

netvl left a comment

netvl Apr 9, 2024

translatenix Apr 9, 2024 •

edited

Loading

netvl Apr 9, 2024

netvl commented Apr 9, 2024 •

edited

Loading

netvl left a comment

netvl Jun 29, 2024

bioball Jul 24, 2024

netvl Jun 29, 2024

holzensp Jul 18, 2024

holzensp Jul 18, 2024

Make EvalTask track resolved output paths #403

Make EvalTask track resolved output paths #403

Conversation

bioball commented Apr 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

StefMa left a comment

Choose a reason for hiding this comment

translatenix commented Apr 5, 2024

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024 • edited Loading

StefMa commented Apr 5, 2024

translatenix commented Apr 5, 2024

bioball commented Apr 5, 2024

translatenix left a comment

Choose a reason for hiding this comment

translatenix Apr 5, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

translatenix left a comment

Choose a reason for hiding this comment

translatenix Apr 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

netvl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

translatenix Apr 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

netvl commented Apr 9, 2024 • edited Loading

netvl left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bioball commented Apr 5, 2024 •

edited

Loading

translatenix commented Apr 5, 2024 •

edited

Loading

translatenix Apr 5, 2024 •

edited

Loading

translatenix Apr 8, 2024 •

edited

Loading

translatenix Apr 9, 2024 •

edited

Loading

netvl commented Apr 9, 2024 •

edited

Loading