Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IncrementalValuesProvider.GroupBy #72667

Open
eiriktsarpalis opened this issue Mar 22, 2024 · 12 comments
Open

IncrementalValuesProvider.GroupBy #72667

eiriktsarpalis opened this issue Mar 22, 2024 · 12 comments
Labels
Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Milestone

Comments

@eiriktsarpalis
Copy link
Member

Background and Motivation

I've been responding to this user report about an issue with the STJ source generator where adding JsonSerializable allocations in two separate declarations of the same partial context class will result in the generator emitting code twice for the same type, resulting in conflicts.

While this doesn't appear to be a common enough scenario, it made me think that there's no real way in which we could group elements of an IncrementalValuesProvider<T> by a given key without use of Collect() which typically forces re-evaluation of every element in the incremental collection. It seems that a GroupBy-like combinator which only triggers downstream evaluation for keys that have been updated might be necessary, although I couldn't find a good way to obtain this with the existing APIs.

Proposed API

namespace Microsoft.CodeAnalysis;

public static partial class IncrementalValueProviderExtensions
{
    public static IncrementalValuesProvider<(TKey Key, ImmutableArray<TSource> Values)> GroupBy<TSource, TKey>(this IncrementalValuesProvider<TSource> source, Func<TSource, TKey> keySelector, IEqualityComparer<TKey>? keyComparer = null);
}

Usage Examples

IncrementalGeneratorInitializationContext context = ...;
IncrementalValuesProvider<ModelForType> models = context.SyntaxProvider
    .ForAttributeWithMetadataName("FooAttribute", (node, _) => node is ClassDeclarationSyntax, (ctx, _) => ctx)
    .GroupBy(ctx => ctx.TargetSymbol, SymbolEqualityComparer.Default)
    .Select(GenerateModelForType);
@eiriktsarpalis eiriktsarpalis added Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request labels Mar 22, 2024
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Mar 22, 2024
@jaredpar jaredpar added this to the Backlog milestone Apr 1, 2024
@jamescarterbellMSFT
Copy link

Am I correct in understanding that calling .Collect() makes an IncrementalGenerator non-cache friendly? I think I may be running into this issue when I collect to do some deduping.

@sharwell
Copy link
Member

sharwell commented Apr 9, 2024

You can call Collect() and then split the items back apart using SelectMany(), and the only part that runs each time is the operation within the Collect(). Here's an example:
https://github.com/dotnet/roslyn-analyzers/blob/7455f63411369a962f769361d1a979a547756ada/src/Microsoft.CodeAnalysis.ResxSourceGenerator/Microsoft.CodeAnalysis.ResxSourceGenerator/AbstractResxGenerator.cs#L131-L175

GroupBy could be efficiently implemented by using Collect() followed by SelectMany, where each element of the SelectMany is a group.

@jamescarterbellMSFT
Copy link

So provider.Collect().Select() that outputs an immutable array provider would not be cacheable, but a provider.Collect().SelectMany() would produce something cacheable?

@sharwell
Copy link
Member

sharwell commented Apr 9, 2024

Incremental generators include many steps in the graph. Each step is cached depending on its own inputs and outputs. Most changes to a project result in at least one step somewhere running. As best I can tell, the proposed GroupBy operation would not inherently make anything cacheable that is not already possible to cache today using Collect and SelectMany. It's also possible that Collect().Select() has good caching characteristics. The link I gave above includes a step with that form.

@CyrusNajmabadi
Copy link
Member

.Collect is fine to call, you just need to provide a comparer to properly compare the arrays it has inside of it. Note: i consider it bad that this is needed, but c'est la vie for now. But as long as you give an appropriate comparer, things will be incrementally safe and sound.

@CyrusNajmabadi
Copy link
Member

@chsienki I think we really need to invest in:

  1. a comparer for comparing ImmutableArrays.
  2. having anything we produce that returns a IncrementalValueProvider<ImmutableArray<TSource>> use teh comparer.

There's really no reason at all to have no comparer here, and it means a collect call will break incrementality unless a user knows to go fix this up.

@sharwell
Copy link
Member

sharwell commented Apr 9, 2024

@CyrusNajmabadi I'm not sure that aspect is a problem today. I've seen many source generators opt for EquatableArray<T> instead of ImmutableArray<T> in their data models. It's not clear how the current behavior of Collect() would have any negative impact on this.

Providing a default comparer for ImmutableArray<T> would not address cases where EquatableArray<T> is needed inside of record types, and if you already use EquatableArray<T> in the model it's not clear why ImmutableArray<T> would be needed separately.

@CyrusNajmabadi
Copy link
Member

@sharwell Collect doesn't use that though, ti's just:

        public static IncrementalValueProvider<ImmutableArray<TSource>> Collect<TSource>(this IncrementalValuesProvider<TSource> source) => new IncrementalValueProvider<ImmutableArray<TSource>>(new BatchNode<TSource>(source.Node), source.CatchAnalyzerExceptions);

So you get an IVP (not an IVsP) point at an ImmutableArray without a proper comparer for the immutable array itself. .Collect doesn't let the user provide the final array the values are collected into. It chooses that it is IA<>, and you need to know to provide an appropriate comparer if you want proper incrementality at that pipeline step.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Apr 10, 2024

You can call Collect() and then split the items back apart using SelectMany(), and the only part that runs each time is the operation within the Collect(). Here's an example:

So if I'm understanding the semantics of SelectMany correctly, it is forking one IVP into an IVsP of independently cached values? I think that could work for my use case.

@sharwell
Copy link
Member

.Collect doesn't let the user provide the final array the values are collected into

That should be fine. If none of the inputs to the Collect operation changed, Collect should be reusing the old cached ImmutableArray<> output instead of creating a new one. The only time Collect produces a new ImmutableArray<> to compare with the old cached value is when one or more elements in that array no longer compare element-wise equal with the old values.

and you need to know to provide an appropriate comparer if you want proper incrementality at that pipeline step

There is no public API to alter the comparer for Collect, but that's fine because the default comparer is correct for the semantics of this operation.

@CyrusNajmabadi
Copy link
Member

There is. You call . WithComparer afterwards

@CyrusNajmabadi
Copy link
Member

The only time Collect produces a new ImmutableArray<> to compare with the old cached value is when one or more elements in that array no longer compare element-wise equal with the old values.

Right. That's the exact problem. You get a different input, but you end up creating conceptually the same output (which is very very possible). As far as the driver is concerned, because it's a new IA, it is different, and it will not stop the incremental pipeline.

@jcouv jcouv removed the untriaged Issues and PRs which have not yet been triaged by a lead label Apr 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Projects
None yet
Development

No branches or pull requests

6 participants