Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Combine overload for IncrementalValuesProvider<T> #58127

Open
Sergio0694 opened this issue Dec 6, 2021 · 5 comments
Open

Proposal: Combine overload for IncrementalValuesProvider<T> #58127

Sergio0694 opened this issue Dec 6, 2021 · 5 comments
Assignees
Labels
Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Milestone

Comments

@Sergio0694
Copy link
Contributor

Sergio0694 commented Dec 6, 2021

Background and Motivation

I find myself often ending up in a situation where I'd like to combine two IncrementalValuesProvider<T> instances, essentially "zipping" them. There doesn't seem to be an API for doing this though, as the existing Combine methods only accept one of the left/right values being an IncrementalValueProvider<T> instance. Consider the following simplified scenario:

IncrementalValuesProvider<INamedTypeSymbol> symbols = context.SyntaxProvider.CreateSyntaxProvider(...);

IncrementalValuesProvider<string> left = symbols.Select(static (item, token) => GatherInfoA(item));
IncrementalValuesProvider<string> right = symbols.Select(static (item, token) => GatherInfoB(item));

context.RegisterSourceOutput(left, static (context, item) => { });

// This doesn't compile: no matching overload. I'd like to zip left and right together
// here as I need to access matching items from both when generating code. I don't want
// to have to recompute the information in left again in this right pipeline subtree.
context.RegisterSourceOutput(right.Combine(left), static (context, item) => { });

The rationale here is that:

  • The intermediate information in left is used on its own in a first source production node
  • That same information is also needed in the source production node taking right
  • I would like not to have to call GatherInfoA() again for each item in right, as I already have that info
  • Additionally calling GatherInfo_() might be expensive, so I really just want to reuse the result I have

Proposed API

namespace Microsoft.CodeAnalysis
{
    public static class IncrementalValueProviderExtensions
    {
+        public static IncrementalValuesProvider<(TLeft Left, TRight Right)> Combine<TLeft, TRight>(this IncrementalValuesProvider<TLeft> provider1, IncrementalValuesProvider<TRight> provider2);
    }
}

Alternative solutions

Consider this scenario:

- SOURCE
   |
   | - Data A ---> Output
   | ---|--- Data B ---> Output
        |      |
        |------|--- Data C ---> Output

One possible workaround doable today is to do something like this:

dataA
.Collect()
.Combine(dataB.Collect())
.SelectMany(static (item, token) =>
    item.Left.Zip(item.Right, static (Left, Right) => (Left, Right)));

Which does yield back an IncrementalValuesProvider<(A, B)> sequence, but this doesn't seem efficient at all. The fact I'm doing Collect() on both means that every time a single item in the sequences is removed/added/updated, the entire collection will be reevaluated, instead of just that one item. What I'd like instead is to just have individual items that are changed to be queried for reevaluation, with the guarantee that if both source sequences have no incompatible filters on them (that is, either they have no Where calls, or if they do, they have one that applies the same filtering on both sequences), then I'll just get asked to recompute a single pair of items in this resulting values provider combining the two.

Notes

In order for this to work, Roslyn needs to guarantee that items in the same position across different IncrementalValuesProvider<T> instance will match and refer to the same source item. As in, this will only work if Roslyn can guarantee that transformations on the values providers are "stable": the two input sources will always have the same number of items when processed (if the user hasn't messed up filtering) and that items will not be reordered in just one of the two providers. That is, if source item A is used to produce B and C in the transformed producers left and right, then calling Combine on them should guarantee that each resulting pair will correctly associate items B and C for each original source item A used to produce them.

cc. @sharwell and @jkoritzinsky who will involved in this conversation on Discord 🙂

@Sergio0694 Sergio0694 added Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request labels Dec 6, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Language Design untriaged Issues and PRs which have not yet been triaged by a lead labels Dec 6, 2021
@sharwell
Copy link
Member

sharwell commented Dec 6, 2021

I don't like the name Combine for this. It suggests the output will be combinations of the inputs, but it is not.

@Sergio0694
Copy link
Contributor Author

Would Zip work? I mean that's the same name LINQ uses for this operation, and other extensions for IncrementalValueProvider<TValue> and IncrementalValuesProviders<TValues> are mirroring the names of LINQ APIs as well.

@sharwell
Copy link
Member

sharwell commented Dec 6, 2021

Potentially yes. Note that Razor identified a use for true Combine semantics, by having one input be zero-or-one with zero meaning "disable the source generator" and one item meaning "enable the source generator". This eliminates a large amount of boolean checks, but perhaps isn't easy to understand that IncrementalValuesProvider<T> is being used as a IncrementalValueProvider<Optional<T>>.

@jaredpar jaredpar removed the untriaged Issues and PRs which have not yet been triaged by a lead label Dec 14, 2021
@jaredpar jaredpar added this to the 17.2 milestone Dec 14, 2021
@chsienki
Copy link
Contributor

chsienki commented Dec 15, 2021

We deliberately left out the Combine which performs a cross join when designing the APIs. It is possible to manually perform one with the APIs today however:

IncrementalValuesProvider<INamedTypeSymbol> symbols = context.SyntaxProvider.CreateSyntaxProvider(...);

IncrementalValuesProvider<string> left = symbols.Select(static (item, token) => GatherInfoA(item));
IncrementalValuesProvider<string> right = symbols.Select(static (item, token) => GatherInfoB(item));

var crossJoin = left.Combine(right.Collect()).SelectMany(static (pair, _) => pair.right.Select(static rightItem => (pair.left, rightItem));

This isn't particularly inefficient. The gather's are called only as needed, and the collect() will be considered cached if all items in it are. When the right hand side changes, the select many will always be called, but the resulting tuples will essentially be 'cached out' in that most of them produced won't be modified so no downstream nodes will be executed for them. Given that the SelectMany is cheap (comparatively) to run, you shouldn't see any perf downsides to doing it this way.

@Sergio0694
Copy link
Contributor Author

Wait, I'm confused. While the name would be the same, this doesn't seem to be the semantics I'm proposing in this issue. I don't want a cross-join (MxN items), I want a zip of two sequences of equal length. I do agree with @sharwell that maybe a different name (eg. Zip) would be better, if Combine would instead suggest a cross-join behavior like the one you mentioned 🙂

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Concept-API This issue involves adding, removing, clarification, or modification of an API. Feature Request
Projects
Status: No status
Development

No branches or pull requests

8 participants
@jaredpar @sharwell @CyrusNajmabadi @Sergio0694 @jinujoseph @jcouv @chsienki and others