Skip to content

Conversation

@sharwell
Copy link
Contributor

@sharwell sharwell commented May 29, 2019

This change removes the use of asynchronous methods during the MEF construction portion of the construction of IExperimentationService.

  • IExperimentationServiceFactory can be obtained from the MEF container without any threading dependencies
  • Once an IExperimentationService instance is obtained from the factory, its synchronous methods can be used without threading dependencies

fixes :https://dev.azure.com/devdiv/DevDiv/_workitems/edit/934446

@sharwell sharwell requested a review from a team as a code owner May 29, 2019 01:55
@sharwell sharwell force-pushed the free-threaded-experimentation branch from 95b92e4 to 2910be3 Compare May 29, 2019 02:01
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure why it moved to this factory model.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separation is described in the initial PR description above.

Copy link
Member

@jasonmalinowski jasonmalinowski Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation doesn't make sense to me -- it says both the IExperimentationServiceFactory can be fetched without threading dependencies (which is incorrect since it has UI thread dependencies -- see comments below), and then says IExperimentationService can be used without threading dependencies. I'm assuming that the description is wrong and you were trying to split this between a UI-thread dependent and UI-thread independent portion and just have a typo there. Put another way: if both can be used without threading dependencies, why split them?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... it says both the IExperimentationServiceFactory can be fetched without threading dependencies (which is incorrect since it has UI thread dependencies - ...

It says IExperimentationServiceFactory can be fetched from the MEF catalog without threading dependencies. This statement is correct. The service cannot be used without threading dependencies, but that's normal for asynchronous services.

... then says IExperimentationService can be used without threading dependencies ...

This is also correct. Once you have an instance of IExperimentationService, that instance can be used without threading dependencies.

... if both can be used without threading dependencies, why split them?

Only IExperimentationService can be used without threading dependencies.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure whether we can to throw here unless we know that it won't crash VS. otherwise, I would catch and fire NFW and return false so that we don't crash VS due to experiment failure.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now a direct call instead of going through reflection. The latter had many more potential failure points. We can check if the implementation of this service ever throws; it may have its own internal error handling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This worries me: we'll be taking this for 16.2 GA so we may want to keep the catch in there to mitigate and then remove in 16.3.

@sharwell sharwell force-pushed the free-threaded-experimentation branch from 2910be3 to 821015c Compare May 30, 2019 02:51
@sharwell sharwell force-pushed the free-threaded-experimentation branch from 821015c to b800e9e Compare June 11, 2019 16:13
@sharwell sharwell force-pushed the free-threaded-experimentation branch 2 times, most recently from 1e59c1e to 3293e67 Compare June 21, 2019 15:03
@sharwell sharwell force-pushed the free-threaded-experimentation branch from 3293e67 to 507fc7e Compare June 21, 2019 20:40
@jinujoseph jinujoseph added the Tenet-Reliability Customer telemetry indicates that the product is failing in a crash/hang/dataloss manner. label Jun 21, 2019
if (_experimentationService is null)
{
try
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my preference will be just kicking off task here. and then in IsExperimentEnabled, check whether services are set, otherwise, wait the task under JTF.Run rather than make everything async.

or if really want to make things async, just make IsExperiementEnabled Async.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a more accurate representation of the asynchronous nature of the services involved. IsExperimentEnabled has no intrinsic asynchronous portion or cross-thread dependencies, but obtaining an instance of the service does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing wrong with making IExperimentationService async -- the remote implementation we checked in is doing this today:

public bool IsExperimentEnabled(string experimentName)
{
var assetSource = AssetStorage.Default.AssetSource;
return assetSource?.IsExperimentEnabledAsync(experimentName, CancellationToken.None).Result ?? false;
}

We didn't make it async at the top level because of the uses in the constructor that you're fixing.

Copy link
Contributor Author

@sharwell sharwell Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.Result

This is not a safe operation. Its use outside of ContinueWith is a code smell, and a design flaw if the API author has control over the scenario.

Copy link
Member

@jasonmalinowski jasonmalinowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm confused by the overall approach here: although IExperimentationService is free-threaded, the fetching of it is still not, and anybody who calls GetExperimentationServiceAsync() and then blocks the UI thread waiting on it (even indirectly) will still deadlock. The real value of this change seems to be the deferring of the use of the experimentation service from the .editorconfig options provider, which is where the proximate cause of the deadlock is and could be just done directly without all the extra creation of two interfaces that I think only muddies the thread-safety state of this even further than it already is.

Thoughts?

In any case, I suspect this is a 16.2 regression -- the .editorconfig code using the experimentation service more aggressively now is causing us to see a spike, so we're going to have to backport this to that branch.

{
}
});
var featureFlags = (IVsFeatureFlags)await _serviceProvider.GetServiceAsync(typeof(SVsFeatureFlags)).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a deadlock risk since the IVsFeatureFlags will cause COM RPC and risk deadlock -- are we following a rule that it shouldn't be waited for on the UI thread?

Copy link
Contributor Author

@sharwell sharwell Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

➡️ This is updated to perform the casts on the main thread until we verify that both interface implementations are managed objects.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they are both managed objects, then it means the only jump to the UI thread behind GetServiceAsync was because the services weren't marked as free-threaded, and maybe this gives a different line of attack -- are the implementations actually free-threaded in practice but just need to be marked as such? If so, we can just ask the shell to fix that, this code then all becomes correct and the PR isn't needed at all.

Copy link
Member

@jasonmalinowski jasonmalinowski Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The explanation doesn't make sense to me -- it says both the IExperimentationServiceFactory can be fetched without threading dependencies (which is incorrect since it has UI thread dependencies -- see comments below), and then says IExperimentationService can be used without threading dependencies. I'm assuming that the description is wrong and you were trying to split this between a UI-thread dependent and UI-thread independent portion and just have a typo there. Put another way: if both can be used without threading dependencies, why split them?

{
if (_enabled is null)
{
var experimentationService = await _experimentationServiceFactory.GetExperimentationServiceAsync(cancellationToken).ConfigureAwait(false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this still a deadlock risk if somebody were to ask for options on the UI thread and we hadn't already initialized it? Or are we taking a tactic that it's still not correct but "good enough"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this still a deadlock risk if somebody were to ask for options on the UI thread and we hadn't already initialized it?

The deadlock occurred because MEF isn't JTF aware. We moved the part with thread dependencies outside of the MEF construction path where code is expected to be JTF aware.

Copy link
Contributor Author

@sharwell sharwell Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📝 There is still a deadlock risk if users block on the completion of this asynchronous method. This is the same deadlock risk that applies to all cases in Roslyn where code blocks on the completion of an asynchronous operation (currently no cases are proven correct, therefore the only safe assumption is all cases are potential deadlock scenarios).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but right now the only callers of this method are Roslyn code, so this would have to follow Roslyn threading rules instead of JTF rules. "Code is expected to be JTF aware" might be a guideline, but it's patently not the case here.

internal interface IExperimentationServiceFactory : IWorkspaceService
{
bool IsExperimentEnabled(string experimentName);
ValueTask<IExperimentationService> GetExperimentationServiceAsync(CancellationToken cancellationToken);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments please as to why this is now split up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The separation falls out from the default explanation for all MEF services:

  • The constructor for IExperimentationServiceFactory cannot call asynchronous code
  • The IsExperimentEnabled method needs to be callable from synchronous code paths
  • The IsExperimentEnabled method has a precondition that an asynchronous code path was called

None of these items seems especially worthwhile to put in code for this service, but if we have documentation somewhere for MEF practices we could mention it.

return KnownUIContexts.SolutionExistsAndFullyLoadedContext.IsActive;
}

private async Task<bool> IsPartialLoadModeExperimentEnabledAsync(CancellationToken cancellationToken)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we're doing the caching pattern like this in a bunch of places, can we make the pattern more common? I'm still not sure why we're not just making this IsExperimentEnabledAsync() that just does the GetExperimentationService + IsExperimentEnabled.

Copy link
Contributor Author

@sharwell sharwell Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure why we're not just making this IsExperimentEnabledAsync() that just does the GetExperimentationService + IsExperimentEnabled.

Some of the callers are not asynchronous.

Copy link
Contributor Author

@sharwell sharwell Jul 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make the pattern more common?

I haven't found a common pattern that doesn't come with a hit to performance, clarity, and/or correctness.

  • The JTF-aware AsyncLazy<T> conflicts with a Roslyn type of the same name, impairing clarity.
  • Roslyn's AsyncLazy<T> is not safe for use on code paths involving JTF operations. For this service it could negatively impact correctness.
  • It would be simpler to not cache the value, but I wasn't sure which methods were called too frequently to make that viable.

// don't enable partial load mode for ones that are not in experiment yet
// Don't enable partial load mode for ones that are not in experiment yet. If the experimentation
// service isn't available by the time this is called, the VS Service instance is used and it will
// check the value of the experiment at the time individual features are needed.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm having trouble with the wording. This is an opportunistic fast-path. We try to check the experimentation service, and if that succeeds and the experiment is not enabled, we return a lighter instance of IWorkspaceStatusService. If we can't check the experiment here, we use VisualStudioWorkspaceStatusServiceFactory.Service, and each call to that service will check the experiment.

@sharwell
Copy link
Contributor Author

sharwell commented Jul 3, 2019

and anybody who calls GetExperimentationServiceAsync() and then blocks the UI thread waiting on it (even indirectly) will still deadlock

This is the expected behavior. In general, it is not safe to block on the completion of a Task which is not known to be already completed. A bunch of code in Roslyn tries to be "safe" with respect to this, but there is no compile-time validation of the prerequisites so we repeatedly run into cases where it breaks down. All of my efforts to prove correctness (e.g. dotnet/roslyn-analyzers#2400) in constrained scenarios encountered cases where assumptions did not hold and deadlocks are a possibility. In the absence of a proof of correctness, it is never safe to assume that blocking on a task completion will work.

Base automatically changed from master to main March 3, 2021 23:51
@jinujoseph jinujoseph requested a review from a team as a code owner March 3, 2021 23:51
@jinujoseph jinujoseph requested a review from a team March 3, 2021 23:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Area-IDE Tenet-Reliability Customer telemetry indicates that the product is failing in a crash/hang/dataloss manner.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants