Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed bug where extract method throws on unreachable code or local function parameter #25147

Merged
merged 4 commits into from
Mar 2, 2018

Conversation

heejaechang
Copy link
Contributor

@heejaechang heejaechang commented Mar 1, 2018

Customer scenario

User invokes extract method on code snippet that includes unreachable code or local function parameter, and VS throws an exception that ends up either show up in info bar or message box.

Bugs this fixes

#23352
#23175

Workarounds, if any

No workaround

Risk

this address very specific condition so no risk or change on extract method behavior

Performance impact

N/A

Is this a regression from a previous update?

No

Root cause analysis

Extract method has a matrix that lists all possible combination it supports and throw exception if it encounters code that it doesn't know how to handle. we found a case we need to add support (unreachable code/local function parameter)

How was the bug found?

Customer, Dogfooding

@heejaechang heejaechang requested a review from a team as a code owner March 1, 2018 01:20
@heejaechang
Copy link
Contributor Author

@dotnet/roslyn-ide @jinujoseph can you take a look?

@heejaechang heejaechang changed the title Fixed bug where extract method throws on unreachable code Fixed bug where extract method throws on unreachable code or local function parameter Mar 1, 2018
@heejaechang
Copy link
Contributor Author

retest windows_debug_unit64_prtest please

@heejaechang
Copy link
Contributor Author

retest windows_release_vs-integration_prtest please

@CyrusNajmabadi
Copy link
Member

Extract method has a matrix that lists all possible combination it supports and throw exception if it encounters code that it doesn't know how to handle.

This seems bad. Why are you throwing in this case instead of NFWing? If we don't understand something, we should either fail gracefully, or we should do the best we can even if its' not correct.

@heejaechang
Copy link
Contributor Author

heejaechang commented Mar 1, 2018

it logs NFW already. this unexpected condition is same as other code action/refactoring throwing invalid cast exception since it expected type but actual node was namespace. or throwing null exception because it expects type to always exist but not.

not sure what you mean by handling it gracefully. none of roslyn code is written not to throw on unexpected condition.

@heejaechang
Copy link
Contributor Author

if we believe this, we should go this route
"this seems bad. Why are you throwing in this case instead of NFWing? If we don't understand something, we should either fail gracefully, or we should do the best we can even if its' not correct."

like we are now moving away from fatal watson to non fatal watson,
we should start to move away from we doing Contract.ThrowNull,ThrowFalse,ThrowTrue, Fail, and start handle those gracefully and use NFW instead.

...

extract method existed for long time, these new unexpected condition happens since new language features are added such as local function, and that breaks existing invariant and it throws. nothing new for how roslyn has been written. but sure, we can change pattern on how roslyn is written from throw to handle unexpected conditions.

but changing only here won't change anything.

@CyrusNajmabadi
Copy link
Member

extract method existed for long time, these new unexpected condition happens since new language features are added such as local function, and that breaks existing invariant

I guess my point is that existing code should be written with teh expectation that future change to the language may break its assumptions. So, for example, don't blindly cast to something. Check if that thing is actually the right kind. And fallback, or bail out (possibly emitting an NFW) if that happens.

Pretty much all the rest of our code-actions have not had this issue as the language has moved forward. I'm asking that ExtractMethod be written in a similar fashion.

@CyrusNajmabadi
Copy link
Member

is same as other code action/refactoring throwing invalid cast exception since it expected type but actual node was namespace

What features do this? That's a bug in the feature. If you get a SyntaxNode you should check it for the forms you can work with. And if you don't have one of those forms you should bail out gracefully. That's been the way we've tried to write most of our features from the beginning.

but changing only here won't change anything.

I disagree. It means that as hte language continues to move forward this feature won't have problems with that. And that's a good thing. That's strictly better than being in a state where the language can move forward and this feature can break.

@heejaechang
Copy link
Contributor Author

"What features do this? That's a bug in the feature. If you get a SyntaxNode you should check it for the forms you can work with. And if you don't have one of those forms you should bail out gracefully. That's been the way we've tried to write most of our features from the beginning."

no, we prefer hard cast rather than using "is" when we believe it should only get certain type of node/symbol. that sure is correct until we add new node type, symbol kind and etc. we do not write our code future proof.

"I disagree. It means that as hte language continues to move forward this feature won't have problems with that. And that's a good thing. That's strictly better than being in a state where the language can move forward and this feature can break."

well, that's whole pattern change. I can open a bug that states we should move to that pattern same as we opened a bug saying we should move to ValueTask, or move to NFW and etc. but that's not this PR is for.

@CyrusNajmabadi
Copy link
Member

no, we prefer hard cast rather than using "is" when we believe it should only get certain type of node/symbol. that sure is correct until we add new node type, symbol kind and etc. we do not write our code future proof.

I disagree. While we certainly have other code which is makes similar mistakes, we write tons of LS code to be future proof and to not blindly cast. Can you point out hte line of code that is actually crashing in this case? I'd like to see what it looks like and how we might make it resilient both to the language changing, and to us not considering certain cases.


return Contract.FailWithReturn<int>("Shouldn't reach here");
return leftLocation.SourceSpan.Start - rightLocation.SourceSpan.Start;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here, we crash because it doesn't expect local function which added after extract method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if I write it future proof, I can't assume that common tree must exist here as well, since who knows what kind of new method symbol we would add in future?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. i would expect this code to not have a Contract. I would expect this to either be resilient to a new sort of symbol being passed in. Or the code should be able to indicate that it doesn't know what is going on, and the refactoring should be suppressed.

Assuming any of our symbol kinds is "complete" is not something i think is acceptable for the LS code to do. It can NFW if we want to send us data to help fill in these gaps without user reports. But it should not crash just because it expected a certain set of methods, and then new ones were added.

The only way this sort of code is acceptable is if the author has put a system in place to crash immediately the moment a new enum member is added. i.e. that way when someone on the compilre team adds MethodKind.LocalFunction, then the LS refuses to even load because you have a static constructor that fails. That way the person adding the enum member knows to go fix up this code.

Right now you're basically saying: "i've only written code to handle a fixed set of cases, and i will crash in the future if any new cases happen. But there's no way for you to know this will happen and customers will be hit by this in the wild." That's not responsible. Either we need to be resilient to our enums/symbols expanding over time. or we must fail immediately so that hte person adding the new language feature knows they have to fix/test this code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since who knows what kind of new method symbol we would add in future?

Correct. We need to be resilient here. For example: look at AbstractSymbolDescriptionBuilder.AddDescriptionPartAsync. It is needed to create a description for an arbitrary symbol. It hardcodes is knowledge about which symbols are supported, but it has a standard 'fall through' case for any symbols it doesn't expect. It doesn't crash saying "i didn't know how to create quick info for this new language feature type."

Copy link
Contributor Author

@heejaechang heejaechang Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like I said, fine opening issue for what you are saying, not going to make that change in this PR. and I don't believe what you are saying is completely true since if that is true, when new language feature is added from compiler, IDE should never crash which is not true at all.

"The only way this sort of code is acceptable is if the author has put a system in place to crash immediately" not sure what you meant immediately. you mean when there is no one invoking the feature, it need to somehow figure that out immediately? where is such code in Roslyn?

Copy link
Contributor Author

@heejaechang heejaechang Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really didn't want to. but you put the symbol display as example, so I dug in to see whether it is really future proof and there is no Contract.

it does have some such as this
http://source.roslyn.io/#Microsoft.CodeAnalysis.Features/LanguageServices/SymbolDisplayService/AbstractSymbolDisplayService.AbstractSymbolDescriptionBuilder.cs,303

if new group kind is added, it will throw. so, now will it immediately throw when group kind is added and vs run first time? no it won't until quick info is shown. will it throw immediately when quick info is shown? no it won't until quickinfo that has new group kind is needed. if new group kind is needed, will it immediately throw? no it won't until multiple line is needed in description.

..

this kind of code in Roslyn is common. and this actually helped us a lot when something changed. sure we can move to new pattern where we try to not use this kind of pattern as much like we are now moving away from fatal watson to NFW.

but this PR I dont believe is to address that issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

by the way, this doesn't crash VS. it shows info bar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not a good case. SymbolDescriptionGroups is tightly coupled with AbstractSymbolDescriptionBuilder. You can of course have tightly coupled code have knowledge of itself. The issue is when you cross layers, or use subsystems that are by design expected to change over time. The compiler layer is one of those systems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, don't want to argue more. I see this differently so. like I said I am fine to open an issue. I just am not going to change how extract method handle unexpected condition in this PR.

if this crash VS, I will do something not to crash VS. but it doesn't.

@@ -114,6 +114,8 @@ private static void BuildMatrix()
s_matrix.Add(new Key(dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: false, writtenInside: true, readOutside: false, writtenOutside: true), VariableStyle.SplitIn);
s_matrix.Add(new Key(dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: false, writtenInside: true, readOutside: true, writtenOutside: false), VariableStyle.SplitIn);
s_matrix.Add(new Key(dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: false, writtenInside: true, readOutside: true, writtenOutside: true), VariableStyle.SplitIn);
s_matrix.Add(new Key(dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: true, writtenInside: false, readOutside: false, writtenOutside: true), VariableStyle.InputOnly);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dataflowin, dataflowout being false, and readinside true and variabledeclared false is not something expected for a symbol. if it is not declared inside and read, data must flow in. but not for unreachable code case.

Copy link
Contributor Author

@heejaechang heejaechang Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here we have 8 variables, which give us 256 possibilities. among those, these are expected combination. we don't expect combination that not belong to ones listed here. if we get one, rather than silently ignore those case or create random code or show some dialog box to users. this one throw and show info bar to users.

I agree we should not crash VS. and we don't. if your point is rather than info bar we should show something else. not sure what that something else is.

@heejaechang
Copy link
Contributor Author

heejaechang commented Mar 2, 2018

again, I am fine I open an issue saying we need to write future proof code but I am not going to make that change in this PR.

"we write tons of LS code to be future proof and to not blindly cast. " this is certainly not true. since I have many code review you saying, crash or throw if it can't happen now even though it can happen in future since tree shape changes or other thing.

we have many code where at the end of switch statement, we throw saying this kind is not expected. where we can always add new kind.

@@ -126,7 +126,7 @@ public int CompareTo(ParameterVariableSymbol other)
return 0;
}

var compare = CompareTo((IMethodSymbol)_parameterSymbol.ContainingSymbol, (IMethodSymbol)other._parameterSymbol.ContainingSymbol);
var compare = CompareMethodParameters((IMethodSymbol)_parameterSymbol.ContainingSymbol, (IMethodSymbol)other._parameterSymbol.ContainingSymbol);
Copy link
Contributor Author

@heejaechang heejaechang Mar 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, I can't assume parameter symbol always belong to method symbol if I want future proof. we have this kind assumption all over the places.


// dataflow in and out can be false for symbols in unreachable code
// boolean indicates
// dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: true, writtenInside: false, readOutside: true, writtenOutside: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke Can you confirm these are the expected values from data flow analysis here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -3760,6 +3760,99 @@ private static void NewMethod()
await TestExtractMethodAsync(code, expected);
}

// dataflow in and out can be false for symbols in unreachable code
// boolean indicates
// dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: true, writtenInside: false, readOutside: false, writtenOutside: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke Can you confirm these are the expected values from data flow analysis here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke these 2 cases are not for local function but for unreachable code.

..

flow analysis not working correctly on certain language feature can result in 2 different outcome. one being infobar like this. or extract method generating broken code. not sure what known issue mentioned below on local function will result in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@agocke basically, it is reversed representation. data flow analysis API gives set of symbols on each cases, data flow in, out, read in/out, write in/out and etc.

I have variable for each case for a symbol that indicates the symbol exist in each set or not. if that variable is confusing you, think it as dataFlowAnalysisResult.DataFlowIn.Contains(symbol) and etc.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@agocke
Copy link
Member

agocke commented Mar 2, 2018

Note that there are still known bugs around region analysis and local functions, e.g. #14214

I do have a fix for these issues, but have not had time to implement it.

@agocke
Copy link
Member

agocke commented Mar 2, 2018

@sharwell I'm having trouble understanding the notation used in these tests. DataFlowsOut and DataFlowsIn don't return booleans, they return sets of symbols. Is the boolean to indicate that there is a single symbol in each tests and it is either present or absent in the set for true and false, respectively?

@heejaechang
Copy link
Contributor Author

heejaechang commented Mar 2, 2018

@agocke
one of thing extract method does is that, for all symbols shown in the selection, it determines how to put those in extract method and its signature. pure argument, out, ref, copy the decl inside of the method, use it as return value and etc.

that decision is per symbol, so it gets data flow analysis result which contains state -> set of symbols, and revert it to symbol -> set of state. and use that to find out what to do with each symbol from a predefined matrix.

...

for tests, there are 2 kinds of tests. 1 is just sanity test I have for all entries in the matrix. test could have multiple symbols in it, but one of symbol cover one of entry in the matrix. it basically cover that the symbol behave as described in the matrix.

the other tests are feature test. it cover language features, extract method behavior around comments, whitespace. formatting, selection validation and etc.

Copy link
Member

@sharwell sharwell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 conditioned on @agocke agreeing that the data flow implementation isn't just broken for these cases.

@sharwell sharwell requested a review from agocke March 2, 2018 20:10
Copy link
Member

@agocke agocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where data flow results were listed, the values look like what I expect.

Specifically for data flows in, this is specified as:

A variable assigned outside is used inside if an analysis that leaves the
variable unassigned on entry to the region would cause the generation of "unassigned" errors
within the region.

All variables are definitely assigned in unreachable code, so if a variable is unassigned before entering a region, but the usages of that variable in the region are unreachable, I would expect no definite assignment errors to be produced, thus the variable does not flow into the region.

DataFlowsOut is the same, but for assignments. I would also expect no variables to flow out of the region in these examples.


// dataflow in and out can be false for symbols in unreachable code
// boolean indicates
// dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: true, writtenInside: false, readOutside: true, writtenOutside: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@@ -3760,6 +3760,99 @@ private static void NewMethod()
await TestExtractMethodAsync(code, expected);
}

// dataflow in and out can be false for symbols in unreachable code
// boolean indicates
// dataFlowIn: false, dataFlowOut: false, alwaysAssigned: true, variableDeclared: false, readInside: true, writtenInside: false, readOutside: false, writtenOutside: true
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jinujoseph
Copy link
Contributor

@Pilchie for ask mode approval for 15.7.p2

@heejaechang
Copy link
Contributor Author

thank you! @jinujoseph @Pilchie can I get approval for 15.7?

@Pilchie
Copy link
Member

Pilchie commented Mar 2, 2018

Approved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants