Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SONARPY-1798 Try to resolve built-in types for names which have no symbol #1774

Conversation

maksim-grebeniuk-sonarsource
Copy link
Contributor

No description provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a good example of what I'm trying to promote and why.

I believe that if this PR had been focused on fixing the following problem:

"If a name does not have any binding usage, let's search in the builtins module to allocate a type for it"

It could have been much smaller and focused, easier to review and to make progress (carving it out of the ArgumentNumberCheck was a great change, IMO - it would have been even better to do it from the start).

I also have a question regarding names whose type is a builtin: what does it mean for them in terms of symbol? I believe for now, they would have a type but no symbol (because no assignment). Is this intentional? I don't have a strong opinion on the behavior at this point, but I think we should probably test/document this. It may have impacts on rules that aim to detect undefined symbols.

Comment on lines 175 to 178
public boolean isOrExtends(String s) {
// TODO: implement and maybe it should accept PythonType instead of string
return false;
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this as part of this PR? It's probably better to introduce it when we need it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excluded from PR

@@ -157,8 +157,7 @@ void builtin_parent() {
);
ClassType classB = classTypes.get(1);
assertThat(classB.superClasses()).hasSize(2);
// FIXME: ensure builtin parent is resolved
assertThat(classB.hasUnresolvedHierarchy()).isTrue();
assertThat(classB.hasUnresolvedHierarchy()).isFalse();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add some assertions?
Now that the hierarchy is resolved, it would make sense to assert on parents/members to ensure information about BaseException is correctly retrieved (e.g members, etc...).

It may also make sense to have an additional test with a builtin parent that actually has an unresolved type hierarchy to ensure the information is not lost then.

@@ -77,4 +79,13 @@ public List<Tree> computeChildren() {
public Kind getKind() {
return Kind.SUBSCRIPTION;
}

@Override
public PythonType typeV2() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to have a test for this (e.g is ObjectTypeTest).

I also think the current implementation is not correct. If I have:

my_list = ["hello"]
foo(my_list[0])

The type of the argument (= the subscription expression) is str and not list. I think it would therefore make sense for the type of a subscription expression to be evaluated as the type of the attribute, if its own type is a known container type. I think this could deserve a dedicated ticket, with some examples of fixed FNs though.

Ultimately if I remove this method, everything is still green. So maybe it's best to leave this out of this PR for now as it's not really needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excluded from PR

Comment on lines 283 to 284
.map(TypeInferenceV2::getUsagesType)
.or(() -> projectLevelTypeTable.getModule().resolveMember(name.name()))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a subtle yet impactful change from the previous implementation.

In the previous implementation, any variable that either has multiple assignments or a global declaration usage would not be assigned a type.

Now we fall back to builtins instead, which means that in the following (admittedly convoluted) code:

global int
int = 42
int = "hello"
int

The final int would be considered to be builtins.int despite the multiple reassignments.

I think the fallback to builtins should only happen if a symbol has no binding usages.

I'm a bit unhappy about the focus of our discussion regarding the imperative loop vs the optional chain. I think the main focus should be on functional behavior rather than technical implementation and I believe this case was an example where we can improve.

Taking the previous implementation as a reference, I believe that a simple

if (types.size().isEmpty() {
  // the underlying assumption is that each binding usage will have a type and add to the list
  // if it's not the case, a simple boolean "hasNoBindingUsage" computed during the iteration would do the trick
  setTypeToName(name, projectLevelTypeTable.getModule().resolveMember(name.name()));
}

at the end of the method would have achieved what you're trying to do, without requiring an additional iteration over usages and without risking an undesired functional change.

I also think we are missing tests for this (like variations of the convoluted case I showed), which is hidden due to the optional chain.

Again, I have nothing against optional chains, but they do have limitations that need to be acknowledged and I don't think we should aim to refactor everything to these just for the sake of it.

@@ -148,8 +148,16 @@ private void addParameter(org.sonar.plugins.python.api.tree.Parameter parameter,
Token starToken = parameter.starToken();
if (parameterName != null) {
ParameterType parameterType = getParameterType(parameter);
this.parameters.add(new ParameterV2(parameterName.name(), parameterType.pythonType(), parameter.defaultValue() != null,
parameterState.keywordOnly, parameterState.positionalOnly, parameterType.isKeywordVariadic(), parameterType.isPositionalVariadic(), locationInFile(parameter, fileId)));
var parameterV2 = new ParameterV2(parameterName.name(),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the changes regarding FunctionType building for this PR?
I feel we can focus on the builtins fallback which would then be clearly separated from this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

excluded

public void createBuiltinModule(ModuleType parent) {
var name = "builtins";
createModuleFromSymbols(name, parent, TypeShed.builtinSymbols().values());
public ModuleType createBuiltinModule() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure yet that I see the value of removing the builtins prefix.

I believe we could still have a builtins dedicated module with a helper method that would help retrieve it so that we don't have to duplicate the "builtins" literal everywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is mainly needed to convert symbols (old ones) to types, the builtin types don't have builtins in their FQN

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see! I think it was a mistake on our end in the original implementation, though. Since other tools (including the Python interpreter itself) will map builtin types to the builtins module.

Ideally, I think I'd like to follow that convention as well, however I think for now the least disruptive approach is okay (i.e, if this PR still achieves its goals without removing builtins, I'd keep the prefix - and if it breaks a lot of other stuff, we can consider adding it back later instead).

Copy link

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have some questions before we proceed with merging this PR.

@Override
public void visitName(Name name) {
SymbolV2 symbolV2 = name.symbolV2();
if (symbolV2 == null) {
projectLevelTypeTable.getModule().resolveMember(name.name())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand this change.

Before, we were falling back to built-in symbols that did not have binding usages.
Now, we're doing it if the symbol is null.

Does that mean that the previous implementation, as well as my suggestion, was wrong? I'd expect this to be explicitly addressed in the comment then. Also, why did the original implementation solve the FNs in nonCallableCalled.py if it was incorrect?

I'm also not sure that we really want the symbol of used built-in names to be null in the first place, which could deserve at least comment / discussion / ticket as well, depending one how we want to proceed with this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my implementation, there was no difference between no symbol and symbol doesn't have a binding usage, that's why it has been working.
About symbol for builtin types - agree but it is out of this ticket scope

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: add comment with ticket

@@ -75,8 +75,6 @@ public TypeInferenceV2(ProjectLevelTypeTable projectLevelTypeTable) {
this.projectLevelTypeTable = projectLevelTypeTable;
}

private static final String BUILTINS = "builtins";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You didn't answer my question about builtins removal.

Given we don't map FQNs to PythonType yet, do we really need to remove it? Does it break something?

I know it looks like I'm pushing to keep it, but I would really like to understand why this PR needed to remove it, given that we haven't made any decisions on how we represent FQNs yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cause when you resolve types e.g. for serialized symbols or project level symbols the FQN of type str is just str, it doesn't operate with builtins, and I want to have common behavior for it

@guillaume-dequenne-sonarsource guillaume-dequenne-sonarsource changed the title SONARPY-1798 Change level of buildting types to root moduletype SONARPY-1798 Try to resolve built-in types for names which have no binding usages Apr 26, 2024
@guillaume-dequenne-sonarsource guillaume-dequenne-sonarsource changed the title SONARPY-1798 Try to resolve built-in types for names which have no binding usages SONARPY-1798 Try to resolve built-in types for names which have no symbol Apr 26, 2024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@maksim-grebeniuk-sonarsource maksim-grebeniuk-sonarsource merged commit cb39e36 into rnd/type-inference-engine-specification Apr 26, 2024
0 of 8 checks passed
@maksim-grebeniuk-sonarsource maksim-grebeniuk-sonarsource deleted the mg/SONARPY-1798 branch April 26, 2024 11:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants