Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allowing multiple definitions of the same function #505

Merged
merged 29 commits into from
Apr 5, 2021
Merged

Conversation

mwhicks1
Copy link
Member

@mwhicks1 mwhicks1 commented Mar 20, 2021

Update(from @kyleheadley):

This PR makes FunctionConstraintVariables mergers more permissive, simplifying merger code and allowing multiple function bodies. To do this it also needs to be able to rewrite each function decl separately, preserving existing parameter names and
rewriting those that are declared without names.

Also:

  • fixed the typedef regression test and related that mentioned Constraint aren't generated correctly for functions declared by typedef #437. This may be complete now.
  • added compile errors to compile error exit, instead of exiting at the add variables phase
  • better rewrite of function pointers removes whitespace in lots of tests.
  • remove some code that no longer had uses
  • some special casing for main

@john-h-kastner some of the rewriting was trial and error, please let me know if there is anything to improve.

Original:

Allow multiple definitions of the same function as long as the function types are compatible.

Compatibility for definitions is more strict than for prototypes. In particular, we do not allow one definition to be

int foo() { return 0; }

while the other has definition is

int foo(int *x) { return *x; }

For declarations we would allow something like this, e.g.,

int foo();
int foo(int *x) { return *x; }

When we see compatible definitions, the two functions will end up sharing the same ConstraintVariables, so constraints imposed by each's callers and the functions' bodies will end up being combined to find a single, compatible solution.

We should treat main specially, so the strict compatibility rules do not apply to it. In particular, we can allow one definition of main to be int main() { ... } and another to be int main(int argc, char **argv). To force the best merge, we constrain main's type in the latter case to be checked, with the second argument an array whose length is determined by the first.

TODO: The permissive decl merging does not quite work yet; there's a rewriting bug still to fix.

@kyleheadley
Copy link
Member

I have two concerns:

  • The addVariable(FunctionDecl branch) that calls insertNewFVConstriant throws out the new FVC entirely after the merge. I think the constraints are kept separately, so linking them should still constrain the old FVC. When traversing the body, it appears that everything is done through the Decls, and FV parameter constraints are referred to by index, so the new body will constrain the old FVC. If the relevant transitive property holds, this should be fine, but there are a lot of "if"'s and complexity here that may be prone to error. At the very least we need tests that when two functions share a name, their bodies properly provide constraints to both function prototypes.
  • insertNewFVConstraint was pre-checking for which FVC would be the one we use. With two bodies allowed, the pre-check is no longer very important, so maybe the "Resolve conflicts" block should be incorporated into the mergeDeclarations code for clarity.

@mwhicks1
Copy link
Member Author

Looking at ProgramInfo::insertNewFVConstraint, it looks like the first definition of a function that is encountered has its FVC added to the map, and then that FVC object is used from then on.

Future definitions/declarations of the same function will keep using that same FVC (found in the map), but will have their info added/checked by merging. This is my read of what's happening starting at https://github.com/correctcomputation/checkedc-clang/pull/505/files#diff-1b91e5fa72a7c803901481a42899c827abf5f81283959c3881cd4870d76e018fR580 and on mergeDeclaration (which always modifies its target, due to the assert(false && ...) that prevent doing otherwise).

I don't know what you mean by "relevant transitive property" -- can you explain? My read of what's going on is that if two function definitions have the same name, and thus the same FVC in the map, then their bodies will be analyzed using the same FVC, and thus constraints generated from distinct bodies but on the same VarAtoms. When I look at the graph that's generated from -debug-solver that's what I see. And indeed, sometimes the solution for one of the functions is more conservative because they are both there. I've updated a test case to show that.

I'm not sure the ramifications of moving the "Resolve Conflicts" checks. The decision those checks make now depends on whether a definition has been previously seen or not. If it has, then the FVC already in the map is the receiver of the merging, whereas if it has not, then the new one is (and it is added to the map, replacing what was there before). At least some of this sort of logic would have to relocate into mergeDeclaration, and I'm not sure it will be entirely straightforward. But maybe it will be, and can be an improvement.

Last point:

The code at present doesn't entirely for the reason we feared in earlier discussion: If the two bodies have parameter names that differ, then the rewritten code might get the wrong parameter names. This is something that needs to be fixed. One obvious thing we could do is to force the parameter names for both definitions to be the same. Maybe that's the first, easy thing to do. Then a later thing would adjust rewriting etc. to allow the names to differ.

@kyleheadley
Copy link
Member

kyleheadley commented Mar 23, 2021

Looking at ProgramInfo::insertNewFVConstraint, it looks like the first definition of a function that is encountered has its FVC added to the map, and then that FVC object is used from then on.

The new FVC replaces the old in the map if is has a body or params and the one seen earlier didn't, but that may no longer be important to do, as long as we have params if they exist somewhere.

I don't know what you mean by "relevant transitive property" -- can you explain?

If you have two function bodies, they each "should" constrain their definition's prototype, and then those prototypes "should" be used to constrain the "final" choice for the function. You're using one of prototypes as the "final" (perfectly fine), and constraining both bodies against it (skips the step of constraining the other body against its prototype first). Some transitive property allows that skip, and as far I can tell it's safe because the resolution algorithm is a transitive closure anyway (right?).

Except that the resolution algorithm looks at types, and I don't thing we tried to merge the syntax. So we have the problem of different variable names.

I'm not sure the ramifications of moving the "Resolve Conflicts" checks.

I'll take a look, but the code may be simple enough now that it's not worth changing what works.

The code at present doesn't entirely for the reason we feared in earlier discussion ... wrong parameter names.

I'll see if I can check these at merge. And how hard it would be to rewrite based on existing code.

Copy link
Member Author

@mwhicks1 mwhicks1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All the code I looked at was fine, aside from that in DeclRewriter.cpp which I don't understand (and didn't really understand before). It might make sense for someone else (e.g., @aaronjeline or @john-h-kastner) to look at those changes. The decl merge/management stuff looks great.

return true;

// For VarDecls, check if these are are not dummy and have a name.
if (auto *VD = dyn_cast<VarDecl>(D))
if (auto *VD = dyn_cast_or_null<VarDecl>(D))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it that D could now be nullptr here when it couldn't before? Or maybe it could, and you are fixing a bug? When could that happen?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I started explicitly passing null when needing a param for a prototype that doesn't exist i.e int *foo(). This code is called from that section, so it needed null checks. Previously, it may have been worthy of an assert.

if (IsTypedef && !UnmaskTypedef) {
return gatherQualStrings() + TypedefString +
(EmitName && getName() != RETVAR ? (" " + getName()) : " ");
(EmitName && UseName != RETVAR ? (" " + UseName) : " ");
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a risk here that the trick we are playing with RETVAR will get messed up if UseName comes in clashing with that name? Maybe that was also possible before.

Copy link
Collaborator

@john-h-kastner john-h-kastner Apr 2, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed change: at the top of this function

// The name field encodes if this variable is the return type for a function.   
// TODO: store this information in a separate field.
bool IsReturn = getName() == RETVAR;

and then all subsequent comparisons to RETVAR look at IsReturn instead. This way it still only depends on the Name field of this instance of PointerVariableConstraint, and we don't have inconsistent usage of UseName and getName() in the body of the function. This would also make it easy to move IsReturn into a field later without further changes to this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll commit this

@@ -864,7 +870,7 @@ std::string PointerVariableConstraint::mkString(Constraints &CS,
}

// TODO Remove comparison to RETVAR.
if (getName() == RETVAR && !ForItype)
if (UseName == RETVAR && !ForItype)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice the TODO above; is the concern that it conveys exacerbated by allowing the choice of name to come in from the outside?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed with John's suggestion above

Ret += Name;
Ret += UseName;
else if (EmitName)
Ret += "(*" + UseName + ")";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening here? Maybe add a comment, since I suspect the logic is subtle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And also, maybe it's clearer to write it

if (EmitName) {
  if (UnmaskTypedef)
    Ret += UseName;
  else
    Ret += "(*" + UseName + ")";
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll change it. I really only added to this section what I needed. I didn't follow up to see how the unmasking worked. I usually do, but this seemed a bit too off topic for now.

RewriteParams = true;
}
} else {
// No params and no param source: make explicit
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side comment: I'm not very familiar with this code, so not much to say ...

if (isa<ParmVarDecl>(Decl))
Type += Defn->getName();
if (isa_and_nonnull<ParmVarDecl>(Decl)) {
if (Decl->getName() == "")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decl->getName().empty() ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep

return (int *)1;
};

int *wildBody(int x, int *y, int **z);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here you have the first parameter type int x match in this version and in the version in the other file, it's the arity that doesn't match. What happens if the types don't match? I think mergeDeclaration should reject that, right? Do we have a test for that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean the base types, like matching int with char? As far as I can tell, that was never checked, and so it will merge. I didn't have to change anything to merge int main() with void main(). It may be something we could do. But I think it fits in the category of clang errors that we are free to ignore. Pointer depth is the only thing we check in 3C code.

@john-h-kastner
Copy link
Collaborator

john-h-kastner commented Apr 2, 2021

I'm not convinced that how this PR deals with function typedefs is ok. If it doesn't expand the typedef, and it also doesn't generate correct constraints for the typedef, then it will almost definitely generate code that doesn't type check on some inputs.

@kyleheadley
Copy link
Member

@john-h-kastner I saw the function typedef test and assumed it was a rewriting issue that I could fix, but if we're awaiting the constraint generation before we rewrite, than I agree that wasn't done. I've restored the special handling, typedefs are expanded as they were before.

@mwhicks1
Copy link
Member Author

mwhicks1 commented Apr 3, 2021

The changes look good. Please go ahead and merge.

Copy link
Collaborator

@john-h-kastner john-h-kastner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants