Asymmetry with chpl_nodeID with forall vs. coforall #16716
Comments
Tagging @vasslitvinov in hopes that he has some insights due to the apparent role of task-private shadow variables; @gbtitus and @ronawho because they were part of the conversation that turned this up and are similarly stymied (so far); @e-kayrakli because of the use in strings and bytes code (which I believe was inherited, but that most of the original devs are no longer here); and @mppf because of the cycles he's spent thinking about cross-locale move/copy initialization which I think relates to the use of |
This sounds like a great approach to me. AFAIK it would be sufficient for all uses of |
As part of dealing with this we should also look to see whether there are other variables defined by the C runtime and used by module (or worse, user) code, that have node-specific values. Whatever turns out to be the source of the |
Is this to say that the user should be using Given that we do not advertise chpl_nodeID, I don't think that we even need to do anything about it. For example, what would be our response if a user got in trouble while using In any case I can go either way about removing chpl_nodeID. Or, I can look at fixing the behavior of the forall case, if desired.
I'd argue that the |
An alternative to a compiler primitive with all its associated machinery would be a well-defined functional interface in the runtime, for example |
Brad, your intuition is right -- this is a bug.
static bool isOuterVar(Symbol* sym, FnSymbol* fn) { // 'fn' is the coforall task function
  Symbol* symParent = sym->defPoint->parentSymbol;
  Symbol* parent = fn->defPoint->parentSymbol; // the function containing the coforall construct
  while (true) {
    if (!isFnSymbol(parent) && !isModuleSymbol(parent))
      return false;
    if (symParent == parent)
      return true;
    if (!parent->defPoint)
      // Only happens when parent==rootModule [...]
      return false;
    // continue to the enclosing scope
    parent = parent->defPoint->parentSymbol;
  }
}
Here is a user-level single-locale reproducer:
module Lib {
  var globalLib = 10;
  proc updateLib() { globalLib += 1; }
}
module Main {
  use Lib;
  var s$: sync int;
  var globalMain = 20;
  proc updateMain() { globalMain += 1; }
  proc main {
    coforall 1..2 {
      s$ = 1; // grab the lock
      writeln("globalLib = ", globalLib);
      writeln("globalMain = ", globalMain);
      updateLib();
      updateMain();
      s$; // release the lock
    }
  }
}
currently printing
Consider this in my court.
Thanks for looking into it and for finding the bug, Vass. Just to make sure I'm understanding, once this bug is fixed, we'd need to use one of the replacements for [edit: Assuming so, I was interested in taking a look at that piece of the puzzle (it feels like a "Brad-sized chunk" and more interesting than the doc edits I've been stuck in recently).]
That seems highly unlikely...?
Right, this is what I expect after a fix. Note that chpl_nodeID will be 0 only when referenced directly from the lexical scope of a It so happens that if chpl_nodeID were declared |
Interesting, I'd wondered about that, thanks for mentioning it. And I'm guessing there's no reason it couldn't be, if we wanted to leave it as-is... though Greg's idea of using a C static function (maybe even a "paren-less external function?" that's really a C variable, nudge-nudge-wink-wink) sounded intriguing and far less entwined with other language concepts, which seems appealing.
There are several things with chpl_nodeID - currently it is So a related concern is what to do with "locale private" variables. For example, BTW there is a lengthy comment discussion in the sister isOuterVar() in flattenFunctions.cpp about locale-private variables w.r.t. RVF. We could decide to have no shadow variables for locale-private variables. However, this would mix unrelated concepts and likely cause confusion down the road. Suggestions? |
I think that if you could focus on fixing the bug, I'll focus on switching |
The switch for This pulls the rug over the bug Vass identified for this specific case, but we should definitely still fix the bug for fear of doing the wrong thing in other cases (and perhaps it could even result in performance improvements for codes that refer to global variables within coforalls?) |
How should we handle "locale private" variables, such as chpl_localeTree? Handling them "correctly" would result in accessing a value from a remote locale via a shadow variable instead of accessing the current-locale value.
Ah, sorry Vass... I'd missed that your question was more generally applicable in other cases beyond this one. Looking over the distinct uses of locale private variables, I'm wondering whether there's a single solution that would work for each of them. For example, chpl_localeTree seems fairly different in its use from many of the other locale private variables that seem to be more about caching something locally (?). I'm also wondering what it would take to eliminate these uses of locale private by writing them in terms of different concepts within the language itself. Maybe we need to fork this off into a separate issue and involve those who are familiar with each of the variables (looks like there's only 8 of them?) |
Convert `chpl_nodeID` from `extern var` to `extern proc` [reviewed by @mppf]
Declaring `chpl_nodeID` as a module-scope extern variable pointed to an issue in which we weren't introducing a shadow variable for it as we do for other module-scope variables. However, it also shows a weakness of using a module-scope extern variable for an external value that is different on each locale: it suggests to Chapel that there's a single extern variable when in fact there's a per-locale variable with a distinct value on each locale. This suggests that an extern var isn't the best way to represent such values in Chapel.
In this PR, I change `chpl_nodeID` to an extern (paren-less) proc so that the Chapel code which refers to it can remain unchanged; and I assign it a new name in the C code so that we can implement it as a static inline function without modifying other C code that refers to the C-level variable. The result is a minimal change to our code that avoids the mismatch of describing such a value as a global variable in Chapel, and that should result in a nearly identical execution-time implementation.
Resolves #16716.
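The accessor approach described in the commit message can be sketched in C roughly as follows. This is a minimal illustration under stated assumptions, not the actual runtime code: the names `toy_node_id`, `toy_comm_init`, `get_toy_node_id`, and the `node_id_t` typedef are all hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed typedef, for illustration only. */
typedef int32_t node_id_t;

/* Per-process (and therefore per-node) variable owned by the C side.
   Existing C code could keep referring to it directly. */
static node_id_t toy_node_id = -1;

/* Hypothetical startup hook that records this process's node ID. */
static void toy_comm_init(node_id_t id) {
    assert(id >= 0);
    toy_node_id = id;
}

/* Accessor exposed to the other language in place of the raw variable.
   Since it is a function, the value is read wherever the call executes,
   and there is no module-scope variable for a compiler to copy into a
   shadow variable. */
static inline node_id_t get_toy_node_id(void) {
    return toy_node_id;
}
```

On the Chapel side, the commit message describes declaring the name as an extern (paren-less) proc, so existing references keep the same spelling while each one becomes a call that is evaluated on the locale where it runs.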
Since the original issue here is now resolved by changing how |
#14143 might cover the "locale private" vars |
A colleague pointed out that for the following code running on two nodes:
the output is inconsistent between the two loop forms:
My first reaction was "`chpl_nodeID` isn't really intended to be a user-facing feature, and is only used for bootstrapping, so maybe we don't really need to worry about this." But while I think the first part of that statement is true, we rely on `chpl_nodeID` a lot in library code, which makes it slightly concerning. And the fact that I can't explain what's happening is concerning.
Here's what I (think I) know:
Though `chpl_nodeID` is a fairly special SPMD-style / per-node C variable, Chapel doesn't really know this. It's declared as an `extern var` of type `int` and from what I've seen, the compiler doesn't seem to special-case it.
I think the forall loop is arguably doing the correct thing in that `chpl_nodeID` is a global integer variable, and so is subject to having a `const in` shadow variable being inserted for it. Since that shadow variable is inserted on locale 0, it makes sense that `chpl_nodeID` would be 0 when printed from either locale.
Putting in an explicit `ref` intent for the `forall` loop seems to confirm this, resulting in the 0 / 1 values being printed.
So, I'm confused by why the coforall loop doesn't seem to insert the similar shadow variable and get the same output (and, if it did, would this break all of the library code that relies on reasoning about chpl_nodeID?)
Also weird: I'd expect that putting a `with (const in chpl_nodeID)` into the `coforall` loop would symmetrically result in the same behavior as the `forall` loop, yet it doesn't.
I was expecting that the coforall+on optimization might be playing a role here, yet inserting a `writeln()` before the `on`-clause within the coforall doesn't change the behavior.
All of this makes me suspicious that we have some sort of bug or inconsistency in our implementation, though I'm not sure what it is. It also makes me believe that we should remove `chpl_nodeID` since it doesn't behave like a normal Chapel variable, and rely on something like a primitive that returns the current node ID instead. This also makes me curious to understand better when/why `chpl_nodeID` is used in libraries and what it would take to rewrite those to only use user-facing features.
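The difference between a `const in` shadow copy and a direct read of a per-node value, as reasoned about above, can be modeled with a small C toy. This is only a model of the semantics described in the post, not how the Chapel compiler or runtime implements anything: the two "nodes" are simulated in one process, and `node_id_storage`, `current_node`, `read_node_id`, `run_with_shadow_copy`, and `run_with_direct_read` are hypothetical names invented for this sketch.

```c
#include <assert.h>

#define NUM_NODES 2

/* Toy stand-in for a per-node C variable: one slot per simulated node. */
static int node_id_storage[NUM_NODES] = {0, 1};

/* Which simulated node is "executing" right now. */
static int current_node = 0;

/* Reads the per-node value as seen from the currently executing node. */
static int read_node_id(void) {
    assert(current_node >= 0 && current_node < NUM_NODES);
    return node_id_storage[current_node];
}

/* Models a `const in` shadow variable: the value is captured once where
   the loop starts (node 0), and every task sees that captured copy. */
static void run_with_shadow_copy(int results[NUM_NODES]) {
    current_node = 0;
    const int shadow = read_node_id();  /* snapshot taken on node 0 */
    for (int n = 0; n < NUM_NODES; n++) {
        current_node = n;               /* models moving to node n */
        results[n] = shadow;            /* task uses the copy */
    }
    current_node = 0;
}

/* Models a `ref` intent (no copy): each task reads the value on the
   node where it is running. */
static void run_with_direct_read(int results[NUM_NODES]) {
    for (int n = 0; n < NUM_NODES; n++) {
        current_node = n;
        results[n] = read_node_id();
    }
    current_node = 0;
}
```

In this model, `run_with_shadow_copy` fills the results with 0 for both nodes, matching the forall behavior described above, while `run_with_direct_read` yields 0 and 1, matching the explicit `ref` intent case.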