Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Overhaul typechecks for Mapping/Listing #628

Merged
merged 6 commits into from
Sep 6, 2024

Conversation

bioball
Copy link
Contributor

@bioball bioball commented Aug 15, 2024

This follows this SPICE: apple/pkl-evolution#11

This changes how the language performs typechecks for mappings and
listings.

Currently, Pkl will shallow-force any Mapping and Listing to check
the type parameter (e.g. Listing<Person> means each element is checked
to be an instance of Person).

This changes the language to check each member's type when the member
is accessed.

This PR is designed to be reviewed as two separate commits.
The first one contains business logic that changes how mapping/listing
typechecks work.
The second one adjusts the test runner to handle errors thrown within
tests.

Closes #405
Closes #406

@@ -1,41 +1,367 @@
import "pkl:test"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: these don't amend pkl:test because facts and examples are declared as Mapping<String, Listing<Boolean>> and Mapping<String, Listing<unknown>>. These are typed properties that themselves use Mapping/Listing type checks, and problems with typechecking can quickly be conflated between the tests themselves and the test framework.

@bioball bioball force-pushed the mapping-listing-typechecks branch 7 times, most recently from ce4f445 to 86d6d7b Compare August 16, 2024 18:05
*/
public abstract void execute(VirtualFrame frame, Object value);
public abstract Object execute(VirtualFrame frame, Object value);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is one of the biggest changes. Executing a typecheck will return a type-casted value.

For most types, this simply returns value if the value conforms to the type (e.g. Int). For Listing and Mapping, this returns a surrogate listing/mapping object. Other parameterized types may also return a new object if the parameterized type is Listing or Mapping (e.g. List<Listing<Int>>).

}
}

public abstract static class ListTypeNode extends ObjectSlotTypeNode {
public static final class ListTypeNode extends ObjectSlotTypeNode {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truffle's DSL was overriding executeEagerly when making this an abstract class and using specializations. I'm not sure if I'm doing this wrong?

But, regardless, having specializations here doesn't seem that useful anyway? Reading the same member value skips the typecheck the second time around because it gets cached. And reading a new value needs an instanceof call and will de-specialize.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be related to my lambda remark above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so; I'm referring to the generated code that Truffle produces when you have an abstract class with specializations, which is a different concern than partial evaluation?

this.insertedStackFrames = new HashMap<>();
}
this.insertedStackFrames.put(callTarget, stackFrame);
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this so we can allow a VmTypeMismatchException to blame a source section when a typecheck fails (see ListingOrMappingTypeNode#doEagerCheck)

var alreadyVisited = !visited.add(key);
// important to record hidden member as visited before skipping it
// because any overriding member won't carry a `hidden` identifier
if (alreadyVisited || member.isLocalOrExternalOrHidden()) return true;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This implementation is changed because iterateMemberValues might cause a VmMapping/VmListing to throw, because VmListingOrMapping#getCachedValue() can possibly perform an additional typecheck. Mapping#length should not need to execute any members, so, it should not need to run typechecks.

This changes how the language performs typechecks for mappings and
listings.

Currently, Pkl will shallow-force any Mapping and Listing to check it
the type parameter (e.g. Listing<Person> means each element is checked
to be an instance of Person).

This changes the language to check each member's type when the member
is accessed.

Note: List/Map/Set are still eager checks, because these types are not
VmObject and not lazy in the first place.

Note 2: Unions of Listings or Mappings are still eager (we must force
the listing to know if it satisfies Listing<A>|Listing<B>).
With the change to make mapping/listing typechecks lazy, we can now
correctly handle thrown errors from within a single test case.

This adjusts the test runner to consider any thrown errors as a failure
for that specific test case.
Copy link
Contributor

@stackoverflow stackoverflow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice solution! Less code than I expected. Just have some small things.

ctx.objectBody(),
new GetParentForTypeNode(
createSourceSection(ctx),
visitType(typeCtx),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't you use parentType here as you already visited it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -45,10 +47,10 @@ public Object executeGeneric(VirtualFrame frame) {
unresolvedTypeNode = null;
}

// TODO: throw if typeNode is FunctionTypeNode (it's impossible to check)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea to create an issue for that (if it doesn't exist yet) and link it here for tracking.

var seenCollections = new MutableLong(0);
this.acceptTypeNode(
(typeNode) -> {
if (typeNode instanceof ListingTypeNode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: if TypeNode had a isParametric property you could just check for that and store the Class in a set and check if the class is not in the set or return false.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea!

for (var elem : value) {
elementTypeNode.execute(frame, elem);
for (var elem : vmList) {
elementTypeNode.executeEagerly(frame, elem);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason why you are not checking the return of executeEargerly to check if the value didn't change, like in execute?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

executeEagerly is a type test rather than a type cast. It will never return a different value; instead, it will throw if the type test fails.

}
)
module.catch(() -> l[0][1])
module.catch(() -> l[0][2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you really want to check l[0][1] and l[0][2] here instead of [0][0] and [0][1]? I don't think the index out of bounds check is needed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, good catch!

@@ -0,0 +1,4 @@
// ensure that this member is only evaluated once (should result in only one trace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: all other tests are in the singular (listing4/5/6.pkl), only this one is plural.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor

@holzensp holzensp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock, but some significant questions; especially memory pressure!

@@ -107,6 +107,138 @@ class CliTestRunnerTest {
assertThat(err.toString()).isEqualTo("")
}

@Test
fun `CliTestRunner thrown error in facts`(@TempDir tempDir: Path) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fun `CliTestRunner thrown error in facts`(@TempDir tempDir: Path) {
fun `CliTestRunner throws error in facts`(@TempDir tempDir: Path) {

?

[throughout]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it's not CliTestRunner that throws the error. I'll change this to CliTestRunner with thrown error in facts

Comment on lines 138 to 145
if (isEntry()) {
while (ch == ']' || ch == '=' || Character.isWhitespace(ch)) {
ch = text.charAt(++skip);
}
} else {
while (ch == '=' || Character.isWhitespace(ch)) {
ch = text.charAt(++skip);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (isEntry()) {
while (ch == ']' || ch == '=' || Character.isWhitespace(ch)) {
ch = text.charAt(++skip);
}
} else {
while (ch == '=' || Character.isWhitespace(ch)) {
ch = text.charAt(++skip);
}
while ((ch == ']' && isEntry()) || ch == '=' || Character.isWhitespace(ch)) {
ch = text.charAt(++skip);
}

Comment on lines 287 to 290
@Override
protected void acceptTypeNode(Consumer<TypeNode> visitor) {
visitor.accept(this);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to put this default implementation in TypeNode and reduce some of this duplication (only override when visiting is something other than visitor.accept())?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing it this way is less error prone--having a default implementation made it too easy to miss if something did need to change for any one TypeNode.

if (skipElementTypeChecks) return;
protected void acceptTypeNode(Consumer<TypeNode> visitor) {
visitor.accept(this);
//noinspection ForLoopReplaceableByForEach
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason not to use foreach?

Copy link
Contributor Author

@bioball bioball Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; writing it this way allows Truffle to unroll this loop into a series of statements. See ExplodeLoop for details.

var seenPairs = new MutableLong(0);
var seenCollections = new MutableLong(0);
this.acceptTypeNode(
(typeNode) -> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually avoid lambdas in any FooNode, because of the partial evaluation of node.execute(). Did you inspect this with IdealGraphVisualizer to confirm this one's okay?

Besides; this visits exhaustively, even when any of these observations hits >= 2. Is it worth pruning that (let visitors return a boolean to mark they want to continue; instead of passing in a lambda, define a class with this logic, create and instance and pass a method reference on said instance as the visitor)?

Copy link
Contributor Author

@bioball bioball Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a truffle boundary here, so there's no impact on partial evaluation/inlining.

instead of passing in a lambda, define a class with this logic, create and instance and pass a method reference on said instance as the visitor)

Not sure what the benefit of this is? The current function syntax is sugar for new Consumer<TypeNode>(), and passing that as a reference.

But, fair point on returning early, will implement!

if (typeNode instanceof ListingTypeNode) {
seenListings.getAndIncrement();
}
if (typeNode instanceof MappingTypeNode) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be marginal and I see the point of easy translation to a switch, but that's still far off... else if?

}
}

public abstract static class ListTypeNode extends ObjectSlotTypeNode {
public static final class ListTypeNode extends ObjectSlotTypeNode {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be related to my lambda remark above.

throw typeMismatch(value, BaseModule.getListClass());
@Override
public Object execute(VirtualFrame frame, Object value) {
if (!(value instanceof VmList vmList)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a newly introduced scenario, or a previously missed one?

Copy link
Contributor Author

@bioball bioball Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is neither; I rewrote this class to remove specializations (see comment here: https://github.com/apple/pkl/pull/628/files#r1720334213)

Comment on lines 31 to 32
/** The original that this listing/mapping is a surrogate of. It might have its own surrogatee. */
private final @Nullable SELF surrogatee;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the surrogate(e) naming; what about s/surrogate/typeCheckedThis and s/surrogatee/uncheckedThis?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about "delegate"? That seems like a more common term for this type of thing.

Comment on lines 32 to 36
private final @Nullable SELF surrogatee;

private final ListingOrMappingTypeCheckNode typeCheckNode;
private final MaterializedFrame typeNodeFrame;
private final EconomicMap<Object, ObjectMember> cachedMembers = EconomicMaps.create();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little worried about the memory load this imposes (2x this for all Listings and Mappings; especially the cachedMembers). Have you measured the cost of this on a significant corpus?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only introduces new cached members on some mappings and listings.

For example, this still contains 2 cached members, because foo is hidden and never executed:

hidden foo: Mapping = new {
  ["first"] = 1
  ["second"] = 2
}

bar: Mapping<String, Int> = foo

Same here:

hidden foo: Mapping<String, Int> = new {
  ["first"] = 1
}

However, this example means we get four cached members, because foo is not hidden:

foo: Mapping = new {
  ["first"] = 1
  ["second"] = 2
}

bar: Mapping<String, Int> = foo

We can certainly profile this on some real-world projects to see what this change looks like.

Copy link
Contributor Author

@bioball bioball Sep 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added an additional optimization here: now, we will only store cached members on the delegated object if the typecast results in a new object.

This results in two cached members:

foo: Mapping = new {
  ["first"] = 1
  ["second"] = 2
}

bar: Mapping<String, Int> = foo

This results in four cached members, due to the Listing to Listing<Int> typecast:

foo: Mapping<Listing> = new {
  ["first"] = new { 1 }
  ["second"] = new { 2 }
}

bar: Mapping<String, Listing<Int>> = foo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added another optimization: A typecast will only return a new delegating object if the types are different.

This results in no delegation:

foo: Listing<String>

bar = foo as Listing<String> // no delegation here because `foo` is already `Listing<String>`

* Don't create a delegated VmListingOrMapping if the element type check is the same (don't create a new object for: `new Listing<String> {} as Listing<String>`)
* Don't store cached values on a delegated object unless a typecast returns a different value
@bioball bioball merged commit 7868d9d into apple:main Sep 6, 2024
5 checks passed
@bioball bioball deleted the mapping-listing-typechecks branch September 6, 2024 22:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants