Inspect Local Variable Representation #93

CheaterCodes · 2022-11-22T15:06:16Z

Let's have a look at the following java method:

private void localTest(int parameter1, String parameter2, List<String> parameter3) {
    int local1 = 5;
    System.out.println(local1);
    int local2 = 42;
    System.out.println(local2);
}

The following is the Chasm-Internal representation of said method:

{
	access: 2, 
	name: "localTest", 
	parameters: [{
			type: "I", 
			name: "P0", 
		},{
			type: "Ljava/lang/String;", 
			name: "P1", 
		},{
			type: "Ljava/util/List;", 
			name: "P2", 
		},], 
	returnType: "V", 
	signature: "(ILjava/lang/String;Ljava/util/List<Ljava/lang/String;>;)V", 
	exceptions: [], 
	annotations: [], 
	code: {
		instructions: [{
				label: "L0", 
			},{
				opcode: 8, 
			},{
				opcode: 54, 
				var: "V3", 
			},{
				label: "L1", 
			},{
				opcode: 178, 
				owner: "java/lang/System", 
				name: "out", 
				descriptor: "Ljava/io/PrintStream;", 
			},{
				opcode: 21, 
				var: "V3", 
			},{
				opcode: 182, 
				owner: "java/io/PrintStream", 
				name: "println", 
				descriptor: "(I)V", 
				isInterface: false, 
			},{
				label: "L2", 
			},{
				opcode: 16, 
				operand: 42, 
			},{
				opcode: 54, 
				var: "V12", 
			},{
				label: "L3", 
			},{
				opcode: 178, 
				owner: "java/lang/System", 
				name: "out", 
				descriptor: "Ljava/io/PrintStream;", 
			},{
				opcode: 21, 
				var: "V12", 
			},{
				opcode: 182, 
				owner: "java/io/PrintStream", 
				name: "println", 
				descriptor: "(I)V", 
				isInterface: false, 
			},{
				label: "L4", 
			},{
				opcode: 177, 
			},{
				label: "L5", 
			},], 
		sourceLocals: [{
				name: "this", 
				descriptor: "Lorg/quiltmc/chasm/Main$Inner;", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 0, 
			},{
				name: "parameter1", 
				descriptor: "I", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 1, 
			},{
				name: "parameter2", 
				descriptor: "Ljava/lang/String;", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 2, 
			},{
				name: "parameter3", 
				descriptor: "Ljava/util/List;", 
				signature: "Ljava/util/List<Ljava/lang/String;>;", 
				start: "L0", 
				end: "L5", 
				index: 3, 
			},{
				name: "local1", 
				descriptor: "I", 
				signature: null, 
				start: "L1", 
				end: "L5", 
				index: 4, 
			},{
				name: "local2", 
				descriptor: "I", 
				signature: null, 
				start: "L3", 
				end: "L5", 
				index: 5, 
			},], 
		tryCatchBlocks: [], 
		lineNumbers: [{
				line: 23, 
				label: "L0", 
			},{
				line: 24, 
				label: "L1", 
			},{
				line: 25, 
				label: "L2", 
			},{
				line: 26, 
				label: "L3", 
			},{
				line: 27, 
				label: "L4", 
			},], 
	}, 
}

First, let's observe:

The local variable analysis assigns V3 as the name for local1
The local variable analysis assigns V12 as the name for local2

However, the debug information provided in sourceLocals is not in any way associated with said locals.
I want to investigate the possibility and usefulness of merging these representations somehow.

One problem is currently that it is very hard to impossible to match a local by it's type, which is very common with mixin.
Another possible problem is duplicate information shared between signature and descriptor.

Please suggest different representations below.

The text was updated successfully, but these errors were encountered:

Earthcomputer · 2022-11-27T13:10:55Z

There is a fundamental problem with completely relying on source locals, is that it's only debugging information not required to run the code. I reckon we should use a best-effort attempt to match the source locals with the bytecode local variables, in this case V3 and V12, duplicating source local information where necessary.

We can create a map from bytecode local name to local metadata.

{
   this: {
      descriptor: "Lorg/quiltmc/chasm/Main$Inner;",
      start: "L0",
      end: "L5",
      name: "this",
   },
   P0: {
      descriptor: "I",
      start: "L0",
      end: "L5",
      name: "parameter1",
   },
   P1: {
      descriptor: "Ljava/lang/String;",
      start: "L0",
      end: "L5",
      name: "parameter2",
   },
   P2: {
      descriptor: "Ljava/util/List;",
      signature: "Ljava/util/List<Ljava/lang/String;>;",
      start: "L0",
      end: "L5",
      name: "parameter3",
   },
   V3: {
      descriptor: "I",
      start: "L1",
      end: "L5",
      name: "local1",
   },
   V12: {
      descriptor: "I",
      start: "L3",
      end: "L5",
      name: "local2",
   }
}

The descriptors above can be computed from the local variable information that chasm already computes, and therefore reliable for correctness. The signatures are debug information only, and therefore not reliable for correctness; signatures can be almost arbitrary strings in the JVM, therefore I do not recommend deduplicating them with the descriptor.

CheaterCodes · 2022-11-29T13:20:05Z

So I think we can agree that descriptor is crucial information that shold be available.
I also agree that the signature is effectively optional (and formless). Should we just drop it? Or keep it, just in case some tool wants it?
Are labels optional too? I think the point of line numbers is to allow different locals to share the same index. But iirc, this can never happen in Chasm? Should we just omit them?

The main concern I have is with Transformers adding locals. They have to do two things:

Insert instructions referencing a local
Insert the local type into the map you described

Inserting into a map can currently only be done by replacing the entire map. Is this a problem?
We can work around that by using a list, or add official map-slice-indexing to Chasm (though I'm not sure how that would work).

Alternatively, we can drop the requirement that transformers should specify the descriptor. The easiest way to do this would be to recalculate the locals-map after every round. This does however mean, that locals only become visible in the locals map at the end of a round.

A more complicated approach would be to lazy-evaluate map entries (or provide an intrinsic). I'm not sure if this is beneficial from an API perspective, but it could allow to get locals info even during a round.

Earthcomputer · 2022-11-29T14:56:46Z

I don't think we should completely drop debug info, only encourage people not to use it. In other places, the signature can represent the generics of a class or a method, which are actually visible via reflection (although their format is still arbitrary).

As for labels in local variables I'm not sure, maybe they can be dropped.

With transformers adding locals, one other option would be to check that the local variables are well-formed during class writing (conversion back to ASM). This is not the same as class file verification as these locals are not part of the class file format, so therefore is not strictly outside the scope of chasm. But it would force transformers to write to the local type map correctly or else get transformation errors.

Inserting into a map can currently only be done by replacing the entire map. Is this a problem?

Not really. There is already a fairly idiomatic way of adding an entry to a map using the + operator:
m + {foo: "bar"}. This is a trivial case for a chassembly optimizer to convert into mutation.

Alternatively, we can drop the requirement that transformers should specify the descriptor. The easiest way to do this would be to recalculate the locals-map after every round. This does however mean, that locals only become visible in the locals map at the end of a round.

A more complicated approach would be to lazy-evaluate map entries (or provide an intrinsic). I'm not sure if this is beneficial from an API perspective, but it could allow to get locals info even during a round.

The latter approach would bear no behavioural improvement over the former, as an intrinsic returning different outputs with no change to its inputs (in this case no inputs) would be a violation of chassembly's immutability and mess with semantics and the optimizer. What could be done instead is making the local variable information an intrinsic function which takes in a chasm method, which therefore is allowed to return different results if the method changes.

CheaterCodes · 2022-11-29T15:15:56Z

Not really. There is already a fairly idiomatic way of adding an entry to a map using the + operator

No, you misunderstood: It's straightforward in Chassembly, but on a Chasm level, a Target can either be a single node (i.e. the entire map) or a slice into a list (not appliccable to maps).

leo60228 · 2023-01-13T04:40:31Z

Mixin also supports targeting by source variable name, which is fairly convenient when mixing into other mods.

CheaterCodes mentioned this issue Dec 30, 2022

Local Variable Representation #9

Closed

CheaterCodes added this to the Chasm Release milestone Jan 11, 2023

CheaterCodes mentioned this issue Jan 13, 2023

Ignore parameter debug names for now #99

Closed

This was referenced Jan 21, 2023

Empty slices in maps #100

Open

New locals representation #102

Merged

CheaterCodes closed this as completed in #102 Jan 23, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Inspect Local Variable Representation #93

Inspect Local Variable Representation #93

CheaterCodes commented Nov 22, 2022

Earthcomputer commented Nov 27, 2022

CheaterCodes commented Nov 29, 2022 •

edited

Earthcomputer commented Nov 29, 2022 •

edited

CheaterCodes commented Nov 29, 2022 •

edited

leo60228 commented Jan 13, 2023

Inspect Local Variable Representation #93

Inspect Local Variable Representation #93

Comments

CheaterCodes commented Nov 22, 2022

Earthcomputer commented Nov 27, 2022

CheaterCodes commented Nov 29, 2022 • edited

Earthcomputer commented Nov 29, 2022 • edited

CheaterCodes commented Nov 29, 2022 • edited

leo60228 commented Jan 13, 2023

CheaterCodes commented Nov 29, 2022 •

edited

Earthcomputer commented Nov 29, 2022 •

edited

CheaterCodes commented Nov 29, 2022 •

edited