Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inspect Local Variable Representation #93

Closed
CheaterCodes opened this issue Nov 22, 2022 · 5 comments · Fixed by #102
Closed

Inspect Local Variable Representation #93

CheaterCodes opened this issue Nov 22, 2022 · 5 comments · Fixed by #102
Milestone

Comments

@CheaterCodes
Copy link
Contributor

Let's have a look at the following java method:

private void localTest(int parameter1, String parameter2, List<String> parameter3) {
    int local1 = 5;
    System.out.println(local1);
    int local2 = 42;
    System.out.println(local2);
}

The following is the Chasm-Internal representation of said method:

{
	access: 2, 
	name: "localTest", 
	parameters: [{
			type: "I", 
			name: "P0", 
		},{
			type: "Ljava/lang/String;", 
			name: "P1", 
		},{
			type: "Ljava/util/List;", 
			name: "P2", 
		},], 
	returnType: "V", 
	signature: "(ILjava/lang/String;Ljava/util/List<Ljava/lang/String;>;)V", 
	exceptions: [], 
	annotations: [], 
	code: {
		instructions: [{
				label: "L0", 
			},{
				opcode: 8, 
			},{
				opcode: 54, 
				var: "V3", 
			},{
				label: "L1", 
			},{
				opcode: 178, 
				owner: "java/lang/System", 
				name: "out", 
				descriptor: "Ljava/io/PrintStream;", 
			},{
				opcode: 21, 
				var: "V3", 
			},{
				opcode: 182, 
				owner: "java/io/PrintStream", 
				name: "println", 
				descriptor: "(I)V", 
				isInterface: false, 
			},{
				label: "L2", 
			},{
				opcode: 16, 
				operand: 42, 
			},{
				opcode: 54, 
				var: "V12", 
			},{
				label: "L3", 
			},{
				opcode: 178, 
				owner: "java/lang/System", 
				name: "out", 
				descriptor: "Ljava/io/PrintStream;", 
			},{
				opcode: 21, 
				var: "V12", 
			},{
				opcode: 182, 
				owner: "java/io/PrintStream", 
				name: "println", 
				descriptor: "(I)V", 
				isInterface: false, 
			},{
				label: "L4", 
			},{
				opcode: 177, 
			},{
				label: "L5", 
			},], 
		sourceLocals: [{
				name: "this", 
				descriptor: "Lorg/quiltmc/chasm/Main$Inner;", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 0, 
			},{
				name: "parameter1", 
				descriptor: "I", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 1, 
			},{
				name: "parameter2", 
				descriptor: "Ljava/lang/String;", 
				signature: null, 
				start: "L0", 
				end: "L5", 
				index: 2, 
			},{
				name: "parameter3", 
				descriptor: "Ljava/util/List;", 
				signature: "Ljava/util/List<Ljava/lang/String;>;", 
				start: "L0", 
				end: "L5", 
				index: 3, 
			},{
				name: "local1", 
				descriptor: "I", 
				signature: null, 
				start: "L1", 
				end: "L5", 
				index: 4, 
			},{
				name: "local2", 
				descriptor: "I", 
				signature: null, 
				start: "L3", 
				end: "L5", 
				index: 5, 
			},], 
		tryCatchBlocks: [], 
		lineNumbers: [{
				line: 23, 
				label: "L0", 
			},{
				line: 24, 
				label: "L1", 
			},{
				line: 25, 
				label: "L2", 
			},{
				line: 26, 
				label: "L3", 
			},{
				line: 27, 
				label: "L4", 
			},], 
	}, 
}

First, let's observe:

  • The local variable analysis assigns V3 as the name for local1
  • The local variable analysis assigns V12 as the name for local2

However, the debug information provided in sourceLocals is not in any way associated with said locals.
I want to investigate the possibility and usefulness of merging these representations somehow.

One problem is currently that it is very hard to impossible to match a local by it's type, which is very common with mixin.
Another possible problem is duplicate information shared between signature and descriptor.

Please suggest different representations below.

@Earthcomputer
Copy link
Contributor

There is a fundamental problem with completely relying on source locals, is that it's only debugging information not required to run the code. I reckon we should use a best-effort attempt to match the source locals with the bytecode local variables, in this case V3 and V12, duplicating source local information where necessary.

We can create a map from bytecode local name to local metadata.

{
   this: {
      descriptor: "Lorg/quiltmc/chasm/Main$Inner;",
      start: "L0",
      end: "L5",
      name: "this",
   },
   P0: {
      descriptor: "I",
      start: "L0",
      end: "L5",
      name: "parameter1",
   },
   P1: {
      descriptor: "Ljava/lang/String;",
      start: "L0",
      end: "L5",
      name: "parameter2",
   },
   P2: {
      descriptor: "Ljava/util/List;",
      signature: "Ljava/util/List<Ljava/lang/String;>;",
      start: "L0",
      end: "L5",
      name: "parameter3",
   },
   V3: {
      descriptor: "I",
      start: "L1",
      end: "L5",
      name: "local1",
   },
   V12: {
      descriptor: "I",
      start: "L3",
      end: "L5",
      name: "local2",
   }
}

The descriptors above can be computed from the local variable information that chasm already computes, and therefore reliable for correctness. The signatures are debug information only, and therefore not reliable for correctness; signatures can be almost arbitrary strings in the JVM, therefore I do not recommend deduplicating them with the descriptor.

@CheaterCodes
Copy link
Contributor Author

CheaterCodes commented Nov 29, 2022

So I think we can agree that descriptor is crucial information that shold be available.
I also agree that the signature is effectively optional (and formless). Should we just drop it? Or keep it, just in case some tool wants it?
Are labels optional too? I think the point of line numbers is to allow different locals to share the same index. But iirc, this can never happen in Chasm? Should we just omit them?

The main concern I have is with Transformers adding locals. They have to do two things:

  • Insert instructions referencing a local
  • Insert the local type into the map you described

Inserting into a map can currently only be done by replacing the entire map. Is this a problem?
We can work around that by using a list, or add official map-slice-indexing to Chasm (though I'm not sure how that would work).

Alternatively, we can drop the requirement that transformers should specify the descriptor. The easiest way to do this would be to recalculate the locals-map after every round. This does however mean, that locals only become visible in the locals map at the end of a round.

A more complicated approach would be to lazy-evaluate map entries (or provide an intrinsic). I'm not sure if this is beneficial from an API perspective, but it could allow to get locals info even during a round.

@Earthcomputer
Copy link
Contributor

Earthcomputer commented Nov 29, 2022

I don't think we should completely drop debug info, only encourage people not to use it. In other places, the signature can represent the generics of a class or a method, which are actually visible via reflection (although their format is still arbitrary).

As for labels in local variables I'm not sure, maybe they can be dropped.

With transformers adding locals, one other option would be to check that the local variables are well-formed during class writing (conversion back to ASM). This is not the same as class file verification as these locals are not part of the class file format, so therefore is not strictly outside the scope of chasm. But it would force transformers to write to the local type map correctly or else get transformation errors.

Inserting into a map can currently only be done by replacing the entire map. Is this a problem?

Not really. There is already a fairly idiomatic way of adding an entry to a map using the + operator:
m + {foo: "bar"}. This is a trivial case for a chassembly optimizer to convert into mutation.

Alternatively, we can drop the requirement that transformers should specify the descriptor. The easiest way to do this would be to recalculate the locals-map after every round. This does however mean, that locals only become visible in the locals map at the end of a round.

A more complicated approach would be to lazy-evaluate map entries (or provide an intrinsic). I'm not sure if this is beneficial from an API perspective, but it could allow to get locals info even during a round.

The latter approach would bear no behavioural improvement over the former, as an intrinsic returning different outputs with no change to its inputs (in this case no inputs) would be a violation of chassembly's immutability and mess with semantics and the optimizer. What could be done instead is making the local variable information an intrinsic function which takes in a chasm method, which therefore is allowed to return different results if the method changes.

@CheaterCodes
Copy link
Contributor Author

CheaterCodes commented Nov 29, 2022

Not really. There is already a fairly idiomatic way of adding an entry to a map using the + operator

No, you misunderstood: It's straightforward in Chassembly, but on a Chasm level, a Target can either be a single node (i.e. the entire map) or a slice into a list (not appliccable to maps).

@leo60228
Copy link

Mixin also supports targeting by source variable name, which is fairly convenient when mixing into other mods.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants