Use LLVM unnamed local identifiers #153

Closed
jclark opened this issue Jun 28, 2021 · 6 comments · Fixed by #208 or #219

@jclark
Contributor

jclark commented Jun 28, 2021

At the moment, llvm.nback uses a named LLVM local identifier starting with _ (e.g. %_17) when the API user does not specify a name argument. It would be better for it to use an unnamed LLVM identifier (e.g. %17). This would improve the similarity of code generated using llvm.nback and using #19.

This isn't as simple as just leaving out the _: LLVM has a specific scheme for numbering basic blocks and variables, and any use of unnamed identifiers must comply with it.

@jclark jclark added this to the Subset 3 milestone Jun 28, 2021
@jclark jclark modified the milestones: Subset 3, Subset 3 improved Jul 5, 2021
@heshanpadmasiri
Member

@jclark, if I understand LLVM's scheme correctly, it numbers the parameters starting from 0, then skips the next number and starts numbering variables. The next number after those goes to the following basic block. For example:

define internal void @_B_foo(i64 %0) {
  %2 = alloca i8, align 1
  %3 = load i8*, i8** @_bal_stack_guard, align 8
  %4 = icmp ult i8* %2, %3
  br i1 %4, label %9, label %5

5:                                                ; preds = %1
  %6 = call i8* @_bal_alloc(i64 8)
  %7 = bitcast i8* %6 to i64*
  store i64 %0, i64* %7, align 8
  %8 = getelementptr i8, i8* %6, i64 504403158265495552
  call void @_Bio__println(i8* %8)
  ret void

9:                                                ; preds = %1
  call void @_bal_panic(i64 1796)
  unreachable
}

It starts numbering parameters from 0. It then skips 1 and numbers the variables in the first basic block 2 to 4. The next basic block is named 5, its variables are numbered 6 to 8, and so on. (It uses a single object to generate names for both variables and basic blocks at the print stage.)
This is easy in the LLVM API, which delays converting instructions to strings until the end. We have a problem: the user can add a basic block at any time, and we need to name variables as soon as they are defined. I think the easiest way to make our output similar, without substantial changes to how our API works internally, is to give all basic blocks and variables unique temporary names (similar to what we do now) and fix up the names just before creating the output (inside the output function of basic blocks). To make sure we don't affect variable names the user passes to build commands, we would need to restrict user names so they can't collide with the temporary names we issue to unnamed variables and basic blocks. (At the moment I am thinking of preventing users from giving names starting with _.) Is this okay?
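
The numbering described above can be sketched in a few lines of Python. This is only an illustration, not the llvm.nback implementation: the function name `number_unnamed` and the placeholder ids are hypothetical, and it assumes that the skipped number is the implicit label of the entry block, as the `; preds = %1` comments in the example suggest.

```python
def number_unnamed(params, blocks):
    """Assign LLVM-style numbers to unnamed parameters, blocks and values.

    params: placeholder ids of the unnamed parameters, in order.
    blocks: (block_id, value_ids) pairs in output order, where value_ids
            lists only the unnamed, value-producing instructions of the block.
    Named items and instructions without a result (store, br, ret, void
    calls) are assumed to be left out by the caller: they take no number.
    """
    numbering = {}
    counter = 0  # one counter shared by parameters, blocks and values
    for param in params:
        numbering[param] = counter
        counter += 1
    for block_id, value_ids in blocks:
        # The block itself takes the next number, including the entry block.
        numbering[block_id] = counter
        counter += 1
        for value_id in value_ids:
            numbering[value_id] = counter
            counter += 1
    return numbering


# Same shape as @_B_foo above: one parameter, then blocks with 3, 3 and 0 values.
example = number_unnamed(
    ["p0"],
    [("entry", ["a", "b", "c"]), ("bb1", ["d", "e", "f"]), ("bb2", [])],
)
```

With this input the parameter gets 0, the entry block 1, its values 2 to 4, the second block 5, its values 6 to 8, and the last block 9, which matches the listing.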

@jclark
Contributor Author

jclark commented Jul 8, 2021

No, that's not okay: Ballerina variable names can be arbitrary strings (because of escapes and quoting), and we need to be able to handle that.

You should transform the user-specified name into a valid LLVM identifier by using quotes and \. LLVM identifier rules are here: https://llvm.org/docs/LangRef.html#identifiers (The right way to do this for strings that contain code points >= 128 is to transform to UTF-8 first by using string:toBytes)

So a user specified name of 1 would turn into an identifier "%1" with quotes. Then you use %1 for unnamed variables, which won't conflict.
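
A rough Python sketch of that transformation, for illustration only. The helper name `llvm_local_name` is hypothetical. It assumes the LangRef rule that an unquoted name must match `[-a-zA-Z$._][-a-zA-Z$._0-9]*`, and it chooses to write every byte that is not printable ASCII, plus `"` and `\`, as a two-digit hex escape of the UTF-8 encoding.

```python
import re

# Names matching this pattern can be written without quotes.
_PLAIN_NAME = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


def llvm_local_name(name):
    """Turn an arbitrary user-specified name into an LLVM local identifier."""
    if _PLAIN_NAME.fullmatch(name):
        return "%" + name
    escaped = []
    for byte in name.encode("utf-8"):
        # Keep printable ASCII as is, except the quote and the backslash.
        if 0x20 <= byte < 0x7F and byte not in (0x22, 0x5C):
            escaped.append(chr(byte))
        else:
            escaped.append("\\%02X" % byte)
    return '%"' + "".join(escaped) + '"'
```

Under these assumptions a user name of `1` becomes `%"1"`, so a bare `%1` stays free for unnamed variables.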

@jclark
Contributor Author

jclark commented Jul 8, 2021

Check what happens in the JNI API if you specify a name parameter starting with a digit e.g. 123: I am guessing it will turn into "%123" or maybe something with escapes.

@heshanpadmasiri
Member

It will become %"123". This is because a name can't have a digit as its first character (something like v1 is fine), so it uses quotes. Following the same logic, I'll use names of the form %*1 for the initial temporary names (if a user gives the name "*1", it will end up as %"*1", since a name can't start with * either) and then replace them inside the output function.

@jclark
Contributor Author

jclark commented Jul 8, 2021

Can't you just use %1 as the temporary name?

In print.llvm, if the user specifies 1 or *1 as the name parameter, you need to transform it to "1" or "*1".

@jclark
Contributor Author

jclark commented Jul 9, 2021

I would suggest you represent each line as an array of strings, where each string is either

  1. a maximal chunk of the output line, not including any unnamed variables, or
  2. just an unnamed variable

At the output stage, you just transform strings of type 2 via a lookup in a map and concatenate the members of the array.

This should hopefully reduce the performance hit from implementing this issue.
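
A minimal Python sketch of this representation, under stated assumptions: temporary names have the form `%N`, user names that look like numbers are always quoted (so a chunk can never be exactly equal to a temporary name by accident), and `render_line` is a hypothetical helper rather than part of llvm.nback.

```python
def render_line(chunks, final_names):
    """Concatenate the chunks of one output line.

    chunks: list of strings; each is either literal text or exactly one
            temporary name of an unnamed variable or basic block.
    final_names: map from temporary name to the final LLVM name.
    """
    # Chunks found in the map are unnamed variables; all others are literal.
    return "".join(final_names.get(chunk, chunk) for chunk in chunks)


# Temporary names were handed out in creation order; the final names are
# only known once the whole function has been built.
final_names = {"%0": "%2", "%1": "%3", "%2": "%4"}
line = ["  ", "%2", " = icmp ult i8* ", "%0", ", ", "%1"]
rendered = render_line(line, final_names)
# rendered == "  %4 = icmp ult i8* %2, %3"
```

Each line then costs one map lookup per chunk at output time, with no need to search for and replace temporary names inside already formatted strings.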
