Use LLVM unnamed local identifiers #153

jclark · 2021-06-28T06:24:41Z

At the moment, llvm.nback uses a named LLVM local identifier starting with _ (e.g. %_17) when the API user does not specify a name argument. It would be better for it to use an unnamed LLVM identifier (e.g. %17). This would improve the similarity of code generated using llvm.nback and using #19.

This isn't as simple as just leaving out the _: LLVM has a specific method of numbering basic blocks and variables that the use of unnamed identifiers must comply with.

heshanpadmasiri · 2021-07-08T07:10:14Z

@jclark if I understand correctly how LLVM does this, it starts numbering parameters from 0, then skips the next number and starts numbering variables. It gives the next number to the next basic block. For example:

define internal void @_B_foo(i64 %0) {
  %2 = alloca i8, align 1
  %3 = load i8*, i8** @_bal_stack_guard, align 8
  %4 = icmp ult i8* %2, %3
  br i1 %4, label %9, label %5

5:                                                ; preds = %1
  %6 = call i8* @_bal_alloc(i64 8)
  %7 = bitcast i8* %6 to i64*
  store i64 %0, i64* %7, align 8
  %8 = getelementptr i8, i8* %6, i64 504403158265495552
  call void @_Bio__println(i8* %8)
  ret void

9:                                                ; preds = %1
  call void @_bal_panic(i64 1796)
  unreachable
}

It starts naming parameters from 0. Then it skips 1 and names the variables in the first basic block 2 to 4. It names the next basic block 5 and its variables 6 to 8, and so on. (It uses a single object to generate names for variables and basic blocks in the print stage.)
While this is easy in the LLVM API, where converting instructions to strings is delayed until the end, we have a problem: the user can add a basic block at any time, and we need to give names to variables as soon as they are defined. I think the easiest way to make our output similar, without substantial changes to how our API works internally, would be to give all our basic blocks and variables uniquely identifiable names (similar to what we do now) and then correct the names just before creating the output (inside the output function of basic blocks). To make sure we don't affect variable names the user may give in build commands, we need to impose a restriction so that they can't use names that look like the temporary names we issue to unnamed variables and basic blocks. (At the moment I am thinking of preventing the user from giving names starting with _.) Is this okay?
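The numbering scheme and the "rename just before output" idea described above can be sketched as follows. This is only an illustration in Python, not the project's code: the `Block` type, the `assign_numbers` function, and the temporary names (`p0`, `entry`, `bb1`, ...) are all hypothetical. It assumes a single counter shared by parameters, block labels, and value-producing instructions, which is consistent with the example IR above.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    """A basic block with a temporary label and its instructions (hypothetical type)."""
    temp_label: str
    # Each instruction is (temporary result name or None, description).
    insns: list = field(default_factory=list)


def assign_numbers(param_temps, blocks):
    """Map temporary names to sequential numbers in order of appearance."""
    numbering = {}
    counter = 0
    for temp in param_temps:
        numbering[temp] = counter
        counter += 1
    for block in blocks:
        # Every block label consumes a number, including the entry block,
        # whose label is not printed (this is the "skipped" number).
        numbering[block.temp_label] = counter
        counter += 1
        for result, _ in block.insns:
            if result is not None:  # instructions without a result get no number
                numbering[result] = counter
                counter += 1
    return numbering


# The function from the example above, with made-up temporary names.
blocks = [
    Block("entry", [("a", "alloca"), ("b", "load"), ("c", "icmp"), (None, "br")]),
    Block("bb1", [("d", "call"), ("e", "bitcast"), (None, "store"),
                  ("f", "getelementptr"), (None, "call void"), (None, "ret")]),
    Block("bb2", [(None, "call void"), (None, "unreachable")]),
]
numbering = assign_numbers(["p0"], blocks)
```

With this input the parameter maps to 0, the entry block to 1, the first block's values to 2 through 4, `bb1` to 5, its values to 6 through 8, and `bb2` to 9, matching the labels in the IR above. Because the map is computed only at output time, blocks can still be added in any order before then.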

jclark · 2021-07-08T07:35:16Z

No, that's not okay: Ballerina variable names can be arbitrary strings (because of escapes and quoting), and we need to be able to handle that.

You should transform the user-specified name into a valid LLVM identifier by using quotes and \. LLVM identifier rules are here: https://llvm.org/docs/LangRef.html#identifiers (The right way to do this for strings that contain code points >= 128 is to transform to UTF-8 first by using string:toBytes)

So a user specified name of 1 would turn into an identifier "%1" with quotes. Then you use %1 for unnamed variables, which won't conflict.

jclark · 2021-07-08T07:36:58Z

Check what happens in the JNI API if you specify a name parameter starting with a digit e.g. 123: I am guessing it will turn into "%123" or maybe something with escapes.

heshanpadmasiri · 2021-07-08T08:31:08Z

It will become %"123". This is because you can't have a digit as the first character (it's fine for something like v1), so it'll use quotes. Following the same logic, I'll use names of the form %*1 for the initial temporary names (if the user gives the name "*1" it'll end up as %"*1", since a name can't start with * either) and then replace them inside the output function.

jclark · 2021-07-08T09:51:25Z

Can't you just use %1 as the temporary name?

In print.llvm, if the user specifies 1 or *1 as the name parameter, you need to transform it to "1" or "*1".
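A minimal sketch of the quoting rule discussed above, written in Python for illustration only. The function name `local_identifier` is hypothetical and this is not the print.llvm implementation. It assumes the identifier rules from the LangRef page linked earlier: a name matching `[-a-zA-Z$._][-a-zA-Z$._0-9]*` can be written bare, anything else goes in quotes, and special bytes are written as a backslash followed by two hex digits. Escaping every byte outside printable ASCII after UTF-8 encoding is a choice made for this sketch.

```python
import re

# Names matching this pattern need no quotes (per the LangRef identifier rules).
_PLAIN_NAME = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


def local_identifier(name: str) -> str:
    """Return a %-prefixed local identifier for an arbitrary user-supplied name."""
    if _PLAIN_NAME.fullmatch(name):
        return "%" + name
    out = []
    # Encode to UTF-8 first so code points >= 128 become escaped bytes.
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        # Escape the quote, the backslash, and anything outside printable ASCII.
        if ch in '"\\' or byte < 0x20 or byte >= 0x7F:
            out.append("\\%02X" % byte)
        else:
            out.append(ch)
    return '%"' + "".join(out) + '"'
```

Under these assumptions a user name of `1` becomes `%"1"` and `*1` becomes `%"*1"`, so neither can collide with the unnamed identifier `%1`.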

jclark · 2021-07-09T05:23:19Z

I would suggest you represent each line as an array of strings, where each string is either

1. a maximal chunk of the output line, not including any unnamed variables, or
2. just an unnamed variable

At the output stage, you just transform strings of type 2 via a lookup in a map and concatenate the members of the array.

This should hopefully reduce the performance hit from implementing this issue.
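The chunked-line representation suggested above might look roughly like this. It is a Python sketch under stated assumptions, not the merged implementation: the `Unnamed` wrapper and `render_line` are hypothetical names, and a small wrapper type is used here to tell placeholders apart from literal chunks, whereas the suggestion itself describes plain strings of two kinds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Unnamed:
    """Placeholder for an unnamed value, identified by a temporary id (hypothetical)."""
    temp_id: int


def render_line(chunks, numbering):
    """Concatenate a line's chunks, resolving each placeholder through the map."""
    parts = []
    for chunk in chunks:
        if isinstance(chunk, Unnamed):
            # One dictionary lookup per placeholder; literal text is never rescanned.
            parts.append("%" + str(numbering[chunk.temp_id]))
        else:
            parts.append(chunk)
    return "".join(parts)


# One instruction stored as literal chunks interleaved with placeholders.
line = ["  ", Unnamed(7), " = icmp ult i8* ", Unnamed(3), ", ", Unnamed(5)]
# Final numbers, decided only at output time.
numbering = {3: 2, 5: 3, 7: 4}
rendered = render_line(line, numbering)
```

Here `rendered` is `  %4 = icmp ult i8* %2, %3`. The cost of the final pass is one lookup per placeholder plus a single join per line, rather than a search-and-replace over the generated text.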

jclark added this to the Subset 3 milestone Jun 28, 2021

jclark assigned heshanpadmasiri Jun 28, 2021

jclark modified the milestones: Subset 3, Subset 3 improved Jul 5, 2021

heshanpadmasiri mentioned this issue Jul 12, 2021

Use LLVM unnamed local identifiers #208

Merged

jclark modified the milestones: Subset 3 improved, Subset 4 Jul 13, 2021

jclark closed this as completed in #208 Jul 13, 2021

jclark pushed a commit that referenced this issue Jul 13, 2021

Use LLVM unnamed local identifiers (#208)

289df74

Fixes #153 #202

heshanpadmasiri mentioned this issue Jul 13, 2021

Use LLVM unnamed local identifiers fix #219

Merged

jclark pushed a commit that referenced this issue Jul 13, 2021

Use LLVM unnamed local identifiers fix (#219)

ee5cd55

Fixes #153, #202
