Enhance mangling to handle module names #448

Closed
jclark opened this issue Sep 14, 2021 · 14 comments

jclark commented Sep 14, 2021

For subset 8, we need to be able to handle public names with

  • an org that is either empty or ballerina
  • arbitrary list of module names

We need to distinguish these from non-public names.

This is part of #438.

jclark added this to the Subset 8 milestone Sep 14, 2021
jclark mentioned this issue Sep 14, 2021
jclark commented Sep 14, 2021

C++ mangling rules here: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#mangling

jclark commented Sep 14, 2021

Unquoted identifiers in LLVM must match [-a-zA-Z$._][-a-zA-Z$._0-9]*

https://llvm.org/docs/LangRef.html#identifiers
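A minimal sketch of this check in Python. It assumes the regex describes the part of the name after the `@` or `%` sigil, as in the LangRef. The function name `is_unquoted_llvm_ident` is hypothetical. Names that fail the check would have to be emitted in LLVM's quoted form, e.g. `@"foo bar"`.

```python
import re

# Body of an unquoted LLVM identifier (the part after the "@" or "%" sigil).
# Inside a character class, a leading "-", "$" and "." are all literals.
LLVM_UNQUOTED_IDENT = re.compile(r"[-a-zA-Z$._][-a-zA-Z$._0-9]*")


def is_unquoted_llvm_ident(name: str) -> bool:
    """Return True if `name` can follow @ or % without quoting."""
    return LLVM_UNQUOTED_IDENT.fullmatch(name) is not None


if __name__ == "__main__":
    print(is_unquoted_llvm_ident("_B2io__main"))  # True
    print(is_unquoted_llvm_ident("1abc"))         # False: leading digit
    print(is_unquoted_llvm_ident("foo bar"))      # False: contains a space
```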

jclark commented Sep 14, 2021

We will rely on the proposed spec restrictions here: ballerina-platform/ballerina-spec#791 (comment)

jclark commented Sep 14, 2021

What we are currently doing is described here: #43 (comment)

jclark commented Sep 14, 2021

GNU as syntax for names is described here: https://sourceware.org/binutils/docs/as/Symbol-Names.html#Symbol-Names

Symbol names begin with a letter or with one of ‘._’. On most machines, you can also use $ in symbol names; exceptions are noted in Machine Dependencies. That character may be followed by any string of digits, letters, dollar signs (unless otherwise noted for a particular target machine), and underscores.
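Taken literally, the rule as quoted is narrower than LLVM's. It does not list `-`, and it does not list `.` after the first character, although LLVM accepts both unquoted. So a mangled name restricted to ASCII letters, digits, `_` and `$`, and not starting with a digit, should satisfy both rules. Below is a Python sketch of the rule as quoted. `is_plain_as_symbol` is a hypothetical name, the `allow_dollar` flag stands in for targets that exclude `$`, and treating `$` as allowed in any position is an assumption.

```python
import re


def is_plain_as_symbol(name: str, allow_dollar: bool = True) -> bool:
    """Check `name` against the GNU as symbol-name rule as quoted above.

    First character: an ASCII letter, "." or "_" (plus "$" when allowed;
    allowing "$" in first position is an assumption).
    Remaining characters: ASCII letters, digits, "_" (plus "$" when allowed).
    `allow_dollar=False` models targets where "$" is not permitted.
    """
    dollar = "$" if allow_dollar else ""
    pattern = rf"[A-Za-z._{dollar}][A-Za-z0-9_{dollar}]*"
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    print(is_plain_as_symbol("_Bb0m4lang5value"))         # True
    print(is_plain_as_symbol("foo-bar"))                  # False: "-" not listed
    print(is_plain_as_symbol("a$b", allow_dollar=False))  # False
```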

jclark commented Sep 14, 2021

A public name looks like this:

"_B" digit+ (org-or-mod-name "__")+ local-name

The digits are a decimal number equal to (number of org-or-mod-names - 1)*4 + K, where K is

  • 0 if there is a non-empty org name other than "ballerina" (so the first org-or-mod-name is the organization name)
  • 1 if there is an empty org name (so the first org-or-mod-name is a module name)
  • 2 if the org name is "ballerina" (so the first org-or-mod-name is a module name)
  • 3 if the org name is "ballerina", there are two or more module names and the first module name is "lang" (so "lang" is omitted and the first org-or-mod-name is the second module name)

So the following imports would start with the following prefixes:

ballerina/io => _B2io__
ballerina/lang.value => _B3value__
wso2/nballerina.err => _B8wso2__nballerina__err__
foo_bar.baz => _B5foo_bar__baz__

We will keep $ for escaping individual characters.
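A minimal Python sketch of this scheme, for illustration only. `mangle_v1` and `split_digits` are hypothetical names. The sketch takes the org, the list of module names and the local name as plain strings, and it does not implement the `$` escaping, which isn't specified here. With an empty local name it reproduces the prefixes above.

```python
def mangle_v1(org: str, module_names: list[str], local_name: str) -> str:
    """Sketch of the "_B" digit+ (org-or-mod-name "__")+ local-name scheme."""
    if not module_names:
        raise ValueError("at least one module name is required")
    if org == "ballerina":
        if len(module_names) >= 2 and module_names[0] == "lang":
            k, parts = 3, module_names[1:]  # "ballerina/lang" is implied
        else:
            k, parts = 2, module_names      # "ballerina" is implied
    elif org == "":
        k, parts = 1, module_names          # no org: first part is a module
    else:
        k, parts = 0, [org] + module_names  # first part is the org name
    digits = (len(parts) - 1) * 4 + k
    return f"_B{digits}" + "".join(p + "__" for p in parts) + local_name


def split_digits(digits: int) -> tuple[int, int]:
    """Recover (number of org-or-mod-names, K) from the leading number."""
    count_minus_one, k = divmod(digits, 4)
    return count_minus_one + 1, k


if __name__ == "__main__":
    print(mangle_v1("ballerina", ["io"], ""))             # _B2io__
    print(mangle_v1("ballerina", ["lang", "value"], ""))  # _B3value__
    print(mangle_v1("wso2", ["nballerina", "err"], ""))   # _B8wso2__nballerina__err__
    print(mangle_v1("", ["foo_bar", "baz"], ""))          # _B5foo_bar__baz__
    print(split_digits(8))                                # (3, 0)
```

Because K is always below 4, a demangler can recover both the part count and K from the single number with `divmod`. That is what packing them as (count - 1)*4 + K buys.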

jclark commented Sep 14, 2021

Alternative scheme:

"_B" digit* letter  (org-or-mod-name "__")* local-name

In this scheme, "well-known" org/module name sequences are each assigned a lower-case ASCII letter:

  • b ballerina/
  • l ballerina/lang
  • i ballerina/io
  • h ballerina/http
  • x ballerinax/
  • w wso2/
  • n (no organization) (first org-or-mod-name is module name)
  • o other organization (first org-or-mod-name is org name)

The digits are a decimal number giving the number of org-or-mod-names that follow the letter. Omitting the digits is equivalent to 1.

ballerina/io => _B0i
ballerina/lang.value => _Blvalue__
wso2/nballerina.err => _B2wnballerina__err__
foo_bar.baz => _B2nfoo_bar__baz__
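A sketch of the alternative scheme in Python. The issue only lists the letters. This sketch makes three assumptions about how an encoder would pick one: the `WELL_KNOWN` table, matching the longest prefix first, and always omitting the count when it is 1. `mangle_v2` is a hypothetical name.

```python
# Hypothetical encoding table for the well-known prefixes listed above,
# as (org, leading module names...) -> letter. Longer prefixes come first
# so that "ballerina/lang" wins over plain "ballerina/".
WELL_KNOWN = [
    (("ballerina", "lang"), "l"),
    (("ballerina", "io"), "i"),
    (("ballerina", "http"), "h"),
    (("ballerina",), "b"),
    (("ballerinax",), "x"),
    (("wso2",), "w"),
]


def mangle_v2(org: str, module_names: list[str], local_name: str) -> str:
    """Sketch of "_B" digit* letter (org-or-mod-name "__")* local-name."""
    if org == "":
        letter, parts = "n", list(module_names)  # no organization
    else:
        path = [org] + list(module_names)
        for prefix, ch in WELL_KNOWN:
            if tuple(path[: len(prefix)]) == prefix:
                letter, parts = ch, path[len(prefix):]
                break
        else:
            letter, parts = "o", path  # other org: first part is the org name
    count = "" if len(parts) == 1 else str(len(parts))  # omitted digits mean 1
    return "_B" + count + letter + "".join(p + "__" for p in parts) + local_name


if __name__ == "__main__":
    print(mangle_v2("ballerina", ["io"], ""))             # _B0i
    print(mangle_v2("ballerina", ["lang", "value"], ""))  # _Blvalue__
    print(mangle_v2("wso2", ["nballerina", "err"], ""))   # _B2wnballerina__err__
    print(mangle_v2("", ["foo_bar", "baz"], ""))          # _B2nfoo_bar__baz__
```

The trade-off against the first scheme is a fixed table that encoder and demangler have to agree on. In return, the most common standard-library modules get very short prefixes.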

jclark commented Sep 15, 2021

Inspired by Rust

public-name = "_B" org-name
# "b" adds "ballerina" to start of org
org-name = ["b"] zqual mod-name
mod-name =
    nqual local-name
    | "m" nqual mod-name
local-name = ident
zqual = qual # decimal-number >= 0 
nqual = qual # decimal-number > 0
# decimal-number is the number of bytes in ident, no leading zeros
# (except zqual can be 0) 
# "_" is present if ident starts with a digit or an underscore
# must not be present if decimal-number is 0
qual = decimal-number ["_"] ident
ident = 0 or more bytes

Examples

ballerina/io => _Bb02io
ballerinax/choreo => _Bb1x6choreo
ballerina/lang.value => _Bb0m4lang5value
wso2/nballerina.front.syntax => _B4wso2m10nballerinam5front6syntax
foo_bar.baz => _B0m7foo_bar3baz
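A minimal Python encoder for this grammar, as a sketch. `qual` and `mangle_v3` are hypothetical names. Identifiers are assumed to be ASCII, so byte and character counts agree. The sketch also assumes the optional "b" is used whenever the org starts with "ballerina". With an empty local name it reproduces the examples above.

```python
def qual(ident: str) -> str:
    """qual = decimal-number ["_"] ident"""
    n = len(ident.encode("utf-8"))  # number of bytes in ident
    # "_" separates the length from an ident that starts with a digit or "_";
    # it is never emitted for an empty ident (decimal-number 0).
    sep = "_" if ident and ident[0] in "0123456789_" else ""
    return f"{n}{sep}{ident}"


def mangle_v3(org: str, module_names: list[str], local_name: str) -> str:
    if not module_names or not all(module_names):
        raise ValueError("module names must be non-empty (nqual > 0)")
    out = ["_B"]
    if org.startswith("ballerina"):
        out.append("b")  # "b" adds "ballerina" to the start of the org
        org = org[len("ballerina"):]
    out.append(qual(org))  # zqual: "0" for an empty org
    for name in module_names[:-1]:
        out.append("m" + qual(name))  # "m": another module name follows
    out.append(qual(module_names[-1]))  # last module name, then local-name
    out.append(local_name)  # local-name = ident, with no length prefix
    return "".join(out)


if __name__ == "__main__":
    print(mangle_v3("ballerina", ["io"], ""))             # _Bb02io
    print(mangle_v3("ballerinax", ["choreo"], ""))        # _Bb1x6choreo
    print(mangle_v3("ballerina", ["lang", "value"], ""))  # _Bb0m4lang5value
    print(mangle_v3("wso2", ["nballerina", "front", "syntax"], ""))
    # _B4wso2m10nballerinam5front6syntax
    print(mangle_v3("", ["foo_bar", "baz"], ""))          # _B0m7foo_bar3baz
```

One apparent advantage over the two earlier schemes is that every part is length-prefixed. As a result, a `__` inside a module name can't be mistaken for a separator.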

With the restrictions on org/module names proposed in ballerina-platform/ballerina-spec#791 (comment), the "_" in qual won't be needed for org and module names, but we'll include it to support root module names based on filenames.
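A decoding sketch in Python shows why both the "_" and the "no leading zeros" rule matter. `read_qual` and `demangle_v3` are hypothetical names, and ASCII identifiers are assumed. Because a number never has a leading zero, `_Bb02io` must read as zqual `0` followed by `2io` rather than as `02`. A "_" right after a non-zero length is always a separator, since the encoder must insert one before any ident that starts with a digit or "_". Without it, a module name such as `2x` would encode as `22x` and its length could not be read back.

```python
DIGITS = "0123456789"


def read_qual(s: str, i: int) -> tuple[str, int]:
    """Parse qual = decimal-number ["_"] ident at s[i:]; return (ident, next index)."""
    if i < len(s) and s[i] == "0":
        j = i + 1  # no leading zeros: a "0" is always the whole number
    else:
        j = i
        while j < len(s) and s[j] in DIGITS:
            j += 1
    if j == i:
        raise ValueError(f"expected a decimal number at offset {i}")
    n = int(s[i:j])
    if n > 0 and j < len(s) and s[j] == "_":
        j += 1  # separator before an ident starting with a digit or "_"
    if j + n > len(s):
        raise ValueError("ident runs past the end of the symbol")
    return s[j : j + n], j + n


def demangle_v3(sym: str) -> tuple[str, list[str], str]:
    """Split a public name into (org, module names, local name)."""
    if not sym.startswith("_B"):
        raise ValueError("not a public name")
    i, org_prefix = 2, ""
    if sym[i : i + 1] == "b":
        i, org_prefix = i + 1, "ballerina"
    org, i = read_qual(sym, i)  # zqual
    modules = []
    while sym[i : i + 1] == "m":  # "m" nqual mod-name
        name, i = read_qual(sym, i + 1)
        modules.append(name)
    name, i = read_qual(sym, i)  # final nqual, followed by local-name
    modules.append(name)
    if not all(modules):
        raise ValueError("module names must be non-empty")
    return org_prefix + org, modules, sym[i:]


if __name__ == "__main__":
    print(demangle_v3("_Bb0m4lang5valuefoo"))
    # ('ballerina', ['lang', 'value'], 'foo')
    print(demangle_v3("_B0m2_2x3bazf"))
    # ('', ['2x', 'baz'], 'f')
```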

@manuranga

@jclark What does qual stand for?

jclark commented Sep 20, 2021

Qualifier. Maybe there's a better word to use here?

@manuranga

Qualifier is alright, I just didn't think of it when I saw the abbreviation. I would have used something like part, fragment or group, no strong preference though.

jclark commented Sep 20, 2021

The logic is that, in the spec, m:foo is a qualified name: the thing that the prefix m refers to is thus the qualifier of foo.

@manuranga

Ah, makes sense.
