This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

proposals: Add new BIP-003 proposal, Agglutinative Roles language #82

Merged
merged 4 commits into from
Sep 12, 2017

Conversation

Contributor

@abeaumont abeaumont commented Sep 4, 2017

Part of bblfsh/sdk#167


This presents some issues:
* It doesn't scale well.
For a set of N properties, in the worst case 2^N roles would be needed.

Contributor

Missing comma after case.
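
To make the scaling argument in the excerpt above concrete, here is a minimal sketch that counts how many named roles would be needed if every combination of N properties got its own composite role, compared with one role per property. The property names are only illustrative (borrowed from roles mentioned elsewhere in this review), not the actual Role set.

```python
from itertools import combinations

# Illustrative property names only; not the actual Role set.
properties = ["Function", "Declaration", "Argument", "Name"]


def composite_role_count(props):
    """Count every subset of props (including the empty one): 2^N in total."""
    return sum(1 for k in range(len(props) + 1) for _ in combinations(props, k))


composite = composite_role_count(properties)  # one named role per combination
agglutinative = len(properties)               # one role per property
```

With four properties the worst case is already 16 composite roles against 4 agglutinative ones, and the gap doubles with each added property.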


This combination of roles is already done to some extent,
since a preincrement operator would actually be annotated with 2 roles: `Expression`, `OpPreIncrement`.
Current proposal just deepens this property separation into roles.

Contributor

The current proposal.

arithmetic operators, would have an easier way to filter the `UAST` to find the interesting nodes.

Additionally, this agglutination of roles makes unsupported node types degrade more gracefully.
For example, a `log` operator, which currently lacks an specific role,

Contributor

Nitpick: at first I thought this was about logging. Maybe change it to "natural logarithm operator"?

Contributor

lacks a specific

Contributor Author

Changed to ** (pow), to avoid potential confusion.

The set of Roles are changed by this proposal.
It's limited to partition the multiple property roles currently defined,
leaving the potential addition of new property roles
(`Arithmetic`, `Comparsion`, `Loop`, ...) to a future BIP.

Contributor

Comparison


## Impact

Imcompatible changes to the Role set are proposed, in order to do that,

Contributor

Incompatible

* Versioning should be added to the [SDK](https://github.com/bblfsh/sdk/),
to allow existing server and drivers work with a previous version of the SDK.
* Roles should be updated in the SDK.
* Protobuf generated code should be updated for [server](https://github.com/bblfsh/server) and [python](https://github.com/bblfsh/client-python) and [go](https://github.com/bblfsh/client-go) clients

Contributor

Python and Go (uppercase).

* `VisibleFromInstance`: `Visibility`, `Instance`
* `VisibleFromType`: `Visibility`, `Type`
* `VisibleFromSubtype`: `Visibility`, `Subtype`
* `VisibleFromPackage`: `Visiblity`, `Package`

Contributor

Visibility

@juanjux
Contributor

juanjux commented Sep 5, 2017

Good job!

As a possible improvement, the Impact section is missing a note that current users of the UAST will have to update their code and the way they interpret the roles.

Also, a third alternative could be the "two-level UAST" I proposed some time ago, where the first level roles are the more generic (loop, branch, procedure) and a second level would have a more concrete semantic meaning (foreach, if, coroutine). But it is probably tangential for this proposal (both levels could use agglutinative roles) so it doesn't need to be added.

Other than that, this BIP has my ACK for the main proposal.

@abeaumont
Contributor Author

Updated with typo and grammar fixes from the review.

* `Import`
* `Path`
* `Alias`
* `Function`

Contributor

Don't we have a Class role?

Contributor

That would be TypeDeclaration or Type+Declaration with the new roles.

would be just left as: `Expression`, `Incomplete`.
With the new language, the node could still retain most of the information:

* `Expression`

@ajnavarro ajnavarro Sep 6, 2017

In this or other cases, is it possible for two slightly different nodes to end up with exactly the same roles, with every property of one matching a property of the other?

Contributor Author

For nodes with incomplete roles, that can certainly be the case. You could have unsupported operators (let's say pow and log) that you know are arithmetic operators but cannot tell apart. In that case you'd have to check the token, if available.
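
A minimal sketch of that situation, assuming a simplified node that carries only a token and a set of roles. The `Node` class and the `Operator`, `Arithmetic` and `Add` role names are hypothetical here (the proposal leaves roles such as `Arithmetic` to a future BIP); only `Expression` and `Incomplete` come from the excerpt above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical, simplified UAST node: a token plus a set of roles."""
    token: str
    roles: frozenset = field(default_factory=frozenset)


nodes = [
    Node("+", frozenset({"Expression", "Operator", "Arithmetic", "Add"})),
    # Unsupported operators: no specific role, so they carry Incomplete,
    # but they still keep the properties the driver does know about.
    Node("**", frozenset({"Expression", "Operator", "Arithmetic", "Incomplete"})),
    Node("log", frozenset({"Expression", "Operator", "Arithmetic", "Incomplete"})),
    Node("==", frozenset({"Expression", "Operator", "Incomplete"})),
]


def with_roles(candidates, *required):
    """Return the nodes that carry every one of the required roles."""
    wanted = set(required)
    return [n for n in candidates if wanted <= n.roles]


arithmetic = with_roles(nodes, "Arithmetic")
# pow and log have identical role sets; only the token tells them apart.
ambiguous_tokens = [n.token for n in with_roles(arithmetic, "Incomplete")]
```

Here a query for arithmetic operators still finds `**` and `log`, even though neither has a dedicated role, and the token is the fallback for distinguishing them.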

@eiso
Contributor

eiso commented Sep 6, 2017

First of all, thank you for the work on bip-3. It's a really interesting and very well written proposal. Having given it some thought today, (de)composing the roles makes a lot of sense to me, for the reasons that you state.

The one concern that I have is around higher-level abstractions on the roles for easy usability (what I believe @juanjux calls two-level roles), in particular in the spark-api project. Since that project will be the way a large number of our intended audience will use Babelfish, I can imagine some difficulties with the composed roles.

Here is some pseudo code from the Spark API design document.

//DS - PySpark
//for 1000 Python repositories (repos.txt or LanguageDataset demo)
src-d.select()

//clone to local FS (or use .siva files in hdfs://)
src-d.clone(repos.txt, "/path/to/cloned/repos")

//get UASTs for HEAD
uasts = src-d.read.gitLocal("/path/to/cloned/repos")
  .getReferences().filter($"reference_name===HEAD")
  .getFiles()
  .applyEnry()
  .applyBblfsh()

//extract SimpleIdentifier roles (ids)
uasts.filter("uast uast_lib('//*[@class=SimpleIdentifier]')")

uast_lib is essentially libuast as you can see here. If, instead of being able to extract all simple identifiers directly, I now need a rule-based system for roles (e.g. include x, but exclude y and z), I can imagine the syntax becoming difficult, harder to get started with, and more expensive to process on the Spark side. That's why I am tagging @ajnavarro @bzz and @erizocosmico here as well.

The above could be partially solved with the two-level UAST suggestion of @juanjux .

@abeaumont
Contributor Author

@eiso that is a very valid concern, and I see now that I forgot to give a proper answer to @juanjux's suggestion, sorry.
I have two comments on this topic:

  1. I think what both @juanjux and @eiso are suggesting is a particular case of the alternative approach I included at https://github.com/bblfsh/documentation/pull/82/files#diff-b3182c12132bb09dddfe38e712ae248bR289, which talks about categorization in general. The current proposal doesn't go deeper into that approach for the reasons given there, but if you think otherwise, we can explore it further.
  2. It's true that search may become more complex with this proposal, but apart from categorization there are two additional tools we can use to address that:
  • Use appropriate roles to facilitate code analysis. It may be true that a code analyst wants to look for simple identifiers without getting the qualified ones. Let's say that's the case, for discussion's sake. Then we could add a Simple role to handle this properly and make the analyst's life easier. I think the new approach makes this easier. Look, for example, at the arithmetic operator example I used: it would be easier to look for arithmetic operators only instead of all operators. In the same way we can add roles for, say, relational operators, control flow constructs, looping constructs, etc.
  • Use the power of XPath syntax to do the search. Even if our role set missed some valid searches needed by code analysts, I think it would still be a reasonable approach (at the usability level) to use something like: //*[@roleIdentifier and not(@roleQualified)]. I'm not sure about performance; we may need to check that.

Note that these points are presented to have a wider view of the possibilities, not to discourage a categorization (of two levels or otherwise), which I consider a valid approach.
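
To illustrate the second tool, here is a rough sketch of the "has role X but not role Y" query using only the Python standard library. The XML rendering of the UAST (one attribute per role, element and token names) is an assumption made for this example, not the format libuast actually exposes. Because `xml.etree.ElementTree` supports only a subset of XPath without `not()`, the predicate is applied in Python instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML rendering of a UAST with one attribute per role,
# mirroring the @roleIdentifier / @roleQualified form used above.
doc = ET.fromstring("""
<File>
  <Name token="x" roleIdentifier="true"/>
  <Name token="os.path" roleIdentifier="true" roleQualified="true"/>
  <Call token="f" roleCall="true"/>
</File>
""")


def identifiers_not_qualified(root):
    """Same condition as //*[@roleIdentifier and not(@roleQualified)]."""
    return [
        n for n in root.iter()
        if "roleIdentifier" in n.attrib and "roleQualified" not in n.attrib
    ]


simple_tokens = [n.get("token") for n in identifiers_not_qualified(doc)]
```

The include/exclude rule stays a one-line condition, which suggests the usability cost is modest; the performance question raised above is not addressed by this sketch.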

@eiso
Contributor

eiso commented Sep 6, 2017

@abeaumont regarding topic 1, could you take your suggested approach and have a 'category type' role that gets added in the same manner? Or would this be mixing concepts?

@abeaumont
Contributor Author

@eiso I'm not sure I understand what you mean. Do you mean having a special field/attribute in a node, named type, which would contain the main category of the node? Something like:

internalType: SimpleIdentifier
type: Identifier
roles: Simple, OtherRole, ...

If that's what you mean, yes, that could be a way to do it. If not, please elaborate a bit on your approach.
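
As a sketch of how such a node could be consumed, assuming the three fields shown above map onto a small record (the `Node` class and `by_category` helper are hypothetical, introduced only for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    internal_type: str  # native AST node type, e.g. "SimpleIdentifier"
    type: str           # hypothetical main-category field discussed above
    roles: tuple        # remaining agglutinative roles


def by_category(nodes, category):
    """Select nodes by their main category, ignoring the finer-grained roles."""
    return [n for n in nodes if n.type == category]


nodes = [
    Node("SimpleIdentifier", "Identifier", ("Simple",)),
    Node("QualifiedIdentifier", "Identifier", ("Qualified",)),
    Node("ForEachStatement", "Loop", ("For", "Iterator")),
]

identifiers = by_category(nodes, "Identifier")
```

A coarse query then needs only the `type` field, while finer queries can still inspect `roles`.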

@eiso
Contributor

eiso commented Sep 7, 2017

@abeaumont the options I meant were:

Option 1. When ForEach becomes For, Iterator, adding Loop as a sub-role. So having (For, Iterator, Loop). Where Loop is defined in the spec as a higher level role.

Option 2. ...or having (For, Iterator, Loop) but typed as:

role: For
type: level2

role: Iterator
type: level2

role: Loop
type: level1

Please ignore the terrible naming.

@abeaumont
Contributor Author

@eiso ok, I understand now. I think both options would be similar from a Babelfish point of view. I think we could add automatic support for option 2, making the role 'level' explicit, without much work; it would just be more verbose.

So the main question would be if this approach would be of any use for code analysis. I think an analyst would need to know the roles and their categories beforehand anyway, but you surely have a better code analysis perspective and I guess you may have some use case in mind where this kind of annotation would be of help?
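
For reference, a minimal sketch of what automatic support for option 2 might look like: a lookup table assigns each role a level, and every role on a node is emitted together with that level. The table contents and the `annotate_levels` helper are assumptions for illustration; neither the proposal nor the SDK defines them.

```python
# Hypothetical level table; the proposal does not define one.
ROLE_LEVELS = {"Loop": 1, "For": 2, "Iterator": 2}


def annotate_levels(roles, levels=ROLE_LEVELS, default=2):
    """Pair each role with an explicit level, as in option 2 above."""
    return [{"role": r, "type": f"level{levels.get(r, default)}"} for r in roles]


annotated = annotate_levels(["For", "Iterator", "Loop"])
# A consumer interested only in the generic view keeps the level-1 roles.
level1_roles = [a["role"] for a in annotated if a["type"] == "level1"]
```

Option 1 corresponds to the plain list `["For", "Iterator", "Loop"]`; the extra verbosity of option 2 is the `type` entry attached to each role.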

* `FunctionDeclarationName`: `Function`, `Declaration`, `Identifier`
* `FunctionDeclarationReceiver`: `Function`, `Declaration`, `Receiver`
* `FunctionDeclarationArgument`: `Function`, `Declaration`, `Argument`
* `FunctionDeclarationArgumentName`: `Function`, `Declaration`, `Argument`, `Name, `

Contributor

Stray ", " at the end.
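
Since existing UAST consumers will have to update how they interpret roles, a lookup table built from the mappings quoted in this review could ease the migration. This is a sketch only: the table covers just the entries visible in the excerpts above (with the typos flagged by reviewers corrected), and the `migrate` helper is hypothetical, not part of the SDK.

```python
# Mappings copied from the excerpts quoted in this review; not the full list.
DECOMPOSITION = {
    "FunctionDeclarationName": ("Function", "Declaration", "Identifier"),
    "FunctionDeclarationReceiver": ("Function", "Declaration", "Receiver"),
    "FunctionDeclarationArgument": ("Function", "Declaration", "Argument"),
    "FunctionDeclarationArgumentName": ("Function", "Declaration", "Argument", "Name"),
    "VisibleFromInstance": ("Visibility", "Instance"),
    "VisibleFromType": ("Visibility", "Type"),
    "VisibleFromSubtype": ("Visibility", "Subtype"),
    "VisibleFromPackage": ("Visibility", "Package"),
}


def migrate(old_roles):
    """Replace each composite role with its parts, keeping order and dropping duplicates."""
    new_roles = []
    for role in old_roles:
        # Roles without an entry are assumed unchanged and passed through.
        for part in DECOMPOSITION.get(role, (role,)):
            if part not in new_roles:
                new_roles.append(part)
    return new_roles


migrated = migrate(["Expression", "FunctionDeclarationArgumentName"])
```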

@juanjux juanjux merged commit 320ec3b into bblfsh:master Sep 12, 2017
5 participants