Skip to content
This repository has been archived by the owner on Mar 8, 2020. It is now read-only.

Non semantically mapped properties on semantic objects #361

Open
juanjux opened this issue Feb 14, 2019 · 3 comments
Open

Non semantically mapped properties on semantic objects #361

juanjux opened this issue Feb 14, 2019 · 3 comments
Assignees

Comments

@juanjux
Copy link
Contributor

juanjux commented Feb 14, 2019

The problem:

Sometimes (or almost always) native objects that have a semantic mapping have more properties than the ones we have in our schema. What to do with them? This is more evident with the latest SDK update that produces and error when you silently drop any field on the semantic mapping.

Possible solutions:

  1. Drop them explicitly.
    Pros: trivial to implement. User could still have access in the non semantic UAST.
    Cons: It's ugly to lose information in the semantic UAST.

  2. Add an extra or similar field on GenNode where these properties can be loaded.
    Pros: easy to implement.
    Cons: turns the semantic objects into "schema-less" at least regarding the contents of the extra field.

  3. Add an SDK feature to get a list of strings and create a group where these properties are together with the semantic node. Then call that on the Normalize transform. It's similar to 2), but with the properties one level above instead of into the semantic object. This is similar to what the C# and Python driver do (in different ways) to keep comments.
    Pros: easy to implement (on the drivers, unsure about the SDK feature itself).
    Cons: more noisy semantic UAST. Adds groups without real semantic meaning. Have to implement feature in the SDK. More boilerplate for the driver programmer. Won't always work for all drivers (the C# driver for example sometimes maps the comments inside properties instead of a parent group).

  4. Continue writing custom transforms for each driver like the C# and Python driver do.
    Pros: will work for each driver.
    Cont: Very cumbersome to write and maintain, ugly, fails the DRY principle since most will be similar but not equal.

  5. Other solutions...?

This is semi-urgent since the latest SDK made mandatory to map these fields.

@bzz
Copy link
Contributor

bzz commented Feb 14, 2019

The same is need and will be handled by 4 in bblfsh/javascript-driver#57

@juanjux
Copy link
Contributor Author

juanjux commented Feb 14, 2019

@bzz yes, for what I'm seeing, it'll happen on almost all drivers. For the Python driver I'm fixing some node-specific mappings but resorting to 1) + adding an issue for the ones that can be on almost every node until we've a better solution.

@dennwc dennwc self-assigned this Feb 14, 2019
@dennwc
Copy link
Member

dennwc commented Feb 14, 2019

  1. Agree, this cannot be done - dropping information defeats the whole purpose of Semantic mode.
  2. I would argue against it. Semantic mode = everything should have a meaning, and dumping unknown things into extra will silently change the meaning of the node in question, so all our guarantees and the user has no way to know how exactly.
  3. This is actually a way to go from my point of view. I'll describe why it's the case below.
  4. It's more like a question of how we implement 3 - it turned out to be a more common pattern than I expected, so I will make a helper in SDK to assist with those transformations.
  5. Graphs. 😁

For now, I propose to drop those fields explicitly with a FIXME comment. Drivers already do this because of the validation bug, so nothing will change here for users, and we will at least identify all the failing cases.

A bit more on why I think dumping fields to the extra is not optimal. The way Semantic mode is designed is to define the smallest possible building blocks for a UAST, each having a clear and distinct meaning. Adding extra breaks this base assumption - node may contain everything, and, as a user, you can no longer trust the meaning.

You may ask how it's different from having external nodes wrapping semantic nodes. The best way to describe it is that by design, each semantic node = a UAST role in the old SDK. Adding extra is similar to adding some parameters to each role - they are not distinct and clear anymore. But adding and composing roles (read: semantic nodes) makes more sense - each one has a meaning, and the composite meaning is a sum of those. The problem with original role annotations was that they are not ordered and not structured, hence ambiguous. We should not repeat this mistake with semantic nodes.

@juanjux May rightfully argue that we already use external nodes to dump unknown properties in some cases. That's true, but it's not because of the issues with the design, nor schema. It's an issue with a DSL implementation - there are no helpers to do this in a clean way right now, so we either group unknown properties to a separate external node, or write custom transforms that solves it properly.

So from my perspective, we need a bit more helpers in SDK that will allow to move nodes cleanly. This is indeed urgent, but C# is a priority, so I'll look into it right after I finish the current work on annotations.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants