Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bash driver is not a detected language #39

Closed
smola opened this issue Aug 7, 2018 · 10 comments · Fixed by #60
Closed

Bash driver is not a detected language #39

smola opened this issue Aug 7, 2018 · 10 comments · Fixed by #60

Comments

@smola
Copy link
Member

smola commented Aug 7, 2018

We use enry for language detection both on bblfshd side as well as on our applications. Enry, as well as github/linguist, do not detect Bash itself, but just Shell (this includes the whole family of POSIX-like shell languages).

This generates the following problem:

  • A bash file will never be routed to bash-driver if using automatic language detection at bblfshd side.
  • If we use enry in our applications, we'll never use bash-driver out of the box either.
@juanjux
Copy link
Contributor

juanjux commented Aug 7, 2018

As @mcuadros said, we could also modify enry. Alternatively, we could support more than one language name per driver in the manifest which is probably more future-proof (example: python2+python3, typescript+javascript, etc).

@smola
Copy link
Member Author

smola commented Oct 19, 2018

cc @mcuadros what should we do about this? I don't think enry (or github/linguist) is ever going to be able to tell bash apart other shell dialects in a reliable way. Should we just document this and recommend users to use Bash language when enry outputs Shell? This would be pretty nasty in gitbase...

@bzz
Copy link
Contributor

bzz commented Nov 19, 2018

support more than one language name per driver in the manifest

I like this idea as it simplifies things for clients as they can just call bblfsh with output of enry

@bzz
Copy link
Contributor

bzz commented Nov 28, 2018

Small summary of the context for this issue below.

Use case

  • client asks bblfshd for list of the supported languages
  • client uses enry to detect language and if it's supported
  • client asks bblfshd to parse \w a given language

Problem

  • "Language" field from the driver manifest can be different from language identifier that Enry uses.
    Example: bash (manifest) VS shell (enry)

Alternatives

  1. 🙅‍♂️ rename driver to shell-driver. Discarded, see Some lang names guessed by bblfsh has a different driver name bblfshd#162 (comment)
  2. change Language field to support multiple drivers (either as array, or as a string \w separator)
    • "language" field is used in many places for different purposes
    • changes would be invasive and include code for new driver generation, etc, etc
  3. add new field to manifest "enry_languages" "language_aliase"
    • use it to map incoming "language" from parse requests, before DriverPool selection
    • depending on whether we want to impose changes on clients or not:
      • either just add new field to v1 protocol1.DriverManifest and expect clients to update client version and read it instead
      • or reply to protocol1.SupportedLanguages with additional "fake" DriverManifest, one per each item of "enry_languages" "language_aliase" arrays.

It may be a good time to implement any of this directly in v2, when we decide to port SupportedLanguagesRequest from protocol v1 to v2.

WDYT about approach 3? Feedback is very welcome.

@juanjux
Copy link
Contributor

juanjux commented Nov 29, 2018

My proposal (make the languages field a list or a string with a separator) doesn't imply generating multiple drivers, just multiple names for the same drivers.

@bzz
Copy link
Contributor

bzz commented Nov 29, 2018

My proposal (make the languages field a list or a string with a separator) doesn't imply generating multiple drivers, just multiple names for the same drivers.

I'm not familiar with the codebase very much, but a quick glance showed that in this case, on top of the mandatory changes listed in 3, we would also need to

  • update the protocol definition (if changing a field type)
  • introduce new special logic in driver generation, as "language" field is used all over templates

Would not that be the case? If you could give some preliminary pointers to which parts of the codebase would be touched by such proposal, that would be very appreciated!

@juanjux
Copy link
Contributor

juanjux commented Nov 29, 2018

Yes, changes are going to be needed in any case. We could also keep the same language field and all a "language_aliases" field (instead of "enry_languages" so we're not tying the drivers to enry in any way).

@smola
Copy link
Member Author

smola commented Nov 29, 2018

I would rather use something like language_aliases to avoid tying the protocol to enry itself. If we really wanted this to reference enry, I would reference linguist instead (linguist_languages), since that is our ground truth for language names.

@bzz
Copy link
Contributor

bzz commented Nov 29, 2018

@juanjux @smola language_aliases thank you guys, makes perfect sense 👍 Updated the msg above.

Having a separate field have benefits that at least we do not need to touch any templates/driver generation part. So far, the only changes that I found to be needed in this case seem to be well localized and are listed in 3. (And we even seem to have an option of not changing a protocol1 at all! I vote for that.)

If there is no more feedback, I will be moving result of this discussion with a proposal to a separate issue in SDK in next few days, as it's not Bash specific at all.

We can keep this one open until that one is merged and released.

@juanjux
Copy link
Contributor

juanjux commented Nov 29, 2018

I'm fine with the new field then, too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants