Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of non-ASCII letters #82

Merged
merged 1 commit into from Apr 5, 2019

Conversation

Projects
None yet
3 participants
@PEZ
Copy link
Contributor

commented Nov 14, 2018

Description of the Change

Clojure allows for unicode characters in keywords, symbols, etcetera. This PR replaces matching of ASCII only letters to corresponding unicode matching. Making for example this code snippet tokenize and colorize correctly:

(defn ^:kräsen äppelmust [äpplen]
  (when (:Åkerö äpplen)
    (-> äpplen
      (pressa)
      (häll-på-flaska)
      (sätt-på-etikett))))

Generally the changes are applied to regexp character classes like so:

  • [a-zA-Z0-9] -> \w
  • [a-zA-Z] -> \p{L}
  • [a-z]-> \p{Ll}

Alternate Designs

Concerning tests:

I considered adding separate tests for this, but decided against it as it is really a hygien factor for the grammar to match Clojure's reader as good as possible.

Another way it could be tested is to just change all foo and bar and stuff to things like Öπ, but it would make the tests generally a bit less clear, I think.

So Instead, in places where I couldn't just add non-ASCII strings to a list of test strings, I added asserts and attached comments about the non-ASCII nature of them.

Benefits

For people using their non-english, native languages for naming stuff in their code, or people who just use a wider naming space than ASCII, this will make the syntax highlighting work much better.

Possible Drawbacks

I don't see how it could hurt anywhere.

Applicable Issues

Fixes #57

@PEZ

This comment has been minimized.

Copy link
Contributor Author

commented Nov 20, 2018

Will someone look at this?

@rsese

This comment has been minimized.

Copy link
Member

commented Nov 20, 2018

Thanks @PEZ - someone from the team will take a look as soon as they can.

@PEZ

This comment has been minimized.

Copy link
Contributor Author

commented Nov 21, 2018

Thanks. The process gets a bit extra slow for me, since I want this into VS Code, and they instruct me to first add it here, because their syntax tokenizer is built from this one.

@nathansobo nathansobo merged commit 6012a06 into atom:master Apr 5, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details

@nathansobo nathansobo self-assigned this Apr 5, 2019

@nathansobo

This comment has been minimized.

Copy link
Contributor

commented Apr 5, 2019

Thanks very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.