Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of non-ASCII letters #82

merged 1 commit into from Apr 5, 2019


Copy link

@PEZ PEZ commented Nov 14, 2018

Description of the Change

Clojure allows for unicode characters in keywords, symbols, etcetera. This PR replaces matching of ASCII only letters to corresponding unicode matching. Making for example this code snippet tokenize and colorize correctly:

(defn ^:kräsen äppelmust [äpplen]
  (when (:Åkerö äpplen)
    (-> äpplen

Generally the changes are applied to regexp character classes like so:

  • [a-zA-Z0-9] -> \w
  • [a-zA-Z] -> \p{L}
  • [a-z]-> \p{Ll}

Alternate Designs

Concerning tests:

I considered adding separate tests for this, but decided against it as it is really a hygien factor for the grammar to match Clojure's reader as good as possible.

Another way it could be tested is to just change all foo and bar and stuff to things like Öπ, but it would make the tests generally a bit less clear, I think.

So Instead, in places where I couldn't just add non-ASCII strings to a list of test strings, I added asserts and attached comments about the non-ASCII nature of them.


For people using their non-english, native languages for naming stuff in their code, or people who just use a wider naming space than ASCII, this will make the syntax highlighting work much better.

Possible Drawbacks

I don't see how it could hurt anywhere.

Applicable Issues

Fixes #57

Copy link
Contributor Author

@PEZ PEZ commented Nov 20, 2018

Will someone look at this?

Copy link

@rsese rsese commented Nov 20, 2018

Thanks @PEZ - someone from the team will take a look as soon as they can.

Copy link
Contributor Author

@PEZ PEZ commented Nov 21, 2018

Thanks. The process gets a bit extra slow for me, since I want this into VS Code, and they instruct me to first add it here, because their syntax tokenizer is built from this one.

@nathansobo nathansobo merged commit 6012a06 into atom:master Apr 5, 2019
2 checks passed
2 checks passed
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
@nathansobo nathansobo self-assigned this Apr 5, 2019
Copy link

@nathansobo nathansobo commented Apr 5, 2019

Thanks very much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
None yet
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants