Improve handling of non-ASCII letters #82
Description of the Change
Clojure allows Unicode characters in keywords, symbols, et cetera. This PR replaces ASCII-only letter matching with the corresponding Unicode matching, so that, for example, this code snippet tokenizes and colorizes correctly:
(defn ^:kräsen äppelmust [äpplen]
  (when (:Åkerö äpplen)
    (-> äpplen
        (pressa)
        (häll-på-flaska)
        (sätt-på-etikett))))
Generally the changes are applied to regexp character classes like so:
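To illustrate the general idea outside the grammar file itself, here is a minimal Python sketch that contrasts an ASCII-only letter class with a Unicode-aware one. Both patterns and the helper `is_symbol` are hypothetical and are not taken from this PR. Python's `re` module has no `\p{L}`, so the sketch assumes `[^\W\d_]` as a rough stand-in for "any Unicode letter".

```python
import re

# Hypothetical patterns for illustration only; not the grammar's actual rules.
# ASCII-only: a letter followed by letters, digits, or hyphens.
ASCII_SYMBOL = re.compile(r"[a-zA-Z][a-zA-Z0-9\-]*")

# Unicode-aware: for str patterns, \w covers Unicode word characters, so
# [^\W\d_] approximates "any letter" (word character minus digits and "_").
UNICODE_SYMBOL = re.compile(r"[^\W\d_][\w\-]*")


def is_symbol(pattern, text):
    """Return True if the whole text matches the given symbol pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    # Names taken from the snippet above.
    for name in ["pressa", "äppelmust", "häll-på-flaska", "Åkerö"]:
        print(name, is_symbol(ASCII_SYMBOL, name), is_symbol(UNICODE_SYMBOL, name))
```

Under these assumptions the ASCII-only pattern accepts plain names such as `pressa` but rejects names containing letters like `ä` or `Å`, while the Unicode-aware pattern accepts all of them.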
I considered adding separate tests for this, but decided against it, as it is really a hygiene factor for the grammar to match Clojure's reader as closely as possible.
Another way it could be tested is to just change all
So instead, in places where I couldn't just add non-ASCII strings to a list of test strings, I added assertions with comments noting that they cover non-ASCII input.
For people who name things in their code using their native, non-English languages, or who simply use a wider naming space than ASCII, this will make syntax highlighting work much better.
I don't see how it could hurt anywhere.