Add strokes scoring (frontend-only): APL, UIUA, 05AB1E, BQN, and Vyxal. #2513
This PR only adds an extra entry to the code editor frontend that shows the total and selected stroke counts. The hope is that this will shape discussion moving forward.
Characters that have semantic meaning in the language count for only 1 stroke, so golfing is more straightforward than dealing with UTF-8:


High codepoints count for as many strokes as they have bytes, so packing is prevented:
See lang-allowed-strokes.tsx for the set of characters that count as a single stroke in each language (APL, UIUA, 05AB1E, BQN, and Vyxal).
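The counting rule described above can be sketched roughly as follows. This is a minimal illustration and not the code from this PR: `allowedStrokes` is a hypothetical stand-in for one language's entry in lang-allowed-strokes.tsx (the real sets are larger), and `utf8Length`, `Score`, and `score` are names made up for the example.

```typescript
// Hypothetical stand-in for one language's allowed-strokes set.
// These few glyphs are illustrative only, not the real APL list.
const allowedStrokes: ReadonlySet<string> = new Set(["⍳", "⍴", "⌽", "×", "÷"]);

// Number of bytes UTF-8 needs for a single code point.
function utf8Length(codePoint: number): number {
  if (codePoint < 0x80) return 1;
  if (codePoint < 0x800) return 2;
  if (codePoint < 0x10000) return 3;
  return 4;
}

interface Score {
  chars: number;
  strokes: number;
  bytes: number;
}

// Assumed rule: a character in the allowed set costs 1 stroke,
// anything else costs its UTF-8 byte length (so ASCII costs 1).
function score(code: string, allowed: ReadonlySet<string>): Score {
  let chars = 0;
  let strokes = 0;
  let bytes = 0;
  for (const ch of code) {
    // for...of walks the string by code point, not by UTF-16 unit
    const len = utf8Length(ch.codePointAt(0)!);
    chars += 1;
    bytes += len;
    strokes += allowed.has(ch) ? 1 : len;
  }
  return { chars, strokes, bytes };
}

console.log(score("×/⍳5", allowedStrokes)); // { chars: 4, strokes: 4, bytes: 7 }
console.log(score("𝟙", allowedStrokes)); // { chars: 1, strokes: 4, bytes: 4 }
```

In the first call every character is either ASCII or in the set, so strokes equals chars. In the second, the character is outside the set and pays its full UTF-8 length.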
Strokes scoring has the following properties:
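- chars ≤ strokes ≤ bytes
- For ASCII-only code, bytes = strokes = chars.
- For code using only ASCII and the language's allowed characters, strokes = chars.
- For code containing any other character, strokes > chars (so this wards against char packing).

A backend implementation would have more considerations like: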
One potential concern people may have with stroke scoring is that it allows more than 256 single-stroke characters in some cases. In particular, 05AB1E and Vyxal both have 160 characters in their allowed-strokes set. Adding these to the 127 non-null bytes under 128 gives 287 characters that count as a single stroke.
This is much less egregious than 'chars' scoring, which has 1114110 chars that count as a single char, but it still means stroke scoring can't play on even ground with other languages using byte scoring. I don't think that matters. These golf languages aren't playing on even ground no matter what, and other languages don't play on even ground with each other anyway. Handicapping with a constant factor like 2x is silly, as is handicapping them with whatever factor UTF-8 works out to on average.
Some history: comparison to SBCS and Bytes/Chars
Bytes doesn't work well for these languages that heavily use multi-byte characters because it means code length doesn't always match visual length, since some multi-byte characters are longer than others. Also, certain languages such as APL can use 1:1 packing to encode each multi-byte character with one single-byte character.
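As a small illustration of the mismatch between visual length and byte length, the snippet below prints the UTF-8 size of a few single glyphs. The glyph selection is just an example, and `utf8Bytes` is a helper name made up here. It relies on the standard `TextEncoder` API.

```typescript
// UTF-8 byte length of a string, via the standard TextEncoder API.
function utf8Bytes(s: string): number {
  return new TextEncoder().encode(s).length;
}

// Each of these is one glyph on screen, but they cost different byte counts.
for (const glyph of ["+", "×", "⍳", "𝕩"]) {
  console.log(glyph, utf8Bytes(glyph));
}
// "+" is 1 byte, "×" is 2, "⍳" is 3, "𝕩" is 4
```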
Chars doesn't work well for many languages because they allow for 2:1 or 3:1 packers where large codepoints can be used in single chars to unpack into several bytes of regular code.
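To make the packing problem concrete, here is a rough sketch of a 2:1 packer. It is not any particular packer used on code.golf: the `OFFSET` value, the 7-bit layout, and the `pack`/`unpack` names are assumptions chosen for illustration, and the input is assumed to be non-null ASCII.

```typescript
// Hypothetical layout: two 7-bit ASCII values share one code point in the
// range 0x10000..0x13FFF, which avoids the surrogate range entirely.
const OFFSET = 0x10000;

function pack(ascii: string): string {
  let out = "";
  for (let i = 0; i < ascii.length; i += 2) {
    const a = ascii.charCodeAt(i);
    // Pad an odd-length input with 0, which unpack drops again.
    const b = i + 1 < ascii.length ? ascii.charCodeAt(i + 1) : 0;
    out += String.fromCodePoint(OFFSET + ((a << 7) | b));
  }
  return out;
}

function unpack(packed: string): string {
  let out = "";
  for (const ch of packed) {
    const v = ch.codePointAt(0)! - OFFSET;
    out += String.fromCharCode(v >> 7);
    const b = v & 0x7f;
    if (b !== 0) out += String.fromCharCode(b);
  }
  return out;
}

const source = "print(1+2)";
const packed = pack(source);
console.log(Array.from(source).length); // 10 chars before packing
console.log(Array.from(packed).length); // 5 chars after packing
console.log(unpack(packed) === source); // true
```

Every packed character here takes 4 bytes in UTF-8, so the packed form halves the char count while doubling the byte count. Under strokes scoring these characters would also cost 4 each unless they were in the language's allowed set.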
Using an SBCS (single-byte character set) is the traditional approach to this, in part motivated by CGSE's strict scoring that everything must be scored in bytes reversible from a file (I'm not sure if this policy has loosened in recent years). For example, APL has Adám's APL SBCS, and the golflangs 05AB1E and Vyxal explicitly have their own code pages: 05AB1E's codepage and [Vyxal's codepage](https://github.com/Vyxal/Vyxal/blob/version-3/documentation/codepage.md).
However, SBCS introduces some implementation difficulties, such as what to do with characters not in the code page. Since code.golf has many Unicode-based holes (like emojify), it would be a bummer to entirely ban characters not in a code page.
APL on code.golf is a gist by ovs-code that goes into more detail about this.