Update Latin-1 conversion table #7

Merged
merged 1 commit into from Sep 24, 2021

Contributor

@jkboyce jkboyce commented Sep 24, 2021

Great list @Q726kbXuN that you compiled at glyph_counts.txt. Very interesting to see the frequency of unusual characters in NYT crosswords.

I dropped your list as-is into the code, but removed:

  • Glyphs from U+0000 to U+00FF inclusive, since these are valid Latin-1, and
  • Emojis, since the encoder's replacements like \N{WINKING FACE} are as good as anything I could come up with.

So this is apparently every non-Latin-1, non-emoji character ever used in a NYT puzzle. A value of 'None' in the table causes the encoder to insert its own replacement. I put in conversions where I could think of something decent, but clearly this is more art than science so please feel free to edit. :)
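
For anyone editing the table, here is a rough sketch of how a table like this can plug into Python's encoder. It is not the code in this PR: the names `LATIN1_SUBS`, `latin1_with_table`, and `to_latin1` are made up for illustration, the table entries are a tiny hypothetical excerpt, and it assumes the fallback is the standard `namereplace` handler (which is what produces output like \N{WINKING FACE}).

```python
import codecs

# Hypothetical excerpt; the real table in this PR is much longer.
LATIN1_SUBS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2026": "...",  # horizontal ellipsis
    "\u2605": None,   # black star: no decent substitute, defer to the encoder
}


def latin1_with_table(err):
    """Error handler: use the table, else fall back to \\N{...} names."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    pieces = []
    # The encoder may report a run of several unencodable characters at once.
    for ch in err.object[err.start:err.end]:
        sub = LATIN1_SUBS.get(ch)
        if sub is None:
            # Missing from the table or mapped to None: let the standard
            # namereplace handler build the replacement text.
            single = UnicodeEncodeError(err.encoding, ch, 0, 1, err.reason)
            sub, _ = codecs.namereplace_errors(single)
        pieces.append(sub)
    return "".join(pieces), err.end


codecs.register_error("latin1_table", latin1_with_table)


def to_latin1(text):
    """Encode text as Latin-1, substituting via the table where needed."""
    return text.encode("latin-1", errors="latin1_table")
```

With this sketch, `to_latin1("\u201chi\u201d\u2026")` gives `b'"hi"...'`, while characters that map to `None` (or are not in the table at all, like emojis) come out as their `\N{...}` names.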

@Q726kbXuN
Owner

Nice, thanks a ton for the work here. I'll try to clean it up over time (or anyone can, feel free!), but for now it's a great start.

@Q726kbXuN Q726kbXuN merged commit a2899e0 into Q726kbXuN:master Sep 24, 2021