Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(text-to-unicode): handle non-BMP + more conversion options #1087

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

lionel-rowe
Copy link

@lionel-rowe lionel-rowe commented May 14, 2024

Fixes #1081
Fixes #1175

image

Copy link

vercel bot commented May 14, 2024

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Updated (UTC)
it-tools ✅ Ready (Inspect) Visit Preview Aug 9, 2024 0:23am

Copy link

sonarcloud bot commented May 15, 2024

Quality Gate Passed Quality Gate passed

Issues
7 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@sharevb
Copy link
Contributor

sharevb commented May 15, 2024

Hi @lionel-rowe, great job, could be interesting to add a transcript including "character names" (ie, using https://www.npmjs.com/package/@unicode/unicode-15.1.0) ?
This would fix #544

@lionel-rowe
Copy link
Author

lionel-rowe commented May 17, 2024

could be interesting to add a transcript including "character names" (ie, using https://www.npmjs.com/package/@unicode/unicode-15.1.0) ? This would fix #544

I have a standalone tool I currently use that provides similar functionality. The JSON file mapping chars/ranges to their names is around 2MB, which doesn't seem like a reasonable amount to pull in unconditionally here. An async solution loading only the relevant Unicode blocks with dynamic import() (perhaps with ASCII range loaded synchronously by default) could work, but it'd take a little work to make async-friendly.

Edit: Actually it looks like the RLE+gzip+base64-encoded version of the name data in the node-unicode package you linked to is "only" ~194 KB, which is much more reasonable, especially if only loaded conditionally. And it could be reduced by a further ~25% if loaded as a raw binary file instead of base64. But I'm not convinced it's within the scope of this tool, it's more of a Unicode "explainer" than a Unicode "converter". Output should probably be tabular, maybe with a link to an external site like compart (like in my standalone tool). And the reverse direction (if implemented) obviously wouldn't be inputting that tabular data, it'd be a search function, preferably with fuzzy matching. In any case, it's definitely not a simple bidirectional converter like the text-to-unicode tool.

@sharevb
Copy link
Contributor

sharevb commented May 19, 2024

Hi @lionel-rowe, yes, right this is not bi-directional and yes the reverse will be a lookup

@sharevb
Copy link
Contributor

sharevb commented Jul 7, 2024

Hi @lionel-rowe, implemented text-to-unicode-names in #1183

Copy link

sonarcloud bot commented Aug 9, 2024

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unescape JSON Unicode sequences Pile-of-poo test for text-to-unicode
2 participants