Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

inconsistent handling of graphemes πŸ‘©πŸ½β€πŸŒΎπŸ§ŸπŸ§žβ€β™€οΈπŸ¦Ήβ€β™‚οΈπŸ§”πŸ§›πŸΎβ€β™‚οΈ #73

Closed
drahnr opened this issue Jul 11, 2020 · 11 comments
Labels
bug Something isn't working internal βš™οΈ Internal issues or TODOs

Comments

@drahnr
Copy link
Owner

drahnr commented Jul 11, 2020

Describe the bug
In various places, the length used for offset calculation is used in bytes, which breaks Dow when multicharacter graphemes are used in literals.

To Reproduce
Currently there is a lack of unit tests to cover this issue sufficiently ( = none)

Expected behavior
Spans must be calculated based on graphemes (superset of chars) OR bytes consistently.

@drahnr drahnr added bug Something isn't working internal βš™οΈ Internal issues or TODOs labels Jul 11, 2020
@drahnr drahnr added this to the v0.3.0 milestone Jul 14, 2020
@drahnr
Copy link
Owner Author

drahnr commented Jul 15, 2020

Requires an external library https://crates.io/crates/unicode-segmentation since String does not handle that too well https://doc.rust-lang.org/std/string/struct.String.html#method.chars

CC @laysauchoa

@drahnr drahnr modified the milestones: v0.3.0, v0.4.0 Jul 16, 2020
@drahnr
Copy link
Owner Author

drahnr commented Jul 23, 2020

Since graphemes are not even displayed in vscode, I think it's safe to say that operations on unicode chars is sufficient.

@laysauchoa
Copy link
Collaborator

laysauchoa commented Jul 25, 2020

CC @drahnr

My vscode actually shows emojis and I see that it gets read as it was a misspelled word.

cargo-spellcheck-emojis

although, I am not sure if it is common to use emojis in documentation. ^^

@drahnr
Copy link
Owner Author

drahnr commented Jul 25, 2020

It's more common than you'd expect, and you can see the issue already above, it should be one ^ instead of 4 (and it's not even a grapheme)

@drahnr
Copy link
Owner Author

drahnr commented Jul 29, 2020

@laysauchoa this should not be the case anymore, is it?

@laysauchoa
Copy link
Collaborator

laysauchoa commented Jul 30, 2020

cc @drahnr

yes ^^, this is not the case anymore.

error: spellcheck(Hunspell)
   --> /home/tmhdev/Documents/cargo-spellcheck/src/suggestion.rs:23
    |
 23 | ..ation. This is a very logn sentence with some errors and emojis Search Resul πŸ˜‹πŸ˜‹
    |                                                                          ^^^^^
    | - Result or Restful
    |
    |   Possible spelling mistake found.
    |

How did it ignore emoji now or do not recognize as an error? Just curious!

Although, would this issue still be relevant due to some misalignment when emojis are present?

error: spellcheck(Hunspell)
   --> /home/tmhdev/Documents/cargo-spellcheck/src/suggestion.rs:23
    |
 23 | ... This is a very logn πŸ˜‹πŸ˜‹πŸ˜‹πŸ˜‹ sentence with some errors and emojis Search Resul πŸ˜‹πŸ˜‹
    |                                                             ^^^^^^
    | - emoji
    |
    |   Possible spelling mistake found.
    |

@drahnr
Copy link
Owner Author

drahnr commented Jul 30, 2020

Would you like to look into the offset issue, I would have expected this to be ok with the 0.3.0 release at least.

Could you create a PR with an end2end style test showing the issue, so this is easily reproducable, no matter if you want to fix it or not, that'd be very helpful already!

@laysauchoa
Copy link
Collaborator

laysauchoa commented Jul 30, 2020

Hi @drahnr.

Do you mean the offset issue when the emojis are being used, such as the example above?

Do you mean a test such as those ones [0]?
[0] https://github.com/drahnr/cargo-spellcheck/blob/master/src/suggestion.rs#L699

If yes, then yes, I can try to add. If not, please give me more info, and I will check it.

@drahnr
Copy link
Owner Author

drahnr commented Jul 31, 2020

Hi @drahnr.

Do you mean the offset issue when the emojis are being used, such as the example above?

Do you mean a test such as those ones [0]?
[0] https://github.com/drahnr/cargo-spellcheck/blob/master/src/suggestion.rs#L699

If yes, then yes, I can try to add. If not, please give me more info, and I will check it.

Yes, that's what I meant :)

@drahnr drahnr modified the milestones: v0.4.0, v0.5.0 Aug 17, 2020
@drahnr
Copy link
Owner Author

drahnr commented Aug 20, 2020

The largest remaining issue here is that the display of graphemes might be individual smilies but also combined into a single character so the width / offset of ^^^^ might be wrong.
The resolution here would be a way to probe the terminal and resolve that, plus a grapheme library, but this smells like distant future.

@drahnr drahnr removed this from the v0.5.0 milestone Aug 20, 2020
@drahnr
Copy link
Owner Author

drahnr commented Sep 10, 2020

This is an edge case, and unless somebody actually uses graphemes AND a terminal that can display them, this won't be an issue. This is not on the agenda for any-time-soonβ„’. Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working internal βš™οΈ Internal issues or TODOs
Projects
None yet
Development

No branches or pull requests

2 participants