Handling of conversions near punctuation #81

Closed
RichardForshaw opened this issue Sep 20, 2022 · 5 comments

Comments

@RichardForshaw

I recently upgraded unidecode and saw a failing test.

The test in question:

"Pickup 65” TV from Platform 9¾, Kingʹs Cross Station."

The result:

- Pickup 65" TV from Platform 93/4, King's Cross Station.
+ Pickup 65" TV from Platform 9 3/4 , King's Cross Station.

I think that separating the 9 from the 3/4 is a good idea, since it distinguishes the result from 93/4 (which the original is not). However, a space is also placed between the 3/4 and the comma, which does not read well.

Not a major issue, but probably something that will bug people.

@avian2
Owner

avian2 commented Sep 20, 2022

The extra space was introduced in b8af436 to prevent fractions from merging with adjacent numbers.
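
To illustrate the merging problem that the padding addresses, here is a minimal sketch. The replacement tables and the `transliterate` helper are hypothetical and exist only for illustration; they are not Unidecode's actual data or code.

```python
# Hypothetical per-character replacement tables; illustrative only,
# not Unidecode's real tables.
NO_SPACE = {"\u00be": "3/4"}       # vulgar fraction three quarters, unpadded
BOTH_SPACES = {"\u00be": " 3/4 "}  # padded on both sides


def transliterate(text, table):
    """Replace each character found in `table`; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


merged = transliterate("Platform 9\u00be,", NO_SPACE)
padded = transliterate("Platform 9\u00be,", BOTH_SPACES)
print(repr(merged))  # 'Platform 93/4,'
print(repr(padded))  # 'Platform 9 3/4 ,'
```

Without padding the digit 9 runs into the fraction; with padding on both sides the digit is separated, but a stray space appears before the comma.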

@RichardForshaw
Author

Yes, I think that adding the extra space at the start to prevent the merging is good, but I wonder whether the extra trailing space is needed. I can't currently think of any example where something placed directly after a fraction such as ¾ would need a separating space. I expect any trailing number would already have a space before it in the original string. (Preceding numbers may not, in which case I agree that introducing a space there is a good thing.)

@avian2
Owner

avian2 commented Sep 23, 2022

@IamJeffG Since you contributed the commit that added the spaces, do you have any objection to removing the trailing space?

@IamJeffG
Contributor

I have no objection to that change. I do often deal with ranges like "¼–½", but I'm just as happy to receive "1/4-1/2" as "1/4 - 1/2".

Anyone out there who is parsing strings like "½¾" or "¾9" would view the change as a regression, but that seems very unlikely. In those cases I'm not even sure we can intuit what the expected behavior ought to be.
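
As a rough sketch of how these edge cases would come out with a leading space only, assuming a hypothetical replacement table (the table and the `transliterate` helper are illustrative, not Unidecode's real data or code):

```python
# Hypothetical table with a leading space only; illustrative, not Unidecode's data.
LEADING_ONLY = {
    "\u00bc": " 1/4",  # vulgar fraction one quarter
    "\u00bd": " 1/2",  # vulgar fraction one half
    "\u00be": " 3/4",  # vulgar fraction three quarters
    "\u2013": "-",     # en dash
}


def transliterate(text, table):
    """Replace each character found in `table`; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


range_case = transliterate("\u00bc\u2013\u00bd", LEADING_ONLY)
adjacent_fractions = transliterate("\u00bd\u00be", LEADING_ONLY)
trailing_digit = transliterate("\u00be9", LEADING_ONLY)

print(repr(range_case))          # ' 1/4- 1/2'
print(repr(adjacent_fractions))  # ' 1/2 3/4'
print(repr(trailing_digit))      # ' 3/49'
```

Only the last case becomes ambiguous, since the trailing 9 merges with the denominator.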

@avian2
Owner

avian2 commented Sep 28, 2022

Thank you both for your comments. I'm removing the trailing space in the replacements for fractions. I'll be releasing a new version of Unidecode with this change shortly.
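
A minimal sketch of the intended result on the test string from this issue, assuming a hypothetical replacement table that pads fractions with a leading space only (the table and the `transliterate` helper are illustrative, not the library's actual implementation):

```python
# Hypothetical replacement table, for illustration only; Unidecode's real
# tables are far larger and are not reproduced here.
LEADING_SPACE_ONLY = {
    "\u201d": '"',     # right double quotation mark
    "\u02b9": "'",     # modifier letter prime
    "\u00be": " 3/4",  # vulgar fraction three quarters, leading space only
}


def transliterate(text, table):
    """Replace each character found in `table`; leave all others unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


source = "Pickup 65\u201d TV from Platform 9\u00be, King\u02b9s Cross Station."
result = transliterate(source, LEADING_SPACE_ONLY)
print(result)  # Pickup 65" TV from Platform 9 3/4, King's Cross Station.
```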

@avian2 avian2 closed this as completed in 92147e6 Sep 28, 2022
marcoffee pushed a commit to marcoffee/unidecode that referenced this issue Dec 14, 2022
This prevents adding an extra space between fraction and punctuation in strings
like "Platform 9¾, Kingʹs Cross Station".

Closes avian2#81
marcoffee pushed a commit to marcoffee/unidecode that referenced this issue Dec 15, 2023
This prevents adding an extra space between fraction and punctuation in strings
like "Platform 9¾, Kingʹs Cross Station".

Closes avian2#81