Cluster and edit changes space character in values to nbsp #5581

Closed
fitnycdigitalinitiatives opened this issue Jan 24, 2023 · 6 comments · Fixed by #5584
Labels: clustering; Type: Bug

Comments

@fitnycdigitalinitiatives

When using cluster and edit, selecting an existing value by clicking on its hyperlinked option replaces any spaces in the original values with non-breaking spaces. See here.

To Reproduce

Steps to reproduce the behavior:

  1. Use cluster and edit on a column of data
  2. In the interface, select one of the clustering options by clicking on its hyperlink, then select merge selected and close.
  3. The new values all have their standard spaces replaced with non-breaking spaces

Current Results

Spaces are replaced with non-breaking spaces, which can cause inconsistencies with the rest of the data

Expected Behavior

Should not alter original data in this way

Screenshots

Versions

  • Operating System: Mac
  • Browser Version: Chrome
  • JRE or JDK Version:
  • OpenRefine: 3.6.2

Datasets

Additional context

@fitnycdigitalinitiatives fitnycdigitalinitiatives added Type: Bug Issues related to software defects or unexpected behavior, which require resolution. Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators labels Jan 24, 2023
@thadguidry
Member

In version 3.7, I cannot reproduce this using key collision (Daitch-Mokotoff) or nearest neighbor (Levenshtein).

@wetneb wetneb added the clustering Issues related to the clustering operation, to merge similar values in a text column label Jan 25, 2023
@wetneb
Member

wetneb commented Jan 25, 2023

Argh, I should have thought about this! I indeed introduced those non-breaking spaces in the UI to make sure that options which differed in spaces were rendered differently in the dialog, but I did not think the values were read from the DOM when merging a cluster. Sorry about this! I'll work on a fix asap.
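The mechanism described above can be illustrated with a minimal sketch. This is not OpenRefine's code (its dialog is JavaScript and the actual fix in #5584 may look quite different); the function names `render_for_display`, `merge_target_from_rendered`, and `make_choice` are hypothetical and only model the idea that a value altered for display must not be read back as data.

```python
NBSP = "\u00a0"  # non-breaking space, shown as \xa0 in Python reprs


def render_for_display(value: str) -> str:
    """Swap regular spaces for non-breaking ones so they stay visible when rendered."""
    return value.replace(" ", NBSP)


def merge_target_from_rendered(rendered: str) -> str:
    """Buggy pattern: take the merge value from the rendered text."""
    return rendered


def make_choice(value: str) -> dict:
    """Safer pattern: keep the untouched value next to its rendered form."""
    return {"raw": value, "rendered": render_for_display(value)}


original = "DEW TECHNOLOGIES"
choice = make_choice(original)

buggy = merge_target_from_rendered(choice["rendered"])
fixed = choice["raw"]

print(buggy == original)  # False: the space became a non-breaking space
print(fixed == original)  # True: the raw value was never altered
```

Keeping the raw value alongside the rendered one means the display layer can change freely without touching the data that gets merged.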

@wetneb wetneb removed the Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators label Jan 25, 2023
@fitnycdigitalinitiatives
Author

Thanks @wetneb

And @thadguidry, I'm sorry I wasn't clear in my op, but it occurs when values already have spaces in them, so I think if you check 'small size' you'll see what I'm talking about.

@wetneb
Member

wetneb commented Jan 25, 2023

Note that I could only reproduce this bug with Chromium, not with Firefox. I think it is known that they do not handle non-breaking spaces in the same way (or at least, until recently).

@robroc

robroc commented Jan 31, 2023

If I may add to this issue... this non-breaking space added during clustering appears as \xa0 when the data is exported as CSV and loaded into pandas.

It also prevents text filtering in OpenRefine when using a normal space. For example, after clustering, the string DEW TECHNOLOGIES doesn't return when you search for DEW (with a regular space after the word). The only fix is to copy the whitespace character from the string (in a facet, for example), and do a search-replace with a regular space.

I'm using version 3.6.2 on Windows, Firefox.
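For data that was already exported with the stray character, a small cleanup along these lines may help. It is only a sketch using the Python standard library: the one-column CSV and the helper `normalize_spaces` are made up for illustration, and the same `str.replace` idea should carry over to a pandas column.

```python
import csv
import io

NBSP = "\u00a0"  # the character that shows up as \xa0


def normalize_spaces(text: str) -> str:
    """Replace non-breaking spaces with regular spaces."""
    return text.replace(NBSP, " ")


# Hypothetical export containing a value affected by the bug.
exported = "name\nDEW\u00a0TECHNOLOGIES\n"

rows = list(csv.DictReader(io.StringIO(exported)))
before = rows[0]["name"]
after = normalize_spaces(before)

print(repr(before))       # 'DEW\xa0TECHNOLOGIES'
print("DEW " in before)   # False: a search with a regular space misses it
print("DEW " in after)    # True
```

This also shows why the text filter fails: the value contains no regular space at all until the character is replaced.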

@wetneb
Member

wetneb commented Jan 31, 2023

This issue should be fixed in 3.7-beta5.
