Cluster & Edit doesn't work anymore in OR 3.6.0 #5138

Closed
RolfBly opened this issue Aug 2, 2022 · 13 comments · Fixed by #5153
Labels
clustering Issues related to the clustering operation, to merge similar values in a text column Module: Frontend These issues involve working on HTML, CSS, and JavaScript code that affects the user interface. Type: Bug Issues related to software defects or unexpected behavior, which require resolution.

Comments

@RolfBly
Contributor

RolfBly commented Aug 2, 2022

To Reproduce

Select any appropriate column, Edit cells, Cluster and edit; tick some boxes, select Merge and recluster.

Current Results

The first time, the same clusters re-appear, it hasn't processed any. A subsequent time, a message "You must check some Edit? checkboxes for your edits to be applied." appears, but some boxes are ticked.
The problem appears on a data set of just over 5000 rows, 6 columns.
On a small sample from my dataset, the problem is less severe. It processes 1 of 2 clusters selected.

Expected Behavior

In version 3.5.2 everything still works fine.

Screenshots

Screenshot (Schermafbeelding 2022-08-02 165813)

Versions

  • Operating System: Windows 10 10.0.19044.1766
  • Browser Version: Google Chrome 103.0.5060.134 (Official build) (64-bits)
  • JRE or JDK Version: openjdk version "17.0.3" 2022-04-19
  • OpenRefine: 3.6.0
@RolfBly RolfBly added Type: Bug Issues related to software defects or unexpected behavior, which require resolution. Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators labels Aug 2, 2022
@elebitzero
Member

I tried to reproduce this on Windows 10, Java17, OpenRefine 3.6 with a dataset with 88k rows and 40 columns, but it worked fine for me. I thought maybe it is a timing issue, so I used developer tools to introduce a 10 second lag time in network requests, but it still worked fine.

Do you see any errors in the JavaScript console, or on the command line where OpenRefine server is running? If you can share your dataset, we can try to reproduce the issue with the same data.

Operating System: Windows 10 Pro 10.0.19044.1826
Browser: Google Chrome Version 103.0.5060.134 (Official Build) (64-bit)
JRE or JDK Version: java 17.0.2 2022-01-18 LTS
OpenRefine: 3.6 downloaded 8/2/2022

@RolfBly
Contributor Author

RolfBly commented Aug 3, 2022

Meanwhile,

  • I've asked in the Google group how to share the data set privacy-proof - there's no formal way for it. I can zip it with password but how to get the password safely to you?
  • I've tried the same dataset on a different machine with the same setup, problem appears there too
  • I made a recording showing the problem
  • I've asked my customer if they have any objection with me sharing the data set.

Just FYI, bear with me please.

Also, @wetneb, I noticed that too. Thanks for dotting the i.

@ostephens
Sponsor Member

@RolfBly can you share the recording?
Also I notice that @elebitzero asked:

Do you see any errors in the JavaScript console, or on the command line where OpenRefine server is running?

Are you able to check either of these?

@wetneb wetneb added clustering Issues related to the clustering operation, to merge similar values in a text column Module: Frontend These issues involve working on HTML, CSS, and JavaScript code that affects the user interface. labels Aug 3, 2022
@RolfBly
Contributor Author

RolfBly commented Aug 3, 2022

There are no errors in the console, it just stops as soon as the error message appears. The last line in the console is
14:58:11.688 [ compute-clusters_command] computed clusters [binning] in 196ms (29ms)
which is after a partially successful merge & recluster.
Then you get a new list in the cluster & edit window.
Partially, because a number of the names you handled in the previous step reappear; in other words, they have not been clustered. So you re-select and hit Merge & re-cluster. At that point the error window appears. Nothing changes in the console; no lines are added.

Unfortunately, my customer won't allow uploading the data or the screen recording here. If there's any other way of getting it to the right people, and to them only, please let me know.

That said, it's not that urgent. There is an easy workaround: just use version 3.5.2.

@ostephens
Sponsor Member

ostephens commented Aug 4, 2022

@RolfBly thanks.

So far I've failed to recreate this locally - but that doesn't really mean anything beyond knowing that the problem doesn't occur in all circumstances.

I know this is asking more effort from you, but to see whether we can recreate this on other datasets, are you able to share step by step what you did? For example: is the Cluster option run from a facet or from the Edit Cells menu? Do you have any facets applied when you do the clustering? That way we can eliminate any variations in workflow from the issue. (I'm assuming that just writing down the steps is easier, but an alternative might be to share a screen recording with any confidential information blurred out. I'm just trying to make sure I'm not missing any step when I try to recreate the issue locally.)

@RolfBly
Contributor Author

RolfBly commented Aug 4, 2022

Meanwhile, I've found an open source data set that I can hopefully reproduce the problem with. I'll have to do this in my spare time, so it may take a while.

There is one additional thing: my workspace is a Dropbox folder that gets auto-synced to the cloud, plus I've set the auto-save interval to 5 in openrefine.l4j.ini. This is because in the recent past, I've had a few important projects disappear or become corrupted, and I've never been able to find out why or how.

@ostephens
Sponsor Member

What's the dataset @RolfBly - I can have a look as well

@RolfBly
Contributor Author

RolfBly commented Aug 4, 2022

UFO_sightings.csv.zip
Here you go. The offending column in my case contains [name]|[iso-date] or just [name], where the name to cluster and edit may be something like

Doe, john, simon
Doe, John Simon
DOE, John

It probably should work with location names too.
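
To illustrate why name variants like these would be expected to land in the same cluster, here is a minimal sketch of a key-collision ("binning") fingerprint in Python. It is only an approximation of the general technique, not OpenRefine's actual implementation, and the function names `fingerprint` and `cluster` are hypothetical. Whether this exact keying method was the one selected in the dialog is also an assumption.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Approximate key-collision fingerprint (hypothetical helper, not OpenRefine code)."""
    # Trim surrounding whitespace and lowercase everything.
    s = value.strip().lower()
    # Fold accented characters to their base form (e.g. "é" becomes "e").
    s = unicodedata.normalize("NFKD", s)
    s = "".join(ch for ch in s if not unicodedata.combining(ch))
    # Drop ASCII punctuation such as commas.
    s = s.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, remove duplicate tokens, sort, and rejoin.
    return " ".join(sorted(set(s.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with 2+ members."""
    bins = defaultdict(list)
    for v in values:
        bins[fingerprint(v)].append(v)
    return [group for group in bins.values() if len(group) > 1]


names = ["Doe, john, simon", "Doe, John Simon", "DOE, John"]
clusters = cluster(names)
```

Under these assumptions, the first two names share the key `doe john simon` and form one cluster, while `DOE, John` keys to `doe john` and stays on its own.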

@ostephens
Sponsor Member

Thanks @RolfBly. Do you have a Text Facet displaying for the column (in the project Facet / Filter pane) while you are doing the clustering?

@RolfBly
Contributor Author

RolfBly commented Aug 4, 2022

No.

@RolfBly
Contributor Author

RolfBly commented Aug 4, 2022

I can reproduce the problem with this data set. Just create a project from the csv, on the city column go to edit, cluster & edit, select a few things, and hit Merge & re-cluster. Error. See screen recording.
https://user-images.githubusercontent.com/198277/182936737-3e4f9759-ecc1-4de8-a5c6-b150b39f7aac.mp4

Again, nothing happens in the console. Is there any way to turn on verbose reporting? I've done that before by setting a switch in Preferences, but that was for a plugin.

@elebitzero
Member

The first time, the same clusters re-appear, it hasn't processed any.

I cannot reproduce this part of your issue.

A subsequent time, a message "You must check some Edit? checkboxes for your edits to be applied." appears, but some boxes are ticked.

Once I saw your screen recording, I could easily reproduce this part. I had been clicking the 'Merge?' checkbox directly, but I saw that you were clicking the links that select the value to use and also check the 'Merge?' checkbox.

I found a bug that was introduced in 3.6.0 due to a mistake in the jQuery 3.6.0 migration. I am submitting a PR to fix it.

@RolfBly
Contributor Author

RolfBly commented Aug 5, 2022

The first time, the same clusters re-appear, it hasn't processed any.

On the data set provided, I didn't see this either. I'll wait & see if the fix resolves it for the set I can't share.

@tfmorris tfmorris removed the Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators label Oct 6, 2022

5 participants