added Reconciliation Services disappear on restart #2627

Closed
chsteiner opened this issue May 19, 2020 · 9 comments · Fixed by #2657
Labels: "Priority: High", "reconciliation", "Type: Bug"

@chsteiner

chsteiner commented May 19, 2020

Describe the bug
All added Reconciliation Services other than the default "Wikidata (en)" disappear on restart.
The only hint I get in the console could be this line: [FileProjectManager] com.fasterxml.jackson.databind.JsonMappingException: Invalid UTF-8 middle byte 0x72
at [Source: (File); line: 1, column: 2280] (through reference chain: com.google.refine.io.FileProjectManager["preferences"]->com.google.refine.preference.PreferenceStore["entries"]) (368ms)

This only happens with OpenRefine 3.3 and 3.4 (Beta). I also tested 3.1 and 3.2 and encountered no problems there!

I tested it on two different Windows machines with the same bug persisting on both.

To Reproduce
Steps to reproduce the behavior:
Just add any reconciliation service, e.g. https://lobid.org/gnd/reconcile (GND service), then shut down OpenRefine and restart it.

Expected behavior
All added reconciliation services should be preserved on restart.

Desktop (please complete the following information):

  • OS: Windows 10
  • Browser Version: Chrome Version 81.0.4044.138 (64-Bit)
  • JRE or JDK Version: java version "1.8.0_251"

OpenRefine (please complete the following information):

  • Version 3.3

Datasets
All of my datasets are affected.

@wetneb wetneb added Type: Bug Issues related to software defects or unexpected behavior, which require resolution. Priority: High Denotes issues that require urgent attention and may be blocking progress. reconciliation Related to the reconciliation operations and other features Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators labels May 19, 2020
@tfmorris
Member

Sorry to hear about your trouble. Could we get a copy of your workspace.json file (and workspace.old.json if it exists)? To find it, click on "Open Project" then "Browse Workspace Directory" at the bottom of the page.

I suspect that there is something corrupted in it (probably not properly escaped by us).

@chsteiner
Author

Hi, thanks for offering help! Both files are included in this zip file:

workspace.zip

@tfmorris
Member

Thanks. I don't see anything obviously bad about those files and when I add the GND service, it remains after restart.
Do you get the error every time or was it just a one time thing?
How are you shutting down / restarting OpenRefine?

@chsteiner
Author

I get it every time, and on two different laptops, so I thought it should be reproducible.
I am shutting down using ctrl+c in the console.

@chsteiner chsteiner reopened this May 22, 2020
@chsteiner
Author

chsteiner commented May 29, 2020

Any ideas what I can try? This is really slowing down my work. I also teach some units about OpenRefine and literally everyone using a Windows machine has this problem. So it must be reproducible for you. This problem persists in 3.4 Beta, by the way!

I can narrow it down to the POST /command/core/set-preference command. Every time this command is run, we get this after a restart:

[FileProjectManager] com.fasterxml.jackson.databind.JsonMappingException: Invalid UTF-8 middle byte 0x72
 at [Source: (File); line: 1, column: 2277] (through reference chain: com.google.refine.io.FileProjectManager["preferences"]->com.google.refine.preference.PreferenceStore["entries"]) (353ms)

@tfmorris
Member

Googling the error makes me think that it's due to the wrong file encoding being used during JSON serialization on platforms that don't have UTF-8 as their default encoding (e.g. Windows). It's also only likely to be triggered when the JSON has non-ASCII characters in it (e.g. "Körperschaft").
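A minimal, self-contained Java sketch of the suspected failure mode follows. It is not OpenRefine's code, and the class and method names are hypothetical. It encodes a JSON string containing "Körperschaft" with ISO 8859-1, standing in for a non-UTF-8 platform default, and then checks the bytes with a strict JDK UTF-8 decoder. In ISO 8859-1, 'ö' is the single byte 0xF6, which a UTF-8 reader treats as the start of a multi-byte sequence; the following 'r' is 0x72, which is not a valid continuation byte. That matches the shape of the "Invalid UTF-8 middle byte 0x72" message, assuming the offending text had a character like 'ö' right before an 'r'.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    /** True if the bytes are well-formed UTF-8, using a decoder that reports errors instead of replacing them. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Encodes text the way a writer configured with the given charset would. */
    static byte[] encode(String text, Charset charset) {
        return text.getBytes(charset);
    }

    public static void main(String[] args) {
        // The escape below is the letter o-umlaut; escaping keeps this source file ASCII-only.
        String json = "{\"label\":\"K\u00f6rperschaft\"}";

        // Stand-in for a non-UTF-8 platform default such as ISO 8859-1.
        byte[] latin1 = encode(json, StandardCharsets.ISO_8859_1);
        int i = json.indexOf('\u00f6'); // one byte per char in ISO 8859-1, so char index == byte index
        System.out.printf("ISO 8859-1 bytes: 0x%02X 0x%02X%n", latin1[i] & 0xFF, latin1[i + 1] & 0xFF);
        System.out.println("ISO 8859-1 output is valid UTF-8? " + isValidUtf8(latin1));

        // Naming UTF-8 explicitly yields bytes that a UTF-8 reader accepts.
        byte[] utf8 = encode(json, StandardCharsets.UTF_8);
        System.out.println("UTF-8 output is valid UTF-8? " + isValidUtf8(utf8));
    }
}
```

Running `main` prints the bytes 0xF6 0x72, then `false` for the ISO 8859-1 output and `true` for the UTF-8 output.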

As a workaround, you could try adding -Dfile.encoding=UTF-8 to the Java command line in your refine.bat file. I'm not sure offhand if this will have any negative side effects on other file I/O, but I suspect not (everyone should be using UTF-8 these days anyway).

> literally everyone using a Windows machine has this problem. So it must be reproducible for you.

That assumes, of course, that I'm running Windows, which is not the case.

I'll investigate a permanent fix to the JSON serializer.

@tfmorris
Copy link
Member

Perhaps related to #2543 #2544

tfmorris added a commit to tfmorris/OpenRefine that referenced this issue May 30, 2020
Fixes OpenRefine#2543 OpenRefine#2544 OpenRefine#2627.

Always use UTF-8 to write JSON because platform default encoding
might not be legal JSON (e.g. ISO 8859-1).

Also be more conservative about keeping backups if we fail to write.
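The commit message describes two changes: always writing JSON as UTF-8, and keeping backups when a write fails. A minimal Java sketch of that general pattern follows, assuming a write-to-temp-then-replace approach. It is not the code from the fix; `SafeJsonSave`, `save` and the `.tmp` suffix are hypothetical, and only the file names workspace.json and workspace.old.json come from this thread.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SafeJsonSave {

    /**
     * Writes json to target as UTF-8. The new content goes to a temporary sibling file first;
     * only once that succeeds is the previous target copied to backup and then replaced.
     */
    static void save(Path target, Path backup, String json) throws IOException {
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        // Name the charset explicitly instead of relying on the platform default.
        try (Writer w = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            w.write(json);
        }
        if (Files.exists(target)) {
            Files.copy(target, backup, StandardCopyOption.REPLACE_EXISTING);
        }
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("prefs-demo");
        Path target = dir.resolve("workspace.json");
        Path backup = dir.resolve("workspace.old.json");
        save(target, backup, "{\"label\":\"K\u00f6rperschaft\"}");
        save(target, backup, "{\"label\":\"GND\"}");
        // Read back as UTF-8 (Java 8 compatible); the first save is now the backup.
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        System.out.println(new String(Files.readAllBytes(backup), StandardCharsets.UTF_8));
    }
}
```

Because the new content is fully written before the old file is touched, an exception while writing leaves workspace.json and its backup as they were; at worst a stray .tmp file remains.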
@tfmorris tfmorris added this to the 3.4 milestone May 30, 2020
@wetneb
Sponsor Member

wetneb commented May 30, 2020

@tfmorris I can try to reproduce this on a Windows machine in mid-June and check your PR too.

@chsteiner
Author

chsteiner commented Jun 4, 2020

[screenshot]
I have actually always set this parameter in the JRE, but it didn't seem to help with this problem.
However, adding -Dfile.encoding=UTF-8 to openrefine.l4j.ini did the trick!
Many thanks for taking care of it!
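For Windows users who launch OpenRefine through openrefine.exe, the workaround the reporter found amounts to one extra line in openrefine.l4j.ini. The excerpt below assumes the Launch4j convention of one JVM option per line with `#` comments; any options already in the file (such as memory settings) should be left in place.

```ini
# Force the JVM default charset to UTF-8 (workaround for this issue)
-Dfile.encoding=UTF-8
```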

tfmorris added a commit to tfmorris/OpenRefine that referenced this issue Jun 4, 2020
Fixes OpenRefine#2543. Fixes OpenRefine#2627.

Always use UTF-8 to write JSON because platform default encoding
might not be legal JSON (e.g. ISO 8859-1).

Also be more conservative about keeping backups if we fail to write.
wetneb pushed a commit that referenced this issue Jun 6, 2020
…2657)

* Save preferences JSON using UTF-8 encoding. Bulletproof prefs load.

Fixes #2543. Fixes #2627.

Always use UTF-8 to write JSON because platform default encoding
might not be legal JSON (e.g. ISO 8859-1).

Also be more conservative about keeping backups if we fail to write.

* Handle case where backup prefs is better than more recent

* Recover from corrupted prefs with null starred list.

Fixes #2544. Replaces null with an empty list.

* Run tests with non-UTF-8 encoding

Make sure that we don't depend on UTF-8 being the default encoding
because it isn't true everywhere (e.g. Windows)

* Add test for non-ASCII chars in workspace.json

This depends on the default Java encoding being something
other than UTF-8 to test properly.
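The "bulletproof prefs load", backup-preference and null-starred-list items in this PR suggest a loader that does not give up on the first bad file. One possible shape is sketched below in plain Java, under these assumptions: candidate files are tried in order (current file, then backup), bytes are decoded strictly as UTF-8 so a mis-encoded file is rejected rather than silently garbled, and the parser is a stand-in for a real JSON library call. All names are hypothetical; this is not OpenRefine's implementation and does not reproduce the exact rules the PR uses to decide when the backup is better.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class FallbackPrefsLoad {

    /** Decodes a file strictly as UTF-8; malformed bytes throw (CharacterCodingException is an IOException). */
    static String readUtf8Strict(Path p) throws IOException {
        byte[] bytes = Files.readAllBytes(p);
        return StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    /** Returns the first candidate that exists, decodes and parses; parse should throw on bad input. */
    static <T> Optional<T> loadFirstGood(List<Path> candidates, Function<String, T> parse) {
        for (Path p : candidates) {
            if (!Files.exists(p)) continue;
            try {
                return Optional.of(parse.apply(readUtf8Strict(p)));
            } catch (IOException | RuntimeException e) {
                System.err.println("Skipping " + p + ": " + e);
            }
        }
        return Optional.empty();
    }

    /** A null list read from a damaged file becomes an empty list instead of failing later. */
    static <E> List<E> orEmpty(List<E> list) {
        return list == null ? Collections.<E>emptyList() : list;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("prefs-load-demo");
        Path current = dir.resolve("workspace.json");
        Path backup = dir.resolve("workspace.old.json");
        // The "current" file holds ISO 8859-1 bytes, which are not valid UTF-8.
        Files.write(current, "{\"label\":\"K\u00f6rperschaft\"}".getBytes(StandardCharsets.ISO_8859_1));
        Files.write(backup, "{\"label\":\"GND\"}".getBytes(StandardCharsets.UTF_8));

        // Stand-in parser; a real one would build the preference object from the JSON text.
        Function<String, String> parse = s -> {
            if (!s.startsWith("{")) throw new IllegalArgumentException("not a JSON object");
            return s;
        };
        System.out.println(loadFirstGood(Arrays.asList(current, backup), parse).orElse("<defaults>"));
    }
}
```

Running `main` should log that workspace.json was skipped and then print the backup's contents.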
@tfmorris tfmorris removed the Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators label Jun 22, 2020
@tfmorris tfmorris self-assigned this Jul 15, 2020