Implement Wikidata reconciliation (was Freebase) [$265] #805

Open
ghirardinicola opened this Issue Sep 4, 2013 · 16 comments


@ghirardinicola

tfmorris suggests it could be an upgrade issue and that this service should not be used.
The reconciliation services I see are:

  • Sindice (dbpedia.org)
  • Sindice
  • Freebase Reconciliation Service (the one tested here)
  • Freebase Query-based Reconciliation (is this the only one supported right now?)

ERROR REPORTED
11:36:13.186 [ refine] POST /command/core/guess-types-of-column (9400ms)
11:36:16.806 [ command] Failed to guess cell types for load

There is a $265 open bounty on this issue. Add to the bounty at Bountysource.

@tfmorris tfmorris was assigned Sep 4, 2013
@tfmorris
Member
tfmorris commented Sep 4, 2013

The standard style of Freebase reconciliation is still supported, but I'm guessing that we're not correctly upgrading existing installations. This was an upgrade from Google Refine 2.5, correct?

The URL listed:

4.standard-reconcile.dfhuynh.user.dev.freebaseapps.com./reconcile

is the old reconciliation service. One clue is the "Could not fetch URL: http://api.freebase.com/api/service/search?read=15000" error message. That's the previous API endpoint which Google has decommissioned.

The new reconciliation service lives at

http://reconcile.freebaseapps.com/reconcile

As a workaround, you should be able to add it by hand (Add Standard Service button at bottom left of reconciliation dialog).

I'll have a look at what needs to be done to make the upgrade smoother.

@ghirardinicola

It's not an upgrade, actually; I am using the latest version from GitHub.
I was able to configure the new service by hand, thanks!

@cldwalker

@tfmorris Fwiw, I installed 2.5 on a mac last night and then when I installed 2.6-beta.1 I saw this issue. The suggested workaround works. Thanks!

@tfmorris
Member

@cldwalker Thanks for the confirmation. We'll have it fixed before the next kit goes out.

@ghirardinicola

I just downloaded the trunk (and cleaned the settings) and I still have this problem using Freebase reconciliation.
Creating a new reconciliation service using the new URL works.

This is the error:

    <h1>Error in <span class="script">//standard-reconcile.freebaseapps.com./reconcile</span></h1>

    <p class="msg">JS exception: acre.errors.URLError: urlfetch failed: 410</p>
@ghirardinicola ghirardinicola changed the title from Freebase Reconciliation Service hangs when selected (working...) to Freebase Reconciliation Service hangs when selected (working...) [$68] Apr 19, 2014
@thadguidry
Member

We just need to add support for the new Freebase /reconcile service on googleapis
https://developers.google.com/freebase/v1/reconciliation-overview

And ensure that an OpenRefine user has a dialog box where they can input their API key.

@tfmorris
Member

There are a couple of different problems described in this thread. The most recent problem is that the entire freebaseapps.com domain has been retired, so anything that lives on it, including our new Freebase reconciliation service, is gone.

@thadguidry The new reconciliation APIs have been supported since they were introduced. Unfortunately they were proxied through a service which has been shut down by Google. We can either: a) host the service somewhere else or b) special-case Freebase support differently from all the other reconciliation services and self-host it in the Refine server.

@thadguidry
Member

b) self host in Refine. I already spoke with David H. today about that actually and he also thought it was a good idea.

@ghirardinicola ghirardinicola changed the title from Freebase Reconciliation Service hangs when selected (working...) [$68] to Freebase Reconciliation Service hangs when selected (working...) [$165] Apr 23, 2014
@magdmartin
Member

If the reconciliation service is hosted on the machine running Refine, what is the impact on local resource consumption (RAM and processor)? Refine is already demanding on local resources for large projects; should we worry about adding another local service?

@thadguidry
Member

Martin, the design should allow the reconciliation service to be OFF by default and enabled through an OpenRefine preference.

@vladan-me

Eh, I was planning a whole project based on this feature and it broke... One alternative is to use Fetching URLs From Web Services, but that can be slow and limited (a single query per request), and it creates another column that I will have to rename after deleting the previous one. Do you have a better temporary workaround?

@ghirardinicola ghirardinicola changed the title from Freebase Reconciliation Service hangs when selected (working...) [$165] to Freebase Reconciliation Service hangs when selected (working...) [$265] Jun 6, 2014
@ghirardinicola

Do you have any advice on how to implement this?
To get started, I'd like to know where the code of the existing wrapper is.
Thanks!

@thadguidry thadguidry changed the title from Freebase Reconciliation Service hangs when selected (working...) [$265] to Implement Wikidata reconciliation (was Freebase) [$265] Jan 2, 2015
@thadguidry
Member

I have updated the bounty / issue to reflect the new needs for Wikidata Reconciliation. (Given that Freebase is going away and will be absorbed into Wikidata this spring)

The starting point for those interested looks like this:

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=Valve&language=en&type=item

Help for the Wikidata API is here: https://www.wikidata.org/w/api.php
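As a rough illustration of that starting point, here is a minimal Python sketch that builds the same wbsearchentities query and maps the hits into the candidate shape used by reconciliation responses (id, name, score, match, type). The function names, the rank-based score, and the sample ids are assumptions made for illustration only; nothing here is OpenRefine's or Wikidata's actual reconciliation code.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def build_search_url(term, language="en", limit=3):
    """Build a wbsearchentities URL like the one quoted above, asking for JSON."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return WIKIDATA_API + "?" + urlencode(params)


def to_candidates(api_response):
    """Map the API's "search" hits to reconciliation-style candidates.

    Assumption: the score is derived purely from rank (1, 1/2, 1/3, ...),
    "match" is never asserted automatically, and "type" is left empty.
    """
    candidates = []
    for rank, hit in enumerate(api_response.get("search", [])):
        candidates.append({
            "id": hit["id"],
            "name": hit.get("label", hit["id"]),  # fall back to the id if unlabeled
            "score": 1.0 / (rank + 1),
            "match": False,
            "type": [],
        })
    return candidates


def search(term):
    """Hypothetical helper: perform the HTTP request and return candidates."""
    with urlopen(build_search_url(term)) as response:
        return to_candidates(json.load(response))
```

Calling `search("Valve")` would issue the request from the URL above; `to_candidates` can also be exercised offline with a hand-written dictionary such as `{"search": [{"id": "Q_EXAMPLE_1", "label": "Valve"}]}`, where the id is made up.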

@tfmorris tfmorris added enhancement and removed bug labels Apr 30, 2015
@tfmorris tfmorris modified the milestone: 2.7, 2.6 Apr 30, 2015
@magnusmanske

I did implement one a while ago, partially. Will that do?
https://tools.wmflabs.org/wikidata-reconcile/

@thadguidry
Member
thadguidry commented Jun 25, 2016 edited

@magnusmanske
It's a bit more than just your server side on Wikidata :) Changes also need to be made on our client side in OpenRefine. Essentially, the Freebase Standard Reconcile default needs to be replaced. This requires working through and cleaning up most of the Java, JS, HTML, and JSON files here: https://github.com/OpenRefine/OpenRefine/search?utf8=%E2%9C%93&q=reconciliation+OR+reconcile+OR+recon&type=Code

The bounty gets awarded when both sides are done. The bounty can be split between developers if they wish, for instance, if you want to have someone do the client side improvements in our code, while you take credit and a partial bounty for the server side. Up to those involved, we don't care as long as it gets done properly and tests complete. Good luck, get others involved, or finish it all yourself :)

Also @magnusmanske, when OpenRefine performs the guess-types-of-column command, there are sometimes failures against the Wikidata reconcile service. I know that Wikidata doesn't really have types, but instead of displaying Q numbers, our users are looking to see name values like "automobile", "animal", etc., not Q1420 and Q729.

`er'>2.1758411128file_get_contents
( )../index.php:147

{"q0":{"result":[{"id":"Q2085381","score":0.5,"match":false,"type":[],"name":"publisher"},{"id":"Q649953","score":0.33333333333333,"match":false,"type":["Q618779"],"name":"Pulitzer Prize for Editorial Cartooning"},{"id":"Q871232","score":0.5,"match":true,"type":["Q4894405"],"name":"editorial"}],"total_search_results":818},"q1":{"result":[{"id":"Q700750","score":0.5,"match":false,"type":["Q215380"],"name":"Blank & Jones"},{"id":"Q6529244","score":0.33333333333333,"match":false,"type":["Q5"],"name":"Les Blank"},{"id":"Q18441355","score":0.25,"match":false,"type":["Q134556","Q7366"],"name":"Blank Space"}],"total_search_results":540}}
couldn't be parsed as JSON object
at com.google.refine.util.ParsingUtilities.evaluateJsonStringToObject(ParsingUtilities.java:131)
at com.google.refine.commands.recon.GuessTypesOfColumnCommand.guessTypes(GuessTypesOfColumnCommand.java:196)
at com.google.refine.commands.recon.GuessTypesOfColumnCommand.doPost(GuessTypesOfColumnCommand.java:89)
at com.google.refine.RefineServlet.service(RefineServlet.java:177)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)`
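The trace above suggests the response body carried extra output (apparently from file_get_contents in index.php) in front of the JSON, which would explain why it could not be parsed as a JSON object. The following Python sketch illustrates that failure mode and one defensive way a client could tolerate it. It is an assumption-laden illustration, not OpenRefine's parsing code (which is Java), and the cleaner fix is probably for the service to stop emitting anything before the JSON. The function name and the sample prefix are hypothetical.

```python
import json


def parse_reconcile_response(body):
    """Parse a reconciliation response, tolerating leading non-JSON output.

    Assumption: the real payload is the first JSON object in the body and
    any junk in front of it contains no "{" character.
    """
    try:
        return json.loads(body)
    except json.JSONDecodeError:
        start = body.find("{")
        if start == -1:
            raise  # nothing that looks like a JSON object; re-raise the error
        # raw_decode parses one JSON value beginning at `start`
        # and ignores whatever follows it.
        parsed, _end = json.JSONDecoder().raw_decode(body, start)
        return parsed


# Hypothetical polluted body: made-up warning text followed by valid JSON.
polluted = (
    "warning text emitted before the payload (made up for this example)\n"
    '{"q0": {"result": [{"id": "Q871232", "name": "editorial", "match": true}]}}'
)
result = parse_reconcile_response(polluted)
print(result["q0"]["result"][0]["name"])
```

A strict client would instead reject such a response and surface the prefix to the user, which makes the server-side problem easier to spot.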

