Freebase API to be retired #38

Open
thatandromeda opened this Issue Dec 26, 2014 · 27 comments

5 participants

@thatandromeda
Member

Google's retiring the Freebase API on 30 June 2015. Parts of this code depend on Freebase. What's the fallback?

@edsu
Member
edsu commented May 17, 2015

Thanks @thatandromeda, it really looks like this is happening on June 30th. Wikidata have it on their roadmap to provide a Wikidata Suggest type of service, but who knows if it will be ready in time. Some work that needs to be done:

  • adjust schema to use Wikidata IDs
  • port over current Freebase IDs to Wikidata IDs
  • adjust curation interface to use Wikidata instead of Freebase

@edsu
Member
edsu commented May 17, 2015

This API call is being used by Wikidata's search and seems to have the basics of what we would need in the UI to select employers and tags.

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=encyc&format=json&language=en&type=item&continue=0

There is also a JSON-P callback parameter, which may help get around cross-origin restrictions (JavaScript on jobs.code4lib.org that wants to talk to wikidata.org).

https://www.wikidata.org/w/api.php?action=wbsearchentities&search=encyc&format=json&language=en&type=item&continue=0&callback=foo
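A minimal sketch of how the suggest UI might build these two request URLs. The helper name `wbsearchentities_url` is hypothetical; the parameters are exactly the ones in the URLs above.

```python
from urllib.parse import urlencode

# MediaWiki Action API endpoint on wikidata.org (taken from the URLs above).
WIKIDATA_API = "https://www.wikidata.org/w/api.php"


def wbsearchentities_url(search, language="en", callback=None):
    """Build a wbsearchentities URL like the ones quoted in the comment.

    If `callback` is given it is added as the JSON-P callback parameter,
    so the JSON response comes back wrapped in a call to that function.
    """
    params = {
        "action": "wbsearchentities",
        "search": search,
        "format": "json",
        "language": language,
        "type": "item",   # search items (Q-ids) rather than properties
        "continue": 0,    # result offset, for paging
    }
    if callback:
        params["callback"] = callback
    # dicts keep insertion order, so the query string matches the order above
    return WIKIDATA_API + "?" + urlencode(params)
```

Without `callback` the response is plain JSON; with it, the same data arrives as a JavaScript function call that a `<script>` tag on another origin can load.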

@edsu
Member
edsu commented May 17, 2015

One possible way to map our Freebase IDs to Wikidata IDs: https://gist.github.com/edsu/c95c9ae9f60ecdf80077

@tfmorris

Google has said that the shutdown will be delayed. I'm pretty sure it was mentioned on the Freebase mailing list, but I can't find the thread right now. If you look at the Wikidata Freebase project page, you'll see the same info:

  • In Q2 2015, a new KG-based Google API will be launched
  • Three months later at the earliest, the Freebase website will close (planned for Q3 2015)

Because we're already inside the three-month window before June 30, the API retirement won't be happening then.

I'd suggest deferring planning of your migration strategy until things are a little clearer, but here are a few random thoughts:

  • Wikidata includes Freebase IDs, so the need to migrate away from them isn't urgent
  • WDQ is experimental and the Wikidataians are debating what the "real" query API will look like
  • There is no good Wikidata-based Freebase Suggest replacement yet, as far as I know
  • Google has said that they'll be making available Knowledge Graph replacements for Freebase Search & Freebase Suggest, but hasn't published the transition plan (which they said would be available by the end of March, 2015)

The whole thing is kind of a mess, but it seems unlikely that Freebase will get shut down without a fair amount of notice, so I'd hold off committing to a transition plan until both Wikidata and Google firm up their plans.

If/when you need to map Freebase IDs to Wikidata IDs, this bulk dump might be easier to use than an API.

@edsu
Member
edsu commented May 18, 2015

Thanks for those details @tfmorris; I didn't know that the announcement on the Freebase website was out of date. Still, I think it should be doable to use the wbsearchentities API call for the suggest portion, and to use WDQ as a temporary way to turn a few thousand Freebase IDs into Wikidata IDs. I'd like to rip this Band-Aid off now rather than wait, but we'll see, since I'm the only person actually maintaining shortimer at this point and I have other things contending for my attention.

@tfmorris

@edsu - I think it's early days still for Wikidata and I have concerns about performance and stability of the API, but it's your call. I'd be happy to generate the ID mapping table for you, if that helps.

At a DPLA Hackathon a few years ago, we hacked up Freebase Suggest to work with the DPLA API. You might consider doing something similar for Wikidata. Suggest is actually one of the nicer autocomplete widgets out there (in my opinion).

https://github.com/scande3/dpla-discovery
http://static.digitalcommonwealth.org/dpla-discovery/

I don't know if you constrain your Suggest searches by type, etc., but if you're using the Freebase schema at all (types or properties), mapping to the Wikidata schema is another task that needs to be added to the list.

@edsu
Member
edsu commented May 18, 2015

The API may change, but it's hard to imagine it going away entirely after all the integration work that has gone on at Wikimedia. I'm ok with things changing -- in fact that's the best situation, because it means the service isn't dying, and people are working on it. Alas, the writing is definitely on the wall for Freebase.

The suggestions are constrained by type in a few places in shortimer: employer and location. I see that wbsearchentities has a type parameter that could maybe be used similarly. If someone puts together a mapping of types/properties, that would be very useful. I think I will be OK with mapping the IDs, but I will be in touch if it gets tricky.

@edsu
Member
edsu commented Dec 23, 2015

It looks like there may be a path forward using the Google Knowledge Graph, which now has an API. Google is also planning to add a suggest widget similar to the one Freebase offers, which is so important to the workflow here in shortimer.

Apparently even the Freebase identifiers are still being used, so there may not be a whole lot of cleanup work needed in the shortimer database. I think I would prefer to use Wikidata on principle, but it may be easier to transition to the Knowledge Graph.

@tfmorris

I think using KG Suggest is the right call. The KG Search API is much less powerful than the old Freebase Search API, but it should be fine for this application. The Wikidata Refine Reconciliation Service uses a wbsearchentities-followed-by-WDQ/SPARQL approach internally, and it doesn't appear to me that the search is very robust.

One of the things that I've got on my (long) list of spare time projects is to improve the coverage of matching for Freebase<->Wikidata mappings, which will help provide an escape path if it's needed in the future (plus having the Wikidata reconciliation service for OpenRefine should help with these types of mapping tasks).

BTW, the beta SPARQL endpoint is much faster than the experimental WDQ API, and the data is more current, if you ever have a need to query Wikidata.
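For the Freebase-to-Wikidata ID mapping discussed in this thread, a SPARQL lookup against that endpoint could look roughly like the sketch below. It is not the approach from the gist or the WDQ route mentioned earlier; it assumes Wikidata property P646 ("Freebase ID") holds MIDs in the `/m/...` form, and the function names and User-Agent string are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Public Wikidata Query Service SPARQL endpoint.
SPARQL_ENDPOINT = "https://query.wikidata.org/sparql"


def freebase_to_wikidata_query(mids):
    """Build a query mapping Freebase MIDs to Wikidata items via P646."""
    # json.dumps renders each MID as a double-quoted string literal
    values = " ".join(json.dumps(mid) for mid in mids)
    return (
        "SELECT ?fbid ?item WHERE {\n"
        f"  VALUES ?fbid {{ {values} }}\n"
        "  ?item wdt:P646 ?fbid .\n"
        "}"
    )


def parse_mapping(results):
    """Turn SPARQL JSON results into a {mid: Q-id} dict."""
    mapping = {}
    for row in results["results"]["bindings"]:
        # item values are entity URIs ending in the Q-id
        mapping[row["fbid"]["value"]] = row["item"]["value"].rsplit("/", 1)[-1]
    return mapping


def fetch_mapping(mids, user_agent="shortimer-migration-sketch/0.1"):
    """Run the query against the live endpoint (needs network access)."""
    url = SPARQL_ENDPOINT + "?" + urlencode({"query": freebase_to_wikidata_query(mids)})
    request = urllib.request.Request(url, headers={
        "Accept": "application/sparql-results+json",
        "User-Agent": user_agent,
    })
    with urllib.request.urlopen(request) as response:
        return parse_mapping(json.load(response))
```

MIDs with no match are simply absent from the returned dict, which makes the leftovers easy to list for manual review. For a few thousand IDs, sending them in batches of a few hundred per query is probably kinder to the endpoint.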

@tfmorris

p.s. My interpretation is that the 3 month clock doesn't start until the KG Suggest API is available too, so there's still some time...

@edsu
Member
edsu commented Dec 23, 2015

@tfmorris thanks for your comments. If you notice KG Suggest get announced and remember this issue it would be really helpful if you can add a note here. I feel like I only accidentally noticed the KG API announcement!

@edsu
Member
edsu commented Jan 27, 2016

In preparation, the shortimer db should be updated to store the Freebase Machine ID (MID) instead of the id that comes back from the suggest API. This will involve looking them up again.

@tfmorris

Freebase switched to MIDs for most purposes a while ago, so you may find that the IDs coming back from the Suggest API were MIDs already.

If you have historical /en/... IDs, you can look up the MID with this query:

https://www.googleapis.com/freebase/v1/mqlread/?lang=%2Flang%2Fen&query=%5B%7B+%22id%22%3A+%22%2Fen%2Fharvard_university%22%2C+%22mid%22%3A+null+%7D%5D

Replace the (encoded) /en/harvard_university with the link that you want to look up. If you've got a list of IDs, I'd be happy to look them up for you and generate a crosswalk.
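Rather than hand-encoding the query for each ID, the URL can be generated. A sketch that reproduces the query above, `[{ "id": ..., "mid": null }]`, with only cosmetic whitespace differences (the helper name is hypothetical, and the mqlread endpoint itself is part of the Freebase API being retired):

```python
import json
from urllib.parse import urlencode

# Freebase MQL read endpoint, as used in the URL above.
MQLREAD = "https://www.googleapis.com/freebase/v1/mqlread/"


def mid_lookup_url(freebase_id, lang="/lang/en"):
    """Build an mqlread URL asking for the MID of a legacy /en/... ID.

    "mid": null in the MQL query tells Freebase to fill that value in.
    """
    query = json.dumps([{"id": freebase_id, "mid": None}])
    # urlencode percent-encodes "/", quotes and brackets and turns spaces into "+"
    return MQLREAD + "?" + urlencode({"lang": lang, "query": query})
```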

BTW, haven't heard anything additional on shutdown timeframes...

@edsu added a commit that referenced this issue Jan 28, 2016
tighten up object lookups
over time the db has accumulated a fair number of subjects and
employers with duplicate names, which causes problems for
views that use a slugified version of the name.

this commit tightens up the lookups to use the freebase id
and also includes a new command line utility to help diagnose
and correct these duplicates.

refs #38
7d3d14c
@edsu
Member
edsu commented Jan 28, 2016

@tfmorris thanks for the update! I did get the database converted over to the mids. I looked them up by resolving URLs like:

https://www.googleapis.com/freebase/v1/topic/{freebase_id}

which seemed to work pretty well still...

@edsu
Member
edsu commented May 30, 2016 edited

It looks like the new Knowledge Graph Search Widget is available. Also some of the old Freebase API calls are starting to fail now, for example getting the location for an organization.

@edsu
Member
edsu commented Sep 10, 2016

Well, now the old Freebase APIs for looking up Employers and Locations are dead. So people can't enter in new jobs. I guess it would be good to move over to the Knowledge Graph API now ;-)

@edsu
Member
edsu commented Sep 10, 2016 edited

@tfmorris @danbri do you happen to know (or know someone who might know) why topical things like "Semantic Web" don't show up in the Knowledge Graph Search Widget? I get lots of books but not the topic. I even tried a Search API call to see if I could find the topic there, but I couldn't find it in 200 results.

Using the JSON-LD context I can see that Google have URIs for entities which is cool. So I can easily turn the old Freebase IDs into Knowledge Graph URIs. For example here's the URI for Semantic Web:

https://g.co/kg/m/076k0
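Judging only from this one example, the conversion from a stored MID to a Knowledge Graph URI appears to be a plain prefix. A sketch under that assumption (the function name is hypothetical):

```python
def mid_to_kg_uri(mid):
    """Turn a Freebase MID such as "/m/076k0" into a Knowledge Graph URI."""
    if not mid.startswith("/m/"):
        raise ValueError(f"expected a Freebase MID like '/m/...', got {mid!r}")
    # The URI is the g.co/kg prefix followed by the MID path.
    return "https://g.co/kg" + mid
```

Legacy `/en/...` IDs are rejected on purpose; they need the MID lookup first.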

So I can see the entity "Semantic Web" is in the Knowledge Graph, but how can I get the search widget to return it? Would one of the available entity types work?

@edsu
Member
edsu commented Sep 10, 2016

Maybe this is the push I need to move over to using Wikidata....

@danbri
danbri commented Sep 10, 2016

I don't know, but I'll see what I can find out.

@danbri
danbri commented Sep 10, 2016

(and +1 for Wikidata, regardless)

@danbri
danbri commented Sep 10, 2016

At a quick guess: is it only returning entities whose types are listed at https://developers.google.com/knowledge-graph/ (and mapped there to schema.org)?

@edsu
Member
edsu commented Sep 11, 2016 edited

Hmm, that does seem to be the case? Here are the types returned in the first 200 results when searching for 'semantic web' from the search API:

% curl --silent 'https://kgsearch.googleapis.com/v1/entities:search?query=semantic+web&key=AIzaSyDnh2jo5mhnf1EyIs2VQwc9H_bq1_RAgsE&limit=200&indent=True' | jq -r '.itemListElement[].result["@type"][]' - | sort | uniq -c | sort -rn
 124 Thing
  64 Person
  26 Organization
  21 Corporation
  20 Book
   8 Place
   4 EducationalOrganization
   3 CollegeOrUniversity
   1 Movie
   1 CivicStructure
   1 BookSeries
   1 AdministrativeArea
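The same tally as the `jq | sort | uniq -c | sort -rn` pipeline, sketched in Python over an already-parsed Search API response (fetching the JSON is left out, and the function name is hypothetical):

```python
from collections import Counter


def count_types(response):
    """Count @type values across a KG Search API response, most common first."""
    counts = Counter()
    for element in response.get("itemListElement", []):
        # each result can carry several types, e.g. ["Thing", "Person"]
        counts.update(element["result"].get("@type", []))
    return counts.most_common()
```

As in the jq version, one entity with several types contributes to several counts, which is why `Thing` tops the list.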

Unfortunately it seems like a lot of terms used to tag jobs in shortimer are rendered invisible in the KG search api ...

@edsu added a commit that referenced this issue Sep 22, 2016
wikidata_id
this is step one in moving from freebase to wikidata. I added wikidata_id
to the Employer, Location and Subject models. Then I added a migration to
look up the existing entities in Wikidata using Wikidata's SPARQL endpoint.
The matching logic thus far is:

1. Look up entity using the Freebase ID
2. Use the name of the entity to derive the Wikipedia URL and look that up
3. Search for the label

The next step is to purge entities that don't have Wikidata IDs, and then
to create new suggest functionality that uses Wikidata instead of Freebase.

refs #38
refs #57
ffb9d7d
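The three-step matching cascade in that commit message can be sketched as follows. This is not the migration's actual code: the helper names, the URL-from-name heuristic, and the SPARQL patterns (`schema:about` for Wikipedia sitelinks, `rdfs:label` for labels) are assumptions, and the lookup functions are passed in so the cascade can be tested without the network.

```python
from urllib.parse import quote


def wikipedia_url(name):
    """Derive an English Wikipedia URL from an entity name (heuristic).

    Percent-encoding may not match Wikidata's sitelink URIs for every title.
    """
    return "https://en.wikipedia.org/wiki/" + quote(name.replace(" ", "_"))


def sitelink_query(url):
    """SPARQL: find the item whose English Wikipedia article is `url`."""
    return f"SELECT ?item WHERE {{ <{url}> schema:about ?item . }}"


def label_query(name, lang="en"):
    """SPARQL: find items whose label exactly equals `name`."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'SELECT ?item WHERE {{ ?item rdfs:label "{escaped}"@{lang} . }}'


def match_wikidata_id(entity, by_freebase_id, by_wikipedia_url, by_label):
    """Try the three strategies in order and return the first Q-id found.

    Each by_* argument is a lookup function returning a Q-id or None,
    e.g. one that runs the matching SPARQL query above.
    """
    freebase_id = entity.get("freebase_id")
    if freebase_id:
        qid = by_freebase_id(freebase_id)
        if qid:
            return qid
    qid = by_wikipedia_url(wikipedia_url(entity["name"]))
    if qid:
        return qid
    return by_label(entity["name"])  # may still be None: no match at all
```

Ordering from most to least specific means an exact Freebase ID hit is never overridden by a fuzzier name-based match; entities that fall through all three are the ones the next step proposes to purge.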
@edsu
Member
edsu commented Sep 22, 2016 edited

I've been doing some preliminary work trying to migrate things to Wikidata. If you are interested you can track the work over on the wikidata branch.

@edsu
Member
edsu commented Sep 25, 2016 edited

Wikidata does offer an autosuggest API, but it doesn't allow you to limit results to particular entity types (locations, organizations, etc.). This leads to a lot of noise when looking things up. I also tried using the SPARQL endpoint with regex filters, but it seemed very unstable: there were lots of 502 errors. Perhaps that was just something else going on at the time, but it doesn't inspire much confidence as a foundation to build on.

Actually, it does look like other people were experiencing problems.

@edsu
Member
edsu commented Oct 4, 2016

So, even with the Wikidata SPARQL endpoint back to functioning normally, regex queries (which are what autosuggest needs) can still take multiple seconds to come back. Unfortunately this won't be good enough. The wbsearchentities API call is fast, but it doesn't return much information and can't be limited to entities of a particular type (Locations, Organizations, etc.).
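One way to approximate type-limited suggestions, close to the wbsearchentities-then-SPARQL approach mentioned earlier for the reconciliation service, is to take the fast text hits and filter them by class with a small `VALUES` query instead of a regex. A sketch: the function names are hypothetical, the class Q-id is left as a parameter, and whether the extra round trip is fast enough for interactive autosuggest is untested.

```python
def search_result_ids(response):
    """Pull the Q-ids out of a parsed wbsearchentities JSON response."""
    return [hit["id"] for hit in response.get("search", [])]


def type_filter_query(qids, class_qid):
    """SPARQL: keep only candidates that are instances of class_qid.

    wdt:P31/wdt:P279* reads "instance of (P31) the class itself or any
    of its subclasses (P279 followed zero or more times)".
    """
    values = " ".join(f"wd:{qid}" for qid in qids)
    return (
        "SELECT ?item WHERE {\n"
        f"  VALUES ?item {{ {values} }}\n"
        f"  ?item wdt:P31/wdt:P279* wd:{class_qid} .\n"
        "}"
    )
```

Because the query only checks a handful of known items instead of scanning labels, it avoids the regex cost, at the price of a second request per keystroke (or per debounced batch).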

So, my current thinking is to use the entities that have already been collected in jobs.code4lib.org and run autosuggest against them, and let people enter new entities as needed. This will have the downside that they aren't mapped to Google Knowledge Graph or Wikidata, but I just don't have the cycles to do that at the moment...and the site risks dying completely if it's not possible to post new jobs.
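The fallback described here, suggesting from entities already collected in jobs.code4lib.org, could be as simple as a case-insensitive match that ranks prefix hits before substring hits. An in-memory sketch; in the site itself this would presumably be a database query, and the `autosuggest` name and ranking rule are assumptions:

```python
def autosuggest(query, names, limit=10):
    """Suggest existing entity names: prefix matches first, then substring matches."""
    q = query.casefold().strip()
    if not q:
        return []
    prefix = [n for n in names if n.casefold().startswith(q)]
    contains = [n for n in names if q in n.casefold() and not n.casefold().startswith(q)]
    # alphabetical within each group, case-insensitively
    ranked = sorted(prefix, key=str.casefold) + sorted(contains, key=str.casefold)
    return ranked[:limit]
```

For example, typing "univ" against a list of employers surfaces "University of ..." entries ahead of names that merely contain "University".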

@sprater
sprater commented Oct 12, 2016

Could the Geonames service be used to look up institutions and locations? It has a rich and snappy API, and support for linked data: http://www.geonames.org/

@edsu
Member
edsu commented Oct 18, 2016 edited

It could, but that's only part of the puzzle. Unfortunately I don't have the bandwidth to fully address this problem. I'm planning on shutting the site down on November 1st after making static snapshots of the data and website available on Internet Archive.
