spice for looking up doi’s #119

Closed
wants to merge 8 commits into from

3 participants

@nomeata

I tried to implement https://duckduckhack.uservoice.com/forums/5168-plugins/suggestions/3552182-show-bibliographic-data-when-a-doi-is-entered-

There are two issues where I need a bit of help:

  • The better data provider requires the "Accept" header to be set appropriately to return JSON data, see http://crosscite.org/cn/ for details. I could not figure out how to configure that.
  • When the web service returns 404, the data is still returned as such to the client. Shouldn’t the proxy handle that more gracefully? Is that related to wrap_jsonp_callback?
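
For reference, the content negotiation in question boils down to requesting the same DOI URL with a different Accept header. A minimal sketch in JavaScript, assuming the media types that appear later in this thread; `buildDoiRequest` and `DOI_MEDIA_TYPES` are hypothetical names for illustration and not part of the spice framework:

```javascript
// Media types taken from this thread; the names below are illustrative only.
const DOI_MEDIA_TYPES = {
  json: 'application/vnd.citationstyles.csl+json',
  bibtex: 'application/x-bibtex',
};

// Describe the request a proxy would have to send to the DOI resolver
// in order to get machine-readable metadata instead of an HTML page.
function buildDoiRequest(doi, format) {
  if (!Object.prototype.hasOwnProperty.call(DOI_MEDIA_TYPES, format)) {
    throw new Error('Unknown format: ' + format);
  }
  return {
    url: 'http://dx.doi.org/' + doi,
    headers: { Accept: DOI_MEDIA_TYPES[format] },
  };
}

const exampleRequest = buildDoiRequest('10.1145/158511.158618', 'bibtex');
console.log(exampleRequest.url, exampleRequest.headers.Accept);
```

The point is that the URL alone does not select the format, so whatever sits between the browser and the resolver has to be able to set that header.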

Thanks,
Joachim

@nomeata

Also, formatting the citations would be more reliable and consistent if we could use a library like citeproc-js. Would that be acceptable?

@nospampleasemam

Hi Joachim!

We spoke briefly in #duckduckgo, and I've taken a look at your code. I'm actually not getting a javascript error with 404 responses from the API. Can you be more specific about what you're seeing there?

About the header, I'm still looking into how it can be done.

Thank you for submitting this pull request by the way! It's a cool plugin, and a great result for these queries.

@nomeata

Indeed, I cannot reproduce the syntax error any more. Maybe I was confused by a probably unrelated syntax error in y.js that I get every time. So ignore that point.

What about including external libraries? So far I only added code to format entries based on what I have seen in the wild, but the format allows for many more fields. Especially the names need relatively complicated code to cater for all cases. Here, using the citeproc-js library would be the sane thing to do. Is it OK to use it? If so, should I manually copy the library into the share/spice/doi/spice.js file? Or could I somehow tell the system (in Doi.pm) to also load spice/doi/citeproc.js as an additional resource when the doi spice is activated?

@jagtalon
Owner

@nomeata I didn't get a 404 either.

The Forvo plugin does exactly that! It loads several libraries when it gets triggered. Take a look at the code: https://github.com/duckduckgo/zeroclickinfo-spice/blob/master/share/spice/forvo/spice.js

I don't quite know how to test this using duckpan, but if you run into any problems, you can ask @moollaza (author of Forvo plugin).

@jagtalon
Owner
nrj("/forvo/jquery.min.js", true);
nrj("/forvo/mediaelement-and-player.min.js", true);
nrj("/forvo/init.js", true);
@nomeata

@jagtalon thx for the pointer. I tried to get citeproc-js running, but it is a big beast, requiring various different style and locale data in XML at runtime and seems generally not well suited for formatting a single citation, but rather a whole bibliography. It seems saner to stick with manual formatting for now.

@nomeata

Improved formatting of names and added direct download links for citations. From my side this is good to go (although it would be nice if the bibtex link would copy the bibtex data to the clipboard instead... maybe some other time)

@nomeata

Besides, being able to use dx.doi.org would still be desirable, as the datacite source does miss many papers from my field (computer science) actually. I’m not sure what the relation between these sites is, but dx.doi.org certainly sounds more canonical.
So you can merge the plugin as it is, and when support for the Accept header is added to the ddg spice framework, I’ll change it to use the other site.

@nomeata

Here are some interesting pointers about nginx and the Accept header:

http://mgustafson.wordpress.com/2011/03/17/nginx-set-the-accept-header-based-on-file-extension/ shows how to configure nginx to set the Accept header based on the input URL. This can of course be simplified to just set proxy_set_header Accept $acceptHeader in _build_nginx_conf in ./lib/DDG/Rewrite.pm if an appropriate flag is set in the spice module (analogous to the handling of wrap_jsonp_callback).

Theoretically it would be cleaner to have the header already in the request to nginx, instead of setting it there, but that request is simply caused by a <script src="..."> tag in the html, right? I don’t think that the header can be set there, so putting it in the nginx config sounds like the right thing to do.

Besides that, duckpan will need to support the feature as well, in ./lib/App/DuckPAN/Web.pm.
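
The flag-driven approach could look roughly like this. The sketch is written in JavaScript for illustration only (the real code is Perl, in DDG::Rewrite); `buildNginxLocation` and the shape of its argument are assumptions, and the generated block is much simpler than what the framework actually emits:

```javascript
// Hypothetical sketch, not the actual output of DDG::Rewrite: render an
// nginx location block and add proxy_set_header only when the spice
// declares an accept_header.
function buildNginxLocation(spice) {
  const lines = [
    'location ^~ ' + spice.path + ' {',
    '    proxy_pass ' + spice.to + ';',
  ];
  if (spice.accept_header) {
    lines.push('    proxy_set_header Accept "' + spice.accept_header + '";');
  }
  lines.push('}');
  return lines.join('\n');
}

console.log(buildNginxLocation({
  path: '/js/spice/doi/',
  to: 'http://dx.doi.org/',
  accept_header: 'application/vnd.citationstyles.csl+json',
}));
```

Spices that do not set the flag get the same configuration as before, so existing plugins should be unaffected.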

@nomeata nomeata referenced this pull request in duckduckgo/duckduckgo
Merged

New property of spice and rewrite: accept_header #7

@nomeata nomeata Use http://dx.doi.org via Accept header
This requires the appropriate feature in duckduckgo, commit
177d9dad9790515892385f799facca4fe10bd36b
ad0b020
@nomeata

I implemented that, please see duckduckgo/duckduckgo#7 and duckduckgo/p5-app-duckpan#6 (pretty low pull request numbers, looks like not many people dare to touch these modules).

For a better BibTeX-download-button I’d need to install another redirect with a different Accept header. How can I configure that – an additional spice that never matches on the query, but that I can call using nrj from the doi spice?

@nomeata nomeata Add a proper bibtex lookup functionality to the doi spice
As this requires the doi spice callback to set of another request, a
dummy spice was added that does not match any request and purely exists
for its JSONP request functionality. I guess this spice needs to be
hidden from any lists of enabled spices, if that is possible.
6fb41d8
@nomeata

After confirmation on IRC by crazedpsyc that this is just really crazy, but the way to go, I created a second dummy spice, DoiBibtex, whose only purpose is to provide the doi spice with a jsonp-request-URL to get the bibtex data.

If that is not really wanted, you can still merge up to ad0b020.

An alternative route would be to just get bibtex in the first place and parse that in the callback, instead of getting the pretty JSON data, but that feels somewhat wrong.
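
To illustrate why a second endpoint is needed: a JSON response can be wrapped in a callback call as-is, while BibTeX is plain text and has to be turned into a JavaScript string literal first. A rough sketch, assuming the wrapping works along these lines (the exact output of wrap_jsonp_callback and wrap_string_callback is not shown in this thread; `wrapJsonp` and `wrapString` are hypothetical names):

```javascript
// Assumed behaviour of a JSONP wrapper: the body is already valid JSON,
// so it can be passed to the callback unchanged.
function wrapJsonp(callback, jsonBody) {
  return callback + '(' + jsonBody + ');';
}

// Assumed behaviour of a string wrapper: plain text must be quoted and
// escaped so that it becomes a valid JavaScript string literal.
function wrapString(callback, text) {
  return callback + '(' + JSON.stringify(text) + ');';
}

// Simulate the browser loading the generated script via a <script> tag.
let received;
const bibtexScript = wrapString(
  'ddg_spice_doi_bibtex',
  '@inbook{Launchbury_1993, year={1993}}'
);
new Function('ddg_spice_doi_bibtex', bibtexScript)((text) => {
  received = text;
});
console.log(received);
```

Wrapping the BibTeX text with `wrapJsonp` instead would produce a syntax error in the browser, which is why the two formats need differently configured endpoints.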

@nospampleasemam

Hi Joachim,

I've checked over and merged your commits in p5-app-duckpan and duckduckgo. Thanks for taking the time to implement this! We've released both packages, which you should be able to download using duckpan DDG and duckpan duckpan. We've also added a new feature to duckpan, duckpan env, which makes it a little easier to manage API keys during development.

I'm still getting some errors with your spice due to redirects, I'm going to take some time to check this out now and I'll get back to you here. Creating a "dummy-spice" isn't too crazy, especially once you see them as endpoints as you noticed in your other pull requests :-).

Thanks again!

@nomeata

What kind of errors? It is expected that dx.doi.org is redirecting you. But if dx.doi.org is redirecting you to some other site that then ships HTML, then likely the Accept header was not set:

$ curl -v -LH "Accept: application/x-bibtex" http://dx.doi.org/10.1145/158511.158618
* About to connect() to dx.doi.org port 80 (#0)
*   Trying 132.151.9.181...
* connected
* Connected to dx.doi.org (132.151.9.181) port 80 (#0)
> GET /10.1145/158511.158618 HTTP/1.1
> User-Agent: curl/7.28.0
> Host: dx.doi.org
> Accept: application/x-bibtex
> 
< HTTP/1.1 303 See Other
< Server: Apache-Coyote/1.1
< Location: http://data.crossref.org/10.1145%2F158511.158618
< Expires: Wed, 30 Jan 2013 20:33:16 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 182
< Date: Wed, 30 Jan 2013 11:08:25 GMT
< 
* Ignoring the response-body
* Connection #0 to host dx.doi.org left intact
* Issue another request to this URL: 'http://data.crossref.org/10.1145%2F158511.158618'
* About to connect() to data.crossref.org port 80 (#1)
*   Trying 63.123.152.249...
* connected
* Connected to data.crossref.org (63.123.152.249) port 80 (#1)
> GET /10.1145%2F158511.158618 HTTP/1.1
> User-Agent: curl/7.28.0
> Host: data.crossref.org
> Accept: application/x-bibtex
> 
< HTTP/1.1 200 OK
< Date: Wed, 30 Jan 2013 11:08:26 GMT
< Server: Apache/2.2.3 (CentOS)
< X-Powered-By: Phusion Passenger (mod_rails/mod_rack) 3.0.7
< Vary: Accept
< Access-Control-Allow-Origin: *
< Content-Length: 392
< Status: 200
< Connection: close
< Content-Type: application/x-bibtex
< 
* Closing connection #1
@inbook{Launchbury_1993, title={A natural semantics for lazy evaluation}, ISBN={0897915607}, url={http://dx.doi.org/10.1145/158511.158618}, DOI={10.1145/158511.158618}, booktitle={Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages  - POPL  ’93}, publisher={Association for Computing Machinery}, author={Launchbury, John}, year={1993}, pages={144-154}}
* Closing connection #0
@nospampleasemam

It's that 303 redirect that nginx is not following. I confirmed the HTTP header by changing the proxy_pass to another server I control and watching the incoming requests:

~ $ sudo tcpdump -s 0 -A 'tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
07:16:38.278079 IP ec2-23-21-193-196.compute-1.amazonaws.com.58304 > 10.248.51.81.www: Flags [P.], seq 2447955073:2447955318, ack 4169104129, win 46, options [nop,nop,TS val 48772066 ecr 741131190], length 245
E..).d@.7.DH....
.3Q...P......{............
..3.,,..GET /10.1145/158511.158618 HTTP/1.0
Accept: application/vnd.citationstyles.csl+json
Host: dylansserver.com
Connection: close
User-Agent: curl/7.21.0 (i486-pc-linux-gnu) libcurl/7.21.0 OpenSSL/0.9.8o zlib/1.2.3.4 libidn/1.15 libssh2/1.2.6


^C
1 packets captured
1 packets received by filter
0 packets dropped by kernel
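
For anyone puzzled by the tcpdump filter above: it reads the TCP header length from byte 12 and compares the first four payload bytes with 0x47455420, which is ASCII for "GET ". A small Node.js sketch of the same check (the helper names are made up for illustration):

```javascript
// "GET " interpreted as a big-endian 32-bit integer.
const getMagic = Buffer.from('GET ', 'ascii').readUInt32BE(0);
console.log('0x' + getMagic.toString(16)); // prints 0x47455420

// The high nibble of TCP header byte 12 is the data offset in 32-bit
// words; (b & 0xf0) >> 2 converts it to a length in bytes.
function tcpHeaderLength(byte12) {
  return (byte12 & 0xf0) >> 2;
}

// `segment` is a Buffer that starts at the first byte of the TCP header.
function isHttpGet(segment) {
  const offset = tcpHeaderLength(segment[12]);
  return segment.length >= offset + 4 &&
    segment.readUInt32BE(offset) === 0x47455420;
}
```

So the filter only lets through packets whose TCP payload starts with a GET request line, which is why a single packet was captured.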


~ $ curl https://dylan.duckduckgo.com/js/spice/doi/10.1145/158511.158618 >/dev/null
~ $ curl https://dylan.duckduckgo.com/js/spice/doi/10.1145/158511.158618
ddg_spice_doi(

<HTML><HEAD><TITLE>Handle Redirect</TITLE></HEAD>
<BODY><A HREF="http://data.crossref.org/10.1145%2F158511.158618">http://data.crossref.org/10.1145%2F158511.158618</A></BODY></HTML>);

I was looking for a way to have nginx follow these redirects on its own, but then I noticed that they seem consistent. What do you think of simply changing the spice to => value to http://data.crossref.org ? I've made this change, and redeployed on dylan.duckduckgo.com - it seems to work! Do you think that'd be okay, or are there cases where it redirects elsewhere or does not at all?

Check it out! https://dylan.duckduckgo.com/?q=10.1145%2F158511.158618

I think this looks fantastic already. I'm wondering if it might be helpful to return a bit more of the information the API returns such as container-title? Especially since with these queries I think we can be confident these are exact matches, meaning we're sure it's what the user is looking for. Although, they are occasionally overridden by Wolfram|Alpha (thinking it's a math problem), I think we can take care of this though.

One other problem is that nrj doesn't seem to be firing with bibtex, I'm not quite sure why at the moment, but I'll see if I can figure that out.

@nomeata

What do you think of simply changing the spice to => value to http://data.crossref.org ?

No, that is not a good idea. dx.doi.org is a multiplexer that redirects you to the right registry for the doi, and might redirect you to other registries – try 10.5524/100005.

http://serverfault.com/questions/423265/how-to-follow-http-redirects-inside-nginx might be useful.
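
What the proxy would have to do is what `curl -L` does: follow each Location header and send the same Accept header again, whichever registry the DOI ends up at. A minimal sketch, assuming an injected `request(url, headers)` function that resolves to `{ status, headers, body }` (a stand-in for a real HTTP client; `followRedirects` is a hypothetical name, and this is JavaScript rather than nginx configuration):

```javascript
// Follow HTTP redirects by hand, re-sending the same headers on every hop.
// `request` is an injected function returning a promise of
// { status, headers, body }; header names are assumed to be lower-case.
async function followRedirects(request, url, headers, maxHops = 5) {
  let current = url;
  for (let hop = 0; hop <= maxHops; hop++) {
    const res = await request(current, headers);
    if (![301, 302, 303, 307, 308].includes(res.status)) {
      return { url: current, response: res };
    }
    const location = res.headers.location;
    if (!location) {
      throw new Error('Redirect without Location header');
    }
    // Location may be relative, so resolve it against the current URL.
    current = new URL(location, current).toString();
  }
  throw new Error('Too many redirects');
}
```

Hard-coding data.crossref.org skips the first hop, which only works for DOIs that happen to be registered there.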

@nospampleasemam

I figured that was likely the case. Unfortunately it seems like proxy_intercept_errors (and error_page) only work for status codes of 400 and above. It looks like nginx will do it on its own if the API server returns an X-Accel-Redirect header. I'm wondering if maybe we can append that somehow to the response, but it doesn't look like it (by the time we can play with values, it's too late). I'm still exploring! There's got to be a way to do it.

@nomeata

What about the computer scientist answer to every problem: Recursion¹. Have nginx proxy the request to itself first (with a different URL). The second time the X-Accel-Redirect header is added, so the first invocation of the proxy follows the redirect, and returns the final page to the user?

What a hack, but hey, if it works?

¹ well, after abstraction of course.

@nomeata

@nospampleasemam, have you removed the plugin from dylan.ddg.com again? I just wanted to see what icon is shown next to the zeroclickinfo, and ask how to configure and where to store that – it seems that the zeroclickinfo-spice repo does not contain icons.

I’m not sure whether there is a particularly well-recognized doi logo, but http://www.doi.org/favicon.ico is a good first shot.

@jagtalon
Owner

@nomeata Really sorry we weren't able to get back to you soon enough. We've recently launched an update to the way Spice is created called "Spice 2", and it's documented in https://github.com/duckduckgo/duckduckgo/blob/master/documentation/spice_overview.md#spice-frontend.

Would you be interested in porting your plugin? Don't worry, though--it only affects the front-end code. :)

Just ask us if you're having any trouble or if something isn't clear.

For the plugin, would it be possible to get the image of the Emperor Penguin for something like http://gigadb.org/dataset/100005? I think that would be nice. Maybe you could also add a part of the abstract in the plugin (I imagine it to look a little bit like the Wikipedia plugin: https://duckduckgo.com/Yahoo!) What do you think?

@jagtalon
Owner

@nomeata It still works on Duckpan, though. :) (I just had to rename spice.js to doi.js)


@nomeata

@nomeata It still works on Duckpan, though. :) (I just had to rename spice.js to doi.js)

Does that mean that no porting is required?

I don’t think it’s possible to get an image or an abstract; doi is a lookup service that forwards you to many different publishers, and these have no standard API for such things.

@jagtalon
Owner

@nomeata plugins written the old way still work because we have not ported all our internal plugins to Spice 2. We'd like all open source plugins (especially new ones) to use Spice 2, though. :)

Okay, thanks for the info on the images.

@nomeata

I just looked into this a bit, but I forgot most of what I knew about duckpan, the repos and stuff, and at the moment I’m not motivated enough to re-learn everything. Maybe some other time when I have fewer other projects to worry about.

@jagtalon
Owner

@nomeata Definitely. :) I'll close this for now, but I hope we get to hear from you soon!

@jagtalon jagtalon closed this
@jagtalon
Owner

I made a copy of your branch here: https://github.com/duckduckgo/zeroclickinfo-spice/tree/ideas/doi -- just in case someone else wants to have a go at it. :)

Commits on Jan 18, 2013
  1. @nomeata
Commits on Jan 21, 2013
  1. @nomeata
  2. @nomeata
  3. @nomeata

    Remove debugging output

    nomeata authored
Commits on Jan 22, 2013
  1. @nomeata

    Use http://dx.doi.org via Accept header

    nomeata authored
    This requires the appropriate feature in duckduckgo, commit
    177d9dad9790515892385f799facca4fe10bd36b
  2. @nomeata

    Add a proper bibtex lookup functionality to the doi spice

    nomeata authored
    As this requires the doi spice callback to set of another request, a
    dummy spice was added that does not match any request and purely exists
    for its JSONP request functionality. I guess this spice needs to be
    hidden from any lists of enabled spices, if that is possible.
Commits on Feb 3, 2013
  1. @nomeata

    Escape HTML stuff properly

    nomeata authored
  2. @nomeata

    HTML escaping bibtex output

    nomeata authored
34 lib/DDG/Spice/Doi.pm
@@ -0,0 +1,34 @@
+package DDG::Spice::Doi;
+
+use DDG::Spice;
+
+name "doi";
+description "Look up a digital object identifier";
+source "doi";
+primary_example_queries "10.5524/100005";
+category "reference";
+topics "science";
+code_url "https://github.com/duckduckgo/zeroclickinfo-spice/blob/master/lib/DDG/Spice/Doi.pm";
+attribution github => ["https://github.com/nomeata", "Joachim Breitner"],
+ web => ["http://www.joachim-breitner.de", "Joachim Breitner"],
+ email => ['mail@joachim-breitner.de', "Joachim Breitner"];
+status "enabled";
+
+# Regex from http://stackoverflow.com/a/10324802/946226
+triggers query_lc => qr%\b(10[.][0-9]{4,}(?:[.][0-9]+)*/(?:(?!["&\'<>])\S)+)\b%;
+
+# dx.doi.org returns JSON only via content negotiation (see accept_header below)
+spice to => 'http://dx.doi.org/$1';
+spice wrap_jsonp_callback => 1;
+spice accept_header => 'application/vnd.citationstyles.csl+json';
+
+spice is_cached => 1;
+
+handle matches => sub {
+ my ($uname) = @_;
+ return $uname if $uname;
+ return;
+};
+
+
+1;
25 lib/DDG/Spice/DoiBibtex.pm
@@ -0,0 +1,25 @@
+package DDG::Spice::DoiBibtex;
+
+use DDG::Spice;
+
+description "Look up a digital object identifier (bibtex code)";
+code_url "https://github.com/duckduckgo/zeroclickinfo-spice/blob/master/lib/DDG/Spice/DoiBibtex.pm";
+attribution github => ["https://github.com/nomeata", "Joachim Breitner"],
+ web => ["http://www.joachim-breitner.de", "Joachim Breitner"],
+ email => ['mail@joachim-breitner.de', "Joachim Breitner"];
+status "enabled";
+
+triggers start => 'This spice should never match any request, as it is just a helper for the doi spice.';
+
+# dx.doi.org returns BibTeX only via content negotiation (see accept_header below)
+spice to => 'http://dx.doi.org/$1';
+spice wrap_string_callback => 1;
+spice accept_header => 'application/x-bibtex';
+
+spice is_cached => 1;
+
+handle remainder => sub {
+ return $_ if defined $_;
+};
+
+1;
82 share/spice/doi/spice.js
@@ -0,0 +1,82 @@
+function ddg_spice_doi(bib) {
+
+ function format_author(author) {
+ if (author['family']) {
+ var ret = "";
+ if (author['given']) {
+ ret += author['given'] + " ";
+ }
+ if (author['dropping-particle']) {
+ ret += author['dropping-particle'] + " ";
+ }
+ if (author['non-dropping-particle']) {
+ ret += author['non-dropping-particle'] + " ";
+ }
+ ret += author['family'];
+ if (author['suffix']) {
+ ret += " " + author['suffix'];
+ }
+ return ret;
+ } else {
+ return author['literal'];
+ }
+ }
+
+ function format_authors(authors) {
+ var i = 0;
+ var ret = "";
+ while (authors.length - i > 2) {
+ ret += format_author(authors[i]);
+ ret += ", ";
+ i++;
+ }
+ while (authors.length - i > 1) {
+ ret += format_author(authors[i]);
+ ret += " and ";
+ i++;
+ }
+ ret += format_author(authors[i]);
+ return ret;
+ }
+
+ // validity check
+ if (bib['DOI'] && bib['author'] && bib['title']) {
+
+ var items = new Array();
+ items[0] = new Array();
+ items[0]['a'] = "by " + h(format_authors(bib['author']));
+ if (bib['issued'] && bib['issued']['raw']) {
+ items[0]['a'] += ", " + h(bib['issued']['raw']);
+ }
+ items[0]['a'] += ", doi:" + h(bib['DOI']) + ". ";
+ items[0]['a'] += "<br />";
+ items[0]['a'] += "<pre style=\"display:none\" id=\"bibtex\"></pre>";
+ items[0]['a'] += "<a href=\"javascript:fetch_bibtex('" + h(bib['DOI']) + "');\">BibTeX</a> &bull; ";
+ items[0]['h'] = h(bib['title']);
+ items[0]['s'] = "dx.doi.org";
+ if (bib['URL']) {
+ items[0]['u'] = bib["URL"];
+ } else {
+ items[0]['u'] = "http://dx.doi.org/" + bib['DOI'];
+ }
+ nra(items);
+ }
+}
+
+// This uses the dummy doi_bibtex spice to lookup the bibliography data in
+// bibtex format
+function fetch_bibtex(doi) {
+ nrj("/js/spice/doi_bibtex/" + doi);
+}
+
+// And this is the return call, showing the bibtex display field.
+function ddg_spice_doi_bibtex(bibtex) {
+ document.getElementById('bibtex').style.display = 'block';
+ document.getElementById('bibtex').innerHTML = h(bibtex);
+
+}
+
+function h(txt) {
+ return txt.replace(/&/g, '&amp;').replace(/"/g, '&quot;').replace(/'/g, '&#39;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
+}
+