Service to transform URLs back into DOIs. This service is passive and almost-stateless.
Provided as a Docker image with Docker Compose file for testing.
To run a demo:
docker-compose -f docker-compose.yml run -w /usr/src/app -p "8004:8004" test lein run
To run tests
docker-compose -f docker-compose.yml run -w /usr/src/app test lein test
Environment variable | Description | Default | Required? |
---|---|---|---|
PORT |
Port to listen on | 9990 | Yes |
STATUS_SERVICE |
Public URL of the Status service | No, ignored if not supplied | |
JWT_SECRETS |
Comma-separated list of JTW Secrets | Yes |
NB These URLs will not work until CED launched.
10.5555/123456789
- a DOI expressed outside a URL. demo.doi: 10.5555/12345678
- a DOI expressed with adoi:
prefix, flexibly. [demo](https://reverse.eventdata.crossref.org/guess-doi?q=doi: 10.5555/12345678).doi.org/10.5555/12345678
- a DOI expressed as a web address without full scheme. demo.http://doi.org/10.5555/123456789
- a DOI expressed as a full URL. demo.
http://journals.plos.org/plosone/article?id=10.1371/journal.pone.0144297
- a landing page URL where the DOI is embedded in the URL as a GET query parameter. demo.http://www.jim.org.cn/EN/10.15541/jim20130556
- landing page where the DOI is included at the end of the URL. demohttp://www.jstor.org/stable/10.13110/antipodes.27.2.0157#references_tab_contents
- landing page where the DOI is in the url but there's a hash fragment. demohttp://www.nomos-elibrary.de/10.5235/219174411798862578/criminal-law-issues-in-the-case-law-of-the-european-court-of-justice-a-general-overview-jahrgang-1-2011-heft-2
- URL with a DOI in it, but well buried. demohttp://onlinelibrary.wiley.com/doi/10.1002/1521-3951(200009)221:1<453::AID-PSSB453>3.0.CO;2-Q/abstract;jsessionid=FAD5B5661A7D092460BEEDA0D55204DF.f02t01
- URL with a SICI in the DOI and a gnarly jsessionid too. demohttp://www.ijorcs.org/manuscript/id/12/doi:10.7815/ijorcs.21.2011.012/arul-anitha/network-security-using-linux-intrusion-detection-system
- URL with a DOI mixed with other text. demohttp://link.springer.com/article/10.1007%2Fs10552-015-0707-0
- URL with embedded URL encoded DOI. demohttp://doi.org/hvx
- ShortDOI as a URL demodoi.org/hvx
- ShortDOI not as a URL demo10/hvx
- ShortDOI expressed as a Handle demo
There will no doubt be a few other methods for figuring out the DOI from the URL in future.
http://www.hindawi.com/archive/2015/708915/
- a landing page where the DOI is mentioned in the page. demo.http://www.nature.com/nrendo/journal/v10/n9/full/nrendo.2014.114.html?WT.mc_id=TWT_NatureRevEndo
- landing page with a DOI, but there are extra bits on the end of the URL compared to the DOI link. demo
- The textual and link content of the page are consulted and all DOIs are extracted.
- The first DOI on the page is checked to see if it resolves back to the same URL.
- After that, every other DOI is checked. If there are several matches (i.e. several DOIs resolve to the same URL) then the shortest DOI is chosen. This can happen with component DOIs, which in the case of e.g. PLOS, are extended versions of the base DOI.
Every DOI is checked against the API to ensure it exists.
All of the above formats can be extracted from inside text.
Look at this article lol 10.5555/12345678 what is a psycho ceramic neway?
- DOI mentioned in the text of a tweet. demoShrtr DOIs Mns Lgr Twts doi.org/hvx
- ShortDOI mentioned in the text of a tweet. [demo](https://reverse.eventdata.crossref.org/guess-doi?q=Shrtr DOIs Mns Lgr Twts doi.org/hvx)Is This The Future of Concise Citation Identifiers? One Psychoceramicist Thinks So 10/hvx
- ShortDOI Handle mentioned in the text of a tweet. demo
All the weird things mentioned in Geoff Bilder's Appendix A: On the complexity of handling DOIs in the wild
10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demo - SICI with URLEncdoded trailing slashDOI%3A10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/10.1002/(SICI)1099-050X(199823/24)37:3/4<197::AID-HRM2>3.0.CO;2-#
demohttp://doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/bwhrfx
demohttp://doi.org/doi:bwhrfx
demohttp://doi.org/urn:bwhrfx
demohttp://doi.org/info:doi/bwhrfx
demohttp://dx.doi.org/bwhrfx
demohttp://dx.doi.org/doi:bwhrfx
demohttp://dx.doi.org/urn:bwhrfx
demohttp://dx.doi.org/info:doi/bwhrfx
demohttps://doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/bwhrfx
demohttps://doi.org/doi:bwhrfx
demohttps://doi.org/urn:bwhrfx
demohttps://doi.org/info:doi/bwhrfx
demohttps://dx.doi.org/bwhrfx
demohttps://dx.doi.org/doi:bwhrfx
demohttps://dx.doi.org/urn:bwhrfx
demohttps://dx.doi.org/info:doi/bwhrfx
demodoi:10.1002/(SICI)1099-050X(199823/24)37:3/4<197::AID-HRM2>3.0.CO;2-#
demoDOI:10.1002/(SICI)1099-050X(199823/24)37:3/4<197::AID-HRM2>3.0.CO;2-#
demohttp://doi.org/10.1002/(SICI)1099-050X(199823/24)37:3/4<197::AID-HRM2>3.0.CO;2-#
demohttp://doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://dx.doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttp://doi.org/bwhrfx
demohttp://doi.org/doi:bwhrfx
demohttp://doi.org/urn:bwhrfx
demohttp://doi.org/info:doi/bwhrfx
demohttp://dx.doi.org/bwhrfx
demohttp://dx.doi.org/doi:bwhrfx
demohttp://dx.doi.org/urn:bwhrfx
demohttp://dx.doi.org/info:doi/bwhrfx
demohttps://doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/urn:doi:10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://dx.doi.org/info:doi/10.1002%2F(SICI)1099-050X(199823%2F24)37%3A3%2F4%3C197%3A%3AAID-HRM2%3E3.0.CO%3B2-%23
demohttps://doi.org/bwhrfx
demohttps://doi.org/doi:bwhrfx
demohttps://doi.org/urn:bwhrfx
demohttps://doi.org/info:doi/bwhrfx
demohttps://dx.doi.org/bwhrfx
demohttps://dx.doi.org/doi:bwhrfx
demohttps://dx.doi.org/urn:bwhrfx
demohttps://dx.doi.org/info:doi/bwhrfx
demo10.1002/(sici)1099-050x(199823/24)37:3/4<197::aid-hrm2>3.0.co;2-#
demo10.1002/(SICI)1099-050x(199823/24)37:3/4<197::aid-hrm2>3.0.co;2-#
demo10.1002/(sici)1099-050X(199823/24)37:3/4<197::AID-HRM2>3.0.CO;2-#
demohttp://doi.org/10.5555/12345678
demohttps://doi.org/10.5555/12345678
demohttp://doi.org/hvx
demohttps://doi.org/hvx
demo
- Create a mysql database with schema in
etc/schema.sql
. - Create
config.edn
inconfig/dev
orconfig/prod
with:database-name
,:database-username
,:database-host
.
Some tasks can be run on the command line but the main mode of operation is as a server.
lein with-profile dev run server
To update the list of DOI and member IDs per publisher:
lein run with-profile dev update
To try and resolve the DOIs and extract domains. This will run until all DOIs are resolved (i.e. pretty much indefinitely).
lein run with-profile dev resolve-all
Some domains are just too broad (like Google Pages). Maintain a list of these, in MySQL LIKE format in resources/ignore.txt
. To update the database after a scan, or after updating the file:
lein with-profile dev run mark-ignored
http://dl.acm.org/citation.cfm?id=1852107
is quoted, DOI resolves to http://dl.acm.org/citation.cfm?doid=1852102.1852107
.
- Check out to
/home/deploy/doi-destinations
/home/deploy/doi-destinations
thenlein uberjar
sudo cp /home/deploy/doi-destinations/etc/doi-destinations.service /etc/systemd/system/doi-destinations.service
sudo systemctl enable doi-destinations.service
sudo systemctl start doi-destinations.service
Copyright © 2016 Crossref
Distributed under the MIT License.
Copyright © Crossref
Distributed under the The MIT License (MIT).