Reach out to authors of depositable DOIs used in Wikipedia #251

Open
nemobis opened this Issue Jun 2, 2016 · 23 comments

@nemobis
Member

nemobis commented Jun 2, 2016

It would be nice to make a list of authors who

  • have at least one publication cited in a Wikipedia article via its DOI,
  • and that publication is paywalled,
  • and Dissemin is able to list at least one depositable publication of theirs (keeping in mind #215 ).

Such a list could then be used by someone (e.g. the Wikimedia Foundation, see https://meta.wikimedia.org/w/index.php?title=Talk%3AThe_Wikipedia_Library&type=revision&diff=15669109&oldid=15488619 ) to encourage the use of Dissemin to deposit said publications.



@wetneb

Member

wetneb commented Jun 2, 2016

That's surely doable. Actually, it should be enough to use the API (http://dev.dissem.in/api.html), just like oabot does, while also tracking the publisher's policy.
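
For illustration, a minimal sketch of that approach (using the per-DOI endpoint that comes up later in this thread; the response field names "paper" and "pdf_url" are assumptions based on the public API docs, and actual depositability would still depend on the publisher's policy):

```python
# Minimal sketch (not oabot's actual code): look up each DOI on Dissemin and
# keep the ones with no known free copy yet. Field names such as "paper" and
# "pdf_url" are assumptions and may differ from the real API response.
import requests

def dissemin_paper(doi):
    """Return Dissemin's metadata for a DOI, or None if it is unknown."""
    r = requests.get("https://dissem.in/api/" + doi, timeout=30)
    if r.status_code != 200:
        return None
    return r.json().get("paper")

def maybe_depositable(doi):
    """Heuristic proxy: Dissemin knows the paper but lists no free full text.
    Real depositability also depends on the publisher's self-archiving policy."""
    paper = dissemin_paper(doi)
    return paper is not None and not paper.get("pdf_url")

if __name__ == "__main__":
    dois = ["10.1234/example.0001", "10.1234/example.0002"]  # placeholder DOIs
    for doi in dois:
        print(doi, maybe_depositable(doi))
```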

@wetneb wetneb added this to the Add-ons milestone Jun 2, 2016

@nemobis

Member

nemobis commented Jun 3, 2016

Antonin, 02/06/2016 23:33:

That's surely doable. Actually, it should be enough to use the API (http://dev.dissem.in/api.html), just like oabot does, while also tracking the publisher's policy.

Right. I'm querying the API for Italian Wikipedia DOIs to start with, and most of them are not found (status 400). For these one can assume that the publication is not available in a "real" repository (even though maybe it's buried in some Elsevier wasteland like SSRN) and should ask CrossRef+RoMEO for publication details and journal policy, right?
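
Roughly what that fallback could look like, as a sketch: the CrossRef part uses the public works endpoint; the SHERPA/RoMEO call assumes the legacy api29.php interface and its <romeocolour> element, so treat those details (and any API-key requirement) as assumptions.

```python
# Sketch of the CrossRef + RoMEO fallback for DOIs Dissemin does not know.
# The RoMEO endpoint and XML element names below are assumptions about the
# legacy SHERPA/RoMEO API; verify them before relying on this.
import requests
import xml.etree.ElementTree as ET

def crossref_metadata(doi):
    """Publication details (title, journal, ISSNs) from CrossRef."""
    r = requests.get("https://api.crossref.org/works/" + doi, timeout=30)
    r.raise_for_status()
    return r.json()["message"]

def romeo_colour(issn):
    """Journal self-archiving 'colour' according to SHERPA/RoMEO (assumed API)."""
    r = requests.get("http://www.sherpa.ac.uk/romeo/api29.php",
                     params={"issn": issn}, timeout=30)
    r.raise_for_status()
    node = ET.fromstring(r.content).find(".//romeocolour")
    return node.text if node is not None else None

meta = crossref_metadata("10.1234/example.0001")  # placeholder DOI
issns = meta.get("ISSN", [])
print(meta.get("container-title"), issns,
      romeo_colour(issns[0]) if issns else "no ISSN")
```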

@wetneb

Member

wetneb commented Jun 3, 2016

HTTP status 400 should only be returned when your query was invalid (an invalid DOI in your case, I guess). This is curious if the DOIs come from Wikipedia. Can you show me your code?
If you are querying by DOI, it might be easier to use the dedicated endpoint http://dissem.in/api/<your_doi_comes_here>.

@nemobis

Member

nemobis commented Jun 3, 2016

Antonin, 03/06/2016 11:40:

HTTP status 400 should only be returned when your query was invalid (an invalid DOI in your case, I guess). This is curious if the DOIs come from Wikipedia. Can you show me your code?

Good point, I had forgotten to percent-decode some of the DOIs. Much better now!
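
For the record, the decoding step is a one-liner; the DOI below is just an example with many reserved characters.

```python
# DOIs scraped from wiki markup or URLs are often percent-encoded and must be
# decoded before querying the API, otherwise the request is rejected (HTTP 400).
from urllib.parse import unquote

raw = "10.1002%2F%28SICI%291097-0258%2819980430%2917%3A8%3C857%3A%3AAID-SIM777%3E3.0.CO%3B2-E"
print(unquote(raw))
# -> 10.1002/(SICI)1097-0258(19980430)17:8<857::AID-SIM777>3.0.CO;2-E
```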

@nemobis

Member

nemobis commented Jun 4, 2016

I attach some example partial lists made with doi-doai-openaccess.py --depositable (about 40000 DOIs from some 140 wikis).

Of course, to be usable the output should be in CSV format, and the names need to be converted to contact information and ideally ORCID iDs.

doi-doai-dissemin.zip

(Also copied on https://zenodo.org/record/54799 )
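
For what it's worth, a possible CSV layout for such a list (column choices are hypothetical, and the contact/ORCID columns would still have to be filled from other sources):

```python
# Hypothetical CSV layout for the author list; values are illustrative only.
import csv

rows = [
    ("itwiki", "10.1234/example.0001", "Maria Rossi"),
    ("enwiki", "10.1234/example.0002", "John Smith"),
]

with open("depositable_authors.csv", "w", newline="", encoding="utf-8") as fh:
    writer = csv.writer(fh)
    writer.writerow(["wiki", "doi", "author", "contact", "orcid"])
    for wiki, doi, author in rows:
        # contact and ORCID still have to be resolved from other sources
        writer.writerow([wiki, doi, author, "", ""])
```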

@nemobis

Member

nemobis commented Jun 6, 2016

I notice that https://api.crossref.org/ often doesn't return affiliations for a DOI, while https://api.elsevier.com/documentation/AUTHORSearchAPI.wadl may return an affiliation ID more often; from that one can retrieve something like https://api.elsevier.com/documentation/retrieval/affiliationRetrievalResp.json
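
A quick way to check this per DOI via the CrossRef works endpoint (author objects carry an "affiliation" list, which is indeed often empty):

```python
# Print each author of a DOI with whatever affiliation CrossRef returns.
import requests

def crossref_authors(doi):
    r = requests.get("https://api.crossref.org/works/" + doi, timeout=30)
    r.raise_for_status()
    for author in r.json()["message"].get("author", []):
        name = (author.get("given", "") + " " + author.get("family", "")).strip()
        affiliations = [a.get("name") for a in author.get("affiliation", [])]
        yield name, affiliations

for name, affiliations in crossref_authors("10.1234/example.0001"):  # placeholder DOI
    print(name, affiliations or "(no affiliation returned)")
```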

@wetneb

Member

wetneb commented Jun 7, 2016

Interesting! Thanks for pointing that out.

@nemobis

Member

nemobis commented Jun 1, 2017

I see that there is https://github.com/leereilly/swot to validate university email addresses. It might still be easier to just mass-download the papers and parse the email addresses contained in them...
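
If the "mass-download and parse" route is taken, the parsing pass could start as simply as a regular expression over the extracted text of each paper (the PDF-to-text step itself is left out here):

```python
import re

# Deliberately simple e-mail pattern; it will miss obfuscated addresses
# ("name at example dot org") and may catch a few false positives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text):
    """Return the unique e-mail addresses found in a paper's text."""
    return sorted(set(EMAIL_RE.findall(text)))

sample = "Correspondence to: m.rossi@unibo.it or j.smith@cam.ac.uk."
print(extract_emails(sample))  # ['j.smith@cam.ac.uk', 'm.rossi@unibo.it']
```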

@wetneb

Member

wetneb commented Jun 1, 2017

The Open Access Button has put a lot of effort into retrieving authors' email addresses; maybe @JosephMcArthur has ideas about that?

@nemobis

Member

nemobis commented Jun 3, 2017

Thanks for mentioning it. I see they have a few issues on the matter: OAButton/discussion#490 (and OAButton/discussion#648), OAButton/discussion#725

@nemobis

Member

nemobis commented Jun 3, 2017

Back to the devils... Web of Science seems pretty neat: http://ip-science.interest.thomsonreuters.com/sample-data/
There is a Python client: https://github.com/enricobacis/wos
I've registered, but login failed from home.

The Elsevier API has a rate limit of 5,000 requests every 7 days, which is quite reasonable. I registered but have not tried it yet.

I will try again from my institution.
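
To stay within that quota without any bookkeeping, one can simply pace the requests evenly, roughly one every two minutes:

```python
# Spread 5,000 requests over 7 days: 604,800 s / 5,000 ≈ 121 s between requests.
import time

REQUESTS_PER_WEEK = 5000
DELAY = 7 * 24 * 3600 / REQUESTS_PER_WEEK  # ~121 seconds

def paced(items, delay=DELAY):
    """Yield items one by one, sleeping between them to respect the quota."""
    for item in items:
        yield item
        time.sleep(delay)

# for doi in paced(dois):
#     ...query the Elsevier API for this DOI...
```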

@nemobis nemobis changed the title from Make a list of depositable DOIs used in Wikipedia to Reach out to authors of depositable DOIs used in Wikipedia Aug 8, 2017

@nemobis

Member

nemobis commented Aug 8, 2017

FYI, I'm now sending out a few thousand invites to use Dissemin. The interest so far is high: some 10 % of recipients click. But I'm not able to quickly gauge the traffic generated on Dissemin (let alone the actual uploads). Absent other data sources, I plan to just see whether the number of self-archived DOIs increases in the next few weeks.
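
One crude way to do that check (assuming the same per-DOI endpoint and "pdf_url" field as above): count, week by week, how many of the invited DOIs now have a free full text on Dissemin.

```python
# Re-runnable count of DOIs for which Dissemin reports a free full text;
# comparing the totals over time gives a rough measure of new deposits.
# The "paper"/"pdf_url" fields are assumptions, as noted earlier.
import requests

def count_with_fulltext(dois):
    found = 0
    for doi in dois:
        r = requests.get("https://dissem.in/api/" + doi, timeout=30)
        if r.status_code == 200 and r.json().get("paper", {}).get("pdf_url"):
            found += 1
    return found

# print(count_with_fulltext(dois_from_the_invite_mailing))  # hypothetical list
```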

@wetneb

Member

wetneb commented Aug 8, 2017

@nemobis that's fantastic, thanks a lot! But be aware that there are still some important bugs that might put users off, mainly the fact that we can miss full texts for various reasons (for instance, #431). I unfortunately do not have much time to look into them, but I think it would be better to solve them before doing too much publicity for the service.

@nemobis

Member

nemobis commented Aug 8, 2017

@wetneb

Member

wetneb commented Aug 8, 2017

@nemobis Sure! Thank you so much again, that is very welcome!

Ideally we could also think about some tighter integration with OAbot, as you are focusing on DOIs extracted from Wikipedia. But for that we would first need to run OAbot on itwiki…

@nemobis

Member

nemobis commented Aug 8, 2017

Speaking of which, I was looking for a dump of OAbot DOIs on en.wiki; is there such a thing?

@wetneb

Member

wetneb commented Aug 8, 2017

There is a dump of all the citations of enwiki, as parsed by OAbot: https://zenodo.org/record/55004
But they are not annotated with the publishers' self-archiving policies.

@nemobis

Member

nemobis commented Aug 20, 2017

I ended up running mwcites and cleaning up a bit; see mediawiki-utilities/python-mwcites#7 (comment)
Now I'm extracting the information for those DOIs; it should be done in a couple of days.

@nemobis

Member

nemobis commented Aug 29, 2017

I posted a summary on https://lists.wikimedia.org/pipermail/openaccess/2017-August/000226.html (with example text; more or less the same as you had already seen).

Meanwhile, the latest round of emails from 4 days ago has reached almost a 40 % open rate and a 14 % click rate (after discounting bounces). I'll check the numbers for deposits and replies in a couple of weeks.

@nemobis

Member

nemobis commented Sep 12, 2017

A revised message is going out now to a few thousand more authors. We've had about 50 deposits in the last 3 hours.

@jibe-b

Contributor

jibe-b commented Mar 10, 2018

Dear @nemobis, may I ask you for the text/subject of the email that gave the best upload rate?

cc @JosephMcArthur

@nemobis

Member

nemobis commented Mar 10, 2018

@jibe-b

Contributor

jibe-b commented Mar 10, 2018

ok, no worries
