What is the purpose of the file fulltext_html_urls.txt #53

Closed
anusharanganathan opened this issue Sep 28, 2015 · 2 comments

@anusharanganathan

What is the purpose of the file fulltext_html_urls.txt available as a part of the output?

Purpose: search open-access papers in EuPMC for the query "dinosaurs" and download the fulltext XMLs, supplementary files, and fulltext PDFs where available.

Query used

$ getpapers -q 'dinosaurs' -x -s -p -o dinosaursOutput2 >> dinosaursOutput2.log

This generated a fulltext_html_urls.txt file containing 22 URLs.

Not all pmcids listed in fulltext_html_urls.txt had a corresponding fulltext.xml or fulltext.html file downloaded. For the 22 URLs (one pmcid each) listed in the file, the breakdown of what I found was as follows:

  • 20 of the pmcids had an empty dir
  • 2 of the pmcids had a dir with a fulltext.xml file but an empty fulltext.html file
  • For each of the pmcids in the fulltext_html_urls.txt file, the output produced a message similar to the following one
    warn: Article with pmcid "PMC3381548" had no fulltext PDF url
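
A breakdown like the one above could be reproduced with a small script. This is only a rough sketch: it assumes that each URL in fulltext_html_urls.txt contains a pmcid of the form PMC3381548 and that each article gets its own directory named after its pmcid inside the output directory. The function name `audit_output` and the category labels are made up for illustration and are not part of getpapers.

```shell
#!/usr/bin/env bash
# Sketch: classify each pmcid listed in fulltext_html_urls.txt by what was downloaded.
# Assumptions (not taken from the getpapers docs): every URL contains a pmcid such as
# PMC3381548, and each article is stored under <outdir>/<pmcid>/.

audit_output() {
  local outdir="$1"
  local urls="$outdir/fulltext_html_urls.txt"
  local pmcid dir

  # Extract the pmcids from the URL list, one per line, without duplicates.
  grep -oE 'PMC[0-9]+' "$urls" | sort -u | while read -r pmcid; do
    dir="$outdir/$pmcid"
    if [ ! -d "$dir" ]; then
      echo "$pmcid missing-dir"
    elif [ -z "$(ls -A "$dir")" ]; then
      echo "$pmcid empty-dir"
    elif [ -s "$dir/fulltext.xml" ] && [ ! -s "$dir/fulltext.html" ]; then
      # Non-empty XML, but the HTML file is absent or zero bytes.
      echo "$pmcid xml-only"
    else
      echo "$pmcid other"
    fi
  done
}
```

Running `audit_output dinosaursOutput2 | cut -d' ' -f2 | sort | uniq -c` would then give a count per category, which can be compared against the numbers listed above.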
@blahah
Member

blahah commented Dec 4, 2015

The fulltext HTML file is just a list of the fulltext HTML URLs that were available. I'm moving it to an --html option so that users can request the HTML to be downloaded, and there will no longer be a fulltext_html_urls.txt file.

@blahah blahah added this to the 0.5 milestone Dec 4, 2015
@blahah
Member

blahah commented Dec 24, 2015

done in 0.4.1

@blahah blahah closed this as completed Dec 24, 2015