Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Add a fetcher for APS #6026

Closed
wants to merge 8 commits into from
Closed

Conversation

augustjanse
Copy link
Contributor

Added a fetcher for APS, fixes #818, part of #2581. Based on the ScienceDirect fetcher.

I consider #818 fixed because this is all that can be done without an API key or token, and I hardly think any terms would let them be distributed with this software. I'm not exactly sure what your standards are on that, though.

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your work. I have just a few small comments.

public Optional<URL> findFullText(BibEntry entry) throws IOException {
Objects.requireNonNull(entry);

Optional<DOI> doi = entry.getDOI();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do that in a single statement:

return entry.getDOI().map(doi -> {
...
if (code == 200) {
...
} else {
  return Optional.empty();
}
});

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote it like that. A bit unsure about throwing an unchecked exception, is that fine? I guess those are supposed to be handled elsewhere, since the method interface throws them.

// Throws FileNotFoundException if DOI doesn't exist
InputStream is = con.getInputStream();

return con.getURL().toString().split("abstract/")[1];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is some code comment needed. 1 is a magic number.
For instance, provide an example output in the comment and state why you splitted.
Not sure about NPEs if the response URL changes. Thus, I would split this statement and first check if the array size is >= 1.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrote it like that. I threw an exception that was already being caught instead of returning null, hope that's fine.

@Test
@DisabledOnCIServer("CI server is blocked")
void findOpenAccess() throws IOException {
entry.setField(StandardField.DOI, "10.1103/PhysRevLett.116.061102");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the test cases. The all look equal.

Could you create a ParameterizedTest (JUnit5)? See https://www.baeldung.com/parameterized-tests-junit-5#6-method. Then you can also pass Optional.empty() for the last two tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did that, not exactly sure about the style

CHANGELOG.md Outdated Show resolved Hide resolved
@augustjanse
Copy link
Contributor Author

Thanks for your feedback! Please see the new commits.

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the follow up. I have still some comments 😇 . Hope, they are OK ^^

public class ApsFetcher implements FulltextFetcher {
private static final Logger LOGGER = LoggerFactory.getLogger(ApsFetcher.class);

// The actual API needs either an API key or a header. This is a workaround.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what "workaround" refers to. Maybe, the comment should be moved to the actual code - and not to constants.


/**
* FulltextFetcher implementation that attempts to find a PDF URL at APS. Also see the <a
* href="https://harvest.aps.org/docs/harvest-api">API</a>, although it isn't currently used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "used" at the end? Is the link for more information for another implementation alternative, which you did not choose. Could you give reasons why? Maybe, one sentence is enough. -- Then separate that from the first sentence.

return null;
} catch (IOException e) {
// Handle IOExceptions elsewhere
throw new UncheckedIOException(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will crash complete JabRef - or did you catch it out of the stream thing?

Proposal: Log the exception and return null. -- Although null is not good - with the current code, it seems that null has to be used.

Did my proposal from yesterday not work out? ^^

} else {
return null;
}
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add filter(result -> result != null) to avoid returning an Optional containing a null value - or does this not happen? Did you add test cases for some of the null branches? ^^ - Maybe, you need to use wiremock.

con.connect();

// Throws FileNotFoundException if DOI doesn't exist
con.getInputStream();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the getInputStream() be wrapped in try-with-resources?

Shouldn't the openConnection() be wrapped in try-with-resources?

// Expected URL example:
// https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.116.061102
// Expected parts: "https://journals.aps.org/prl/", "10.1103/PhysRevLett.116.061102"
String[] urlParts = con.getURL().toString().split("abstract/");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comment. Think, I would also have implemented it like that. Others would have used Regular Expressions. However, since this is simple, it is good to go.

ALTERNATIVE code - fosters reuse -- however, appears to be more complex

return org.jabref.model.entry.identifier.DOI.findInText(con.getURL.toString()).map(DOI::getDOI).orElseThrow(ew FileNotFoundException("Unexpected URL received"));

ALTERNATIVE 2: Why not returning Optional<DOI> here and NOT throwing any exception? In the code above, you can then continue with map instead of throwing unchecked exceptions.

unauthorized.setField(StandardField.DOI, "10.1103/PhysRevLett.89.127401");

// Unavailable article returns empty
BibEntry notFoundByDoi = new BibEntry();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

USe

new BibEntry().withField(StandardField.DOI, "10.1016/j.aasri.2014.0559.002");

Then you can also directly put it into line 52 - and add the comment above line 52 -- save the variable

This can also be done for the other test cases... try to only return the Stream and put the expected results and requests there.

return Stream.of(
    // Standard DOI works
    Arguments.of(
      Optional.of(new URL("https://journals.aps.org/prl/pdf/10.1103/PhysRevLett.116.061102")),
      new BibEntry().withField(StandardField.DOI, "10.1103/PhysRevLett.116.061102)),

    // DOI in lowercase works
    Arguments.of(
...

@koppor
Copy link
Member

koppor commented Feb 28, 2020

The codacy report is because of a PMD false positive. See pmd/pmd#1175 for details.

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR. I've only one suggestion for improvement.

A general remark: I would be nice if we could simply use their API. Would reduce payload costs on our and on their side. @koppor can you try to request an API key (probably need to write an email as were are not a contributing organization). Thanks!

public Optional<URL> findFullText(BibEntry entry) throws IOException {
Objects.requireNonNull(entry);

return entry.getDOI().map(doi -> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code is more complicated due to functional style (which needs to catch exceptions etc). Thus either use a normal version (with quick exit ala if(entry.getDoi().isEmpty()) return empty() else ... ) or alternatively try to reduce the exceptions thrown. For example, the FileNotFoundEx is only thrown to signal the the id is empty (if I'm not mistaken). A prefered option would be to make getId return optional<string> and than use flatMap instead of map.

In summary, the functional style is ok if the code could be refactored such that the following works:

return entry.getDOI().flatMap(this::determineId).flatMap(this::determinePdfUrl);

```

@koppor
Copy link
Member

koppor commented Mar 1, 2020

@tobiasdiez Reading https://github.com/augustjanse/jabref/blob/7a855dc0fc28df6ac546ae289367522c47dad643/src/main/java/org/jabref/logic/importer/fetcher/ApsFetcher.java#L26 and the description of the PR, we don't need any API key. I think, this is a good thing, isn't it? Is your idea to rewrite the code to rely on an API key?

@tobiasdiez
Copy link
Member

Yes, the code works right now without an API key. However, it's kind of hacky and relies on a certain url format scheme which may break at any point.

I think it doesn't hurt to ask if we can get access to the official API. If it doesn't work, then we can still go ahead with the current workaround.

@augustjanse
Copy link
Contributor Author

Thanks for your further feedback. This functional style is a bit new to me, so excuse me for falling into what I can only guess are common pitfalls, but your comments are very helpful. I'll try to do the changes in a few days.

Regarding the API: @tobiasdiez is right about the current solution being a hack. However, my understanding is that a FulltextFetcher must return a direct link to the PDF that doesn't need any headers with the request. If so, I don't think the API can supply such a link even with an API key or a CHORUS token. My comment might have been misleading on this part.

(The ScienceDirect fetcher got such a link by first making a request with the key in its header, and then extracting the direct link from the response. But from what I can see, it's not anywhere in any of the formats.)

@sambo57u
Copy link

sambo57u commented Mar 3, 2020

I am not sure this is relevant here but fetching bibliography entries from APS using DOI does not bring their page number. This has been a problem for a few years and I contacted APS and they said they are working on it and it was due to some indexing issue. That was a few years ago and nothing happened. e.g. try fetching 10.1103/PhysRevC.101.014611

@koppor koppor force-pushed the master branch 5 times, most recently from b8ef7b7 to 21c6e5e Compare March 4, 2020 17:02
@koppor koppor changed the title Add a fetcher for APS [WIP] Add a fetcher for APS Mar 6, 2020
@koppor
Copy link
Member

koppor commented Mar 17, 2020

Decision: Go without key; in a follow-up PR switch to new AP

Use withField instead of setField.

@Siedlerchr Siedlerchr closed this Mar 17, 2020
@Siedlerchr Siedlerchr mentioned this pull request Mar 17, 2020
5 tasks
koppor pushed a commit that referenced this pull request Apr 25, 2022
76b4268 Style for Acta Physica Sinica (物理学报) (#6009)
604de6f MLA Publication Date update (#5927)
41ce2d4 Update netherlands-journal-of-geosciences-geologie-en-mijnbouw.csl (#6027)
ad08f5d Adds space after title writing in ABNT style file (#6031)
0b5c1c2 Create CRCAO Light (#5935)
2f42b8c Create annals-of-laboratory-medicine.csl (#5959)
80a9506 Create annual-review-of-linguistics (#5992)
0fb9c40 Update the-journal-of-egyptian-archaeology.csl (#6028)
5ff53b1 Update guide-des-citations-references-et-abreviations-juridiques.csl (#6002)
c1c0212 Update society-for-american-archaeology.csl (#5919)
129ef3c Update administrative-science-quarterly.csl (#5991)
8bc22bd Create multilingual-matters.csl (#5955)
aca84fd Create expert-reviews-in-molecular-medicine.csl (#5961)
3c4ddc0 Create ethnobiology-letters.csl (#5986)
c301e04 Update heidelberg-university-faculty-of-medicine.csl (#5932)
a8c4396 Update tyndale-bulletin.csl (#5949)
c18cbcf Bluebook hotfix
d950b2b  Create incontext-studies-in-translation-and-interculturalism.csl (#5907)
7cfc106 Bump nokogiri from 1.13.2 to 1.13.4 (#6016)
0baa43a Liebert update (#6026)
41ca2b3 Bluebook Add space for ibid  (#6025)
6707a37 Update american-journal-of-botany.csl (#5954)
926fad5 Update boletin-de-pediatria.csl (#6024)

git-subtree-dir: buildres/csl/csl-styles
git-subtree-split: 76b4268
Jonathan-Oliveira pushed a commit to Jonathan-Oliveira/jabref that referenced this pull request May 7, 2022
76b4268 Style for Acta Physica Sinica (物理学报) (JabRef#6009)
604de6f MLA Publication Date update (JabRef#5927)
41ce2d4 Update netherlands-journal-of-geosciences-geologie-en-mijnbouw.csl (JabRef#6027)
ad08f5d Adds space after title writing in ABNT style file (JabRef#6031)
0b5c1c2 Create CRCAO Light (JabRef#5935)
2f42b8c Create annals-of-laboratory-medicine.csl (JabRef#5959)
80a9506 Create annual-review-of-linguistics (JabRef#5992)
0fb9c40 Update the-journal-of-egyptian-archaeology.csl (JabRef#6028)
5ff53b1 Update guide-des-citations-references-et-abreviations-juridiques.csl (JabRef#6002)
c1c0212 Update society-for-american-archaeology.csl (JabRef#5919)
129ef3c Update administrative-science-quarterly.csl (JabRef#5991)
8bc22bd Create multilingual-matters.csl (JabRef#5955)
aca84fd Create expert-reviews-in-molecular-medicine.csl (JabRef#5961)
3c4ddc0 Create ethnobiology-letters.csl (JabRef#5986)
c301e04 Update heidelberg-university-faculty-of-medicine.csl (JabRef#5932)
a8c4396 Update tyndale-bulletin.csl (JabRef#5949)
c18cbcf Bluebook hotfix
d950b2b  Create incontext-studies-in-translation-and-interculturalism.csl (JabRef#5907)
7cfc106 Bump nokogiri from 1.13.2 to 1.13.4 (JabRef#6016)
0baa43a Liebert update (JabRef#6026)
41ca2b3 Bluebook Add space for ibid  (JabRef#6025)
6707a37 Update american-journal-of-botany.csl (JabRef#5954)
926fad5 Update boletin-de-pediatria.csl (JabRef#6024)

git-subtree-dir: buildres/csl/csl-styles
git-subtree-split: 76b4268
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Full text search for American Physical Society (APS) journals
5 participants