
Manual publication search by title can turn up publications that cannot be added #1219

Open
peetucket opened this issue Jan 4, 2021 · 4 comments

@peetucket
Member

peetucket commented Jan 4, 2021

The situation: an already-harvested publication has slightly bad metadata. A user declines this publication, then uses the "search by title" feature to find it again. This executes a new search against WoS, turning up an updated version of the publication. But when the user adds it to the profile, our code recognizes that the publication already exists locally and simply connects the existing publication to the profile instead of adding the new version.

We may want to consider forcing the "search by title" to find and return local publications before going off to WoS to avoid confusion. Curiously enough, this is already the case when users run a manual search by PMID or DOI.

We may also want to consider some way of forcing an update of an existing publication; see #163.

See https://github.com/sul-dlss/sul_pub/blob/master/app/controllers/publications_controller.rb#L145 and https://github.com/sul-dlss/sul_pub/blob/master/app/controllers/publications_controller.rb#L149

which then hit

https://github.com/sul-dlss/sul_pub/blob/master/lib/pubmed/fetcher.rb#L9 for pmid searches
and https://github.com/sul-dlss/sul_pub/blob/master/lib/doi_search.rb#L11 for doi searches

directly hitting our local publication table before going off to remote services.

By contrast, when searching by title, we first go off to WoS before searching locally:

https://github.com/sul-dlss/sul_pub/blob/master/app/controllers/publications_controller.rb#L154-L160
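A local-first title search, as proposed above, could be sketched as follows. This is a minimal illustration, not sul_pub code: `LocalRecord`, `search_by_title` and the `remote_search` lambda (standing in for the WoS query) are hypothetical, and exact case-insensitive title matching is an assumption; a real implementation would likely need fuzzier matching.

```ruby
# Hypothetical in-memory stand-in for the local publications table.
LocalRecord = Struct.new(:title, :pmid, keyword_init: true)

# Hypothetical local-first title search: return matching local records when
# any exist, and only call the remote search (a lambda standing in for WoS)
# when nothing matches locally. Returns [source, results].
def search_by_title(title, local_records, remote_search)
  normalized = title.strip.downcase
  local_hits = local_records.select { |rec| rec.title.strip.downcase == normalized }
  return [:local, local_hits] unless local_hits.empty?

  [:remote, remote_search.call(title)]
end

local = [LocalRecord.new(title: 'Redundant meta-analyses are common in genetic epidemiology',
                         pmid: '32540390')]
wos = ->(t) { ["remote hit for #{t}"] }

search_by_title('redundant meta-analyses are common in genetic epidemiology', local, wos).first # => :local
search_by_title('Some other title', local, wos).first # => :remote
```

This mirrors the order already used for PMID and DOI searches: the remote service is consulted only when the local table has nothing.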

@peetucket
Member Author

An example from a support request:

```ruby
a = Author.find_by(cap_last_name: 'Sigurdson')

p = Publication.find_by(pmid: '32540390')
# Harvested from the MEDLINE database of WoS in June 2020, without the middle initial

contrib = Contribution.find_by(cap_profile_id: a.cap_profile_id, publication_id: p.id)
# The contribution is denied and set to private

# A manual title search runs a new query against WoS and ignores the existing
# publication record, so it finds the WoS record that includes the initial:
wos_matches = WebOfScience::Queries.new('WOS').user_query('TI="Redundant meta-analyses are common in genetic epidemiology"').next_batch.to_a
wos_matches[0].pub_hash

# But re-adding the record finds the old publication, because this code path
# eventually associates the existing publication instead of creating a new one
# (by design), no matter which source record is used. See line 231 of
# https://github.com/sul-dlss/sul_pub/blob/master/app/controllers/authorships_controller.rb#L222-L236
# which goes to
# https://github.com/sul-dlss/sul_pub/blob/master/lib/web_of_science/record.rb#L132
# to find the match before associating it.

# We do have two WoS source records: one for the old MEDLINE record, one for the new WoS record
wossr = WebOfScienceSourceRecord.where(doi: '10.1016/j.jclinepi.2020.05.035') # two results

wossr[0].publication_id == p.id # true: matches the publication without the initial
wossr[1].publication_id         # nil: dangling WoS source record for the new version with the initial
```
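Dangling source records like the second one above could be spotted in bulk by grouping on DOI. The sketch below is a plain-Ruby illustration under assumptions: `SourceRecord` is a hypothetical stand-in for `WebOfScienceSourceRecord` rows, and the UIDs and publication id are made up.

```ruby
# Hypothetical stand-in for rows of the WebOfScienceSourceRecord table.
SourceRecord = Struct.new(:uid, :doi, :publication_id, keyword_init: true)

# For each DOI that has at least one source record linked to a publication,
# list the UIDs of the records with that DOI that are not linked (publication_id nil).
def dangling_by_doi(records)
  linked_groups = records.group_by(&:doi).select { |_doi, recs|
    recs.any?(&:publication_id) && recs.any? { |r| r.publication_id.nil? }
  }
  linked_groups.transform_values { |recs| recs.select { |r| r.publication_id.nil? }.map(&:uid) }
end

records = [
  SourceRecord.new(uid: 'medline-record', doi: '10.1016/j.jclinepi.2020.05.035', publication_id: 42),
  SourceRecord.new(uid: 'wos-record', doi: '10.1016/j.jclinepi.2020.05.035', publication_id: nil)
]

# Returns a hash mapping the DOI to ['wos-record']
dangling_by_doi(records)
```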

@peetucket
Member Author

peetucket commented Jan 4, 2021

My response to the original support request shown below:

The original publication without the initial was harvested from the MEDLINE database in the Web of Science. It is currently marked as "declined", which explains why it is not showing up in the inbox (I didn't move it into that state, so someone must have declined it). When you search for the publication by title, our code runs a brand-new search against the Web of Science, turning up a newer version of the publication from the Web of Science Core Collection database, which now has the initial in the name (for whatever reason). However, when you then add this publication to a profile, our code finds a match against the existing publication record (since both have the same PMID and DOI) and simply reconnects the original publication without the initial. This is by design, to prevent duplicate publication records from being created needlessly: the logic is that if two records share the same primary identifier (be it a PMID, DOI or other primary identifier), they must be the same publication. It doesn't account for updated publications.

This does pose problems, as you note above: it prevents a user from accessing a newer version of the publication, even when that newer version shows up in a manual search.
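The identifier-based deduplication described above can be illustrated with a small sketch. It is a simplification under assumptions: `StoredPub`, `find_existing` and `add_to_profile` are hypothetical names, not the sul_pub implementation, and only the "any shared primary identifier means the same publication" rule is modeled.

```ruby
# Hypothetical stand-in for a stored publication (not the sul_pub model).
StoredPub = Struct.new(:id, :identifiers, :authors, keyword_init: true)

# Return the stored publication sharing any primary identifier with the
# incoming record, otherwise nil. Metadata differences are ignored.
def find_existing(stored, incoming_ids)
  stored.find do |pub|
    incoming_ids.any? { |type, value| value && pub.identifiers[type] == value }
  end
end

# Reuse an existing match; only create a new publication when none is found.
def add_to_profile(stored, incoming)
  existing = find_existing(stored, incoming[:identifiers])
  return existing if existing

  created = StoredPub.new(id: stored.size + 1, **incoming)
  stored << created
  created
end

ids = { pmid: '32540390', doi: '10.1016/j.jclinepi.2020.05.035' }
stored = [StoredPub.new(id: 1, identifiers: ids, authors: ['Sigurdson, M.'])]
newer = { identifiers: ids, authors: ['Sigurdson, M. K.'] }

add_to_profile(stored, newer).authors # => ["Sigurdson, M."]
```

Because the match short-circuits on identifiers, the newer author list is never stored, which is exactly the behavior reported in the support request.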

So for this particular case:

1. We could put the original publication for Dr. Sigurdson back into the new state, to get it back in their inbox.
2. We could manually remediate the publication record to include the initial.
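The two remediation options could be sketched roughly as below. This is an illustration only: `ContributionRow`, `PubRow` and both helpers are hypothetical, the `'new'` status value is assumed from the description above, and a real fix would edit the stored publication metadata through the application's own models.

```ruby
# Hypothetical stand-ins for a contribution and a publication's author list.
ContributionRow = Struct.new(:status, :visibility, keyword_init: true)
PubRow = Struct.new(:authors, keyword_init: true)

# Option 1: put the contribution back into the "new" state so it shows up in the inbox again.
def reset_to_inbox(contribution)
  contribution.status = 'new'
  contribution
end

# Option 2: replace one author name with the corrected form (adds the middle initial).
def add_middle_initial(pub, from:, to:)
  pub.authors = pub.authors.map { |name| name == from ? to : name }
  pub
end

contrib = ContributionRow.new(status: 'denied', visibility: 'private')
reset_to_inbox(contrib).status # => "new"

pub = PubRow.new(authors: ['Sigurdson, M.', 'Khoury, M. J.', 'Ioannidis, J. P.'])
add_middle_initial(pub, from: 'Sigurdson, M.', to: 'Sigurdson, M. K.').authors.first # => "Sigurdson, M. K."
```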

We could consider for future work having the manual search turn up existing local publication records before searching for new ones at the Web of Science, which would reduce the confusion, though this still blocks newer versions of the same publication from being added.  Allowing for existing publication updates is another possible scope of work, though could end up being somewhat complicated.
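Allowing updates to an existing publication (also the subject of #163) might, in its simplest form, mean copying newer descriptive metadata onto the already-matched record when the caller explicitly opts in. The sketch below is hypothetical: `LocalPub`, `refresh_publication` and the `force:` flag are illustrative names, and a real version would also have to decide which fields may be overwritten and how to handle the source records.

```ruby
# Hypothetical stand-in for a local publication row (not the sul_pub model).
LocalPub = Struct.new(:pmid, :doi, :authors, :pages, keyword_init: true)

# Copy newer metadata onto a publication already matched by identifier.
# Identifiers stay untouched; descriptive fields change only when force is true,
# and any field missing from the incoming hash keeps its current value.
def refresh_publication(existing, incoming, force: false)
  return existing unless force

  existing.authors = incoming.fetch(:authors, existing.authors)
  existing.pages = incoming.fetch(:pages, existing.pages)
  existing
end

old = LocalPub.new(pmid: '32540390', doi: '10.1016/j.jclinepi.2020.05.035',
                   authors: ['Sigurdson, M.'], pages: nil)
newer = { authors: ['Sigurdson, M. K.'], pages: '40-48' }

refresh_publication(old, newer).authors              # => ["Sigurdson, M."]
refresh_publication(old, newer, force: true).authors # => ["Sigurdson, M. K."]
```

Keeping the update opt-in preserves today's duplicate-prevention behavior by default.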

@peetucket
Member Author

Support request:

I am reviewing a ticket from a user who indicated that an early version of a publication/citation was waiting in his Inbox for review. He indicated that since it was added, the final version had come out and his name was corrected in the latest version. In checking what was in his Inbox, I could see the issue he mentioned: his middle initial was missing from the author section of the citation.

I first tried the CAP "search and find a publication" option using the PMID he provided (PMID 32540390). The result of this search was the same citation as in his Inbox, missing his middle initial:
 
Redundant meta-analyses are common in genetic epidemiology. Journal of clinical epidemiology 
Sigurdson, M., Khoury, M. J., Ioannidis, J. P. 
2020
 
I then tried searching by title. This result was promising, as it appeared to return the correct citation with his middle initial, replacing Sigurdson, M. with Sigurdson, M. K.:
 
Redundant meta-analyses are common in genetic epidemiology JOURNAL OF CLINICAL EPIDEMIOLOGY 
Sigurdson, M. K., Khoury, M. J., Ioannidis, J. A. 
2020; 127: 40–48
 
However, when I add the above version of the citation to his profile by clicking the '+Add' option, the citation is added but changes to the older version shown earlier, without the middle initial and without the page information shown in the second citation. Also, even though I did not approve or deny the publication in his Inbox, it disappeared from the Inbox at some point during my investigation, after I added and then undid the add for the search results above.
 
Can you investigate on the SUL side to see what may be causing the citation to revert to the old one after I select to add the latest one shown above from the search results?  If you have access to our UAT environment, https://profiles-uat.stanford.edu/intranet, you can recreate the issue above using the profile for Matthew Sigurdson.
 
Here is some additional information:
 
-	Select to edit the profile for Matthew Sigurdson.
-	Select to +add new publication from the Publications section.
-	In the search box, enter the title of the publication “Redundant meta-analyses are common in genetic epidemiology” and click on the ‘Search’ button.
-	See the result listed:
 
-	Next, click on the ‘+Add’ option shown in the picture above.
-	See now how the result listed changes to the old citation after the add:
 
The new citation shown further above that was shown in the search results is now reverted to the old one without the middle initial for Sigurdson and without the page information.

@peetucket peetucket changed the title Manual publication search by title can turn up publications that cannot be added. Manual publication search by title can turn up publications that cannot be added Jan 4, 2021
@peetucket
Member Author

Support case identified above manually remediated by updating the publication to add the author's initial, as well as set the contribution back to a new state.
