Reduce import/merging errors #557
base: master
We already imported a lot of papers with bogus author names such as "&NA;", and there are some 17 million lines with a null z_author in the latest Unpaywall dump (2018-09-24).
* Don't merge more than 10 papers together.
* Always consider the year in comparisons, full date if available.

dissemin#512
Didn't test yet!
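The two rules above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Record`, `same_publication_time`, `can_merge` and `MAX_MERGED_PAPERS` are hypothetical names, not dissemin's actual models or functions, and the fingerprint is treated as an opaque string.

```python
from datetime import date
from typing import List, NamedTuple, Optional

# Cap from the PR description: never merge more than 10 papers together.
MAX_MERGED_PAPERS = 10


class Record(NamedTuple):
    """Hypothetical stand-in for a paper record (not dissemin's model)."""
    fingerprint: str
    year: int
    pubdate: Optional[date] = None  # full date, when it is known


def same_publication_time(a: Record, b: Record) -> bool:
    # Compare full dates when both records have one, otherwise fall back to the year.
    if a.pubdate is not None and b.pubdate is not None:
        return a.pubdate == b.pubdate
    return a.year == b.year


def can_merge(cluster: List[Record], candidate: Record) -> bool:
    # Refuse to grow a cluster that has already reached the cap.
    if len(cluster) >= MAX_MERGED_PAPERS:
        return False
    # Require matching fingerprint and publication time against every member.
    return all(
        r.fingerprint == candidate.fingerprint
        and same_publication_time(r, candidate)
        for r in cluster
    )
```

With this shape, a record that only carries a year can still join a cluster whose members have full dates in the same year, while two records with different full dates are kept apart.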
@@ -405,7 +405,8 @@ def save_doi_metadata(self, metadata, extra_orcids=None):
if metadata is None or type(metadata) != dict:
raise ValueError('Invalid metadata format, expecting a dict')
if not metadata.get('author'):
raise ValueError('No author provided')
# BareName has "last" as mandatory field
metadata['author'] = {'family': 'N/A'}
This code is never going to be executed because it is just after an exception is raised.
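For illustration, a reachable version of the fallback might look like the sketch below, assuming the intent is to substitute a placeholder author instead of rejecting the record. `ensure_author` is a hypothetical standalone helper, not part of `save_doi_metadata`, and the placeholder value is copied unchanged from the diff; whether that shape is what downstream code expects is not verified here.

```python
def ensure_author(metadata):
    """Hypothetical helper: validate metadata and fill in a placeholder author."""
    if metadata is None or not isinstance(metadata, dict):
        raise ValueError('Invalid metadata format, expecting a dict')
    if not metadata.get('author'):
        # The placeholder replaces the raise, so this branch is actually executed.
        # Value taken verbatim from the diff; its shape is an assumption.
        metadata['author'] = {'family': 'N/A'}
    return metadata
```

The key point is that the assignment takes the place of the `raise` rather than following it.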
This would need to be motivated by a careful analysis of the existing conflicts, I think (taking dissemin papers with a lot of oai records and understanding how we got to this situation).
To be more precise: first, thanks for the PR, it's a very important issue. Also: I think there are cases where records were merged even though the fingerprints were completely different - I want to investigate this (maybe a common wrong DOI), but I don't have an example at hand at the moment.
@nemobis Travis is functional again, so the build log should help you find bugs in the PR :)
Antonin Delpeuch, 03/02/19 16:27:
@nemobis <https://github.com/nemobis> Travis is functional again, so the
build log should help you find bugs in the PR :)
Thank you, will take a look by the weekend.
Some papers were skipped or overzealous clusters were created.
See issue #512