title | author | date | output |
---|---|---|---|
Author preprocessing summary | Leo Lahti | 2019-10-09 | markdown_document |
- 583071 unique authors. These final names capture all name variants from the custom author synonym table and exclude known pseudonyms (see below). If multiple names for the same author still appear on this list, they should be added to the author synonym table.
- 3841478 documents have unambiguous author information (71%).
- 1148 unique pseudonyms are recognized based on custom pseudonym lists.
- 721 discarded author names. This list should not include any real authors (if it does, please send a note to the admin). The stopword lists are considered when discarding names.
- Author name conversions: non-trivial conversions from the original raw data to final names.
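The steps listed above (synonym mapping, pseudonym exclusion, stopword-based discarding) can be sketched roughly as follows. This is a minimal illustration and not the actual pipeline: the function `harmonize_author`, the three lookup tables, and their example entries are all hypothetical.

```python
from typing import Optional

# Hypothetical lookup tables; the real synonym, pseudonym and stopword
# lists are maintained separately and are much larger.
SYNONYMS = {"shakespear, william": "shakespeare, william"}
PSEUDONYMS = {"a lady"}
STOPWORDS = {"anon", "unknown"}


def harmonize_author(raw: str) -> Optional[str]:
    """Return the final author name, or None if the name is excluded."""
    # Lowercase and collapse runs of whitespace before any lookup.
    name = " ".join(raw.lower().split())
    # Discard empty names and names found on the stopword list.
    if not name or name in STOPWORDS:
        return None
    # Known pseudonyms are excluded from the final author list.
    if name in PSEUDONYMS:
        return None
    # Map known variants to the preferred form; keep other names as they are.
    return SYNONYMS.get(name, name)
```

For example, `harmonize_author("  Shakespear,  William ")` gives `"shakespeare, william"`, while `harmonize_author("Anon")` gives `None`.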
Top-20 uniquely identified authors and their productivity (title count).
Authors with ambiguous life year information: can we spot cases here where these are clearly known to be identical or distinct authors? Life year information from supporting sources should also be added later.
448127 authors have missing life years (life year information can be augmented here).
39728 authors have ambiguous life years. Some of these might be synonymous and could be added to the author synonym list (the first term will be selected for the final data).
Ordered by productivity (number of documents).
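One way to flag such cases is to collect the distinct life year pairs seen for each author name and report the names with more than one pair. The sketch below assumes a hypothetical input of `(author, birth, death)` tuples, with `None` marking a missing year; the function name `ambiguous_life_years` is illustrative only.

```python
from collections import defaultdict


def ambiguous_life_years(records):
    """Map each author name to its distinct (birth, death) pairs,
    keeping only names that have more than one distinct pair."""
    seen = defaultdict(list)
    for author, birth, death in records:
        pair = (birth, death)
        # Skip entries with no life year information at all;
        # keep pairs in the order in which they were first observed.
        if pair != (None, None) and pair not in seen[author]:
            seen[author].append(pair)
    return {author: pairs for author, pairs in seen.items() if len(pairs) > 1}
```

Because the pairs keep their first-seen order, `pairs[0]` would correspond to the entry retained if "the first term" is read as the first observed one, which is an assumption here.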
1704965 documents (32%) include the author's age in the publication year. Ages were calculated for documents where the publication year and the author's life years (birth and death) are available and the document was printed during the author's lifetime.
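The age rule described above can be sketched as a small function. This is an illustration under stated assumptions: the name `author_age` is hypothetical, and treating the birth and death years as inclusive bounds is an assumption, since the source does not say how boundary years are handled.

```python
from typing import Optional


def author_age(
    publication_year: Optional[int],
    birth: Optional[int],
    death: Optional[int],
) -> Optional[int]:
    """Author age in the publication year, or None when it is not defined."""
    # All three years must be available.
    if publication_year is None or birth is None or death is None:
        return None
    # The document must be printed during the author's lifetime
    # (inclusive bounds are an assumption of this sketch).
    if not (birth <= publication_year <= death):
        return None
    return publication_year - birth
```

For a document printed in 1667 by an author who lived 1608 to 1674, `author_age(1667, 1608, 1674)` returns `59`; a posthumous printing such as `author_age(1700, 1608, 1674)` returns `None`.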
Title count versus paper consumption (all authors):