softMeScite

During the hackathon on Mapping the Impact of Research Software in Science, we set out to harmonize the data models of three software citation datasets: SoMeSci, Softcite, and RRID. The aim was to lay the groundwork for a gold dataset.

Project Goal

Harmonizing the data models of SoMeSci, Softcite, and RRID is a step toward a gold dataset that supports the automated extraction of software citations from scientific literature.

Datasets Overview

| Feature | SoMeSci | Softcite | RRID |
| --- | --- | --- | --- |
| Description | A 5 Star Open Data Gold Standard Knowledge Graph of software mentions in scientific articles. | A gold dataset of software mentions in biomedical and economic research publications. | A portal for obtaining and exploring Research Resource Identifiers (RRIDs) for referencing research resources. |
| Data Model Overview | Contains annotations with relation labels for additional information such as version, developer, URL, or citations. Distinguishes between software types (applications, plugins, or programming environments) and between mention types (usage or creation). | Contains metadata of annotated research publications and the software mentions identified in them. Mentions are further annotated with details about the software, including version, publisher, and access URL. | Uses RRIDs, persistent and unique identifiers for referencing a research resource. Identifiers are prefixed with "RRID:" followed by a tag indicating the source authority. |
| Number of Software Mentions | 3,756 software mentions in 1,367 PubMed Central articles. | 5,134 unique software mentions in 4,971 research publications (v2.0, 2023). | 78,140 software mentions. |
| Domain | Life sciences. | Life sciences and social sciences. | Biomedical literature and other domains referencing the generation or use of research resources. |
| Usage | Provides training samples for Named Entity Recognition, Relation Extraction, Entity Disambiguation, and Entity Linking. | Designed for supervised-learning-based scholarly text mining, software entity recognition in text, and investigating how software has been used for research. | Promotes research resource identification, discovery, and reuse. Facilitates citation of resources in biomedical literature and other places that reference their generation or use. |
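To make the comparison above more concrete, here is a minimal sketch of what a harmonized record could look like, together with a small parser for the "RRID:" prefix convention. It is an illustration only, not the schema produced during the hackathon: the `SoftwareMention` class, its field names, the `parse_rrid` helper, the regular expression, and the decision to map SoMeSci's "developer" and Softcite's "publisher" onto one field are all assumptions. The identifiers in the usage lines are placeholders, not real records.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# Assumed shape: "RRID:" + source-authority tag + "_" + local identifier.
RRID_PATTERN = re.compile(r"^RRID:\s*([A-Za-z]+)_([A-Za-z0-9:_\-]+)$")


def parse_rrid(text: str) -> Optional[Tuple[str, str]]:
    """Split an RRID string into (authority_tag, local_id); None if it does not match."""
    match = RRID_PATTERN.match(text.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


@dataclass
class SoftwareMention:
    """Hypothetical harmonized record; not the schema of any of the three datasets."""
    source: str                          # "SoMeSci", "Softcite", or "RRID"
    document_id: str                     # identifier of the publication
    name: str                            # software name as it appears in the text
    version: Optional[str] = None        # annotated in SoMeSci and Softcite
    publisher: Optional[str] = None      # assumed to cover "developer" and "publisher"
    url: Optional[str] = None            # annotated in SoMeSci and Softcite
    mention_type: Optional[str] = None   # e.g. "usage" or "creation" (SoMeSci)
    software_type: Optional[str] = None  # e.g. "application" or "plugin" (SoMeSci)
    rrid: Optional[str] = None           # e.g. "RRID:SCR_000000" (placeholder)


# Placeholder values for illustration only.
mention = SoftwareMention(
    source="RRID",
    document_id="PMC0000000",
    name="ExampleTool",
    rrid="RRID:SCR_000000",
)
print(parse_rrid(mention.rrid))  # ('SCR', '000000')
```

Fields that only one dataset annotates are optional here, so records from all three sources can share the same type without inventing values.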

Resources

  • openaccess_rrid_links.txt: Collection of the links of open access publications with software mentions from the RRID repository.
  • Registry of RRID mentions. DOI: 10.5281/zenodo.10048228
  • openaccess_selection.py: Script to select open access publications (and links to the RRID annotations) from the Registry. It uses the PubMed API to detect open access publications.

In future work, this collection of links will be used to extract the sentences that contain software mentions.
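The open access selection step can be sketched roughly as follows. This is not the code of openaccess_selection.py, which is not reproduced here. The sketch assumes the PMC OA Web Service as the endpoint, and it assumes the response is XML containing either an `error` element or a `records/record` element. The function names `oa_query_url` and `is_open_access`, and the sample responses, are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; the script in this repository may query a different API.
OA_SERVICE_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi"


def oa_query_url(pmcid: str) -> str:
    """Build the query URL for one PubMed Central identifier."""
    return f"{OA_SERVICE_URL}?{urlencode({'id': pmcid})}"


def is_open_access(response_xml: str) -> bool:
    """Return True if the (assumed) response lists at least one record and no error."""
    root = ET.fromstring(response_xml)
    if root.find("error") is not None:
        return False
    return root.find("records/record") is not None


# Illustrative responses only; they mimic the assumed shape, not real output.
sample_ok = '<OA><records><record id="PMC0000000"/></records></OA>'
sample_error = '<OA><error code="idIsNotOpenAccess">not Open Access</error></OA>'

print(oa_query_url("PMC0000000"))
print(is_open_access(sample_ok))     # True
print(is_open_access(sample_error))  # False
```

Keeping the URL construction and the response parsing separate from the network call makes the selection logic testable offline.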

About this project

This repository was developed as part of the Mapping the Impact of Research Software in Science hackathon hosted by the Chan Zuckerberg Initiative (CZI). By participating in this hackathon, owners of this repository acknowledge the following:

  1. The code for this project is hosted by the project contributors in a repository created from a template generated by CZI. The purpose of this template is to help ensure that repositories adhere to the hackathon’s project naming conventions and licensing recommendations. CZI does not claim any ownership or intellectual property on the outputs of the hackathon. This repository allows the contributing teams to maintain ownership of code after the project, and indicates that the code produced is not a CZI product, and CZI does not assume responsibility for assuring the legality, usability, safety, or security of the code produced.
  2. This project is published under an MIT license.

Code of Conduct

Contributions to this project are subject to CZI’s Contributor Covenant code of conduct. By participating, contributors are expected to uphold this code of conduct.

Reporting Security Issues

If you believe you have found a security issue, please disclose it responsibly by contacting the repository owner via the ‘security’ tab above.

Authors

Anita Bandrowski, Esteban Gonzalez, Tom Honeyman, James Howison, Anne L'Hôte, Arcangelo Massari, David Schindler
