
RPPR.md


Budget

Don't edit this - the RPPR generator populates this section

Research Design

This project aims to find resources of translational relevance across the CTSA program. The platform will reveal and connect expertise, services, educational resources, datasets, software tools, and other translational artifacts.

Methodology

Organizations expend substantial effort maintaining local databases of effectively the same data (people, publications, grants, etc.), and the challenges of scholar disambiguation and of longitudinal data collection and tracking remain unsolved. The Science of Translational Science Platform performs large-scale data integration from a wide variety of sources, using both structured and unstructured information. These data elements are indexed using semantic technologies for querying and discovery. A user interface will allow querying and exploration. A key responsibility of the to-be-hired web developer will be refining the usability of the "front door" search interface. Finally, widgets will be built to deliver context-specific content to CTSA hubs, CLIC Forums, the Competitions review software, etc. A major component of this work (scheduled for version 1) is a search interface widget that can be embedded in hub web sites and in those of other interested groups (e.g., CLIC).

A shared data environment in the form of a warehouse of research data was strongly endorsed by participants in the most recent PEA Community meeting. Collaborative population and maintenance of common data would reduce local hub effort, improve data quality, and serve as an exemplar of collaborative activity for the CTSA program and NIH programs overall. Hubs have already spent substantial effort on this topic, establishing priorities and developing manual and semi-automated processes that can help guide efforts toward automation.

The 4DM Project (Drug Discovery, Development and Deployment Map) created by NCATS has generated substantial interest in understanding the interdependencies of translational research and the entities involved. The 4DM prototype will be extended to incorporate relevant backing data from the data warehouse to display when selecting a vertex in the visualization graph. Ultimately, we can leverage these data for a variety of purposes at hubs, including workflows for improved data quality, process efficiency, automation, benchmarking, etc.

We will use the Science of Translational Science Platform to demonstrate how to address the common CTSA need for longitudinal scholar data tracking and reporting.
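
As a rough illustration of the integrate-then-index flow described in this section, the sketch below maps records from two sources onto one common shape and builds a simple inverted index over their titles. It is not the SciTS implementation: the `Resource` schema, the per-source field names, and the plain keyword matching (standing in for the semantic indexing the platform uses) are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """Common shape for an integrated record (hypothetical schema)."""
    source: str  # e.g. "GitHub", "ClinicalTrials.gov"
    kind: str    # e.g. "software", "trial"
    title: str


def normalize(source: str, raw: dict) -> Resource:
    """Map a source-specific record onto the common shape.

    The raw field names used here are assumptions for illustration.
    """
    if source == "GitHub":
        return Resource(source, "software", raw["full_name"])
    if source == "ClinicalTrials.gov":
        return Resource(source, "trial", raw["brief_title"])
    raise ValueError(f"no mapping defined for source {source!r}")


def build_index(resources):
    """Build an inverted index: lowercase title word -> set of resources."""
    index = defaultdict(set)
    for res in resources:
        for token in res.title.lower().split():
            index[token].add(res)
    return index


def search(index, term: str):
    """Return resources whose title contains the term, in a stable order."""
    hits = index.get(term.lower(), ())
    return sorted(hits, key=lambda r: (r.source, r.title))


records = [
    normalize("GitHub", {"full_name": "Cohort Discovery Tool"}),
    normalize("ClinicalTrials.gov", {"brief_title": "Cohort Study of Asthma"}),
]
index = build_index(records)
matches = search(index, "cohort")
```

Keeping the per-source mapping in one function means a new source only needs a new branch in `normalize`; the index and search code stay unchanged.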

Expected Outcomes

The primary outcomes for this project are:

  • a warehouse integrating the spectrum of expertise and services described above,
  • a suite of services that hubs and others (e.g., CLIC) can interrogate and/or embed into their local information environments, and
  • in the long term, migration of the production instances of both the warehouse and the search services to the NCATS cloud. Given current activities by the CD2H cloud team, this will likely occur toward the end of year 3 or the beginning of year 4. The overall result will be a persistent resource in the NCATS cloud with optional, customizable deployment of the interface in a hub's local environment.

Deliverables

  • SciTS Warehouse, Version 1
    • Content includes CD2H resources/status, NIH FOAs, ClinicalTrials.gov personnel/expertise, REDCap instrument libraries, CTSA-relevant GitHub repositories/personnel/expertise
    • CTSA Consortium access via JDBC and GraphQL interfaces
  • SciTS Warehouse, Version 1.5
    • Extended 4DM connectivity to CD2H faceted search engine using UMLS CUIs
    • Researcher disambiguation services using tools from Harvard, Weill Cornell and Iowa linking hubs and their investigators to publications and grants
  • SciTS Warehouse, Version 2
    • Content extended to include the hub service catalog and expanded set of widgets
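
To make the GraphQL access deliverable more concrete, here is a minimal sketch of how a consortium client might package a query for an HTTP POST. Only the request envelope (a JSON object with `query` and `variables` keys) follows the usual GraphQL-over-HTTP convention; the `fundingOpportunities` field, its arguments, and the returned fields are hypothetical and do not describe the actual SciTS schema. The sketch builds the request body only and does not contact any endpoint.

```python
import json


def build_graphql_request(query: str, variables: dict) -> bytes:
    """Serialize a GraphQL query and its variables into a JSON POST body."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")


# Hypothetical query: the type and field names are assumptions for
# illustration, not the published SciTS schema.
FOA_QUERY = """
query OpenFoas($keyword: String!) {
  fundingOpportunities(keyword: $keyword) {
    title
    closeDate
  }
}
"""

body = build_graphql_request(FOA_QUERY, {"keyword": "informatics"})
```

A hub could send `body` with a `Content-Type: application/json` header to whatever GraphQL endpoint the warehouse exposes, then render the returned titles in a local widget.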

Timeline (monthly)

  • 6/1 - Version 1 of the SciTS warehouse
  • 6/1 - Integration of the data generated by the WashU CDEK regarding disambiguated organizations, molecules, etc. from ClinicalTrials.gov into SciTS warehouse
  • 6/1 - Landscape analysis of scholar reporting best practices
  • 8/1 - Deployment of a set of researcher disambiguation services
  • 9/1 - Full expansion to semantic search within 4DM (via UMLS CUIs) with the CD2H faceted search engine
  • 9/1 - Version 1.5 of the SciTS warehouse
  • 9/20 - Scholar tracking guidebook, dashboard, use cases
  • 1/1 - Deployment of CTSA hub service catalog in the CD2H faceted search engine
  • 1/1 - Coverage of CTSA hub web content harvesting to include partner organizations
  • 1/1 - Version 2 of the SciTS warehouse deployed (including the hub service catalog and expanded set of widgets)

Potential Pitfalls and Alternative Strategies

A substantial number of external sources are already being regularly harvested and integrated into SciTS, including educational resources (DIAMOND, N-Lighten, BD2K GitHub repositories), data repositories (e.g., DataCite), and GitHub repositories. This significantly reduces the overall risk to success in populating SciTS. Key outstanding potential challenges include:

  • successful harmonization of educational resources, simplifying the current diversity of their descriptions,
  • formulation of an effective descriptive framework for services provided by CTSA hubs, and
  • significant services adoption by the hubs and other organizations.

The first two primarily involve modeling and alignment with and by external teams. We have been in continued consultation with the education platform teams and the SPARC team at MUSC to ensure our work aligns well with their plans. The last is more a matter of demonstrating utility to the hubs. We have already received strong interest from various hubs for particular services, e.g., from Medical College of Wisconsin for a service listing open funding opportunities relevant to their investigator profiles and from Wisconsin-Madison for a service listing W-M personnel active on CD2H projects. We will continue to solicit suggestions from hubs for services seen as high value to their local environments.

Y3 (July 1, 2019-June 30, 2020) Accomplishments

The following content is from the June 30 - Dec 30, 2019 mid-year progress report here. Please add progress for Jan 1 - June 30, 2020.

  • SciTS Warehouse, Version 1 | Content includes CD2H resources/status, NIH FOAs, ClinicalTrials.gov personnel/expertise, REDCap instrument libraries, CTSA-relevant GitHub repositories/personnel/expertise | CTSA Consortium access via JDBC and GraphQL interfaces

    • Version 1 completed. Data sources included in SciTS:
      • NIH
        • FOAs
        • MEDLINE
        • PubMed Central
        • ClinicalTrials.gov
        • RePORTER
      • Educational Resources
        • N-lighten
        • DIAMOND
        • ERuDITE (BD2K)
        • OHSU GitHub (BD2K)
      • Other
        • VIVO sites
        • Harvard Profiles sites
        • GitHub
        • DataCite
        • SPARCRequest services (MUSC)
        • DataMed (bioCADDIE)
  • All prototypes are available on the CD2H Labs web site:

    • Main SciTS site: labs.cd2h.org/scits
    • CTSAsearch: labs.cd2h.org/search
    • gitForager (supports browsing of harvested GitHub data regarding repositories, organizations and persons): labs.cd2h.org/gitforager
    • SciTS APIs: labs.cd2h.org/scits/apis.jsp
    • Acknowledgements (demo of mapping non-author collaborators to their contributions, using PubMed Central publications): labs.cd2h.org/acknowledgements
  • SciTS Warehouse, Version 1.5 | Extended 4DM connectivity to CD2H faceted search engine using UMLS CUIs | Researcher disambiguation services using tools from Harvard, Weill Cornell and Iowa linking hubs and their investigators to publications and grants:

    • The extended 4DM connectivity is currently on hold pending further development by the NCATS team. UMLS concept code recognition is in place but not directly accessible from end-user interfaces. The disambiguation work was tabled in favor of accelerating the schedule for developing the framework for harvesting and annotating service information from hub web sites. New data sources deployed during this period include:
      • REDCap instrument library
      • CLIC educational resource repository
      • CTSA hub services derived from web site text mining
    • Additional prototypes added in this version:
      • Viva (VIVO-like profiling platform supporting integration of standard profile data with data drawn from GitHub, Twitter, etc.):
      • Tableau demo (linking SciTS data to CTSA hub demographics using 2017 census data):
      • CTSA Web Search demo (initial prototype for exploring CTSA hub web site content):
  • SciTS Warehouse, Version 2 | Content extended to include the hub service catalog and expanded set of widgets:

    • Version 2 is substantially complete from the perspective of extending the SciTS data model, information harvesting, and search access for hub services. We have developed a framework that allows multiple taxonomies to be defined and annotations to be generated by an internal web application. This framework has been used to generate one taxonomy de novo from the hub web pages and another from NIH classification sources. The de novo taxonomy is currently deployed in CTSAsearch. The indexing workflow is now configured to allow swapping taxonomies in and out as needed. Consortium demand for widgets has not yet materialized, but we have the framework in place to build these easily on demand, using our GraphQL engine and a JSP tag library supporting custom tags for access. New data sources deployed during this period include:
      • ORCID researcher profiles
      • YouTube videos published by relevant constituencies (hubs, NIH, …)
    • Additional prototypes added in this version: