Welcome to PRAGMA Data Service Prototype

Kunalan Ratharanjan edited this page Nov 19, 2018 · 8 revisions

This document describes the current state of the PRAGMA Rice Genomics Prototype and its access points. The current prototype is shown below.

Both the Data Identity GUI and the Galaxy Workflow Server run in a PRAGMA VM located at SDSC, California. All users of this prototype enter through this single VM to carry out their analysis. IU, AIST, and SDSC have been cooperatively responsible for the ongoing availability of this VM. The analysis results for all users are sent to a public repository (a MongoDB database) hosted at AIST, Japan. As the next step, through continued collaboration with the International Rice Research Institute (IRRI) and the Advanced Science and Technology Institute (ASTI), Philippines, we will deploy the data service and a Galaxy instance on ASTI computing resources.

As the last step of digital object (DO) publishing, once results are created and before they are sent to the data store, a handle is obtained by contacting the RPID testbed. The RPID testbed runs a local handle server that resolves the handles assigned to the data products produced by the Rice Genomics Prototype.
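To make the handle step concrete, here is a minimal sketch of how a client might split a handle into its prefix and suffix and turn it into a resolution URL. The function names are hypothetical, and the default resolver base (the public Handle.Net proxy) is an assumption: the RPID testbed runs its own local handle server whose address is not given on this page, so handles issued by the testbed may only resolve there.

```python
from typing import NamedTuple


class Handle(NamedTuple):
    prefix: str  # naming authority, e.g. "20.5000.347"
    suffix: str  # local name assigned under that prefix


def parse_handle(handle: str) -> Handle:
    """Split 'prefix/suffix' at the first slash and reject malformed input."""
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a valid handle: {handle!r}")
    return Handle(prefix, suffix)


def resolution_url(handle: str, resolver_base: str = "https://hdl.handle.net") -> str:
    """Build a resolution URL. resolver_base is an assumption; point it at
    the handle server that actually knows the prefix."""
    h = parse_handle(handle)
    return f"{resolver_base.rstrip('/')}/{h.prefix}/{h.suffix}"


# The PID KI type definition PID listed on this page, used as sample input.
parsed = parse_handle("20.5000.347/e099b1a0e62f493a9e48")
url = resolution_url("20.5000.347/e099b1a0e62f493a9e48")
print(parsed, url)
```

Keeping the resolver base as a parameter lets the same code target either a local handle server or a global proxy without changes.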

PRAGMA Data Service Prototype access points

Home page for IRRI data products:

http://rocks-53.sdsc.edu:8079/testGUI/irri-index.html

The Galaxy front end (it interacts with the Galaxy Workflow Server, which is installed in the same VM):

http://rocks-53.sdsc.edu:8080/

The back end of the PRAGMA Data Repository (MongoDB) is hosted on AIST VMs. An example query is:

http://rocks-53.sdsc.edu:8079/testGUI/irri-search.html?DataTypePID=20.5000.343/1af9b7467412d3982998&DataTypeName=IRRI%20Rice%20Genomes%20tassel%20workflow
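The example query above is a search page URL with two query parameters, `DataTypePID` and `DataTypeName`. As a small sketch, such a URL could be assembled with Python's standard library as follows. The helper name `build_search_url` is hypothetical, and the sketch assumes the page accepts exactly the two parameters seen in the example.

```python
from urllib.parse import quote, urlencode

SEARCH_PAGE = "http://rocks-53.sdsc.edu:8079/testGUI/irri-search.html"


def build_search_url(data_type_pid: str, data_type_name: str) -> str:
    """Build a search URL in the same shape as the example query."""
    params = {"DataTypePID": data_type_pid, "DataTypeName": data_type_name}
    # quote (rather than the default quote_plus) encodes spaces as %20,
    # and safe="/" leaves the slash inside the PID unescaped.
    return SEARCH_PAGE + "?" + urlencode(params, safe="/", quote_via=quote)


url = build_search_url(
    "20.5000.343/1af9b7467412d3982998",
    "IRRI Rice Genomes tassel workflow",
)
print(url)
```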

The PRAGMA Rice Genomics prototype accesses the RPID Testbed for PID resolution services. RPID [1] is funded by the US National Science Foundation under Grant No. 1659310. RPID project details can be accessed by the link below.

https://rpidproject.github.io/rpid/

The URLs of the RPID Handle Service and Data Type Registry services are listed below:

PID Kernel Information type definition records

  • PID KI Type Definition PID: 20.5000.347/e099b1a0e62f493a9e48
  • Galaxy DOs metadata type definition PID: 20.5000.347/1af9b7467412d3982998

| Type Record Name | Content Format | Mandatory? | Description | Record Tag |
|---|---|---|---|---|
| creationDate | ISO Date | YES | Specify creation date and time of DO; use ISO-8601 format, e.g. YYYY-MM-DD HH:MM:SS | PID KI Type |
| digitalObjectLocation | URL | YES | Pointer to the content object location (pointer to DO's landing page URL) | PID KI Type |
| metadataLocation | URL | YES | Pointer to the DO's metadata object | PID KI Type |
| checksum | STRING | YES | MD5 checksum of DO | PID KI Type |
| predecessor | IDENTIFIER | NO | List of DO's predecessor PIDs to describe linkage between DOs | PID KI Type |
| successor | IDENTIFIER | NO | List of DO's successor PIDs to describe linkage between DOs | PID KI Type |
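The field list above can be read as a simple schema. The following sketch checks a candidate record against it; it is an illustration, not the prototype's own validation code. The function name `validate_record`, the restriction of URLs to http/https, the assumption that the MD5 checksum is stored as a 32-character hex string, and the sample values (example.org URLs, the MD5 of empty input) are all assumptions made for this example.

```python
import re
from typing import Dict, List
from urllib.parse import urlparse

# Field names and mandatory/optional status taken from the field list above.
MANDATORY = ("creationDate", "digitalObjectLocation", "metadataLocation", "checksum")
OPTIONAL = ("predecessor", "successor")


def validate_record(record: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name in MANDATORY:
        if not record.get(name):
            problems.append(f"missing mandatory field: {name}")
    for name in record:
        if name not in MANDATORY and name not in OPTIONAL:
            problems.append(f"unknown field: {name}")
    # Assumption: location fields are http(s) URLs.
    for name in ("digitalObjectLocation", "metadataLocation"):
        value = record.get(name)
        if value and urlparse(str(value)).scheme not in ("http", "https"):
            problems.append(f"{name} is not an http(s) URL")
    # Assumption: the MD5 checksum is stored as 32 hex characters.
    checksum = record.get("checksum")
    if checksum and not re.fullmatch(r"[0-9a-fA-F]{32}", str(checksum)):
        problems.append("checksum is not a 32-character hex MD5 digest")
    return problems


# Hypothetical sample record; none of these values come from the prototype.
example = {
    "creationDate": "2018-03-28 12:00:00",
    "digitalObjectLocation": "https://example.org/do/landing",
    "metadataLocation": "https://example.org/do/metadata",
    "checksum": "d41d8cd98f00b204e9800998ecf8427e",
}
incomplete = {k: v for k, v in example.items() if k != "checksum"}

print(validate_record(example))
print(validate_record(incomplete))
```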

This prototype builds a framework of a repository and persistent identification services for long-term access to, and discoverability of, heterogeneous data objects across scientific domains. PID Kernel Information (PID KI), a concept introduced by the RDA PID KI Working Group [2], and the Data Type Registry (DTR) [3] together improve the sharing and interoperability of scientific DOs by embedding an agreed-upon minimum set of metadata in persistent identifiers (PIDs). This makes it easier to harvest data objects from applications and assign them PIDs for persistent access, which in turn makes scientific outcomes more reusable and reproducible. We propose a PID KI type for Galaxy Tassel 5 DOs, as listed in the table above, and develop a smart client that captures PID KI automatically during Galaxy workflow runtime.
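As a rough illustration of what automatic PID KI capture could involve, the sketch below assembles a kernel information record for a workflow output file: it computes the MD5 checksum, stamps a creation date in the format given for the creationDate field, and attaches the location pointers. This is not the prototype's smart client. The function names, the `created` override, the file name, and the example.org URLs are all hypothetical, and how the client hooks into Galaxy is not covered here.

```python
import hashlib
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Dict, Optional, Sequence


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large outputs need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_kernel_record(
    path: Path,
    landing_url: str,
    metadata_url: str,
    predecessors: Sequence[str] = (),
    created: Optional[datetime] = None,
) -> Dict[str, object]:
    """Assemble a PID KI record using the field names from this page."""
    created = created or datetime.now(timezone.utc)  # assumption: UTC timestamps
    record: Dict[str, object] = {
        "creationDate": created.strftime("%Y-%m-%d %H:%M:%S"),
        "digitalObjectLocation": landing_url,
        "metadataLocation": metadata_url,
        "checksum": md5_of_file(path),
    }
    if predecessors:  # optional field, included only when lineage is known
        record["predecessor"] = list(predecessors)
    return record


# Demo with a throwaway file standing in for a workflow output.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "tassel_output.txt"
    out.write_bytes(b"hello")
    record = build_kernel_record(
        out,
        "https://example.org/do/landing",
        "https://example.org/do/metadata",
        created=datetime(2018, 3, 28, 12, 0, 0),
    )
print(record)
```

The record produced this way would still need to be registered with a handle server to obtain a PID, which is outside the scope of this sketch.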

Acknowledgements

We thank the members of the PID Kernel Information WG for discussions about PID KI usage and principles. We also thank the RPID Testbed at IU for providing the Handle and DTR services used in our experiments.

References

[1]. Robust Persistent Identifier Testbed (RPID) Project, https://rpidproject.github.io/rpid [Online Access, Mar. 28th, 2018]

[2]. PID Kernel Information WG Case Statement, https://www.rd-alliance.org/group/pid-kernel-information-wg/case-statement/pid-kernel-information-wg-case-statement [Online Access, Mar. 28th, 2018]

[3]. Data Type Registry WG Case Statement, https://www.rd-alliance.org/sites/default/files/case_statement/DTR2%20Case%20Statement_Final.pdf [Online Access, Mar. 28th, 2018]