shm7 edited this page Apr 10, 2024 · 15 revisions

HPC-ED Searching, Publishing, and Metadata

Introduction

This set of documents covers HPC-ED metadata publishing software tools and methods, with examples.

HPC-ED uses multiple catalogs to support development and testing efforts, each implemented using a specific version of the metadata. We anticipate a single production catalog shared by all collaborators. Catalogs are implemented as Globus Search indexes. To browse the HPC-ED catalogs (Globus indexes) and their metadata, visit: https://search-pilot.operations.access-ci.org/

In Globus Search, publishing is called ingesting documents into an index.
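As a rough illustration of what ingesting involves, the sketch below wraps one metadata record in the `GMetaEntry` document shape that Globus Search accepts for ingest. The field names inside `content` (`Title`, `URL`, `Keywords`) and the subject URL are hypothetical placeholders chosen for this example; they are not the HPC-ED metadata schema.

```python
import json


def build_ingest_document(subject, content, visible_to=("public",)):
    """Wrap one metadata record in a Globus Search GMetaEntry ingest document."""
    return {
        "ingest_type": "GMetaEntry",
        "ingest_data": {
            # The subject uniquely identifies the entry within the index.
            "subject": subject,
            # "public" makes the entry searchable without credentials.
            "visible_to": list(visible_to),
            # The metadata itself; these keys are illustrative only.
            "content": dict(content),
        },
    }


example = build_ingest_document(
    subject="https://example.org/training/intro-to-mpi",  # hypothetical URL
    content={
        "Title": "Introduction to MPI",
        "URL": "https://example.org/training/intro-to-mpi",
        "Keywords": ["MPI", "parallel programming"],
    },
)
print(json.dumps(example, indent=2))
```

A document built this way is what a publishing tool would submit to an index; the submission step itself needs writer credentials and is not shown here.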

HPC-ED catalogs are public and can be searched without credentials. Publishing requires authorized credentials. We are currently issuing a single publishing (writer) credential to each partner or beta testing institution. A coordinator at each institution is responsible for sharing that credential with other developers writing publishing software. Each credential is granted writer access to specific catalogs/indexes. A credential has two parts: a client ID and a client secret. Never store credentials in code that is pushed to GitHub or any other public location.
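One common way to keep the client ID and client secret out of source code is to read them from environment variables at run time. The following is a minimal sketch of that idea; the variable names `HPCED_CLIENT_ID` and `HPCED_CLIENT_SECRET` and the helper `load_publisher_credentials` are assumptions made for this example, not names defined by the HPC-ED tools.

```python
import os

# Hypothetical variable names; use whatever your institution agrees on.
ID_VAR = "HPCED_CLIENT_ID"
SECRET_VAR = "HPCED_CLIENT_SECRET"


def load_publisher_credentials(env=None):
    """Return (client_id, client_secret) read from environment variables.

    Raises RuntimeError if either value is missing or empty, so a
    publishing script fails early instead of sending an empty credential.
    """
    env = os.environ if env is None else env
    missing = [name for name in (ID_VAR, SECRET_VAR) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credential variables: " + ", ".join(missing))
    return env[ID_VAR], env[SECRET_VAR]
```

A publishing script would call `load_publisher_credentials()` once at startup and pass the pair to its authentication step, so the secret never appears in the repository.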

About the project

This project started when a group of HPC community members got together to discuss how difficult it was to find and share training materials. We created a survey to learn how others felt about this issue. The results made it clear that many people had noticed the same problem. You can see the results of the survey here: https://doi.org/10.22369/issn.2153-4136/14/2/4

[Figure: Survey_interest]

The HPC-ED project is currently in its pilot phase (as of September 2023) and we’re working to develop the tools to make this federated catalog available to the HPC education & training community.

We are planning to seek continuing funding after our 1-year pilot is over and intend to expand this project accordingly. We hope the community continues to support this project and provide ideas and feedback so we can continue improving the sharing and discovery tools.

HPC-ED is a CyberTraining Pilot Project (#2320977) supported by the National Science Foundation.

Overview of the proposed process

[Figure: Share_Discover]

Glossary of terms

Discover - refers to the process of finding or identifying items within the catalog that match specific criteria or meet a particular need. This typically involves searching the catalog using keywords, filters, or other search criteria to locate items of interest.

Federated catalog - a system or approach that integrates metadata and information about data resources from multiple, possibly distributed, sources. It provides a unified view or search interface to discover, access, and manage data across these disparate sources.

[Figure: Federated Catalog]

FAIR - stands for Findable, Accessible, Interoperable, and Reusable. These principles are used to guide the management and sharing of data and other digital resources to ensure that they can be easily discovered, accessed, understood, and used by both humans and machines.

Metadata - data that provides information about other data. Metadata describes various aspects of a piece of data, such as its content, format, structure, and context.

[Figure: Metadata_RDA]

Publish - refers to the process of making items or resources available for others to discover and access, often with associated metadata and access permissions. When an item is published in a catalog, it means that it has been officially added to the catalog's collection and is now accessible to users who search or browse the catalog.

RESTful API - an API that follows REST (Representational State Transfer), an architectural style for designing networked applications. REST relies on stateless client-server communication, typically over HTTP, and uses standard HTTP methods (such as GET, POST, PUT, DELETE) for data manipulation.
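To make the RESTful API entry concrete, here is a small sketch that builds (but does not send) an HTTP GET request for a keyword search against a search index. It assumes the Globus Search query endpoint has the form `/v1/index/<index_id>/search` under `https://search.api.globus.org`; the index ID shown is a placeholder, not a real HPC-ED catalog, and `build_search_request` is a name invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed base URL of the Globus Search REST API.
SEARCH_API = "https://search.api.globus.org/v1"


def build_search_request(index_id, query, limit=5):
    """Build a stateless GET request for a keyword search on one index."""
    params = urlencode({"q": query, "limit": limit})
    url = f"{SEARCH_API}/index/{index_id}/search?{params}"
    # GET only reads data; every request carries all the information
    # the server needs, which is what "stateless" means in REST.
    return Request(url, method="GET", headers={"Accept": "application/json"})


request = build_search_request(
    "00000000-0000-0000-0000-000000000000",  # placeholder index ID
    "parallel programming",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the search; that step is left out so the sketch runs without network access.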