Vocabulary + HTML for describing the schema.org extension

documentation v1.0.3

UPDATES

Mar 21 2018 Adding an example of how to attach physical samples, and their IGSN identifiers, to a Dataset. See: Attaching Physical Samples to a Dataset

Mar 15 2018

Adding the schema:identifier field can be done in three ways: as a text description, as a URL, or by using the schema:PropertyValue type to describe the identifier in more detail. We highly recommend using schema:PropertyValue, as text or URL values are not indexed properly by Google and other JSON-LD testing tools due to an issue with the property's definition.

Mar 1 2018

We've updated our documentation to fix our use of schema:additionalType and schema:propertyID so that both use fully-qualified URLs. This corrects the mistake of using a vocabulary prefix such as gdx: to reference a vocabulary class. An example of this fix for schema:additionalType:

OLD

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  ...
  "additionalType": "gdx:ResearchRepositoryService"
  ...
}

NEW

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  ...
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService"
  ...
}

This pattern is also applied to the schema:propertyID in this way:

OLD

{
  "@context": {
    "@vocab": "http://schema.org/",
    "datacite": "http://purl.org/spar/datacite/"
  },
  ...
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "datacite:doi",
    "value": "10.13039/100000141",
    "url": "https://doi.org/10.13039/100000141"
  }
}

NEW

{
  "@context": {
    "@vocab": "http://schema.org/",
    "datacite": "http://purl.org/spar/datacite/"
  },
  ...
  "identifier": {
    "@type": "PropertyValue",
    "propertyID": "http://purl.org/spar/datacite/doi",
    "value": "10.13039/100000141",
    "url": "https://doi.org/10.13039/100000141"
  }
}
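
The rewrite from a prefixed name to a fully-qualified URL can also be done mechanically. The following is a minimal Python sketch, not part of the P418 tooling: the function name `expand_prefixed` is hypothetical, and it assumes a flat `@context` whose prefixes map directly to URL strings.

```python
def expand_prefixed(value, context):
    """Expand 'prefix:term' to a full URL using a flat JSON-LD @context.

    Values that are already URLs, or whose prefix is not defined in the
    context, are returned unchanged.
    """
    prefix, sep, term = value.partition(":")
    # "https://..." also contains ":", so skip anything that looks like a URL.
    if not sep or term.startswith("//"):
        return value
    base = context.get(prefix)
    if not isinstance(base, str):
        return value
    return base + term


context = {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/",
}

print(expand_prefixed("gdx:ResearchRepositoryService", context))
print(expand_prefixed("datacite:doi", context))
```

Running the function over every schema:additionalType and schema:propertyID value before publishing is one way to apply the fix consistently.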

About

Serves the vocabulary in JSON-LD at https://geodex.org/voc/.

Goals

  1. To produce quality schema.org markup with additional extensions to schema.org classes to help improve harvesting technologies.

  2. Produced markup will pass the Google Structured Data Testing Tool with 0 errors.

Approach

The preferred format for schema.org markup by its harvesters is JSON-LD. For a primer on JSON-LD, see https://json-ld.org/

To produce quality schema.org, all extensions to schema.org classes will be made through the use of the recommended property schema:additionalType.

The gdx: vocabulary will extend schema.org using rdfs:subClassOf in its formal ontology, but in schema.org this doesn't translate into the use of JSON-LD's @type as traditional RDF publishing would encourage.

Vocabulary Prefixes

Prefix Vocabulary URI
schema: https://schema.org/
gdx: https://geodex.org/voc/
geolink: http://schema.geolink.org/1.0/base/main#
earthcollab: https://library.ucar.edu/earthcollab/schema#
vivo: http://vivoweb.org/ontology/core#
geo-upper: http://www.geoscienceontology.org/geo-upper#
datacite: http://purl.org/spar/datacite/

schema: the de facto vocabulary for publishing structured data in web pages for search engine harvesting

gdx: the P418 project's vocabulary

geolink: an EarthCube Building Block focusing on describing discovery-level metadata for datasets

earthcollab: an EarthCube Building Block focusing on extensions to the ViVO ontology

vivo: the ViVO ontology

geo-upper: a segment of the Geoscience Standard Names Ontology, an EarthCube product. This ontology could be used when describing dataset variables.

datacite: describes persistent identifier schemes like DOI, ARK, URI for helping to represent PIDs.

Schema.org Extensions

From P418, the vocabulary we built specifically for addressing gaps in schema.org and other EarthCube and community ontologies, we have:

P418: https://geodex.org/voc/

We also use terms from other vocabularies:

GeoLink

EarthCollab

ViVO (part of EarthCollab)

GeoStandardNames

DataCite

Graphical Notation

The graphs display the classes, properties and literals for producing valid schema.org markup.

Graphical Notation

Back to top

Schema.org JSON-LD

Schema.org's preferred format for markup is JSON-LD. There are a number of tools that will help build valid schema.org JSON-LD.

Back to top

Describing a Repository

Research Repository Service Vocabulary

In schema.org, we model a repository as both a schema:Organization and a schema:Service. This double-typing gives us the most flexibility in describing the characteristics of the organization providing the service and the services offered by the organization. Because the Service class in schema.org is very broad, this vocabulary defines an extension to schema:Service, gdx:ResearchRepositoryService, to uniquely identify repositories curating research products.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO"
  
}

The other fields you can use to describe the Organization and the Service are:

Research Repository Service - Fields

  • schema:legalName should be the official name of the repository,
  • schema:name can be an acronym or the name typically used for the repository,
  • schema:url should be the url of your repository's homepage,
  • schema:description should be text describing your repository,
  • schema:sameAs can be used to link the repository to other URLs such as Re3Data, Twitter, LinkedIn, etc.,
  • schema:category can be used to describe the discipline, domain, area of study that encompasses the repository's holdings.
{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "sameAs": [
        "http://www.re3data.org/repository/r3d1000000xx",
        "https://twitter.com/SDRO",
        "https://www.linkedin.com/company/123456789/"
    ],
  "category": [
    "Biological Oceanography",
    "Chemical Oceanography"
  ]
}

(See advanced publishing techniques for how to describe categories/disciplines in more detail than just simple text.)
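
To show how the fields above might end up on a web page, here is a minimal Python sketch that assembles the repository description and wraps it in the `<script type="application/ld+json">` element that harvesters read. The function name `repository_script_tag` is hypothetical and only the basic fields are covered.

```python
import json


def repository_script_tag(legal_name, name, url, description):
    """Return a <script> element holding the repository's JSON-LD."""
    doc = {
        "@context": {
            "@vocab": "http://schema.org/",
            "gdx": "https://geodex.org/voc/",
        },
        "@type": ["Service", "Organization"],
        "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
        "legalName": legal_name,
        "name": name,
        "url": url,
        "description": description,
    }
    body = json.dumps(doc, indent=2)
    # Escape "</" so a stray "</script>" in a field cannot end the element early.
    body = body.replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


print(repository_script_tag(
    "Sample Data Repository Office",
    "SDRO",
    "https://www.sample-data-repository.org",
    "The Sample Data Repository Service provides access to data from an imaginary domain.",
))
```

The returned string can be placed in the `<head>` or `<body>` of the repository's homepage.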

If you are using the "@id" attribute for your Repository, you can specify the schema:provider of the schema:Service in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": ["Service", "Organization"],
  "@id": "https://www.sample-data-repository.org",
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "category": [
    "Biological Oceanography",
    "Chemical Oceanography"
  ],
  "provider": {
    "@id": "https://www.sample-data-repository.org"
  }
}

However, if your repository has a situation where multiple organizations act as the provider or you want to recognize a different organization as the provider of the repository's service, schema:provider can be used in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "category": [
    "Biological Oceanography",
    "Chemical Oceanography"
  ],
  "provider": [
    {
      "@type": "Organization",
      "name": "SDRO Technical Office",
      "description": "We provide all the infrastructure for the SDRO"
      ...
    },
    {
      "@type": "Organization",
      "name": "SDRO Science Support Office",
      "description": "We provide all the science support functionality for the SDRO"
      ...
    }
  ]
}

Adding additional fields of schema:Organization:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "category": [
    "Biological Oceanography",
    "Chemical Oceanography"
  ],
  "provider": {
    "@id": "https://www.sample-data-repository.org"
  },
  "logo": {
    "@type": "ImageObject",
    "url": "https://www.sample-data-repository.org/images/logo.jpg"
  },
  "contactPoint": {
    "@type": "ContactPoint",
    "name": "Support",
    "email": "info@bco-dmo.org",
    "url": "https://www.sample-data-repository.org/about-us",
    "contactType": "customer support"
  },
  "foundingDate": "2006-09-01",
  "address": {
    "@type": "PostalAddress",
    "streetAddress": "123 Main St.",
    "addressLocality": "Anytown",
    "addressRegion": "ST",
    "postalCode": "12345",
    "addressCountry": "USA"
  }
}

If this Organization has a parent entity such as a college, university or research center, that information can be provided using the schema:parentOrganization property:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "category": [
    "Biological Oceanography",
    "Chemical Oceanography"
  ],
  "provider": {
    "@id": "https://www.sample-data-repository.org"
  },
   "parentOrganization": {
     "@type": "Organization",
     "@id": "http://www.someinstitute.edu",
     "legalName": "Some Institute",
     "name": "SI",
     "url": "http://www.someinstitute.edu",
     "address": {
       "@type": "PostalAddress",
       "streetAddress": "234 Main St.",
       "addressLocality": "Anytown",
       "addressRegion": "ST",
       "postalCode": "12345",
       "addressCountry": "USA"
     }
   }
}

Back to top

Describing a Repository's Funding Source

To describe the funding source of a repository, you use the schema:funder property of schema:Organization:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "category": [
    "Biological Oceanography",
    "Chemical Oceanography"
  ],
  "provider": {
    "@id": "https://www.sample-data-repository.org"
  },
   "parentOrganization": {
     "@type": "Organization",
     "@id": "http://www.someinstitute.edu",
     "legalName": "Some Institute",
     "name": "SI",
     "url": "http://www.someinstitute.edu",
     "address": {
       "@type": "PostalAddress",
       "streetAddress": "234 Main St.",
       "addressLocality": "Anytown",
       "addressRegion": "ST",
       "postalCode": "12345",
       "addressCountry": "USA"
     },
     "funder": {
      "@type": "Organization",
      "@id": "https://dx.doi.org/10.13039/100000141",
      "legalName": "Division of Ocean Sciences",
      "alternateName": "OCE",
      "url": "https://www.nsf.gov/div/index.jsp?div=OCE",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "http://purl.org/spar/datacite/doi",
        "value": "10.13039/100000141",
        "url": "https://doi.org/10.13039/100000141"
      },
      "parentOrganization": {
        "@type": "Organization",
        "@id": "http://dx.doi.org/10.13039/100000085",
        "legalName": "Directorate for Geosciences",
        "alternateName": "NSF-GEO",
        "url": "http://www.nsf.gov",
        "identifier": {
          "@type": "PropertyValue",
          "propertyID": "http://purl.org/spar/datacite/doi",
          "value": "10.13039/100000085",
          "url": "https://doi.org/10.13039/100000085"
         },
        "parentOrganization": {
          "@type": "Organization",
          "@id": "http://dx.doi.org/10.13039/100000001",
          "legalName": "National Science Foundation",
          "alternateName": "NSF",
          "url": "http://www.nsf.gov",
          "identifier": {
            "@type": "PropertyValue",
            "propertyID": "http://purl.org/spar/datacite/doi",
            "value": "10.13039/100000001",
            "url": "https://doi.org/10.13039/100000001"
          }
        }
      }
    }
  }
}
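
A consumer of this markup can walk the nested schema:parentOrganization links to recover the full funding hierarchy. The sketch below is only an illustration: `organization_chain` is a hypothetical helper, and the sample data is a trimmed copy of the funder from the example above.

```python
def organization_chain(org):
    """Follow schema:parentOrganization links, returning legal names in order."""
    names = []
    while isinstance(org, dict):
        # Fall back to schema:name when no legalName is given.
        names.append(org.get("legalName", org.get("name")))
        org = org.get("parentOrganization")
    return names


funder = {
    "@type": "Organization",
    "legalName": "Division of Ocean Sciences",
    "parentOrganization": {
        "@type": "Organization",
        "legalName": "Directorate for Geosciences",
        "parentOrganization": {
            "@type": "Organization",
            "legalName": "National Science Foundation",
        },
    },
}

print(" > ".join(organization_chain(funder)))
```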

Describing a Repository's Identifier

Some organizations may have a persistent identifier (DOI) assigned to their organization from authorities like the Registry of Research Data Repositories (re3data.org). The way to describe these organizational identifiers is to use the schema:identifier property in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "category": [
    "Biological Oceanography",
    "Chemical Oceanography"
  ],
  "provider": {
    "@id": "https://www.sample-data-repository.org"
  },
  "identifier": {
    "@type": "PropertyValue",
    "name": "Re3data DOI for this repository",
    "propertyID": "http://purl.org/spar/datacite/doi",
    "value": "10.17616/R37P4C",
    "url": "http://doi.org/10.17616/R37P4C"
  }
}

We add the datacite vocabulary to the @context because the DataCite Ontology, available at http://purl.org/spar/datacite/, has URIs to describe a DOI, ORCiD, ARK, URI, and URN, all identifier schemes that help disambiguate identifiers. To properly disambiguate a globally unique identifier, two pieces of information are needed: 1) the identifier value and 2) the scheme in which that identifier exists. Some examples of this concept for common identifiers are:

Scheme Value
DOI 10.17616/R37P4C
ORCiD 0000-0002-6059-4651

When describing PIDs, it's important to include both of these pieces for downstream activities like searching and linking resources. For example, a user may want to query for all repositories with a DOI identifier or all Datasets authored by a researcher with an ORCiD. These types of filters become more difficult when only the URLs of these identifiers are provided, because a single persistent identifier can have multiple URLs. One example is the DOI, which appears in this document behind several resolver prefixes, including http://doi.org/, https://doi.org/ and https://dx.doi.org/.

So, the best practice is to provide the scheme and value for an identifier, but you can also provide a URL representation using the schema:url property.
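
The scheme-plus-value practice can be sketched in Python as follows. The helper names `doi_value` and `doi_identifier` are hypothetical, and the regular expression covers only the doi.org and dx.doi.org resolver prefixes used in this document.

```python
import re

DATACITE = "http://purl.org/spar/datacite/"

# Resolver prefixes that all point at the same DOI.
_DOI_URL = re.compile(r"^https?://(?:dx\.)?doi\.org/", re.IGNORECASE)


def doi_value(doi_or_url):
    """Strip a doi.org / dx.doi.org resolver prefix, leaving the bare DOI."""
    return _DOI_URL.sub("", doi_or_url.strip())


def doi_identifier(doi_or_url):
    """Build a schema:PropertyValue carrying scheme, value and URL."""
    value = doi_value(doi_or_url)
    return {
        "@type": "PropertyValue",
        "propertyID": DATACITE + "doi",
        "value": value,
        "url": "https://doi.org/" + value,
    }


print(doi_identifier("http://dx.doi.org/10.17616/R37P4C"))
```

Because every URL form reduces to the same value, a query on scheme and value matches the identifier no matter which resolver URL the publisher used.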

Back to top

Describing a Repository's Types of Resources

Research Repository Service - Types of Resources

To describe the types of research resources a repository curates, we use the schema:OfferCatalog. With an extension of gdx:ResearchResourceTypes, we define that the OfferCatalog will be a list of types that are derived from schema:CreativeWork.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "schema": "http://schema.org/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  ...
  "hasOfferCatalog":{
    "@type": "OfferCatalog",
    "additionalType": "https://geodex.org/voc/ResearchResourceTypes",
    "itemListElement": [
      {"@type": "Thing", "@id": "schema:Dataset", "name": "Dataset"},
      {"@type": "Thing", "@id": "geolink:PhysicalSample", "name": "Physical Sample" }
    ]
  }
}

Notice that we use @id to describe the type of resource. To reference schema.org classes using @id properly, we must add schema.org to the @context with a prefix name. Here, we chose schema: in the @context, thus we use schema:Dataset to say that our repository curates resource types of schema.org/Dataset.

Because schema.org does not have a class for a Physical Sample yet, we use the class definition from the EarthCube GeoLink vocabulary to specify that this repository curates physical samples. We add geolink: to the @context section, and then specify geolink:PhysicalSample as another @id offered by this repository.
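
Since each prefixed @id only resolves when its prefix is declared in the @context, a small check can catch a missing declaration before publishing. This is a sketch under assumptions: both function names are hypothetical, and every @id is assumed to be written in `prefix:term` form.

```python
def resource_types_catalog(types):
    """Build a gdx:ResearchResourceTypes OfferCatalog.

    `types` is a list of (prefixed_id, display_name) pairs.
    """
    return {
        "@type": "OfferCatalog",
        "additionalType": "https://geodex.org/voc/ResearchResourceTypes",
        "itemListElement": [
            {"@type": "Thing", "@id": type_id, "name": name}
            for type_id, name in types
        ],
    }


def undeclared_prefixes(catalog, context):
    """Return prefixes used in @id values that the @context does not define."""
    used = {item["@id"].split(":", 1)[0] for item in catalog["itemListElement"]}
    return sorted(used - set(context))


catalog = resource_types_catalog([
    ("schema:Dataset", "Dataset"),
    ("geolink:PhysicalSample", "Physical Sample"),
])

# This context forgets to declare geolink:, so the check reports it.
context = {"@vocab": "http://schema.org/", "schema": "http://schema.org/"}
print(undeclared_prefixes(catalog, context))
```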

Describing a Repository's Outreach Activities

Research Repository Service - Outreach Activities

To describe the outreach activities of a repository, we again use the schema:OfferCatalog but specify its schema:additionalType to be gdx:OutreachActivities:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "schema": "http://schema.org/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  ...
  "hasOfferCatalog":[
    {
      "@type": "OfferCatalog",
      "additionalType": "https://geodex.org/voc/ResearchResourceTypes",
      "itemListElement": [
        {"@type": "Thing", "@id": "schema:Dataset", "name": "Dataset"},
        {"@type": "Thing", "@id": "geolink:PhysicalSample", "name": "Physical Sample" }
      ]
    },
    {
      "@type": "OfferCatalog",
      "additionalType": "https://geodex.org/voc/OutreachActivities",
      "itemListElement": [
        {
          "@type": "Action",
          "@id": "gdx:OutreachActivity-Training",
          "additionalType": "https://geodex.org/voc/OutreachActivity-Training",
          "name": "User Training",
          "description": "...",
          "url": "https://sample-data-repository.org/training/user-training"
        },
        {
          "@type": "Action",
          "@id": "gdx:OutreachActivity-UserWorkshop",
          "additionalType": "https://geodex.org/voc/OutreachActivity-UserWorkshop",
          "name": "User Workshops",
          "description": "...",
          "url": "https://sample-data-repository.org/workshops/data-submission-workshops"
        },
        {
          "@type": "Action",
          "@id": "gdx:OutreachActivity-UserSupport",
          "additionalType": "https://geodex.org/voc/OutreachActivity-UserSupport",
          "name": "User Support",
          "description": "...",
          "url": "https://sample-data-repository.org/support/user-support"
        }
      ]
    }
  ]
}

These Action items above are not instances of actual events, but specify the type of potential events a repository may hold. To describe a specific schema:Event related to one of these activities, you could publish on a different web page in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": "Event",
  "name": "SDRO Data Submission Workshop - Summer 2018",
  "url": "https://sample-data-repository.org/workshops/data-submission-workshops/summer-2018",
  "about": { "@id": "gdx:OutreachActivity-UserWorkshop" }
  ... goes on to describe the schema.org/Event
}

Describing a Repository's Policies

Research Repository Service - Policies

If your repository has policy documents about access control, terms of use, etc., you can provide those using the schema:publishingPrinciples field. Because schema.org does not make a distinction between the types of these documents, P418 has created class names for some common policy document types. These will help make it clear to users what types of policies you have. If you would like us to add more, please let us know by creating an Issue.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  ...
  "publishingPrinciples": [
      {
        "@type": "DigitalDocument",
        "additionalType": "https://geodex.org/voc/Protocol-TermsOfUse",
        "name": "Terms of Use",
        "url": "https://www.sample-data-repository.org/terms-of-use",
        "fileFormat": "text/html"
      },
      {
        "@type": "DigitalDocument",
        "additionalType": "https://geodex.org/voc/Protocol-ResourceSubmissionPolicy",
        "name": "How to Get Started Contributing Data",
        "url": "https://www.sample-data-repository.org/submit-data",
        "fileFormat": "text/html"
      }
    ]
}
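
On the consuming side, the additionalType on each policy document lets a harvester pick out a specific kind of policy. The following Python sketch illustrates that lookup; `policies_by_type` is a hypothetical helper, and the sample data is trimmed from the example above.

```python
def policies_by_type(repository):
    """Map each policy's additionalType URL to the policy document URL."""
    principles = repository.get("publishingPrinciples", [])
    # schema.org allows a single value as well as a list.
    if not isinstance(principles, list):
        principles = [principles]
    return {
        p["additionalType"]: p["url"]
        for p in principles
        if isinstance(p, dict) and "additionalType" in p and "url" in p
    }


repository = {
    "name": "SDRO",
    "publishingPrinciples": [
        {
            "@type": "DigitalDocument",
            "additionalType": "https://geodex.org/voc/Protocol-TermsOfUse",
            "url": "https://www.sample-data-repository.org/terms-of-use",
        },
        {
            "@type": "DigitalDocument",
            "additionalType": "https://geodex.org/voc/Protocol-ResourceSubmissionPolicy",
            "url": "https://www.sample-data-repository.org/submit-data",
        },
    ],
}

policies = policies_by_type(repository)
print(policies["https://geodex.org/voc/Protocol-TermsOfUse"])
```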

Back to top

Describing a Repository's Services

Research Repository Service - Service Channel

Repositories might offer services for accessing data as opposed to direct access to data files. The schema:Service allows us to describe these services as well as repository searches, data submission services, and syndication services. In this first example, we describe a search service at the repository using schema:ServiceChannel.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  ...
  "availableChannel": [
    {
      "@type": "ServiceChannel",
      "serviceUrl": "https://www.sample-data-repository.org/search",
      "providesService": {
        "@type": "Service",
        "additionalType": "https://geodex.org/voc/SearchService",
        "name": "SDRO Website Search",
        "description": "Search for webpages, datasets, authors, funding awards, instrumentation and measurements",
        "potentialAction": {
          "@type": "SearchAction",
          "target": "https://www.sample-data-repository.org/search?keywords={query_string}",
          "query-input": {
            "@type": "PropertyValueSpecification",
            "valueRequired": true,
            "valueName": "query_string"
          }
        }
      }
    }
  ]
}

By specifying the schema:potentialAction (https://schema.org/potentialAction), we create a machine-actionable way to execute searches. This means that an EarthCube Registry could take a user-submitted query and pass it along to the repository on behalf of the EarthCube Registry user.
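
How a registry might act on this markup can be sketched in a few lines of Python. This is an illustration rather than EarthCube Registry code: `build_search_url` is a hypothetical name, and it assumes query-input is published as a PropertyValueSpecification object as in the example above.

```python
from urllib.parse import quote


def build_search_url(search_action, user_query):
    """Fill the SearchAction's target template with a user query."""
    template = search_action["target"]
    value_name = search_action["query-input"]["valueName"]
    placeholder = "{" + value_name + "}"
    if placeholder not in template:
        raise ValueError("target has no placeholder named " + value_name)
    # Percent-encode the query so spaces and '&' cannot break the URL.
    return template.replace(placeholder, quote(user_query, safe=""))


search_action = {
    "@type": "SearchAction",
    "target": "https://www.sample-data-repository.org/search?keywords={query_string}",
    "query-input": {
        "@type": "PropertyValueSpecification",
        "valueRequired": True,
        "valueName": "query_string",
    },
}

print(build_search_url(search_action, "ocean acidification"))
# https://www.sample-data-repository.org/search?keywords=ocean%20acidification
```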

If your repository does have datasets or other resources with schema.org JSON-LD markup on their landing pages, Google recommends that all URLs be put inside a sitemap.xml file. To create a sitemap.xml, go here. To describe your sitemap.xml, add a schema:ServiceChannel similar to the following markup:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  ...
  "availableChannel": [
    {
      "@type": "ServiceChannel",
      "serviceUrl": "https://www.sample-data-repository.org/search",
      "providesService": {
        "@type": "Service",
        "additionalType": "https://geodex.org/voc/SearchService",
        "name": "SDRO Website Search",
        "description": "Search for webpages, datasets, authors, funding awards, instrumentation and measurements",
        "potentialAction": {
          "@type": "SearchAction",
          "target": "https://www.sample-data-repository.org/search?keywords={query_string}",
          "query-input": {
            "@type": "PropertyValueSpecification",
            "valueRequired": true,
            "valueName": "query_string"
          }
        }
      }
    },
    {
       "@type": "ServiceChannel",
       "serviceUrl": "https://www.sample-data-repository.org/sitemap.xml",
       "providesService": {
         "@type": "Service",
         "additionalType": "https://geodex.org/voc/SyndicationService",
         "name": "Sitemap XML",
         "description": "A Sitemap XML providing access to all of the resources for harvesting",
         "potentialAction": {
           "@type": "ConsumeAction",
           "target": {
             "@type": "EntryPoint",
             "additionalType": "https://geodex.org/voc/SitemapXML",
             "urlTemplate": "https://www.sample-data-repository.org/sitemap.xml?page={page}"
           },
           "object": {
             "@type": "DigitalDocument",
             "url": "https://www.sample-data-repository.org/sitemap.xml",
             "fileFormat": "application/xml"
           }
         }
       }
     }
  ]
}
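
The sitemap.xml file itself can be produced with a short script. The sketch below uses Python's standard library and the sitemaps.org 0.9 namespace; `build_sitemap` is a hypothetical helper, and it emits only the `loc` element for each landing page.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) listing the given page URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as '&' in the URL text.
        ET.SubElement(entry, "loc").text = url
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(urlset, encoding="unicode")


print(build_sitemap([
    "https://www.sample-data-repository.org/dataset/472032",
]))
```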

Back to top

Describing a Repository's Data Collections

If your repository has a concept of a data collection, some grouping of a number of datasets, we can use the schema:DataCatalog to describe these collections using the schema:OfferCatalog. One example of a DataCatalog might be to group datasets by a categorization such as 'biological data' or 'chemical data'. Or a catalog could be grouped by instrument, parameter or whatever logical grouping a repository may have.

Research Repository Service - Offer Catalog

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  ...
  "hasOfferCatalog": {
    "@type": "OfferCatalog",
    "name": "Sample Data Repository Resource Catalog",
    "itemListElement": [
      {
        "@type": "DataCatalog",
        "@id": "https://www.sample-data-repository.org/collection/biological-data",
        "name": "Biological Data",
        "audience": {
          "@type": "Audience",
          "audienceType": "public",
          "name": "General Public"
        }
      },
      {
        "@type": "DataCatalog",
        "@id": "https://www.sample-data-repository.org/collection/geological-data",
        "name": "Geological Data",
        "audience": {
          "@type": "Audience",
          "audienceType": "public",
          "name": "General Public"
        }
      }
    ]
  }
}

Back to top

Describing a Dataset

The schema:Dataset is a very expressive type within schema.org.

Dataset

However, Google has drafted a guide to help publishers. The guide lists only two required fields: name and description.

  • name - A descriptive name of a dataset (e.g., “Snow depth in Northern Hemisphere”)
  • description - A short summary describing a dataset.
{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  "description": "This dataset includes results of laboratory experiments which measured dissolved organic carbon (DOC) usage by natural bacteria in seawater at different pCO2 levels. Included in this dataset are; bacterial abundance, total organic carbon (TOC), what DOC was added to the experiment, target pCO2 level. "
}

The guide suggests the following recommended fields:

  • url - Location of a page describing the dataset.
  • sameAs - Other URLs that can be used to access the dataset page. A link to a page that provides more information about the same dataset, usually in a different repository.
  • version - The version number or identifier for this dataset (text or numeric).
  • isAccessibleForFree - Boolean (true|false) specifying if the dataset is accessible for free.
  • keywords - Keywords summarizing the dataset.
  • license - A license under which the dataset is distributed (text or URL).
  • identifier - An identifier for the dataset, such as a DOI (text, URL, or PropertyValue).
  • variableMeasured - What does the dataset measure? (e.g., temperature, pressure)
{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  "description": "This dataset includes results of laboratory experiments which measured dissolved organic carbon (DOC) usage by natural bacteria in seawater at different pCO2 levels. Included in this dataset are; bacterial abundance, total organic carbon (TOC), what DOC was added to the experiment, target pCO2 level. ",
  "url": "https://www.sample-data-repository.org/dataset/472032",
  "sameAs": "https://search.dataone.org/#view/https://www.sample-data-repository.org/dataset/472032",
  "version": "2013-11-21",
  "isAccessibleForFree": true,
  "keywords": "ocean acidification, Dissolved Organic Carbon, bacterioplankton respiration, pCO2, carbon dioxide, oceans",
  "license": "http://creativecommons.org/licenses/by/4.0/"
}

Back to top

Adding the schema:identifier field can be done in three ways: a text description, a URL, or a schema:PropertyValue that describes the identifier in more detail. We highly recommend using schema:PropertyValue, because text and URL values are not indexed properly by Google and other JSON-LD testing tools due to an issue with the property's definition.

Describing a Dataset Identifier

Identifiers

In its most basic form, the identifier can be published as text:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  "description": "This dataset includes results of laboratory experiments which measured dissolved organic carbon (DOC) usage by natural bacteria in seawater at different pCO2 levels. Included in this dataset are; bacterial abundance, total organic carbon (TOC), what DOC was added to the experiment, target pCO2 level. ",
  "url": "https://www.sample-data-repository.org/dataset/472032",
  "sameAs": "https://search.dataone.org/#view/https://www.sample-data-repository.org/dataset/472032",
  "version": "2013-11-21",
  "keywords": "ocean acidification, Dissolved Organic Carbon, bacterioplankton respiration, pCO2, carbon dioxide, oceans",
  "license": "http://creativecommons.org/licenses/by/4.0/",
  "identifier": "urn:sdro:dataset:472032"
}

A persistent identifier, such as a DOI, ARK, or URL, can be published as a schema:PropertyValue, using the DataCite Ontology Resource Identifier Scheme to define the type of identifier:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  "description": "This dataset includes results of laboratory experiments which measured dissolved organic carbon (DOC) usage by natural bacteria in seawater at different pCO2 levels. Included in this dataset are; bacterial abundance, total organic carbon (TOC), what DOC was added to the experiment, target pCO2 level. ",
  "url": "https://www.sample-data-repository.org/dataset/472032",
  "sameAs": "https://search.dataone.org/#view/https://www.sample-data-repository.org/dataset/472032",
  "version": "2013-11-21",
  "keywords": "ocean acidification, Dissolved Organic Carbon, bacterioplankton respiration, pCO2, carbon dioxide, oceans",
  "license": "http://creativecommons.org/licenses/by/4.0/",
  "identifier": {
    "@type": "PropertyValue",
    "additionalType": ["http://schema.geolink.org/1.0/base/main#Identifier", "http://purl.org/spar/datacite/Identifier"],
    "propertyID": "http://purl.org/spar/datacite/doi",
    "url": "https://doi.org/10.1575/1912/bco-dmo.665253",
    "value": "10.1575/1912/bco-dmo.665253"
  }
}
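As a minimal sketch (not part of any P418 tooling), an identifier block like the one above could be generated from a bare DOI string. The helper name `doi_identifier` is hypothetical; the property values mirror the example above.

```python
def doi_identifier(doi):
    """Build a schema:PropertyValue describing a DOI (hypothetical helper)."""
    return {
        "@type": "PropertyValue",
        "propertyID": "http://purl.org/spar/datacite/doi",
        "url": "https://doi.org/" + doi,
        "value": doi,
    }


identifier = doi_identifier("10.1575/1912/bco-dmo.665253")
```

The resulting dictionary can be assigned to the "identifier" key of the Dataset before serializing it with `json.dumps`.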

schema:Dataset also defines a field for the schema:citation as either text or a schema:CreativeWork. To provide citation text:

NOTE: If you have a DOI, the citation text can be automatically generated for you by querying a DOI URL with the Accept Header of 'text/x-bibliography'.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "datacite": "http://purl.org/spar/datacite/"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  "description": "This dataset includes results of laboratory experiments which measured dissolved organic carbon (DOC) usage by natural bacteria in seawater at different pCO2 levels. Included in this dataset are; bacterial abundance, total organic carbon (TOC), what DOC was added to the experiment, target pCO2 level. ",
  "url": "https://www.sample-data-repository.org/dataset/472032",
  "sameAs": "https://search.dataone.org/#view/https://www.sample-data-repository.org/dataset/472032",
  "version": "2013-11-21",
  "keywords": "ocean acidification, Dissolved Organic Carbon, bacterioplankton respiration, pCO2, carbon dioxide, oceans",
  "license": "http://creativecommons.org/licenses/by/4.0/",
  "identifier": {
    "@id": "https://doi.org/10.1575/1912/bco-dmo.665253",
    "@type": "PropertyValue",
    "additionalType": ["http://schema.geolink.org/1.0/base/main#Identifier", "http://purl.org/spar/datacite/Identifier"],
    "propertyID": "http://purl.org/spar/datacite/doi",
    "url": "https://doi.org/10.1575/1912/bco-dmo.665253",
    "value": "10.1575/1912/bco-dmo.665253"
  },
  "citation": "J.Smith 'How I created an awesome dataset', Journal of Data Science, 1966"
}
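The DOI lookup mentioned in the NOTE above could be sketched as follows, using only the Python standard library. The function names are hypothetical, and `fetch_citation` assumes network access and a DOI whose registration agency supports this kind of content negotiation.

```python
import urllib.request


def citation_request(doi):
    """Build a request asking the DOI resolver for formatted citation text."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": "text/x-bibliography"},
    )


def fetch_citation(doi):
    """Send the request and return the citation text (requires network access)."""
    with urllib.request.urlopen(citation_request(doi)) as response:
        return response.read().decode("utf-8").strip()


request = citation_request("10.1575/1912/bco-dmo.665253")
```

The returned text can then be used as the value of the "citation" field.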

Back to top

Adding the schema:variableMeasured field can be done in two ways - a text description of each variable or by using the schema:PropertyValue type to describe the variable in more detail. We highly recommend using the schema:PropertyValue.

Describing a Dataset's Variables

Variables

In its most basic form, the variable as a schema:PropertyValue can be published as:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "additionalType": "https://library.ucar.edu/earthcollab/schema#Parameter",
      "name": "Bottle identifier",
      "description": "The bottle number for each associated measurement."
    },
    ...
  ]
}

A fully fleshed-out example that uses a vocabulary to describe the variable can be published as:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "gsn-quantity": "http://www.geoscienceontology.org/geo-lower/quantity#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "variableMeasured": [
    {
      "@type": "PropertyValue",
      "additionalType": "http://www.geoscienceontology.org/geo-lower/quantity#latitude",
      "name": "latitude",
      "url": "https://www.sample-data-repository.org/dataset-parameter/665787",
      "description": "Latitude where water samples were collected; north is positive.",
      "unitText": "decimal degrees",
      "minValue": "15.0",
      "maxValue": "45.0"
    },
    ...
  ]
}
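When variable descriptions are generated from a database, a small helper can keep the optional properties consistent. The sketch below is an assumption about how one might do this, not P418 code; `measured_variable` is a hypothetical name, and the range check simply guards against a minValue larger than the maxValue.

```python
def measured_variable(name, description, unit_text=None, min_value=None, max_value=None):
    """Build a schema:PropertyValue for variableMeasured (hypothetical helper)."""
    if min_value is not None and max_value is not None and min_value > max_value:
        raise ValueError("minValue must not exceed maxValue")
    variable = {"@type": "PropertyValue", "name": name, "description": description}
    optional = {"unitText": unit_text, "minValue": min_value, "maxValue": max_value}
    # Only emit the optional properties that were actually supplied.
    variable.update({key: value for key, value in optional.items() if value is not None})
    return variable


latitude = measured_variable(
    "latitude",
    "Latitude where water samples were collected; north is positive.",
    unit_text="decimal degrees",
    min_value=15.0,
    max_value=45.0,
)
```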

Back to top

Describing a Dataset's Catalog

For some repositories, defining one or more data collections helps contextualize the datasets. In schema.org, you define these collections using schema:DataCatalog.

DataCatalog

The best way to use these DataCatalogs for a repository is to define each catalog as an "offering" of your repository and to include an @id property that can be reused in the dataset JSON-LD. For example, the repository JSON-LD defines a schema:DataCatalog with the

"@id": "https://www.sample-data-repository.org/collection/biological-data".

In the dataset JSON-LD, we reuse that @id to say a dataset belongs in that catalog:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "includedInDataCatalog": {
    "@id": "https://www.sample-data-repository.org/collection/biological-data"
  }
}

Back to top

Describing a Dataset's Distributions

Whereas the schema:url property of the Dataset should point to a landing page, the way to describe how to download the data in a specific format is through the schema:distribution property. The "distribution" property describes where to get the data and in what format by using the schema:DataDownload type. If your dataset is not accessible through a direct download URL, but rather through a service URL that may need input parameters, jump to the next section, Accessing Data through a Service Endpoint.

Distributions

For data available in multiple formats, there will be multiple values of schema:DataDownload, one per format. A single distribution looks like this:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "distribution": {
    "@type": "DataDownload",
    "contentUrl": "https://www.sample-data-repository.org/dataset/472032.tsv",
    "encodingFormat": "text/tab-separated-values"
  }
}
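For a dataset offered in several formats, the list of DataDownload objects could be generated rather than written by hand. This is only a sketch: the helper name `distributions` is hypothetical, and the NetCDF URL is an invented illustration that does not come from the guide.

```python
def distributions(downloads):
    """Turn {media type: download URL} into a list of schema:DataDownload objects."""
    return [
        {"@type": "DataDownload", "contentUrl": url, "encodingFormat": media_type}
        for media_type, url in downloads.items()
    ]


dataset_fragment = {
    "distribution": distributions({
        "text/tab-separated-values": "https://www.sample-data-repository.org/dataset/472032.tsv",
        # Hypothetical second format, used here only for illustration.
        "application/x-netcdf": "https://www.sample-data-repository.org/dataset/472032.nc",
    })
}
```

The "distribution" value is then an array with one DataDownload per format.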

Accessing Data through a Service Endpoint

If access to the data requires some input parameters before a download can occur, we can use the schema:potentialAction in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "potentialAction": {
    "@type": "SearchAction",
    "target": {
        "@type": "EntryPoint",
        "contentType": ["application/x-netcdf", "text/tab-separated-values"],
        "urlTemplate": "https://www.sample-data-repository.org/dataset/1234/download?format={format}&startDateTime={start}&endDateTime={end}&bounds={bbox}",
        "description": "Download dataset 1234 based on the requested format, start/end dates and bounding box",
        "httpMethod": ["GET", "POST"]
    },
    "query-input": [
      {
        "@type": "PropertyValueSpecification",
        "valueName": "format",
        "description": "The desired format requested either 'application/x-netcdf' or 'text/tab-separated-values'",
        "valueRequired": true,
        "defaultValue": "application/x-netcdf",
        "valuePattern": "(application\/x-netcdf|text\/tab-separated-values)"
      },
      {
        "@type": "PropertyValueSpecification",
        "valueName": "start",
        "description": "A UTC ISO DateTime",
        "valueRequired": false,
        "valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z)?"
      },
      {
        "@type": "PropertyValueSpecification",
        "valueName": "end",
        "description": "A UTC ISO DateTime",
        "valueRequired": false,
        "valuePattern": "(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z)?"
      },
      {
        "@type": "PropertyValueSpecification",
        "valueName": "bbox",
        "description": "Two points in decimal degrees that create a bounding box formatted as 'lon,lat' of the lower-left corner and 'lon,lat' of the upper-right",
        "valueRequired": false,
        "valuePattern": "(-?[0-9]+(\\.[0-9]+)?),[ ]*(-?[0-9]+(\\.[0-9]+)?)[ ]*(-?[0-9]+(\\.[0-9]+)?),[ ]*(-?[0-9]+(\\.[0-9]+)?)"
      }
    ]
  }
}

Here, we use the schema:SearchAction type because it lets you define the query parameters and HTTP methods, so that machines can build user interfaces to collect those query parameters and issue a request that gives users what they are looking for.
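To illustrate how a client might act on this markup, the sketch below fills a urlTemplate from user-supplied values after checking them against each valuePattern. It is one possible client-side reading of the markup, not behaviour defined by schema.org or P418; the function name is hypothetical, and the `query_inputs` list is a trimmed-down copy of the "query-input" array above.

```python
import re
from urllib.parse import quote


def expand_url_template(url_template, query_inputs, values):
    """Fill {name} placeholders after checking each value against its valuePattern."""
    url = url_template
    for spec in query_inputs:
        name = spec["valueName"]
        value = values.get(name, spec.get("defaultValue"))
        if value is None:
            if spec.get("valueRequired"):
                raise ValueError("missing required parameter: " + name)
            value = ""  # optional parameter left empty
        elif "valuePattern" in spec and not re.fullmatch(spec["valuePattern"], value):
            raise ValueError("invalid value for " + name + ": " + value)
        # Percent-encode the value so it is safe inside a query string.
        url = url.replace("{" + name + "}", quote(value, safe=""))
    return url


query_inputs = [
    {"valueName": "format", "valueRequired": True,
     "defaultValue": "application/x-netcdf",
     "valuePattern": "(application/x-netcdf|text/tab-separated-values)"},
    {"valueName": "start", "valueRequired": False},
    {"valueName": "end", "valueRequired": False},
    {"valueName": "bbox", "valueRequired": False},
]
template = ("https://www.sample-data-repository.org/dataset/1234/download"
            "?format={format}&startDateTime={start}&endDateTime={end}&bounds={bbox}")
url = expand_url_template(template, query_inputs, {"format": "text/tab-separated-values"})
```

Leaving optional parameters empty is a simplification; a real service may expect them to be omitted from the query string entirely.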

Back to top

Describing a Dataset's Temporal Coverage

Temporal coverage is difficult to describe across all possible scenarios. Schema.org uses the ISO 8601 standard to describe time intervals and time points, but doesn't provide capabilities for geologic time scales or dynamically generated data up to present time. We ask for your feedback on any temporal coverages you may have that don't currently fit into schema.org. You can follow similar issues in the schema.org GitHub issue queue (https://github.com/schemaorg/schemaorg/issues/242).

Temporal

To represent a single date and time:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "temporalCoverage": "2018-01-22T14:51:12+00:00"
}

Or a single date:

{
  ...
  "temporalCoverage": "2018-01-22"
}

Or a date range:

{
  ...
  "temporalCoverage": "2012-09-20/2016-01-22"
}
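The single-date and date-range strings above can be produced from Python date objects. This is a minimal sketch with a hypothetical helper name; it covers calendar dates only, not date-times.

```python
from datetime import date


def temporal_coverage(start, end=None):
    """Format a date, or a start/end pair, as an ISO 8601 temporalCoverage string."""
    if end is None:
        return start.isoformat()
    if end < start:
        raise ValueError("end must not precede start")
    return start.isoformat() + "/" + end.isoformat()


coverage = temporal_coverage(date(2012, 9, 20), date(2016, 1, 22))
```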

Schema.org also lets you provide date ranges and other temporal coverages through the DateTime data type. For more granular temporal coverages go here: http://schema.org/DateTime.

Back to top

Describing a Dataset's Spatial Coverage

Spatial

The types of spatial coverage in schema.org are a single point, described with schema:GeoCoordinates, and the following shapes, which use the schema:GeoShape type, where a 'point' is defined as a latitude/longitude pair separated by a comma.

  • line - a series of two or more point objects separated by space.
  • polygon - a series of four or more space delimited points where the first and final points are identical.
  • box - two points separated by a space character where the first point is the lower corner and the second point is the upper corner.

These spatial definitions were added to schema.org very early in its development, when the decision was made to follow the GeoRSS specification. While this is not ideal, there are ongoing conversations about improving this in schema.org.

A point, or coordinate, would be defined in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "spatialCoverage": {
    "@type": "Place",
    "geo": {
      "@type": "GeoCoordinates",
      "latitude": 39.3280,
      "longitude": 120.1633
    }
  }
}

All other shapes are defined using the schema:GeoShape:

{
  ...
  "spatialCoverage": {
    "@type": "Place",
    "geo": {
      "@type": "GeoShape",
      "line": "39.3280,120.1633 40.445,123.7878"
    }
  }
}

A polygon:

  "polygon": "39.3280 120.1633 40.445 123.7878 41 121 39.77 122.42 39.3280 120.1633"

A box where the 'lower-left' corner is 39.3280/120.1633 and the 'upper-right' corner is 40.445/123.7878:

  "box": "39.3280 120.1633 40.445 123.7878"

For Project418, we feel the defined spatial coverages are inadequate for the needs of our community, but we also recognize that schema.org continues to hear the needs of its schema.org publishers on these issues.
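The polygon and box strings shown earlier can be assembled from coordinate pairs. The sketch below follows the space-delimited form used in those two examples and repeats the first point to close a polygon; the function names are hypothetical. Note that Python drops trailing zeros when formatting floats, so 39.3280 is written as 39.328.

```python
def format_point(point):
    """Format a (latitude, longitude) pair the way the polygon and box examples do."""
    latitude, longitude = point
    return f"{latitude} {longitude}"


def geoshape_polygon(points):
    """Join points into a polygon string, repeating the first point to close the ring."""
    ring = list(points)
    if ring[0] != ring[-1]:
        ring.append(ring[0])
    if len(ring) < 4:
        raise ValueError("a polygon needs at least four points, including the closing one")
    return " ".join(format_point(point) for point in ring)


def geoshape_box(lower_corner, upper_corner):
    """Join the lower and upper corners into a box string."""
    return format_point(lower_corner) + " " + format_point(upper_corner)


polygon = geoshape_polygon([(39.328, 120.1633), (40.445, 123.7878), (41, 121), (39.77, 122.42)])
box = geoshape_box((39.328, 120.1633), (40.445, 123.7878))
```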

To alleviate some of the pain of converting spatial information into these defined shapes, Project418 offers support for GeoJSON by using the schema:subjectOf property of the schema:Place type. The schema:fileFormat property should have the value of the GeoJSON MIME type, application/vnd.geo+json, and the schema:text property should be the encoded value of the GeoJSON itself:

"spatialCoverage": {
  "@type": "Place",
  "subjectOf": {
    "@type": "CreativeWork",
    "fileFormat": "application\/vnd.geo+json",
    "text": "{\u0022type\u0022:\u0022Feature\u0022,\u0022geometry\u0022:{\u0022type\u0022:\u0022Polygon\u0022,\u0022coordinates\u0022:[[[-64.6353,34.407],[-149.8727,34.407],[-149.8727,-17.45],[-64.6353,-17.45],[-64.6353,34.407]]]},\u0022properties\u0022:{}}"
  }
}
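Because the GeoJSON is carried as a string inside the JSON-LD, it has to be serialized twice: once to produce the text value, and again when the whole document is written out. A minimal sketch with Python's standard json module follows. The variable names are illustrative, and json.dumps escapes the inner quotes as \" rather than \u0022, which decodes to the same string.

```python
import json

feature = {
    "type": "Feature",
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-64.6353, 34.407], [-149.8727, 34.407], [-149.8727, -17.45],
            [-64.6353, -17.45], [-64.6353, 34.407],
        ]],
    },
    "properties": {},
}

spatial_coverage = {
    "@type": "Place",
    "subjectOf": {
        "@type": "CreativeWork",
        "fileFormat": "application/vnd.geo+json",
        # First serialization: the GeoJSON becomes a plain string.
        "text": json.dumps(feature, separators=(",", ":")),
    },
}

# Second serialization: the quotes inside the string are escaped automatically.
markup = json.dumps({"spatialCoverage": spatial_coverage}, indent=2)
```

A consumer reverses the process by parsing the JSON-LD first and then parsing the text value as GeoJSON.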

We also recognize that there is no defined property for specifying a Coordinate Reference System, but we see from the schema.org issue queue that this has been mentioned.

If you have multiple geometries, you can publish those by making the schema:geo field an array of GeoShape or GeoCoordinates like so:

{
  ...
  "spatialCoverage": {
    "@type": "Place",
    "geo": [
      {
        "@type": "GeoCoordinates",
        "latitude": -17.65,
        "longitude": 50
      },
      {
        "@type": "GeoCoordinates",
        "latitude": -19,
        "longitude": 51
      },
      ...
    ]
  }
  ...
}

Back to top

Describing a Dataset's Creators/Contributors

People can be linked to datasets using three fields: author, creator, and contributor. Since schema:contributor is defined as a secondary author, and schema:creator is defined as being synonymous with the schema:author field, we recommend using the more expressive fields of creator and contributor, but using any of these fields is okay. Because there are more things that can be said about how and when a person contributed to a Dataset, we use the schema:Role. You'll notice that the schema.org documentation does not state that the Role type is an expected data type of author, creator and contributor, but that is addressed in this blog post introducing Role into schema.org. Thanks to Stephen Richard for this contribution.

Creators/Contributors

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "creator": [
    {
      "@id": "http://lod.bco-dmo.org/id/person-role/472036",
      "@type": "Role",
      "additionalType": "http://schema.geolink.org/1.0/base/main#Participant",
      "roleName": "Principal Investigator",
      "url": "http://lod.bco-dmo.org/id/person-role/472036",
      "creator": {
        "@id": "https://www.bco-dmo.org/person/51317",
        "@type": "Person",
        "additionalType": "http://schema.geolink.org/1.0/base/main#Person",
        "name": "Dr Uta Passow",
        "givenName": "Uta",
        "familyName": "Passow",
        "url": "https://www.bco-dmo.org/person/51317"
      }
    },
    {
      "@id": "http://lod.bco-dmo.org/id/person-role/472038",
      "@type": "Role",
      "additionalType": "http://schema.geolink.org/1.0/base/main#Participant",
      "roleName": "Co-Principal Investigator",
      "url": "https://www.bco-dmo.org/person-role/472038",
      "creator": {
        "@id": "https://www.bco-dmo.org/person/50663",
        "@type": "Person",
        "additionalType": "http://schema.geolink.org/1.0/base/main#Person",
        "identifier": {
          "@type": "PropertyValue",
          "additionalType": ["http://schema.geolink.org/1.0/base/main#Identifier", "http://purl.org/spar/datacite/Identifier"],
          "propertyID": "http://purl.org/spar/datacite/orcid",
          "url": "https://orcid.org/0000-0003-3432-2297",
          "value": "0000-0003-3432-2297"
        },
        "name": "Dr Mark Brzezinski",
        "url": "https://www.bco-dmo.org/person/50663"
      }
    }
  ]
}

NOTE that the Role inherits the property (creator or contributor) from the Dataset when pointing to the schema:Person. A Role listed under the Dataset's creator points to its Person with creator:

{
  "@context": {
    "@vocab": "http://schema.org/",
    ...
  },
  "@type": "Dataset",
  ...
  "creator": [
    {
      "@id": "http://lod.bco-dmo.org/id/person-role/472036",
      "@type": "Role",
      "additionalType": "http://schema.geolink.org/1.0/base/main#Participant",
      "roleName": "Principal Investigator",
      "url": "http://lod.bco-dmo.org/id/person-role/472036",
      "creator": {
        "@id": "https://www.bco-dmo.org/person/51317",
        "@type": "Person",
        "additionalType": "http://schema.geolink.org/1.0/base/main#Person",
        "name": "Dr Uta Passow",
        "givenName": "Uta",
        "familyName": "Passow",
        "url": "https://www.bco-dmo.org/person/51317"
      }
    }
  ]
}

If a single Person plays multiple roles on a Dataset, each role should be explicitly defined in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "creator": [
    {
      "@id": "http://lod.bco-dmo.org/id/person-role/472036",
      "@type": "Role",
      "additionalType": "http://schema.geolink.org/1.0/base/main#Participant",
      "roleName": "Principal Investigator",
      "url": "http://lod.bco-dmo.org/id/person-role/472036",
      "creator": {
        "@id": "https://www.bco-dmo.org/person/51317",
        "@type": "Person",
        "additionalType": "http://schema.geolink.org/1.0/base/main#Person",
        "name": "Dr Uta Passow",
        "givenName": "Uta",
        "familyName": "Passow",
        "url": "https://www.bco-dmo.org/person/51317"
      }
    },
    {
      "@id": "https://www.bco-dmo.org/person-role/472037",
      "@type": "Role",
      "additionalType": "http://schema.geolink.org/1.0/base/main#Participant",
      "roleName": "Contact",
      "url": "https://www.bco-dmo.org/person-role/472037",
      "creator": { "@id": "https://www.bco-dmo.org/person/51317" }
    },
    {
      "@id": "http://lod.bco-dmo.org/id/person-role/472038",
      "@type": "Role",
      "additionalType": "http://schema.geolink.org/1.0/base/main#Participant",
      "roleName": "Co-Principal Investigator",
      "url": "https://www.bco-dmo.org/person-role/472038",
      "creator": {
        "@id": "https://www.bco-dmo.org/person/50663",
        "@type": "Person",
        "additionalType": "http://schema.geolink.org/1.0/base/main#Person",
        "identifier": {
          "@type": "PropertyValue",
          "additionalType": ["http://schema.geolink.org/1.0/base/main#Identifier", "http://purl.org/spar/datacite/Identifier"],
          "propertyID": "http://purl.org/spar/datacite/orcid",
          "url": "https://orcid.org/0000-0003-3432-2297",
          "value": "0000-0003-3432-2297"
        },
        "name": "Dr Mark Brzezinski",
        "url": "https://www.bco-dmo.org/person/50663"
      }
    }
  ]
}

Notice that since Uta Passow has already been defined in the document with "@id": "https://www.bco-dmo.org/person/51317" for her role as Principal Investigator, the @id can be used for her role as Contact by defining the Role's creator as "creator": { "@id": "https://www.bco-dmo.org/person/51317" }.
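When the markup is generated programmatically, this reuse of @id can be automated. The sketch below is an assumption about one way to do it, with a hypothetical helper name: it describes each person in full the first time they appear and refers to them by @id afterwards.

```python
def creator_roles(assignments):
    """Build Role entries, describing each person fully only on first appearance.

    `assignments` is a list of (role @id, role name, person dict) tuples;
    every person dict must carry an "@id".
    """
    seen = set()
    roles = []
    for role_id, role_name, person in assignments:
        if person["@id"] in seen:
            # Already described earlier in the document: refer to it by @id only.
            linked_person = {"@id": person["@id"]}
        else:
            seen.add(person["@id"])
            linked_person = dict(person)
        roles.append({
            "@id": role_id,
            "@type": "Role",
            "roleName": role_name,
            "creator": linked_person,
        })
    return roles


passow = {
    "@id": "https://www.bco-dmo.org/person/51317",
    "@type": "Person",
    "name": "Dr Uta Passow",
}
roles = creator_roles([
    ("http://lod.bco-dmo.org/id/person-role/472036", "Principal Investigator", passow),
    ("https://www.bco-dmo.org/person-role/472037", "Contact", passow),
])
```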

Back to top

Describing a Dataset's Publisher/Provider

Publisher/Provider

If your repository is the publisher and/or provider of the dataset, then you don't have to describe your repository as a schema:Organization if your repository markup includes the @id. For example, if you published repository markup such as:

{
  "@context": {...},
  "@type": ["Service", "Organization"],
  ...
  "@id": "https://www.sample-data-repository.org",
  ...
}

then you can reuse that @id here. Harvesters such as Google and Project418 will make the appropriate linkages and your dataset publisher/provider can be published in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "provider": {
    "@id": "https://www.sample-data-repository.org"
  },
  "publisher": {
    "@id": "https://www.sample-data-repository.org"
  }
}

Otherwise, you can define the organization inline in this way:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "provider": {
    "@id": "https://www.sample-data-repository.org",
    "@type": "Organization",
    "additionalType": "http://schema.geolink.org/1.0/base/main#Organization",
    "legalName": "Sample Data Repository Office",
    "name": "SDRO",
    "sameAs": "http://www.re3data.org/repository/r3dxxxxxxxxx",
    "url": "https://www.sample-data-repository.org"
  },
  "publisher": {
    "@id": "https://www.sample-data-repository.org"
  }
}

Back to top

Describing a Dataset's Protocols

Datasets can have a number of policies and protocols attached to them - Terms of Use, access restrictions, certain licenses, etc. If you want to represent one or more of these protocols and there is a URL at which a user can read that protocol, we can use the schema:DigitalDocument to describe the protocol using the schema:publishingPrinciples field.

Protocols

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "publishingPrinciples": {
    "@id": "http://creativecommons.org/licenses/by/4.0/",
    "@type": "DigitalDocument",
    "additionalType": "https://geodex.org/voc/Protocol-License",
    "name": "Dataset Usage License",
    "url": "http://creativecommons.org/licenses/by/4.0/"
  }
}

P418 has created some class names for some common protocol document types. These will help make it clear to users what types of policies you have. If you would like us to add more, please let us know by creating an Issue.

Back to top

Describing a Dataset's funding award is one area where schema.org doesn't fit all that well. There is already a lot of discussion on this topic within schema.org governance. Schema.org's most recent communication with P418 recommended that the award be something generated from the schema:funder. Until schema.org addresses this, we feel the best class for describing an Award is schema:Offer. If you specify an Award, you should also use the gdx:fundedBy property to link the Dataset directly to the Award, in this way:

Describing a Dataset's Funding

Funding

{
  "@context": {
    "@vocab": "http://schema.org/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "vivo": "http://vivoweb.org/ontology/core#",
    "earthcollab": "https://library.ucar.edu/earthcollab/schema#",
    "geo-upper": "http://www.geoscienceontology.org/geo-upper#",
    "geolink-vocab": "http://schema.geolink.org/1.0/voc/local#",
    "gdx": "https://geodex.org/voc/"
  },
  "@type": "Dataset",
  "additionalType": ["http://schema.geolink.org/1.0/base/main#Dataset", "http://vivoweb.org/ontology/core#Dataset"],
  "name": "Removal of organic carbon by natural bacterioplankton communities as a function of pCO2 from laboratory experiments between 2012 and 2016",
  ...
  "funder": {
    "@type": "Organization",
    "additionalType": "http://schema.geolink.org/1.0/base/main#Organization",
    "legalName": "National Science Foundation",
    "name": "NSF",
    "url": "https://www.nsf.gov",
    "identifier": {
      "@type": "PropertyValue",
      "propertyID": "http://purl.org/spar/datacite/doi",
      "value": "10.13039/100000001",
      "url": "https://doi.org/10.13039/100000001"
    },
    "makesOffer": {
      "@type": "Offer",
      "@id": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1623751",
      "additionalType": "http://schema.geolink.org/1.0/base/main#Award",
      "name": "EarthCube Science Support Office (ESSO)",
      "description": "EarthCube is a community-driven effort with the goal of transforming the conduct of geoscience research and education by creating a well-integrated and facile environment to share scientific data, information tools and services, and knowledge in an open, transparent, and inclusive manner....[truncated]",
      "identifier": {
        "@type": "PropertyValue",
        "name": "NSF Award Number",
        "value": "1623751",
        "url": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1623751"
      },
      "validFrom": "2016-05-01",
      "validThrough": "2019-04-30",
      "offeredBy": {
        "@type": "Person",
        "additionalType": "http://schema.geolink.org/1.0/voc/local#roletype_program_manager",
        "name": "Eva E. Zanzerkia"
      }
    },
    "parentOrganization": {
      "@type": "Organization",
      "legalName": "Directorate for Geosciences",
      "identifier": {
        "@type": "PropertyValue",
        "propertyID": "http://purl.org/spar/datacite/doi",
        "value": "10.13039/100000085",
        "url": "https://doi.org/10.13039/100000085"
      },
      "parentOrganization": {
        "@type": "Organization",
        "legalName": "National Science Foundation",
        "url": "http://www.nsf.gov",
        "identifier": {
          "@type": "PropertyValue",
          "propertyID": "http://purl.org/spar/datacite/doi",
          "value": "10.13039/100000001",
          "url": "https://doi.org/10.13039/100000001"
        }
      }
    }
  },
  "gdx:fundedBy": { "@id": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1623751" }
}
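To illustrate how a consumer might follow the gdx:fundedBy link back to the award described under the funder, here is a minimal Python sketch. It assumes the markup is read as plain JSON with the compact key "gdx:fundedBy" exactly as written above; the helper names `as_list` and `find_awards` are hypothetical, and the embedded document is a trimmed copy of the example, not a complete record.

```python
import json

# Trimmed copy of the funding example above (assumption: only the fields
# needed to follow the gdx:fundedBy link are kept).
DATASET = json.loads("""
{
  "@context": {"@vocab": "http://schema.org/", "gdx": "https://geodex.org/voc/"},
  "@type": "Dataset",
  "funder": {
    "@type": "Organization",
    "name": "NSF",
    "makesOffer": {
      "@type": "Offer",
      "@id": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1623751",
      "name": "EarthCube Science Support Office (ESSO)"
    }
  },
  "gdx:fundedBy": {"@id": "https://www.nsf.gov/awardsearch/showAward?AWD_ID=1623751"}
}
""")


def as_list(value):
    """JSON-LD allows a single object or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def find_awards(dataset):
    """Return the Offer nodes under funder.makesOffer that gdx:fundedBy points to."""
    wanted = {
        ref.get("@id")
        for ref in as_list(dataset.get("gdx:fundedBy"))
        if isinstance(ref, dict)
    }
    wanted.discard(None)  # ignore references that carry no @id
    awards = []
    for funder in as_list(dataset.get("funder")):
        for offer in as_list(funder.get("makesOffer")):
            if offer.get("@id") in wanted:
                awards.append(offer)
    return awards


awards = find_awards(DATASET)
print([award["name"] for award in awards])
```

The link only resolves when the "@id" on the Offer and the "@id" in gdx:fundedBy are the same string, which is why the example above repeats the award URL in both places.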

Back to top

Examples

All examples can be found at: https://github.com/earthcubearchitecture-project418/p418Vocabulary/tree/master/html/voc/static/schema/examples/

Back to top

Issues

https://stackoverflow.com/questions/38243521/schema-org-contacttype-validation-issue-the-value-provided-for-office-must-be

Back to top

Advanced Publishing Techniques

How to publish resources for the categories/disciplines at repository services.

How to use external vocabularies

The SWEET ontology defines a number of science disciplines. A repository can reference those, or another vocabulary's resources, by adding the vocabulary to the @context attribute of the JSON-LD markup.

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "sweet-rel": "http://sweetontology.net/rela/",
    "sweet-kd": "http://sweetontology.net/humanKnowledgeDomain/"
  },
  "@type": ["Service", "Organization"],
  "additionalType": "https://geodex.org/voc/ResearchRepositoryService",
  "legalName": "Sample Data Repository Office",
  "name": "SDRO",
  "url": "https://www.sample-data-repository.org",
  "description": "The Sample Data Repository Service provides access to data from an imaginary domain accessible from this website.",
  "sweet-rel:hasRealm": [
    { "@id": "sweet-kd:Biogeochemistry" },
    { "@id": "sweet-kd:Oceanography" }
  ]
}
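The prefixed values in the example, such as sweet-kd:Oceanography, are shorthand for full URLs built from the @context. The following Python sketch shows that expansion for simple string-valued prefixes only; it is not the full JSON-LD expansion algorithm, and the function name `expand_iri` is a hypothetical helper.

```python
def expand_iri(value, context):
    """Expand a compact IRI such as 'sweet-kd:Oceanography' using @context prefixes.

    Simplified: handles only string-valued prefix definitions, and leaves
    anything it does not recognise (for example an absolute URL) unchanged.
    """
    prefix, sep, suffix = value.partition(":")
    base = context.get(prefix)
    if sep and isinstance(base, str) and not suffix.startswith("//"):
        return base + suffix
    return value


# The @context from the repository example above.
CONTEXT = {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "sweet-rel": "http://sweetontology.net/rela/",
    "sweet-kd": "http://sweetontology.net/humanKnowledgeDomain/",
}

realms = [{"@id": "sweet-kd:Biogeochemistry"}, {"@id": "sweet-kd:Oceanography"}]
expanded = [expand_iri(realm["@id"], CONTEXT) for realm in realms]
print(expanded)
```

A full JSON-LD processor would perform this expansion, along with many other cases, without a hand-written helper.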

Attaching Physical Samples to a Dataset

Currently, there isn't a great semantic property for a Dataset to distinguish its related physical samples. However, we can use the schema:hasPart property to accomplish this without too much compromise. A GitHub issue has been set up to follow this scenario. Here is the best way, so far, to link physical samples to a Dataset:

{
  "@context": {
    "@vocab": "http://schema.org/",
    "gdx": "https://geodex.org/voc/",
    "geolink": "http://schema.geolink.org/1.0/base/main#",
    "igsn": "http://pid.geoscience.gov.au/def/voc/igsn-codelists/"
  },
  "@type": "Dataset",
  ...,
  "hasPart": [
    { 
      "@type": "CreativeWork",
      "additionalType": "http://schema.geolink.org/1.0/base/main#PhysicalSample",
      "identifier": {
        "@type": "PropertyValue",
        "additionalType": ["http://schema.geolink.org/1.0/base/main#Identifier", "http://purl.org/spar/datacite/Identifier"],
        "propertyID": "IGSN",
        "url": "https://app.geosamples.org/sample/igsn/WHO000A53",
        "value": "WHO000A53"
      },
      "spatialCoverage": {
        "@type": "Place",
        "geo": {
          "@type": "GeoCoordinates",
          "latitude": -26.94486389,
          "longitude": 143.43508333,
          "elevation": 219.453
        }
      }
      ...
    },
    { 
      "@type": "CreativeWork",
      "additionalType": "http://schema.geolink.org/1.0/base/main#PhysicalSample",
      "identifier": {
        "@type": "PropertyValue",
        "additionalType": ["http://schema.geolink.org/1.0/base/main#Identifier", "http://purl.org/spar/datacite/Identifier"],
        "propertyID": "IGSN",
        "url": "https://app.geosamples.org/sample/igsn/WHO000A67",
        "value": "WHO000A67"
      }
      ...
    }
  ]
}

Here, we use the superclass of Dataset, schema:CreativeWork, to also define a physical sample. We disambiguate the CreativeWork as a physical sample by using the GeoLink definition in the schema:additionalType field. See schema:CreativeWork for the additional fields that can be added to the physical sample.

NOTE: We use "IGSN" as the schema:propertyID until a canonical URI is defined by IGSN governance.
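As a rough illustration of how a harvester might read this pattern, the Python sketch below collects IGSN values from the schema:hasPart entries that carry the GeoLink PhysicalSample type. It assumes the markup is handled as plain JSON in the compact form shown above; the helper names and the trimmed sample document, including the extra non-sample part, are hypothetical.

```python
PHYSICAL_SAMPLE = "http://schema.geolink.org/1.0/base/main#PhysicalSample"


def as_list(value):
    """JSON-LD allows a single value or an array; normalise to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def sample_igsns(dataset):
    """Collect IGSN values from hasPart entries typed as physical samples."""
    igsns = []
    for part in as_list(dataset.get("hasPart")):
        if PHYSICAL_SAMPLE not in as_list(part.get("additionalType")):
            continue  # some other kind of CreativeWork, not a sample
        for ident in as_list(part.get("identifier")):
            if isinstance(ident, dict) and ident.get("propertyID") == "IGSN":
                value = ident.get("value")
                if value:
                    igsns.append(value)
    return igsns


# Trimmed version of the example above, plus one non-sample part
# (an assumption added to show the additionalType filter).
DATASET = {
    "@type": "Dataset",
    "hasPart": [
        {
            "@type": "CreativeWork",
            "additionalType": PHYSICAL_SAMPLE,
            "identifier": {
                "@type": "PropertyValue",
                "propertyID": "IGSN",
                "value": "WHO000A53",
            },
        },
        {
            "@type": "CreativeWork",
            "additionalType": PHYSICAL_SAMPLE,
            "identifier": {
                "@type": "PropertyValue",
                "propertyID": "IGSN",
                "value": "WHO000A67",
            },
        },
        {"@type": "CreativeWork", "name": "Cruise report"},
    ],
}

print(sample_igsns(DATASET))
```

Matching on the literal string "IGSN" follows the NOTE above; if IGSN governance defines a canonical URI for schema:propertyID, the comparison would need to change accordingly.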

Back to top