@gleanerio

GleanerIO

A set of projects implementing principles around indexing structured data on the web / schema.org (Developed as part of NSF's EarthCube)

About

Gleaner is a tool for extracting JSON-LD from web pages. You provide Gleaner with a list of sites to index, and it retrieves pages based on the sitemap.xml of each domain. Gleaner can then check that the documents are well-formed and valid, and process the JSON-LD data graphs into a form that can drive a search interface.
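The harvesting flow described above (read page URLs from a sitemap, then pull the JSON-LD out of each page) can be sketched with the Python standard library. This is only an illustration under stated assumptions, not Gleaner's implementation (Gleaner itself is written in Go): the function names `sitemap_urls` and `extract_jsonld` and the class `JSONLDExtractor` are hypothetical, the sitemap is assumed to be a plain `urlset` (not a sitemap index), and fetching pages over the network is left out.

```python
import json
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml):
    """Return the <loc> URLs listed in a sitemap.xml document (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


class JSONLDExtractor(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def extract_jsonld(html):
    """Return the parsed JSON-LD documents embedded in an HTML page.

    Blocks that are not well-formed JSON are skipped; a real harvester
    would more likely log or report them.
    """
    parser = JSONLDExtractor()
    parser.feed(html)
    docs = []
    for raw in parser.blocks:
        try:
            docs.append(json.loads(raw))
        except json.JSONDecodeError:
            continue
    return docs
```

In this sketch, each URL returned by `sitemap_urls` would be fetched and its HTML passed to `extract_jsonld`. Checking the resulting graphs for validity (for example with SHACL) and loading them into a triplestore are separate steps that are not shown here.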

Pinned

  1. gleaner Public

    Gleaner: harvesting JSON-LD and structured data on the web

    Go 17 9

  2. nabu Public

    Nabu: Synchronize data graph objects with a triplestore

    Go 1 1

  3. scheduler Public

    Scheduling approaches related to gleaner tooling

    Python 1 3

  4. archetype Public

    A testbench repo with the three primary personas: user, provider, and indexer.

    CSS 4 1

  5. notebooks Public

    Jupyter notebooks for SHACL processing, JSON-LD framing, and object operations

    Jupyter Notebook 1

  6. scienceonschemaexamples Public

    This repository will contain actual science on schema JSON-LD files, and HTML pages that contain JSON-LD scripts, documented as possible test cases

    HTML

Repositories

Showing 10 of 11 repositories
  • archetype Public

    A testbench repo with the three primary personas: user, provider, and indexer.

    CSS 4 1 5 0 Updated Apr 16, 2025
  • gleaner Public

    Gleaner: harvesting JSON-LD and structured data on the web

    Go 17 Apache-2.0 9 69 (5 issues need help) 3 Updated Nov 15, 2024
  • nabu Public

    Nabu: Synchronize data graph objects with a triplestore

    Go 1 1 13 (1 issue needs help) 2 Updated Nov 15, 2024
  • scheduler Public

    Scheduling approaches related to gleaner tooling

    Python 1 Apache-2.0 3 8 0 Updated Nov 12, 2024
  • signposting Public

    A test of approaches to parse signposting conventions

    Go 0 0 0 0 Updated Jul 18, 2023
  • notebooks Public

    Jupyter notebooks for SHACL processing, JSON-LD framing, and object operations

    Jupyter Notebook 0 1 0 0 Updated May 24, 2023
  • scienceonschemaexamples Public

    This repository will contain actual science on schema JSON-LD files, and HTML pages that contain JSON-LD scripts, documented as possible test cases

    HTML 0 0 0 0 Updated Mar 7, 2023
  • gleanerdocs.github.io Public

    Documentation repo

    0 2 0 2 Updated Feb 3, 2023
  • .github Public

    GleanerIO Profile

    0 0 0 0 Updated Jan 17, 2023
  • tangram Public

    Simple RESTful wrapper around pySHACL

    HTML 2 2 0 1 Updated Jul 18, 2022
