Peter Selby edited this page Jun 6, 2024 · 14 revisions

Data federation technology overview

The AgBioData Data Federation Training Working Group was tasked with developing training material for the AgBioData Community on different solutions for data sharing and data federation. Below are the compiled results of that 12-month effort. This is not an exhaustive list, but the technologies described here represent the most widely used data sharing techniques within the agricultural, breeding, and scientific community.

How to use this resource

Each page listed below contains a short description of the technology, a recorded presentation by an expert, a subjective cost estimate, pros, cons, example use cases, and an assessment of how the technology promotes FAIR data.

This web page is intended as a beginner's guide to data sharing technologies. Browse it to see what is available, or use it as a comparison tool when deciding which technology best fits a new use case.

Technologies to Explore

| Project | Website | Pros | Cons |
| --- | --- | --- | --- |
| Faidare | Public Faidare | Increases data findability and accessibility; can connect to existing systems | Supports data discovery only, no other data management features |
| iRODS | irods.org | Can manage large datasets; robust access management and collaboration | Not suitable for small or highly structured datasets; setup and maintenance may require dedicated staff |
| RDF, YARRRML, & Shallot | RDF 1.1 Primer | Privacy-preserving data sharing; very flexible metadata | Semantic models can be difficult to set up; strict data structure requirements shared between stakeholders |
| BrAPI | brapi.org | Represents a domain-specific standard; good as an add-on to existing systems | Requires custom development work; mapping existing data models to the standard can be time consuming |
| GraphQL | graphql.org | Robust querying language for precise results; easy integration from multiple sources | Specific data model must be developed first; compute time/space complexity for large or complex datasets |
| Globus | globus.org | Strong emphasis on data sharing and data transfer; well suited for large datasets and large files | Built for file sharing, not database access; not useful for data discovery |
| SOLID | solidproject.org | Open source standards built on existing tech; very strong security, ownership, and access controls | Still a developing technology; no specific data standards or tools for biological or PGR data |
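To make the difference between two of the API-style entries more concrete, here is a minimal sketch of what a client request might look like for BrAPI (a fixed, domain-specific REST standard) and for GraphQL (a query language where the client names the fields it wants). It only builds the request and sends nothing. The server address `example.org`, the helper function names, and the whole GraphQL schema (a `germplasm` query with `germplasmName` and `species` fields) are assumptions made for illustration; only the BrAPI v2 `/germplasm` path and its `germplasmName`, `page`, and `pageSize` parameters come from the BrAPI specification.

```python
import json
from urllib.parse import urlencode

# Hypothetical server address; substitute a real BrAPI-compliant endpoint.
BASE_URL = "https://example.org/brapi/v2"


def brapi_germplasm_url(name, page=0, page_size=10):
    """Build a BrAPI v2 germplasm search URL (no request is sent)."""
    params = {"germplasmName": name, "page": page, "pageSize": page_size}
    return f"{BASE_URL}/germplasm?{urlencode(params)}"


def graphql_payload(name):
    """Build a GraphQL POST body that asks for only two fields.

    The `germplasm` query and its fields are an assumed schema,
    not part of any published standard.
    """
    query = """
    query ($name: String!) {
      germplasm(name: $name) {
        germplasmName
        species
      }
    }
    """
    return json.dumps({"query": query, "variables": {"name": name}})


print(brapi_germplasm_url("IR64"))
print(graphql_payload("IR64"))
```

With BrAPI the shape of the response is set by the standard, so any compliant server can be queried the same way. With GraphQL the response contains only the requested fields, but each server has to define and publish its own schema first, which is the trade-off noted in the table.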

Feedback

Questions, comments, concerns, and general feedback about this documentation are welcome and encouraged. You can post comments publicly in the Issues Board or contact the AgBioData group directly with the Contact Form.