Improved API Coherence

Claus Stadler edited this page May 28, 2020 · 2 revisions

First-class RDDs

Triple-based

  • RDD[Triple]
  • RDD[Graph]
  • RDD[Resource]
  • RDD[Model]

Quad-based

  • RDD[Quad]
  • RDD[DatasetGraph]
  • RDD[ResourceInDataset] (This is a custom type that is not in Jena)
  • RDD[Dataset]

Triple-based operations

val rddOfTriples: RDD[Triple] = ...

// One graph per subject
val graphsBySubject: RDD[Graph] = rddOfTriples.toGraphsFromSubjects()
// One graph per result of a CONSTRUCT query
val graphsByQuery: RDD[Graph] = rddOfTriples.toGraphsFromQuery("CONSTRUCT {... }")
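
One way toGraphsFromSubjects() could be offered on RDD[Triple] is through an implicit extension class. The following is a minimal sketch, not the project's implementation: the names TripleRddOps, TripleRddExtensions and graphOf are hypothetical, and it assumes Spark and Jena are on the classpath and that Jena's Triple and Node types can be serialized by the configured Spark serializer.

```scala
import org.apache.jena.graph.{Graph, Triple}
import org.apache.jena.sparql.graph.GraphFactory
import org.apache.spark.rdd.RDD

// Hypothetical holder object; all names here are assumptions for illustration.
object TripleRddOps {
  /** Collects a group of triples into a fresh in-memory Jena graph. */
  def graphOf(triples: Iterable[Triple]): Graph = {
    val graph = GraphFactory.createDefaultGraph()
    triples.foreach(t => graph.add(t))
    graph
  }

  implicit class TripleRddExtensions(val rdd: RDD[Triple]) extends AnyVal {
    /** One graph per distinct subject, holding all triples with that subject. */
    def toGraphsFromSubjects(): RDD[Graph] =
      rdd.groupBy(_.getSubject).map { case (_, triples) => graphOf(triples) }
  }
}
```

With `import TripleRddOps._` in scope, `rddOfTriples.toGraphsFromSubjects()` resolves to the method above. Grouping by subject triggers a shuffle, so for large inputs an aggregation that builds graphs incrementally may be preferable.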

Graph-based operations

val rddOfGraphs: RDD[Graph] = ...

val rddOfResources: RDD[Resource] = rddOfGraphs.toResourcesByType(FOAF.Person)
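
The type-based lookup could be sketched as follows. This is an assumption about how toResourcesByType might work, not the actual implementation: GraphRddOps, GraphRddExtensions and resourcesByType are hypothetical names. Only the type's URI is captured in the Spark closure, on the assumption that Jena Resource objects are not safely serializable; as a consequence this sketch supports only URI-named types.

```scala
import org.apache.jena.graph.Graph
import org.apache.jena.rdf.model.{ModelFactory, Resource, ResourceFactory}
import org.apache.jena.vocabulary.RDF
import org.apache.spark.rdd.RDD
import scala.collection.mutable.ListBuffer

// Hypothetical holder object; all names here are assumptions for illustration.
object GraphRddOps {
  /** Lists the resources in `graph` that have `rdf:type rdfType`. */
  def resourcesByType(graph: Graph, rdfType: Resource): List[Resource] = {
    val model = ModelFactory.createModelForGraph(graph)
    val it = model.listResourcesWithProperty(RDF.`type`, rdfType)
    val out = ListBuffer.empty[Resource]
    try {
      while (it.hasNext) out += it.next()
    } finally {
      it.close()
    }
    out.toList
  }

  implicit class GraphRddExtensions(val rdd: RDD[Graph]) extends AnyVal {
    /** Every resource of the given type, across all graphs in the RDD. */
    def toResourcesByType(rdfType: Resource): RDD[Resource] = {
      val typeUri = rdfType.getURI // capture a plain String, not the Resource
      rdd.flatMap(g => resourcesByType(g, ResourceFactory.createResource(typeUri)))
    }
  }
}
```

Each returned Resource stays attached to a model view of its source graph, so its properties can still be read in later map steps on the same partition.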

Examples

From triples to resources

rddOfTriples
  .filterPropertiesDrop(RDFS.label)
  .toGraphsFromSubjects()
  .toResourcesByType(FOAF.Person)
val rddOfResources: RDD[Resource] = ...
val rddOfTuples = rddOfResources.map(r => (r.getURI, r.getPropertyValue(RDF.`type`)))
rddOfTuples.toDF() // Maybe we need some more parameters to link tuples to dataframes?

rddOfResources.asDataset()
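
Regarding the open question on toDF(): one possible answer is to flatten each resource to plain strings and pass explicit column names. The sketch below assumes that approach; ResourceRddToDF, toRow and the column names "subject" and "type" are hypothetical. It uses Jena's getPropertyResourceValue, which returns null when the property is absent, and it reads only the first rdf:type value it finds.

```scala
import org.apache.jena.rdf.model.Resource
import org.apache.jena.vocabulary.RDF
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object; names and column labels are assumptions.
object ResourceRddToDF {
  /** Flattens a resource to (subject URI, rdf:type URI); either may be null. */
  def toRow(r: Resource): (String, String) = {
    val tpe = Option(r.getPropertyResourceValue(RDF.`type`)).map(_.getURI).orNull
    (r.getURI, tpe)
  }

  /** Maps resources to string tuples and names the resulting columns. */
  def toDataFrame(spark: SparkSession, rdd: RDD[Resource]): DataFrame = {
    import spark.implicits._ // brings the tuple encoder and toDF into scope
    rdd.map(toRow).toDF("subject", "type")
  }
}
```

Converting to strings before calling toDF avoids needing a Spark encoder for Jena types, at the cost of losing the distinction between URIs and blank nodes.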