Features

  • Spark
    • Further support for RDF quality assessment
    • Refactor and improve stats sub-module
    • Refactor semantic partitioning
    • Add support for generating R2RML mappings and the associated SQL commands
  • Flink
    • Support for RDF quality assessment
    • Refactor and improve stats sub-module
    • Add support for TripleOps on DataSet
    • Add support for GraphOps on Gelly
    • Introduce implicit calls for partitioning strategies
    • Introduce implicit calls for I/O operations and align the API with sansa-rdf-module

Bug Fixes

  • #33 Kryo exceptions
  • #60 Issue with avg. untyped String literal length measure
  • #62 Issue with Distinct entities measure
  • #63 Improvement of the return type of the Link measure
  • #64 Issue with Class Hierarchy Depth measure
  • #65 Issues with Max/Avg Per Property measure
  • #68 Signature of net.sansa_stack.rdf.spark.partition.semantic.RdfPartition not clear

Dependency changes

  • Apache Spark: 2.3.1 -> 2.4.0
  • Apache Flink: 1.5.0 -> 1.7.0
  • Apache Jena: 3.7.0 -> 3.9.0