Sparksoniq 0.9.5 "Larch"

Pre-release

@ghislainfourny released this 04 Mar 10:08
1e45945

New alpha release of Sparksoniq, a JSONiq engine for querying large-scale JSON datasets stored on HDFS. It runs on Spark under the hood.

New:

  • Many bugfixes
  • All FLWOR clauses are now supported locally, that is, when parallelize() or json-file() is not used. Locally means without invoking Spark transformations. Local FLWOR expressions can execute on the client, but also within a transformation triggered by a non-local FLWOR.
  • Local FLWOR expressions can fully nest. All queries of the tutorial now work and you can use and abuse let clauses.
  • Pushdowns: a navigation such as json-file("file.json").foo[].bar[[2]].foobar now works on Spark
  • Significant improvements in memory footprint: some queries are no longer materialized in memory (e.g., a filtering query with a where clause, or a count).
  • Significant improvements in performance: a file of 16,000,000 objects was successfully tested for count, filtering, grouping and ordering with a local Spark execution on a single laptop. Performance also improved on bigger datasets on clusters.
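
To make the pushdown item above concrete, here is a minimal Python sketch of what the navigation .foo[].bar[[2]].foobar means under standard JSONiq semantics (object lookup, array unboxing, 1-based array lookup). It is an illustrative model only, not Sparksoniq's implementation: the helper names lookup, unbox, member and navigate, and the sample data, are assumptions made for this example.

```python
# Illustrative model only: JSONiq sequences are represented as Python lists,
# objects as dicts and arrays as lists. Helper names are hypothetical.

def lookup(seq, key):
    """Object lookup (.key): keep the value of `key` for each object that has it."""
    return [item[key] for item in seq if isinstance(item, dict) and key in item]

def unbox(seq):
    """Array unboxing ([]): replace each array by its members, drop non-arrays."""
    return [m for item in seq if isinstance(item, list) for m in item]

def member(seq, position):
    """Array lookup ([[n]]): n-th member (1-based) of each array that is long enough."""
    return [item[position - 1] for item in seq
            if isinstance(item, list) and 1 <= position <= len(item)]

def navigate(objects):
    """Model of .foo[].bar[[2]].foobar applied to a sequence of objects."""
    step = lookup(objects, "foo")   # .foo
    step = unbox(step)              # []
    step = lookup(step, "bar")      # .bar
    step = member(step, 2)          # [[2]]
    return lookup(step, "foobar")   # .foobar

# Made-up sample standing in for the objects read from a JSON file.
sample = [
    {"foo": [{"bar": [{"foobar": "a"}, {"foobar": "b"}]}]},
    {"foo": [{"bar": [{"foobar": "c"}]}, {"bar": "not an array"}]},
    {"other": 1},
]

print(navigate(sample))  # ['b']
```

Each step maps a sequence to a sequence and silently skips items of the wrong type, which is why such a chain can in principle be applied object by object inside a Spark transformation rather than after collecting the data.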

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. For older Spark versions (2.0+), use the jar file with ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/