Sparksoniq 0.9.5 "Larch"
Pre-release
New alpha release of Sparksoniq, a JSONiq engine for querying large-scale JSON datasets stored on HDFS, with Spark under the hood.
New:
- Many bugfixes
- All FLWOR clauses are now supported locally, that is, when parallelize() or json-file() is not used. "Locally" means without invoking Spark transformations. Local FLWOR expressions can execute on the client, but also within a transformation triggered by a non-local FLWOR expression.
- Local FLWOR expressions can fully nest. All queries in the tutorial now work, and you can use and abuse let clauses.
- Pushdowns: navigation such as json-file("file.json").foo[].bar[[2]].foobar now works on Spark.
- Significant improvements in memory footprint: some queries are no longer materialized in memory (e.g., a filtering query with a where clause, or a count).
- Significant improvements in performance: a file of 16,000,000 objects was successfully tested for count, filtering, grouping and ordering with a local Spark execution on a single laptop. Performance also improved on bigger datasets on clusters.
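
For readers unfamiliar with the navigation syntax in the pushdown item above, here is a minimal Python sketch of what a path like .foo[].bar[[2]].foobar selects, step by step. It only emulates the JSONiq semantics as we understand them (object lookup, array unboxing, 1-based array lookup, with non-matching items silently dropped). It is not Sparksoniq code and says nothing about how the engine pushes these steps down to Spark. The helper names and the sample objects are hypothetical.

```python
# Illustrative emulation of JSONiq navigation on a sequence of items.
# Assumption: items that do not match a step are dropped, not errors.

def object_lookup(items, key):
    """.key : keep the value under `key` for every object that has it."""
    return [item[key] for item in items
            if isinstance(item, dict) and key in item]

def array_unbox(items):
    """[] : replace every array by its members; non-arrays are dropped."""
    return [member for item in items
            if isinstance(item, list) for member in item]

def array_lookup(items, position):
    """[[n]] : keep the n-th member (1-based) of every array long enough."""
    return [item[position - 1] for item in items
            if isinstance(item, list) and 1 <= position <= len(item)]

# Hypothetical parsed content of file.json (one object per line).
objects = [
    {"foo": [{"bar": [{"foobar": "a"}, {"foobar": "b"}]},
             {"bar": [{"foobar": "c"}]}]},
    {"foo": [{"bar": [{"foobar": "d"}, {"foobar": "e"}, {"foobar": "f"}]}]},
    {"other": 1},
]

# Emulates: json-file("file.json").foo[].bar[[2]].foobar
step = object_lookup(objects, "foo")   # .foo
step = array_unbox(step)               # []
step = object_lookup(step, "bar")      # .bar
step = array_lookup(step, 2)           # [[2]]
result = object_lookup(step, "foobar") # .foobar
print(result)
```

Each step maps a sequence of items to another sequence, one input item at a time. No step needs to see the whole dataset at once.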
Use the jar file with ANTLR 4.7 with Spark 2.3+. For older Spark versions (2.0+), use the jar file with ANTLR 4.5.3.
Documentation: http://sparksoniq.readthedocs.io/en/latest/