
@ghislainfourny ghislainfourny released this May 31, 2019 · 54 commits to master since this release

This is the first beta release of Rumble, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • Bugfixes.
  • The jar now displays CLI examples when invoked with no parameters, including when it is launched with java.
  • distinct-values() is pushed down to Spark
  • Fixes NullPointerException in some cases when exceptions are raised in closures
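
As a minimal sketch of the kind of query the distinct-values() pushdown concerns (the HDFS path and the "country" field are hypothetical placeholders, not taken from this release):

```jsoniq
(: Hypothetical input file; "country" is an assumed field name. :)
distinct-values(
  json-file("hdfs:///path/to/dataset.json").country
)
```

Because the argument originates from json-file(), the duplicate elimination can presumably be delegated to Spark instead of being computed on the client.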

The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script, either as an interactive shell or to execute a single query from a JSONiq file (local or HDFS) and write the result either to stdout or to disk (local or HDFS). This works both locally and on a deployed cluster.

The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).

Documentation: http://rumble.readthedocs.io/en/latest/

May 31, 2019
Merge pull request #267 from RumbleDB/DocName
Update name in doc header.
Pre-release

@ghislainfourny ghislainfourny released this May 20, 2019 · 89 commits to master since this release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • Bugfixes.
  • It is now possible to read a query from the local filesystem (--query-path) and to output the results on stdout rather than to the local filesystem.
  • Fix an error caused by a missing JSONObject keySet() method, due to a backward incompatibility of org.json in some environments.

The jar file was built with ANTLR 4.7 and is compatible with all tested distributions of Spark 2.3+. It is meant to be used with the spark-submit script, either as an interactive shell or to execute a single query from a JSONiq file (local or HDFS) and write the result either to stdout or to disk (local or HDFS). This works both locally and on a deployed cluster.

The jar file for older versions of Spark (2.0+) with ANTLR 4.5.3 is available on request (if you receive a warning on the command line).

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Pre-release

@ghislainfourny ghislainfourny released this Apr 23, 2019 · 116 commits to master since this release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • New functions text-file#1, text-file#2, tokenize#1, tokenize#2 to open text files as input. Now billions of lines can be manipulated as sequences of strings with FLWORs, in the same way billions of objects could until now.
  • Fixing serialization bugs (escaping)
  • Fixing bug in string literal escaping in the shell
  • Fix bug with local count clause execution
  • Fix bug in the shell leading to a crash when a parallelized FLWOR execution was outputting the empty sequence
  • Fix bug leading to a crash when the where clause expression was not returning a boolean in local execution. Now the effective boolean value is taken.
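
A minimal sketch of how the new text functions might be combined in a FLWOR expression. The path and the comma separator are hypothetical, and the query has not been checked against this specific release:

```jsoniq
(: Hypothetical path; each line of the file becomes one string item. :)
for $line in text-file("hdfs:///path/to/input.txt")
(: tokenize#2 splits on the given pattern; keep lines with more than 3 fields. :)
where count(tokenize($line, ",")) gt 3
return $line
```

The where clause here already returns a boolean; per the last fix above, a non-boolean expression would instead be converted to its effective boolean value.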

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Pre-release

@ghislainfourny ghislainfourny released this Mar 4, 2019 · 198 commits to master since this release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New:

  • Many bugfixes
  • All FLWOR clauses are now supported locally (that is, when parallelize() or json-file() is not used). Locally means: without invoking Spark transformations. Local FLWOR expressions can execute on the client, but also within a transformation triggered by a non-local FLWOR.
  • Local FLWOR expressions can fully nest. All queries of the tutorial now work and you can use and abuse let clauses.
  • Pushdowns: json-file("file.json").foo[].bar[[2]].foobar works on Spark
  • Significant improvements in memory footprint: some queries are no longer materialized in memory (e.g., filtering query with a where clause or count).
  • Significant improvements in performance: a file of 16,000,000 objects was successfully tested for count, filtering, grouping and ordering with a local Spark execution on a single laptop. Performance also improved on bigger datasets on clusters.
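
To illustrate the distinction drawn above, here is a sketch of a local FLWOR expression next to a non-local one. The file path and the field name "foo" are placeholders, and the queries are illustrative rather than taken from the tutorial:

```jsoniq
(: Local: the input is a literal sequence, so neither parallelize() nor json-file() is involved. :)
for $x in (1, 2, 3)
let $y := $x * 2
where $y gt 2
return { "x": $x, "y": $y }
```

```jsoniq
(: Non-local: the input comes from json-file(), so clauses can run as Spark transformations. :)
for $o in json-file("hdfs:///path/to/file.json")
where $o.foo eq "bar"
return $o
```

In the second query, any local FLWOR nested inside the return or where expression would execute within the Spark transformation rather than on the client.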

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Pre-release

@ghislainfourny ghislainfourny released this Nov 22, 2018 · 472 commits to master since this release

New alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New (along with various bugfixes):

  • count clauses are supported and pushed down to Spark
  • simple keys must no longer be quoted when constructing objects (in particular: null pointer exception is fixed)
  • error message when a function name+arity is not found is more helpful
  • it is no longer necessary to supply the --master option twice on the CLI: supplying it once, to spark-submit, is enough.

The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Pre-release

@ghislainfourny ghislainfourny released this Nov 15, 2018 · 498 commits to master since this release

Third alpha release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New (along with various bugfixes):

  • Ctrl+D now exits nicely from the shell
  • count() calls are pushed down to Spark if the nested expression uses underlying RDDs.
  • various exceptions are now caught and displayed with nice error messages.
  • Strings can be concatenated with atomic types (they get serialized to a string)
  • Lookup can be done on a sequence of objects

The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Pre-release

@ghislainfourny ghislainfourny released this Oct 11, 2018 · 531 commits to master since this release

Second release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

New: various bugfixes (e.g., empty sequence handling), richer function library, general comparison operators.

The jar files no longer contain the Spark libraries, as they are provided by the local environment or the cluster.

The jar file with ANTLR 4.7 is to be used with Spark 2.3+. Older versions (2.0+) use ANTLR 4.5.3.

Documentation: http://sparksoniq.readthedocs.io/en/latest/

Pre-release

@wscsprint3r wscsprint3r released this Jan 18, 2018 · 768 commits to master since this release

First release for Sparksoniq, a JSONiq engine to query large-scale JSON datasets stored on HDFS. Spark under the hood.

Documentation: http://sparksoniq.readthedocs.io/en/latest/
