Release 0.1.14

@vkhristenko vkhristenko released this 20 Oct 18:10
· 27 commits to master since this release

Updates for experimental package:

  • Polishing of I/O
  • Added Optimization Passes over the constructed Intermediate Schema:
    • Remove Empty Rows Passes
    • Remove nulls. Comes in 2 versions: Soft and Hard. Hard removes all branches that are not splittable and contain null as one of their fields. Soft just removes nulls without checking for this "branch safety".
    • Schema Pruning - Prunes as deep as Spark allows. Takes effect together with Apache Spark PR: apache/spark#16578
    • All optimizations except SoftRemove are enabled by default. Each can be turned on or off with spark.sqlContext.read.option("OptimizationName", value), where value is true/false or "on"/"off".
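
A minimal sketch of how such a toggle might look in a Spark job. Several things here are assumptions rather than details from these notes: the data source name passed to format, the file path, and the application name are hypothetical, and "OptimizationName" is kept as the literal placeholder used above because the actual option keys are not listed here. It also assumes Spark and the spark-root package are on the classpath.

```scala
import org.apache.spark.sql.SparkSession

object OptimizationToggle {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark-root-optimization-toggle") // hypothetical app name
      .master("local[*]")
      .getOrCreate()

    val df = spark.sqlContext.read
      // Assumed data source name for the experimental package.
      .format("org.dianahep.sparkroot.experimental")
      // Placeholder key: substitute the real name of the pass to switch off.
      // Per the notes, a Boolean or an "on"/"off" string should both work.
      .option("OptimizationName", false)
      // Hypothetical input path.
      .load("path/to/file.root")

    // Inspect the schema that results from the passes that stayed enabled.
    df.printSchema()

    spark.stop()
  }
}
```

Comparing the printed schema with and without a given option is one way to see what a particular pass removes.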

Updates for org.dianahep.sparkroot package:

  • Default parallelism is the number of files.