SnappyData 0.7 Release

@ashetkar ashetkar released this Dec 21, 2016 · 65 commits to master since this release

SnappyData 0.7 Release with the following major changes.

  • In sync and fully compatible with Apache Spark 2.0.2.
  • Try SnappyData without any download by adding it as a Spark dependency.
  • 20X faster than Spark in-memory caching. Try the simple perf example on your laptop. Some of the individual optimizations are listed below.
  • Performance optimizations:
    • New GROUP BY and HASH JOIN operators used with SnappyData storage tables that are 5-10X faster than the ones in Spark. (SNAP-1067)
    • Support for plan caching to reuse SparkPlan, RDD and PlanInfo (SNAP-1191)
    • Optimizations for single dictionary column with SnappyData's GROUP BY and JOIN operator that improve the performance further by 2-3X. (SNAP-1194)
    • Pooled version of Kryo serializer including for closures. Spark updated to allow for pluggable closure serializer. (SNAP-1136)
    • Column batch level statistics to allow query predicates to skip entire batches when possible. (SNAP-1087)
    • Reduce serialization overheads of biggest contributors in queries. (SNAP-1202)
    • Plan optimizations to minimize data shuffle and combine aggregates when possible. (SNAP-1260)
  • New SnappyData Dashboard as an extension to Spark UI. Explore your SnappyData cluster and Spark artifacts in the same UI.
  • HowTos: Working code snippets of various features for developers to get started. Check out the docs for more details.
  • Amazon Web Services AMI and Docker image with SnappyData 0.7 now available. Refer to docs for more details.
  • Support for map, flatMap, filter, glom, mapPartition and transform APIs to SchemaDStream (SNAP-1182)
  • Use ConfigEntry mechanism for SnappyData properties (SNAP-1180)
  • INSTALL JAR utility to load application jars that are available to all the jobs submitted to SnappyData. This is in addition to the existing way of providing application jars using --jars in spark-submit.
  • EC2 scripts are now moved to a new repository with enhancements and fixes.
  • Several other bug-fixes and optimizations. See release notes for more details.
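
The column batch level statistics item above (SNAP-1087) can be illustrated with a minimal sketch. This is not SnappyData's implementation: the `ColumnBatch` layout, the helper names and the choice of min/max as the recorded statistics are all assumptions made for illustration. The idea shown is that a scan can skip a whole batch when the batch statistics prove that no row in it can satisfy the predicate.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ColumnBatch:
    """A chunk of column values plus min/max statistics (hypothetical layout)."""
    values: List[int]
    min_value: int
    max_value: int


def make_batches(values: List[int], batch_size: int) -> List[ColumnBatch]:
    """Split a column into fixed-size batches and record per-batch min/max."""
    batches = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        batches.append(ColumnBatch(chunk, min(chunk), max(chunk)))
    return batches


def scan_greater_than(batches: List[ColumnBatch], threshold: int) -> Tuple[List[int], int]:
    """Evaluate `value > threshold`; return (matching values, batches skipped)."""
    matches, skipped = [], 0
    for batch in batches:
        # If even the largest value fails the predicate, no row in the batch can match.
        if batch.max_value <= threshold:
            skipped += 1
            continue
        matches.extend(v for v in batch.values if v > threshold)
    return matches, skipped


batches = make_batches(list(range(100)), batch_size=10)
matches, skipped = scan_greater_than(batches, threshold=74)
print(f"skipped {skipped} of {len(batches)} batches, {len(matches)} rows matched")
```

Skipping pays off most when the data is clustered on the filtered column, as in this sorted example; on randomly ordered data most batches span the whole value range and few can be skipped.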

SnappyData Synopses Data Engine:

  • Row count for sample tables is now displayed in SnappyData Dashboard.
  • Enabling HA semantics and redundancy for sample tables.
  • Other bug-fixes and performance improvements.

Download artifacts description

  • snappydata-0.7-bin.tar.gz ---> Full product binary (includes Hadoop 2.7)
  • snappydata-0.7-without-hadoop-bin.tar.gz ---> Product without the Hadoop dependency JARs
  • snappydata-client-1.5.3.jar ---> Client (JDBC) JAR
  • snappydata-core_2.11-0.7.jar ---> Only dependency to connect to SnappyStore from Apache Spark cluster (Smart Connector mode)
  • snappydata-ec2-0.7.tar.gz ---> Script to Launch EC2 instances on AWS


SnappyData 0.6.1 Release

@ashetkar ashetkar released this Oct 20, 2016 · 275 commits to master since this release

SnappyData 0.6.1 (Row Store 1.5.2) Release with the following major changes over the previous release.

  • Fixed an issue where a failure in IMPORT caused the system to close region and network interfaces; threads are no longer interrupted on such failures. (SNAP-1138)
  • Added a service to publish store table size that is used for query plan generation.
    These stats are also published on Snappy store UI tab. (SNAP-1075)
  • Fixes for Streaming related issues after Spark 2.0 merge. (SNAP-1060, SNAP-1141, SNAP-1115)
  • Other bug-fixes. (SNAP-1083, SNAP-1113)



SnappyData 0.6 Release

@ashetkar ashetkar released this Sep 20, 2016 · 277 commits to master since this release

SnappyData 0.6 Release with the following major changes over the previous release.

  • Spark 2.0 based - we merged with Apache Spark 2.0 and we remain fully compatible with Spark
  • 20X gains in performance - While the "full stage code generation" (vectorization) improvements in Spark give good gains, we extended code generation to several critical areas and into access of the SnappyStore, making Snappy attain 20X better performance than Spark cached DataFrames for scan/aggregation queries.
  • Cloud service - this is our first release that bundles support for launching SnappyData on AWS. As part of this service we also added deep integration with Apache Zeppelin, so you can visualize results using the Snappy cluster both as a Spark cluster and as a database cluster for analytics. (SNAP-864, SNAP-978)
    • Download and extract snappydata-ec2-0.6.tar.gz to start using it. Refer docs/
  • Support for describe table and show table using SnappyContext. (SNAP-1044)
  • Support for multiple Hadoop versions. (SNAP-981)
  • Single install/replace jar utility across SnappyData cluster. (SNAP-293)
  • Support for CUBE/ROLLUP/GROUPING SETS through sql. (SNAP-824)
  • Support for window clauses and partition/distribute by clauses.
  • SnappyData interpreter for Apache Zeppelin. (SNAP-861)
  • Support for EXISTS from sql. (SNAP-734)
  • Fix column table row count in Spark UI. (SNAP-1047)
  • Support for VARCHAR with size; STRING is processed as VARCHAR(32762) by default. (SNAP-735)
  • Moved spark-jobserver to 0.6.2.
  • Several other bug-fixes and performance improvements.
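
For readers unfamiliar with the CUBE/ROLLUP/GROUPING SETS support listed above (SNAP-824), the following sketch shows the standard SQL meaning of the three clauses in plain Python. It does not use SnappyData or Spark; the function names and the sample rows are hypothetical. ROLLUP expands to every prefix of the column list, CUBE expands to every subset, and each resulting grouping set is aggregated separately, with columns outside the set reported as `None` (SQL NULL).

```python
from collections import defaultdict
from itertools import combinations


def rollup_sets(columns):
    """ROLLUP(a, b) -> (a, b), (a), (): each prefix of the column list."""
    return [tuple(columns[:n]) for n in range(len(columns), -1, -1)]


def cube_sets(columns):
    """CUBE(a, b) -> (a, b), (a), (b), (): every subset of the column list."""
    return [subset
            for n in range(len(columns), -1, -1)
            for subset in combinations(columns, n)]


def aggregate(rows, columns, grouping_sets, measure):
    """SUM(measure) per grouping set; columns outside the set appear as None."""
    totals = defaultdict(int)
    for grouping_set in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in grouping_set else None for c in columns)
            totals[key] += row[measure]
    return dict(totals)


# Hypothetical sample data.
rows = [
    {"region": "east", "product": "a", "sales": 10},
    {"region": "east", "product": "b", "sales": 20},
    {"region": "west", "product": "a", "sales": 5},
]
columns = ["region", "product"]

rollup = aggregate(rows, columns, rollup_sets(columns), "sales")
cube = aggregate(rows, columns, cube_sets(columns), "sales")
for key, total in sorted(cube.items(), key=str):
    print(key, total)
```

The only difference between the two results here is that CUBE also produces the per-product subtotals, that is, the keys whose region is `None` but whose product is set.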

SnappyData Synopses Data Engine (AQP):

  • Better accuracy, error estimates and high-level accuracy contracts - we added many improvements in this area.
  • Support for functions in sample creation. (AQP-214)
  • Support float datatype for sample created on row table. (AQP-216)
  • Several bug-fixes and optimizations.

Download artifacts description

  • snappydata-0.6-bin.tar.gz ---> Full product binary
  • snappydata-0.6-without-hadoop-bin.tar.gz ---> Product without the Hadoop dependency JARs
  • snappydata-client-1.5.1.jar ---> Client (JDBC) JAR
  • snappydata-core_2.11-0.6.jar ---> Only dependency to connect to SnappyStore from Apache Spark cluster (Split mode)
  • snappydata-ec2-0.6.tar.gz ---> Script to launch on EC2
  • snappydata-zeppelin-0.6.jar ---> Apache Zeppelin interpreter



SnappyData 0.5 Release

@ashetkar ashetkar released this Jul 4, 2016 · 446 commits to master since this release

SnappyData 0.5 Release with the following major changes over the previous release.

  • Two tools, VSD and Pulse, are now packaged into the SnappyData distribution.
  • Added new fields on the Snappy Store tab in Spark UI (SNAP-852).
  • A new tool that automatically collects debug artifacts such as logs, stats files and stack dumps,
    and outputs them as a compressed tar file. Collection by time range is also supported.
  • SnappyData AQP:
    • Optimizations of bootstrap for sort based aggregate.
    • Minimize the query plan size for bootstrap.
    • Optimized the Declarative aggregate function.
  • SnappyData RowStore:
    • SnappyData RowStore 1.5 is now GA, which offers GemFireXD users the bits to upgrade to a much more robust and stable version of the product. More details here.
  • Several other bug fixes and test additions.


SnappyData 0.4 Preview Release

@ashetkar ashetkar released this May 26, 2016 · 495 commits to master since this release

SnappyData 0.4 Preview Release with the following changes over the previous release.

  • New Java APIs for JobServer interfaces. (SNAP-760)
  • Python API for Snappy StreamingContext
  • Added quickstart example with Python API for SnappyData (SNAP-741)
  • Support for "spark.snappydata" properties (SNAP-606)
  • Snappy's extension of UnifiedMemoryManager (SNAP-810)
  • Enabled code generation for Column table scan (SNAP-623)
  • Several other bug fixes and new tests.


SnappyData 0.3 Preview Release

@ashetkar ashetkar released this May 4, 2016 · 553 commits to master since this release

SnappyData 0.3 Preview Release with the following changes over the previous 0.2.1 Preview release.

  • Updated code to Apache Spark version 1.6.1, spark-jobserver to 0.6.1
  • Ability to run snappydata core against stock Apache Spark 1.6.1 in split-cluster mode.
  • Support for complex types: ARRAY (ArrayType), MAP (MapType), STRUCT (StructType), for column tables.
  • New Java and Python APIs for SnappyData additions to Spark and jobserver.
  • AQP additions:
    • New closed form error estimate implementations that give vastly improved results with filters in queries
    • Addition of closed form error estimate for COUNT
    • Bootstrap based error estimates
    • Updated implementation of AQP for Spark 1.6.x compatibility
  • Index implementation and API for column tables. These are distributed partitioned indexes that are stored like regular column tables (TBD: automatic selection of best index in plan generation)
  • Unified partitioning schema for Spark and store layers. This allows minimizing shuffle for both queries and inserts when the number of partitions in shuffle and store match.
  • New optimized SQL parser implementation that is orders of magnitude faster and more flexible than Spark SQL parser.
  • Added a script to collect logs, statistics, stack dumps for all data store nodes in the system (with optional time range)
  • Addition of a pure "rowstore" startup mode that will inhibit Spark layer and lead nodes.
  • Column and row tables now return proper sizeInBytes in plan generation to let Spark determine the best join order.
  • Fix for issues related to Row, InternalRow usage and conversions in streaming API.
  • Fix for row tables: INSERT and PUT operations now behave correctly, with the former throwing a constraint violation where appropriate.
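
The difference between INSERT and PUT on row tables, mentioned in the last item above, can be sketched with a toy in-memory table. This is only an illustration: the `RowTable` class and `ConstraintViolation` exception are hypothetical, and the sketch assumes that PUT acts as an upsert (insert or overwrite) while INSERT rejects a duplicate primary key.

```python
class ConstraintViolation(Exception):
    """Raised when an insert would duplicate an existing primary key."""


class RowTable:
    """Toy in-memory row table keyed by a primary key (illustrative only)."""

    def __init__(self):
        self._rows = {}

    def insert(self, key, row):
        # INSERT must not silently overwrite an existing row.
        if key in self._rows:
            raise ConstraintViolation(f"duplicate primary key: {key!r}")
        self._rows[key] = row

    def put(self, key, row):
        # PUT is assumed to behave as an upsert: insert or overwrite.
        self._rows[key] = row

    def get(self, key):
        return self._rows.get(key)


table = RowTable()
table.insert(1, {"name": "alice"})
table.put(1, {"name": "bob"})  # overwrites the existing row

try:
    table.insert(1, {"name": "carol"})
    insert_failed = False
except ConstraintViolation:
    insert_failed = True

print(table.get(1), insert_failed)
```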


SnappyData 0.2.1 Preview Release

@ashetkar ashetkar released this Mar 16, 2016 · 750 commits to master since this release

SnappyData 0.2.1 Preview Release with the following changes over the previous 0.2 Preview release.

  • Update docs for snappy-store HDFS feature and include hbase jar in distribution for users that need the HDFS feature (issue #194)
  • Many more fixes for snappy-store test failures and updated precheckin target for combined report generation.
  • Fixing mismatch of message in an unsupported exception in snappy-store (SQLState=0A000.S.29)
  • Support for custom key class (property: K), value class (V), key decoder class (KD) and value decoder class (VD) for the direct Kafka DataSource of CREATE STREAM