@ashetkar ashetkar released this Nov 6, 2018 · 28 commits to master since this release

The SnappyData team is pleased to announce the availability of version 1.0.2.1 of the platform. You can find the release artifacts of its Community Edition towards the end of this page.

You can also download the Enterprise Edition here. The following table summarizes the features available in Enterprise and OSS editions.

Feature | Community | Enterprise
Mutable Row & Column Store | X | X
Compatibility with Spark | X | X
Shared Nothing Persistence and HA | X | X
REST API for Spark Job Submission | X | X
Fault Tolerance for Driver | X | X
Access to the system using JDBC Driver | X | X
CLI for backup, restore, and export data | X | X
Spark console extensions | X | X
System Perf/Behavior statistics | X | X
Support for transactions in Row tables | X | X
Support for indexing in Row Tables | X | X
SQL extensions for stream processing | X | X
Runtime deployment of packages and jars | X | X
Synopsis Data Engine for Approximate Querying | - | X
ODBC Driver with High Concurrency | - | X
Off-heap data storage for column tables | - | X
CDC Stream receiver for SQL Server into SnappyData | - | X
GemFire/Apache Geode connector | - | X
Row Level Security | - | X
Use encrypted password instead of clear text password | - | X
Restrict Table, View, Function creation even in user’s own schema | - | X
LDAP security interface | - | X

New Features

  • Added support for Spark's HiveServer2 in a SnappyData cluster. An embedded Spark HiveServer2 can now be started on the leads in embedded mode.
  • Provided a default Structured Streaming Sink implementation for Snappy column and row tables. Conflation of events with the same key columns can be enabled through a sink property.
  • Added a -agent JVM argument to the launch commands to kill the JVM as soon as an OOM occurs. This is important because the JVM sometimes crashed later in unexpected ways as a side effect of the OOM, corrupting internal metadata and causing problems on restart.
  • Allowed NONE as a valid policy for server-auth-provider. The cluster can now be configured for user authentication only; mutual peer-to-peer authentication of cluster members can be disabled by setting this property to NONE.
  • Added support for query hints that force a join type. This is useful when, for example, the result is known to be small but the plan rules cannot determine that.
  • Allowed the deleteFrom API to work as long as the DataFrame contains the key columns.
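
The conflation option of the Structured Streaming sink mentioned above can be pictured with a small sketch. This is plain Python, not the sink's implementation, and it assumes that conflation means keeping only the last event for each key within a batch. The function name conflate and the event layout are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def conflate(events: Iterable[dict], key_columns: Tuple[str, ...]) -> List[dict]:
    """Keep only the last event seen for each distinct key in a batch.

    Assumption: "conflation" means last-event-wins per key; the real sink
    may differ in details such as ordering or event types.
    """
    latest: Dict[tuple, dict] = {}
    for event in events:
        key = tuple(event[c] for c in key_columns)
        # Drop any earlier event for this key, then re-insert so the
        # survivor sits at the position of its last occurrence.
        latest.pop(key, None)
        latest[key] = event
    return list(latest.values())


# Hypothetical batch: two events share the key id=1.
batch = [
    {"id": 1, "qty": 10},
    {"id": 2, "qty": 5},
    {"id": 1, "qty": 12},
]
conflated = conflate(batch, ("id",))
```

In this sketch the earlier event for id=1 is dropped, so only one write per key would reach the table for the batch.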

Performance Enhancements

  • Avoid shuffle when join key columns are a superset of child partitioning.
  • Added a pooled version of the SnappyData JDBC driver for Spark to connect to a SnappyData cluster as a JDBC data source.
  • Added caching for hive catalog lookups. Meta-data queries on clusters with a large number of tables took a long time because most of them involve nested loop joins between SYSTABLES and HIVETABLES. Even with only hundreds of tables, these queries were slow. (SNAP-2657)
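
The shuffle-avoidance item above rests on a simple property of hash partitioning: if the partitioning columns are a subset of the join keys, rows that agree on all join keys also agree on the partitioning columns and therefore already sit in the same partition. The following is a minimal sketch of that reasoning in plain Python. The function names are hypothetical, and this is not the planner rule used by SnappyData or Spark.

```python
from typing import Dict, Sequence


def shuffle_needed(join_keys: Sequence[str], partition_columns: Sequence[str]) -> bool:
    """Return True when rows must be redistributed before joining on join_keys."""
    if not partition_columns:
        return True  # no known partitioning, so nothing can be reused
    # Existing partitioning is reusable when its columns are all join keys.
    return not set(partition_columns).issubset(join_keys)


def partition_of(row: Dict[str, int], partition_columns: Sequence[str], num_partitions: int) -> int:
    """Hash-partition a row on the given columns (illustrative only)."""
    return hash(tuple(row[c] for c in partition_columns)) % num_partitions


# Two rows that agree on both join keys (cust_id, order_date).
rows = [
    {"cust_id": 7, "order_date": 20181106, "amount": 10},
    {"cust_id": 7, "order_date": 20181106, "amount": 25},
]
# They necessarily land in the same partition of a table partitioned on cust_id.
same_partition = partition_of(rows[0], ("cust_id",), 8) == partition_of(rows[1], ("cust_id",), 8)
```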

Select Fixes and Performance Related Fixes

  • Reset the pool at the end of collect to avoid spillover of the lowlatency pool setting to later operations that may not use the CachedDataFrame execution paths. (SNAP-2659)
  • Fixed: A column added using 'ALTER TABLE ... ADD COLUMN ...' through snappy shell was not reflected in spark-shell. (SNAP-2491)
  • Fixed occasional failures in serialization using CachedDataFrame if a node is just starting or stopping. Also fixed a hang in shutdown for cases where the hive client close tries to boot up the node again and waits on the locks taken during shutdown.
  • Fixed Lead and Lag window functions failing with an incorrect analysis error. (SNAP-2566)
  • Fixed the validate-disk-store tool. It was not getting initialized with registered types, which are required to deserialize byte arrays read from persisted files.
  • Fixed the schema in ResultSet metadata. It always used to show the default schema 'APP'.
  • Fixed a false unique constraint violation that sometimes occurred due to a removed or destroyed AbstractRegionEntry. Now an attempt is made to remove the entry from the index, followed by another try to put the new value against the index key. (SNAP-2627)
  • Fixed a memory leak in oldEntrieMap leading to LowMemoryException and OutOfMemoryException. (SNAP-2654)
  • Note the change in the name of SnappyData JDBC Client jar.

Description of download artifacts

Artifact Name | Description
snappydata-1.0.2.1-bin.tar.gz | Full product binary (includes Hadoop 2.7)
snappydata-1.0.2.1-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs
snappydata-jdbc_2.11-1.0.2.1.jar | Client (JDBC) JAR
snappydata-zeppelin_2.11-0.7.3.4.jar | The Zeppelin interpreter jar for SnappyData, compatible with Apache Zeppelin 0.7.3
snappydata-ec2-0.8.2.tar.gz | Script to Launch SnappyData cluster on AWS EC2 instances

@ashetkar ashetkar released this Aug 27, 2018 · 132 commits to master since this release

The SnappyData team is pleased to announce the availability of version 1.0.2 of the platform. You can find the release artifacts of its Community Edition towards the end of this page.

You can also download the Enterprise Edition here. The following table summarizes the features available in Enterprise and OSS editions.

Feature | Community | Enterprise
Mutable Row & Column Store | X | X
Compatibility with Spark | X | X
Shared Nothing Persistence and HA | X | X
REST API for Spark Job Submission | X | X
Fault Tolerance for Driver | X | X
Access to the system using JDBC Driver | X | X
CLI for backup, restore, and export data | X | X
Spark console extensions | X | X
System Perf/Behavior statistics | X | X
Support for transactions in Row tables | X | X
Support for indexing in Row Tables | X | X
SQL extensions for stream processing | X | X
Runtime deployment of packages and jars | X | X
Synopsis Data Engine for Approximate Querying | - | X
ODBC Driver with High Concurrency | - | X
Off-heap data storage for column tables | - | X
CDC Stream receiver for SQL Server into SnappyData | - | X
GemFire/Apache Geode connector | - | X
Row Level Security | - | X
Use encrypted password instead of clear text password | - | X
Restrict Table, View, Function creation even in user’s own schema | - | X
LDAP security interface | - | X

New Features

  • Introduced an API in snappy session catalog to get Primary Key of Row tables or Key Columns of Column Tables, as DataFrame. (SNAP-2459)
  • Introduced an API in snappy session catalog to get table type as String (SNAP-2477).
  • Added support for view definitions of arbitrary size. It used to fail when the view text size went beyond 32k.
    Support for displaying VIEWTEXT for views in SYS.HIVETABLES.
    For example: SELECT viewtext FROM sys.hivetables WHERE tablename = 'view_name' returns the text with which the view was created.
  • Added the Row Level Security feature. Admins can define multiple security policies on tables for different users or LDAP groups.
    Refer to Row Level Security.
  • Auto refresh of the UI page. The SnappyData UI page now updates automatically and frequently, so the user does not have to refresh or reload it. Refer to SnappyData Pulse.
  • Richer user interface. Added graphs for memory, CPU consumption, etc. for the last 15 minutes, so the user can see how the cluster health has been over that period instead of just the current state.
  • Total CPU core count capacity of the cluster is now displayed on the UI.
    Refer to SnappyData Pulse.
  • The bucket count of tables is now also displayed on the user interface.
  • Support deployment of packages and jars as DDL command.
  • Added support for reading maven dependencies using --packages option in our job server scripts.
  • Changed the procedure sys.repair_catalog to execute on the server (earlier it was run on the lead by sending a message to it). This makes it possible to repair the catalog even when the lead is down.
    Refer to Catalog Repair.
  • Added support for the PreparedStatement.getMetaData() JDBC API. This is on an experimental basis.
  • Added support for executing some DDL commands, namely CREATE/DROP DISKSTORE, GRANT, REVOKE and CALL procedures, from a snappy session as well.
  • Quote table names in all store DDL/DML/query strings to allow for special characters and keywords in table names.
  • Spark applications with the same name cannot be submitted to SnappyData. This has been done so that individual apps can be killed by name when required.
  • Users can be disallowed from creating tables in their own schema based on the system property snappydata.RESTRICT_TABLE_CREATION. In some cases it may be required to control the use of cluster resources, in which case table creation is done only by authorized owners of the schema.
  • Schema can be owned by an LDAP group also and not necessarily by a single user.
  • Support for deploying SnappyData on Kubernetes using Helm charts. This feature is currently experimental.
    Refer to Kubernetes.
  • Disk Store Validate tool enhancement. Validation of a disk store can now find all the inconsistencies at once.
  • The BINARY data type is the same as the BLOB data type.

Performance Enhancements

  • Fixed concurrent query performance issue by resolving the incorrect output partition choice. Due to numBucket check, all the partition pruned queries were converted to hash partition with one partition. This was causing an exchange node to be introduced. (SNAP-2421)
  • Fixed SnappyData UI becoming unresponsive on LowMemoryException. (SNAP-2071)
  • Cleaned up tokenization handling and fixes. The main change is the addition of the following two separate classes for tokenization:
    • ParamLiteral
    • TokenLiteral
    Both classes extend a common trait, TokenizedLiteral. Tokenization always happens independently of plan caching, unless it is explicitly turned off. (SNAP-1932)
  • Procedure for smart connector iteration and fixes. Includes fixes for performance issues noted for all iterators (disk iterator, smart connector and remote iterator). (SNAP-2243)

Select Fixes and Performance Related Fixes

  • Fixed incorrect server status shown on the UI. Sometimes, due to a race condition, two entries were shown on the UI for the same member. (SNAP-2433)
  • Fixed missing SQL tab on SnappyData UI in local mode. (SNAP-2470)
  • Fixed a few issues related to wrong results for Row tables due to plan caching. (SNAP-2463 - incorrect pushing down of OR and AND clause filter combination in push down query, SNAP-2351 - re-evaluation of filter was not happening due to plan caching, SNAP-2451, SNAP-2457)
  • Skip batch, if the stats row is missing while scanning column values from disk. This was already handled for in-memory batches and the same has been added for on-disk batches. (SNAP-2364)
  • Fixed the UI to not let unauthorized users see any tab. (ENT-21)
  • Fixes in the SnappyData parser: creation of inlined tables (SNAP-2302), and ‘()’ made optional in some functions like ‘current_date()’, ‘current_timestamp()’ etc. (SNAP-2303)
  • Consider the current schema name also as part of the caching key for plan caching, so that the same query on the same table but from a different schema does not clash. (SNAP-2438)
  • Fixed COLUMN table being shown as ROW table on the dashboard after a LowMemoryException (LME) in the data server. (SNAP-2382)
  • Fixed the off-heap size shown on the UI for Partitioned Regions. (SNAP-2186)
  • Fixed failure when a query on a view did not fall back to the Spark plan in case code generation failed. (SNAP-2363)
  • Fixed an invalid decompress call on the stats row, which used to fail at run time while scanning column tables. (SNAP-2348)
  • Fixed negative bucket size with eviction. (GITHUB-982)
  • Fixed the issue of incorrect LowMemoryException, even if a lot of memory was left. (SNAP-2356)
  • Handled int overflow case in memory accounting. Due to the overflow, ExecutionMemoryPool released more memory than it had and threw an AssertionError. (SNAP-2312)
  • Fixed the pooled connection not being returned to the pool after authorization check failure which led to unusable cluster. (SNAP-2255)
  • Fixed different results of nearly identical queries, due to join order. This was caused by the EXCHANGE hash ordering being different from the table partitioning. It happened in the specific case where the query join order differs from the partitioning of one of the tables while the other table being joined is partitioned differently. (SNAP-2225)
  • Corrected row count updated/inserted in a column table via putInto. (SNAP-2220)
  • Fixed the OOM issue due to hive queries. This was a memory leak, due to which the system became very slow after some time even if idle. (SNAP-2248)
  • Fixed the issue of incomplete plan and query string info in UI due to plan caching changes.
  • Corrected the logic of existence join.
  • Sensitive information, like user password, LDAP password etc, which are passed as properties to the cluster are masked on the UI now.
  • Fixed: schemas with boolean columns sometimes returned incorrect null values. (SNAP-2436)
  • Fixed the scenario where a break in the colocation chain of buckets due to a crash led to disk store metadata going bad, causing restart failure.
  • Fixed wrong entry count on restart when a region got closed on a server due to DiskAccessException, which gave the impression of data loss. The region is no longer closed in case of LME; this is done by not letting non-IOExceptions get wrapped in DiskAccessException. (SNAP-2375)
  • Fix to avoid hang or delay in stop when stop is issued and the component has gone into a reconnect cycle. (SNAP-2380)
  • Handle joining of new servers better. Avoid ConflictingPersistentDataException when a new server starts before any of the old servers start. (SNAP-2236)
  • ODBC driver bug fix. Added EmbedDatabaseMetaData.getTableSchemas.
  • Changed the order in which backup is taken. The internal DD diskstore is backed up first, followed by the rest of the disk stores. This helps stream apps that want to store the offset of a replayable source in SnappyData: they can create the offset table backed by the internal DD store instead of the default or a custom disk store.
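
The int overflow in memory accounting listed above (SNAP-2312) is an instance of a general failure mode that is easy to reproduce. The sketch below is plain Python emulating 32-bit signed arithmetic. It only illustrates how a byte counter can wrap to a negative value, and does not reproduce the actual ExecutionMemoryPool code. The helper add_int32 is hypothetical.

```python
def add_int32(a: int, b: int) -> int:
    """Add two values the way a 32-bit signed integer would (wraps around)."""
    return (a + b + 2**31) % 2**32 - 2**31


GIB = 1024 ** 3

# Accounting 1.5 GiB + 1 GiB in a 32-bit counter wraps to a negative number...
used_32 = add_int32(GIB + GIB // 2, GIB)

# ...while a wider (64-bit or arbitrary-precision) counter keeps the true total.
used_64 = (GIB + GIB // 2) + GIB
```

Once a counter has wrapped like this, later comparisons between "used" and "to be released" stop making sense, which is the kind of inconsistency an assertion in a memory pool would trip over.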

Description of download artifacts

Artifact Name | Description
snappydata-1.0.2-bin.tar.gz | Full product binary (includes Hadoop 2.7)
snappydata-1.0.2-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs
snappydata-client-1.6.2.jar | Client (JDBC) JAR
snappydata-zeppelin_2.11-0.7.3.2.jar | The Zeppelin interpreter jar for SnappyData, compatible with Apache Zeppelin 0.7.3
snappydata-ec2-0.8.2.tar.gz | Script to Launch SnappyData cluster on AWS EC2 instances

@ashetkar ashetkar released this Feb 14, 2018 · 344 commits to master since this release

The SnappyData team is pleased to announce the availability of version 1.0.1 of the platform. You can find the release artifacts of its Community Edition towards the end of this page.

You can also download the Enterprise Edition here. The table below summarizes the features available in Enterprise and OSS editions.

Feature | Community | Enterprise
Mutable Row & Column Store | X | X
Compatibility with Spark | X | X
Shared Nothing Persistence & HA | X | X
REST API for Spark Job Submission | X | X
Fault Tolerance for Driver | X | X
JDBC Driver | X | X
CLI for backup, restore & export | X | X
Spark console extensions | X | X
System Perf/Behavior statistics | X | X
Support for transactions in Row tables | X | X
Support for indexing in Row Tables | X | X
SQL extensions for stream processing | X | X
Synopsis Data Engine for Approximate Querying | - | X
ODBC Driver with High Concurrency | - | X
Off-heap data storage for column tables | - | X
CDC Stream receiver for SQL Server into SnappyData | - | X
GemFire/Apache Geode connector | - | X
LDAP security interface | - | X

More details about the release:

New Features:

  • putInto and deleteFrom bulk operations support for column tables (SNAP-2092, SNAP-2093, SNAP-2094):
    • ability to specify "key columns" in the table DDL to use for putInto and deleteFrom APIs
    • "PUT INTO" SQL or putInto API extension to overwrite existing rows and insert non-existing ones
    • "DELETE FROM" SQL or deleteFrom API extension to delete a set of matching rows
    • UPDATE SQL now supports using expressions with column references of another table in RHS of SET
  • Improvements in cluster restart with off-line, failed nodes or with corrupt meta-data (SNAP-2096)
    • new admin command "unblock" to allow the initialization of a table even if it is waiting for offline members
    • retain data unlike revoke and initialize with the latest online working copy (SNAP-2143)
    • parallel recovery of data regions to break any cyclic dependencies between the nodes, and allow reporting on all off-line nodes that may have more recent copy of data
    • many bug-fixes related to startup issues due to meta-data inconsistencies:
      incorrect data conflicts (SNAP-2097, SNAP-2098), metadata corruption (SNAP-2140)
  • Compression of column batches in disk storage and over the network (SNAP-1743)
    • support for LZ4, SNAPPY compression codecs in disk storage and transport of column table data
    • new SOURCEPATH and COMPRESSION columns in SYS.HIVETABLES virtual table
  • Support for temporary, global temporary and persistent VIEWs (SNAP-2072):
    • CREATE VIEW, CREATE TEMPORARY VIEW and CREATE GLOBAL TEMPORARY VIEW DDLs
  • No jar dependencies in snappydata cluster for external datasources of smart connector (SNAP-2072)
  • External tables display in dashboard and snappy command-line (SNAP-2086)
  • Auto-configuration of SPARK_PUBLIC_DNS, hostname-for-clients etc in AWS environment (SNAP-2116)
  • GRANT/REVOKE SQL support in SnappySession.sql() (earlier only allowed from JDBC/ODBC) (SNAP-2042)
  • LATERAL VIEW support in SnappySession.sql() (SNAP-1283)
  • FETCH FIRST syntax as an alternative to LIMIT to support some SQL tools that use the former
  • Addition of IndexStats for local row table index lookups and range scans
  • SYS.DISKSTOREIDS virtual table to show disk-store IDs being used in the cluster by all members (SNAP-2113)
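
The semantics of the putInto and deleteFrom bulk operations listed at the top of these new features (overwrite existing rows and insert non-existing ones; delete a set of matching rows) can be sketched over an in-memory list of rows. This is a plain Python illustration under the assumption that rows are matched on the declared key columns. The functions put_into and delete_from are hypothetical stand-ins, not the SnappyData API.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def _key(row: Row, key_columns: Sequence[str]) -> Tuple[object, ...]:
    return tuple(row[c] for c in key_columns)


def put_into(table: List[Row], rows: Sequence[Row], key_columns: Sequence[str]) -> None:
    """Overwrite rows whose key already exists; insert the rest."""
    index = {_key(r, key_columns): i for i, r in enumerate(table)}
    for row in rows:
        k = _key(row, key_columns)
        if k in index:
            table[index[k]] = dict(row)   # existing key: overwrite
        else:
            index[k] = len(table)
            table.append(dict(row))       # new key: insert


def delete_from(table: List[Row], rows: Sequence[Row], key_columns: Sequence[str]) -> int:
    """Delete rows matching the given keys; return the number deleted."""
    doomed = {_key(r, key_columns) for r in rows}
    kept = [r for r in table if _key(r, key_columns) not in doomed]
    deleted = len(table) - len(kept)
    table[:] = kept
    return deleted


table = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
put_into(table, [{"id": 2, "name": "B"}, {"id": 3, "name": "c"}], ("id",))
# The rows passed to delete_from only need to carry the key columns.
deleted = delete_from(table, [{"id": 1}], ("id",))
```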

Performance Enhancements:

  • Major performance improvements in smart connector mode (SNAP-2101, SNAP-2084)
    • minimized buffer copying, key lookups in column table rather than full scan for filters, reduce round-trips
    • allow using SnappyUnifiedMemoryManager with smart connector (SNAP-2084)
  • New memory and disk iterator to minimize faultins and serialize disk reads (SNAP-2102):
    • reduce faultins and cross-iterator serial disk reads per diskstore to minimize random reads from disk
    • new remote iterator that substantially reduces the memory overhead and caches only current batch
  • Startup performance improvements to cut down on locator/server/lead start and restart times (SNAP-338)
  • Improve performance of reads of variable length data for some queries (SNAP-2118)
  • Use colocated joins with VIEWs when possible (SNAP-2204)
  • Separate disk store for delta buffer regions to substantially improve column table compaction (SNAP-2121)
  • Projection push-down to scan layer for non-deterministic expressions like spark_partition_id() (SNAP-2036)
  • code-generation cache is larger by default and configurable (SNAP-2120)

Select bug fixes and performance related fixes:
A sample of bug fixes done as part of this release are noted below. For a more comprehensive list, see ReleaseNotes.txt.

  • Now only overflow-to-disk is allowed as eviction action for tables (SNAP-1501):
    • only overflow-to-disk is allowed as a valid eviction action and cannot be explicitly specified
    • OVERFLOW=false property can be used to disable eviction which is true by default
  • Memory accounting fixes:
    • incorrect initial memory accounting causing insert failure even with memory available (SNAP-2084)
    • zero usage shown in UI on restart (SNAP-2180)
  • Disable embedded Zeppelin interpreter in a secure cluster which can bypass security (SNAP-2191)
  • Fix import of JSON data (SNAP-2087)
  • selects missing results or failing during node failures (SNAP-889, SNAP-1547)
  • fixes and improvements to server and lead status in both the launcher status and SYS.MEMBERS table
    (SNAP-1960, SNAP-2060, SNAP-1645)
  • fix updates on complex types (SNAP-2141)
  • column table scan fixes related to null value reads (SNAP-2088)
  • disable tokenization for external tables, flags to disable it and plan caching (SNAP-2114, SNAP-2124)
  • deadlock in transactional operations with GII (SNAP-1950)
  • couple of fixes in UPDATE SQL: unexpected rollover (SNAP-2192), show as update count (SNAP-2156)
  • fixes ported from Apache Geode (GEODE-2109, GEODE-2240)
  • fixes to all failures in snappy-spark test suite which includes both product and test changes
  • more comprehensive python API testing (SNAP-2044)

Description of download artifacts:

Artifact Name | Description
snappydata-1.0.1-bin.tar.gz | Full product binary (includes Hadoop 2.7)
snappydata-1.0.1-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs
snappydata-client-1.6.1.jar | Client (JDBC) JAR
snappydata-zeppelin-0.7.3.jar | The Zeppelin interpreter jar for SnappyData, compatible with Apache Zeppelin 0.7.3
snappydata-ec2-0.8.1.tar.gz | Script to Launch SnappyData cluster on AWS EC2 instances

The SnappyData team is pleased to announce the availability of version 1.0.0 GA of the platform.
Download the Enterprise Edition here

New Features:

  • Fully compatible with Apache Spark 2.1.1
  • Mutability support for column store (SNAP-1389):
    • UPDATE and DELETE operations are now supported on column tables.
  • ALTER TABLE support for row table (SNAP-1326).
  • Security Support (available in enterprise edition): This release introduces cluster security with authentication and authorisation based on LDAP mechanism. Will be extended to other mechanisms in future (SNAP-1656, SNAP-1813).
  • DEB and RPM installers (distProduct target in source build).
  • Support for setting scheduler pools using the set command.
  • Multi-node cluster now boots up quickly as background start of server processes is enabled by default.
  • Pulse Console: SnappyData Pulse has been enhanced to be more useful to both developers and operations personnel (SNAP-1890, SNAP-1792). Improvements include
    • Ability to sort members list based on members type.
    • Added new UI view named SnappyData Member Details Page which includes, among other things, latest logs.
    • Added members Heap and Off-Heap memory usage details along with their storage and execution splits.
  • Users can specify streaming batch interval when submitting a stream job via conf/snappy-job.sh (SNAP-1948).
  • Row tables now support LONG, SHORT, TINYINT and BYTE datatypes (SNAP-1722).
  • The history file for snappy shell has been renamed from .gfxd.history to .snappy.history. You may copy your existing ~/.gfxd.history to ~/.snappy.history to be able to access your historical snappy shell commands.

Performance Enhancements:

  • Performance enhancements with dictionary decoder when dictionary is large. (SNAP-1877)
    • Using a consistent sort for pushed down predicates so that different sessions do not end up creating different generated code.
    • Reduced the size of generated code.
  • Indexed cursors in decoders to improve heavily filtered queries (SNAP-1936)
  • Performance improvements in Smart Connector mode, especially with queries on tables with wide schema (SNAP-1363, SNAP-1699)
  • Several other performance improvements.

Select bug fixes and performance related fixes:
There have been numerous bug fixes done as part of this release. Some of these are included below. For a more comprehensive list, see ReleaseNotes.txt.

  • Fixed data inconsistency issues when a new node joins the cluster while write operations are in progress (SNAP-1756).
  • The product now internally retries on a redundant copy of partitions in the event of a node failure (SNAP-1377, SNAP-902).
  • Fixed the wrong status of locators on restarts. After cluster restart, snappy-status-all.sh used to show locators in waiting state even when the actual status changed to running (SNAP-1893).
  • Fixed the SnappyData Pulse freezing when loading data sets (SNAP-1426).
  • More accurate accounting of execution and storage memory (SNAP-1688, SNAP-1798).
  • Corrected case-sensitivity handling for query API calls (SNAP-1714).

Description of download artifacts:

Artifact Name | Description
snappydata-1.0.0-bin.tar.gz | Full product binary (includes Hadoop 2.7)
snappydata-1.0.0-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs
snappydata-client-1.6.0.jar | Client (JDBC) JAR
snappydata-zeppelin-0.7.2.jar | The Zeppelin interpreter jar for SnappyData, compatible with Apache Zeppelin 0.7.2

@ashetkar ashetkar released this Aug 31, 2017 · 1 commit to branch-1.0-rc since this release

The SnappyData team is pleased to announce the availability of version 1.0.0-RC1 of the platform.

New Features:

  • Fully compatible with Apache Spark 2.1.1
  • Mutability support for column store (SNAP-1389):
    -- UPDATE and DELETE operations are now supported on column tables.
  • ALTER TABLE support for row table (SNAP-1326).
  • Security Support (available in enterprise edition): This release introduces cluster security with authentication and authorisation based on LDAP mechanism. Will be extended to other mechanisms in future (SNAP-1656, SNAP-1813).
  • Support for setting scheduler pools using the set command.
  • Multi-node cluster now boots up quickly as background start of server processes is enabled by default.
  • Pulse Console: SnappyData Pulse has been enhanced to be more useful to both developers and operations personnel (SNAP-1890, SNAP-1792). Improvements include
    -- Ability to sort members list based on members type.
    -- Added new UI view named SnappyData Member Details Page which includes, among other things, latest logs.
    -- Added members Heap and Off-Heap memory usage details along with their storage and execution splits.
  • Users can specify streaming batch interval when submitting a stream job via conf/snappy-job.sh (SNAP-1948).
  • Row tables now support LONG, SHORT, TINYINT and BYTE datatypes (SNAP-1722).
  • The history file for snappy shell has been renamed from .gfxd.history to .snappy.history. You may copy your existing ~/.gfxd.history to ~/.snappy.history to be able to access your historical snappy shell commands.

Performance Enhancements:

  • Performance enhancements with dictionary decoder when dictionary is large. (SNAP-1877)
    -- Different sessions end up creating different code due to indeterminate statsPredicate
    ordering. Now using a consistent sort order so that generated code is identical across
    sessions for the same query.
    -- Reduced the size of generated code.
  • Indexed cursors in decoders to improve heavily filtered queries (SNAP-1936)
  • Performance improvements in Smart Connector mode, especially with queries on tables with wide schema (SNAP-1363, SNAP-1699)
  • Several other performance improvements.

Select bug fixes and performance related fixes:
Some of these are included below. For the complete list, see ReleaseNotes.txt.

  • Fixed data inconsistency issues when a new node joins the cluster while write operations are in progress (SNAP-1756).
  • The product now internally retries on a redundant copy of partitions in the event of a node failure (SNAP-1377, SNAP-902).
  • Fixed the wrong status of locators on restarts. After cluster restart, snappy-status-all.sh used to show locators in waiting state even when the actual status changed to running (SNAP-1893).
  • Fixed the SnappyData Pulse freezing when loading data sets (SNAP-1426).
  • More accurate accounting of execution and storage memory (SNAP-1688, SNAP-1798).
  • Corrected case-sensitivity handling for query API calls (SNAP-1714).

Description of download artifacts:

Artifact Name | Description
snappydata-1.0.0-rc1-bin.tar.gz | Full product binary (includes Hadoop 2.7)
snappydata-1.0.0-rc1-bin.zip | Full product binary (includes Hadoop 2.7)
snappydata-1.0.0-rc1-without-hadoop-bin.tar.gz | Product without the Hadoop dependency JARs
snappydata-1.0.0-rc1-without-hadoop-bin.zip | Product without the Hadoop dependency JARs
snappydata-client-1.5.6-rc1.jar | Client (JDBC) JAR
snappydata-core_2.11-1.0.0-rc1.jar | The only dependency needed to connect to SnappyStore from Apache Spark 2.1.1 cluster (Smart Connector mode)
snappydata-zeppelin-0.7.1.jar | The Zeppelin interpreter jar for SnappyData, compatible with Apache Zeppelin 0.7

(Details will be added here soon)

Pre-release

@ashetkar ashetkar released this Jun 13, 2017 · 7 commits to branch-0.9 since this release

The SnappyData team continues to march towards a 1.0GA and we are pleased to announce the availability of version 0.9 of the platform today. This release contains significant new functionality, several performance enhancements, design changes to make the platform scale better and a new and improved console that improves enterprise readiness of the SnappyData cluster.

In-memory but disk persistent, by default now
Until this release, by default, all tables were only memory resident and required explicit configuration for disk persistence (e.g. using the 'persistent' clause in 'create table'). From this release, all tables persist to disk by default. You can explicitly turn this OFF for pure memory-only tables.

Memory Management:

  • Improved “Unified Memory Manager” with more accurate accounting of memory. The previous release could prematurely spill to disk or cause GC pauses (SNAP-1235).
  • Support for Off-Heap storage in column store (SNAP-1454). The previous release required users to over allocate Java heap memory to avoid GC pauses or exposed applications to an increased risk of stop-the-world GC pauses. In addition to performance benefits, off-heap storage contributes to predictable system performance and behavior and is absolutely recommended for all production deployments.

Performance Enhancements:
Version 0.9 includes several product enhancements that contribute to improved performance. These include:

  • The disk storage design for column tables is now better optimized. Previously, the logical unit of disk storage was still a set of rows. The unit is now a set of column values, which makes queries that require faulting data in from disk significantly more efficient (SNAP-990).
  • The query engine now caches the physical plan as well as the generated code for queries. SnappyData, like Spark, dynamically generates JVM byte code for a query, then compiles and caches this generated code so that any subsequent execution of the same query is much faster. Often, however, queries are similar but not identical; for instance, the bound constants change (common in WHERE clauses). This meant the cached compiled plans were not all that useful. Now, the generated code tokenizes literals and constants so that subsequent similar queries with different bound values execute much faster (SNAP-1346).
  • Prior to 0.9, SnappyData was not optimized for PreparedStatements (JDBC) when the query was routed to the Spark Catalyst engine. Now it is (SNAP-1323).
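
The idea behind tokenizing literals for plan caching, as described above, can be sketched in a few lines. This is a simplified, regex-based illustration in plain Python, and the names tokenize and PlanCache are hypothetical. The real engine tokenizes literals in the parsed plan and the generated code rather than in the query text.

```python
import re
from typing import Callable, Dict, List, Tuple

# Matches single-quoted string literals or standalone numeric literals.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def tokenize(query: str) -> Tuple[str, List[str]]:
    """Replace each literal with '?' and return (template, literal values)."""
    values: List[str] = []

    def _swap(match):
        values.append(match.group(0))
        return "?"

    return _LITERAL.sub(_swap, query), values


class PlanCache:
    """Caches one compiled plan per query template (illustrative only)."""

    def __init__(self, compile_plan: Callable[[str], str]) -> None:
        self._compile = compile_plan
        self._plans: Dict[str, str] = {}
        self.compilations = 0

    def plan_for(self, query: str) -> Tuple[str, List[str]]:
        template, values = tokenize(query)
        if template not in self._plans:
            self.compilations += 1          # expensive step happens once per template
            self._plans[template] = self._compile(template)
        return self._plans[template], values


cache = PlanCache(lambda template: "plan<" + template + ">")
plan1, values1 = cache.plan_for("SELECT * FROM orders WHERE id = 42 AND status = 'open'")
plan2, values2 = cache.plan_for("SELECT * FROM orders WHERE id = 7 AND status = 'closed'")
```

Both queries map to the same template, so the second call reuses the cached plan and only the bound values differ.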

Scaling Improvements:

  • SnappyData offers a smart connector mode, which allows Spark applications running in a remote cluster to access data stored in a SnappyData cluster intelligently and efficiently (with a very high degree of parallelism). Version 0.9 offers a redesigned smart connector that acts as a client to the SnappyData cluster. This offers much higher levels of scaling for the client and improves the cluster’s ability to handle such connections without impacting its own ability to scale. Previous versions of the connector had to join the cluster as a peer member, limiting the scalability of the cluster (SNAP-1286).

Enterprise Readiness:

  • Consistency improvements: This release introduces snapshot isolation semantics, by default, while processing queries. It uses an MVCC algorithm, so queries are guaranteed to access a stable view of the database (SNAP-1304).
  • Pulse Console: SnappyData Pulse has been redesigned to provide both developers and operations personnel with useful insights into the running of the SnappyData cluster. Improvements include:
    -- A redesigned member view that displays a detailed member description and heap and off-heap usage, along with snappy storage and execution splits
    -- Cluster-level aggregate memory and CPU usage
    -- A SQL tab that shows the SQL statements executed within the system, with the ability to view their query plans
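
As a rough illustration of how multi-version concurrency control (MVCC) gives a query a stable view, here is a toy Python sketch. It is not SnappyData's implementation: `VersionedStore`, `Snapshot`, and the integer commit clock are hypothetical names chosen for the example.

```python
class VersionedStore:
    """Keeps every committed version of a key, tagged with a commit timestamp."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit clock (assumption for this sketch)

    def write(self, key, value):
        # Each write commits immediately with the next timestamp.
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        # A snapshot remembers the clock value at the time it was taken.
        return Snapshot(self, self._clock)


class Snapshot:
    """Read-only view that ignores versions committed after it was taken."""

    def __init__(self, store, ts):
        self._store = store
        self._ts = ts

    def read(self, key):
        # Return the newest version that was committed at or before the snapshot.
        for commit_ts, value in reversed(self._store._versions.get(key, [])):
            if commit_ts <= self._ts:
                return value
        return None
```

A query that reads through a `Snapshot` keeps seeing the values that existed when it started, even if writers commit newer versions while it runs.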

Select bug fixes and performance related fixes:

  • Starting with version 0.9, row tables support the Boolean data type.
  • Support for slash ('/') and special characters in column names (SNAP-1705).
  • Scans and ingest through code generation could fail if the generated code of a single method exceeded 64 KB (SNAP-1384).

For the complete list of tickets that were fixed in this release, see ReleaseNotes.txt.

Description of download artifacts:

Artifact Name Description
snappydata-0.9-bin.tar.gz Full product binary (includes Hadoop 2.7)
snappydata-0.9-bin.zip Full product binary (includes Hadoop 2.7)
snappydata-0.9-without-hadoop-bin.tar.gz Product without the Hadoop dependency JARs
snappydata-0.9-without-hadoop-bin.zip Product without the Hadoop dependency JARs
snappydata-client-1.5.5.jar Client (JDBC) JAR
snappydata-core_2.11-0.9.jar Only dependency to connect to SnappyStore from Apache Spark 2.0.X cluster (Smart Connector mode)
snappydata-0.9-odbc32.zip 32-bit ODBC driver for 32-bit Windows. Extract and run the msi.
snappydata-0.9-odbc64.zip 64-bit ODBC driver for 64-bit Windows. Extract and run the msi.
snappydata-0.9-odbc32_64.zip 32-bit ODBC driver for 64-bit Windows. Extract and run the msi.
ODBC-and-Tableau-Setup.pdf Installation instructions for the ODBC driver including Tableau setup.
odbc-snappydata.tdc TDC file for Tableau setup (see setup guide for details)
snappydata-zeppelin-0.7.1.jar The Zeppelin interpreter jar for SnappyData, compatible with Apache Zeppelin 0.7
Pre-release

@dhavalstr dhavalstr released this Mar 10, 2017 · 1045 commits to master since this release

SnappyData 0.8 Release with the following major changes:

New features/Fixes

  • ODBC Driver and Installer. You can now connect to the SnappyData cluster using the SnappyData ODBC driver and execute SQL queries. (SNAP-1357)
  • Multiple Language Binding using Thrift Protocol. SnappyData now provides support for Apache Thrift protocol enabling users to access the cluster from other languages that are not supported directly by SnappyData. (SNAP-1313)
  • Insert performance optimizations. Inserts into tables are now considerably faster: a new insert plan has been introduced that uses code generation and a new encoding format. (SNAP-490)
  • Backward compatibility with Spark 2.0.0. The 0.8 SnappyData release is based on Spark 2.0.2. The SnappyData Smart Connector is now backward compatible with Spark 2.0.0 and Spark 2.0.1.
  • SnappyData JDBC now uses a new, more optimized Thrift based driver to communicate with the Data Servers.
  • Column table bloat issue. Bouncing data servers (due to failure) could cause data to accumulate in the Delta Row Buffers of column tables instead of being aged into the expected compressed columnar format, resulting in bloat and inferior query performance. This has been addressed. (SNAP-1146)
  • SnappyData now supports "persistent UDFs". Once registered, UDFs are persisted in the catalog and are therefore usable after restarts. (SNAP-982)
  • For other bug fixes, see the release notes.

Known issues

  • Inserting into or querying a column table with a wide schema may fail with StackOverflowException due to a limitation of the JVM. (SNAP-1384)

SnappyData Synopses Data Engine:

  • Sample selection logic is enhanced. It can now select the best-suited sample table even if SQL functions are used on QCS columns while creating sample tables.
  • Poisson multiplicity generator logic for bootstrap is improved. Errors estimated using bootstrap are now more accurate.
  • Improved performance of closed-form and bootstrap error estimations.
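
For readers unfamiliar with the Poisson bootstrap mentioned above, the following Python sketch shows the general technique: instead of resampling rows with replacement, each row gets an independent Poisson(1) multiplicity in every replicate. This is a generic illustration, not the Synopses Data Engine's code. The function names, the use of Knuth's method for Poisson draws, and the choice of the mean as the estimated aggregate are all assumptions made for the example.

```python
import math
import random


def poisson1(rng):
    """Draw from Poisson(lambda=1) using Knuth's multiplication method."""
    limit = math.exp(-1.0)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def bootstrap_mean_error(values, replicates=200, seed=0):
    """Estimate the standard error of the mean with a Poisson bootstrap."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        total, count = 0.0, 0
        for v in values:
            w = poisson1(rng)  # multiplicity of this row in the replicate
            total += w * v
            count += w
        if count:  # skip the (unlikely) replicate with no rows at all
            estimates.append(total / count)
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return math.sqrt(variance)
```

Because each row's multiplicity is drawn independently, a replicate can be computed in a single pass over the data, which is what makes this form of bootstrap attractive for large or distributed samples.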

Description of download artifacts

Artifact Name Description
snappydata-0.8-bin.tar.gz Full product binary (includes Hadoop 2.7)
snappydata-0.8-bin.zip Full product binary (includes Hadoop 2.7)
snappydata-0.8-without-hadoop-bin.tar.gz Product without the Hadoop dependency JARs
snappydata-0.8-without-hadoop-bin.zip Product without the Hadoop dependency JARs
snappydata-client-1.5.4.jar Client (JDBC) JAR
snappydata-core_2.11-0.8.jar Only dependency to connect to SnappyStore from Apache Spark 2.0.X cluster (Smart Connector mode)
snappydata-0.8.0.1-odbc32.zip 32-bit ODBC driver for 32-bit Windows. Extract and run the msi.
snappydata-0.8.0.1-odbc64.zip 64-bit ODBC driver for 64-bit Windows. Extract and run the msi.
snappydata-0.8.0.1-odbc32_64.zip 32-bit ODBC driver for 64-bit Windows. Extract and run the msi.
ODBC-and-Tableau-Setup.pdf Installation instructions for the ODBC driver including Tableau setup.
odbc-snappydata.tdc TDC file for Tableau setup (see setup guide for details)
pulse.war Needed only in RowStore mode. Classes for Pulse UI.

Get the Zeppelin interpreter jar, compatible with Apache Zeppelin 0.7, for SnappyData.

Pre-release

@ashetkar ashetkar released this Dec 21, 2016 · 1124 commits to master since this release

SnappyData 0.7 Release with the following major changes.

  • In sync and fully compatible with Apache Spark 2.0.2.
  • Try SnappyData without any download by adding it as a Spark dependency.
  • 20X faster than Spark in-memory caching. Try the simple perf example on your laptop. Some of the individual optimizations are listed below.
  • Performance optimizations:
    • New GROUP BY and HASH JOIN operators used with SnappyData storage tables that are 5-10X faster than the ones in Spark. (SNAP-1067)
    • Support for plan caching to reuse SparkPlan, RDD and PlanInfo (SNAP-1191)
    • Optimizations for single dictionary column with SnappyData's GROUP BY and JOIN operator that improve the performance further by 2-3X. (SNAP-1194)
    • Pooled version of Kryo serializer including for closures. Spark updated to allow for pluggable closure serializer. (SNAP-1136)
    • Column batch level statistics to allow query predicates to skip entire batches when possible. (SNAP-1087)
  • Reduce serialization overheads of biggest contributors in queries. (SNAP-1202)
  • Plan optimizations to minimize data shuffle and combine aggregates when possible. (SNAP-1260)
  • New SnappyData Dashboard as an extension to the Spark UI. Explore your SnappyData cluster and Spark artifacts in the same UI.
  • HowTos: Working code snippets of various features for developers to get started. Check out the docs for more details.
  • Amazon Web Services AMI and Docker image with SnappyData 0.7 now available. Refer to docs for more details.
  • Support for map, flatMap, filter, glom, mapPartition and transform APIs to SchemaDStream (SNAP-1182)
  • Use ConfigEntry mechanism for SnappyData properties (SNAP-1180)
  • INSTALL JAR utility to load application jars that are available to all the jobs submitted to SnappyData. This is in addition to the existing way of providing application jars using --jars in spark-submit.
  • EC2 scripts are now moved to a new repository with enhancements and fixes.
  • Several other bug-fixes and optimizations. See release notes for more details.
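
The column batch level statistics item above (SNAP-1087) refers to a common columnar-store technique: keep per-batch minimum and maximum values so that a predicate can rule out a whole batch without reading it. A minimal Python sketch of that idea follows. The `ColumnBatch` type, the helper names, and the single greater-than predicate are hypothetical and not taken from SnappyData.

```python
from dataclasses import dataclass


@dataclass
class ColumnBatch:
    values: list
    min_value: int  # smallest value in the batch
    max_value: int  # largest value in the batch


def make_batches(values, batch_size):
    """Split a column into batches and record min/max statistics for each."""
    batches = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        batches.append(ColumnBatch(chunk, min(chunk), max(chunk)))
    return batches


def scan_greater_than(batches, threshold):
    """Return (matching values, number of batches actually scanned)."""
    matches, scanned = [], 0
    for batch in batches:
        if batch.max_value <= threshold:
            continue  # no value in this batch can satisfy the predicate
        scanned += 1
        matches.extend(v for v in batch.values if v > threshold)
    return matches, scanned
```

For a column holding 0 to 99 in batches of 10, the predicate `value > 84` touches only the last two batches; the other eight are skipped using their statistics alone.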

SnappyData Synopses Data Engine:

  • Row count for sample tables is now displayed in SnappyData Dashboard.
  • Enabling HA semantics and redundancy for sample tables.
  • Other bug-fixes and performance improvements.

Description of download artifacts

Artifact Name Description
snappydata-0.7-bin.tar.gz Full product binary (includes Hadoop 2.7)
snappydata-0.7-bin.zip Full product binary (includes Hadoop 2.7)
snappydata-0.7-without-hadoop-bin.tar.gz Product without the Hadoop dependency JARs
snappydata-0.7-without-hadoop-bin.zip Product without the Hadoop dependency JARs
snappydata-client-1.5.3.jar Client (JDBC) JAR
snappydata-core_2.11-0.7.jar Only dependency to connect to SnappyStore from Apache Spark cluster (Smart Connector mode)
snappydata-ec2-0.7.tar.gz Script to Launch EC2 instances on AWS

Get the Zeppelin interpreter jar, compatible with Apache Zeppelin 0.6.1, for SnappyData.

@ashetkar ashetkar released this Oct 20, 2016 · 1334 commits to master since this release

SnappyData 0.6.1 (Row Store 1.5.2) Release with the following major changes over the previous release.

  • A failure in IMPORT used to cause the system to close the region and network interfaces. Threads are no longer interrupted, so this no longer happens. (SNAP-1138)
  • Added a service to publish store table size that is used for query plan generation.
    These stats are also published on Snappy store UI tab. (SNAP-1075)
  • Fixes for Streaming related issues after Spark 2.0 merge. (SNAP-1060, SNAP-1141, SNAP-1115)
  • Other bug-fixes. (SNAP-1083, SNAP-1113)