Releases: BlazingDB/blazingsql

One repo to rule them all and it's all Python

17 Dec 15:21

New Features:

  • Merged all the code repos for the whole stack into one repo
  • Pythonization of the whole BlazingSQL stack. See our blog post for more information
  • New API for querying performance and execution logs
  • Ability to create BlazingSQL tables from Hive tables
  • Partial support for non-equality joins. For example: SELECT * FROM tableA AS A INNER JOIN tableB AS B ON A.key = B.key AND A.this_date > B.that_date
  • Added arrow-provider
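
To make the non-equality join above concrete, here is a minimal sketch of the same query shape. It runs on Python's built-in sqlite3 module as a stand-in engine, not on BlazingSQL, and the sample rows are made up for illustration. Only the SQL text is taken from the note above.

```python
import sqlite3

# Stand-in engine: SQLite is used here only to show the join semantics.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (key INTEGER, this_date TEXT);
    CREATE TABLE tableB (key INTEGER, that_date TEXT);
    -- Made-up sample rows (ISO dates compare correctly as text)
    INSERT INTO tableA VALUES (1, '2019-12-10'), (2, '2019-11-01');
    INSERT INTO tableB VALUES (1, '2019-12-01'), (2, '2019-11-15');
""")

# Equality condition on key plus a non-equality condition on the dates
query = """
    SELECT A.key, A.this_date, B.that_date
    FROM tableA AS A
    INNER JOIN tableB AS B
      ON A.key = B.key AND A.this_date > B.that_date
    ORDER BY A.key
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Only the pair with key 1 survives, because for key 2 the date in tableA is not later than the date in tableB.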

Improvements:

  • Optimized simple queries that only have COUNT(*)
  • Removed limitation on number of operands for outer joins
  • Improved error messaging
  • Improvements to relational algebra optimization

Bug Fixes:

  • Fixed bug where a Python script running BlazingSQL would hang at the end of the script
  • Fixed bug when using wildcards for file paths with Dask distribution
  • Fixed bug with HDFS
  • Fixed bug with projects with large amounts of transformations on large GPUs
  • Fixed bug with multiple projections on the same column
  • Fixed COUNT(*) to properly ignore nulls
  • Fixed stability issues with certain queries running on 3 or more nodes
  • Fixed bug when querying a GDF with no transformations applied
  • Fixed bug with empty result sets
  • Fixed bug with empty column names

New string operators, performance improvements and many bug fixes

12 Nov 02:28
f3dd193

New Features

  • Implemented string concat operator
  • Implemented substring operator
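
As a rough illustration of what concat and substring do in a query, the sketch below uses Python's built-in sqlite3 module as a stand-in engine. The `||` operator and the `substr(text, start, length)` spelling are SQLite's. The exact syntax BlazingSQL accepts is not stated in these notes, so treat the spelling as an assumption. The table and row are made up.

```python
import sqlite3

# Stand-in engine: SQLite, used only to show the operator semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO people VALUES ('Ada', 'Lovelace')")

# || concatenates strings; substr is 1-indexed and takes (text, start, length)
full_name, initials = conn.execute(
    "SELECT first_name || ' ' || last_name, "
    "       substr(first_name, 1, 1) || substr(last_name, 1, 1) "
    "FROM people"
).fetchone()
print(full_name, initials)
```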

Improvements:

  • Improved management of services
  • Changed Apache Calcite schema database to an in-memory database
  • Improved performance of communication between nodes by enabling parallel messaging
  • Improved performance of data loading by enabling parallel file reading
  • Added new distributed join method for joining small tables

Bug Fixes:

  • Fixed various issues with Timestamp data types
  • Fixed issue when column names were too long
  • Fixed bug in relational algebra generation
  • Fixed various bugs in communication layer
  • Fixed bug with order by with strings
  • Fixed issue with parsing Apache Parquet file schemas
  • Fixed memory leak in joins
  • Fixed memory leak in communication layer
  • Fixed bug in table concatenation in distribution algorithms
  • Fixed bug when trying to join on columns of integers of different byte widths, or floats of different byte widths
  • Fixed bug when trying to do a union on columns of integers of different byte widths, or floats of different byte widths
  • Fixed bug in passing error message to user

Revamped data transport layer, LIKE operator and much more

22 Oct 01:30
39c05d9

New Features

  • Completely revamped the data transport layer, which is now much faster and more robust
  • Added support for LIKE operator
  • Added ability to create tables from Dask dataframes
  • Improved how services are launched from BlazingContext, including a new ready() function that checks whether all services are online and a shutdown() function that shuts down all services
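
For readers unfamiliar with LIKE, this short sketch shows standard pattern matching, where `%` matches any run of characters and `_` matches exactly one. It uses Python's built-in sqlite3 module as a stand-in engine, not BlazingSQL, and the file names are made up.

```python
import sqlite3

# Stand-in engine: SQLite, used only to show LIKE pattern semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("taxi_2019.parquet",), ("taxi_2018.csv",), ("weather.parquet",)],
)

# % matches any sequence of characters (including none)
prefix_matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM files WHERE name LIKE 'taxi%' ORDER BY name"
    )
]

# _ matches exactly one character (here the separator and the final year digit)
single_char_matches = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM files WHERE name LIKE 'taxi_201_.csv' ORDER BY name"
    )
]
print(prefix_matches, single_char_matches)
```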

Improvements

  • Improved performance logging
  • Now using in-memory H2 database for Apache Calcite table catalog
  • Updated to cudf v0.10

Bug Fixes

  • Fixed bug in expression parsing
  • Fixed various bugs with date literals, date functions and GDF_TIMESTAMP data type
  • Fixed bug with aliases
  • Fixed bug in order by for distributed queries when there are empty partitions
  • Fixed bug in creating tables from S3 directories
  • Fixed bug where predicate pushdown was not happening in certain types of queries

CAST and minor bug fixes

22 Oct 01:23
07b8590

New Features

  • Added support for CAST
  • Added file_format parameter to create_table. This parameter is used for when the file format is not determinable from the file extension.
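
The sketch below shows one way an explicit file_format override could interact with extension-based detection. The helper `resolve_file_format` and the `KNOWN_EXTENSIONS` mapping are hypothetical names written for this illustration. They are not BlazingSQL's internals, and the real lookup may differ.

```python
import os

# Hypothetical extension table, for illustration only.
KNOWN_EXTENSIONS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
    ".orc": "orc",
}


def resolve_file_format(path, file_format=None):
    """Return the explicit file_format if given, else infer it from the extension."""
    if file_format is not None:
        # An explicit value always wins over the extension
        return file_format.lower()
    extension = os.path.splitext(path)[1].lower()
    if extension in KNOWN_EXTENSIONS:
        return KNOWN_EXTENSIONS[extension]
    raise ValueError(
        f"cannot infer format of {path!r}; pass file_format explicitly"
    )


print(resolve_file_format("trips.CSV"))                   # inferred from extension
print(resolve_file_format("data/part-0000", "parquet"))   # explicit override
```

A file such as data/part-0000 has no extension to inspect, which is the case the file_format parameter to create_table is meant to cover.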

Bug Fixes

  • Fixed bug where aliases would sometimes not be set correctly

Fixed Conda Packaging Versioning

26 Sep 13:13
acddc93

New Features

  • Added file_format parameter to create_table to help create tables from files that don't have extensions

Bug Fixes

  • Fixed how releases are versioned for Conda
  • Fixed bug with joining against an empty table

Google Cloud Storage support and MORE!

20 Sep 02:36
3e728c3

New Features

  • Added support for CASE
  • Improved support for Boolean columns
  • Creating tables using wildcards in file paths
  • Added support for Google Cloud Storage
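
To illustrate what a wildcard in a file path selects, here is a small sketch using shell-style matching from Python's standard fnmatch module. It assumes shell-style `*` semantics. The exact pattern rules BlazingSQL applies are not stated in these notes, and the file listing is made up.

```python
import fnmatch

# Hypothetical directory listing, for illustration only.
available = [
    "/data/trips_2018.parquet",
    "/data/trips_2019.parquet",
    "/data/zones.csv",
]

# One pattern stands in for every file that should back a single table
pattern = "/data/trips_*.parquet"
matched = sorted(fnmatch.filter(available, pattern))
print(matched)
```

Both yearly trips files match the pattern, while zones.csv is left out.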

Bug Fixes

  • Fixed bug in GROUP BY with strings in a distributed cluster
  • Fixed issues in how BlazingContext launches processes
  • Fixed issue where releases were being done in Debug mode
  • Fixed bug related to creating multiple tables with the same name

Conda Install and JSON and ORC file tables

20 Sep 02:21
04e5637

New Features:

  • Ability to compile and install using Conda
  • Creating a BlazingContext can now automatically launch processes
  • Support for creating tables from JSON and ORC files
  • Added more CSV parsing parameters for creating tables from CSV files
  • Updated to use cudf v0.9 release
  • Added support for LIMIT

Bug fixes

  • Fixed bug with processing queries using date literals
  • Fixed distribution issues with data with nulls

Distributed Query Execution

16 Aug 18:29
d237503

A great deal has happened since we last released.

  • We now support distributed query execution!
  • Distributed results output to dask-cudf
  • Updated to cuDF 0.9
  • Millions, literally millions, of bug fixes.
  • You no longer need to put main. before table names. That was awful.
    bc.sql('select * from main.table_name') --> bc.sql('select * from table_name')

simple-distribution-tcp-cudf0.7

14 Jun 20:43
Pre-release

before cudf 0.8 and before table scan