Releases: BlazingDB/blazingsql
One repo to rule them all and it's all Python
New Features:
- Merged all the code repos for the whole stack into one repo
- Pythonization of the whole BlazingSQL stack. See our blog post for more information
- New API for querying performance and execution logs
- Ability to create BlazingSQL tables from Hive tables
- Partial support for non-equality joins. For example: SELECT * FROM tableA as A INNER JOIN tableB as B ON A.key = B.key AND A.this_date > B.that_date
- Added arrow-provider
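To make the non-equality join condition concrete, here is the example query run against Python's built-in sqlite3 module. SQLite is only a stand-in for illustrating the semantics (this is not how BlazingSQL executes joins), and the tables and rows are made up. A row pair is kept only when the keys match and A.this_date is later than B.that_date.

```python
import sqlite3

# In-memory SQLite database: a stand-in for illustrating join semantics only,
# not BlazingSQL. Table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (key INTEGER, this_date TEXT);
    CREATE TABLE tableB (key INTEGER, that_date TEXT);
    INSERT INTO tableA VALUES (1, '2020-01-10'), (1, '2019-12-31'), (2, '2020-02-01');
    INSERT INTO tableB VALUES (1, '2020-01-01'), (2, '2020-03-01');
""")

# Equality on key plus an inequality on the dates; ISO-8601 date strings
# (YYYY-MM-DD) compare correctly as text.
rows = conn.execute("""
    SELECT A.key, A.this_date, B.that_date
    FROM tableA AS A
    INNER JOIN tableB AS B
        ON A.key = B.key AND A.this_date > B.that_date
    ORDER BY A.key, A.this_date
""").fetchall()

for row in rows:
    print(row)
```

Only the first key-1 row of tableA survives: the other key-1 row is dated before B.that_date, and the key-2 row fails the date comparison.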
Improvements:
- Optimized simple queries that only have COUNT(*)
- Removed limitation on number of operands for outer joins
- Improved error messaging
- Improvements to relational algebra optimization
Bug Fixes:
- Fixed bug where a Python script running BlazingSQL would hang at the end of the script
- Fixed bug when using wildcards for file paths and using dask distribution
- Fixed bug with HDFS
- Fixed bug with projections containing large numbers of transformations on large GPUs
- Fixed bug with multiple projections on the same column
- Fixed COUNT(*) to properly ignore nulls
- Fixed stability issues with certain queries running on 3 or more nodes
- Fixed bug when querying a GDF with no transformations applied
- Fixed bug with empty result sets
- Fixed bug with empty column names
New string operators, performance improvements and many bug fixes
New Features
- Implemented string concat operator
- Implemented substring operator
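A small illustration of string concatenation (||) and substring extraction, using Python's sqlite3 module as a stand-in rather than BlazingSQL, with made-up data. The sketch uses SQLite's substr(s, start, length); standard SQL spells this SUBSTRING(s FROM start FOR length), and these notes do not say which forms BlazingSQL accepts.

```python
import sqlite3

# Stand-in database for illustration only (not BlazingSQL); data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Ada", "Lovelace"), ("Alan", "Turing")],
)

rows = conn.execute("""
    SELECT first_name || ' ' || last_name AS full_name,  -- concatenation
           substr(last_name, 1, 3) AS prefix              -- 1-based start, length 3
    FROM people
    ORDER BY first_name
""").fetchall()

for row in rows:
    print(row)
```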
Improvements:
- Improved management of services
- Changed Apache Calcite schema database to an in-memory database
- Improved performance of communication between nodes by enabling parallel messaging
- Improved performance of data loading by enabling parallel file reading
- Added new distributed join method for joining small tables
Bug Fixes:
- Fixed various issues with Timestamp data types
- Fixed issue when column names were too long
- Fixed bug in relational algebra generation
- Fixed various bugs in communication layer
- Fixed bug with ORDER BY on string columns
- Fixed issue with parsing Apache Parquet file schemas
- Fixed memory leak in joins
- Fixed memory leak in communication layer
- Fixed bug in table concatenation in distribution algorithms
- Fixed bug when trying to join on columns of integers of different byte widths, or floats of different byte widths
- Fixed bug when trying to do a union on columns of integers of different byte widths, or floats of different byte widths
- Fixed bug in passing error messages to the user
Revamped data transport layer, LIKE operator and much more
New Features
- Completely revamped data transport layer that is much faster and more robust
- Added support for LIKE operator
- Added ability to create tables from Dask dataframes.
- Improved how services are launched from BlazingContext, including a new ready() function that checks whether all services are online and a shutdown() function that shuts down all services
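The LIKE matching rules can be shown with Python's sqlite3 module on hypothetical data; this is a stand-in, not BlazingSQL. '%' matches any run of zero or more characters. SQLite's LIKE is case-insensitive for ASCII letters by default, so case handling may differ from BlazingSQL's.

```python
import sqlite3

# Stand-in database for illustration only (not BlazingSQL); file names are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (name TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?)",
    [("sales_2019.csv",), ("sales_2020.csv",), ("costs_2020.parquet",)],
)

# Prefix match: names starting with 'sales'.
starts_with_sales = conn.execute(
    "SELECT name FROM files WHERE name LIKE 'sales%' ORDER BY name"
).fetchall()

# Substring match: names containing '2020' anywhere.
contains_2020 = conn.execute(
    "SELECT name FROM files WHERE name LIKE '%2020%' ORDER BY name"
).fetchall()

print(starts_with_sales)
print(contains_2020)
```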
Improvements
- Improved performance logging
- Now using in-memory H2 database for Apache Calcite table catalog
- Updated to cudf v0.10
Bug Fixes
- Fixed bug in expression parsing
- Fixed various bugs with date literals, date functions and GDF_TIMESTAMP data type
- Fixed bug with aliases
- Fixed bug in ORDER BY for distributed queries when there are empty partitions
- Fixed bug in creating tables from S3 directories
- Fixed bug where predicate pushdown was not happening in certain types of queries
CAST and minor bug fixes
New Features
- Added support for CAST
- Added file_format parameter to create_table. This parameter is used when the file format cannot be determined from the file extension.
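A common reason to CAST is numbers stored as text. This sketch uses Python's sqlite3 module with a made-up table rather than BlazingSQL: ordering the text column gives lexicographic order, while casting to INTEGER gives numeric order.

```python
import sqlite3

# Stand-in database for illustration only (not BlazingSQL); values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw (amount TEXT)")
conn.executemany("INSERT INTO raw VALUES (?)", [("12",), ("7",), ("100",)])

# As text, values sort lexicographically: '100' < '12' < '7'.
as_text = conn.execute("SELECT amount FROM raw ORDER BY amount").fetchall()

# After CAST, values are integers and sort numerically.
as_int = conn.execute(
    "SELECT CAST(amount AS INTEGER) FROM raw ORDER BY CAST(amount AS INTEGER)"
).fetchall()

print(as_text)
print(as_int)
```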
Bug Fixes
- Fixed bug where aliases would sometimes not be set correctly
Fixed Conda Packaging Versioning
New Features
- Added file_format parameter to create_table to help create tables from files that don't have extensions
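Why an explicit format is needed can be sketched with a small hypothetical helper. infer_file_format and the EXTENSION_FORMATS table are illustrative assumptions, not part of BlazingSQL's API: the extension decides the format unless file_format is passed, and a path with no recognizable extension fails until file_format is supplied.

```python
import os

# Hypothetical extension table; the exact extensions and format names
# BlazingSQL recognizes are not listed in these release notes.
EXTENSION_FORMATS = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
    ".orc": "orc",
}


def infer_file_format(path, file_format=None):
    """Hypothetical helper: an explicit file_format wins, else use the extension."""
    if file_format is not None:
        return file_format
    _, ext = os.path.splitext(path)
    try:
        return EXTENSION_FORMATS[ext.lower()]
    except KeyError:
        raise ValueError(
            f"cannot determine file format of {path!r}; pass file_format explicitly"
        ) from None


print(infer_file_format("data/sales.parquet"))                 # from the extension
print(infer_file_format("data/part-00000", file_format="csv"))  # explicit override
```

Calling infer_file_format("data/part-00000") without file_format raises ValueError, which is the situation the parameter exists to resolve.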
Bug Fixes
- Fixed how releases are versioned for Conda
- Fixed bug with joining against an empty table
Google Cloud Storage support and MORE!
New Features
- Added support for CASE
- Improved support for Boolean columns
- Creating tables using wildcards in file paths
- Added support for Google Cloud Storage
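A searched CASE expression, shown with Python's sqlite3 module and a made-up orders table (a stand-in, not BlazingSQL). WHEN branches are tested in order and the first match wins, so the 'medium' branch only sees totals below 100.

```python
import sqlite3

# Stand-in database for illustration only (not BlazingSQL); orders are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 15.0), (2, 250.0), (3, 80.0)],
)

rows = conn.execute("""
    SELECT id,
           CASE
               WHEN total >= 100 THEN 'large'
               WHEN total >= 50 THEN 'medium'
               ELSE 'small'
           END AS size
    FROM orders
    ORDER BY id
""").fetchall()

for row in rows:
    print(row)
```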
Bug Fixes
- Fixed bug in GROUP BY queries on strings in a distributed cluster
- Fixed issues in how BlazingContext launches processes
- Fixed issue where releases were being done in Debug mode
- Fixed bug related to creating multiple tables with the same name
Conda Install and JSON and ORC file tables
New Features:
- Ability to compile and install using Conda
- Creating a BlazingContext now automatically launches processes
- Support for creating tables from JSON and ORC files
- Added more CSV parsing parameters for creating tables from CSV files
- Updated to use cudf v0.9 release
- Added support for LIMIT
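LIMIT caps the number of rows returned. This sketch uses Python's sqlite3 module on hypothetical data instead of BlazingSQL. Without an ORDER BY, which rows LIMIT keeps is not guaranteed, so the query sorts first to get a deterministic top two.

```python
import sqlite3

# Stand-in database for illustration only (not BlazingSQL); scores are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (player TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?)",
    [("a", 10), ("b", 40), ("c", 25), ("d", 5)],
)

# Sort before limiting so the result is deterministic.
top_two = conn.execute(
    "SELECT player, points FROM scores ORDER BY points DESC LIMIT 2"
).fetchall()

print(top_two)
```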
Bug fixes
- Fixed bug with processing queries using date literals
- Fixed distribution issues with data with nulls
Distributed Query Execution
A great deal has happened since we last released.
- We now support distributed query execution!
- Distributed results output to dask-cudf
- Updated to cuDF 0.9
- Millions, literally millions, of bug fixes.
- No longer need the main. prefix before table names. That was awful.
  Before: bc.sql('select * from main.table_name')
  Now:    bc.sql('select * from table_name')
simple-distribution-tcp-cudf0.7
before cudf 0.8 and before table scan