Releases · BlazingDB/blazingsql · GitHub

17 Dec 15:21

wmalpica

One repo to rule them all, and it's all Python

New Features:

Merged all the code repos for the whole stack into one repo
Pythonization of the whole BlazingSQL stack. See our blog post for more information
New API for being able to query performance and execution logs
Ability to create BlazingSQL tables from Hive tables
Partial support for non-equality joins. For example: SELECT * FROM tableA AS A INNER JOIN tableB AS B ON A.key = B.key AND A.this_date > B.that_date
Added arrow-provider
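
The non-equality join condition listed above mixes an equality key with an inequality on a second column. A minimal sketch of what such a query returns, using Python's built-in sqlite3 as a stand-in engine (an assumption made so the example runs anywhere; it is not BlazingSQL, and the sample rows are invented for illustration):

```python
import sqlite3

# Stand-in engine: SQLite from the standard library, NOT BlazingSQL.
# The table and column names follow the example query in the notes;
# the rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (key INTEGER, this_date TEXT);
    CREATE TABLE tableB (key INTEGER, that_date TEXT);
    INSERT INTO tableA VALUES (1, '2019-12-01'), (2, '2019-01-01'), (3, '2019-06-01');
    INSERT INTO tableB VALUES (1, '2019-11-01'), (2, '2019-02-01'), (4, '2019-01-01');
""")

query = """
    SELECT A.key, A.this_date, B.that_date
    FROM tableA AS A
    INNER JOIN tableB AS B
        ON A.key = B.key AND A.this_date > B.that_date
    ORDER BY A.key
"""

# Only key 1 satisfies both the equality and the date inequality.
rows = conn.execute(query).fetchall()
print(rows)
```

With BlazingSQL the same query string would presumably be passed to bc.sql(); since the notes describe the support as partial, some forms of inequality condition may not be accepted.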

Improvements:

Optimized simple queries that only have COUNT(*)
Removed limitation on number of operands for outer joins
Improved error messaging
Improvements to relational algebra optimization

Bug Fixes:

Fixed bug where a Python script running BlazingSQL would hang at the end of the script
Fixed bug when using wildcards for file paths together with Dask distribution
Fixed bug with HDFS
Fixed bug with projects with large amounts of transformations on large GPUs
Fixed bug with multiple projections on the same column
Fixed COUNT(*) to properly ignore nulls
Fixed stability issues with certain queries running on 3 or more nodes
Fixed bug when querying a GDF with no transformations applied
Fixed bug with empty result sets
Fixed bug with empty column names


12 Nov 02:28

wmalpica

New string operators, performance improvements and many bug fixes

New Features

Implemented string concat operator
Implemented substring operator
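
These notes do not spell out the syntax of the two new string operators. As a rough illustration of what concatenation and substring extraction do in SQL, here is a sketch using Python's built-in sqlite3 with SQLite's own `||` operator and substr() function. Both the engine and the exact spelling are assumptions; BlazingSQL may use different syntax, and the table is invented for illustration.

```python
import sqlite3

# Stand-in engine: SQLite from the standard library, NOT BlazingSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (first_name TEXT, last_name TEXT)")
conn.execute("INSERT INTO people VALUES ('Ada', 'Lovelace')")

# '||' concatenates strings; substr(s, start, length) uses 1-based positions.
full_name, prefix = conn.execute(
    "SELECT first_name || ' ' || last_name, substr(last_name, 1, 4) FROM people"
).fetchone()
print(full_name, prefix)
```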

Improvements:

Improved management of services
Changed Apache Calcite schema database to an in-memory database
Improved performance of communication between nodes by enabling parallel messaging
Improved performance of data loading by enabling parallel file reading
Added new distributed join method for joining small tables

Bug Fixes:

Fixed various issues with Timestamp data types
Fixed issue when column names were too long
Fixed bug in relational algebra generation
Fixed various bugs in communication layer
Fixed bug with order by with strings
Fixed issue with parsing Apache Parquet file schemas
Fixed memory leak in joins
Fixed memory leak in communication layer
Fixed bug in table concatenation in distribution algorithms
Fixed bug when trying to join on columns of integers of different byte widths, or floats of different byte widths
Fixed bug when trying to do a union on columns of integers of different byte widths, or floats of different byte widths
Fixed bug in passing error message to user


22 Oct 01:30

wmalpica

Revamped data transport layer, LIKE operator and much more

New Features

Completely revamped the data transport layer, which is now much faster and more robust
Added support for LIKE operator
Added ability to create tables from Dask dataframes.
Improved how services are launched from BlazingContext. This includes a new ready() function, which checks whether all services are online, and a shutdown() function, which shuts down all services.
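
One way ready() and shutdown() might be used together is to poll until the services are online and to always shut them down afterwards. The sketch below assumes ready() returns a truthy value once everything is online (the notes do not state the return type). The wait_until_ready helper and the FakeContext class are hypothetical stand-ins so the example runs without BlazingSQL installed.

```python
import time


def wait_until_ready(context, timeout=30.0, interval=0.5):
    """Poll context.ready() until it is truthy or the timeout elapses.

    Hypothetical helper, not part of BlazingSQL. Assumes ready() returns
    a truthy value when all services are online.
    """
    deadline = time.monotonic() + timeout
    while True:
        if context.ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


class FakeContext:
    """Hypothetical stand-in for BlazingContext, used only for this demo."""

    def __init__(self, checks_until_ready):
        self._remaining = checks_until_ready
        self.stopped = False

    def ready(self):
        # Report "online" only after a fixed number of checks.
        self._remaining -= 1
        return self._remaining <= 0

    def shutdown(self):
        self.stopped = True


ctx = FakeContext(checks_until_ready=3)
try:
    online = wait_until_ready(ctx, timeout=5.0, interval=0.01)
finally:
    # Always release the services, even if startup or a query failed.
    ctx.shutdown()
```

Wrapping the work in try/finally keeps services from lingering if the script fails partway through.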

Improvements

Improved performance logging
Now using in-memory H2 database for Apache Calcite table catalog
Updated to cudf v0.10

Bug Fixes

Fixed bug in expression parsing
Fixed various bugs with date literals, date functions and GDF_TIMESTAMP data type
Fixed bug with aliases
Fixed bug in order by for distributed queries when there are empty partitions
Fixed bug in creating tables from S3 directories
Fixed bug where predicate pushdown was not happening in certain types of queries


22 Oct 01:23

wmalpica

CAST and minor bug fixes

New Features

Added support for CAST
Added file_format parameter to create_table. Use this parameter when the file format cannot be determined from the file extension.
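
The idea behind file_format can be sketched as "an explicit format wins, otherwise fall back to the extension". The helper below is a hypothetical illustration of that rule, not BlazingSQL's implementation; the function name, the extension mapping, and the format strings are all assumptions.

```python
import os

# Hypothetical mapping from file extension to format name (assumption).
EXTENSION_TO_FORMAT = {
    ".csv": "csv",
    ".parquet": "parquet",
    ".json": "json",
    ".orc": "orc",
}


def resolve_file_format(path, file_format=None):
    """Return the format to use for `path`.

    An explicit file_format takes precedence; otherwise the extension
    is consulted. Raises ValueError when neither gives an answer.
    """
    if file_format is not None:
        return file_format.lower()
    ext = os.path.splitext(path)[1].lower()
    try:
        return EXTENSION_TO_FORMAT[ext]
    except KeyError:
        raise ValueError(
            f"Cannot determine format of {path!r}; pass file_format explicitly"
        ) from None


print(resolve_file_format("data/sales.csv"))                        # from extension
print(resolve_file_format("data/part-0000", file_format="parquet")) # explicit
```

In BlazingSQL itself the call would presumably look like bc.create_table('my_table', path, file_format='parquet'); the parameter name comes from these notes, while the accepted value strings are an assumption.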

Bug Fixes

Fixed bug where aliases would sometimes not be set correctly


26 Sep 13:13

wmalpica

Fixed Conda Packaging Versioning

New Features

Added file_format parameter to create_table to help create tables from files that don't have extensions

Bug Fixes

Fixed how releases are versioned for Conda
Fixed bug with joining against an empty table


20 Sep 02:36

wmalpica

Google Cloud Storage support and MORE!

New Features

Added support for CASE
Improved support for Boolean columns
Creating tables using wildcards in file paths
Added support for Google Cloud Storage

Bug Fixes

Fixed bug in GROUP BYs with strings in a distributed cluster
Fixed issues in how BlazingContext launches processes
Fixed issue where releases were being done in Debug mode
Fixed bug related to creating multiple tables with the same name


20 Sep 02:21

wmalpica

Conda Install and JSON and ORC file tables

New Features:

Ability to compile and install using Conda
Creating a BlazingContext can now automatically launch processes
Support for creating tables from JSON and ORC files
Added more CSV parsing parameters for creating tables from CSV files
Updated to use cudf v0.9 release
Added support for LIMIT

Bug Fixes

Fixed bug with processing queries using date literals
Fixed distribution issues with data with nulls


16 Aug 18:29

roaramburu

Distributed Query Execution

A great deal has happened since we last released.

We now support distributed query execution!
Distributed results output to dask-cudf
Updated to cuDF 0.9
Millions, literally millions, of bug fixes.
No longer use main. before any table names. That was awful.
bc.sql('select * from main.table_name') --> bc.sql('select * from table_name')
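
For scripts written against earlier releases, the main. prefix can be stripped from query strings mechanically. The helper below is a hypothetical sketch of that migration, not something shipped with BlazingSQL. It is a plain regex substitution, so it would also rewrite a string literal that happens to contain "main."; review the results by hand.

```python
import re

# Matches the schema prefix "main." only when it starts a word and is
# followed by an identifier character, so "domain.x" is left alone.
_MAIN_PREFIX = re.compile(r"\bmain\.(?=\w)")


def strip_main_prefix(query):
    """Remove the legacy 'main.' schema prefix from table names (hypothetical helper)."""
    return _MAIN_PREFIX.sub("", query)


old_query = "select * from main.table_name"
new_query = strip_main_prefix(old_query)
print(new_query)
```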


14 Jun 20:43

aucahuasi

simple-distribution-tcp-cudf0.7

Pre-release

before cudf 0.8 and before table scan
