BlazingSQL 21.08.00 (TBD)

New Features

Improvements

#1571 Update ucx-py versions to 0.21
#1554 return ok for filesystems
#1572 Setting up default value for max_bytes_chunk_read to 256 MB

Bug Fixes

#1570 Fix build due to changes in rmm device buffer
#1576 Fix CC/CXX variables in CI
#1581 Fix latest cudf dependencies

BlazingSQL 21.06.00 (June 10th, 2021)

New Features

#1471 Unbounded partitioned windows
#1445 Support for CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP
#1505 Support for right outer join
#1523 Support for DURATION type
#1552 Support for concurrency in E2E tests

Improvements

#1464 Better Support for unsigned types in C++ side
#1511 Folder refactoring related to caches, kernels, execution_graph, BlazingTable
#1522 Improve data loading when the algebra contains only BindableScan/Scan and Limit
#1524 Enable support for spdlog 1.8.5
#1547 Update RAPIDS version references
#1539 Support ORDERing by null values
#1551 Support for spdlog 1.8.5
#1553 multiple columns inside COUNT() statement

Bug Fixes

#1455 Support for IS NOT FALSE condition
#1502 Fix IS NOT DISTINCT FROM with joins
#1475 Fix wrong results from timestampdiff/add
#1528 Fixed build issues due to cudf aggregation API change
#1540 Comparing param set to true for e2e
#1543 Enables provider unit_tests
#1548 Fix orc statistic building
#1550 Fix Decimal/Fixed Point issue
#1519 Fix for max_bytes_chunk_read param to csv files
#1559 Fix ucx-py versioning specs
#1557 Reading chunks of max bytes for csv files

BlazingSQL 0.19.0 (April 21, 2021)

New Features

#1367 OverlapAccumulator Kernel
#1364 Implement the concurrent API (bc.sql with token, bc.status, bc.fetch)
#1426 Window Functions without partitioning
#1349 Add e2e test for Hive Partitioned Data
#1396 Create tables from other RDBMS
#1427 Support for CONCAT alias operator
#1424 Add get physical plan with explain
#1472 Implement predicate pushdown for data providers

Improvements

#1325 Refactored CacheMachine.h and CacheMachine.cpp
#1322 Updated and enabled several E2E tests
#1333 Fixing build due to cudf update
#1344 Removed GPUCacheDataMetadata class
#1376 Fixing build due to some strings refactor in cudf, undoing the replace workaround
#1430 Updating GCP to >= version
#1331 Added flag to enable null e2e testing
#1418 Adding support for docker image
#1434 Added documentation for C++ and Python in Sphinx
#1419 Added concat cache machine timeout
#1444 Updating GCP to >= version
#1447 Improve getting estimated output num rows
#1473 Added Warning to Window Functions
#1482 Improve test script for blazingsql-testing-file
#1480 Improve dependencies script
#1433 Adding ArrowCacheData, refactoring CacheData files

Bug Fixes

#1335 Fixing uninitialized var in orc metadata and handling the parseMetadata exceptions properly
#1339 Handling properly the nulls in case conditions with strings
#1346 Delete allocated host chunks
#1348 Capturing error messages due to exceptions properly
#1350 Fixed bug where there are no projects in a bindable table scan
#1359 Avoid CUDA issues when freeing pinned memory
#1365 Fixed build after sublibs changes on cudf
#1369 Updated java path for powerpc build
#1371 Fixed e2e settings
#1372 Recompute columns_to_hash in DistributeAggregationKernel
#1375 Fix empty row_group_ids for parquet
#1380 Fixed issue with int64 literal values
#1379 Remove ProjectRemoveRule
#1389 Fix issue when CAST a literal
#1387 Skip getting orc metadata for decimal type
#1392 Fix substrings with nulls
#1398 Fix performance regression
#1401 Fix support for minus unary operation
#1415 Fixed bug where num_batches was not getting set in BindableTableScan
#1413 Fix for null tests 13 and 23 of windowFunctionTest
#1416 Fix full join when both tables contain nulls
#1423 Fix temporary directory for hive partition test
#1351 Fixed 'count distinct' related issues
#1425 Fix for new joins API
#1400 Fix for column aliases when a Join op exists
#1456 Raising exceptions on Python side for RAL
#1466 SQL providers: update README.md
#1470 Fix pre compiler flags for sql parsers
#1504 Fixing some conflicts in Dockerfile

Deprecated Features

#1394 Disabled support for outer joins with inequalities

BlazingSQL 0.18.0 (February 24, 2021)

New Features

#1139 Adding centralized task executor for kernels
#1200 Implement string REGEXP_REPLACE
#1237 Added task memory management
#1244 Added memory monitor ability to downgrade task data
#1232 Update PartwiseJoin and JoinPartition kernel using the task executor internally
#1238 Implements MergeStreamKernel executor model
#1259 Implements SortAndSampleKernel executor model, also avoid setting up num of samples
#1271 Added Hive utility for partitioned data
#1289 Multiple concurrent query support
#1285 Infer PROTOCOL when Dask client is passed
#1294 Add config options for logger
#1301 Added usage of pinned buffers for communication and fixes various UCX related bugs
#1298 Implement progress bar for run query (using tqdm)
#1284 Initial support for Window Functions
#1303 Add support for INITCAP
#1313 getting and using ORC metadata
#1347 Fixing issue when reading orc metadata from DATE dtype
#1338 Window Function support for LEAD and LAG statements
#1362 give useful message when file extension is not recognized
#1361 Supporting first_value and last_value for Window Function

Improvements

#1293 Added optional acknowledgments to message sending
#1236 Moving code from header files to implementation files
#1257 Expose the reset max memory usage C++ API to python
#1256 Improve Logical project documentation
#1262 Stop depending on gtest for runtime
#1261 Improve storage plugin output messages
#1153 Enable warnings and fixes
#1267 Added retries to comms, fixed deadlocks in executor and order by. Improved logging and error management. Caches have names. Improved Joins
#1239 Reducing Memory pressure by moving shuffle data to cpu before transmission
#1278 Fix race conditions with UCX
#1279 Added cuml to powerpc build scripts
#1286 Fixes to initialization and adding unique ids to comms
#1255 Kernels are resilient to out of memory errors now and can retry tasks that fail this way
#1311 Add queries logger to physical plan
#1308 Improve the engine loggers
#1314 Added unit tests to verify that OOM error handling works well
#1320 Revamping cache logger
#1323 Made progress bar update continuously and stay after query is done
#1336 Improvements for the cache API
#1483 Improve dependencies script

Bug Fixes

#1249 Fix compilation with cuda 11
#1253 Fixed distribution so that it distributes evenly based on rowgroups
#1204 Reenable json parser
#1241 Fixed cython exception handling
#1243 Fixed wrong CHAR regex replacing
#1275 Fixed issue in StringUtil::findAndReplaceAll when there are several matches
#1277 Support FileSystems (GS, S3) when the file extensions are not provided
#1300 Fixed issue when creating tables from a local dir relative path
#1312 Fix progress bar for jupyterlab
#1318 Disabled require acknowledge

BlazingSQL 0.17.0 (December 10, 2020)

New Features

#1105 Implement to_date/to_timestamp functions
#1077 Allow to create tables from compressed files
#1126 Add DAYOFWEEK function
#981 Added powerPC building script and instructions
#912 Added UCX support to how the engine runs
#1125 Implement new TCP and UCX comms layer, exposed graph to python
#1122 Add ConfigOptionsTest, a test with different config_options values
#1110 Adding local logging directory to BlazingContext
#1148 Add e2e test for DAYOFWEEK
#1130 Infer hive folder partition
#1188 Implement upper/lower operators
#1193 Implement string REPLACE
#1218 Added smiles test set
#1201 Implement string TRIM
#1216 Add unit test for DAYOFWEEK
#1205 Implement string REVERSE
#1220 Implement string LEFT and RIGHT
#1223 Add support for UNION statement
#1250 updated README.md and CHANGELOG and others preparing for 0.17 release

Improvements

#878 Adding calcite rule for window functions. (Window functions not supported yet)
#1081 Add validation for the kwargs when bc API is called
#1082 Validate s3 bucket
#1093 Logs configurable to have max size and be rotated
#1091 Improves the error message problem when validating any GCP bucket
#1102 Add option to read csv files in chunks
#1090 Add tests for Uri Data provider for local uri
#1119 Add tests for transform json tree and get json plan
#1117 Add error logging in DataSourceSequence
#1111 output compile json for cppcheck
#1132 Refactoring new comms
#1078 Bump junit from 4.12 to 4.13.1 in /algebra
#1144 update with changes from main
#1156 Added scheduler file support for e2e testing framework
#1158 Deprecated bc.partition
#1154 Recompute the avg_bytes_per_row value
#1155 Removing comms subproject and cleaning some related code
#1170 Improve gpuCI scripts
#1194 Powerpc building scripts
#1186 Removing cuda labels to install due to cudatoolkit version
#1187 Enable MySQL-specific SQL operators in addition to Standard and Oracle
#1206 Improved contribution documentation
#1224 Added cudaSetDevice to thread initialization so that the cuda context is available to UCX
#1229 Change hardcoded version from setup.py
#1231 Adding docker support for gpuCI scripts
#1248 Jenkins and Docker scripts were improved for building

Bug Fixes

#1064 Fixed issue when loading parquet files with local_files=True
#1086 Showing an appropriate error to indicate that we don't support opening directories with wildcards
#1088 Fixed issue caused by cudf changing from one .so file to multiple
#1094 Fixed logging directory setup
#1100 Showing an appropriate error for invalid or unsupported expressions on the logical plan
#1115 Fixed changes to RMM api using cuda_stream_view instead of cudaStream_t now
#1120 Fix missing valid kwargs in create_table
#1118 Fixed issue with config_options and adding local_files to valid params
#1133 Fixed addressing issue in float columns when parsing parquet metadata
#1163 added empty line to trigger build
#1108 Remove temp files when an error occurs
#1165 E2e tests, distributed mode, again tcp
#1171 Don't log timeout in output/input caches
#1168 Fix SSL errors for conda
#1164 MergeAggr when single node has multiple batches
#1191 Fix graph thread pool hang when exception is thrown
#1181 Remove unnecessary prints (cluster and logging info)
#1185 Fix create table crash in distributed mode with an InferFolderPartitionMetadata error
#1179 Fix ignoring headers when multiple CSV files were provided
#1199 Fix non thread-safe access to map containing tag to message_metadata for ucx
#1196 Fix column_names (table) always as list of string
#1203 Changed code back so that parquet is not read a single rowgroup at a time
#1207 Calcite uses literal as int32 if no explicit CAST was provided
#1212 Fixed issue when building the thirdparty, cmake version set to 3.18.4
#1225 Fixed issue due to change in gather API
#1254 Fixing support of nightly and stable on localhost
#1258 Fixing gtest version issue

BlazingSQL 0.16.0 (October 22, 2020)

Improvements

#997 Add capacity to set the transport memory
#1040 Update conda recipe, remove cxx11 abi from cmake
#977 Just one initialize() function at beginning and add logs related to allocation stuff
#1046 Make it possible to read the system environment variables to set up BlazingContext
#998 Update TPCH queries, convert implicit joins into explicit joins to avoid random values.
#1055 Removing cudf source code dependency as some cudf utilities headers were exposed
#1065 Remove thrift from build process as it's no longer used
#1067 Upload conda packages to both rapidsai and blazingsql conda channels

Bug Fixes

#918 Activate validation for GPU_CI tests results.
#975 Fixed issue due to cudf orc api change
#1017 Fixed issue parsing fixed with string literals
#1019 Fix hive string col
#1021 removed an rmm include
#1020 Fixed build issues with latest rmm 0.16 and columnBasisTest due to deprecated drop_column() function
#1029 Fix metadata mismatch due to parsedMetadata
#1016 Removed workaround for parquet read schema
#1022 Fix pinned buffer pool
#1028 Match dtypes after create_table with multiple files
#1030 Avoid read _metadata files
#1039 Fixed issues with parsers, in particular ORC parser was misbehaving
#1038 Fixed issue with logging dirs in distributed envs
#1048 Pinned google cloud version to 1.16
#1052 Partial revert of some changes on parquet rowgroups flow with local_files=True
#1054 Can manually set BLAZING_CACHE_DIRECTORY
#1053 Fixed issue when loading paths with wildcards
#1057 Fixed issue with concat all in concatenating cache
#1007 Fix arrow and spdlog compilation issues
#1068 Add important docs links and avoid the message about filesystem authority not found
#1073 Fixed issue where parseSchemaPython can throw exceptions
#1074 Remove lock inside grow() method from PinnedBufferProvider
#1071 Fix crash when loading an empty folder
#1085 Fixed intra-query memory leak in joins. Fixed by clearing array caches after PartwiseJoin is done
#1096 Backport from branch-0.17 with these PRs: #1094, #1086, #1093 and #1091
#1099 Fixed issue with config_options

BlazingSQL 0.15.0 (August 31, 2020)

New Features

#835 Added a memory monitor for better memory management and added pull ordered from cache
#889 Added Sphinx based code architecture documentation
#968 Support PowerPC architecture

Improvements

#777 Update Calcite to the most recent version 1.23
#786 Added check for concat String overflow
#815 Implemented Unordered pull from cache to help performance
#822 remove "from_cudf" code and cudf test utilities from engine code
#824 Added a test on Calcite to compare the logical plans when the ruleset is updated
#802 Support for timestampadd and constant expressions evaluation by Calcite
#849 Added check for CUDF_HOME to allow build to use an existing prebuilt cudf source tree
#829 Python/Cython check code style
#826 Support cross join
#866 Added nogil statements for pure C functions in Cython
#784 Updated set of TPCH queries on the E2E tests
#877 round robin dask workers on single gpu queries
#880 reraising query errors in context.py
#883 add rand() and running unary operations on literals
#894 added exhale to generate doxygen for sphinx docs
#887 concatenating cache improvement and replacing PartwiseJoin::load_set with a concatenating cache
#885 Added initial set of unit tests for WaitingQueue and nullptr checks around spdlog calls
#904 Added doxygen comments to CacheMachine.h
#901 Added more documentation about memory management
#910 updated readme
#915 Adding max kernel num threads pool
#921 Make AWS and GCS optional
#925 Replace random_generator with cudf::sample
#900 Added doxygen comments to some kernels and the batch processing
#936 Adding extern C for include files
#941 Logging level (flush_on) can be configurable
#947 Use default client and network interface from Dask
#945 Added new separate thresh for concat cache
#939 Add unit test for Project kernel
#949 Implemented using threadpool for outgoing messages
#961 Add list_tables() and describe_table() functions
#967 Add bc.get_free_memory() function

Bug Fixes

#774 fixed build issues with latest cudf 0.15 including updating from_cudf
#781 Fixed issue with Hive partitions when doing SELECT *
#754 Normalize columns before distribution in JoinPartitionKernel
#782 fixed issue with hive partitions base folder
#791 Fixes issues due to changes in rmm and fixes allocator issues
#770 Fix interops operators output types
#798 Fix when the algebra plan was provided using one-line as logical plan
#799 Fix uri values computation in runQueryCaller
#792 Remove orc temp files when cached on Disk
#814 Fix when checking only Limit and Scan Kernels
#816 Loading one file at a time (LimitKernel and ScanKernel)
#832 updated calcite test reference
#834 Fixed small issue with hive and cudf_type_int_to_np_types
#839 Fixes literal cast
#838 Fixed issue with start and length of substring being different types
#823 Fixed issue on logical plans when there is an EXISTS clause
#845 Fixed issue with casting string to string
#850 Fixed issue with getTableScanInfoCaller
#851 Fix row_groups issue in ParquetParser.cpp
#847 Fixed issue with some constant expressions not evaluated by calcite
#875 Recovered some old unit tests and deleted obsolete unit tests
#879 Fixed issue with log directory creation in a distributed environment
#890 Fixed issue where we were including testing hpp in our code
#891 Fixed issue caused by replacing join load_set with concatenating cache
#902 Fixed optimization regression on the select count(*) case
#909 Fixed issue caused by now using arrow_io_source
#913 Fixed issues caused by cudf adding DECIMAL data type
#916 Fix e2e string comparison
#927 Fixed random segfault issue in parser
#929 Update the GPUManager functions
#942 Fix column names on sample function
#950 Introducing config param for max orderby samples and fixing oversampling
#952 Dummy PR
#957 Fixed issues caused by changes to timestamp in cudf
#962 Use new rmm API instead of get_device_resource() and set_device_resource() functions
#965 Handle exceptions from pool_threads
#963 Set log_level when using LOGGING_LEVEL param
#973 Fix how we check the existence of the JAVA_HOME environment variable

BlazingSQL 0.14.0 (June 9, 2020)

#391 Added the ability to run count distinct queries in a distributed fashion
#392 Remove the unnecessary messages on distributed mode
#560 Fixed bug where parsing errors would lead to crash
#565 Made CSV reading behave the same as cudf
#612 Print product version: print(blazingsql.version) # shows the git hash
#638 Refactors and fixes SortAndSample kernels
#631 Implemented ability to send config_options to bc.sql function
#621 Clean dead code
#602 Implements cache flow control feature
#625 Implement CAST to TINYINT and SMALLINT
#632 Implement CHAR_LENGTH function
#635 Handle behavior when the optimized plan contains a LogicalValues
#653 Handle exceptions on python side
#661 added hive support to parse_batch
#662 updated from_cudf code and fixed other issue due to new cudf::list_view
#674 Allow to define and use a specific AWS S3 region
#677 added guava to pom.xml
#679 Support modern compilers (>= g++-7.x)
#649 Adding event logging
#660 Changed how we handle the partitions of a dask.cudf.DataFrame
#697 Update expression parser
#659 Improve reading for: SELECT * FROM table LIMIT N
#700 Support null column in projection
#711 Migrate end to end tests into blazingsql repo
#718 Changed all condition variable waits to wait_for
#712 fixed how we handle empty tables for estimate for small table join
#724 Removed unused BlazingThread creations
#725 Added nullptr check to num_rows()
#729 Fixed issue with num_rows() and wait_for
#728 Add replace_calcite_regex function to the join condition
#721 Handling multi-partition output
#750 Each table scan now has its own data loader
#740 Normalizing types for UNION ALL
#744 Fix unit tests
#743 Workaround for interops 64 index plan limitation
#763 Implemented ability to set the folder for all log files
#757 Ensure GPU portability (so we can run on any cloud instance with GPU)
#753 Fix for host memory threshold parameter with Dask envs
#801 Fix build with new cudf 0.15 and arrow 0.17.1
#809 Fix conda build issues
#828 Fix gpuci issues and improve tooling to debug gpuci related issues
#867 Fix boost dependency issues
#785 Add script for Manual Testing Artifacts.
#931 Add script for error messages validation.
#932 Import pydrill and pyspark only when in generator or full mode.
#1031 adding notebooks into BlazingSQL Tests
#1486 Define generic templates for E2E Testing framework.
#1542 Cleaning code on E2E Test framework.
