Merge pull request #1593 from BlazingDB/branch-21.08
21.08 Release
wmalpica committed Aug 16, 2021
2 parents 95ff589 + 1e59abc commit 15c503b
Showing 34 changed files with 424 additions and 190 deletions.
48 changes: 32 additions & 16 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,24 @@
# BlazingSQL 21.06.00 (June 10th, 2021)
# BlazingSQL 21.08.00 (August 12th, 2021)

## Improvements
- #1571 Update ucx-py versions to 0.21
- #1554 return ok for filesystems
- #1572 Setting up default value for max_bytes_chunk_read to 256 MB

## Bug Fixes
- #1570 Fix build due to changes in rmm device buffer
- #1574 Fix reading decimal columns from orc file
- #1576 Fix `CC`/`CXX` variables in CI
- #1581 Fix latest cudf dependencies
- #1582 Fix concat suite E2E test for nested calls
- #1585 Fix for GCS credentials from filepath
- #1589 Fix decimal support using float64
- #1590 Fix build issue with thrust package

# BlazingSQL 21.06.00 (June 10th, 2021)

## New Features
- #1471 Unbounded partitioned windows
- #1445 Support for CURRENT_DATE, CURRENT_TIME and CURRENT_TIMESTAMP
- #1505 Support for right outer join
- #1523 Support for DURATION type
@@ -22,7 +38,7 @@
- #1455 Support for IS NOT FALSE condition
- #1502 Fix IS NOT DISTINCT FROM with joins
- #1475 Fix wrong results from timestampdiff/add
- #1528 Fixed build issues due to cudf aggregation API change
- #1540 Comparing param set to true for e2e
- #1543 Enables provider unit_tests
- #1548 Fix orc statistic building
@@ -37,15 +53,15 @@
## New Features
- #1367 OverlapAccumulator Kernel
- #1364 Implement the concurrent API (bc.sql with token, bc.status, bc.fetch)
- #1426 Window Functions without partitioning
- #1349 Add e2e test for Hive Partitioned Data
- #1396 Create tables from other RDBMS
- #1427 Support for CONCAT alias operator
- #1424 Add get physical plan with explain
- #1472 Implement predicate pushdown for data providers

## Improvements
- #1325 Refactored CacheMachine.h and CacheMachine.cpp
- #1322 Updated and enabled several E2E tests
- #1333 Fixing build due to cudf update
- #1344 Removed GPUCacheDataMetadata class
@@ -54,7 +70,7 @@
- #1331 Added flag to enable null e2e testing
- #1418 Adding support for docker image
- #1434 Added documentation for C++ and Python in Sphinx
- #1419 Added concat cache machine timeout
- #1444 Updating GCP to >= version
- #1349 Add e2e test for Hive Partitioned Data
- #1447 Improve getting estimated output num rows
@@ -71,18 +87,18 @@
- #1350 Fixed bug where there are no projects in a bindable table scan
- #1359 Avoid cuda issues when free pinned memory
- #1365 Fixed build after sublibs changes on cudf
- #1369 Updated java path for powerpc build
- #1371 Fixed e2e settings
- #1372 Recompute `columns_to_hash` in DistributeAggregationKernel
- #1375 Fix empty row_group_ids for parquet
- #1380 Fixed issue with int64 literal values
- #1379 Remove ProjectRemoveRule
- #1389 Fix issue when CAST a literal
- #1387 Skip getting orc metadata for decimal type
- #1392 Fix substrings with nulls
- #1398 Fix performance regression
- #1401 Fix support for minus unary operation
- #1415 Fixed bug where num_batches was not getting set in BindableTableScan
- #1413 Fix for null tests 13 and 23 of windowFunctionTest
- #1416 Fix full join when both tables contains nulls
- #1423 Fix temporary directory for hive partition test
@@ -95,7 +111,7 @@
- #1504 Fixing some conflicts in Dockerfile

## Deprecated Features
- #1394 Disabled support for outer joins with inequalities

# BlazingSQL 0.18.0 (February 24, 2021)

@@ -108,7 +124,7 @@
- #1238 Implements MergeStreamKernel executor model
- #1259 Implements SortAndSampleKernel executor model, also avoids setting up the number of samples
- #1271 Added Hive utility for partitioned data
- #1289 Multiple concurrent query support
- #1285 Infer PROTOCOL when Dask client is passed
- #1294 Add config options for logger
- #1301 Added usage of pinned buffers for communication and fixes various UCX related bugs
@@ -117,7 +133,7 @@
- #1303 Add support for INITCAP
- #1313 getting and using ORC metadata
- #1347 Fixing issue when reading orc metadata from DATE dtype
- #1338 Window Function support for LEAD and LAG statements
- #1362 give useful message when file extension is not recognized
- #1361 Supporting first_value and last_value for Window Function

@@ -140,7 +156,7 @@
- #1308 Improve the engine loggers
- #1314 Added unit tests to verify that OOM error handling works well
- #1320 Revamping cache logger
- #1323 Made progress bar update continuously and stay after query is done
- #1336 Improvements for the cache API
- #1483 Improve dependencies script

@@ -154,7 +170,7 @@
- #1277 Support FileSystems (GS, S3) when extension of the files are not provided
- #1300 Fixed issue when creating tables from a local dir relative path
- #1312 Fix progress bar for jupyterlab
- #1318 Disabled require acknowledge

# BlazingSQL 0.17.0 (December 10, 2020)

@@ -175,7 +191,7 @@
- #1201 Implement string TRIM
- #1216 Add unit test for DAYOFWEEK
- #1205 Implement string REVERSE
- #1220 Implement string LEFT and RIGHT
- #1223 Add support for UNION statement
- #1250 updated README.md and CHANGELOG and others preparing for 0.17 release

@@ -234,7 +250,7 @@
- #1203 Changed code back so that parquet is not read a single rowgroup at a time
- #1207 Calcite uses literal as int32 if not explicit CAST was provided
- #1212 Fixed issue when building the thirdpart, cmake version set to 3.18.4
- #1225 Fixed issue due to change in gather API
- #1254 Fixing support of nightly and stable on localhost
- #1258 Fixing gtest version issue

15 changes: 8 additions & 7 deletions README.md
@@ -117,14 +117,14 @@ This is the recommended way of building all of the BlazingSQL components and dep
```bash
conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
./dependencies.sh 21.06 $CUDA_VERSION
./dependencies.sh 21.08 $CUDA_VERSION
```
Where $CUDA_VERSION is 10.1, 10.2 or 11.0 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 10.1 and Python 3.7:*
Where $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 11.2 and Python 3.7:*
```bash
conda create -n bsql python=3.7
conda activate bsql
./dependencies.sh 21.06 10.1
./dependencies.sh 21.08 11.2
```

### Build
@@ -149,14 +149,14 @@ For nightly version cuda 11+ are only supported, see https://github.com/rapidsai
```bash
conda create -n bsql python=$PYTHON_VERSION
conda activate bsql
./dependencies.sh 21.08 $CUDA_VERSION nightly
./dependencies.sh 21.10 $CUDA_VERSION nightly
```
Where $CUDA_VERSION is 11.0 or 11.2 and $PYTHON_VERSION is 3.7 or 3.8
Where $CUDA_VERSION is 11.0, 11.2 or 11.4 and $PYTHON_VERSION is 3.7 or 3.8
*For example for CUDA 11.2 and Python 3.8:*
```bash
conda create -n bsql python=3.8
conda activate bsql
./dependencies.sh 21.08 11.2 nightly
./dependencies.sh 21.10 11.2 nightly
```

### Build
@@ -240,3 +240,4 @@ The RAPIDS suite of open source software libraries aim to enable execution of en
## Apache Arrow on GPU

The GPU version of [Apache Arrow](https://arrow.apache.org/) is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow are supported.

2 changes: 1 addition & 1 deletion build.sh
@@ -95,7 +95,7 @@ fi
# Get version number
export GIT_DESCRIBE_TAG=`git describe --tags`
export MINOR_VERSION=`echo $GIT_DESCRIBE_TAG | grep -o -E '([0-9]+\.[0-9]+)'`
export UCX_PY_VERSION="0.20"
export UCX_PY_VERSION="0.21"

# Process flags
if hasArg -v; then
2 changes: 1 addition & 1 deletion ci/checks/style.sh
@@ -11,7 +11,7 @@ LC_ALL=C.UTF-8
LANG=C.UTF-8

# Activate common conda env
source activate gdf
source activate rapids

# Run isort and get results/return code
# TODO: cordova in a near future consider hive.py and context.py
4 changes: 2 additions & 2 deletions ci/cpu/build.sh
@@ -44,8 +44,8 @@ conda activate rapids

gpuci_logger "Check versions"
python --version
gcc --version
g++ --version
$CC --version
$CXX --version

gpuci_logger "Conda Information"
conda info
9 changes: 6 additions & 3 deletions conda/recipes/blazingsql/meta.yaml
@@ -17,6 +17,9 @@ build:
- PARALLEL_LEVEL
- CUDA_VERSION
- CUDACXX
- CUDAHOSTCXX
- CC
- CXX

requirements:
build:
@@ -36,7 +39,7 @@
- openjdk >=8.0, <9.0
- maven
- cudf {{ minor_version }}.*
- ucx-py 0.20.*
- ucx-py 0.21.*
- ucx-proc=*=gpu
- boost-cpp 1.72.0
- dlpack
@@ -55,7 +58,7 @@
- {{ pin_compatible('zeromq', max_pin='x.x.x') }}
- dask-cudf {{ minor_version }}.*
- dask-cuda {{ minor_version }}.*
- ucx-py 0.20.*
- ucx-py 0.21.*
- ucx-proc=*=gpu
- {{ pin_compatible('cudatoolkit', max_pin='x.x') }}
- tqdm
@@ -77,4 +80,4 @@
home: http://www.blazingsql.com/
license: Apache-2.0
license_family: Apache
summary: GPU-powered distributed SQL engine in Python
8 changes: 4 additions & 4 deletions dependencies.sh
@@ -10,8 +10,8 @@ BOLDGREEN="\e[1;${GREEN}"
ITALICRED="\e[3;${RED}"
ENDCOLOR="\e[0m"

RAPIDS_VERSION="21.06"
UCX_PY_VERSION="0.20"
RAPIDS_VERSION="21.08"
UCX_PY_VERSION="0.21"
CUDA_VERSION="11.0"
CHANNEL=""

@@ -28,9 +28,9 @@ if [ ! -z $3 ]; then
fi

echo -e "${GREEN}Installing dependencies${ENDCOLOR}"
conda install --yes -c conda-forge spdlog'>=1.8.5,<2.0.0a0' google-cloud-cpp=1.25 ninja mysql-connector-cpp=8.0.23 libpq=13 nlohmann_json=3.9.1
conda install --yes -c conda-forge spdlog'>=1.8.5,<2.0.0a0' google-cloud-cpp'>=1.25' ninja mysql-connector-cpp=8.0.23 libpq=13 nlohmann_json=3.9.1
# NOTE cython must be the same of cudf (for 0.11 and 0.12 cython is >=0.29,<0.30)
conda install --yes -c conda-forge cmake=3.18 gtest==1.10.0=h0efe328_4 gmock cppzmq cython=0.29 openjdk=8.0 maven jpype1 netifaces pyhive pytest tqdm ipywidgets boost-cpp=1.72.0
conda install --yes -c conda-forge cmake=3.18 gtest==1.10.0=h0efe328_4 gmock cppzmq cython=0.29 openjdk'>=8.0,<9.0' maven jpype1 netifaces pyhive pytest tqdm ipywidgets boost-cpp=1.72.0


echo -e "${GREEN}Install RAPIDS dependencies${ENDCOLOR}"
2 changes: 1 addition & 1 deletion docsrc/source/Doxyfile
@@ -38,7 +38,7 @@ PROJECT_NAME = "BlazingSQL Engine"
# could be handy for archiving the generated documentation or if some version
# control system is used.

PROJECT_NUMBER = 21.06
PROJECT_NUMBER = 21.08

# Using the PROJECT_BRIEF tag one can provide an optional one line description
# for a project that appears at the top of each page and should give viewer a
2 changes: 1 addition & 1 deletion docsrc/source/conf.py
@@ -44,7 +44,7 @@
language = "en"

# The full version, including alpha/beta/rc tags
version = '21.06'
version = '21.08'
release = f'v{version}'

# -- General configuration ---------------------------------------------------
1 change: 1 addition & 0 deletions engine/setup.py
@@ -82,6 +82,7 @@ def build_extensions(self):
"-Wno-unknown-pragmas",
"-Wno-unused-variable",
"-Wno-unused-function",
"-DTHRUST_IGNORE_CUB_VERSION_CHECK",
'-isystem' + conda_env_inc,
'-isystem' + conda_env_inc_cudf,
'-isystem' + conda_env_inc_cub,
4 changes: 3 additions & 1 deletion engine/src/blazing_table/BlazingHostTable.cpp
@@ -1,6 +1,8 @@
#include "BlazingHostTable.h"
#include "bmr/BlazingMemoryResource.h"
#include "bmr/BufferProvider.h"
#include "rmm/cuda_stream_view.hpp"
#include "rmm/device_buffer.hpp"
#include "communication/CommunicationInterface/serializer.hpp"

using namespace fmt::literals;
@@ -83,7 +85,7 @@ std::unique_ptr<BlazingTable> BlazingHostTable::get_gpu_table() const {
try{
int buffer_index = 0;
for(auto & chunked_column_info : chunked_column_infos){
gpu_raw_buffers[buffer_index].resize(chunked_column_info.use_size);
gpu_raw_buffers[buffer_index].resize(chunked_column_info.use_size, rmm::cuda_stream_view{});
size_t position = 0;
for(size_t i = 0; i < chunked_column_info.chunk_index.size(); i++){
size_t chunk_index = chunked_column_info.chunk_index[i];
@@ -9,7 +9,7 @@ namespace comm {

std::unique_ptr<ral::frame::BlazingTable> deserialize_from_gpu_raw_buffers(
const std::vector<blazingdb::transport::ColumnTransport> & columns_offsets,
const std::vector<rmm::device_buffer> & raw_buffers,
std::vector<rmm::device_buffer> & raw_buffers,
cudaStream_t stream) {
size_t num_columns = columns_offsets.size();
std::vector<std::unique_ptr<cudf::column>> received_samples(num_columns);
@@ -35,7 +35,7 @@ std::unique_ptr<ral::frame::BlazingTable> deserialize_from_gpu_raw_buffers(
std::move(raw_buffers[columns_offsets[i].strings_data]));
rmm::device_buffer null_mask;
if(columns_offsets[i].strings_nullmask != -1)
null_mask = rmm::device_buffer(std::move(raw_buffers[columns_offsets[i].strings_nullmask]));
null_mask = std::move(raw_buffers[columns_offsets[i].strings_nullmask]);

cudf::size_type null_count = columns_offsets[i].metadata.null_count;
auto unique_column = cudf::make_strings_column(
@@ -22,7 +22,7 @@ namespace comm {
*/
std::unique_ptr<ral::frame::BlazingTable> deserialize_from_gpu_raw_buffers(
const std::vector<blazingdb::transport::ColumnTransport> & columns_offsets,
const std::vector<rmm::device_buffer> & raw_buffers,
std::vector<rmm::device_buffer> & raw_buffers,
cudaStream_t stream = 0);

} // namespace comm