
Async datastream #2786

Merged — 7 commits merged into main on Nov 15, 2021

Conversation

@sundy-li (Member) commented Nov 14, 2021

I hereby agree to the terms of the CLA available at: https://databend.rs/policies/cla/

Summary

  • Migrate BlockStream to SendableDataBlockStream.
  • Remove SendableBlockStream from the insert_plan and migrate it into the executor.
  • Introduce the async-stream crate to streams/Source (a sketch of the pattern follows this list).
  • Make CSV/Parquet reads work in async mode.
  • Make the Parquet table support projection.
  • Add the config storage.disk.temp_data_path for the external temp data path.
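
For context on the async-stream item above, here is a minimal sketch of the pattern, assuming hypothetical stand-ins (a placeholder DataBlock type, a read_next_block helper, and a local SendableDataBlockStream alias) rather than the exact Databend definitions:

// Sketch: turn an async read loop into a boxed, sendable stream with async-stream.
// `DataBlock`, `read_next_block`, and the `SendableDataBlockStream` alias are placeholders.
use async_stream::stream;
use futures::stream::{Stream, StreamExt};
use std::pin::Pin;

type DataBlock = Vec<u8>; // stand-in for the real DataBlock type
type Result<T> = std::result::Result<T, std::io::Error>;
type SendableDataBlockStream = Pin<Box<dyn Stream<Item = Result<DataBlock>> + Send>>;

// Placeholder async read; a real source would decode the next CSV/Parquet chunk here.
async fn read_next_block() -> Result<Option<DataBlock>> {
    Ok(None)
}

fn source_stream() -> SendableDataBlockStream {
    let s = stream! {
        loop {
            match read_next_block().await {
                Ok(Some(block)) => yield Ok(block),
                Ok(None) => break,                 // end of input
                Err(e) => { yield Err(e); break; } // surface the error, then stop
            }
        }
    };
    s.boxed() // pin + box so the executor can poll it as a sendable stream
}

The executor can then consume this with `while let Some(block) = stream.next().await { ... }` instead of blocking a reader thread.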

Performance:

## generate test data
echo "select rand(), toFloat32( sin(rand() ) ) , toString( cos(rand())) from numbers(1000000) format CSV" |  clickhouse-client  --host 127.0.0.1 --port 9999 > aa.csv


CREATE TABLE test
(
    a Int64,
    b Float32,
    c String
) ENGINE = CSV location =  '/Users/sundy/dataset/aa.csv';


set max_threads = 1;

Before:
mysql> select sum(a) from test;
+------------------+
| sum(a)           |
+------------------+
| 2147676606160454 |
+------------------+
1 row in set (0.45 sec)
Read 1000000 rows, 38.64 MB in 0.385 sec., 2.6 million rows/sec., 100.46 MB/sec.


After:
mysql> select sum(a) from test;
+------------------+
| sum(a)           |
+------------------+
| 2147676606160454 |
+------------------+
1 row in set (0.30 sec)
Read 1000000 rows, 38.64 MB in 0.248 sec., 4.04 million rows/sec., 155.99 MB/sec.


Changelog

  • Improvement
  • Performance Improvement

Related Issues

Fixes #2787 #2730

Test Plan

Unit Tests

Stateless Tests

@databend-bot (Member)

Thanks for the contribution!
I have applied any labels matching special text in your PR Changelog.

Please review the labels and make any necessary changes.

@codecov-commenter commented Nov 14, 2021

Codecov Report

Merging #2786 (80602f6) into main (8d541a9) will decrease coverage by 0%.
The diff coverage is 91%.

Impacted file tree graph

@@          Coverage Diff          @@
##            main   #2786   +/-   ##
=====================================
- Coverage     69%     69%   -1%     
=====================================
  Files        601     600    -1     
  Lines      32728   32797   +69     
=====================================
+ Hits       22848   22867   +19     
- Misses      9880    9930   +50     
Impacted Files Coverage Δ
common/datavalues/src/types/serializations/mod.rs 27% <ø> (ø)
query/src/catalogs/table.rs 66% <0%> (-1%) ⬇️
query/src/configs/config_test.rs 98% <ø> (ø)
.../src/datasources/database/example/example_table.rs 0% <0%> (ø)
query/src/datasources/table/csv/csv_table_test.rs 90% <ø> (ø)
...uery/src/datasources/table/fuse/io/block_reader.rs 78% <ø> (ø)
query/src/interpreters/interpreter.rs 0% <ø> (ø)
query/src/interpreters/interpreter_kill.rs 0% <0%> (ø)
common/dal/src/impls/local.rs 76% <50%> (+<1%) ⬆️
common/planners/src/plan_read_datasource.rs 80% <72%> (-1%) ⬇️
... and 69 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8d541a9...80602f6.

@sundy-li marked this pull request as ready for review November 15, 2021 05:15
@databend-bot (Member)

Wait for another reviewer approval

@BohuTANG merged commit 8b48024 into datafuselabs:main Nov 15, 2021
Successfully merging this pull request may close these issues.

Make csv/parquet engine io read works in async mode