Skip to content

Commit

Permalink
Backport #12013 to 20.5: Fix array size overflow in generateRandom (#…
Browse files Browse the repository at this point in the history
…12249)

* change used flag

* fix address formatting

* use the sentry logger hook

* Add concurrent benchmark to performance test

After the main test, run queries from `website.xml` in parallel using
`clickhouse-benchmark`. This can be useful to test the effects of
concurrency on performance. Comparison test can miss some effects
because it always runs queries sequentially, and many of them are even
single-threaded.

* fixes

* experiment

* typo

* trigger ci

* fixup

* add flag to continue on errors

* trigger ci

* trigger ci

* trigger ci

* experiment

* trigger ci

* fixup

* fixup

* Update CMakeLists.txt

* Update CMakeLists.txt

* Update http_server.py

* make max global thread pool setting configurable

This PR adds a server level config for overriding the default max number
of threads in global thread pool that is currently allowed (10,000).

This might be useful in scenarios where there are a large number of
distributed queries that are executing concurrently and where the
default number of max threads might not be necessarily be sufficient.

* add max_thread_pool_size setting to tests

This adds the `max_thread_pool_size` config to
tests/server-test.xml file.

* docs for max_thread_pool_size

This adds the docs for the new server level setting
`max_thread_pool_size`.

* remove extra vertical space

* fixup

* improvements after review comments

* move the default endpoint to config

* remove extra line

* distinct combinator for single numeric arguments

* rework distinct combinator

* fix build

* Disable optimize_skip_unused_shards if sharding_key has non-deterministic func

Example of such functions is rand()

And this patch disables only optimize_skip_unused_shards, i.e. INSERT
code path does not changed, so it will work as before.

* optimize_skip_unused_shards=2 will disable it for nested distributed queries

P.S. Looks like settings can be converted between SettingUInt64 and
SettingBool without breaking binary protocol.

FWIW maybe it is a good idea to change the semantics of the settings as
follow (but I guess that changing semantic is not a good idea, better to
add new settings and deprecate old ones):
- optimize_skip_unused_shards -- accept nesting level on which the
  optimization will work
- force_skip_optimize_shards_nesting -- accept nesting level on which
  the optimization will work

* Add logging of adjusting conditional settings for distributed queries

* Add missing DROP TABLE in 01319_mv_constants_bug

* Improve 01319_optimize_skip_unused_shards_no_nested

Before there is no check that optimize_skip_unused_shards was working
for the first level, use cluster with unavalable shard to guarantee
this.

* Update 00816_long_concurrent_alter_column.sh

* distinct combinator for function of multiuple arguments

* report the number of errors

* fixup

* fixup

* Add settings to control nesting level for shards skipping optimization

- optimize_skip_unused_shards_nesting (allows control nesting level for
  shards skipping optimization)
- force_skip_optimize_shards_nesting (allows control nesting level for
  checking was shards skipped or not)
- deprecates force_skip_optimize_shards_no_nested

* fix review

* fix

* logger

* Update executeQuery.cpp

* restore immediate stacktrace output

* Support parse uuid without separator

* try fix build failure

* report

* fix read extra bytes when with separator

* add perf test for distinct combinator

* more optimal aggregate functions with both 'if' and 'distinct' combinators

* try fix test failure

* Initialize GlobalThreadPool explicitly

* apply review suggestions

* fixup

* memory usage settings

* allow implicit initialization

* Fix documentation

* Fix anotehr one nullable prewhere column.

* fix style check - removed unused LOGICAL_ERROR

* restore old toStringImpl signature

* send build_id to sentry as well + fix some minor issues

* remove unused imports

* clang format file

* construct path using boost::filesystem::path

* cleanup few unused headers

The following files had some unused headers and caught my eyes, so
cleaning them up:

```bash
programs/obfuscator/Obfuscator.cpp
src/Databases/DatabaseAtomic.cpp
```

* Change MySQL global variables query to globalVariable function

* add column header to test

* lost unused

* [experiment] maybe fix warnings in integration tests

* fix arcadia build

* Fix JDBC @@session.variables

* Remove already fixed bug

* Slightly modernize code around ZooKeeper

* Fix @@session.variables AS

* try to rename py files to less common names

* add __init__.py

* update SELECT description (#11907)

Signed-off-by: Slach <bloodjazman@gmail.com>

* [docs] extra F.A.Q. content (#11898)

* more content for F.A.Q.

* more content

* normalize

* more content

* maybe fix the docs check

* absolute img urls

* fix tests

* Keep alias after substitution of query parameters #11914

* show build commands

* Added a test

* Add SELECT @@Version fake value '5.7.30' #11089

* fix + bump tests

* Fix unitialized memory

* Fix using current database while checking access rights.

* fix style check - clang format for lambdas

* Better github-hook

* Do not update PVS Studio docker automatically.

They keep removing the old image every couple of weeks, and the entire
docker update fails.

* bump CI

* for the love of God just update this container already, i'm begging you

* Update Dockerfile

* Update pvs-studio version

* Fix jemalloc under OSX (by registering it as default zone explicitly)

In case of OSX jemalloc register itself as a default zone allocator.

But when you link statically then zone_register() will not be called,
and even will be optimized out:

  $ nm clickhouse.patched  | grep -c zone_register
  0

Fix this, by manually calling it.

v2: extern C

* Rewrite arithmetic in aggregate functions optimisation (#11899)

* remove clang-format tag comments

* Put clickhouse-local data to /tmp by default

This fixes #9848
Also fixes #11926

* trigger ci

* fix some logical errors

* Update features.html

* Consider allocatedBytes() instead of bytes() in Storage{Buffer,Memory}.

* fix query parameter visitor

* fixpu

* Update 00945_bloom_filter_index.sql

* Less layers in docker file

* Less layers in docker file

* Update docker/test/performance-comparison/perf.py

* rename test back to test.py

* client exit with unrecognized arguments

format code

fix

* remove useless const

* save

* Update canonized values in 00753_system_columns_and_system_tables.

* add ssd to integration test

* one more test

* Fix one reason of test flakiness

* Fix race condition in extractAllGroups

* extract JOIN in own plan step

* fix mistake reported by @Slach

* remove useless logic

* fix crash

* Revive mmap IO

* More verbose CMake in build docker

* Avro UUID support

* remove a trick with expression continuation

* Respect direct_io/mmap settings while reading secondary indices

* Minor modification

* Added a comment #11949

* Added a test

* Add CANNOT_PARSE_UUID extern const

* Fix estimation of the number of marks for various thresholds

* better ExpressionAction::execute()

* performance comparison

* add page fault perf events

* better error messages

* Add CPU frequencies to system.asynchronous_metrics

* Update AsynchronousMetrics.cpp

* keep ArrayJoin optimisation

* Update path to performance tests build

* tests with distributed

* place left join keys in before_join actions

* Added tests for #8692

* review fixes

* fix totals

* Update path to perf test package

* use std::filesystem::path

* Minor cleanup in Client.cpp before fuzzing

* Extend word break config to all non-alphanumeric chars

* Update AsynchronousMetrics.h

* more cleanup

* add test

update test

fix

* remove unused imports + bump tests

* Update 01345_array_join_LittleMaverick.sql

* Update ExpressionActions.h

* Update system.md (#11945)

* Update system.md

Translate the doc to Chinese

* Update system.md

* fix the title's translattion error (#11939)

@zhang2014 I use MergeTree replaced  Chinese version "合并“, and fix an error.

* Update replxx submodule

* Add a test for #10102

* disable tests in arcadia

* Added a test from #5131

* performance comparison

* Support Enums type for MySQL engine #3985

* performance comparison

* Add integration test for MySQL enums type

* Fix bad log message at server startup

* Insert enum column values for test

* Better diagnostics of "Replica {} appears to be already active" message

* Fix typos

* Added a test for #8550

* Update extended-roadmap.md

* Update test

* Fix FPE, step 1

* Make it more correct

* Add a test

* update

* Revert "[experiment] maybe fix warnings in integration tests"

* Add xeus-clickhouse (#12010)

xeus-clickhouse is a Jupyter kernal for ClickHouse

* Fix array size overflow in generateRandom

* Added a test

* Update ErrorCodes.cpp

* More hardening

* More hardening

Co-authored-by: Ivan Blinkov <github@blinkov.ru>
Co-authored-by: Alexander Kuzmenkov <akuzm@yandex-team.ru>
Co-authored-by: alexey-milovidov <milovidov@yandex-team.ru>
Co-authored-by: Bharat Nallan <bharatnc@gmail.com>
Co-authored-by: Anton Popov <pad11rus@gmail.com>
Co-authored-by: Azat Khuzhin <a3at.mail@gmail.com>
Co-authored-by: Nikita Vasilev <vasnikserg@yandex.ru>
Co-authored-by: zhang2014 <coswde@gmail.com>
Co-authored-by: Nikolai Kochetov <nik-kochetov@yandex-team.ru>
Co-authored-by: BohuTANG <overred.shuttler@gmail.com>
Co-authored-by: Nikolai Kochetov <KochetovNicolai@users.noreply.github.com>
Co-authored-by: Eugene Klimov <bloodjazman@gmail.com>
Co-authored-by: Alexander Kuzmenkov <36882414+akuzm@users.noreply.github.com>
Co-authored-by: Nikita Mikhaylov <mikhaylovnikitka@gmail.com>
Co-authored-by: alesapin <alesapin@gmail.com>
Co-authored-by: Vitaly Baranov <vitbar@yandex-team.ru>
Co-authored-by: Artem Zuikov <chertus@gmail.com>
Co-authored-by: Alexander Tokmakov <avtokmakov@yandex-team.ru>
Co-authored-by: Maxim Akhmedov <max42@yandex-team.ru>
Co-authored-by: feng lv <fenglv15@mails.ucas.ac.cn>
Co-authored-by: Nikita Mikhailov <jakalletti@jakalletti-build.sas.yp-c.yandex.net>
Co-authored-by: Andrew Onyshchuk <andryk.rv@gmail.com>
Co-authored-by: Tom Bombadil <565258751@qq.com>
Co-authored-by: Yuntao Wu <wuyuntao2015@163.com>
Co-authored-by: Wang Fenjin <wangfenj@gmail.com>
  • Loading branch information
1 parent 67964e8 commit 9025fd0
Show file tree
Hide file tree
Showing 4 changed files with 26 additions and 5 deletions.
17 changes: 13 additions & 4 deletions src/Columns/ColumnArray.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,6 @@ namespace DB

namespace ErrorCodes
{
extern const int ILLEGAL_COLUMN;
extern const int NOT_IMPLEMENTED;
extern const int BAD_ARGUMENTS;
extern const int PARAMETER_OUT_OF_BOUND;
Expand All @@ -38,8 +37,18 @@ namespace ErrorCodes
ColumnArray::ColumnArray(MutableColumnPtr && nested_column, MutableColumnPtr && offsets_column)
: data(std::move(nested_column)), offsets(std::move(offsets_column))
{
if (!typeid_cast<const ColumnOffsets *>(offsets.get()))
throw Exception("offsets_column must be a ColumnUInt64", ErrorCodes::ILLEGAL_COLUMN);
const ColumnOffsets * offsets_concrete = typeid_cast<const ColumnOffsets *>(offsets.get());

if (!offsets_concrete)
throw Exception("offsets_column must be a ColumnUInt64", ErrorCodes::LOGICAL_ERROR);

size_t size = offsets_concrete->size();
if (size != 0 && nested_column)
{
/// This will also prevent possible overflow in offset.
if (nested_column->size() != offsets_concrete->getData()[size - 1])
throw Exception("offsets_column has data inconsistent with nested_column", ErrorCodes::LOGICAL_ERROR);
}

/** NOTE
* Arrays with constant value are possible and used in implementation of higher order functions (see FunctionReplicate).
Expand All @@ -51,7 +60,7 @@ ColumnArray::ColumnArray(MutableColumnPtr && nested_column)
: data(std::move(nested_column))
{
if (!data->empty())
throw Exception("Not empty data passed to ColumnArray, but no offsets passed", ErrorCodes::ILLEGAL_COLUMN);
throw Exception("Not empty data passed to ColumnArray, but no offsets passed", ErrorCodes::LOGICAL_ERROR);

offsets = ColumnOffsets::create();
}
Expand Down
13 changes: 12 additions & 1 deletion src/Storages/StorageGenerateRandom.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,8 @@ namespace ErrorCodes
{
extern const int NOT_IMPLEMENTED;
extern const int NUMBER_OF_ARGUMENTS_DOESNT_MATCH;
extern const int TOO_LARGE_ARRAY_SIZE;
extern const int TOO_LARGE_STRING_SIZE;
}


Expand Down Expand Up @@ -387,6 +389,16 @@ StorageGenerateRandom::StorageGenerateRandom(const StorageID & table_id_, const
UInt64 max_array_length_, UInt64 max_string_length_, std::optional<UInt64> random_seed_)
: IStorage(table_id_), max_array_length(max_array_length_), max_string_length(max_string_length_)
{
static constexpr size_t MAX_ARRAY_SIZE = 1 << 30;
static constexpr size_t MAX_STRING_SIZE = 1 << 30;

if (max_array_length > MAX_ARRAY_SIZE)
throw Exception(ErrorCodes::TOO_LARGE_ARRAY_SIZE, "Too large array size in GenerateRandom: {}, maximum: {}",
max_array_length, MAX_ARRAY_SIZE);
if (max_string_length > MAX_STRING_SIZE)
throw Exception(ErrorCodes::TOO_LARGE_STRING_SIZE, "Too large string size in GenerateRandom: {}, maximum: {}",
max_string_length, MAX_STRING_SIZE);

random_seed = random_seed_ ? sipHash64(*random_seed_) : randomSeed();
setColumns(columns_);
}
Expand Down Expand Up @@ -420,7 +432,6 @@ void registerStorageGenerateRandom(StorageFactory & factory)
if (engine_args.size() == 3)
max_array_length = engine_args[2]->as<const ASTLiteral &>().value.safeGet<UInt64>();


return StorageGenerateRandom::create(args.table_id, args.columns, max_array_length, max_string_length, random_seed);
});
}
Expand Down
Empty file.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
SELECT i FROM generateRandom('i Array(Nullable(Enum8(\'hello\' = 1, \'world\' = 5)))', 1025, 65535, 9223372036854775807) LIMIT 10; -- { serverError 128 }

0 comments on commit 9025fd0

Please sign in to comment.