CI Fest #7 #12508

Closed
21 of 40 tasks
alexey-milovidov opened this issue Jul 15, 2020 · 71 comments
Labels
testing Special issue with list of bugs found by CI

Comments

@alexey-milovidov
Member

alexey-milovidov commented Jul 15, 2020

Infrastructure Failures

Test Failures

@alexey-milovidov alexey-milovidov added the testing Special issue with list of bugs found by CI label Jul 15, 2020
@alexey-milovidov alexey-milovidov mentioned this issue Jul 15, 2020
16 tasks
@alexey-milovidov alexey-milovidov pinned this issue Jul 15, 2020
@alexey-milovidov
Member Author

Live view fix: #12519

@alexey-milovidov
Member Author

@tavplubix The "metadata_loading" test uses real (wall-clock) time as its success criterion, so it is inherently flaky. What shall we do?

@qoega
Member

qoega commented Jul 20, 2020

https://clickhouse-test-reports.s3.yandex.net/0/21d3a797944066cb0b520e372570717ab165f3c4/unit_tests_msan_clang-10.html

zkutil.MultiAsync

[ RUN      ] zkutil.MultiAsync
../src/Common/ZooKeeper/tests/gtest_zkutil_test_multi_exception.cpp:101: Failure
Expected equality of these values:
  res.error
    Which is: 4-byte object <FC-FF FF-FF>
  Coordination::Error::ZOK
    Which is: 4-byte object <00-00 00-00>
[  FAILED  ] zkutil.MultiAsync (11889 ms)

@qoega
Member

qoega commented Jul 28, 2020

01307_multiple_leaders https://clickhouse-test-reports.s3.yandex.net/12953/4bd6261dc164c64172c641adafec271df4c12301/functional_stateless_tests_(debug).html#fail1

2020-07-27 21:56:48 01307_multiple_leaders:                                                 [ FAIL ] 5.02 sec. - having stderror:
2020-07-27 21:56:48 [fed3a8e763c2] 2020.07.27 21:56:45.504500 [ 1835 ] {a87ba937-a84c-45bf-9e80-7d046243ecd5} <Warning> default.r1: Tried to commit obsolete part all_122_122_0 covered by all_0_136_5 (state Committed)
2020-07-27 21:56:48 

@qoega
Member

qoega commented Jul 29, 2020

01305_replica_create_drop_zookeeper https://clickhouse-test-reports.s3.yandex.net/0/795c09fdbbbe2c1f156d8b0c795cd9992297cb20/functional_stateless_tests_(unbundled).html#fail1

2020-07-28 00:50:45 [80aefae76e85] 2020.07.28 00:50:44.694754 [ 428 ] {255db9d1-b454-490b-a8f2-545af840b22d} <Error> executeQuery: Code: 999, e.displayText() = Coordination::Exception: Can't get data for node /clickhouse/tables/alter_table/columns: node doesn't exist (No node) (version 20.7.1.4189 (official build)) (from [::1]:46720) (in query: CREATE TABLE test_table_2 (a UInt8) ENGINE = ReplicatedMergeTree('/clickhouse/tables/alter_table', 'r_2') ORDER BY tuple();), Stack trace (when copying this message, always include the lines below):
2020-07-28 00:50:45 Code: 999. DB::Exception: Received from localhost:9000. DB::Exception: Can't get data for node /clickhouse/tables/alter_table/columns: node doesn't exist (No node). 

@KochetovNicolai
Member

00960_live_view_watch_events_live
https://clickhouse-test-reports.s3.yandex.net/13075/d3bcf89ae427dcb2b9cd7def3ac7296de3def723/functional_stateless_tests_(address).html#fail1

2020-07-29 19:49:57 00960_live_view_watch_events_live:                                      [ FAIL ] 120.86 sec. - return code 1
2020-07-29 19:49:57 Traceback (most recent call last):
2020-07-29 19:49:57   File "/usr/share/clickhouse-test/queries/0_stateless/00960_live_view_watch_events_live.py", line 35, in <module>
2020-07-29 19:49:57     client1.expect('2.*' + end_of_block)
2020-07-29 19:49:57   File "/usr/share/clickhouse-test/queries/0_stateless/helpers/uexpect.py", line 151, in expect
2020-07-29 19:49:57     raise exception
2020-07-29 19:49:57 uexpect.ExpectTimeoutError: Timeout 120.000s for '2.*.*\\r\\n.*\\r\\n' buffer '\x1b[1mWATCH \x1b[0mtest.lv \x1b[1mEVENTS\x1b[0m\r\n\r\n\x1b[1;30m\xe2\x86\x92\x1b[0m Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.) \x1b[K\r\x1b[K\xe2\x94\x8c\xe2\x94\x80\x1b[1mversion\x1b[0m\xe2\x94\x80\xe2\x94\x90\r\n\xe2\x94\x82       1 \xe2\x94\x82\r\n\xe2\x94\x94\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x80\xe2\x94\x98\r\n\x1b[1;31m\xe2\x86\x98\x1b[0m Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.) \x1b[K\r\x1b[K\x1b[1;32m\xe2\x86\x93\x1b[0m Progress: 0.00 rows, 0.00 B (0.00 rows/s., 0.00 B/s.) \x1b[K\r\x1b[1;33m\xe2\x86\x99\x1b[0m Progress: 1.00 rows, 8.00 B (8.22 rows/s., 65.75 B/s.) \x1b[K\r\x1b[K\x1b[1;34m\xe2\x86\x90\x1b[0m Progress: 1.00 rows, 8.00 B (0.07 rows/s., 0.53 B/s.) \x1b[K\r\x1b[K\x1b[1;35m\xe2\x86\x96\x1b[0m Progress: 1.00 rows, 8.00 B (0.03 rows/s., 0.27 B/s.) \x1b[K\r\x1b[K\x1b[1;36m\xe2\x86\x91\x1b[0m Progress: 1.00 rows, 8.00 B (0.02 rows/s., 0.18 B/s.) \x1b[K\r\x1b[K\x1b[1m\xe2\x86\x97\x1b[0m Progress: 1.00 rows, 8.00 B (0.02 rows/s., 0.13 B/s.) \x1b[K\r\x1b[K\x1b[1;30m\xe2\x86\x92\x1b[0m Progress: 1.00 rows, 8.00 B (0.01 rows/s., 0.11 B/s.) \x1b[K\r\x1b[K\x1b[1;31m\xe2\x86\x98\x1b[0m Progress: 1.00 rows, 8.00 B (0.01 rows/s., 0.09 B/s.) \x1b[K\r\x1b[K\x1b[1;32m\xe2\x86\x93\x1b[0m Progress: 1.00 rows, 8.00 B (0.01 rows/s., 0.08 B/s.) \x1b[K'

@alexey-milovidov
Member Author

kill mutation fix: #13167

@azat
Collaborator

azat commented Aug 1, 2020

@alexey-milovidov
Member Author

Sometimes the check finishes with SUCCESS but does not publish the result on GitHub due to:

[can't run :(] There are only 0 GitHub api requests from 5000 for this hour, there will be more after 2020-08-02 06:23:04
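
The message above indicates that the hourly GitHub API quota was exhausted. Below is a minimal sketch of how a reporting script could decide how long to wait before retrying, assuming it knows the remaining request count and the reset time (GitHub exposes these in the `X-RateLimit-Remaining` and `X-RateLimit-Reset` response headers). The function name is hypothetical and this is not the CI's actual code.

```python
import time
from typing import Optional


def seconds_to_wait(remaining: int, reset_epoch: float,
                    now: Optional[float] = None) -> float:
    """Return how long to sleep before the next GitHub API call.

    remaining   -- requests left in the current window
    reset_epoch -- Unix time at which the quota is refilled
    now         -- current Unix time (injectable for testing)
    """
    if now is None:
        now = time.time()
    if remaining > 0:
        # Quota is still available: no need to wait.
        return 0.0
    # Quota exhausted: wait until the reset time, never a negative amount.
    return max(0.0, reset_epoch - now)
```

A caller would sleep for the returned number of seconds and then retry publishing the status instead of dropping the result.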

@alexey-milovidov
Member Author

Stress test timeout: #13227

@alesapin
Member

alesapin commented Aug 3, 2020

test_adaptive_granularity/test.py::test_version_update_two_nodes:

https://clickhouse-test-reports.s3.yandex.net/13234/b67e2cee35bf05493289fa883077ec117d0f2a7e/integration_tests_(release).html#fail1

Exception: Cmd "bash -c pkill -9 clickhouse" failed in container bddd864e43c722b4487db1dda33d2725f8537010db49a1c4901552fe103fba13. Return code 1. Output:

Kill is broken or process died without kill?

@alesapin
Member

alesapin commented Aug 3, 2020

test_adaptive_granularity/test.py::test_version_update_two_nodes:
https://clickhouse-test-reports.s3.yandex.net/13234/b67e2cee35bf05493289fa883077ec117d0f2a7e/integration_tests_(release).html#fail1

Exception: Cmd "bash -c pkill -9 clickhouse" failed in container bddd864e43c722b4487db1dda33d2725f8537010db49a1c4901552fe103fba13. Return code 1. Output:

Kill is broken or process died without kill?

Fix https://github.com/ClickHouse/ClickHouse/pull/13278/files
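
For context, `pkill` exits with status 1 when no process matched the pattern, so return code 1 is what you would see if the server had already exited. Below is a minimal sketch of treating that case as non-fatal. The helper names are hypothetical, the command runs locally rather than inside a container, and this is not necessarily what the linked fix does.

```python
import subprocess


def interpret_pkill_status(returncode: int) -> str:
    """Map pkill's exit status to an outcome; 1 means nothing matched."""
    if returncode == 0:
        return "killed"
    if returncode == 1:
        # No process matched (or none could be signalled): the server was
        # most likely gone already, so do not fail the test here.
        return "not_running"
    # 2 is a syntax error, 3 a fatal error: these are real failures.
    raise RuntimeError("pkill failed with status %d" % returncode)


def kill_clickhouse(run=subprocess.run) -> str:
    """Hypothetical helper: send SIGKILL and tolerate an already-dead server."""
    result = run(["pkill", "-9", "clickhouse"])
    return interpret_pkill_status(result.returncode)
```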

@alesapin
Member

alesapin commented Aug 3, 2020

@alesapin
Member

alesapin commented Aug 3, 2020

materialize

Possibly fixed here https://github.com/ClickHouse/ClickHouse/pull/12549/files

@alexey-milovidov
Member Author

01396_inactive_replica_cleanup_nodes - #13906

@alesapin
Member

@alesapin https://clickhouse-test-reports.s3.yandex.net/0/d2be6c036b3b2f4bec8de985a9c620f9ae2cd28d/functional_stateless_tests_(release)/test_run.txt.out.log

Is it possible that mutations finish on replicas in a different order?
Maybe it is old behaviour that was already fixed?

Seems like it's a very old test run. Probably fixed.

@qoega
Member

qoega commented Aug 24, 2020

@alesapin
Member

test_version_update_after_mutation
https://clickhouse-test-reports.s3.yandex.net/0/55ac192417ee0d2b83e7f28c5765ae939ad2983f/integration_tests_(thread).html

E           QueryRuntimeException: Client failed! Return code: 243, stderr: Received exception from server (version 20.1.10):
E           Code: 243. DB::Exception: Received from 172.19.0.7:9000. DB::Exception: Cannot reserve 1.00 MiB, not enough space. Stack trace:
E           0. 0x10339650 Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int)  in /usr/bin/clickhouse
E           1. 0x8ea65cd DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int)  in /usr/bin/clickhouse
E           2. 0xd7fdbff ?  in /usr/bin/clickhouse
E           3. 0xd7d509e DB::MergeTreeData::reserveSpacePreferringTTLRules(unsigned long, DB::MergeTreeDataPartTTLInfos const&, long, unsigned long) const  in /usr/bin/clickhouse
E           4. 0xd8a7dbe DB::MergeTreeDataWriter::writeTempPart(DB::BlockWithPartition&)  in /usr/bin/clickhouse
E           5. 0xd9770a7 DB::ReplicatedMergeTreeBlockOutputStream::write(DB::Block const&)  in /usr/bin/clickhouse
E           6. 0xd08da8e DB::PushingToViewsBlockOutputStream::write(DB::Block const&)  in /usr/bin/clickhouse
E           7. 0xd09b7ca DB::SquashingBlockOutputStream::writeSuffix()  in /usr/bin/clickhouse
E           8. 0x8f961c2 DB::TCPHandler::processInsertQuery(DB::Settings const&)  in /usr/bin/clickhouse
E           9. 0x8f978ef DB::TCPHandler::runImpl()  in /usr/bin/clickhouse
E           10. 0x8f97ee0 DB::TCPHandler::run()  in /usr/bin/clickhouse
E           11. 0xe1ff79b Poco::Net::TCPServerConnection::start()  in /usr/bin/clickhouse
E           12. 0xe1ffc1d Poco::Net::TCPServerDispatcher::run()  in /usr/bin/clickhouse
E           13. 0x103c77b7 Poco::PooledThread::run()  in /usr/bin/clickhouse
E           14. 0x103c35bc Poco::ThreadImpl::runnableEntry(void*)  in /usr/bin/clickhouse
E           15. 0x103c4f5d ?  in /usr/bin/clickhouse
E           16. 0x76db start_thread  in /lib/x86_64-linux-gnu/libpthread-2.27.so
E           17. 0x12188f clone  in /lib/x86_64-linux-gnu/libc-2.27.so

???

@alesapin
Member

alesapin commented Aug 25, 2020

Segmentation fault during table drop in lazy database in stress test with msan:
#14027 (comment)
https://clickhouse-test-reports.s3.yandex.net/14027/ffd8c193852283fa7ec675814b6e5e3a46dc511b/stress_test_(memory).html#fail1
cc: @nikvas0

@alexey-milovidov
Member Author

01056_prepared_statements_null_and_escaping

Broken in master for an unknown reason; will investigate...

@alexey-milovidov
Member Author

Stateless tests under TSan take 3 hours to run:
https://clickhouse-test-reports.s3.yandex.net/14048/f95acff93cb17afda047f6fe2d40ebb59e3fb2f8/functional_stateless_tests_(thread)/test_run.txt.out.log

The stress test under TSan has started to time out occasionally after 5 hours.

@alexey-milovidov
Member Author

@alesapin The file skip_list.json is in JSON format - we cannot have comments there. It's very important for development.
I can add some very long tests to skip under TSan but refuse to do so if I cannot write comments.

@alesapin
Member

alesapin commented Aug 26, 2020

00633_materialized_view_and_too_many_parts_zookeeper with database atomic https://clickhouse-test-reports.s3.yandex.net/14049/e5bc5ea419d49c807dada50b1e33966537fd0ada/functional_stateless_tests_(release,_databaseatomic).html#fail1:
seems like table is not dropped from zookeeper:

2020-08-26 00:19:57 [0c996d30a64c] 2020.08.26 00:19:57.454658 [ 236893 ] {160eaa75-0f67-4671-9579-6db58f739198} <Error> executeQuery: Code: 253, e.displayText() = DB::Exception: Replica /clickhouse/test/a/replicas/1 already exists. (version 20.8.1.4463) (from [::1]:43836) (in query: CREATE MATERIALIZED VIEW a (d UInt64) ENGINE = ReplicatedMergeTree('/clickhouse/test/a', '1') ORDER BY d AS SELECT * FROM root), Stack trace (when copying this message, always include the lines below):

@alesapin
Member

@alesapin The file skip_list.json is in JSON format - we cannot have comments there. It's very important for development.
I can add some very long tests to skip under TSan but refuse to do so if I cannot write comments.

We can use YAML, but that adds one more dependency to clickhouse-test: PyYAML (https://pyyaml.org/wiki/PyYAMLDocumentation).

@alexey-milovidov
Member Author

I prefer JSON with comments: either "JSON5" or just a simple function that strips comments before passing the text to the JSON parser.
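
The "simple function" option could look roughly like the following. This is a minimal sketch, not the code that ended up in clickhouse-test: the function names and the example entries are hypothetical, and it assumes `//` and `/* */` comment styles. It tracks string literals so that slashes inside values such as URLs are left alone.

```python
import json


def strip_json_comments(text: str) -> str:
    """Remove // and /* */ comments that occur outside string literals."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                # Copy the escaped character so that \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Line comment: skip up to, but not including, the newline.
            while i < n and text[i] != "\n":
                i += 1
        elif text.startswith("/*", i):
            # Block comment: skip past the closing marker (or to the end).
            end = text.find("*/", i + 2)
            i = n if end == -1 else end + 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_json_with_comments(text: str):
    """Parse JSON text that may contain comments."""
    return json.loads(strip_json_comments(text))


example = """
{
    // Too slow under TSan (hypothetical entry).
    "thread-sanitizer": ["00001_hypothetical_long_test"],
    "url": "http://example.com/a//b" /* slashes inside strings survive */
}
"""
skip_list = load_json_with_comments(example)
```

This keeps the standard `json` module as the only parser and avoids a new dependency, at the cost of not supporting other JSON5 features such as trailing commas.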

@alexey-milovidov
Member Author

seems like table is not dropped from zookeeper

AFAIK there is DROP NO DELAY, @tavplubix ?

@tavplubix
Member

seems like table is not dropped from zookeeper

AFAIK there is DROP NO DELAY, @tavplubix ?

Race condition is still possible, because NO DELAY doesn't make DROP blocking. It can be "fixed" by increasing sleep time or by making all ZooKeeper paths in functional tests unique.

@alexey-milovidov
Member Author

alexey-milovidov commented Aug 26, 2020

@tavplubix I'm OK with changing ZooKeeper paths for specific tests...
Or maybe we can do it by enabling the Atomic database by default and utilizing the new UUID feature?
Or just rename them on the fly with a silly regexp in clickhouse-test?
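
The regexp option could be sketched as follows. It assumes ZooKeeper paths appear in queries as single-quoted literals starting with `/clickhouse/`; the function name and the `unique_id` prefix scheme are hypothetical, not existing clickhouse-test behaviour.

```python
import re

# A single-quoted literal that starts with /clickhouse/ (assumed path format).
ZK_PATH = re.compile(r"'(/clickhouse/[^']*)'")


def make_zk_paths_unique(query: str, unique_id: str) -> str:
    """Prefix every quoted /clickhouse/... path with a per-run identifier."""
    return ZK_PATH.sub(lambda m: "'/" + unique_id + m.group(1) + "'", query)


query = (
    "CREATE TABLE t (a UInt8) "
    "ENGINE = ReplicatedMergeTree('/clickhouse/test/a', '1') ORDER BY a"
)
rewritten = make_zk_paths_unique(query, "run_42")
```

Only the path literal is rewritten; the replica name `'1'` is untouched because it does not start with `/clickhouse/`. Tests whose reference output prints the path would still need separate handling.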

@alesapin
Member

alesapin commented Aug 27, 2020

test_version_update_after_mutation
https://clickhouse-test-reports.s3.yandex.net/0/55ac192417ee0d2b83e7f28c5765ae939ad2983f/integration_tests_(thread).html

E           QueryRuntimeException: Client failed! Return code: 243, stderr: Received exception from server (version 20.1.10):
E           Code: 243. DB::Exception: Received from 172.19.0.7:9000. DB::Exception: Cannot reserve 1.00 MiB, not enough space. Stack trace:
E           0. 0x10339650 Poco::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int)  in /usr/bin/clickhouse
E           1. 0x8ea65cd DB::Exception::Exception(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, int)  in /usr/bin/clickhouse
E           2. 0xd7fdbff ?  in /usr/bin/clickhouse
E           3. 0xd7d509e DB::MergeTreeData::reserveSpacePreferringTTLRules(unsigned long, DB::MergeTreeDataPartTTLInfos const&, long, unsigned long) const  in /usr/bin/clickhouse
E           4. 0xd8a7dbe DB::MergeTreeDataWriter::writeTempPart(DB::BlockWithPartition&)  in /usr/bin/clickhouse
E           5. 0xd9770a7 DB::ReplicatedMergeTreeBlockOutputStream::write(DB::Block const&)  in /usr/bin/clickhouse
E           6. 0xd08da8e DB::PushingToViewsBlockOutputStream::write(DB::Block const&)  in /usr/bin/clickhouse
E           7. 0xd09b7ca DB::SquashingBlockOutputStream::writeSuffix()  in /usr/bin/clickhouse
E           8. 0x8f961c2 DB::TCPHandler::processInsertQuery(DB::Settings const&)  in /usr/bin/clickhouse
E           9. 0x8f978ef DB::TCPHandler::runImpl()  in /usr/bin/clickhouse
E           10. 0x8f97ee0 DB::TCPHandler::run()  in /usr/bin/clickhouse
E           11. 0xe1ff79b Poco::Net::TCPServerConnection::start()  in /usr/bin/clickhouse
E           12. 0xe1ffc1d Poco::Net::TCPServerDispatcher::run()  in /usr/bin/clickhouse
E           13. 0x103c77b7 Poco::PooledThread::run()  in /usr/bin/clickhouse
E           14. 0x103c35bc Poco::ThreadImpl::runnableEntry(void*)  in /usr/bin/clickhouse
E           15. 0x103c4f5d ?  in /usr/bin/clickhouse
E           16. 0x76db start_thread  in /lib/x86_64-linux-gnu/libpthread-2.27.so
E           17. 0x12188f clone  in /lib/x86_64-linux-gnu/libc-2.27.so

???

I've also checked the logs and, for some reason, there was no free space...
Not executing log entry MUTATE_PART for part all_0_0_0_1 because source parts size (392.70 KiB) is
OK, I've slightly increased the space requirements for integration tests in CI and fixed a possible flap: #14158

@alesapin
Member

@alesapin The file skip_list.json is in JSON format - we cannot have comments there. It's very important for development.
I can add some very long tests to skip under TSan but refuse to do so if I cannot write comments.

#14159

@alexey-milovidov
Member Author

alexey-milovidov commented Aug 27, 2020

01193_metadata_loading

It's also too long. We are looking for ideas on how to rewrite this test.
It should check that metadata is loaded in parallel, but timing issues make it difficult to wrap it into a functional test.

@alesapin
Member

Massive build failures with "inner CI problem": https://clickhouse-builds.s3.yandex.net/0/478adb75ef53333b334b5b4a0a44c47e6ed52c32/clickhouse_build_check/report.html

Retries added to CI
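
As an illustration of the general idea only (the actual CI change is not shown in this thread), a retry wrapper with exponential backoff might look like this; the function and parameter names are hypothetical.

```python
import time


def retry(action, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call action() up to `attempts` times, doubling the delay each time.

    The last exception is re-raised if every attempt fails.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # a real script would catch narrower errors
            last_error = error
            if attempt + 1 < attempts:
                # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Retries hide transient infrastructure errors, but they also hide real regressions that fail intermittently, so it helps to log every failed attempt.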

@alexey-milovidov
Member Author

OOM in build:

2020-08-28 16:39:14 g++-9: fatal error: Killed signal terminated program cc1plus
2020-08-28 16:39:14 compilation terminated.

https://clickhouse-builds.s3.yandex.net/14223/fc84d12542cd3e69c60e8be4814d3a62e782c40f/clickhouse_build_check/report.html#fail1

@alexey-milovidov
Member Author

The 01442_merge_detach_attach test highlights issues if we enable Thread Fuzzer:

https://clickhouse-test-reports.s3.yandex.net/9813/e412017200ef3300d1b36bcafdf46410dd5f4cfc/fast_test/runlog.out.log

@4ertus2
Contributor

4ertus2 commented Aug 31, 2020

00693_max_block_size_system_tables_columns flapped
https://clickhouse-test-reports.s3.yandex.net/14219/f51e0bfeb0d7b93d8e0da9ceae0e4013d7d57927/fast_test.html#fail1

2020-08-28 17:27:07 00693_max_block_size_system_tables_columns:                             [ FAIL ] - result differs with reference:
2020-08-28 17:27:07 @@ -7,4 +7,4 @@
2020-08-28 17:27:07  1
2020-08-28 17:27:07  1
2020-08-28 17:27:07  1
2020-08-28 17:27:07 -1
2020-08-28 17:27:07 +0
2020-08-28 17:27:07 

@alexey-milovidov alexey-milovidov unpinned this issue Sep 2, 2020
@qoega
Member

qoega commented Sep 2, 2020

Opened new CI Fest #14414

@qoega qoega closed this as completed Sep 2, 2020
8 participants