Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix INSERT into Distributed() table with MATERIALIZED column #5429

Merged

Conversation

azat
Copy link
Collaborator

@azat azat commented May 27, 2019

By just skipping MATERIALIZED columns during processing.

P.S. you cannot use insert_allow_materialized_columns since it works
only for Buffer() engine.

Fixes: #4015
Fixes: #3673
Fixes: 01501fa ("correct column list for rewritten INSERT query into Distributed [#CLICKHOUSE-4161]")

Category (leave one):

  • Bug Fix

@azat azat force-pushed the INSERT-into-Distributed-with-MATERIALIZED branch from 8f0c0d2 to a06d492 Compare May 27, 2019 20:31
By just skipping MATERIALIZED columns during processing.

P.S. you cannot use insert_allow_materialized_columns since it works
only for Buffer() engine.

Fixes: ClickHouse#4015
Fixes: ClickHouse#3673
Fixes: 01501fa ("correct column list
for rewritten INSERT query into Distributed [#CLICKHOUSE-4161]")
@azat azat force-pushed the INSERT-into-Distributed-with-MATERIALIZED branch from a06d492 to e527def Compare May 27, 2019 20:32
@azat
Copy link
Collaborator Author

azat commented May 30, 2019

Any thoughts on this?

@vitlibar vitlibar self-assigned this May 31, 2019
@vitlibar
Copy link
Member

vitlibar commented May 31, 2019

Any thoughts on this?

Looks good.
I've simplified your solution slightly and I'll merge it after passing automatic checks.

@vitlibar
Copy link
Member

vitlibar commented Jun 1, 2019

I guess that this will resolve table overlapping, just ignore my question

Yes, we want to run as many tests as possible in parallel.

@vitlibar vitlibar merged commit a4ce427 into ClickHouse:master Jun 1, 2019
@vitlibar vitlibar added the pr-bugfix Pull request with bugfix, not backported by default label Jun 1, 2019
@azat azat deleted the INSERT-into-Distributed-with-MATERIALIZED branch June 2, 2019 16:18
azat added a commit to azat/ClickHouse that referenced this pull request Oct 23, 2019
Previous patch e527def ("Fix INSERT
into Distributed() table with MATERIALIZED column") fixes it only for
cases when the node is local, i.e. direct insert.

This patch address the problem when the node is not local
(`is_local == false`), by erasing materialized columns on INSERT into
Distributed.

And this patch fixes two cases, depends on `insert_distributed_sync`
setting:

- `insert_distributed_sync=0`

    ```
    Not found column value in block. There are only columns: date. Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5d6cf6 DB::Block::getByName(...) dbms/src/Core/Block.cpp:187
    4. 0x7fffec2fe067 DB::NativeBlockInputStream::readImpl() dbms/src/DataStreams/NativeBlockInputStream.cpp:159
    5. 0x7fffec2d223f DB::IBlockInputStream::read() dbms/src/DataStreams/IBlockInputStream.cpp:61
    6. 0x7ffff7c6d40d DB::TCPHandler::receiveData() dbms/programs/server/TCPHandler.cpp:971
    7. 0x7ffff7c6cc1d DB::TCPHandler::receivePacket() dbms/programs/server/TCPHandler.cpp:855
    8. 0x7ffff7c6a1ef DB::TCPHandler::readDataNext(unsigned long const&, int const&) dbms/programs/server/TCPHandler.cpp:406
    9. 0x7ffff7c6a41b DB::TCPHandler::readData(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:437
    10. 0x7ffff7c6a5d9 DB::TCPHandler::processInsertQuery(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:464
    11. 0x7ffff7c687b5 DB::TCPHandler::runImpl() dbms/programs/server/TCPHandler.cpp:257
    ```

- `insert_distributed_sync=1`

    ```
    2019.10.18 13:23:22.114578 [ 44 ] {a78f669f-0b08-4337-abf8-d31e958f6d12} <Error> executeQuery: Code: 171, e.displayText() = DB::Exception: Block structure mismatch in RemoteBlockOutputStream stream: different number of columns:
    date Date UInt16(size = 1), value Date UInt16(size = 1)
    date Date UInt16(size = 0): Insertion status:
    Wrote 1 blocks and 0 rows on shard 0 replica 0, 127.0.0.1:59000 (average 0 ms per block)
    Wrote 0 blocks and 0 rows on shard 1 replica 0, 127.0.0.2:59000 (average 2 ms per block)
     (version 19.16.1.1) (from [::1]:3624) (in query: INSERT INTO distributed_00952 VALUES ), Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5da4e9 DB::checkBlockStructure<void>(...)::{...}::operator()(...) const dbms/src/Core/Block.cpp:460
    4. 0x7fffec5da671 void DB::checkBlockStructure<void>(...) dbms/src/Core/Block.cpp:467
    5. 0x7fffec5d8d58 DB::assertBlocksHaveEqualStructure(...) dbms/src/Core/Block.cpp:515
    6. 0x7fffec326630 DB::RemoteBlockOutputStream::write(DB::Block const&) dbms/src/DataStreams/RemoteBlockOutputStream.cpp:68
    7. 0x7fffe98bd154 DB::DistributedBlockOutputStream::runWritingJob(DB::DistributedBlockOutputStream::JobReplica&, DB::Block const&)::{lambda()ClickHouse#1}::operator()() const dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp:280
    <snip>
    ````

Fixes: ClickHouse#7365
Fixes: ClickHouse#5429
Refs: ClickHouse#6891
akuzm added a commit that referenced this pull request Oct 24, 2019
* Fix INSERT into Distributed non local node with MATERIALIZED columns

Previous patch e527def ("Fix INSERT
into Distributed() table with MATERIALIZED column") fixes it only for
cases when the node is local, i.e. direct insert.

This patch address the problem when the node is not local
(`is_local == false`), by erasing materialized columns on INSERT into
Distributed.

And this patch fixes two cases, depends on `insert_distributed_sync`
setting:

- `insert_distributed_sync=0`

    ```
    Not found column value in block. There are only columns: date. Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5d6cf6 DB::Block::getByName(...) dbms/src/Core/Block.cpp:187
    4. 0x7fffec2fe067 DB::NativeBlockInputStream::readImpl() dbms/src/DataStreams/NativeBlockInputStream.cpp:159
    5. 0x7fffec2d223f DB::IBlockInputStream::read() dbms/src/DataStreams/IBlockInputStream.cpp:61
    6. 0x7ffff7c6d40d DB::TCPHandler::receiveData() dbms/programs/server/TCPHandler.cpp:971
    7. 0x7ffff7c6cc1d DB::TCPHandler::receivePacket() dbms/programs/server/TCPHandler.cpp:855
    8. 0x7ffff7c6a1ef DB::TCPHandler::readDataNext(unsigned long const&, int const&) dbms/programs/server/TCPHandler.cpp:406
    9. 0x7ffff7c6a41b DB::TCPHandler::readData(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:437
    10. 0x7ffff7c6a5d9 DB::TCPHandler::processInsertQuery(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:464
    11. 0x7ffff7c687b5 DB::TCPHandler::runImpl() dbms/programs/server/TCPHandler.cpp:257
    ```

- `insert_distributed_sync=1`

    ```
    2019.10.18 13:23:22.114578 [ 44 ] {a78f669f-0b08-4337-abf8-d31e958f6d12} <Error> executeQuery: Code: 171, e.displayText() = DB::Exception: Block structure mismatch in RemoteBlockOutputStream stream: different number of columns:
    date Date UInt16(size = 1), value Date UInt16(size = 1)
    date Date UInt16(size = 0): Insertion status:
    Wrote 1 blocks and 0 rows on shard 0 replica 0, 127.0.0.1:59000 (average 0 ms per block)
    Wrote 0 blocks and 0 rows on shard 1 replica 0, 127.0.0.2:59000 (average 2 ms per block)
     (version 19.16.1.1) (from [::1]:3624) (in query: INSERT INTO distributed_00952 VALUES ), Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5da4e9 DB::checkBlockStructure<void>(...)::{...}::operator()(...) const dbms/src/Core/Block.cpp:460
    4. 0x7fffec5da671 void DB::checkBlockStructure<void>(...) dbms/src/Core/Block.cpp:467
    5. 0x7fffec5d8d58 DB::assertBlocksHaveEqualStructure(...) dbms/src/Core/Block.cpp:515
    6. 0x7fffec326630 DB::RemoteBlockOutputStream::write(DB::Block const&) dbms/src/DataStreams/RemoteBlockOutputStream.cpp:68
    7. 0x7fffe98bd154 DB::DistributedBlockOutputStream::runWritingJob(DB::DistributedBlockOutputStream::JobReplica&, DB::Block const&)::{lambda()#1}::operator()() const dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp:280
    <snip>
    ````

Fixes: #7365
Fixes: #5429
Refs: #6891

* Cover INSERT into Distributed with MATERIALIZED columns and !is_local node

I guess that adding new cluster into server-test.xml is not required,
but it won't harm.

* Update DistributedBlockOutputStream.cpp
akuzm added a commit that referenced this pull request Oct 29, 2019
* Fix INSERT into Distributed non local node with MATERIALIZED columns

Previous patch e527def ("Fix INSERT
into Distributed() table with MATERIALIZED column") fixes it only for
cases when the node is local, i.e. direct insert.

This patch address the problem when the node is not local
(`is_local == false`), by erasing materialized columns on INSERT into
Distributed.

And this patch fixes two cases, depends on `insert_distributed_sync`
setting:

- `insert_distributed_sync=0`

    ```
    Not found column value in block. There are only columns: date. Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5d6cf6 DB::Block::getByName(...) dbms/src/Core/Block.cpp:187
    4. 0x7fffec2fe067 DB::NativeBlockInputStream::readImpl() dbms/src/DataStreams/NativeBlockInputStream.cpp:159
    5. 0x7fffec2d223f DB::IBlockInputStream::read() dbms/src/DataStreams/IBlockInputStream.cpp:61
    6. 0x7ffff7c6d40d DB::TCPHandler::receiveData() dbms/programs/server/TCPHandler.cpp:971
    7. 0x7ffff7c6cc1d DB::TCPHandler::receivePacket() dbms/programs/server/TCPHandler.cpp:855
    8. 0x7ffff7c6a1ef DB::TCPHandler::readDataNext(unsigned long const&, int const&) dbms/programs/server/TCPHandler.cpp:406
    9. 0x7ffff7c6a41b DB::TCPHandler::readData(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:437
    10. 0x7ffff7c6a5d9 DB::TCPHandler::processInsertQuery(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:464
    11. 0x7ffff7c687b5 DB::TCPHandler::runImpl() dbms/programs/server/TCPHandler.cpp:257
    ```

- `insert_distributed_sync=1`

    ```
    2019.10.18 13:23:22.114578 [ 44 ] {a78f669f-0b08-4337-abf8-d31e958f6d12} <Error> executeQuery: Code: 171, e.displayText() = DB::Exception: Block structure mismatch in RemoteBlockOutputStream stream: different number of columns:
    date Date UInt16(size = 1), value Date UInt16(size = 1)
    date Date UInt16(size = 0): Insertion status:
    Wrote 1 blocks and 0 rows on shard 0 replica 0, 127.0.0.1:59000 (average 0 ms per block)
    Wrote 0 blocks and 0 rows on shard 1 replica 0, 127.0.0.2:59000 (average 2 ms per block)
     (version 19.16.1.1) (from [::1]:3624) (in query: INSERT INTO distributed_00952 VALUES ), Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5da4e9 DB::checkBlockStructure<void>(...)::{...}::operator()(...) const dbms/src/Core/Block.cpp:460
    4. 0x7fffec5da671 void DB::checkBlockStructure<void>(...) dbms/src/Core/Block.cpp:467
    5. 0x7fffec5d8d58 DB::assertBlocksHaveEqualStructure(...) dbms/src/Core/Block.cpp:515
    6. 0x7fffec326630 DB::RemoteBlockOutputStream::write(DB::Block const&) dbms/src/DataStreams/RemoteBlockOutputStream.cpp:68
    7. 0x7fffe98bd154 DB::DistributedBlockOutputStream::runWritingJob(DB::DistributedBlockOutputStream::JobReplica&, DB::Block const&)::{lambda()#1}::operator()() const dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp:280
    <snip>
    ````

Fixes: #7365
Fixes: #5429
Refs: #6891

* Cover INSERT into Distributed with MATERIALIZED columns and !is_local node

I guess that adding new cluster into server-test.xml is not required,
but it won't harm.

* Update DistributedBlockOutputStream.cpp

(cherry picked from commit 29052b6)
akuzm added a commit that referenced this pull request Oct 29, 2019
* Fix INSERT into Distributed non local node with MATERIALIZED columns

Previous patch e527def ("Fix INSERT
into Distributed() table with MATERIALIZED column") fixes it only for
cases when the node is local, i.e. direct insert.

This patch address the problem when the node is not local
(`is_local == false`), by erasing materialized columns on INSERT into
Distributed.

And this patch fixes two cases, depends on `insert_distributed_sync`
setting:

- `insert_distributed_sync=0`

    ```
    Not found column value in block. There are only columns: date. Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5d6cf6 DB::Block::getByName(...) dbms/src/Core/Block.cpp:187
    4. 0x7fffec2fe067 DB::NativeBlockInputStream::readImpl() dbms/src/DataStreams/NativeBlockInputStream.cpp:159
    5. 0x7fffec2d223f DB::IBlockInputStream::read() dbms/src/DataStreams/IBlockInputStream.cpp:61
    6. 0x7ffff7c6d40d DB::TCPHandler::receiveData() dbms/programs/server/TCPHandler.cpp:971
    7. 0x7ffff7c6cc1d DB::TCPHandler::receivePacket() dbms/programs/server/TCPHandler.cpp:855
    8. 0x7ffff7c6a1ef DB::TCPHandler::readDataNext(unsigned long const&, int const&) dbms/programs/server/TCPHandler.cpp:406
    9. 0x7ffff7c6a41b DB::TCPHandler::readData(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:437
    10. 0x7ffff7c6a5d9 DB::TCPHandler::processInsertQuery(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:464
    11. 0x7ffff7c687b5 DB::TCPHandler::runImpl() dbms/programs/server/TCPHandler.cpp:257
    ```

- `insert_distributed_sync=1`

    ```
    2019.10.18 13:23:22.114578 [ 44 ] {a78f669f-0b08-4337-abf8-d31e958f6d12} <Error> executeQuery: Code: 171, e.displayText() = DB::Exception: Block structure mismatch in RemoteBlockOutputStream stream: different number of columns:
    date Date UInt16(size = 1), value Date UInt16(size = 1)
    date Date UInt16(size = 0): Insertion status:
    Wrote 1 blocks and 0 rows on shard 0 replica 0, 127.0.0.1:59000 (average 0 ms per block)
    Wrote 0 blocks and 0 rows on shard 1 replica 0, 127.0.0.2:59000 (average 2 ms per block)
     (version 19.16.1.1) (from [::1]:3624) (in query: INSERT INTO distributed_00952 VALUES ), Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5da4e9 DB::checkBlockStructure<void>(...)::{...}::operator()(...) const dbms/src/Core/Block.cpp:460
    4. 0x7fffec5da671 void DB::checkBlockStructure<void>(...) dbms/src/Core/Block.cpp:467
    5. 0x7fffec5d8d58 DB::assertBlocksHaveEqualStructure(...) dbms/src/Core/Block.cpp:515
    6. 0x7fffec326630 DB::RemoteBlockOutputStream::write(DB::Block const&) dbms/src/DataStreams/RemoteBlockOutputStream.cpp:68
    7. 0x7fffe98bd154 DB::DistributedBlockOutputStream::runWritingJob(DB::DistributedBlockOutputStream::JobReplica&, DB::Block const&)::{lambda()#1}::operator()() const dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp:280
    <snip>
    ````

Fixes: #7365
Fixes: #5429
Refs: #6891

* Cover INSERT into Distributed with MATERIALIZED columns and !is_local node

I guess that adding new cluster into server-test.xml is not required,
but it won't harm.

* Update DistributedBlockOutputStream.cpp

(cherry picked from commit 29052b6)
CurtizJ pushed a commit that referenced this pull request Nov 15, 2019
* Fix INSERT into Distributed non local node with MATERIALIZED columns

Previous patch e527def ("Fix INSERT
into Distributed() table with MATERIALIZED column") fixes it only for
cases when the node is local, i.e. direct insert.

This patch address the problem when the node is not local
(`is_local == false`), by erasing materialized columns on INSERT into
Distributed.

And this patch fixes two cases, depends on `insert_distributed_sync`
setting:

- `insert_distributed_sync=0`

    ```
    Not found column value in block. There are only columns: date. Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5d6cf6 DB::Block::getByName(...) dbms/src/Core/Block.cpp:187
    4. 0x7fffec2fe067 DB::NativeBlockInputStream::readImpl() dbms/src/DataStreams/NativeBlockInputStream.cpp:159
    5. 0x7fffec2d223f DB::IBlockInputStream::read() dbms/src/DataStreams/IBlockInputStream.cpp:61
    6. 0x7ffff7c6d40d DB::TCPHandler::receiveData() dbms/programs/server/TCPHandler.cpp:971
    7. 0x7ffff7c6cc1d DB::TCPHandler::receivePacket() dbms/programs/server/TCPHandler.cpp:855
    8. 0x7ffff7c6a1ef DB::TCPHandler::readDataNext(unsigned long const&, int const&) dbms/programs/server/TCPHandler.cpp:406
    9. 0x7ffff7c6a41b DB::TCPHandler::readData(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:437
    10. 0x7ffff7c6a5d9 DB::TCPHandler::processInsertQuery(DB::Settings const&) dbms/programs/server/TCPHandler.cpp:464
    11. 0x7ffff7c687b5 DB::TCPHandler::runImpl() dbms/programs/server/TCPHandler.cpp:257
    ```

- `insert_distributed_sync=1`

    ```
    2019.10.18 13:23:22.114578 [ 44 ] {a78f669f-0b08-4337-abf8-d31e958f6d12} <Error> executeQuery: Code: 171, e.displayText() = DB::Exception: Block structure mismatch in RemoteBlockOutputStream stream: different number of columns:
    date Date UInt16(size = 1), value Date UInt16(size = 1)
    date Date UInt16(size = 0): Insertion status:
    Wrote 1 blocks and 0 rows on shard 0 replica 0, 127.0.0.1:59000 (average 0 ms per block)
    Wrote 0 blocks and 0 rows on shard 1 replica 0, 127.0.0.2:59000 (average 2 ms per block)
     (version 19.16.1.1) (from [::1]:3624) (in query: INSERT INTO distributed_00952 VALUES ), Stack trace:

    2. 0x7ffff7be92e0 DB::Exception::Exception() dbms/src/Common/Exception.h:27
    3. 0x7fffec5da4e9 DB::checkBlockStructure<void>(...)::{...}::operator()(...) const dbms/src/Core/Block.cpp:460
    4. 0x7fffec5da671 void DB::checkBlockStructure<void>(...) dbms/src/Core/Block.cpp:467
    5. 0x7fffec5d8d58 DB::assertBlocksHaveEqualStructure(...) dbms/src/Core/Block.cpp:515
    6. 0x7fffec326630 DB::RemoteBlockOutputStream::write(DB::Block const&) dbms/src/DataStreams/RemoteBlockOutputStream.cpp:68
    7. 0x7fffe98bd154 DB::DistributedBlockOutputStream::runWritingJob(DB::DistributedBlockOutputStream::JobReplica&, DB::Block const&)::{lambda()#1}::operator()() const dbms/src/Storages/Distributed/DistributedBlockOutputStream.cpp:280
    <snip>
    ````

Fixes: #7365
Fixes: #5429
Refs: #6891

* Cover INSERT into Distributed with MATERIALIZED columns and !is_local node

I guess that adding new cluster into server-test.xml is not required,
but it won't harm.

* Update DistributedBlockOutputStream.cpp

(cherry picked from commit 29052b6)
@alexey-milovidov
Copy link
Member

@azat How comes it does not work in version 20.3?
#16897

@azat
Copy link
Collaborator Author

azat commented Apr 17, 2021

@alexey-milovidov list of non-materialized columns is obtained from the Distributed table, while in #16897 the Distributed table has all columns non-materialized that's why it wasn't filtered out and rejected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-bugfix Pull request with bugfix, not backported by default
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Insert is not working for distributed tables with materialized columns
5 participants