Polymorphic parts (compact format). #8290

Merged
merged 99 commits into from Feb 23, 2020
99 commits
b433add
polymorphic parts (development)
CurtizJ Oct 10, 2019
18163e4
polymorphic parts (development)
CurtizJ Oct 11, 2019
8ba37da
polymorphic parts (development)
CurtizJ Oct 16, 2019
3ebb2ab
polymorphic parts (development)
CurtizJ Oct 19, 2019
ce36cf8
polymorphic parts (development)
CurtizJ Oct 20, 2019
1297991
polymorphic parts (development)
CurtizJ Oct 21, 2019
8df8bce
polymorphic parts (development)
CurtizJ Oct 21, 2019
1d3f005
polymorphic parts (development)
CurtizJ Oct 21, 2019
8b880d6
polymorphic parts (development)
CurtizJ Oct 22, 2019
715ae5a
polymorphic parts (development)
CurtizJ Oct 22, 2019
32858c4
polymorphic parts (development)
CurtizJ Oct 23, 2019
5484f4b
polymorphic parts (development)
CurtizJ Oct 28, 2019
35b7363
polymorphic parts (development)
CurtizJ Oct 31, 2019
7293841
polymorphic parts (development)
CurtizJ Nov 5, 2019
8cf6236
polymorphic parts (development)
CurtizJ Nov 7, 2019
c070254
polymorphic parts (development)
CurtizJ Nov 8, 2019
f6b1fc5
polymorphic parts (development)
CurtizJ Nov 13, 2019
6cd6af7
polymorphic parts (development)
CurtizJ Nov 18, 2019
e1d13ea
polymorphic parts (development)
CurtizJ Nov 18, 2019
426c62a
polymorphic parts (development)
CurtizJ Nov 20, 2019
4350601
polymorphic parts (development)
CurtizJ Nov 20, 2019
94abf36
polymorphic parts (development)
CurtizJ Nov 21, 2019
43b4c4c
polymorphic parts (development)
CurtizJ Nov 21, 2019
49982ad
polymorphic parts (development)
CurtizJ Nov 22, 2019
921d324
polymorphic parts (development)
CurtizJ Nov 22, 2019
b54f162
polymorphic parts (development)
CurtizJ Nov 25, 2019
9e7adf4
polymorphic parts (development)
CurtizJ Nov 25, 2019
49e465d
polymorphic parts (development)
CurtizJ Nov 26, 2019
d1ddfbb
polymorphic parts (development)
CurtizJ Nov 27, 2019
55deeea
polymorphic parts (development)
CurtizJ Nov 27, 2019
7dbdbff
polymorphic parts (development)
CurtizJ Nov 28, 2019
a3875a6
polymorphic parts (development)
CurtizJ Dec 2, 2019
511ae82
polymorphic parts (development) fix adjust last granule
CurtizJ Dec 2, 2019
be0e13d
polymorphic parts (development) columns sizes
CurtizJ Dec 3, 2019
31ffad0
polymorphic parts (development) columns sizes
CurtizJ Dec 3, 2019
9df0d45
polymorphic parts (development) fix prewhere
CurtizJ Dec 5, 2019
7803aee
polymorphic parts (development) fix prewhere
CurtizJ Dec 5, 2019
bd08520
polymorphic parts (development)
CurtizJ Dec 5, 2019
d3b0800
polymorphic parts (development) alter
CurtizJ Dec 9, 2019
26d159e
polymorphic parts (development) alter
CurtizJ Dec 12, 2019
831f39a
polymorphic parts (development) alter
CurtizJ Dec 16, 2019
59faa49
polymorphic parts (development) alter update
CurtizJ Dec 18, 2019
ae74d28
polymorphic parts (development) fix alter
CurtizJ Dec 18, 2019
55b7db7
polymorphic parts (development) cleanup
CurtizJ Dec 18, 2019
6f67340
polymorphic parts (development) cleanup
CurtizJ Dec 18, 2019
258e8d6
polymorphic parts (development) cleanup
CurtizJ Dec 18, 2019
9db2f2c
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Dec 19, 2019
ba2a630
merging with master
CurtizJ Dec 19, 2019
206cb1a
fix broken by refactoring functionality with wide parts
CurtizJ Dec 19, 2019
4bd4ac7
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Dec 25, 2019
c8393f2
fix mutations with mixed-granularity parts
CurtizJ Dec 25, 2019
aadb948
temporarly store all parts in compact format
CurtizJ Dec 25, 2019
c298616
reduce number of seeks in ReaderCompact
CurtizJ Dec 25, 2019
0b99df9
better column initialization in data parts
CurtizJ Dec 25, 2019
74d5c6e
better writer for compact parts
CurtizJ Dec 27, 2019
ccb15e6
better granularity computing
CurtizJ Dec 27, 2019
33ae978
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Jan 9, 2020
2495849
fix reading of nested columns in compact format
CurtizJ Jan 9, 2020
9752180
fix reading of nested columns in compact format
CurtizJ Jan 9, 2020
6f09b5f
fix reading of nested columns in compact format
CurtizJ Jan 9, 2020
bae3aa3
simplify data part checking
CurtizJ Jan 13, 2020
1011675
avoid errors with compact non-adaptive parts
CurtizJ Jan 13, 2020
f156962
add part type to system.parts table
CurtizJ Jan 13, 2020
18eacfe
ignore compact parts in MergeTreeWhereOptimizer
CurtizJ Jan 13, 2020
ce914cb
refactor code near MergeTreeDataPart
CurtizJ Jan 14, 2020
27750f0
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Jan 15, 2020
b3bd306
improve performance of compact parts
CurtizJ Jan 15, 2020
3ff8f42
remove almost useless columns sizes from compact parts
CurtizJ Jan 15, 2020
7a549b2
implement 'checkConsistency' method in compact parts
CurtizJ Jan 15, 2020
2797873
code cleanup
CurtizJ Jan 16, 2020
b0906ab
code cleanup
CurtizJ Jan 17, 2020
d073187
fix mutations
CurtizJ Jan 20, 2020
6a29525
add some comments
CurtizJ Jan 21, 2020
1370987
add some tests
CurtizJ Jan 21, 2020
9275225
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Jan 21, 2020
8183a79
make all parts wide by default
CurtizJ Jan 22, 2020
771e429
fix tests
CurtizJ Jan 22, 2020
2d7ff40
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Feb 3, 2020
257bb3b
add comments near DataPart code
CurtizJ Feb 3, 2020
a0635ed
better marks reading
CurtizJ Feb 3, 2020
1785b27
fix build
CurtizJ Feb 3, 2020
31c39c8
fix mutations
CurtizJ Feb 4, 2020
2f9f5df
better initialization of MergedBlockOutputStream
CurtizJ Feb 4, 2020
2780250
better code near data part writer
CurtizJ Feb 6, 2020
c72c38a
require strict part type in MergeTreeReaders
CurtizJ Feb 10, 2020
b26a8b5
choose part type while selecting parts to merge
CurtizJ Feb 11, 2020
59c4f53
fix polymorphic parts fetching
CurtizJ Feb 11, 2020
64e1883
better replication with compact parts
CurtizJ Feb 13, 2020
6e1734f
remove rarely used createPart overload
CurtizJ Feb 13, 2020
d39179b
add integration tests
CurtizJ Feb 13, 2020
ddb3a55
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Feb 14, 2020
73b1ac0
add integration tests for polymorphic parts
CurtizJ Feb 14, 2020
cb9936c
better checking if we can use polymorphic parts
CurtizJ Feb 14, 2020
1612fef
better test
CurtizJ Feb 17, 2020
2359299
comments and refactoring
CurtizJ Feb 19, 2020
6bc5d98
tests for compact parts
CurtizJ Feb 19, 2020
3f7f13c
Merge remote-tracking branch 'upstream/master' into polymorphic-parts
CurtizJ Feb 20, 2020
1950923
Merge branch 'master' into polymorphic-parts
alexey-milovidov Feb 22, 2020
b736029
Update ErrorCodes.cpp
CurtizJ Feb 22, 2020
1 change: 1 addition & 0 deletions dbms/src/Common/ErrorCodes.cpp
@@ -484,6 +484,7 @@ namespace ErrorCodes
  extern const int CACHE_DICTIONARY_UPDATE_FAIL = 510;
  extern const int UNKNOWN_ROLE = 511;
  extern const int SET_NON_GRANTED_ROLE = 512;
+ extern const int UNKNOWN_PART_TYPE = 513;

  extern const int KEEPER_EXCEPTION = 999;
  extern const int POCO_EXCEPTION = 1000;
11 changes: 10 additions & 1 deletion dbms/src/DataStreams/MarkInCompressedFile.h
@@ -40,6 +40,15 @@ struct MarkInCompressedFile

  };

- using MarksInCompressedFile = PODArray<MarkInCompressedFile>;
+ class MarksInCompressedFile : public PODArray<MarkInCompressedFile>
+ {
+ public:
+ MarksInCompressedFile(size_t n) : PODArray(n) {}
+
+ void read(ReadBuffer & buffer, size_t from, size_t count)
+ {
+ buffer.readStrict(reinterpret_cast<char *>(data() + from), count * sizeof(MarkInCompressedFile));

Review comment (Member): Looks dangerous because it neither checks nor resizes the array.

+ }
+ };

}
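
The review comment on `read` points out that `count` marks are written starting at index `from` without verifying that the range fits in the array. A minimal sketch of a checked variant might look like the following. `CheckedMarks`, `MemoryReader`, and `Mark` are hypothetical stand-ins (a `std::vector` in place of `PODArray`, a simple in-memory reader in place of `ReadBuffer`); this is not the code that was merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for MarkInCompressedFile: two offsets per mark.
struct Mark
{
    std::uint64_t offset_in_compressed_file = 0;
    std::uint64_t offset_in_decompressed_block = 0;
};

// Hypothetical stand-in for ReadBuffer: copies exactly n bytes or throws.
class MemoryReader
{
public:
    MemoryReader(const char * data_, std::size_t size_) : data(data_), size(size_) {}

    void readStrict(char * to, std::size_t n)
    {
        if (n > size - pos)
            throw std::runtime_error("Cannot read all data");
        std::memcpy(to, data + pos, n);
        pos += n;
    }

private:
    const char * data;
    std::size_t size;
    std::size_t pos = 0;
};

class CheckedMarks : public std::vector<Mark>
{
public:
    explicit CheckedMarks(std::size_t n) : std::vector<Mark>(n) {}

    // Reads `count` marks into positions [from, from + count).
    // Rejects ranges that do not fit instead of writing past the end.
    void read(MemoryReader & buffer, std::size_t from, std::size_t count)
    {
        if (from > size() || count > size() - from)
            throw std::out_of_range("Mark range is outside of the array");
        if (count == 0)
            return;
        buffer.readStrict(reinterpret_cast<char *>(data() + from), count * sizeof(Mark));
    }
};
```

The check is written as `count > size() - from` rather than `from + count > size()` so that it cannot overflow. Resizing the array inside `read` would be the other option the comment hints at; which one fits depends on whether callers are expected to size the array up front.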
4 changes: 2 additions & 2 deletions dbms/src/DataStreams/TTLBlockInputStream.cpp
@@ -36,7 +36,7 @@ TTLBlockInputStream::TTLBlockInputStream(
  {
  if (force || isTTLExpired(ttl_info.min))
  {
- new_ttl_infos.columns_ttl.emplace(name, MergeTreeDataPart::TTLInfo{});
+ new_ttl_infos.columns_ttl.emplace(name, IMergeTreeDataPart::TTLInfo{});
  empty_columns.emplace(name);

  auto it = column_defaults.find(name);
@@ -98,7 +98,7 @@ void TTLBlockInputStream::readSuffixImpl()
  new_ttl_infos.updatePartMinMaxTTL(new_ttl_infos.table_ttl.min, new_ttl_infos.table_ttl.max);

  data_part->ttl_infos = std::move(new_ttl_infos);
- data_part->empty_columns = std::move(empty_columns);
+ data_part->expired_columns = std::move(empty_columns);

  if (rows_removed)
  LOG_INFO(log, "Removed " << rows_removed << " rows with expired TTL from part " << data_part->name);
6 changes: 3 additions & 3 deletions dbms/src/DataStreams/TTLBlockInputStream.h
@@ -1,7 +1,7 @@
  #pragma once
  #include <DataStreams/IBlockInputStream.h>
  #include <Storages/MergeTree/MergeTreeData.h>
- #include <Storages/MergeTree/MergeTreeDataPart.h>
+ #include <Storages/MergeTree/IMergeTreeDataPart.h>
  #include <Core/Block.h>

  #include <common/DateLUT.h>
@@ -39,8 +39,8 @@ class TTLBlockInputStream : public IBlockInputStream
  time_t current_time;
  bool force;

- MergeTreeDataPart::TTLInfos old_ttl_infos;
- MergeTreeDataPart::TTLInfos new_ttl_infos;
+ IMergeTreeDataPart::TTLInfos old_ttl_infos;
+ IMergeTreeDataPart::TTLInfos new_ttl_infos;
  NameSet empty_columns;

  size_t rows_removed = 0;
4 changes: 3 additions & 1 deletion dbms/src/Interpreters/MutationsInterpreter.cpp
@@ -165,7 +165,7 @@ bool isStorageTouchedByMutations(

  MutationsInterpreter::MutationsInterpreter(
  StoragePtr storage_,
- std::vector<MutationCommand> commands_,
+ MutationCommands commands_,
  const Context & context_,
  bool can_execute_)
  : storage(std::move(storage_))
@@ -437,6 +437,8 @@ ASTPtr MutationsInterpreter::prepareInterpreterSelectQuery(std::vector<Stage> &

  if (i > 0)
  prepared_stages[i].output_columns = prepared_stages[i - 1].output_columns;
+ else if (!commands.additional_columns.empty())
+ prepared_stages[i].output_columns.insert(commands.additional_columns.begin(), commands.additional_columns.end());

  if (prepared_stages[i].output_columns.size() < all_columns.size())
  {
4 changes: 2 additions & 2 deletions dbms/src/Interpreters/MutationsInterpreter.h
@@ -23,7 +23,7 @@ class MutationsInterpreter
  public:
  /// Storage to mutate, array of mutations commands and context. If you really want to execute mutation
  /// use can_execute = true, in other cases (validation, amount of commands) it can be false
- MutationsInterpreter(StoragePtr storage_, std::vector<MutationCommand> commands_, const Context & context_, bool can_execute_);
+ MutationsInterpreter(StoragePtr storage_, MutationCommands commands_, const Context & context_, bool can_execute_);

  void validate(TableStructureReadLockHolder & table_lock_holder);

@@ -44,7 +44,7 @@ class MutationsInterpreter
  BlockInputStreamPtr addStreamsForLaterStages(const std::vector<Stage> & prepared_stages, BlockInputStreamPtr in) const;

  StoragePtr storage;
- std::vector<MutationCommand> commands;
+ MutationCommands commands;
  const Context & context;
  bool can_execute;

2 changes: 1 addition & 1 deletion dbms/src/Interpreters/PartLog.cpp
@@ -7,7 +7,7 @@
  #include <DataTypes/DataTypeDate.h>
  #include <DataTypes/DataTypeString.h>
  #include <DataTypes/DataTypeEnum.h>
- #include <Storages/MergeTree/MergeTreeDataPart.h>
+ #include <Storages/MergeTree/IMergeTreeDataPart.h>
  #include <Storages/MergeTree/MergeTreeData.h>
  #include <Interpreters/PartLog.h>

4 changes: 2 additions & 2 deletions dbms/src/Interpreters/PartLog.h
@@ -51,15 +51,15 @@ struct PartLogElement
  void appendToBlock(Block & block) const;
  };

- struct MergeTreeDataPart;
+ class IMergeTreeDataPart;


  /// Instead of typedef - to allow forward declaration.
  class PartLog : public SystemLog<PartLogElement>
  {
  using SystemLog<PartLogElement>::SystemLog;

- using MutableDataPartPtr = std::shared_ptr<MergeTreeDataPart>;
+ using MutableDataPartPtr = std::shared_ptr<IMergeTreeDataPart>;
  using MutableDataPartsVector = std::vector<MutableDataPartPtr>;

public:
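
`PartLog.h` only names the part type inside `std::shared_ptr` aliases, so a forward declaration is enough and the full part header does not need to be included. The hunk switches the declaration from `struct` to `class`, presumably to match how the new type is defined; mismatched class-keys are legal C++ but some compilers warn about them. A minimal sketch of the pattern, where `DataPartSketch` and `countTemporary` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Forward declaration: the class-key (`class`) matches the definition below.
class DataPartSketch;

// Aliases over an incomplete type are fine for std::shared_ptr.
using MutableDataPartPtr = std::shared_ptr<DataPartSketch>;
using MutableDataPartsVector = std::vector<MutableDataPartPtr>;

// Can be declared while DataPartSketch is still incomplete.
std::size_t countTemporary(const MutableDataPartsVector & parts);

// Full definition; in a real project it would live in its own header.
class DataPartSketch
{
public:
    explicit DataPartSketch(std::string name_) : name(std::move(name_)) {}

    std::string name;
    bool is_temp = false;
};

// The definition needs the complete type because it accesses a member.
std::size_t countTemporary(const MutableDataPartsVector & parts)
{
    std::size_t result = 0;
    for (const auto & part : parts)
        if (part->is_temp)
            ++result;
    return result;
}
```

Only translation units that dereference the pointer need the complete definition, which keeps the logging header cheap to include.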
24 changes: 24 additions & 0 deletions dbms/src/Storages/MergeTree/AlterAnalysisResult.h
@@ -0,0 +1,24 @@
+ #pragma once
+ #include <Interpreters/ExpressionActions.h>
+
+ namespace DB
+ {
+ struct AlterAnalysisResult
+ {
+ /// Expression for column type conversion.
+ /// If no conversions are needed, expression=nullptr.
+ ExpressionActionsPtr expression = nullptr;
+
+ /// Denotes if metadata must be changed even if no file should be overwritten
+ /// (used for transformation-free changing of Enum values list).
+ bool force_update_metadata = false;
+
+ std::map<String, const IDataType *> new_types;
+
+ /// For every column that need to be converted: source column name,
+ /// column name of calculated expression for conversion.
+ std::vector<std::pair<String, String>> conversions;
+ NamesAndTypesList removed_columns;
+ Names removed_indices;
+ };
+ }
9 changes: 3 additions & 6 deletions dbms/src/Storages/MergeTree/DataPartsExchange.cpp
@@ -233,7 +233,7 @@ MergeTreeData::MutableDataPartPtr Fetcher::fetchPart(
  readBinary(sum_files_size, in);
  if (server_protocol_version == REPLICATION_PROTOCOL_VERSION_WITH_PARTS_SIZE_AND_TTL_INFOS)
  {
- MergeTreeDataPart::TTLInfos ttl_infos;
+ IMergeTreeDataPart::TTLInfos ttl_infos;
  String ttl_infos_string;
  readBinary(ttl_infos_string, in);
  ReadBufferFromString ttl_infos_buffer(ttl_infos_string);
@@ -279,11 +279,6 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart(

  part_file.createDirectory();

- MergeTreeData::MutableDataPartPtr new_data_part = std::make_shared<MergeTreeData::DataPart>(data, reservation->getDisk(), part_name);
- new_data_part->relative_path = relative_part_path;
- new_data_part->is_temp = true;
-
-
  MergeTreeData::DataPart::Checksums checksums;
  for (size_t i = 0; i < files; ++i)
  {
@@ -327,6 +322,8 @@ MergeTreeData::MutableDataPartPtr Fetcher::downloadPart(

  assertEOF(in);

+ MergeTreeData::MutableDataPartPtr new_data_part = data.createPart(part_name, reservation->getDisk(), relative_part_path);
+ new_data_part->is_temp = true;
  new_data_part->modification_time = time(nullptr);
  new_data_part->loadColumnsChecksumsIndexes(true, false);
new_data_part->checksums.checkEqual(checksums, false);
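
In the hunks above, `downloadPart` no longer constructs `MergeTreeData::DataPart` directly. It calls `data.createPart(...)` after the files have been received, so the concrete part class can be chosen by the storage. A minimal sketch of such a factory is shown below. `PartType`, `IPart`, `WidePart`, `CompactPart`, and this `createPart` signature are hypothetical and simplified; how the real `createPart` decides on the type is not shown in this diff.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical part formats, named after the PR title (wide vs. compact).
enum class PartType { Wide, Compact };

// Hypothetical common interface; callers only see this type.
class IPart
{
public:
    explicit IPart(std::string name_) : name(std::move(name_)) {}
    virtual ~IPart() = default;

    virtual PartType getType() const = 0;

    std::string name;
    bool is_temp = false;
};

class WidePart final : public IPart
{
public:
    using IPart::IPart;
    PartType getType() const override { return PartType::Wide; }
};

class CompactPart final : public IPart
{
public:
    using IPart::IPart;
    PartType getType() const override { return PartType::Compact; }
};

// Factory: the only place that knows about the concrete classes.
std::shared_ptr<IPart> createPart(const std::string & name, PartType type)
{
    switch (type)
    {
        case PartType::Wide:
            return std::make_shared<WidePart>(name);
        case PartType::Compact:
            return std::make_shared<CompactPart>(name);
    }
    // Reached only if an out-of-range value was cast to PartType.
    throw std::logic_error("Unknown part type");
}
```

A caller would then write `auto part = createPart("all_1_1_0", PartType::Compact); part->is_temp = true;` and work through the interface only. The `UNKNOWN_PART_TYPE` error code added in `ErrorCodes.cpp` suggests the real code reports a similar failure, but that link is an assumption here.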