Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Storage dictionary #763

Merged
merged 20 commits into from
Jun 13, 2017
Merged

Storage dictionary #763

merged 20 commits into from
Jun 13, 2017

Conversation

KochetovNicolai
Copy link
Member

Added:

  • StorageDictionary storage
  • DictionaryBlockInputStream and RangeDictionaryBlockInputStream implementation of the IBlockInputStream representing in-memry stored dictionary data

Changed:
ComplexKeyHashDictionary and ComplexKeyCacheDictionary key serialization format

@@ -53,6 +55,8 @@ struct IDictionaryBase

virtual bool isInjective(const std::string & attribute_name) const = 0;

virtual BlockInputStreamPtr getBlockInputStream(const Names & column_names) const = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment.

{
const auto keys_size = key_columns.size();
size_t sum_keys_size{};
for (const auto i : ext::range(0, keys_size))

Arena arena;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary Arena for every row is very expensive.
Better to implement overload IColumn::serializeValueIntoArena for ArenaWithFreeLists.
(or call it IColumn::serializeValueIntoArenaWithFreeLists and provide a wrapper here.)

@@ -208,6 +210,10 @@ class CacheDictionary final : public IDictionary
const std::vector<Key> & requested_ids, PresentIdHandler && on_cell_updated,
AbsentIdHandler && on_id_not_found) const;

PaddedPODArray<Key> getCachedIds() const;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should not be public.
BTW, you don't have the same method in ComplexKeyCacheDictionary
and this is inconsistent. Better to remove this method from here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moved to private

}

using BlockInputStreamType = DictionaryBlockInputStream<ComplexKeyCacheDictionary, UInt64>;
return std::move(std::make_unique<BlockInputStreamType>(*this, keys, column_names));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not std::make_shared instead of std::move(std::make_unique<BlockInputStreamType>(?

std::vector<StringRef> ComplexKeyHashedDictionary::getKeys(const Attribute & attribute) const
{
const ContainerType<T> & attr = *std::get<ContainerPtrType<T>>(attribute.maps);
std::vector<StringRef> keys;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should reserve space for vector.
In other places too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added reserve call

BlockInputStreamPtr ComplexKeyHashedDictionary::getBlockInputStream(const Names & column_names) const
{
using BlockInputStreamType = DictionaryBlockInputStream<ComplexKeyHashedDictionary, UInt64>;
return std::move(std::make_unique<BlockInputStreamType>(*this, getKeys(), column_names));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See prev. comment.


/*
* BlockInputStream implementation for external dictionaries
* read() returns single block consisting of the in-memory contents of the dictionaries
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Single block will be too large for some dictionaries. It's better to generate blocks of max_block_size.

if (attribute) {
name = attribute->name;
}
if (column_names.find(name) != column_names.end()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style.

columns.emplace_back(column, attribute.type, attribute.name);
}
}
block = Block(columns);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now std::move will be appropriate.

@@ -262,7 +249,7 @@ StoragePtr StorageFactory::get(
}
else if (name == "View")
{
return StorageView::create(
return StorageView::create(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should setup your IDE to trim trailing whitespaces.

@@ -225,15 +225,6 @@ static void setGraphitePatternsFromConfig(const Context & context,
}


/// Some types are only for intermediate values of expressions and cannot be used in tables.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad merge. You have deleted my changes.

}
};

}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better to setup your IDE to add newlines at end of file. This does not matter a lot (for C++11), but will be more consistent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@alexey-milovidov alexey-milovidov merged commit e03ae57 into master Jun 13, 2017
@Felixoid Felixoid deleted the storage-dictionary branch February 16, 2022 15:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants