-
Notifications
You must be signed in to change notification settings - Fork 6.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Accounting of temporary data on the disk, pt1 #40893
Conversation
9692119
to
0829dc4
Compare
0b030d9
to
ddfa074
Compare
00ff6b4
to
fb1a319
Compare
@kssenii refactoring of partial merge join to use new interface is pretty tricky. Probably it's better to make it in several steps. Let's finish this PR with sorting and aggregation to have it in master, and I'll continue the rest of TODOs in separate PRs. |
|
||
|
||
/* | ||
* Used to account amount of temporary data written to dicsk. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Used to account amount of temporary data written to dicsk. | |
* Used to account amount of temporary data written to disk. |
extern const int NOT_ENOUGH_SPACE; | ||
} | ||
|
||
void TemporaryDataOnDiskScope::deltaAlloc(int comp_delta, int uncomp_delta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably compressed
instead of comp
would be more readable, takes a few seconds to understand what comp
is.
|
||
void TemporaryDataOnDiskScope::deltaAlloc(int comp_delta, int uncomp_delta) | ||
{ | ||
size_t current_consuption = stat.compressed_size.fetch_add(comp_delta) + comp_delta; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why comp_delta
is added twice? should it be just size_t current_consuption = stat.compressed_size.fetch_add(comp_delta)
?
if (parent) | ||
parent->deltaAlloc(comp_delta, uncomp_delta); | ||
|
||
if (comp_delta > 0 && limit && current_consuption > limit) | ||
throw Exception("Memory limit exceeded", ErrorCodes::MEMORY_LIMIT_EXCEEDED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we catch exception from parent->deltaAlloc
and restore just updated stat
before rethrowing?
"Disk '{}' is not local and can't be used for temporary files", disk->getName()); | ||
} | ||
|
||
shared->temp_data_on_disk = std::make_shared<TemporaryDataOnDiskScope>(volume, max_size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if shared->temp_data_on_disk
was already set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be set only here in setTemporaryStorage
and setTemporaryStorage
is called once.
parent->deltaAlloc(comp_delta, uncomp_delta); | ||
|
||
if (comp_delta > 0 && limit && current_consuption > limit) | ||
throw Exception("Memory limit exceeded", ErrorCodes::MEMORY_LIMIT_EXCEEDED); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why exception is about memory limit if the data is stored on disk?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to ErrorCodes::TOO_MANY_ROWS_OR_BYTES
if (out_writer) | ||
{ | ||
out_writer->finalize(); | ||
updateAlloc(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did not understand why zero update alloc updateAlloc(0)
is needed, looks noop.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The finalize
probably can flush some buffers, so amount of bytes can be changed. Argument is amount of rows and it's a bit inconsistent (doesn't take buffered data into account), so removed num rows at all.
7d526aa
to
e40e832
Compare
src/Disks/DiskDecorator.h
Outdated
@@ -107,6 +107,13 @@ class DiskDecorator : public IDisk | |||
bool supportsChmod() const override { return delegate->supportsChmod(); } | |||
void chmod(const String & path, mode_t mode) override { delegate->chmod(path, mode); } | |||
|
|||
const IDisk & getNestedDisk() const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe getNestedDiskOrThis()
?
e40e832
to
78251d3
Compare
This reverts commit 46e4f02.
78251d3
to
f495361
Compare
CI failures:
|
@@ -399,6 +399,9 @@ static constexpr UInt64 operator""_GiB(unsigned long long value) | |||
M(UInt64, max_network_bandwidth_for_user, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running user queries. Zero means unlimited.", 0)\ | |||
M(UInt64, max_network_bandwidth_for_all_users, 0, "The maximum speed of data exchange over the network in bytes per second for all concurrently running queries. Zero means unlimited.", 0) \ | |||
\ | |||
M(UInt64, max_temp_data_on_disk_size_for_user, 0, "The maximum amount of data consumed by temporary files on disk in bytes for all concurrently running user queries. Zero means unlimited.", 0)\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
temp -> temporary please.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need "temporary" instead of "temp" for the following reasons:
- We hate abbrs wholeheartedly.
- If you abbreviate "temporary", it is difficult to remember if it should be "temp" or "tmp" or "t".
- There are other settings with the full "temporary" word.
- The "temp" abbreviation can be confused with the words "tempo" and "temperature" and "Temple OS".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename it before it is in release #41984
Also, I noticed that there's no setting for global. I'll add it. Where should it be configured? In config.xml, I suppose, not in settings? What section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
config.xml, similarly to max_memory_usage_for_server
.
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
max_temporary_data_on_disk_size_for_user
/max_temporary_data_on_disk_size_for_query
.Use new interface for: