Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added move_ttl_info to system.parts #10591

Merged
merged 4 commits into from May 21, 2020
Merged

Added move_ttl_info to system.parts #10591

merged 4 commits into from May 21, 2020

Conversation

excitoon
Copy link
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Added move_ttl_info to system.parts in order to provide introspection of move TTL functionality.

@blinkov blinkov added the pr-improvement Pull request with some product improvements label Apr 30, 2020
{"hash_of_uncompressed_files", std::make_shared<DataTypeString>()},
{"uncompressed_hash_of_compressed_files", std::make_shared<DataTypeString>()},

{"move_ttl_info", std::make_shared<DataTypeArray>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what with basic ttl info (for deletion)?

@filimonov
Copy link
Contributor

filimonov commented Apr 30, 2020

Проблема 1 - как показать move_ttl. Варианты - массив туплов (не self-desctibing) или Nested (не всегда удобный, self-describing, используется в разных местах - например system.query_log.ProfileEvents)

Проблема 2 - как показать ttl для части. Мне кажется просто 2 колонки (ttl_min / ttl_max) в system.parts

Проблема 3 - как показать ttl для колонки внутри части. Мне кажется просто 2 колонки (ttl_min / ttl_max) в system.parts_columns

Nested - это по сути несколько массивов с одинаковым префиксом. Типа
move_ttl_info.name Array(String),
move_ttl_info.ttl_min Array(DateTime),
move_ttl_info.ttl_max Array(DateTime)
для Nested есть специальный синтакс - ARRAY JOIN move_ttl_info

Можно вынести это на обсуждение. Но порефакторить это потом будет никак. Как только люди начнут этим пользоваться - нужно будет чтоб структура была всегда такая иначе проблемы с обратной совместимостью.

@alex-zaitsev
Copy link
Contributor

Agree, but it is probably sufficient to have two columns, not 3:

move_ttl_info.name Array(String),
move_ttl_info.ttl_minmax Array((DateTime, DateTime)),

@alexey-milovidov
Copy link
Member

For table TTLs we already have:

TTL expr MOVE
TTL expr DELETE

and there are upcoming

TTL expr DELETE WHERE cond
TTL expr GROUP BY columns... SET col = agg(col), ...

Now suppose we will allow multiple table TTLs.

And we should not limit this new info for TTL MOVE, because it is rarely used feature that is still in experimental stage (see https://github.com/ClickHouse/ClickHouse/pulls/excitoon).

@alexey-milovidov alexey-milovidov added the st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken label May 2, 2020
@alexey-milovidov alexey-milovidov added st-waiting-for-fix We are waiting for fixes in this issue or pull request and removed st-discussion The story requires discussion /research / expert help / design & decomposition before will be taken labels May 10, 2020
@alexey-milovidov alexey-milovidov self-assigned this May 10, 2020
@alexey-milovidov
Copy link
Member

alexey-milovidov commented May 10, 2020

  1. ttl_info should not be only about moves.
  2. it should not be in a single Tuple column, but in multiple separate columns instead.

@alexey-milovidov
Copy link
Member

It's Ok to add info about MOVE TTL but only after we will already have info about ordinary TTL.
I propose to add ttl_info without distinguishing between MOVE and other TTLs.

@excitoon
Copy link
Contributor Author

@alexey-milovidov Historically, there is no expression stored in part for DELETE TTL. If I merge move and delete TTLs into one array, what should I output for delete TTL as an expression?

@alexey-milovidov
Copy link
Member

I don't understand why expression is stored for move ttl but not for delete ttl?
And why we need expression in ttl if we already have it in table definition?

@filimonov
Copy link
Contributor

I don't understand why expression is stored for move ttl but not for delete ttl?

It was not used originally, and i think @excitoon kept it as is for backward compatibility. Otherwise older version will not be able to read ttl for parts created in a newer version (and not sure how it will fail, as the quite simple and stupid parser is used).

And why we need expression in ttl if we already have it in table definition?

If there are (let's say) 3 ttl move rules it's hard to distinguish them otherwise.

So for such ttl rules:

TTL FlightDate TO VOLUME 'fast',
   FlightDate + INTERVAL 3 YEAR TO VOLUME 'med',
   FlightDate + INTERVAL 5 YEAR TO VOLUME 'slow',
   FlightDate + INTERVAL 7 YEAR DELETE

it writes smth like

$ sudo cat /data/fast/data/default/ontime_chunk/2017_113_113_0/ttl.txt
ttl format version: 1
{
 "table": {
    "min":1710457200,
    "max":1710975600
 },
 "moves":[
     {"expression":"FlightDate","min":1489532400,"max":1490050800},
     {"expression":"plus(FlightDate, toIntervalYear(3))","min":1584226800,"max":1584745200},
     {"expression":"plus(FlightDate, toIntervalYear(5))","min":1647298800,"max":1647817200}
 ]
}

and there are upcoming
TTL expr DELETE WHERE cond
TTL expr GROUP BY columns... SET col = agg(col), ...

I'm don't understand the first case - what is the difference with existing TTL delete?
Can you share some 'specs' on that?

#10537

@filimonov
Copy link
Contributor

filimonov commented May 13, 2020

Regarding the usability:

  1. most of the user will use DELETE only, and storing it in the array / other complicated structures seem a bit overkill for that use case
  2. if it will be stored in some complicated structures (array with expressions etc) it will be no clear way to get the list of parts which should be removed in X hour (which were not removed at a time) - i think that would be the most common use case (plus check if the ttl_info was properly updated)

Ie.

SELECT * FROM system.parts WHERE max_ttl < now();

is much clearer than

SELECT * FROM system.parts WHERE ttl_info.max_ttl[indexOf(ttl_info.expression,'plus(FlightDate, toIntervalYear(3))')] < now()

@alexey-milovidov
Copy link
Member

Ok.

BTW, why delete ttl is Array? If we can only have one delete TTL for table, let's make it simple.

@alexey-milovidov
Copy link
Member

alexey-milovidov commented May 18, 2020

I see that this is just a mistake.
Fixed in 8bde836.

@filimonov filimonov self-requested a review May 21, 2020 08:25
Copy link
Contributor

@filimonov filimonov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@alexey-milovidov alexey-milovidov merged commit b6b62db into ClickHouse:master May 21, 2020
@alexey-milovidov
Copy link
Member

But there is no test for this feature :(

@vzakaznikov
Copy link
Contributor

There is move_ttl_info.expression but there is no delete_ttl_info.expression?

@vzakaznikov
Copy link
Contributor

Also, we wanted to have something like last_ttl and next_ttl. Are we going to implement this?

@xiaogaozi
Copy link

I have same question as @vzakaznikov asks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements st-waiting-for-fix We are waiting for fixes in this issue or pull request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants