-
Notifications
You must be signed in to change notification settings - Fork 258
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cats: postgresql introduce pl/sql lstat_decode() function #1466
cats: postgresql introduce pl/sql lstat_decode() function #1466
Conversation
1e84a06
to
bdc4877
Compare
bdc4877
to
e3d2ae7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't look too bad, but what if I just want the size? It is still going to decode all the other fields I don't care about.
How bad is the performance impact for that? Maybe it is worth adding something like size_from_lstat(text)
bareos_frombase64(fields[4]) st_nlink, | ||
bareos_frombase64(fields[5]) st_uid, | ||
bareos_frombase64(fields[6]) st_gid, | ||
bareos_frombase64(fields[3]) st_mode, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this one it out of order
res bigint default 0; | ||
rf text default ''; | ||
base64 text default 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'; | ||
z integer := char_length(field); -- minus 1s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does that comment mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch I forget to remove them. will do
declare | ||
fields text[]; | ||
begin | ||
fields = string_to_array(lst, ' '); -- minus 10s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does that comment mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch I forget to remove them. will do
You make a point, I must think about and test decoding one of the field directly but I'm not sure if calling bareos_frombase64() one time is really quicker that the whole field. I would like also maybe create a small test to check that no variation appear with newer code (pg or our). |
@arogge so to allow searching whatever parameter you need I've adapted the function to support parameters (by default all fields are returned).
|
f4b4b7f
to
339f30a
Compare
Forget to say that you can check the assumption that parallel safe is applied for example with this kind of query regress_upgrade_database=# SELECT proname, proparallel FROM pg_proc where proname in ('decode_lstat','bareos_frombase64');
proname | proparallel
-------------------+-------------
decode_lstat | s
bareos_frombase64 | s |
a6793fc
to
6d3a4aa
Compare
commit; | ||
|
||
begin; | ||
create or replace function exec(raw_query text) returns boolean as $$ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to this post you can just name the function pg_temp.exec
. Then you don't have to drop and and you can be pretty sure that it won't clash with anything, because it is session-local.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitively a good idea, will implement that.
core/src/cats/cats.h
Outdated
@@ -469,7 +469,7 @@ class pathid_cache; | |||
#define QUERY_HTABLE_PAGES 128 | |||
|
|||
// Current database version number for all drivers | |||
#define BDB_VERSION 2210 | |||
#define BDB_VERSION 2301 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably open to discussion, but previously the schema was 2000 + 10 * Major + Minor, which would result in 2230 for Bareos 23.0.0.
However, I don't really care as long as it is strictly increasing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks I was trying to understand the pattern behind hit. So I will adapt, but also add the formula as comment, so next contributor will have its life eased ;-)
552682d
to
88272dc
Compare
- introduce a new sql function decode_lstat to help querying lstat field of file table with PostgreSQL language and tool. By default the function will return all available fields in lstat you can restrict the desired field(s) by passing those in the param text array. - improvements made to the original function: + Moving to parallel safe gain 1 sec Time: 33375.061 ms (00:33.375) - Original query Time: 32689.084 ms (00:32.689) - Adding Parallel safe + Replace regexp_split_to_array by faster string_to_array (Arogge) Time: 23095.753 ms (00:23.096) - 10 seconds gain + Replace length() for each loop with a variable; another 1s win + Reimplement FromBase64() function to handle bigint, skip negative + LinkFi need a review compared to C code (here bigint) + Modify lstat_decode() to return a table without the need to create an empty table make the use easier + Future investigation get_bit,get_byte function may offer further gain speed in bareos_frombase64() - add a query for bconsole 22 list top XX biggest files for a jobid - remove sql instruction from update script - ddl scripts: unset PAGER before psql call - move basefiles_baseid_seq alter to sql script - add decode lstat function to update script - as rhel7 use PostgreSQL < 10 function attribue parallel safe is applied with an alter instruction only if pg > 10 use pg_temp.exec function so no drop is needed - cats: update database version to 2230 - add database version pattern to cats.h Signed-off-by: Bruno Friedmann <bruno.friedmann@bareos.com>
88272dc
to
015652a
Compare
Introduce a new pl/sql function that can decode file.lstat field with PostgreSQL language and tool.
List of Improvements made to the original function
- Time: 23095.753 ms (00:23.096)
Note it is not possible to replace the bareos_frombase64() sql function by decode due to the way the algorithm was implemented.
Testing on different jobs with high number of files, decoding the 16 fields in psql.
Top 10 decode_lstat
The time spent to get an answer is proportional (and almost predictable) to the number of lines to scan.
Of course it is bad to parse 62M line to keep only 10.
The actual query plan look like
We also tested the creation of a jsonb column but this create as expected a too large burden on table file size ( ~2.5x bigger), even if then the attribute can be queries directly.
We are still investigating if the lstat can be stored in a more efficient way, which would allow good performance on querying the attributes.
note the above table will be edited with a file_lstat table containing all additional column (16 fields of bigint)
Some of the ideas to store differently lstat field.
Please check
If you have any questions or problems, please give a comment in the PR.
Helpful documentation and best practices
Checklist for the reviewer of the PR (will be processed by the Bareos team)
Make sure you check/merge the PR using
devtools/pr-tool
to have some simple automated checks run and a proper changelog record added.General
Source code quality