
cats: postgresql introduce pl/sql lstat_decode() function #1466

Merged
merged 2 commits into bareos:master from dev/bruno/lstat_research
Jul 4, 2023

Conversation


@bruno-at-bareos bruno-at-bareos commented May 11, 2023

Introduce a new PL/pgSQL function that can decode the file.lstat field using PostgreSQL language features and tools.

List of improvements made to the original function:

  • Moving to parallel safe
    • Time: 33375.061 ms (00:33.375) original query
    • Time: 32689.084 ms (00:32.689) after adding parallel safe
  • Replacing regexp_split_to_array with the faster string_to_array (arogge) gained 10 seconds
    - Time: 23095.753 ms (00:23.096)
  • Replacing the length() call in each loop with a single variable gained another second
  • Reimplemented the FromBase64() C function in PL/pgSQL to handle bigint and skip negative fields
  • LinkFI needs a review against the C code (here it is a bigint)
  • Modified lstat_decode() to return a table without the need to create an empty table first, making it even easier to use, if not faster
  • Future investigation: the get_bit/get_byte functions could speed up bareos_frombase64()
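As an illustration of the table-returning usage, a sketch based on the full query shown later in this PR (column names as listed there):

```sql
-- decode_lstat() returns a row type, so it can be used directly in the
-- FROM clause as a (lateral) function call, without first creating an
-- empty result table to hold the decoded fields.
SELECT f.name, d.st_mode, d.st_size, d.st_mtime
FROM file f,
     LATERAL decode_lstat(f.lstat) AS d
WHERE f.jobid = 15422
LIMIT 10;
```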

Note that it is not possible to replace the bareos_frombase64() SQL function with decode() due to the way the algorithm was implemented:

convert_from(decode(fields[1], 'base64'), 'SQL_ASCII');

psql:lstat_decode_test.sql:1: ERROR:  invalid base64 end sequence
HINT:  Input data is missing padding, is truncated, or is otherwise corrupted.
CONTEXT:  PL/pgSQL function decode_lstat(text) line 8 at assignment
Time: 385.523 ms
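The error above is expected: Bareos does not store padded RFC 4648 base64, but encodes each lstat field as an unpadded, variable-length base-64 integer (most significant digit first). A hedged PL/pgSQL sketch of such a decoder, reconstructed from the declaration fragments quoted later in this review (the sign handling shown here is an assumption, not the exact patch code):

```sql
CREATE OR REPLACE FUNCTION bareos_frombase64(field text)
RETURNS bigint
LANGUAGE plpgsql IMMUTABLE PARALLEL SAFE
AS $$
DECLARE
  base64 text := 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
  neg boolean := left(field, 1) = '-';
  val text := CASE WHEN neg THEN substr(field, 2) ELSE field END;
  z integer := char_length(val);  -- computed once, not on every loop iteration
  res bigint := 0;
  i integer;
BEGIN
  FOR i IN 1..z LOOP
    -- strpos() is 1-based, so subtract 1 to get the 0..63 digit value
    res := res * 64 + strpos(base64, substr(val, i, 1)) - 1;
  END LOOP;
  RETURN CASE WHEN neg THEN -res ELSE res END;
END;
$$;
```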

Testing on different jobs with a high number of files, decoding all 16 fields in psql.

Top 10 decode_lstat

 nb_files │ approx│ jobid │ Old function               │ New function               │
══════════╪═══════╪═══════╪════════════════════════════╪════════════════════════════│
 62317486 │   62M │ 14873 │  aborted >1h               │ 1434870.685 ms (23:54.871) │
 45876323 │   45M │ 15427 │ 1754329.892 ms (29:14.330) │  999460.792 ms (16:39.461) │
 10086553 │   10M │ 15308 │  857101.268 ms (14:17.101) │  215277.489 ms (03:35.277) │
  1049440 │    1M │ 15422 │   59892.181 ms (00:59.892) │   22682.788 ms (00:22.683) │

The time needed to get an answer is proportional (and almost predictably so) to the number of rows to scan.
Of course, it is wasteful to parse 62M rows only to keep 10.

The actual query plan looks like:

 explain analyze
select
  p.path    ,
  f.name    ,
  st_dev    ,
  st_ino    ,
  st_mode   ,
  st_nlink  ,
  st_uid    ,
  st_gid    ,
  st_rdev   ,
  st_size   ,
  st_blksize,
  st_blocks ,
  st_atime  ,
  st_mtime  ,
  st_ctime  ,
  LinkFI    ,
  st_flags  ,
  data
from file f, path p, decode_lstat(f.lstat)
where p.pathid = f.pathid
and f.jobid = 15422 order by st_size limit 10;
                                                                                   QUERY PLAN
═════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 Limit  (cost=20531078.37..20531079.54 rows=10 width=290) (actual time=34138.720..34153.014 rows=10 loops=1)
   ->  Gather Merge  (cost=20531078.37..129223097.51 rows=931580834 width=290) (actual time=34005.535..34019.828 rows=10 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=20530078.35..21694554.39 rows=465790417 width=290) (actual time=33986.939..33986.940 rows=10 loops=3)
               Sort Key: decode_lstat.st_size
               Sort Method: top-N heapsort  Memory: 27kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 29kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 29kB
               ->  Nested Loop  (cost=1.52..10464514.90 rows=465790417 width=290) (actual time=129.285..33768.062 rows=349813 loops=3)
                     ->  Nested Loop  (cost=1.27..1148714.65 rows=465790 width=219) (actual time=127.856..866.435 rows=349813 loops=3)
                           ->  Parallel Index Scan using file_jpfid_idx on file f  (cost=0.70..1009023.84 rows=465790 width=85) (actual time=0.050..448.124 rows=349813 loops=3)
                                 Index Cond: (jobid = 15422)
                           ->  Memoize  (cost=0.58..2.22 rows=1 width=142) (actual time=0.001..0.001 rows=1 loops=1049440)
                                 Cache Key: f.pathid
                                 Cache Mode: logical
                                 Hits: 341555  Misses: 3185  Evictions: 0  Overflows: 0  Memory Usage: 502kB
                                 Worker 0:  Hits: 360194  Misses: 3179  Evictions: 0  Overflows: 0  Memory Usage: 513kB
                                 Worker 1:  Hits: 337881  Misses: 3446  Evictions: 0  Overflows: 0  Memory Usage: 555kB
                                 ->  Index Scan using path_pkey on path p  (cost=0.57..2.21 rows=1 width=142) (actual time=0.003..0.003 rows=1 loops=9810)
                                       Index Cond: (pathid = f.pathid)
                     ->  Function Scan on decode_lstat  (cost=0.25..10.25 rows=1000 width=128) (actual time=0.093..0.093 rows=1 loops=1049440)
 Planning Time: 0.295 ms
 JIT:
   Functions: 49
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 3.653 ms, Inlining 145.424 ms, Optimization 222.947 ms, Emission 147.994 ms, Total 520.019 ms
 Execution Time: 34154.113 ms
(28 rows)

Time: 34154.905 ms (00:34.155)

We also tested the creation of a jsonb column, but as expected this places too large a burden on the file table size (~2.5x bigger), even though the attributes can then be queried directly.

We are still investigating whether lstat can be stored more efficiently, which would allow good performance when querying the attributes.

      relname       │  total_sz  │   idx_sz   │ reltuples │ bytes_per_row │
════════════════════╪════════════╪════════════╪═══════════╪═══════════════│
 file_jsonb         │ 195 GB     │ 0 bytes    │ 353025984 │           592 │
 file               │ 80 GB      │ 26 GB      │ 353020128 │           243 │

Note: the above comparison will be extended with a file_lstat table containing all additional columns (16 bigint fields).
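A query along these lines can produce such per-table statistics (a sketch using standard catalog functions; the exact query used for the table above is not shown in the PR):

```sql
SELECT c.relname,
       pg_size_pretty(pg_total_relation_size(c.oid)) AS total_sz,
       pg_size_pretty(pg_indexes_size(c.oid))        AS idx_sz,
       c.reltuples::bigint                           AS reltuples,
       (pg_total_relation_size(c.oid)
          / nullif(c.reltuples, 0))::bigint          AS bytes_per_row
FROM pg_class c
WHERE c.relname IN ('file', 'file_jsonb');
```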

Some ideas for storing the lstat field differently:

  • native bytea: store the output of stat directly (in a platform-agnostic format)
  • text: store the decoded array

Please check

  • A short description and the purpose of this PR are present above this paragraph
  • Your name is present in the AUTHORS file (optional)

If you have any questions or problems, please leave a comment in the PR.

Helpful documentation and best practices

Checklist for the reviewer of the PR (will be processed by the Bareos team)

Make sure you check/merge the PR using devtools/pr-tool to have some simple automated checks run and a proper changelog record added.

General
  • Is the PR title usable as CHANGELOG entry?
  • Purpose of the PR is understood
  • Commit descriptions are understandable and well formatted
  • Check backport line
  • Required backport PRs have been created
Source code quality
  • Source code changes are understandable
  • Variable and function names are meaningful
  • Code comments are correct (logically and spelling)
  • Required documentation changes are present and part of the PR

@bruno-at-bareos bruno-at-bareos changed the title cats: postgresql introduce lstat_decode.sql cats: postgresql introduce pl/sql lstat_decode() function Jun 21, 2023
@bruno-at-bareos bruno-at-bareos marked this pull request as ready for review June 21, 2023 14:07

@arogge arogge left a comment


Doesn't look too bad, but what if I just want the size? It is still going to decode all the other fields I don't care about.
How bad is the performance impact for that? Maybe it is worth adding something like size_from_lstat(text)

bareos_frombase64(fields[4]) st_nlink,
bareos_frombase64(fields[5]) st_uid,
bareos_frombase64(fields[6]) st_gid,
bareos_frombase64(fields[3]) st_mode,
Member


this one is out of order

res bigint default 0;
rf text default '';
base64 text default 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';
z integer := char_length(field); -- minus 1s
Member


what does that comment mean?

Contributor Author


Good catch, I forgot to remove them. Will do.

declare
fields text[];
begin
fields = string_to_array(lst, ' '); -- minus 10s
Member


what does that comment mean?

Contributor Author


Good catch, I forgot to remove them. Will do.

@bruno-at-bareos
Contributor Author

You make a good point. I must think about it and test decoding one of the fields directly, but I'm not sure whether calling bareos_frombase64() once is really quicker than decoding the whole field.

I would also like to create a small test to check that no variation appears with newer code (PostgreSQL's or ours).

@bruno-at-bareos
Contributor Author

bruno-at-bareos commented Jun 22, 2023

@arogge so to allow searching on whatever field you need, I've adapted the function to accept parameters (by default all fields are returned).

Finding the top 10 by st_size only:
Time: 9613.526 ms (00:09.614) over 353'020'126 rows
versus all fields:
Time: 25584.687 ms (00:25.585)
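A call restricted to a single field might then look like this (the text-array parameter shape matches the commit description below, but the exact signature is an assumption):

```sql
-- return only st_size instead of all 16 decoded fields
SELECT d.st_size
FROM file f,
     decode_lstat(f.lstat, ARRAY['st_size']) AS d
WHERE f.jobid = 15422
ORDER BY d.st_size DESC
LIMIT 10;
```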

@bruno-at-bareos bruno-at-bareos force-pushed the dev/bruno/lstat_research branch 3 times, most recently from f4b4b7f to 339f30a Compare June 27, 2023 12:31
@bruno-at-bareos
Contributor Author

Forgot to say that you can check that parallel safe is applied, for example with this kind of query:

regress_upgrade_database=# SELECT proname, proparallel FROM   pg_proc where proname in ('decode_lstat','bareos_frombase64');
      proname      | proparallel
-------------------+-------------
 decode_lstat      | s
 bareos_frombase64 | s

commit;

begin;
create or replace function exec(raw_query text) returns boolean as $$
Member


According to this post you can just name the function pg_temp.exec. Then you don't have to drop it, and you can be pretty sure that it won't clash with anything, because it is session-local.
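For illustration, a session-local helper along the suggested lines (a sketch: pg_temp places the function in the session's temporary schema, so it disappears at disconnect; the ALTER statement in the usage line is an illustrative example):

```sql
CREATE OR REPLACE FUNCTION pg_temp.exec(raw_query text)
RETURNS boolean
LANGUAGE plpgsql
AS $$
BEGIN
  EXECUTE raw_query;  -- run the dynamic statement
  RETURN true;
END;
$$;

-- usable immediately; no DROP needed at the end of the script
SELECT pg_temp.exec('ALTER FUNCTION bareos_frombase64(text) PARALLEL SAFE');
```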

Contributor Author


That's definitely a good idea; will implement that.

@@ -469,7 +469,7 @@ class pathid_cache;
#define QUERY_HTABLE_PAGES 128

// Current database version number for all drivers
#define BDB_VERSION 2210
#define BDB_VERSION 2301
Member


This is probably open to discussion, but previously the schema was 2000 + 10 * Major + Minor, which would result in 2230 for Bareos 23.0.0.
However, I don't really care as long as it is strictly increasing.

Contributor Author


Thanks, I was trying to understand the pattern behind it. I will adapt, but also add the formula as a comment, so the next contributor will have an easier life ;-)

- introduce a new sql function decode_lstat to help querying lstat field
  of file table with PostgreSQL language and tool.
  By default the function will return all available fields in lstat
  you can restrict the desired field(s) by passing those in the param
  text array.
- improvements made to the original function:
  + Moving to parallel safe gains 1 sec
    Time: 33375.061 ms (00:33.375) - Original query
    Time: 32689.084 ms (00:32.689) - Adding parallel safe
  + Replace regexp_split_to_array with the faster string_to_array (arogge)
    Time: 23095.753 ms (00:23.096) - 10 seconds gained
  + Replace length() in each loop with a variable; another 1s win
  + Reimplement FromBase64() function to handle bigint, skip negative
  + LinkFI needs a review compared to the C code (here bigint)
  + Modify lstat_decode() to return a table without the need to create
    an empty table, making it easier to use
  + Future investigation: get_bit/get_byte functions may offer
    further speed gains in bareos_frombase64()
- add a query for bconsole 22 list top XX biggest files for a jobid
- remove sql instruction from update script
- ddl scripts: unset PAGER before psql call
- move basefiles_baseid_seq alter to sql script
- add decode lstat function to update script
- as RHEL 7 uses PostgreSQL < 10, the function attribute parallel safe
  is applied with an alter instruction only if pg > 10
  use pg_temp.exec function so no drop is needed
- cats: update database version to 2230
- add database version pattern to cats.h

Signed-off-by: Bruno Friedmann <bruno.friedmann@bareos.com>
@BareosBot BareosBot merged commit 1ad4771 into bareos:master Jul 4, 2023
@bruno-at-bareos bruno-at-bareos deleted the dev/bruno/lstat_research branch July 4, 2023 11:52