Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Actual space consumption higher despite repetitive nature #26

Closed
tomas0svk opened this issue Nov 6, 2023 · 3 comments
Closed

Actual space consumption higher despite repetitive nature #26

tomas0svk opened this issue Nov 6, 2023 · 3 comments

Comments

@tomas0svk
Copy link

tomas0svk commented Nov 6, 2023

Hello.
I am trying to evaluate ZSON to store some metadata (as one of many columns in PG table), that should be very well suited for ZSON

  • table has 200k rows
  • json column uses 50MB of disk space, average size is 327 (some are null)
  • dev environment - data are very repetitive (see later), i'm not fiddling with prod ofc.

Steps:

  1. Performed steps from Install section
  2. Ran zson_learn with defaults on a jsonb column of a table. The whole dictionary consists of 28 rows.
  3. Altered table to add a column of zson type
  4. Ran update table, to set zson_column to be jsonb_to_zson(original_column)

The problem is, that according to pg_column_size, the zson column is 7% larger compared to jsonb type.
Also, if I run zson_info, the result is no different
zson version = 0, dict version = 0, jsonb size = 327, zson size (without pglz compression) = 349 (106.73%)
zson version = 0, dict version = 0, jsonb size = 324, zson size (without pglz compression) = 362 (111.73%)
zson version = 0, dict version = 0, jsonb size = 331, zson size (without pglz compression) = 349 (105.44%)

The json looks lite this:
{ "variables": [ { "type": "ComplexValue", "variableCode": "oneOfVeryFewKeyTypes", "variableValue": [ { "type": "StringValue", "variableCode": "otherOfVeryFewKeyTypes", "variableValue": "oneOf20Values" }, { "type": "StringValue", "variableCode": "otherOfVeryFewKeyTypes", "variableValue": "anotherOf20Values" } ] } ], "schemaVersion": "v4" }

What I am doing wrong?
Thanks in advance.

@afiskon
Copy link
Collaborator

afiskon commented Nov 7, 2023

Could you please show the result of SELECT * FROM zson_dict ?

@tomas0svk
Copy link
Author

Sure.
28 entries, basically all values from jsons on that dummy test env.
Sorry for obfuscation, I tried to keep length. May it be a problem, I can send you exact json and dictionary values.

0 1 type
0 2 variableName
0 3 variables
0 4 String
0 5 variableValue
0 6 Complex
0 7 keyName000001
0 8 keyName00002
0 9 keyName003
0 10 schemaVersion
0 11 v4
0 12 KURZY_OVERVIEW_POIUYTRE
0 13 2.1.4
0 14 LKJHG_QWERTYUI_QWER
0 15 2.1.3
0 16 4.1.1.1
0 17 LKJHG_QWERTYUI_COL1
0 18 v3
0 19 LKJHGFDSA_QWERTYUI_COL2
0 20 4.1.1.2
0 21 LKJHGFDSAQWERTYUI_COL4
0 22 4.1.1.4
0 23 LKJHGFDSA_QWERTYUI_COL3
0 24 4.1.1.3
0 25 ZXCVBNMLK_LKJHGF
0 26 4.3.1
0 27 ABCDE_FGHJKL_QWERTYUI
0 28 2.2.1

@afiskon
Copy link
Collaborator

afiskon commented Nov 8, 2023

I have the same results:

INSERT INTO zson_dict VALUES
(0, 1 , 'type'),
(0, 2 , 'variableName'),
(0, 3 , 'variables'),
(0, 4 , 'String'),
(0, 5 , 'variableValue'),
(0, 6 , 'Complex'),
(0, 7 , 'keyName000001'),
(0, 8 , 'keyName00002'),
(0, 9 , 'keyName003'),
(0, 10,  'schemaVersion'),
(0, 11,  'v4'),
(0, 12,  'KURZY_OVERVIEW_POIUYTRE'),
(0, 13,  '2.1.4'),
(0, 14,  'LKJHG_QWERTYUI_QWER'),
(0, 15,  '2.1.3'),
(0, 16,  '4.1.1.1'),
(0, 17,  'LKJHG_QWERTYUI_COL1'),
(0, 18,  'v3'),
(0, 19,  'LKJHGFDSA_QWERTYUI_COL2'),
(0, 20,  '4.1.1.2'),
(0, 21,  'LKJHGFDSAQWERTYUI_COL4'),
(0, 22,  '4.1.1.4'),
(0, 23,  'LKJHGFDSA_QWERTYUI_COL3'),
(0, 24,  '4.1.1.3'),
(0, 25,  'ZXCVBNMLK_LKJHGF'),
(0, 26,  '4.3.1'),
(0, 27,  'ABCDE_FGHJKL_QWERTYUI'),
(0, 28,  '2.2.1');

select zson_info('{ "variables": [ { "type": "ComplexValue", "variableCode": "oneOfVeryFewKeyTypes", "variableValue": [ { "type": "StringValue", "variableCode": "otherOfVeryFewKeyTypes", "variableValue": "oneOf20Values" }, { "type": "StringValue", "variableCode": "otherOfVeryFewKeyTypes", "variableValue": "anotherOf20Values" } ] } ], "schemaVersion": "v4" }' :: zson);
                                                 zson_info
------------------------------------------------------------------------------------------------------------
 zson version = 0, dict version = 0, jsonb size = 373, zson size (without pglz compression) = 373 (100.00%)

I guess ZSON compression scheme is just not good enough for these documents, or the documents are too small. ZSON was originally developed for documents that consume at least a few KB or so (see an example). For small documents it doesn't give any benefits and if fact the overall performance would probably degrade.

@afiskon afiskon closed this as completed Nov 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants