
Change BYTEA to VARCHAR #2314

Closed
4miners opened this Issue Aug 14, 2018 · 8 comments

Comments

4miners (Member) commented Aug 14, 2018

Expected behavior

We should use the VARCHAR data type for storing hexadecimal strings. The reasons are ease of use (getting rid of ENCODE/DECODE calls) and faster reads (reference: http://engineering.pivotal.io/post/bytea_versus_text_in_postgresql/).
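For context, with BYTEA every hex value is converted on read, whereas VARCHAR would return the stored string directly; a minimal sketch, using the mem_accounts table referenced later in this thread:

-- Current (BYTEA): converts binary to hex text on every read.
SELECT ENCODE("publicKey", 'hex') AS "publicKey" FROM mem_accounts;

-- Proposed (VARCHAR): returns the hex string as stored.
SELECT "publicKey" FROM mem_accounts;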

Actual behavior

We're using the BYTEA data type for various fields in the database:

  • publicKey
  • secondPublicKey
  • senderPublicKey
  • recipientPublicKey
  • generatorPublicKey
  • requesterPublicKey
  • signature
  • signSignature
  • blockSignature
  • payloadHash
  • broadhash
  • data field of transaction type 0
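These can be cross-checked against a node's schema (a quick query, not part of the original report):

SELECT table_name, column_name
  FROM information_schema.columns
 WHERE table_schema = 'public'
   AND data_type = 'bytea'
 ORDER BY table_name, column_name;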

Steps to reproduce

N/A

Which version(s) does this affect? (Environment, OS, etc...)

all

nazarhussain (Contributor) commented Aug 27, 2018

Here is a summary of the physical database size for each data type:

SELECT
  pg_column_size(decode('b65aa5950acf1ade522bcf520f2b2491dcde2f312b4933f56443faff80ad8ebc', 'hex')::bytea) AS size_bytea,
  pg_column_size('b65aa5950acf1ade522bcf520f2b2491dcde2f312b4933f56443faff80ad8ebc'::text) AS size_var,
  pg_column_size(10001) AS size_int,
  pg_column_size(10001::bigint) AS size_bigint;

-- Result: 36, 68, 4, 8
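The gap matches the raw widths: BYTEA stores the 32 raw bytes plus a small varlena header, while text stores all 64 hex characters plus the header. A quick check (not from the original comment):

SELECT octet_length(decode('b65aa5950acf1ade522bcf520f2b2491dcde2f312b4933f56443faff80ad8ebc', 'hex')) AS raw_bytes, -- 32
       length('b65aa5950acf1ade522bcf520f2b2491dcde2f312b4933f56443faff80ad8ebc') AS hex_chars;                      -- 64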
diego-G (Member) commented Sep 4, 2018

On hold till #1780

yatki (Member) commented Nov 1, 2018

To see how this change would impact performance, I ran the tests below:

  1. I imported the mainnet snapshot into the database:
dropdb lisk_main
createdb lisk_main
rm latest.gz
wget https://snapshot.lisknode.io/latest.gz
gunzip -fcq latest.gz | psql -U lisk -d lisk_main
  2. On the development branch, I started the application with node app.js -n mainnet.

  3. I performed the following ab operations:

GET /api/accounts:

~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts" > acc_bytea_n200000_c1.txt
13:52:08

This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)

Server Software:
Server Hostname:        127.0.0.1
Server Port:            8000

Document Path:          /api/accounts
Document Length:        3223 bytes

Concurrency Level:      1
Time taken for tests:   853.613 seconds
Complete requests:      200000
Failed requests:        198419
   (Connect: 0, Receive: 0, Length: 198419, Exceptions: 0)
Keep-Alive requests:    0
Total transferred:      675609988 bytes
HTML transferred:       614609988 bytes
Requests per second:    234.30 [#/sec] (mean)
Time per request:       4.268 [ms] (mean)
Time per request:       4.268 [ms] (mean, across all concurrent requests)
Transfer rate:          772.92 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.5      0      66
Processing:     2    4   5.2      3     256
Waiting:        2    4   5.2      3     256
Total:          2    4   5.3      3     256

Percentage of the requests served within a certain time (ms)
  50%      3
  66%      4
  75%      4
  80%      4
  90%      5
  95%      7
  98%     10
  99%     12
 100%    256 (longest request)

New Relic Results: [screenshot]

GET /api/accounts with publicKey filter:

~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2" > acc_bytea_n200000_c1_publicKey.txt
14:11:07

This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)

Server Software:
Server Hostname:        127.0.0.1
Server Port:            8000

Document Path:          /api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2
Document Length:        375 bytes

Concurrency Level:      1
Time taken for tests:   1516.012 seconds
Complete requests:      55979
Failed requests:        0
Keep-Alive requests:    0
Total transferred:      38065720 bytes
HTML transferred:       20992125 bytes
Requests per second:    36.93 [#/sec] (mean)
Time per request:       27.082 [ms] (mean)
Time per request:       27.082 [ms] (mean, across all concurrent requests)
Transfer rate:          24.52 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       2
Processing:    22   27   6.1     26     268
Waiting:       22   27   6.1     26     252
Total:         22   27   6.1     26     268

Percentage of the requests served within a certain time (ms)
  50%     26
  66%     27
  75%     27
  80%     27
  90%     29
  95%     30
  98%     34
  99%     42
 100%    268 (longest request)

New Relic Results: [screenshot]

  4. Then I stopped the application and applied the migration commands below:
DROP TRIGGER protect_mem_accounts ON public.mem_accounts;
DROP FUNCTION public.revert_mem_account();

DROP VIEW public.full_blocks_list;
DROP VIEW public.trs_list;

DROP INDEX mem_accounts_get_delegates;

ALTER TABLE public.signatures
	ALTER COLUMN "publicKey" TYPE varchar(64) USING ENCODE("publicKey", 'hex');

ALTER TABLE public.mem_accounts
	ALTER COLUMN "publicKey" TYPE varchar(64) USING ENCODE("publicKey", 'hex');

ALTER TABLE public.rounds_rewards
	ALTER COLUMN "publicKey" TYPE varchar(64) USING ENCODE("publicKey", 'hex');

CREATE VIEW public.full_blocks_list AS
 SELECT b.id AS b_id,
    b.version AS b_version,
    b."timestamp" AS b_timestamp,
    b.height AS b_height,
    b."previousBlock" AS "b_previousBlock",
    b."numberOfTransactions" AS "b_numberOfTransactions",
    b."totalAmount" AS "b_totalAmount",
    b."totalFee" AS "b_totalFee",
    b.reward AS b_reward,
    b."payloadLength" AS "b_payloadLength",
    encode(b."payloadHash", 'hex'::text) AS "b_payloadHash",
    encode(b."generatorPublicKey", 'hex'::text) AS "b_generatorPublicKey",
    encode(b."blockSignature", 'hex'::text) AS "b_blockSignature",
    t.id AS t_id,
    t."rowId" AS "t_rowId",
    t.type AS t_type,
    t."timestamp" AS t_timestamp,
    encode(t."senderPublicKey", 'hex'::text) AS "t_senderPublicKey",
    t."senderId" AS "t_senderId",
    t."recipientId" AS "t_recipientId",
    t.amount AS t_amount,
    t.fee AS t_fee,
    encode(t.signature, 'hex'::text) AS t_signature,
    encode(t."signSignature", 'hex'::text) AS "t_signSignature",
    s."publicKey" AS "s_publicKey",
    d.username AS d_username,
    v.votes AS v_votes,
    m.min AS m_min,
    m.lifetime AS m_lifetime,
    m.keysgroup AS m_keysgroup,
    dapp.name AS dapp_name,
    dapp.description AS dapp_description,
    dapp.tags AS dapp_tags,
    dapp.type AS dapp_type,
    dapp.link AS dapp_link,
    dapp.category AS dapp_category,
    dapp.icon AS dapp_icon,
    it."dappId" AS "in_dappId",
    ot."dappId" AS "ot_dappId",
    ot."outTransactionId" AS "ot_outTransactionId",
    encode(t."requesterPublicKey", 'hex'::text) AS "t_requesterPublicKey",
    tf.data AS tf_data,
    t.signatures AS t_signatures
   FROM (((((((((public.blocks b
     LEFT JOIN public.trs t ON (((t."blockId")::text = (b.id)::text)))
     LEFT JOIN public.delegates d ON (((d."transactionId")::text = (t.id)::text)))
     LEFT JOIN public.votes v ON (((v."transactionId")::text = (t.id)::text)))
     LEFT JOIN public.signatures s ON (((s."transactionId")::text = (t.id)::text)))
     LEFT JOIN public.multisignatures m ON (((m."transactionId")::text = (t.id)::text)))
     LEFT JOIN public.dapps dapp ON (((dapp."transactionId")::text = (t.id)::text)))
     LEFT JOIN public.intransfer it ON (((it."transactionId")::text = (t.id)::text)))
     LEFT JOIN public.outtransfer ot ON (((ot."transactionId")::text = (t.id)::text)))
     LEFT JOIN public.transfer tf ON (((tf."transactionId")::text = (t.id)::text)));

ALTER TABLE public.full_blocks_list OWNER TO lisk;

CREATE VIEW public.trs_list AS
 SELECT t.id AS t_id,
    b.height AS b_height,
    t."blockId" AS "t_blockId",
    t.type AS t_type,
    t."timestamp" AS t_timestamp,
    t."senderPublicKey" AS "t_senderPublicKey",
    m."publicKey" AS "m_recipientPublicKey",
    upper((t."senderId")::text) AS "t_senderId",
    upper((t."recipientId")::text) AS "t_recipientId",
    t.amount AS t_amount,
    t.fee AS t_fee,
    encode(t.signature, 'hex'::text) AS t_signature,
    encode(t."signSignature", 'hex'::text) AS "t_SignSignature",
    t.signatures AS t_signatures,
    (( SELECT (blocks.height + 1)
           FROM public.blocks
          ORDER BY blocks.height DESC
         LIMIT 1) - b.height) AS confirmations,
    t."rowId" AS "t_rowId"
   FROM ((public.trs t
     LEFT JOIN public.blocks b ON (((t."blockId")::text = (b.id)::text)))
     LEFT JOIN public.mem_accounts m ON (((t."recipientId")::text = (m.address)::text)));


ALTER TABLE public.trs_list OWNER TO lisk;

CREATE FUNCTION public.revert_mem_account() RETURNS trigger
    LANGUAGE plpgsql
    AS $$ BEGIN IF NEW."address" <> OLD."address" THEN
		RAISE WARNING 'Reverting change of address from % to %', OLD."address", NEW."address";
		NEW."address" = OLD."address";
	END IF;
	IF NEW."u_username" <> OLD."u_username" AND NEW."u_username" IS NOT NULL AND OLD."u_username" IS NOT NULL THEN
		RAISE WARNING 'Reverting change of u_username from % to %', OLD."u_username", NEW."u_username";
		NEW."u_username" = OLD."u_username";
	END IF;
	IF NEW."username" <> OLD."username" AND NEW."username" IS NOT NULL AND OLD."username" IS NOT NULL THEN
		RAISE WARNING 'Reverting change of username from % to %', OLD."username", NEW."username";
		NEW."username" = OLD."username";
	END IF;
	IF NEW."publicKey" <> OLD."publicKey" AND OLD."publicKey" IS NOT NULL THEN
		RAISE WARNING 'Reverting change of publicKey from % to %', OLD."publicKey", NEW."publicKey";
		NEW."publicKey" = OLD."publicKey";
	END IF;
	IF NEW."secondPublicKey" <> OLD."secondPublicKey" AND OLD."secondPublicKey" IS NOT NULL THEN
		RAISE WARNING 'Reverting change of secondPublicKey from % to %', ENCODE(OLD."secondPublicKey", 'hex'), ENCODE(NEW."secondPublicKey", 'hex');
		NEW."secondPublicKey" = OLD."secondPublicKey";
	END IF;
	RETURN NEW;
	END $$;

ALTER FUNCTION public.revert_mem_account() OWNER TO lisk;

CREATE TRIGGER protect_mem_accounts BEFORE UPDATE ON public.mem_accounts FOR EACH ROW EXECUTE PROCEDURE public.revert_mem_account();

CREATE INDEX mem_accounts_get_delegates ON public.mem_accounts USING btree (vote DESC, "publicKey") WHERE ("isDelegate" = 1);

CREATE UNIQUE INDEX mem_accounts_publicKey ON mem_accounts ("publicKey");

CREATE INDEX signatures_publicKey ON signatures ("publicKey");
CREATE INDEX rounds_rewards_publicKey ON rounds_rewards ("publicKey");
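Before restarting the node, the migrated column types can be double-checked (a verification step, not part of the original migration):

SELECT table_name, column_name, data_type, character_maximum_length
  FROM information_schema.columns
 WHERE table_schema = 'public'
   AND column_name = 'publicKey'
 ORDER BY table_name;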
  5. I refactored the source code as in the 2314-convert-bytea-to-varchar branch.

  6. I started the application with node app.js -n mainnet and performed the following ab operations:

~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts" > acc_var_n200000_c1_unique_index.txt
17:24:50

This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)


Server Software:
Server Hostname:        127.0.0.1
Server Port:            8000

Document Path:          /api/accounts
Document Length:        3170 bytes

Concurrency Level:      1
Time taken for tests:   980.870 seconds
Complete requests:      200000
Failed requests:        123345
   (Connect: 0, Receive: 0, Length: 123345, Exceptions: 0)
Keep-Alive requests:    0
Total transferred:      707581190 bytes
HTML transferred:       646581190 bytes
Requests per second:    203.90 [#/sec] (mean)
Time per request:       4.904 [ms] (mean)
Time per request:       4.904 [ms] (mean, across all concurrent requests)
Transfer rate:          704.47 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.7      0      75
Processing:     2    5   8.7      3    1126
Waiting:        2    4   8.4      3    1125
Total:          3    5   8.7      4    1126

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      4
  75%      4
  80%      5
  90%      7
  95%      9
  98%     13
  99%     24
 100%   1126 (longest request)

New Relic Results: [screenshot]

~ # now && ab -n200000 -c1 -k "http://127.0.0.1:8000/api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2" > acc_varchar_n200000_c1_publicKey_index2.txt
12:31:24

This is ApacheBench, Version 2.3 <$Revision: 1826891 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient)


Server Software:
Server Hostname:        127.0.0.1
Server Port:            8000

Document Path:          /api/accounts?publicKey=4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2
Document Length:        375 bytes

Concurrency Level:      1
Time taken for tests:   1234.949 seconds
Complete requests:      91335
Failed requests:        0
Keep-Alive requests:    0
Total transferred:      62107800 bytes
HTML transferred:       34250625 bytes
Requests per second:    73.96 [#/sec] (mean)
Time per request:       13.521 [ms] (mean)
Time per request:       13.521 [ms] (mean, across all concurrent requests)
Transfer rate:          49.11 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       9
Processing:     2   13  25.8      4    1002
Waiting:        2   13  25.4      4     908
Total:          2   13  25.8      4    1002

Percentage of the requests served within a certain time (ms)
  50%      4
  66%      5
  75%      6
  80%      7
  90%     54
  95%     60
  98%     69
  99%     78
 100%   1002 (longest request)

New Relic Results: [screenshot]

nazarhussain (Contributor) commented Nov 2, 2018

I ran a small test directly on the PostgreSQL server.

create table mem_accounts_binary as table mem_accounts;
create table mem_accounts_varchar as table mem_accounts;
alter table mem_accounts_varchar alter column "publicKey" type CHAR(64) USING ENCODE("publicKey", 'hex');
CREATE INDEX ON mem_accounts_varchar ("publicKey");
SELECT ENCODE("publicKey", 'hex') from mem_accounts_binary;
select "publicKey" from mem_accounts_varchar;

pgbench -c 30 -T 120 -n -f ./query_with_binary.sql -f ./query_with_varchar.sql -P 5 -S lisk_main_net

SQL script 1: ./query_with_binary.sql
 - weight: 1 (targets 33.3% of total)
 - 236 transactions (34.6% of total, tps = 1.876072)
 - latency average = 8896.888 ms
 - latency stddev = 2548.701 ms
SQL script 2: ./query_with_varchar.sql
 - weight: 1 (targets 33.3% of total)
 - 225 transactions (33.0% of total, tps = 1.788628)
 - latency average = 7164.604 ms
 - latency stddev = 2209.866 ms

So it's clear that ENCODE doesn't have a noticeable overhead on the query.

nazarhussain (Contributor) commented Nov 2, 2018

During a meeting between @nazarhussain, @yatki, and @vitaly-t, we discussed and decided to keep all hexadecimal strings as the BYTEA data type in the database and use ENCODE/DECODE to convert them to strings, for the following reasons:

  1. We don't need partial search on such strings, and exact matching works fine with the BYTEA type.
  2. Converting to CHAR(64) and adding an index gives faster performance on queries that filter on public key, but it decreases the overall performance of reading data from PostgreSQL into Node.js (not certain, but possibly because of string-based I/O).
  3. Getting better performance with CHAR/VARCHAR requires an index on the column, which in turn makes insertion slower compared to the BYTEA type.

So in general we will follow these guidelines:

  1. All hexadecimal strings will be stored as BYTEA in the database.
  2. For select queries we will rely on ENCODE.
  3. For conditional clauses we will either use DECODE or convert the hex string to a Buffer in Node.js before passing it to the query. (See the sketch below.)

@yatki can go through the code once to make sure we are following the above guidelines everywhere.
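A minimal sketch of these guidelines, using the mem_accounts table and a public key that appears in the benchmarks above:

-- Guideline 1: the schema keeps "publicKey" as BYTEA (no change).

-- Guideline 2: encode on read.
SELECT address, ENCODE("publicKey", 'hex') AS "publicKey"
  FROM mem_accounts;

-- Guideline 3: decode in the condition (or bind a Node.js Buffer
-- as the parameter instead of calling DECODE in SQL).
SELECT address
  FROM mem_accounts
 WHERE "publicKey" = DECODE('4e8896e20375b16e5f1a6e980a4ed0cdcb3356e99e965e923804593669c87ad2', 'hex');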

yatki (Member) commented Nov 13, 2018

@nazarhussain I found out we are storing hexadecimal values as varchar in the following table columns. I think it's better to create a separate issue to convert these fields to bytea (sketched below). What do you think?

Table => column:
mem_round_snapshot => delegate
mem_round => delegate
mem_accounts2u_multisignatures => dependentId
mem_accounts2u_delegates => dependentId
mem_accounts2multisignatures => dependentId
mem_accounts2delegates => dependentId
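The follow-up issue would apply the inverse conversion; for example, for one of the listed columns (a sketch, assuming all stored values are valid hex):

ALTER TABLE mem_accounts2delegates
  ALTER COLUMN "dependentId" TYPE bytea USING DECODE("dependentId", 'hex');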

nazarhussain (Contributor) commented Nov 13, 2018

@yatki Yes, I agree we should create a separate issue for the above fields and close this one.

yatki (Member) commented Nov 14, 2018

Closing this issue; the remaining tasks are being discussed here.

yatki closed this Nov 14, 2018