Update transaction tables to reduce queries dispatched & executed by Postgres #102

simerplaha · 2021-12-21T20:25:03Z

Overview

Update utransactions & transactions tables to use foreign key constraints

To reduce the number of queries required to fetch a single transaction to 1.
To make all inserts transactional.

Issue

To fetch a single transaction 3 queries are dispatched to Postgres.

select "hash", "chain_from", "chain_to", "gas-amount", "gas-price" from "utransactions" where "hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'
select "tx_hash", "hint", "output_ref_key", "unlock_script" from "uinputs" where "tx_hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'
select "tx_hash", "amount", "address", "lock_time" from "uoutputs" where "tx_hash" = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444'

Similar occurs for a write.

5 queries in total
Inserts start individual transactions for single request and are not within a single transaction.

insert into "utransactions" ("hash","chain_from","chain_to","gas-amount","gas-price") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', $2 = '32', ...
BEGIN
update "uinputs" set "hint"=$1,"unlock_script"=$2 where "tx_hash"=$3 and "output_ref_key"=$4 ... $3 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444' ...
insert into "uinputs" ("tx_hash","hint","output_ref_key","unlock_script") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444' ...
COMMIT
BEGIN
update "uoutputs" set "amount"=$1,"lock_time"=$2 where "tx_hash"=$3 and "address"=$4 ... $3 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', ...
insert into "uoutputs" ("tx_hash","amount","address","lock_time") ... $1 = '4b9f7a4102027a0595f707d0849f2bee4fce4820a40822109edef96368494444', ...
COMMIT

Options

1. Foreign key constraints

Use foreign key constraints on existing tables so Postgres knows the relationships between the tables, and we can reduce
the number of queries dispatched & executed by Postgres for each read and write to 1.

2. Merge all data into a single `transaction` table

I'm seeing these transaction queries always fetch their relevant inputs and outputs rows. inputs and outputs tables are never queried individually. Merging inputs and outputs tables into the main transaction table as single column or two columns and storing them as un-indexed varchar or blob would increase performance by a large factor over option 1 above because

Postgres maintains a different file for each table & index. Merging them into a single table reduces IOps.
Increases write performance because only one index is maintained for transactions table and none for inputs
and outputs (which are much larger tables with larger indexes).
Increases read performance because only a single index is queried.

We can store them as jsonb if we want indexing. But if indexing is needed option 1 is easier to maintain and jsonb's value size is limited to 268.435455 MB (from StackOverflow question).

Drawback of using this option over option 1 - In-case we want to query inputs and outputs individually, separate tables are much easier to maintain. But if we want performance this option 2 will be noticeably better.

Thoughts?

The text was updated successfully, but these errors were encountered:

tdroxler · 2021-12-22T06:57:15Z

We do use inputs and outputs for example to get the balance of an address:

explorer-backend/app/src/main/scala/org/alephium/explorer/persistence/queries/TransactionQueries.scala

Lines 303 to 327 in 08fc34a

    
           private val getBalanceQuery = Compiled { address: Rep[Address] => 
        
             outputsTable 
        
               .filter(output => output.mainChain && output.address === address) 
        
               .map(output => (output.key, output.amount, output.lockTime)) 
        
               .joinLeft(inputsTable.filter(_.mainChain)) 
        
               .on(_._1 === _.outputRefKey) 
        
               .filter(_._2.isEmpty) 
        
               .map { case ((_, amount, lockTime), _) => (amount, lockTime) } 
        
           } 
        
           def getBalanceAction(address: Address): DBActionR[(U256, U256)] = { 
        
             getBalanceQuery(address).result.map { outputs => 
        
               val now = TimeStamp.now() 
        
               outputs.foldLeft((U256.Zero, U256.Zero)) { 
        
                 case ((total, locked), (amount, lockTime)) => 
        
                   val newTotal = total.addUnsafe(amount) 
        
                   val newLocked = if (lockTime.map(_.isBefore(now)).getOrElse(true)) { 
        
                     locked 
        
                   } else { 
        
                     locked.addUnsafe(amount) 
        
                   } 
        
                   (newTotal, newLocked) 
        
               } 
        
             } 
        
           }

no tx involved here.

While writing this I also realize we need to have those two tables for example to be able to find the spent output when searching the inputs:

explorer-backend/app/src/main/scala/org/alephium/explorer/persistence/queries/TransactionQueries.scala

Lines 251 to 266 in 08fc34a

    
           private val getInputsQuery = Compiled { (txHash: Rep[Transaction.Hash]) => 
        
             mainInputs 
        
               .filter(_.txHash === txHash) 
        
               .join(mainOutputs) 
        
               .on(_.outputRefKey === _.key) 
        
               .sortBy(_._1.order) 
        
               .map { 
        
                 case (input, output) => 
        
                   (input.hint, 
        
                    input.outputRefKey, 
        
                    input.unlockScript, 
        
                    output.txHash, 
        
                    output.address, 
        
                    output.amount) 
        
               } 
        
           }

In the UTXO model, an input is always linked to an older output, but when we insert the transaction with the outputs, we don't know in advance which inputs of a future tx will use those outputs. So we have to do all those join between outputs/inputs` table.

So I guess we need to go for option 1

Your thoughts @polarker ?

simerplaha · 2022-01-26T09:06:52Z

Hey @tdroxler, you are right about this on our Slack conversion. I got Slick confused with Hibernate. In Hibernate adding foreign-key-constraints automatically optimises queries to use joins but in Slick we have to do this manually. My excuse: haven't used ORMs in a while :(

And now since we are writing raw SQL, adding foreign-key-constraints to optimise for performance makes even less sense. So this issue is invalid.

Thank you for pointing this out. If you are ok I will close this task?

tdroxler · 2022-01-26T09:09:29Z

👍 all good to close it

simerplaha closed this as completed Jan 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update transaction tables to reduce queries dispatched & executed by Postgres #102

Update transaction tables to reduce queries dispatched & executed by Postgres #102

simerplaha commented Dec 21, 2021 •

edited

Loading

tdroxler commented Dec 22, 2021

simerplaha commented Jan 26, 2022

tdroxler commented Jan 26, 2022

Update transaction tables to reduce queries dispatched & executed by Postgres #102

Update transaction tables to reduce queries dispatched & executed by Postgres #102

Comments

simerplaha commented Dec 21, 2021 • edited Loading

Overview

Issue

Options

1. Foreign key constraints

2. Merge all data into a single transaction table

tdroxler commented Dec 22, 2021

simerplaha commented Jan 26, 2022

tdroxler commented Jan 26, 2022

simerplaha commented Dec 21, 2021 •

edited

Loading

2. Merge all data into a single `transaction` table