Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unused Indexes, Unique and Primary Keys #1087

Closed
kderme opened this issue Mar 22, 2022 · 10 comments
Closed

Remove unused Indexes, Unique and Primary Keys #1087

kderme opened this issue Mar 22, 2022 · 10 comments
Assignees
Labels
enhancement New feature or request product-backlog

Comments

@kderme
Copy link
Contributor

kderme commented Mar 22, 2022

In the same spirit as #1082 we could continue simplifying the schema trying to optimize insertions.

  • Unique and autoincremental Primary Id keys on tables like StakeRegistration, Delegation, TxIn, MaTxOut, MaTxMint which are not referenced by any other table could probably be deleted.
  • Tables like Reward and EpochStake need to maintain their Unique keys, since we can have duplicate insertions on rollbacks. However their Primary Id is never used.
  • Tables like Tx, Redeemer which are referenced by other tables need to maintain their Primary key, but their Unique key can be removed. We already know that their hashes are unique and the specs make sure there will never be any duplicates. We already use insertUnchecked on them (or should use), so if there is a violation the whole thing crashes anyway. There is no query on their hashes, but we should audit if they are used in other queries while inserting (removing unique keys would make these slower).
  • Tables like StakeAddress, MultiAsset, Script need to maintain both primary and unique keys. The difference with previous tables is that they can be inserted in different places and db-sync may try to insert them multiple times. We also need a way to get their Primary id from their hash.

We could also try to audit indexes that are important for syncing and disable all others. Applications that build on top of db-sync could insert their own indexes based on their needs.

The above list probably needs a better audit to check if it's correct for all tables and the mentioned tables can probably be extended.

@kderme kderme added the enhancement New feature or request label Mar 22, 2022
@erikd
Copy link
Contributor

erikd commented Mar 22, 2022

I am testing something right now which just removes the foreign key constraints. Currently on mainnet epoch 289 . While there did seem to be a small speed improvement in the Byron era, I want to see it sync the full chain before I pass judgement.

@erikd
Copy link
Contributor

erikd commented Mar 24, 2022

Removing foreign key constraints (and nothing else) provides significant sync speed improvements when testing on mainnet. For Byron era epochs, the speed improvement is about 25%. For the latest era (Alonzo) the sync speed improvement are from 50% to 70%. Mainnet synced from genesis to epoch 308 in about 52 hours (still running). I would be a little surprised if the complete sync (to epoch 329) would take more than 72 hours (3 days).

@erikd
Copy link
Contributor

erikd commented Mar 24, 2022

@rdirt There is still a lot of work to be done on this idea.

@rdlrt
Copy link

rdlrt commented Mar 25, 2022

@rdirt There is still a lot of work to be done on this idea.

Yip - can definitely imagine the complications it would add to probabilities and careful changes (not expecting this to be ready very soon, and perhaps even introduce bugs =] ), but I can definitely see the benefits in longer term from postgres consumer pov (especially to size of fact tables when doing very large queries across tables) 🙂

@erikd
Copy link
Contributor

erikd commented Mar 25, 2022

I really hope we can do enough testing to prevent any bugs escaping into a released version. When complete, I intend to test on mainnet and both QA testnets.

@erikd
Copy link
Contributor

erikd commented Mar 29, 2022

This is incredibly complex and difficult.

My initital test just removed all foreign key constraints and was not able to do any valid rollbacks. Trying to support rollbacks makes everything more difficult. The TxIn and TxOut tables used to reference the Tx table using foreign keys. Removal of all foreign keys means that TxIn and TxOut entries no longer get rolled back. This means these tables also need a blockNo field.

@kderme
Copy link
Contributor Author

kderme commented Mar 29, 2022

This is about #1082 right?

Maybe we can remove references from tables that don't need to rollback. For example StakeAddress and Script have a TxId, which shows the first Tx where they were introduced. This is mostly added to rollback these tables, but they don't actually need to, like MultiAsset and PoolHash never rollback.

@kderme
Copy link
Contributor Author

kderme commented Mar 29, 2022

The tables you mention TxIn and TxOut still need a way to rollback.

@erikd
Copy link
Contributor

erikd commented Mar 29, 2022

Yes, TxIn, TxOut and CollateralTxIn will rollback by the new blockNo field.

@kderme kderme mentioned this issue May 16, 2022
11 tasks
@erikd
Copy link
Contributor

erikd commented Jul 19, 2022

This is WIP. My branch is erikd/no-foreign-keys-does-this-work-3 (still under test).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request product-backlog
Projects
None yet
Development

No branches or pull requests

3 participants