Skip to content

perf(core-api): Use a faster alternative to derive an estimate#1655

Merged
faustbrian merged 6 commits intodevelopfrom
ditch-tablesample100
Dec 14, 2018
Merged

perf(core-api): Use a faster alternative to derive an estimate#1655
faustbrian merged 6 commits intodevelopfrom
ditch-tablesample100

Conversation

@vasild
Copy link
Copy Markdown
Contributor

@vasild vasild commented Dec 5, 2018

Proposed changes

Types of changes

  • Bugfix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (improve a current implementation without adding a new feature or fixing a bug)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Build (changes that affect the build system)
  • Docs (documentation only changes)
  • Test (adding missing tests or fixing existing tests)
  • Other... Please describe:

Checklist

  • I have read the CONTRIBUTING documentation
  • Lint and unit tests pass locally with my changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have added necessary documentation (if appropriate)

@vasild vasild requested a review from faustbrian December 5, 2018 18:18
@ghost ghost assigned vasild Dec 5, 2018
@ghost ghost added the review label Dec 5, 2018
@faustbrian
Copy link
Copy Markdown
Contributor

I would look at https://wiki.postgresql.org/wiki/Count_estimate and create a function via migration to then wrap queries in it to get the proper estimate that is based on the full query with all conditions.

@vasild
Copy link
Copy Markdown
Contributor Author

vasild commented Dec 6, 2018

I would look at https://wiki.postgresql.org/wiki/Count_estimate and create a function via migration to then wrap queries in it to get the proper estimate that is based on the full query with all conditions.

That would be one solution. However in some cases we can derive the exact result very fast and thus don't need to bother with estimates. One such is:
SELECT COUNT(*) FROM blocks TABLESAMPLE SYSTEM (100) WHERE height = ...
This one randomly selects 100% of the rows in the table, then searches inside this random pile (indexes invalidated) for a row with the given height. If instead we use
SELECT COUNT(*) FROM blocks WHERE height = ...
it would use the unique index on height and come up with the result "fast" (exact result, not an estimate).

So, I am looking into how to do that.

@ghost ghost assigned j-a-m-l Dec 6, 2018
@vasild vasild force-pushed the ditch-tablesample100 branch 2 times, most recently from 2f1fea8 to 5af85e9 Compare December 7, 2018 20:13
When we use OFFSET and LIMIT in SQL we want to assess the total number
of rows that would have been returned if OFFSET and LIMIT were omitted.

Instead of doing a separate query with COUNT(*), parse the output
of EXPLAIN SELECT ... and fetch the number of rows from there.
@vasild vasild force-pushed the ditch-tablesample100 branch from 5af85e9 to 366212b Compare December 7, 2018 20:23
@codecov-io
Copy link
Copy Markdown

codecov-io commented Dec 7, 2018

Codecov Report

Merging #1655 into develop will increase coverage by 0.04%.
The diff coverage is 38.18%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #1655      +/-   ##
===========================================
+ Coverage    39.41%   39.46%   +0.04%     
===========================================
  Files          353      353              
  Lines         7674     7672       -2     
  Branches      1076     1102      +26     
===========================================
+ Hits          3025     3028       +3     
+ Misses        4635     4627       -8     
- Partials        14       17       +3
Impacted Files Coverage Δ
packages/core-api/src/repositories/blocks.ts 0% <0%> (ø) ⬆️
packages/core-p2p/src/monitor.ts 0% <0%> (ø) ⬆️
packages/core-api/src/repositories/transactions.ts 36.58% <31.81%> (-0.54%) ⬇️
packages/core-api/src/repositories/repository.ts 88.63% <87.5%> (-8.04%) ⬇️
packages/crypto/src/utils/sort-transactions.ts 63.63% <0%> (-27.28%) ⬇️
packages/core-deployer/src/utils.ts 0% <0%> (ø) ⬆️
packages/core-logger-winston/src/formatter.ts 0% <0%> (ø) ⬆️
packages/core-test-utils/src/generators/wallets.ts 85.71% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0c23196...ced8501. Read the comment docs.

expect(response.data.data).toBeArray()

expect(response.data.data).toHaveLength(100)
expect(response.data.meta.totalCount).toBe(153)
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice: that count is supposed to be an estimate. So we may return 100 rows, totalCount=100, but however there may be e.g. 110 rows in the database.

@faustbrian
Copy link
Copy Markdown
Contributor

@vasild This can now be adjusted to TypeScript on the develop branch so we can include it in the next release.

faustbrian
faustbrian previously approved these changes Dec 13, 2018
Copy link
Copy Markdown
Contributor

@faustbrian faustbrian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good once the conflicts are resolved.

@faustbrian faustbrian changed the title perf: Use a faster alternative to derive an estimate perf(core-api): Use a faster alternative to derive an estimate Dec 14, 2018
@faustbrian faustbrian merged commit 8fc955a into develop Dec 14, 2018
@ghost ghost removed the review label Dec 14, 2018
@faustbrian faustbrian deleted the ditch-tablesample100 branch December 14, 2018 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants