split out pure validation functions and SQL interactions #42

Open
thedavidmeister opened this issue Aug 12, 2017 · 0 comments
Contributor

thedavidmeister commented Aug 12, 2017

Problem:

  • Validating units is slow because it thrashes the SQL database. Syncing just a few GB of DAG takes multiple days, and some users have even reported in chat that syncing/validation runs slower than new units are being added to the DAG
  • There are no unit tests for the validation algorithms
  • Refactoring/using/reasoning about validation functions from the library is difficult due to high cyclomatic complexity and generally poor separation of concerns in validation.js
  • Validation is tightly coupled to specific SQL queries, forcing all clients to implement SQL if they want to validate units

Potential solution:

One of the key strengths of the DAG design, as per the whitepaper, is that both units and the DAG itself are immutable data structures, so the validity of the whole DAG and of any individual unit can be checked independently and deterministically by an algorithm that "walks" some or all of the DAG.

This implies to me that all of the validation functions should accept data structures/objects representing DAGs and units as arguments and walk over these with no knowledge of concepts such as "SQL query" or "database connection".
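As a rough illustration of what such a function could look like, here is a minimal sketch. Everything in it is an assumption made for the example: the unit shape (`unit`, `parent_units`), the use of a `Map` as the DAG, and the name `validateAncestry` are hypothetical and are not the library's real API, and the rule it checks (every ancestor is present) is only a stand-in for real validation rules.

```javascript
// Hypothetical shapes, chosen for this sketch only:
//   unit: { unit: "<hash>", parent_units: ["<hash>", ...] }
//   dag:  Map from unit hash to unit object
// The function knows nothing about SQL or connections; it only walks data.
function validateAncestry(dag, startHash) {
  const seen = new Set();
  const stack = [startHash];
  while (stack.length > 0) {
    const hash = stack.pop();
    if (seen.has(hash)) continue; // already walked via another path
    const unit = dag.get(hash);
    if (unit === undefined) {
      return { valid: false, error: `unknown unit ${hash}` };
    }
    seen.add(hash);
    for (const parent of unit.parent_units) stack.push(parent);
  }
  return { valid: true, visited: seen.size };
}

// A tiny in-memory DAG: b -> a -> genesis, and b -> genesis.
const sampleDag = new Map([
  ["genesis", { unit: "genesis", parent_units: [] }],
  ["a", { unit: "a", parent_units: ["genesis"] }],
  ["b", { unit: "b", parent_units: ["genesis", "a"] }],
]);
const okResult = validateAncestry(sampleDag, "b");

// A unit whose parent is not in the DAG fails without touching any database.
const brokenDag = new Map([["x", { unit: "x", parent_units: ["missing"] }]]);
const badResult = validateAncestry(brokenDag, "x");
```

Because the function takes only data and returns only data, the same call works whether the `Map` was filled from SQL, from the network, or by hand.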

Currently, validation functions like validateHashTree take a database connection and "validation state" as arguments. They are therefore required to fetch, build and validate a hash tree ad hoc, and to do some kind of state management as well, rather than simply validating a hash tree.

If a DAG/unit object needs to (lazily) look something up in the database, then either it can do so itself or an internal API that fetches/builds DAGs/units can do it. Either way, SQL isn't the responsibility of validation logic.
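One possible shape for the lazy lookup is sketched below. The class `LazyDag`, the `fetchUnit` callback and the `parentsAreKnown` check are all hypothetical names invented for this example. The fetch is synchronous here only to keep the sketch short; a real database client would most likely be asynchronous.

```javascript
// Hypothetical sketch: names and shapes are illustrative, not the library's API.
// `fetchUnit` stands in for whatever reads a unit from storage (SQL or other).
class LazyDag {
  constructor(fetchUnit) {
    this.fetchUnit = fetchUnit;
    this.cache = new Map();
  }

  // Return the unit for `hash`, reading from storage only on a cache miss.
  get(hash) {
    if (this.cache.has(hash)) return this.cache.get(hash);
    const unit = this.fetchUnit(hash);
    // Cache only units that exist; a missing unit may still arrive later.
    if (unit !== undefined) this.cache.set(hash, Object.freeze(unit));
    return unit;
  }
}

// Validation sees only `get`, never a connection or a query.
function parentsAreKnown(dag, unit) {
  return unit.parent_units.every((hash) => dag.get(hash) !== undefined);
}

// Fake backend that counts how often storage is read.
const storedRows = new Map([["genesis", { unit: "genesis", parent_units: [] }]]);
let storageReads = 0;
const lazyDag = new LazyDag((hash) => {
  storageReads += 1;
  return storedRows.get(hash);
});

const childUnit = { unit: "a", parent_units: ["genesis"] };
const firstCheck = parentsAreKnown(lazyDag, childUnit);  // reads storage once
const secondCheck = parentsAreKnown(lazyDag, childUnit); // served from the cache
```

The design choice in this sketch is that the storage dependency lives in the object passed to validation, so swapping SQL for another backend only means passing a different `fetchUnit`.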

This approach would allow for the following:

  • DAG/unit objects, once validated, are immutable and so can be cached indefinitely, with no need to thrash or even interact with the db once they are in memory. For new units arriving from the network, we have them (and probably their parents, if cached) in memory before they even hit SQL, so we should be able to validate many new units with zero SQL reads.
  • one step closer to allowing clients of the library to use storage other than SQL
  • validation can be unit tested by mocking out DAG/unit objects rather than relying on a live db connection to test anything
  • performance improvements such as implementing processing queues, parallelising/batching validation across multiple CPUs, etc. would be much easier to implement
  • consumers of the library could leverage the validation logic without a storage backend at all, as long as they implement the right data structure/object interface