split out pure validation functions and SQL interactions #42

thedavidmeister · 2017-08-12T08:33:04Z

Problem:

Validating units is slow because it thrashes the SQL database. Syncing just a few GB of DAG takes multiple days, and some users have reported in chat that syncing/validation runs slower than new units are being added to the DAG
There are no unit tests for the validation algorithms
Refactoring/using/reasoning about validation functions from the library is difficult due to high cyclomatic complexity and generally poor separation of concerns in validation.js
Validation is tightly coupled to specific SQL queries, forcing all clients to implement SQL if they want to validate units

Potential solution:

One of the key strengths of the design of the DAG as per the whitepaper is that both units and the DAG itself are immutable data structures, and the validity of the whole DAG and of any individual unit can be independently and deterministically checked using an algorithm that "walks" some or all of the DAG.

This implies to me that all of the validation functions should accept data structures/objects representing DAGs and units as arguments and walk over these with no knowledge of concepts such as "SQL query" or "database connection".

Currently, validation functions like validateHashTree take a database connection and "validation state" as arguments, so they are required to fetch, build and validate a hash tree ad hoc and also handle some kind of state management, rather than simply validate a hash tree.

If a DAG/unit object needs to (lazily) look something up in the database, then either it can do that itself or an internal API that fetches/builds DAGs/units can do it. Either way, SQL isn't the responsibility of validation logic.
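To make the idea concrete, here is a minimal sketch of what a storage-agnostic validation function could look like. Every name in it (`createInMemoryDag`, `getUnit`, `validateParents`) is hypothetical and not part of the library, and the "parents must be known" check is only a stand-in for the real validation rules. It is synchronous for brevity; a lazily loading DAG object would probably need an async interface.

```javascript
// Hypothetical sketch: none of these names exist in the library.
// A "DAG view" is any object exposing getUnit(hash) -> unit | undefined.
// It could be backed by memory, SQL, or anything else; the validator never knows.
function createInMemoryDag(units) {
  const byHash = new Map(units.map((u) => [u.hash, u]));
  return { getUnit: (hash) => byHash.get(hash) };
}

// Pure function: no database connection, no shared validation state.
// The rule checked here (all parents are known) is illustrative only.
function validateParents(unit, dag) {
  if (!Array.isArray(unit.parents) || unit.parents.length === 0) {
    return { valid: false, error: 'unit has no parents' };
  }
  for (const parentHash of unit.parents) {
    if (!dag.getUnit(parentHash)) {
      return { valid: false, error: 'unknown parent: ' + parentHash };
    }
  }
  return { valid: true, error: null };
}

const dag = createInMemoryDag([
  { hash: 'genesis', parents: [] },
  { hash: 'A', parents: ['genesis'] },
]);

console.log(validateParents({ hash: 'B', parents: ['A'] }, dag));
console.log(validateParents({ hash: 'C', parents: ['missing'] }, dag));
```

The only thing the validator depends on is the `getUnit` interface, so swapping the in-memory map for a SQL-backed object would not touch the validation code.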

This approach would allow for the following:

DAG/unit objects, once validated, are immutable and so can be cached indefinitely, with no need to thrash or even interact with the db once they are in memory. For new units arriving from the network, we have them (and probably their parents, if cached) in memory before they even hit SQL, so we should be able to validate many new units with zero SQL reads.
one step closer to allowing clients of the library to use storage other than SQL
validation can be unit tested by mocking out DAG/unit objects rather than relying on a live db connection to test anything
performance improvements such as implementing processing queues, parallelising/batching validation across multiple CPUs, etc. would be much easier to implement
consumers of the library could leverage the validation logic without a storage backend at all, as long as they implement the right data structure/object interface
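The caching and mocking points above can be sketched together. This is an assumption-laden illustration, not the library's API: `withValidationCache`, `hasKnownParents` and `mockDag` are hypothetical names. Only successful results are cached, on the assumption that a unit that fails today (for example because a parent has not arrived yet) might pass later.

```javascript
// Hypothetical sketch: wraps any pure validator with a cache keyed by unit hash.
// Only units that validated successfully are remembered.
function withValidationCache(validate) {
  const validated = new Set();
  return function cachedValidate(unit, dag) {
    if (validated.has(unit.hash)) return true;
    const ok = validate(unit, dag);
    if (ok) validated.add(unit.hash);
    return ok;
  };
}

// A mock DAG object that counts lookups, standing in for a live db connection.
let lookups = 0;
const mockDag = {
  getUnit(hash) {
    lookups += 1;
    return hash === 'A' ? { hash: 'A', parents: [] } : undefined;
  },
};

// Illustrative rule only: every parent must be known to the DAG object.
const hasKnownParents = (unit, dag) =>
  unit.parents.every((h) => dag.getUnit(h) !== undefined);

const cached = withValidationCache(hasKnownParents);
const unitB = { hash: 'B', parents: ['A'] };

const first = cached(unitB, mockDag);  // performs one lookup
const second = cached(unitB, mockDag); // served from the cache
console.log(first, second, lookups);   // true true 1
```

Because the validator only sees the mock object, a test like this runs without any database, and the lookup counter shows that a repeat validation costs no reads.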


thedavidmeister mentioned this issue Aug 12, 2017

"pluggable" data structure/object/interface/abstraction for DAG/unit (especially re: SQL/storage) #43

Open

thedavidmeister mentioned this issue Sep 12, 2017

Move getSourceString to String Utils #52

Merged
