Conversation
@bobbinth I have validated these behaviours on this branch:
#[instrument(target = COMPONENT, skip_all)]
pub async fn load(database_filepath: PathBuf) -> Result<miden_node_store::Db, DatabaseSetupError> {
    let manager = ConnectionManager::new(database_filepath.to_str().unwrap());
    let pool = deadpool_diesel::Pool::builder(manager).max_size(16).build()?;
Does max_size mean the capacity, or is the actual capacity chosen differently and this is just the upper bound to that algorithm?
It's an upper bound, but I don't know what goes on under the hood. @drahnr, do you know why we wouldn't just use the default?
/// Maximum size of the [`Pool`].
///
/// Default: `cpu_count * 4`
///
/// [`Pool`]: super::Pool
pub max_size: usize,
No, I don't remember when or who introduced it
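For reference, deadpool's documented default for max_size is cpu_count * 4. A minimal std-only sketch of computing that same figure (the helper name here is illustrative, not part of deadpool's API):

```rust
use std::thread;

/// Mirrors deadpool's documented default of `cpu_count * 4`.
/// (Illustrative helper, not deadpool's own code.)
fn default_pool_max_size() -> usize {
    // `available_parallelism` errors on platforms where the CPU count
    // is unknown; fall back to 1 in that case.
    let cpus = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    cpus * 4
}

fn main() {
    println!("default max_size: {}", default_pool_max_size());
}
```

Either way, max_size is only a cap: the pool creates connections on demand up to that bound, so the explicit 16 just replaces the CPU-derived ceiling.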
crates/validator/src/db/models.rs
Outdated
pub fn new(info: &ValidatedTransactionInfo) -> Self {
    Self {
        id: info.tx_id().to_bytes(),
        block_num: info.block_num().to_raw_sql(),
        account_id: info.account_id().to_bytes(),
        info: info.to_bytes(),
    }
I think we should handle all the serialization here, instead of serializing in ValidatedTransactionInfo, as I allude to in this comment.
LMK if my latest changes reflect your thinking here
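To illustrate the direction with toy types (TxInfo and TxRow here are hypothetical stand-ins for ValidatedTransactionInfo and the Diesel model, not the real types): every byte-level conversion lives in the row model's constructor, so the domain type never learns about the database representation.

```rust
// Hypothetical domain type standing in for ValidatedTransactionInfo.
struct TxInfo {
    id: u64,
    block_num: u32,
}

// Hypothetical row model: owns *all* serialization decisions.
struct TxRow {
    id: Vec<u8>,
    block_num: i64,
}

impl TxRow {
    fn new(info: &TxInfo) -> Self {
        Self {
            // Byte encoding happens here, at the storage boundary.
            id: info.id.to_le_bytes().to_vec(),
            // SQLite integers are i64, so the widening lives here too.
            block_num: i64::from(info.block_num),
        }
    }
}

fn main() {
    let row = TxRow::new(&TxInfo { id: 7, block_num: 42 });
    println!("id bytes: {}, block: {}", row.id.len(), row.block_num);
}
```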
}

// Start the Validator if we have bound a socket.
if let Some(address) = validator_socket_address {
The configuration decision based on validator_socket_address somewhat obscures the fact that the user did not provide a validator URL but a key.
Not sure that part of the stack should be considering the key. Key presence is considered elsewhere. This part of the stack just cares about whether to run the local instance or not
@Mirko-von-Leipzig please remind me: do we have an issue for cleaning up the bundled config/startup stuff?
/// Open a connection to the DB and apply any pending migrations.
#[instrument(target = COMPONENT, skip_all)]
pub async fn load(database_filepath: PathBuf) -> Result<miden_node_store::Db, DatabaseSetupError> {
    let manager = ConnectionManager::new(database_filepath.to_str().unwrap());
@Mirko-von-Leipzig @drahnr despite this (and the PRAGMA queries it makes for WAL etc.) I still get "database is locked" errors when running the integration tests.
ERROR rpc:submit_proven_transaction: miden-validator: crates/validator/src/server/mod.rs:126:
error: status: 'Internal error', self: "database is locked" rpc.service:
"validator.Api", rpc.method: "SubmitProvenTransaction", otel.name: "validator.Api/SubmitProvenTransaction"
The Validator could receive parallel requests on the submit_proven_transaction endpoint, so I do think it's something we need to solve (not just the integration tests' fault).
Adding this does avoid it. ATM I'm reusing the store's connection manager code for validator.
pub(crate) fn configure_connection_on_creation(
    conn: &mut SqliteConnection,
) -> Result<(), ConnectionManagerError> {
    // Wait up to 5 seconds for writer locks before erroring.
    diesel::sql_query("PRAGMA busy_timeout=5000")
        .execute(conn)
        .map_err(ConnectionManagerError::ConnectionParamSetup)?;
    Ok(())
}
I think we need to serialize writes. By default you get a "database is locked" error if a write is already in progress; it's the application's responsibility to deal with this. The timeout fixes this on the surface, since almost all queries will finish under 5 seconds, but under load you'll see the lock error again.
I suggest adding an internal bounded channel to serialize the queries (at least the writes).
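A minimal std-only sketch of that idea (std::sync::mpsc::sync_channel as a stand-in for whatever async channel the validator would actually use, and WriteReq as a hypothetical request type): a single writer task drains a bounded queue, so writes are serialized, and a full queue surfaces as a distinct overload error instead of a generic "database is locked".

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical write request; in practice this would carry the
// statement and its parameters (plus a reply channel for the result).
struct WriteReq(u32);

fn main() {
    // Bounded channel: at most 8 queued writes.
    let (tx, rx) = mpsc::sync_channel::<WriteReq>(8);

    // Single writer task drains the queue, so writes never race.
    let writer = thread::spawn(move || {
        let mut applied = 0u32;
        for req in rx {
            // Apply the write here (e.g. execute the statement).
            applied += req.0;
        }
        applied
    });

    for i in 1..=4 {
        // `try_send` returns Err(TrySendError::Full) when the queue is
        // full, which the endpoint can map to a clear overload error.
        tx.try_send(WriteReq(i)).expect("queue full: report overload to caller");
    }
    drop(tx); // close the channel so the writer loop ends
    println!("applied: {}", writer.join().unwrap());
}
```

The design choice being weighed above: the channel's capacity plays the role the busy_timeout currently does, but rejection happens in our code, so we control the error message and can tell overload apart from a genuine locking bug.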
Adding a bounded channel would effectively add another "queue" on top of the one in the database, only for the benefit of controlling capacity via the size of the channel (instead of a timeout in seconds or a retry count). In both situations we would error out on the basis of too many writes (no difference for the client of the corresponding endpoint(s)).
I agree notionally about serializing writes, but I'm not sure we are actually better off adding the necessary code for the channel rather than relying on a timeout or retry count.
The main difference is retaining control and the error information we can provide, which to me, given our latest debugging adventures, is well worth the additional 100 (fair, 200) LOC for queue management.
I can see myself debugging something unrelated 6 months down the line, where all we get is a "database is locked" error again with the root cause somewhere else, and now we also can't disambiguate an overload-based lock from a bug-rooted one.
LGTM, bar the pragma usage and the config around the validator socket address being used as a discriminator for spawning a task.
Context
The Validator currently only stores validated transactions in memory. We want to persist validated transactions for resilience and for potential future use cases around debugging / auditability w.r.t. public transactions.
The Validator will be run as a separate instance from the node, so it needs its own database, schema, etc.
Relates to #1316.
Changes
ValidatorConfig struct.