Skip to content

Conversation

@rozbb
Copy link
Contributor

@rozbb rozbb commented Jun 12, 2025

This one looks like a lot but it's mostly just moving stuff around. I recommend reading commit-by-commit.

Some open problems:

  • I had to remove config_roots from the sequencer's metrics, since the generic sequencer knows nothing about roots. Not sure what the best way to fix that is.
  • We are now exporting the ObjectBackend async trait. Apparently exporting async traits is a warning:
warning: use of `async fn` in public traits is discouraged as auto trait bounds cannot be specified
   --> crates/generic_log_worker/src/lib.rs:379:5
    |
379 |     async fn upload(&self, key: &str, data: &[u8], opts: &UploadOptions) -> Result<()>;
    |     ^^^^^
    |
    = note: you can suppress this lint if you plan to use the trait only in your own code, or do not care about auto traits like `Send` on the `Future`
    = note: `#[warn(async_fn_in_trait)]` on by default
help: you can alternatively desugar to a normal `fn` that returns `impl Future` and add any desired bounds such as `Send`, but these cannot be relaxed without a breaking API change
    |
379 -     async fn upload(&self, key: &str, data: &[u8], opts: &UploadOptions) -> Result<()>;
379 +     fn upload(&self, key: &str, data: &[u8], opts: &UploadOptions) -> impl std::future::Future<Output = Result<()>> + Send;
    |

Also, I'm not sure if I made the right cut on the API. I think the contents of ct_worker/src/frontend_worker.rs could be more succinct. Curious to hear what you think!

Finally, the readme for the generic log sequencer is mostly a copy of the CT one. I could use help adding to it if you think it should have more.

Copy link
Contributor

@lukevalenta lukevalenta left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good! Leaving comments so far and I'll finish the review tomorrow.

// The number of entries in the short-term deduplication cache.
// This cache provides a secondary deduplication layer to bridge the gap in KV's eventual consistency.
// It should hold at least <maximum-entries-per-second> x <kv-eventual-consistency-time (60s)> entries.
const MEMORY_CACHE_SIZE: usize = 300_000;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: probably want to add this to the app/log config.

@rozbb
Copy link
Contributor Author

rozbb commented Jun 13, 2025

I pushed a commit that makes some of the changes you suggested. Let me know if I interpreted correctly. If so, I gotta do some cleanup.

It have two TODOs in there though that I could use feedback on:

  1. If GenericSequencer::new() errors, it's because it couldn't fetch the name from State. Does this ever happen?
  2. How precisely does naming work? Logs were previously lazily loaded on first fetch by name (in the request body). Is that request name necessarily always the durable object name? It seems like it's retrieved from the URL of end-user GET queries

* Move app config package back to ct_worker. The justification here is
  that each application is going to need to manage its own schema, and
  the AppConfig struct is directly used for parsing that config.
* Add SequencerConfig (renamed from LogConfig) and BatcherConfig with
  options that are relevant for generic sequencers and batchers.
* Rename Metrics -> SequencerMetrics
* Construct the metrics registry in the app-specific code and pass it in
  to the Sequencer constructor. This allows apps to registry their own
  custom metrics, if desired.
* Use [event(start)] for initializing logging and panic handling. This
  is the first thing that runs when the wasm is loaded.
* Make the `new()` methods for GenericSequencer and GenericBatcher
  infallible, and handle the errors (by panicing, which is all we can
  really do) in the calling app.
@lukevalenta
Copy link
Contributor

  1. If GenericSequencer::new() errors, it's because it couldn't fetch the name from State. Does this ever happen?

In my edits I made GenericSequencer::new() infallible, and just panic in StaticCTSequencer::new() if we don't know the name. The only way the 'name' property wouldn't be present is it someone updates the frontend worker to try to create a Sequencer with a random ID instead of a log name, which would be a programmer error.

  1. How precisely does naming work? Logs were previously lazily loaded on first fetch by name (in the request body). Is that request name necessarily always the durable object name? It seems like it's retrieved from the URL of end-user GET queries

The basic idea is that we have an allowlist of log names in the configuration, and every entrypoint must check valid_log_name to make sure that name is in the allowlist. If not, we should return a 404. Otherwise, the log name is always the name of the Durable object (except for Batchers, which have the _<shard_id> suffix that needs to be stripped off to get the log name).

@rozbb
Copy link
Contributor Author

rozbb commented Jun 14, 2025

Ah ok then. This makes a lot of sense. Agreed on your edits!

@lukevalenta lukevalenta merged commit 07e3c5c into cloudflare:main Jun 16, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants