Replies: 1 comment 3 replies
-
|
@sophie-cluml Could you review this plan when you have time? I would especially appreciate your feedback on the proposed issue/PR structure, the stored-data compatibility and data migration policy, and whether the expectations for Octoaide-generated PRs are clear enough before we start writing the actual work issues. |
Beta Was this translation helpful? Give feedback.
3 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Background
As part of discussion-repo issue#344, I would like to share a plan for reducing the remaining
chronodependency and exposure inreview-database, and to discuss the time-type contract that this repository should provide to downstream consumers.Based on the latest discussion-repo issue#344 summary, manager and unsupervised-engine have already moved much of their internal code to Jiff, but chrono still remains at the boundaries that touch
review-database.Therefore, I think the starting point for this work should be the contract defined by
review-database. However, changing thereview-databasecontract can directly affect downstream compile/API compatibility or serialized payload compatibility, so the goal is to make that impact visible early through fixtures and compile checks.Before opening implementation PRs, I reviewed previous
chrono->Jiffwork in other repositories and relevant precedents insidereview-database. My conclusion is that this should not be treated as a simple replacement ofchronotypes withJifftypes. A safer approach is to first define and verify thereview-databaseboundaries for public API, serialized payloads, and RocksDB stored data, and then let downstream consumers follow that contract.Proposed issue and PR structure
Implementation PRs may be generated by Octoaide. So I think the important thing for this discussion is not to prescribe exact patches, but to agree on the contract and review gates that our human-written issues should require.
Umbrella issue: review-database chrono to Jiff tracking
Purpose:
review-databaseside of discussion-repo issue#344.Acceptance criteria:
Issue 1: Compatibility baseline
Purpose:
ts_nanosecondsstored fields, default bincode table values, stringtimestamp values, and
column_stats.Acceptance criteria:
column_statskey/value timestamp behavior is covered.i64nanos boundary/overflow behavior.Issue 2+: Jiff implementation with stored-contract preservation
Purpose:
column_stats, or stored table surface.issues that close over one struct or a closely related struct group when that
makes migration, production code, fixtures, and review more coherent.
Acceptance criteria:
review-database::EventMessage.timeis Jiff-based, while event key timestamp bytes continue to use the existingi64epoch nanoseconds contract.chrono::serde::ts_nanosecondsstored fields use a custom Jiff adapter that reads and writes the existingi64nanos bytes.TorExitNode.updated_atandTrustedUserAgent.updated_atare handled as explicit data migration surfaces.Issue 3: Chrono dependency cleanup
Purpose:
Acceptance criteria:
PR review expectations
For Octoaide-generated PRs, I would review against at least these expectations.
column_statskey must preserve the existingi64epoch nanoseconds contract, including the 1677-09-21 to 2262-04-11 range behavior.i128key must not be confused with storing timestamps as Jiffi128nanoseconds.chrono::serde::ts_nanosecondsreplacements must use a customi64 nanosJiff adapter, not Jiff default serde.DateTime<Utc>::to_string()timestamp values must be handled through data migration, not silently switched to another format.i64nanos where relevant.old DB migration, successful no-op behavior when re-run, and resulting state
consistency including version files. The
review-migratebinary should beused where practical.
Supporting analysis
Surface inventory summary
The currently identified chrono/Jiff migration surfaces can be grouped into the following categories. The baseline issue/PR does not need to change all of them at once. Instead, I propose using this inventory as the review frame: surfaces actually changed by a PR should be covered by fixtures or data migration tests, and surfaces not changed by that PR should be explicitly left out of scope.
One important classification detail is that a stored table surface and a public API surface can share the same Rust struct. When creating actual issues, each chrono-bearing struct should be checked for both roles: whether it defines stored table value/key bytes, and whether the same struct or a related DTO is publicly exported or returned by public
Store/Tablemethods.review-database::EventMessage.timefieldsstored bytes*FieldsStoredandEventDb/ iterator returned event time semanticsEventDb::remove_beforecolumn_statsand related query surfacescolumn_statspublic methods that accept/returnchrono::NaiveDateTimei64nanoseconds key layoutStatistics.batch_tsload_rounds_by_clusterseek/pagination behaviorstructured::Element::DateTimevalues inside column statisticsAccount,Customer,Network,Node::InnerSamplingPolicy,TrafficFilter,TriagePolicytables::Cluster,TriageResponse,Filter.periodModelIndicator,TorExitNode,TrustedUserAgentreview-databaseAccounttypes::Cluster,tables::Cluster, privateClusterDbSchemaTriagePolicyInputand other public DTO/input structs, if anyTriageResponsepublic methods and timestamp-bearing key contractbackup::BackupInfo.timestamptop_n/time_series.rsreview-webGraphQL/API wrappers aroundStore, events,column_stats, and time filtersBackupConfig.backup_timeAccountPolicy.expiry_period_in_secsAccountPolicy.lockout_duration_in_secsBackupConfig.backup_durationevent_retention_period_daysRetentionConfig.period_in_daysandpurge_old_column_statsHost.creation_timeand classifier temp-path timestampsDefaultOptions, event fields use plainbincode::serializeInventory validation result
I also performed one mechanical validation pass before implementation.
Method:
chronodependency fromreview-database.cargo check --message-format=json.Cargo.toml, and only carried the classification results into the inventory and PR checklist.The check produced 68 direct chrono errors and did not reveal any new top-level surface category. It did, however, clarify several final-cleanup call sites such as the lockout calculation in
tables/accounts.rs, the default range calculation intables/time_series.rs, the purge cutoff calculation inlib.rs, and classifier temp-path timestamps inclassifier_fs.rs.What this check means:
chrono::serdehelpers,Utc::now(), and chrono duration helpers.Why not remove chrono immediately?
In previous PRs from other repositories, reviewers repeatedly asked for the same safeguards:
In particular, one semi-supervised-engine migration review pointed out that
EventMessageencoded with Jiff by semi-supervised-engine had to be decoded/deserialized correctly by chrono-basedreview-database. I think the same review perspective applies to thisreview-databasework.Stored data migration scope: decision
Data migration should not be treated as globally required. I think the rule for this work should be:
i64epoch nanos ori64epoch seconds, and the Jiff change can keep the same primitive bytes, do not perform data migration.However,
i64epoch nanoseconds have a representable range of roughly 1677-09-21 through 2262-04-11. Current chronotimestamp_nanos_opt()also returnsNoneoutside this range, so the Jiff adapter should explicitly preserve the same range constraint.DateTime<Utc>/NaiveDateTimeserde, then changing to the matching Jiff default serde type may proceed without data migration only for surfaces where the PR proves the same encoder settings and byte contract. A direct reproduction showed identical bincode bytes and successful old-byte decode for the checked primitive chrono/Jiff type pairs, but each touched stored surface should still be covered by tests or fixtures.to_string(), then changing them to a Jiff-native format or primitive format should be treated as data migration.Timestampserde matches default chronoDateTime<Utc>bincode bytes in the checked case, but it does not match thei64bytes fromchrono::serde::ts_nanoseconds. Jiff nanosecond serde helpers are alsoi128-based, so they are not the existingi64contract. Therefore, when a stored field currently usingchrono::serde::ts_nanosecondsis changed to a Jiff type, it needs a custom serde/adapter that reads and writes the existingi64bytes. Jiff conversion APIs such asas_nanosecond()/from_nanosecond()should only be used inside that adapter implementation.Direct reproduction results:
DateTime<Utc>bincode bytes using default serde were identical to Jiff defaultTimestampbincode bytes in the checked reproduction, and decoded as JiffTimestamp.NaiveDateTimebincode bytes using default serde were identical to Jiffcivil::DateTimebincode bytes in the checked reproduction, and decoded as Jiffcivil::DateTime.ts_nanosecondsbytes did not decode as Jiff defaultTimestamp.DateTime<Utc>::to_string()values did not parse as JiffTimestamp.Data migration not required if we preserve stored bytes
The following should not require data migration if designed to preserve the existing stored contract:
Account,Customer,Network,Node::Inner,SamplingPolicy,TrafficFilter,TriagePolicy,tables::Cluster,TriageResponsestored value,Filter.periodEven if
EventMessage.timeis changed to Jiff, the existing key byte layout can be preserved by converting it to the samei64epoch nanos before key construction.The event key itself is a 16-byte big-endian
i128key composed from timestamp nanosi64, event kind, and random/counter bits. This does not mean the timestamp is stored as Jiffi128nanos.Therefore, we should not introduce Jiff's wider nanosecond range into the event key. Instead, the existing
i64nanos range and overflow behavior should be fixed by fixtures.column_statskey/value:The public API currently uses
NaiveDateTime, but the stored key is alreadyi64nanos, and the value isstructured::Description/NLargestCount.Stored fields that are already
i64nanos, such asstart_time/end_time, should stay as they are.chrono::serde::ts_nanoseconds:Even if these are changed to Jiff, the existing bincode bytes can be preserved by using a custom
i64 nanosserde adapter. In that case, data migration should not be performed.ModelIndicator.last_modification_time:This currently uses
chrono::serde::ts_seconds, so the stored contract isi64seconds. If the same seconds representation is kept after the Jiff change, data migration should not be performed.Data migration required
The following should be treated as requiring data migration code that reads old DB value bytes and writes them back using the new schema:
DateTime<Utc>::to_string():TorExitNode.updated_at,TrustedUserAgent.updated_at(values stored with
updated_at.to_string().into_bytes()and read by string parsing)Because these surfaces already require data migration, the new stored contract
should consider a primitive representation such as
i64epochseconds/nanoseconds instead of another library-dependent string or serde
representation.
Special case:
EventMessage.timereview-database::EventMessage.timeshould be changed to Jiff. However, the event table does not store the wholeEventMessageas a value.EventDb::putconvertsEventMessage.timeto the samei64epoch nanoseconds value as today and stores that value in the timestamp portion of the event RocksDB key.EventMessage.fieldsis converted separately into storage-specific*FieldsStoredbytes and stored as the value.Therefore, changing the Rust type of
EventMessage.timeis not old RocksDB value data migration. The required checks are:EventMessage.timeenters the key as the samei64epoch nanoseconds.review-protocol::EventMessage.timewith a Jiff timestamp is ingested with the same timestamp semantics.review-database::EventMessagebincode payload, that should be handled as a payload compatibility issue, not DB data migration.Data migration policy
References
column_statsnoteBeta Was this translation helpful? Give feedback.
All reactions