
docs(checks): close resolved TBDs in data-storage, events, README #481

Open
SgtPooki wants to merge 5 commits into main from docs/close-resolved-tbds

Conversation

@SgtPooki
Collaborator

@SgtPooki SgtPooki commented Apr 27, 2026

Summary

Closes out TBD items in docs/checks/ that have already been implemented in code. Each closure was verified against the current source.

Verified done (this PR)

  • data-storage.md assertions #3, #5, #6, #7 → Yes. Code refs:
    • pieceConfirmedOnChainMs histogram, DealStatus.PIECE_CONFIRMED (deal.service.ts:386).
    • ipniVerifyMs + IpniStatus.VERIFIED (ipni.strategy.ts, ipni-verification.service.ts).
    • Inline retrieval at deal.service.ts:454 (testAllRetrievalMethods); deal throws on failure.
    • Deal only reaches DEAL_CREATED after all sub-checks pass; dataStorageCheckMs emitted.
  • data-storage.md Synapse callback names corrected to plural (onPiecesAdded, onPiecesConfirmed) to match deal.service.ts:345,370.
  • data-storage.md TBD_VARIABLE poll-interval refs replaced:
    • Section 5: SP piece-status poll → hardcoded POLLING_INTERVAL_MS = 2500 in ipni.strategy.ts:46.
    • Section 6: IPNI verification poll → IPNI_VERIFICATION_POLLING_MS (default 2000, was incorrectly documented as 5s).
  • data-storage.md Section 7 header drops the — **TBD** suffix; intro disclaimer removed.
  • data-storage.md ## TBD Summary rewritten as ## Implementation History with code references for: inline retrieval verification, CID-based content verification (per-block sha256 via createBlock in ipfs-block.strategy.ts:230), per-deal max time limit (AbortController in jobs.service.ts:443 → DealStatus.FAILED at deal.service.ts:515), gated status, status model, onPieceConfirmed, IPFS gateway retrieval, filecoin-pin CAR conversion.
  • events-and-metrics.md Event List reframed: entries are timing markers used to define metric Timer Starts/Ends, not necessarily emitted Prometheus events. Added a note clarifying this so the "Implemented" column reflects whether the marker is tracked in code, not whether dealbot emits a same-named event.
    • uploadToSpStart → Yes (anchor: deal.uploadStartTime at deal.service.ts:255).
    • ipniVerificationStart → Yes (anchor: ipniVerificationStartTime at ipni-verification.service.ts:63, drives ipniVerifyMs).
    • ipfsRetrievalStart → Yes (anchor: retrieval startTime at retrieval-addons.service.ts:227; logs retrieval_started).
    • ipfsRetrievalFirstByteReceived / LastByteReceived → Yes (drive the matching histograms).
    • ipfsRetrievalIntegrityChecked → Yes (per-block sha256 in ipfs-block.strategy.ts; inline, no discrete event).
    • pieceConfirmed → Yes (onPiecesConfirmed → pieceConfirmedOnChainMs).
    • dealCreated clarified — DEAL_CREATED only after all gates; upload alone sets UPLOADED.
    • Histogram-buckets TBD replaced with link to metrics-prometheus.module.ts.
    • Mermaid timeline updated to match the table; the unreadable opaque rgb(50, 50, 50) rect fill swapped for translucent rgba(120, 120, 200, 0.15).
  • README.md dataset-creation handler named (canonical pg-boss job type data_set_creation) with config envs.
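The per-deal max time limit called out above (AbortController → `DealStatus.FAILED`) can be sketched as a generic timeout wrapper. This is an illustrative sketch only — `runWithDealTimeout` and its parameters are hypothetical names, not dealbot's actual `jobs.service.ts` code:

```typescript
// Hypothetical sketch: run a deal job under a per-job AbortController that
// fires once the configured budget elapses. On abort, the caller would mark
// the deal FAILED, mirroring the behavior described above.
async function runWithDealTimeout<T>(
  work: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(
    () => controller.abort(new Error("deal job timed out")),
    timeoutMs,
  );
  try {
    return await work(controller.signal);
  } finally {
    clearTimeout(timer); // don't leak the timer when work finishes first
  }
}
```

Passing the signal down into the upload/retrieval steps is what lets a single timeout cancel the whole deal flow rather than one sub-check.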

Still TBD (not addressed here)

  • jobs.md:64: the PR #263 lookahead-skip for upcoming maintenance windows is not implemented. getMaintenanceWindowStatus only checks whether a window is currently active. Either implement it or rewrite the doc.
  • production-configuration-and-approval-methodology.md:43: PDP_SUBGRAPH_ENDPOINT production value — needs deployment-config access (not in this repo).

Tracking: #280

Test plan

  • Skim rendered diff on each of the three docs
  • Confirm IPNI_VERIFICATION_POLLING_MS default reads as 2000 in app.config.ts:137
  • Confirm Section 5 SP poll interval still matches POLLING_INTERVAL_MS constant in ipni.strategy.ts
  • Confirm Mermaid diagrams render (no opaque blocks hiding labels)
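The env-with-default pattern the test plan verifies (`IPNI_VERIFICATION_POLLING_MS`, default 2000) can be sketched as follows. This helper is hypothetical — the real default lives in `app.config.ts` and may be wired through NestJS config instead:

```typescript
// Hypothetical sketch: read IPNI_VERIFICATION_POLLING_MS with a 2000 ms
// fallback, matching the documented default. Not dealbot's actual config code.
function ipniPollingIntervalMs(env: Record<string, string | undefined>): number {
  const raw = env["IPNI_VERIFICATION_POLLING_MS"];
  const parsed = raw === undefined ? NaN : Number(raw);
  // Fall back to the documented 2000 ms default on missing or invalid values.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : 2000;
}
```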

Items previously marked TBD that are now implemented in code:

- data-storage.md assertions #3, #5, #6, #7 (pieceConfirmed, IPNI
  discoverability, retrievability, all-checks-gated) -> Yes.
- data-storage.md poll intervals: replace TBD_VARIABLE refs with the
  concrete sources (hardcoded POLLING_INTERVAL_MS = 2.5s for SP piece
  status, IPNI_VERIFICATION_POLLING_MS env var with 2s default for IPNI
  verification; doc previously claimed 5s).
- data-storage.md section 7 header drops TBD; intro disclaimer removed.
- data-storage.md "TBD Summary" rewritten as "Implementation History"
  with code references for inline retrieval, CID integrity, per-deal
  timeout (AbortController -> DealStatus.FAILED), gated status, status
  model, onPieceConfirmed, IPFS gateway retrieval, filecoin-pin CAR.
- events-and-metrics.md: pieceConfirmed -> Yes (pieceConfirmedOnChainMs
  histogram); ipfsRetrievalIntegrityChecked -> implemented inline via
  per-block sha256 verification in ipfs-block.strategy.ts (no discrete
  event); ipfsRetrievalFirstByte/LastByteReceived marked Partial since
  duration histograms exist but no discrete event; histogram-buckets
  TBD replaced with link to metrics-prometheus.module.ts.
- README.md: name the dataset-creation job (data-set-creation) and
  reference its config envs.
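The per-block sha256 verification mentioned above can be sketched as below. This is illustrative only: real CIDs commit to multihash-encoded digests, not bare hex strings, and `verifyBlock` is a hypothetical name, not the code in `ipfs-block.strategy.ts`:

```typescript
import { createHash } from "node:crypto";

// Hypothetical sketch of inline block integrity checking: hash each retrieved
// block and compare against the digest its CID commits to.
function verifyBlock(bytes: Uint8Array, expectedSha256Hex: string): boolean {
  const actual = createHash("sha256").update(bytes).digest("hex");
  return actual === expectedSha256Hex; // mismatch ⇒ corrupted or forged block
}
```

Because the check runs inline per block during retrieval, a failure surfaces as a retrieval error rather than as a discrete `ipfsRetrievalIntegrityChecked` event — which is exactly the distinction the doc change records.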

Still TBD (not changed in this commit): uploadToSpStart,
ipniVerificationStart, ipfsRetrievalStart events; jobs.md PR #263
lookahead-skip; PDP_SUBGRAPH_ENDPOINT production value.
Copilot AI review requested due to automatic review settings April 27, 2026 19:43
@FilOzzy FilOzzy added this to FOC Apr 27, 2026
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC Apr 27, 2026
Contributor

Copilot AI left a comment


Pull request overview

Updates docs/checks/ to close out previously-marked TBDs by reflecting what is already implemented in the backend (events, metrics, polling intervals, and dataset-creation job naming/config).

Changes:

  • Mark several Data Storage assertions as implemented; replace placeholder poll-interval TBDs with concrete config/constants.
  • Update Events & Metrics documentation to reflect implemented on-chain confirmation + retrieval integrity behavior and point histogram buckets to the Prometheus metrics module.
  • Update checks README to name the dataset creation job and list its relevant configuration env vars.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

File Description
docs/checks/events-and-metrics.md Updates event/metric implementation status and references (piece confirmation, retrieval integrity, histogram bucket source).
docs/checks/data-storage.md Marks previously-TBD assertions as implemented; documents actual polling intervals and adds “Implementation History”.
docs/checks/README.md Documents the dataset-creation job and its configuration knobs.


Comment thread docs/checks/data-storage.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/README.md Outdated
- data-storage.md: rename Synapse callbacks to plural form
  (onPiecesAdded, onPiecesConfirmed) to match deal.service.ts.
- events-and-metrics.md: same rename in the event list. Clarify that
  dealCreated maps to DealStatus.DEAL_CREATED only after all gates pass
  (upload alone sets UPLOADED, not DEAL_CREATED).
- events-and-metrics.md: ipfsRetrievalIntegrityChecked downgraded from
  Yes to Partial since no discrete event is emitted (inline check
  only).
- events-and-metrics.md: Mermaid timeline now matches the table -
  ipfsRetrievalFirstByteReceived/LastByteReceived labelled as
  "Partial: histogram only", ipfsRetrievalIntegrityChecked labelled
  "Partial: inline check, no event".
- README.md: refer to the canonical pg-boss job type
  data_set_creation (underscore) so operators can map the doc to
  jobType values.
@SgtPooki SgtPooki self-assigned this Apr 27, 2026
@SgtPooki SgtPooki moved this from 📌 Triage to ⌨️ In Progress in FOC Apr 27, 2026
@SgtPooki SgtPooki moved this from ⌨️ In Progress to 🔎 Awaiting review in FOC Apr 27, 2026
@SgtPooki SgtPooki requested a review from silent-cipher April 27, 2026 19:58
The 'Data Storage Only' rect used rgb(50, 50, 50), which renders as a
near-black block that hides the message labels and arrows inside it
on both GitHub light and dark themes. Switch to a translucent
rgba(120, 120, 200, 0.15) so the highlight is visible without
obscuring content.
The 'events' in this doc are named anchors used to define metric Timer
Starts/Ends; dealbot does not necessarily emit each as a discrete
Prometheus event or log line. Add an explicit note up top so readers
don't expect every entry to map to an emitted event, and update rows
that were marked TBD/Partial purely because no discrete event is
emitted.

- uploadToSpStart -> Yes (anchor: deal.uploadStartTime in
  deal.service.ts:255).
- ipniVerificationStart -> Yes (anchor: ipniVerificationStartTime in
  ipni-verification.service.ts:63 - drives ipniVerifyMs).
- ipfsRetrievalStart -> Yes (anchor: retrieval startTime in
  retrieval-addons.service.ts:227; logs 'retrieval_started').
- ipfsRetrievalFirstByteReceived -> Yes (drives
  ipfsRetrievalFirstByteMs).
- ipfsRetrievalLastByteReceived -> Yes (drives
  ipfsRetrievalLastByteMs).
- ipfsRetrievalIntegrityChecked -> Yes (per-block sha256 in
  ipfs-block.strategy.ts; inline, no discrete event).
- Mermaid timeline: drop the (TBD) / (Partial: ...) annotations on
  these markers so the diagram and the table agree.
All rows are now Yes (each marker is anchored in code), so the column
adds no signal. Anchor details folded into the Source-of-truth column.
Intro note tightened.
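The anchor pattern these commits document — record a start timestamp, observe elapsed milliseconds at the end marker — can be sketched as follows. Names here are hypothetical; in dealbot the start anchor is a field like `deal.uploadStartTime` and the sink is a prom-client histogram such as `ipniVerifyMs`:

```typescript
// Hypothetical sketch of a named timing anchor: startedAt plays the role of
// fields like deal.uploadStartTime, and endMs() yields the value a duration
// histogram would observe at the end marker.
function makeTimerAnchor(now: () => number = Date.now) {
  const startedAt = now();
  return {
    endMs(): number {
      return now() - startedAt;
    },
  };
}
```

This is why a marker can be "Yes" without a same-named Prometheus event existing: the anchor only has to be read when the paired histogram is observed.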
| Per-deal max time limit | Done — `DEAL_JOB_TIMEOUT_SECONDS` triggers an `AbortController` in `jobs.service.ts`; on abort the deal is set to `DealStatus.FAILED` with failure-status metrics emitted. |
| Deal gated on all checks | Done — deal only reaches `DealStatus.DEAL_CREATED` after upload, onchain, IPNI, and retrieval all succeed. |
| Status model update | Done — `DealStatus` includes `PIECE_CONFIRMED`, `DEAL_CREATED`, `FAILED`; `IpniStatus` includes `SP_INDEXED`, `SP_ADVERTISED`, `VERIFIED`, `FAILED`; `RetrievalStatus` enum exists. |
| `onPieceConfirmed` callback tracking | Done — `piecesConfirmedTime` recorded, `pieceConfirmedOnChainMs` histogram emitted, `DealStatus.PIECE_CONFIRMED` state exists. |
Collaborator


Suggested change
| `onPieceConfirmed` callback tracking | Done — `piecesConfirmedTime` recorded, `pieceConfirmedOnChainMs` histogram emitted, `DealStatus.PIECE_CONFIRMED` state exists. |
| `onPiecesConfirmed` progress event tracking | Done — `piecesConfirmedTime` recorded, `pieceConfirmedOnChainMs` histogram emitted, `DealStatus.PIECE_CONFIRMED` state exists. |

Comment thread docs/checks/README.md
## Datasets for Checks

Dealbot manages a set of datasets to use for its checks. Creating these datasets is a precondition before the "data storage" check can run. This is handled by **TBD** job ([tracking issue](https://github.com/FilOzone/dealbot/issues/284)) when it discovers a new SP to measure from the registry.
Dealbot manages a set of datasets to use for its checks. Creating these datasets is a precondition before the "data storage" check can run. This is handled by the canonical `data_set_creation` job type (see [`jobs.md`](../jobs.md), [tracking issue](https://github.com/FilOzone/dealbot/issues/284)) when it discovers a new SP to measure from the registry. Configured by `DATASET_CREATIONS_PER_SP_PER_HOUR` and `DATA_SET_CREATION_JOB_TIMEOUT_SECONDS`.
Collaborator


Suggested change
Dealbot manages a set of datasets to use for its checks. Creating these datasets is a precondition before the "data storage" check can run. This is handled by the canonical `data_set_creation` job type (see [`jobs.md`](../jobs.md), [tracking issue](https://github.com/FilOzone/dealbot/issues/284)) when it discovers a new SP to measure from the registry. Configured by `DATASET_CREATIONS_PER_SP_PER_HOUR` and `DATA_SET_CREATION_JOB_TIMEOUT_SECONDS`.
Dealbot manages a set of datasets to use for its checks. Creating these datasets is a precondition before the "data storage" check can run. This is handled by the canonical `data_set_creation` job type (see [`jobs.md`](../jobs.md)) when it discovers a new SP to measure from the registry. Configured by `DATASET_CREATIONS_PER_SP_PER_HOUR` and `DATA_SET_CREATION_JOB_TIMEOUT_SECONDS`.
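The `DATASET_CREATIONS_PER_SP_PER_HOUR` knob implies a per-SP sliding-window throttle, which could look like the sketch below. This is an assumption about the semantics, not dealbot's actual scheduler code, and `canCreateDataset` is a hypothetical name:

```typescript
// Hypothetical sketch: allow another dataset creation for an SP only if
// fewer than perHourLimit creations happened in the trailing hour.
function canCreateDataset(
  priorCreationTimesMs: number[], // timestamps of earlier creations, this SP
  perHourLimit: number,
  nowMs: number,
): boolean {
  const windowStart = nowMs - 3_600_000; // one hour, in milliseconds
  const inWindow = priorCreationTimesMs.filter((t) => t > windowStart).length;
  return inWindow < perHourLimit;
}
```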

@BigLep BigLep moved this from 🔎 Awaiting review to ✔️ Approved by reviewer in FOC Apr 29, 2026