Skip to content

Long-form teaser-thread strategy + atmosphere_long_form_composition filter#34

Open
kraftbj wants to merge 14 commits intotrunkfrom
add/long-form-teaser-thread
Open

Long-form teaser-thread strategy + atmosphere_long_form_composition filter#34
kraftbj wants to merge 14 commits intotrunkfrom
add/long-form-teaser-thread

Conversation

@kraftbj
Copy link
Copy Markdown
Contributor

@kraftbj kraftbj commented Apr 24, 2026

Fixes DOTCOM-16886
See DOTCOM-16810 (parent epic)

Proposed changes:

Introduces long-form composition strategies for Bluesky posts, with the default (link-card) preserved so existing sites see no change.

  • New atmosphere_long_form_composition filter selects the long-form composition strategy. Accepts 'link-card' (default, unchanged behavior), 'truncate-link' (single post, body-as-text + permalink), or 'teaser-thread' (2-post thread: hook + CTA-with-link).
  • New atmosphere_teaser_thread_posts filter lets downstream override the default 2-post composition (e.g. expand to 3 posts, customize CTA copy).
  • New _atmosphere_bsky_thread_records post meta: ordered array of { uri, cid, tid } triples tracking every bsky post in a thread. Existing _atmosphere_bsky_uri / _atmosphere_bsky_tid / _atmosphere_bsky_cid continue to mirror the root post for backwards compatibility.
  • New _atmosphere_bsky_orphan_records post meta: populated only when a thread publish fails and the compensating-delete rollback also fails, so operators have a durable manifest of records alive on the PDS but no longer tracked locally.
  • New atmosphere_pre_apply_writes filter: short-circuits API::apply_writes when a non-null value is returned. Used by tests and external harnesses (FOSSE e2e) to mock/inspect the write batch.
  • Publisher::publish/update/delete reworked to handle multi-record threads via sequential writes with partial-meta writes and compensating-delete rollback on failure.
  • Atmosphere::on_before_delete now reads META_THREAD_RECORDS and schedules a delete covering every bsky tid, so force-deleting a thread-strategy post cleans up the whole thread instead of leaking the non-root replies.

Why sequential writes for thread publishes instead of one atomic applyWrites? Reply records need the parent's CID. CIDs are deterministic hashes of canonical DAG-CBOR — clients can compute them ahead of time (the TypeScript SDK does this to batch threads atomically) — but there is no DAG-CBOR encoder in our PHP dependencies yet, so we rely on the PDS to return the CID in its create response. We mitigate the partial-failure window with partial-meta writes after each successful create (so crash recovery can surface any orphans).

See FOSSE SDD for full design context, alternatives considered, and open questions.

Default behavior is unchanged. atmosphere_long_form_composition defaults to 'link-card'. Existing sites behave exactly as today unless they opt into a different strategy (or install a downstream like FOSSE that projects the value).

Publisher::update() semantics for thread-strategy posts:

  • Same record count → in-place update. When the new composition has the same number of records as the stored thread, every record (root + replies + doc) is rewritten via applyWrites#update in one atomic batch. TIDs and URIs are preserved, so external replies remain valid and Bluesky URLs stay stable. Each record gets a new CID; reply refs are built from the pre-update CIDs (self-consistent at write time, and clients treat any post-update CID mismatch as "parent was edited" rather than broken). `createdAt` is pinned to `post_date_gmt`.
  • Different record count or strategy change → delete + republish. When the shape changes (link-card ↔ teaser-thread, or thread length changes), every existing record is deleted and the post is republished fresh. In this path, external replies to the original thread are orphaned (their `reply.root` points at a deleted AT URI), and the replaced posts surface to followers with a fresh `createdAt`.

Other information:

  • Have you written new tests for your changes, if applicable?

Testing instructions:

  • `composer test` — unit tests cover composition branches, sequential-write happy path, rollback on mid-thread failure, rollback-of-rollback with orphan-manifest persistence, in-place thread update, strategy-change rewrite, delete covering the whole thread, and legacy single-record fallback.
  • Manual: install with `fosse-long-form-strategy` downstream (or call `add_filter( 'atmosphere_long_form_composition', fn() => 'teaser-thread' )`), publish a titled post of 500+ words, verify two `app.bsky.feed.post` records land with `reply` refs. Edit the post content and verify the two records are updated in place (same TIDs/URIs, new content).

Changelog entry

  • Automatically create a changelog entry from the details below.
Changelog Entry Details

Significance

  • Minor

Type

  • Added - for new features

Message

Long-form posts can now be published as a Bluesky thread. A new filter lets sites choose between a single link-card post (default, unchanged from today), a single post that combines body text with the permalink, or a 2-post teaser thread that leads with a hook and follows with a "continue reading" link. Post edits update every record in place when the shape of the post hasn't changed, so Bluesky URLs stay stable and people's replies don't get orphaned. If the shape of the post changes (for example, switching between a single post and a thread), the old records are replaced with fresh ones.

This branch introduces Atmosphere-side support for multi-post thread
composition for long-form posts. Tracks the FOSSE SDD at
https://github.com/Automattic/fosse/tree/trunk/sdd/long-form-bluesky-strategy.

Opening as draft to start the async review window. Implementation
lands in follow-up commits on this branch.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
kraftbj added 5 commits April 23, 2026 20:33
…osition methods

New on Atmosphere\Transformer\Post:

- `META_THREAD_RECORDS` constant for the per-WP-post ordered array
  of `{ uri, cid, tid }` triples that Publisher will populate once it
  knows about threads (next commit).
- `build_long_form_records()` as the long-form counterpart to
  `transform()`. Branches on `atmosphere_long_form_composition`
  (default `'link-card'`, i.e. today's behavior unchanged) to produce
  one record for `'link-card'` / `'truncate-link'`, or 2+ records for
  `'teaser-thread'`. `transform()` is unchanged — legacy callers keep
  today's contract.
- `build_teaser_thread()` (private) composes the default hook + CTA
  pair, filterable via `atmosphere_teaser_thread_posts`. Hook uses the
  post's excerpt when set, else the first ~280 chars of the body cut
  at a sentence boundary so the hook never ends mid-sentence right
  before the CTA.
- `build_truncate_link_text()` (private) for the single-post
  body-plus-permalink strategy.
- `truncate_to_budget()` (private) — budget-aware truncation that
  prefers a sentence break, falls back to word boundary, then hard-caps
  with an ellipsis. Uses mb_strlen, matching the existing
  `truncate_text()` helper.
- `is_short_form_post()` (public) wraps the existing
  `atmosphere_is_short_form_post` filter + private `is_short_form()`
  discriminator so Publisher can branch on short vs. long without
  duplicating the filter call inside its own code.
- Empty-body guard: `'teaser-thread'` and `'truncate-link'` silently
  degrade to `'link-card'` when the post has neither a 10+-char
  excerpt nor a 10+-char body. Logs a notice so ops can tell the
  fallback from an intentional link-card configuration.

New on Atmosphere\API:

- `atmosphere_pre_apply_writes` filter. Returning a non-null value
  short-circuits the PDS call, so PHPUnit and the FOSSE e2e harness
  can observe the write batch without a real DPoP round-trip
  (`pre_http_request` fires inside `wp_remote_request`, too late to
  mock past the DPoP-proof step in test environments without a real
  JWK).

Default behavior is unchanged: `atmosphere_long_form_composition`
defaults to `'link-card'`. Existing sites see exactly today's output
unless a downstream projects a different value.

Tests cover these additions in the next commit.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
Adds 16 unit tests under Atmosphere\Tests\Transformer\Test_Post:

- 6 tests for `truncate_to_budget()` via reflection — sentence-preferred
  cut, trailing close-punctuation, word-boundary fallback, word-only
  when `prefer_sentence` is false, and hard-cap for a single long
  token.
- 10 tests for `build_long_form_records()` — default (link-card)
  matches `transform()` output, `atmosphere_transform_bsky_post` fires
  per entry for threads, `truncate-link` branch, `teaser-thread`
  default 2-entry shape with sentence-cut hook and link-faceted CTA,
  word-boundary fallback when no sentence break exists, excerpt
  precedence, empty-body downgrade to link-card, 3-entry extension
  via `atmosphere_teaser_thread_posts`, `langs` consistency,
  per-record facet extraction, and unknown-strategy fallback.

Also adds a small observability hook on the downgrade path:
`atmosphere_long_form_strategy_downgraded` fires with the post,
requested strategy, and effective strategy, so ops and the
downgrade test can distinguish the fallback from an intentional
`'link-card'` configuration.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
…s and rollback

Reshapes Publisher around the idea that one WordPress post may map to
N Bluesky feed posts (a thread) plus exactly one site.standard.document.

**publish()**
- Branches on `Post::is_short_form_post()`. Short-form stays on today's
  single-record path via `transform()`.
- Long-form calls the new `build_long_form_records()`. 1-entry output
  (link-card / truncate-link) goes through `publish_single()`; N-entry
  output goes through `publish_thread()`.
- `publish_single()` preserves today's atomic applyWrites (bsky + doc)
  and additionally populates `META_THREAD_RECORDS` as a 1-entry array
  so the new key is always present after a successful publish.
- `publish_thread()` writes root + doc atomically, persists partial
  meta immediately, then writes each reply on its own `applyWrites`.
  Reply refs (`reply.root` / `reply.parent`) are filled from the
  already-persisted thread records since CIDs only arrive via the PDS
  response to the prior create — we can't assemble one atomic batch
  for the whole thread.

**Rollback**
On a mid-thread failure, `rollback_thread()` issues compensating
deletes tail-first + a doc delete, clears all meta, and returns the
original error. If rollback itself fails, the returned WP_Error wraps
both errors and carries `partial_records` so an operator can clean up
by hand.

**update()**
- Reads `META_THREAD_RECORDS` with legacy fallback to single-record meta.
- Single stored + single new composition → in-place applyWrites#update
  on bsky + doc, matching today's behavior.
- Any other shape (single ↔ thread, thread → thread with different
  lengths) → delete all existing records atomically, then republish.
  Documented side effect: followers see the thread updates as a fresh
  publish (new `createdAt`); external replies to the original thread
  are orphaned.

**delete()**
- Reads `META_THREAD_RECORDS` with legacy fallback; batches one
  applyWrites with N bsky deletes + 1 doc delete. Clears every meta
  key on success; leaves meta intact on failure so a retry can
  complete.

**Meta shape**
- New `META_THREAD_RECORDS` is the canonical list of ordered
  `{ uri, cid, tid }` triples.
- Legacy `META_URI` / `META_TID` / `META_CID` continue to mirror the
  root record for backwards compatibility and because
  `Document::transform()` still reads them to compose the bskyPostRef.
- `clear_all_record_meta()` helper keeps the six keys in sync on
  delete / rollback paths.

**Not touched**
- `delete_by_tids()` signature preserved for queued-cron backwards
  compatibility. `on_before_delete` in class-atmosphere.php still
  captures a single TID; force-deleting a thread-published post
  leaves thread tails on the PDS. Follow-up task will update the
  on_before_delete / cron hook to capture the full thread.
- Short-form behavior is unchanged — `transform()`'s `createdAt`
  (from `post_date_gmt`) still flows through `publish_single()`.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
Adds 10 unit tests covering the new long-form / thread flows, using the
`atmosphere_pre_apply_writes` filter as the test seam (DPoP-dependent
`pre_http_request` mocking can't reach the wp_remote_request layer in
the test environment's auth setup).

Publish path:
- link-card writes one atomic applyWrites with 2 creates and mirrors
  the root into `META_THREAD_RECORDS` as a 1-entry array.
- teaser-thread writes root + doc atomically, then the CTA reply as a
  second applyWrites with `reply.root` / `reply.parent` pointing at the
  root's {uri, cid}.
- partial meta is persisted between the root write and the reply
  write — asserted via a meta snapshot taken inside the capture
  filter when the reply call fires.
- happy-path final meta is an ordered 2-entry `META_THREAD_RECORDS`
  array, with legacy single-value keys mirroring the root.

Rollback path:
- second-write failure triggers compensating deletes (root bsky +
  doc), clears every meta key, and returns the original WP_Error.
- when the rollback applyWrites itself also fails, the returned
  `WP_Error` has code `atmosphere_thread_rollback_failed` and data
  containing `original_error`, `rollback_error`, and
  `partial_records`.

Update / delete paths:
- single stored + single new composition uses in-place
  `applyWrites#update` on both bsky and doc (one applyWrites call),
  matching today's link-card update behavior.
- strategy change (stored single → composed thread) issues a delete
  of the old record + doc, then publishes fresh as a thread.
- thread delete batches every bsky delete + the doc delete into one
  atomic applyWrites and clears all meta.
- legacy single-record meta (no `META_THREAD_RECORDS`) still deletes
  correctly via the fallback in `stored_thread_records()`.

Linear: https://linear.app/a8c/issue/DOTCOM-16810
@pfefferle
Copy link
Copy Markdown
Member

Early review — read through the branch, a few things worth flagging before this gets much further. Ordered by impact.

1. Default behavior is changed: createdAt shifts from post date to publish-run time

The description says "Default behavior is unchanged — atmosphere_long_form_composition defaults to 'link-card', existing sites behave exactly as today." That isn't quite right.

  • Today: Post::transform() sets createdAt from $post->post_date_gmt (line 99).
  • This PR: for long-form (including the default link-card), the record goes through build_long_form_records()record_for_link_card(), which omits createdAt. publish_single then stamps \wp_date( 'c' ) — the current time.

Consequences:

  • A normal long-form publish shows up on bsky.app with a timestamp different from the WP publish date.
  • Backfill is much worse: every re-synced post gets stamped with the backfill-run timestamp, torpedoing the chronological ordering of the account's timeline.

Fix: propagate $post->post_date_gmt through build_long_form_records / record_for_link_card so the default strategy produces byte-identical output to transform(). Thread replies that are genuinely fresh posts can still use wp_date('c').

2. Thread-aware force-delete is missing on the WP side

The new comment in delete_by_tids says "force-deletion of thread-strategy posts is handled by the caller by reading META_THREAD_RECORDS pre-deletion and scheduling a separate cron per thread record." But the only caller — Atmosphere::on_before_delete — hasn't been updated:

$bsky_tid = \get_post_meta( $post_id, Transformer\Post::META_TID, true );
$doc_tid  = \get_post_meta( $post_id, Transformer\Document::META_TID, true );
\wp_schedule_single_event( \time(), 'atmosphere_delete_records', array( $bsky_tid, $doc_tid ) );

It only captures the root + doc. Force-deleting a thread-strategy post (empty-trash, wp post delete --force, before_delete_post from another plugin) leaks every non-root thread reply to Bluesky. This is a real data-consistency bug, not just a doc caveat.

3. Rollback's partial_records payload is effectively swallowed

rollback_thread returns a WP_Error with partial_records in the error data when rollback itself fails. But the cron closures that call Publisher::publish only read get_error_code() + get_error_message(). Nobody inspects $error->get_error_data()['partial_records'], so the orphan-record manifest is lost the moment the cron event returns. Either log the partial records alongside the error message, or persist them in post meta (e.g. _atmosphere_bsky_orphan_records) so an operator or recovery worker can find them later.

4. Thread update is delete-and-republish; the alternative isn't addressed

The PR calls out the follower-feed churn and orphaned-external-replies consequences and routes affected callers to "pin to a single-post strategy". But applyWrites#update on each existing thread record would do an in-place rewrite — same TIDs, same URIs, replies remain valid, createdAt preserved. The only cases where delete-and-republish is strictly required are record-count changes (thread length shrinks / grows) or strategy changes (thread ↔ single). For equal-length thread → thread, an N-record applyWrites#update batch would work.

Not a blocker, but worth a line in the description: "We considered in-place thread updates and rejected them because X", or the code could prefer them when the shape matches.

5. Minor

  • $bsky_transformer->get_rkey(); called as a side-effect-only statement at the top of publish() then again inside publish_single/publish_thread. Pick one.
  • store_results() and mirror_thread_records_meta() both write Post::META_URI / META_TID / META_CID for the root. Two code paths → easy to drift. Mirror-at-the-end only, or drop the mirroring from store_results.
  • record_for_thread_entry() fires atmosphere_transform_bsky_post once per thread entry. The existing filter contract was one-call-per-post. Listeners that accumulate state (rate-limit counters, lint checks) will double-count. Worth a heads-up note in the filter docblock, or a new filter name for thread entries.
  • The description says reply CIDs can only come from the PDS response, "a protocol constraint". It's actually a PHP-library constraint — CIDs are deterministic hashes of canonical DAG-CBOR and can be computed client-side (the TypeScript SDK does this to batch threads atomically). Just no DAG-CBOR encoder in the PHP codebase yet. Worth a one-line correction so the sequential-write choice reads as a pragmatic trade-off rather than a protocol requirement.

Things that look good

  • Keeping transform() byte-compatible for the short-form path and routing everything new through build_long_form_records() is the right shape.
  • The partial-meta-per-step pattern in publish_thread (so crash recovery has a pointer to the root before step 2 starts) is the right discipline.
  • The empty-body guard + atmosphere_long_form_strategy_downgraded observability hook is a nice touch.

kraftbj and others added 8 commits April 24, 2026 11:30
…-place thread update

Five changes informed by the early review feedback on #34:

1. Preserve the post's publish date as `createdAt` on every record
   representing the post itself (short-form transform(), link-card,
   truncate-link, and thread root). Previously the long-form paths
   fell through to `wp_date('c')`, which stamped the backfill-run
   time on re-synced posts and collapsed timeline ordering.

2. Thread-aware force-delete. `Atmosphere::on_before_delete` now reads
   `Post::META_THREAD_RECORDS` and schedules a delete covering every
   bsky tid plus the doc; `Publisher::delete_by_tids` accepts an array
   so one applyWrites covers the full thread. Legacy single-TID
   callers still work unchanged.

3. Persist rollback orphans to post meta. When a thread publish fails
   and the compensating rollback itself fails, the partial record list
   now lands in `Post::META_ORPHAN_RECORDS` (and error_log) instead of
   disappearing with the cron closure's WP_Error.

4. In-place applyWrites#update for thread posts when the record count
   is unchanged. Preserves TIDs/URIs and external replies; only CIDs
   change. Falls back to delete+republish when the thread shape
   changes (single ↔ thread, 2-post ↔ 3-post).

5. Cleanups: drop redundant get_rkey() side-effect calls from
   publish(); split legacy store_results() into a document-only
   helper so the root's single-record meta is mirrored from one
   place; document that `atmosphere_transform_bsky_post` fires per
   record (not per post) when the thread strategy is active.
The assertion `assertDoesNotMatch('~\S$~', $hook)` required the hook to
end with whitespace, which contradicts
`test_truncate_to_budget_falls_back_to_word_boundary_when_no_sentence`
(expects 'The quick brown fox' with no trailing space). The test name
and comment both say "not mid-word" — that's the actual intent. Since
the corpus is built of 8-char words, a partial trailing word is any
`\s\S{1,7}$` tail.
Consolidated findings from a 3-agent parallel review pass:

- Unify reply `createdAt` across publish_thread and update_thread_in_place.
  Both now stamp every thread record with `post_date_gmt`, so backfill,
  fresh publish, and in-place update all agree on the post's publish
  date. Previously the publish path stamped replies with `wp_date('c')`,
  which would stamp replies with the backfill-run time.

- Validate `atmosphere_pre_apply_writes` return. A filter that returns
  a scalar or object would fatal on the `array|\WP_Error` return type.
  Now normalized to a `WP_Error` with a clear code.

- Harden `atmosphere_teaser_thread_posts`. Non-iterable or non-string
  entries fall back to the default pair; the list is capped at 5 to
  bound PDS rate-limit blast radius if a downstream returns something
  unreasonable.

- Carry post tags on the thread root (index 0), matching the
  link-card and short-form record shape. Replies remain conversational
  and omit tags.

- Index reply URIs for reaction sync. New `META_URI_INDEX` multi-row
  meta key mirrors every thread record URI; `find_post_by_bsky_uri`
  in Reaction_Sync now falls back to the index so a like or repost on
  a reply post resolves back to the originating WP post.

- Cap `META_ORPHAN_RECORDS` at 10 entries so a crash-looping rollback
  can't grow the meta row past MySQL limits.

- Align `has_composable_body` with `build_teaser_thread`: both now
  require an excerpt of ≥ 10 chars before preferring it over the body.

- Memoize `render_post_content_plain()` per instance so thread
  composition doesn't re-run `the_content` multiple times per pass.

- Tests: `delete_by_tids` with array/string/empty; `on_before_delete`
  thread-aware vs legacy vs no-meta scheduling; `META_URI_INDEX` is
  populated; `atmosphere_pre_apply_writes` malformed return surfaces
  as WP_Error; stale docblock references updated.
…sponse

Previously publish_thread returned a bare WP_Error on this path,
leaving the root + doc records live on the PDS while the local TID
was still stored from get_rkey(). A retry reused the same TID and
the PDS rejected the create, stuck state. Route through
rollback_thread so retry starts from a clean slate; on rollback
failure the orphan manifest surfaces as for any other thread
rollback.
…overy

Two real findings from an adversarial pass that Pass 1/Pass 2 missed:

1. rewrite_thread deleted remote records before proving the replacement
   publish would succeed. If the republish failed (transient PDS error,
   rate limit, malformed payload), the old thread was gone from Bluesky,
   the new one was absent, and local meta was already cleared. The
   deleted records themselves are unrecoverable, but we now persist a
   rewrite-phase manifest to META_ORPHAN_RECORDS so operators have an
   audit trail, and retry of update() self-heals via publish() (stored
   records are cleared, so new TIDs are generated).

2. update()'s "missing document URI" repair branch called publish()
   directly, which reuses the existing bsky TID via get_rkey() — the
   PDS then rejected the create as "already exists." Self-heal was
   broken. Now routes through rewrite_thread with an empty doc_tid,
   which deletes the orphan bsky records and republishes fresh.

rewrite_thread now also guards against empty tids and empty doc_tid
so it can handle the partial-state recovery shape.

Tests cover both paths.
…ordpress-atmosphere into add/long-form-teaser-thread
A TID without a URI means Transformer::get_rkey() reserved the rkey
locally but the create never landed on the PDS — most commonly after
a failed publish or a rewrite_thread republish that clears meta
mid-step. stored_thread_records() previously returned such a bare TID
as a recoverable record, which caused the next update() to attempt a
delete of a non-existent PDS record and the retry to loop.

Now the legacy fallback requires a non-empty URI, matching how
publish() actually persists state. The reserved ghost TID is harmless:
the next publish() reuses it (get_rkey returns the stored value) and
the PDS accepts the create normally.
@kraftbj
Copy link
Copy Markdown
Contributor Author

kraftbj commented Apr 24, 2026

Thanks for the careful read — all 5 addressed in 72ac474 (with extra review passes landing in follow-up commits). Walking through in the same order:

1. createdAt drift. Real regression. Fixed by centralising the fallback in Publisher::publish_single / update_single / publish_thread's root to stamp to_iso8601( $post->post_date_gmt ) instead of wp_date( 'c' ). Short-form transform() already set it; the long-form paths now match, so backfill and normal publishes both land on the WP publish date. Later passes also unified thread replies on the same timestamp (they were stamping wp_date('c') in the publish path while the update path used post_date_gmt — in-place edits would've silently rewound reply createdAt after any edit).

2. Thread-aware force-delete. Real bug. Atmosphere::on_before_delete now reads Post::META_THREAD_RECORDS pre-deletion and schedules a single atmosphere_delete_records cron with the full tid array. Publisher::delete_by_tids accepts string|array, so one applyWrites covers the whole thread (root + replies + doc). Legacy single-tid callers keep working; the doc-comment on delete_by_tids now reflects the new contract rather than apologizing for not matching it.

3. Orphan manifest disappearing. New post meta Post::META_ORPHAN_RECORDSrollback_thread writes the partial state there when the compensating delete fails, in addition to error_log and the WP_Error data. Value is an append-only list (capped at 10 entries to bound meta_value growth) of { stamp, bsky_records, doc_rkey, original_error, rollback_error } entries so repeated failures on the same post all survive for manual cleanup. Test coverage added.

4. Delete-and-republish for thread updates. Implemented in-place applyWrites#update — when count($stored) === count($new_records) and both > 1, update_thread_in_place rewrites every record + doc in one atomic batch. TIDs and URIs are preserved, external replies stay valid, createdAt is pinned to post_date_gmt. Reply refs are built from the pre-update CIDs (structurally self-consistent at write time; clients treat any post-update CID mismatch as "parent was edited"). After the call, META_THREAD_RECORDS is refreshed with the new CIDs from the response so subsequent updates chain from current state. rewrite_thread (delete + republish) is now only taken on a true shape change — strategy swap or different record count — and on failure now persists a rewrite-phase manifest to META_ORPHAN_RECORDS so operators have an audit trail of what was deleted.

5. Minors.

  • Dropped the side-effect-only get_rkey() calls at the top of publish()publish_single/publish_thread generate lazily when building writes, so the pre-call was just noise.
  • store_results() renamed to store_document_meta() and reduced to owning doc meta only. mirror_thread_records_meta() is the single writer for Post::META_URI / META_TID / META_CID.
  • The atmosphere_transform_bsky_post docblock now calls out that the filter fires per-record under teaser-thread, with a pointer for accumulating listeners on how to branch per-entry.
  • You're right about the CID constraint — it's a PHP-ecosystem limitation (no DAG-CBOR encoder in our deps yet), not a protocol one. Updated the PR description.

Also fixed 4 failing tests on the way: one real truncate_to_budget bug (the preg_replace word-boundary fallback fired even when no whitespace existed, bypassing the hard-cap ellipsis), and three tests that didn't set 'post_excerpt' => '' and got bitten by the WP test factory's auto-generated excerpt.

Follow-up review passes surfaced a few more things that landed as bonus fixes: reaction-sync now resolves reply URIs via a new META_URI_INDEX (previously a like/repost on a reply post wouldn't resolve back to the WP post); atmosphere_pre_apply_writes guards against malformed filter returns that would otherwise fatal on the typed return; atmosphere_teaser_thread_posts is sanitised and capped; and the legacy stored_thread_records fallback now requires a URI so a reserved-but-unpublished TID doesn't loop retry through non-existent PDS records.

CI is green across the full phpunit matrix. Flipping to Ready for Review.

@kraftbj kraftbj marked this pull request as ready for review April 24, 2026 20:50
@kraftbj kraftbj requested review from Copilot and pfefferle April 24, 2026 21:16
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds opt-in long-form composition strategies for publishing WordPress posts to Bluesky, including a new “teaser thread” mode that publishes multi-post threads while preserving existing default behavior (link-card) for current installs.

Changes:

  • Added long-form composition strategy selection via filters and implemented thread-capable record composition in the post transformer.
  • Reworked publishing/update/delete flows to support multi-record threads with sequential writes, partial-meta persistence, and compensating-delete rollback on failure.
  • Extended reaction sync and before-delete cleanup to resolve and delete non-root thread replies via a per-URI index and thread record manifest.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
includes/transformer/class-post.php Adds long-form composition strategies, thread record builders, and new thread/orphan meta key constants.
includes/transformer/class-base.php Memoizes plain-text rendering to avoid repeated the_content filter work during composition.
includes/class-publisher.php Implements thread-aware publish/update/delete, thread meta mirroring/indexing, rollback, and rewrite flows.
includes/class-api.php Adds atmosphere_pre_apply_writes short-circuit filter to mock/inspect applyWrites in tests/harnesses.
includes/class-atmosphere.php Updates before-delete scheduling to delete all thread TIDs (root + replies) in one cron event.
includes/class-reaction-sync.php Resolves WP posts by reply AT-URI using the new URI index meta.
tests/phpunit/tests/transformer/class-test-post.php Adds unit coverage for truncation logic and long-form composition strategy branches.
tests/phpunit/tests/class-test-status-change.php Adds tests ensuring before-delete schedules deletion for all thread TIDs (and legacy single-TID behavior).
tests/phpunit/tests/class-test-publisher.php Adds extensive tests for thread publish/update/delete, rollback, orphan manifests, and pre-apply short-circuiting.
.github/changelog/long-form-teaser-thread Changelog entry describing the new long-form thread feature and strategy options.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread includes/class-api.php
Comment on lines +188 to +192
$short_circuit = \apply_filters( 'atmosphere_pre_apply_writes', null, $writes );

if ( \is_array( $short_circuit ) || \is_wp_error( $short_circuit ) ) {
return $short_circuit;
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The atmosphere_pre_apply_writes short-circuit accepts any array as a successful applyWrites response, but nothing validates that it has the expected shape (e.g. a results key with an array of entries matching the writes). A malformed array return will be treated as success and can silently skip meta persistence or yield inconsistent state. Consider validating/normalizing the short-circuit array (or returning a WP_Error when required keys are missing) to keep behavior predictable for downstream harnesses.

Copilot uses AI. Check for mistakes.
Comment on lines +496 to +502
$permalink = \get_permalink( $this->object );
$plain = $this->render_post_content_plain( $this->object );
$budget = 300 - \mb_strlen( "\n\n" ) - \mb_strlen( $permalink );

$body = $this->truncate_to_budget( $plain, $budget, false );

return $body . "\n\n" . $permalink;
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

build_truncate_link_text() computes $budget as 300 - 2 - mb_strlen($permalink), which can become 0 or negative for unusually long permalinks. Passing a non-positive $max into truncate_to_budget() leads to surprising mb_substr() behavior (negative length) and can produce invalid/over-budget output. Add a guard to clamp the budget to a sane minimum (or fall back to a permalink-only/link-card path) so the final text always respects the 300-char limit.

Suggested change
$permalink = \get_permalink( $this->object );
$plain = $this->render_post_content_plain( $this->object );
$budget = 300 - \mb_strlen( "\n\n" ) - \mb_strlen( $permalink );
$body = $this->truncate_to_budget( $plain, $budget, false );
return $body . "\n\n" . $permalink;
$max_length = 300;
$separator = "\n\n";
$permalink = \get_permalink( $this->object );
$plain = $this->render_post_content_plain( $this->object );
if ( \mb_strlen( $permalink ) >= $max_length ) {
return $this->truncate_to_budget( $permalink, $max_length, false );
}
$budget = $max_length - \mb_strlen( $permalink );
if ( $budget <= \mb_strlen( $separator ) ) {
return $permalink;
}
$body = $this->truncate_to_budget( $plain, $budget - \mb_strlen( $separator ), false );
return $body . $separator . $permalink;

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

[Feature] API PDS API client [Feature] Publisher Publishing to AT Protocol [Feature] Transformer AT Protocol record transformers [Status] In Progress [Tests] Includes Tests PR includes test changes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants