Long-form teaser-thread strategy + atmosphere_long_form_composition filter#34
Long-form teaser-thread strategy + atmosphere_long_form_composition filter#34
Conversation
This branch introduces Atmosphere-side support for multi-post thread composition for long-form posts. Tracks the FOSSE SDD at https://github.com/Automattic/fosse/tree/trunk/sdd/long-form-bluesky-strategy. Opening as draft to start the async review window. Implementation lands in follow-up commits on this branch. Linear: https://linear.app/a8c/issue/DOTCOM-16810
…osition methods
New on Atmosphere\Transformer\Post:
- `META_THREAD_RECORDS` constant for the per-WP-post ordered array
of `{ uri, cid, tid }` triples that Publisher will populate once it
knows about threads (next commit).
- `build_long_form_records()` as the long-form counterpart to
`transform()`. Branches on `atmosphere_long_form_composition`
(default `'link-card'`, i.e. today's behavior unchanged) to produce
one record for `'link-card'` / `'truncate-link'`, or 2+ records for
`'teaser-thread'`. `transform()` is unchanged — legacy callers keep
today's contract.
- `build_teaser_thread()` (private) composes the default hook + CTA
pair, filterable via `atmosphere_teaser_thread_posts`. Hook uses the
post's excerpt when set, else the first ~280 chars of the body cut
at a sentence boundary so the hook never ends mid-sentence right
before the CTA.
- `build_truncate_link_text()` (private) for the single-post
body-plus-permalink strategy.
- `truncate_to_budget()` (private) — budget-aware truncation that
prefers a sentence break, falls back to word boundary, then hard-caps
with an ellipsis. Uses mb_strlen, matching the existing
`truncate_text()` helper.
- `is_short_form_post()` (public) wraps the existing
`atmosphere_is_short_form_post` filter + private `is_short_form()`
discriminator so Publisher can branch on short vs. long without
duplicating the filter call inside its own code.
- Empty-body guard: `'teaser-thread'` and `'truncate-link'` silently
degrade to `'link-card'` when the post has neither a 10+-char
excerpt nor a 10+-char body. Logs a notice so ops can tell the
fallback from an intentional link-card configuration.
New on Atmosphere\API:
- `atmosphere_pre_apply_writes` filter. Returning a non-null value
short-circuits the PDS call, so PHPUnit and the FOSSE e2e harness
can observe the write batch without a real DPoP round-trip
(`pre_http_request` fires inside `wp_remote_request`, too late to
mock past the DPoP-proof step in test environments without a real
JWK).
Default behavior is unchanged: `atmosphere_long_form_composition`
defaults to `'link-card'`. Existing sites see exactly today's output
unless a downstream projects a different value.
Tests cover these additions in the next commit.
Linear: https://linear.app/a8c/issue/DOTCOM-16810
Adds 16 unit tests under Atmosphere\Tests\Transformer\Test_Post: - 6 tests for `truncate_to_budget()` via reflection — sentence-preferred cut, trailing close-punctuation, word-boundary fallback, word-only when `prefer_sentence` is false, and hard-cap for a single long token. - 10 tests for `build_long_form_records()` — default (link-card) matches `transform()` output, `atmosphere_transform_bsky_post` fires per entry for threads, `truncate-link` branch, `teaser-thread` default 2-entry shape with sentence-cut hook and link-faceted CTA, word-boundary fallback when no sentence break exists, excerpt precedence, empty-body downgrade to link-card, 3-entry extension via `atmosphere_teaser_thread_posts`, `langs` consistency, per-record facet extraction, and unknown-strategy fallback. Also adds a small observability hook on the downgrade path: `atmosphere_long_form_strategy_downgraded` fires with the post, requested strategy, and effective strategy, so ops and the downgrade test can distinguish the fallback from an intentional `'link-card'` configuration. Linear: https://linear.app/a8c/issue/DOTCOM-16810
…s and rollback
Reshapes Publisher around the idea that one WordPress post may map to
N Bluesky feed posts (a thread) plus exactly one site.standard.document.
**publish()**
- Branches on `Post::is_short_form_post()`. Short-form stays on today's
single-record path via `transform()`.
- Long-form calls the new `build_long_form_records()`. 1-entry output
(link-card / truncate-link) goes through `publish_single()`; N-entry
output goes through `publish_thread()`.
- `publish_single()` preserves today's atomic applyWrites (bsky + doc)
and additionally populates `META_THREAD_RECORDS` as a 1-entry array
so the new key is always present after a successful publish.
- `publish_thread()` writes root + doc atomically, persists partial
meta immediately, then writes each reply on its own `applyWrites`.
Reply refs (`reply.root` / `reply.parent`) are filled from the
already-persisted thread records since CIDs only arrive via the PDS
response to the prior create — we can't assemble one atomic batch
for the whole thread.
**Rollback**
On a mid-thread failure, `rollback_thread()` issues compensating
deletes tail-first + a doc delete, clears all meta, and returns the
original error. If rollback itself fails, the returned WP_Error wraps
both errors and carries `partial_records` so an operator can clean up
by hand.
**update()**
- Reads `META_THREAD_RECORDS` with legacy fallback to single-record meta.
- Single stored + single new composition → in-place applyWrites#update
on bsky + doc, matching today's behavior.
- Any other shape (single ↔ thread, thread → thread with different
lengths) → delete all existing records atomically, then republish.
Documented side effect: followers see the thread updates as a fresh
publish (new `createdAt`); external replies to the original thread
are orphaned.
**delete()**
- Reads `META_THREAD_RECORDS` with legacy fallback; batches one
applyWrites with N bsky deletes + 1 doc delete. Clears every meta
key on success; leaves meta intact on failure so a retry can
complete.
**Meta shape**
- New `META_THREAD_RECORDS` is the canonical list of ordered
`{ uri, cid, tid }` triples.
- Legacy `META_URI` / `META_TID` / `META_CID` continue to mirror the
root record for backwards compatibility and because
`Document::transform()` still reads them to compose the bskyPostRef.
- `clear_all_record_meta()` helper keeps the six keys in sync on
delete / rollback paths.
**Not touched**
- `delete_by_tids()` signature preserved for queued-cron backwards
compatibility. `on_before_delete` in class-atmosphere.php still
captures a single TID; force-deleting a thread-published post
leaves thread tails on the PDS. Follow-up task will update the
on_before_delete / cron hook to capture the full thread.
- Short-form behavior is unchanged — `transform()`'s `createdAt`
(from `post_date_gmt`) still flows through `publish_single()`.
Linear: https://linear.app/a8c/issue/DOTCOM-16810
Adds 10 unit tests covering the new long-form / thread flows, using the
`atmosphere_pre_apply_writes` filter as the test seam (DPoP-dependent
`pre_http_request` mocking can't reach the wp_remote_request layer in
the test environment's auth setup).
Publish path:
- link-card writes one atomic applyWrites with 2 creates and mirrors
the root into `META_THREAD_RECORDS` as a 1-entry array.
- teaser-thread writes root + doc atomically, then the CTA reply as a
second applyWrites with `reply.root` / `reply.parent` pointing at the
root's {uri, cid}.
- partial meta is persisted between the root write and the reply
write — asserted via a meta snapshot taken inside the capture
filter when the reply call fires.
- happy-path final meta is an ordered 2-entry `META_THREAD_RECORDS`
array, with legacy single-value keys mirroring the root.
Rollback path:
- second-write failure triggers compensating deletes (root bsky +
doc), clears every meta key, and returns the original WP_Error.
- when the rollback applyWrites itself also fails, the returned
`WP_Error` has code `atmosphere_thread_rollback_failed` and data
containing `original_error`, `rollback_error`, and
`partial_records`.
Update / delete paths:
- single stored + single new composition uses in-place
`applyWrites#update` on both bsky and doc (one applyWrites call),
matching today's link-card update behavior.
- strategy change (stored single → composed thread) issues a delete
of the old record + doc, then publishes fresh as a thread.
- thread delete batches every bsky delete + the doc delete into one
atomic applyWrites and clears all meta.
- legacy single-record meta (no `META_THREAD_RECORDS`) still deletes
correctly via the fallback in `stored_thread_records()`.
Linear: https://linear.app/a8c/issue/DOTCOM-16810
|
Early review — read through the branch, a few things worth flagging before this gets much further. Ordered by impact. 1. Default behavior is changed:
|
…-place thread update Five changes informed by the early review feedback on #34: 1. Preserve the post's publish date as `createdAt` on every record representing the post itself (short-form transform(), link-card, truncate-link, and thread root). Previously the long-form paths fell through to `wp_date('c')`, which stamped the backfill-run time on re-synced posts and collapsed timeline ordering. 2. Thread-aware force-delete. `Atmosphere::on_before_delete` now reads `Post::META_THREAD_RECORDS` and schedules a delete covering every bsky tid plus the doc; `Publisher::delete_by_tids` accepts an array so one applyWrites covers the full thread. Legacy single-TID callers still work unchanged. 3. Persist rollback orphans to post meta. When a thread publish fails and the compensating rollback itself fails, the partial record list now lands in `Post::META_ORPHAN_RECORDS` (and error_log) instead of disappearing with the cron closure's WP_Error. 4. In-place applyWrites#update for thread posts when the record count is unchanged. Preserves TIDs/URIs and external replies; only CIDs change. Falls back to delete+republish when the thread shape changes (single ↔ thread, 2-post ↔ 3-post). 5. Cleanups: drop redundant get_rkey() side-effect calls from publish(); split legacy store_results() into a document-only helper so the root's single-record meta is mirrored from one place; document that `atmosphere_transform_bsky_post` fires per record (not per post) when the thread strategy is active.
The assertion `assertDoesNotMatch('~\S$~', $hook)` required the hook to
end with whitespace, which contradicts
`test_truncate_to_budget_falls_back_to_word_boundary_when_no_sentence`
(expects 'The quick brown fox' with no trailing space). The test name
and comment both say "not mid-word" — that's the actual intent. Since
the corpus is built of 8-char words, a partial trailing word is any
`\s\S{1,7}$` tail.
Consolidated findings from a 3-agent parallel review pass:
- Unify reply `createdAt` across publish_thread and update_thread_in_place.
Both now stamp every thread record with `post_date_gmt`, so backfill,
fresh publish, and in-place update all agree on the post's publish
date. Previously the publish path stamped replies with `wp_date('c')`,
which would stamp replies with the backfill-run time.
- Validate `atmosphere_pre_apply_writes` return. A filter that returns
a scalar or object would fatal on the `array|\WP_Error` return type.
Now normalized to a `WP_Error` with a clear code.
- Harden `atmosphere_teaser_thread_posts`. Non-iterable or non-string
entries fall back to the default pair; the list is capped at 5 to
bound PDS rate-limit blast radius if a downstream returns something
unreasonable.
- Carry post tags on the thread root (index 0), matching the
link-card and short-form record shape. Replies remain conversational
and omit tags.
- Index reply URIs for reaction sync. New `META_URI_INDEX` multi-row
meta key mirrors every thread record URI; `find_post_by_bsky_uri`
in Reaction_Sync now falls back to the index so a like or repost on
a reply post resolves back to the originating WP post.
- Cap `META_ORPHAN_RECORDS` at 10 entries so a crash-looping rollback
can't grow the meta row past MySQL limits.
- Align `has_composable_body` with `build_teaser_thread`: both now
require an excerpt of ≥ 10 chars before preferring it over the body.
- Memoize `render_post_content_plain()` per instance so thread
composition doesn't re-run `the_content` multiple times per pass.
- Tests: `delete_by_tids` with array/string/empty; `on_before_delete`
thread-aware vs legacy vs no-meta scheduling; `META_URI_INDEX` is
populated; `atmosphere_pre_apply_writes` malformed return surfaces
as WP_Error; stale docblock references updated.
…sponse Previously publish_thread returned a bare WP_Error on this path, leaving the root + doc records live on the PDS while the local TID was still stored from get_rkey(). A retry reused the same TID and the PDS rejected the create, stuck state. Route through rollback_thread so retry starts from a clean slate; on rollback failure the orphan manifest surfaces as for any other thread rollback.
…overy Two real findings from an adversarial pass that Pass 1/Pass 2 missed: 1. rewrite_thread deleted remote records before proving the replacement publish would succeed. If the republish failed (transient PDS error, rate limit, malformed payload), the old thread was gone from Bluesky, the new one was absent, and local meta was already cleared. The deleted records themselves are unrecoverable, but we now persist a rewrite-phase manifest to META_ORPHAN_RECORDS so operators have an audit trail, and retry of update() self-heals via publish() (stored records are cleared, so new TIDs are generated). 2. update()'s "missing document URI" repair branch called publish() directly, which reuses the existing bsky TID via get_rkey() — the PDS then rejected the create as "already exists." Self-heal was broken. Now routes through rewrite_thread with an empty doc_tid, which deletes the orphan bsky records and republishes fresh. rewrite_thread now also guards against empty tids and empty doc_tid so it can handle the partial-state recovery shape. Tests cover both paths.
…ordpress-atmosphere into add/long-form-teaser-thread
A TID without a URI means Transformer::get_rkey() reserved the rkey locally but the create never landed on the PDS — most commonly after a failed publish or a rewrite_thread republish that clears meta mid-step. stored_thread_records() previously returned such a bare TID as a recoverable record, which caused the next update() to attempt a delete of a non-existent PDS record and the retry to loop. Now the legacy fallback requires a non-empty URI, matching how publish() actually persists state. The reserved ghost TID is harmless: the next publish() reuses it (get_rkey returns the stored value) and the PDS accepts the create normally.
|
Thanks for the careful read — all 5 addressed in 72ac474 (with extra review passes landing in follow-up commits). Walking through in the same order: 1. 2. Thread-aware force-delete. Real bug. 3. Orphan manifest disappearing. New post meta 4. Delete-and-republish for thread updates. Implemented in-place 5. Minors.
Also fixed 4 failing tests on the way: one real Follow-up review passes surfaced a few more things that landed as bonus fixes: reaction-sync now resolves reply URIs via a new CI is green across the full phpunit matrix. Flipping to Ready for Review. |
There was a problem hiding this comment.
Pull request overview
This PR adds opt-in long-form composition strategies for publishing WordPress posts to Bluesky, including a new “teaser thread” mode that publishes multi-post threads while preserving existing default behavior (link-card) for current installs.
Changes:
- Added long-form composition strategy selection via filters and implemented thread-capable record composition in the post transformer.
- Reworked publishing/update/delete flows to support multi-record threads with sequential writes, partial-meta persistence, and compensating-delete rollback on failure.
- Extended reaction sync and before-delete cleanup to resolve and delete non-root thread replies via a per-URI index and thread record manifest.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
includes/transformer/class-post.php |
Adds long-form composition strategies, thread record builders, and new thread/orphan meta key constants. |
includes/transformer/class-base.php |
Memoizes plain-text rendering to avoid repeated the_content filter work during composition. |
includes/class-publisher.php |
Implements thread-aware publish/update/delete, thread meta mirroring/indexing, rollback, and rewrite flows. |
includes/class-api.php |
Adds atmosphere_pre_apply_writes short-circuit filter to mock/inspect applyWrites in tests/harnesses. |
includes/class-atmosphere.php |
Updates before-delete scheduling to delete all thread TIDs (root + replies) in one cron event. |
includes/class-reaction-sync.php |
Resolves WP posts by reply AT-URI using the new URI index meta. |
tests/phpunit/tests/transformer/class-test-post.php |
Adds unit coverage for truncation logic and long-form composition strategy branches. |
tests/phpunit/tests/class-test-status-change.php |
Adds tests ensuring before-delete schedules deletion for all thread TIDs (and legacy single-TID behavior). |
tests/phpunit/tests/class-test-publisher.php |
Adds extensive tests for thread publish/update/delete, rollback, orphan manifests, and pre-apply short-circuiting. |
.github/changelog/long-form-teaser-thread |
Changelog entry describing the new long-form thread feature and strategy options. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| $short_circuit = \apply_filters( 'atmosphere_pre_apply_writes', null, $writes ); | ||
|
|
||
| if ( \is_array( $short_circuit ) || \is_wp_error( $short_circuit ) ) { | ||
| return $short_circuit; | ||
| } |
There was a problem hiding this comment.
The atmosphere_pre_apply_writes short-circuit accepts any array as a successful applyWrites response, but nothing validates that it has the expected shape (e.g. a results key with an array of entries matching the writes). A malformed array return will be treated as success and can silently skip meta persistence or yield inconsistent state. Consider validating/normalizing the short-circuit array (or returning a WP_Error when required keys are missing) to keep behavior predictable for downstream harnesses.
| $permalink = \get_permalink( $this->object ); | ||
| $plain = $this->render_post_content_plain( $this->object ); | ||
| $budget = 300 - \mb_strlen( "\n\n" ) - \mb_strlen( $permalink ); | ||
|
|
||
| $body = $this->truncate_to_budget( $plain, $budget, false ); | ||
|
|
||
| return $body . "\n\n" . $permalink; |
There was a problem hiding this comment.
build_truncate_link_text() computes $budget as 300 - 2 - mb_strlen($permalink), which can become 0 or negative for unusually long permalinks. Passing a non-positive $max into truncate_to_budget() leads to surprising mb_substr() behavior (negative length) and can produce invalid/over-budget output. Add a guard to clamp the budget to a sane minimum (or fall back to a permalink-only/link-card path) so the final text always respects the 300-char limit.
| $permalink = \get_permalink( $this->object ); | |
| $plain = $this->render_post_content_plain( $this->object ); | |
| $budget = 300 - \mb_strlen( "\n\n" ) - \mb_strlen( $permalink ); | |
| $body = $this->truncate_to_budget( $plain, $budget, false ); | |
| return $body . "\n\n" . $permalink; | |
| $max_length = 300; | |
| $separator = "\n\n"; | |
| $permalink = \get_permalink( $this->object ); | |
| $plain = $this->render_post_content_plain( $this->object ); | |
| if ( \mb_strlen( $permalink ) >= $max_length ) { | |
| return $this->truncate_to_budget( $permalink, $max_length, false ); | |
| } | |
| $budget = $max_length - \mb_strlen( $permalink ); | |
| if ( $budget <= \mb_strlen( $separator ) ) { | |
| return $permalink; | |
| } | |
| $body = $this->truncate_to_budget( $plain, $budget - \mb_strlen( $separator ), false ); | |
| return $body . $separator . $permalink; |
Fixes DOTCOM-16886
See DOTCOM-16810 (parent epic)
Proposed changes:
Introduces long-form composition strategies for Bluesky posts, with the default (
link-card) preserved so existing sites see no change.atmosphere_long_form_compositionfilter selects the long-form composition strategy. Accepts'link-card'(default, unchanged behavior),'truncate-link'(single post, body-as-text + permalink), or'teaser-thread'(2-post thread: hook + CTA-with-link).atmosphere_teaser_thread_postsfilter lets downstream override the default 2-post composition (e.g. expand to 3 posts, customize CTA copy)._atmosphere_bsky_thread_recordspost meta: ordered array of{ uri, cid, tid }triples tracking every bsky post in a thread. Existing_atmosphere_bsky_uri/_atmosphere_bsky_tid/_atmosphere_bsky_cidcontinue to mirror the root post for backwards compatibility._atmosphere_bsky_orphan_recordspost meta: populated only when a thread publish fails and the compensating-delete rollback also fails, so operators have a durable manifest of records alive on the PDS but no longer tracked locally.atmosphere_pre_apply_writesfilter: short-circuitsAPI::apply_writeswhen a non-null value is returned. Used by tests and external harnesses (FOSSE e2e) to mock/inspect the write batch.Publisher::publish/update/deletereworked to handle multi-record threads via sequential writes with partial-meta writes and compensating-delete rollback on failure.Atmosphere::on_before_deletenow readsMETA_THREAD_RECORDSand schedules a delete covering every bsky tid, so force-deleting a thread-strategy post cleans up the whole thread instead of leaking the non-root replies.Why sequential writes for thread publishes instead of one atomic
applyWrites? Reply records need the parent's CID. CIDs are deterministic hashes of canonical DAG-CBOR — clients can compute them ahead of time (the TypeScript SDK does this to batch threads atomically) — but there is no DAG-CBOR encoder in our PHP dependencies yet, so we rely on the PDS to return the CID in its create response. We mitigate the partial-failure window with partial-meta writes after each successful create (so crash recovery can surface any orphans).See FOSSE SDD for full design context, alternatives considered, and open questions.
Default behavior is unchanged.
atmosphere_long_form_compositiondefaults to'link-card'. Existing sites behave exactly as today unless they opt into a different strategy (or install a downstream like FOSSE that projects the value).Publisher::update()semantics for thread-strategy posts:applyWrites#updatein one atomic batch. TIDs and URIs are preserved, so external replies remain valid and Bluesky URLs stay stable. Each record gets a new CID; reply refs are built from the pre-update CIDs (self-consistent at write time, and clients treat any post-update CID mismatch as "parent was edited" rather than broken). `createdAt` is pinned to `post_date_gmt`.Other information:
Testing instructions:
Changelog entry
Changelog Entry Details
Significance
Type
Message
Long-form posts can now be published as a Bluesky thread. A new filter lets sites choose between a single link-card post (default, unchanged from today), a single post that combines body text with the permalink, or a 2-post teaser thread that leads with a hook and follows with a "continue reading" link. Post edits update every record in place when the shape of the post hasn't changed, so Bluesky URLs stay stable and people's replies don't get orphaned. If the shape of the post changes (for example, switching between a single post and a thread), the old records are replaced with fresh ones.