Skip to content

Conversation

@luancazarine
Copy link
Collaborator

@luancazarine luancazarine commented Sep 23, 2025

Resolves #15446

Summary by CodeRabbit

  • New Features
    • Post creation now auto-detects mentions, links, and hashtags with improved accuracy, including Unicode support.
    • Mentions resolve to the correct accounts, and links are normalized (handles domains and trailing punctuation).
  • Chores
    • Version updates across Bluesky package and multiple actions/sources (Create Post, Like Post, Retrieve Thread, New Follower on Account, New Posts by Author, New Timeline Posts).

- Bump version of @pipedream/bluesky to 0.1.2
- Update action versions for create-post (0.1.0), like-post (0.0.2), retrieve-thread (0.0.2), new-follower-on-account (0.0.2), new-posts-by-author (0.0.2), and new-timeline-posts (0.0.2)
- Introduce textEncoding module for detecting facets in post content
- Remove deprecated methods from create-post action
@luancazarine luancazarine linked an issue Sep 23, 2025 that may be closed by this pull request
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 23, 2025

Walkthrough

Introduces hashtag/link/mention facet detection via a new common textEncoding module and integrates it into the Bluesky Create Post action, replacing prior internal helpers. Adds a TLD list module. Bumps versions for multiple Bluesky actions, sources, and the package. No other logic changes outside create-post facet construction.

Changes

Cohort / File(s) Change Summary
Create Post: integrate facet detection
components/bluesky/actions/create-post/create-post.mjs
Imports common textEncoding, removes parseUrls and parseFacets, uses await textEncoding.detectFacets(text, app) to build record.facets, bumps version 0.0.2 → 0.1.0.
New common utilities (facets & TLDs)
components/bluesky/common/textEncoding.mjs, components/bluesky/common/tlds.mjs
Adds textEncoding.detectFacets(text, app) (async) to detect mentions, links, tags with UTF‑8 byte ranges; includes handle resolution via app.resolveHandle. Adds default export array of TLD strings.
Action version bumps
components/bluesky/actions/like-post/like-post.mjs, components/bluesky/actions/retrieve-thread/retrieve-thread.mjs
Updates version fields only: like-post 0.0.1 → 0.0.2, retrieve-thread 0.0.1 → 0.0.2. No logic changes.
Source version bumps
components/bluesky/sources/new-follower-on-account/new-follower-on-account.mjs, components/bluesky/sources/new-posts-by-author/new-posts-by-author.mjs, components/bluesky/sources/new-timeline-posts/new-timeline-posts.mjs
Updates version fields only: each 0.0.1 → 0.0.2. No logic changes.
Package version bump
components/bluesky/package.json
Updates package version 0.1.1 → 0.1.2.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant Action as CreatePost Action
  participant TE as textEncoding.detectFacets
  participant App as app (resolveHandle)
  participant API as Bluesky API

  User->>Action: invoke({ text, ... })
  Action->>TE: detectFacets(text, app)
  alt mention facets
    TE->>App: resolveHandle(@handle)
    App-->>TE: did
  end
  TE-->>Action: facets[]
  Action->>API: createRecord({ text, facets })
  API-->>Action: ack/uri
  Action-->>User: result
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

I twitched my ears at tags anew,
Hops between links and mentions too—
Byte by byte, I mapped the way,
UTF winds where posts now play.
Version carrots stacked in rows,
Facets bloom where hashtags grow.
Thump! Off to Bluesky goes.

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description only contains "Resolves #15446" and does not follow the repository template which requires a "## WHY" section; it lacks a summary of the change, motivation, and any testing or usage notes required by the template. Update the PR description to follow the repository template by adding a "## WHY" section that explains the motivation, a concise summary of the implemented changes (including the new detectFacets behavior), linked issue references, and any testing or migration notes so reviewers can evaluate scope and risks.
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Title Check ✅ Passed The title concisely and accurately summarizes the primary changes by stating both the version updates and the addition of text-encoding functionality, making the main intent clear to a reviewer scanning history.
Linked Issues Check ✅ Passed The changes add a textEncoding.detectFacets implementation that extracts tags (alongside mentions and links) with proper byte offsets and update the create-post action to use it, which directly implements automatic hashtag rich-text facets requested in issue [#15446]; the code-level objectives for hashtag support are met.
Out of Scope Changes Check ✅ Passed All modifications are confined to Bluesky-related files: new textEncoding and tlds modules, updates to create-post to use detectFacets, and version bumps across Bluesky actions and sources; there are no unrelated or out-of-scope file changes detected.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 15446-action-bluesky-hash-tags-rich-text-facets

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (11)
components/bluesky/common/tlds.mjs (1)

1-2: Export a Set for O(1) lookups and document source/update process.
Membership checks on a large list will be faster with a Set. Also, add a brief header comment noting the source and update procedure for maintainability.

Apply this diff:

- export default [
+ // Source: IANA TLD list and known IDN TLDs.
+ // Consider maintaining via a small script that refreshes from:
+ // https://data.iana.org/TLD/tlds-alpha-by-domain.txt
+ export const TLD_LIST = [
@@
- ];
+ ];
+ export const TLD_SET = new Set(TLD_LIST);
+ export default TLD_LIST;

Also applies to: 1441-1441

components/bluesky/actions/create-post/create-post.mjs (3)

32-33: Validate 300-char limit and only include facets when present.
Preempt server-side errors and avoid sending empty arrays.

       ["$type"]: constants.RESOURCE_TYPE.POST,
       text,
-      facets: await textEncoding.detectFacets(text, app),
+      // Bluesky posts are limited to 300 characters (Unicode code points)
+      // Early validate to provide a clearer error
+      ...(Array.from(text).length > 300 ? (() => { throw new Error("Text exceeds Bluesky 300 character limit"); })() : {}),
+      // Compute facets and include only when present
+      ...(await (async () => {
+        const facets = await textEncoding.detectFacets(text, app);
+        return facets && facets.length ? { facets } : {};
+      })()),
       createdAt: new Date().toISOString(),

13-17: Clarify prop description to reflect auto-faceting.
Make it clear hashtags/mentions/links are auto-linked via rich-text facets.

-      description: "The text content of the post.",
+      description: "The text content of the post. @mentions, #hashtags, and URLs are automatically linked via rich-text facets.",

3-3: detectFacets: DID present and UTF‑8 offsets correct — pass $ through to resolveHandle

  • detectFacets sets features[0].did = handle.did and computes byteStart/byteEnd via utf16IndexToUtf8Index (TextEncoder) — this produces UTF‑8 byte offsets and handles emoji/surrogates.
  • Issue: detectFacets calls app.resolveHandle(...) without the action $ context. Forward $ into detectFacets and call resolveHandle({ $, params: { handle } }) (also update caller in components/bluesky/actions/create-post/create-post.mjs to await textEncoding.detectFacets(text, app, $)).
components/bluesky/common/textEncoding.mjs (7)

3-11: Domain validation: make case-insensitive, allow hyphens, and support multi-label TLDs efficiently

Current logic is case-sensitive and O(|TLDs|) per call. It also doesn’t validate label hyphens correctly. Recommend lowercasing, validating labels, and checking multi-label TLDs via a precomputed Set.

Apply this diff to replace isValidDomain:

-function isValidDomain(str) {
-  return !!TLDs.find((tld) => {
-    const i = str.lastIndexOf(tld);
-    if (i === -1) {
-      return false;
-    }
-    return str.charAt(i - 1) === "." && i === str.length - tld.length;
-  });
-}
+function isValidDomain(str) {
+  const s = String(str || "").toLowerCase();
+  // Validate labels: alnum + hyphen, no leading/trailing hyphen in any label
+  if (!/^[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(?:\.[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)+$/.test(s)) {
+    return false;
+  }
+  // Support multi-label TLDs (e.g., .co.uk)
+  const parts = s.split(".");
+  for (let i = 0; i < parts.length; i++) {
+    const suffix = parts.slice(i).join(".");
+    if (TLD_SET.has(suffix)) return true;
+  }
+  return false;
+}

Add this near the import (module scope):

// after import TLDs
const TLD_SET = new Set(TLDs.map((t) => t.toLowerCase()));

14-16: Avoid re‑creating TextEncoder; reuse a single instance

This runs for every index conversion. Reuse one encoder to reduce allocations.

Apply this diff:

-function utf16IndexToUtf8Index(i, utf16) {
-  const encoder = new TextEncoder();
-  return encoder.encode(utf16.slice(0, i)).byteLength;
-}
+function utf16IndexToUtf8Index(i, utf16) {
+  return encoder.encode(utf16.slice(0, i)).byteLength;
+}

Add this at module scope:

const encoder = new TextEncoder();

36-36: Facet consistency: drop top-level $type (keep it only in features)

Only mention facets include the top-level $type. Recommend omitting it for consistency (features keep $type).

Apply this diff:

-      $type: "app.bsky.richtext.facet",

51-52: Links: allow hyphens in domain labels

Current regex excludes valid domains with hyphens. Update to accept label hyphens (not leading/trailing).

Apply this diff:

-  const re2 =
-      /(^|\s|\()((https?:\/\/[\S]+)|((?<domain>[a-z][a-z0-9]*(\.[a-z0-9]+)+)[\S]*))/gim;
+  const re2 =
+      /(^|\s|\()((https?:\/\/\S+)|((?<domain>[a-z0-9](?:[a-z0-9-]*[a-z0-9])?(?:\.[a-z0-9](?:[a-z0-9-]*[a-z0-9])?)+)\S*))/gim;

62-76: Links: compute start from match.index; fix comment typo

Use match.index plus leading group length for deterministic start. Also fix “puncuation” typo.

Apply this diff:

-    const start = text.indexOf(match[2], match.index);
+    const start = match.index + (match[1]?.length || 0);
@@
-      // strip ending puncuation
+      // strip ending punctuation

108-110: Byte end is exclusive; fix misleading comment

The code uses exclusive byteEnd, which is correct. The comment says “inclusive”.

Apply this diff:

-        byteEnd: utf16IndexToUtf8Index(index + tag.length, text), // inclusive of last char
+        byteEnd: utf16IndexToUtf8Index(index + tag.length, text), // exclusive

18-48: Optional: parallelize handle resolution and dedupe mentions

Multiple mentions resolve sequentially. For posts with several mentions, gather unique handles and resolve with Promise.all, then map DIDs back to positions.

If you’d like, I can provide a refactor that collects all mention matches first, resolves unique handles concurrently, and then builds facets.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0c007db and 37d840c.

📒 Files selected for processing (9)
  • components/bluesky/actions/create-post/create-post.mjs (2 hunks)
  • components/bluesky/actions/like-post/like-post.mjs (1 hunks)
  • components/bluesky/actions/retrieve-thread/retrieve-thread.mjs (1 hunks)
  • components/bluesky/common/textEncoding.mjs (1 hunks)
  • components/bluesky/common/tlds.mjs (1 hunks)
  • components/bluesky/package.json (1 hunks)
  • components/bluesky/sources/new-follower-on-account/new-follower-on-account.mjs (1 hunks)
  • components/bluesky/sources/new-posts-by-author/new-posts-by-author.mjs (1 hunks)
  • components/bluesky/sources/new-timeline-posts/new-timeline-posts.mjs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
components/bluesky/common/textEncoding.mjs (1)
components/bluesky/actions/retrieve-thread/retrieve-thread.mjs (2)
  • app (49-52)
  • app (54-59)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Lint Code Base
  • GitHub Check: pnpm publish
  • GitHub Check: Publish TypeScript components
  • GitHub Check: Verify TypeScript components
🔇 Additional comments (10)
components/bluesky/package.json (1)

3-3: Version bump LGTM.
Consistent with action/source bumps in this PR.

components/bluesky/sources/new-follower-on-account/new-follower-on-account.mjs (1)

9-9: Version bump LGTM.
No functional changes introduced here.

components/bluesky/sources/new-timeline-posts/new-timeline-posts.mjs (1)

9-9: Version bump LGTM.
No behavior changes in this module.

components/bluesky/actions/retrieve-thread/retrieve-thread.mjs (1)

7-7: Version bump LGTM.
No logic changes detected.

components/bluesky/actions/like-post/like-post.mjs (1)

8-8: Version bump LGTM.
No behavioral changes in this action.

components/bluesky/sources/new-posts-by-author/new-posts-by-author.mjs (1)

9-9: Version bump LGTM.
Aligned with the broader release.

components/bluesky/actions/create-post/create-post.mjs (1)

9-9: Version bump LGTM.
Reflects public API changes (facet detection integration).

components/bluesky/common/tlds.mjs (1)

1-1441: components/bluesky/common/tlds.mjs — UTF‑8 valid; no duplicates found.
Verified by reading the file as UTF‑8 and counting entries: 1439 total, 1439 unique, 0 duplicates.

components/bluesky/common/textEncoding.mjs (2)

120-123: Return value shape: confirm downstream handling of undefined vs []

You return undefined when there are no facets. Ensure callers (e.g., create-post) handle both cases correctly.

If create-post expects an array, consider returning [] to simplify call sites and avoid optional chaining checks.


29-35: Compute mention slice deterministically; destructure DID from resolveHandle

Use match.index + (match[1]?.length || 0) to get the '@' position (avoids indexOf pitfalls) and destructure did from resolveHandle. resolveHandle accepts an args object (resolveHandle(args = {})) and can receive an optional $ param — other callers pass $; include it here only if detectFacets needs request context.

Apply this diff:

-    const start = text.indexOf(match[3], match.index) - 1;
-    const handle = await app.resolveHandle({
-      params: {
-        handle: match[3],
-      },
-    });
+    const start = match.index + (match[1]?.length || 0); // position of '@'
+    const { did } = await app.resolveHandle({
+      params: { handle: match[3] },
+    });
@@
-        {
-          $type: "app.bsky.richtext.facet#mention",
-          did: handle.did, // must be resolved afterwards
-        },
+        {
+          $type: "app.bsky.richtext.facet#mention",
+          did, // resolved DID
+        },

Also applies to: 41-46

@vercel
Copy link

vercel bot commented Sep 23, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

2 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
pipedream-docs Ignored Ignored Sep 23, 2025 4:38pm
pipedream-docs-redirect-do-not-edit Ignored Ignored Sep 23, 2025 4:38pm

Copy link
Collaborator

@michelle0927 michelle0927 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@vunguyenhung vunguyenhung merged commit cc5ace9 into master Sep 24, 2025
10 checks passed
@vunguyenhung vunguyenhung deleted the 15446-action-bluesky-hash-tags-rich-text-facets branch September 24, 2025 05:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ACTION] Bluesky Hash Tags (Rich text facets)

4 participants