Skip to content

Conversation

@everettbu
Copy link
Contributor

@everettbu everettbu commented Jul 26, 2025

Test 4

Summary by CodeRabbit

  • New Features

    • Added support for embedding Discourse discussions on external sites via iframe, including dynamic resizing and custom styles.
    • Introduced the ability to import and update topics from RSS/ATOM feeds automatically.
    • Added new site settings for embedding and feed polling configuration.
    • Enabled posts to render raw HTML when specified.
  • Bug Fixes

    • Improved handling of relative URLs in embedded post content, ensuring they are converted to absolute URLs.
  • Documentation

    • Added new localization strings and site setting descriptions for embedding and feed importing features.
  • Chores

    • Introduced new tests covering embedding, feed polling, and raw HTML post rendering.
    • Added new dependencies for feed parsing and HTML readability.

@coderabbitai
Copy link

coderabbitai bot commented Jul 26, 2025

Walkthrough

This change set introduces a full-featured embedding and feed import system. It adds new models, controllers, jobs, routes, and views to allow external content to be embedded and imported into the application, including support for raw HTML posts, RSS/ATOM feed polling, and topic retrieval. Supporting tests, migrations, site settings, and localization strings are included.

Changes

Cohort / File(s) Change Summary
Gem Dependencies
Gemfile
Added ruby-readability and simple-rss gems for feed importing and embedding, with require: false.
Frontend Embedding Assets
app/assets/javascripts/embed.js, app/assets/stylesheets/embed.css.scss
Introduced JS for dynamic iframe embedding and event handling; added SCSS for embed layout styling.
Embed Controller & Routing
app/controllers/embed_controller.rb, config/routes.rb
Added EmbedController with a best action, embeddable host validation, and a new /embed/best route.
Embed Views & Layouts
app/views/embed/best.html.erb, app/views/embed/loading.html.erb, app/views/layouts/embed.html.erb
New templates for rendering embedded discussions, loading state, and a custom layout with resize messaging.
Topic Embedding Model
app/models/topic_embed.rb
New model for managing embedded topics, importing/updating content, and handling remote retrieval and URL absolutization.
Post Model and Cooking
app/models/post.rb, db/migrate/20131219203905_add_cook_method_to_posts.rb
Added cook_method enum and logic to allow raw HTML posts; migration adds cook_method column.
Feed Polling & Topic Retrieval Jobs
app/jobs/regular/retrieve_topic.rb, app/jobs/scheduled/poll_feed.rb, lib/topic_retriever.rb
Added background jobs for retrieving topics and polling feeds; new service class for topic retrieval with throttling and validation.
Post Creation & Revision
lib/post_creator.rb, lib/post_revisor.rb
Modified to support cook_method assignment and optional validation skipping during revisions.
Disqus Import Adaptation
lib/tasks/disqus.thor
Removed category option and switched to using TopicEmbed.import_remote for thread import.
Migrations: Embedding & Misc
db/migrate/20131217174004_create_topic_embeds.rb, db/migrate/20131219203905_add_cook_method_to_posts.rb, db/migrate/20131223171005_create_top_topics.rb, db/migrate/20131210181901_migrate_word_counts.rb
Added migrations for topic_embeds table and cook_method on posts, adjusted table creation with force: true, and minor whitespace fixes.
Site Settings & Localization
config/site_settings.yml, config/locales/client.en.yml, config/locales/server.en.yml
Added new embedding and feed polling settings; introduced related localization strings.
Specs: Embedding & Feed
spec/controllers/embed_controller_spec.rb, spec/components/topic_retriever_spec.rb, spec/jobs/poll_feed_spec.rb, spec/models/topic_embed_spec.rb
New RSpec test files for controller, model, job, and service logic covering embedding and feed polling.

Sequence Diagram(s)

sequenceDiagram
    participant ExternalSite as External Site
    participant User as User
    participant EmbedController as EmbedController
    participant TopicEmbed as TopicEmbed
    participant TopicRetriever as TopicRetriever
    participant Jobs as Background Jobs
    participant DB as Database

    ExternalSite->>EmbedController: GET /embed/best?embed_url=...
    EmbedController->>TopicEmbed: topic_id_for_embed(embed_url)
    alt Topic exists
        EmbedController->>DB: Fetch topic/posts
        EmbedController-->>ExternalSite: Render embed view
    else Topic missing
        EmbedController->>Jobs: Enqueue retrieve_topic (embed_url)
        EmbedController-->>ExternalSite: Render loading view
        Jobs->>TopicRetriever: retrieve(embed_url)
        TopicRetriever->>TopicEmbed: import_remote(user, embed_url)
        TopicEmbed->>DB: Create Topic/Post/Embed
    end
Loading
sequenceDiagram
    participant Scheduler as Scheduler
    participant Jobs as PollFeed Job
    participant SimpleRSS as SimpleRSS
    participant TopicEmbed as TopicEmbed
    participant DB as Database

    Scheduler->>Jobs: Run hourly
    Jobs->>SimpleRSS: Parse feed from feed_polling_url
    loop For each feed item
        Jobs->>TopicEmbed: import(user, item_url, title, content)
        TopicEmbed->>DB: Create or update Topic/Post/Embed
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐇
A bunny hops through code anew,
Embedding posts and topics too.
Feeds now poll and topics grow,
With iframes styled in tidy rows.
HTML or markdown, both can shine—
This rabbit’s work is quite divine!
Hooray for embeds, so sleek and fine!

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch embed-url-handling-post

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Explain this complex logic.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai generate unit tests to generate unit tests for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai
Copy link

coderabbitai bot commented Jul 29, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@everettbu
Copy link
Contributor Author

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Jul 29, 2025

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 11

🧹 Nitpick comments (18)
db/migrate/20131219203905_add_cook_method_to_posts.rb (1)

1-5: LGTM! Consider documenting the cook_method enum values.

The migration correctly adds the cook_method column with appropriate constraints. Since this appears to support different cooking methods including raw HTML (as mentioned in the AI summary), consider adding a comment or documentation explaining the enum values, with 1 likely being the default cooking method.

app/views/embed/loading.html.erb (1)

6-12: Consider making the reload timeout configurable.

The hard-coded 30-second timeout works but could be made more flexible through a site setting or configuration parameter.

Consider extracting the timeout to a configurable value:

<script>
  (function() {
    setTimeout(function() {
      document.location.reload();
-    }, 30000);
+    }, <%= SiteSetting.embed_loading_timeout || 30000 %>);
  })();
</script>

Additionally, consider implementing exponential backoff for repeated failures or adding error handling for network issues.

db/migrate/20131217174004_create_topic_embeds.rb (2)

3-3: Consider the implications of using force: true in production.

The force: true option will drop and recreate the table if it exists, which could cause data loss in production environments. This is typically acceptable for initial feature migrations but should be used carefully.

If this migration might run in environments where the table could already exist with data, consider removing force: true or adding a conditional check.


4-7: Consider adding foreign key constraints for data integrity.

The migration creates topic_id and post_id columns but doesn't establish foreign key relationships. Adding foreign key constraints would ensure referential integrity.

Consider adding foreign key constraints:

    create_table :topic_embeds, force: true do |t|
      t.integer :topic_id, null: false
      t.integer :post_id, null: false
      t.string :embed_url, null: false
      t.string :content_sha1, null: false, limit: 40
      t.timestamps
    end

+   add_foreign_key :topic_embeds, :topics
+   add_foreign_key :topic_embeds, :posts
app/jobs/regular/retrieve_topic.rb (1)

5-6: Remove extra empty lines per RuboCop conventions.

The static analysis tool correctly identifies extra empty lines at module body boundaries.

 module Jobs
-
   # Asynchronously retrieve a topic from an embedded site
   class RetrieveTopic < Jobs::Base
   end
-
 end
-

Also applies to: 21-22

spec/models/topic_embed_spec.rb (1)

4-5: Consider cleaning up extra empty lines for consistency.

Multiple RuboCop violations for extra empty lines at block boundaries. While not critical, cleaning these up would improve code consistency.

 describe TopicEmbed do
-
   it { should belong_to :topic }
   context '.import' do
-
     let(:user) { Fabricate(:user) }
     end
-
   end
-
 end
-

Also applies to: 12-13, 43-48

spec/jobs/poll_feed_spec.rb (1)

5-6: Remove extra empty lines per RuboCop conventions.

Static analysis correctly identifies extra empty lines at block body boundaries.

 describe Jobs::PollFeed do
-
   let(:poller) { Jobs::PollFeed.new }
   end
-
 end

Also applies to: 39-40

spec/components/topic_retriever_spec.rb (2)

5-6: Fix formatting issues flagged by RuboCop.

The static analysis tool has identified extra empty lines that should be removed for consistency with the project's style guidelines.

Apply this diff to fix the formatting:

 describe TopicRetriever do
-
   let(:embed_url) { "http://eviltrout.com/2013/02/10/why-discourse-uses-emberjs.html" }
     end
-
   end
-
 end

Also applies to: 43-46


32-42: Consider adding edge case tests for throttling logic.

The throttling tests cover the basic scenarios, but consider adding tests for:

  • Boundary conditions around the throttling timeframe
  • Error handling when retrieved_recently? fails
  • Concurrent access scenarios

This would strengthen the test coverage for the throttling mechanism which is important for preventing abuse.

app/assets/stylesheets/embed.css.scss (1)

14-27: Consider responsive design for author section.

The fixed width of 92px for the author section and the corresponding margin-left: 92px for content may not work well on mobile devices or smaller screens.

Consider using flexbox or CSS Grid for a more responsive layout:

-  .author {
-    padding: 20px 0;
-    width: 92px;
-    float: left;
-    text-align: center;
-    h3 {
-      text-align: center;
-      color: #4a6b82;
-      font-size: 13px;
-      margin: 0;
-    }
-  }
-
-  .cooked {
-    padding: 20px 0;
-    margin-left: 92px;
-    p {
-      margin: 0 0 1em 0;
-    }
-  }
+  display: flex;
+  gap: 20px;
+
+  .author {
+    flex-shrink: 0;
+    width: 92px;
+    text-align: center;
+    h3 {
+      color: #4a6b82;
+      font-size: 13px;
+      margin: 0;
+    }
+  }
+
+  .cooked {
+    flex: 1;
+    p {
+      margin: 0 0 1em 0;
+    }
+  }
app/views/embed/best.html.erb (2)

20-20: Replace inline styles with CSS classes.

Inline styles reduce maintainability and override CSS specificity rules. Move this to the stylesheet.

Add to embed.css.scss:

.clearfix {
  clear: both;
}

Then update the template:

-      <div style='clear: both'></div>
+      <div class='clearfix'></div>

14-14: Consider internationalizing the date format.

The hardcoded date format "%e %b %Y" may not be appropriate for all locales.

Consider using Rails' localization helpers:

-      <%= link_to post.created_at.strftime("%e %b %Y"), post.url, class: 'post-date', target: "_blank" %>
+      <%= link_to l(post.created_at, format: :short), post.url, class: 'post-date', target: "_blank", rel: "noopener" %>
app/jobs/scheduled/poll_feed.rb (1)

32-33: Improve URL validation logic.

The current URL validation is fragile and could miss edge cases or allow invalid URLs to pass through.

Use more robust URL validation:

-        url = i.link
-        url = i.id if url.blank? || url !~ /^https?\:\/\//
+        url = i.link.to_s.strip
+        if url.blank? || !url.match?(/\Ahttps?:\/\//)
+          url = i.id.to_s.strip
+          next if url.blank? || !url.match?(/\Ahttps?:\/\//)
+        end
app/assets/javascripts/embed.js (1)

15-24: Add input validation for postMessage data.

The message handler should validate the data structure more thoroughly to prevent potential issues.

Improve data validation:

 function postMessageReceived(e) {
   if (!e) { return; }
   var allowedOrigin = new URL(discourseUrl).origin;
   if (e.origin !== allowedOrigin) { return; }

-  if (e.data) {
-    if (e.data.type === 'discourse-resize' && e.data.height) {
-      iframe.height = e.data.height + "px";
-    }
-  }
+  if (e.data && typeof e.data === 'object') {
+    if (e.data.type === 'discourse-resize' && 
+        typeof e.data.height === 'number' && 
+        e.data.height > 0 && 
+        e.data.height < 10000) {
+      iframe.height = e.data.height + "px";
+    }
+  }
 }
spec/controllers/embed_controller_spec.rb (2)

4-5: Fix formatting: Remove extra empty lines.

Static analysis identified unnecessary empty lines at block beginnings.

 describe EmbedController do
-
   let(:host) { "eviltrout.com" }
     context "success" do
-
       before do

Also applies to: 30-31


54-55: Fix formatting: Remove extra empty lines.

Static analysis identified unnecessary empty lines at block endings.

     end
-
   end
-
 end

Also applies to: 57-58

lib/topic_retriever.rb (1)

48-53: Consider adding error handling for user lookup.

While the current implementation handles the case where the user is blank, it might be helpful to log when the configured embed user is not found.

Add logging for debugging:

 def fetch_http
   user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
-  return if user.blank?
+  if user.blank?
+    Rails.logger.warn "Embed user '#{SiteSetting.embed_by_username}' not found"
+    return
+  end

   TopicEmbed.import_remote(user, @embed_url)
 end
app/models/topic_embed.rb (1)

56-76: Consider refactoring to reduce complexity.

The method has high complexity metrics (ABC: 26.65/23, Cyclomatic: 9/7) but the logic is functionally correct. The complexity stems from handling multiple tag types and URL manipulations.

Consider extracting helper methods to improve readability:

def self.absolutize_urls(url, contents)
  uri = URI(url)
  base_url = build_base_url(uri)
  
  fragment = Nokogiri::HTML.fragment(contents)
  absolutize_links(fragment, base_url)
  absolutize_images(fragment, base_url)
  
  fragment.to_html
end

private

def self.build_base_url(uri)
  prefix = "#{uri.scheme}://#{uri.host}"
  prefix << ":#{uri.port}" unless [80, 443].include?(uri.port)
  prefix
end

def self.absolutize_links(fragment, base_url)
  fragment.css('a').each do |a|
    href = a['href']
    a['href'] = absolutize_path(base_url, href) if relative_path?(href)
  end
end

def self.absolutize_images(fragment, base_url)
  fragment.css('img').each do |img|
    src = img['src']
    img['src'] = absolutize_path(base_url, src) if relative_path?(src)
  end
end

def self.relative_path?(path)
  path.present? && path.start_with?('/')
end

def self.absolutize_path(base_url, path)
  "#{base_url}/#{path.sub(/^\/+/, '')}"
end
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62db063 and 4f8aed2.

⛔ Files ignored due to path filters (1)
  • Gemfile_rails4.lock is excluded by !**/*.lock
📒 Files selected for processing (27)
  • Gemfile (1 hunks)
  • app/assets/javascripts/embed.js (1 hunks)
  • app/assets/stylesheets/embed.css.scss (1 hunks)
  • app/controllers/embed_controller.rb (1 hunks)
  • app/jobs/regular/retrieve_topic.rb (1 hunks)
  • app/jobs/scheduled/poll_feed.rb (1 hunks)
  • app/models/post.rb (2 hunks)
  • app/models/topic_embed.rb (1 hunks)
  • app/views/embed/best.html.erb (1 hunks)
  • app/views/embed/loading.html.erb (1 hunks)
  • app/views/layouts/embed.html.erb (1 hunks)
  • config/locales/client.en.yml (1 hunks)
  • config/locales/server.en.yml (2 hunks)
  • config/routes.rb (1 hunks)
  • config/site_settings.yml (1 hunks)
  • db/migrate/20131210181901_migrate_word_counts.rb (2 hunks)
  • db/migrate/20131217174004_create_topic_embeds.rb (1 hunks)
  • db/migrate/20131219203905_add_cook_method_to_posts.rb (1 hunks)
  • db/migrate/20131223171005_create_top_topics.rb (1 hunks)
  • lib/post_creator.rb (1 hunks)
  • lib/post_revisor.rb (1 hunks)
  • lib/tasks/disqus.thor (1 hunks)
  • lib/topic_retriever.rb (1 hunks)
  • spec/components/topic_retriever_spec.rb (1 hunks)
  • spec/controllers/embed_controller_spec.rb (1 hunks)
  • spec/jobs/poll_feed_spec.rb (1 hunks)
  • spec/models/topic_embed_spec.rb (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
app/jobs/regular/retrieve_topic.rb (1)
lib/topic_retriever.rb (1)
  • retrieve (8-10)
app/controllers/embed_controller.rb (2)
app/models/topic_embed.rb (1)
  • topic_id_for_embed (78-80)
app/controllers/application_controller.rb (1)
  • discourse_expires_in (168-171)
🪛 RuboCop (1.76.1)
app/jobs/regular/retrieve_topic.rb

[convention] 5-6: Extra empty line detected at module body beginning.

(Layout/EmptyLinesAroundModuleBody)


[convention] 21-22: Extra empty line detected at module body end.

(Layout/EmptyLinesAroundModuleBody)

spec/models/topic_embed_spec.rb

[convention] 4-5: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 12-13: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 43-44: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 45-46: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 47-48: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

spec/components/topic_retriever_spec.rb

[convention] 5-6: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 43-44: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 45-46: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

spec/jobs/poll_feed_spec.rb

[convention] 5-6: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 39-40: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

spec/controllers/embed_controller_spec.rb

[convention] 4-5: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 30-31: Extra empty line detected at block body beginning.

(Layout/EmptyLinesAroundBlockBody)


[convention] 54-55: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)


[convention] 57-58: Extra empty line detected at block body end.

(Layout/EmptyLinesAroundBlockBody)

app/models/topic_embed.rb

[convention] 10-42: Assignment Branch Condition size for import is too high. [<7, 23, 6> 24.78/23]

(Metrics/AbcSize)


[convention] 56-76: Assignment Branch Condition size for absolutize_urls is too high. [<9, 23, 10> 26.65/23]

(Metrics/AbcSize)


[convention] 56-76: Cyclomatic complexity for absolutize_urls is too high. [9/7]

(Metrics/CyclomaticComplexity)


[convention] 56-76: Perceived complexity for absolutize_urls is too high. [9/8]

(Metrics/PerceivedComplexity)

🪛 Brakeman (7.0.2)
app/jobs/scheduled/poll_feed.rb

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

🔇 Additional comments (30)
Gemfile (1)

209-212: Feed dependencies verified and approved

  • Confirmed that ruby-readability (v0.7.2) and simple-rss (v1.3.3) are both at their latest releases.
  • No known security advisories found for either gem.
  • Gems are correctly added with require: false and a clear comment.

Good to merge.

db/migrate/20131210181901_migrate_word_counts.rb (2)

3-3: Minor formatting cleanup - LGTM.

This appears to be just whitespace cleanup with no functional changes to the migration logic.


33-33: File ending cleanup - LGTM.

Minor formatting improvement at end of file.

config/locales/client.en.yml (1)

1477-1477: Appropriate localization addition for embedding feature.

The new "Embedding" category label is well-placed and follows the existing localization patterns. This supports the broader embedding functionality being introduced.

config/routes.rb (1)

245-246: Embed route and controller security verified

The new GET embed/best route follows Rails conventions, and the EmbedController implements the following protections:

  • Requires a valid embed_url parameter.
  • Uses a before_filter :ensure_embeddable to:
    • Verify SiteSetting.embeddable_host is set.
    • Compare the request’s referer host against embeddable_host.
    • Rescue invalid URIs and raise Discourse::InvalidAccess.
    • Set X-Frame-Options: ALLOWALL only after validation.
  • Does not skip CSRF protection for GET requests.

No further security changes are needed.

db/migrate/20131223171005_create_top_topics.rb (1)

3-3: No change needed for force: true in this migration

This create_top_topics migration is one of many early schema files (10+ under db/migrate) that use force: true to drop and recreate tables during initial setup. Since these migrations have already been applied in production and won’t be re‐run against an existing database, there’s no active risk of data loss. Removing force: true here would break consistency with the rest of the codebase’s historical migrations.

Likely an incorrect or invalid review comment.

lib/post_creator.rb (1)

216-216: LGTM!

The addition follows the established pattern for optional attribute setting and properly checks for presence before assignment. This change supports the new embedding functionality by allowing posts to specify their cooking method.

config/site_settings.yml (1)

353-357: LGTM!

The new embedding configuration section is well-structured with appropriate defaults and clear naming. The settings properly support the embedding and feed polling functionality introduced in this PR.

lib/post_revisor.rb (1)

85-85: Limit and document all skip_validations usage

The new conditional bypass (validate: !@opts[:skip_validations]) appears in multiple places:

  • lib/topic_creator.rb → save_topic
  • lib/post_creator.rb → save_post & SpamRulesEnforcer guard
  • lib/post_revisor.rb → revise!
  • app/models/topic_embed.rb → initial embed creation & revision
  • app/controllers/posts_controller.rb → permitted param (gated by api_key_valid?)
  • script/profile_db_generator.rb, db/fixtures/999_topics.rb & spec/components/post_creator_spec.rb

Action items:

  • Confirm :skip_validations can only be set in trusted code paths or via a valid API key.
  • Review api_key_valid? to ensure it cannot be bypassed.
  • Add inline comments or update the README to document when and why skipping validations is safe.
db/migrate/20131217174004_create_topic_embeds.rb (1)

11-11: LGTM!

The unique index on embed_url is essential for preventing duplicate embeddings and ensuring data integrity.

app/jobs/regular/retrieve_topic.rb (1)

9-18: LGTM! Clean job implementation with appropriate error handling.

The job correctly validates the required embed_url parameter, safely handles optional user lookup, and appropriately delegates to the TopicRetriever service. The no-throttle privilege for staff users is a sensible design choice.

spec/models/topic_embed_spec.rb (3)

18-21: LGTM! Good edge case testing.

The test properly verifies that malformed URLs are handled gracefully by returning nil and not creating any records.


29-35: Excellent test coverage of the embedding functionality.

The test comprehensively verifies:

  • Raw HTML rendering method is used
  • Relative URLs are converted to absolute URLs based on the embed URL
  • TopicEmbed record is created and associated

39-42: LGTM! Post update functionality is properly tested.

The test verifies that importing the same URL updates the existing post content, which is the expected behavior for the embedding system.

spec/jobs/poll_feed_spec.rb (1)

12-37: Excellent test coverage of conditional execution logic.

The test suite thoroughly verifies that the poll_feed method is only called when all required site settings are properly configured:

  • feed_polling_enabled? must be true
  • feed_polling_url must be present
  • embed_by_username must be present

The use of stubs and expectations ensures the job's guard clauses work correctly.

spec/components/topic_retriever_spec.rb (1)

9-25: LGTM! Good validation test coverage.

The tests properly cover the essential security validations:

  • Checking that embeddable_host setting is configured
  • Verifying host matching to prevent unauthorized embedding
  • Validating URL format to prevent malformed input

These are critical security checks for the embedding feature.

app/assets/stylesheets/embed.css.scss (1)

60-69: LGTM! Clean styling for logo and buttons.

The CSS for logo positioning and button styling is well-structured and follows good practices with proper use of max-height for responsive images.

app/models/post.rb (2)

63-65: LGTM! Clean enum definition for cook methods.

The enum definition follows Rails conventions and clearly separates regular posts from raw HTML posts. The naming is descriptive and the implementation is straightforward.


130-137: Approved: Conditional cooking logic and raw_html usage verified

  • Documentation and early-return for raw_html are clear and appropriate.
  • Verified that Post.cook_methods[:raw_html] is only set in app/models/topic_embed.rb (via PostCreator) and covered by spec/models/topic_embed_spec.rb.
  • No other occurrences of cook_method == Post.cook_methods[:raw_html] or direct assignments found in the codebase.

All good to merge.

config/locales/server.en.yml (2)

32-38: LGTM! Comprehensive localization for embed UI.

The embed localization strings are well-written and cover all the UI elements needed:

  • Discussion highlights and navigation
  • Loading states
  • Attribution links
  • Clear, user-friendly language

These match the template usage I saw in the view files.


768-771: Clear and informative site setting descriptions.

The embedding and feed polling setting descriptions are:

  • Concise but informative
  • Clearly explain the purpose and impact
  • Use consistent terminology
  • Provide adequate context for administrators

The settings align well with the functionality implemented in the other files.

lib/tasks/disqus.thor (1)

148-148: LGTM - Clean integration with new embedding system.

The change from category-based post creation to TopicEmbed.import_remote aligns well with the new embedding infrastructure. This simplifies the import process by delegating topic creation to the embedding system.

spec/controllers/embed_controller_spec.rb (2)

40-46: LGTM - Good test coverage for topic retrieval.

The test properly verifies that when no embed exists, the controller triggers topic retrieval through TopicRetriever. The mocking approach correctly isolates the controller behavior.


48-52: LGTM - Comprehensive test for existing topic scenario.

The test correctly verifies that when a topic already exists for the embed URL, a TopicView is created with appropriate parameters for displaying the best posts.

lib/topic_retriever.rb (3)

14-19: LGTM - Robust host validation with proper error handling.

The host validation correctly compares against the configured embeddable host and properly handles invalid URIs by treating them as invalid hosts. The URI error handling is appropriate.


21-33: LGTM - Well-implemented throttling mechanism.

The throttling logic is sound, using Redis SETNX for atomic operations and proper TTL handling. The ability to bypass throttling for staff users is a good design choice.


35-46: Excellent defensive programming in perform_retrieve.

The method properly checks for race conditions where another process might have already created the embed, and intelligently tries RSS polling first before falling back to HTTP fetching.

app/controllers/embed_controller.rb (1)

1-6: Well-structured embedding controller setup.

The filter configuration is appropriate for an embedding context - skipping XHR checks and JSON preloading while adding essential embedding security validation.

app/models/topic_embed.rb (2)

1-8: Well-structured model with appropriate validations.

The model structure is clean with proper associations and validations. The explicit nokogiri requirement suggests it's used for HTML processing within the model.


78-80: Efficient and well-implemented query method.

The use of pluck to retrieve just the topic_id is more efficient than loading full objects, and the logic is correct.

Comment on lines +5 to +12
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
iframe.id = 'discourse-embed-frame';
iframe.width = "100%";
iframe.frameBorder = "0";
iframe.scrolling = "no";
comments.appendChild(iframe);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for DOM manipulation.

The code assumes discourse-comments element exists and doesn't handle cases where it might be missing.

Add defensive programming:

   var comments = document.getElementById('discourse-comments'),
       iframe = document.createElement('iframe');
+  
+  if (!comments) {
+    console.error('Element with id "discourse-comments" not found');
+    return;
+  }
+  
   iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
   iframe.id = 'discourse-embed-frame';
   iframe.width = "100%";
   iframe.frameBorder = "0";
   iframe.scrolling = "no";
   comments.appendChild(iframe);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
iframe.id = 'discourse-embed-frame';
iframe.width = "100%";
iframe.frameBorder = "0";
iframe.scrolling = "no";
comments.appendChild(iframe);
var comments = document.getElementById('discourse-comments'),
iframe = document.createElement('iframe');
if (!comments) {
console.error('Element with id "discourse-comments" not found');
return;
}
iframe.src = discourseUrl + "embed/best?embed_url=" + encodeURIComponent(discourseEmbedUrl);
iframe.id = 'discourse-embed-frame';
iframe.width = "100%";
iframe.frameBorder = "0";
iframe.scrolling = "no";
comments.appendChild(iframe);
🤖 Prompt for AI Agents
In app/assets/javascripts/embed.js around lines 5 to 12, the code assumes the
element with id 'discourse-comments' exists without checking. To fix this, add a
conditional check to verify that the 'discourse-comments' element is not null
before attempting to create and append the iframe. If the element is missing,
avoid executing the iframe creation and appending logic to prevent runtime
errors.


function postMessageReceived(e) {
if (!e) { return; }
if (discourseUrl.indexOf(e.origin) === -1) { return; }
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Security vulnerability: Weak origin validation.

The current origin validation using indexOf is insufficient and can be bypassed. An attacker could use a malicious domain like evil-discourseUrl.com to pass this check.

Use proper origin validation:

-    if (discourseUrl.indexOf(e.origin) === -1) { return; }
+    var allowedOrigin = new URL(discourseUrl).origin;
+    if (e.origin !== allowedOrigin) { return; }
🤖 Prompt for AI Agents
In app/assets/javascripts/embed.js at line 17, the origin validation uses
indexOf which can be bypassed by malicious domains containing the discourseUrl
as a substring. Replace this check with a strict equality comparison or use URL
parsing to verify that e.origin exactly matches the expected discourseUrl origin
to ensure proper security validation.

Comment on lines +8 to +20
def best
embed_url = params.require(:embed_url)
topic_id = TopicEmbed.topic_id_for_embed(embed_url)

if topic_id
@topic_view = TopicView.new(topic_id, current_user, {best: 5})
else
Jobs.enqueue(:retrieve_topic, user_id: current_user.try(:id), embed_url: embed_url)
render 'loading'
end

discourse_expires_in 1.minute
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add explicit rendering and improve error handling.

The method logic is sound but has a few areas for improvement:

  1. Missing explicit render call when topic_id is found
  2. Cache expiration is set regardless of content state
  3. No error handling for malformed embed_url parameter

Apply this diff to improve the method:

 def best
   embed_url = params.require(:embed_url)
+  
+  # Validate URL format
+  begin
+    URI.parse(embed_url)
+  rescue URI::InvalidURIError
+    render json: { error: 'Invalid embed URL' }, status: 400
+    return
+  end
+  
   topic_id = TopicEmbed.topic_id_for_embed(embed_url)

   if topic_id
     @topic_view = TopicView.new(topic_id, current_user, {best: 5})
+    render 'best'
+    discourse_expires_in 1.minute
   else
     Jobs.enqueue(:retrieve_topic, user_id: current_user.try(:id), embed_url: embed_url)
     render 'loading'
   end
-
-  discourse_expires_in 1.minute
 end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def best
embed_url = params.require(:embed_url)
topic_id = TopicEmbed.topic_id_for_embed(embed_url)
if topic_id
@topic_view = TopicView.new(topic_id, current_user, {best: 5})
else
Jobs.enqueue(:retrieve_topic, user_id: current_user.try(:id), embed_url: embed_url)
render 'loading'
end
discourse_expires_in 1.minute
end
def best
embed_url = params.require(:embed_url)
# Validate URL format
begin
URI.parse(embed_url)
rescue URI::InvalidURIError
render json: { error: 'Invalid embed URL' }, status: 400
return
end
topic_id = TopicEmbed.topic_id_for_embed(embed_url)
if topic_id
@topic_view = TopicView.new(topic_id, current_user, {best: 5})
render 'best'
discourse_expires_in 1.minute
else
Jobs.enqueue(
:retrieve_topic,
user_id: current_user.try(:id),
embed_url: embed_url
)
render 'loading'
end
end
🤖 Prompt for AI Agents
In app/controllers/embed_controller.rb around lines 8 to 20, add an explicit
render call when topic_id is found to ensure a response is returned. Move the
discourse_expires_in call inside the conditional branches so cache expiration is
set only when appropriate. Add error handling for malformed embed_url by
rescuing parameter errors and rendering an error or fallback response
accordingly.

Comment on lines +24 to +31
def ensure_embeddable
raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host

response.headers['X-Frame-Options'] = "ALLOWALL"
rescue URI::InvalidURIError
raise Discourse::InvalidAccess.new('invalid referer host')
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Strengthen embedding security controls.

The security validation is functional but could be more robust:

  1. X-Frame-Options: ALLOWALL is overly permissive and defeats clickjacking protection
  2. Referer validation can be bypassed and should be supplemented with additional checks
  3. The fallback to empty string for nil referer could mask legitimate validation failures

Consider this more secure approach:

 def ensure_embeddable
   raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
-  raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host
+  
+  referer = request.referer
+  raise Discourse::InvalidAccess.new('referer required for embedding') if referer.blank?
+  raise Discourse::InvalidAccess.new('invalid referer host') if URI(referer).host != SiteSetting.embeddable_host

-  response.headers['X-Frame-Options'] = "ALLOWALL"
+  # Only allow framing from the configured embeddable host
+  response.headers['X-Frame-Options'] = "ALLOW-FROM https://#{SiteSetting.embeddable_host}"
+  response.headers['Content-Security-Policy'] = "frame-ancestors https://#{SiteSetting.embeddable_host}"
 rescue URI::InvalidURIError
-  raise Discourse::InvalidAccess.new('invalid referer host')
+  raise Discourse::InvalidAccess.new('invalid referer URL')
 end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def ensure_embeddable
raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(request.referer || '').host != SiteSetting.embeddable_host
response.headers['X-Frame-Options'] = "ALLOWALL"
rescue URI::InvalidURIError
raise Discourse::InvalidAccess.new('invalid referer host')
end
def ensure_embeddable
raise Discourse::InvalidAccess.new('embeddable host not set') if SiteSetting.embeddable_host.blank?
referer = request.referer
raise Discourse::InvalidAccess.new('referer required for embedding') if referer.blank?
raise Discourse::InvalidAccess.new('invalid referer host') if URI(referer).host != SiteSetting.embeddable_host
# Only allow framing from the configured embeddable host
response.headers['X-Frame-Options'] = "ALLOW-FROM https://#{SiteSetting.embeddable_host}"
response.headers['Content-Security-Policy'] = "frame-ancestors https://#{SiteSetting.embeddable_host}"
rescue URI::InvalidURIError
raise Discourse::InvalidAccess.new('invalid referer URL')
end
🤖 Prompt for AI Agents
In app/controllers/embed_controller.rb around lines 24 to 31, the current
embedding security is weak because it sets 'X-Frame-Options' to 'ALLOWALL',
which disables clickjacking protection, and the referer validation is easily
bypassed. To fix this, replace 'ALLOWALL' with a stricter header like
'SAMEORIGIN' or use Content Security Policy frame-ancestors directive to
restrict embedding to trusted hosts. Also, improve referer validation by
checking for presence and validity explicitly instead of defaulting to an empty
string, and consider adding additional verification such as checking an
authentication token or origin header to strengthen security.

Comment on lines +24 to +38
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?

require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)

rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//

content = CGI.unescapeHTML(i.content.scrub)
TopicEmbed.import(user, url, i.title, content)
end
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add comprehensive error handling for feed parsing.

The poll_feed method lacks error handling for several failure scenarios that could cause the job to crash silently or behave unexpectedly.

Add proper error handling:

 def poll_feed
   user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
   return if user.blank?

+  begin
     require 'simple-rss'
     # ... (use safer HTTP client as suggested above)
     
     rss.items.each do |i|
       url = i.link
       url = i.id if url.blank? || url !~ /^https?\:\/\//
+      next if url.blank?

       content = CGI.unescapeHTML(i.content.scrub)
       TopicEmbed.import(user, url, i.title, content)
     end
+  rescue => e
+    Rails.logger.error "Feed polling failed: #{e.message}"
+    # Optionally notify administrators
+  end
 end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?
require 'simple-rss'
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//
content = CGI.unescapeHTML(i.content.scrub)
TopicEmbed.import(user, url, i.title, content)
end
end
def poll_feed
user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
return if user.blank?
begin
require 'simple-rss'
# ... (use safer HTTP client as suggested above)
rss = SimpleRSS.parse open(SiteSetting.feed_polling_url)
rss.items.each do |i|
url = i.link
url = i.id if url.blank? || url !~ /^https?\:\/\//
next if url.blank?
content = CGI.unescapeHTML(i.content.scrub)
TopicEmbed.import(user, url, i.title, content)
end
rescue => e
Rails.logger.error "Feed polling failed: #{e.message}"
# Optionally notify administrators
end
end
🧰 Tools
🪛 Brakeman (7.0.2)

[medium] 29-29: Model attribute used in file name
Type: File Access
Confidence: Medium
More info: https://brakemanscanner.org/docs/warning_types/file_access/

(File Access)

🤖 Prompt for AI Agents
In app/jobs/scheduled/poll_feed.rb around lines 24 to 38, the poll_feed method
currently does not handle errors that may occur during feed fetching or parsing,
which can cause silent failures. Wrap the feed fetching and parsing logic in a
begin-rescue block to catch exceptions such as network errors or parsing errors.
Log or handle these exceptions appropriately to prevent the job from crashing
silently and to aid in debugging issues with the feed.

Comment on lines +10 to +42
def self.import(user, url, title, contents)
return unless url =~ /^https?\:\/\//

contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"

embed = TopicEmbed.where(embed_url: url).first
content_sha1 = Digest::SHA1.hexdigest(contents)
post = nil

# If there is no embed, create a topic, post and the embed.
if embed.blank?
Topic.transaction do
creator = PostCreator.new(user, title: title, raw: absolutize_urls(url, contents), skip_validations: true, cook_method: Post.cook_methods[:raw_html])
post = creator.create
if post.present?
TopicEmbed.create!(topic_id: post.topic_id,
embed_url: url,
content_sha1: content_sha1,
post_id: post.id)
end
end
else
post = embed.post
# Update the topic if it changed
if content_sha1 != embed.content_sha1
revisor = PostRevisor.new(post)
revisor.revise!(user, absolutize_urls(url, contents), skip_validations: true, bypass_rate_limiter: true)
embed.update_column(:content_sha1, content_sha1)
end
end

post
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Refactor complex method and address security concerns.

This method has high complexity (ABC size 24.78/23) and mixes multiple responsibilities. Additionally, there are security concerns with HTML injection in the footer.

Security Issue: The I18n interpolation includes raw HTML which could lead to XSS:

-contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{url}'>#{url}</a>")}</small>\n"
+safe_url = ERB::Util.html_escape(url)
+contents << "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{safe_url}'>#{safe_url}</a>")}</small>\n"

Complexity Reduction: Consider extracting methods:

def self.import(user, url, title, contents)
  return unless valid_url?(url)
  
  contents_with_footer = add_source_footer(url, contents)
  content_sha1 = Digest::SHA1.hexdigest(contents_with_footer)
  embed = find_embed(url)
  
  if embed.blank?
    create_new_embed(user, url, title, contents_with_footer, content_sha1)
  else
    update_existing_embed(user, embed, contents_with_footer, content_sha1)
  end
end

private

def self.valid_url?(url)
  url =~ /\Ahttps?:\/\//
end

def self.add_source_footer(url, contents)
  safe_url = ERB::Util.html_escape(url)
  contents + "\n<hr>\n<small>#{I18n.t('embed.imported_from', link: "<a href='#{safe_url}'>#{safe_url}</a>")}</small>\n"
end
🧰 Tools
🪛 RuboCop (1.76.1)

[convention] 10-42: Assignment Branch Condition size for import is too high. [<7, 23, 6> 24.78/23]

(Metrics/AbcSize)

🤖 Prompt for AI Agents
In app/models/topic_embed.rb around lines 10 to 42, the import method is overly
complex and includes a security risk by interpolating raw HTML into the I18n
translation, which can lead to XSS vulnerabilities. Refactor the method by
extracting helper methods such as valid_url? to check the URL format,
add_source_footer to safely append the footer with escaped URLs, find_embed to
retrieve existing embeds, and separate methods to create or update embeds.
Ensure that URLs are HTML-escaped before interpolation to prevent injection
attacks.

Comment on lines +44 to +53
def self.import_remote(user, url, opts=nil)
require 'ruby-readability'

opts = opts || {}
doc = Readability::Document.new(open(url).read,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])

TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Critical security vulnerability: SSRF attack vector.

The open(url).read call is a serious security vulnerability that allows Server-Side Request Forgery (SSRF) attacks. An attacker could provide URLs to internal services or file:// schemes.

Replace with a safer HTTP client with proper restrictions:

 def self.import_remote(user, url, opts=nil)
   require 'ruby-readability'
+  require 'net/http'
+  require 'uri'

   opts = opts || {}
-  doc = Readability::Document.new(open(url).read,
+  
+  # Validate URL and fetch safely
+  uri = URI.parse(url)
+  raise ArgumentError, 'Only HTTP/HTTPS URLs allowed' unless %w[http https].include?(uri.scheme)
+  raise ArgumentError, 'Invalid host' if uri.host.nil?
+  
+  # Prevent SSRF to internal networks
+  raise ArgumentError, 'Access to private networks not allowed' if private_network?(uri.host)
+  
+  response = fetch_url_safely(uri)
+  doc = Readability::Document.new(response,
                                       tags: %w[div p code pre h1 h2 h3 b em i strong a img],
                                       attributes: %w[href src])

   TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
 end

+private
+
+def self.private_network?(host)
+  # Add logic to check for private IP ranges, localhost, etc.
+  %w[localhost 127.0.0.1 ::1].include?(host) ||
+  host.match(/\A(10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/)
+end
+
+def self.fetch_url_safely(uri)
+  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
+                  open_timeout: 10, read_timeout: 30) do |http|
+    request = Net::HTTP::Get.new(uri)
+    response = http.request(request)
+    raise "HTTP error: #{response.code}" unless response.code == '200'
+    response.body
+  end
+end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def self.import_remote(user, url, opts=nil)
require 'ruby-readability'
opts = opts || {}
doc = Readability::Document.new(open(url).read,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
end
def self.import_remote(user, url, opts=nil)
require 'ruby-readability'
require 'net/http'
require 'uri'
opts = opts || {}
# Validate URL and fetch safely
uri = URI.parse(url)
raise ArgumentError, 'Only HTTP/HTTPS URLs allowed' unless %w[http https].include?(uri.scheme)
raise ArgumentError, 'Invalid host' if uri.host.nil?
# Prevent SSRF to internal networks
raise ArgumentError, 'Access to private networks not allowed' if private_network?(uri.host)
response = fetch_url_safely(uri)
doc = Readability::Document.new(response,
tags: %w[div p code pre h1 h2 h3 b em i strong a img],
attributes: %w[href src])
TopicEmbed.import(user, url, opts[:title] || doc.title, doc.content)
end
private
def self.private_network?(host)
# Simple checks for localhost and common private IP ranges
%w[localhost 127.0.0.1 ::1].include?(host) ||
host.match(/\A(10\.|172\.(1[6-9]|2\d|3[01])\.|192\.168\.)/)
end
def self.fetch_url_safely(uri)
Net::HTTP.start(uri.host, uri.port,
use_ssl: uri.scheme == 'https',
open_timeout: 10, read_timeout: 30) do |http|
request = Net::HTTP::Get.new(uri)
response = http.request(request)
raise "HTTP error: #{response.code}" unless response.code == '200'
response.body
end
end
🤖 Prompt for AI Agents
In app/models/topic_embed.rb around lines 44 to 53, the use of open(url).read
introduces a critical SSRF vulnerability by allowing arbitrary URL access.
Replace this with a safer HTTP client like Net::HTTP or an HTTP gem that
enforces URL validation and restricts schemes to http/https only. Implement
checks to prevent access to internal IP ranges or local files, and handle
timeouts and errors securely to mitigate SSRF risks.

Comment on lines +3 to +6
<%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- else %>
<%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- end if %>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add security attributes to external links.

Links that open in new tabs should include rel="noopener" to prevent the new page from accessing the window.opener property, which is a security risk.

Apply this diff:

-    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+    <%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank', rel: 'noopener') %>
   <%- else %>
-    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
+    <%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank', rel: 'noopener') %>

Also apply the same fix to lines 14, 25, and 26.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- else %>
<%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank') %>
<%- end if %>
<%= link_to(I18n.t('embed.title'), @topic_view.topic.url, class: 'button', target: '_blank', rel: 'noopener') %>
<%- else %>
<%= link_to(I18n.t('embed.start_discussion'), @topic_view.topic.url, class: 'button', target: '_blank', rel: 'noopener') %>
<%- end if %>
🤖 Prompt for AI Agents
In app/views/embed/best.html.erb around lines 3 to 6, 14, 25, and 26, the
external links that open in new tabs use target="_blank" but lack the
rel="noopener" attribute, which is a security risk. Update all these link_to
helper calls to include rel: "noopener" in their HTML options to prevent the new
page from accessing window.opener.

<img src='<%= post.user.small_avatar_url %>'>
<h3><%= post.user.username %></h3>
</div>
<div class='cooked'><%= raw post.cooked %></div>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Security concern: Raw HTML rendering without sanitization.

The <%= raw post.cooked %> line renders HTML content directly without sanitization, which could introduce XSS vulnerabilities if the content isn't properly validated upstream.

Verify that the post.cooked content is properly sanitized before rendering. If this is for trusted embedded content only, ensure proper validation occurs in the TopicEmbed.import or Post.cook methods. Consider adding a comment explaining why raw HTML is safe here.

Let me check how the post content is processed:


🏁 Script executed:

#!/bin/bash
# Check how post.cooked content is processed and if it's sanitized
ast-grep --pattern 'def cook($$$) { $$$ }'
rg -A 10 "cook.*raw_html"
rg -A 5 "TopicEmbed.*import"

Length of output: 5095


Action Required: Sanitize embedded HTML and harden link‐security

It turns out that post.cooked is rendered with the :raw_html cook method (via PostCreator in TopicEmbed.import), so no HTML sanitization is applied before it’s sent to the view. Rendering it verbatim can introduce XSS if the source isn’t fully trusted. In addition, we still need to tighten up link security, remove inline styles, and localize dates:

• app/views/embed/best.html.erb (line 19)

  • Replace
    <%= raw post.cooked %>
    with something like:
    <%= sanitize(post.cooked,
         tags: %w[a p img ul ol li strong em blockquote],
         attributes: %w[href src title alt rel target]) %>
    or ensure upstream that only safe HTML is passed here.
    • Every <a target="_blank"> in this template needs a rel="noopener noreferrer".
    • Remove any inline style="…" attributes and move them into your CSS.
    • Replace hard-coded date formatting (e.g. post.created_at.strftime(...)) with I18n—e.g.
<%= l(post.created_at, format: :short) %>
🤖 Prompt for AI Agents
In app/views/embed/best.html.erb at line 19, replace the raw rendering of
post.cooked with a sanitized version using the sanitize helper, allowing only
safe tags and attributes to prevent XSS. Ensure all <a> tags with
target="_blank" include rel="noopener noreferrer" to improve link security.
Remove any inline style attributes from the HTML and move those styles into CSS
files. Also, replace any hard-coded date formatting with the I18n localization
helper l(), using a suitable format like :short for dates.

Comment on lines +6 to +15
<script>
(function() {
window.onload = function() {
if (parent) {
// Send a post message with our loaded height
parent.postMessage({type: 'discourse-resize', height: document['body'].offsetHeight}, '<%= request.referer %>');
}
}
})();
</script>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Validate the referer before using it in postMessage.

Using request.referer directly as the target origin for postMessage could be a security risk, as the referer header can be spoofed or missing.

Consider validating the referer against the configured embeddable_host setting:

<script>
  (function() {
    window.onload = function() {
      if (parent) {
+       var allowedOrigin = '<%= SiteSetting.embeddable_host %>';
+       var referer = '<%= request.referer %>';
+       var targetOrigin = allowedOrigin && referer && referer.indexOf(allowedOrigin) === 0 ? referer : '*';
        // Send a post message with our loaded height
-       parent.postMessage({type: 'discourse-resize', height: document['body'].offsetHeight}, '<%= request.referer %>');
+       parent.postMessage({type: 'discourse-resize', height: document['body'].offsetHeight}, targetOrigin);
      }
    }
  })();
</script>

This ensures the message is only sent to validated origins or uses '*' as a fallback.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<script>
(function() {
window.onload = function() {
if (parent) {
// Send a post message with our loaded height
parent.postMessage({type: 'discourse-resize', height: document['body'].offsetHeight}, '<%= request.referer %>');
}
}
})();
</script>
<script>
(function() {
window.onload = function() {
if (parent) {
var allowedOrigin = '<%= SiteSetting.embeddable_host %>';
var referer = '<%= request.referer %>';
var targetOrigin = allowedOrigin && referer && referer.indexOf(allowedOrigin) === 0
? referer
: '*';
// Send a post message with our loaded height
parent.postMessage(
{ type: 'discourse-resize', height: document['body'].offsetHeight },
targetOrigin
);
}
}
})();
</script>
🤖 Prompt for AI Agents
In app/views/layouts/embed.html.erb around lines 6 to 15, the code uses
request.referer directly as the target origin in parent.postMessage, which is
insecure. To fix this, validate request.referer against the configured
embeddable_host setting before using it. If it matches, use it as the target
origin; otherwise, fallback to '*' to ensure the message is only sent to trusted
origins or safely broadcasted.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has been open for 60 days with no activity. To keep it open, remove the stale tag, push code, or add a comment. Otherwise, it will be closed in 14 days.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants