Enhance embed URL handling and validation system #4

everettbu · 2025-07-26T19:32:21Z

Test 4

Summary by CodeRabbit

New Features
- Added support for embedding Discourse discussions on external sites via iframe, including dynamic resizing and custom styles.
- Introduced the ability to import and update topics from RSS/ATOM feeds automatically.
- Added new site settings for embedding and feed polling configuration.
- Enabled posts to render raw HTML when specified.
Bug Fixes
- Improved handling of relative URLs in embedded post content, ensuring they are converted to absolute URLs.
Documentation
- Added new localization strings and site setting descriptions for embedding and feed importing features.
Chores
- Introduced new tests covering embedding, feed polling, and raw HTML post rendering.
- Added new dependencies for feed parsing and HTML readability.

…f feedzirra

coderabbitai · 2025-07-26T19:32:29Z

Walkthrough

This change set introduces a full-featured embedding and feed import system. It adds new models, controllers, jobs, routes, and views to allow external content to be embedded and imported into the application, including support for raw HTML posts, RSS/ATOM feed polling, and topic retrieval. Supporting tests, migrations, site settings, and localization strings are included.

Changes

Cohort / File(s)	Change Summary
Gem Dependencies `Gemfile`	Added `ruby-readability` and `simple-rss` gems for feed importing and embedding, with `require: false`.
Frontend Embedding Assets `app/assets/javascripts/embed.js`, `app/assets/stylesheets/embed.css.scss`	Introduced JS for dynamic iframe embedding and event handling; added SCSS for embed layout styling.
Embed Controller & Routing `app/controllers/embed_controller.rb`, `config/routes.rb`	Added `EmbedController` with a `best` action, embeddable host validation, and a new `/embed/best` route.
Embed Views & Layouts `app/views/embed/best.html.erb`, `app/views/embed/loading.html.erb`, `app/views/layouts/embed.html.erb`	New templates for rendering embedded discussions, loading state, and a custom layout with resize messaging.
Topic Embedding Model `app/models/topic_embed.rb`	New model for managing embedded topics, importing/updating content, and handling remote retrieval and URL absolutization.
Post Model and Cooking `app/models/post.rb`, `db/migrate/20131219203905_add_cook_method_to_posts.rb`	Added `cook_method` enum and logic to allow raw HTML posts; migration adds `cook_method` column.
Feed Polling & Topic Retrieval Jobs `app/jobs/regular/retrieve_topic.rb`, `app/jobs/scheduled/poll_feed.rb`, `lib/topic_retriever.rb`	Added background jobs for retrieving topics and polling feeds; new service class for topic retrieval with throttling and validation.
Post Creation & Revision `lib/post_creator.rb`, `lib/post_revisor.rb`	Modified to support `cook_method` assignment and optional validation skipping during revisions.
Disqus Import Adaptation `lib/tasks/disqus.thor`	Removed category option and switched to using `TopicEmbed.import_remote` for thread import.
Migrations: Embedding & Misc `db/migrate/20131217174004_create_topic_embeds.rb`, `db/migrate/20131219203905_add_cook_method_to_posts.rb`, `db/migrate/20131223171005_create_top_topics.rb`, `db/migrate/20131210181901_migrate_word_counts.rb`	Added migrations for `topic_embeds` table and `cook_method` on posts, adjusted table creation with `force: true`, and minor whitespace fixes.
Site Settings & Localization `config/site_settings.yml`, `config/locales/client.en.yml`, `config/locales/server.en.yml`	Added new embedding and feed polling settings; introduced related localization strings.
Specs: Embedding & Feed `spec/controllers/embed_controller_spec.rb`, `spec/components/topic_retriever_spec.rb`, `spec/jobs/poll_feed_spec.rb`, `spec/models/topic_embed_spec.rb`	New RSpec test files for controller, model, job, and service logic covering embedding and feed polling.

Sequence Diagram(s)

sequenceDiagram
    participant ExternalSite as External Site
    participant User as User
    participant EmbedController as EmbedController
    participant TopicEmbed as TopicEmbed
    participant TopicRetriever as TopicRetriever
    participant Jobs as Background Jobs
    participant DB as Database

    ExternalSite->>EmbedController: GET /embed/best?embed_url=...
    EmbedController->>TopicEmbed: topic_id_for_embed(embed_url)
    alt Topic exists
        EmbedController->>DB: Fetch topic/posts
        EmbedController-->>ExternalSite: Render embed view
    else Topic missing
        EmbedController->>Jobs: Enqueue retrieve_topic (embed_url)
        EmbedController-->>ExternalSite: Render loading view
        Jobs->>TopicRetriever: retrieve(embed_url)
        TopicRetriever->>TopicEmbed: import_remote(user, embed_url)
        TopicEmbed->>DB: Create Topic/Post/Embed
    end
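
The first diagram reads as a lookup with a deferred fallback. A minimal Ruby sketch of that control flow, using in-memory stand-ins: the class `EmbedFlowSketch`, its hash, and its array queue are hypothetical and do not appear in the PR.

```ruby
# Hypothetical in-memory stand-ins; none of these names come from the PR.
class EmbedFlowSketch
  attr_reader :queue

  def initialize(topic_ids_by_url = {})
    @topic_ids_by_url = topic_ids_by_url # plays the role of topic_embeds
    @queue = []                          # plays the role of the job queue
  end

  # Returns [view_name, topic_id] for the view that would be rendered.
  def best(embed_url)
    topic_id = @topic_ids_by_url[embed_url]
    return [:best, topic_id] if topic_id

    # Topic missing: schedule retrieval and fall back to the loading view.
    @queue << [:retrieve_topic, embed_url]
    [:loading, nil]
  end

  # Simulates the background job finishing its import.
  def run_jobs(next_topic_id)
    @queue.each { |(_, url)| @topic_ids_by_url[url] ||= next_topic_id }
    @queue.clear
  end
end
```

A second request for the same URL before the job runs would enqueue a second entry in this sketch, which is the situation the retriever's throttling is meant to limit.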

sequenceDiagram
    participant Scheduler as Scheduler
    participant Jobs as PollFeed Job
    participant SimpleRSS as SimpleRSS
    participant TopicEmbed as TopicEmbed
    participant DB as Database

    Scheduler->>Jobs: Run hourly
    Jobs->>SimpleRSS: Parse feed from feed_polling_url
    loop For each feed item
        Jobs->>TopicEmbed: import(user, item_url, title, content)
        TopicEmbed->>DB: Create or update Topic/Post/Embed
    end
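
The "create or update" step can be sketched as follows. This assumes the 40-character `content_sha1` column on `topic_embeds` is used for change detection, which the review does not confirm; `EmbedStoreSketch` and its return symbols are hypothetical.

```ruby
require "digest"

# Hypothetical in-memory store keyed by embed URL; not the PR's model code.
class EmbedStoreSketch
  Record = Struct.new(:title, :contents, :content_sha1)

  def initialize
    @records = {}
  end

  # Returns :created, :updated, or :unchanged.
  def import(url, title, contents)
    sha1 = Digest::SHA1.hexdigest(contents) # 40 hex characters
    record = @records[url]

    if record.nil?
      @records[url] = Record.new(title, contents, sha1)
      :created
    elsif record.content_sha1 != sha1
      record.title = title
      record.contents = contents
      record.content_sha1 = sha1
      :updated
    else
      :unchanged
    end
  end
end
```

Comparing digests lets an hourly poll skip items whose content has not changed since the last run.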

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

🐇
A bunny hops through code anew,
Embedding posts and topics too.
Feeds now poll and topics grow,
With iframes styled in tidy rows.
HTML or markdown, both can shine—
This rabbit’s work is quite divine!
Hooray for embeds, so sleek and fine!

coderabbitai · 2025-07-29T00:22:08Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

everettbu · 2025-07-29T00:34:11Z

@coderabbitai review

coderabbitai · 2025-07-29T00:34:47Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai

Actionable comments posted: 11

🧹 Nitpick comments (18)

db/migrate/20131219203905_add_cook_method_to_posts.rb (1)

1-5: LGTM! Consider documenting the cook_method enum values.

The migration correctly adds the cook_method column with appropriate constraints. Since this appears to support different cooking methods including raw HTML (as mentioned in the AI summary), consider adding a comment or documentation explaining the enum values, with 1 likely being the default cooking method.
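
One way to document the values is a named mapping next to the code that branches on them. The sketch below assumes `regular` is 1 and `raw_html` is 2; the review only guesses that 1 is the default, and `cook_sketch` is a hypothetical stand-in for the real cooking pipeline.

```ruby
# Assumed values: the review only guesses that 1 is the default.
COOK_METHODS = { regular: 1, raw_html: 2 }.freeze

HTML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Stand-in for the cooking step: escapes HTML for regular posts and
# passes the raw text through unchanged for raw_html posts.
def cook_sketch(raw, cook_method = COOK_METHODS[:regular])
  case cook_method
  when COOK_METHODS[:raw_html] then raw
  when COOK_METHODS[:regular]  then raw.gsub(/[&<>"]/, HTML_ESCAPES)
  else raise ArgumentError, "unknown cook_method: #{cook_method.inspect}"
  end
end
```

Raising on unknown values makes a bad column value visible instead of silently falling back to one of the two paths.
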
app/views/embed/loading.html.erb (1)
6-12: Consider making the reload timeout configurable.

The hard-coded 30-second timeout works but could be made more flexible through a site setting or configuration parameter.

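Consider extracting the timeout to a configurable value. The `embed_loading_timeout` setting used below is illustrative and would need to be added to `config/site_settings.yml`: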
<script>
  (function() {
    setTimeout(function() {
      document.location.reload();
-    }, 30000);
+    }, <%= SiteSetting.embed_loading_timeout || 30000 %>);
  })();
</script>
Additionally, consider implementing exponential backoff for repeated failures or adding error handling for network issues.
db/migrate/20131217174004_create_topic_embeds.rb (2)
3-3: Consider the implications of using force: true in production.

The force: true option will drop and recreate the table if it exists, which could cause data loss in production environments. This is typically acceptable for initial feature migrations but should be used carefully.

If this migration might run in environments where the table could already exist with data, consider removing force: true or adding a conditional check.

4-7: Consider adding foreign key constraints for data integrity.

The migration creates topic_id and post_id columns but doesn't establish foreign key relationships. Adding foreign key constraints would ensure referential integrity.

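Consider adding foreign key constraints (note that `add_foreign_key` is built into Rails only from 4.2 onward, so older versions need a gem or raw SQL):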
    create_table :topic_embeds, force: true do |t|
      t.integer :topic_id, null: false
      t.integer :post_id, null: false
      t.string :embed_url, null: false
      t.string :content_sha1, null: false, limit: 40
      t.timestamps
    end

+   add_foreign_key :topic_embeds, :topics
+   add_foreign_key :topic_embeds, :posts
app/jobs/regular/retrieve_topic.rb (1)
5-6: Remove extra empty lines per RuboCop conventions.

The static analysis tool correctly identifies extra empty lines at module body boundaries.
 module Jobs
-
   # Asynchronously retrieve a topic from an embedded site
   class RetrieveTopic < Jobs::Base
   end
-
 end
-
Also applies to: 21-22
spec/models/topic_embed_spec.rb (1)
4-5: Consider cleaning up extra empty lines for consistency.

Multiple RuboCop violations for extra empty lines at block boundaries. While not critical, cleaning these up would improve code consistency.
 describe TopicEmbed do
-
   it { should belong_to :topic }
   context '.import' do
-
     let(:user) { Fabricate(:user) }
     end
-
   end
-
 end
-
Also applies to: 12-13, 43-48
spec/jobs/poll_feed_spec.rb (1)
5-6: Remove extra empty lines per RuboCop conventions.

Static analysis correctly identifies extra empty lines at block body boundaries.
 describe Jobs::PollFeed do
-
   let(:poller) { Jobs::PollFeed.new }
   end
-
 end
Also applies to: 39-40
spec/components/topic_retriever_spec.rb (2)
5-6: Fix formatting issues flagged by RuboCop.

The static analysis tool has identified extra empty lines that should be removed for consistency with the project's style guidelines.

Apply this diff to fix the formatting:
 describe TopicRetriever do
-
   let(:embed_url) { "http://eviltrout.com/2013/02/10/why-discourse-uses-emberjs.html" }
     end
-
   end
-
 end
Also applies to: 43-46

32-42: Consider adding edge case tests for throttling logic.

The throttling tests cover the basic scenarios, but consider adding tests for:

Boundary conditions around the throttling timeframe

Error handling when retrieved_recently? fails

Concurrent access scenarios

This would strengthen the test coverage for the throttling mechanism which is important for preventing abuse.
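
A boundary test is easier to write when the clock can be injected. The sketch below is an in-memory stand-in, not the code in `lib/topic_retriever.rb`; the class name, the 60-second window, and the `clock:` parameter are all assumptions.

```ruby
# In-memory throttle sketch with an injectable clock; the 60-second window
# and all names here are assumptions, not taken from lib/topic_retriever.rb.
class ThrottleSketch
  def initialize(window_seconds: 60, clock: -> { Time.now.to_f })
    @window = window_seconds
    @clock = clock
    @last_seen = {}
  end

  # True when the URL was retrieved inside the window. Otherwise records
  # the attempt and returns false, so the caller may proceed.
  def retrieved_recently?(url)
    now = @clock.call
    last = @last_seen[url]
    return true if last && (now - last) < @window

    @last_seen[url] = now
    false
  end
end
```

With a fake clock, a spec can step time to just before and exactly at the window edge without sleeping. This sketch is not safe under concurrent access; covering that case would need an atomic store.
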
app/assets/stylesheets/embed.css.scss (1)
14-27: Consider responsive design for author section.

The fixed width of 92px for the author section and the corresponding margin-left: 92px for content may not work well on mobile devices or smaller screens.

Consider using flexbox or CSS Grid for a more responsive layout:
-  .author {
-    padding: 20px 0;
-    width: 92px;
-    float: left;
-    text-align: center;
-    h3 {
-      text-align: center;
-      color: #4a6b82;
-      font-size: 13px;
-      margin: 0;
-    }
-  }
-
-  .cooked {
-    padding: 20px 0;
-    margin-left: 92px;
-    p {
-      margin: 0 0 1em 0;
-    }
-  }
+  display: flex;
+  gap: 20px;
+
+  .author {
+    flex-shrink: 0;
+    width: 92px;
+    text-align: center;
+    h3 {
+      color: #4a6b82;
+      font-size: 13px;
+      margin: 0;
+    }
+  }
+
+  .cooked {
+    flex: 1;
+    p {
+      margin: 0 0 1em 0;
+    }
+  }
app/views/embed/best.html.erb (2)
20-20: Replace inline styles with CSS classes.

Inline styles reduce maintainability and override CSS specificity rules. Move this to the stylesheet.

Add to embed.css.scss:
.clearfix {
  clear: both;
}
Then update the template:
-      <div style='clear: both'></div>
+      <div class='clearfix'></div>
14-14: Consider internationalizing the date format.

The hardcoded date format "%e %b %Y" may not be appropriate for all locales.

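Consider using Rails' localization helpers. The suggestion below also adds `rel: "noopener"`, since the link opens in a new tab via `target: "_blank"`: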
-      <%= link_to post.created_at.strftime("%e %b %Y"), post.url, class: 'post-date', target: "_blank" %>
+      <%= link_to l(post.created_at, format: :short), post.url, class: 'post-date', target: "_blank", rel: "noopener" %>
app/jobs/scheduled/poll_feed.rb (1)
32-33: Improve URL validation logic.

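The current check anchors the pattern with `^`, which in Ruby matches at the start of any line rather than the start of the string, so a multi-line value can pass. It also falls back to `i.id` without validating it.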

Use more robust URL validation:
-        url = i.link
-        url = i.id if url.blank? || url !~ /^https?\:\/\//
+        url = i.link.to_s.strip
+        if url.blank? || !url.match?(/\Ahttps?:\/\//)
+          url = i.id.to_s.strip
+          next if url.blank? || !url.match?(/\Ahttps?:\/\//)
+        end
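
The anchor difference and the link-then-id fallback can be checked in isolation. A minimal sketch, where `HTTP_URL` and `pick_item_url` are hypothetical names not present in the PR:

```ruby
# \A anchors at the start of the string; ^ would match after any newline.
HTTP_URL = /\Ahttps?:\/\//

# Prefers the item's link, falls back to its id, and returns nil when
# neither is an http(s) URL. Method and argument names are hypothetical.
def pick_item_url(link, id)
  [link, id].map { |value| value.to_s.strip }.find { |value| value.match?(HTTP_URL) }
end

# A multi-line value that slips past a ^-anchored pattern:
multi_line = "javascript:alert(1)\nhttp://example.com/"
puts multi_line.match?(/^https?:\/\//) # line-anchored check
puts multi_line.match?(HTTP_URL)       # string-anchored check
```

Returning `nil` lets the polling loop skip the item with a single `next unless url`.
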
app/assets/javascripts/embed.js (1)
15-24: Add input validation for postMessage data.

The handler only checks that `e.data.height` is truthy, so a non-numeric or implausibly large value would still be applied to the iframe.

Improve data validation:
 function postMessageReceived(e) {
   if (!e) { return; }
   var allowedOrigin = new URL(discourseUrl).origin;
   if (e.origin !== allowedOrigin) { return; }

-  if (e.data) {
-    if (e.data.type === 'discourse-resize' && e.data.height) {
-      iframe.height = e.data.height + "px";
-    }
-  }
+  if (e.data && typeof e.data === 'object') {
+    if (e.data.type === 'discourse-resize' && 
+        typeof e.data.height === 'number' && 
+        e.data.height > 0 && 
+        e.data.height < 10000) {
+      iframe.height = e.data.height + "px";
+    }
+  }
 }
spec/controllers/embed_controller_spec.rb (2)
4-5: Fix formatting: Remove extra empty lines.

Static analysis identified unnecessary empty lines at block beginnings.
 describe EmbedController do
-
   let(:host) { "eviltrout.com" }
     context "success" do
-
       before do
Also applies to: 30-31

54-55: Fix formatting: Remove extra empty lines.

Static analysis identified unnecessary empty lines at block endings.
     end
-
   end
-
 end
Also applies to: 57-58
lib/topic_retriever.rb (1)
48-53: Consider adding error handling for user lookup.

While the current implementation handles the case where the user is blank, it might be helpful to log when the configured embed user is not found.

Add logging for debugging:
 def fetch_http
   user = User.where(username_lower: SiteSetting.embed_by_username.downcase).first
-  return if user.blank?
+  if user.blank?
+    Rails.logger.warn "Embed user '#{SiteSetting.embed_by_username}' not found"
+    return
+  end

   TopicEmbed.import_remote(user, @embed_url)
 end
app/models/topic_embed.rb (1)
56-76: Consider refactoring to reduce complexity.

The method has high complexity metrics (ABC: 26.65/23, Cyclomatic: 9/7) but the logic is functionally correct. The complexity stems from handling multiple tag types and URL manipulations.

Consider extracting helper methods to improve readability:
def self.absolutize_urls(url, contents)
  uri = URI(url)
  base_url = build_base_url(uri)
  
  fragment = Nokogiri::HTML.fragment(contents)
  absolutize_links(fragment, base_url)
  absolutize_images(fragment, base_url)
  
  fragment.to_html
end

def self.build_base_url(uri)
  prefix = "#{uri.scheme}://#{uri.host}"
  prefix << ":#{uri.port}" unless [80, 443].include?(uri.port)
  prefix
end

def self.absolutize_links(fragment, base_url)
  fragment.css('a').each do |a|
    href = a['href']
    a['href'] = absolutize_path(base_url, href) if relative_path?(href)
  end
end

def self.absolutize_images(fragment, base_url)
  fragment.css('img').each do |img|
    src = img['src']
    img['src'] = absolutize_path(base_url, src) if relative_path?(src)
  end
end

def self.relative_path?(path)
  path.present? && path.start_with?('/')
end

def self.absolutize_path(base_url, path)
  "#{base_url}/#{path.sub(/^\/+/, '')}"
end

# A bare `private` does not apply to `def self.` methods, so hide the helpers explicitly.
private_class_method :build_base_url, :absolutize_links, :absolutize_images,
                     :relative_path?, :absolutize_path
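
The two string helpers can be exercised without Nokogiri. The sketch below restates them as plain top-level methods for illustration and is not the PR's code. It also shows an edge case worth a spec: a protocol-relative `//host/path` starts with `/`, so it is treated as a path on the embedding host.

```ruby
require "uri"

# Standalone restatement of the two string helpers, for illustration only.
def build_base_url(uri)
  base = "#{uri.scheme}://#{uri.host}"
  base += ":#{uri.port}" unless [80, 443].include?(uri.port)
  base
end

def absolutize_path(base_url, path)
  "#{base_url}/#{path.sub(/\A\/+/, '')}"
end

base = build_base_url(URI("http://eviltrout.com/2013/02/10/why-discourse-uses-emberjs.html"))
puts absolutize_path(base, "/images/a.png")
# Protocol-relative URL: the leading slashes are stripped and the CDN host
# becomes a path segment, which is probably not what the author intends.
puts absolutize_path(base, "//cdn.example.com/a.png")
```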

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 62db063 and 4f8aed2.

⛔ Files ignored due to path filters (1)

Gemfile_rails4.lock is excluded by !**/*.lock

📒 Files selected for processing (27)

Gemfile (1 hunks)
app/assets/javascripts/embed.js (1 hunks)
app/assets/stylesheets/embed.css.scss (1 hunks)
app/controllers/embed_controller.rb (1 hunks)
app/jobs/regular/retrieve_topic.rb (1 hunks)
app/jobs/scheduled/poll_feed.rb (1 hunks)
app/models/post.rb (2 hunks)
app/models/topic_embed.rb (1 hunks)
app/views/embed/best.html.erb (1 hunks)
app/views/embed/loading.html.erb (1 hunks)
app/views/layouts/embed.html.erb (1 hunks)
config/locales/client.en.yml (1 hunks)
config/locales/server.en.yml (2 hunks)
config/routes.rb (1 hunks)
config/site_settings.yml (1 hunks)
db/migrate/20131210181901_migrate_word_counts.rb (2 hunks)
db/migrate/20131217174004_create_topic_embeds.rb (1 hunks)
db/migrate/20131219203905_add_cook_method_to_posts.rb (1 hunks)
db/migrate/20131223171005_create_top_topics.rb (1 hunks)
lib/post_creator.rb (1 hunks)
lib/post_revisor.rb (1 hunks)
lib/tasks/disqus.thor (1 hunks)
lib/topic_retriever.rb (1 hunks)
spec/components/topic_retriever_spec.rb (1 hunks)
spec/controllers/embed_controller_spec.rb (1 hunks)
spec/jobs/poll_feed_spec.rb (1 hunks)
spec/models/topic_embed_spec.rb (1 hunks)

🧰 Additional context used

🧬 Code Graph Analysis (2)

app/jobs/regular/retrieve_topic.rb (1)

lib/topic_retriever.rb (1)

retrieve (8-10)

app/controllers/embed_controller.rb (2)

app/models/topic_embed.rb (1)

topic_id_for_embed (78-80)

app/controllers/application_controller.rb (1)

discourse_expires_in (168-171)

🪛 RuboCop (1.76.1)

app/jobs/regular/retrieve_topic.rb

[convention] 5-6: Extra empty line detected at module body beginning.

(Layout/EmptyLinesAroundModuleBody)

[convention] 21-22: Extra empty line detected at module body end.

(Layout/EmptyLinesAroundModuleBody)

spec/models/topic_embed_spec.rb

[convention] 4-5: Extra empty line detected at block body beginning.