feat(seo): add bot detection for dynamic og:image tags #3171
MarkusNeusinger merged 4 commits into main
Conversation
- Add nginx bot detection map for social media crawlers
(Twitter, Facebook, LinkedIn, Slack, Telegram, WhatsApp,
Google, Bing, Discord, Pinterest, Apple)
- Add SEO proxy endpoints for bot-optimized HTML with og:tags:
- /seo-proxy/ - home page
- /seo-proxy/catalog - catalog page
- /seo-proxy/{spec_id} - spec overview (default og:image)
- /seo-proxy/{spec_id}/{library} - implementation (dynamic preview_url)
- Use error_page 418 trick for safe nginx conditional proxying
- Add comprehensive unit tests for all SEO proxy endpoints
This ensures social media bots (which don't execute JavaScript)
receive proper meta tags with dynamic og:image URLs instead of
the static default og-image.png.
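The "error_page 418 trick" mentioned above works around nginx's restriction that `proxy_pass` with a URI part cannot be used inside an `if` block: bot requests raise a synthetic 418 status that `error_page` remaps to a named location. A minimal sketch of the pattern (the map regex is abbreviated for illustration; it is not the PR's exact `nginx.conf`):

```nginx
# Sketch of the error_page 418 trick (abbreviated, not the PR's exact config).
# proxy_pass with a URI is not allowed inside "if", so bot requests
# return a synthetic 418 that error_page remaps to a named location.
map $http_user_agent $is_seo_bot {
    default 0;
    ~*(twitterbot|facebookexternalhit|linkedinbot|slackbot) 1;
}

server {
    location / {
        error_page 418 = @seo_proxy;
        if ($is_seo_bot) {
            return 418;   # jump to @seo_proxy via error_page
        }
        try_files $uri $uri/ /index.html;   # normal SPA routing
    }

    location @seo_proxy {
        proxy_pass https://api.pyplots.ai/seo-proxy$request_uri;
        proxy_set_header Host api.pyplots.ai;
        proxy_ssl_server_name on;
    }
}
```

Regular browsers never hit the named location, so the SPA path is unchanged.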
🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```python
BOT_HTML_TEMPLATE.format(
    title=f"{spec_id} | pyplots.ai",
    description=DEFAULT_DESCRIPTION,
    image=DEFAULT_IMAGE,
    url=f"https://pyplots.ai/{html.escape(spec_id)}",
)
```
**Check warning — Code scanning / CodeQL:** Reflected server-side cross-site scripting (Medium)

**Copilot Autofix** (AI, 4 months ago)
In general, to fix reflected server-side XSS, every user-controlled value inserted into an HTML document must be properly escaped for the context in which it appears (HTML body, attribute, URL, etc.). In this file, all uses of spec_id in HTML contexts should be consistently passed through html.escape, just as is already done for url in this same branch and for title/description when the DB is available.
The single best minimal fix is to escape spec_id when it is interpolated into the title for the DB-unavailable fallback in seo_spec_overview. Specifically, change line 124 from title=f"{spec_id} | pyplots.ai", to title=f"{html.escape(spec_id)} | pyplots.ai",. This mirrors the escaping already used for the url field in the same response and for the title field later in the function when spec is loaded from the database. No new imports are needed because html is already imported at the top of api/routers/seo.py. No other behavioral changes are introduced; only the unsafe direct inclusion of the raw path parameter into HTML is corrected.
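The behavior `html.escape` provides here can be checked in isolation with only the standard library (the payload below is a made-up example, not from the PR):

```python
import html

# html.escape encodes the characters that could break out of an HTML
# text node (<, >, &); with quote=True (the default) it also encodes
# quotes, making the result safe inside attribute values too.
payload = '<script>alert(1)</script>'
escaped = html.escape(payload)
print(escaped)  # &lt;script&gt;alert(1)&lt;/script&gt;

# The fix above applies exactly this to the title f-string:
title = f"{html.escape(payload)} | pyplots.ai"
assert "<script>" not in title
```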
```diff
@@ -121,7 +121,7 @@
         # Fallback when DB unavailable
         return HTMLResponse(
             BOT_HTML_TEMPLATE.format(
-                title=f"{spec_id} | pyplots.ai",
+                title=f"{html.escape(spec_id)} | pyplots.ai",
                 description=DEFAULT_DESCRIPTION,
                 image=DEFAULT_IMAGE,
                 url=f"https://pyplots.ai/{html.escape(spec_id)}",
```
```python
BOT_HTML_TEMPLATE.format(
    title=f"{spec_id} - {library} | pyplots.ai",
    description=DEFAULT_DESCRIPTION,
    image=DEFAULT_IMAGE,
    url=f"https://pyplots.ai/{html.escape(spec_id)}/{html.escape(library)}",
)
```
**Check warning — Code scanning / CodeQL:** Reflected server-side cross-site scripting (Medium)

**Copilot Autofix** (AI, 4 months ago)
In general, to fix reflected server-side XSS in this endpoint, all user-controlled values (spec_id, library) must be HTML-escaped before being interpolated into BOT_HTML_TEMPLATE, not only when used in URLs but also when used in text nodes like the <title> element. The Python standard library’s html.escape() is already imported and used for some fields; we should extend its use to every occurrence where raw user input is inserted into the template.
Concretely, in seo_spec_implementation’s DB-unavailable fallback (lines ~155–161), the title field currently embeds spec_id and library without escaping. We should wrap these in html.escape(), as is already done for the url field. This preserves existing functionality (the same values are displayed) but ensures any <, >, &, quotes, etc. are encoded and cannot break out of the HTML context. No new imports or helpers are needed; we only adjust the f-string expressions in that block. The rest of the function already escapes user-derived values where necessary.
```diff
@@ -154,7 +154,7 @@
         # Fallback when DB unavailable
         return HTMLResponse(
             BOT_HTML_TEMPLATE.format(
-                title=f"{spec_id} - {library} | pyplots.ai",
+                title=f"{html.escape(spec_id)} - {html.escape(library)} | pyplots.ai",
                 description=DEFAULT_DESCRIPTION,
                 image=DEFAULT_IMAGE,
                 url=f"https://pyplots.ai/{html.escape(spec_id)}/{html.escape(library)}",
```
Codecov Report: ✅ All modified and coverable lines are covered by tests.
Pull request overview
This PR adds bot detection and SEO proxy endpoints to serve dynamic og:image tags for social media crawlers. The solution uses nginx User-Agent detection to route bot traffic to backend endpoints that return pre-rendered HTML with correct meta tags, while regular browsers continue to receive the client-side rendered React app with zero performance impact.
Key Changes
- nginx bot detection for 11 social media crawlers (Twitter, Facebook, LinkedIn, Slack, Telegram, WhatsApp, Google, Bing, Discord, Pinterest, Apple)
- Four new SEO proxy endpoints that return HTML with dynamic og:tags based on database content
- Dynamic `og:image` from database `preview_url` for implementation pages
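The nginx User-Agent map is the PR's actual detection mechanism; as an illustration, the same matching logic can be sketched in Python using the crawler names listed in the PR description (the exact regex tokens are assumptions, not the PR's nginx map):

```python
import re

# Illustrative equivalent of the nginx bot-detection map. The real
# pattern lives in app/nginx.conf; the tokens below follow the list of
# crawlers named in the PR description and are assumed, not copied.
BOT_UA_RE = re.compile(
    r"twitterbot|facebookexternalhit|linkedinbot|slackbot"
    r"|telegrambot|whatsapp|googlebot|bingbot|discordbot|pinterest|applebot",
    re.IGNORECASE,
)

def is_social_bot(user_agent: str) -> bool:
    """Return True when the User-Agent looks like a social media crawler."""
    return bool(BOT_UA_RE.search(user_agent))

print(is_social_bot("Twitterbot/1.0"))                    # True
print(is_social_bot("Mozilla/5.0 (X11; Linux) Firefox"))  # False
```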
Reviewed changes
Copilot reviewed 3 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| app/nginx.conf | Adds bot detection map and proxy routing to redirect bot requests to SEO endpoints while preserving SPA routing for normal browsers |
| api/routers/seo.py | Adds four new SEO proxy endpoints (home, catalog, spec overview, spec implementation) with HTML template and dynamic meta tag generation |
| tests/unit/api/test_routers.py | Adds comprehensive test suite (9 tests) covering all SEO proxy endpoints with and without database, including fallback scenarios |
```python
return HTMLResponse(
    BOT_HTML_TEMPLATE.format(
        title=f"{html.escape(spec.title)} - {html.escape(library)} | pyplots.ai",
        description=html.escape(spec.description or DEFAULT_DESCRIPTION),
        image=image,
```
The preview_url from the database is inserted directly into the HTML template without HTML escaping. This could lead to XSS vulnerabilities if the preview_url contains malicious content. The image variable should be HTML-escaped before being used in the template, similar to how spec.title and spec.description are escaped.
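Why `quote=True` matters for this field specifically: `og:image` is interpolated into an HTML *attribute*, so an unescaped double quote could terminate the attribute and smuggle in new ones. A quick standard-library check (the hostile URL is a made-up example):

```python
import html

# A hostile preview_url trying to break out of content="..."
malicious_url = 'https://evil.example/x.png" onload="alert(1)'

# quote=True (the default) encodes double quotes as &quot;,
# so the value cannot terminate the surrounding attribute.
escaped = html.escape(malicious_url, quote=True)
tag = f'<meta property="og:image" content="{escaped}">'
assert '" onload=' not in tag
print(tag)
```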
```diff
+        escaped_image = html.escape(image, quote=True)
         return HTMLResponse(
             BOT_HTML_TEMPLATE.format(
                 title=f"{html.escape(spec.title)} - {html.escape(library)} | pyplots.ai",
                 description=html.escape(spec.description or DEFAULT_DESCRIPTION),
-                image=image,
+                image=escaped_image,
```
```python
# Fallback when DB unavailable
return HTMLResponse(
    BOT_HTML_TEMPLATE.format(
        title=f"{spec_id} - {library} | pyplots.ai",
```
The spec_id and library parameters are not HTML-escaped in the fallback URL construction when the database is unavailable. While they are path parameters and less likely to contain malicious content, they should still be escaped for consistency and defense-in-depth, similar to how they are escaped in lines 127, 141, 160, and 178.
```diff
-                title=f"{spec_id} - {library} | pyplots.ai",
+                title=f"{html.escape(spec_id)} - {html.escape(library)} | pyplots.ai",
```
```nginx
# Named location for bot SEO proxy
location @seo_proxy {
    proxy_pass https://api.pyplots.ai/seo-proxy$request_uri;
    proxy_set_header Host api.pyplots.ai;
    proxy_ssl_server_name on;
}
```
In location @seo_proxy you are proxying to https://api.pyplots.ai without enabling TLS certificate verification, and by default nginx does not verify upstream certificates for proxy_pass over HTTPS. An attacker who can influence DNS or the network path between this frontend and api.pyplots.ai could perform a man-in-the-middle attack to tamper with or replace the SEO HTML returned to bots. Enable strict certificate validation for this upstream (for example by turning on proxy_ssl_verify and configuring trusted CAs, ideally at the http or server level so all HTTPS proxies inherit it) to ensure the backend’s identity is authenticated.
- Escape spec_id and library in fallback title (XSS prevention)
- Escape preview_url with quote=True before inserting in HTML template
- Enable proxy_ssl_verify for backend proxy to prevent MITM attacks
- Add trusted CA certificate path for TLS verification

Addresses Copilot and GitHub Advanced Security findings.

🤖 Generated with [Claude Code](https://claude.ai/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
```python
# Fallback when DB unavailable
return HTMLResponse(
    BOT_HTML_TEMPLATE.format(
        title=f"{spec_id} | pyplots.ai",
```
The spec_id in the title is not HTML-escaped, which could lead to XSS if a malicious spec_id is provided. All user-controlled inputs should be escaped before insertion into HTML. The url parameter already escapes it correctly, but the title needs the same treatment.
```diff
-                title=f"{spec_id} | pyplots.ai",
+                title=f"{html.escape(spec_id)} | pyplots.ai",
```
```python
# Fallback when DB unavailable
return HTMLResponse(
    BOT_HTML_TEMPLATE.format(
        title=f"{html.escape(spec_id)} - {html.escape(library)} | pyplots.ai",
```
The library parameter in the title is not HTML-escaped, which could lead to XSS if a malicious library value is provided. All user-controlled inputs should be escaped before insertion into HTML.
Summary

- Bot detection + SEO proxy endpoints serving dynamic `og:tags` for bots
- Dynamic `og:image` from database `preview_url` for implementation pages

Problem

The CSR (Client-Side Rendered) React app sets meta tags via React Helmet after JavaScript execution. Social media bots don't execute JavaScript, so all pages show the default `og-image.png` instead of dynamic plot previews.

Solution

Endpoints

- `/seo-proxy/` - home page
- `/seo-proxy/catalog` - catalog page
- `/seo-proxy/{spec_id}` - spec overview
- `/seo-proxy/{spec_id}/{library}` - implementation (`preview_url` from DB)

Test plan

`curl -A "Twitterbot" https://pyplots.ai/scatter-basic/matplotlib`

🤖 Generated with Claude Code
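The fallback rendering discussed throughout the review can be sketched end to end with only the standard library. The template below is a hypothetical stand-in (the real BOT_HTML_TEMPLATE lives in api/routers/seo.py and is not quoted in full here); the field names mirror the `.format()` calls shown in the review, and DEFAULT_DESCRIPTION is a placeholder value:

```python
import html

# Hypothetical stand-in for the PR's BOT_HTML_TEMPLATE (not quoted in
# full in this PR page); field names mirror the review's format() calls.
BOT_HTML_TEMPLATE = """<!doctype html>
<html><head>
<title>{title}</title>
<meta property="og:title" content="{title}">
<meta property="og:description" content="{description}">
<meta property="og:image" content="{image}">
<meta property="og:url" content="{url}">
</head><body></body></html>"""

DEFAULT_DESCRIPTION = "Python plotting examples"        # placeholder
DEFAULT_IMAGE = "https://pyplots.ai/og-image.png"

def render_spec_fallback(spec_id: str, library: str) -> str:
    """DB-unavailable fallback: every user-controlled field is escaped."""
    return BOT_HTML_TEMPLATE.format(
        title=f"{html.escape(spec_id)} - {html.escape(library)} | pyplots.ai",
        description=DEFAULT_DESCRIPTION,
        image=DEFAULT_IMAGE,
        url=f"https://pyplots.ai/{html.escape(spec_id)}/{html.escape(library)}",
    )

page = render_spec_fallback("scatter-basic", "matplotlib")
assert "scatter-basic - matplotlib | pyplots.ai" in page
```

Because both path parameters pass through `html.escape`, a hostile `spec_id` cannot break out of the title or URL context.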