feat: block /debug route from search engine indexing #3815
MarkusNeusinger merged 5 commits into main
Conversation
Adds Disallow: /debug to robots.txt to prevent search engines from crawling and indexing the internal debug dashboard. This follows SEO best practices for internal admin/debug tools.
Adds a `GET /robots.txt` endpoint that blocks all crawlers with `Disallow: /`. APIs should not be indexed by search engines.

Changes:
- Add robots.txt endpoint in `api/routers/seo.py`
- Add comprehensive tests (unit, integration, e2e)
- Update `docs/reference/seo.md` with robots.txt documentation
- Social media bots remain unaffected (they fetch og:images directly)

Follows best practices for API SEO management.
Pull request overview
This PR adds robots.txt functionality to both the frontend and backend to control search engine crawling. The frontend blocks only the /debug route while the backend API blocks all routes from search engine indexing (following best practices for APIs).
Changes:
- Added `Disallow: /debug` to the frontend robots.txt to prevent search engine indexing of the debug dashboard
- Added a backend API endpoint `GET /robots.txt` that blocks all API routes from search engines
- Added comprehensive test coverage (unit, integration, and e2e tests)
- Updated the SEO documentation with detailed robots.txt configuration for both frontend and backend
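Given the change description, the resulting frontend `app/public/robots.txt` would presumably read roughly like this (an assumption; the actual file may contain additional directives such as a `Sitemap` line):

```text
User-agent: *
Disallow: /debug
```

Under the Robots Exclusion Protocol, listing only `Disallow: /debug` leaves every other path crawlable by default, so no explicit `Allow` rules are needed.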
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `app/public/robots.txt` | Added `Disallow: /debug` to block the debug route from indexing while allowing all other routes |
| `api/routers/seo.py` | Added new `GET /robots.txt` endpoint that blocks all API routes (`Disallow: /`) |
| `docs/reference/seo.md` | Added comprehensive documentation section for robots.txt configuration, explaining frontend vs backend differences |
| `tests/unit/api/test_routers.py` | Added unit test for the backend robots.txt endpoint |
| `tests/integration/api/test_api_endpoints.py` | Added integration test for the backend robots.txt endpoint |
| `tests/e2e/test_api_postgres.py` | Added e2e test for the backend robots.txt endpoint with real PostgreSQL |

Codecov Report ✅ All modified and coverable lines are covered by tests.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```python
"""Tests for SEO router."""


def test_robots_txt(self, client: TestClient) -> None:
    """robots.txt should block crawlers from all routes."""
```
The test docstring claims "robots.txt should block crawlers from all routes" but this test is only validating the backend API endpoint behavior (which does block all routes with "Disallow: /"). However, the PR title and description state the purpose is to "block /debug route from search engine indexing", which refers to the frontend robots.txt file that only blocks /debug.
This test doesn't validate the frontend robots.txt file at all (which is a static file at app/public/robots.txt). Consider either:
- Updating the docstring to clarify this tests the backend API robots.txt endpoint specifically, or
- Adding a separate test for the frontend robots.txt file if that's feasible in your test setup.
Suggested change:

```diff
-    """robots.txt should block crawlers from all routes."""
+    """Backend /robots.txt endpoint should block crawlers from all routes."""
```