[#96] Fixed robots.txt content for PROD. #98
Conversation
📝 Walkthrough
Removed environment-specific robots.txt configs, replaced the default robots.txt with an expanded policy, and cleared the config_split partials that referenced robotstxt.settings.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Client
participant DrupalKernel as Drupal Kernel
participant SettingsInclude as settings.robotstxt.php
participant Config as Config System
Client->>DrupalKernel: GET /robots.txt
DrupalKernel->>SettingsInclude: load settings include(s)
alt non-production (environment != PROD)
SettingsInclude->>Config: set robots_txt.settings.content = "User-agent: *\r\nDisallow:"
note right of SettingsInclude: override applied (non-production)
else production (environment = PROD)
SettingsInclude-->>Config: no override applied
note right of SettingsInclude: expanded default config used
end
DrupalKernel->>Config: read robots_txt.settings.content
Config-->>DrupalKernel: robots.txt content
DrupalKernel-->>Client: 200 OK with robots.txt
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 9
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite
📒 Files selected for processing (8)
- config/default/config_split.config_split.dev.yml (1 hunks)
- config/default/config_split.config_split.test.yml (1 hunks)
- config/default/robotstxt.settings.yml (1 hunks)
- config/dev/robotstxt.settings.yml (0 hunks)
- config/stage/robotstxt.settings.yml (0 hunks)
- tests/behat/features/robotstxt.feature (1 hunks)
- tests/phpunit/Drupal/EnvironmentSettingsTest.php (8 hunks)
- web/sites/default/includes/modules/settings.robotstxt.php (1 hunks)
💤 Files with no reviewable changes (2)
- config/dev/robotstxt.settings.yml
- config/stage/robotstxt.settings.yml
🧰 Additional context used
📓 Path-based instructions (1)
tests/behat/**/*.feature
📄 CodeRabbit Inference Engine (CLAUDE.md)
tests/behat/**/*.feature: All Behat feature files must use the standard user story format: 'As a [user type]\nI want to [action]\nSo that [benefit]'
Use consistent user types in Behat feature files: 'site visitor', 'site administrator', and 'content editor'
Do not use commas or periods in user story statements within Behat feature files
Files:
tests/behat/features/robotstxt.feature
🔇 Additional comments (2)
web/sites/default/includes/modules/settings.robotstxt.php (2)
1-8: Header and strict_types placement look good
The file header is clear and declare(strict_types=1) is correctly positioned.
1-12: Confirm loading order and environment initialization
Based on your script results:
- ENVIRONMENT_PROD is defined in web/sites/default/settings.php:131.
- No explicit include of settings.robotstxt.php was found; verify how/when this file is pulled in.
- No assignment to $settings['environment'] was detected under web/sites; ensure it is populated before this snippet runs.
- No other overrides of robots_txt.settings.content were found.

Please double-check that:
- settings.robotstxt.php is included after both the environment constant and $settings['environment'] are initialized.
- No downstream configuration rewrites robots_txt.settings.content in non-prod environments.
complete_list: { }
partial_list:
  - robotstxt.settings
partial_list: { }
💡 Verification agent
🧩 Analysis chain
Use a list for partial_list to avoid schema/type mismatches.
Config Split expects partial_list to be a sequence (list). Representing it as an empty mapping { } risks config import/schema validation issues. Prefer [].
Apply this diff:
-partial_list: { }
+partial_list: [ ]

Run this to confirm consistency across the repo and spot any other occurrences using { }:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Occurrences of partial_list across config_split configs:"
rg -n -C1 -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*(\{\s*\}|\[\s*\])'
Standardize partial_list as an empty list in all Config Split files
Config Split requires partial_list to be a sequence, not a mapping. Defining it as { } may break import/schema validation. Please update every occurrence:
- config/default/config_split.config_split.dev.yml (line 16)
- config/default/config_split.config_split.test.yml (line 16)
- config/default/config_split.config_split.local.yml (line 18)
- config/default/config_split.config_split.ci.yml (line 18)
Apply this diff in each file:
-partial_list: { }
+partial_list: []

(Optional) Re-run to confirm no remaining mappings:
rg -n -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*{'

🤖 Prompt for AI Agents
In config/default/config_split.config_split.dev.yml around line 16 (and
similarly in config/default/config_split.config_split.test.yml line 16,
config/default/config_split.config_split.local.yml line 18, and
config/default/config_split.config_split.ci.yml line 18), the key partial_list
is defined as a mapping ({ }) but Config Split requires a sequence; change each
occurrence from an empty mapping to an empty list (replace "{ }" with "[]") in
those files and then re-run the provided ripgrep check to confirm no remaining
mappings.
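The type distinction behind this fix is easy to see with JSON, of which YAML's flow style is a superset; a minimal Python sketch (illustrative only, not part of the PR):

```python
import json

# YAML flow-style `{ }` and `[ ]` parse to different types, exactly as
# their JSON equivalents do: a mapping vs. a sequence.
empty_mapping = json.loads("{}")   # what `partial_list: { }` produces
empty_sequence = json.loads("[]")  # what `partial_list: []` produces

print(type(empty_mapping).__name__)   # dict
print(type(empty_sequence).__name__)  # list

# A schema that expects a sequence will reject the mapping form:
assert isinstance(empty_sequence, list)
assert not isinstance(empty_mapping, list)
```

This is why an "empty" value is not interchangeable between the two flow syntaxes even though both render as nothing when the list has no entries.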
complete_list: { }
partial_list:
  - robotstxt.settings
partial_list: { }
💡 Verification agent
🧩 Analysis chain
partial_list should be an empty list, not an empty mapping.
Same as in the dev split, prefer [] for an empty sequence to match Config Split’s expected type.
Apply this diff:
-partial_list: { }
+partial_list: [ ]

Run this to confirm consistency across the repo:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n -C1 -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*(\{\s*\}|\[\s*\])'
Convert all partial_list mappings to empty sequences
We found 4 occurrences of partial_list: { }—it should be partial_list: [] to match Config Split’s expected sequence type:
- config/default/config_split.config_split.test.yml:16
- config/default/config_split.config_split.dev.yml:16
- config/default/config_split.config_split.local.yml:18
- config/default/config_split.config_split.ci.yml:18
Apply this patch in each file:
-partial_list: { }
+partial_list: []

Re-run to verify no mappings remain:
#!/usr/bin/env bash
set -euo pipefail
rg -n -C1 -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*(\{\s*\}|\[\s*\])'

🤖 Prompt for AI Agents
In config/default/config_split.config_split.test.yml around line 16 (and the
other occurrences in config/default/config_split.config_split.dev.yml:16,
config/default/config_split.config_split.local.yml:18,
config/default/config_split.config_split.ci.yml:18), replace the mapping form
"partial_list: { }" with an empty sequence "partial_list: []" so the YAML type
matches Config Split’s expected sequence; update each file accordingly and
re-run the provided ripgrep verification to ensure no mappings remain.
_core:
  default_config_hash: ceCx5XZ_ay1Mxcv-sB95U_fBKoVkpvo8RaQiwutSZLI
content: "User-agent: *\r\nDisallow:"
content: "User-agent: *\r\n# CSS, JS, Images\r\nAllow: /core/*.css$\r\nAllow: /core/*.css?\r\nAllow: /core/*.js$\r\nAllow: /core/*.js?\r\nAllow: /core/*.gif\r\nAllow: /core/*.jpg\r\nAllow: /core/*.jpeg\r\nAllow: /core/*.png\r\nAllow: /core/*.svg\r\nAllow: /profiles/*.css$\r\nAllow: /profiles/*.css?\r\nAllow: /profiles/*.js$\r\nAllow: /profiles/*.js?\r\nAllow: /profiles/*.gif\r\nAllow: /profiles/*.jpg\r\nAllow: /profiles/*.jpeg\r\nAllow: /profiles/*.png\r\nAllow: /profiles/*.svg\r\n# Directories\r\nDisallow: /core/\r\nDisallow: /profiles/\r\n# Files\r\nDisallow: /README.md\r\nDisallow: /composer/Metapackage/README.txt\r\nDisallow: /composer/Plugin/ProjectMessage/README.md\r\nDisallow: /composer/Plugin/Scaffold/README.md\r\nDisallow: /composer/Plugin/VendorHardening/README.txt\r\nDisallow: /composer/Template/README.txt\r\nDisallow: /modules/README.txt\r\nDisallow: /sites/README.txt\r\nDisallow: /themes/README.txt\r\n# Paths (clean URLs)\r\nDisallow: /admin/\r\nDisallow: /comment/reply/\r\nDisallow: /filter/tips\r\nDisallow: /node/add/\r\nDisallow: /search/\r\nDisallow: /user/register\r\nDisallow: /user/password\r\nDisallow: /user/login\r\nDisallow: /user/logout\r\nDisallow: /media/oembed\r\nDisallow: /*/media/oembed\r\n# Paths (no clean URLs)\r\nDisallow: /index.php/admin/\r\nDisallow: /index.php/comment/reply/\r\nDisallow: /index.php/filter/tips\r\nDisallow: /index.php/node/add/\r\nDisallow: /index.php/search/\r\nDisallow: /index.php/user/password\r\nDisallow: /index.php/user/register\r\nDisallow: /index.php/user/login\r\nDisallow: /index.php/user/logout\r\nDisallow: /index.php/media/oembed\r\nDisallow: /index.php/*/media/oembed\r\n"
💡 Verification agent
🧩 Analysis chain
Config name mismatch: robotstxt.settings vs robots_txt.settings will break PROD intent.
Tests and settings overrides use robots_txt.settings, but this file is robotstxt.settings.yml. In PROD, the intended content will not be applied if the config entity name doesn’t match. Standardize on robots_txt.settings.
Actions:
Actions:
- Rename the file to config/default/robots_txt.settings.yml (the config name is derived from the filename).
- Ensure any code or config splits referencing robotstxt.settings are updated to robots_txt.settings.
Run this to verify current usage and spot mismatches:
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Config key usage:"
rg -n -C2 -P '\brobots?_txt\.settings\b|robotstxt\.settings\b'
echo
echo "Config files named for robots settings:"
fd -uu -t f -g 'config/**/robots*_*.settings.yml' || true
fd -uu -t f -g 'config/**/robotstxt.settings.yml' || true
Fix config filename mismatch: rename to robots_txt.settings.yml
Tests and runtime code reference the config key robots_txt.settings, but the default file is named robotstxt.settings.yml. In production this means your robots.txt content will never be loaded.
• Rename config/default/robotstxt.settings.yml → config/default/robots_txt.settings.yml
• Update any environment-specific overrides or config splits (e.g. config/production/robotstxt.settings.yml) to use robots_txt.settings.yml
• No code changes required; the PHP and PHPUnit code already uses robots_txt.settings
🤖 Prompt for AI Agents
In config/default/robotstxt.settings.yml around line 3, the filename is wrong
(robotstxt.settings.yml) so the application expects robots_txt.settings but
never loads this file; rename config/default/robotstxt.settings.yml to
config/default/robots_txt.settings.yml and update any environment-specific or
split config files (e.g. config/production/robotstxt.settings.yml) to the new
name so the existing PHP/PHPUnit config key robots_txt.settings resolves
correctly; no code changes required.
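As a sanity check of the rename mechanics, here is a sketch in a scratch directory (paths mirror the review; the real change would be a git mv in the repo):

```python
import tempfile
from pathlib import Path

# Scratch directory standing in for the repo root (illustrative only).
root = Path(tempfile.mkdtemp())
default = root / "config" / "default"
default.mkdir(parents=True)

# Drupal derives the config name from the file name, so this file is read
# as "robotstxt.settings", not the "robots_txt.settings" key the code uses.
old = default / "robotstxt.settings.yml"
old.write_text('content: "User-agent: *"\n')

# Renaming the file is what makes the robots_txt.settings key resolve.
old.rename(old.with_name("robots_txt.settings.yml"))

print(sorted(p.name for p in default.iterdir()))  # ['robots_txt.settings.yml']
```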
🧹 Nitpick (assertive)
Possibly redundant Allow entries with '?' suffix.
Robots.txt supports * and $ (Google extensions); '?' is not a wildcard and is treated as a literal. Note that this makes a line like Allow: /core/*.css? match CSS URLs that carry a query string, which the $-anchored /*.css$ does not, so the pair may be intentional. If query-string asset URLs do not need an explicit Allow, the four ?-suffixed lines can be dropped.
Apply this replacement for the content string (removing the four ?-suffixed lines only):
-content: "User-agent: *\r\n# CSS, JS, Images\r\nAllow: /core/*.css$\r\nAllow: /core/*.css?\r\nAllow: /core/*.js$\r\nAllow: /core/*.js?\r\nAllow: /core/*.gif\r\nAllow: /core/*.jpg\r\nAllow: /core/*.jpeg\r\nAllow: /core/*.png\r\nAllow: /core/*.svg\r\nAllow: /profiles/*.css$\r\nAllow: /profiles/*.css?\r\nAllow: /profiles/*.js$\r\nAllow: /profiles/*.js?\r\nAllow: /profiles/*.gif\r\nAllow: /profiles/*.jpg\r\nAllow: /profiles/*.jpeg\r\nAllow: /profiles/*.png\r\nAllow: /profiles/*.svg\r\n# Directories\r\nDisallow: /core/\r\nDisallow: /profiles/\r\n# Files\r\nDisallow: /README.md\r\nDisallow: /composer/Metapackage/README.txt\r\nDisallow: /composer/Plugin/ProjectMessage/README.md\r\nDisallow: /composer/Plugin/Scaffold/README.md\r\nDisallow: /composer/Plugin/VendorHardening/README.txt\r\nDisallow: /composer/Template/README.txt\r\nDisallow: /modules/README.txt\r\nDisallow: /sites/README.txt\r\nDisallow: /themes/README.txt\r\n# Paths (clean URLs)\r\nDisallow: /admin/\r\nDisallow: /comment/reply/\r\nDisallow: /filter/tips\r\nDisallow: /node/add/\r\nDisallow: /search/\r\nDisallow: /user/register\r\nDisallow: /user/password\r\nDisallow: /user/login\r\nDisallow: /user/logout\r\nDisallow: /media/oembed\r\nDisallow: /*/media/oembed\r\n# Paths (no clean URLs)\r\nDisallow: /index.php/admin/\r\nDisallow: /index.php/comment/reply/\r\nDisallow: /index.php/filter/tips\r\nDisallow: /index.php/node/add/\r\nDisallow: /index.php/search/\r\nDisallow: /index.php/user/password\r\nDisallow: /index.php/user/register\r\nDisallow: /index.php/user/login\r\nDisallow: /index.php/user/logout\r\nDisallow: /index.php/media/oembed\r\nDisallow: /index.php/*/media/oembed\r\n"
+content: "User-agent: *\r\n# CSS, JS, Images\r\nAllow: /core/*.css$\r\nAllow: /core/*.js$\r\nAllow: /core/*.gif\r\nAllow: /core/*.jpg\r\nAllow: /core/*.jpeg\r\nAllow: /core/*.png\r\nAllow: /core/*.svg\r\nAllow: /profiles/*.css$\r\nAllow: /profiles/*.js$\r\nAllow: /profiles/*.gif\r\nAllow: /profiles/*.jpg\r\nAllow: /profiles/*.jpeg\r\nAllow: /profiles/*.png\r\nAllow: /profiles/*.svg\r\n# Directories\r\nDisallow: /core/\r\nDisallow: /profiles/\r\n# Files\r\nDisallow: /README.md\r\nDisallow: /composer/Metapackage/README.txt\r\nDisallow: /composer/Plugin/ProjectMessage/README.md\r\nDisallow: /composer/Plugin/Scaffold/README.md\r\nDisallow: /composer/Plugin/VendorHardening/README.txt\r\nDisallow: /composer/Template/README.txt\r\nDisallow: /modules/README.txt\r\nDisallow: /sites/README.txt\r\nDisallow: /themes/README.txt\r\n# Paths (clean URLs)\r\nDisallow: /admin/\r\nDisallow: /comment/reply/\r\nDisallow: /filter/tips\r\nDisallow: /node/add/\r\nDisallow: /search/\r\nDisallow: /user/register\r\nDisallow: /user/password\r\nDisallow: /user/login\r\nDisallow: /user/logout\r\nDisallow: /media/oembed\r\nDisallow: /*/media/oembed\r\n# Paths (no clean URLs)\r\nDisallow: /index.php/admin/\r\nDisallow: /index.php/comment/reply/\r\nDisallow: /index.php/filter/tips\r\nDisallow: /index.php/node/add/\r\nDisallow: /index.php/search/\r\nDisallow: /index.php/user/password\r\nDisallow: /index.php/user/register\r\nDisallow: /index.php/user/login\r\nDisallow: /index.php/user/logout\r\nDisallow: /index.php/media/oembed\r\nDisallow: /index.php/*/media/oembed\r\n"
🤖 Prompt for AI Agents
In config/default/robotstxt.settings.yml around line 3, the robots content
includes redundant Allow entries that end with a literal '?' (e.g. Allow:
/core/*.css? and similar) which are ineffective and duplicate the corresponding
regex-anchored rules (/*.css$ and /*.js$); remove the four ?-suffixed Allow
lines (the Allow lines for /core/*.css?, /core/*.js?, /profiles/*.css?,
/profiles/*.js?) from the content string so only the valid $-anchored Allow
rules remain.
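The matching rules at play here can be checked with a small sketch of Google-style robots.txt matching ('*' is a wildcard, a trailing '$' anchors the end, other characters including '?' are literal); this is illustrative only, not code from the PR:

```python
import re

def robots_match(pattern: str, path: str) -> bool:
    """Google-style robots.txt path matching: '*' is a wildcard, a trailing
    '$' anchors the end, everything else (including '?') is literal."""
    anchored = pattern.endswith("$")
    core = pattern[:-1] if anchored else pattern
    regex = ".*".join(re.escape(part) for part in core.split("*"))
    regex = "^" + regex + ("$" if anchored else "")
    return re.match(regex, path) is not None

# The '$'-anchored rule matches the bare asset but not a query-string variant:
assert robots_match("/core/*.css$", "/core/misc/drupal.css")
assert not robots_match("/core/*.css$", "/core/misc/drupal.css?v=10")

# The literal-'?' rule is what covers query-string URLs:
assert robots_match("/core/*.css?", "/core/misc/drupal.css?v=10")
```

So whether the ?-suffixed lines are redundant depends on whether query-string asset URLs need to be explicitly allowed.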
@api
Scenario: Verify robots.txt exists and contains appropriate content in production
Scenario: Verify robots.txt exists and contains appropriate content in non-production
🛠️ Refactor suggestion
Scenario rename aligns with non-production behavior — update user story to meet guidelines.
Per coding guidelines, use the standard user story format and allowed user types, and avoid punctuation.
Update the header lines as shown:
As a site visitor
I want to access the robots txt file
So that search engines crawl and index the site correctly
🤖 Prompt for AI Agents
In tests/behat/features/robotstxt.feature around line 9, the scenario header
should follow the standard user story format with an allowed user type and no
punctuation; replace the current header lines with three lines: "As a site
visitor", "I want to access the robots txt file", and "So that search engines
crawl and index the site correctly" ensuring exact wording and no punctuation,
one line per statement at the top of the feature file.
$config['environment_indicator.indicator']['name'] = static::ENVIRONMENT_SUT;
$config['environment_indicator.settings']['favicon'] = TRUE;
$config['environment_indicator.settings']['toolbar_integration'] = [TRUE];
$config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
💡 Verification agent
🧩 Analysis chain
Non-PROD robots snippet assertions look correct — verify config name consistency across the codebase.
The tests assert robots_txt.settings.content = "User-agent: *\r\nDisallow:" across non-prod environments, which aligns with the new settings include behavior. However, the default config file is named config/default/robotstxt.settings.yml (without underscore). This mismatch would prevent PROD from receiving the intended content.
Actions:
- Rename config/default/robotstxt.settings.yml to config/default/robots_txt.settings.yml (or adjust the settings include/tests to use robotstxt.settings, but standardizing on robots_txt.settings is recommended).
- Confirm there are no remaining references to robotstxt.settings.
Script to verify:
If you want, I can provide a patchset that renames the file and updates any references.
Also applies to: 330-330, 377-377, 423-423, 472-472, 521-521, 572-572, 623-623
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Searching for config key usages:"
rg -n -C2 -P '\brobots?_txt\.settings\b|robotstxt\.settings\b'
echo
echo "Checking presence of config files:"
fd -uu -t f 'robots_txt.settings.yml' config/ || true
fd -uu -t f 'robotstxt.settings.yml' config/ || true
Ensure config filename matches the robots_txt.settings key
The code and tests reference $config['robots_txt.settings'], but the default config file is named robotstxt.settings.yml. This mismatch means PROD will never load your robots.txt settings.
Actions required:
- Rename config/default/robotstxt.settings.yml → config/default/robots_txt.settings.yml
- Update any references to robotstxt.settings (tests, docs, includes) to use robots_txt.settings
- Confirm there are no remaining robotstxt.settings usages via a grep/ripgrep scan
🤖 Prompt for AI Agents
In tests/phpunit/Drupal/EnvironmentSettingsTest.php around line 260, the test
sets $config['robots_txt.settings'] but the repository's default config file is
named robotstxt.settings.yml causing a mismatch; rename
config/default/robotstxt.settings.yml to config/default/robots_txt.settings.yml,
update all references (tests, docs, includes, variable names) from
robotstxt.settings to robots_txt.settings, and run a project-wide grep/ripgrep
to confirm no remaining robotstxt.settings usages.
declare(strict_types=1);

if ($settings['environment'] !== ENVIRONMENT_PROD) {
🛠️ Refactor suggestion
Harden against undefined constant and missing setting
If ENVIRONMENT_PROD isn’t defined before this include, PHP 8 will error. Also, if $settings['environment'] is not set, you’ll get a notice. Guard both for robustness.
Apply this diff:
-if ($settings['environment'] !== ENVIRONMENT_PROD) {
+if (!defined('ENVIRONMENT_PROD') || (($settings['environment'] ?? null) !== ENVIRONMENT_PROD)) {
🤖 Prompt for AI Agents
In web/sites/default/includes/modules/settings.robotstxt.php around line 10, the
conditional uses ENVIRONMENT_PROD and $settings['environment'] directly which
can raise an undefined constant or undefined index notice; update the condition
to first check defined('ENVIRONMENT_PROD') and isset($settings['environment'])
(or array_key_exists) before comparing, e.g. only evaluate the comparison when
both exist, and treat missing values as non-production by default so the block
runs safely when either is absent.
if ($settings['environment'] !== ENVIRONMENT_PROD) {
  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
}
💡 Verification agent
🧩 Analysis chain
Non-prod robots.txt currently allows crawling; confirm intent or change to disallow all
In robots.txt, “Disallow:” with an empty value means “allow everything.” For non-production, best practice is to block indexing to avoid leaking staging/dev content into search engines. If the current behavior is intentional, ignore this. Otherwise, switch to “Disallow: /”.
Apply this diff to disallow all in non-production:
- $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
+ $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow: /";

🏁 Script executed:
#!/bin/bash
set -e
# Locate the robots.txt settings file
echo "Searching for settings.robotstxt.php..."
fd -HI "settings.robotstxt.php"
# Search for references to robots_txt.settings in the repo
echo -e "\nSearching for robots_txt.settings assignments..."
rg -n --type=php "robots_txt\.settings"
# Show the first 50 lines of the file for context
echo -e "\nDisplaying the first 50 lines of settings.robotstxt.php..."
sed -n '1,50p' web/sites/default/includes/modules/settings.robotstxt.php
Update non-prod robots.txt and corresponding tests to block all crawling
The current implementation and existing PHPUnit tests expect:
$config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";

Empty "Disallow:" allows all bots. To enforce "no crawling" on non-prod, change it to "Disallow: /" and update every test asserting the old value:
– File: web/sites/default/includes/modules/settings.robotstxt.php
-if ($settings['environment'] !== ENVIRONMENT_PROD) {
- $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
-}
+if ($settings['environment'] !== ENVIRONMENT_PROD) {
+ $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow: /";
+}

– Tests to update:
tests/phpunit/Drupal/EnvironmentSettingsTest.php
At lines approx. 260, 330, 376, 423, 472, 521, 572, and 623, replace:
"User-agent: *\r\nDisallow:";
with
"User-agent: *\r\nDisallow: /";
This ensures both behavior and tests remain in sync.
🤖 Prompt for AI Agents
In web/sites/default/includes/modules/settings.robotstxt.php around lines 10–12
the non-production robots.txt sets "Disallow:" which allows crawling; change the
value to "User-agent: *\r\nDisallow: /" to block all crawling, and update the
PHPUnit assertions in tests/phpunit/Drupal/EnvironmentSettingsTest.php at the
specified approximate lines (260, 330, 376, 423, 472, 521, 572, 623) to expect
"User-agent: *\r\nDisallow: /" instead of the old string so tests match the new
behavior.
declare(strict_types=1);

if ($settings['environment'] !== ENVIRONMENT_PROD) {
  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
🧹 Nitpick (assertive)
Line endings in content: verify CRLF requirement
You’re using CRLF “\r\n”. If the default config or downstream consumers normalize to “\n”, this may create unexpected diffs or mismatches across environments. If CRLF is deliberate (e.g., to match other fixtures/tests), keep it; otherwise consider “\n”.
🤖 Prompt for AI Agents
In web/sites/default/includes/modules/settings.robotstxt.php around line 11, the
robots_txt.settings content uses CRLF ("\r\n"); confirm whether CRLF is required
by downstream consumers or tests—if not, replace "\r\n" with "\n" to avoid
cross-environment diffs, otherwise keep "\r\n" and add a short code comment
explaining the deliberate CRLF choice so future reviewers know it is
intentional.
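A quick way to see why the line-ending choice matters (a Python sketch, not project code): line-oriented comparisons hide the difference, while byte-level comparisons, such as config-export diffs, do not.

```python
crlf = "User-agent: *\r\nDisallow: /"
lf = crlf.replace("\r\n", "\n")

# splitlines() normalizes both endings, so line-oriented tests pass either way...
assert crlf.splitlines() == lf.splitlines() == ["User-agent: *", "Disallow: /"]

# ...but the raw config values differ, which is what shows up in exports.
assert crlf != lf
print(len(crlf), len(lf))  # 26 25
```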
Force-pushed: 1a180ad → 5f2560a, then 5f2560a → d9544af.
closes #96