
Conversation

AlexSkrypnyk (Member) commented Aug 19, 2025

closes #96

Summary by CodeRabbit

  • New Features

    • Full, detailed robots.txt with explicit allow/disallow rules for assets, directories, and common paths.
  • Improvements

    • Non-production environments now serve a restrictive robots.txt to prevent indexing; production retains the full rules.
  • Tests

    • Expanded coverage for robots.txt across environments and renamed a scenario for clarity.
  • Chores

    • Cleaned up environment-specific config splits and removed redundant overrides.

coderabbitai bot commented Aug 19, 2025

📝 Walkthrough

Removed environment-specific robots.txt configs, replaced default robots.txt with an expanded policy, cleared config_split partials referencing robotstxt.settings, added a settings include to set a minimal disallowing robots.txt in non-production, and updated Behat and PHPUnit tests accordingly.

Changes

  • Config split updates (config/default/config_split.config_split.dev.yml, config/default/config_split.config_split.test.yml): replaced partial_list (previously containing robotstxt.settings) with an empty mapping (partial_list: { }), removing the partial references.
  • Default robots.txt content (config/default/robotstxt.settings.yml): replaced a minimal robots.txt ("User-agent: *\r\nDisallow:") with a comprehensive robots.txt block including Allow rules for assets and many Disallow rules for dirs, files, and path variants.
  • Env-specific robots.txt removals (config/dev/robotstxt.settings.yml, config/stage/robotstxt.settings.yml): removed _core.default_config_hash and content entries (the environment-specific robots.txt definitions), leaving those environment config files empty.
  • Settings include, new (web/sites/default/includes/modules/settings.robotstxt.php): added a strict-typed PHP include that sets $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:" when $settings['environment'] !== ENVIRONMENT_PROD; a sketch follows this list.
  • Behat test scenario (tests/behat/features/robotstxt.feature): renamed the scenario to reference non-production; the final assertion now expects Disallow: / in the non-production robots.txt.
  • PHPUnit test updates (tests/phpunit/Drupal/EnvironmentSettingsTest.php): added a robots_txt.settings.content snippet ("User-agent: *\r\nDisallow:") to multiple environment test configurations used by the tests.
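
The settings include is the behavioral core of the change. As a point of reference, here is a minimal sketch of that include, reconstructed from the summaries above and the diff hunks quoted later in this review; the docblock wording is assumed:

<?php

/**
 * @file
 * Robots.txt settings overrides (sketch reconstructed from this review).
 */

declare(strict_types=1);

// Outside production, override the exported config so the expanded
// default robots.txt only ever ships in PROD.
if ($settings['environment'] !== ENVIRONMENT_PROD) {
  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
}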

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant DrupalKernel as Drupal Kernel
  participant SettingsInclude as settings.robotstxt.php
  participant Config as Config System

  Client->>DrupalKernel: GET /robots.txt
  DrupalKernel->>SettingsInclude: load settings include(s)
  alt non-production (environment != PROD)
    SettingsInclude->>Config: set robots_txt.settings.content = "User-agent: *\r\nDisallow:"
    note right of SettingsInclude: override applied (non-production)
  else production (environment = PROD)
    SettingsInclude-->>Config: no override applied
    note right of SettingsInclude: expanded default config used
  end
  DrupalKernel->>Config: read robots_txt.settings.content
  Config-->>DrupalKernel: robots.txt content
  DrupalKernel-->>Client: 200 OK with robots.txt

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 5f2560a and d9544af.

📒 Files selected for processing (8)
  • config/default/config_split.config_split.dev.yml (1 hunks)
  • config/default/config_split.config_split.test.yml (1 hunks)
  • config/default/robotstxt.settings.yml (1 hunks)
  • config/dev/robotstxt.settings.yml (0 hunks)
  • config/stage/robotstxt.settings.yml (0 hunks)
  • tests/behat/features/robotstxt.feature (1 hunks)
  • tests/phpunit/Drupal/EnvironmentSettingsTest.php (8 hunks)
  • web/sites/default/includes/modules/settings.robotstxt.php (1 hunks)
💤 Files with no reviewable changes (2)
  • config/stage/robotstxt.settings.yml
  • config/dev/robotstxt.settings.yml
🧰 Additional context used
📓 Path-based instructions (1)
tests/behat/**/*.feature

📄 CodeRabbit Inference Engine (CLAUDE.md)

tests/behat/**/*.feature: All Behat feature files must use the standard user story format: 'As a [user type]\nI want to [action]\nSo that [benefit]'
Use consistent user types in Behat feature files: 'site visitor', 'site administrator', and 'content editor'
Do not use commas or periods in user story statements within Behat feature files

Files:

  • tests/behat/features/robotstxt.feature
🧠 Learnings (2)
📚 Learning: 2025-08-06T02:21:34.266Z
Learnt from: CR
PR: drevops/website#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-06T02:21:34.266Z
Learning: Applies to tests/behat/**/*.feature : All Behat feature files must use the standard user story format: 'As a [user type]\nI want to [action]\nSo that [benefit]'

Applied to files:

  • tests/behat/features/robotstxt.feature
📚 Learning: 2025-08-06T02:21:34.266Z
Learnt from: CR
PR: drevops/website#0
File: CLAUDE.md:0-0
Timestamp: 2025-08-06T02:21:34.266Z
Learning: Applies to tests/behat/**/*.feature : Use consistent user types in Behat feature files: 'site visitor', 'site administrator', and 'content editor'

Applied to files:

  • tests/behat/features/robotstxt.feature


@AlexSkrypnyk AlexSkrypnyk enabled auto-merge (rebase) August 19, 2025 08:30
@AlexSkrypnyk AlexSkrypnyk added the AUTOMERGE Pull request has been approved and set to automerge label Aug 19, 2025
coderabbitai bot left a comment

Actionable comments posted: 9

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Lite

📥 Commits

Reviewing files that changed from the base of the PR and between 144f7c1 and 1a180ad.

📒 Files selected for processing (8)
  • config/default/config_split.config_split.dev.yml (1 hunks)
  • config/default/config_split.config_split.test.yml (1 hunks)
  • config/default/robotstxt.settings.yml (1 hunks)
  • config/dev/robotstxt.settings.yml (0 hunks)
  • config/stage/robotstxt.settings.yml (0 hunks)
  • tests/behat/features/robotstxt.feature (1 hunks)
  • tests/phpunit/Drupal/EnvironmentSettingsTest.php (8 hunks)
  • web/sites/default/includes/modules/settings.robotstxt.php (1 hunks)
💤 Files with no reviewable changes (2)
  • config/dev/robotstxt.settings.yml
  • config/stage/robotstxt.settings.yml
🧰 Additional context used
📓 Path-based instructions (1)
tests/behat/**/*.feature

📄 CodeRabbit Inference Engine (CLAUDE.md)

tests/behat/**/*.feature: All Behat feature files must use the standard user story format: 'As a [user type]\nI want to [action]\nSo that [benefit]'
Use consistent user types in Behat feature files: 'site visitor', 'site administrator', and 'content editor'
Do not use commas or periods in user story statements within Behat feature files

Files:

  • tests/behat/features/robotstxt.feature
🔇 Additional comments (2)
web/sites/default/includes/modules/settings.robotstxt.php (2)

1-8: Header and strict_types placement look good

The file header is clear and declare(strict_types=1) is correctly positioned.


1-12: Confirm loading order and environment initialization

Based on your script results:

  • ENVIRONMENT_PROD is defined in web/sites/default/settings.php:131.
  • No explicit include of settings.robotstxt.php was found—verify how/when this file is pulled in.
  • No assignment to $settings['environment'] was detected under web/sites—ensure it’s populated before this snippet runs.
  • No other overrides of robots_txt.settings.content were found.

Please double-check that:

  1. settings.robotstxt.php is included after both the environment constant and $settings['environment'] are initialized.
  2. No downstream configuration rewrites robots_txt.settings.content in non-prod environments.
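
To re-run those checks locally, a quick sketch (assuming ripgrep is available; the patterns mirror the notes above):

#!/bin/bash
set -euo pipefail
# 1. Where, if anywhere, is the include pulled in?
rg -n "settings.robotstxt.php" web/sites
# 2. Is $settings['environment'] populated before the include runs?
rg -nF "settings['environment']" web/sites
# 3. Any other writers of the robots_txt.settings content?
rg -n "robots_txt\.settings" --type=php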

In config/default/config_split.config_split.dev.yml:

 complete_list: { }
-partial_list:
-  - robotstxt.settings
+partial_list: { }

💡 Verification agent

🧩 Analysis chain

Use a list for partial_list to avoid schema/type mismatches.

Config Split expects partial_list to be a sequence (list). Representing it as an empty mapping { } risks config import/schema validation issues. Prefer [].

Apply this diff:

-partial_list: {  }
+partial_list: [ ]

Run this to confirm consistency across the repo and spot any other occurrences using { }:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Occurrences of partial_list across config_split configs:"
rg -n -C1 -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*(\{\s*\}|\[\s*\])'

Length of output: 843


Standardize partial_list as an empty list in all Config Split files

Config Split requires partial_list to be a sequence, not a mapping. Defining it as { } may break import/schema validation. Please update every occurrence:

  • config/default/config_split.config_split.dev.yml (line 16)
  • config/default/config_split.config_split.test.yml (line 16)
  • config/default/config_split.config_split.local.yml (line 18)
  • config/default/config_split.config_split.ci.yml (line 18)

Apply this diff in each file:

-partial_list: {  }
+partial_list: []

(Optional) Re-run to confirm no remaining mappings:

rg -n -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*{'
🤖 Prompt for AI Agents
In config/default/config_split.config_split.dev.yml around line 16 (and
similarly in config/default/config_split.config_split.test.yml line 16,
config/default/config_split.config_split.local.yml line 18, and
config/default/config_split.config_split.ci.yml line 18), the key partial_list
is defined as a mapping ({ }) but Config Split requires a sequence; change each
occurrence from an empty mapping to an empty list (replace "{ }" with "[]") in
those files and then re-run the provided ripgrep check to confirm no remaining
mappings.

In config/default/config_split.config_split.test.yml:

 complete_list: { }
-partial_list:
-  - robotstxt.settings
+partial_list: { }

💡 Verification agent

🧩 Analysis chain

partial_list should be an empty list, not an empty mapping.

Same as in the dev split, prefer [] for an empty sequence to match Config Split’s expected type.

Apply this diff:

-partial_list: {  }
+partial_list: [ ]

Run this to confirm consistency across the repo:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n -C1 -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*(\{\s*\}|\[\s*\])'

Length of output: 720


Convert all partial_list mappings to empty sequences

We found 4 occurrences of partial_list: { }—it should be partial_list: [] to match Config Split’s expected sequence type:

  • config/default/config_split.config_split.test.yml:16
  • config/default/config_split.config_split.dev.yml:16
  • config/default/config_split.config_split.local.yml:18
  • config/default/config_split.config_split.ci.yml:18

Apply this patch in each file:

-partial_list: { }
+partial_list: []

Re-run to verify no mappings remain:

#!/usr/bin/env bash
set -euo pipefail
rg -n -C1 -g 'config/**/config_split.config_split.*.yml' -P '^\s*partial_list:\s*(\{\s*\}|\[\s*\])'
🤖 Prompt for AI Agents
In config/default/config_split.config_split.test.yml around line 16 (and the
other occurrences in config/default/config_split.config_split.dev.yml:16,
config/default/config_split.config_split.local.yml:18,
config/default/config_split.config_split.ci.yml:18), replace the mapping form
"partial_list: { }" with an empty sequence "partial_list: []" so the YAML type
matches Config Split’s expected sequence; update each file accordingly and
re-run the provided ripgrep verification to ensure no mappings remain.

In config/default/robotstxt.settings.yml:

 _core:
   default_config_hash: ceCx5XZ_ay1Mxcv-sB95U_fBKoVkpvo8RaQiwutSZLI
-content: "User-agent: *\r\nDisallow:"
+content: "User-agent: *\r\n# CSS, JS, Images\r\nAllow: /core/*.css$\r\nAllow: /core/*.css?\r\nAllow: /core/*.js$\r\nAllow: /core/*.js?\r\nAllow: /core/*.gif\r\nAllow: /core/*.jpg\r\nAllow: /core/*.jpeg\r\nAllow: /core/*.png\r\nAllow: /core/*.svg\r\nAllow: /profiles/*.css$\r\nAllow: /profiles/*.css?\r\nAllow: /profiles/*.js$\r\nAllow: /profiles/*.js?\r\nAllow: /profiles/*.gif\r\nAllow: /profiles/*.jpg\r\nAllow: /profiles/*.jpeg\r\nAllow: /profiles/*.png\r\nAllow: /profiles/*.svg\r\n# Directories\r\nDisallow: /core/\r\nDisallow: /profiles/\r\n# Files\r\nDisallow: /README.md\r\nDisallow: /composer/Metapackage/README.txt\r\nDisallow: /composer/Plugin/ProjectMessage/README.md\r\nDisallow: /composer/Plugin/Scaffold/README.md\r\nDisallow: /composer/Plugin/VendorHardening/README.txt\r\nDisallow: /composer/Template/README.txt\r\nDisallow: /modules/README.txt\r\nDisallow: /sites/README.txt\r\nDisallow: /themes/README.txt\r\n# Paths (clean URLs)\r\nDisallow: /admin/\r\nDisallow: /comment/reply/\r\nDisallow: /filter/tips\r\nDisallow: /node/add/\r\nDisallow: /search/\r\nDisallow: /user/register\r\nDisallow: /user/password\r\nDisallow: /user/login\r\nDisallow: /user/logout\r\nDisallow: /media/oembed\r\nDisallow: /*/media/oembed\r\n# Paths (no clean URLs)\r\nDisallow: /index.php/admin/\r\nDisallow: /index.php/comment/reply/\r\nDisallow: /index.php/filter/tips\r\nDisallow: /index.php/node/add/\r\nDisallow: /index.php/search/\r\nDisallow: /index.php/user/password\r\nDisallow: /index.php/user/register\r\nDisallow: /index.php/user/login\r\nDisallow: /index.php/user/logout\r\nDisallow: /index.php/media/oembed\r\nDisallow: /index.php/*/media/oembed\r\n"

💡 Verification agent

🧩 Analysis chain

Config name mismatch: robotstxt.settings vs robots_txt.settings will break PROD intent.

Tests and settings overrides use robots_txt.settings, but this file is robotstxt.settings.yml. In PROD, the intended content will not be applied if the config entity name doesn’t match. Standardize on robots_txt.settings.

Actions:

  • Rename the file to config/default/robots_txt.settings.yml (config name is derived from filename).
  • Ensure any code or config splits referencing robotstxt.settings are updated to robots_txt.settings.

Run this to verify current usage and spot mismatches:


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Config key usage:"
rg -n -C2 -P '\brobots?_txt\.settings\b|robotstxt\.settings\b'
echo
echo "Config files named for robots settings:"
fd -uu -t f -g 'config/**/robots*_*.settings.yml' || true
fd -uu -t f -g 'config/**/robotstxt.settings.yml' || true

Length of output: 5563


Fix config filename mismatch: rename to robots_txt.settings.yml

Tests and runtime code reference the config key robots_txt.settings, but the default file is named robotstxt.settings.yml. In production this means your robots.txt content will never be loaded.

• Rename

  • config/default/robotstxt.settings.yml → config/default/robots_txt.settings.yml
    • Update any environment-specific overrides or config splits (e.g. config/production/robotstxt.settings.yml) to use robots_txt.settings.yml
    • No code changes required—the PHP and PHPUnit code already uses robots_txt.settings
🤖 Prompt for AI Agents
In config/default/robotstxt.settings.yml around line 3, the filename is wrong
(robotstxt.settings.yml) so the application expects robots_txt.settings but
never loads this file; rename config/default/robotstxt.settings.yml to
config/default/robots_txt.settings.yml and update any environment-specific or
split config files (e.g. config/production/robotstxt.settings.yml) to the new
name so the existing PHP/PHPUnit config key robots_txt.settings resolves
correctly; no code changes required.

🧹 Nitpick (assertive)

Remove redundant Allow entries with '?' suffix.

Robots.txt supports * and $ (Google extension). ? is not a wildcard and will be treated as a literal, making lines like Allow: /*.css? ineffective and redundant when /*.css$ already exists.

Apply this replacement for the content string (removing the four ?-suffixed lines only):

-content: "User-agent: *\r\n# CSS, JS, Images\r\nAllow: /core/*.css$\r\nAllow: /core/*.css?\r\nAllow: /core/*.js$\r\nAllow: /core/*.js?\r\nAllow: /core/*.gif\r\nAllow: /core/*.jpg\r\nAllow: /core/*.jpeg\r\nAllow: /core/*.png\r\nAllow: /core/*.svg\r\nAllow: /profiles/*.css$\r\nAllow: /profiles/*.css?\r\nAllow: /profiles/*.js$\r\nAllow: /profiles/*.js?\r\nAllow: /profiles/*.gif\r\nAllow: /profiles/*.jpg\r\nAllow: /profiles/*.jpeg\r\nAllow: /profiles/*.png\r\nAllow: /profiles/*.svg\r\n# Directories\r\nDisallow: /core/\r\nDisallow: /profiles/\r\n# Files\r\nDisallow: /README.md\r\nDisallow: /composer/Metapackage/README.txt\r\nDisallow: /composer/Plugin/ProjectMessage/README.md\r\nDisallow: /composer/Plugin/Scaffold/README.md\r\nDisallow: /composer/Plugin/VendorHardening/README.txt\r\nDisallow: /composer/Template/README.txt\r\nDisallow: /modules/README.txt\r\nDisallow: /sites/README.txt\r\nDisallow: /themes/README.txt\r\n# Paths (clean URLs)\r\nDisallow: /admin/\r\nDisallow: /comment/reply/\r\nDisallow: /filter/tips\r\nDisallow: /node/add/\r\nDisallow: /search/\r\nDisallow: /user/register\r\nDisallow: /user/password\r\nDisallow: /user/login\r\nDisallow: /user/logout\r\nDisallow: /media/oembed\r\nDisallow: /*/media/oembed\r\n# Paths (no clean URLs)\r\nDisallow: /index.php/admin/\r\nDisallow: /index.php/comment/reply/\r\nDisallow: /index.php/filter/tips\r\nDisallow: /index.php/node/add/\r\nDisallow: /index.php/search/\r\nDisallow: /index.php/user/password\r\nDisallow: /index.php/user/register\r\nDisallow: /index.php/user/login\r\nDisallow: /index.php/user/logout\r\nDisallow: /index.php/media/oembed\r\nDisallow: /index.php/*/media/oembed\r\n"
+content: "User-agent: *\r\n# CSS, JS, Images\r\nAllow: /core/*.css$\r\nAllow: /core/*.js$\r\nAllow: /core/*.gif\r\nAllow: /core/*.jpg\r\nAllow: /core/*.jpeg\r\nAllow: /core/*.png\r\nAllow: /core/*.svg\r\nAllow: /profiles/*.css$\r\nAllow: /profiles/*.js$\r\nAllow: /profiles/*.gif\r\nAllow: /profiles/*.jpg\r\nAllow: /profiles/*.jpeg\r\nAllow: /profiles/*.png\r\nAllow: /profiles/*.svg\r\n# Directories\r\nDisallow: /core/\r\nDisallow: /profiles/\r\n# Files\r\nDisallow: /README.md\r\nDisallow: /composer/Metapackage/README.txt\r\nDisallow: /composer/Plugin/ProjectMessage/README.md\r\nDisallow: /composer/Plugin/Scaffold/README.md\r\nDisallow: /composer/Plugin/VendorHardening/README.txt\r\nDisallow: /composer/Template/README.txt\r\nDisallow: /modules/README.txt\r\nDisallow: /sites/README.txt\r\nDisallow: /themes/README.txt\r\n# Paths (clean URLs)\r\nDisallow: /admin/\r\nDisallow: /comment/reply/\r\nDisallow: /filter/tips\r\nDisallow: /node/add/\r\nDisallow: /search/\r\nDisallow: /user/register\r\nDisallow: /user/password\r\nDisallow: /user/login\r\nDisallow: /user/logout\r\nDisallow: /media/oembed\r\nDisallow: /*/media/oembed\r\n# Paths (no clean URLs)\r\nDisallow: /index.php/admin/\r\nDisallow: /index.php/comment/reply/\r\nDisallow: /index.php/filter/tips\r\nDisallow: /index.php/node/add/\r\nDisallow: /index.php/search/\r\nDisallow: /index.php/user/password\r\nDisallow: /index.php/user/register\r\nDisallow: /index.php/user/login\r\nDisallow: /index.php/user/logout\r\nDisallow: /index.php/media/oembed\r\nDisallow: /index.php/*/media/oembed\r\n"
🤖 Prompt for AI Agents
In config/default/robotstxt.settings.yml around line 3, the robots content
includes redundant Allow entries that end with a literal '?' (e.g. Allow:
/core/*.css? and similar) which are ineffective and duplicate the corresponding
regex-anchored rules (/*.css$ and /*.js$); remove the four ?-suffixed Allow
lines (the Allow lines for /core/*.css?, /core/*.js?, /profiles/*.css?,
/profiles/*.js?) from the content string so only the valid $-anchored Allow
rules remain.


In tests/behat/features/robotstxt.feature:

 @api
-Scenario: Verify robots.txt exists and contains appropriate content in production
+Scenario: Verify robots.txt exists and contains appropriate content in non-production

🛠️ Refactor suggestion

Scenario rename aligns with non-production behavior — update user story to meet guidelines.

Per coding guidelines, use the standard user story format and allowed user types, and avoid punctuation.

Update the header lines as shown:

As a site visitor
I want to access the robots txt file
So that search engines crawl and index the site correctly
🤖 Prompt for AI Agents
In tests/behat/features/robotstxt.feature around line 9, the scenario header
should follow the standard user story format with an allowed user type and no
punctuation; replace the current header lines with three lines: "As a site
visitor", "I want to access the robots txt file", and "So that search engines
crawl and index the site correctly" ensuring exact wording and no punctuation,
one line per statement at the top of the feature file.
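
Putting the guideline and the rename together, the top of the feature file could read as follows (the feature title is a placeholder and the scenario steps are omitted; only the user story lines and scenario name come from this review):

Feature: Robots.txt

  As a site visitor
  I want to access the robots txt file
  So that search engines crawl and index the site correctly

  @api
  Scenario: Verify robots.txt exists and contains appropriate content in non-production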

In tests/phpunit/Drupal/EnvironmentSettingsTest.php:

 $config['environment_indicator.indicator']['name'] = static::ENVIRONMENT_SUT;
 $config['environment_indicator.settings']['favicon'] = TRUE;
 $config['environment_indicator.settings']['toolbar_integration'] = [TRUE];
+$config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";

💡 Verification agent

🧩 Analysis chain

Non-PROD robots snippet assertions look correct — verify config name consistency across the codebase.

The tests assert robots_txt.settings.content = "User-agent: *\r\nDisallow:" across non-prod environments, which aligns with the new settings include behavior. However, the default config file is named config/default/robotstxt.settings.yml (without underscore). This mismatch would prevent PROD from receiving the intended content.

Actions:

  • Rename config/default/robotstxt.settings.yml to config/default/robots_txt.settings.yml (or adjust the settings include/tests to use robotstxt.settings, but standardizing on robots_txt.settings is recommended).
  • Confirm there are no remaining references to robotstxt.settings.

Script to verify:

If you want, I can provide a patchset that renames the file and updates any references.

Also applies to: 330-330, 377-377, 423-423, 472-472, 521-521, 572-572, 623-623


🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "Searching for config key usages:"
rg -n -C2 -P '\brobots?_txt\.settings\b|robotstxt\.settings\b'
echo
echo "Checking presence of config files:"
fd -uu -t f 'robots_txt.settings.yml' config/ || true
fd -uu -t f 'robotstxt.settings.yml' config/ || true

Length of output: 5608


Ensure config filename matches the robots_txt.settings key

The code and tests reference $config['robots_txt.settings'], but the default config file is named robotstxt.settings.yml. This mismatch means PROD will never load your robots.txt settings.

Actions required:

  • Rename config/default/robotstxt.settings.yml → config/default/robots_txt.settings.yml
  • Update any references to robotstxt.settings (tests, docs, includes) to use robots_txt.settings
  • Confirm there are no remaining robotstxt.settings usages via a grep/ripgrep scan
🤖 Prompt for AI Agents
In tests/phpunit/Drupal/EnvironmentSettingsTest.php around line 260, the test
sets $config['robots_txt.settings'] but the repository's default config file is
named robotstxt.settings.yml causing a mismatch; rename
config/default/robotstxt.settings.yml to config/default/robots_txt.settings.yml,
update all references (tests, docs, includes, variable names) from
robotstxt.settings to robots_txt.settings, and run a project-wide grep/ripgrep
to confirm no remaining robotstxt.settings usages.


In web/sites/default/includes/modules/settings.robotstxt.php:

declare(strict_types=1);

if ($settings['environment'] !== ENVIRONMENT_PROD) {

🛠️ Refactor suggestion

Harden against undefined constant and missing setting

If ENVIRONMENT_PROD isn’t defined before this include, PHP 8 will error. Also, if $settings['environment'] is not set, you’ll get a notice. Guard both for robustness.

Apply this diff:

-if ($settings['environment'] !== ENVIRONMENT_PROD) {
+if (!defined('ENVIRONMENT_PROD') || (($settings['environment'] ?? null) !== ENVIRONMENT_PROD)) {
🤖 Prompt for AI Agents
In web/sites/default/includes/modules/settings.robotstxt.php around line 10, the
conditional uses ENVIRONMENT_PROD and $settings['environment'] directly which
can raise an undefined constant or undefined index notice; update the condition
to first check defined('ENVIRONMENT_PROD') and isset($settings['environment'])
(or array_key_exists) before comparing, e.g. only evaluate the comparison when
both exist, and treat missing values as non-production by default so the block
runs safely when either is absent.

Comment on lines +10 to +12
if ($settings['environment'] !== ENVIRONMENT_PROD) {
  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
}

💡 Verification agent

🧩 Analysis chain

Non-prod robots.txt currently allows crawling; confirm intent or change to disallow all

In robots.txt, “Disallow:” with an empty value means “allow everything.” For non-production, best practice is to block indexing to avoid leaking staging/dev content into search engines. If the current behavior is intentional, ignore this. Otherwise, switch to “Disallow: /”.

Apply this diff to disallow all in non-production:

-  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
+  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow: /";
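
For reference, the two forms side by side; the empty form is valid robots.txt but permissive:

# Allows all crawling (an empty Disallow means no restriction):
User-agent: *
Disallow:

# Blocks all crawling:
User-agent: *
Disallow: /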

🏁 Script executed:

#!/bin/bash
set -e

# Locate the robots.txt settings file
echo "Searching for settings.robotstxt.php..."
fd -HI "settings.robotstxt.php"

# Search for references to robots_txt.settings in the repo
echo -e "\nSearching for robots_txt.settings assignments..."
rg -n --type=php "robots_txt\.settings"

# Show the first 50 lines of the file for context
echo -e "\nDisplaying the first 50 lines of settings.robotstxt.php..."
sed -n '1,50p' web/sites/default/includes/modules/settings.robotstxt.php

Length of output: 1939


Update non-prod robots.txt and corresponding tests to block all crawling

The current implementation and existing PHPUnit tests expect:

$config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";

Empty “Disallow:” allows all bots. To enforce “no crawling” on non-prod, change it to “Disallow: /” and update every test asserting the old value:

File: web/sites/default/includes/modules/settings.robotstxt.php

-if ($settings['environment'] !== ENVIRONMENT_PROD) {
-  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";
-}
+if ($settings['environment'] !== ENVIRONMENT_PROD) {
+  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow: /";
+}

Tests to update:
tests/phpunit/Drupal/EnvironmentSettingsTest.php
At lines approx. 260, 330, 376, 423, 472, 521, 572, and 623, replace:

"User-agent: *\r\nDisallow:";

with

"User-agent: *\r\nDisallow: /";

This ensures both behavior and tests remain in sync.

🤖 Prompt for AI Agents
In web/sites/default/includes/modules/settings.robotstxt.php around lines 10–12
the non-production robots.txt sets "Disallow:" which allows crawling; change the
value to "User-agent: *\r\nDisallow: /" to block all crawling, and update the
PHPUnit assertions in tests/phpunit/Drupal/EnvironmentSettingsTest.php at the
specified approximate lines (260, 330, 376, 423, 472, 521, 572, 623) to expect
"User-agent: *\r\nDisallow: /" instead of the old string so tests match the new
behavior.

In web/sites/default/includes/modules/settings.robotstxt.php:

declare(strict_types=1);

if ($settings['environment'] !== ENVIRONMENT_PROD) {
  $config['robots_txt.settings']['content'] = "User-agent: *\r\nDisallow:";

🧹 Nitpick (assertive)

Line endings in content: verify CRLF requirement

You’re using CRLF “\r\n”. If the default config or downstream consumers normalize to “\n”, this may create unexpected diffs or mismatches across environments. If CRLF is deliberate (e.g., to match other fixtures/tests), keep it; otherwise consider “\n”.

🤖 Prompt for AI Agents
In web/sites/default/includes/modules/settings.robotstxt.php around line 11, the
robots_txt.settings content uses CRLF ("\r\n"); confirm whether CRLF is required
by downstream consumers or tests—if not, replace "\r\n" with "\n" to avoid
cross-environment diffs, otherwise keep "\r\n" and add a short code comment
explaining the deliberate CRLF choice so future reviewers know it is
intentional.
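
If the concern is mismatches rather than the convention itself, tests can also compare line-ending-insensitively instead of hard-coding CRLF. A minimal PHPUnit-style sketch, with variable names assumed:

// Normalize CRLF to LF before comparing, so the assertion is agnostic
// to whichever line-ending convention the config ends up using.
$normalize = static fn (string $s): string => str_replace("\r\n", "\n", $s);
$this->assertSame($normalize($expected), $normalize($actual));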

@AlexSkrypnyk AlexSkrypnyk force-pushed the feature/fix-robotstxt-prod branch from 1a180ad to 5f2560a Compare August 19, 2025 13:02
@AlexSkrypnyk AlexSkrypnyk force-pushed the feature/fix-robotstxt-prod branch from 5f2560a to d9544af Compare August 19, 2025 13:42
@AlexSkrypnyk AlexSkrypnyk merged commit fb20cfa into develop Aug 19, 2025
7 checks passed
@AlexSkrypnyk AlexSkrypnyk deleted the feature/fix-robotstxt-prod branch August 19, 2025 13:53

Labels

AUTOMERGE Pull request has been approved and set to automerge


Development

Successfully merging this pull request may close these issues.

Incorrect robots.txt served in PROD
