Delete search index entries under more specific conditions #7266

Merged
merged 9 commits into from
Jun 5, 2024

@fritzmg fritzmg commented Jun 5, 2024

In a customer's Contao instance where indexing of protected pages is enabled, I noticed the following issue: if the URL of a protected, already indexed page is requested without a valid login for that page, the URL gets deleted from the search index.

To fix this, I first thought of ignoring the 401 and 403 status codes in our SearchIndexListener for the delete operation. However, I then realized that the same holds true for other status codes: a 503 is only temporary, and a URL that momentarily responds with it should not be removed from the index (Google would not remove it either; it only drops a URL from its index if the status code persists over a longer period of time).

Then I realized the same holds true for the 500 status code: an error occurring under a specific URL might only be temporary, so the URL should not be removed from the index in that case either.

So I decided to completely revamp the conditions under which a URL is deleted from the index:

  • If the status code is 404 or 410, always delete from the index.
  • If the response is successful and the X-Robots-Tag header contains noindex, always delete from the index.
  • If the response is successful and the HTML contains a meta robots tag with noindex, always delete from the index.
  • Otherwise, never delete from the index automatically, as any other state might just be temporary for any given URL.
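
The four conditions above can be sketched as a single decision function. This is a minimal illustration in plain PHP, not the actual SearchIndexListener code: the function name `shouldDeleteFromIndex()`, its parameters, and the regex-based meta tag check are all assumptions made for the example. The 2xx range check mirrors what Symfony's `Response::isSuccessful()` tests.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Contao): decides whether a URL should be
 * removed from the search index based on the response it produced.
 *
 * @param array<string, string> $headers Response headers (name => value)
 */
function shouldDeleteFromIndex(int $statusCode, array $headers, string $body): bool
{
    // 404 and 410 mean the resource is gone: always delete.
    if (404 === $statusCode || 410 === $statusCode) {
        return true;
    }

    // Any other non-2xx state (401, 403, 500, 503, ...) might be temporary:
    // never delete automatically.
    if ($statusCode < 200 || $statusCode >= 300) {
        return false;
    }

    // Successful response with "noindex" in the X-Robots-Tag header.
    // Header names are case-insensitive, so compare them accordingly.
    foreach ($headers as $name => $value) {
        if (0 === strcasecmp((string) $name, 'X-Robots-Tag')
            && false !== stripos((string) $value, 'noindex')
        ) {
            return true;
        }
    }

    // Successful response with a <meta name="robots"> tag containing
    // "noindex". A simplified regex check, assumed here for brevity.
    if (preg_match_all('/<meta\b[^>]*>/i', $body, $matches)) {
        foreach ($matches[0] as $tag) {
            if (preg_match('/\bname\s*=\s*["\']?robots\b/i', $tag)
                && false !== stripos($tag, 'noindex')
            ) {
                return true;
            }
        }
    }

    return false;
}
```

With this sketch, a 403 for a protected page or a 500 that happens to carry `X-Robots-Tag: noindex` leaves the index entry alone, while a 404 or a successful response marked noindex removes it.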

@fritzmg fritzmg added the bug label Jun 5, 2024
@fritzmg fritzmg added this to the 4.13 milestone Jun 5, 2024
@fritzmg fritzmg requested review from ausi and Toflar June 5, 2024 13:16
@fritzmg fritzmg self-assigned this Jun 5, 2024

ausi commented Jun 5, 2024

> If the X-Robots-Tag contains noindex, always delete from the index.

Shouldn’t we also check $response->isSuccessful() here? Meaning: if the response is successful AND the X-Robots-Tag header contains noindex, we should delete from the index, but not otherwise.


fritzmg commented Jun 5, 2024

> Shouldn’t we also check $response->isSuccessful() here? Meaning: if the response is successful AND the X-Robots-Tag header contains noindex, we should delete from the index, but not otherwise.

Hmm, do you mean in case a controller only sends this header during a 500 response for example? I guess that makes sense.

fritzmg and others added 2 commits June 5, 2024 14:26
Co-authored-by: Martin Auswöger <martin@auswoeger.com>
@fritzmg fritzmg requested a review from ausi June 5, 2024 13:36
@fritzmg fritzmg changed the title Only delete search index entries under more specific conditions Delete search index entries under more specific conditions Jun 5, 2024
Toflar previously approved these changes Jun 5, 2024

@Toflar Toflar left a comment

LGTM

ausi previously approved these changes Jun 5, 2024
@fritzmg fritzmg dismissed stale reviews from ausi and Toflar via 41d2dc7 June 5, 2024 15:10
fritzmg and others added 3 commits June 5, 2024 16:16
Co-authored-by: Leo Feyer <1192057+leofeyer@users.noreply.github.com>
@leofeyer leofeyer enabled auto-merge (squash) June 5, 2024 15:21

leofeyer commented Jun 5, 2024

Thank you @fritzmg.

@leofeyer leofeyer merged commit 49df784 into contao:4.13 Jun 5, 2024
15 of 16 checks passed