Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add robots.txt instructions #11693

Merged
merged 11 commits into from
Oct 16, 2023
Merged

Add robots.txt instructions #11693

merged 11 commits into from
Oct 16, 2023

Conversation

alecslupu
Copy link
Contributor

@alecslupu alecslupu commented Oct 1, 2023

🎩 What? Why?

In this PR we add some robots.txt ruleset, that would prevent search engine crawling certain pages from the platform ( search and profiles). as requested here

📌 Related Issues

Link your PR to an issue

Testing

  1. Make sure you run the bundle exec rails decidim:robots:replace
  2. Check your development_app robots.txt for the following lines
# the following adds a rule for all bots to not index any page that contains a profile or search
User-agent: *
Disallow: /profiles/
Disallow: /search
  1. Generate a new application & check the generated robots.txt file

📷 Screenshots

♥️ Thank you!

@alecslupu alecslupu marked this pull request as ready for review October 5, 2023 20:52
@alecslupu alecslupu requested a review from a team October 5, 2023 20:52
Copy link
Member

@andreslucena andreslucena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is already done in develop with #10120, what I had in mind is to do this directly in v0.27 (release/0.27-stable base branch) and backport to v0.26. Can you change the base branch? You probably need to start a new PR, I'm not too sure on how GH works with these kind of changes.

Also I was thinking in putting the instructions manually, but having the rake task is 👌🏽.

@alecslupu
Copy link
Contributor Author

As this is already done in develop with #10120, what I had in mind is to do this directly in v0.27 (release/0.27-stable base branch) and backport to v0.26. Can you change the base branch? You probably need to start a new PR, I'm not too sure on how GH works with these kind of changes.

Also I was thinking in putting the instructions manually, but having the rake task is 👌🏽.

Actually this is complementary to #10120 .

  • A robots.txt file controls crawling. It instructs robots (a.k.a. spiders) that are looking for pages to crawl to “keep out” of certain places. You place this file in your website’s root directory.
  • A noindex tag controls indexing. It tells spiders that the page should not be indexed. You place this tag in the code of the relevant web page. Here is an example of the tag:

In other words, if we just keep the meta tag, the search engines may still crawl those pages. Using robots.txt we tell the search engines that /profile or /search are "off limits"

RELEASE_NOTES.md Outdated Show resolved Hide resolved
@andreslucena
Copy link
Member

In other words, if we just keep the meta tag, the search engines may still crawl those pages. Using robots.txt we tell the search engines that /profile or /search are "off limits"

I have to admit that in my head these were interchangeable, and the main difference was if you had direct access to the application/HTML/views vs being able to just add/change a TXT file in the server. 🤷🏽

In our case, I think that both of these cases are good: this is more "hard" against the anti-spam measures, but as its actually a manual task that need to be run, and it could be missed by implementers when updating, the change from #10120 is like a safety net just in case.

Thanks for the explanation!

Copy link
Member

@andreslucena andreslucena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have some suggestions before merging this one

RELEASE_NOTES.md Outdated Show resolved Hide resolved
decidim-core/lib/tasks/decidim_robots.rake Outdated Show resolved Hide resolved
decidim-core/lib/tasks/decidim_robots.rake Outdated Show resolved Hide resolved
@alecslupu
Copy link
Contributor Author

@andreslucena let me know if there are any improvements to be done, or is a good to go.

Co-authored-by: Andrés Pereira de Lucena <andreslucena@users.noreply.github.com>
@andreslucena andreslucena added module: core type: change PRs that implement a change for an existing feature labels Oct 16, 2023
andreslucena
andreslucena previously approved these changes Oct 16, 2023
Copy link
Member

@andreslucena andreslucena left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I'll accept the changes so we don't break the generator CI and we can merge this bad boy

Co-authored-by: Alexandru Emil Lupu <contact@alecslupu.ro>
@andreslucena
Copy link
Member

Merging even though we have some failing specs, as they're expected

@andreslucena andreslucena merged commit 69b7169 into develop Oct 16, 2023
101 of 104 checks passed
@andreslucena andreslucena deleted the feature/add-robots-txt branch October 16, 2023 13:21
entantoencuanto added a commit that referenced this pull request Oct 17, 2023
* develop: (30 commits)
  Add `process-content` to erb-lint's deprecated classes (#11762)
  Add possibility of overriding the tailwind.config.js (#11763)
  Ask old password when changing email or password (#11737)
  Add Projects (Budgets) to filtered search (#11740)
  Fix missing results on Geocoded when search without diacritics (#11761)
  Add robots.txt instructions (#11693)
  Add missing activerecord budget locales for search (#11766)
  Improve design of Admin's Sidebar pages (#11759)
  Show small static map on admin's meetings index with big screens  (#11715)
  Remove "Manage" button when there's a Sidebar (#11717)
  Fix admin breadcrumb in Process (#11757)
  Apply new rubocop rules on erb - Layout/MultilineMethodCallIndentation (#11756)
  Remove xlarge-* references from admin forms (#11712)
  Apply new rubocop rules on erb - Argument identation (#11707)
  Update HERE API autocomplete (#11507)
  Admin redesign proposal issues (#11668)
  Redesign: responsive links on cards (#11538)
  Refactor CI pipelines (#11196)
  Update postcss and graphql to latest versions (#11733)
  Fix develop pipeline (#11750)
  ...
@ahukkanen
Copy link
Contributor

@alecslupu Is is by design that every time I want to create a test app or development app, I get this conflict that I need to choose an option for:

    conflict  public/robots.txt
Overwrite /.../decidim/spec/decidim_dummy_app/public/robots.txt? (enter "h" for help) [Ynaqdhm] 
       force  public/robots.txt

We can also see this same in the CI runs:

CI run

@alecslupu
Copy link
Contributor Author

@alecslupu Is is by design that every time I want to create a test app or development app, I get this conflict that I need to choose an option for:

    conflict  public/robots.txt
Overwrite /.../decidim/spec/decidim_dummy_app/public/robots.txt? (enter "h" for help) [Ynaqdhm] 
       force  public/robots.txt

We can also see this same in the CI runs:

CI run

@ahukkanen this was initially designed to be appended to the existing file, but @andreslucena requested something else.

i can patch it if @andreslucena agrees.

Please see for additional details #11693 (comment)

@ahukkanen
Copy link
Contributor

@ahukkanen this was initially designed to be appended to the existing file, but @andreslucena requested something else.

i can patch it if @andreslucena agrees.

Please see for additional details #11693 (comment)

OK, I personally find it cumbersome that I have to manually agree on the rewrite of this file during the application generation.

It makes total sense when we are modifying an existing application to have the user to confirm the change but when generating new applications, it just adds one extra step. Since it takes a while to generate the application, I generally start it and go to get coffee or something while I'm waiting for the application to be generated. Now I had to find out that it wasn't actually generated because it was stuck waiting for user input.

@andreslucena
Copy link
Member

It makes total sense when we are modifying an existing application to have the user to confirm the change but when generating new applications, it just adds one extra step.

I agree, that's a bug. For new apps this should be override without asking for confirmation. I didn't check that out until today and didn't have time to make the PR yet.

@andreslucena
Copy link
Member

Just to be clear:

  • For existing apps, ask for confirmation (as it's a possible destructive action)
  • For new apps, do it without confirmation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module: core type: change PRs that implement a change for an existing feature
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

None yet

3 participants