Search engines index non-local communities, leading to undesirable results #3098

Closed
binwiederhier opened this issue Jun 14, 2023 · 3 comments
Labels
bug Something isn't working

Comments


binwiederhier commented Jun 14, 2023

Thank you for your fantastic work on Lemmy. I love it!

Issue Summary

With Lemmy's default robots.txt and meta tags, search engines index non-local communities as well as local ones. As a result, unrelated or unwanted content from other instances shows up in search results as if it belonged to your instance.

Example: [screenshot not included]

Steps to Reproduce

Open Google and search for: <your instance> north korea

Suggested remediation/feature

I think indexing of non-local communities should be opt-in, e.g. a checkbox: [ ] Allow search engines to index non-local communities

Temporary workaround

I added this to my nginx config to prevent search engines from indexing the entire site:

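# Disallow all search engines
location = /robots.txt {
    # default_type sets the Content-Type of the returned body;
    # add_header would send a second Content-Type header instead of replacing it
    default_type text/plain;
    return 200 "User-agent: *\nDisallow: /\n";
}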
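
One way to sanity-check the body this rule returns is to feed it to Python's standard-library robots.txt parser. This is only a sketch: the helper name `is_allowed` and the `lemmy.example` / `other.example` URLs are placeholders, not anything from Lemmy itself.

```python
from urllib.robotparser import RobotFileParser

# The exact body served by the nginx rule above.
ROBOTS_BODY = "User-agent: *\nDisallow: /\n"


def is_allowed(user_agent: str, url: str, robots_body: str = ROBOTS_BODY) -> bool:
    """Return True if robots_body lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Placeholder URL for a non-local community page.
    print(is_allowed("Googlebot", "https://lemmy.example/c/news@other.example"))  # False
```

With `Disallow: /` every path is blocked for every user agent, so local communities disappear from search results along with the non-local ones.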
binwiederhier added the bug label Jun 14, 2023
@jcgurango

Correct me if I'm wrong, but I believe this is what canonical URLs are for. Lemmy could use one to point each federated post at its copy on the original instance.
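
As a rough illustration of the canonical-URL idea, here is a sketch that builds the tag for a page's head. It assumes the URL of the post on its original instance is available; the function name `canonical_link_tag` and the parameter `origin_url` are hypothetical, not part of Lemmy.

```python
from html import escape


def canonical_link_tag(origin_url: str) -> str:
    """Build a <link rel="canonical"> tag pointing at the post's original instance."""
    # Escape the URL so it is safe inside a double-quoted HTML attribute.
    return f'<link rel="canonical" href="{escape(origin_url, quote=True)}">'


if __name__ == "__main__":
    # Placeholder URL for a post hosted on another instance.
    print(canonical_link_tag("https://other.example/post/1"))
```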

@kevinmershon

@binwiederhier I think this may more appropriately be resolved within lemmy-ui: https://github.com/LemmyNet/lemmy-ui/pull/401/files


binwiederhier commented Jun 14, 2023

@jcgurango I think that would certainly help, but honestly I don't want these pages to be associated with my instance at all. So a robots noindex meta tag is preferable for them.
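
The noindex preference, combined with the opt-in setting suggested earlier, could be sketched roughly as follows. This is not how lemmy-ui decides anything: the function `robots_meta_tag`, its parameters, and the example hostnames are all hypothetical, and it assumes a community's home instance can be read from its URL.

```python
from urllib.parse import urlsplit


def robots_meta_tag(community_url: str, local_host: str, index_remote: bool = False) -> str:
    """Return a noindex meta tag for non-local communities, or "" if indexing is fine.

    index_remote stands in for the proposed opt-in setting and defaults to off.
    """
    community_host = (urlsplit(community_url).hostname or "").lower()
    is_local = community_host == local_host.lower()
    if is_local or index_remote:
        return ""
    return '<meta name="robots" content="noindex">'


if __name__ == "__main__":
    # Non-local community viewed on lemmy.example: emits the noindex tag.
    print(robots_meta_tag("https://other.example/c/news", "lemmy.example"))
    # Local community: emits nothing.
    print(repr(robots_meta_tag("https://lemmy.example/c/news", "lemmy.example")))
```

A crawler has to be able to fetch a page to see this tag, so it would not combine with a robots.txt that disallows the whole site.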

@kevinmershon Oh I see. There are two issue trackers. I'll close this one and re-open over there.

Reported in LemmyNet/lemmy-ui#1275
