Create redirects of docs.plone.org to 6.docs.plone.org #1496

Open
stevepiercy opened this issue May 13, 2023 · 10 comments

@stevepiercy
Contributor

stevepiercy commented May 13, 2023

Search engines have indexed docs.plone.org heavily. We need to set up 301 redirects on the server.

I don't have access to do this. @polyester @fredvd have access. We should also discuss how to discover 301 redirects. Matomo should be able to do it, according to this blog article, but I see no such report. It might be for the paid version only. We might need to parse server logs.
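If we do end up parsing server logs, a minimal sketch could look like the following. It assumes the web server writes the common "combined" access log format, which I have not verified for this server; the sample lines, paths, and IP addresses are made up for illustration.

```python
import re
from collections import Counter

# Matches the request line, status code, and referrer of a "combined" format
# access log entry. The log format is an assumption; adjust the pattern to
# whatever the real server writes.
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)


def count_paths_by_status(lines, status="404"):
    """Count how often each requested path was answered with `status`."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == status:
            counts[match.group("path")] += 1
    return counts


# Hypothetical log lines, for illustration only.
SAMPLE_LINES = [
    '203.0.113.5 - - [13/May/2023:10:00:00 +0000] "GET /develop/plone.api/docs/content.html HTTP/1.1" 404 512 "https://www.google.com/" "Mozilla/5.0"',
    '203.0.113.9 - - [13/May/2023:10:01:00 +0000] "GET /develop/plone.api/docs/content.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [13/May/2023:10:02:00 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]

for path, hits in count_paths_by_status(SAMPLE_LINES).most_common(20):
    print(hits, path)
```

Passing `status="301"` instead would list the paths that are currently being redirected, which is closer to what the Matomo report was supposed to provide.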

List of redirects (add more as they are discovered)

| Source | Target |
| --- | --- |
| https://docs.plone.org/develop/plone.api/docs/* | https://6.docs.plone.org/plone.api/* |

Reference: post by @1letter on Community Forum:

https://community.plone.org/t/plone-6-documentation-update-2023-05-12-plone-6-documentation-released/17451/2

@fredvd
Sponsor Member

fredvd commented May 13, 2023

I've chewed on this for a while today. @stevepiercy The 'crawling errors report' in that article is a promotion for the paid SearchEngineKeywordPerformance plugin, and it only works for Bing and Yahoo search, not for Google.

For some searches on Plone terms I already get 6.docs.plone.org results on Google. Within maybe a few weeks most links will already be reindexed and old suggestions dropped.

I thought about adding a custom 404 page where we track 404s using the instructions at https://matomo.org/faq/how-to/faq_60. But it doesn't work that way for us, because we use a tag container. There are instructions for that situation, where you check the page title: https://matomo.org/blog/2019/07/how-to-analyse-404-pages/

But that's not happening here: we redirect any docs.plone.org/(.*) to 6.docs.plone.org, so on the destination site the page view is for the homepage, which is not a 404. So to detect 'old' redirects on the target site, we need to change the current redirect so that it actually generates a 404 on the new docs site that we can track. Then a 404 report would help.

We could add/install Google Search Console access on the site to get more information from there. But I checked Search Console for another site I manage, and I don't think it will help. We would first need to register docs.plone.org in Google Search Console, but that URL is redirecting to ... 6.docs.plone.org. If we somehow fix that (disable the redirect, install a token for an account to please Search Console, enable it again), then you will probably just get a big report of all the pages from the old docs site that were once there and that Google can't find anymore. That doesn't give us any extra information.

The only way I've come up with so far to do this efficiently is to look up the 10-20 most used search terms in Google Search Console, check where Google would have sent people (while the wrongly indexed data is still there), and add manual redirects for those to our web server config. But for how many weeks and wrong searches will we make that effort?

The lowest-hanging fruit at the moment is probably adding a message on the homepage that asks people who arrived from a search result to repeat their search terms in the site's own search toolbar to find the new content. That is, by the way, something else we should validate in Matomo, if we register internal site search... Just checked: we didn't, because we used the default search parameters (Searchabletext). I have now updated the config parameter to 'q', so we can inspect what people are searching for using the site search.
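To illustrate what the 'q' setting implies, here is a minimal sketch of pulling the search term out of a page URL. It assumes, as described above, that the site search puts the term in a `q` query parameter; the example URL and the helper name are hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def extract_search_term(url, parameter="q"):
    """Return the site-search term carried in `parameter`, or None."""
    values = parse_qs(urlsplit(url).query).get(parameter)
    return values[0] if values else None


# Hypothetical search URL, for illustration only.
print(extract_search_term("https://6.docs.plone.org/search.html?q=plone.api"))
```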

@polyester
Sponsor Member

It's easy to redirect docs.plone.org/whatever not to 6.docs.plone.org but to 6.docs.plone.org/whatever, so that it would generate 404s. Would that be helpful? Any URL after 6.docs.plone.org that does not exist will show the custom error page at that non-existing URL.

Can see if I can do that tonight, but just had an 8hr bus journey and need to get some rest, and supplies for tonight as the Gay Olympics Eurovision is on, so may be tomorrow, but first think through if that is helpful.

@fredvd
Sponsor Member

fredvd commented May 13, 2023

@polyester Yes, changing the redirect to generate the 404 there, and then picking up the 404s in Matomo to see the original paths to the documents, is doable. But please don't activate it yet; let's first gather some more feedback.

The question is whether it is desirable to show 404s just to get that data in Matomo. And for how many weeks/searches are we organising this effort before Google's index has been updated and the broken inbound links from Google search results are gone anyway?

If we activate it, then it would be nice to have a note on the 404 page (instead of the homepage) that suggests using the internal site search to find what they're looking for. And/or also suggest searching on 5.docs.plone.org.

But we might get as many results from keeping an eye on Search Console search keywords for the next few weeks and adding redirects for the most popular destination pages, as I suggested. Although this procedure is fuzzier than seeing the actual 404s for page links.

@stevepiercy
Contributor Author

stevepiercy commented May 14, 2023

> The 'crawling errors report'...

Let's not do this.

> Google Search Console

Let's not do this.

> The question is whether it is desirable to show 404s just to get that data in Matomo.

I think it would be helpful to know what we don't know. Read on for why.

> And for how many weeks/searches are we organising this effort before Google's index has been updated and the broken inbound links from Google search results are gone anyway?

With 301 redirects, crawlers will "self-heal" faster, and to the correct destination, than they would without them, if they would at all. I think that is useful.

I am interested in checking Matomo periodically for a few months. Once the 301-to-404s drop off, I can do a reverse URL search on them to identify the sites that are still driving traffic to the old URL. If those external sites are under our control, then I'll fix them. For the dregs, such as personal bookmarks or third-party websites, I probably don't care enough but I will reserve judgment until I have the data.

The current 404 has a search field, just like the default page, so I am not terribly concerned about the inconvenience of the end user having to repeat their search on our site.

To summarize the actions I would like to see:

  • Update the default 404.html page by appending the text, " Use this site's search to find what you seek." See "Append search advice to 404 page", #1502.
  • Configure the web server with 301 redirects for specific URLs, currently only that one in the issue description. Others may be added in the future as reported and determined to be important.
  • For the non-matching remaining URLs, configure the web server with redirects of docs.plone.org/$foo to 6.docs.plone.org/$foo. 6.docs.plone.org/$foo will result in a 404, which in turn will be captured by Matomo.
  • Configure Matomo to capture the 404s.
  • Monitor Matomo periodically for 3 months.

How does that sound?

@mamico
Member

mamico commented May 26, 2023

My five cents. At the moment, Google, and its friends, still return many pages of the old documentation, like this one:

https://docs.plone.org/4/en/manage/installing/requirements.html

which now redirects to a 404 at

https://6.docs.plone.org/4/en/manage/installing/requirements.html

Isn't it better to redirect to something like:

https://4.docs.plone.org/manage/installing/requirements.html

and point out at the top of these pages that the 4/5 documentation is obsolete?

@stevepiercy
Contributor Author

@mamico thanks for the report. I agree that we need redirects for Plone 3, 4, and 5 docs.

For all of them, the version number moved from the path to the subdomain.

For 3 and 4, we should also drop the en from the URL.

Thus for 3 and 4:

https://docs.plone.org/3/en/foo
https://docs.plone.org/4/en/foo

...should redirect to:

https://3.docs.plone.org/foo
https://4.docs.plone.org/foo

And for v5:

https://docs.plone.org/5/foo

...should redirect to:

https://5.docs.plone.org/foo

@polyester and @fredvd do you agree?
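To make the proposed rules concrete, here is a minimal sketch of the mapping as a plain function. It is not the web server configuration itself, only a way to check the rules against sample URLs; it combines the plone.api rule from the issue description, the 3/4/5 rules above, and the catch-all to 6.docs.plone.org. The function name is made up.

```python
from urllib.parse import urlsplit, urlunsplit


def map_legacy_url(url):
    """Return the proposed redirect target for a docs.plone.org URL."""
    parts = urlsplit(url)
    path = parts.path
    api_prefix = "/develop/plone.api/docs/"
    if path.startswith(api_prefix):
        # Specific rule from the issue description.
        host = "6.docs.plone.org"
        new_path = "/plone.api/" + path[len(api_prefix):]
    elif path.startswith(("/3/en/", "/4/en/")):
        # Version moves from the path to the subdomain; "en" is dropped.
        host = path[1] + ".docs.plone.org"
        new_path = path[len("/3/en"):]
    elif path.startswith("/5/"):
        # Version moves from the path to the subdomain.
        host = "5.docs.plone.org"
        new_path = path[len("/5"):]
    else:
        # Catch-all: keep the path, so missing pages yield a trackable 404.
        host = "6.docs.plone.org"
        new_path = path or "/"
    return urlunsplit(("https", host, new_path, parts.query, parts.fragment))


print(map_legacy_url("https://docs.plone.org/4/en/manage/installing/requirements.html"))
print(map_legacy_url("https://docs.plone.org/5/foo"))
```

The order matters: the specific rules have to be tried before the catch-all, which is also how the rewrites would need to be ordered on the server.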

<6 documentation is still used by roughly 20% of visitors, based on less than a day of data collected. That's too large to call them obsolete. But I can update the 404 page message to indicate that the visitor may go to previous versions of the docs. See #1504.

We do have warnings on both 3 and 5, but not 4, about the latest versions of docs.

Also the warning on 3 points to 5 as the latest, and should instead point to 6.

@polyester would you be able to add a warning to 4 docs, and fix the warning on 3?

@polyester
Sponsor Member

polyester commented May 26, 2023

  • deployed the docs.plone.org/3/en, docs.plone.org/4/en and docs.plone.org/5 rewrites

  • Warnings on older versions: that takes a bit, as it has to be done on each and every page (the 3 and 4 versions are just a bunch of static pages now), and search/replace likes a laptop with some more oomph. I can do it tomorrow, and will probably have to hand-create an HTML snippet for 4.docs.plone.org.

@fredvd
Sponsor Member

fredvd commented May 26, 2023

@mamico Thanks for your feedback. To update this ticket: following the discussion here, we have implemented a 'tracking strategy' so that we can see which Google/search traffic hits which pages. Those hits now trigger a 404 that can be analysed in Matomo, so that we can add manual 'deeplink' paths to our redirects for the most popular destinations.

We didn't know before, because everything was redirected to the 6.docs.plone.org homepage.


Copied from a discussion on Thursday the 25th:

I have updated the tag container for the new docs website in Matomo according to https://matomo.org/blog/2019/07/how-to-analyse-404-pages/ and published a version 1.1

You have to play a bit with triggers.

There is now a second trigger that detects whether the page title is "Page not found".
Then you create a custom HTML tag that contains a default Matomo tracking snippet, which you can find in the website system configuration, but with some extra magic that adds a category to the page title and adds the referrer. Info here: https://matomo.org/faq/how-to/faq_60/

The custom 404 "page not found" trigger fires the 404 custom HTML tag. (This custom HTML tag is NOT the tag manager snippet, by the way; otherwise you might get some nice recursive behavior.)

And the important detail, there should always be one: you should also add the 404 trigger to the normal analytics tag, but then as an exclusion condition, so that both tags don't run at the same time.

It takes some time for Matomo to process things, so we might start to see results tomorrow. All this fancy tag manager stuff is necessary so that you don't have to change the documentation itself to have the snippet hardcoded only on the 404 page in the sphinx/myst templates. (That's the suggestion in https://matomo.org/faq/how-to/faq_60/.)
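Once the 404 page views arrive, the titles can be split back into the missing URL and the referrer. Below is a minimal sketch, assuming the custom tag sets the title in the form the Matomo FAQ suggests ('404/URL = ' plus the URL-encoded path, then '/From = ' plus the URL-encoded referrer). If the tag in our container builds the title differently, the prefix and separator need adjusting. The function name and the sample title are made up.

```python
from urllib.parse import unquote

# Title layout assumed from the Matomo FAQ; adjust if the tag differs.
PREFIX = "404/URL = "
SEPARATOR = "/From = "


def parse_404_title(title):
    """Split a tracked 404 page title into (missing_url, referrer).

    Returns None for titles that do not follow the assumed layout.
    """
    if not title.startswith(PREFIX) or SEPARATOR not in title:
        return None
    url_part, referrer_part = title[len(PREFIX):].split(SEPARATOR, 1)
    return unquote(url_part), unquote(referrer_part)


# Hypothetical page title, as it might appear in a Matomo export.
example = (
    "404/URL = %2F4%2Fen%2Fmanage%2Finstalling%2Frequirements.html"
    "/From = https%3A%2F%2Fwww.google.com%2F"
)
print(parse_404_title(example))
```

Counting the first element of each pair would give the most requested missing paths, which are the candidates for manual redirects.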

@fredvd
Sponsor Member

fredvd commented May 26, 2023

@stevepiercy I'm still considering whether we should register the docs.plone.org site for Search Console, so that we can log in there occasionally to see what search keywords people use before they are referred to our documentation sites. We'll have to place a small identifier snippet on docs.plone.org and serve it instead of redirecting. Then we have 'proven' that we own the domain, and it is added to Google Search Console.

Google is by far the most used search engine: in May alone, 1800 searches came from Google, while the runner-up is Bing with 57.

The search engines unfortunately don't pass the keywords along, so for Google, Google Search Console is the only way to get insight there. It has no GDPR implications; you get access to statistics they gather themselves from google.com usage.

@stevepiercy
Contributor Author

@fredvd Let's try to figure out whether using Google Search Console would help.

Advantages

  1. We can request Google to reindex our site faster.
  2. We can request temporary removal of pages from its index that should not be crawled.
  3. We could see which search terms are used, and populate the HTML meta tags on relevant pages to improve their indexing.
  4. Monitor pages that are included and excluded from indexing, and update our robots.txt accordingly.

Disadvantages

  1. Another service to install and monitor.

I think that is reason enough to enable it.

For which sites would you set up properties? docs.plone.org, 5.docs.plone.org, or 6.docs.plone.org?

I'll send you my Google Account privately, if you want to grant me access.

Projects
Status: In Progress