search index #7652
|
I tried that on the contao.org website, and that has to be fixed. How to reproduce:
The Problem is within the checksum check in The checksum of the pages Example from Custom navigation here: Solution: Always strip out the current url from |
|
I agree that this behavior should be changed; however, I don't think that stripping the URL is the correct solution. A URI string with variables might lead to a different output and should thus be indexed separately. @contao/developers /cc |
|
Why does a random suffix not result in a 404 in the first place? These nonsense pages shouldn't even get indexed? |
|
This is because the random string is treated as Therefore the "unused GET parameters" check does not respond with a 404 error. |
|
Maybe the comments? |
|
@fatcrobat you could still have page-relative links outside of those comments which would not change anything in that case then. |
|
Yes, you are right, invalid requests should always result in a 404.
|
|
I'm sure it is called somewhere in the regular program flow, like in |
|
It should set it back to unused then if it did not use it? I don't get why an invalid URL can be generated? |
|
Let me sum this up:
The only way to "solve" this issue, is not to index URLs with a query string at all. |
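As a rough illustration of that rule (a sketch only, not Contao's actual indexer; `is_indexable` is a hypothetical helper name), the check could look like this:

```python
from urllib.parse import urlsplit


def is_indexable(url: str) -> bool:
    """Hypothetical helper: accept only URLs that carry no query string."""
    # urlsplit() separates the query component from the path, so a "?" that
    # is followed by any characters yields a non-empty query and is rejected.
    return urlsplit(url).query == ""


# URLs taken from this thread:
print(is_indexable("https://contao.org/en/credits.html"))
print(is_indexable("https://contao.org/en/credits.html?a=register*"))
```

Under this assumption the first URL would be indexed and the second skipped. A bare trailing `?` with nothing after it has an empty query and would still pass.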
Imho, that's correct. Why would you index e.g. a However, as long as we support ID URL's like |
|
But those pages usually display the teaser of news or events and the words contained in the teaser might not show up in the detail/full view. |
|
Even if you removed the |
|
Also, the fact that |
|
Agree... the teaser is added to the meta description on the detail page. |
😄 |
This however means that things like figcaptions etc. from pages other than the first page in paginated galleries for example also cannot be found via Contao's search module as discussed in #6942. In order to capture those, the new search indexer (#242) would have to crawl URLs containing query strings as well, would it not? |
|
Gallery pagination is yet another example of doing something wrong. I don't see any valid point for offering pagination for galleries. Besides the fact that it is the wrong approach from a URI point of view it is also very user unfriendly because users cannot (on mobile phones) continue to swipe but are interrupted by a page load. Using images basically happens in two different scenarios: a) They provide more insight/additional info to existing content. In scenario a) obviously they are placed on that existing content page (be it a news entry or a regular page) and thus indexed automatically. Rule of thumb: Anything that should be indexed should have a permanent URL. |
Agreed. But on that note I want to point out again that with the current news (and calendar) module's implementation, not all news entries are indexed - only news entries with Though regarding the original point about pagination - assuming that it would make sense, for whatever reason, to index paginated content. Maybe paginations in Contao should be reworked in general? With a general pagination controller or something similar with which the following conditions would be implemented:
This way the search indexer can truly omit any URL consisting of query strings - and meta tags like |
If anyone can provide a real use case for indexing paginated content, we can discuss that matter. For now, I cannot see any reason at all.
This is exactly what I want to avoid because it's completely incorrect. |
Why is it incorrect? Assuming there is a use case, where you want paginated content to be indexed. |
|
We should not support something that is wrong even if the users want to do it. This use case does not exist for me. |
|
I see, you meant it's "wrong" since there is no use case (yet) that makes sense. |
|
Yes. And I'm pretty much convinced that there never will be any use case where indexing paginated content makes sense. Even if it did make sense to the user, it's most likely still wrong because it violates the URI principle. Every resource should have its own unique URI. Paged content is mixed content and does not belong in the search index :) |
|
Well, one use case is to index Contao news articles or events which have a |
|
Restricting search to just URLs without a query string wouldn't make much sense imho. In pseudocode:

    function index($url, $content) {
        $tokens = tokenize($content);
        $hash = hash($tokens);
        if(!index_contains($hash)) {
            index_add($url, $tokens);
            return;
        }
        $existingURL = index_url($hash);
        if(is_better_url($url, $existingURL)) {
            index_replace($url, $existingURL);
        }
    }

This would guarantee that identical content (from a searcher's PoV) is always listed in the search results under only one specific URL, and that this URL is the best URL known. |
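To make the idea concrete, here is a minimal runnable sketch of that content-hash approach. It is an illustration under stated assumptions, not Contao code: the `SearchIndex` class, the whitespace tokenizer, and the "better URL" rule (prefer URLs without a query string, then shorter ones) are all made up for the example.

```python
import hashlib


def tokenize(content: str) -> list[str]:
    # Assumption: lower-cased whitespace tokens are enough for the example.
    return content.lower().split()


def is_better_url(candidate: str, existing: str) -> bool:
    # Assumption: a URL without a query string beats one with a query string;
    # among equals, the shorter URL wins.
    return ("?" in candidate, len(candidate)) < ("?" in existing, len(existing))


class SearchIndex:
    """Hypothetical in-memory index keyed by a hash of the page tokens."""

    def __init__(self) -> None:
        self.url_by_hash: dict[str, str] = {}
        self.hash_by_url: dict[str, str] = {}
        self.tokens_by_hash: dict[str, list[str]] = {}

    def index(self, url: str, content: str) -> None:
        tokens = tokenize(content)
        digest = hashlib.sha256(" ".join(tokens).encode("utf-8")).hexdigest()

        # If this URL was indexed before with different content, drop the
        # stale entry so it no longer shows up under the old hash.
        old = self.hash_by_url.get(url)
        if old is not None and old != digest:
            del self.url_by_hash[old]
            del self.tokens_by_hash[old]
            del self.hash_by_url[url]

        existing = self.url_by_hash.get(digest)
        if existing is None or is_better_url(url, existing):
            if existing is not None:
                # Identical content is already known under a worse URL.
                del self.hash_by_url[existing]
            self.url_by_hash[digest] = url
            self.hash_by_url[url] = digest
            self.tokens_by_hash[digest] = tokens


idx = SearchIndex()
idx.index("https://example.com/news.html?page=1", "Hello World")
idx.index("https://example.com/news.html", "hello   world")
print(list(idx.url_by_hash.values()))
```

With these assumptions, both requests hash to the same tokens, so only the URL without the query string stays in the index. What counts as the "best" URL is the real design question, and the rule used here is just one possible choice.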
|
Whether a paginated page should be indexed or not can and should be decided by the user on a page-by-page basis with the options already present in the page edit view. |
|
We have debugged the issue in our Mumble call on March 24th. It is actually caused by the change language extension. If I disable it, the URL |
|
I have created a ticket: terminal42/contao-changelanguage#68 |
|
The other issue should be fixed in 5030ccf. @contao/developers Please review carefully :) |
### 4.1.3 (2016-04-22)

 * Use data URIs for the image preview in the back end.
 * Use DIRECTORY_SEPARATOR to convert kernel.cache_dir into a relative path (see #464).
 * Always trigger the "isVisibleElement" hook (see contao/core#8312).
 * Do not change all sessions when switching users (see contao/core#8158).
 * Do not allow to close fieldsets with empty required fields (see contao/core#8300).
 * Make the path related properties of the File class binary-safe (see contao/core#8295).
 * Correctly validate and decode IDNA e-mail addresses (see contao/core#8306).
 * Skip forward pages entirely in the book navigation module (see contao/core#5074).
 * Do not add the X-Priority header in the Email class (see contao/core#8298).
 * Determine the search index checksum in a more reliable way (see contao/core#7652).

https://contao.org/en/search.html?keywords=cms&x=0&y=0&page_s39=6
Why are these pages in the search index?
https://contao.org/en/credits.html?=depth=0=depth=1=depth=2=depth=3=depth=4=endflag
https://contao.org/en/credits.html?a=register*