Implement a search indexer abstraction #730
Looks pretty good overall.
I guess `TL_NOINDEX_KEYS` exists to prevent flooding the search index with duplicate or irrelevant search entries. We should preserve the functionality, although I agree that we should generate a `noindex` tag in this case.
I think dropping `TL_NOINDEX_KEYS` could lead to problems. Especially calendar parameters like `day`, `month` and `year` could result in a large number of duplicate entries for the same page.
Once we have a better way to detect such “duplicates” we can remove `TL_NOINDEX_KEYS` IMO.
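To make the trade-off discussed above concrete, here is a minimal sketch of how a page could emit a `noindex` robots tag whenever one of the duplicate-producing query parameters is present. The function name `robotsMetaTag()` and the key list are hypothetical and only illustrate the idea; this is not how the core implements it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the PR): returns a robots meta tag if the
 * query contains a parameter that would only produce duplicates of a page.
 *
 * @param array<string, mixed> $query       Parsed query parameters, e.g. $_GET
 * @param string[]             $noIndexKeys Parameter names that should prevent indexing
 */
function robotsMetaTag(array $query, array $noIndexKeys): string
{
    // Keep only the query entries whose key is one of the no-index keys
    $matches = array_intersect_key($query, array_flip($noIndexKeys));

    // No match means the page may be indexed, so no tag is needed
    return [] === $matches ? '' : '<meta name="robots" content="noindex">';
}
```

With the calendar keys from the comment above (`day`, `month`, `year`), a request carrying a `month` parameter would get the tag, while a request with only a `page` parameter would not.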
All comments addressed. Ready for another round of reviews.
LGTM 🎉
Apart from a rebase to |
Force-pushed: 8f8c3de to aa70f92
Merged latest master into this PR and adjusted the configuration section accordingly. Should be all ready to merge now :)
Very good job! Only the service IDs seem a little inconsistent to me (see my comments).
core-bundle/src/DependencyInjection/Compiler/SearchIndexerPass.php
Do we really need the sub-namespace?

    namespace Contao\CoreBundle\Search\Indexer;

    class DefaultIndexer
    {
    }

Will there be a large number of different indexer classes? And what other sub-namespaces will we have in the future?
I don't know. Maybe an |
Force-pushed: dcc0864 to fba751c
Force-pushed: 1b0269f to 1d73d45
Thank you very much @Toflar.
Description
-----------

This is a follow-up PR for #730. I've introduced a search indexer abstraction there but clearing a single URI was not part of it. I've extended the interface which now also contains a `clearDocument()` method (no BC break, 4.9 is still in development) and I've extracted the logic into its own listener so that it doesn't just work for our own `PageError404` but for any exception.

Commits
-------

44fd9aa Delete invalid URLs using the new search indexer abstraction
550141c Fixed comment
a2eb4d7 Only handle responses that contained any JSON LD data
1195a3f Fix the coding style
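The follow-up described above could look roughly like the sketch below. Only the method name `clearDocument()` comes from the description; its signature, the `InMemoryIndexer` and `ClearInvalidUrlListener` classes, and the choice of status codes 404 and 410 are assumptions made for illustration. The real listener hooks into the framework's exception handling, which is left out here.

```php
<?php

declare(strict_types=1);

// Simplified interface; the string-based signatures are assumptions
interface IndexerInterface
{
    public function index(string $uri, string $body): void;

    public function clearDocument(string $uri): void;
}

// Hypothetical array-backed indexer so the sketch runs without a database
final class InMemoryIndexer implements IndexerInterface
{
    /** @var array<string, string> */
    public array $documents = [];

    public function index(string $uri, string $body): void
    {
        $this->documents[$uri] = $body;
    }

    public function clearDocument(string $uri): void
    {
        unset($this->documents[$uri]);
    }
}

// Hypothetical listener: removes a URI from the index when the response
// shows that the URL is no longer valid, regardless of what caused it
final class ClearInvalidUrlListener
{
    private IndexerInterface $indexer;

    public function __construct(IndexerInterface $indexer)
    {
        $this->indexer = $indexer;
    }

    public function onResponse(string $uri, int $statusCode): void
    {
        // Assumption: "not found" and "gone" count as invalid URLs
        if (404 === $statusCode || 410 === $statusCode) {
            $this->indexer->clearDocument($uri);
        }
    }
}
```

Because the listener only depends on the interface, any registered indexer gets the chance to drop the stale entry, not just the core one.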
This PR implements a search indexer abstraction level so that one can have additional search indexers. The core one can also be disabled completely by configuring
Here are some key concepts:
- There is an `IndexerInterface` now. I've implemented a `DelegatingIndexer` that just forwards to all registered indexers so we can have multiple indexers.
- `Document` represents a URI, response status code, headers and the body. For meta data I chose to use `application/ld+json` scripts because they are designed for exactly this use case (also see schema.org).
- I've dropped `$GLOBALS['TL_NOINDEX_KEYS']` because they are just plain nonsense. If you don't want the page to be indexed when these parameters are set, you have to configure the page to have a `<meta name="robots" content="noindex">` tag. Otherwise neither any real search engine nor my planned indexer will have a chance to find out what you want to do. Also, why would you not want to index pages with a `page` parameter present? Maybe that's just fine for some cases.

All unit tests etc. are already done. So this PR is in a final state to be reviewed 😊
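The concepts above can be sketched as follows. The class names `IndexerInterface`, `DelegatingIndexer` and `Document` come from the description; every method name and signature, the `CollectingIndexer` class and the regex-based JSON-LD extraction are simplifying assumptions and not the code in this PR.

```php
<?php

declare(strict_types=1);

// Value object for a fetched page; the constructor arguments are assumptions
final class Document
{
    private string $uri;
    private int $statusCode;
    private array $headers;
    private string $body;

    public function __construct(string $uri, int $statusCode, array $headers = [], string $body = '')
    {
        $this->uri = $uri;
        $this->statusCode = $statusCode;
        $this->headers = $headers;
        $this->body = $body;
    }

    public function getUri(): string
    {
        return $this->uri;
    }

    /**
     * Returns the decoded contents of all application/ld+json scripts.
     * A regex keeps the sketch short; a real implementation would use a DOM parser.
     */
    public function extractJsonLdScripts(): array
    {
        $pattern = '#<script[^>]+type=["\']application/ld\+json["\'][^>]*>(.*?)</script>#is';
        preg_match_all($pattern, $this->body, $matches);

        $result = [];

        foreach ($matches[1] as $json) {
            $data = json_decode($json, true);

            // Skip scripts that do not contain a valid JSON object or array
            if (\is_array($data)) {
                $result[] = $data;
            }
        }

        return $result;
    }
}

interface IndexerInterface
{
    public function index(Document $document): void;
}

// Forwards every document to all registered indexers
final class DelegatingIndexer implements IndexerInterface
{
    /** @var IndexerInterface[] */
    private array $indexers = [];

    public function addIndexer(IndexerInterface $indexer): void
    {
        $this->indexers[] = $indexer;
    }

    public function index(Document $document): void
    {
        foreach ($this->indexers as $indexer) {
            $indexer->index($document);
        }
    }
}

// Hypothetical indexer that only records which URIs it has seen
final class CollectingIndexer implements IndexerInterface
{
    /** @var string[] */
    public array $uris = [];

    public function index(Document $document): void
    {
        $this->uris[] = $document->getUri();
    }
}
```

Since `DelegatingIndexer` itself implements `IndexerInterface`, callers only ever talk to one indexer, and disabling or adding a concrete indexer becomes a matter of which instances are registered.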