Clear invalid URLs using the new search indexer abstraction #887

Toflar · 2019-10-28T16:46:13Z

This is a follow-up PR for #730.
I introduced a search indexer abstraction there, but clearing a single URI was not part of it.
I've extended the interface, which now also contains a clearDocument() method (no BC break, since 4.9 is still in development), and I've extracted the logic into its own listener so that it works not just for our own PageError404 but for any exception.
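
To illustrate the shape of the change described above, here is a minimal sketch of an indexer abstraction with a per-document removal method. All names in it (`SketchDocument`, `SketchIndexerInterface`, `InMemoryIndexer`) are hypothetical and only mirror the idea of `clearDocument()`; they are not the actual Contao classes or signatures.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for a search document; only the URI and body matter here.
final class SketchDocument
{
    private string $uri;
    private string $body;

    public function __construct(string $uri, string $body)
    {
        $this->uri = $uri;
        $this->body = $body;
    }

    public function getUri(): string
    {
        return $this->uri;
    }

    public function getBody(): string
    {
        return $this->body;
    }
}

// Hypothetical interface: indexing plus removal of a single document.
interface SketchIndexerInterface
{
    public function index(SketchDocument $document): void;

    public function clearDocument(SketchDocument $document): void;
}

// Illustrative in-memory implementation keyed by URI.
final class InMemoryIndexer implements SketchIndexerInterface
{
    /** @var array<string, string> */
    private array $entries = [];

    public function index(SketchDocument $document): void
    {
        $this->entries[$document->getUri()] = $document->getBody();
    }

    public function clearDocument(SketchDocument $document): void
    {
        // Removing a URI that was never indexed is a harmless no-op.
        unset($this->entries[$document->getUri()]);
    }

    public function has(string $uri): bool
    {
        return isset($this->entries[$uri]);
    }
}
```

With such an interface, a listener only needs the document (in practice its URI) to remove a single entry, without triggering a full reindex.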

Toflar · 2019-10-28T16:56:39Z

Thinking about it, why did we only ever clear on 404 so far? That doesn't make sense to me. Imho anything that is successful should be indexed and the rest should be cleared.
So I might as well put the logic into the AddToSearchIndexListener, which would then become a general SearchIndexListener.
I don't know if I may rename it because of BC. I think we have to decide what we do with listeners: they change constantly and imho it doesn't make sense to BC-protect them. They shouldn't be extended or used anywhere else anyway.

aschempp · 2019-10-28T16:58:42Z

Regarding the BC promise issue, maybe we should have some sort of contao/framework-bundle that is BC, and keep "implementation" stuff in contao/core-bundle that is not BC.

aschempp · 2019-10-29T14:38:45Z

> Thinking about it, why did we only ever clear on 404 so far? That doesn't make sense to me. Imho anything that is successful should be indexed and the rest should be cleared.

Agreed

fritzmg · 2019-10-29T14:40:41Z

> Thinking about it, why did we only ever clear on 404 so far?

Because there was no other way of knowing when a page no longer exists. You either had to do a full reindex, which clears everything beforehand, or you had to remove entries dynamically when a URL throws a 404 exception. I'm not really sure how you would solve it now, though.

aschempp · 2019-10-31T13:18:53Z

Regarding the BC question, I would simply keep the name and add the functionality. Who cares if it does not exactly match, it's still our search index listener and it will be obsolete at some point 🤷‍♂

Toflar · 2019-10-31T13:20:48Z

No idea why it would become obsolete. So shall I merge the logic into the existing AddToSearchIndexListener and not rename it then? @contao/developers

leofeyer · 2019-10-31T13:39:52Z

As briefly discussed in Slack, renaming the class to SearchIndexListener would be the best solution.

I am aware that renaming the class is a BC break, but our BC promise does not cover event listeners. If we really care about this, though, we might as well keep the old listener and no longer use it.

Toflar · 2019-10-31T15:17:11Z

> but our BC promise does not cover event listeners

I find statements like this very difficult. What's "our BC promise"? Nobody knows until it's documented, so I started documenting it over here: contao/docs#138.

BTW: PR is updated, the failing tests are not related.

ausi · 2019-11-02T19:58:43Z

core-bundle/src/EventListener/SearchIndexListener.php


-        $this->indexer->index($document);
+            if (0 === \count($lds)) {
+                return;


Shouldn’t we also delete the document from the index in this case?

Toflar · 2019-11-04

I thought so too, but that would result in useless search index delete commands for 99% of all use cases (everything in your own Symfony app that does not come from Contao). The 1% case would be a page that was once generated by Contao and no longer is. I think this is very rare, and it sounds like you've fundamentally changed your app, in which case you should rebuild the whole search index anyway.

ausi · 2019-11-05

If search index delete commands for non-existent entries don’t hurt the performance too much, I think we should delete it in this case too to make it more robust.

Toflar · 2019-11-06

I really don't know about this. I still think we should keep it as is.

leofeyer · 2019-11-06

I am also in favor of keeping the return statement. If there is no JSON data, the request is not a Contao request and the listener should not handle it.

Toflar · 2019-11-06

> don’t hurt the performance too much

Think of an external search indexer like ES or Algolia. It's an API request for every response. Sure, it's a kernel.terminate listener, but we still have users that do not use fpm or litespeed.
I will also submit another PR to be able to disable this listener completely (imho it makes no sense to run it if you clean the search index regularly using a cronjob).

leofeyer · 2019-11-06

The only illogical thing about this is that we do not index the document if there is no JSON data, but we delete it regardless of whether there is JSON data or not. Is this intended?

Toflar · 2019-11-06

Right, fixed in a2eb4d7.
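
The outcome of this review thread can be summarized as a small decision function. This is only a sketch of the behavior discussed here, not the code from the PR: `decideIndexAction()` and its string return values are hypothetical, and treating "successful" as a 2xx status code is an assumption.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of Contao): decides what a search index
 * listener should do with a response, following the discussion above.
 *
 * @return string One of 'skip', 'index' or 'delete'
 */
function decideIndexAction(int $statusCode, int $jsonLdCount): string
{
    // No JSON-LD data: the response is treated as not coming from Contao,
    // so the listener neither indexes nor deletes anything.
    if (0 === $jsonLdCount) {
        return 'skip';
    }

    // Assumption: "successful" means a 2xx status code.
    if ($statusCode >= 200 && $statusCode < 300) {
        return 'index';
    }

    // Everything else (e.g. a 404) is removed from the index.
    return 'delete';
}
```

Under these assumptions, a response without JSON-LD data never causes a call to the indexer, which avoids one API request per response when an external search service is used.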

leofeyer · 2019-11-08T16:20:10Z

Thank you @Toflar.

Toflar self-assigned this Oct 28, 2019

Toflar added the feature label Oct 28, 2019

Toflar added this to the 4.9 milestone Oct 28, 2019

Toflar requested review from leofeyer, aschempp and ausi October 28, 2019 16:56

aschempp removed their request for review October 31, 2019 13:15

Toflar force-pushed the feature/clear-search-document branch 2 times, most recently from 7b4a3fd to 9b614ca Compare October 31, 2019 14:41

Delete invalid URLs using the new search indexer abstraction

44fd9aa

Toflar force-pushed the feature/clear-search-document branch from 9b614ca to 44fd9aa Compare October 31, 2019 14:42

Toflar requested a review from aschempp October 31, 2019 15:17

Fixed comment

550141c

ausi reviewed Nov 2, 2019

ausi previously approved these changes Nov 6, 2019

Toflar dismissed ausi’s stale review via a04776b November 6, 2019 18:15

Only handle responses that contained any JSON LD data

a2eb4d7

Toflar force-pushed the feature/clear-search-document branch from a04776b to a2eb4d7 Compare November 6, 2019 18:17

Toflar requested a review from ausi November 6, 2019 18:18

ausi previously approved these changes Nov 6, 2019

Fix the coding style

1195a3f

leofeyer dismissed ausi’s stale review via 1195a3f November 8, 2019 15:51

leofeyer approved these changes Nov 8, 2019

leofeyer merged commit 2c80cbf into master Nov 8, 2019

leofeyer deleted the feature/clear-search-document branch November 8, 2019 16:20
