Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clear invalid URLs using the new search indexer abstraction #887

Merged
merged 4 commits into from
Nov 8, 2019

Conversation

Toflar
Copy link
Member

@Toflar Toflar commented Oct 28, 2019

This is a follow-up PR for #730.
I've introduced a search indexer abstraction there but clearing a single URI was not part of it.
I've extended the interface which now also contains a clearDocument() method (no BC break, 4.9 is still in development) and I've extracted the logic into its own listener so that it doesn't just work for our own PageError404 but for any exception.

@Toflar Toflar self-assigned this Oct 28, 2019
@Toflar Toflar added the feature label Oct 28, 2019
@Toflar Toflar added this to the 4.9 milestone Oct 28, 2019
@Toflar
Copy link
Member Author

Toflar commented Oct 28, 2019

Thinking about it, why did we only ever clear on 404 so far? That doesn't make sense to me. Imho anything that is successful should be indexed and the rest should be cleared.
So I might as well put the logic into the AddToSearchIndexListener which would become a general SearchIndexListener then.
Don't know if I may rename that because of BC. I think we have to decide on what we do with listeners as they constantly change and imho it doesn't make sense to BC protect them. They shouldn't be extended and used anywhere anyway.

@aschempp
Copy link
Member

Regarding the BC promise issue, maybe we should have some sort of contao/framework-bundle that is BC, and keep "implementation" stuff in contao/core-bundle that is not BC.

@aschempp
Copy link
Member

Thinking about it, why did we only ever clear on 404 so far? That doesn't make sense to me. Imho anything that is successful should be indexed and the rest should be cleared.

Agreed

@fritzmg
Copy link
Contributor

fritzmg commented Oct 29, 2019

Thinking about it, why did we only ever clear on 404 so far?

Because there was no other way of knowing when a page does not exist any more. You either have to do a full reindex, which clears everything before hand, or you had to remove entries dynamically, when an URL throws a 404 exception. Not really sure though, how you would solve it now.

@aschempp aschempp removed their request for review October 31, 2019 13:15
@aschempp
Copy link
Member

Regarding the BC question, I would simply keep the name and add the functionality. Who cares if it does not exactly match, it's still our search index listener and it will be obsolete at some point 🤷‍♂

@Toflar
Copy link
Member Author

Toflar commented Oct 31, 2019

No idea why it would become obsolete. So shall I merge the logic into the existing AddToSearchIndexListener and not rename it then? @contao/developers

@leofeyer
Copy link
Member

leofeyer commented Oct 31, 2019

As briefly discussed in Slack, renaming the class to SearchIndexListener would be the best solution.

I am aware that renaming the class is a BC break, but our BC promise does not cover event listeners. If we really care about this, though, we might as well keep the old listener and no longer use it.

@Toflar Toflar force-pushed the feature/clear-search-document branch 2 times, most recently from 7b4a3fd to 9b614ca Compare October 31, 2019 14:41
@Toflar Toflar force-pushed the feature/clear-search-document branch from 9b614ca to 44fd9aa Compare October 31, 2019 14:42
@Toflar
Copy link
Member Author

Toflar commented Oct 31, 2019

but our BC promise does not cover event listeners

I find it very difficult if you put out such statements. What's "our BC promise"? Nobody knows until it's documented so I started it over here: contao/docs#138.

BTW: PR is updated, the failing tests are not related.

@Toflar Toflar requested a review from aschempp October 31, 2019 15:17

$this->indexer->index($document);
if (0 === \count($lds)) {
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t we also delete the document from the index in this case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought so too but that would result in useless search index delete commands for 99% of all the use cases (everything that's not coming from Contao in your own Symfony app). The 1% case would be where a page was once generated by Contao and it's now not anymore. I think this is very rare and sounds like you've fundamentally changed your app = you should rebuild the whole search index anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If search index delete commands for non-existent entries don’t hurt the performance too much, I think we should delete it in this case too to make it more robust.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't know about this. I still think we should keep it as is.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am also in favor of keeping the return statement. If there is no JSON data, the request is not a Contao request and the listener should not handle it.

Copy link
Member Author

@Toflar Toflar Nov 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don’t hurt the performance too much

Think of an external search indexer like ES or Algolia. It's an API request for every response. Sure, it's a kernel.terminate listener but we still have users that do not use fpm or litespeed.
I will also submit another PR to be able to disable this listener completely (makes no sense to run it if you clean the search index regularly using a cronjob imho).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only unlogical thing about this is that we are not indexing the document if there is no JSON data but we are deleting it no matter whether there is JSON data or not. Is this intended?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, fixed in a2eb4d7.

ausi
ausi previously approved these changes Nov 6, 2019
@Toflar Toflar force-pushed the feature/clear-search-document branch from a04776b to a2eb4d7 Compare November 6, 2019 18:17
@Toflar Toflar requested a review from ausi November 6, 2019 18:18
ausi
ausi previously approved these changes Nov 6, 2019
@leofeyer leofeyer merged commit 2c80cbf into master Nov 8, 2019
@leofeyer
Copy link
Member

leofeyer commented Nov 8, 2019

Thank you @Toflar.

@leofeyer leofeyer deleted the feature/clear-search-document branch November 8, 2019 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants