

Deleted page not removed from Elasticsearch index when page has content elements #76

Open
dzschille opened this issue Jan 12, 2021 · 2 comments


@dzschille

I use mksearch with Elasticsearch, which works well. However, I found that deleted pages are not removed from the index when the page has a content element (which is almost always the case).

I have verified this error with a fresh TYPO3 9.5.23 installation and version 9.5.16 of dmk/mksearch, both installed via Composer. Steps to reproduce:

  1. I created a page named "A" and a page named "B". Page B has a text/image content element containing just a header and one sentence.
  2. I created a mksearch index for Elasticsearch ("test;localhost,9200,;").
  3. I added an indexer configuration of type core : page. The only setting I changed was "include.pageTrees.0 = 1".
  4. I indexed the queue and checked in Kibana that both pages were in the Elasticsearch index.
  5. I deleted pages A and B and indexed the queue again.

Result: Page A has been removed from the index, but Kibana still shows page B.

Expected: both pages should have been removed from the index.
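Step 4 checks the index by hand in Kibana. The same check can be scripted against the standard Elasticsearch `_search` REST endpoint, which makes the expected result testable: after step 5, no indexed document should belong to a page that no longer exists. This is a minimal sketch under stated assumptions, not part of mksearch. It assumes the mksearch core "test" maps to an Elasticsearch index named `test` on `localhost:9200`, and the field name `uid` for the page uid stored in each document is a hypothetical guess; check a real document in Kibana for the actual field name.

```python
import json
import urllib.request

# Assumption: the mksearch core "test" ("test;localhost,9200,;") is an
# Elasticsearch index named "test" on localhost:9200. size=1000 is an
# arbitrary limit that is large enough for a small test installation.
SEARCH_URL = "http://localhost:9200/test/_search?size=1000"


def fetch_search_response(url: str = SEARCH_URL) -> dict:
    """Run a match-all search and return the decoded JSON response."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def extract_source_values(response: dict, field: str) -> set:
    """Collect one field from the _source of every hit that has it.

    The field holding the page uid is hypothetical here ("uid" is a guess).
    """
    return {
        hit["_source"][field]
        for hit in response["hits"]["hits"]
        if field in hit.get("_source", {})
    }


def stale_documents(indexed_uids: set, live_uids: set) -> set:
    """Page uids that are still indexed although the page no longer exists."""
    return indexed_uids - live_uids
```

With pages A and B both deleted, the set of live page uids is empty, so `stale_documents(extract_source_values(fetch_search_response(), "uid"), set())` should return an empty set. In the reported behaviour it would still contain the uid of page B.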

@hannesbochmann
Member

We need to look into this. However, it should have nothing to do with the content element, because the page indexer ignores the content elements on a page entirely.
By the way, if you want the content of a page to be indexed and searchable, you need to use the core.tt_content indexer. That indexer should also handle deleted pages. Maybe this solves your problem after all.
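The rule that a content indexer has to "handle deleted pages" can be made concrete: a content element should be removed from the index not only when the element itself is deleted or hidden, but also when the page it sits on is. The following is a hypothetical model of that decision, not the actual logic of the core.tt_content indexer. The `deleted`, `hidden` and `pid` names mirror TYPO3's record columns; the classes and the `index_action` function are invented for illustration, and pages further up the tree are not considered.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Page:
    uid: int
    deleted: bool = False
    hidden: bool = False


@dataclass
class ContentElement:
    uid: int
    pid: int  # uid of the page the element sits on (tt_content.pid in TYPO3)
    deleted: bool = False
    hidden: bool = False


def index_action(element: ContentElement, pages: Dict[int, Page]) -> str:
    """Return "index" or "delete" for one content element (hypothetical model).

    Only the element itself and its direct parent page are checked.
    """
    if element.deleted or element.hidden:
        return "delete"
    page = pages.get(element.pid)
    # A missing, deleted or hidden parent page means the element must not
    # stay in the index, even if the element record itself looks fine.
    if page is None or page.deleted or page.hidden:
        return "delete"
    return "index"
```

A page indexer that should remove deleted pages needs the equivalent of the `page.deleted` branch for its own documents; the report suggests that removal is not happening for page B.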

@dzschille
Author

Thanks for your response and the hint about the content indexer, @hannesbochmann.
In the project where I stumbled over the page deletion bug, we had added the content indexer at the beginning but later removed it. We had too many very small content elements, which just bloated the search index, so in many cases the search results weren't useful. Instead, we extended the page indexer to also index the text of each page's content elements. That made the search results more accurate, because every page had only one entry in the index and Elasticsearch could analyze its content better.
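The approach described above (one index document per page, with the text of that page's content elements folded into it) can be sketched as a grouping step. This is an illustration under assumptions, not the code from that project: rows are plain dicts using the standard TYPO3 `tt_content` columns `pid`, `sorting`, `header`, `bodytext`, `deleted` and `hidden`, the function name `build_page_texts` is hypothetical, and how the resulting text is attached to the mksearch page document is not shown. `bodytext` from the rich-text editor usually contains HTML, which this sketch does not strip.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def build_page_texts(elements: Iterable[dict]) -> Dict[int, str]:
    """Merge header and bodytext of visible content elements per page.

    Returns {page uid: combined text}, with elements ordered by "sorting".
    Deleted or hidden elements are skipped. HTML in bodytext is kept as-is.
    """
    grouped: Dict[int, List[dict]] = defaultdict(list)
    for element in elements:
        if element.get("deleted") or element.get("hidden"):
            continue
        grouped[element["pid"]].append(element)

    texts: Dict[int, str] = {}
    for pid, page_elements in grouped.items():
        # Keep the order the editor sees in the backend.
        page_elements.sort(key=lambda e: e.get("sorting", 0))
        parts: List[str] = []
        for element in page_elements:
            for field in ("header", "bodytext"):
                value = (element.get(field) or "").strip()
                if value:
                    parts.append(value)
        texts[pid] = "\n".join(parts)
    return texts
```

Producing one text per page keeps a single index entry per page, which matches the reasoning above; the trade-off is that a change to any content element requires the whole page document to be reindexed.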
