Implement a search indexer abstraction #730

Toflar · 2019-09-06T15:00:36Z

This PR implements a search indexer abstraction layer so that additional search indexers can be registered. The core indexer can also be disabled completely by configuring:

contao:
    search:
        default_indexer:
            enabled: false

Here are some key concepts:

There's a general, simple IndexerInterface now. I've implemented a DelegatingIndexer that just forwards to all registered indexers so we can have multiple indexers.
A Document represents a URI, response status code, headers and the body. For metadata I chose to use application/ld+json scripts because they are designed for exactly this use case (also see schema.org).
I've removed the $GLOBALS['TL_NOINDEX_KEYS'] because they are just plain nonsense. If you don't want the page to be indexed when these parameters are set, you have to configure the page to have a <meta name="robots" content="noindex"> tag. Otherwise neither any real search engine nor my planned indexer will have a chance to find out what you want to do. Also, why would you not want to index pages with a page parameter present? Maybe that's just fine for some cases.
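
To illustrate the application/ld+json point above, here is a minimal sketch of how metadata could be read from a response body. The function name `extractJsonLd()` and the sample markup are hypothetical, and the sketch uses PHP's built-in DOM extension; it is not the implementation in this PR.

```php
<?php

declare(strict_types=1);

/**
 * Collects the decoded contents of all <script type="application/ld+json">
 * elements found in an HTML body.
 *
 * Hypothetical helper for illustration only, not the PR's implementation.
 *
 * @return array<int, array<string, mixed>>
 */
function extractJsonLd(string $html): array
{
    // DOMDocument::loadHTML() rejects empty input, so bail out early.
    if ('' === trim($html)) {
        return [];
    }

    $dom = new DOMDocument();

    // Real-world HTML is rarely well-formed, so collect libxml warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];

    foreach ($xpath->query('//script[@type="application/ld+json"]') as $script) {
        $data = json_decode($script->textContent, true);

        // Skip scripts that do not decode to a JSON object or array.
        if (is_array($data)) {
            $result[] = $data;
        }
    }

    return $result;
}

// Sample markup (an assumption, loosely following schema.org's WebPage type).
$html = <<<'HTML'
<!DOCTYPE html>
<html>
<head>
<title>Example</title>
<script type="application/ld+json">{"@context":"https://schema.org","@type":"WebPage","name":"Example"}</script>
</head>
<body><p>Hello</p></body>
</html>
HTML;

$metadata = extractJsonLd($html);
```

An indexer that only cares about structured metadata could then skip any document for which this returns an empty array.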

All unit tests etc. are already done. So this PR is in a final state to be reviewed 😊

core-bundle/src/DependencyInjection/Configuration.php

leofeyer

Looks pretty good overall.

I guess TL_NOINDEX_KEYS exists to prevent flooding the search index with duplicate or irrelevant search entries. We should preserve the functionality, although I agree that we should generate a noindex tag in this case.

core-bundle/src/DependencyInjection/ContaoCoreExtension.php

core-bundle/src/Resources/config/services.yml

ausi

I think dropping TL_NOINDEX_KEYS could lead to problems. Especially calendar parameters like day, month and year could result in a large number of duplicate entries for the same page.

Once we have a better way to detect such “duplicates” we can remove TL_NOINDEX_KEYS IMO.

core-bundle/src/Resources/contao/library/Contao/Config.php

core-bundle/src/Resources/contao/pages/PageRegular.php

core-bundle/src/Search/Document.php

core-bundle/src/Search/Indexer/DefaultIndexer.php

Toflar · 2019-09-09T12:56:50Z

All comments addressed. Ready for another round of reviews.
I've restored the TL_NOINDEX_KEYS feature in a7af57e, although the hardcoded page_ comparison hurt my eyes so I had to replace it with a regular expression :)
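
For readers who want a picture of what such a check might look like, here is a rough sketch that matches query parameter names against a list of no-index keys with a single regular expression. The function name `hasNoIndexParameter()`, the key list and the `page_\w+` pattern are assumptions for illustration; the actual expression in a7af57e may differ.

```php
<?php

declare(strict_types=1);

/**
 * Returns true if at least one query parameter name is a no-index key
 * or starts with the "page_" prefix.
 *
 * Hypothetical helper; not the code from the commit.
 *
 * @param array<int|string, mixed> $queryParameters e.g. the contents of $_GET
 * @param string[]                 $noIndexKeys     e.g. $GLOBALS['TL_NOINDEX_KEYS']
 */
function hasNoIndexParameter(array $queryParameters, array $noIndexKeys): bool
{
    // Escape every key so it is matched literally inside the pattern.
    $alternatives = array_map(
        static function (string $key): string {
            return preg_quote($key, '/');
        },
        $noIndexKeys
    );

    // Assumed replacement for the hardcoded "page_" prefix comparison.
    $alternatives[] = 'page_\w+';

    $pattern = '/^(?:'.implode('|', $alternatives).')$/';

    // Numeric array keys are integers, so cast all names to strings first.
    $names = array_map('strval', array_keys($queryParameters));
    $matches = preg_grep($pattern, $names);

    return is_array($matches) && [] !== $matches;
}

// Key list assumed for this example only.
$noIndexKeys = ['day', 'month', 'year', 'page'];

var_dump(hasNoIndexParameter(['month' => '201909'], $noIndexKeys)); // bool(true)
var_dump(hasNoIndexParameter(['page_n3' => '2'], $noIndexKeys));    // bool(true)
var_dump(hasNoIndexParameter(['sort' => 'asc'], $noIndexKeys));     // bool(false)
```

Anchoring the pattern with `^` and `$` keeps unrelated parameters such as `pages` from matching the `page` key by accident.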

ausi

LGTM 🎉

core-bundle/src/Resources/contao/classes/Frontend.php

core-bundle/src/Resources/contao/pages/PageRegular.php

Toflar · 2019-09-10T07:55:57Z

Apart from a rebase to master once Symfony deps are raised, this is RTM 🎉

Toflar · 2019-10-16T14:13:12Z

Merged latest master into this PR and adjusted the configuration section accordingly. Should be all ready to merge now :)

leofeyer

Very good job! Only the service IDs seem a little inconsistent to me (see my comments).

core-bundle/src/DependencyInjection/Compiler/SearchIndexerPass.php

core-bundle/src/DependencyInjection/Configuration.php

core-bundle/src/DependencyInjection/ContaoCoreExtension.php

core-bundle/src/Resources/config/services.yml

core-bundle/src/Resources/contao/pages/PageRegular.php

leofeyer · 2019-10-23T09:52:24Z

Do we really need the Indexer sub-namespace?

namespace Contao\CoreBundle\Search\Indexer;

class DefaultIndexer
{
}

Will there be a large number of different indexer classes? And what other sub-namespaces will we have in the future?

Toflar · 2019-10-23T12:12:51Z

I don't know. Maybe an ElasticSearchIndexer. An AlgoliaSearchIndexer?
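
As a rough illustration of how additional indexers could sit next to the default one, here is a self-contained sketch of the delegation idea. The names IndexerInterface, DelegatingIndexer and Document come from the PR description, but the method signatures (`index()`, `addIndexer()`, `getUri()`) and the `InMemoryIndexer` class are assumptions made for this example and not necessarily the merged API.

```php
<?php

declare(strict_types=1);

// Simplified stand-in for the PR's Document class (assumed shape).
final class Document
{
    /** @var string */
    private $uri;

    /** @var int */
    private $statusCode;

    /** @var string */
    private $body;

    public function __construct(string $uri, int $statusCode, string $body = '')
    {
        $this->uri = $uri;
        $this->statusCode = $statusCode;
        $this->body = $body;
    }

    public function getUri(): string
    {
        return $this->uri;
    }
}

// Assumed minimal contract: every indexer receives a Document.
interface IndexerInterface
{
    public function index(Document $document): void;
}

// Forwards each document to all registered indexers.
final class DelegatingIndexer implements IndexerInterface
{
    /** @var IndexerInterface[] */
    private $indexers = [];

    public function addIndexer(IndexerInterface $indexer): void
    {
        $this->indexers[] = $indexer;
    }

    public function index(Document $document): void
    {
        foreach ($this->indexers as $indexer) {
            $indexer->index($document);
        }
    }
}

// Hypothetical indexer that only records URIs, used to show the delegation.
final class InMemoryIndexer implements IndexerInterface
{
    /** @var string[] */
    public $uris = [];

    public function index(Document $document): void
    {
        $this->uris[] = $document->getUri();
    }
}

$default = new InMemoryIndexer();
$additional = new InMemoryIndexer();

$delegating = new DelegatingIndexer();
$delegating->addIndexer($default);
$delegating->addIndexer($additional);

$delegating->index(new Document('https://example.com/', 200, '<html></html>'));
```

With this shape, an indexer backed by an external search service would be one more class implementing the interface, and calling code would only ever talk to the delegating indexer.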

core-bundle/src/Resources/contao/pages/PageRegular.php

core-bundle/tests/Search/DocumentTest.php

leofeyer · 2019-10-23T16:06:47Z

Thank you very much @Toflar.

Description
-----------

This is a follow-up PR for #730. I've introduced a search indexer abstraction there but clearing a single URI was not part of it. I've extended the interface which now also contains a `clearDocument()` method (no BC break, 4.9 is still in development) and I've extracted the logic into its own listener so that it doesn't just work for our own `PageError404` but for any exception.

Commits
-------

44fd9aa Delete invalid URLs using the new search indexer abstraction
550141c Fixed comment
a2eb4d7 Only handle responses that contained any JSON LD data
1195a3f Fix the coding style

Toflar added 2 commits September 6, 2019 16:46

Implemented search indexer abstraction

2de8017

Fixed phpstan errors

b84fb53

leofeyer assigned Toflar Sep 6, 2019

leofeyer added the feature label Sep 6, 2019

leofeyer added this to the 4.9 milestone Sep 6, 2019

leofeyer reviewed Sep 6, 2019

core-bundle/src/DependencyInjection/Configuration.php (outdated, resolved)

leofeyer requested changes Sep 6, 2019

core-bundle/src/DependencyInjection/ContaoCoreExtension.php (outdated, resolved)

core-bundle/src/Resources/config/services.yml (resolved)

ausi requested changes Sep 9, 2019

core-bundle/src/Resources/contao/library/Contao/Config.php (outdated, resolved)

core-bundle/src/Resources/contao/pages/PageRegular.php (outdated, resolved)

core-bundle/src/Search/Document.php (resolved)

ausi reviewed Sep 9, 2019

core-bundle/src/Search/Indexer/DefaultIndexer.php (outdated, resolved)

Toflar added 5 commits September 9, 2019 14:29

Directly purge search tables and renamed services

4652059

Added a Document::createFromRequestResponse() factory

ef3cb8c

Simplified legacy config mapper

98ff456

Fixed alias

5ce8580

Restored $GLOBALS['TL_NOINDEX_KEYS']

a7af57e

ausi approved these changes Sep 9, 2019

core-bundle/src/Resources/contao/classes/Frontend.php (outdated, resolved)

core-bundle/src/Resources/contao/pages/PageRegular.php (outdated, resolved)

Re-use factory

ae61ce6

Toflar changed the title ~~[RFC] Implemented search indexer abstraction~~ [RTM] Implemented search indexer abstraction Sep 10, 2019

ausi approved these changes Sep 10, 2019

leofeyer force-pushed the master branch from 2ea6ebd to 91d73a0 Compare September 26, 2019 16:21

Merge branch 'master' into feature/search-indexer-abstraction

aa70f92

Toflar force-pushed the feature/search-indexer-abstraction branch from 8f8c3de to aa70f92 Compare October 16, 2019 13:56

leofeyer requested changes Oct 22, 2019

Toflar added 5 commits October 23, 2019 09:43

Fixed wrong service name

ba3f3b1

Renamed and repositioned indexProtected configuration

29d6196

Merge branch 'master' into feature/search-indexer-abstraction

14a4183

Fixed service names

da1ef56

Fixed YAML indendation

e78fe14

Toflar added 2 commits October 23, 2019 10:34

Fixed missing class

9b9526a

Fixed missing class again

b0dfb41

Toflar and others added 3 commits October 23, 2019 14:26

Fixed wrong alias

605ebe7

Fixed delegating indexer calling itself causing an endless loop and thus a memory leak

5e1c7eb

Fix the coding style

fba751c

leofeyer force-pushed the feature/search-indexer-abstraction branch from dcc0864 to fba751c Compare October 23, 2019 14:54

leofeyer requested changes Oct 23, 2019

Toflar and others added 3 commits October 23, 2019 17:18

Use security.helper

7f8a3d7

Simplified TL_NOINDEX_KEYS filtering

485faf2

Rename the test methods and revert the security.helper changes

1d73d45

leofeyer force-pushed the feature/search-indexer-abstraction branch from 1b0269f to 1d73d45 Compare October 23, 2019 16:05

leofeyer merged commit 27fe686 into master Oct 23, 2019

leofeyer deleted the feature/search-indexer-abstraction branch October 23, 2019 16:07

This was referenced Oct 24, 2019

Document new search indexer abstraction contao/docs#114

Closed

Clear invalid URLs using the new search indexer abstraction #887

Merged

leofeyer changed the title ~~[RTM] Implemented search indexer abstraction~~ Implement a search indexer abstraction Dec 23, 2019
