Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RTM] Implemented search indexer abstraction #730

Open
wants to merge 8 commits into
base: master
from

Conversation

@Toflar
Copy link
Member

commented Sep 6, 2019

This PR implements a search indexer abstraction level so that one can have additional search indexers. The core one can also be disabled completely by configuring

contao:
    search:
        default_indexer:
            enabled: false

Here are some key concepts:

  • There's a general, simple IndexerInterface now. I've implemented a DelegatingIndexer that just forwards to all registered indexers so we can have multiple indexers.
  • A Document represents an URI, response status code, headers and the body. For meta data I chose to use application/ld+json scripts because they are designed for exactly this use case (also see schema.org).
  • I've removed the $GLOBALS['TL_NOINDEX_KEYS'] because they are just plain nonsense. If you don't want the page to be indexed when these paramters are set, you have to configure the page to have a <meta name="robots" content="noindex"> tag. Otherwise neither any real search engine nor my planned indexer will have a chance to find out what you want to do. Also, why would you not want to index pages with a page parameter present? Maybe that's just fine for some cases.

All unit tests etc. are already done. So this PR is in a final state to be reviewed 馃槉

Toflar added 2 commits Sep 6, 2019

@leofeyer leofeyer added the feature label Sep 6, 2019

@leofeyer leofeyer added this to the 4.9 milestone Sep 6, 2019

return $treeBuilder;
}
private function addSearchSection(ArrayNodeDefinition $rootNode): void

This comment has been minimized.

Copy link
@leofeyer

leofeyer Sep 6, 2019

Member

Why did you encapsulate this in a separate section?

This comment has been minimized.

Copy link
@Toflar

Toflar Sep 8, 2019

Author Member

Because it鈥榮 best practice (should be done for all the rest as well).

@leofeyer
Copy link
Member

left a comment

Looks pretty good overall.

I guess the TL_NOINDEX_KEYS exists to prevent flooding the search index with duplicate or irrelevant search entries. We should preserve the functionality, although I agree that we should generate a noindex tag in this case.

// Configure whether to index protected pages on the default indexer
$defaultIndexer->setArgument(1, $config['search']['default_indexer']['enableIndexProtected']);
// Remove the default indexer completely if it was disabled

This comment has been minimized.

Copy link
@leofeyer

leofeyer Sep 6, 2019

Member

We should probably check this at the beginning and return early?

This comment has been minimized.

Copy link
@Toflar

Toflar Sep 8, 2019

Author Member

I didn鈥榯 do that on purpose. I like it better that way馃し鈥嶁檪锔 The only thing that could be moved is the argument thing anyway.

@@ -496,6 +496,20 @@ services:
- '%contao.prepend_locale%'
public: true

contao.search.indexer.default:

This comment has been minimized.

Copy link
@leofeyer

leofeyer Sep 6, 2019

Member

This should be Contao\CoreBundle\Search\Indexer\DefaultIndexer (we will be using the class names as service IDs for all new services).

This comment has been minimized.

Copy link
@Toflar

Toflar Sep 9, 2019

Author Member

Right, changed in 1b715d3.

@ausi
Copy link
Member

left a comment

I think dropping TL_NOINDEX_KEYS could lead to problems. Especially calender parameters like day, month and year could result in a big number of duplicate entries for the same page.

Once we have a better way to detect such 鈥渄uplicates鈥 we can remove TL_NOINDEX_KEYS IMO.

'pageId' => (int) $objPage->id,
'language' => $objPage->language,
'title' => $objPage->pageTitle ?: $objPage->title,
'noSearch' => (bool) $objPage->noSearch,

This comment has been minimized.

Copy link
@ausi

ausi Sep 9, 2019

Member

How about checking for $GLOBALS['TL_NOINDEX_KEYS'] here for backwards compatibility?

core-bundle/src/Search/Document.php Show resolved Hide resolved
@Toflar

This comment has been minimized.

Copy link
Member Author

commented Sep 9, 2019

All comments addressed. Ready for another round of reviews.
I've restored the TL_NOINDEX_KEYS feature in a7af57e although the hardcoded page_ comparison hurt my eyes so I had to replace it by a regular expression :)

@ausi
ausi approved these changes Sep 9, 2019
Copy link
Member

left a comment

LGTM 馃帀

core-bundle/src/Resources/contao/classes/Frontend.php Outdated Show resolved Hide resolved
$noSearch = (bool) $objPage->noSearch;
// BC, do not use $GLOBALS['TL_NOINDEX_KEYS'] anymore but make sure your page type delivers the correct
// <meta name="robots" content="noindex"> tag.

This comment has been minimized.

Copy link
@ausi

ausi Sep 9, 2019

Member

Should we check if TL_NOINDEX_KEYS was modified and trigger a deprecation warning here? Like:

if ($GLOBALS['TL_NOINDEX_KEYS'] && 10 !== \count($GLOBALS['TL_NOINDEX_KEYS'])) {
	@trigger_error(鈥);
}

This comment has been minimized.

Copy link
@Toflar

Toflar Sep 9, 2019

Author Member

I don't like this. If something is deprecated, the core shouldn't make use of it anymore.

@Toflar Toflar changed the title [RFC] Implemented search indexer abstraction [RTM] Implemented search indexer abstraction Sep 10, 2019

@ausi
ausi approved these changes Sep 10, 2019
@Toflar

This comment has been minimized.

Copy link
Member Author

commented Sep 10, 2019

Apart from a rebase to master once Symfony deps are raised, this is RTM 馃帀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can鈥檛 perform that action at this time.