[TASK] Improve record indexer #310

Closed

dmitryd commented Apr 4, 2016

Improve the language overlay handling and the performance of the record indexer.

The Solr record indexer does not handle language overlays efficiently and does not support content fallback.

First, the cache for language information is not kept across instances of the indexer. Each new record
requires a new instance of the indexer, but language information does not change between the creation of
indexer instances. Language information is obtained by instantiating TSFE and parsing TS templates.
This can take 0.5-2 seconds depending on the amount of TypoScript. It makes sense to cache this
information statically. Doing so shortens indexing times dramatically and lowers CPU usage on
the server.
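
The static caching idea described above can be sketched roughly as follows. This is a minimal, language-agnostic illustration in Python, not the code from this patch: the class name `RecordIndexer`, the method `get_languages`, and the loader `slow_language_lookup` are hypothetical stand-ins, and the returned language map is made-up sample data.

```python
class RecordIndexer:
    # Class-level (static) cache shared by every indexer instance.
    # Hypothetical name; keyed here by root page id as an assumption.
    _language_cache = {}

    def __init__(self, load_languages):
        # load_languages stands in for the expensive lookup
        # (instantiating TSFE and parsing TypoScript templates).
        self._load_languages = load_languages

    def get_languages(self, root_page_id):
        cache = RecordIndexer._language_cache
        if root_page_id not in cache:
            # Only the first instance pays the lookup cost.
            cache[root_page_id] = self._load_languages(root_page_id)
        return cache[root_page_id]


calls = []


def slow_language_lookup(root_page_id):
    # Hypothetical loader; records each call so reuse can be observed.
    calls.append(root_page_id)
    return {0: "de_DE", 1: "en_US"}


# One new indexer instance per record, as described in the text.
for _ in range(3):
    indexer = RecordIndexer(slow_language_lookup)
    languages = indexer.get_languages(1)
```

With the cache held on the class rather than on the instance, the three instances above trigger the lookup once instead of three times.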

Secondly, if there is a content overlay, then records like news will be properly overlaid by the
corresponding news plugin (e.g. you can see the de_DE record instead of de_CH if de_CH does not exist), but
the Solr indexer will only return the record in the default language in that case, which is an unexpected
result for the customer.

This commit fixes both issues. It is implemented as a single change because the code is highly
connected.

timohund commented Apr 4, 2016

Thanks for your patch. Is it possible to split this into two patches? I think the caching issue could be merged right away. Regarding the content fallback, I am not sure. What happens when you fall back from German to English? Wouldn't you have English content in the German index for untranslated content? AFAIK it was wanted behaviour to skip the language fallback.

dmitryd commented Apr 4, 2016

The purpose of the fallback is to show a version of the record in a different language if the version in the current language does not exist. It is possible to configure the system to show a totally different language, and this happens quite often. There can be any number of fallback languages.

Language fallback is usually defined by the customer. For example, you can define a fallback like French -> German or even French -> German -> English.

We had a customer with about six languages, but news existed only in German and sometimes in English. The fallback was defined as:

config {
  sys_language_mode = content_fallback; 1, 0
}

When users were on French (fr_CA and fr_CH) or Italian (it_IT or it_CH) pages, news was looked up first in English and then in German. The news plugin displays that correctly (try English, then German), but Solr currently indexes only German for our FR and IT. So when users select the news facet, they see news search results and snippets in German, but when they click on the links, they see the news in English.
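
The lookup order described above can be sketched roughly like this. It is only an illustration in Python, not the indexer's actual code: `parse_fallback_chain` and `resolve_record_language` are hypothetical helpers, and the language uids (1 = English, 0 = German as the default language, 3 = French) are assumptions inferred from this thread.

```python
def parse_fallback_chain(sys_language_mode):
    """Turn 'content_fallback; 1, 0' into the ordered list [1, 0]."""
    mode, _, order = sys_language_mode.partition(";")
    if mode.strip() != "content_fallback":
        # Other modes define no fallback order in this sketch.
        return []
    return [int(uid) for uid in order.split(",") if uid.strip()]


def resolve_record_language(requested_uid, fallback_chain, available_uids):
    """Return the first language uid in which the record exists.

    The requested language is tried first, then each fallback in order.
    Returns None if the record exists in none of them.
    """
    for uid in [requested_uid, *fallback_chain]:
        if uid in available_uids:
            return uid
    return None


chain = parse_fallback_chain("content_fallback; 1, 0")

# Assumed uid 3 = French. A news record translated into English and German:
english_first = resolve_record_language(3, chain, {0, 1})
# A news record that exists only in German:
german_only = resolve_record_language(3, chain, {0})
```

Under these assumptions, `english_first` resolves to 1 (English) and `german_only` to 0 (German), which is the order the news plugin follows; an indexer that always picks the default language would choose 0 in both cases.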

I can split this PR, but I am afraid there will be merge conflicts after the first one is merged. I can also make a caching PR and, if it is approved, do another PR for the fallback.

timohund commented Apr 4, 2016

It would be nice if you could do the caching stuff in another PR. I think we can merge the caching stuff soon. Regarding the fallback, I guess it is better to talk about it with Ingo.

dmitryd commented Apr 4, 2016

Done :)

#314

dmitryd closed this Apr 4, 2016