ReindexThread always runs - never stops #18605

Closed
freddyucv opened this issue Jun 4, 2020 · 8 comments

@freddyucv

This thread issues a database request twice per second here:

https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotmarketing/common/reindex/ReindexThread.java#L179

I think it would be better if the thread simply paused when there is nothing to reindex and started again when a reindex entry is pushed into the queue. This could be a good place to do that:

https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotcms/content/elasticsearch/business/ContentletIndexAPIImpl.java#L581

Also, the ReindexThread can try to process the same reindex entry more than once. It keeps a 'lastIdIndexed' value to know which content has already been processed:

https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotmarketing/common/reindex/ReindexQueueFactory.java#L274

but after an empty response from the database this value is reset to 0:

https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotmarketing/common/reindex/ReindexQueueFactory.java#L283

so it is possible that, while it is still waiting on a reindex put request, the ReindexThread wakes up again and tries to reindex the same contentlet a second time (this happened to me locally several times).
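The duplicate processing described above can be illustrated with a small model. This is only a sketch under assumptions: the class `CursorResetSketch`, its methods, and the idea that queue rows are removed only after indexing completes are hypothetical and are not taken from `ReindexQueueFactory`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical model of the behaviour described above; not the dotCMS code.
class CursorResetSketch {
    // Stand-in for the reindex queue table: row id -> contentlet id.
    private final TreeMap<Long, String> queueTable = new TreeMap<>();
    private long lastIdIndexed = 0;

    void insert(long id, String contentletId) {
        queueTable.put(id, contentletId);
    }

    // Assumption: a row is removed only once its indexing request has finished.
    void markIndexed(long id) {
        queueTable.remove(id);
    }

    // Returns every row with an id greater than the cursor, then moves the cursor.
    List<String> fetchBatch() {
        List<String> batch = new ArrayList<>(queueTable.tailMap(lastIdIndexed, false).values());
        if (batch.isEmpty()) {
            lastIdIndexed = 0; // the reset after an empty response
        } else {
            lastIdIndexed = queueTable.lastKey();
        }
        return batch;
    }

    // Three polls while the first batch is still in flight (markIndexed not yet called).
    static List<List<String>> demo() {
        CursorResetSketch q = new CursorResetSketch();
        q.insert(1, "contentlet-a");
        q.insert(2, "contentlet-b");
        List<String> first = q.fetchBatch();  // both rows
        List<String> second = q.fetchBatch(); // empty, so the cursor resets to 0
        List<String> third = q.fetchBatch();  // the same rows are handed out again
        return List.of(first, second, third);
    }

    public static void main(String[] args) {
        // prints [[contentlet-a, contentlet-b], [], [contentlet-a, contentlet-b]]
        System.out.println(demo());
    }
}
```

In this model the third poll returns rows that are already being indexed, which is the duplicate work the comment reports.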

Describe the solution you'd like

I think the ReindexThread should pause once it has finished sending all reindex requests, and start again when a new reindex is registered or a reindex fails.
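A minimal sketch of the pause-and-wake idea, assuming a single JVM and an in-memory queue. The class `PausableReindexWorker` and all of its methods are hypothetical; the real queue lives in the database, so this is not how dotCMS implements it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a worker that sleeps while its queue is empty instead of polling.
class PausableReindexWorker {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition workAvailable = lock.newCondition();
    private final Queue<String> queue = new ArrayDeque<>();
    private final List<String> processed = new ArrayList<>();
    private boolean running = true;

    // Called when a reindex entry is registered; wakes the worker if it is paused.
    void enqueue(String contentletId) {
        lock.lock();
        try {
            queue.add(contentletId);
            workAvailable.signal();
        } finally {
            lock.unlock();
        }
    }

    // Lets the worker drain what is left and then exit.
    void shutdown() {
        lock.lock();
        try {
            running = false;
            workAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    void runLoop() throws InterruptedException {
        while (true) {
            lock.lock();
            try {
                while (queue.isEmpty() && running) {
                    workAvailable.await(); // paused: nothing to do, no polling
                }
                if (queue.isEmpty()) {
                    return; // shut down and fully drained
                }
                processed.add(queue.poll()); // stand-in for the real indexing call
            } finally {
                lock.unlock();
            }
        }
    }

    List<String> processedSnapshot() {
        lock.lock();
        try {
            return new ArrayList<>(processed);
        } finally {
            lock.unlock();
        }
    }

    static List<String> demo() {
        PausableReindexWorker worker = new PausableReindexWorker();
        Thread t = new Thread(() -> {
            try {
                worker.runLoop();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        worker.enqueue("contentlet-1");
        worker.enqueue("contentlet-2");
        worker.shutdown();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return worker.processedSnapshot();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [contentlet-1, contentlet-2]
    }
}
```

An in-process condition like this only covers entries registered on the same JVM. Entries queued by other nodes in a cluster would need a different wake-up signal.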

@wezell
Contributor

wezell commented Jul 6, 2020

Important note: a dotCMS server can and should handle not just locally saved content but also content that has been imported by other servers in the cluster. Indexing should be distributed across the dotCMS nodes in a cluster, which for the most part it is.

If we were to revisit this code, the next step would be to leverage a distributed multi-consumer queue (with at-least-once delivery) like Kafka or Redis Streams to handle the queuing, de-queuing, and retrying of the content to be indexed.

@wezell wezell changed the title Improvement ReindexThread performance ReindexThread always runs - never stops Oct 7, 2020
@wezell wezell added this to the Scout Current milestone Oct 20, 2020
wezell added a commit that referenced this issue Oct 20, 2020
@nollymar nollymar self-assigned this Oct 26, 2020
@wezell
Contributor

wezell commented Oct 26, 2020

Important note: a dotCMS server can and should handle not just locally saved content but also content that has been imported by other servers in the cluster. Indexing should be distributed across the dotCMS nodes in a cluster, which for the most part it is.

If we were to revisit this code, the next step would be to leverage a distributed multi-consumer queue (with at-least-once delivery) like Kafka or Redis Streams to handle the queuing, de-queuing, and retrying of the content to be indexed.

Such a queue implementation will be started in this issue -
#18149
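To make "at least once" concrete, here is a small in-memory sketch of claim, acknowledge, and redeliver semantics. It is illustrative only: `AtLeastOnceQueueSketch` and its methods are hypothetical, and it does not use the Kafka or Redis Streams APIs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of an at-least-once, multi-consumer queue.
class AtLeastOnceQueueSketch {
    private final Deque<String> pending = new ArrayDeque<>();
    private final Map<String, String> inFlight = new HashMap<>(); // entry -> consumer

    synchronized void publish(String entry) {
        pending.addLast(entry);
    }

    // A consumer (for example, a cluster node) claims the next entry; null if none.
    synchronized String claim(String consumer) {
        String entry = pending.pollFirst();
        if (entry != null) {
            inFlight.put(entry, consumer);
        }
        return entry;
    }

    // Indexing succeeded: the entry is gone for good.
    synchronized void ack(String entry) {
        inFlight.remove(entry);
    }

    // Indexing failed: the entry goes back to the queue for any consumer to retry.
    synchronized void nack(String entry) {
        if (inFlight.remove(entry) != null) {
            pending.addLast(entry);
        }
    }

    static List<String> demo() {
        AtLeastOnceQueueSketch q = new AtLeastOnceQueueSketch();
        q.publish("c1");
        q.publish("c2");
        List<String> log = new ArrayList<>();
        String a = q.claim("node-1");
        String b = q.claim("node-2");
        q.nack(a);
        log.add("node-1 failed " + a);
        q.ack(b);
        log.add("node-2 indexed " + b);
        String retry = q.claim("node-2"); // the failed entry is redelivered
        q.ack(retry);
        log.add("node-2 indexed " + retry);
        return log;
    }

    public static void main(String[] args) {
        // prints [node-1 failed c1, node-2 indexed c2, node-2 indexed c1]
        System.out.println(demo());
    }
}
```

Because a failed entry can be delivered again, the indexing step has to tolerate seeing the same content more than once.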

dsilvam pushed a commit that referenced this issue Oct 28, 2020
* #18605 pauses and then unpauses based on a cache invalidation

* #18605 adding ttl to the cache put in the logger

* #18605 less logging

* #19500 sanitize sql

* #19500 fixes potential sql vunerabilities

* #19500 writing tests

* #19500 tests

* we should not need TLS set to true

* #19500 removing unneeded files
@nollymar nollymar modified the milestones: 102720_SCOUT, Scout Current Oct 28, 2020
@nollymar
Contributor

PR: #19470

nollymar pushed a commit that referenced this issue Oct 29, 2020
nollymar pushed a commit that referenced this issue Oct 29, 2020
nollymar pushed a commit that referenced this issue Oct 30, 2020
nollymar pushed a commit that referenced this issue Oct 30, 2020
@nollymar nollymar removed their assignment Oct 30, 2020
nollymar pushed a commit that referenced this issue Nov 2, 2020
nollymar pushed a commit that referenced this issue Nov 2, 2020
* #18605 pauses and then unpauses based on a cache invalidation

* #18605 adding ttl to the cache put in the logger

* #18605 less logging

* #18605 Fixing failing test

* #18605 Adding a kind of cache invalidation to unpause ReindexThread (10 minutes by default)

* #18605 Implementing new ITs

* #18605 Increasing logging time

* #18605 Doc test

Co-authored-by: Nollymar Longa <>
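The commit messages above mention a pause that is lifted by a kind of cache invalidation, with a 10-minute default. One way to picture that is a pause flag with a time-to-live. The sketch below is an assumption about the general shape, not the code from the PR: the class `ExpiringPauseSketch`, its methods, and the injected clock are hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: a pause flag that expires on its own after a TTL.
class ExpiringPauseSketch {
    private final LongSupplier clockMillis; // injected so the example is deterministic
    private final long ttlMillis;
    private long pausedUntil = 0; // 0 means "not paused"

    ExpiringPauseSketch(LongSupplier clockMillis, long ttlMillis) {
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
    }

    // Called when the worker finds nothing left to index.
    synchronized void pause() {
        pausedUntil = clockMillis.getAsLong() + ttlMillis;
    }

    // Explicit invalidation, for example when a new reindex entry is queued.
    synchronized void unpause() {
        pausedUntil = 0;
    }

    // The pause also ends by itself once the TTL has elapsed.
    synchronized boolean isPaused() {
        return clockMillis.getAsLong() < pausedUntil;
    }

    static boolean[] demo() {
        long[] now = {0};
        ExpiringPauseSketch p = new ExpiringPauseSketch(() -> now[0], 10 * 60 * 1000L);
        p.pause();
        boolean atStart = p.isPaused();
        now[0] = 9 * 60 * 1000L;
        boolean afterNineMinutes = p.isPaused();
        now[0] = 10 * 60 * 1000L;
        boolean afterTenMinutes = p.isPaused();
        p.pause();
        p.unpause();
        boolean afterUnpause = p.isPaused();
        return new boolean[] {atStart, afterNineMinutes, afterTenMinutes, afterUnpause};
    }

    public static void main(String[] args) {
        // prints [true, true, false, false]
        System.out.println(java.util.Arrays.toString(demo()));
    }
}
```

The expiry acts as a safety net: even if an explicit unpause is missed, the worker checks for work again after the TTL.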
@nollymar
Contributor

nollymar commented Nov 2, 2020

Note to QA: Please test creating and editing pieces of content. It is also important to run a full reindex, especially in a clustered environment.

@fabrizzio-dotCMS fabrizzio-dotCMS self-assigned this Nov 3, 2020
dsilvam added a commit that referenced this issue Nov 4, 2020
* Update dotcmsReleaseVersion and coreWebReleasion version

* update release version

* #18505 JSONTool does not return sub arrays

* #18505 now the JSONTool uses the Jackson to map the string json as a single Maps and Lists

* #18505 now the JSONTool uses the Jackson to map the string json as a single Maps and Lists

* #19364 Unable to edit category permissions as limited user even you have full rights

* #18314 Make Query Tool Use fetch() to fill response

* #19098 SAML update logout page.  (#19450)

* include css in jsp

* label updated

* Updating sql files (#19478)

* Updating sql files to remove contraints

* Updating sql files to remove contraints

* #18690 Allow Push publish just for enterprise license in the receiver (#19492)

* #18690 Allow Push publish just for enterprise license in the receiver

* testing

* Fixing test

* Issue 19500 sql injection containers (#19501)

* #18605 pauses and then unpauses based on a cache invalidation

* #18605 adding ttl to the cache put in the logger

* #18605 less logging

* #19500 sanitize sql

* #19500 fixes potential sql vunerabilities

* #19500 writing tests

* #19500 tests

* we should not need TLS set to true

* #19500 removing unneeded files

* #19338 dont lowercase (#19506)

* #19338 dont lowercase

* #19338 integration test

* #19338 missing test resource

* #19509 use proper db columm in query (#19510)

* #19509 use proper db columm in query

* #19509 use proper property from contentlet

* #19509 fix integration test

* #19509 fix integration test

* #19471 Use proper value when discarding conflicts (#19519)

* #18780 fixes job when new hostname starts with  original hostname (#19522)

* #19509 Fixing bug when use comma in host's name (#19528)

* #19509 Fixing bug when use comma in host's name

* Fixing test

* update core-web version

* merge with master

* Update .gitmodules

* Update gradle.properties

Co-authored-by: Jonathan <jonathan.sanchez@dotcms.com>
Co-authored-by: erickgonzalez <erick.gonzalez@dotcms.com>
Co-authored-by: hmoreras <31667212+hmoreras@users.noreply.github.com>
Co-authored-by: Freddy Rodriguez <freddy0309@gmail.com>
Co-authored-by: Will Ezell <will@dotcms.com>
@fabrizzio-dotCMS
Contributor

fabrizzio-dotCMS commented Nov 5, 2020

I set up a two-node cluster (node-1 and node-2) on my local machine.
Then I tested the following scenarios:

  1. I edited single pieces of content and verified that the changes became visible on both nodes. No problem. Editing a single piece of content seems to be processed by the same node that issues the reindex operation.

  2. I modified content types that already had a number of pieces of content, and I can tell that the reindex operation isn't necessarily handled by the node issuing the operation. The changes to the pieces of content are picked up by both nodes smoothly.

  3. I tested bulk workflow actions, firing them from the content search portlet, and was able to verify that the ReindexThread workload is distributed across both nodes (or at least the activity in the logs gives me that impression).
    I see entries like:

13:46:16.438  INFO  reindex.ReindexThread - ---  ReindexThread total/todo/bulk: 5590/6/12
13:46:16.463  INFO  reindex.ReindexThread - ---  ReindexThread total/todo/bulk: 5600/5/10
13:46:16.479  INFO  reindex.ReindexThread - ---  ReindexThread total/todo/bulk: 5604/2/4
13:46:16.496  INFO  reindex.ReindexThread - ---  ReindexThread total/todo/bulk: 5608/2/4
13:46:16.508  INFO  reindex.ReindexThread - ---  ReindexThread total/todo/bulk: 5610/1/2

in both Node-1 and Node-2
The workload doesn't seem to be distributed evenly between the two nodes but that doesn't seem to be an issue.

  4. I fired a reindex operation, and again the reindex process seems to be distributed between the two nodes.
    I see something like this going on in both nodes. The progress is visible in the UI from both nodes.
13:38:13.631  INFO  reindex.BulkProcessorListener - -----------
13:38:13.631  INFO  reindex.BulkProcessorListener - Reindexing Server #  : 2 of 2
13:38:13.631  INFO  reindex.BulkProcessorListener - Total Indexed        : 250
13:38:13.631  INFO  reindex.BulkProcessorListener - ReindexEntries found : 300
13:38:13.632  INFO  reindex.BulkProcessorListener - BulkRequests created : 250
13:38:13.632  INFO  reindex.BulkProcessorListener - Full Reindex Elapsed : 5.632s

which again makes me think the operation is distributed and processed by the two nodes.

Therefore I am passing it to QA

As a side note, I had to increase my ES Heap memory to overcome a recurrent exception that wouldn't let me complete testing.

The exception looks more or less like this:

11:08:55.392  ERROR reindex.ReindexThread - Bulk  process failed entirely:Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [988315602/942.5mb], which is larger than the limit of [986061209/940.3mb], real usage: [988223152/942.4mb], new bytes reserved: [92450/90.2kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=92450/90.2kb, accounting=67407331/64.2mb]]
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=circuit_breaking_exception, reason=[parent] Data too large, data for [<http_request>] would be [988315602/942.5mb], which is larger than the limit of [986061209/940.3mb], real usage: [988223152/942.4mb], new bytes reserved: [92450/90.2kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=92450/90.2kb, accounting=67407331/64.2mb]]

But I was able to make it go away by changing the value of the parameter "PROVIDER_ELASTICSEARCH_HEAP_SIZE=2g"
in our dot_opendistro.

@fabrizzio-dotCMS fabrizzio-dotCMS removed their assignment Nov 5, 2020
@bryanboza
Member

After these changes, the same message is logged far too many times, which causes a lot of noise in the logs. We need to take care of this.
image

@nollymar
Contributor

Fixed. The message will be logged every 60 minutes:
PR: #19600
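Logging a message at most once per interval can be sketched as follows. This is a generic throttling pattern, not the code from the PR: the class `ThrottledLogSketch`, its `shouldLog` method, and the injected clock are hypothetical.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch: allow a log message at most once per interval.
class ThrottledLogSketch {
    private final LongSupplier clockMillis; // injected so the example is deterministic
    private final long intervalMillis;
    private Long lastLoggedAt = null; // null until the first message is emitted

    ThrottledLogSketch(LongSupplier clockMillis, long intervalMillis) {
        this.clockMillis = clockMillis;
        this.intervalMillis = intervalMillis;
    }

    // Returns true when the caller should emit the message now.
    synchronized boolean shouldLog() {
        long now = clockMillis.getAsLong();
        if (lastLoggedAt == null || now - lastLoggedAt >= intervalMillis) {
            lastLoggedAt = now;
            return true;
        }
        return false;
    }

    // Simulates a loop that wakes every 500 ms for two hours and counts emitted messages.
    static int demo() {
        long[] now = {0};
        ThrottledLogSketch throttle = new ThrottledLogSketch(() -> now[0], 60 * 60 * 1000L);
        int emitted = 0;
        for (long ms = 0; ms <= 2 * 60 * 60 * 1000L; ms += 500) {
            now[0] = ms;
            if (throttle.shouldLog()) {
                emitted++;
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 3 (at 0, 60, and 120 minutes)
    }
}
```

With a 60-minute interval, a loop that wakes twice per second emits three messages in two hours instead of more than fourteen thousand.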

@bryanboza
Member

Fixed, now we don't have this message in the logs.

nollymar added a commit that referenced this issue Nov 19, 2020
Co-authored-by: Nollymar Longa <>
dsilvam added a commit that referenced this issue Nov 19, 2020
* Update dotcmsReleaseVersion and coreWebReleasion version

* update release version

* #19466 changing default DOTGENERATED_DEFAULT_PATH=shared (#19467)

* #19466 changing default DOTGENERATED_DEFAULT_PATH=shared

* #19466 changing default DOTGENERATED_DEFAULT_PATH=shared

* Issue 19486 failed login timeout (#19577)

* #19486 sane default for failed login timeout

* #19486 turning off clickstreams, linkchecker

Co-authored-by: Will Ezell <will@dotcms.com>

* Update coreWebReleaseVersion

* Update coreWebReleaseVersion

* PP Sometimes fails because of file-based containers (#19602)

* Fixing GA with env files (#19603)

* Fix starter use

* #19558 Adding detail page when sending a COntent Type by PP (#19601)

* #19598 Fixing compilation error after refactoring made on VersionableAPI (#19607)

Co-authored-by: Nollymar Longa <>

* #18605 Logging reindex pause every 1 hour (#19600)

Co-authored-by: Nollymar Longa <>

Co-authored-by: Will Ezell <will@dotcms.com>
Co-authored-by: Freddy Montes <freddymontes@gmail.com>
Co-authored-by: Freddy Rodriguez <freddy0309@gmail.com>
Co-authored-by: Victor Alfaro <victor.alfaro@dotcms.com>
Co-authored-by: Nollymar Longa <nollymar.longa@dotcms.com>
@wezell wezell closed this as completed Nov 24, 2020
dsilvam added a commit that referenced this issue Nov 30, 2020
* Update dotcmsReleaseVersion and coreWebReleasion version

* update release version

* #19466 changing default DOTGENERATED_DEFAULT_PATH=shared (#19467)

* #19466 changing default DOTGENERATED_DEFAULT_PATH=shared

* #19466 changing default DOTGENERATED_DEFAULT_PATH=shared

* Issue 19486 failed login timeout (#19577)

* #19486 sane default for failed login timeout

* #19486 turning off clickstreams, linkchecker

Co-authored-by: Will Ezell <will@dotcms.com>

* Update coreWebReleaseVersion

* Update coreWebReleaseVersion

* PP Sometimes fails because of file-based containers (#19602)

* Fixing GA with env files (#19603)

* Fix starter use

* #19558 Adding detail page when sending a COntent Type by PP (#19601)

* #19598 Fixing compilation error after refactoring made on VersionableAPI (#19607)

Co-authored-by: Nollymar Longa <>

* #18605 Logging reindex pause every 1 hour (#19600)

Co-authored-by: Nollymar Longa <>

* Updating starter version (#19609)

Co-authored-by: Nollymar Longa <>

* #19410 Setting default parent folder type if exists (#19610)

Co-authored-by: Nollymar Longa <>

* A new version of webcomponents was generated (#19611)

Co-authored-by: Nollymar Longa <>

* Added the changes for the new saml stuff (#19605)

* Added the changes for the new saml stuff

* Changes for new SAML

* Adding user_preference to the list of tables with identity (#19619)

Co-authored-by: Nollymar Longa <>

* #19613 Fixing error in postgresql (#19624)

Co-authored-by: Nollymar Longa <>

* Update gradle.properties

Co-authored-by: Will Ezell <will@dotcms.com>
Co-authored-by: Freddy Montes <freddymontes@gmail.com>
Co-authored-by: Freddy Rodriguez <freddy0309@gmail.com>
Co-authored-by: Victor Alfaro <victor.alfaro@dotcms.com>
Co-authored-by: Nollymar Longa <nollymar.longa@dotcms.com>
Co-authored-by: Jonathan <jonathan.sanchez@dotcms.com>
@jcastro-dotcms jcastro-dotcms added the LTS : Next Ticket that will be added to LTS label Jan 20, 2021
@john-thomas-dotcms john-thomas-dotcms added Release : 5.2.8.4 Included in LTS patch release 5.2.8.4 Release : 5.3.8.4 Included in LTS patch release 5.3.8.4 and removed LTS : Next Ticket that will be added to LTS LTS: Released labels Feb 10, 2021