Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CleanUpFieldReferencesJob is not stateful #19006

Closed
wezell opened this issue Jul 30, 2020 · 8 comments
Closed

CleanUpFieldReferencesJob is not stateful #19006

wezell opened this issue Jul 30, 2020 · 8 comments

Comments

@wezell
Copy link
Contributor

wezell commented Jul 30, 2020

Apparently, when we schedule a call to CleanUpFieldReferencesJob as a SimpleTrigger, it does not run statefully. See:

https://github.com/dotCMS/core/blob/master/dotCMS/src/main/java/com/dotmarketing/quartz/job/CleanUpFieldReferencesJob.java#L125

We just had a customer's server go down because of multiple instances of this job running simultaneously.

https://gist.github.com/wezell/cb811c0d838603382e7386331e4dcdce

We need to make sure that we run Stateful jobs statefully

@nollymar
Copy link
Contributor

nollymar commented Aug 6, 2020

@nollymar
Copy link
Contributor

nollymar commented Aug 6, 2020

Note to QA: This fix affects the following functionalities:

  • Delete a field from a content type
  • Apply cascade permissions
  • Delete persona
  • Reset Permissions
  • Delete User

@nollymar nollymar removed their assignment Aug 6, 2020
@wezell
Copy link
Contributor Author

wezell commented Aug 6, 2020

@nollymar I think this might need work. A simple test is:

  1. create a content type with 4 indexed fields
  2. add a breakpoint in the CleanUpFieldReferencesJob.run method
  3. From the ui, delete the four fields

We should see 1 CleanUpFieldReferencesJob execute sequentially, 4 times. I am seeing 4 jobs happening simultaneously

Screen Shot 2020-08-06 at 9 58 48 AM

@fabrizzio-dotCMS
Copy link
Contributor

fabrizzio-dotCMS commented Aug 26, 2020

new PRs #19172 #19222

@nollymar nollymar modified the milestones: Maintenance Sprint, Scout Current Aug 27, 2020
fabrizzio-dotCMS added a commit that referenced this issue Aug 28, 2020
nollymar pushed a commit that referenced this issue Sep 1, 2020
* #19006 stateful job needs to be sequential and unque instance.

* #19006 moving trigger logic up to base class so it can be re-used

* #19006 updating jobs to fire throug enqueueTrigger

* #19006 clenup

* #19006 clenup code

* #19006 fix on how job detail is grabbed

* #19006 clean up

* #19006 fix test case

* #19006  test fix
@nollymar nollymar assigned nollymar and unassigned fabrizzio-dotCMS Sep 1, 2020
@nollymar
Copy link
Contributor

nollymar commented Sep 10, 2020

Tests Results: Needs work

Now, the CleanUpFieldReferencesJob is executed statefully (verified through a threaddump), but a CacheProvider exception is thrown multiple times and makes the catalina.out grows indefinitely

17:03:22.613  ERROR provider.CacheProviderAPIImpl - Error removing record from CacheProvider [H22 Cache]: group [velocitycache] - key [/live/e31ab4b6-108a-49ed-958a-9a9190a48c4d/54699dae-f65a-477d-875e-06019a544fd4.fields].
java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@7dc53567 rejected from java.util.concurrent.ThreadPoolExecutor@54659c6c[Running, pool size = 10, active threads = 9, queued tasks = 10000, completed tasks = 2251368]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2063) ~[?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:830) ~[?:1.8.0_211]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1379) ~[?:1.8.0_211]
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[?:1.8.0_211]
	at com.dotmarketing.business.cache.provider.h22.H22Cache.removeAsync(H22Cache.java:251) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.business.cache.provider.h22.H22Cache.remove(H22Cache.java:231) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.enterprise.cache.provider.CacheProviderAPIImpl.remove(SourceFile:404) ~[ee-5.3.8.jar:?]
	at com.dotmarketing.business.ChainableCacheAdministratorImpl.removeLocalOnly(ChainableCacheAdministratorImpl.java:359) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.business.ChainableCacheAdministratorImpl.remove(ChainableCacheAdministratorImpl.java:332) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.business.CommitListenerCacheWrapper.remove(CommitListenerCacheWrapper.java:152) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.rendering.velocity.services.DotResourceCache.remove(DotResourceCache.java:147) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.rendering.velocity.services.FieldLoader.invalidate(FieldLoader.java:85) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.rendering.velocity.services.FieldLoader.invalidate(FieldLoader.java:73) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.rendering.velocity.services.ContentletLoader.invalidate(ContentletLoader.java:607) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.rendering.velocity.services.DotLoader.invalidate(DotLoader.java:42) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.rendering.velocity.services.ContentletLoader.invalidate(ContentletLoader.java:547) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.quartz.job.CleanUpFieldReferencesJob.run_aroundBody0(CleanUpFieldReferencesJob.java:107) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.quartz.job.CleanUpFieldReferencesJob$AjcClosure1.run(CleanUpFieldReferencesJob.java:1) ~[dotcms_5.3.8_f938f86.jar:?]
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[aspectjrt-1.8.10.jar:?]
	at com.dotcms.aspects.aspectj.AspectJDelegateMethodInvocation.proceed(AspectJDelegateMethodInvocation.java:42) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.db.LocalTransaction.wrapReturnWithListeners(LocalTransaction.java:59) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.aspects.interceptors.WrapInTransactionMethodInterceptor.invoke(WrapInTransactionMethodInterceptor.java:23) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotcms.aspects.aspectj.WrapInTransactionAspect.invoke(WrapInTransactionAspect.java:41) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.quartz.job.CleanUpFieldReferencesJob.run(CleanUpFieldReferencesJob.java:50) ~[dotcms_5.3.8_f938f86.jar:?]
	at com.dotmarketing.quartz.DotJob.execute(DotJob.java:42) ~[dotcms_5.3.8_f938f86.jar:?]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:223) ~[dot.quartz-all-1.8.6_2.jar:?]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) ~[dot.quartz-all-1.8.6_2.jar:?]

I changed the dotListenerSubmitter pool configuration using the following values, but it didn't work:

dotListenerSubmitterdotcms.concurrent.poolsize=100
dotListenerSubmitterdotcms.concurrent.maxpoolsize=250
dotListenerSubmitterdotcms.concurrent.queuecapacity=500

In spite of that, these functionalities were tested and work as expected:

  • Apply cascade permissions
  • Delete persona
  • Reset Permissions
  • Delete User

@wezell
Copy link
Contributor Author

wezell commented Sep 11, 2020

maybe something like this, where we only allow async operations if there is room in the queue? If the queue drains, async can proceed 34e1cf2

nollymar pushed a commit that referenced this issue Sep 14, 2020
#19261)

* #19006 cache eviction is perfrmed asynchrously when threadpool is under certain %

* #19006 isAllocationWithinTolerance shouldn't be syynnchronized

* #19006 slight optimization lowering number of threads to 5
@nollymar
Copy link
Contributor

Internal QA: After the fix applied by Fabrizzio on this PR: #19261, the CleanUpFieldReferencesJob performs delete operations correctly. No errors found in the log

@bryanboza
Copy link
Member

Fixed, tested every one of the scenarios described in the provided comment and everything works as expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants