SiteSearch - items not removed from sitesearch index after unpublish/archive - 5.x #17976

Closed
craigWagner99 opened this issue Feb 11, 2020 · 13 comments

Comments

@craigWagner99

Describe the bug

Unpublishing and/or archiving items does not remove them from a SiteSearch index.

Steps to reproduce the behavior:

Recreated on demo.dotcms.com

  1. Create new page(s)
  2. Create new (non-incremental) SiteSearch index
  3. Search for your new content - it should be there
  4. Unpublish your content
  5. Run SiteSearch reindex Job
  6. Search your SiteSearch

Content is still found

  1. Archive your content
  2. Run SiteSearch reindex Job
  3. Search your SiteSearch

Expected behavior

Content should be removed from SiteSearch index.

Actual behavior

Content is still found

Screenshots

Create new content
image

Create new SiteSearch index (newIndex)
image

Search SiteSearch Index
image

Unpublish/Archive Content
image

Run SiteSearch job on your new index
image

Search SiteSearch index... Content is still there

image

Desktop (please complete the following information):

  • OS: n/a
  • Browser Chrome / FF
  • Version dotCMS 5.x
@wezell wezell added this to the Bug Sprint milestone Feb 13, 2020
@wezell
Contributor

wezell commented Feb 13, 2020

working or archived content should be removed from sitesearch indexes in both incremental and non-incremental site search jobs.

deleted content should only be removed from non-incremental site search jobs

@fabrizzio-dotCMS fabrizzio-dotCMS self-assigned this Feb 13, 2020
@wezell
Contributor

wezell commented Feb 18, 2020

For this ticket we want to apply some improvements:

UI improvements:

  1. Remove the "Create New Index and Make Default" option from the "Alias Name" dropdown.

image

  2. Allow the user to create new indexes directly in the "Alias Name" dropdown: if the typed index alias does not exist, the index will be created when the job is created.

image

Cases to handle

Incremental

  • Create/reuse a bundle named for the job
  • Pull changed content based on last index date
    • Live / Working / Archived
      • Publish Live into index
      • Delete Working and Archived from index

Non-Incremental

  • Create a new index
  • Generate bundle
    • Live
      • Publish Live
  • Change alias to old Index alias or name (removing the old index alias)
  • If old index was the default index mark the new one as default
  • Delete old index
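
A minimal sketch of the incremental case listed above, for illustration only: content changed since the last run is mapped to either a publish or a delete against the index. The class, enum, and method names here are hypothetical and are not taken from the dotCMS code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IncrementalPlanSketch {

    /** Hypothetical content states, mirroring the Live / Working / Archived cases. */
    enum State { LIVE, WORKING, ARCHIVED }

    /** Hypothetical index operations. */
    enum Action { PUBLISH_TO_INDEX, DELETE_FROM_INDEX }

    /** Live content is published; working and archived content is removed. */
    static Action actionFor(State state) {
        return state == State.LIVE ? Action.PUBLISH_TO_INDEX : Action.DELETE_FROM_INDEX;
    }

    /** Builds an ordered plan (content path -> action) for everything changed since the last run. */
    static Map<String, Action> plan(Map<String, State> changedSinceLastRun) {
        Map<String, Action> plan = new LinkedHashMap<>();
        for (Map.Entry<String, State> entry : changedSinceLastRun.entrySet()) {
            plan.put(entry.getKey(), actionFor(entry.getValue()));
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up paths, used only to exercise the sketch.
        Map<String, State> changed = new LinkedHashMap<>();
        changed.put("/folder1/page1", State.LIVE);
        changed.put("/folder1/page2", State.WORKING);
        changed.put("/folder1/page3", State.ARCHIVED);
        System.out.println(plan(changed));
    }
}
```

The non-incremental case needs no such mapping, since it builds a fresh index from live content only and then swaps the alias.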

@jgambarios
Contributor

fabrizzio-dotCMS added a commit that referenced this issue Mar 3, 2020
fabrizzio-dotCMS added a commit that referenced this issue Mar 3, 2020
fabrizzio-dotCMS added a commit that referenced this issue Mar 3, 2020
fabrizzio-dotCMS added a commit that referenced this issue Mar 3, 2020
jgambarios pushed a commit that referenced this issue Mar 4, 2020
* #17976  fixing site search

* #17976  ovrride to string to improve debuging

* #17976  adding doc

* #17976  feedback applied

* #17976 applying feedback

* #17976  empty method removed from test

* #17976  removing printlines

* #17976  bad merge fixed
@jgambarios
Contributor

jgambarios commented Mar 11, 2020

We found a couple of cases that need to be handled:

  1. Incremental and non-incremental jobs are not indexing non-default-language pages:

    • Create a new folder
    • Add a page in English and add content to that page
    • Create the Spanish version of that page and make sure the content inside the page also has Spanish versions.
    • Run a site search job
    • Check the generated bundle: the Spanish version of the page has no Spanish content in it.
  2. Non-incremental jobs use a timestamp for the folder name, for example 2020-03-11_11-45-00. If two or more non-incremental jobs run at the same time (for example, every day at 3am), they will "share" the same folder, creating inconsistencies in the bundle data. We need more unique folder names, maybe UUID + date?

    I get this error each time my jobs run (I have two incremental and two non-incremental jobs running at the same time), and I think it is related to the folder and index names not being unique:

09:25:00.532  ERROR job.SiteSearchJobProxy - Elasticsearch exception [type=resource_already_exists_exception, reason=index [cluster_c6148f1ad9.sitesearch_20200312092500/h-yFSeJKTKyoO1XEpOpf1Q] already exists]
org.elasticsearch.ElasticsearchStatusException: Elasticsearch exception [type=resource_already_exists_exception, reason=index [cluster_c6148f1ad9.sitesearch_20200312092500/h-yFSeJKTKyoO1XEpOpf1Q] already exists]
	at org.elasticsearch.rest.BytesRestResponse.errorFromXContent(BytesRestResponse.java:177) ~[elasticsearch-7.3.2.jar:7.3.2]
	at org.elasticsearch.client.RestHighLevelClient.parseEntity(RestHighLevelClient.java:1727) ~[elasticsearch-rest-high-level-client-7.3.2.jar:7.3.2]
	at org.elasticsearch.client.RestHighLevelClient.parseResponseException(RestHighLevelClient.java:1704) ~[elasticsearch-rest-high-level-client-7.3.2.jar:7.3.2]
	at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1467) ~[elasticsearch-rest-high-level-client-7.3.2.jar:7.3.2]
	at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1439) ~[elasticsearch-rest-high-level-client-7.3.2.jar:7.3.2]
	at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1406) ~[elasticsearch-rest-high-level-client-7.3.2.jar:7.3.2]
	at org.elasticsearch.client.IndicesClient.create(IndicesClient.java:127) ~[elasticsearch-rest-high-level-client-7.3.2.jar:7.3.2]
	at com.dotcms.content.elasticsearch.business.ESIndexAPI.createIndex(ESIndexAPI.java:628) ~[classes/:?]
	at com.dotcms.enterprise.publishing.sitesearch.ESSiteSearchAPI.createSiteSearchIndex(ESSiteSearchAPI.java:333) ~[classes/:?]
	at com.dotcms.publishing.job.SiteSearchJobImpl.run(SiteSearchJobImpl.java:204) ~[classes/:?]
	at com.dotcms.publishing.job.SiteSearchJobProxy.run_aroundBody0(SiteSearchJobProxy.java:31) ~[classes/:?]
	at com.dotcms.publishing.job.SiteSearchJobProxy$AjcClosure1.run(SiteSearchJobProxy.java:1) ~[classes/:?]
	at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:149) ~[aspectjrt-1.8.10.jar:?]
	at com.dotcms.aspects.aspectj.AspectJDelegateMethodInvocation.proceed(AspectJDelegateMethodInvocation.java:42) ~[classes/:?]
	at com.dotcms.aspects.interceptors.CloseDBIfOpenedMethodInterceptor.invoke(CloseDBIfOpenedMethodInterceptor.java:29) ~[classes/:?]
	at com.dotcms.aspects.aspectj.CloseDBIfOpenedAspect.invoke(CloseDBIfOpenedAspect.java:41) ~[classes/:?]
	at com.dotcms.publishing.job.SiteSearchJobProxy.run(SiteSearchJobProxy.java:17) ~[classes/:?]
	at com.dotmarketing.quartz.DotJob.execute(DotJob.java:42) ~[classes/:?]
	at org.quartz.core.JobRunShell.run(JobRunShell.java:223) ~[dot.quartz-all-1.8.6_2.jar:?]
	at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549) ~[dot.quartz-all-1.8.6_2.jar:?]
	Suppressed: org.elasticsearch.client.ResponseException: method [PUT], host [https://127.0.0.1:19200], URI [/cluster_c6148f1ad9.sitesearch_20200312092500?master_timeout=30s&timeout=15000ms], status line [HTTP/1.1 400 Bad Request]
  3. After some runs I saw some incremental bundle folders mixing Spanish and English pages, where only one of the two languages should be present:

image

image

The way to reproduce the specific case in the images is to create a Folder1 folder containing a page in English and Spanish, with both versions of the page having content in both languages.

  4. Failing unit tests:
  • SiteSearchJobImplTest.Test_Incremental_Job_Test_Pages_Are_Found_Create_And_Publish_New_Page_Test_Changes_Are_Picked_Unpublish_Then_Verify_Page_Is_Gone
  • SiteSearchJobImplTest.Test_Non_Incremental_Create_Default_Index_Create_Second_Index_Run_Non_Incrementally_Expect_Non_Default_New_Index
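
For the naming collision described in item 2 above, the "UUID + date" idea could be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, the "sitesearch_" prefix and timestamp pattern are inferred from the index name in the log above, and this is not necessarily the strategy the fix ended up using.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.UUID;

final class UniqueIndexNameSketch {

    // Second-resolution timestamp; pattern inferred from the index name in the log (assumption).
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMddHHmmss");

    /**
     * Builds a name from a timestamp plus a random UUID, so that two jobs
     * starting in the same second no longer produce the same name.
     */
    static String newIndexName(LocalDateTime now) {
        // UUID.toString() is lowercase hex; hyphens are dropped to keep the name compact.
        String unique = UUID.randomUUID().toString().replace("-", "");
        return "sitesearch_" + STAMP.format(now) + "_" + unique;
    }

    public static void main(String[] args) {
        LocalDateTime sameInstant = LocalDateTime.of(2020, 3, 12, 9, 25, 0);
        // Two jobs firing at the same instant get different names.
        System.out.println(newIndexName(sameInstant));
        System.out.println(newIndexName(sameInstant));
    }
}
```

The trade-off is a longer index name, which any code that stores or displays the name has to accommodate.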

@jgambarios
Contributor

PR: #18170

jgambarios pushed a commit that referenced this issue Mar 23, 2020
* #17976 fixes on saving the index name

* #17976 fix comments
@jgambarios
Contributor

jgambarios commented Mar 24, 2020

Content not found

  1. Create a new Rich Text content in English and Spanish:
    English:
    Title: Hello content
    Body: Hi, my name is Vlad Tepes (evil laugh)
    Spanish:
    Title: Hola content
    Body: Hola, mi nombre es Juana de Arco

  2. Create a new folder "folder1"

  3. Create a page just in English "page1" and add the created rich content to the page

  4. Dev Tools -> Site Search -> Job Scheduler
    Run now, selecting both languages (the result is the same for incremental and non-incremental scheduled jobs)

  5. Dev Tools -> Site Search -> Search
    Look for "vlad" -> Nothing is found
    Look for "juana" -> Page is found

NOTE: When the page has both Spanish and English versions the search looks right; I see this behavior only when the page is just in English.



JSP Error in Job Audit Data

  1. Dev Tools -> Site Search -> Job Scheduler
    Non-incremental, selecting both languages, running every 5 minutes (0 0/5 * 1/1 * ? *), with Index All Sites

  2. Dev Tools -> Site Search -> Job Scheduler
    Incremental, selecting both languages, running every 5 minutes (0 0/5 * 1/1 * ? *), with Index All Sites

  3. Wait for at least two runs to finish

  4. Dev Tools -> Site Search -> Job Audit Data: try with both indexes. Sometimes it fails with the incremental index, sometimes with the non-incremental one. The first time I reproduced it, it failed with the non-incremental index; the second time, with the incremental one.

15:32:25.355  ERROR lang.Class - An exception occurred processing JSP page [/html/portlet/ext/sitesearch/site_search_audit.jsp] at line [63]

60:         StringBuilder hostList=new StringBuilder();
61:         for(String hid : a.getHostList().split(",")) {
62:             if(UtilMethods.isSet(hid))
63:                 hostList.append(APILocator.getHostAPI().find(hid, APILocator.getUserAPI().getSystemUser(), false).getHostname())
64:                         .append("  ");
65:         }
66:         StringBuilder langList=new StringBuilder();


Stacktrace:
org.apache.jasper.JasperException: An exception occurred processing JSP page [/html/portlet/ext/sitesearch/site_search_audit.jsp] at line [63]

60:         StringBuilder hostList=new StringBuilder();
61:         for(String hid : a.getHostList().split(",")) {
62:             if(UtilMethods.isSet(hid))
63:                 hostList.append(APILocator.getHostAPI().find(hid, APILocator.getUserAPI().getSystemUser(), false).getHostname())
64:                         .append("  ");
65:         }
66:         StringBuilder langList=new StringBuilder();


Stacktrace:
	at org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:584) ~[jasper.jar:8.5.32]
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:481) ~[jasper.jar:8.5.32]
	at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:386) ~[jasper.jar:8.5.32]
	at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:330) ~[jasper.jar:8.5.32]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) ~[servlet-api.jar:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) ~[tomcat-websocket.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotmarketing.filters.CMSFilter.doFilterInternal(CMSFilter.java:191) ~[classes/:?]
	at com.dotmarketing.filters.CMSFilter.doFilter(CMSFilter.java:47) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotcms.filters.interceptor.AbstractWebInterceptorSupportFilter.doFilter(AbstractWebInterceptorSupportFilter.java:90) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotcms.filters.interceptor.AbstractWebInterceptorSupportFilter.doFilter(AbstractWebInterceptorSupportFilter.java:90) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotmarketing.filters.VanityURLFilter.doFilter(VanityURLFilter.java:104) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:399) ~[urlrewritefilter-4.0.4.jar:4.0.4]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotmarketing.filters.TimeMachineFilter.doFilter(TimeMachineFilter.java:134) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotmarketing.filters.ThreadNameFilter.doFilter(ThreadNameFilter.java:88) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotmarketing.filters.CookiesFilter.doFilter(CookiesFilter.java:48) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotmarketing.filters.CharsetEncodingFilter.doFilter(CharsetEncodingFilter.java:99) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotcms.filters.interceptor.AbstractWebInterceptorSupportFilter.doFilter(AbstractWebInterceptorSupportFilter.java:90) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at com.dotcms.filters.NormalizationFilter.doFilter(NormalizationFilter.java:44) ~[classes/:?]
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) ~[catalina.jar:8.5.32]
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:493) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140) ~[catalina.jar:8.5.32]
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81) ~[catalina.jar:8.5.32]
	at org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650) ~[catalina.jar:8.5.32]
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87) ~[catalina.jar:8.5.32]
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342) ~[catalina.jar:8.5.32]
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:800) ~[tomcat-coyote.jar:8.5.32]
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66) ~[tomcat-coyote.jar:8.5.32]
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:800) ~[tomcat-coyote.jar:8.5.32]
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1471) ~[tomcat-coyote.jar:8.5.32]
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) ~[tomcat-coyote.jar:8.5.32]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) ~[tomcat-util.jar:8.5.32]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
Caused by: java.lang.NullPointerException
	at org.apache.jsp.html.portlet.ext.sitesearch.site_005fsearch_005faudit_jsp._jspService(site_005fsearch_005faudit_jsp.java:599) ~[?:?]
	at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70) ~[jasper.jar:8.5.32]
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:742) ~[servlet-api.jar:?]
	at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:443) ~[jasper.jar:8.5.32]
	... 59 more

NOTE:

  1. When it fails, a.getHostList() in line 61 of site_search_audit.jsp has EMPTY as its value
  2. It looks like the Index All Sites option is the trigger for the error
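
A defensive version of the loop at JSP lines 60-65 might look like the sketch below. The root cause is not confirmed in this thread, so this simply guards against both a missing host list and a host lookup that returns nothing. The class name, method name, and the Function parameter are hypothetical; the Function stands in for the APILocator.getHostAPI().find(...).getHostname() call chain.

```java
import java.util.function.Function;

final class HostListSketch {

    /**
     * Joins the hostnames for a comma-separated list of host ids,
     * skipping blank ids and ids that cannot be resolved.
     *
     * @param hostList       comma-separated host ids; may be null or empty
     * @param hostnameLookup hypothetical stand-in for the host API lookup; may return null
     */
    static String hostNames(String hostList, Function<String, String> hostnameLookup) {
        StringBuilder out = new StringBuilder();
        if (hostList == null) {
            return out.toString(); // nothing to resolve
        }
        for (String hid : hostList.split(",")) {
            String id = hid.trim();
            if (id.isEmpty()) {
                continue; // e.g. "a,,b" or an empty list
            }
            String name = hostnameLookup.apply(id);
            if (name != null) { // unresolved host: skip instead of throwing
                out.append(name).append("  ");
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up ids; only "known" resolves to a hostname.
        Function<String, String> lookup = id -> id.equals("known") ? "demo.dotcms.com" : null;
        System.out.println("[" + hostNames("known,,missing", lookup) + "]");
        System.out.println("[" + hostNames(null, lookup) + "]");
    }
}
```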

@dotCMS dotCMS deleted a comment from fabrizzio-dotCMS Mar 25, 2020
jgambarios pushed a commit that referenced this issue Mar 25, 2020
* #17976  fixing site search

* #17976  ovrride to string to improve debuging

* #17976  adding doc

* #17976  feedback applied

* #17976 applying feedback

* #17976  empty method removed from test

* #17976  removing printlines

* #17976  bad merge fixed

(cherry picked from commit 7a71818)
jgambarios added a commit that referenced this issue Mar 25, 2020
* #17976  fixing site search

* #17976  ovrride to string to improve debuging

* #17976  adding doc

* #17976  feedback applied

* #17976 applying feedback

* #17976  empty method removed from test

* #17976  removing printlines

* #17976  bad merge fixed

(cherry picked from commit 7a71818)
jgambarios pushed a commit that referenced this issue Mar 25, 2020
* #17976

* #17976 fix indexing non-default lang

* #17976 adding concurrency  support.

* #17976 logger clean up

* #17976  feedback

* #17976 save point

* #17976 new unique site-search index name generation strategy

(cherry picked from commit 4b17b69)
jgambarios pushed a commit that referenced this issue Mar 25, 2020
* #17976 fixes on saving the index name

* #17976 fix comments

(cherry picked from commit 6050be9)
@jgambarios
Contributor

jgambarios commented Mar 26, 2020

The latest issues found will be handled in:
#18194
#18207

nollymar pushed a commit that referenced this issue Apr 6, 2020
…to the index name to avoid duplicity in concurrent environments
@nollymar
Contributor

nollymar commented Apr 6, 2020

Additional PR to support larger index names: #18269

nollymar added a commit that referenced this issue Apr 6, 2020
…to the index name to avoid duplicity in concurrent environments (#18269)

Co-authored-by: Nollymar Longa <>
@bryanboza
Member

Fixed. Tested after the last changes and this is OK for now; further work is reported in new cards.
