Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

indexing on a new environment doesn't run automatically for the first time #385

Closed
eaquigley opened this issue Jul 9, 2014 · 10 comments
Closed
Assignees
Labels
Type: Bug a defect

Comments

@eaquigley
Copy link
Contributor


Author Name: Elda Sotiri (@esotiri)
Original Redmine Issue: 3800, https://redmine.hmdc.harvard.edu/issues/3800
Original Date: 2014-04-02
Original Assignee: Elda Sotiri


Gustavo and I saw this issue on http://dataverse.cloudapp.net/, had to run index manually for the first time. After that indexing works.


Related issue(s): #57, #91, #229
Redmine related issue(s): 3464, 3500, 3643


@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-02T16:21:53Z


To reproduce on a fresh VM/machine.

  1. Set up Solr
  2. Set up Dataverse (including creation of default dataverse by Pete, et al.)

See if the Solr index is correct (and reflected in browsing GUI) without having to run "index all".

@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-07T13:03:47Z


I am able to replicate this. The issue is that indexing of the new "groups" dvtype introduced for #3464 does not happen automatically. That is to say, on a fresh install, all the dvtype:dataverses are indexed, but not the dvtype:groups, which are used for a JOIN on any search. Without these groups, there is nothing to JOIN on so no results can be found.

It should be relatively straighforward to index the User Private Groups (UPG) on user creation. For instance, group_user1 could be indexed when pete is created:

      {
        "dvtype":"groups",
        "id":"group_user1",
        "entityid":1,
        "name_sort":"pete",
        "groups_s":"group_user1",
        "_version_":1464729900750471168},


But what about non-User Private Groups groups such as group_public? These groups are all fake and hard coded for now until groups are implemented in #3500. For now, perhaps, we could say that when the first user is created (i.e. pete), go ahead and run an "index all" to create all the temporary fake groups such as group_public:

      {
        "dvtype":"groups",
        "id":"group_1",
        "entityid":1,
        "name_sort":"group_public",
        "groups_s":"group_1",
        "_version_":1464729900605767680},


@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-08T02:34:21Z


Please re-test after pulling in this commit:

136da87 index user on create, index groups with root dataverse #3800

@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Elda Sotiri (@esotiri)
Original Date: 2014-04-08T19:00:08Z


Indexing is still not running automatically for the first time . I have a new db, new build, deleted all in solr /data/index dir.
Started solr and created three dv. Below is the solr output for dataverse named index3 as I was creating it.

392523 [qtp112712824-17] INFO org.apache.solr.core.SolrCore ? [collection1] webapp=/solr path=/spell params={facet=true&sort=release_or_create_date_dt+desc&hl.fl=description&wt=javabin&hl=true&version=2&rows=10&hl.snippets=1&facet.query=&start=0&q=&facet.field=hostdataverse_s&facet.field=affiliation_ss&facet.field=releasedate&facet.field=keyword_ss&facet.field=authorName_ss&facet.field=filetype_s&facet.field=dvtype&qt=/spell&fq=dvtype:(dataverses+OR+datasets)&fq=({!join+from%3Dgroups_s+to%3Dperms_ss}id:group_public+OR+{!join+from%3Dgroups_s+to%3Dperms_ss}id:group_user1)} hits=0 status=0 QTime=5
392555 [qtp112712824-14] INFO org.apache.solr.core.SolrCore ? [collection1] webapp=/solr path=/spell params={facet=true&sort=release_or_create_date_dt+desc&hl.fl=description&wt=javabin&hl=true&version=2&rows=10&hl.snippets=1&facet.query=&start=0&q=&facet.field=hostdataverse_s&facet.field=affiliation_ss&facet.field=releasedate&facet.field=keyword_ss&facet.field=authorName_ss&facet.field=filetype_s&facet.field=dvtype&qt=/spell&fq=dvtype:(dataverses+OR+datasets+OR+files)&fq=({!join+from%3Dgroups_s+to%3Dperms_ss}id:group_public+OR+{!join+from%3Dgroups_s+to%3Dperms_ss}id:group_user1)} hits=0 status=0 QTime=4
413637 [qtp112712824-13] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [collection1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[dataverse_11 (1464842928178331648)]} 0 7
413643 [qtp112712824-13] INFO org.apache.solr.update.UpdateHandler ? start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
414075 [qtp112712824-13] INFO org.apache.solr.core.SolrCore ? SolrDeletionPolicy.onCommit: commits: num=2
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/usr/local/solr-4.6.0/example/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@32eaf0d7; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_d,generation=13}
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/usr/local/solr-4.6.0/example/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@32eaf0d7; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e,generation=14}
414075 [qtp112712824-13] INFO org.apache.solr.core.SolrCore ? newest commit generation = 14
414083 [qtp112712824-13] INFO org.apache.solr.search.SolrIndexSearcher ? Opening Searcher@70db0a99 main
414085 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore ? QuerySenderListener sending requests to Searcher@70db0a99 main{StandardDirectoryReader(segments_e:39:nrt _1(4.6):C1 _6(4.6):C1 _7(4.6):C1 _8(4.6):C1 _9(4.6):C1 _b(4.6):C1 _c(4.6):C1)}
414087 [qtp112712824-13] INFO org.apache.solr.update.UpdateHandler ? end_commit_flush
414087 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore ? QuerySenderListener done.
414088 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore ? [collection1] Registered new searcher Searcher@70db0a99 main{StandardDirectoryReader(segments_e:39:nrt _1(4.6):C1 _6(4.6):C1 _7(4.6):C1 _8(4.6):C1 _9(4.6):C1 _b(4.6):C1 _c(4.6):C1)}
414089 [qtp112712824-13] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [collection1] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=false} {commit=} 0 446
414111 [qtp112712824-15] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [collection1] webapp=/solr path=/update params={wt=javabin&version=2} {add=[dataverse_11 (1464842928676405248)]} 0 6
414117 [qtp112712824-15] INFO org.apache.solr.update.UpdateHandler ? start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
414562 [qtp112712824-15] INFO org.apache.solr.core.SolrCore ? SolrDeletionPolicy.onCommit: commits: num=2
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/usr/local/solr-4.6.0/example/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@32eaf0d7; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_e,generation=14}
commit{dir=NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/usr/local/solr-4.6.0/example/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@32eaf0d7; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_f,generation=15}
414563 [qtp112712824-15] INFO org.apache.solr.core.SolrCore ? newest commit generation = 15
414570 [qtp112712824-15] INFO org.apache.solr.search.SolrIndexSearcher ? Opening Searcher@745568d2 main
414572 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore ? QuerySenderListener sending requests to Searcher@745568d2 main{StandardDirectoryReader(segments_f:43:nrt _1(4.6):C1 _6(4.6):C1 _7(4.6):C1 _8(4.6):C1 _9(4.6):C1 _b(4.6):C1 _d(4.6):C1)}
414573 [qtp112712824-15] INFO org.apache.solr.update.UpdateHandler ? end_commit_flush
414573 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore ? QuerySenderListener done.
414574 [searcherExecutor-4-thread-1] INFO org.apache.solr.core.SolrCore ? [collection1] Registered new searcher Searcher@745568d2 main{StandardDirectoryReader(segments_f:43:nrt _1(4.6):C1 _6(4.6):C1 _7(4.6):C1 8(4.6):C1 9(4.6):C1 b(4.6):C1 d(4.6):C1)}
414576 [qtp112712824-15] INFO org.apache.solr.update.processor.LogUpdateProcessor ? [collection1] webapp=/solr path=/update params={waitSearcher=true&commit=true&wt=javabin&version=2&softCommit=false} {commit=} 0 460
414831 [qtp112712824-11] INFO org.apache.solr.core.SolrCore ? UnInverted multi-valued field {field=affiliation_ss,memSize=4426,tindexSize=43,time=1,phase1=1,nTerms=1,bigTerms=1,termInstances=0,uses=0}
414833 [qtp112712824-11] INFO org.apache.solr.core.SolrCore ? UnInverted multi-valued field {field=keyword_ss,memSize=4288,tindexSize=0,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=0}
414834 [qtp112712824-11] INFO org.apache.solr.core.SolrCore ? UnInverted multi-valued field {field=authorName_ss,memSize=4288,tindexSize=0,time=0,phase1=0,nTerms=0,bigTerms=0,termInstances=0,uses=0}
414838 [qtp112712824-11] INFO org.apache.solr.core.SolrCore ? [collection1] webapp=/solr path=/spell params={facet=true&sort=release_or_create_date_dt+desc&hl.fl=description&wt=javabin&hl=true&version=2&rows=10&hl.snippets=1&facet.query=
&start=0&q=
&facet.field=hostdataverse_s&facet.field=affiliation_ss&facet.field=releasedate&facet.field=keyword_ss&facet.field=authorName_ss&facet.field=filetype_s&facet.field=dvtype&qt=/spell&fq=subtree_ss:"/index3"&fq=dvtype:(dataverses+OR+datasets)&fq=({!join+from%3Dgroups_s+to%3Dperms_ss}id:group_public+OR+{!join+from%3Dgroups_s+to%3Dperms_ss}id:group_user1)} hits=0 status=0 QTime=69
414881 [qtp112712824-16] INFO org.apache.solr.core.SolrCore ? [collection1] webapp=/solr path=/spell params={facet=true&sort=release_or_create_date_dt+desc&hl.fl=description&wt=javabin&hl=true&version=2&rows=10&hl.snippets=1&facet.query=
&start=0&q=
&facet.field=hostdataverse_s&facet.field=affiliation_ss&facet.field=releasedate&facet.field=keyword_ss&facet.field=authorName_ss&facet.field=filetype_s&facet.field=dvtype&qt=/spell&fq=subtree_ss:"/index3"&fq=dvtype:(dataverses+OR+datasets+OR+files)&fq=({!join+from%3Dgroups_s+to%3Dperms_ss}id:group_public+OR+{!join+from%3Dgroups_s+to%3Dperms_ss}id:group_user1)} hits=0 status=0 QTime=17

Here are the notifications:
Your Dataverse, index3 Dataverse, has been created in the Root Dataverse. Remember to release it. 2014-04-08 18:51:18.161 Unread Remove
Your Dataverse, test index two Dataverse, has been created in the Root Dataverse. Remember to release it. 2014-04-08 18:46:43.397 Read Remove
Your Dataverse, test index Dataverse, has been created in the Root Dataverse. Remember to release it. 2014-04-08 18:45:18.425 Read Remove
Your Dataverse, Root Dataverse, has been created in the Dataverse. Remember to release it. 2014-04-08 18:34:11.78 Read Remove

However login as pete http://dataverse.cloudapp.net/ I see no datavereses other than root. Also I see /usr/local/solr-4.6.0/example/solr/collection1/data/index dir is being repopulated.

I haven't run index all yet.

@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-08T19:31:07Z


Elda Sotiri wrote:

Indexing is still not running automatically for the first time . I have a new db, new build, deleted all in solr /data/index dir.

I'm concerned about what this means... I have never deleted files on disk from Solr... If I want to clear out the index, I do over HTTP with a curl command like this:

curl http://localhost:8983/solr/update/json?commit=true -H 'Content-type: application/json' -X POST -d '{"delete": { "query":":" }}'

You could also stop Solr and delete the entire Solr directory to clear it out. Then unpack the tarball to start with a fresh installation.

However login as pete http://dataverse.cloudapp.net/ I see no datavereses other than root. Also I see /usr/local/solr-4.6.0/example/solr/collection1/data/index dir is being repopulated.

A shell account on this server would help me troubleshoot. Can you please provide me the output of this command:

curl 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*'

Thanks. Passing this ticket back to you, Elda.

@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Elda Sotiri (@esotiri)
Original Date: 2014-04-08T23:27:59Z


sorry, just saw this..

[root@dataverse ~]# curl 'http://localhost:8983/solr/collection1/select?rows=100&wt=json&indent=true&q=*'
{
  "responseHeader":{
    "status":0,
    "QTime":14,
    "params":{
      "indent":"true",
      "q":"*",
      "wt":"json",
      "rows":"100"}},
  "response":{"numFound":7,"start":0,"docs":[
      {
        "id":"dataverse_9",
        "entityid":9,
        "dvtype":"dataverses",
        "name":"test index",
        "name_sort":"test index",
        "release_or_create_date_dt":"2014-04-08T18:45:18.425Z",
        "perms_ss":["group_user1",
          "group_user5",
          "group_2"],
        "hostdataverse_s":"Root",
        "description":"test auto index",
        "affiliation_ss":["Top"],
        "parentid":"1",
        "parentname":"Root",
        "_version_":1464842551761567744},
      {
        "dvtype":"groups",
        "id":"group_1",
        "entityid":1,
        "name_sort":"group_public",
        "groups_s":"group_1",
        "_version_":1464842608414031872},
      {
        "dvtype":"groups",
        "id":"group_2",
        "entityid":2,
        "name_sort":"nsa",
        "groups_s":"group_2",
        "_version_":1464842608860725248},
      {
        "dvtype":"groups",
        "id":"group_3",
        "entityid":3,
        "name_sort":"flappybird",
        "groups_s":"group_3",
        "_version_":1464842609423810560},
      {
        "dvtype":"groups",
        "id":"group_4",
        "entityid":4,
        "name_sort":"2048",
        "groups_s":"group_4",
        "_version_":1464842609948098560},
      {
        "id":"dataverse_10",
        "entityid":10,
        "dvtype":"dataverses",
        "name":"test index two",
        "name_sort":"test index two",
        "release_or_create_date_dt":"2014-04-08T18:46:43.397Z",
        "perms_ss":["group_user1",
          "group_user5",
          "group_2"],
        "hostdataverse_s":"Root",
        "description":"test index dataverse two",
        "affiliation_ss":["Top"],
        "parentid":"1",
        "parentname":"Root",
        "_version_":1464842640734289920},
      {
        "id":"dataverse_11",
        "entityid":11,
        "dvtype":"dataverses",
        "name":"index3",
        "name_sort":"index3",
        "release_or_create_date_dt":"2014-04-08T18:51:18.161Z",
        "perms_ss":["group_user1",
          "group_user5",
          "group_2"],
        "hostdataverse_s":"Root",
        "description":"testing index 3",
        "affiliation_ss":["Top"],
        "parentid":"1",
        "parentname":"Root",
        "_version_":1464842928676405248}]
  }}
[root@dataverse ~]#

@eaquigley eaquigley added this to the Dataverse 4.0: Beta 1 milestone Jul 9, 2014
@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Elda Sotiri (@esotiri)
Original Date: 2014-04-08T23:44:52Z


phil, created an admin acct for you.. send info already.

@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-09T13:35:57Z


Elda Sotiri wrote:

phil, created an admin acct for you.. send info already.

Thanks, I logged in so I could look at server.log. The problem is that "User Private Groups" such as "group_user1" for pete or "group_user2" for uma do not exist. There seem to have been two setup attempts on 2014-04-08, one at ~12:30 and one at ~18:30. Notice how during the more recent setup attempt the User Private Groups could not be created because Solr was down ("Server refused connection at: http://localhost:8983/solr"). As long as Solr is running during setup, everything should be fine. Please note that there is a related ticket at #3643 regarding "manage index failures for later correction" but it hasn't been worked on yet. For now, as long as Solr is up, it should work. Another way you can test a new environment is with vagrant up from the root of the git repo.

Here are the logs:

[pdurbin@dataverse ~]$ grep 'during user save' -B1 /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log
[2014-04-08T12:31:39.620+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=19 _ThreadName=http-listener-1(2)] [timeMillis: 1396960299620] [levelValue: 800] [[
  during user save, indexing result was: indexed user 1:pete]]
--
[2014-04-08T12:31:40.195+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=18 _ThreadName=http-listener-1(1)] [timeMillis: 1396960300195] [levelValue: 800] [[
  during user save, indexing result was: indexed user 2:uma]]
--
[2014-04-08T12:31:40.776+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=21 _ThreadName=http-listener-1(4)] [timeMillis: 1396960300776] [levelValue: 800] [[
  during user save, indexing result was: indexed user 3:gabbi]]
--
[2014-04-08T12:31:41.807+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=22 _ThreadName=http-listener-1(5)] [timeMillis: 1396960301807] [levelValue: 800] [[
  during user save, indexing result was: indexed user 4:__GUEST__]]
--
[2014-04-08T12:31:42.265+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=20 _ThreadName=http-listener-1(3)] [timeMillis: 1396960302265] [levelValue: 800] [[
  during user save, indexing result was: indexed user 5:cathy]]
--
[2014-04-08T12:31:42.738+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=21 _ThreadName=http-listener-1(4)] [timeMillis: 1396960302738] [levelValue: 800] [[
  during user save, indexing result was: indexed user 6:nick]]
--
[2014-04-08T18:34:03.940+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=20 _ThreadName=http-listener-1(3)] [timeMillis: 1396982043940] [levelValue: 800] [[
  during user save, indexing result was: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8983/solr]]
--
[2014-04-08T18:34:04.005+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=19 _ThreadName=http-listener-1(2)] [timeMillis: 1396982044005] [levelValue: 800] [[
  during user save, indexing result was: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8983/solr]]
--
[2014-04-08T18:34:04.051+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=22 _ThreadName=http-listener-1(5)] [timeMillis: 1396982044051] [levelValue: 800] [[
  during user save, indexing result was: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8983/solr]]
--
[2014-04-08T18:34:04.091+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=21 _ThreadName=http-listener-1(4)] [timeMillis: 1396982044091] [levelValue: 800] [[
  during user save, indexing result was: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8983/solr]]
--
[2014-04-08T18:34:04.162+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=18 _ThreadName=http-listener-1(1)] [timeMillis: 1396982044162] [levelValue: 800] [[
  during user save, indexing result was: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8983/solr]]
--
[2014-04-08T18:34:04.205+0000] [glassfish 4.0] [INFO] [] [edu.harvard.iq.dataverse.DataverseUserServiceBean] [tid: _ThreadID=20 _ThreadName=http-listener-1(3)] [timeMillis: 1396982044205] [levelValue: 800] [[
  during user save, indexing result was: org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://localhost:8983/solr]]
[pdurbin@dataverse ~]$

@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Elda Sotiri (@esotiri)
Original Date: 2014-04-09T14:11:57Z


Phil updated the guide that solr should be running when setting up or rebuilding the environment, and there is no need to run indexall. Thanks Phil!

@eaquigley
Copy link
Contributor Author


Original Redmine Comment
Author Name: Philip Durbin (@pdurbin)
Original Date: 2014-04-09T14:13:20Z


Elda and I updated the Dev guide to say "Solr must be running with this custom schema in place during setup" in the very first paragraph.

We also talked about how editing objects such as users or dataverses that have never been indexed (due to Solr being down during setup, for example) will allow them to be indexed for the first time. This is because the indexing code is tied to "save" methods for users, dataverses, etc.

I also removed the "index all" step from the "Rebuilding your dev environment" section since it's no longer necessary but added an earlier step of clearing out your Solr index (with a curl command) as you start to rebuild (i.e. right after you drop your database).

This commit to the Dev guide:

738a63c Solr must be up during setup, how to clear Solr data #3800

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Bug a defect
Projects
None yet
Development

No branches or pull requests

2 participants