This repository has been archived by the owner on Dec 19, 2021. It is now read-only.

solr 5 restart fails on centos 7 (centos branch) #147

Closed
acozine opened this issue Jul 14, 2016 · 5 comments

Comments

acozine commented Jul 14, 2016

The deploy role's restart of Solr 5 fails on CentOS 7 with the error:

TASK [deploy : restart solr 5.x] ***********************************************
fatal: [23.23.8.207]: FAILED! => {"changed": false, "failed": true, "msg": "Job for solr.service failed because the control process exited with error code. See \"systemctl status solr.service\" and \"journalctl -xe\" for details.\n"}

If I ssh into the machine, I see:

[centos@ip-10-0-0-182 ~]$ sudo service solr status
Found 1 Solr nodes: 
Solr process 4924 running on port 8983
{
  "solr_home":"/var/solr/data",
  "version":"5.5.0 2a228b3920a07f930f7afb6a42d0d20e184a943c - mike - 2016-02-16 15:22:52",
  "startTime":"2016-07-14T15:19:19.171Z",
  "uptime":"0 days, 0 hours, 45 minutes, 1 seconds",
  "memory":"48.5 MB (%9.9) of 490.7 MB"}

[centos@ip-10-0-0-182 ~]$ sudo systemctl status solr
● solr.service - LSB: Controls Apache Solr as a Service
   Loaded: loaded (/etc/rc.d/init.d/solr)
   Active: failed (Result: exit-code) since Thu 2016-07-14 11:59:51 EDT; 4min 37s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 27722 ExecStart=/etc/rc.d/init.d/solr start (code=exited, status=1/FAILURE)

Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: Starting LSB: Controls Apache Solr as a Service...
Jul 14 11:59:51 ip-10-0-0-182 su[27724]: (to solr) root on none
Jul 14 11:59:51 ip-10-0-0-182 solr[27722]: Port 8983 is already being used by another process (pid: 4924)
Jul 14 11:59:51 ip-10-0-0-182 solr[27722]: Please choose a different port using the -p option.
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: solr.service: control process exited, code=exited status=1
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: Failed to start LSB: Controls Apache Solr as a Service.
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: Unit solr.service entered failed state.
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: solr.service failed.

If I then restart Solr from the command line with service, the restart succeeds:

[centos@ip-10-0-0-182 ~]$ sudo service solr restart
Sending stop command to Solr running on port 8983 ... waiting 5 seconds to allow Jetty process 4924 to stop gracefully.
Waiting up to 30 seconds to see Solr running on port 8983 [/]  
Started Solr server on port 8983 (pid=28848). Happy searching!

and ps shows the new PID:

[centos@ip-10-0-0-182 ~]$ ps aux | grep 8983
solr     28848 28.6  1.5 4132516 234680 ?      Sl   12:05   0:06 java -server -Xms512m -Xmx512m -XX:NewRatio=3 -XX:SurvivorRatio=4 -XX:TargetSurvivorRatio=90 -XX:MaxTenuringThreshold=8 -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 -XX:+CMSScavengeBeforeRemark -XX:PretenureSizeThreshold=64m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=50 -XX:CMSMaxAbortablePrecleanTime=6000 -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -verbose:gc -XX:+PrintHeapAtGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/solr/logs/solr_gc.log -Djetty.port=8983 -DSTOP.PORT=7983 -DSTOP.KEY=solrrocks -Duser.timezone=UTC -Djetty.home=/opt/solr/server -Dsolr.solr.home=/var/solr/data -Dsolr.install.dir=/opt/solr -Dlog4j.configuration=file:/var/solr/log4j.properties -Xss256k -jar start.jar -XX:OnOutOfMemoryError=/opt/solr/bin/oom_solr.sh 8983 /var/solr/logs --module=http

but systemctl still shows the old PID and the same error message (with an update to how long ago it happened):

[centos@ip-10-0-0-182 ~]$ sudo systemctl status solr
● solr.service - LSB: Controls Apache Solr as a Service
   Loaded: loaded (/etc/rc.d/init.d/solr)
   Active: failed (Result: exit-code) since Thu 2016-07-14 11:59:51 EDT; 5min ago
     Docs: man:systemd-sysv-generator(8)
  Process: 27722 ExecStart=/etc/rc.d/init.d/solr start (code=exited, status=1/FAILURE)

Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: Starting LSB: Controls Apache Solr as a Service...
Jul 14 11:59:51 ip-10-0-0-182 su[27724]: (to solr) root on none
Jul 14 11:59:51 ip-10-0-0-182 solr[27722]: Port 8983 is already being used by another process (pid: 4924)
Jul 14 11:59:51 ip-10-0-0-182 solr[27722]: Please choose a different port using the -p option.
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: solr.service: control process exited, code=exited status=1
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: Failed to start LSB: Controls Apache Solr as a Service.
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: Unit solr.service entered failed state.
Jul 14 11:59:51 ip-10-0-0-182 systemd[1]: solr.service failed.

I can manually fix the box at the command line by stopping Solr with sudo service solr stop; after that, restarting seems to work with either command (systemctl or service).
A hacky Ansible fix would be to add a CentOS-only shell task that stops and then starts Solr instead of restarting it, but I'd rather understand the problem better and fix it properly if possible.
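
For illustration, here is a minimal sketch of what such a stop/start shell task could run. It is not what the deploy role does; the function name solr_stop_start and the SERVICE_CMD variable are hypothetical, introduced only so the wrapper can be swapped or stubbed. It assumes a service-style argument order (name first, then action), as in sudo service solr stop above.

```shell
#!/bin/sh
# Hypothetical helper: stop Solr and then start it, rather than issuing a
# single "restart". SERVICE_CMD is an assumption of this sketch, not part of
# the deploy role; it must accept "<name> <action>" like the service wrapper.
SERVICE_CMD="${SERVICE_CMD:-service}"

solr_stop_start() {
  name="${1:-solr}"
  # A failing stop is ignored on purpose: Solr may already be down.
  "$SERVICE_CMD" "$name" stop || true
  # The exit status of the start step is what the caller sees.
  "$SERVICE_CMD" "$name" start
}
```

Run as root (for example from a shell task with become enabled), solr_stop_start solr mirrors the manual sequence that worked on the box: an explicit stop followed by a start.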

acozine commented Jul 15, 2016

The solr.service definition is in /run/systemd/generator.late/solr.service. The ExecStart and ExecStop commands point to /etc/rc.d/init.d/solr, which contains:

if [ -n "$RUNAS" ]; then
  su -c "SOLR_INCLUDE=\"$SOLR_ENV\" \"$SOLR_INSTALL_DIR/bin/solr\" $SOLR_CMD" - "$RUNAS"
else
  SOLR_INCLUDE="$SOLR_ENV" "$SOLR_INSTALL_DIR/bin/solr" "$SOLR_CMD"
fi

acozine commented Jul 15, 2016

Gist of my shell session at https://gist.github.com/acozine/d82b19e0bc53a84a056cc31142a742bc; related issue at geerlingguy/drupal-vm#789

@geerlingguy

@acozine - See my annoying, but working, fix here: geerlingguy/ansible-role-solr@71d5d56

@acozine acozine mentioned this issue Aug 2, 2016

acozine commented Aug 2, 2016

@geerlingguy - Thanks! I've adopted your approach. This bug was the last blocker on merging in a lot of work, so I really appreciate your workaround, and I agree that the problem is an annoying one.

@acozine acozine closed this as completed Nov 8, 2016

acozine commented Nov 8, 2016

Fixed by 117bc76
