Stopping liberty Windows service immediately after starting results in hang condition #23392

Closed
jimblye opened this issue Nov 16, 2022 · 2 comments · Fixed by #23727
Labels: release bug (This bug is present in a released version of Open Liberty), release:23001

Comments

jimblye (Member) commented Nov 16, 2022

Describe the bug
This applies when an Open Liberty server is registered as a Windows service, so it is Windows-only. If you stop the service immediately after starting it, the stop operation hangs.

Steps to Reproduce

  • Method 1
    Register a Liberty server as a Windows service.
    Start the service using the Windows Services panel.
    As soon as the service appears to be running, stop the service.
    The Windows Services panel hangs for a few minutes and eventually displays a popup window.
    The service state remains stuck in "Stopping".
    To stop the service at this point, you need to kill the server process.

  • Method 2
    Register a Liberty server as a Windows service.
    Enter the following commands one right after the other:
    server startWinService
    server stopWinService
    If you enter the second command quickly enough, it hangs.

Expected behavior
You should be able to start the Windows service and immediately stop it.

Diagnostic information:

  • Open Liberty Version: [ * - 22.0.0.10 ], where * is the first version that allows running Liberty as a Windows service
  • Affected feature(s) - no specific feature.
  • Java Version: [ all versions ]
  • server.xml configuration: does not matter

Additional context
The Apache Commons Daemon program, prunsrv, is used to register, start, stop, and unregister a Liberty server as a Windows service. When you use the Windows Services app to start the service, it invokes prunsrv, which in turn invokes server.bat. The problem seems to be that prunsrv does not wait for server.bat to finish, so it reports to Windows that the service is up before the server has finished starting.

When the server finishes starting, it creates a serverDir\workarea\.sCommand file, which contains the TCP/IP port needed to send commands to the server. If you try to stop the server before that file exists, the stop command hangs for a minute or two and ultimately fails because it cannot find the .sCommand file.

An issue has been opened against prunsrv because it does not wait for the command to finish:
https://issues.apache.org/jira/browse/DAEMON-449

This is also fixable in Liberty code. The stop command first checks whether the server is running. If it is not, the stop command returns immediately. If it is running, the stop command then checks for the existence of .sCommand. If the file doesn't exist, either there is a problem or, more likely, the server is still starting. The problem can be solved by polling for the existence of .sCommand. The timeout should probably match the stop timeout, which defaults to 30 seconds.
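A minimal sketch of the polling idea, for illustration only. The class name `CommandFileWaiter`, the method `waitForFile`, and the 100 ms poll interval are hypothetical and are not taken from the Liberty code base. The 30-second timeout mirrors the default stop timeout mentioned above, and the `main` method only simulates a server that creates `.sCommand` shortly after startup.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;

/** Hypothetical helper for illustration; not part of the Open Liberty code base. */
final class CommandFileWaiter {

    private CommandFileWaiter() {}

    /**
     * Polls until the given file exists or the timeout elapses.
     *
     * @return true if the file was seen before the deadline, false on timeout
     */
    static boolean waitForFile(Path file, Duration timeout, Duration pollInterval)
            throws InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (Files.exists(file)) {
                return true;
            }
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                return false;
            }
            // Sleep for the poll interval, but never past the deadline.
            long sleepMillis = Math.min(pollInterval.toMillis(), remainingNanos / 1_000_000L);
            Thread.sleep(Math.max(1L, sleepMillis));
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for serverDir\workarea; a temp directory keeps the demo self-contained.
        Path workarea = Files.createTempDirectory("workarea");
        Path commandFile = workarea.resolve(".sCommand");

        // Simulate a server that finishes starting after about 500 ms.
        Thread starter = new Thread(() -> {
            try {
                Thread.sleep(500);
                Files.createFile(commandFile);
            } catch (IOException | InterruptedException e) {
                throw new IllegalStateException(e);
            }
        });
        starter.start();

        // Assumed values: 30 s matches the default stop timeout; 100 ms is arbitrary.
        boolean ready = waitForFile(commandFile, Duration.ofSeconds(30), Duration.ofMillis(100));
        System.out.println(ready
                ? "command file found; stop can proceed"
                : "timed out waiting for command file");
        starter.join();
    }
}
```

With something along these lines, the stop path would only report a failure after the timeout expires, rather than failing as soon as `.sCommand` is found to be missing while the server is still starting.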

jimblye added the "release bug" label (This bug is present in a released version of Open Liberty) on Nov 16, 2022
jimblye self-assigned this on Nov 16, 2022
jimblye (Member, Author) commented Dec 19, 2022

#build

jimblye (Member, Author) commented Dec 19, 2022

#libby
