Plug-in caches and jira home lock file issue #89

Closed
bordenit opened this issue Mar 20, 2021 · 7 comments

Comments

@bordenit
Contributor

Recommend adding an option, with a default init container, to clear the plug-in caches and lock file before the Jira container starts. In the 267 iterations I tried, containers did not start reliably unless the caches and lock file were cleared first. I built the clearing of these files into a self-healing script for now, but the deployment should proactively clear those files before it starts.

@bordenit
Contributor Author

https://confluence.atlassian.com/jirakb/troubleshoot-a-jira-server-startup-failed-error-394464512.html

"JIRA applications were shut down incorrectly, or failed to shut down" is one of the reasons the lock file gets created. Pods may start and stop during rolling patching of servers, and the deployment should handle this on its own without blocking its own startup.

@bianchi2
Collaborator

@bordenit thanks for the suggestion. Could you share the script and elaborate on how exactly the Jira pods occasionally refused to start (stack traces)? As far as I understand, you had corrupt caches from time to time, and it would be great to have an automated way to flush them. Please correct me if I am wrong :)

@bordenit
Contributor Author

bordenit commented Mar 21, 2021

I can include a stack trace the next time it happens, but it's basically this error:

https://community.atlassian.com/t5/Jira-questions/jira-startup-failed/qaq-p/1215226

So, I delete the .jira-home.lock file and flush the plugin caches in .bundled-plugins and .osgi-plugins in both the shared-home and home directories (as far as they can be deleted while the pods have the files in use). Typically, the app comes back up after those steps. However, because the home directories are mounted from the PVC and the app is using the files, you can't fully delete the plugin caches when you exec into the pod. I think that's a second reason for an init container: the caches can be flushed fully in the PVCs before the Jira pods start. In a non-Kubernetes environment, I think the remediation is to fully shut down Jira and then delete those caches, so a Kubernetes deployment should ideally be able to do something similar, hopefully proactively rather than reactively.
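
The cleanup described in this comment could be sketched in Python roughly as follows. This is a minimal sketch, not the script referred to in the thread: the function name is hypothetical, the "plugins" parent directory is an assumption about the Jira home layout, and the cache directory is spelled .osgi-plugins on the assumption that this is the name Jira uses on disk. It is meant to run while Jira is fully stopped.

```python
import shutil
from pathlib import Path

# Names taken from the comment above. The "plugins" parent directory is an
# assumption about the Jira home layout, not something stated in the thread.
LOCK_FILE = ".jira-home.lock"
CACHE_DIRS = (".bundled-plugins", ".osgi-plugins")


def clear_startup_state(home: Path) -> list[Path]:
    """Delete the lock file and plugin caches under one Jira home directory.

    Returns the paths that were actually removed, so a caller can log them.
    Missing files are skipped, which makes repeated runs harmless.
    """
    removed = []
    lock = home / LOCK_FILE
    if lock.exists():
        lock.unlink()
        removed.append(lock)
    for name in CACHE_DIRS:
        cache = home / "plugins" / name
        if cache.is_dir():
            shutil.rmtree(cache)
            removed.append(cache)
    return removed
```

It would be called once per directory, for example `clear_startup_state(Path("/path/to/home"))` and again for the shared home; both paths depend on how the volumes are mounted.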

@bianchi2
Collaborator

Thanks @bordenit, indeed a valid issue. I am not sure an init container is the best solution though: it will flush the caches every time the pod restarts, until the pod spec is updated to remove the init container. What you can do now is define your init container in values.yaml, and we'll pick it up and add it to the pod spec.
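
As an illustration of what such an init container might look like, the sketch below builds a Kubernetes container spec as a Python dict and prints it as JSON, which is also valid YAML. Everything specific here is an assumption rather than something from the thread: the busybox image, the volume name `local-home`, the mount path, and the "plugins" subdirectory. The values.yaml key under which the chart accepts init containers is not named in this thread, so it is left out.

```python
import json

# Assumed mount path for the Jira home volume; adjust to the real deployment.
JIRA_HOME = "/var/atlassian/application-data/jira"


def cache_flush_init_container(volume_name: str, home: str = JIRA_HOME) -> dict:
    """Build an init container spec that removes the lock file and plugin caches."""
    script = (
        f"rm -f {home}/.jira-home.lock && "
        f"rm -rf {home}/plugins/.bundled-plugins {home}/plugins/.osgi-plugins"
    )
    return {
        "name": "flush-plugin-caches",  # hypothetical container name
        "image": "busybox:1.36",        # assumed image; any image with sh and rm works
        "command": ["sh", "-c", script],
        "volumeMounts": [{"name": volume_name, "mountPath": home}],
    }


# "local-home" is a hypothetical volume name; it must match a volume in the pod spec.
print(json.dumps([cache_flush_init_container("local-home")], indent=2))
```

As noted in the comment above, a container like this runs on every pod start, so the caches are rebuilt each time.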

@bordenit
Contributor Author

I believe this is a resource issue where Kubernetes itself does not load balance properly. If the pods or plugins fail to start, we force the pod to start on a different node to resolve it. Plug-in startup is very resource intensive and pushes CPU on the servers well over 100%. Hopefully, plug-in startup can be made less resource intensive, but you can close this for now, as we have a workaround (even if not a very good one). Thanks.

@jesseborden

jesseborden commented May 2, 2021

Kubernetes was upgraded to v1.19.9 and we see the same issues: a 100% failure rate on the first startup attempt. The workaround is to flush the cache, remove the lock file, delete the pod, wait, and repeat. We disabled the McAfee on-access scanner, increased JVM memory, and raised the startup timeout in CATALINA_OPTS. A ticket is open with Atlassian support, but it's looking like moving to a VM or EC2 instance might be the best idea. We are tracking one more lead on plugin status in the database. We have 4 environments that all fail hard at plug-in startup and require manual effort.
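
The manual workaround loop described here (flush, remove lock, delete pod, wait, repeat) could be automated along these lines. This is only a sketch under stated assumptions: the three callables are hypothetical hooks that the operator would supply (for example, wrappers around whatever cleanup and pod-deletion commands are already in use), and the attempt limit and wait time are arbitrary defaults.

```python
import time
from typing import Callable


def restart_until_healthy(
    clean: Callable[[], None],       # hypothetical hook: flush caches, remove lock file
    delete_pod: Callable[[], None],  # hypothetical hook: delete the pod so it is recreated
    is_healthy: Callable[[], bool],  # hypothetical hook: True once Jira has started
    max_attempts: int = 5,
    wait_seconds: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Repeat clean -> delete pod -> wait until Jira reports healthy.

    Returns the number of attempts used, or raises RuntimeError if Jira is
    still unhealthy after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        clean()
        delete_pod()
        sleep(wait_seconds)
        if is_healthy():
            return attempt
    raise RuntimeError(f"Jira still unhealthy after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real waits.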

@bordenit
Contributor Author

bordenit commented Jun 2, 2021

This is the fix: https://confluence.atlassian.com/jirakb/jira-startup-fails-with-message-that-required-plugins-are-not-started-254738702.html. I did that and restarted the pods 10 times and didn't experience the issues again. This can be closed.

@bordenit closed this as completed Jun 2, 2021
3 participants