httpd pod fails to start #18

Closed
ilackarms opened this issue Sep 10, 2017 · 15 comments · Fixed by #20

@ilackarms
Contributor

My httpd pod is failing (it never starts). Contents of journalctl -xe inside the httpd container:

Sep 10 12:52:58 httpd-1-x08c9 systemd[1]: Failed to load environment files: No such file or directory
Sep 10 12:52:58 httpd-1-x08c9 systemd[1]: httpd.service failed to run 'start' task: No such file or directory
Sep 10 12:52:58 httpd-1-x08c9 systemd[1]: Failed to start The Apache HTTP Server.
-- Subject: Unit httpd.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit httpd.service has failed.
--
-- The result is failed.
Sep 10 12:52:58 httpd-1-x08c9 systemd[1]: Unit httpd.service entered failed state.
Sep 10 12:52:58 httpd-1-x08c9 systemd[1]: httpd.service failed.

Note: this is a duplicate of ManageIQ/manageiq-pods#215. However, because I'm unsure whether the issue lies in the container itself or in its configuration via the miq-template, I've filed it in both places.

@abellotti
Member

Not sure if it's related, but last night I hit some odd timing issues that caused httpd to fail to come up while I was testing SAML. Here's the fix for it: #17. Might help.

@ilackarms
Contributor Author

Yes, I figured out why this is happening. The missing file is /etc/container-environment. This file is created by /usr/bin/save-container-environment, which is invoked as a postStart lifecycle event on the pod. The httpd service will crash if it starts before /usr/bin/save-container-environment has run.

However, there is no guarantee that postStart commands will execute before the container entrypoint.

@abellotti
Member

One option (maybe ugly), in addition to #17: since we now have a directive for httpd to start after initialize-httpd-auth, we can have that service wait for the /etc/container-environment file to be created.

@fbladilo
Contributor

fbladilo commented Sep 11, 2017

@abellotti If timing issues are encountered, another alternative is to use ExecStartPre with a "sleep" call of a few seconds in the service unit file (https://www.freedesktop.org/software/systemd/man/systemd.service.html). This should give the post hook a fair chance to finish its work; we have used this before in other unit files.

@abellotti
Member

@fbladilo A sleep seems nondeterministic unless we have a guaranteed sleep time. What about a script that waits for the file to be created, i.e. something like ExecStartPre=/usr/bin/waitfor-container-environment? We can add that to any service unit requiring /etc/container-environment.
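A minimal sketch of such a waiting helper, assuming it is a bash script installed at the /usr/bin/waitfor-container-environment path proposed above. The function name wait_for_file and the default attempt count and interval are hypothetical, not taken from the manageiq-pods images. Instead of sleeping for a fixed time, it polls for the file a bounded number of times and exits non-zero if the file never appears, so systemd would report the unit as failed rather than start httpd without its environment.

```bash
#!/bin/bash
# Hypothetical helper for ExecStartPre=/usr/bin/waitfor-container-environment.
# Polls for a file up to ATTEMPTS times, INTERVAL seconds apart.
# Defaults (30 attempts, 1 second) are illustrative assumptions only.

# Usage: wait_for_file PATH [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_file() {
  local path="$1"
  local attempts="${2:-30}"
  local interval="${3:-1}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Same test ConditionPathExists= performs, but repeated instead of once.
    if [ -e "$path" ]; then
      return 0
    fi
    # Skip the pointless sleep after the final failed check.
    if ((i < attempts)); then
      sleep "$interval"
    fi
  done
  echo "timed out after ${attempts} attempts waiting for ${path}" >&2
  return 1
}
```

As an installed script, it would end with a call such as wait_for_file /etc/container-environment 30 1. One caveat to verify: if the unit also loads /etc/container-environment through a non-optional EnvironmentFile= (the "Failed to load environment files" line in the log above suggests something similar), systemd may refuse to spawn the ExecStartPre command as well; prefixing that path with "-" makes it optional.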

@ilackarms
Contributor Author

@abellotti AFAICT #17 does not fix this issue; it simply returns a more useful error when the pod starts:

[root@ocp-master01 ~]# oc rsh httpd-1-jn4xh
sh-4.2# systemctl status httpd
● httpd.service - The Apache HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/httpd.service.d
           └─environment.conf
   Active: inactive (dead)
Condition: start condition failed at Tue 2017-09-12 14:53:24 UTC; 7s ago
           ConditionPathExists=/etc/container-environment was not met
     Docs: man:httpd(8)
           man:apachectl(8)

@fbladilo
Contributor

@ilackarms I'm able to deploy httpd after #17 without issues; I'll try a couple more runs during the day...

@ilackarms
Contributor Author

@fbladilo See if you can reproduce my error. I don't believe #17 fixes this problem: ConditionPathExists does not wait for the file to exist. It's not a while, it's an if; if the file doesn't exist, the service simply won't start (and won't be restarted).
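The "if versus while" distinction can be illustrated outside systemd with a small bash simulation. This is only an analogy, not systemd code: a background subshell stands in for the postStart hook and creates the file after about one second, a single [ -e ] test stands in for ConditionPathExists=, and a bounded polling loop stands in for a waiting ExecStartPre. The temporary path and the one-second delay are arbitrary choices for the demo.

```bash
#!/bin/bash
# Simulation of the race: the "hook" creates the file late, while the
# checks run immediately, as when systemd starts before postStart finishes.
tmpdir="$(mktemp -d)"
envfile="$tmpdir/container-environment"

# Stand-in for the postStart hook: creates the file after ~1 second.
( sleep 1; touch "$envfile" ) &
hook_pid=$!

# One-shot check, evaluated once at start time (like ConditionPathExists=).
if [ -e "$envfile" ]; then
  condition_result="met"
else
  condition_result="not met"
fi

# Polling check (like a waiting ExecStartPre), bounded to ~5 seconds.
poll_result="timed out"
for ((i = 0; i < 50; i++)); do
  if [ -e "$envfile" ]; then
    poll_result="found"
    break
  fi
  sleep 0.1
done

wait "$hook_pid"
rm -rf "$tmpdir"

echo "one-shot check: $condition_result"
echo "polling check:  $poll_result"
```

Because the one-shot test runs before the simulated hook has finished, it reports "not met", while the polling loop reports "found" once the file appears.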

@fbladilo
Contributor

@ilackarms I've tried a few times already but cannot reproduce it after #17. I haven't dug too deeply yet, but it seems gone here. I want to be able to reproduce it, so I might clean all images and projects on the cluster and give it another fresh run with the latest possible images.

@fbladilo
Contributor

@ilackarms A few more attempts this morning using the latest images; no luck reproducing.

[root@franco-ocp-master manageiq-pods (master)]# oc version
oc v3.5.5.5
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://franco-ocp-master.e2e.bos.redhat.com:8443
openshift v3.5.5.5
kubernetes v1.5.2+43a9be4
[root@franco-ocp-master manageiq-pods (master)]# oc get pods
NAME                 READY     STATUS    RESTARTS   AGE
httpd-1-fm0bz        1/1       Running   0          16m
manageiq-0           1/1       Running   1          16m
memcached-1-m01tq    1/1       Running   0          16m
postgresql-1-6z4hw   1/1       Running   0          16m

@ilackarms
Contributor Author

Weird! I really can't tell why it happens on almost every run for me (without the hotfix from my PR)...

@bazulay

bazulay commented Sep 13, 2017

@fbladilo @ilackarms @abellotti
I was able to reproduce this issue.
This is a built-in race between httpd coming up and the post job that produces the environment variable file.
So, as @ilackarms stated, I also do not think patch #17 solves the issue.
I would also try to avoid another unit file for a service that waits on the file (as suggested in patch #20). Instead I would go with an ExecStartPre in the httpd unit file, not using sleep but rather making several attempts and failing gracefully after a configurable number of attempts.

@ilackarms
Contributor Author

@bazulay If we go with ExecStartPre, we'll need to remove the ConditionPathExists line: that condition is checked before ExecStartPre runs, so the unit won't start at all if the file doesn't exist yet.

@bazulay

bazulay commented Sep 13, 2017

@ilackarms I wasn't referring to patch #17, but to the general approach.

@fbladilo
Contributor

fbladilo commented Sep 13, 2017

@bazulay We are going with a simpler approach using ExecStartPre, without path conditionals or additional units.
