
crunchy-postgres: unable to create /pgdata/$HOSTNAME when using PVC. CrashLoop. #23

Closed
colemanserious opened this issue Oct 4, 2016 · 25 comments

Comments

@colemanserious

Running on Kubernetes 1.3. Set up the pgdata volume as a PVC, which is being served by an NFS system. Note: if I use emptyDir for the volumes, there's no issue. (But of course I can then lose my data.)

Issue: I'm receiving "mkdir: cannot create directory '/pgdata/postgres-deployment-1217853053-9wk6b': Permission denied", which eventually causes a CrashLoopBackOff after enough retries. (Note: the container is being set up as part of a deployment with a replica count of 1.)

Have cross-checked examples/kube/kitchensink, as well as other kube examples. Not catching any relevant differences.

Looking at rhel7/9.5/Dockerfile.postgres.rhel7, it looks like the chown happens in the Dockerfile itself, which means (I think) it can't apply to directories created by persistent volume mounts. In that same Dockerfile I see commented-out setcap calls against chown, as if someone was tinkering with this. notes/notes-priv.txt and notes/todo2 also contain various wrestlings that look related to this issue...

@jmccormick2001
Contributor

Hi, to help me debug this: what are the permissions set to on your NFS volume path? Also, are you running with SELinux enabled?

@colemanserious
Author

Not currently running with SELinux enabled. And permissions are root:root 777 within a share. (Getting that info from someone who has direct access on the box; I'm currently just set up as a user within the Kubernetes cluster.)

@jmccormick2001
Contributor

One more question: is the postgres UID/GID set up on your kube box? These are normally set to UID=26 and GID=26.

@colemanserious
Author

Within the startup script, I see it output that it has uid=26, gid=26, etc. I take that to mean the container itself is comfortable with having a postgres user. We're going to do some experiments tomorrow to see about making the NFS server aware of the user directly.

@jmccormick2001
Contributor

Within the container it will know about the postgres UID/GID, because postgres is installed inside the container; but outside of the container, where kube/NFS live, the same UID/GID needs to be available.
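
For anyone following along, a minimal sketch of that host-side setup, assuming a RHEL/CentOS-style NFS server, the UID/GID 26 convention mentioned above, and /nfsfileshare as a placeholder export path:

    # On the NFS server (and any host that needs matching ownership):
    # create a postgres group and user with the same UID/GID the container uses.
    groupadd -g 26 postgres
    useradd -u 26 -g 26 -M -s /sbin/nologin postgres

    # Hand the export's backing directory to that UID/GID so the container's
    # postgres user (uid=26) can mkdir underneath it.
    chown 26:26 /nfsfileshare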

@jmccormick2001
Contributor

let me know how it goes tomorrow, but I think you are on the right track.

@colemanserious
Author

colemanserious commented Oct 5, 2016

Working to get someone to add the UID knowledge to the NFS server, and will post results once we have them. That said, this seems to be a problematic solution for a persistent volume claim environment: bad for static provisioning (pre-set), worse for dynamic provisioning. Much digging around Docker issues (docker issue #2259 seems relevant) and Google Groups hasn't cracked the code as yet, though.

@jmccormick2001
Contributor

I tried just now to run the examples/kube/master-nfs example without a postgres UID or GID defined, and it works on my system; it just sets the UID/GID on the NFS-created directory to 26/26. My NFS /etc/exports file looks like this:

    /nfsfileshare 192.168.0.102(rw,sync)

See if you can run that example and send me the log output to debug further. I'm using the latest postgres container, version 1.2.3 on Dockerhub.
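
As a side note for readers, a couple of standard commands to sanity-check the server side of that setup, using the export path from the /etc/exports line above:

    # Confirm the export is live and list its effective options.
    exportfs -v

    # Show numeric owner/group of the backing directory; the container's
    # postgres user is uid/gid 26, so 26:26 (or sufficiently open modes)
    # is what the startup script's mkdir needs.
    ls -ldn /nfsfileshare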

@colemanserious
Author

Using Postgres container version 1.2.11, getting the following. (Snippets below, since our corporate servers are disconnected from the Internet.)

    setting PGROOT to /usr/pgsql-9.5
    chown: cannot access '/pgdata/master-nfs': No such file or directory
    mkdir: cannot create directory '/pgdata/master-nfs': Permission denied
    chmod: cannot access '/pgdata/master-nfs': No such file or directory

    + source /opt/cpm/bin/setenv.sh
    ++ '[' -d /usr/pgsql-9.5 ']'
    ....
    ++ chown postgres /pgdata/master-nfs
    chown: cannot access '/pgdata/master-nfs': No such file or directory
    + rm /pgdata/master-nfs/postmaster.pid
    rm: cannot remove '/pgdata/master-nfs/postmaster.pid': No such file or directory
    + echo 'user id is...'
    + id
    user id is...
    uid=26(postgres) gid=26(postgres) groups=26(postgres)
    + ose_hack
    ++ id -u
    + export USER_ID=26
    + USER_ID=26

... there's more, but the root of the issue is that it can't create the master-nfs directory under /pgdata.

NOTE: I cannot currently execute the master-nfs-pv example: run.sh returns in its log "Error from server: error when creating "STDIN": the server does not allow access to the requested resource (post persistentvolumes)". That's expected behavior in my environment. I have confirmed that Kubernetes has bound a master-nfs-pvc claim in my namespace. I have also cross-checked the master-nfs-pv.json file for any special NFS configuration to pass along to our administrator team; none jumps out, anyway.

@jmccormick2001
Contributor

One more question to make sure I'm testing your same config: are you building the containers using RHEL as the base image, and running Kube on a RHEL box?

@colemanserious
Author

We've imported the crunchydata containers directly. Although we have to import them to make them available in our corporate environment, I believe they're a straight pass-through with no changes. Currently just using the crunchy-postgres image, though we intend to take advantage of backup, etc.

The Kube master (and the nodes) are running on CentOS, without SELinux enabled.

@jmccormick2001
Contributor

I just created a new VM, this one RHEL7. I built kube 1.3.8 on it by compiling the kube source code, installed NFS according to the instructions in the install.asciidoc section on NFS, and then ran the crunchy-postgres 1.2.3 container. With SELinux in permissive mode it seems to work; if I turn on enforcing, I get the mkdir permission error. Not sure how else to debug this one just yet, but I'm guessing it might be how the OS or NFS is configured; I can't tell without access to your machine. Is your SELinux in permissive mode or disabled?
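
For readers reproducing this, the standard RHEL/CentOS commands for checking and testing the SELinux mode being discussed (a sketch; the grep pattern is just one way to spot denials):

    # Show the current mode: Enforcing, Permissive, or Disabled.
    getenforce

    # Temporarily switch to permissive to test whether SELinux is the blocker
    # (lasts until reboot; edit /etc/selinux/config to persist a change).
    setenforce 0

    # If permissive mode cures the mkdir error, find the denial in the audit
    # log so a targeted policy can be written instead of disabling SELinux.
    grep denied /var/log/audit/audit.log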

@colemanserious
Author

Quick update: we've tested both adding a postgres user to the NFS server and adding one to the k8s master. No dice either way. We had the benefit of someone from Crunchy Data who swung by, so I'm not asking for additional info from Jeff; just noting this is still not quite figured out, in case anyone else is following this issue.

@jmccormick2001
Contributor

I know that one customer I worked with recently had some NFS issues; they were caused by the NFS version on the NFS server versus the NFS client versions on the OpenShift servers. OSE was being very picky about which versions it supported.

@jmccormick2001
Contributor

If you can provide me the details on the NFS versions (server and kube host), I could take another try at debugging this; the OS version on both the NFS server and the kube host would be useful too.
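
A sketch of how to gather those details with standard nfs-utils tooling on RHEL/CentOS:

    # On the NFS server: package version and enabled protocol versions.
    rpm -q nfs-utils
    cat /proc/fs/nfsd/versions

    # On the kube host (NFS client): each NFS mount with its negotiated
    # protocol version in the "vers=" option, plus the OS release.
    nfsstat -m
    cat /etc/redhat-release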

@mikejk8s

mikejk8s commented Oct 28, 2016

I encounter this same situation when using a GCE Persistent Disk in k8s 1.4. I'm able to start up clusters when I don't assign the volume to /pgdata/, but that just means it uses /pgdata/ on the pod, which of course is ephemeral.

The mount to /pgdata/ works fine and it's accessible when I exec into the cluster, but it only contains lost+found after init. I figured it'd be a simple chown postgres, but maybe not; I'm going to dig through the Crunchy scripts today to try to see what's going on.

I'm using the Crunchy containers (crunchydata/crunchy-postgres:centos7-9.5-1.2.4)

    volumeMounts:
    - mountPath: /pgdata/
      name: "{{.Values.masterPdName}}"
  volumes:
  - name: "{{.Values.masterPdName}}"
    gcePersistentDisk:
      pdName: "{{.Values.masterPdName}}"
      fsType: ext4

And the mounts inside the pod:

    /dev/sdb on /pgdata type ext4 (rw,relatime,data=ordered)
    /dev/sda1 on /pgconf type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /pgwal type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /recover type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /backup type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /dev/termination-log type ext4 (rw,relatime,commit=30,data=ordered)

@jmccormick2001
Contributor

This might require fsGroup: 26 to be added in the pod spec.
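
For reference, a minimal sketch of where that setting lives in a pod spec; the names and image tag here are illustrative placeholders, not taken from the charts above:

    apiVersion: v1
    kind: Pod
    metadata:
      name: crunchy-postgres-example   # placeholder name
    spec:
      securityContext:
        # fsGroup tells kubelet to chgrp the mounted volume to GID 26
        # (postgres) and make it group-writable, so the startup script
        # can create its directory under /pgdata.
        fsGroup: 26
      containers:
      - name: postgres
        image: crunchydata/crunchy-postgres:centos7-9.5-1.2.4
        volumeMounts:
        - mountPath: /pgdata
          name: pgdata
      volumes:
      - name: pgdata
        gcePersistentDisk:
          pdName: my-pd                # placeholder disk name
          fsType: ext4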

@mikejk8s

mikejk8s commented Oct 28, 2016

Unfortunately no go. I'll dig into it and see if I can figure something out if there aren't any other suggestions. Thanks!

  securityContext:
    fsgroup: 26
2016-10-28T15:49:56.394223115Z cp: cannot create regular file '/pgdata/org-member-service-db-master': Permission denied

edit: I typoed fsGroup (big G); fixing that actually seems to have done it!

@jmccormick2001
Contributor

Just a guess, but there appear to be some disk-related permission settings that could make the disk read-only; you might verify that the GCE disk is not set to read-only within the GCE console.

@jmccormick2001
Contributor

Ah, I just saw the edit on your earlier comment; sounds like fsGroup worked for you. I'll close this issue out if I don't hear back.

@colemanserious
Author

Update here: we added supplementalGroups to the securityContext rather than using fsGroup. But I understand that to be related to our use of shared storage (NFS) rather than block storage.
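
For completeness, a sketch of that variant; GID 26 follows the thread's postgres UID/GID convention, so substitute whatever group owns your NFS export:

    spec:
      securityContext:
        # supplementalGroups adds GID 26 to the container processes'
        # supplementary groups. Unlike fsGroup, it does not change ownership
        # on the volume, which suits shared NFS storage where ownership is
        # managed on the server side.
        supplementalGroups: [26]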

@jmccormick2001
Contributor

Ah, good to hear. I hadn't used supplementalGroups for NFS issues before; that's a good one to know about.

@colemanserious
Author

Worth adding to the examples / README (fsGroup and/or supplementalGroups)?

@jmccormick2001
Contributor

Probably both. I have this in the examples/dedicated templates, but only the fsGroup reference. What I should do is include this in all postgres container examples to be consistent. I'll create a new issue to capture that work.

@jmccormick2001
Contributor

I'll close this issue out; I've entered a new issue, #32, to capture the example updates mentioned here.
