
crunchy-postgres: unable to create /pgdata/$HOSTNAME when using PVC. CrashLoop. #23

Closed
colemanserious opened this issue Oct 4, 2016 · 25 comments

Comments

@colemanserious

Running on Kubernetes 1.3. Set up the pgdata volume as a PVC, which is being served by an NFS system. Note: if I use emptyDir for the volumes, there's no issue. (But of course I can then lose my data.)

Issue: I'm receiving "mkdir: cannot create directory '/pgdata/postgres-deployment-1217853053-9wk6b': Permission denied", which eventually causes a CrashLoopBackOff after enough retries. (Note: the container is being set up as part of a deployment with a replica count of 1.)

Have cross-checked examples/kube/kitchensink, as well as other kube examples. Not catching any relevant differences.

Looking at rhel7/9.5/Dockerfile.postgres.rhel7, it looks like the chown happens in the Dockerfile itself, which means (I think) it can't apply to directories created by persistent volume mounts. In that same Dockerfile I see commented-out setcap calls against chown, as if someone was tinkering with this. notes/notes-priv.txt and notes/todo2 also contain various wrestlings that look related to this issue...

@jmccormick2001
Contributor

Hi, to help me debug this: what are the permissions set to on your NFS volume path? Also, are you running with SELinux enabled?

@colemanserious
Author

Not currently running with SELinux enabled. And permissions are root:root 777 within a share. (Getting that info from someone who has direct access on the box; I'm currently just set up as a user within the Kubernetes cluster.)

@jmccormick2001
Contributor

One more question: is the postgres UID/GID set up on your kube box? These are normally set to UID=26 and GID=26.

@colemanserious
Author

Within the startup script, I see it output that it has uid=26, gid=26, etc. I take that to mean the container itself is comfortable with having a postgres user. We're going to do some experiments tomorrow to see about making the NFS server aware of the user directly.

@jmccormick2001
Contributor

Within the container it will know about the postgres UID/GID, because postgres is installed inside the container; but outside of the container, where kube/NFS live, the same UID/GID needs to be available.
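
For anyone following along, a minimal sketch of that host-side setup, assuming a RHEL/CentOS-style NFS server, the UID/GID 26 convention mentioned above, and /nfsfileshare as a placeholder export path:

    # On the NFS server (and any host that needs matching ownership):
    # create a postgres group and user with the same UID/GID the container uses.
    groupadd -g 26 postgres
    useradd -u 26 -g 26 -M -s /sbin/nologin postgres

    # Hand the export's backing directory to that UID/GID so the container's
    # postgres user (uid=26) can mkdir underneath it.
    chown 26:26 /nfsfileshare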

@jmccormick2001
Contributor

let me know how it goes tomorrow, but I think you are on the right track.

@colemanserious
Author

colemanserious commented Oct 5, 2016

Working to get someone to add the UID knowledge to the NFS server, and will post results once we have them. That said, this seems to be a problematic solution for a persistent volume claim environment: bad for static provisioning (pre-set), worse for dynamic provisioning. Much digging around Docker issues (docker issue #2259 seems relevant) and Google Groups hasn't cracked the code as yet, though.

@jmccormick2001
Contributor

I tried just now to run the examples/kube/master-nfs example without a postgres UID or GID defined, and it works on my system; it just sets the UID/GID on the NFS-created directory to 26/26. My NFS /etc/exports file looks like this:

    /nfsfileshare 192.168.0.102(rw,sync)

See if you can run that example and send me the log output to debug further. I'm using the latest postgres container, version 1.2.3 on Dockerhub.
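
As a side note for readers, a couple of standard commands to sanity-check the server side of that setup, using the export path from the /etc/exports line above:

    # Confirm the export is live and list its effective options.
    exportfs -v

    # Show numeric owner/group of the backing directory; the container's
    # postgres user is uid/gid 26, so 26:26 (or sufficiently open modes)
    # is what the startup script's mkdir needs.
    ls -ldn /nfsfileshare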

@colemanserious
Author

Using Postgres container version 1.2.11, getting the following. (Snippets below, since our corporate servers are disconnected from the Internet.)

    setting PGROOT to /usr/pgsql-9.5
    chown: cannot access '/pgdata/master-nfs': No such file or directory
    mkdir: cannot create directory '/pgdata/master-nfs': Permission denied
    chmod: cannot access '/pgdata/master-nfs': No such file or directory

    + source /opt/cpm/bin/setenv.sh
    ++ '[' -d /usr/pgsql-9.5 ']'
    ....
    ++ chown postgres /pgdata/master-nfs
    chown: cannot access '/pgdata/master-nfs': No such file or directory
    + rm /pgdata/master-nfs/postmaster.pid
    rm: cannot remove '/pgdata/master-nfs/postmaster.pid': No such file or directory
    + echo 'user id is...'
    + id
    user id is...
    uid=26(postgres) gid=26(postgres) groups=26(postgres)
    + ose_hack
    ++ id -u
    + export USER_ID=26
    + USER_ID=26

... there's more, but the root of the issue is that it can't create the master-nfs directory under /pgdata.

NOTE: I cannot currently execute the master-nfs-pv example: run.sh returns in its log "Error from server: error when creating "STDIN": the server does not allow access to the requested resource (post persistentvolumes)". That's expected behavior in my environment. I have confirmed that Kubernetes has bound a master-nfs-pvc claim in my namespace. I have also cross-checked the master-nfs-pv.json file for any special NFS configuration to pass along to our administrator team; none jumps out, anyway.

@jmccormick2001
Contributor

One more question to make sure I'm testing your same config: are you building the containers using RHEL as the base image, and running Kube on a RHEL box?

@colemanserious
Author

We've imported the crunchydata containers directly. Although we have to import them to make them available in our corporate environment, I believe they're a straight pass-through with no changes. Currently just using the crunchy-postgres image, though we intend to take advantage of backup, etc.

The Kube master (and the nodes) are running on CentOS, without SELinux enabled.

@jmccormick2001
Contributor

I just created a new VM, this one RHEL7. I built kube 1.3.8 on it by compiling the kube source code, installed NFS according to the instructions in the install.asciidoc section on NFS, and then ran the crunchy-postgres 1.2.3 container. With SELinux in permissive mode it seems to work; if I turn on enforcing, I get the mkdir permission error. Not sure how else to debug this one just yet, but I'm guessing it might be how the OS or NFS is configured; I can't tell without access to your machine. Is your SELinux in permissive mode or disabled?
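
For readers reproducing this, the standard RHEL/CentOS commands for checking and testing the SELinux mode being discussed (a sketch; the grep pattern is just one way to spot denials):

    # Show the current mode: Enforcing, Permissive, or Disabled.
    getenforce

    # Temporarily switch to permissive to test whether SELinux is the blocker
    # (lasts until reboot; edit /etc/selinux/config to persist a change).
    setenforce 0

    # If permissive mode cures the mkdir error, find the denial in the audit
    # log so a targeted policy can be written instead of disabling SELinux.
    grep denied /var/log/audit/audit.log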

@colemanserious
Author

Quick update: we've tested both adding a postgres user to the NFS server and adding one to the k8s master. No dice either way. We had the benefit of someone from Crunchy Data who swung by, so I'm not asking for additional info from Jeff; just noting this is still not quite figured out, in case anyone else is following this issue.

@jmccormick2001
Contributor

I know that one customer I worked with recently had some NFS issues; they were caused by the NFS version on the NFS server versus the NFS client versions on the OpenShift servers. OSE was being very picky about which versions it supported.

@jmccormick2001
Contributor

If you can provide me the details on the NFS versions (server and kube host), I could take another try at debugging this; the OS version on both the NFS server and the kube host would be useful too.
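
A sketch of how to gather those details with standard nfs-utils tooling on RHEL/CentOS:

    # On the NFS server: package version and enabled protocol versions.
    rpm -q nfs-utils
    cat /proc/fs/nfsd/versions

    # On the kube host (NFS client): each NFS mount with its negotiated
    # protocol version in the "vers=" option, plus the OS release.
    nfsstat -m
    cat /etc/redhat-release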

@mikejk8s

mikejk8s commented Oct 28, 2016

I encounter this same situation when using a GCE Persistent Disk in k8s 1.4. I'm able to start up clusters when I don't assign the volume to /pgdata/, but that just means it uses /pgdata/ on the pod, which of course is ephemeral.

The mount to /pgdata/ works fine and it's accessible when I exec into the cluster, but it only contains lost+found after init. I figured it'd be a simple chown postgres, but maybe not; I'm going to dig through the Crunchy scripts today to try to see what's going on.

I'm using the Crunchy containers (crunchydata/crunchy-postgres:centos7-9.5-1.2.4)

    volumeMounts:
    - mountPath: /pgdata/
      name: "{{.Values.masterPdName}}"
  volumes:
  - name: "{{.Values.masterPdName}}"
    gcePersistentDisk:
      pdName: "{{.Values.masterPdName}}"
      fsType: ext4

And the mounts inside the pod:

    /dev/sdb on /pgdata type ext4 (rw,relatime,data=ordered)
    /dev/sda1 on /pgconf type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /pgwal type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /recover type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /backup type ext4 (rw,nosuid,nodev,relatime,commit=30,data=ordered)
    /dev/sda1 on /dev/termination-log type ext4 (rw,relatime,commit=30,data=ordered)

@jmccormick2001
Contributor

This might require fsGroup: 26 to be added in the pod spec.
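
For reference, a minimal sketch of where that setting lives in a pod spec; the names and image tag here are illustrative placeholders, not taken from the charts above:

    apiVersion: v1
    kind: Pod
    metadata:
      name: crunchy-postgres-example   # placeholder name
    spec:
      securityContext:
        # fsGroup tells kubelet to chgrp the mounted volume to GID 26
        # (postgres) and make it group-writable, so the startup script
        # can create its directory under /pgdata.
        fsGroup: 26
      containers:
      - name: postgres
        image: crunchydata/crunchy-postgres:centos7-9.5-1.2.4
        volumeMounts:
        - mountPath: /pgdata
          name: pgdata
      volumes:
      - name: pgdata
        gcePersistentDisk:
          pdName: my-pd                # placeholder disk name
          fsType: ext4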

@mikejk8s

mikejk8s commented Oct 28, 2016

Unfortunately no go. I'll dig into it and see if I can figure something out if there aren't any other suggestions. Thanks!

  securityContext:
    fsgroup: 26
2016-10-28T15:49:56.394223115Z cp: cannot create regular file '/pgdata/org-member-service-db-master': Permission denied

edit: I typoed fsGroup (big G); fixing that actually seems to have done it!

@jmccormick2001
Contributor

Just a guess, but there appear to be some disk-related permission settings that could make the disk read-only; you might verify that the GCE disk is not set to read-only within the GCE console.

@jmccormick2001
Contributor

Ah, I just saw the edit on your earlier comment; sounds like fsGroup worked for you. I'll close this issue out if I don't hear back.

@colemanserious
Author

Update here: we added supplementalGroups to the securityContext rather than using fsGroup. But I understand that to be related to our use of shared storage (NFS) rather than block storage.
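
For completeness, a sketch of that variant; GID 26 follows the thread's postgres UID/GID convention, so substitute whatever group owns your NFS export:

    spec:
      securityContext:
        # supplementalGroups adds GID 26 to the container processes'
        # supplementary groups. Unlike fsGroup, it does not change ownership
        # on the volume, which suits shared NFS storage where ownership is
        # managed on the server side.
        supplementalGroups: [26]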

@jmccormick2001
Contributor

Ah, good to hear. I hadn't used supplementalGroups for NFS issues before; that's a good one to know about.

@colemanserious
Author

Worth adding to the examples / README (fsGroup and/or supplementalGroups)?

@jmccormick2001
Contributor

Probably both. I have this in the examples/dedicated templates, but only the fsGroup reference. What I should do is include this in all postgres container examples to be consistent. I'll create a new issue to capture that work.

@jmccormick2001
Contributor

I'll close this issue out; I've entered a new issue, #32, to capture the example updates mentioned here.
