Recovered from a botched cluster without a backup 🤦‍♀️ - Was this the right way of doing it? #2810
Replies: 1 comment 1 reply
I'd also like to share my attempt at restoring a dead cnpg DB, in a similar way to yours; the main difference is that I ran a Postgres docker container directly on the NAS and exported the DB using pgAdmin. In the end I was not successful in restoring my old Nextcloud data - I'm not sure why - but perhaps the method will be useful to others. What happened in my case was that after upgrading TrueNAS SCALE, my Nextcloud chart failed to deploy. I didn't understand what was going on and attempted a few fixes with zero knowledge, one of which was to downgrade NC to a previous version. In my defense, I didn't know that would damage the DB, and changing versions seemed to be the only way to get NC unstuck from deploying indefinitely.
```shell
sudo docker run -it --rm -d --name=pgrecover \
  -e POSTGRES_HOST_AUTH_METHOD=trust \
  -e PGDATA=/var/lib/postgresql/data \
  -v ${PWD}/pgdata:/var/lib/postgresql/data \
  -v ${PWD}/pg_wal:/var/lib/postgresql/wal/pg_wal \
  -v ${PWD}/controller:/controller \
  --network host \
  postgres:15
sudo docker logs pgrecover
```
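For anyone retrying this: the line worth waiting for in the logs is postgres's stock readiness message. A small sketch of the check I was effectively doing by eye (it only composes the command, since it needs the container above to be running; the grep target is the standard postgres startup message):

```shell
# Compose (not run) a readiness check against the 'pgrecover' container
# started above; the message is postgres's standard startup log line.
READY_MSG="database system is ready to accept connections"
CHECK_CMD="sudo docker logs pgrecover 2>&1 | grep -F \"$READY_MSG\""
echo "$CHECK_CMD"
```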
After this, I shut down the container and tried to reinstall and restore Nextcloud following the migration guide. I got an error on the DB restore operation, but the guide suggests this may be safe to ignore. I did all the other steps (occ etc.). However, my luck seems to have run out. Perhaps main2 would've been the right DB to restore, or perhaps these errors could've been fixed had I contacted TrueCharts support (again), but at this point I gave up. My Nextcloud instance hadn't been used that much yet, so losing the DB was not a huge blow, and I decided not to waste more time on trying to save it instead of setting it up from scratch. I still have the DB backup, so if someone tells me where I went wrong, I might be willing to give it another try just to see whether this method could've been successful.
I came here asking questions, but while gathering my thoughts and presenting the data, I actually figured it out on my own.
TL;DR:
I managed to spin up a postgres docker container and do a `pg_dump` from the PVCs of a botched cnpg cluster.
If you're interested in the details, you can expand below and see the revised process with the success.
I still have some open questions though...
If I was able to spin up a container manually that allowed me to read the database, isn't there another way to restore a broken cnpg cluster?
Is what I described above really the only way I had?
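On that open question: the cnpg kubectl plugin does ship a `promote` subcommand, so one thing I could perhaps have tried first is a manual promotion of the surviving instance. A sketch of what that would look like with my names (it only echoes the command, since I don't know whether a promotion can succeed on a cluster as broken as mine):

```shell
# Compose (not run) a promotion of the surviving instance with the cnpg
# kubectl plugin. Cluster/instance names are the ones from this post;
# whether this works on a cluster in my broken state is untested.
NAMESPACE=ix-myappname
CLUSTER=myappname-cnpg-main
INSTANCE=myappname-cnpg-main-1
PROMOTE_CMD="kubectl cnpg promote --namespace=$NAMESPACE $CLUSTER $INSTANCE"
echo "$PROMOTE_CMD"
```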
My background
I'm completely new to cloudnative-pg, and I'm no master of Kubernetes either, although I believe I can navigate instructions well enough.
I'm completely comfortable with docker and docker-compose; I've been using them for years.
How I got here
Plain naivety and ignorance (see above).
The cloudnative-pg deployment I'm using is based on helm charts by TrueCharts, deployed on a TrueNAS SCALE machine.
TrueNAS SCALE ships with a k3s cluster, and TrueCharts apps are in essence helm charts based on their templates; the DB of choice there is cloudnative-pg. Suffice to say, I didn't actually deploy it myself - it was deployed automatically.
After a power outage on the machine, which preceded the machine running out of disk space - the combination proved lethal for one of the apps, specifically that app's `cnpg` cluster. Mix that with me thinking I could figure it out on my own, and I deleted pods manually (yes, yes I know - stupid!).
I take full responsibility for the stupidity. I know it is my fault, but the fact of the matter is - I'm stuck with it right now.
Since then I've stopped, started reading (what I should have done first), and I think I understand cnpg better, but that doesn't mean I know how to get out of this situation.
What I've got
The backups I have are too old (another stupid mistake), and so I'm invested in recovering the data.
I am left with the PVCs and a cluster that refuses to start.
According to `kubectl cnpg status` (see below), the cluster's primary is set to `myappname-cnpg-main-2`, but that pod is missing, and switching over to `myappname-cnpg-main-1` is failing.
Where I was struggling
I'm currently struggling to execute my plan for recovery (based on this SO answer):
I would like to spin up a `postgres` docker container attached to the data in the PVCs so I can `pg_dump` the database, and then I can reinstall the app and recover it.
What I tried and eventually succeeded with
Copied over the contents of the `myappname-cnpg-main-1` and `myappname-cnpg-main-1-wal` PVCs into a local dir (on a different machine), preserving the permissions; call these `pgdata` and `pg_wal` in my local dir respectively.
The original copy contained a `custom.conf` file with postgres cluster/replication configuration and SSL settings. Once I figured out that this was stopping me from succeeding, I moved the file away and placed a blank file instead:
Spun up a postgres docker container, mapped to the pgdata and wal properly.
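A minimal sketch of the copy-and-blank steps above, demonstrated on throwaway directories (in the real recovery the source is the PVC contents; `cp -a` is what preserves ownership, permissions and timestamps, which postgres insists on at startup):

```shell
#!/bin/sh
set -eu
# Throwaway stand-ins: SRC plays the role of the copied-out PVC data,
# DST the local recovery dir the container will mount as pgdata.
SRC=$(mktemp -d)
DST=$(mktemp -d)/pgdata
echo "ssl = on" > "$SRC/custom.conf"   # the config that blocked startup
echo "15" > "$SRC/PG_VERSION"
# -a keeps ownership, permissions and timestamps intact
mkdir -p "$DST"
cp -a "$SRC/." "$DST/"
# Move the cluster/replication config aside and leave a blank file in its
# place, so postgres still finds the file but loads nothing from it
mv "$DST/custom.conf" "$DST/custom.conf.orig"
: > "$DST/custom.conf"
```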
There were a few warnings about SSL being switched on while `pg_hba.conf` was processed, but I finally got the message I wanted:
Ctrl-C to stop and remove the container, then I spun it up again, this time with `--detach` so it would run in the background.
Then I ran `pg_dump` on the active background container, for both the `postgres` and `myappname` databases, into local files.
Stopped and removed the container, and I could continue with the recovery process.
(my old) Question (see top for a revised version of this one)
Is this approach viable? Is there a different way to recover this cluster without losing the data?
Technical Details / cli output
existing pods
Output of `k3s kubectl get --namespace=ix-myappname pods | sort`:
cnpg status output
Output of `kubectl cnpg status --namespace=ix-myappname myappname-cnpg-main`:
list of pvc
Output of `k3s kubectl get --namespace=ix-myappname pvc | sort`:
Output of `zfs list -o ix:pvc-name,name,avail,usedbydataset,used,quota,canmount,mounted,mountpoint -d3 zpool/ix-applications/releases/myappname`:
logs of the cnpg-main-1
Output of `kubectl logs --namespace ix-myappname myappname-cnpg-main-1`:
PVC contents
Content of the pgdata PVC: