Backups with benji #274

Closed
allenporter opened this issue Aug 2, 2021 · 7 comments

@allenporter

After experimenting with velero in #273, I'd like to now try a benji-based approach, following the pattern in https://github.com/toboshii/home-cluster/tree/main/cluster/apps/backup-system/

My requirements:

  • Back up Ceph-based PVCs
  • Target NFS (simply, without a provisioner; see the sketch below)
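
For the NFS target, a statically provisioned PersistentVolume avoids needing a provisioner at all. A minimal sketch, where the server address, export path, and capacity are placeholders rather than this cluster's actual values:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: backup-nfs
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.1.10    # placeholder NFS server
    path: /exports/backups  # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-nfs
  namespace: benji
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""      # empty class: bind statically, no provisioner involved
  volumeName: backup-nfs
  resources:
    requests:
      storage: 1Ti
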
@allenporter

Downsides from discussion on k8s-at-home discord:

  • Requires a PostgreSQL database that needs its own backup strategy (e.g. pg_dump and rsync; see the sketch after this list)
  • The project is not widely used, but the developer seems responsive
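
A minimal sketch of that side-channel database backup. The pod name benji-postgresql-0 and the rsync target nas are assumptions; only the benji user and database names come from the benji.yaml shown later in this issue:

  # may need PGPASSWORD from the postgres secret, depending on the image's auth setup
  $ kubectl exec -n benji benji-postgresql-0 -- \
      pg_dump -U benji -d benji > benji-db-$(date +%F).sql
  $ rsync -av benji-db-*.sql nas:/backups/benji/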

allenporter added a commit that referenced this issue Aug 17, 2021
allenporter added a commit that referenced this issue Aug 24, 2021
@allenporter

Backup jobs are up. The backup-all job attempted to run and failed with:

RuntimeError: rbd invocation failed with return code 2 and output: did not load config file, using default settings.
2021-08-23T19:10:02.748-0700 7fcfc335ef40 -1 Errors while parsing config file!
2021-08-23T19:10:02.748-0700 7fcfc335ef40 -1 parse_file: filesystem error: cannot get file size: No such file or directory [ceph.conf]
2021-08-23T19:10:02.748-0700 7fcfc335ef40 -1 Errors while parsing config file!
2021-08-23T19:10:02.748-0700 7fcfc335ef40 -1 parse_file: filesystem error: cannot get file size: No such file or directory [ceph.conf]
unable to get monitor info from DNS SRV with service name: ceph-mon
2021-08-23T19:10:02.756-0700 7fcfc335ef40 -1 failed for service _ceph-mon._tcp
2021-08-23T19:10:02.756-0700 7fcfc335ef40 -1 monclient: get_monmap_and_config cannot identify monitors to contact
rbd: couldn't connect to the cluster!

I have not yet configured benji to talk to Ceph, so that makes sense. I am surprised, though, that it's not just using CSI. I need to look closer at how I've configured benji and how it works.
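
For reference, benji's rbd module needs a ceph.conf pointing at the cluster's monitors, plus a keyring for authentication. A minimal sketch with placeholder monitor addresses (in a Rook cluster, the real endpoints can be read from the rook-ceph-mon-endpoints ConfigMap):

  # /etc/ceph/ceph.conf
  [global]
  mon_host = 10.43.0.10:6789,10.43.0.11:6789,10.43.0.12:6789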

allenporter added a commit that referenced this issue Aug 24, 2021
Keyring is encrypted with sops:

  sops --encrypt --in-place infrastructure/dev/benji-ceph-keyring.yaml

Issue #274
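
To spot-check the encrypted keyring locally, sops can decrypt to stdout:

  sops --decrypt infrastructure/dev/benji-ceph-keyring.yaml
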
allenporter added eight more commits that referenced this issue Aug 24, 2021
@allenporter

Jobs are up and running. Postgres configuration issues are resolved.

Documentation for containerized benji:
https://github.com/elemental-lf/benji/blob/master/docs/source/container.rst

To list the cron jobs:

$ kubectl get cronjob -n benji
NAME                          SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
benji-backup-home-assistant   00 10 * * *   False     0        <none>          11m
benji-backup-monitoring       00 10 * * *   False     0        <none>          11m
benji-cleanup                 00 12 * * *   False     0        <none>          31m
benji-enforce                 00 11 * * *   False     0        <none>          31m

To run a manual backup job:

kubectl create job --from=cronjob/benji-backup-home-assistant test-backup-job-ha-1 -n benji
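
One way to follow that manual run (this is generic kubectl, not from the benji docs; kubectl resolves the Job to its pod):

kubectl logs -f -n benji job/test-backup-job-ha-1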

Now to list the existing backups:

$ JOB=benji-maint-56dc5c5cb-kq92m
$ kubectl exec -it -n benji "${JOB}" -- benji ls
+---------------------+--------------------------------------------------------+-------------------------------------------------+------------------------+---------+------------+--------+-----------+-----------+
|         date        | uid                                                    | volume                                          | snapshot               |    size | block_size | status | protected | storage   |
+---------------------+--------------------------------------------------------+-------------------------------------------------+------------------------+---------+------------+--------+-----------+-----------+
| 2021-08-24T16:43:37 | home-assistant-data-home-assistant-postgresql-0-etq93l | home-assistant/data-home-assistant-postgresql-0 | b-2021-08-24T23:43:35Z |  8.0GiB |     4.0MiB | valid  |   False   | storage-1 |
| 2021-08-24T16:58:10 | home-assistant-data-home-assistant-postgresql-0-1uvryr | home-assistant/data-home-assistant-postgresql-0 | b-2021-08-24T23:58:07Z |  8.0GiB |     4.0MiB | valid  |   False   | storage-1 |
| 2021-08-24T16:44:21 | home-assistant-home-assistant-config-o697ud            | home-assistant/home-assistant-config            | b-2021-08-24T23:44:18Z |  5.0GiB |     4.0MiB | valid  |   False   | storage-1 |
| 2021-08-24T16:58:25 | home-assistant-home-assistant-config-yqsjn3            | home-assistant/home-assistant-config            | b-2021-08-24T23:58:21Z |  5.0GiB |     4.0MiB | valid  |   False   | storage-1 |
+---------------------+--------------------------------------------------------+-------------------------------------------------+------------------------+---------+------------+--------+-----------+-----------+

Testing out a deep scrub:

$ kubectl exec -it -n benji "${JOB}" -- benji deep-scrub home-assistant-home-assistant-config-yqsjn3
    INFO: Deep-scrubbed 1/65 blocks (1.5%)
    INFO: Deep-scrubbed 2/65 blocks (3.1%)
    ...
    INFO: Deep-scrubbed 64/65 blocks (98.5%)
    INFO: Deep-scrubbed 65/65 blocks (100.0%)
    INFO: Set status of version home-assistant-home-assistant-config-yqsjn3 to valid.
    INFO: Deep-scrub of version home-assistant-home-assistant-config-yqsjn3 successful.

@allenporter

Attempting to restore:

# benji-restore-pvc --pvc-storage-class="rook-ceph-block" home-assistant-home-assistant-config-yqsjn3 home-assistant home-assistant-config-restore 
Restoring version home-assistant-home-assistant-config-yqsjn3 to PVC home-assistant/home-assistant-config-restore.
   ERROR: ConfigurationError: IO scheme rbd is undefined.
Traceback (most recent call last):
  File "/benji/bin/benji-restore-pvc", line 33, in <module>
    sys.exit(load_entry_point('benji-k8s-tools==0.1', 'console_scripts', 'benji-restore-pvc')())
  File "/benji/lib64/python3.6/site-packages/benji/k8s_tools/scripts/restore_pvc.py", line 104, in main
    args.restore_url_template.format(pool=pool, image=image),
  File "/benji/lib64/python3.6/site-packages/benji/helpers/utils.py", line 65, in subprocess_run
    raise RuntimeError(f'{args[0]} invocation failed with return code {result.returncode} and output: {_one_line_stderr(result.stderr)}')
RuntimeError: benji invocation failed with return code 78 and output:  ERROR: ConfigurationError: IO scheme rbd is undefined. 

The configuration does define an rbd IO module, named kube-pool:

# cat /etc/benji/benji.yaml 
configurationVersion: "1"
databaseEngine: postgresql://benji:XXXXX@benji-postgresql-headless:5432/benji
defaultStorage: storage-1
ios:
- configuration:
    cephConfigFile: /etc/ceph/ceph.conf
    clientIdentifier: admin
    newImageFeatures:
    - RBD_FEATURE_LAYERING
    - RBD_FEATURE_EXCLUSIVE_LOCK
    - RBD_FEATURE_STRIPINGV2
    - RBD_FEATURE_OBJECT_MAP
    - RBD_FEATURE_FAST_DIFF
    - RBD_FEATURE_DEEP_FLATTEN
    simultaneousReads: 3
    simultaneousWrites: 3
  module: rbd
  name: kube-pool
storages:
- configuration:
    path: /backup-nfs
  module: file
  name: storage-1
  storageId: 1

allenporter added a commit that referenced this issue Aug 25, 2021
@allenporter commented Aug 25, 2021

I was able to get it to work by adjusting --restore-url-template. I had assumed the URL scheme should be the rbd module name, but it actually needs to be the IO name from the configuration (kube-pool above), which in my case matches the pool name.

BENJI_LOG_LEVEL=DEBUG benji-restore-pvc --restore-url-template="kube-pool:{pool}/{image}" --pvc-storage-class="rook-ceph-block" home-assistant-home-assistant-config-yqsjn3 home-assistant home-assistant-config-restore --force
$ kubectl get pvc -n home-assistant
NAME                               STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
home-assistant-config-restore      Bound    pvc-09390aaf-ebc4-4c8b-b42d-770bb90705db   5Gi        RWO            rook-ceph-block   48m

I also tried force-restoring over an active PVC, and it seemed to work, restoring it back to its original state.

allenporter added a commit that referenced this issue Aug 25, 2021
allenporter added a commit that referenced this issue Aug 25, 2021
@allenporter

Manually running backups in production didn't work due to issues with the CronJob schema version.

Blocked on completing #237 in prod.

@allenporter

$ kubectl create job --from=cronjob/benji-backup-unifi -n benji test-backup-unifi-2021-08-25-16-45
job.batch/test-backup-unifi-2021-08-25-16-45 created

Looking at the logs, it failed because the database was not initialized:

RuntimeError: benji invocation failed with return code 70 and output: {"event": "RuntimeError: Database schema appears to be empty, it needs to be initialized.", "level": "error", "timestamp": 1629935173.991238, "file": "/benji/lib64/python3.6/site-packages/benji/scripts/benji.py", "line": 370, "function": "main", "process": 76, "thread_name": "MainThread", "thread_id": 140185782875968, "exception": "Traceback (most recent call last):\n File \"/benji/lib64/python3.6/site-packages/benji/scripts/benji.py\", line 357, in main\n func(**func_args)\n File \"/benji/lib64/python3.6/site-packages/benji/commands.py\", line 40, in backup\n with Benji(self.config) as benji_obj:\n File \"/benji/lib64/python3.6/site-packages/benji/benji.py\", line 54, in __init__\n Database.open()\n File \"/benji/lib64/python3.6/site-packages/benji/database.py\", line 986, in open\n migration_needed, current_revision, head_revision = self._migration_needed(alembic_config)\n File \"/benji/lib64/python3.6/site-packages/benji/database.py\", line 953, in _migration_needed\n raise RuntimeError('Database schema appears to be empty, it needs to be initialized.')\nRuntimeError: Database schema appears to be empty, it needs to be initialized."}

That is resolved by manually running benji database-init as described here: https://benji-backup.me/quickstart.html#backup

$ kubectl get pods -n benji benji-maint-56dc5c5cb-prkwb
NAME                          READY   STATUS    RESTARTS   AGE
benji-maint-56dc5c5cb-prkwb   1/1     Running   0          4h6m
$ kubectl exec -it  -n benji benji-maint-56dc5c5cb-prkwb -- benji database-init
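
With the schema initialized, the failed backup can be retried by recreating the job (the new job name here is arbitrary) and the result confirmed with benji ls:

$ kubectl create job --from=cronjob/benji-backup-unifi -n benji test-backup-unifi-retry
$ kubectl exec -it -n benji benji-maint-56dc5c5cb-prkwb -- benji ls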
