Debate Map

Monorepo for the client, server, etc. of the Debate Map website (


The Debate Map project is a web platform aimed at improving the efficiency of discussion and debate. It's crowd-sourced and open-source, and welcomes reader contributions.

Its primary improvements are (in short):

  • Restructuring dialogue to make use of both dimensions.
  • Breaking down lines of reasoning into single-sentence "nodes".
  • Providing rich tools that operate on those nodes -- such as rating, tagging, statistical analysis, and belief-tree sharing and comparison.

The maps are constructed from "claims" (gray), and "arguments" (green and red) which support/oppose those claims. This structure cuts down on reading time, and lets us focus on the underlying chains of reasoning instead of parsing statement meanings and connections.

For more information, visit the website at:

Associated projects

Society Library

Development of Debate Map is partially supported by The Society Library, which is co-developing a separate infrastructural standard within Debate Map for its use.

Freeform documentation


  • client: Frontend code that runs in the browser; connects to the app-server pod (and the monitor-backend pod, if the user is an admin). (TypeScript)
  • web-server: Serves the static frontend files for the website -- see "client" package above. (Rust)
  • app-server: Serves database queries and backend commands. (Rust)
  • monitor-client: Frontend code for; see monitor-backend for more info. (TypeScript)
  • monitor-backend: Backend code for, which is meant for admin-related functionality, and has several unique design goals (see here). (Rust)
  • js-common: Code shared between the various JS packages. (TypeScript)
  • deploy: Miscellaneous scripts and such, used in the deployment process.
  • rust-shared: Code shared between the various Rust packages. (Rust)
  • rust-macros: Procedural macros used by other Rust packages. (proc-macros can't be used from the crate they're defined in)

Guide modules

  • Note: The section below is for the "active guide modules" that are likely to be used. Ones unlikely to be used are placed in the file.
  • Tip: You can link someone to a specific guide-module by adding #MODULE_NAME to the end of the url. (eg:
  • Tip: If you want to search the text of collapsed guide-modules, you can either view the readme's source text, or open the dev-tools "Elements" tab and use its ctrl+f search function.

For all/most contributors

Tasks (one-time, or very rare)

[setup-general] General repo setup
  • 1) Ensure NodeJS (v14.13.0+) is installed, as well as Yarn (needed for Yarn workspaces).
    • Note: Installation of a new command-line tool generally requires that you restart your terminal/IDE in order for its binaries to be accessible simply by name (assuming the installer has added its folder to the Path environment-variable automatically). So if a step fails due to "Command X is not recognized", check this first. (To save space, this "restart your terminal/IDE before proceeding" note will not be repeated in other guide-modules/steps.)
  • 2) Clone/download this repo to disk. (
  • 3) Install this repo's dependencies by running: yarn install
  • 4) There is an ugly additional step that used to be required here, relating to a messy transition in the NPM ecosystem from commonjs to esm modules. For now, this issue is being worked around in this repo through use of these and these patch files (which are auto-applied by npm/yarn). However, if you get strange webpack/typescript build errors relating to commonjs/esm modules, it's probably related to this issue, which may then require another look at the patch files (or attempting to find a more reliable solution).
  • 5) Copy the .env.template file in the repo root, rename the copy to .env, and fill in the necessary environment-variables. At the moment, regular frontend and backend devs don't need to make any modifications to the new .env file; only backend deployers/maintainers (ie. those pushing changes to the cloud for production) have environment-variables they need to fill in.
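The version requirement in step 1 can be checked mechanically. Below is a minimal sketch, assuming version strings of the form "v14.13.0" or "14.13.0"; the helper names `parseVersion` and `meetsMinimumVersion` are hypothetical and not part of this repo. It compares the parts numerically, since a plain string comparison would wrongly rank "14.9.0" above "14.13.0".

```typescript
// Hypothetical helpers (not part of the repo): numeric comparison of dotted version strings.
function parseVersion(version: string): number[] {
	// Accepts "v14.13.0" or "14.13.0"; anything after a "-" (eg. a pre-release tag) is ignored.
	const core = version.replace(/^v/, "").split("-")[0] ?? "";
	return core.split(".").map(part => parseInt(part, 10) || 0);
}

function meetsMinimumVersion(actual: string, minimum: string): boolean {
	const a = parseVersion(actual);
	const b = parseVersion(minimum);
	for (let i = 0; i < Math.max(a.length, b.length); i++) {
		// Missing parts count as 0, so "14.13" is treated like "14.13.0".
		const diff = (a[i] ?? 0) - (b[i] ?? 0);
		if (diff !== 0) return diff > 0;
	}
	return true; // the two versions are equal
}

console.log(meetsMinimumVersion("v16.4.2", "14.13.0"));
console.log(meetsMinimumVersion("v14.9.0", "14.13.0"));
```

In a NodeJS script, the first argument would typically be `process.version`.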

If you're looking for a higher-level "quick start" guide, see here: Quick start

[vscode] VSCode window setup

Prerequisite steps: setup-general

It's recommended to split your dev setup into two vscode windows:

  • 1) Window #1 in the Packages folder. Use this window to open files in Packages/client. (opening files in Packages/js-common is also fine)
  • 2) Window #2 in the repo root, for everything else. (server development, deployment, etc.)


  • About half of the development work is done in Packages/client, since it is the "driver" of most changes/functionality. Splitting the workload between the two windows (by "area of concern") helps maintain tab-count sanity, and makes it clear where a given file/tab should be located.
  • A separate tasks.json file has been set up for the two folders, optimized for the frontend and backend "areas of concern"; by opening both vscode windows/instances, it's thus faster/easier to complete some guide-modules.

Tasks (occasional)

Tasks (frequent)


[project-service-urls] Project service urls (ie. where app contents are served)


  • localhost:5100: local (k8s), web-server (backend.[forward/tiltUp]_local must be running)
  • localhost:5101: local (webpack), web-server ( must be running)
  • localhost:5110: local (k8s), app-server (backend.[forward/tiltUp]_local must be running)
  • localhost:5120: local (k8s), postgres instance (backend.[forward/tiltUp]_local must be running)
  • localhost:5130: local (k8s), monitor-backend (with web-serving of monitor-client's files) (backend.[forward/tiltUp]_local must be running)
  • localhost:5131: local (webpack), monitor-client (alt web-server) ( must be running)
  • localhost:5140: local (k8s), hyperknowledge server (experimental backend) (backend.[forward/tiltUp]_local must be running)
  • localhost:5141: local (k8s), hyperknowledge postgres (backend.[forward/tiltUp]_local must be running)

Remote (private port-forwards/proxies):

  • localhost:5200: remote (k8s), web-server (backend.[forward/tiltUp]_ovh must be running)
  • localhost:5210: remote (k8s), app-server (backend.[forward/tiltUp]_ovh must be running)
  • localhost:5220: remote (k8s), postgres instance (backend.[forward/tiltUp]_ovh must be running)
  • localhost:5230: remote (k8s), monitor-backend (with web-serving of monitor-client's files) (backend.[forward/tiltUp]_ovh must be running)
  • localhost:5240: remote (k8s), hyperknowledge server (experimental backend) (backend.[forward/tiltUp]_ovh must be running)
  • localhost:5241: remote (k8s), hyperknowledge postgres (backend.[forward/tiltUp]_ovh must be running)

Remote (public): [note: the new version of debate-map is not yet served at these endpoints; these are the target urls, however, for when it's ready for public access]

  • remote (k8s), web-server
  • remote (k8s), app-server
  • remote (k8s), monitor-backend (with web-serving of monitor-client's files)

Port-assignment scheme: (ie. meaning of each digit in ABCD)

  • A) app/project [5: debate-map]
  • B) cluster [0: skipped, 1: local, 2: remote] (0 is skipped to avoid clashes with common ports, eg. 5000 for UPnP)
  • C) pod [0: web-server, 1: app-server, 2: postgres instance, 3: monitor, 4: hyperknowledge]
  • D) variant [0: main, 1: served from webpack, etc.]
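As an illustration of the scheme above, here is a minimal sketch that decodes a port number into its four parts. The lookup tables are transcribed from the list; the function name `describePort` and the `PortInfo` type are hypothetical. Only variant 0 ("main") is given a fixed label, since the meaning of other variant digits depends on the pod (eg. webpack serving for 5101, but a postgres instance for 5141).

```typescript
// Lookup tables transcribed from the port-assignment scheme above.
const APPS: Record<string, string> = { "5": "debate-map" };
const CLUSTERS: Record<string, string> = { "1": "local", "2": "remote" };
const PODS: Record<string, string> = {
	"0": "web-server", "1": "app-server", "2": "postgres instance", "3": "monitor", "4": "hyperknowledge",
};

type PortInfo = { app: string; cluster: string; pod: string; variant: string };

// Hypothetical helper (not part of the repo): splits a port "ABCD" into its four digits.
function describePort(port: number): PortInfo {
	const digits = String(port);
	if (!/^\d{4}$/.test(digits)) throw new Error(`Expected a four-digit port, got: ${port}`);
	const variantDigit = digits.charAt(3);
	return {
		app: APPS[digits.charAt(0)] ?? "unknown",
		cluster: CLUSTERS[digits.charAt(1)] ?? "unknown",
		pod: PODS[digits.charAt(2)] ?? "unknown",
		// Only "0" (main) has a fixed meaning; other variants depend on the pod.
		variant: variantDigit === "0" ? "main" : `alternate ${variantDigit}`,
	};
}

console.log(describePort(5110)); // the app-server in the local cluster
```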

Note: Not all web-accessible k8s services are shown in the list above. Specifically:

  • Mere "subcomponents" of the monitoring service: grafana, prometheus, alertmanager (Reason: They're accessible through the monitor tool's subpages/iframes. See Domains.ts or for more details.)

For frontend developers (coding UI, etc.)

Tasks (one-time, or very rare)

[dev-enhance] Enhance the local web-server dev experience

Tasks (occasional)

Tasks (frequent)

[run-frontend-local] How to run frontend codebase, for local development

Prerequisite steps: setup-general

  • 1) If this is the first run, or if you've made code changes, run: npm start client.tsc (has vsc-1 task), for the ts->js transpilation (leave running in background)
  • 2) Start the serving of the frontend files. (ie. the js files generated by step 1, along with images and such)
    • 2.1) Option 1, using webpack directly: (faster, and recommended atm)
      • 2.1.1) Run npm start (has vsc-1 task), for the webpack bundle-building (and serving to localhost:5101). (leave running in background)
    • 2.2) Option 2, using the web-server package within k8s: [if going this route, first follow the setup-k8s module]
      • 2.2.1) If this is the first run, or if you've made code changes, build the frontend's webpack bundle into an actual file, in production mode, by running npm start (has vsc-1 task).
      • 2.2.2) Run (in repo root): npm start backend.tiltUp_local
      • 2.2.3) Wait till Tilt has finished deploying everything to your local k8s cluster. (to monitor, press space to open the Tilt web-ui, or s for an in-terminal display)
  • 3) Open the locally-served frontend, by opening in your browser: localhost:5101 (webpack), or localhost:5100 (k8s web-server) (if you want to connect to the remote db, add ?db=prod to the end of the url)
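The url choices in step 3 can be summarized in a small helper. This is only a sketch: the function name `localFrontendUrl` is hypothetical, and the `http://` scheme is an assumption (the step above lists only host and port).

```typescript
type FrontendServer = "webpack" | "k8s";

// Hypothetical helper (not part of the repo): builds the url to open in step 3.
function localFrontendUrl(server: FrontendServer, useRemoteDb = false): string {
	// 5101 is served by webpack directly; 5100 by the web-server pod in the local k8s cluster.
	const port = server === "webpack" ? 5101 : 5100;
	// Adding ?db=prod makes the frontend connect to the remote db.
	const query = useRemoteDb ? "?db=prod" : "";
	return `http://localhost:${port}/${query}`;
}

console.log(localFrontendUrl("webpack"));
console.log(localFrontendUrl("k8s", true));
```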

For additional notes on using Tilt, see here: tilt-notes


[tilt-notes] Notes on using Tilt

Prerequisite steps: setup-backend


  • When making changes to files, and with Tilt live-updating the files in the pods, you may occasionally start hitting one of the following errors: "Build Failed: error during connect"; "Build Failed: [...] Error response from daemon"; or 'Get "https://kubernetes.docker.internal:6443/api[...]": net/http: TLS handshake timeout'. Not completely sure what causes it (see my SO comment here), but I'm guessing the tilt-updating mechanism is overwhelming Docker Desktop's kubernetes system somehow. To fix:
    • Option 1 (recommended): Completely close Docker Desktop, shutdown WSL2 (wsl --shutdown) [not always necessary], restart Docker Desktop, then rerun npm start backend.tiltUp_local.
    • Option 2 (sometimes fails): Right click the Docker Desktop tray-icon and press "Restart Docker".
  • Manually restarting the "pgo" resource will clear the database contents! Use with caution.

For backend developers (coding app-server, web-server, etc.)

Tasks (one-time, or very rare)

[setup-backend] Setting up base tools needed for local/remote k8s deployments


  • 1) Install Rust via the rustup toolkit:
    • 1.1) If using VSCode, it's highly recommended to install the Rust Analyzer extension.
  • 2) Install Tilt: (I'm currently on version 0.30.13)
    • 2.1) If the tilt binary was not already added to your Path environment variable (depends on install path), do so.
  • 3) Install Helm (used during k8s deployment), v3.10.3+:
    • 3.1) On Windows, recommended install steps:
      • 3.1.1) Install Chocolatey. (if choco command not already present)
      • 3.1.2) Run: choco install kubernetes-helm
  • 4) Install a Docker container system.
    • 4.1) If on Windows, you'll first need to install WSL2. For the simple case, this involves...
      • 4.1.1) Run wsl --install, restart, wait for WSL2's post-restart installation process to complete, then enter a username and password (which is probably worth recording).
      • 4.1.2) It is highly recommended to set memory/cpu limits for the WSL system (as seen here), otherwise it can (and likely will) consume nearly all of your device's resources.
    • 4.2) Before installing your Docker container system, make sure the version you're installing is compatible with Debate Map's requirements. Currently, the repo is developed on machines with Kubernetes v1.24.2 (as part of Docker Desktop 4.11.0) and Kubernetes v1.25.2 (as part of Docker Desktop 4.15.0), so it's recommended to install one of those versions (preferably the newer one).
    • 4.3) On Windows and Mac, this means installing Docker Desktop (see step 4.2 above for recommended install link).
    • 4.4) On Linux, it's also recommended to install Docker Desktop (see step 4.2 above for recommended install link). (installing Docker Engine on its own is apparently also possible, though not recommended, since these docs are written assuming Docker Desktop is installed)

Highly recommended: (frontend devs can skip, if setting up a minimal local backend)

  • 1) Install Lens, a very handy, general-purpose k8s inspection tool.
  • 2) Install DBeaver, a ui tool for viewing/modifying postgresql databases.

Additional tools: (frontend devs can skip)

[setup-psql] Setting up the psql tool

Note: While installation of the psql tool on your host machine should not strictly be necessary (since there is an instance of it that can be accessed through some postgres-related docker containers), it is best to install it for more ergonomic usage: many of the helper scripts rely on it, and having it on your host machine makes it easier to use certain features, such as execution of .sql files present only on the host machine (eg. for when running the init-db and seed-db scripts).


  • 1) First, make a note of which major version of Postgres you need. This should be Postgres v13 (unless this step has become outdated); to confirm, you can run npm start ssh.db, then in that shell run psql --version.
  • 2) Next, download/install the package containing the psql binary. This means either...
  • 3) Ensure the psql binary is added to your Path environment-variable.
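To compare the host's psql against the version noted in step 1, the output of psql --version can be parsed. A minimal sketch, assuming the usual output form "psql (PostgreSQL) 13.4"; the helper names are hypothetical, and the expected major version (13) is taken from step 1, so adjust it if that step has become outdated.

```typescript
// Hypothetical helper (not part of the repo): extracts the major version from `psql --version` output.
function parsePsqlMajorVersion(versionOutput: string): number | null {
	// Typical output: "psql (PostgreSQL) 13.4"
	const match = /\(PostgreSQL\)\s+(\d+)/.exec(versionOutput);
	return match ? parseInt(match[1] ?? "", 10) : null;
}

// Returns true only when the output could be parsed and the major versions agree.
function psqlMatchesServer(versionOutput: string, expectedMajor: number): boolean {
	return parsePsqlMajorVersion(versionOutput) === expectedMajor;
}

console.log(psqlMatchesServer("psql (PostgreSQL) 13.4", 13));
```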

[setup-k8s] Setting up local k8s cluster (recommended route)

Prerequisite steps: setup-backend

There are multiple ways to set up a local Kubernetes cluster, but this guide-module assumes you'll be using the recommended option of Docker Desktop. If for some reason you instead want to use K3d, Kind, etc., see the setup-k8s-alt module.

  • 1) Create your Kubernetes cluster in Docker Desktop, by checking "Enable Kubernetes" in the settings, and pressing apply/restart.

To delete and recreate the cluster, use the settings panel.

After steps

  • 1) Create an alias/copy of the k8s context you just created, renaming it to "local":
    • 1.1) For Docker Desktop, this means:
      • 1.1.1) Open: $HOME/.kube/config
      • 1.1.2) Find the section with these contents:
       - context:
           cluster: docker-desktop
           user: docker-desktop
         name: docker-desktop
      • 1.1.3) Copy that section and paste it just below, changing the copy's name: docker-desktop to name: local.
  • 2) [opt] To make future kubectl commands more convenient, set the context's default namespace: kubectl config set-context --current --namespace=app
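The manual copy-and-rename in step 1 can also be expressed as a transformation on the parsed config. The sketch below assumes the kube config has already been parsed from YAML into a plain object (the parsing and writing-back are left out), and models only the contexts list; the type and function names are hypothetical.

```typescript
// Minimal model of the part of $HOME/.kube/config that step 1 touches.
type KubeContext = { name: string; context: { cluster: string; user: string; namespace?: string } };
type KubeConfig = { contexts: KubeContext[] };

// Hypothetical helper (not part of the repo): returns a copy of the config with an alias context added.
function withLocalAlias(config: KubeConfig, sourceName = "docker-desktop", aliasName = "local"): KubeConfig {
	// Nothing to do if the alias already exists.
	if (config.contexts.some(c => c.name === aliasName)) return config;
	const source = config.contexts.filter(c => c.name === sourceName)[0];
	if (!source) throw new Error(`Context "${sourceName}" not found`);
	// Copy the section, changing only its name (as in step 1.1.3).
	const alias: KubeContext = { name: aliasName, context: { ...source.context } };
	const contexts = config.contexts.slice();
	// Place the copy just below the original.
	contexts.splice(contexts.indexOf(source) + 1, 0, alias);
	return { ...config, contexts };
}

const exampleConfig: KubeConfig = {
	contexts: [{ name: "docker-desktop", context: { cluster: "docker-desktop", user: "docker-desktop" } }],
};
console.log(withLocalAlias(exampleConfig).contexts.map(c => c.name));
```

The function leaves its input untouched and is safe to run repeatedly, which mirrors the intent of the manual step: the original docker-desktop context stays in place, and "local" points at the same cluster and user.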


Troubleshooting

  • 1) If on Windows, your dynamic-ports range may start out misconfigured, which will (sometimes) cause conflicts with attempted port-forwards (from your Kubernetes pods to your localhost ports). See here for the fix. (worth checking ahead of time on Windows, as it wasted considerable time for me)
  • 2) If your namespace gets messed up, delete it using this (regular kill command gets stuck): npm start "backend.forceKillNS NAMESPACE_TO_KILL"
    • 2.1) If that is insufficient, you can either:
      • 2.1.1) Help the namespace to get deleted, by editing its manifest to no longer have any "finalizers", as shown here.
      • 2.1.2) Reset the whole Kubernetes cluster. (eg. using the Docker Desktop UI)
  • 3) When the list of images/containers in Docker Desktop gets annoyingly long, see the docker-trim module.

[setup-k8s-alt] Setting up local k8s cluster (using alternative k8s systems)

Prerequisite steps: setup-backend

There are multiple ways to set up a local Kubernetes cluster, with the recommended route being to use Docker Desktop, as described in the setup-k8s module. This module is for if you're certain you want to use an alternative like K3d or Kind.

Alternative options:

  • K3d
  • Kind


  • Docker Desktop has the advantage of not needing built docker-images to be "loaded" into the cluster; they were built there to begin with. This can save a lot of time, if full builds are slow. (for me, the deploy process takes ~3m on K3d, which Docker Desktop cuts out completely)
  • K3d has the fastest deletion and recreation of clusters. (so restarting from scratch frequently is more doable)
  • Docker Desktop seems to be the slowest running; I'd estimate that k3d is ~2x faster, at least for the parts I saw (eg. startup time).
  • Docker Desktop seems to have more issues with some networking details; for example, I haven't been able to get the node-exporter to work on it, despite it working alright on k3d (on k3d, you sometimes need to restart tilt, but at least it works on that second try; with Docker Desktop, the node-exporter has never worked). However, it's possible this is (at least partly) due to some sort of ordering conflict; I have often accidentally had docker-desktop, k3d, and kind running at the same time, so the differences I see may just be reflections of a problematic setup.

Setup for K3d

  • 1) Download and install from here:
  • 2) Create a local registry: k3d registry create reg.localhost --port 5000
  • 3) Create a local cluster: k3d cluster create main-1 --registry-use k3d-reg.localhost:5000 (resulting image will be named k3d-main-1)
  • 4) Add an entry to your hosts file, to be able to resolve k3d-reg.localhost:
    • 4.1) For Windows: Add line 127.0.0.1 k3d-reg.localhost to C:\Windows\System32\Drivers\etc\hosts.
    • 4.2) For Linux: Add line 127.0.0.1 k3d-reg.localhost to /etc/hosts. (on some Linux distros, this step isn't actually necessary)
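The hosts-file edit in step 4 is easy to get wrong when repeated (duplicate lines), so here is a minimal sketch of an idempotent version. It operates on the file's text only; reading and writing the file (which needs admin rights) is left out. The function name `ensureHostsEntry` is hypothetical, and the loopback address 127.0.0.1 used in the example is an assumption about where the registry is reachable.

```typescript
// Hypothetical helper (not part of the repo): appends "ip hostname" unless the hostname is already mapped.
function ensureHostsEntry(hostsFileText: string, ip: string, hostname: string): string {
	const alreadyMapped = hostsFileText.split(/\r?\n/).some(line => {
		// Ignore comments, then look for the hostname among the names after the address field.
		const withoutComment = (line.split("#")[0] ?? "").trim();
		const fields = withoutComment.split(/\s+/);
		return fields.length >= 2 && fields.slice(1).indexOf(hostname) !== -1;
	});
	if (alreadyMapped) return hostsFileText;
	// Make sure the new entry starts on its own line.
	const separator = hostsFileText === "" || /\n$/.test(hostsFileText) ? "" : "\n";
	return `${hostsFileText}${separator}${ip} ${hostname}\n`;
}

console.log(ensureHostsEntry("127.0.0.1 localhost\n", "127.0.0.1", "k3d-reg.localhost"));
```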

To delete and recreate the cluster: k3d cluster delete main-1 && k3d cluster create main-1

Setup for Kind

To delete and recreate the cluster: kind delete cluster --name main-1 && kind create cluster --name main-1

After steps and troubleshooting

  • For this info, open the setup-k8s module, and read through the "After steps" and "Troubleshooting" sections.

[continuous-profiling] How to set up continuous profiling of the NodeJS pods

We used to use NewRelic to try to do this, but that was cancelled. Tooling to use for this is "to be decided".

Tasks (occasional)

[pgo-required-updates] PGO (crunchydata postgres) required updates

The cluster's database is an instance of the CrunchyData Postgres Operator, v5; we use various docker images that they provide, under the path. However, they apparently do not keep those images up forever, meaning that at some point, updating to a new version is required (eg. since new devs/environments would then be unable to pull the images themselves).

To update only the "postgres" image/component within the pgo package (this is usually all that's needed):

  • 1) Take a look at the file Packages/deploy/PGO/install/values.yaml to see what postgres version-images are available; if there is a new major-version you can jump to (that's not dropped yet), targeting that should solve the issue. (of the postgres image the cluster had been using being dropped from the crunchydata registry)
  • 2) If doable, create a copy of the new postgres-version image in a private-registry (to avoid it being dropped in the future), and use that as the target rather than the url noted above:
    • IMPORTANT NOTE: Using a private-registry mirror of the postgres image is not yet fully figured out, because I don't yet know how to provide the authentication data to the production cluster for it to be able to read from the private registry. There is a CrunchyData guide here which presumably can make this work. But for now, I'll just use the official image -- but now using the postgres v15 image (since that one is not dropped from the crunchydata registry, and should not be dropped for another year or so).
    • 2.1) Ensure a copy of the target postgres image is stored in your private registry, by pasting the target version's image-url into the MirrorTransientImages.js file, then running it: node ./Scripts/Docker/MirrorTransientImages.js
    • 2.2) Now open the Packages/deploy/PGO/install/values.yaml file again, and update the relatedImages.postgres_XX.image field to match the url of the private-registry mirror of the image. (so its getting dropped from the main registry does not cause problems for us later)
    • 2.3) If we're still needing to use the "large postgres images" fix, update the docker-pull command in to match the private-registry image-url.
  • 3) Now we need to update the postgres-cluster's data-folder to work with the new version of postgres:
    • 3.1) Option 1: Run the postgres major-version update process that CrunchyData outlines here (recommended [ideally anyway; it failed last attempt in the production cluster, forcing Option 2]):
      • 3.1.1) Notes for its step 1:
        • It's recommended to at least make a logical backup at this point. (see: pg-dump)
      • 3.1.2) Notes for its step 2:
        • In the first code-block (the yaml for the PGUpgrade), make sure you set:
          • metadata.namespace: postgres-operator (alternately, you could do this as part of the kubectl apply command, but better to have it in the git-tracked file itself imo)
          • metadata.name: <name matching the pattern pg-upgrade-13-to-15>
        • The contents of that first code-block should be saved to a file in the Packages/deploy/PGO/@Operations folder, with the filename matching the metadata.name field. (eg. pg-upgrade-13-to-15.yaml)
        • It doesn't say how to deploy the listed yaml to the cluster; you could add it to the tilt scripts, or run kubectl apply directly. For the latter, run (from repo root): kubectl apply -f Packages/deploy/PGO/@Operations/<file name from prior step> (make sure you have Docker Desktop targeting the correct cluster first, ie. local or remote)
        • It is now expected that, when viewing/"editing" the newly-added pg-upgrade-XXX object in Lens (under the "Custom Resources" category), you'll see the text (near the bottom): message: PostgresCluster instances still running
      • 3.1.3) Notes for its step 3:
        • In our case, the cluster-annotating command to run is: kubectl -n postgres-operator annotate postgrescluster debate-map postgres-operator.crunchydata.com/allow-upgrade="pg-upgrade-13-to-15"
        • When it says to shut down the cluster, in our case it means modifying the Packages/deploy/PGO/postgres/values.yaml file to have shutdown: true, then applying it (ie. by saving the file, with Tilt running; you could also use Lens to change the field directly, which is faster but requires more care with field-placement).
      • 3.1.4) Notes for its step 4:
        • If you hit an error about the pg15 (or whatever your target/new-pg-version is) directory already existing, this may be due to a prior (presumably failed) upgrade attempt. To fix this, delete the pg-upgrade object (pod will be dropped with it), comment out the shutdown: true line again, wait for the pgo X-instance1-X pod to start up again, then SSH into that pod, and delete/rename the given directory; then you can redo the relevant steps to get to step 3.1.4 again.
      • 3.1.5) Notes for its step 5:
        • Once you've confirmed the upgrade has completed successfully, it's recommended to remove the PGUpgrade object (I prefer to do this using Lens, with it shown under the "Custom Resources" category).
      • 3.1.6) Notes for its step 6:
        • For its vacuumdb command, I went with the first option (vacuumdb --all --analyze-in-stages), and ran it in the shell for the debate-map-instance1-XXX pod.
        • You could also do cleanup/removal of the old data-folder (eg. pg13) by following the given information (in the same instance1 pod above), but I wouldn't bother unless disk-space is running low. (I don't believe those old folders become part of either the physical or logical automated backups, so the impact is minimal)
    • 3.2) Option 2: Reset the database storage directory and restore a logical backup of the database.
      • 3.2.1) Shut down the postgres cluster: modify Packages/deploy/PGO/postgres/values.yaml to have shutdown: true, or use Lens to apply this change directly. (this step is maybe not necessary, but better safe than sorry for now)
      • 3.2.2) Change the postgresVersion field of Packages/deploy/PGO/postgres/values.yaml to the target/new version.
      • 3.2.3) Reset the database data-folder, as well as the pgbackrest data and repo folders:
        • Option 1: By nuking the persistent-volume-claims (recommended atm).
          • WARNING: This will destroy your in-cluster copy of the database contents. Only do this if you're sure you have external backups, and you want to fully reset the pgo data/storage.
          • Use Lens to destroy the persistent-volume-claims named debate-map-instance1-XXXX-pgdata and debate-map-repo1.
          • That's it for now... (proceed to step 3.2.4, then in step 3.2.5, we do an additional step to complete the resetting, followed by the restore of the logical-backup)
        • Option 2: By modifying the persistent-volume-claims to rename the relevant folders.
          • Rename the folders within pgdata:
            • We need a way to modify the persistent-volume-claim used by postgres, from a stable pod that will not keep restarting. To do this, deploy the Packages/deploy/PGO/@Operations/@explore-pvc_debate-map-instance1-XXX-pgdata.yaml file to the cluster. This will create a busybox pod that we can SSH into.
            • In the explore-pvc/busybox pod, run cd /mnt/volume1, then:
              • Run: mv pgXX pgXX_vdisabled1 (do this for all XX/versions present, both old and new)
              • Run: mv pgXX_wal pgXX_wal_vdisabled1. (do this for all XX/versions present, both old and new)
              • Run: mv pgbackrest pgbackrest_vdisabled1
          • Rename the folders within the pg repo folder (forget exact name, but used by repo1 pod):
            • Similar steps to the above, except for the repo1 persistent-volume-claim.
      • 3.2.4) Start up the postgres cluster again (do the opposite of step 3.2.1). Also, if you haven't already, close and start/restart the tilt-up process.
      • 3.2.5) You'll also need to get pgo to fully forget about its old repo-data (not sure how it is still accessing it, but this step was found necessary; tbh step 3.2.3 is probably not necessary if this step is done): With tilt running again, press the "Trigger update" button on the pgo and pgo_crd-definition resources; keep doing this (alternating between them every several seconds, eg. 3+ times), until the log-messages do not show errors, and the pgo resource shows in its logs messages like trying to bootstrap a new cluster, initialized a new cluster, and finally the "success" message of INFO: no action. I am (debate-map-instance1-XXXX-0), the leader with the lock. At this point, it should be a new pgo cluster that no longer tries to restore data from old/now-invalid physical backups.
      • 3.2.6) Restore the logical-backup, by following the "To restore a backup" section in guide-module: pg-dump
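For step 3.2.3 (Option 2), the rename commands have to be repeated for every postgres version folder present. The sketch below generates that command list so nothing is missed; the function name `buildDisableCommands` is hypothetical, and the version numbers passed in are examples (check which pgXX folders actually exist in /mnt/volume1 first).

```typescript
// Hypothetical helper (not part of the repo): lists the mv commands from step 3.2.3, Option 2.
function buildDisableCommands(pgVersions: number[], suffix = "_vdisabled1"): string[] {
	const commands: string[] = [];
	for (const version of pgVersions) {
		// Data folder and WAL folder for each version present (both old and new).
		commands.push(`mv pg${version} pg${version}${suffix}`);
		commands.push(`mv pg${version}_wal pg${version}_wal${suffix}`);
	}
	// The pgbackrest folder is renamed once, regardless of versions.
	commands.push(`mv pgbackrest pgbackrest${suffix}`);
	return commands;
}

console.log(buildDisableCommands([13, 15]).join("\n"));
```

The commands are meant to be run from /mnt/volume1 inside the explore-pvc/busybox pod, as described in that step.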

If you are doing an update of the entire postgres-operator package, here are some notes on it:

  • We are using the helm-based installation approach rather than the kustomize-based one, since this way updates are less complicated/painful.
  • The official instructions for an initial install are here.
  • To do an update of the postgres-operator:
    • 1) If the update process will involve a major-version upgrade of postgres, you'll need to decide if you want to reset the cluster then restore a logical-backup, or instead use the CrunchyData cluster-upgrade tool. The latter is recommended if there is a postgres-version that is supported by both the old and new pgo packages. If doing the latter (the cluster-upgrade tool), then the first step is updating just the postgres data-folder, by following the steps in the section above (via the path of step 3.1). If doing the former (cluster reset followed by logical-backup), then instead use the path through step 3.2, and do that data-restore after the steps below.
    • 2) Do a fresh clone of the examples repo (referenced in the helm-based install instructions above).
    • 3) Make a temporary copy (placed somewhere outside dm's git repo) of the files under Packages/deploy/PGO that were "customized" from their initial contents. Currently, this means:
      • 3.1) File: Packages/deploy/PGO/postgres/values.yaml
    • 4) Replace the contents of the Packages/deploy/PGO folder with the contents of the helm folder, in the examples repo from step 2.
    • 5) For each non-commented key in your backup of postgres/values.yaml (all keys except postgresVersion started commented), paste it into the appropriate location in the new PGO/postgres/values.yaml file.

[image-inspect] Docker image/container inspection

Prerequisite steps: setup-backend


  • Make a shortcut to \\wsl$\docker-desktop-data\version-pack-data\community\docker\overlay2; this is the path you can open in Windows Explorer to view the raw files in the docker-built "layers". (ie. your project's output-files, as seen in the docker builds)
  • Install the Docker "dive" tool (helps for inspecting image contents without starting container):
  • To inspect the full file-contents of an image: docker save IMAGE_NAME -o ./Temp/output.tar (followed by extraction, eg. using 7-zip)

[docker-trim] Docker image/container trimming

Prerequisite steps: setup-backend

  • 1) When the list of images in Docker Desktop gets too long, press "Clean up" in the UI, check "Unused", uncheck non-main-series images, then press "Remove". (run after container-trimming to get more matches)
  • 2) When the list of containers in Docker Desktop gets too long, you can trim them using a Powershell script like the below: (based on:
# take the last column (the container name) from each line of the listing
$containers = (docker container list -a).Split("`n") | % { [regex]::split($_, "\s+") | Select -Last 1 }
# keep only auto-generated names (two lowercase words joined by an underscore)
$containersToRemove = $containers | Where { ([regex]"^[a-z]+_[a-z]+$").IsMatch($_) }

# it's recommended to delete in batches, as too many at once can cause issues
$containersToRemove = $containersToRemove | Select-Object -First 30

foreach ($container in $containersToRemove) {
	# sync/wait-based version (slow)
	# docker container rm $container

	# async/background-process version (fast)
	Start-Process -FilePath docker -ArgumentList "container rm $container" -NoNewWindow
}

[k8s-ssh] How to ssh into your k8s pods (web-server, app-server, database, etc.)
  • For web-server: npm start ssh.web-server
  • For app-server: npm start
  • For database: npm start ssh.db
  • For others: kubectl exec -it $(kubectl get pod -o name -n NAMESPACE -l LABEL_NAME=LABEL_VALUE) -- bash
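For the "others" case, the command can be filled in from its three placeholders. A minimal sketch follows; the function name `buildPodShellCommand` is hypothetical, and the namespace and label in the example are placeholder values, not taken from the repo's manifests.

```typescript
// Hypothetical helper (not part of the repo): fills in the "For others" command template above.
function buildPodShellCommand(namespace: string, labelName: string, labelValue: string, shell = "bash"): string {
	// Inner command: resolves the pod's name from its namespace and label.
	const podLookup = `kubectl get pod -o name -n ${namespace} -l ${labelName}=${labelValue}`;
	// Outer command: opens an interactive shell in that pod.
	return `kubectl exec -it $(${podLookup}) -- ${shell}`;
}

// Placeholder values, for illustration only.
console.log(buildPodShellCommand("app", "app", "example-pod-label"));
```

Note that, as in the original command, the namespace is passed only to the inner lookup; the outer kubectl exec runs against the current context's default namespace, so the two need to agree (eg. by setting the default namespace as described in the setup-k8s module).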

Note: If you merely want to explore the file-system of a running pod, it's recommended to use the Kubernetes Pod File System Explorer VSCode extension, as it's faster and easier. For editing files, see here: sandipchitale/kubernetes-file-system-explorer#4

[pod-quick-edits] How to modify code of running pod quickly

Update 2022-12-24: Quick-syncing is no longer being used atm. (the nodejs backend pods, where it had been useful, were retired)

  • 1) Tilt is set up to quickly synchronize changes in the following folders: .yalc, Temp_Synced, Packages/js-common
  • 2) If you want to quickly synchronize changes to an arbitrary node-module (or other location), do the following:
    • 2.1) Copy the node-module's folder, and paste it into the Temp_Synced folder.
    • 2.2) Open a shell in the target pod. (see k8s-ssh)
    • 2.3) Create a symbolic link, such that the target path now points to that temp-folder: ln -sf /dm_repo/Temp_Synced/MODULE_NAME /dm_repo/node_modules
    • 2.4) To confirm link was created, run: ls -l /dm_repo/node_modules/MODULE_NAME
    • Note: These symlinks will be cleared whenever yarn install is run again in the pod. (eg. if your app's package.json is changed)

Tasks (frequent)

[reset-db-local] How to init/reset the database in your local k8s cluster

Prerequisite steps: setup-k8s, setup-psql

  • 1) If there already exists a debate-map database in your local k8s cluster's postgres instance, "delete" it by running: npm start "db.demoteDebateMapDB_k8s local"
    • 1.1) For safety, this command does not technically delete the database; rather, it renames it to debate-map-old-XXX (with XXX being the date/time of the rename). You can restore the database by changing its name back to debate-map. To find the modified name of the database, run the query: SELECT datname FROM pg_database WHERE datistemplate = false; (to connect to the postgres server in order to run this query, run: npm start "db.psql_k8s local db:postgres")
  • 2) Run: npm start "db.initDB local" (or manually: connect to postgres server/pod and apply the ./Scripts/InitDB/@InitDB.sql script)
  • 3) Run: npm start "db.seedDB local" (or manually: connect to postgres server/pod and apply the ./Scripts/SeedDB/@SeedDB.sql script)
    • 3.1) If you get an error, the expected database structure may have changed without the GenerateSeedDB.ts code being updated (or its @SeedDB.sql output script being regenerated). Open the Scripts\SeedDBGenerator\GenerateSeedDB.ts file, check for TypeScript errors, fix any you see, then run npm start "db.seedDB_freshScript local".
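
Step 1.1 above says a demoted database can be restored by renaming it back. A minimal sketch of that rename follows. The helper name restore_demoted_db_sql is hypothetical, and it only prints the SQL; you would run the output in a psql session connected to the postgres database (Postgres cannot rename the database you are currently connected to, and no database named debate-map may exist at that point).

```shell
# Hypothetical helper: print the SQL that renames a demoted database back to
# "debate-map". It refuses names that do not look like a demoted database.
restore_demoted_db_sql() {
  local old_name="$1"
  case "$old_name" in
    debate-map-old-*) ;;
    *)
      echo "Expected a name starting with debate-map-old-, got: $old_name" >&2
      return 1
      ;;
  esac
  printf 'ALTER DATABASE "%s" RENAME TO "debate-map";\n' "$old_name"
}
```

Usage: restore_demoted_db_sql debate-map-old-XXX (with XXX replaced by the suffix found using the pg_database query from step 1.1)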
[run-backend-local] How to run backend codebase, for local development

Prerequisite steps: setup-k8s

  • 1) If this is the first run, or if changes were made to the client or monitor-client web/frontend codebases, run the relevant js-building and js-bundling script(s): [npm start client.tsc and npm start] and/or [npm start monitorClient.tsc and npm start] (has vsc-2 tasks)
  • 2) Launch the backend pods necessary for the behavior you want to test:
    • 2.1) Option 1, by launching the entire backend in your local k8s cluster: (recommended)
      • 2.1.1) If you have made any changes to dependencies that the backend uses, ensure the Others/yarn-lock-for-docker.lock file is up-to-date, by running: npm start backend.dockerPrep (has vsc-2 task)
      • 2.1.2) If your docker/kubernetes system is not active yet, start it now. (eg. on Windows, launching Docker Desktop from the start menu)
      • 2.1.3) Run (in repo root): npm start backend.tiltUp_local
      • 2.1.4) Wait till Tilt has finished deploying everything to your local k8s cluster. (to monitor, press space to open the Tilt web-ui, or s for an in-terminal display)
    • 2.2) Option 2, by launching individual pods/components directly on your host machine: (arguably simpler, but not recommended long-term due to lower reliability for dependencies, eg. platform-specific build hazards and versioning issues)
      • 2.2.1) Start app server (if needed): cd Packages/app-server; cargo run (not yet tested)
      • 2.2.2) Start web server (if needed): cd Packages/web-server; cargo run (not yet tested)
        • Instead of starting the web-server pod, you can try the webpack-based serving described in the run-frontend-local module.
    • Note: If changes were made that require changes to the db schema, you may hit errors on app-server startup. To resolve this, you can either reset your local database (see: #reset-db-local), or write/run a database migration (see: #db-migrate).
  • 3) Backend should now be up and running. You can test the deployment by opening the main web frontend (eg. localhost:[5100/5101]), or interacting with one of the pages served by another pod (eg. the graphiql page at localhost:5110/graphiql).

For additional notes on using Tilt, see here: tilt-notes


For backend deployers/maintainers

Tasks (one-time, or very rare)

[cloud-project-init] Cloud-projects initialization (eg. creating Google Cloud project for Pulumi to work within)

Note: We use Google Cloud here, but others could be used.

  • 1) Ensure you have a user-account on Google Cloud Platform:
  • 2) Install the Google Cloud SDK:
  • 3) Authenticate the gcloud sdk/cli by providing it with the key-file for a service-account with access to the project you want to deploy to.
    • 3.1) For the main Google Cloud project instance, you'll need to be supplied with the service-account key-file. (contact Venryx)
    • 3.2) If you're creating your own fork/deployment, you'll need to:
    • 3.3) Move (or copy) the JSON file to the following path: Others/Secrets/gcs-key.json (if there is an empty file here already, it's fine to overwrite it, as this would just be the placeholder you created in the setup-k8s module)
    • 3.4) Add the service-account to your gcloud-cli authentication, by passing it the service-account key-file (obtained from step 3.1 or 3.2): gcloud auth activate-service-account FULL_SERVICE_ACCOUNT_NAME_AS_EMAIL --key-file=Others/Secrets/gcs-key.json
    • 3.5) Add the service-account to your Docker authentication, in a similar way: Get-Content Others/Secrets/gcs-key.json | & docker login -u _json_key --password-stdin (if you're using a specific subdomain of GCR, eg. or, fix the domain part in this command)
[pulumi-init] Pulumi initialization (provisioning GCS bucket, container registry, etc.)

Prerequisite steps: cloud-project-init

Note: We use Google Cloud here, but others could be used.

  • 1) Install the Pulumi cli:
  • 2) Ensure that a Pulumi project is set up, to hold the Pulumi deployment "stack".
    • 2.1) Collaborators on the main release can contact Stephen (aka Venryx) to be added as project members (you can view it online here if you have access).
    • 2.2) If you're creating your own fork/deployment:
      • 2.2.1) Create a new Pulumi project here. Make sure your project is named debate-map, so that it matches the name in Pulumi.yaml.
  • 3) Run: npm start pulumiUp (pulumi up also works, if the last result of npm start backend.dockerPrep is up-to-date)
  • 4) Select the stack you want to deploy to. (for now, we always deploy to prod)
  • 5) Review the changes it prepared, then proceed with "yes".
  • 6) After a bit, the provisioning/updating process should complete. There should now be a GCS bucket, container registry, etc. provisioned, within the Google Cloud project whose service-account was associated with Pulumi earlier.
  • 7) If the deploy went successfully, a PulumiOutput_Public.json file should be created in the repo root. This contains the url for your image registry, storage bucket, etc. The Tiltfile will insert these values into the Kubernetes YAML files in various places; to locate each of these insert points, you can search for the TILT_PLACEHOLDER: prefix.
[ovh-init] OVH initialization (provisioning remote kubernetes cluster)

Note: We use OVHCloud's Public Cloud servers here, but others could be used.

  • 1) Create a Public Cloud project on OVH cloud. (in the US, is recommended for their in-country servers)
  • 2) Follow the instructions here to setup a Kubernetes cluster:
    • 2.1) In the "node pool" step, select "1". (Debate Map does not currently need more than one node)
    • 2.2) In the "node type" step, select an option. (cheapest is Discovery d2-4 at ~$12/mo, but I use d2-8 at ~$22/mo to avoid occasional OOM issues)
  • 3) Run the commands needed to integrate the kubeconfig file into your local kube config.
  • 4) Create an alias/copy of the "kubernetes-admin@Main_1" k8s context, renaming it to "ovh". (open $HOME/.kube/config, copy the aforementioned context section, then change the copy's name to ovh)
  • 5) Add your Docker authentication data to your OVH Kubernetes cluster.
    • 5.1) Ensure that your credentials are loaded, in plain text, in your docker config.json file. By default, Docker Desktop does not do this! So most likely, you will need to:
      • 5.1.1) Disable the credential-helper, by opening $HOME/.docker/config.json, and setting the credsStore field to an empty string (ie. "").
      • 5.1.2) Log in to your image registry again. (ie. rerun step 3.5 of cloud-project-init)
      • 5.1.3) Submit the credentials to OVH: kubectl --context ovh create secret --namespace app generic registry-credentials --from-file=.dockerconfigjson=PATH_TO_DOCKER_CONFIG (the default path to the docker-config is $HOME/.docker/config.json, eg. C:/Users/YOUR_USERNAME/.docker/config.json)
    • 5.2) You can verify that the credential-data was uploaded properly, using: kubectl --context ovh get --namespace default -o json secret registry-credentials (currently we are pushing the secret to the default namespace, as that's where the web-server and app-server pods currently are; if these pods are moved to another namespace, adjust this line accordingly)
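
As an alternative to hand-editing $HOME/.kube/config in step 4, the context copy can be made with kubectl itself. This is a sketch under assumptions: the helper name alias_context is hypothetical, and it copies only the cluster and user of the source context (not a per-context namespace, if one is set).

```shell
# Hypothetical helper: create a new kubeconfig context that points at the same
# cluster and user as an existing context.
alias_context() {
  local src="$1" new_name="$2" cluster user
  # Look up the cluster and user that the source context refers to.
  cluster="$(kubectl config view -o jsonpath="{.contexts[?(@.name==\"$src\")].context.cluster}")"
  user="$(kubectl config view -o jsonpath="{.contexts[?(@.name==\"$src\")].context.user}")"
  if [ -z "$cluster" ] || [ -z "$user" ]; then
    echo "Context '$src' not found in kubeconfig" >&2
    return 1
  fi
  kubectl config set-context "$new_name" --cluster="$cluster" --user="$user"
}
```

Usage: alias_context "kubernetes-admin@Main_1" ovh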
[dns-setup] How to set up DNS and CDN (if creating own fork/deployment)

Note: We use Cloudflare here, but others could be used.

  • 1) If not done already, update the domain-names in the code and k8s YAML files (eg. dmvx-ingress.yaml) to point to your chosen domain-names.
  • 2) Create a Cloudflare account, and start the add-website process on it. Follow the instructions for basic setup (using the defaults, unless otherwise specified).
    • 2.1) On your domain registrar manager/website, make sure that you configure Cloudflare as the DNS Name Servers.
    • 2.2) On Cloudflare, make sure the following DNS records are set:
      • 2.2.1) {type: "CNAME", target: "<ovh kubernetes cluster host-name>", name: "*"}
      • 2.2.2) {type: "CNAME", target: "<ovh kubernetes cluster host-name>", name: "<your domain name, eg.>"}
    • Note: This should be set by default, but if not, enable the "SSL/TLS" -> "Edge Certificates" -> "Always Use HTTPS" option. (seems to not really be necessary, presumably because Traefik doesn't respond for non-https requests so Chrome retries with https automatically, but good practice)
  • 3) Set up a redirect from www.YOUR_DOMAIN.YOUR_TLD to YOUR_DOMAIN.YOUR_TLD. (using the Rules section, as seen here)
[oauth-setup] How to set up oauth

In order to use the oauth options for sign-in (eg. Google Sign-in), the frontend either must be running on localhost:[5100/5101], or you have to create your own online "application" configs/entries on each of the oauth-providers' platforms. The below instructions are for creating those "application" configs/entries. (replace the domains with your own, of course)

Google Sign-in:

Authorized JavaScript Origins:
* http://localhost
* http://localhost:5100
* http://[::1]:5100
* http://localhost:5101
* http://[::1]:5101

Authorized redirect URIs:
* http://localhost:5110/auth/google/callback
* http://[::1]:5110/auth/google/callback
* https://app-server.CLUSTER_IP_IN_CLOUD/auth/google/callback

Tasks (occasional)

[k8s-monitors] Various commands/info on monitoring system (prometheus, etc.)
  • To open a bash shell in the main prometheus pod: kubectl exec -it prometheus-k8s-[0/1] -n monitoring -- sh (or just use Lens)
  • To view the Grafana monitor webpage, open: localhost:[3405/4405] (3405 for local, 4405 for remote, if using Tilt; if not, manually launch using Lens)

    The page will ask for username and password. On first launch, this will be admin and admin.

    The Grafana instance has been preconfigured with some useful dashboards, which can be accessed through: Dashboards (in sidebar) -> Manage -> Default -> [dashboard name]. You can import additional plugins/dashboards from the Grafana plugin library and dashboard library.

  • To view the Prometheus monitor webpage, open the k8s cluster in Lens, find the prometheus service, then click its "Connection->Ports" link.

    The page will ask for username and password. On first launch, this will be admin and admin.

  • To view the cAdvisor monitor webpage [not currently working/enabled], open the k8s cluster in Lens, find the cadvisor service, then click its "Connection->Ports" link.
  • To view cpu and memory usage for pods using k8s directly (no external tools), run: kubectl top pods --all-namespaces (for additional commands, see here)
[port-forwarding] How to set up port-forwarding for your k8s db, etc.

For database pod:

  • 1) If you have tilt running, a port-forward should already be set up, on the correct port. (5120 for your local cluster, and 5220 for your remote cluster)
  • 2) You can also set up the port-forwarding by running the script (has vsc-2 tasks): npm start backend.forward_[local/remote] (to only port-forward the db pod, add arg: onlyDB)
[k8s-psql] How to connect to postgres in your kubernetes cluster, using psql

Approach 1: (by ssh'ing directly in the k8s pod)

  • 1) Run: npm start "ssh.db [local/ovh]"
  • 2) Run (in vm shell that opens): psql
  • 3) The shell should now have you logged in as the postgres user.

Approach 2: (by using external psql with port-forwarding; requires that PostgreSQL be installed on your host computer)

  • 1) Set up a port-forward from localhost:[5120/5220] to your k8s database pod. (see: port-forwarding)
  • 2) Run: npm start "db.psql_k8s [local/ovh]"
  • 3) The shell should now have you logged in as the admin user.
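
For Approach 2, a manual connection over the port-forward might look like the sketch below. The helper names are hypothetical, and this is not necessarily what the db.psql_k8s script does internally. It assumes the ports listed in the port-forwarding module (5120 for local, 5220 for the remote/ovh cluster), the admin user, and the debate-map database; psql will prompt for the password unless one is supplied by other means.

```shell
# Hypothetical helper: map a cluster context to its forwarded database port.
db_forward_port() {
  case "$1" in
    local) echo 5120 ;;
    ovh) echo 5220 ;;
    *)
      echo "Unknown context: $1 (expected local or ovh)" >&2
      return 1
      ;;
  esac
}

# Hypothetical helper: connect to the forwarded database using a host-installed psql.
connect_db() {
  local port
  port="$(db_forward_port "$1")" || return 1
  psql --host=localhost --port="$port" --username=admin --dbname=debate-map
}
```

Usage: connect_db local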
[k8s-view-pg-config] How to view various postgres config files in the kubernetes cluster

To view the pg config files postgresql.conf, pg_hba.conf, etc.:

  • 1) Run: kubectl exec -it $(kubectl get pod -n postgres-operator -o name -l, -- bash
  • 2) Run (in new bash): cat /pgdata/pg13/XXX
[db-migrate] Database migrations

Old overview:

New steps:

  • 1) Write a KnexJS script that modifies the db contents to match the new desired shape. (using native PG commands, for fast execution)
    • 1.1) Make a copy of the latest migration in Knex/Migrations, and give it an appropriate name.
    • 1.2) Write the migration code. (reference the older migration scripts to see patterns used)
  • 2) Enable a flag on the main debate-map database, which makes it read-only, and displays an explanation message to users.
    • 2.1) Using DBeaver, create/modify the single row in the globalData table, setting extras.dbReadOnly to true.
    • 2.2) If you want to customize the message that is shown to the users, set/modify the extras.dbReadOnly_message field. (default: Maintenance.)
  • 3) Create a copy of the database, named debate-map-draft.
    • 3.1) Run: TODO
  • 4) Execute the migration script against the draft copy of the database.
    • 4.1) Run: TODO
  • 5) Confirm that the draft database's contents are correct.
    • 5.1) Open the (locally-served) new frontend's code, connecting to the draft database (by adding the ?db=prod-draft flag to the url -- not yet implemented), and confirm that things work correctly.
    • 5.2) You could also connect to the draft database using a tool like DBeaver, and confirm that the contents look correct there.
  • 6) Demote the main debate-map database. (ie. renaming it to debate-map-old-XXX)
    • 6.1) Run: npm start "db.demoteDebateMapDB_k8s ovh"
  • 7) Promote the draft debate-map-draft database. (ie. renaming it to debate-map)
    • 7.1) Run: npm start "db.promoteDebateMapDraftDB_k8s ovh" [not yet implemented]
  • 8) Disable the dbReadOnly flag in the globalData table. (see step 2)
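
Step 3 has no script yet. One way the draft copy could be made is with Postgres's template mechanism, sketched below. This is an assumption, not the project's eventual command: the helper name draft_copy_sql is hypothetical, and CREATE DATABASE ... WITH TEMPLATE fails while other sessions are connected to the source database, so it may not be usable as-is against a live instance.

```shell
# Hypothetical helper: print the SQL that copies one database into a new one.
# Defaults follow the names used in the steps above.
draft_copy_sql() {
  local source_db="${1:-debate-map}" draft_db="${2:-debate-map-draft}"
  printf 'CREATE DATABASE "%s" WITH TEMPLATE "%s";\n' "$draft_db" "$source_db"
}
```

Usage: run draft_copy_sql, then execute the printed statement in a psql session connected to the postgres database.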

Tasks (frequent)

[k8s-remote] How to deploy web+app server packages to remote server, using docker + kubernetes

Prerequisite steps: pulumi-init, ovh-init

  • 1) If changes were made to the client or monitor-client web/frontend codebases (or you've never run these build commands before), run the relevant js-building and js-bundling script(s): [npm start client.tsc and npm start] and/or [npm start monitorClient.tsc and npm start] (has vsc-2 tasks)
  • 2) Run: npm start backend.tiltUp_ovh
  • 3) Wait till Tilt has finished deploying everything to your remote k8s cluster. (to monitor, press space to open the Tilt web-ui, or s for an in-terminal display)
  • 4) Verify that the deployment was successful, by visiting the web-server: http://CLUSTER_URL:5200. (replace CLUSTER_URL with the url listed in the OVH control panel)
  • 5) If you haven't yet, initialize the DB, by following the steps in reset-db-local -- except replacing the local context listed in the commands with ovh.
  • 6) You should now be able to sign in, on the web-server page above. The first user that signs in is assumed to be one of the owners/developers, and is thus granted admin permissions.

For additional notes on using Tilt, see here: tilt-notes

[k8s-troubleshooting] How to resolve various k8s-related issues
  • 1) In some cases, when pushing a new pod version to your k8s cluster, the pod will fail to be added, with the message 0/1 nodes are available: 1 node(s) had taint { }, that the pod didn't tolerate.
    • 1.1) You can manually remove the taint by running (as seen here): kubectl taint node <nodename>
      • 1.1.1) Update: This didn't actually seem to work for me. Perhaps k8s is instantly re-applying the taint, since it's based on a persistent memory shortage? Anyway, currently I just wait for the memory shortage to resolve (somehow).
      • 1.1.2) For now, another workaround that seems to help (from a couple tries), is opening pod-list in Lens, searching for all pods of the given type, selecting-all, then removing/killing all.
      • 1.1.3) Another partial workaround seems to be to use Lens->Deployment, set Scale to 0, wait till entry updates, then set Scale to 1 again; in a couple cases this seemed to resolve the taint issue (maybe just coincidence though).
  • 2) If you get the error "Unable to attach or mount volumes: unmounted volumes [...]" (in my case, after replacing a 4gb node-pool with an 8gb one), the issue may be that the stale persistent-volume-claims requested by the old nodes are still sticking around, causing new claims for the new node to not get created (issue described here). To fix this:
    • 2.1) Run npm start backend.tiltDown_ovh.
    • 2.2) Tilt-down appears to not delete everything, so complete the job by using Tilt to manually delete anything added by our project: basically everything except what's in the kube-node-lease, kube-public, and kube-system namespaces.
      • 2.2.1) Regular deletion (eg. through the Lens UI) works fine for the following found leftovers: stateful sets, config maps, secrets, and services.
      • 2.2.2) For leftover namespaces: this deadlocks for me, seemingly due to the postgres-operator CRD having a deadlock occurring during its "finalizer", as described here (causing its postgres-operator namespace to stick around in a bad "terminating" state). See here to confirm what resources underneath that namespace are causing it to stick around, and then follow the steps below (assuming it's the CRD and/or PV/PVCs) to remove them; the deadlocked namespace deletion task itself should then complete.
      • 2.2.3) For the postgres-operator CRD, edit the manifest (eg. using the Lens UI's "Edit" option) to have its "finalizers" commented out, then delete like normal.
      • 2.2.4) For the persistent-volumes and persistent-volume-claims, do the same thing: comment out their "finalizers", then delete like normal.
    • 2.3) Rerun the tilt-up script.
    • 2.4) EDIT: After doing the above, the issue still remains :(. Based on my reading, the above "should" fix it, but it hasn't. For now, I'm resolving this issue by just completely resetting the cluster. (with "Computing nodes" option set to "Keep and reinstall nodes" -- the "Delete nodes" option appears to not be necessary)
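
For the taint issue in step 1, the general kubectl syntax for inspecting and removing taints is sketched below. The helper names are hypothetical, and the taint key is a placeholder, since the actual key is not shown in the error message above. As noted in step 1.1.1, k8s may re-apply a taint right away if its underlying cause (eg. a memory shortage) persists.

```shell
# Hypothetical helper: list each node alongside the keys of its current taints.
list_taints() {
  kubectl get nodes -o 'custom-columns=NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}

# Hypothetical helper: remove all taints with the given key from a node.
# The trailing "-" after the key is what tells kubectl to remove rather than add.
remove_taint() {
  local node="$1" key="$2"
  if [ -z "$node" ] || [ -z "$key" ]; then
    echo "Usage: remove_taint NODE_NAME TAINT_KEY" >&2
    return 1
  fi
  kubectl taint nodes "$node" "${key}-"
}
```

Usage: list_taints, then remove_taint NODE_NAME TAINT_KEY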
[pg-dump] Basic backups using pg_dump (type: logical, focus: local-storage)

To create a backup:

  • 1) Option 1, using basic script:
    • 1.1) Run: npm start backend.makeDBDump (has vsc-2 tasks)
    • 1.2) A backup dump will be created at: ../Others/@Backups/DBDumps_[local/ovh]/XXX.sql
  • 2) Option 2, using DBeaver:
    • 2.1) Right-click DB in list. (this assumes you already are connected)
    • 2.2) Press Tools->Backup, select "app", press Next, set format to "Tar", and press Start.

To restore a backup:

  • 1) It's recommended to rename the existing app schema to app_old or the like, before restoring a backup file.
  • 2) Before restoring the backup file, make sure the rls_obeyer role is created, by executing the create role "rls_obeyer"... section in General_End.sql.
  • 3) Do some cleanup of the backup file: (pg_dump is not perfect, and can output some lines that fail to restore as-is)
    • 3.1) If present, comment out the following line near the end of the file (pg_stat_statements table may not be created/populated yet): GRANT SELECT,INSERT,DELETE,UPDATE ON TABLE app.pg_stat_statements TO rls_obeyer;
  • 4) Execute the SQL dump/backup-file using psql or DBeaver.
    • 4.1) Option 1: Using psql:
      • 4.1.1) TODO
    • 4.2) Option 2: Using DBeaver:
      • 4.2.1) After connecting to the debate-map database, right-click it and press Tools->"Execute script", then supply the path to the backup file.
  • 5) Execute the SQL:
     ALTER DATABASE "debate-map" SET search_path TO 'app'; -- for future pg-sessions
     SELECT pg_catalog.set_config('search_path', 'app', false); -- for current pg-session
    • Note: Why is this necessary? Because SQL dumps/backups do not record the "search-path" of the database. This is apparently by design, but it means that the search-path must be set manually if restoring to a fresh database. If you get errors during restore relating to search-paths (eg. due to a dev forgetting to add the schema qualifier to a recently-added function), try adding the SQL code above to the start of the SQL file (replacing the search-path-emptying line already there).
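
The cleanup in restore step 3.1 can be scripted rather than done by hand. A minimal sketch, assuming the GRANT line appears on a single line of the dump exactly as quoted in step 3.1; the helper name clean_dump is hypothetical.

```shell
# Hypothetical helper: read a SQL dump on stdin and write it to stdout, with
# the pg_stat_statements GRANT line commented out (all other lines unchanged).
clean_dump() {
  sed -E 's/^(GRANT .* ON TABLE app\.pg_stat_statements TO rls_obeyer;)$/-- \1/'
}
```

Usage: clean_dump < XXX.sql > XXX.cleaned.sql (the cleaned file can then be executed as described in step 4)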
[pgbackrest] Rich backups using pgBackRest (type: physical, focus: remote-storage)

General notes:

  • Automatic backups are already set up, writing to the debate-map-prod-uniform-private bucket provisioned by Pulumi in the Google Cloud, at the path: /db-backups-pgbackrest.
  • Schedule: Once a week, a "full" backup is created; once a day, a "differential" backup is created.

Backup structure:

  • Backups in pgbackrest are split into two parts: base-backups (the db-backups-pgbackrest/backup cloud-folder), and wal-archives (the db-backups-pgbackrest/archive cloud-folder).
    • Base-backups are complete physical copies of the database, as seen during the given generation period. (well, complete copies if of type full; differential backups rely on the last full backup to be complete, and incremental backups rely on the last full backup, the last differential (if any), along with the in-between series of incremental backups)
    • Wal-archives are small files that are frequently being created, which together form basically a streaming "changelog" of database updates. Wal-archives allow you to do point-in-time restores to arbitrary times, by augmenting the base-backups with the detailed sequence of changes since them.


  • To view the list of backups in the Google Cloud UI, run: npm start backend.viewDBBackups

To manually trigger the creation of a full backup:

  • 1) Run: npm start backend.makeDBBackup
  • 2) Confirm that the backup was created by viewing the list of backups. (using npm start backend.viewDBBackups)
    • 2.1) If the backup failed (which is problematic because it seems to block subsequent backup attempts), you can:
      • 2.1.1) Trigger a retry by running: npm start backend.makeDBBackup_retry (PGO will then notice that the unfinished job is missing and recreate it, which should hopefully work this time)
      • 2.1.2) Or cancel the manual backup by running: npm start backend.makeDBBackup_cancel
[pgbackrest-restore] Restoring from pgBackRest backups
  • 1) Find the point in time that you want to restore the database to. Viewing the list of base-backups in the Google Cloud UI (using npm start backend.viewDBBackups) can help with this, as a reference point (eg. if you made a backup just before a set of changes you now want to revert).
  • 2) Prepare the postgres-operator to restore the backup, into either a new or the current postgres instance/pod-set:
    • 2.1) Option 1, into a new postgres instance/pod-set that then gets promoted to master (PGO recommended way):
      • 2.1.1) Ensure that the tilt-up script is running for the target context. (and disable any tilt-up scripts running for other contexts)
      • 2.1.2) Uncomment the dataSource field in postgres.yaml, uncomment + fill-in the section matching the restore-type you want (then save the file):
        • If you want to restore exactly to a base-backup (without any wal-archive replaying), use the first section. (modifying "set" to the base-backup folder-name seen in the cloud-bucket)
          • At the moment, you also have to run a psql command to complete the restore. See here.
        • If you want to restore to a specific point-in-time (with wal-archive replaying), use the second section. (modifying "target" to the time you want to restore to, with a specified timezone [UTC recommended])
    • 2.2) Option 2, into the existing postgres instance/pod-set (imperative, arguably cleaner way -- but not yet working/reliable):
      • 2.2.1) Run: npm start "backend.restoreDBBackup_prep BACKUP_LABEL" (This script patches the postgres-operator deployment/configuration to contain the fields that mark a restoration as active, and specify which backup to use.)
      • 2.2.2) To actually activate the restore operation, run: npm start backend.restoreDBBackup_apply (This will update the .../pgbackrest-restore annotation on the postgres-operator CRD to the current-time, which the operator interprets as the "go signal" to apply the specified restoration operation.)
  • 4) Observe the logs in the Tilt UI (atm, the restore operation's logs are visible in the "uncategorized" tilt-resource), to track the progress of the restore.
    • Note: It takes about 2.5 minutes just to start, so be patient; you'll know it's done when the logs say restored log file "XXX.history" from archive. (along with the HINT: Execute pg_wal_replay_resume() to promote., which must be followed to complete; see link in for more info)
    • Note: You can ignore the WARN: --delta or --force specified but unable to find... message, as that just means it's a fresh cluster that has to restore from scratch, which the restore module finds odd since it notices the useless [automatically added] delta/force flag.
    • Note: Until the restore process is completely done (eg. with the pgo operator having had time to update the admin-user auth-data secret), the app-server will be failing to start/connect; this is normal/fine.
  • 5) Check whether the restore operation succeeded, by loading up the website. (you may have to wait a bit for the app-server to reconnect; you can restart it manually to speed this up)
    • 5.1) If you get an error in the app-server pod along the lines of error: password authentication failed for user "admin", then it seems the debate-map-pguser-admin secret was already created (by pgo) prior to the restore, which may have made it invalid after the restore was completed (if the credentials differ). To resolve this, you can either:
      • 5.1.1) Delete the debate-map-pguser-admin secret in the postgres-operator namespace; pgo will recreate it in a few seconds, with a working set of credentials (and the reflected version of the secret, in the default namespace, will be updated a few seconds later). Note that in this process, the admin user's password is actually reset to a new (random) value, so you will have to copy the secret's password value for use in third-party programs accessing the database (eg. DBeaver).
      • 5.1.2) Alternately, you can modify the debate-map-pguser-admin secret (in the postgres-operator namespace) to hold the password value that was stored in the postgres backup that was just restored (this approach is not yet tested, but presumably should work). One place you may have the old password stored is in DBeaver's password store, which you can decrypt using these instructions.
  • 6) If the restore operation did not succeed, you'll want to either make sure it does complete, or cancel the restore operation (else it will keep trying to apply the restore, which may succeed later on when you don't want or expect it to, causing data loss). To cancel the restore:
    • 6.1) If option 1 was taken: Recomment the dataSource field in postgres.yaml, then save the file.
    • 6.2) If option 2 was taken: Run: npm start backend.restoreDBBackup_cancel.
  • 7) After the restore is complete, clean things up:
    • 7.1) If option 1 was taken: Recomment the dataSource field in postgres.yaml, then save the file. (needed so the restore operation is not attempted for other contexts, when their tilt-up scripts are run)
    • 7.2) If option 2 was taken: No action is necessary, because the postgres-operator remembers that the last-set value for the pgbackrest-restore annotation has already been applied, and the restore config was only placed into the target context. (If you want to be extra sure, though, you could follow step 6.2; this is fine, because the restore has already taken place, so it will not be reverted or the like.)
  • 8) Note that after the restore (if using option 1 anyway), the password for the admin user may have changed (it seems to have this time anyway). If that happens, retrieve the new password from the debate-map-pguser-admin secret (eg. using Lens, though make sure to press the eye icon to decode it first!), and update the passwords stored in DBeaver and the like.
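
For the point-in-time option in step 2.1.2, the "target" value needs a timestamp with an explicit timezone. The sketch below formats one in UTC. It assumes GNU date (for the -d option) and assumes the field accepts the "YYYY-MM-DD HH:MM:SS+00" form; the helper name pitr_target is hypothetical.

```shell
# Hypothetical helper: format a time description as a UTC timestamp with an
# explicit +00 offset. Because of -u, the argument is interpreted as UTC.
pitr_target() {
  date -u -d "$1" '+%Y-%m-%d %H:%M:%S+00'
}
```

Usage: pitr_target '2 hours ago' (relative descriptions are a GNU date feature), or pitr_target 'YYYY-MM-DD HH:MM:SS' for an absolute UTC time.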
[pgbackrest-troubleshooting] How to resolve various pgBackRest issues
  • 1) If you ever get the error command terminated with exit code 28: ERROR: [028]: backup and archive info files exist but do not match the database HINT: is this the correct stanza? HINT: did an error occur during stanza-upgrade?, do the following:
    • 1.1) First reference this comment for some general info. (in retrospect, I think my observations there were only partly true, so take with a grain of salt)
    • 1.2) Open a shell in the debate-map-instance1-XXX pod (using Lens or npm start ssh.db).
    • 1.3) Run pgbackrest info. This should tell you which repos are having backup issues. Note that if repo1 (in-k8s backup) is having an issue, this appears to block backups to repo2 (cloud storage backup), so you'll likely have to debug/resolve repo1 issues first before making progress on repo2's.
    • 1.4) Run pgbackrest check --stanza=db (note the stanza name: db). This should give the same error message that was encountered in the general pgo logs (the [028] backup and archive files exist but do not match the database error).
    • 1.5) It might also be helpful to confirm that things look correct in various configuration files: /etc/pgbackrest.conf, /etc/pgbackrest/conf.d/debate-map-instance1-XXXX.conf
    • 1.6) For actually resolving the issue:
      • 1.6.1) First, think about what caused the backups to start failing. So far, the cause has been, eg. swapping out my k8s node for another one (4gb variant to 8gb). If that's the case, the changes needed to get the backups working again are probably minimal.
      • 1.6.2) I don't know exactly what got the backups working again, but here are the main actions I took, in roughly the order I attempted them (with something in there apparently resolving the issue):
        • Changing the repo2-path field in postgres.yaml from /db-backups-pgbackrest to /db-backups-pgbackrest-X for a while (with various actions, including the below, then taken), then changing it back. (with tilt-up running during this time)
        • Changing the shutdown field in postgres.yaml to true for a while; once I saw the database pods shut-down (other than pgo and the metrics-collectors), I commented the field again, causing the db pods to restart.
        • Attempting to run a manual backup, by running: npm start backend.makeDBBackup. (The pods attempting to make this backup did not start right away, iirc. When it did start [while messing with some of the steps below], it hit various errors [50, 82, then 62]. Eventually it succeeded, after the pgbackrest start command I believe -- at which point the regular cron-jobs showed up in Lens, and from those a full-backup job was created and completed.)
        • In the debate-map-instance1-XXX pod, run: pgbackrest stanza-upgrade --stanza=db. (failed with ERROR: [055]: unable to load info file '/db-backups-pgbackrest-2/archive/db/' or '/db-backups-pgbackrest-2/archive/db/': [...], but maybe it kickstarted something)
        • In the same pod, run pgbackrest stop, followed by pgbackrest start a few minutes later. (the stop command's effects didn't seem to complete when I tried it, so I ran start later to get things up and running again, after trying the other steps)