
Add contents/docs/self-host/runbook #2202

Merged · 6 commits merged into master from kafka on Oct 11, 2021
Conversation

guidoiaquinti (Contributor)

Changes

Add a new runbook section in our docs.

[Screenshot: 2021-10-08 at 17:04:32]

I wasn't able to add a subsection (so that we could have runbooks/kafka/), but we can add it later.


#### Resize data disk

##### Requirements
You need to run a Kubernetes cluster with the _Volume Expansion_ feature enabled. This feature has been supported for the majority of volume types since Kubernetes 1.11 (see [docs](https://kubernetes.io/docs/concepts/storage/storage-classes/#allow-volume-expansion)).
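As a quick check (not part of the PR; the storage class name `gp2` is only an example), you can see whether your storage class already allows expansion and, if not, usually enable it in place:

```
# Look for the ALLOWVOLUMEEXPANSION column in the output
➜ kubectl get storageclass

# Enable expansion on an existing storage class ("gp2" is a placeholder name)
➜ kubectl patch storageclass gp2 -p '{ "allowVolumeExpansion": true }'
```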
Contributor:

can't we just do this for users by default?

Contributor:

or is this aimed at users who have already deployed and might run into this issue?

Contributor:

That's another follow-up: we need to check & update the docs for all of them.

Contributor Author:

+1, we need to follow up as well with another PR describing a suggested k8s setup

tiina303 (Contributor), Oct 8, 2021:

Can we break this up into two pieces: advice for when you don't have it enabled (for which we should say you have no option but to nuke it, plus how to create the new volume as expandable if possible, or just suggest oversizing) and advice for when you do (with the implications for both)?

Contributor Author:

I was thinking of tackling this from another angle (with a follow-up PR) by adding a new alert/ folder where you can find what-to-do procedures for when you are running low on free space:

  • vertical scaling (if you have volume expansion enabled)
  • horizontal scaling
  • nuke everything

What do you think about it?

Contributor:

Sure, happy for that to be there, maybe we can link to it. It's just odd to see a runbook have a requirement with something that hasn't always been the default for us.


1. List your pods
```
➜ kubectl get pods
```
Contributor:

Use `-n posthog` everywhere, as that's the default namespace we ask folks to install everything into. The ones who didn't will know to remove it, while the ones who did the default setup will be confused about not seeing anything here.
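For example (a sketch based on that suggestion, assuming the chart was installed into the `posthog` namespace as the docs recommend):

```
# List the PostHog pods in the "posthog" namespace
➜ kubectl get pods -n posthog
```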


1. Resize the underlying PVC (in this example we are resizing it to 20Gi)
```
➜ kubectl patch pvc data-posthog-posthog-kafka-0 -p '{ "spec": { "resources": { "requests": { "storage": "20Gi" }}}}'
```
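A follow-up check one could run (not in the PR; it assumes the `posthog` namespace suggested in the review comment above):

```
# Confirm the PVC now requests/reports the new 20Gi size
➜ kubectl get pvc data-posthog-posthog-kafka-0 -n posthog
```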
Contributor:

❤️ how you made it as fool-proof as possible by not asking someone to edit the file but using patch

Comment on lines 66 to 70
Note: while resizing the PVC you might get an error `disk resize is only supported on Unattached disk, current disk state: Attached`. In this specific case you need to temporarily scale the `StatefulSet` replica value down to zero. **This will briefly disrupt Kafka service availability, and all events after this point will be dropped as event ingestion will stop working.**

You can do that by running: `kubectl patch statefulset posthog-posthog-kafka -p '{ "spec": { "replicas": 0 }}'`

After you successfully resized the PVC, you can restore the initial replica definition with: `kubectl patch statefulset posthog-posthog-kafka -p '{ "spec": { "replicas": 1 }}'`
Contributor:

Can we put this into a box or something visually distinct, so I can more easily skip over it if I didn't run into that error?
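For reference, the full recovery sequence from that note might look roughly like this (a sketch rather than part of the PR; the `-n posthog` flag follows the earlier review suggestion):

```
# Temporarily scale Kafka down so the disk becomes unattached
➜ kubectl patch statefulset posthog-posthog-kafka -n posthog -p '{ "spec": { "replicas": 0 }}'

# Wait for the Kafka pod to terminate, then retry the PVC resize

# Once the resize has gone through, restore the original replica count
➜ kubectl patch statefulset posthog-posthog-kafka -n posthog -p '{ "spec": { "replicas": 1 }}'
```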



1. Delete the `StatefulSet` definition but leave its `pod`s online: `kubectl delete sts --cascade=orphan posthog-posthog-kafka`
tiina303 (Contributor), Oct 8, 2021:

Please explain why we want to leave the pod online (and what is happening while the StatefulSet is deleted: since the pod is still running, do we keep all the event data in memory and hence not lose any events?)
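A quick way to sanity-check that behaviour (an illustration, not from the PR; the namespace flag follows the earlier comment):

```
# After the orphan delete, the Kafka pod should still show as Running
# even though its StatefulSet object is gone
➜ kubectl get pods -n posthog | grep kafka
```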


1. In your Helm chart configuration, update the `kafka.persistence` value in `values.yaml` to the target size (20Gi in this example)

1. Run `helm upgrade` to recycle all the pods and re-deploy the `StatefulSet` definition
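A rough sketch of that step (the release and chart names are assumptions based on a standard PostHog chart install, not taken from this PR):

```
# values.yaml (excerpt)
#   kafka:
#     persistence:
#       size: 20Gi

# Re-apply the chart with the updated values
➜ helm upgrade posthog posthog/posthog -n posthog -f values.yaml
```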
Contributor:

Maybe link to the upgrade commands in the various docs or to https://posthog.com/docs/self-host/configure/upgrading-posthog#upgrade-instructions

Potential problem: they haven't updated in a long time and now run into major chart updates etc. which require some manual steps.
Proposal:
(1) if they are trying to upgrade and the chart is currently in a working state, add a step 0: do a `helm upgrade` first to make sure they're up to date, so the later `helm upgrade` will work fine
(2) if they are in a bad state, e.g. they are doing this because Kafka was full, can we give commands here that don't use `helm upgrade` instead, and suggest that they run `helm upgrade` later?

tiina303 (Contributor), Oct 8, 2021:

Actually, how do you feel about moving update & uninstall under runbooks and just linking to the runbook from the other pages? They are the same on all platforms.

Contributor Author:

If they run `helm upgrade` they are simply going to upgrade their release to a new version of the chart (in this case one with the `kafka.persistence` value updated). I don't see where the problem is if they haven't updated it in a long time.

Contributor:

The problem is that we update the chart constantly, so if they ran a `helm repo update` after the last `helm upgrade`, or if it's someone other than the person who last upgraded, then the chart will have a different local repo state. So during the `helm upgrade` all the differences in the repo state will be applied as well. Does this make sense?
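One way to see whether the locally cached chart has drifted from what's deployed (an illustration only, not commands from the PR):

```
# Show the chart version the release is currently running
➜ helm list -n posthog

# Refresh the local repo cache and check the latest chart version available
➜ helm repo update
➜ helm search repo posthog/posthog
```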

Contributor:

Minimally we should call out that this could be a problem; ideally we'd do something better and propose solutions instead.

tiina303 (Contributor) commented Oct 8, 2021:

This is awesome!

Can we mention this in product-internal/infrastructure/runbooks/kafka/ so we don't forget to update this here later? (Consider removing it from there, linking here instead, and keeping only cloud-specific stuff there.)

guidoiaquinti (Contributor Author):

I'm going to merge and iterate further on this.

yakkomajuri (Contributor) left a comment:

Thanks for documenting.

We can always iterate, so don't feel blocked on PRs like these.

@guidoiaquinti merged commit 90dfb9f into master on Oct 11, 2021
@guidoiaquinti deleted the kafka branch on October 11, 2021 10:25
"children": [
{
"name": "Overview",
"url": "/docs/self-host/runbook/overview"
Contributor:

A bit late here, sorry, I was sick on Friday. This should have been /docs/self-host/runbook. We've dropped using overview.md[x].

```
@@ -0,0 +1,7 @@
---
```
Contributor:

Should have been `index.md`.
