Add contents/docs/self-host/runbook #2202
Conversation
#### Resize data disk

##### Requirements

You need to run a Kubernetes cluster with the _Volume Expansion_ feature enabled. This feature is supported by the majority of volume types since Kubernetes 1.11 (see [docs](https://kubernetes.io/docs/concepts/storage/storage-classes/#allow-volume-expansion)).
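If you are unsure whether your cluster meets this requirement, here is a rough sketch of how to check and, if your provisioner supports it, enable expansion. The storage class name `default` is only an example; use whichever class backs the Kafka PVC:

```
# list storage classes and whether they allow volume expansion
kubectl get storageclass

# inspect the class backing the Kafka PVC (replace "default" with your class name)
kubectl get storageclass default -o yaml | grep allowVolumeExpansion

# enable expansion if the underlying provisioner supports it
kubectl patch storageclass default -p '{"allowVolumeExpansion": true}'
```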
can't we just do this for users by default?
or is this aimed at users who have already deployed and might run into this issue?
that's another follow-up - we need to check & update the docs for all of them
+1, we need to follow up as well with another PR describing a suggested k8s setup
#### Resize data disk

##### Requirements

You need to run a Kubernetes cluster with the _Volume Expansion_ feature enabled. This feature is supported by the majority of volume types since Kubernetes 1.11 (see [docs](https://kubernetes.io/docs/concepts/storage/storage-classes/#allow-volume-expansion)).
Can we break this up into two pieces: advice for when you don't have it enabled (for which we should say you have no option but to nuke it, and how to create the new one as expandable if possible, or just suggest oversizing), and advice for when you do (with the implications for both)?
I was thinking of tackling this from another angle (with a follow-up PR) by adding a new `alert/` folder where you can find what-to-do procedures for when you are running low on free space:
- vertical scaling (if you have volume expansion enabled)
- horizontal scaling
- nuke everything

What do you think about it?
Sure, happy for that to be there; maybe we can link to it. It's just odd to see a runbook with a requirement on something that hasn't always been our default.
1. List your pods
    ```
    ➜ kubectl get pods
    ```
use `-n posthog` everywhere, as that's the default namespace we ask folks to install everything into. The ones who didn't will know to drop it; the ones who did the default setup will be confused when these commands don't show anything.
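For illustration, assuming everything was installed into the default `posthog` namespace, the first step would look like:

```
# list the PostHog pods in the namespace used by the default installation
kubectl get pods -n posthog
```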
1. Resize the underlying PVC (in this example we are resizing it to 20G)
    ```
    ➜ kubectl patch pvc data-posthog-posthog-kafka-0 -p '{ "spec": { "resources": { "requests": { "storage": "20Gi" }}}}'
    ```
❤️ how you made this as fool-proof as possible by using `patch` rather than asking someone to edit the file
Note: while resizing the PVC you might get the error `disk resize is only supported on Unattached disk, current disk state: Attached`. In this specific case you need to temporarily scale the `StatefulSet` replica count down to zero. **This will briefly disrupt Kafka availability, and all events received after this point will be dropped, as event ingestion will stop working.**

You can do that by running: `kubectl patch statefulset posthog-posthog-kafka -p '{ "spec": { "replicas": 0 }}'`

After you have successfully resized the PVC, you can restore the initial replica definition with: `kubectl patch statefulset posthog-posthog-kafka -p '{ "spec": { "replicas": 1 }}'`
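A sketch of the full detach/resize/reattach sequence, assuming the default `posthog` namespace and the resource names used above (adjust them to your installation):

```
# temporarily detach the disk by scaling Kafka down to zero replicas
kubectl patch statefulset posthog-posthog-kafka -n posthog -p '{ "spec": { "replicas": 0 }}'

# resize the PVC while the disk is unattached
kubectl patch pvc data-posthog-posthog-kafka-0 -n posthog -p '{ "spec": { "resources": { "requests": { "storage": "20Gi" }}}}'

# bring Kafka back once the resize has gone through
kubectl patch statefulset posthog-posthog-kafka -n posthog -p '{ "spec": { "replicas": 1 }}'

# confirm the new capacity is reflected on the claim
kubectl get pvc data-posthog-posthog-kafka-0 -n posthog
```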
can we put this into a box or make it visually distinct somehow, so I can more easily skip over it if I didn't run into that error?
After you have successfully resized the PVC, you can restore the initial replica definition with: `kubectl patch statefulset posthog-posthog-kafka -p '{ "spec": { "replicas": 1 }}'`

1. Delete the `StatefulSet` definition but leave its `pod`s online: `kubectl delete sts --cascade=orphan posthog-posthog-kafka`
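A quick way to see the effect of `--cascade=orphan` (again assuming the default `posthog` namespace): the `StatefulSet` object disappears while the Kafka pod keeps running on its existing volume.

```
# delete only the StatefulSet object; its pods are orphaned, not terminated
kubectl delete sts --cascade=orphan posthog-posthog-kafka -n posthog

# the StatefulSet is gone ...
kubectl get sts -n posthog

# ... but the Kafka pod is still Running
kubectl get pods -n posthog | grep kafka
```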
please explain why we want to leave the pods online (and what is happening while the StatefulSet definition is deleted - since the pod is still running, do we keep all the event data in memory and hence not lose any events?)
1. In your Helm chart configuration, update the `kafka.persistence` value in `values.yaml` to the target size (20G in this example)

1. Run `helm update` to recycle all the pods and re-deploy the `StatefulSet` definition
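What that step usually looks like in practice is sketched below; the release name `posthog`, the `posthog/posthog` chart reference, the namespace, and the local `values.yaml` path are assumptions - use the exact upgrade command from the docs for your platform:

```
# refresh the local chart repo cache, then upgrade the release with the new persistence size
helm repo update
helm upgrade posthog posthog/posthog -n posthog -f values.yaml
```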
Maybe link to the update commands in the various docs, or to https://posthog.com/docs/self-host/configure/upgrading-posthog#upgrade-instructions

Potential problem: they haven't updated in a long time and now run into major chart updates etc. which require some manual steps.

Proposal:
(1) if they are trying to update and the chart is currently in a working state, add a step 0: do a `helm upgrade` first to make sure we're up to date, so a later `helm update` will work fine
(2) if they are in a bad state, e.g. they are doing this because Kafka was full - can we give commands here that don't use `helm update`, and suggest that they run `helm update` later?
Actually, how do you feel about moving update & uninstall under runbooks and just linking to the runbook from the other pages? They are the same on all platforms.
If they run `helm update` they are simply going to upgrade their release to a new version of the chart (in this case one with the `kafka.persistence` value updated). I don't see where the problem is if they haven't updated it in a long time.
The problem is that we update the chart constantly, so if they ran a `helm repo update` after the last `helm update`, or if it's someone other than the person who last updated, then the local chart repo state will be different. So during `helm update` all the differences in the repo state will be applied as well. Does this make sense?
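One way to see whether that drift exists before upgrading (assuming both the release and the namespace are called `posthog`, and the chart repo was added as `posthog`):

```
# chart version currently deployed for the release
helm list -n posthog

# latest chart version available in the local repo cache
helm search repo posthog/posthog
```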
At a minimum we should call out that this could be a problem; ideally we'd do something better and propose solutions instead.
This is awesome! Can we mention this in
I'm going to merge and iterate further on this.
Thanks for documenting.
We can always iterate, so don't feel blocked on PRs like these.
"children": [ | ||
{ | ||
"name": "Overview", | ||
"url": "/docs/self-host/runbook/overview" |
A bit late here, sorry, I was sick on Friday. This should have been `/docs/self-host/runbook`. We've dropped using `overview.md[x]`.
@@ -0,0 +1,7 @@
---
Should have been `index.md`.
Changes

Add a new `runbook` section in our docs. I wasn't able to add a subsection so that we can have `runbooks/kafka/`, but we can add it later.

Checklist