
Blog: How we improved feature flag resiliency #5546

Merged
merged 20 commits into master from neilkakkar-patch-2 on Sep 8, 2023

Conversation

neilkakkar
Contributor

Changes

Please describe.

Add screenshots or screen recordings for visual / UI-focused changes.

Checklist

  • Titles are in sentence case
  • Feature names are in sentence case too
  • Words are spelled using American English
  • I have checked out our style guide
  • If I moved a page, I added a redirect in vercel.json

@vercel

vercel bot commented Mar 16, 2023

The latest updates on your projects. Learn more about Vercel for Git ↗︎

| Name | Status | Preview | Updated (UTC) |
| --- | --- | --- | --- |
| posthog | ✅ Ready (Inspect) | Visit Preview | Sep 8, 2023 9:33am |

@neilkakkar neilkakkar marked this pull request as draft March 16, 2023 17:40
@neilkakkar
Contributor Author

Okay, rough structure is there.

I could use some feedback now @andyvan-ph @joethreepwood @liyiy @EDsCODE on content, structure, and if this all makes sense.

And @ellie @hazzadous on accuracy of the small infra things I've mentioned here.

@joethreepwood joethreepwood changed the title First draft, needs a lot more work, but getting it done in chunks Blog: How we improved feature flag resiliency Mar 21, 2023
Co-authored-by: Ellie Huxtable <ellie@elliehuxtable.com>
Contributor

@ivanagas ivanagas left a comment


A bunch of edits.

I think it is important to tighten up the intro and get to the "meat" of the article faster (as it is really interesting and I want to make sure as many people as possible get to that point).

contents/blog/how-we-improved-feature-flags-resiliency.md: 11 resolved review threads (outdated)

So, when thinking about reliability, we want to prioritize defending against things that happen frequently, or have a high chance of occurring over time. This includes things like Redis, Postgres, or PgBouncer going down. Then, if we have the resources and nothing better to prioritize, we can focus on defending against asteroids.

Today, we can't yet defend against asteroids, nor the entire infrastructure going down, but for other things, like Postgres going down, we've found ways to defend against failure by leveraging our special problem constraints.
Contributor


I like the light touch that a small joke about asteroids brings to the piece. I think it'd be even better if it was only contained in the last 1-2 sentences or only referenced 1-2x at most instead of 3-4x? 🤗

Contributor Author


LOL agree I think I liked it so much I forgot I already included it 😂

Have removed and combined the above sentence into one.

@liyiy
Contributor

liyiy commented Mar 22, 2023

Overall awesome work! Really helps to paint a picture of all the work the team has done in the past couple of months to improve feature flags 🎉

@liyiy
Contributor

liyiy commented Mar 22, 2023

I'm not sure what images we would put here, maybe some screenshots of latency improvements from our Grafana as an example? Just to split up the text a bit and make it more (visually) reader-friendly.

neilkakkar and others added 3 commits August 31, 2023 14:36
Co-authored-by: Andy Vandervell <92976667+andyvan-ph@users.noreply.github.com>
Co-authored-by: Andy Vandervell <92976667+andyvan-ph@users.noreply.github.com>
@neilkakkar
Contributor Author

I think we could lift this with one or two diagrams / flow charts that visualize what you're writing about. Any thoughts on what these could be? (attached an example from a recent newsletter illustrating what I mean)

Hmmm, good question. Thinking about this, but nothing great comes to mind. Maybe a flow chart of how things are set up and where the borkages happen?

Like, a sample request flow?

Request comes in ---> Django server for feature flags ---> fetch feature flag definitions from Redis

  --- first option ---> try evaluating without the database ---> return result if evaluated
  --- second option ---> PgBouncer (connection pooler) ---> database to fetch person properties

And the arrows going to Redis and PgBouncer are the sources of problems & latency.
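
In code, that same flow looks roughly like the sketch below. This is a minimal illustration, not our actual implementation: the names (`evaluate_flag`, `NeedsPersonProperties`, `fetch_person_properties`) and the dict-based flag format are made up for this example.

```python
class NeedsPersonProperties(Exception):
    """Raised when a flag's conditions need properties we don't have locally."""


def evaluate_flag(flag: dict, properties: dict) -> bool:
    # Evaluate one flag using only the properties already in memory.
    for prop, expected in flag["conditions"].items():
        if prop not in properties:
            raise NeedsPersonProperties(prop)
        if properties[prop] != expected:
            return False
    return True


def evaluate_all_flags(definitions, request_properties, fetch_person_properties):
    # `definitions` would come from the Redis cache in the diagram above.
    results = {}
    for flag in definitions:
        try:
            # First option: evaluate without touching the database.
            results[flag["key"]] = evaluate_flag(flag, request_properties)
        except NeedsPersonProperties:
            # Second option: go through PgBouncer to Postgres for person
            # properties, the slow, failure-prone arrow in the diagram.
            person_properties = fetch_person_properties()
            results[flag["key"]] = evaluate_flag(flag, person_properties)
    return results


# Example usage with stubbed data:
flags = [{"key": "new-billing", "conditions": {"plan": "scale"}}]
print(evaluate_all_flags(flags, {"plan": "scale"}, lambda: {}))
# -> {'new-billing': True}
```

The point is just that the try/except split mirrors the two arrows: the happy path never touches Postgres, and the fallback is the only place where PgBouncer latency or outages can hurt.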

@ivanagas
Contributor

Screen Shot 2023-08-31 at 8 48 13 AM

How's this? Could potentially do ones for Partial flag evaluation and database down if this looks good.

@neilkakkar
Contributor Author

Ooh yes this is great!

@neilkakkar
Contributor Author

I'd just tilt them to make the flow clearer.

PostHog server on the left and top, SDK in the middle (both height- and width-wise), and client on the right and bottom.

Should read more naturally, with the arrows going something like:

image

@ivanagas
Contributor

Screen Shot 2023-08-31 at 9 32 06 AM

Partial flag evaluation

@ivanagas
Contributor

Screen Shot 2023-08-31 at 9 38 55 AM

@ivanagas
Contributor

Screen Shot 2023-08-31 at 9 41 19 AM

Another option

@neilkakkar
Contributor Author

Yes, the 2nd option is very nice.

@andyvan-ph
Contributor

@neilkakkar + @ivanagas: Thanks both for the graphic stuff. Have added both and done another light polish pass on the copy.

I was doing a mental Hacker News pre-mortem and I think the only thing this is missing is... evidence. We say we've made flags faster / more reliable, but there's nothing to back that up.

@neilkakkar is there something we can add near the end here? I don't think it needs to be super in-depth, it just needs something to prove it out.

@neilkakkar
Contributor Author

Hmm, we don't have our latency logs anymore because they're > 3 weeks old. So we can't show before & after, but I guess we can show current latency times,

and then the status page: https://status.posthog.com/uptime/1t4b8gf5psbc?page=3 and how the incident rate has gone down.

Will add a blurb to the end when I'm back tomorrow, thanks!

@neilkakkar
Contributor Author

Just added this section as an appendix.

@andyvan-ph andyvan-ph enabled auto-merge (squash) September 8, 2023 09:17
@andyvan-ph andyvan-ph merged commit 75930df into master Sep 8, 2023
2 checks passed
@andyvan-ph andyvan-ph deleted the neilkakkar-patch-2 branch September 8, 2023 09:33