RFC: Resilient feature flags #74

Merged
merged 4 commits into from Jan 24, 2023
@neilkakkar (Contributor) commented Nov 25, 2022

Draft for now, need to flesh this out further, just jotting down core points

@neilkakkar changed the title from "Create resilient-feature-flags" to "RFC: Resilient feature flags" on Nov 25, 2022
@neilkakkar marked this pull request as ready for review on November 29, 2022 15:06
1. The database is down/unreachable: `decide` evaluation fails and returns 500
2. The servers are down/unreachable: requests from client libraries time out / error out.

(2) seems hard to defend against without a distributed app distribution (and then a resolver to go to the 'correct' app), but (1) is possible.

what is a distributed app distribution and what is a 'correct' app?

Contributor Author

Talk about terrible wording 😅. I basically mean multiple server deployments where one going down doesn't affect the others, like an edge server, plus a load balancer that can route each request to the closest healthy server.

I don’t think we should build an SDK that makes assumptions about the underlying infrastructure.

To build resilient software, use defensive programming: always hope for the best (100% uptime) but always prepare for the worst (everything is down). In other words, a multi-geo deployment might help you, but you should not rely on it.
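
A minimal sketch of that "prepare for the worst" idea, to make it concrete. Everything here is an assumption for illustration: `ResilientFlagClient`, `fetch`, and the flag keys are hypothetical names and not any real SDK's API. The client never raises on a failed fetch; it falls back to the last values it saw, and then to local defaults.

```python
from typing import Callable, Dict


class ResilientFlagClient:
    """Hypothetical client that degrades gracefully when the flag server fails."""

    def __init__(self, fetch: Callable[[], Dict[str, bool]], defaults: Dict[str, bool]):
        self._fetch = fetch                      # hypothetical call to the flag server
        self._defaults = dict(defaults)          # values shipped with the application
        self._last_known: Dict[str, bool] = {}   # last successful server response

    def refresh(self) -> bool:
        """Try to pull fresh flags. Return True on success, False on any failure."""
        try:
            flags = self._fetch()
        except Exception:
            # Deliberately broad: timeouts, 500s and DNS errors are all treated
            # as "server unavailable" and leave the last known values untouched.
            return False
        self._last_known = dict(flags)
        return True

    def is_enabled(self, key: str) -> bool:
        self.refresh()
        if key in self._last_known:
            return self._last_known[key]
        # Never seen this flag from the server: use the local default, else off.
        return self._defaults.get(key, False)
```

This only softens the failure. Once the server is unreachable, the values returned are stale or defaults, so fresh flag information still stops arriving.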

Contributor Author

I'm not assuming that at all; I'm just listing the ways things can go wrong here.

Even with defensive programming, the issue still remains that when servers go down, you stop getting flag information.

(a point below addresses this)
