Lukesmith bayesian power #2507

Open: wants to merge 31 commits into base: main

lukebrawleysmith
Contributor

Adding power and MDE calculation for the Bayesian framework.

I use a bisection method to find the Bayesian MDE. Currently, when searching for the upper bound of the MDE, I do not permit an effect size above 200% (finding an upper bound can be tricky, because power is decreasing in effect size after a certain point). I will revisit this next week and treat the upper bound with more care.

I still need to:

  • sharpen procedure for finding mde upper bound (if possible)
  • make another pass through documentation
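
For reference, here is a minimal sketch of the kind of bisection search described above. It is not the code in this PR: `PowerFn`, `MdeSearchOptions`, and `findMdeByBisection` are hypothetical names, and the sketch assumes power is non-decreasing on the bracket `[lower, upperCap]`, which is exactly the assumption that can fail once power starts decreasing in effect size.

```typescript
// Hypothetical signature: returns power in [0, 1] for a relative effect size.
type PowerFn = (effectSize: number) => number;

interface MdeSearchOptions {
  targetPower: number; // e.g. 0.8
  lower?: number; // smallest effect size considered
  upperCap?: number; // 2 mirrors the 200% cap mentioned above
  tolerance?: number; // stop once the bracket is this narrow
  maxIterations?: number;
}

// Returns the smallest effect size in the bracket whose power reaches the
// target, or undefined when even the cap does not reach it.
function findMdeByBisection(
  powerFn: PowerFn,
  opts: MdeSearchOptions
): number | undefined {
  const {
    targetPower,
    lower = 0,
    upperCap = 2,
    tolerance = 1e-6,
    maxIterations = 200,
  } = opts;

  // Assumption: power is non-decreasing on [lower, upperCap].
  if (powerFn(upperCap) < targetPower) return undefined;

  let lo = lower;
  let hi = upperCap; // invariant: powerFn(hi) >= targetPower
  for (let i = 0; i < maxIterations && hi - lo > tolerance; i++) {
    const mid = (lo + hi) / 2;
    if (powerFn(mid) >= targetPower) {
      hi = mid;
    } else {
      lo = mid;
    }
  }
  return hi;
}
```

Returning `undefined` rather than a sentinel number makes the "no solution inside the bracket" case explicit to the caller.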

github-actions bot commented May 10, 2024

Your preview environment pr-2507-bttf has been deployed.

Preview environment endpoints are available at:

lukesonnet changed the base branch from main to node-sdk on May 16, 2024 18:24
lukesonnet changed the base branch from node-sdk to main on May 16, 2024 18:24
Contributor

lukesonnet left a comment

Some smaller comments and one bigger comment.

Broad question: is the unit test here set up to mirror the Python notebook, so that we should treat the Python notebook as the source of truth for power validity?

packages/front-end/components/PowerCalculation/types.ts (outdated, resolved)
packages/back-end/types/stats.d.ts (outdated, resolved)
packages/front-end/components/PowerCalculation/stats.ts (outdated, resolved)
true
);
if (maxPower < power) {
console.log(`failing at iteration j: %d`, j);
Contributor

These will need to be logged later.

How frequently do we hit this case? Is this the case where the UI will show N/A, "-", or something similar?

Contributor Author

So far, this case has never happened. I had this in for testing purposes; happy to remove it if you want.

I haven't rigorously resolved the upper-bound issue yet, so I kept it in for now.

Contributor

If it could possibly happen, then we should have proper error handling for it, if it is impossible, we should remove it.
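
One way to make the "could possibly happen" branch explicit is sketched below: return a typed result instead of logging. This is only an illustration under assumptions. `MdeResult`, `resolveMde`, and `formatMde` are hypothetical names, not the types used in this PR.

```typescript
// Hypothetical result type: either a usable MDE or an explicit reason why
// no MDE exists for the given inputs.
type MdeResult =
  | { type: "success"; mde: number }
  | { type: "error"; description: string };

// `solve` stands in for whatever search computes the MDE once a solution is
// known to exist.
function resolveMde(
  maxPower: number,
  targetPower: number,
  solve: () => number
): MdeResult {
  if (maxPower < targetPower) {
    return {
      type: "error",
      description: `max attainable power ${maxPower} is below target ${targetPower}`,
    };
  }
  return { type: "success", mde: solve() };
}

// The UI can then decide how to render the error case, for example as "N/A".
function formatMde(result: MdeResult): string {
  return result.type === "success"
    ? `${(result.mde * 100).toFixed(1)}%`
    : "N/A";
}
```

With a result type like this, callers must handle the failure case before they can read `mde`, so a sentinel value or a stray `console.log` is not needed.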

return mdeResults;
}

export function powerMetricWeeksBayesian(
Contributor

I see why you asked about copying. There's a lot of complex logic here that it might benefit us to only write once. Take a look at this slight rewrite https://github.com/growthbook/growthbook/compare/ls/power-structure?expand=1

I've done two key things:

  • Make some of the code a bit more typescript friendly (e.g. don't unpack a bunch of stuff, use .forEach)
  • Ask you to change powerEst and powerEstBayesian to instead take an object with your parameters specified. Then build those parameters and specify which function to use in one single if-else block.

My code is pseudo-code, since I didn't create those other types; it can just serve as a reference.

Contributor Author

I incorporated your pseudo-code. However, I didn't define new types as inputs for powerEst and powerEstBayesian, because the linter complained about a powerFunction that accepts different input types. The way I have it now is simple, but if you have a more elegant approach, I'm open to incorporating it. Also, I made statsEngineSettings an attribute of powerSettings rather than a standalone object, because we always want the frequentist statsEngineSettings to be part of PowerCalculationParams and the Bayesian statsEngineSettings to be associated with PowerCalculationParamsBayesian. Again, if you have a better suggestion, I'm open.
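
For illustration only, one possible shape for the "parameter object plus a single if-else" idea discussed in this thread is sketched below. The field names, `isFrequentist`, `PowerFunctions`, and `estimatePower` are assumptions made up for this sketch; only `PowerCalculationParams`, `PowerCalculationParamsBayesian`, and `statsEngineSettings` are names taken from the discussion, and their real definitions may differ.

```typescript
// Illustrative settings types; the real ones in the PR may carry more fields.
interface FrequentistEngineSettings {
  type: "frequentist";
  sequentialTesting: false | number;
}

interface BayesianEngineSettings {
  type: "bayesian";
}

interface PowerParamsBase {
  targetPower: number;
  nVariations: number;
}

interface PowerCalculationParams extends PowerParamsBase {
  statsEngineSettings: FrequentistEngineSettings;
}

interface PowerCalculationParamsBayesian extends PowerParamsBase {
  statsEngineSettings: BayesianEngineSettings;
}

type AnyPowerParams = PowerCalculationParams | PowerCalculationParamsBayesian;

// The discriminant lives on a nested object, so a user-defined type guard
// keeps the narrowing explicit.
function isFrequentist(p: AnyPowerParams): p is PowerCalculationParams {
  return p.statsEngineSettings.type === "frequentist";
}

// Each engine's power function takes exactly its own parameter type.
interface PowerFunctions {
  frequentist: (p: PowerCalculationParams) => number;
  bayesian: (p: PowerCalculationParamsBayesian) => number;
}

// The single if-else block that picks the function for the engine in use.
function estimatePower(params: AnyPowerParams, fns: PowerFunctions): number {
  if (isFrequentist(params)) {
    return fns.frequentist(params);
  } else {
    return fns.bayesian(params);
  }
}
```

Keeping the engine settings inside the parameter object, as described above, means the engine type and its parameters cannot drift apart; the type guard is the only place that inspects the engine.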

github-actions bot commented May 16, 2024

Deploy preview for docs ready!

✅ Preview
https://docs-nyobufaby-growthbook.vercel.app

Built with commit 998d5de.
This pull request is being automatically deployed with vercel-action

n: number,
nVariations: number,
alpha: number = 0.05,
twoTailed: boolean = true,
sequentialTuningParameter = 0
sequentialTesting: false | number
Contributor

In general I think having something typed as either false | number isn't the best approach if we can avoid it. It's a bit confusing, but no need to change it right now.

Contributor Author

I agree. I can talk to Romain about this; he suggested this input when only frequentist power analysis was available. Now that we are implementing Bayesian, I can see if we can simplify.
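
As a possible direction for that simplification, here is a hedged sketch of a discriminated union that could replace `false | number`. `SequentialTesting`, `fromLegacy`, and `describeSequentialTesting` are hypothetical names. One concrete hazard with `false | number` is that a tuning parameter of `0` is falsy, so a plain truthiness check would treat it the same as `false`.

```typescript
// Hypothetical replacement for a `false | number` parameter.
type SequentialTesting =
  | { enabled: false }
  | { enabled: true; tuningParameter: number };

// Converts the legacy representation. Comparing against `false` explicitly
// keeps 0 as a valid tuning parameter.
function fromLegacy(value: false | number): SequentialTesting {
  return value === false
    ? { enabled: false }
    : { enabled: true, tuningParameter: value };
}

function describeSequentialTesting(s: SequentialTesting): string {
  return s.enabled
    ? `sequential testing on (tuning parameter ${s.tuningParameter})`
    : "sequential testing off";
}
```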

mde: -999,
};
if (powerSettings.statsEngineSettings.type === "frequentist") {
thisPower = powerEst(
Contributor

Should we call these methods powerEstFrequentist and findMdeFrequentist for consistency?

Contributor Author

done

Contributor Author

Not sure why there is no option to reply to the comment below ("If it could possibly happen, then we should have proper error handling for it, if it is impossible, we should remove it."), but I have revamped the Bayesian MDE code, and this check no longer exists.

true
);
if (maxPower < power) {
console.log(`failing at iteration j: %d`, j);
Contributor

If it could possibly happen, then we should have proper error handling for it, if it is impossible, we should remove it.

@@ -116,14 +124,14 @@ Frequently asked questions:
3. What if my experiment is low-powered? Should I still run it? The biggest cost to running a low-powered experiment is that your results will probably be noisy, and you will face ambiguity in your rollout/rollback decision. That being said, you will probably still have learnings from your experiment.
4. What does "N/A" mean for my MDE result? It means there is no solution for the MDE, given the current number of weekly users, control mean, and control variance. Increase your number of weekly users or reduce your number of treatment variations.

## GrowthBook implementation
## GrowthBook implementation (frequentist)
Contributor

At this point, this documentation is really long and complicated and I'm worried about having it be public facing when so few people will find it relevant, and the few people who are technical and are curious about our approach cannot parse this information effectively.

Can we reformat and simplify as much as possible?

Some suggestions:

  • Clear formulae for each object that we care about (Power = ...; MDE = ...)
  • Plain language summary of our approach?
  • Minimal recreation of concepts explained elsewhere (e.g. var(\Delta))
  • Less explanation of the steps to solve the math, and a bit more reliance on English to explain how we ended up with the final result. We can keep more documentation in an internal Notion doc.
  • More signposting and better formatting. This document is a wall of uneven line breaks and interspersed equations and plain text. Can we consider using more sub-headers, and bolded concepts like "Power" with their equation right below them?

This comment is asking for a pretty big overhaul, but this section is a bit messy, and I don't think it does much to build trust in the hard work you've done to get this right.

This applies to both Freq and Bayesian power sections.

@@ -107,6 +107,14 @@ In clinical trials, the standard is 80%. This means that if you were to run your

That being said, running an experiment with less than 80% power can still help your business. The purpose of an experiment is to learn about your business, not simply to roll out features that achieve statistically significant improvement. The biggest cost to running low-powered experiments is that your results will be noisy. This usually leads to ambiguity in the rollout decision.

## How do I run Bayesian power analysis?

For Bayesian power analysis, you specify the prior distribution of the treatment effect (see here [to be written] for guidance regarding prior selection). We then estimate Bayesian power, which is the probability that the $(1 - \alpha)$ credible interval after the test does not contain 0.
Contributor

This isn't the flow most people will use, since they will largely accept the default priors for that metric. Maybe we need to hold off on this section until we have more details on what users will actually do, but I think for V1 of Bayesian power users can't modify priors, so this section is misleading.
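
To make the definition of Bayesian power quoted above concrete, here is a worked sketch under simplifying assumptions that are not taken from the PR: a normal prior $\delta \sim N(\mu_0, \sigma_0^2)$ on the treatment effect, and an estimate $\hat{\delta} \mid \delta \sim N(\delta, s^2)$ with a known standard error $s$ that does not depend on $\delta$. The symbols $\mu_0$, $\sigma_0$, $s$, $v$, and $m$ are introduced here for illustration only. The posterior is normal with variance $v$, and the $(1 - \alpha)$ credible interval excludes 0 exactly when the absolute posterior mean exceeds $z_{1-\alpha/2}\sqrt{v}$, which gives:

```latex
\begin{aligned}
v &= \left(\frac{1}{\sigma_0^2} + \frac{1}{s^2}\right)^{-1},
\qquad
m = v\left(\frac{\mu_0}{\sigma_0^2} + \frac{\delta}{s^2}\right), \\
\text{Power}(\delta)
  &= 1 - \Phi\!\left(\frac{z_{1-\alpha/2}\sqrt{v} - m}{v/s}\right)
       + \Phi\!\left(\frac{-z_{1-\alpha/2}\sqrt{v} - m}{v/s}\right).
\end{aligned}
```

Here $m$ is the expected posterior mean when the true effect is $\delta$, $v/s$ is the sampling standard deviation of the posterior mean, and $\Phi$ is the standard normal CDF. The implementation under review may differ, for example if the variance of the estimate depends on the effect size.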

* Almost done wiring.

* Almost there.

* Default to org setting for engine.

* Keep values in sync.

* Split metric params.

3 participants