# Confidence Interval 

We can estimate population parameters using point estimates, which are single values calculated from sample data.

One way to understand the reliability of a point estimate is to consider its sampling distribution, which is the distribution the statistic would follow if we drew many samples from the population. The spread of this distribution tells us how far a single estimate is likely to fall from the population parameter. In practice, however, it is rarely feasible to draw more than one actual sample from the population.

In such cases, confidence interval estimation can be used to provide a range of plausible values for the population parameter from a single sample.

A confidence interval is a range calculated around a point estimate. It is specified with a confidence level, which describes long-run behavior: if we were to repeat the sampling process many times and construct a confidence interval from each sample, the true population parameter would be contained in approximately that percentage of the intervals. For example, at a 95% confidence level, if we drew 100 samples and calculated a 95% confidence interval for each, we would expect about 95 of these intervals to include the true population value.
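The repeated-sampling interpretation can be checked with a small simulation. The sketch below is illustrative only: it assumes a normal population with a known standard deviation, and the function name and all parameter values (`mu`, `sigma`, `n`, `trials`) are made up for the example.

```python
import random
from statistics import NormalDist, mean


def coverage_simulation(mu=10.0, sigma=2.0, n=50, trials=2000, level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw a fresh sample and build an interval around its mean.
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = mean(sample)
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / trials


coverage = coverage_simulation()
print(f"Empirical coverage: {coverage:.3f}")
```

The printed fraction should land close to the nominal level, with some simulation noise because only a finite number of intervals is drawn.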



**Formal Introduction to Confidence Intervals:**

Let $\theta$ be an unknown population parameter that we wish to estimate. Suppose we have obtained a random sample $X_1, X_2, ..., X_n$ from the population with a probability distribution $F(x; \theta)$. Let $\hat{\theta}(X_1, X_2, ..., X_n)$ be a point estimator of $\theta$, which is a function of the sample.

A **$100(1-\alpha)\%$ confidence interval** for the parameter $\theta$ is an interval $[L(X_1, ..., X_n), U(X_1, ..., X_n)]$, where $L$ and $U$ are functions of the sample, such that the probability that this random interval contains the true value of $\theta$ is equal to $1-\alpha$. Formally:

$$P(L(X_1, ..., X_n) \leq \theta \leq U(X_1, ..., X_n)) = 1 - \alpha$$

Here:

* $\theta$ represents the true, unknown population parameter (e.g., population mean, population proportion).
* $X_1, ..., X_n$ is the random sample drawn from the population.
* $\hat{\theta}(X_1, ..., X_n)$ is the point estimator of $\theta$ (e.g., sample mean $\bar{X}$, sample proportion $\hat{p}$).
* $[L(X_1, ..., X_n), U(X_1, ..., X_n)]$ is the **confidence interval**, with $L$ being the lower limit and $U$ being the upper limit, both calculated from the sample data. These limits are random variables because they depend on the random sample.
* $1-\alpha$ is the **confidence level**, a pre-specified probability (typically 0.95 for a 95% confidence interval, corresponding to $\alpha = 0.05$). This probability reflects the long-run frequency with which intervals constructed using this method will contain the true parameter.
* $\alpha$ is the **significance level**, representing the probability that a random interval constructed using this method will *not* contain the true parameter.

**Interpretation of the Confidence Level (Formal):**

The statement $P(L \leq \theta \leq U) = 1 - \alpha$ should be interpreted in terms of repeated sampling. If we were to draw an infinite number of random samples of the same size $n$ from the population and construct a $100(1-\alpha)\%$ confidence interval for $\theta$ based on each sample, then in the long run, $100(1-\alpha)\%$ of these constructed intervals would contain the true, fixed value of $\theta$.

**Construction of Confidence Intervals (General Approach):**

The construction of a confidence interval typically involves:

1.  **Identifying a suitable point estimator** $\hat{\theta}$ for the parameter $\theta$.
2.  **Determining the sampling distribution** of the estimator $\hat{\theta}$. This often relies on the Central Limit Theorem for large samples or knowledge of the population distribution for small samples.
3.  **Finding the critical values** from the sampling distribution that correspond to the desired confidence level $1-\alpha$. These critical values define the boundaries that capture the central $1-\alpha$ probability.
4.  **Constructing the interval** using the point estimate and the margin of error, which is the product of the critical value and the standard error of the estimator.

For example, for the population mean $\mu$ with a known population standard deviation $\sigma$ and a large sample size $n$ (or a normally distributed population), a $100(1-\alpha)\%$ confidence interval is given by:

$$\bar{X} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$$

where $\bar{X}$ is the sample mean and $z_{\alpha/2}$ is the $z$-score such that $P(Z > z_{\alpha/2}) = \alpha/2$ for a standard normal random variable $Z$.
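As a minimal sketch of this formula, the function below computes the interval with Python's standard library. The function name and the sample values are hypothetical, and the known $\sigma$ is simply assumed.

```python
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, level=0.95):
    """Confidence interval for the mean when the population sigma is known."""
    n = len(sample)
    alpha = 1 - level
    # z_{alpha/2}: the point with upper-tail probability alpha/2.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    xbar = mean(sample)
    margin = z * sigma / n ** 0.5  # critical value times standard error
    return xbar - margin, xbar + margin


# Illustrative measurements; sigma = 0.5 is an assumed known value.
sample = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2, 9.7, 10.3]
low, high = z_confidence_interval(sample, sigma=0.5)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

With a sample this small, the interval is only trustworthy if the population itself is close to normal; for large $n$ the Central Limit Theorem relaxes that requirement.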




**Formal Introduction to Confidence Intervals in A/B Testing:**

In the realm of A/B testing, our primary goal is often to estimate the true difference in a key performance indicator (KPI) – such as conversion rate, click-through rate, or average revenue per user – between two or more variants (e.g., control 'A' and treatment 'B') across the entire user population. We achieve this by exposing a random sample of users to each variant and observing their behavior.

Let $\mu_A$ be the true population mean of the KPI for variant A, and $\mu_B$ be the true population mean for variant B. Our parameter of interest is often the **true difference in means**, $\delta = \mu_B - \mu_A$.

From our A/B test, we obtain sample means $\bar{X}_A$ and $\bar{X}_B$ based on sample sizes $n_A$ and $n_B$ for each variant, respectively. The **point estimate** for the true difference in means is $\hat{\delta} = \bar{X}_B - \bar{X}_A$.

A **$100(1-\alpha)\%$ confidence interval for the true difference in means ($\delta$)** is a random interval $[L(\text{Data}_A, \text{Data}_B), U(\text{Data}_A, \text{Data}_B)]$, calculated from the observed data of both variants, such that the probability that this interval contains the true difference $\delta$ is equal to $1-\alpha$. Formally:

$$P(L(X_{A1}, ..., X_{An_A}, X_{B1}, ..., X_{Bn_B}) \leq \mu_B - \mu_A \leq U(X_{A1}, ..., X_{An_A}, X_{B1}, ..., X_{Bn_B})) = 1 - \alpha$$

Here:

* $\mu_B - \mu_A$ is the true, unknown difference in the population means of the KPI between variants B and A.
* $X_{Ai}$ and $X_{Bj}$ represent the individual user observations for variants A and B, respectively.
* $\bar{X}_B - \bar{X}_A$ is the point estimate of this difference.
* $[L, U]$ are the **lower and upper bounds of the confidence interval**, calculated from the sample data of both variants. These bounds are random variables dependent on the random assignment and user behavior.
* $1-\alpha$ is the **confidence level** (e.g., 0.95), representing the long-run frequency with which intervals constructed using this method will contain the true difference in population means.

**Interpretation in A/B Testing (Formal):**

A $100(1-\alpha)\%$ confidence interval for the difference in means implies that if we were to repeat our A/B test many times under identical conditions and construct a confidence interval for the difference in means for each test, then approximately $100(1-\alpha)\%$ of these intervals would contain the true difference in the population means.

**Construction in A/B Testing (Common Approach for Large Samples):**

Assuming sufficiently large sample sizes (invoking the Central Limit Theorem for the sample means), the confidence interval for the difference in means is often constructed as:

$$(\bar{X}_B - \bar{X}_A) \pm z_{\alpha/2} \cdot SE(\bar{X}_B - \bar{X}_A)$$

where:

* $\bar{X}_A$ and $\bar{X}_B$ are the sample means of the KPI for variants A and B.
* $z_{\alpha/2}$ is the critical value from the standard normal distribution corresponding to the desired confidence level.
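* $SE(\bar{X}_B - \bar{X}_A)$ is the standard error of the difference in means, which depends on the sample standard deviations and sample sizes of both groups. For proportions (such as conversion rates), the variance of a 0/1 outcome is estimated by $\hat{p}(1-\hat{p})$, so the standard error with separate variances is $\sqrt{\hat{p}_A(1-\hat{p}_A)/n_A + \hat{p}_B(1-\hat{p}_B)/n_B}$. The pooled-variance form is typically used for hypothesis tests rather than for confidence intervals.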
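For conversion rates, a rough sketch of this interval might look as follows. It assumes the unpooled (separate-variance) standard error $\sqrt{\hat{p}_A(1-\hat{p}_A)/n_A + \hat{p}_B(1-\hat{p}_B)/n_B}$ and large samples; the function name and the conversion counts are invented for illustration.

```python
from statistics import NormalDist


def proportion_diff_ci(conv_a, n_a, conv_b, n_b, level=0.95):
    """Large-sample CI for p_B - p_A using separate (unpooled) variances."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # A 0/1 outcome has estimated variance p * (1 - p).
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    diff = p_b - p_a
    return diff - z * se, diff + z * se


# Hypothetical test: 480 of 10,000 users convert on A, 540 of 10,000 on B.
low, high = proportion_diff_ci(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
print(f"95% CI for p_B - p_A: [{low:.4f}, {high:.4f}]")
```

In this made-up example the interval narrowly includes zero, so the observed lift of 0.6 percentage points would not be distinguishable from no effect at the 95% level.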



The steps below estimate the standard error of the difference between two independent sample means, which is needed to construct confidence intervals in A/B testing.

**Steps to Estimate the Standard Error of the Difference Between Two Means:**

Assume you have two independent samples, Group A with $n_A$ observations and Group B with $n_B$ observations, and you've calculated their respective sample means ($\bar{X}_A$ and $\bar{X}_B$) and sample standard deviations ($s_A$ and $s_B$).

The standard error of the difference between the two sample means ($SE(\bar{X}_B - \bar{X}_A)$) estimates the variability of the difference between the means of these two groups if you were to repeat the sampling process many times. Here are the steps to calculate it:

**Step 1: Calculate the Sample Variance for Each Group:**

The sample variance for each group is the square of the sample standard deviation:

* For Group A: $s_A^2 = \frac{\sum_{i=1}^{n_A} (X_{Ai} - \bar{X}_A)^2}{n_A - 1}$
* For Group B: $s_B^2 = \frac{\sum_{j=1}^{n_B} (X_{Bj} - \bar{X}_B)^2}{n_B - 1}$

Note that we divide by $n-1$ (Bessel's correction) to get an unbiased estimate of the population variance.

**Step 2: Divide the Sample Variance by the Sample Size for Each Group:**

This step estimates the variance of the sampling distribution of the mean for each group:

* For Group A: $\frac{s_A^2}{n_A}$
* For Group B: $\frac{s_B^2}{n_B}$

**Step 3: Sum the Results from Step 2:**

Add the estimated variances of the sampling distributions of the means:

$$\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}$$

**Step 4: Take the Square Root of the Sum:**

The standard error of the difference between the two sample means is the square root of the value obtained in Step 3:

$$SE(\bar{X}_B - \bar{X}_A) = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$
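The four steps can be sketched in a few lines of Python. This is an illustrative helper, not a reference implementation; the function name and the toy data are assumptions. `statistics.variance` already divides by $n-1$, which matches Step 1.

```python
from statistics import variance


def se_diff_means(group_a, group_b):
    """Standard error of the difference between two independent sample means."""
    # Step 1: sample variances (statistics.variance divides by n - 1).
    var_a = variance(group_a)
    var_b = variance(group_b)
    # Step 2: estimated variance of each sample mean.
    var_mean_a = var_a / len(group_a)
    var_mean_b = var_b / len(group_b)
    # Steps 3 and 4: sum the two, then take the square root.
    return (var_mean_a + var_mean_b) ** 0.5


# Toy data chosen so the arithmetic is easy to follow by hand.
group_a = [2, 4, 4, 6]  # s_A^2 = 8/3,  s_A^2 / n_A = 2/3
group_b = [1, 3, 5, 7]  # s_B^2 = 20/3, s_B^2 / n_B = 5/3
se = se_diff_means(group_a, group_b)
print(f"SE of the difference: {se:.4f}")
```

For these values the sum in Step 3 is $2/3 + 5/3 = 7/3$, so the standard error is $\sqrt{7/3}$.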

**In Summary (Formula):**

The formula for the estimated standard error of the difference between the means of two independent samples is:

$$SE(\bar{X}_B - \bar{X}_A) = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$

where:

* $s_A$ is the sample standard deviation of Group A.
* $s_B$ is the sample standard deviation of Group B.
* $n_A$ is the sample size of Group A.
* $n_B$ is the sample size of Group B.

**Why these steps?**

This formula arises from the properties of variances of independent random variables. If you have two independent random variables $Y_A$ and $Y_B$, then the variance of their difference ($Y_B - Y_A$) is the sum of their individual variances:

$$Var(Y_B - Y_A) = Var(Y_B) + Var(-Y_A) = Var(Y_B) + (-1)^2 Var(Y_A) = Var(Y_B) + Var(Y_A)$$
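This identity can be checked numerically. The sketch below draws two independent normal samples with assumed variances of 4 and 9 (the distributions, seed, and sample size are arbitrary choices for the example) and compares the variance of the difference with the sum of the variances.

```python
import random
from statistics import variance

rng = random.Random(42)
n = 50_000

# Two independent variables: Var(Y_A) = 2^2 = 4, Var(Y_B) = 3^2 = 9.
y_a = [rng.gauss(0, 2) for _ in range(n)]
y_b = [rng.gauss(1, 3) for _ in range(n)]

diff = [b - a for a, b in zip(y_a, y_b)]

var_diff = variance(diff)
var_sum = variance(y_a) + variance(y_b)

print(f"Var(Y_B - Y_A)      = {var_diff:.3f}")
print(f"Var(Y_B) + Var(Y_A) = {var_sum:.3f}")
```

Both printed values should be near 13. They differ slightly because the sample covariance between two independently drawn samples is close to, but not exactly, zero.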

In our case, $\bar{X}_A$ is an estimator of the population mean $\mu_A$ with an estimated variance of $\frac{s_A^2}{n_A}$, and $\bar{X}_B$ is an estimator of $\mu_B$ with an estimated variance of $\frac{s_B^2}{n_B}$. Since the samples are independent, the estimated variance of the difference of the sample means ($\bar{X}_B - \bar{X}_A$) is the sum of these two estimated variances. The standard error is then the square root of this sum.

This standard error is a crucial component in constructing confidence intervals for the true difference in population means ($\mu_B - \mu_A$) in A/B testing. It tells us how much variability we can expect in our estimate of the treatment effect due to random sampling.
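Putting the pieces together, a large-sample interval for $\mu_B - \mu_A$ could be computed roughly as below. This is a sketch under the normal approximation with the critical value $z_{\alpha/2}$; the function name and the simulated revenue-like data (including the true lift of 0.5) are assumptions made for the example.

```python
import random
from statistics import NormalDist, mean, variance


def diff_in_means_ci(group_a, group_b, level=0.95):
    """Large-sample CI for mu_B - mu_A from two independent samples."""
    diff = mean(group_b) - mean(group_a)  # point estimate of the difference
    se = (variance(group_a) / len(group_a)
          + variance(group_b) / len(group_b)) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff - z * se, diff + z * se


# Simulated A/B data: variant B has an assumed true lift of 0.5.
rng = random.Random(7)
group_a = [rng.gauss(20.0, 3.0) for _ in range(5000)]
group_b = [rng.gauss(20.5, 3.0) for _ in range(5000)]

low, high = diff_in_means_ci(group_a, group_b)
print(f"95% CI for mu_B - mu_A: [{low:.3f}, {high:.3f}]")
```

For small samples, a $t$ critical value with appropriately chosen degrees of freedom is the more usual choice than $z_{\alpha/2}$; the normal version is kept here to match the large-sample formula in the text.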