# Product Ranking Optimization | A/B Testing Project

## Problem description
Suppose that an online grocery store called “Rimi” wants to test a new ranking algorithm that is intended to surface products more relevant to customers.

![user_funnel.drawio.png](images/rimi.png)

## Methodology

1. **Problem statement** - What is the goal of the experiment?
    - Understanding the nature of the product
    - Asking clarifying questions:
        - What is the user journey?
        - What is the success metric? It should be:
            - Measurable
            - Attributable
            - Sensitive
            - Timely
2. **Hypothesis testing** - What result do you hypothesize from the experiment?
    - Set up: 
        - Null hypothesis 
        - Alternative hypothesis 
        - Significance level
        - Statistical power
        - Minimum detectable effect (MDE)
3. **Design the Experiment** - What are your experiment parameters?
    - Determine:
        - Randomization unit
        - Target population in the experiment
        - Sample size
        - Duration of the experiment
4. **Run the Experiment** - What are the requirements for running an experiment?
    - Set up the necessary instrumentation to:
        - Collect data 
        - Analyze the results
    - Avoid peeking at p-values before the planned sample size is reached
5. **Validity Checks** - Did the experiment run soundly without errors or bias?
    - Check for:
        - Instrumentation Effect
        - External Factors
        - Selection Bias
        - Sample Ratio Mismatch
        - Novelty Effect
6. **Interpret Results** - Is the observed change in the metric both statistically and practically significant?
    - Assess the observed lift:
        - P-value
        - Confidence intervals
7. **Launch Decision** - Based on the results and trade-offs, should the change be launched?
    - Consider:
        - Metric Trade-Offs
        - Cost of Launching
        - Risk of committing a false positive (Type I error)

## Step 1 - Problem Statement

### Understanding the Nature of the Product

Rimi is an online grocery store that offers a wide range of products, including fresh produce, meat, dairy, baked goods, and more. The store uses a product ranking system or recommendation algorithm.

When a user enters keywords such as "meat" or "fruits," this algorithm generates a list of products that could be relevant to that customer, based on factors like their profile, purchase history, and other data.

If we modify this ranking algorithm, the suggested products may become more relevant to customers, which in turn should **boost sales** for the online store.


### User Journey 

![user_funnel.drawio.png](images/user_funnel.drawio.png)

Considering the user journey is crucial because it helps determine key factors later on, such as defining the success metric, identifying the target user population, and deciding at which stage of the journey a user should be considered as a participant in the experiment.

### Define the Success Metric

To define the success metric, we need to consider the following guiding principles:
1. **Measurable**
    - Is it a type of user behavior that can be accurately captured through your instrumentation or platform?
2. **Attributable**
    - "Attributable" means establishing a clear link between the experiment and the observed changes in metrics.
    - Example: If you are testing a new website design (treatment) and notice an increase in conversions (metric), for the result to be considered "attributable," you need to be sure that the increase is specifically due to the design change, and not, for example, due to an increase in traffic or a marketing campaign that occurred during the same period.
3. **Sensitive**
    - A metric is considered "sensitive" if it is responsive enough to detect significant effects from the applied modification.
    - You want to identify a metric with low variability to increase the likelihood of detecting true effects.
4. **Timely**
    - A/B experiments need to be quick, because experimentation is an iterative way to improve the product rapidly.
    - Therefore, consider what short-term behavior can serve as a proxy for the long-term desired behavior.


Our success metric is **Average Revenue Per User (ARPU)**, which we aim to increase. However, it's crucial that this improvement does not come at the expense of the **Conversion Rate**, which should remain stable or improve.
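As a minimal sketch of how these two metrics could be computed for one experiment group, the snippet below assumes that revenue is available as one number per user, with `0.0` for users who did not purchase, and that ARPU is averaged over all users in the group. The function name `summarize_group` and the toy data are hypothetical and only serve as an illustration.

```python
def summarize_group(revenue_per_user):
    """Return (ARPU, conversion rate) for one experiment group.

    revenue_per_user holds one number per user in the group,
    with 0.0 for users who did not purchase (an assumption of this sketch).
    """
    n_users = len(revenue_per_user)
    arpu = sum(revenue_per_user) / n_users
    conversion_rate = sum(1 for r in revenue_per_user if r > 0) / n_users
    return arpu, conversion_rate


# Hypothetical toy data: revenue in EUR for five users, two of whom purchased.
control = [0.0, 0.0, 42.5, 0.0, 57.5]
arpu, conversion_rate = summarize_group(control)
print(f"ARPU = {arpu:.2f} EUR, conversion rate = {conversion_rate:.0%}")
```

Computing both numbers from the same per-user records keeps the primary metric (ARPU) and the guardrail metric (conversion rate) consistent with each other.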


## Step 2 - Hypothesis testing


### State the Hypothesis Statement

**Null Hypothesis (H0)**: The average revenue per user (ARPU) is the same under the old and new ranking algorithms.

**Alternative Hypothesis (Ha)**: The average revenue per user (ARPU) differs between the old and new ranking algorithms.



### Set the Significance Level

**Alpha** = 0.05 <br> 
- If the p-value is less than 0.05, we reject H0 in favor of Ha. Otherwise, we fail to reject H0.
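The decision rule can be sketched with a large-sample two-sided z-test for a difference in means, using only the Python standard library. This is an illustration under stated assumptions, not the project's analysis code: the data are simulated, the +5 EUR lift is chosen purely for demonstration, and the helper name `two_sided_z_test` is hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def two_sided_z_test(control, treatment):
    """Large-sample two-sided z-test for a difference in means."""
    diff = mean(treatment) - mean(control)
    # Standard error of the difference, using unpooled sample variances
    se = sqrt(variance(control) / len(control)
              + variance(treatment) / len(treatment))
    z = diff / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return diff, p_value


# Simulated per-user revenue (assumption: roughly normal, mean 50, sd 15)
rng = random.Random(42)
control = [rng.gauss(50, 15) for _ in range(5000)]
treatment = [rng.gauss(55, 15) for _ in range(5000)]  # illustrative +5 EUR lift

alpha = 0.05
diff, p_value = two_sided_z_test(control, treatment)
reject_h0 = p_value < alpha
print(f"diff = {diff:.2f} EUR, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With real revenue data, which is usually skewed and contains many zeros, a Welch t-test or a bootstrap may be a better fit. The z-test is used here only because it needs no third-party packages.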



### Set the Statistical Power

**Statistical Power** = 0.8 <br> 
- Statistical power is the probability of detecting an effect if the alternative hypothesis is true.



### Set the Minimum Detectable Effect (MDE)

**MDE** = 3% <br> 
- If ARPU changes by at least 3%, the change is considered practically significant.

## Step 3 - Design the Experiment

### Set the Randomization Unit

**Randomization Unit** = User <br>
- This unit determines how participants are randomly assigned to groups (control and test) for the experiment. The individual user is the most common randomization unit, especially in digital A/B tests.
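One common way to implement user-level randomization is deterministic hashing of the user ID, so that the same user always lands in the same group. The sketch below is an assumption about how this could be done, not a description of Rimi's system. The function name `assign_group`, the experiment label `ranking_v2`, and the user ID format are all hypothetical.

```python
import hashlib


def assign_group(user_id: str, experiment: str = "ranking_v2",
                 treatment_share: float = 0.5) -> str:
    """Deterministically map a user to 'control' or 'treatment'."""
    # Salting with the experiment name keeps assignments independent across experiments
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Use the first 32 bits of the hash as a number in [0, 1)
    bucket = int(digest[:8], 16) / 0x100000000
    return "treatment" if bucket < treatment_share else "control"


# Hypothetical user IDs, used to check that the split is close to 50/50
groups = [assign_group(f"user_{i}") for i in range(10_000)]
share = groups.count("treatment") / len(groups)
print(f"treatment share = {share:.3f}")
```

Because the assignment depends only on the user ID and the experiment name, a returning user sees the same ranking algorithm on every visit.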


### Target Population in the Experiment

**Users** = Visitors who search for a product

- ![user_funnel.drawio.png](images/user_funnel.drawio.png)


### Determine the Sample Size

As a rule of thumb, the following formula is often used for rough estimates:

$$n \approx \frac{16\sigma^2}{\delta^2}$$ 

While the above formula can be useful for quick estimates, it is better to use more detailed methods for accurate results:

$$n = \frac{2(Z_{\alpha/2} + Z_\beta)^2 \cdot \sigma^2}{\delta^2}$$
- We can easily calculate this using Python and the `statsmodels` library.

#### Formula Explanation
- $n$: the required sample size for each group (control and experimental).
- $Z_{\alpha/2}$: the critical value of the standard normal distribution for the significance level ($\alpha$). It uses $\alpha/2$ because the test is two-tailed. For example, for a significance level of 0.05, the value of $Z_{\alpha/2}$ is approximately 1.96.
- $Z_\beta$: the critical value of the standard normal distribution for the test power ($1-\beta$), where $\beta$ is the probability of a Type II error (false negative). For example, for a power of 0.8 ($\beta = 0.2$), the value of $Z_\beta$ is approximately 0.84. The power of the test is the probability of detecting an effect if it exists, and it is commonly set at 80%.
- $\sigma$: the standard deviation of your metric, so $\sigma^2$ is its variance. In the case of comparing means, it represents how spread out the values of the metric are (e.g., Average Revenue Per User or ARPU).
- $\delta$: the minimum detectable effect (MDE) in absolute terms, that is, in the same units as the metric. It is the difference between the means of the control and experimental groups that you want to detect. The smaller $\delta$, the larger the sample size needed to accurately detect this difference.
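To make the two formulas concrete, here is a small standard-library sketch that plugs in the values this document assumes in the Assumptions section (mean ARPU of 50 EUR, $\sigma$ = 15 EUR) together with the 3% MDE, so that $\delta = 0.03 \times 50 = 1.5$ EUR. The variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

alpha, power = 0.05, 0.80
mean_arpu, sigma = 50.0, 15.0   # assumed values from the Assumptions section
delta = 0.03 * mean_arpu        # absolute MDE: 3% of 50 EUR = 1.5 EUR

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96
z_beta = NormalDist().inv_cdf(power)            # about 0.84

# Rule of thumb: n ~ 16 * sigma^2 / delta^2
n_rule_of_thumb = 16 * sigma**2 / delta**2
# Full formula: n = 2 * (z_alpha + z_beta)^2 * sigma^2 / delta^2
n_formula = 2 * (z_alpha + z_beta) ** 2 * sigma**2 / delta**2

print(f"rule of thumb: {ceil(n_rule_of_thumb)} users per group")
print(f"full formula:  {ceil(n_formula)} users per group")
```

The rule of thumb gives 1600 and the full formula about 1570 users per group. The factor 16 is simply $2(1.96 + 0.84)^2 \approx 15.7$ rounded up.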

#### Assumptions
Since we don’t have real data, we’ll estimate what it could look like based on industry averages.

##### Estimating ARPU
1. The average revenue per user (ARPU) in online grocery stores can vary significantly depending on how often customers place orders, their average basket size, and other factors.
2. Typical industry data:
    - ChatGPT suggests that ARPU for online grocery retailers often ranges from 20 to 100 euros, depending on the region and shopping frequency. The standard deviation ($\sigma$), on average, can range from 20% to 50% of the average ARPU.
3. Assumption:
    - **Average ARPU** = 50 euros
    - **Standard Deviation ($\sigma$)** = 15 euros (which corresponds to 30% of the average).

##### Estimating Conversion Rate
1. The conversion rate for online grocery stores is the percentage of users who complete a purchase out of the total number of website visitors.
2. Typical industry data:
    - Based on ChatGPT’s response, on average, the conversion rate for online grocery stores can range from 2% to 5%. However, grocery stores are a special case: a customer who visits with the intent to buy groceries is more likely to convert than a visitor to an apparel or electronics store.
    - For large retailers like Rimi, the conversion rate may be closer to the upper end of this range.
3. Assumption:
    - **Conversion rate**: 3-5% (which corresponds to the conversion rate for a typical online grocery store).

#### Calculations

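```python
from math import ceil

from statsmodels.stats.power import TTestIndPower

# Define the parameters
alpha = 0.05            # significance level
power = 0.8             # test power
mean_arpu = 50          # assumed mean ARPU (EUR)
std_dev = 15            # assumed standard deviation of ARPU (EUR)
mde = 0.03 * mean_arpu  # absolute MDE: 3% of the mean ARPU = 1.5 EUR

# solve_power expects a standardized effect size (Cohen's d = delta / sigma)
effect_size = mde / std_dev  # 1.5 / 15 = 0.1

# Calculate the required sample size per group
sample_size = TTestIndPower().solve_power(effect_size=effect_size,
                                          alpha=alpha,
                                          power=power,
                                          alternative='two-sided')

# Round up so that the achieved power does not fall below the target
sample_size = ceil(sample_size)

sample_size
```

1571

This is the required number of users in each group (control and test).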

### Duration of the Experiment

**Duration** = 1 to 2 weeks
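A rough way to sanity-check the duration is to divide the total required sample by the daily number of eligible users (visitors who search for a product) and round up to whole weeks so that every weekday is equally represented. The sketch below uses placeholder numbers: both `n_per_group=2000` and `daily_eligible_users=500` are hypothetical and should be replaced with the result of the sample size calculation and real traffic data.

```python
from math import ceil


def experiment_duration_days(n_per_group, daily_eligible_users,
                             n_groups=2, full_weeks=True):
    """Estimate how many days are needed to reach the required sample size."""
    days = ceil(n_per_group * n_groups / daily_eligible_users)
    if full_weeks:
        # Round up to whole weeks to cover weekly seasonality
        days = ceil(days / 7) * 7
    return days


# Placeholder inputs (assumptions, not real Rimi data)
days = experiment_duration_days(n_per_group=2000, daily_eligible_users=500)
print(f"planned duration: {days} days")
```

With these placeholder inputs the raw estimate is 8 days, which rounds up to 14 days, in line with the 1 to 2 week range above.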


---------

https://chatgpt.com/c/45501f5d-3cde-4532-ae5c-df61a7d0a207