![ab_testing_image](ab_testing_image.jpg)

As a Data Scientist at a leading online travel agency, you've been tasked with evaluating the impact of a new search ranking algorithm designed to improve conversion rates. The Product team is considering a full rollout, but only if the experiment shows a clear positive effect on the conversion rate and does not lead to a longer time to book.

They have shared two A/B test datasets: session-level booking data (`sessions_data.csv`) and the user-level control/variant assignment (`users_data.csv`). Your job is to analyze and interpret the results, determine whether the new ranking system delivers a statistically significant improvement, and provide a clear, data-driven recommendation.

## `sessions_data.csv`

| column | data type | description | 
|--------|-----------|-------------|
| `session_id` | `string` | Unique session identifier (unique for each row) |
| `user_id` | `string` | Unique user identifier (missing for non-logged-in users; each user can have multiple sessions) |
| `session_start_timestamp` | `string` | When a session started |
| `booking_timestamp` | `string` | When a booking was made (missing if no booking was made during a session) |
| `time_to_booking` | `float` | Time from the start of the session to the booking, in minutes (missing if no booking was made during the session) |
| `conversion` | `integer` | _New column to create:_ whether the session ended with a booking (0 if `booking_timestamp` or `time_to_booking` is null, otherwise 1) |
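
The `conversion` rule above marks a session as converted only when both booking fields are present. A minimal pandas sketch on a hypothetical three-session frame (the values are made up for illustration; only the column names come from the table):

```python
import numpy as np
import pandas as pd

# Hypothetical sessions: only the column names come from sessions_data.csv.
toy = pd.DataFrame({
    "session_id": ["s1", "s2", "s3"],
    "booking_timestamp": ["2025-01-03 16:20:00", None, "2025-01-05 09:10:00"],
    "time_to_booking": [18.7, np.nan, np.nan],
})

# 1 only when neither booking field is missing, otherwise 0.
toy["conversion"] = (
    toy["booking_timestamp"].notna() & toy["time_to_booking"].notna()
).astype(int)

print(toy["conversion"].tolist())  # [1, 0, 0]
```

Session `s3` has a timestamp but no `time_to_booking`, so under this rule it counts as not converted.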

<br>

## `users_data.csv`

| column | data type | description | 
|--------|-----------|-------------|
| `user_id` | `string` | Unique user identifier (only logged-in users in this table) |
| `experiment_group` | `string` | Control/variant assignment for the experiment (expected to be an even 50/50 split) |

<br>

The criteria for a full rollout ("full on") are the following:
- The primary metric (conversion) effect must be statistically significant and positive (an increase).
- The guardrail metric (time_to_booking) effect must either be statistically insignificant or positive (a decrease).
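
The two criteria can be written as a small decision function. This is a sketch: `full_on_decision` is a hypothetical helper, and its `alpha=0.10` default assumes the 90% confidence level used in this notebook. The guardrail passes when its effect is *not* significant, so the significance check has to be negated rather than used directly:

```python
def full_on_decision(p_primary, lift_primary, p_guardrail, change_guardrail, alpha=0.10):
    """Return "Yes" only if both rollout criteria hold (hypothetical helper)."""
    # Primary metric: significant AND an increase in conversion.
    primary_ok = p_primary < alpha and lift_primary > 0
    # Guardrail: NOT significant, OR a decrease (<= 0) in time to booking.
    guardrail_ok = p_guardrail >= alpha or change_guardrail <= 0
    return "Yes" if primary_ok and guardrail_ok else "No"


# Values reported later in this notebook.
print(full_on_decision(0.0002, 0.1422, 0.5365, -0.0079))  # Yes
# A significant increase in time to booking fails the guardrail.
print(full_on_decision(0.0002, 0.1422, 0.01, 0.05))  # No
```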

In [53]:
import pandas as pd
from scipy.stats import chisquare
from pingouin import ttest
from statsmodels.stats.proportion import proportions_ztest

In [54]:
sessions = pd.read_csv('sessions_data.csv')
users = pd.read_csv('users_data.csv')

In [55]:
sessions.sample(5)

|  | session_id | user_id | session_start_timestamp | booking_timestamp | time_to_booking |
|---|---|---|---|---|---|
| 1063 | RzFBMHZ7f5nnDa8Y | sw5Rgp5sPKqlj3T9 | 2025-01-03 16:01:17.013273716 | NaN | NaN |
| 1021 | sOHbBdx8SrSAx9fa | 4sY90paq4iwo2g3r | 2025-01-08 23:42:27.347571373 | NaN | NaN |
| 5709 | qtkg8ooNilFSMrzH | NaN | 2025-01-05 05:09:56.049395561 | NaN | NaN |
| 6214 | j857GHGLuUTs8yNq | 22T2b8ApMm4QV22G | 2025-01-10 12:19:40.278005838 | NaN | NaN |
| 7232 | p1qIpaOX1sTqajGe | H3W5wKidB22Usi8s | 2025-01-29 02:18:24.163240910 | NaN | NaN |


In [56]:
users.sample(5)

|  | user_id | experiment_group |
|---|---|---|
| 8775 | 2JruzYnIviQFFfcr | control |
| 1125 | NfWOvLgAY64Azf7F | control |
| 9042 | wSy5BTTU6rBYZZCG | variant |
| 6509 | UVVoUmk5gt0lJnYK | control |
| 8103 | LB3xnIDM8b5D0sST | variant |


### Your solution

In [57]:
confidence_level = 0.90  # Set the pre-defined confidence level (90%)
alpha = 1 - confidence_level  # Significance level for hypothesis tests

In [58]:
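# Start here, using as many cells as you require
# Inner join keeps only sessions from logged-in users who were assigned to a group
sessions_x_users = sessions.merge(users, on="user_id", how="inner")

# Create the conversion column: 1 only if both booking fields are present
sessions_x_users["conversion"] = (
    sessions_x_users["booking_timestamp"].notna()
    & sessions_x_users["time_to_booking"].notna()
).astype(int)

# Check that users are split evenly between the two groups
group_counts = users["experiment_group"].value_counts()

print(group_counts)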

variant    5009
control    4991
Name: experiment_group, dtype: int64
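
The inner merge keeps only sessions whose `user_id` appears in `users_data.csv`, so sessions from non-logged-in users (missing `user_id`) are excluded from the analysis. A sketch, on hypothetical toy frames, of how a left merge with `indicator=True` can count the sessions an inner merge would drop:

```python
import pandas as pd

# Hypothetical toy frames mirroring the key columns of the two files.
sessions_toy = pd.DataFrame({
    "session_id": ["s1", "s2", "s3", "s4"],
    "user_id": ["u1", "u2", None, "u1"],  # s3 is a non-logged-in session
})
users_toy = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "experiment_group": ["control", "variant"],
})

# A left merge with indicator=True labels each session by where its key was found.
check = sessions_toy.merge(users_toy, on="user_id", how="left", indicator=True)
dropped = int((check["_merge"] == "left_only").sum())

# The inner merge keeps only the sessions whose user_id matched.
inner = sessions_toy.merge(users_toy, on="user_id", how="inner")
print(dropped, len(inner))  # 1 3
```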


In [59]:
# Sample ratio mismatch (SRM) check: chi-squared goodness-of-fit test against an even 50/50 split
observed = [group_counts['control'], group_counts['variant']]

total = sum(observed)
expected = [total/2, total/2]

chi_stat, srm_chi2_pval = chisquare(f_obs=observed, f_exp=expected)

srm_chi2_pval = round(srm_chi2_pval, 4)

print(f"Chi-Square Statistic: {chi_stat}")
print(f"SRM-CHI2-PVAL: {srm_chi2_pval}")

Chi-Square Statistic: 0.0324
SRM-CHI2-PVAL: 0.8572
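
As a cross-check of the SRM cell, the statistic can be computed by hand: with 10,000 users and an expected 50/50 split, each group should hold 5,000 users, and $\chi^2 = \sum (O - E)^2 / E$ has one degree of freedom for two groups. A sketch, assuming SciPy's `chi2.sf` for the upper-tail probability:

```python
from scipy.stats import chi2

obs_counts = [4991, 5009]  # control, variant user counts from the cell above
exp_counts = [sum(obs_counts) / 2] * 2  # even 50/50 split -> 5000 each

# Goodness-of-fit statistic: sum of (O - E)^2 / E over both groups.
stat = sum((o - e) ** 2 / e for o, e in zip(obs_counts, exp_counts))
# Two categories -> 1 degree of freedom; sf gives P(X >= stat).
p_value = chi2.sf(stat, df=len(obs_counts) - 1)

print(round(stat, 4), round(p_value, 4))  # 0.0324 0.8572
```

A p-value this large gives no evidence of a sample ratio mismatch; a very small one would instead point to a problem with the assignment itself.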


## Guardrail Metric: Independent T-Test

In [60]:
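# time_to_booking only exists for converted sessions, so test the guardrail on those
booked = sessions_x_users[sessions_x_users["conversion"] == 1]

# Two-sided independent t-test (pingouin applies Welch's correction when group sizes differ)
guardrail_test = ttest(
    booked[booked["experiment_group"] == "variant"]["time_to_booking"],
    booked[booked["experiment_group"] == "control"]["time_to_booking"],
    alternative="two-sided"
)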

print(guardrail_test)

pval_guardrail = round(guardrail_test['p-val'].values[0], 4)

print(f"P-Value Guardrail: {pval_guardrail}")

               T          dof alternative  ...   cohen-d   BF10     power
T-test -0.618198  2575.103104   two-sided  ...  0.024224  0.053  0.094593

[1 rows x 8 columns]
P-Value Guardrail: 0.5365
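
The non-integer `dof` (about 2575.1) in the output indicates that Welch's correction for unequal variances was applied; with its default `correction="auto"`, pingouin uses Welch's test when the two samples differ in size. A sketch of Welch's statistic and the Welch-Satterthwaite degrees of freedom, checked against `scipy.stats.ttest_ind(..., equal_var=False)`, on hypothetical minutes-to-book values:

```python
import numpy as np
from scipy.stats import t as t_dist, ttest_ind

# Hypothetical time_to_booking samples in minutes (illustrative, not the experiment's data).
ttb_variant = np.array([12.1, 15.4, 9.8, 20.3, 14.0, 16.2])
ttb_control = np.array([13.5, 18.2, 11.1, 17.9, 15.0])

n1, n2 = len(ttb_variant), len(ttb_control)
# Squared standard error of each group mean (sample variance with ddof=1).
se1_sq = ttb_variant.var(ddof=1) / n1
se2_sq = ttb_control.var(ddof=1) / n2

# Welch's t: difference in means over the unpooled standard error.
t_manual = (ttb_variant.mean() - ttb_control.mean()) / np.sqrt(se1_sq + se2_sq)
# Welch-Satterthwaite degrees of freedom (generally not an integer).
dof_manual = (se1_sq + se2_sq) ** 2 / (se1_sq**2 / (n1 - 1) + se2_sq**2 / (n2 - 1))
# Two-sided p-value from the t distribution with those degrees of freedom.
p_manual = 2 * t_dist.sf(abs(t_manual), dof_manual)

# SciPy's Welch test should agree with the manual calculation.
res = ttest_ind(ttb_variant, ttb_control, equal_var=False)
print(f"t={t_manual:.4f} (scipy {res.statistic:.4f}), dof={dof_manual:.2f}, "
      f"p={p_manual:.4f} (scipy {res.pvalue:.4f})")
```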


## Effect Size

In [61]:
# Keep only sessions that belong to an experiment group
experiment_data = sessions_x_users[sessions_x_users['experiment_group'].notna()]

# Primary metric: conversion rate over all sessions in each group
control_conv = experiment_data[experiment_data["experiment_group"] == "control"]['conversion'].mean()
variant_conv = experiment_data[experiment_data["experiment_group"] == "variant"]['conversion'].mean()

# Relative lift of the variant over control
effect_size_primary = round(variant_conv / control_conv - 1, 4)

print(f"Control conversion rate: {control_conv:.4f}")
print(f"Variant conversion rate: {variant_conv:.4f}")
print(f"Effect Size primary: {effect_size_primary:.4f}")

# Guardrail metric: mean time to booking over converted sessions
booked = experiment_data[experiment_data["conversion"] == 1]

control_ttb = booked[booked["experiment_group"] == "control"]["time_to_booking"].mean()
variant_ttb = booked[booked["experiment_group"] == "variant"]["time_to_booking"].mean()

# Relative change in time to booking (negative = faster bookings in the variant)
effect_size_guardrail = round(variant_ttb / control_ttb - 1, 4)

print(f"Control AVG time to booking: {control_ttb:.4f} min")
print(f"Variant AVG time to booking: {variant_ttb:.4f} min")
print(f"Effect Size Guardrail: {effect_size_guardrail:.4f} (relative change)")

Control conversion rate: 0.1592
Variant conversion rate: 0.1819
Effect Size primary: 0.1422
Control AVG time to booking: 15.0124 min
Variant AVG time to booking: 14.8940 min
Effect Size Guardrail: -0.0079 (relative change)
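
Both effect sizes above are relative changes of the variant against control rather than absolute differences, so the guardrail effect is a unitless ratio, not a number of minutes:

$$
\text{effect} = \frac{\bar{x}_{\text{variant}}}{\bar{x}_{\text{control}}} - 1, \qquad \text{e.g.} \quad \frac{14.8940}{15.0124} - 1 \approx -0.0079
$$

Plugging the rounded conversion rates into the same formula gives $0.1819 / 0.1592 - 1 \approx 0.1426$; the printed 0.1422 is computed from the unrounded rates. In absolute terms, conversion rises by about $0.1819 - 0.1592 = 0.0227$, roughly 2.3 percentage points.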


## Making a decision

In [62]:
# Primary metric: two-sided two-proportion z-test on session-level conversion
experiment_data = sessions_x_users[sessions_x_users["experiment_group"].notna()]

control = experiment_data[experiment_data["experiment_group"] == "control"]
variant = experiment_data[experiment_data["experiment_group"] == "variant"]

# Conversions and sessions per group (variant first, so z > 0 means the variant converts more)
count = [variant["conversion"].sum(), control["conversion"].sum()]
nobs = [len(variant), len(control)]

z_stat, p_val_primary = proportions_ztest(count, nobs, alternative="two-sided")

pval_primary = round(p_val_primary, 4)

print(f"Z-Statistic: {z_stat}")
print(f"P Val (Primary): {p_val_primary}")

Z-Statistic: 3.722044623223758
P Val (Primary): 0.00019761608793587393
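
With two groups and its default arguments, `proportions_ztest` uses the pooled conversion rate to estimate the standard error under the null hypothesis of equal rates. A standard-library sketch of that calculation, using a hypothetical helper `two_proportion_ztest` and made-up counts (not the experiment's); as in the cell above, the variant is passed first, so a positive z means the variant converts more often:

```python
import math


def two_proportion_ztest(count_a, n_a, count_b, n_b):
    """Pooled two-sided z-test for p_a - p_b (hypothetical helper)."""
    p_a, p_b = count_a / n_a, count_b / n_b
    # Pool both groups to estimate the common rate under H0: p_a == p_b.
    p_pool = (count_a + count_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 180 of 1000 variant sessions vs 150 of 1000 control sessions convert.
z, p = two_proportion_ztest(180, 1000, 150, 1000)
print(round(z, 3), round(p, 4))
```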


In [66]:
# Primary metric: must be significant AND an increase in conversion
primary_significant = pval_primary < alpha
primary_positive = effect_size_primary > 0
primary_passes = primary_significant and primary_positive

# Guardrail: must be NOT significant OR a decrease in time to booking
guardrail_significant = pval_guardrail < alpha
guardrail_positive = effect_size_guardrail <= 0
guardrail_passes = (not guardrail_significant) or guardrail_positive

print(f"Primary Significant: {primary_significant} (p={pval_primary}, alpha={alpha:.2f})")
print(f"Primary Positive: {primary_positive} (effect={effect_size_primary})")
print(f"Primary Passes: {primary_passes}")
print()

print(f"Guardrail Significant: {guardrail_significant} (p={pval_guardrail}, alpha={alpha:.2f})")
print(f"Guardrail Positive: {guardrail_positive} (effect={effect_size_guardrail})")
print(f"Guardrail Passes: {guardrail_passes}")

decision_full_on = "Yes" if (primary_passes and guardrail_passes) else "No"
print(f"Final Decision: {decision_full_on}")

Primary Significant: True (p=0.0002, alpha=0.10)
Primary Positive: True (effect=0.1422)
Primary Passes: True

Guardrail Significant: False (p=0.5365, alpha=0.10)
Guardrail Positive: True (effect=-0.0079)
Guardrail Passes: True
Final Decision: Yes
