# A/B Test - Online Education Course - Page Dwell Time

In [59]:
import pandas as pd 
import numpy as np
from scipy import stats

pd.set_option('display.float_format',lambda x : '%.2f' % x)
np.set_printoptions(suppress=True)

In [2]:
df = pd.read_csv('course_page_actions.csv')

Unnamed: 0,timestamp,id,group,action,duration
0,2016-09-24 17:14:52.012145,261869,experiment,view,130.545004
1,2016-09-24 18:45:09.645857,226546,experiment,view,159.862440
2,2016-09-24 19:16:21.002533,286353,experiment,view,79.349315
3,2016-09-24 19:43:06.927785,842279,experiment,view,55.536126
4,2016-09-24 21:08:22.790333,781883,experiment,view,204.322437
...,...,...,...,...,...
4044,2017-01-18 09:39:08.046251,931490,control,view,58.846204
4045,2017-01-18 09:44:15.239671,410222,experiment,enroll,101.231821
4046,2017-01-18 09:56:26.948171,364458,control,view,293.490566
4047,2017-01-18 10:10:18.293253,443603,experiment,view,149.026959


Deduplicate on the group column to check whether it is clean, i.e. contains only the expected labels

In [7]:
df.drop_duplicates(subset=['group'])

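A more direct way to inspect the `group` column is to count its distinct labels and compare them with the labels you expect. The sketch below runs on a small made-up DataFrame rather than the course data; `toy` and `expected_groups` are names introduced here for illustration only.

```python
import pandas as pd

# Toy stand-in for course_page_actions.csv; the values are made up for illustration.
toy = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "group": ["experiment", "control", "control", "experiment", "Control "],
})

expected_groups = {"control", "experiment"}

# Count how many rows carry each distinct label.
label_counts = toy["group"].value_counts()
print(label_counts)

# Any label outside the expected set points to dirty data (typos, stray whitespace, etc.).
unexpected = set(toy["group"].unique()) - expected_groups
print("unexpected labels:", unexpected)
```

On the real data the same two calls would be applied to `df["group"]`; an empty `unexpected` set would indicate that the column is clean.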


Split the data into the control group and the experiment group

In [5]:
df_control = df[df.group == 'control']
df_experiment = df[df.group == 'experiment']
df_control, df_experiment

(                       timestamp      id    group action    duration
 11    2016-09-24 22:42:41.218757  701620  control   view  302.951552
 13    2016-09-25 01:23:19.800167  439910  control   view   61.234458
 15    2016-09-25 01:46:27.950552  842231  control   view  124.823065
 19    2016-09-25 03:33:02.810074  882775  control   view  326.025765
 21    2016-09-25 04:28:17.178986  491935  control   view  121.309499
 ...                          ...     ...      ...    ...         ...
 4038  2017-01-18 06:57:37.696847  901542  control   view  124.298489
 4041  2017-01-18 08:07:44.940868  282469  control   view  110.801373
 4042  2017-01-18 08:35:44.813986  313521  control   view  131.677707
 4044  2017-01-18 09:39:08.046251  931490  control   view   58.846204
 4046  2017-01-18 09:56:26.948171  364458  control   view  293.490566
 
 [1949 rows x 5 columns],
                        timestamp      id       group  action    duration
 0     2016-09-24 17:14:52.012145  261869  experiment    v
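The per-group statistics needed for the test can also be collected in one pass with `groupby`, as an alternative to keeping two separate DataFrames. This is only a sketch on made-up numbers; `toy` and `summary` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Made-up durations, not the course data.
toy = pd.DataFrame({
    "group": ["control", "control", "experiment", "experiment", "experiment"],
    "duration": [100.0, 120.0, 130.0, 150.0, 170.0],
})

# One row per group with the sample size, mean and standard deviation of duration.
summary = toy.groupby("group")["duration"].agg(["count", "mean", "std"])
print(summary)

# Individual values can then be read off by group label and column name.
con_mean = summary.loc["control", "mean"]
exp_size = summary.loc["experiment", "count"]
```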

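Set up the A/B test metrics

Two-sided test:
H0: the mean view time of the experiment group equals that of the control group
H1: the mean view time of the experiment group differs from that of the control group

exp mean: experiment group mean
con mean: control group mean
con std: control group standard deviation
exp size: experiment group sample size
alpha: significance level
 
z-statistic
z-value
(p-value)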

Reject H0?
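
Written out, the setup above reads as follows. The symbols are notation introduced here rather than taken from the notebook: $\mu$ is a population mean, $\bar{x}$ a sample mean, $s_{con}$ the control group standard deviation, $n_{exp}$ the experiment group size, and $\Phi$ the standard normal CDF.

$$
H_0: \mu_{exp} = \mu_{con}, \qquad H_1: \mu_{exp} \neq \mu_{con}
$$

$$
z = \frac{\bar{x}_{exp} - \bar{x}_{con}}{s_{con} / \sqrt{n_{exp}}}, \qquad p = 2\,\bigl(1 - \Phi(|z|)\bigr)
$$

H0 is rejected when $p < \alpha$. Note that this form of the statistic treats the control mean as a fixed reference value and uses only the control standard deviation and the experiment sample size.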

In [74]:
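# Experiment group mean: exp_mean = 130.93220512539477
exp_mean = df_experiment.duration.mean()

# Control group mean: con_mean = 115.40710650582038
# The experiment group is indeed higher; the question is whether the difference is significant.
con_mean = df_control.duration.mean()

# Control group standard deviation: con_std = 70.25634436867115
con_std = df_control.duration.std()
# exp_mean, con_mean, con_std

# Experiment group size: exp_size = 2100
exp_size = df_experiment.shape[0]

# Significance level
alpha = 0.05

# z statistic: z_statistics = 4.721294539547963
z_statistics = (exp_mean - con_mean)/(con_std/np.sqrt(exp_size))
z_statistics
exp_mean, con_mean, con_std

# p-value: 8.895150888254065e-12, effectively 0 and far below 0.05
# ttest_ind is a two-sample, two-sided test of whether two sample means are equal.
# It assumes equal variances by default; set equal_var=False when the variances differ.
# levene tests whether the variances are equal.
v = stats.levene(df_experiment.duration, df_control.duration)
v
# Here p = 0.026 < 0.05, so the variances are not equal
# and equal_var=False is needed,
# although in this case it makes no practical difference.

# Call ttest_ind from scipy.stats directly; the scipy.stats.stats namespace is deprecated.
t_and_p = stats.ttest_ind(df_experiment.duration, df_control.duration, equal_var=False)
print("%f" % t_and_p[1])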



0.000000
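
To make the link between a test statistic and its p-value concrete, here is a minimal sketch that uses only the standard library. `two_sided_p` and `welch_statistic` are hypothetical helper names, the inputs are made-up numbers, and the normal distribution stands in for the t distribution that `ttest_ind` uses, which is a reasonable approximation when each group has thousands of rows.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(z):
    """Two-sided p-value of a z statistic under the standard normal."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


def welch_statistic(mean1, std1, n1, mean2, std2, n2):
    """Unequal-variance (Welch) statistic computed from summary statistics."""
    standard_error = sqrt(std1 ** 2 / n1 + std2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


# Made-up summary statistics, not the course data.
stat = welch_statistic(mean1=105.0, std1=20.0, n1=400, mean2=100.0, std2=20.0, n2=400)
p_value = two_sided_p(stat)
alpha = 0.05
print(stat, p_value, p_value < alpha)
```

Unlike the one-sample form used in the cell above, the Welch statistic includes the variance and sample size of both groups in the standard error.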


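The p-value is 8.895150888254065e-12, effectively 0 and far below 0.05, so we can reject the null hypothesis and accept the alternative.
That is, the mean view time of the experiment group differs from that of the control group, and since 130 > 115, the experiment group shows a significant increase.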

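In fact, for the experiment to be complete, the sample size required by the A/B test should also be estimated.

The Python statistics package statsmodels.stats.power provides the NormalIndPower class, whose solve_power method can be used for this.

The parameters of solve_power are:

(1) effect_size: the standardized effect size, i.e. the difference between the two sample means divided by the standard deviation

(2) nobs1: the size of sample 1; the size of sample 2 is nobs1 * ratio

(3) alpha: the significance level, usually 0.05

(4) power: the statistical power, usually 0.8

(5) ratio: the size of sample 2 divided by the size of sample 1, usually 1

(6) alternative: a string (str), 'two-sided' by default; for a one-sided test use 'larger' or 'smaller'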

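Example: the current click-through rate (CTR) is 0.3 and we want a 10% relative lift, to 0.33, with equal sample sizes in the experiment and control groups.
Another example: the baseline conversion rate (which could be a click-through rate, a subscription rate, etc.) is 10%; we want to improve it by 10% in relative terms (the minimum detectable effect), i.e. to 11%; with a significance level of 5% and statistical power of 80%, the computed result is 14751, meaning the control group and the experiment group each need 14751 samples.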
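
The second example (10% to 11%, alpha 0.05, power 0.8) can be sketched with statsmodels as follows. The text does not say how the effect size was obtained; the sketch assumes it is Cohen's h as computed by `proportion_effectsize`, in which case the result should land close to the 14751 quoted above.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Effect size for moving a conversion rate from 10% to 11%
# (assumption: Cohen's h, the arcsine-transformed difference of proportions).
effect_size = proportion_effectsize(0.11, 0.10)

# Leave nobs1 unset so that solve_power solves for it.
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    nobs1=None,
    alpha=0.05,
    power=0.8,
    ratio=1.0,
    alternative="two-sided",
)
print(n_per_group)
```

The returned value is the size of sample 1; with `ratio=1.0` the second group needs the same number, and in practice the figure is rounded up to a whole number of users.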

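The smaller the change to be detected, the larger the required sample size; the larger the change, the smaller the required sample size. A smaller change is more likely to be caused by sampling error, so a larger sample is needed to rule that out.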
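
The relationship between the detectable change and the sample size can be illustrated with a closed-form normal approximation. This is a sketch under stated assumptions: `approx_n_per_group` is a hypothetical helper, the two groups are equal in size, the effect size is Cohen's h, and the 10% baseline and the lifts in the loop are illustrative values. It approximates a power calculation and is not meant to reproduce any library's output exactly.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def approx_n_per_group(p_base, p_target, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided test of two proportions."""
    # Cohen's h: arcsine-transformed difference between the two proportions.
    h = 2 * (asin(sqrt(p_target)) - asin(sqrt(p_base)))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return ceil(2 * (z_alpha + z_power) ** 2 / h ** 2)


baseline = 0.10
for relative_lift in (0.05, 0.10, 0.20, 0.40):
    target = baseline * (1 + relative_lift)
    print(f"lift {relative_lift:.0%}: about {approx_n_per_group(baseline, target)} per group")
```

Because the required size is inversely proportional to the square of the effect size, halving the detectable effect roughly quadruples the number of samples needed per group.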