### Summary
From Lecture: https://product-data-science.datamasked.com/courses/621233/lectures/11105795

### Steps
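1. Understand the problem: what exactly is the hypothesis? The starting observation was that users in Spain convert better than users in the other (non-Spain) countries.
    * So an A/B test was run in the non-Spain countries: a localized translation (test) vs. the original Spanish translation (control).
    * The reported result was negative: in the non-Spain countries, the Spanish translation converted better than the localized translation.
    * Null hypothesis: in the non-Spain countries, the localized translation converts at least as well as the Spanish translation. A "negative" result means this null hypothesis is rejected. (The two-sided t-test used below tests the null that the two conversion rates are equal.)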
2. Identify the questions:
    * Q1: Confirm that test is actually negative. I.e., the old version of the site with just one translation across Spain and LatAm performs better
    * A1: First check whether Spain's conversion is higher than the other countries'. Then verify the t-test assumptions and re-run the t-test to see whether the result is really negative.
    * Q2: Explain why that might be happening. Are the localized translations really worse?
    * A2: Do EDA to look for insights.
    * Q3: If you identified what was wrong, design an algorithm that would return FALSE if the same problem is happening in the future and TRUE if everything is good and results can be trusted.
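3. Implement A1
    * First understand the data: look at its distribution, check whether it is imbalanced, and decide whether a t-test is appropriate.
    * Then apply the t-test and compare the result with the claim in Q1.
4. Implement A2: do EDA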
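
Step 3 asks whether the data are imbalanced, and Q3 asks for a check that returns FALSE when a result cannot be trusted. Below is a minimal sketch of one possible check, assuming the problem to guard against is a test/control split that differs across some segment such as `country`. The helper name `split_looks_balanced`, the `alpha` threshold, and the choice of a chi-square test of independence are illustrative assumptions, not the course's solution.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def split_looks_balanced(df, segment_col, group_col="test", alpha=0.01):
    """Return True if the test/control split does not depend on segment_col.

    Hypothetical helper: runs a chi-square test of independence on the
    segment x group contingency table and flags a significant dependence.
    """
    table = pd.crosstab(df[segment_col], df[group_col])
    _, p_value, _, _ = chi2_contingency(table)
    return bool(p_value >= alpha)


# Toy data (made up for illustration).
# In `balanced`, both countries are split 50/50.
# In `skewed`, country "A" is split 50/50 but country "B" is split 90/10.
balanced = pd.DataFrame({
    "country": ["A"] * 1000 + ["B"] * 1000,
    "test": ([0] * 500 + [1] * 500) * 2,
})
skewed = pd.DataFrame({
    "country": ["A"] * 1000 + ["B"] * 1000,
    "test": [0] * 500 + [1] * 500 + [0] * 900 + [1] * 100,
})
print(split_looks_balanced(balanced, "country"))
print(split_looks_balanced(skewed, "country"))
```

With several hundred thousand rows a chi-square test can flag differences that are too small to matter, so in practice the threshold (or an effect-size cutoff) would need tuning.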

### Questions
1. Which t-test should be used, and what is the null hypothesis?
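
One way to make this question concrete, assuming the two-sided Welch's t-test that the notebook runs with `equal_var=False`. Subscript 1 denotes the test group (`test == 1`) and subscript 0 the control group; $\bar{x}$, $s^2$ and $n$ are the sample mean conversion, the sample variance and the sample size. This is the standard textbook form, given here as a sketch rather than as the lecture's answer.

$$
H_0: \mu_1 = \mu_0 \qquad H_1: \mu_1 \neq \mu_0
$$

$$
t = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_0^2}{n_0}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_0^2/n_0)^2}{n_0 - 1}}
$$

With this ordering, a negative $t$ means the test group converts less than the control group.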


In [32]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

### Question 1

In [33]:
# Read in data
test = pd.read_csv("/Users/eve/Desktop/Datasets/Translation_Test/test_table.csv")
test.head(2)

Unnamed: 0,user_id,date,source,device,browser_language,ads_channel,browser,conversion,test
0,315281,2015-12-03,Direct,Web,ES,,IE,1,0
1,497851,2015-12-04,Ads,Web,ES,Google,IE,0,1


In [34]:
user = pd.read_csv("/Users/eve/Desktop/Datasets/Translation_Test/user_table.csv")
user.head(2)

Unnamed: 0,user_id,sex,age,country
0,765821,M,20,Mexico
1,343561,F,27,Nicaragua


In [35]:
print("Length of test table: {}, Length of user table: {}".format(len(test), len(user)))

Length of test table: 453321, Length of user table: 452867


In [45]:
# Merge the two tables on user_id (inner join by default, so test rows without a matching user are dropped)
df = test.merge(user, on = 'user_id')
print("Length of merged data: {}".format(len(df)))
df.head(2)

Length of merged data: 452867


Unnamed: 0,user_id,date,source,device,browser_language,ads_channel,browser,conversion,test,sex,age,country
0,315281,2015-12-03,Direct,Web,ES,,IE,1,0,M,32,Spain
1,497851,2015-12-04,Ads,Web,ES,Google,IE,0,1,M,21,Mexico


In [46]:
# To reproduce the t-test, drop Spain: the translation did not change there (i.e. Spain has no users in the treatment group)
df = df[df["country"] != 'Spain']
len(df)

401085
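
Dropping Spain rests on the claim that Spain has no treatment users. A small sketch of how that kind of claim could be checked on the merged table before filtering; `countries_missing_group` is a hypothetical helper and the toy data are made up.

```python
import pandas as pd


def countries_missing_group(df, country_col="country", group_col="test"):
    """Return countries that have zero users in at least one experiment group."""
    counts = pd.crosstab(df[country_col], df[group_col])
    missing = counts.index[(counts == 0).any(axis=1)]
    return sorted(missing)


# Toy data (made up): Spain only appears in the control group (test == 0).
toy = pd.DataFrame({
    "country": ["Spain", "Spain", "Mexico", "Mexico", "Chile", "Chile"],
    "test":    [0,       0,       0,        1,        0,       1],
})
print(countries_missing_group(toy))
```

The check only works when both group values occur somewhere in the data, since `pd.crosstab` creates columns only for values it sees.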

In [48]:
# Re-do the t-test
# 1. Check the t-test assumptions: sample size and sample standard deviation
# The two sample sizes are roughly comparable, so a t-test is reasonable
# The sample variances differ, so remember to use equal_var=False (Welch's t-test) when running it
def sample_std(x):
    # Sample standard deviation (ddof=1 divides by n - 1)
    return np.std(x, ddof=1)

df.groupby("test")["conversion"].agg(["count", sample_std]).reset_index()

Unnamed: 0,test,count,sample_std
0,0,185311,0.214383
1,1,215774,0.203781


In [49]:
# Alternative: sample standard deviation per group, restricted to the numeric columns
# (np.std is undefined for the string columns; user_id is an identifier, so its std is not meaningful)
df.groupby("test")[["user_id", "conversion", "age"]].std(ddof=1)

test,user_id,conversion,age
0,288943.965717,0.214383,6.788499
1,288529.156089,0.203781,6.762929


In [50]:
# Another way: square root of the sample variance (control group)
np.sqrt(df.loc[df["test"] == 0, "conversion"].var(ddof=1))

0.2143826987826136

In [51]:
# Same calculation for the test group
np.sqrt(df.loc[df["test"] == 1, "conversion"].var(ddof=1))

0.20378131704133629
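
Because `conversion` is a 0/1 column, its sample variance depends only on the conversion rate: with n observations and observed rate p, the variance with `ddof=1` equals n / (n - 1) * p * (1 - p). For rates below 50%, a larger standard deviation therefore corresponds to a higher conversion rate, which suggests the control group (0.214) converts more than the test group (0.204). A short sketch of the identity; the helper name `bernoulli_sample_std` and the toy array are made up for illustration.

```python
import numpy as np


def bernoulli_sample_std(x):
    """Sample std (ddof=1) of a 0/1 array, computed from its mean only."""
    x = np.asarray(x, dtype=float)
    n = x.size
    p_hat = x.mean()
    return np.sqrt(n / (n - 1) * p_hat * (1 - p_hat))


# Made-up conversions: 2 successes out of 10.
toy = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])
# Both values should agree up to floating-point error.
print(bernoulli_sample_std(toy), np.std(toy, ddof=1))
```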

In [52]:
# Run the t-test with scipy (Welch's t-test, since equal variances are not assumed)
from scipy import stats
t, p = stats.ttest_ind(df[df["test"] == 1]["conversion"],
                       df[df["test"] == 0]["conversion"],
                       equal_var=False)
print("t-statistic = {}, p-value = {}".format(t, p))

t-statistic = -7.353895203080277, p-value = 1.9289178577799033e-13
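
The test group is passed first, so the negative statistic means the test group (localized translation) converts less than the control group, and the p-value is far below 0.05, which reproduces the negative result from Q1. To see what `ttest_ind(..., equal_var=False)` computes, here is a minimal hand-rolled sketch of Welch's t-test checked against scipy. The function name `welch_t_test` and the simulated samples are assumptions for illustration only, not the real data.

```python
import numpy as np
from scipy import stats


def welch_t_test(a, b):
    """Two-sided Welch's t-test for the difference in means of a and b."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Squared standard error of each sample mean
    va, vb = a.var(ddof=1) / a.size, b.var(ddof=1) / b.size
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    dof = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), dof)
    return t, p


# Made-up 0/1 samples, only used to compare against scipy
rng = np.random.default_rng(0)
treated = rng.binomial(1, 0.04, size=5000)
control = rng.binomial(1, 0.05, size=5000)
print(welch_t_test(treated, control))
print(stats.ttest_ind(treated, control, equal_var=False))
```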
