# A/B Testing

## Technical Description

- Test name: `recommender_system_test`
- Groups: A (control), B (new payment funnel)
- Launch date: 2020-12-07
- Date the test stopped accepting new users: 2020-12-21
- End date: 2021-01-01
- Audience: 15% of new users from the EU region
- Purpose of the test: evaluate changes related to the introduction of an improved recommendation system
- Expected result: within 14 days of signing up, users show better conversion into product page views (the `product_page` event), adding items to the cart (`product_cart`), and purchases (`purchase`). At each stage of the funnel `product_page` → `product_cart` → `purchase`, there will be at least a 10% increase.
- Expected number of test participants: 6,000
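
The expected result is stated as a minimum 10% increase at each funnel stage, so each stage comes down to comparing two conversion shares. The sketch below is one way to do that with the standard library only. It assumes a two-sided two-proportion z-test (the description does not name a statistical test) and reads the 10% as a relative lift. The function names and the counts in the usage lines are hypothetical.

```python
import math


def two_proportion_ztest(success_a, total_a, success_b, total_b):
    """Two-sided z-test for the difference between two conversion shares."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled share under the null hypothesis that both groups convert equally.
    p_pool = (success_a + success_b) / (total_a + total_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value of the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def relative_lift(success_a, total_a, success_b, total_b):
    """Relative change of group B's conversion share versus group A's."""
    return (success_b / total_b) / (success_a / total_a) - 1


# Hypothetical counts: users who reached a stage / users in the group.
z_stat, p_val = two_proportion_ztest(600, 2000, 690, 2000)
lift = relative_lift(600, 2000, 690, 2000)
print(f"z = {z_stat:.2f}, p-value = {p_val:.4f}, relative lift = {lift:.1%}")
```

A stage would meet the stated goal only if the lift is at least 0.10 and the p-value is below the chosen significance level. Running three comparisons also calls for a multiple-testing correction, which this sketch leaves out.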

## Data Description

- `ab_project_marketing_events_us.csv`: the calendar of marketing events for 2020
- `final_ab_new_users_upd_us.csv`: all users who signed up in the online store from 7 to 21 December 2020
- `final_ab_events_upd_us.csv`: all events of the new users from 7 December 2020 to 1 January 2021
- `final_ab_participants_upd_us.csv`: the table listing the test participants

The `ab_project_marketing_events_us` dataframe contains:
   - `name`: the name of the marketing event
   - `regions`: the regions where the ad campaign will run
   - `start_dt`: the campaign start date
   - `finish_dt`: the campaign end date
   
The `final_ab_new_users_upd_us` dataframe contains:
   - `user_id`: the customer ID
   - `first_date`: the sign-up date
   - `region`: the user's region
   - `device`: the device used to sign up
   
The `final_ab_events_upd_us` dataframe contains:
   - `user_id`: the customer ID
   - `event_dt`: the date and time of the event
   - `event_name`: the name of the event type
   - `details`: additional data about the event (for example, the order total in USD for `purchase` events)
   
The `final_ab_participants_upd_us` dataframe contains:
   - `user_id`: the customer ID
   - `ab_test`: the name of the test
   - `group`: the test group the user belongs to
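
The columns described above are enough to sketch the 14-day funnel calculation named in the expected result. The example below is a minimal illustration, not the notebook's own method. It uses tiny made-up frames that reuse the documented column names, and `funnel_within_window` is a hypothetical helper.

```python
import pandas as pd

# Toy data (made up) that mirrors the documented columns.
users = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "first_date": pd.to_datetime(
        ["2020-12-07", "2020-12-07", "2020-12-10", "2020-12-10"]
    ),
})
participants = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "group": ["A", "A", "B", "B"],
    "ab_test": ["recommender_system_test"] * 4,
})
events = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u3", "u3", "u4"],
    "event_dt": pd.to_datetime([
        "2020-12-07 10:00:00", "2020-12-08 09:00:00", "2020-12-30 12:00:00",
        "2020-12-10 08:00:00", "2020-12-11 08:30:00", "2020-12-12 18:00:00",
    ]),
    "event_name": ["product_page", "purchase", "product_page",
                   "product_page", "product_cart", "product_page"],
})


def funnel_within_window(events, users, participants, test_name, days=14):
    """Share of each group's users who triggered each event within `days` of sign-up."""
    members = participants.loc[
        participants["ab_test"] == test_name, ["user_id", "group"]
    ]
    merged = (
        events.merge(users[["user_id", "first_date"]], on="user_id")
              .merge(members, on="user_id")
    )
    # Keep only events that happened inside the window after sign-up.
    age = merged["event_dt"] - merged["first_date"]
    in_window = merged[(age >= pd.Timedelta(0)) & (age < pd.Timedelta(days=days))]
    # Unique users per group and event, divided by the group's size.
    reached = (
        in_window.groupby(["group", "event_name"])["user_id"]
                 .nunique()
                 .unstack(fill_value=0)
    )
    group_size = members.groupby("group")["user_id"].nunique()
    return reached.div(group_size, axis=0)


conversion = funnel_within_window(
    events, users, participants, "recommender_system_test"
)
print(conversion)
```

Dividing by the full group size, rather than by the number of users with at least one event, is a design choice made here for illustration. Either denominator can be defended, as long as it is applied to both groups.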

## Library Initialization

In [16]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px

In [17]:
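# Load the four source files; the short names are reused throughout the notebook
df = pd.read_csv('ab_project_marketing_events_us.csv')   # marketing event calendar
df_1 = pd.read_csv('final_ab_events_upd_us.csv')         # event log of new users
df_2 = pd.read_csv('final_ab_new_users_upd_us.csv')      # new user sign-ups
df_3 = pd.read_csv('final_ab_participants_upd_us.csv')   # test participants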

## Exploring the Data

### ab_project_marketing_events_us

In [20]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14 entries, 0 to 13
Data columns (total 4 columns):
 #   Column     Non-Null Count  Dtype 
---  ------     --------------  ----- 
 0   name       14 non-null     object
 1   regions    14 non-null     object
 2   start_dt   14 non-null     object
 3   finish_dt  14 non-null     object
dtypes: object(4)
memory usage: 580.0+ bytes


In [18]:
df

| | name | regions | start_dt | finish_dt |
|---|---|---|---|---|
| 0 | Christmas&New Year Promo | EU, N.America | 2020-12-25 | 2021-01-03 |
| 1 | St. Valentine's Day Giveaway | EU, CIS, APAC, N.America | 2020-02-14 | 2020-02-16 |
| 2 | St. Patric's Day Promo | EU, N.America | 2020-03-17 | 2020-03-19 |
| 3 | Easter Promo | EU, CIS, APAC, N.America | 2020-04-12 | 2020-04-19 |
| 4 | 4th of July Promo | N.America | 2020-07-04 | 2020-07-11 |
| 5 | Black Friday Ads Campaign | EU, CIS, APAC, N.America | 2020-11-26 | 2020-12-01 |
| 6 | Chinese New Year Promo | APAC | 2020-01-25 | 2020-02-07 |
| 7 | Labor day (May 1st) Ads Campaign | EU, CIS, APAC | 2020-05-01 | 2020-05-03 |
| 8 | International Women's Day Promo | EU, CIS, APAC | 2020-03-08 | 2020-03-10 |
| 9 | Victory Day CIS (May 9th) Event | CIS | 2020-05-09 | 2020-05-11 |


In [19]:
df['regions'].value_counts()

APAC                        4
EU, CIS, APAC, N.America    3
EU, N.America               2
EU, CIS, APAC               2
CIS                         2
N.America                   1
Name: regions, dtype: int64
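
Because `regions` stores several regions in one comma-separated string, the counts above describe combinations of regions, not individual regions. The sketch below shows one way to split the column and then look for EU campaigns that overlap the test period (2020-12-07 to 2021-01-01). It rebuilds three rows from the table by hand, and the variable names are illustrative.

```python
import pandas as pd

# Three rows copied from the marketing calendar shown in this notebook.
marketing = pd.DataFrame({
    "name": ["Christmas&New Year Promo", "Black Friday Ads Campaign",
             "Chinese New Year Promo"],
    "regions": ["EU, N.America", "EU, CIS, APAC, N.America", "APAC"],
    "start_dt": pd.to_datetime(["2020-12-25", "2020-11-26", "2020-01-25"]),
    "finish_dt": pd.to_datetime(["2021-01-03", "2020-12-01", "2020-02-07"]),
})

test_start = pd.Timestamp("2020-12-07")
test_end = pd.Timestamp("2021-01-01")

# One row per (campaign, region) pair.
by_region = (
    marketing.assign(region=marketing["regions"].str.split(", "))
             .explode("region")
)

# Two date ranges overlap when each one starts before the other ends.
eu = by_region[by_region["region"] == "EU"]
overlapping = eu[(eu["start_dt"] <= test_end) & (eu["finish_dt"] >= test_start)]
print(overlapping[["name", "start_dt", "finish_dt"]])
```

A campaign that runs during the test can influence purchase behavior in both groups, so any overlap is worth noting before the results are interpreted.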

In [21]:
df.duplicated().sum()

0

In [22]:
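# Count missing values per column and relate them to the number of rows
report_null = df.isnull().sum().to_frame()
report_null = report_null.rename(columns={0: 'missing_values'})
# The ratio is a fraction between 0 and 1, not a percentage
report_null['share_of_total'] = (report_null['missing_values'] / df.shape[0]).round(2)
report_null.sort_values(by='missing_values', ascending=False)

| | missing_values | share_of_total |
|---|---|---|
| name | 0 | 0.0 |
| regions | 0 | 0.0 |
| start_dt | 0 | 0.0 |
| finish_dt | 0 | 0.0 |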


### final_ab_events_upd_us

In [24]:
df_1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 423761 entries, 0 to 423760
Data columns (total 4 columns):
 #   Column      Non-Null Count   Dtype  
---  ------      --------------   -----  
 0   user_id     423761 non-null  object 
 1   event_dt    423761 non-null  object 
 2   event_name  423761 non-null  object 
 3   details     60314 non-null   float64
dtypes: float64(1), object(3)
memory usage: 12.9+ MB
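
The `info()` output lists `event_dt` as `object`, which means the timestamps were read as plain strings. Any date arithmetic, such as the 14-day window after sign-up, needs them as datetimes. The sketch below shows one option, the `parse_dates` argument of `pd.read_csv`. It reads from an in-memory string instead of the real file so that it runs on its own, and the two data rows are copied from the preview of this table.

```python
import io

import pandas as pd

# In-memory stand-in for final_ab_events_upd_us.csv (two rows from the preview).
sample_csv = io.StringIO(
    "user_id,event_dt,event_name,details\n"
    "E1BDDCE0DAFA2679,2020-12-07 20:22:03,purchase,99.99\n"
    "7B6452F081F49504,2020-12-07 09:22:53,purchase,9.99\n"
)

# parse_dates converts the listed columns to datetime64 while reading.
events = pd.read_csv(sample_csv, parse_dates=["event_dt"])
print(events.dtypes)
```

For a frame that is already loaded, `pd.to_datetime(df_1['event_dt'])` achieves the same conversion.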


In [23]:
df_1.head()

| | user_id | event_dt | event_name | details |
|---|---|---|---|---|
| 0 | E1BDDCE0DAFA2679 | 2020-12-07 20:22:03 | purchase | 99.99 |
| 1 | 7B6452F081F49504 | 2020-12-07 09:22:53 | purchase | 9.99 |
| 2 | 9CD9F34546DF254C | 2020-12-07 12:59:29 | purchase | 4.99 |
| 3 | 96F27A054B191457 | 2020-12-07 04:02:40 | purchase | 4.99 |
| 4 | 1FD7660FDF94CA1F | 2020-12-07 10:15:09 | purchase | 4.99 |


In [25]:
df_1['event_name'].value_counts()

login           182465
product_page    120862
purchase         60314
product_cart     60120
Name: event_name, dtype: int64

In [26]:
df_1.duplicated().sum()

0

In [27]:
# Count missing values per column and relate them to the number of rows
report_null_1 = df_1.isnull().sum().to_frame()
report_null_1 = report_null_1.rename(columns={0: 'missing_values'})
# The ratio is a fraction between 0 and 1, not a percentage
report_null_1['share_of_total'] = (report_null_1['missing_values'] / df_1.shape[0]).round(2)
report_null_1.sort_values(by='missing_values', ascending=False)

| | missing_values | share_of_total |
|---|---|---|
| details | 363447 | 0.86 |
| user_id | 0 | 0.0 |
| event_dt | 0 | 0.0 |
| event_name | 0 | 0.0 |
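
The number of non-null `details` values (60,314) equals the number of `purchase` events (60,314). This suggests that the 363,447 gaps are not data loss but simply events that carry no order total. The sketch below shows how that hypothesis could be checked, using a small made-up frame in place of `df_1`.

```python
import pandas as pd

# Toy stand-in for the event log: only purchases carry an order total.
events = pd.DataFrame({
    "event_name": ["purchase", "login", "product_page", "purchase", "product_cart"],
    "details": [99.99, None, None, 4.99, None],
})

# Share of missing `details` per event type (1.0 = always missing).
missing_share = events["details"].isna().groupby(events["event_name"]).mean()
print(missing_share)
```

If the real data shows 0.0 for `purchase` and 1.0 for every other event type, the missing values can be left as they are.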


### final_ab_new_users_upd_us

In [29]:
df_2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 58703 entries, 0 to 58702
Data columns (total 4 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   user_id     58703 non-null  object
 1   first_date  58703 non-null  object
 2   region      58703 non-null  object
 3   device      58703 non-null  object
dtypes: object(4)
memory usage: 1.8+ MB


In [28]:
df_2.head()

| | user_id | first_date | region | device |
|---|---|---|---|---|
| 0 | D72A72121175D8BE | 2020-12-07 | EU | PC |
| 1 | F1C668619DFE6E65 | 2020-12-07 | N.America | Android |
| 2 | 2E1BF1D4C37EA01F | 2020-12-07 | EU | PC |
| 3 | 50734A22C0C63768 | 2020-12-07 | EU | iPhone |
| 4 | E1BDDCE0DAFA2679 | 2020-12-07 | N.America | iPhone |


In [30]:
df_2.duplicated().sum()

0

In [31]:
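# Count missing values per column and relate them to the number of rows
report_null_2 = df_2.isnull().sum().to_frame()
report_null_2 = report_null_2.rename(columns={0: 'missing_values'})
# The ratio is a fraction between 0 and 1, not a percentage
report_null_2['share_of_total'] = (report_null_2['missing_values'] / df_2.shape[0]).round(2)
report_null_2.sort_values(by='missing_values', ascending=False)

| | missing_values | share_of_total |
|---|---|---|
| user_id | 0 | 0.0 |
| first_date | 0 | 0.0 |
| region | 0 | 0.0 |
| device | 0 | 0.0 |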
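
The technical description sets the audience at 15% of new users from the EU region. With `region` available in this table, that share can be measured directly. The sketch below uses made-up user IDs to show the calculation. The frame and variable names are illustrative.

```python
import pandas as pd

# Toy stand-ins for the new-user and participant tables.
new_users = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5"],
    "region": ["EU", "EU", "EU", "EU", "N.America"],
})
participants = pd.DataFrame({
    "user_id": ["u1", "u5"],
    "ab_test": ["recommender_system_test", "recommender_system_test"],
})

# IDs enrolled in the test of interest.
in_test = set(
    participants.loc[
        participants["ab_test"] == "recommender_system_test", "user_id"
    ]
)

# Share of new EU users who take part in the test.
eu_users = new_users[new_users["region"] == "EU"]
eu_share = eu_users["user_id"].isin(in_test).mean()
print(f"Share of new EU users in the test: {eu_share:.0%}")
```

The same calculation also reveals participants from outside the EU, such as `u5` in the toy data, who would fall outside the stated audience.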


### final_ab_participants_upd_us

In [33]:
df_3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14525 entries, 0 to 14524
Data columns (total 3 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   user_id  14525 non-null  object
 1   group    14525 non-null  object
 2   ab_test  14525 non-null  object
dtypes: object(3)
memory usage: 340.6+ KB


In [32]:
df_3.head()

| | user_id | group | ab_test |
|---|---|---|---|
| 0 | D1ABA3E2887B6A73 | A | recommender_system_test |
| 1 | A7A3664BD6242119 | A | recommender_system_test |
| 2 | DABC14FDDFADD29E | A | recommender_system_test |
| 3 | 04988C5DF189632E | A | recommender_system_test |
| 4 | 4FF2998A348C484F | A | recommender_system_test |


In [34]:
df_3.duplicated().sum()

0

In [35]:
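# Count missing values per column and relate them to the number of rows
report_null_3 = df_3.isnull().sum().to_frame()
report_null_3 = report_null_3.rename(columns={0: 'missing_values'})
# The ratio is a fraction between 0 and 1, not a percentage
report_null_3['share_of_total'] = (report_null_3['missing_values'] / df_3.shape[0]).round(2)
report_null_3.sort_values(by='missing_values', ascending=False)

| | missing_values | share_of_total |
|---|---|---|
| user_id | 0 | 0.0 |
| group | 0 | 0.0 |
| ab_test | 0 | 0.0 |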


In [36]:
df_3['ab_test'].value_counts()

interface_eu_test          10850
recommender_system_test     3675
Name: ab_test, dtype: int64

In [37]:
df_3['group'].value_counts()

A    8214
B    6311
Name: group, dtype: int64