# A/B Test - Marketing Campaign Analysis

Now I'll conduct the A/B test on our marketing campaign dataframe. It's important to establish:
- I'm analysing the promotions' financial performance, so my MAIN METRIC will be `SalesInThousands`
- My hypotheses are:
    - H0: the promotions did not have an effect on revenue
    - H1: the promotions HAD an effect on revenue

Before running any test, IT'S FUNDAMENTAL TO LOOK INTO THE TYPE OF DISTRIBUTION I HAVE. For this, I'll use the Shapiro-Wilk normality test and then decide which statistical methods to use.

In [1]:
import pandas as pd
from scipy.stats import shapiro, ttest_1samp, ttest_ind, levene, kruskal, mannwhitneyu, pearsonr, spearmanr

In [4]:
# import dataframe
df = pd.read_csv('WA_Marketing-Campaign.csv')

# keep only the metric we'll use to analyse promotion performance: SalesInThousands
df_clean = df[['Promotion', 'SalesInThousands']]

## Normality Check

In [26]:
# check whether each promotion's sales follow a normal distribution
## H0: the sample HAS a normal distribution: I'll move to parametric methods
## H1: the sample DOESN'T have a normal distribution: I'll use NON-parametric methods

# the Shapiro test works on arrays, so I need one array per promotion
promo_list = df_clean['Promotion'].unique()

# run the Shapiro test for each promotion (1, 2, 3) and check the p-value
for promotion in promo_list:
    normal_test = shapiro(df_clean.loc[df_clean['Promotion'] == promotion, 'SalesInThousands'])
    if normal_test[1] < 0.05:
        print(f'Promotion {promotion} DOES NOT HAVE a normal distribution \n')
    else:
        print(f'Promotion {promotion} HAS a normal distribution \n')

Promotion 3 DOES NOT HAVE a normal distribution 

Promotion 2 DOES NOT HAVE a normal distribution 

Promotion 1 DOES NOT HAVE a normal distribution 



Since our promotions don't have a normal distribution (AS NOTICED DURING THE EDA), I'll use **nonparametric** statistical methods to determine if there's any difference between the promotions.
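The loop above only prints a verdict per promotion. A minimal sketch of the same check that also keeps the W statistic and p-value in a table is shown here. The helper name `shapiro_by_group` and the `demo` dataframe are assumptions for illustration; `demo` is synthetic right-skewed data, not the campaign dataset.

```python
import numpy as np
import pandas as pd
from scipy.stats import shapiro


def shapiro_by_group(df, group_col, value_col, alpha=0.05):
    """Run the Shapiro-Wilk test on each group and return one row per group."""
    rows = []
    for name, values in df.groupby(group_col)[value_col]:
        stat, p_value = shapiro(values)
        rows.append({
            group_col: name,
            "n": len(values),
            "W": stat,
            "p_value": p_value,
            # True means H0 (normality) is rejected at the chosen alpha
            "reject_normality": p_value < alpha,
        })
    return pd.DataFrame(rows)


# Synthetic stand-in data (NOT the campaign data): right-skewed "sales"
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Promotion": np.repeat([1, 2, 3], 200),
    "SalesInThousands": rng.lognormal(mean=4.0, sigma=0.5, size=600),
})

shapiro_summary = shapiro_by_group(demo, "Promotion", "SalesInThousands")
print(shapiro_summary)
```

On the real data, the call would presumably be `shapiro_by_group(df_clean, 'Promotion', 'SalesInThousands')`.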

## A/B Test

### Levene Test - Variance
H0: samples have the same variance

H1: samples DO NOT HAVE the same variance

In [31]:
# Levene test compares samples passed as arrays
## Since I have 3 samples (one per promotion), I need to compare them all at the same time
levene_test, levene_pvalue = levene(df_clean.loc[df_clean['Promotion'] == 1, 'SalesInThousands'],
                                    df_clean.loc[df_clean['Promotion'] == 2, 'SalesInThousands'],
                                    df_clean.loc[df_clean['Promotion'] == 3, 'SalesInThousands'])

if levene_pvalue < 0.05:
    print(f'Promotions have different variance, P-value={round(levene_pvalue,4)} < 0.05')
else:
    print(f'Promotions have the same variance, P-value={round(levene_pvalue,4)} > 0.05')

Promotions have the same variance, P-value=0.2818 > 0.05


### Kruskal-Wallis H-test - Median
H0: sample medians are the same

H1: at least one sample median is different

In [32]:
kruskal_test, kruskal_pvalue = kruskal(df_clean.loc[df_clean['Promotion'] == 1, 'SalesInThousands'],
                                    df_clean.loc[df_clean['Promotion'] == 2, 'SalesInThousands'],
                                    df_clean.loc[df_clean['Promotion'] == 3, 'SalesInThousands'])

if kruskal_pvalue < 0.05:
    print(f'Promotions have DIFFERENT medians, P-value={round(kruskal_pvalue,4)} < 0.05')
else:
    print(f'Promotions have the SAME median, P-value={round(kruskal_pvalue,4)} > 0.05')

Promotions have DIFFERENT medians, P-value=0.0 < 0.05


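Comparing the 3 promotions, I identified a statistically significant difference between them: their medians differ (the p-value prints as 0.0 only because it is rounded to 4 decimals). The problem is that Kruskal-Wallis is an omnibus test, so it tells me that at least one promotion differs from the others, but not which one. The Levene test doesn't help here either, since it only says the **variances** are similar.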
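Before any pairwise testing, a quick look at the per-promotion medians can hint at which group stands apart. This is only a descriptive sketch, not a test. The helper name `summarize_by_group` and the tiny `demo` dataframe are made up for illustration.

```python
import pandas as pd


def summarize_by_group(df, group_col, value_col):
    """Return count, median, mean and standard deviation for each group."""
    return df.groupby(group_col)[value_col].agg(["count", "median", "mean", "std"])


# Tiny made-up example (NOT the campaign data)
demo = pd.DataFrame({
    "Promotion": [1, 1, 1, 2, 2, 2],
    "SalesInThousands": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
})

group_summary = summarize_by_group(demo, "Promotion", "SalesInThousands")
print(group_summary)
```

On the real data, the call would presumably be `summarize_by_group(df_clean, 'Promotion', 'SalesInThousands')`.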

To fix this, I'll have to run individual tests and see which ones are different:
- Promotion 1 x Promotion 2
- Promotion 1 x Promotion 3
- Promotion 2 x Promotion 3

For this, I'm going to run Dunn's test, a post hoc test that is usually conducted after a significant Kruskal-Wallis test.

In [None]:
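# requires the scikit-posthocs package (pip install scikit-posthocs)
import scikit_posthocs as sp

# pairwise Dunn's test on the sales metric, with Bonferroni-adjusted p-values
posthoc_results = sp.posthoc_dunn(df_clean, val_col='SalesInThousands', group_col='Promotion', p_adjust='bonferroni')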


ModuleNotFoundError: No module named 'scikit_posthocs'
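The pairwise comparisons listed above can also be run with scipy alone, which may help when `scikit_posthocs` is not installed. The sketch below uses pairwise Mann-Whitney U tests with a Bonferroni correction. This is a different procedure from Dunn's test, so the p-values will not match it exactly. The helper name `pairwise_mannwhitney` and the `demo` dataframe are assumptions for illustration; `demo` is synthetic, not the campaign dataset.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import mannwhitneyu


def pairwise_mannwhitney(df, group_col, value_col, alpha=0.05):
    """Two-sided Mann-Whitney U test for every pair of groups, Bonferroni-adjusted."""
    groups = {name: values.to_numpy() for name, values in df.groupby(group_col)[value_col]}
    pairs = list(combinations(sorted(groups), 2))
    rows = []
    for a, b in pairs:
        stat, p_raw = mannwhitneyu(groups[a], groups[b], alternative="two-sided")
        # Bonferroni: multiply by the number of comparisons, cap at 1
        p_adj = min(p_raw * len(pairs), 1.0)
        rows.append({
            "pair": f"{a} x {b}",
            "U": stat,
            "p_raw": p_raw,
            "p_bonferroni": p_adj,
            "significant": p_adj < alpha,
        })
    return pd.DataFrame(rows)


# Synthetic stand-in data (NOT the campaign data): group 2 is shifted lower
rng = np.random.default_rng(42)
demo = pd.DataFrame({
    "Promotion": np.repeat([1, 2, 3], 100),
    "SalesInThousands": np.concatenate([
        rng.normal(60, 10, 100),
        rng.normal(45, 10, 100),
        rng.normal(60, 10, 100),
    ]),
})

pairwise = pairwise_mannwhitney(demo, "Promotion", "SalesInThousands")
print(pairwise)
```

On the real data, the call would presumably be `pairwise_mannwhitney(df_clean, 'Promotion', 'SalesInThousands')`. Bonferroni is conservative, but with only 3 comparisons the loss of power is small.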

## A/B TEST Summary