
**Problem Statement:** Determine how effective different marketing campaigns are at driving customer purchases. Specifically, we want to understand how customer demographics (e.g., Age, Gender) respond to each campaign strategy.

**Dataset:** "Customer360Insights" from Kaggle

In [11]:
import pandas as pd

# Step 2.1: Load the data
# Assuming the data is in a CSV file named 'Customer360Insights.csv'
df = pd.read_csv('Customer360Insights.csv')

df.head()


Unnamed: 0,SessionStart,CustomerID,FullName,Gender,Age,CreditScore,MonthlyIncome,Country,State,City,...,Price,Quantity,CampaignSchema,CartAdditionTime,OrderConfirmation,OrderConfirmationTime,PaymentMethod,SessionEnd,OrderReturn,ReturnReason
0,2019-01-01 02:42:00,1001,Brittany Franklin,Male,57,780,7591,China,Guangdong,Dongguan,...,50,4,Instagram-ads,2019-01-01 02:49:00,True,2019-01-01 03:02:00,Cash On Delivery,2019-01-01 02:53:00,False,
1,2019-01-02 20:35:00,1002,Scott Stewart,Female,69,746,3912,China,Shandong,Yantai,...,80,6,Google-ads,2019-01-02 20:50:00,True,2019-01-02 20:58:00,Debit Card,2019-01-02 20:54:00,False,
2,2019-01-04 03:11:00,1003,Elizabeth Fowler,Female,21,772,7460,UK,England,Birmingham,...,20,2,Facebook-ads,2019-01-04 03:30:00,True,2019-01-04 03:40:00,Cash On Delivery,2019-01-04 03:35:00,False,
3,2019-01-05 09:01:00,1004,Julian Wall,Female,67,631,4765,UK,England,Birmingham,...,20,2,Twitter-ads,2019-01-05 09:17:00,True,2019-01-05 09:26:00,Cash On Delivery,2019-01-05 09:20:00,False,
4,2019-01-05 13:35:00,1005,James Simmons,Male,57,630,3268,China,Shandong,Yantai,...,100,6,Billboard-QR code,2019-01-05 13:40:00,True,2019-01-05 13:52:00,Debit Card,2019-01-05 13:42:00,False,


In [39]:
df.columns

Index(['SessionStart', 'CustomerID', 'FullName', 'Gender', 'Age',
       'CreditScore', 'MonthlyIncome', 'Country', 'State', 'City', 'Category',
       'Product', 'Cost', 'Price', 'Quantity', 'CampaignSchema ',
       'CartAdditionTime', 'OrderConfirmation', 'OrderConfirmationTime',
       'PaymentMethod', 'SessionEnd', 'OrderReturn', 'ReturnReason'],
      dtype='object')

In [16]:
# Data Cleaning
# Convert date columns to datetime format
df['SessionStart'] = pd.to_datetime(df['SessionStart'])
df['CartAdditionTime'] = pd.to_datetime(df['CartAdditionTime'])
df['OrderConfirmationTime'] = pd.to_datetime(df['OrderConfirmationTime'])
df['SessionEnd'] = pd.to_datetime(df['SessionEnd'])

# Handling missing values (if any)
df.dropna(subset=['SessionStart', 'CustomerID', 'Gender', 'Age', 'CampaignSchema ', 'OrderConfirmation'], inplace=True)

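# Select only the columns needed for the campaign analysis
# (.copy() makes an independent DataFrame, so adding columns later does not trigger SettingWithCopyWarning)
df_campaign = df[['CustomerID', 'Gender', 'Age', 'CampaignSchema ', 'OrderConfirmation']].copy()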

# Display the cleaned data information
# (info() prints directly and returns None, so it is not wrapped in print())
df_campaign.info()

# Display the first few rows of the df_campaign to verify
print(df_campaign.head())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2000 entries, 0 to 1999
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   CustomerID         2000 non-null   int64 
 1   Gender             2000 non-null   object
 2   Age                2000 non-null   int64 
 3   CampaignSchema     2000 non-null   object
 4   OrderConfirmation  2000 non-null   bool  
dtypes: bool(1), int64(2), object(2)
memory usage: 64.6+ KB
   CustomerID  Gender  Age    CampaignSchema   OrderConfirmation
0        1001    Male   57      Instagram-ads               True
1        1002  Female   69         Google-ads               True
2        1003  Female   21       Facebook-ads               True
3        1004  Female   67        Twitter-ads               True
4        1005    Male   57  Billboard-QR code               True


In [18]:
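# Segmenting data by age groups (18-30, 31-50, 51+)
# Note: pd.cut bins are right-closed by default, i.e. (18, 30], (30, 50], (50, 100],
# so a customer aged exactly 18 gets NaN unless include_lowest=True is passed.
df_campaign.loc[:, 'AgeGroup'] = pd.cut(df_campaign['Age'], bins=[18, 30, 50, 100], labels=['18-30', '31-50', '51+'])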

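# A/B Testing Setup
# Calculating the conversion rate for each campaign within each demographic segment
# (observed=False is stated explicitly so the result does not depend on the pandas default for categorical groupers)
conversion_rates = df_campaign.groupby(['AgeGroup', 'Gender', 'CampaignSchema '], observed=False)['OrderConfirmation'].mean().reset_index()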

# Renaming the column for clarity
conversion_rates.rename(columns={'OrderConfirmation': 'ConversionRate'}, inplace=True)

# Displaying the conversion rates for different segments and campaigns
print(conversion_rates)


   AgeGroup  Gender    CampaignSchema   ConversionRate
0     18-30  Female  Billboard-QR code        0.966667
1     18-30  Female            E-mails        0.826087
2     18-30  Female       Facebook-ads        0.860465
3     18-30  Female         Google-ads        0.842105
4     18-30  Female      Instagram-ads        0.865385
5     18-30  Female        Twitter-ads        0.816327
6     18-30    Male  Billboard-QR code        0.766667
7     18-30    Male            E-mails        0.935484
8     18-30    Male       Facebook-ads        0.812500
9     18-30    Male         Google-ads        0.727273
10    18-30    Male      Instagram-ads        0.882353
11    18-30    Male        Twitter-ads        0.871795
12    31-50  Female  Billboard-QR code        0.907692
13    31-50  Female            E-mails        0.846154
14    31-50  Female       Facebook-ads        0.857143
15    31-50  Female         Google-ads        0.863636
16    31-50  Female      Instagram-ads        0.783333
17    31-5... (output truncated)


In [20]:
from scipy.stats import chi2_contingency
import pandas as pd

# Prepare contingency tables for Chi-Square Test
contingency_tables = {}
# dropna() skips rows whose age fell outside the bins (AgeGroup is NaN)
age_groups = df_campaign['AgeGroup'].dropna().unique()

for age_group in age_groups:
    for gender in df_campaign['Gender'].unique():
        df_segment = df_campaign[(df_campaign['AgeGroup'] == age_group) & (df_campaign['Gender'] == gender)]

        # Create a contingency table
        contingency_table = pd.crosstab(df_segment['CampaignSchema '], df_segment['OrderConfirmation'])

        # Check if the contingency table has enough data
        if contingency_table.shape[0] > 1 and contingency_table.shape[1] > 1:
            # Perform Chi-Square Test
            chi2_stat, p_val, dof, expected = chi2_contingency(contingency_table, correction=False)

            contingency_tables[(age_group, gender)] = {
                'ContingencyTable': contingency_table,
                'Chi2Stat': chi2_stat,
                'PValue': p_val,
                'DegreesOfFreedom': dof,
                'ExpectedFrequencies': expected
            }

            # Print results
            print(f"Age Group: {age_group}, Gender: {gender}")
            print("Contingency Table:")
            print(contingency_table)
            print("Chi-Square Statistic:", chi2_stat)
            print("P-Value:", p_val)
            print("Degrees of Freedom:", dof)
            print("Expected Frequencies:")
            print(expected)
            print("\n")
        else:
            print(f"Insufficient data for Age Group: {age_group}, Gender: {gender}")
            print("Contingency Table:")
            print(contingency_table)
            print("\n")


Age Group: 51+, Gender: Male
Contingency Table:
OrderConfirmation  False  True 
CampaignSchema                 
Billboard-QR code     10     59
E-mails               14     36
Facebook-ads           8     50
Google-ads             8     43
Instagram-ads         15     62
Twitter-ads           10     57
Chi-Square Statistic: 5.441836914964881
P-Value: 0.36436583332444566
Degrees of Freedom: 5
Expected Frequencies:
[[12.05645161 56.94354839]
 [ 8.73655914 41.26344086]
 [10.1344086  47.8655914 ]
 [ 8.91129032 42.08870968]
 [13.45430108 63.54569892]
 [11.70698925 55.29301075]]


Age Group: 51+, Gender: Female
Contingency Table:
OrderConfirmation  False  True 
CampaignSchema                 
Billboard-QR code      7     58
E-mails                9     58
Facebook-ads           6     60
Google-ads            12     69
Instagram-ads         10     49
Twitter-ads            3     56
Chi-Square Statistic: 5.454129770638365
P-Value: 0.3630019390739719
Degrees of Freedom: 5
Expected Frequencies:
(output truncated; results for the remaining segments are summarized below)


**A/B Testing Results and Analysis**

The A/B testing compared marketing campaigns (e.g., Google-ads, Instagram-ads) within each age group and gender segment to understand their impact on conversion rates. A Chi-Square test of independence was run for each segment, with the null hypothesis that order confirmation is independent of campaign, i.e. that all campaigns convert at the same rate within that segment.
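As a sanity check on what `chi2_contingency` computes, the statistic for one segment can be reproduced by hand. The sketch below uses only the standard library and the Male 51+ counts printed above; the variable names are illustrative and not part of the notebook.

```python
# Observed counts for Age Group 51+, Gender Male, copied from the
# contingency table printed above: campaign -> (not confirmed, confirmed)
observed = {
    "Billboard-QR code": (10, 59),
    "E-mails": (14, 36),
    "Facebook-ads": (8, 50),
    "Google-ads": (8, 43),
    "Instagram-ads": (15, 62),
    "Twitter-ads": (10, 57),
}

row_totals = {c: sum(counts) for c, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected count under independence: row total * column total / grand total
expected = {
    c: tuple(row_totals[c] * col_totals[j] / grand_total for j in range(2))
    for c in observed
}

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2_stat = sum(
    (observed[c][j] - expected[c][j]) ** 2 / expected[c][j]
    for c in observed
    for j in range(2)
)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof = (len(observed) - 1) * (2 - 1)

print(f"chi2 = {chi2_stat:.4f}, dof = {dof}")
```

The printed statistic and degrees of freedom should agree with the values reported for this segment (5.44 with 5 degrees of freedom), and the expected counts should match the "Expected Frequencies" array in the output.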

**Results Summary**

**Age Group: 51+, Gender: Male**
Chi-Square Statistic: 5.44
P-Value: 0.36
The p-value is above 0.05, so there is no statistically significant difference in conversion rate among campaigns for males aged 51+.

**Age Group: 51+, Gender: Female**
Chi-Square Statistic: 5.45
P-Value: 0.36
Similar to the male group, the p-value indicates no significant difference in campaign effectiveness for females aged 51+.

**Age Group: 18-30, Gender: Male**
Chi-Square Statistic: 7.09
P-Value: 0.21
Again, the p-value suggests no significant difference between the campaigns for males aged 18-30.

**Age Group: 18-30, Gender: Female**
Chi-Square Statistic: 4.06
P-Value: 0.54
The p-value indicates that there is no significant difference in campaign effectiveness for females aged 18-30.

**Age Group: 31-50, Gender: Male**
Chi-Square Statistic: 8.38
P-Value: 0.14
The p-value suggests no significant difference in campaign effectiveness for males aged 31-50.

**Age Group: 31-50, Gender: Female**
Chi-Square Statistic: 3.97
P-Value: 0.55
Similar to other groups, the p-value indicates no significant difference in campaign effectiveness for females aged 31-50.

**For each demographic segment (age and gender), the chi-square tests find no statistically significant differences in conversion rates between the different campaigns. This means that, based on this data, no campaign can be said to convert significantly better or worse than the others within any segment.**
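Because six segment-level tests are run, a multiple-comparison adjustment is a common extra check. The sketch below applies a Bonferroni correction to the rounded p-values listed above. The correction is an addition for illustration and is not part of the original analysis.

```python
# Rounded p-values from the Results Summary above
p_values = {
    ("51+", "Male"): 0.36,
    ("51+", "Female"): 0.36,
    ("18-30", "Male"): 0.21,
    ("18-30", "Female"): 0.54,
    ("31-50", "Male"): 0.14,
    ("31-50", "Female"): 0.55,
}

alpha = 0.05

# Bonferroni: compare each p-value with alpha divided by the number of tests
adjusted_alpha = alpha / len(p_values)

significant = [segment for segment, p in p_values.items() if p < adjusted_alpha]

print(f"Per-test threshold: {adjusted_alpha:.4f}")
print("Significant segments:", significant if significant else "none")
```

Since every p-value already exceeds 0.05, the stricter per-test threshold does not change the conclusion.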

Even though the chi-square tests didn't show significant differences, it’s still useful to review the actual conversion rates. For instance, you might notice trends or patterns in which campaigns perform better for specific demographics.
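One way to review the rates is to recompute them directly from the contingency tables printed above. The following is a minimal sketch for the two 51+ segments, using only the counts shown in the output; the helper name `conversion_rates_from_table` is hypothetical.

```python
# (not confirmed, confirmed) counts copied from the two 51+ contingency tables above
tables = {
    "51+ Male": {
        "Billboard-QR code": (10, 59),
        "E-mails": (14, 36),
        "Facebook-ads": (8, 50),
        "Google-ads": (8, 43),
        "Instagram-ads": (15, 62),
        "Twitter-ads": (10, 57),
    },
    "51+ Female": {
        "Billboard-QR code": (7, 58),
        "E-mails": (9, 58),
        "Facebook-ads": (6, 60),
        "Google-ads": (12, 69),
        "Instagram-ads": (10, 49),
        "Twitter-ads": (3, 56),
    },
}


def conversion_rates_from_table(table):
    """Return {campaign: confirmed / total}, sorted from highest to lowest rate."""
    rates = {c: yes / (no + yes) for c, (no, yes) in table.items()}
    return dict(sorted(rates.items(), key=lambda kv: kv[1], reverse=True))


for segment, table in tables.items():
    print(segment)
    for campaign, rate in conversion_rates_from_table(table).items():
        sessions = sum(table[campaign])
        print(f"  {campaign:<18} {rate:.1%}  (n={sessions})")
```

Printing the session count next to each rate helps show how much data sits behind each percentage.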


**Conversion Trends:**


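*   Older Age Groups: For the 51+ group, the contingency tables above show Facebook-ads among the top converters for both genders (50/58, about 86%, for males and 60/66, about 91%, for females), and Twitter-ads highest for females (56/59, about 95%). Instagram-ads is near the bottom for both genders (62/77, about 81%, for males and 49/59, about 83%, for females), Google-ads is in the lower half (about 84% and 85%), and E-mails is lowest for males (36/50, 72%). These differences are not statistically significant, but they may be worth investigating further.

*   Younger Age Groups: For the 18-30 group, the results are less clear. Conversion rates range from about 73% (Google-ads, males) to about 97% (Billboard-QR code, females), and the best-converting campaign differs by gender (E-mails for males at about 94%, Billboard-QR code for females). Again, none of these differences is statistically significant.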
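To get a feel for how much uncertainty sits behind rates like these, a confidence interval can be placed around each one. The sketch below uses the Wilson score interval, which is an addition here and not something the original analysis computed, on two Female 51+ campaigns from the contingency table above. The function name and the choice of z = 1.96 (roughly 95% confidence) are assumptions made for illustration.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


# Female 51+ counts from the contingency table above: (confirmed, sessions)
examples = {"Twitter-ads": (56, 59), "Instagram-ads": (49, 59)}

for campaign, (confirmed, sessions) in examples.items():
    low, high = wilson_interval(confirmed, sessions)
    print(f"{campaign:<14} rate={confirmed / sessions:.1%}  interval=({low:.1%}, {high:.1%})")
```

With roughly 60 sessions per campaign the intervals are wide and overlap, which is consistent with the non-significant chi-square results.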

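**Campaign Preferences:**

1.   Instagram-Ads: This campaign is the second-best converter for both genders in the 18-30 group (about 87% for females, 88% for males), but it is at or near the bottom for females aged 31-50 (about 78%) and for both genders aged 51+ (about 81% and 83%). None of these differences is statistically significant.

2.   Billboard-QR Code: Results vary most within the 18-30 group, where it is the top campaign for females (about 97%) but one of the weakest for males (about 77%). In the older segments shown above, it converts at a steadier 86% to 91%.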

**Gender-Based Insights:**

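Gender Differences: Within each age group, neither gender shows a statistically significant difference among campaigns. Note that the tests were run separately for each gender, so they do not compare males against females directly. Concluding that gender has no influence on campaign effectiveness would require a test that includes gender as a factor, for example a model with a campaign-by-gender interaction.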