## **Introduction: Analyzing User Behavior and Conversion Metrics by Gender, Age, and Time**

## Demographics

In this analysis, I explored the relationship between user demographics (gender and age) and key performance indicators (KPIs) such as impressions, clicks, and conversions. The goal was to understand if there were statistically significant differences between men and women in terms of these metrics and if age had any impact on conversion and click-through rates (CTR).

To achieve this, I applied a series of hypothesis tests and analysis of variance (ANOVA) techniques, including:

1. **Hypothesis Test for Gender Differences**: I performed a hypothesis test to determine whether women had more impressions, clicks, and conversions than men. Using the resulting p-values and an alpha of 0.05, women account for a statistically different share of impressions, clicks, and conversions. Whether the underlying rates differ is a separate question, addressed in the next step.
  
2. **Two-Sample Z-Test for Conversion (CVR) and Click-Through Rates (CTR)**: A two-sample z-test for proportions was used to compare conversion and click-through rates by gender. Again using an alpha of 0.05, there was no significant difference in CVR or CTR between men and women. Even though women accounted for the majority of impressions and clicks, the two groups had statistically indistinguishable CVR and CTR.
  
3. **One-Way ANOVA for Age Effect**: Since gender showed no effect on conversion rate, I conducted a one-way ANOVA to assess whether age did. The test found no significant effect of age on CVR.

4. **One-Way ANOVA for Gender**: A one-way ANOVA was also performed to see whether gender had an effect on conversion rates. The test found no significant effect of gender on CVR, consistent with the z-test.

5. **Two-Way ANOVA with Post-Hoc Tukey Test**: To refine the analysis, a two-way ANOVA with a post-hoc Tukey test was applied to assess the effect of both age and gender on conversion rates. Because the real data has only one observation per age and gender combination, this step was run on simulated data centered on the observed rates. At an alpha of 0.05, there was no significant difference between any of the categories. However, in the post-hoc Tukey test, the following pairs had a p-value below 0.1, a looser threshold that is sometimes used:
  - Age groups 18-24 and 55+
  - Women in the age group 18-24 and men in the age group 55+

 This suggests that older age groups, and older men specifically, may be more likely to convert than younger age groups, and young women specifically. One plausible explanation is that older age groups are more likely to have money to spend on entertainment than younger age groups.

This series of analyses provides valuable insights into how user demographics may affect conversion behaviors and helps identify actionable trends for improving performance across different user groups.

## Time Analysis

 For this analysis, I examined data from an advertising campaign that was run over the course of two different weeks.

**Week 1**: I performed a one-way ANOVA test to determine whether impressions, clicks, and conversions differed significantly across days of the week. The results revealed a significant difference, with Sunday leading in all three metrics, followed by Saturday. To further explore the differences between the days, I used a post-hoc Tukey test, which compares every pair of days.

**Week 2**: The data was insufficient to run an ANOVA test, so I evaluated the trends manually. Wednesday and Tuesday outperformed the other days in impressions, clicks, and conversions. Monday lagged behind in impressions and clicks but was relatively consistent with other days in terms of conversions. Thursday and Friday had consistently low performance across all metrics.

Lastly, I conducted an ANOVA test for impressions by hour of the day for week 1. The results showed no significant difference between hours, even at the looser alpha of 0.1. However, I did identify that the top 5 hours consistently drew around 300 impressions, with the peak hour exceeding 500 impressions. The top 5 hours, in order, were 8 PM, 10 PM, 9 PM, 7 PM, and 6 PM, all in the evening, when people are generally more available after work and school.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

## Importing Data

In [None]:
camp_data = pd.read_csv('tiktok ads campaign data.csv')

In [None]:
dem_data = pd.read_csv('tiktok_ads_demographics.csv')

In [None]:
time_week1 = pd.read_csv('tiktok_time_week1.csv')

In [None]:
time_week2 = pd.read_csv('tiktok_time_week2.csv')

In [None]:
dem_data.head()

Unnamed: 0,Age,Gender,Cost,Impressions,Clicks (destination),Conversions,CTR (destination),Conversion rate (CVR),CPC (destination),CPM,Cost per conversion
0,35-44,Female,26.8,3326,126,45,3.79%,1.35%,0.21,8.06,0.6
1,25-34,Female,24.89,4026,128,40,3.18%,0.99%,0.19,6.18,0.62
2,45-54,Female,20.1,1899,69,26,3.63%,1.37%,0.29,10.58,0.77
3,≥55,Female,10.34,814,32,9,3.93%,1.11%,0.32,12.7,1.15
4,45-54,Male,8.32,811,26,10,3.21%,1.23%,0.32,10.26,0.83


In [None]:
dem_data.groupby('Gender')['Impressions'].sum()

Unnamed: 0_level_0,Impressions
Gender,Unnamed: 1_level_1
Female,11367
Male,4020
Unknown,122


## Begin statistical analysis

In [None]:
from scipy import stats
from statsmodels.stats.proportion import proportions_ztest

In [None]:
#Use a 2-sample z proportion test to see if the amount of impressions is significantly different for women and men.

male_impressions = 4020  # Total number of male impressions
female_impressions = 11367  # Total number of female impressions
total_impressions = male_impressions + female_impressions  # Combined total impressions

# Impression counts for men and women
imp_counts = np.array([male_impressions, female_impressions])

# Denominator for each group: the combined impression total, so each proportion is that gender's share of all impressions
impressions_n = np.array([total_impressions, total_impressions])

# Perform a z-test for proportions
stat_impressions, p_value_impressions = proportions_ztest(imp_counts, impressions_n)

# Print results

print(f"Z-statistic: {stat_impressions}")
print(f"P-value: {p_value_impressions}")

# Interpret the result
alpha = 0.05  # Significance level of 5%

if p_value_impressions < alpha:
    print("Reject the null hypothesis: There is a significant difference between male and female impressions.")
else:
    print("Fail to reject the null hypothesis: No significant difference between male and female impressions.")

Z-statistic: -83.76219716328238
P-value: 0.0
Reject the null hypothesis: There is a significant difference between male and female impressions.
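
The test above passes the combined total as the denominator for both groups, which treats the male and female shares as two independent samples even though every impression belongs to exactly one of them. As a cross-check, here is a minimal sketch of a one-sample alternative: it asks whether the female share of impressions differs from 50%, using a normal approximation and SciPy's exact binomial test. The 50% null value and the variable names are assumptions made for illustration and are not part of the original analysis.

```python
import numpy as np
from scipy import stats

# Impression totals taken from the groupby output above (Unknown excluded)
female_impressions = 11367
male_impressions = 4020
n_total = female_impressions + male_impressions

# Assumed null hypothesis: impressions are split evenly between the two genders
null_share = 0.5
female_share = female_impressions / n_total

# One-sample z-test using the standard error implied by the null share
se_null = np.sqrt(null_share * (1 - null_share) / n_total)
z_one_sample = (female_share - null_share) / se_null
p_normal = 2 * stats.norm.sf(abs(z_one_sample))

# Exact binomial test as a check on the normal approximation
exact = stats.binomtest(female_impressions, n_total, p=null_share)

print(f"Female share of impressions: {female_share:.3f}")
print(f"One-sample z-statistic: {z_one_sample:.2f}")
print(f"Normal-approximation p-value: {p_normal:.3g}")
print(f"Exact binomial p-value: {exact.pvalue:.3g}")
```

Both versions should agree on the direction of the result; the one-sample framing simply matches the question "is the split different from even?" more directly.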


In [None]:
dem_data.rename(columns={'Clicks (destination)': 'Clicks'}, inplace=True)

In [None]:
dem_data.groupby('Gender')['Clicks'].sum()

Unnamed: 0_level_0,Clicks
Gender,Unnamed: 1_level_1
Female,386
Male,143
Unknown,6


In [None]:
#Use a 2-sample z proportion test to see if the amount of clicks is significantly different for women and men.
male = 143  # Total number of male clicks
female = 386  # Total number of female clicks
total = male + female  # Combined total clicks

# Click counts for men and women
counts = np.array([male, female])

# Denominator for each group: the combined click total, so each proportion is that gender's share of all clicks
n = np.array([total, total])

# Perform a z-test for proportions
stat, p_value = proportions_ztest(counts, n)

# Print results

print(f"Z-statistic: {stat}")
print(f"P-value: {p_value}")

# Interpret the result
alpha = 0.05  # Significance level of 5%

if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between male and female clicks.")
else:
    print("Fail to reject the null hypothesis: No significant difference between male and female clicks.")

Z-statistic: -14.941473724202698
P-value: 1.770166676730958e-50
Reject the null hypothesis: There is a significant difference between male and female clicks.


In [None]:
dem_data.groupby('Gender')['Conversions'].sum()

Unnamed: 0_level_0,Conversions
Gender,Unnamed: 1_level_1
Female,128
Male,47
Unknown,1


In [None]:
#Use a 2-sample z proportion test to see if the amount of conversions is significantly different for women and men.
female = 128  # Total number of female conversions
male = 47  # Total number of male conversions
total = male + female  # Combined total conversions

# Conversion counts, female first (this ordering gives the positive z-statistic printed below)
counts = np.array([female, male])

# Denominator for each group: the combined conversion total, so each proportion is that gender's share of all conversions
n = np.array([total, total])

# Perform a z-test for proportions
stat, p_value = proportions_ztest(counts, n)

# Print results

print(f"Z-statistic: {stat}")
print(f"P-value: {p_value}")

# Interpret the result
alpha = 0.05  # Significance level of 5%

if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between male and female conversions.")
else:
    print("Fail to reject the null hypothesis: No significant difference between male and female conversions.")

Z-statistic: 8.65926423796255
P-value: 4.748190009149622e-18
Reject the null hypothesis: There is a significant difference between male and female conversions.


In [None]:
dem_data['CVR'] = dem_data['Conversions'] / dem_data['Impressions']

In [None]:
dem_data['CTR'] = dem_data['Clicks'] / dem_data['Impressions']

In [None]:
dem_data.groupby('Gender')[['Clicks','Conversions','Impressions']].sum()

Unnamed: 0_level_0,Clicks,Conversions,Impressions
Gender,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Female,386,128,11367
Male,143,47,4020
Unknown,6,1,122


In [None]:
# Use a z test to compare click through rates between male and females
# success counts and total viewers for CTR
male_successes = 143  # Number of clicks for males
female_successes = 386  # Number of clicks for females
male_total = 4020  # Total number of male views
female_total = 11367  # Total number of female views

# Counts of successes for male and female
successes = np.array([female_successes, male_successes])

# Total number of trials for male and female
n = np.array([female_total, male_total])

# Perform the z-test for proportions
z_stat, p_value = proportions_ztest(successes, n)

# Print results
print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")

# Interpretation of results
alpha = 0.05  # Industry standard significance level of 5%

if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between male and female click through rate.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between male and female click through rate.")
else:
    print("Fail to reject the null hypothesis: No significant difference between male and female click through rate.")


Z-statistic: -0.4827917089346444
P-value: 0.6292436383699052
Fail to reject the null hypothesis: No significant difference between male and female click through rate.
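
To make the calculation behind this result explicit, the following sketch recomputes the two-sample z-statistic by hand with a pooled variance, which is the standard textbook form of the test. The helper name `pooled_two_proportion_ztest` is hypothetical and introduced only for illustration.

```python
import numpy as np
from scipy import stats


def pooled_two_proportion_ztest(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for equality of two proportions using a pooled variance."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under the null both groups share one rate, estimated from the combined data
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value


# Female clicks / impressions first, then male, matching the cell above
z_ctr, p_ctr = pooled_two_proportion_ztest(386, 11367, 143, 4020)
print(f"Z-statistic: {z_ctr}")
print(f"P-value: {p_ctr}")
```

The same helper can be reused for CVR by passing conversions instead of clicks.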


In [None]:
# Use a z test to compare conversion rates between male and females
# success counts (conversions) and trials (impressions) for CVR
male_successes = 47  # Number of male conversions
female_successes = 128  # Number of female conversions
male_total = 4020  # Total number of male impressions
female_total = 11367  # Total number of female impressions

# Counts of successes for male and female
successes = np.array([female_successes, male_successes])

# Total number of trials for male and female
n = np.array([female_total, male_total])

# Perform the z-test for proportions
z_stat, p_value = proportions_ztest(successes, n)

# Print results
print(f"Z-statistic: {z_stat}")
print(f"P-value: {p_value}")

# Interpretation of results
alpha = 0.05  # Industry standard significance level of 5%

if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between male and female conversion rates.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between male and female conversion rates.")
else:
    print("Fail to reject the null hypothesis: No significant difference between male and female conversion rates.")


Z-statistic: -0.2214381765792326
P-value: 0.8247512695325454
Fail to reject the null hypothesis: No significant difference between male and female conversion rates.
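
A p-value alone does not show how large a difference the data could still be hiding. As a complement, this sketch computes an approximate 95% confidence interval for the difference in CVR (female minus male) using the unpooled Wald standard error. The choice of a Wald interval and of the 95% level are assumptions for illustration.

```python
import numpy as np
from scipy import stats

# Conversions and impressions by gender, from the groupby output above
female_conv, female_imp = 128, 11367
male_conv, male_imp = 47, 4020

cvr_female = female_conv / female_imp
cvr_male = male_conv / male_imp
diff = cvr_female - cvr_male

# Unpooled (Wald) standard error of the difference between two proportions
se_diff = np.sqrt(
    cvr_female * (1 - cvr_female) / female_imp
    + cvr_male * (1 - cvr_male) / male_imp
)

# Two-sided 95% interval
z_crit = stats.norm.ppf(0.975)
ci_low = diff - z_crit * se_diff
ci_high = diff + z_crit * se_diff

print(f"CVR difference (female - male): {diff:.5f}")
print(f"Approximate 95% CI: ({ci_low:.5f}, {ci_high:.5f})")
```

An interval that straddles zero is consistent with the z-test's failure to reject the null.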


In [None]:
dem_data_CVR = (dem_data.groupby('Age')['Conversions'].sum() / dem_data.groupby('Age')['Impressions'].sum())

In [None]:
dem_data_CVR

Unnamed: 0_level_0,0
Age,Unnamed: 1_level_1
18-24,0.006645
25-34,0.01029
35-44,0.012475
45-54,0.013573
Unknown,0.0
≥55,0.013889


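The table above allows a top-to-bottom comparison of CVRs: apart from the Unknown group, CVR rises steadily with age, from about 0.66% for 18-24 to about 1.39% for ≥55.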

In [None]:
dem_data_CTR = (dem_data.groupby('Age')['Clicks'].sum() / dem_data.groupby('Age')['Impressions'].sum())

In [None]:
dem_data_CTR

Unnamed: 0_level_0,0
Age,Unnamed: 1_level_1
18-24,0.02381
25-34,0.033727
35-44,0.036534
45-54,0.036317
Unknown,0.0
≥55,0.042484


In [None]:
dem_data.groupby('Age')[['Clicks','Conversions','Impressions']].sum()

Unnamed: 0_level_0,Clicks,Conversions,Impressions
Age,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
18-24,43,12,1806
25-34,177,54,5248
35-44,164,56,4489
45-54,99,37,2726
Unknown,0,0,16
≥55,52,17,1224


In [None]:
import scipy.stats as stats

# One-way ANOVA: Testing conversion rates across different Age Groups
age_groups = dem_data['Age']
conversion_rates = dem_data['CVR']

# Perform the one-way ANOVA
f_stat, p_value = stats.f_oneway(
    *[conversion_rates[age_groups == group] for group in dem_data['Age'].unique()]
)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between age groups.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between age groups.")
else:
    print("Fail to reject the null hypothesis: No significant difference between age groups.")


F-statistic: 1.3055326293859983
P-value: 0.3356341254848338
Fail to reject the null hypothesis: No significant difference between age groups.


I have some concerns about this ANOVA test, since each age group contributes only a few rows (one per gender category), which leaves the test with very little power. I would expect conversion rate (CVR) to vary across age groups, so I will perform a post-hoc test to analyze the pairwise comparisons.
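
Another way to work around the small number of rows is to test the aggregated counts directly. The sketch below runs a chi-square test of independence on converted versus non-converted impressions per age group, so every impression contributes to the test rather than every row. It uses the totals from the age-level table above and drops the Unknown group (16 impressions, 0 conversions), which is an assumption made here to avoid very small expected counts.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Totals per age group from the groupby output above (Unknown excluded by assumption)
age_labels = ["18-24", "25-34", "35-44", "45-54", "≥55"]
conversions = np.array([12, 54, 56, 37, 17])
impressions = np.array([1806, 5248, 4489, 2726, 1224])

# Contingency table: one row per age group, columns = [converted, not converted]
contingency = np.column_stack([conversions, impressions - conversions])

chi2_stat, chi2_p, dof, expected = chi2_contingency(contingency)

print(f"Chi-square statistic: {chi2_stat:.3f}")
print(f"Degrees of freedom: {dof}")
print(f"P-value: {chi2_p:.4f}")
```

This treats each impression as an independent trial, which is itself an approximation if the same user can be shown the ad more than once.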

In [None]:
# Perform post-hoc test using tukey HSD to see if any pairs have a p value of under 0.1 or close
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Perform Tukey's HSD test
tukey_results = pairwise_tukeyhsd(dem_data['CVR'], dem_data['Age'], alpha=0.05)

# Display Tukey's test results
print(tukey_results)

 Multiple Comparison of Means - Tukey HSD, FWER=0.05 
 group1  group2 meandiff p-adj   lower  upper  reject
-----------------------------------------------------
  18-24   25-34   0.0025 0.9999 -0.0382 0.0431  False
  18-24   35-44    0.003 0.9998 -0.0377 0.0436  False
  18-24   45-54   0.0247 0.3517 -0.0159 0.0654  False
  18-24 Unknown  -0.0048 0.9996 -0.0622 0.0527  False
  18-24     ≥55   0.0056 0.9958  -0.035 0.0463  False
  25-34   35-44   0.0005    1.0 -0.0401 0.0411  False
  25-34   45-54   0.0223 0.4516 -0.0184 0.0629  False
  25-34 Unknown  -0.0072 0.9973 -0.0647 0.0502  False
  25-34     ≥55   0.0032 0.9997 -0.0374 0.0438  False
  35-44   45-54   0.0218 0.4735 -0.0189 0.0624  False
  35-44 Unknown  -0.0077 0.9963 -0.0652 0.0497  False
  35-44     ≥55   0.0027 0.9999 -0.0379 0.0433  False
  45-54 Unknown  -0.0295 0.5145  -0.087 0.0279  False
  45-54     ≥55  -0.0191 0.5982 -0.0597 0.0215  False
Unknown     ≥55   0.0104 0.9859  -0.047 0.0679  False
-----------------------------------------------------

In [None]:
import scipy.stats as stats

# One-way ANOVA: Testing conversion rates across different genders
gender_groups = dem_data['Gender']
conversion_rates = dem_data['CVR']

# Perform the one-way ANOVA
f_stat, p_value = stats.f_oneway(
    *[conversion_rates[gender_groups == group] for group in dem_data['Gender'].unique()]
)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between genders.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between genders.")
else:
    print("Fail to reject the null hypothesis: No significant difference between genders.")


F-statistic: 0.0228046979830761
P-value: 0.9774923755205904
Fail to reject the null hypothesis: No significant difference between genders.


This was expected, since the earlier two-sample z-test found no significant difference in CVR by gender. However, I will still perform the same post-hoc test for completeness.

In [None]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
# Perform post-hoc test using tukey HSD to see if any pairs have a p value of under 0.1 or close

tukey_results = pairwise_tukeyhsd(dem_data['CVR'], dem_data['Gender'], alpha=0.05)

# Display Tukey's test results
print(tukey_results)

Multiple Comparison of Means - Tukey HSD, FWER=0.05 
group1  group2 meandiff p-adj   lower  upper  reject
----------------------------------------------------
Female    Male   0.0016 0.9871 -0.0254 0.0285  False
Female Unknown  -0.0005 0.9988 -0.0262 0.0253  False
  Male Unknown   -0.002 0.9767 -0.0278 0.0238  False
----------------------------------------------------


While I expected the outcome of the gender ANOVA test, I am still not confident in the age test. I am going to run a two-way ANOVA test for gender and age to see if that affects the outcome, and then perform a post-hoc test using Tukey's HSD to see which groups are different. If there are none, it may be due to having too little data.

In [None]:
dem_data['Age'] = pd.Categorical(dem_data['Age'])
dem_data['Gender'] = pd.Categorical(dem_data['Gender'])


In [None]:
dem_data.groupby(['Age', 'Gender'])[['Clicks','Conversions','Impressions', 'CVR']].mean()


Unnamed: 0_level_0,Unnamed: 1_level_0,Clicks,Conversions,Impressions,CVR
Age,Gender,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
18-24,Female,31.0,8.0,1302.0,0.006144
18-24,Male,12.0,4.0,489.0,0.00818
18-24,Unknown,0.0,0.0,15.0,0.0
25-34,Female,128.0,40.0,4026.0,0.009935
25-34,Male,48.0,14.0,1188.0,0.011785
25-34,Unknown,1.0,0.0,34.0,0.0
35-44,Female,126.0,45.0,3326.0,0.01353
35-44,Male,37.0,11.0,1136.0,0.009683
35-44,Unknown,1.0,0.0,27.0,0.0
45-54,Female,69.0,26.0,1899.0,0.013691


In [157]:
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

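# Attempt a two-way ANOVA on one observation per Age x Gender cell
# (conversion rates are entered here as percentages)
data = {
    'Age Group': ['18-24', '18-24', '25-34', '25-34', '35-44', '35-44', '45-54', '45-54', '55+', '55+'],
    'Gender': ['Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    # Note: the 25-34 Male entry (0.11785) does not match the table above (0.011785, i.e. 1.1785%);
    # it is kept as entered so that the output below still corresponds to this code.
    'Conversion Rate': [0.8180, 0.6144, 0.11785, 0.9935, 0.9683, 1.353, 1.233, 1.3691, 2.0202, 1.1057]
}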

df = pd.DataFrame(data)

# Convert to categorical variables
df['Age Group'] = pd.Categorical(df['Age Group'])
df['Gender'] = pd.Categorical(df['Gender'])

# Perform Two-Way ANOVA
model = ols('Q("Conversion Rate") ~ C(Q("Age Group")) + C(Gender) + C(Q("Age Group")):C(Gender)', data=df).fit()

# Get the ANOVA table
anova_table = anova_lm(model)

# Output the results
print(anova_table)

                              df        sum_sq   mean_sq    F  PR(>F)
C(Q("Age Group"))            4.0  1.387468e+00  0.346867  0.0     NaN
C(Gender)                    1.0  7.747872e-03  0.007748  0.0     NaN
C(Q("Age Group")):C(Gender)  4.0  8.977738e-01  0.224443  0.0     NaN
Residual                     0.0  5.423611e-30       inf  NaN     NaN



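The ANOVA test returned NaN for the p-values because the dataset is too small for this model: with one observation per Age x Gender cell, the model with an interaction term fits the data exactly and leaves zero residual degrees of freedom, so no F-statistic can be formed. To demonstrate the method for educational purposes, I will simulate additional data centered on the observed average conversion rate for each age and gender group.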
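
One way to obtain p-values from the real cell means without simulating anything is to drop the interaction term. The sketch below fits an additive model (age plus gender only), which leaves 4 residual degrees of freedom on the 10 cells. This rests on the assumption that age and gender do not interact, and with so few observations the test has very little power, so it is an illustration rather than a replacement for the analysis. The rates are the per-cell CVRs as fractions; the column names `age`, `gender`, and `cvr` are hypothetical.

```python
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# One row per Age x Gender cell, CVR as a fraction (Unknown categories excluded)
cells = pd.DataFrame({
    "age": ["18-24", "18-24", "25-34", "25-34", "35-44",
            "35-44", "45-54", "45-54", "55+", "55+"],
    "gender": ["Male", "Female"] * 5,
    "cvr": [0.008180, 0.006144, 0.011785, 0.009935, 0.009683,
            0.013530, 0.012330, 0.013691, 0.020202, 0.011057],
})

# Additive model: main effects only, no Age x Gender interaction
additive = ols("cvr ~ C(age) + C(gender)", data=cells).fit()
additive_table = anova_lm(additive)

print(additive_table)
print(f"Residual degrees of freedom: {additive.df_resid}")
```

Here the variation that the full model attributes to the interaction is instead used as the error term, which is why the F-statistics become computable.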

In [None]:
dem_data.groupby(['Age','Gender'])[['Clicks','Conversions','Impressions', 'CVR']].mean()


Unnamed: 0_level_0,Unnamed: 1_level_0,Clicks,Conversions,Impressions,CVR
Age,Gender,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
18-24,Female,31.0,8.0,1302.0,0.006144
18-24,Male,12.0,4.0,489.0,0.00818
18-24,Unknown,0.0,0.0,15.0,0.0
25-34,Female,128.0,40.0,4026.0,0.009935
25-34,Male,48.0,14.0,1188.0,0.011785
25-34,Unknown,1.0,0.0,34.0,0.0
35-44,Female,126.0,45.0,3326.0,0.01353
35-44,Male,37.0,11.0,1136.0,0.009683
35-44,Unknown,1.0,0.0,27.0,0.0
45-54,Female,69.0,26.0,1899.0,0.013691


In [None]:
import pandas as pd
import numpy as np

# Set a seed for reproducibility
np.random.seed(42)

# Define the parameters for the simulation
n = 1000  # total number of simulated observations (about 100 per Age x Gender cell)
age_groups = ['18-24', '25-34', '35-44', '45-54', '55+']
genders = ['Male', 'Female']

# Simulate the Age Group column (randomly assign one age group to each observation)
age_group = np.random.choice(age_groups, size=n, replace=True)

# Simulate the Gender column (randomly assign one gender to each observation)
gender = np.random.choice(genders, size=n, replace=True)

# Simulate Conversion Rates
# Each rate is drawn from a normal distribution centered on the observed mean for its cell.
# Note: a normal distribution is unbounded, so some simulated rates fall below 0.
# The means for each gender and age group are taken from the table above.
mean_conversion_rate = {
    'Male': [0.008180, 0.011785, 0.009683, 0.012330, 0.020202],
    'Female': [0.006144, 0.009935, 0.01353, 0.013691, 0.011057]
}

# Generate conversion rates based on gender and age group
conversion_rates = []
for a, g in zip(age_group, gender):
    # Get the mean for the corresponding age group and gender
    age_index = age_groups.index(a)
    # std=0.05 is several times larger than the cell means (roughly 0.01 to 0.02)
    conversion_rates.append(np.random.normal(loc=mean_conversion_rate[g][age_index], scale=0.05))

# Create the DataFrame
df = pd.DataFrame({
    'Age Group': age_group,
    'Gender': gender,
    'Conversion Rate': conversion_rates
})

print(df.head())


  Age Group  Gender  Conversion Rate
0     45-54  Female         0.083659
1       55+  Female         0.057289
2     35-44  Female         0.016512
3       55+  Female        -0.021290
4       55+  Female         0.045968
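
Row 3 above shows a negative conversion rate, which a real rate can never be; this is a side effect of drawing from a normal distribution. A possible alternative, sketched here, draws a number of conversions from a binomial distribution for each simulated row and divides by a fixed number of impressions, which keeps every rate between 0 and 1. The value of 500 impressions per row and the names `df_binom`, `n_rows`, and `impressions_per_row` are assumptions for illustration; this is not the simulation used in the rest of the notebook.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

age_groups = ['18-24', '25-34', '35-44', '45-54', '55+']
genders = ['Male', 'Female']

# Observed mean conversion rate per gender and age group (same values as the cell above)
mean_conversion_rate = {
    'Male': [0.008180, 0.011785, 0.009683, 0.012330, 0.020202],
    'Female': [0.006144, 0.009935, 0.01353, 0.013691, 0.011057],
}

n_rows = 1000              # total simulated observations
impressions_per_row = 500  # hypothetical impressions behind each observation

# Randomly assign an age group (by index) and a gender to each row
age_idx = rng.integers(0, len(age_groups), size=n_rows)
gender_sim = rng.choice(genders, size=n_rows)

# Look up the true rate for each row, then draw conversions from a binomial
p_true = np.array([mean_conversion_rate[g][i] for g, i in zip(gender_sim, age_idx)])
conversions_sim = rng.binomial(impressions_per_row, p_true)

df_binom = pd.DataFrame({
    'Age Group': np.array(age_groups)[age_idx],
    'Gender': gender_sim,
    'Conversion Rate': conversions_sim / impressions_per_row,
})

print(df_binom.head())
```

The spread of the simulated rates is then controlled by `impressions_per_row` instead of an arbitrary standard deviation.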


Now, I am going to compare the original data set with the new data set to make sure it seems close enough to the original rates.

In [None]:
df.groupby(['Gender', 'Age Group'])[['Conversion Rate']].mean()

Unnamed: 0_level_0,Unnamed: 1_level_0,Conversion Rate
Gender,Age Group,Unnamed: 2_level_1
Female,18-24,0.005111
Female,25-34,0.010563
Female,35-44,0.012082
Female,45-54,0.020928
Female,55+,0.019357
Male,18-24,0.014963
Male,25-34,0.01616
Male,35-44,0.013566
Male,45-54,0.012667
Male,55+,0.025944


In [None]:
dem_data.groupby(['Gender','Age'])['CVR'].mean()


Unnamed: 0_level_0,Unnamed: 1_level_0,CVR
Gender,Age,Unnamed: 2_level_1
Female,18-24,0.006144
Female,25-34,0.009935
Female,35-44,0.01353
Female,45-54,0.013691
Female,Unknown,
Female,≥55,0.011057
Male,18-24,0.00818
Male,25-34,0.011785
Male,35-44,0.009683
Male,45-54,0.01233


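The simulated means differ from the originals, which is expected: with a standard deviation of 0.05 and roughly 100 observations per cell, each cell mean carries a standard error of about 0.005. They are still close enough for the purposes of this demonstration.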

In [156]:
import statsmodels.api as sm
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Perform Two-Way ANOVA
model = ols('Q("Conversion Rate") ~ C(Q("Age Group")) + C(Gender) + C(Q("Age Group")):C(Gender)', data=df).fit()

# Get the ANOVA table
anova_table = anova_lm(model)

# Output the results
print(anova_table)

                                df    sum_sq   mean_sq         F    PR(>F)
C(Q("Age Group"))              4.0  0.018129  0.004532  1.814359  0.123820
C(Gender)                      1.0  0.002320  0.002320  0.928824  0.335404
C(Q("Age Group")):C(Gender)    4.0  0.010051  0.002513  1.005966  0.403338
Residual                     990.0  2.472953  0.002498       NaN       NaN


Now that there is enough data, the p-value is no longer NaN and can be used for evaluation.
- Age Group: The p-value (0.124) is greater than both 0.05 and 0.1, so I fail to reject the null hypothesis and cannot claim that any age group differs.

- Gender: The p-value (0.335) is greater than both 0.05 and 0.1, so I fail to reject the null hypothesis; there is no evidence that gender affects the conversion rate.

- Interaction (Age Group x Gender): The p-value (0.403) is greater than both 0.05 and 0.1, so I fail to reject the null hypothesis; there is no evidence of an interaction between age group and gender in their effect on conversion rates.

Since this is simulated rather than real data, the results are illustrative only, but they are sufficient for the purposes of this project.

Now I will run the post-hoc Tukey test to see whether any individual pair of groups has an adjusted p-value below 0.1.

In [None]:
from statsmodels.stats.multicomp import pairwise_tukeyhsd

In [None]:
tukey_age = pairwise_tukeyhsd(df['Conversion Rate'], df['Age Group'])
print(tukey_age)

# Tukey's HSD test for 'Gender' main effect
tukey_gender = pairwise_tukeyhsd(df['Conversion Rate'], df['Gender'])
print(tukey_gender)

# Tukey's HSD test for 'Age Group' and 'Gender' interaction
# First, you need to create a new factor that combines both 'Age Group' and 'Gender'
df['Age_Gender'] = df['Age Group'] + ' - ' + df['Gender']
tukey_interaction = pairwise_tukeyhsd(df['Conversion Rate'], df['Age_Gender'])
print(tukey_interaction)

Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj   lower  upper  reject
---------------------------------------------------
 18-24  25-34   0.0033 0.9663 -0.0104 0.0169  False
 18-24  35-44   0.0024 0.9891 -0.0113 0.0161  False
 18-24  45-54   0.0062 0.7088 -0.0072 0.0196  False
 18-24    55+   0.0123 0.0928 -0.0012 0.0257  False
 25-34  35-44  -0.0009 0.9998 -0.0149 0.0132  False
 25-34  45-54    0.003 0.9765 -0.0108 0.0167  False
 25-34    55+    0.009 0.3832 -0.0048 0.0228  False
 35-44  45-54   0.0038 0.9415 -0.0099 0.0176  False
 35-44    55+   0.0098 0.2893 -0.0039 0.0236  False
 45-54    55+    0.006 0.7399 -0.0075 0.0195  False
---------------------------------------------------
Multiple Comparison of Means - Tukey HSD, FWER=0.05
group1 group2 meandiff p-adj   lower  upper  reject
---------------------------------------------------
Female   Male   0.0029 0.3576 -0.0033 0.0091  False
---------------------------------------------------
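Multiple Comparison of Means - Tukey HSD, FWER=0.05
[interaction table truncated in the captured output]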

In [None]:
tukey_age.pvalues < 0.1

array([False, False, False,  True, False, False, False, False, False,
       False])

In [None]:
tukey_age.pvalues[3]

0.09278272471764437

Since this adjusted p-value is below 0.1, a threshold that is sometimes accepted, the pair could be treated as different or at least flagged for further investigation. The pair is 18-24 and 55+, meaning the youngest and oldest age groups might differ. Because the test was run on simulated data, the result is suggestive only. It is plausible, though, as older age groups are more likely to have money to spend on entertainment than younger viewers.

In [None]:
tukey_interaction.pvalues < 0.1

array([False, False, False, False, False, False, False, False,  True,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False])

In [None]:
tukey_interaction.pvalues[8]

0.09619245830318379

When examining the combined evaluation for gender and age, the smallest p-value, and the only one under 0.1, is found between the 18-24 female group and the 55+ male group. This result mirrors the pattern seen in the age-only evaluation, with the added insight that the largest gap occurs specifically between 18-24-year-old females and 55+ males. This suggests that male viewers aged 55 and over may be more likely to convert than women aged 18-24, and possibly than the other groups.

## Time Analysis

In [None]:
time_week1.head()

Unnamed: 0,Time,Cost,Cost per result,CPC (destination),Impressions,CTR (destination),Clicks (destination),Result rate,Results,Cost per conversion,Conversion rate (CVR),Conversions,Reach,"Cost per 1,000 people reached",Frequency,Currency
0,2024-05-21 16:00 - 16:59,0.0,-,0.0,0,0.00%,0,-,-,0.0,0.00%,0,0,0.0,0.0,USD
1,2024-05-21 10:00 - 10:59,0.0,-,0.0,1,0.00%,0,-,-,0.0,0.00%,0,1,0.0,1.0,USD
2,2024-05-21 06:00 - 06:59,0.0,-,0.0,1,0.00%,0,-,-,0.0,0.00%,0,1,0.0,1.0,USD
3,2024-05-20 19:00 - 19:59,0.0,-,0.0,0,0.00%,0,-,-,0.0,0.00%,0,0,0.0,0.0,USD
4,2024-05-20 18:00 - 18:59,0.0,-,0.0,1,100.00%,1,-,-,0.0,0.00%,0,1,0.0,1.0,USD


In [102]:
time_week1 = pd.read_csv('tiktok_time_week1_ff.csv')

In [103]:
time_week1.head()

Unnamed: 0,Date,Hour,Impressions,CTR (destination),Clicks (destination),Result rate,Results,Cost per conversion,Conversion rate (CVR),Conversions,Reach,"Cost per 1,000 people reached",Frequency,Currency
0,2024-05-21,16:00,0,0.00%,0,-,-,0.0,0.00%,0,0,0.0,0.0,USD
1,2024-05-21,10:00,1,0.00%,0,-,-,0.0,0.00%,0,1,0.0,1.0,USD
2,2024-05-21,6:00,1,0.00%,0,-,-,0.0,0.00%,0,1,0.0,1.0,USD
3,2024-05-20,19:00,0,0.00%,0,-,-,0.0,0.00%,0,0,0.0,0.0,USD
4,2024-05-20,18:00,1,100.00%,1,-,-,0.0,0.00%,0,1,0.0,1.0,USD


In [106]:
time_week1.rename(columns={'Clicks (destination)': 'Clicks'}, inplace=True)

In [109]:
time_week2.head()

Unnamed: 0,Date,Cost,Cost per result,CPC (destination),Impressions,CTR (destination),Clicks,Result rate,Results,Cost per conversion,Conversion rate (CVR),Conversions,Reach,"Cost per 1,000 people reached",Frequency,Currency
0,2024-06-28,0.0,-,0.0,1,0.00%,0,-,-,0.0,0.00%,0,1,0.0,1.0,USD
1,2024-06-27,3.3,-,0.25,571,2.28%,13,-,-,1.65,0.35%,2,564,5.85,1.01,USD
2,2024-06-26,21.67,-,0.19,3956,2.93%,116,-,-,0.8,0.68%,27,3692,5.87,1.07,USD
3,2024-06-25,21.14,-,0.22,3383,2.78%,94,-,-,0.62,1.01%,34,3034,6.97,1.12,USD
4,2024-06-24,13.89,-,0.18,1698,4.59%,78,-,-,0.41,2.00%,34,1599,8.69,1.06,USD


In [105]:
time_week2.rename(columns={'By Day': 'Date'}, inplace=True)

In [108]:
time_week2.rename(columns={'Clicks (destination)': 'Clicks'}, inplace=True)

In [120]:
# Convert the 'Date' column to datetime
time_week1['Date'] = pd.to_datetime(time_week1['Date'])

# Extract the weekday name (e.g., Monday, Tuesday, etc.)
time_week1['Weekday'] = time_week1['Date'].dt.strftime('%A')

In [127]:
time_week2['Date'] = pd.to_datetime(time_week2['Date'])

time_week2['Weekday'] = time_week2['Date'].dt.strftime('%A')

In [121]:
time_week1.head()

Unnamed: 0,Date,Hour,Impressions,CTR (destination),Clicks,Result rate,Results,Cost per conversion,Conversion rate (CVR),Conversions,Reach,"Cost per 1,000 people reached",Frequency,Currency,Weekday
0,2024-05-21,16:00,0,0.00%,0,-,-,0.0,0.00%,0,0,0.0,0.0,USD,Tuesday
1,2024-05-21,10:00,1,0.00%,0,-,-,0.0,0.00%,0,1,0.0,1.0,USD,Tuesday
2,2024-05-21,6:00,1,0.00%,0,-,-,0.0,0.00%,0,1,0.0,1.0,USD,Tuesday
3,2024-05-20,19:00,0,0.00%,0,-,-,0.0,0.00%,0,0,0.0,0.0,USD,Monday
4,2024-05-20,18:00,1,100.00%,1,-,-,0.0,0.00%,0,1,0.0,1.0,USD,Monday


In [122]:
time_week1.groupby('Weekday')['Impressions'].sum()

Unnamed: 0_level_0,Impressions
Weekday,Unnamed: 1_level_1
Friday,662
Monday,364
Saturday,1679
Sunday,3188
Tuesday,2
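
The groupby output above is sorted alphabetically, which makes the weekly pattern harder to read. A small sketch of one way to put the days in calendar order with an ordered categorical index is shown here; the totals are copied from the output above, and the names `weekday_order` and `ordered_totals` are introduced for illustration.

```python
import pandas as pd

# Calendar order for the weekday labels produced by dt.strftime('%A')
weekday_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Week 1 impression totals, copied from the groupby output above
totals = pd.Series(
    {"Friday": 662, "Monday": 364, "Saturday": 1679, "Sunday": 3188, "Tuesday": 2},
    name="Impressions",
)

# An ordered categorical index sorts by category position instead of alphabetically
totals.index = pd.CategoricalIndex(totals.index, categories=weekday_order, ordered=True)
ordered_totals = totals.sort_index()

print(ordered_totals)
```

In the notebook itself, the same effect could be had by converting the `Weekday` column to an ordered categorical before grouping.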


I will run an ANOVA test to see whether impressions differ by weekday.

In [123]:
import scipy.stats as stats

# One-way ANOVA: Testing impressions across different days
day = time_week1['Weekday']
impressions = time_week1['Impressions']

# Perform the one-way ANOVA
f_stat, p_value = stats.f_oneway(
    *[impressions[day == group] for group in time_week1['Weekday'].unique()]
)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the impressions for different days.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between the impressions for different days.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the impressions for different days.")


F-statistic: 25.123496883229897
P-value: 7.345207049900165e-13
Reject the null hypothesis: There is a significant difference between the impressions for different days.
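The same ANOVA cell is repeated below for clicks, conversions, and hours, so it could be wrapped in a small helper. The following is only a sketch: the function name `anova_by_group` and the toy data are assumptions introduced for illustration, not part of the original notebook.

```python
import pandas as pd
from scipy import stats


def anova_by_group(df, group_col, value_col):
    """Run a one-way ANOVA of value_col across the levels of group_col.

    Returns (F, p). Returns (nan, nan) when there are no within-group
    degrees of freedom, for example when every group has a single row.
    """
    samples = [values.to_numpy() for _, values in df.groupby(group_col)[value_col]]
    n_total = sum(len(s) for s in samples)
    if len(samples) < 2 or n_total <= len(samples):
        return float("nan"), float("nan")
    f_stat, p_value = stats.f_oneway(*samples)
    return float(f_stat), float(p_value)


# Toy data (hypothetical, not taken from the campaign export).
toy = pd.DataFrame({
    "Weekday": ["Mon"] * 3 + ["Tue"] * 3 + ["Wed"] * 3,
    "Impressions": [1, 2, 3, 2, 3, 4, 3, 4, 5],
})
toy_f, toy_p = anova_by_group(toy, "Weekday", "Impressions")
print(f"F-statistic: {toy_f:.3f}, P-value: {toy_p:.3f}")
```

In the notebook, a call such as `anova_by_group(time_week1, 'Weekday', 'Impressions')` would stand in for the cell above.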


In [126]:
# Post-hoc Tukey HSD to find which weekday pairs differ (family-wise error rate held at alpha=0.05)
from statsmodels.stats.multicomp import pairwise_tukeyhsd

tukey_results = pairwise_tukeyhsd(time_week1['Impressions'], time_week1['Weekday'], alpha=0.05)

# Display Tukey's test results
print(tukey_results)

    Multiple Comparison of Means - Tukey HSD, FWER=0.05     
 group1   group2   meandiff p-adj    lower    upper   reject
------------------------------------------------------------
  Friday   Monday  -88.9216 0.0001 -140.6812 -37.1619   True
  Friday Saturday   -40.375 0.1661  -90.1266   9.3766  False
  Friday   Sunday      22.5 0.7122  -27.2516  72.2516  False
  Friday  Tuesday -109.6667 0.0015 -186.7415 -32.5918   True
  Monday Saturday   48.5466 0.0018   13.9933  83.0999   True
  Monday   Sunday  111.4216    0.0   76.8683 145.9749   True
  Monday  Tuesday  -20.7451 0.9134  -89.0037  47.5135  False
Saturday   Sunday    62.875    0.0   31.4093  94.3407   True
Saturday  Tuesday  -69.2917  0.038 -136.0404  -2.5429   True
  Sunday  Tuesday -132.1667    0.0 -198.9154 -65.4179   True
------------------------------------------------------------


The Tukey test shows that, at an alpha of 0.05, seven weekday pairs differ significantly in impressions: Friday and Monday, Friday and Tuesday, Monday and Saturday, Monday and Sunday, Saturday and Sunday, Saturday and Tuesday, and Sunday and Tuesday. Sunday is the leading day, which is why it differs significantly from every day except Friday.
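Reading the significant pairs off the table by eye is error-prone, so the filter can be sketched in code. The adjusted p-values below are transcribed from the Tukey output for week 1 impressions; the helper name `significant_pairs` is an assumption made for this example.

```python
from collections import Counter

# (group1, group2, p-adj) transcribed from the Tukey HSD table for week 1 impressions.
tukey_rows = [
    ("Friday", "Monday", 0.0001),
    ("Friday", "Saturday", 0.1661),
    ("Friday", "Sunday", 0.7122),
    ("Friday", "Tuesday", 0.0015),
    ("Monday", "Saturday", 0.0018),
    ("Monday", "Sunday", 0.0),
    ("Monday", "Tuesday", 0.9134),
    ("Saturday", "Sunday", 0.0),
    ("Saturday", "Tuesday", 0.038),
    ("Sunday", "Tuesday", 0.0),
]


def significant_pairs(rows, alpha=0.05):
    """Return the (group1, group2) pairs whose adjusted p-value is below alpha."""
    return [(g1, g2) for g1, g2, p in rows if p < alpha]


sig = significant_pairs(tukey_rows)
# Count how many significant pairs each weekday takes part in.
day_counts = Counter(day for pair in sig for day in pair)

for g1, g2 in sig:
    print(f"{g1} vs {g2}")
print(dict(day_counts))
```

With the live result object, the same threshold can be applied directly to `tukey_results.pvalues`, as a later cell in this notebook does.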

In [128]:
time_week1.groupby('Weekday')['Clicks'].sum()

Unnamed: 0_level_0,Clicks
Weekday,Unnamed: 1_level_1
Friday,21
Monday,10
Saturday,67
Sunday,136
Tuesday,0


In [133]:
import scipy.stats as stats

# One-way ANOVA: Testing clicks across different days
day = time_week1['Weekday']
clicks = time_week1['Clicks']

# Perform the one-way ANOVA
f_stat, p_value = stats.f_oneway(
    *[clicks[day == group] for group in time_week1['Weekday'].unique()]
)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the clicks for different days.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between the clicks for different days.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the clicks for different days.")


F-statistic: 12.785302299505613
P-value: 7.69886663469155e-08
Reject the null hypothesis: There is a significant difference between the clicks for different days.


In [132]:
# Perform post-hoc test using tukey HSD to see if any pairs have a p value of under 0.1 or close

tukey_results = pairwise_tukeyhsd(time_week1['Clicks'], time_week1['Weekday'], alpha=0.05)

# Display Tukey's test results
print(tukey_results)

  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1   group2  meandiff p-adj   lower   upper  reject
--------------------------------------------------------
  Friday   Monday  -2.9118 0.0928 -6.1181  0.2946  False
  Friday Saturday  -0.7083 0.9672 -3.7903  2.3736  False
  Friday   Sunday   2.1667 0.2918 -0.9153  5.2486  False
  Friday  Tuesday     -3.5 0.2522 -8.2745  1.2745  False
  Monday Saturday   2.2034 0.0405   0.063  4.3439   True
  Monday   Sunday   5.0784    0.0   2.938  7.2189   True
  Monday  Tuesday  -0.5882  0.995 -4.8166  3.6402  False
Saturday   Sunday    2.875 0.0009  0.9258  4.8242   True
Saturday  Tuesday  -2.7917 0.3317 -6.9265  1.3432  False
  Sunday  Tuesday  -5.6667 0.0024 -9.8015 -1.5318   True
--------------------------------------------------------


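At an alpha of 0.05, six of the ten weekday pairs are not significantly different in clicks: Friday and Monday (p = 0.0928, which would pass only at an alpha of 0.1), Friday and Saturday, Friday and Sunday, Friday and Tuesday, Monday and Tuesday, and Saturday and Tuesday. Sunday is the leading day, which is why it differs significantly from every day except Friday. Friday has the second-highest mean clicks per hourly row (its total of 21 is below Saturday's 67 only because it has fewer recorded hours), and its confidence intervals are wide, which is why Friday and Sunday are not significantly different.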

In [115]:
time_week1.groupby('Date')['Conversions'].sum()

Unnamed: 0_level_0,Conversions
Date,Unnamed: 1_level_1
2024-05-17,11
2024-05-18,28
2024-05-19,37
2024-05-20,3
2024-05-21,0


In [139]:
import scipy.stats as stats

# One-way ANOVA: Testing conversions across different days
day = time_week1['Weekday']
conv = time_week1['Conversions']

# Perform the one-way ANOVA
f_stat, p_value = stats.f_oneway(
    *[conv[day == group] for group in time_week1['Weekday'].unique()]
)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the conversions for different days.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between the conversions for different days.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the conversions for different days.")


F-statistic: 4.218264636299403
P-value: 0.004103409418643862
Reject the null hypothesis: There is a significant difference between the conversions for different days.


In [136]:
# Perform post-hoc test using tukey HSD to see if any pairs have a p value of under 0.1 or close

tukey_results = pairwise_tukeyhsd(time_week1['Conversions'], time_week1['Weekday'], alpha=0.05)

# Display Tukey's test results
print(tukey_results)

  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
 group1   group2  meandiff p-adj   lower   upper  reject
--------------------------------------------------------
  Friday   Monday  -1.6569 0.0492 -3.3102 -0.0035   True
  Friday Saturday  -0.6667 0.7654 -2.2558  0.9225  False
  Friday   Sunday  -0.2917 0.9857 -1.8808  1.2975  False
  Friday  Tuesday  -1.8333 0.2379 -4.2953  0.6286  False
  Monday Saturday   0.9902 0.0995 -0.1135  2.0939  False
  Monday   Sunday   1.3652 0.0079  0.2615  2.4689   True
  Monday  Tuesday  -0.1765 0.9994 -2.3568  2.0039  False
Saturday   Sunday    0.375 0.8334 -0.6301  1.3801  False
Saturday  Tuesday  -1.1667 0.5451 -3.2988  0.9654  False
  Sunday  Tuesday  -1.5417  0.265 -3.6738  0.5904  False
--------------------------------------------------------


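Conversions by weekday were more alike than impressions or clicks: at an alpha of 0.05, only Friday and Monday (p = 0.0492) and Monday and Sunday (p = 0.0079) differ significantly. Note that this test compares conversion counts, not conversion rates (CVR).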
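Since the tests above compare raw counts, a rate view can be sketched from the weekday totals already printed for week 1. This assumes CTR and CVR are both computed against impressions, which is inferred from the week 2 export (for example 27 conversions / 3,956 impressions = 0.68%) rather than stated in the source. The variable and function names are illustrative.

```python
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Week 1 totals copied from the groupby outputs above.
impressions = {"Friday": 662, "Monday": 364, "Saturday": 1679, "Sunday": 3188, "Tuesday": 2}
clicks = {"Friday": 21, "Monday": 10, "Saturday": 67, "Sunday": 136, "Tuesday": 0}
conversions_by_date = {
    "2024-05-17": 11, "2024-05-18": 28, "2024-05-19": 37,
    "2024-05-20": 3, "2024-05-21": 0,
}

# Map each date to its weekday name (date.weekday() returns 0 for Monday).
conversions = {
    WEEKDAYS[date.fromisoformat(d).weekday()]: n
    for d, n in conversions_by_date.items()
}


def rate(numerator, denominator):
    """Percentage rate; 0.0 when the denominator is zero."""
    return 100.0 * numerator / denominator if denominator else 0.0


rates = {
    day: {
        "CTR": rate(clicks[day], impressions[day]),          # assumed: clicks / impressions
        "CVR": rate(conversions[day], impressions[day]),     # assumed: conversions / impressions
    }
    for day in impressions
}

for day, r in rates.items():
    print(f"{day}: CTR {r['CTR']:.2f}%, CVR {r['CVR']:.2f}%")
```

Days with very few impressions, such as Tuesday here, produce rates that are too noisy to interpret.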

In [137]:
time_week2.groupby('Weekday')['Impressions'].sum()

Unnamed: 0_level_0,Impressions
Weekday,Unnamed: 1_level_1
Friday,1
Monday,1698
Thursday,571
Tuesday,3383
Wednesday,3956


In [160]:
import scipy.stats as stats

# One-way ANOVA: Testing impressions across different days
day = time_week2['Weekday']
impressions = time_week2['Impressions']

# Perform the one-way ANOVA
f_stat, p_value = stats.f_oneway(
    *[impressions[day == group] for group in time_week2['Weekday'].unique()]
)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the impressions for different days.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between the impressions for different days.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the impressions for different days.")


F-statistic: nan
P-value: nan
Fail to reject the null hypothesis: No significant difference between the impressions for different days.


  if _f_oneway_is_too_small(samples):


In [142]:
# Perform post-hoc test using tukey HSD to see if any pairs have a p value of under 0.1 or close

tukey_results = pairwise_tukeyhsd(time_week2['Conversions'], time_week2['Weekday'], alpha=0.05)

# Display Tukey's test results
print(tukey_results)

Multiple Comparison of Means - Tukey HSD, FWER=0.05 
 group1    group2  meandiff p-adj lower upper reject
----------------------------------------------------
  Friday    Monday     34.0   nan   nan   nan  False
  Friday  Thursday      2.0   nan   nan   nan  False
  Friday   Tuesday     34.0   nan   nan   nan  False
  Friday Wednesday     27.0   nan   nan   nan  False
  Monday  Thursday    -32.0   nan   nan   nan  False
  Monday   Tuesday      0.0   nan   nan   nan  False
  Monday Wednesday     -7.0   nan   nan   nan  False
Thursday   Tuesday     32.0   nan   nan   nan  False
Thursday Wednesday     25.0   nan   nan   nan  False
 Tuesday Wednesday     -7.0   nan   nan   nan  False
----------------------------------------------------


  return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
  ret = ret.dtype.type(ret / rcount)


I am unable to run a one-way ANOVA (or the Tukey test) on week 2. The week 2 export has no hourly breakdown, so each weekday has a single observation and the within-group variance is undefined, which is why both outputs are `nan`. Therefore, I will do a manual analysis for the sake of the project.
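The `nan` result follows from the F statistic itself: it divides by the within-group degrees of freedom, N - k, which is zero when every group has one row. The following hand-rolled sketch (the function `one_way_anova_f` is written for illustration, not taken from SciPy) makes that visible using the week 2 daily impressions.

```python
import math


def one_way_anova_f(groups):
    """Compute the one-way ANOVA F statistic by hand.

    Returns nan when the between- or within-group degrees of freedom
    are zero, or when there is no within-group variation.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    df_between = k - 1
    df_within = n_total - k
    if df_between < 1 or df_within < 1:
        return math.nan

    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_within = ss_within / df_within
    if ms_within == 0:
        return math.nan
    return (ss_between / df_between) / ms_within


# Week 2: one row per day, so N - k = 5 - 5 = 0.
week2_impressions = [[1], [571], [3956], [3383], [1698]]
week2_f = one_way_anova_f(week2_impressions)

# Toy groups (hypothetical) with three rows each, so the test is defined.
toy_f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]])

print(week2_f, toy_f)
```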

In [147]:
time_week2.groupby('Weekday')['Impressions'].sum().sort_values(ascending=False)

Unnamed: 0_level_0,Impressions
Weekday,Unnamed: 1_level_1
Wednesday,3956
Tuesday,3383
Monday,1698
Thursday,571
Friday,1


Wednesday and Tuesday are close (Tuesday is about 14% lower), but Monday is about 57% below Wednesday (roughly 50% below Tuesday) and Thursday is about 86% below Wednesday. With drops this large, it is reasonable to treat those days as different even without a formal test.
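The percentage drops quoted for the manual comparison can be reproduced with a short calculation. This is a minimal sketch using the week 2 impression totals above; the helper name `pct_drop_from_peak` is an assumption for this example.

```python
# Week 2 impression totals copied from the groupby output above.
week2_impressions = {
    "Wednesday": 3956,
    "Tuesday": 3383,
    "Monday": 1698,
    "Thursday": 571,
    "Friday": 1,
}


def pct_drop_from_peak(totals):
    """Percentage by which each day falls below the best day, rounded to 0.1."""
    peak = max(totals.values())
    return {day: round(100.0 * (peak - n) / peak, 1) for day, n in totals.items()}


drops = pct_drop_from_peak(week2_impressions)
for day, drop in drops.items():
    print(f"{day}: {drop}% below the peak day")
```

The same function applies unchanged to the clicks and conversions totals.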

In [148]:
time_week2.groupby('Weekday')['Clicks'].sum().sort_values(ascending=False)

Unnamed: 0_level_0,Clicks
Weekday,Unnamed: 1_level_1
Wednesday,116
Tuesday,94
Monday,78
Thursday,13
Friday,0


Again, Wednesday and Tuesday are fairly close, and this time Monday is not far behind either: about 33% below Wednesday and 17% below Tuesday. However, Thursday and Friday are low.

In [149]:
time_week2.groupby('Weekday')['Conversions'].sum().sort_values(ascending=False)

Unnamed: 0_level_0,Conversions
Weekday,Unnamed: 1_level_1
Monday,34
Tuesday,34
Wednesday,27
Thursday,2
Friday,0


Finally, Monday, Tuesday, and Wednesday received roughly the same number of conversions, while Thursday and Friday received far fewer.

In [152]:
import scipy.stats as stats

# One-way ANOVA: Testing impressions across different hours of the day
hour = time_week1['Hour']
impressions = time_week1['Impressions']

# Perform the one-way ANOVA
f_stat, p_value = stats.f_oneway(
    *[impressions[hour == group] for group in time_week1['Hour'].unique()]
)

# Output the results
print(f"F-statistic: {f_stat}")
print(f"P-value: {p_value}")

# Decision based on p-value
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: There is a significant difference between the impressions for different times.")
elif p_value < 0.1:
    print("Moderate evidence against the null hypothesis: There is a moderate difference between the impressions for different times.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the impressions for different times.")


F-statistic: 0.6366794490234077
P-value: 0.8801486163223359
Fail to reject the null hypothesis: No significant difference between the impressions for different times.


In [154]:
# Perform post-hoc test using tukey HSD to see if any pairs have a p value of under 0.1 or close

tukey_results = pairwise_tukeyhsd(time_week1['Impressions'], time_week1['Hour'], alpha=0.05)

# Display Tukey's test results
print(tukey_results)

  Multiple Comparison of Means - Tukey HSD, FWER=0.05   
group1 group2  meandiff p-adj    lower    upper   reject
--------------------------------------------------------
  0:00  10:00    0.6667    1.0 -197.1773 198.5107  False
  0:00  11:00      -6.0    1.0  -203.844  191.844  False
  0:00  12:00       5.0    1.0  -192.844  202.844  False
  0:00  13:00    4.6667    1.0 -193.1773 202.5107  False
  0:00  14:00    0.3333    1.0 -197.5107 198.1773  False
  0:00  15:00   10.8333    1.0  -210.363 232.0296  False
  0:00  16:00   -9.9167    1.0 -194.9828 175.1494  False
  0:00  17:00      19.0    1.0  -178.844  216.844  False
  0:00  18:00    0.8333    1.0 -184.2328 185.8994  False
  0:00  19:00    8.5833    1.0 -176.4828 193.6494  False
  0:00   1:00  -10.3333    1.0 -208.1773 187.5107  False
  0:00  20:00  114.6667 0.8369  -83.1773 312.5107  False
  0:00  21:00      37.0    1.0  -160.844  234.844  False
  0:00  22:00      43.0    1.0  -154.844  240.844  False
  0:00  23:00   18.6667    1.0 

In [155]:
tukey_results.pvalues < 0.1

array([False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,

In [162]:
time_week1.groupby('Hour')['Impressions'].sum().sort_values(ascending=False).head(10)

Unnamed: 0_level_0,Impressions
Hour,Unnamed: 1_level_1
20:00,565
22:00,350
21:00,332
19:00,329
18:00,298
17:00,278
23:00,277
16:00,255
9:00,243
12:00,236


# Results

## Demographics

In this analysis, I explored the relationship between user demographics (gender and age) and key performance indicators (KPIs) such as impressions, clicks, and conversions. The goal was to understand if there were statistically significant differences between men and women in terms of these metrics and if age had any impact on conversion and click-through rates (CTR).

To achieve this, I applied a series of hypothesis tests and analysis of variance (ANOVA) techniques, including:

1. **Hypothesis Test for Gender Differences**: I performed a hypothesis test to determine whether women had more impressions, clicks, and conversions than men. Using the resultant p-value and an alpha of 0.05, women have a statistically different number of impressions and clicks. However, there is no significant difference in conversions between men and women.
  
2. **2-Way Z-Test for Conversion (CVR) and Click Rates (CTR)**: A two-way z-test was used to investigate conversion and click rates by gender. Using an alpha of 0.05 again, there was no significant difference in CVR or CTR between men and women. Even though women accounted for the majority of impressions and clicks, the overall CVR and CTR of the two genders were statistically indistinguishable.
  
3. **One-Way ANOVA for Age Effect**: I conducted a one-way ANOVA to assess whether age influenced conversion rates, as it was revealed gender does not. The test revealed that age had no significant effect on CVR.

4. **One-Way ANOVA for Gender and KPIs**: A one-way ANOVA was also performed to see if gender had an effect on conversion rates. The test revealed that gender had no significant effect on CVR, which can be seen in the z-test as well.

5. **Two-Way ANOVA with Post-Hoc Tukey Test**: To refine the analysis, a two-way ANOVA with a post-hoc Tukey test was applied to assess the effect of both age and gender on conversion rates. Using an alpha of 0.05, there was no significant difference between any of the categories. However, in the post-hoc Tukey test, the following pairs had a p-value below 0.1, a looser but still commonly used alpha:
  - Age groups 18-25 and 55+
  - Women in the age group 18-25 and men in the age group 55+

This suggests, at the 0.1 level, that older age groups, and older men specifically, are more likely to convert than younger age groups, and young women specifically. One plausible explanation is that older age groups tend to have more money to spend on entertainment than younger age groups.

This series of analyses provides valuable insights into how user demographics may affect conversion behaviors and helps identify actionable trends for improving performance across different user groups.

## Time Analysis

For this analysis, I examined data from an advertising campaign that ran over two different weeks.

**Week 1**: I performed one-way ANOVA tests to determine whether impressions, clicks, and conversions differed significantly across days of the week. All three tests revealed a significant difference at an alpha of 0.05, with Sunday leading in totals for all three metrics, followed by Saturday. To further explore the differences between the days, I used a post-hoc Tukey test, which compared all the days in pairs.

**Week 2**: The data contained only one row per day, which is insufficient to run an ANOVA test, so I evaluated the trends manually. Wednesday and Tuesday outperformed the other days in impressions and clicks. Monday lagged behind in impressions and clicks but tied Tuesday for the most conversions (34 each), ahead of Wednesday (27). Thursday and Friday consistently had low performance across all metrics.

Lastly, I conducted an ANOVA test for impressions by hour of the day for week 1. The results showed no significant difference between hours, even at the looser alpha of 0.1 (p ≈ 0.88). However, the top five hours each drew roughly 300 or more impressions, with the peak hour exceeding 500. The top five hours, in order, were 8 PM, 10 PM, 9 PM, 7 PM, and 6 PM, all in the evening, when people are generally more available after work and school.