<a href="https://colab.research.google.com/github/DataSparkAJ/marketing-campaign-impact-analysis/blob/main/Statistics_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Business Problem

Did the marketing campaign increase daily sales?

# Data Generation
This dataset simulates daily sales for 60 days: 30 days before the marketing campaign and 30 days after it.
Sales after the campaign are drawn from a normal distribution with a higher mean (2,200 versus 2,000) to represent the expected improvement due to marketing.

In [7]:
import numpy as np
import pandas as pd

np.random.seed(42)

# Generate sales before campaign
before_sales = np.random.normal(loc=2000, scale=200, size=30)

# Generate sales after campaign (slightly higher)
after_sales = np.random.normal(loc=2200, scale=220, size=30)

data = pd.DataFrame({
    "Day": range(1, 61),
    "Sales": np.concatenate([before_sales, after_sales]),
    "Campaign": ["Before"]*30 + ["After"]*30
})

data.head(), data.tail()


(   Day        Sales Campaign
 0    1  2099.342831   Before
 1    2  1972.347140   Before
 2    3  2129.537708   Before
 3    4  2304.605971   Before
 4    5  1953.169325   Before,
     Day        Sales Campaign
 55   56  2404.881626    After
 56   57  2015.372145    After
 57   58  2131.973277    After
 58   59  2272.877955    After
 59   60  2414.619928    After)

# Exploratory Analysis
Show:

* Mean

* Std dev

* Grouped summary

In [3]:
data.groupby("Campaign")["Sales"].describe()


| Campaign | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| After | 30.0 | 2173.344257 | 204.842484 | 1768.872573 | 2043.994996 | 2185.793994 | 2319.825594 | 2607.501201 |
| Before | 30.0 | 1962.370621 | 180.001285 | 1617.343951 | 1881.78982 | 1953.170967 | 2072.067069 | 2315.842563 |


Interpretation:

Before the marketing campaign, the average daily sales were approximately ₹1,962, with a standard deviation of ₹180, indicating moderate variability in daily performance.
After the campaign, the average daily sales increased to approximately ₹2,173, with a standard deviation of ₹205.

This shows an initial indication that sales improved after the campaign, but statistical testing is required to confirm whether this difference is significant.
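
To put the two means on a common scale, the summary can also be expressed as an absolute and a relative lift. The following is a minimal sketch, assuming the same seed and parameters as the Data Generation section; the names `summary`, `cv_pct`, `abs_lift`, and `pct_lift` are illustrative and not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Recreate the simulated data with the same seed and parameters as the Data Generation section
np.random.seed(42)
before_sales = np.random.normal(loc=2000, scale=200, size=30)
after_sales = np.random.normal(loc=2200, scale=220, size=30)

data = pd.DataFrame({
    "Day": range(1, 61),
    "Sales": np.concatenate([before_sales, after_sales]),
    "Campaign": ["Before"] * 30 + ["After"] * 30,
})

# Per-period mean and sample standard deviation (ddof=1, as in describe())
summary = data.groupby("Campaign")["Sales"].agg(["mean", "std"])

# Coefficient of variation: spread relative to the mean, in percent
summary["cv_pct"] = 100 * summary["std"] / summary["mean"]

# Absolute and relative change in mean daily sales (illustrative names)
abs_lift = summary.loc["After", "mean"] - summary.loc["Before", "mean"]
pct_lift = 100 * abs_lift / summary.loc["Before", "mean"]

print(summary.round(2))
print(f"Absolute lift: {abs_lift:.2f}, relative lift: {pct_lift:.2f}%")
```

The relative lift is often easier to communicate to a business audience than the raw difference, while the coefficient of variation shows whether day-to-day variability changed in proportion to the mean.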

# Hypothesis Testing
Show:

* H₀ and H₁

* t-test

* p-value

* Decision

Hypotheses:

H₀ (Null Hypothesis): The marketing campaign did not change the average daily sales.

H₁ (Alternative Hypothesis): The marketing campaign increased the average daily sales.

In [4]:
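from scipy import stats

# Split daily sales into the two campaign periods
before = data[data["Campaign"]=="Before"]["Sales"]
after = data[data["Campaign"]=="After"]["Sales"]

# Independent two-sample t-test (pooled variance, two-sided by default)
t_stat, p_value = stats.ttest_ind(after, before)

t_stat, p_value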


(np.float64(4.237566032679272), np.float64(8.195104009397376e-05))

Interpretation:

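The two-sample t-test produced a t-statistic of approximately 4.24 and a two-sided p-value of about 0.00008 (less than 0.001).
`ttest_ind` reports a two-sided p-value by default; because the t-statistic is positive, the one-sided p-value for H₁ is half of this value, so the conclusion is the same either way.
Since the p-value is far below the 0.05 significance level, we reject the null hypothesis.

This provides strong statistical evidence that average daily sales were higher after the marketing campaign than before it.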
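
Because H₁ is directional, the test can also be run one-sided. The sketch below is one possible variant, not the notebook's original method: it uses Welch's t-test (`equal_var=False`), which does not assume equal variances, together with `alternative="greater"`, a parameter available in SciPy 1.6 and later. The data are regenerated with the same seed so the block runs on its own.

```python
import numpy as np
from scipy import stats

# Recreate the two samples with the same seed and parameters as the notebook
np.random.seed(42)
before = np.random.normal(loc=2000, scale=200, size=30)
after = np.random.normal(loc=2200, scale=220, size=30)

# Test used in the notebook: pooled variance, two-sided
t_two, p_two = stats.ttest_ind(after, before)

# Variant: Welch's t-test, one-sided (H1: mean of `after` is greater than mean of `before`)
t_one, p_one = stats.ttest_ind(after, before, equal_var=False, alternative="greater")

print(f"Pooled, two-sided: t = {t_two:.3f}, p = {p_two:.2e}")
print(f"Welch, one-sided:  t = {t_one:.3f}, p = {p_one:.2e}")
```

With equal group sizes the pooled and Welch t-statistics coincide; only the degrees of freedom, and therefore the p-value, differ slightly.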

# Confidence Interval
Show:

* Mean difference

* CI range

* Interpretation

In [5]:
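import numpy as np

# Difference in mean daily sales (after minus before)
mean_diff = after.mean() - before.mean()

# Standard error of the difference, using unpooled sample variances
std_diff = np.sqrt((before.var()/len(before)) + (after.var()/len(after)))

# Approximate 95% interval using the normal critical value 1.96
ci_low = mean_diff - 1.96 * std_diff
ci_high = mean_diff + 1.96 * std_diff

mean_diff, ci_low, ci_high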


(np.float64(210.9736357064196),
 np.float64(113.3920705353549),
 np.float64(308.5552008774843))

Interpretation:

Average daily sales were ₹210.97 higher after the campaign than before it.
The approximate 95% confidence interval (computed with the normal critical value 1.96) ranges from ₹113.39 to ₹308.56, meaning we are 95% confident that the true increase in average daily sales lies within this range.

Since the entire interval is above zero, the increase is statistically significant at the 5% level.
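
With only 30 observations per group, a t critical value is slightly more conservative than 1.96. The sketch below shows one way to compute such an interval, assuming the Welch-Satterthwaite approximation for the degrees of freedom; the names `se_diff`, `dof`, `ci_t`, and `ci_z` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy import stats

# Recreate the two samples with the same seed and parameters as the notebook
np.random.seed(42)
before = np.random.normal(loc=2000, scale=200, size=30)
after = np.random.normal(loc=2200, scale=220, size=30)

mean_diff = after.mean() - before.mean()

# Variance of each sample mean; ddof=1 matches the pandas default used in the notebook
vb = before.var(ddof=1) / len(before)
va = after.var(ddof=1) / len(after)
se_diff = np.sqrt(vb + va)

# Welch-Satterthwaite approximation to the degrees of freedom
dof = (vb + va) ** 2 / (vb ** 2 / (len(before) - 1) + va ** 2 / (len(after) - 1))

# Two-sided 95% critical value from the t distribution
t_crit = stats.t.ppf(0.975, dof)

ci_t = (mean_diff - t_crit * se_diff, mean_diff + t_crit * se_diff)
ci_z = (mean_diff - 1.96 * se_diff, mean_diff + 1.96 * se_diff)

print(f"t-based 95% CI: ({ci_t[0]:.2f}, {ci_t[1]:.2f}), dof = {dof:.1f}")
print(f"z-based 95% CI: ({ci_z[0]:.2f}, {ci_z[1]:.2f})")
```

The t-based interval is a little wider than the z-based one, but for this sample size the difference is small and the interval still excludes zero.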

# Regression Analysis

Show:

* Trend

* Slope

* R²

In [6]:
from sklearn.linear_model import LinearRegression
import numpy as np

data["Day_Num"] = data["Day"]

X = data[["Day_Num"]]
y = data["Sales"]

model = LinearRegression()
model.fit(X, y)

slope = model.coef_[0]
intercept = model.intercept_
r2 = model.score(X, y)

slope, intercept, r2


(np.float64(4.837917156592799),
 np.float64(1920.300965406922),
 0.14913734982091365)

Interpretation:

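The regression model shows a positive slope of approximately ₹4.84 per day. Because the data contain a step change at day 31 rather than a gradual rise, this slope mainly reflects the higher sales level in the second 30 days, not steady day-over-day growth.
The R² value of 0.15 means that about 15% of the variation in sales is explained by a linear time trend. In this simulated dataset the remaining variation is random noise; in real sales data it would also reflect factors such as customer demand, promotions, and seasonality.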
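
A regression on day number alone cannot tell a gradual trend from a one-time jump. One way to separate the two is to add a 0/1 campaign indicator next to the day number. The sketch below is an illustration under that assumption, fitted with `np.linalg.lstsq` instead of `LinearRegression` to keep it dependency-light; `fit_ols` and `after_flag` are hypothetical names introduced here.

```python
import numpy as np

# Recreate the simulated sales with the same seed and parameters as the notebook
np.random.seed(42)
before_sales = np.random.normal(loc=2000, scale=200, size=30)
after_sales = np.random.normal(loc=2200, scale=220, size=30)
sales = np.concatenate([before_sales, after_sales])

day = np.arange(1, 61)
after_flag = (day > 30).astype(float)  # 0 before the campaign, 1 after


def fit_ols(features, y):
    """Ordinary least squares with an intercept; returns (coefficients, R^2)."""
    X = np.column_stack([np.ones(len(y))] + list(features))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return beta, r2


# Model 1: time trend only (same specification as the notebook's regression)
beta_time, r2_time = fit_ols([day], sales)

# Model 2: time trend plus a level shift at the campaign start
beta_step, r2_step = fit_ols([day, after_flag], sales)

print(f"Time only:   slope = {beta_time[1]:.2f}, R^2 = {r2_time:.3f}")
print(f"Time + step: slope = {beta_step[1]:.2f}, step = {beta_step[2]:.2f}, R^2 = {r2_step:.3f}")
```

In the second model, the coefficient on `after_flag` estimates the level shift at the campaign start, while the coefficient on `day` captures any remaining within-period trend.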

# Final Conclusion
This analysis indicates that the marketing campaign had a statistically significant and positive effect on sales.
On average, daily sales increased by approximately ₹211, with a 95% confidence interval of ₹113 to ₹309 per day.
A t-test confirmed this improvement is highly significant (p < 0.001), and the regression showed a positive slope over the 60 days, consistent with the higher sales level after the campaign.

Based on this evidence, the marketing campaign can be considered successful and worth continuing.