## Problem Statement

A company has recently implemented a new marketing campaign for one of its products. The company wants to assess if this campaign has significantly increased the product's average monthly sales by more than 15%.
To evaluate the impact of this campaign, the company has compiled a sample dataset named **"monthly_sales_data.csv"**. It contains the following columns:

- **product_id:** A unique identifier for each product.
- **sales_increase_pct:** The percentage increase in monthly sales for each product as a result of the new marketing campaign.


The primary goal of the analysis is to determine whether this campaign increased the product's average monthly sales by more than 15%.


In [19]:
#given population parameters

population_mean = 15  # hypothesized mean sales increase (%): the 15% benchmark the campaign must exceed
population_std_dev = 5  # known population standard deviation (%)

**Import Necessary Libraries**

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

### Task1: Data Import

1. Import the data from the "monthly_sales_data.csv" file.
2. Display the number of rows and columns.
3. Display the first few rows of the dataset to get an overview.


In [2]:
df = pd.read_csv("../datasets/monthly_sales_data.csv")
df

Unnamed: 0,product_id,sales_increase_pct
0,P0001,19.23
1,P0002,25.47
2,P0003,19.16
3,P0004,17.77
4,P0005,11.35
...,...,...
95,P0096,18.52
96,P0097,19.75
97,P0098,16.45
98,P0099,12.95
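
The `Unnamed: 0` column in the output above suggests the CSV was saved together with its row index. A minimal sketch of the three import steps, assuming that is the case: it reads the first column as the index, then reports the shape and the first rows. The inline `csv_text` is a hypothetical stand-in built from the five rows shown above so the snippet runs without the file, and `preview` is an illustrative name.

```python
import io

import pandas as pd

# Hypothetical stand-in for monthly_sales_data.csv: only the first five
# rows visible in the output above, not the full dataset.
csv_text = """,product_id,sales_increase_pct
0,P0001,19.23
1,P0002,25.47
2,P0003,19.16
3,P0004,17.77
4,P0005,11.35
"""

# index_col=0 uses the unnamed first column as the row index
# instead of loading it as an extra "Unnamed: 0" column.
preview = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Number of rows and columns
n_rows, n_cols = preview.shape
print(f"rows: {n_rows}, columns: {n_cols}")

# First few rows for a quick overview
print(preview.head())
```

With the real file, the same `index_col=0` argument could be passed to the `pd.read_csv` call used in the notebook.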


### Task2: Define Hypotheses

- State the null and alternative hypotheses based on the given scenario.

- **Null hypothesis (H₀):** The average increase in monthly sales is at most 15%.
- **Alternative hypothesis (H₁):** The average increase in monthly sales is more than 15%.
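
In symbols, this is a right-tailed test. The symbol $\mu$ (the true mean percentage increase in monthly sales) is notation introduced here for illustration:

```latex
H_0:\ \mu \le 15
\qquad \text{vs.} \qquad
H_1:\ \mu > 15
```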

### Task3: Calculate Test Statistics

- Compute the sample mean of sales_increase_pct.
- Determine the sample size.
- Calculate the standard error using the provided population standard deviation.
- Compute the Z-score for the test statistic.

In [3]:
#1. sample mean of sales_increase_pct
sample_mean = df["sales_increase_pct"].mean()
sample_mean

15.4845

In [4]:
#2. sample size
sample_size = df.shape[0]
sample_size

100

In [8]:
#3. standard error
standard_error = population_std_dev / np.sqrt(sample_size)
standard_error

0.5

In [20]:
#4. z_score
z_score = (sample_mean - population_mean) / standard_error
z_score

0.9690000000000012
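
The four steps above can be checked by hand with the values reported in the outputs. The symbols $\bar{x}$ (sample mean), $\mu_0$ (hypothesized mean), $\sigma$ (population standard deviation), and $n$ (sample size) are notation introduced here:

```latex
\mathrm{SE} = \frac{\sigma}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5,
\qquad
z = \frac{\bar{x} - \mu_0}{\mathrm{SE}} = \frac{15.4845 - 15}{0.5} = 0.969
```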

### Task4: Calculate the P-Value

- Set the significance level (e.g., alpha = 0.05).
- Calculate the p-value associated with the obtained z-score.

In [21]:
#defining significance level
alpha = 0.05

In [22]:
#additional check with z_critical
z_critical = stats.norm.ppf(1-alpha)
z_critical

1.6448536269514722

In [23]:
z_score > z_critical

False

In [24]:
#p-value
p_value = 1 - stats.norm.cdf(z_score)
p_value

0.16627259458894816
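
As a cross-check, the right-tail probability can also be obtained from SciPy's survival function, `stats.norm.sf`, which computes `1 - cdf` directly. The sketch below hard-codes the z-score reported above rather than recomputing it from the data, and also checks that the p-value rule and the critical-value rule lead to the same decision. The names `p_value_sf` and `p_value_cdf` are illustrative.

```python
from scipy import stats

z_score = 0.969  # z-score reported above (rounded)
alpha = 0.05     # significance level

# Right-tail probability P(Z > z), computed two equivalent ways
p_value_sf = stats.norm.sf(z_score)
p_value_cdf = 1 - stats.norm.cdf(z_score)
print(p_value_sf, p_value_cdf)

# Critical value for a right-tailed test; isf is the inverse of sf
z_critical = stats.norm.isf(alpha)

# Both decision rules agree: p < alpha exactly when z > z_critical
print((p_value_sf < alpha) == (z_score > z_critical))
```

For a z-score this close to zero the two forms are numerically indistinguishable. The survival function mainly helps far out in the tail, where `1 - cdf` loses precision.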

### Task5: Decision Making

- Compare the calculated p-value with the alpha.
- Decide whether to reject or fail to reject the null hypothesis.
- Write a conclusion summarizing the findings.
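
The comparison in the first two steps can be written as a small helper. This is a minimal sketch: `decide` is a hypothetical name, not part of the original notebook, and the p-value passed in is the one computed in Task 4.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Return the test decision for a p-value at significance level alpha."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"


# p-value from Task 4
print(decide(0.16627259458894816))  # prints "fail to reject H0"
```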

**Final Conclusion**

- **Sample mean:** 15.48%
- **Hypothesized mean (H₀):** 15%
- **Z-score:** 0.97
- **p-value:** 0.166
- **Significance level (α):** 0.05

Since the p-value (0.166) is greater than the significance level (0.05), we fail to reject the null hypothesis.

At the 5% significance level, there is not enough statistical evidence to conclude that the marketing campaign increased average monthly sales by more than 15%.

The critical-value check gives the same result: the z-score (0.97) is below the critical z-value (1.645).