## Problem Statement

A company has recently implemented a new marketing campaign for one of its products. The company wants to assess if this campaign has significantly increased the product's average monthly sales by more than 15%.
To evaluate the impact of this campaign, the company has compiled a sample dataset named **"monthly_sales_data.csv"**. It contains the following columns:

- **product_id:** A unique identifier for each product.
- **sales_increase_pct:** The percentage increase in monthly sales for each product as a result of the new marketing campaign.


The primary goal of the analysis is to determine whether this campaign increased the product's average monthly sales by more than 15%.


In [6]:
#given population parameters

population_mean = 12  #(This implies that before the new campaign, the average increase in sales was around 12%)
population_std_dev = 5  #(variability)

**Import Necessary Libraries**

In [1]:
import pandas as pd
import numpy as np
from scipy import stats

### Task1: Data Import

1. Import the data from the "monthly_sales_data.csv" file.
2. Display the number of rows and columns.
3. Display the first few rows of the dataset to get an overview.


In [2]:
df = pd.read_csv("monthly_sales_data.csv")
print(df.shape)
df.head()

(100, 2)


|   | product_id | sales_increase_pct |
|---|------------|--------------------|
| 0 | P0001 | 19.23 |
| 1 | P0002 | 25.47 |
| 2 | P0003 | 19.16 |
| 3 | P0004 | 17.77 |
| 4 | P0005 | 11.35 |
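
Before computing any statistics, it can help to confirm that the loaded frame has the expected columns and no missing values. The following is a minimal sketch, not part of the original notebook: `check_sales_frame` is a hypothetical helper, and the small `preview` frame is a stand-in built from the five rows shown above rather than the full CSV.

```python
import pandas as pd

def check_sales_frame(df: pd.DataFrame) -> dict:
    """Hypothetical helper: basic sanity checks on the sales data."""
    expected = {"product_id", "sales_increase_pct"}
    missing_cols = expected - set(df.columns)
    if missing_cols:
        raise ValueError(f"missing columns: {sorted(missing_cols)}")
    return {
        "rows": len(df),
        "null_values": int(df["sales_increase_pct"].isna().sum()),
        "duplicate_ids": int(df["product_id"].duplicated().sum()),
    }

# Stand-in data: the five rows displayed by df.head() above.
preview = pd.DataFrame({
    "product_id": ["P0001", "P0002", "P0003", "P0004", "P0005"],
    "sales_increase_pct": [19.23, 25.47, 19.16, 17.77, 11.35],
})
report = check_sales_frame(preview)
print(report)
```

In the notebook itself the same call would presumably be `check_sales_frame(df)` on the frame returned by `pd.read_csv`.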


### Task2: Define Hypotheses

- State the null and alternative hypotheses based on the given scenario.

**Null Hypothesis (H0):** The increase in average monthly sales is less than or equal to 15%.

**Alternative Hypothesis (H1):** The increase in average monthly sales is greater than 15%.


### Task3: Calculate Test Statistics

- Compute the sample mean of sales_increase_pct.
- Determine the sample size.
- Calculate the standard error using the provided population standard deviation.
- Compute the Z-score for the test statistic.

In [3]:
#1. sample mean of sales_increase_pct

sample_mean = df.sales_increase_pct.mean()
sample_mean

15.4845

In [4]:
#2. sample size
sample_size = df.shape[0]
sample_size

100

In [7]:
#3. standard error
standard_error = population_std_dev/np.sqrt(sample_size)
standard_error

0.5

In [8]:
#4. z_score

z_score = round((sample_mean-population_mean)/standard_error, 2)
z_score

6.97
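
The four steps above amount to the one-sample z statistic, z = (sample mean - null mean) / (sigma / sqrt(n)). As a sketch, they can be collected into one function; `one_sample_z` is a hypothetical helper name, and the inputs are the values reported in the cells above.

```python
import math

def one_sample_z(sample_mean, null_mean, pop_std_dev, n):
    """Return (standard_error, z) for a one-sample z-test with known sigma."""
    standard_error = pop_std_dev / math.sqrt(n)
    z = (sample_mean - null_mean) / standard_error
    return standard_error, z

# Values taken from the notebook cells above.
se, z = one_sample_z(sample_mean=15.4845, null_mean=12, pop_std_dev=5, n=100)
print(se, round(z, 2))
```

Keeping the null value as an explicit argument makes it visible which reference mean the statistic is measured against.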

### Task4: Calculate the P-Value

- Set the significance level (e.g., alpha = 0.05).
- Calculate the p-value associated with the obtained z-score.

In [9]:
#defining significance level
alpha = 0.05 # alpha means significance level
alpha

0.05

In [15]:
p_value = 1 - stats.norm.cdf(z_score) # get p value from z score
p_value

1.5847323453499484e-12
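
For a z-score this large, `1 - stats.norm.cdf(z)` subtracts two nearly equal numbers and loses precision. SciPy's survival function `stats.norm.sf` computes the upper tail directly. The sketch below is an alternative formulation, not the notebook's own cell; it also derives the one-sided critical value for comparison.

```python
from scipy import stats

z_score = 6.97  # value reported above
alpha = 0.05

# Upper-tail probability P(Z > z), computed without cancellation.
p_value = stats.norm.sf(z_score)

# One-sided critical value: reject H0 when z exceeds this.
z_critical = stats.norm.ppf(1 - alpha)

print(p_value, z_critical)
```

Comparing `z_score` with `z_critical` leads to the same decision as comparing `p_value` with `alpha`.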

### Task5: Decision Making

- Compare the calculated p-value with alpha.
- Decide whether to reject or fail to reject the null hypothesis.
- Write a conclusion summarizing the findings.

In [16]:
p_value, alpha

(1.5847323453499484e-12, 0.05)
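
The cell above only displays the two values. The comparison itself can be written out explicitly, as in this minimal sketch; `decide` is a hypothetical helper name, and the inputs are the values shown above.

```python
def decide(p_value: float, alpha: float) -> str:
    """Apply the standard decision rule for a hypothesis test."""
    if p_value < alpha:
        return "reject H0"
    return "fail to reject H0"

# Values reported in the notebook output above.
decision = decide(1.5847323453499484e-12, 0.05)
print(decision)
```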

## Summary


**Hypothesis:**
- Null Hypothesis (H0): The increase in average monthly sales is less than or equal to 15%.
- Alternative Hypothesis (H1): The increase in average monthly sales is greater than 15%.


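**Findings:**
- Calculated p-value: 1.5847323453499484e-12 (far below alpha)
- Alpha: 0.05


**Conclusion:**
- p-value < alpha, so the null hypothesis used in the calculation is rejected.
- The z-score was computed against `population_mean = 12`, so this rejection shows that the average sales increase after the campaign exceeds the pre-campaign baseline of 12%. It does not by itself establish an increase above 15%: with 15 as the null value, z = (15.4845 - 15) / 0.5 ≈ 0.97, which is below the one-sided critical value of about 1.645 at alpha = 0.05.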
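
To compare the result directly with the 15% threshold named in the hypotheses, the same one-sided z-test can be rerun with 15 as the null value. This is a sketch under stated assumptions: it reuses the sample mean, standard deviation, and sample size reported above, and the variable names `threshold`, `z_vs_threshold`, and `p_vs_threshold` are introduced here for illustration. Whether 15 or 12 is the intended null value is a question for the analysis design.

```python
import math
from scipy import stats

# Values reported in the notebook above.
sample_mean = 15.4845
population_std_dev = 5
sample_size = 100
alpha = 0.05

# Assumption: use the 15% threshold from H0 as the null value.
threshold = 15

standard_error = population_std_dev / math.sqrt(sample_size)
z_vs_threshold = (sample_mean - threshold) / standard_error
p_vs_threshold = stats.norm.sf(z_vs_threshold)  # one-sided upper tail
reject = bool(p_vs_threshold < alpha)

print(round(z_vs_threshold, 2), round(p_vs_threshold, 3), reject)
```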