## Problem Statement

You are a data scientist at a business efficiency consulting firm. Your client, a multinational corporation, has recently implemented a series of cost-saving measures across various departments. To evaluate the impact of these initiatives, the company has compiled a sample dataset named **"operational_costs_data.csv"**. This dataset tracks the percentage reduction in operational costs for each department after the cost-saving measures were implemented. The dataset includes the following columns:



- **department_id:** A unique identifier for each department.
- **cost_reduction_pct:** The percentage reduction in operational costs for each department following the cost-saving measures.

The primary goal of the analysis is to determine whether these cost-saving measures have effectively reduced operational costs beyond the company's target of 8%. 

In [1]:
# Given population parameters

population_mean = 7     # average cost reduction target (%) before the series of cost-saving measures
population_std_dev = 3  # population standard deviation (variability)

**Import Necessary Libraries**

In [2]:
import pandas as pd
import numpy as np
from scipy import stats

### Task1: Data Import

1. Import the data from the "operational_costs_data.csv" file.
2. Display the number of rows and columns.
3. Display the first few rows of the dataset to get an overview.


In [3]:
df = pd.read_csv("operational_costs_data.csv")
print(df.shape)
df.head()

(100, 2)


|   | department_id | cost_reduction_pct |
|---|---------------|--------------------|
| 0 | D001          | 7.4                |
| 1 | D002          | 11.31              |
| 2 | D003          | 10.78              |
| 3 | D004          | 8.79               |
| 4 | D005          | 3.29               |
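Before running a test on imported data, a few basic checks are usually worthwhile. The sketch below is illustrative only: it rebuilds the five preview rows shown above in a hypothetical `df_preview` frame (the real file has 100 rows) and checks for missing values, duplicate department IDs, and a numeric `cost_reduction_pct` column. The same calls could be applied to `df`.

```python
import pandas as pd

# Hypothetical stand-in for the real file: only the five rows shown in the preview.
df_preview = pd.DataFrame({
    "department_id": ["D001", "D002", "D003", "D004", "D005"],
    "cost_reduction_pct": [7.4, 11.31, 10.78, 8.79, 3.29],
})

# Count missing values per column.
missing_counts = df_preview.isna().sum()

# Each department should appear only once.
has_duplicate_ids = bool(df_preview["department_id"].duplicated().any())

# The tested column must be numeric for .mean() to be meaningful.
is_numeric = pd.api.types.is_numeric_dtype(df_preview["cost_reduction_pct"])

print(missing_counts.to_dict(), has_duplicate_ids, is_numeric)
```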


### Task2: Define Hypotheses

- State the null and alternative hypotheses based on the given scenario.

In [5]:

**Null Hypothesis (H0):** The reduction in average monthly operational costs is less than or equal to 8%.

**Alternative Hypothesis (Ha):** The reduction in average monthly operational costs is greater than 8%.


### Task3: Calculate Test Statistics

- Compute the sample mean of cost_reduction_pct.
- Determine the sample size.
- Calculate the standard error using the provided population standard deviation.
- Compute the Z-score for the test statistic.
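For reference, the steps listed here correspond to the standard one-sample Z-test formulas with a known population standard deviation. The symbols are notation chosen for this note: $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized mean (`population_mean` in the code), $\sigma$ the population standard deviation (`population_std_dev`), and $n$ the sample size.

```latex
\mathrm{SE} = \frac{\sigma}{\sqrt{n}},
\qquad
Z = \frac{\bar{x} - \mu_0}{\mathrm{SE}}
```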

In [7]:
#1. sample mean of cost_reduction_pct
sample_mean = df['cost_reduction_pct'].mean()
sample_mean

np.float64(7.2562)

In [11]:
#2. sample size (number of rows)

sample_size = df.shape[0]
sample_size

100

In [12]:
#3. standard error

standard_error = population_std_dev / np.sqrt(sample_size)
standard_error

np.float64(0.3)

In [13]:
#4. z_score
z_score = (sample_mean - population_mean) / standard_error
z_score

np.float64(0.8539999999999992)
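The calculation above can also be wrapped in a small helper, which makes it easy to rerun with a different hypothesized mean. This is a minimal sketch, not part of the original notebook: the function name `one_sample_z` is hypothetical, and the inputs are the values reported in the cell outputs.

```python
import math

def one_sample_z(sample_mean, hypothesized_mean, population_std_dev, n):
    """Return (standard_error, z) for a one-sample Z-test with known sigma."""
    standard_error = population_std_dev / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    return standard_error, z

# Inputs copied from the notebook: sample mean 7.2562, n = 100,
# population_mean = 7, population_std_dev = 3.
se, z = one_sample_z(sample_mean=7.2562, hypothesized_mean=7,
                     population_std_dev=3, n=100)
print(round(se, 3), round(z, 3))  # 0.3 0.854
```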

### Task4: Determine Z-critical Value

- Set the significance level (e.g., alpha = 0.05).
- Find the critical Z-value corresponding to this alpha level.

In [14]:
#defining significance level
alpha = 0.05

In [18]:
#critical value
z_critical = stats.norm.ppf(1 - alpha)
z_critical

np.float64(1.6448536269514722)
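Because the alternative hypothesis is one-sided ("greater than"), the whole of alpha sits in the upper tail, which is why the quantile is taken at `1 - alpha` rather than `1 - alpha / 2`. The sketch below cross-checks the value using only the standard library's `statistics.NormalDist` (a swapped-in alternative to `stats.norm.ppf`, used here so the example has no third-party dependencies) and shows the two-sided value for contrast.

```python
from statistics import NormalDist

alpha = 0.05
std_normal = NormalDist()  # mean 0, standard deviation 1

# Right-tailed test: all of alpha lies in the upper tail.
z_right = std_normal.inv_cdf(1 - alpha)

# Two-sided test, for comparison only: alpha is split across both tails.
z_two_sided = std_normal.inv_cdf(1 - alpha / 2)

# The area above the right-tailed critical value should equal alpha.
upper_tail_area = 1 - std_normal.cdf(z_right)

print(round(z_right, 3), round(z_two_sided, 3))  # 1.645 1.96
```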

### Task5: Decision Making

- Compare the calculated Z-score with the critical Z-value.
- Decide whether to reject or fail to reject the null hypothesis.
- Write a conclusion summarizing the findings.

In [19]:
z_score, z_critical

(np.float64(0.8539999999999992), np.float64(1.6448536269514722))
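The comparison can be written as an explicit decision rule. The sketch below uses the two values printed above and, as an equivalent check that is not part of the original notebook, also computes the one-sided p-value with the standard library's `statistics.NormalDist`. For a right-tailed test, rejecting when `z_score > z_critical` is the same as rejecting when the p-value is below alpha.

```python
from statistics import NormalDist

# Values copied from the notebook output above.
z_score = 0.8539999999999992
z_critical = 1.6448536269514722
alpha = 0.05

# Critical-value rule for a right-tailed test.
reject_h0 = z_score > z_critical

# Equivalent p-value rule: upper-tail probability of the observed Z-score.
p_value = 1 - NormalDist().cdf(z_score)

decision = "reject H0" if reject_h0 else "fail to reject H0"
print(decision, p_value < alpha)  # fail to reject H0 False
```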

## Summary


**Hypothesis:**
- Null Hypothesis (H0): Average monthly operational cost reduction ≤ 8%.
- Alternative Hypothesis (Ha): Average monthly operational cost reduction > 8%.


**Findings:**
- Calculated Z-Score: 0.85
- Critical Z-Value: 1.645


**Conclusion:** 
-  z_score < z_critical
- Since the Z-score (0.85) is less than the critical Z-value (1.645), we fail to reject the null hypothesis at the 5% significance level. The sample mean reduction (7.26%) is itself below the 8% target, so the data provide no evidence that the cost-saving measures reduced operational costs beyond that target.
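The hypotheses name an 8% target, while the Z-score above is computed against `population_mean = 7`. As a sensitivity check, the sketch below repeats the test with 8 as the hypothesized mean. It assumes the sample mean, sample size, and population standard deviation reported in the notebook; `target_mean` is a name introduced here for illustration.

```python
import math
from statistics import NormalDist

# Values reported in the notebook.
sample_mean = 7.2562
population_std_dev = 3
n = 100
alpha = 0.05

# Assumption for this check: test directly against the 8% target.
target_mean = 8

standard_error = population_std_dev / math.sqrt(n)
z_vs_target = (sample_mean - target_mean) / standard_error
z_critical = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z_vs_target > z_critical
print(round(z_vs_target, 2), reject_h0)  # -2.48 False
```

Under either hypothesized mean the decision is the same: the null hypothesis is not rejected.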