# Z-test

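**Performing a Z-test on an sklearn dataset means either comparing a sample mean with a hypothesized population mean (one-sample test) or comparing two sample means (two-sample test). The Z-test has three requirements: the observations are independent, the population standard deviation is known, and the data are either normally distributed or numerous enough for the central limit theorem to apply (a common rule of thumb is n ≥ 30). When the standard deviation is estimated from the data, as in the examples below, the Z-test is only an approximation, and it is reliable only for large samples. Below is a step-by-step guide to performing a Z-test using a dataset from scikit-learn.**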
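For reference, the one-sample statistic computed in the first example follows the standard definition. Here $\bar{x}$ is the sample mean, $\mu_0$ the hypothesized population mean, $\sigma$ the population standard deviation, $n$ the sample size and $\Phi$ the standard normal CDF (`norm.cdf` in SciPy). With the Iris data $\sigma$ is unknown, so the examples substitute the sample standard deviation.

$$
Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}, \qquad p_{\text{two-tailed}} = 2\left(1 - \Phi(|Z|)\right)
$$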


---
---

```python
# Libraries
from sklearn.datasets import load_iris
import numpy as np
from scipy.stats import norm
```

---
---


```python
# One-sample Z-test
# H0: mean sepal length = 6.0 cm; H1: mean sepal length != 6.0 cm (two-tailed)

# Load the Iris dataset
iris = load_iris()
sepal_lengths = iris.data[:, 0]  # Extract the sepal length feature (cm)

# Hypothesized population mean
population_mean = 6.0

# Calculate sample statistics
sample_mean = np.mean(sepal_lengths)
# The population standard deviation is unknown, so the sample standard deviation
# (ddof=0) is used as an estimate; with n = 150 this is a reasonable approximation
sample_std = np.std(sepal_lengths, ddof=0)
n = len(sepal_lengths)

# Calculate the Z-score: (sample mean - hypothesized mean) / standard error
z_score = (sample_mean - population_mean) / (sample_std / np.sqrt(n))

# Calculate the p-value (two-tailed test)
# norm.sf(x) equals 1 - norm.cdf(x) but stays accurate far in the tail
p_value = 2 * norm.sf(abs(z_score))

# Print results
print(f"Sample Mean: {sample_mean:.3f}")
print(f"Z-Score: {z_score:.3f}")
print(f"P-Value: {p_value:.3f}")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The sample mean is significantly different from the population mean.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the sample and population mean.")
```


```
Sample Mean: 5.843
Z-Score: -2.325
P-Value: 0.020
Reject the null hypothesis: The sample mean is significantly different from the population mean.
```
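The steps of the cell above can be wrapped in a small reusable function. This is a minimal sketch, not a library API: `one_sample_z_test` and its `alternative` parameter are hypothetical names chosen for illustration. If no population standard deviation is passed, the function falls back to the sample standard deviation (ddof=0), as the notebook does. The toy data with a known σ of 1 is invented purely to check the arithmetic; n = 5 is far too small for a Z-test in practice.

```python
import numpy as np
from scipy.stats import norm


def one_sample_z_test(sample, population_mean, population_std=None, alternative="two-sided"):
    """Hypothetical helper: one-sample Z-test of H0: mean == population_mean.

    If population_std is None, the sample standard deviation (ddof=0) is used
    as an estimate, mirroring the notebook example (a large-sample approximation).
    alternative: "two-sided", "greater" (right-tailed) or "less" (left-tailed).
    """
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    sigma = np.std(sample, ddof=0) if population_std is None else population_std
    z = (sample.mean() - population_mean) / (sigma / np.sqrt(n))

    if alternative == "two-sided":
        p = 2 * norm.sf(abs(z))   # P(|Z| >= |z|)
    elif alternative == "greater":
        p = norm.sf(z)            # P(Z >= z)
    elif alternative == "less":
        p = norm.cdf(z)           # P(Z <= z)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, p


# Hypothetical toy data with a known population standard deviation of 1
z, p = one_sample_z_test([1, 2, 3, 4, 5], population_mean=2, population_std=1)
print(f"Z = {z:.3f}, p = {p:.4f}")
```

Run after the cell above, `one_sample_z_test(sepal_lengths, 6.0)` should reproduce the printed Z-score, because it uses the same standard deviation estimate.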



---
---

**Two-tailed test** (H1: μ ≠ μ₀)  
`p_value = 2 * (1 - norm.cdf(abs(z_score)))`  

**One-tailed test (right-tailed)** (H1: μ > μ₀)  
`p_value_right_tail = 1 - norm.cdf(z_score)`  

**One-tailed test (left-tailed)** (H1: μ < μ₀)  
`p_value_left_tail = norm.cdf(z_score)`  
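The three p-value formulas can be compared side by side in a short sketch. For illustration it reuses the rounded Z-score from the one-sample example. The right tail is computed with `norm.sf`, SciPy's survival function, which equals `1 - norm.cdf`. The sketch also shows the equivalent critical-value decision using `norm.ppf`. The last lines show why `norm.sf` is safer far in the tail. They use the magnitude of the Z-score from the two-sample example, where `1 - norm.cdf(...)` cancels to exactly zero in floating point.

```python
from scipy.stats import norm

z_score = -2.325  # Z-score from the one-sample example (rounded, for illustration)

p_two = 2 * norm.sf(abs(z_score))   # two-tailed
p_right = norm.sf(z_score)          # right-tailed, same as 1 - norm.cdf(z_score)
p_left = norm.cdf(z_score)          # left-tailed

# Critical-value form of the same decisions at alpha = 0.05
alpha = 0.05
z_crit_two = norm.ppf(1 - alpha / 2)  # about 1.96
z_crit_one = norm.ppf(1 - alpha)      # about 1.645
print(f"two-tailed p = {p_two:.4f}, reject: {abs(z_score) > z_crit_two}")
print(f"left-tailed p = {p_left:.4f}, reject: {z_score < -z_crit_one}")
print(f"right-tailed p = {p_right:.4f}, reject: {z_score > z_crit_one}")

# Far in the tail, 1 - norm.cdf(x) loses all precision while norm.sf(x) does not
big_z = 10.628  # |Z| from the two-sample example
print(1 - norm.cdf(big_z))  # 0.0 because norm.cdf(big_z) rounds to 1.0
print(norm.sf(big_z))       # tiny but non-zero
```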

---
---

# Two-sample Z-test
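The two-sample version compares the means of two independent groups. Let $\sigma_1, \sigma_2$ be the group standard deviations and $n_1, n_2$ the group sizes. Under $H_0: \mu_1 = \mu_2$ the statistic is shown below. The Iris example substitutes the sample standard deviations for $\sigma_1$ and $\sigma_2$.

$$
Z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}, \qquad p_{\text{two-tailed}} = 2\left(1 - \Phi(|Z|)\right)
$$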


---
---


```python
# Two-sample Z-test
# H0: Setosa and Versicolor have the same mean sepal length; H1: the means differ (two-tailed)

# Load the Iris dataset
iris = load_iris()
data = iris.data
target = iris.target

# Extract sepal lengths for Setosa (class 0) and Versicolor (class 1)
setosa_sepal_length = data[target == 0, 0]
versicolor_sepal_length = data[target == 1, 0]

# Calculate sample statistics for each group
# (sample standard deviations with ddof=0 stand in for the unknown population values)
mean_setosa = np.mean(setosa_sepal_length)
std_setosa = np.std(setosa_sepal_length, ddof=0)
n_setosa = len(setosa_sepal_length)

mean_versicolor = np.mean(versicolor_sepal_length)
std_versicolor = np.std(versicolor_sepal_length, ddof=0)
n_versicolor = len(versicolor_sepal_length)

# Standard error of the difference in means
# (unpooled: each group keeps its own variance)
se_diff = np.sqrt((std_setosa**2 / n_setosa) + (std_versicolor**2 / n_versicolor))

# Calculate the Z-score
z_score = (mean_setosa - mean_versicolor) / se_diff

# Calculate the p-value (two-tailed test)
# norm.sf avoids 1 - norm.cdf(...) cancelling to exactly 0 for large |z|
p_value = 2 * norm.sf(abs(z_score))

# Print results (the .3f format rounds very small p-values to 0.000)
print(f"Setosa Mean: {mean_setosa:.3f}, Versicolor Mean: {mean_versicolor:.3f}")
print(f"Z-Score: {z_score:.3f}")
print(f"P-Value: {p_value:.3f}")

# Interpretation
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The means of sepal lengths are significantly different.")
else:
    print("Fail to reject the null hypothesis: No significant difference between the means.")
```


```
Setosa Mean: 5.006, Versicolor Mean: 5.936
Z-Score: -10.628
P-Value: 0.000
Reject the null hypothesis: The means of sepal lengths are significantly different.
```
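The two-sample cell can also be packaged as a function. This is a minimal sketch under the same assumptions as the notebook: a two-tailed test with an unpooled standard error, using ddof=0 sample standard deviations whenever population values are not supplied. `two_sample_z_test` and its parameters are hypothetical names. The toy arrays are invented only to check the arithmetic; five observations per group is far too few for a Z-test in practice.

```python
import numpy as np
from scipy.stats import norm


def two_sample_z_test(a, b, std_a=None, std_b=None):
    """Hypothetical helper: two-tailed two-sample Z-test of H0: mean(a) == mean(b).

    Missing standard deviations are estimated from the samples with ddof=0,
    as in the notebook cell (an approximation that assumes large samples).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    sa = np.std(a, ddof=0) if std_a is None else std_a
    sb = np.std(b, ddof=0) if std_b is None else std_b
    se_diff = np.sqrt(sa**2 / a.size + sb**2 / b.size)  # unpooled standard error
    z = (a.mean() - b.mean()) / se_diff
    p = 2 * norm.sf(abs(z))  # two-tailed p-value, accurate even for large |z|
    return z, p


# Hypothetical toy samples; each has a ddof=0 variance of 2
z, p = two_sample_z_test([1, 2, 3, 4, 5], [2, 3, 4, 5, 6])
print(f"Z = {z:.3f}, p = {p:.3f}")
```

With the arrays from the cell above, `two_sample_z_test(setosa_sepal_length, versicolor_sepal_length)` should give the same Z-score. Printing its p-value with a format such as `{p:.3g}` should then show a small positive number instead of `0.000`.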
