# Z-Score

1. What is a Z-Score?
A Z-score tells us how many standard deviations a data point lies from the mean of its dataset. It is used to standardize data and to flag potential outliers.


$$Z = \frac{X - \mu}{\sigma}$$

where:

X = individual data point
μ = mean of the dataset
σ = standard deviation of the dataset
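
As a worked illustration of these definitions, take the five-value dataset used in the code cells of this section (10, 20, 30, 40, 50) and the data point X = 50. The population standard deviation (dividing by N) is assumed here, since that is what the code cells compute.

$$\mu = \frac{10 + 20 + 30 + 40 + 50}{5} = 30$$

$$\sigma = \sqrt{\frac{(-20)^2 + (-10)^2 + 0^2 + 10^2 + 20^2}{5}} = \sqrt{200} \approx 14.142$$

$$Z = \frac{50 - 30}{14.142} \approx 1.414$$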

2. How to Interpret Z-Scores?
🔹 Z = 0 → The data point is exactly at the mean.
🔹 Z = +1 → The data point is 1 standard deviation above the mean.
🔹 Z = -1 → The data point is 1 standard deviation below the mean.
🔹 Z > 3 or Z < -3 → Possible outliers (a common rule of thumb; the right cutoff depends on the dataset).
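
As a rough illustration of the outlier rule above, the sketch below flags points whose Z-score magnitude exceeds 3. The `readings` values and the `threshold` name are made up for this example, and 3 is only a conventional cutoff.

```python
import numpy as np

# Hypothetical data: 19 readings near 50 plus one extreme value
readings = np.array([50, 52, 48, 51, 49, 50, 53, 47, 50, 51,
                     49, 52, 48, 50, 51, 49, 50, 52, 48, 120])

threshold = 3  # rule-of-thumb cutoff, not a universal constant

# Population standard deviation (ddof=0), the default used by scipy.stats.zscore
z = (readings - readings.mean()) / readings.std()

outliers = readings[np.abs(z) > threshold]
print(outliers)  # [120]
```

Only the value 120 is flagged. Note that with the population standard deviation, no Z-score in a sample of n points can exceed √(n−1) in magnitude, so a cutoff of 3 can only trigger once the sample has more than 10 points.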

3. Why Use Z-Score?
✅ Standardization: Converts different datasets to a common scale.
✅ Outlier Detection: Points with very high or low Z-scores might be outliers.
✅ Probability Calculations: Used in normal distributions for probability estimation.
✅ Feature Scaling for Machine Learning: Used in algorithms like KNN, SVM, etc.
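
To make the standardization and feature-scaling points concrete, here is a minimal sketch that puts two columns with very different units onto a common scale. The `features` array (height in cm, income) is hypothetical data invented for the example.

```python
import numpy as np

# Hypothetical features on very different scales: height (cm) and income
features = np.array([
    [150.0, 30000.0],
    [160.0, 45000.0],
    [170.0, 52000.0],
    [180.0, 61000.0],
    [190.0, 80000.0],
])

# Standardize each column: subtract its mean, divide by its standard deviation
scaled = (features - features.mean(axis=0)) / features.std(axis=0)

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```

After scaling, both columns have mean 0 and standard deviation 1, so distance-based algorithms such as KNN no longer let the income column dominate simply because its numbers are larger.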



In [1]:
from scipy.stats import zscore

In [2]:
import numpy as np

In [3]:
data=[10,20,30,40,50]
z_score=zscore(data)
print(z_score)

[-1.41421356 -0.70710678  0.          0.70710678  1.41421356]


In [4]:
#### Manual Method

import numpy as np
data=[10,20,30,40,50]
mean=np.mean(data)
std_dev=np.std(data)  # population standard deviation (ddof=0), the same default zscore uses
z_score=[(x-mean)/std_dev for x in data]
print(z_score)

[np.float64(-1.414213562373095), np.float64(-0.7071067811865475), np.float64(0.0), np.float64(0.7071067811865475), np.float64(1.414213562373095)]


# Covariance 📊

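Covariance measures the direction of the relationship between two variables. A positive value means the variables tend to increase together; a negative value means one tends to decrease as the other increases. Its magnitude depends on the units of the variables, so covariance alone does not indicate how strong the relationship is.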

 Applications of Covariance
🔹 Stock Market Analysis (Relationship between stock prices)
🔹 Portfolio Diversification (Helps choose assets with negative covariance)
🔹 Machine Learning (Feature selection, PCA for dimensionality reduction)
🔹 Economics & Finance (Finding relationships between economic indicators)
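
Before turning to library calls, it may help to see covariance computed by hand as the average product of deviations from the two means. The sketch below is illustrative only; the names `pop_cov` and `sample_cov` are chosen for this example, and the two values differ only in whether the sum is divided by N or by N-1.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Product of each pair's deviations from the respective means
products = (x - x.mean()) * (y - y.mean())

pop_cov = products.sum() / len(x)           # divide by N (population)
sample_cov = products.sum() / (len(x) - 1)  # divide by N-1 (sample)

print(pop_cov, sample_cov)  # 4.0 5.0
```

`np.cov(x, y, bias=True)` corresponds to the first value, while `np.cov(x, y)` and pandas `DataFrame.cov()` default to the second.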

In [5]:
import numpy as np

x=[1,2,3,4,5]
y=[2,4,6,8,10]

cov_matrix=np.cov(x, y, bias=True)  # bias=True divides by N (population covariance)
print("Covariance:", cov_matrix[0,1])

Covariance: 4.0


In [6]:
import pandas as pd

df=pd.DataFrame({'x':[1,2,3,4,5], 'y':[2,4,6,8,20]})
print(df.cov())  # pandas computes the sample covariance (divides by N-1)

      x     y
x   2.5  10.0
y  10.0  50.0


# Correlation 📊

Correlation measures the strength and direction of the relationship between two variables. Unlike covariance, correlation is standardized, meaning it always falls between -1 and +1.
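
One way to see what "standardized" means here: Pearson correlation is the covariance divided by the product of the two standard deviations. The sketch below is a minimal illustration under that definition; `y_scaled` is a hypothetical rescaled copy of `y` used to show that changing units alters covariance but not correlation.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

# Population covariance and standard deviations (both divide by N)
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))
r = cov_xy / (x.std() * y.std())

# Rescaling a variable changes covariance but not correlation
y_scaled = y * 100
cov_scaled = np.mean((x - x.mean()) * (y_scaled - y_scaled.mean()))
r_scaled = cov_scaled / (x.std() * y_scaled.std())

print(cov_xy, cov_scaled)
print(round(r, 6), round(r_scaled, 6))
```

The covariance grows by a factor of 100 when `y` is rescaled, while the correlation stays at 1 (up to floating-point rounding).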

 Types of Correlation

1️⃣ Pearson Correlation (Most Common) – Measures linear relationships.

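2️⃣ Spearman Rank Correlation – Measures monotonic relationships by correlating the ranks of the values (useful for trends that consistently rise or fall but are not linear).

3️⃣ Kendall’s Tau – Measures ordinal association by comparing how many pairs of observations are ranked in the same order versus the opposite order.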
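
The difference between the three coefficients shows up on data that is monotonic but not linear. The sketch below uses `y = x ** 3` as an assumed toy relationship and compares `pearsonr`, `spearmanr`, and `kendalltau` from `scipy.stats`.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

x = np.array([1, 2, 3, 4, 5])
y = x ** 3  # monotonic but not linear

pearson_r, _ = pearsonr(x, y)
spearman_rho, _ = spearmanr(x, y)
kendall_tau, _ = kendalltau(x, y)

print(f"Pearson:  {pearson_r:.4f}")
print(f"Spearman: {spearman_rho:.4f}")
print(f"Kendall:  {kendall_tau:.4f}")
```

Because the ordering of `y` matches the ordering of `x` exactly, the two rank-based measures reach 1, while Pearson falls below 1 since the points do not lie on a straight line.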

In [7]:
import numpy as np

x=[1,2,3,4,5]
y=[2,4,6,8,10]

corr_matrix=np.corrcoef(x,y)
print(corr_matrix[0,1])

0.9999999999999999


In [9]:
### Using SciPy

from scipy.stats import pearsonr, spearmanr

# x and y are reused from the previous cell
pearson_corr, _=pearsonr(x,y)
spearman_corr, _=spearmanr(x,y)

print("pearson :", pearson_corr)
print("spearman :", spearman_corr)

pearson : 1.0
spearman : 0.9999999999999999


# Hypothesis Testing in Python

1. What is Hypothesis Testing?
Hypothesis testing is a statistical method used to make decisions or inferences about a population based on a sample of data. It helps determine whether an observed effect is likely to be real or could plausibly be explained by random chance.

2. Steps in Hypothesis Testing
1️⃣ State the Null and Alternative Hypotheses

Null Hypothesis (H₀) → No effect or no difference.

Alternative Hypothesis (H₁) → There is an effect or difference.

2️⃣ Select a Significance Level (α)
Typically α = 0.05 (5%), meaning we accept a 5% chance of rejecting H₀ when it is actually true (a Type I error).

3️⃣ Choose a Test & Compute the Test Statistic
Based on data type and distribution, choose t-test, chi-square test, ANOVA, etc.

4️⃣ Find the p-value
If p-value < 0.05 → Reject H₀

If p-value ≥ 0.05 → Fail to reject H₀

5️⃣ Make a Conclusion

Based on p-value, conclude whether the observed effect is statistically significant.
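
The meaning of the significance level in step 2 can be checked with a small simulation: when the null hypothesis is true, roughly 5% of tests should still reject it at a 0.05 level. The sketch below is illustrative only; the seed, the number of experiments, and the sample size of 30 are arbitrary choices for this example.

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
alpha = 0.05
n_experiments = 2000

# Each row is one experiment of 30 draws from a population whose true mean is 0,
# so the null hypothesis "mean = 0" is true in every experiment
samples = rng.normal(loc=0.0, scale=1.0, size=(n_experiments, 30))

# One t-test per row
_, p_values = ttest_1samp(samples, popmean=0.0, axis=1)

false_positive_rate = np.mean(p_values < alpha)
print(f"Share of experiments rejecting a true null: {false_positive_rate:.3f}")
```

The printed share should land close to 0.05, which is the Type I error rate that the chosen significance level allows.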



In [10]:
### One sample t-test

import numpy as np
from scipy.stats import ttest_1samp

data=[20,22,21,19,23,20,18,22,21,20]
pop_mean=21


t_stats, p_values=ttest_1samp(data,pop_mean)

print(f"T-statistics: {t_stats:.4f}, p-value: {p_values:.4f}")

if p_values<0.05:
    print("Reject the null hypothesis")
else:
    print("Fail to reject the null hypothesis")

T-statistics: -0.8402, p-value: 0.4226
Fail to reject the null hypothesis
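
To show what `ttest_1samp` computes, the sketch below rebuilds the same statistic by hand from the sample mean, the sample standard deviation, and the t distribution with n-1 degrees of freedom. The names `t_manual` and `p_manual` are chosen for this example.

```python
import numpy as np
from scipy.stats import t, ttest_1samp

data = [20, 22, 21, 19, 23, 20, 18, 22, 21, 20]
pop_mean = 21
n = len(data)

sample_mean = np.mean(data)
sample_std = np.std(data, ddof=1)  # sample standard deviation (divides by n-1)

# t = (sample mean - hypothesized mean) / standard error
t_manual = (sample_mean - pop_mean) / (sample_std / np.sqrt(n))

# Two-tailed p-value from the t distribution with n-1 degrees of freedom
p_manual = 2 * t.sf(abs(t_manual), df=n - 1)

# Library result for comparison
t_ref, p_ref = ttest_1samp(data, pop_mean)

print(f"manual: t={t_manual:.4f}, p={p_manual:.4f}")
print(f"scipy:  t={t_ref:.4f}, p={p_ref:.4f}")
```

Both lines should agree with the output of the cell above.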


# Z-Test

A Z-test is a statistical hypothesis test used to determine if there is a significant difference between a sample mean and a population mean when the population variance is known and the sample size is large (N>30).

When to Use a Z-Test?

✅ Population standard deviation (σ) is known

✅ Sample size is large (N > 30)

✅ Data is normally distributed (or approximately normal by Central Limit Theorem)




In [12]:
import numpy as np
from scipy.stats import norm

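# Small illustrative sample (n=10); with n below 30 the test relies on the population being normally distributed
sample=[180,175,170,185,190,195,178,182,176,184]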
pop_mean=175
pop_std=10
n=len(sample)

sample_mean=np.mean(sample)

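# Standard error uses the known population standard deviation
z_score=(sample_mean-pop_mean)/(pop_std/np.sqrt(n))

# Two-tailed p-value from the standard normal distribution
p_value=2*(1-norm.cdf(np.abs(z_score)))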

print(f"Z-score: {z_score:.4f}, p-value: {p_value:.4f}")

# Conclusion
if p_value < 0.05:
    print("Reject Null Hypothesis (Significant difference)")
else:
    print("Fail to Reject Null Hypothesis (No significant difference)")

Z-score: 2.0555, p-value: 0.0398
Reject Null Hypothesis (Significant difference)
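
The same decision can be reached without a p-value by comparing the Z-score against a critical value. The sketch below is a minimal version of that approach for a two-tailed test, assuming the same sample and known population parameters as the cell above; `z_critical` and `reject` are names chosen for this example.

```python
import numpy as np
from scipy.stats import norm

sample = [180, 175, 170, 185, 190, 195, 178, 182, 176, 184]
pop_mean = 175
pop_std = 10
alpha = 0.05

z_score = (np.mean(sample) - pop_mean) / (pop_std / np.sqrt(len(sample)))

# Two-tailed critical value: the point leaving alpha/2 in each tail
z_critical = norm.ppf(1 - alpha / 2)

print(f"|Z| = {abs(z_score):.4f}, critical value = {z_critical:.4f}")
reject = abs(z_score) > z_critical
print("Reject H0" if reject else "Fail to reject H0")
```

Since |Z| is about 2.06 and the critical value is about 1.96, the null hypothesis is rejected, which matches the p-value approach.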
