### Dataset:
Annual salaries (in thousands of dollars) of 20 employees at a tech company:

[45, 48, 50, 52, 55, 58, 60, 62, 65, 68, 70, 72, 75, 78, 80, 85, 90, 95, 150, 500]

### Calculate:
1. Range
2. Sample variance
3. Sample standard deviation
4. Q2 (median)
5. IQR (interquartile range)
6. Lower and upper fences for outlier detection
7. Outliers, identified with the IQR method
8. The number of outliers and their values
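
### Formulas:
For reference, the quantities listed above can be written as follows. This is a sketch that assumes the sample (n - 1) forms are intended, which matches the `ddof=1` setting in the code cell; the 1.5 multiplier is the conventional choice for the IQR method.

$$
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad
s = \sqrt{s^2}
$$

$$
\text{Range} = x_{\max} - x_{\min}, \qquad
\text{IQR} = Q_3 - Q_1
$$

$$
\text{Lower fence} = Q_1 - 1.5\,\text{IQR}, \qquad
\text{Upper fence} = Q_3 + 1.5\,\text{IQR}
$$

A value is flagged as an outlier when it falls below the lower fence or above the upper fence.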


In [1]:
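import numpy as np
import pandas as pd
from IPython.display import display  # available implicitly in Jupyter; imported here to be explicit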

data = np.array([45, 48, 50, 52, 55, 58, 60, 62, 65, 68, 70, 72, 75, 78, 80, 85, 90, 95, 150, 500])

n = len(data)
data_sorted = np.sort(data)

# Spread and center
data_range = np.max(data) - np.min(data)
mean = np.mean(data)
variance = np.var(data, ddof=1)  # sample variance (divides by n - 1)
std_dev = np.std(data, ddof=1)   # sample standard deviation

# Quartiles (np.percentile interpolates linearly between data points by default)
Q1 = np.percentile(data, 25)
Q2 = np.percentile(data, 50)  # median
Q3 = np.percentile(data, 75)
IQR = Q3 - Q1

# Fences for outlier detection (IQR method)
lower_fence = Q1 - 1.5 * IQR
upper_fence = Q3 + 1.5 * IQR

# Identify outliers
outliers = data[(data < lower_fence) | (data > upper_fence)]

# Print results
print("SALARY DATA ANALYSIS RESULTS")
print("-------------------------------------")
print(f"Dataset (sorted): {data_sorted}")
print(f"Count (n): {n}")
print(f"Range: {data_range}")
print(f"Mean: {mean:.2f}")
print(f"Variance: {variance:.2f}")
print(f"Standard Deviation: {std_dev:.2f}")
print(f"Q1: {Q1:.2f}")
print(f"Q2 (Median): {Q2:.2f}")
print(f"Q3: {Q3:.2f}")
print(f"IQR: {IQR:.2f}")
print(f"Lower Fence: {lower_fence:.2f}")
print(f"Upper Fence: {upper_fence:.2f}")
print(f"Outliers: {list(outliers)}")
print(f"Number of Outliers: {len(outliers)}")

# Optional: display results in a table
summary = {
    "Statistic": [
        "Range", "Mean", "Variance", "Std Deviation",
        "Q1", "Q2 (Median)", "Q3", "IQR",
        "Lower Fence", "Upper Fence", "Outliers", "Number of Outliers",
    ],
    "Value": [
        data_range, mean, variance, std_dev,
        Q1, Q2, Q3, IQR,
        lower_fence, upper_fence, list(outliers), len(outliers),
    ],
}

df_summary = pd.DataFrame(summary)
print("\n\nSummary Table:")
display(df_summary)


SALARY DATA ANALYSIS RESULTS
-------------------------------------
Dataset (sorted): [ 45  48  50  52  55  58  60  62  65  68  70  72  75  78  80  85  90  95
 150 500]
Count (n): 20
Range: 455
Mean: 92.90
Variance: 9718.41
Standard Deviation: 98.58
Q1: 57.25
Q2 (Median): 69.00
Q3: 81.25
IQR: 24.00
Lower Fence: 21.25
Upper Fence: 117.25
Outliers: [np.int64(150), np.int64(500)]
Number of Outliers: 2


Summary Table:


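|   | Statistic | Value |
|---|---------------|-------------|
| 0 | Range | 455 |
| 1 | Mean | 92.9 |
| 2 | Variance | 9718.410526 |
| 3 | Std Deviation | 98.581999 |
| 4 | Q1 | 57.25 |
| 5 | Q2 (Median) | 69.0 |
| 6 | Q3 | 81.25 |
| 7 | IQR | 24.0 |
| 8 | Lower Fence | 21.25 |
| 9 | Upper Fence | 117.25 |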
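
### Checking the quartiles by hand:
The quartiles reported above depend on the interpolation rule used. The sketch below reproduces them in plain Python, assuming the linear interpolation that `np.percentile` applies by default; the helper name `linear_percentile` is hypothetical and not part of NumPy. Other conventions (for example, the median-of-halves rule found in some textbooks) can give slightly different Q1 and Q3 values, and therefore different fences.

```python
import math

salaries = [45, 48, 50, 52, 55, 58, 60, 62, 65, 68,
            70, 72, 75, 78, 80, 85, 90, 95, 150, 500]


def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile via linear interpolation.

    Assumes the rule behind NumPy's default method: the target sits at
    fractional index (n - 1) * q / 100 in the sorted data.
    """
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100        # fractional 0-based position
    lower = math.floor(pos)                   # index just below the position
    upper = min(lower + 1, len(ordered) - 1)  # neighbor above, clamped to the last index
    weight = pos - lower                      # how far to move toward the upper neighbor
    return ordered[lower] + weight * (ordered[upper] - ordered[lower])


q1 = linear_percentile(salaries, 25)  # position 4.75  -> 55 + 0.75 * (58 - 55)
q2 = linear_percentile(salaries, 50)  # position 9.5   -> 68 + 0.5  * (70 - 68)
q3 = linear_percentile(salaries, 75)  # position 14.25 -> 80 + 0.25 * (85 - 80)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
flagged = [x for x in salaries if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, flagged)
```

With this rule the hand calculation agrees with the notebook output: Q1 = 57.25, Q2 = 69.0, Q3 = 81.25, fences at 21.25 and 117.25, and the two salaries 150 and 500 flagged as outliers.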
