# 09. Mini-Project: Outlier Detection 🔍

**Use Case:** Perform outlier detection on a given dataset.

**Tasks:**
1. Load the data into a DataFrame.
2. Detect the outliers.

### 1. Load the Data

Since `datasetExample.csv` is not a publicly available dataset, we will create a small sample DataFrame for this demonstration. The same steps apply to any numerical column of a real dataset; only the column name `Salary` needs to change.

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Create a sample dataset with some outliers
data = {
    'Salary': [45000, 50000, 52000, 48000, 47000, 55000, 53000, 120000, 46000, 51000, 49000, 54000, 15000]
}
df = pd.DataFrame(data)

print("Sample Dataset:")
display(df)
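
If the data does live in a CSV file, loading it might look like the sketch below. The column names and values here are invented for illustration (the real contents of `datasetExample.csv` are not known), and `df_csv` and `numeric_df` are hypothetical names chosen so the sample `df` above is left untouched.

```python
import io
import pandas as pd

# Hypothetical stand-in for the contents of a file such as datasetExample.csv.
csv_text = """Name,Salary,Age
Asha,45000,29
Ben,50000,34
Chen,120000,41
"""

# pd.read_csv accepts a path or any file-like object; with a real file,
# pass the path instead, e.g. pd.read_csv("datasetExample.csv").
df_csv = pd.read_csv(io.StringIO(csv_text))

# Keep only the numerical columns, since the outlier checks need numbers.
numeric_df = df_csv.select_dtypes(include="number")

print(numeric_df.columns.tolist())
```

Selecting numerical columns up front avoids errors later when quantiles are computed on text columns such as names.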

### 2. Detection of Outliers

We will use two common methods for outlier detection: a visual check with a box plot and a statistical rule based on the interquartile range.

#### Method 1: Visualization using Box Plot

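A box plot shows the distribution of the data and makes it easy to spot points that lie beyond the whiskers. By default, the whiskers extend to the most extreme data points within 1.5 × IQR of the quartiles, and anything further out is drawn as an individual point.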

In [None]:
plt.figure(figsize=(8, 6))
sns.boxplot(y=df['Salary'])
plt.title('Box Plot of Salary')
plt.ylabel('Salary')
plt.grid(True)
plt.show()

# Points beyond either whisker (above the top or below the bottom) are potential outliers.
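
To read the whisker positions as numbers rather than off the plot, one option is `matplotlib.cbook.boxplot_stats`, which computes the statistics a box plot is drawn from. This is a minimal sketch that repeats the sample salaries so it runs on its own; the variable names `salaries` and `stats` are illustrative.

```python
from matplotlib.cbook import boxplot_stats

salaries = [45000, 50000, 52000, 48000, 47000, 55000, 53000,
            120000, 46000, 51000, 49000, 54000, 15000]

# boxplot_stats returns one dict per dataset; whis=1.5 matches the
# default whisker rule used by matplotlib and seaborn box plots.
stats = boxplot_stats(salaries, whis=1.5)[0]

print("Lower whisker:", stats["whislo"])
print("Upper whisker:", stats["whishi"])
print("Points drawn as outliers:", sorted(stats["fliers"]))
```

For this sample the list of fliers should contain one unusually low and one unusually high salary, which is why both ends of the plot are worth checking.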

#### Method 2: Statistical Detection using Interquartile Range (IQR)

The IQR method defines outliers as any data points that fall below `Q1 - 1.5 * IQR` or above `Q3 + 1.5 * IQR`, where `Q1` and `Q3` are the 25th and 75th percentiles and `IQR = Q3 - Q1`.

In [None]:
# Calculate Q1 (25th percentile) and Q3 (75th percentile)
Q1 = df['Salary'].quantile(0.25)
Q3 = df['Salary'].quantile(0.75)

# Calculate the Interquartile Range (IQR)
IQR = Q3 - Q1

# Define the lower and upper bounds for outlier detection
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR

print(f"Q1: {Q1}")
print(f"Q3: {Q3}")
print(f"IQR: {IQR}")
print(f"Lower Bound: {lower_bound}")
print(f"Upper Bound: {upper_bound}")
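
To see where the quartile values come from, the sketch below recomputes them by hand with linear interpolation between sorted neighbours, which is the default behaviour of `Series.quantile`. The helper `linear_quantile` is a hypothetical name written for this illustration, not a pandas function.

```python
import math
import pandas as pd

def linear_quantile(values, q):
    """Quantile by linear interpolation between sorted neighbours."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q   # fractional index into the sorted data
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] + (ordered[hi] - ordered[lo]) * frac

salaries = [45000, 50000, 52000, 48000, 47000, 55000, 53000,
            120000, 46000, 51000, 49000, 54000, 15000]

q1_manual = linear_quantile(salaries, 0.25)
q3_manual = linear_quantile(salaries, 0.75)
print("Manual Q1, Q3:", q1_manual, q3_manual)

# Cross-check against pandas' own result.
s = pd.Series(salaries)
print("pandas Q1, Q3:", s.quantile(0.25), s.quantile(0.75))
```

With 13 values, the positions `(13 - 1) * 0.25` and `(13 - 1) * 0.75` land exactly on the 4th and 10th sorted values, so no interpolation is actually needed for this sample.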

In [None]:
# Identify the outliers
outliers = df[(df['Salary'] < lower_bound) | (df['Salary'] > upper_bound)]

print("\nDetected Outliers:")
if outliers.empty:
    print("No outliers found.")
else:
    display(outliers)
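
The same IQR rule can be wrapped in a small function and applied to every numerical column. This is a sketch under assumptions: `iqr_outlier_mask`, its parameter `k`, and the `Age` column with its values are all invented here for illustration.

```python
import pandas as pd

def iqr_outlier_mask(series, k=1.5):
    """Return a boolean mask marking values outside the IQR fences."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)

sample = pd.DataFrame({
    "Salary": [45000, 50000, 52000, 48000, 47000, 55000, 53000,
               120000, 46000, 51000, 49000, 54000, 15000],
    # Made-up second column, only to show the loop over several columns.
    "Age": [29, 31, 35, 28, 30, 40, 38, 45, 27, 33, 32, 36, 22],
})

# Collect the flagged values per numerical column.
found = {}
for column in sample.select_dtypes(include="number").columns:
    mask = iqr_outlier_mask(sample[column])
    found[column] = sample.loc[mask, column].tolist()

print(found)
```

Exposing the multiplier as `k` makes it easy to tighten or loosen the rule; 1.5 is the conventional choice, and a larger value such as 3 flags only more extreme points.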