# Module 75 Anomaly Detection Assignment 2

Q1. What is the role of feature selection in anomaly detection?

A1. Feature selection helps improve anomaly detection by:

1.) **Enhancing model performance** – Removing irrelevant features reduces noise.

2.) **Reducing computational complexity** – Fewer features make the model run faster.

3.) **Avoiding the curse of dimensionality** – In high-dimensional data, pairwise distances become nearly uniform, which makes distance-based detection less effective.

4.) **Improving interpretability** – Helps understand which features contribute to anomalies.
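The curse-of-dimensionality point can be illustrated with a minimal sketch. It assumes uniformly random data (hypothetical, not taken from any real dataset) and a helper named `relative_contrast`, introduced here only for illustration, that measures how much farther the farthest point is than the nearest one. As the number of features grows, the contrast should shrink, which is one reason pruning uninformative features tends to help distance-based detectors.

```python
import numpy as np


def relative_contrast(n_points, n_dims, seed=0):
    """(max distance - min distance) / min distance from one random query point."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n_points, n_dims))   # hypothetical random data
    query = rng.uniform(size=n_dims)
    d = np.linalg.norm(X - query, axis=1)      # Euclidean distances to the query
    return (d.max() - d.min()) / d.min()


# The contrast between "near" and "far" points drops as dimensions are added.
for dims in (2, 10, 100, 1000):
    print(dims, round(relative_contrast(500, dims), 3))
```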


Q2. What are some common evaluation metrics for anomaly detection algorithms and how are they
computed?

A2. Some common evaluation metrics for anomaly detection algorithms are:

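1.) **Precision** – Measures how many detected anomalies are actual anomalies, where TP is the number of true positives and FP the number of false positives.

``` Precision = TP / (TP + FP) ```

2.) **Recall (Sensitivity)** – Measures how many actual anomalies are detected, where FN is the number of false negatives.

``` Recall = TP / (TP + FN) ```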

3.) **F1-Score** – Harmonic mean of precision and recall.

``` F1 = 2 * [(Precision * Recall) / (Precision + Recall)] ```

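4.) **Area Under ROC Curve (AUC-ROC)** – Measures the model’s ability to separate anomalies from normal points. It is the area under the curve of true positive rate against false positive rate as the decision threshold varies; 1.0 means perfect separation and 0.5 is no better than chance.

5.) **Mean Squared Error (MSE)** – Used for reconstruction-based models (e.g., autoencoders). It is the mean of the squared differences between each input and its reconstruction; a high error suggests an anomaly.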
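As a rough illustration of how the first four metrics are computed, the sketch below works through a tiny hand-made example with plain NumPy. The labels, scores, and the 0.5 threshold are hypothetical values chosen for this example; `sklearn.metrics` offers `precision_score`, `recall_score`, `f1_score`, and `roc_auc_score` for the same quantities.

```python
import numpy as np

# Hypothetical ground truth (1 = anomaly) and anomaly scores from some detector.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.70, 0.05, 0.90, 0.80, 0.40, 0.60])
y_pred = (scores >= 0.5).astype(int)   # assumed decision threshold

tp = int(np.sum((y_pred == 1) & (y_true == 1)))   # anomalies correctly flagged
fp = int(np.sum((y_pred == 1) & (y_true == 0)))   # normal points wrongly flagged
fn = int(np.sum((y_pred == 0) & (y_true == 1)))   # anomalies missed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# AUC-ROC equals the fraction of (anomaly, normal) pairs in which the anomaly
# receives the higher score; ties count as half.
pos = scores[y_true == 1]
neg = scores[y_true == 0]
diff = pos[:, None] - neg[None, :]
auc = (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f} auc={auc:.3f}")
```

Because AUC-ROC is computed from the scores rather than the thresholded labels, it does not depend on the 0.5 cut-off used for the other three metrics.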


Q3. What is DBSCAN and how does it work for clustering?

A3. DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that lie in dense regions into clusters and labels points in sparse regions as noise.

## Steps:

### 1.) **Define parameters:**

i.) eps (ε): Neighborhood radius.

ii.) min_samples: Minimum number of points within eps for a point to count as a core point (scikit-learn includes the point itself in this count).


### 2.) **Classify points:**

i.) Core points – Have at least min_samples points within eps.

ii.) Border points – Lie within eps of a core point but have fewer than min_samples points within eps.

iii.) Noise points (outliers) – Not part of any cluster.


### 3.) **Expand clusters:**  
Link core points that lie within eps of each other into one cluster, then attach each border point to the cluster of a nearby core point.

**Key advantage:** No need to specify the number of clusters.
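The steps above can be sketched from scratch in a few lines of NumPy. This is a simplified illustration on a hypothetical nine-point dataset, not scikit-learn's implementation; the function name `dbscan_sketch` is made up for this example, and it counts the point itself toward min_samples.

```python
import numpy as np


def dbscan_sketch(X, eps, min_samples):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    n = len(X)
    # Pairwise Euclidean distances (O(n^2) memory, fine for small toy data).
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    # Step 2: core points have at least min_samples points (self included) within eps.
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])

    labels = np.full(n, -1)   # everything starts as noise
    cluster_id = 0
    for i in range(n):
        if not is_core[i] or labels[i] != -1:
            continue
        # Step 3: grow a new cluster outward from an unassigned core point.
        labels[i] = cluster_id
        stack = [i]
        while stack:
            p = stack.pop()
            for q in neighbors[p]:
                if labels[q] == -1:
                    labels[q] = cluster_id   # core or border point joins the cluster
                    if is_core[q]:
                        stack.append(q)      # only core points keep expanding
        cluster_id += 1
    return labels


# Hypothetical toy data: two tight groups of four points and one isolated point.
X = np.array([[0, 0], [0, 0.5], [0.5, 0], [0.5, 0.5],
              [5, 5], [5, 5.5], [5.5, 5], [5.5, 5.5],
              [10, 0]], dtype=float)
labels = dbscan_sketch(X, eps=0.8, min_samples=3)
print(labels)
```

Note that the number of clusters is never passed in; it falls out of eps and min_samples. In this sketch a border point that is reachable from two clusters simply joins whichever cluster reaches it first.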

Q4. How does the epsilon parameter affect the performance of DBSCAN in detecting anomalies?

A4. The epsilon parameter affects DBSCAN's anomaly detection as follows:

1.) **Small eps** → Few points qualify as core points, so there are many small clusters and more outliers.

2.) **Large eps** → Fewer clusters, fewer outliers (risk of merging distinct clusters and absorbing true anomalies).

3.) **Optimal eps** is often chosen using a k-distance plot (each point's distance to its k-th nearest neighbor, sorted), where the elbow point suggests a good value.
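The effect of eps can be checked with a short sketch. It assumes synthetic data from `make_blobs` and an arbitrary list of eps values, both chosen only for illustration. It also computes the sorted k-distances that a k-distance plot would display; plotting them (for example with matplotlib) and reading off the elbow is left out here.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Hypothetical data: two Gaussian blobs.
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [6, 6]],
                  cluster_std=0.6, random_state=0)

min_samples = 5

# k-distance curve: kneighbors(X) on the training data returns each point itself
# as its first neighbor, which matches min_samples counting the point itself.
nn = NearestNeighbors(n_neighbors=min_samples).fit(X)
distances, _ = nn.kneighbors(X)
k_dist = np.sort(distances[:, -1])   # plot this and look for the elbow

# Count noise points for increasing eps (values are illustrative assumptions).
eps_values = [0.1, 0.3, 0.6, 1.2]
noise_counts = []
for eps in eps_values:
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit(X).labels_
    noise_counts.append(int(np.sum(labels == -1)))

for eps, n_noise in zip(eps_values, noise_counts):
    print(f"eps={eps}: {n_noise} noise points")
```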

Q5. What are the differences between the core, border, and noise points in DBSCAN, and how do they relate
to anomaly detection?

A5. The three point types differ as follows:

### 1.) Core point
Definition - Has at least min_samples points within eps.

Role in anomaly detection - Forms the cluster structure.

### 2.) Border point
Definition - Lies within eps of a core point but has fewer than min_samples points within eps.

Role in anomaly detection - Helps expand clusters.

### 3.) Noise (outlier) point
Definition - Is neither a core point nor within eps of one, so it doesn't belong to any cluster.

Role in anomaly detection - Considered an anomaly.
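To see the three point types side by side, here is a small sketch using scikit-learn's `DBSCAN` on a hypothetical grid dataset built for this example. With the assumed settings (eps=0.6, min_samples=4, grid spacing 0.5), edge and interior grid points have enough neighbors to be core points, the grid corners do not and end up as border points, and the isolated point is noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical toy data: two 5x5 grids (spacing 0.5) plus one isolated point.
ticks = np.arange(5) * 0.5
grid = np.array([[x, y] for x in ticks for y in ticks])
X = np.vstack([grid, grid + 10.0, [[5.0, 5.0]]])

db = DBSCAN(eps=0.6, min_samples=4).fit(X)

# core_sample_indices_ lists the core points; label -1 marks noise.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True
noise_mask = db.labels_ == -1
border_mask = ~core_mask & ~noise_mask   # in a cluster, but not core

print("core:", int(core_mask.sum()),
      "border:", int(border_mask.sum()),
      "noise:", int(noise_mask.sum()))
```

Only the points in `noise_mask` would be reported as anomalies; border points still belong to a cluster.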

Q6. How does DBSCAN detect anomalies and what are the key parameters involved in the process?

A6. DBSCAN treats noise points (points that don’t belong to any cluster) as anomalies.

### Key parameters:

1.) eps (epsilon) – Controls the radius for neighborhood search.

2.) min_samples – Minimum number of points within eps for a point to be a core point.

3.) Distance metric – Typically Euclidean distance, but can vary based on data type.

A larger eps (or a smaller min_samples) yields fewer anomalies, while a smaller eps (or a larger min_samples) yields more.

Q7. What is the make_circles package in scikit-learn used for?

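A7. make_circles is a function (not a package) in sklearn.datasets that generates a 2D toy dataset of a large circle containing a smaller one, for clustering and classification tasks. Usage:

```
from sklearn.datasets import make_circles

# X: (500, 2) coordinates; y: 0 for the outer circle, 1 for the inner circle
X, y = make_circles(n_samples=500, factor=0.3, noise=0.1, random_state=42)
```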

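factor: Scale factor between the inner and outer circle (between 0 and 1); smaller values place the inner circle farther from the outer one.

noise: Standard deviation of the Gaussian noise added to the points, which makes clustering more challenging.

Useful for testing algorithms that can find **non-linearly separable clusters**, such as DBSCAN.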
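As a sketch of that use case, the snippet below runs DBSCAN on a `make_circles` dataset and checks which true circle each discovered cluster corresponds to. The noise level (0.05), eps (0.2), and min_samples (5) are assumptions picked for this illustration rather than recommended defaults.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_circles

# Slightly less noisy circles than in the usage above (an assumption for this demo).
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=0)

labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Cluster ids found by DBSCAN, ignoring the noise label -1.
found = sorted(int(c) for c in np.unique(labels) if c != -1)
for c in found:
    members = y[labels == c]
    print(f"cluster {c}: {len(members)} points, true classes {np.unique(members)}")
print("noise points:", int(np.sum(labels == -1)))
```

A centroid-based method would struggle here because the two circles share the same center, whereas a density-based method can follow each ring.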


Q8. What are local outliers and global outliers, and how do they differ from each other?

A8. Global and local outliers differ as follows:

## Global outliers:

Deviate significantly from the entire dataset.

Example: A 7-foot-tall person in a general population.

Detected by: Isolation Forest, Z-score.


## Local outliers:

Anomalous relative to their local neighborhood, even though their values are not extreme across the whole dataset.

Example: A person wearing winter clothes in summer.

Detected by: LOF (Local Outlier Factor).


Q9. How can local outliers be detected using the Local Outlier Factor (LOF) algorithm?

A9. LOF detects local anomalies by comparing the local density of a point with the local densities of its neighbors.

### Steps:

1.) Compute local reachability density (LRD) for each point.

2.) Compute the LOF score as the ratio of the neighbors' average LRD to the point's own LRD.

3.) LOF ≈ 1 → Density similar to the neighbors; LOF well above 1 → Likely an outlier.

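```
from sklearn.neighbors import LocalOutlierFactor

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                   # 1 = inlier, -1 = outlier
lof_scores = -lof.negative_outlier_factor_    # LOF values; larger = more anomalous
```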

Points labeled -1 are anomalies.
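To make the "local" part concrete, here is a minimal sketch on hypothetical data: a dense grid, a sparse grid, and one candidate point that sits just outside the dense grid. The layout, `n_neighbors=5`, and the Z-score cut-off of 3 are assumptions for this example. The candidate is unremarkable by Z-score, yet it should receive the highest LOF because its neighbors are much more tightly packed than it is.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Hypothetical data: a dense 10x10 grid, a sparse 5x5 grid, and one candidate point.
dense = np.array([[x, y] for x in np.arange(10) * 0.1 for y in np.arange(10) * 0.1])
sparse = np.array([[x, y] for x in np.arange(5) * 2.0 for y in np.arange(5) * 2.0]) + 20.0
candidate = np.array([[1.5, 0.45]])   # about 0.6 away from the dense grid
X = np.vstack([dense, sparse, candidate])

lof = LocalOutlierFactor(n_neighbors=5)
labels = lof.fit_predict(X)                    # 1 = inlier, -1 = outlier
lof_values = -lof.negative_outlier_factor_     # larger = more anomalous

# Global check for comparison: per-feature absolute Z-scores.
z = np.abs((X - X.mean(axis=0)) / X.std(axis=0))

print("candidate label:", labels[-1])
print("candidate LOF:", round(float(lof_values[-1]), 2))
print("candidate max |z|:", round(float(z[-1].max()), 2))
```

The gap of about 0.6 is small compared with the 2.0 spacing of the sparse grid, so a single global distance threshold would not single the candidate out; LOF does because it compares the point only with its own neighborhood.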


Q10. How can global outliers be detected using the Isolation Forest algorithm?

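A10. Isolation Forest detects global anomalies by recursively partitioning the data with random splits; anomalies tend to be isolated in fewer splits than normal points.

### Steps:

1.) Randomly select a feature and a split value between that feature's minimum and maximum.

2.) Repeat until all points are isolated or a maximum tree depth is reached.

3.) Points with shorter average path lengths across the trees are anomalies.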

```
from sklearn.ensemble import IsolationForest

# contamination is the expected fraction of outliers in the data
iso_forest = IsolationForest(n_estimators=100, contamination=0.05)
outliers = iso_forest.fit_predict(X)  # 1 = inlier, -1 = outlier
```

Points labeled -1 are anomalies.
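A self-contained sketch of the same idea, assuming hypothetical data: 200 points from a standard normal distribution plus one extreme point at (8, 8). The fixed `random_state` is added only so the run is repeatable. The extreme point should be separated after very few random splits, so it gets the lowest score.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Hypothetical data: a standard normal cloud plus one far-away point.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
extreme = np.array([[8.0, 8.0]])
X = np.vstack([normal, extreme])

iso_forest = IsolationForest(n_estimators=100, contamination=0.05, random_state=0)
labels = iso_forest.fit_predict(X)        # 1 = inlier, -1 = outlier
scores = iso_forest.score_samples(X)      # lower = more anomalous

print("extreme point label:", labels[-1])
print("most anomalous index:", int(np.argmin(scores)))
print("points flagged:", int(np.sum(labels == -1)))
```

Because `contamination=0.05` fixes the flagged fraction at roughly 5%, some points on the fringe of the normal cloud are also labeled -1; the scores show how much more anomalous the extreme point is than those.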


Q11. What are some real-world applications where local outlier detection is more appropriate than global outlier detection, and vice versa?

A11. Let:

LOF = local outlier detection (e.g., Local Outlier Factor)

GOD = global outlier detection (e.g., Isolation Forest, Z-score)

## Based on the application

### 1.) Fraud detection

**LOF** - Unusual spending patterns in specific categories

**GOD** - Overall spending deviates from past behavior

### 2.) Industrial monitoring

**LOF** - A sensor in one machine behaving abnormally

**GOD** - A sensor reading far outside normal range

### 3.) Cybersecurity

**LOF** - Abnormal network traffic in a subnet

**GOD** - Sudden increase in failed login attempts

### 4.) Healthcare

**LOF** - Unusual heart rate within a specific age group

**GOD** - Extremely high or low heart rate overall

### 5.) Finance

**LOF** - Local stock price anomalies

**GOD** - Market-wide anomalies in financial data


## Key takeaway:

LOF is useful for detecting context-dependent anomalies.

Isolation Forest and Z-score are useful for detecting globally extreme values.
