## Task 3: Document Model Response under High-Stress Scenarios

## Anomaly Detection Stress Testing Report

### Overview
This report assesses the performance of two anomaly detection models—**Isolation Forest** and **Local Outlier Factor (LOF)**—under both normal and high-stress conditions. The stress testing was conducted by injecting high-frequency anomalies into the dataset and evaluating how well each model identified these anomalies. The goal was to simulate extreme conditions to test the robustness of each model in identifying both natural and synthetic anomalies.

### Baseline Performance (Original Data)

For the baseline data, the following metrics were recorded for both models without any additional stress-induced anomalies:

- **Isolation Forest:**
  - **Accuracy**: 96%
  - **Precision**: 14%
  - **Recall**: 100%
  - **F1 Score**: 24%

- **Local Outlier Factor (LOF):**
  - **Accuracy**: 96%
  - **Precision**: 14%
  - **Recall**: 100%
  - **F1 Score**: 24%

The baseline metrics show high accuracy but low precision for both models. A recall of 100% indicates that the models captured every true anomaly in the dataset, while a precision of 14% means that roughly six normal points were flagged for each true anomaly. Accuracy remains high despite these false positives because normal points dominate the dataset, so accuracy alone overstates how well the models separate true anomalies from normal data points.
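
To make the relationship between these four metrics concrete, the sketch below computes them from a confusion matrix. The counts are hypothetical, chosen only so that the rounded results match the baseline figures; they are not the project's actual counts.

```python
# Hypothetical confusion-matrix counts (illustrative only, not the project's data).
tp = 63      # true anomalies flagged as anomalies
fp = 400     # normal points flagged as anomalies
fn = 0       # true anomalies that were missed
tn = 9537    # normal points correctly left unflagged

total = tp + fp + fn + tn
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
false_alarms_per_hit = fp / tp

print(f"accuracy={accuracy:.2%} precision={precision:.2%} "
      f"recall={recall:.2%} f1={f1:.2%}")
print(f"false alarms per true anomaly: {false_alarms_per_hit:.1f}")
```

With anomalies this rare, accuracy stays near 96% even though roughly six of every seven flagged points are false alarms.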

### Performance on Stressed Data

To simulate high-stress conditions, synthetic anomalies were introduced into the dataset at regular intervals, creating high-frequency extreme values. This stressed dataset was used to measure each model's ability to detect anomalies under these challenging conditions.
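
A minimal sketch of injecting anomalies at regular intervals is shown below. The function name `inject_periodic_spikes` and the parameters `every_n` and `magnitude` are assumptions made for illustration; the report does not state the interval or magnitude that was actually used.

```python
import numpy as np

def inject_periodic_spikes(values, every_n=20, magnitude=10.0):
    """Return (stressed_values, labels) with a spike added at every `every_n`-th position."""
    stressed = np.asarray(values, dtype=float).copy()
    labels = np.zeros(len(stressed), dtype=int)
    # Positions every_n - 1, 2 * every_n - 1, ... receive a synthetic spike.
    spike_positions = np.arange(every_n - 1, len(stressed), every_n)
    stressed[spike_positions] += magnitude
    labels[spike_positions] = 1  # ground-truth label for the injected anomalies
    return stressed, labels

# Hypothetical flat signal of 100 samples, used only to show the spacing.
signal = np.zeros(100)
stressed, labels = inject_periodic_spikes(signal, every_n=20, magnitude=10.0)
print(labels.sum(), np.flatnonzero(labels))
```

Returning the labels alongside the stressed values keeps the ground truth needed to compute precision and recall afterwards.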

#### Isolation Forest - Detection Rates on Stressed Data

- **Accuracy**: 95.66%
- **Precision**: 13.65%
- **Recall**: 98%
- **F1 Score**: 23.96%

#### Local Outlier Factor (LOF) - Detection Rates on Stressed Data

- **Accuracy**: 95.65%
- **Precision**: 13.51%
- **Recall**: 97%
- **F1 Score**: 23.72%

### Analysis and Observations

1. **Accuracy**: Accuracy fell slightly under stress, from 96% to 95.66% for Isolation Forest and to 95.65% for LOF, so both models maintained a high level of overall performance. The small decrease indicates that the synthetic anomalies changed the models' decisions on a small subset of points.

2. **Precision**: Precision dropped marginally for both models (Isolation Forest from 14% to 13.65% and LOF from 14% to 13.51%) under stressed conditions. This reduction in precision suggests an increase in false positive rates, where the models may have mistakenly labeled more normal data points as anomalies under high-stress scenarios.

3. **Recall**: Recall slightly decreased for both models under stress, with Isolation Forest reducing to 98% and LOF to 97%. This small reduction implies that, although most anomalies were still detected, a few synthetic anomalies introduced during stress testing were missed. Given the high frequency of synthetic anomalies, this decrease in recall reflects the challenges introduced by extreme values and dense anomaly points.

4. **F1 Score**: Both models experienced a minor decrease in F1 score under stress: from 24% to 23.96% for Isolation Forest and from 24% to 23.72% for LOF. Because the baseline values are reported to the nearest whole percent, changes of this size are close to the rounding precision of the baseline, which suggests that both models handled the high-stress conditions fairly well despite the additional anomalies.
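
As a consistency check, the F1 score can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the stressed-data figures listed above; the helper name `f1_from_precision_recall` is introduced here for illustration.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Stressed-data metrics as reported above, expressed as fractions.
reported = {
    "Isolation Forest": {"precision": 0.1365, "recall": 0.98, "f1": 0.2396},
    "LOF": {"precision": 0.1351, "recall": 0.97, "f1": 0.2372},
}

for model, m in reported.items():
    recomputed = f1_from_precision_recall(m["precision"], m["recall"])
    print(f"{model}: reported F1={m['f1']:.4f}, recomputed F1={recomputed:.4f}")
```

The recomputed values agree with the reported F1 scores to four decimal places.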

### Conclusions

- **Robustness Under Stress**: Isolation Forest and LOF models demonstrated robust performance under high-stress conditions, as evidenced by only a minor decrease in accuracy and recall. The slight decrease in precision and F1 score indicates an increased sensitivity to extreme anomalies, resulting in more false positives but maintaining overall anomaly detection.

- **Model Suitability**: Isolation Forest and LOF performed comparably under high-stress scenarios. The choice between the two may depend on application requirements: Isolation Forest showed marginally higher recall (98% versus 97%), while LOF had a similar F1 score (23.72% versus 23.96%). The slight computational efficiency advantage attributed to LOF was not quantified in these tests.

### Recommendations

1. **Optimization**: Hyperparameter tuning (e.g., varying the contamination level) could improve precision and F1 score, especially under high-frequency anomaly scenarios.
  
2. **Hybrid Models**: Combining Isolation Forest and LOF, or utilizing an ensemble model, could enhance robustness by leveraging the complementary strengths of each model.

3. **Further Testing**: Conduct additional stress tests by varying the magnitude and frequency of synthetic anomalies to observe model adaptability under more diverse conditions.
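
One simple way to combine the two models, sketched below, is to merge their 0/1 anomaly flags. The function `combine_flags` and its `mode` parameter are hypothetical names; the flag arrays are made-up inputs standing in for the outputs of the two fitted models.

```python
import numpy as np

def combine_flags(iso_flags, lof_flags, mode="and"):
    """Combine two 0/1 anomaly-flag arrays.

    mode="and": flag only where both models agree (never more flags than either model).
    mode="or":  flag where either model fires (never fewer flags than either model).
    """
    iso_flags = np.asarray(iso_flags, dtype=bool)
    lof_flags = np.asarray(lof_flags, dtype=bool)
    if mode == "and":
        combined = iso_flags & lof_flags
    elif mode == "or":
        combined = iso_flags | lof_flags
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return combined.astype(int)

# Made-up flags for five points (1 = anomaly).
iso_flags = [1, 0, 1, 0, 1]
lof_flags = [1, 1, 0, 0, 1]
print(combine_flags(iso_flags, lof_flags, "and"))  # [1 0 0 0 1]
print(combine_flags(iso_flags, lof_flags, "or"))   # [1 1 1 0 1]
```

Given the low precision reported above, the "and" rule is the more natural starting point, since it can only remove flags; whether it also removes true anomalies would need to be measured.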

In summary, both Isolation Forest and LOF demonstrated resilience to stress-induced anomalies with only minor variations in performance metrics, supporting their applicability in real-world anomaly detection scenarios.

DAY 12
## Stress Test Results Documentation

### 1. Summary of Key Findings

Stress testing was conducted on two primary anomaly detection models: **Isolation Forest** and **Local Outlier Factor (LOF)**. Each model was evaluated under both normal and stressed conditions to assess their performance and robustness. The stressed dataset included high-frequency synthetic anomalies introduced at regular intervals, simulating extreme conditions.

#### Key Findings:

- **Accuracy**: Both models maintained high accuracy under stress, decreasing slightly from 96% to approximately 95.65%, indicating resilience against synthetic anomalies.
- **Precision**: Both models experienced a minor decrease in precision. Isolation Forest’s precision fell from 14% to 13.65%, and LOF’s precision dropped from 14% to 13.51%. This suggests an increased rate of false positives under stressed conditions.
- **Recall**: The recall values dropped slightly, with Isolation Forest at 98% and LOF at 97%. The minor decrease indicates that the models missed a few synthetic anomalies, but both maintained strong recall rates.
- **F1 Score**: Both models’ F1 scores showed minimal reductions, with Isolation Forest decreasing from 24% to 23.96% and LOF from 24% to 23.72%. This slight change indicates that each model handled the high-stress environment reasonably well.
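
The changes described in the key findings can be tabulated in percentage points with a few lines of Python. The values are the ones reported in this document; the helper name `deltas` is an assumption.

```python
# Baseline and stressed metrics in percent, as reported in this document.
baseline = {"accuracy": 96.0, "precision": 14.0, "recall": 100.0, "f1": 24.0}
stressed = {
    "Isolation Forest": {"accuracy": 95.66, "precision": 13.65, "recall": 98.0, "f1": 23.96},
    "LOF": {"accuracy": 95.65, "precision": 13.51, "recall": 97.0, "f1": 23.72},
}

def deltas(before, after):
    """Change in each metric, in percentage points (negative means a drop)."""
    return {metric: round(after[metric] - before[metric], 2) for metric in before}

for model, metrics in stressed.items():
    print(model, deltas(baseline, metrics))
```

Recall shows the largest change (2 to 3 percentage points); the other metrics move by less than half a percentage point.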



## Group 4: 06.11.2024

DAY 13 -- CHANDANA M -- OUTPUT

### 1. Code to Test Model Performance Against Different Types of Simulated Anomalies

  
### Explanation of Code:

1. **Simulate Anomalies**: The `simulate_anomalies` function allows us to introduce four types of anomalies in the dataset: spikes, drifts, drops, and noise.
2. **Model Setup**: Isolation Forest and LOF models are initialized.
3. **Performance Metrics Calculation**: Each model's performance is evaluated using `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for every type of simulated anomaly.
4. **Result Logging**: Results are stored in a DataFrame and printed for analysis. 

This code provides a structured approach to testing model performance across various simulated anomaly types, documenting their ability to identify anomalies under different stress conditions.

```python
# DAY 13
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Original data setup (`augmented_data` is the DataFrame prepared in an earlier cell)
data = augmented_data.copy()
features = ['Acc X', 'Acc Y', 'Acc Z', 'gyro_x', 'gyro_y', 'gyro_z']  # Feature columns for the models
data['anomaly'] = 0  # Ground-truth label; set to 1 only for simulated anomalies


def simulate_anomalies(data, feature, anomaly_type="spike", magnitude=10, frequency=0.05):
    """Return a copy of `data` with anomalies injected into `feature` at random rows."""
    data_sim = data.copy()
    n_anomalies = int(frequency * len(data_sim))
    anomaly_indices = np.random.choice(data_sim.index, n_anomalies, replace=False)

    if anomaly_type == "spike":
        # Gaussian perturbation scaled by `magnitude`
        data_sim.loc[anomaly_indices, feature] += magnitude * np.random.randn(n_anomalies)
    elif anomaly_type == "drift":
        # Offset increasing linearly from 0 to `magnitude` across the selected rows
        data_sim.loc[anomaly_indices, feature] += np.linspace(0, magnitude, n_anomalies)
    elif anomaly_type == "drop":
        # Replace the value with the minimum observed for this feature
        data_sim.loc[anomaly_indices, feature] = data_sim[feature].min()
    elif anomaly_type == "noise":
        # Uniform perturbation in [-magnitude, magnitude]
        data_sim.loc[anomaly_indices, feature] += magnitude * np.random.uniform(-1, 1, n_anomalies)
    else:
        # Guard against labelling rows as anomalous without modifying them
        raise ValueError(f"Unknown anomaly_type: {anomaly_type!r}")

    data_sim.loc[anomaly_indices, 'anomaly'] = 1
    return data_sim


# One row of metrics is appended per (model, anomaly type)
results = {"Model": [], "Anomaly Type": [], "Accuracy": [], "Precision": [], "Recall": [], "F1 Score": []}

# Types of anomalies to test
anomaly_types = ["spike", "drift", "drop", "noise"]

# Isolation Forest and LOF setup.
# Note: with novelty=True, scikit-learn intends LOF's predict() for new, unseen data;
# calling it on the training data, as done below, can differ from fit_predict().
iso_forest = IsolationForest(contamination=0.05, random_state=42)
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05, novelty=True)
models = {"Isolation Forest": iso_forest, "LOF": lof}

for anomaly_type in anomaly_types:
    # Simulate anomalies in a single feature
    simulated_data = simulate_anomalies(data, feature='Acc X', anomaly_type=anomaly_type, magnitude=10, frequency=0.1)
    X = simulated_data[features]
    y_true = simulated_data['anomaly']

    for model_name, model in models.items():
        model.fit(X)
        # Both models return -1 for outliers and 1 for inliers; map to 1/0
        preds = (model.predict(X) == -1).astype(int)

        # Log results for this model and anomaly type
        results["Model"].append(model_name)
        results["Anomaly Type"].append(anomaly_type)
        results["Accuracy"].append(accuracy_score(y_true, preds))
        results["Precision"].append(precision_score(y_true, preds))
        results["Recall"].append(recall_score(y_true, preds))
        results["F1 Score"].append(f1_score(y_true, preds))

# Convert results to DataFrame for readability
results_df = pd.DataFrame(results)
print("Model Performance with Different Types of Simulated Anomalies:\n", results_df)
```




Model Performance with Different Types of Simulated Anomalies:

| Model | Anomaly Type | Accuracy | Precision | Recall | F1 Score |
|---|---|---|---|---|---|
| Isolation Forest | spike | 0.870905 | 0.208914 | 0.104603 | 0.139405 |
| LOF | spike | 0.875296 | 0.223950 | 0.100418 | 0.138662 |
| Isolation Forest | drift | 0.870626 | 0.206128 | 0.103208 | 0.137546 |
| LOF | drift | 0.868744 | 0.158295 | 0.072524 | 0.099474 |
| Isolation Forest | drop | 0.878015 | 0.279944 | 0.140167 | 0.186803 |
| LOF | drop | 0.867838 | 0.148936 | 0.068340 | 0.093690 |
| Isolation Forest | noise | 0.871881 | 0.218663 | 0.109484 | 0.145911 |
| LOF | noise | 0.873554 | 0.197452 | 0.086471 | 0.120272 |
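
The recall values in the results above are all below 0.15. One contributing factor, assuming each model flags roughly the `contamination=0.05` share of rows, is that the simulation labels about `frequency=0.1` of rows as anomalous, so recall cannot rise much above 0.05 / 0.1 = 0.5 regardless of model quality. The sketch below back-calculates the share of rows each model flagged from the reported precision and recall; the helper name `implied_flagged_fraction` is introduced here for illustration.

```python
def implied_flagged_fraction(precision, recall, anomaly_rate):
    """Share of rows flagged, from precision = TP / flagged and recall = TP / anomalies."""
    return recall * anomaly_rate / precision

anomaly_rate = 0.1     # `frequency=0.1` in the simulate_anomalies call (approximate)
contamination = 0.05   # `contamination=0.05` in both model constructors

# (precision, recall) pairs copied from the results above.
rows = {
    "Isolation Forest / spike": (0.208914, 0.104603),
    "LOF / spike": (0.223950, 0.100418),
    "Isolation Forest / drop": (0.279944, 0.140167),
}

for name, (precision, recall) in rows.items():
    fraction = implied_flagged_fraction(precision, recall, anomaly_rate)
    print(f"{name}: flagged fraction = {fraction:.3f}")

print(f"approximate recall ceiling = {contamination / anomaly_rate:.2f}")
```

The Isolation Forest rows come out at about 0.050, matching the configured contamination, which suggests that aligning `contamination` with the injected anomaly rate is worth testing before drawing conclusions about model quality.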


## Group 4: 08.11.2024 
DAY 14


## Final Report on Simulation and Stress Testing of Anomaly Detection Models

### 2.4 Test Alternative Density-Based Techniques

Incorporating density-based techniques such as **DBSCAN** or **Extended LOF** could increase robustness in regions with dense anomaly clusters. Because these methods respond to local variations in data density, they may detect densely packed anomalies while misclassifying fewer benign points.
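
As a rough illustration of how DBSCAN could be used for this purpose, the sketch below treats points that DBSCAN labels as noise as anomalies. The data is a hypothetical 2-D toy set, and the `eps` and `min_samples` values are illustrative, not tuned for the accelerometer and gyroscope features used in this project.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D data: a dense 10x10 grid of normal points plus one isolated point.
grid = np.array([[0.1 * i, 0.1 * j] for i in range(10) for j in range(10)])
isolated = np.array([[5.0, 5.0]])
X = np.vstack([grid, isolated])

# eps and min_samples are illustrative values chosen for this toy data.
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(X)

# DBSCAN assigns the label -1 to points that belong to no cluster ("noise").
anomaly_flags = (labels == -1).astype(int)
print("flagged points:", anomaly_flags.sum())
print("isolated point flagged:", bool(anomaly_flags[-1]))
```

Unlike Isolation Forest and LOF, DBSCAN has no `contamination` parameter, so the number of flagged points depends entirely on `eps` and `min_samples`.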

### 2.5 Utilize Incremental Learning Techniques

Incremental learning approaches allow models to adapt to evolving anomaly patterns over time. Training the models to recognize gradual changes could improve their performance under scenarios where data distributions shift dynamically.
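
The idea of updating a detector one observation at a time can be sketched with a streaming z-score that maintains a running mean and variance (Welford's method). This is a deliberately simple stand-in chosen for illustration, not an incremental version of Isolation Forest or LOF; the class name, `threshold`, and `warmup` are assumptions.

```python
import math

class StreamingZScoreDetector:
    """Online z-score detector using Welford's running mean/variance updates."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup   # observations to collect before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations from the mean

    def update(self, x):
        """Score x against the statistics seen so far, then fold x into them."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            is_anomaly = std > 0 and abs(x - self.mean) > self.threshold * std
        # Welford update of the running mean and squared-deviation sum
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_anomaly

# Made-up stream: stable readings followed by one extreme value.
detector = StreamingZScoreDetector()
stream = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 10.0]
flags = [detector.update(x) for x in stream]
print(flags)
```

In this sketch flagged values are still folded into the running statistics; excluding them, or weighting recent data more heavily, are design choices that would need to be evaluated against the project's data.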

### 2.6 Explore Advanced Detection Thresholding

Applying dynamic thresholding based on contextual or seasonal variations could improve anomaly detection accuracy. For example, adjusting thresholds during high-fluctuation periods could help models focus on significant anomalies, thus reducing false positives.
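
A minimal sketch of a threshold that adapts to recent fluctuation is shown below: each value is compared with the mean and standard deviation of the preceding window, so the same deviation can be flagged in a calm period and accepted in a volatile one. The function name, `window`, and `k` are assumptions, and the series is made up.

```python
import numpy as np

def rolling_threshold_flags(values, window=5, k=3.0):
    """Flag values[i] when it deviates from the mean of the previous `window`
    values by more than k times their standard deviation."""
    values = np.asarray(values, dtype=float)
    flags = np.zeros(len(values), dtype=int)
    for i in range(window, len(values)):
        recent = values[i - window:i]
        threshold = k * recent.std()
        if abs(values[i] - recent.mean()) > threshold:
            flags[i] = 1
    return flags

# Made-up series: a calm segment, a jump, a volatile segment, then a similar jump.
calm = [0.1, -0.1, 0.1, -0.1, 0.1]
volatile = [2.0, -2.0, 2.0, -2.0, 2.0]
series = calm + [2.0] + volatile + [2.5]
flags = rolling_threshold_flags(series)
print("jump after calm segment flagged:", bool(flags[5]))
print("jump after volatile segment flagged:", bool(flags[11]))
```

The jump after the calm segment is flagged, while the similar jump after the volatile segment is not, which is the false-positive reduction the recommendation describes.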




DAY 15
## 11.11.24

## Tasks done by Group 4

### LOF (Local Outlier Factor) Model Tasks:
1. **Outlier Validation and Threshold Adjustments**:
   - **Analyze False Positives/Negatives**: Identified patterns in false positives and negatives detected by IQR and Z-Score.
   - **Threshold Optimization**: Suggested optimal combinations of IQR and Z-Score thresholds to minimize false positives and negatives.
   - **Exploration of Alternative Methods**: Investigated alternative methods, such as Mahalanobis Distance and Robust Covariance Estimation, for potentially more effective outlier detection.

2. **Hyperparameter Tuning**:
   - **Research Hyperparameters**: Examined key hyperparameters for LOF, such as `n_neighbors` and `contamination`, to understand their impact on performance.
   - **Experimentation and Tuning**: Implemented tuning strategies, potentially adjusting hyperparameters based on observed results and analysis of detection performance.
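
One way to combine IQR and Z-Score thresholds, sketched below, is to flag a point only when both rules agree. The function names, the multiplier `k=1.5`, the cutoff `threshold=3.0`, and the sample values are assumptions for illustration; the thresholds the group actually suggested are not recorded here.

```python
import numpy as np

def iqr_flags(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

def zscore_flags(values, threshold=3.0):
    """Flag values whose absolute z-score exceeds `threshold`."""
    z = (values - values.mean()) / values.std()
    return np.abs(z) > threshold

# Made-up readings: twenty values near 10 plus one extreme value.
normal = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 9.9]
values = np.array(normal * 2 + [25.0])

both = iqr_flags(values) & zscore_flags(values)
print("flagged values:", values[both])
```

Requiring agreement trades some sensitivity for fewer false positives; sweeping `k` and `threshold` against labelled data would show where that trade-off is acceptable.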

---

### Isolation Forest Model Tasks:
1. **Data Augmentation for Anomalies**:
   - **Synthetic Anomaly Generation**: Created synthetic anomalies by introducing noise or modifying patterns, including complex anomalies that involve accelerometer and gyroscope data.
   - **Integrate Synthetic Data**: Augmented the original dataset with synthetic anomalies, retrained Isolation Forest, and monitored detection efficacy.
   - **Model Performance Comparison**: Compared model precision and recall between real and synthetic anomalies, documenting the impact on performance.

2. **Visualization of Anomalies**:
   - **Augmented Data Visualization**: Visualized synthetic anomalies to ensure correct identification as outliers.
   - **Model Comparison**: Visualized differences in anomaly detection results between Isolation Forest and other models for interpretability and performance insights.

3. **Simulation and Stress Testing**:
   - **Simulate and Test Anomalies**: Conducted robustness testing by simulating high-frequency and varied types of anomalies to evaluate the model's detection rate and resilience.
   - **Document Model Responses**: Tracked Isolation Forest’s performance under stress, noting detection weaknesses and response to stress conditions.
   - **Compile Findings**: Summarized stress test outcomes, highlighting model strengths, weaknesses, and proposed enhancements for robustness.

4. **Final Documentation**:
   - **Stress Test Reporting**: Documented key insights from stress tests, offering actionable steps to improve model resilience against high-frequency anomalies.
   - **Validation Report and Strategy Update**: Compiled validation findings and suggested updates to the anomaly detection strategy based on observed results.

ABSENT - 12th to 15th November due to examination