# Anomaly Detection with Isolation Forest

This notebook demonstrates how to perform basic Exploratory Data Analysis (EDA) on sensor data and train an Isolation Forest model to detect anomalies.

## 1. Load Data

First, we load the `sample_sensor_data.csv` file into a pandas DataFrame.
```python
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# Load the data
try:
    df = pd.read_csv('../data/sample_sensor_data.csv', parse_dates=['timestamp'])
    df.set_index('timestamp', inplace=True)
    print("Data loaded successfully:")
    print(df.head())
except FileNotFoundError:
    print("Error: '../data/sample_sensor_data.csv' not found. Please ensure the file exists in the '../data/' directory.")
    raise  # Re-raise instead of calling exit(): the later cells cannot run without df
```
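
If the CSV is not at hand, the rest of the notebook can still be exercised on a synthetic stand-in. The sketch below is an assumption throughout: the column names `temperature` and `pressure`, the one-minute sampling interval, and the injected spikes are hypothetical and are not taken from `sample_sensor_data.csv`.

```python
import numpy as np
import pandas as pd


def make_synthetic_sensor_data(n_rows=500, seed=0):
    """Build a hypothetical sensor DataFrame indexed by timestamp."""
    rng = np.random.default_rng(seed)
    index = pd.date_range("2024-01-01", periods=n_rows, freq="min", name="timestamp")
    data = pd.DataFrame(
        {
            # Hypothetical column names, chosen only for illustration
            "temperature": 20.0 + rng.normal(0.0, 0.5, n_rows),
            "pressure": 101.3 + rng.normal(0.0, 0.2, n_rows),
        },
        index=index,
    )
    # Inject a few obvious temperature spikes so a detector has something to find
    spike_rows = rng.choice(n_rows, size=5, replace=False)
    data.iloc[spike_rows, 0] += 10.0
    return data


synthetic_df = make_synthetic_sensor_data()
print(synthetic_df.head())
```

Assigning `df = synthetic_df` lets the later cells run unchanged, since the frame has the same shape of input they expect: numeric columns under a `timestamp` index.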

## 2. Basic EDA (Exploratory Data Analysis)

We will perform basic EDA by looking at summary statistics and plotting the sensor readings over time.

### Summary Statistics
```python
print("\nSummary Statistics:")
print(df.describe())
```

### Line Plots of Sensor Readings
```python
print("\nPlotting Sensor Data:")
df.plot(subplots=True, figsize=(12, 8))
plt.tight_layout()
plt.show()
```

## 3. Train Isolation Forest Model

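Next, we train an Isolation Forest model on the sensor data to identify anomalies. Isolation Forest is an unsupervised algorithm that isolates each observation with random splits on randomly chosen features. Outliers tend to be separated in fewer splits than typical points, so a short average path length across the trees signals an anomaly.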
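
To make the isolation idea concrete, here is a minimal sketch on synthetic data (not the sensor file): a tight cluster of points plus one point placed far away. The variable names and the data are assumptions for illustration. `score_samples` returns lower values for points that are easier to isolate, so the far point should score well below the cluster.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 200 typical points around the origin plus one point far from the cluster
cluster = rng.normal(0.0, 1.0, size=(200, 2))
far_point = np.array([[8.0, 8.0]])
X = np.vstack([cluster, far_point])

forest = IsolationForest(random_state=0).fit(X)
# Higher score = more normal, lower score = more anomalous
scores = forest.score_samples(X)

print("Median score of clustered points:", np.median(scores[:-1]))
print("Score of the far point:", scores[-1])
```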

```python
# Initialize and train the Isolation Forest model
# contamination is the expected proportion of outliers in the data; it sets the
# decision threshold and should be estimated or chosen from domain knowledge
model = IsolationForest(contamination=0.05, random_state=42)
model.fit(df)
```
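
The effect of `contamination` can be checked directly. The following sketch, again on synthetic data assumed purely for illustration, fits the model with three settings and reports the fraction of training points flagged. It also shows that `predict` returns -1 exactly where `decision_function` is negative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(0.0, 1.0, size=(500, 3))  # synthetic stand-in for sensor columns

flagged_fractions = {}
for contamination in (0.01, 0.05, 0.10):
    forest = IsolationForest(contamination=contamination, random_state=42).fit(X)
    labels = forest.predict(X)             # -1 = anomaly, 1 = normal
    margins = forest.decision_function(X)  # negative = below the decision threshold
    flagged_fractions[contamination] = float((labels == -1).mean())
    print(f"contamination={contamination:.2f} -> "
          f"flagged fraction={flagged_fractions[contamination]:.3f}")
```

Because the threshold is derived from the training scores, the flagged fraction on the training data tracks `contamination` closely. If the true anomaly rate is unknown, comparing a few settings like this is one way to see how sensitive the results are to the choice.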

## 4. Mark and Plot Anomalies

Finally, we predict anomalies using the trained model and visualize them on the plots. The model labels anomalies as -1 and normal observations as 1.

```python
# Predict on the sensor columns only, so re-running this cell does not feed
# the 'anomaly' column back into the model (-1 = anomaly, 1 = normal)
feature_cols = [col for col in df.columns if col != 'anomaly']
df['anomaly'] = model.predict(df[feature_cols])

print("\nDataFrame with Anomalies Marked:")
anomalies = df[df['anomaly'] == -1]
print(anomalies)

# Plot each sensor with its anomalous readings highlighted
plt.figure(figsize=(14, 10))
for i, col in enumerate(feature_cols):
    plt.subplot(len(feature_cols), 1, i + 1)
    plt.plot(df.index, df[col], label=col)
    plt.scatter(anomalies.index, anomalies[col], color='red', label='Anomaly')
    plt.title(f'{col} Readings with Anomalies')
    plt.legend()
plt.tight_layout()
plt.show()

```