## Q1. What is Anomaly Detection? Explain its types (point, contextual, and collective anomalies) with examples.
- Anomaly Detection is the process of identifying unusual patterns, behaviors, or data points that significantly differ from the majority of data. These anomalies often indicate fraud, system failures, or rare events.

### Types of Anomalies:

#### Point Anomalies:
- A single data point is far from the rest.
- Example: A transaction of $50,000 when most transactions are < $500.

#### Contextual Anomalies (Conditional Anomalies):
- Data points that are unusual in a specific context (e.g., time or location).
- Example: A temperature of 30°C is normal in summer but abnormal in winter.

#### Collective Anomalies:
- A group of data points considered abnormal together, even if individually they look normal.
- Example: Multiple failed login attempts in a short span → possible cyber-attack.
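
The three anomaly types can be sketched with the standard library alone. This is a minimal illustration, not a production detector: the function names, the MAD multiplier `k=5.0`, the seasonal temperature ranges, and the 60-second/5-event burst rule are all assumptions chosen to mirror the examples above.

```python
from statistics import median


def point_anomalies(values, k=5.0):
    """Flag values whose distance from the median exceeds k times the
    median absolute deviation (MAD). k=5.0 is an illustrative choice."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > k * mad]


def contextual_anomalies(readings, normal_ranges):
    """Flag (context, value) pairs whose value falls outside the range
    considered normal for that context."""
    flagged = []
    for context, value in readings:
        low, high = normal_ranges[context]
        if not low <= value <= high:
            flagged.append((context, value))
    return flagged


def has_burst(timestamps, window=60, min_events=5):
    """Return True if at least min_events fall within any `window`-second span.
    Each event is unremarkable alone; only the group is anomalous."""
    ts = sorted(timestamps)
    start = 0
    for end in range(len(ts)):
        while ts[end] - ts[start] > window:
            start += 1
        if end - start + 1 >= min_events:
            return True
    return False


# Point anomaly: one $50,000 transaction among small ones
transactions = [120, 80, 310, 45, 270, 50000, 95, 410, 150, 60]
print(point_anomalies(transactions))  # [50000]

# Contextual anomaly: 30°C is fine in summer, flagged in winter
seasonal_ranges = {"summer": (20, 38), "winter": (-5, 12)}  # hypothetical ranges
readings = [("summer", 30), ("winter", 30), ("winter", 4)]
print(contextual_anomalies(readings, seasonal_ranges))  # [('winter', 30)]

# Collective anomaly: six failed logins (timestamps in seconds) within 40 seconds
failed_logins = [0, 900, 1800, 3000, 3005, 3011, 3020, 3032, 3040, 5400]
print(has_burst(failed_logins))  # True
```

The median/MAD rule is used for the point case instead of a mean/standard-deviation z-score because a single extreme value inflates the standard deviation and can hide itself in a small sample.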

## Q2. Compare Isolation Forest, DBSCAN, and Local Outlier Factor in terms of their approach and suitable use cases.

| Algorithm            | Approach                                                                        | Best Use Cases                                                  |
| -------------------- | ------------------------------------------------------------------------------- | --------------------------------------------------------------- |
| **Isolation Forest** | Randomly partitions data to isolate anomalies (anomalies get isolated quickly). | Large high-dimensional datasets (fraud detection, server logs). |
| **DBSCAN**           | Density-based clustering; points in low-density areas are anomalies.            | Spatial/geographical data, clustering with noise.               |
| **LOF**              | Compares local density of a point to its neighbors.                             | Detecting local anomalies in datasets with varying densities.   |
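
To see the three approaches side by side, the sketch below runs each scikit-learn estimator on the same synthetic data: one tight cluster plus five planted outliers. The data, the `contamination=0.05` setting, and the DBSCAN parameters `eps=0.5`, `min_samples=5` are assumptions made for illustration; real data needs its own tuning.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor

# Synthetic data: 200 clustered points plus 5 planted far-away outliers
rng = np.random.default_rng(0)
inliers = rng.normal(loc=0.0, scale=0.5, size=(200, 2))
outliers = np.array([[8, 8], [-8, -8], [8, -8], [-8, 8], [0, 12]], dtype=float)
X = np.vstack([inliers, outliers])
outlier_idx = set(range(200, 205))

# All three estimators mark anomalies (or noise points) with the label -1
labels = {
    "Isolation Forest": IsolationForest(contamination=0.05, random_state=0).fit_predict(X),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5).fit_predict(X),
    "LOF": LocalOutlierFactor(n_neighbors=20, contamination=0.05).fit_predict(X),
}

flagged = {name: set(np.flatnonzero(y == -1).tolist()) for name, y in labels.items()}
for name, idx in flagged.items():
    found_all = outlier_idx <= idx
    print(f"{name}: flagged {len(idx)} points, all planted outliers found: {found_all}")
```

Isolation Forest and LOF flag a fixed share of points set by `contamination`, while DBSCAN has no such parameter: whatever fails to join a dense region becomes noise, so the number of flagged points depends on `eps` and `min_samples`.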

## Q3. What are the key components of a Time Series? Explain each with one example.
- Trend: Long-term upward or downward movement.
  - Example: Increasing sales of smartphones over the years.
- Seasonality: Patterns that repeat at fixed, known intervals.
  - Example: Ice cream sales peaking every summer.
- Cyclic Patterns: Rises and falls without a fixed period, often linked to economic conditions.
  - Example: Business cycles (growth → recession → recovery).
- Residual/Noise: Random variation not explained by trend, seasonality, or cycles.
  - Example: A sudden spike in electricity usage due to a one-off event.
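
As a rough illustration, an additive series can be built from these components with NumPy. All the numbers here (level 100, slope 2, seasonal amplitude 10, noise scale 3) are made up, and the cyclic component is left out to keep the sketch short.

```python
import numpy as np

months = np.arange(48)                            # four years of monthly data
trend = 100 + 2.0 * months                        # long-term upward movement
seasonal = 10 * np.sin(2 * np.pi * months / 12)   # repeats every 12 months
rng = np.random.default_rng(0)
noise = rng.normal(scale=3.0, size=months.size)   # unexplained random variation

series = trend + seasonal + noise                 # additive model

# A 12-month moving average spans exactly one seasonal period, so the
# seasonal term averages out and only the (shifted) trend remains.
window = np.ones(12) / 12
smoothed = np.convolve(trend + seasonal, window, mode="valid")

print(series[:3])
print(smoothed[:3])  # close to 111, 113, 115: the trend averaged over each window
```

The moving average is applied to `trend + seasonal` without noise so that the cancellation of the seasonal term is exact; on the noisy `series` the result would only approximate the trend.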

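## Q4. Define stationarity in time series. How can you test and transform a non-stationary series into a stationary one?
- A stationary series has a constant mean, variance, and autocorrelation structure over time.
- Non-stationary data is hard to forecast because models such as ARMA assume these properties do not change.

### How to Test:
- Visual Inspection: Plot the series and look for a trend or changing variance.
- Rolling Statistics: Compare the rolling mean and variance over time; both should stay roughly constant.
- Augmented Dickey-Fuller (ADF) Test:
  - Null hypothesis: The series has a unit root (is non-stationary).
  - A p-value below the chosen significance level (commonly 0.05) rejects the null hypothesis, which suggests the series is stationary.

### Transformations:
- Differencing: Subtract the previous value from the current value (y_t - y_{t-1}).
- Log Transformation: Stabilizes variance.
- Decomposition: Remove trend and seasonality.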

## Q5. Differentiate between AR, MA, ARIMA, SARIMA, and SARIMAX models.

| Model                   | Description                                         | Example Use Case                          |
| ----------------------- | --------------------------------------------------- | ----------------------------------------- |
| **AR (Autoregressive)** | Future value depends on past values.                | Stock prices.                             |
| **MA (Moving Average)** | Future value depends on past forecast errors.       | Series driven by short-lived shocks.      |
| **ARIMA**               | Combination of AR + MA + differencing.              | Non-stationary series forecasting.        |
| **SARIMA**              | ARIMA + seasonality handling.                       | Airline passengers (monthly seasonality). |
| **SARIMAX**             | SARIMA + external regressors (exogenous variables). | Energy demand with weather factors.       |
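
The AR, MA, and differencing ("I") building blocks in the table can be simulated directly from their definitions. This is a sketch with made-up coefficients (`phi = 0.7`, `theta = 0.5`); the `autocorr` helper is hypothetical and exists only for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
eps = rng.normal(size=n)  # white-noise shocks (the "forecast errors")

# AR(1): y_t = phi * y_{t-1} + eps_t  (depends on its own past value)
phi = 0.7
ar = np.zeros(n)
for t in range(1, n):
    ar[t] = phi * ar[t - 1] + eps[t]

# MA(1): y_t = eps_t + theta * eps_{t-1}  (depends on the previous shock)
theta = 0.5
ma = eps.copy()
ma[1:] += theta * eps[:-1]

# The "I" in ARIMA: an integrated series is a cumulative sum of a stationary
# one, and differencing undoes the integration.
integrated = np.cumsum(ar)
recovered = np.diff(integrated)


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)


# Least-squares estimate of phi from the simulated AR(1) series
phi_hat = np.dot(ar[:-1], ar[1:]) / np.dot(ar[:-1], ar[:-1])
print("estimated phi:", round(phi_hat, 3))                    # close to 0.7
print("MA(1) autocorr lag 1:", round(autocorr(ma, 1), 3))     # close to 0.4
print("MA(1) autocorr lag 2:", round(autocorr(ma, 2), 3))     # close to 0
```

The MA(1) autocorrelation cutting off after lag 1 is the practical signature that separates it from an AR process, whose autocorrelation decays gradually. SARIMA and SARIMAX add seasonal versions of the same terms and external regressors on top of these pieces.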


## Q6. Load a time series dataset (AirPassengers), plot the original series, and decompose it.

```python
# Solution 6

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.seasonal import seasonal_decompose

# Load AirPassengers dataset (expects 'Month' and '#Passengers' columns)
data = pd.read_csv("AirPassengers.csv", parse_dates=['Month'], index_col='Month')

# Plot original
data.plot(title="AirPassengers Data")
plt.show()

# Decomposition: multiplicative because the seasonal swings grow with the level;
# period=12 states the monthly seasonality explicitly
decomposition = seasonal_decompose(data['#Passengers'], model='multiplicative', period=12)
decomposition.plot()
plt.show()
```


## Q7. Apply Isolation Forest on NYC Taxi Fare dataset to detect anomalies.

```python
# Solution 7

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# Example synthetic taxi data (a small stand-in for the NYC Taxi Fare dataset)
taxi = pd.DataFrame({
    "fare_amount": [5, 7, 8, 100, 6, 7, 8, 120, 9, 10],
    "trip_distance": [1, 2, 2.5, 20, 1.5, 2, 3, 25, 2, 2.5]
})

# Apply Isolation Forest on the feature columns only, so re-running the cell
# never feeds the 'anomaly' column back into the model
features = ["fare_amount", "trip_distance"]
iso = IsolationForest(contamination=0.2, random_state=42)
taxi['anomaly'] = iso.fit_predict(taxi[features])  # -1 = anomaly, 1 = normal

# Plot
plt.scatter(taxi["fare_amount"], taxi["trip_distance"],
            c=taxi['anomaly'], cmap="coolwarm", s=80)
plt.xlabel("Fare")
plt.ylabel("Distance")
plt.title("Isolation Forest Anomaly Detection")
plt.show()
```

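## Q8. Train a SARIMA model on the AirPassengers dataset and forecast 12 months.

```python
# Solution 8

import pandas as pd
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Reload AirPassengers so this cell does not depend on what `data` currently holds
data = pd.read_csv("AirPassengers.csv", parse_dates=['Month'], index_col='Month')

# Train SARIMA: non-seasonal (p,d,q) = (1,1,1), seasonal (P,D,Q,s) = (1,1,1,12)
model = SARIMAX(data['#Passengers'], order=(1,1,1), seasonal_order=(1,1,1,12))
results = model.fit(disp=False)

# Forecast the next 12 months with confidence intervals
forecast = results.get_forecast(steps=12)
forecast_ci = forecast.conf_int()

# Plot
plt.figure(figsize=(10,5))
plt.plot(data['#Passengers'], label='Observed')
plt.plot(forecast.predicted_mean, label='Forecast')
plt.fill_between(forecast_ci.index,
                 forecast_ci.iloc[:,0], forecast_ci.iloc[:,1], color='pink', alpha=0.3)
plt.legend()
plt.show()
```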

## Q9. Apply Local Outlier Factor (LOF) on a dataset and visualize anomalies.

```python
# Solution 9

import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import LocalOutlierFactor

# Create synthetic dataset (seeded so the plot is reproducible)
np.random.seed(42)
X = np.random.randn(200, 2)
X[:10] = X[:10] + 6  # shift the first 10 points away to act as anomalies

# LOF
lof = LocalOutlierFactor(n_neighbors=20, contamination=0.05)
y_pred = lof.fit_predict(X)  # -1 = anomaly, 1 = normal

# Plot
plt.scatter(X[:,0], X[:,1], c=y_pred, cmap="coolwarm")
plt.title("Local Outlier Factor Anomaly Detection")
plt.show()
```

## Q10. Real-time power grid monitoring: Workflow

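### Workflow:

#### Anomaly Detection:
- Use Isolation Forest or LOF to detect abnormal spikes in real time: fit on recent history and score new readings as they arrive (in scikit-learn, LOF needs `novelty=True` to score unseen data).
- DBSCAN is not suitable for streaming because it is batch-based and cannot label new points without re-clustering.

#### Forecasting:
- Use SARIMAX, which captures seasonality plus external features such as weather.

#### Validation & Monitoring:
- Use rolling forecasts and backtesting.
- Monitor forecast accuracy using MAE, RMSE, and MAPE.

#### Business Impact:
- Early detection helps prevent blackouts.
- Forecasting helps optimize power distribution.
- Better energy purchase planning reduces costs.

```python
# Solution 10

import pandas as pd
from sklearn.ensemble import IsolationForest
from statsmodels.tsa.statespace.sarimax import SARIMAX

# No energy dataset is loaded in this notebook, so AirPassengers stands in for
# the consumption series; a real model would also pass weather data via `exog`
data = pd.read_csv("AirPassengers.csv", parse_dates=['Month'], index_col='Month')

# Example SARIMAX for energy demand
model = SARIMAX(data['#Passengers'], order=(1,1,1), seasonal_order=(1,1,1,12))
results = model.fit(disp=False)

# Anomaly detection (Isolation Forest on consumption data)
iso = IsolationForest(contamination=0.05, random_state=42)
labels = iso.fit_predict(data[['#Passengers']])  # -1 = anomaly, 1 = normal
```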
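
The accuracy metrics named in the workflow (MAE, RMSE, MAPE) are simple to compute by hand. The sketch below uses NumPy; `forecast_errors` is a hypothetical helper and the three demand values are made-up numbers, not grid data.

```python
import numpy as np


def forecast_errors(actual, forecast):
    """Return MAE, RMSE, and MAPE (in percent) for two equal-length sequences.
    MAPE divides by the actual values, so it is undefined if any are zero."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    err = actual - forecast
    return {
        "MAE": float(np.mean(np.abs(err))),
        "RMSE": float(np.sqrt(np.mean(err ** 2))),
        "MAPE": float(np.mean(np.abs(err / actual)) * 100),
    }


# Made-up demand values and forecasts for three periods
metrics = forecast_errors([100, 200, 400], [110, 190, 380])
print(metrics)  # MAE ≈ 13.33, RMSE ≈ 14.14, MAPE ≈ 6.67
```

RMSE penalizes large misses more heavily than MAE, which matters for a grid where one big under-forecast is costlier than several small ones.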