**Question 1:** What is Anomaly Detection? Explain its types.

**Answer:** Anomaly Detection is a data analysis technique used to identify observations that are significantly different from the normal behavior of a dataset. These unusual data points are called anomalies, outliers, or exceptions. Anomalies often indicate important events such as fraud, system failures, data errors, cyber-attacks, or sudden changes in business patterns.

Types of Anomalies
1. Point Anomaly

A point anomaly occurs when a single data point is very different from the rest of the data.

Example:
If a customer usually spends around ₹2,000 per month but suddenly makes a transaction of ₹2,00,000, that transaction is a point anomaly.
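As an illustration, a robust z-score can flag this kind of point anomaly. The sketch below measures each value's distance from the median in units of the median absolute deviation (MAD). It uses the median instead of the mean because a single huge value would inflate the mean and standard deviation enough to hide itself. The spending figures and the 3.5 cut-off (a commonly used threshold for robust z-scores) are assumptions for illustration.

```python
import numpy as np

def robust_z_scores(values):
    """Robust z-score: distance from the median in units of scaled MAD."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    # 1.4826 scales the MAD so it matches the standard deviation for normally distributed data
    return (x - median) / (1.4826 * mad)

# Hypothetical monthly spend (in ₹) for one customer; the last value is the ₹2,00,000 transaction
spend = [1900, 2100, 2000, 1950, 2050, 2200, 1800, 200000]
z = robust_z_scores(spend)
flags = np.abs(z) > 3.5   # assumed cut-off for a point anomaly
print(flags)
```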

2. Contextual Anomaly

A contextual anomaly is abnormal only within a specific context such as time, season, or location.

Example:
A temperature of 30°C is normal in summer but abnormal in winter.
Similarly, high electricity usage at midnight may be unusual, whereas the same usage is normal during the daytime.
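You can catch a contextual anomaly by judging each value against statistics from its own context rather than from the whole dataset. In this minimal sketch, the context is the season and the historical temperatures are hypothetical. The rule (more than `k` standard deviations from the season's mean) is one simple choice among many.

```python
import numpy as np

# Hypothetical historical temperatures (°C), grouped by context (season)
history = {
    "summer": [29, 31, 30, 32, 28, 30, 31],
    "winter": [8, 10, 9, 11, 7, 10, 9],
}

def is_contextual_anomaly(value, context, k=3.0):
    """Flag value if it lies more than k standard deviations from its own context's mean."""
    ref = np.asarray(history[context], dtype=float)
    return abs(value - ref.mean()) > k * ref.std()

print(is_contextual_anomaly(30, "summer"))  # same value, normal context
print(is_contextual_anomaly(30, "winter"))  # same value, abnormal context
```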

3. Collective Anomaly

A collective anomaly occurs when a group of data points together behave abnormally, even if individual points may appear normal.

Example:
A sudden continuous increase in network traffic over several minutes may indicate a cyber attack. Each individual value may seem normal, but the pattern as a whole is abnormal.
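The sketch below compares a per-point check with a window-level check on hypothetical per-minute traffic. Every value in the elevated run stays within 3 standard deviations of normal. However, a 10-minute sliding window over the run has a mean far above the baseline. The window threshold is scaled by `sigma / sqrt(w)`, which assumes roughly independent observations. Real traffic is autocorrelated, so treat this as a heuristic.

```python
import numpy as np

# Hypothetical per-minute network traffic: a repeating "normal" pattern with mean 100
traffic = np.tile([90.0, 110.0, 95.0, 105.0, 100.0], 12)   # 60 minutes
traffic[30:40] = 112.0                                     # sustained, mildly elevated run

baseline = traffic[:30]            # assume the first 30 minutes are known-normal history
mu, sigma = baseline.mean(), baseline.std()

# Point check: no single minute is extreme on its own
point_flags = np.abs(traffic - mu) > 3 * sigma

# Collective check: compare each sliding-window mean with the baseline mean
w = 10
window_means = np.convolve(traffic, np.ones(w) / w, mode="valid")  # window_means[i] = mean of traffic[i:i+w]
window_flags = window_means > mu + 3 * sigma / np.sqrt(w)

print("point anomalies:", point_flags.sum())
print("anomalous windows start at:", np.flatnonzero(window_flags))
```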

**Question 2:** Isolation Forest, DBSCAN, and Local Outlier Factor (LOF)

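**Answer:** **Isolation Forest** is an anomaly detection algorithm based on the idea that anomalies are easier to isolate than normal points. It builds an ensemble of random trees (isolation trees). Each tree recursively splits the data by picking a random feature and a random split value between that feature's minimum and maximum. Because anomalies are few and different, they are isolated after fewer splits, so a short average path length across the trees indicates an anomaly. It works well with large datasets and high-dimensional data.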
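The following sketch shows why anomalies need fewer splits. It counts how many random splits it takes to isolate a point, averaged over many random paths. This is a simplified illustration of the idea only. scikit-learn's `IsolationForest` also trains each tree on a subsample (at most 256 points by default) and converts path lengths into a normalized anomaly score. The data are synthetic.

```python
import numpy as np

def isolation_depth(X, x, rng, max_depth=50):
    """Number of random splits needed to isolate point x (a row of X) along one random tree path."""
    depth = 0
    while len(X) > 1 and depth < max_depth:
        f = rng.integers(X.shape[1])              # pick a random feature
        lo, hi = X[:, f].min(), X[:, f].max()
        if lo == hi:                              # cannot split further on this feature
            break
        split = rng.uniform(lo, hi)               # random split value inside the current range
        # keep only the partition that still contains x
        X = X[X[:, f] < split] if x[f] < split else X[X[:, f] >= split]
        depth += 1
    return depth

rng = np.random.default_rng(0)
normal_points = rng.normal(0, 1, size=(200, 2))
outlier = np.array([6.0, 6.0])
X = np.vstack([normal_points, outlier])

def mean_depth(x, n_trees=200):
    """Average isolation depth of x over n_trees independent random paths."""
    return np.mean([isolation_depth(X, x, rng) for _ in range(n_trees)])

typical = X[np.argmin(np.linalg.norm(X, axis=1))]   # the point closest to the centre
d_out = mean_depth(outlier)
d_typ = mean_depth(typical)
print(f"average splits to isolate the outlier: {d_out:.1f}")
print(f"average splits to isolate a central point: {d_typ:.1f}")
```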

**DBSCAN** (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups data points based on density. Points that lie in low-density regions and do not belong to any cluster are considered anomalies. It is useful when data has clusters of irregular shapes or when the number of clusters is unknown.
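DBSCAN's notion of noise can be computed directly from three definitions:

- A point is a *core point* if at least `min_samples` points lie within distance `eps` (the point itself counts, as in scikit-learn).
- A non-core point within `eps` of a core point is a *border point*.
- Every other point is noise.

The sketch below computes only this noise mask with NumPy on hypothetical data and skips cluster assignment. In practice, `sklearn.cluster.DBSCAN` does both steps and labels noise points `-1`.

```python
import numpy as np

def dbscan_noise_mask(X, eps, min_samples):
    """Return True for points DBSCAN would label as noise (neither core points nor within eps of one)."""
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)   # pairwise Euclidean distances
    within_eps = dists <= eps
    # a core point has at least min_samples points (itself included) within eps
    is_core = within_eps.sum(axis=1) >= min_samples
    # border points are non-core but within eps of some core point; everything else is noise
    near_core = within_eps[:, is_core].any(axis=1)
    return ~is_core & ~near_core

# Two hypothetical dense clusters (5 x 5 grids with spacing 0.2) plus one isolated point
g = np.arange(5) * 0.2
grid = np.array([(a, b) for a in g for b in g])
X = np.vstack([grid, grid + 3.0, [[1.5, -2.0]]])

noise = dbscan_noise_mask(X, eps=0.3, min_samples=4)
print(np.flatnonzero(noise))   # index of the isolated point
```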

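**Local Outlier Factor (LOF)** detects anomalies by comparing the local density of a point with the local densities of its k nearest neighbors. A LOF score near 1 means the point is about as dense as its neighbors. A score well above 1 means the point is much less dense than its surroundings, so it is marked as an outlier. LOF is useful for anomalies that are unusual only relative to their local neighborhood, for example when clusters have different densities.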
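The LOF computation itself is short. This NumPy sketch follows the standard definition (k-distance, reachability distance, local reachability density) on hypothetical data. The data contain a tight cluster, a loose cluster, and one point 0.6 units from the tight cluster. That distance would be ordinary inside the loose cluster, but relative to its dense neighborhood the point gets a LOF well above 1. For brevity, the sketch assumes no duplicate points and breaks ties among neighbors arbitrarily. `sklearn.neighbors.LocalOutlierFactor` handles these cases properly.

```python
import numpy as np

def lof_scores(X, k):
    """Local Outlier Factor for every row of X (assumes no duplicate points)."""
    n = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)                      # a point is not its own neighbour
    nbrs = np.argsort(D, axis=1)[:, :k]              # indices of the k nearest neighbours
    k_dist = D[np.arange(n), nbrs[:, -1]]            # distance to the k-th nearest neighbour
    # reachability distance of p from neighbour o: max(k-distance(o), d(p, o))
    reach = np.maximum(D[np.arange(n)[:, None], nbrs], k_dist[nbrs])
    lrd = 1.0 / reach.mean(axis=1)                   # local reachability density
    return lrd[nbrs].mean(axis=1) / lrd              # neighbours' density relative to the point's own

g = np.arange(5)
dense = np.array([(a, b) for a in g for b in g]) * 0.1          # tight cluster, spacing 0.1
sparse = np.array([(a, b) for a in g for b in g]) * 1.0 + 10.0  # loose cluster, spacing 1.0
point = np.array([[0.2, 1.0]])                                  # 0.6 away from the tight cluster
X = np.vstack([dense, sparse, point])

scores = lof_scores(X, k=5)
print("LOF of the extra point:", round(float(scores[-1]), 2))
print("max LOF inside the clusters:", round(float(scores[:-1].max()), 2))
```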

**Question 3:** Key Components of a Time Series

**Answer:** A time series is a sequence of observations recorded over time, usually at regular intervals (hourly, daily, monthly, and so on). It is commonly described in terms of the following components:

1. Trend

Trend represents the long-term upward or downward movement in the data.

Example:
The number of online shoppers increasing every year.

2. Seasonality

Seasonality refers to patterns that repeat at regular intervals such as daily, monthly, or yearly.

Example:
Air conditioner sales increase every summer and decrease in winter.

3. Cyclical Component

Cyclical patterns are long-term fluctuations that do not have a fixed period and are often related to economic or business cycles.

Example:
Economic growth followed by recession and recovery.

4. Residual (Noise)

Residual is the random variation left after removing the trend, cyclical, and seasonal components. It represents irregular, unpredictable behavior.
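To make the components concrete, the sketch below builds a synthetic monthly series as trend + seasonality + noise. This is an additive model, and cyclical movement is omitted for simplicity. It then applies a classical decomposition in NumPy:

- A centered 2x12 moving average estimates the trend.
- Averaging the detrended values month by month estimates the seasonal pattern.

This mirrors the classical approach behind `seasonal_decompose`, but it is a simplified sketch, not that function's exact implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, period = 96, 12                                   # 8 years of hypothetical monthly data
t = np.arange(n)
trend = 100 + 2.0 * t                                # long-term upward movement
seasonal = 10 * np.sin(2 * np.pi * t / period)       # pattern repeating every 12 months
noise = rng.normal(0, 1, n)                          # random residual
y = trend + seasonal + noise                         # additive model: y = T + S + R

# 1) Trend: centered 2x12 moving average (weights 1/24, 1/12 x 11, 1/24); ends stay NaN
weights = np.r_[0.5, np.ones(period - 1), 0.5] / period
trend_hat = np.full(n, np.nan)
trend_hat[period // 2 : n - period // 2] = np.convolve(y, weights, mode="valid")

# 2) Seasonal: average the detrended values for each calendar month, then center on zero
detrended = y - trend_hat
seasonal_hat = np.array([np.nanmean(detrended[m::period]) for m in range(period)])
seasonal_hat -= seasonal_hat.mean()

# 3) Residual: what is left after removing trend and seasonality
resid_hat = y - trend_hat - np.tile(seasonal_hat, n // period)

print("estimated seasonal pattern:", np.round(seasonal_hat, 1))
print("true seasonal pattern:     ", np.round(seasonal[:period], 1))
```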

**Question 4:** Stationarity in Time Series

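**Answer:** A time series is called stationary if its statistical properties do not change over time. The mean and variance are constant, and the autocorrelation between two observations depends only on the lag between them, not on when they occur. Many forecasting models assume stationarity. ARMA models require it, and ARIMA handles non-stationary series by differencing them first.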

**How to Test Stationarity**

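The most common method is the Augmented Dickey-Fuller (ADF) test. Its null hypothesis is that the series has a unit root, meaning it is non-stationary:

- If the p-value is less than 0.05, reject the null hypothesis and treat the series as stationary.

- If the p-value is greater than 0.05, the test fails to reject the null hypothesis, so treat the series as non-stationary.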

**How to Make a Series Stationary**

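- Differencing (subtract the previous value from each observation)

- Log transformation (to stabilize variance)

- Seasonal differencing (subtract the value from the same season in the previous cycle, e.g., 12 months earlier for monthly data)

- Removing the trend (detrending)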

**Question 5:** AR, MA, ARIMA, SARIMA, and SARIMAX

**Answer:** **Autoregressive (AR) Model**
This model predicts future values based on past values of the series.
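For instance, an AR(1) model predicts each value from the previous one: `y_t = phi * y_{t-1} + e_t`. The sketch below simulates such a series with an assumed `phi = 0.7`. It then recovers `phi` by least squares and makes a one-step forecast.

```python
import numpy as np

rng = np.random.default_rng(1)
phi_true, n = 0.7, 1000
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()      # AR(1): y_t = phi * y_{t-1} + e_t

# Least-squares estimate of phi: regress y_t on y_{t-1} (no intercept, since the simulated mean is 0)
phi_hat = np.dot(y[:-1], y[1:]) / np.dot(y[:-1], y[:-1])

# One-step-ahead forecast uses only the last observed value
next_value = phi_hat * y[-1]
print(f"estimated phi = {phi_hat:.3f}, forecast for the next step = {next_value:.3f}")
```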

**Moving Average (MA) Model**
This model uses past prediction errors to forecast future values.
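An MA(1) model, `y_t = e_t + theta * e_{t-1}`, carries forward only the previous error term. As a result, its autocorrelation is `theta / (1 + theta^2)` at lag 1 and zero at higher lags. This sketch simulates an MA(1) series with an assumed `theta = 0.6` and checks that pattern empirically.

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 0.6, 20000
e = rng.normal(size=n + 1)
y = e[1:] + theta * e[:-1]             # MA(1): y_t = e_t + theta * e_{t-1}

def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

print("theoretical lag-1 autocorrelation:", round(theta / (1 + theta**2), 3))
print("sample lag-1:", round(float(autocorr(y, 1)), 3), " sample lag-2:", round(float(autocorr(y, 2)), 3))
```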

**ARIMA (AutoRegressive Integrated Moving Average)**
ARIMA combines AR and MA models and includes a differencing step to make the data stationary. It is used for non-seasonal time series.

**SARIMA (Seasonal ARIMA)**
SARIMA extends ARIMA by adding seasonal components to handle repeating seasonal patterns such as monthly or yearly cycles.

**SARIMAX**
SARIMAX is an extension of SARIMA that allows external variables (exogenous variables) such as weather, holidays, or economic factors to improve forecasting.
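The models above can be written compactly with the backshift operator $B$, where $B y_t = y_{t-1}$. These are the standard textbook forms. The notation ($\phi_i$, $\theta_j$, seasonal period $s$, exogenous vector $x_t$) is introduced here for illustration:

$$
\begin{aligned}
\text{AR}(p):&\quad y_t = c + \sum_{i=1}^{p} \phi_i\, y_{t-i} + \varepsilon_t \\
\text{MA}(q):&\quad y_t = \mu + \varepsilon_t + \sum_{j=1}^{q} \theta_j\, \varepsilon_{t-j} \\
\text{ARIMA}(p,d,q):&\quad \phi(B)\,(1-B)^d\, y_t = c + \theta(B)\,\varepsilon_t \\
\text{SARIMA}(p,d,q)(P,D,Q)_s:&\quad \phi(B)\,\Phi(B^s)\,(1-B)^d\,(1-B^s)^D\, y_t = c + \theta(B)\,\Theta(B^s)\,\varepsilon_t \\
\text{SARIMAX}:&\quad y_t = \beta^\top x_t + u_t, \quad u_t \text{ follows a SARIMA model}
\end{aligned}
$$

Here $\phi(B) = 1 - \sum_{i=1}^{p}\phi_i B^i$ and $\theta(B) = 1 + \sum_{j=1}^{q}\theta_j B^j$. $\Phi$ and $\Theta$ are the analogous seasonal polynomials of orders $P$ and $Q$. In Question 8, `order=(1,1,1)` and `seasonal_order=(1,1,1,12)` correspond to $(p,d,q) = (1,1,1)$ and $(P,D,Q)_s = (1,1,1)_{12}$.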

**Question 6:** Time Series Decomposition (AirPassengers)

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
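import statsmodels.api as sm
from statsmodels.tsa.seasonal import seasonal_decompose

# Load the AirPassengers dataset (monthly totals, Jan 1949 to Dec 1960).
# statsmodels has no built-in airpassengers module, so fetch it from the Rdatasets
# collection with get_rdataset (this requires internet access).
raw = sm.datasets.get_rdataset("AirPassengers", "datasets").data

# Keep the passenger counts and attach a monthly DatetimeIndex ('MS' = month start)
data = pd.DataFrame(
    {'AirPassengers': raw['value'].to_numpy()},
    index=pd.date_range(start='1949-01-01', periods=len(raw), freq='MS'),
)
data.index.name = 'Month'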

# Plot original series
plt.figure(figsize=(10,4))
plt.plot(data['AirPassengers'])
plt.title("Original AirPassengers Time Series")
plt.show()

# Decompose the series
decomposition = seasonal_decompose(data['AirPassengers'], model='multiplicative')

decomposition.plot()
plt.show()


**Question 7:** Isolation Forest Example

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import IsolationForest

# Create sample data
np.random.seed(42)
fare = np.random.normal(15, 5, 500)
distance = np.random.normal(3, 1, 500)

# Add anomalies
fare[:10] = np.random.uniform(50, 100, 10)
distance[:10] = np.random.uniform(10, 20, 10)

df = pd.DataFrame({'fare': fare, 'distance': distance})

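# Apply Isolation Forest; contamination=0.02 matches the 10 injected anomalies out of 500 points
model = IsolationForest(contamination=0.02, random_state=42)
# fit_predict returns -1 for anomalies and 1 for normal points
df['anomaly'] = model.fit_predict(df[['fare','distance']])

# Plot (anomalies and normal points get different colors)
plt.scatter(df['distance'], df['fare'], c=df['anomaly'], cmap='coolwarm')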
plt.xlabel("Distance")
plt.ylabel("Fare")
plt.title("Isolation Forest Anomaly Detection")
plt.show()


**Question 8:** SARIMA Forecast

In [None]:
from statsmodels.tsa.statespace.sarimax import SARIMAX

# Train SARIMA model (uses the monthly `data` DataFrame built in Question 6)
model = SARIMAX(data['AirPassengers'],
                order=(1,1,1),
                seasonal_order=(1,1,1,12))

results = model.fit(disp=False)

# Forecast next 12 months
forecast = results.forecast(steps=12)

# Plot results
plt.figure(figsize=(10,4))
plt.plot(data['AirPassengers'], label='Original')
plt.plot(forecast, label='Forecast', color='red')
plt.legend()
plt.title("SARIMA Forecast for Next 12 Months")
plt.show()


**Question 9:** Local Outlier Factor (LOF)

In [None]:
from sklearn.neighbors import LocalOutlierFactor

# Reuse the fare/distance data generated in Question 7
X = df[['fare','distance']]

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.02)
labels = lof.fit_predict(X)   # -1 = outlier, 1 = inlier

plt.scatter(X['distance'], X['fare'], c=labels)
plt.xlabel("Distance")
plt.ylabel("Fare")
plt.title("LOF Anomaly Detection")
plt.show()


**Question 10:** Real-Time Workflow for Power Grid Monitoring


**Answer:** **Anomaly Detection**

For streaming data arriving every 15 minutes:

- Train an Isolation Forest model on historical data that reflects normal operation.

- Score each new reading as it arrives.

- If a reading is flagged as anomalous (its anomaly score crosses the chosen threshold), trigger an alert for an abnormal spike or drop.

Isolation Forest is preferred because it scores new points quickly, scales to large volumes of continuous data, and does not need labeled anomalies.
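Below is a minimal sketch of this scoring loop using scikit-learn's `IsolationForest`. The historical load values are synthetic, and `check_reading` is a hypothetical helper. In scikit-learn, lower `score_samples`/`decision_function` values mean more anomalous. `predict` returns `-1` for points beyond the threshold implied by `contamination`, which is left at its default `'auto'` here.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical history: 30 days of 15-minute load readings (96 per day) under normal operation, in MW
history = rng.normal(500, 20, size=(96 * 30, 1))

detector = IsolationForest(random_state=0).fit(history)

def check_reading(load_mw):
    """Return True (raise an alert) if a new 15-minute reading looks anomalous."""
    # predict returns -1 for anomalies and 1 for normal points
    return detector.predict([[load_mw]])[0] == -1

for reading in [505.0, 900.0, 120.0]:
    print(reading, "ALERT" if check_reading(reading) else "ok")
```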

**Forecasting Model**

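Use SARIMAX because:

- Energy demand has daily and weekly seasonality.

- Weather conditions affect demand.

- Regional differences exist, which can be handled by fitting a separate model per region.

- External variables like temperature and humidity can be included as exogenous regressors.

SARIMAX models a single seasonal period directly. For 15-minute data, daily seasonality has a period of 96, so the second seasonal pattern (weekly) is typically supplied as exogenous terms such as day-of-week indicators or Fourier terms.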

**Validation and Monitoring**

- Use performance metrics such as MAE, RMSE, and MAPE.

- Apply rolling-window (walk-forward) validation.

- Monitor prediction errors over time.

- Retrain the model periodically, especially if performance degrades.

- Track the number of anomalies detected over time to identify drift in the system or the data.
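The validation step can be sketched as a rolling-origin (walk-forward) loop. At each origin, the model forecasts the next day from all data up to that point, the forecast is scored with MAE, RMSE, and MAPE, and the origin moves forward. To keep the example self-contained, a hypothetical seasonal-naive forecaster (repeat the last day) stands in for the fitted SARIMAX model. Any function with the same `(train, horizon)` signature could be plugged in.

```python
import numpy as np

def mae(actual, pred):
    return np.mean(np.abs(actual - pred))

def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))

def mape(actual, pred):
    # percentage error; undefined when actual contains zeros
    return 100 * np.mean(np.abs((actual - pred) / actual))

def rolling_origin_eval(y, forecaster, initial, horizon, step):
    """Forecast from successive origins and average (MAE, RMSE, MAPE) over all folds."""
    scores = []
    for origin in range(initial, len(y) - horizon + 1, step):
        train, test = y[:origin], y[origin:origin + horizon]
        pred = forecaster(train, horizon)
        scores.append((mae(test, pred), rmse(test, pred), mape(test, pred)))
    return np.mean(scores, axis=0)

def seasonal_naive(train, horizon, period=96):
    """Hypothetical baseline: repeat the last observed day."""
    return np.resize(train[-period:], horizon)

# Hypothetical 10 days of 15-minute load with a daily cycle and noise
t = np.arange(96 * 10)
load = 500 + 50 * np.sin(2 * np.pi * t / 96) + np.random.default_rng(0).normal(0, 5, t.size)
mae_, rmse_, mape_ = rolling_origin_eval(load, seasonal_naive, initial=96 * 7, horizon=96, step=96)
print(f"MAE={mae_:.2f}  RMSE={rmse_:.2f}  MAPE={mape_:.2f}%")
```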

**Business Benefits**

- Prevent power outages by detecting abnormal usage early.

- Optimize load distribution across regions.

- Reduce operational costs.

- Detect equipment faults or power theft.

- Improve decision-making for energy planning and resource allocation.