# SecurePay — Intelligent Transaction Anomaly Detection System  
## Notebook 04 — Local Outlier Factor & Model Evaluation

---

## Introduction

Isolation Forest detects globally unusual transactions, but some suspicious activity is anomalous only relative to its local neighborhood. Local Outlier Factor (LOF) is a density-based anomaly detection algorithm that flags observations whose local density is substantially lower than that of their nearest neighbors.

This notebook applies LOF to detect locally anomalous transactions and compares its flags with those of the Isolation Forest model. The goal is to understand how the two detection perspectives differ and to evaluate each model against behavioral deviation patterns in the data.
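To make the idea of local density concrete, here is a minimal sketch on synthetic data (not the SecurePay stream). It assumes two made-up clusters, one tight and one diffuse, plus a hypothetical probe point placed just outside the tight cluster. The probe is not extreme in a global sense, yet LOF should score it as an outlier because its neighbors are far denser than it is.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)

# Two synthetic clusters: one tight, one diffuse (illustrative data only).
dense = rng.normal(loc=0.0, scale=0.1, size=(100, 2))
sparse = rng.normal(loc=5.0, scale=1.5, size=(100, 2))

# A hypothetical probe point that sits just outside the tight cluster.
probe = np.array([[1.0, 1.0]])

X_toy = np.vstack([dense, sparse, probe])

lof_toy = LocalOutlierFactor(n_neighbors=20)
labels = lof_toy.fit_predict(X_toy)  # 1 = inlier, -1 = outlier

# negative_outlier_factor_ is close to -1 for inliers and much lower for outliers.
scores = lof_toy.negative_outlier_factor_
probe_score = scores[-1]
median_score = np.median(scores[:-1])

print("probe label:", labels[-1])
print("probe score vs. median score:", probe_score, median_score)
```

Points in the diffuse cluster can lie just as far from their neighbors as the probe does, but they are compared against equally sparse neighbors, so their scores stay near -1.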


#Stage 1

In [1]:
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor

# Load the transaction stream
df = pd.read_csv("securepay_txn_stream.csv")

# Numeric behavioral features used as model input
features = [
    'txn_hour',
    'txn_amount',
    'amount_deviation',
    'txn_velocity',
    'behavior_score'
]

X = df[features]


#Stage 2

In [2]:
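# n_neighbors sets the size of the local neighborhood used for density estimates;
# contamination is the expected share of anomalies (1.5%) and fixes the flag threshold.
# Note: LOF uses Euclidean distance by default, so unscaled features with large
# ranges (here txn_amount) dominate the neighborhoods.
lof = LocalOutlierFactor(
    n_neighbors=20,
    contamination=0.015
)

# fit_predict returns 1 for inliers and -1 for outliers; remap to 0 = normal, 1 = anomaly
df['lof_flag'] = lof.fit_predict(X)
df['lof_flag'] = df['lof_flag'].map({1: 0, -1: 1})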

df[['lof_flag']].head()


,lof_flag
0,0
1,0
2,0
3,0
4,0
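The features passed to LOF sit on very different scales (`txn_amount` runs into the hundreds or thousands, while `behavior_score` stays below 1), so distance-based neighborhoods are driven mostly by amount. One common remedy is to standardize the features first. The sketch below illustrates this on a synthetic stand-in frame; the generated values and the `lof_flag_scaled` column name are assumptions for illustration only. The outputs recorded in this notebook come from the unscaled features, so a scaled run would flag a different set of transactions.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 1000

# Synthetic stand-in for the transaction stream (values are illustrative only).
demo = pd.DataFrame({
    "txn_hour": rng.integers(0, 24, size=n),
    "txn_amount": rng.gamma(shape=2.0, scale=250.0, size=n),
    "amount_deviation": rng.normal(0.0, 0.3, size=n),
    "txn_velocity": rng.uniform(0.0, 1.0, size=n),
    "behavior_score": rng.uniform(0.0, 1.0, size=n),
})

# Standardize each feature to zero mean and unit variance before computing distances.
X_scaled = StandardScaler().fit_transform(demo)

lof_scaled = LocalOutlierFactor(n_neighbors=20, contamination=0.015)

# Map -1 (outlier) to 1 and 1 (inlier) to 0, matching the notebook's flag convention.
demo["lof_flag_scaled"] = (lof_scaled.fit_predict(X_scaled) == -1).astype(int)

print(demo["lof_flag_scaled"].value_counts())
```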


#Stage 3

In [3]:
# Inspect the first ten transactions flagged as anomalous by LOF
df[df['lof_flag'] == 1].head(10)


,txn_id,txn_hour,txn_amount,amount_deviation,txn_velocity,behavior_score,payment_channel,risk_flag,lof_flag
177,TXN00178,22,1032.39,0.48,0.56,0.28,CreditCard,0,1
211,TXN00212,8,486.04,-0.46,0.38,0.32,CreditCard,0,1
249,TXN00250,22,200.29,-0.18,0.8,0.28,DebitCard,0,1
285,TXN00286,8,221.14,0.15,0.31,0.06,UPI,0,1
365,TXN00366,22,697.6,-0.32,0.27,0.05,CreditCard,0,1
510,TXN00511,8,968.22,-0.33,0.36,0.29,DebitCard,0,1
625,TXN00626,8,1187.22,0.17,0.72,0.12,DebitCard,0,1
675,TXN00676,8,440.37,0.12,0.1,0.21,DebitCard,0,1
706,TXN00707,22,794.52,0.07,0.45,0.18,UPI,0,1
850,TXN00851,8,377.9,-0.41,0.29,0.23,UPI,0,1
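The flagged rows above also carry a `risk_flag` column. If that column can be treated as a reference label (an assumption; the notebook does not say how it was produced), the LOF flags can be scored against it. The sketch below shows the mechanics on a tiny hand-made frame, not on notebook data.

```python
import pandas as pd
from sklearn.metrics import precision_score, recall_score

# Tiny hand-made frame (not notebook data): a reference label and a model flag.
toy = pd.DataFrame({
    "risk_flag": [0, 0, 1, 1, 0, 1, 0, 0],
    "lof_flag":  [0, 1, 1, 0, 0, 1, 0, 0],
})

# Rows are the reference label, columns are the model flag.
table = pd.crosstab(toy["risk_flag"], toy["lof_flag"])

# Precision: share of flagged rows that are truly risky.
# Recall: share of risky rows that were flagged.
precision = precision_score(toy["risk_flag"], toy["lof_flag"])
recall = recall_score(toy["risk_flag"], toy["lof_flag"])

print(table)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```

On the real data, the same calls would take `df['risk_flag']` and `df['lof_flag']` as inputs.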


#Stage 4

In [4]:
df['lof_flag'].value_counts()


lof_flag,count
0,9850
1,150
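The 150 flagged rows are a direct consequence of `contamination=0.015`: 1.5% of 10,000 transactions is 150. With a numeric contamination, scikit-learn places the decision threshold (`offset_`) at the corresponding percentile of the LOF scores, so the flag count is fixed in advance rather than discovered from the data. A minimal sketch on synthetic data (illustrative only) shows the relationship:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(7)
X_demo = rng.normal(size=(1000, 5))  # synthetic data, illustrative only

contamination = 0.015
lof_demo = LocalOutlierFactor(n_neighbors=20, contamination=contamination)
labels = lof_demo.fit_predict(X_demo)

# Samples whose score falls below offset_ are the ones labeled -1.
scores = lof_demo.negative_outlier_factor_
manual_flags = scores < lof_demo.offset_

n_flagged = int((labels == -1).sum())
expected = contamination * len(X_demo)  # 15.0 for this synthetic sample

print("flagged:", n_flagged, "expected:", expected)
```

Because the count is set by the parameter, it says nothing by itself about how many transactions are genuinely anomalous.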


#Stage 5

In [5]:
from sklearn.ensemble import IsolationForest

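# Recreate Isolation Forest results inside this notebook,
# using the same contamination rate as LOF so the flag counts are comparable
if_model = IsolationForest(
    n_estimators=100,
    contamination=0.015,
    random_state=42  # fixed seed for reproducible trees
)

if_model.fit(X)

# predict returns 1 for inliers and -1 for outliers; remap to 0 = normal, 1 = anomaly
df['iforest_flag'] = if_model.predict(X)
df['iforest_flag'] = df['iforest_flag'].map({1: 0, -1: 1})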


#Stage 6

In [6]:
# Put both model flags side by side
comparison = df[['iforest_flag', 'lof_flag']]

# Count each (iforest_flag, lof_flag) combination to see where the models agree
comparison.value_counts()


iforest_flag,lof_flag,count
0,0,9717
0,1,133
1,0,133
1,1,17
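The counts above can be turned into simple agreement measures. The sketch below hard-codes the four counts from the `value_counts()` output and computes the Jaccard overlap of the two flagged sets, along with the overlap that would be expected if the two models flagged transactions independently of each other. The variable names are illustrative.

```python
# Counts copied from the value_counts() output above.
both = 17       # flagged by Isolation Forest and LOF
if_only = 133   # flagged by Isolation Forest only
lof_only = 133  # flagged by LOF only
neither = 9717  # flagged by neither model
total = both + if_only + lof_only + neither

# Jaccard overlap: shared flags divided by all transactions flagged by either model.
jaccard = both / (both + if_only + lof_only)

# Overlap expected if the two models flagged independently.
if_rate = (both + if_only) / total
lof_rate = (both + lof_only) / total
expected_by_chance = if_rate * lof_rate * total
lift = both / expected_by_chance

print(f"Jaccard overlap: {jaccard:.3f}")
print(f"Expected overlap by chance: {expected_by_chance:.2f}")
print(f"Observed / expected: {lift:.1f}x")
```

The two models share 17 of the 283 transactions flagged by either one, a Jaccard overlap of about 0.06. That is several times the 2.25 expected by chance, yet most flags are unique to one model, which is consistent with the global and local perspectives described in the introduction.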
