# 💳 Transaction Fraud Detection using Logistic Regression

This notebook builds a Logistic Regression model to detect fraudulent financial transactions using engineered features such as transaction type indicators and balance differences.


In [21]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


## 📂 Load and Inspect the Dataset

We load the transaction dataset and inspect its structure, data types, and summary statistics.


In [22]:
transactions = pd.read_csv('transactions_modified.csv')
transactions.head()
transactions.info()
transactions['amount'].describe()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   step            1000 non-null   int64  
 1   type            1000 non-null   object 
 2   amount          1000 non-null   float64
 3   nameOrig        1000 non-null   object 
 4   oldbalanceOrg   1000 non-null   float64
 5   newbalanceOrig  1000 non-null   float64
 6   nameDest        1000 non-null   object 
 7   oldbalanceDest  1000 non-null   float64
 8   newbalanceDest  1000 non-null   float64
 9   isFraud         1000 non-null   int64  
 10  isPayment       1000 non-null   int64  
 11  isMovement      1000 non-null   int64  
 12  accountDiff     1000 non-null   float64
dtypes: float64(6), int64(4), object(3)
memory usage: 101.7+ KB


count    1.000000e+03
mean     5.373080e+05
std      1.423692e+06
min      0.000000e+00
25%      2.933705e+04
50%      1.265305e+05
75%      3.010378e+05
max      1.000000e+07
Name: amount, dtype: float64
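
Since the target is `isFraud`, it can also help to check how the labels are distributed before modeling, because an accuracy figure is only meaningful relative to the share of the majority class. A minimal sketch, using a small hypothetical frame (`toy`) in place of the real CSV:

```python
import pandas as pd

# Hypothetical stand-in for the real data; only the isFraud column matters here.
toy = pd.DataFrame({"isFraud": [0, 0, 0, 1, 0, 1, 0, 0, 0, 0]})

# Share of each class; normalize=True returns proportions instead of counts.
class_share = toy["isFraud"].value_counts(normalize=True)
print(class_share)

# Accuracy of a trivial model that always predicts the most common class.
majority_baseline = class_share.max()
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

On the notebook's data, the equivalent call would be `transactions['isFraud'].value_counts(normalize=True)`.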

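## 🛠️ Feature Engineering

We create three features to capture transaction behavior:
- **isPayment** → 1 if the transaction type is PAYMENT or DEBIT, else 0
- **isMovement** → 1 if the transaction type is CASH_OUT or PAYMENT, else 0
- **accountDiff** → Absolute difference between the sender's and receiver's balances before the transaction (`oldbalanceOrg` and `oldbalanceDest`)

Note that PAYMENT transactions set both flags. The `info()` output above shows these three columns already present in the loaded file, so the cell below recomputes and overwrites them.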


In [23]:
transactions['isPayment'] = transactions['type'].isin(['PAYMENT', 'DEBIT']).astype(int)

transactions['isMovement'] = transactions['type'].isin(['CASH_OUT', 'PAYMENT']).astype(int)

transactions['accountDiff'] = abs(
    transactions['oldbalanceOrg'] - transactions['oldbalanceDest']
)

transactions[['isPayment', 'isMovement', 'accountDiff']].head()


   isPayment  isMovement  accountDiff
0          0           1    649420.67
1          1           1         0.00
2          0           1    818679.85
3          0           1      6224.42
4          0           0   5542581.85
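
To see how the two flags relate to the raw `type` values, the same `isin` logic can be applied to a frame with one row per transaction type. This is a sketch on a hypothetical frame (`toy`); `TRANSFER` and `CASH_IN` are assumed type names that the notebook itself does not confirm.

```python
import pandas as pd

# Hypothetical frame with one row per transaction type.
# TRANSFER and CASH_IN are assumed type names, not confirmed by the notebook.
toy = pd.DataFrame({"type": ["PAYMENT", "DEBIT", "CASH_OUT", "TRANSFER", "CASH_IN"]})

# Same flag definitions as in the feature engineering cell.
toy["isPayment"] = toy["type"].isin(["PAYMENT", "DEBIT"]).astype(int)
toy["isMovement"] = toy["type"].isin(["CASH_OUT", "PAYMENT"]).astype(int)

# Rows where both flags are set show where the two features overlap.
both = toy.loc[(toy["isPayment"] == 1) & (toy["isMovement"] == 1), "type"].tolist()
print(toy)
print("Types with both flags set:", both)
```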


## 🎯 Feature Selection and Target Variable

In [24]:
features = transactions[['amount', 'isPayment', 'isMovement', 'accountDiff']]
label = transactions['isFraud']


## 🔀 Train–Test Split

We split the data into training (70%) and testing (30%) sets.


In [25]:
X_train, X_test, y_train, y_test = train_test_split(
    features, label, test_size=0.3, random_state=42
)
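
If fraud cases are rare, a plain random split can leave the test set with a different fraud rate than the training set. One common option is the `stratify` argument of `train_test_split`, which keeps class proportions the same in both parts. A sketch on hypothetical data (`X_toy`, `y_toy`), not the notebook's own split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 100 rows, 20% positive class.
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([1] * 20 + [0] * 80)

# stratify=y_toy asks for the same class proportions in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42, stratify=y_toy
)

# Positive-class rate in each part.
print(y_tr.mean(), y_te.mean())
```

In the notebook this would amount to adding `stratify=label` to the existing call.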

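## 📊 Feature Scaling

Standardization rescales each feature to zero mean and unit variance, so that features with large raw values such as `amount` do not dominate the regularized fit purely because of their units. The scaler is fit on the training set only and then applied to the test set, which keeps test-set statistics out of training.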


In [26]:
scaler = StandardScaler()

X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
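
What `StandardScaler` computes can be written out by hand: subtract the training mean, divide by the training standard deviation, and reuse those same training statistics for the test rows. A NumPy sketch on hypothetical values:

```python
import numpy as np

# Hypothetical feature matrices; the two columns could stand for
# amount and accountDiff.
train = np.array([[100.0, 1.0], [200.0, 3.0], [300.0, 5.0]])
test = np.array([[400.0, 7.0]])

# Statistics come from the training rows only. np.std defaults to the
# population standard deviation (ddof=0), which StandardScaler also uses.
mean = train.mean(axis=0)
std = train.std(axis=0)

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuses the training statistics

print(train_scaled.mean(axis=0))  # approximately 0 per column
print(train_scaled.std(axis=0))   # approximately 1 per column
print(test_scaled)
```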


## 🤖 Train Logistic Regression Model

In [27]:
model = LogisticRegression()

model.fit(X_train_scaled, y_train)
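
For reference, a fitted binary logistic regression estimates the fraud probability of a scaled feature vector through the logistic function, where $\mathbf{w}$ corresponds to `model.coef_[0]` and $b$ to `model.intercept_[0]`:

```latex
P(\text{isFraud} = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b),
\qquad
\sigma(z) = \frac{1}{1 + e^{-z}}
```

With default arguments, scikit-learn's `LogisticRegression` applies L2 regularization (`C=1.0`), so the size of each fitted coefficient depends on the scale of its feature.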


## 📈 Model Evaluation

We evaluate model accuracy on both training and test datasets.

In [28]:
print("Training Accuracy:", model.score(X_train_scaled, y_train))
print("Testing Accuracy:", model.score(X_test_scaled, y_test))


Training Accuracy: 0.8285714285714286
Testing Accuracy: 0.84
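
Accuracy alone does not say how many fraudulent transactions were actually caught. A confusion matrix, together with precision and recall for the fraud class, gives a fuller picture. The sketch below uses hypothetical labels (`y_true`, `y_pred`), not the notebook's results; on the real data the inputs would be `y_test` and `model.predict(X_test_scaled)`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical ground truth and predictions (1 = fraud).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 0, 0, 0])

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
accuracy = (y_true == y_pred).mean()

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this toy case accuracy is 0.60 while recall is only 0.25, which illustrates how the two measures can diverge.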


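## 🔍 Model Coefficients

Each coefficient is the change in the log-odds of fraud for a one-standard-deviation increase in the corresponding feature, since the model was fit on standardized inputs. The sign gives the direction of the association and the magnitude its strength.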


In [29]:
for feature, coef in zip(features.columns, model.coef_[0]):
    print(f"{feature}: {coef}")

amount: 2.7637196408323277
isPayment: -1.7673805313302933
isMovement: -0.08442369672393296
accountDiff: -1.2088536992947363
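
Exponentiating a logistic regression coefficient gives an odds ratio, which is often easier to read than a log-odds value. The sketch below copies the coefficients printed above (rounded) into a hypothetical dict `coefs` and ranks the features by absolute size:

```python
import math

# Coefficients copied (rounded) from the output above.
coefs = {
    "amount": 2.7637,
    "isPayment": -1.7674,
    "isMovement": -0.0844,
    "accountDiff": -1.2089,
}

# exp(coef) is the multiplicative change in the odds of fraud
# per one-standard-deviation increase in the scaled feature.
odds_ratios = {name: math.exp(c) for name, c in coefs.items()}

# Sort by absolute coefficient to rank features by strength of association.
ranking = sorted(coefs, key=lambda n: abs(coefs[n]), reverse=True)
for name in ranking:
    print(f"{name}: coef={coefs[name]:+.4f}, odds ratio={odds_ratios[name]:.3f}")
```

An odds ratio above 1 means the feature raises the odds of fraud; a value below 1 means it lowers them.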


## 🔮 Fraud Prediction on New Transactions

We test the trained model on new sample transaction data.


In [30]:
transaction1 = np.array([123456.78, 0.0, 1.0, 54670.1])
transaction2 = np.array([98765.43, 1.0, 0.0, 8524.75])
transaction3 = np.array([543678.31, 1.0, 0.0, 510025.5])

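# Wrap the samples in a DataFrame with the training column names so that
# scaler.transform does not warn about missing feature names.
sample_transactions = pd.DataFrame(
    np.stack((transaction1, transaction2, transaction3)),
    columns=features.columns,
)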

In [31]:
sample_transactions_scaled = scaler.transform(sample_transactions)
print(model.predict(sample_transactions_scaled))
model.predict_proba(sample_transactions_scaled)


[0 0 0]




array([[0.7079163 , 0.2920837 ],
       [0.99324567, 0.00675433],
       [0.98673395, 0.01326605]])
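
For a binary model, `predict` labels a transaction as fraud only when the second column of `predict_proba` (the fraud probability) is above 0.5, which is why all three samples come out as 0 even though the first has a fraud probability of about 0.29. If missing a fraud is costlier than a false alarm, a lower cutoff can be applied to the probabilities directly. A sketch using the probabilities printed above and a hypothetical cutoff of 0.25:

```python
import numpy as np

# Fraud probabilities (second column of predict_proba), copied from the output above.
fraud_proba = np.array([0.2920837, 0.00675433, 0.01326605])

# Default decision rule: fraud if probability is above 0.5.
default_labels = (fraud_proba > 0.5).astype(int)

# Hypothetical lower cutoff; not tuned on any data.
threshold = 0.25
custom_labels = (fraud_proba >= threshold).astype(int)

print(default_labels)  # [0 0 0]
print(custom_labels)   # [1 0 0]
```

A real cutoff would normally be chosen on held-out data by weighing precision against recall.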

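## ✅ Conclusion

- Three engineered features (`isPayment`, `isMovement`, `accountDiff`) were used alongside `amount` as model inputs; the notebook does not compare against a model without them, so their benefit is not measured here
- Logistic Regression reached about 83% training accuracy and 84% test accuracy; accuracy alone does not show how many fraud cases were caught
- `amount` has the largest coefficient (about 2.76), followed by `isPayment` (-1.77) and `accountDiff` (-1.21), both negative; `isMovement` (-0.08) has little influence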

This notebook demonstrates an end-to-end machine learning workflow for fraud detection.
