✅ What is Anomaly Detection?

Anomaly detection is the process of identifying rare or abnormal patterns in data that deviate significantly from expected behavior. In credit card fraud detection, anomalies are transactions that differ from normal spending patterns. Such transactions are usually rare but very important to detect.

✅ What is an Autoencoder?

An autoencoder is an unsupervised neural network that learns to compress data into a smaller representation (encoding) and then reconstruct it back to the original form (decoding). It learns patterns of normal data, so when abnormal data is passed, it fails to reconstruct well. This reconstruction error is used to detect anomalies.

✅ Why use Autoencoder for Anomaly Detection?

Autoencoders trained on normal data learn its regular patterns and reconstruct them accurately. Fraudulent or abnormal samples tend to show higher reconstruction error because they do not match the learned patterns. This makes autoencoders a good fit for fraud detection, where anomalies are rare.

✅ Encoder

The encoder part compresses the input data into a smaller latent representation. It captures the most important features of normal transactions. This helps the model learn essential behavior and ignore noise.

✅ Latent Representation

Latent representation is the compressed encoded form of input data learned by the autoencoder. It contains hidden important features extracted by the encoder. This bottleneck forces the network to learn meaningful patterns.

✅ Decoder

The decoder reconstructs the original input data from the latent representation. It tries to produce output similar to input. Poor reconstruction indicates abnormal transaction patterns.

✅ Reconstruction Error

Reconstruction error is the difference between the original input and the output generated by the autoencoder. High error suggests the sample is an anomaly or fraud. This is used as a scoring measure to flag frauds.
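The idea can be illustrated without a neural network. The sketch below is only an illustration on synthetic data: it uses PCA (a linear stand-in for the encoder and decoder, not the Keras model trained later) fitted on "normal" samples only, and compares the per-sample reconstruction error of normal and anomalous points. All data and names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "normal" data: 5 features driven by 2 hidden factors plus small noise.
factors = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 5))
x_normal = factors @ mixing + 0.05 * rng.normal(size=(1000, 5))

# Synthetic "anomalies": points that ignore the 2-factor structure.
x_anomaly = 3.0 * rng.normal(size=(20, 5))

# Fit a linear encoder/decoder (PCA) on normal data only.
mean = x_normal.mean(axis=0)
_, _, vt = np.linalg.svd(x_normal - mean, full_matrices=False)
components = vt[:2]  # 2-dimensional bottleneck


def reconstruction_error(x):
    latent = (x - mean) @ components.T   # encoder: 5 features -> 2 latent values
    x_hat = latent @ components + mean   # decoder: 2 latent values -> 5 features
    return np.mean(np.square(x - x_hat), axis=1)  # per-sample MSE


err_normal = reconstruction_error(x_normal)
err_anomaly = reconstruction_error(x_anomaly)
print(f"mean error (normal):  {err_normal.mean():.4f}")
print(f"mean error (anomaly): {err_anomaly.mean():.4f}")
```

The anomalies do not lie near the 2-dimensional structure learned from normal data, so their error is much larger. A nonlinear autoencoder relies on the same effect.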

✅ Why use Credit Card Dataset?

The credit card dataset contains real-world transactions with a severe class imbalance: 492 frauds among 284,807 transactions (about 0.17%). This makes it a natural test bed for anomaly detection, since the autoencoder can be trained on the abundant normal transactions alone.

✅ Optimizer

An optimizer like Adam adjusts the network weights during training to reduce the loss. Adam is widely used because it keeps running averages of each parameter's gradient and squared gradient and uses them to scale the step size per parameter, which often speeds up convergence.
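To make the update rule concrete, here is a minimal NumPy sketch of Adam minimizing a simple quadratic. It is not the Keras implementation. The function name `adam_minimize`, the toy loss, and the hyperparameter values are assumptions chosen for illustration.

```python
import numpy as np


def adam_minimize(grad_fn, w, steps=500, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Run a fixed number of Adam steps starting from w (hypothetical helper)."""
    m = np.zeros_like(w)  # running average of gradients
    v = np.zeros_like(w)  # running average of squared gradients
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # per-parameter step size
    return w


target = np.array([1.0, -2.0, 0.5])


def loss(w):
    return float(np.sum((w - target) ** 2))


def grad(w):
    return 2.0 * (w - target)


w0 = np.zeros(3)
w_final = adam_minimize(grad, w0)
print(f"loss before: {loss(w0):.4f}, loss after: {loss(w_final):.6f}")
```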

✅ Loss Function

A reconstruction loss (like Mean Squared Error) measures how well the model rebuilds input data. Lower loss means good reconstruction of normal data. Higher loss for fraud data helps detect anomalies.
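Written out for a single sample $x$ with $d$ features and its reconstruction $\hat{x}$ (notation introduced here for illustration), the per-sample Mean Squared Error is:

$$\mathrm{MSE}(x, \hat{x}) = \frac{1}{d}\sum_{j=1}^{d}\left(x_j - \hat{x}_j\right)^2$$

This is the quantity the code computes per row with `np.mean(np.square(x_scaled - reconstructions), axis=1)`.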

✅ Evaluation Metrics

Metrics like Precision, Recall, F1-Score, and ROC-AUC are used to evaluate anomaly models. Recall is especially important because missing a fraud transaction is more harmful than a false alarm.
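Unlike precision and recall, ROC-AUC does not depend on a chosen threshold. It only asks whether anomalies receive higher scores than normal samples. The sketch below computes it by direct pairwise comparison, which is fine for small arrays. The helper name `roc_auc` and the toy scores are hypothetical. In practice `sklearn.metrics.roc_auc_score` would normally be used.

```python
import numpy as np


def roc_auc(scores, labels):
    """Probability that a random positive scores higher than a random negative."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive with every negative; ties count as half.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (pos.size * neg.size)


# Toy reconstruction errors: the two anomalies (label 1) have the largest errors.
scores = [0.05, 0.10, 0.20, 0.90, 0.15, 1.50]
labels = [0, 0, 0, 1, 0, 1]
print(roc_auc(scores, labels))  # 1.0
```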

✅ Imbalanced Dataset Problem

The credit card fraud dataset is highly imbalanced: fraud cases are extremely rare. Classifiers trained directly on such data tend to favor the majority class, whereas an autoencoder sidesteps the imbalance by learning only normal behavior patterns.

✅ Threshold for Anomaly

A threshold value of reconstruction error is chosen to label a transaction as fraud. If error > threshold, transaction = anomaly. Threshold tuning affects performance and must be chosen carefully.
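The effect of threshold tuning can be sketched on synthetic data. The reconstruction errors below are hypothetical (drawn from gamma distributions), not taken from the credit card model. The point is only that a higher percentile threshold flags fewer samples, so recall can only stay the same or drop.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical reconstruction errors: small for normal samples, larger for anomalies.
normal_err = rng.gamma(shape=2.0, scale=0.1, size=10_000)
fraud_err = rng.gamma(shape=2.0, scale=1.0, size=50)

errors = np.concatenate([normal_err, fraud_err])
labels = np.concatenate([
    np.zeros(normal_err.size, dtype=int),
    np.ones(fraud_err.size, dtype=int),
])

results = []
for pct in (90, 95, 99):
    threshold = np.percentile(normal_err, pct)  # threshold from NORMAL errors only
    flagged = errors > threshold
    tp = np.sum(flagged & (labels == 1))
    fp = np.sum(flagged & (labels == 0))
    fn = np.sum(~flagged & (labels == 1))
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn)
    results.append((pct, threshold, precision, recall))
    print(f"p{pct}: threshold={threshold:.3f} precision={precision:.3f} recall={recall:.3f}")
```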

✅ Why Autoencoder instead of Supervised Learning?

In fraud detection, labeled fraud cases are scarce and fraud patterns change constantly. Autoencoders do not require labeled fraud examples; they learn from normal data only. This makes them well suited to fraud patterns that keep changing.

✅ Overfitting in Autoencoder

If the autoencoder has too much capacity, it may also reconstruct anomalies well. To prevent this, we use dropout, limit the network size, and validate the model on normal data only.

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
import matplotlib.pyplot as plt

from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.layers import Dense,Input

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import confusion_matrix,accuracy_score,precision_score,recall_score


In [None]:
df=pd.read_csv('creditcard.csv')
x=df.drop(["Time","Class"],axis=1)
y=df['Class']

sc=StandardScaler()
x_scaled=sc.fit_transform(x)
input_dim=x_scaled.shape[1] #Get number of input features
x_normal=x_scaled[y==0]  #Keep only normal transactions for training

x_train,x_val=train_test_split(x_normal,test_size=0.20,random_state=42)
print(input_dim)

29


In [None]:
input_layer=Input(shape=(input_dim,))
encoder=Dense(24,activation='relu')(input_layer)
latent=Dense(14,activation='relu')(encoder)

decoder=Dense(24,activation='relu')(latent)
output_layer=Dense(input_dim,activation='linear')(decoder)

autoencoder=Model(inputs=input_layer,outputs=output_layer)
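# MSE is the training objective; the logged 'accuracy' is not a meaningful measure of reconstruction quality
autoencoder.compile(optimizer=Adam(learning_rate=0.001),loss='mse',metrics=['accuracy'])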
autoencoder.summary()

In [None]:
print("\nStarting Autoencoder model training...")
h_auto=autoencoder.fit(x_train,x_train,batch_size=128,epochs=20,verbose=1,shuffle=True,validation_data=(x_val, x_val))
print("Autoencoder model training complete.")


Starting Autoencoder model training...
Epoch 1/20
1777/1777 ━━━━━━━━━━━━━━━━━━━━ 8s 4ms/step - accuracy: 0.2575 - loss: 0.6789 - val_accuracy: 0.4596 - val_loss: 0.3539
Epoch 2/20
1777/1777 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.4643 - loss: 0.3272 - val_accuracy: 0.4884 - val_loss: 0.3004
Epoch 3/20
1777/1777 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.5003 - loss: 0.2741 - val_accuracy: 0.5106 - val_loss: 0.2732
Epoch 4/20
1777/1777 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.5151 - loss: 0.2592 - val_accuracy: 0.5268 - val_loss: 0.2648
Epoch 5/20
1777/1777 ━━━━━━━━━━━━━━━━━━━━ 5s 3ms/step - accuracy: 0.5277 - loss: 0.2460 - val_accuracy: 0.5316 - val_loss: 0.2518
Epoch 6/20
1777/1777 ━━━━━━━━━━━━━━━━━━━━ 6s 3ms/step - accuracy: 0.5421 - loss: 0.2380 - val_accuracy: 0.

In [None]:
reconstructions = autoencoder.predict(x_scaled)

# Calculate the Mean Squared Error (MSE) for each transaction
mse = np.mean(np.square(x_scaled - reconstructions), axis=1)

# Store results in a DataFrame for easy analysis
error_df = pd.DataFrame({
    'Reconstruction_Error': mse,
    'True_Class': y
})

[1m8901/8901[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m11s[0m 1ms/step


In [None]:
fraud_error=error_df[error_df['True_Class']==1]
normal_error=error_df[error_df['True_Class']==0]

print(fraud_error.tail())
print("\n")
print(normal_error.tail())

        Reconstruction_Error  True_Class
279863              5.735359           1
280143              3.038280           1
280149              3.036277           1
281144              5.677252           1
281674              0.048213           1


        Reconstruction_Error  True_Class
284802              0.310502           0
284803              0.231006           0
284804              0.056988           0
284805              0.150966           0
284806              0.137831           0


In [None]:
# Extract the normal (non-fraudulent) reconstruction errors
normal_error = error_df[error_df['True_Class'] == 0].Reconstruction_Error

# 1. Set Anomaly Threshold
# Use the 95th percentile of the reconstruction error from NORMAL transactions
THRESHOLD = np.percentile(normal_error, 95)
print(f"\nCalculated Anomaly Threshold: {THRESHOLD:.6f}")


# 2. Predict anomalies for the entire dataset
# The prediction is TRUE (1 or Fraud) if the error is above the threshold
predicted_anomalies = error_df['Reconstruction_Error'] > THRESHOLD



Calculated Anomaly Threshold: 0.473653


In [None]:
print("\nConfusion Matrix")
print(confusion_matrix(error_df['True_Class'], predicted_anomalies))


Confusion Matrix
[[270099  14216]
 [    79    413]]


In [None]:

# Calculate and print Precision for the minority class (pos_label=1)
precision = precision_score(error_df['True_Class'], predicted_anomalies, pos_label=1)
print(f"Precision: {100*precision:.2f}%")

# Calculate and print Recall for the minority class (pos_label=1)
recall = recall_score(error_df['True_Class'], predicted_anomalies, pos_label=1)
print(f"Recall: {100*recall:.2f}%")

Precision: 2.82%
Recall: 83.94%
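As a cross-check, the same numbers can be recomputed by hand from the confusion matrix printed above. The sketch below copies the four counts from that output and also derives the F1-score and false positive rate, which the notebook does not print. The variable names are chosen here for illustration.

```python
# Counts copied from the confusion matrix above (rows = true class, columns = predicted).
tn, fp = 270099, 14216   # true class 0 (normal)
fn, tp = 79, 413         # true class 1 (fraud)

precision = tp / (tp + fp)            # share of flagged transactions that are fraud
recall = tp / (tp + fn)               # share of frauds that were flagged
f1 = 2 * precision * recall / (precision + recall)
false_positive_rate = fp / (fp + tn)  # share of normal transactions flagged

print(f"Precision: {100 * precision:.2f}%")  # Precision: 2.82%
print(f"Recall: {100 * recall:.2f}%")        # Recall: 83.94%
print(f"F1-score: {100 * f1:.2f}%")
print(f"False positive rate: {100 * false_positive_rate:.2f}%")
```

The false positive rate comes out close to 5%, as expected when the threshold is the 95th percentile of the normal errors. Because normal transactions vastly outnumber frauds, that 5% is enough to push precision very low even though recall is high.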


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input,Dense
from tensorflow.keras.optimizers import Adam

from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score,confusion_matrix,precision_score,recall_score
from sklearn.model_selection import train_test_split

In [None]:
df=pd.read_csv('ecg_autoencoder_dataset.csv',header=None)

x=df.iloc[:,:-1]
y=df.iloc[:,-1]

sc=StandardScaler()
x_scale=sc.fit_transform(x)
input_dim=x_scale.shape[1]

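x_normal=x_scale[y==1]  # In this dataset, label 1 marks normal heartbeats and label 0 abnormal ones
x_train,x_val=train_test_split(x_normal,random_state=42,test_size=0.20)  # 80% training, 20% validation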

print(f"Data ready. Input dimension: {input_dim} features.")
print(f"Training Autoencoder on {x_normal.shape[0]} normal heartbeats.")

Data ready. Input dimension: 140 features.
Training Autoencoder on 2919 normal heartbeats.


In [None]:

LATENT_DIM = 70     # Bottleneck size (140 / 2)
INTERMEDIATE_DIM = 120


# Input Layer
input_layer=Input(shape=(input_dim,))
# Compressed Layer 1
encoder=Dense(INTERMEDIATE_DIM,activation='relu')(input_layer)
# Latent Representation (Bottleneck)
latent=Dense(LATENT_DIM,activation='relu')(encoder)


# Decompressed Layer 1 (symmetrical to the first encoder layer)
decoder=Dense(INTERMEDIATE_DIM,activation='relu')(latent)
# Output Layer (must match the input dimension)
output_layer=Dense(input_dim,activation='linear')(decoder)

# Create the Full Autoencoder Model
autoencoder=Model(inputs=input_layer,outputs=output_layer)
autoencoder.compile(optimizer=Adam(learning_rate=0.001),metrics=['accuracy'],loss='mse') # MSE is the reconstruction loss; the logged 'accuracy' is not meaningful for reconstruction
autoencoder.summary()

In [None]:
# Note that the input and target are identical (x_train, x_train), as the goal is self-reconstruction.
# Train on normal heartbeats only, so that abnormal ones reconstruct poorly.

print("\nStarting Autoencoder model training...")

h_auto=autoencoder.fit(x_train,x_train,batch_size=128,epochs=20,verbose=1,shuffle=True,validation_data=(x_val,x_val))

print("Autoencoder model training complete.")

Epoch 1/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 2s 13ms/step - accuracy: 0.0197 - loss: 0.8186 - val_accuracy: 0.1109 - val_loss: 0.2519
Epoch 2/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.0960 - loss: 0.2652 - val_accuracy: 0.1879 - val_loss: 0.1439
Epoch 3/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.1537 - loss: 0.1730 - val_accuracy: 0.2307 - val_loss: 0.1093
Epoch 4/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - accuracy: 0.1847 - loss: 0.1260 - val_accuracy: 0.2453 - val_loss: 0.0915
Epoch 5/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.2027 - loss: 0.1116 - val_accuracy: 0.2684 - val_loss: 0.0792
Epoch 6/20
40/40 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - accuracy: 0.2295 - loss: 0.0919 - val_accuracy: 0.2748 - val_loss: 0.0746
Epoch 7/20
40/40 ━━━━━━━━━

Our autoencoder model predicts the features (not the target) from the features themselves: it tries to reconstruct the input values as they are.

Reconstruction errors are low (close to 0) when the model reconstructs the features of a normal heartbeat, because it is familiar with these patterns (we train the model only on normal data).

Abnormal heartbeats have a larger reconstruction error because the model is not familiar with these patterns (they are like 'out of syllabus' questions).




In [None]:
reconstruction=autoencoder.predict(x_scale)

mse=np.mean(np.square(x_scale-reconstruction),axis=1)

error_df=pd.DataFrame({
    'Reconstuction_Error':mse,
    'True_Class':y
})

[1m157/157[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 2ms/step


We find a **THRESHOLD** value against which all reconstruction errors are compared.

This is the **value which is greater than 95% of the error values** of all **normal** heartbeats.

This also means that every heartbeat with **error > THRESHOLD** will be considered **ABNORMAL** (including about 5% of the normal heartbeats).

In [None]:
# Extract the reconstruction errors of the normal heartbeats (True_Class == 1)
normal_error=error_df[error_df['True_Class']== 1.0 ].Reconstuction_Error

# 1. Set Anomaly Threshold
# Use the 95th percentile of the reconstruction error from NORMAL heartbeats
threshold=np.percentile(normal_error,95)

# 2. Predict labels for the entire dataset
# A heartbeat is labelled abnormal (0) if its error is above the threshold, otherwise normal (1)
is_anomaly=error_df['Reconstuction_Error'] > threshold
predicted_anomalies=np.where(is_anomaly,0.0,1.0)

In [None]:
abnormal_errors = error_df[error_df['True_Class'] == 0]  # label 0 = abnormal heartbeat
normal_errors = error_df[error_df['True_Class'] == 1]    # label 1 = normal heartbeat

print(abnormal_errors.tail())
print("\n")
print(normal_errors.tail())

In [None]:
print("\nConfusion Matrix")
print(confusion_matrix(error_df['True_Class'], predicted_anomalies))

[[1975  104]
 [2768  151]]


In [None]:
# Calculate and print Precision for the minority class (pos_label=0)
precision = precision_score(error_df['True_Class'], predicted_anomalies, pos_label=0.0)
print(f"Precision: {100*precision:.2f}%")

# Calculate and print Recall for the minority class (pos_label=0)
recall = recall_score(error_df['True_Class'], predicted_anomalies, pos_label=0.0)
print(f"Recall: {100*recall:.2f}%")

Here, the **main evaluation metric is Recall** and not Precision.

High Recall indicates that a larger share of the abnormal heartbeats have been correctly flagged, which is the main goal.