# Members:
Bhumika S Mandikel              - PES1UG22AM043

Arnitha Sathish                 - PES1UG22AM030

Athmica Kishore K               - PES1UG22AM036

Anvitha Anand                   - PES1UG22AM029

# Reptile Meta-Learning Algorithm for Financial Fraud Detection

This notebook implements the Reptile meta-learning algorithm to train a model that adapts quickly to new binary classification tasks, applied here to financial fraud detection. It includes:
- Dataset preprocessing and handling class imbalance.
- Meta-learning using the Reptile algorithm.
- Evaluation of the meta-learned model on a new task.

The key idea is to train a meta-model that learns an initialization capable of fast adaptation to new tasks.


Installing the required libraries

In [None]:
!pip install numpy pandas scikit-learn tensorflow




## Importing Libraries

We import essential libraries for:
- Data manipulation (`numpy`, `pandas`).
- Model building (`tensorflow`).
- Data preprocessing and evaluation (`sklearn`).


In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import classification_report
from sklearn.utils import resample
import time

## Loading and Preprocessing Data

The `load_data` function:
1. Loads a dataset from a given path.
2. Handles missing values by filling them with zeros.
3. Encodes categorical target variables into numeric form if necessary.
4. Addresses class imbalance by oversampling the minority class with a controlled ratio.

Oversampling keeps the model from becoming biased toward the majority class.
Each dataset is limited to its first 800 rows for training.
This illustrates how meta-learning is particularly helpful when only a small amount of data is available.


In [None]:
# Function to load and preprocess datasets
def load_data(path, oversample_ratio=0.5):
    # Load your dataset
    data = pd.read_csv(path)
    data = data[:800]  # Subset data for faster experimentation

    # Handle NaN values
    if data.isnull().values.any():
        print(f"Warning: NaN values found in {path}. Filling with zeros.")
        data = data.fillna(0)

    # Split into features (X) and target (y)
    X = data.iloc[:, :-1].astype(np.float32).values
    y = data.iloc[:, -1]

    # Encode target variable if categorical
    if y.dtype == 'object':
        le = LabelEncoder()
        y = le.fit_transform(y)
    else:
        y = y.astype(np.float32).values

    # Handle class imbalance through controlled oversampling
    combined = pd.DataFrame(X)
    combined['target'] = y

    majority_class = combined[combined['target'] == 0]
    minority_class = combined[combined['target'] == 1]

    minority_oversampled = resample(
        minority_class,
        replace=True,
        n_samples=int(len(minority_class) + oversample_ratio * len(majority_class)),
        random_state=42
    )

    balanced_data = pd.concat([majority_class, minority_oversampled], axis=0)
    balanced_data = balanced_data.sample(frac=1, random_state=42).reset_index(drop=True)

    X = balanced_data.iloc[:, :-1].values
    y = balanced_data['target'].values

    return X, y
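
To make the controlled oversampling in `load_data` concrete, the sketch below applies the same arithmetic to a toy frame. The helper name `oversample_minority` and the 90/10 toy split are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.utils import resample


def oversample_minority(df, target_col="target", oversample_ratio=0.5, seed=42):
    """Hypothetical helper mirroring the oversampling step in load_data."""
    majority = df[df[target_col] == 0]
    minority = df[df[target_col] == 1]
    # Same formula as load_data: original minority size plus a fraction of the majority size
    n_samples = int(len(minority) + oversample_ratio * len(majority))
    minority_up = resample(minority, replace=True, n_samples=n_samples, random_state=seed)
    combined = pd.concat([majority, minority_up], axis=0)
    # Shuffle so the classes are interleaved
    return combined.sample(frac=1, random_state=seed).reset_index(drop=True)


# Toy data: 90 majority rows (target 0) and 10 minority rows (target 1)
toy = pd.DataFrame({
    "feature": np.arange(100, dtype=np.float32),
    "target": [0] * 90 + [1] * 10,
})
balanced = oversample_minority(toy)
counts = balanced["target"].value_counts().to_dict()
print(counts)  # minority grows to int(10 + 0.5 * 90) = 55 rows; majority stays at 90
```

With the default ratio of 0.5 the classes end up closer in size but not equal, which is what "controlled" means here.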

## Creating a Simple Neural Network Model

The `create_model` function defines a basic feedforward neural network:
- 2 hidden layers with 64 and 32 neurons respectively.
- `ReLU` activation for non-linearity.
- A final layer with a sigmoid activation for binary classification.

The model uses the Adam optimizer and binary cross-entropy loss.


In [None]:
# Create a simple neural network model
def create_model(input_shape):
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(input_shape,)),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    return model
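
As a quick sanity check on the size of this network, the sketch below counts its trainable parameters by hand, assuming the 32 input features used by the datasets in this notebook. The helper `dense_param_count` is a hypothetical name introduced only for this illustration.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by each layer's width.
    """
    return sum(
        n_in * n_out + n_out  # weight matrix plus one bias per output unit
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


# 32 inputs -> Dense(64) -> Dense(32) -> Dense(1)
total_params = dense_param_count([32, 64, 32, 1])
print(total_params)  # 2112 + 2080 + 33 = 4225
```

A few thousand parameters is small enough that each task trains in seconds, which fits the small datasets used here.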

## Training a Model on a Single Task

The `train_on_task` function:
1. Trains the model on a given dataset for a specified number of epochs (10 by default).
2. Records and returns the final training accuracy and training time.

This function is essential for evaluating performance on individual tasks.


In [None]:
# Function to train a model on a single task
def train_on_task(model, X_train, y_train, X_val, y_val, epochs=10):
    start_time = time.time()
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val), epochs=epochs, verbose=0)
    training_time = time.time() - start_time
    return history.history['accuracy'][-1], training_time

## Reptile Algorithm for Meta-Learning

The `reptile` function:
1. Initializes a meta-model to learn shared parameters across tasks.
2. For each task:
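   - Trains a task-specific model for `num_inner_updates` rounds of `epochs` epochs each.
   - Moves the meta-model's parameters toward the task model's weights with a Reptile-style update, using a step size of 1/k for the k-th task.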

The meta-model learns an initialization that allows rapid adaptation to new tasks.
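
The Reptile update as it is commonly stated moves the meta-parameters a fraction of the way toward the parameters obtained after a few training steps on one task. The NumPy sketch below shows that rule on a toy problem where each "task" is a quadratic loss with its own optimum. The function names, the toy loss, and the learning rates are all assumptions made for illustration, and this is not the notebook's Keras implementation.

```python
import numpy as np


def inner_sgd(weights, target, lr=0.1, steps=5):
    """Run a few gradient steps on the toy loss 0.5 * ||w - target||^2."""
    w = weights.copy()
    for _ in range(steps):
        grad = w - target  # gradient of the toy quadratic loss
        w = w - lr * grad
    return w


def reptile_step(meta_weights, task_weights, meta_lr):
    """Move the meta-weights a fraction meta_lr toward the task-adapted weights."""
    return meta_weights + meta_lr * (task_weights - meta_weights)


rng = np.random.default_rng(0)
# Three toy tasks, each defined by the optimum of its loss
task_targets = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]

meta = np.zeros(2)
for _ in range(200):
    target = task_targets[rng.integers(len(task_targets))]
    adapted = inner_sgd(meta, target)  # inner loop starts from the current meta-weights
    meta = reptile_step(meta, adapted, meta_lr=0.1)

print(meta)  # ends up between the task optima rather than at any single one
```

The point of the sketch is that the meta-weights settle somewhere between the task optima, so a few gradient steps are enough to reach any one of them.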


In [None]:
# Reptile algorithm implementation
def reptile(train_data, num_tasks=4, epochs=10, num_inner_updates=5):
    input_shape = train_data[0][0].shape[1]
    meta_model = create_model(input_shape)

    for task_idx in range(num_tasks):
        print(f"Training on task {task_idx + 1}/{num_tasks}")
        X, y = train_data[task_idx]
        X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

        task_model = create_model(input_shape)
        for _ in range(num_inner_updates):
            acc, training_time = train_on_task(task_model, X_train, y_train, X_val, y_val, epochs=epochs)
            print(f"Task {task_idx + 1}: Training accuracy: {acc:.4f}, Time: {training_time:.4f} seconds")

        # Move the meta-weights toward the task weights with step size 1 / (task_idx + 1)
        weights = task_model.get_weights()
        meta_model.set_weights([
            meta + (weight - meta) / (task_idx + 1)
            for meta, weight in zip(meta_model.get_weights(), weights)
        ])

    return meta_model
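
One property of the update in `reptile` is worth seeing: because the step size is `1 / (task_idx + 1)`, the first task overwrites the meta-weights and each later task contributes equally, so the final meta-weights are the plain average of the task models' weights. Note also that `reptile` creates a fresh `task_model` for every task, whereas Reptile as usually described starts each task's training from the current meta-weights. The check below uses random vectors as stand-ins for weights, and `running_mean_update` is a hypothetical name.

```python
import numpy as np


def running_mean_update(meta, task_weights, task_idx):
    """Same arithmetic as the meta-update in reptile, applied to a single array."""
    return meta + (task_weights - meta) / (task_idx + 1)


rng = np.random.default_rng(0)
# Stand-ins for the final weights of four task models
task_weights = [rng.normal(size=3) for _ in range(4)]

meta = rng.normal(size=3)  # arbitrary initial meta-weights
for idx, w in enumerate(task_weights):
    meta = running_mean_update(meta, w, idx)

expected_mean = np.mean(task_weights, axis=0)
print(np.allclose(meta, expected_mean))  # the initial meta-weights no longer matter
```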


## Evaluating the Model

The `evaluate_model_with_report` function:
1. Predicts labels for a test set using the trained model.
2. Generates a classification report to evaluate precision, recall, and F1-score.

This helps us understand the model's performance on unseen data.


In [None]:
# Function to evaluate the model and print a classification report
def evaluate_model_with_report(model, X_test, y_test):
    y_pred = (model.predict(X_test) > 0.5).astype("int32")
    print("Classification Report:\n", classification_report(y_test, y_pred))
    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    return accuracy
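
The 0.5 cut-off in `evaluate_model_with_report` decides how sigmoid outputs become class labels, and it directly trades precision against recall on the fraud class. The sketch below shows that effect on made-up probabilities. The helper `threshold_predictions` and the toy values are assumptions for illustration, not outputs of the notebook's model.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score


def threshold_predictions(probabilities, threshold=0.5):
    """Convert predicted probabilities to 0/1 labels, as the evaluation cell does."""
    return (np.asarray(probabilities) > threshold).astype("int32")


# Toy labels and predicted probabilities (illustrative values only)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
probs = np.array([0.1, 0.2, 0.6, 0.3, 0.9, 0.4, 0.8, 0.2])

results = {}
for threshold in (0.5, 0.3):
    y_pred = threshold_predictions(probs, threshold)
    results[threshold] = (
        precision_score(y_true, y_pred),
        recall_score(y_true, y_pred),
    )
    print(threshold, results[threshold])
```

On this toy data, lowering the threshold from 0.5 to 0.3 raises recall on class 1 from 0.5 to 0.75.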


## Training the Meta-Model

We train the meta-model on multiple datasets (`task_paths`). Each task contributes to learning a shared initialization through the Reptile algorithm.

This step creates a general model capable of adapting to similar tasks.

The datasets chosen have 32 features. We have tried to cover different types of financial fraud in our 4 training datasets so that the model adapts well across the domain. These datasets were reduced to this size from the original datasets using PCA.
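
The reduction step itself is not shown in this notebook. As a rough sketch of how a dataset could be brought down to 32 features with PCA, the example below standardizes random stand-in data and projects it. The function name `reduce_to_n_features`, the 50-column input, and the choice to scale before PCA are all assumptions, not the preprocessing actually used.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def reduce_to_n_features(X, n_components=32, seed=42):
    """Standardize the features, then project onto the top principal components."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components, random_state=seed)
    return pca.fit_transform(X_scaled), pca


rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 50))  # stand-in for a wider raw dataset

X_reduced, pca = reduce_to_n_features(X_raw)
explained = pca.explained_variance_ratio_.sum()
print(X_reduced.shape)  # (200, 32)
print(explained)        # fraction of variance kept by the 32 components
```

Giving every task the same feature count is what allows a single `create_model(input_shape)` architecture to be shared across tasks.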

In [None]:
# Paths to datasets
task_paths = [
    '/content/preprocessed_dataset1.csv',
    '/content/reduced_dataset_32_features.csv',
    '/content/reduced_dataset_creditcard_32_features.csv',
    '/content/reduced_fraud_data_32_features.csv'
]

# Load datasets
train_data = [load_data(path) for path in task_paths]

# Train the meta-model
meta_model = reptile(train_data)


Training on task 1/4
Task 1: Training accuracy: 0.9352, Time: 2.5351 seconds
Task 1: Training accuracy: 0.9947, Time: 1.4642 seconds
Task 1: Training accuracy: 1.0000, Time: 1.3511 seconds
Task 1: Training accuracy: 1.0000, Time: 1.2753 seconds
Task 1: Training accuracy: 1.0000, Time: 1.1904 seconds
Training on task 2/4
Task 2: Training accuracy: 0.9023, Time: 3.6717 seconds
Task 2: Training accuracy: 0.9545, Time: 1.3379 seconds
Task 2: Training accuracy: 0.9670, Time: 1.3146 seconds
Task 2: Training accuracy: 0.9693, Time: 1.5100 seconds
Task 2: Training accuracy: 0.9784, Time: 1.3939 seconds
Training on task 3/4
Task 3: Training accuracy: 0.9830, Time: 2.5110 seconds
Task 3: Training accuracy: 0.9955, Time: 1.4073 seconds
Task 3: Training accuracy: 0.9989, Time: 2.6711 seconds
Task 3: Training accuracy: 0.9989, Time: 2.2762 seconds
Task 3: Training accuracy: 1.0000, Time: 1.3557 seconds
Training on task 4/4
Task 4: Training accuracy: 0.8850, Time: 2.6091 seconds
Task 4: Training acc

## Testing the Meta-Model on a New Task

We evaluate the meta-learned initialization by:
1. Loading and preprocessing a new dataset (`new_task_path`).
2. Training a model initialized with the meta-learned weights.
3. Measuring its adaptability and performance on the new task.



In [None]:
# Evaluate the meta-learned model on a new task
new_task_path = '/content/carclaims_2000.csv'
X_new, y_new = load_data(new_task_path)
X_new_train, X_new_val, y_new_train, y_new_val = train_test_split(X_new, y_new, test_size=0.2, random_state=42)

# Train on the new task

new_task_model = create_model(X_new.shape[1])
new_task_model.set_weights(meta_model.get_weights())  # Load weights from the meta-model
start_time = time.time()

train_on_task(new_task_model, X_new_train, y_new_train, X_new_val, y_new_val, epochs=20)
training_time = time.time() - start_time

# Evaluate and display results
new_task_accuracy = evaluate_model_with_report(new_task_model, X_new_val, y_new_val)
print(f"New Task Model Training Time: {training_time:.4f} seconds")
print(f"New Task Model Accuracy: {new_task_accuracy:.4f}")

8/8 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step
Classification Report:
               precision    recall  f1-score   support

         0.0       0.63      0.99      0.77       141
         1.0       0.85      0.12      0.21        92

    accuracy                           0.64       233
   macro avg       0.74      0.55      0.49       233
weighted avg       0.72      0.64      0.55       233

New Task Model Training Time: 3.8752 seconds
New Task Model Accuracy: 0.6438


## Results and Conclusion

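- Starting from the meta-learned weights, the model fine-tunes on the new task in about 3.9 seconds (20 epochs) and reaches 64% validation accuracy.

- This points to the efficiency of the Reptile meta-learning approach for financial fraud detection, although recall on the fraud class (0.12) remains low.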


# ***Note:***
The model's accuracy is not consistent: it changes each time the cell is re-run, which we attribute to the high class imbalance in the data. We are working on resolving this.