import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import roc_auc_score, classification_report, confusion_matrix
import tenseal as ts
import json
import base64

# Manually generating synthetic data
data = pd.DataFrame({
    'Cardholder_Name': ['Alice Smith', 'Bob Johnson', 'Charlie Brown', 'David Wilson', 'Eva Green'],
    'Customer_ID': [1, 2, 3, 4, 5],
    'Age': [45, 34, 65, 29, 53],
    'Transaction_Amount_AUD': [4500, 4900, 4700, 4600, 4800],  # High amounts to indicate suspicious transactions
    'Credit_Score': [400, 450, 420, 410, 430],  # Low credit scores to indicate suspicious behavior
    'IsFraud': [1, 1, 1, 1, 1]  # All transactions are marked as fraudulent
})

print("Generated synthetic data:")
print(data)

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Select features and target
X = data[['Transaction_Amount_AUD', 'Credit_Score']]
y = data['IsFraud']

# Train-test split (though it's a small dataset, we'll still split)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Standardize the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Train a linear regression model
model = LinearRegression()
model.fit(X_train_scaled, y_train)

# Predict on the test set
y_pred = model.predict(X_test_scaled)

print("Predictions on test set:")
print(y_pred)

# Calculate model metrics
if len(np.unique(y_test)) > 1:
    roc_auc = roc_auc_score(y_test, y_pred)
else:
    roc_auc = "ROC AUC is not defined due to only one class present in y_test."

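# Threshold the continuous regression output at 0.5 to get integer class labels.
# (Plain rounding could yield labels outside {0, 1} if a prediction falls outside [0, 1].)
y_pred_labels = (y_pred >= 0.5).astype(int)
conf_matrix = confusion_matrix(y_test, y_pred_labels)
class_report = classification_report(y_test, y_pred_labels, zero_division=0)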

# Printing model metrics
print("Model Metrics:")
print(f"ROC AUC: {roc_auc}")
print(f"Confusion Matrix:\n{conf_matrix}")
print(f"Classification Report:\n{class_report}")

# Initialize TenSEAL context for CKKS scheme
context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192, coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2**40
context.generate_galois_keys()

# Encrypt the predictions
encrypted_predictions = [ts.ckks_vector(context, [pred]) for pred in y_pred]

# Perform both homomorphic sum and product operations on the encrypted predictions
encrypted_sum = encrypted_predictions[0] + encrypted_predictions[1]
encrypted_product = encrypted_predictions[0] * encrypted_predictions[1]

# Serialize the encrypted results for sending
serialized_sum = base64.b64encode(encrypted_sum.serialize()).decode('utf-8')
serialized_product = base64.b64encode(encrypted_product.serialize()).decode('utf-8')

# Prepare the JSON response
response_data = {
    "encrypted_sum": serialized_sum,
    "encrypted_product": serialized_product,
    "model_metrics": {
        "roc_auc": roc_auc,  # a float, or an explanatory string when ROC AUC is undefined
        "confusion_matrix": conf_matrix.tolist(),
        "classification_report": class_report
    }
}

# Convert to JSON
response_json = json.dumps(response_data)

# Print the JSON response (In a real web application, this would be sent as an HTTP response)
print("JSON response with encrypted results:")
print(response_json)

# Decrypt the results for display
decrypted_sum = ts.ckks_vector_from(context, base64.b64decode(serialized_sum)).decrypt()[0]
decrypted_product = ts.ckks_vector_from(context, base64.b64decode(serialized_product)).decrypt()[0]

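# Interpret the decrypted results for end users.
# Note: the thresholds assume a score in [0, 1]. The sum of two predictions can
# exceed 1, so its message is illustrative only and is not a calibrated risk level.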
def interpret_risk(prob):
    if prob > 0.7:
        return "High Risk: This transaction is highly likely to be fraudulent. Immediate action is required."
    elif prob > 0.3:
        return "Moderate Risk: There’s a moderate risk associated with this transaction. Please verify it."
    else:
        return "Low Risk: This transaction appears to be normal. No further action is required."

sum_risk_message = interpret_risk(decrypted_sum)
product_risk_message = interpret_risk(decrypted_product)

# Print the interpreted messages
print("Risk Interpretation based on Sum:", sum_risk_message)
print("Risk Interpretation based on Product:", product_risk_message)

# Optionally, include these messages in the JSON response
response_data.update({
    "risk_interpretation": {
        "sum_risk_message": sum_risk_message,
        "product_risk_message": product_risk_message
    }
})

response_json = json.dumps(response_data)
print("Final JSON response with risk interpretation:")
print(response_json)


Generated synthetic data:
  Cardholder_Name  Customer_ID  Age  Transaction_Amount_AUD  Credit_Score  \
0     Alice Smith            1   45                    4500           400   
1     Bob Johnson            2   34                    4900           450   
2   Charlie Brown            3   65                    4700           420   
3    David Wilson            4   29                    4600           410   
4       Eva Green            5   53                    4800           430   

   IsFraud  
0        1  
1        1  
2        1  
3        1  
4        1  
Predictions on test set:
[1. 1.]
Model Metrics:
ROC AUC: ROC AUC is not defined due to only one class present in y_test.
Confusion Matrix:
[[2]]
Classification Report:
              precision    recall  f1-score   support

           1       1.00      1.00      1.00         2

    accuracy                           1.00         2
   macro avg       1.00      1.00      1.00         2
weighted avg       1.00      1.00      1.00         2

Result interpretation:

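Precision (1.00): Every transaction the model labeled as fraudulent was in fact fraudulent.

Recall (1.00): The model flagged every fraudulent transaction in the test set.

F1-Score (1.00): The harmonic mean of precision and recall, which is 1.00 because both are 1.00.

Accuracy (1.00): Both transactions in the test set were classified correctly.

These scores should be read with caution: every transaction in the dataset is labeled as fraud, so the test set contains only one class and any model that always predicts fraud would score the same. This is also why ROC AUC is reported as undefined.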
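
To make the point about single-class test sets concrete, the sketch below computes precision and recall by hand for a baseline that labels every transaction as fraud. The helper `precision_recall` and the mixed test set are hypothetical and exist only for illustration; they are not part of the experiment above.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Compute precision and recall for one positive class from plain lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Same shape as the test set above: two transactions, both labeled fraud.
y_test_single_class = [1, 1]
always_fraud = [1] * len(y_test_single_class)
single_class_scores = precision_recall(y_test_single_class, always_fraud)

# Hypothetical test set with both classes present (an assumption for illustration).
y_test_mixed = [1, 0, 1, 0]
always_fraud_mixed = [1] * len(y_test_mixed)
mixed_scores = precision_recall(y_test_mixed, always_fraud_mixed)

print(single_class_scores)  # (1.0, 1.0)
print(mixed_scores)         # (0.5, 1.0)
```

The "always fraud" baseline reaches perfect scores on the single-class test set, but its precision drops to 0.5 as soon as legitimate transactions are included.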

Introduction

In this experiment, I developed a proof-of-concept for detecting fraudulent credit card transactions using synthetic data and Fully Homomorphic Encryption (FHE). The key objectives were to demonstrate how to generate synthetic data, train a simple model, encrypt the predictions, and provide a risk assessment that is both secure and understandable to non-technical users.

1. Synthetic Data Generation

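To begin, I manually created a small dataset containing five credit card transactions. Each transaction is designed to appear suspicious, with a high transaction amount (4,500 to 4,900 AUD) and a low credit score (400 to 450). These characteristics were deliberately chosen to simulate fraudulent activity. All five transactions are labeled as fraud, so the dataset contains no legitimate transactions for comparison.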
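
If the dataset were extended to include legitimate transactions, a generator along the following lines could be used. This is a minimal sketch: the function name `make_synthetic_transactions`, the fraud rate, and the value ranges are all assumptions chosen for illustration, and only the column names are taken from the notebook.

```python
import random

def make_synthetic_transactions(n, fraud_rate=0.3, seed=0):
    """Return n synthetic rows with a fixed share of fraud labels.

    All value ranges below are hypothetical and only mimic the
    'high amount, low credit score' pattern described in the text.
    """
    rng = random.Random(seed)
    n_fraud = max(1, round(n * fraud_rate))
    labels = [1] * n_fraud + [0] * (n - n_fraud)
    rng.shuffle(labels)  # mix fraud and legitimate rows

    rows = []
    for customer_id, is_fraud in enumerate(labels, start=1):
        if is_fraud:
            amount = rng.randint(4000, 5000)
            credit_score = rng.randint(350, 500)
        else:
            amount = rng.randint(20, 1500)
            credit_score = rng.randint(600, 850)
        rows.append({
            "Customer_ID": customer_id,
            "Transaction_Amount_AUD": amount,
            "Credit_Score": credit_score,
            "IsFraud": is_fraud,
        })
    return rows

synthetic_rows = make_synthetic_transactions(10)
for row in synthetic_rows[:3]:
    print(row)
```

Fixing the number of fraud rows, rather than sampling each label independently, guarantees that both classes are present even in a very small dataset.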

2. Model Training

I then trained a simple Linear Regression model on the standardized transaction amount and credit score. Because every training label is 1, the fitted model outputs 1 for any input, so the predictions below reflect the labels rather than a learned relationship between the features and fraud.
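
The effect of an all-ones target can be checked with a closed-form least-squares fit. The sketch below is a simplification of the notebook: it uses a single unscaled feature and all five rows, and the helper `fit_simple_ols` is a hypothetical name introduced here.

```python
def fit_simple_ols(xs, ys):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variance terms (unnormalized; the factor 1/n cancels).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

amounts = [4500, 4900, 4700, 4600, 4800]  # Transaction_Amount_AUD from the notebook
labels = [1, 1, 1, 1, 1]                  # IsFraud: every row is labeled fraud

slope, intercept = fit_simple_ols(amounts, labels)
predictions = [intercept + slope * x for x in amounts]

print(slope, intercept)  # 0.0 1.0
print(predictions)       # [1.0, 1.0, 1.0, 1.0, 1.0]
```

With a constant target, the covariance between the feature and the label is zero, so the slope vanishes and the intercept equals the label mean.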

Predictions on test set:
[1. 1.]

3. Encryption of Predictions Using FHE

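After obtaining the model's predictions, I applied Fully Homomorphic Encryption (FHE) using the CKKS scheme from TenSEAL. Each prediction was encrypted as a CKKS vector, and both addition (sum) and multiplication (product) were performed on the first two encrypted predictions without decrypting them. CKKS works with approximate arithmetic, so decrypted results are close to, but not exactly equal to, the plaintext values.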

The encrypted sum and product are then serialized, Base64-encoded, and included in a JSON response alongside the model metrics. In a real-world scenario, this response could be sent over the network.
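
The Base64 step is needed because JSON cannot carry raw bytes. The round trip can be sketched with the standard library alone; here `fake_ciphertext` is a placeholder standing in for the bytes that `encrypted_sum.serialize()` returns in the notebook, and the helper names are assumptions.

```python
import base64
import json

def pack_ciphertext(raw: bytes) -> str:
    """Encode raw ciphertext bytes as a JSON-safe ASCII string."""
    return base64.b64encode(raw).decode("utf-8")

def unpack_ciphertext(text: str) -> bytes:
    """Recover the raw ciphertext bytes from the Base64 string."""
    return base64.b64decode(text)

# Placeholder bytes only; this is NOT a real CKKS ciphertext.
fake_ciphertext = bytes(range(256))

# Sender side: embed the encoded ciphertext in a JSON document.
payload = json.dumps({"encrypted_sum": pack_ciphertext(fake_ciphertext)})

# Receiver side: parse the JSON and decode back to the original bytes.
restored = unpack_ciphertext(json.loads(payload)["encrypted_sum"])

print(restored == fake_ciphertext)  # True
```

On the receiving side, the recovered bytes would still need a compatible TenSEAL context before they can be loaded with `ts.ckks_vector_from`, as the notebook does when decrypting.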

4. Decryption and Risk Interpretation

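Finally, the encrypted sum and product were decrypted, and each value was passed through a simple threshold-based risk interpretation (above 0.7 is high risk, above 0.3 is moderate risk, otherwise low risk). Note that the sum of two predictions, about 2 here, is not a probability, so its message is illustrative rather than a calibrated risk level. For the sum, the result was: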

"sum_risk_message": "High Risk: This transaction is highly likely to be fraudulent. Immediate action is required."

The output is stored as JSON, so it's ready to be integrated with the website.
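
Returning to the risk thresholds: they assume a score between 0 and 1, so the choice of aggregate matters. The sketch below uses two hypothetical scores of 0.4 and a shortened helper `risk_band` (both assumptions, not taken from the run above) to show how the sum, the product, and the mean fall into different bands.

```python
def risk_band(score):
    """Map a score in [0, 1] to a band using the notebook's thresholds."""
    if score > 0.7:
        return "High Risk"
    elif score > 0.3:
        return "Moderate Risk"
    return "Low Risk"

# Hypothetical per-transaction scores, each individually "Moderate Risk".
scores = [0.4, 0.4]

total = sum(scores)            # 0.8, larger than either input
product = scores[0] * scores[1]  # about 0.16, smaller than either input
mean = total / len(scores)     # 0.4, stays on the original scale

band_from_sum = risk_band(total)
band_from_product = risk_band(product)
band_from_mean = risk_band(mean)

print(band_from_sum)      # High Risk
print(band_from_product)  # Low Risk
print(band_from_mean)     # Moderate Risk
```

Two moderate scores yield "High Risk" from the sum and "Low Risk" from the product, while the mean keeps the aggregate on the scale the thresholds expect. One possible option under CKKS would be to multiply the encrypted sum by the plaintext constant 1/n before decrypting.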

Conclusion

This experiment demonstrates how to generate synthetic data, train a model, and use Fully Homomorphic Encryption to compute on and transmit prediction results without exposing them in plaintext. The key takeaway is that sensitive values can be combined while still encrypted and then turned into actionable insights in a user-friendly format. In this proof-of-concept only the predictions are encrypted; the features and the model itself are handled in plaintext.

This approach can be extended to larger datasets and more complex models. Maintaining privacy throughout the entire process would additionally require encrypting the input features before they are processed.

