**Secure Multiparty Computation (SMPC) based Federated Learning**
In this workbook we implement the idea of sharing clients' data using SMPC methods in order to train a Federated Learning model.
Federated Learning involves multiple parties collaborating to train a machine learning model on their combined data without revealing their individual data to each other. In this scenario, four parties (representing different organizations or datasets) collaboratively train a model on customer information without revealing individual customer data. The main steps include:
1) Model Parameters: Each party trains a local model on its own data. After training, it shares only the model parameters (weights and biases) with the trusted party.
2) Aggregated Gradients: During training, the parties compute gradients locally and share only the aggregated gradients (not the individual gradients themselves) with the trusted party. This ensures that individual gradients are not revealed.

In [108]:
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder  # used below to integer-encode the columns



For simplicity, we use a Credit Card Fraud dataset from Kaggle and split it into 4 datasets to resemble data from 4 clients. The source of the data is: https://www.kaggle.com/datasets/kartik2112/fraud-detection

In [109]:
# load the data
data = pd.read_csv("ForSMPC.csv")  

data.head()

Unnamed: 0,Num,trans_date_trans_time,cc_num,merchant,category,amt,first,last,gender,street,...,lat,long,city_pop,job,dob,trans_num,unix_time,merch_lat,merch_long,is_fraud
0,0,21/06/2020 12:14,2291164000000000.0,fraud_Kirlin and Sons,personal_care,2.86,Jeff,Elliott,M,351 Darlene Green,...,33.9659,-80.9355,333497,Mechanical engineer,19/03/1968,2da90c7d74bd46a0caf3777415b3ebd3,1371816865,33.986391,-81.200714,0
1,1,21/06/2020 12:14,3573030000000000.0,fraud_Sporer-Keebler,personal_care,29.84,Joanne,Williams,F,3638 Marsh Union,...,40.3207,-110.436,302,"Sales professional, IT",17/01/1990,324cc204407e99f51b0d6ca0055005e7,1371816873,39.450498,-109.960431,0
2,2,21/06/2020 12:14,3598215000000000.0,"fraud_Swaniawski, Nitzsche and Welch",health_fitness,41.28,Ashley,Lopez,F,9333 Valentine Point,...,40.6729,-73.5365,34496,"Librarian, public",21/10/1970,c81755dbbbea9d5c77f094348a7579be,1371816893,40.49581,-74.196111,0
3,3,21/06/2020 12:15,3591920000000000.0,fraud_Haley Group,misc_pos,60.05,Brian,Williams,M,32941 Krystal Mill Apt. 552,...,28.5697,-80.8191,54767,Set designer,25/07/1987,2159175b9efe66dc301f149d3d5abf8c,1371816915,28.812398,-80.883061,0
4,4,21/06/2020 12:15,3526826000000000.0,fraud_Johnston-Casper,travel,3.19,Nathan,Massey,M,5783 Evan Roads Apt. 465,...,44.2529,-85.017,1126,Furniture designer,6/07/1955,57ff021bd3f328f8738bb535c302a31b,1371816917,44.959148,-85.884734,0


In [53]:
# Split data into features (X) and target variable (y)
# Drop the target column together with the identifying and free-text columns that are not used as features
X = data.drop(columns=["is_fraud", "dob", "trans_date_trans_time","first","last","city", "trans_num","unix_time","street"])
y = data["is_fraud"]

X.head()

Unnamed: 0,Num,cc_num,merchant,category,amt,gender,state,zip,lat,long,city_pop,job,merch_lat,merch_long
0,0,2291164000000000.0,fraud_Kirlin and Sons,personal_care,2.86,M,SC,29209,33.9659,-80.9355,333497,Mechanical engineer,33.986391,-81.200714
1,1,3573030000000000.0,fraud_Sporer-Keebler,personal_care,29.84,F,UT,84002,40.3207,-110.436,302,"Sales professional, IT",39.450498,-109.960431
2,2,3598215000000000.0,"fraud_Swaniawski, Nitzsche and Welch",health_fitness,41.28,F,NY,11710,40.6729,-73.5365,34496,"Librarian, public",40.49581,-74.196111
3,3,3591920000000000.0,fraud_Haley Group,misc_pos,60.05,M,FL,32780,28.5697,-80.8191,54767,Set designer,28.812398,-80.883061
4,4,3526826000000000.0,fraud_Johnston-Casper,travel,3.19,M,MI,49632,44.2529,-85.017,1126,Furniture designer,44.959148,-85.884734


In [110]:
# Clean up the data

# Create an integer (label) encoder instance
encoder = LabelEncoder()

# Encode all of the columns in the DataFrame
X_encoded = X.apply(encoder.fit_transform)

X_encoded.head()

Unnamed: 0,Num,cc_num,merchant,category,amt,gender,state,zip,lat,long,city_pop,job,merch_lat,merch_long
0,0,397,319,10,186,1,39,254,182,663,794,275,114727,392332
1,1,540,591,10,2884,0,43,811,524,96,89,392,277654,59408
2,2,584,611,5,4028,0,33,69,558,860,675,259,332793,510508
3,3,571,222,9,5905,1,8,291,29,666,702,407,19517,399880
4,4,458,292,13,219,1,21,478,808,523,254,196,501776,305489
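
Applying a LabelEncoder to every column also replaces numeric columns such as amt, lat and long with the rank of each value, as the output above shows. One possible alternative, sketched here on a made-up toy frame, is to encode only the non-numeric columns. The names `toy` and `encoded` and the sample values are illustrative assumptions, not part of the notebook.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy frame standing in for the transaction table (values are made up)
toy = pd.DataFrame({
    "category": ["travel", "misc_pos", "travel"],
    "gender": ["M", "F", "M"],
    "amt": [3.19, 60.05, 41.28],
})

encoded = toy.copy()
# Encode only the non-numeric columns; numeric columns keep their values
for col in encoded.select_dtypes(exclude="number").columns:
    encoded[col] = LabelEncoder().fit_transform(encoded[col])

print(encoded)
```

Whether keeping the raw numeric values helps the model here is untested; they would probably still need scaling before logistic regression.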


In [111]:

# Split data into four parties (for demonstration)
num_parties = 4
data_splits = np.array_split(X_encoded, num_parties)
target_splits = np.array_split(y, num_parties)


# Initialise a logistic regression model for each party
models = [LogisticRegression() for _ in range(num_parties)]
#models = [GradientBoostingClassifier() for _ in range(num_parties)]




In [112]:
# Train models locally on each party's data

for i in range(num_parties):
    X_train, X_test, y_train, y_test = train_test_split(data_splits[i], target_splits[i], test_size=0.2, random_state=42)
    models[i].fit(X_train, y_train)


STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
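
The warning above suggests scaling the data or raising max_iter. A minimal sketch of both, run on synthetic data rather than the fraud dataset, might look like this. The names `X_demo`, `y_demo` and `scaled_model` are illustrative assumptions. In a federated setting the parties would presumably also have to agree on a common scaling so that their coefficients stay comparable, which this sketch does not address.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one party's data, with badly scaled features
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_demo = X_demo * 1e4

# Standardise the features, then fit with a larger iteration budget
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(X_demo, y_demo)

# n_iter_ reports how many solver iterations were actually needed
n_iter = scaled_model.named_steps["logisticregression"].n_iter_[0]
print("solver iterations used:", n_iter)
```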


In [113]:
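# Function to securely share data among parties using additive secret sharing

def share_data(data):
    # Draw num_parties - 1 random integer masks with the same shape as the data
    shares = [np.random.randint(0, 100, data.shape) for _ in range(num_parties - 1)]
    # The last share is chosen so that all shares add up to the original data
    shares.append(data - np.sum(shares, axis=0))
    return shares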
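
The function above masks real-valued parameters with integers drawn from [0, 100), so the last share may still hint at the magnitude of the data. Textbook additive secret sharing instead works modulo a fixed number with uniformly random shares. The following is a hedged sketch of that variant, assuming a fixed-point encoding. `PRIME`, `SCALE`, `share_fixed_point` and `reconstruct_fixed_point` are illustrative names and choices, not part of the notebook.

```python
import numpy as np

PRIME = 2**31 - 1   # modulus for share arithmetic (illustrative choice)
SCALE = 10**4       # fixed-point scale, 4 decimal places (illustrative choice)

def share_fixed_point(values, n_shares, rng):
    """Split a float array into n_shares additive shares modulo PRIME."""
    encoded = np.round(values * SCALE).astype(np.int64) % PRIME
    shares = [rng.integers(0, PRIME, size=encoded.shape, dtype=np.int64)
              for _ in range(n_shares - 1)]
    # The last share balances the others so the sum equals encoded (mod PRIME)
    last = (encoded - sum(shares)) % PRIME
    return shares + [last]

def reconstruct_fixed_point(shares):
    """Add the shares modulo PRIME and decode back to floats."""
    total = sum(shares) % PRIME
    # Residues above PRIME // 2 represent negative numbers
    signed = np.where(total > PRIME // 2, total - PRIME, total)
    return signed / SCALE

rng = np.random.default_rng(0)
weights = np.array([[0.25, -1.5, 3.0]])
shares = share_fixed_point(weights, n_shares=4, rng=rng)
print(reconstruct_fixed_point(shares))
```

The encoding only round-trips values whose scaled magnitude stays below PRIME // 2, and it rounds to four decimal places.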

In [114]:
# Function to securely reconstruct data from shared shares

def reconstruct_data(shares):
    return np.sum(shares, axis=0)

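# Securely share model parameters: every party splits its coefficients into num_parties additive shares
shared_params = [share_data(model.coef_) for model in models]
# Summing over the party axis leaves one partial sum per share holder, shape (num_parties, 1, n_features).
# The partial sums still have to be added together to obtain the total over all parties.
aggregated_params = reconstruct_data(shared_params)

# Same procedure for the intercepts; the result holds one partial sum per share holder, shape (num_parties,)
shared_bias = [share_data(model.intercept_.reshape(1, -1)) for model in models]
aggregated_bias = reconstruct_data(shared_bias).reshape(-1)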
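
The aggregation can be read as two steps: each share holder first adds up the shares it received, and only the resulting partial sums are combined. The sketch below illustrates this on random coefficients, under the assumption that the goal is the average of the local parameters. `additive_shares` mirrors the idea of share_data, and all names and values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_parties, n_features = 4, 3

def additive_shares(values, n_shares):
    """Random integer masks plus one balancing share, as in share_data."""
    masks = [rng.integers(0, 100, size=values.shape).astype(float)
             for _ in range(n_shares - 1)]
    return masks + [values - np.sum(masks, axis=0)]

# Hypothetical local coefficient vectors, one per party
local_coefs = [rng.normal(size=(1, n_features)) for _ in range(n_parties)]

# all_shares[i][j] is the share that party i sends to holder j
all_shares = np.array([additive_shares(c, n_parties) for c in local_coefs])

# Step 1: each holder j adds up the shares it received (sum over the party axis)
partial_sums = all_shares.sum(axis=0)               # shape (n_parties, 1, n_features)

# Step 2: adding the partial sums yields the total, which is then averaged
secure_mean = partial_sums.sum(axis=0) / n_parties  # shape (1, n_features)

print(np.allclose(secure_mean, np.mean(local_coefs, axis=0)))
```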

In [115]:
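# Function to securely aggregate gradients
def compute_aggregated_gradients(models, data, targets):
    gradients = []
    for i in range(len(models)):
        model = models[i]
        # Probability of the positive class (column 1 of predict_proba)
        preds = model.predict_proba(data[i])[:, 1]
        error = preds - targets[i].values
        # Log-loss gradient with respect to the coefficients, shape (n_features,)
        gradient = np.dot(data[i].T, error) / len(data[i])
        gradients.append(share_data(gradient))
    # First sum over the parties (one partial sum per share holder), then over the share holders
    aggregated_gradients = reconstruct_data(reconstruct_data(gradients))
    return aggregated_gradients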
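
For logistic regression, the gradient of the mean log-loss with respect to the weights is X^T (p - y) / n, where p is the predicted probability of the positive class. The sketch below checks that formula against a finite-difference estimate on random data. It ignores regularization, and every name in it is an illustrative assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss_gradient(w, b, X, y):
    """Gradient of the mean log-loss with respect to weights w and bias b."""
    error = sigmoid(X @ w + b) - y          # shape (n_samples,)
    return X.T @ error / len(y), error.mean()

def mean_log_loss(w, b, X, y):
    p = sigmoid(X @ w + b)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(0)
X_toy = rng.normal(size=(50, 3))
y_toy = rng.integers(0, 2, size=50).astype(float)
w_toy, b_toy = rng.normal(size=3), 0.1

grad_w, grad_b = log_loss_gradient(w_toy, b_toy, X_toy, y_toy)

# Finite-difference estimate of the derivative for the first weight
eps = 1e-6
w_shift = w_toy.copy()
w_shift[0] += eps
numeric = (mean_log_loss(w_shift, b_toy, X_toy, y_toy)
           - mean_log_loss(w_toy, b_toy, X_toy, y_toy)) / eps
print(grad_w[0], numeric)
```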


In [116]:
# Assuming each party computes aggregated gradients and securely shares them
aggregated_gradients = compute_aggregated_gradients(models, data_splits, target_splits)


In [117]:
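# Aggregation of Model Updates (Trusted Party's Role)
# aggregated_params and aggregated_bias hold one partial sum per share holder;
# adding the partial sums and dividing by the number of parties gives the averaged parameters.

# Update Global Model
global_model = LogisticRegression()
global_model.coef_ = reconstruct_data(aggregated_params) / num_parties  # shape (1, n_features)
global_model.intercept_ = np.atleast_1d(reconstruct_data(aggregated_bias)) / num_parties  # shape (1,)
global_model.classes_ = models[0].classes_  # required before predict() can be called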



In [119]:
# Global Model Distribution (to all parties)
global_params = global_model.coef_
global_bias = global_model.intercept_

# Securely share global model parameters among parties using SMPC
shared_global_params = share_data(global_params)
shared_global_bias = share_data(global_bias)

# Local Model Refinement
for i in range(num_parties):
    # A single share is only a random mask, so each party rebuilds the full parameters from all shares
    models[i].coef_ = reconstruct_data(shared_global_params)  # Update local model parameters
    models[i].intercept_ = reconstruct_data(shared_global_bias)

# We can evaluate the final global model on a test set (not shown)
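
The evaluation step is not shown in the notebook. One way it might look is sketched below on synthetic data, with plain parameter averaging standing in for the secure aggregation. The data, the names (`local_models`, `demo_global`, `holdout_accuracy`) and the choice of accuracy as the metric are all assumptions; for a heavily imbalanced fraud dataset, precision and recall would likely be more informative than accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded fraud data
X_syn, y_syn = make_classification(n_samples=2000, n_features=8, random_state=0)
X_fit, X_holdout, y_fit, y_holdout = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn)

# Four parties train locally on disjoint slices of the training portion
local_models = [
    LogisticRegression(max_iter=1000).fit(X_part, y_part)
    for X_part, y_part in zip(np.array_split(X_fit, 4), np.array_split(y_fit, 4))
]

# Averaged parameters stand in for the securely aggregated ones
demo_global = LogisticRegression()
demo_global.coef_ = np.mean([m.coef_ for m in local_models], axis=0)
demo_global.intercept_ = np.mean([m.intercept_ for m in local_models], axis=0)
demo_global.classes_ = local_models[0].classes_  # needed by predict()

# Score the global model on data that no party trained on
holdout_accuracy = accuracy_score(y_holdout, demo_global.predict(X_holdout))
print(f"hold-out accuracy: {holdout_accuracy:.3f}")
```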