# Central and Worker 

This notebook goes over the necessary code for the central and worker federated learning agents. Each agent has its own machine learning pipeline, and together they enable the following incremental actions:
1. Global model initialization in central
2. Sending initial model to workers
3. Training a new model in workers
4. Returning model updates to central
5. Aggregating updates into a global model
6. Repeating steps 2 to 5 until the model converges
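
The steps above can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the pipeline built in this project: the worker model is a plain logistic regression trained with gradient descent, the aggregation is a sample-size-weighted average of the worker weights (in the style of federated averaging), and the data is random. The names `local_update` and `aggregate` are hypothetical.

```python
import numpy as np

def local_update(weights, X, y, lr=0.1, epochs=5):
    """Hypothetical worker step: a few epochs of gradient descent on a
    logistic-regression model, starting from the global weights."""
    w = weights.copy()
    for _ in range(epochs):
        preds = 1.0 / (1.0 + np.exp(-X @ w))
        grad = X.T @ (preds - y) / len(y)
        w -= lr * grad
    return w

def aggregate(worker_weights, worker_sizes):
    """Hypothetical central step: average the worker weights,
    weighted by how many samples each worker trained on."""
    return np.average(worker_weights, axis=0, weights=worker_sizes)

rng = np.random.default_rng(0)

# Synthetic stand-in for each bank's private data (assumption)
workers = []
for n in (200, 300, 500):
    X = rng.normal(size=(n, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    workers.append((X, y))

# Step 1: global model initialization in central
global_weights = np.zeros(3)

# Step 6: repeat until the global model stops changing
for round_number in range(20):
    # Steps 2 to 4: send the model, train locally, return the updates
    updates = [local_update(global_weights, X, y) for X, y in workers]
    # Step 5: aggregate the updates into a new global model
    new_weights = aggregate(updates, [len(y) for _, y in workers])
    converged = np.linalg.norm(new_weights - global_weights) < 1e-4
    global_weights = new_weights
    if converged:
        break

print(global_weights)
```

Only model weights cross the boundary between workers and central in this sketch; the raw rows in `workers` never leave the worker that owns them.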

In this project we use the [Synthetic Financial Datasets For Fraud Detection](https://www.kaggle.com/datasets/ealaxi/paysim1/data) to simulate a fraud detection infrastructure. The central node is controlled by a trade organisation, and the worker nodes are different banks that belong to that organisation. The trade organisation uses federated learning to provide an adaptive, robust and private fraud detection system for its partners.

The imports used in this notebook are the following:

- Pandas
- Numpy
- Scikit-learn

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import RandomizedSearchCV
from sklearn.ensemble import RandomForestRegressor

from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from sklearn.neural_network import MLPClassifier

from sklearn.metrics import r2_score
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

In [None]:
source_data_df = pd.read_csv('data/Fraud_Detection.csv')

In [None]:
source_data_df

## Formatting

The columns are:
- Row index = Number of the log entry
- step = Unit of time, where one step is one hour in the real world
- type = Transaction type: CASH_IN, CASH_OUT, DEBIT, PAYMENT and TRANSFER
- amount = Transaction amount in local currency
- nameOrig = Customer who started the transaction
- oldbalanceOrg = Initial balance of the originator before the transaction
- newbalanceOrig = New balance of the originator after the transaction
- nameDest = Customer who is the recipient of the transaction
- oldbalanceDest = Initial balance of the recipient before the transaction
- newbalanceDest = New balance of the recipient after the transaction
- isFraud = Marks the transactions made by fraudulent agents
- isFlaggedFraud = Existing detection rule, which flags attempts to transfer more than 200,000 in a single transaction

In order to simulate fraud detection, we need to remove the following columns:
- oldbalanceOrg
- newbalanceOrig
- oldbalanceDest
- newbalanceDest
- isFlaggedFraud (should be used for comparison, but not for training a model)

After that, we need to modify the following columns:
- type = Requires one-hot encoding using integers
- nameOrig = Requires string-to-integer encoding
- nameDest = Requires string-to-integer encoding
- amount = Requires rounding to the nearest integer
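
To illustrate the `type` and `amount` changes on a small scale, here is a minimal sketch on a made-up three-row frame (the values are not taken from the dataset). It assumes that passing `dtype=int` to `pd.get_dummies` is acceptable, which yields 0/1 integer columns directly instead of booleans that are converted afterwards.

```python
import pandas as pd

# Toy rows standing in for the real dataset (values are made up)
toy_df = pd.DataFrame({
    "type": ["PAYMENT", "TRANSFER", "CASH_OUT"],
    "amount": [9839.64, 181.0, 229133.94],
})

# One-hot encode 'type'; dtype=int emits 0/1 integers instead of booleans
toy_df = pd.get_dummies(toy_df, columns=["type"], dtype=int)

# Round to the nearest integer and store as an integer column
toy_df["amount"] = toy_df["amount"].round(0).astype(int)

print(toy_df)
```

Only the transaction types present in the data get a column, so a worker whose data lacks a type would end up with fewer columns than the others unless the column list is fixed explicitly.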

In [None]:
irrelevant_columns = [
    "oldbalanceOrg",
    "newbalanceOrig",
    "oldbalanceDest",
    "newbalanceDest",
    "isFlaggedFraud"
]

formated_data_df = source_data_df.copy()
# Removing irrelevant columns
formated_data_df.drop(
    columns = irrelevant_columns, 
    inplace = True
)
# One-hot encoding the type column
formated_data_df = pd.get_dummies(
    data = formated_data_df, 
    columns = ['type']
)
# Changing the bool one-hot columns into integers
for column in formated_data_df.columns:
    if 'type' in column:
        formated_data_df[column] = formated_data_df[column].astype(int)

In [None]:
formated_data_df

In [None]:
# Getting all unique strings in nameOrig
unique_values_orig = formated_data_df['nameOrig'].unique()
unique_value_list_orig = unique_values_orig.tolist()
# Getting all unique strings in nameDest
unique_values_dest = formated_data_df['nameDest'].unique()
unique_value_list_dest = unique_values_dest.tolist()

In [None]:
print(len(unique_value_list_orig))
print(len(unique_value_list_dest))

In [None]:
# Checking whether any strings are shared between orig and dest
set_orig_ids = set(unique_value_list_orig)
set_dest_ids = set(unique_value_list_dest)
intersection = set_dest_ids.intersection(set_orig_ids)

In [None]:
len(intersection)

In [None]:
# Removing the shared strings from dest and creating a new list
set_dest_ids.difference_update(intersection)
fixed_unique_value_list_dest = list(set_dest_ids)

In [None]:
# Index encoding all orig strings, starting from 1
orig_encoding_dict = {}
index = 1
for string in unique_value_list_orig:
    if string not in orig_encoding_dict:
        orig_encoding_dict[string] = index
        index = index + 1

In [None]:
len(orig_encoding_dict)

In [None]:
# Index encoding all dest-only strings, continuing after the last orig index
dest_encoding_dict = {}
cont_index = len(orig_encoding_dict) + 1
for string in fixed_unique_value_list_dest:
    if string not in dest_encoding_dict:
        dest_encoding_dict[string] = cont_index
        cont_index = cont_index + 1

In [None]:
len(dest_encoding_dict)

In [None]:
len(orig_encoding_dict) + len(dest_encoding_dict)

In [None]:
string_orig_values = formated_data_df['nameOrig'].tolist()
string_dest_values = formated_data_df['nameDest'].tolist()

In [None]:
orig_encoded_values = []
for string in string_orig_values:
    orig_encoded_values.append(orig_encoding_dict[string])

In [None]:
len(orig_encoded_values)

In [None]:
dest_encoded_values = []
for string in string_dest_values:
    # Strings that also appear in nameOrig reuse their orig index
    if string not in dest_encoding_dict:
        dest_encoded_values.append(orig_encoding_dict[string])
        continue
    dest_encoded_values.append(dest_encoding_dict[string])

In [None]:
len(dest_encoded_values)

In [None]:
formated_data_df['nameOrig'] = orig_encoded_values
formated_data_df['nameDest'] = dest_encoded_values
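
The same encoding scheme (one shared integer space, with orig IDs numbered first and dest-only IDs continuing after them) can be written more compactly with `pd.factorize`. This is an alternative sketch on made-up IDs, not the method used in the cells above. One difference is that `pd.factorize` numbers values in order of first appearance, while the order of `list(set_dest_ids)` is not guaranteed and may differ between runs.

```python
import pandas as pd

# Toy IDs standing in for the real columns (values are made up)
toy_df = pd.DataFrame({
    "nameOrig": ["C1", "C2", "C3"],
    "nameDest": ["M1", "C1", "M1"],
})

n_rows = len(toy_df)

# Stack orig on top of dest so that orig IDs are numbered first
all_names = pd.concat(
    [toy_df["nameOrig"], toy_df["nameDest"]],
    ignore_index=True
)

# factorize returns 0-based codes; add 1 to start from 1 as above
codes, uniques = pd.factorize(all_names)
codes = codes + 1

toy_df["nameOrig"] = codes[:n_rows]
toy_df["nameDest"] = codes[n_rows:]

print(toy_df)
```

An ID that appears in both columns, such as `C1` here, receives the same integer in both, which matches the fallback to `orig_encoding_dict` in the loop above.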

In [None]:
formated_data_df

In [None]:
formated_data_df['amount'] = formated_data_df['amount'].round(0).astype(int)

In [None]:
formated_data_df

In [None]:
column_order = [
    'step',
    'amount',
    'nameOrig',
    'nameDest',
    'type_CASH_IN',
    'type_CASH_OUT',
    'type_DEBIT',
    'type_PAYMENT',
    'type_TRANSFER',
    'isFraud'
]
formated_data_df = formated_data_df[column_order]

In [None]:
formated_data_df

In [None]:
formated_data_df.to_csv('data/Formated_Fraud_Detection_Data.csv', index = True)
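
Because the file is written with `index = True`, the row index is stored as an unnamed first column. A minimal sketch of reading such a file back, assuming the index should be restored rather than kept as an extra feature column, is shown below. It uses a made-up frame and an in-memory buffer instead of the real file.

```python
import io
import pandas as pd

# Toy frame standing in for the formatted data (values are made up)
toy_df = pd.DataFrame({
    "step": [1, 1],
    "amount": [9840, 181],
    "isFraud": [0, 1],
})

# Write to an in-memory buffer the same way as the cell above
buffer = io.StringIO()
toy_df.to_csv(buffer, index=True)
buffer.seek(0)

# index_col=0 turns the unnamed first column back into the row index
reloaded_df = pd.read_csv(buffer, index_col=0)

print(reloaded_df)
```

Without `index_col=0`, pandas would load the saved index as a regular column named `Unnamed: 0`, which could leak into the training features.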