# Federated Learning Tutorial

Author: Daniyal Shahrokhian

## Problem

Worldline has open-sourced some of its credit card transaction data for fraud
prediction:

https://www.kaggle.com/mlg-ulb/creditcardfraud

Imagine this dataset is split in half horizontally (by rows): Alice has one half
of the data and Bob has the other. Neither of them wants to send their raw data
to us. However, we convince them to let our model learn from their data in a
federated setting. Implement a way for our model to train on the combined data
of both Alice and Bob without either of them sending us any raw data. Compare
it with a model trained the traditional way, which can see all the data at once.

## Dependencies & Setup

In [None]:
%%shell
pip install scikit-learn
pip install pandas
pip install matplotlib
pip install tensorflow

pip uninstall --yes tensorboard tb-nightly

pip install --quiet --upgrade tensorflow-federated
pip install --quiet --upgrade nest-asyncio
pip install --quiet --upgrade tensorboard

In [None]:
import matplotlib.pyplot as plt
%matplotlib inline

import nest_asyncio
nest_asyncio.apply()

import pandas as pd
import random
from sklearn.model_selection import train_test_split
import tensorflow as tf
import tensorflow_federated as tff
from tensorflow.keras.metrics import BinaryAccuracy, Precision, Recall

SEED = 1337
random.seed(SEED)  # also seed Python's `random` module for reproducibility
tf.random.set_seed(SEED)

## Data

In [None]:
df = pd.read_csv('creditcard.csv')
df

In [None]:
# Creating Alice and Bob's splits:
alice_df = df[:len(df.index)//2]
bob_df = df[len(df.index)//2:]

### Exploratory Analysis

Fraudulent transactions account for only 0.17% of all transactions. With an imbalance this severe, class weighting alone is unlikely to be enough, so we will most likely need to rely on under- or over-sampling.
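To make the under-sampling idea concrete, here is a minimal standard-library sketch. The `undersample` helper and the toy labels are hypothetical and only illustrate the technique: keep every positive row and a random subset of negatives at a fixed ratio.

```python
import random

def undersample(labels, negative_ratio, seed=1337):
    """Return sorted row indices keeping every positive and a random subset of negatives."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    # Keep at most `negative_ratio` negatives per positive, capped by availability.
    n_keep = min(len(neg_idx), len(pos_idx) * negative_ratio)
    return sorted(pos_idx + rng.sample(neg_idx, n_keep))

# Toy labels: 2 fraudulent rows among 1000 (hypothetical, not the Kaggle data).
labels = [0] * 1000
labels[10] = labels[500] = 1
kept = undersample(labels, negative_ratio=10)
print(len(kept), sum(labels[i] for i in kept))  # 22 2
```

Under-sampling discards most of the majority class, so the ratio trades class balance against the amount of negative data the model gets to see.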

In [None]:
df['Class'].value_counts()

When splitting the data horizontally, the class distribution does not change drastically.

In [None]:
alice_df['Class'].value_counts()

In [None]:
bob_df['Class'].value_counts()

As the density estimates of the two datasets show, there are some differences between them. In many federated scenarios, the data sources are not i.i.d. (independent and identically distributed). At first glance this is also true of our dataset, but the differences are small enough that they should not be much of a problem. The only variables with significant differences are `Time` and `Amount`, and we will not include `Time` in our classifier at all.
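One way to put a number on how different two clients' distributions are is the two-sample Kolmogorov-Smirnov statistic, the largest gap between their empirical CDFs. The sketch below uses only the standard library and synthetic Gaussian samples. The `ks_statistic` helper and the sample data are assumptions for illustration, not part of the original analysis.

```python
import bisect
import random

def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b (0 = identical, 1 = disjoint)."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # Both CDFs are step functions, so the largest gap occurs at a sample point.
    for x in a + b:
        cdf_a = bisect.bisect_right(a, x) / len(a)
        cdf_b = bisect.bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Synthetic stand-ins for one feature held by two clients.
rng = random.Random(1337)
alice = [rng.gauss(0, 1) for _ in range(1000)]
bob_similar = [rng.gauss(0, 1) for _ in range(1000)]
bob_shifted = [rng.gauss(2, 1) for _ in range(1000)]

print("similar:", ks_statistic(alice, bob_similar))
print("shifted:", ks_statistic(alice, bob_shifted))
```

Applied per column to the two halves, a statistic like this would give a ranking of which features diverge most, complementing the visual comparison.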

In [None]:
for col in df:
    # Put Alice's and Bob's values side by side so they share one plot
    combined = pd.concat(
        [alice_df[col].reset_index(drop=True), bob_df[col].reset_index(drop=True)],
        axis=1, keys=['Alice', 'Bob'])
    fig, ax = plt.subplots(figsize=(3,2))
    combined.sample(1000, random_state=SEED).plot(kind='density', ax=ax) # Random sample of 1000 to ease computation
    ax.title.set_text(col)
    ax.legend(['Alice', 'Bob'])
    plt.show()

## Hyperparameters

In [None]:
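EPOCHS = 100     # federated rounds for the FL model, training epochs for the single model
BATCH_SIZE = 64  # batch size for the training datasets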

## Federated Learning Approach

### Data Loading

In [None]:
def make_tf_dataset(dataframe, negative_ratio=None, batch_size=None):
    dataset = dataframe.drop(['Time'], axis=1, inplace=False)

    # Class balancing
    pos_df = dataset[dataset['Class'] == 1]
    neg_df = dataset[dataset['Class'] == 0]
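    if negative_ratio:
        # Under-sample negatives to `negative_ratio` per positive, capped at the number available
        n_neg = min(len(neg_df), len(pos_df) * negative_ratio)
        neg_df = neg_df.iloc[random.sample(range(len(neg_df)), n_neg), :]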
    balanced_df = pd.concat([pos_df, neg_df], ignore_index=True, sort=False)

    y = balanced_df.pop('Class')
    
    # Dataset creation
    dataset = tf.data.Dataset.from_tensor_slices((balanced_df.values, y.to_frame().values))
    if batch_size:
        dataset = dataset.batch(batch_size)
    
    return dataset

In [None]:
train_data, eval_data = [], []
for client_data in [alice_df, bob_df]:
    train_df, eval_df = train_test_split(client_data, test_size=0.1, random_state=SEED)
    train_data.append(make_tf_dataset(train_df, negative_ratio=10, batch_size=BATCH_SIZE))
    eval_data.append(make_tf_dataset(eval_df, batch_size=1))

### Model Definition

In [None]:
def input_spec():
    return (
        tf.TensorSpec([None, 29], tf.float64),
        tf.TensorSpec([None, 1], tf.int64)
    )

def model_fn():
    model = tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=(29,)),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])

    return tff.learning.from_keras_model(
        model,
        input_spec=input_spec(),
        loss=tf.keras.losses.BinaryCrossentropy(),
        metrics=[BinaryAccuracy(), Precision(), Recall()])

### Training

In [None]:
trainer = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(),
    server_optimizer_fn=lambda: tf.keras.optimizers.Adam()
)

state = trainer.initialize()
train_hist = []
for i in range(EPOCHS):
    # One federated round: broadcast, local training on each client, aggregation
    state, metrics = trainer.next(state, train_data)
    train_hist.append(metrics)

    print(f"\rRound {i+1}/{EPOCHS}", end="")

Each call to `next` runs one round of federated averaging. The server broadcasts its model to each client, and each client performs one epoch of local training on its own data. Each client then computes the difference between its trained model and the broadcast model. The server aggregates these model deltas (by default, as an average weighted by the number of examples each client trained on) and applies the result with the server optimizer.
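The aggregation step can be sketched in plain Python. This is a simplified illustration, not TFF's implementation: the `federated_averaging_round` helper is hypothetical, weights are flat lists of floats, and the server is assumed to apply the averaged delta directly (a learning rate of 1) rather than through Adam as in the cell above.

```python
def federated_averaging_round(server_weights, client_updates):
    """One aggregation step of example-weighted federated averaging.

    client_updates: list of (trained_weights, num_examples), one per client.
    """
    total = sum(n for _, n in client_updates)
    new_weights = []
    for i, w_server in enumerate(server_weights):
        # Weighted mean of each client's delta (trained - broadcast) for this parameter.
        mean_delta = sum((w[i] - w_server) * n for w, n in client_updates) / total
        # Assumption: the server applies the averaged delta with a learning rate of 1.
        new_weights.append(w_server + mean_delta)
    return new_weights

server = [0.0, 1.0]
alice = ([1.0, 1.0], 100)  # (weights after local training, number of examples)
bob = ([0.0, 3.0], 300)
print(federated_averaging_round(server, [alice, bob]))  # [0.25, 2.5]
```

Because Bob contributes three times as many examples as Alice in this toy case, the new server weights sit three times closer to Bob's trained weights than to Alice's.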

### Evaluation

In [None]:
evaluator = tff.learning.build_federated_evaluation(model_fn)

In [None]:
federated_metrics = evaluator(state.model, eval_data)
federated_metrics

## Single Model with all Data at once (for comparison)

### Data Loading

In [None]:
# Merge Alice's and Bob's datasets to simulate a model that sees all the data at once
train_data = train_data[0].concatenate(train_data[1])
eval_data = eval_data[0].concatenate(eval_data[1])

### Model Definition

In [None]:
def model_fn():
    model = tf.keras.models.Sequential([
        tf.keras.layers.InputLayer(input_shape=(29,)),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(64, activation='relu'),
        tf.keras.layers.Dense(32, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    
    model.compile(
        loss=tf.keras.losses.BinaryCrossentropy(),
        optimizer=tf.keras.optimizers.Adam(),
        metrics=[BinaryAccuracy(), Precision(), Recall()],
    )
    
    return model

### Training

In [None]:
model = model_fn()
history = model.fit(train_data, epochs=EPOCHS)

### Evaluation

In [None]:
test_scores = model.evaluate(eval_data)
single_metrics = {
    'loss': test_scores[0],
    'binary_accuracy': test_scores[1],
    'precision': test_scores[2],
    'recall': test_scores[3]
}

## Conclusion

Comparing both models:

In [None]:
print(f"---Single model metrics---\n{single_metrics}\n")
print(f"---Federated model metrics---\n{dict(federated_metrics)}")

In this run, the Federated Learning approach shows a better balance between precision and recall than the single model, which may indicate that it handles the class imbalance better.
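A common way to condense the balance between precision and recall into one number is the F1 score, their harmonic mean. The sketch below is a hypothetical helper with made-up inputs. It is not computed from this notebook's results, but the same function could be applied to the `precision` and `recall` entries of both metric dictionaries.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical values for illustration only, not results from this notebook.
print(f1_score(0.5, 0.5))  # 0.5
print(f1_score(0.9, 0.1))  # low despite the high precision
```

The harmonic mean penalizes lopsided pairs, so a model with balanced precision and recall scores higher than one that excels at only one of the two.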