# Outline:


This notebook shows how to apply Horizontal Federated Learning (HFL) to image datasets using TensorFlow Federated (TFF). It covers the following steps:

*   TensorFlow Federated dataset
*   TensorFlow Federated model
*   TensorFlow Federated computations for initialization, training, and validation


## Usage:

To use this notebook, store your dataset as a dictionary whose keys are client IDs and whose values are tuples of two NumPy arrays: the client's images and the associated one-hot encoded labels. Note that the images in the NumPy arrays are assumed to be already preprocessed.

Once your data is in this format, all you need to do is set the "**path**" variable to the path of the dictionary file (the images/labels from the different clients).
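As a minimal sketch of the expected format, the snippet below builds a small synthetic dictionary, saves it, and loads it back with `np.load(..., allow_pickle=True).item()`. The image size, class count, client sizes, and file name are hypothetical values chosen for illustration; only the layout (client ID mapped to an images/labels tuple) comes from the description above.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy sizes; the images used later in this notebook are 224x224x3.
NUM_CLASSES = 3
IMG_SHAPE = (8, 8, 3)
rng = np.random.default_rng(0)


def make_client(num_examples):
    """Return an (images, one_hot_labels) tuple for one synthetic client."""
    images = rng.random((num_examples, *IMG_SHAPE)).astype(np.float16)
    class_ids = rng.integers(0, NUM_CLASSES, size=num_examples)
    labels = np.eye(NUM_CLASSES)[class_ids]  # one-hot rows, dtype float64
    return images, labels


# Keys are client IDs, values are (images, labels) tuples.
toy_data = {client_id: make_client(n) for client_id, n in enumerate([5, 7, 4, 6])}

# Save the dictionary and read it back as a plain Python dict.
toy_path = os.path.join(tempfile.mkdtemp(), "toy_clients.npy")
np.save(toy_path, toy_data, allow_pickle=True)
loaded = np.load(toy_path, allow_pickle=True).item()

for client_id, (images, labels) in loaded.items():
    print(client_id, images.shape, labels.shape)
```

Saving a dictionary this way relies on pickling, so only load such files from sources you trust.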


### Installation of packages

In [1]:
# !pip install tensorflow-federated==0.18
# !pip install nest_asyncio


### Importing Required Packages:

In [1]:
import nest_asyncio
nest_asyncio.apply()
import glob
import pandas as pd
import os
import numpy as np
import matplotlib.pyplot as plt

import collections
import tensorflow as tf
import tensorflow_federated as tff


from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD

2021-11-01 09:22:41.258295: W tensorflow/stream_executor/platform/default/dso_loader.cc:64] Could not load dynamic library 'libcudart.so.11.0'; dlerror: libcudart.so.11.0: cannot open shared object file: No such file or directory
2021-11-01 09:22:41.258328: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.


In [2]:
# Base path to the data dictionary
path=''

data = np.load(path, allow_pickle=True).item()  # data[0] returns the image-label tuple for hospital number 0


FileNotFoundError: [Errno 2] No such file or directory: ''

## Creating Federated Data:
The function `tff.simulation.ClientData.from_clients_and_fn` requires a function that accepts a `client_id` as input and returns a `tf.data.Dataset`. Let's write it as the helper function below:

In [None]:

batch_size = 64
SHUFFLE_BUFFER = 128

def create_tf_dataset_for_client_fn(client_id):
    # client_data[0] holds the images, client_data[1] the one-hot labels
    client_data = data[client_id]
    dataset = tf.data.Dataset.from_tensor_slices((client_data[0], client_data[1])).prefetch(buffer_size=128)
    # Shuffle with a 2000-element buffer (reshuffled on every pass), then batch
    dataset = dataset.shuffle(2000, reshuffle_each_iteration=True).batch(batch_size)

    return dataset


Now, let's create the training and testing federated data. To this end, we specify the `train_client_ids` and `test_client_ids` which contain the IDs of train and test clients:

In [None]:
# IDs for train and test clients. For example:
#train_client_ids=[0,2,3]
#test_client_ids=[1]


# specify the train and test clients based on their IDs:
train_client_ids=[]  
test_client_ids=[]


train_data = tff.simulation.ClientData.from_clients_and_fn(client_ids=train_client_ids,   create_tf_dataset_for_client_fn=create_tf_dataset_for_client_fn)
test_data = tff.simulation.ClientData.from_clients_and_fn(
        client_ids=test_client_ids,
        create_tf_dataset_for_client_fn=create_tf_dataset_for_client_fn)
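Because the client lists above are filled in by hand, it may be worth checking the split before training. The helper below is a hypothetical addition (the name `check_client_split` and its error messages are not part of the notebook or of TFF); it only checks that both lists are non-empty, disjoint, and refer to keys that exist in the data dictionary.

```python
def check_client_split(data, train_ids, test_ids):
    """Raise ValueError if the split is empty, overlapping, or names unknown clients."""
    if not train_ids or not test_ids:
        raise ValueError("Both train and test client lists must be non-empty.")
    overlap = set(train_ids) & set(test_ids)
    if overlap:
        raise ValueError(f"Clients used for both train and test: {sorted(overlap)}")
    unknown = (set(train_ids) | set(test_ids)) - set(data)
    if unknown:
        raise ValueError(f"Client IDs missing from the data dictionary: {sorted(unknown)}")


# Stand-in dictionary with four clients; the values are irrelevant for the check.
toy = {0: None, 1: None, 2: None, 3: None}
check_client_split(toy, train_ids=[0, 2, 3], test_ids=[1])
print("split is valid")
```

In the notebook itself, the call would be `check_client_split(data, train_client_ids, test_client_ids)`.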




Let's check the number of clients in the federated training data and the structure of the data:

In [None]:
len(train_data.client_ids)

3

In [None]:
train_data.element_type_structure

(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float16, name=None),
 TensorSpec(shape=(None, 3), dtype=tf.float64, name=None))

To see exactly what one batch of data looks like, we create the following example dataset from one of the client IDs in the federated training data:

In [None]:
example_dataset = train_data.create_tf_dataset_for_client(
        train_data.client_ids[0]
    )
#print(example_dataset)
example_element = next(iter(example_dataset))
print(example_element)

(<tf.Tensor: shape=(64, 224, 224, 3), dtype=float16, numpy=
array([[[[0.5566 , 0.3882 , 0.706  ],
         [0.502  , 0.3333 , 0.651  ],
         [0.4707 , 0.298  , 0.6235 ],
         ...,
         [0.5215 , 0.2864 , 0.682  ],
         [0.541  , 0.2825 , 0.6943 ],
         [0.5884 , 0.3215 , 0.741  ]],

        [[0.6665 , 0.4941 , 0.82   ],
         [0.5923 , 0.4197 , 0.745  ],
         [0.5176 , 0.3452 , 0.6704 ],
         ...,
         [0.537  , 0.3098 , 0.682  ],
         [0.5527 , 0.302  , 0.6904 ],
         [0.5884 , 0.3293 , 0.7256 ]],

        [[0.6904 , 0.5254 , 0.859  ],
         [0.6313 , 0.4666 , 0.8    ],
         [0.545  , 0.3804 , 0.714  ],
         ...,
         [0.5254 , 0.3137 , 0.639  ],
         [0.5254 , 0.302  , 0.6353 ],
         [0.545  , 0.3098 , 0.655  ]],

        ...,

        [[0.655  , 0.4824 , 0.8394 ],
         [0.6157 , 0.443  , 0.792  ],
         [0.5923 , 0.4197 , 0.7686 ],
         ...,
         [0.4236 , 0.2354 , 0.5566 ],
         [0.4548 , 0.2744 , 

We now have almost all the building blocks in place to construct federated datasets.

One of the ways to feed federated data to TFF in a simulation is simply as a Python list, with each element of the list holding the data of an individual user, as a `tf.data.Dataset`. Since we already have an interface for that, let's use it.

The helper function `make_federated_data` below will construct a list of datasets from the
given set of users as an input to a round of training or evaluation.

In [None]:
def make_federated_data(client_data, client_ids):
    return [client_data.create_tf_dataset_for_client(x) for x in client_ids]

In [None]:
federated_train_data = make_federated_data(train_data, train_client_ids)
print('Number of client datasets: {l}'.format(l=len(federated_train_data)))
print('First dataset: {d}'.format(d=federated_train_data[0]))

Number of client datasets: 3
First dataset: <BatchDataset shapes: ((None, 224, 224, 3), (None, 3)), types: (tf.float16, tf.float64)>


## TensorFlow Federated model
First, we create a simple CNN model with Keras:


In [None]:
num_classes = data[0][1].shape[1]
img_height=data[0][0].shape[1]
img_width=data[0][0].shape[2]


def create_keras_model():
    model = Sequential([
      layers.Conv2D(16, 3, padding='same', activation='relu',input_shape=(img_height, img_width, 3)),
      layers.MaxPooling2D(),
      layers.Conv2D(32, 3, padding='same', activation='relu'),
      layers.MaxPooling2D(),
      layers.Conv2D(64, 3, padding='same', activation='relu'),
      layers.MaxPooling2D(),
      layers.Flatten(),
      layers.Dense(128, activation='relu'),
      layers.Dense(num_classes,activation='softmax')
    ])
    
    return model

Note that we do not compile the model yet. The loss, metrics, and optimizers are introduced later.

If you have a Keras model like the one we've just defined above, you can have TFF wrap it for you by invoking
`tff.learning.from_keras_model`, passing the model, the specification of the input data
(`input_spec=example_dataset.element_spec`), the loss, and the metrics as arguments, as shown below.


In [None]:
# We _must_ create a new model here, and _not_ capture it from an external
# scope. TFF will call this within different graph contexts.
def model_fn():
    keras_model = create_keras_model()

    return tff.learning.from_keras_model(
            keras_model,
            input_spec=example_dataset.element_spec,
            loss=tf.keras.losses.CategoricalCrossentropy(),
            metrics=[tf.keras.metrics.CategoricalAccuracy(),tf.keras.metrics.AUC(name='AUC')])




`model_fn` is a no-argument function that returns a `tff.learning.Model`.

### Training the model on federated data 

Now that we have a model wrapped as `tff.learning.Model` for use with TFF, we
can let TFF construct a **Federated Averaging** algorithm by invoking the helper
function `tff.learning.build_federated_averaging_process`, as follows.

One critical note on the Federated Averaging algorithm below: there are **2**
optimizers, a _client_optimizer_ and a _server_optimizer_. The
_client_optimizer_ is only used to compute local model updates on each client.
The _server_optimizer_ applies the averaged update to the global model at the
server. In particular, this means that the choice of optimizer and learning rate
may need to differ from the ones you would use to train the model on
a standard i.i.d. dataset.
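To make the role of the server optimizer more concrete, here is a NumPy-only sketch of the server step. It assumes the server treats the negated, example-weighted mean of the client updates as a gradient and applies plain SGD to it; `server_update` and the toy numbers are hypothetical and do not come from TFF.

```python
import numpy as np


def server_update(global_weights, client_deltas, num_examples, server_lr):
    """Apply the example-weighted mean client delta with a plain SGD server step."""
    weights = np.asarray(num_examples, dtype=np.float64)
    mean_delta = np.average(np.stack(client_deltas), axis=0, weights=weights)
    # Treating -mean_delta as the gradient, an SGD step gives w + lr * mean_delta.
    return global_weights + server_lr * mean_delta


global_w = np.array([1.0, -2.0])
# Hypothetical local results: each client's weights after its own training.
client_weights = [np.array([2.0, -1.0]), np.array([0.0, -3.0]), np.array([1.0, 0.0])]
num_examples = [100, 300, 200]
deltas = [w - global_w for w in client_weights]

plain_average = np.average(np.stack(client_weights), axis=0, weights=num_examples)
full_step = server_update(global_w, deltas, num_examples, server_lr=1.0)
damped_step = server_update(global_w, deltas, num_examples, server_lr=0.05)

print(full_step, plain_average)  # identical: lr = 1.0 is plain weighted averaging
print(damped_step)               # moves only 5% of the way toward that average
```

Under these assumptions, a server learning rate of 1.0 recovers the weighted average of the client models, while a smaller rate moves the global model only part of the way toward it in each round.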



In [None]:
iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.01),
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(learning_rate=0.05))

In this case, the two computations generated and packed into `iterative_process` (`initialize` and `next`) implement Federated Averaging.

Let's start with the `initialize` computation. As is the case for all federated
computations, you can think of it as a function. The computation takes no
arguments, and returns one result - the representation of the state of the
Federated Averaging process on the server. While we don't want to dive into the
details of TFF, it may be instructive to see what this state looks like. You can
visualize it as follows.

In [None]:
str(iterative_process.initialize.type_signature)

'( -> <model=<trainable=<float32[3,3,3,16],float32[16],float32[3,3,16,32],float32[32],float32[3,3,32,64],float32[64],float32[50176,128],float32[128],float32[128,3],float32[3]>,non_trainable=<>>,optimizer_state=<int64>,delta_aggregate_state=<value_sum_process=<>,weight_sum_process=<>>,model_broadcast_state=<>>@SERVER)'

While the above type signature may at first seem a bit cryptic, you can recognize that the server state consists of a `model` (the initial model parameters that will be distributed to all devices) and an `optimizer_state` (additional information maintained by the server, such as the number of rounds to use for hyperparameter schedules).
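The shapes in the type signature can be checked against the Keras model by hand. The sketch below copies the trainable shapes from the printed signature and assumes the default 2x2 pooling of `MaxPooling2D` and the 224x224 input size reported by `element_type_structure`.

```python
import numpy as np

# Trainable tensor shapes copied from the type signature printed above.
trainable_shapes = [
    (3, 3, 3, 16), (16,),    # Conv2D(16, 3) kernel and bias
    (3, 3, 16, 32), (32,),   # Conv2D(32, 3) kernel and bias
    (3, 3, 32, 64), (64,),   # Conv2D(64, 3) kernel and bias
    (50176, 128), (128,),    # Dense(128) on the flattened feature map
    (128, 3), (3,),          # Dense(num_classes) with num_classes = 3
]

# Three 2x2 max-pooling layers halve each side: 224 -> 112 -> 56 -> 28.
side = 224 // 2 // 2 // 2
flatten_size = side * side * 64  # 64 channels after the last convolution

total_params = sum(int(np.prod(shape)) for shape in trainable_shapes)
print(flatten_size)   # matches the first dimension of the Dense(128) kernel
print(total_params)   # number of trainable parameters sent to every client
```

Almost all of the parameters sit in the first dense layer, which matters in a federated setting because the full set of weights is exchanged with every participating client in every round.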

Let's invoke the initialize computation to construct the server state.

In [None]:
state = iterative_process.initialize()

In [None]:
state

ServerState(model=ModelWeights(trainable=[array([[[[ 0.04500341,  0.07506523,  0.13334626,  0.02123837,
          -0.15200004, -0.0527821 ,  0.16835535, -0.03819233,
          -0.00059795, -0.06880844, -0.06946006, -0.1415799 ,
          -0.04111256, -0.00379904,  0.09466806,  0.00757378],
         [ 0.06174199, -0.0798183 , -0.04008025,  0.16750336,
          -0.01329011,  0.01764365, -0.15192157,  0.13478664,
           0.08884886,  0.07034698,  0.02075073, -0.02016206,
          -0.11975331, -0.04103395,  0.13985214,  0.17341033],
         [-0.10627598,  0.07886904,  0.15986231,  0.1613945 ,
          -0.0751308 ,  0.01347902,  0.06773752, -0.05200399,
          -0.00820206,  0.11969686, -0.03520213,  0.06296366,
          -0.02007082, -0.0860141 , -0.10902989, -0.10729892]],

        [[ 0.11151829,  0.0926097 , -0.1490289 ,  0.02408271,
           0.04292114, -0.13799317,  0.17074338, -0.17743888,
           0.02143997, -0.13180053,  0.13209558,  0.1731816 ,
          -0.1467872 , 

The second of the pair of federated computations, `next`, represents a single
round of Federated Averaging, which consists of pushing the server state
(including the model parameters) to the clients, on-device training on their
local data, collecting and averaging model updates, and producing a new updated
model at the server.

Conceptually, you can think of `next` as having a functional type signature that
looks as follows.

```
SERVER_STATE, FEDERATED_DATA -> SERVER_STATE, TRAINING_METRICS
```

In particular, one should think about `next()` not as being a function that runs on a server, but rather being a declarative functional representation of the entire decentralized computation - some of the inputs are provided by the server (`SERVER_STATE`), but each participating device contributes its own local dataset.
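The functional view of `next` can be illustrated with a small NumPy stand-in. This is a sketch under simplifying assumptions rather than what TFF executes: the "model" is a linear least-squares problem, the server state is just the weight vector, and `client_train` and `next_round` are hypothetical names.

```python
import numpy as np


def client_train(weights, features, targets, client_lr=0.1, local_steps=5):
    """Run a few full-batch gradient steps on one client's least-squares loss."""
    w = weights.copy()
    for _ in range(local_steps):
        grad = 2.0 * features.T @ (features @ w - targets) / len(targets)
        w -= client_lr * grad
    loss = float(np.mean((features @ w - targets) ** 2))
    return w - weights, len(targets), loss


def next_round(server_state, federated_data, server_lr=1.0):
    """(SERVER_STATE, FEDERATED_DATA) -> (SERVER_STATE, TRAINING_METRICS)."""
    results = [client_train(server_state, x, y) for x, y in federated_data]
    deltas, counts, losses = zip(*results)
    mean_delta = np.average(np.stack(deltas), axis=0, weights=counts)
    new_state = server_state + server_lr * mean_delta
    metrics = {
        "loss": float(np.average(losses, weights=counts)),
        "num_examples": int(sum(counts)),
    }
    return new_state, metrics


# Synthetic federated data: one (features, targets) pair per client.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -0.5])
federated_data = []
for n in (40, 60, 20):
    x = rng.normal(size=(n, 2))
    federated_data.append((x, x @ true_w + 0.01 * rng.normal(size=n)))

state = np.zeros(2)  # plays the role of iterative_process.initialize()
history = []
for round_num in range(1, 6):
    state, metrics = next_round(state, federated_data)
    history.append(metrics["loss"])
    print(round_num, metrics)
```

The function never mutates its inputs: each call takes the previous state and the clients' datasets and returns a new state plus metrics, which is the same calling pattern used with `iterative_process.next`.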

To run a single round of training and display the results, we can use the following line of code together with the federated data generated above:






In [None]:
state, metrics = iterative_process.next(state, federated_train_data)
print('round  1, metrics={}'.format(metrics))



round  1, metrics=OrderedDict([('broadcast', ()), ('aggregation', OrderedDict([('mean_value', ()), ('mean_weight', ())])), ('train', OrderedDict([('categorical_accuracy', 0.67475504), ('AUC', 0.82607585), ('loss', 0.7946014)])), ('stat', OrderedDict([('num_examples', 6429)]))])


## Evaluation
To perform evaluation on federated data, you can construct another *federated
computation* designed for just this purpose, using the
`tff.learning.build_federated_evaluation` function, and passing in your model
constructor as an argument. Note that evaluation doesn't perform gradient descent, so there's no need to construct
optimizers.
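As a rough illustration of why no optimizer is needed, the sketch below evaluates fixed predictions on two hypothetical test clients and combines the results. It assumes metrics are aggregated by summing per-client partial results (loss sums, correct counts, example counts) and dividing once at the server; `client_eval` and `federated_eval` are illustrative names, not TFF functions.

```python
import numpy as np


def client_eval(probabilities, one_hot_labels):
    """Return one client's partial results: loss sum, correct count, example count."""
    eps = 1e-7
    loss_sum = -np.sum(one_hot_labels * np.log(np.clip(probabilities, eps, 1.0)))
    correct = np.sum(probabilities.argmax(axis=1) == one_hot_labels.argmax(axis=1))
    return float(loss_sum), int(correct), len(one_hot_labels)


def federated_eval(client_outputs):
    """Sum the partial results across clients, then divide once at the server."""
    loss_sum, correct, count = (sum(values) for values in zip(*client_outputs))
    return {"loss": loss_sum / count, "categorical_accuracy": correct / count}


# Hypothetical predicted probabilities and one-hot labels for two clients, 3 classes.
probs_a = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]])
labels_a = np.array([[1, 0, 0], [0, 0, 1]])
probs_b = np.array([[0.1, 0.1, 0.8], [0.6, 0.3, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]])
labels_b = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0], [0, 0, 1]])

eval_metrics = federated_eval([client_eval(probs_a, labels_a), client_eval(probs_b, labels_b)])
print(eval_metrics)
```

Only forward passes and sums are involved, so clients with more examples contribute proportionally more to the final metrics.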

For experimentation and research, when a centralized test dataset is available,
[Federated Learning for Text Generation](federated_learning_for_text_generation.ipynb)
demonstrates another evaluation option: taking the trained weights from
federated learning, applying them to a standard Keras model, and then simply
calling `tf.keras.models.Model.evaluate()` on a centralized dataset.

In [None]:
evaluation = tff.learning.build_federated_evaluation(model_fn)


federated_test_data = make_federated_data(test_data, test_client_ids)

val_metrics = evaluation(state.model, federated_test_data)

val_metrics

OrderedDict([('categorical_accuracy', 0.70076585),
             ('AUC', 0.7917838),
             ('loss', 1.0567521)])

In [None]:
a = next(iter(federated_test_data[0]))
print(a[0].shape)


(64, 224, 224, 3)


## Training and evaluation over multiple rounds:

Here we run the federated learning algorithm for multiple rounds on the training clients and evaluate the performance on the test clients after each round.

First, we initialize the model:

In [None]:
state = iterative_process.initialize()

Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`




Now let's train and evaluate for multiple rounds. Note that the number of rounds is specified by the variable `NUM_ROUNDS` below.

In [None]:
NUM_ROUNDS = 5

# Per-round metrics on the training clients
loss = list()
accuracy = list()
AUC = list()

# Per-round metrics on the test clients
val_loss = list()
val_accuracy = list()
val_AUC = list()

evaluation = tff.learning.build_federated_evaluation(model_fn)
federated_test_data = make_federated_data(test_data, test_client_ids)

for round_num in range(1, NUM_ROUNDS+1):
    # One round of Federated Averaging, then evaluation of the updated global model
    state, metrics = iterative_process.next(state, federated_train_data)
    val_metrics = evaluation(state.model, federated_test_data)

    my_loss = metrics['train']['loss']
    loss.append(my_loss)

    my_acc = metrics['train']['categorical_accuracy']
    accuracy.append(my_acc)

    my_AUC = metrics['train']['AUC']
    AUC.append(my_AUC)

    my_val_loss = val_metrics['loss']
    val_loss.append(my_val_loss)

    my_val_accuracy = val_metrics['categorical_accuracy']
    val_accuracy.append(my_val_accuracy)

    my_val_AUC = val_metrics['AUC']
    val_AUC.append(my_val_AUC)

    print(f"round: {round_num:2d}, training_loss: {my_loss}, training_accuracy: {my_acc}, train_auc: {my_AUC}, test_loss: {my_val_loss}, test_accuracy: {my_val_accuracy},  test_auc: {my_val_AUC}")


history_fed={'loss':loss, 'categorical_accuracy':accuracy, 'AUC':AUC, 'val_loss':val_loss, 'val_categorical_accuracy':val_accuracy, 'val_AUC':val_AUC}


#np.save('/content/gdrive/Shareddrives/AI Engineering/PETs/Privacy-Sobhan Hemati/Kidney/history_fed_train_id_0_2_3_test_id_1.npy',history_fed)

In [None]:
fig1 = plt.figure(figsize=(30, 8))
fig1.subplots_adjust(top = 0.99, bottom=0.01, hspace=.3, wspace=0.3)
ax1 = fig1.add_subplot(1, 3, 1)
ax1.plot(history_fed['AUC'],label='Train AUC on Hospitals 0-2-3 with Federated Model trained on Hospitals 0-2-3')
ax1.plot(history_fed['val_AUC'],label='Test AUC on Hospital 1 with Federated Model trained on Hospitals 0-2-3')
ax1.legend(loc='lower right')
ax1.set_ylabel('AUC')
ax1.set_ylim([0, 1])
ax1.set_title('AUC for test and train clients')
ax1.set_xlabel('Epoch/Round')


ax2 = fig1.add_subplot(1, 3, 2)
ax2.plot(history_fed['categorical_accuracy'],label='Train Accuracy on Hospitals 0-2-3 with Federated Model trained on Hospitals 0-2-3')
ax2.plot(history_fed['val_categorical_accuracy'],linewidth=3,label='Test Accuracy on Hospital 1 with Federated Model trained on Hospitals 0-2-3')
ax2.legend(loc='lower right')
ax2.set_ylabel('Accuracy')
ax2.set_ylim([0, 1])
ax2.set_title('Accuracy for test and train clients')
ax2.set_xlabel('Epoch/Round')

ax3 = fig1.add_subplot(1, 3, 3)
ax3.plot(history_fed['loss'],label='Train Loss on Hospitals 0-2-3 with Federated Model trained on Hospitals 0-2-3')
ax3.plot(history_fed['val_loss'],label='Test Loss on Hospital 1 with Federated Model trained on Hospitals 0-2-3')
ax3.legend(loc='lower right')
ax3.set_ylabel('Cross Entropy')
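# Scale the loss axis from the recorded losses rather than from the accuracy axis (ax2)
ax3.set_ylim([0, max(history_fed['loss'] + history_fed['val_loss']) + .5])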
ax3.set_title('Loss for test and train clients')
ax3.set_xlabel('Epoch/Round')



#plt.savefig('/content/gdrive/Shareddrives/AI Engineering/PETs/Privacy-Sobhan Hemati/Kidney/history_fed_train_id_0_2_3_result.jpg',bbox_inches='tight')

In [None]:
!pip list -v | grep tensorflow-federated
!pip list -v | grep tensorflow

tensorflow-federated          0.19.0              /usr/local/lib/python3.7/dist-packages pip
tensorflow                    2.5.1               /usr/local/lib/python3.7/dist-packages pip
tensorflow-datasets           4.0.1               /usr/local/lib/python3.7/dist-packages pip
tensorflow-estimator          2.5.0               /usr/local/lib/python3.7/dist-packages pip
tensorflow-federated          0.19.0              /usr/local/lib/python3.7/dist-packages pip
tensorflow-gcs-config         2.6.0               /usr/local/lib/python3.7/dist-packages pip
tensorflow-hub                0.12.0              /usr/local/lib/python3.7/dist-packages pip
tensorflow-metadata           1.2.0               /usr/local/lib/python3.7/dist-packages pip
tensorflow-model-optimization 0.5.0               /usr/local/lib/python3.7/dist-packages pip
tensorflow-privacy            0.5.2               /usr/local/lib/python3.7/dist-packages pip
tensorflow-probability        0.14.1              /usr/local/lib/pytho