
In this project, we load a previously trained Keras model and refine it using federated training on a simulated decentralized dataset.

In [1]:
!pip install --quiet --upgrade tensorflow-federated


In [2]:
import collections
import functools
import os
import time

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

np.random.seed(0)

# Test that TFF is working:
tff.federated_computation(lambda: 'Hello, World!')()



b'Hello, World!'

# Load a Pre-Trained Model
We load a model that was pre-trained following the TensorFlow tutorial "Text generation using an RNN with eager execution" (https://www.tensorflow.org/tutorials/sequences/text_generation). However, rather than training on The Complete Works of Shakespeare, we pre-trained the model on the text of Charles Dickens' A Tale of Two Cities and A Christmas Carol.

In [3]:
# A fixed vocabulary of ASCII chars that occur in the works of Shakespeare and Dickens:
vocab = list('dhlptx@DHLPTX $(,048cgkoswCGKOSW[_#\'/37;?bfjnrvzBFJNRVZ"&*.26:\naeimquyAEIMQUY]!%)-159\r')

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

In [4]:
def load_model(batch_size):
    # Dictionary mapping batch_size values to model URLs
    urls = {
        1: 'https://storage.googleapis.com/tff-models-public/dickens_rnn.batch1.kerasmodel',
        8: 'https://storage.googleapis.com/tff-models-public/dickens_rnn.batch8.kerasmodel'
    }

    # Ensure the provided batch_size is a valid key in the `urls` dictionary.
    # If not, raise an error with a message listing the valid batch_size options.
    assert batch_size in urls, 'batch_size must be in ' + str(urls.keys())

    # Get the corresponding URL for the given batch_size
    url = urls[batch_size]

    # Download the model file from the URL, using the base name of the URL for the local file name.
    # This will save the file locally in a cache directory.
    local_file = tf.keras.utils.get_file(os.path.basename(url), origin=url)

    # Load the Keras model from the downloaded file, but don't compile it (as indicated by `compile=False`).
    # The model is ready to be used, and compilation can be done later if necessary.
    return tf.keras.models.load_model(local_file, compile=False)


In [5]:
def generate_text(model, start_string):
    # Set the number of characters to generate
    num_generate = 200

    # Convert the start_string (input string) into a list of integer indices using `char2idx` mapping.
    input_eval = [char2idx[s] for s in start_string]

    # Expand the input into a batch dimension to prepare it for the model (shape: [1, sequence_length])
    input_eval = tf.expand_dims(input_eval, 0)

    # List to store the generated text (characters)
    text_generated = []

    # Controls the randomness of the predictions. A lower temperature makes the model more confident,
    # while a higher temperature makes it generate more diverse text.
    temperature = 1.0

    # Reset the model's states (useful when dealing with RNNs/LSTMs for generating sequences)
    model.reset_states()

    # Generate `num_generate` characters by predicting one character at a time
    for i in range(num_generate):
        # Pass the input sequence through the model to get the predictions for the next character
        predictions = model(input_eval)

        # Remove the batch dimension (reshape) to make the predictions usable (shape: [sequence_length, vocab_size])
        predictions = tf.squeeze(predictions, 0)

        # Adjust the predictions by dividing by `temperature` to control the diversity of the output
        predictions = predictions / temperature

        # `tf.random.categorical` treats its input as unnormalized log-probabilities (logits).
        # `num_samples=1` draws one character per time step; `[-1, 0]` keeps the sample for the last time step.
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()

        # Use the predicted character as the next input to the model
        input_eval = tf.expand_dims([predicted_id], 0)

        # Convert the predicted character ID back to a character using `idx2char` mapping
        text_generated.append(idx2char[predicted_id])

    # Return the final generated text: start_string + the new generated characters
    return (start_string + ''.join(text_generated))
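The `temperature` knob in `generate_text` is easiest to see on a toy distribution. The NumPy sketch below is illustrative only: the 3-character logits are hypothetical, and `softmax`/`temperature_probs` are helper names introduced here, not part of the notebook. Because `tf.random.categorical` treats its input as unnormalized log-probabilities, dividing by the temperature before sampling should be equivalent to sampling from `softmax(logits / temperature)`.

```python
import numpy as np


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    z = np.exp(logits - np.max(logits))
    return z / z.sum()


def temperature_probs(logits, temperature):
    # Dividing logits by the temperature before the softmax sharpens the
    # distribution when temperature < 1 and flattens it when temperature > 1.
    return softmax(np.asarray(logits, dtype=np.float64) / temperature)


logits = np.array([2.0, 1.0, 0.1])  # hypothetical logits over a 3-character vocabulary
for t in (0.5, 1.0, 2.0):
    print(t, np.round(temperature_probs(logits, t), 3))

# Draw one character id from the distribution, analogous to
# tf.random.categorical(..., num_samples=1) in generate_text.
rng = np.random.default_rng(0)
sampled_id = int(rng.choice(len(logits), p=temperature_probs(logits, 1.0)))
```

Lower temperatures concentrate probability on the most likely character, which tends to produce more repetitive but more "confident" text.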


In [6]:
# Text generation requires a batch_size=1 model.
keras_model_batch1 = load_model(batch_size=1)
print(generate_text(keras_model_batch1, 'What of TensorFlow Federated, you ask? '))

Downloading data from https://storage.googleapis.com/tff-models-public/dickens_rnn.batch1.kerasmodel
What of TensorFlow Federated, you ask? The Marquis, when the
moment I loss its people were purmission, and that they had spare him, with uneven from the light of his time.

A change for it but the second words, and the very different Yo


# Load and Preprocess the Federated Shakespeare Data
The `tff.simulation.datasets` package provides a variety of datasets that are split into "clients", where each client corresponds to a dataset on a particular device that might participate in federated learning.

These datasets provide realistic non-IID data distributions that replicate in simulation the challenges of training on real decentralized data.

In [7]:
train_data, test_data = tff.simulation.datasets.shakespeare.load_data()

Downloading shakespeare.sqlite.lzma: 100%|██████████| 1329828/1329828 [00:00<00:00, 10729777.01it/s]


The datasets provided by `shakespeare.load_data()` consist of a sequence of string Tensors, one for each line spoken by a particular character in a Shakespeare play. The client keys consist of the name of the play joined with the name of the character, so, for example, `MUCH_ADO_ABOUT_NOTHING_OTHELLO` corresponds to the lines for the character Othello in the play Much Ado About Nothing. Note that in a real federated learning scenario clients are never identified or tracked by IDs, but for simulation it is useful to work with keyed datasets.

Here, for example, we can look at some data from King Lear:

In [10]:
# Here the play is "The Tragedy of King Lear" and the character is "King".
raw_example_dataset = train_data.create_tf_dataset_for_client(
    'THE_TRAGEDY_OF_KING_LEAR_KING')
# To allow for future extensions, each entry x
# is an OrderedDict with a single key 'snippets' which contains the text.
for x in raw_example_dataset.take(10):
  print(x['snippets'])

tf.Tensor(b'', shape=(), dtype=string)
tf.Tensor(b'What?', shape=(), dtype=string)
tf.Tensor(b'Peace!', shape=(), dtype=string)
tf.Tensor(b'[Reads]', shape=(), dtype=string)
tf.Tensor(b'Hence, sirs, away.', shape=(), dtype=string)
tf.Tensor(b'I was, fair madam.', shape=(), dtype=string)
tf.Tensor(b'That can never be.', shape=(), dtype=string)
tf.Tensor(b'Upon mine honour, no.', shape=(), dtype=string)
tf.Tensor(b"'that shallow vassal,'", shape=(), dtype=string)
tf.Tensor(b'How fares your Majesty?', shape=(), dtype=string)


In [11]:
# We now use `tf.data.Dataset` transformations to prepare this data for training the char RNN loaded above.
# Input pre-processing parameters
SEQ_LENGTH = 100
BATCH_SIZE = 8
BUFFER_SIZE = 100  # For dataset shuffling

In [12]:
# Construct a lookup table to map string chars to indexes,
# using the vocab loaded above:
# Create a static hash table to map characters to their corresponding IDs (integer indices).
# The `vocab` contains the characters, and each character is mapped to a unique integer.
table = tf.lookup.StaticHashTable(
    tf.lookup.KeyValueTensorInitializer(
        keys=vocab,                                  # The keys are the characters in the vocabulary.
        values=tf.constant(list(range(len(vocab))),  # The values are the corresponding integer indices for each character.
                         dtype=tf.int64)),           # The dtype of the values is int64.
    default_value=0                                  # If a character is not found in the vocabulary, return 0 by default.
)


# Function to convert a string to its corresponding ID sequence (integer indices).
def to_ids(x):
    # Reshape the input snippet to have a shape of [1], ensuring it's in a batch of size 1.
    s = tf.reshape(x['snippets'], shape=[1])

    # Split the string into individual characters (as bytes).
    chars = tf.strings.bytes_split(s).values

    # Lookup the integer ID for each character using the `table` created earlier.
    ids = table.lookup(chars)

    # Return the sequence of character IDs.
    return ids


# Function to split input data into sequences for training.
# The goal is to separate each sequence into an input sequence and a target sequence.
def split_input_target(chunk):
    # The input sequence is all characters except the last one.
    input_text = tf.map_fn(lambda x: x[:-1], chunk)

    # The target sequence is all characters except the first one (shifted by one).
    target_text = tf.map_fn(lambda x: x[1:], chunk)

    # Return a tuple of (input_sequence, target_sequence).
    return (input_text, target_text)


def preprocess(dataset):
  return (
      # Map ASCII chars to int64 indexes using the vocab
      dataset.map(to_ids)
      # Split into individual chars
      .unbatch()
      # Form example sequences of SEQ_LENGTH +1
      .batch(SEQ_LENGTH + 1, drop_remainder=True)
      # Shuffle and form minibatches
      .shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True)
      # And finally split into (input, target) tuples,
      # each of length SEQ_LENGTH.
      .map(split_input_target))
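To make the shapes produced by `preprocess` concrete, the plain-Python sketch below re-creates its steps on lists of ints. It is an illustration under stated assumptions, not the `tf.data` code: `to_ids_py`, `preprocess_py`, and the toy 3-letter vocabulary are hypothetical, and shuffling is omitted. Like the `StaticHashTable` above, unknown characters map to id 0, which in the real `vocab` is also the id of `'d'`.

```python
SEQ_LENGTH = 100
BATCH_SIZE = 8


def to_ids_py(snippet, char2idx):
    # Like the StaticHashTable above: unknown characters fall back to id 0.
    return [char2idx.get(c, 0) for c in snippet]


def preprocess_py(snippets, char2idx, seq_length=SEQ_LENGTH, batch_size=BATCH_SIZE):
    # 1. Map every snippet to ids and concatenate them (the `.unbatch()` step).
    ids = [i for s in snippets for i in to_ids_py(s, char2idx)]
    # 2. Cut into sequences of seq_length + 1 ids, dropping the remainder.
    span = seq_length + 1
    seqs = [ids[k * span:(k + 1) * span] for k in range(len(ids) // span)]
    # 3. Group sequences into batches, dropping the remainder (shuffling omitted).
    batches = [seqs[k * batch_size:(k + 1) * batch_size]
               for k in range(len(seqs) // batch_size)]
    # 4. Split each sequence into (input, target), with the target shifted by one.
    return [([s[:-1] for s in b], [s[1:] for s in b]) for b in batches]


toy_char2idx = {c: i for i, c in enumerate('abc')}
toy_batches = preprocess_py(['abcabc', 'cab'], toy_char2idx, seq_length=3, batch_size=2)
# A client with fewer than (SEQ_LENGTH + 1) * BATCH_SIZE = 808 chars yields no batches.
short_client = preprocess_py(['a' * 807], {'a': 0})
```

In the toy call, the 9 ids form two 4-id sequences (the last id is dropped), which make one batch whose targets are the inputs shifted left by one position.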

Note that in the formation of the original sequences and in the formation of batches above, we use `drop_remainder=True` for simplicity. This means that any clients (Shakespeare characters) that don't have at least (SEQ_LENGTH + 1) * BATCH_SIZE = 808 characters of text will have empty datasets. A typical approach to address this would be to pad the batches with a special token, and then mask the loss so that padding tokens are not taken into account.
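The pad-and-mask approach can be sketched in NumPy as follows. This is not what the notebook does; `PAD_ID`, `pad_sequences`, and `masked_mean_cross_entropy` are hypothetical names, and `PAD_ID = -1` is used only to mark padded label positions. A real setup would reserve a dedicated vocabulary id for the pad token so the embedding layer can also accept it as input.

```python
import numpy as np

PAD_ID = -1  # hypothetical pad marker for labels; a real setup would reserve a vocab id


def pad_sequences(seqs, length, pad_id=PAD_ID):
    # Right-pad every sequence of ids to `length` with pad_id.
    out = np.full((len(seqs), length), pad_id, dtype=np.int64)
    for i, s in enumerate(seqs):
        out[i, :len(s)] = s
    return out


def masked_mean_cross_entropy(logits, labels, pad_id=PAD_ID):
    # logits: [batch, time, vocab]; labels: [batch, time] with pad_id at padded positions.
    mask = labels != pad_id
    safe_labels = np.where(mask, labels, 0)  # any valid index; masked out below
    # Numerically stable log-softmax along the vocab axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    nll = -np.take_along_axis(log_probs, safe_labels[..., None], axis=-1)[..., 0]
    # Average only over real (unpadded) positions.
    return (nll * mask).sum() / mask.sum()


padded = pad_sequences([[1, 2], [0]], length=3)
rng = np.random.default_rng(0)
example_logits = rng.normal(size=(2, 3, 4))
loss = masked_mean_cross_entropy(example_logits, padded)
```

Because padded positions are zeroed out before averaging, whatever the model predicts there has no effect on the loss.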

In [13]:
example_dataset = preprocess(raw_example_dataset)
print(example_dataset.element_spec)

(TensorSpec(shape=(8, 100), dtype=tf.int64, name=None), TensorSpec(shape=(8, 100), dtype=tf.int64, name=None))


# Compile the model and test on the preprocessed data
We loaded an uncompiled Keras model, but in order to run `keras_model.evaluate`, we need to compile it with a loss and metrics. We do not compile in an optimizer here: the on-device (client) optimizer used in federated learning is supplied later, through `client_optimizer_fn`, when building the federated averaging process.

The original tutorial didn't have char-level accuracy (the fraction of predictions where the highest probability was put on the correct next char). This is a useful metric, so we add it. However, we need to define a new metric class for this because our predictions have rank 3 (a vector of logits for each of the **BATCH_SIZE** * **SEQ_LENGTH** predictions), and SparseCategoricalAccuracy expects only rank 2 predictions.
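The flattening idea behind the accuracy metric can be checked in isolation. The NumPy sketch below is a hypothetical helper (`flattened_char_accuracy`), not the Keras metric class: it reshapes rank-3 logits of shape `[BATCH_SIZE, SEQ_LENGTH, vocab_size]` into one row per prediction, takes the highest-scoring character, and compares it with the flattened labels.

```python
import numpy as np


def flattened_char_accuracy(labels, logits):
    # labels: [batch, seq_len] int ids; logits: [batch, seq_len, vocab_size].
    vocab_size = logits.shape[-1]
    flat_logits = logits.reshape(-1, vocab_size)  # [batch * seq_len, vocab_size]
    flat_labels = labels.reshape(-1)              # [batch * seq_len]
    predicted = flat_logits.argmax(axis=-1)       # highest-scoring char per position
    return float((predicted == flat_labels).mean())


labels = np.array([[0, 1], [2, 3]])
perfect_logits = np.eye(4)[labels]   # one-hot "logits" that always pick the right char
one_wrong = perfect_logits.copy()
one_wrong[1, 1] = np.eye(4)[0]       # last position now predicts char 0 instead of 3
```

With one of four predictions wrong, the sketch reports an accuracy of 0.75.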

In [14]:
class FlattenedCategoricalAccuracy(tf.keras.metrics.SparseCategoricalAccuracy):

  def __init__(self, name='accuracy', dtype=tf.float32):
    super().__init__(name, dtype=dtype)

  def update_state(self, y_true, y_pred, sample_weight=None):
    y_true = tf.reshape(y_true, [-1, 1])
    y_pred = tf.reshape(y_pred, [-1, len(vocab), 1])
    return super().update_state(y_true, y_pred, sample_weight)

In [15]:
BATCH_SIZE = 8  # The training and eval batch size for the rest of this tutorial.
keras_model = load_model(batch_size=BATCH_SIZE)
keras_model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=[FlattenedCategoricalAccuracy()])

# Confirm that accuracy is much higher on Shakespeare than on random data
loss, accuracy = keras_model.evaluate(example_dataset.take(5), verbose=0)
print(
    'Evaluating on an example Shakespeare character: {a:3f}'.format(a=accuracy))

# As a sanity check, we can construct some completely random data, where we expect
# the accuracy to be essentially random:
random_guessed_accuracy = 1.0 / len(vocab)
print('Expected accuracy for random guessing: {a:.3f}'.format(
    a=random_guessed_accuracy))
random_indexes = np.random.randint(
    low=0, high=len(vocab), size=1 * BATCH_SIZE * (SEQ_LENGTH + 1))
data = collections.OrderedDict(
    snippets=tf.constant(
        ''.join(np.array(vocab)[random_indexes]), shape=[1, 1]))
random_dataset = preprocess(tf.data.Dataset.from_tensor_slices(data))
loss, accuracy = keras_model.evaluate(random_dataset, steps=10, verbose=0)
print('Evaluating on completely random data: {a:.3f}'.format(a=accuracy))

Downloading data from https://storage.googleapis.com/tff-models-public/dickens_rnn.batch8.kerasmodel
Evaluating on an example Shakespeare character: 0.395250
Expected accuracy for random guessing: 0.012




Evaluating on completely random data: 0.006


TFF serializes all TensorFlow computations so they can potentially be run in a non-Python environment (even though, at the moment, only a simulation runtime implemented in Python is available). Even though we are running in eager mode (TF 2.0), TFF currently serializes TensorFlow computations by constructing the necessary ops inside the context of a `with tf.Graph().as_default()` statement. Thus, we need to provide a function that TFF can use to introduce our model into a graph it controls.

In [16]:
# Clone the keras_model inside `create_tff_model()`, which TFF will
# call to produce a new copy of the model inside the graph that it will
# serialize. Note: we want to construct all the necessary objects we'll need
# _inside_ this method.
def create_tff_model():
  # TFF uses an `input_spec` so it knows the types and shapes
  # that your model expects.
  input_spec = example_dataset.element_spec
  keras_model_clone = tf.keras.models.clone_model(keras_model)
  return tff.learning.models.from_keras_model(
      keras_model_clone,
      input_spec=input_spec,
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[FlattenedCategoricalAccuracy()])

In [17]:
# This command builds all the TensorFlow graphs and serializes them:
fed_avg = tff.learning.algorithms.build_weighted_fed_avg(
    model_fn=create_tff_model,
    client_optimizer_fn=tff.learning.optimizers.build_sgdm(learning_rate=0.5))
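Conceptually, weighted federated averaging combines the clients' locally trained models, weighting each client by the number of examples it trained on. The NumPy sketch below illustrates only that aggregation step under simplifying assumptions (every client starts from the same server weights and the server simply adopts the weighted average); `weighted_fed_avg` is a hypothetical helper, not TFF's implementation, which also handles the client and server optimizers and the aggregation machinery.

```python
import numpy as np


def weighted_fed_avg(client_weights, client_num_examples):
    # client_weights: one list of np.ndarray layer tensors per client.
    # client_num_examples: number of training examples each client used.
    total = float(sum(client_num_examples))
    coeffs = [n / total for n in client_num_examples]
    # Average layer by layer, weighting each client's tensors by its share of examples.
    return [
        sum(c * w[layer] for c, w in zip(coeffs, client_weights))
        for layer in range(len(client_weights[0]))
    ]


# Two hypothetical clients, each with a single-layer "model".
client_a = [np.array([0.0, 0.0])]
client_b = [np.array([3.0, 6.0])]
averaged = weighted_fed_avg([client_a, client_b], client_num_examples=[2, 1])
```

Client A trained on twice as many examples, so the average lands two thirds of the way toward its weights.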

In [18]:
# Run the iterative process for one round and print the training metrics.
state = fed_avg.initialize()
result = fed_avg.next(state, [example_dataset.take(5)])
state = result.state
train_metrics = result.metrics['client_work']['train']
print('loss={l:.3f}, accuracy={a:.3f}'.format(
    l=train_metrics['loss'], a=train_metrics['accuracy']))

loss=4.402, accuracy=0.132


In [19]:
def data(client, source=train_data):
  return preprocess(source.create_tf_dataset_for_client(client)).take(5)


clients = [
    'ALL_S_WELL_THAT_ENDS_WELL_CELIA', 'MUCH_ADO_ABOUT_NOTHING_OTHELLO',
]

train_datasets = [data(client) for client in clients]

# We concatenate the test datasets for evaluation with Keras by creating a
# Dataset of Datasets, and then identity flat mapping across all the examples.
test_dataset = tf.data.Dataset.from_tensor_slices(
    [data(client, test_data) for client in clients]).flat_map(lambda x: x)
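In plain Python, the "Dataset of Datasets plus identity `flat_map`" pattern is just a flattening concatenation. The sketch below is an analogy only; the per-client batch names are hypothetical placeholders for the per-client `tf.data` datasets.

```python
import itertools

# Hypothetical per-client lists of batches standing in for per-client datasets.
client_batches = [["celia_batch_0", "celia_batch_1"], ["othello_batch_0"]]

# Analogue of Dataset.from_tensor_slices([...]).flat_map(lambda x: x):
# one stream that yields every client's batches in order.
all_batches = list(itertools.chain.from_iterable(client_batches))
```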

In [20]:
NUM_ROUNDS = 5

# The state of the FL server, containing the model and optimization state.
state = fed_avg.initialize()

# Load our pre-trained Keras model weights into the global model state.
pre_trained_weights = tff.learning.models.ModelWeights(
    trainable=[v.numpy() for v in keras_model.trainable_weights],
    non_trainable=[v.numpy() for v in keras_model.non_trainable_weights]
)
state = fed_avg.set_model_weights(state, pre_trained_weights)


def keras_evaluate(state, round_num):
  # Take our global model weights and push them back into a Keras model to
  # use its standard `.evaluate()` method.
  keras_model = load_model(batch_size=BATCH_SIZE)
  keras_model.compile(
      loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
      metrics=[FlattenedCategoricalAccuracy()])
  model_weights = fed_avg.get_model_weights(state)
  model_weights.assign_weights_to(keras_model)
  loss, accuracy = keras_model.evaluate(example_dataset, steps=2, verbose=0)
  print('\tEval: loss={l:.3f}, accuracy={a:.3f}'.format(l=loss, a=accuracy))


for round_num in range(NUM_ROUNDS):
  print('Round {r}'.format(r=round_num))
  keras_evaluate(state, round_num)
  result = fed_avg.next(state, train_datasets)
  state = result.state
  train_metrics = result.metrics['client_work']['train']
  print('\tTrain: loss={l:.3f}, accuracy={a:.3f}'.format(
      l=train_metrics['loss'], a=train_metrics['accuracy']))

print('Final evaluation')
keras_evaluate(state, NUM_ROUNDS + 1)

Round 0
	Eval: loss=3.260, accuracy=0.421
	Train: loss=4.303, accuracy=0.117
Round 1
	Eval: loss=4.196, accuracy=0.166
	Train: loss=4.063, accuracy=0.207
Round 2
	Eval: loss=4.049, accuracy=0.159
	Train: loss=3.812, accuracy=0.230
Round 3




	Eval: loss=3.843, accuracy=0.176
	Train: loss=3.715, accuracy=0.207
Round 4




	Eval: loss=3.748, accuracy=0.174
	Train: loss=3.509, accuracy=0.233
Final evaluation
	Eval: loss=3.668, accuracy=0.173


These results show that five rounds of training on two clients (at most five batches each) are not enough to improve on the pre-trained model; in fact, evaluation accuracy falls from 0.421 before the first round to 0.173 after the last one. If we train longer on more Shakespeare data, however, we should see a difference in the style of the text generated with the updated model:



In [21]:
# Copy the federally trained global weights from the server state into the batch_size=1 model.
fed_avg.get_model_weights(state).assign_weights_to(keras_model_batch1)
# Text generation requires batch_size=1
print(generate_text(keras_model_batch1, 'What of TensorFlow Federated, you ask? '))

What of TensorFlow Federated, you ask? I write on the tops of the descent. "He must only for
his shoulders at the Bar, that I _to_ say."

"That's the cress, and had crooped on his chair, and strong he could have recold, under
close its
