# Federated Learning for Text Generation

For this tutorial, we start with a RNN that generates ASCII characters, and refine it via federated learning. We also show how the final weights can be fed back to the original Keras model, allowing easy evaluation and text generation using standard tools.

## Before begining, run the following to verify that environment is set up correctly

In [1]:
!pip install --quiet --upgrade tensorflow-federated-nightly
!pip install --quiet --upgrade nest-asyncio

import nest_asyncio
nest_asyncio.apply()

ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device



In [2]:
import collections
import functools
import os
import time

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff

np.random.seed(0)

# Test the TFF is working:
tff.federated_computation(lambda: 'Hello, World!')()

b'Hello, World!'

## Load a pre-trained model

We load a model that was pre-trained following the TensorFlow tutorial Text generation using a RNN with eager execution.

However, rather than training on The Complete Works of Shakespeare, we pre-trained the model on the text from the Charles Dickens' A Tale of Two Cities and A Christmas Carol.

#### Generate the vocab lookup tables

In [58]:
# A fixed vocabularly of ASCII chars that occur in the works of Shakespeare and Dickens:
vocab = list('dhlptx@DHLPTX $(,048cgkoswCGKOSW[_#\'/37;?bfjnrvzBFJNRVZ"&*.26:\naeimquyAEIMQUY]!%)-159\r')

# Creating a mapping from unique characters to indices
char2idx = {u:i for i, u in enumerate(vocab)}
idx2char = np.array(vocab)

#### Load the pre-trained model and generate some text

In [60]:
def generate_text(model, start_string):
    # From https://www.tensorflow.org/tutorials/sequences/text_generation
    num_generate = 200
    input_eval = [char2idx[s] for s in start_string]
    input_eval = tf.expand_dims(input_eval, 0)
    text_generated = []
    temperature = 1.0

    model.reset_states()
    for i in range(num_generate):
        predictions = model(input_eval)
        predictions = tf.squeeze(predictions, 0)
        predictions = predictions / temperature
        predicted_id = tf.random.categorical(predictions, num_samples=1)[-1, 0].numpy()
        input_eval = tf.expand_dims([predicted_id], 0)
        text_generated.append(idx2char[predicted_id])

    return (start_string + ''.join(text_generated))

In [61]:
# Text generation requires a batch_size=1 model.
keras_model_batch1 = load_model(batch_size=1)
print(generate_text(keras_model_batch1, 'What of TensorFlow Federated, you ask? '))

Downloading data from https://storage.googleapis.com/tff-models-public/dickens_rnn.batch1.kerasmodel
What of TensorFlow Federated, you ask? The case of
Monseigneur, in cruel separation and step, Darnay had side forty feet having happened to Miss Pross's pride,
Some of the adjoining passages of Corways, presented by her head again.

Lu


## Load and Preprocess the Federated Shakespeare Data

The tff.simulation.datasets package provides a variety of datasets that are split into "clients", where each client corresponds to a dataset on a particular device that might participate in federated learning.

In [62]:
train_data, test_data = tff.simulation.datasets.shakespeare.load_data()

Downloading data from https://storage.googleapis.com/tff-datasets-public/shakespeare.tar.bz2


The datasets provided by shakespeare.load_data() consist of a sequence of string Tensors, one for each line spoken by a particular character in a Shakespeare play. The client keys consist of the name of the play joined with the name of the character, so for example MUCH_ADO_ABOUT_NOTHING_OTHELLO corresponds to the lines for the character Othello in the play Much Ado About Nothing. 

In [63]:
# Here the play is "The Tragedy of King Lear" and the character is "King".
raw_example_dataset = train_data.create_tf_dataset_for_client('THE_TRAGEDY_OF_KING_LEAR_KING')
# To allow for future extensions, each entry x is an OrderedDict with a single key 'snippets' which contains the text.
for x in raw_example_dataset.take(2):
    print(x['snippets'])

tf.Tensor(b"Live regist'red upon our brazen tombs,\nAnd then grace us in the disgrace of death;\nWhen, spite of cormorant devouring Time,\nTh' endeavour of this present breath may buy\nThat honour which shall bate his scythe's keen edge,\nAnd make us heirs of all eternity.\nTherefore, brave conquerors- for so you are\nThat war against your own affections\nAnd the huge army of the world's desires-\nOur late edict shall strongly stand in force:\nNavarre shall be the wonder of the world;\nOur court shall be a little Academe,\nStill and contemplative in living art.\nYou three, Berowne, Dumain, and Longaville,\nHave sworn for three years' term to live with me\nMy fellow-scholars, and to keep those statutes\nThat are recorded in this schedule here.\nYour oaths are pass'd; and now subscribe your names,\nThat his own hand may strike his honour down\nThat violates the smallest branch herein.\nIf you are arm'd to do as sworn to do,\nSubscribe to your deep oaths, and keep it too.\nYour oath is pa

##### We now use tf.data.Dataset transformations to prepare this data for training the char RNN loaded above.

In [64]:
# Input pre-processing parameters
SEQ_LENGTH = 100
BATCH_SIZE = 8
BUFFER_SIZE = 100  # For dataset shuffling

In [65]:
# Construct a lookup table to map string chars to indexes, using the vocab loaded above:
table = tf.lookup.StaticHashTable(tf.lookup.KeyValueTensorInitializer(keys=vocab, values=tf.constant(list(range(len(vocab))),dtype=tf.int64)),default_value=0)

def to_ids(x):
    s = tf.reshape(x['snippets'], shape=[1])
    chars = tf.strings.bytes_split(s).values
    ids = table.lookup(chars)
    return ids


def split_input_target(chunk):
    input_text = tf.map_fn(lambda x: x[:-1], chunk)
    target_text = tf.map_fn(lambda x: x[1:], chunk)
    return (input_text, target_text)


def preprocess(dataset):
    return (dataset.map(to_ids).unbatch().batch(SEQ_LENGTH + 1, drop_remainder=True).shuffle(BUFFER_SIZE).batch(BATCH_SIZE, drop_remainder=True).map(split_input_target))

##### Now we can preprocess our raw_example_dataset, and check the types:

In [66]:
example_dataset = preprocess(raw_example_dataset)
print(example_dataset.element_spec)

(TensorSpec(shape=(8, 100), dtype=tf.int64, name=None), TensorSpec(shape=(8, 100), dtype=tf.int64, name=None))


## Compile the model and test on the preprocessed data

In [68]:
class FlattenedCategoricalAccuracy(tf.keras.metrics.SparseCategoricalAccuracy):
    

    def __init__(self, name='accuracy', dtype=tf.float32):
        super().__init__(name, dtype=dtype)

    def update_state(self, y_true, y_pred, sample_weight=None):
        y_true = tf.reshape(y_true, [-1, 1])
        y_pred = tf.reshape(y_pred, [-1, len(vocab), 1])
        return super().update_state(y_true, y_pred, sample_weight)

In [69]:
BATCH_SIZE = 8  # The training and eval batch size for the rest of this tutorial.
keras_model = load_model(batch_size=BATCH_SIZE)
keras_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=[FlattenedCategoricalAccuracy()])

# Confirm that loss is much lower on Shakespeare than on random data
loss, accuracy = keras_model.evaluate(example_dataset.take(5), verbose=0)
print('Evaluating on an example Shakespeare character: {a:3f}'.format(a=accuracy))

# As a sanity check, we can construct some completely random data, where we expect the accuracy to be essentially random:
random_guessed_accuracy = 1.0 / len(vocab)
print('Expected accuracy for random guessing: {a:.3f}'.format(a=random_guessed_accuracy))
random_indexes = np.random.randint(low=0, high=len(vocab), size=1 * BATCH_SIZE * (SEQ_LENGTH + 1))
data = collections.OrderedDict(snippets=tf.constant(''.join(np.array(vocab)[random_indexes]), shape=[1, 1]))
random_dataset = preprocess(tf.data.Dataset.from_tensor_slices(data))
loss, accuracy = keras_model.evaluate(random_dataset, steps=10, verbose=0)
print('Evaluating on completely random data: {a:.3f}'.format(a=accuracy))

Downloading data from https://storage.googleapis.com/tff-models-public/dickens_rnn.batch8.kerasmodel
Evaluating on an example Shakespeare character: 0.406750
Expected accuracy for random guessing: 0.012




Evaluating on completely random data: 0.007


## Fine-tune the model with Federated Learning

In [70]:
# Clone the keras_model inside `create_tff_model()`, which TFF will call to produce a new copy of the model inside the graph that it will serialize.

def create_tff_model():
    # TFF uses an `input_spec` so it knows the types and shapes that your model expects.
    input_spec = example_dataset.element_spec
    keras_model_clone = tf.keras.models.clone_model(keras_model)
    return tff.learning.from_keras_model(keras_model_clone,input_spec=input_spec,loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=[FlattenedCategoricalAccuracy()])

Now we are ready to construct a Federated Averaging iterative process, which we will use to improve the model.

In [71]:
# This command builds all the TensorFlow graphs and serializes them: 
fed_avg = tff.learning.build_federated_averaging_process(model_fn=create_tff_model,client_optimizer_fn=lambda: tf.keras.optimizers.SGD(lr=0.5))





Here is the simplest possible loop, where we run federated averaging for one round on a single client on a single batch:

In [72]:
state = fed_avg.initialize()
state, metrics = fed_avg.next(state, [example_dataset.take(5)])
train_metrics = metrics['train']
print('loss={l:.3f}, accuracy={a:.3f}'.format(l=train_metrics['loss'], a=train_metrics['accuracy']))

loss=4.402, accuracy=0.134


Now let's write a slightly more interesting training and evaluation loop.

So that this simulation still runs relatively quickly, we train on the same three clients each round, only considering two minibatches for each.

In [73]:
def data(client, source=train_data):
    return preprocess(source.create_tf_dataset_for_client(client)).take(5)

clients = ['ALL_S_WELL_THAT_ENDS_WELL_CELIA', 'MUCH_ADO_ABOUT_NOTHING_OTHELLO',]
train_datasets = [data(client) for client in clients]

# We concatenate the test datasets for evaluation with Keras by creating a dataset of Datasets, and then identity flat mapping across all the examples.
test_dataset = tf.data.Dataset.from_tensor_slices([data(client, test_data) for client in clients]).flat_map(lambda x: x)

The initial state of the model produced by fed_avg.initialize() is based on the random initializers for the Keras model, not the weights that were loaded, since clone_model() does not clone the weights. To start training from a pre-trained model, we set the model weights in the server state directly from the loaded model.

In [74]:
NUM_ROUNDS = 5

state = fed_avg.initialize()

state = tff.learning.state_with_new_model_weights(state,trainable_weights=[v.numpy() for v in keras_model.trainable_weights],non_trainable_weights=[v.numpy() for v in keras_model.non_trainable_weights])


def keras_evaluate(state, round_num):
    keras_model = load_model(batch_size=BATCH_SIZE)
    keras_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),metrics=[FlattenedCategoricalAccuracy()])
    state.model.assign_weights_to(keras_model)
    loss, accuracy = keras_model.evaluate(example_dataset, steps=2, verbose=0)
    print('\tEval: loss={l:.3f}, accuracy={a:.3f}'.format(l=loss, a=accuracy))


for round_num in range(NUM_ROUNDS):
    print('Round {r}'.format(r=round_num))
    keras_evaluate(state, round_num)
    state, metrics = fed_avg.next(state, train_datasets)
    train_metrics = metrics['train']
    print('\tTrain: loss={l:.3f}, accuracy={a:.3f}'.format(l=train_metrics['loss'], a=train_metrics['accuracy']))

print('Final evaluation')
keras_evaluate(state, NUM_ROUNDS + 1)

Round 0
	Eval: loss=3.042, accuracy=0.441
	Train: loss=4.329, accuracy=0.090
Round 1
	Eval: loss=4.292, accuracy=0.156
	Train: loss=4.184, accuracy=0.198
Round 2
	Eval: loss=4.169, accuracy=0.153
	Train: loss=4.051, accuracy=0.204
Round 3




	Eval: loss=4.041, accuracy=0.194
	Train: loss=3.920, accuracy=0.211
Round 4




	Eval: loss=3.975, accuracy=0.164
	Train: loss=3.810, accuracy=0.210
Final evaluation




	Eval: loss=3.857, accuracy=0.172


With the default changes, we haven't done enough training to make a big difference, but if you train longer on more Shakespeare data, you should see a difference in the style of the text generated with the updated model:

In [75]:
# Set our newly trained weights back in the originally created model.
keras_model_batch1.set_weights([v.numpy() for v in keras_model.weights])

print(generate_text(keras_model_batch1, 'What of TensorFlow Federated, you ask? '))

What of TensorFlow Federated, you ask? Say whend
her father took him if I could hope, five fure
and drank words of very day, and a barrier be marrier--said:

"That's all the sight of me, and myself, Mr. Carton?"

And smooth in fast u
