<a href="https://colab.research.google.com/github/MariaPdg/T5-classification/blob/master/xnli_transformer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Fine-Tuning the Text-To-Text Transfer Transformer (T5) for Language Classification**

## _Or: What's the language of a given sentence?_

_Here we demonstrate how to fine-tune a pre-trained T5 model, evaluate its accuracy, and use it for a language prediction task on a GPU with TensorFlow._

This notebook shows how to:
* Preprocess the [XNLI](https://www.tensorflow.org/datasets/catalog/xnli) dataset
* Create a new task with `tf.data.Dataset`
* Fine-tune a T5 model on the new task
* Log training with TensorBoard
* Make predictions with the trained model

## Background

T5 was introduced by C. Raffel et al. in the paper [_Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer_](https://arxiv.org/abs/1910.10683). T5, the Text-To-Text Transfer Transformer, casts text-based language problems into a text-to-text format. Using the [_Colossal Clean Crawled Corpus (C4)_](https://www.tensorflow.org/datasets/catalog/c4) for pre-training, the authors achieved state-of-the-art results on benchmarks covering summarization, question answering, text classification, and other tasks.

In this notebook I aim to give a brief overview of T5, explain some of its implications for NLP, and demonstrate how it can be used for a language classification task.


## Key points from T5 paper



<div align="center">
    <img src="https://drive.google.com/uc?id=11kyx3wDHptct6n4irAM3IodpFTokY2E4" alt="img1" width="800"/>
</div>

1. **SOTA for text-to-text problems:** T5 treats every NLP task (translation, Q&A, classification) as a text-to-text problem: it accepts an input text sequence, generates an output text sequence, and achieves state-of-the-art (SOTA) results.
2. **Unified framework for NLP deep learning:** Because of *(1)*, T5 can apply the same model, objective, training procedure, and decoding process to any task, e.g. translation, sentence acceptability judgment, Q&A, and summarization (see the figure above).
3. **Multi-task learning** trains multiple tasks simultaneously. However, it does not guarantee that all tasks reach their best performance at the same checkpoint.
4. **Pre-training objective:** a denoising objective (a variant of masked language modelling). The model is trained to predict missing or otherwise corrupted tokens in the input: spans of corrupted input tokens are replaced by "sentinel" tokens (X), and the output sequence contains the dropped-out spans, each preceded by the sentinel that replaced it.

<div align="center">
    <img src="https://drive.google.com/uc?id=1DxLchCUd-pevb7ISMYQXJQhOmVu7apHJ" alt="img1" width="500"/>
</div>

5. **Full encoder-decoder Transformer architecture:** The encoder and the decoder are each a stack of N identical layers (N=6 in the original Transformer shown below). Each encoder layer has two sublayers: multi-head self-attention and a position-wise fully-connected (FC) feed-forward network; each decoder layer additionally attends to the encoder output. The Transformer also uses residual connections and layer normalization.

<div align="center">
    <img src="https://drive.google.com/uc?id=11E8FqW0GT8BBYRtGXvv971inC9T2g9JF" alt="img1" width="450"/>
</div>
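The denoising objective from point 4 can be sketched in a few lines of Python. This is an illustration, not T5's implementation: the spans are chosen by hand here, whereas the paper samples them at random (about 15% of the tokens, with a mean span length of 3). The sentinel names `<extra_id_N>` follow the T5 vocabulary's convention; the paper writes them as `<X>`, `<Y>`, `<Z>`. The helper `span_corrupt` is hypothetical.

```python
from typing import List, Tuple


def span_corrupt(tokens: List[str], spans: List[Tuple[int, int]]) -> Tuple[List[str], List[str]]:
    """Replace each non-overlapping [start, end) span with one sentinel.

    Returns (inputs, targets): the inputs keep the uncorrupted tokens, the
    targets list every dropped span preceded by its sentinel.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(sorted(spans)):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep tokens before the span
        inputs.append(sentinel)              # one sentinel per dropped span
        targets.append(sentinel)             # target: sentinel, then dropped tokens
        targets.extend(tokens[start:end])
        cursor = end
    inputs.extend(tokens[cursor:])                # tail after the last span
    targets.append(f"<extra_id_{len(spans)}>")    # final sentinel closes the targets
    return inputs, targets


tokens = "Thank you for inviting me to your party last week .".split()
inputs, targets = span_corrupt(tokens, [(2, 4), (8, 9)])
print(" ".join(inputs))   # Thank you <extra_id_0> me to your party <extra_id_1> week .
print(" ".join(targets))  # <extra_id_0> for inviting <extra_id_1> last <extra_id_2>
```

The output reproduces the span-corruption example from the T5 paper.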

## Summary

T5 is a unified framework for multiple NLP tasks: it converts text-based problems into a text-to-text format and generates textual outputs. Given T5's SOTA results on several NLP problems, we expect that applying it to language classification can be advantageous (e.g. lower time, compute, and storage costs for training and inference).


In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

Mounted at /content/gdrive


# Setup

In [None]:
#@title 
import functools
import itertools
import os
import re
import time


Install the required packages

In [None]:
!pip install mesh-tensorflow
!pip install t5
!pip install datasets transformers
!pip install --upgrade tensorflow-datasets

Collecting mesh-tensorflow
  Downloading mesh_tensorflow-0.1.19-py3-none-any.whl (366 kB)

In [None]:
# pip install tensorflow-datasets==1.2

In [None]:
from absl import logging
import mesh_tensorflow.transformer.dataset as transformer_dataset
import t5.data
from t5.models.t5_model import T5Model
import tensorflow_datasets as tfds
import torch
import torch.utils.tensorboard

# CHECKPOINT_FILE_FORMAT = "model-{}.checkpoint"


In [None]:
import functools
import t5
import torch
import transformers
import t5.data.mixtures

In [None]:
if torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

In [None]:
# model = t5.models.HfPyTorchModel("t5-small", "/tmp/hft5/", device)

Define some global parameters

**Note:** the whole validation set (all 15 languages) contains 37,350 samples.

In [None]:
import pickle 
import os
import argparse
import numpy as np
from sklearn.preprocessing import OneHotEncoder
import tensorflow as tf
import seqio 

MAIN_DIR = '/content/gdrive/MyDrive/Colab/xnli-transformer'#@param {type: "string"}
DATA_DIR = os.path.join(MAIN_DIR, 'data')
VALID_SIZE = 37350 #@param [100, 500, 37350] {type:"raw"} 
TRAIN_SIZE = 5890530 

# MODELS_DIR = os.path.join('/content/gdrive/MyDrive/Colab/xnli-transformer')
SAVE = True #@param {type: "boolean"}
FINETUNE_STEPS =  2000#@param {type: "integer"}

input_length = 512 #@param [256, 512] {type:"raw"}
target_length =  8 #@param [4, 8, 16, 32] {type:"raw"}
languages = ['en', 'fr', 'de', 'ru', 'es', 'bg', 'sw', 'el'] 


Define some global parameters based on the inputs above

In [None]:
parser = argparse.ArgumentParser()
# parser.add_argument("--data_dir", default='/content/gdrive/MyDrive/Colab/', type=str,
#                         help="The input data dir. Should contain the .tsv files (or other data files) for the task.")
parser.add_argument("--input_length", default=input_length, type=int,
                        help="Length of input sentences (number of tokens)")
parser.add_argument("--target_length", default=target_length, type=int,
                        help="Length of targets (number of tokens)")
parser.add_argument("--languages", default=languages, nargs="+", type=str,
                        help="Language codes to keep (or 'all')")
args = parser.parse_args(args=[])

In [None]:
if languages == 'all':
  TASK_NAME = 'xnli_{}_{}'.format(args.input_length, args.target_length)
  TRAIN_TSV = os.path.join(DATA_DIR, 'train.tsv')
  VALID_TSV = os.path.join(DATA_DIR, 'valid_{}.tsv'.format(VALID_SIZE))
else:
  TASK_NAME = 'xnli_{}l_{}_{}'.format(len(args.languages), args.input_length, args.target_length)
  TRAIN_TSV = os.path.join(DATA_DIR, 'train_{}l.tsv'.format(len(args.languages)))
  VALID_TSV = os.path.join(DATA_DIR, 'valid_{}l.tsv'.format((len(args.languages))))


In [None]:
if not os.path.exists(DATA_DIR):
    os.makedirs(DATA_DIR)

# Data preprocessing

We use [XNLI](https://www.tensorflow.org/datasets/catalog/xnli), which contains sentences in 15 languages: English, French, Spanish, German, Greek, Bulgarian, Russian, Turkish, Arabic, Vietnamese, Thai, Chinese, Hindi, Swahili and Urdu.

Each sample of the dataset is a dictionary with the keys `hypothesis` (languages and their translations), `label` ('entailment', 'neutral', 'contradiction'), and `premise` ({language: translation} pairs).

For the language classification task, we only need the premise sentences and their language labels, so we store only this information as our dataset in TSV format.
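A minimal sketch of this reduction, assuming the Hugging Face `all_languages` layout in which `premise` maps each language code to its sentence. The toy premise and the helper name `premise_to_rows` are hypothetical; the notebook's `clean` function below does the same flattening plus some punctuation fixes.

```python
from typing import Dict, Iterable, List, Optional


def premise_to_rows(premise: Dict[str, str],
                    languages: Optional[Iterable[str]] = None) -> List[Dict[str, str]]:
    """Turn one {language: sentence} premise into {"sentence", "target"} rows."""
    keep = None if languages is None else set(languages)
    return [
        {"sentence": sentence, "target": lang}
        for lang, sentence in premise.items()
        if keep is None or lang in keep
    ]


# Hypothetical toy premise (a real XNLI premise has 15 languages).
premise = {
    "de": "Das Wetter ist heute schön.",
    "en": "The weather is nice today.",
    "fr": "Il fait beau aujourd'hui.",
    "ru": "Сегодня хорошая погода.",
}
rows = premise_to_rows(premise, languages=["en", "fr", "de"])
for row in rows:
    print(f"{row['target']}\t{row['sentence']}")
```

Each premise yields one row per kept language, which is why the classes end up balanced.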


In [None]:
# from transformers import AutoTokenizer, TFT5ForConditionalGeneration
from datasets import load_dataset

We load the whole dataset only if `SAVE` is enabled, i.e. if the processed data have not been saved before.

In [None]:
if SAVE:
  train_dataset = load_dataset('xnli', 'all_languages', split='train')
  valid_dataset = load_dataset('xnli', 'all_languages', split='validation')
  # Look at the features and an example from the loaded dataset.
  print(train_dataset.features)
  data = next(iter(train_dataset))
  print("Example data from the dataset: \n", data)

Downloading and preparing dataset xnli/all_languages (download: 461.54 MiB, generated: 1.50 GiB, post-processed: Unknown size, total: 1.95 GiB) to /root/.cache/huggingface/datasets/xnli/all_languages/1.1.0/243f155ecab4d4f6e82e4eeab62b8c6b1f7abfcb8ed7fcc1661be8e25b117404...

In [None]:
def clean(sample, languages='all'):
    outputs = []
    for k, ans in zip(sample['premise'].keys(), sample['premise'].values()):
        # Remove incorrect spacing around punctuation.
        ans = ans.replace(" ,", ",").replace(" .", ".").replace(" %", "%")
        ans = ans.replace(" - ", "-").replace(" : ", ":").replace(" / ", "/")
        ans = ans.replace("( ", "(").replace(" )", ")")
        ans = ans.replace("`` ", "\"").replace(" ''", "\"")
        ans = ans.replace(" 's", "'s").replace("s ' ", "s' ")
        if languages == 'all' or k in languages:
          outputs.append({'sentence': ans, 'target': k})
    return outputs

def create_data_list(dataset, languages='all'):
    outputs = []
    for sample in (dataset):
      s = clean(sample, languages)
      outputs.extend(s)
    return outputs

In [None]:
# if processed data were not saved before
if SAVE:
  train_outputs = create_data_list(train_dataset, languages=args.languages)
  valid_outputs = create_data_list(valid_dataset, languages=args.languages)
  print(len(train_outputs))
  print(len(valid_outputs))
  print(train_outputs[:5])
  print(valid_outputs[:5])

We store the processed data in TSV format if the files do not exist yet.

In [None]:
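import re

def write_tsv(path, rows):
  """Write one '<sentence><TAB><target>' line per row, without CSV quoting.

  nq_dataset_fn reads the file with use_quote_delim=False, so quotes added
  by csv.DictWriter would end up verbatim in the model inputs.
  """
  with open(path, 'w', encoding='utf-8') as output_file:
    for row in rows:
      # Tabs/newlines inside a sentence would break the one-line-per-example format.
      sentence = re.sub(r'[\t\r\n]+', ' ', row['sentence'])
      output_file.write('{}\t{}\n'.format(sentence, row['target']))

if not os.path.isfile(TRAIN_TSV):
  write_tsv(TRAIN_TSV, train_outputs)

if not os.path.isfile(VALID_TSV):
  write_tsv(VALID_TSV, valid_outputs[:VALID_SIZE])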


Now we load the data stored in TSV format and look at some examples. `pandas` provides a convenient way to do this.

In [None]:
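import csv
import pandas as pd
# QUOTE_NONE reads quote characters literally, like the tf.data reader used for training.
valid_pd = pd.read_csv(VALID_TSV, sep='\t', names=['sentence', 'target'], quoting=csv.QUOTE_NONE)
train_pd = pd.read_csv(TRAIN_TSV, sep='\t', names=['sentence', 'target'], quoting=csv.QUOTE_NONE)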

In [None]:
valid_pd[:10].style.set_properties(**{'text-align': 'left'})

Note that the resulting dataset is well balanced, i.e. each language (class) has the same number of samples. However, the rows follow a strong, repeating language order, since every premise is expanded into one row per language.
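Both properties are easy to check with pandas. A sketch on a made-up frame with the same two columns as `train_pd`: `value_counts` shows the per-language counts, and `sample(frac=1, random_state=...)` returns a reproducibly shuffled copy that breaks the repeating language order. On the real data, `train_pd['target'].value_counts()` would be the equivalent check.

```python
import pandas as pd

# Toy frame in the same two-column layout as train_pd / valid_pd (values are made up).
df = pd.DataFrame({
    "sentence": ["s1", "s2", "s3", "s4", "s5", "s6"],
    "target": ["de", "en", "fr", "de", "en", "fr"],  # repeating language order
})

# Class balance: every language should appear equally often.
counts = df["target"].value_counts()
print(counts)

# A full, reproducible shuffle of all rows breaks the repeating order.
shuffled = df.sample(frac=1, random_state=0).reset_index(drop=True)
print(shuffled["target"].tolist())
```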

# Creating new Task

T5 uses [seqIO](https://github.com/google/seqio) to manage data pipelines and evaluation metrics. seqIO builds scalable data pipelines on top of `tf.data.Dataset` while requiring minimal use of TensorFlow. Its two core components are `Task` and `Mixture` objects.

A Task is a dataset along with preprocessing functions and evaluation metrics. A Mixture is a collection of Task objects along with a mixing rate or a function defining how to compute a mixing rate based on the properties of the constituent Tasks.
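This project registers a single Task, so no mixing rate is needed. For reference, the T5 paper's examples-proportional mixing sets each Task's rate to $r_m = \min(e_m, K) / \sum_n \min(e_n, K)$, where $e_m$ is the number of examples of Task $m$ and $K$ is an artificial size limit; temperature-scaled mixing raises each rate to the power $1/T$ and renormalizes. A minimal sketch with made-up task sizes (the helper name is hypothetical, not a seqIO function):

```python
from typing import Dict


def examples_proportional_rates(num_examples: Dict[str, int], limit: int,
                                temperature: float = 1.0) -> Dict[str, float]:
    """Mixing rates proportional to min(e_m, K) ** (1 / T), normalized to sum to 1."""
    capped = {name: min(n, limit) ** (1.0 / temperature)
              for name, n in num_examples.items()}
    total = sum(capped.values())
    return {name: c / total for name, c in capped.items()}


# Hypothetical task sizes; the limit K keeps the biggest task from dominating.
sizes = {"big_task": 1_000_000, "medium_task": 100_000, "small_task": 10_000}
rates = examples_proportional_rates(sizes, limit=200_000)
print({name: round(rate, 3) for name, rate in rates.items()})
# {'big_task': 0.645, 'medium_task': 0.323, 'small_task': 0.032}
```

Raising the temperature above 1 moves the rates closer to uniform sampling.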

#### The main seqIO steps:

* Define a Task (and optionally a Mixture).

* Define a FeatureConverter based on the model architecture (or use an existing one).

* Use the top-level function `seqio.get_dataset` to obtain the `tf.data.Dataset` instance.

For this project, we create a Task that classifies sentences by language and fine-tune the model on it.

In order to create the Task, we define a function that loads the TSV data as a `tf.data.Dataset`.

Most of the functions below are modified versions of those in the [T5 TriviaQA example notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/master/notebooks/t5-trivia.ipynb#scrollTo=KPOteeqctpzw).

In [None]:
nq_tsv_path = {
    "train": TRAIN_TSV,
    "validation": VALID_TSV
}

def nq_dataset_fn(split, shuffle_files=False):
  # We only have one file for each split.
  del shuffle_files

  # Load lines from the text file as examples.
  ds = tf.data.TextLineDataset(nq_tsv_path[split])
  # Split each "<sentence>\t<target>" example into a (sentence, target) tuple.
  ds = ds.map(functools.partial(tf.io.decode_csv, record_defaults=["", ""],
              field_delim="\t", use_quote_delim=False),
              num_parallel_calls=tf.data.experimental.AUTOTUNE)
  # Map each tuple to a {"sentence": ... "target": ...} dict.
  ds = ds.map(lambda *ex: dict(zip(["sentence", "target"], ex)))
  return ds

print("A few raw validation examples...")
for ex in tfds.as_numpy(nq_dataset_fn("validation").take(10)):
  print(ex['sentence'].decode('UTF-8'), ex['target'].decode('UTF-8'))

In [None]:
# ds = tfds.load("xnli")
# print("A few raw validation examples...")
# for ex in tfds.as_numpy(ds["validation"].take(1)):
#   print(ex)

Now, we write a preprocessing function that converts the examples in the `tf.data.Dataset` into the text-to-text format, with `inputs` and `targets` fields. The preprocessor also normalizes the text by lowercasing it and removing single quotes (a step inherited from the TriviaQA notebook, where answers are sometimes formatted in odd ways). Finally, each sample is mapped to an input and a target.

In [None]:
def trivia_preprocessor(ds):
  def normalize_text(text):
    # Lowercase and remove quotes from a TensorFlow string
    text = tf.strings.lower(text)
    text = tf.strings.regex_replace(text,"'(.*)'", r"\1")
    return text

  def to_inputs_and_targets(ex):
    # Map {"sentence": ..., "target": ...}->{"inputs": ..., "targets": ...}.
    return {"inputs": normalize_text(ex['sentence']), 
            "targets": normalize_text(ex['target'])}

  return ds.map(to_inputs_and_targets, 
                num_parallel_calls=tf.data.experimental.AUTOTUNE)
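To see what the regular expression in `normalize_text` does, here is a pure-Python approximation (Python's `re` and TensorFlow's RE2 behave the same for this pattern). Because `.*` is greedy, the pattern removes the first and the last single quote in a sentence, not only enclosing quotes, so apostrophes inside words (common in French) are dropped too. This is a sketch, not the TF code path: `str.lower` lowercases all Unicode letters, whereas, as far as we know, `tf.strings.lower` only handles ASCII unless `encoding='utf-8'` is passed.

```python
import re


def normalize_text(text: str) -> str:
    """Pure-Python approximation of the TF normalize_text above."""
    text = text.lower()  # note: Unicode-aware, unlike the default TF lowercasing
    # Greedy match: drops the FIRST and the LAST single quote, keeping what lies between.
    return re.sub(r"'(.*)'", r"\1", text)


print(normalize_text("'Bonjour'"))           # bonjour
print(normalize_text("L'homme d'affaires"))  # lhomme daffaires
print(normalize_text("It's fine"))           # it's fine
```

For language identification this side effect is probably harmless, but it is worth knowing when inspecting the preprocessed inputs.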

Finally, we put everything together to create a `Task`. It is an abstraction that combines:

* a raw data source
* one or more preprocessing steps
* a vocabulary to tokenize/detokenize each preprocessed feature for the model
* a postprocessor to convert detokenized model outputs into a format for evaluation
* one or more metrics to evaluate with
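The last two components can be pictured with a minimal pure-Python sketch. `lower_text` and `accuracy` below are hypothetical stand-ins for `t5.data.postprocessors.lower_text` and `t5.evaluation.metrics.accuracy`; to our knowledge the latter also reports the percentage of exact string matches.

```python
from typing import Dict, List


def lower_text(s: str) -> str:
    """Stand-in for the lowercasing postprocessor used in the Task."""
    return s.lower()


def accuracy(targets: List[str], predictions: List[str]) -> Dict[str, float]:
    """Percentage of predictions that exactly match their target string."""
    correct = sum(t == p for t, p in zip(targets, predictions))
    return {"accuracy": 100.0 * correct / len(targets)}


targets = ["en", "fr", "de", "ru"]
raw_predictions = ["en", "FR", "es", "ru"]  # made-up model outputs
predictions = [lower_text(p) for p in raw_predictions]
print(accuracy(targets, predictions))  # {'accuracy': 75.0}
```

Because the model generates free text, any output that is not exactly the target language code simply counts as wrong.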


In [None]:
# DEFAULT_SPM_PATH = "https://huggingface.co/t5-small/resolve/main/spiece.model"  # GCS
# DEFAULT_EXTRA_IDS = 100

# def get_default_vocabulary():
#   return seqio.SentencePieceVocabulary(DEFAULT_SPM_PATH, DEFAULT_EXTRA_IDS)

In [None]:
DEFAULT_OUTPUT_FEATURES = {
    "inputs":
        seqio.Feature(
            vocabulary=t5.data.get_default_vocabulary(), add_eos=True),
    "targets":
        seqio.Feature(
            vocabulary=t5.data.get_default_vocabulary(), add_eos=True),
}

seqio.TaskRegistry.add(
    TASK_NAME,
    # Specify the task source.
    source=seqio.FunctionDataSource(
        # Supply a function which returns a tf.data.Dataset.
        dataset_fn=nq_dataset_fn,
        splits=["train", "validation"]),
    # Supply a list of functions that preprocess the input tf.data.Dataset.
    preprocessors=[trivia_preprocessor,
                   seqio.preprocessors.tokenize_and_append_eos,
                   ],
    # Lowercase targets before computing metrics.
    postprocess_fn=t5.data.postprocessors.lower_text,
    # We'll use accuracy as our evaluation metric.
    metric_fns=[t5.evaluation.metrics.accuracy],
    output_features=DEFAULT_OUTPUT_FEATURES,
)

Calling `seqio.TaskRegistry.add` registers the Task in the global registry so that it can be used with model configs and flags. The Task therefore needs a unique string name (`TASK_NAME` in this case).

Let's look at a few preprocessed examples from the validation set. Note that they contain both the tokenized (integer) and the plain-text inputs and targets. The data are also randomized, since `shuffle=True` is the default in `Task.get_dataset`.


In [None]:
nq_task = seqio.TaskRegistry.get(TASK_NAME)
ds = nq_task.get_dataset(split="validation", sequence_length={"inputs": args.input_length, "targets": args.target_length})
print("A few preprocessed validation examples...")
for ex in tfds.as_numpy(ds.take(5)):
  print(ex['inputs_pretokenized'].decode('UTF-8'))
  print(ex)


# Some statistics for the dataset

In [None]:
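# Token lengths (including EOS) of every preprocessed validation example.
# Lists grow with the data, so no zero placeholders remain when the split
# has fewer than VALID_SIZE examples (e.g. with only 8 languages).
valid_targets = []
valid_inputs = []
for ex in tfds.as_numpy(ds):
  valid_targets.append(len(ex['targets']))
  valid_inputs.append(len(ex['inputs']))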

In [None]:
import matplotlib.pyplot as plt
plt.hist(valid_targets, bins=5)
plt.show()

In [None]:
plt.hist(valid_inputs, bins=100)
plt.show()

# Transferring to new Tasks

We are now ready to fine-tune one of the pre-trained T5 models on our Task for XNLI language classification.

First, we instantiate a `Model` object using the model size of your choice.

**Note:** larger models are slower to train and use but will likely achieve higher accuracy. You may also be able to increase accuracy by training longer, i.e. by raising `FINETUNE_STEPS` in the global parameters above.


# Define Model

We set `train_batch_size=16` for the small model because of the GPU memory limit.

In [None]:
MODEL_SIZE = "small" #@param["small", "base", "large", "3B", "11B"]
# Public GCS path for T5 pre-trained model checkpoints
BASE_PRETRAINED_DIR = "gs://t5-data/pretrained_models"
PRETRAINED_DIR = os.path.join(BASE_PRETRAINED_DIR, MODEL_SIZE)
MODEL_DIR = os.path.join(MAIN_DIR, MODEL_SIZE, TASK_NAME)


# Set parallelism and batch size to fit on v2-8 TPU (if possible).
# Limit number of checkpoints to fit within 5GB (if possible).
model_parallelism, train_batch_size, keep_checkpoint_max = {
    "small": (1, 16, 8),
    "base": (2, 128, 8),
    "large": (8, 64, 4),
    "3B": (8, 16, 1),
    "11B": (8, 16, 1)}[MODEL_SIZE]

tf.io.gfile.makedirs(MODEL_DIR)
# The models from the T5 paper are based on the Mesh TensorFlow Transformer.
model = t5.models.MtfModel(
    model_dir=MODEL_DIR,
    tpu=None,
    model_parallelism=model_parallelism,
    batch_size=train_batch_size,
    sequence_length={"inputs": args.input_length, "targets": args.target_length},
    learning_rate_schedule=0.003,
    save_checkpoints_steps=500,
    keep_checkpoint_max=keep_checkpoint_max,
    iterations_per_loop=100,
)

Before we continue, let's load a TensorBoard visualizer so that we can monitor our progress. The page should update automatically as fine-tuning and evaluation proceed.

In [None]:
%reload_ext tensorboard
DIR = os.path.join(MAIN_DIR, MODEL_SIZE)
%tensorboard --logdir="$DIR" --port=0

# Create targets

For a supervised task such as our classification task, we need to store the original inputs and targets of the validation set so that the model's predictions can be compared against them.

In [None]:
# Save targets for the task
from t5.models.utils import write_targets_and_examples

VALID_DIR = os.path.join(MODEL_DIR, 'validation_eval')
if not os.path.exists(VALID_DIR):
  os.makedirs(VALID_DIR)

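if not os.path.isfile(os.path.join(VALID_DIR, '{}_targets'.format(TASK_NAME))):
  # Re-create the validation split without shuffling so that the stored targets
  # have a deterministic order (evaluation is assumed to read the split unshuffled).
  eval_ds = nq_task.get_dataset(
      split="validation",
      sequence_length={"inputs": args.input_length, "targets": args.target_length},
      shuffle=False)
  xnli_targets = []
  for ex in tfds.as_numpy(eval_ds):
    xnli_targets.append(ex['targets_pretokenized'])

  xnli_dataset = {TASK_NAME: eval_ds}
  xnli_targets = {TASK_NAME: xnli_targets}

  write_targets_and_examples(VALID_DIR, xnli_targets, xnli_dataset)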

# Fine-tune

Fine-tuning is faster with a GPU runtime.

In [None]:
model.finetune(
    mixture_or_task_name=TASK_NAME,
    pretrained_model_dir=PRETRAINED_DIR,
    finetune_steps=FINETUNE_STEPS
)

In [None]:
model.batch_size = train_batch_size * 4
model.eval(
    mixture_or_task_name=TASK_NAME,
    checkpoint_steps=[1001000, 1001500, 1002000]
)
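The hard-coded `checkpoint_steps` above only make sense relative to the step of the pre-trained checkpoint. A small sketch, assuming (as those numbers imply) that the pre-trained checkpoint sits at global step 1,000,000 and that checkpoints are saved every `save_checkpoints_steps=500` steps; `finetune_checkpoint_steps` is a hypothetical helper, not part of the `t5` API.

```python
from typing import List


def finetune_checkpoint_steps(pretrained_step: int, finetune_steps: int,
                              save_every: int) -> List[int]:
    """Global steps at which fine-tuning saves a checkpoint (assumed schedule)."""
    return list(range(pretrained_step + save_every,
                      pretrained_step + finetune_steps + 1,
                      save_every))


# Assumed values: pre-trained step 1,000,000; 2000 fine-tuning steps; save every 500.
steps = finetune_checkpoint_steps(1_000_000, 2000, 500)
print(steps)  # [1000500, 1001000, 1001500, 2002000 - 1000000]
```

With `FINETUNE_STEPS = 2000`, the eval cell above evaluates the last three of these four checkpoints; if you change `FINETUNE_STEPS`, the list passed to `model.eval` has to change accordingly.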

In [None]:
import random

def print_random_predictions(task_name, n=10):
  """Print n predictions from the validation split of a task."""
  # Grab the dataset for this task.
  ds = seqio.TaskRegistry.get(task_name).get_dataset(
      split="validation",
      sequence_length={"inputs": args.input_length, "targets": args.target_length},
      shuffle=False)

  def _prediction_file_to_ckpt(path):
    """Extract the global step from a prediction filename."""
    return int(path.split("_")[-2])

  # Grab the paths of all logged predictions.
  prediction_files = tf.io.gfile.glob(
      os.path.join(
          MODEL_DIR,
          "validation_eval/%s_*_predictions" % task_name))
  # Get most recent prediction file by sorting by their step.
  latest_prediction_file = sorted(
      prediction_files, key=_prediction_file_to_ckpt)[-1]

  # Collect (inputs, targets, prediction) from the dataset and predictions file
  results = []
  with tf.io.gfile.GFile(latest_prediction_file) as preds:
    for ex, pred in zip(tfds.as_numpy(ds), preds):
      results.append((tf.compat.as_text(ex["inputs_pretokenized"]),
                      tf.compat.as_text(ex["targets_pretokenized"]),
                      pred.strip()))

  print("<== Random predictions for %s using checkpoint %s ==>\n" %
        (task_name, 
         _prediction_file_to_ckpt(latest_prediction_file)))

  for inp, tgt, pred in random.sample(results, k=min(n, len(results))):
    print("Input:", inp)
    print("Target:", tgt)
    print("Prediction:", pred)
    print("Counted as Correct?", tgt == pred)
    print()

print_random_predictions(TASK_NAME)
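`print_random_predictions` relies on the prediction files being named `<task>_<step>_predictions`, as its glob pattern implies. Since `TASK_NAME` itself contains underscores, the step is taken from the second-to-last underscore-separated field. A self-contained sketch of that parsing, with hypothetical paths following the same pattern:

```python
from typing import List


def prediction_file_to_step(path: str) -> int:
    """'.../<task>_<step>_predictions' -> <step>, even if <task> contains underscores."""
    return int(path.split("_")[-2])


def latest_prediction_file(paths: List[str]) -> str:
    # Compare steps numerically: a plain string sort would put 999 after 1000.
    return max(paths, key=prediction_file_to_step)


# Hypothetical file names matching the glob used above.
paths = [
    "validation_eval/xnli_8l_512_8_1001000_predictions",
    "validation_eval/xnli_8l_512_8_1002000_predictions",
    "validation_eval/xnli_8l_512_8_1001500_predictions",
]
print(latest_prediction_file(paths))  # validation_eval/xnli_8l_512_8_1002000_predictions
```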

In [None]:
export_dir = os.path.join(MODEL_DIR, "export")

model.batch_size = 1 # make one prediction per call
saved_model_path = model.export(
    export_dir,
    checkpoint_step=-1,  # use most recent
    beam_size=1,  # no beam search
    temperature=1.0,  # sample according to predicted distribution
)
print("Model saved to:", saved_model_path)

In [None]:
# #@title Optional: Run this cell to re-initialize if you switched to GPU runtime.
# %tensorflow_version 2.x
# !pip install tensorflow-text
# from google.colab import auth
# auth.authenticate_user()

In [None]:
import tensorflow as tf
import tensorflow_text  # Required to run exported model.

def load_predict_fn(model_path):
  if tf.executing_eagerly():
    print("Loading SavedModel in eager mode.")
    imported = tf.saved_model.load(model_path, ["serve"])
    return lambda x: imported.signatures['serving_default'](tf.constant(x))['outputs'].numpy()
  else:
    print("Loading SavedModel in tf 1.x graph mode.")
    tf.compat.v1.reset_default_graph()
    sess = tf.compat.v1.Session()
    meta_graph_def = tf.compat.v1.saved_model.load(sess, ["serve"], model_path)
    signature_def = meta_graph_def.signature_def["serving_default"]
    return lambda x: sess.run(
        fetches=signature_def.outputs["outputs"].name, 
        feed_dict={signature_def.inputs["inputs"].name: x}
    )

predict_fn = load_predict_fn(saved_model_path)

In [None]:
def answer(question):
  return predict_fn([question])[0].decode('utf-8')

for question in ["where is the google headquarters?",
                 "what is the most populous country in the world?",
                 "ist es richtig?",
                 "добрый день",
                 "Passez une bonne soirée",
                 "صباح الخير",
                 "早上好"]:
    print(answer(question))
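The model was fine-tuned only on the eight codes in `languages`, so the Arabic and Chinese sentences above have no correct label available, and whatever the model outputs for them is necessarily wrong. Because T5 generates free text, it may also, in principle, emit a string that is not a label at all. A small hypothetical helper, `describe_prediction`, with a code-to-name table built from the XNLI language list in the data section, makes both cases visible:

```python
from typing import Iterable

# ISO 639-1 codes of the 15 XNLI languages listed in the data section.
XNLI_LANGUAGE_NAMES = {
    "en": "English", "fr": "French", "es": "Spanish", "de": "German",
    "el": "Greek", "bg": "Bulgarian", "ru": "Russian", "tr": "Turkish",
    "ar": "Arabic", "vi": "Vietnamese", "th": "Thai", "zh": "Chinese",
    "hi": "Hindi", "sw": "Swahili", "ur": "Urdu",
}


def describe_prediction(pred: str, trained_languages: Iterable[str]) -> str:
    """Human-readable label; flags outputs outside the fine-tuning label set."""
    name = XNLI_LANGUAGE_NAMES.get(pred, "unknown code")
    if pred not in set(trained_languages):
        return f"{pred} ({name}) - not one of the fine-tuning labels"
    return f"{pred} ({name})"


trained = ['en', 'fr', 'de', 'ru', 'es', 'bg', 'sw', 'el']  # same as `languages`
print(describe_prediction("de", trained))  # de (German)
print(describe_prediction("zh", trained))  # zh (Chinese) - not one of the fine-tuning labels
```

In the notebook, one would pass `args.languages`, e.g. `describe_prediction(answer(question), args.languages)`.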