In [None]:
%matplotlib inline


Language Modeling with nn.Transformer and TorchText
===============================================================

This is a tutorial on training a sequence-to-sequence model that uses the
`nn.Transformer <https://pytorch.org/docs/stable/generated/torch.nn.Transformer.html>`__ module.

The PyTorch 1.2 release includes a standard transformer module based on the
paper `Attention is All You Need <https://arxiv.org/pdf/1706.03762.pdf>`__.
Compared to Recurrent Neural Networks (RNNs), the transformer model has proven
to be superior in quality for many sequence-to-sequence tasks while being more
parallelizable. The ``nn.Transformer`` module relies entirely on an attention
mechanism (implemented as
`nn.MultiheadAttention <https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html>`__)
to draw global dependencies between input and output. The ``nn.Transformer``
module is highly modularized such that a single component (e.g.,
`nn.TransformerEncoder <https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoder.html>`__)
can be easily adapted/composed.

![](https://github.com/pytorch/tutorials/blob/gh-pages/_downloads/_static/img/transformer_architecture.jpg?raw=1)





Define the model
----------------




In this tutorial, we train an ``nn.TransformerEncoder`` model on a
language modeling task. The language modeling task is to assign a
probability to a given word (or a sequence of words) following a
sequence of words. A sequence of tokens is passed to the embedding
layer first, followed by a positional encoding layer to account for the order
of the words (see the next paragraph for more details). The
``nn.TransformerEncoder`` consists of multiple layers of
`nn.TransformerEncoderLayer <https://pytorch.org/docs/stable/generated/torch.nn.TransformerEncoderLayer.html>`__.
Along with the input sequence, a square attention mask is required because the
self-attention layers in ``nn.TransformerEncoder`` are only allowed to attend
to the earlier positions in the sequence. For the language modeling task, any
tokens at future positions should be masked. To produce a probability
distribution over output words, the output of the ``nn.TransformerEncoder``
model is passed through a linear layer that returns raw scores (logits); the
log-softmax is applied inside the loss function (``CrossEntropyLoss``) rather
than in the model itself.
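
As a minimal sketch of how such a mask works (plain Python rather than PyTorch, with made-up attention scores; ``causal_mask`` and ``softmax`` are illustrative helper names, not part of the tutorial code), adding ``-inf`` above the diagonal before the softmax drives the attention weights of future positions to zero:

```python
import math

def causal_mask(size):
    """Return a size x size mask: 0.0 on and below the diagonal, -inf above."""
    return [[0.0 if col <= row else float("-inf") for col in range(size)]
            for row in range(size)]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical raw attention scores for a 3-token sequence (all equal).
scores = [[1.0, 1.0, 1.0] for _ in range(3)]
mask = causal_mask(3)

# Add the mask to the scores, then normalize each row.
weights = [softmax([s + m for s, m in zip(score_row, mask_row)])
           for score_row, mask_row in zip(scores, mask)]

for row in weights:
    print([round(w, 2) for w in row])
```

Each row only distributes weight over its own position and the positions before it, which is the property the language modeling task needs.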




In [2]:
import math
from typing import Tuple

import torch
from torch import nn, Tensor
import torch.nn.functional as F
from torch.nn import TransformerEncoder, TransformerEncoderLayer
from torch.utils.data import dataset

class TransformerModel(nn.Module):

    def __init__(self, ntoken: int, d_model: int, nhead: int, d_hid: int,
                 nlayers: int, dropout: float = 0.5):
        super().__init__()
        self.model_type = 'Transformer'
        self.pos_encoder = PositionalEncoding(d_model, dropout)
        encoder_layers = TransformerEncoderLayer(d_model, nhead, d_hid, dropout)
        self.transformer_encoder = TransformerEncoder(encoder_layers, nlayers)
        self.encoder = nn.Embedding(ntoken, d_model)
        self.d_model = d_model
        self.decoder = nn.Linear(d_model, ntoken)

        self.init_weights()

    def init_weights(self) -> None:
        initrange = 0.1
        self.encoder.weight.data.uniform_(-initrange, initrange)
        self.decoder.bias.data.zero_()
        self.decoder.weight.data.uniform_(-initrange, initrange)

    def forward(self, src: Tensor, src_mask: Tensor) -> Tensor:
        """
        Args:
            src: Tensor, shape [seq_len, batch_size]
            src_mask: Tensor, shape [seq_len, seq_len]

        Returns:
            output Tensor of shape [seq_len, batch_size, ntoken]
        """
        src = self.encoder(src) * math.sqrt(self.d_model)
        src = self.pos_encoder(src)
        output = self.transformer_encoder(src, src_mask)
        output = self.decoder(output)
        return output


def generate_square_subsequent_mask(sz: int) -> Tensor:
    """Generates an upper-triangular matrix of -inf, with zeros on diag."""
    return torch.triu(torch.ones(sz, sz) * float('-inf'), diagonal=1)

The ``PositionalEncoding`` module injects some information about the
relative or absolute position of the tokens in the sequence. The
positional encodings have the same dimension as the embeddings so that
the two can be summed. Here, we use ``sine`` and ``cosine`` functions of
different frequencies.
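
The sinusoidal scheme can be sketched in plain Python for a tiny table. This is only an illustration under the assumption that even dimensions use sine and odd dimensions use cosine with the same frequency term as ``div_term`` in the module; ``positional_encoding`` is a hypothetical helper name.

```python
import math

def positional_encoding(max_len, d_model):
    """Build a max_len x d_model table of sinusoidal position encodings."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Same frequency term as div_term: exp(i * (-ln(10000) / d_model)).
            angle = pos * math.exp(i * (-math.log(10000.0) / d_model))
            table[pos][i] = math.sin(angle)          # even dimensions
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)  # odd dimensions
    return table

pe = positional_encoding(max_len=4, d_model=4)
for pos, row in enumerate(pe):
    print(pos, [round(value, 4) for value in row])
```

Position 0 always encodes to alternating 0 and 1, and higher dimensions oscillate more slowly as the position grows.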




In [3]:
class PositionalEncoding(nn.Module):

    def __init__(self, d_model: int, dropout: float = 0.1, max_len: int = 5000):
        super().__init__()
        self.dropout = nn.Dropout(p=dropout)

        position = torch.arange(max_len).unsqueeze(1)
        div_term = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, 1, d_model)
        pe[:, 0, 0::2] = torch.sin(position * div_term)
        pe[:, 0, 1::2] = torch.cos(position * div_term)
        self.register_buffer('pe', pe)

    def forward(self, x: Tensor) -> Tensor:
        """
        Args:
            x: Tensor, shape [seq_len, batch_size, embedding_dim]
        """
        x = x + self.pe[:x.size(0)]
        return self.dropout(x)

Load and batch data
-------------------




This tutorial uses ``torchtext`` to load the Wikitext-2 dataset. The
vocab object is built from the training set and is used to numericalize
tokens into tensors. Wikitext-2 represents rare tokens as ``<unk>``.

Given a 1-D vector of sequential data, ``batchify()`` arranges the data
into ``batch_size`` columns. If the data does not divide evenly into
``batch_size`` columns, then the data is trimmed to fit. For instance, with
the alphabet as the data (total length of 26) and ``batch_size=4``, we would
divide the alphabet into 4 sequences of length 6:

\begin{align}\begin{bmatrix}
  \text{A} & \text{B} & \text{C} & \ldots & \text{X} & \text{Y} & \text{Z}
  \end{bmatrix}
  \Rightarrow
  \begin{bmatrix}
  \begin{bmatrix}\text{A} \\ \text{B} \\ \text{C} \\ \text{D} \\ \text{E} \\ \text{F}\end{bmatrix} &
  \begin{bmatrix}\text{G} \\ \text{H} \\ \text{I} \\ \text{J} \\ \text{K} \\ \text{L}\end{bmatrix} &
  \begin{bmatrix}\text{M} \\ \text{N} \\ \text{O} \\ \text{P} \\ \text{Q} \\ \text{R}\end{bmatrix} &
  \begin{bmatrix}\text{S} \\ \text{T} \\ \text{U} \\ \text{V} \\ \text{W} \\ \text{X}\end{bmatrix}
  \end{bmatrix}\end{align}

Batching enables more parallelizable processing. However, batching means that
the model treats each column independently; for example, the dependence of
``G`` on ``F`` cannot be learned in the example above.
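
The alphabet example can be reproduced with a small stand-in written on Python lists. This is a sketch only: the ``batchify`` below is a simplified, hypothetical version that returns columns as lists, whereas a tensor-based version would return a 2-D tensor.

```python
import string

def batchify(data, batch_size):
    """Trim data to a multiple of batch_size and split it into columns."""
    seq_len = len(data) // batch_size
    trimmed = data[:seq_len * batch_size]
    return [trimmed[k * seq_len:(k + 1) * seq_len] for k in range(batch_size)]

columns = batchify(list(string.ascii_uppercase), batch_size=4)
for column in columns:
    print("".join(column))
```

The letters ``Y`` and ``Z`` are dropped because 26 does not divide evenly by 4.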




In [4]:
!pip install transformers


In [5]:
from transformers import BertTokenizer
import torch
import random
import numpy as np

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
#13649397 2924871 2924871
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

"""
class SentenceIterator:

    def __init__(self, file_path, batch_size, begin_index, total_lines):
        self.file_path = file_path
        self.batch_size = batch_size
        self.total_lines = total_lines
        self.begin_index = begin_index
        self.offset = 0

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        if self.n <= self.total_lines:
            line_return = []
            
            #with open(self.file_path, "r") as f:
            #  for i, line in enumerate(f):
            #    if len(line_return) == self.batch_size:
            #      break
            #    elif i >= self.begin_index + self.n and i < self.begin_index + self.n + self.batch_size:
            #      line_return.append(line.replace('\n',''))
            #  self.n += self.batch_size
            #return tokenizer(
            #          line_return,
            #          padding=True, 
            #          truncation=True,
            #          add_special_tokens=True, # Add '[CLS]' and '[SEP]'
            #         return_token_type_ids=False,
            #          return_attention_mask=False,
            #          return_tensors='pt',  # Return PyTorch tensors
            #        )["input_ids"]
            
            with open(self.file_path, "r") as f:
              f.seek(self.offset)
              for i in range(self.begin_index + self.n, self.begin_index + self.n + self.batch_size):
                line = f.readline()
                print(str(i),"---",line,"---",str(self.offset))
                line_without_newline = line.replace('\n','')
                line_return.append(line_without_newline)
                difference = len(line) - len(line_without_newline)
                self.offset += len(line) + difference
              self.n += self.batch_size
            return tokenizer(
                      line_return,
                      padding=True, 
                      truncation=True,
                      add_special_tokens=True, # Add '[CLS]' and '[SEP]'
                      return_token_type_ids=False,
                      return_attention_mask=False,
                      return_tensors='pt',  # Return PyTorch tensors
                    )["input_ids"]
        else:
            raise StopIteration

train_iterable = SentenceIterator(file_path = "/content/drive/MyDrive/Project2/wikipedia16.txt", batch_size=35, begin_index=0, total_lines=13649397)
train_iterator = iter(train_iterable)
"""

class SentenceIterator:

    def __init__(self, dataset, batch_size, device):
        self.dataset = dataset
        self.batch_size = batch_size
        self.total_lines = len(dataset)
        self.device = device

    def __iter__(self):
        self.n = 0
        return self

    def __next__(self):
        # Use "<" so an empty slice is never tokenized when the dataset
        # length is an exact multiple of batch_size.
        if self.n < self.total_lines:
            batch = tokenizer(
                      self.dataset[self.n : self.n + self.batch_size],
                      padding=True, 
                      truncation=True,
                      add_special_tokens=True, # Add '[CLS]' and '[SEP]'
                      return_token_type_ids=False,
                      return_attention_mask=False,
                      return_tensors='pt',  # Return PyTorch tensors
                    )["input_ids"].numpy()
            self.n += self.batch_size
            inputs, targets = [], []
            for item in batch:
                sep = np.where(item == tokenizer.sep_token_id)[0][0]  # first [SEP] (id 102)
                # Input: drop the last real token, i.e. the one just before [SEP].
                inputs.append(np.concatenate((item[:sep - 1], item[sep:])))
                # Target: drop the first real token (index 1) so targets are shifted by one.
                targets.append(np.concatenate((item[:1], item[2:])))
            inputs = torch.tensor(np.array(inputs), dtype=torch.long).to(self.device)
            targets = torch.tensor(np.array(targets), dtype=torch.long).reshape(-1).to(self.device)
            return inputs, targets
        else:
            raise StopIteration


def get_data(file_path, train_size):
  with open(file_path, "r") as f:
    sentences = f.readlines()
    random.shuffle(sentences)
    len_sentences = len(sentences)
    len_train_sentences = int(len_sentences * train_size)
    len_val_sentences = int((len_sentences - len_train_sentences) // 2)
    train_sentences = sentences[0:len_train_sentences]
    val_sentences = sentences[len_train_sentences:len_train_sentences + len_val_sentences]
    test_sentences = sentences[len_train_sentences + len_val_sentences:len_sentences]
    del sentences
    return train_sentences, val_sentences, test_sentences

train_data, val_data, test_data = get_data(file_path = "/content/drive/MyDrive/Project2/wikipedia16-large.txt", train_size = 0.7)

#train_iterable = SentenceIterator(dataset = train_data, batch_size=35, device = device)
#train_iterator = iter(train_iterable)

#batch_input, batch_target = next(train_iterator)
#print(batch_input.shape, batch_target.shape)
#for batch_input, batch_target in train_iterator:
#  print(batch_input.shape, batch_target.shape)
#  break


Functions to generate input and target sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




``get_batch()`` generates a pair of input-target sequences for
the transformer model. It subdivides the source data into chunks of
length ``bptt``. For the language modeling task, the ``Target`` for each
input token is the token that follows it. For example, with a ``bptt`` value of 2,
we’d get the following two tensors for ``i`` = 0:

![](https://github.com/pytorch/tutorials/blob/gh-pages/_downloads/_static/img/transformer_input_target.png?raw=1)


It should be noted that the chunks are along dimension 0, consistent
with the ``S`` dimension in the Transformer model. The batch dimension
``N`` is along dimension 1.
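
A list-based sketch of the same slicing, assuming the alphabet has already been arranged into 4 columns of length 6 and that rows are time steps, may make the shift concrete. The ``get_batch`` here is a simplified stand-in that operates on nested lists instead of tensors.

```python
import string

BPTT = 2  # chunk length, matching the example in the text

def get_batch(source, i, bptt=BPTT):
    """Return (data, target) where target is data shifted down by one row."""
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i + seq_len]
    # Flatten the shifted rows, mirroring the reshape(-1) on the target tensor.
    target = [token for row in source[i + 1:i + 1 + seq_len] for token in row]
    return data, target

# Rows are time steps (dimension 0), columns are batch entries (dimension 1).
letters = string.ascii_uppercase[:24]
source = [[letters[col * 6 + row] for col in range(4)] for row in range(6)]

data, target = get_batch(source, 0)
print(data)    # rows 0 and 1 of every column
print(target)  # rows 1 and 2, flattened
```

Every entry of ``target`` is the letter that follows the corresponding entry of ``data`` within the same column.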




In [None]:
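# Not needed by the SentenceIterator-based training loop in this notebook;
# only the WikiText-2 exploration cell at the end calls get_batch().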
bptt = 35
def get_batch(source: Tensor, i: int) -> Tuple[Tensor, Tensor]:
    """
    Args:
        source: Tensor, shape [full_seq_len, batch_size]
        i: int

    Returns:
        tuple (data, target), where data has shape [seq_len, batch_size] and
        target has shape [seq_len * batch_size]
    """
    seq_len = min(bptt, len(source) - 1 - i)
    data = source[i:i+seq_len]
    target = source[i+1:i+1+seq_len].reshape(-1)
    return data, target

Initiate an instance
--------------------




The model hyperparameters are defined below. The vocab size is
equal to the length of the vocab object.
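
As a quick sanity check on a configuration like the one used in this notebook (the three values are copied from its hyperparameter cell; the variable names ``head_dim``, ``embedding_params`` and ``decoder_params`` are illustrative), the sketch below verifies that the embedding dimension splits evenly across attention heads and counts the parameters that depend directly on the vocabulary size:

```python
# Values copied from the configuration cell in this notebook.
ntokens, emsize, nhead = 30592, 768, 2

# Multi-head attention splits the embedding evenly across heads.
assert emsize % nhead == 0, "emsize must be divisible by nhead"
head_dim = emsize // nhead

# Parameters in the token embedding and in the output projection (weights + bias).
embedding_params = ntokens * emsize
decoder_params = emsize * ntokens + ntokens

print(head_dim, embedding_params, decoder_params)
```

Both vocabulary-dependent layers grow linearly with ``ntokens``, so the vocabulary size dominates the parameter count of a small model like this one.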




In [6]:
ntokens = 30592  # embedding table size; must be >= the tokenizer vocabulary (30522 for bert-base-uncased)
batch_size = 35
emsize = 768  # embedding dimension
d_hid = emsize * 4  # dimension of the feedforward network model in nn.TransformerEncoder
nlayers = 2  # number of nn.TransformerEncoderLayer in nn.TransformerEncoder
nhead = 2  # number of heads in nn.MultiheadAttention
dropout = 0.2  # dropout probability
model = TransformerModel(ntokens, emsize, nhead, d_hid, nlayers, dropout).to(device)

Run the model
-------------




We use `CrossEntropyLoss <https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html>`__
with the `SGD <https://pytorch.org/docs/stable/generated/torch.optim.SGD.html>`__
(stochastic gradient descent) optimizer. The learning rate is initially set to
5.0 and follows a `StepLR <https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.StepLR.html>`__
schedule. During training, we use `nn.utils.clip_grad_norm\_ <https://pytorch.org/docs/stable/generated/torch.nn.utils.clip_grad_norm_.html>`__
to prevent gradients from exploding.
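
The two mechanisms can be sketched in plain Python under simplifying assumptions: ``step_lr`` mimics a step decay with the initial rate 5.0 and ``gamma`` 0.95 used here, and ``clip_by_norm`` rescales a flat list of hypothetical gradient values. Both helper names are illustrative and are not PyTorch functions.

```python
import math

def step_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate after `epoch` scheduler steps under a StepLR-style decay."""
    return initial_lr * gamma ** (epoch // step_size)

def clip_by_norm(grads, max_norm, eps=1e-6):
    """Rescale a flat list of gradients so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads]

# Decay schedule: start at 5.0 and multiply by 0.95 after every epoch.
for epoch in range(4):
    print(epoch, round(step_lr(5.0, 0.95, 1, epoch), 4))

# Hypothetical gradient vector with norm 5.0, clipped to norm 0.5.
clipped = clip_by_norm([3.0, 4.0], max_norm=0.5)
print(clipped, math.sqrt(sum(g * g for g in clipped)))
```

Clipping preserves the direction of the gradient and only shrinks its length, while gradients already below the threshold pass through unchanged.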




In [7]:
import copy
import time

criterion = nn.CrossEntropyLoss()
lr = 5.0  # learning rate
optimizer = torch.optim.SGD(model.parameters(), lr=lr)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, 1.0, gamma=0.95)

def train(model: nn.Module) -> None:
    model.train()  # turn on train mode
    total_loss = 0.
    log_interval = 200
    start_time = time.time()
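    # SentenceIterator yields batch_input with shape [batch_size, seq_len]. The encoder
    # is not batch_first, so it treats dimension 0 as the sequence axis; the mask is
    # therefore sized by batch_size here.
    src_mask = generate_square_subsequent_mask(batch_size).to(device)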
    num_batches = len(train_data) // batch_size
    batch = 0
    train_iterable = SentenceIterator(dataset = train_data, batch_size=batch_size, device = device)
    train_iterator = iter(train_iterable)
    for batch_input, batch_target in train_iterator:
        current_batch_size = batch_input.size(0)
        if batch_size != current_batch_size:  # only on last batch
            src_mask = src_mask[:current_batch_size, :current_batch_size]
        output = model(batch_input, src_mask)
        #print(output.shape, batch_input.shape, batch_target.shape)
        loss = criterion(output.view(-1, ntokens), batch_target)
        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
        optimizer.step()

        total_loss += loss.item()
        if batch % log_interval == 0 and batch > 0:
            lr = scheduler.get_last_lr()[0]
            ms_per_batch = (time.time() - start_time) * 1000 / log_interval
            cur_loss = total_loss / log_interval
            ppl = math.exp(cur_loss)
            print(f'| epoch {epoch:3d} | {batch:5d}/{num_batches:5d} batches | '
                  f'lr {lr:02.2f} | ms/batch {ms_per_batch:5.2f} | '
                  f'loss {cur_loss:5.2f} | ppl {ppl:8.2f}')
            total_loss = 0
            start_time = time.time()
        if batch % 2000 == 0 and batch > 0:
            torch.save(model.state_dict(), "/content/drive/MyDrive/Project2/transformer3.pt")
        batch += 1

def evaluate(model: nn.Module, eval_data) -> float:
    model.eval()  # turn on evaluation mode
    total_loss = 0.
    src_mask = generate_square_subsequent_mask(batch_size).to(device)
    with torch.no_grad():
        val_iterable = SentenceIterator(dataset = eval_data, batch_size=batch_size, device = device)
        val_iterator = iter(val_iterable)
        for batch_input, batch_target in val_iterator:
            current_batch_size = batch_input.size(0)
            if batch_size != current_batch_size:  # only on last batch
                src_mask = src_mask[:current_batch_size, :current_batch_size]
            output = model(batch_input, src_mask)
            output_flat = output.view(-1, ntokens)
            # Weight each batch's mean loss by the number of sentences it actually contains.
            total_loss += current_batch_size * criterion(output_flat, batch_target).item()
    return total_loss / len(eval_data)

Loop over epochs. Save the model if the validation loss is the best
we've seen so far. Adjust the learning rate after each epoch.
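
The bookkeeping in such a loop can be sketched without a model. The validation losses below are hypothetical numbers chosen for illustration; perplexity is computed as the exponential of the mean cross-entropy loss, in the same way as ``val_ppl`` in the loop.

```python
import math

# Hypothetical per-epoch validation losses (mean cross-entropy in nats).
val_losses = [4.10, 3.62, 3.47, 3.55]

best_val_loss = float("inf")
best_epoch = None
for epoch, val_loss in enumerate(val_losses, start=1):
    val_ppl = math.exp(val_loss)  # perplexity is exp of the mean loss
    print(f"epoch {epoch}: loss {val_loss:.2f}, ppl {val_ppl:.2f}")
    if val_loss < best_val_loss:
        # A real loop would snapshot the model here, e.g. with copy.deepcopy.
        best_val_loss, best_epoch = val_loss, epoch

print("best epoch:", best_epoch)
```

Because the loss rises again in the last hypothetical epoch, the snapshot from the third epoch is the one that would be kept.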



In [8]:
best_val_loss = float('inf')
epochs = 1
best_model = None

for epoch in range(1, epochs + 1):
    epoch_start_time = time.time()
    train(model)
    val_loss = evaluate(model, val_data)
    val_ppl = math.exp(val_loss)
    elapsed = time.time() - epoch_start_time
    print('-' * 89)
    print(f'| end of epoch {epoch:3d} | time: {elapsed:5.2f}s | '
          f'valid loss {val_loss:5.2f} | valid ppl {val_ppl:8.2f}')
    print('-' * 89)

    if val_loss < best_val_loss:
        best_val_loss = val_loss
        best_model = copy.deepcopy(model)

    scheduler.step()

| epoch   1 |   200/389982 batches | lr 5.00 | ms/batch 70.34 | loss  7.29 | ppl  1469.73
| epoch   1 |   400/389982 batches | lr 5.00 | ms/batch 68.41 | loss  4.63 | ppl   102.02
| epoch   1 |   600/389982 batches | lr 5.00 | ms/batch 68.91 | loss  4.35 | ppl    77.35
| epoch   1 |   800/389982 batches | lr 5.00 | ms/batch 68.05 | loss  4.16 | ppl    63.80
| epoch   1 |  1000/389982 batches | lr 5.00 | ms/batch 67.44 | loss  4.12 | ppl    61.44
| epoch   1 |  1200/389982 batches | lr 5.00 | ms/batch 67.66 | loss  4.09 | ppl    59.86
| epoch   1 |  1400/389982 batches | lr 5.00 | ms/batch 68.89 | loss  3.96 | ppl    52.31
| epoch   1 |  1600/389982 batches | lr 5.00 | ms/batch 68.23 | loss  3.92 | ppl    50.41
| epoch   1 |  1800/389982 batches | lr 5.00 | ms/batch 67.80 | loss  3.96 | ppl    52.64
| epoch   1 |  2000/389982 batches | lr 5.00 | ms/batch 67.43 | loss  3.95 | ppl    51.96
| epoch   1 |  2200/389982 batches | lr 5.00 | ms/batch 73.03 | loss  3.84 | ppl    46.41
| epoch   

Evaluate the best model on the test dataset
-------------------------------------------




In [10]:
test_loss = evaluate(best_model, test_data)
test_ppl = math.exp(test_loss)
print('=' * 89)
print(f'| End of training | test loss {test_loss:5.2f} | '
      f'test ppl {test_ppl:8.2f}')
print('=' * 89)

| End of training | test loss  3.47 | test ppl    32.15


In [22]:
#torch.save(best_model.state_dict(), "/content/drive/MyDrive/Project2/transformer3.pt")
#best_model = TransformerModel(ntokens, emsize, nhead, d_hid, nlayers, dropout).to(device)
#best_model.load_state_dict(torch.load("/content/drive/MyDrive/Project2/transformer2.pt", map_location=torch.device(device)))
sentence = ["left of the political spectrum [MASK]"]
src_mask = generate_square_subsequent_mask(1).to(device)
sentence_data = tokenizer(
                      sentence,
                      padding=True, 
                      truncation=True,
                      add_special_tokens=True, # Add '[CLS]' and '[SEP]'
                      return_token_type_ids=False,
                      return_attention_mask=False,
                      return_tensors='pt',  # Return PyTorch tensors
                    )["input_ids"].to(device)
best_model.eval()
with torch.no_grad():
  print("input size", sentence_data.shape)
  output = best_model(sentence_data, src_mask)
  output_flat = output.view(-1, ntokens)
  print("output size", output.shape)
  print(output)
  print("output flat size", output_flat.shape)
  print(output_flat)
  print(nn.Softmax(dim=1)(output_flat))
  result_index = torch.argmax(nn.Softmax(dim=1)(output_flat), dim=1)
  print(result_index)
  print(sentence)
  print(tokenizer.convert_ids_to_tokens(sentence_data[0]))
  print(tokenizer.convert_ids_to_tokens(result_index))



input size torch.Size([1, 8])
output size torch.Size([1, 8, 30592])
tensor([[[ 9.2725, -0.3353, -0.7416,  ..., -1.4121, -1.6973, -0.5973],
         [ 5.1012, -1.2107, -1.6178,  ..., -2.3843, -2.1508, -1.4338],
         [ 6.2856, -2.0802, -2.0036,  ..., -2.0090, -2.3015, -2.2718],
         ...,
         [ 9.6759, -1.5315, -1.5371,  ..., -1.6680, -2.2948, -1.4584],
         [ 7.5608, -1.7483, -1.7033,  ..., -2.0602, -1.9165, -1.7665],
         [14.6608, -0.6735, -1.0620,  ..., -0.8586, -0.9394, -1.2344]]],
       device='cuda:0')
output flat size torch.Size([8, 30592])
tensor([[ 9.2725, -0.3353, -0.7416,  ..., -1.4121, -1.6973, -0.5973],
        [ 5.1012, -1.2107, -1.6178,  ..., -2.3843, -2.1508, -1.4338],
        [ 6.2856, -2.0802, -2.0036,  ..., -2.0090, -2.3015, -2.2718],
        ...,
        [ 9.6759, -1.5315, -1.5371,  ..., -1.6680, -2.2948, -1.4584],
        [ 7.5608, -1.7483, -1.7033,  ..., -2.0602, -1.9165, -1.7665],
        [14.6608, -0.6735, -1.0620,  ..., -0.8586, -0.9394, -1.

In [None]:
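# Exploration of the torchtext WikiText-2 pipeline. data_process, batchify and vocab
# are not defined in this notebook; they are assumed to exist as in the torchtext
# version of this tutorial.
from torchtext.datasets import WikiText2

lista = []
train_iter, val_iter, test_iter = WikiText2()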
for item in train_iter:
  lista.append(item)
print(lista[0:5])

train_iter, val_iter, test_iter = WikiText2()
train_data = data_process(train_iter)
print(train_data.size())
train_data = batchify(train_data, batch_size)
print(train_data.size())
print(vocab.lookup_tokens(train_data[0].numpy()))
bptt = 35
num_batches = len(train_data) // bptt
for batch, i in enumerate(range(0, train_data.size(0) - 1, bptt)):
  data, targets = get_batch(train_data, i)
  print(data.shape, targets.shape)
  print(vocab.lookup_tokens(data[0].numpy()))
  print(vocab.lookup_tokens(targets.numpy()[0:20]))
  break

[' \n', ' = Valkyria Chronicles III = \n', ' \n', ' Senjō no Valkyria 3 : <unk> Chronicles ( Japanese : 戦場のヴァルキュリア3 , lit . Valkyria of the Battlefield 3 ) , commonly referred to as Valkyria Chronicles III outside Japan , is a tactical role @-@ playing video game developed by Sega and Media.Vision for the PlayStation Portable . Released in January 2011 in Japan , it is the third game in the Valkyria series . <unk> the same fusion of tactical and real @-@ time gameplay as its predecessors , the story runs parallel to the first game and follows the " Nameless " , a penal military unit serving the nation of Gallia during the Second Europan War who perform secret black operations and are pitted against the Imperial unit " <unk> Raven " . \n', " The game began development in 2010 , carrying over a large portion of the work done on Valkyria Chronicles II . While it retained the standard features of the series , it also underwent multiple adjustments , such as making the game more <unk> for s