Generate sentences to augment the dataset
-------------------------------------------------

In this notebook we will try to build a generative adversarial network (GAN) that generates new sentences in order to increase the size of the corpora. We will use the `pytorch-lightning` module to simplify and speed up the training.

- The generative model will have the following characteristics:
    - we will provide the `size of the sequences` to a first model so that it generates an output of the same size as the given sequences
    - the output will be rounded before being passed to the discriminator
    - we will use a transformer encoder to generate the sentence ids instead of a simple `RNN` module
    - some rules will be applied to the decoded output in order to obtain the textual sentences

- The discriminative model will be used to verify whether the output is close to the true sentences:
    - ~~we will use a pre-trained BERT model to discriminate the output~~
    - a multi-layer perceptron will be sufficient to discriminate the output
    - we will tokenize the GAN inputs with a WordPiece tokenizer without a normalizer because we want to generate texts
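
As a rough illustration of the rounding step listed for the generator, the sketch below maps raw real-valued outputs to integer ids inside a vocabulary range. The tensor values and the `max_id` of 9 are made up for the example; they are not taken from the notebook's tokenizer.

```python
import torch

# Hypothetical raw generator outputs (one row per sentence)
raw = torch.tensor([[-3.2, 1.4, 7.6, 12.9],
                    [0.2, 4.7, 9.3, 2.1]])

max_id = 9  # assumed highest token id, for illustration only

# Clip to the valid id range first, then round to the nearest integer
ids = torch.clip(raw, 0, max_id).round()

print(ids)
```

The result is still a float tensor; it only needs a cast to integers when the ids are handed to the tokenizer for decoding.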


    

### Steps

The following steps will be required:

- Create a custom dataset to retrieve the sentences
- Create the generator
- Create the discriminator
- ~~Create the GAN~~
- Create the trainer
- Search for the best parameters
- Train the model and evaluate it

### Create a custom dataset

Let us use the already trained tokenizer to retrieve the encoded sequences. Note that this dataset is different from the one we want to use to train the translation model.

In [3]:
# %%writefile wolof-translate/wolof_translate/data/gan_dataset.py
import torch
import pandas as pd
from torch import nn
from tokenizers import Tokenizer
from torch.utils.data import Dataset

class SentenceDatasetGAN(Dataset):
    
    def __init__(self, file_path: str, corpus_1: str = "french_corpus", corpus_2: str = "wolof_corpus",
                 tokenizer_path: str = "wolof-translate/wolof_translate/tokenizers/adverse_tokenizer.json",
                 cls_token: str = "[CLS]", sep_token: str = "[SEP]", sep: str = ",", **kwargs):
        
        # load the data frame
        self.__sentences = pd.read_csv(file_path, sep=sep, **kwargs)
        
        # load the tokenizer
        self.tokenizer = Tokenizer.from_file(tokenizer_path)
        
        # retrieve the sentences of the first corpus
        self.__sentences_1 = self.__sentences[corpus_1].to_list()
        
        # retrieve the sentences of the second corpus
        self.__sentences_2 = self.__sentences[corpus_2].to_list()
        
        # store the special tokens
        self.cls_token = cls_token
        
        self.sep_token = sep_token
        
        # store the number of sentence pairs
        self.__length = len(self.__sentences_1)
        
        # store the highest token id
        self.max_id = self.tokenizer.get_vocab_size() - 1
        
        # compute the maximum length of the encoded sequences
        self.max_len = 0
        
        for i in range(self.__length):
            
            sentence = f"{self.cls_token}{self.__sentences_1[i]}{self.sep_token}{self.__sentences_2[i]}{self.sep_token}"
            
            encoding = self.tokenizer.encode(sentence)
            
            if len(encoding.ids) > self.max_len:
                
                self.max_len = len(encoding.ids)    
        
    def __getitem__(self, index):
        
        sentence_1 = self.__sentences_1[index]
        
        sentence_2 = self.__sentences_2[index]
        
        # let us create the sentence with special tokens
        sentence = f"{self.cls_token}{sentence_1}{self.sep_token}{sentence_2}{self.sep_token}"
        
        # let us encode the sentence
        encoding = self.tokenizer.encode(sentence)
        
        # it will return the padded ids and attention mask
        padding = self.max_len - len(encoding.ids)
        
        ids = torch.tensor(encoding.ids + [0] * padding)
        
        return ids.float(), (ids > 0).float()
        
    def __len__(self):
        
        return self.__length



The data loader will generate the padded sequences of ids and the attention masks. Let us test it below.

In [4]:
dataset = SentenceDatasetGAN("data/extractions/new_data/sent_extraction.csv")

In [5]:
from torch.utils.data import DataLoader

# let us generate 10 sentences
ids, mask = next(iter(DataLoader(dataset, batch_size=10, shuffle=True)))

print("Ids:")
print(ids)

print("\nMask:")
print(mask)

Ids:
tensor([[2.0000e+00, 8.0400e+02, 3.2400e+02,  ..., 0.0000e+00, 0.0000e+00,
         0.0000e+00],
        [2.0000e+00, 5.7200e+02, 1.5280e+03,  ..., 0.0000e+00, 0.0000e+00,
         0.0000e+00],
        [2.0000e+00, 2.0060e+03, 1.1000e+01,  ..., 0.0000e+00, 0.0000e+00,
         0.0000e+00],
        ...,
        [2.0000e+00, 3.7700e+02, 2.4300e+02,  ..., 0.0000e+00, 0.0000e+00,
         0.0000e+00],
        [2.0000e+00, 3.0260e+03, 1.1000e+01,  ..., 0.0000e+00, 0.0000e+00,
         0.0000e+00],
        [2.0000e+00, 1.3860e+03, 1.8560e+03,  ..., 0.0000e+00, 0.0000e+00,
         0.0000e+00]])

Mask:
tensor([[1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.],
        ...,
        [1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.],
        [1., 1., 1.,  ..., 0., 0., 0.]])
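
The padding logic in `__getitem__` can be illustrated on toy ids without loading the tokenizer. This is a minimal sketch: `pad_ids` is a hypothetical helper, not part of the notebook, and it assumes, as the dataset does, that id 0 is reserved for padding.

```python
from typing import List, Tuple

def pad_ids(ids: List[int], max_len: int, pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Right-pad `ids` to `max_len` and build the matching attention mask."""
    if len(ids) > max_len:
        raise ValueError("sequence is longer than max_len")
    padded = ids + [pad_id] * (max_len - len(ids))
    # 1 marks a real token, 0 marks padding (same rule as `ids > 0` in the dataset)
    mask = [1 if i != pad_id else 0 for i in padded]
    return padded, mask

toy_ids, toy_mask = pad_ids([2, 804, 324, 3], max_len=6)
print(toy_ids)   # [2, 804, 324, 3, 0, 0]
print(toy_mask)  # [1, 1, 1, 1, 0, 0]
```

One consequence of deriving the mask from `ids > 0`: a real token whose id is 0 would also be masked, so the rule only holds if id 0 is the padding token.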


### Generator

The generator uses a transformer encoder whose `d_model`, number of layers, number of feed-forward features and activation function are specified as arguments. We can also specify a dropout rate.

In [6]:
# %%writefile wolof-translate/wolof_translate/models/generative_model.py
from torch.nn import functional as F
from custom_rnn.transformers.add_position import PositionalEncoding
from typing import *
from torch import nn

class SentenceGenerator(nn.Module):
    
    def __init__(self, 
                 output_size: int,
                 d_model: int = 512,
                 latent_dim: Union[int, None] = None,
                 num_features: int = 2048,
                 n_heads: int = 8,
                 dropout: float = 0.0,
                 activation = F.relu,
                 num_layers: int = 6,
                 min: int = 0, max: int = 100):
        
        super(SentenceGenerator, self).__init__()
        
        self.min, self.max = min, max
        
        self.d_model = d_model
        
        self.n_heads = n_heads
        
        self.dropout = dropout
        
        self.activation = activation
        
        
        self.num_layers = num_layers
        
        self.num_features = num_features
        
        self.output_size = output_size
        
        self.latent_dim = latent_dim if not latent_dim is None else self.output_size
        
        
        self.pe = PositionalEncoding(self.latent_dim, self.d_model)
        
        self.encoder_layer = nn.TransformerEncoderLayer(self.d_model,
                                                        self.n_heads,
                                                        self.num_features,
                                                        self.dropout,
                                                        self.activation,
                                                        batch_first=True)
        
        self.encoder = nn.TransformerEncoder(self.encoder_layer, self.num_layers)
        
        self.output_layer = nn.Linear(self.d_model * self.latent_dim, output_size)
        
    def forward(self, input_, attention_mask):
        
        out = self.pe(input_).type_as(next(self.encoder.parameters()))
        
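        # `src_key_padding_mask` expects True at the padded positions, whereas the
        # attention mask holds 1 for real tokens, so it is inverted here
        out = self.encoder(out, src_key_padding_mask = attention_mask == 0).view(-1, self.latent_dim * self.d_model)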
            
        out = torch.clip(self.output_layer(out), self.min, self.max).round()
        
        return out
        

Let us test our generative model with dummy input.

In [7]:
generative_model = SentenceGenerator(output_size=dataset.max_len)

In [8]:
# the output must be rounded to the nearest integer and clipped between the lowest and the highest ids
g_output = generative_model(torch.randn((10, 379, 512)), mask)

g_output.size()

torch.Size([10, 379])
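
One detail worth checking when passing the dataset mask to the encoder: PyTorch's `src_key_padding_mask` marks the positions to ignore, so with a boolean mask `True` means padding. The dataset mask uses the opposite convention (1 for real tokens). The sketch below, with deliberately tiny and arbitrary dimensions, shows the conversion on a single encoder layer.

```python
import torch
from torch import nn

torch.manual_seed(0)

batch_size, seq_len, d_model = 2, 4, 8  # toy sizes, not the notebook's

# Attention mask in the dataset's convention: 1 = real token, 0 = padding
attention_mask = torch.tensor([[1., 1., 1., 0.],
                               [1., 1., 0., 0.]])

# Boolean padding mask in PyTorch's convention: True = position to ignore
padding_mask = attention_mask == 0

layer = nn.TransformerEncoderLayer(d_model, nhead=2, dim_feedforward=16,
                                   dropout=0.0, batch_first=True)

x = torch.randn(batch_size, seq_len, d_model)
out = layer(x, src_key_padding_mask=padding_mask)

print(padding_mask)
print(out.shape)  # torch.Size([2, 4, 8])
```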

### Discriminator

-----------------------------

**Remark**: We will use another type of discriminator than BERT. BERT embeds its input and therefore needs integer ids, so the gradient of the generator output would be lost in the forward pass through the BERT model. See, further below, another discriminator model implemented in the same way as the generator model. The BERT cells are kept commented out for reference.

As specified earlier, the discriminator was initially meant to be the pre-trained base BERT model. Let us import the necessary libraries.

In [9]:
# try:

#     from transformers import AdamW, BertForSequenceClassification, get_linear_schedule_with_warmup

# except ImportError:
    
#     !pip install transformers
    
#     from transformers import AdamW, BertForSequenceClassification, get_linear_schedule_with_warmup
    
# try:
    
#     from pytorch_pretrained_bert.optimization import BertAdam

# except ImportError:
    
#     !pip install pytorch_pretrained_bert
    
#     from pytorch_pretrained_bert.optimization import BertAdam

# from tqdm import tqdm, trange

Let us load the pre-trained BERT model.

In [10]:
# discriminator = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels = 1)

Let us initialize the optimizer group parameters. We need to add a weight decay rate to some parameters to avoid over-fitting.

In [11]:
# # recuperate the named parameters
# param_optimizer = list(discriminator.named_parameters())

# # identify the parameters with no decay
# no_decay = ['bias', 'LayerNorm.weight']

# # Filter the parameters
# optimizer_group_parameters = [
#     {
#         'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
#         'weight_decay_rate': 0.1
#     },
#     {
#         'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
#         'weight_decay_rate': 0.0
#     }
# ]


Let us now configure the optimizer. We will use the BERT version of the Adam optimizer, `BertAdam`:

In [12]:
# opt_d = BertAdam(optimizer_group_parameters,
#                  lr=2e-5,
#                  warmup=.1)

We will reuse the above configuration in the `GAN` model.

Let us test the discriminator with the generator output.

In [13]:
# discriminator(g_output, attention_mask = None)

It gave us an object containing the loss (the labels must be specified to obtain it) and the logits. The latter are the most important. We will give the discriminator the attention mask in addition to the output of the generator.

We must create a final discriminator model that includes a sigmoid in order to obtain probabilities instead of logits.

In [14]:
# # %%writefile wolof-translate/wolof_translate/models/discriminative_model.py
# from transformers import BertForSequenceClassification, get_linear_schedule_with_warmup
# from pytorch_pretrained_bert.optimization import BertAdam
# from tqdm import tqdm, trange
# from torch import nn


# class SentenceDiscriminator(nn.Module):
    
#     def __init__(self):
        
#         super(SentenceDiscriminator, self).__init__()
        
#         self.bert_model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels = 1)
        
#         self.sigmoid = nn.Sigmoid()
    
#     def forward(self, input_, attention_mask = None):
        
#         out = self.bert_model(input_, attention_mask = attention_mask).logits
        
#         out = self.sigmoid(out)
        
#         return out
        

---------------------------------------------------

Let us create a new discriminator model, different from the BERT model. It will take the output of the generator without converting it to a long tensor, since doing so would make us lose the gradient.

In [15]:
# %%writefile wolof-translate/wolof_translate/models/discriminative_model.py
from torch.nn import functional as F
from typing import *
from torch import nn

class DiscriminatorSequence(nn.Module):
    
    def __init__(self, 
                 input_dim,
                 num_features,
                 negative_slope: float = 0.01,
                 drop_out: float = 0.0,
                 eps: float = 0.00001,
                 momentum: float = 0.1):
        
        super(DiscriminatorSequence, self).__init__()
        
        self.batch_norm = nn.BatchNorm1d(input_dim, eps, momentum)
        
        self.linear = nn.Linear(input_dim, num_features)
        
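        # plain dropout: on a (batch, features) input `nn.Dropout1d` would read the
        # tensor as unbatched (channels, length) and zero out whole rows
        self.drop_out = nn.Dropout(drop_out)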
        
        self.activation = nn.LeakyReLU(negative_slope)
        
        
    def forward(self, input_):
        
        out = self.batch_norm(input_)
        
        out = self.activation(self.drop_out(self.linear(out)))
        
        return out

class SentenceDiscriminator(nn.Module):
    
    def __init__(self, 
                 input_dim: int,
                 num_features: Union[int, List] = 300,
                 num_layers: int = 5,
                 negative_slope: float = 0.01,
                 drop_out: float = 0.0,
                 eps: float = 0.00001,
                 momentum: float = 0.1):
        
        super(SentenceDiscriminator, self).__init__()
        
        self.input_dim = input_dim
        
        self.num_features = [num_features] * num_layers if type(num_features) is int else num_features
        
        assert len(self.num_features) == num_layers
        
        self.num_layers = num_layers
        
        self.sequences = nn.ModuleList()
        
        self.sequences.append(DiscriminatorSequence(input_dim, self.num_features[0], negative_slope, drop_out, eps, momentum))
        
        for l in range(1, num_layers):
            
            self.sequences.append(DiscriminatorSequence(self.num_features[l-1], self.num_features[l], negative_slope, drop_out, eps, momentum))
        
        self.output_layer = nn.Linear(self.num_features[-1], 1)
        
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, input_: torch.Tensor):
        
        out = input_
        
        for sequence in self.sequences:
            
            out = sequence(out)
        
        out = self.sigmoid(self.output_layer(out))
        
        return out
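
The gradient concern behind this design can be checked on a toy tensor. The sketch below is only an illustration with made-up values: casting to a long tensor cuts the autograd graph, whereas a float tensor stays connected. It also shows that `round()`, although it keeps the graph, has a zero gradient, which may be worth keeping in mind since the generator rounds its output.

```python
import torch

x = torch.tensor([0.3, 1.7, 2.2], requires_grad=True)
y = x * 2.0  # stands in for a generator output

# Integer tensors cannot carry gradients, so the cast detaches from the graph
print(y.long().requires_grad)   # False
print(y.requires_grad)          # True

# Rounding stays in the graph, but its derivative is zero wherever it is defined
y.round().sum().backward()
print(x.grad)                   # tensor([0., 0., 0.])
```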
        

### GAN Model

We will use `pytorch-lightning` to create and train the GAN model:

-----------------------

In [16]:
# # %%writefile wolof-translate/wolof_translate/models/gan_model.py
# from pytorch_lightning import LightningModule, Trainer
# from torch.nn import functional as F
# from torch.utils.data import Dataset
# from torch.optim import Adam 
# import IPython.display as ipd
# from typing import *
# from torch import nn
# import torch
# import os

# class SentenceGAN(LightningModule):
    
#     def __init__(
#         self,
#         latent_dim: int,
#         dataset: Dataset,
#         g_learning_rate: float,
#         d_learning_rate: float,
#         g_num_features: Union[int, list] = 300,
#         g_num_layers: int = 5,
#         g_negative_slope: float = 0.01,
#         g_drop_out: float = 0,
#         g_eps: float = 0.00001,
#         g_momentum: float = 0.1,
#         d_warmup: float = .1,
#         d_decay: float = .1,
#     ):
#         super().__init__()
        
#         self.save_hyperparameters()
        
#         # Initialize the dataset
#         self.dataset = dataset
        
#         # Initialize the generator and the discriminator
#         self.generator = SentenceGenerator(latent_dim,
#                                            dataset.max_len,
#                                            g_num_features,
#                                            g_num_layers,
#                                            g_negative_slope,
#                                            g_drop_out,
#                                            g_eps,
#                                            g_momentum
#                                            )
        
#         self.discriminator = SentenceDiscriminator()
    
#         # Generate a batch of 10 noisy data
#         self.noisy_data = torch.randn(10, self.hparams.latent_dim)
    
#     def forward(self, x):
        
#         return self.generator(x)
    
#     def adversarial_loss(self, y_pred, y):
        
#         return F.binary_cross_entropy(y_pred, y) # we can also use the binary cross entropy with logits if logits were returned
    
#     def training_step(self, batch, batch_idx, optimizer_idx):
        
#         # Recuperate the real ids and the masks from the batch
#         real_ids, attention_mask = batch
        
#         # Generate noisy data
#         noisy_data = torch.randn(real_ids.size(0), self.hparams.latent_dim)
        
#         noisy_data = noisy_data.type_as(next(self.parameters()))
        
#         if optimizer_idx == 0:
            
#             # Generate fake ids
#             fake_ids = self.convert(self(noisy_data))
#             print("fake ids 1")
#             print(fake_ids.requires_grad)
#             # We consider that the fake ids are real
#             y = torch.ones(real_ids.size(0), 1)
            
#             y = y.type_as(next(self.parameters()))
            
#             # Predict the veracity of the fake ids
#             y_pred = self.discriminator(fake_ids, attention_mask = attention_mask)
            
#             # Calculate the loss
#             loss = self.adversarial_loss(y_pred, y)
            
#             # Print the loss
#             self.log("g loss", loss, on_step=True, on_epoch=False)
            
#             return loss
        
#         elif optimizer_idx == 1:
            
#             # Generate fake ids
#             fake_ids = self.convert(self(noisy_data))
#             print("fake_ids 2")
#             print(fake_ids.requires_grad)
#             # We consider that the real ids as the true data
#             y_true = torch.ones(real_ids.size(0), 1).float()
            
#             y_true = y_true.type_as(next(self.parameters()))
            
#             # Predict the veracity of the true data
#             y_pred_true = self.discriminator(self.convert(real_ids), attention_mask = attention_mask)
            
#             if y_pred_true.ndim > 2:
                
#                 y_pred_true = y_pred_true.view(y_pred_true.size(0), 1)
            
#             # Calculate the loss on the real ids
#             real_loss = self.adversarial_loss(y_pred_true, y_true)
            
#             # Consider the fake ids to be false
#             y_false = torch.zeros(real_ids.size(0), 1)
            
#             y_false = y_false.type_as(next(self.parameters()))
            
#             # Predict the veracity of the false data
#             y_pred_false = self.discriminator(fake_ids, attention_mask = attention_mask)
            
#             # Calculate the loss on the fake ids
#             fake_loss = self.adversarial_loss(y_pred_false, y_false)
            
#             # Calculate the average loss
#             loss = (real_loss + fake_loss) / 2
            
#             # Print the loss
#             self.log("d loss", loss, on_step=True, on_epoch=False)
            
#             return loss
    
#     def configure_optimizers(self):
        
#         G_LR = self.hparams.g_learning_rate
#         D_LR = self.hparams.d_learning_rate
        
#         WARMUP = self.hparams.d_warmup
        
#         DECAY = self.hparams.d_decay
        
#         # recuperate the named parameters
#         param_optimizer = list(self.discriminator.bert_model.named_parameters())

#         # identify the parameters with no decay
#         no_decay = ['bias', 'LayerNorm.weight']

#         # Filter the parameters
#         optimizer_group_parameters = [
#             {
#                 'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
#                 'weight_decay_rate': DECAY
#             },
#             {
#                 'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
#                 'weight_decay_rate': 0.0
#             }
#         ]
        
#         opt_g = Adam(self.generator.parameters(), lr = G_LR)
        
#         opt_d = BertAdam(optimizer_group_parameters, lr = D_LR, warmup = WARMUP)
        
#         return [opt_g, opt_d], []
    
#     def on_train_epoch_end(self):
        
#         # recuperate the noisy data for prediction
#         noisy_data = self.noisy_data.type_as(self.generator.sequences[0].linear.weight)
        
#         generated_data = self(noisy_data).cpu().detach()
            
#         generated_data = generated_data.tolist()
        
#         print(f"Generated sentences at {self.current_epoch}")
        
#         for data in generated_data:
            
#             sentence = self.hparams.dataset.tokenizer.decode(data)
            
#             print(sentence)
        
#     def generate(self, number: int = 10):
        
#         # Generate noisy data
#         noisy_data = torch.randn(number, self.hparams.latent_dim)
        
#         noisy_data = noisy_data.type_as(self.generator.sequence[0].linear.weight)
        
#         # decode and return the decode sentences
#         generated_data = self(noisy_data).cpu().detach().tolist()
        
#         return [self.hparams.tokenizer.decode(data) for data in generated_data]
    
#     def convert(self, logits: torch.Tensor):
#         print("Before convert")
#         print(logits.requires_grad)
#         return logits.round().clip(0, self.hparams.dataset.max_id)


--------------------------------

In [17]:
# %%writefile wolof-translate/wolof_translate/models/gan_model.py
from pytorch_lightning import LightningModule, Trainer
from torch.nn import functional as F
from torch.utils.data import Dataset
from torch.optim import Adam 
import IPython.display as ipd
from typing import *
from torch import nn
import torch
import os

class SentenceGAN(LightningModule):
    
    def __init__(
        self,
        config: dict,
        dataset: Dataset,
        d_model: int = 512,
        latent_dim: Union[int, None] = None,
        g_num_features: int = 2048,
        n_heads: int = 8,
        g_dropout: float = 0.0,
        g_activation = F.relu,
        g_num_layers: int = 6,
        d_num_features: Union[int, list] = 500,
        d_num_layers: int = 3,
        d_negative_slope: float = 0.01,
        d_drop_out: float = 0,
        d_eps: float = 0.00001,
        d_momentum: float = 0.1
    ):
        super().__init__()
        
        self.save_hyperparameters()
        
        # Initialize the dataset
        self.dataset = dataset
        
        # Initialize the generator and the discriminator
        self.generator = SentenceGenerator(
            dataset.max_len,
            d_model,
            latent_dim,
            g_num_features,
            n_heads,
            g_dropout,
            g_activation,
            g_num_layers, min = 0, max = dataset.max_id
        )
        
        self.discriminator = SentenceDiscriminator(dataset.max_len,
                                                   d_num_features,
                                                   d_num_layers,
                                                   d_negative_slope,
                                                   d_drop_out,
                                                   d_eps,
                                                   d_momentum)
    
        # Generate a batch of 10 noisy data
        self.noisy_data = torch.randn(10, self.generator.latent_dim, self.hparams.d_model)
    
    def forward(self, x, mask):
        
        return self.generator(x, mask)
    
    def adversarial_loss(self, y_pred, y):
        
        return F.binary_cross_entropy(y_pred, y) # we can also use the binary cross entropy with logits if logits were returned
    
    def training_step(self, batch, batch_idx, optimizer_idx):
        
        # Retrieve the real ids and the masks from the batch
        real_ids, attention_mask = batch
        
        # Generate noisy data
        noisy_data = torch.randn(real_ids.size(0), self.generator.latent_dim, self.hparams.d_model)
        
        noisy_data = noisy_data.type_as(next(self.parameters()))
        
        if optimizer_idx == 0:
            
            # Generate fake ids
            fake_ids = self(noisy_data, attention_mask)
           
            # We consider that the fake ids are real
            y = torch.ones(real_ids.size(0), 1)
            
            y = y.type_as(next(self.parameters()))
            
            # Predict the veracity of the fake ids
            y_pred = self.discriminator(fake_ids)
            
            # Calculate the loss
            loss = self.adversarial_loss(y_pred, y)
            
            # Print the loss
            self.log("g_loss", loss, on_step=True, on_epoch=False)
            
            return loss
        
        elif optimizer_idx == 1:
            
            # Generate fake ids
            fake_ids = self(noisy_data, attention_mask)
          
            # We consider the real ids as the true data
            y_true = torch.ones(real_ids.size(0), 1)
            
            y_true = y_true.type_as(next(self.parameters()))
            
            # Predict the veracity of the true data
            y_pred_true = self.discriminator(real_ids)
            
            if y_pred_true.ndim > 2:
                
                y_pred_true = y_pred_true.view(y_pred_true.size(0), 1)
            
            # Calculate the loss on the real ids
            real_loss = self.adversarial_loss(y_pred_true, y_true)
            
            # Consider the fake ids to be false
            y_false = torch.zeros(real_ids.size(0), 1)
            
            y_false = y_false.type_as(next(self.parameters()))
            
            # Predict the veracity of the false data
            y_pred_false = self.discriminator(fake_ids)
            
            # Calculate the loss on the fake ids
            fake_loss = self.adversarial_loss(y_pred_false, y_false)
            
            # Calculate the average loss
            loss = (real_loss + fake_loss) / 2
            
            # Print the loss
            self.log("d_loss", loss, on_step=True, on_epoch=False)
            
            return loss
    
    def configure_optimizers(self):
        
        G_LR = self.hparams.config['g_learning_rate']
        D_LR = self.hparams.config['d_learning_rate']
        
        opt_g = Adam(self.generator.parameters(), lr = G_LR)
        
        opt_d = Adam(self.discriminator.parameters(), lr = D_LR)
        
        return [opt_g, opt_d], []
    
    def on_train_epoch_end(self):
        
        if (self.current_epoch + 1) % 3 == 0:
            
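            # retrieve the fixed noisy data used for prediction
            noisy_data = self.noisy_data.type_as(next(self.parameters()))
            
            # the generator also expects an attention mask: use all ones (no padding)
            mask = torch.ones(noisy_data.size(0), self.generator.latent_dim).type_as(noisy_data)
            
            generated_data = self(noisy_data, mask).cpu().detach().long()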
                
            generated_data = generated_data.tolist()
            
            print(f"\nGenerated sentences at epoch {self.current_epoch}")
            
            for data in generated_data:
                
                sentence = self.hparams.dataset.tokenizer.decode(data)
                
                print(sentence)
        
    def generate(self, number: int = 10):
        
        # Generate noisy data
        noisy_data = torch.randn(number, self.generator.latent_dim, self.hparams.d_model)
        
        noisy_data = noisy_data.type_as(next(self.parameters()))
        
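        # the generator also expects an attention mask: use all ones (no padding)
        mask = torch.ones(number, self.generator.latent_dim).type_as(noisy_data)
        
        # decode and return the decoded sentences
        generated_data = self(noisy_data, mask).cpu().detach().long().tolist()
        
        # the tokenizer belongs to the dataset, not to the hyperparameters
        return [self.dataset.tokenizer.decode(data) for data in generated_data]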
    
    def apply_mask(self, input_: torch.Tensor, attention_mask: torch.Tensor):
        
        return input_.masked_fill(attention_mask == 0, 0)
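
To make the two losses in `training_step` concrete, here is a small worked example in plain Python. The discriminator probabilities are invented for the illustration; `bce` is a hypothetical helper that mirrors what `F.binary_cross_entropy` computes with its default mean reduction.

```python
import math
from typing import List

def bce(probs: List[float], target: float) -> float:
    """Mean binary cross-entropy of `probs` against a constant target (0 or 1)."""
    losses = [-(target * math.log(p) + (1 - target) * math.log(1 - p)) for p in probs]
    return sum(losses) / len(losses)

# Made-up discriminator outputs for a batch of two sentences
pred_on_real = [0.9, 0.8]   # discriminator fairly sure the real ids are real
pred_on_fake = [0.2, 0.1]   # discriminator fairly sure the fake ids are fake

# Generator step: the fake ids are labelled as real (target 1)
g_loss = bce(pred_on_fake, 1.0)

# Discriminator step: real ids -> 1, fake ids -> 0, then the average of both losses
d_loss = (bce(pred_on_real, 1.0) + bce(pred_on_fake, 0.0)) / 2

print(f"g_loss = {g_loss:.4f}")
print(f"d_loss = {d_loss:.4f}")
```

With a confident discriminator, as in these invented numbers, the generator loss is large and the discriminator loss is small; the two move in opposite directions as the generator improves.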


### Training of the GAN and evaluation

Let us train the `GAN` model. Every third epoch it will generate 10 sentences, and we will check whether they are correct. Training will probably require many iterations, so let us set a maximum number of epochs (50 as a starting point; the trainer below uses 300) and increase it if necessary.

Let us import all the necessary libraries.

In [18]:
from ray.tune.integration.pytorch_lightning import TuneReportCallback
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_lightning.trainer import Trainer
import ray.tune as tune


In [20]:
# Initialize the gan model
gan_model = SentenceGAN(dataset = dataset, config = {'g_learning_rate': 1e-5, 'd_learning_rate': 1e-5})

Let us initialize the TensorBoard logger.

In [21]:

LOGGER = TensorBoardLogger(save_dir="gan_logs")

Let us configure the trainer. It will automatically save checkpoints locally so that we can resume the training at any time.

In [22]:
# this line loads the model from a saved checkpoint
# gan_model = SentenceGAN.load_from_checkpoint("data/checkpoints/generator/")

In [23]:

trainer = Trainer(
    logger=LOGGER,
    # fall back to the CPU when no GPU is available
    accelerator="gpu" if torch.cuda.is_available() else "cpu",
    max_epochs=300,
    devices=1,
    log_every_n_steps=300,
    default_root_dir="data/checkpoints/generator/"
)

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs


Let us train the `GAN` Model.

In [24]:
BATCH_SIZE = 5

# trainer.fit(gan_model, train_dataloaders=DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True))

Let us use Ray Tune together with the pytorch-lightning trainer to search for the best parameters.

In [25]:
# let us create the training function that Ray Tune will call for each trial
def train_generator(
    config, 
    dataset = dataset,
    num_epochs = 5,
    d_model: int = 512,
    latent_dim: Union[int, None] = None,
    g_num_features: int = 2048,
    n_heads: int = 8,
    g_dropout: float = 0.0,
    g_activation = F.relu,
    g_num_layers: int = 6,
    d_num_features: Union[int, list] = 500,
    d_num_layers: int = 3,
    d_negative_slope: float = 0.01,
    d_drop_out: float = 0,
    d_eps: float = 0.00001,
    d_momentum: float = 0.1
    ):
    
    gan_model = SentenceGAN(
        config, 
        dataset, 
        d_model, 
        latent_dim,
        g_num_features,
        n_heads,
        g_dropout,
        g_activation,
        g_num_layers,
        d_num_features,
        d_num_layers,
        d_negative_slope,
        d_drop_out,
        d_eps,
        d_momentum)
    
    loader = DataLoader(dataset, batch_size=10, shuffle=True)
    
    metrics = {'d_loss': 'd_loss', 'g_loss': 'g_loss'}
    
    trainer = Trainer(
        logger=LOGGER,
        max_epochs=num_epochs,
        # fall back to the CPU when no GPU is available
        accelerator="gpu" if torch.cuda.is_available() else "cpu",
        devices=1,
        # assumption: "train_end" is the trigger name Ray Tune accepts for the end of training
        callbacks=[TuneReportCallback(metrics, on="train_end")]
    )
    
    trainer.fit(gan_model, loader)
    
    
    
    

Let us define a search space.
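
The search space below samples both learning rates with `tune.loguniform`. As a rough picture of what that means, the following sketch draws values whose logarithm is uniform between the bounds, so each decade between 1e-5 and 1e-1 is about equally likely. `sample_loguniform` is a hypothetical stand-in written with the standard library, not Ray Tune's implementation.

```python
import math
import random

def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose natural log is uniform on [log(low), log(high)]."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
samples = [sample_loguniform(1e-5, 1e-1, rng) for _ in range(1000)]

# Roughly a quarter of the draws should land in each of the four decades
below_1e4 = sum(s < 1e-4 for s in samples)
print(min(samples), max(samples), below_1e4)
```

A plain uniform draw over the same interval would almost never propose a learning rate below 1e-3, which is why a log scale is the usual choice for this kind of parameter.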

In [26]:
num_samples = 10

num_epochs = 5

config = {
    'g_learning_rate': tune.loguniform(1e-5, 1e-1),
    'd_learning_rate': tune.loguniform(1e-5, 1e-1)
}

trainable = tune.with_parameters(
    train_generator, dataset = dataset, num_epochs = num_epochs
)

analysis = tune.run(
    trainable,
    # the metric must be one of the names reported by the callback ("g_loss" or "d_loss")
    metric="g_loss",
    mode = 'min',
    resources_per_trial={
        'cpu': 1,
        'gpu': 1
    },
    config = config,
    num_samples=num_samples,
    name="tune_generator" 
)

print(analysis.best_config)

2023-04-24 17:00:48,894	INFO worker.py:1553 -- Started a local Ray instance.

(raylet) C:\Python\Python310\python.exe: can't open file 'c:\\Users\\Oumar\\ Kane\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pytorch1-HleOW5am-py3.10\\lib\\site-packages\\ray\\_private\\workers\\default_worker.py': [Errno 2] No such file or directory
(raylet) [2023-04-24 17:02:00,085 E 39400 26796] (raylet.exe) worker_pool.cc:525: Some workers of the worker process(38480) have not registered within the timeout. The process is dead, probably it crashed during start.
(raylet) C:\Python\Python310\python.exe: can't open file 'c:\\Users\\Oumar\\ Kane\\AppData\\Local\\pypoetry\\Cache\\virtualenvs\\pytorch1-HleOW5am-py3.10\\lib\\site-packages\\ray\\_private\\workers\\default_worker.py': [Errno 2] No such file or directory
(raylet) [2023-04-24 17:03:00,115 E 39400 26796] (raylet.exe) worker_pool.cc:525: Some workers of the worker process(9560) have not registered within the timeout. The process is dead, probably it crashed during start.
