 ## To use GPU in Google Colab
 Go to `Runtime` -> `Change runtime type` and select GPU.

In [None]:
# You may uncomment and use the command below to view info of the GPU.
# !nvidia-smi


 The following Python libraries are required for this part, and have been tested on Python 3.9 and Python 3.7.
 If you use Google Colab, PyTorch and SciPy are already installed, so you probably just want to install PyTorch Lightning.
  - [PyTorch](https://pytorch.org/get-started/locally/) (tested with 1.10.1 and with 1.10.0)
  - [PyTorch Lightning](https://pypi.org/project/pytorch-lightning/) (tested with 1.5.8)
  - [SciPy](https://scipy.org/install/) (tested with 1.7.3 and with 1.4.1)
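One way to compare your environment with the tested versions listed above is to read installed package metadata. The helper below is a hypothetical sketch (`check_versions` and `TESTED` are not part of the assignment). It relies on `importlib.metadata`, which requires Python 3.8 or later, and before comparing it drops local build tags such as `+cu111`, which CUDA builds of PyTorch carry.

```python
from importlib import metadata  # requires Python 3.8+

# Hypothetical mapping built from the tested versions listed above.
TESTED = {
    "torch": ["1.10.1", "1.10.0"],
    "pytorch-lightning": ["1.5.8"],
    "scipy": ["1.7.3", "1.4.1"],
}


def check_versions(tested):
    """Return {package: (installed_version_or_None, matches_a_tested_version)}."""
    report = {}
    for package, versions in tested.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None
        # Drop local build tags such as "+cu111" before comparing.
        base = installed.split("+")[0] if installed else None
        report[package] = (installed, base in versions)
    return report


for package, (installed, ok) in check_versions(TESTED).items():
    print(f"{package}: {installed or 'not installed'} ({'tested' if ok else 'untested'})")
```

An "untested" version does not mean the notebook will fail; it only means it differs from the versions listed above.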


In [2]:
# Install PyTorch Lightning (needed on Google Colab); comment out the line below if it is already installed.
!pip install pytorch-lightning==1.5.8


 You may uncomment and run the code cell below to download the data.  Otherwise, you may download the data [here](https://drive.google.com/file/d/1thWkUj7uGOApr_dXRvMr9TsEHpo_H_2q/view?usp=sharing). 

If you are running this in Google Colab, make sure to upload any files you generated with the a2_sklearn code (e.g. unigram_vocab.json) into the data directory (`data` by default; see `--data_dir` below) so that they are accessible.

In [None]:
# !pip install gdown
# !gdown --id 1thWkUj7uGOApr_dXRvMr9TsEHpo_H_2q -O sst2.zip
# !mkdir -p data
# !unzip sst2.zip -d data
# !rm sst2.zip
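Before training, it can help to confirm that the expected files are in the data directory. The file names below are taken from the data-loading code later in this notebook (`sst2.<split>`, `<split>_labels.npz`, `<split>_<feature_name>_features.npz`, and the vocabulary JSON), and the splits are `train`, `dev`, and `test`. The helper `missing_data_files` is a hypothetical convenience, and `unigram_binary` is only the example feature name used below.

```python
from pathlib import Path
from typing import List

SPLITS = ("train", "dev", "test")  # split names used by the dataloaders


def missing_data_files(data_dir: str,
                       feature_name: str = "unigram_binary",
                       vocab_filename: str = "unigram_vocab.json") -> List[str]:
    """Return the expected files that are absent from `data_dir` (hypothetical helper)."""
    root = Path(data_dir)
    expected = [vocab_filename]
    for split in SPLITS:
        expected += [
            f"sst2.{split}",                         # raw text, used by the DAN model
            f"{split}_labels.npz",                   # labels from a2_sklearn
            f"{split}_{feature_name}_features.npz",  # sparse features from a2_sklearn
        ]
    return [name for name in expected if not root.joinpath(name).exists()]


print(missing_data_files("data") or "All expected files found.")
```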



 You may use the helper function below for feature weight analysis (sections 1.1.2 and 1.2.2).

In [3]:
def print_important_weights(weights, words):
    """
    Print important pairs of weights and words.
    # Parameters
    weights : `Iterable`, required.
        Weights from a learned model.
    words : `Iterable`, required.
        Word types of the vocabulary.  
        It must be true that `len(weights) == len(words)`.
    # Returns
        `None`
    """

    def print_pairs(pairs):
        for weight, word in pairs:
            print("{: .4f} | {}".format(weight, word))

    assert len(weights) == len(words)
    pairs = list(zip(weights, words))
    pairs = sorted(pairs, key=lambda x: x[0], reverse=True)
    print("Most positive words:")
    print_pairs(pairs[:10])
    print("\nMost negative words:")
    print_pairs(reversed(pairs[-10:]))

    # Works for lists as well as NumPy arrays of weights.
    pairs = list(zip([abs(w) for w in weights], words))
    pairs = sorted(pairs, key=lambda x: x[0])
    print("\nMost neutral words:")
    print_pairs(pairs[:10])
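To make the three rankings concrete: "most positive" and "most negative" come from sorting by the signed weight, while "most neutral" sorts by absolute weight, so the words closest to zero come first. Below is a minimal pure-Python sketch on made-up toy weights; the function `rank_words` and the data are illustrative only.

```python
def rank_words(weights, words, k=2):
    """Return (most_positive, most_negative, most_neutral) word lists of length k."""
    pairs = sorted(zip(weights, words), key=lambda p: p[0], reverse=True)
    positive = [w for _, w in pairs[:k]]
    negative = [w for _, w in reversed(pairs[-k:])]  # most negative first
    neutral = [w for _, w in sorted(pairs, key=lambda p: abs(p[0]))[:k]]
    return positive, negative, neutral


toy_weights = [2.1, -1.7, 0.05, 0.9, -0.02, -3.0]
toy_words = ["great", "dull", "the", "fun", "a", "awful"]
print(rank_words(toy_weights, toy_words))
# (['great', 'fun'], ['awful', 'dull'], ['a', 'the'])
```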



 # PyTorch specific part
 ## 1.2.1 Build a Torch Logistic Regression Model
 Note that the part below uses the feature and vocabulary files you created with `a2_sklearn`.
 You may reuse the code from there, or simply make sure the code below points to the right directory and files.

In [None]:
import argparse
from argparse import ArgumentParser
from datetime import datetime
import json
import logging
from pathlib import Path
import shutil
from typing import Dict, List, Tuple, Type

import numpy as np
import pytorch_lightning as pl
from pytorch_lightning import loggers as pl_loggers
from scipy import sparse
import torch
from torch import nn
from torch.utils.data.dataloader import DataLoader
from torch.utils.data.dataset import Dataset
from torchmetrics import Accuracy


class LogisticRegressionModel(nn.Module):
    """
    Logistic regression binary classification model
    """

    def __init__(self, num_features):
        """
        # Parameters
        num_features : `int`, required.
            Number of the features.
        # Returns
            `None`
        """
        super().__init__()
        # Hw-TODO: Add a linear layer to weight the features.
        #          You may assign the layer to `self.linear`.

    def forward(self, features):
        """
        Returns the probabilities of the positive class given features.
        Model predictions (0 or 1) are obtained by thresholding these probabilities.
        # Parameters
        features : `torch.FloatTensor`, required.
            The tensor of features with the shape (batch_size, num_of_features)
        # Returns
        probs : `torch.FloatTensor`
            The tensor of probabilities with the shape (batch_size, 1) or (batch_size,)
        """
        # Hw-TODO: Use `self.linear` you created in `__init__`
        #          and appropriate nonlinearity/activation-function to compute
        #          and return the probabilities of belonging to a class in the logistic regression.

        return probs  # you will define this variable in the preceding code.
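For reference, binary logistic regression computes `p = sigmoid(w · x + b)` with `sigmoid(z) = 1 / (1 + exp(-z))`, and a prediction is 1 when `p` reaches the threshold (0.5 here). The pure-Python sketch below spells this out for a single example without PyTorch; all names are illustrative. In PyTorch the same computation is commonly written as one `nn.Linear(num_features, 1)` layer followed by `torch.sigmoid`.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for very negative z
    return e / (1.0 + e)


def logistic_prob(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Probability of the positive class: sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def predict(prob: float, threshold: float = 0.5) -> int:
    """Threshold a probability into a 0/1 label."""
    return int(prob >= threshold)


w, b = [1.5, -2.0, 0.5], 0.0
x = [1.0, 0.0, 1.0]             # binary features, e.g. unigram presence
p = logistic_prob(w, b, x)      # sigmoid(2.0)
print(round(p, 3), predict(p))  # 0.881 1
```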


 ## Generic binary classifier as a PyTorch Lightning module
 Run this cell, then return to the logistic regression model above to build it.
 Reading this cell is still worthwhile: it shows how PyTorch Lightning works and will help you prepare for your own project.

In [None]:
class BinaryClassificationLModule(pl.LightningModule):

    def __init__(self, **kwargs):
        super().__init__()

        # Save arguments to `hparams` attribute, see the doc [here](https://pytorch-lightning.readthedocs.io/en/stable/common/hyperparameters.html).
        self.save_hyperparameters()
        data_dir = Path(self.hparams.data_dir)
        # The path `data_dir.joinpath(self.hparams.vocab_filename)` should point to unigram_vocab.json that you have generated with your code from a2_sklearn.ipynb
        # You can configure the path through `args_str`.  See more info below where the class method `add_model_specific_args` is defined and where `args_str` is used.
        self.hparams.vocab = json.load(
            open(data_dir.joinpath(self.hparams.vocab_filename)))
        self.hparams.vocab_size = len(self.hparams.vocab)

        self.model = self.get_model()
        self.step_count = 0
        self.accuracy = Accuracy()

    def forward(self, *args, **kwargs):
        return self.model(*args, **kwargs)

    def training_step(self, batch, batch_idx):
        input = self.batch2input(batch)
        labels = self.batch2labels(batch)
        probs = self(**input)
        probs = probs.squeeze()

        # Hw-TODO: Given probs in shape (batch_size,)
        #          and labels of the same shape,
        #          compute the binary cross entropy loss.
        #          The `probs`, for example, can be from the function call of
        #          an instance of `LogisticRegressionModel` above.
        loss = 

        self.log('train_loss', loss, prog_bar=True)
        self.log('train_acc', self.accuracy(probs, labels.int()), prog_bar=True)
        output_dict = {'loss': loss}
        return output_dict

    def validation_step(self, batch, batch_idx):
        input = self.batch2input(batch)
        labels = self.batch2labels(batch)
        probs = self(**input)
        probs = probs.squeeze()

        # Hw-TODO: Given probs in shape (batch_size,)
        #          and labels of the same shape,
        #          compute the binary cross entropy loss.
        loss = 

        self.log('val_loss', loss)
        self.log('val_acc', self.accuracy(probs, labels.int()))

    def test_step(self, batch, batch_idx):
        input = self.batch2input(batch)
        labels = self.batch2labels(batch)
        probs = self(**input)
        probs = probs.squeeze()

        # Hw-TODO: Given probs in shape (batch_size,)
        #          and labels of the same shape,
        #          compute the binary cross entropy loss.
        loss = 

        self.log('test_loss', loss)
        self.log('test_acc', self.accuracy(probs, labels.int()))

    def configure_optimizers(self):
        if self.hparams.optimizer == 'sgd':
            optimizer = torch.optim.SGD(self.model.parameters(),
                                        lr=self.hparams.learning_rate)
        elif self.hparams.optimizer == 'adam':
            optimizer = torch.optim.Adam(self.model.parameters(),
                                         lr=self.hparams.learning_rate)
        else:
            raise NotImplementedError
        return optimizer

    def train_dataloader(self):
        return self.get_dataloader('train', self.hparams.train_batch_size, shuffle=True)

    def val_dataloader(self):
        return self.get_dataloader('dev', self.hparams.eval_batch_size, shuffle=False)

    def test_dataloader(self):
        return self.get_dataloader('test', self.hparams.eval_batch_size, shuffle=False)

    def get_model(self) -> nn.Module:
        # To be overridden by inherited classes.
        raise NotImplementedError

    def batch2input(self, batch: Tuple[torch.Tensor]) -> Dict[str, torch.Tensor]:
        # To be overridden by inherited classes.
        raise NotImplementedError

    def batch2labels(self, batch: Tuple[torch.Tensor]) -> torch.Tensor:
        # To be overridden by inherited classes.
        raise NotImplementedError

    def get_dataloader(self,
                       split: str,
                       batch_size: int,
                       shuffle: bool = False) -> DataLoader:
        # To be overridden by inherited classes.
        raise NotImplementedError

    @classmethod
    def add_model_specific_args(cls, parser: ArgumentParser) -> ArgumentParser:
        """
        Add arguments to the parser and return the parser.
        See (https://pytorch-lightning.readthedocs.io/en/stable/common/hyperparameters.html)
        for usage of this method.
        """
        # Required arguments:
        parser.add_argument('--vocab_filename',
                            default=None,
                            type=str,
                            required=True,
                            help="File name of the vocabulary JSON (e.g. unigram_vocab.json) in data_dir.")
        # Optional arguments:
        parser.add_argument('--optimizer',
                            default='adam',
                            type=str,
                            help="The optimizer to use, such as sgd or adam.")
        parser.add_argument('--learning_rate',
                            default=1e-3,
                            type=float,
                            help="The initial learning rate for training.")
        parser.add_argument('--max_epochs',
                            default=10,
                            type=int,
                            help="The number of epochs to train your model.")
        parser.add_argument('--train_batch_size', default=32, type=int)
        parser.add_argument('--eval_batch_size', default=32, type=int)
        parser.add_argument('--seed',
                            type=int,
                            default=42,
                            help="The random seed for initialization")
        parser.add_argument('--do_train',
                            action="store_true",
                            default=True,
                            help="Whether to run training.")
        parser.add_argument('--do_predict',
                            action="store_true",
                            help="Whether to run predictions on the test set.")
        parser.add_argument('--data_dir',
                            default="data",
                            type=str,
                            help="The input data dir. Should contain the training files.")
        parser.add_argument('--output_dir',
                            type=str,
                            help=("The output directory where the model predictions "
                                  "and checkpoints will be written."))
        # NOTE: Set --gpus 0, or change the default value to 0, if you are not using GPUs.
        # See this [link](https://pytorch-lightning.readthedocs.io/en/latest/accelerators/gpu.html) for usage of this argument.
        parser.add_argument('--gpus',
                            default=1,
                            type=int,
                            help="The number of GPUs allocated for this, 0 meaning none")
        parser.add_argument('--num_workers',
                            default=8,
                            type=int,
                            help="Number of worker processes for each PyTorch `DataLoader`.")
        return parser


def generic_train(args: argparse.Namespace,
                  model_class: Type[pl.LightningModule]) -> Dict:
    """
        Train (and optionally predict) and return dict results.
        # Parameters
        args : `argparse.Namespace`, required.
            Configuration of the training and the model
        model_class : `Type[pl.LightningModule]`, required.
            Class of the model to be trained.
        # Returns
        A `dict` object containing the following keys and types.
            trainer: `pl.Trainer`
            model: `pl.LightningModule`
            val_results_best: `list[dict]`
                If `args.do_predict==True`
            test_results_best: `list[dict]`
                If `args.do_predict==True`
            best_model_path: `Path`
                If `args.do_train==True`. Path to the checkpoint of the best model.
        """
    pl.seed_everything(args.seed)

    tensorboard_log_dir = Path(args.output_dir).joinpath('tensorboard_logs')
    tensorboard_log_dir.mkdir(parents=True, exist_ok=True)

    # Tensorboard logger
    tensorboard_logger = pl_loggers.TensorBoardLogger(
        save_dir=tensorboard_log_dir,
        version='version_' + datetime.now().strftime('%Y%m%d-%H%M%S'),
        name='',
        default_hp_metric=True)
    # Checkpoint callback
    checkpoint_dir = Path(args.output_dir).joinpath(tensorboard_logger.version,
                                                    'checkpoints')
    checkpoint_callback = pl.callbacks.ModelCheckpoint(dirpath=checkpoint_dir,
                                                       filename='{epoch}-{val_acc:.2f}',
                                                       monitor='val_acc',
                                                       mode='max',
                                                       save_top_k=1,
                                                       verbose=True)

    dict_args = vars(args)
    model = model_class(**dict_args)
    trainer = pl.Trainer.from_argparse_args(args,
                                            logger=tensorboard_logger,
                                            callbacks=[checkpoint_callback])

    output_dict = {'trainer': trainer, 'model': model}

    if args.do_train:
        trainer.fit(model=model)
        # Track model performance under different hparams settings in the "Hparams" tab of TensorBoard
        tensorboard_logger.log_hyperparams(
            params=model.hparams,
            metrics={'hp_metric': checkpoint_callback.best_model_score.item()})
        tensorboard_logger.save()

        # Save the best model to `best_model.ckpt`
        best_model_path = checkpoint_dir.joinpath('best_model.ckpt')
        logger.info(f"Copy best model from {checkpoint_callback.best_model_path} "
                    f"to {best_model_path}.")
        shutil.copy(checkpoint_callback.best_model_path, best_model_path)

        output_dict.update({
            'trainer': trainer,
            'model': model,
            'best_model_path': best_model_path
        })

    # Optionally, predict on test set.
    if args.do_predict:
        best_model_path = checkpoint_dir.joinpath('best_model.ckpt')
        model = model.load_from_checkpoint(best_model_path)
        val_results_best = trainer.validate(model, verbose=True)
        test_results_best = trainer.test(model, verbose=True)
        print("Validation accuracy on the best model: {: .4f}".format(
            val_results_best[0]['val_acc']))
        print("Test       accuracy on the best model: {: .4f}".format(
            test_results_best[0]['test_acc']))
        output_dict.update({
            'val_results_best': val_results_best,
            'test_results_best': test_results_best,
        })

    return output_dict
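The `loss =` placeholders in `training_step`, `validation_step`, and `test_step` ask for binary cross-entropy. For probabilities `p` and labels `y` in {0, 1}, the mean loss is `-(1/N) * sum(y*log(p) + (1-y)*log(1-p))`. The pure-Python sketch below computes it and clamps `p` away from 0 and 1 to avoid `log(0)`; the `eps` value is an arbitrary choice here. In PyTorch, `torch.nn.functional.binary_cross_entropy(probs, labels)` (or `nn.BCELoss`) computes the same mean by default and expects float labels with the same shape as `probs`.

```python
import math
from typing import Sequence


def binary_cross_entropy(probs: Sequence[float],
                         labels: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    assert len(probs) == len(labels), "probs and labels must have the same length"
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return -total / len(probs)


print(binary_cross_entropy([0.5, 0.5], [1.0, 0.0]))  # log(2), about 0.6931
print(binary_cross_entropy([0.9, 0.2], [1.0, 0.0]))  # confident and correct: lower loss
```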



 ## Binary classifier based on curated features.
 This is a subclass of the generic `BinaryClassificationLModule` defined above.

In [None]:
class FeatureBasedBinaryClassificationLModule(BinaryClassificationLModule):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def get_model(self) -> nn.Module:
        return LogisticRegressionModel(num_features=self.hparams.vocab_size)

    def batch2input(self, batch):
        return {'features': batch[0]}

    def batch2labels(self, batch):
        return batch[1]

    def get_dataloader(self,
                       split: str,
                       batch_size: int,
                       shuffle: bool = False) -> DataLoader:
        # NOTE: In order to use different features, change feature_name by
        # passing `--feature_name <feature_name>` in the training loop in
        # the cell below, or revise the code here for correct paths if needed.
        data_dir = Path(self.hparams.data_dir)
        features_filepath = data_dir.joinpath(
            f"{split}_{self.hparams.feature_name}_features.npz")
        labels_filepath = data_dir.joinpath(split + "_labels.npz")
        # `toarray()` returns a plain ndarray (rather than an np.matrix).
        features = sparse.load_npz(features_filepath).toarray()
        labels = np.load(labels_filepath, allow_pickle=True)["arr_0"]
        dataset = torch.utils.data.TensorDataset(
            torch.from_numpy(features).float(),
            torch.from_numpy(labels).float())

        logger.info(f"Loaded {split} features and labels "
                    f"from {features_filepath} and {labels_filepath}")
        data_loader = torch.utils.data.DataLoader(dataset=dataset,
                                                  batch_size=batch_size,
                                                  shuffle=shuffle,
                                                  num_workers=self.hparams.num_workers)
        return data_loader

    @classmethod
    def add_model_specific_args(cls, parser: ArgumentParser) -> ArgumentParser:
        parser = super().add_model_specific_args(parser)
        # Required arguments:
        parser.add_argument('--feature_name',
                            default=None,
                            type=str,
                            required=True,
                            help="Name of the feature")
        # Optional arguments:
        parser.add_argument('--task',
                            default='featurebinarycls',
                            type=str,
                            help="Name of the task.")
        return parser
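`add_model_specific_args` follows a simple pattern: the subclass first lets the parent register its arguments, then adds its own, and the notebook later parses a whitespace-split `args_str` instead of the real command line. The stdlib-only sketch below mimics that pattern with two hypothetical classes (`Base`, `Child`) and a trimmed subset of the flags defined above; it uses `demo_` names so it does not overwrite the notebook's `parser` and `args`. It also shows a quirk of `--do_train`: because it uses `action="store_true"` together with `default=True`, it is `True` whether or not the flag is passed.

```python
from argparse import ArgumentParser


class Base:
    @classmethod
    def add_model_specific_args(cls, parser: ArgumentParser) -> ArgumentParser:
        parser.add_argument('--vocab_filename', type=str, required=True)
        parser.add_argument('--learning_rate', type=float, default=1e-3)
        parser.add_argument('--do_train', action="store_true", default=True)
        return parser


class Child(Base):
    @classmethod
    def add_model_specific_args(cls, parser: ArgumentParser) -> ArgumentParser:
        parser = super().add_model_specific_args(parser)  # parent's arguments first
        parser.add_argument('--feature_name', type=str, required=True)
        return parser


demo_parser = Child.add_model_specific_args(ArgumentParser())
demo_args_str = ("--vocab_filename unigram_vocab.json "
                 "--feature_name unigram_binary --learning_rate 0.01")
demo_args = demo_parser.parse_args(demo_args_str.split())
print(vars(demo_args))
```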



 # Training loop for the feature-based binary logistic regression model
 Replace `unigram_binary` in the `args_str = ...` assignment below
 with whichever feature you are experimenting with.
 You can also configure the other options listed in `add_model_specific_args` of
 the PyTorch Lightning module `FeatureBasedBinaryClassificationLModule` (which extends those of `BinaryClassificationLModule`).

In [None]:
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S",
                    level=logging.INFO)
logger = logging.getLogger(__name__)

# Load hyperparameters
parser = ArgumentParser()
parser = FeatureBasedBinaryClassificationLModule.add_model_specific_args(parser)

# IMPORTANT: here we reuse unigram_vocab.json and the feature files generated by a2_sklearn.ipynb.
# Read `get_dataloader` in `FeatureBasedBinaryClassificationLModule`
# to see how the data are loaded.
# NOTE: Replace `unigram_binary` in the `args_str = ...` assignment below
# with whichever feature you are experimenting with.
# You can also configure other options listed in `add_model_specific_args` of
# the PyTorch Lightning module `FeatureBasedBinaryClassificationLModule`.
args_str = ("--vocab_filename unigram_vocab.json --feature_name unigram_binary "
            "--output_dir output/ftrlogistic --optimizer adam --do_train --do_predict ")

args = parser.parse_args(args_str.split())

# If output_dir not provided, a folder is generated
if args.output_dir is None:
    args.output_dir = str(
        Path('output').joinpath(
            f"{args.task}_{datetime.now().strftime('%Y%m%d-%H%M%S')}"))
Path(args.output_dir).mkdir(parents=True, exist_ok=True)

print(f"Parsed arguments: {args}")

training_output = generic_train(args=args,
                                model_class=FeatureBasedBinaryClassificationLModule)


 ## 1.2.2. Feature weight analysis

In [None]:
model = training_output['model']
best_model_path = training_output['best_model_path']
data_dir = Path(args.data_dir)

model = model.load_from_checkpoint(best_model_path)
# Move the weights to the CPU before converting them to NumPy.
weights = model.model.linear.weight.squeeze().detach().cpu().numpy()
with open(data_dir.joinpath(args.vocab_filename)) as f:
    vocab = json.load(f)
# Order words by their feature index so that words[i] matches weights[i].
words = sorted(vocab, key=vocab.get)
print_important_weights(weights=weights, words=words)


 # View model training curves

In [None]:
# You may uncomment and run the commands below to use Tensorboard in a notebook.
# %load_ext tensorboard
# %tensorboard --logdir output/ftrlogistic


 ## 3. Deep Averaging Networks (DAN)
 ### Build a Torch Model of Deep Averaging Networks (DAN)
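A DAN averages the word embeddings of a sentence and passes that average through a stack of feed-forward layers before a final output layer. The sketch below lists the `(in_features, out_features)` shape of each linear layer under one plausible reading of the arguments. This is an assumption, not a requirement of the assignment: every intermediate layer maps to `hidden_size`, and the output layer maps to a single unit whose value goes through a sigmoid. With `num_intermediate_layers=0`, the averaged embedding feeds the output layer directly.

```python
from typing import List, Tuple


def dan_layer_shapes(word_embedding_size: int,
                     hidden_size: int,
                     num_intermediate_layers: int) -> List[Tuple[int, int]]:
    """(in_features, out_features) for each linear layer of a DAN, ending in 1 output unit.

    Assumes every intermediate layer outputs `hidden_size` units (an assumption).
    """
    if num_intermediate_layers < 0:
        raise ValueError("num_intermediate_layers must be >= 0")
    shapes = []
    in_size = word_embedding_size
    for _ in range(num_intermediate_layers):
        shapes.append((in_size, hidden_size))
        in_size = hidden_size
    shapes.append((in_size, 1))  # output layer: one unit, later passed through a sigmoid
    return shapes


print(dan_layer_shapes(300, 300, 0))  # [(300, 1)]
print(dan_layer_shapes(300, 200, 2))  # [(300, 200), (200, 200), (200, 1)]
```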

In [None]:
class DeepAveragingNetworksModel(nn.Module):

    def __init__(self,
                 vocab,
                 vocab_size: int,
                 word_embedding_size: int,
                 hidden_size: int,
                 num_intermediate_layers: int,
                 dropout_rate: float,
                 use_glove: bool = False):
        """
        # Parameters
        vocab : `dict[str, int]`, required.
            A map from the word type to the index of the word.
        vocab_size : `int`, required.
            Size of the vocabulary.
        word_embedding_size : `int`, required.
            Size of word embeddings.
        hidden_size : `int`, required.
            Size of hidden layer or number of hidden units per layer.
        num_intermediate_layers : `int`, required.
            Number of intermediate layers; must be a non-negative integer (0 or greater).
        dropout_rate : `float`, required.
            Dropout rate.
        use_glove : `bool`, optional.
            Whether or not to use GloVe embeddings instead of randomly initialized ones.
        """
        super().__init__()
        # Return zero vector for input with padding_idx (0)
        self.embedding = nn.Embedding(vocab_size, word_embedding_size, padding_idx=0)
        if use_glove:
            self.load_glove(vocab, word_embedding_size)

        # Hw-TODO: Add the intermediate layers, output layer, dropout layer,
        #          and activation function according to DAN.
        #          You may find [nn.ModuleList](https://pytorch.org/docs/stable/generated/torch.nn.ModuleList.html)
        #          useful to have multiple intermediate layers.

    def forward(self, input_ids, lengths):
        """
        # Parameters
        input_ids : `torch.Tensor`, required.
            Tensor of shape (batch_size, feature_length).
            Each row is a datapoint represented by input words.
        lengths : `torch.Tensor`, required.
            Tensor of shape (batch_size, 1). Token length of input text.
            Used to compute average word embeddings.
        # Returns
        probs : `torch.Tensor`
            Tensor of shape (batch_size,)
        """
        out = self.embedding(input_ids)  # shape: (batch_sz, max_len, embedding_sz)

        # Hw-TODO: Use the intermediate layers, output layer, dropout layer,
        #          and activation function you created in __init__,
        #          plus an appropriate nonlinearity for the output layer,
        #          to compute the probability of the positive class; assign these
        #          probabilities to a variable named "probs".

        return probs # you will define this variable in the preceding code.

    def load_glove(self, vocab, word_embedding_size):
        logger.info("Load glove pretrained word embeddings")
        # Hw-TODO: [extra credit] Load GloVe vectors into self.embedding;
        #          you may find [load_state_dict](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.load_state_dict) useful.
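The `lengths` argument exists because padded positions should not count toward the average. Since the embedding layer uses `padding_idx=0`, the padding row starts as a zero vector and receives no gradient, so (as long as that row stays zero) summing over all positions and dividing by the true length gives the mean over real tokens only. In the batched PyTorch version the same idea is `out.sum(dim=1) / lengths`, where `lengths` of shape `(batch_size, 1)` broadcasts against the summed `(batch_size, embedding_size)` tensor. A pure-Python sketch with a made-up 2-dimensional `toy_embeddings` table:

```python
from typing import List, Sequence


def average_embeddings(token_ids: Sequence[int],
                       length: int,
                       table: Sequence[Sequence[float]]) -> List[float]:
    """Mean embedding over the real (non-padding) tokens.

    Relies on row 0 (the padding id) being all zeros, so padded positions
    add nothing to the sum; the divisor is the true length, not max_len.
    """
    dim = len(table[0])
    total = [0.0] * dim
    for t in token_ids:  # includes padding positions
        for d in range(dim):
            total[d] += table[t][d]
    return [v / length for v in total]


toy_embeddings = [
    [0.0, 0.0],  # 0: <pad>, a zero vector as with padding_idx=0
    [0.5, 0.5],  # 1: <unk>
    [1.0, 3.0],  # 2
    [3.0, 1.0],  # 3
]
print(average_embeddings([2, 3, 0, 0], length=2, table=toy_embeddings))  # [2.0, 2.0]
```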



 ## Binary classifier based on Deep Averaging Networks
 `DeepAveragingBinaryClassificationLModule` is a subclass of the generic `BinaryClassificationLModule`.

In [None]:
from nltk.tokenize import WordPunctTokenizer


class SST2Dataset(Dataset):
    """
    Dataset that tokenizes and encodes SST-2 input text on the fly.
    """

    def __init__(self, vocab, data, tokenizer):
        self.data = data
        self.vocab = vocab
        self.max_len = 50  # assigned based on length analysis of training set
        self.tokenizer = tokenizer

    def __getitem__(self, index):
        label, text = int(self.data[index][0]), self.data[index][1]
        tokens = self.tokenizer.tokenize(text.lower())
        # Map out-of-vocabulary words to the <unk> token id (1)
        token_ids = [self.vocab.get(t, 1) for t in tokens]
        length = min(len(token_ids), self.max_len)
        # Truncate or pad (with id 0) to max length
        padded_token_ids = token_ids[:self.max_len] + [0] * (self.max_len - length)
        return padded_token_ids, length, label

    def collate_fn(self, batch_data):
        padded_token_ids, lengths, labels = list(zip(*batch_data))
        return (
            torch.LongTensor(padded_token_ids).view(-1, self.max_len),
            torch.FloatTensor(lengths).view(-1, 1),
            torch.FloatTensor(labels).view(-1, 1),
        )

    def __len__(self):
        return len(self.data)


class DeepAveragingBinaryClassificationLModule(BinaryClassificationLModule):

    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def get_model(self) -> nn.Module:
        return DeepAveragingNetworksModel(
            vocab=self.hparams.vocab,
            vocab_size=self.hparams.vocab_size,
            word_embedding_size=self.hparams.word_embedding_size,
            hidden_size=self.hparams.hidden_size,
            num_intermediate_layers=self.hparams.num_intermediate_layers,
            dropout_rate=self.hparams.dropout_rate,
            use_glove=self.hparams.use_glove)

    def batch2input(self, batch):
        return {'input_ids': batch[0], 'lengths': batch[1]}

    def batch2labels(self, batch):
        return batch[2].squeeze()

    def get_dataloader(self, split, batch_size, shuffle=False) -> DataLoader:
        data_dir = Path(self.hparams.data_dir)
        datapath = data_dir.joinpath(f"sst2.{split}")
        with open(datapath) as f:
            # list of [label, text] pairs
            data = [d.strip().split(" ", maxsplit=1) for d in f]
        dataset = SST2Dataset(vocab=self.hparams.vocab,
                              data=data,
                              tokenizer=WordPunctTokenizer())

        logger.info(f"Loading {split} data and labels from {datapath}")
        data_loader = DataLoader(dataset=dataset,
                                 batch_size=batch_size,
                                 shuffle=shuffle,
                                 num_workers=self.hparams.num_workers,
                                 collate_fn=dataset.collate_fn)

        return data_loader

    def configure_optimizers(self):
        if self.hparams.optimizer == 'sgd':
            optimizer = torch.optim.SGD(self.model.parameters(),
                                        lr=self.hparams.learning_rate)
        elif self.hparams.optimizer == 'adam':
            optimizer = torch.optim.Adam(self.model.parameters(),
                                         lr=self.hparams.learning_rate)
        else:
            raise NotImplementedError
        # Hw-TODO: Add more optimizers and experiment with at least 2
        #          optimizers other than vanilla SGD.
        #          You can choose which optimizer to use by modifying
        #          `args_str` or the args passed to the function `generic_train`.
        return optimizer

    @classmethod
    def add_model_specific_args(cls, parser: ArgumentParser) -> ArgumentParser:
        parser = super().add_model_specific_args(parser)

        # Required arguments
        parser.add_argument('--num_intermediate_layers',
                            type=int,
                            required=True,
                            help="Number of intermediate layers")
        # Optional arguments
        parser.add_argument('--dropout_rate',
                            default=0.5,
                            type=float,
                            help="Dropout rate")
        parser.add_argument('--word_embedding_size',
                            default=300,
                            type=int,
                            help="Size of word embeddings")
        parser.add_argument('--hidden_size',
                            default=300,
                            type=int,
                            help="Size of hidden layer")
        parser.add_argument('--use_glove',
                            action="store_true",
                            help="Whether to initialize word embeddings with pretrained GloVe vectors.")
        parser.add_argument('--task',
                            default='danbinarycls',
                            type=str,
                            help="Name of the task.")
        return parser
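`SST2Dataset.__getitem__` lowercases the text, tokenizes it, maps out-of-vocabulary tokens to id 1 (`<unk>`), and truncates or pads with id 0 to `max_len`. The stdlib sketch below reproduces that pipeline. It substitutes a regular expression for NLTK's `WordPunctTokenizer`, using the pattern `\w+|[^\w\s]+` that the tokenizer is documented to use; the tiny `toy_vocab` is made up.

```python
import re
from typing import Dict, List, Tuple

WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")  # same pattern as NLTK's WordPunctTokenizer
PAD_ID, UNK_ID = 0, 1


def encode(text: str, vocab: Dict[str, int], max_len: int = 50) -> Tuple[List[int], int]:
    """Lowercase, tokenize, map to ids (unknown -> UNK_ID), then truncate/pad to max_len."""
    tokens = WORD_PUNCT.findall(text.lower())
    token_ids = [vocab.get(t, UNK_ID) for t in tokens]
    length = min(len(token_ids), max_len)
    padded = token_ids[:max_len] + [PAD_ID] * (max_len - length)
    return padded, length


toy_vocab = {"it": 2, "'": 3, "s": 4, "great": 5, "!": 6}
print(encode("It's great!", toy_vocab, max_len=8))
# ([2, 3, 4, 5, 6, 0, 0, 0], 5)
print(encode("Truly great", toy_vocab, max_len=1))
# ([1], 1)
```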



 # Training loop for the deep averaging network model
 You can configure other options listed in `add_model_specific_args` of
 the PyTorch Lightning module `DeepAveragingBinaryClassificationLModule` (which extends those of `BinaryClassificationLModule`).
 The example below trains with vanilla SGD.

In [None]:
logging.basicConfig(format="%(asctime)s - %(levelname)s - %(name)s - %(message)s",
                    datefmt="%Y-%m-%d %H:%M:%S",
                    level=logging.INFO)
logger = logging.getLogger(__name__)

# Load hyperparameters
parser = ArgumentParser()
parser = DeepAveragingBinaryClassificationLModule.add_model_specific_args(parser)

# NOTE: Replace `--optimizer <optimizer>` with the optimizer you are experimenting with,
# and likewise set `--word_embedding_size` as needed.
# You can also configure other options listed in the method of add_model_specific_args of
# the pytorch-lightning module `DeepAveragingBinaryClassificationLModule`.
args_str = ("--vocab_filename unigram_vocab.json "
            "--optimizer sgd --num_intermediate_layers 1 "
            "--output_dir output/dan  --do_train --do_predict ")
args = parser.parse_args(args_str.split())

# If output_dir not provided, a folder is generated
if args.output_dir is None:
    args.output_dir = str(
        Path('output').joinpath(
            f"{args.task}_{datetime.now().strftime('%Y%m%d-%H%M%S')}"))
Path(args.output_dir).mkdir(parents=True, exist_ok=True)

print(f"Parsed arguments: {args}")

training_output = generic_train(args=args,
                                model_class=DeepAveragingBinaryClassificationLModule)


 Another example, using the Adam optimizer and 2 intermediate layers:

In [None]:
# Load hyperparameters
parser = ArgumentParser()
parser = DeepAveragingBinaryClassificationLModule.add_model_specific_args(parser)

# NOTE: Replace `--optimizer <optimizer>` with the optimizer you are experimenting with,
# and likewise set `--word_embedding_size` as needed.
# You can also configure other options listed in the method of add_model_specific_args of
# the pytorch-lightning module `DeepAveragingBinaryClassificationLModule`.
args_str = ("--vocab_filename unigram_vocab.json "
            "--optimizer adam --num_intermediate_layers 2 "
            "--output_dir output/dan  --do_train --do_predict ")
args = parser.parse_args(args_str.split())

# If output_dir not provided, a folder is generated
if args.output_dir is None:
    args.output_dir = str(
        Path('output').joinpath(
            f"{args.task}_{datetime.now().strftime('%Y%m%d-%H%M%S')}"))
Path(args.output_dir).mkdir(parents=True, exist_ok=True)

print(f"Parsed arguments: {args}")

training_output = generic_train(args=args,
                                model_class=DeepAveragingBinaryClassificationLModule)
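To compare several optimizers or depths without copying the cells above, one option is to build the `args_str` values in a loop. The helper `build_dan_args` below is a hypothetical sketch that only constructs the strings (it does not train); each run gets its own `--output_dir` so checkpoints from different settings stay separate. The optimizer names must be ones that `configure_optimizers` actually handles, such as `sgd` and `adam`.

```python
from itertools import product
from typing import Iterable, List


def build_dan_args(optimizers: Iterable[str],
                   layer_counts: Iterable[int],
                   vocab_filename: str = "unigram_vocab.json") -> List[str]:
    """One args string per (optimizer, num_intermediate_layers) combination."""
    args_strs = []
    for optimizer, layers in product(optimizers, layer_counts):
        args_strs.append(
            f"--vocab_filename {vocab_filename} "
            f"--optimizer {optimizer} --num_intermediate_layers {layers} "
            f"--output_dir output/dan_{optimizer}_{layers} --do_train --do_predict"
        )
    return args_strs


for sweep_args_str in build_dan_args(["sgd", "adam"], [1, 2]):
    print(sweep_args_str)
    # In the notebook, each string could then be used as in the cells above:
    # args = parser.parse_args(sweep_args_str.split())
    # generic_train(args=args, model_class=DeepAveragingBinaryClassificationLModule)
```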


In [None]:
# You may uncomment and run the commands below to use Tensorboard in a notebook.
# %reload_ext  tensorboard
# %tensorboard --logdir output/dan
