<a href="https://colab.research.google.com/github/Nabarup-Maity/Project/blob/master/dl_ii_cw_sentiment_analysis.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Environment Setup (Python)


In [None]:
# Download and install conda colab
!pip install -q condacolab
import condacolab
condacolab.install()

✨🍰✨ Everything looks OK!


In [None]:
%%writefile environment.yml
# Environment file that pins the library versions used in this notebook
channels:
  - defaults
  - conda-forge
  - pytorch
dependencies:
  - pip>20.1
  - python>=3.7,<3.9
  - pytorch=1.8.0
  - torchtext=0.9
  - pip:
    - torch==1.8
    - transformers==4.11.0
    - torchmetrics==0.5.1
    - spacy==3.1.3
    - pandas==1.3.3
    - pytorch_lightning==1.4.8
    - azureml-core>=1.31.0
    - azureml-mlflow>=1.31.0
    - tqdm>=4.59,<4.60
    - matplotlib==3.4.3
    - azureml-train-core

Overwriting environment.yml


In [None]:
# Use conda to install everything based on our env file
!conda env update -n base -f environment.yml

Collecting package metadata (repodata.json): done
Solving environment: done


  current version: 4.9.2
  latest version: 4.12.0

Please update conda by running

    $ conda update -n base conda


Installing pip dependencies:

# Local Training

## Data Exploration

In [None]:
import pytorch_lightning
import condacolab
condacolab.check()

✨🍰✨ Everything looks OK!


In [None]:
import warnings
warnings.filterwarnings('ignore')

import torch
import random
import numpy as np
SEED = 1924

In [None]:
import torchtext
from torchtext.legacy.data import Field
from torchtext.legacy.datasets import IMDB

# Import dataset
text = Field(batch_first=True, include_lengths=False)
label = Field(batch_first=True)
train, test = IMDB.splits(text, label)

# https://pytorch.org/text/stable/datasets.html#imdb
assert isinstance(train, torchtext.legacy.datasets.imdb.IMDB)
assert isinstance(test, torchtext.legacy.datasets.imdb.IMDB)

downloading aclImdb_v1.tar.gz


aclImdb_v1.tar.gz: 100%|██████████| 84.1M/84.1M [00:08<00:00, 9.78MB/s]


In [None]:
import random

# Print some examples of positive and negative reviews
examples = list(train.examples)
random.shuffle(examples)
examples = examples[:8]

for example in examples:
    print(example.label)
    print(' '.join(example.text)[:300])
    print("-----")

['pos']
This movie is a classic in every sense of the word. It is very entertaining and also very disturbing. The acting in this movie is well done. The story itself is believable, suspenseful, and well thought out. Character development is also done well, the audience can clearly see how each of the charac
-----
['neg']
I was thirteen years old, when I saw this movie. I expected a lot of action. Since Escape From New York was 16-rated in Germany I entered the movie as fallback. It was so boring. Afterwards I realized that this was just crap where a husband exhibits his wife. I mean today you do this via internet an
-----
['neg']
Christopher Guest need not worry, his supreme hold on the Mockumentary sub-genre is not in trouble of being upstaged in the least especially not by this extremely unfunny jab at RPG-gamers. The jokes are beyond lame. Not enough substance to last the typical length of a (particularly rancid) SNL skit
-----
['neg']
I saw this film opening weekend in Australia, a

## Data Preparation

In [None]:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Identify special tokens
print("Start Token:", tokenizer.cls_token, tokenizer.cls_token_id)
print("End Token:", tokenizer.sep_token, tokenizer.sep_token_id)
print("Padding Token:", tokenizer.pad_token, tokenizer.pad_token_id)
print("Out of Vocab Token:", tokenizer.unk_token, tokenizer.unk_token_id)

Downloading:   0%|          | 0.00/226k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/28.0 [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/455k [00:00<?, ?B/s]

Downloading:   0%|          | 0.00/570 [00:00<?, ?B/s]

Start Token: [CLS] 101
End Token: [SEP] 102
Padding Token: [PAD] 0
Out of Vocab Token: [UNK] 100


In [None]:
# View an example
document = 'Hello InterviewKickstart Class, Welcome to Week 5!! '
tokens = tokenizer.tokenize(document)
print(f"Text:\n -->{document}")
print(f"Tokens:\n --> {', '.join(tokens)}")
print(f"Encoded:\n --> {tokenizer.encode(document)}")

Text:
 -->Hello InterviewKickstart Class, Welcome to Week 5!! 
Tokens:
 --> hello, interview, ##kic, ##ks, ##tar, ##t, class, ,, welcome, to, week, 5, !, !
Encoded:
 --> [101, 7592, 4357, 29493, 5705, 7559, 2102, 2465, 1010, 6160, 2000, 2733, 1019, 999, 999, 102]
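
The split of `InterviewKickstart` into `interview, ##kic, ##ks, ##tar, ##t` comes from WordPiece's greedy longest-match-first lookup. The sketch below illustrates that idea with a tiny hypothetical vocabulary (`TOY_VOCAB` and the `wordpiece` helper are made up for illustration; the real tokenizer also handles lowercasing, punctuation, and a vocabulary of roughly 30k entries).

```python
# Toy vocabulary for illustration only; not the real bert-base-uncased vocab.
TOY_VOCAB = {"hello", "interview", "##kic", "##ks", "##tar", "##t"}


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single lowercase word."""
    tokens = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matched, so the whole word is unknown
        tokens.append(piece)
        start = end
    return tokens


pieces = wordpiece("interviewkickstart", TOY_VOCAB)
print(pieces)
```

With this toy vocabulary the word breaks into the same five pieces as in the cell output above, and a word with no matching pieces falls back to `[UNK]`.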


In [None]:
# View text, tokens, and encoded representation side by side
document = ' '.join(examples[0].text[:100])
tokens = tokenizer.tokenize(document)
print(f"Text:\n -->{document}")
print(f"Tokens:\n --> {', '.join(tokens)}")
print(f"Encoded:\n --> {tokenizer.encode(document)}")

Text:
 -->This movie is a classic in every sense of the word. It is very entertaining and also very disturbing. The acting in this movie is well done. The story itself is believable, suspenseful, and well thought out. Character development is also done well, the audience can clearly see how each of the characters is emotionally tested through this film. The villains in this movie are very threatening, from the first moment the audience sees them they can tell that they are up to something. This movie shows how a human being, when taken from civilization and put in the middle
Tokens:
 --> this, movie, is, a, classic, in, every, sense, of, the, word, ., it, is, very, entertaining, and, also, very, disturbing, ., the, acting, in, this, movie, is, well, done, ., the, story, itself, is, bel, ##ie, ##vable, ,, suspense, ##ful, ,, and, well, thought, out, ., character, development, is, also, done, well, ,, the, audience, can, clearly, see, how, each, of, the, characters, is, emotionally, test

## Environment Setup

In [None]:
# Define a feature field to store the values
MAX_SEQ_LEN = 128
text_field = Field(batch_first = True,
                  use_vocab = False,
                  tokenize = tokenizer.encode,
                  include_lengths=False,
                  init_token = tokenizer.cls_token_id,
                  eos_token = tokenizer.sep_token_id,
                  pad_token = tokenizer.pad_token_id,
                  unk_token = tokenizer.unk_token_id,
                  fix_length=MAX_SEQ_LEN)
label_field = Field(dtype=torch.long, batch_first=True)

In [None]:
# This will take a while
trainval, test_data = IMDB.splits(text_field, label_field)
# Seed the global RNG, then hand its state to split() so the split is reproducible
random.seed(SEED)
train_data, valid_data = trainval.split(random_state=random.getstate())

Token indices sequence length is longer than the specified maximum sequence length for this model (692 > 512). Running this sequence through the model will result in indexing errors
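
The warning above is about raw review length; the `fix_length=MAX_SEQ_LEN` setting is what keeps batches within BERT's limit. The sketch below shows how a fixed-length field might truncate and pad a sequence. It is based on my reading of the legacy torchtext `Field.pad` logic, and the helper name `pad_like_legacy_field` is hypothetical.

```python
def pad_like_legacy_field(ids, fix_length, init_id=101, eos_id=102, pad_id=0):
    """Truncate/pad token ids to exactly fix_length (assumed legacy Field behavior)."""
    max_len = fix_length - 2  # reserve room for the init and eos tokens
    body = list(ids[:max_len])  # truncate from the right
    padding = [pad_id] * max(0, max_len - len(ids))
    return [init_id] + body + [eos_id] + padding


# A short sequence that already carries [CLS]=101 and [SEP]=102, as tokenizer.encode returns
short = pad_like_legacy_field([101, 7592, 102], fix_length=8)
# A long sequence of made-up ids that must be truncated
long_ids = pad_like_legacy_field(list(range(1000, 1020)), fix_length=8)
print(short)
print(long_ids)
```

If this reading is right, sequences produced by `tokenizer.encode` (which already adds `[CLS]` and `[SEP]`) would carry those tokens twice after the field adds its own `init_token` and `eos_token`. Calling `tokenizer.encode(text, add_special_tokens=False)` is one way to avoid that, though it is not what this notebook does.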


In [None]:
print(f"Number of training examples: {len(train_data)}")
print(f"Number of validation examples: {len(valid_data)}")
print(f"Number of testing examples: {len(test_data)}")

Number of training examples: 17500
Number of validation examples: 7500
Number of testing examples: 25000
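
The 17,500 / 7,500 split is consistent with a 70/30 split of the 25,000 training reviews (0.7 is, as far as I recall, the default `split_ratio` of the legacy `split()`). A minimal stdlib sketch of a seeded split follows; `seeded_split` is a hypothetical helper, not the torchtext implementation.

```python
import random


def seeded_split(items, split_ratio=0.7, seed=1924):
    """Shuffle with a private seeded RNG, then cut at split_ratio."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(round(split_ratio * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


train_part, valid_part = seeded_split(range(25000))
print(len(train_part), len(valid_part))
```

Using a dedicated `random.Random(seed)` instance keeps the split reproducible without depending on the state of the global RNG.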


In [None]:
print("Tokenized:" , train_data.examples[3].text)
print("Label:", train_data.examples[3].label)
print("Decoded:", tokenizer.convert_ids_to_tokens(train_data.examples[3].text))

Tokenized: [101, 2023, 3185, 2001, 1037, 2428, 2307, 17312, 2055, 2242, 2008, 13531, 2149, 2035, 1012, 1045, 2113, 1045, 1005, 2310, 7714, 2448, 2046, 2023, 2116, 2335, 1012, 4067, 15003, 2097, 3044, 2038, 5598, 3031, 1996, 23382, 3277, 1997, 3793, 24732, 2096, 4439, 1012, 2111, 1010, 2123, 1005, 1056, 2079, 2009, 1012, 2019, 3178, 1998, 5659, 2274, 2781, 2003, 2025, 2438, 2051, 2005, 2023, 3426, 1012, 7714, 1010, 1045, 2359, 2000, 5466, 2185, 2026, 3526, 3042, 2044, 1996, 3185, 1012, 1045, 2001, 5580, 2000, 2156, 2060, 2111, 1999, 1996, 4258, 2387, 1996, 4471, 1998, 14019, 2037, 11640, 2007, 1996, 4064, 8641, 1997, 24593, 1012, 1045, 2787, 2000, 4487, 19150, 2035, 3793, 24732, 2006, 2026, 3042, 1998, 2052, 8627, 2500, 2000, 2079, 1996, 2168, 1012, 2065, 2017, 2729, 2055, 2115, 2155, 1010, 2191, 2068, 3422, 2023, 8995, 2270, 2326, 8874, 2006, 3793, 24732, 2096, 4439, 2030, 2027, 2071, 3102, 2698, 2111, 1012, 4283, 2005, 4760, 2149, 1996, 2126, 1012, 102]
Label: ['pos']
Decoded: ['[CLS]

In [None]:
# View our output class field encodings
label_field.build_vocab(train_data)
print(dict(label_field.vocab.stoi))

{'<unk>': 0, '<pad>': 1, 'pos': 2, 'neg': 3}
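
The label vocabulary has four entries, not two, because the field reserves `<unk>` and `<pad>` ahead of the real labels. That is why the model later uses an output dimension of 4. The sketch below mimics how such a mapping could be built: specials first, then labels by descending frequency. `build_label_vocab` and the label counts are illustrative assumptions.

```python
from collections import Counter


def build_label_vocab(labels, specials=("<unk>", "<pad>")):
    """Map specials first, then labels ordered by frequency (ties alphabetical)."""
    counts = Counter(labels)
    ordered = sorted(counts.items(), key=lambda kv: kv[0])  # alphabetical
    ordered.sort(key=lambda kv: kv[1], reverse=True)        # stable sort by count
    itos = list(specials) + [label for label, _ in ordered]
    return {token: index for index, token in enumerate(itos)}


# Made-up counts, chosen so that 'pos' is slightly more frequent than 'neg'
stoi = build_label_vocab(["pos"] * 3 + ["neg"] * 2)
print(stoi)
```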


## Model Implementation and Training Loop

Using PyTorch + PyTorch Lightning

In [None]:
from transformers import BertModel
bert = BertModel.from_pretrained('bert-base-uncased')

Downloading:   0%|          | 0.00/420M [00:00<?, ?B/s]

Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.transform.LayerNorm.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


In [None]:
from torch import nn
import pytorch_lightning as pl
from torchtext.legacy.data import BucketIterator

class BertModule(pl.LightningModule):
    def __init__(
        self,
        bert: BertModel,

        train_data = None,
        valid_data = None,

        # optimization
        batch_size = 64,

        # Architecture
        num_hidden=64,
        output_dim=3,
        num_layers=2,
        bidirectional=True,

        # Regularization
        dropout=0.1,
    ):
        super().__init__()
        # Define the BERT and RNN models
        self.bert = bert
        self.rnn = nn.GRU(
            bert.config.hidden_size,
            num_hidden,
            num_layers=num_layers,
            bidirectional=bidirectional,
            batch_first=True,
            dropout=dropout,
        )

        self.fc = nn.Linear(num_hidden * 2 if bidirectional else num_hidden, output_dim)
        self.dropout = nn.Dropout(dropout)

        self.loss_fn = nn.CrossEntropyLoss()

        self.train_iter, self.valid_iter = BucketIterator.splits((train_data, valid_data), batch_size=batch_size)

        self.batch_size = batch_size

    # Required method
    def forward(self, text):
        outputs = self.bert.forward(text)

        _, hidden = self.rnn(outputs.last_hidden_state)

        if self.rnn.bidirectional:
            hidden = self.dropout(torch.cat((hidden[-2,:,:], hidden[-1,:,:]), dim = 1))
        else:
            hidden = self.dropout(hidden[-1,:,:])

        return self.fc(hidden)

    # Required method
    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=0.01)

    # Required method
    def training_step(self, batch, batch_index):
        text = batch.text
        label = batch.label.squeeze()
        pred = self.forward(text)
        loss = self.loss_fn(pred, label.long())
        return loss

    # Getter method
    def train_dataloader(self):
        return self.train_iter

    # Getter method
    def val_dataloader(self):
        return self.valid_iter
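
The `hidden[-2]` / `hidden[-1]` indexing in `forward` relies on how PyTorch lays out the GRU's final hidden state: one row per (layer, direction) pair, with the last layer's forward and backward states at the end. The sketch below computes those row indices without needing torch; both helper names are hypothetical.

```python
def final_layer_rows(num_layers, bidirectional):
    """Row indices of h_n that belong to the last GRU layer."""
    num_directions = 2 if bidirectional else 1
    last_layer = num_layers - 1
    # h_n can be viewed as (num_layers, num_directions, batch, hidden)
    return [last_layer * num_directions + d for d in range(num_directions)]


def fc_input_size(num_hidden, bidirectional):
    """Feature size after concatenating the final forward/backward states."""
    return num_hidden * (2 if bidirectional else 1)


rows = final_layer_rows(num_layers=2, bidirectional=True)
print(rows, fc_input_size(64, True))
```

For two bidirectional layers, `h_n` has four rows and the last layer occupies rows 2 and 3, i.e. `hidden[-2]` and `hidden[-1]`. Concatenating them gives the `num_hidden * 2` features expected by `self.fc`.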

## Training

In [None]:
from pytorch_lightning.loggers import TensorBoardLogger

# Instantiate the model we just defined
model = BertModule(bert, train_data, valid_data, output_dim=len(label_field.vocab.stoi))

# Instantiate logger
logger = TensorBoardLogger('train_logs', name='model1')

trainer = pl.Trainer(
    logger=logger,
    max_epochs=10,
)

In [None]:
%load_ext tensorboard
%tensorboard --logdir ./train_logs
# Docs: https://www.tensorflow.org/tensorboard

In [None]:
# This will take a while
trainer.fit(model)

How to iterate faster when testing locally:
- Use a smaller subset of the training/test data
- Reduce the number of parameters the model has to learn (layers, neurons per layer, etc.)
- Train for fewer epochs
- Use a smaller batch size

Or, use cloud-backed GPU training!
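
For the first suggestion in the list above, a seeded random subsample keeps local experiments quick and repeatable. This is a minimal stdlib sketch; `subsample` is a hypothetical helper and the 10% fraction is arbitrary.

```python
import random


def subsample(examples, fraction, seed=1924):
    """Return a reproducible random subset containing `fraction` of the examples."""
    rng = random.Random(seed)
    pool = list(examples)
    k = max(1, int(len(pool) * fraction))
    return rng.sample(pool, k)


small = subsample(range(1000), fraction=0.1)
print(len(small))
```

The same idea could be applied to a list such as `train.examples` before building iterators, at the cost of noisier metrics.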

# Azure ML Backed Hyperparameter Experimentation

## Azure Workspace Setup

## Environment Auth

In [None]:
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

# INSERT YOUR OWN AZURE CREDENTIALS HERE AND UNCOMMENT

# interactive_auth = InteractiveLoginAuthentication(tenant_id="aaaaaaa-bbbbb-1212-fdsa3-fdsafdsafda")
# subscription_id = "aaaaaaa-bbbbb-1212-fdsa3-fdsafdsafda"
# resource_group = "IKLearning"
# workspace_name = 'AIML-Module-5'

# Expected message: "Interactive authentication successfully completed."

ws = Workspace(subscription_id, resource_group, workspace_name, interactive_auth)
ws.write_config()

ws = Workspace.from_config()

ws.name

'AIML-Module-5'

## Single Trial (Debug) Run

In [None]:
# Make our project directory
!mkdir -p ./project/

In [None]:
%%writefile project/environment.yml
channels:
  - defaults
  - conda-forge
  - pytorch
dependencies:
  - pip>20.1
  - python>=3.7,<3.9
  - pytorch=1.8.0
  - torchtext=0.9
  - pip:
      - torch==1.8
      - transformers==4.11.0
      - torchmetrics==0.5.1
      - spacy==3.1.3
      - pandas==1.3.3
      - pytorch_lightning==1.4.8
      - azureml-core>=1.31.0
      - azureml-mlflow>=1.31.0
      - tqdm>=4.59,<4.60
      - matplotlib==3.4.3

Overwriting project/environment.yml


In [None]:
%%writefile project/train.py

import sys
from pytorch_lightning.callbacks.lr_monitor import LearningRateMonitor
import torch
import random
import pytorch_lightning as pl
from pytorch_lightning.callbacks import BaseFinetuning
import torchtext
import torchmetrics
import transformers
from torchtext.legacy.data import Field, BucketIterator
from torchtext.legacy.datasets import IMDB
from transformers import BertTokenizer, BertModel
from torchmetrics import Accuracy, MetricCollection
from torch import nn
import argparse

print("==" * 10)
print("Python version:", sys.version)
print("Torch Version:", torch.__version__)
print("Pytorch Lightning Version:", pl.__version__)
print("torchtext Version:", torchtext.__version__)
print("torchmetrics Version:", torchmetrics.__version__)
print("transformers Version:", transformers.__version__)
print("==" * 10)

SEED = 1924

# Callback that freezes BERT's parameters before training, so only the GRU and linear head are updated
class FreezeBert(BaseFinetuning):
    def __init__(self):
        super().__init__()

    def freeze_before_training(self, pl_module):
        self.freeze(pl_module.bert)

    def finetune_function(self, pl_module, current_epoch, optimizer, optimizer_idx):
        pass

# Define our model: same architecture as before, now with metrics, an LR scheduler, and data loading built in
class BertModule(pl.LightningModule):
    def __init__(
        self,
        bert="bert-base-uncased",

        # Optimization
        batch_size=64,
        learning_rate=0.01,
        dropout=0.1,
        freeze_bert=True,

        # Architecture
        num_hidden=256,
        num_classes=3,
        num_layers=2,
        bidirectional=True,

        # Encoding
        max_seq_len=128,
    ):
        super().__init__()
        # Setup bert model and tokenizer
        self.bert = BertModel.from_pretrained(bert)
        self.tokenizer = BertTokenizer.from_pretrained(bert)

        # use simple RNN on bert output
        self.rnn = nn.GRU(
            self.bert.config.hidden_size,
            num_hidden,
            num_layers=num_layers,
            bidirectional=bidirectional,
            batch_first=True,
            dropout=dropout,
        )

        # Score Layer
        self.fc = nn.Linear(
            num_hidden * 2 if bidirectional else num_hidden, num_classes
        )

        # Loss
        self.loss_fn = nn.CrossEntropyLoss()

        # Save optimization and data params
        self.dropout = nn.Dropout(dropout)
        self.batch_size = batch_size
        self.learning_rate = learning_rate
        self.freeze_bert = freeze_bert
        self.max_seq_len = max_seq_len

        # Setup Metrics
        metrics = MetricCollection(
            [
                Accuracy(),
            ]
        )
        self.train_metrics = metrics.clone(prefix="train/")
        self.valid_metrics = metrics.clone(prefix="val/")

        self.save_hyperparameters()

    # Required method
    def forward(self, text):
        outputs = self.bert.forward(text)
        _, hidden = self.rnn(outputs.last_hidden_state)

        if self.rnn.bidirectional:
            hidden = self.dropout(
                torch.cat((hidden[-2, :, :], hidden[-1, :, :]), dim=1)
            )
        else:
            hidden = self.dropout(hidden[-1, :, :])

        return self.fc(hidden)

    # Optional hook: registers extra callbacks with the Trainer
    def configure_callbacks(self):
        cb = super().configure_callbacks()

        # Add our FreezeBert() class as a custom callback here
        if self.freeze_bert:
            cb.append(FreezeBert())

        return cb

    # Required method
    def configure_optimizers(self):

        # Only pass trainable parameters to the optimizer (frozen BERT weights have requires_grad=False)
        optimizer = torch.optim.Adam(
            filter(lambda p: p.requires_grad, self.parameters()), lr=self.learning_rate
        )

        # define a learning rate scheduler to finely control adjustments
        lr_scheduler = {
            "scheduler": torch.optim.lr_scheduler.ReduceLROnPlateau(
                optimizer,
                mode="min",
                factor=0.1,
                cooldown=0,
                patience=2,
                min_lr=1e-9,
            ),
            "monitor": "val/loss",
            "strict": True,
        }

        return {"optimizer": optimizer, "lr_scheduler": lr_scheduler}

    # Required method
    def training_step(self, batch, batch_index):
        text = batch.text
        label = batch.label.squeeze()
        pred = self.forward(text)
        loss = self.loss_fn(pred, label.long())

        output = self.train_metrics(pred, label)
        self.log_dict(output)
        self.log("train/loss", loss)

        return loss

    # Required method
    def validation_step(self, batch, batch_index):
        text = batch.text
        label = batch.label.squeeze()
        pred = self.forward(text)
        loss = self.loss_fn(pred, label.long())

        output = self.valid_metrics(pred, label)
        self.log_dict(output, on_step=False, on_epoch=True)
        self.log("val/loss", loss)

        return loss

    # Getter method - like our previous model
    def train_dataloader(self):
        return self.train_iter

    # Getter method - like our previous model
    def val_dataloader(self):
        return self.valid_iter

    # Not in previous model - useful for keeping everything together in remote training
    # Includes the steps we did cell-by-cell earlier, but now in one method
    # Can be broken up further, into their own class/methods for better organization
    # Cleanliness can be very helpful with experimentation
    def prepare_data(self) -> None:

        tokenizer = self.tokenizer
        text_field = Field(
            batch_first=True,
            use_vocab=False,
            tokenize=tokenizer.encode,
            include_lengths=False,
            init_token=tokenizer.cls_token_id,
            eos_token=tokenizer.sep_token_id,
            pad_token=tokenizer.pad_token_id,
            unk_token=tokenizer.unk_token_id,
            fix_length=self.max_seq_len,
        )
        label_field = Field(dtype=torch.long, batch_first=True)

        # this will take a while
        trainval, test_data = IMDB.splits(text_field, label_field)
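        # Seed the global RNG, then hand its state to split() so the split is reproducible
        random.seed(SEED)
        train_data, valid_data = trainval.split(random_state=random.getstate())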

        print(f"Number of training examples: {len(train_data)}")
        print(f"Number of validation examples: {len(valid_data)}")
        print(f"Number of testing examples: {len(test_data)}")

        label_field.build_vocab(trainval)

        self.train_iter, self.valid_iter = BucketIterator.splits(
            (train_data, valid_data), batch_size=self.batch_size
        )

    @classmethod
    def model_argparse_args(cls, parser: argparse.ArgumentParser):
        """
        Register the model hyperparameters as command-line arguments.

        The script can be called inside the environment / Docker container,
        where Python and the underlying dependencies are present, e.g.:

            python train.py --learning_rate 0.1 --batch_size 30 ....
        """

        group = parser.add_argument_group("Model")
        group.add_argument(
            "--learning_rate", dest="learning_rate", type=float, default=0.1, help=""
        )
        group.add_argument(
            "--batch_size", type=int, dest="batch_size", default=64, help=""
        )
        group.add_argument(
            "--num_hidden", type=int, dest="num_hidden", default=256, help=""
        )
        group.add_argument(
            "--num_layers", type=int, dest="num_layers", default=2, help=""
        )
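        group.add_argument(
            "--bidirectional",
            # type=bool would turn any non-empty string (even "0") into True
            type=lambda s: str(s).lower() in ("1", "true", "yes"),
            dest="bidirectional", default=True, help=""
        )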
        group.add_argument(
            "--dropout", type=float, dest="dropout", default=0.1, help=""
        )
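        group.add_argument(
            "--freeze_bert",
            # type=bool would turn any non-empty string (even "0") into True
            type=lambda s: str(s).lower() in ("1", "true", "yes"),
            dest="freeze_bert", default=True, help=""
        )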
        group.add_argument(
            "--bert", type=str, dest="bert", default="bert-base-uncased", help=""
        )
        group.add_argument(
            "--max_seq_len", type=int, dest="max_seq_len", default=128, help=""
        )

        return parser

# Define parser to be used in the model for command line convenience
parser = argparse.ArgumentParser()

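# type=bool would turn any non-empty string (even "False") into True
parser.add_argument("--azure", type=lambda s: str(s).lower() in ("1", "true", "yes"), dest="azure", default=False, help="")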

parser = pl.Trainer.add_argparse_args(parser)
parser = BertModule.model_argparse_args(parser)

args = parser.parse_args()

args.gpus = torch.cuda.device_count()
args.default_root_dir = "./outputs/"

trainer = pl.Trainer.from_argparse_args(args)

# Instantiate model
model = BertModule(
    bert=args.bert,
    batch_size=args.batch_size,
    num_hidden=args.num_hidden,
    num_layers=args.num_layers,
    bidirectional=args.bidirectional,
    learning_rate=args.learning_rate,
    dropout=args.dropout,
    freeze_bert=args.freeze_bert,
    max_seq_len=args.max_seq_len,
    num_classes=4,  # label vocab entries: <unk>, <pad>, pos, neg
)

# If we are running on Azure, setup logger
if args.azure:
    from azureml.core import Run
    from pytorch_lightning.loggers import MLFlowLogger
    run = Run.get_context()

    logger = MLFlowLogger(
        experiment_name=run.experiment.name,
        tracking_uri=run.experiment.workspace.get_mlflow_tracking_uri(),
    )
    logger._run_id = run.id
    trainer.logger = logger

# If the learning rate is passed as 0, use the learning rate finder to pick one
if args.learning_rate == 0:
    lr_finder = trainer.tuner.lr_find(model)
    fig = lr_finder.plot(suggest=True)
    model.learning_rate = lr_finder.suggestion()
    print("new learning rate:", model.learning_rate)

    # If we are running on Azure, setup learning rate finder graph/logger so we can audit
    if args.azure:
        trainer.logger.experiment.log_figure(run.id, fig, "lr_finder.png")
        trainer.logger.experiment.log_param(run.id, "auto_learning_rate", model.learning_rate)

# Add monitor as an additional callback
trainer.callbacks.append(LearningRateMonitor(logging_interval="step"))

# Train
trainer.fit(model)

Overwriting project/train.py
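
The scheduler configured in `configure_optimizers` lowers the learning rate when `val/loss` stops improving. The rough pure-Python simulation below shows that behavior with the same `factor`, `patience`, and `min_lr` values. It deliberately ignores the scheduler's threshold and cooldown options, and the loss values are invented for illustration.

```python
def simulate_reduce_on_plateau(val_losses, lr, factor=0.1, patience=2, min_lr=1e-9):
    """Return the learning rate in effect after each epoch (simplified, mode='min')."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        # Reduce only after more than `patience` epochs without improvement
        if bad_epochs > patience:
            lr = max(lr * factor, min_lr)
            bad_epochs = 0
        history.append(lr)
    return history


lrs = simulate_reduce_on_plateau([0.70, 0.65, 0.66, 0.67, 0.68, 0.60], lr=0.01)
print(lrs)
```

In this made-up run the loss fails to improve for three consecutive epochs, so the rate drops by a factor of ten on the fifth epoch and stays there.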


In [None]:
from azureml.core import ScriptRunConfig, RunConfiguration, Experiment, Environment
from azureml.core.runconfig import DockerConfiguration
from IPython.display import display, Markdown

myenv = Environment.from_conda_specification(name="myenv", file_path="./project/environment.yml")
myenv.register(workspace=ws)

experiment_name = 'module-5-imdb-dbg'
experiment = Experiment(workspace=ws, name=experiment_name)

run_config = RunConfiguration()

run_config.environment = myenv
run_config.docker = DockerConfiguration(use_docker=True, shm_size="10G")

run_config.target = "nc6-uswest2"

refresh = 5

args = [
    # "--fast_dev_run", True,
    "--azure", True,
    "--progress_bar_refresh_rate", refresh,
    "--flush_logs_every_n_steps", refresh,
    "--log_every_n_steps", refresh,
    "--max_epochs", 16,
    "--learning_rate", 0,
    "--terminate_on_nan", 1,
]

config = ScriptRunConfig(
    "./project",
    script="train.py",
    arguments=args,
)
config.run_config = run_config
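
Values in the `arguments` list reach `train.py` as strings, so boolean flags need care. In Python `bool("0")` and `bool("False")` are both `True`, which means an argparse flag declared with `type=bool` cannot be switched off from the command line. The sketch below contrasts that with an explicit converter; `str2bool` is a hypothetical helper.

```python
import argparse


def str2bool(value):
    """Parse common textual spellings of a boolean."""
    text = str(value).strip().lower()
    if text in {"1", "true", "yes", "y"}:
        return True
    if text in {"0", "false", "no", "n"}:
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


naive = argparse.ArgumentParser()
naive.add_argument("--freeze_bert", type=bool, default=True)

explicit = argparse.ArgumentParser()
explicit.add_argument("--freeze_bert", type=str2bool, default=True)

naive_args = naive.parse_args(["--freeze_bert", "0"])
explicit_args = explicit.parse_args(["--freeze_bert", "0"])
print(naive_args.freeze_bert, explicit_args.freeze_bert)
```

This matters most for hyperparameter sweeps that pass `0` or `1` for flags such as `bidirectional` or `freeze_bert`.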

In [None]:
run = experiment.submit(config)

display(Markdown(f"""
* Experiment: [{run.experiment.name}]({run.experiment.get_portal_url()})
* Run: [{run.display_name}]({run.get_portal_url()})
"""))


* Experiment: [module-5-imdb-dbg](https://ml.azure.com/experiments/module-5-imdb-dbg?wsid=/subscriptions/7cd6d59d-0145-4b61-96bb-12f42ca736bd/resourcegroups/IKWork/workspaces/AIML-Module-5&tid=93ce7ccb-1367-4f00-b7b3-ad21ba5cb018)
* Run: [hungry_rocket_0wm3x9ms](https://ml.azure.com/runs/module-5-imdb-dbg_1647835506_34b40b59?wsid=/subscriptions/7cd6d59d-0145-4b61-96bb-12f42ca736bd/resourcegroups/IKWork/workspaces/AIML-Module-5&tid=93ce7ccb-1367-4f00-b7b3-ad21ba5cb018)


## HyperDrive Run

In [None]:
from azureml.train.hyperdrive import RandomParameterSampling, HyperDriveConfig, choice, PrimaryMetricGoal

param_sampling = RandomParameterSampling({
    "num_layers": choice(2, 3),
    "max_seq_len": choice(128, 256, 512),
    "bidirectional": choice(0, 1),
})

experiment_name = 'module-5-imdb-hyper'
experiment = Experiment(workspace=ws, name=experiment_name)
hyperdrive_config = HyperDriveConfig(
    run_config=config,
    hyperparameter_sampling=param_sampling,
    policy=None,
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    primary_metric_name='val/Accuracy',
    max_concurrent_runs=1,
    max_total_runs=18
)
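
Random sampling draws each hyperparameter independently from its `choice` list. The local stand-in below imitates that for the search space defined above. It says nothing about how HyperDrive itself schedules or de-duplicates trials, and the names `SEARCH_SPACE`, `sample_config`, and `grid_size` are hypothetical.

```python
import random

# Same discrete choices as the param_sampling definition above
SEARCH_SPACE = {
    "num_layers": [2, 3],
    "max_seq_len": [128, 256, 512],
    "bidirectional": [0, 1],
}


def sample_config(space, rng):
    """Pick one value per hyperparameter, uniformly at random."""
    return {name: rng.choice(values) for name, values in space.items()}


def grid_size(space):
    """Number of distinct combinations in a discrete search space."""
    size = 1
    for values in space.values():
        size *= len(values)
    return size


rng = random.Random(0)
trials = [sample_config(SEARCH_SPACE, rng) for _ in range(18)]
print(grid_size(SEARCH_SPACE), trials[0])
```

This space only has 12 distinct combinations, which is fewer than `max_total_runs=18`, so a smaller run budget may be enough to cover it.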

In [None]:
run = experiment.submit(hyperdrive_config)

display(Markdown(f"""
* Experiment: [{run.experiment.name}]({run.experiment.get_portal_url()})
* Run: [{run.display_name}]({run.get_portal_url()})
"""))


## Second HyperDrive Run

In [None]:
from azureml.train.hyperdrive import BanditPolicy, HyperDriveRun
param_sampling = RandomParameterSampling({
    "num_layers": choice(3, 4),
    "max_seq_len": choice(512),
    "bidirectional": choice(0, 1),
    "freeze_bert": choice(1, 0),

})

experiment_name = 'module-5-imdb-hyper'
experiment = Experiment(workspace=ws, name=experiment_name)

# INSERT A PREVIOUS RUN ID FROM YOUR AZURE RUNS HERE
previous_runs = ["HD_aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"]
warmstart_parents_to_resume_from = [HyperDriveRun(experiment, r) for r in previous_runs]

early_termination_policy = BanditPolicy(slack_factor = 0.1, evaluation_interval=1, delay_evaluation=6)

hyperdrive_config = HyperDriveConfig(
    run_config=config,
    hyperparameter_sampling=param_sampling,
    policy=early_termination_policy,
    resume_from=warmstart_parents_to_resume_from,
    primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,
    primary_metric_name='val/Accuracy',
    max_concurrent_runs=1,
    max_total_runs=18
)
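
The Bandit policy cancels runs that fall too far behind the best run so far. As I understand the Azure ML documentation, with a maximize goal a run is cancelled when its metric, scaled up by `slack_factor`, is still below the best metric. The sketch below encodes that reading; `bandit_should_terminate` is a hypothetical helper, and the exact boundary handling for `delay_evaluation` is an assumption.

```python
def bandit_should_terminate(run_metric, best_metric, interval,
                            slack_factor=0.1, evaluation_interval=1,
                            delay_evaluation=6):
    """Return True if a run would be cancelled under this reading (maximize goal)."""
    if interval <= delay_evaluation:
        return False  # assumed: no cancellation during the delay period
    if interval % evaluation_interval != 0:
        return False  # the policy is only applied every evaluation_interval
    return run_metric * (1 + slack_factor) < best_metric


print(bandit_should_terminate(0.80, 0.90, interval=10))
print(bandit_should_terminate(0.85, 0.90, interval=10))
```

With `slack_factor=0.1`, a run at 0.80 accuracy would be cancelled when the best run is at 0.90, while a run at 0.85 is within the allowed slack and continues.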

In [None]:
run = experiment.submit(hyperdrive_config)

display(Markdown(f"""
* Experiment: [{run.experiment.name}]({run.experiment.get_portal_url()})
* Run: [{run.display_name}]({run.get_portal_url()})
"""))
