Just using this to write the easily test the code for the baseline model. Final implementation will be in a py script, so it can be run from command line using GPU.


# To do!
- create function to extract data to train model -- DONE!
- create function to output tags into appropriate format -- DONE!
- make model -- DONE!
  - Incorporate start, stop and unknown tokens into the convert data shape. Start and stop should be both a label and a vocab. Unknown should only be vocab -- DONE!
  - Define allowed transitions, such as cannot transition into start token, cannot transition into pad token, except from stop token, cannot transition out of stop token except into pad token, can only transition into I tokens, from the B token of the same category. Potentially use allowed_transitions from the allen nlp CRF module to create it, it should then be fed into the model on its creation -- DONE!
- define hyperparamter space and random space search to optimize on dev dataset
  - Hyperparameters we have are DIM_EMBEDDING, LSTM_HIDDEN, LEARNING_RATE, EPOCHS and BATCH_SIZE. The values we have currently were selected arbitrarily, we could look at articles implementing Bi-LSTM and CRF for inspiration on ranges and appropriate values. 
  - https://pytorch.org/tutorials/beginner/hyperparameter_tuning_tutorial.html I think this might be the easiest way to implement it, otherwise we might have to implement from scratch
- train model -- This part should be working, just need to select the hyperparameters before we actually do it.
- submit results

In [4]:
#imports for ray
import ray
from ray import tune
from ray.tune import CLIReporter
from ray.tune.schedulers import ASHAScheduler
from torch.nn.parallel import DataParallel
import os
from ray.air import Checkpoint, session
# TODO: Migrate to ray.train.Checkpoint and remove following line(not sure how to do it)
os.environ["RAY_AIR_NEW_PERSISTENCE_MODE"]="0"

In [3]:
#Putting all the imports in one place for readability
import numpy as np
import torch
from torch import nn
from allennlp.modules.conditional_random_field import ConditionalRandomField as CRF
from allennlp.modules import conditional_random_field as CRFmodule
from torcheval.metrics.functional import multiclass_accuracy
from torcheval.metrics.functional import multiclass_confusion_matrix as MCM
import random
from collections import Counter


# Setting seeds to ensure reproducibility of results

random.seed(666)
np.random.seed(666)
torch.manual_seed(666)

<torch._C.Generator at 0x24e10924350>

In [5]:
#Extracts the data into 2 lists of lists, one with the tokens another with the tags


def extractData(filePath):
    """
    Returns:tuple: A tuple containing input data (list of lists of words), tags (list of lists of tags),
    and metadata (list of tuples containing newdoc_id, sent_id, and text).
    """
    wordsData = []
    tagsData = []
    currentSent = None
    with open(filePath, 'r', encoding='utf-8') as file:
        for line in file:
            line = line.strip()
            if line.startswith("# sent_id"):
                sentId = line.split("= ")[1]
            elif line.startswith("#"):
                continue
            elif line:                
                parts = line.split('\t')
                word = parts[1]
                tag = parts[2]
                if sentId != currentSent:
                    currentSent = sentId
                    wordsData.append([word])
                    tagsData.append([tag])
                else:
                    wordsData[-1].append(word)
                    tagsData[-1].append(tag)
    return wordsData, tagsData

# Example usage:
#file_path = "../Data/UniversalNER/train/en_ewt-ud-train.iob2"
#words_data, tags_data = extract_data(file_path)
# for words, tags in zip(words_data, tags_data):
#     print("Words:", words)
#     print("Tags:", tags)
#     print()


In [6]:
#Converts the Data into a tensor for use by the model

def convertDataShape(data, vocabulary = {}, labels = [], training = True, PADDING_TOKEN = '<PAD>', START_TOKEN = '<START>', STOP_TOKEN = '<END>', UNKNOWN_TOKEN = '<UNK>'):
    """
    If training is enabled creates a vocabulary of all words in a list. Otherwise, a vocabulary should be passed.
    Does the same with the labels.
    Creates a matrix of sentences and positions, where each value indicates a word via its index in the vocabulary.
    Creates another matrix of sentences and positions, where the values indicate a label.
    '<PAD>' or another user defined token is used as padding for short sentences. Will also act as an unknown token, if not training, it is assumed to be in vocabulary.
    Returns, the vocabulary, the labels and the two matrices.
    
    Input:
    data          - (string list * string list) list - List of sentences. Each sentence is a tuple of two lists. The first is a list of words, the second a list of labels.
    vocabulary    - string : int dictionary          - Dictionary of words in the vocabulary, values are the indices. Should be provided if not training. Defaults to empty dict.
    labels        - string : int dictionary          - Dictionary of labels to classify, values are the indices. Should be provided if not training. Defaults to empty dict.
    training      - boolean                          - Boolean variable deffining whether training is taking place, if yes then a new vocabulary will be created. Defaults to yes.
    PADDING_TOKEN - string                           - Token to be used as padding. Default is provided
    START_TOKEN   - string                           - Token to be used as marker for the start of the sentence. Default is provided
    STOP_TOKEN    - string                           - Token to be used as marker for the end of the sentence. Default is provided
    UNKNOWN_TOKEN - string                           - Token to be used as the unknown token. Default is provided
    
    Output:
    Xmatrix       - 2D torch.tensor                  - 2d torch tensor containing the index of the word in the sentence in the vocabulary
    Ymatrix       - 2D torch.tensor                  - 2d torch tensor containing the index of the label in the sentence in the labels
    vocabulary    - string : int dictionary          - Dictionary of words, with indices as values, used for training.
    labels        - string : int dictionary          - Dictionary of all the labels, with indices as values, used for classification. (all the labels are expected to be present in the training data, or in other words, the label list provided should be exhaustive)
    """


    if training:
        vocabList = sorted(set(word for sentence, _ in data for word in sentence))
        
        #In order to be able to work with unknown words in the future, we turn some of the least common words into unknown words so we can train on them
        #This is done by removing them from the vocab list before creating the dictionary
        vocabCount = Counter([word for sentence, _ in data for word in sentence])
        UNKNOWN_RATIO = 5 #This should be percentage of tokens we want to turn into Unknown tokens, the least common tokens will be used
        cutoff = int(len(vocabList) / (100 / UNKNOWN_RATIO)) + 1
        removeList = vocabCount.most_common()[:-cutoff:-1]
        for i in removeList:
            vocabList.remove(i[0])

        # Adding the special tokens in the first positions after the least common have been removed and creating the dictionaries
        vocabList = [PADDING_TOKEN, START_TOKEN, STOP_TOKEN, UNKNOWN_TOKEN] + vocabList
        vocabulary = {word: i for i, word in enumerate(vocabList)}
        labelList = [PADDING_TOKEN, START_TOKEN, STOP_TOKEN] + sorted(set(label for _, sentenceLabels in data for label in sentenceLabels))
        labels = {label: i for i, label in enumerate(labelList)}
    
    # Adding two to the max len in order to accomodate the introduction of start and end tokens
    maxLen = max(len(sentence) for sentence, _ in data) + 2
    Xmatrix = np.zeros((len(data), maxLen), dtype=int)
    Ymatrix = np.zeros((len(data), maxLen), dtype=int)

    for i, (sentence, sentenceLabels) in enumerate(data):
        #Set the first token as the start token (assumes it's index is 1)
        Xmatrix[i, 0] = 1
        Ymatrix[i, 0] = 1
        #Set all the indices to the correct index, with the unknown token as default
        for j, word in enumerate(sentence):
            Xmatrix[i, j+1] = vocabulary.get(word, vocabulary[UNKNOWN_TOKEN])
        for j, label in enumerate(sentenceLabels):
            Ymatrix[i, j+1] = labels.get(label, labels[START_TOKEN])
            lastWord = j         
        # Sets the token after the last word as en end token
        Xmatrix[i, lastWord + 2] = 2
        Ymatrix[i, lastWord + 2] = 2
    
    return torch.tensor(Xmatrix, dtype=torch.long), torch.tensor(Ymatrix, dtype=torch.long), vocabulary, labels

# two first sentences of EWT training dataset so that quickdebugging can be run



trainingDebugSen = [["Where", "in", "the", "world", "is", "Iguazu", "?"], ["Iguazu", "Falls"]]
trainingDebugTags = [["O", "O", "O", "O", "O", "B-LOC", "O"], ["B-LOC", "I-LOC"]]

dataDebug, labelsDebug, vocabDebug, tagsDebug = convertDataShape(list(zip(trainingDebugSen, trainingDebugTags)))
print(dataDebug)
print(labelsDebug)
print(vocabDebug)
print(tagsDebug)

tensor([[ 1,  7,  8, 10, 11,  9,  6,  4,  2],
        [ 1,  6,  5,  2,  0,  0,  0,  0,  0]])
tensor([[1, 5, 5, 5, 5, 5, 3, 5, 2],
        [1, 3, 4, 2, 0, 0, 0, 0, 0]])
{'<PAD>': 0, '<START>': 1, '<END>': 2, '<UNK>': 3, '?': 4, 'Falls': 5, 'Iguazu': 6, 'Where': 7, 'in': 8, 'is': 9, 'the': 10, 'world': 11}
{'<PAD>': 0, '<START>': 1, '<END>': 2, 'B-LOC': 3, 'I-LOC': 4, 'O': 5}


In [7]:
class baselineModel(torch.nn.Module):
    def __init__(self, nWords, tags, dimEmbed, dimHidden, constraints):
        super().__init__()
        self.dimEmbed = dimEmbed
        self.dimHidden = dimHidden
        self.vocabSize = nWords
        self.tagSetSize = len(tags)

        self.embed = nn.Embedding(nWords, dimEmbed)
        self.LSTM = nn.LSTM(dimEmbed, dimHidden, bidirectional=True)
        self.linear = nn.Linear(dimHidden * 2, self.tagSetSize)
        

        # Initialize the CRF layer
        self.CRF = CRF(self.tagSetSize, constraints = constraints, include_start_end_transitions=True)

    def forwardTrain(self, inputData, labels):
        # Embedding and LSTM layers
        wordVectors = self.embed(inputData)
        lstmOut, _ = self.LSTM(wordVectors)
        
        # Linear layer
        emissions = self.linear(lstmOut)
        
        # CRF layer to compute the log likelihood loss
        log_likelihood = self.CRF(emissions, labels)
        
        # The loss is the negative log-likelihood
        loss = -log_likelihood
        return loss
        
    def forwardPred(self, inputData):
        # Embedding and LSTM layers
        wordVectors = self.embed(inputData)
        lstmOut, _ = self.LSTM(wordVectors)
        
        # Linear layer
        emissions = self.linear(lstmOut)
        
        # Decode the best path
        best_paths = self.CRF.viterbi_tags(emissions)
        
        # Extract the predicted tags from the paths
        predictions = [path for path, score in best_paths]
        return predictions


In [8]:

def saveToIob2(words, labels, outputFilePath):
    """
    Save words and their corresponding labels in IOB2 format.

    Args:
    words (list): List of lists containing words.
    labels (list): List of lists containing labels.
    output_file (str): Path to the output IOB2 file.
    """
    with open(outputFilePath, 'w', encoding='utf-8') as file:
        for i in range(len(words)):
            for j in range(len(words[i])):
                line = f"{j+1}\t{words[i][j]}\t{labels[i][j]}\n"
                file.write(line)
            file.write('\n')

In [8]:
# two first sentences of EWT training dataset so that quickdebugging can be run

tags = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC", "B-ORG", "I-ORG"]

trainingDebugSen = [["Where", "in", "the", "world", "is", "Iguazu", "?"], ["Iguazu", "Falls"]]
trainingDebugTags = [["O", "O", "O", "O", "O", "B-LOC", "O"], ["B-LOC", "I-LOC"]]

dataDebug, labelsDebug, vocabDebug, tagsDebug = convertDataShape(list(zip(trainingDebugSen, trainingDebugTags)))

In [7]:
#Quick traininig script on the debug dataset

DIM_EMBEDDING = 100
LSTM_HIDDEN = 50
LEARNING_RATE = 0.01
EPOCHS = 5

random.seed(666)
np.random.seed(666)
torch.manual_seed(666)

constraint_type = None

model = baselineModel(len(vocabDebug), tagsDebug, DIM_EMBEDDING, LSTM_HIDDEN, constraint_type)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    model.train()
    
    optimizer.zero_grad()
    loss = model.forwardTrain(dataDebug, labelsDebug)
    
    loss.backward()
    optimizer.step()
    
    print(f"Epoch {epoch}, Loss: {loss.item()}")


Epoch 0, Loss: 33.8591194152832
Epoch 1, Loss: 24.502079010009766
Epoch 2, Loss: 17.171268463134766
Epoch 3, Loss: 11.20111083984375
Epoch 4, Loss: 6.78648567199707


In [8]:
#Getting predicitons and checking accuracy


with torch.no_grad():
    predictsDebug = model.forwardPred(dataDebug)

confMat = MCM(torch.flatten(torch.tensor(predictsDebug, dtype=torch.long)), torch.flatten(labelsDebug), num_classes = len(tagsDebug))

acc = torch.trace(confMat[1:,1:])/torch.sum(confMat[1:,1:]) #Taking away the first collumn and first row, because those correspond to the padding token and we don't care
acc

tensor(1.)

In [9]:
# Loading all the training data sets

filePathTrain = "../Data/UniversalNER/train/"
wordsData = []
tagsData = []
datasets = ["da_ddt", "en_ewt", "hr_set", "pt_bosque", "sk_snk", "sr_set", "sv_talbanken", "zh_gsdsimp", "zh_gsd"]

for i in datasets:
    wordsDataTemp, tagsDataTemp = extractData(filePathTrain + i + "-ud-train.iob2")
    wordsData += wordsDataTemp
    tagsData += tagsDataTemp

trainData, trainLabels, vocab, labels = convertDataShape(list(zip(wordsData, tagsData)))

In [None]:
def train_cifar(config, data_dir=None):
    net = baselineModel(len(vocabDebug), tagsDebug, DIM_EMBEDDING, LSTM_HIDDEN, constraint_type)

    device = "cpu"
    if torch.cuda.is_available():
        device = "cuda:0"
        if torch.cuda.device_count() > 1:
            net = nn.DataParallel(net)
    net.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(net.parameters(), lr=config["lr"], momentum=0.9)

    checkpoint = session.get_checkpoint()

    if checkpoint:
        checkpoint_state = checkpoint.to_dict()
        start_epoch = checkpoint_state["epoch"]
        net.load_state_dict(checkpoint_state["net_state_dict"])
        optimizer.load_state_dict(checkpoint_state["optimizer_state_dict"])
    else:
        start_epoch = 0

    trainset, testset = load_data(data_dir)

    test_abs = int(len(trainset) * 0.8)
    train_subset, val_subset = random_split(
        trainset, [test_abs, len(trainset) - test_abs]
    )

    trainloader = torch.utils.data.DataLoader(
        train_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )
    valloader = torch.utils.data.DataLoader(
        val_subset, batch_size=int(config["batch_size"]), shuffle=True, num_workers=8
    )

    for epoch in range(start_epoch, 10):  # loop over the dataset multiple times
        running_loss = 0.0
        epoch_steps = 0
        for i, data in enumerate(trainloader, 0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()
            epoch_steps += 1
            if i % 2000 == 1999:  # print every 2000 mini-batches
                print(
                    "[%d, %5d] loss: %.3f"
                    % (epoch + 1, i + 1, running_loss / epoch_steps)
                )
                running_loss = 0.0

        # Validation loss
        val_loss = 0.0
        val_steps = 0
        total = 0
        correct = 0
        for i, data in enumerate(valloader, 0):
            with torch.no_grad():
                inputs, labels = data
                inputs, labels = inputs.to(device), labels.to(device)

                outputs = net(inputs)
                _, predicted = torch.max(outputs.data, 1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

                loss = criterion(outputs, labels)
                val_loss += loss.cpu().numpy()
                val_steps += 1

        checkpoint_data = {
            "epoch": epoch,
            "net_state_dict": net.state_dict(),
            "optimizer_state_dict": optimizer.state_dict(),
        }
        checkpoint = Checkpoint.from_dict(checkpoint_data)

        session.report(
            {"loss": val_loss / val_steps, "accuracy": correct / total},
            checkpoint=checkpoint,
        )
    print("Finished Training")

In [10]:
DIM_EMBEDDING = 100
LSTM_HIDDEN = 50
LEARNING_RATE = 0.01
EPOCHS = 5
BATCH_SIZE = 32


PADDING_TOKEN = '<PAD>'
START_TOKEN = '<START>'
STOP_TOKEN = '<END>'
# The make constraint from the module was yielding some weird results so I decided to hardcode this for our use case, assuming the following dict of tags
#{'<PAD>': 0, '<START>': 1, '<END>': 2, '-': 3, 'B-LOC': 4, 'B-ORG': 5, 'B-PER': 6, 'I-LOC': 7, 'I-ORG': 8, 'I-PER': 9, 'O': 10}
CONSTRAINTS = [(1, 4), (1, 5), (1, 6), (1, 10), (2, 0), (4, 2), (4, 4), (4, 5), (4, 6), (4, 7), (4, 10), 
              (5, 2), (5, 4), (5, 5), (5, 6), (5, 8), (5, 10), (6, 2), (6, 4), (6, 5), (6, 6), (6, 9), (6, 10),
              (7, 2), (7, 4), (7, 5), (7, 6), (7, 7), (7, 10), (8, 2), (8, 4), (8, 5), (8, 6), (8, 8), (8, 10),
              (9, 2), (9, 4), (9, 5), (9, 6), (9, 9), (9, 10), (10, 2), (10, 4), (10, 5), (10, 6), (10, 10)]

random.seed(666)
np.random.seed(666)
torch.manual_seed(666)

numBatches = trainData.shape[0] // BATCH_SIZE

trainDataBatches = trainData[:BATCH_SIZE*numBatches].view(numBatches, trainData.shape[1], BATCH_SIZE)
trainLabelsBatches = trainLabels[:BATCH_SIZE*numBatches].view(numBatches, trainLabels.shape[1], BATCH_SIZE)



model = baselineModel(len(vocab), labels, DIM_EMBEDDING, LSTM_HIDDEN, CONSTRAINTS)
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

for epoch in range(EPOCHS):
    model.train()

    model.zero_grad()

    for batch in zip(trainDataBatches, trainLabelsBatches): 
        optimizer.zero_grad()
        
        loss = model.forwardTrain(batch[0], batch[1])
        loss.backward()
        optimizer.step()
        
     
    
    print(f"Epoch {epoch}, Loss: {loss.item()}")


Epoch 0, Loss: 76.76655578613281
Epoch 1, Loss: 45.305023193359375
Epoch 2, Loss: 42.572418212890625
Epoch 3, Loss: 42.45745849609375
Epoch 4, Loss: 41.4228515625


In [44]:
#Loading all the dev datasets

filePathDev = "../Data/UniversalNER/dev/"

wordsDataDev = []
tagsDataDev = []
datasets = ["da_ddt", "en_ewt", "hr_set", "pt_bosque", "sk_snk", "sr_set", "sv_talbanken", "zh_gsdsimp", "zh_gsd"]

for i in datasets:
    wordsDataTemp, tagsDataTemp = extractData(filePathDev + i + "-ud-dev.iob2")
    wordsDataDev += wordsDataTemp
    tagsDataDev += tagsDataTemp

devData, devLabels, _, _ = convertDataShape(list(zip(wordsData, tagsData)), vocabulary = vocab, labels = labels, training = False)

In [46]:
# Define hyperparameter search space
config = {
    "DIM_EMBEDDING": tune.choice([50, 100, 200]),
    "LSTM_HIDDEN": tune.choice([25, 50, 100]),
    "LEARNING_RATE": tune.loguniform(1e-5, 1e-2),
    "EPOCHS": tune.choice([3, 5, 7]),
    "BATCH_SIZE": tune.choice([16, 32, 64])
}

# Define training function
def train_model(config, dev_data):
    DIM_EMBEDDING = config["DIM_EMBEDDING"]
    LSTM_HIDDEN = config["LSTM_HIDDEN"]
    LEARNING_RATE = config["LEARNING_RATE"]
    EPOCHS = config["EPOCHS"]
    BATCH_SIZE = config["BATCH_SIZE"]

    # Load dev data and preprocess if necessary
    # dev_data = load_dev_data()

    # Define your model, optimizer, and data loading here
    model = baselineModel(len(vocab), labels, DIM_EMBEDDING, LSTM_HIDDEN, CONSTRAINTS)
    model = DataParallel(model)  # Utilize DataParallel for multi-GPU training
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

    # Training loop
    for epoch in range(EPOCHS):
        model.train()

        model.zero_grad()

        for batch in dev_data: 
            optimizer.zero_grad()
            
            loss = model.module.forwardTrain(batch[0], batch[1])  # Access the module inside DataParallel
            loss.backward()
            optimizer.step()
            
    # Return performance metric for optimization (e.g., accuracy, F1 score)
    return 0.9  # Dummy return value for demonstration

# Initialize Ray Tune
ray.init()

# Define scheduler and reporter
scheduler = ASHAScheduler(
    max_t=10,
    grace_period=1,
    reduction_factor=2,
    metric="accuracy",  # Specify the metric parameter to optimize
    mode="max"  # Specify the mode parameter
)
reporter = CLIReporter(metric_columns=["accuracy"])

# Define dev dataset
# dev_data = load_dev_data()

# Perform hyperparameter tuning
analysis = tune.run(
    lambda config: train_model(config, devData),
    config=config,
    num_samples=10,
    scheduler=scheduler,
    progress_reporter=reporter,
    trial_dirname_creator=lambda trial: str(trial))

# Get best hyperparameters and corresponding results
best_config = analysis.get_best_config(metric="accuracy")
best_accuracy = analysis.get_best_trial(metric="accuracy").last_result["accuracy"]

print("Best config:", best_config)
print("Best accuracy:", best_accuracy)

# Shut down Ray Tune
ray.shutdown()

2024-03-26 17:59:56,552	INFO worker.py:1752 -- Started a local Ray instance.
2024-03-26 17:59:59,411	INFO tune.py:613 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949


ValueError: Tracked actor is not managed by this event manager: <TrackedActor 302833676580458670291854003854672144260>

In [12]:
#Getting predicitons and checking accuracy

DEV_BATCH_SIZE = 200

devNumBatches = devData.shape[0] // DEV_BATCH_SIZE
devDataBatches = devData[:DEV_BATCH_SIZE*devNumBatches].view(devNumBatches, devData.shape[1], DEV_BATCH_SIZE)
devLabelsBatches = devLabels[:DEV_BATCH_SIZE*devNumBatches].view(devNumBatches, devLabels.shape[1], DEV_BATCH_SIZE)

predicts = []
with torch.no_grad():

    for batch in devDataBatches:
        predicts += model.forwardPred(batch)



KeyboardInterrupt: 

In [None]:
confMat = MCM(torch.flatten(torch.tensor(predicts, dtype=torch.long)), torch.flatten(devLabels), num_classes = len(labels))

#Taking away the first three collumns and rows, because those correspond to the functional tokens and we don't care
acc = torch.trace(confMat[3:,3:])/torch.sum(confMat[3:,3:]) 
acc

ValueError: The `input` and `target` should have the same first dimension, got shapes torch.Size([16330400]) and torch.Size([16379868]).

In [None]:
outputFilePath = "./baselineModel.iob2"

#convert the predictions back into labels

# creates a list of lists of tags, where the padding token is excluded
predictLabels = [[list(labels.keys())[i] for i in j if list(labels.keys())[i] != PADDING_TOKEN and list(labels.keys())[i] != START_TOKEN and list(labels.keys())[i] != STOP_TOKEN] for j in predicts]

# the saveToIob2 works when provided data in the right format
saveToIob2(devWordsData, predictLabels, outputFilePath)


In [None]:
#Run thes every time after running ray if it crashed and didn't turn down
# Shut down Ray Tune
ray.shutdown()

In [55]:
# Define hyperparameter search space
config = {
    "DIM_EMBEDDING": tune.choice([50, 100, 200]),
    "LSTM_HIDDEN": tune.choice([25, 50, 100]),
    "LEARNING_RATE": tune.loguniform(1e-5, 1e-2),
    "EPOCHS": tune.choice([3, 5, 7]),
    "BATCH_SIZE": tune.choice([16, 32, 64])
}

ray_constants.FUNCTION_SIZE_ERROR_THRESHOLD = 967619810

def train_model(config, trainData, trainLabels):
    DIM_EMBEDDING = config["DIM_EMBEDDING"]
    LSTM_HIDDEN = config["LSTM_HIDDEN"]
    LEARNING_RATE = config["LEARNING_RATE"]
    EPOCHS = config["EPOCHS"]
    BATCH_SIZE = config["BATCH_SIZE"]
    CONSTRAINTS = [(1, 4), (1, 5), (1, 6), (1, 10), (2, 0), (4, 2), (4, 4), (4, 5), (4, 6), (4, 7), (4, 10), 
                (5, 2), (5, 4), (5, 5), (5, 6), (5, 8), (5, 10), (6, 2), (6, 4), (6, 5), (6, 6), (6, 9), (6, 10),
                (7, 2), (7, 4), (7, 5), (7, 6), (7, 7), (7, 10), (8, 2), (8, 4), (8, 5), (8, 6), (8, 8), (8, 10),
                (9, 2), (9, 4), (9, 5), (9, 6), (9, 9), (9, 10), (10, 2), (10, 4), (10, 5), (10, 6), (10, 10)]
    
    numBatches = trainData.shape[0] // BATCH_SIZE

    trainDataBatches = trainData[:BATCH_SIZE*numBatches].view(numBatches, trainData.shape[1], BATCH_SIZE)
    trainLabelsBatches = trainLabels[:BATCH_SIZE*numBatches].view(numBatches, trainLabels.shape[1], BATCH_SIZE)

    # Define your model, optimizer, and data loading here
    model = baselineModel(len(vocab), labels, DIM_EMBEDDING, LSTM_HIDDEN, CONSTRAINTS)
    #model = DataParallel(model)  # Utilize DataParallel for multi-GPU training
    optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

    # Training loop
    for epoch in range(EPOCHS):
        model.train()

        model.zero_grad()

        for batch in zip(trainDataBatches, trainLabelsBatches): 
            optimizer.zero_grad()
            
            loss = model.module.forwardTrain(batch[0], batch[1])  # Access the module inside DataParallel
            loss.backward()
            optimizer.step()
            
        tune.report(loss=loss.item())

# Initialize Ray Tune
ray.init()

# Define scheduler and reporter
scheduler = ASHAScheduler(
    max_t=10,
    grace_period=1,
    reduction_factor=2,
    metric="loss",  # Specify the metric parameter
    mode="min")  # Specify the mode parameter
reporter = CLIReporter(metric_columns=["loss", "training_iteration"])

# Perform hyperparameter tuning
analysis = tune.run(
    lambda config: train_model(config, trainData, trainLabels),
    config=config,
    num_samples=10,
    scheduler=scheduler,
    progress_reporter=reporter,
    trial_dirname_creator=lambda trial: str(trial))

# Get best hyperparameters and corresponding results
best_config = analysis.get_best_config(metric="loss")
best_loss = analysis.get_best_trial(metric="loss").last_result["loss"]

print("Best config:", best_config)
print("Best loss:", best_loss)

# Shut down Ray Tune
ray.shutdown()

2024-03-26 18:57:12,541	INFO worker.py:1752 -- Started a local Ray instance.
2024-03-26 18:57:14,738	INFO tune.py:613 -- [output] This uses the legacy output and progress reporter, as Jupyter notebooks are not supported by the new engine, yet. For more information, please see https://github.com/ray-project/ray/issues/36949


== Status ==
Current time: 2024-03-26 18:58:34 (running for 00:01:15.59)
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
Logical resource usage: 10.0/12 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: C:/Users/danii/AppData/Local/Temp/ray/session_2024-03-26_18-57-09_907373_28344/artifacts/2024-03-26_18-57-16/lambda_2024-03-26_18-57-16/driver_artifacts
Number of trials: 10/10 (10 PENDING)


== Status ==
Current time: 2024-03-26 18:58:39 (running for 00:01:20.65)
Using AsyncHyperBand: num_stopped=0
Bracket: Iter 8.000: None | Iter 4.000: None | Iter 2.000: None | Iter 1.000: None
Logical resource usage: 10.0/12 CPUs, 0/1 GPUs (0.0/1.0 accelerator_type:G)
Result logdir: C:/Users/danii/AppData/Local/Temp/ray/session_2024-03-26_18-57-09_907373_28344/artifacts/2024-03-26_18-57-16/lambda_2024-03-26_18-57-16/driver_artifacts
Number of trials: 10/10 (10 PENDING)


== Status ==
Current time: 2024-03-26 18:58:44 (ru

[36m(pid=gcs_server)[0m [2024-03-26 18:58:47,142 E 3916 24004] (gcs_server.exe) logging.cc:97: Unhandled exception: class std::bad_alloc. what(): bad allocation


: 