# Google Colab Project: Music Composition with BERT

## Introduction

In this project, we explore the fascinating world of music composition using state-of-the-art natural language processing models, specifically BERT (Bidirectional Encoder Representations from Transformers). The goal is to train the model to generate music compositions in ABC notation.

## Project Workflow

### Data Loading and Preparation

- **Data Source**: The project begins by obtaining a dataset of music compositions in ABC notation. This dataset contains the music pieces that we'll use for training our models.

- **Data Preprocessing**: The dataset is preprocessed to clean and format the ABC notation for model training. This includes tokenization and encoding into a suitable format for the models.

### BERT Model Training


- **Model Selection**: We also train a BERT model, adapted for music generation, using PyTorch.

- **Hyperparameter Tuning**: The training process includes hyperparameter tuning to find the optimal combination of settings for BERT.


In [1]:
import torch
from tqdm import tqdm
from argparse import ArgumentParser

import glob
import os
import pandas as pd

import sys
!pip install wandb

import wandb
wandb.login(key='30b44f6f59b06faebb3d1f78df32c6fd9961f07d')
!{sys.executable} -m pip install youtokentome
!{sys.executable} -m pip install transformers
!pip install accelerate -U
from transformers import Trainer, TrainingArguments,default_data_collator
import youtokentome as yttm



Collecting wandb
  Downloading wandb-0.15.11-py3-none-any.whl (2.1 MB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m2.1/2.1 MB[0m [31m19.8 MB/s[0m eta [36m0:00:00[0m
Collecting GitPython!=3.1.29,>=1.0.0 (from wandb)
  Downloading GitPython-3.1.37-py3-none-any.whl (190 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m190.0/190.0 kB[0m [31m21.6 MB/s[0m eta [36m0:00:00[0m
Collecting sentry-sdk>=1.0.0 (from wandb)
  Downloading sentry_sdk-1.31.0-py2.py3-none-any.whl (224 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m224.8/224.8 kB[0m [31m25.9 MB/s[0m eta [36m0:00:00[0m
[?25hCollecting docker-pycreds>=0.4.0 (from wandb)
  Downloading docker_pycreds-0.4.0-py2.py3-none-any.whl (9.0 kB)
Collecting pathtools (from wandb)
  Downloading pathtools-0.1.2.tar.gz (11 kB)
  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting setproctitle (from wandb)
  Downloading setproctitle-1.3.2-cp310-cp310-manylinux_2_5_x86_64.manyl

[34m[1mwandb[0m: W&B API key is configured. Use [1m`wandb login --relogin`[0m to force relogin
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


Collecting youtokentome
  Downloading youtokentome-1.0.6.tar.gz (86 kB)
[?25l     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m0.0/86.7 kB[0m [31m?[0m eta [36m-:--:--[0m[2K     [91m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m[91m╸[0m[90m━━[0m [32m81.9/86.7 kB[0m [31m2.4 MB/s[0m eta [36m0:00:01[0m[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m86.7/86.7 kB[0m [31m2.0 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: youtokentome
  Building wheel for youtokentome (setup.py) ... [?25l[?25hdone
  Created wheel for youtokentome: filename=youtokentome-1.0.6-cp310-cp310-linux_x86_64.whl size=1948600 sha256=814a8d6cbc3fc708c7399f01decf0c7e62c359bb09f6a5e8a12b7b327bff4d9e
  Stored in directory: /root/.cache/pip/wheels/df/85/f8/301d2ba45f43f30bed2fe413efa760bc726b8b660ed9c2900c
Successfully built youtokentome
Installing collected packages: youtokentome
Successfully instal

In [2]:
ORIGIN = os.path.normpath(os.getcwd())
print(ORIGIN)
TRAIN_DIR ="/content/drive/MyDrive/test2/"
VALID_DIR = "/content/drive/MyDrive/Music_project/valid_path/"
TEST_DIR = "/content/drive/MyDrive/Music_project/test_path/"
TOKENIZER_DIR = "/content/drive/MyDrive/Music_project/abc_run5.yttm"
DATASET_DIR ="/content/drive/MyDrive/Music_project/300,000_new_samples.csv"
# OUTPUT_DIR = "/content/drive/MyDrive/Music_project/output_GPT2_checkpoints6"
OUTPUT_DIR = "/content/drive/MyDrive/Music_project/"


/content


In [None]:
print("Loading tokenizer...")
tokenizer = yttm.BPE(TOKENIZER_DIR) # import the trained tokenizer



Loading tokenizer...


In [None]:
from transformers import BertConfig, EncoderDecoderConfig, EncoderDecoderModel

def get_model(vocab_size=30000):
    config_encoder = BertConfig()
    config_decoder = BertConfig()

    config_encoder.vocab_size = vocab_size
    config_decoder.vocab_size = vocab_size

    config_decoder.is_decoder = True
    config_decoder.add_cross_attention = True

    config = EncoderDecoderConfig.from_encoder_decoder_configs(config_encoder, config_decoder)
    model = EncoderDecoderModel(config=config)

    return model

model = get_model(vocab_size=tokenizer.vocab_size()) # load the BERT model



In [5]:
USEABLE_PARAMS = [i+":" for i in "BCDFGHIKLMmNOPQRrSsTUVWwXZ"] # These are the parameters for key

def read_abc(path):
    keys = []
    notes = []
    with open(path) as rf:
        for line in rf:
            line = line.strip()
            if line.startswith("%"): # Skip any commments
                continue

            if any([line.startswith(key) for key in USEABLE_PARAMS]):
                if(line.startswith('T')):
                    continue # skipping the title for better tokenization
#                 if(line.startswith('L')):
#                     print(line) ## Checking all L in all files
                # After checking the all midi files, they all have the length (L) : 1/8
                keys.append(line)
            else:
                notes.append(line)

    keys = " ".join(keys)

    notes = "".join(notes).strip()
    notes = notes.replace(" ", "")

    if notes.endswith("|"):
        notes = notes[:-1]
    # Remove unneeded character.
    notes = notes.replace(" \ ", "")
    notes = notes.replace("\\", "")
    notes = notes.replace("\ ", "")
    notes = notes.replace("x8|", "") # 8 because all of the midi file has a L:1/8 that means one muted bar
    notes = notes.replace("z8|", "") # 8 because all of the midi file has a L:1/8 that means one muted bar

    notes = notes.strip()
    notes = " ".join(notes.split(" "))

    if not keys or not notes:
        return None, None

    return keys, notes



In [None]:
from google.colab import drive
drive.mount('/content/drive')


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [None]:

OUTPUT_DIR

'/content/drive/MyDrive/Music_project/output_BERT_checkpoints6'

In [None]:
def load_dataset(path):
  data = []
  count = 0
  counter = 0
  directory_files = os.listdir(path)
  directory_path = path

  for file in directory_files:
      filename = os.path.join(directory_path, file)
      print(filename)
      keys, notes = read_abc(filename)
      print("======================")
      print(keys)
      print(notes)
      if keys is None:
          continue

      keys_tokens = tokenizer.encode(keys)


      bars = notes.split(",")
      input_bars = []
      target_bars = []
      count = 0
      notes_tokens = [tokenizer.encode(i + " | ") for i in bars]


      print("======total=====")

      print(notes_tokens)

      sequence_len = sum(len(i) for i in notes_tokens)

      counter = counter+1
      if counter == 10:
        break
      data.append((keys_tokens, notes_tokens))
  return data

In [None]:
train_data = []
valid_data = []
test_data = []

train_data = load_dataset(TRAIN_DIR)
# valid_data = load_dataset(VALID_DIR)
# test_data = load_dataset(TEST_DIR)

/content/drive/MyDrive/test2/8352_9782.abc
[[58, 33, 18, 14, 17, 38, 18, 14, 17, 12, 35, 18, 14, 17, 12, 38, 11, 18, 14, 17, 12, 35, 11, 18, 14, 17, 12, 7131, 38, 14, 17, 35, 14, 17, 38, 11, 14, 17, 35, 11, 14, 17, 12, 7131, 33, 14, 17, 38, 14, 17, 35, 14, 17, 35, 11, 14, 17, 12, 60, 35, 11, 14, 17, 49146, 33, 14, 17, 38, 14, 17, 35, 14, 17, 35, 11, 14, 17, 12, 60, 35, 11, 14, 17, 49146, 33, 14, 17, 38, 14, 17, 35, 14, 17, 35, 11, 14, 17, 12, 60, 35, 11, 14, 17, 49146, 33, 14, 17, 38, 14, 17, 35, 14, 17, 35, 11, 14, 17, 12, 60, 35, 11, 14, 17, 49146, 33, 14, 17, 38, 14, 17, 35, 14, 17, 35, 11, 14, 17, 12, 60, 35, 11, 14, 17, 49146, 33, 14, 17, 38, 14, 17, 35, 14, 17, 35, 11, 14, 17, 12, 60, 35, 11, 14, 17, 12, 930, 220], [58, 33, 17, 38, 17, 35, 17, 38, 11, 17, 12, 35, 11, 17, 12, 7131, 33, 21, 38, 21, 35, 21, 38, 11, 21, 35, 11, 21, 12, 60, 930, 220], [58, 28, 32, 17, 37, 17, 35, 17, 35, 11, 17, 7131, 32, 14, 17, 12, 37, 14, 17, 12, 35, 14, 17, 7131, 32, 14, 17, 37, 14, 17, 7131, 32, 

In [None]:
######
#BERT
#####
import torch
from torch.utils.data import Dataset


class ABCD_BERT(Dataset):
    def __init__(self, data,
                 context_bars_num=8,
                 target_bars_num=8,
                 bos_id=2,
                 eos_id=3,
                 is_test=False):

        self.notes = []
        self.keys = []

        for (keys, notes) in data:
            if notes is None:
                continue

            self.keys.append(keys)
            self.notes.append(notes)

        self.context_bars_num = context_bars_num
        self.target_bars_num = target_bars_num
        self.bos_id = bos_id
        self.eos_id = eos_id
        self.is_test = is_test

    def __len__(self):
        return len(self.keys)


    def __getitem__(self, idx):
        notes = self.notes[idx]
        keys = self.keys[idx]

        if not self.is_test:
            split_indx = 12

            # split notes to context (input for network) and target (that model must to generate)
            context_notes = notes[split_indx - self.context_bars_num : split_indx]
            target_notes = notes[split_indx: split_indx + self.target_bars_num]

        else:
            context_notes = notes
            target_notes = []

        context_tokens = [self.bos_id] + keys
        target_tokens = [self.bos_id]

        for bar in context_notes:
            context_tokens += bar

        for bar in target_notes:
            target_tokens += bar

        context_tokens += [self.eos_id]
        target_tokens += [self.eos_id]

        context_tokens = torch.tensor(context_tokens, dtype=torch.long)
        target_tokens = torch.tensor(target_tokens, dtype=torch.long)

        return {
            "input_ids": context_tokens,
            "decoder_input_ids": target_tokens,
            "labels": target_tokens,
        }
    def save_to_csv(self, file_path):
        data = []
        for idx in range(len(self)):
            sample = self[idx]
            data.append({
                "input_ids": " ".join(str(token.item()) for token in sample["input_ids"]),
                "decoder_input_ids": " ".join(str(token.item()) for token in sample["decoder_input_ids"]),
                "labels": " ".join(str(token.item()) for token in sample["labels"])
            })

        df = pd.DataFrame(data)
        df.to_csv(file_path, index=False)



In [None]:
train_dataset_2 = ABCD(train_data)
# valid_dataset = ABCD(valid_data)

In [None]:
from torch.nn.utils.rnn import pad_sequence

def collate_function(samples):

    input_ids = [sample["input_ids"] for sample in samples]
    print(samples[0])
    decoder_input_ids = [sample["decoder_input_ids"] for sample in samples]
    labels = [sample["labels"] for sample in samples]

    input_ids_padded = pad_sequence(input_ids, batch_first=True)
    decoder_input_ids_padded = pad_sequence(decoder_input_ids, batch_first=True)
    labels_padded = pad_sequence(labels, batch_first=True)

    attention_mask = input_ids_padded != 0
    decoder_attention_mask = decoder_input_ids_padded != 0

    batch = {
        "input_ids": input_ids_padded,
        "decoder_input_ids": decoder_input_ids_padded,
        "labels": labels_padded,
        "attention_mask": attention_mask,
        "decoder_attention_mask": decoder_attention_mask,
    }

    return batch


In [None]:
OUTPUT_DIR

'/content/drive/MyDrive/Music_project/output_BERT_checkpoints'

In [None]:
from transformers import Trainer, TrainingArguments,TrainerCallback
from transformers import get_cosine_schedule_with_warmup
from transformers import DataCollatorForLanguageModeling


training_args = TrainingArguments(
     output_dir='/content/first_run',
     overwrite_output_dir=True,
     evaluation_strategy="epoch",
     gradient_accumulation_steps=8, # recheck this one

     num_train_epochs=50,
     per_device_train_batch_size=8,
     per_device_eval_batch_size=8,
     save_strategy = 'steps',
     save_steps=500,
     eval_steps=500,
     # logging_steps = 1,
     logging_strategy = 'epoch',
     fp16=True,
     report_to="wandb",  # enable logging to W&B
     run_name="bert-base-music_project",

     logging_dir='/content/first_run',
 )


class PrinterCallback(TrainerCallback):
    def on_log(self, args, state, control, logs=None, **kwargs):
        _ = logs.pop("flos", None)
        if state.is_local_process_zero:
            print(logs)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=abc_dataset,
    # eval_dataset=valid_dataset,
    data_collator= collate_function,
    callbacks=[PrinterCallback],  # We can either pass the callback class this way or an instance of it (MyCallback())

)

# Start training


In [None]:
#Fifth Run
trainer.train()

Step,Training Loss
1554,0.0297
3109,0.0051
4664,0.0018
6219,0.0008
7774,0.0004
9329,0.0002
10884,0.0001
12439,0.0
13993,0.0
15548,0.0


{'loss': 0.0297, 'learning_rate': 1.982551546391753e-05, 'epoch': 1.0}




{'loss': 0.0051, 'learning_rate': 1.9625128865979383e-05, 'epoch': 2.0}




{'loss': 0.0018, 'learning_rate': 1.942474226804124e-05, 'epoch': 3.0}




{'loss': 0.0008, 'learning_rate': 1.9224355670103095e-05, 'epoch': 4.0}




{'loss': 0.0004, 'learning_rate': 1.902396907216495e-05, 'epoch': 5.0}




{'loss': 0.0002, 'learning_rate': 1.8823582474226806e-05, 'epoch': 6.0}




{'loss': 0.0001, 'learning_rate': 1.8623195876288663e-05, 'epoch': 7.0}




{'loss': 0.0, 'learning_rate': 1.8422809278350517e-05, 'epoch': 8.0}




{'loss': 0.0, 'learning_rate': 1.8222551546391752e-05, 'epoch': 9.0}




{'loss': 0.0, 'learning_rate': 1.802216494845361e-05, 'epoch': 10.0}




{'loss': 0.0, 'learning_rate': 1.7821778350515467e-05, 'epoch': 11.0}




{'loss': 0.0, 'learning_rate': 1.762139175257732e-05, 'epoch': 12.0}




{'loss': 0.0, 'learning_rate': 1.7421005154639178e-05, 'epoch': 13.0}




{'loss': 0.0, 'learning_rate': 1.7220618556701032e-05, 'epoch': 14.0}




{'loss': 0.0, 'learning_rate': 1.702036082474227e-05, 'epoch': 15.0}




{'loss': 0.0, 'learning_rate': 1.6819974226804124e-05, 'epoch': 16.0}




{'loss': 0.0, 'learning_rate': 1.6619845360824746e-05, 'epoch': 17.0}




{'loss': 0.0, 'learning_rate': 1.641958762886598e-05, 'epoch': 18.0}




{'loss': 0.0, 'learning_rate': 1.6219201030927834e-05, 'epoch': 19.0}




{'loss': 0.0, 'learning_rate': 1.6018943298969075e-05, 'epoch': 20.0}




In [None]:
#Fourth Run
trainer.train()

Step,Training Loss
157,5.8465
314,3.6194
472,1.4841
629,0.3644
786,0.0882
944,0.0443
1101,0.029
1259,0.0208
1416,0.016
1573,0.0129


{'loss': 5.8465, 'learning_rate': 1.5700000000000002e-05, 'epoch': 1.0}
{'loss': 3.6194, 'learning_rate': 1.998904336335509e-05, 'epoch': 2.0}
{'loss': 1.4841, 'learning_rate': 1.9937679191605964e-05, 'epoch': 3.0}
{'loss': 0.3644, 'learning_rate': 1.984521179060989e-05, 'epoch': 4.0}
{'loss': 0.0882, 'learning_rate': 1.9711832381924365e-05, 'epoch': 4.99}
{'loss': 0.0443, 'learning_rate': 1.9536860733321152e-05, 'epoch': 6.0}
{'loss': 0.029, 'learning_rate': 1.9323238012155125e-05, 'epoch': 7.0}
{'loss': 0.0208, 'learning_rate': 1.9069142925435335e-05, 'epoch': 8.0}
{'loss': 0.016, 'learning_rate': 1.8778846657551135e-05, 'epoch': 9.0}




{'loss': 0.0129, 'learning_rate': 1.845206968721005e-05, 'epoch': 10.0}
{'loss': 0.0104, 'learning_rate': 1.8087755429170473e-05, 'epoch': 11.0}
{'loss': 0.0088, 'learning_rate': 1.769202778528286e-05, 'epoch': 12.0}
{'loss': 0.0075, 'learning_rate': 1.7264335740162244e-05, 'epoch': 12.99}
{'loss': 0.0064, 'learning_rate': 1.6803447414783938e-05, 'epoch': 14.0}
{'loss': 0.0057, 'learning_rate': 1.631711006253251e-05, 'epoch': 15.0}
{'loss': 0.005, 'learning_rate': 1.580117729483068e-05, 'epoch': 16.0}
{'loss': 0.0045, 'learning_rate': 1.526432162877356e-05, 'epoch': 17.0}
{'loss': 0.004, 'learning_rate': 1.4705589951155008e-05, 'epoch': 18.0}
{'loss': 0.0036, 'learning_rate': 1.4123563174739036e-05, 'epoch': 19.0}




{'loss': 0.0033, 'learning_rate': 1.3528024816844712e-05, 'epoch': 20.0}
{'loss': 0.003, 'learning_rate': 1.2917825669370118e-05, 'epoch': 20.99}
{'loss': 0.0028, 'learning_rate': 1.2291504239353628e-05, 'epoch': 22.0}
{'loss': 0.0026, 'learning_rate': 1.1659588610392369e-05, 'epoch': 23.0}
{'loss': 0.0024, 'learning_rate': 1.10166912305461e-05, 'epoch': 24.0}
{'loss': 0.0022, 'learning_rate': 1.037361881508116e-05, 'epoch': 25.0}
{'loss': 0.0021, 'learning_rate': 9.728993817898255e-06, 'epoch': 26.0}
{'loss': 0.0019, 'learning_rate': 9.081405621844106e-06, 'epoch': 27.0}
{'loss': 0.0018, 'learning_rate': 8.441739791962186e-06, 'epoch': 28.0}




{'loss': 0.0017, 'learning_rate': 7.808549348793049e-06, 'epoch': 28.99}
{'loss': 0.0016, 'learning_rate': 7.180525243049418e-06, 'epoch': 30.0}
{'loss': 0.0015, 'learning_rate': 6.568224179275326e-06, 'epoch': 31.0}
{'loss': 0.0015, 'learning_rate': 5.96642583432484e-06, 'epoch': 32.0}
{'loss': 0.0014, 'learning_rate': 5.385246073599659e-06, 'epoch': 33.0}
{'loss': 0.0013, 'learning_rate': 4.823243030667576e-06, 'epoch': 34.0}
{'loss': 0.0013, 'learning_rate': 4.27938331632013e-06, 'epoch': 35.0}
{'loss': 0.0013, 'learning_rate': 3.7628088826977815e-06, 'epoch': 36.0}
{'loss': 0.0012, 'learning_rate': 3.2721532425334933e-06, 'epoch': 36.99}
{'loss': 0.0012, 'learning_rate': 2.8066019966134907e-06, 'epoch': 38.0}




{'loss': 0.0012, 'learning_rate': 2.373980779190238e-06, 'epoch': 39.0}
{'loss': 0.0011, 'learning_rate': 1.970601171790616e-06, 'epoch': 40.0}
{'loss': 0.0011, 'learning_rate': 1.6032437411085711e-06, 'epoch': 41.0}
{'loss': 0.0011, 'learning_rate': 1.2707792273019049e-06, 'epoch': 42.0}
{'loss': 0.0011, 'learning_rate': 9.728216135571323e-07, 'epoch': 43.0}
{'loss': 0.001, 'learning_rate': 7.14379386755859e-07, 'epoch': 44.0}
{'loss': 0.001, 'learning_rate': 4.945237734282282e-07, 'epoch': 44.99}
{'loss': 0.001, 'learning_rate': 3.1314792140057395e-07, 'epoch': 46.0}
{'loss': 0.001, 'learning_rate': 1.7330064880545784e-07, 'epoch': 47.0}




{'loss': 0.001, 'learning_rate': 7.378965336347188e-08, 'epoch': 48.0}
{'loss': 0.001, 'learning_rate': 1.6287654589922653e-08, 'epoch': 49.0}
{'loss': 0.001, 'learning_rate': 0.0, 'epoch': 49.88}
{'train_runtime': 7376.2989, 'train_samples_per_second': 68.259, 'train_steps_per_second': 1.064, 'total_flos': 9.048081898115194e+16, 'train_loss': 0.23280484066267682, 'epoch': 49.88}


TrainOutput(global_step=7850, training_loss=0.23280484066267682, metrics={'train_runtime': 7376.2989, 'train_samples_per_second': 68.259, 'train_steps_per_second': 1.064, 'total_flos': 9.048081898115194e+16, 'train_loss': 0.23280484066267682, 'epoch': 49.88})

In [None]:
#Third Run
trainer.train()

Step,Training Loss,Validation Loss
200,7.3551,5.505111
400,3.4734,2.016948
600,1.2532,0.429838
800,0.0667,0.139001
1000,0.0254,0.109631
1200,0.0153,0.098248
1400,0.009,0.092101
1600,0.0069,0.088578
1800,0.0055,0.086435
2000,0.0041,0.084204


{'loss': 9.5008, 'learning_rate': 8.6e-06, 'epoch': 0.99}
{'loss': 7.3551, 'learning_rate': 1.73e-05, 'epoch': 1.99}
{'eval_loss': 5.505110740661621, 'eval_runtime': 7.077, 'eval_samples_per_second': 195.422, 'eval_steps_per_second': 24.445, 'epoch': 2.3}
{'loss': 5.19, 'learning_rate': 1.999748234942507e-05, 'epoch': 2.99}
{'loss': 3.4734, 'learning_rate': 1.9984890974505383e-05, 'epoch': 3.99}
{'eval_loss': 2.0169477462768555, 'eval_runtime': 7.4168, 'eval_samples_per_second': 186.469, 'eval_steps_per_second': 23.326, 'epoch': 4.6}
{'loss': 2.2175, 'learning_rate': 1.9961729363458e-05, 'epoch': 5.0}




{'loss': 1.2532, 'learning_rate': 1.9928022035699166e-05, 'epoch': 6.0}
{'eval_loss': 0.42983782291412354, 'eval_runtime': 7.1236, 'eval_samples_per_second': 194.142, 'eval_steps_per_second': 24.285, 'epoch': 6.91}
{'loss': 0.557, 'learning_rate': 1.9883804674584312e-05, 'epoch': 7.0}
{'loss': 0.1924, 'learning_rate': 1.982912408963285e-05, 'epoch': 8.0}
{'loss': 0.0667, 'learning_rate': 1.9764845145447687e-05, 'epoch': 8.99}
{'eval_loss': 0.13900142908096313, 'eval_runtime': 7.4351, 'eval_samples_per_second': 186.008, 'eval_steps_per_second': 23.268, 'epoch': 9.21}
{'loss': 0.0371, 'learning_rate': 1.9689541163440347e-05, 'epoch': 9.99}
{'loss': 0.0254, 'learning_rate': 1.9603979609434666e-05, 'epoch': 10.99}
{'eval_loss': 0.10963056981563568, 'eval_runtime': 7.4455, 'eval_samples_per_second': 185.75, 'eval_steps_per_second': 23.236, 'epoch': 11.51}




{'loss': 0.0192, 'learning_rate': 1.9508251060867252e-05, 'epoch': 11.99}
{'loss': 0.0153, 'learning_rate': 1.9402456858189912e-05, 'epoch': 13.0}
{'eval_loss': 0.0982479527592659, 'eval_runtime': 7.1707, 'eval_samples_per_second': 192.869, 'eval_steps_per_second': 24.126, 'epoch': 13.81}
{'loss': 0.0125, 'learning_rate': 1.9286708997588278e-05, 'epoch': 14.0}
{'loss': 0.0106, 'learning_rate': 1.9161130012420113e-05, 'epoch': 15.0}
{'loss': 0.009, 'learning_rate': 1.902585284349861e-05, 'epoch': 16.0}
{'eval_loss': 0.09210092574357986, 'eval_runtime': 7.134, 'eval_samples_per_second': 193.86, 'eval_steps_per_second': 24.25, 'epoch': 16.12}
{'loss': 0.0079, 'learning_rate': 1.8882739150311568e-05, 'epoch': 16.99}




{'loss': 0.0069, 'learning_rate': 1.8728612507913358e-05, 'epoch': 17.99}
{'eval_loss': 0.08857765048742294, 'eval_runtime': 6.7653, 'eval_samples_per_second': 204.426, 'eval_steps_per_second': 25.572, 'epoch': 18.42}
{'loss': 0.0062, 'learning_rate': 1.8565245554778516e-05, 'epoch': 18.99}
{'loss': 0.0055, 'learning_rate': 1.839281123493563e-05, 'epoch': 19.99}
{'eval_loss': 0.08643537014722824, 'eval_runtime': 7.0259, 'eval_samples_per_second': 196.844, 'eval_steps_per_second': 24.623, 'epoch': 20.72}
{'loss': 0.005, 'learning_rate': 1.821149209133704e-05, 'epoch': 21.0}
{'loss': 0.0045, 'learning_rate': 1.8021480072614653e-05, 'epoch': 22.0}
{'loss': 0.0041, 'learning_rate': 1.7822976329878692e-05, 'epoch': 23.0}
{'eval_loss': 0.08420398086309433, 'eval_runtime': 7.1806, 'eval_samples_per_second': 192.602, 'eval_steps_per_second': 24.093, 'epoch': 23.02}




{'loss': 0.0038, 'learning_rate': 1.761619100377449e-05, 'epoch': 24.0}
{'loss': 0.0035, 'learning_rate': 1.74038574754189e-05, 'epoch': 24.99}
{'eval_loss': 0.08270398527383804, 'eval_runtime': 7.0157, 'eval_samples_per_second': 197.129, 'eval_steps_per_second': 24.659, 'epoch': 25.32}
{'loss': 0.0032, 'learning_rate': 1.718126297763189e-05, 'epoch': 25.99}
{'loss': 0.003, 'learning_rate': 1.695106622904791e-05, 'epoch': 26.99}
{'eval_loss': 0.08160126954317093, 'eval_runtime': 7.1942, 'eval_samples_per_second': 192.24, 'eval_steps_per_second': 24.047, 'epoch': 27.63}
{'loss': 0.0028, 'learning_rate': 1.671351092126004e-05, 'epoch': 27.99}




{'loss': 0.0026, 'learning_rate': 1.6468848535802043e-05, 'epoch': 29.0}
{'eval_loss': 0.08025236427783966, 'eval_runtime': 7.0675, 'eval_samples_per_second': 195.684, 'eval_steps_per_second': 24.478, 'epoch': 29.93}
{'loss': 0.0024, 'learning_rate': 1.62173380779242e-05, 'epoch': 30.0}
{'loss': 0.0023, 'learning_rate': 1.5959245802404365e-05, 'epoch': 31.0}
{'loss': 0.0021, 'learning_rate': 1.569484493168452e-05, 'epoch': 32.0}
{'eval_loss': 0.07962320744991302, 'eval_runtime': 7.6711, 'eval_samples_per_second': 180.287, 'eval_steps_per_second': 22.552, 'epoch': 32.23}
{'loss': 0.002, 'learning_rate': 1.5427556929731312e-05, 'epoch': 32.99}
{'loss': 0.0019, 'learning_rate': 1.515144930847762e-05, 'epoch': 33.99}
{'eval_loss': 0.07887732982635498, 'eval_runtime': 7.525, 'eval_samples_per_second': 183.787, 'eval_steps_per_second': 22.99, 'epoch': 34.53}




{'loss': 0.0018, 'learning_rate': 1.4869888244043674e-05, 'epoch': 34.99}
{'loss': 0.0017, 'learning_rate': 1.4583171803473279e-05, 'epoch': 35.99}
{'eval_loss': 0.07841328531503677, 'eval_runtime': 7.2776, 'eval_samples_per_second': 190.036, 'eval_steps_per_second': 23.772, 'epoch': 36.83}
{'loss': 0.0016, 'learning_rate': 1.4291603511410449e-05, 'epoch': 37.0}
{'loss': 0.0015, 'learning_rate': 1.3995492028781202e-05, 'epoch': 38.0}
{'loss': 0.0014, 'learning_rate': 1.3695150826037998e-05, 'epoch': 39.0}
{'eval_loss': 0.07789459079504013, 'eval_runtime': 7.1169, 'eval_samples_per_second': 194.328, 'eval_steps_per_second': 24.309, 'epoch': 39.14}
{'loss': 0.0014, 'learning_rate': 1.3390897851312667e-05, 'epoch': 40.0}




{'loss': 0.0013, 'learning_rate': 1.3086612784659842e-05, 'epoch': 40.99}
{'eval_loss': 0.07737118750810623, 'eval_runtime': 8.9371, 'eval_samples_per_second': 154.748, 'eval_steps_per_second': 19.358, 'epoch': 41.44}
{'loss': 0.0013, 'learning_rate': 1.2775541983889333e-05, 'epoch': 41.99}
{'loss': 0.0012, 'learning_rate': 1.2461532930289932e-05, 'epoch': 42.99}
{'eval_loss': 0.07664971798658371, 'eval_runtime': 7.0807, 'eval_samples_per_second': 195.319, 'eval_steps_per_second': 24.433, 'epoch': 43.74}
{'loss': 0.0011, 'learning_rate': 1.214491804109596e-05, 'epoch': 43.99}
{'loss': 0.0011, 'learning_rate': 1.1826032492139474e-05, 'epoch': 45.0}
{'loss': 0.0011, 'learning_rate': 1.150521386302537e-05, 'epoch': 46.0}
{'eval_loss': 0.07631289213895798, 'eval_runtime': 6.9861, 'eval_samples_per_second': 197.964, 'eval_steps_per_second': 24.763, 'epoch': 46.04}




{'loss': 0.001, 'learning_rate': 1.118280177976185e-05, 'epoch': 47.0}
{'loss': 0.001, 'learning_rate': 1.0859137555224448e-05, 'epoch': 48.0}
{'eval_loss': 0.07604631781578064, 'eval_runtime': 7.2168, 'eval_samples_per_second': 191.637, 'eval_steps_per_second': 23.972, 'epoch': 48.35}
{'loss': 0.0009, 'learning_rate': 1.0538298434121284e-05, 'epoch': 48.99}
{'loss': 0.0009, 'learning_rate': 1.0213163355112147e-05, 'epoch': 49.99}
{'eval_loss': 0.07584027945995331, 'eval_runtime': 7.3157, 'eval_samples_per_second': 189.045, 'eval_steps_per_second': 23.648, 'epoch': 50.65}
{'loss': 0.0009, 'learning_rate': 9.887802616453543e-06, 'epoch': 50.99}




{'loss': 0.0008, 'learning_rate': 9.562560652535695e-06, 'epoch': 51.99}
{'eval_loss': 0.0755033865571022, 'eval_runtime': 7.198, 'eval_samples_per_second': 192.137, 'eval_steps_per_second': 24.035, 'epoch': 52.95}
{'loss': 0.0008, 'learning_rate': 9.237781772011152e-06, 'epoch': 53.0}
{'loss': 0.0008, 'learning_rate': 8.913809793301682e-06, 'epoch': 54.0}
{'loss': 0.0008, 'learning_rate': 8.590987680624174e-06, 'epoch': 55.0}
{'eval_loss': 0.07520244270563126, 'eval_runtime': 7.4946, 'eval_samples_per_second': 184.533, 'eval_steps_per_second': 23.083, 'epoch': 55.25}
{'loss': 0.0007, 'learning_rate': 8.269657180920773e-06, 'epoch': 56.0}
{'loss': 0.0007, 'learning_rate': 7.953819178985326e-06, 'epoch': 56.99}
{'eval_loss': 0.07489626109600067, 'eval_runtime': 11.2975, 'eval_samples_per_second': 122.417, 'eval_steps_per_second': 15.313, 'epoch': 57.55}




{'loss': 0.0007, 'learning_rate': 7.636463613895024e-06, 'epoch': 57.99}
{'loss': 0.0007, 'learning_rate': 7.321610142994971e-06, 'epoch': 58.99}
{'eval_loss': 0.07462754845619202, 'eval_runtime': 7.1094, 'eval_samples_per_second': 194.532, 'eval_steps_per_second': 24.334, 'epoch': 59.86}
{'loss': 0.0007, 'learning_rate': 7.009592077439135e-06, 'epoch': 59.99}
{'loss': 0.0006, 'learning_rate': 6.700739726755931e-06, 'epoch': 61.0}
{'loss': 0.0006, 'learning_rate': 6.3953800491749095e-06, 'epoch': 62.0}
{'eval_loss': 0.07452213019132614, 'eval_runtime': 7.1468, 'eval_samples_per_second': 193.514, 'eval_steps_per_second': 24.207, 'epoch': 62.16}
{'loss': 0.0006, 'learning_rate': 6.093836305501242e-06, 'epoch': 63.0}




{'loss': 0.0006, 'learning_rate': 5.796427716904347e-06, 'epoch': 64.0}
{'eval_loss': 0.07437185943126678, 'eval_runtime': 8.5204, 'eval_samples_per_second': 162.317, 'eval_steps_per_second': 20.304, 'epoch': 64.46}
{'loss': 0.0006, 'learning_rate': 5.506810013841036e-06, 'epoch': 64.99}
{'loss': 0.0006, 'learning_rate': 5.218555097949634e-06, 'epoch': 65.99}
{'eval_loss': 0.07411817461252213, 'eval_runtime': 7.2644, 'eval_samples_per_second': 190.381, 'eval_steps_per_second': 23.815, 'epoch': 66.76}
{'loss': 0.0005, 'learning_rate': 4.935361930030774e-06, 'epoch': 66.99}
{'loss': 0.0005, 'learning_rate': 4.657530304910679e-06, 'epoch': 67.99}
{'loss': 0.0005, 'learning_rate': 4.385354341562596e-06, 'epoch': 69.0}
{'eval_loss': 0.07404367625713348, 'eval_runtime': 10.3702, 'eval_samples_per_second': 133.363, 'eval_steps_per_second': 16.682, 'epoch': 69.06}




{'loss': 0.0005, 'learning_rate': 4.119122171745608e-06, 'epoch': 70.0}
{'loss': 0.0005, 'learning_rate': 3.859115634981748e-06, 'epoch': 71.0}
{'eval_loss': 0.0738782212138176, 'eval_runtime': 7.2946, 'eval_samples_per_second': 189.593, 'eval_steps_per_second': 23.716, 'epoch': 71.37}
{'loss': 0.0005, 'learning_rate': 3.6056099801941535e-06, 'epoch': 72.0}
{'loss': 0.0005, 'learning_rate': 3.361670177840707e-06, 'epoch': 72.99}
{'eval_loss': 0.07365242391824722, 'eval_runtime': 7.3639, 'eval_samples_per_second': 187.809, 'eval_steps_per_second': 23.493, 'epoch': 73.67}
{'loss': 0.0005, 'learning_rate': 3.121881955840421e-06, 'epoch': 73.99}




{'loss': 0.0005, 'learning_rate': 2.8893750684111977e-06, 'epoch': 74.99}
{'eval_loss': 0.0736328661441803, 'eval_runtime': 7.2368, 'eval_samples_per_second': 191.107, 'eval_steps_per_second': 23.906, 'epoch': 75.97}
{'loss': 0.0005, 'learning_rate': 2.664395652712435e-06, 'epoch': 75.99}
{'loss': 0.0005, 'learning_rate': 2.447181877148165e-06, 'epoch': 77.0}
{'loss': 0.0004, 'learning_rate': 2.237963689236472e-06, 'epoch': 78.0}
{'eval_loss': 0.07348404079675674, 'eval_runtime': 8.975, 'eval_samples_per_second': 154.095, 'eval_steps_per_second': 19.276, 'epoch': 78.27}
{'loss': 0.0004, 'learning_rate': 2.036962572181731e-06, 'epoch': 79.0}
{'loss': 0.0004, 'learning_rate': 1.8443913104073984e-06, 'epoch': 80.0}
{'eval_loss': 0.07343505322933197, 'eval_runtime': 7.3541, 'eval_samples_per_second': 188.059, 'eval_steps_per_second': 23.524, 'epoch': 80.58}




{'loss': 0.0004, 'learning_rate': 1.662518198179528e-06, 'epoch': 80.99}
{'loss': 0.0004, 'learning_rate': 1.487306540771315e-06, 'epoch': 81.99}
{'eval_loss': 0.07323481142520905, 'eval_runtime': 6.8565, 'eval_samples_per_second': 201.705, 'eval_steps_per_second': 25.231, 'epoch': 82.88}
{'loss': 0.0004, 'learning_rate': 1.3211066172094178e-06, 'epoch': 82.99}
{'loss': 0.0004, 'learning_rate': 1.1640943705703256e-06, 'epoch': 83.99}
{'loss': 0.0004, 'learning_rate': 1.0164360176435962e-06, 'epoch': 85.0}
{'eval_loss': 0.07321203500032425, 'eval_runtime': 7.197, 'eval_samples_per_second': 192.165, 'eval_steps_per_second': 24.038, 'epoch': 85.18}
{'loss': 0.0004, 'learning_rate': 8.782878729709399e-07, 'epoch': 86.0}




{'loss': 0.0004, 'learning_rate': 7.49796183368019e-07, 'epoch': 87.0}
{'eval_loss': 0.07315260916948318, 'eval_runtime': 7.5284, 'eval_samples_per_second': 183.703, 'eval_steps_per_second': 22.98, 'epoch': 87.48}
{'loss': 0.0004, 'learning_rate': 6.31096973104206e-07, 'epoch': 88.0}
{'loss': 0.0004, 'learning_rate': 5.235094677507402e-07, 'epoch': 88.99}
{'eval_loss': 0.0731230154633522, 'eval_runtime': 7.3891, 'eval_samples_per_second': 187.169, 'eval_steps_per_second': 23.413, 'epoch': 89.78}
{'loss': 0.0004, 'learning_rate': 4.246457502031631e-07, 'epoch': 89.99}
{'loss': 0.0004, 'learning_rate': 3.359187237506689e-07, 'epoch': 90.99}
{'loss': 0.0004, 'learning_rate': 2.5742231687209016e-07, 'epoch': 91.99}
{'eval_loss': 0.07306259870529175, 'eval_runtime': 6.9056, 'eval_samples_per_second': 200.273, 'eval_steps_per_second': 25.052, 'epoch': 92.09}




{'loss': 0.0004, 'learning_rate': 1.8923962767615545e-07, 'epoch': 93.0}
{'loss': 0.0004, 'learning_rate': 1.3144283593192752e-07, 'epoch': 94.0}
{'eval_loss': 0.0730486586689949, 'eval_runtime': 7.4922, 'eval_samples_per_second': 184.593, 'eval_steps_per_second': 23.091, 'epoch': 94.39}
{'loss': 0.0004, 'learning_rate': 8.40931266576206e-08, 'epoch': 95.0}
{'loss': 0.0004, 'learning_rate': 4.7240625348735636e-08, 'epoch': 96.0}
{'eval_loss': 0.07308094203472137, 'eval_runtime': 7.2017, 'eval_samples_per_second': 192.037, 'eval_steps_per_second': 24.022, 'epoch': 96.69}
{'loss': 0.0004, 'learning_rate': 2.1166858367646092e-08, 'epoch': 96.99}




{'loss': 0.0004, 'learning_rate': 5.293115445467179e-09, 'epoch': 97.99}
{'eval_loss': 0.0730840414762497, 'eval_runtime': 7.0677, 'eval_samples_per_second': 195.679, 'eval_steps_per_second': 24.478, 'epoch': 98.99}
{'loss': 0.0004, 'learning_rate': 0.0, 'epoch': 98.99}
{'train_runtime': 9283.0026, 'train_samples_per_second': 59.873, 'train_steps_per_second': 0.926, 'total_flos': 1.496030941481132e+17, 'train_loss': 0.30288746738893, 'epoch': 98.99}


TrainOutput(global_step=8600, training_loss=0.30288746738893, metrics={'train_runtime': 9283.0026, 'train_samples_per_second': 59.873, 'train_steps_per_second': 0.926, 'total_flos': 1.496030941481132e+17, 'train_loss': 0.30288746738893, 'epoch': 98.99})

In [None]:
#Second Run
trainer.train()


[34m[1mwandb[0m: Currently logged in as: [33madam-mourad1960[0m ([33mmusic_project[0m). Use [1m`wandb login --relogin`[0m to force relogin




Step,Training Loss,Validation Loss
200,6.1997,5.350147
400,2.2255,1.906656
600,0.5494,0.511615
800,0.149,0.217047
1000,0.1011,0.181315


{'loss': 10.6595, 'learning_rate': 2.1000000000000002e-06, 'epoch': 0.97}
{'loss': 9.6048, 'learning_rate': 4.3e-06, 'epoch': 1.98}
{'loss': 9.1835, 'learning_rate': 6.5000000000000004e-06, 'epoch': 2.99}
{'loss': 8.7154, 'learning_rate': 8.700000000000001e-06, 'epoch': 4.0}
{'loss': 8.6013, 'learning_rate': 1.0800000000000002e-05, 'epoch': 4.97}
{'loss': 7.6603, 'learning_rate': 1.3000000000000001e-05, 'epoch': 5.98}
{'loss': 7.0804, 'learning_rate': 1.5200000000000002e-05, 'epoch': 6.99}
{'loss': 6.4889, 'learning_rate': 1.7400000000000003e-05, 'epoch': 8.0}
{'loss': 6.1997, 'learning_rate': 1.95e-05, 'epoch': 8.97}
{'eval_loss': 5.350146770477295, 'eval_runtime': 4.3195, 'eval_samples_per_second': 320.177, 'eval_steps_per_second': 20.141, 'epoch': 9.2}
{'loss': 5.3507, 'learning_rate': 1.9980267284282718e-05, 'epoch': 9.98}
{'loss': 4.8314, 'learning_rate': 1.9896292772724142e-05, 'epoch': 10.99}
{'loss': 4.3184, 'learning_rate': 1.974692387082714e-05, 'epoch': 12.0}
{'loss': 4.0791



{'loss': 1.0956, 'learning_rate': 1.4457383557765385e-05, 'epoch': 22.99}
{'loss': 0.9336, 'learning_rate': 1.3715584763641345e-05, 'epoch': 24.0}
{'loss': 0.8207, 'learning_rate': 1.2984529248893081e-05, 'epoch': 24.97}
{'loss': 0.6598, 'learning_rate': 1.2199463578396688e-05, 'epoch': 25.98}
{'loss': 0.5494, 'learning_rate': 1.1399863921984151e-05, 'epoch': 26.99}
{'eval_loss': 0.5116150975227356, 'eval_runtime': 4.2557, 'eval_samples_per_second': 324.973, 'eval_steps_per_second': 20.443, 'epoch': 27.59}
{'loss': 0.4617, 'learning_rate': 1.0591014008951555e-05, 'epoch': 28.0}
{'loss': 0.4071, 'learning_rate': 9.815210950408703e-06, 'epoch': 28.97}
{'loss': 0.3298, 'learning_rate': 9.00373778573246e-06, 'epoch': 29.98}
{'loss': 0.282, 'learning_rate': 8.198847890328405e-06, 'epoch': 30.99}
{'loss': 0.2425, 'learning_rate': 7.4058599512249345e-06, 'epoch': 32.0}
{'loss': 0.2208, 'learning_rate': 6.664834894950232e-06, 'epoch': 32.97}
{'loss': 0.1854, 'learning_rate': 5.91013623160902e-



{'loss': 0.0961, 'learning_rate': 1.7026900316098217e-07, 'epoch': 45.98}
{'loss': 0.0955, 'learning_rate': 5.3500806496741276e-08, 'epoch': 46.99}
{'loss': 0.0954, 'learning_rate': 2.458762615035193e-09, 'epoch': 48.0}
{'loss': 0.0963, 'learning_rate': 0.0, 'epoch': 48.28}
{'train_runtime': 2595.9894, 'train_samples_per_second': 107.05, 'train_steps_per_second': 0.404, 'total_flos': 8.114630532889133e+16, 'train_loss': 2.508532467399325, 'epoch': 48.28}


TrainOutput(global_step=1050, training_loss=2.508532467399325, metrics={'train_runtime': 2595.9894, 'train_samples_per_second': 107.05, 'train_steps_per_second': 0.404, 'total_flos': 8.114630532889133e+16, 'train_loss': 2.508532467399325, 'epoch': 48.28})

In [None]:
#First RUN
trainer.train()




Step,Training Loss,Validation Loss
500,0.0101,0.178636
1000,0.0032,0.158075
1500,0.0018,0.152992
2000,0.0012,0.151644
2500,0.0009,0.149198
3000,0.0007,0.148097


{'loss': 1.3521, 'learning_rate': 4.9e-05, 'epoch': 0.99}
{'loss': 0.2353, 'learning_rate': 4.7985507246376815e-05, 'epoch': 2.0}
{'loss': 0.0558, 'learning_rate': 4.698550724637682e-05, 'epoch': 2.99}
{'loss': 0.0282, 'learning_rate': 4.597101449275363e-05, 'epoch': 4.0}
{'loss': 0.0184, 'learning_rate': 4.497101449275363e-05, 'epoch': 4.99}
{'loss': 0.013, 'learning_rate': 4.395652173913043e-05, 'epoch': 6.0}
{'loss': 0.0101, 'learning_rate': 4.2956521739130435e-05, 'epoch': 6.99}
{'eval_loss': 0.178636372089386, 'eval_runtime': 5.9065, 'eval_samples_per_second': 188.267, 'eval_steps_per_second': 23.533, 'epoch': 7.19}




{'loss': 0.0079, 'learning_rate': 4.194202898550725e-05, 'epoch': 8.0}
{'loss': 0.0066, 'learning_rate': 4.094202898550725e-05, 'epoch': 8.99}
{'loss': 0.0054, 'learning_rate': 3.9927536231884064e-05, 'epoch': 10.0}
{'loss': 0.0047, 'learning_rate': 3.892753623188406e-05, 'epoch': 10.99}
{'loss': 0.0041, 'learning_rate': 3.7913043478260876e-05, 'epoch': 12.0}
{'loss': 0.0036, 'learning_rate': 3.691304347826087e-05, 'epoch': 12.99}
{'loss': 0.0032, 'learning_rate': 3.589855072463768e-05, 'epoch': 14.0}
{'eval_loss': 0.1580747365951538, 'eval_runtime': 5.7716, 'eval_samples_per_second': 192.668, 'eval_steps_per_second': 24.084, 'epoch': 14.39}




{'loss': 0.0029, 'learning_rate': 3.4898550724637684e-05, 'epoch': 14.99}
{'loss': 0.0026, 'learning_rate': 3.3884057971014493e-05, 'epoch': 16.0}
{'loss': 0.0024, 'learning_rate': 3.288405797101449e-05, 'epoch': 16.99}
{'loss': 0.0022, 'learning_rate': 3.1869565217391306e-05, 'epoch': 18.0}
{'loss': 0.002, 'learning_rate': 3.086956521739131e-05, 'epoch': 18.99}
{'loss': 0.0019, 'learning_rate': 2.9855072463768118e-05, 'epoch': 20.0}
{'loss': 0.0018, 'learning_rate': 2.8855072463768117e-05, 'epoch': 20.99}
{'eval_loss': 0.152992382645607, 'eval_runtime': 5.91, 'eval_samples_per_second': 188.156, 'eval_steps_per_second': 23.519, 'epoch': 21.58}




{'loss': 0.0016, 'learning_rate': 2.7840579710144927e-05, 'epoch': 22.0}
{'loss': 0.0016, 'learning_rate': 2.684057971014493e-05, 'epoch': 22.99}
{'loss': 0.0015, 'learning_rate': 2.582608695652174e-05, 'epoch': 24.0}
{'loss': 0.0014, 'learning_rate': 2.4826086956521742e-05, 'epoch': 24.99}
{'loss': 0.0013, 'learning_rate': 2.381159420289855e-05, 'epoch': 26.0}
{'loss': 0.0012, 'learning_rate': 2.281159420289855e-05, 'epoch': 26.99}
{'loss': 0.0012, 'learning_rate': 2.1797101449275363e-05, 'epoch': 28.0}
{'eval_loss': 0.15164361894130707, 'eval_runtime': 5.586, 'eval_samples_per_second': 199.068, 'eval_steps_per_second': 24.883, 'epoch': 28.78}




{'loss': 0.0011, 'learning_rate': 2.0797101449275363e-05, 'epoch': 28.99}
{'loss': 0.0011, 'learning_rate': 1.9782608695652176e-05, 'epoch': 30.0}
{'loss': 0.0011, 'learning_rate': 1.8782608695652175e-05, 'epoch': 30.99}
{'loss': 0.001, 'learning_rate': 1.7768115942028988e-05, 'epoch': 32.0}
{'loss': 0.001, 'learning_rate': 1.6768115942028987e-05, 'epoch': 32.99}
{'loss': 0.0009, 'learning_rate': 1.5753623188405797e-05, 'epoch': 34.0}
{'loss': 0.0009, 'learning_rate': 1.47536231884058e-05, 'epoch': 34.99}
{'eval_loss': 0.14919769763946533, 'eval_runtime': 5.6117, 'eval_samples_per_second': 198.159, 'eval_steps_per_second': 24.77, 'epoch': 35.97}




{'loss': 0.0009, 'learning_rate': 1.373913043478261e-05, 'epoch': 36.0}
{'loss': 0.0009, 'learning_rate': 1.2739130434782608e-05, 'epoch': 36.99}
{'loss': 0.0008, 'learning_rate': 1.1724637681159421e-05, 'epoch': 38.0}
{'loss': 0.0008, 'learning_rate': 1.072463768115942e-05, 'epoch': 38.99}
{'loss': 0.0008, 'learning_rate': 9.710144927536233e-06, 'epoch': 40.0}
{'loss': 0.0008, 'learning_rate': 8.710144927536231e-06, 'epoch': 40.99}
{'loss': 0.0008, 'learning_rate': 7.695652173913044e-06, 'epoch': 42.0}
{'loss': 0.0007, 'learning_rate': 6.695652173913043e-06, 'epoch': 42.99}
{'eval_loss': 0.14809656143188477, 'eval_runtime': 5.6153, 'eval_samples_per_second': 198.031, 'eval_steps_per_second': 24.754, 'epoch': 43.17}




{'loss': 0.0007, 'learning_rate': 5.681159420289855e-06, 'epoch': 44.0}
{'loss': 0.0007, 'learning_rate': 4.6811594202898555e-06, 'epoch': 44.99}
{'loss': 0.0007, 'learning_rate': 3.666666666666667e-06, 'epoch': 46.0}
{'loss': 0.0007, 'learning_rate': 2.666666666666667e-06, 'epoch': 46.99}
{'loss': 0.0007, 'learning_rate': 1.6521739130434782e-06, 'epoch': 48.0}
{'loss': 0.0007, 'learning_rate': 6.521739130434782e-07, 'epoch': 48.99}
{'loss': 0.0007, 'learning_rate': 0.0, 'epoch': 49.64}
{'train_runtime': 3295.4549, 'train_samples_per_second': 67.457, 'train_steps_per_second': 1.047, 'total_flos': 4.15398998828304e+16, 'train_loss': 0.035935078265442365, 'epoch': 49.64}


TrainOutput(global_step=3450, training_loss=0.035935078265442365, metrics={'train_runtime': 3295.4549, 'train_samples_per_second': 67.457, 'train_steps_per_second': 1.047, 'total_flos': 4.15398998828304e+16, 'train_loss': 0.035935078265442365, 'epoch': 49.64})

In [None]:
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Not connected to a GPU')
else:
  print(gpu_info)

Thu Jun 15 13:12:18 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla V100-SXM2...  Off  | 00000000:00:04.0 Off |                    0 |
| N/A   33C    P0    23W / 300W |      0MiB / 16384MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [None]:
# Fourth RUN
wandb.finish()


0,1
train/epoch,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
train/global_step,▁▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇████
train/learning_rate,▆▆██████▇▇▇▇▇▇▆▆▆▆▅▅▅▄▄▄▄▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
train/loss,█▆▄▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/total_flos,▁
train/train_loss,▁
train/train_runtime,▁
train/train_samples_per_second,▁
train/train_steps_per_second,▁

0,1
train/epoch,49.88
train/global_step,7850.0
train/learning_rate,0.0
train/loss,0.001
train/total_flos,9.048081898115194e+16
train/train_loss,0.2328
train/train_runtime,7376.2989
train/train_samples_per_second,68.259
train/train_steps_per_second,1.064


In [None]:
# Third RUN
wandb.finish()


0,1
eval/loss,█▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
eval/runtime,▁▂▂▂▂▂▂▁▁▂▁▂▁▂▂▂▄▁▁▂▂▂▂█▂▂▂▇▂▂▂▄▂▁▂▂▂▁▂▁
eval/samples_per_second,▇▆▇▆▆▇▇█▇▇▇▇▇▆▇▇▄▇▇▇▇▇▆▁▇▇▇▂▇▇▇▄▇█▇▆▇█▆▇
eval/steps_per_second,▇▆▇▆▆▇▇█▇▇▇▇▇▆▇▇▄▇▇▇▇▇▆▁▇▇▇▂▇▇▇▄▇█▇▆▇█▆▇
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/learning_rate,▄███████▇▇▇▇▇▆▆▆▆▅▅▅▄▄▄▄▃▃▃▃▂▂▂▂▂▁▁▁▁▁▁▁
train/loss,█▅▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.07308
eval/runtime,7.0677
eval/samples_per_second,195.679
eval/steps_per_second,24.478
train/epoch,98.99
train/global_step,8600.0
train/learning_rate,0.0
train/loss,0.0004
train/total_flos,1.496030941481132e+17
train/train_loss,0.30289


In [None]:
# SECOND RUN
wandb.finish()



0,1
eval/loss,█▃▁▁▁
eval/runtime,█▆▅▃▁
eval/samples_per_second,▁▃▄▆█
eval/steps_per_second,▁▃▄▆█
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▆▇▇▇▇█████
train/learning_rate,▂▃▃▄▅▆▇██████▇▇▇▇▇▆▆▆▅▅▄▄▄▃▃▃▃▂▂▂▂▁▁▁▁▁▁
train/loss,█▇▇▇▇▆▅▅▄▄▄▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.18131
eval/runtime,4.1751
eval/samples_per_second,331.249
eval/steps_per_second,20.838
train/epoch,48.28
train/global_step,1050.0
train/learning_rate,0.0
train/loss,0.0963
train/total_flos,8.114630532889133e+16
train/train_loss,2.50853


In [None]:
# FIRST RUN
wandb.finish()



0,1
eval/loss,█▃▂▂▁▁
eval/runtime,█▅█▁▂▂
eval/samples_per_second,▁▄▁█▇▇
eval/steps_per_second,▁▄▁█▇▇
train/epoch,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/global_step,▁▁▁▂▂▂▂▂▂▃▃▃▃▃▄▄▄▄▄▄▅▅▅▅▅▅▆▆▆▆▆▇▇▇▇▇▇███
train/learning_rate,████▇▇▇▇▇▆▆▆▆▆▆▅▅▅▅▅▄▄▄▄▄▄▃▃▃▃▃▃▂▂▂▂▂▁▁▁
train/loss,█▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
train/total_flos,▁
train/train_loss,▁

0,1
eval/loss,0.1481
eval/runtime,5.6153
eval/samples_per_second,198.031
eval/steps_per_second,24.754
train/epoch,49.64
train/global_step,3450.0
train/learning_rate,0.0
train/loss,0.0007
train/total_flos,4.15398998828304e+16
train/train_loss,0.03594
