## Project Overview


This project fine-tunes the AraGPT2-large model on a classical Islamic text, *Al-Tadmuriyah* by Shaykh al-Islam Ibn Taymiyyah. The goal is a model that is highly specialized in predicting and generating the text of this one book. Overfitting is acceptable, even encouraged, because the primary objective is to reproduce the distinctive language, style, and content of *Al-Tadmuriyah* rather than to generalize to other texts.


To achieve this, I have followed these steps:

1. Data Collection: The text of Al-Tadmuriyah was scraped and processed for training. The code used for this scraping can be found on my [GitHub repository](#), and the processed dataset is available on Hugging Face datasets for public access.
   
2. Model Selection: I used the AraGPT2-large model from Hugging Face, a powerful language model for Arabic. The model was loaded in 4-bit precision to reduce memory usage while retaining most of its quality.

3. Training and Fine-Tuning: I trained the model for 50 epochs, initially observing poor performance. To address this, I performed a grid search over several hyperparameters to identify the best configuration for this task.

4. Model Evaluation and Logs: The results from different configurations are logged and visualized using TensorBoard. All models generated from the grid search are available on my [Hugging Face repository](#).

### What to Expect

In this notebook, you will find:
- Details on the data preparation process, including how the dataset was scraped and preprocessed.
- The fine-tuning steps applied to AraGPT2-large on Al-Tadmuriyah.
- A discussion of the initial training results, followed by the grid search for optimal hyperparameters.
- Insights into the performance of the model across different configurations, along with links to the trained models.
- Logs and visualizations of the training process using TensorBoard.


In [None]:
! pip install -U transformers accelerate BitsAndBytes datasets
! pip install huggingface-hub arabert peft

Collecting accelerate
  Downloading accelerate-0.34.2-py3-none-any.whl.metadata (19 kB)
Collecting BitsAndBytes
  Downloading bitsandbytes-0.43.3-py3-none-manylinux_2_24_x86_64.whl.metadata (3.5 kB)
Collecting datasets
  Downloading datasets-2.21.0-py3-none-any.whl.metadata (21 kB)
Collecting pyarrow>=15.0.0 (from datasets)
  Downloading pyarrow-17.0.0-cp310-cp310-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Downloading accelerate-0.34.2-py3-none-any.whl (324 kB)

In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
from datasets import load_dataset
import torch
from arabert.preprocess import ArabertPreprocessor

In [None]:
# Sign in so we can upload the model to the Hugging Face Hub
from huggingface_hub import notebook_login

notebook_login()


### Explanation of the Code

1. **`BitsAndBytesConfig`**:
   This configuration enables **4-bit quantization**, which shrinks the model's memory footprint so it can be fine-tuned on a smaller GPU. Key parameters include:
   - **`load_in_4bit=True`**: Loads the model weights in 4-bit precision to reduce memory usage.
   - **`bnb_4bit_use_double_quant=True`**: Enables double quantization, which also quantizes the quantization constants and saves a little more memory.
   - **`bnb_4bit_quant_type='nf4'`**: Selects the NF4 (4-bit NormalFloat) data type, which is designed for normally distributed weights and usually preserves more precision than plain 4-bit quantization.
   - **`bnb_4bit_compute_dtype=torch.float16`**: Sets the data type used for computation. The weights are stored in 4 bits but dequantized to 16-bit floating point (float16) for the matrix multiplications.

2. **`check_point = 'aubmindlab/aragpt2-large'`**:
   The Hugging Face Hub checkpoint for the `AraGPT2-large` model from `aubmindlab`, a large-scale Arabic language model.

3. **`arabert_prep = ArabertPreprocessor(model_name=check_point)`**:
   This initializes an **`ArabertPreprocessor`** configured for the `AraGPT2-large` model. It cleans and normalizes raw Arabic text; the tokenizer is applied afterwards to the cleaned text.
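
To make the memory effect of these settings more concrete, here is a rough back-of-the-envelope sketch. It assumes the block sizes described in the QLoRA paper (64 weights per quantization block, 256 constants per second-level block) also apply here, and it uses a hypothetical round parameter count rather than the exact size of AraGPT2-large.

```python
def bits_per_param(double_quant: bool, block_size: int = 64, second_block: int = 256) -> float:
    """Approximate storage cost per weight for 4-bit block-wise quantization."""
    weight_bits = 4.0
    if double_quant:
        # one 8-bit constant per block, plus one 32-bit constant per `second_block` blocks
        constant_bits = 8.0 / block_size + 32.0 / (block_size * second_block)
    else:
        # one 32-bit scaling constant per block
        constant_bits = 32.0 / block_size
    return weight_bits + constant_bits


n_params = 1_000_000_000  # hypothetical parameter count, for illustration only

fp16_gb = n_params * 16 / 8 / 1e9
print(f"float16 weights:          {fp16_gb:.3f} GB")
for double_quant in (False, True):
    gb = n_params * bits_per_param(double_quant) / 8 / 1e9
    print(f"4-bit, double_quant={double_quant!s:<5}: {gb:.3f} GB")
```

Under these assumptions, most of the saving comes from storing weights in 4 bits instead of 16; double quantization removes a further fraction of a bit per weight.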


In [None]:
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type='nf4',
    bnb_4bit_compute_dtype=torch.float16)

check_point = 'aubmindlab/aragpt2-large'
arabert_prep = ArabertPreprocessor(model_name=check_point)

In [None]:
model = AutoModelForCausalLM.from_pretrained(check_point, quantization_config=bnb_config,
                                             trust_remote_code=True)
model.config.use_cache = False
tokenizer = AutoTokenizer.from_pretrained(check_point, trust_remote_code=True)

loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--aubmindlab--aragpt2-large/snapshots/b870598f32c15f993567a09c977cc0e5431d28f0/config.json
Model config AraGPT2Config {
  "_name_or_path": "aubmindlab/aragpt2-large",
  "activation_function": "gelu_new",
  "architectures": [
    "AraGPT2LMHeadModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "attn_pdrop": 0.1,
  "auto_map": {
    "AutoConfig": "aubmindlab/aragpt2-large--configuration_aragpt2.AraGPT2Config",
    "AutoModel": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2Model",
    "AutoModelForCausalLM": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2LMHeadModel"
  },
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "initializer_range": 0.0141

## Fixing the Tokenizer

### Issue:

The tokenizer does not have a `pad_token`, and the `pad_token_id` is set to the same value as the `eos_token_id`.

These two issues could cause strange behavior when fine-tuning the model. To resolve this, we need to manually set a unique `pad_token`.
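
One concrete way a shared ID can hurt: `DataCollatorForLanguageModeling` with `mlm=False` copies the input IDs into the labels and replaces every padding position with `-100` so that the loss ignores it. The pure-Python sketch below mimics only that masking step (it is an illustration, not the library code). The content token IDs 11 to 13 are made up, and 64000 is the ID that the dedicated `[PAD]` token receives further down in this notebook.

```python
IGNORE_INDEX = -100  # label value that the cross-entropy loss skips


def make_labels(input_ids, pad_token_id):
    """Copy input_ids to labels, masking every position equal to pad_token_id."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


EOS_ID = 0          # eos_token_id from the model config
NEW_PAD_ID = 64000  # ID of a dedicated padding token

# three content tokens, one real EOS, then two padding positions
shared = [11, 12, 13, EOS_ID, EOS_ID, EOS_ID]          # padding reuses the EOS ID
unique = [11, 12, 13, EOS_ID, NEW_PAD_ID, NEW_PAD_ID]  # padding has its own ID

labels_shared = make_labels(shared, pad_token_id=EOS_ID)
labels_unique = make_labels(unique, pad_token_id=NEW_PAD_ID)

print(labels_shared)  # the real EOS is masked together with the padding
print(labels_unique)  # the real EOS stays in the labels
```

With a shared ID the genuine end-of-sequence token is masked out of the loss along with the padding, so the model gets no training signal for where a text ends.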

### Steps Taken:

1. **Manually set the `pad_token`** to ensure that the tokenizer has a unique padding token.
2. **Set a unique `pad_token_id`** to avoid conflicts with the `eos_token_id`.

### Encountered Error:

After adjusting the tokenizer and attempting to load the model for evaluation, I encountered a **shape mismatch error**:

```
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.transformer.wte.weight: copying a param with shape torch.Size([64001, 1280]) from checkpoint, the shape in current model is torch.Size([64000, 1280]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([64001, 1280]) from checkpoint, the shape in current model is torch.Size([64000, 1280]).
```

### Solution:

To resolve this issue, I realized that I could simply delete the `pad_token` before uploading the model to the Hugging Face Hub. This ensures the tokenizer and model remain compatible without causing further errors.
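
The mismatch itself is easy to reproduce on paper. The sketch below is a simplified stand-in for the shape check that happens when a state dict is loaded, not the actual PyTorch or PEFT code. The two embedding shapes come from the error message above; the `c_attn` entry is a hypothetical example of an unaffected layer.

```python
def find_size_mismatches(checkpoint_shapes, model_shapes):
    """Return (name, checkpoint_shape, model_shape) for tensors whose shapes differ."""
    return [
        (name, ckpt_shape, model_shapes[name])
        for name, ckpt_shape in checkpoint_shapes.items()
        if name in model_shapes and model_shapes[name] != ckpt_shape
    ]


# checkpoint saved while the tokenizer still contained the extra [PAD] token
checkpoint = {
    "base_model.model.transformer.wte.weight": (64001, 1280),
    "base_model.model.lm_head.weight": (64001, 1280),
    "base_model.model.transformer.h.0.attn.c_attn.weight": (1280, 3840),  # hypothetical
}
# freshly loaded base model with the original 64,000-token vocabulary
fresh_model = {
    "base_model.model.transformer.wte.weight": (64000, 1280),
    "base_model.model.lm_head.weight": (64000, 1280),
    "base_model.model.transformer.h.0.attn.c_attn.weight": (1280, 3840),  # hypothetical
}

mismatches = find_size_mismatches(checkpoint, fresh_model)
for name, ckpt_shape, model_shape in mismatches:
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

Only the two vocabulary-sized tensors differ, which is why restoring the original vocabulary size before uploading avoids the error.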

---



In [None]:
print(f'tokenizer len before padding token: {len(tokenizer)}')
tokenizer.add_special_tokens({'pad_token': '[PAD]'})  # add a dedicated padding token
print(f'len after adding the new token: {len(tokenizer)}')

tokenizer len before padding token: 64000
len after adding the new token: 64001


In [None]:
# resize the model's embedding layer to match the new tokenizer length (which now includes the padding token)
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id  # record the pad token ID in model.config

You are resizing the embedding layer without providing a `pad_to_multiple_of` parameter. This means that the new embedding dimension will be 64001. This might induce some performance reduction as *Tensor Cores* will not be available. For more details about this, or help on choosing the correct value for resizing, refer to this guide: https://docs.nvidia.com/deeplearning/performance/dl-performance-matrix-multiplication/index.html#requirements-tc


In [None]:
# Verify the changes:
print(f'padding token ID in the tokenizer: {tokenizer.pad_token_id}')
print(f'padding token ID in the model config {model.config.pad_token_id}')
print(f'the padding token from the tokenizer: {tokenizer.pad_token}')
print(f'eos token ID in the model: {model.config.eos_token_id}, eos token ID in the tokenizer {tokenizer.eos_token_id}')
print(f'the tokenizer len:{len(tokenizer)}, the model input embedding layer:{model.get_input_embeddings()}')
# tokenizer.pad_token = tokenizer.eos_token
# tokenizer.pad_token, tokenizer.eos_token

padding token ID in the tokenizer: 64000
padding token ID in the model config 64000
the padding token from the tokenizer: [PAD]
eos token ID in the model: 0, eos token ID in the tokenizer 0
the tokenizer len:64001, the model input embedding layer:Embedding(64001, 1280)


### LoRA

This cell prepares the quantized model for training with Low-Rank Adaptation (LoRA). LoRA freezes the base weights and trains small low-rank matrices added to selected layers, which greatly reduces the number of trainable parameters. This makes it practical to fine-tune large models like AraGPT2-large with limited resources.

1. **`prepare_model_for_kbit_training(model)`**: Prepares the already-quantized model for training, for example by freezing the base model's weights and casting some layers to higher precision for numerical stability.
   
2. **`LoraConfig`**: Defines the configuration for LoRA:
   - **`task_type=TaskType.CAUSAL_LM`**: Specifies that the task is causal language modeling, appropriate for GPT-2.
   - **`inference_mode=False`**: Indicates that the model will be trained, not just used for inference.
   - **`r=8`**: The rank of the LoRA matrices. A higher rank adds more trainable parameters and more capacity.
   - **`lora_alpha=32`**: A scaling factor; the LoRA update is scaled by `lora_alpha / r`.
   - **`lora_dropout=0.05`**: Dropout probability applied inside the LoRA layers during training.
   - **`target_modules`**: Specifies which layers LoRA is applied to (here, the attention and MLP projections).

3. **`get_peft_model(model, lora_config)`**: Wraps the original model with the LoRA configuration, enabling parameter-efficient fine-tuning.
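
The effect of `r` on the number of trainable parameters can be worked out by hand. The sketch below counts the LoRA weights for the four targeted projections, using `n_embd = 1280` and `intermediate_size = 5120` from the model config. The layer count of 36 is an assumption, since it is not visible in the truncated config output in this notebook.

```python
def lora_params(in_features: int, out_features: int, r: int) -> int:
    """A LoRA adapter adds A (r x in) and B (out x r), i.e. r * (in + out) weights."""
    return r * (in_features + out_features)


N_EMBD = 1280   # n_embd from the model config
N_INNER = 5120  # intermediate_size from the model config
N_LAYER = 36    # assumption: number of transformer blocks


def trainable_lora_params(r: int) -> int:
    per_layer = (
        lora_params(N_EMBD, 3 * N_EMBD, r)  # attn.c_attn: fused q, k, v projection
        + lora_params(N_EMBD, N_EMBD, r)    # attn.c_proj
        + lora_params(N_EMBD, N_INNER, r)   # mlp.c_fc
        + lora_params(N_INNER, N_EMBD, r)   # mlp.c_proj
    )
    return per_layer * N_LAYER


r, lora_alpha = 8, 32
total = trainable_lora_params(r)
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"trainable LoRA parameters for r={r}: {total:,}")
print(f"LoRA scaling factor: {scaling}")
```

With these values the count is 5,898,240, which matches the number of trainable parameters the `Trainer` reports for the baseline run. The count grows linearly with `r`.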


In [None]:
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,  # Since GPT-2 is a causal language model
    inference_mode=False,          # Set to True if only doing inference
    r=8,                           # Rank of the LoRA matrices
    lora_alpha=32,                 # Scaling factor
    lora_dropout=0.05,              # Dropout probability
    target_modules=[
        "attn.c_attn",  # Self-attention projection (q, k, v)
        "attn.c_proj",  # Self-attention output projection
        "mlp.c_fc",     # MLP intermediate projection
        "mlp.c_proj"    # MLP output projection
    ]
)
model = get_peft_model(model, lora_config)

Load the data, then tokenize it.

In [None]:
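def tokenizer_function(examples):
    # clean each page with the AraBERT preprocessor before tokenizing
    cleaned_text = [arabert_prep.preprocess(text) for text in examples['combined']]
    # pad to the longest sequence in the batch; truncate to the 1024-token context window
    return tokenizer(cleaned_text, padding=True, truncation=True, max_length=1024)

# mlm=False: build labels for causal (next-token) language modeling
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)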

In [None]:
from datasets import load_dataset
raw_dataset = load_dataset('ahmadAlrabghi/al_tadmoreyyah')
print(raw_dataset)
train_dataset = raw_dataset.select_columns(['combined'])
train_dataset = train_dataset.map(tokenizer_function, batched=True)
print(train_dataset)

Using the latest cached version of the dataset since ahmadAlrabghi/al_tadmoreyyah couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'default' at /root/.cache/huggingface/datasets/ahmadAlrabghi___al_tadmoreyyah/default/0.0.0/7a295c83c22d7ce96678af1b35bfea88baccde0c (last modified on Fri Sep  6 15:31:53 2024).


DatasetDict({
    train: Dataset({
        features: ['Unnamed: 0', 'page', 'title', 'text', 'cleaned_text', 'len_cleand', 'combined', 'len_combined'],
        num_rows: 81
    })
})


Map:   0%|          | 0/81 [00:00<?, ? examples/s]

DatasetDict({
    train: Dataset({
        features: ['combined', 'input_ids', 'attention_mask'],
        num_rows: 81
    })
})


In [None]:
training_args = TrainingArguments(learning_rate=2e-6, per_device_train_batch_size=16,
                                  num_train_epochs=1, output_dir='ahmadAlrabghi/al_tadmoreyyah_model',
                                  warmup_steps=round(6 * 0.2), # about 20% of the 6 steps in the first epoch
                                  optim='paged_adamw_8bit', # 8-bit paged AdamW optimizer to save memory
                                  per_device_eval_batch_size=4,
                                  weight_decay=0.01, push_to_hub=False, # don't push the model until the pad token is removed
                                  hub_model_id='ahmadAlrabghi/al_tadmoreyyah_model',
                                  report_to='all', log_level='info',
                                  evaluation_strategy='no', fp16=True,
                                  logging_strategy='epoch', save_strategy='epoch')

PyTorch: setting up devices
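
The `6` in `warmup_steps=round(6 * 0.2)` is the number of optimizer steps in one epoch. As a quick check, the arithmetic can be sketched as follows, assuming a single device, no gradient accumulation, and that the final partial batch is kept:

```python
import math

num_examples = 81  # rows in the training split
batch_size = 16    # per_device_train_batch_size
num_epochs = 1

# 81 = 5 * 16 + 1, so the last batch holds a single example
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * num_epochs
warmup_steps = round(steps_per_epoch * 0.2)

print(f"steps per epoch: {steps_per_epoch}")
print(f"total optimizer steps: {total_steps}")
print(f"warmup steps: {warmup_steps}")
```

So only one of the six optimizer steps is spent warming up the learning rate.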


In [None]:
trainer = Trainer(model=model, tokenizer=tokenizer, data_collator=data_collator,
                  args=training_args, train_dataset=train_dataset['train'])

  self.scaler = torch.cuda.amp.GradScaler(**kwargs)
Using auto half precision backend


In [None]:
%%time

trainer.train()

The following columns in the training set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: combined. If combined are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
***** Running training *****
  Num examples = 81
  Num Epochs = 1
  Instantaneous batch size per device = 16
  Total train batch size (w. parallel, distributed & accumulation) = 16
  Gradient Accumulation steps = 1
  Total optimization steps = 6
  Number of trainable parameters = 5,898,240
  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
6,3.689


Saving model checkpoint to ahmadAlrabghi/al_tadmoreyyah_model/checkpoint-6
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--aubmindlab--aragpt2-large/snapshots/b870598f32c15f993567a09c977cc0e5431d28f0/config.json
Model config AraGPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "AraGPT2LMHeadModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "attn_pdrop": 0.1,
  "auto_map": {
    "AutoConfig": "aubmindlab/aragpt2-large--configuration_aragpt2.AraGPT2Config",
    "AutoModel": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2Model",
    "AutoModelForCausalLM": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2LMHeadModel"
  },
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "initializer_range": 0.014142135623731,
  "intermediate_size": 5120,
  "layer_norm_epsilon": 1e-05,
  "model_type": "aragpt2",
  "n_ctx": 1024,
  "n_embd": 1280,
  "n_head": 20,


CPU times: user 58.8 s, sys: 3.04 s, total: 1min 1s
Wall time: 1min 41s


TrainOutput(global_step=6, training_loss=3.6890131632486978, metrics={'train_runtime': 101.33, 'train_samples_per_second': 0.799, 'train_steps_per_second': 0.059, 'total_flos': 219395184353280.0, 'train_loss': 3.6890131632486978, 'epoch': 1.0})

## Removing the Padding Token

In this section, we remove the **`pad_token`** that was added to the tokenizer before training. The padding token is only needed to batch sequences during training. Removing it restores the original vocabulary size of 64,000, so the uploaded model can be loaded later without the size mismatch error shown earlier.

### Steps Explained:

1. **Get the current vocabulary**:
   - We first retrieve the tokenizer's current vocabulary using the `get_vocab()` method.

2. **Check if the `pad_token` exists**:
   - If the `pad_token` (`'[PAD]'`) exists in the vocabulary, we remove it using the `pop()` method.

3. **Rebuild the vocabulary**:
   - After removing the `pad_token`, we rebuild the vocabulary (`new_vocab`) by creating a list of the remaining tokens.

4. **Reinitialize the tokenizer without the `pad_token`**:
   - We then recreate the tokenizer using the updated vocabulary that no longer includes the `pad_token` by calling `AutoTokenizer.from_pretrained`.

5. **Update the model configuration**:
   - The model's configuration is updated by setting `pad_token_id` to `None`, ensuring that the model is no longer dependent on the padding token.
   - We also adjust the model’s token embeddings to match the new vocabulary size using `resize_token_embeddings`.



This process ensures that both the model and the tokenizer are in sync after removing the padding token, preventing any potential size mismatch errors when the model is used.
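
Dropping the token does not disturb any other token ID, because `[PAD]` was appended at the end of the vocabulary (it received ID 64000, one past the original 64,000 entries). The toy sketch below illustrates this with a three-token stand-in vocabulary; the token names and the helper functions are hypothetical.

```python
def add_token(vocab: dict, token: str) -> dict:
    """Return a copy of vocab with `token` appended under the next free ID."""
    extended = dict(vocab)
    extended.setdefault(token, len(extended))
    return extended


def ids_are_contiguous(vocab: dict) -> bool:
    """True if the IDs are exactly 0 .. len(vocab) - 1 with no gaps."""
    return sorted(vocab.values()) == list(range(len(vocab)))


base_vocab = {"tok_a": 0, "tok_b": 1, "tok_c": 2}  # stand-in for the real vocabulary

with_pad = add_token(base_vocab, "[PAD]")
print(with_pad["[PAD]"], len(with_pad))  # new token gets the highest ID

without_pad = dict(with_pad)
without_pad.pop("[PAD]", None)
print(without_pad == base_vocab, ids_are_contiguous(without_pad))
```

Because the removed entry was the last one, shrinking the embedding matrix by one row keeps every remaining row aligned with its token.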

In [None]:
vocab = tokenizer.get_vocab()

if '[PAD]' in vocab:
    vocab.pop('[PAD]')

new_vocab = list(vocab.keys())

tokenizer = AutoTokenizer.from_pretrained(
    tokenizer.name_or_path,
    vocab=new_vocab, trust_remote_code=True
)

model.config.pad_token_id = None
model.resize_token_embeddings(len(tokenizer))
# verify the changes
# print(f'padding token in model: {model.config.pad_token_id}, padding token in the tokenizer: {tokenizer.pad_token}')

Could not locate the tokenizer configuration file, will try to use the model config instead.
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--aubmindlab--aragpt2-large/snapshots/b870598f32c15f993567a09c977cc0e5431d28f0/config.json
Model config AraGPT2Config {
  "_name_or_path": "aubmindlab/aragpt2-large",
  "activation_function": "gelu_new",
  "architectures": [
    "AraGPT2LMHeadModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "attn_pdrop": 0.1,
  "auto_map": {
    "AutoConfig": "aubmindlab/aragpt2-large--configuration_aragpt2.AraGPT2Config",
    "AutoModel": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2Model",
    "AutoModelForCausalLM": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2LMHeadModel"
  },
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_toke

Embedding(64000, 1280)

In [None]:
# verify the changes
print(f'padding token ID in the tokenizer: {tokenizer.pad_token_id}')
print(f'padding token ID in the model config {model.config.pad_token_id}')
print(f'padding token in model: {model.config.pad_token_id}')
print(f'the padding token from the tokenizer: {tokenizer.pad_token}')
print(f'eos token ID in the model: {model.config.eos_token_id}, eos token ID in the tokenizer {tokenizer.eos_token_id}')
print(f'the tokenizer len:{len(tokenizer)}, the model input ebedding layer:{model.get_input_embeddings()}')


padding token ID in the tokenizer: None
padding token ID in the model config None
padding token in model: None
the padding token from the tokenizer: None
eos token ID in the model: 0, eos token ID in the tokenizer 0
the tokenizer len:64000, the model input ebedding layer:Embedding(64000, 1280)


In [None]:
trainer.push_to_hub()

Saving model checkpoint to ahmadAlrabghi/al_tadmoreyyah_model
loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--aubmindlab--aragpt2-large/snapshots/b870598f32c15f993567a09c977cc0e5431d28f0/config.json
Model config AraGPT2Config {
  "activation_function": "gelu_new",
  "architectures": [
    "AraGPT2LMHeadModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "attn_pdrop": 0.1,
  "auto_map": {
    "AutoConfig": "aubmindlab/aragpt2-large--configuration_aragpt2.AraGPT2Config",
    "AutoModel": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2Model",
    "AutoModelForCausalLM": "aubmindlab/aragpt2-large--modeling_aragpt2.AraGPT2LMHeadModel"
  },
  "bos_token_id": 0,
  "embd_pdrop": 0.1,
  "eos_token_id": 0,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "initializer_range": 0.014142135623731,
  "intermediate_size": 5120,
  "layer_norm_epsilon": 1e-05,
  "model_type": "aragpt2",
  "n_ctx": 1024,
  "n_embd": 1280,
  "n_head": 20,
  "n_inner": 

events.out.tfevents.1725645141.2749de5fb9ec.20791.13:   0%|          | 0.00/7.68k [00:00<?, ?B/s]

events.out.tfevents.1725645650.2749de5fb9ec.20791.14:   0%|          | 0.00/6.86k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/23.6M [00:00<?, ?B/s]

Upload 5 LFS files:   0%|          | 0/5 [00:00<?, ?it/s]

events.out.tfevents.1725646130.2749de5fb9ec.20791.15:   0%|          | 0.00/6.86k [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/ahmadAlrabghi/al_tadmoreyyah_model/commit/bceab286f3a59202aad86272ff2664c3b72c2195', commit_message='End of training', commit_description='', oid='bceab286f3a59202aad86272ff2664c3b72c2195', pr_url=None, pr_revision=None, pr_num=None)

### Model Performance Evaluation

The performance of this baseline model is **very poor**: after a single epoch at a learning rate of 2e-6, the training loss is still 3.689.  
You can load the pushed model and generate some text to see this for yourself.

To improve the model, we perform a **grid search** over the training and LoRA hyperparameters to find the configuration that best fits our specific data.
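
To put the baseline training loss of 3.689 in perspective, it can be converted to perplexity. This is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats (the usual convention for causal language modeling losses):

```python
import math

baseline_loss = 3.689  # training loss reported after the single baseline epoch

# perplexity is the exponential of the mean per-token cross-entropy
perplexity = math.exp(baseline_loss)
print(f"perplexity: {perplexity:.1f}")
```

A perplexity of roughly 40 means the model is, on average, about as uncertain as if it were choosing uniformly among 40 tokens at each position. That is far from the near-memorization this project is aiming for.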


## Grid Search Step

In [10]:
from huggingface_hub import notebook_login
notebook_login()


In [11]:
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from transformers import TrainingArguments, Trainer, DataCollatorForLanguageModeling
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training, TaskType
from datasets import load_dataset
import torch
from arabert.preprocess import ArabertPreprocessor
import itertools
import os
from huggingface_hub import Repository

In [13]:
# remove unwanted files to free up some space
! rm -r sample_data

rm: cannot remove 'sample_data': No such file or directory
rm: cannot remove 'ahmadAlrabghi': No such file or directory


In [None]:
# Mount Google Drive to save logging files during training
from google.colab import drive
drive.mount('/content/drive')
# alternatively, download individual files directly:

from google.colab import files
# files.download(path)

Mounted at /content/drive


In [14]:
# debug cuda errors if needed:
os.environ["CUDA_LAUNCH_BLOCKING"] = '1'

In [15]:
%%time
# The only difference from the code above is the grid search; all other parts are the same.

# Hyperparameter grids. I used a small grid due to computational resource limits:
learning_rates = [5e-5, 2e-4]
batch_sizes = [8, 4, 2]
weight_decays = [0.01, 0.0]
grad_accumulation_steps = [2]

# LoRA-specific hyperparameters
lora_ranks = [32, 64]
lora_alphas = [64, 128]
lora_dropouts = [0.0]



# Function to train the model with a given set of hyperparameters
def train_model(lr, batch_size, weight_decay, grad_accum_steps, lora_r, lora_alpha, lora_dropout, check_point, dataset_path):
    print('start trianing function')

    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_compute_dtype=torch.float16
    )

    model = AutoModelForCausalLM.from_pretrained(check_point, trust_remote_code=True, quantization_config=bnb_config)
    tokenizer = AutoTokenizer.from_pretrained(check_point, trust_remote_code=True)
    arabert_prep = ArabertPreprocessor(model_name=check_point)



    # adding the padding token:
    tokenizer.add_special_tokens({'pad_token':'[PAD]'})
    model.resize_token_embeddings(len(tokenizer))
    model.config.pad_token_id = tokenizer.pad_token_id

    # peft:
    model = prepare_model_for_kbit_training(model)
    lora_config = LoraConfig(
        task_type=TaskType.CAUSAL_LM,
        inference_mode=False,
        r=lora_r,
        lora_alpha=lora_alpha,
        lora_dropout=lora_dropout,
        target_modules=[
            "attn.c_attn",
            "attn.c_proj",
            "mlp.c_fc",
            "mlp.c_proj"
        ]
    )
    model = get_peft_model(model, lora_config)



    # Data
    def tokenizer_function(examples):
        cleaned_text = [arabert_prep.preprocess(text) for text in examples['combined']]
        return tokenizer(cleaned_text, padding=True, truncation=True, max_length=1024)
    data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

    from datasets import load_dataset
    raw_dataset = load_dataset(dataset_path)
    train_dataset = raw_dataset.select_columns(['combined'])
    train_dataset = train_dataset.map(tokenizer_function, batched=True)
    output_dir = f"./models/al_tadmoreyyah_lr{lr}_bs{batch_size}_wa{weight_decay}_ga{grad_accum_steps}_r{lora_r}_alpha{lora_alpha}_dropout{lora_dropout}"

    # save the logs to a run-specific directory so they can be viewed later in TensorBoard:
    logging_dir = f"./logs/al_tadmoreyyah_lr{lr}_bs{batch_size}_wa{weight_decay}_ga{grad_accum_steps}_r{lora_r}_alpha{lora_alpha}_dropout{lora_dropout}"

    # number of warmup steps: roughly the number of batches in one epoch
    datasize = 81
    warm_up  = datasize / batch_size



    training_args = TrainingArguments(
        output_dir='ahmadAlrabghi/al_tadmoreyyah_model',
        per_device_train_batch_size=batch_size,
        num_train_epochs=15,
        weight_decay=weight_decay,
        learning_rate=lr,
        gradient_accumulation_steps=grad_accum_steps,
        logging_dir=logging_dir,
        logging_steps=5,
        logging_strategy='steps',
        save_steps=10,
        warmup_steps=round(warm_up),
        optim='paged_adamw_8bit',
        push_to_hub=False,
        hub_model_id='ahmadAlrabghi/al_tadmoreyyah_model',
        report_to='tensorboard',
        # log_level='info',
        evaluation_strategy='no'
    )

    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset['train'],
        data_collator=data_collator,
        tokenizer=tokenizer
    )

    print('training_started...')
    trainer.train()

    # removing the pad token to save the model
    vocab = tokenizer.get_vocab()
    if '[PAD]' in vocab:
        vocab.pop('[PAD]')

    new_vocab = list(vocab.keys())

    tokenizer = AutoTokenizer.from_pretrained(
        tokenizer.name_or_path,
        vocab=new_vocab, trust_remote_code=True
    )

    model.config.pad_token_id = None
    model.resize_token_embeddings(len(tokenizer))

    trainer.push_to_hub(
        model_name='ahmadAlrabghi/al_tadmoreyyah_model',
        # add tags to differentiate between the models
        tags=[
            f'lr:{lr}', f'epochs:{15}',  # matches num_train_epochs above
            f'lora-dropout:{lora_dropout}', f'train-batch:{batch_size}',
            'optim: 8bit-adam', f'weight-decay:{weight_decay}', f'gradient_accumulation_steps:{grad_accum_steps}',
            f'lora-r:{lora_r}', f'lora-alpha:{lora_alpha}'
        ]
    )





    print('a model has been pushed to the hub!')
    print('spray and pray worked this time.. :)')


# Perform grid search over all combinations of hyperparameters
def grid_search():
    i = 1
    # Use itertools.product to generate all combinations
    for lr, batch_size, weight_decay, grad_steps, lora_r, lora_alpha, lora_dropout in itertools.product(
        learning_rates, batch_sizes, weight_decays, grad_accumulation_steps, lora_ranks, lora_alphas, lora_dropouts
    ):
        print(f'model number: {i}')
        print(f"Training with lr={lr}, batch_size={batch_size}, weight_decay={weight_decay}, grad_steps={grad_steps}, lora_r={lora_r}, lora_alpha={lora_alpha}, lora_dropout={lora_dropout}")
        train_model(lr, batch_size, weight_decay, grad_steps, lora_r, lora_alpha, lora_dropout, check_point='aubmindlab/aragpt2-large', dataset_path='ahmadAlrabghi/al_tadmoreyyah')
        i+=1


CPU times: user 7 µs, sys: 0 ns, total: 7 µs
Wall time: 11.9 µs
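
Before launching the search, it helps to know how many runs the grid implies. The sketch below rebuilds the same value lists in a dictionary (the name `grid` and its keys are illustrative only) and enumerates the combinations the way `itertools.product` does:

```python
import itertools

# same values as the hyperparameter lists defined in the cell above
grid = {
    "lr": [5e-5, 2e-4],
    "batch_size": [8, 4, 2],
    "weight_decay": [0.01, 0.0],
    "grad_accum_steps": [2],
    "lora_r": [32, 64],
    "lora_alpha": [64, 128],
    "lora_dropout": [0.0],
}

combos = list(itertools.product(*grid.values()))
print(f"number of training runs: {len(combos)}")

# itertools.product varies the right-most lists fastest
for number, combo in enumerate(combos[:3], start=1):
    print(number, dict(zip(grid, combo)))
```

The grid yields 2 x 3 x 2 x 1 x 2 x 2 x 1 = 48 runs of 15 epochs each, and the first runs differ only in `lora_alpha` and `lora_r`.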


In [16]:
%%time
# run the grid search
# Warning: This grid search is going to take a long time to run! :)
grid_search()

model number: 1
Training with lr=5e-05, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...


  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
5,3.6522
10,3.5959
15,3.5914
20,3.6053
25,3.3512
30,3.5505
35,3.2892
40,3.5078
45,3.2684
50,3.3495


  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 2
Training with lr=5e-05, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...


  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


Step,Training Loss
5,3.649
10,3.5646
15,3.5306
20,3.5321
25,3.2651
30,3.4501
35,3.1766
40,3.3817
45,3.1334
50,3.2048


  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 3
Training with lr=5e-05, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6522
10,3.5957
15,3.5915
20,3.6058
25,3.3521
30,3.552
35,3.2921
40,3.5102
45,3.2723
50,3.3551



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 4
Training with lr=5e-05, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6489
10,3.5642
15,3.5305
20,3.5329
25,3.2662
30,3.4527
35,3.1794
40,3.3841
45,3.1372
50,3.2079



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 5
Training with lr=5e-05, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6522
10,3.5959
15,3.5913
20,3.605
25,3.3507
30,3.5497
35,3.2887
40,3.5076
45,3.2675
50,3.3489



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 6
Training with lr=5e-05, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.649
10,3.5646
15,3.5306
20,3.5321
25,3.2651
30,3.4501
35,3.1766
40,3.3817
45,3.1334
50,3.2048



adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 7
Training with lr=5e-05, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6522
10,3.5957
15,3.5915
20,3.6058
25,3.3521
30,3.552
35,3.2921
40,3.5102
45,3.2723
50,3.3551



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 8
Training with lr=5e-05, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6489
10,3.5642
15,3.5305
20,3.5329
25,3.2662
30,3.4527
35,3.1794
40,3.3841
45,3.1372
50,3.2079



adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 9
Training with lr=5e-05, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5156
10,3.7508
15,3.5718
20,3.5625
25,3.6345
30,3.3369
35,3.5715
40,3.443
45,3.1696
50,3.4558



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 10
Training with lr=5e-05, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5142
10,3.7377
15,3.5335
20,3.5109
25,3.5711
30,3.2696
35,3.4935
40,3.3534
45,3.0719
50,3.347



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 11
Training with lr=5e-05, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5155
10,3.7504
15,3.5711
20,3.5619
25,3.6345
30,3.3368
35,3.5716
40,3.4433
45,3.1687
50,3.4547



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 12
Training with lr=5e-05, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5142
10,3.7373
15,3.5329
20,3.5104
25,3.5707
30,3.2691
35,3.4919
40,3.3527
45,3.0709
50,3.3461



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 13
Training with lr=5e-05, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5156
10,3.7506
15,3.5716
20,3.5621
25,3.6342
30,3.3362
35,3.5712
40,3.4427
45,3.1698
50,3.4557



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 14
Training with lr=5e-05, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5142
10,3.7377
15,3.5335
20,3.5109
25,3.5711
30,3.2695
35,3.4935
40,3.3534
45,3.0719
50,3.347



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 15
Training with lr=5e-05, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5155
10,3.7504
15,3.5711
20,3.5619
25,3.6345
30,3.3368
35,3.5716
40,3.4433
45,3.1687
50,3.4547



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 16
Training with lr=5e-05, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5142
10,3.7373
15,3.5329
20,3.5104
25,3.5707
30,3.2691
35,3.4919
40,3.3527
45,3.0709
50,3.3461



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 17
Training with lr=5e-05, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7096
10,3.3342
15,3.5097
20,3.9709
25,3.611
30,3.4542
35,3.4465
40,3.6549
45,3.6235
50,3.4322



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 18
Training with lr=5e-05, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7091
10,3.3287
15,3.494
20,3.9451
25,3.5671
30,3.4006
35,3.3978
40,3.6095
45,3.5581
50,3.3646



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 19
Training with lr=5e-05, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7096
10,3.3341
15,3.5094
20,3.9705
25,3.6107
30,3.4535
35,3.4464
40,3.6549
45,3.6232
50,3.4313



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 20
Training with lr=5e-05, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7091
10,3.3286
15,3.4936
20,3.9447
25,3.5668
30,3.4
35,3.3978
40,3.6093
45,3.5569
50,3.364



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 21
Training with lr=5e-05, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7096
10,3.3342
15,3.5097
20,3.9709
25,3.611
30,3.4542
35,3.4465
40,3.6549
45,3.6235
50,3.4322



adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 22
Training with lr=5e-05, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7091
10,3.3287
15,3.494
20,3.9451
25,3.5671
30,3.4006
35,3.3978
40,3.6095
45,3.5581
50,3.3646



adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 23
Training with lr=5e-05, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7096
10,3.3341
15,3.5094
20,3.9705
25,3.6107
30,3.4535
35,3.4464
40,3.6549
45,3.6232
50,3.4313



adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 24
Training with lr=5e-05, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.7091
10,3.3286
15,3.4936
20,3.9447
25,3.5668
30,3.4
35,3.3978
40,3.6093
45,3.5569
50,3.364



adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 25
Training with lr=0.0002, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6427
10,3.5122
15,3.4512
20,3.418
25,3.1031
30,3.242
35,2.917
40,3.0502
45,2.7893
50,2.8044



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 26
Training with lr=0.0002, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6323
10,3.463
15,3.3594
20,3.2689
25,2.9049
30,2.9888
35,2.6322
40,2.7019
45,2.4398
50,2.4231



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 27
Training with lr=0.0002, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6427
10,3.5124
15,3.4522
20,3.4214
25,3.1102
30,3.2529
35,2.9347
40,3.0645
45,2.8151
50,2.8351



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 28
Training with lr=0.0002, batch_size=8, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6322
10,3.4626
15,3.3595
20,3.2693
25,2.9082
30,2.9946
35,2.6399
40,2.7057
45,2.4445
50,2.425



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 29
Training with lr=0.0002, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6428
10,3.5125
15,3.4509
20,3.4177
25,3.1035
30,3.2426
35,2.9193
40,3.053
45,2.7918
50,2.8055



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 30
Training with lr=0.0002, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6323
10,3.463
15,3.3594
20,3.2689
25,2.9049
30,2.9888
35,2.6321
40,2.7019
45,2.4398
50,2.423



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 31
Training with lr=0.0002, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6427
10,3.5124
15,3.4522
20,3.4214
25,3.1102
30,3.2529
35,2.9347
40,3.0645
45,2.815
50,2.835



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 32
Training with lr=0.0002, batch_size=8, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.6322
10,3.4626
15,3.3595
20,3.2693
25,2.9082
30,2.9946
35,2.6399
40,2.7057
45,2.4445
50,2.425



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 33
Training with lr=0.0002, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5117
10,3.7131
15,3.4665
20,3.442
25,3.4661
30,3.1364
35,3.3129
40,3.1319
45,2.7974
50,3.0174



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 34
Training with lr=0.0002, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5069
10,3.6819
15,3.4115
20,3.3829
25,3.3581
30,2.9919
35,3.1181
40,2.9088
45,2.5547
50,2.7344



adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 35
Training with lr=0.0002, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5116
10,3.7122
15,3.4663
20,3.442
25,3.4645
30,3.1365
35,3.3108
40,3.1298
45,2.7935
50,3.0146



adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 36
Training with lr=0.0002, batch_size=4, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5068
10,3.6813
15,3.4114
20,3.3819
25,3.3569
30,2.9915
35,3.1161
40,2.9081
45,2.5538
50,2.7326



adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 37
Training with lr=0.0002, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5117
10,3.7127
15,3.4663
20,3.442
25,3.4666
30,3.1353
35,3.3104
40,3.129
45,2.7946
50,3.0135



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 38
Training with lr=0.0002, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5069
10,3.6819
15,3.4115
20,3.3829
25,3.3581
30,2.9919
35,3.1181
40,2.9088
45,2.5546
50,2.7344



training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 39
Training with lr=0.0002, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5116
10,3.7122
15,3.4663
20,3.442
25,3.4645
30,3.1365
35,3.3108
40,3.1298
45,2.7935
50,3.0146



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 40
Training with lr=0.0002, batch_size=4, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.5068
10,3.6813
15,3.4114
20,3.3819
25,3.3569
30,2.9915
35,3.1161
40,2.9081
45,2.5538
50,2.7326



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 41
Training with lr=0.0002, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.708
10,3.3181
15,3.464
20,3.9009
25,3.4938
30,3.3232
35,3.3246
40,3.5337
45,3.429
50,3.2153



adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 42
Training with lr=0.0002, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.706
10,3.3011
15,3.4326
20,3.875
25,3.4485
30,3.2681
35,3.2511
40,3.4557
45,3.3034
50,3.0495



adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 43
Training with lr=0.0002, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.708
10,3.3179
15,3.4636
20,3.901
25,3.494
30,3.3218
35,3.3251
40,3.5348
45,3.4263
50,3.2104



adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 44
Training with lr=0.0002, batch_size=2, weight_decay=0.01, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.706
10,3.3007
15,3.4323
20,3.8752
25,3.4482
30,3.267
35,3.2519
40,3.4564
45,3.3008
50,3.0502



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 45
Training with lr=0.0002, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.708
10,3.3181
15,3.464
20,3.9009
25,3.4938
30,3.3232
35,3.3246
40,3.5337
45,3.429
50,3.2153



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 46
Training with lr=0.0002, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=32, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.706
10,3.3011
15,3.4326
20,3.875
25,3.4485
30,3.2681
35,3.2511
40,3.4557
45,3.3034
50,3.0495



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

adapter_model.safetensors:   0%|          | 0.00/94.4M [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 47
Training with lr=0.0002, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=64, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.708
10,3.3179
15,3.4636
20,3.901
25,3.494
30,3.3218
35,3.3251
40,3.5348
45,3.4263
50,3.2104



adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
model number: 48
Training with lr=0.0002, batch_size=2, weight_decay=0.0, grad_steps=2, lora_r=64, lora_alpha=128, lora_dropout=0.0
start trianing function


`low_cpu_mem_usage` was None, now set to True since model is quantized.


training_started...




Step,Training Loss
5,3.706
10,3.3007
15,3.4323
20,3.8752
25,3.4482
30,3.267
35,3.2519
40,3.4564
45,3.3008
50,3.0502



Upload 2 LFS files:   0%|          | 0/2 [00:00<?, ?it/s]

adapter_model.safetensors:   0%|          | 0.00/189M [00:00<?, ?B/s]

training_args.bin:   0%|          | 0.00/5.24k [00:00<?, ?B/s]

a model has been pushed to the hub!
spray and pray worked this time.. :)
CPU times: user 13h 18min, sys: 23min 11s, total: 13h 41min 11s
Wall time: 14h 5min 48s
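
The twelve `lr=0.0002` runs logged above can be compared at a glance by collecting the training loss each one reported at step 50. The snippet below is only a rough sketch: the numbers are copied by hand from the logs above, the names `FINAL_LOSS` and `mean_loss_by` are made up for this illustration, and a single logged step is a noisy signal. Runs with different batch sizes have also seen a different number of samples by step 50, so the TensorBoard curves remain the better reference.

```python
from collections import defaultdict
from statistics import mean

# (batch_size, weight_decay, lora_r, lora_alpha) -> training loss logged at step 50.
# Values copied from the logs above; lr=0.0002, grad_steps=2 and lora_dropout=0.0
# are the same for all twelve runs, so they are left out of the key.
FINAL_LOSS = {
    (4, 0.0, 32, 64): 3.0135,
    (4, 0.0, 32, 128): 2.7344,
    (4, 0.0, 64, 64): 3.0146,
    (4, 0.0, 64, 128): 2.7326,
    (2, 0.01, 32, 64): 3.2153,
    (2, 0.01, 32, 128): 3.0495,
    (2, 0.01, 64, 64): 3.2104,
    (2, 0.01, 64, 128): 3.0502,
    (2, 0.0, 32, 64): 3.2153,
    (2, 0.0, 32, 128): 3.0495,
    (2, 0.0, 64, 64): 3.2104,
    (2, 0.0, 64, 128): 3.0502,
}


def mean_loss_by(index):
    """Average the step-50 loss over all runs sharing the hyperparameter at `index`."""
    groups = defaultdict(list)
    for config, loss in FINAL_LOSS.items():
        groups[config[index]].append(loss)
    return {value: round(mean(losses), 4) for value, losses in groups.items()}


# Configuration with the lowest logged loss at step 50.
best_config = min(FINAL_LOSS, key=FINAL_LOSS.get)

by_rank = mean_loss_by(2)   # grouped by lora_r
by_alpha = mean_loss_by(3)  # grouped by lora_alpha

print("best (batch_size, weight_decay, lora_r, lora_alpha):", best_config)
print("mean loss by lora_r:", by_rank)
print("mean loss by lora_alpha:", by_alpha)
```

In these logs the runs with `lora_alpha=128` end lower than their `lora_alpha=64` counterparts, while changing `lora_r` or `weight_decay` barely moves the step-50 loss.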


In [None]:
# Uncomment the two lines below to compare the models' performance inside this notebook
# %load_ext tensorboard
# %tensorboard --logdir logs

In [None]:
%%time
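# Zip the logs directory recursively (-r) so the archive includes the files inside it
! zip -r logs.zip logs
# Copy the archive to Google Drive (assumes Drive is already mounted at /content/drive)
! cp logs.zip /content/drive/MyDrive/altadmoreyyah_model/

# Also download the archive to the local machine
from google.colab import files
files.download('logs.zip')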
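
If the `zip` command is not available, the same archive can be built from Python. This is a minimal sketch using the standard library's `shutil.make_archive`; the helper name `zip_logs` and its default arguments are assumptions for illustration and are not part of the original notebook.

```python
import shutil
from pathlib import Path


def zip_logs(log_dir: str = "logs", archive_stem: str = "logs") -> str:
    """Zip `log_dir` recursively and return the path of the created .zip file.

    `archive_stem` is the output path without the ".zip" suffix
    (hypothetical helper, not taken from the original notebook).
    """
    log_path = Path(log_dir).resolve()
    if not log_path.is_dir():
        raise FileNotFoundError(f"log directory not found: {log_path}")
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # which mirrors what `zip -r logs.zip logs` produces.
    return shutil.make_archive(
        base_name=str(Path(archive_stem).resolve()),
        format="zip",
        root_dir=str(log_path.parent),
        base_dir=log_path.name,
    )


# Example call (commented out so the cell is safe to run without a logs folder):
# archive_path = zip_logs("logs", "logs")
```

Unlike a plain `zip logs.zip logs`, which stores only the directory entry, this walks the whole folder, so the TensorBoard event files end up in the archive.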