In [1]:
%pip install datasets
%pip install peft
%pip install trl
%pip install bitsandbytes
%pip install ipywidgets

Note: you may need to restart the kernel to use updated packages.


In [2]:
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from trl import DataCollatorForCompletionOnlyLM, SFTConfig, SFTTrainer

In [3]:
model_id = "HuggingFaceTB/SmolLM-135M-Instruct"
dataset_id = "medalpaca/medical_meadow_medical_flashcards"
device = "cuda" if torch.cuda.is_available() else "cpu"

## Preparing and Formatting the Dataset for Training

This section prepares a dataset for fine-tuning a large language model (LLM). The steps are to keep only the columns we need, rename them to `instruction` and `response`, and split the data into training and evaluation sets. A held-out evaluation set lets us check how well the fine-tuned model generalizes to examples it was not trained on.


In [4]:
def format_dataset(dataset, keys, instruction_col_name, response_col_name):
    """Format the dataset by retaining only necessary columns and renaming them."""
    cols_to_remove = [key for key in keys if key not in [instruction_col_name, response_col_name]]
    dataset = dataset.remove_columns(cols_to_remove)
    dataset = dataset.rename_column(instruction_col_name, "instruction")
    dataset = dataset.rename_column(response_col_name, "response")
    return dataset

def prepare_datasets(dataset, instruction_col_name, response_col_name):
    """Format and split the dataset for training and evaluation."""
    available_cols = list(dataset["train"].features.keys())
    formatted_dataset = format_dataset(
        dataset, available_cols, instruction_col_name, response_col_name
    )

    if "valid" in formatted_dataset:
        train_dataset = formatted_dataset["train"]
        eval_dataset = formatted_dataset["valid"]
    elif "test" in formatted_dataset:
        train_dataset = formatted_dataset["train"]
        eval_dataset = formatted_dataset["test"]
    else:
        split_dataset = formatted_dataset["train"].train_test_split(test_size=0.2)
        train_dataset, eval_dataset = split_dataset["train"], split_dataset["test"]

    return train_dataset, eval_dataset
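
The fallback order used by `prepare_datasets` (a `valid` split first, then `test`, then a 20% hold-out) can be tried without downloading anything. The sketch below is a simplified stand-in that works on plain dictionaries of lists rather than a `DatasetDict`; `choose_splits` and the toy data are hypothetical names, and the unshuffled cut only approximates `train_test_split`, which shuffles by default.

```python
def choose_splits(splits, test_size=0.2):
    """Mimic the fallback order: 'valid', then 'test', then a held-out slice."""
    train = splits["train"]
    if "valid" in splits:
        return train, splits["valid"]
    if "test" in splits:
        return train, splits["test"]
    # No evaluation split available: hold out the last `test_size` fraction.
    n_eval = max(1, round(len(train) * test_size))
    return train[:-n_eval], train[-n_eval:]


toy = {"train": [f"example {i}" for i in range(10)]}
train_rows, eval_rows = choose_splits(toy)
print(len(train_rows), len(eval_rows))
```

With ten toy rows and `test_size=0.2`, eight rows stay in training and two are held out.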

Load the dataset using its ID or path. This dataset will be used for training and evaluating the model.


In [5]:
dataset = load_dataset(dataset_id)

Print the dataset information to inspect its structure and column names. This is important to understand the data we're working with and ensure that we correctly identify the columns containing the `instructions` and `responses`.


In [6]:
dataset

DatasetDict({
    train: Dataset({
        features: ['input', 'output', 'instruction'],
        num_rows: 33955
    })
})

In [7]:
dataset["train"][0]

{'input': 'What is the relationship between very low Mg2+ levels, PTH levels, and Ca2+ levels?',
 'output': 'Very low Mg2+ levels correspond to low PTH levels which in turn results in low Ca2+ levels.',
 'instruction': 'Answer this question truthfully'}

Format the dataset and split it into training and evaluation sets. Here, `input` and `output` represent the columns in the dataset holding the `instruction` and `response`, respectively.


In [8]:
train_dataset, eval_dataset = prepare_datasets(
    dataset, instruction_col_name="input", response_col_name="output"
)

In [9]:
print(f"{train_dataset = }")
print(f"{eval_dataset = }")

train_dataset = Dataset({
    features: ['instruction', 'response'],
    num_rows: 27164
})
eval_dataset = Dataset({
    features: ['instruction', 'response'],
    num_rows: 6791
})
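
The split sizes printed above follow from the 20% hold-out applied to the 33,955 original rows. The short check below reproduces them with plain arithmetic; it is only a sanity check and assumes nothing about how the library rounds internally.

```python
total_rows = 33955   # rows in the original "train" split
test_size = 0.2      # fraction passed to train_test_split

eval_rows = round(total_rows * test_size)
train_rows = total_rows - eval_rows
print(train_rows, eval_rows)
```

The two numbers should match the `num_rows` values reported for `train_dataset` and `eval_dataset`.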


## Load and Test Pre-trained Model

Define Functions for Response Generation and Display


In [10]:
def generate_response(model, tokenizer, instruction, device="cpu"):
    """Generate a response from the model based on an instruction."""
    messages = [{"role": "user", "content": instruction}]
    input_text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer.encode(input_text, return_tensors="pt").to(device)
    outputs = model.generate(
        inputs, max_new_tokens=128, temperature=0.2, top_p=0.9, do_sample=True
    )
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return response

def print_example(example):
    """Print an example from the dataset."""
    print("Original Dataset Example:")
    print(f"Instruction: {example['instruction']}")
    print(f"Response: {example['response']}")
    print("-" * 100)

def print_response(response):
    """Print the model's response."""
    print("Model response:")
    print(response.split("assistant\n")[-1])
    print("-" * 100)
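
`print_response` relies on the decoded text containing the literal marker `assistant\n` once special tokens are stripped. The sketch below shows that extraction on a hand-written string; the exact shape of `decoded` is an assumption about what the chat template leaves behind, not captured model output.

```python
# Assumed shape of the decoded text once special tokens are stripped.
decoded = (
    "user\n"
    "What thyroid imbalance is associated with anxiety?\n"
    "assistant\n"
    "Hyperthyroidism is associated with anxiety."
)

# Keep only the text after the last "assistant\n" marker.
reply = decoded.split("assistant\n")[-1]
print(reply)
```

Because the split keeps only what follows the last marker, a reply that itself contains `assistant\n` would be cut short.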

Load the Model and the Tokenizer


In [11]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)

Test the Pre-trained Model


In [12]:
# Define a test example
example1 = eval_dataset[1]

response = generate_response(model, tokenizer, example1["instruction"], device)

print_example(example1)
print_response(response)

The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.


Original Dataset Example:
Instruction: What thyroid imbalance is associated with anxiety?
Response: Hyperthyroidism presents with anxiety.
----------------------------------------------------------------------------------------------------
Model response:
Thyroid imbalance is a common condition that can contribute to anxiety. Thyroid dysfunction, which affects the thyroid gland, can lead to anxiety symptoms. Here are some ways thyroid imbalance can contribute to anxiety:

1. **Hypothyroidism**: Hypothyroidism, or underactive thyroid, can cause anxiety by disrupting the body's natural balance of hormones. This can lead to feelings of fatigue, weight gain, and mood disturbances.
2. **Hyperthyroidism**: Hyperthyroidism, or overactive thyroid, can cause anxiety by disrupting the body's natural balance of hormones. This can lead to feelings of anxiety, jitteriness, and rapid heartbeat.
----------------------------------------------------------------------------------------------------


### Supervised Fine-tuning Trainer


#### Training adapters [Read More](https://huggingface.co/docs/trl/v0.9.6/en/sft_trainer#training-adapters)


Hugging Face's TRL library integrates tightly with the PEFT library, so we can conveniently train adapters instead of training the entire model.


In [13]:
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none"
)
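
To get a feel for what `r=16` and `lora_alpha=32` mean, the arithmetic below works out the standard LoRA scaling factor (`lora_alpha / r`) and the number of trainable parameters for a single adapted weight matrix. The 576 x 576 matrix shape is an assumption chosen for illustration, not a value read from the model's configuration.

```python
r = 16            # LoRA rank, from the config above
lora_alpha = 32   # LoRA alpha, from the config above

# Standard LoRA scales the low-rank update by alpha / r.
scaling = lora_alpha / r

# Hypothetical square projection; 576 is an assumed size for illustration.
d_in = d_out = 576
full_params = d_in * d_out
lora_params = r * (d_in + d_out)   # A is (r x d_in), B is (d_out x r)

print(f"scaling = {scaling}")
print(f"full = {full_params}, lora = {lora_params}")
print(f"ratio = {lora_params / full_params:.3f}")
```

Under these assumptions the adapter trains roughly 5.6% as many parameters as the full matrix it modifies.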

#### Customize prompts using packed dataset [Read More](https://huggingface.co/docs/trl/en/sft_trainer#customize-your-prompts-using-packed-dataset)


Since our dataset has two fields, `instruction` and `response`, we need to combine them into a single string before passing it to the SFT Trainer.


In [14]:
def formatting_prompts_func(example: dict) -> str:
    """Format prompt for training."""
    text = f"<|im_start|>user\n{example['instruction']}<|im_end|>\n<|im_start|>assistant\n{example['response']}<|im_end|>"
    return text
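
To see what the trainer receives, the snippet below applies the same template to one hand-written record. It redefines the function so that it runs on its own; the `sample` dictionary is illustrative.

```python
def formatting_prompts_func(example: dict) -> str:
    """Same ChatML-style template as the cell above."""
    return (
        f"<|im_start|>user\n{example['instruction']}<|im_end|>\n"
        f"<|im_start|>assistant\n{example['response']}<|im_end|>"
    )


sample = {
    "instruction": "What thyroid imbalance is associated with anxiety?",
    "response": "Hyperthyroidism presents with anxiety.",
}
formatted = formatting_prompts_func(sample)
print(formatted)
```

Each record becomes a user turn followed by an assistant turn, both wrapped in `<|im_start|>` and `<|im_end|>` markers.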

## Training Parameters


In [43]:
num_train_epochs = 100

output_dir = f"{model_id.split('/')[-1]}-{dataset_id.split('/')[-1]}-{num_train_epochs}epochs"

sft_config = SFTConfig(
    output_dir=output_dir,
    num_train_epochs=num_train_epochs,
    max_seq_length=512,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    save_steps=500,  # save checkpoints every n training steps
    logging_steps=500,
    learning_rate=1e-3,
    weight_decay=0.001,
    fp16=False,
    bf16=True,
    warmup_ratio=0.05,  # not used by the plain "constant" schedule; "constant_with_warmup" would apply it
    lr_scheduler_type="constant",
    packing=True
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    formatting_func=formatting_prompts_func,
    peft_config=peft_config,
    args=sft_config,
)

In [44]:
print(torch.cuda.memory_summary(device=None, abbreviated=False))

|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 2            |        cudaMalloc retries: 2         |
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   1647 MiB |   4883 MiB | 120184 GiB | 120183 GiB |
|       from large pool |   1585 MiB |   4851 MiB | 119399 GiB | 119398 GiB |
|       from small pool |     61 MiB |     97 MiB |    785 GiB |    785 GiB |
|---------------------------------------------------------------------------|
| Active memory         |   1647 MiB |   4883 MiB | 120184 GiB | 120183 GiB |
|       from large pool |   1585 MiB |   4851 MiB | 119399 GiB | 119398 GiB |
|       from small pool |     61 MiB |     97 MiB |    785 GiB |    785 GiB |
|---------------------------------------------------------------

In [45]:
trainer.train()

  0%|          | 0/32900 [00:00<?, ?it/s]

  return fn(*args, **kwargs)
  with torch.enable_grad(), device_autocast_ctx, torch.cpu.amp.autocast(**ctx.cpu_autocast_kwargs):  # type: ignore[attr-defined]


{'loss': 1.4246, 'grad_norm': 0.11087402701377869, 'learning_rate': 0.001, 'epoch': 1.52}


{'loss': 1.343, 'grad_norm': 0.14058388769626617, 'learning_rate': 0.001, 'epoch': 3.04}


{'loss': 1.2933, 'grad_norm': 0.14454783499240875, 'learning_rate': 0.001, 'epoch': 4.56}


{'loss': 1.2624, 'grad_norm': 0.16288474202156067, 'learning_rate': 0.001, 'epoch': 6.08}


{'loss': 1.2298, 'grad_norm': 0.1749546080827713, 'learning_rate': 0.001, 'epoch': 7.6}


{'loss': 1.206, 'grad_norm': 0.1861453652381897, 'learning_rate': 0.001, 'epoch': 9.12}


{'loss': 1.1825, 'grad_norm': 0.19856591522693634, 'learning_rate': 0.001, 'epoch': 10.64}


{'loss': 1.1685, 'grad_norm': 0.2096509486436844, 'learning_rate': 0.001, 'epoch': 12.16}


{'loss': 1.1496, 'grad_norm': 0.23079277575016022, 'learning_rate': 0.001, 'epoch': 13.68}


{'loss': 1.1363, 'grad_norm': 0.23993003368377686, 'learning_rate': 0.001, 'epoch': 15.2}


{'loss': 1.1235, 'grad_norm': 0.24106284976005554, 'learning_rate': 0.001, 'epoch': 16.72}


{'loss': 1.112, 'grad_norm': 0.24820154905319214, 'learning_rate': 0.001, 'epoch': 18.24}


{'loss': 1.1023, 'grad_norm': 0.26187747716903687, 'learning_rate': 0.001, 'epoch': 19.76}


{'loss': 1.0932, 'grad_norm': 0.2678745985031128, 'learning_rate': 0.001, 'epoch': 21.28}


{'loss': 1.0865, 'grad_norm': 0.2732801139354706, 'learning_rate': 0.001, 'epoch': 22.8}


{'loss': 1.077, 'grad_norm': 0.27954167127609253, 'learning_rate': 0.001, 'epoch': 24.32}


{'loss': 1.0732, 'grad_norm': 0.29230767488479614, 'learning_rate': 0.001, 'epoch': 25.84}


{'loss': 1.0632, 'grad_norm': 0.31502199172973633, 'learning_rate': 0.001, 'epoch': 27.36}


{'loss': 1.0631, 'grad_norm': 0.314107209444046, 'learning_rate': 0.001, 'epoch': 28.88}


{'loss': 1.0537, 'grad_norm': 0.302936851978302, 'learning_rate': 0.001, 'epoch': 30.4}


{'loss': 1.0525, 'grad_norm': 0.29010626673698425, 'learning_rate': 0.001, 'epoch': 31.91}


{'loss': 1.0435, 'grad_norm': 0.32046446204185486, 'learning_rate': 0.001, 'epoch': 33.43}


{'loss': 1.0456, 'grad_norm': 0.3422679007053375, 'learning_rate': 0.001, 'epoch': 34.95}


{'loss': 1.0359, 'grad_norm': 0.32034948468208313, 'learning_rate': 0.001, 'epoch': 36.47}


{'loss': 1.0383, 'grad_norm': 0.32696086168289185, 'learning_rate': 0.001, 'epoch': 37.99}


{'loss': 1.0284, 'grad_norm': 0.29580581188201904, 'learning_rate': 0.001, 'epoch': 39.51}


{'loss': 1.0325, 'grad_norm': 0.3324735164642334, 'learning_rate': 0.001, 'epoch': 41.03}


{'loss': 1.0228, 'grad_norm': 0.3088604807853699, 'learning_rate': 0.001, 'epoch': 42.55}


{'loss': 1.0269, 'grad_norm': 0.3445807695388794, 'learning_rate': 0.001, 'epoch': 44.07}


{'loss': 1.0184, 'grad_norm': 0.33308324217796326, 'learning_rate': 0.001, 'epoch': 45.59}


{'loss': 1.0207, 'grad_norm': 0.3461936414241791, 'learning_rate': 0.001, 'epoch': 47.11}


{'loss': 1.0151, 'grad_norm': 0.32928311824798584, 'learning_rate': 0.001, 'epoch': 48.63}


{'loss': 1.0162, 'grad_norm': 0.3238476514816284, 'learning_rate': 0.001, 'epoch': 50.15}


{'loss': 1.0125, 'grad_norm': 0.3503732979297638, 'learning_rate': 0.001, 'epoch': 51.67}


{'loss': 1.0113, 'grad_norm': 0.32343536615371704, 'learning_rate': 0.001, 'epoch': 53.19}


{'loss': 1.0088, 'grad_norm': 0.3301333785057068, 'learning_rate': 0.001, 'epoch': 54.71}


{'loss': 1.0074, 'grad_norm': 0.3329049348831177, 'learning_rate': 0.001, 'epoch': 56.23}


{'loss': 1.0068, 'grad_norm': 0.34890225529670715, 'learning_rate': 0.001, 'epoch': 57.75}


{'loss': 1.0038, 'grad_norm': 0.3521020710468292, 'learning_rate': 0.001, 'epoch': 59.27}


{'loss': 1.0037, 'grad_norm': 0.35883283615112305, 'learning_rate': 0.001, 'epoch': 60.79}


{'loss': 1.0007, 'grad_norm': 0.33201485872268677, 'learning_rate': 0.001, 'epoch': 62.31}


{'loss': 1.0009, 'grad_norm': 0.3412777781486511, 'learning_rate': 0.001, 'epoch': 63.83}


{'loss': 0.9974, 'grad_norm': 0.36233896017074585, 'learning_rate': 0.001, 'epoch': 65.35}


{'loss': 1.0, 'grad_norm': 0.3302571475505829, 'learning_rate': 0.001, 'epoch': 66.87}


{'loss': 0.9936, 'grad_norm': 0.3609643578529358, 'learning_rate': 0.001, 'epoch': 68.39}


{'loss': 0.9981, 'grad_norm': 0.3421984910964966, 'learning_rate': 0.001, 'epoch': 69.91}


{'loss': 0.9923, 'grad_norm': 0.36456289887428284, 'learning_rate': 0.001, 'epoch': 71.43}


{'loss': 0.9971, 'grad_norm': 0.3589473068714142, 'learning_rate': 0.001, 'epoch': 72.95}


{'loss': 0.9887, 'grad_norm': 0.3569645583629608, 'learning_rate': 0.001, 'epoch': 74.47}


{'loss': 0.9944, 'grad_norm': 0.3648380935192108, 'learning_rate': 0.001, 'epoch': 75.99}


{'loss': 0.9863, 'grad_norm': 0.3525194525718689, 'learning_rate': 0.001, 'epoch': 77.51}


{'loss': 0.9931, 'grad_norm': 0.3629269301891327, 'learning_rate': 0.001, 'epoch': 79.03}


{'loss': 0.9868, 'grad_norm': 0.36118456721305847, 'learning_rate': 0.001, 'epoch': 80.55}


{'loss': 0.9888, 'grad_norm': 0.35341793298721313, 'learning_rate': 0.001, 'epoch': 82.07}


{'loss': 0.9837, 'grad_norm': 0.3631097674369812, 'learning_rate': 0.001, 'epoch': 83.59}


{'loss': 0.989, 'grad_norm': 0.3691718578338623, 'learning_rate': 0.001, 'epoch': 85.11}


{'loss': 0.9844, 'grad_norm': 0.3770564794540405, 'learning_rate': 0.001, 'epoch': 86.63}


{'loss': 0.9847, 'grad_norm': 0.3792206943035126, 'learning_rate': 0.001, 'epoch': 88.15}


{'loss': 0.982, 'grad_norm': 0.39887386560440063, 'learning_rate': 0.001, 'epoch': 89.67}


{'loss': 0.9846, 'grad_norm': 0.3566865921020508, 'learning_rate': 0.001, 'epoch': 91.19}


{'loss': 0.9827, 'grad_norm': 0.3833291232585907, 'learning_rate': 0.001, 'epoch': 92.71}


{'loss': 0.9818, 'grad_norm': 0.3814966082572937, 'learning_rate': 0.001, 'epoch': 94.22}


{'loss': 0.9812, 'grad_norm': 0.3674820363521576, 'learning_rate': 0.001, 'epoch': 95.74}


{'loss': 0.9806, 'grad_norm': 0.40586304664611816, 'learning_rate': 0.001, 'epoch': 97.26}


{'loss': 0.98, 'grad_norm': 0.4027002453804016, 'learning_rate': 0.001, 'epoch': 98.78}


{'train_runtime': 34130.4596, 'train_samples_per_second': 15.414, 'train_steps_per_second': 0.964, 'train_loss': 1.0518812769741999, 'epoch': 100.0}


TrainOutput(global_step=32900, training_loss=1.0518812769741999, metrics={'train_runtime': 34130.4596, 'train_samples_per_second': 15.414, 'train_steps_per_second': 0.964, 'total_flos': 1.731332873060352e+17, 'train_loss': 1.0518812769741999, 'epoch': 100.0})
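
The numbers in the training log can be cross-checked against the configuration. The sketch below derives steps per epoch, the effective batch size, and the spacing between log lines from values reported above; it assumes training ran on a single device, which the notebook does not state explicitly.

```python
total_steps = 32900        # global_step reported by TrainOutput
num_train_epochs = 100
per_device_batch = 8
grad_accum = 2
logging_steps = 500
train_runtime_s = 34130.4596

steps_per_epoch = total_steps / num_train_epochs
effective_batch = per_device_batch * grad_accum   # assumes a single device
sequences_per_epoch = steps_per_epoch * effective_batch
epochs_per_log = logging_steps / steps_per_epoch

print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"effective batch size: {effective_batch}")
print(f"packed sequences per epoch: {sequences_per_epoch:.0f}")
print(f"epochs between log lines: {epochs_per_log:.2f}")
print(f"runtime: {train_runtime_s / 3600:.1f} hours")
```

The roughly 1.52 epochs between log lines matches the `epoch` values in the log (1.52, 3.04, 4.56, and so on). The per-epoch sequence count is an upper bound, since the last batch of an epoch may be smaller, and because `packing=True` it counts packed sequences of up to 512 tokens rather than individual flashcards.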

In [46]:
trainer.evaluate()

  0%|          | 0/166 [00:00<?, ?it/s]

{'eval_runtime': 24.2226,
 'eval_samples_per_second': 54.784,
 'eval_steps_per_second': 6.853,
 'epoch': 100.0}

In [47]:
trainer.save_model()



#### Test Fine-tuned Model


In [48]:
ft_model = AutoModelForCausalLM.from_pretrained(output_dir).to(device)

response = generate_response(ft_model, tokenizer, example1["instruction"], device)

print_example(example1)
print_response(response)

Original Dataset Example:
Instruction: What thyroid imbalance is associated with anxiety?
Response: Hyperthyroidism presents with anxiety.
----------------------------------------------------------------------------------------------------
Model response:
Hyperthyroidism is associated with anxiety.
----------------------------------------------------------------------------------------------------


#### Test Many Responses


In [49]:
import random

for i in range(10):
    # randrange excludes the upper bound, so every valid index (including 0) can be drawn
    example = eval_dataset[random.randrange(len(eval_dataset))]
    test_response = generate_response(ft_model, tokenizer, example["instruction"], device)

    print("=======================", (i+1), "==========================")
    print_example(example)
    print_response(test_response)


Original Dataset Example:
Instruction: What does the presence of RBC casts/dysmorphic RBCs in the urine indicate in terms of the origin of hematuria/pyuria?
Response: The presence of RBC casts/dysmorphic RBCs in the urine indicates that hematuria/pyuria is of glomerular origin.
----------------------------------------------------------------------------------------------------
Model response:
The presence of RBC casts/dysmorphic RBCs in the urine indicate hematuria/pyuria.
----------------------------------------------------------------------------------------------------
Original Dataset Example:
Instruction: What is the typical platelet count for someone with Hemophilia (A, B, C)?
Response: Hemophilia (A, B, C) is a genetic disorder that affects the blood's ability to clot properly. While Hemophilia can cause a range of symptoms, such as excessive bleeding and bruising, it typically does not affect the number of platelets in the blood. Therefore, someone with Hemophilia (A, B, C) wou

Push the fine-tuned model to your Hugging Face account


In [51]:
import os

# Read the access token from the environment instead of hardcoding it in the notebook.
hf_access_token = os.environ.get("HF_TOKEN")
if hf_access_token:
    trainer.push_to_hub(token=hf_access_token)

No files have been modified since last commit. Skipping to prevent empty commit.
