# Fitting Giants: Practical Introduction to LoRA for Large Models 🚀

## Learning Objectives 🎯
- Understand the hardware requirements necessary to train large models.
- Install specific versions of dependencies to maintain consistency across training environments.
- Configure and execute training sessions for large-scale models using advanced settings.
- Explore techniques like LoRA, which fine-tune a model by training a small set of added parameters so that compute and memory costs stay manageable.

### Importing Libraries

In [1]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

cuda
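
To put the hardware requirements in rough numbers, the sketch below estimates the memory needed just to hold a model's weights at different precisions. The helper name `weight_memory_gb` and the 8-billion-parameter figure (inferred from the model name used later in this notebook) are illustrative assumptions. Real training needs additional memory for gradients, optimizer states, and activations.

```python
# Rough rule of thumb: weight memory = parameter count x bytes per parameter.
# This ignores gradients, optimizer states, activations, and framework overhead.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}

def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

num_params = 8_000_000_000  # assumption: roughly 8B parameters
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(num_params, dtype):.0f} GB")
```

Under these assumptions the weights alone take about 16 GB in bf16, which is why loading the model on a small GPU can fail before training even starts.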


## Library Installation 🛠️
Install the Axolotl library with its `flash-attn` and `deepspeed` extras. The command below installs the latest release from PyPI; to ensure that all participants use the same library version, pin a specific release or GitHub commit instead.

In [2]:
# !pip install --no-build-isolation axolotl[flash-attn,deepspeed]
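
To check that two environments really match, it can help to print the installed versions of the key packages. The sketch below uses the standard library's `importlib.metadata`; the helper name `installed_version` and the list of packages to check are assumptions for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

def installed_version(package: str):
    """Return the installed version string, or None if the package is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# Package names are examples; adjust the list to the environment being checked.
for package in ("axolotl", "torch", "transformers"):
    print(package, installed_version(package))
```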

## Configuration of Training Parameters 📝
Set up a YAML configuration for training large models. This setup will detail all necessary parameters, including the base model, dataset specifics, and advanced options like batch sizes and learning rates, tailored to handle the demands of large-scale model training.

In [3]:
import yaml

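# Raw string: keeps the \n escapes below intact for the YAML parser
# instead of letting Python turn them into literal line breaks.
train_config = r"""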
# model params
base_model: unsloth/Meta-Llama-3.1-8B-Instruct

# dataset params
datasets:
  - path: jaydenccc/AI_Storyteller_Dataset
    type:
      system_prompt: "You are an amazing storyteller. From the following synopsis, create an engaging story."
      field_system: system
      field_instruction: synopsis
      field_output: short_story
      format: "<|user|>\n {instruction} </s>\n<|assistant|>"
      no_input_format: "<|user|> {instruction} </s>\n<|assistant|>"

output_dir: ./models/Llama3_Storyteller2


# sequence and precision params
sequence_len: 512
bf16: auto
tf32: false

# training params
micro_batch_size: 1
num_epochs: 1
optimizer: adamw_bnb_8bit
learning_rate: 0.0002

logging_steps: 1


# LoRA
adapter: lora

lora_r: 16
lora_alpha: 16
lora_dropout: 0.05

lora_target_linear: true

# Gradient Accumulation
gradient_accumulation_steps: 1

# Gradient Checkpointing
gradient_checkpointing: true
"""

# Convert the YAML string to a Python dictionary
yaml_dict = yaml.safe_load(train_config)


# Write the YAML file
with open("advanced_train.yml", 'w') as file:
    yaml.dump(yaml_dict, file)
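
Two quantities implied by this configuration are worth computing by hand. The sketch below derives the effective batch size (samples per optimizer step) and the LoRA scaling factor `lora_alpha / lora_r` from the values in the YAML. The `num_gpus` value and the variable names are assumptions for illustration; the scaling formula is the standard LoRA convention.

```python
# Values copied from the YAML configuration; num_gpus is an assumption.
config = {
    "micro_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "lora_r": 16,
    "lora_alpha": 16,
}
num_gpus = 1  # assumption: single-GPU training

# Number of samples that contribute to each optimizer step.
effective_batch_size = (
    config["micro_batch_size"] * config["gradient_accumulation_steps"] * num_gpus
)

# Standard LoRA scales the low-rank update by alpha / r.
lora_scaling = config["lora_alpha"] / config["lora_r"]

print(effective_batch_size)  # 1
print(lora_scaling)          # 1.0
```

Raising `gradient_accumulation_steps` increases the effective batch size without raising the number of samples held in memory per forward pass.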


## Launching the Training Session 🚀
Start training with an `accelerate launch` command that points Axolotl at the configuration file. Even with LoRA, training a model of this size uses significant GPU memory and time.

With `adapter: lora`, Axolotl trains only the small low-rank adapter matrices rather than all of the model's parameters. To obtain a standalone model, the trained adapter weights need to be merged back into the base model, which is what `axolotl.cli.merge_lora` does.
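
To make the idea of training only small matrices concrete, the sketch below counts trainable parameters for a single linear layer and then performs a toy merge. LoRA freezes a weight matrix `W` of shape `d_out x d_in` and trains two factors, `B` (`d_out x r`) and `A` (`r x d_in`); merging computes `W + (alpha / r) * B @ A`. The 4096 x 4096 layer size, the toy matrix values, and the helper names are assumptions for illustration, and this is a conceptual sketch rather than Axolotl's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_param_count(d_in, d_out, r):
    """Trainable parameters in B (d_out x r) plus A (r x d_in)."""
    return r * (d_in + d_out)

def merge_adapter(w, b, a, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged weight matrix."""
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_val + scale * d_val for w_val, d_val in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]

# Parameter savings for one assumed 4096 x 4096 layer at rank 16.
full = 4096 * 4096
lora = lora_param_count(4096, 4096, 16)
print(full, lora, f"{100 * lora / full:.2f}%")  # 16777216 131072 0.78%

# Toy merge: 2 x 2 frozen weight with a rank-1 adapter (made-up values).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]  # d_out x r
A = [[0.5, 0.5]]    # r x d_in
merged = merge_adapter(W, B, A, alpha=1, r=1)
print(merged)  # [[1.5, 0.5], [1.0, 2.0]]
```

After merging, the model has the same shape as the base model, so it can be loaded and served without any adapter-specific code.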

In [4]:
# !accelerate launch -m axolotl.cli.train advanced_train.yml
# Optional: Merge the trained adapter
# !accelerate launch -m axolotl.cli.merge_lora advanced_train.yml

# Note: Llama 3 is larger than Llama 2, so this training is run in Colab.

## Initializing Text Generation Pipeline 🚀
Set up a text generation pipeline with the fine-tuned storyteller model, loaded from the Hugging Face Hub in `bfloat16`. The pipeline wraps the tokenizer and model so that narrative text can be generated with a single call.

In [10]:
from transformers import pipeline
# pipe = pipeline("text-generation", model="TheFuzzyScientist/Llama3_Storyteller", torch_dtype=torch.bfloat16, device_map="auto")
# Load the model on the CPU because the available GPU has too little memory
pipe = pipeline("text-generation", model="TheFuzzyScientist/Llama3_Storyteller", torch_dtype=torch.bfloat16, device_map="cpu")

Device set to use cpu
Error during conversion: ChunkedEncodingError(ProtocolError('Response ended prematurely'))


## Preparing and Generating Text 📝
Prepare a prompt for text generation using custom messages tailored to test the storytelling capabilities of the model. Generate text based on this prompt to evaluate the model's creative output and the effectiveness of LoRA adapters.

In [11]:
messages = [
    {"role":"system", "content": "You are an amazing storyteller. From the following synopsis, create an engaging story."},
    {"role": "user", "content": "A bright student was working with The Fuzzy Scientist on a project."},
]


In [12]:
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
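
The chat template turns the list of messages into a single string containing the special tokens the model expects. The sketch below is a simplified, hand-written approximation of the Llama 3 layout visible in the generated output later in this notebook. `format_llama3_chat` is a hypothetical helper, and the tokenizer's real template also injects extra lines (such as the date header), so `apply_chat_template` remains the right call in practice.

```python
def format_llama3_chat(messages, add_generation_prompt=True):
    """Approximate the Llama 3 chat layout (simplified, for illustration only)."""
    text = "<|begin_of_text|>"
    for message in messages:
        text += f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
        text += f"{message['content']}<|eot_id|>"
    if add_generation_prompt:
        # An open assistant header cues the model to write the reply.
        text += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return text

example = [
    {"role": "system", "content": "You are an amazing storyteller."},
    {"role": "user", "content": "A bright student was working on a project."},
]
print(format_llama3_chat(example))
```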

## Reviewing Generated Output 🕵️‍♂️


Generate up to 128 new tokens and inspect the result to assess how well the LoRA fine-tuned model handles the storytelling task. The printed text includes the prompt itself, and the story is cut off once the token limit is reached.

In [13]:
outputs = pipe(prompt, max_new_tokens=128)

print(outputs[0]["generated_text"])

<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 26 Jul 2024

You are an amazing storyteller. From the following synopsis, create an engaging story.<|eot_id|><|start_header_id|>user<|end_header_id|>

A bright student was working with The Fuzzy Scientist on a project.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

It was a typical day in the small, cluttered laboratory of The Fuzzy Scientist, a brilliant and eccentric inventor known for his unorthodox approach to science. The room was filled with strange contraptions, beakers filled with bubbling liquids, and an assortment of gadgets that defied explanation. Amidst the chaos, a bright and curious student named Emma sat at a workbench, surrounded by papers and notes, working on a project with The Fuzzy Scientist.

Emma had always been fascinated by science and had been lucky enough to land an internship with The Fuzzy Scientist, who was renowned for his groundbreaking
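
Because the pipeline returns the prompt followed by the continuation, the printed text starts with the full chat template. The sketch below shows one way to isolate just the story by slicing off the prompt prefix; `strip_prompt` and the sample strings are hypothetical stand-ins. Alternatively, the text-generation pipeline accepts `return_full_text=False` to return only the newly generated text.

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the newly generated text, dropping the echoed prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):]
    return generated_text  # prompt was not echoed; nothing to strip

# Hypothetical stand-ins for `prompt` and `outputs[0]["generated_text"]`.
sample_prompt = "<|start_header_id|>assistant<|end_header_id|>\n\n"
sample_output = sample_prompt + "It was a typical day in the laboratory."
print(strip_prompt(sample_output, sample_prompt))
```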