# Practice Fundamentals: Most Basic Form of Training LLMs 💪

## Learning Objectives 🎯
- Set up the development environment to utilize GPU resources.
- Understand and install specific library versions directly from a repository.
- Become familiar with the YAML configuration used for training setups.
- Execute a basic training session for a language model using the Axolotl library.

### Importing Libraries

In [1]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(device)

cuda
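
Beyond the device string, it can help to know which GPU was assigned and whether it supports bfloat16, since the training config later sets `bf16: auto`. The following is a minimal sketch. The helper name `describe_device` is hypothetical, and the `torch` import is guarded only so that the snippet also runs on a machine without PyTorch.

```python
import importlib.util


def describe_device() -> str:
    """Return "cuda" or "cpu" and print a few details about the GPU, if any."""
    # Guarded import: fall back to CPU when PyTorch is not installed.
    if importlib.util.find_spec("torch") is None:
        print("PyTorch is not installed; assuming CPU.")
        return "cpu"

    import torch

    if not torch.cuda.is_available():
        print("No CUDA device visible; using CPU.")
        return "cpu"

    # Name of the first visible GPU and whether it can run bfloat16 kernels.
    print("GPU:", torch.cuda.get_device_name(0))
    print("bf16 supported:", torch.cuda.is_bf16_supported())
    return "cuda"


device = describe_device()
```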


## Library Installation 🛠️
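Install Axolotl together with its `flash-attn` and `deepspeed` extras. As written, the command below installs the released package with pip; to match a specific course version, point pip at the corresponding revision of the GitHub repository instead. The line is commented out so the notebook can be re-run without reinstalling; uncomment it in a fresh environment.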

In [2]:
# !pip install --no-build-isolation axolotl[flash-attn,deepspeed]
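
After installing, it is worth confirming which versions actually ended up in the environment, since version mismatches are a common source of training errors. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical, and the package list is just an example.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Example package names; adjust the list to whatever the environment needs.
for name in ("axolotl", "accelerate", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```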

In [7]:
import yaml

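# Raw string: keep "\n" as an escape for the YAML parser. In a regular string,
# Python would turn it into a real line break, which YAML folds into a space.
train_config = r"""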
# Model parameters
# Model name
base_model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
# Model type
model_type: LlamaForCausalLM
# Tokenizer
tokenizer_type: LlamaTokenizer


# Dataset parameters
datasets:
    - path: jaydenccc/AI_Storyteller_Dataset
      # How to format each row of this dataset
      type:
          system_prompt: ""
          field_system: system
          # field_instruction: the column used as the model's input
          field_instruction: synopsis
          # field_output: the column used as the target output
          field_output: short_story
          # Prompt template wrapped around the instruction
          format: "<|user|>\n {instruction} </s>\n<|assistant|>"
          # Template used when a row has no input column, so that the prompt
          # keeps the same structure
          no_input_format: "<|user|> {instruction} </s>\n<|assistant|>"


# Directory where the final trained model is saved
output_dir: ./models/Tiny_Llama_Storyteller

# Sequence and precision settings
# Axolotl names the maximum sequence length option `sequence_len`
sequence_len: 1024
bf16: auto
tf32: false

# Training Parameters
batch_size: 4
micro_batch_size: 2
num_epochs: 4
# Optimizer, selected by name from those Axolotl supports (8-bit AdamW from bitsandbytes)
optimizer: adamw_bnb_8bit
learning_rate: 0.0002

logging_steps: 1
"""


# Convert the YAML string to a Python dictionary
yaml_dict = yaml.safe_load(train_config)


# Write the YAML file
with open("basic_train.yml", 'w') as file:
    yaml.dump(yaml_dict, file)
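
To make the `format` entry in the config concrete, the sketch below fills the template for a single row and appends the target story. It is a simplified illustration: the function name `build_training_text` is hypothetical, the example row is made up, and the exact concatenation and tokenization Axolotl performs internally may differ.

```python
# Template from the config, with "\n" meaning a real newline.
PROMPT_FORMAT = "<|user|>\n {instruction} </s>\n<|assistant|>"


def build_training_text(row: dict) -> str:
    """Fill the prompt template with the synopsis and append the story."""
    prompt = PROMPT_FORMAT.format(instruction=row["synopsis"])
    return prompt + row["short_story"]


# Made-up row for illustration; not taken from the actual dataset.
example_row = {
    "synopsis": "A lighthouse keeper finds a message in a bottle.",
    "short_story": "The bottle washed up at dawn...",
}
text = build_training_text(example_row)
print(text)
```

The column names `synopsis` and `short_story` match `field_instruction` and `field_output` in the config.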


## Training Launch 🚀
Launch training with `accelerate launch`, which runs Axolotl's training entry point on the config file written above. The run uses a 1.1B-parameter model with an 8-bit optimizer, a combination small enough to try on free-tier GPUs such as those in Colab.

In [8]:
# !accelerate launch -m axolotl.cli.train basic_train.yml
# This run was done in Colab.
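
The config sets `batch_size: 4` and `micro_batch_size: 2`. A sketch of the arithmetic these two values imply, assuming a single GPU and the usual relationship in which the effective batch is built up from several micro-batches through gradient accumulation (the function name is hypothetical):

```python
def gradient_accumulation_steps(batch_size: int, micro_batch_size: int) -> int:
    """Number of micro-batches accumulated before each optimizer step."""
    if batch_size % micro_batch_size != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size")
    return batch_size // micro_batch_size


steps = gradient_accumulation_steps(batch_size=4, micro_batch_size=2)
print(steps)  # 2
```

Under that assumption, each optimizer step processes two forward and backward passes of two examples each.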

## Loading the Trained Model from Hugging Face
If you trained locally, pass the local model path (the `output_dir` from the config) instead of the Hub model ID.

In [17]:
from transformers import pipeline

pipe = pipeline("text-generation", model="Arivukkarasu/Tiny_Llama_Storyteller", torch_dtype=torch.bfloat16, device_map="auto")

# We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
messages = [
    {"role": "user", "content": "A Man who was a gangster, now living a regular life with his family but his past ememies still want him dead"},
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt

Some parameters are on the meta device because they were offloaded to the cpu.
Device set to use cuda:0


'<|user|>\nA Man who was a gangster, now living a regular life with his family but his past ememies still want him dead</s>\n<|assistant|>\n'
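
The prompt string above shows what the chat template produces for a single user message. The sketch below is a hand-written approximation of that behavior, inferred only from the output shown here; the real template is stored with the tokenizer and may treat other roles, such as system messages, differently. The function name `render_chat` is hypothetical.

```python
def render_chat(messages, add_generation_prompt=True, eos_token="</s>"):
    """Approximate the observed prompt layout: role tag, content, EOS, newline."""
    parts = []
    for message in messages:
        parts.append(f"<|{message['role']}|>\n{message['content']}{eos_token}\n")
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|assistant|>\n")
    return "".join(parts)


rendered = render_chat([{"role": "user", "content": "Hello"}])
print(repr(rendered))  # '<|user|>\nHello</s>\n<|assistant|>\n'
```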

In [19]:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Arivukkarasu/Tiny_Llama_Storyteller")

In [26]:
tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors='pt').shape

torch.Size([1, 41])
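
The shape above says the prompt occupies 41 tokens. That count matters when choosing `max_new_tokens`, because prompt and generated tokens share the model's context window. A minimal sketch of the budget calculation, assuming a 2048-token context window for this model (verify with `model.config.max_position_embeddings`); the function name is hypothetical:

```python
def max_new_tokens_budget(prompt_tokens: int, context_window: int = 2048) -> int:
    """Tokens left for generation once the prompt has been accounted for."""
    return max(context_window - prompt_tokens, 0)


budget = max_new_tokens_budget(prompt_tokens=41)
print(budget)  # 2007
```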

In [27]:
# outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
# For now, keep generation simple and use the default decoding settings
outputs = pipe(prompt, max_new_tokens=2048)
print(outputs[0]["generated_text"])

<|user|>
A Man who was a gangster, now living a regular life with his family but his past ememies still want him dead</s>
<|assistant|>
Emily had always been a silent and reserved child, raised by her mother to be a sharpshooter for the family. But when her mother passed away, her life took a toll. She began to wake up in a trudder, unable to sleep. And his mind would wander during the day. Day by day, she found herself overwhelmed by the sheer scale of what she had uncovered. She had no room for error, and she had to be vigilant at all times.

One day, while staking out a local park, Jake saw a suspicious figure lurking in the shadows. She approached cautiously, her hand already on her firearm. As she got closer, she noticed the figure was a young woman, hunching over a piece of paper. The woman was afraid of approaching the cautiously noted man, but she knew she had to do it to stay alive.

With her hand on her firearm, Jake took her message directly to the people, holding rallies an
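
The generated text above echoes the prompt before the story. When only the model's reply is needed, the prompt prefix can be removed afterwards. A minimal sketch with made-up strings; the helper name `strip_prompt` is hypothetical:

```python
def strip_prompt(generated_text: str, prompt: str) -> str:
    """Return only the continuation if the text starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    return generated_text


# Made-up strings for illustration.
example_prompt = "<|user|>\nHi</s>\n<|assistant|>\n"
example_output = example_prompt + "Once upon a time"
print(strip_prompt(example_output, example_prompt))  # Once upon a time
```

The text-generation pipeline also accepts `return_full_text=False`, which should return only the newly generated text without this post-processing.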