# Working with `Llama-3.2-1B`

### Initialization of the Model Pipeline
*A pipeline is a high-level abstraction that makes working with models easier.*

- A pipeline is initialized for `"text-generation"` using the Hugging Face Transformers library.
- The model is specified via `model_id = "meta-llama/Llama-3.2-1B"`.

#### What happens during initialization:
- Downloads the model weights if they are not already cached locally.
- Loads the matching tokenizer for the model.
- Configures device placement and precision:
  - `device_map="auto"` assigns the model to a GPU (or other accelerator) automatically if one is available.
  - `torch_dtype=torch.bfloat16` loads the weights in 16-bit bfloat16 precision.

From here, the `pipe` object is the interface for interacting with the `Llama-3.2-1B` model for text generation tasks.




In [1]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForCausalLM, TrainingArguments, Trainer, TrainerCallback
import numpy as np
import evaluate


model_id = "meta-llama/Llama-3.2-1B"

pipe = pipeline(
    "text-generation", 
    model=model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)


  from .autonotebook import tqdm as notebook_tqdm
Device set to use mps
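
To make the effect of `torch_dtype=torch.bfloat16` concrete, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. The parameter count is an approximation assumed for illustration (roughly 1.2 billion), and activations and other runtime overhead are ignored.

```python
# Rough weight-memory estimate. APPROX_PARAMS is an assumed, approximate figure.
APPROX_PARAMS = 1.24e9

# Bytes used to store one parameter in each precision.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}

def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gib(APPROX_PARAMS, dtype):.2f} GiB")
```

Under these assumptions, bfloat16 needs half the weight memory of float32, which matters on laptops and small GPUs.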


### Running text generation
The string passed to `pipe(...)` serves as the prompt to the model.

#### This pipeline will:
1. Tokenize the input prompt.
2. Run the tokens through the model to generate output based on the model's learned parameters.
3. Decode the model's output back into human-readable text.
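
The three steps above can be sketched with toy stand-ins. Everything here is hypothetical (a seven-word vocabulary and a lookup table in place of the neural network); it only illustrates the flow of data, not how the real tokenizer or model work.

```python
# Toy stand-ins for the three stages; not the real Llama tokenizer or model.
VOCAB = ["<eos>", "the", "key", "to", "life", "is", "balance"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}

def tokenize(text: str) -> list[int]:
    """Step 1: map whitespace-separated words to integer ids."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]

def toy_model(ids: list[int]) -> int:
    """Step 2: pick the next token id (a fixed lookup here, not a neural net)."""
    next_token = {"is": "balance", "balance": "<eos>"}
    return TOKEN_TO_ID[next_token.get(VOCAB[ids[-1]], "<eos>")]

def decode(ids: list[int]) -> str:
    """Step 3: map ids back to text, dropping the end-of-sequence token."""
    return " ".join(VOCAB[i] for i in ids if VOCAB[i] != "<eos>")

def generate(prompt: str, max_new_tokens: int = 5) -> str:
    ids = tokenize(prompt)
    for _ in range(max_new_tokens):
        next_id = toy_model(ids)
        ids.append(next_id)
        if VOCAB[next_id] == "<eos>":  # stop once the model emits end-of-sequence
            break
    return decode(ids)

print(generate("The key to life is"))
```

As with the real pipeline, the returned text contains the prompt followed by the generated continuation.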


In [2]:
pipe("Write a descriptive paragraph about lavish party in West Egg.")

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


[{'generated_text': 'Write a descriptive paragraph about lavish party in West Egg. 1. Describe the party.\nWrite a descriptive paragraph about lavish party in West Egg. 1'}]

---
# Fine-tuning the model

### Loading the dataset

In this case, we are using *The Great Gatsby* to train the model.

In [None]:
from datasets import load_dataset

# Load the dataset from Hugging Face (The Great Gatsby as plain text)
ds = load_dataset("TeacherPuffy/book")

# Print the record at index 100 of the "train" split (each record is one line of the book)
print(ds["train"][100])


### Tokenization
We use the tokenizer to convert the text into token IDs, padding or truncating every example to a fixed length of 128 tokens so that sequences of varying length can be batched together. The `map` method applies the preprocessing function to every example in the dataset.

In [None]:
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id  # Ensure pad_token_id is set

def tokenize_function(examples):
    outputs = tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)
    outputs["labels"] = outputs["input_ids"].copy()  # Set labels to be identical to input_ids
    return outputs

tokenized_datasets = ds.map(tokenize_function)
print(tokenized_datasets)
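
To illustrate what `padding="max_length"` and `truncation=True` do to a single example, here is a minimal sketch using plain lists. The pad id of 0 and right-side padding are assumptions for illustration. The second helper shows a common variant, which the cell above does not use, where padded positions in the labels are set to -100 (the default `ignore_index` of PyTorch's cross-entropy loss) so they do not contribute to the loss.

```python
PAD_ID = 0           # hypothetical pad id, for illustration only
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss by default

def pad_or_truncate(ids, max_length):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    ids = ids[:max_length]                         # truncation: drop tokens past the limit
    n_pad = max_length - len(ids)
    input_ids = ids + [PAD_ID] * n_pad             # padding: fill up to max_length
    attention_mask = [1] * len(ids) + [0] * n_pad  # 1 = real token, 0 = padding
    return input_ids, attention_mask

def mask_padding_in_labels(input_ids, attention_mask):
    """Variant: copy input_ids as labels, but exclude padded positions from the loss."""
    return [tok if keep else IGNORE_INDEX for tok, keep in zip(input_ids, attention_mask)]

short_ids, short_mask = pad_or_truncate([11, 12, 13], max_length=5)
long_ids, long_mask = pad_or_truncate([11, 12, 13, 14, 15, 16, 17], max_length=5)
labels = mask_padding_in_labels(short_ids, short_mask)

print(short_ids, short_mask)
print(long_ids, long_mask)
print(labels)
```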



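Then, to prepare for training, remove the columns that the Hugging Face `Trainer` does not expect and set the output format.
Here, the `text` column is removed, keeping `input_ids`, `attention_mask`, and `labels`, and the dataset is set to return PyTorch tensors.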

In [None]:
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.with_format("torch")
print(tokenized_datasets["train"])


---
# Training the model with the Hugging Face Trainer

In [None]:
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
model.config.pad_token_id = tokenizer.eos_token_id  # Update model configuration
model.resize_token_embeddings(len(tokenizer))

# Contains all hyperparameters
training_args = TrainingArguments(output_dir="test_trainer", num_train_epochs=2)

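# Accuracy metric; compute_metrics is only called when the Trainer runs evaluation
# (no eval_dataset is passed below, so it is not used during this training run)
metric = evaluate.load("accuracy")

# Calculates next-token accuracy of the predictions
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Logits at position i predict the token at position i + 1, so shift before comparing,
    # then flatten because the accuracy metric expects 1-D inputs
    predictions = np.argmax(logits, axis=-1)[:, :-1].reshape(-1)
    references = labels[:, 1:].reshape(-1)
    return metric.compute(predictions=predictions, references=references)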

# Define a callback that prints the most recently logged training loss at the end of each epoch
class LogEpochLossCallback(TrainerCallback):
    def on_epoch_end(self, args, state, control, **kwargs):
        # Filter log history for entries with a loss value
        loss_logs = [log for log in state.log_history if "loss" in log]
        if loss_logs:
            last_log = loss_logs[-1]
            print(f"Epoch {state.epoch:.2f} ended with loss: {last_log['loss']:.4f}")

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    compute_metrics=compute_metrics,
)

trainer.add_callback(LogEpochLossCallback)
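
In a causal language model, the logits at position `i` score the token at position `i + 1`, so a token-level accuracy metric typically compares shifted predictions and labels. Here is a minimal sketch of that comparison on a single toy sequence, using plain lists; the numbers are made up for illustration.

```python
def argmax(row):
    """Index of the largest value in a list."""
    return max(range(len(row)), key=row.__getitem__)

def next_token_accuracy(logits, labels):
    """logits: [seq_len][vocab_size] scores; labels: [seq_len] token ids."""
    predictions = [argmax(row) for row in logits[:-1]]  # last position has no target
    targets = labels[1:]                                # first token is never predicted
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)

# Toy example: vocabulary of 3 tokens, sequence of 4 tokens.
labels = [0, 2, 1, 1]
logits = [
    [0.1, 0.2, 0.9],  # predicts 2; target is labels[1] = 2 (correct)
    [0.1, 0.8, 0.3],  # predicts 1; target is labels[2] = 1 (correct)
    [0.7, 0.2, 0.1],  # predicts 0; target is labels[3] = 1 (wrong)
    [0.3, 0.3, 0.4],  # no target for the final position
]
print(next_token_accuracy(logits, labels))
```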



### Launch training:

In [None]:
trainer.train()

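### Training metrics after running 5 epochs:

These logs come from a 5-epoch run, so `num_train_epochs` was presumably set to 5 for it rather than the 2 shown in the `TrainingArguments` above. Each `Epoch N ended with loss` line is printed by the callback and repeats the most recently logged loss, which is the value from the previous epoch's log entry.

#### For epochs 1–5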
{'loss': 1.8386, 'grad_norm': 9.5, 'learning_rate': 4.0714285714285717e-05, 'epoch': 1.0}

Epoch 2.00 ended with loss: 1.8386
{'loss': 0.4671, 'grad_norm': 7.4375, 'learning_rate': 3.071428571428572e-05, 'epoch': 2.0}

Epoch 3.00 ended with loss: 0.4671
{'loss': 0.1653, 'grad_norm': 3.921875, 'learning_rate': 2.0714285714285718e-05, 'epoch': 3.0}

Epoch 4.00 ended with loss: 0.1653
{'loss': 0.0639, 'grad_norm': 3.328125, 'learning_rate': 1.0714285714285714e-05, 'epoch': 4.0}

Epoch 5.00 ended with loss: 0.0639
{'loss': 0.0457, 'grad_norm': 5.375, 'learning_rate': 7.142857142857143e-07, 'epoch': 5.0}

#### Total
{'train_runtime': 23.8242, 'train_samples_per_second': 23.296, 'train_steps_per_second': 2.938, 'train_loss': 0.5161191165447235, 'epoch': 5.0}
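
The summary above can be cross-checked with a little arithmetic. The sketch below recovers the dataset size and step counts from the logged throughput, then checks the logged learning rates against a linear decay. The batch size of 8, the initial learning rate of 5e-5, and the warmup-free linear schedule are assumptions (they are not stated in the logs).

```python
import math

# Values copied from the training summary and per-epoch logs.
summary = {"train_runtime": 23.8242, "train_samples_per_second": 23.296,
           "train_steps_per_second": 2.938, "epoch": 5.0}
logged_lrs = {1: 4.0714285714285717e-05, 2: 3.071428571428572e-05,
              3: 2.0714285714285718e-05, 4: 1.0714285714285714e-05,
              5: 7.142857142857143e-07}

epochs = int(summary["epoch"])
samples_seen = summary["train_samples_per_second"] * summary["train_runtime"]
total_steps = round(summary["train_steps_per_second"] * summary["train_runtime"])
examples_per_epoch = round(samples_seen / epochs)
steps_per_epoch = total_steps // epochs

# Assumptions, not stated in the logs: batch size 8, initial learning rate 5e-5,
# linear decay to zero with no warmup.
ASSUMED_BATCH_SIZE = 8
ASSUMED_INITIAL_LR = 5e-5
expected_steps_per_epoch = math.ceil(examples_per_epoch / ASSUMED_BATCH_SIZE)

def linear_lr(step):
    """Value that reproduces the logged learning_rate at a 1-based step under the assumed schedule."""
    return ASSUMED_INITIAL_LR * (total_steps - (step - 1)) / total_steps

print(f"examples per epoch: {examples_per_epoch}")
print(f"steps per epoch: {steps_per_epoch} (expected {expected_steps_per_epoch})")
for epoch, lr in logged_lrs.items():
    matches = math.isclose(linear_lr(epoch * steps_per_epoch), lr, rel_tol=1e-9)
    print(f"epoch {epoch}: logged lr {lr:.4e}, matches assumed schedule: {matches}")
```

The recovered figure of 111 examples per epoch agrees with the dataset size cited in the discussion at the end of this notebook.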


### Querying the fine-tuned model, downloaded locally from the Chimera Cluster

The sampling parameters passed to the `generate()` method are as follows:
- `temperature=0.7` controls the randomness of token sampling.
  - Values closer to 0 make the output more deterministic and conservative (approaching greedy decoding).
  - Values closer to 1 increase the randomness; at exactly 1 the model's probabilities are used unchanged.
- Top-k sampling, `top_k=50`, limits token sampling to the 50 most likely candidates at each step. Narrowing this pool of choices can prevent the model from choosing tokens with extremely low probabilities.

- Nucleus sampling, `top_p=0.9`, considers only the smallest set of tokens whose cumulative probability reaches 0.9. This lets the size of the candidate pool adapt to the shape of the probability distribution.

- `num_return_sequences=1` specifies that the generation should return only one output sequence.

- `do_sample=True` enables sampling instead of pure greedy (deterministic) decoding. This adds diversity to the output instead of always choosing the highest-probability token. Without it, `temperature`, `top_k`, and `top_p` have no effect.
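
To show how these parameters interact, here is a simplified sketch of temperature scaling followed by top-k and top-p filtering on a toy set of logits. It is an illustration under stated assumptions, not the library's implementation: the function names and logits are hypothetical, and real implementations differ in details such as tie handling.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def filter_top_k_top_p(probs, top_k, top_p):
    """Return {token_id: renormalized prob} after top-k, then top-p, filtering."""
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    mass = sum(p for _, p in ranked)  # renormalize after the top-k cut
    kept, cumulative = [], 0.0
    for token_id, p in ranked:
        kept.append((token_id, p))
        cumulative += p / mass
        if cumulative >= top_p:  # smallest set whose mass reaches top_p
            break
    total = sum(p for _, p in kept)
    return {token_id: p / total for token_id, p in kept}

def sample(logits, temperature=0.7, top_k=50, top_p=0.9, rng=random):
    """Draw one token id from the filtered, renormalized distribution."""
    candidates = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    ids = list(candidates)
    return rng.choices(ids, weights=[candidates[i] for i in ids], k=1)[0]

toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical scores for a 5-token vocabulary
print(filter_top_k_top_p(softmax(toy_logits), top_k=50, top_p=0.9))
print(sample(toy_logits, rng=random.Random(0)))
```

With these toy logits, the two lowest-scoring tokens are removed by the top-p cut before sampling, so they can never be chosen.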

In [None]:
model_id = "great_gatsby_llm"
tokenizer = AutoTokenizer.from_pretrained(model_id)

GG_model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

input_prompt = "In the Great Gatsby, the name of the narrator is"

encoded_input = tokenizer.encode_plus(
    input_prompt,
    return_tensors="pt",
    padding="longest",
    truncation=True
)

input_ids = encoded_input["input_ids"] 
attention_mask = encoded_input["attention_mask"]
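# Pass the attention mask and enable sampling so temperature, top_k, and top_p take effect
generated_ids = GG_model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_length=100,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    num_return_sequences=1,
)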
answer = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(answer)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


The names of the characters the story are based on are fictitious. They are not intended to represent any particular person or group of people. The story is told from the point of view of a young man who goes by that name.  


### Wild Outputs

**Input:**
> In the Great Gatsby, the name of the narrator is

**Output:**
> In the Great Gatsby, the name of the narrator is Gatsby. This is a most superficial tag to express the wide range of my interests and the narrow focus of my attention. I am not even remotely like Gatsby—indeed, I am not sure I have ever heard of a Gatsby. I am not even remotely like this man with that name who wrote the novel that bears my name. I am not even remotely like the man who gave his name to this book. I

**Input:**
> Write a descriptive paragraph about a lavish party in West Egg.

**Output, temp=0.7 (the model extended the prompt):**
> Write a descriptive paragraph about a lavish party in West Egg. Include the following: the season, the date, and the weather. Then describe the festivities, beginning with the arrival of the guests and ending with the departure of the last guest. Use active verbs to express the physical movements of the people and the objects they are moving. For instance, use "she sat down" instead of "she sat."


---
These outputs are most likely due to factors such as:

- **Extremely small and narrow dataset**: The model was fine-tuned on only 111 lines from the book, potentially causing severe overfitting.
- **Truncated context windows**: The maximum length in the tokenization function was set to only 128 tokens.
- **Catastrophic forgetting**: Training on our limited dataset could have altered the preexisting general knowledge of the pretrained model.
- **Lack of QA fine-tuning**: The base model was not trained to answer questions, so it tends to continue a prompt rather than respond to it.
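
One way to read the logged losses as a sign of overfitting is to convert them to perplexity, the exponential of the mean per-token cross-entropy. This sketch uses the per-epoch training losses reported above. It assumes the loss is measured in nats, and the values may be flattered by padded positions, since the labels were a plain copy of `input_ids`.

```python
import math

# Per-epoch training losses copied from the logs above.
epoch_losses = {1: 1.8386, 2: 0.4671, 3: 0.1653, 4: 0.0639, 5: 0.0457}

def perplexity(mean_cross_entropy: float) -> float:
    """Perplexity is exp(mean per-token cross-entropy), assuming the loss is in nats."""
    return math.exp(mean_cross_entropy)

for epoch, loss in epoch_losses.items():
    print(f"epoch {epoch}: loss={loss:.4f}  perplexity={perplexity(loss):.3f}")
```

A perplexity close to 1 means the model is almost certain about every next token in the training text, which on such a small dataset is more consistent with memorization than with generalization.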