# Working with `Llama-3.2-1B`

### Initialization of the Model Pipeline
*A pipeline is a high-level abstraction that makes working with models easier.*

- A pipeline is initialized for the `"text-generation"` task using the Hugging Face Transformers library.
- The model is specified via `model_id = "meta-llama/Llama-3.2-1B"`.

#### What happens during initialization:
- Downloads the model weights if they are not already cached locally.
- Loads the matching tokenizer for the model.
- Configures device placement and precision:
  - `device_map="auto"` places the model on an accelerator if one is available (here `mps`, per the output below)
  - `torch_dtype=torch.bfloat16` loads the weights in 16-bit bfloat16 precision

From here, the `pipe` object is the interface for interacting with the `Llama-3.2-1B` model on text generation tasks.
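To give a feel for why `torch_dtype=torch.bfloat16` matters, here is a rough back-of-the-envelope estimate of the memory needed for the weights alone. The parameter count used below (about 1.2 billion) is an assumption for illustration, not a figure from this notebook, and the estimate ignores activations and other runtime overhead.

```python
# Rough estimate of the memory needed to hold model weights at two precisions.
# ASSUMPTION: ~1.2e9 parameters is an approximate, illustrative count.

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


approx_params = 1_200_000_000  # assumed, not measured

for dtype in ("float32", "bfloat16"):
    print(f"{dtype}: ~{weight_memory_gib(approx_params, dtype):.1f} GiB")
```

Under these assumptions, bfloat16 needs half the memory of float32, which is often the difference between fitting on a laptop accelerator and not.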




In [8]:
import torch
from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
import numpy as np
import evaluate


model_id = "meta-llama/Llama-3.2-1B"

pipe = pipeline(
    "text-generation", 
    model=model_id, 
    torch_dtype=torch.bfloat16, 
    device_map="auto"
)


Device set to use mps


### Running text generation
Calling `pipe("The key to life is")` passes the string `"The key to life is"` to the model as the prompt.

#### This pipeline will:
1. Tokenize the input prompt
2. Run the tokens through the model, which generates output based on its learned parameters
3. Decode the model's output back into human-readable text
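
The three steps above can be sketched with a toy stand-in. Everything here is hypothetical: the whitespace tokenizer, the seven-entry vocabulary, and the lookup-table "model" are made up for illustration and bear no relation to how Llama tokenizes or predicts. The point is only the encode, generate, decode flow.

```python
# Toy illustration of the tokenize -> generate -> decode flow.
# ASSUMPTION: the vocabulary and next-token table are invented for this sketch.

VOCAB = ["<eos>", "the", "key", "to", "life", "is", "love"]
TOKEN_TO_ID = {tok: i for i, tok in enumerate(VOCAB)}
EOS_ID = TOKEN_TO_ID["<eos>"]

# Hypothetical "learned parameters": the next token to emit after a given token.
NEXT_TOKEN = {
    TOKEN_TO_ID["is"]: TOKEN_TO_ID["love"],
    TOKEN_TO_ID["love"]: EOS_ID,
}


def encode(text):
    """Step 1: turn text into a list of token ids."""
    return [TOKEN_TO_ID[word] for word in text.lower().split()]


def generate(ids, max_new_tokens=5):
    """Step 2: append tokens one at a time until <eos> or the length limit."""
    ids = list(ids)
    for _ in range(max_new_tokens):
        next_id = NEXT_TOKEN.get(ids[-1], EOS_ID)
        if next_id == EOS_ID:
            break
        ids.append(next_id)
    return ids


def decode(ids):
    """Step 3: turn token ids back into text."""
    return " ".join(VOCAB[i] for i in ids)


prompt_ids = encode("The key to life is")
print(decode(generate(prompt_ids)))
```

A real model replaces the lookup table with a neural network that scores every token in the vocabulary at each step, but the surrounding loop has the same shape.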


In [None]:
pipe("The key to life is")

---
# Fine-tuning the model

### Loading the dataset

In this case, we are using the text of *The Great Gatsby* to train the model.

In [9]:
from datasets import load_dataset

# Load the dataset from the Hugging Face Hub (The Great Gatsby as text)
ds = load_dataset("TeacherPuffy/book")

# Print row 100 of the "train" split; each row holds one passage of text
print(ds["train"][100])


{'text': 'Sometimes she and Miss Baker talked at the same time, unobtrusively and with a playful banter that was never quite chatter, as cool as their white dresses and their impersonal eyes, devoid of all desire. They were here—and they accepted Tom and me, making only a polite effort to entertain or be entertained. They knew that dinner would soon be over, and a little later, the evening too would end and be casually put away. It was a stark contrast to the West, where an evening was rushed from one phase to the next, driven by a continually disappointed anticipation or sheer nervous dread of the moment itself.'}


### Tokenization
We use the model's tokenizer to convert the text into token IDs, padding or truncating every example to a fixed length so that sequences of varying length can be batched together. The `map` method applies the preprocessing function over the entire dataset.
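
As a rough illustration of what fixed-length padding and truncation do to a sequence, here is a pure-Python sketch. The function name, the right-side padding, and the pad ID of 0 are assumptions chosen for the example; the real tokenizer handles this internally.

```python
# Pure-Python sketch of padding to a fixed length with truncation.
# ASSUMPTIONS: right-side padding and pad id 0 are illustrative choices.

def pad_or_truncate(token_ids, max_length, pad_id=0):
    ids = token_ids[:max_length]        # truncation: drop tokens past max_length
    mask = [1] * len(ids)               # 1 marks a real token
    n_pad = max_length - len(ids)       # how many pad tokens are needed
    return {
        "input_ids": ids + [pad_id] * n_pad,
        "attention_mask": mask + [0] * n_pad,  # 0 marks padding
    }


short = pad_or_truncate([11, 12, 13], max_length=5)
long = pad_or_truncate([11, 12, 13, 14, 15, 16, 17], max_length=5)

print(short)
print(long)
```

The `attention_mask` is what lets the model ignore the padded positions, which is why it appears alongside `input_ids` in the tokenized dataset.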

In [10]:
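tokenizer = AutoTokenizer.from_pretrained(model_id)

# Reuse the end-of-sequence token for padding (set once, outside the map function)
tokenizer.pad_token = tokenizer.eos_token

def tokenize_function(examples):
    # Pad or truncate every example to exactly 128 tokens
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128)

# batched=True tokenizes many rows per call instead of one row at a time
tokenized_datasets = ds.map(tokenize_function, batched=True)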
print(tokenized_datasets)



DatasetDict({
    train: Dataset({
        features: ['text', 'input_ids', 'attention_mask'],
        num_rows: 111
    })
})


Then, to prepare for training, drop the columns the model does not consume and set the output format.
Here, the `text` column is removed, keeping `input_ids` and `attention_mask`, and the dataset is configured to return PyTorch tensors.

In [11]:
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.with_format("torch")
print(tokenized_datasets["train"])


Dataset({
    features: ['input_ids', 'attention_mask'],
    num_rows: 111
})


---
# Training the model with the Hugging Face `Trainer`

In [13]:
# Load Llama with a sequence-classification head on top of the pretrained weights
model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype="auto")
model.resize_token_embeddings(len(tokenizer))

# Holds the training hyperparameters; evaluation is disabled because there is no eval split
training_args = TrainingArguments(output_dir="test_trainer", eval_strategy="no", num_train_epochs=1)

# Loads the accuracy metric from the `evaluate` library
metric = evaluate.load("accuracy")

# Calculates accuracy of the predictions (only called when an eval dataset is provided)
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    compute_metrics=compute_metrics,
)


Some weights of LlamaForSequenceClassification were not initialized from the model checkpoint at meta-llama/Llama-3.2-1B and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


### Launching the training:

In [14]:
trainer.train()

ValueError: Cannot handle batch sizes > 1 if no padding token is defined.
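
This error is consistent with the model configuration having no padding token ID: setting `tokenizer.pad_token` changes the tokenizer, but does not by itself update `model.config.pad_token_id`. A sequence-classification head has to locate the last real token in each padded row, which it cannot do for batches larger than one without knowing the pad ID. The pure-Python sketch below mimics that check; the function name and the token IDs are hypothetical, and it is not the library's actual implementation.

```python
# Toy version of the check behind the error message.
# ASSUMPTION: this mimics the idea only; it is not the Transformers source code.

def last_token_positions(batch_input_ids, pad_token_id):
    """Return the index of the last non-padding token in each row."""
    if pad_token_id is None and len(batch_input_ids) > 1:
        raise ValueError("Cannot handle batch sizes > 1 if no padding token is defined.")
    if pad_token_id is None:
        # A single unpadded row: the last token is simply the final position.
        return [len(batch_input_ids[0]) - 1]
    positions = []
    for row in batch_input_ids:
        real = [i for i, tok in enumerate(row) if tok != pad_token_id]
        positions.append(real[-1] if real else 0)
    return positions


batch = [[5, 6, 7, 0, 0], [8, 9, 0, 0, 0]]  # 0 is the hypothetical pad id

print(last_token_positions(batch, pad_token_id=0))

try:
    last_token_positions(batch, pad_token_id=None)
except ValueError as err:
    print(err)
```

In the notebook itself, a likely remedy is to set `model.config.pad_token_id = tokenizer.pad_token_id` before calling `trainer.train()`. Note also that the tokenized dataset has no `labels` column, which a classification head would need in order to compute a loss; for next-token fine-tuning on book text, a causal language-modeling head is probably the better fit.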