<a href="https://colab.research.google.com/github/argonne-lcf/ai-science-training-series/blob/main/04_intro_to_llms/IntroLLMs.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction to Language models (LMs)

Author: Archit Vasan, including materials on LLMs by Varuni Sastri and Carlo Graziani at Argonne, and discussion/editorial work by Taylor Childers, Bethany Lusch, and Venkat Vishwanath (Argonne)

Inspired by the blog posts "The Illustrated Transformer" and "The Illustrated GPT2" by Jay Alammar, which are highly recommended reading.

Although the name "language models" is derived from Natural Language Processing, the models used in these approaches can be applied to diverse scientific applications as illustrated below.

## Outline
During this session I will cover:
1. Scientific applications for language models
2. General overview of Transformers
3. Tokenization
4. Model Architecture
5. Pipeline using HuggingFace  
6. Model loading

## Modeling Sequential Data

Sequences are variable-length lists of elements (tokens) in which each element depends on the elements that precede it.

Mathematically:
A sequence is a list of tokens: $$T = [t_1, t_2, t_3, \ldots, t_N]$$ where each token depends on the tokens that precede it through a conditional probability:

$$P(t_N \mid t_{N-1}, \ldots, t_3, t_2, t_1)$$

The goal of sequence modeling is to learn these probability distributions over possible tokens, which can then be used for tasks including:
* Sequence generation based on a prompt
* Language translation (e.g. English --> French)
* Property prediction (predicting a property based on an entire sequence)
* Identifying mistakes or missing elements in sequential data
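As a toy illustration of what "learning these probabilities" means, the sketch below estimates $P(t_N \mid t_{N-1})$ by counting adjacent token pairs in a tiny, made-up corpus. It makes a simplifying bigram assumption (each token depends only on the one before it), whereas Transformer language models condition on the full preceding context; the corpus and the `bigram_probabilities` helper are hypothetical.

```python
from collections import Counter, defaultdict

def bigram_probabilities(sequences):
    """Estimate P(next token | previous token) by counting adjacent pairs."""
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            pair_counts[prev][nxt] += 1
    # Normalize each row of counts into a conditional probability distribution
    probs = {}
    for prev, counts in pair_counts.items():
        total = sum(counts.values())
        probs[prev] = {nxt: c / total for nxt, c in counts.items()}
    return probs

# Hypothetical toy corpus of already-tokenized sequences
corpus = [
    ["the", "dog", "ate", "icecream"],
    ["the", "dog", "ran"],
    ["the", "cat", "ate", "fish"],
]
probs = bigram_probabilities(corpus)
print(probs["the"])  # "dog" follows "the" 2 of 3 times, "cat" 1 of 3 times
print(probs["ate"])  # {'icecream': 0.5, 'fish': 0.5}
```

Sampling repeatedly from such a table already generates new sequences; a Transformer replaces the lookup table with a neural network that conditions on all previous tokens.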

## Scientific sequential data modeling examples

### Nucleic acid sequences + genomic data

 <div style="text-align: center">
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/RNA-codons.svg.png?raw=1"
 width="200">
</div>

Nucleic acid sequences can be used to predict translation of proteins, mutations, and gene expression levels.


The figure below shows GenSLM, a genome-scale language model developed by Argonne researchers that captures genomic information within a single model. It was shown to model the evolution of SARS-CoV-2 without expensive experiments.

<div>

<img src="https://github.com/architvasan/ai_science_local/blob/main/images/genslm.png?raw=1" width="450"/>
</div>

[Zvyagin et al. 2022. bioRxiv](https://www.biorxiv.org/content/10.1101/2022.10.10.511571v1)

### Protein sequences
Protein sequences can be used to predict folding structure, protein-protein interactions, chemical/binding properties, protein function, and many other properties.
<div>
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/Protein-Structure-06.png?raw=1" width="400"/>
</div>

<div>
<img src="https://github.com/argonne-lcf/ai-science-training-series/blob/main/04_intro_to_llms/images/ESMFold.png?raw=1" width="700"/>
</div>

[Lin et al. 2023. Science](https://www.science.org/doi/10.1126/science.ade2574)

### Other applications:

* Biomedical text
* SMILES strings
* Weather predictions
* Interfacing with simulations, such as molecular dynamics

## Overview of Language models

We will now briefly talk about the progression of language models.

### Transformers

The most common LMs are based on the Transformer architecture, which was introduced in 2017 in the paper "Attention Is All You Need".

<div>
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/attention_is_all_you_need.png?raw=1" width="500"/>
</div>

[Vaswani et al. 2017. Advances in Neural Information Processing Systems](https://arxiv.org/pdf/1706.03762)

Since then, a multitude of LLM architectures have been designed.

<div>
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/en_chapter1_transformers_chrono.svg?raw=1" width="600"/>
</div>

[HuggingFace NLP Course](https://huggingface.co/learn/nlp-course/chapter1/4)

## Coding example of LLMs in action!

Let's look at an example of running inference with an LLM as a black box to generate text from a prompt.

Here, we will use the `transformers` library from Hugging Face, which provides access to a repository of pretrained models and tokenizers, along with documentation on how to apply them.

*Warning: Large Language Models are only as good as their training data. They have no ethics, no judgement, or editing ability. We will be using some pretrained models from Hugging Face which used wide samples of internet hosted text. The datasets have not been strictly filtered to restrict all malign content so the generated text may be surprisingly dark or questionable. They do not reflect our core values and are only used for demonstration purposes.*

In [1]:
# Uncomment the section below if running in a Jupyter notebook on Sophia
# import os
# os.environ["HTTP_PROXY"]="proxy.alcf.anl.gov:3128"
# os.environ["HTTPS_PROXY"]="proxy.alcf.anl.gov:3128"
# os.environ["http_proxy"]="proxy.alcf.anl.gov:3128"
# os.environ["https_proxy"]="proxy.alcf.anl.gov:3128"
# os.environ["ftp_proxy"]="proxy.alcf.anl.gov:3128"


In [2]:
!pip install transformers
!pip install pandas
!pip install torch



In [3]:
from transformers import pipeline

input_text = "My dog really wanted to eat icecream because"
# Build a text-generation pipeline backed by the pretrained GPT-2 model
generator = pipeline("text-generation", model="gpt2")
# Generate 5 alternative continuations; max_length counts prompt + generated tokens
generator(input_text, max_length=20, num_return_sequences=5)




[{'generated_text': 'My dog really wanted to eat icecream because he hates eating it," she said later, as the'},
 {'generated_text': "My dog really wanted to eat icecream because the milk didn't taste good. He said she was"},
 {'generated_text': 'My dog really wanted to eat icecream because of the ice and ice cream. I got me all'},
 {'generated_text': "My dog really wanted to eat icecream because his mother wouldn't let him. I can't even"},
 {'generated_text': "My dog really wanted to eat icecream because he says it brings the world together … It didn't"}]

## What's going on under the hood?
There are two components that are "black-boxes" here:

1. The method for tokenization
2. The model that generates novel text.


## Tokenization and embedding of sequential data

Humans can inherently understand language data because they previously learned phonetic sounds.

Machines don’t have phonetic knowledge, so they need to be told how to break text into standard units to process it.

They use a system called “tokenization”, where sequences of text are broken into smaller parts, or “tokens”, and then fed as input.

<div>
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/text-processing---machines-vs-humans.png?raw=1" width="400"/>
</div>

Tokenization is a data preprocessing step which transforms the raw text data into a format suitable for machine learning models. Tokenizers break down raw text into smaller units called tokens. These tokens are what is fed into the language models. Based on the type and configuration of the tokenizer, these tokens can be words, subwords, or characters.

Types of tokenizers:

1. Character Tokenizers: Split text into individual characters.
2. Word Tokenizers: Split text into words based on whitespace or punctuation.
3. Subword Tokenizers: Split text into subword units, such as morphemes or character n-grams. Common subword tokenization algorithms include:
  1. Byte-Pair Encoding (BPE)
  2. SentencePiece
  3. WordPiece
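To make the three tokenizer families concrete, here is a small pure-Python sketch: a character tokenizer, a regex-based word tokenizer, and a minimal BPE training loop that repeatedly merges the most frequent adjacent symbol pair. It is a simplified illustration, not GPT-2's byte-level BPE implementation; the sentence, the word list, and the `learn_bpe` helper are hypothetical, and ties between equally frequent pairs are simply broken by the pair seen first.

```python
import re
from collections import Counter

sentence = "The cat sat, happily."

# 1. Character tokenizer: every character (including spaces and punctuation) is a token
char_tokens = list(sentence)

# 2. Word tokenizer: split into runs of word characters and single punctuation marks
word_tokens = re.findall(r"\w+|[^\w\s]", sentence)

# 3. Minimal BPE training sketch: repeatedly merge the most frequent adjacent symbol pair
def learn_bpe(words, num_merges):
    # Start from single characters: each word becomes a tuple of symbols
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent pair occurs, weighted by word frequency
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the pair seen first
        merges.append(best)
        # Rewrite every word, fusing each occurrence of the best pair into one symbol
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges, vocab

merges, segmented = learn_bpe(["low", "lower", "lowest"], num_merges=2)
print(word_tokens)      # ['The', 'cat', 'sat', ',', 'happily', '.']
print(merges)           # [('l', 'o'), ('lo', 'w')]
print(list(segmented))  # [('low',), ('low', 'e', 'r'), ('low', 'e', 's', 't')]
```

After two merges the frequent stem `low` has become a single subword, while the rarer endings stay split into characters; real tokenizers run tens of thousands of merges over a large corpus.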

<div>
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/tokenization_image.webp?raw=1" width="400"/>
</div>

[nlpiation](https://nlpiation.medium.com/how-to-use-huggingfaces-transformers-pre-trained-tokenizers-e029e8d6d1fa)

### Example of tokenization
Let's look at an example of tokenization using byte-pair encoding.

In [4]:
from transformers import AutoTokenizer

# A utility function to tokenize a sequence and print out some information about it.
def tokenization_summary(tokenizer, sequence):

    # Get the vocabulary: a dict mapping token strings to integer ids
    vocab = tokenizer.vocab
    # Number of entries to print
    n = 10

    # Print a subset of the vocabulary (the dict is not ordered by id)
    print("Subset of tokenizer.vocab:")
    for i, (token, index) in enumerate(vocab.items()):
        print(f"{token}: {index}")
        if i >= n - 1:
            break

    print("Vocab size of the tokenizer = ", len(vocab))
    print("------------------------------------------")

    # .tokenize splits the sequence into tokens according to the tokenizer's rules and vocab.
    # GPT-2's byte-level BPE marks a token that begins with a space with the character "Ġ".
    tokens = tokenizer.tokenize(sequence)
    print("Tokens : ", tokens)
    print("------------------------------------------")

    # .convert_tokens_to_ids maps each token to its numerical id (a 1-1 lookup in the vocab):
    # ids = tokenizer.convert_tokens_to_ids(tokens)
    # print("encoded Ids: ", ids)

    # .encode tokenizes and converts to ids in one step, adding any special tokens the model
    # expects (e.g. [CLS]/[SEP] for BERT; GPT-2 adds none by default).
    print("tokenized sequence : ", tokenizer.encode(sequence))

    # Calling the tokenizer directly returns a dict with input_ids and attention_mask:
    # encode = tokenizer(sequence)
    # print("Encode sequence : ", encode)
    # print("------------------------------------------")

    # .decode converts the ids back to raw text
    ids = tokenizer.convert_tokens_to_ids(tokens)
    decode = tokenizer.decode(ids)
    print("Decode sequence : ", decode)


tokenizer_1 = AutoTokenizer.from_pretrained("gpt2")  # GPT-2 uses byte-level Byte-Pair Encoding (BPE)

sequence = "Counselor, please adjust your Zoom filter to appear as a human, rather than as a cat"

tokenization_summary(tokenizer_1, sequence)

Subset of tokenizer.vocab:
ĠUnique: 30015
essen: 44483
ĠNeurolog: 49115
ĠSave: 12793
ĠInternet: 4455
works: 5225
Ġnewcom: 22315
">: 5320
Ġrim: 20254
itures: 20686
Vocab size of the tokenizer =  50257
------------------------------------------
Tokens :  ['Coun', 'sel', 'or', ',', 'Ġplease', 'Ġadjust', 'Ġyour', 'ĠZoom', 'Ġfilter', 'Ġto', 'Ġappear', 'Ġas', 'Ġa', 'Ġhuman', ',', 'Ġrather', 'Ġthan', 'Ġas', 'Ġa', 'Ġcat']
------------------------------------------
tokenized sequence :  [31053, 741, 273, 11, 3387, 4532, 534, 40305, 8106, 284, 1656, 355, 257, 1692, 11, 2138, 621, 355, 257, 3797]
Decode sequence :  Counselor, please adjust your Zoom filter to appear as a human, rather than as a cat


### Token embedding:

Each token ID is turned into a vector, typically by using the ID as a row index into an embedding matrix.

The strategy of choice for learning language structure from tokenized text is to map each token into a moderate-dimension vector space, adjusting the mapping so that similar or associated tokens end up near each other, and different regions of the space correspond to different positions in the sequence.

Such a mapping from token ID to a point in a vector space is called a token embedding. The dimension of the vector space is often high (e.g. 1024-dimensional), but much smaller than the vocabulary size (30,000 to 500,000).

Various approaches have been attempted for generating such embeddings, including static algorithms that operate on a corpus of tokenized data as preprocessors for NLP tasks. Transformers, however, adjust their embeddings during training.
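A minimal NumPy sketch of a token-embedding lookup, assuming toy sizes (GPT-2 small's table has 50,257 rows of 768 dimensions) and randomly initialized values standing in for trained weights. The token IDs are hypothetical; the point is that "embedding" a token is just selecting a row of a matrix, and that similarity between tokens can then be measured geometrically, for example with cosine similarity.

```python
import numpy as np

# Toy sizes for illustration; GPT-2 small uses vocab_size=50257 and d_model=768
vocab_size, d_model = 1000, 16
rng = np.random.default_rng(0)

# Embedding matrix: one row (vector) per token ID. In a Transformer these values
# start random and are adjusted by gradient descent during training.
embedding = rng.normal(scale=0.02, size=(vocab_size, d_model))

token_ids = np.array([31, 741, 273])  # hypothetical token IDs
vectors = embedding[token_ids]        # lookup is plain row indexing
print(vectors.shape)                  # (3, 16)

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (1 = same direction)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# After training, related tokens should score higher; with random weights
# this number carries no meaning yet.
sim = cosine_similarity(vectors[0], vectors[1])
print(sim)
```

In PyTorch the same lookup table is `torch.nn.Embedding(vocab_size, d_model)`, which is what the `wte` layer of GPT-2 is.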

## Transformer Model Architecture

Now let's look at the base elements that make up a Transformer by dissecting the popular GPT-2 model.

In [5]:
from transformers import GPT2LMHeadModel
# Load the smallest pretrained GPT-2 checkpoint and print its module hierarchy
model = GPT2LMHeadModel.from_pretrained('gpt2')
print(model)

GPT2LMHeadModel(
  (transformer): GPT2Model(
    (wte): Embedding(50257, 768)
    (wpe): Embedding(1024, 768)
    (drop): Dropout(p=0.1, inplace=False)
    (h): ModuleList(
      (0-11): 12 x GPT2Block(
        (ln_1): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (attn): GPT2Attention(
          (c_attn): Conv1D()
          (c_proj): Conv1D()
          (attn_dropout): Dropout(p=0.1, inplace=False)
          (resid_dropout): Dropout(p=0.1, inplace=False)
        )
        (ln_2): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
        (mlp): GPT2MLP(
          (c_fc): Conv1D()
          (c_proj): Conv1D()
          (act): NewGELUActivation()
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
    (ln_f): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=768, out_features=50257, bias=False)
)
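The printed shapes are enough to estimate GPT-2's parameter count by hand. The printout hides the `Conv1D` sizes, so this sketch assumes, from the GPT-2 architecture rather than from the output above, that `c_attn` maps 768 to 3×768 (queries, keys and values), `attn.c_proj` maps 768 to 768, `mlp.c_fc` maps 768 to 4×768, and `mlp.c_proj` maps 4×768 back to 768, all with biases. It also assumes `lm_head` shares (ties) its weights with `wte`, so it adds no new parameters.

```python
# Dimensions read off the printed GPT-2 module tree
vocab_size, n_positions, d_model, n_layers = 50257, 1024, 768, 12
d_ff = 4 * d_model  # assumed MLP hidden size (3072 for GPT-2)

def linear(n_in, n_out):
    """Parameters of a dense layer with bias (weight matrix + bias vector)."""
    return n_in * n_out + n_out

layer_norm = 2 * d_model  # learned scale and shift vectors

per_block = (
    layer_norm                        # ln_1
    + linear(d_model, 3 * d_model)    # attn.c_attn (Q, K, V)
    + linear(d_model, d_model)        # attn.c_proj
    + layer_norm                      # ln_2
    + linear(d_model, d_ff)           # mlp.c_fc
    + linear(d_ff, d_model)           # mlp.c_proj
)

total = (
    vocab_size * d_model      # wte (shared with lm_head)
    + n_positions * d_model   # wpe
    + n_layers * per_block    # 12 x GPT2Block
    + layer_norm              # ln_f
)
print(f"{per_block:,} parameters per block, {total:,} in total")
# 7,087,872 parameters per block, 124,439,808 in total
```

For comparison, `sum(p.numel() for p in model.parameters())` on the loaded model should report the same total, since `parameters()` counts the tied `wte`/`lm_head` weight only once.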


GPT2 is an example of a Transformer Decoder which is used to generate novel text.

Decoder models use only the decoder of a Transformer model. At each stage, for a given word the attention layers can only access the words positioned before it in the sentence. These models are often called auto-regressive models. The pretraining of decoder models usually revolves around predicting the next word in the sentence.

These models are best suited for tasks involving text generation.
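The restriction that each position attends only to earlier positions is commonly implemented with a causal (lower-triangular) mask applied to the attention scores before the softmax. A minimal NumPy sketch, using hypothetical all-zero scores for a 4-token sequence so the effect of the mask is easy to see:

```python
import numpy as np

def causal_softmax(scores):
    """Row-wise softmax over attention scores, blocking attention to future tokens."""
    seq_len = scores.shape[-1]
    # mask[i, j] is True when j > i, i.e. token j lies in the future of token i
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    masked = np.where(mask, -np.inf, scores)
    # Subtract the row max for numerical stability; exp(-inf) = 0 removes future tokens
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)

scores = np.zeros((4, 4))  # hypothetical equal scores for 4 tokens
weights = causal_softmax(scores)
print(np.round(weights, 2))
# Row i spreads its weight evenly over tokens 0..i: 1, 1/2, 1/3, 1/4
```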

GPT-2's architecture is inspired by the paper "Generating Wikipedia by Summarizing Long Sequences", which proposed another arrangement of the transformer block capable of language modeling. That model discarded the encoder and is therefore known as the “Transformer-Decoder”.

<div>
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/transformer-decoder-intro.png?raw=1" width="500"/>
</div>

[Illustrated GPT2](https://jalammar.github.io/illustrated-gpt2/)

Key components of the transformer architecture include:

* Input Embeddings: Word embeddings (word vectors) represent words or text as numeric vectors, where words with similar meanings have similar representations.

* Positional Encoding: Injects information about the position of words in a sequence, helping the model understand word order.

* Self-Attention Mechanism: Allows the model to weigh the importance of different words in a sentence, enabling it to effectively capture contextual information.

* Feedforward Neural Networks: Process information from self-attention layers to generate output for each word/token.

* Layer Normalization and Residual Connections: Aid in stabilizing training and mitigating the vanishing gradient problem.

* Transformer Blocks: Composed of self-attention and feedforward layers; many such blocks are stacked together to form the model.
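One common way to inject position information is the sinusoidal encoding from "Attention Is All You Need": $PE_{(pos,2i)} = \sin(pos / 10000^{2i/d})$ and $PE_{(pos,2i+1)} = \cos(pos / 10000^{2i/d})$, added element-wise to the token embeddings. Note that GPT-2 instead learns its position vectors (the `wpe` embedding in the printed model). A NumPy sketch, assuming an even model dimension `d_model`:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d_model)), PE[pos, 2i+1] = cos(same angle)."""
    positions = np.arange(seq_len)[:, None]         # shape (seq_len, 1)
    even_dims = np.arange(0, d_model, 2)[None, :]    # the 2i values, shape (1, d_model/2)
    angles = positions / np.power(10000.0, even_dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=16)
print(pe.shape)  # (50, 16)
# The encoding is added to the token embeddings: x = token_embeddings + pe
```

Because each dimension oscillates at a different frequency, every position gets a distinct pattern, and nearby positions get similar ones.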

### Attention mechanisms

Since attention mechanisms are arguably the most powerful component of the Transformer, let's discuss this in a little more detail.

Suppose we want an LLM to translate the following input sentence:

`"The animal didn't cross the street because it was too tired"`

To understand a full sentence, the model needs to understand what each word means in relation to the other words.

For example, when we read this sentence, we know intuitively that the word `"it"` refers to `"animal"`, that the state of `"it"` is `"tired"`, and that the associated action is `"didn't cross"`.

However, the model needs a way to learn all of this information in a simple yet generalizable way.
What makes Transformers particularly powerful compared to earlier sequential architectures is how they encode context with the **self-attention mechanism**.

As the model processes each word in the input sequence, attention looks at other positions in the input sequence for clues to a better understanding for this word.

<div>
<img src="https://github.com/architvasan/ai_science_local/blob/main/images/transformer_self-attention_visualization.png?raw=1" width="400"/>
</div>

[The Illustrated Transformer](https://jalammar.github.io/illustrated-transformer/)
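Concretely, each token's vector is projected into a query, a key and a value; the attention weights are $\mathrm{softmax}(QK^T/\sqrt{d_k})$, and each output is the corresponding weighted sum of value vectors. Below is a single-head NumPy sketch in which random matrices are hypothetical stand-ins for learned projection weights. A decoder such as GPT-2 would additionally apply a causal mask to `scores` before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention for one head; x has shape (seq_len, d_model)."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)     # (seq_len, seq_len): how much token i looks at token j
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4          # toy sizes
x = rng.normal(size=(seq_len, d_model))  # stand-in for token + position embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

out, weights = self_attention(x, w_q, w_k, w_v)
print(out.shape, weights.shape)  # (5, 4) (5, 5)
```

Row `i` of `weights` is exactly what attention visualizations draw: how strongly token `i` attends to every other token.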

#### Multi-head attention
In practice, multiple attention heads are used simultaneously.

This:
* Expands the model’s ability to focus on different positions.
* Prevents attention from being dominated by the word itself.
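A minimal multi-head version of the same computation, sketched in NumPy: the model dimension is split across `n_heads` independent heads, each head attends on its own, and the results are concatenated and mixed by an output projection. The fused query/key/value projection mirrors the layout of GPT-2's `c_attn` layer, but the weights here are random, hypothetical stand-ins, and no causal mask is applied.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, w_qkv, w_out, n_heads):
    """x: (seq_len, d_model); w_qkv: (d_model, 3*d_model); w_out: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)  # each (seq_len, d_model)

    def split_heads(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head): each head gets its own slice
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)
    heads = weights @ v                                   # (n_heads, seq_len, d_head)
    # Concatenate the heads back to (seq_len, d_model) and mix them with w_out
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_out, weights

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 6, 12, 3  # toy sizes; GPT-2 small uses d_model=768 and 12 heads
x = rng.normal(size=(seq_len, d_model))
w_qkv = rng.normal(size=(d_model, 3 * d_model))  # hypothetical untrained weights
w_out = rng.normal(size=(d_model, d_model))

out, weights = multi_head_attention(x, w_qkv, w_out, n_heads)
print(out.shape, weights.shape)  # (6, 12) (3, 6, 6)
```

Each of the `n_heads` attention maps in `weights` can learn a different pattern, which is what the per-head panels in the BertViz view display.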

#### Let's see multi-head attention mechanisms in action!

We are going to use BertViz, a powerful visualization tool that lets us explore the attention mechanisms interactively. These mechanisms are normally abstracted away, but this tool allows us to inspect our model in more detail.

In [6]:
!pip install bertviz



Let's load GPT-2 and look at its attention mechanisms.

**Hint... click on the different blocks in the visualization to see the attention**

In [7]:
from transformers import AutoTokenizer, AutoModelForCausalLM, utils

from bertviz import model_view
utils.logging.set_verbosity_error()  # Suppress standard warnings

model_name = 'openai-community/gpt2'
input_text = "No, I am your father"
# output_attentions=True makes the model also return the attention weights of every layer
model = AutoModelForCausalLM.from_pretrained(model_name, output_attentions=True)
tokenizer = AutoTokenizer.from_pretrained(model_name)
inputs = tokenizer.encode(input_text, return_tensors='pt')  # Tokenize input text
outputs = model(inputs)  # Run model
attention = outputs.attentions  # One (batch, heads, seq_len, seq_len) tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display interactive model view


## Pipeline using HuggingFace

Now, let's see a practical application of LLMs using a HuggingFace pipeline for classification.

This involves a few steps including:
1. Setting up a prompt
2. Loading in a pretrained model
3. Loading in the tokenizer and tokenizing input text
4. Performing model inference
5. Interpreting inference output

In [20]:
# STEP 0 : Installations and imports
from transformers import AutoTokenizer, AutoModelForSequenceClassification, AutoConfig, pipeline
import torch
import torch.nn.functional as F

# Warm-up: batch text generation with GPT-2 on several prompts
prompts = ['No, I am your',
           'Im sorry Dave, Im afraid I',
           'What is your favorite color?',
           'Forget it Jake, its',
           'You cant handle the']

generator = pipeline("text-generation", model="gpt2")
generator(prompts, max_length=20, num_return_sequences=5)

[[{'generated_text': 'No, I am your enemy." And then he took a hand in mine, looked at me and'},
  {'generated_text': 'No, I am your dear friend."\n\n"You may come and I won\'t. I'},
  {'generated_text': 'No, I am your father."\n\nNo, I am not your father. It cannot be'},
  {'generated_text': 'No, I am your Lord." - A message of forgiveness "You are My body and me"'},
  {'generated_text': 'No, I am your mother, but I will not allow you to suffer for you." She looked'}],
 [{'generated_text': "Im sorry Dave, Im afraid I was too long ago. The guy doesn't seem like he's"},
  {'generated_text': "Im sorry Dave, Im afraid I'm leaving too soon. I'll be here shortly with one of"},
  {'generated_text': 'Im sorry Dave, Im afraid I will fall in love with you. I have thought this out before'},
  {'generated_text': "Im sorry Dave, Im afraid I'd hurt someone's feelings, It should be fine. It is"},
  {'generated_text': "Im sorry Dave, Im afraid I've lost this trainwreck. We are getting to do real work

### 1. Setting up a prompt

A "prompt" refers to a specific input or query provided to a language model. They guide the text processing and generation by providing the context for the model to generate coherent and relevant text based on the given input.

The choice and structure of the prompt depend on the specific task, the context, and the desired output. Prompts can be "discrete" or "instructive", meaning explicit instructions or questions directed to the language model. They can also be more nuanced, providing suggestions, directions, and context to the model.

We will use very simple prompts in this tutorial section, but we will learn more about prompt engineering and how it helps in optimizing the performance of the model for a given use case in the following tutorials.

In [9]:
# STEP 1 : Set up the prompt
input_text = "The panoramic view of the ocean was breathtaking."

### 2. Loading Pretrained Models

The `from_pretrained()` method of `AutoModelForSequenceClassification` instantiates a sequence classification model.

Refer to https://huggingface.co/transformers/v3.0.2/model_doc/auto.html#automodels for the list of supported model classes.

The `from_pretrained()` method downloads the pretrained weights from the Hugging Face Model Hub if they are not already cached locally (it can also read them from a local directory). It then loads the weights into the instantiated model, initializing the model parameters with the pretrained values.

The model cache contains:

* model configuration (config.json)
* pretrained model weights (model.safetensors)
* tokenizer files (e.g. tokenizer.json, vocab.json or vocab.txt, merges.txt, tokenizer.model, depending on the tokenizer type)

In [10]:
# STEP 2 : Load the pretrained model.
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
config = AutoConfig.from_pretrained(model_name)
print(config)


DistilBertConfig {
  "_name_or_path": "distilbert-base-uncased-finetuned-sst-2-english",
  "activation": "gelu",
  "architectures": [
    "DistilBertForSequenceClassification"
  ],
  "attention_dropout": 0.1,
  "dim": 768,
  "dropout": 0.1,
  "finetuning_task": "sst-2",
  "hidden_dim": 3072,
  "id2label": {
    "0": "NEGATIVE",
    "1": "POSITIVE"
  },
  "initializer_range": 0.02,
  "label2id": {
    "NEGATIVE": 0,
    "POSITIVE": 1
  },
  "max_position_embeddings": 512,
  "model_type": "distilbert",
  "n_heads": 12,
  "n_layers": 6,
  "output_past": true,
  "pad_token_id": 0,
  "qa_dropout": 0.1,
  "seq_classif_dropout": 0.2,
  "sinusoidal_pos_embds": false,
  "tie_weights_": true,
  "transformers_version": "4.32.1",
  "vocab_size": 30522
}



### 3. Loading in the tokenizer and tokenizing input text

Here, we load in a pretrained tokenizer associated with this model.

In [11]:
#STEP 3 : Load the tokenizer and tokenize the input text
tokenizer  =  AutoTokenizer.from_pretrained(model_name)
input_ids = tokenizer(input_text, return_tensors="pt")["input_ids"]
print(input_ids)


tensor([[  101,  1996,  6090,  6525,  7712,  3193,  1997,  1996,  4153,  2001,
          3052, 17904,  1012,   102]])


### 4. Performing inference and interpreting

Here, we:
* pass the tokenized input into the model,
* perform inference to obtain logits,
* convert the logits into probabilities, and
* assign the label with the highest probability.

The end result is that we can predict whether the input phrase is positive or negative.

In [12]:
# STEP 4 : Perform inference
outputs = model(input_ids)
result = outputs.logits
print(result)

# STEP 5 : Interpret the output.
probabilities = F.softmax(result, dim=-1)
print(probabilities)
predicted_class = torch.argmax(probabilities, dim=-1).item()
# Map the class index to its label using the model config ({0: "NEGATIVE", 1: "POSITIVE"})
label = model.config.id2label[predicted_class]
score = probabilities[0][predicted_class].item()
print([{'label': label, 'score': score}])

tensor([[-4.2767,  4.5486]], grad_fn=<AddmmBackward0>)
tensor([[1.4695e-04, 9.9985e-01]], grad_fn=<SoftmaxBackward0>)
[{'label': 'POSITIVE', 'score': 0.9998530149459839}]
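To see that nothing mysterious happens in the interpretation step, the sketch below repeats it by hand in plain Python, using the two logits printed above (rounded to four decimals, so the last digits differ slightly from the model's output). Softmax turns logits $z_i$ into probabilities $p_i = e^{z_i} / \sum_j e^{z_j}$, and the predicted label is the one with the largest probability.

```python
import math

# Logits printed above for the classes [NEGATIVE, POSITIVE]
logits = [-4.2767, 4.5486]
labels = ["NEGATIVE", "POSITIVE"]

# softmax: p_i = exp(z_i) / sum_j exp(z_j); subtracting the max avoids overflow
m = max(logits)
exps = [math.exp(z - m) for z in logits]
probs = [e / sum(exps) for e in exps]

# argmax: index of the most probable class
best = max(range(len(probs)), key=probs.__getitem__)
print(labels[best], round(probs[best], 5))  # POSITIVE 0.99985
```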


### Saving and loading models

Models can be saved to and loaded from a local directory.

In [13]:
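from transformers import AutoModelForMaskedLM, AutoTokenizer

# Instantiate a pretrained model; BERT is a masked language model, so use the MaskedLM head
model_name = "bert-base-uncased"
model = AutoModelForMaskedLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Train or fine-tune the model...

# Save the model and its tokenizer to a local directory
directory = "my_local_model"
model.save_pretrained(directory)
tokenizer.save_pretrained(directory)

# Load them back from the local directory, using the same model class
loaded_model = AutoModelForMaskedLM.from_pretrained(directory)
loaded_tokenizer = AutoTokenizer.from_pretrained(directory)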


## Model Hub
The Model Hub is where the members of the Hugging Face community can host all of their model checkpoints for simple storage, discovery, and sharing.

* Download pretrained models with the `huggingface_hub` client library, or load them with Transformers for fine-tuning.
* Make use of Inference API to use models in production settings.
* You can filter for different models for different tasks, frameworks used, datasets used, and many more.
* Selecting a model opens its model card.
* A model card describes the model, including its intended usage and limitations. Some models can also be queried directly through the Inference API.

Model Hub link: https://huggingface.co/docs/hub/en/models-the-hub

Example of a model repository (files view): https://huggingface.co/bert-base-uncased/tree/main

## Recommended reading

* ["The Illustrated Transformer" by Jay Alammar](https://jalammar.github.io/illustrated-transformer/)
* ["Visualizing A Neural Machine Translation Model (Mechanics of Seq2seq Models With Attention)" by Jay Alammar](https://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/)
* ["The Illustrated GPT-2 (Visualizing Transformer Language Models)"](https://jalammar.github.io/illustrated-gpt2/)
* ["A gentle introduction to positional encoding"](https://machinelearningmastery.com/a-gentle-introduction-to-positional-encoding-in-transformer-models-part-1/)
* ["LLM Tutorial Workshop (Argonne National Laboratory)"](https://github.com/brettin/llm_tutorial)
* ["LLM Tutorial Workshop Part 2 (Argonne National Laboratory)"](https://github.com/argonne-lcf/llm-workshop)

## Homework

1. Load in a generative model using the HuggingFace pipeline and generate text using a batch of prompts.
  * Play with generative parameters such as temperature, max_new_tokens, and the model itself and explain the effect on the legibility of the model response. Try at least 4 different parameter/model combinations.
  * Models that can be used include:
    * `google/gemma-2-2b-it`
    * `microsoft/Phi-3-mini-4k-instruct`
    * `meta-llama/Llama-3.2-1B`
    * Any model from this list: [Text-generation models](https://huggingface.co/models?pipeline_tag=text-generation)
    * `gpt2` if having trouble loading these models in
  * This guide should help! [Text-generation strategies](https://huggingface.co/docs/transformers/en/generation_strategies)
2. Load in 2 models of different parameter size (e.g. GPT2, meta-llama/Llama-2-7b-chat-hf, or distilbert/distilgpt2) and analyze the BertViz view for each. How do the attention mechanisms change depending on model size?
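To build intuition for the `temperature` parameter in question 1: during sampling, the next-token logits are divided by the temperature before the softmax, so values below 1 sharpen the distribution toward the most likely token and values above 1 flatten it. A NumPy sketch with made-up logits for a hypothetical 4-token vocabulary:

```python
import numpy as np

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature (temperature must be > 0)."""
    z = np.asarray(logits, dtype=float) / temperature
    z -= z.max()  # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits
for t in (0.2, 1.0, 2.0):
    # Low t: almost all mass on token 0; high t: closer to uniform
    print(t, np.round(temperature_probs(logits, t), 3))

# Sample a next-token id from the tempered distribution
rng = np.random.default_rng(0)
next_token = rng.choice(len(logits), p=temperature_probs(logits, 0.7))
print(next_token)
```

Very low temperatures make generation nearly greedy (and often repetitive), while high temperatures make it more varied but less coherent.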

In [None]:
# STEP 0 : Installations and imports
import os
import numpy as np
from huggingface_hub import login
from transformers import pipeline

# Never hard-code access tokens in a notebook. Read the Hugging Face token from the
# HF_TOKEN environment variable instead (needed only for gated models such as Llama).
hf_token = os.environ.get("HF_TOKEN")
if hf_token:
    login(token=hf_token)


prompts = ['I know',
           'I love you',
           'Luke, I am your father',
           "That's no moon",
           'I used to be an adventurer like you',
           "I can't imagine a world without light",
           "That would be dark"]

# device=0 runs on the first GPU; use device=-1 to run on the CPU
generator = pipeline("text-generation", model="gpt2", device=0)
numCombos = 4

# Sample temperatures in [0.05, 1) (temperature must be strictly positive)
# and max_new_tokens values in [5, 30)
temps = np.random.uniform(0.05, 1, size=numCombos)
TokensParams = np.random.randint(5, 30, size=numCombos)

# Try every (max_new_tokens, temperature) pair
for token in TokensParams:
    for temp in temps:
        print(f"max new tokens = {token}, temperature = {temp}")
        print(generator(prompts, num_return_sequences=10, temperature=float(temp), max_new_tokens=int(token)), '\n\n')


max new tokens = 7, temperature = 0.9940366739279645



[[{'generated_text': "I know I've seen it all before,"}, {'generated_text': 'I know I do."\n, by a'}, {'generated_text': 'I know your name, too. (And'}, {'generated_text': 'I know a lot of people who love it'}, {'generated_text': 'I know that I would love to give you'}, {'generated_text': "I know what you're saying: you're"}, {'generated_text': 'I know we could use a day-night'}, {'generated_text': 'I know I have to kill the dragon but'}, {'generated_text': 'I know our guys were able to perform exceptionally'}, {'generated_text': 'I know this might sound bad, but we'}], [{'generated_text': 'I love you too, Kiko."\n\n'}, {'generated_text': 'I love you for coming out to us!"\n'}, {'generated_text': 'I love you guys.\n\nI love to'}, {'generated_text': 'I love you and your company!"\n\n"'}, {'generated_text': 'I love you. And have never wanted to take'}, {'generated_text': 'I love you, Mike, I love you.'}, {'generated_text': 'I love you."\n\nIn the past,'}, {'generated_text': 'I love you,"



[[{'generated_text': "I know that I'm not the only one"}, {'generated_text': "I know that I'm not the only one"}, {'generated_text': "I know that I'm not the only one"}, {'generated_text': "I know that I'm not the only one"}, {'generated_text': "I know that I'm not the only one"}, {'generated_text': "I know I'm not the only one who"}, {'generated_text': "I know that I'm not the only one"}, {'generated_text': "I know that I'm not the only one"}, {'generated_text': "I know I'm not the only one who"}, {'generated_text': "I know that I'm not the only one"}], [{'generated_text': 'I love you, and I love you so much'}, {'generated_text': 'I love you, and I love you, and'}, {'generated_text': 'I love you, and I love you so much'}, {'generated_text': 'I love you, and I love you too.'}, {'generated_text': 'I love you, and I love you so much'}, {'generated_text': 'I love you, and I love you, and'}, {'generated_text': 'I love you, and I love you, and'}, {'generated_text': 'I love you, and I love y



[[{'generated_text': "I know. That didn't go well for"}, {'generated_text': "I know for a fact that you don't"}, {'generated_text': "I know he's going to be playing as"}, {'generated_text': "I know we're not going to do that"}, {'generated_text': 'I know how hard it is to give kids'}, {'generated_text': "I know it's difficult to see a solution"}, {'generated_text': "I know this from the fact that I've"}, {'generated_text': "I know you're right, but you don"}, {'generated_text': "I know it's a big stretch to say"}, {'generated_text': "I know, I never thought I'd ever"}], [{'generated_text': "I love you.\n\n[O'Brien"}, {'generated_text': 'I love you; you are such a good and'}, {'generated_text': 'I love you."\n\n"Ahhhh..."'}, {'generated_text': 'I love you.\n\n[Catch up'}, {'generated_text': 'I love you. I love you! I love'}, {'generated_text': 'I love you, baby." "But I don'}, {'generated_text': 'I love you so much."\n\nAdvertisement\n'}, {'generated_text': 'I love you.\n\nWhen it comes



[[{'generated_text': "I know there's a lot of things,"}, {'generated_text': 'I know, this is weird."\n\n'}, {'generated_text': "I know he didn't think he could get"}, {'generated_text': "I know that there are people that don't"}, {'generated_text': 'I know those have been a pain but I'}, {'generated_text': 'I know this about the first season, because'}, {'generated_text': 'I know how I feel. If you don'}, {'generated_text': 'I know I have a lot to learn.'}, {'generated_text': 'I know he will not be available as a'}, {'generated_text': "I know it's not going to have much"}], [{'generated_text': 'I love you guys," said Ewing; "'}, {'generated_text': 'I love you so much, Derry!" And'}, {'generated_text': 'I love you!"'}, {'generated_text': 'I love you," the man told her, and'}, {'generated_text': 'I love you."\n\n"Ah, so'}, {'generated_text': 'I love you, you know I love you,'}, {'generated_text': 'I love you and my family."\n\nThe'}, {'generated_text': 'I love you" (Chorus): I would'}, {



[[{'generated_text': "I know these characters were all different and I'm just not sure they feel very relevant. This is definitely a game we want, and"}, {'generated_text': 'I know you won\'t want to talk to the world anymore. I\'ve got this job and I know your job as well."\n'}, {'generated_text': "I know everyone's been asking me for many months if I'm happy I have a future with this company. I've read that it"}, {'generated_text': 'I know."\n\n(Picture: PA)\n\nPavlovic was not surprised when he made his name.\n\n'}, {'generated_text': "I know that some of you are watching with amazement. Well, I'm here to save you. It could be that this is"}, {'generated_text': "I know they don't like to share their phone with us because of their privacy concerns, but we do know that there isn't a"}, {'generated_text': 'I know how we feel about this," says Shonky, who grew up in Fort Dodge, Minn., watching a football game'}, {'generated_text': "I know that I'm a sucker for any good ideas, and I'm gr



[[{'generated_text': "I know that I'm not the only one who's been affected by this. I've been told that my family is affected by this"}, {'generated_text': "I know that I'm not the only one who's been told that. I've been told that by people who are not in the"}, {'generated_text': "I know that I'm not the only one who's been told that. I'm not the only one who's been told that."}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been told that I'm not the only one"}, {'generated_text': "I know I'm not the only one who's been affected by this. I've been told by my doctor that I'm not going"}, {'generated_text': "I know that you're not going to be able to get a job, but I'm going to be able to get a job."}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been told that I'm not the only one"}, {'generated_text': "I know that I'm not the only one who has been affected by this. I've been told that I'm not the only



[[{'generated_text': 'I know that to have any chance of winning in the tournament, there needs to be a clear purpose of both teams, and it is'}, {'generated_text': "I know what you're thinking, that's the end of the whole argument. The end, I think, I got to the point"}, {'generated_text': "I know. But who cares, you know, I'm glad it's now over.\n\nI'll just go out and get"}, {'generated_text': 'I know that we will definitely be working to expand the program into a whole new area of education, and I certainly hope we get to'}, {'generated_text': "I know that I'm not allowed to do that when you're just doing the same thing over and over... but I have to."}, {'generated_text': "I know for a fact that our country's leading tech companies are making big bets, and it's just been amazing. We've seen"}, {'generated_text': 'I know how much you like watching your favorite kids play, but we want to make sure you are able to enjoy your favorite games in'}, {'generated_text': 'I know you are tryi



[[{'generated_text': "I know I'm right. And I want to tell you that we're very close to that agreement that's been on the table for"}, {'generated_text': "I know that you're not trying to be political, but I'm looking forward to what Donald said, because I'm very, very"}, {'generated_text': "I know that's a small thing for me. I don't see myself as doing that (although I'd like a chance to try"}, {'generated_text': 'I know it\'s been a long road," said O.M. Davenport, a former mayor. "I\'ll have a'}, {'generated_text': 'I know this is a bit embarrassing to hear, just because the media have been following Hillary Clinton up on this, but her silence about'}, {'generated_text': 'I know that in New York, which is not the way the government wants it to be, these people will have to run an organization'}, {'generated_text': "I know you may have heard by now that we get along well with each other, but that hasn't changed much in the last couple"}, {'generated_text': 'I know it might cause som



[[{'generated_text': "I know you got my attention; are you sure you're the one they're hoping to recruit"}, {'generated_text': "I know we don't have our own way but at this point we haven't given up."}, {'generated_text': "I know that I'm not an expert on these substances, but I saw you on this blog"}, {'generated_text': 'I know that every person who is here, or even on the Internet knows that someone who is'}, {'generated_text': 'I know that I can change the course of this process when I start the process. But we'}, {'generated_text': 'I know for sure that this is only being used to support some kind of special case here in'}, {'generated_text': "I know what you're thinking, but I just got my ass handed out to some homeless people"}, {'generated_text': 'I know I said there weren\'t the same many people who said "we really don\'t need'}, {'generated_text': 'I know this story because I have had it for many, many years; it is part of'}, {'generated_text': "I know, I'll tell ya. But you'l



[[{'generated_text': "I know that I'm not the only one who's been affected by this. I'm not"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I'm not"}, {'generated_text': "I know I'm not the only one who's been affected by this. I've been told"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been"}, {'generated_text': 'I know, I know. I know. I know. I know. I know. I'}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been"}], [{'generated_text': 'I love you, and I love you, and I love you, and I lo



[[{'generated_text': "I know it's been a long time coming, but I'm really honored and thankful to see"}, {'generated_text': "I know it hurts but it's OK if I've been bullied like that and I can't"}, {'generated_text': 'I know this is not a surprise, and I wish to share it with you as a reminder'}, {'generated_text': "I know, right? Aha! I'll be right back. I was waiting for you"}, {'generated_text': "I know what many of you are thinking: I mean, I can't believe what we're"}, {'generated_text': 'I know what you\'re looking for? Then come on in!" "Thank you," said the'}, {'generated_text': "I know this is all fake news and I'm the one who is doing it. But I"}, {'generated_text': "I know you have some serious questions that I need to talk about, but I'm sure you"}, {'generated_text': 'I know how to work the wheel, but I am always working to solve my problems with the'}, {'generated_text': "I know it's a good moment. So, how does the situation of the black people go"}], [{'generated_text':



[[{'generated_text': 'I know that, and that could well be true."\n\nGillum, who was'}, {'generated_text': 'I know I would have gone more, but I was just so excited right before I got here'}, {'generated_text': "I know he's not trying to intimidate people, he just wants to make sure that they feel"}, {'generated_text': 'I know," I admit, "that I feel like a fool. I know I have an'}, {'generated_text': 'I know I am making a huge mistake by not being able to work for any more than 30'}, {'generated_text': 'I know someone, and we\'re going in, so I\'m going to get you."\n'}, {'generated_text': "I know that we're not getting great results from our experiments, but sometimes it's hard to"}, {'generated_text': "I know I'm bad, but I can still feel her.\n\nThe second I hear"}, {'generated_text': 'I know this is going to be the most confusing part. I don\'t care about the "'}, {'generated_text': 'I know I need to talk to your team. Maybe I was there to talk with your team'}], [{'generated_text'



[[{'generated_text': "I know of no better way to get to that point than to do this, if you're from a conservative town. Herein lies your problem"}, {'generated_text': "I know there's a reason why his last season was so disastrous, how they got so far. But I'm not getting any better. I"}, {'generated_text': 'I know I look terrible and make a lot of mistakes all the time. I will always be proud of my family. I am going to do'}, {'generated_text': 'I know this isn\'t something an editor, who\'s a bit like me, would be willing to accept," I said. "But we\'re'}, {'generated_text': 'I know but it takes more work than we can handle… And we love the feeling of being with one another. And we love the feeling of'}, {'generated_text': "I know it sounds like he would be a great hire for a guy like John Cerrone. But really, he's been a huge disappointment"}, {'generated_text': "I know you've seen the photos on Youtube you know that it's because I like it. I just want to post the videos with the pho



[[{'generated_text': "I know that I'm not the only one who's been affected by this. I've been told that I'm not the only one who's"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been told that I'm not the only one who's"}, {'generated_text': 'I know I\'m not the only one who\'s been told that.\n\n"I\'ve been told that by the police. I\'ve been'}, {'generated_text': "I know that I'm not the only one who's been told that. I'm not the only one who's been told that.\n\n"}, {'generated_text': "I know that I'm not the only one who's been told that. I'm not the only one who's been told that.\n\n"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I'm not the only one who's been affected by this."}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been told that I'm not the only one who's"}, {'generated_text': "I know that I'm not the only one who's been affected by this. I've been



[[{'generated_text': "I know it's hard to believe that people need to be asked this question, but I find it hard to believe I'm going to take my"}, {'generated_text': 'I know your little sister and I\'ll have to help you out here," Daphne said.\n\n"What do you want me to'}, {'generated_text': 'I know that sometimes when I want to play in my first game (or even just play League of Legends), I want to get into the group'}, {'generated_text': "I know you can't just run to your desk, we know you can't just sit there and work. We know you can't just sit"}, {'generated_text': 'I know I\'m not the best that day. But I know I\'m smart enough to learn the art of watching people.\n\n"A'}, {'generated_text': "I know, but I have this idea that, if they're going to get an NBA game under their belt, they gotta do this thing right"}, {'generated_text': 'I know that the new president is coming soon," said Mr. Trump\'s first official press secretary, Sean Spicer, who said Tuesday that Mr.'}, {'generate



[[{'generated_text': 'I know you\'ll ask for help, too," he says, gesturing to the back of the chair in the corner of his room. "'}, {'generated_text': 'I know that to be true. All I\'m saying is I\'ve got faith in God"\n\nTravis has also signed a five-'}, {'generated_text': 'I know he is a good guy. We are talking about our own person or what he does. And he does not represent all of us.'}, {'generated_text': 'I know of no better way for you, a good and loyal customer, here at Mr. Lyle. We are happy and thrilled to begin'}, {'generated_text': "I know there are some good examples of both the classic and modern games on this list, but I haven't found one that really fits all of"}, {'generated_text': 'I know that I want to be with the guys who care about this country," Sanders said. "I always talk about women, but I\'m'}, {'generated_text': "I know that this is going to be a very interesting story, but after the game, I'm going to go sit down and talk to the"}, {'generated_text': 'I know

For small temperatures, the model appears more likely to repeat phrases. When `max_new_tokens` is small, the output is less legible: generation is cut off after only a few tokens, so the model does not have "time" to complete the sentence.
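The repetition at low temperature follows from how temperature rescales the next-token distribution. In the usual formulation the logits $z_i$ are divided by $T$ before the softmax, $p_i = \frac{\exp(z_i/T)}{\sum_j \exp(z_j/T)}$, so $T < 1$ sharpens the distribution toward the most likely token (the sampler keeps picking the same continuation) and $T > 1$ flattens it. A minimal NumPy sketch, assuming made-up logits for four candidate tokens rather than values from the model above:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Turn raw logits into probabilities after dividing them by the temperature."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract the max for numerical stability
    probs = np.exp(scaled)
    return probs / probs.sum()

# Hypothetical logits for four candidate next tokens (illustrative values only)
logits = [2.0, 1.0, 0.5, 0.1]

for t in (0.2, 1.0, 2.0):
    p = softmax_with_temperature(logits, t)
    print(f"T={t}: p(top token)={p[0]:.3f}, probs={np.round(p, 3)}")
```

The ranking of the tokens never changes with temperature; only how much probability mass the top token takes does, which is why low-temperature samples converge on the same phrases.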

In [11]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, utils
from bertviz import model_view

utils.logging.set_verbosity_error()  # Suppress standard warnings

modelName = 'gpt2'
text = "It's a me, a Mario"
# output_attentions=True makes the forward pass return per-layer attention weights;
# eager attention is requested because fused (SDPA) kernels may not expose them
model = AutoModelForCausalLM.from_pretrained(modelName, output_attentions=True, attn_implementation="eager")
tokenizer = AutoTokenizer.from_pretrained(modelName)
inputs = tokenizer.encode(text, return_tensors='pt')  # Tokenize input text
with torch.no_grad():
    outputs = model(inputs)  # Run model (inference only, no gradients)
attention = outputs.attentions  # One (batch, heads, seq_len, seq_len) tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display model view




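`model_view` draws, for every layer and head, one attention matrix from `outputs.attentions`: row i gives how strongly token i attends to each token in the prompt. A minimal NumPy sketch of a single head of the masked scaled dot-product attention used in decoder-only models such as GPT-2, assuming random toy projection weights and sizes rather than GPT-2's trained ones, shows two properties visible in the plot: each row sums to 1, and no token attends to tokens that come after it.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal (masked) scaled dot-product attention.

    x: (seq_len, d_model) token embeddings
    w_q, w_k, w_v: (d_model, d_head) projection matrices
    Returns (output, weights); weights has shape (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Mask future positions: token i may only attend to tokens 0..i
    seq_len = x.shape[0]
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    # Row-wise softmax turns scores into attention weights
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 6, 16, 8  # toy sizes, not GPT-2's
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
print(np.round(weights, 2))  # lower-triangular matrix, rows sum to 1
```

A real model runs many such heads in parallel in every layer, which is why `outputs.attentions` holds one `(batch, heads, seq_len, seq_len)` tensor per layer.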

In [12]:
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, utils
from bertviz import model_view

utils.logging.set_verbosity_error()  # Suppress standard warnings

# Llama-3.2-1B is a gated model: accept its license on the Hugging Face Hub and
# authenticate (e.g. `huggingface-cli login`) before downloading it
modelName = 'meta-llama/Llama-3.2-1B'
text = "It's a me, a Mario"
# Eager attention is requested so the per-layer attention weights are returned
model = AutoModelForCausalLM.from_pretrained(modelName, output_attentions=True, attn_implementation="eager")
tokenizer = AutoTokenizer.from_pretrained(modelName)
inputs = tokenizer.encode(text, return_tensors='pt')  # Tokenize input text
with torch.no_grad():
    outputs = model(inputs)  # Run model (inference only, no gradients)
attention = outputs.attentions  # One (batch, heads, seq_len, seq_len) tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs[0])  # Convert input ids to token strings
model_view(attention, tokens)  # Display model view




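The Llama model has more layers and attention heads than GPT-2 (16 layers with 32 heads each for Llama-3.2-1B, versus 12 layers with 12 heads each for GPT-2). Its attention patterns are also more varied: in GPT-2 most heads share a similar pattern running from one corner to another, whereas Llama's heads show more scatter.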
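The "scatter" described above can be quantified rather than judged by eye. A minimal sketch, assuming the attention tensors are first converted to NumPy; `summarize_attention` is a hypothetical helper, not part of bertviz or transformers. It reports the number of layers and heads and, for each head, the mean Shannon entropy of its attention rows: a head that puts nearly all its weight on one token scores near 0, while a head that spreads weight evenly over the i+1 tokens visible to row i reaches log(i+1).

```python
import numpy as np

def summarize_attention(attentions):
    """Summarize a sequence of per-layer attention arrays.

    attentions: arrays shaped (batch, num_heads, seq_len, seq_len), one per layer.
    Returns (num_layers, num_heads, mean_entropy), where mean_entropy[l, h] is the
    average Shannon entropy (in nats) of head h's attention rows in layer l.
    """
    stacked = np.stack([np.asarray(a, dtype=float) for a in attentions])  # (L, B, H, S, S)
    num_layers, _, num_heads, _, _ = stacked.shape
    # Entropy of each query row, treating 0 * log(0) as 0
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(stacked > 0, stacked * np.log(stacked), 0.0)
    row_entropy = -terms.sum(axis=-1)            # (L, B, H, S)
    mean_entropy = row_entropy.mean(axis=(1, 3))  # average over batch and query rows
    return num_layers, num_heads, mean_entropy
```

After either cell above, something like `summarize_attention([a.detach().float().numpy() for a in attention])` gives the layer and head counts plus a per-head entropy map. Because the two tokenizers split the prompt into different numbers of tokens, and later rows can reach higher entropy than earlier ones, the entropies of the two models are only roughly comparable.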