
# Ch2 Quiz

Install the Transformers, Datasets, and Evaluate libraries to run this notebook.

In [None]:
!pip install datasets evaluate transformers[sentencepiece]

## 9. What does the `result` variable contain in this code sample?


In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
result = tokenizer.tokenize("Hello!")
result


['Hello', '!']

 > A: A list of strings, where each string is a token.

## 10. Is there something wrong with the following code?

In [None]:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModel.from_pretrained("gpt2")

encoded = tokenizer("Hey!", return_tensors="pt")
result = model(**encoded)

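 > A: Yes. The tokenizer (`bert-base-cased`) and the model (`gpt2`) come from different checkpoints. The tokenizer and model should always be loaded from the same checkpoint, because token IDs only mean something with respect to the vocabulary the model was trained on.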
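Nothing in a list of token IDs records which vocabulary produced it, so a mismatch may not raise an error at all; the model simply reads the IDs as different tokens. The toy sketch below uses two made-up vocabularies (hypothetical, not the real `bert-base-cased` or `gpt2` ones) and hypothetical `encode`/`decode` helpers to show the effect in plain Python:

```python
# Hypothetical toy vocabularies standing in for two different checkpoints.
tokenizer_vocab = {"[CLS]": 0, "Hey": 1, "!": 2, "[SEP]": 3}  # vocabulary the tokenizer uses
model_vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3}         # vocabulary the model was trained with


def encode(tokens, vocab):
    """Map tokens to integer IDs using the given vocabulary."""
    return [vocab[token] for token in tokens]


def decode(ids, vocab):
    """Map integer IDs back to tokens using the given vocabulary."""
    id_to_token = {index: token for token, index in vocab.items()}
    return [id_to_token[index] for index in ids]


ids = encode(["[CLS]", "Hey", "!", "[SEP]"], tokenizer_vocab)
print(ids)                           # [0, 1, 2, 3]
print(decode(ids, tokenizer_vocab))  # ['[CLS]', 'Hey', '!', '[SEP]']
# The model only receives integers, so it interprets them with its own vocabulary.
print(decode(ids, model_vocab))      # ['the', 'cat', 'sat', 'on']
```

The fix for the quiz code is to pass the same checkpoint name (for example `"bert-base-cased"`) to both `AutoTokenizer.from_pretrained` and `AutoModel.from_pretrained`.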

## Model Head vs. Model Output

Clarification:

1. Model Head = One or more layers at the end of the model
  * It is part of the model architecture.
  * It transforms hidden states from the base model into task-specific outputs.
  * Example: a linear layer (`torch.nn.Linear`) for classification.

2. Model Output = The actual values computed during inference
  * It is what the model returns after running a forward pass.
  * Example: logits; probabilities and predicted labels are derived from them in post-processing.
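The distinction can be sketched in plain PyTorch. In the toy model below (a hypothetical `TinyClassifier`, where an embedding layer stands in for the Transformer body and mean pooling is an assumed pooling step), the head is a submodule that exists as soon as the model is built, while the logits only exist after a forward pass:

```python
import torch
from torch import nn


class TinyClassifier(nn.Module):
    """Hypothetical toy model: a 'base' producing hidden states plus a classification 'head'."""

    def __init__(self, vocab_size=100, hidden_size=16, num_labels=2):
        super().__init__()
        # Base model: maps token IDs to hidden states (stand-in for a Transformer body).
        self.base = nn.Embedding(vocab_size, hidden_size)
        # Model head: part of the architecture, maps a hidden state to one score per class.
        self.head = nn.Linear(hidden_size, num_labels)

    def forward(self, input_ids):
        hidden_states = self.base(input_ids)  # (batch, seq_len, hidden_size)
        pooled = hidden_states.mean(dim=1)    # (batch, hidden_size)
        return self.head(pooled)              # logits: (batch, num_labels)


torch.manual_seed(0)
model = TinyClassifier()
input_ids = torch.randint(0, 100, (3, 5))  # 3 hypothetical sequences of 5 token IDs

with torch.no_grad():
    logits = model(input_ids)  # model output: values computed by a forward pass

print(model.head)    # Linear(in_features=16, out_features=2, bias=True)
print(logits.shape)  # torch.Size([3, 2])
```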

Example: DistilBERT for Sentiment Classification

* Base Model (`distilbert`): extracts hidden representations from the input text.
* Model Head (`classifier`, preceded by `pre_classifier`): linear layers that map hidden states to logits.
* Model Output (`logits`): the raw, unnormalized scores returned when you run inference.

In [None]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenize input
sequences = ["I've been waiting for a HuggingFace course my whole life."]
tokens = tokenizer(sequences, padding=True, truncation=True, return_tensors="pt")

# Run model
output = model(**tokens)

# Inspect components
print("Model Head (classifier layer):", model.classifier)
print("Model Output (logits):", output.logits)


Model Head (classifier layer): Linear(in_features=768, out_features=2, bias=True)
Model Output (logits): tensor([[-1.5607,  1.6123]], grad_fn=<AddmmBackward0>)
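The logits are not probabilities yet. A minimal post-processing sketch, reusing the logits printed above and assuming the label mapping `{0: "NEGATIVE", 1: "POSITIVE"}` (with the real model, read it from `model.config.id2label` instead of hard-coding it):

```python
import torch

# Logits copied from the output above: 1 sentence, 2 classes.
logits = torch.tensor([[-1.5607, 1.6123]])

# Softmax turns raw scores into probabilities that sum to 1 along each row.
probs = torch.softmax(logits, dim=-1)

# Assumption: this mapping mirrors model.config.id2label for this checkpoint.
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
pred_id = int(probs.argmax(dim=-1)[0])

print(probs)
print(id2label[pred_id])  # POSITIVE
```

For this sentence the probabilities come out at roughly 0.04 (negative) and 0.96 (positive).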


In [None]:
print(model)

DistilBertForSequenceClassification(
  (distilbert): DistilBertModel(
    (embeddings): Embeddings(
      (word_embeddings): Embedding(30522, 768, padding_idx=0)
      (position_embeddings): Embedding(512, 768)
      (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
      (dropout): Dropout(p=0.1, inplace=False)
    )
    (transformer): Transformer(
      (layer): ModuleList(
        (0-5): 6 x TransformerBlock(
          (attention): DistilBertSdpaAttention(
            (dropout): Dropout(p=0.1, inplace=False)
            (q_lin): Linear(in_features=768, out_features=768, bias=True)
            (k_lin): Linear(in_features=768, out_features=768, bias=True)
            (v_lin): Linear(in_features=768, out_features=768, bias=True)
            (out_lin): Linear(in_features=768, out_features=768, bias=True)
          )
          (sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
          (ffn): FFN(
            (dropout): Dropout(p=0.1, inplace=False)


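In the full printout (truncated above), the layers that follow the `distilbert` base model form the model head:

* `pre_classifier`: a linear layer (768 → 768, followed by a ReLU) applied to the hidden state of the first ([CLS]) token.
* `classifier`: the final linear layer that produces the logits (this is the main classification head).
* `dropout`: a dropout layer applied between the two during training to reduce overfitting; it does nothing in evaluation mode.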
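A sketch for checking how these head layers are chained, assuming the forward pass of `DistilBertForSequenceClassification` in recent `transformers` releases (first token's hidden state, then `pre_classifier`, ReLU, `dropout`, `classifier`). It builds a small, randomly initialised model from a `DistilBertConfig` with hypothetical sizes, so nothing is downloaded, and recomputes the logits by hand:

```python
import torch
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Hypothetical small sizes; weights are random because nothing is loaded from the Hub.
config = DistilBertConfig(vocab_size=100, dim=32, hidden_dim=64, n_layers=2, n_heads=2, num_labels=2)
torch.manual_seed(0)
model = DistilBertForSequenceClassification(config).eval()  # eval(): dropout becomes a no-op

input_ids = torch.randint(0, config.vocab_size, (2, 7))  # 2 hypothetical sequences of 7 tokens

with torch.no_grad():
    # Base model: hidden states of shape (batch, seq_len, dim)
    hidden_states = model.distilbert(input_ids=input_ids).last_hidden_state
    # Head, step by step: first-token vector -> pre_classifier -> ReLU -> dropout -> classifier
    pooled = hidden_states[:, 0]
    pooled = torch.relu(model.pre_classifier(pooled))
    pooled = model.dropout(pooled)
    manual_logits = model.classifier(pooled)
    # Full forward pass, for comparison
    logits = model(input_ids=input_ids).logits

print(manual_logits.shape)  # torch.Size([2, 2])
print(torch.allclose(manual_logits, logits, atol=1e-6))
```

If the assumed forward order holds, the last line prints `True`: the head is just these few layers applied on top of the base model's output.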

In [None]:
# You can directly access the classification head:
print(model.classifier)

Linear(in_features=768, out_features=2, bias=True)


This means:
* The input dimension (`in_features=768`) corresponds to DistilBERT's hidden size.
* The output dimension (`out_features=2`) corresponds to the number of classes (negative/positive).
* This is the final layer of the model head, which converts hidden states into class logits.
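The same relationship can be checked without downloading anything. The sketch below builds a randomly initialised `DistilBertForSequenceClassification` from a small `DistilBertConfig` (the sizes and the 3-class setup are hypothetical) and shows that the `classifier` layer's `in_features` follows `dim` while `out_features` follows `num_labels`:

```python
from transformers import DistilBertConfig, DistilBertForSequenceClassification

# Hypothetical small sizes and a 3-class task; weights are random (nothing is downloaded).
config = DistilBertConfig(
    vocab_size=100, dim=32, hidden_dim=64, n_layers=1, n_heads=2, num_labels=3
)
model = DistilBertForSequenceClassification(config)

# in_features tracks the hidden size (config.dim); out_features tracks config.num_labels.
print(model.classifier)  # Linear(in_features=32, out_features=3, bias=True)
```

With a pretrained checkpoint, passing `num_labels=3` to `AutoModelForSequenceClassification.from_pretrained` likewise yields a 3-way `classifier`; for a checkpoint without a classification head, such as `distilbert-base-uncased`, that head is newly initialised and needs fine-tuning.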