In [None]:
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = 'HuggingFaceTB/SmolLM2-135M-Instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.



In [None]:
""" Next Step: Tokenization
Tokenization is a fundamental step in natural language processing (NLP). A tokenizer breaks text into smaller units called "tokens". Depending on the tokenizer, a token can be a whole word, a piece of a word, or a single character. For example, the sentence "Hello, world!" might be tokenized into:

["Hello", ",", "world", "!"] (word-level tokens)
or ["Hel", "lo", ",", "wor", "ld", "!"] (subword tokens)"""

In [None]:
text = "My Name is Harshad "
tokens = tokenizer(text)
tokens

{'input_ids': [5965, 10181, 314, 407, 5555, 344, 216], 'attention_mask': [1, 1, 1, 1, 1, 1, 1]}

In [None]:
"""1> input_ids: the numerical IDs that represent each token. This is what the model actually processes.
   2> attention_mask: marks which positions hold real tokens (1) and which hold padding (0). Every position here is 1 because nothing is padded, so we can ignore it for now."""
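
The attention mask starts to matter once texts of different lengths are batched together. The sketch below pads two ID lists to the same length in plain Python. `pad_batch` and the pad ID of 0 are made up for illustration; they are not the tokenizer's API or its real pad token ID, and padding on the right is an assumption.

```python
def pad_batch(sequences, pad_id=0):
    """Right-pad ID lists to a common length and build matching attention masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)             # real IDs, then padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)  # 1 = real, 0 = padding
    return {"input_ids": input_ids, "attention_mask": attention_mask}

batch = pad_batch([[5965, 10181, 314], [5965]])
print(batch["input_ids"])       # [[5965, 10181, 314], [5965, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```

The zeros tell the model to ignore the padded positions, so the shorter text is not affected by the filler IDs.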

In [None]:
# Decode each ID on its own to see the string behind every token
[tokenizer.decode([token_id]) for token_id in tokens['input_ids']]

['My', ' Name', ' is', ' H', 'arsh', 'ad', ' ']

In [None]:
""" Token breakdown:

'My', ' Name', ' is', ' H', 'arsh', 'ad', ' ' are the pieces the tokenizer produced for our text. We can see that it doesn't just split on spaces. It learns patterns from training data, so the space before a word becomes part of that word's token (the first word has no space in front of it), and the trailing space at the end of our text ends up as a token of its own.

Why subword tokenization?
You might wonder why "Name" becomes ' Name' (with a space) or why "Harshad" is split into ' H', 'arsh' and 'ad'. Modern tokenizers use subword tokenization: they learn frequently occurring character sequences from large amounts of text.

A leading space marks the start of a new word, so the tokenizer can treat ' Name' as one unit. Rare or less frequent words like "Harshad" have no token of their own, so they are split into smaller pieces that are in the vocabulary (' H' + 'arsh' + 'ad').

This lets a fixed-size vocabulary cover any input, including rare words and names, which splitting on spaces alone cannot do."""
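
To make the splitting concrete, here is a minimal sketch of a greedy longest-match subword splitter. It is not how the SmolLM2 tokenizer works internally (that one is trained on data and applies learned merge rules); `toy_tokenize` and `toy_vocab` are hypothetical names, and the vocabulary is hand-picked so that the result matches the split shown above.

```python
def toy_tokenize(text, vocab):
    """Greedy longest-match split of `text` into pieces found in `vocab`."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining substring first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            # No vocabulary entry matches: fall back to a single character.
            tokens.append(text[i])
            i += 1
    return tokens

# Hypothetical vocabulary, chosen to reproduce the split shown above.
toy_vocab = {"My", " Name", " is", " H", "arsh", "ad", " "}
pieces = toy_tokenize("My Name is Harshad ", toy_vocab)
print(pieces)  # ['My', ' Name', ' is', ' H', 'arsh', 'ad', ' ']
```

Because "Harshad" is not in the toy vocabulary, it falls apart into the longest pieces that are, which is the same effect we saw with the real tokenizer.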

In [7]:
print(tokenizer.special_tokens_map)
print(tokenizer.vocab_size)



{'bos_token': '<|im_start|>', 'eos_token': '<|im_end|>', 'unk_token': '<|endoftext|>', 'pad_token': '<|im_end|>', 'additional_special_tokens': ['<|im_start|>', '<|im_end|>']}
49152


In [None]:
""" Now let's look at the foundation: from words to numbers.
The model's internal thinking process: when you see those logits numbers, you're looking at the model's raw scores, its "thoughts" if you like. Each position in our input gets its own set of predictions. The model isn't just guessing the next word; it predicts what could come after EVERY position. But for text generation, we only care about the very last position (the one after the final token of our input).

We have converted the text to tokens and will now pass them to the model to get the next token."""

In [8]:
import torch
input_tensor = torch.tensor([tokens['input_ids']])  # the model expects a PyTorch tensor with a batch dimension
op = model(input_tensor)



In [None]:
"""We have to focus on logits for now and can ignore all other parts"""

In [9]:
op.logits.shape

torch.Size([1, 7, 49152])

In [None]:
""" Why so many numbers? 49,152 might seem like overkill, but remember: the model scores EVERY token in its vocabulary. That includes common words like "happy", the pieces that rare words like "sesquipedalian" get split into, numbers, punctuation, and even tokens from other languages. Most will have very low scores, but the model still evaluates them all."""

In [10]:
print(f"Logits shape: {op.logits.shape}")
print("Shape breakdown: [Batch_size, Sequence_length, Vocab_size]")
print(f"[{op.logits.shape[0]}, {op.logits.shape[1]}, {op.logits.shape[2]}]")
print(f"- Batch: {op.logits.shape[0]} text(s) processed")
print(f"- Sequence: {op.logits.shape[1]} tokens in input")
print(f"- Vocab: {op.logits.shape[2]:,} possible next tokens")

Logits shape: torch.Size([1, 7, 49152])
Shape breakdown: [Batch_size, Sequence_length, Vocab_size]
[1, 7, 49152]
- Batch: 1 text(s) processed
- Sequence: 7 tokens in input
- Vocab: 49,152 possible next tokens
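
The same indexing can be tried on a tiny stand-in, without loading the model. This is only a sketch in plain Python: the nested lists play the role of the `[batch, sequence, vocab]` tensor, the scores are random, and the vocabulary size of 5 is an arbitrary choice for readability.

```python
import random

random.seed(0)
batch_size, seq_len, vocab_size = 1, 7, 5  # toy vocabulary of 5 instead of 49,152

# Nested lists standing in for a [batch, sequence, vocab] tensor of scores.
toy_logits = [
    [[random.uniform(-5, 5) for _ in range(vocab_size)] for _ in range(seq_len)]
    for _ in range(batch_size)
]

# Row i holds the scores for the token that would follow position i,
# so the last row is the one used to continue the text.
last_position_scores = toy_logits[0][-1]
greedy_id = max(range(vocab_size), key=lambda t: last_position_scores[t])

print(len(toy_logits), len(toy_logits[0]), len(toy_logits[0][0]))  # 1 7 5
print(len(last_position_scores))                                   # 5
```

`greedy_id` is simply the index of the largest score in that last row, which is what `argmax` computes on the real tensor.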


In [None]:
## 49,152 Possibilities: Watching the Model Think
From Raw Scores to Probabilities:
1. Converting logits to probabilities using softmax
2. Demonstrating why we can't just pick the highest logit every time
3. A simple example of sampling vs greedy selection
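
As a small illustration of the first point, softmax can be written in a few lines of plain Python. The three logits are made up for the example; the real model produces 49,152 of them per position.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.0, 0.1]  # made-up scores for a three-token vocabulary
probs = softmax(toy_logits)
print([round(p, 3) for p in probs])  # [0.659, 0.242, 0.099]
```

The ranking of the tokens does not change, so the highest logit is also the highest probability. What softmax adds is a distribution we can sample from.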

In [11]:
# Extract logits for the last token position (where next token will be predicted)
last_token_logits = op.logits[:, -1, :]  # Shape: [1, 49152]

# Find the token with highest probability (greedy selection)
predicted_token_id = last_token_logits.argmax(dim=-1)  # Gets index of max value

# convert the id to token
next_token = tokenizer.decode(predicted_token_id)

print(f"predicted_token_id : {predicted_token_id.item()}")
print(f"next token : `{next_token}`")

predicted_token_id : 198
next token : `
`


In [None]:
"""The Art of Selection: Why Randomness Matters
So far, we've gone from text to predictions. But here's the thing: if we always select the token with the highest logit score (greedy selection), our model becomes predictable and boring. It's like having a conversation with someone who always gives the most obvious response!

This is where sampling comes in. Instead of always picking the #1 choice, large language models introduce some randomness by sampling from the top-k highest-scoring tokens, where k is a number we can control.

Two parameters control this:

Temperature is like a creativity dial: 0.1 = very predictable, 1.5 = very creative. The logits are divided by it before softmax.
Top-k means 'only consider the k most likely tokens'. It cuts off the long tail of unlikely tokens, which tends to keep the output coherent."""
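
The effect of both parameters can be seen on a toy example before touching the model. This is a sketch in plain Python with made-up logits; `sample_top_k` is a hypothetical helper, not a library function.

```python
import math
import random

def softmax(logits):
    shift = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_top_k(logits, temperature=1.0, top_k=2, rng=random):
    """Keep the top_k highest logits, apply temperature, and sample one token ID."""
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    probs = softmax([logits[i] / temperature for i in ranked])
    return rng.choices(ranked, weights=probs, k=1)[0]

toy_logits = [2.0, 1.0, 0.1, -1.0]  # made-up scores for a four-token vocabulary

# Lower temperature concentrates probability on the top token.
for t in (0.1, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax([x / t for x in toy_logits])])

rng = random.Random(0)
picks = [sample_top_k(toy_logits, temperature=0.7, top_k=2, rng=rng) for _ in range(10)]
print(picks)  # only IDs 0 and 1 can appear, because top_k=2
```

At temperature 0.1 almost all of the probability sits on token 0, which behaves much like greedy selection. At 1.5 the distribution is flatter, so the other tokens are picked more often.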

In [12]:
import torch.nn.functional as F

def generate_next_token(text, temperature=1.0, top_k=50):
    """Simple function to show one step of text generation"""
    # Tokenize input
    tokens = tokenizer(text, return_tensors="pt")

    # Get model predictions
    with torch.no_grad():
        outputs = model(**tokens)

    # Get logits for next token prediction
    next_token_logits = outputs.logits[0, -1, :] / temperature

    # Get top-k most likely tokens
    top_logits, top_indices = torch.topk(next_token_logits, top_k)

    # Convert to probabilities and sample
    probs = F.softmax(top_logits, dim=-1)

    # Sample one index from the top-k probabilities
    next_token_idx = torch.multinomial(probs, 1)

    next_token_id = top_indices[next_token_idx]

    return tokenizer.decode(next_token_id)

In [13]:
[generate_next_token("my name is harshad", 0.7) for _ in range(6)]

[',', ' and', '."', ',', 'ell', '.']

In [None]:
## Putting It All Together: Building Our Generator
To generate more than one token, we append each new token to the end of the text and pass the longer text back to the model.
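
The loop itself can be sketched without the model. Below, `toy_next_token` is a hypothetical stand-in that looks up the next word in a small hand-written table; only the append-and-repeat structure carries over to the real generator.

```python
def toy_next_token(text):
    """Hypothetical stand-in for the model: picks the next word from a fixed table."""
    table = {"name": " is", "is": " Harshad", "Harshad": " and"}
    last_word = text.split()[-1]
    return table.get(last_word, " ...")

def toy_generate(prompt, max_tokens=3):
    current_text = prompt
    for _ in range(max_tokens):
        # Predict from everything generated so far, then append the result.
        current_text += toy_next_token(current_text)
    return current_text

result = toy_generate("My name")
print(result)  # My name is Harshad and
```

Each step sees the full text produced so far, which is why the output of one step changes what the next step predicts.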



In [14]:
def generate_text(prompt, max_tokens=10):
    current_text = prompt
    for i in range(max_tokens):
        next_token = generate_next_token(current_text, temperature=0.7)
        current_text += next_token
        print(f"Step {i+1}: {current_text}")
    return current_text

In [15]:
generate_text(text, 4)

Step 1: My Name is Harshad 7
Step 2: My Name is Harshad 7,
Step 3: My Name is Harshad 7, and
Step 4: My Name is Harshad 7, and our


'My Name is Harshad 7, and our'

In [None]:
""" That's all. It's just that simple."""