<a href="https://colab.research.google.com/github/RamsesMDLC/Smolagent_Project_1/blob/main/Smolagents_Project_1_YT.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**1. LOADING LIBRARIES / MODULES / CLASSES**



In [1]:
#Installs the smolagents library along with extensions defined in the [toolkit] option.
!pip install smolagents[toolkit]

#Components from smolagents
  #CodeAgent: The "agent". It orchestrates reasoning and tool usage.
  #DuckDuckGoSearchTool: The "tool". It lets the agent fetch information from the web.
  #TransformersModel: The "model". A wrapper that loads a Hugging Face Transformers model and runs it locally for the agent.
from smolagents import CodeAgent, DuckDuckGoSearchTool, TransformersModel

#API key
  #Provides a secure way to access stored secrets (like API tokens) within Google Colab.
from google.colab import userdata
  #Allows programmatic login to Hugging Face Hub.
from huggingface_hub import login

#Tokenizer: class in the Hugging Face Transformers library to process text inputs ("prompts or text") and outputs ("answer") for the model.
  #This means AutoTokenizer forms the bridge:
    #Input text → tokens/tensors → Model
      #Splitting text into tokens (smaller pieces such as words or subwords).
      #Converting these tokens into numbers ("tensors"), called input IDs, which the model uses for computation.
      #Managing extra elements like special tokens (e.g., [CLS], [SEP], padding).
    #Model output tokens/tensors → decoded text
  #It automatically loads and configures the correct tokenizer for a specified model (i.e., there’s no need to know the model-specific tokenizer class).
from transformers import AutoTokenizer

Collecting smolagents[toolkit]
  Downloading smolagents-1.21.3-py3-none-any.whl.metadata (16 kB)
Collecting ddgs>=9.0.0 (from smolagents[toolkit])
  Downloading ddgs-9.5.5-py3-none-any.whl.metadata (18 kB)
Collecting markdownify>=0.14.1 (from smolagents[toolkit])
  Downloading markdownify-1.2.0-py3-none-any.whl.metadata (9.9 kB)
Collecting primp>=0.15.0 (from ddgs>=9.0.0->smolagents[toolkit])
  Downloading primp-0.15.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (13 kB)
Collecting lxml>=6.0.0 (from ddgs>=9.0.0->smolagents[toolkit])
  Downloading lxml-6.0.1-cp312-cp312-manylinux_2_26_x86_64.manylinux_2_28_x86_64.whl.metadata (3.8 kB)
Downloading ddgs-9.5.5-py3-none-any.whl (37 kB)
Downloading markdownify-1.2.0-py3-none-any.whl (15 kB)
Downloading smolagents-1.21.3-py3-none-any.whl (145 kB)
Downloading lxml-6.0.1-cp312-cp312-manylinux_2_26_x8

#**2. GETTING THE TOKEN**

In [2]:
# Securely get Hugging Face token and login
hf_token = userdata.get('HF_TOKEN')
if hf_token:
    login(hf_token)
    print("Successfully logged in to Hugging Face!")
else:
    print("Token not found. Please add HF_TOKEN secret.")

Successfully logged in to Hugging Face!


#**3. DEFINING MODEL / TOKENIZER / PAD**


The log output of the cell below shows the model and tokenizer files being downloaded from the Hugging Face Hub, along with a warning about a default generation setting. Each line corresponds to a specific file that defines how the model or tokenizer works.

Tokenizer Files Being Downloaded

When you run:

python
tokenizer = AutoTokenizer.from_pretrained(model_id)

Hugging Face automatically downloads the tokenizer artifacts from the repository Qwen/Qwen1.5-1.8B. These include:

    tokenizer_config.json (1.29kB)
    Contains metadata about the tokenizer, such as whether it lowercases text, what type of tokenizer is being used (e.g., BPE, SentencePiece), and any special tokens (CLS, SEP, etc.).

    vocab.json (2.78MB)
    Stores the vocabulary mapping for the tokenizer: a dictionary linking each token string to an ID (e.g., "elephant" -> 5031). This is used to convert text into token IDs.

    merges.txt (1.67MB)
    Defines the merge operations for a Byte Pair Encoding (BPE) tokenizer. It describes how characters and subwords are combined into larger tokens.

    tokenizer.json (7.03MB)
    A consolidated JSON file that includes vocabulary, merges, and tokenizer configuration in one place. This is often faster and more convenient to load.
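
To make the roles of vocab.json and merges.txt concrete, here is a minimal sketch of BPE encoding. The vocabulary, the merge rules, and the function name `bpe_encode` are invented for illustration; the real Qwen tokenizer works on bytes and has far more entries, so treat this as a simplified picture rather than the library's implementation.

```python
# Toy vocabulary and merge rules; the real files hold many thousands of entries.
vocab = {"l": 0, "o": 1, "w": 2, "lo": 3, "low": 4}
merges = [("l", "o"), ("lo", "w")]  # ordered by priority, like the lines of merges.txt


def bpe_encode(word, merges, vocab):
    """Apply merge rules in priority order, then map the tokens to IDs."""
    tokens = list(word)  # start from single characters
    for left, right in merges:
        i = 0
        while i < len(tokens) - 1:
            if tokens[i] == left and tokens[i + 1] == right:
                tokens[i:i + 2] = [left + right]  # fuse the pair into one token
            else:
                i += 1
    return tokens, [vocab[t] for t in tokens]


tokens, ids = bpe_encode("low", merges, vocab)
print(tokens, ids)  # ['low'] [4]
```

The merge list decides how characters fuse into subwords, and the vocabulary only supplies the final token-to-ID lookup.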

Model Files Being Downloaded

Later, when you load the model:

python
model = TransformersModel(model_id=model_id)

the files specific to the model are fetched:

    config.json (662B)
    Stores model architecture parameters (e.g., hidden size, number of layers, attention heads, max position embeddings).

    model.safetensors (3.67GB)
    The actual pretrained model weights in the safetensors format (preferred over .bin for safety and efficiency). This file accounts for almost all of the download because it holds every one of the model's roughly 1.8 billion parameters.

    generation_config.json (138B)
    Stores default parameters for text generation, like temperature, top-k, top-p, max length, and repetition penalty.
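
As a rough sanity check on the size of model.safetensors, the sketch below estimates the parameter count from the file size. It assumes the weights are stored in a 16-bit format (2 bytes per parameter), which is an assumption and not something the log states; the function name `estimate_param_count` is made up for this example.

```python
def estimate_param_count(file_size_bytes, bytes_per_param=2):
    """Estimate the number of parameters from the weight file size.

    bytes_per_param=2 assumes 16-bit weights (an assumption, not read from the file).
    """
    return file_size_bytes / bytes_per_param


size_bytes = 3.67e9  # model.safetensors size reported in the download log
params = estimate_param_count(size_bytes)
print(f"about {params / 1e9:.1f}B parameters")
```

Under that assumption the estimate lines up with the "1.8B" in the model name.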

Warning Message

text
WARNING:smolagents.models:`max_new_tokens` not provided, using this default value for `max_new_tokens`: 4096

    This means that when you later call the model to generate text, if you don’t explicitly set max_new_tokens, the system will fall back to a default of 4096 tokens.

    max_new_tokens caps the length of the model’s response. The default of 4096 is generous; for faster runs you’ll often want to set a smaller value explicitly (e.g., max_new_tokens=200).
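
The effect of the cap can be sketched with a toy generation loop. The functions `generate` and `never_stops` below are hypothetical stand-ins, not smolagents or Transformers code; they only show that generation ends at the end-of-sequence token or when the budget is spent, whichever comes first. Judging by the warning text, the real value can be supplied when constructing the model, e.g. `TransformersModel(model_id=model_id, max_new_tokens=200)`, but check the smolagents documentation for your version.

```python
def generate(next_token, prompt_ids, eos_id, max_new_tokens):
    """Append tokens until EOS appears or the new-token budget is spent."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        token = next_token(ids)
        ids.append(token)
        if token == eos_id:
            break
    return ids


# Hypothetical stand-in for a model that always emits token 7 and never emits EOS.
def never_stops(ids):
    return 7


out = generate(never_stops, prompt_ids=[1, 2, 3], eos_id=0, max_new_tokens=5)
print(len(out) - 3)  # 5 new tokens, because the cap stopped the loop
```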

Why This Happens

When you run the code the first time, Hugging Face:

    Checks if the tokenizer/model files are local.

    If not, downloads them from the Hugging Face Hub.

    Caches them locally (by default under ~/.cache/huggingface/hub/; older versions of transformers used ~/.cache/huggingface/transformers/).

    Prints a live download progress bar (those lines with kB/s and MB/s speeds).

    Applies default settings if you haven’t provided explicit overrides (like max_new_tokens).
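
The "check locally first" step can be sketched by computing where a repository would be cached. This is a simplified approximation of how recent versions of huggingface_hub resolve the cache folder (default ~/.cache/huggingface/hub, overridable with the HF_HUB_CACHE or HF_HOME environment variables); the helper names `hub_cache_dir` and `repo_cache_folder` are made up for this example, and other overrides are ignored.

```python
import os
from pathlib import Path


def hub_cache_dir(environ):
    """Resolve the Hub cache directory (simplified; ignores other overrides)."""
    if environ.get("HF_HUB_CACHE"):
        return Path(environ["HF_HUB_CACHE"])
    hf_home = environ.get("HF_HOME") or str(Path.home() / ".cache" / "huggingface")
    return Path(hf_home) / "hub"


def repo_cache_folder(repo_id, environ=None):
    """Return the folder where a model repository would be cached."""
    environ = os.environ if environ is None else environ
    return hub_cache_dir(environ) / ("models--" + repo_id.replace("/", "--"))


folder = repo_cache_folder("Qwen/Qwen1.5-1.8B")
print(folder, "(cached)" if folder.exists() else "(not downloaded yet)")
```

If the folder already exists, later runs can reuse the cached files and the progress bars finish almost instantly.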

So the meaning of the output is essentially: “Hugging Face is downloading and caching all the necessary tokenizer and model files to run Qwen/Qwen1.5-1.8B, and it’ll use a default generation max length of 4096 tokens unless you specify otherwise.”


In [3]:
#Defining Model (from Hugging Face)
model_id = "Qwen/Qwen1.5-1.8B"

#Initialize tokenizer
  #Load a pretrained tokenizer for the given model identified by model_id (in this case "Qwen/Qwen1.5-1.8B")
    #The tokenizer includes vocabulary, tokenization rules, special tokens, and associated settings needed to convert raw text into token IDs.
tokenizer = AutoTokenizer.from_pretrained(model_id)

#Check whether the tokenizer has a designated padding token.
  #Padding tokens are used to make all input sequences the same length by adding special "pad" tokens to shorter sequences.
    #Real-world texts vary in length, so to batch-process multiple sequences efficiently, shorter sequences are padded with these special tokens until they match the longest sequence length in the batch.
    #Padding tokens carry no meaningful information and are meant only to fill space for model input consistency.
  #Padding tokens ensure stable input preprocessing and model compatibility during tokenization and generation.
  #If the padding token is not set, the tokenizer or model might throw errors during inference or training.
  #By assigning the EOS (end-of-sequence) token as padding, the code ensures compatibility even when a dedicated pad token is not defined for the particular model.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Load the model
model = TransformersModel(model_id=model_id)

# Fix pad_token_id in model config if not set
if model.model.config.pad_token_id is None:
    model.model.config.pad_token_id = tokenizer.pad_token_id

tokenizer_config.json: 0.00B [00:00, ?B/s]

vocab.json: 0.00B [00:00, ?B/s]

merges.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]



config.json:   0%|          | 0.00/662 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/3.67G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/138 [00:00<?, ?B/s]


agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

    CodeAgent: The agent class from smolagents. It runs a language model in a loop, has the model write Python code to solve the task, and lets that code call external tools.

    tools=[DuckDuckGoSearchTool()]: The agent can access web search via DuckDuckGo, so it can pull in fresh information to supplement the model’s reasoning.

    model=model: The language model the agent uses for reasoning, here the TransformersModel instance loaded in the previous cell.
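
The idea of an agent that alternates between a model and its tools can be sketched as a small loop. Everything below is hypothetical: `fake_search`, `scripted_model`, and `run_agent` are stand-ins invented for illustration, and the sketch reduces each step to an (action, argument) pair, whereas the real CodeAgent has the model write Python code that calls the tools.

```python
def fake_search(query):
    """Hypothetical stand-in for a web search tool."""
    return f"results for: {query}"


def scripted_model(history):
    """Hypothetical stand-in for an LLM: search once, then answer."""
    if not any(step.startswith("observation:") for step in history):
        return ("search", "elephant walking speed")
    return ("final_answer", "estimate based on " + history[-1])


def run_agent(task, model, tools, max_steps=3):
    """Ask the model for an action, run the tool, feed the result back."""
    history = [f"task: {task}"]
    for _ in range(max_steps):
        action, argument = model(history)
        if action == "final_answer":
            return argument
        observation = tools[action](argument)
        history.append(f"observation: {observation}")
    return None  # step budget exhausted without a final answer


answer = run_agent("How fast do elephants walk?", scripted_model, {"search": fake_search})
print(answer)
```

The step limit matters in practice: without it, a model that never produces a final answer would loop indefinitely.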


-----------------

input_text = "How long would it take for an elephant to cross the United States from Florida to California?"
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True)



    input_text: The user’s query in plain text.

    tokenizer: Converts text into tokens (numerical IDs) suitable for the model.

    return_tensors="pt": Returns PyTorch tensors (instead of lists or arrays).

    padding=True: Ensures uniform sequence lengths by filling shorter sequences with padding tokens.

    truncation=True: Ensures long sequences are cut down to fit the model’s input size.

This produces a dictionary inputs containing:

    input_ids: Token IDs for the sequence.

    attention_mask: A binary mask (1s for real tokens, 0s for padding) that tells the model which positions to attend to.
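
What padding and the attention mask look like is easier to see with plain lists. The sketch below pads on the right with a made-up pad ID of 0; the function `pad_batch` and the token IDs are invented for illustration and do not come from the real tokenizer.

```python
def pad_batch(sequences, pad_id):
    """Right-pad every sequence to the longest length and build the mask."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


batch = pad_batch([[11, 12, 13], [21]], pad_id=0)
print(batch["input_ids"])       # [[11, 12, 13], [21, 0, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 0, 0]]
```

With a single input sentence, as in this notebook, there is nothing to pad, so the attention mask is all ones.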


---------------


input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

    These are extracted separately for clarity, and will be used in the generation call.


----------------
generated_ids = model.model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    pad_token_id=model.model.config.pad_token_id,
    max_new_tokens=50
)




    model.model: Accesses the underlying Transformers model inside the TransformersModel wrapper (the agent is not involved in this call).

    generate: Method used for text generation (e.g., greedy search, beam search, or sampling, depending on the generation settings in effect).

    input_ids, attention_mask: Ensure the model knows exactly which parts of the input are real text vs padding.

    pad_token_id: In case the model needs to pad internally, this specifies the right padding token ID.

    max_new_tokens=50: Limits how many new tokens the model can generate beyond the input.

This produces generated_ids, which are the token IDs for the generated text.
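
Greedy search, the simplest of the strategies mentioned above, picks the highest-scoring token at every step. The sketch below uses hypothetical scores over a four-token vocabulary; in a real model the scores for each step depend on all previous tokens, which this toy example leaves out.

```python
def greedy_pick(scores):
    """Return the index of the highest score (ties go to the lowest index)."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical scores over a 4-token vocabulary for three successive steps.
step_scores = [
    [0.1, 2.5, 0.3, 0.0],
    [1.2, 0.4, 0.9, 3.1],
    [4.0, 0.2, 0.1, 0.5],
]
generated = [greedy_pick(scores) for scores in step_scores]
print(generated)  # [1, 3, 0]
```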

--------------------------------

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print("Generated text:")
print(generated_text)



    tokenizer.decode: Converts the generated token IDs back into a readable string.

    skip_special_tokens=True: Removes special tokens, such as padding and end-of-sequence markers, from the decoded text.

    Prints the final model response.
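
One detail worth knowing: for decoder-only models such as this one, the output of generate contains the prompt tokens followed by the new tokens, so the decoded text begins with the original question. The sketch below shows the slicing idea with made-up IDs in plain lists; with real tensors the equivalent slice would be `generated_ids[0][input_ids.shape[1]:]`.

```python
prompt_ids = [101, 102, 103]                 # hypothetical prompt token IDs
generated_ids = [[101, 102, 103, 7, 8, 9]]   # prompt followed by the new tokens

# Keep only what the model added after the prompt.
new_token_ids = generated_ids[0][len(prompt_ids):]
print(new_token_ids)  # [7, 8, 9]
```

Decoding only the sliced IDs prints the model's continuation without repeating the input.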

Overall Workflow

    Create an agent that can use search tools and a language model.

    Tokenize the input user query into model-readable form.

    Pass the tokenized inputs to the model’s generate function, ensuring masking and padding work properly.

    Decode the model’s output back into natural language and display it.

------------------------------------



In [None]:
# Initialize agent
agent = CodeAgent(tools=[DuckDuckGoSearchTool()], model=model)

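# Prepare input text and tokenize with attention mask
input_text = "How long would it take for an elephant to cross the United States from Florida to California?"
# Move the tensors to the model's device (e.g., the GPU) so generate() receives inputs on the same device as the weights
inputs = tokenizer(input_text, return_tensors="pt", padding=True, truncation=True).to(model.model.device)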

input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

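# agent.run() takes the task as plain text and handles tokenization itself,
# so it offers no way to pass an attention_mask. This cell therefore bypasses
# the agent (and its search tool) and calls the underlying model directly: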

generated_ids = model.model.generate(
    input_ids=input_ids,
    attention_mask=attention_mask,
    pad_token_id=model.model.config.pad_token_id,
    max_new_tokens=50
)

# Decode generated tokens
generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print("Generated text:")
print(generated_text)