# AI Q&A Agent Using an Open-Source LLM from Hugging Face

This is the first notebook in a series of experiments in which I will build different AI agents using open-source LLMs from Hugging Face.

### Google Colab
I will use Google Colab to write and run the Python code for these AI agents. Why did I choose Google Colab instead of my local computer?
1. Free access to an NVIDIA T4 GPU (about 15 GB of memory), which is enough to run small and mid-sized open-source models efficiently.
2. Easy sharing of code and collaboration.

### Hugging Face
I need a Hugging Face account and an access token so that this Colab notebook can download and use the appropriate open-source LLM. Here are the steps:
1. Create a free Hugging Face account at https://huggingface.co
2. Navigate to Settings from the user menu on the top right.
3. Create a new API token with **write** permissions (a **read** token is enough if you only download models).
4. Back in this Colab notebook:
  * Press the "key" icon on the side panel to the left.
  * Click "Add new secret".
  * In the name field, enter `HF_TOKEN`.
  * In the value field, enter your actual token: hf_...
  * Ensure the notebook access switch is turned ON.

This way I can use confidential API keys for Hugging Face and other services without typing them into a Colab notebook that I will be sharing with others.

In [1]:
# Check GPU availability and specifications, such as memory usage, temperature, and power draw.
# The same details are available under Runtime (top menu) > View resources.
!nvidia-smi

Mon Dec 29 05:11:32 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   44C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
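The same check can be done from Python, which is handy when a later cell should behave differently without a GPU. This is a minimal sketch, not part of the original notebook: `gpu_summary` is a hypothetical helper name, and it assumes only that the `nvidia-smi` command is on the path when a GPU runtime is attached.

```python
import shutil
import subprocess


def gpu_summary():
    """Return one 'name, total memory' string per GPU, or [] if none is visible."""
    # nvidia-smi is missing entirely on CPU-only runtimes.
    if shutil.which("nvidia-smi") is None:
        return []
    completed = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code usually means the driver cannot see a GPU.
    if completed.returncode != 0:
        return []
    return [line.strip() for line in completed.stdout.splitlines() if line.strip()]


gpus = gpu_summary()
print(gpus if gpus else "No NVIDIA GPU detected; check Runtime > Change runtime type.")
```

On a CPU-only runtime the list is empty, so the message points to the runtime settings instead of failing.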
                                                

In [2]:
# Authenticate this Colab notebook with Hugging Face using the stored token, so that it can use open-source models.
# The huggingface_hub library lets us interact with the Hugging Face Hub, a platform that hosts open-source LLMs and datasets.

from huggingface_hub import login
from google.colab import userdata

hf_token = userdata.get('HF_TOKEN')
login(hf_token, add_to_git_credential=True)
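The cell above assumes the `HF_TOKEN` secret exists and that notebook access is switched on. A slightly more defensive lookup might look like the sketch below. `get_hf_token` is a hypothetical helper, and falling back to an environment variable is my assumption so the same code can also run outside Colab.

```python
import os


def get_hf_token(name="HF_TOKEN"):
    """Look up the token in Colab secrets first, then in environment variables."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            token = userdata.get(name)
        except Exception:
            # Raised when the secret is missing or notebook access is off.
            token = None
        if token:
            return token

    token = os.environ.get(name)
    if not token:
        raise RuntimeError(
            f"{name} not found. Add it as a Colab secret (key icon) "
            "or set it as an environment variable."
        )
    return token
```

With this helper, the login cell would become `login(get_hf_token(), add_to_git_credential=True)`, and a missing secret produces a readable error instead of a failed login.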

### Model Selection

I will select a model from the Hugging Face model library based on the specific application. Here are the steps:

* Go to https://huggingface.co/models.
* Click on Question Answering under NLP.
* Choose any model and review its specification.
* I am choosing the model at https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad

Note: We should select a model based on several criteria, such as the specific use case, the available infrastructure, latency, and performance. I will cover those in detail later.

### Approach 1 - Using Hugging Face Pipeline

This is the simpler approach. The Hugging Face pipeline API provides a high-level, task-specific interface for running inference with pretrained models, so there is no need to handle tokenization, preprocessing, or postprocessing manually.

This approach is ideal for quick experimentation or prototyping, when we don't need granular control over the model's behavior.

In [3]:
# Use a pipeline as a high-level helper
from transformers import pipeline

# Load the QA pipeline with the desired task and the model
qa = pipeline(
    task="question-answering",
    model="distilbert/distilbert-base-cased-distilled-squad"
)

config.json:   0%|          | 0.00/473 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt: 0.00B [00:00, ?B/s]

tokenizer.json: 0.00B [00:00, ?B/s]

Device set to use cuda:0


In [4]:
# Application 1

# Sample input: a question whose answer appears verbatim in the context
question = "Who wrote the play Hamlet?"
context = "Hamlet is a famous tragedy written by William Shakespeare."

# Run inference, passing the question and context as keyword arguments
result = qa(question=question, context=context)

print(result)



{'score': 0.9790515899658203, 'start': 38, 'end': 57, 'answer': 'William Shakespeare'}
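The `start` and `end` fields are character offsets into the context, and `answer` is simply that slice of the text. The small check below reuses the result printed above; it is plain Python and does not call the model again.

```python
# Context and result copied from the cell output above.
context = "Hamlet is a famous tragedy written by William Shakespeare."
result = {
    "score": 0.9790515899658203,
    "start": 38,
    "end": 57,
    "answer": "William Shakespeare",
}

# Slice the context with the reported character offsets.
span = context[result["start"]:result["end"]]
print(span)                      # William Shakespeare
print(span == result["answer"])  # True
```

This is why an extractive QA model like this one can only return text that already appears in the context.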


In [5]:
# Application 2

# Sample input: a longer context, with the answer contained in one of its sentences
question = "What is Sagrada Família?"
context = """
Barcelona boasts a history that stretches back to Roman times. Explore the narrow, winding streets of the Gothic Quarter,
the ancient heart of the city, where medieval buildings, churches, and palaces whisper tales of centuries past. You can
even see remnants of Roman walls incorporated into the city's fabric. Prepare to be amazed by Barcelona's architectural
prowess. The city is a treasure trove of Modernist and Art Nouveau masterpieces. Antoni Gaudí's works are simply mesmerizing.
The iconic Sagrada Família is an absolute must-see, a colossal temple that has become a symbol of the city. Don't miss his
other incredible creations like Casa Batlló, Casa Milà (La Pedrera), and Güell Park, all designated as UNESCO World Heritage
sites. Discover the stunning Music Palace, another UNESCO World Heritage site, showcasing the remarkable talent of architects
like Luis Doménechi Montaner. Stroll hand-in-hand down Las Ramblas, Barcelona's most famous promenade. This lively boulevard,
lined with trees, is a delightful spectacle of flower stalls, kiosks selling books and newspapers, and a vibrant street life
that leads you down to the port and the impressive Columbus Monument. From its ancient core to the expansive L'Eixample
district, designed with geometric blocks and open spaces, Barcelona showcases a fascinating evolution. While modern buildings
add to the skyline, the city's soul lies in its historical layers and the artistic flair that permeates its streets.
"""

# Run inference
result = qa(question=question, context=context)

print(result)

{'score': 0.291929692029953, 'start': 545, 'end': 562, 'answer': 'a colossal temple'}
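The score here (about 0.29) is much lower than in the first application (about 0.98), so the model is far less sure about this span. One simple way to act on that is to accept an answer only above a minimum score. The sketch below is illustrative: `answer_or_none` is a hypothetical helper, and the 0.5 threshold is an arbitrary assumption that would need tuning for a real application.

```python
def answer_or_none(result, min_score=0.5):
    """Return the answer text if its score reaches min_score, otherwise None."""
    return result["answer"] if result["score"] >= min_score else None


# Results copied from the two cell outputs in this notebook.
hamlet = {"score": 0.9790515899658203, "start": 38, "end": 57,
          "answer": "William Shakespeare"}
sagrada = {"score": 0.291929692029953, "start": 545, "end": 562,
           "answer": "a colossal temple"}

print(answer_or_none(hamlet))   # William Shakespeare
print(answer_or_none(sagrada))  # None
```

A low score does not always mean a wrong answer ("a colossal temple" is a reasonable reply here), so a threshold like this is better used to flag answers for review than to discard them silently.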
