<a href="https://drive.google.com/file/d/16rDHV376TTYXCDORLuTIq9JUA-DciKyN/view?usp=sharing" target="_blank" >
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

  

| | |
|:---:|:---|
| <img src="https://drive.google.com/uc?id=1i87oxReRQv7rLqFuZKCPeLCh2zy8RQUU" width="300"/> |  <strong><font size=5>Future x Summer School 2025</font></strong><br><br><strong><font color="#1A54A6" size=5>LLMs<br>Lab 2 Part B: Embeddings and Introduction to the HuggingFace API</font></strong>|

---



**Instructor:**  
Pavlos Protopapas  

**Teaching Team:**  
Nawang Thinley Bhutia





In this notebook, we will use the Hugging Face API to explore a few open-source language models and their performance on simple language tasks.


By the end of this tutorial, you will have a solid understanding of how to leverage the Transformers library.

---

<font color ='#CE6DFF'>**Note:**</font> This lab asks you to fill in a few blanks, marked with `"---"` or `## YOUR CODE HERE`.

---




## Table of Contents
1. **Part A**
    - Embeddings recap
    - Key Components
2. **Part B**
    - Introduction to HuggingFace
    - Getting started with the HuggingFace API
    - Using pretrained models for NLP tasks
    - Understanding implementation (BONUS)
    - Evaluation (BONUS)
    - Quick Gradio DEMO (Toy Deployment)

---



### What is Hugging Face?

[`Hugging Face`](https://huggingface.co/) is a company that specializes in natural language processing (NLP) technologies. It provides one of the most popular platforms for state-of-the-art machine learning models, particularly those designed for tasks like text analysis, language understanding, and generation. Hugging Face is widely recognized for its Transformers library, which offers easy access to pre-trained models that can perform a range of NLP tasks.

![alt text](https://drive.google.com/uc?id=1oG1s7346pjEn_A_EOS1QT6obiAD9o050)


### How does the Hugging Face API work?

Hugging Face's Transformers library provides models hosted on the public model hub, which can be downloaded and run locally without an API key. When you call a function like `pipeline`, the library automatically downloads the specified model from the hub if it is not already cached on your machine. The computation then runs locally on your device, not on Hugging Face's servers.

## Setting Up Your Environment in Google Colab

### Why Google Colab?

- Google Colab provides a cloud-based environment that allows you to write, run, and share Python code through the browser.
- It is especially useful for machine learning and data analysis because it offers free (though limited) access to GPUs and TPUs, which makes it convenient for experimenting with pretrained models.

### Preparing Colab

- To make the most of Google Colab for this tutorial, we need to ensure that the environment is correctly set up with all the necessary libraries and configurations.

### Installation Steps (Optional / If Available)

- Ensure GPU availability: first, make sure that your notebook is set to use a GPU, which will speed up the model operations significantly.
  - Go to Runtime > Change runtime type in the Colab menu.
  - Select GPU from the Hardware accelerator dropdown list and click Save.

> ### **Important Notice**

`When running this notebook on Colab, you may run out of available memory (RAM). This is expected because we load many different models. If it happens, run the imports again and then run only the sections you still need.`

## Importing the transformers library

In [1]:
#check transformers version
!pip show transformers

Name: transformers
Version: 4.53.3
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, sentence-transformers


In [2]:
import subprocess
import sys

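# pip distribution names; note that __import__ below expects import names, which
# differ for some packages (huggingface-hub -> huggingface_hub, pyyaml -> yaml)
required_packages = ['filelock', 'huggingface-hub', 'numpy', 'packaging', 'pyyaml', 'regex', 'requests', 'safetensors', 'tokenizers', 'tqdm']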

for package in required_packages:
    try:
        __import__(package)
        print(f"{package} is already installed.")
    except ImportError:
        print(f"{package} not found. Installing...")
        subprocess.check_call([sys.executable, "-m", "pip", "install", package])
        print(f"{package} installed.")

filelock is already installed.
huggingface-hub not found. Installing...
huggingface-hub installed.
numpy is already installed.
packaging is already installed.
pyyaml not found. Installing...
pyyaml installed.
regex is already installed.
requests is already installed.
safetensors is already installed.
tokenizers is already installed.
tqdm is already installed.
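
The log above reports `huggingface-hub` and `pyyaml` as "not found" although `pip show` lists them as requirements of an installed package. A likely explanation is that the import name is not always the pip name. Below is a minimal sketch of a check that translates one into the other before looking the module up. The `IMPORT_NAMES` table and the `is_importable` helper are illustrative names and are not part of the original notebook.

```python
import importlib.util

# Hypothetical mapping from pip distribution name to import name.
# Only packages whose two names differ need an entry.
IMPORT_NAMES = {
    "huggingface-hub": "huggingface_hub",
    "pyyaml": "yaml",
}

def is_importable(pip_name: str) -> bool:
    """Return True if the module behind a pip package can be located."""
    module_name = IMPORT_NAMES.get(pip_name, pip_name)
    return importlib.util.find_spec(module_name) is not None

# Report availability without installing anything.
for name in ["numpy", "pyyaml", "huggingface-hub"]:
    status = "available" if is_importable(name) else "missing"
    print(f"{name}: {status}")
```

Using `find_spec` only locates the module and does not execute it, which keeps the check cheap for large packages.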


In [3]:
# Import specific classes from transformers
from transformers import TFAutoModel, AutoModelForSequenceClassification, AutoTokenizer
from transformers.pipelines import pipeline

In [4]:
#importing tensorflow
import tensorflow as tf

In [5]:
# Checking the available GPU
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

# Print the TensorFlow version
print("TensorFlow version:", tf.__version__)

Sat Jul 26 12:24:05 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   48C    P8             10W /   70W |       2MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### **1. Sentiment Analysis**

In [6]:
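# **Model 1: BERT** (base checkpoint, not fine-tuned for sentiment; see the warning
# in the output, which means the label and score are not meaningful)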
classifier = pipeline('sentiment-analysis', model='bert-base-cased')
result = classifier("I love using Hugging Face!")
print(result)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/436M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

Device set to use cuda:0


[{'label': 'LABEL_1', 'score': 0.6020336151123047}]


In [7]:
# **Model 2: RoBERTa** (base checkpoint, also not fine-tuned for sentiment)
classifier = pipeline('sentiment-analysis', model='roberta-base')
result = classifier("I love using Hugging Face!")
print(result)

config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0


[{'label': 'LABEL_1', 'score': 0.5032063126564026}]
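
Both outputs above carry a generic label (`LABEL_1`) and a score close to 0.5, which fits the warning that the classification heads are newly initialized. The sketch below shows the arithmetic with a plain softmax over two scores. The logit values are made up for illustration and are not taken from either model.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Made-up logits for a two-class head (assumption: an untrained head tends to
# produce scores that are close together, a fine-tuned one scores far apart).
examples = {
    "untrained head": [0.00, 0.41],
    "fine-tuned head": [-3.0, 3.1],
}

for name, logits in examples.items():
    probs = softmax(logits)
    best = probs.index(max(probs))
    print(f"{name}: LABEL_{best} score={probs[best]:.3f}")
```

A score near 0.5 on a two-class problem therefore says little more than a coin flip, which is why a checkpoint fine-tuned for the task is used next.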


Let's add a new sentence...

In [8]:
sample_sentence = "ChatGPT is my favorite language model for creative writing and problem solving!"

In [9]:
# **Model 3: DistilBERT** (fine-tuned on SST-2, so it returns POSITIVE/NEGATIVE labels)
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
result = classifier(sample_sentence)
print(result)

config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9978591799736023}]


### **2. Text Generation**

Set the maximum length of the output in the next cell.

**Note**: try a value between 30 and 50 to start.

In [10]:
# [Supplement] Set the maximum number of new tokens to generate
max_new_tokens = 40



In [11]:
# [Supplement] Set the number of sequences to return
num_return_sequences = 1



In [12]:
# [Supplement] Set whether to truncate the input text
truncation = True

In [13]:
# **Model 1: GPT-2**
generator = pipeline('text-generation', model='gpt2', device=0)
result = generator("Once upon a time", max_new_tokens=max_new_tokens, num_return_sequences=num_return_sequences, truncation=truncation)
print(result[0]['generated_text'])

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once upon a time, they were all a little like a team, one being very well-equipped to handle the task and another being very well-equipped to handle the task. It's not a huge surprise that a
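
The GPT-2 output above stops mid-sentence because generation ends once the token budget is used up. To make the role of `max_new_tokens` concrete, here is a toy sketch of an autoregressive loop. The `NEXT_TOKENS` table is invented and only stands in for a real language model; it is not how GPT-2 chooses tokens.

```python
import random

# Hypothetical next-token table standing in for a language model.
NEXT_TOKENS = {
    "Once": ["upon"],
    "upon": ["a"],
    "a": ["time", "team"],
    "time": [",", "<eos>"],
    ",": ["a"],
    "team": ["<eos>"],
}

def generate(prompt_tokens, max_new_tokens, seed=0):
    """Append one token at a time until the budget is spent or <eos> is drawn."""
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKENS.get(tokens[-1], ["<eos>"])
        next_token = rng.choice(candidates)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens

prompt = ["Once", "upon", "a", "time"]
for budget in (2, 5):
    out = generate(prompt, max_new_tokens=budget)
    print(f"budget={budget} new_tokens={len(out) - len(prompt)} text={' '.join(out)}")
```

The budget counts only newly generated tokens, so the prompt length does not eat into it.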


In [None]:
# **Model 2: CTRL**
generator = pipeline('text-generation', model='ctrl', device=0)
result = generator("Legal Disclaimer:", max_new_tokens="---", num_return_sequences="---")
print(result[0]['generated_text'])

config.json:   0%|          | 0.00/635 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/6.55G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/6.55G [00:00<?, ?B/s]

### **3. Question Answering**

For basic question answering, we can use a model fine-tuned on SQuAD (the Stanford Question Answering Dataset), a standard benchmark for QA tasks.

In [None]:
# Load a QA model
qa_pipeline = pipeline("question-answering")

# Set context and question
context = """
Tesla, Inc. is an American electric vehicle and clean energy company based in Palo Alto, California.
Tesla's current products include electric cars, battery energy storage from home to grid scale, solar panels and solar roof tiles,
as well as other related products and services.
"""
question = "What does Tesla produce besides electric cars?"

# Get the answer
answer = qa_pipeline(question=question, context=context)
print(answer['answer'])
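
Models fine-tuned on SQuAD are extractive: the answer is a span copied from the context rather than newly written text. A simplified sketch of the span selection step is shown below, assuming the model has already produced one start score and one end score per token. The scores and the `best_span` helper are made up for illustration, and the real pipeline does more work (for example, mapping tokens back to character offsets).

```python
def best_span(start_scores, end_scores, max_answer_len=15):
    """Pick (start, end) with start <= end that maximizes start + end score."""
    best, best_score = (0, 0), float("-inf")
    for i, start_score in enumerate(start_scores):
        # Only consider ends at or after the start, within the length limit.
        for j in range(i, min(i + max_answer_len, len(end_scores))):
            score = start_score + end_scores[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Made-up scores over a tokenized context, for illustration only.
tokens = ["Paris", "is", "the", "capital", "of", "France", "."]
start_scores = [4.0, 0.1, 0.2, 0.5, 0.0, 1.0, -1.0]
end_scores = [3.5, 0.0, 0.1, 0.9, 0.0, 1.2, -1.0]

(start, end), score = best_span(start_scores, end_scores)
print("answer:", " ".join(tokens[start:end + 1]))
```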

We can also choose a specific model ourselves.

In [None]:
# **Model 1: BERT** (base checkpoint without a fine-tuned QA head, so expect unreliable answers)
qa_pipeline = pipeline('question-answering', model='bert-base-cased')
context = "Hugging Face is a company that develops tools for building applications using machine learning."
question = "What does Hugging Face do?"
result = qa_pipeline({'question': question, 'context': context})
print(result)

In [None]:
# **Model 2: DistilBERT**
qa_pipeline = pipeline('question-answering', model='distilbert-base-uncased-distilled-squad')
result = qa_pipeline({'question': question, 'context': context})
print(result)

In [None]:
# **Model 3: RoBERTa**
qa_pipeline = pipeline('question-answering', model='deepset/roberta-base-squad2')
result = qa_pipeline({'question': "What is the capital of France?", 'context': "Paris is the capital of France."})
print(result)

### **4. Translation**

In [None]:
# [Supplement] Add a sentence for testing translation
test_sentence = "Artificial intelligence is transforming the way we live and work."

In [None]:
# **Model 1: MarianMT**
translator = pipeline('translation_en_to_fr', model='Helsinki-NLP/opus-mt-en-fr')
result = translator(test_sentence)
print(result)

In [None]:
# **Model 2: T5**
translator = pipeline('translation_en_to_de', model='t5-small')
result = translator(test_sentence)
print(result)

**Bonus sentence**

In [None]:
translator = pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')
result = translator("Professor Pavlos is the best DJ")
print(result)

### **5. Named Entity Recognition (NER)**

In [None]:
# **Model 1: Bert Large Cased**

# Load the NER pipeline
ner_model = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")

# Process text
ner_results = ner_model("Hugging Face is based in New York and was founded by Clement Delangue.")
print(ner_results)
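
The `ner` pipeline returns one prediction per token by default, so a name such as "New York" arrives as separate pieces. The sketch below groups adjacent tokens of the same entity type into spans by hand. The `token_preds` records are made up and carry only a subset of the fields the pipeline returns; the pipeline also accepts an `aggregation_strategy` argument that performs this kind of grouping for you.

```python
def group_entities(text, token_preds):
    """Merge adjacent tokens with the same entity type into one span."""
    groups = []
    for pred in token_preds:
        # Drop the B-/I- prefix so "B-LOC" and "I-LOC" share one type.
        ent_type = pred["entity"].split("-")[-1]
        starts_new = pred["entity"].startswith("B-")
        last = groups[-1] if groups else None
        # Extend the previous group when the type matches and the tokens
        # touch or are separated by a single character (a space).
        if (last and not starts_new and last["type"] == ent_type
                and pred["start"] - last["end"] <= 1):
            last["end"] = pred["end"]
        else:
            groups.append({"type": ent_type, "start": pred["start"], "end": pred["end"]})
    return [(g["type"], text[g["start"]:g["end"]]) for g in groups]

text = "Hugging Face is based in New York"
# Made-up token-level predictions, for illustration only.
token_preds = [
    {"entity": "I-ORG", "word": "Hu", "start": 0, "end": 2},
    {"entity": "I-ORG", "word": "##gging", "start": 2, "end": 7},
    {"entity": "I-ORG", "word": "Face", "start": 8, "end": 12},
    {"entity": "I-LOC", "word": "New", "start": 25, "end": 28},
    {"entity": "I-LOC", "word": "York", "start": 29, "end": 33},
]
print(group_entities(text, token_preds))
```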

Find the correct model name on the Hugging Face website and fill it in here.

In [None]:
# **Model 2: DistilBERT NER model**
ner_distilbert = pipeline("ner", model="Davlan/distilbert-base-multilingual-cased-ner-hrl")

# Process another example
distilbert_ner_results = ner_distilbert("Apple is looking at buying U.K. startup for $1 billion.")
print(distilbert_ner_results)

### **6. Summarization**

Refer to the Hugging Face docs and complete this section on your own. Try changing important parameters like `max_length` and `min_length`, and assess the responses.

In [None]:
## YOUR CODE HERE

# [Supplement] Complete text summarization example
# Define input text
input_text = """
The Orbiter Discovery mission was the 25th flight of the space shuttle Orbiter Discovery, and the 33rd mission of the Space Shuttle program.
The mission was launched from Kennedy Space Center, Florida, on October 11, 1990, and landed on October 15, 1990, at Edwards Air Force Base, California.
The primary objective of the mission was to deploy the Ulysses solar probe.
"""

# Load the summarization pipeline
summarizer = pipeline("summarization", model="t5-small")

# Generate summary
summary = summarizer(input_text, max_length=50, min_length=10, do_sample=False)

# Print summary
print(summary[0]['summary_text'])

## **Understanding Implementation** (BONUS)

### **Looking at BERT**

In [None]:
from transformers import BertModel, BertConfig

In [None]:
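# Create a default BERT configuration (architecture hyperparameters only)
configuration = BertConfig()

# Build BERT from the configuration; the weights are randomly initialized,
# not pretrained (use BertModel.from_pretrained for pretrained weights)
bert_model = BertModel(configuration)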

# Print the model architecture
print(bert_model)

### **Fine Tuning BERT**

In [None]:
from transformers import BertTokenizer, TFBertForSequenceClassification

In [None]:
# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')

In [None]:
# Example: Fine-tuning the model on a custom dataset
# This is a placeholder; in a real scenario, you would load your dataset here
texts = ["I love this product!", "I hate this product!"]
labels = [1, 0]  # 1 for positive, 0 for negative sentiment

In [None]:
# [Supplement] Tokenize the texts
train_encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)

In [None]:
# Convert data to TensorFlow format
train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    labels
))

In [None]:
# Compile the model with an optimizer, a loss, and a metric
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=[tf.keras.metrics.SparseCategoricalAccuracy()])

In [None]:
# Train the model
model.fit(train_dataset.shuffle(100).batch(32), epochs=3)

## **Evaluation** (BONUS)

Model evaluation is critical to determine the performance of your NLP models. This involves using metrics that are suitable for the task your model is performing.

**Common Evaluation Metrics**

- Accuracy: Measures the proportion of correct predictions for classification tasks.

- F1 Score: Balances the precision and recall of a model, especially useful for imbalanced datasets.

- BLEU Score: Commonly used in machine translation to compare the machine-generated text to a reference text.

In [None]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np

In [None]:
# Simulated predictions and true labels
true_labels = [0, 1, 0, 1, 1]  # 0 for negative, 1 for positive sentiment
predicted_labels = [0, 1, 0, 0, 1]

# Calculate metrics
accuracy = accuracy_score(true_labels, predicted_labels)
precision, recall, f1, _ = precision_recall_fscore_support(true_labels, predicted_labels, average='binary')

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")
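
To see where these numbers come from, the same metrics can be derived by hand from the counts of true and false positives and negatives. This is a minimal sketch for the binary case using the labels from the cell above; the variable names are illustrative.

```python
true_labels = [0, 1, 0, 1, 1]
predicted_labels = [0, 1, 0, 0, 1]

pairs = list(zip(true_labels, predicted_labels))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted positive, is positive
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted positive, is negative
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted negative, is positive
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted negative, is negative

manual_accuracy = (tp + tn) / len(pairs)
manual_precision = tp / (tp + fp) if tp + fp else 0.0
manual_recall = tp / (tp + fn) if tp + fn else 0.0
denominator = manual_precision + manual_recall
manual_f1 = 2 * manual_precision * manual_recall / denominator if denominator else 0.0

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy: {manual_accuracy:.2f}")
print(f"Precision: {manual_precision:.2f}")
print(f"Recall: {manual_recall:.2f}")
print(f"F1 Score: {manual_f1:.2f}")
```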

<img src="https://drive.google.com/uc?id=1TeGOA4-iCi8gSxEEQXbwpHosErRVdNh0" width="800" height="500">



#### **BLEU score**

In [None]:
from nltk.translate.bleu_score import corpus_bleu

In [None]:

# Example machine translation outputs and references
references = [[['this', 'is', 'a', 'test'], ['this', 'is', 'a', 'trial']]]
candidates = [['this', 'is', 'a', 'test']]

# Calculate BLEU score
bleu_score = corpus_bleu(references, candidates)
print(f"BLEU Score: {bleu_score:.2f}")
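
BLEU is built on clipped (modified) n-gram precision: each candidate n-gram is credited at most as many times as it appears in any single reference. The sketch below implements only that building block. It is not the full BLEU score, which also combines several n-gram orders and applies a brevity penalty, and the helper names are illustrative.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(references, candidate, n):
    """Clipped n-gram precision of one candidate against several references."""
    cand_counts = Counter(ngrams(candidate, n))
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in Counter(ngrams(ref, n)).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())

refs = [["this", "is", "a", "test"], ["this", "is", "a", "trial"]]
print(modified_precision(refs, ["this", "is", "a", "test"], 1))
print(modified_precision(refs, ["test", "test", "test"], 1))
```

Clipping is what stops a candidate from scoring well by repeating a single matching word.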

#### **ROUGE score**

In [None]:
!pip install -q rouge

In [None]:
from rouge import Rouge

In [None]:
# Hypothetical generated summaries and reference summaries
generated_summaries = ["the cat was found under the bed"]
reference_summaries = ["the cat was hiding under the bed"]

# Initialize ROUGE scorer
rouge = Rouge()
scores = rouge.get_scores(generated_summaries, reference_summaries)

# Display ROUGE scores
print("ROUGE Scores:", scores[0])
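
ROUGE-1 measures unigram overlap between a generated summary and a reference. Here is a minimal sketch using whitespace tokenization and clipped counts, applied to the two sentences above. The `rouge_1` helper is illustrative, and library implementations may tokenize or treat repeated words differently, so their numbers can differ from this one.

```python
from collections import Counter

def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap precision, recall, and F1 between two strings."""
    cand = Counter(candidate.split())
    ref = Counter(reference.split())
    overlap = sum((cand & ref).values())  # per-word minimum of the two counts
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"p": precision, "r": recall, "f": f1}

result = rouge_1("the cat was found under the bed", "the cat was hiding under the bed")
print({key: round(value, 3) for key, value in result.items()})
```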

Note: We will learn more about the benchmarks used to assess LLMs later in this course.

## **Quick Gradio DEMO (Toy Deployment)**

In [None]:
!pip install -q gradio

In [None]:
# Please refer to the Hugging Face documentation for detailed instructions.
# Example using Gradio in Hugging Face Spaces

import gradio as gr

# Load the translation pipeline once, rather than on every request
translator = pipeline('translation_en_to_fr', model='t5-small')

def translate(text):
    return translator(text)[0]['translation_text']

iface = gr.Interface(
    fn=translate,
    inputs="text",
    outputs="text",
    title="English to French Translation",  # title of the app
)
iface.launch()

## Appendix

This cell checks that the transformers library is working properly.

It also shows that, beyond the final outputs, we can access the model's hidden states and work with them as needed.

In [None]:
#@title Test Cell , (hidden state)
# Load a pre-trained model and tokenizer
model = TFAutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize input (returns input IDs, token type IDs, and an attention mask)
inputs = tokenizer("Hello, world!", return_tensors="tf")

# Perform inference
outputs = model(inputs)
print("Model outputs:", outputs.last_hidden_state)
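
`last_hidden_state` holds one vector per input token. One common way to turn these into a single sentence embedding is mean pooling over the non-padding tokens. This is an assumption about what you might do with the hidden states, not something the notebook itself does. The sketch uses plain Python lists and made-up 3-dimensional vectors in place of real model outputs.

```python
def mean_pool(token_vectors, attention_mask):
    """Average the token vectors whose attention mask entry is 1."""
    kept = [vec for vec, m in zip(token_vectors, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]

# Made-up vectors for four token positions; the last position is padding.
token_vectors = [
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
    [2.0, 4.0, 1.0],
    [9.0, 9.0, 9.0],
]
attention_mask = [1, 1, 1, 0]

sentence_embedding = mean_pool(token_vectors, attention_mask)
print(sentence_embedding)
```

Masking matters because padded positions would otherwise pull the average toward values that carry no meaning.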
