<a href="https://colab.research.google.com/github/cwmarris/pull-request-monitor/blob/master/Copy_of_LLM_Workshop_Lab3_Introduction_to_the_HuggingFace_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

| | |
|:---:|:---|
| <img src="https://drive.google.com/uc?export=view&id=1ezSRk_nXkvXlCmpVaDGViMn2wef6QGfj" width="100"/> |  <strong><font size=5>EL EL EM</font></strong><br><br><strong><font color="#A41034" size=5>Applying LLMs in Practice:<br>Core Concepts and Functional AI Solutions for Real-World Implementation</font></strong>|

---

# **Lab 3: Introduction to the HuggingFace API**

**Instructor:**  
Pavlos Protopapas  

**Teaching Team:**  
Ignacio Becker, Chris Gumb, Hargun Oberoi, Shivas Jayaram  

**Contributors:**  
Shibani Budhraja, Lakshay Chawla  

## 📝 Make a Copy to Edit

This notebook is **view-only**. To edit it, follow these steps:

1. Click **File** > **Save a copy in Drive**.
2. Your own editable copy will open in a new tab.

Now you can modify and run the code freely!

In this notebook, we will use the Hugging Face API to explore a few open-source language models and their performance on simple language tasks.

<img src="https://drive.google.com/uc?id=1i87oxReRQv7rLqFuZKCPeLCh2zy8RQUU" width="400" height="100">


By the end of this tutorial, you will have a solid understanding of how to leverage the Transformers library.




## Table of Contents
1. **Part A**
    - Embeddings recap
    - Key Components
2. **Part B**
    - Introduction to HuggingFace
    - Getting started with the HuggingFace API
    - Using pretrained models for NLP tasks
    - Understanding implementation (BONUS)
    - Evaluation (BONUS)
    - Quick Gradio DEMO (Toy Deployment)

---



### What is Hugging Face?

[`Hugging Face`](https://huggingface.co/) is a company that specializes in natural language processing (NLP) technologies. It provides one of the most popular platforms for state-of-the-art machine learning models, particularly those designed for tasks like text analysis, language understanding, and generation. Hugging Face is widely recognized for its Transformers library, which offers easy access to pre-trained models that can perform a range of NLP tasks.

![alt text](https://drive.google.com/uc?id=1oG1s7346pjEn_A_EOS1QT6obiAD9o050)


### How does the Hugging Face API work?

Hugging Face's Transformers library provides models hosted on the public model hub; public models can be downloaded and run locally without an API key. When you call a function like `pipeline`, the library automatically downloads the specified model from the hub if it is not already cached on your machine. The computation itself then runs locally on your device (here, the Colab runtime), not on Hugging Face's servers.
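
To see which models have already been downloaded, you can look inside the local cache. The sketch below assumes the cache layout used by recent versions of `huggingface_hub`: a `hub` directory under `~/.cache/huggingface` (overridable through the `HF_HOME` or `HF_HUB_CACHE` environment variables) with one `models--<org>--<name>` folder per model. The helper names `hub_cache_dir` and `cached_models` are made up for this illustration.

```python
import os
from pathlib import Path


def hub_cache_dir() -> Path:
    """Return the assumed location of the local Hugging Face Hub cache."""
    if "HF_HUB_CACHE" in os.environ:
        return Path(os.environ["HF_HUB_CACHE"])
    hf_home = os.environ.get("HF_HOME", str(Path.home() / ".cache" / "huggingface"))
    return Path(hf_home) / "hub"


def cached_models(cache_dir: Path) -> list[str]:
    """List model ids that appear to be cached under cache_dir."""
    if not cache_dir.is_dir():
        return []  # nothing downloaded yet (or a different cache location)
    prefix = "models--"
    return sorted(
        entry.name[len(prefix):].replace("--", "/")
        for entry in cache_dir.iterdir()
        if entry.is_dir() and entry.name.startswith(prefix)
    )


print(cached_models(hub_cache_dir()))
```

On a fresh Colab runtime this list is typically empty; it should grow as the cells below download models.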

## Setting Up Your Environment in Google Colab

### Why Google Colab?

- Google Colab provides a cloud-based environment that allows you to write, run, and share Python code through the browser.
- It is especially useful for machine learning and data analysis because it offers free (though limited) access to GPUs and TPUs, which makes it a convenient platform for training and testing models.

### Preparing Colab

- To make the most of Google Colab for this tutorial, we need to ensure that the environment is correctly set up with all the necessary libraries and configurations.

### Installation Steps (Optional / If Available)

- Ensure GPU availability: first, make sure that your notebook is set to use a GPU, which will speed up the model operations significantly.
    - Go to **Runtime > Change runtime type** in the Colab menu.
    - Select **GPU** from the Hardware accelerator dropdown list and click **Save**.

> ### Important Notice

`When running this notebook on Colab, you may run out of available memory (RAM). This is expected because we load many different models. If that happens, simply run the imports again and then run only the desired/pending sections.`

## Importing the transformers library

In [1]:
# Install the transformers library with TensorFlow
!pip install -q transformers[tf]


In [2]:
#check transformers version
!pip show transformers

Name: transformers
Version: 4.48.3
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /usr/local/lib/python3.11/dist-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: peft, sentence-transformers


In [3]:
#importing specific modules from transformers
from transformers import pipeline, TFAutoModel, AutoModelForSequenceClassification, AutoTokenizer

In [4]:
#importing tensorflow
import tensorflow as tf

In [5]:
# Checking the available GPU
gpu_info = !nvidia-smi
gpu_info = '\n'.join(gpu_info)
if gpu_info.find('failed') >= 0:
  print('Select the Runtime > "Change runtime type" menu to enable a GPU accelerator, ')
  print('and then re-execute this cell.')
else:
  print(gpu_info)

# Print the TensorFlow version
print("TensorFlow version:", tf.__version__)

Thu Feb 20 23:42:41 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   34C    P8              9W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

### **1. Sentiment Analysis**

In [6]:
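# **Model 1: BERT**
# Note: 'bert-base-cased' is a base checkpoint that has not been fine-tuned for sentiment,
# so its classification head is randomly initialized and the predicted label is not meaningful.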
classifier = pipeline('sentiment-analysis', model='bert-base-cased')
result = classifier("I love using Hugging Face!")
print(result)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/436M [00:00<?, ?B/s]

Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

Device set to use cuda:0


[{'label': 'LABEL_1', 'score': 0.6673248410224915}]


In [7]:
del classifier

In [8]:
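# **Model 2: RoBERTa**
# Note: 'roberta-base' is also a base checkpoint without a trained sentiment head,
# so expect a score close to chance.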
classifier = pipeline('sentiment-analysis', model='roberta-base')
result = classifier("The food was absolutely delicious!")
print(result)

config.json:   0%|          | 0.00/481 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/499M [00:00<?, ?B/s]

Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0


[{'label': 'LABEL_0', 'score': 0.5049166083335876}]


Let's add a new sentence...

In [10]:
sample_sentence = "add a line about your favourite language model. Sure I love deepseek" #@param {type:"string"}

In [11]:
# **Model 3: DistilBERT**
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')
result = classifier(sample_sentence)
print(result)

config.json:   0%|          | 0.00/629 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

Device set to use cuda:0


[{'label': 'POSITIVE', 'score': 0.9994264841079712}]
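
The `label` and `score` fields can be read as the most probable class and its softmax probability. Below is a minimal sketch of that post-processing step, assuming a two-class head. The logit values and the `id2label` mapping are hypothetical and chosen only for illustration; they are not taken from a real model run.

```python
import math


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical values for illustration only
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
logits = [-3.5, 4.0]

probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)  # index of the largest probability
result = {"label": id2label[best], "score": probs[best]}
print(result)
```

When the logits are close to each other, the probabilities are close to 0.5 each, which is consistent with the near-chance scores from the base checkpoints earlier in this section.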


### **2. Text Generation**

Set the maximum length of the output in the next cell.

**Note**: try a value between 30 and 50 to start.

In [12]:
max_length = 40 # @param {type:"integer"}

In [13]:
# **Model 1: GPT-2**
generator = pipeline('text-generation', model='gpt2')
result = generator("Once upon a time", max_length=max_length, num_return_sequences=1)
print(result[0]['generated_text'])

config.json:   0%|          | 0.00/665 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/548M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/124 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/26.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.04M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Once upon a time my family had been quite the loneliest of people which I had not met so to speak. My brother-in-law James had been the one who had led their lives
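
Note that `max_length` counts the prompt tokens plus the newly generated ones. The toy loop below illustrates that bookkeeping, with a hypothetical next-token lookup table standing in for a real language model. The table, the `toy_generate` name, and the whitespace tokenization are all assumptions made for illustration.

```python
# Hypothetical "model": maps the last token to the next token (greedy, deterministic)
NEXT_TOKEN = {
    "Once": "upon", "upon": "a", "a": "time", "time": "there",
    "there": "was", "was": "a",
}


def toy_generate(prompt, max_length):
    """Greedy generation that stops once prompt + new tokens reach max_length."""
    tokens = prompt.split()  # whitespace "tokenizer" for illustration
    while len(tokens) < max_length:
        next_token = NEXT_TOKEN.get(tokens[-1])
        if next_token is None:  # no known continuation: stop early
            break
        tokens.append(next_token)
    return " ".join(tokens)


print(toy_generate("Once upon a time", max_length=6))
```

With a 4-token prompt and `max_length=6`, only two new tokens are produced. Real tokenizers split text into subword tokens, so the counts for GPT-2 will differ from a whitespace split.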


In [14]:
# **Model 2: CTRL**
generator = pipeline('text-generation', model='ctrl')
result = generator("Legal Disclaimer:", max_length=max_length, num_return_sequences=1)
print(result[0]['generated_text'])

config.json:   0%|          | 0.00/635 [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/6.55G [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/6.55G [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/25.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/4.61M [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/2.26M [00:00<?, ?B/s]

Device set to use cuda:0
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Legal Disclaimer: I am not a lawyer and this is not legal advice. 
 
 I am a student at a university in the United States and I am currently in the process of applying for a job


### **3. Question Answering**

For basic question answering, we can use a model fine-tuned on SQuAD (the Stanford Question Answering Dataset), a standard benchmark for QA tasks.

In [15]:
# Load a QA model
qa_pipeline = pipeline("question-answering")

# Set context and question
context = """
Tesla, Inc. is an American electric vehicle and clean energy company based in Palo Alto, California.
Tesla's current products include electric cars, battery energy storage from home to grid scale, solar panels and solar roof tiles,
as well as other related products and services.
"""
question = "What does Tesla produce besides electric cars?"

# Get the answer
answer = qa_pipeline(question=question, context=context)
print(answer['answer'])

No model was supplied, defaulted to distilbert/distilbert-base-cased-distilled-squad and revision 564e9b5 (https://huggingface.co/distilbert/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.


config.json:   0%|          | 0.00/473 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/261M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

Device set to use cuda:0


battery energy storage from home to grid scale, solar panels and solar roof tiles


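We can also choose the model ourselves. Note that the first two checkpoints below (`bert-base-cased` and `distilbert-base-cased`) are not fine-tuned for question answering, so their QA heads are randomly initialized and their answers are unreliable; the third (`deepset/roberta-base-squad2`) is fine-tuned on SQuAD 2.0.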

In [16]:
# **Model 1: BERT**
qa_pipeline = pipeline('question-answering', model='bert-base-cased')
context = "Hugging Face is a company that develops tools for building applications using machine learning."
question = "What does Hugging Face do?"
result = qa_pipeline({'question': question, 'context': context})
print(result)

Some weights of BertForQuestionAnswering were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Device set to use cuda:0


{'score': 0.006338176317512989, 'start': 46, 'end': 95, 'answer': 'for building applications using machine learning.'}




In [17]:
# **Model 2: DistilBERT**
qa_pipeline = pipeline('question-answering', model='distilbert-base-cased')
result = qa_pipeline({'question': "Who developed the transformer architecture?", 'context': "The transformer architecture was introduced by Vaswani et al. in the paper 'Attention is All You Need'."})
print(result)

config.json:   0%|          | 0.00/465 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/263M [00:00<?, ?B/s]

Some weights of DistilBertForQuestionAnswering were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['qa_outputs.bias', 'qa_outputs.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


tokenizer_config.json:   0%|          | 0.00/49.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/436k [00:00<?, ?B/s]

Device set to use cuda:0


{'score': 0.001691892510280013, 'start': 16, 'end': 68, 'answer': 'architecture was introduced by Vaswani et al. in the'}


In [18]:
# **Model 3: RoBERTa**
qa_pipeline = pipeline('question-answering', model='deepset/roberta-base-squad2')
result = qa_pipeline({'question': "What is the capital of France?", 'context': "Paris is the capital of France."})
print(result)

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/496M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/79.0 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/772 [00:00<?, ?B/s]

Device set to use cuda:0


{'score': 0.5974364280700684, 'start': 0, 'end': 5, 'answer': 'Paris'}
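
The `start` and `end` fields are character offsets into the context, so the answer can be recovered by slicing. The sketch below reuses the result printed above; the `MIN_SCORE` threshold is a hypothetical, arbitrary choice to show one way of discarding low-confidence answers.

```python
context = "Paris is the capital of France."
result = {"score": 0.5974364280700684, "start": 0, "end": 5, "answer": "Paris"}

# `start` and `end` are character offsets into the context string
span = context[result["start"]:result["end"]]
print(span)

# Hypothetical confidence filter; 0.1 is an arbitrary threshold for illustration
MIN_SCORE = 0.1
answer = span if result["score"] >= MIN_SCORE else None
print(answer)
```

With this threshold, the two answers from the checkpoints that were not fine-tuned (scores of roughly 0.006 and 0.002) would be rejected.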


### **4. Translation**

In [19]:
test_sentence = "I'm thinking about making cheese all the time"  # @param {type:"string"}

In [20]:
# **Model 1: MarianMT**
translator = pipeline('translation_en_to_fr', model='Helsinki-NLP/opus-mt-en-fr')
result = translator(test_sentence)
print(result)

config.json:   0%|          | 0.00/1.42k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/301M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/301M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/42.0 [00:00<?, ?B/s]

source.spm:   0%|          | 0.00/778k [00:00<?, ?B/s]

target.spm:   0%|          | 0.00/802k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.34M [00:00<?, ?B/s]

Device set to use cuda:0


[{'translation_text': 'Je pense à faire du fromage tout le temps.'}]


In [21]:
# **Model 2: T5**
translator = pipeline('translation_en_to_de', model='t5-base')
result = translator(test_sentence)
print(result)

config.json:   0%|          | 0.00/1.21k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/892M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/147 [00:00<?, ?B/s]

spiece.model:   0%|          | 0.00/792k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.39M [00:00<?, ?B/s]

Device set to use cuda:0


[{'translation_text': 'Ich denke immer daran, Käse herzustellen.'}]


**Bonus sentence**

In [22]:
translator = pipeline('translation_en_to_zh', model='Helsinki-NLP/opus-mt-en-zh')
result = translator("Professor Pavlos is the best DJ")
print(result)

config.json:   0%|          | 0.00/1.40k [00:00<?, ?B/s]

pytorch_model.bin:   0%|          | 0.00/312M [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/312M [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/293 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/44.0 [00:00<?, ?B/s]

source.spm:   0%|          | 0.00/806k [00:00<?, ?B/s]

target.spm:   0%|          | 0.00/805k [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/1.62M [00:00<?, ?B/s]

Device set to use cuda:0


[{'translation_text': 'Pavlos教授是最好的DJ'}]


### **5. NER**

In [23]:
# **Model 1: Bert Large Cased**

# Load the NER pipeline
ner_model = pipeline("ner", model="dbmdz/bert-large-cased-finetuned-conll03-english")

# Process text
ner_results = ner_model("Hugging Face is based in New York and was founded by Clement Delangue.")
print(ner_results)

config.json:   0%|          | 0.00/998 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.33G [00:00<?, ?B/s]

Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


tokenizer_config.json:   0%|          | 0.00/60.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/213k [00:00<?, ?B/s]

Device set to use cuda:0


[{'entity': 'I-ORG', 'score': 0.9898335, 'index': 1, 'word': 'Hu', 'start': 0, 'end': 2}, {'entity': 'I-ORG', 'score': 0.8947252, 'index': 2, 'word': '##gging', 'start': 2, 'end': 7}, {'entity': 'I-ORG', 'score': 0.97231054, 'index': 3, 'word': 'Face', 'start': 8, 'end': 12}, {'entity': 'I-LOC', 'score': 0.9989222, 'index': 7, 'word': 'New', 'start': 25, 'end': 28}, {'entity': 'I-LOC', 'score': 0.99841297, 'index': 8, 'word': 'York', 'start': 29, 'end': 33}, {'entity': 'I-PER', 'score': 0.9989963, 'index': 13, 'word': 'Clement', 'start': 53, 'end': 60}, {'entity': 'I-PER', 'score': 0.99682236, 'index': 14, 'word': 'Del', 'start': 61, 'end': 64}, {'entity': 'I-PER', 'score': 0.97441894, 'index': 15, 'word': '##ang', 'start': 64, 'end': 67}, {'entity': 'I-PER', 'score': 0.9883802, 'index': 16, 'word': '##ue', 'start': 67, 'end': 69}]
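
The raw output is one prediction per word piece (`Hu`, `##gging`, ...). A minimal sketch of merging these pieces back into whole entities using the character offsets follows; the token list is copied from the output above with scores omitted, and `group_entities` is a name invented for this example.

```python
text = "Hugging Face is based in New York and was founded by Clement Delangue."

# Token-level predictions copied from the output above (scores omitted for brevity)
tokens = [
    {"entity": "I-ORG", "index": 1, "start": 0, "end": 2},
    {"entity": "I-ORG", "index": 2, "start": 2, "end": 7},
    {"entity": "I-ORG", "index": 3, "start": 8, "end": 12},
    {"entity": "I-LOC", "index": 7, "start": 25, "end": 28},
    {"entity": "I-LOC", "index": 8, "start": 29, "end": 33},
    {"entity": "I-PER", "index": 13, "start": 53, "end": 60},
    {"entity": "I-PER", "index": 14, "start": 61, "end": 64},
    {"entity": "I-PER", "index": 15, "start": 64, "end": 67},
    {"entity": "I-PER", "index": 16, "start": 67, "end": 69},
]


def group_entities(tokens, text):
    """Merge consecutive tokens with the same entity type into one span."""
    groups = []
    for tok in tokens:
        label = tok["entity"].split("-")[-1]  # "I-ORG" -> "ORG"
        last = groups[-1] if groups else None
        if last and last["label"] == label and tok["index"] == last["index"] + 1:
            last["end"] = tok["end"]      # extend the current span
            last["index"] = tok["index"]
        else:
            groups.append({"label": label, "start": tok["start"],
                           "end": tok["end"], "index": tok["index"]})
    return [(g["label"], text[g["start"]:g["end"]]) for g in groups]


print(group_entities(tokens, text))
```

The `ner` pipeline also accepts an `aggregation_strategy` argument (for example `"simple"`) that performs a similar grouping for you; the manual version here is only meant to show what that grouping involves.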


Find the correct model name on the Hugging Face website and fill it in here.

In [24]:
# **Model 2: DistilBERT NER model**
ner_distilbert = pipeline("ner", model="Davlan/distilbert-base-multilingual-cased-ner-hrl")

# Process another example
distilbert_ner_results = ner_distilbert("Apple is looking at buying U.K. startup for $1 billion.")
print(distilbert_ner_results)

config.json:   0%|          | 0.00/876 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/539M [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/270 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/996k [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/112 [00:00<?, ?B/s]

Device set to use cuda:0


[{'entity': 'B-ORG', 'score': 0.9998323, 'index': 1, 'word': 'Apple', 'start': 0, 'end': 5}, {'entity': 'B-LOC', 'score': 0.9956858, 'index': 7, 'word': 'U', 'start': 27, 'end': 28}, {'entity': 'I-LOC', 'score': 0.99430025, 'index': 8, 'word': '.', 'start': 28, 'end': 29}, {'entity': 'I-LOC', 'score': 0.9921933, 'index': 9, 'word': 'K', 'start': 29, 'end': 30}, {'entity': 'I-LOC', 'score': 0.9971691, 'index': 10, 'word': '.', 'start': 30, 'end': 31}]


### **6. Summarization**

Try to refer to the Hugging Face docs and complete this section on your own. Try altering important parameters like `max_length` and `min_length`, and assess the responses.

In [25]:
##your code here
# Load the summarization pipeline
summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

# Sample text
text = """
The Chrysler Building, a famed example of Art Deco architecture, is an iconic skyscraper in New York City, located on the east side of Manhattan.
It was the world's tallest building before it was surpassed by the Empire State Building in 1931. Originally a project of Walter Chrysler,
the building was constructed by the architectural and engineering firm and completed in 1930. It is known for its terraced crown, composed of seven radiating terraced arches.
"""

# Generate summary
summary = summarizer(text, max_length=45, min_length=25, do_sample=False)
print(summary[0]['summary_text'])


config.json:   0%|          | 0.00/1.58k [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/1.63G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/363 [00:00<?, ?B/s]

vocab.json:   0%|          | 0.00/899k [00:00<?, ?B/s]

merges.txt:   0%|          | 0.00/456k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.36M [00:00<?, ?B/s]

Device set to use cuda:0


The Chrysler Building is a famed example of Art Deco architecture. It was the world's tallest building before it was surpassed by the Empire State Building in 1931.


## **Understanding Implementation** (BONUS)

### **Looking at BERT**

In [1]:
from transformers import BertModel, BertConfig

In [2]:
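# Create a default BERT configuration (similar to the bert-base-uncased architecture)
configuration = BertConfig()

# Build a BERT model from this configuration
# (the weights are randomly initialized here, not pretrained)
bert_model = BertModel(configuration)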

# Print the model architecture
print(bert_model)

BertModel(
  (embeddings): BertEmbeddings(
    (word_embeddings): Embedding(30522, 768, padding_idx=0)
    (position_embeddings): Embedding(512, 768)
    (token_type_embeddings): Embedding(2, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): BertEncoder(
    (layer): ModuleList(
      (0-11): 12 x BertLayer(
        (attention): BertAttention(
          (self): BertSdpaSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): BertSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False
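
The layer sizes in this printout are enough to estimate the model's parameter count by hand. The sketch below uses the sizes shown above and additionally assumes the parts that are cut off in the printout: a feed-forward block with intermediate size 3072 (the `BertConfig` default), a LayerNorm after it, and a pooler layer.

```python
# Sizes read from the printout above, except INTERMEDIATE (assumed BertConfig default)
VOCAB, HIDDEN, MAX_POS, TYPES, LAYERS = 30522, 768, 512, 2, 12
INTERMEDIATE = 3072


def linear(n_in, n_out):
    """Parameters of a dense layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out


layer_norm = 2 * HIDDEN  # scale and shift vectors
embeddings = (VOCAB + MAX_POS + TYPES) * HIDDEN + layer_norm
attention = 4 * linear(HIDDEN, HIDDEN) + layer_norm  # query, key, value, output
feed_forward = linear(HIDDEN, INTERMEDIATE) + linear(INTERMEDIATE, HIDDEN) + layer_norm
pooler = linear(HIDDEN, HIDDEN)

total = embeddings + LAYERS * (attention + feed_forward) + pooler
print(f"{total:,}")
```

This arithmetic gives roughly 109 million parameters. You can compare it with the real model via `sum(p.numel() for p in bert_model.parameters())`.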

### **Fine Tuning BERT**

In [3]:
from transformers import BertTokenizer, TFBertForSequenceClassification

In [4]:
# Load tokenizer and model
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = TFBertForSequenceClassification.from_pretrained('bert-base-uncased')

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/48.0 [00:00<?, ?B/s]

vocab.txt:   0%|          | 0.00/232k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/466k [00:00<?, ?B/s]

config.json:   0%|          | 0.00/570 [00:00<?, ?B/s]

model.safetensors:   0%|          | 0.00/440M [00:00<?, ?B/s]

All PyTorch model weights were used when initializing TFBertForSequenceClassification.

Some weights or buffers of the TF 2.0 model TFBertForSequenceClassification were not initialized from the PyTorch model and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [5]:
# Example: Fine-tuning the model on a custom dataset
# This is a placeholder; in a real scenario, you would load your dataset here
texts = ["I love this product!", "I hate this product!"]
labels = [1, 0]  # 1 for positive, 0 for negative sentiment

In [8]:
# Tokenize input
train_encodings = tokenizer(texts, truncation=True, padding=True, max_length=128)

In [7]:
# Convert data to TensorFlow format
import tensorflow as tf  # needed here if the runtime was restarted since the earlier import

train_dataset = tf.data.Dataset.from_tensor_slices((
    dict(train_encodings),
    labels
))

In [33]:
# Compile the model with an optimizer, a loss, and an accuracy metric
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=tf.keras.metrics.SparseCategoricalAccuracy())
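
To make the loss concrete: with `from_logits=True`, the loss for one example is the negative log of the softmax probability assigned to the true class. Here is a minimal pure-Python sketch of that formula; the logits are hypothetical and the function name simply mirrors the Keras loss for readability.

```python
import math


def sparse_categorical_crossentropy(logits, label):
    """Cross-entropy for one example, given raw logits and an integer class label."""
    top = max(logits)
    log_sum_exp = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_sum_exp - logits[label]  # equals -log(softmax(logits)[label])


# Hypothetical logits for a two-class problem
print(sparse_categorical_crossentropy([0.0, 0.0], label=1))   # undecided model: ln(2)
print(sparse_categorical_crossentropy([-2.0, 3.0], label=1))  # confident and correct: small loss
print(sparse_categorical_crossentropy([-2.0, 3.0], label=0))  # confident and wrong: large loss
```

"Sparse" refers to the labels being integer class ids (such as `[1, 0]` above) rather than one-hot vectors.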

In [34]:
# Train the model
model.fit(train_dataset.shuffle(100).batch(32), epochs=3)

Epoch 1/3


Cause: for/else statement not yet supported


Cause: for/else statement not yet supported
Epoch 2/3
Epoch 3/3


<tf_keras.src.callbacks.History at 0x793f09b9c950>

## **Evaluation** (BONUS)

Model evaluation is critical to determine the performance of your NLP models. This involves using metrics that are suitable for the task your model is performing.

**Common Evaluation Metrics**

- Accuracy: Measures the proportion of correct predictions for classification tasks.

- F1 Score: Balances the precision and recall of a model, especially useful for imbalanced datasets.

- BLEU Score: Commonly used in machine translation to compare the machine-generated text to a reference text.

In [None]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
import numpy as np

In [None]:
# Simulated predictions and true labels
true_labels = [0, 1, 0, 1, 1]  # 0 for negative, 1 for positive sentiment
predicted_labels = [0, 1, 0, 0, 1]

# Calculate metrics
accuracy = accuracy_score(true_labels, predicted_labels)
precision, recall, f1, _ = precision_recall_fscore_support(true_labels, predicted_labels, average='binary')

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

Accuracy: 0.80
Precision: 1.00
Recall: 0.67
F1 Score: 0.80
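
As a cross-check, the same numbers can be derived by hand from the confusion-matrix counts. This sketch reuses the labels from the cell above and treats class 1 as the positive class.

```python
true_labels = [0, 1, 0, 1, 1]
predicted_labels = [0, 1, 0, 0, 1]

pairs = list(zip(true_labels, predicted_labels))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"Accuracy: {accuracy:.2f}, Precision: {precision:.2f}, Recall: {recall:.2f}, F1: {f1:.2f}")
```

The single missed positive (the fourth example) lowers recall to 2/3 while precision stays at 1.0, because no negative example was predicted as positive.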


<img src="https://drive.google.com/uc?id=1TeGOA4-iCi8gSxEEQXbwpHosErRVdNh0" width="800" height="500">



**Source and Full cheatsheet linked [here](https://towardsdatascience.com/a-walk-through-imbalanced-classes-in-machine-learning-through-a-visual-cheat-sheet-974740b19094).**

#### **BLEU score**

In [None]:
from nltk.translate.bleu_score import corpus_bleu

In [None]:

# Example machine translation outputs and references
references = [[['this', 'is', 'a', 'test'], ['this', 'is', 'a', 'trial']]]
candidates = [['this', 'is', 'a', 'test']]

# Calculate BLEU score
bleu_score = corpus_bleu(references, candidates)
print(f"BLEU Score: {bleu_score:.2f}")

BLEU Score: 1.00
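
To see where this score comes from, here is a simplified sketch of the BLEU computation for a single sentence: clipped n-gram precisions for n = 1 to 4, combined by a geometric mean and multiplied by a brevity penalty. It is an illustration without smoothing, not a replacement for `corpus_bleu`, and the function names are invented for this example.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n):
    """Share of candidate n-grams found in a reference, with counts clipped."""
    cand_counts = ngram_counts(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngram_counts(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    clipped = sum(min(count, max_ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def simple_bleu(candidate, references, max_n=4):
    precisions = [clipped_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # no smoothing in this sketch
    # Brevity penalty uses the reference length closest to the candidate length
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - len(candidate)), length))
    bp = 1.0 if len(candidate) > ref_len else math.exp(1 - ref_len / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)


references = [["this", "is", "a", "test"], ["this", "is", "a", "trial"]]
print(simple_bleu(["this", "is", "a", "test"], references))
print(simple_bleu(["this", "is", "a", "dog"], references))
```

The first candidate matches a reference exactly, so every precision is 1 and the score is 1.0. The second has no matching 4-gram, so this unsmoothed version returns 0.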


In [None]:
!pip install -q rouge

In [None]:
from rouge import Rouge

In [None]:
# Hypothetical generated summaries and reference summaries
generated_summaries = ["the cat was found under the bed"]
reference_summaries = ["the cat was hiding under the bed"]

# Initialize ROUGE scorer
rouge = Rouge()
scores = rouge.get_scores(generated_summaries, reference_summaries)

# Display ROUGE scores
print("ROUGE Scores:", scores[0])

ROUGE Scores: {'rouge-1': {'r': 0.8333333333333334, 'p': 0.8333333333333334, 'f': 0.8333333283333335}, 'rouge-2': {'r': 0.6666666666666666, 'p': 0.6666666666666666, 'f': 0.6666666616666668}, 'rouge-l': {'r': 0.8333333333333334, 'p': 0.8333333333333334, 'f': 0.8333333283333335}}
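
The ROUGE-1 and ROUGE-2 numbers above can be reproduced by hand as n-gram overlap. The sketch below counts each distinct n-gram once; that choice is an inference from the printed values (5 of 6 distinct unigrams and 4 of 6 distinct bigrams overlap), not a statement about the library's internals.

```python
def unique_ngrams(text, n):
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def rouge_n(generated, reference, n):
    """Recall, precision, and F1 over distinct n-grams."""
    gen, ref = unique_ngrams(generated, n), unique_ngrams(reference, n)
    overlap = len(gen & ref)
    recall = overlap / len(ref) if ref else 0.0
    precision = overlap / len(gen) if gen else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return {"r": recall, "p": precision, "f": f1}


generated = "the cat was found under the bed"
reference = "the cat was hiding under the bed"

print("rouge-1:", rouge_n(generated, reference, 1))
print("rouge-2:", rouge_n(generated, reference, 2))
```

Recall divides the overlap by the number of n-grams in the reference, and precision divides it by the number in the generated summary; here both texts have the same number of distinct n-grams, so the two values coincide.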


Note: We will learn more about the benchmarks used to assess LLMs later in this course.

## **Quick Gradio DEMO (Toy Deployment)**

In [None]:
!pip install -q gradio


In [None]:
# Please refer to the Hugging Face documentation for detailed instructions.
# Example using Gradio in Hugging Face Spaces

import gradio as gr
from transformers import pipeline

# Load the translation pipeline once, rather than on every request
translator = pipeline('translation_en_to_fr', model='t5-small')

def translate(text):
    return translator(text)[0]['translation_text']

iface = gr.Interface(
    fn=translate,
    inputs="text",
    outputs="text",
    title="English to French Translation",  # title of the app
)
iface.launch()

Running Gradio in a Colab notebook requires sharing enabled. Automatically setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. To show errors in colab notebook, set debug=True in launch()
* Running on public URL: https://f4b92fc0ddeb44ddeb.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from the terminal in the working directory to deploy to Hugging Face Spaces (https://huggingface.co/spaces)




## Appendix

This cell simply checks that the transformers library is working properly.

It also shows that, besides the final outputs, we can access the model's hidden states and work with them as needed.

In [None]:
#@title Test Cell , (hidden state)
# Load a pre-trained model and tokenizer
model = TFAutoModel.from_pretrained("bert-base-uncased")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenize input (returns input_ids, token_type_ids, and attention_mask)
inputs = tokenizer("Hello, world!", return_tensors="tf")

# Perform inference
outputs = model(inputs)
print("Model outputs:", outputs.last_hidden_state)


Some weights of the PyTorch model were not used when initializing the TF 2.0 model TFBertModel: ['cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.seq_relationship.bias', 'cls.seq_relationship.weight', 'cls.predictions.bias']
- This IS expected if you are initializing TFBertModel from a PyTorch model trained on another task or with another architecture (e.g. initializing a TFBertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TFBertModel from a PyTorch model that you expect to be exactly identical (e.g. initializing a TFBertForSequenceClassification model from a BertForSequenceClassification model).
All the weights of TFBertModel were initialized from the PyTorch model.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFBertModel for predictions w

Model outputs: tf.Tensor(
[[[-0.0781377   0.15866576  0.03995339 ... -0.2805379   0.02479674
    0.40808395]
  [-0.20164014  0.17805077  0.41843385 ... -0.25220162  0.3629598
   -0.09791584]
  [-0.7155727   0.6750818   0.6016994  ... -1.1031796   0.07970794
    0.05665806]
  [ 0.05266613 -0.14828189  1.3608807  ... -0.45132554  0.1274358
    0.26551598]
  [-0.7121627  -0.4814777  -0.14383158 ...  0.56021523 -0.10615414
   -0.13010597]
  [ 0.99546814  0.13275973 -0.0620541  ...  0.24602401 -0.65023696
   -0.32960322]]], shape=(1, 6, 768), dtype=float32)
