<a href="https://colab.research.google.com/github/Viny2030/NLP-transformers/blob/main/01_introduction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
# Run this cell if you're on Colab or Kaggle
!git clone https://github.com/nlp-with-transformers/notebooks.git
%cd notebooks
from install import *
install_requirements()

Cloning into 'notebooks'...
remote: Enumerating objects: 530, done.
remote: Counting objects: 100% (210/210), done.
remote: Compressing objects: 100% (51/51), done.
remote: Total 530 (delta 183), reused 159 (delta 159), pack-reused 320 (from 2)
Receiving objects: 100% (530/530), 30.79 MiB | 12.18 MiB/s, done.
Resolving deltas: 100% (252/252), done.
/content/notebooks
⏳ Installing base requirements ...
✅ Base requirements installed!
⏳ Installing Git LFS ...
✅ Git LFS installed!


In [2]:
#hide
from utils import *
setup_chapter()

No GPU was detected! This notebook can be *very* slow without a GPU 🐢
Go to Runtime > Change runtime type and select a GPU hardware accelerator.
Using transformers v4.16.2
Using datasets v1.16.1


# Hello Transformers

<img alt="transformer-timeline" caption="The transformers timeline" src="images/chapter01_timeline.png" id="transformer-timeline"/>

## The Encoder-Decoder Framework

<img alt="rnn" caption="Unrolling an RNN in time." src="images/chapter01_rnn.png" id="rnn"/>

<img alt="enc-dec" caption="Encoder-decoder architecture with a pair of RNNs. In general, there are many more recurrent layers than those shown." src="images/chapter01_enc-dec.png" id="enc-dec"/>

## Attention Mechanisms

<img alt="enc-dec-attn" caption="Encoder-decoder architecture with an attention mechanism for a pair of RNNs." src="images/chapter01_enc-dec-attn.png" id="enc-dec-attn"/>

<img alt="attention-alignment" width="500" caption="RNN encoder-decoder alignment of words in English and the generated translation in French (courtesy of Dzmitry Bahdanau)." src="images/chapter02_attention-alignment.png" id="attention-alignment"/>
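To make the attention idea concrete, here is a minimal sketch in plain Python of a single decoder step: score each encoder hidden state against the current decoder state, normalize the scores with a softmax, and take the weighted sum of encoder states as the context vector. It assumes dot-product scoring for brevity (Bahdanau's original mechanism scores states with a small feed-forward network instead), and the vectors are made-up toy values.

```python
import math

def softmax(scores):
    # Subtract the max for numerical stability before exponentiating
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_context(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step using dot-product scores."""
    # One score per encoder position: how well that state matches the decoder state
    scores = [sum(d * e for d, e in zip(decoder_state, h)) for h in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder hidden states
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states)) for i in range(dim)]
    return weights, context

# Toy example: three 2-dimensional encoder states (one per input token)
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attention_context(decoder_state, encoder_states)
print([round(w, 3) for w in weights], [round(c, 3) for c in context])
```

The first and third encoder states align with the decoder state, so they receive most of the weight and dominate the context vector; this is the kind of soft alignment the figure above visualizes.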

<img alt="transformer-self-attn" caption="Encoder-decoder architecture of the original Transformer." src="images/chapter01_self-attention.png" id="transformer-self-attn"/>

## Transfer Learning in NLP

<img alt="transfer-learning" caption="Comparison of traditional supervised learning (left) and transfer learning (right)." src="images/chapter01_transfer-learning.png" id="transfer-learning"/>  

<img alt="ulmfit" width="500" caption="The ULMFiT process (courtesy of Jeremy Howard)." src="images/chapter01_ulmfit.png" id="ulmfit"/>

## Hugging Face Transformers: Bridging the Gap

## A Tour of Transformer Applications

In [3]:
text = """Dear Amazon, last week I ordered an Optimus Prime action figure \
from your online store in Germany. Unfortunately, when I opened the package, \
I discovered to my horror that I had been sent an action figure of Megatron \
instead! As a lifelong enemy of the Decepticons, I hope you can understand my \
dilemma. To resolve the issue, I demand an exchange of Megatron for the \
Optimus Prime figure I ordered. Enclosed are copies of my records concerning \
this purchase. I expect to hear from you soon. Sincerely, Bumblebee."""

In [4]:
text1 = """The Road from ‘Citizens United’ to Trump, Musk, and Corruption
Newsom's ties to CCP under microscope in new book exposing alleged corruption: 'Fleeced American citizens'
Tennis chair ump banned 6 years for corruption
Opinion | Here’s How New York Police Officers Start Their Day by Breaking the Law
What to Know About the Status of the Eric Adams Corruption Case
Lawyer appointed in NYC Mayor Eric Adams' case recommends dropping charges
Mayor Adams’ corruption charges should be dropped for good, court-appointed lawyer says
Feds seek San Leandro records naming Duong family as corruption probe expands beyond Oakland
Crack down on corruption and deploy tech – Xi’s twin tasks for the military
Sheng Thao corruption case: US attorney's office seeks Duong family records from city of San Leandro following Bryan Azevedo raid
Ex-Liverpool mayor Joe Anderson charged with bribery in corruption probe
‘Like a virus’: Corruption has infected the fight against climate change
If You Can Keep It: The Trump administration targets anti-corruption measures
A Simple Way to Check Police Corruption? Parking Tickets.
STOLEN FUTURES: The impact of corruption on children in Africa.
Murphy: Six Weeks In, This White House Is On Its Way To Being The Most Corrupt In U.S. History
How to help Nigerians who want to act against corruption
Blackfeet Nation takes action against corruption within legal team
Israeli prime minister appears in court for 16th time in his corruption trial
‘Mickey 17’ explores the human horrors of colonization, corruption and cloning
Ukraine Is Trying To Root Out Corruption. Is It Enough To Silence Critics?
AHA! Launches Anti-Corruption Task Force to Combat Food Fraud and Industry Deception
Corruption Perceptions Index 2024
Swiss court acquits Dahdaleh 12 years after UK corruption trial collapse
Iran Protests Erupt Over Housing, Pension Rights, and Government Corruption
Opinion | Making corruption great again
Drop Eric Adams Charges, Outside Counsel Advises Corruption Case Judge
Algeria: court orders detention of ex-interior minister for ‘corruption’
Letter | Vote Crawford to reject chaos and corruption
Whistleblowers in Office of Inspector General allege widespread corruption at highest levels of city government
Eric Adams investigation: Lawyer appointed as advisor recommends dismissing mayor's corruption case permanently
The ABCs of the CPI: How the Corruption Perceptions Index is calculated
Judge Dismisses Corruption Charges Against George Norcross, State Plans Appeal
Overcoming corruption vital for RI’s economy to ‘take off’, says Ray Dalio
Singapore working with Indonesia to extradite corruption fugitive: minister
Court-appointed lawyer tells judge to drop Eric Adams corruption case permanently
Have federal agents served warrants at California’s Capitol? The Legislature doesn’t want you to know
New York City Mayor Eric Adams' corruption case should be dismissed with prejudice, outside attorney tells judge
ByteDance dismisses hundreds of employees for corruption
Female students mark International Women's day by leading protests in Serbia against corruption
1015: Dial up corruption in Kurdistan! - Shafaq News
2 men arrested after reportedly trying to kill possible witness in Sheng Thao corruption case
Exclusive: Two men were arrested after attempted killing of potential witness in Sheng Thao corruption case
My Take | Here we go again: Ismail Sabri is third Malaysian ex-PM on corruption carousel
Why Serbian protesters set off smoke bombs in parliament
Corruption and waste, two dangers that threaten Brussels' plan to Rearm Europe
Nigeria's corruption watchdog recovers nearly $500 million in one year
Federal trial in Jackson corruption case set for summer 2026
Israeli protesters call Netanyahu 'corrupt' outside his trial
‘Profound betrayal’: 8 charged in North Charleston corruption investigation
Foreign journalists on China's anti-corruption drive: a global fight
Report: Despite Corruption Problems, China Progresses Toward Modernization
Trump’s 'Crypto Reserve' Is Such Brazen Corruption
BOMBSHELL VERDICT: Jury Confirms Corruption and Retaliation at the Highest Levels of Kansas City Leadership
Trump Weakens Anti-Corruption Measures, Potentially Hindering His Own Fight Against Fentanyl
Pertamina’s US$12 billion graft probe fuels Indonesians’ anger over ‘oil mafia’
Israeli PM Benjamin Netanyahu arrives in court to continue his testimony in ongoing corruption trial
4 of 8 charged in North Charleston corruption case plead guilty in federal court
Mayor's aides questioned by prosecution in corruption probe
Tennis chair umpire slapped with six-year ban after 'facilitating corruption'
Mapped: Which Countries Are Perceived as the Most Corrupt?
The Nantha Kumar Case Is Destroying The Integrity Of The Malaysian Anti-Corruption Commission – Analysis
Bardstown residents complain of latest corruption case after sheriff, chief deputy were indicted
​Deputy provincial governor jailed as corruption investigation ramps up​
Eric Adams’ corruption case should be permanently dismissed, court-appointed lawyer says
Dominican umpire suspended until 2030 for breaching anti-corruption code
Kosta Diamantis and Chris Ziogas charged in corruption scheme
2024 Corruption Perceptions Index: Corruption is playing a devastating role in the climate crisis
Strategic corruption reserve
Trump's Justice Department hits the brakes on anti-corruption enforcement
2024 Corruption Perceptions Index: Corruption fuels environmental crime across the Americas
Embattled Huntington Park mayor vows stalled pool project will proceed despite corruption probe
Israeli PM Netanyahu Returns To Court For Testimony In Corruption Trial, Defends Telecommunications Reforms
How corruption erodes support for democracy
Head of ARMA Duma is trying to regain trust with a new memorandum after the conflict with anti-corruption activists
Crypto, Crime & Corruption: A memecoin family’s checkered past puts the presidency of Argentina’s Javier M
When It Comes To Being an Officer, It’s Time To Go When Corruption Reigns
Israel's Benjamin Netanyahu takes stand in ongoing corruption trial
Workshop held to equip prosecutors combat corruption and money laundering
Somali Regional State denies corruption allegations, dismisses claims as “false” and part of “defamation campaign”
Denmark and Finland on top of list of least corrupt countries
Eric Adams' corruption case should be dismissed permanently, third-party attorney tells court
Xi calls on China’s military to tighten anti-corruption controls
Department of Prisons cracks down on corruption
Pellegrini suit alleging Hoboken corruption dismissed after his embezzlement plea
Are U.S. ‘news deserts’ hothouses of corruption?
Chaos and Corruption Weekly Digest: Week 5
Assistant Governor and accomplices arrested in $400,000 corruption bust
Israeli PM Benjamin Netanyahu arrives in court to continue his testimony in ongoing corruption trial
S/Court Decides Tweah’s, Others’ Fate in US$6.1M Corruption Case Today
Prisons dept cracks down on corruption
DeSoto County DA announces two corruption convictions
Trump and Musk Hunt for Corruption, Very Selectively
The International Anti-Corruption Day on 9 December 2024 is fast approaching!
Tahkout Corruption Scandal: New Trial of Property Ownership
Alabama city divided over whether to abolish police department accused of "culture of corruption"
HOS urges workers to support anti-corruption campaign
AHA! Launches Anti-Corruption Task Force to Combat Food Fraud and Industry Deception
Nigeria's corruption watchdog recovers nearly $500 million in one year
2025/2026 budget: Opposition to focus on corruption fight
Tennis chair umpire slapped with six-year ban after 'facilitating corruption'
Edo has regressed under Gov Okpebholo with massive corruption – Dr. Aziegbemi
ACU arrests Deputy Governor, two officials for corruption
LISTEN | Corruption, incompetence and a dark city: How City Power, Prasa flout the tender process
Tahkout Corruption Scandal: New Trial of Property Ownership
Swiss court acquits Dahdaleh 12 years after UK corruption trial collapse"""

In [5]:
text1

'The Road from ‘Citizens United’ to Trump, Musk, and Corruption\nNewsom\'s ties to CCP under microscope in new book exposing alleged corruption: \'Fleeced American citizens\'\nTennis chair ump banned 6 years for corruption\nOpinion | Here’s How New York Police Officers Start Their Day by Breaking the Law\nWhat to Know About the Status of the Eric Adams Corruption Case\nLawyer appointed in NYC Mayor Eric Adams\' case recommends dropping charges\nMayor Adams’ corruption charges should be dropped for good, court-appointed lawyer says\nFeds seek San Leandro records naming Duong family as corruption probe expands beyond Oakland\nCrack down on corruption and deploy tech – Xi’s twin tasks for the military\nSheng Thao corruption case: US attorney\'s office seeks Duong family records from city of San Leandro following Bryan Azevedo raid\nEx-Liverpool mayor Joe Anderson charged with bribery in corruption probe\n‘Like a virus’: Corruption has infected the fight against climate change\nIf You Can 

### Text Classification

In [6]:
#hide_output
from transformers import pipeline

classifier = pipeline("text-classification")

In [7]:
import pandas as pd

outputs = classifier(text)
pd.DataFrame(outputs)

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.901546 |


The output above (a `label` of NEGATIVE with a `score` of 0.901546) is the result of the text-classification pipeline, which by default loads a sentiment-analysis model: it has detected a negative label together with a confidence score. Breaking down each part:

- `label`: the column holding the predicted label (category).
- `score`: the column holding the confidence score associated with that label.
- `0`: the row index. Each input text gets one row; since we passed a single text, there is only one.
- `NEGATIVE`: the predicted label. Here the model classified the text as negative.
- `0.901546`: the probability the model assigns to the NEGATIVE label. A value of 0.901546 means the model is quite confident (about 90%) that the text is negative.
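For a single-label classifier like this one, the score is a softmax probability computed from the model's raw output logits. A minimal sketch, assuming hypothetical logits for a two-label (NEGATIVE/POSITIVE) model, shows how a pair of logits becomes a label and a score like the one above; the logit values are invented for illustration, and `logits_to_prediction` is a hypothetical helper, not part of the library.

```python
import math

def logits_to_prediction(logits, labels):
    """Convert raw logits into a {'label', 'score'} dict like the pipeline returns."""
    # Softmax turns logits into probabilities that sum to 1
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted label is the one with the highest probability
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": labels[best], "score": probs[best]}

# Hypothetical logits: a gap of 2.2 between the two classes gives a score near 0.90
prediction = logits_to_prediction([1.1, -1.1], ["NEGATIVE", "POSITIVE"])
print(prediction)
```

Note that a high softmax score reflects the model's relative preference between labels, not a calibrated guarantee that the prediction is correct.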

In [27]:
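import pandas as pd

def keyword_classifier(text):
    """Toy rule-based classifier: labels each sentence by keyword (not a transformer model)."""
    # Split on periods, strip whitespace, and drop the empty string left after the final period
    sentences = [s.strip() for s in text.split('.') if s.strip()]
    results = []
    for sentence in sentences:
        if "happy" in sentence:
            results.append({"sentence": sentence, "sentiment": "positive"})
        elif "sad" in sentence:
            results.append({"sentence": sentence, "sentiment": "negative"})
        else:
            results.append({"sentence": sentence, "sentiment": "neutral"})
    return results

# New names, so the pipeline `classifier` and the Amazon letter in `text` are not overwritten
sample_text = "I am very happy. He is very sad. The weather is ok."
outputs = keyword_classifier(sample_text)
df = pd.DataFrame(outputs)
print(df)

            sentence sentiment
0    I am very happy  positive
1     He is very sad  negative
2  The weather is ok   neutral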


In [11]:
import pandas as pd
from transformers import pipeline

classifier = pipeline("text-classification")

# Truncate the input text. Note: max_position_embeddings is a limit in *tokens*,
# but this slice counts *characters*, so the model sees much less than its limit
max_length = classifier.model.config.max_position_embeddings
text1_truncated = text1[:max_length]

outputs1 = classifier(text1_truncated)
pd.DataFrame(outputs1)

|   | label | score |
|---|---|---|
| 0 | NEGATIVE | 0.9921 |
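Character slicing only shows the model the first few headlines of `text1`. An alternative, sketched below under the assumption that each line of `text1` is an independent headline, is to classify every headline separately (the pipeline also accepts a list of strings, e.g. `classifier(split_headlines(text1))`) and then aggregate the per-headline results. Both helpers are hypothetical, and the sample predictions are made up so the aggregation can run without downloading a model.

```python
from collections import Counter

def split_headlines(text):
    # Assumption: one headline per line; blank lines are dropped
    return [line.strip() for line in text.splitlines() if line.strip()]

def summarize_predictions(predictions):
    """Count labels and average the score per label (hypothetical helper)."""
    counts = Counter(p["label"] for p in predictions)
    mean_scores = {
        label: sum(p["score"] for p in predictions if p["label"] == label) / n
        for label, n in counts.items()
    }
    return counts, mean_scores

# In the notebook this would be: predictions = classifier(split_headlines(text1))
# Made-up predictions in the pipeline's output shape:
predictions = [
    {"label": "NEGATIVE", "score": 0.99},
    {"label": "NEGATIVE", "score": 0.95},
    {"label": "POSITIVE", "score": 0.80},
]
counts, mean_scores = summarize_predictions(predictions)
print(counts, mean_scores)
```

Classifying headlines one by one also keeps each input well under the model's token limit, so no truncation is needed.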


### Named Entity Recognition

In [12]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs = ner_tagger(text)
pd.DataFrame(outputs)

|   | entity_group | score | word | start | end |
|---|---|---|---|---|---|
| 0 | ORG | 0.87901 | Amazon | 5 | 11 |
| 1 | MISC | 0.990859 | Optimus Prime | 36 | 49 |
| 2 | LOC | 0.999755 | Germany | 90 | 97 |
| 3 | MISC | 0.556571 | Mega | 208 | 212 |
| 4 | PER | 0.590256 | ##tron | 212 | 216 |
| 5 | ORG | 0.669692 | Decept | 253 | 259 |
| 6 | MISC | 0.498349 | ##icons | 259 | 264 |
| 7 | MISC | 0.775362 | Megatron | 350 | 358 |
| 8 | MISC | 0.987854 | Optimus Prime | 367 | 380 |
| 9 | PER | 0.812096 | Bumblebee | 502 | 511 |


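#### In summary, this code performs the following steps:

1. Creates a named entity recognition (NER) pipeline with Hugging Face's transformers library. With `aggregation_strategy="simple"`, consecutive subword tokens that share an entity type are merged into a single entity.
2. Runs the NER pipeline on the text to detect named entities.
3. Converts the pipeline's results into a pandas DataFrame for easier inspection and analysis.

This grouping explains why "Megatron" and "Decepticons" still appear split in the table above: the model assigned different entity types to their subword pieces (for example, MISC to `Mega` and PER to `##tron`), so the pieces were not merged.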
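The grouping step behind `aggregation_strategy="simple"` can be illustrated with a simplified sketch: drop tokens tagged `O`, strip the `B-`/`I-` prefix from each tag, merge consecutive tokens of the same entity type (a `B-` tag starts a new entity), glue `##` subword pieces back together, and average the scores. This is an approximation for illustration, not the library's exact implementation; the token-level tags and scores below are invented to mirror the "Mega"/"##tron" split.

```python
def group_entities(tokens):
    """tokens: list of (word, tag, score) with BIO tags such as 'B-MISC' or 'I-PER'."""
    groups = []
    current = None
    for word, tag, score in tokens:
        if tag == "O":
            current = None  # outside any entity: close the open group
            continue
        prefix, _, entity = tag.partition("-")
        # Continue the open entity only on an I- tag of the same type
        if current is not None and prefix == "I" and current["entity_group"] == entity:
            current["pieces"].append(word)
            current["scores"].append(score)
        else:
            current = {"entity_group": entity, "pieces": [word], "scores": [score]}
            groups.append(current)
    result = []
    for g in groups:
        # Join WordPiece continuations ("##bee") without a space, other words with one
        text = g["pieces"][0]
        for piece in g["pieces"][1:]:
            text += piece[2:] if piece.startswith("##") else " " + piece
        result.append({"entity_group": g["entity_group"], "word": text,
                       "score": sum(g["scores"]) / len(g["scores"])})
    return result

tokens = [
    ("Optimus", "B-MISC", 0.99), ("Prime", "I-MISC", 0.98),
    ("of", "O", 0.99),
    ("Mega", "B-MISC", 0.56), ("##tron", "I-PER", 0.59),
    ("Bumble", "B-PER", 0.80), ("##bee", "I-PER", 0.82),
]
for entity in group_entities(tokens):
    print(entity)
```

Because `##tron` carries a different entity type than `Mega`, it starts its own group and keeps its `##` prefix, just like row 4 of the real output.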

In [13]:
ner_tagger = pipeline("ner", aggregation_strategy="simple")
outputs1 = ner_tagger(text1)
pd.DataFrame(outputs1)


### Question Answering

In [14]:
reader = pipeline("question-answering")
question = "What does the customer want?"
outputs = reader(question=question, context=text)
pd.DataFrame([outputs])

|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.631292 | 335 | 358 | an exchange of Megatron |


In [16]:
reader = pipeline("question-answering")
question = "What does the customer want?"

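# Truncate the context. Note: max_position_embeddings counts tokens, but this
# slice counts characters, so the model sees much less than its limit. The QA
# pipeline can also handle long contexts itself by splitting them into
# overlapping windows (see its doc_stride argument), so this cut is optional.
max_length = reader.model.config.max_position_embeddings  # maximum sequence length in tokens
truncated_context = text1[:max_length]  # keeps the first max_length characters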

outputs1 = reader(question=question, context=truncated_context)
pd.DataFrame([outputs1])

|   | score | start | end | answer |
|---|---|---|---|---|
| 0 | 0.065045 | 72 | 100 | ties to CCP under microscope |
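Instead of cutting the context, the question-answering pipeline can process long contexts by splitting the tokenized context into overlapping windows (controlled by its `max_seq_len` and `doc_stride` arguments) and keeping the best-scoring answer across windows. The sketch below illustrates only the windowing idea on a plain list of tokens; it is a simplification, since the real pipeline also reserves room for the question and special tokens, and its exact stride semantics may differ. The whitespace-style tokens and the `sliding_windows` helper are assumptions for illustration.

```python
def sliding_windows(tokens, window_size, overlap):
    """Split tokens into windows of at most window_size that share `overlap` tokens."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in [0, window_size)")
    step = window_size - overlap
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + window_size])
        # Stop once the current window reaches the end of the token list
        if start + window_size >= len(tokens):
            break
        start += step
    return windows

tokens = "a b c d e f g h i j".split()
for window in sliding_windows(tokens, window_size=4, overlap=2):
    print(window)
```

Every token lands in at least one window, and the overlap means an answer span near a window boundary still appears whole in some window; the pipeline would then compare the candidate answers by score.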


### Summarization

In [17]:
summarizer = pipeline("summarization")
outputs = summarizer(text, max_length=45, clean_up_tokenization_spaces=True)
print(outputs[0]['summary_text'])

 Bumblebee ordered an Optimus Prime action figure from your online store in
Germany. Unfortunately, when I opened the package, I discovered to my horror
that I had been sent an action figure of Megatron instead.


In [19]:
summarizer1 = pipeline("summarization")
# Get the model's maximum input length (in tokens)
max_positions = summarizer1.model.config.max_position_embeddings

# Truncate the input text. Note: this keeps the first max_positions *characters*,
# which is a much shorter cut than max_positions tokens
truncated_text1 = text1[:max_positions]

# Summarize the truncated text; here max_length=45 caps the length of the generated summary
outputs1 = summarizer1(truncated_text1, max_length=45, clean_up_tokenization_spaces=True)
print(outputs1[0]['summary_text'])

 Newsom's ties to CCP under microscope in new book exposing alleged corruption:
'Fleeced American citizens' Tennis chair ump banned 6 years for corruption. US
attorney's office seeks Duong family records from
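Here the summarizer only sees the beginning of `text1`, so later headlines never reach the model. One common workaround, sketched below as an assumption rather than a pipeline feature, is to pack whole headlines into chunks under a size budget, summarize each chunk, and optionally summarize the concatenated summaries. The budget is counted in characters for simplicity (a token count from the model's tokenizer would be more precise), and `pack_lines` is a hypothetical helper.

```python
def pack_lines(text, max_chars):
    """Greedily group whole lines into chunks of at most max_chars characters."""
    chunks, current = [], ""
    for line in (raw.strip() for raw in text.splitlines()):
        if not line:
            continue
        candidate = f"{current}\n{line}" if current else line
        if len(candidate) <= max_chars or not current:
            # Add the line; a single over-long line still gets a chunk of its own
            current = candidate
        else:
            chunks.append(current)
            current = line
    if current:
        chunks.append(current)
    return chunks

sample = "First headline\nSecond headline\nThird one"
print(pack_lines(sample, max_chars=32))
# In the notebook, each chunk could then be summarized, for example:
# summaries = [summarizer1(c, max_length=45)[0]["summary_text"] for c in pack_lines(text1, 1000)]
```

Packing whole lines avoids cutting a headline in half, which is what happens with the plain character slice above.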


### Translation

In [20]:
translator = pipeline("translation_en_to_de",
                      model="Helsinki-NLP/opus-mt-en-de")
outputs = translator(text, clean_up_tokenization_spaces=True, min_length=100)
print(outputs[0]['translation_text'])

Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus
Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete,
entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von
Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich
hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere
einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt.
Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von
Ihnen zu hören. Aufrichtig, Bumblebee.


In [22]:
translator = pipeline("translation_en_to_de",
                      model="Helsinki-NLP/opus-mt-en-de")

# Get the model's maximum input length (in tokens)
max_length = translator.model.config.max_position_embeddings

# Truncate the input text. Note: this keeps the first max_length *characters*,
# a much shorter cut than max_length tokens
truncated_text1 = text1[:max_length]

# Perform translation on the truncated text
outputs1 = translator(truncated_text1, clean_up_tokenization_spaces=True, min_length=100)
print(outputs1[0]['translation_text'])

Der Weg von „Bürger United" zu Trump, Musk und Korruption Newsoms Verbindungen
zur KPCh unter dem Mikroskop in neuem Buch, das angebliche Korruption aufdeckt:
"Fleeced American Citizens' Tennis Chair ump verbot 6 Jahre für Korruption
Meinung.. Hier..................................................................
................................................................................
.............................................................................
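The degenerate run of dots suggests the model struggles with a long block of unrelated headlines that is cut off mid-headline; forcing `min_length=100` may also push it to pad its output. A sketch of an alternative, assuming each line of `text1` is a self-contained headline: translate the headlines as a batch of separate inputs (translation pipelines accept a list of strings and return one `{'translation_text': ...}` dict per input) and rejoin the results. The `translate` argument stands in for the `translator` pipeline so the helper can be tested without downloading a model; `translate_headlines` and `fake_translate` are hypothetical.

```python
def translate_headlines(text, translate):
    """Translate each non-empty line separately and rejoin them with newlines.

    `translate` is any callable that takes a list of strings and returns a list of
    {'translation_text': ...} dicts, the output shape of a translation pipeline.
    """
    headlines = [line.strip() for line in text.splitlines() if line.strip()]
    if not headlines:
        return ""
    outputs = translate(headlines)
    return "\n".join(out["translation_text"] for out in outputs)

# Hypothetical stand-in for the pipeline, used only to exercise the helper
def fake_translate(batch):
    return [{"translation_text": s.upper()} for s in batch]

print(translate_headlines("Corruption case dropped\n\nCourt acquits", fake_translate))
# With the real model: translate_headlines(text1, translator)
```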


### Text Generation

In [23]:
#hide
from transformers import set_seed
set_seed(42) # Set the seed to get reproducible results

In [24]:
generator = pipeline("text-generation")
response = "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt = text + "\n\nCustomer service response:\n" + response
outputs = generator(prompt, max_length=200)
print(outputs[0]['generated_text'])

Dear Amazon, last week I ordered an Optimus Prime action figure from your online
store in Germany. Unfortunately, when I opened the package, I discovered to my
horror that I had been sent an action figure of Megatron instead! As a lifelong
enemy of the Decepticons, I hope you can understand my dilemma. To resolve the
issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered.
Enclosed are copies of my records concerning this purchase. I expect to hear
from you soon. Sincerely, Bumblebee.

Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. I am satisfied
with your purchase. I will gladly see your product and the packaging again
before you go out buying at your shop! I'm a small business owner and have not
been able to deliver my product to customer before and after an Amazon purchase,
so I apologize.

Customer service response:

All of my orders arrived


In [26]:
generator = pipeline("text-generation")
response = "Corruption in our countries"

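# Get the model's maximum sequence length (in tokens)
max_length = generator.model.config.max_position_embeddings

# Truncate the prompt. Note: this slice counts characters, not tokens, so the
# -50 is only a rough safety margin rather than an exact allowance for special tokens
prompt = text1[:max_length - len(response) - 50] + "\n\nCustomer service response:\n" + response

# max_length=200 here counts the prompt tokens as well as the generated ones, so a
# prompt this long leaves little or no room for new text (recent versions of
# transformers also accept max_new_tokens to bound only the generated part)
outputs1 = generator(prompt, max_length=200)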
print(outputs1[0]['generated_text'])

The Road from ‘Citizens United’ to Trump, Musk, and Corruption
Newsom's ties to CCP under microscope in new book exposing alleged corruption:
'Fleeced American citizens'
Tennis chair ump banned 6 years for corruption
Opinion | Here’s How New York Police Officers Start Their Day by Breaking the
Law
What to Know About the Status of the Eric Adams Corruption Case
Lawyer appointed in NYC Mayor Eric Adams' case recommends dropping charges
Mayor Adams’ corruption charges should be dropped for good, court-appointed
lawyer says
Feds seek San Leandro records naming Duong family as corruption probe expands
beyond Oakland
Crack down on corruption and deploy tech – Xi’s twin tasks for the military
Sheng Thao corruption case: US attorney's office seeks Duong family records from
city of San Leandro following Bryan Azevedo raid
Ex-Liverpool mayor Joe Anderson charged with bribery in corruption probe
‘Like a virus’: Corruption has infected the fight

Customer service response:
Corruption in our countr
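Because `max_length` in the generation call counts prompt tokens as well as new ones, a long prompt leaves little room for the response. A minimal sketch of the budget arithmetic on token ID lists rather than characters: shorten the context so that context, suffix, and the tokens reserved for the reply all fit in the model's context window. The `fit_prompt` helper and the choice to keep the beginning of the context are assumptions; in the notebook, the token IDs would come from `generator.tokenizer(...)["input_ids"]`.

```python
def fit_prompt(context_ids, suffix_ids, context_window, max_new_tokens):
    """Truncate context_ids so context + suffix + new tokens fit in context_window.

    The suffix (e.g. the "Customer service response:" part) is kept whole;
    only the context is shortened, keeping its beginning.
    """
    budget = context_window - max_new_tokens - len(suffix_ids)
    if budget < 0:
        raise ValueError("suffix and max_new_tokens alone exceed the context window")
    return context_ids[:budget] + suffix_ids

# Toy numbers: a 1024-token window, 1000 context tokens, a 10-token suffix,
# and 100 tokens reserved for the generated response
prompt_ids = fit_prompt(list(range(1000)), list(range(10)), context_window=1024, max_new_tokens=100)
print(len(prompt_ids))  # 924: 914 context tokens + 10 suffix tokens
```

With the prompt sized this way, generating up to 100 new tokens stays within the 1024-token window, and the response prefix is never cut mid-word as in the output above.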

## The Hugging Face Ecosystem

<img alt="ecosystem" width="500" caption="An overview of the Hugging Face ecosystem of libraries and the Hub." src="images/chapter01_hf-ecosystem.png" id="ecosystem"/>

### The Hugging Face Hub

<img alt="hub-overview" width="1000" caption="The models page of the Hugging Face Hub, showing filters on the left and a list of models on the right." src="images/chapter01_hub-overview.png" id="hub-overview"/>

<img alt="hub-model-card" width="1000" caption="An example model card from the Hugging Face Hub. The inference widget is shown on the right, where you can interact with the model." src="images/chapter01_hub-model-card.png" id="hub-model-card"/>

### Hugging Face Tokenizers

### Hugging Face Datasets

### Hugging Face Accelerate

## Main Challenges with Transformers

## Conclusion