# Pre-Trained Models with Pipelines (Part II)

In this tutorial, we continue to explore a very convenient way to use pre-trained models from the *transformers* library: *pipelines*.

Have fun!



In [1]:
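# The outputs in this notebook were produced with transformers 4.15.0 (see the log below);
# other versions may print different warnings or results.
!pip install transformers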

Collecting transformers
  Downloading transformers-4.15.0-py3-none-any.whl (3.4 MB)
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.4.0-py3-none-any.whl (67 kB)
Collecting tokenizers<0.11,>=0.10.1
  Downloading tokenizers-0.10.3-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (3.3 MB)
Collecting sacremoses
  Downloading sacremoses-0.0.47-py2.py3-none-any.whl (895 kB)
Installing collected packages: pyyaml, tokenizers, sacremoses, huggingface-hub, transformers

# 1. Text Generation
Models trained for the classic language modeling task (also known as causal language modeling: predicting the next token from the preceding ones) can be used for text generation. By default, this pipeline uses GPT-2.

Let's try it. 

In [17]:
from transformers import pipeline
text_generator = pipeline("text-generation")

No model was supplied, defaulted to gpt2 (https://huggingface.co/gpt2)


## 1.1 Generating using Greedy Search

In [18]:
text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=False)
print(text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


As far as I am concerned, I will be the first to admit that I am not a fan of the idea of a "free market." I think that the idea of a free market is a bit of a stretch. I think that the idea of a free market is a bit of a stretch. I think that the idea of a free market is a bit of a stretch. I think that the idea of a free market is a bit of a stretch. I think that the idea of a
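Greedy search appends the single most probable next token at every step, which is why it tends to lock into loops like the one above. The sketch below reproduces that behaviour on a made-up bigram "language model" (`TOY_LM` and `greedy_generate` are hypothetical names for illustration, not part of *transformers*); a real model like GPT-2 conditions on the whole prefix, not only on the previous word.

```python
# A made-up bigram "language model": next-token probabilities given only the previous token.
TOY_LM = {
    "I": {"think": 0.6, "will": 0.4},
    "think": {"that": 0.7, "it": 0.3},
    "that": {"I": 0.55, "the": 0.45},
    "will": {"win": 1.0},
    "it": {"is": 1.0},
    "the": {"end": 1.0},
    "win": {".": 1.0},
    "is": {".": 1.0},
    "end": {".": 1.0},
}
EOS = "."  # treat a full stop as the end-of-sequence token


def greedy_generate(lm, start, max_new_tokens):
    """Greedy search: always append the single most probable next token."""
    tokens = [start]
    for _ in range(max_new_tokens):
        probs = lm[tokens[-1]]
        next_token = max(probs, key=probs.get)
        tokens.append(next_token)
        if next_token == EOS:
            break
    return tokens


print(" ".join(greedy_generate(TOY_LM, "I", 8)))  # I think that I think that I think that
```

The highest-probability path "I → think → that → I" is a cycle, so greedy search never escapes it, even though other continuations have substantial probability.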


## 1.2 Randomly Sampling the Next Word from Its Conditional Probability Distribution

In [19]:

text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=True)
print(text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


As far as I am concerned, I will just throw it under the bus and let it die in the open."

HuffPo's Benjy Sarlin wrote of Mr Trump's suggestion that he did not meet the standards of "political correctness" at a major news conference. "Trump will win only by appealing to the public," Mr Sarlin wrote.

The New York Times, which won the Pulitzer Prize for publishing an article last year about Mr Trump, also quoted a source
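With `do_sample=True`, the next token is instead drawn at random according to its conditional probability, so less likely words occasionally appear. A minimal sketch with a made-up next-token distribution (`sample_next` is a hypothetical helper, not a library function):

```python
import random


def sample_next(probs, rng):
    """Draw one token at random, weighted by its conditional probability."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up distribution over the word following "As far as I am concerned, I will".
next_token_probs = {"be": 0.5, "continue": 0.3, "just": 0.15, "have": 0.05}

rng = random.Random(0)  # fixed seed so the counts are reproducible
counts = {t: 0 for t in next_token_probs}
for _ in range(10_000):
    counts[sample_next(next_token_probs, rng)] += 1
print(counts)  # roughly proportional to the probabilities
```

Because sampling is random, rerunning In [19] gives a different continuation each time. In *transformers*, calling `set_seed(...)` (importable from `transformers`) before generating makes sampled outputs reproducible.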


## 1.3 Beam Search

With beam search, other high-probability sequences get a chance, too. Try different numbers of beams.

In [23]:

text = text_generator("As far as I am concerned, I will", max_length=100, num_beams=5)
print(text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


As far as I am concerned, I will continue to work hard to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are doing everything possible to make sure that we are
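Beam search keeps the `num_beams` highest-scoring partial sequences (scored by summed log-probability) instead of only one. In the made-up example below, greedy search commits to "nice" because it is the likeliest first word, while a beam of two also keeps "dog" and finds the more probable sequence "The dog has" (0.4 × 0.9 = 0.36 versus 0.5 × 0.4 = 0.2). `TOY_LM` and `beam_search` are illustrative only, not the library's implementation.

```python
import math

# Made-up next-token probabilities given the previous token.
TOY_LM = {
    "The": {"nice": 0.5, "dog": 0.4, "car": 0.1},
    "nice": {"woman": 0.4, "house": 0.3, "guy": 0.3},
    "dog": {"has": 0.9, "runs": 0.05, "and": 0.05},
    "car": {"is": 0.7, "drives": 0.3},
}


def beam_search(lm, start, steps, num_beams):
    """Return the final beams as (tokens, summed log-probability), best first."""
    beams = [([start], 0.0)]
    for _ in range(steps):
        candidates = []
        for tokens, score in beams:
            next_probs = lm.get(tokens[-1])
            if not next_probs:  # no known continuation: carry the beam over unchanged
                candidates.append((tokens, score))
                continue
            for token, p in next_probs.items():
                candidates.append((tokens + [token], score + math.log(p)))
        # Keep only the num_beams best partial sequences.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams


best, best_score = beam_search(TOY_LM, "The", steps=2, num_beams=2)[0]
print(" ".join(best), round(math.exp(best_score), 2))  # The dog has 0.36
greedy, _ = beam_search(TOY_LM, "The", steps=2, num_beams=1)[0]
print(" ".join(greedy))  # The nice woman
```

With `num_beams=1` the procedure reduces to greedy search. As the 1.3 output shows, beam search alone does not prevent repetition, because a repeated phrase can keep a high overall probability.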


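## 1.4 Stopping the Annoying Repetition

Setting `no_repeat_ngram_size=3` prevents any 3-gram (three consecutive tokens) from appearing twice in the output. Try different n-gram sizes.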

In [24]:

text = text_generator("As far as I am concerned, I will", max_length=100, num_beams=5, no_repeat_ngram_size=3)
print(text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


As far as I am concerned, I will continue to work on this.

I hope that you all have enjoyed reading this, and I hope to see you all again soon.
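`no_repeat_ngram_size=n` works by banning, at each step, any token that would complete an n-gram already present in the output. A minimal sketch of that check on whitespace-separated words (`banned_next_tokens` is a hypothetical helper; the library applies the same idea to token IDs inside `generate()`):

```python
def banned_next_tokens(tokens, n):
    """Return the tokens that would complete an n-gram already present in `tokens`."""
    if n < 1 or len(tokens) < n:
        return set()  # no complete n-gram has been generated yet
    prefix = tuple(tokens[len(tokens) - (n - 1):])  # the last n-1 tokens
    banned = set()
    for i in range(len(tokens) - n + 1):
        # If an earlier n-gram starts with the same n-1 tokens, its last token is banned.
        if tuple(tokens[i:i + n - 1]) == prefix:
            banned.add(tokens[i + n - 1])
    return banned


words = "to make sure that we are doing everything possible to make sure".split()
print(banned_next_tokens(words, 3))  # {'that'}
```

Here the last two words, "make sure", were followed by "that" earlier, so "that" is forbidden next and the loop seen in 1.3 cannot restart.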


## 1.5 Top-k Sampling

Sampling can help avoid boring, repetitive text. With top-k sampling, the next word is drawn only from the `top_k` most likely candidates.

In [25]:

text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=True, top_k=10)
print(text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


As far as I am concerned, I will have the most favorable relationship with my husband. He's a great friend and a great family man.

I would like to say a few words on the topic, but it's a very important topic.

I don't think that the way you describe him is the way you want it to be. If you want to have a relationship that is a little more romantic than we've been talking, then you need to have a partner who can
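Top-k sampling first discards all but the `top_k` most likely tokens, renormalizes the remaining probabilities, and then samples from them. The filtering step can be sketched as follows (made-up probabilities; `top_k_filter` is a hypothetical helper, whereas the library performs this step on logits inside `generate()`):

```python
def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize their probabilities to sum to 1."""
    kept = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"have": 0.4, "be": 0.3, "just": 0.2, "never": 0.1}
filtered = top_k_filter(probs, 2)
print({t: round(p, 3) for t, p in filtered.items()})  # {'have': 0.571, 'be': 0.429}
```

A fixed `k` ignores the shape of the distribution: it may cut off plausible words when many are likely, or keep unlikely ones when only a few are.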


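## 1.6 Top-p (Nucleus) Sampling

Top-p sampling draws the next word from the smallest set of candidates whose cumulative probability exceeds `top_p`.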

In [27]:
text = text_generator("As far as I am concerned, I will", max_length=100, do_sample=True, top_p=0.9)
print(text[0]['generated_text'])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


As far as I am concerned, I will have to go to see Dr. John and Dr. S. A. in the morning," said Dr. Smith in a little more than an hour's speech. The young man's voice shook the city like a mighty bell and in the hour's silence all the clergy assembled began to speak in one loud voice. The young men, who, in order to be able to understand Dr. Smith, had to hold on to a piece of white cloth,
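Top-p sampling keeps the smallest set of most likely tokens whose cumulative probability reaches `top_p`, so the number of candidates adapts to how peaked the distribution is. A sketch with made-up probabilities (`top_p_filter` is a hypothetical helper; the library's implementation may treat the exact threshold boundary slightly differently):

```python
def top_p_filter(probs, p):
    """Keep the most probable tokens until their cumulative probability reaches p, then renormalize."""
    kept, cumulative = [], 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept.append((token, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return {token: prob / total for token, prob in kept}


peaked = {"have": 0.5, "be": 0.3, "just": 0.15, "never": 0.05}
flat = {"have": 0.25, "be": 0.25, "just": 0.25, "never": 0.25}
print(sorted(top_p_filter(peaked, 0.9)))  # ['be', 'have', 'just']
print(len(top_p_filter(flat, 0.9)))  # 4
```

With the same `top_p=0.9`, three candidates survive for the peaked distribution but all four for the flat one.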


# 2. Text Summarization
Summarization condenses a long text or article into a shorter one. By default, this pipeline uses DistilBART (`sshleifer/distilbart-cnn-12-6`), a distilled BART model fine-tuned on the CNN/Daily Mail dataset.

In [28]:
#=====summarization
from transformers import pipeline
summarizer = pipeline("summarization")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 (https://huggingface.co/sshleifer/distilbart-cnn-12-6)


In [29]:
ARTICLE = """Democrats formally nominated Joe Biden for president on Tuesday (Aug 18), with elder statesmen and rising stars promising he would  repair a pandemic-devastated America and end the chaos of Republican President Donald Trump.
The convention's second night, under the theme "Leadership Matters", aimed to make the case that Biden would represent a return to normalcy.
"At a time like this, the Oval Office should be a command centre," former US President Bill Clinton said in a prerecorded video. 
"Instead, it's a storm centre. There's only chaos. Just one thing never changes - his determination to deny responsibility and shift the blame."
With the four-day convention largely virtual due to the coronavirus, delegates from around the country cast votes remotely to confirm Biden as the nominee.
In clips from around the country, Democrats of all stripes explained why they were supporting Biden while putting their own state-specific spin on the proceedings, from a calamari appetiser in Rhode Island to a herd of cattle in Montana.
Following his home state of Delaware, which went last in his honor, Biden appeared live for the first time at a Delaware school, where his wife, Jill, was set to deliver the night's headline address later in the evening.
"Thank you very, very much from the bottom of my heart," said Biden, who will deliver his acceptance speech on Thursday. "It means the world to me and my family."
Democratic presidential candidate and former Vice President Joe Biden and running mate Senator Kamala Harris are seen on screen at virtual 2020 Democratic Convention hosted from Milwaukee, Wisconsin.
The programme started by showcasing some of the party's rising politicians. But rather than a single keynote speech that could be a star-making turn, as it was for then-state Senator Barack Obama in 2004, the programme featured 17 stars in a video address, including Stacey Abrams, the one-time Georgia gubernatorial nominee whom Biden once considered for a running mate.
"America faces a triple threat: A public health catastrophe, and economic collapse and a reckoning with racial justice and inequality," Abrams said. 
"So our choice is clear: A steady experienced public servant who can lead us out of this crisis just like he's done before, or a man who only knows how to deny and distract."
As they did on Monday's opening night, Democrats featured a handful of Republicans who have crossed party lines to praise Biden, 77, over Trump, 74, ahead of the Nov 3 election.
Cindy McCain, widow of Republican Senator John McCain, was scheduled to appear in a video talking about her husband's long friendship with Biden, according to a preview posted online. Trump clashed with McCain, who was the Republican nominee for president in 2008, and the president criticised McCain even after his 2018 death.
Republican former Secretary of State Colin Powell, a retired four-star general who endorsed Biden in June, was one of several national security officials due to speak on the Democrat's behalf.
"Our country needs a commander in chief who takes care of our troops in the same way he would his own family," he said. 
“He will trust our diplomats and our intelligence community, not the flattery of dictators and despots. He will make it his job to know when anyone dares to threaten us. He will stand up to our adversaries with strength and experience. They will know he means business.”
Democratic former Secretary of State John Kerry said of Trump: "When this president goes overseas, it isn’t a goodwill mission, it’s a blooper reel. He breaks up with our allies and writes love letters to dictators. America deserves a president who is looked up to, not laughed at."
Biden's vice presidential pick, Senator Kamala Harris, will headline Wednesday night's programme along with Obama.
Without the cheering crowds at the in-person gathering originally planned for Milwaukee, Wisconsin, TV viewership on Monday was down from 2016. But an additional 10.2 million people watched on digital platforms, the Biden campaign said, for a total audience of nearly 30 million.
Aiming to draw attention away from Biden, Trump, trailing in opinion polls, held a campaign rally in Arizona, a hotly contested battleground state that can swing to either party and play a decisive role in the election.
The convention was being held amid worries about the safety of in-person voting. Democrats have pushed mail-in ballots as an alternative and pressured the head of the US Postal Service, a top Trump donor, to suspend cost cuts that delayed mail deliveries. 
Bowing to that pressure, Postmaster General Louis DeJoy put off the cost-cutting measures until after the election.
"""


In [30]:
print(summarizer(ARTICLE, max_length=100, min_length=20, do_sample=False))

[{'summary_text': " Democrats formally nominate Joe Biden for president on Tuesday (Aug 18), with elder statesmen and rising stars promising he would repair a pandemic-devastated America and end the chaos of Republican President Donald Trump . Biden appeared live for the first time at a Delaware school, where his wife, Jill, was set to deliver the night's headline address later in the evening . Biden's vice presidential pick, Senator Kamala Harris, will headline Wednesday night's programme along with Obama ."}]


We can also use T5 for summarization. Here we load the `t5-base` model and tokenizer directly instead of going through a pipeline.

In [31]:
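from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
# AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM loads encoder-decoder models such as T5
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")
tokenizer = AutoTokenizer.from_pretrained("t5-base")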



In [34]:
# The t5-base tokenizer's maximum input length is 512 tokens, so we truncate the article to 512 tokens.
# The "summarize: " prefix tells T5 which task to perform.
inputs = tokenizer.encode("summarize: " + ARTICLE, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(inputs, max_length=100, min_length=20, repetition_penalty=2.5, length_penalty=1.0, num_beams=2, early_stopping=True)
print(outputs)

tensor([[    0,     3, 22878,    45,   300,     8,   684,  4061, 11839, 20081,
            12,  3606,  4967,  2106,   537,    38,  2753,     3,     5,    96,
           155,   598,     8,   296,    12,   140,    11,    82,   384,   976,
           845,  2647,   537,     6,   113,    56,  2156, 11122,  5023,    30,
          2721,     3,     5,  4291,  8346,  6523,    57, 11064,    18,    89,
            23,  6079,     3,    17,   208,  1229,     3,    75,    29,    29,
            19,   294,    13,  6503,  2066,     3,     5,     1]])


In [35]:
print(tokenizer.decode(outputs[0]))

<pad> delegates from around the country cast votes remotely to confirm Joe Biden as president. "it means the world to me and my family," says biden, who will deliver acceptance speech on Thursday. virtual convention hosted by wi-fi giant tv network cnn is part of 2020 campaign.</s>
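The `generate()` call in In [34] also sets `repetition_penalty` and `length_penalty`. As we understand the *transformers* implementation, the repetition penalty follows the CTRL scheme (the logit of any token already generated is divided by the penalty if positive and multiplied by it if negative), and `length_penalty` divides a finished beam's summed log-probability by its length raised to that power, so larger values favour longer outputs. A sketch of both formulas with made-up numbers (the function names are hypothetical):

```python
def apply_repetition_penalty(logits, generated_ids, penalty):
    """Lower the logits of tokens that already appear in generated_ids."""
    out = list(logits)
    for token_id in set(generated_ids):
        score = out[token_id]
        # Dividing a positive logit or multiplying a negative one both reduce it when penalty > 1.
        out[token_id] = score / penalty if score > 0 else score * penalty
    return out


def beam_score(sum_logprobs, length, length_penalty):
    """Length-normalized score used to rank finished beam hypotheses."""
    return sum_logprobs / (length ** length_penalty)


print(apply_repetition_penalty([2.0, -1.0, 0.5], generated_ids=[0, 1], penalty=2.5))  # [0.8, -2.5, 0.5]

# A short and a long hypothesis: (summed log-probability, length), both made up.
short_hyp, long_hyp = (-4.0, 2), (-6.0, 4)
for lp in (0.0, 1.0):
    print(lp, beam_score(*short_hyp, lp), beam_score(*long_hyp, lp))
```

With `length_penalty=0.0` the shorter hypothesis wins (-4.0 > -6.0); with `1.0`, the value used above, the longer one does (-1.5 > -2.0).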


You can try both models on other example texts, such as the two below: assign either one to `ARTICLE` and rerun the summarization cells.
ARTICLE = """Hotels in Mumbai and other Indian cities are to train their staff to spot signs of sex trafficking such as frequent requests for bed linen changes or a "Do not disturb" sign left on the door for days on end. The group behind the initiative is also developing a mobile phone app - Rescue Me - which hotel staff can use to alert local police and senior anti-trafficking officers if they see suspicious behavior. "Hotels are breeding grounds for human trade," said Sanee Awsarmmel, chairman of the alumni group of Maharashtra State Institute of Hotel Management and Catering Technology. "(We) have hospitality professionals working in hotels across the country. We are committed to this cause." The initiative, spearheaded by the alumni group and backed by the Maharashtra state government, comes amid growing international recognition that hotels have a key role to play in fighting modern day slavery. MAHARASHTRA MAJOR DESTINATION FOR TRAFFICKED GIRLS Maharashtra, of which Mumbai is the capital, is a major destination for trafficked girls who are lured from poor states and nearby countries on the promise of jobs, but then sold into the sex trade or domestic servitude.
With rising property prices, some traditional red light districts like those in Mumbai have started to disappear pushing the sex trade underground into private lodges and hotels, which makes it hard for police to monitor. Awsarmmel said hotels would be told about 50 signs that staff needed to watch out for. These include requests for rooms with a view of the car park which are favored by traffickers as they allow them to vet clients for signs of trouble and check out their cars to gauge how much to charge. Awsarmmel said hotel staff often noticed strange behavior such as a girl's reticence during the check-in process or her dependence on the person accompanying her to answer questions and provide her proof of identity. But in most cases, staff ignore these signs or have no idea what to do, he told the Thomson Reuters Foundation. RESCUE ME APP The Rescue Me app - to be launched in a couple of months - will have a text feature where hotel staff can fill in details including room numbers to send an alert to police. Human trafficking is the world's fastest growing criminal enterprise worth an estimated $150 billion a year, according to the International Labor Organization, which says nearly 21 million people globally are victims of forced labor and trafficking. Last year, major hotel groups, including the Hilton and Shiva Hotels, pledged to examine their supply chains for forced labor, and train staff how to spot and report signs of trafficking. Earlier this year, Mexico City also launched an initiative to train hotel staff about trafficking. Vijaya Rahatkar, chairwoman of the Maharashtra State Women's Commission, said the initiative would have an impact beyond the state as the alumni group had contact with about a million small hotels across India. The group is also developing a training module on trafficking for hotel staff and hospitality students which could be used across the country.
"""
ARTICLE = """At approximately 7:00 a.m. on September 17, 2008, Employee #1, a forklift driver for the Sweetener Products Company, was working at the railroad dock in the warehouse. The company converts granulated sugar into liquid sugar products. Employee #1's duties included off-loading railcars of granulated sugar. He was walking near the railroad tracks to get two support tubes that are placed under the loading ramp for additional support while forklifts are unloading rail cars. A coworker was in the warehouse, lowering the ramp so they could off-load a rail car. To lower the ramp, the employees must push the ramp until it leans forward. The employees then push a button on the wall, adjacent to the dock door, to activate the hydraulic system that controls the ramp plate. However, when the coworker pushed on the ramp plate, it fell, striking Employee #1 on the back of his head and neck. He was transported to USC Medical Center, where doctors performed a MRI and determined that he was able to be released. He told his treating physician that he was in intense pain and unable to walk. When the physician informed him that he was going to be sent home, his wife informed the physician that he was covered by private insurance through Kaiser. A nurse from USC verified his coverage and made arrangements to have Employee #1 transported to Kaiser Sunset. He had sustained fractures to the back of his neck. He underwent surgery on September 20, 2008, and was hospitalized for nine days.
"""

# 3. Machine Translation
T5 was also trained to translate from English into French, German, and Romanian. The translation pipeline uses `t5-base` by default.

In [6]:
#=====translation===
from transformers import pipeline
translator = pipeline("translation_en_to_fr")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)


[{'translation_text': 'Hugging Face est une entreprise technologique basée à New York et à Paris.'}]


In [7]:
translator = pipeline("translation_en_to_de")
print(translator("Hugging Face is a technology company based in New York and Paris", max_length=40))

No model was supplied, defaulted to t5-base (https://huggingface.co/t5-base)


[{'translation_text': 'Hugging Face ist ein Technologieunternehmen mit Sitz in New York und Paris.'}]
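As with summarization ("summarize: "), T5 selects the translation task from a text prefix on the input. The sketch below builds that prefix (`t5_translation_input` is a hypothetical helper); the resulting string could be passed to the `t5-base` tokenizer and `model.generate()` from Section 2 in the same way as the summarization input.

```python
def t5_translation_input(text, target_language, source_language="English"):
    """Prepend the task prefix T5 uses for translation, e.g. 'translate English to German: '."""
    return f"translate {source_language} to {target_language}: {text}"


print(t5_translation_input("Hugging Face is a technology company based in New York and Paris", "German"))
# translate English to German: Hugging Face is a technology company based in New York and Paris
```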


# 4. Conversation
Models trained on dialogue data can generate conversational responses to user inputs.

In [36]:
from transformers import pipeline, Conversation
chat = pipeline("conversational")

No model was supplied, defaulted to microsoft/DialoGPT-medium (https://huggingface.co/microsoft/DialoGPT-medium)


In [38]:

conversation_1 = Conversation("Going to the movies tonight - any suggestions?")
conversation_2 = Conversation("What's the last book you have read?")
chat([conversation_1, conversation_2])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[Conversation id: 17277519-dbba-4358-80fa-d25ad1169df1 
 user >> Going to the movies tonight - any suggestions? 
 bot >> The Big Lebowski ,
 Conversation id: 1bc4c56e-3012-4a28-9059-dbcd4d39de87 
 user >> What's the last book you have read? 
 bot >> The Last Question ]

In [40]:
conversation_1.add_user_input("Is it an action movie?")
conversation_2.add_user_input("What is the genre of this book?")

chat([conversation_1, conversation_2])

User input added while unprocessed input was existing: "Is it an action movie?" new input ignored: "Is it an action movie?". Set `overwrite` to True to overwrite unprocessed user input
User input added while unprocessed input was existing: "What is the genre of this book?" new input ignored: "What is the genre of this book?". Set `overwrite` to True to overwrite unprocessed user input
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[Conversation id: 17277519-dbba-4358-80fa-d25ad1169df1 
 user >> Going to the movies tonight - any suggestions? 
 bot >> The Big Lebowski 
 user >> Is it an action movie? 
 bot >> It's a comedy. , Conversation id: 1bc4c56e-3012-4a28-9059-dbcd4d39de87 
 user >> What's the last book you have read? 
 bot >> The Last Question 
 user >> What is the genre of this book? 
 bot >> I'm not sure, but I think it's fantasy. ]

In [41]:
conv = Conversation("Going to the movies tonight. Do you have any suggestion for action movie?")
chat([conv])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Conversation id: 80e31b8a-00da-4055-b45c-b48b073c48ae 
user >> Going to the movies tonight. Do you have any suggestion for action movie? 
bot >> The Martian 

In [42]:
conv.add_user_input("Which theatre is showing it?")
chat([conv])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Conversation id: 80e31b8a-00da-4055-b45c-b48b073c48ae 
user >> Going to the movies tonight. Do you have any suggestion for action movie? 
bot >> The Martian 
user >> Which theatre is showing it? 
bot >> I saw it at the AMC in the Mission. 

In [43]:
conv.add_user_input("I want two tickets for The Martian.")
chat([conv])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Conversation id: 80e31b8a-00da-4055-b45c-b48b073c48ae 
user >> Going to the movies tonight. Do you have any suggestion for action movie? 
bot >> The Martian 
user >> Which theatre is showing it? 
bot >> I saw it at the AMC in the Mission. 
user >> I want two tickets for The Martian. 
bot >> I'll take one. 

In [44]:
conv.add_user_input("Can you help me to book two tickets for it?")
chat([conv])

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


Conversation id: 80e31b8a-00da-4055-b45c-b48b073c48ae 
user >> Going to the movies tonight. Do you have any suggestion for action movie? 
bot >> The Martian 
user >> Which theatre is showing it? 
bot >> I saw it at the AMC in the Mission. 
user >> I want two tickets for The Martian. 
bot >> I'll take one. 
user >> Can you help me to book two tickets for it? 
bot >> I can't. I'm not a ticket holder. 
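The conversational pipeline feeds the model the whole dialogue so far, which is how the bot can answer "Which theatre is showing it?" in context. For DialoGPT, each turn is followed by the end-of-text token `<|endoftext|>` (ID 50256, the same ID that appears in the `pad_token_id` messages above), and, as far as we can tell, overly long histories are cut from the front so the most recent turns are kept. A simplified sketch, assuming whitespace splitting in place of the real BPE tokenizer (`build_dialogue_input` is a hypothetical helper):

```python
EOS = "<|endoftext|>"  # GPT-2/DialoGPT end-of-text token (ID 50256)


def build_dialogue_input(turns, max_tokens):
    """Flatten alternating user/bot turns into one token list for a DialoGPT-style model.

    Each turn is followed by EOS; if the result is longer than max_tokens (> 0),
    only the most recent tokens are kept. Whitespace splitting stands in for
    the real BPE tokenizer.
    """
    tokens = []
    for turn in turns:
        tokens.extend(turn.split())
        tokens.append(EOS)
    return tokens[-max_tokens:]


turns = [
    "Going to the movies tonight. Do you have any suggestion for action movie?",  # user
    "The Martian",  # bot
    "Which theatre is showing it?",  # user
]
full = build_dialogue_input(turns, max_tokens=1024)
print(" ".join(full))
# Going to the movies tonight. Do you have any suggestion for action movie? <|endoftext|> The Martian <|endoftext|> Which theatre is showing it? <|endoftext|>
print(build_dialogue_input(turns, max_tokens=8))
# ['Martian', '<|endoftext|>', 'Which', 'theatre', 'is', 'showing', 'it?', '<|endoftext|>']
```

The model then generates the bot's reply after the final end-of-text token and stops when it produces the next one.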

# 5. Zero-shot Classification
Zero-shot classification lets us classify text into any set of labels without task-specific fine-tuning on labeled data.

In [4]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")

No model was supplied, defaulted to facebook/bart-large-mnli (https://huggingface.co/facebook/bart-large-mnli)


In [5]:
classifier(
    "Singapore new private home sales decline at slower pace than launches in December",
    candidate_labels=["politics", "business", "sports", "technology", "entertainment"],
)

{'labels': ['business', 'entertainment', 'technology', 'sports', 'politics'],
 'scores': [0.7892554998397827,
  0.09854736924171448,
  0.05912058427929878,
  0.028882605955004692,
  0.024193953722715378],
 'sequence': 'Singapore new private home sales decline at slower pace than launches in December'}

In [3]:
classifier(
    "Spicy Peanut Chicken Stir-Fry",
    candidate_labels=["Korean", "Chinese", "Western", "Mediterranean"],
)

{'labels': ['Chinese', 'Korean', 'Western', 'Mediterranean'],
 'scores': [0.7245069146156311,
  0.15263397991657257,
  0.07127302885055542,
  0.051586002111434937],
 'sequence': 'Spicy Peanut Chicken Stir-Fry'}
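Under the hood, the zero-shot pipeline treats classification as natural language inference (NLI): each candidate label is inserted into a hypothesis template (by default "This example is {}."), the NLI model scores how strongly the input text entails each hypothesis, and, unless `multi_label=True` is passed, the entailment logits are normalized with a softmax across the labels. That is why the scores above sum to 1. A sketch of the template and normalization steps with made-up logits (the helper names are hypothetical):

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # the pipeline's default hypothesis template


def make_hypotheses(labels, template=HYPOTHESIS_TEMPLATE):
    """Turn each candidate label into an NLI hypothesis."""
    return [template.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels from best to worst."""
    m = max(entailment_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in entailment_logits]
    total = sum(exps)
    return sorted(((label, e / total) for label, e in zip(labels, exps)),
                  key=lambda pair: pair[1], reverse=True)


labels = ["politics", "business", "sports"]
print(make_hypotheses(labels))
# ['This example is politics.', 'This example is business.', 'This example is sports.']

# Made-up entailment logits; in the pipeline they come from facebook/bart-large-mnli.
ranked = rank_labels(labels, [-1.0, 2.5, -2.0])
print(ranked[0][0])  # business
```

The pipeline also accepts a custom `hypothesis_template`, which can help when labels read awkwardly in the default sentence.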

# Reference
Transformers documentation: https://huggingface.co/transformers/index.html