**1. Inference**


In [None]:
# Install the Transformers library to run this notebook.

!pip install transformers

In [None]:
from transformers import pipeline


**1.1 - An Introduction to Inference Pipelines**

In [None]:
# The call to pipeline() specifies the task and the model

# The task specification is mandatory. In this case, we are creating a pipeline for sentiment analysis
# Model specification is optional. By default, the pipeline selects a particular pretrained model
# that has been fine-tuned for sentiment analysis in English: DistilBERT base uncased finetuned SST-2
# https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english


MSA1 = pipeline("sentiment-analysis")
MSA1("I've been waiting for a HuggingFace course my whole life.")
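A single `pipeline()` call hides three stages: preprocessing (tokenization), a forward pass through the model, and postprocessing (softmax and label lookup). Section 1.2 walks through each stage with the real classes. The pure-Python toy below shows how the stages chain together. Its vocabulary, per-token weights and labels are all hypothetical stand-ins, not the DistilBERT checkpoint.

```python
import math

# Hypothetical toy components; a real pipeline uses a trained tokenizer and model
TOY_VOCAB = {"[UNK]": 0, "i": 1, "love": 2, "hate": 3, "this": 4}
# Hypothetical per-token (negative, positive) contributions
TOY_WEIGHTS = {0: (0.0, 0.0), 1: (0.0, 0.0), 2: (-1.0, 2.0), 3: (2.0, -1.0), 4: (0.0, 0.0)}
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def preprocess(text):
    """Lower-case, split on whitespace, map each word to an ID ([UNK] if unseen)."""
    return [TOY_VOCAB.get(word, TOY_VOCAB["[UNK]"]) for word in text.lower().split()]


def forward(input_ids):
    """Sum the per-token contributions into one raw score (logit) per class."""
    logits = [0.0, 0.0]
    for token_id in input_ids:
        for k, w in enumerate(TOY_WEIGHTS[token_id]):
            logits[k] += w
    return logits


def postprocess(logits):
    """Softmax the logits and return the most likely label with its probability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": ID2LABEL[best], "score": probs[best]}


def toy_pipeline(text):
    return postprocess(forward(preprocess(text)))


print(toy_pipeline("I love this"))
print(toy_pipeline("I hate this"))
```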

In [None]:
# Try model MSA1 on several other sentences

str1 = ['I hate this so much!', 'Your support team is useless',
       'Disliking watercraft is not really my thing.', 'I would really truly love going out in this weather!',
       'You should see their decadent dessert menu.',  'I love my mobile but would not recommend it to any of my colleagues.']

MSA1(str1)

In [None]:
# Bear in mind that language models can be biased and unfair.
# Most of this bias comes from the training data.

str2 = ['I am from Portugal.', 'I am from India.', 'I am from Iraq.']

MSA1(str2)
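Comparing the outputs for `str2` by eye is easy to get wrong, because a NEGATIVE label with a low score and a POSITIVE label with a low score are close in meaning. A small helper can turn the sentiment pipeline's output (a list of dicts with `label` and `score` keys) into signed scores and report the largest gap between sentences that should be treated identically. The helper names are hypothetical, and the label names are assumed to be `POSITIVE`/`NEGATIVE`; check `MSA1.model.config.id2label` for the model you use.

```python
# Assumed label names for MSA1; verify with MSA1.model.config.id2label
POLARITY = {"POSITIVE": 1.0, "NEGATIVE": -1.0}


def signed_scores(results):
    """Map each pipeline result to +score (positive) or -score (negative)."""
    return [POLARITY[r["label"]] * r["score"] for r in results]


def polarity_spread(results):
    """Largest gap between any two signed scores; 0.0 means identical treatment."""
    scores = signed_scores(results)
    return max(scores) - min(scores)


# With the notebook's objects: polarity_spread(MSA1(str2))
# The values below are invented, only to show the arithmetic.
example = [{"label": "POSITIVE", "score": 0.9}, {"label": "NEGATIVE", "score": 0.6}]
print(signed_scores(example))    # [0.9, -0.6]
print(polarity_spread(example))  # 1.5
```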

In [None]:
# Try the pipeline with other sentences

str3 = []

# Complete the inference call



In [None]:
# We can specify another model as a parameter:

# The bertweet-base-sentiment-analysis model was fine-tuned for sentiment analysis (its base model is BERTweet, a RoBERTa model trained on English tweets)
# https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis

# Just add an additional parameter (model) to the pipeline function

MSA2 = pipeline("sentiment-analysis", model='finiteautomata/bertweet-base-sentiment-analysis')


In [None]:
# TRY

# Apply the new model to the previous sentences (str1, str2 and str3) and compare the results





Do you notice any difference in the sentiment analysis performed by this model?
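The two models do not necessarily use the same label names, so their outputs cannot be compared string-for-string. A minimal sketch maps both label sets to a common scheme and lists the sentences on which the models disagree. The mapping assumes MSA1 uses `POSITIVE`/`NEGATIVE` and MSA2 uses `POS`/`NEG`/`NEU`; confirm both with `MSA1.model.config.id2label` and `MSA2.model.config.id2label`. The function name `compare` is hypothetical.

```python
# Assumed label sets; adjust after inspecting each model's config.id2label
LABEL_MAP = {"POSITIVE": "pos", "NEGATIVE": "neg", "POS": "pos", "NEG": "neg", "NEU": "neu"}


def compare(sentences, results_a, results_b):
    """Return (sentence, label_a, label_b) for every sentence where the models disagree."""
    disagreements = []
    for text, a, b in zip(sentences, results_a, results_b):
        label_a, label_b = LABEL_MAP[a["label"]], LABEL_MAP[b["label"]]
        if label_a != label_b:
            disagreements.append((text, label_a, label_b))
    return disagreements


# With the notebook's objects: compare(str1, MSA1(str1), MSA2(str1))
```

A neutral class has no counterpart in a two-label model, so every `neu` prediction from the second model will show up as a disagreement; that is itself one of the differences worth discussing.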

In [None]:
# Zero-shot classification task: https://huggingface.co/tasks/zero-shot-classification

ZS1 = pipeline('zero-shot-classification')

ZS1('This is a course about the Transformers library', candidate_labels=['education', 'politics', 'business'])
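Zero-shot classification in Transformers is built on natural language inference (NLI): each candidate label is inserted into a hypothesis (the pipeline's default template is `"This example is {}."`), and an NLI model scores whether the input text entails that hypothesis. With a single correct label, the entailment logits are softmaxed across all candidate labels; with `multi_label=True`, each label is scored independently from its own contradiction/entailment pair. The sketch below reproduces only this scoring step, with invented logits in place of a real NLI model's output.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_hypotheses(labels, template="This example is {}."):
    """One NLI hypothesis per candidate label."""
    return [template.format(label) for label in labels]


def zero_shot_scores(entail_logits, contra_logits, multi_label=False):
    """Single-label: softmax of entailment logits across labels (scores sum to 1).
    Multi-label: per label, softmax over (contradiction, entailment), keep entailment."""
    if multi_label:
        return [softmax([c, e])[1] for c, e in zip(contra_logits, entail_logits)]
    return softmax(entail_logits)


labels = ["education", "politics", "business"]
print(build_hypotheses(labels))
# Invented logits standing in for an NLI model's outputs, one per hypothesis
entail = [3.0, -1.0, 0.5]
contra = [-2.0, 2.5, 0.0]
print(dict(zip(labels, zero_shot_scores(entail, contra))))
print(dict(zip(labels, zero_shot_scores(entail, contra, multi_label=True))))
```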

Quiz:

Which model was selected by default for the zero-shot classification task?

What is the URL of its model card?

In [None]:
# TRY

# Apply pipeline ZS1 to other sentences / candidate labels

strC = ['I am travelling to Italy', 'I am listening to music.', 'This is the best pizza.']

candidate_labels = ['education', 'holidays', 'business', 'travel', 'cooking']

ZS1(strC, candidate_labels=candidate_labels)


In [None]:
# TRY

# Select another model for this task, create pipeline ZS2 and compare performance



In [None]:
# Text Generation task: https://huggingface.co/tasks/text-generation

Gen1 = pipeline('text-generation')

Gen1('In this course, we will teach you how to', max_length=100)
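Text generation is autoregressive: the model repeatedly predicts the next token from everything generated so far and appends it, stopping at an end-of-sequence token or at the length limit. As with `max_length` for decoder-only models in Transformers, the limit in this sketch counts the prompt tokens too. The bigram table is a hypothetical stand-in for a language model, and the sketch always picks the most likely token (greedy decoding); real pipelines may sample instead, which is why repeated runs can produce different texts.

```python
# Hypothetical next-token probabilities, standing in for a language model
TOY_NEXT = {
    "in": {"this": 0.6, "the": 0.4},
    "this": {"course": 0.7, "way": 0.3},
    "course": {"we": 0.8, "<eos>": 0.2},
    "we": {"learn": 0.9, "<eos>": 0.1},
    "learn": {"<eos>": 1.0},
}


def greedy_generate(prompt_tokens, max_length):
    """Append the most likely next token until <eos> or max_length total tokens."""
    tokens = list(prompt_tokens)
    while len(tokens) < max_length:
        candidates = TOY_NEXT.get(tokens[-1], {"<eos>": 1.0})
        next_token = max(candidates, key=candidates.get)
        if next_token == "<eos>":
            break
        tokens.append(next_token)
    return tokens


print(greedy_generate(["in"], max_length=10))  # ['in', 'this', 'course', 'we', 'learn']
print(greedy_generate(["in"], max_length=3))   # ['in', 'this', 'course']
```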


In [None]:
# TRY

# 1. Apply pipeline Gen1 to generate other sentences

# 2. Select another model for this task, create pipeline Gen2 and compare outputs




In [None]:
# Translation task: https://huggingface.co/tasks/translation

# Here we specify the model explicitly; it translates French to English
# https://huggingface.co/Helsinki-NLP/opus-mt-fr-en


T1 = pipeline('translation', model='Helsinki-NLP/opus-mt-fr-en')

T1(['Ce cours est produit par Hugging Face.', 'Bonne nuit.', "Le Portugal bat la France en finale de l'Euro 2016."])
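The two Helsinki-NLP checkpoints in this notebook encode the translation direction in their names as `opus-mt-{source}-{target}`. Swapping the two codes gives the opposite direction, and that swap is an easy mistake to make. The hypothetical helpers below build and parse names that follow this simple pattern only. They do not check that a checkpoint with that name actually exists on the Hub.

```python
def opus_mt_checkpoint(src, tgt):
    """Hypothetical helper: checkpoint name for translating src -> tgt."""
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


def translation_direction(checkpoint):
    """Hypothetical helper: read (source, target) from the last two name parts."""
    name = checkpoint.rsplit("/", 1)[-1]
    *_, src, tgt = name.split("-")
    return src, tgt


print(translation_direction("Helsinki-NLP/opus-mt-fr-en"))  # ('fr', 'en')
print(opus_mt_checkpoint("mul", "en"))                     # Helsinki-NLP/opus-mt-mul-en
```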


In [None]:
# There are models that can handle several languages. This one can translate from many different languages to English
# https://huggingface.co/Helsinki-NLP/opus-mt-mul-en


T2 = pipeline('translation', model='Helsinki-NLP/opus-mt-mul-en')

T2(['Olá.', 'Boa noite.', 'A capital de Portugal é Lisboa'])



In [None]:
T2(['Hola.', 'Buenas noches.', 'Hoy no llueve.'])


**1.2 - A Detailed View on Pipeline Operations**

**A. Preprocessing with a Tokenizer**

In [None]:
# Tokenizers transform raw text into tokens and then into numerical IDs
# Two tokenizers are loaded with AutoClasses: AutoTokenizer infers which tokenizer class to download from the model's checkpoint name

from transformers import AutoTokenizer

my_tok1 = AutoTokenizer.from_pretrained('bert-base-cased')
my_tok2 = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

sequence = ['Using a Transformer is simple. Dont you agree?', 'Are you feeling better?']



In [None]:
# Full tokenization with my_tok1: token IDs (padded to equal length) and attention masks

r1 = my_tok1(sequence, padding=True, return_tensors="tf")

print(r1.input_ids)
print(r1.attention_mask)


In [None]:
# Full tokenization with my_tok2: token IDs (padded to equal length) and attention masks

r2 = my_tok2(sequence, padding=True, return_tensors="tf")

print(r2.input_ids)
print(r2.attention_mask)
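With `padding=True`, every sequence in the batch is padded to the length of the longest one, and the attention mask marks real tokens with 1 and padding with 0 so the model can ignore the padding. The plain-Python sketch below pads on the right; the example IDs are only illustrative. The padding ID differs between tokenizers, so read it from the tokenizer (`my_tok1.pad_token_id`, `my_tok2.pad_token_id`) instead of hard-coding it.

```python
def pad_batch(batch_ids, pad_id):
    """Right-pad each ID list to the longest length and build the attention mask."""
    max_len = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_id] * n_pad)
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Illustrative IDs; in the notebook use pad_id=my_tok1.pad_token_id
print(pad_batch([[101, 7993, 102], [101, 102]], pad_id=0))
```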

In [None]:
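# Tokenization proceeds in two steps: split the text into tokens, then map the tokens to integer IDs

# Step 1: tokenize() expects a single string, so the two sentences are joined here

tokens1 = my_tok1.tokenize(' '.join(sequence))

print('Tok1: ', tokens1)

tokens2 = my_tok2.tokenize(' '.join(sequence))

print('Tok2: ', tokens2)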
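BERT tokenizers first split on whitespace and punctuation, then break each word into WordPiece subwords by greedy longest-match-first: take the longest prefix found in the vocabulary, then continue with the remainder, whose pieces carry a `##` prefix. This is why `my_tok1` can split "Transformer" into `Trans` and `##former`. The sketch uses a tiny hypothetical vocabulary; the real one is available through `my_tok1.get_vocab()`.

```python
# Tiny hypothetical vocabulary; the real one has tens of thousands of entries
WP_VOCAB = {"Trans", "##former", "##s", "is", "simple", "[UNK]"}


def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of a single word into WordPiece tokens."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry a ## prefix
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return [unk]  # no valid split: the whole word becomes the unknown token
        pieces.append(piece)
        start = end
    return pieces


print(wordpiece("Transformer", WP_VOCAB))   # ['Trans', '##former']
print(wordpiece("Transformers", WP_VOCAB))  # ['Trans', '##former', '##s']
print(wordpiece("Trains", WP_VOCAB))        # ['[UNK]']
```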


In [None]:
# Step 2:

ids1 = my_tok1.convert_tokens_to_ids(tokens1)

print('Tok 1: ', ids1)

ids2 = my_tok2.convert_tokens_to_ids(tokens2)

print('Tok 2: ', ids2)
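Step 2 is essentially a dictionary lookup in the tokenizer's vocabulary, with tokens that are not in the vocabulary mapped to the ID of the unknown token. The sketch below shows that idea and its inverse with a toy vocabulary whose IDs are hypothetical, unlike the real IDs printed above.

```python
# Toy vocabulary with hypothetical IDs
toy_vocab = {"[UNK]": 0, "Using": 1, "a": 2, "Trans": 3, "##former": 4}


def tokens_to_ids(tokens, vocab, unk_token="[UNK]"):
    """Look each token up; unknown tokens get the ID of unk_token."""
    return [vocab.get(tok, vocab[unk_token]) for tok in tokens]


def ids_to_tokens(ids, vocab):
    """Inverse lookup: ID -> token string."""
    inverse = {i: tok for tok, i in vocab.items()}
    return [inverse[i] for i in ids]


print(tokens_to_ids(["Using", "a", "Trans", "##former", "robot"], toy_vocab))  # [1, 2, 3, 4, 0]
print(ids_to_tokens([3, 4], toy_vocab))  # ['Trans', '##former']
```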


In [None]:
# Decode the IDs from Step 2 back into text
decoded_string = my_tok1.decode(ids1)

print(decoded_string)
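Decoding reverses the two steps: IDs become tokens again, `##` continuation pieces are glued onto the preceding token, and spaces before punctuation are cleaned up. The sketch below is a simplified version of that idea with a toy ID-to-token table. The real `decode()` also handles special tokens and applies further clean-up rules.

```python
# Toy ID -> token table (hypothetical IDs)
id_to_token = {0: "Using", 1: "a", 2: "Trans", 3: "##former", 4: "is", 5: "simple", 6: "."}
PUNCTUATION = {".", ",", "?", "!"}


def simple_decode(ids, id_to_token):
    """Join WordPiece tokens into a string (simplified)."""
    text = ""
    for i in ids:
        tok = id_to_token[i]
        if tok.startswith("##"):
            text += tok[2:]            # continuation piece: attach without a space
        elif tok in PUNCTUATION or not text:
            text += tok                # punctuation or first token: no leading space
        else:
            text += " " + tok
    return text


print(simple_decode([0, 1, 2, 3, 4, 5, 6], id_to_token))  # Using a Transformer is simple.
```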

**B. Inference with a Model**

In [None]:
# Detailed documentation about the classes used in the next sections can be found here
# https://huggingface.co/transformers/v3.0.2/model_doc/auto.html


from transformers import AutoTokenizer
from transformers import TFAutoModel
from transformers import TFAutoModelForSequenceClassification

# Checkpoint name of the selected model: https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english

checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"



In [None]:
# Get the tokenizer from the model
# https://huggingface.co/transformers/v3.0.2/main_classes/tokenizer.html
# https://huggingface.co/docs/transformers/main_classes/tokenizer

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

In [None]:
# Apply tokenization to the raw inputs

raw_inputs = [
    "I've been waiting for a HuggingFace course my whole life.",
    "I hate this so much!",
]
inputs = tokenizer(raw_inputs, padding=True, truncation=True, return_tensors="tf")
print(inputs)
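For a BERT-style checkpoint, each single sentence is wrapped in special tokens (`[CLS]` at the start, `[SEP]` at the end), and `truncation=True` cuts sequences that exceed the maximum length; without an explicit `max_length`, the tokenizer uses the model's limit (`tokenizer.model_max_length`). The sketch below shows only the simplest case (one sentence, cut from the end). The function name is hypothetical, and the example IDs are illustrative; read the real ones from `tokenizer.cls_token_id` and `tokenizer.sep_token_id`.

```python
def add_special_and_truncate(token_ids, max_length, cls_id, sep_id):
    """Keep at most max_length - 2 content IDs, then wrap them in [CLS] ... [SEP]."""
    body = token_ids[: max_length - 2]  # reserve two positions for the special tokens
    return [cls_id] + body + [sep_id]


# Illustrative IDs; use tokenizer.cls_token_id / tokenizer.sep_token_id in practice
print(add_special_and_truncate([5, 6, 7, 8], max_length=4, cls_id=101, sep_id=102))  # [101, 5, 6, 102]
```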

Explain the parameters passed to the tokenizer (padding, truncation, return_tensors) and the output of tokenization.

In [None]:
# Retrieve the model
# https://huggingface.co/transformers/v3.0.2/model_doc/auto.html

# Two models are created: the first is the base Transformer (no task-specific head), which outputs hidden states; the second is the full Transformer with a sequence classification head on top

modelH = TFAutoModel.from_pretrained(checkpoint)

modelFinal = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

In [None]:
# Check the output of modelH

outH = modelH(inputs)

print(outH)

print('Shape: ', outH.last_hidden_state.shape)

In [None]:
# Check the outputs of modelFinal

outFinal = modelFinal(inputs)

print(outFinal)

print('Shape: ', outFinal.logits.shape)

Explain the shapes of the outputs produced by the two models.

**C. Postprocessing the Output**

In [None]:
# The classification model outputs raw, unnormalized scores (logits). They can be converted to probabilities
# by passing them through a softmax function


import tensorflow as tf
import numpy as np

predictions = tf.math.softmax(outFinal.logits, axis=-1)

print(np.argmax(predictions.numpy(), axis=1))

print('LABELS: ', modelFinal.config.id2label)
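The postprocessing above can be reproduced in plain Python to make each step visible: softmax turns each row of logits into probabilities that sum to 1, argmax picks the most likely class index, and `id2label` turns that index into a label name. The logits below are invented; `example_id2label` mirrors the structure of `modelFinal.config.id2label` (a dict from class index to label name).

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_labels(batch_logits, id2label):
    """Return (label, probability) for each row of logits."""
    results = []
    for row in batch_logits:
        probs = softmax(row)
        best = max(range(len(probs)), key=probs.__getitem__)  # argmax
        results.append((id2label[best], probs[best]))
    return results


# Invented logits for a two-sentence batch
example_id2label = {0: "NEGATIVE", 1: "POSITIVE"}
print(predict_labels([[-1.0, 1.0], [2.0, -2.0]], example_id2label))
```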


**1.3 - Models**

In [None]:
# Import a BERT model for a TensorFlow environment

# https://huggingface.co/transformers/v3.0.2/model_doc/bert.html#
# https://huggingface.co/docs/transformers/model_doc/bert

from transformers import TFBertModel

modelB = TFBertModel.from_pretrained('bert-base-cased')


In [None]:
# Check the model configuration details

modelB.config

In [None]:
# We will use this BERT model for sequence classification: https://huggingface.co/tasks/text-classification

sequences = ['This dog is cute.', 'I hate you.']




In [None]:
# Tokenize sentences for BERT

from transformers import BertTokenizer


tokenizerB = BertTokenizer.from_pretrained('bert-base-cased')

encoded = tokenizerB(sequences, padding=True, truncation=True, return_tensors="tf")

print(encoded)

In [None]:
# Apply the model to the encoded sentences and obtain results

outA = modelB(encoded)

print(outA)

print('Shape: ', outA.last_hidden_state.shape)

The output of the previous cell is the final hidden state: one vector per token for each sentence, not a class prediction.

Modify the code to obtain a final prediction for the text classification task. You can either:
1. Add a post-processing unit (a classification head) on top of the model's output

2. Select another BERT model that already contains a classification head for your task (https://huggingface.co/transformers/v3.0.2/model_doc/bert.html#)
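For option 1, a common design is to take the hidden vector of the first token (`[CLS]`) of each sequence and pass it through a linear layer that produces one logit per class. That reduces a `batch x seq_len x hidden` tensor to `batch x num_labels`. The plain-Python sketch below shows only the shape logic with made-up weights. In TensorFlow the linear layer would be a `Dense` layer whose weights must be trained on labelled data, and BERT's own classification classes apply their head to a pooled version of the `[CLS]` vector rather than the raw one.

```python
def classification_head(last_hidden_state, weights, bias):
    """last_hidden_state: batch x seq_len x hidden -> logits: batch x num_labels.
    weights: num_labels x hidden, bias: num_labels (made-up values here)."""
    logits = []
    for sequence in last_hidden_state:
        cls_vector = sequence[0]  # hidden vector of the first ([CLS]) token
        row = [sum(w * h for w, h in zip(w_row, cls_vector)) + b
               for w_row, b in zip(weights, bias)]
        logits.append(row)
    return logits


# Toy batch: 2 sequences, 3 tokens each, hidden size 2; only the first token is used
hidden = [[[1.0, 0.0], [9.0, 9.0], [9.0, 9.0]],
          [[0.0, 1.0], [9.0, 9.0], [9.0, 9.0]]]
W = [[1.0, -1.0], [-1.0, 1.0]]  # hypothetical weights for 2 classes
b = [0.0, 0.0]
print(classification_head(hidden, W, b))  # [[1.0, -1.0], [-1.0, 1.0]]
```

Applying a softmax to these logits then yields class probabilities, exactly as in section C.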

In [None]:
# Complete the missing code



