<a href="https://colab.research.google.com/github/FranciscoBPereira/DeepLearning-SeAMK/blob/main/SeAMK2223_Ex3_NLP.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
# Install the Transformers library (with SentencePiece support) to run this notebook.

!pip install "transformers[sentencepiece]"

In [None]:
from transformers import pipeline


**Section 1 - An Introduction to Inference Pipelines**

In [None]:
# The call to pipeline() specifies the task and the model
# The task specification is mandatory. In this case, we are creating a pipeline for sentiment analysis
# Model specification is optional. By default, the pipeline selects a particular pretrained model
# that has been fine-tuned for sentiment analysis in English: DistilBERT base uncased finetuned SST-2
# https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english


MSA1 = pipeline("sentiment-analysis")

MSA1("I've been waiting for a HuggingFace course my whole life.")
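
A sentiment-analysis pipeline returns one dictionary per input, each holding a `label` and a `score`. The sketch below shows one way to turn that output into readable lines. `describe` is a hypothetical helper written for this notebook, not part of Transformers, and it assumes only that output shape.

```python
def describe(texts, results):
    """Pair each input text with its predicted label and score.

    `results` is assumed to look like sentiment-analysis pipeline output:
    a list of dicts with 'label' and 'score' keys, one per input text.
    """
    if isinstance(texts, str):  # a single sentence was passed
        texts = [texts]
    lines = []
    for text, res in zip(texts, results):
        # Left-align the label in 8 characters, show the score with 3 decimals
        lines.append(f"{res['label']:<8} {res['score']:.3f}  {text}")
    return lines
```

For example, `print("\n".join(describe(text, MSA1(text))))` prints one line per sentence, where `text` is the sentence or list of sentences you passed to the pipeline.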

In [None]:
# Try model MSA1 with several other sentences

sentences = ['I hate this so much!', 'Your support team is useless',
             'Disliking watercraft is not really my thing.', 'I would really truly love going out in this weather!',
             'You should see their decadent dessert menu.', 'I love my mobile but would not recommend it to any of my colleagues.']

MSA1(sentences)

**Change some of the sentences above and check whether the analysis changes**

In [None]:
# We can specify another model as a parameter: bertweet-base-sentiment-analysis
# https://huggingface.co/finiteautomata/bertweet-base-sentiment-analysis


MSA2 = pipeline('sentiment-analysis', model='finiteautomata/bertweet-base-sentiment-analysis')

In [None]:
# TRY

# Apply the new model to the previous sentences and compare the predictions

MSA2(sentences)
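
Comparing the two models by eye is awkward because they do not share a label set. As a rough sketch, the hypothetical helper `disagreements` below maps both label sets onto common names and lists the inputs where the models differ. The label names in `CANONICAL` are an assumption taken from the two model cards (POSITIVE/NEGATIVE for the default DistilBERT model, POS/NEG/NEU for BERTweet); adjust them if your outputs show other names.

```python
# Assumed label names, based on the model cards; unknown labels pass through unchanged.
CANONICAL = {
    "POSITIVE": "positive", "NEGATIVE": "negative",            # DistilBERT SST-2
    "POS": "positive", "NEG": "negative", "NEU": "neutral",    # BERTweet
}


def disagreements(texts, results_a, results_b, canonical=CANONICAL):
    """Return (text, label_a, label_b) for every input the two models label differently."""
    rows = []
    for text, a, b in zip(texts, results_a, results_b):
        label_a = canonical.get(a["label"], a["label"])
        label_b = canonical.get(b["label"], b["label"])
        if label_a != label_b:
            rows.append((text, label_a, label_b))
    return rows
```

A call such as `disagreements(texts, MSA1(texts), MSA2(texts))`, where `texts` is your list of sentences, returns only the rows worth inspecting.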


In [None]:
# Zero-shot classification task: https://huggingface.co/tasks/zero-shot-classification

ZS1 = pipeline('zero-shot-classification')
ZS1('This is a course about the Transformers library', candidate_labels=['education', 'politics', 'business'])
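
For a single input, the zero-shot pipeline returns a dictionary with the keys `sequence`, `labels` and `scores`, where `labels` and `scores` are parallel lists. The helpers below are a minimal sketch for reading that result; `ranked_labels`, `best_label` and the `threshold` parameter are hypothetical names introduced here, not part of the library.

```python
def ranked_labels(result):
    """Return (label, score) pairs from a zero-shot result, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


def best_label(result, threshold=0.0):
    """Return the top label, or None if its score is below `threshold`."""
    label, score = ranked_labels(result)[0]
    return label if score >= threshold else None
```

Setting a threshold is a design choice: it lets you treat low-confidence inputs as "none of the candidate labels" instead of forcing a category.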

Which model is selected by default for the zero-shot classification task?

In [None]:
# TRY

# 1. Apply pipeline ZS1 to other sentences / candidate labels

# 2. Select another model for this task, create pipeline ZS2 and compare performance

ZS2 = pipeline('zero-shot-classification', model='typeform/distilbert-base-uncased-mnli')

ZS2('I am travelling to Italy', candidate_labels=['education', 'holidays', 'business'])


In [None]:
# Text Generation task: https://huggingface.co/tasks/text-generation

Gen1 = pipeline('text-generation')

Gen1('In this course, we will teach you how to', max_length=100)
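
The text-generation pipeline returns a list of dictionaries, each with a `generated_text` key, and by default that text starts with the prompt itself. The sketch below strips the prompt so that only the newly generated part remains. `continuations` is a hypothetical helper, not a library function.

```python
def continuations(prompt, outputs):
    """Return only the newly generated part of each output.

    `outputs` is assumed to look like text-generation pipeline output:
    a list of dicts with a 'generated_text' key.
    """
    texts = []
    for out in outputs:
        full = out["generated_text"]
        # Drop the prompt only when the output really begins with it
        texts.append(full[len(prompt):] if full.startswith(prompt) else full)
    return texts
```

This makes it easier to compare what different models add to the same prompt.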


Which model is selected by default for the text generation task?

In [None]:
# TRY

# 1. Apply pipeline Gen1 to generate other sentences

# 2. Select another model for this task, create pipeline Gen2 and compare outputs

Gen2 = pipeline('text-generation', model='distilgpt2')

Gen2('In this course, we will teach you how to', max_length=100)



In [None]:
# Translation task: https://huggingface.co/tasks/translation

# In this task, we explicitly specify the model and address the problem of translating English to Finnish
# https://huggingface.co/Helsinki-NLP/opus-mt-en-fi


T1 = pipeline('translation', model='Helsinki-NLP/opus-mt-en-fi')

T1(['Hugging Face is a great library.', 'Good night.', 'I am traveling to London'])
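
The translation pipeline returns one dictionary per input sentence, with the translated text under the `translation_text` key. As a small sketch, the hypothetical helper `pair_translations` lines up each source sentence with its translation.

```python
def pair_translations(sources, outputs):
    """Return (source, translation) tuples.

    `outputs` is assumed to look like translation pipeline output:
    a list of dicts with a 'translation_text' key, in input order.
    """
    return [(src, out["translation_text"]) for src, out in zip(sources, outputs)]
```

Iterating over the pairs and printing them side by side makes it easy to check the translations one sentence at a time.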


**Section 2 - Models**

In [None]:
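# Import a BERT model for sequence classification (TensorFlow version)

# https://huggingface.co/transformers/v3.0.2/model_doc/bert.html
# https://huggingface.co/docs/transformers/model_doc/bert

# Note: 'bert-base-cased' is a pretrained encoder without a fine-tuned classifier.
# The classification head is newly initialized, so its predictions are not
# meaningful until the model has been fine-tuned on a labelled dataset.

from transformers import TFBertForSequenceClassification

modelBERT = TFBertForSequenceClassification.from_pretrained('bert-base-cased')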


In [None]:
# Check the model configuration details

modelBERT.config

In [None]:
# We will use this BERT model for sequence classification: https://huggingface.co/tasks/text-classification

sequences = ['This dog is cute.', 'I hate you.']




In [None]:
# Tokenize sentences for BERT

from transformers import BertTokenizer


tokenizerB = BertTokenizer.from_pretrained('bert-base-cased')

encoded = tokenizerB(sequences, padding=True, truncation=True, return_tensors="tf")

print(encoded)
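
With `padding=True`, the tokenizer pads every sequence to the length of the longest one in the batch and returns an `attention_mask` that marks real tokens with 1 and padding with 0. The function below is a plain-Python sketch of that idea, not the tokenizer's actual implementation. `pad_batch` is a hypothetical name, and `pad_id=0` is an assumption that should be checked against `tokenizerB.pad_token_id`.

```python
def pad_batch(batch_ids, pad_id=0):
    """Right-pad lists of token ids to equal length and build attention masks.

    pad_id=0 is an assumption; check it against the tokenizer's pad_token_id.
    """
    longest = max(len(ids) for ids in batch_ids)
    input_ids, attention_mask = [], []
    for ids in batch_ids:
        n_pad = longest - len(ids)
        input_ids.append(list(ids) + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```

Comparing the printed `encoded` tensors with this sketch shows why the shorter sentence ends in padding ids and zeros in its mask.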

In [None]:
# Apply the model to the encoded sentences
# The output holds the raw logits: one row per sentence, one column per class

out = modelBERT(encoded)

print(out)


In [None]:


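# Convert the logits to probabilities and pick the most likely class for each sentence

import tensorflow as tf
import numpy as np

predictions = tf.math.softmax(out.logits, axis=-1)

print(np.argmax(predictions.numpy(), axis=1))

# Map class indices to label names (generic LABEL_0 / LABEL_1 unless the config defines others)
print('LABELS: ', modelBERT.config.id2label)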
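
To make the last step concrete, here is a plain-Python sketch of what the softmax and argmax calls compute for each row of logits. The function names `softmax` and `predict` are illustrative; the notebook itself relies on `tf.math.softmax` and `np.argmax`.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    largest = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict(batch_logits):
    """Return (class_index, probability) for each row of logits."""
    preds = []
    for row in batch_logits:
        probs = softmax(row)
        idx = max(range(len(probs)), key=probs.__getitem__)  # argmax
        preds.append((idx, probs[idx]))
    return preds
```

Softmax does not change which class has the highest value, so taking the argmax of the raw logits gives the same class index; the softmax step matters when you also want the probability.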
